<h1 align=center><font size = 10>The Battle of Neighborhoods</font></h1>

<h1 align=left><font size = 5>Introduction</font></h1>
<br><br>
An entrepreneur who owns a coffee shop in the Flatbush neighborhood of New York City wants to open a second one and needs to know the best location for it.<br>
Because his first coffee shop is doing well, he wants to open the new one in a neighborhood similar to Flatbush, expecting that a similar neighborhood will bring similar success.<br>
He also wants a neighborhood where the standard of living is at least as high as in Flatbush, so that the quality level of his coffee shop remains appropriate.<br>
Finally, he believes that too many coffee shops in the same area hurt business, so the new neighborhood should have a lower density of coffee shops than Flatbush.<br>
<br>
The study therefore consists of listing the neighborhoods that are similar to Flatbush and plotting on a map those where he could open a new coffee shop.<br>
In a second step, we will narrow that list to the neighborhoods where the median income is at least as high as in Flatbush, in order to exclude neighborhoods with a lower standard of living. Finally, within this short list, we will identify the neighborhoods that have a lower density of coffee shops than Flatbush.


<h1 align=left><font size = 5>Data to be used and process to answer the question</font></h1>

We will use data on New York City neighborhoods:<br>
We will use the newyork_data.json file from Module 3 to get the list of neighborhoods and their geographic coordinates.<br>
We will also use data from https://geodacenter.github.io/data-and-lab/NYC-Nhood-ACS-2008-12/ to get the population and median income by neighborhood.<br>
Because neighborhoods in that dataset are identified by NTA code, we will also use a table that gives the correspondence between NTA code and neighborhood name: https://www1.nyc.gov/assets/planning/download/office/data-maps/nyc-population/census2010/nyc2010census_tabulation_equiv.xlsx
<br><br>

For the study, we will proceed as follows:<br><br>
<u>Import libraries</u><br>
We will first import the libraries required for analysing and plotting the data<br><br>
<u>Import input data</u><br>
We will then read the neighborhood data and load it into a dataframe<br>
We will also read the population and income data and load it into a dataframe<br>
We will then merge those dataframes and check that we don't lose too many neighborhoods in the process<br>
As a result of this step, we will have a dataframe that gives, for each neighborhood, its latitude and longitude, its population and its median income<br>
<br>
<u>Get the venues</u><br>
We will then get the venues in each neighborhood through the Foursquare API, and count the number of venues of each type in each neighborhood<br>
The resulting dataframe will contain the number of venues of each type for each neighborhood<br>
<br>
<u>Cluster the neighborhoods</u><br>
We will then scale the number of venues by the population of the neighborhood, and work with the number of venues per million inhabitants<br>
Finally, we will cluster the neighborhoods using the number of venues of each type, which lets us answer the first question and plot the required map<br>
<br>
<u>Go further with the list of similar neighborhoods</u><br>
For the second question, we will remove the neighborhoods where the median income is lower than in Flatbush, in order to meet the standard-of-living criterion<br>
We will then remove the neighborhoods where there are more coffee shops per million people than in Flatbush, which meets the coffee shop density criterion<br>
At the end, we will have a short list of neighborhoods meeting all the criteria<br>
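
As a rough illustration of the last step, the three criteria can be expressed as boolean filters on a dataframe. The sketch below is only an assumption about how the shortlist could be computed: the neighborhood names "A", "B" and "C" and all figures are hypothetical, and the column names simply mirror those used later in this notebook.

```python
import pandas as pd

# Hypothetical figures for illustration only; they are not results of this study.
candidates = pd.DataFrame({
    "Neighborhood": ["Flatbush", "A", "B", "C"],
    "Cluster Labels": [1, 1, 1, 3],
    "Median income": [40000, 55000, 35000, 60000],
    "Coffee Shop": [50.0, 20.0, 10.0, 5.0],  # coffee shops per million inhabitants
})

# the reference row that every other neighborhood is compared with
reference = candidates.loc[candidates["Neighborhood"] == "Flatbush"].iloc[0]

shortlist = candidates[
    (candidates["Neighborhood"] != "Flatbush")
    & (candidates["Cluster Labels"] == reference["Cluster Labels"])  # similar neighborhood
    & (candidates["Median income"] >= reference["Median income"])    # standard of living
    & (candidates["Coffee Shop"] < reference["Coffee Shop"])         # lower coffee shop density
]
print(shortlist["Neighborhood"].tolist())
```

With these made-up numbers only "A" passes all three filters: "B" fails the income criterion and "C" belongs to a different cluster.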


***Import libraries***

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files   - will be used for neighborhood lists and location

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests   - will be used to get the venues
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules   -  unnecessary
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage - will be used for clustering
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library   -  will be used for plotting neighborhoods on map

# import beautifulSoup   -  unnecessary
# from bs4 import BeautifulSoup

# import DBF - will be used for the population and income data
# pip install dbfread   # only the first time to install it 
from dbfread import DBF

print('Libraries imported.')

Libraries imported.


***Import input data***<br>
<u>First step</u> : import the list of neighborhoods and their latitudes/longitudes from the json file, then plot them on a map of New York

In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])


In [3]:
# Show a template of the data created
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [4]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [5]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

<u>Second step</u> : import population and income data<br>
The file is in .dbf format, which is why we imported the dbfread library at the beginning.
The website description gives the variable names:
* poptot : total population
* ntacode : Code associated with the Neighborhood Tabulation Area (NTA)
* medianinco : Median household income (In 2012 Inflation Adjusted Dollars)

To get the correspondence between NTA code and neighborhood, we will use the table from nyc.gov.

The idea is to merge the dataframes so that the information from the first dataframe (i.e. latitude and longitude) is combined with the information from this new dataframe (i.e. population and median income)

In [6]:
table = DBF('NYC_Nhood ACS2008_12.dbf', load=True)

In [7]:
# Check length of the table
len(table)

195

We can see there are fewer neighborhoods than in the .json file (195 vs 306)<br>
We will go on with that, but we will plot the neighborhoods on the map to check that we have good coverage of the city

In [8]:
# create a dataframe with the NTA code, population and income of each record in the table
# (built from a list of dicts, since DataFrame.append was removed in pandas 2.0)
nyc_population_data = pd.DataFrame(
    [{'NTA': record['ntacode'],
      'Total population': record['poptot'],
      'Median income': record['medianinco']} for record in table.records],
    columns=['NTA', 'Total population', 'Median income'])

In [9]:
# Check the result dataframe
nyc_population_data.head()

Unnamed: 0,NTA,Total population,Median income
0,BK45,48351,1520979
1,BK17,61584,1054259
2,BK61,100130,980637
3,BK90,33155,519058
4,QN23,24199,354073


We will now read the Excel file with the correspondence between NTA code and neighborhood

In [10]:
# read excel file :
nta_tabuliation = pd.read_excel("nyc2010census_tabulation_equiv.xlsx")
# keep only NTA code and Neighborhood name :
nta_tabuliation=nta_tabuliation[['Code','Name']]
# suppress duplicates, because there are multiple lines for each NTA code
nta_tabuliation=nta_tabuliation.drop_duplicates()
# rename column in order to have same name as in nyc_population_data ; will be necessary for merging them
nta_tabuliation.rename(columns={"Code": "NTA"},inplace=True)
# Check the result dataframe
nta_tabuliation.head()

Unnamed: 0,NTA,Name
0,BX31,Allerton-Pelham Gardens
11,BX05,Bedford Park-Fordham North
22,BX06,Belmont
27,BX07,Bronxdale
35,BX01,Claremont-Bathgate


In [11]:
# merge the dataframes, with the NTA column as the key
nyc_population_data = pd.merge(nyc_population_data, nta_tabuliation, on='NTA')

In [12]:
# Check the result
nyc_population_data.head(10)

Unnamed: 0,NTA,Total population,Median income,Name
0,BK45,48351,1520979,Georgetown-Marine Park-Bergen Beach-Mill Basin
1,BK17,61584,1054259,Sheepshead Bay-Gerritsen Beach-Manhattan Beach
2,BK61,100130,980637,Crown Heights North
3,BK90,33155,519058,East Williamsburg
4,QN23,24199,354073,College Point
5,SI11,24083,342708,Charleston-Richmond Valley-Tottenville
6,QN45,25619,554014,Douglas Manor-Douglaston-Little Neck
7,SI01,28727,521048,Annadale-Huguenot-Prince's Bay-Eltingville
8,SI54,43427,718593,Great Kills
9,BX09,53800,490852,Soundview-Castle Hill-Clason Point-Harding Park


In [13]:
# Rename Name column to Neighborhood, in order to have same column name as in neighborhoods dataframe
nyc_population_data.rename(columns={"Name": "Neighborhood"},inplace=True)
nyc_population_data.head()

Unnamed: 0,NTA,Total population,Median income,Neighborhood
0,BK45,48351,1520979,Georgetown-Marine Park-Bergen Beach-Mill Basin
1,BK17,61584,1054259,Sheepshead Bay-Gerritsen Beach-Manhattan Beach
2,BK61,100130,980637,Crown Heights North
3,BK90,33155,519058,East Williamsburg
4,QN23,24199,354073,College Point


In [14]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [15]:
# merge the dataframe with the first one : will give a dataframe with both latitude / longitude and population and income data
nyc_population_data = pd.merge(nyc_population_data, neighborhoods, on='Neighborhood')
# Check result
nyc_population_data.head()

Unnamed: 0,NTA,Total population,Median income,Neighborhood,Borough,Latitude,Longitude
0,BK90,33155,519058,East Williamsburg,Brooklyn,40.708492,-73.938858
1,QN23,24199,354073,College Point,Queens,40.784903,-73.843045
2,SI54,43427,718593,Great Kills,Staten Island,40.54948,-74.149324
3,QN05,28201,506307,Rosedale,Queens,40.659816,-73.735261
4,BX27,27562,149520,Hunts Point,Bronx,40.80973,-73.883315


In [16]:
# Check dataframe shape
nyc_population_data.shape

(88, 7)

We have lost a lot of neighborhoods, from 306 at the beginning down to 88 now.
We plot the remaining neighborhoods on the map to see whether we still have good coverage of the city
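
To see which names fail to match, one option is an outer merge with pandas' `indicator=True`, which labels each row as present on the left, on the right, or on both sides. The sketch below uses a few names that appear in the tables above; it is a diagnostic suggestion, not part of the original analysis.

```python
import pandas as pd

# a few names taken from the tables above
nta_names = pd.DataFrame({"Neighborhood": [
    "Georgetown-Marine Park-Bergen Beach-Mill Basin",
    "East Williamsburg",
    "College Point",
]})
json_names = pd.DataFrame({"Neighborhood": [
    "Wakefield",
    "East Williamsburg",
    "College Point",
]})

# an outer merge keeps unmatched rows; the '_merge' column says where each row came from
check = pd.merge(nta_names, json_names, on="Neighborhood", how="outer", indicator=True)

only_nta = check.loc[check["_merge"] == "left_only", "Neighborhood"].tolist()
only_json = check.loc[check["_merge"] == "right_only", "Neighborhood"].tolist()
matched = check.loc[check["_merge"] == "both", "Neighborhood"].tolist()

print("only in NTA table :", only_nta)
print("only in json file :", only_json)
print("matched           :", matched)
```

In this small sample the unmatched NTA name combines several neighborhoods with hyphens, which is a plausible reason why an exact match on the name drops many rows.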

In [17]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [18]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nyc_population_data['Latitude'], nyc_population_data['Longitude'], nyc_population_data['Borough'], nyc_population_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

We will consider that we have good coverage of the city.<br>
To improve the study, we would need to find more complete input data

***Get the venues***<br>
We will use the Foursquare API to get the venues within a radius of 500 m (the default `radius` of the function below) around the location of each neighborhood

In [19]:
# Define Foursquare credentials (replace the placeholders with your own; never publish real credentials)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
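
The function above assumes that every response contains at least one group and that every venue has at least one category; otherwise the indexing with `[0]` raises an error. The sketch below shows one defensive way to flatten a response. The helper name `extract_venue_rows` and the sample payload are hypothetical: the payload shape is inferred only from the keys that `getNearbyVenues` reads, not from the Foursquare documentation.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # keep the venue even when it has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# hypothetical payload, shaped like the fields accessed by getNearbyVenues
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Newtown",
               "location": {"lat": 40.709153, "lng": -73.937147},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 40.7, "lng": -73.9},
               "categories": []}},
]}]}}

rows = extract_venue_rows("East Williamsburg", 40.708492, -73.938858, sample)
empty = extract_venue_rows("Nowhere", 0.0, 0.0, {"response": {}})
print(len(rows), len(empty))
```

A response without groups then yields an empty list instead of stopping the whole loop over neighborhoods.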

In [21]:
# Get venues around each neighborhood using the above function
NewYork_venues = getNearbyVenues(names=nyc_population_data['Neighborhood'],
                                   latitudes=nyc_population_data['Latitude'],
                                   longitudes=nyc_population_data['Longitude']
                                  )

In [22]:
# Check the result
print(NewYork_venues.shape)
NewYork_venues.head()

(3477, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,East Williamsburg,40.708492,-73.938858,Dun-Well Doughnuts,40.707429,-73.94026,Donut Shop
1,East Williamsburg,40.708492,-73.938858,Newtown,40.709153,-73.937147,Café
2,East Williamsburg,40.708492,-73.938858,Champs Diner,40.708335,-73.940816,Vegetarian / Vegan Restaurant
3,East Williamsburg,40.708492,-73.938858,The Anchored Inn,40.709243,-73.937182,Dive Bar
4,East Williamsburg,40.708492,-73.938858,The Topaz,40.707327,-73.939754,Cocktail Bar


In [23]:
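# one-hot encode the venue categories
NewYork_onehot = pd.get_dummies(NewYork_venues[['Venue Category']], prefix="", prefix_sep="")
# add the neighborhood name; 'Neighborhood' is also a venue category, so this overwrites that dummy column in place
NewYork_onehot['Neighborhood'] = NewYork_venues['Neighborhood'] 
# move the last column to the front (because of the overwrite above, that column is 'Yoga Studio', as the output shows)
fixed_columns = [NewYork_onehot.columns[-1]] + list(NewYork_onehot.columns[:-1])
NewYork_onehot = NewYork_onehot[fixed_columns]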


print(NewYork_onehot.shape)
NewYork_onehot.head()

(3477, 321)


Unnamed: 0,Yoga Studio,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Child Care Service,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Bookstore,College Cafeteria,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian 
Restaurant,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Post Office,Pub,Puerto Rican Restaurant,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,South Indian Restaurant,Southern / Soul Food 
Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toy / Game Store,Track,Trail,Train,Train Station,Turkish Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,East Williamsburg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,East Williamsburg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,East Williamsburg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,East Williamsburg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,East Williamsburg,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
# group rows by neighborhood and sum the occurrences of each venue category
NewYork_grouped = NewYork_onehot.groupby('Neighborhood').sum().reset_index()
NewYork_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Child Care Service,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Bookstore,College Cafeteria,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware 
Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Post Office,Pub,Puerto Rican Restaurant,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,South Indian Restaurant,Southern / Soul 
Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toy / Game Store,Track,Trail,Train,Train Station,Turkish Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Arden Heights,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Astoria,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,3,0,6,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,1,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,2,2,2,0,0,0,0,1,0,0,0,0,0,0,6,0,0,0,0,0,2,3,0,0,0,0,0,0,0,2,0,2,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,3,0,0,1,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0
2,Auburndale,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bath Beach,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,2,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0
4,Bay Ridge,1,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,1,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,0,1,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,2,1,0,0,0,0,0,0,0,6,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,1,0,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,4,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,5,0,0,0,1,0,0,0,1,0,0,2,0,0,1,0,0,0,0,1,0,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0


***Clustering neighborhoods with venues***<br>
First, we scale the number of venues in each neighborhood by the number of inhabitants in that neighborhood.<br>
This gives us not the raw number of venues, but the number of venues per million inhabitants,<br>
which we consider more representative of the density of venues.<br><br>
Then we cluster the neighborhoods based on the number of venues (of each type) per million inhabitants, and plot a map of the neighborhoods colored by cluster.
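As a quick sanity check of the scaling, here is a minimal sketch in plain Python. The populations are the ones shown for East Williamsburg and College Point in the merged table; the raw bar counts (6 and 2) are an assumption, back-computed from the scaled values in that table, and `per_million` is a hypothetical helper name.

```python
# Populations as listed in the merged table of this notebook
populations = {"East Williamsburg": 33155, "College Point": 24199}
# Raw bar counts: assumed values, back-computed from the scaled table (illustrative only)
bar_counts = {"East Williamsburg": 6, "College Point": 2}


def per_million(count, population):
    """Scale a raw venue count to venues per million inhabitants."""
    return count / population * 1_000_000


rates = {name: per_million(bar_counts[name], populations[name]) for name in populations}
for name, rate in rates.items():
    print(f"{name}: {rate:.3f} bars per million inhabitants")
```

With these inputs the rates come out close to the 180.968 and 82.648 shown in the "Bar" column of the scaled table, which suggests that a single venue in a neighborhood of about 30,000 inhabitants already counts for roughly 30 venues per million.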


In [25]:
# Merging the venues table with population data
NewYork_merged_scaled = nyc_population_data.join(NewYork_grouped.set_index('Neighborhood'), on='Neighborhood')

# Scaling numbers of venues per million of inhabitants
NewYork_merged_scaled.iloc[:,7:] = NewYork_merged_scaled.iloc[:,7:].div(NewYork_merged_scaled['Total population'], axis=0)*1000000

# set number of clusters
kclusters = 5

NewYork_scaled_clustering = NewYork_merged_scaled.iloc[:,7:]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NewYork_scaled_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:10])

# add clustering labels
NewYork_merged_scaled.insert(0, 'Cluster Labels', kmeans.labels_)

NewYork_merged_scaled.head()

[1 1 3 3 3 3 1 3 3 3]


Unnamed: 0,Cluster Labels,NTA,Total population,Median income,Neighborhood,Borough,Latitude,Longitude,Yoga Studio,Accessories Store,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bath House,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Caucasian Restaurant,Check Cashing Service,Cheese Shop,Child Care Service,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Bookstore,College Cafeteria,College Theater,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Daycare,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Service,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Post Office,Pub,Puerto Rican Restaurant,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toy / Game Store,Track,Trail,Train,Train Station,Turkish Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waste Facility,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,1,BK90,33155,519058,East Williamsburg,Brooklyn,40.708492,-73.938858,0,0.0,0,0.0,0,0,0,0,0,0,30.1614,0.0,0,0,0,30.1614,30.1614,120.645,30.1614,180.968,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,30.1614,0.0,0.0,0,0,0,0,0,0.0,0,0,0.0,60.3227,0,0,0,0.0,0,0,0,0,0,0.0,0,0,0,0,120.645,60.3227,0,0,0,0,0,0,0,0,90.4841,0,0.0,0.0,0,0,0,0,0,0,180.968,0,0.0,0,0.0,0,30.1614,0,60.3227,0,0,0,0,0,0,0,0,0,0,30.1614,0.0,0,0,0,0,0,0,0.0,30.1614,0,0,0,0,0.0,0,0,0,0,0,0,0,30.1614,0,0,0.0,0,0.0,0,30.1614,0,60.3227,30.1614,0,0,0,0,0,0,0,0,0,0,0,30.1614,0,0,0.0,0,0,0,0,30.1614,0,0,0,0,0,0.0,0,0.0,0,0.0,0,0.0,30.1614,0,0,0,0,0,30.1614,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,90.4841,0,0,0,0,0,0,0,0,0,0,0,0,60.3227,30.1614,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0.0,0,30.1614,0,0,0,0,0.0,30.1614,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,30.1614,30.1614,0,0,0,0,30.1614,0,0,0,0,0,0,60.3227,0,0,0,0,30.1614,0,0.0,30.1614,0,0,0.0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,0.0,0,0,0,0,0,0.0,0,30.1614,0,0,0,0,30.1614,0,30.1614,30.1614,0,0,0,0,0,0,0.0,0,0,0,0,60.3227,0,0.0,0,0.0,0,0,0,0,0
1,1,QN23,24199,354073,College Point,Queens,40.784903,-73.843045,0,0.0,0,41.324,0,0,0,0,0,0,0.0,82.648,0,0,0,0.0,0.0,82.648,41.324,82.648,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,41.324,41.324,0,0,0,0,0,41.324,0,0,0.0,0.0,0,0,0,41.324,0,0,0,0,0,82.648,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0,41.324,0.0,0,0,0,0,0,0,206.62,0,41.324,0,41.324,0,0.0,0,41.324,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0.0,41.324,0,0,0,0,41.324,0,0,0,0,0,0,0,0.0,0,0,41.324,0,0.0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,41.324,0,0,0,0,0.0,0,0,0,0,0,41.324,0,0.0,0,0.0,0,0.0,41.324,0,0,0,0,0,82.648,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,82.648,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,41.324,0,0,0,0,0,0,41.324,0,0.0,0,0,0,0,41.324,0.0,0,0,0,0,82.648,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,0,0,0,0,0,82.648,0,0,0,0,82.648,0,0.0,0.0,0,0,0.0,0,0,0,0,0,0,0,41.324,0.0,0,0,0,0,82.648,0,41.324,0,0,0,0,0,41.324,0,0.0,0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0.0,0,0,0,0,0.0,0,41.324,0,0.0,0,0,0,0,0
2,3,SI54,43427,718593,Great Kills,Staten Island,40.54948,-74.149324,0,0.0,0,0.0,0,0,0,0,0,0,0.0,0.0,0,0,0,0.0,0.0,23.0271,0.0,69.0814,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0.0,0,0,23.0271,0.0,0,0,0,0.0,0,0,0,0,0,23.0271,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0,0.0,23.0271,0,0,0,0,0,0,0.0,0,23.0271,0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0.0,23.0271,0,0,0,0,0.0,0,0,0,0,0,0,0,0.0,0,0,0.0,0,0.0,0,23.0271,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0.0,0,0,0,0,0.0,0,0,0,0,0,69.0814,0,23.0271,0,0.0,0,0.0,0.0,0,0,0,0,0,0.0,0,0,0,23.0271,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0.0,0,0.0,0,0,0,0,23.0271,0.0,0,0,0,0,46.0543,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,0,0,0,0,0,23.0271,0,0,0,0,0.0,0,0.0,0.0,0,0,23.0271,0,0,0,0,0,0,0,0.0,23.0271,0,0,0,0,0.0,0,0.0,0,0,0,0,0,0.0,0,0.0,0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,23.0271,0,0,0,0,0.0,0,0.0,0,0.0,0,0,0,0,0
3,3,QN05,28201,506307,Rosedale,Queens,40.659816,-73.735261,0,35.4597,0,0.0,0,0,0,0,0,0,0.0,0.0,0,0,0,0.0,0.0,0.0,0.0,0.0,35.4597,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,35.4597,0,0,0.0,0.0,0,0,0,70.9195,0,0,0,0,0,35.4597,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0,0.0,35.4597,0,0,0,0,0,0,35.4597,0,0.0,0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,35.4597,0,0,0,0,0,0,0,0.0,0,0,0.0,0,0.0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0.0,0,0,0,0,0.0,0,0,0,0,0,0.0,0,0.0,0,35.4597,0,0.0,0.0,0,0,0,0,0,0.0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0.0,0,0.0,0,0,0,0,35.4597,0.0,0,0,0,0,0.0,35.4597,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,0,0,0,0,0,35.4597,0,0,0,0,0.0,0,0.0,0.0,0,0,35.4597,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,35.4597,0,0,0,0,0,0.0,0,0.0,0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0.0,0,0,0,0,0.0,0,0.0,0,0.0,0,0,0,0,0
4,3,BX27,27562,149520,Hunts Point,Bronx,40.80973,-73.883315,0,0.0,0,0.0,0,0,0,0,0,0,0.0,0.0,0,0,0,36.2818,0.0,0.0,36.2818,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0,0,0,0,0,0.0,0,0,0.0,36.2818,0,0,0,0.0,0,0,0,0,0,0.0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0,0,0,0,0,0,0,0,0,0.0,36.2818,0,0,0,0,0,0,36.2818,0.0,0,0,0,0,0.0,0,0,0,0,0,0,0,0.0,0,0,0.0,0,36.2818,0,36.2818,0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0.0,0,0,0.0,0,0,0,0,0.0,0,0,0,0,0,0.0,0,0.0,0,0.0,0,36.2818,0.0,0,0,0,0,0,0.0,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,36.2818,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0,0,0,0,0.0,0,0.0,0,0,0,0,0.0,0.0,0,0,0,0,36.2818,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0,0,0,0,0.0,0,0,0,0,0,0,0.0,0,0,0,0,0.0,0,36.2818,0.0,0,0,0.0,0,0,0,0,0,0,0,0.0,36.2818,0,0,0,0,0.0,0,0.0,0,0,0,0,0,0.0,0,0.0,0,0,0,0,0.0,0,0.0,0.0,0,0,0,0,0,0,0.0,0,0,0,0,0.0,0,0.0,0,36.2818,0,0,0,0,0


In [26]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(NewYork_merged_scaled['Latitude'], NewYork_merged_scaled['Longitude'], NewYork_merged_scaled['Neighborhood'], NewYork_merged_scaled['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [27]:
# Create a dataframe with only the neighborhoods in Flatbush's cluster
# Get Flatbush's cluster label (looked up by column name rather than by position)
Flatbush_label = NewYork_merged_scaled.loc[NewYork_merged_scaled['Neighborhood']=='Flatbush', 'Cluster Labels'].iloc[0]
# Keep only the rows with this cluster label; .copy() makes Cluster an independent dataframe
Cluster = NewYork_merged_scaled[NewYork_merged_scaled['Cluster Labels']==Flatbush_label].copy()
# Print the neighborhoods similar to Flatbush: together with the map above, this answers the first question
print(Cluster['Neighborhood'])

2             Great Kills
3                Rosedale
4             Hunts Point
5              Whitestone
7           Starrett City
8           East New York
9                Canarsie
10            Westerleigh
11               Steinway
12              Homecrest
14             St. Albans
15              Bronxdale
16                 Corona
17           East Tremont
19               Longwood
20           Borough Park
21          East Elmhurst
22             Auburndale
25            Parkchester
26        Cambria Heights
28                 Hollis
29        Jackson Heights
30    Morningside Heights
31                Madison
35          South Jamaica
38             Co-op City
39         Brighton Beach
41          Port Richmond
42         Pelham Parkway
43             Mount Hope
44        Upper West Side
46          Richmond Hill
47         Lincoln Square
49          Arden Heights
51              Woodhaven
52              Laurelton
53         Queens Village
54              Bellerose
56       Sou

***Going further***<br>
To go further, we now reduce the list by keeping only the neighborhoods where the median income is at least as high as in Flatbush, and where there are fewer coffee shops per million inhabitants than in Flatbush.<br>

In [28]:
# Treat the 'NA' placeholders as missing values and drop the incomplete rows.
# Building a new dataframe instead of modifying Cluster in place avoids the SettingWithCopyWarning.
Cluster_reworked = Cluster.replace('NA', np.nan).dropna().copy()


In [29]:
# Convert median income to integer (astype with a dict returns a new dataframe, so nothing is set on a slice)
Cluster_reworked = Cluster_reworked.astype({'Median income': int})

# Get neighborhoods from the cluster whose median income is greater than or equal to Flatbush's
Cluster_high_income = Cluster_reworked[Cluster_reworked['Median income'].ge(1223438)]
print(Cluster_high_income['Neighborhood'])

44     Upper West Side
46       Richmond Hill
53      Queens Village
56    South Ozone Park
63           Flatlands
72           Bay Ridge
73            Flatbush
84        Forest Hills
87             Astoria
Name: Neighborhood, dtype: object


In [30]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Cluster_high_income['Latitude'], Cluster_high_income['Longitude'], Cluster_high_income['Borough'], Cluster_high_income['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

In [31]:
# Sort the high-income neighborhoods by number of coffee shops per million inhabitants

Cluster_high_income_sorted = Cluster_high_income.sort_values(by=['Coffee Shop'])
Cluster_high_income_sorted[['Neighborhood','Coffee Shop']]

Unnamed: 0,Neighborhood,Coffee Shop
46,Richmond Hill,0.0
53,Queens Village,0.0
56,South Ozone Park,0.0
63,Flatlands,0.0
72,Bay Ridge,0.0
84,Forest Hills,0.0
87,Astoria,13.306897
73,Flatbush,19.030582
44,Upper West Side,22.104658


In [32]:
# Keep only the high-income neighborhoods that have fewer coffee shops per million inhabitants than Flatbush (19.03)
Cluster_second_question = Cluster_high_income[Cluster_high_income['Coffee Shop'].lt(19)]
# Print the neighborhoods that meet both the income and the coffee shop density criteria
print(Cluster_second_question['Neighborhood'])

46       Richmond Hill
53      Queens Village
56    South Ozone Park
63           Flatlands
72           Bay Ridge
84        Forest Hills
87             Astoria
Name: Neighborhood, dtype: object
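The two cut-offs used above (1223438 for median income and 19 for coffee shops per million) are hard-coded. A minimal sketch of reading both thresholds from the Flatbush row instead is given below. It assumes the hard-coded values are meant to stand for Flatbush's own figures; the `toy` dataframe and its numbers are invented for illustration and are not taken from the study data.

```python
import pandas as pd

# Toy stand-in for Cluster_reworked; every value here is invented for illustration.
toy = pd.DataFrame({
    "Neighborhood": ["Flatbush", "A", "B", "C"],
    "Median income": [100, 120, 90, 100],
    "Coffee Shop": [19.0, 0.0, 5.0, 25.0],
})

# Read both thresholds from the reference neighborhood's own row
flatbush = toy.loc[toy["Neighborhood"] == "Flatbush"].iloc[0]
income_threshold = flatbush["Median income"]
coffee_threshold = flatbush["Coffee Shop"]

# Same criteria as in the notebook: income >= Flatbush, coffee shop density < Flatbush
candidates = toy[
    toy["Median income"].ge(income_threshold)
    & toy["Coffee Shop"].lt(coffee_threshold)
]
print(candidates["Neighborhood"].tolist())
```

Because the coffee shop comparison is strict, the reference neighborhood itself drops out of the result, just as Flatbush does in the list above.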


# Conclusion

### First Question
Which neighborhoods are similar to Flatbush in terms of surrounding venues?

In [33]:
print("The neighborhoods similar to Flatbush are listed below:\n")
print(Cluster['Neighborhood'])
print("\nThe map of the neighborhoods, colored by cluster, is shown below:\n")
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(NewYork_merged_scaled['Latitude'], NewYork_merged_scaled['Longitude'], NewYork_merged_scaled['Neighborhood'], NewYork_merged_scaled['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The neighborhoods similar to Flatbush are listed below:

2             Great Kills
3                Rosedale
4             Hunts Point
5              Whitestone
7           Starrett City
8           East New York
9                Canarsie
10            Westerleigh
11               Steinway
12              Homecrest
14             St. Albans
15              Bronxdale
16                 Corona
17           East Tremont
19               Longwood
20           Borough Park
21          East Elmhurst
22             Auburndale
25            Parkchester
26        Cambria Heights
28                 Hollis
29        Jackson Heights
30    Morningside Heights
31                Madison
35          South Jamaica
38             Co-op City
39         Brighton Beach
41          Port Richmond
42         Pelham Parkway
43             Mount Hope
44        Upper West Side
46          Richmond Hill
47         Lincoln Square
49          Arden Heights
51              Woodhaven
52              Laurelton
53     

### Second Question
Which neighborhoods are similar to Flatbush in terms of surrounding venues, with a median income at least as high and fewer coffee shops per million inhabitants?

In [35]:
print("The neighborhoods similar to Flatbush that meet the income and coffee shop density criteria are listed below:\n")
print(Cluster_second_question['Neighborhood'])
print("\nThe map of the neighborhoods meeting all criteria is shown below:\n")

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(Cluster_second_question['Latitude'], Cluster_second_question['Longitude'], Cluster_second_question['Borough'], Cluster_second_question['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)

map_newyork

The neighborhoods similar to Flatbush that meet the income and coffee shop density criteria are listed below:

46       Richmond Hill
53      Queens Village
56    South Ozone Park
63           Flatlands
72           Bay Ridge
84        Forest Hills
87             Astoria
Name: Neighborhood, dtype: object

The map of the neighborhoods meeting all criteria is shown below:



### Remarks
Based on the study, we have a list of neighborhoods that are similar to Flatbush, where the first coffee shop is located.<br>
The coffee shop owner can use the map of proposed neighborhoods to make his choice, for instance based on geographic proximity to, or distance from, his first coffee shop.<br><br>
The study could be improved with more comprehensive input data covering all of the city's neighborhoods.<br>
Also, note that in the proposed list, only Astoria already has coffee shops. The reason why the other proposed neighborhoods show no coffee shop at all remains to be understood. It could mean that a new coffee shop would meet strong demand in these areas or, on the contrary, it could be a warning that the population of those neighborhoods is not interested in such a venue.
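
To make the proximity criterion mentioned above concrete, the distance between two neighborhoods can be estimated from their coordinates. The sketch below uses the haversine formula in plain Python; `haversine_km` is a hypothetical helper, and the two coordinate pairs are those of East Williamsburg and College Point from the merged table, used only as an illustration since Flatbush's coordinates are not shown in this section.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Coordinates as listed in the merged table (illustrative pair of neighborhoods)
east_williamsburg = (40.708492, -73.938858)
college_point = (40.784903, -73.843045)

distance_km = haversine_km(*east_williamsburg, *college_point)
print(f"{distance_km:.1f} km")
```

Applied to the Flatbush row and each candidate in `Cluster_second_question`, the same function would let the owner rank the proposed neighborhoods by distance from his first shop.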