# The Battle of Neighborhoods

### 1 - Description

In this project we will compare neighborhoods of different cities based on the venues each neighborhood contains. The questions we will answer are: Are neighborhoods within one city more similar to each other than to neighborhoods in other cities? Do city centers cluster together?

In the second part, we will try to advise an entrepreneur on where to open a restaurant. We will look at the density of restaurants in each neighborhood, assuming that a restaurant will be more profitable in an area with a low density of restaurants.

### 2 - Data requirements

The data we will require is:
- A list of the neighborhoods in each city with their coordinates (latitude and longitude)
- Venue data for each location, retrieved using the Foursquare API

The required libraries:
- Pandas
- NumPy
- scikit-learn, more specifically its KMeans class
- Geocoders (geopy and pgeocode) to obtain the location of each neighborhood
- BeautifulSoup to scrape data from Wikipedia for Toronto and Paris
- Folium to display the neighborhoods on maps

### 3 - Importing the libraries

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
!pip install pgeocode
import pgeocode
!pip install folium
import folium

print('Libraries imported.')

Libraries imported.


### 4 - Toronto

#### 4.1 - Web-scraping data about Toronto

In [2]:
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html, 'html5lib')
table = soup.find('table')

In [3]:
data = [t.text.strip() for t in table.find_all('td')]

In [4]:
toronto = pd.DataFrame(columns = ['postalcode', 'borough', 'neighborhood'])

In [5]:
for i in data:
    if (i.find('Not assigned') == -1):
        p_code = i[0:3]
        if (i.find('(') == -1):
            borough = i[3:]
            neighborhood = borough
        else:
            borough = i[3:i.find('(')]
            neighborhood = i [i.find('(')+1:i.find(')')]
        toronto.loc[len(toronto.index)] = [p_code, borough, neighborhood]
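The slicing above assumes that every scraped cell starts with a three-character postal code, followed by the borough and, optionally, the neighborhoods in parentheses (for example `M3ANorth York(Parkwoods)`). A minimal sketch that wraps the same logic in a function, here hypothetically named `parse_postal_cell`, so the assumed format can be checked on made-up sample cells:

```python
def parse_postal_cell(cell):
    """Split a scraped cell such as 'M3ANorth York(Parkwoods)'.

    Returns (postal_code, borough, neighborhood), or None for unassigned codes,
    mirroring the filter used in the notebook.
    """
    if 'Not assigned' in cell:
        return None
    postal_code = cell[:3]          # e.g. 'M3A'
    rest = cell[3:]
    if '(' not in rest:
        # no neighborhood given: reuse the borough name, as the notebook does
        return postal_code, rest, rest
    open_paren = rest.find('(')
    borough = rest[:open_paren]
    neighborhood = rest[open_paren + 1:rest.find(')')]
    return postal_code, borough, neighborhood


# made-up sample cells in the assumed format
for sample in ['M3ANorth York(Parkwoods)', 'M1ANot assigned']:
    print(parse_postal_cell(sample))
```

A function like this can be tested on a handful of strings before running it over the whole table.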

In [6]:
#cleaning the borough values
for i, b in enumerate(toronto['borough']): 
    if (b.find('East York') != -1):
        toronto.at[i, 'borough'] = 'East York'
    elif (b.find('Mississauga') != -1): 
        toronto.at[i, 'borough'] = 'Mississauga'
    elif (b.find('East Toronto') != -1): 
        toronto.at[i, 'borough'] = 'East Toronto'
    elif (b.find('Downtown Toronto') != -1): 
        toronto.at[i, 'borough'] = 'Downtown Toronto'
    elif (b.find('Etobicoke') != -1): 
        toronto.at[i, 'borough'] = 'Etobicoke'
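The if/elif chain above maps any borough string containing one of five known names to that name. A sketch of the same rule driven by a list, applied with `Series.map`; the raw sample values are made up, and the list order matches the chain so the first match still wins:

```python
import pandas as pd

# canonical names, in the same order as the if/elif chain above
CANONICAL_BOROUGHS = ['East York', 'Mississauga', 'East Toronto',
                      'Downtown Toronto', 'Etobicoke']


def clean_borough(name):
    """Return the first canonical borough contained in name, else name unchanged."""
    for canonical in CANONICAL_BOROUGHS:
        if canonical in name:
            return canonical
    return name


# made-up raw values with extra text glued to the borough name
df = pd.DataFrame({'borough': ['East YorkLeaside', 'North York', 'EtobicokeNorthwest']})
df['borough'] = df['borough'].map(clean_borough)
print(df['borough'].tolist())
```

Adding another borough then means adding one string to the list rather than another `elif` branch.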

In [7]:
print(toronto.shape)
toronto.borough.unique()

(103, 3)


array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [8]:
toronto.tail()

|   | postalcode | borough | neighborhood |
|---|---|---|---|
| 98 | M8X | Etobicoke | The Kingsway / Montgomery Road / Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Enclave of M4L |
| 101 | M8Y | Etobicoke | Old Mill South / King's Mill Park / Sunnylea /... |
| 102 | M8Z | Etobicoke | Mimico NW / The Queensway West / South of Bloo... |


#### 4.2 - Getting the location of each neighborhood

In [9]:
# Geocoding the postal codes with geopy (the method used in the New York assignment)
# did not work here, so this uses pgeocode instead (code found on the Coursera forum)
geolocator = pgeocode.Nominatim('ca')
pcodes = toronto['postalcode'].tolist()
latitudes = []
longitudes = []
for pcode in pcodes:
    g = geolocator.query_postal_code(pcode)
    # unknown codes come back with NaN coordinates; append them anyway so the lists
    # stay aligned with the DataFrame rows (NaN rows are dropped further down)
    latitudes.append(g.latitude)
    longitudes.append(g.longitude)

In [10]:
toronto['latitude'] = latitudes
toronto['longitude'] = longitudes
toronto.head()

|   | postalcode | borough | neighborhood | latitude | longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.7545 | -79.33 |
| 1 | M4A | North York | Victoria Village | 43.7276 | -79.3148 |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront | 43.6555 | -79.3626 |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights | 43.7223 | -79.4504 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.6641 | -79.3889 |


In [11]:
toronto = toronto.dropna(subset=['latitude', 'longitude']).reset_index(drop=True)
print(toronto.shape)
toronto.tail()

(102, 5)


|   | postalcode | borough | neighborhood | latitude | longitude |
|---|---|---|---|---|---|
| 97 | M8X | Etobicoke | The Kingsway / Montgomery Road / Old Mill North | 43.6518 | -79.5076 |
| 98 | M4Y | Downtown Toronto | Church and Wellesley | 43.6656 | -79.383 |
| 99 | M7Y | East Toronto | Enclave of M4L | 43.7804 | -79.2505 |
| 100 | M8Y | Etobicoke | Old Mill South / King's Mill Park / Sunnylea /... | 43.6325 | -79.4939 |
| 101 | M8Z | Etobicoke | Mimico NW / The Queensway West / South of Bloo... | 43.6256 | -79.5231 |


Display the map of Toronto and its neighborhoods:

In [12]:
#Coordinates of the city of Toronto
toronto_lat = 43.6529
toronto_long = -79.3849
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=12)
#map_toronto

In [13]:
# add markers to map
for lat, lng, borough, neighborhood in zip(toronto['latitude'], toronto['longitude'], toronto['borough'], toronto['neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
#map_toronto

In [14]:
toronto = toronto.drop(['postalcode'], axis=1)
toronto.head()

|   | borough | neighborhood | latitude | longitude |
|---|---|---|---|---|
| 0 | North York | Parkwoods | 43.7545 | -79.33 |
| 1 | North York | Victoria Village | 43.7276 | -79.3148 |
| 2 | Downtown Toronto | Regent Park / Harbourfront | 43.6555 | -79.3626 |
| 3 | North York | Lawrence Manor / Lawrence Heights | 43.7223 | -79.4504 |
| 4 | Queen's Park | Ontario Provincial Government | 43.6641 | -79.3889 |


In [15]:
toronto['city'] = 'toronto'
toronto.head()

|   | borough | neighborhood | latitude | longitude | city |
|---|---|---|---|---|---|
| 0 | North York | Parkwoods | 43.7545 | -79.33 | toronto |
| 1 | North York | Victoria Village | 43.7276 | -79.3148 | toronto |
| 2 | Downtown Toronto | Regent Park / Harbourfront | 43.6555 | -79.3626 | toronto |
| 3 | North York | Lawrence Manor / Lawrence Heights | 43.7223 | -79.4504 | toronto |
| 4 | Queen's Park | Ontario Provincial Government | 43.6641 | -79.3889 | toronto |


### 5 - New York

In [16]:
with open('newyork_data.json') as json_data:
    nyc_data = json.load(json_data)

In [17]:
neigh_data = nyc_data['features']

# define the dataframe columns
column_names = ['borough', 'neighborhood', 'latitude', 'longitude'] 

# instantiate the dataframe
nyc = pd.DataFrame(columns=column_names)

In [18]:
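# collect one record per GeoJSON feature, then build the DataFrame in one step
# (DataFrame.append was removed in pandas 2.0)
records = []
for feature in neigh_data:
    borough = feature['properties']['borough']
    neighborhood_name = feature['properties']['name']

    # GeoJSON coordinates are stored as [longitude, latitude]
    neighborhood_latlon = feature['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    records.append({'borough': borough,
                    'neighborhood': neighborhood_name,
                    'latitude': neighborhood_lat,
                    'longitude': neighborhood_lon})

nyc = pd.DataFrame(records, columns=column_names)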

In [19]:
nyc.shape

(306, 4)
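The loop over `neigh_data` walks the GeoJSON features one by one. pandas can also flatten nested records directly with `pd.json_normalize`. A sketch on two hand-written features that mimic only the keys read above (the values are copied from the output below; the real file has more fields). GeoJSON positions are ordered `[longitude, latitude]`, which is why the notebook takes index 1 for the latitude:

```python
import pandas as pd

# two features with the nesting the notebook reads
# (properties.borough, properties.name, geometry.coordinates)
sample_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'coordinates': [-73.847201, 40.894705]}},
    {'properties': {'borough': 'Bronx', 'name': 'Co-op City'},
     'geometry': {'coordinates': [-73.829939, 40.874294]}},
]

flat = pd.json_normalize(sample_features)   # nested keys become 'properties.name', ...
coords = flat['geometry.coordinates']       # each value is a [longitude, latitude] list

nyc_sample = pd.DataFrame({
    'borough': flat['properties.borough'],
    'neighborhood': flat['properties.name'],
    'latitude': coords.apply(lambda c: c[1]),
    'longitude': coords.apply(lambda c: c[0]),
})
print(nyc_sample)
```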

In [20]:
nyc['city'] = 'nyc'
nyc.head()

|   | borough | neighborhood | latitude | longitude | city |
|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | nyc |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | nyc |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | nyc |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | nyc |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | nyc |


In [21]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
nyc_loc = geolocator.geocode(address)
nyc_lat = nyc_loc.latitude
nyc_long = nyc_loc.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(nyc_lat, nyc_long))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [22]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[nyc_lat, nyc_long], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nyc['latitude'], nyc['longitude'], nyc['borough'], nyc['neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### 6 - Paris

#### 6.1 - Web-scraping data about Paris

In [23]:
html = requests.get('https://en.wikipedia.org/wiki/Quarters_of_Paris').text
soup = BeautifulSoup(html, 'html5lib')
table = soup.find('table')

In [24]:
data = [t.text.strip() for t in table.find_all('td')]

In [25]:
paris = pd.DataFrame(columns = ['borough', 'neighborhood'])

In [26]:
d = []
for i in data: 
    if (i != ''):
        d.append(i)

In [27]:
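# Each of the 20 arrondissements seems to span 17 non-empty cells (its name followed
# by the cells of its 4 quarters), so its name is repeated once per quarter
paris_borough = []
for i in range(20):
    for j in range(4):
        paris_borough.append(d[i*17])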
    
paris_neigh = []
for i in range(len(d)):
    count = 0
    for j in list(d[i]):
        if j.isdigit():
            count = count + 1
    if not count: 
        paris_neigh.append(d[i])
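The two lists above rely on the table layout: the arrondissement name sits at every 17th non-empty cell, and quarter names are the only cells without digits. The 17 = 1 + 4 × 4 split (four cells per quarter) is inferred from `d[i*17]` and `range(4)`, not checked against the page. A sketch, with hypothetical names `CELLS_PER_ARRONDISSEMENT` and `split_quarters`, that pairs each quarter with its arrondissement inside the same block so the two columns cannot drift out of alignment; the sample cells are made up:

```python
# Layout inferred from the notebook (d[i*17], range(4)), not checked against the page:
# 1 arrondissement cell followed by 4 quarters of 4 cells each.
CELLS_PER_ARRONDISSEMENT = 17


def has_digit(text):
    return any(ch.isdigit() for ch in text)


def split_quarters(cells):
    """Return (arrondissement, quarter) pairs from the flat list of non-empty cells."""
    pairs = []
    for start in range(0, len(cells), CELLS_PER_ARRONDISSEMENT):
        block = cells[start:start + CELLS_PER_ARRONDISSEMENT]
        arrondissement = block[0]
        # within a block, quarter names are the cells without any digit
        quarters = [c for c in block[1:] if not has_digit(c)]
        pairs.extend((arrondissement, q) for q in quarters)
    return pairs


# made-up block: each quarter gets a number, a name and two numeric cells
sample = ['1st arrondissement']
names = ["Saint-Germain-l'Auxerrois", 'Les Halles', 'Palais-Royal', 'Place-Vendôme']
for number, name in enumerate(names, start=1):
    sample += [str(number), name, '0.5', '1000']

for pair in split_quarters(sample):
    print(pair)
```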

In [28]:
paris = pd.DataFrame(columns = ['borough', 'neighborhood'])
paris['borough'] = paris_borough
paris['neighborhood'] = paris_neigh
paris.head()

|   | borough | neighborhood |
|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | Saint-Germain-l'Auxerrois |
| 1 | 1st arrondissement(Called "du Louvre") | Les Halles |
| 2 | 1st arrondissement(Called "du Louvre") | Palais-Royal |
| 3 | 1st arrondissement(Called "du Louvre") | Place-Vendôme |
| 4 | 2nd arrondissement(Called "de la Bourse") | Gaillon |


In [29]:
type(paris.at[0, 'neighborhood'])

str

In [30]:
import time

paris_lat = []
paris_long = []

# create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="ny_explorer")

for i in paris['neighborhood']:
    address = '{}, Paris, France'.format(i)
    location = geolocator.geocode(address)
    # geocode() returns None when no match is found
    paris_lat.append(location.latitude if location else np.nan)
    paris_long.append(location.longitude if location else np.nan)
    # Nominatim's usage policy allows at most one request per second
    time.sleep(1)

In [31]:
paris['latitude'] = paris_lat
paris['longitude'] = paris_long

In [32]:
paris['city'] = 'paris'

In [33]:
paris.head()

|   | borough | neighborhood | latitude | longitude | city |
|---|---|---|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | paris |
| 1 | 1st arrondissement(Called "du Louvre") | Les Halles | 48.862707 | 2.346183 | paris |
| 2 | 1st arrondissement(Called "du Louvre") | Palais-Royal | 48.863585 | 2.336204 | paris |
| 3 | 1st arrondissement(Called "du Louvre") | Place-Vendôme | 48.867463 | 2.329428 | paris |
| 4 | 2nd arrondissement(Called "de la Bourse") | Gaillon | 48.869135 | 2.332909 | paris |


### 7 - Getting the venues for each neighborhood and creating a DataFrame that contains all the data

In [34]:
all_neigh = pd.concat([paris, nyc, toronto]).reset_index(drop = True)
all_neigh

|   | borough | neighborhood | latitude | longitude | city |
|---|---|---|---|---|---|
| 0 | 1st arrondissement(Called "du Louvre") | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | paris |
| 1 | 1st arrondissement(Called "du Louvre") | Les Halles | 48.862707 | 2.346183 | paris |
| 2 | 1st arrondissement(Called "du Louvre") | Palais-Royal | 48.863585 | 2.336204 | paris |
| 3 | 1st arrondissement(Called "du Louvre") | Place-Vendôme | 48.867463 | 2.329428 | paris |
| 4 | 2nd arrondissement(Called "de la Bourse") | Gaillon | 48.869135 | 2.332909 | paris |
| ... | ... | ... | ... | ... | ... |
| 483 | Etobicoke | The Kingsway / Montgomery Road / Old Mill North | 43.651800 | -79.507600 | toronto |
| 484 | Downtown Toronto | Church and Wellesley | 43.665600 | -79.383000 | toronto |
| 485 | East Toronto | Enclave of M4L | 43.780400 | -79.250500 | toronto |
| 486 | Etobicoke | Old Mill South / King's Mill Park / Sunnylea /... | 43.632500 | -79.493900 | toronto |


In [40]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('retrieval completed')
    
    return(nearby_venues)
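getNearbyVenues formats the query string by hand and reads the venues from `response.groups[0].items` in the returned JSON. A sketch that separates those two steps so they can be checked offline: `build_explore_url` and `flatten_items` are hypothetical helper names, the parameters are the ones the function above sends, and the fake payload only mimics the part of the response that the notebook itself reads (assumed, not checked against the current API):

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Same query parameters as getNearbyVenues, encoded by urlencode."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)


def flatten_items(name, lat, lng, payload):
    """Turn one parsed explore response into the row tuples getNearbyVenues builds."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        # fall back to None if a venue comes without a category
        categories = venue.get('categories') or [{'name': None}]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows


# hand-written stand-in for requests.get(url).json(), using two venues from the output below
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Musée du Louvre',
               'location': {'lat': 48.860847, 'lng': 2.33644},
               'categories': [{'name': 'Art Museum'}]}},
    {'venue': {'name': 'Pont des Arts',
               'location': {'lat': 48.858565, 'lng': 2.337635},
               'categories': [{'name': 'Bridge'}]}},
]}]}}

print(build_explore_url('ID', 'SECRET', '20180605', 48.860211, 2.336299))
for row in flatten_items("Saint-Germain-l'Auxerrois", 48.860211, 2.336299, fake_payload):
    print(row)
```

Inside the loop of getNearbyVenues, `flatten_items(name, lat, lng, requests.get(url).json())` could then replace the list comprehension.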

In [42]:
all_venues = getNearbyVenues(names=all_neigh['neighborhood'],
                             latitudes=all_neigh['latitude'],
                             longitudes=all_neigh['longitude'])

retrieval completed


In [43]:
print(all_venues.shape)
all_venues.head()

(17951, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | Cour Carrée du Louvre | 48.86036 | 2.338543 | Pedestrian Plaza |
| 1 | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | Musée du Louvre | 48.860847 | 2.33644 | Art Museum |
| 2 | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | La Vénus de Milo (Vénus de Milo) | 48.859943 | 2.337234 | Exhibit |
| 3 | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | Cour Napoléon | 48.861172 | 2.335088 | Plaza |
| 4 | Saint-Germain-l'Auxerrois | 48.860211 | 2.336299 | Pont des Arts | 48.858565 | 2.337635 | Bridge |


In [44]:
count = all_venues.groupby('Neighborhood').count()
count = count[['Venue Category']]
count.rename(columns = {'Venue Category' : 'Number of Venues'}, inplace=True)
print(count.shape)
count.head()

(474, 1)


| Neighborhood | Number of Venues |
|---|---|
| Agincourt | 4 |
| Alderwood / Long Branch | 6 |
| Allerton | 31 |
| Amérique | 13 |
| Annadale | 9 |


In [45]:
print('There are {} unique categories.'.format(len(all_venues['Venue Category'].unique())))

There are 501 unique categories.


In [46]:
# one hot encoding
all_onehot = pd.get_dummies(all_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
all_onehot['Neighborhood'] = all_venues['Neighborhood'] 

# move neighborhood column to the first column
first_column = all_onehot.pop('Neighborhood')
all_onehot.insert(0, 'Neighborhood', first_column)

print(all_onehot.shape)
all_onehot.head()
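The shape printed below is (17951, 501), although there are 501 unique categories plus the added `Neighborhood` column. A likely explanation (an inference from the shapes, not something the notebook checks) is that Foursquare returned a venue category literally named "Neighborhood", whose dummy column is overwritten by the neighborhood names. A sketch on made-up rows that reproduces the collision and one possible workaround, keeping the names in the index:

```python
import pandas as pd

# made-up venues; one of them has the category "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Neighborhood', 'Café', 'Park'],
})

# Same steps as above: the names overwrite the "Neighborhood" dummy column
collided = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
collided['Neighborhood'] = venues['Neighborhood']
print(collided.shape)   # (3, 3): three categories plus the names, yet only 3 columns

# Workaround: keep the names in the index so no category column can clash
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.index = venues['Neighborhood']
counts = onehot.groupby(level=0).sum()
print(counts)
```

If `reset_index()` is needed afterwards, rename the index first (for example with `counts.rename_axis('neighborhood_name')`, a hypothetical name), because a `Neighborhood` column already exists.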

(17951, 501)


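|   | Neighborhood | ATM | Accessories Store | Acupuncturist | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | Airport Terminal | Alsatian Restaurant | ... | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Saint-Germain-l'Auxerrois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Saint-Germain-l'Auxerrois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Saint-Germain-l'Auxerrois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Saint-Germain-l'Auxerrois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Saint-Germain-l'Auxerrois | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |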


In [47]:
#Average the venues for each neighborhood
all_grouped = all_onehot.groupby('Neighborhood').mean().reset_index()
print(all_grouped.shape)
all_grouped.head()

(474, 501)


|   | Neighborhood | ATM | Accessories Store | Acupuncturist | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | Airport Terminal | Alsatian Restaurant | ... | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Alderwood / Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Amérique | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Annadale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
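Averaging the one-hot rows turns each neighborhood into a vector of category frequencies: a neighborhood with 4 venues, one of them a café, gets 0.25 in the Café column. The same averaging is why neighborhoods with only one or two venues (a limitation noted in the conclusion of this section) end up with extreme vectors. A sketch on made-up rows, with a hypothetical minimum-venue filter one could apply before clustering:

```python
import pandas as pd

# made-up one-hot rows: neighborhood A has 4 venues, B has only 1
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Café':         [1, 0, 0, 0, 1],
    'Park':         [0, 1, 1, 1, 0],
})

grouped = onehot.groupby('Neighborhood').mean()   # share of each category
sizes = onehot.groupby('Neighborhood').size()     # number of venues

print(grouped)   # A: Café 0.25, Park 0.75; B: Café 1.0, Park 0.0

MIN_VENUES = 3   # hypothetical cut-off, not used in the notebook
reliable = grouped.loc[sizes >= MIN_VENUES]
print(reliable.index.tolist())   # ['A']
```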


### 8 - Clustering the Neighborhoods

In [59]:
# set number of clusters
kclusters = 5

all_grouped_clustering = all_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(all_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 3, 0, 3, 0, 3, 3, 3, 0])
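Here `kclusters = 5` is fixed by hand. One common way to sanity-check a choice of k is the elbow method: fit KMeans for a range of k and look for the point where the inertia (the within-cluster sum of squared distances, `inertia_` in scikit-learn) stops dropping sharply. A sketch on synthetic data with three obvious groups; on the notebook data you would pass `all_grouped_clustering` instead of `X`, and the elbow may be much less clear there:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# synthetic "venue frequency" vectors scattered around three well-separated centers
centers = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
X = np.vstack([c + rng.normal(scale=0.02, size=(20, 3)) for c in centers])

inertias = {}
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_   # within-cluster sum of squared distances

for k, inertia in inertias.items():
    print(k, round(inertia, 4))
# the drop flattens after k = 3, the number of centers used to generate X
```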

In [60]:
# add clustering labels to a copy of the counts, so re-running the cell does not fail
# (count and all_grouped are both grouped by Neighborhood, so their rows share the same order)
neigh_labeled = count.copy()
neigh_labeled.insert(0, 'Cluster Labels', kmeans.labels_)
neigh_labeled.reset_index(inplace = True)
print(neigh_labeled.shape)
neigh_labeled.head()

In [61]:
# Merge the labels with the locations dataframe

all_merged = all_neigh.set_index('neighborhood').join(neigh_labeled.set_index('Neighborhood'), how = 'inner')

all_merged.reset_index(inplace = True)


#Change cluster labels to integers
all_merged['Cluster Labels'] = all_merged['Cluster Labels'].astype(int)

all_merged.rename(columns = {'index' : 'neighborhood'}, inplace=True)

print(all_merged.shape)
all_merged.head() 

(484, 7)


|   | neighborhood | borough | latitude | longitude | city | Cluster Labels | Number of Venues |
|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Scarborough | 43.7946 | -79.2644 | toronto | 0 | 4 |
| 1 | Alderwood / Long Branch | Etobicoke | 43.6021 | -79.5402 | toronto | 3 | 6 |
| 2 | Allerton | Bronx | 40.865788 | -73.859319 | nyc | 3 | 31 |
| 3 | Amérique | 19th arrondissement(Called "des Buttes-Chaumont") | 48.882424 | 2.394025 | paris | 0 | 13 |
| 4 | Annadale | Staten Island | 40.538114 | -74.178549 | nyc | 3 | 9 |
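The merge above joins on the neighborhood name alone, and the earlier grouping also keys on the name. The merged frame has 484 rows for 474 grouped names, which suggests that some names occur more than once across or within the cities; same-named neighborhoods would then share one venue profile and one cluster label. A sketch, on made-up rows with a hypothetical name, of keying the grouping on the name plus the neighborhood coordinates that getNearbyVenues already returns:

```python
import pandas as pd

# made-up venues: two different places that happen to share a name
venues = pd.DataFrame({
    'Neighborhood': ['Sample Hill', 'Sample Hill', 'Sample Hill'],
    'Neighborhood Latitude': [1.0, 1.0, 2.0],
    'Neighborhood Longitude': [3.0, 3.0, 4.0],
    'Venue Category': ['Café', 'Bar', 'Park'],
})

by_name = venues.groupby('Neighborhood').size()
place_key = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude']
by_place = venues.groupby(place_key).size()

print(by_name)    # one group holding all 3 venues
print(by_place)   # two groups: 2 venues and 1 venue
```

The same key would have to be used in the later join with `all_neigh` (name plus latitude and longitude) for the labels to land on the right rows.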


In [51]:
def nans(df): return df[df.isnull().any(axis=1)]

In [52]:
nans(all_merged)

|   | neighborhood | borough | latitude | longitude | city | Cluster Labels | Number of Venues |
|---|---|---|---|---|---|---|---|


In [53]:
nyc_labeled = all_merged[all_merged['city'] == 'nyc'].reset_index(drop = True)
toronto_labeled = all_merged[all_merged['city'] == 'toronto'].reset_index(drop = True)
paris_labeled = all_merged[all_merged['city'] == 'paris'].reset_index(drop = True)

In [54]:
paris_labeled.head()

|   | neighborhood | borough | latitude | longitude | city | Cluster Labels | Number of Venues |
|---|---|---|---|---|---|---|---|
| 0 | Amérique | 19th arrondissement(Called "des Buttes-Chaumont") | 48.882424 | 2.394025 | paris | 0 | 13 |
| 1 | Archives | 3rd arrondissement(Called "du Temple") | 48.859571 | 2.362576 | paris | 0 | 100 |
| 2 | Arsenal | 4th arrondissement(Called "de l'Hôtel-de-Ville") | 48.851572 | 2.364795 | paris | 0 | 70 |
| 3 | Arts-et-Métiers | 3rd arrondissement(Called "du Temple") | 48.865441 | 2.356132 | paris | 0 | 100 |
| 4 | Auteuil | 16th arrondissement(Called "de Passy") | 48.847722 | 2.266738 | paris | 0 | 55 |


In [55]:
paris_lat = 48.856613
paris_long = 2.352222
toronto_lat = 43.741667
toronto_long = -79.373333
nyc_lat = 40.712778
nyc_long = -74.006111

In [56]:
# create map
map_clusters = folium.Map(location=[toronto_lat, toronto_long], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# note: markers use rainbow[cluster-1], so cluster 0 wraps around to the last color (red)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_labeled['latitude'], toronto_labeled['longitude'], toronto_labeled['neighborhood'], toronto_labeled['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
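The marker color is taken from `rainbow[cluster-1]`. Since Python accepts negative indices, cluster 0 picks the last color of the palette instead of the first, which is consistent with the Paris neighborhoods (cluster 0 in the table above) being described as the red cluster in the conclusion. A sketch with approximate color names standing in for the hex values the notebook computes from `cm.rainbow`:

```python
kclusters = 5
# approximate names for the five colors sampled from cm.rainbow (low to high)
palette = ['violet', 'blue', 'green', 'orange', 'red']

# the notebook picks rainbow[cluster-1]; a negative index counts from the end
for cluster in range(kclusters):
    print(cluster, palette[cluster - 1])
```

Using `rainbow[cluster]` instead would follow label order, but it would also change which color each cluster shows on the maps.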

In [57]:
# create map
map_clusters = folium.Map(location=[paris_lat, paris_long], zoom_start=12)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_labeled['latitude'], paris_labeled['longitude'], paris_labeled['neighborhood'], paris_labeled['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [58]:
# create map
map_clusters = folium.Map(location=[nyc_lat, nyc_long], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(nyc_labeled['latitude'], nyc_labeled['longitude'], nyc_labeled['neighborhood'], nyc_labeled['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Conclusion:
After clustering all the neighborhoods of the three cities (Paris, Toronto, and New York City), we observe that the neighborhoods of Toronto and Paris have a very similar identity (red cluster), with some Toronto neighborhoods in the violet cluster. New York neighborhoods, on the other hand, are split between the red and green clusters, which gives the city a more distinct identity.
Limitations: the number of venues varies a lot between neighborhoods, from 1 or 2 in some to 100 in others (the maximum returned per request with LIMIT = 100).

### 9 - Restaurant density study

In [62]:
all_grouped.head()

|   | Neighborhood | ATM | Accessories Store | Acupuncturist | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | Airport Terminal | Alsatian Restaurant | ... | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Alderwood / Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Amérique | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Annadale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [63]:
# keep the Neighborhood column plus every category whose name contains 'Restaurant'
col = ['Neighborhood'] + [c for c in all_grouped.columns if 'Restaurant' in c]

len(col)

112
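The substring test keeps only categories whose name contains "Restaurant", so food venues with other category names (Foursquare has, for example, "Pizza Place" and "Diner") are not counted. A sketch on a made-up subset of columns showing the vectorized `str.contains` selection and a hypothetical way to add extra categories by hand:

```python
import pandas as pd

# made-up subset of category columns
columns = pd.Index(['Neighborhood', 'Italian Restaurant', 'Pizza Place',
                    'Restaurant', 'Diner', 'Park'])

# vectorized version of the substring test used above
restaurant_cols = columns[columns.str.contains('Restaurant')].tolist()
print(restaurant_cols)   # ['Italian Restaurant', 'Restaurant']

# hypothetical extra food categories one might decide to count as well
extra = ['Pizza Place', 'Diner']
food_cols = restaurant_cols + [c for c in extra if c in columns]
print(food_cols)
```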

In [64]:
restaurants = all_onehot.groupby('Neighborhood').sum().reset_index()
restaurants = restaurants[col]
restaurants.head()

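|   | Neighborhood | Afghan Restaurant | African Restaurant | Alsatian Restaurant | American Restaurant | Arepa Restaurant | Argentinian Restaurant | Asian Restaurant | Australian Restaurant | Austrian Restaurant | ... | Tex-Mex Restaurant | Thai Restaurant | Theme Restaurant | Tibetan Restaurant | Turkish Restaurant | Udon Restaurant | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Yemeni Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Alderwood / Long Branch | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Allerton | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Amérique | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Annadale | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |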


In [65]:
restaurants.shape

(474, 112)

In [66]:
rest = pd.DataFrame()
rest['Neighborhood'] = restaurants['Neighborhood']
# sum only the restaurant count columns, not the Neighborhood names
rest['Number of Restaurants'] = restaurants.drop(columns='Neighborhood').sum(axis=1)

In [67]:
rest

|   | Neighborhood | Number of Restaurants |
|---|---|---|
| 0 | Agincourt | 1 |
| 1 | Alderwood / Long Branch | 0 |
| 2 | Allerton | 4 |
| 3 | Amérique | 2 |
| 4 | Annadale | 1 |
| ... | ... | ... |
| 469 | York Mills / Silver Hills | 0 |
| 470 | York Mills West | 0 |
| 471 | Yorkville | 30 |
| 472 | École-Militaire | 9 |


In [68]:
rest = all_neigh.set_index('neighborhood').join(rest.set_index('Neighborhood'), how = 'inner')


In [85]:
rest = rest[rest['Number of Restaurants'] == 0]
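This filter keeps neighborhoods whose raw restaurant count is zero, while the introduction speaks of density. A zero can also mean that Foursquare returned very few venues there, and busy areas are capped at LIMIT = 100 venues per request. A sketch of one hypothetical density measure, the share of a neighborhood's venues that are restaurants, on made-up counts (in the notebook, a `Number of Venues` column is available in the `count` frame built earlier):

```python
import pandas as pd

# made-up counts per neighborhood
counts = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Number of Restaurants': [30, 0, 2],
    'Number of Venues': [100, 5, 4],
})

# hypothetical density measure: share of a neighborhood's venues that are restaurants
counts['Restaurant Share'] = counts['Number of Restaurants'] / counts['Number of Venues']
print(counts.sort_values('Restaurant Share'))
```

Neighborhood C has only 2 restaurants but the highest share, which a raw count would hide.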

In [95]:
rest

|   | level_0 | index | borough | latitude | longitude | city | Number of Restaurants |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Alderwood / Long Branch | Etobicoke | 43.602100 | -79.540200 | toronto | 0 |
| 1 | 6 | Arden Heights | Staten Island | 40.549286 | -74.185887 | nyc | 0 |
| 2 | 25 | Bayswater | Queens | 40.611322 | -73.765968 | nyc | 0 |
| 3 | 26 | Bayview Village | North York | 43.779700 | -79.381300 | toronto | 0 |
| 4 | 40 | Bergen Beach | Brooklyn | 40.615150 | -73.898556 | nyc | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 66 | 464 | Whitestone | Queens | 40.781291 | -73.814202 | nyc | 0 |
| 67 | 465 | Williamsbridge | Bronx | 40.881039 | -73.857446 | nyc | 0 |
| 68 | 470 | Willowdale / Newtonbrook | North York | 43.791500 | -79.410300 | toronto | 0 |
| 69 | 479 | York Mills / Silver Hills | North York | 43.754700 | -79.376400 | toronto | 0 |


#### Conclusion: 
There are 70 neighborhoods with no restaurants, none of them in Paris. If an entrepreneur wants to open a restaurant, I would suggest opening it in one of these neighborhoods.
Limitations: I suspect that Foursquare's database does not include all the restaurants in each city.

In [86]:
rest = rest.reset_index(drop=False)

In [87]:
toronto_rest = rest[rest['city']=='toronto']
nyc_rest = rest[rest['city']=='nyc']
paris_rest = rest[rest['city']=='paris']

In [88]:
toronto_rest.shape

(36, 7)

In [89]:

map_toronto = folium.Map(location=[toronto_lat, toronto_long], zoom_start=10)


for lat, lng, borough, neighborhood in zip(toronto_rest['latitude'], toronto_rest['longitude'], toronto_rest['borough'], toronto_rest['index']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [90]:

map_nyc = folium.Map(location=[nyc_lat, nyc_long], zoom_start=10)


for lat, lng, borough, neighborhood in zip(nyc_rest['latitude'], nyc_rest['longitude'], nyc_rest['borough'], nyc_rest['index']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_nyc)  
    
map_nyc

In [94]:
paris_rest.head()

|   | level_0 | index | borough | latitude | longitude | city | Number of Restaurants |
|---|---|---|---|---|---|---|---|
