# New York vs. Paris
This notebook compares New York (Manhattan) and Paris by clustering their neighborhoods according to the venues found in each.

## Introduction/Business Problem:

According to [Matador Network](https://matadornetwork.com/abroad/the-5-most-important-differences-between-paris-and-new-york/), these two cities differ in many ways. Knowing which venues each city has, and how its neighborhoods group into clusters based on those venues, gives a good idea of how each city handles multiculturalism.

Comparing them can help you decide whether to move, open a restaurant, or seek out a particular venue category, and lets you learn about each city's multicultural sites without having to visit.


## Data Section:

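Venue data is obtained from the Foursquare API. Manhattan neighborhoods and their coordinates come from the `newyork_data.json` dataset, Paris arrondissements from Wikipedia, and the Paris coordinates from the Nominatim geocoder. Together, the data contains the following: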
- New York (Manhattan) and Paris Venues
- New York (Manhattan) and Paris Postal Codes
- New York (Manhattan) and Paris Neighborhoods
- New York (Manhattan) and Paris latitude and longitudes

In [1]:
!pip install folium

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


In [2]:
!pip install geopy



In [3]:
#Importing libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import folium
import json
from pandas import json_normalize
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim

# New York (Manhattan)

In [4]:
#Download Data
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json

In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one row per feature, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data: #loop through every feature
    borough = data['properties']['borough'] #borough value from the nested dictionary, where "data" is a single feature
    neighborhood_name = data['properties']['name'] #name value from the nested dictionary

    neighborhood_latlon = data['geometry']['coordinates'] #stored as [longitude, latitude]
    neighborhood_lat = neighborhood_latlon[1] #second item in the array is the latitude
    neighborhood_lon = neighborhood_latlon[0] #first item in the array is the longitude

    #Adding the row
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

#Creating the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
print('Rows: {}, Columns: {}'.format(neighborhoods.shape[0],neighborhoods.shape[1]))

Rows: 306, Columns: 4
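
The loop above depends on the layout of each feature: the borough and name live under `properties`, and `geometry.coordinates` stores the longitude first, then the latitude. A minimal sketch with a hand-built feature (the dictionary is an illustrative assumption that mirrors only the keys read above, filled with the Marble Hill values that appear in this notebook's output) makes the index order explicit:

```python
# Hypothetical single feature; only the keys read by the loop are included.
sample_feature = {
    'properties': {'borough': 'Manhattan', 'name': 'Marble Hill'},
    'geometry': {'coordinates': [-73.91066, 40.876551]},  # [longitude, latitude]
}

def feature_to_row(feature):
    """Flatten one feature into the four columns used by the notebook."""
    lon, lat = feature['geometry']['coordinates']  # unpack in stored order
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

row = feature_to_row(sample_feature)
print(row)
```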


In [7]:
#Obtaining Manhattan neighborhoods
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan']
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
6,Manhattan,Marble Hill,40.876551,-73.91066
100,Manhattan,Chinatown,40.715618,-73.994279
101,Manhattan,Washington Heights,40.851903,-73.9369
102,Manhattan,Inwood,40.867684,-73.92121
103,Manhattan,Hamilton Heights,40.823604,-73.949688


In [8]:
#Locating lat and long for Manhattan
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


## Getting the venues

In [9]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: WSXLMLLIEZFRCONJWFL2YVI2RVHP0ICX2ZXXRALWPZAM42NS
CLIENT_SECRET:SSUUMDE42AF5ZW1VQHQOQBWMV3Q0GQK05H1MYTJCAIZA5JUI
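
Because the credentials cell is hidden, the later cells depend on four names that are never shown: `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT`. The sketch below is one way such a cell could look. The environment variable names, the version date and the limit of 100 are assumptions, not values taken from the hidden cell.

```python
import os

# Assumed environment variable names; export them before starting the notebook
# so that the secrets never appear in a shared cell or its output.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

VERSION = '20180605'  # assumed API version date in YYYYMMDD form
LIMIT = 100           # assumed maximum number of venues per request

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API calls will fail.')
```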


In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
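
`getNearbyVenues` indexes several levels into the JSON reply and assumes every venue has at least one category. A minimal sketch of that extraction step, run against a hand-built payload (an assumption that mirrors only the keys the function reads, not a real API reply), shows the nesting and one way to tolerate a venue without categories:

```python
def extract_venues(payload, name, lat, lng):
    """Return one tuple per venue found in an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Hypothetical payload with two venues, the second lacking a category
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': "Arturo's",
               'location': {'lat': 40.874412, 'lng': -73.910271},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 40.8760, 'lng': -73.9100},
               'categories': []}},
]}]}}

rows = extract_venues(sample_payload, 'Marble Hill', 40.876551, -73.91066)
for r in rows:
    print(r)
```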

In [11]:
#Getting all manhattan venues
manhattan_venues = getNearbyVenues(names = manhattan_data.Neighborhood, latitudes = manhattan_data.Latitude, longitudes = manhattan_data.Longitude)

In [12]:
manhattan_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop


## Normalizing data and adding it to a DF

In [13]:
#Normalizing the data

# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
#Grouping by Neighborhood and adding the mean

manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.015152,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.011628,...,0.0,0.023256,0.0,0.0,0.0,0.011628,0.034884,0.0,0.011628,0.034884
2,Central Harlem,0.0,0.0,0.066667,0.044444,0.0,0.0,0.0,0.022222,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.05,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
4,Chinatown,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Civic Center,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.04
6,Clinton,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,...,0.0,0.02,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0
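
Taking the mean of the one-hot columns per neighborhood turns venue counts into shares: each value is the fraction of that neighborhood's venues that fall in the category. A toy example (the neighborhoods `A` and `B` and their venues are made up for illustration) shows the arithmetic:

```python
import pandas as pd

# Toy venue list: neighborhood A has 4 venues, neighborhood B has 2
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Bar', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Here neighborhood `A` has two cafés out of four venues, so its `Café` share is 0.5.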


In [15]:
#Function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
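
To see what the function returns, here is a minimal example on a hand-made row (the values are illustrative, not taken from the data). The first element is skipped because it holds the neighborhood name rather than a frequency:

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # drop the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Illustrative row: one neighborhood with three category frequencies
row = pd.Series({'Neighborhood': 'A', 'Bar': 0.1, 'Café': 0.6, 'Park': 0.3})

top_two = return_most_common_venues(row, 2)
print(list(top_two))
```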

In [16]:
#Top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Hotel,Gym,Coffee Shop,Memorial Site,Shopping Mall,Plaza,Burger Joint,Gourmet Shop,Playground
1,Carnegie Hill,Coffee Shop,Café,Bookstore,Italian Restaurant,Gym / Fitness Center,Gym,French Restaurant,Yoga Studio,Wine Shop,Vietnamese Restaurant
2,Central Harlem,African Restaurant,Chinese Restaurant,Bar,Seafood Restaurant,American Restaurant,French Restaurant,Cosmetics Shop,Fried Chicken Joint,Caribbean Restaurant,Café
3,Chelsea,Coffee Shop,Art Gallery,Bakery,American Restaurant,Ice Cream Shop,Italian Restaurant,Park,Bookstore,Cycle Studio,Cupcake Shop
4,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Noodle House,Salon / Barbershop,Shanghai Restaurant,Hotpot Restaurant,Spa,Dessert Shop
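
The column labels above are built with a three-item suffix list and a `try`/`except` fallback, which is correct for the ten columns used here. If `num_top_venues` were ever raised past 20, a small helper (a hypothetical `ordinal` function, not part of the notebook) would keep labels such as `21st` right:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```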


## Clustering

In [17]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 3, 0, 1], dtype=int32)
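
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common check is the silhouette score, which is higher when points sit close to their own cluster and far from the others. The sketch below runs it on synthetic 2-D points (three made-up blobs, not the venue data), so the numbers only illustrate the procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Score several candidate values of k
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

With the real data, `X` would be replaced by `manhattan_grouped_clustering` or `paris_grouped_clustering`.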

In [18]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Manhattan,Marble Hill,40.876551,-73.91066,4,Coffee Shop,Sandwich Place,Discount Store,Gym,Supplement Shop,Donut Shop,Tennis Stadium,Kids Store,Pharmacy,Yoga Studio
100,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Noodle House,Salon / Barbershop,Shanghai Restaurant,Hotpot Restaurant,Spa,Dessert Shop
101,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Grocery Store,Mobile Phone Shop,Bank,Sandwich Place,Coffee Shop,Park,Spanish Restaurant,Deli / Bodega
102,Manhattan,Inwood,40.867684,-73.92121,3,Mexican Restaurant,Café,Lounge,Restaurant,Park,Chinese Restaurant,Bakery,Frozen Yogurt Shop,Caribbean Restaurant,American Restaurant
103,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Cocktail Bar,Indian Restaurant,Liquor Store,Sushi Restaurant,Park,Deli / Bodega


In [19]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
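
The color lookup above uses `rainbow[cluster-1]`, so cluster 0 wraps around to the last color through negative indexing; each cluster still gets its own color. A simpler sketch builds the same palette with the same matplotlib calls and indexes it directly by label, which only changes which color each cluster receives:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow color per cluster, converted to hex strings
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index the palette directly with the cluster label (0 .. kclusters-1)
for cluster in range(kclusters):
    print(cluster, rainbow[cluster])
```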

# Paris

In [20]:
#Scraping data

#Reading URL
url = 'https://en.wikipedia.org/wiki/Arrondissements_of_Paris'
df = pd.read_html(url)

#Converting html to DF
paris_arron = df[2]

In [21]:
#Data manipulation
paris_arron = paris_arron.rename(columns={'Arrondissement (R for Right Bank, L for Left Bank)':'Arrondissement'})
paris_arron = paris_arron.drop(['Area (km2)', 'Population(2017 estimate)', 'Peak of population', 'Density (2017)(inhabitants per km2)','2020-2026','Mayor'],axis=1)

#The first row includes the first 4 arrondissements so we will separate them into different rows
temp_arron = pd.DataFrame({"Arrondissement":['Paris Centre 1st (Ier)','2nd (IIe)','3rd (III)','4th (IVe)'],
                            "Name":['Louvre','Bourse', 'Temple', 'Hôtel-de-Ville']})
#Let's drop the first row
paris_arron.drop([0],inplace=True)

#Let's merge both dfs and reindex (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
paris_merged = pd.concat([paris_arron, temp_arron], ignore_index=True)
indexes = [16,17,18,19,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
paris_newarron = paris_merged.reindex(indexes)
paris_newarron.reset_index(drop=True,inplace=True)
paris_newarron

Unnamed: 0,Arrondissement,Name
0,Paris Centre 1st (Ier),Louvre
1,2nd (IIe),Bourse
2,3rd (III),Temple
3,4th (IVe),Hôtel-de-Ville
4,5th (Ve) L,Panthéon
5,6th (VIe) L,Luxembourg
6,7th (VIIe) L,Palais-Bourbon
7,8th (VIIIe) R,Élysée
8,9th (IXe) R,Opéra
9,10th (Xe) R,Entrepôt
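
The manual index list works, but it has to be rewritten if the table ever changes length. A possible alternative, sketched here on a toy frame (the three-row `rest` frame stands in for `paris_arron` after its first row is dropped), is to concatenate the split-out rows first so that no reindexing is needed:

```python
import pandas as pd

# Rows split out of the combined first row
temp_arron = pd.DataFrame({'Arrondissement': ['Paris Centre 1st (Ier)', '2nd (IIe)'],
                           'Name': ['Louvre', 'Bourse']})

# Stand-in for the remaining rows, already in their final order
rest = pd.DataFrame({'Arrondissement': ['5th (Ve) L', '6th (VIe) L', '7th (VIIe) L'],
                     'Name': ['Panthéon', 'Luxembourg', 'Palais-Bourbon']})

# Putting temp_arron first yields the desired order with a fresh 0..n-1 index
ordered = pd.concat([temp_arron, rest], ignore_index=True)
print(ordered)
```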


In [22]:
#Function to get the latitude and longitude of all the arrondissements of Paris

def getArronLocation(name):
    
    arron_list = []
    
    #create the geocoder once and reuse it for every lookup
    geolocator = Nominatim(user_agent="ny_explorer")
    
    for n in name:
        address = '{}, Paris'.format(n)
        location = geolocator.geocode(address)
        
        #adding data to the list
        arron_list.append((n, location.latitude, location.longitude))
    
    arron_wlocs = pd.DataFrame(arron_list, columns=['Name','Latitude','Longitude'])
    
    return arron_wlocs

In [23]:
#Merging Arrondissements with their respective latitude and longitude
arron_wlocs = getArronLocation(name=paris_newarron.Name)


arron_locs = paris_newarron.merge(arron_wlocs, on = 'Name')
arron_locs

Unnamed: 0,Arrondissement,Name,Latitude,Longitude
0,Paris Centre 1st (Ier),Louvre,48.861147,2.338028
1,2nd (IIe),Bourse,48.86863,2.341474
2,3rd (III),Temple,48.8665,2.360708
3,4th (IVe),Hôtel-de-Ville,48.856426,2.352528
4,5th (Ve) L,Panthéon,48.846191,2.346079
5,6th (VIe) L,Luxembourg,49.504314,6.279185
6,7th (VIIe) L,Palais-Bourbon,48.861596,2.317909
7,8th (VIIIe) R,Élysée,48.846644,2.36983
8,9th (IXe) R,Opéra,48.870645,2.33233
9,10th (Xe) R,Entrepôt,48.876106,2.35991


In [24]:
#The geocoder matched the name "Luxembourg" (6th arrondissement) to a location outside Paris, so we manually set the latitude and longitude of that row
arron_locs.loc[5,'Latitude'] = 48.850531
arron_locs.loc[5,'Longitude'] = 2.332233
arron_locs

Unnamed: 0,Arrondissement,Name,Latitude,Longitude
0,Paris Centre 1st (Ier),Louvre,48.861147,2.338028
1,2nd (IIe),Bourse,48.86863,2.341474
2,3rd (III),Temple,48.8665,2.360708
3,4th (IVe),Hôtel-de-Ville,48.856426,2.352528
4,5th (Ve) L,Panthéon,48.846191,2.346079
5,6th (VIe) L,Luxembourg,48.850531,2.332233
6,7th (VIIe) L,Palais-Bourbon,48.861596,2.317909
7,8th (VIIIe) R,Élysée,48.846644,2.36983
8,9th (IXe) R,Opéra,48.870645,2.33233
9,10th (Xe) R,Entrepôt,48.876106,2.35991
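
A mismatch like this can be caught automatically by checking each geocoded point against a rough bounding box for the city. The box limits below are approximate values chosen for illustration (an assumption, not an official boundary), and the coordinates are the ones the geocoder returned before the manual fix:

```python
# Approximate bounding box around Paris (assumed values)
PARIS_BOX = {'lat_min': 48.80, 'lat_max': 48.91, 'lon_min': 2.22, 'lon_max': 2.47}

def outside_box(lat, lon, box=PARIS_BOX):
    """Return True when the point falls outside the bounding box."""
    return not (box['lat_min'] <= lat <= box['lat_max']
                and box['lon_min'] <= lon <= box['lon_max'])

# Geocoder results as shown in the table before the correction
geocoded = [
    ('Louvre', 48.861147, 2.338028),
    ('Panthéon', 48.846191, 2.346079),
    ('Luxembourg', 49.504314, 6.279185),
    ('Palais-Bourbon', 48.861596, 2.317909),
]

suspicious = [name for name, lat, lon in geocoded if outside_box(lat, lon)]
print(suspicious)
```

A box check only flags points that land outside the city; a point that is inside Paris but in the wrong arrondissement would still pass.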


In [25]:
#Locating Paris
address_par = 'Paris, France'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address_par)
par_lat = location.latitude
par_long = location.longitude

In [26]:
#Map of Paris
map_paris = folium.Map(location=[par_lat, par_long], zoom_start=13)

# add markers to map
for lat, lng, label, a in zip(arron_locs['Latitude'], arron_locs['Longitude'], arron_locs['Name'],arron_locs['Arrondissement']):
    label = '{}, {}'.format(a,label)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

## Getting all Paris venues

In [27]:
#Getting all paris venues
paris_venues = getNearbyVenues(names = arron_locs.Arrondissement, latitudes = arron_locs.Latitude, longitudes = arron_locs.Longitude)
paris_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Paris Centre 1st (Ier),48.861147,2.338028,Cour Carrée du Louvre,48.86036,2.338543,Pedestrian Plaza
1,Paris Centre 1st (Ier),48.861147,2.338028,Musée du Louvre,48.860847,2.33644,Art Museum
2,Paris Centre 1st (Ier),48.861147,2.338028,La Vénus de Milo (Vénus de Milo),48.859943,2.337234,Exhibit
3,Paris Centre 1st (Ier),48.861147,2.338028,Place du Palais Royal,48.862523,2.336688,Plaza
4,Paris Centre 1st (Ier),48.861147,2.338028,Cour Napoléon,48.861172,2.335088,Plaza


In [28]:
paris_venues.shape

(1385, 7)

## Normalizing Data

In [29]:
#Normalizing the data

# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Travel Agency,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Paris Centre 1st (Ier),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Paris Centre 1st (Ier),0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Paris Centre 1st (Ier),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Paris Centre 1st (Ier),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Paris Centre 1st (Ier),0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
#Grouping by Neighborhood and adding the mean

paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Travel Agency,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,10th (Xe) R,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0
1,11th (XIe) R,0.013333,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,...,0.0,0.0,0.0,0.0,0.026667,0.013333,0.013333,0.0,0.013333,0.0
2,12th (XIIe) R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0
3,13th (XIIIe) L,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.018182,...,0.0,0.0,0.0,0.0,0.0,0.054545,0.0,0.0,0.0,0.0
4,14th (XIVe) L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,15th (XVe) L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,...,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0
6,16th (XVIe) R,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,17th (XVIIe) R,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286
8,18th (XVIIIe) R,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.014286,...,0.0,0.0,0.0,0.0,0.028571,0.0,0.014286,0.0,0.0,0.0
9,19th (XIXe) R,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0


In [31]:
#Top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
par_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
par_neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    par_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

par_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10th (Xe) R,French Restaurant,Hotel,Coffee Shop,Bistro,Café,Pizza Place,Japanese Restaurant,Restaurant,Indian Restaurant,Bar
1,11th (XIe) R,French Restaurant,Bistro,Café,Cocktail Bar,Restaurant,Bar,Italian Restaurant,Pastry Shop,Bakery,Japanese Restaurant
2,12th (XIIe) R,Hotel,French Restaurant,Bistro,Supermarket,Bakery,Sushi Restaurant,Ice Cream Shop,Cambodian Restaurant,Cheese Shop,Chinese Restaurant
3,13th (XIIIe) L,Hotel,Bar,French Restaurant,Thai Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Indian Restaurant,Japanese Restaurant,Cosmetics Shop
4,14th (XIVe) L,French Restaurant,Hotel,Bakery,Café,Brasserie,Food & Drink Shop,Bistro,Fast Food Restaurant,Tea Room,Italian Restaurant


## Clustering

In [32]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 1, 3, 3, 0, 3, 3, 3, 2, 4], dtype=int32)

In [33]:
#changing the column name to Arrondissement
par_neighborhoods_venues = par_neighborhoods_venues_sorted.rename(columns={'Neighborhood':'Arrondissement'})

In [34]:
par_neighborhoods_venues.head()

Unnamed: 0,Arrondissement,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10th (Xe) R,French Restaurant,Hotel,Coffee Shop,Bistro,Café,Pizza Place,Japanese Restaurant,Restaurant,Indian Restaurant,Bar
1,11th (XIe) R,French Restaurant,Bistro,Café,Cocktail Bar,Restaurant,Bar,Italian Restaurant,Pastry Shop,Bakery,Japanese Restaurant
2,12th (XIIe) R,Hotel,French Restaurant,Bistro,Supermarket,Bakery,Sushi Restaurant,Ice Cream Shop,Cambodian Restaurant,Cheese Shop,Chinese Restaurant
3,13th (XIIIe) L,Hotel,Bar,French Restaurant,Thai Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Indian Restaurant,Japanese Restaurant,Cosmetics Shop
4,14th (XIVe) L,French Restaurant,Hotel,Bakery,Café,Brasserie,Food & Drink Shop,Bistro,Fast Food Restaurant,Tea Room,Italian Restaurant


In [35]:
# add clustering labels
par_neighborhoods_venues.insert(0, 'Cluster Labels', kmeans.labels_)

paris_merged = arron_locs

# merge arron_locs with the sorted venues to add the cluster label and top venues for each arrondissement
paris_merged = paris_merged.join(par_neighborhoods_venues.set_index('Arrondissement'), on='Arrondissement')

paris_merged.head()

Unnamed: 0,Arrondissement,Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Paris Centre 1st (Ier),Louvre,48.861147,2.338028,1,French Restaurant,Plaza,Coffee Shop,Hotel,Italian Restaurant,Cosmetics Shop,Art Museum,Café,Exhibit,Bar
1,2nd (IIe),Bourse,48.86863,2.341474,1,French Restaurant,Wine Bar,Cocktail Bar,Bistro,Hotel,Japanese Restaurant,Salad Place,Italian Restaurant,Bakery,Clothing Store
2,3rd (III),Temple,48.8665,2.360708,1,French Restaurant,Hotel,Wine Bar,Restaurant,Art Gallery,Sandwich Place,Bistro,Chinese Restaurant,Coffee Shop,Bakery
3,4th (IVe),Hôtel-de-Ville,48.856426,2.352528,1,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Theater,Clothing Store,Bookstore,Cocktail Bar,Coffee Shop,Gay Bar
4,5th (Ve) L,Panthéon,48.846191,2.346079,3,French Restaurant,Bar,Hotel,Italian Restaurant,Bakery,Café,Pub,Indie Movie Theater,Plaza,Ice Cream Shop


In [36]:
# create map
paris_cluster = folium.Map(location=[par_lat, par_long], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['Latitude'], paris_merged['Longitude'], paris_merged['Arrondissement'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(paris_cluster)
       
paris_cluster

# Comparing both maps

In [37]:
map_clusters

In [38]:
paris_cluster

## Examining Clusters

### Manhattan

In [39]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
101,Washington Heights,Café,Bakery,Grocery Store,Mobile Phone Shop,Bank,Sandwich Place,Coffee Shop,Park,Spanish Restaurant,Deli / Bodega
103,Hamilton Heights,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Cocktail Bar,Indian Restaurant,Liquor Store,Sushi Restaurant,Park,Deli / Bodega
108,Yorkville,Italian Restaurant,Coffee Shop,Gym,Bar,Deli / Bodega,Sushi Restaurant,Japanese Restaurant,Wine Shop,Mexican Restaurant,Diner
109,Lenox Hill,Italian Restaurant,Sushi Restaurant,Coffee Shop,Pizza Place,Cocktail Bar,Deli / Bodega,Gym,Gym / Fitness Center,Café,Burger Joint
111,Upper West Side,Italian Restaurant,Bar,Café,Indian Restaurant,Coffee Shop,Wine Bar,Pizza Place,Bakery,Ice Cream Shop,Mediterranean Restaurant
112,Lincoln Square,Plaza,Performing Arts Venue,Concert Hall,Italian Restaurant,Café,Theater,Bakery,French Restaurant,Indie Movie Theater,Wine Shop
117,Greenwich Village,Italian Restaurant,Sushi Restaurant,Clothing Store,Café,Indian Restaurant,American Restaurant,Gym,Boutique,Bubble Tea Shop,Chinese Restaurant
118,East Village,Bar,Pizza Place,Mexican Restaurant,Wine Bar,Coffee Shop,Ice Cream Shop,Vegetarian / Vegan Restaurant,Italian Restaurant,Cocktail Bar,Speakeasy
120,Tribeca,Park,American Restaurant,Wine Bar,Italian Restaurant,Coffee Shop,Spa,Café,Greek Restaurant,French Restaurant,Skate Park
121,Little Italy,Bakery,Café,Italian Restaurant,Bubble Tea Shop,Chinese Restaurant,Mediterranean Restaurant,Ice Cream Shop,Cocktail Bar,Pizza Place,Coffee Shop


In [40]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
100,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Noodle House,Salon / Barbershop,Shanghai Restaurant,Hotpot Restaurant,Spa,Dessert Shop
105,Central Harlem,African Restaurant,Chinese Restaurant,Bar,Seafood Restaurant,American Restaurant,French Restaurant,Cosmetics Shop,Fried Chicken Joint,Caribbean Restaurant,Café
107,Upper East Side,Exhibit,Italian Restaurant,Coffee Shop,Bakery,Gym / Fitness Center,Yoga Studio,Cosmetics Shop,French Restaurant,Juice Bar,Spa
110,Roosevelt Island,Deli / Bodega,Japanese Restaurant,Outdoors & Recreation,Greek Restaurant,Supermarket,Bubble Tea Shop,Food & Drink Shop,Soccer Field,Farmers Market,School
113,Clinton,Theater,Gym / Fitness Center,American Restaurant,Sandwich Place,Coffee Shop,Gym,Spa,Hotel,Italian Restaurant,Pizza Place
114,Midtown,Hotel,Bakery,Coffee Shop,Steakhouse,Theater,Sporting Goods Shop,Clothing Store,Sandwich Place,Bookstore,Pizza Place
115,Murray Hill,Japanese Restaurant,Coffee Shop,Hotel,Gym / Fitness Center,Sandwich Place,American Restaurant,Bar,Restaurant,Pizza Place,Italian Restaurant
116,Chelsea,Coffee Shop,Art Gallery,Bakery,American Restaurant,Ice Cream Shop,Italian Restaurant,Park,Bookstore,Cycle Studio,Cupcake Shop
119,Lower East Side,Chinese Restaurant,Pharmacy,Coffee Shop,Café,Bakery,Japanese Restaurant,Art Gallery,Pizza Place,Ramen Restaurant,Pet Café
124,Manhattan Valley,Bar,Coffee Shop,Yoga Studio,Pizza Place,Playground,Thai Restaurant,Mexican Restaurant,Cosmetics Shop,Bike Shop,Bubble Tea Shop


In [41]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
275,Stuyvesant Town,Park,Pet Service,Cocktail Bar,Harbor / Marina,Gym / Fitness Center,Baseball Field,Bar,Bistro,Heliport,Farmers Market


In [42]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
102,Inwood,Mexican Restaurant,Café,Lounge,Restaurant,Park,Chinese Restaurant,Bakery,Frozen Yogurt Shop,Caribbean Restaurant,American Restaurant
104,Manhattanville,Seafood Restaurant,Coffee Shop,Deli / Bodega,Italian Restaurant,Mexican Restaurant,Bus Station,Lounge,Boutique,Sushi Restaurant,Supermarket
106,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Deli / Bodega,Latin American Restaurant,Sandwich Place,Spa,Liquor Store,Taco Place,Gas Station
125,Morningside Heights,Park,American Restaurant,Coffee Shop,Bookstore,Burger Joint,Café,Deli / Bodega,Pub,Paper / Office Supplies Store,Seafood Restaurant
274,Tudor City,Café,Park,Mexican Restaurant,Asian Restaurant,Diner,Deli / Bodega,Coffee Shop,Restaurant,Garden,Sushi Restaurant


In [43]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Marble Hill,Coffee Shop,Sandwich Place,Discount Store,Gym,Supplement Shop,Donut Shop,Tennis Stadium,Kids Store,Pharmacy,Yoga Studio
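
The per-cluster cells above repeat the same positional selection with only the label changed. A minimal sketch of a helper that does the same filtering by column name is shown below. The function name `cluster_view` and the small `toy` frame are hypothetical and exist only for illustration; the rows are copied from the outputs above and truncated to two venue columns. In the notebook, `manhattan_merged` would be passed in place of `toy`.

```python
import pandas as pd


def cluster_view(merged, label, name_col="Neighborhood", label_col="Cluster Labels"):
    """Return the name column and the ranked venue columns for one cluster label."""
    # Pick venue columns by name instead of by position, so the helper
    # keeps working if columns are added or reordered upstream.
    venue_cols = [c for c in merged.columns if c.endswith("Most Common Venue")]
    return merged.loc[merged[label_col] == label, [name_col] + venue_cols]


# Hypothetical stand-in for manhattan_merged (values taken from the outputs above).
toy = pd.DataFrame({
    "Neighborhood": ["Stuyvesant Town", "Inwood", "Tudor City", "Marble Hill"],
    "Cluster Labels": [2, 3, 3, 4],
    "1st Most Common Venue": ["Park", "Mexican Restaurant", "Café", "Coffee Shop"],
    "2nd Most Common Venue": ["Pet Service", "Café", "Park", "Sandwich Place"],
})

# Print every cluster in one pass instead of one cell per label.
for label in sorted(toy["Cluster Labels"].unique()):
    print(f"Cluster {label}")
    print(cluster_view(toy, label).to_string(index=False))
```

For Paris the same call would need `name_col="Name"`, since that frame labels its district column differently.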


### Paris

In [44]:
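# Cluster 0: keep column 1 (Name) plus every column from position 5 onward (the ranked venue columns)
paris_merged.loc[paris_merged['Cluster Labels'] == 0, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]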

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Observatoire,French Restaurant,Hotel,Bakery,Café,Brasserie,Food & Drink Shop,Bistro,Fast Food Restaurant,Tea Room,Italian Restaurant


In [45]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Louvre,French Restaurant,Plaza,Coffee Shop,Hotel,Italian Restaurant,Cosmetics Shop,Art Museum,Café,Exhibit,Bar
1,Bourse,French Restaurant,Wine Bar,Cocktail Bar,Bistro,Hotel,Japanese Restaurant,Salad Place,Italian Restaurant,Bakery,Clothing Store
2,Temple,French Restaurant,Hotel,Wine Bar,Restaurant,Art Gallery,Sandwich Place,Bistro,Chinese Restaurant,Coffee Shop,Bakery
3,Hôtel-de-Ville,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Theater,Clothing Store,Bookstore,Cocktail Bar,Coffee Shop,Gay Bar
5,Luxembourg,French Restaurant,Plaza,Italian Restaurant,Café,Chocolate Shop,Pastry Shop,Ice Cream Shop,Tailor Shop,Wine Bar,Steakhouse
10,Popincourt,French Restaurant,Bistro,Café,Cocktail Bar,Restaurant,Bar,Italian Restaurant,Pastry Shop,Bakery,Japanese Restaurant


In [46]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Butte-Montmartre,Bar,French Restaurant,Italian Restaurant,Bistro,Convenience Store,Middle Eastern Restaurant,Café,Coffee Shop,Gastropub,Plaza
19,Ménilmontant,Bar,Pizza Place,Cocktail Bar,Restaurant,Brewery,Burger Joint,Beer Bar,Italian Restaurant,French Restaurant,Kebab Restaurant


In [47]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Panthéon,French Restaurant,Bar,Hotel,Italian Restaurant,Bakery,Café,Pub,Indie Movie Theater,Plaza,Ice Cream Shop
7,Élysée,Hotel,French Restaurant,Sandwich Place,Bakery,Hotel Bar,Harbor / Marina,Train Station,Cocktail Bar,Coffee Shop,Plaza
8,Opéra,Hotel,French Restaurant,Japanese Restaurant,Clothing Store,Coffee Shop,Pastry Shop,Bakery,Café,Tea Room,Sandwich Place
9,Entrepôt,French Restaurant,Hotel,Coffee Shop,Bistro,Café,Pizza Place,Japanese Restaurant,Restaurant,Indian Restaurant,Bar
11,Reuilly,Hotel,French Restaurant,Bistro,Supermarket,Bakery,Sushi Restaurant,Ice Cream Shop,Cambodian Restaurant,Cheese Shop,Chinese Restaurant
12,Gobelins,Hotel,Bar,French Restaurant,Thai Restaurant,Vietnamese Restaurant,Italian Restaurant,Bakery,Indian Restaurant,Japanese Restaurant,Cosmetics Shop
14,Vaugirard,French Restaurant,Italian Restaurant,Hotel,Coffee Shop,Park,Supermarket,Japanese Restaurant,Bar,Lebanese Restaurant,Gastropub
15,Passy,French Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,Plaza,Bar,Thai Restaurant,Bakery,Clothing Store,Chinese Restaurant
16,Batignolles-Monceau,French Restaurant,Hotel,Bistro,Italian Restaurant,Bakery,Sushi Restaurant,Pizza Place,Steakhouse,Japanese Restaurant,Mediterranean Restaurant


In [48]:
paris_merged.loc[paris_merged['Cluster Labels'] == 4, paris_merged.columns[[1] + list(range(5, paris_merged.shape[1]))]]

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Palais-Bourbon,French Restaurant,Plaza,Hotel,Food Truck,Italian Restaurant,Beer Garden,Pedestrian Plaza,Coffee Shop,Bakery,Fountain
18,Buttes-Chaumont,French Restaurant,Pool,Restaurant,Italian Restaurant,Park,Thai Restaurant,Skating Rink,Soup Place,Scenic Lookout,Chocolate Shop
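
To compare the two cities at a glance, it can help to condense each cluster to its size and its most frequent top-ranked venue. The sketch below is one possible way to do that, not part of the original analysis. The function name `summarize_clusters` and the `paris_toy` frame are hypothetical; the rows are a subset of the Paris outputs above, and in the notebook `paris_merged` or `manhattan_merged` would be passed instead.

```python
import pandas as pd


def summarize_clusters(merged, label_col="Cluster Labels", top_col="1st Most Common Venue"):
    """Return one row per cluster: number of areas and the modal top venue."""
    grouped = merged.groupby(label_col)[top_col]
    return pd.DataFrame({
        "size": grouped.size(),
        # mode() may return several values on a tie; take the first
        # (alphabetically smallest) so the result is deterministic.
        "top_venue_mode": grouped.agg(lambda s: s.mode().iloc[0]),
    })


# Hypothetical stand-in for paris_merged (subset of the rows shown above).
paris_toy = pd.DataFrame({
    "Name": ["Observatoire", "Butte-Montmartre", "Ménilmontant",
             "Panthéon", "Élysée", "Opéra"],
    "Cluster Labels": [0, 2, 2, 3, 3, 3],
    "1st Most Common Venue": ["French Restaurant", "Bar", "Bar",
                              "French Restaurant", "Hotel", "Hotel"],
})

summary = summarize_clusters(paris_toy)
print(summary)
```

Running the same function on both merged frames would give two small tables that can be read side by side, for example to see which clusters are dominated by restaurants, bars, or hotels.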
