# The Battle of Neighborhoods

## Description of the problem and a discussion of the background

In 2001, Dylan Lauren pioneered the world’s largest confectionery emporium and lifestyle brand, Dylan’s Candy Bar. Its mission is to merge fashion, art and pop culture with candy to ignite the creative spirit and inner child in everyone that visits. This innovative concept has changed the way the world experiences candy today.  

Dylan’s Candy Bar houses over 7,000 confections, boasting an unparalleled selection of candies and candy-related gifts from around the world.

![Dylan's Candy Bar store](https://kidonthetown.com/wp-content/uploads/2019/08/Dylans-Candy-Bar.jpg)

We assume that the Dylan's Candy Bar chain wants to keep expanding internationally and, to that end, to open a store in Paris, an emblematic city that would be a beautiful showcase for these fashion stores. The aim is therefore to find the best location for such a shop in Paris, which requires a study of the city's neighborhoods.
As the photo shows, these shops are very colorful, so children are the target clientele par excellence. The study therefore looks at how to position the store according to the likely presence of children. Moreover, several stores are already open in the USA and provide examples of the kind of location where a fashion candy bar is likely to open.  

Where is the best place for the candy bar to be seen by children?  
Where is the best place to open a fashion candy bar like those already open in the USA?  
Which neighborhoods meet these criteria?

## Data

Data on Paris and its neighborhoods will be collected.  
The first part of the analysis focuses on the Paris neighborhoods. A list of features is selected as important criteria for locating the shop. These features are then measured in each neighborhood with Foursquare in order to pick the best candidates.  
The second part compares the neighborhoods of the existing US stores with the neighborhoods of Paris. The aim is to see whether some Paris neighborhoods are close to those of the Dylan's Candy Bar stores. The closeness of two neighborhoods is evaluated from the venue data collected through Foursquare, using clustering.
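
To make the notion of closeness concrete, here is a minimal sketch. It is not the clustering used in this notebook: it simply compares two neighborhoods through the cosine similarity of their venue-category frequencies. The helper name `cosine_similarity` and both frequency profiles are made up for illustration.

```python
import math

def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two {category: frequency} dictionaries."""
    categories = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(c, 0.0) * profile_b.get(c, 0.0) for c in categories)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Made-up frequency profiles, for illustration only
paris_quartier = {'French Restaurant': 0.20, 'Hotel': 0.10, 'Candy Store': 0.02}
us_store_area = {'Hotel': 0.12, 'Clothing Store': 0.15, 'Candy Store': 0.03}

similarity = cosine_similarity(paris_quartier, us_store_area)
print(round(similarity, 3))
```

A value near 1 means the two areas have a similar mix of venues; a value near 0 means they share almost no categories.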

In [1]:
import requests
import pandas as pd
import numpy as np

import geocoder
from geopy.geocoders import Nominatim
import folium

## Scraping the neighborhoods of Paris with open data

We first scrape the Wikipedia page listing the neighborhoods of Paris using pandas.

In [2]:
url = 'https://fr.wikipedia.org/wiki/Liste_des_quartiers_administratifs_de_Paris'
response = requests.get(url)
df_list = pd.read_html(response.text)

In [3]:
df_paris = df_list[0]
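# .copy() makes df_paris an independent dataframe, so later column assignments do not raise SettingWithCopyWarning
df_paris = df_paris[['Arrondissement[1],[n 1]', 'Quartiers', 'Quartiers.1', 'Densitéhab/km2']].copy()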
df_paris.columns = ['Borough(Arrondissement)', 'NoQuartiers', 'Neighborhood(Quartiers)', 'Density']
print(df_paris.shape)
df_paris.head(10)

(80, 4)


Unnamed: 0,Borough(Arrondissement),NoQuartiers,Neighborhood(Quartiers),Density
0,1er arrondissementdit « du Louvre »,1er,Saint-Germain-l'Auxerrois,1 924
1,1er arrondissementdit « du Louvre »,2e,Halles,21 806
2,1er arrondissementdit « du Louvre »,3e,Palais-Royal,11 661
3,1er arrondissementdit « du Louvre »,4e,Place-Vendôme,11 316
4,2e arrondissementdit « de la Bourse »,5e,Gaillon,7 154
5,2e arrondissementdit « de la Bourse »,6e,Vivienne,11 955
6,2e arrondissementdit « de la Bourse »,7e,Mail,20 802
7,2e arrondissementdit « de la Bourse »,8e,Bonne-Nouvelle,34 514
8,3e arrondissementdit « du Temple »,9e,Arts-et-Métiers,30 063
9,3e arrondissementdit « du Temple »,10e,Enfants-Rouges,31 478


We recover the coordinates of each neighborhood using geocoder.

In [4]:
longitude, latitude = [], []

for quartier in df_paris['Neighborhood(Quartiers)']:
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('Quartier {}, Paris, France'.format(quartier))
        lat_lng_coords = g.latlng
    longitude.append(lat_lng_coords[1])
    latitude.append(lat_lng_coords[0])
    
df_paris['Latitude'] = latitude
df_paris['Longitude'] = longitude
df_paris.head()



Unnamed: 0,Borough(Arrondissement),NoQuartiers,Neighborhood(Quartiers),Density,Latitude,Longitude
0,1er arrondissementdit « du Louvre »,1er,Saint-Germain-l'Auxerrois,1 924,48.858335,2.34464
1,1er arrondissementdit « du Louvre »,2e,Halles,21 806,48.859471,2.346976
2,1er arrondissementdit « du Louvre »,3e,Palais-Royal,11 661,48.862792,2.336958
3,1er arrondissementdit « du Louvre »,4e,Place-Vendôme,11 316,48.867114,2.329976
4,2e arrondissementdit « de la Bourse »,5e,Gaillon,7 154,48.868925,2.3343
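
The geocoding loop in the cell keeps retrying until the service answers, so it never terminates if a query cannot be resolved. A possible safeguard is a bounded retry, sketched here under assumptions: `geocode_with_retry` is a hypothetical helper, and `lookup` stands for any callable returning `[lat, lng]` or `None` (for instance `lambda q: geocoder.arcgis(q).latlng`).

```python
import time

def geocode_with_retry(lookup, query, max_attempts=5, pause=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause)  # wait between tries to be polite to the service
    return None  # the caller decides how to handle an unresolved query

# Stand-in for a real geocoder: fails twice, then answers
answers = iter([None, None, [48.858335, 2.34464]])
fake_lookup = lambda query: next(answers)

result = geocode_with_retry(fake_lookup, "Quartier Saint-Germain-l'Auxerrois, Paris, France")
print(result)
```

Returning `None` after a fixed number of attempts lets the notebook report the failing neighborhood instead of hanging.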


Then we create a map of Paris to see how the neighborhoods are laid out across the city.

In [5]:
address = 'Paris, France'

geolocator = Nominatim(user_agent="paris_explorer")
location = geolocator.geocode(address)
latitude_paris = location.latitude
longitude_paris = location.longitude
print('The geographical coordinates of Paris are {}, {}.'.format(latitude_paris, longitude_paris))

The geographical coordinates of Paris are 48.8566969, 2.3514616.


In [6]:
# create map of paris using latitude and longitude values
map_paris = folium.Map(location=[latitude_paris, longitude_paris], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_paris['Latitude'], df_paris['Longitude'], df_paris['Neighborhood(Quartiers)']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

The geocoded coordinates are not very precise: the markers are unevenly spread and do not always fall at the center of their neighborhood.  
We therefore need another source for the coordinates, and use those published in the open data of Paris.  
[Open data on the neighborhoods](https://opendata.paris.fr/explore/dataset/quartier_paris/information/)
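
One way to quantify the imprecision is to measure the distance between a geocoded point and the open data centroid of the same neighborhood. The sketch uses the haversine formula on the two coordinate pairs that this notebook reports for Saint-Germain-l'Auxerrois; the helper name `haversine_km` is an assumption, not part of the notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

geocoded = (48.858335, 2.34464)   # result of the geocoder lookup
centroid = (48.86065, 2.33491)    # value from the Paris open data
offset_km = haversine_km(*geocoded, *centroid)
print('{:.2f} km'.format(offset_km))
```

An offset of several hundred metres is large compared with the size of a central Paris quartier, which supports switching to the open data coordinates.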

In [7]:
df_paris2 = pd.read_csv('quartier_paris.csv', header=0, sep=';')

In [8]:
df_latlong = df_paris2['Geometry X Y'].str.split(',', expand=True)
df_latlong.columns = ['Latitude', 'Longitude']
df_latlong = df_latlong.astype('float')

This open data set also provides the perimeter and the area of each neighborhood.

In [9]:
df_paris2 = pd.concat([df_paris2, df_latlong], axis=1)
df_paris2 = df_paris2[['C_AR', 'C_QU', 'L_QU', 'PERIMETRE', 'SURFACE', 'Latitude', 'Longitude']]
df_paris2.columns = ['Borough(Arrondissement)', 'NoQuartiers', 'Neighborhood(Quartiers)', 
                     'Perimeter', 'Area', 'Latitude', 'Longitude']
print(df_paris2.shape)
df_paris2.head(10)

(80, 7)


Unnamed: 0,Borough(Arrondissement),NoQuartiers,Neighborhood(Quartiers),Perimeter,Area,Latitude,Longitude
0,5,20,Sorbonne,2892.944068,433197.8,48.849045,2.345747
1,9,33,Saint-Georges,3429.188334,717091.6,48.879934,2.33285
2,9,34,Chaussée-d'Antin,3133.580092,543441.2,48.873547,2.332269
3,1,3,Palais-Royal,2166.839239,273696.8,48.86466,2.336309
4,8,32,Europe,4803.242769,1182467.0,48.878148,2.317175
5,11,44,Sainte-Marguerite,4591.310799,929609.2,48.852097,2.388765
6,14,54,Parc-de-Montsouris,5224.265369,1357950.0,48.823453,2.33707
7,15,57,Saint-Lambert,6928.792072,2829202.0,48.834294,2.29692
8,2,6,Vivienne,2058.472959,243550.8,48.8691,2.339461
9,3,10,Enfants-Rouges,2139.625388,271750.3,48.863887,2.363123


In [10]:
# create map of paris using latitude and longitude values
map_paris = folium.Map(location=[latitude_paris, longitude_paris], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_paris2['Latitude'], df_paris2['Longitude'], df_paris2['Neighborhood(Quartiers)']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

In [11]:
df_paris.columns

Index(['Borough(Arrondissement)', 'NoQuartiers', 'Neighborhood(Quartiers)',
       'Density', 'Latitude', 'Longitude'],
      dtype='object')

In [12]:
df_paris2.columns

Index(['Borough(Arrondissement)', 'NoQuartiers', 'Neighborhood(Quartiers)',
       'Perimeter', 'Area', 'Latitude', 'Longitude'],
      dtype='object')

A few neighborhoods do not have the same name in the two dataframes.

In [13]:
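# Align the Wikipedia spellings with the open data spellings.
# Assigning the result back avoids the inplace replace on a column selection.
name_fixes = {'Sainte-Avoye': 'Sainte-Avoie',
              'École-Militaire': 'Ecole-Militaire',
              'Champs-Élysées': 'Champs-Elysées',
              'Plaine-de-Monceaux': 'Plaine de Monceaux',
              'Épinettes': 'Epinettes',
              'Chapelle': 'La Chapelle'}
df_paris['Neighborhood(Quartiers)'] = df_paris['Neighborhood(Quartiers)'].replace(name_fixes)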
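
The mismatching names can be found programmatically rather than by eye. This is a small sketch under assumptions: the helper `normalize` and the two short lists are illustrative, using only names that appear in this notebook.

```python
import unicodedata

def normalize(name):
    """Lower-case, strip accents and unify separators for a loose comparison."""
    decomposed = unicodedata.normalize('NFKD', name)
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_accents.lower().replace('-', ' ').strip()

wikipedia_names = ['École-Militaire', 'Plaine-de-Monceaux', 'Chapelle', 'Halles']
opendata_names = ['Ecole-Militaire', 'Plaine de Monceaux', 'La Chapelle', 'Halles']

# Names that differ literally between the two sources
literal_mismatches = sorted(set(wikipedia_names) - set(opendata_names))

# Names that still differ after normalization need a manual mapping
opendata_normalized = {normalize(n) for n in opendata_names}
needs_manual_fix = [n for n in wikipedia_names if normalize(n) not in opendata_normalized]

print(literal_mismatches)
print(needs_manual_fix)
```

Accent and hyphen differences disappear after normalization, so only genuinely different names such as `Chapelle` and `La Chapelle` remain to be mapped by hand.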



We merge all the data available about the neighborhoods in a single dataframe.

In [14]:
df_paris_final = pd.merge(df_paris[['Borough(Arrondissement)', 'Neighborhood(Quartiers)', 'Density']], 
                          df_paris2[['NoQuartiers', 'Neighborhood(Quartiers)', 'Perimeter', 'Area', 'Latitude', 'Longitude']],
                          on='Neighborhood(Quartiers)')

In [15]:
df_paris_final['Density'] = df_paris_final['Density'].str.replace('\xa0', '').astype('float')
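
The scraped densities are strings that use a non-breaking space (`\xa0`) as thousands separator, which is why the character is removed before the cast. The same idea on a single value, as a sketch with a hypothetical helper `parse_density` that also tolerates regular spaces:

```python
def parse_density(text):
    """Convert a French-formatted integer string to a float."""
    # Remove both non-breaking and regular spaces used as thousands separators
    cleaned = text.replace('\xa0', '').replace(' ', '')
    return float(cleaned)

print(parse_density('21\xa0806'))
```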

In [16]:
print(df_paris_final.shape)
df_paris_final.head()

(80, 8)


Unnamed: 0,Borough(Arrondissement),Neighborhood(Quartiers),Density,NoQuartiers,Perimeter,Area,Latitude,Longitude
0,1er arrondissementdit « du Louvre »,Saint-Germain-l'Auxerrois,1924.0,1,5057.549475,869000.6646,48.86065,2.33491
1,1er arrondissementdit « du Louvre »,Halles,21806.0,2,2606.417128,412458.4963,48.862289,2.344899
2,1er arrondissementdit « du Louvre »,Palais-Royal,11661.0,3,2166.839239,273696.7933,48.86466,2.336309
3,1er arrondissementdit « du Louvre »,Place-Vendôme,11316.0,4,2147.817602,269456.7806,48.867019,2.328582
4,2e arrondissementdit « de la Bourse »,Gaillon,7154.0,5,1866.982041,188012.2039,48.869307,2.333432


In [17]:
df_paris_final.dtypes

Borough(Arrondissement)     object
Neighborhood(Quartiers)     object
Density                    float64
NoQuartiers                  int64
Perimeter                  float64
Area                       float64
Latitude                   float64
Longitude                  float64
dtype: object

## Explore the neighborhoods in Paris

In [18]:
import os

# Credentials are read from environment variables instead of being hard-coded and printed.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20201202' # Foursquare API version
LIMIT = 200 # A default Foursquare API limit value


This function draws a choropleth map of Paris for a given quantitative feature.

In [19]:
paris_geo = r'quartier_paris.geojson'

In [20]:
def map_paris(data, columns, legend):

    map_paris = folium.Map(location=[latitude_paris, longitude_paris], zoom_start=12)

    folium.Choropleth(
        geo_data=paris_geo,
        data=data,
        columns=columns,
        key_on='feature.properties.l_qu',
        fill_color='YlGn',
        fill_opacity=0.7,
        line_opacity=0.5,
        legend_name=legend,
        reset=True
    ).add_to(map_paris)

    for lat, lng, label in zip(df_paris_final['Latitude'], df_paris_final['Longitude'], df_paris_final['Neighborhood(Quartiers)']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_paris)  

    return(map_paris)
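
The choropleth joins the dataframe to the GeoJSON through `key_on='feature.properties.l_qu'`, so every neighborhood name in the data has to match an `l_qu` value exactly. A quick check can be sketched as follows; `missing_geojson_keys` is a hypothetical helper, and the tiny GeoJSON stands in for `quartier_paris.geojson`.

```python
def missing_geojson_keys(names, geojson, property_name='l_qu'):
    """Return the names that match no feature.properties[property_name]."""
    available = {feature['properties'].get(property_name)
                 for feature in geojson['features']}
    return [name for name in names if name not in available]

# Tiny hand-made GeoJSON standing in for quartier_paris.geojson
sample_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'l_qu': 'Halles'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'l_qu': 'Palais-Royal'}, 'geometry': None},
    ],
}

unmatched = missing_geojson_keys(['Halles', 'Palais-Royal', 'Chapelle'], sample_geojson)
print(unmatched)
```

With the real file, the dictionary would come from `json.load` on `paris_geo`; any name reported here would not be matched to a polygon on the map.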

This function is used to get a dataframe of all the venues of a specified category near a point in each neighborhood.

In [21]:
def getNearCategorybyVenues(names, latitudes, longitudes, radius, category):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            category,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['id'],
            v['name'], 
            v['location']['lat'], 
            v['location']['lng'],
            v['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood(Quartiers)', 
                  'Quartier Latitude', 
                  'Quartier Longitude', 
                  'Venue Id', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude',
                  'Venue Category']
    
    return(nearby_venues)
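
Two parts of this function are worth isolating: building the query string and turning the JSON response into rows. The sketch is an illustration under assumptions, not the notebook's code: `build_search_url` and `flatten_venues` are hypothetical helpers, the response fragment is hand-made, and venues without a category are kept with `None` instead of raising an `IndexError`.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, category, lat, lng, radius, limit):
    """Build the venues/search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'categoryId': category,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

def flatten_venues(name, lat, lng, venues):
    """Turn the 'venues' list of a response into rows, tolerating empty categories."""
    rows = []
    for v in venues:
        categories = v.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, v['id'], v['name'],
                     v['location']['lat'], v['location']['lng'], category))
    return rows

# Hand-made response fragment, for illustration only
fake_venues = [
    {'id': 'a1', 'name': 'Shop A', 'location': {'lat': 48.86, 'lng': 2.33},
     'categories': [{'name': 'Candy Store'}]},
    {'id': 'b2', 'name': 'Shop B', 'location': {'lat': 48.87, 'lng': 2.34},
     'categories': []},
]
rows = flatten_venues('Halles', 48.862289, 2.344899, fake_venues)
url = build_search_url('ID', 'SECRET', '20201202', '4bf58dd8d48988d117951735',
                       48.862289, 2.344899, 1000, 200)
print(len(rows), rows[1][-1])
```

Passing the same dictionary to `requests.get(..., params=params)` would achieve the same encoding without manual string formatting.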

### Density

In [36]:
map_paris(df_paris_final, ['Neighborhood(Quartiers)', 'Density'], 'Density')

### Candy Store

In [22]:
candy_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='4bf58dd8d48988d117951735'
                                  )

In [23]:
print(candy_venues.shape)
candy_venues.head()

(913, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,4ee38f17e5faffd730c77e78,Le Cure Gourmande,48.864152,2.331293,Candy Store
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,4f64a10fe4b09ff9bca54b53,Délices de France,48.86397,2.331448,Candy Store
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b989501f964a520a94735e3,Debauve & Gallais,48.855309,2.33107,Candy Store
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,571b9b12498e52f5b11e7bb1,Le Petit Duc,48.861883,2.346845,Candy Store
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,4d0e71c43d45b1f7364e9ff2,Maison Georges Larnicol,48.852551,2.33916,Chocolate Shop


In [24]:
df_candy = candy_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [25]:
df_candy.reset_index(inplace=True)

In [26]:
map_paris(df_candy, ['Neighborhood(Quartiers)', 'Venue'], 'Candy Stores')

### Elementary School

In [51]:
school_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='4f4533804b9074f6e4fb0105'
                                  )

In [52]:
print(school_venues.shape)
school_venues.head()

(862, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,4ba87900f964a5205ddc39e3,Ecole Elémentaire Argenteuil,48.86538,2.333961,Elementary School
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,4ba655ccf964a520894739e3,École Cambon,48.868087,2.326922,Elementary School
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b95fd8ef964a5207db934e3,Ecole élémentaire Chomel,48.852166,2.325719,Elementary School
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,4c6e60abef4b199ca5827c66,École Sainte Clotilde,48.85559,2.32339,Elementary School
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,4ba6450df964a520fb4039e3,École Étienne Marcel,48.864069,2.348221,Elementary School


In [53]:
df_school = school_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [54]:
df_school.reset_index(inplace=True)

In [55]:
map_paris(df_school, ['Neighborhood(Quartiers)', 'Venue'], 'Elementary Schools')

### Playground

In [57]:
playground_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='5744ccdfe4b0c0459246b4b5,4bf58dd8d48988d1e7941735'
                                  )

In [58]:
print(playground_venues.shape)
playground_venues.head()

(1031, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,580c9f62d67c329461f882c2,Aire de jeu des Moyens du Jardin des Halles,48.862189,2.343587,Playground
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,5d18e96adbde110025505a13,Aire De Jeux,48.864565,2.327222,Playground
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,5169644be4b0d8dadc46f9cd,Terrain d'aventures,48.862061,2.344427,Playground
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,4c5fffefde6920a14ce29464,Square St Germain des Prés,48.852831,2.337404,Playground
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,4bc044d4920eb7139ebd182c,Square Desruelles,48.853493,2.334667,Playground


In [59]:
df_playground = playground_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [60]:
df_playground.reset_index(inplace=True)

In [61]:
map_paris(df_playground, ['Neighborhood(Quartiers)', 'Venue'], 'Playgrounds')

### Games (Toy / Game Store, Video Games Store)

In [62]:
game_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='4bf58dd8d48988d1f3941735,4bf58dd8d48988d10b951735'
                                  )

In [63]:
print(game_venues.shape)
game_venues.head()

(1690, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,5dd2bb5b0f6819000755d83c,King Jouet,48.860659,2.342073,Toy / Game Store
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b894991f964a520af2832e3,Variantes,48.853374,2.342223,Toy / Game Store
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,4baa0b2cf964a5202a463ae3,Le Bridgeur,48.869747,2.335099,Toy / Game Store
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b4f0fdaf964a5208bf926e3,EOL modelisme,48.861273,2.340989,Toy / Game Store
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,522f43be498ec5e147de95e6,Games Workshop,48.851725,2.343622,Toy / Game Store


In [64]:
df_game = game_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [65]:
df_game.reset_index(inplace=True)

In [66]:
map_paris(df_game, ['Neighborhood(Quartiers)', 'Venue'], 'Games')

### Athletics & Sports

In [67]:
sport_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='4f4528bc4b90abdf24c9de85'
                                  )

In [68]:
print(sport_venues.shape)
sport_venues.head()

(3350, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,4bc4cff0abf49521509bc593,GYM-LOUVRE,48.862214,2.341375,Gym / Fitness Center
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,5602bf77498ee30f21777624,Neoness Paris Châtelet Montorgueil,48.865143,2.349847,Gym
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b219532f964a520563e24e3,Klay,48.866039,2.349656,Gym / Fitness Center
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,4b508c9bf964a520ee2627e3,Gymnase du marché St. Germain,48.852448,2.335935,Gym
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,5b0edb69149946002c47ecfd,Midtown Studio,48.865047,2.342022,Gym / Fitness Center


In [69]:
df_sport = sport_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [70]:
df_sport.reset_index(inplace=True)

In [72]:
map_paris(df_sport, ['Neighborhood(Quartiers)', 'Venue'], 'Sport')

### Sport Clubs

In [74]:
club_venues = getNearCategorybyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000,
                                   category='52e81612bcbc57f1066b7a2e'
                                  )

In [75]:
print(club_venues.shape)
club_venues.head()

(222, 8)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue Id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,53b7f1c9498e952f7e1b78a0,Temple Noble Art,48.865017,2.335966,Sports Club
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,53a1cc34498e7105f940726a,gym suedoise place des petits pères,48.866275,2.340781,Sports Club
2,Halles,48.862289,2.344899,53b7f1c9498e952f7e1b78a0,Temple Noble Art,48.865017,2.335966,Sports Club
3,Halles,48.862289,2.344899,532e9ca2498e816a23881737,Kylie Minogue Sexcercise Club KMSC,48.870979,2.347883,Sports Club
4,Halles,48.862289,2.344899,53a1cc34498e7105f940726a,gym suedoise place des petits pères,48.866275,2.340781,Sports Club
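
The table shows that the same venue can be returned for several neighborhoods: with a 1000 m radius, the search circles of adjacent quartiers overlap. A short sketch, using the five rows displayed above, counts how many distinct venues are involved; the variable names are illustrative.

```python
from collections import defaultdict

# (neighborhood, venue id, venue name) taken from the rows displayed above
rows = [
    ("Saint-Germain-l'Auxerrois", '53b7f1c9498e952f7e1b78a0', 'Temple Noble Art'),
    ("Saint-Germain-l'Auxerrois", '53a1cc34498e7105f940726a', 'gym suedoise place des petits pères'),
    ('Halles', '53b7f1c9498e952f7e1b78a0', 'Temple Noble Art'),
    ('Halles', '532e9ca2498e816a23881737', 'Kylie Minogue Sexcercise Club KMSC'),
    ('Halles', '53a1cc34498e7105f940726a', 'gym suedoise place des petits pères'),
]

# Group the neighborhoods in which each venue id appears
neighborhoods_by_venue = defaultdict(set)
for neighborhood, venue_id, _ in rows:
    neighborhoods_by_venue[venue_id].add(neighborhood)

unique_venues = len(neighborhoods_by_venue)
shared = {vid for vid, hoods in neighborhoods_by_venue.items() if len(hoods) > 1}
print(len(rows), unique_venues, len(shared))
```

The per-neighborhood counts therefore measure venues within reach of each centroid, not venues located strictly inside each quartier.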


In [76]:
df_club = club_venues[['Neighborhood(Quartiers)','Venue']].groupby('Neighborhood(Quartiers)').count()

In [77]:
df_club.reset_index(inplace=True)

In [78]:
map_paris(df_club, ['Neighborhood(Quartiers)', 'Venue'], 'Sports Clubs')

## Neighborhoods of Paris and Dylan's Candy Bars

This function is used to get a dataframe of all the venues near a point in each neighborhood.

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood(Quartiers)', 
                  'Quartier Latitude', 
                  'Quartier Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Neighborhoods of Paris

In [34]:
paris_venues = getNearbyVenues(names=df_paris_final['Neighborhood(Quartiers)'],
                                   latitudes=df_paris_final['Latitude'],
                                   longitudes=df_paris_final['Longitude'],
                                   radius=1000
                                  )

In [35]:
print(paris_venues.shape)
paris_venues.head()

(7631, 7)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Saint-Germain-l'Auxerrois,48.86065,2.33491,Musée du Louvre,48.860847,2.33644,Art Museum
1,Saint-Germain-l'Auxerrois,48.86065,2.33491,Vestige de la Forteresse du Louvre,48.861577,2.333508,Historic Site
2,Saint-Germain-l'Auxerrois,48.86065,2.33491,La Vénus de Milo (Vénus de Milo),48.859943,2.337234,Exhibit
3,Saint-Germain-l'Auxerrois,48.86065,2.33491,"Pavillon des Sessions – Arts d'Afrique, d'Asie...",48.860724,2.332121,Art Museum
4,Saint-Germain-l'Auxerrois,48.86065,2.33491,Cour Napoléon,48.861172,2.335088,Plaza


In [37]:
paris_venues.groupby('Neighborhood(Quartiers)').count()

Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Amérique,85,85,85,85,85,85
Archives,100,100,100,100,100,100
Arsenal,100,100,100,100,100,100
Arts-et-Métiers,100,100,100,100,100,100
Auteuil,72,72,72,72,72,72
Batignolles,80,80,80,80,80,80
Bel-Air,50,50,50,50,50,50
Belleville,100,100,100,100,100,100
Bercy,100,100,100,100,100,100
Bonne-Nouvelle,100,100,100,100,100,100


In [39]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood(Quartiers)'] = paris_venues['Neighborhood(Quartiers)'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_grouped = paris_onehot.groupby('Neighborhood(Quartiers)').mean().reset_index()
paris_grouped.head()

Unnamed: 0,Neighborhood(Quartiers),Accessories Store,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Vietnamese Restaurant,Vineyard,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Amérique,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Archives,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.02,...,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0
2,Arsenal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,...,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0
3,Arts-et-Métiers,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,...,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0
4,Auteuil,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,...,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0
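
Taking the mean of the one-hot columns per neighborhood gives, for each category, its share among that neighborhood's venues. The same computation on a plain list, as a sketch: `category_frequencies` is a hypothetical helper and the venue list is made up, sized to 85 venues like Amérique.

```python
from collections import Counter

def category_frequencies(categories):
    """Share of each category among a neighborhood's venues (sums to 1)."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no venues, no frequencies
    return {category: count / total for category, count in counts.items()}

# Illustrative list: 85 venues, one of which is an art museum
venues = ['Art Museum'] + ['French Restaurant'] * 10 + ['Other'] * 74
freq = category_frequencies(venues)
print(round(freq['Art Museum'], 6))
```

One art museum among 85 venues yields 1/85, which is the value 0.011765 shown for Amérique in the table.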


In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood(Quartiers)']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_venues_sorted = pd.DataFrame(columns=columns)
paris_venues_sorted['Neighborhood(Quartiers)'] = paris_grouped['Neighborhood(Quartiers)']

for ind in np.arange(paris_grouped.shape[0]):
    paris_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

paris_venues_sorted.head()

Unnamed: 0,Neighborhood(Quartiers),1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amérique,French Restaurant,Bar,Supermarket,Hotel,Japanese Restaurant,Bakery,Theater,Chinese Restaurant,Grocery Store,Metro Station
1,Archives,French Restaurant,Art Gallery,Burger Joint,Hotel,Coffee Shop,Clothing Store,Pizza Place,Pastry Shop,Cocktail Bar,Wine Bar
2,Arsenal,French Restaurant,Hotel,Plaza,Cocktail Bar,Italian Restaurant,Tapas Restaurant,Pizza Place,Bakery,Coffee Shop,Pedestrian Plaza
3,Arts-et-Métiers,Coffee Shop,Bakery,Cocktail Bar,Burger Joint,Wine Bar,Italian Restaurant,Sandwich Place,Art Museum,Restaurant,Furniture / Home Store
4,Auteuil,Tennis Court,French Restaurant,Supermarket,Sporting Goods Shop,Plaza,Flower Shop,Italian Restaurant,Restaurant,Garden,Outdoors & Recreation
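
The column labels are built with a `try`/`except` around the `indicators` list, which works for ten columns but would label an eleventh column "11th" only by accident and a twenty-first one "21th". A more general helper can be sketched as follows; `ordinal` is a hypothetical name, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighborhood(Quartiers)'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], '|', columns[4], '|', columns[10])
```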


### Neighborhoods of Dylan's Candy Bars

The stores located in airports were excluded from the study because the surroundings of such locations are very specific.

In [42]:
dylans_bar = ['1011 Third Ave, New York, NY 10065',
        '6333 West Third Street, Los Angeles, CA 90036',
        '801 Lincoln Rd, Miami Beach, FL 33139',
        '20 Hudson Yards, 4th floor, New York, NY 10001',
        '1000 8th Ave, New York, NY 10019',
        '2424 Kalakaua Avenue, Honolulu, HI 96815',
        '52 Main Street, East Hampton, NY 11937',
        '231 Hudson Street, New York, NY 10013',
        '127 S. Ocean Road, New Providence, Bahamas']

In [43]:
df_dylans = pd.DataFrame(dylans_bar, columns=['Dylans'])

In [44]:
latitude, longitude = [], []

for dylans in df_dylans['Dylans']:
    lat_lng_coords = None
    # query the ArcGIS geocoder again until it returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis(dylans)
        lat_lng_coords = g.latlng
    # latlng is a [latitude, longitude] pair
    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])
    print(dylans)

df_dylans['Latitude'] = latitude
df_dylans['Longitude'] = longitude

1011 Third Ave, New York, NY 10065
6333 West Third Street, Los Angeles, CA 90036
801 Lincoln Rd, Miami Beach, FL 33139
20 Hudson Yards, 4th floor, New York, NY 10001
1000 8th Ave, New York, NY 10019
2424 Kalakaua Avenue, Honolulu, HI 96815
52 Main Street, East Hampton, NY 11937
231 Hudson Street, New York, NY 10013
127 S. Ocean Road, New Providence, Bahamas
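
The `while` loop above keeps querying the geocoder until it answers, so an address that can never be resolved would block the notebook. A minimal sketch of a bounded retry follows. The names `geocode_with_retry`, `max_tries`, `pause` and the stand-in `fake_lookup` are hypothetical; in the notebook the lookup would presumably be something like `lambda a: geocoder.arcgis(a).latlng`.

```python
import time

def geocode_with_retry(address, lookup, max_tries=5, pause=0.0):
    """Call lookup(address) until it returns coordinates or max_tries is reached."""
    for attempt in range(max_tries):
        coords = lookup(address)
        if coords is not None:
            return coords
        time.sleep(pause)  # wait before asking the service again
    return None  # give up instead of looping forever


# Stand-in for the geocoding service: fails twice, then answers.
calls = {'n': 0}

def fake_lookup(address):
    calls['n'] += 1
    return None if calls['n'] < 3 else [40.76242, -73.965726]

coords = geocode_with_retry('1011 Third Ave, New York, NY 10065', fake_lookup)
print(coords)
```

Addresses for which the function returns `None` can then be reported and fixed by hand rather than stalling the run.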


In [45]:
df_dylans

Unnamed: 0,Dylans,Latitude,Longitude
0,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726
1,"6333 West Third Street, Los Angeles, CA 90036",34.071988,-118.360331
2,"801 Lincoln Rd, Miami Beach, FL 33139",25.790711,-80.136626
3,"20 Hudson Yards, 4th floor, New York, NY 10001",40.753502,-74.000887
4,"1000 8th Ave, New York, NY 10019",40.76733,-73.982654
5,"2424 Kalakaua Avenue, Honolulu, HI 96815",21.27599,-157.825015
6,"52 Main Street, East Hampton, NY 11937",40.962721,-72.185752
7,"231 Hudson Street, New York, NY 10013",40.72442,-74.007988
8,"127 S. Ocean Road, New Providence, Bahamas",25.031505,-77.348232


In [46]:
map_dylans = folium.Map(location=[35.071988, -115.060331], zoom_start=4)

for lat, lng, label in zip(df_dylans['Latitude'], df_dylans['Longitude'], df_dylans['Dylans']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dylans)  

map_dylans

In [47]:
dylans_venues = getNearbyVenues(names=df_dylans['Dylans'],
                                   latitudes=df_dylans['Latitude'],
                                   longitudes=df_dylans['Longitude'],
                                   radius=1000
                                  )

In [48]:
print(dylans_venues.shape)
dylans_venues.head()

(785, 7)


Unnamed: 0,Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726,Birch Coffee,40.763781,-73.966492,Coffee Shop
1,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726,JackRabbit,40.763682,-73.965032,Sporting Goods Shop
2,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726,Equinox East 63rd Street,40.764489,-73.966511,Gym
3,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726,Magnolia Bakery,40.761979,-73.966547,Bakery
4,"1011 Third Ave, New York, NY 10065",40.76242,-73.965726,The Pleasure Chest New York UES,40.761423,-73.963562,Adult Boutique


In [49]:
dylans_venues.groupby('Neighborhood(Quartiers)').count()

Neighborhood(Quartiers),Quartier Latitude,Quartier Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"1000 8th Ave, New York, NY 10019",100,100,100,100,100,100
"1011 Third Ave, New York, NY 10065",100,100,100,100,100,100
"127 S. Ocean Road, New Providence, Bahamas",16,16,16,16,16,16
"20 Hudson Yards, 4th floor, New York, NY 10001",100,100,100,100,100,100
"231 Hudson Street, New York, NY 10013",100,100,100,100,100,100
"2424 Kalakaua Avenue, Honolulu, HI 96815",100,100,100,100,100,100
"52 Main Street, East Hampton, NY 11937",69,69,69,69,69,69
"6333 West Third Street, Los Angeles, CA 90036",100,100,100,100,100,100
"801 Lincoln Rd, Miami Beach, FL 33139",100,100,100,100,100,100


In [50]:
# one hot encoding
dylans_onehot = pd.get_dummies(dylans_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dylans_onehot['Neighborhood(Quartiers)'] = dylans_venues['Neighborhood(Quartiers)'] 

# move neighborhood column to the first column
fixed_columns = [dylans_onehot.columns[-1]] + list(dylans_onehot.columns[:-1])
dylans_onehot = dylans_onehot[fixed_columns]

dylans_grouped = dylans_onehot.groupby('Neighborhood(Quartiers)').mean().reset_index()
dylans_grouped.head()

Unnamed: 0,Neighborhood(Quartiers),Accessories Store,Adult Boutique,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auditorium,...,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Volleyball Court,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,"1000 8th Ave, New York, NY 10019",0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
1,"1011 Third Ave, New York, NY 10065",0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.0,...,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.02,0.0,0.0
2,"127 S. Ocean Road, New Providence, Bahamas",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"20 Hudson Yards, 4th floor, New York, NY 10001",0.0,0.0,0.02,0.01,0.14,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"231 Hudson Street, New York, NY 10013",0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,...,0.0,0.01,0.01,0.01,0.01,0.03,0.01,0.0,0.0,0.0
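
To make the meaning of these values concrete: each number is the share of a neighborhood's venues that belong to a category, so 0.14 for Art Gallery around 20 Hudson Yards corresponds to 14 of the 100 venues returned. The toy example below reproduces the two steps (one-hot encoding, then the mean per neighborhood) on made-up data; the neighborhoods `A` and `B` and their venues are assumptions for illustration only.

```python
import pandas as pd

# Toy venue list: two hypothetical neighborhoods, A and B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Hotel', 'Hotel', 'Bakery', 'Gym', 'Bakery', 'Bakery'],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of each indicator column is the share of that category.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Because every row sums to 1, neighborhoods with very different venue counts (16 in the Bahamas, 100 in New York) remain comparable.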


In [56]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood(Quartiers)']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
dylans_venues_sorted = pd.DataFrame(columns=columns)
dylans_venues_sorted['Neighborhood(Quartiers)'] = dylans_grouped['Neighborhood(Quartiers)']

for ind in np.arange(dylans_grouped.shape[0]):
    dylans_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dylans_grouped.iloc[ind, :], num_top_venues)

dylans_venues_sorted.head()

Unnamed: 0,Neighborhood(Quartiers),1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"1000 8th Ave, New York, NY 10019",Theater,Concert Hall,Jazz Club,Performing Arts Venue,Hotel,Sandwich Place,Coffee Shop,Spa,Italian Restaurant,Wine Bar
1,"1011 Third Ave, New York, NY 10065",Hotel,Italian Restaurant,Boutique,Gym,French Restaurant,Cycle Studio,Gym / Fitness Center,Department Store,Spa,Salon / Barbershop
2,"127 S. Ocean Road, New Providence, Bahamas",Fast Food Restaurant,Department Store,Shipping Store,Ice Cream Shop,Fried Chicken Joint,Plaza,Supermarket,Pizza Place,Furniture / Home Store,Pharmacy
3,"20 Hudson Yards, 4th floor, New York, NY 10001",Art Gallery,Park,Hotel,Gym / Fitness Center,Dance Studio,Indie Theater,Lounge,Theater,Coffee Shop,Music Venue
4,"231 Hudson Street, New York, NY 10013",Italian Restaurant,Clothing Store,American Restaurant,Café,Sushi Restaurant,Coffee Shop,Gym,Wine Bar,Hotel,Men's Store
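
The ranking above relies on the `return_most_common_venues` helper defined earlier in the notebook. The sketch below is not that helper; it is a hypothetical vectorized alternative (`top_categories`, with made-up data) that shows how the most frequent categories per row can be obtained in one pass with `numpy.argsort`.

```python
import numpy as np
import pandas as pd

# Toy frequency table in the same layout as the grouped tables above.
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bakery': [0.2, 0.5],
    'Gym': [0.1, 0.0],
    'Hotel': [0.7, 0.5],
})

def top_categories(table, n):
    """Return, per row, the n category names with the highest frequency."""
    values = table.drop(columns='Neighborhood')
    # Sorting the negated values gives descending order; the stable sort
    # keeps the original column order when two frequencies are equal.
    order = np.argsort(-values.to_numpy(), axis=1, kind='stable')[:, :n]
    names = values.columns.to_numpy()[order]
    cols = ['Top {}'.format(i + 1) for i in range(n)]
    return pd.DataFrame(names, columns=cols, index=table['Neighborhood'])

top2 = top_categories(freq, 2)
print(top2)
```

Note that ties are broken by column order here, which may differ from the tie handling of the notebook's own helper.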


### Cluster Neighborhoods

In [51]:
df_clustering = pd.concat([paris_grouped.drop(columns='Neighborhood(Quartiers)'), dylans_grouped.drop(columns='Neighborhood(Quartiers)')], sort=True)
df_clustering.fillna(0, inplace=True)
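
The Paris and Dylan's tables do not share the same venue categories, which is why the concatenation is followed by `fillna(0)`. The toy example below, with made-up categories and values, shows what happens to the columns.

```python
import pandas as pd

# Two toy frequency tables with partly different categories.
paris_toy = pd.DataFrame({'Bakery': [0.6], 'Bistro': [0.4]})
usa_toy = pd.DataFrame({'Bakery': [0.3], 'Diner': [0.7]})

# concat takes the union of the columns; a category missing on one side
# becomes NaN and is then set to 0 (it was not observed there).
combined = pd.concat([paris_toy, usa_toy], sort=True, ignore_index=True).fillna(0)
print(combined)
```

Every row of the combined table therefore lives in the same feature space, which k-means requires.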

In [52]:
from sklearn.cluster import KMeans

In [53]:
# set number of clusters
kclusters = 9

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 6, 6, 2, 0, 0, 0, 0, 6, 6])
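
The number of clusters is fixed at 9 here without further justification. One common way to compare candidate values of `k` is the silhouette score. The sketch below runs on synthetic points rather than on `df_clustering`; the data, the range of `k` and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for df_clustering: three well-separated groups of points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.2 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several k and keep the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On the real venue data the scores are likely to be much closer to each other, so the result should be read as a hint rather than a rule.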

In [58]:
# add clustering labels
# the first rows of kmeans.labels_ belong to the Paris neighborhoods,
# the remaining rows to the Dylan's Candy Bar neighborhoods
n_paris = paris_grouped.shape[0]
paris_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_[:n_paris])
dylans_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_[n_paris:])

In [62]:
dylans_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood(Quartiers),1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,"1000 8th Ave, New York, NY 10019",Theater,Concert Hall,Jazz Club,Performing Arts Venue,Hotel,Sandwich Place,Coffee Shop,Spa,Italian Restaurant,Wine Bar
1,2,"1011 Third Ave, New York, NY 10065",Hotel,Italian Restaurant,Boutique,Gym,French Restaurant,Cycle Studio,Gym / Fitness Center,Department Store,Spa,Salon / Barbershop
2,5,"127 S. Ocean Road, New Providence, Bahamas",Fast Food Restaurant,Department Store,Shipping Store,Ice Cream Shop,Fried Chicken Joint,Plaza,Supermarket,Pizza Place,Furniture / Home Store,Pharmacy
3,2,"20 Hudson Yards, 4th floor, New York, NY 10001",Art Gallery,Park,Hotel,Gym / Fitness Center,Dance Studio,Indie Theater,Lounge,Theater,Coffee Shop,Music Venue
4,2,"231 Hudson Street, New York, NY 10013",Italian Restaurant,Clothing Store,American Restaurant,Café,Sushi Restaurant,Coffee Shop,Gym,Wine Bar,Hotel,Men's Store
5,1,"2424 Kalakaua Avenue, Honolulu, HI 96815",Hotel,American Restaurant,Japanese Restaurant,Steakhouse,Dessert Shop,Beach,Hawaiian Restaurant,Shopping Mall,Surf Spot,Sushi Restaurant
6,2,"52 Main Street, East Hampton, NY 11937",Italian Restaurant,Women's Store,Clothing Store,Hotel,Bakery,Bank,Coffee Shop,Pharmacy,Bookstore,Park
7,2,"6333 West Third Street, Los Angeles, CA 90036",Italian Restaurant,Coffee Shop,French Restaurant,Café,Mexican Restaurant,American Restaurant,Burger Joint,Clothing Store,Bookstore,Grocery Store
8,2,"801 Lincoln Rd, Miami Beach, FL 33139",Clothing Store,Italian Restaurant,Hotel,Bakery,Peruvian Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Pizza Place,Cuban Restaurant,Art Gallery


In [59]:
total_merged = pd.concat([paris_venues_sorted, dylans_venues_sorted])

Cluster 2 contains most of the Dylan's Candy Bar neighborhoods; the Paris neighborhoods assigned to the same cluster are listed with them below.

In [61]:
total_merged.loc[total_merged['Cluster Labels'] == 2, total_merged.columns[1:]]

Unnamed: 0,Neighborhood(Quartiers),1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Arts-et-Métiers,Coffee Shop,Bakery,Cocktail Bar,Burger Joint,Wine Bar,Italian Restaurant,Sandwich Place,Art Museum,Restaurant,Furniture / Home Store
18,Enfants-Rouges,French Restaurant,Art Gallery,Coffee Shop,Sandwich Place,Wine Bar,Italian Restaurant,Cocktail Bar,Restaurant,Bistro,Burger Joint
56,Porte-Saint-Martin,French Restaurant,Coffee Shop,Wine Bar,Italian Restaurant,Hotel,Seafood Restaurant,Pizza Place,Bar,Bakery,Vegetarian / Vegan Restaurant
72,Sainte-Avoie,French Restaurant,Burger Joint,Italian Restaurant,Art Gallery,Chinese Restaurant,Seafood Restaurant,Restaurant,Ice Cream Shop,Bookstore,Wine Bar
0,"1000 8th Ave, New York, NY 10019",Theater,Concert Hall,Jazz Club,Performing Arts Venue,Hotel,Sandwich Place,Coffee Shop,Spa,Italian Restaurant,Wine Bar
1,"1011 Third Ave, New York, NY 10065",Hotel,Italian Restaurant,Boutique,Gym,French Restaurant,Cycle Studio,Gym / Fitness Center,Department Store,Spa,Salon / Barbershop
3,"20 Hudson Yards, 4th floor, New York, NY 10001",Art Gallery,Park,Hotel,Gym / Fitness Center,Dance Studio,Indie Theater,Lounge,Theater,Coffee Shop,Music Venue
4,"231 Hudson Street, New York, NY 10013",Italian Restaurant,Clothing Store,American Restaurant,Café,Sushi Restaurant,Coffee Shop,Gym,Wine Bar,Hotel,Men's Store
6,"52 Main Street, East Hampton, NY 11937",Italian Restaurant,Women's Store,Clothing Store,Hotel,Bakery,Bank,Coffee Shop,Pharmacy,Bookstore,Park
7,"6333 West Third Street, Los Angeles, CA 90036",Italian Restaurant,Coffee Shop,French Restaurant,Café,Mexican Restaurant,American Restaurant,Burger Joint,Clothing Store,Bookstore,Grocery Store
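
Cluster membership only says whether a Paris neighborhood falls in the same group as the stores, not how close it is to them. As a complement, the neighborhoods can be ranked by their distance to the average profile of the store neighborhoods. The sketch below uses made-up profiles; `paris_toy`, `stores_toy` and the choice of Euclidean distance are assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Toy frequency profiles (rows sum to 1); all names and values are made up.
paris_toy = pd.DataFrame(
    {'Bakery': [0.5, 0.1], 'Hotel': [0.1, 0.5], 'Gym': [0.4, 0.4]},
    index=['Quartier X', 'Quartier Y'],
)
stores_toy = pd.DataFrame(
    {'Bakery': [0.1, 0.2], 'Hotel': [0.6, 0.4], 'Gym': [0.3, 0.4]},
    index=['Store 1', 'Store 2'],
)

# Average profile of the existing store neighborhoods.
reference = stores_toy.mean(axis=0)

# Euclidean distance of every Paris row to that reference profile,
# smallest distance (most similar neighborhood) first.
diff = paris_toy[reference.index] - reference
distance = np.sqrt((diff ** 2).sum(axis=1)).sort_values()
print(distance)
```
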


Map of the Paris neighborhoods that fall in the same cluster (cluster 2) as most of the Dylan's Candy Bar neighborhoods in the USA:

In [74]:
paris_venues_sorted['Dylans Cluster'] = paris_venues_sorted['Cluster Labels']==2
paris_venues_sorted['Dylans Cluster'] = paris_venues_sorted['Dylans Cluster'].astype(int)
paris_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighborhood(Quartiers),1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Dylans Cluster
0,0,Amérique,French Restaurant,Bar,Supermarket,Hotel,Japanese Restaurant,Bakery,Theater,Chinese Restaurant,Grocery Store,Metro Station,0
1,6,Archives,French Restaurant,Art Gallery,Burger Joint,Hotel,Coffee Shop,Clothing Store,Pizza Place,Pastry Shop,Cocktail Bar,Wine Bar,0
2,6,Arsenal,French Restaurant,Hotel,Plaza,Cocktail Bar,Italian Restaurant,Tapas Restaurant,Pizza Place,Bakery,Coffee Shop,Pedestrian Plaza,0
3,2,Arts-et-Métiers,Coffee Shop,Bakery,Cocktail Bar,Burger Joint,Wine Bar,Italian Restaurant,Sandwich Place,Art Museum,Restaurant,Furniture / Home Store,1
4,0,Auteuil,Tennis Court,French Restaurant,Supermarket,Sporting Goods Shop,Plaza,Flower Shop,Italian Restaurant,Restaurant,Garden,Outdoors & Recreation,0


In [78]:
map_paris = folium.Map(location=[latitude_paris, longitude_paris], zoom_start=12)

folium.Choropleth(
    geo_data=paris_geo,
    data=paris_venues_sorted,
    columns=['Neighborhood(Quartiers)', 'Dylans Cluster'],
    key_on='feature.properties.l_qu',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.5,
    legend_name='Dylans Cluster',
    reset=True
).add_to(map_paris)

for lat, lng, label in zip(df_paris_final['Latitude'], df_paris_final['Longitude'], df_paris_final['Neighborhood(Quartiers)']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  

map_paris
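
The choropleth joins the table to the GeoJSON by neighborhood name (`key_on='feature.properties.l_qu'`), so a name spelled differently in the two sources cannot be matched. A quick consistency check can be sketched as follows, assuming the GeoJSON is available as a parsed dictionary; `geo_toy` and `table_names` are made-up stand-ins for `paris_geo` and the `Neighborhood(Quartiers)` column.

```python
# Toy GeoJSON using the same property name as key_on='feature.properties.l_qu'.
geo_toy = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'l_qu': 'Arsenal'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'l_qu': 'Archives'}, 'geometry': None},
    ],
}
table_names = ['Arsenal', 'archives']  # second name differs in case

# Names present on one side only point to rows that cannot be joined.
geo_names = {f['properties']['l_qu'] for f in geo_toy['features']}
missing_in_geo = sorted(set(table_names) - geo_names)
missing_in_table = sorted(geo_names - set(table_names))
print(missing_in_geo, missing_in_table)
```

Two empty lists mean that every neighborhood in the table has a matching polygon and vice versa.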