# IBM Data Science Capstone Project - The Battle of Neighborhoods (Week 2)

### Introduction/Business Problem

Paris, the capital of France, is one of the most important and influential cities in the world. In terms of tourism, Paris is the second most visited city in Europe after London. The capital of France seems to have been designed specifically for the enjoyment of its visitors: its streets, squares, buildings, gardens and monuments beckon tourists to return, and indeed, many do.

Tourists from all over the world visit Paris every day, and they basically plan their stay in one of two ways: they book a travel agency offering standard "must-see" guided tours, or they plan it themselves with the help of websites or travel apps full of impersonal recommendations. Time is short, and time is money. How can a trip be made as personalised as possible, so that every day in Paris is put to the best possible use? Customised travel services are usually very expensive and not accessible to everyone. Moreover, they rely on the expertise of experienced human guides, whose knowledge of POIs (Points of Interest) can be out of sync with current data.

This project aims to give everyone the possibility to customise their trip to Paris based on their personal interests. With the help of data visualisation, visitors can easily decide which Parisian arrondissements to visit first, with a deeper understanding of the points of interest in each category of arrondissement.

### Data section

Geo-Coordinate Data: 

For this project, we will use the dataset of the arrondissements of Paris from opendata.paris.fr.

Points of Interest Data:

We will need data about different venues across all of Paris and connect each venue to its respective arrondissement. To gather this information, we will use the Foursquare API.

# Coding for Week 2

In [None]:
# Import libraries
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files
import pandas as pd
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level since pandas 1.0)
import requests # library to handle requests
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
# Import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
print('Libraries imported.')

In [20]:
# Download the Paris' arrondissements dataset from https://opendata.paris.fr/explore/dataset/arrondissements/table/?disjunctive.c_ar&disjunctive.c_arinsee&disjunctive.l_ar and read it into a pandas dataframe.
paris = pd.read_csv('https://raw.githubusercontent.com/BolinF77/IBM_DS_Capstone/main/paris_arrondissements.csv')
paris

|  | CAR | NAME | NSQAR | CAR.1 | CARINSEE | LAR | NSQCO | SURFACE | PERIMETRE | Geometry_X | Geometry_Y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Temple | 750000003 | 3 | 3 | 3eme Ardt | 750001537 | 1170882828 | 4519264 | 48.862872 | 2.360001 |
| 1 | 19 | Buttes-Chaumont | 750000019 | 19 | 19 | 19eme Ardt | 750001537 | 6792651129 | 11253182 | 48.887076 | 2.384821 |
| 2 | 14 | Observatoire | 750000014 | 14 | 14 | 14eme Ardt | 750001537 | 5614877309 | 10317483 | 48.829245 | 2.326542 |
| 3 | 10 | Entrepot | 750000010 | 10 | 10 | 10eme Ardt | 750001537 | 2891739442 | 6739375 | 48.87613 | 2.360728 |
| 4 | 12 | Reuilly | 750000012 | 12 | 12 | 12eme Ardt | 750001537 | 16314782637 | 24089666 | 48.834974 | 2.421325 |
| 5 | 16 | Passy | 750000016 | 16 | 16 | 16eme Ardt | 750001537 | 16372542129 | 17416110 | 48.860392 | 2.261971 |
| 6 | 11 | Popincourt | 750000011 | 11 | 11 | 11eme Ardt | 750001537 | 3665441552 | 8282012 | 48.859059 | 2.380058 |
| 7 | 2 | Bourse | 750000002 | 2 | 2 | 2eme Ardt | 750001537 | 991153745 | 4554104 | 48.868279 | 2.342803 |
| 8 | 4 | Hotel-de-Ville | 750000004 | 4 | 4 | 4eme Ardt | 750001537 | 1600585632 | 5420908 | 48.854341 | 2.35763 |
| 9 | 17 | Batignolles-Monceau | 750000017 | 17 | 17 | 17eme Ardt | 750001537 | 5668834504 | 10775580 | 48.887327 | 2.306777 |


In [21]:
# Data Wrangling
# Rename the necessary columns and remove unnecessary columns
paris.rename(columns={'NAME': 'Neighborhood', 'CAR': 'Arrondissement', 'Geometry_X': 'Latitude', 'Geometry_Y': 'Longitude', 'LAR': 'French_Name'}, inplace=True)
paris.drop(['NSQAR','CAR.1','CARINSEE','NSQCO','SURFACE', 'PERIMETRE' ], axis=1, inplace=True)
paris

|  | Arrondissement | Neighborhood | French_Name | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 3 | Temple | 3eme Ardt | 48.862872 | 2.360001 |
| 1 | 19 | Buttes-Chaumont | 19eme Ardt | 48.887076 | 2.384821 |
| 2 | 14 | Observatoire | 14eme Ardt | 48.829245 | 2.326542 |
| 3 | 10 | Entrepot | 10eme Ardt | 48.87613 | 2.360728 |
| 4 | 12 | Reuilly | 12eme Ardt | 48.834974 | 2.421325 |
| 5 | 16 | Passy | 16eme Ardt | 48.860392 | 2.261971 |
| 6 | 11 | Popincourt | 11eme Ardt | 48.859059 | 2.380058 |
| 7 | 2 | Bourse | 2eme Ardt | 48.868279 | 2.342803 |
| 8 | 4 | Hotel-de-Ville | 4eme Ardt | 48.854341 | 2.35763 |
| 9 | 17 | Batignolles-Monceau | 17eme Ardt | 48.887327 | 2.306777 |


In [22]:
paris_df=paris
paris_df.head()

|  | Arrondissement | Neighborhood | French_Name | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 3 | Temple | 3eme Ardt | 48.862872 | 2.360001 |
| 1 | 19 | Buttes-Chaumont | 19eme Ardt | 48.887076 | 2.384821 |
| 2 | 14 | Observatoire | 14eme Ardt | 48.829245 | 2.326542 |
| 3 | 10 | Entrepot | 10eme Ardt | 48.87613 | 2.360728 |
| 4 | 12 | Reuilly | 12eme Ardt | 48.834974 | 2.421325 |


Get the latitude and longitude values of Paris with the help of the geopy library.

In [23]:
# Retrieve the Latitude and Longitude for Paris
from geopy.geocoders import Nominatim 
address = 'Paris'
# Define the user_agent as Paris_explorer
geolocator = Nominatim(user_agent="Paris_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Paris France are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Paris France are 48.8566969, 2.3514616.


In [24]:
# create map of Paris using the above latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, label in zip(paris['Latitude'], paris['Longitude'], paris['French_Name']):
    popup = folium.Popup(label, parse_html=True)
    # CircleMarker takes no parse_html argument; the popup handles the label
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#e8dc54',
        fill_opacity=0.5).add_to(map_paris)

map_paris

Use the Foursquare API to explore the arrondissements of Paris

In [25]:
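CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare client ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare client secret
VERSION = '20210605' # API version date
radius = 500 # search radius in metres
LIMIT = 100 # maximum number of venues returned per request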
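As an alternative to formatting the query string by hand, the explore request could be built from a parameter dictionary with the standard library's `urllib.parse.urlencode`, which also takes care of URL-escaping the values. The sketch below only builds the URL and does not call the API; `build_explore_url` is a hypothetical helper, and the endpoint and parameter names are simply the ones used in this notebook.

```python
from urllib.parse import urlencode

# Endpoint used in this notebook (Foursquare v2 venues/explore)
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: assemble the explore URL from a parameter dict."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',  # "lat,lng" as one value; urlencode escapes the comma
        'radius': radius,
        'limit': limit,
    }
    return f'{EXPLORE_ENDPOINT}?{urlencode(params)}'

# Placeholder credentials; the coordinates are the 3eme Ardt centre
url = build_explore_url('MY_ID', 'MY_SECRET', '20210605', 48.862872, 2.360001)
print(url)
```

With `requests`, the same dictionary could instead be handed to `requests.get` through its `params` argument, letting the library do the encoding.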

In [26]:
# Explore the first Neighborhood in our dataframe.
# Get the Neighborhood's French name.
paris_df.loc[0, 'French_Name']

'3eme Ardt'

In [27]:
# Get the Neighborhood's latitude and longitude values.
neighborhood_latitude = paris_df.loc[0, 'Latitude'] # Neighborhood latitude value
neighborhood_longitude = paris_df.loc[0, 'Longitude'] # Neighborhood longitude value
neighborhood_name = paris_df.loc[0, 'French_Name'] # Neighborhood name
print('Latitude and longitude values of the neighborhood {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of the neighborhood 3eme Ardt are 48.86287238, 2.360000986.


In [28]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60ef11b47d35353ca82aeebe'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Enfants-Rouges',
  'headerFullLocation': 'Enfants-Rouges, Paris',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 109,
  'suggestedBounds': {'ne': {'lat': 48.8673723845, 'lng': 2.366828546806527},
   'sw': {'lat': 48.8583723755, 'lng': 2.3531734251934733}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d974096a2c654814aa6d353',
       'name': 'Mmmozza',
       'location': {'address': '57 rue de Bretagne',
        'lat': 48.86391016055883,
        'lng': 2.360590696334839,
        'labeledLatLngs': [{'label': 'display',
          'lat': 48.863910160558

In [29]:
# define a function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [30]:
# Structure the JSON response into a pandas dataframe
venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
# keep only the first category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head(30)



|  | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Mmmozza | Sandwich Place | 48.86391 | 2.360591 |
| 1 | Chez Alain Miam Miam | Sandwich Place | 48.862369 | 2.36195 |
| 2 | Marché des Enfants Rouges | Farmers Market | 48.862806 | 2.361996 |
| 3 | Chez Taeko | Japanese Restaurant | 48.862734 | 2.362136 |
| 4 | Square du Temple | Park | 48.864475 | 2.360816 |
| 5 | Les Enfants Du Marché | French Restaurant | 48.862746 | 2.36195 |
| 6 | Les Enfants Rouges | Wine Bar | 48.863013 | 2.36126 |
| 7 | Okomusu | Okonomiyaki Restaurant | 48.861453 | 2.360879 |
| 8 | Chez Alain Miam Miam | Sandwich Place | 48.862781 | 2.362064 |
| 9 | Le Burger Fermier des Enfants Rouges | Burger Joint | 48.862831 | 2.362073 |
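To see what `json_normalize` does with the explore payload, here is the same flattening applied to a small hand-made list that mimics `results['response']['groups'][0]['items']`. The mock venues are hypothetical and contain only the fields this notebook reads. Nested dictionaries become dotted column names such as `venue.location.lat`, while the `categories` list stays in a single cell, which is why it still has to be reduced to its first name. Since pandas 1.0 the function is available at the top level as `pd.json_normalize`.

```python
import pandas as pd

# Hypothetical stand-in for results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Mmmozza',
               'location': {'lat': 48.86391, 'lng': 2.360591},
               'categories': [{'name': 'Sandwich Place'}]}},
    {'venue': {'name': 'Square du Temple',
               'location': {'lat': 48.864475, 'lng': 2.360816},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 48.8630, 'lng': 2.3610},
               'categories': []}},  # a venue without any category
]

# Nested dicts are flattened into 'venue.name', 'venue.location.lat', ...
df = pd.json_normalize(items)
df = df[['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']].copy()

# The categories cell is still a list: keep the first name, or None if empty
df['venue.categories'] = df['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Keep only the last part of each dotted column name
df.columns = [col.split('.')[-1] for col in df.columns]
print(df)
```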


In [31]:
# Check how many venues there are in 3eme Ardt within a radius of 500 meters
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.
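The 500 m radius can be sanity-checked offline by computing the great-circle distance between an arrondissement's centre and each returned venue. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, which is an approximation; geopy's `geopy.distance.geodesic` would give a more precise ellipsoidal distance. The coordinates are the 3eme Ardt centre and the first venue (Mmmozza) from the table above; `haversine_m` is a hypothetical helper name.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# 3eme Ardt centre -> Mmmozza
d = haversine_m(48.862872, 2.360001, 48.86391, 2.360591)
print(f'{d:.0f} m')  # well inside the 500 m search radius
```

Applied row by row to `nearby_venues` (columns `lat` and `lng`), it gives the distance of every returned venue from the search centre.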


Create a function that fetches the nearby venues for all the neighborhoods in Paris

In [32]:
def getNearbyVenues(name, latitudes, longitudes, radius=500):
    venues_list = []
    # use a separate loop variable so the `name` argument is not shadowed
    for hood_name, lat, lng in zip(name, latitudes, longitudes):
        print(hood_name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue;
        # venues without a category are skipped to avoid an IndexError
        venues_list.append([(
            hood_name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']) for v in results if v['venue'].get('categories')])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['French_Name', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return nearby_venues
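Because `getNearbyVenues` mixes the HTTP call with the parsing, it can only be exercised with valid credentials. One way to make the parsing testable is to pass the request in as a function. In the sketch below, `venues_to_rows`, `get_nearby_venues_offline` and `fake_fetch` are hypothetical names, and the fake fetcher returns made-up venues shaped like the explore `items` list. The sketch also skips venues whose `categories` list is empty instead of raising an `IndexError`.

```python
import pandas as pd

COLUMNS = ['French_Name', 'Latitude', 'Longitude', 'Venue',
           'Venue Latitude', 'Venue Longitude', 'Venue Category']

def venues_to_rows(hood_name, lat, lng, items):
    """Turn one explore 'items' list into row tuples; venues without a category are skipped."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue
        rows.append((hood_name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows

def get_nearby_venues_offline(names, latitudes, longitudes, fetch_items):
    """Same flow as getNearbyVenues, but the request is injected as fetch_items(lat, lng)."""
    all_rows = []
    for hood_name, lat, lng in zip(names, latitudes, longitudes):
        all_rows.extend(venues_to_rows(hood_name, lat, lng, fetch_items(lat, lng)))
    return pd.DataFrame(all_rows, columns=COLUMNS)

def fake_fetch(lat, lng):
    """Hypothetical stand-in for the Foursquare request."""
    return [
        {'venue': {'name': 'Cafe A', 'location': {'lat': lat + 0.001, 'lng': lng},
                   'categories': [{'name': 'Café'}]}},
        {'venue': {'name': 'No category', 'location': {'lat': lat, 'lng': lng},
                   'categories': []}},
    ]

df = get_nearby_venues_offline(['3eme Ardt', '19eme Ardt'],
                               [48.862872, 48.887076],
                               [2.360001, 2.384821],
                               fake_fetch)
print(df)
```

In the notebook, `fetch_items` could wrap the real call, i.e. a function that builds the explore URL for `(lat, lng)` and returns `requests.get(url).json()['response']['groups'][0]['items']`.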

Create a new dataframe called paris_venues that holds the venues of every arrondissement

In [34]:
paris_venues = getNearbyVenues(name=paris['French_Name'],
                                   latitudes=paris['Latitude'],
                                   longitudes=paris['Longitude']
                                  )
paris_venues.head(60)

3eme Ardt
19eme Ardt
14eme Ardt
10eme Ardt
12eme Ardt
16eme Ardt
11eme Ardt
2eme Ardt
4eme Ardt
17eme Ardt
18eme Ardt
1er Ardt
5eme Ardt
7eme Ardt
20eme Ardt
8eme Ardt
9eme Ardt
13eme Ardt
15eme Ardt
6eme Ardt


|  | French_Name | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 3eme Ardt | 48.862872 | 2.360001 | Mmmozza | 48.86391 | 2.360591 | Sandwich Place |
| 1 | 3eme Ardt | 48.862872 | 2.360001 | Chez Alain Miam Miam | 48.862369 | 2.36195 | Sandwich Place |
| 2 | 3eme Ardt | 48.862872 | 2.360001 | Marché des Enfants Rouges | 48.862806 | 2.361996 | Farmers Market |
| 3 | 3eme Ardt | 48.862872 | 2.360001 | Chez Taeko | 48.862734 | 2.362136 | Japanese Restaurant |
| 4 | 3eme Ardt | 48.862872 | 2.360001 | Square du Temple | 48.864475 | 2.360816 | Park |
| 5 | 3eme Ardt | 48.862872 | 2.360001 | Les Enfants Du Marché | 48.862746 | 2.36195 | French Restaurant |
| 6 | 3eme Ardt | 48.862872 | 2.360001 | Les Enfants Rouges | 48.863013 | 2.36126 | Wine Bar |
| 7 | 3eme Ardt | 48.862872 | 2.360001 | Okomusu | 48.861453 | 2.360879 | Okonomiyaki Restaurant |
| 8 | 3eme Ardt | 48.862872 | 2.360001 | Chez Alain Miam Miam | 48.862781 | 2.362064 | Sandwich Place |
| 9 | 3eme Ardt | 48.862872 | 2.360001 | Le Burger Fermier des Enfants Rouges | 48.862831 | 2.362073 | Burger Joint |


In [35]:
paris_venues.shape

(1300, 7)

Check how many venues were returned for each neighborhood

In [36]:
paris_venues.groupby('French_Name').count()

| French_Name | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| 10eme Ardt | 100 | 100 | 100 | 100 | 100 | 100 |
| 11eme Ardt | 69 | 69 | 69 | 69 | 69 | 69 |
| 12eme Ardt | 4 | 4 | 4 | 4 | 4 | 4 |
| 13eme Ardt | 60 | 60 | 60 | 60 | 60 | 60 |
| 14eme Ardt | 24 | 24 | 24 | 24 | 24 | 24 |
| 15eme Ardt | 60 | 60 | 60 | 60 | 60 | 60 |
| 16eme Ardt | 10 | 10 | 10 | 10 | 10 | 10 |
| 17eme Ardt | 57 | 57 | 57 | 57 | 57 | 57 |
| 18eme Ardt | 46 | 46 | 46 | 46 | 46 | 46 |
| 19eme Ardt | 44 | 44 | 44 | 44 | 44 | 44 |
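These counts are not directly comparable across arrondissements: `LIMIT = 100` caps each request, so an arrondissement that reaches exactly 100 venues (the 10eme Ardt here) may have been truncated. The first request already showed this, with `totalResults` at 109 while only 100 venues came back. A minimal sketch that flags such arrondissements, shown on a hypothetical toy frame with the same `French_Name` column; `capped_neighborhoods` is a hypothetical helper.

```python
import pandas as pd

LIMIT = 100  # the same cap passed to the explore endpoint

def capped_neighborhoods(venues_df, limit=LIMIT):
    """Return the neighborhoods whose venue count reached the request limit."""
    counts = venues_df.groupby('French_Name').size()
    return counts[counts >= limit].index.tolist()

# Toy frame: 'A' hit the cap, 'B' did not (hypothetical data)
toy = pd.DataFrame({'French_Name': ['A'] * 100 + ['B'] * 4})
print(capped_neighborhoods(toy))
```

On the notebook's data, `capped_neighborhoods(paris_venues)` would list the arrondissements for which a second, smaller-radius query might be worth running.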


In [37]:
# Calculate how many unique categories there are.
print('There are {} unique venue categories.'.format(len(paris_venues['Venue Category'].unique())))

There are 199 unique venue categories.


In [38]:
# Analyze each of the Neighborhoods from the results
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['French_Name'] 
# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]
paris_onehot

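|  | Neighborhood | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Udon Restaurant | University | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1295 | 6eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1296 | 6eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1297 | 6eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1298 | 6eme Ardt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |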


In [39]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped

|  | Neighborhood | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Udon Restaurant | University | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Wine Bar | Wine Shop | Women's Store | Zoo | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10eme Ardt | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | ... | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.02 | 0.02 | 0.0 | 0.0 | 0.0 |
| 1 | 11eme Ardt | 0.014493 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.014493 | 0.0 | 0.028986 | ... | 0.0 | 0.0 | 0.014493 | 0.0 | 0.028986 | 0.028986 | 0.0 | 0.014493 | 0.0 | 0.0 |
| 2 | 12eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.25 |
| 3 | 13eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.183333 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.233333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 14eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 15eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.016667 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 16eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 17eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.017544 | 0.0 | 0.017544 | ... | 0.0 | 0.017544 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 18eme Ardt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.043478 | 0.021739 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 19eme Ardt | 0.0 | 0.0 | 0.022727 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.022727 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
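Each row of `paris_grouped` is the share of each venue category among the venues returned for that arrondissement, so every row sums to 1. The toy example below (hypothetical data, same column names as `paris_venues`) reproduces the one-hot encoding and the `groupby(...).mean()` step on six venues so the numbers can be checked by hand: neighborhood A has 2 cafés out of 4 venues, giving 0.5. It uses `DataFrame.insert` to put the `Neighborhood` column first, which is equivalent to the column reordering in the one-hot cell.

```python
import pandas as pd

# Toy venues table (hypothetical data) with the same column names as paris_venues
venues = pd.DataFrame({
    'French_Name': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Bar', 'Park', 'Bar', 'Bar'],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['French_Name'])

# Mean of the 0/1 columns = share of each category per neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```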


Display each neighborhood with its top 10 most common venues

In [40]:
num_top_venues = 10
for hood in paris_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = paris_grouped[paris_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----10eme Ardt----
                venue  freq
0   French Restaurant  0.12
1         Coffee Shop  0.05
2              Bistro  0.05
3   Indian Restaurant  0.04
4                Café  0.04
5               Hotel  0.04
6         Pizza Place  0.03
7  Italian Restaurant  0.03
8    Asian Restaurant  0.02
9  Seafood Restaurant  0.02


----11eme Ardt----
                   venue  freq
0      French Restaurant  0.12
1            Supermarket  0.06
2             Restaurant  0.04
3                   Café  0.04
4                 Bakery  0.04
5            Pastry Shop  0.04
6     Italian Restaurant  0.04
7            Pizza Place  0.03
8               Wine Bar  0.03
9  Vietnamese Restaurant  0.03


----12eme Ardt----
                    venue  freq
0             Zoo Exhibit  0.25
1                     Zoo  0.25
2     Monument / Landmark  0.25
3             Supermarket  0.25
4   Performing Arts Venue  0.00
5               Nightclub  0.00
6            Noodle House  0.00
7  Okonomiyaki Restaurant  0.00
8 

Put that data into a pandas dataframe, with the venues of each neighborhood sorted in descending order of frequency

In [41]:
# First sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False) 
    return row_categories_sorted.index.values[0:num_top_venues]

In [42]:
# create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
paris_venues_sorted = pd.DataFrame(columns=columns)
paris_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']
for ind in np.arange(paris_grouped.shape[0]):
    paris_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)
paris_venues_sorted.head(30)

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10eme Ardt | French Restaurant | Coffee Shop | Bistro | Indian Restaurant | Café | Hotel | Pizza Place | Italian Restaurant | Asian Restaurant | Seafood Restaurant |
| 1 | 11eme Ardt | French Restaurant | Supermarket | Restaurant | Café | Bakery | Pastry Shop | Italian Restaurant | Pizza Place | Wine Bar | Vietnamese Restaurant |
| 2 | 12eme Ardt | Zoo Exhibit | Zoo | Monument / Landmark | Supermarket | Performing Arts Venue | Nightclub | Noodle House | Okonomiyaki Restaurant | Optical Shop | Outdoor Sculpture |
| 3 | 13eme Ardt | Vietnamese Restaurant | Asian Restaurant | Chinese Restaurant | Thai Restaurant | French Restaurant | Juice Bar | Gourmet Shop | Creperie | Butcher | Bus Stop |
| 4 | 14eme Ardt | French Restaurant | Food & Drink Shop | Hotel | Supermarket | Pizza Place | Bistro | Tea Room | Bakery | Brasserie | Fast Food Restaurant |
| 5 | 15eme Ardt | Hotel | Italian Restaurant | French Restaurant | Coffee Shop | Bistro | Thai Restaurant | Brasserie | Supermarket | Indian Restaurant | Bakery |
| 6 | 16eme Ardt | Lake | Plaza | Bus Station | Bus Stop | Art Museum | French Restaurant | Park | Boat or Ferry | Afghan Restaurant | Perfume Shop |
| 7 | 17eme Ardt | French Restaurant | Hotel | Italian Restaurant | Bistro | Bakery | Japanese Restaurant | Café | Plaza | Restaurant | Asian Restaurant |
| 8 | 18eme Ardt | French Restaurant | Bar | Café | Supermarket | Convenience Store | Coffee Shop | Restaurant | Vietnamese Restaurant | Sandwich Place | Beer Store |
| 9 | 19eme Ardt | French Restaurant | Bar | Café | Seafood Restaurant | Beer Bar | Hotel | Supermarket | Bistro | Brewery | Creperie |
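The same ranking can be expressed with `Series.nlargest`, which keeps the highest-frequency categories of each row. A caveat applies to both versions: once an arrondissement runs out of venues, the remaining "most common" slots are filled with zero-frequency categories. This is the case for the 12eme Ardt, which returned only 4 venues, so its 5th to 10th venues (Performing Arts Venue, Nightclub, and so on) carry no information. `top_categories` is a hypothetical helper and the frequencies below are made up; with `nlargest`'s default `keep='first'`, ties are resolved by column order.

```python
import pandas as pd

def top_categories(grouped, n=10, key='Neighborhood'):
    """Map each neighborhood to its n highest-frequency venue categories."""
    freqs = grouped.set_index(key)
    return {hood: row.nlargest(n).index.tolist() for hood, row in freqs.iterrows()}

# Hypothetical frequency table shaped like paris_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.3, 0.6],
    'Café': [0.5, 0.1],
    'Park': [0.2, 0.3],
})
top2 = top_categories(grouped, n=2)
print(top2)
```

Filtering each row with `row[row > 0]` before calling `nlargest` would drop the empty slots instead of filling them with arbitrary zero-frequency categories.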


This table of the top 10 venues per arrondissement already gives tourists deeper insight into what each area offers.

Next, let's cluster the data.

In [45]:
# set number of clusters
kclusters = 9
clustering_grouped_paris = paris_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clustering_grouped_paris)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([0, 0, 1, 5, 3, 7, 4, 2, 6, 6, 0, 8, 0, 0, 0, 0, 0, 2, 2, 2])
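With 20 arrondissements and `kclusters = 9`, several clusters end up holding a single arrondissement, as the cluster tables below show. One common heuristic for choosing k, which the original analysis does not use, is to compare silhouette scores over a range of k. The sketch below runs it on synthetic data from scikit-learn's `make_blobs` with three well-separated groups, where the best score should land on k = 3; `best_k_by_silhouette` is a hypothetical helper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values, random_state=0):
    """Fit k-means for each k and return (best_k, {k: silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in for paris_grouped: three tight, well-separated groups
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
best_k, scores = best_k_by_silhouette(X, range(2, 7))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On the notebook's data this would be called as `best_k_by_silhouette(paris_grouped.drop(columns='Neighborhood'), range(2, 10))`. The silhouette score needs at least 2 clusters and fewer clusters than samples, and with only 20 points it should be read as a rough guide rather than a definitive answer.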

In [46]:
# add clustering labels
paris_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [47]:
#new dataframe
paris_merged = paris_df
paris_merged = paris_merged.join(paris_venues_sorted.set_index("Neighborhood"), on = "French_Name")
paris_merged.head()

|  | Arrondissement | Neighborhood | French_Name | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Temple | 3eme Ardt | 48.862872 | 2.360001 | 0 | French Restaurant | Art Gallery | Cocktail Bar | Coffee Shop | Burger Joint | Wine Bar | Bakery | Bistro | Italian Restaurant | Sandwich Place |
| 1 | 19 | Buttes-Chaumont | 19eme Ardt | 48.887076 | 2.384821 | 6 | French Restaurant | Bar | Café | Seafood Restaurant | Beer Bar | Hotel | Supermarket | Bistro | Brewery | Creperie |
| 2 | 14 | Observatoire | 14eme Ardt | 48.829245 | 2.326542 | 3 | French Restaurant | Food & Drink Shop | Hotel | Supermarket | Pizza Place | Bistro | Tea Room | Bakery | Brasserie | Fast Food Restaurant |
| 3 | 10 | Entrepot | 10eme Ardt | 48.87613 | 2.360728 | 0 | French Restaurant | Coffee Shop | Bistro | Indian Restaurant | Café | Hotel | Pizza Place | Italian Restaurant | Asian Restaurant | Seafood Restaurant |
| 4 | 12 | Reuilly | 12eme Ardt | 48.834974 | 2.421325 | 1 | Zoo Exhibit | Zoo | Monument / Landmark | Supermarket | Performing Arts Venue | Nightclub | Noodle House | Okonomiyaki Restaurant | Optical Shop | Outdoor Sculpture |


####  Visualize the results

In [48]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(paris_merged['Latitude'], paris_merged['Longitude'], paris_merged['French_Name'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
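The colour logic can be isolated in a small helper that returns one hex colour per cluster label, spread evenly over matplotlib's `rainbow` colormap, so that a label can index the list directly (label 0 gets the first colour, label 8 the last). `cluster_palette` is a hypothetical name.

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

def cluster_palette(n_clusters):
    """One hex colour per cluster label 0..n_clusters-1, evenly spaced on the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))  # (n_clusters, 4) RGBA array
    return [colors.rgb2hex(c) for c in rgba]          # alpha is dropped by default

palette = cluster_palette(9)
# Index directly with the cluster label
print(palette[0], palette[8])
```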

Evaluate Each Cluster

In [56]:
paris_merged.loc[paris_merged["Cluster Labels"] == 0, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Temple | 0 | French Restaurant | Art Gallery | Cocktail Bar | Coffee Shop | Burger Joint | Wine Bar | Bakery | Bistro | Italian Restaurant | Sandwich Place |
| 3 | 10 | Entrepot | 0 | French Restaurant | Coffee Shop | Bistro | Indian Restaurant | Café | Hotel | Pizza Place | Italian Restaurant | Asian Restaurant | Seafood Restaurant |
| 6 | 11 | Popincourt | 0 | French Restaurant | Supermarket | Restaurant | Café | Bakery | Pastry Shop | Italian Restaurant | Pizza Place | Wine Bar | Vietnamese Restaurant |
| 7 | 2 | Bourse | 0 | French Restaurant | Cocktail Bar | Wine Bar | Bakery | Plaza | Hotel | Japanese Restaurant | Pedestrian Plaza | Burger Joint | Salad Place |
| 8 | 4 | Hotel-de-Ville | 0 | French Restaurant | Ice Cream Shop | Clothing Store | Pastry Shop | Hotel | Italian Restaurant | Plaza | Pedestrian Plaza | Cocktail Bar | Thai Restaurant |
| 11 | 1 | Louvre | 0 | French Restaurant | Japanese Restaurant | Plaza | Hotel | Coffee Shop | Café | Italian Restaurant | Art Museum | Ramen Restaurant | Bistro |
| 12 | 5 | Pantheon | 0 | French Restaurant | Italian Restaurant | Science Museum | Hotel | Plaza | Bakery | Café | Greek Restaurant | Coffee Shop | Bar |
| 19 | 6 | Luxembourg | 0 | French Restaurant | Bakery | Italian Restaurant | Cocktail Bar | Plaza | Ice Cream Shop | Bookstore | Seafood Restaurant | Fountain | Tailor Shop |


In [57]:
paris_merged.loc[paris_merged["Cluster Labels"] == 1, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 12 | Reuilly | 1 | Zoo Exhibit | Zoo | Monument / Landmark | Supermarket | Performing Arts Venue | Nightclub | Noodle House | Okonomiyaki Restaurant | Optical Shop | Outdoor Sculpture |


In [58]:
paris_merged.loc[paris_merged["Cluster Labels"] == 2, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 17 | Batignolles-Monceau | 2 | French Restaurant | Hotel | Italian Restaurant | Bistro | Bakery | Japanese Restaurant | Café | Plaza | Restaurant | Asian Restaurant |
| 13 | 7 | Palais-Bourbon | 2 | Hotel | French Restaurant | Italian Restaurant | Plaza | Café | Cocktail Bar | History Museum | Coffee Shop | Gourmet Shop | Bistro |
| 15 | 8 | elysee | 2 | French Restaurant | Hotel | Spa | Art Gallery | Cocktail Bar | Bakery | Bar | Hotel Bar | Furniture / Home Store | Park |
| 16 | 9 | Opera | 2 | French Restaurant | Hotel | Bistro | Cocktail Bar | Bakery | Wine Bar | Restaurant | Lounge | Japanese Restaurant | Gym / Fitness Center |


In [59]:
paris_merged.loc[paris_merged["Cluster Labels"] == 3, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 14 | Observatoire | 3 | French Restaurant | Food & Drink Shop | Hotel | Supermarket | Pizza Place | Bistro | Tea Room | Bakery | Brasserie | Fast Food Restaurant |


In [55]:
paris_merged.loc[paris_merged["Cluster Labels"] == 4, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 16 | Passy | 4 | Lake | Plaza | Bus Station | Bus Stop | Art Museum | French Restaurant | Park | Boat or Ferry | Afghan Restaurant | Perfume Shop |


In [60]:
paris_merged.loc[paris_merged["Cluster Labels"] == 5, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | 13 | Gobelins | 5 | Vietnamese Restaurant | Asian Restaurant | Chinese Restaurant | Thai Restaurant | French Restaurant | Juice Bar | Gourmet Shop | Creperie | Butcher | Bus Stop |


In [61]:
paris_merged.loc[paris_merged["Cluster Labels"] == 6, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 19 | Buttes-Chaumont | 6 | French Restaurant | Bar | Café | Seafood Restaurant | Beer Bar | Hotel | Supermarket | Bistro | Brewery | Creperie |
| 10 | 18 | Buttes-Montmartre | 6 | French Restaurant | Bar | Café | Supermarket | Convenience Store | Coffee Shop | Restaurant | Vietnamese Restaurant | Sandwich Place | Beer Store |


In [62]:
paris_merged.loc[paris_merged["Cluster Labels"] == 7, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 15 | Vaugirard | 7 | Hotel | Italian Restaurant | French Restaurant | Coffee Shop | Bistro | Thai Restaurant | Brasserie | Supermarket | Indian Restaurant | Bakery |


In [63]:
paris_merged.loc[paris_merged["Cluster Labels"] == 8, 
                    paris_merged.columns[[0] + [1] + list(range(5, paris_merged.shape[1]))]]

|  | Arrondissement | Neighborhood | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | 20 | Menilmontant | 8 | Bakery | Japanese Restaurant | French Restaurant | Bar | Plaza | Italian Restaurant | Café | Park | Bistro | Pizza Place |
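Rather than one cell per label, a single grouped summary can list the members of every cluster at once and make the cluster sizes easy to compare. `cluster_members` is a hypothetical helper; the toy rows below reuse a few arrondissements and their labels from the tables above.

```python
import pandas as pd

def cluster_members(merged, label_col='Cluster Labels', name_col='French_Name'):
    """Map each cluster label to the sorted list of arrondissements it contains."""
    return {label: sorted(group) for label, group in merged.groupby(label_col)[name_col]}

# A few rows from the cluster tables above
toy = pd.DataFrame({
    'French_Name': ['3eme Ardt', '10eme Ardt', '12eme Ardt', '16eme Ardt', '2eme Ardt'],
    'Cluster Labels': [0, 0, 1, 4, 0],
})
members = cluster_members(toy)
print(members)
```

Called as `cluster_members(paris_merged)`, it would return all twenty arrondissements grouped by cluster label.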


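According to the results of the k-means clustering, 2 significant clusters stand out: cluster 0, which groups 8 arrondissements (the 1st to 6th, 10th and 11th), and cluster 2, which groups 4 arrondissements (the 7th, 8th, 9th and 17th). Cluster 6 contains 2 arrondissements (the 18th and 19th), and each of the remaining clusters contains a single arrondissement.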