<center><font size=5>Clustering the Neighbourhoods of London and Paris </font></center>

# Problem statement and project background

**Paris and London, the capitals of France and the United Kingdom, are among the most prestigious tourist cities in Europe, and they are very similar in economy and culture. Large international companies opening a new European office often shortlist both cities, and choosing between them is difficult. This project analyses the neighbourhoods of London and Paris separately to build a picture of what each city looks like.**

London

In [19]:
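from IPython.display import Image  # needed here because this cell comes before the Paris cell that imports it
Image(url= "https://london.ac.uk/sites/default/files/styles/promo_large/public/2018-10/london-aerial-cityscape-river-thames_1.jpg",width=400)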

Paris

In [18]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://images.prismic.io/figaroimmo%2F99439d29-f927-483b-9667-d280eaf7d061_shutterstock_1420728554-compressor.jpg",width=400)

# Data Description

### London  

The data about London areas is available from Wikipedia: https://en.wikipedia.org/wiki/List_of_areas_of_London.   
From this page we keep three columns:

1. London borough : Name of the borough (renamed `Borough` below)
2. Post town : Post town of the area (renamed `Neighbourhood` below)
3. Postcode district : Postal codes for London (renamed `Post_code` below)


### Paris
To derive our solution, we use the JSON data available at https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e

The JSON file describes French communes; every row shown in this notebook belongs to the Île-de-France region.

1. postal_code : Postal code of the commune
2. nom_comm : Name of the commune (used here as the neighbourhood)
3. nom_dept : Name of the department (used here as the borough)
4. geo_point_2d : Pair `[latitude, longitude]` giving the location of the commune

# Libraries

In [35]:
import pandas as pd
import requests
import numpy as np
import geopandas as gpd
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import warnings
warnings.filterwarnings("ignore")

# London Data

## Get london data

In [41]:
url_grand_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_grand_london_url = requests.get(url_grand_london)

wiki_grand_london_data = pd.read_html(wiki_grand_london_url.text)

grand_london_wiki_df = wiki_grand_london_data[1]
grand_london_wiki_df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


## Select columns London borough/Post town/Postcode district

In [43]:
grand_london_wiki_df.columns

Index(['Location', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')

In [44]:
grand_london_df = grand_london_wiki_df.iloc[:,[1,2,3]].copy()  # copy so later column assignments do not write into a slice
grand_london_df.columns = ['Borough','Neighbourhood','Post_code']
grand_london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


Remove the footnote markers (for example `[7]`) from the Borough column

In [45]:
grand_london_df['Borough'] = grand_london_df['Borough'].map(lambda x: x.split('[')[0].strip())
grand_london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
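
The cleanup above cuts each value at the first `[`. As a minimal sketch of an alternative, assuming the markers always look like `[7]`, a regular expression can remove them wherever they occur. `strip_footnote` is a hypothetical helper, not part of the notebook.

```python
import re

# Hypothetical helper (not in the notebook): removes Wikipedia-style
# footnote markers such as "[7]", plus any whitespace just before them.
_FOOTNOTE = re.compile(r"\s*\[\d+\]")

def strip_footnote(text):
    return _FOOTNOTE.sub("", text).strip()

# Sample values copied from the Borough column shown above.
samples = ["Bexley, Greenwich [7]", "Croydon[8]", "Bexley"]
cleaned = [strip_footnote(s) for s in samples]
print(cleaned)
```

With pandas, the same pattern could be applied through `Series.str.replace(..., regex=True)`; the `split('[')` approach used in the notebook is simpler when a marker only ever appears at the end.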


## Select the areas only in London

In [77]:
london_df = grand_london_df[grand_london_df['Neighbourhood'].str.contains('LONDON')]
london_df.reset_index(drop=True,inplace=True)
london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,City,LONDON,EC3
3,Westminster,LONDON,WC2
4,Bromley,LONDON,SE20


In [78]:
london_df.shape

(308, 3)

## Add Geolocations for London Neighbourhoods

In [79]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

### Function to get the geo 2D position

France

In [80]:
# For France
def get_2D_FR(address):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, France'.format(address))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return [str(lat_coords), str(lng_coords)]

UK

In [81]:
# For the UK
def get_2D_UK(address):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, England, GBR'.format(address))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return [str(lat_coords), str(lng_coords)]

Test the geo function

In [82]:
get_2D_UK('W3, W4')

['51.51324000000005', '-0.2674599999999714']
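
Both geocoding functions index the first candidate with `[0]`, which would raise an `IndexError` if a lookup returned no candidates. The sketch below shows one way to guard against that. It assumes the candidates are dictionaries shaped like the ones the notebook reads (`{'location': {'x': lng, 'y': lat}}`); `first_location` is a hypothetical helper and the data is a stand-in so the example runs without a geocoding service.

```python
def first_location(candidates):
    """Return (lat, lng) as floats from the first candidate, or None.

    `candidates` is assumed to be a list of dicts with the layout
    {'location': {'x': longitude, 'y': latitude}}.
    """
    if not candidates:
        return None
    location = candidates[0]["location"]
    return float(location["y"]), float(location["x"])

# Stand-in results; the coordinates are rounded from the test call above.
fake_hit = [{"location": {"x": -0.26746, "y": 51.51324}}]
print(first_location(fake_hit))
print(first_location([]))
```

Returning floats directly would also make the later `float(x[0])` conversion unnecessary; returning `None` lets the caller decide whether to drop or retry the row.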

Get the London postal code series

In [83]:
london_postalcode = london_df['Post_code']
london_postalcode.head()

0       SE2
1    W3, W4
2       EC3
3       WC2
4      SE20
Name: Post_code, dtype: object

### Query geo 2D position

Using the postal code, retrieve the latitude and longitude of each area

In [84]:
london_geo_2D = london_postalcode.apply(lambda x: get_2D_UK(x))
london_geo_2D.head()

0    [51.492450000000076, 0.12127000000003818]
1     [51.51324000000005, -0.2674599999999714]
2    [51.51200000000006, -0.08057999999994081]
3    [51.51651000000004, -0.11967999999995982]
4     [51.48249000000004, 0.11919361600007505]
Name: Post_code, dtype: object
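
Several areas can share a postcode district (for example `CR0` appears twice in the first rows of the Wikipedia table), so the same lookup may be sent more than once. A minimal sketch of caching repeated lookups with `functools.lru_cache` follows. `fake_geocode` is a stand-in for the real network call and returns placeholder coordinates.

```python
from functools import lru_cache

calls = []  # records every lookup that actually reaches the (fake) service

def fake_geocode(postcode):
    # Stand-in for a network call; the coordinates are placeholders.
    calls.append(postcode)
    return (0.0, 0.0)

@lru_cache(maxsize=None)
def cached_geocode(postcode):
    # Repeated postcodes are answered from the cache.
    return fake_geocode(postcode)

postcodes = ["SE2", "W3, W4", "SE2", "EC3", "SE2"]
coords = [cached_geocode(p) for p in postcodes]
print(len(coords), "results from", len(calls), "lookups")
```

The cache key is the exact string, so `"W3, W4"` and `"W3"` are treated as different lookups, just as they are in the notebook.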

Merge the two dataframes into one

In [85]:
london_geo_2D.name='geo_2D'
london_merged = pd.concat([london_df,london_geo_2D], axis=1)
london_merged.head()

Unnamed: 0,Borough,Neighbourhood,Post_code,geo_2D
0,"Bexley, Greenwich",LONDON,SE2,"[51.492450000000076, 0.12127000000003818]"
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4","[51.51324000000005, -0.2674599999999714]"
2,City,LONDON,EC3,"[51.51200000000006, -0.08057999999994081]"
3,Westminster,LONDON,WC2,"[51.51651000000004, -0.11967999999995982]"
4,Bromley,LONDON,SE20,"[51.48249000000004, 0.11919361600007505]"


### Construct the final dataframe london

In [86]:
london_merged['latitude'] = london_merged['geo_2D'].apply(lambda x: float(x[0]))
london_merged['longitude'] = london_merged['geo_2D'].apply(lambda x: float(x[1]))
london_merged.drop(['geo_2D'], axis=1, inplace=True)
london_merged.head()

Unnamed: 0,Borough,Neighbourhood,Post_code,latitude,longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
2,City,LONDON,EC3,51.512,-0.08058
3,Westminster,LONDON,WC2,51.51651,-0.11968
4,Bromley,LONDON,SE20,51.48249,0.119194


In [88]:
print(london_df.shape)
print(london_merged.shape)

(308, 3)
(308, 5)


The row count is the same before and after the merge, so no rows were lost.

# Map and Venue of London

## Map

Get London Geo location

In [89]:
London_loc = get_2D_UK('london')
London_loc

['51.50642000000005', '-0.1272099999999341']

In [90]:
london_merged.columns

Index(['Borough', 'Neighbourhood', 'Post_code', 'latitude', 'longitude'], dtype='object')

Create the map of London

In [93]:
import folium # map rendering library

map_london = folium.Map(location=London_loc, zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(london_merged['latitude'], london_merged['longitude'], 
                                           london_merged['Borough'], london_merged['Neighbourhood']):
    
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

## Venue

### Define Foursquare Credentials and Version

In [171]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder; never print it)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


Foursquare request function

In [95]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    LIMIT = 100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
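
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups, or a venue with an empty category list, would raise an exception and stop the whole loop. The sketch below separates the parsing step and makes it tolerant of both cases. It assumes the same JSON layout the notebook relies on; `parse_items` is a hypothetical helper and the second venue in the sample payload is made up for illustration.

```python
def parse_items(name, lat, lng, payload):
    """Flatten one explore-style response into row tuples.

    Assumes the layout payload['response']['groups'][0]['items'][i]['venue'].
    A venue without categories gets the category None instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

fake_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Lesnes Abbey",
                   "location": {"lat": 51.489526, "lng": 0.125839},
                   "categories": [{"name": "Historic Site"}]}},
        # Made-up venue with no category, to exercise the fallback.
        {"venue": {"name": "Unlabelled place",
                   "location": {"lat": 51.49, "lng": 0.12},
                   "categories": []}},
    ]}]}
}
rows = parse_items("Bexley, Greenwich", 51.49245, 0.12127, fake_payload)
print(rows)
```

Keeping the parsing in its own function also makes it testable without network access or API credentials.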

In [98]:
venues_in_London = getNearbyVenues(names=london_merged['Borough'],
                                   latitudes=london_merged['latitude'],
                                   longitudes=london_merged['longitude']
                                  )
venues_in_London

Bexley, Greenwich
Ealing, Hammersmith and Fulham
City
Westminster
Bromley
Islington
Islington
Barnet
Enfield
Wandsworth
Southwark
City
Richmond upon Thames
Barnet
Islington
Wandsworth
Westminster
Bromley
Newham
Ealing
Westminster
Lewisham
Camden
Southwark
Tower Hamlets
Bexley
City
Lewisham
Greenwich
Tower Hamlets
Camden
Haringey
Tower Hamlets
Haringey
Barnet
Brent
Lambeth
Lewisham
Tower Hamlets
Kensington and Chelsea, Hammersmith and Fulham
Brent
Barnet
Barnet
Southwark
Tower Hamlets
Camden
Tower Hamlets
Waltham Forest
Newham
Islington
Richmond upon Thames
Lewisham
Camden
Westminster
Greenwich
Kensington and Chelsea
Barnet
Westminster
Lewisham
Waltham Forest
Hounslow, Ealing, Hammersmith and Fulham
Brent
Barnet
Lambeth, Wandsworth
Islington
Barnet
Merton
Barnet
Westminster
Barnet, Brent, Camden
Lewisham
Bexley
Haringey
Bromley
Tower Hamlets
Newham
Hackney
Islington
Southwark
Lewisham
Brent
Southwark
Ealing
Kensington and Chelsea
Wandsworth
Southwark
Barnet
Newham
Richmond upon Thames
E

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bexley, Greenwich",51.49245,0.12127,Lesnes Abbey,51.489526,0.125839,Historic Site
1,"Bexley, Greenwich",51.49245,0.12127,Sainsbury's,51.492826,0.120524,Supermarket
2,"Bexley, Greenwich",51.49245,0.12127,Lidl,51.496152,0.118417,Supermarket
3,"Bexley, Greenwich",51.49245,0.12127,Abbey Wood Railway Station (ABW),51.491097,0.121334,Train Station
4,"Bexley, Greenwich",51.49245,0.12127,Bean @ Work,51.491172,0.120649,Coffee Shop
...,...,...,...,...,...,...,...
13126,Hammersmith and Fulham,51.50645,-0.23691,Nut Case,51.506512,-0.233696,Gourmet Shop
13127,Hammersmith and Fulham,51.50645,-0.23691,West One Guesthouse,51.504132,-0.239130,Hotel
13128,Hammersmith and Fulham,51.50645,-0.23691,New Sweet'n'Sour Chinese Takeaway,51.506343,-0.231878,Chinese Restaurant
13129,Hammersmith and Fulham,51.50645,-0.23691,The Vine Leaves Taverna,51.506262,-0.230796,Greek Restaurant


In [100]:
print(venues_in_London.shape)

(13131, 7)


Check how many venues were returned for each neighborhood

In [101]:
venues_in_London.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Barnet,710,710,710,710,710,710
"Barnet, Brent, Camden",4,4,4,4,4,4
Bexley,20,20,20,20,20,20
"Bexley, Greenwich",8,8,8,8,8,8
Brent,620,620,620,620,620,620
"Brent, Camden",33,33,33,33,33,33
"Brent, Ealing",88,88,88,88,88,88
"Brent, Harrow",2,2,2,2,2,2
Bromley,30,30,30,30,30,30
Camden,803,803,803,803,803,803


Count how many unique categories appear among the returned venues

In [102]:
print('There are {} unique categories.'.format(len(venues_in_London['Venue Category'].unique())))

There are 316 unique categories.


## One Hot Encoding

In [104]:
# one hot encoding
London_venue_onehot = pd.get_dummies(venues_in_London[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
London_venue_onehot['Neighborhood'] = venues_in_London['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [London_venue_onehot.columns[-1]] + list(London_venue_onehot.columns[:-1])
London_venue_onehot = London_venue_onehot[fixed_columns]

London_venue_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bexley, Greenwich",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [105]:
London_venue_onehot.shape

(13131, 317)

## Venue categories mean value

Group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of each category within that neighborhood

In [106]:
London_grouped = London_venue_onehot.groupby('Neighborhood').mean().reset_index()
London_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
0,Barnet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.009859,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Barnet, Brent, Camden",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bexley, Greenwich",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Brent,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.014516,0.0,0.0,0.0,0.003226,0.0,0.0,0.0,0.0
5,"Brent, Camden",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0
6,"Brent, Ealing",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Brent, Harrow",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bromley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Camden,0.002491,0.0,0.0,0.0,0.006227,0.001245,0.0,0.001245,0.0,...,0.001245,0.001245,0.0,0.009963,0.0,0.001245,0.001245,0.0,0.001245,0.029888
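
Taking the mean of a 0/1 column within a group is the same as computing the share of that group's venues that fall in the category. The sketch below reproduces that calculation in plain Python on made-up data, as an illustration of what the values in the table represent rather than a replacement for the pandas code.

```python
from collections import Counter, defaultdict

# Toy (neighbourhood, category) pairs; the values are made up.
venues = [
    ("A", "Pub"), ("A", "Pub"), ("A", "Café"), ("A", "Park"),
    ("B", "Hotel"), ("B", "Pub"),
]

# Count categories per neighbourhood.
counts = defaultdict(Counter)
for neighbourhood, category in venues:
    counts[neighbourhood][category] += 1

# Divide each count by the number of venues in that neighbourhood.
frequencies = {
    neighbourhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighbourhood, counter in counts.items()
}
print(frequencies)
```

Each neighbourhood's frequencies sum to 1, which is why a neighbourhood with only a handful of venues can show very large values for a single category.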


Function to sort the venue categories of a row by frequency, in descending order, and return the most common ones

In [107]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Top venue categories

In [108]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns title according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
London_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
London_neighborhoods_venues_sorted['Neighborhood'] = London_grouped['Neighborhood']

for ind in np.arange(London_grouped.shape[0]):
    London_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

London_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barnet,Pub,Coffee Shop,Café,Bus Stop,Bakery,Park,Chinese Restaurant,Grocery Store,Sushi Restaurant,Gastropub
1,"Barnet, Brent, Camden",Gym / Fitness Center,Supermarket,Hardware Store,Clothing Store,Accessories Store,Optical Shop,Pakistani Restaurant,Outdoors & Recreation,Outdoor Sculpture,Outdoor Event Space
2,Bexley,Supermarket,Convenience Store,Train Station,Coffee Shop,Historic Site,Child Care Service,Accessories Store,Optical Shop,Pakistani Restaurant,Outdoors & Recreation
3,"Bexley, Greenwich",Supermarket,Bakery,Train Station,Convenience Store,Coffee Shop,Gastropub,Historic Site,Optical Shop,Pakistani Restaurant,Outdoors & Recreation
4,Brent,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Supermarket,Greek Restaurant,Pharmacy,Middle Eastern Restaurant,Park,Cocktail Bar
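
The column titles above are built by indexing a three-element suffix list and falling back to `th` when the index is out of range. That is enough for 10 columns, but it would label a 21st column `21th`. A minimal sketch of a general ordinal helper follows; `ordinal` is a hypothetical name and is not used elsewhere in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```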


# Cluster Neighborhoods Modeling London

## kmeans

Cluster the London areas into 10 categories

In [132]:
# set number of clusters
kclusters = 10

London_grouped_clustering = London_grouped.drop('Neighborhood', axis = 1)

# run k-means clustering
kmeans_london = KMeans(n_clusters=kclusters, random_state=0).fit(London_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_london.labels_[0:10] 

array([7, 5, 3, 3, 9, 9, 9, 1, 9, 7], dtype=int32)
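
To make the clustering step less of a black box, the sketch below implements the basic k-means loop (often called Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is an illustration on made-up 2D points with fixed starting centroids; scikit-learn's `KMeans` follows the same idea but adds a smarter default initialisation, so this is not a description of its internals.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def update(points, labels, centroids):
    """Mean of the points in each cluster; an empty cluster keeps its centroid."""
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new.append(old)
    return new

def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of made-up points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = lloyd(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

In the notebook each "point" is a row of `London_grouped_clustering`, that is, a vector of category frequencies with one dimension per venue category.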

## Add the cluster label into dataframe

In [133]:
London_neighborhoods_venues_sorted.drop(columns=['Cluster Labels'], inplace=True, errors='ignore')  # no-op on the first run, lets the next cell be re-run

In [134]:
London_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans_london.labels_ +1)
London_neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,8,Barnet,Pub,Coffee Shop,Café,Bus Stop,Bakery,Park,Chinese Restaurant,Grocery Store,Sushi Restaurant,Gastropub
1,6,"Barnet, Brent, Camden",Gym / Fitness Center,Supermarket,Hardware Store,Clothing Store,Accessories Store,Optical Shop,Pakistani Restaurant,Outdoors & Recreation,Outdoor Sculpture,Outdoor Event Space
2,4,Bexley,Supermarket,Convenience Store,Train Station,Coffee Shop,Historic Site,Child Care Service,Accessories Store,Optical Shop,Pakistani Restaurant,Outdoors & Recreation
3,4,"Bexley, Greenwich",Supermarket,Bakery,Train Station,Convenience Store,Coffee Shop,Gastropub,Historic Site,Optical Shop,Pakistani Restaurant,Outdoors & Recreation
4,10,Brent,Coffee Shop,Pub,Pizza Place,Italian Restaurant,Supermarket,Greek Restaurant,Pharmacy,Middle Eastern Restaurant,Park,Cocktail Bar
5,10,"Brent, Camden",Indian Restaurant,Pub,Supermarket,Brazilian Restaurant,Café,Park,Grocery Store,Middle Eastern Restaurant,Gastropub,Fast Food Restaurant
6,10,"Brent, Ealing",Coffee Shop,Pub,Pizza Place,Greek Restaurant,Italian Restaurant,Supermarket,Middle Eastern Restaurant,Pharmacy,Restaurant,Tea Room
7,2,"Brent, Harrow",Construction & Landscaping,Health Food Store,Accessories Store,Opera House,Paper / Office Supplies Store,Pakistani Restaurant,Outdoors & Recreation,Outdoor Sculpture,Outdoor Event Space,Organic Grocery
8,10,Bromley,Bus Station,Forest,Campground,Athletics & Sports,Park,Construction & Landscaping,Café,Bus Stop,Gastropub,Supermarket
9,8,Camden,Pub,Café,Coffee Shop,Bakery,Italian Restaurant,Bookstore,Zoo Exhibit,Hotel,Garden,Japanese Restaurant


Combine dataframe london_merged and London_neighborhoods_venues_sorted

In [135]:
London_labeled = london_merged

London_labeled = London_labeled.join(London_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

London_labeled.head()

Unnamed: 0,Borough,Neighbourhood,Post_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127,4,Supermarket,Bakery,Train Station,Convenience Store,Coffee Shop,Gastropub,Historic Site,Optical Shop,Pakistani Restaurant,Outdoors & Recreation
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746,5,Grocery Store,Indian Restaurant,Train Station,Breakfast Spot,Park,Polish Restaurant,Poke Place,Outdoor Sculpture,Outdoor Event Space,Organic Grocery
2,City,LONDON,EC3,51.512,-0.08058,1,Hotel,Coffee Shop,Italian Restaurant,Gym / Fitness Center,Pub,Sandwich Place,Restaurant,Wine Bar,Falafel Restaurant,Cocktail Bar
3,Westminster,LONDON,WC2,51.51651,-0.11968,1,Coffee Shop,Hotel,Sandwich Place,Café,Pub,Italian Restaurant,Theater,Restaurant,Hotel Bar,Burger Joint
4,Bromley,LONDON,SE20,51.48249,0.119194,10,Bus Station,Forest,Campground,Athletics & Sports,Park,Construction & Landscaping,Café,Bus Stop,Gastropub,Supermarket


Drop the rows that have no cluster label

In [136]:
London_labeled = London_labeled.dropna(subset=['Cluster Labels'])

# London cluster map

## Map

In [137]:
London_labeled.columns

Index(['Borough', 'Neighbourhood', 'Post_code', 'latitude', 'longitude',
       'Cluster Labels', '1st Most Common Venue', '2nd Most Common Venue',
       '3rd Most Common Venue', '4th Most Common Venue',
       '5th Most Common Venue', '6th Most Common Venue',
       '7th Most Common Venue', '8th Most Common Venue',
       '9th Most Common Venue', '10th Most Common Venue'],
      dtype='object')

In [139]:
# create map
map_clusters_london = folium.Map(location=London_loc, zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(London_labeled['latitude'], London_labeled['longitude'],
                                  London_labeled['Borough'], London_labeled['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi) , parse_html=True)  # labels are already 1-based
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7
        ).add_to(map_clusters_london)
        
map_clusters_london
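
The cluster labels were stored 1-based (`kmeans_london.labels_ + 1`) and become floats after the join, so the marker colour has to be looked up with `int(cluster - 1)`. The sketch below isolates that mapping. It uses the standard-library `colorsys` module as a stand-in for `cm.rainbow`, so the actual colours differ from the map; `cluster_palette` and `colour_for` are hypothetical helpers.

```python
import colorsys

def cluster_palette(n):
    """Return n evenly spaced hues as hex strings."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

def colour_for(label, palette):
    """Colour for a 1-based cluster label (int or float)."""
    return palette[int(label) - 1]

palette = cluster_palette(10)
print(colour_for(1, palette), colour_for(10.0, palette))
```

Keeping the offset in one small function avoids mixing 0-based and 1-based labels between the popup text and the marker colour.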

## Examining London Clusters

We take cluster 1 as an example

### Cluster 1

In [146]:
London_labeled.loc[London_labeled['Cluster Labels'] == 1, London_labeled.columns[[1] + list(range(5, London_labeled.shape[1]))]]


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,LONDON,1,Hotel,Coffee Shop,Italian Restaurant,Gym / Fitness Center,Pub,Sandwich Place,Restaurant,Wine Bar,Falafel Restaurant,Cocktail Bar
3,LONDON,1,Coffee Shop,Hotel,Sandwich Place,Café,Pub,Italian Restaurant,Theater,Restaurant,Hotel Bar,Burger Joint
5,LONDON,1,Coffee Shop,Pub,Café,Food Truck,Vietnamese Restaurant,Cocktail Bar,Park,Italian Restaurant,Hotel,Breakfast Spot
6,LONDON,1,Coffee Shop,Pub,Café,Food Truck,Vietnamese Restaurant,Cocktail Bar,Park,Italian Restaurant,Hotel,Breakfast Spot
10,LONDON,1,Pub,Coffee Shop,Café,Hotel,Park,Sandwich Place,Grocery Store,Bar,Bakery,Bus Stop
...,...,...,...,...,...,...,...,...,...,...,...,...
294,LONDON,1,Coffee Shop,Pub,Hotel,Bar,Café,Gym / Fitness Center,Italian Restaurant,Sandwich Place,Thai Restaurant,Grocery Store
295,LONDON,1,Coffee Shop,Hotel,Sandwich Place,Café,Pub,Italian Restaurant,Theater,Restaurant,Hotel Bar,Burger Joint
303,LONDON,1,Coffee Shop,Hotel,Indian Restaurant,Pub,Café,Bar,Pizza Place,Gym / Fitness Center,Korean Restaurant,Middle Eastern Restaurant
304,"LONDON, WOODFORD GREEN",1,Coffee Shop,Hotel,Indian Restaurant,Pub,Café,Grocery Store,Bar,Pizza Place,Sandwich Place,Middle Eastern Restaurant


# Paris Data

## Import data

In [158]:
import json
import pandas as pd

# read the local copy of the dataset; the with-block closes the file afterwards
with open('datasets/correspondances-code-insee-code-postal.json', "r") as file:
    text = json.load(file)

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(text)
df

Unnamed: 0,datasetid,recordid,record_timestamp,fields.code_comm,fields.nom_dept,fields.statut,fields.z_moyen,fields.nom_region,fields.code_reg,fields.insee_com,...,fields.id_geofla,fields.code_cant,fields.geo_shape.type,fields.geo_shape.coordinates,fields.superficie,fields.nom_comm,fields.code_arr,fields.population,geometry.type,geometry.coordinates
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,2016-09-21T00:29:06.175+02:00,645,ESSONNE,Commune simple,121.0,ILE-DE-FRANCE,11,91645,...,16275,03,Polygon,"[[[2.238024349288764, 48.735565859837095], [2....",999.0,VERRIERES-LE-BUISSON,3,15.5,Point,"[2.251712972144151, 48.750443119964764]"
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,2016-09-21T00:29:06.175+02:00,133,SEINE-ET-MARNE,Commune simple,88.0,ILE-DE-FRANCE,11,77133,...,31428,20,Polygon,"[[[3.076046701822989, 48.397361878531605], [3....",1082.0,COURCELLES-EN-BASSEE,3,0.2,Point,"[3.052940505560729, 48.41256065214989]"
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,2016-09-21T00:29:06.175+02:00,378,ESSONNE,Commune simple,150.0,ILE-DE-FRANCE,11,91378,...,30975,09,Polygon,"[[[2.203466690733517, 48.51655284725087], [2.1...",313.0,MAUCHAMPS,1,0.3,Point,"[2.19718165044305, 48.52726809075556]"
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,2016-09-21T00:29:06.175+02:00,243,SEINE-ET-MARNE,Chef-lieu canton,71.0,ILE-DE-FRANCE,11,77243,...,17000,14,Polygon,"[[[2.727542158243183, 48.85975862454365], [2.7...",579.0,LAGNY-SUR-MARNE,5,20.2,Point,"[2.7097808131278462, 48.87307018579678]"
4,correspondances-code-insee-code-postal,21e809b1d4480333c8b6fe7addd8f3b06f343e2c,2016-09-21T00:29:06.175+02:00,003,VAL-DE-MARNE,Chef-lieu canton,70.0,ILE-DE-FRANCE,11,94003,...,32123,34,Polygon,"[[[2.34385114554979, 48.79766105911435], [2.32...",232.0,ARCUEIL,3,19.5,Point,"[2.333510249842654, 48.80588035965699]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1295,correspondances-code-insee-code-postal,e48340f14024559a7602be7aa5167cf2af29b459,2016-09-21T00:29:06.175+02:00,068,SEINE-ET-MARNE,Commune simple,137.0,ILE-DE-FRANCE,11,77068,...,21587,10,Polygon,"[[[3.161508435480842, 48.49807082682062], [3.1...",529.0,CESSOY-EN-MONTOIS,3,0.2,Point,"[3.138844194183689, 48.50730730461658]"
1296,correspondances-code-insee-code-postal,64afe3728721b9954d7f2da353419df0d4b88b4e,2016-09-21T00:29:06.175+02:00,078,SEINE-SAINT-DENIS,Chef-lieu canton,65.0,ILE-DE-FRANCE,11,93078,...,24704,40,Polygon,"[[[2.557045023117815, 48.935302946618414], [2....",1042.0,VILLEPINTE,2,35.7,Point,"[2.536306342059409, 48.95902025378707]"
1297,correspondances-code-insee-code-postal,24353a5117491797d2ef35d0ab6a179b6d9c254f,2016-09-21T00:29:06.175+02:00,061,SEINE-ET-MARNE,Commune simple,60.0,ILE-DE-FRANCE,11,77061,...,20172,20,Polygon,"[[[3.004939078607779, 48.33869986171514], [3.0...",862.0,CANNES-ECLUSE,3,2.6,Point,"[2.990786679832767, 48.36403767307805]"
1298,correspondances-code-insee-code-postal,47a9cca82e7c9fdea46fa74a7731f9be64785b09,2016-09-21T00:29:06.175+02:00,677,YVELINES,Commune simple,96.0,ILE-DE-FRANCE,11,78677,...,24364,07,Polygon,"[[[1.702290092689364, 48.91216884312589], [1.6...",462.0,VILLETTE,1,0.5,Point,"[1.6937417245662671, 48.92627887061508]"


## Select Features

In [160]:
communes_paris_df = df[['fields.postal_code','fields.nom_comm','fields.nom_dept','fields.geo_point_2d']].copy()  # copy so new columns are not written into a slice
communes_paris_df.columns = ['postal_code','nom_comm','nom_dept','geo_point_2d']
communes_paris_df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,91370,VERRIERES-LE-BUISSON,ESSONNE,"[48.750443119964764, 2.251712972144151]"
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,"[48.41256065214989, 3.052940505560729]"
2,91730,MAUCHAMPS,ESSONNE,"[48.52726809075556, 2.19718165044305]"
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,"[48.87307018579678, 2.7097808131278462]"
4,94110,ARCUEIL,VAL-DE-MARNE,"[48.80588035965699, 2.333510249842654]"


## Geolocations of Paris Neighbourhoods

In [161]:
communes_paris_df['latitude'] = communes_paris_df['geo_point_2d'].apply(lambda x: float(x[0]))
communes_paris_df['longitude'] = communes_paris_df['geo_point_2d'].apply(lambda x: float(x[1]))
communes_paris_df.drop(['geo_point_2d'], axis=1, inplace=True)
communes_paris_df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,latitude,longitude
0,91370,VERRIERES-LE-BUISSON,ESSONNE,48.750443,2.251713
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,48.412561,3.052941
2,91730,MAUCHAMPS,ESSONNE,48.527268,2.197182
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,48.87307,2.709781
4,94110,ARCUEIL,VAL-DE-MARNE,48.80588,2.33351
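
Note that the two coordinate fields in the raw records use opposite orders: `geo_point_2d` is `[latitude, longitude]`, while `geometry.coordinates` is `[longitude, latitude]`, as the first row of the raw table shows. The sketch below reads a position from one record and falls back to the geometry point if `geo_point_2d` is missing. `lat_lng` is a hypothetical helper, and the fallback is an assumption about how one might handle incomplete records.

```python
def lat_lng(record):
    """Return (latitude, longitude) as floats for one raw record."""
    point = record.get("fields", {}).get("geo_point_2d")
    if point:
        lat, lng = point                      # already [lat, lng]
    else:
        lng, lat = record["geometry"]["coordinates"]  # [lng, lat] order
    return float(lat), float(lng)

# Values copied from the first record (VERRIERES-LE-BUISSON).
record = {
    "fields": {"geo_point_2d": [48.750443119964764, 2.251712972144151]},
    "geometry": {"type": "Point",
                 "coordinates": [2.251712972144151, 48.750443119964764]},
}
print(lat_lng(record))
```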


In [172]:
communes_paris_df.shape

(1300, 5)

The free Foursquare API only offers 950 regular calls per day, so we have to split the dataframe into two parts

In [173]:
communes_paris_df1 = communes_paris_df.head(700)
communes_paris_df2 = communes_paris_df.tail(600)
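
The `head(700)` / `tail(600)` split works because 700 + 600 equals the 1300 rows exactly. As a minimal sketch of a more general approach, the generator below cuts any sequence into consecutive batches of a chosen size, so the split does not need to be recomputed by hand if the row count changes. `batches` is a hypothetical helper and the list of integers stands in for the dataframe rows.

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

rows = list(range(1300))          # stand-in for the 1300 communes
parts = list(batches(rows, 700))  # 700 stays under the daily quota of 950
print([len(p) for p in parts])
```

With a dataframe, the same slicing could be done with `df.iloc[start:start + size]`.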

## Function to get the geo 2D position

In [162]:
# For France
def get_2D_FR(address):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, France'.format(address))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return [str(lat_coords), str(lng_coords)]

In [163]:
paris_loc = get_2D_FR('paris')
paris_loc

['48.85717000000005', '2.3414000000000215']

# Grand Paris areas map

In [166]:
# Creating the map of Paris
map_Paris= folium.Map(location=paris_loc, zoom_start=12)
map_Paris

# adding markers to map
for latitude, longitude, borough, town in zip(communes_paris_df['latitude'], communes_paris_df['longitude'],
                                              communes_paris_df['nom_comm'], communes_paris_df['nom_dept']):
    label = '{}, {}'.format(town, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_opacity=0.8
        ).add_to(map_Paris)  
    
map_Paris

# Venues in Grand Paris areas

Venues in Grand Paris areas dataframe 1

In [174]:
venues_in_Paris1 = getNearbyVenues(names=communes_paris_df1['nom_comm'],
                                   latitudes=communes_paris_df1['latitude'],
                                   longitudes=communes_paris_df1['longitude']
                                  )
venues_in_Paris1


VERRIERES-LE-BUISSON
COURCELLES-EN-BASSEE
MAUCHAMPS
LAGNY-SUR-MARNE
ARCUEIL
SAINT-HILLIERS
SAINT-PATHUS
GRESSY
GUYANCOURT
SAINT-GERMAIN-SUR-ECOLE
MENNECY
TORFOU
SOISY-SOUS-MONTMORENCY
BOISSISE-LE-ROI
CONDE-SAINTE-LIBIAIRE
SERVON
REAU
ONCY-SUR-ECOLE
FONTAINS
PLESSIS-SAINT-BENOIST
PONTCARRE
VIGNY
COURCELLES-SUR-VIOSNE
NEAUPHLETTE
LEUDEVILLE
MAUREPAS
ORVILLIERS
SAINT-CYR-EN-ARTHIES
LONGNES
AUVERS-SUR-OISE
BELLOY-EN-FRANCE
GRISY-LES-PLATRES
MARCHEMORET
LA CROIX-EN-BRIE
SCEAUX
PARMAIN
BEAUMONT-SUR-OISE
VALENCE-EN-BRIE
BOUGIVAL
MONTALET-LE-BOIS
VAYRES-SUR-ESSONNE
GOUPILLIERES
CLAYE-SOUILLY
ORLY
ARNOUVILLE-LES-MANTES
LA BOISSIERE-ECOLE
COUTENCON
BRETIGNY-SUR-ORGE
CREGY-LES-MEAUX
FAREMOUTIERS
PUISIEUX
GROSLAY
CHAINTREAUX
MONDREVILLE
LE TARTRE-GAUDRAN
VILLEMOISSON-SUR-ORGE
LA FERTE-SOUS-JOUARRE
HONDEVILLIERS
NOISY-SUR-OISE
LES ECRENNES
FLEXANVILLE
PRUNAY-SUR-ESSONNE
CARRIERES-SUR-SEINE
LA NORVILLE
CELY
GAZERAN
CARNETIN
LE PLESSIS-PATE
VILLAINES-SOUS-BOIS
MARLY-LE-ROI
BRIE-COMTE-ROBERT
CHEVREUSE

FLEURY-MEROGIS
NOISIEL
ANDREZEL
BOURRON-MARLOTTE
TROCY-EN-MULTIEN
SAINT-ESCOBILLE
AMBLEVILLE
RUNGIS
PARIS-8E-ARRONDISSEMENT
ABLEIGES
EGLY
SAINT-WITZ
SAINT-SAUVEUR-SUR-ECOLE
OISSERY
HERMERAY
ROSNY-SUR-SEINE
CLICHY
SAINT-ARNOULT-EN-YVELINES
SOUPPES-SUR-LOING
LA HAUTE-MAISON
CHAMPCENEST
PARIS-13E-ARRONDISSEMENT
VILLEMAREUIL
SAINTE-COLOMBE
SAINT-MARS-VIEUX-MAISONS
EPINAY-SUR-ORGE
VILLEMOMBLE
THORIGNY-SUR-MARNE
MAINCY
DANNEMOIS
MONTIGNY-LES-CORMEILLES
SAINT-REMY-L'HONORE
BRY-SUR-MARNE
GRESSEY
PARIS-12E-ARRONDISSEMENT
LONGUESSE
NEAUPHLE-LE-VIEUX
SAINT-PIERRE-DU-PERRAY
BLANDY
BOISSY-LE-CHATEL
CACHAN
BAZEMONT
FRETOY
CHATOU
LE PORT-MARLY
CHAUVRY
SARCELLES
SAINT-REMY-LES-CHEVREUSE
VILLE-SAINT-JACQUES
MONTARLOT
LAINVILLE-EN-VEXIN
LINAS
NOISY-LE-GRAND
VETHEUIL
PARIS-5E-ARRONDISSEMENT
MAUDETOUR-EN-VEXIN
MONTIGNY-SUR-LOING
AUGERS-EN-BRIE
FUBLAINES
DIANT
TAVERNY
VILLENEUVE-LES-BORDES
SAINTS
GOMETZ-LE-CHATEL
PASSY-SUR-SEINE
BONNEUIL-EN-FRANCE
MONTGEROULT
LE VAL-SAINT-GERMAIN
VITRY-SUR-SEINE
THOURY-FER

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,VERRIERES-LE-BUISSON,48.750443,2.251713,Poney Club de Verrières,48.747829,2.253170,Stables
1,VERRIERES-LE-BUISSON,48.750443,2.251713,Restaurant Des Gatines,48.747892,2.249335,French Restaurant
2,LAGNY-SUR-MARNE,48.873070,2.709781,Lagny's Pizza,48.873843,2.712933,Pizza Place
3,LAGNY-SUR-MARNE,48.873070,2.709781,BIENVENU DECO,48.870255,2.706794,Arts & Crafts Store
4,LAGNY-SUR-MARNE,48.873070,2.709781,"HYPNOOSEZ, Gersende DIQUELOU",48.869401,2.708859,Health & Beauty Service
...,...,...,...,...,...,...,...
2004,LE VESINET,48.893864,2.130393,Marché du Vésinet,48.893088,2.132603,Farmers Market
2005,LE VESINET,48.893864,2.130393,Place du Marché,48.892657,2.131438,Plaza
2006,LE VESINET,48.893864,2.130393,Soprano,48.892711,2.133261,Italian Restaurant
2007,LE VESINET,48.893864,2.130393,Cinéma Jean Marais,48.894152,2.134367,Movie Theater


In [175]:
print(venues_in_Paris1.shape)

(2009, 7)


Venues in Grand Paris areas dataframe 2

In [176]:
venues_in_Paris2 = getNearbyVenues(names=communes_paris_df2['nom_comm'],
                                   latitudes=communes_paris_df2['latitude'],
                                   longitudes=communes_paris_df2['longitude']
                                  )
venues_in_Paris2

FLINS-SUR-SEINE
BOULOGNE-BILLANCOURT
OSMOY
PALAISEAU
TOUSSON
MOURS
BRUNOY
BRIERES-LES-SCELLES
GUIRY-EN-VEXIN
CHAILLY-EN-BRIE
PARIS-19E-ARRONDISSEMENT
AMENUCOURT
BEZONS
PARIS-20E-ARRONDISSEMENT
CHARTRETTES
BUTHIERS
VAUDOY-EN-BRIE
HARDRICOURT
PISCOP
AMPONVILLE
LIVERDY-EN-BRIE
LAVAL-EN-BRIE
BUNO-BONNEVAUX
BOISSY-MAUVOISIN
MAISONCELLES-EN-GATINAIS
LES MUREAUX
CLAIREFONTAINE-EN-YVELINES
MALAKOFF
DONNEMARIE-DONTILLY
SAINT-BRICE-SOUS-FORET
SAINT-GERMAIN-LAXIS
CHANGIS-SUR-MARNE
MEZY-SUR-SEINE
BOIS-LE-ROI
COUILLY-PONT-AUX-DAMES
BOBIGNY
PRUNAY-LE-TEMPLE
GERMIGNY-SOUS-COULOMBS
FONTENAY-EN-PARISIS
CHARENTON-LE-PONT
SAINT-CYR-SOUS-DOURDAN
VALPUISEAUX
LE PIN
PARIS-10E-ARRONDISSEMENT
RUEIL-MALMAISON
LISSES
MONTGE-EN-GOELE
COURCOURONNES
VILLIERS-LE-BEL
CROISSY-SUR-SEINE
PARIS-16E-ARRONDISSEMENT
POMMEUSE
GUITRANCOURT
MORSANG-SUR-ORGE
LEVALLOIS-PERRET
ROISSY-EN-BRIE
VOULTON
LA CELLE-LES-BORDES
MONTHYON
LES BREVIAIRES
COUBRON
VIGNEUX-SUR-SEINE
SAINT-MARTIN-DU-BOSCHET
RUPEREUX
JOUARS-PONTCHARTRAIN
BERNAY-

CANNES-ECLUSE
VILLETTE
LE PLESSIS-LUZARCHES


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,FLINS-SUR-SEINE,48.967369,1.872722,McDonald's,48.967841,1.867321,Fast Food Restaurant
1,FLINS-SUR-SEINE,48.967369,1.872722,El Rancho,48.968877,1.871479,Mexican Restaurant
2,FLINS-SUR-SEINE,48.967369,1.872722,MAYANIM,48.964174,1.871499,Event Service
3,FLINS-SUR-SEINE,48.967369,1.872722,KFC,48.967923,1.866344,Fast Food Restaurant
4,FLINS-SUR-SEINE,48.967369,1.872722,SOARES HABITAT,48.964067,1.868191,Garden
...,...,...,...,...,...,...,...
1793,IVERNY,48.996244,2.790889,A.T.P,48.999194,2.795645,Construction & Landscaping
1794,PAMFOU,48.468614,2.864156,L'Escargot de France,48.468924,2.858033,Farmers Market
1795,VILLEPINTE,48.959020,2.536306,McDonald's,48.960226,2.542867,Fast Food Restaurant
1796,VILLEPINTE,48.959020,2.536306,Saray,48.954571,2.535999,Middle Eastern Restaurant


In [177]:
print(venues_in_Paris2.shape)

(1798, 7)


Concatenate the two dataframes into a single one named venues_in_Paris

In [179]:
venues_in_Paris = pd.concat([venues_in_Paris1, venues_in_Paris2], ignore_index=True)
venues_in_Paris.shape

(3807, 7)
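
The row count is a quick sanity check: 2009 + 1798 = 3807. The toy sketch below (made-up rows, not the notebook's data) shows the same check, along with the effect of `ignore_index=True`, which replaces the two original indexes with one fresh 0..n-1 index.

```python
import pandas as pd

part1 = pd.DataFrame({"Neighborhood": ["A", "A"], "Venue": ["v1", "v2"]})
part2 = pd.DataFrame({"Neighborhood": ["B"], "Venue": ["v3"]})

combined = pd.concat([part1, part2], ignore_index=True)

# Row counts add up, and the new index has no repeated values.
rows_add_up = len(combined) == len(part1) + len(part2)
index_is_unique = combined.index.is_unique
print(combined)
```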

In [180]:
venues_in_Paris.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,VERRIERES-LE-BUISSON,48.750443,2.251713,Poney Club de Verrières,48.747829,2.25317,Stables
1,VERRIERES-LE-BUISSON,48.750443,2.251713,Restaurant Des Gatines,48.747892,2.249335,French Restaurant
2,LAGNY-SUR-MARNE,48.87307,2.709781,Lagny's Pizza,48.873843,2.712933,Pizza Place
3,LAGNY-SUR-MARNE,48.87307,2.709781,BIENVENU DECO,48.870255,2.706794,Arts & Crafts Store
4,LAGNY-SUR-MARNE,48.87307,2.709781,"HYPNOOSEZ, Gersende DIQUELOU",48.869401,2.708859,Health & Beauty Service


Check how many venues were returned for each neighborhood

In [182]:
venues_in_Paris.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ABLON-SUR-SEINE,3,3,3,3,3,3
ACHERES,1,1,1,1,1,1
ACHERES-LA-FORET,1,1,1,1,1,1
AIGREMONT,1,1,1,1,1,1
AINCOURT,1,1,1,1,1,1
...,...,...,...,...,...,...
VOISINS-LE-BRETONNEUX,5,5,5,5,5,5
VOULX,1,1,1,1,1,1
WISSOUS,1,1,1,1,1,1
YEBLES,1,1,1,1,1,1


Count how many unique categories appear among the returned venues

In [183]:
print('There are {} unique categories.'.format(len(venues_in_Paris['Venue Category'].unique())))

There are 361 unique categories.


## One Hot Encoding

In [184]:
# one hot encoding
Paris_venue_onehot = pd.get_dummies(venues_in_Paris[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Paris_venue_onehot['Neighborhood'] = venues_in_Paris['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Paris_venue_onehot.columns[-1]] + list(Paris_venue_onehot.columns[:-1])
Paris_venue_onehot = Paris_venue_onehot[fixed_columns]

Paris_venue_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Warehouse Store,Watch Shop,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,VERRIERES-LE-BUISSON,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,VERRIERES-LE-BUISSON,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,LAGNY-SUR-MARNE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,LAGNY-SUR-MARNE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,LAGNY-SUR-MARNE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [185]:
Paris_venue_onehot.shape

(3807, 362)
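
The 362 columns are the 361 venue categories plus the `Neighborhood` column. A minimal sketch of the same encoding on made-up rows, using `insert(0, ...)` as one possible alternative to reordering the columns afterwards:

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Pizza Place", "Bakery", "Pizza Place"],
})

# One indicator column per category; empty prefix keeps the plain category names.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="")

# Put the neighborhood name in front so each row can still be traced back.
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

onehot_columns = list(toy_onehot.columns)
print(toy_onehot)
```

Every row marks exactly one category, because each row of the venues table describes a single venue.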

## Venue categories mean value

Group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category per neighborhood

In [186]:
Paris_grouped = Paris_venue_onehot.groupby('Neighborhood').mean().reset_index()
Paris_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Warehouse Store,Watch Shop,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,ABLON-SUR-SEINE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ACHERES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,ACHERES-LA-FORET,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,AIGREMONT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,AINCOURT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
671,VOISINS-LE-BRETONNEUX,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
672,VOULX,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
673,WISSOUS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
674,YEBLES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
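
Taking the mean of 0/1 indicator columns gives, for each neighborhood, the share of its venues that fall in each category. A small sketch on made-up data:

```python
import pandas as pd

toy_onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Bakery": [1, 0, 0, 0],
    "Pizza Place": [0, 1, 1, 1],
})

toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()

# Each neighborhood's category shares sum to 1.
row_sums = toy_grouped.drop(columns="Neighborhood").sum(axis=1)
print(toy_grouped)
```

Neighborhood A has one bakery among three venues, so its `Bakery` share is 1/3 and its `Pizza Place` share is 2/3.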


Function to sort the venue categories of a neighborhood in descending order of frequency

In [187]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
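
A quick check of the function on a made-up row. One caveat: categories with equal frequency, including those at 0, come out in no meaningful order. For a commune with a single venue, only the first rank is therefore a real category and the other ranks are filler.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above: skip the name, sort frequencies downwards.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies for one neighborhood.
toy_row = pd.Series({"Neighborhood": "A", "Bakery": 0.2, "Park": 0.5, "Pizza Place": 0.3})
top_two = list(return_most_common_venues(toy_row, 2))
print(top_two)  # ['Park', 'Pizza Place']
```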

## Top venue categories

In [189]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns title according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Paris_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
Paris_neighborhoods_venues_sorted['Neighborhood'] = Paris_grouped['Neighborhood']

for ind in np.arange(Paris_grouped.shape[0]):
    Paris_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Paris_grouped.iloc[ind, :], num_top_venues)

Paris_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABLON-SUR-SEINE,Train Station,Café,Fish Market,Accessories Store,Paintball Field,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
1,ACHERES,Lake,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
2,ACHERES-LA-FORET,Café,Accessories Store,Persian Restaurant,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
3,AIGREMONT,Home Service,Accessories Store,Outdoors & Recreation,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
4,AINCOURT,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store


# Cluster Neighborhoods Modeling for Paris

## kmeans

Cluster the Paris areas into 20 categories

In [208]:
# set number of clusters
kclusters_paris = 20

Paris_grouped_clustering = Paris_grouped.drop('Neighborhood', axis = 1)

# run k-means clustering
kmeans_paris = KMeans(n_clusters=kclusters_paris, random_state=0).fit(Paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_paris.labels_[0:10] 

array([ 5, 15, 14,  3,  0, 14,  3,  5, 14, 14], dtype=int32)
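
`labels_` holds one 0-based cluster index per row of the feature matrix. A minimal sketch on made-up, well-separated data shows this and the later shift to 1-based labels. Passing `n_init=10` explicitly is an assumption made here so that the behaviour does not depend on the scikit-learn version.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy groups of frequency vectors (made-up data).
toy_features = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_features)

# Shift to 1-based labels, as done when the labels are added to the dataframe.
toy_labels = toy_kmeans.labels_ + 1
print(toy_labels)
```

Which group receives label 1 and which receives label 2 is arbitrary. Only the grouping itself carries meaning.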

## Add the cluster label into dataframe

In [210]:
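# Drop a label column left over from a previous run; errors='ignore' keeps this safe on a first run
Paris_neighborhoods_venues_sorted.drop(columns=['Cluster Labels'], inplace=True, errors='ignore')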

In [211]:
Paris_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans_paris.labels_ +1)
Paris_neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,6,ABLON-SUR-SEINE,Train Station,Café,Fish Market,Accessories Store,Paintball Field,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
1,16,ACHERES,Lake,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
2,15,ACHERES-LA-FORET,Café,Accessories Store,Persian Restaurant,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
3,4,AIGREMONT,Home Service,Accessories Store,Outdoors & Recreation,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
4,1,AINCOURT,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
...,...,...,...,...,...,...,...,...,...,...,...,...
671,15,VOISINS-LE-BRETONNEUX,Brasserie,Tennis Stadium,Bakery,Restaurant,Athletics & Sports,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
672,15,VOULX,Funeral Home,Perfume Shop,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace,Paintball Field,Outdoors & Recreation
673,3,WISSOUS,Restaurant,Accessories Store,Outdoor Sculpture,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
674,2,YEBLES,Bar,Accessories Store,Outdoors & Recreation,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace


Combine the dataframes communes_paris_df and Paris_neighborhoods_venues_sorted

In [212]:
communes_paris_df

Unnamed: 0,postal_code,nom_comm,nom_dept,latitude,longitude
0,91370,VERRIERES-LE-BUISSON,ESSONNE,48.750443,2.251713
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,48.412561,3.052941
2,91730,MAUCHAMPS,ESSONNE,48.527268,2.197182
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,48.873070,2.709781
4,94110,ARCUEIL,VAL-DE-MARNE,48.805880,2.333510
...,...,...,...,...,...
1295,77520,CESSOY-EN-MONTOIS,SEINE-ET-MARNE,48.507307,3.138844
1296,93420,VILLEPINTE,SEINE-SAINT-DENIS,48.959020,2.536306
1297,77130,CANNES-ECLUSE,SEINE-ET-MARNE,48.364038,2.990787
1298,78930,VILLETTE,YVELINES,48.926279,1.693742


In [213]:
Paris_labeled = communes_paris_df

#Paris_labeled = Paris_labeled.join(Paris_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')

Paris_labeled = pd.merge(left=Paris_labeled, right=Paris_neighborhoods_venues_sorted,
                         left_on='nom_comm', right_on='Neighborhood')


Paris_labeled.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,latitude,longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,91370,VERRIERES-LE-BUISSON,ESSONNE,48.750443,2.251713,14,VERRIERES-LE-BUISSON,Stables,French Restaurant,Accessories Store,Outdoor Sculpture,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace
1,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,48.87307,2.709781,15,LAGNY-SUR-MARNE,Health & Beauty Service,Pizza Place,Arts & Crafts Store,Peruvian Restaurant,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park
2,94110,ARCUEIL,VAL-DE-MARNE,48.80588,2.33351,14,ARCUEIL,Brasserie,Japanese Restaurant,French Restaurant,Bike Rental / Bike Share,Bus Stop,Paintball Field,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking
3,78280,GUYANCOURT,YVELINES,48.773078,2.076052,14,GUYANCOURT,Japanese Restaurant,Supermarket,Park,French Restaurant,Smoke Shop,Hotel,Outdoor Sculpture,Pastry Shop,Parking,Paper / Office Supplies Store
4,91540,MENNECY,ESSONNE,48.558624,2.437532,6,MENNECY,Pizza Place,Accessories Store,Outdoors & Recreation,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store,Palace


In [214]:
Paris_labeled.shape

(678, 17)
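
The merged frame has 678 rows, slightly more than the 675 neighborhoods that were clustered. One plausible explanation, not verified here, is that a few commune names occur more than once in `communes_paris_df` (for example under several postal codes), so one labeled neighborhood matches several rows. The toy sketch below reproduces that effect with made-up data:

```python
import pandas as pd

toy_communes = pd.DataFrame({
    "postal_code": [10001, 10002, 20001],
    "nom_comm": ["X", "X", "Y"],  # "X" appears under two postal codes
})
toy_sorted = pd.DataFrame({
    "Neighborhood": ["X", "Y"],
    "Cluster Labels": [3, 7],
})

# Default inner merge: every commune row with a matching name is kept.
toy_labeled = pd.merge(left=toy_communes, right=toy_sorted,
                       left_on="nom_comm", right_on="Neighborhood")

duplicated_names = int(toy_communes["nom_comm"].duplicated().sum())
print(toy_labeled.shape)  # (3, 4)
```

Counting `duplicated()` names on the left table before merging is a cheap way to test this explanation on the real data.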

Drop rows with a missing cluster label, if needed

In [215]:
#Paris_labeled = Paris_labeled.dropna(subset=['Cluster Labels'])

# Paris cluster map

## Map

In [227]:
# create map
map_clusters_paris = folium.Map(location=paris_loc, zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters_paris)
ys = [i + x + (i*x)**2 for i in range(kclusters_paris)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Paris_labeled['latitude'], Paris_labeled['longitude'],
                                  Paris_labeled['nom_comm'], Paris_labeled['Cluster Labels']):
    # 'Cluster Labels' is already 1-based: show it as-is and subtract 1 to index the colour list
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster) - 1],
        fill=True,
        fill_color=rainbow[int(cluster) - 1],
        fill_opacity=0.7
        ).add_to(map_clusters_paris)
        
map_clusters_paris
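
Because the values stored in `Cluster Labels` start at 1, a colour list of length k is most simply indexed with `label - 1`. The sketch below illustrates that mapping. It builds a stand-in palette with the standard library's `colorsys` rather than the matplotlib colormap used in the notebook, and the names `make_palette` and `colour_for` are hypothetical.

```python
import colorsys

def make_palette(n_colors):
    """Return n_colors hex colours spread evenly around the hue wheel (stand-in palette)."""
    palette = []
    for i in range(n_colors):
        r, g, b = colorsys.hsv_to_rgb(i / n_colors, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 20
palette = make_palette(kclusters)

def colour_for(label):
    """Map a 1-based cluster label to its colour."""
    return palette[int(label) - 1]

first_colour = colour_for(1)
last_colour = colour_for(kclusters)
print(first_colour, last_colour)
```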

## Examining Paris Clusters

We take cluster 1 as an example

### Cluster 1

In [228]:
Paris_labeled.loc[Paris_labeled['Cluster Labels'] == 1, Paris_labeled.columns[[1] + list(range(5, Paris_labeled.shape[1]))]]


Unnamed: 0,nom_comm,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,AUVERS-SUR-OISE,1,AUVERS-SUR-OISE,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
27,COUTENCON,1,COUTENCON,Construction & Landscaping,Home Service,Accessories Store,Paintball Field,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park
35,GAZERAN,1,GAZERAN,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
56,SAINT-AUGUSTIN,1,SAINT-AUGUSTIN,Construction & Landscaping,Pool,Accessories Store,Outdoors & Recreation,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
58,CHAMPLAN,1,CHAMPLAN,Construction & Landscaping,Café,Miscellaneous Shop,Accessories Store,Paintball Field,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park
85,MOUSSY-LE-VIEUX,1,MOUSSY-LE-VIEUX,Outdoors & Recreation,Construction & Landscaping,Accessories Store,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
104,BREUIL-BOIS-ROBERT,1,BREUIL-BOIS-ROBERT,Caribbean Restaurant,Construction & Landscaping,Paintball Field,Persian Restaurant,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park
149,AINCOURT,1,AINCOURT,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
150,LA CHAPELLE-EN-VEXIN,1,LA CHAPELLE-EN-VEXIN,Construction & Landscaping,Accessories Store,Outdoors & Recreation,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park,Paper / Office Supplies Store
152,CHAMPDEUIL,1,CHAMPDEUIL,Construction & Landscaping,Home Service,Accessories Store,Paintball Field,Perfume Shop,Performing Arts Venue,Pedestrian Plaza,Pastry Shop,Parking,Park
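
In the rows shown above, most communes of cluster 1 have "Construction & Landscaping" as their first-ranked category. As a sketch of how every cluster could be summarised the same way, the made-up example below picks the most frequent first-ranked category per cluster and counts the cluster sizes:

```python
import pandas as pd

toy_labeled = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 2, 2],
    "1st Most Common Venue": ["Construction & Landscaping", "Construction & Landscaping",
                              "Outdoors & Recreation", "Bar", "Bar"],
})

# Most frequent first-ranked category within each cluster.
dominant = (toy_labeled.groupby("Cluster Labels")["1st Most Common Venue"]
            .agg(lambda s: s.value_counts().idxmax()))

# Number of communes per cluster, ordered by label.
cluster_sizes = toy_labeled["Cluster Labels"].value_counts().sort_index()
print(dominant)
```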


# Results and Discussion     

As the maps show, the clusters in Paris and London are interleaved rather than forming clean geographic blocks, and the overall pattern is complex. I read this as a reflection of multiculturalism and of the mix of traditional and modern architecture in both cities. I live in Paris, and from that experience the grouping of these clusters looks quite reasonable. Paris has many museums, restaurants from all over the world, a well-developed metro network (and, of course, its airports), and its famous cafés. Venue categories such as these are what drive the classification produced by the k-means model.

# Conclusion

Analysing the neighbourhoods of Paris and London in this way shows how similar the different districts of a city are to one another. If you want to open a new European office in Paris for your company, the cluster map gives a useful starting point for finding a suitable location. Likewise, if you already run a successful shop in Paris and want to open another one, this model can help you find locations with a similar business environment.

**Perspective**: To improve the model further, the analysis could be carried out at street level, which would require additional data such as population density and housing density. I think adding population data is a feasible way to improve the model. I once worked on a housing price prediction model, and the current model could serve as an auxiliary input to that kind of model, since the character of a neighbourhood has an important influence on housing prices.

The end!