<center><font size=5>Clustering the Neighbourhoods of London and Paris </font></center>

# Problem statement and project background

**Paris and London, the capitals of France and the United Kingdom, are also among the most prestigious tourist cities in Europe. Economically and culturally, the two cities are very similar. When large international companies open a new European office, Paris and London are often the leading candidates, and choosing between them is difficult. In this project we analyse the neighbourhoods of London and Paris in turn to build a picture of what they look like.**

London

In [19]:
from IPython.display import Image  # needed here because this cell comes first in the document
Image(url= "https://london.ac.uk/sites/default/files/styles/promo_large/public/2018-10/london-aerial-cityscape-river-thames_1.jpg",width=400)

Paris

In [18]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://images.prismic.io/figaroimmo%2F99439d29-f927-483b-9667-d280eaf7d061_shutterstock_1420728554-compressor.jpg",width=400)

# Data Description

### London  

The data about London areas is available from Wikipedia: https://en.wikipedia.org/wiki/List_of_areas_of_London.   
From this table we use three columns:

1. London borough : Name of the borough
2. Post town : Name of the post town (stored as `Neighbourhood` in this notebook)
3. Postcode district : Postal codes for London.


### Paris
To derive our solution, we use the JSON data available at https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e

The JSON file has data about the neighbourhoods (communes) in France.

1. postal_code : Postal codes for France
2. nom_comm : Name of the commune, used here as the neighbourhood
3. nom_dept : Name of the department, used here as the borough
4. geo_point_2d : Pair containing the latitude and longitude of the neighbourhood.
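
To make the field list concrete, here is a minimal sketch of how one record of this shape flattens into columns with `pd.json_normalize`. The record is a trimmed, hand-written sample based on the first row shown later in this notebook, not the real file, and the sketch assumes pandas 1.0 or later.

```python
import pandas as pd

# Hand-written sample mimicking one record of the dataset (trimmed for illustration)
sample_records = [
    {
        "datasetid": "correspondances-code-insee-code-postal",
        "fields": {
            "postal_code": "91370",
            "nom_comm": "VERRIERES-LE-BUISSON",
            "nom_dept": "ESSONNE",
            "geo_point_2d": [48.750443119964764, 2.251712972144151],
        },
    }
]

# Nested keys become dotted column names such as "fields.nom_comm";
# list values such as geo_point_2d are kept as lists
sample_df = pd.json_normalize(sample_records)
print(sorted(sample_df.columns))
```

This is why the later cells select columns with a `fields.` prefix.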

# Libraries

In [35]:
import pandas as pd
import requests
import numpy as np
import geopandas as gpd
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import warnings
warnings.filterwarnings("ignore")

# London Data

## Get london data

In [41]:
url_grand_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_grand_london_url = requests.get(url_grand_london)

wiki_grand_london_data = pd.read_html(wiki_grand_london_url.text)

grand_london_wiki_df = wiki_grand_london_data[1]
grand_london_wiki_df.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


## Select columns London borough/Post town/Postcode district

In [43]:
grand_london_wiki_df.columns

Index(['Location', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')

In [44]:
# Take an explicit copy so that later column assignments do not act on a view
grand_london_df = grand_london_wiki_df.iloc[:,[1,2,3]].copy()
grand_london_df.columns = ['Borough','Neighbourhood','Post_code']
grand_london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich [7]",LONDON,SE2
1,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4"
2,Croydon[8],CROYDON,CR0
3,Croydon[8],CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


Remove the bracketed footnote markers (such as `[7]`) from the Borough column

In [45]:
grand_london_df['Borough'] = grand_london_df['Borough'].map(lambda x: x.split('[')[0].strip())
grand_london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Croydon,CROYDON,CR0
3,Croydon,CROYDON,CR0
4,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
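
The same clean-up can be sketched with vectorised string methods instead of `map`. This is an alternative, not what the notebook ran; the toy `Borough` values are copied from the output above.

```python
import pandas as pd

toy = pd.DataFrame({"Borough": ["Bexley, Greenwich [7]", "Croydon[8]", "Bexley"]})

# Drop everything from the first "[" onwards, then trim surrounding whitespace
toy["Borough"] = toy["Borough"].str.replace(r"\[.*$", "", regex=True).str.strip()
print(toy["Borough"].tolist())
```

One practical difference: the `.str` accessor passes missing values through, whereas the lambda would fail on a non-string cell.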


## Select only the areas in London

In [77]:
london_df = grand_london_df[grand_london_df['Neighbourhood'].str.contains('LONDON')]
london_df.reset_index(drop=True,inplace=True)
london_df.head()

Unnamed: 0,Borough,Neighbourhood,Post_code
0,"Bexley, Greenwich",LONDON,SE2
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,City,LONDON,EC3
3,Westminster,LONDON,WC2
4,Bromley,LONDON,SE20


In [78]:
london_df.shape

(308, 3)
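
Note that `str.contains('LONDON')` is a substring match, so a row whose post town lists several towns including LONDON is also kept. The sketch below contrasts it with an exact comparison on toy post town values; "LONDON, BARNET" is a made-up value for illustration.

```python
import pandas as pd

toy = pd.DataFrame(
    {"Neighbourhood": ["LONDON", "CROYDON", "LONDON, BARNET", "BEXLEY, SIDCUP"]}
)

# Substring match: keeps any post town string that mentions LONDON
by_contains = toy[toy["Neighbourhood"].str.contains("LONDON")]

# Exact match: keeps only rows whose post town is exactly LONDON
by_equality = toy[toy["Neighbourhood"] == "LONDON"]

print(len(by_contains), len(by_equality))
```

Which of the two is appropriate depends on whether areas that straddle several post towns should count as London.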

## Add Geolocations for London Neighbourhoods

In [79]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()

### Function to get the geo 2D position

UK

In [81]:
# For the UK (England)
def get_2D_UK(address):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, England, GBR'.format(address))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return [str(lat_coords), str(lng_coords)]

Test the geo function

In [82]:
get_2D_UK('W3, W4')

['51.51324000000005', '-0.2674599999999714']
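
`get_2D_UK` indexes the first result directly, which would raise an `IndexError` if the geocoder returned no candidates. Below is a minimal defensive sketch. The names `first_lat_lng`, `geocode_fn` and `fake_geocode` are hypothetical: `geocode_fn` stands in for a geocoder such as `arcgis.geocoding.geocode`, and is assumed to return a list of dicts shaped like the results used above.

```python
def first_lat_lng(address, geocode_fn):
    """Return [lat, lng] for the first geocoding hit, or None if there is none.

    geocode_fn is a hypothetical stand-in for a real geocoder: it takes an
    address string and returns a list of dicts shaped like
    {"location": {"x": lng, "y": lat}}.
    """
    results = geocode_fn(address)
    if not results:
        return None
    location = results[0]["location"]
    return [location["y"], location["x"]]


# Fake geocoder used only so that the sketch runs without network access
def fake_geocode(address):
    known = {"W3, W4, England, GBR": [{"location": {"x": -0.26746, "y": 51.51324}}]}
    return known.get(address, [])


print(first_lat_lng("W3, W4, England, GBR", fake_geocode))
print(first_lat_lng("nowhere", fake_geocode))
```

Returning `None` for a miss lets the caller decide whether to drop the row or retry, instead of stopping the whole `apply`.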

Get the London postal code series

In [83]:
london_postalcode = london_df['Post_code']
london_postalcode.head()

0       SE2
1    W3, W4
2       EC3
3       WC2
4      SE20
Name: Post_code, dtype: object

### Query geo 2D position

Retrieve the 2D geo position for each postal code

In [84]:
london_geo_2D = london_postalcode.apply(lambda x: get_2D_UK(x))
london_geo_2D.head()

0    [51.492450000000076, 0.12127000000003818]
1     [51.51324000000005, -0.2674599999999714]
2    [51.51200000000006, -0.08057999999994081]
3    [51.51651000000004, -0.11967999999995982]
4     [51.48249000000004, 0.11919361600007505]
Name: Post_code, dtype: object
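
Several rows share a postal code (for example `CR0` appears twice in the table above), so the same address may be geocoded more than once. As a sketch, assuming the lookup returns the same answer for the same input, each distinct code can be resolved once and mapped back. `fake_lookup` is a hypothetical stand-in for `get_2D_UK` so that the example runs offline.

```python
import pandas as pd

calls = []

def fake_lookup(code):
    # Hypothetical stand-in for get_2D_UK; records each call instead of hitting a service
    calls.append(code)
    return [str(len(code)), "0.0"]

codes = pd.Series(["SE2", "CR0", "CR0", "SE2", "WC2"], name="Post_code")

# Geocode each distinct postal code once, then map the results back onto every row
lookup = {code: fake_lookup(code) for code in codes.unique()}
geo_2d = codes.map(lookup)

print(len(calls), len(geo_2d))
```

The result has one entry per original row, but the lookup runs only once per distinct code.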

Merge the two objects into one dataframe

In [85]:
london_geo_2D.name='geo_2D'
london_merged = pd.concat([london_df,london_geo_2D], axis=1)
london_merged.head()

Unnamed: 0,Borough,Neighbourhood,Post_code,geo_2D
0,"Bexley, Greenwich",LONDON,SE2,"[51.492450000000076, 0.12127000000003818]"
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4","[51.51324000000005, -0.2674599999999714]"
2,City,LONDON,EC3,"[51.51200000000006, -0.08057999999994081]"
3,Westminster,LONDON,WC2,"[51.51651000000004, -0.11967999999995982]"
4,Bromley,LONDON,SE20,"[51.48249000000004, 0.11919361600007505]"


### Construct the final London dataframe

In [86]:
london_merged['latitude'] = london_merged['geo_2D'].apply(lambda x: float(x[0]))
london_merged['longitude'] = london_merged['geo_2D'].apply(lambda x: float(x[1]))
london_merged.drop(['geo_2D'], axis=1, inplace=True)
london_merged.head()

Unnamed: 0,Borough,Neighbourhood,Post_code,latitude,longitude
0,"Bexley, Greenwich",LONDON,SE2,51.49245,0.12127
1,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.51324,-0.26746
2,City,LONDON,EC3,51.512,-0.08058
3,Westminster,LONDON,WC2,51.51651,-0.11968
4,Bromley,LONDON,SE20,51.48249,0.119194
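
As an alternative sketch, the list-valued column can be expanded into both coordinate columns in a single step. The toy values are taken from the table above; this is not the code the notebook ran.

```python
import pandas as pd

toy = pd.DataFrame({
    "Post_code": ["SE2", "EC3"],
    "geo_2D": [["51.49245", "0.12127"], ["51.512", "-0.08058"]],
})

# Build both coordinate columns at once from the list-valued column
coords = pd.DataFrame(
    toy["geo_2D"].tolist(), columns=["latitude", "longitude"], index=toy.index
).astype(float)

# Replace the list column with the two numeric columns
result = toy.drop(columns="geo_2D").join(coords)
print(result)
```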


In [88]:
print(london_df.shape)
print(london_merged.shape)

(308, 3)
(308, 5)


Comparing the row counts before and after the merge shows that no rows were lost.

# Paris Data

## Import data

In [158]:
import json
import pandas as pd

# Load the JSON file; the context manager closes the file handle afterwards
with open('datasets/correspondances-code-insee-code-postal.json', "r") as file:
    text = json.load(file)

# Flatten the nested records into one column per field (e.g. fields.nom_comm)
df = pd.json_normalize(text)
df

Unnamed: 0,datasetid,recordid,record_timestamp,fields.code_comm,fields.nom_dept,fields.statut,fields.z_moyen,fields.nom_region,fields.code_reg,fields.insee_com,...,fields.id_geofla,fields.code_cant,fields.geo_shape.type,fields.geo_shape.coordinates,fields.superficie,fields.nom_comm,fields.code_arr,fields.population,geometry.type,geometry.coordinates
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,2016-09-21T00:29:06.175+02:00,645,ESSONNE,Commune simple,121.0,ILE-DE-FRANCE,11,91645,...,16275,03,Polygon,"[[[2.238024349288764, 48.735565859837095], [2....",999.0,VERRIERES-LE-BUISSON,3,15.5,Point,"[2.251712972144151, 48.750443119964764]"
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,2016-09-21T00:29:06.175+02:00,133,SEINE-ET-MARNE,Commune simple,88.0,ILE-DE-FRANCE,11,77133,...,31428,20,Polygon,"[[[3.076046701822989, 48.397361878531605], [3....",1082.0,COURCELLES-EN-BASSEE,3,0.2,Point,"[3.052940505560729, 48.41256065214989]"
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,2016-09-21T00:29:06.175+02:00,378,ESSONNE,Commune simple,150.0,ILE-DE-FRANCE,11,91378,...,30975,09,Polygon,"[[[2.203466690733517, 48.51655284725087], [2.1...",313.0,MAUCHAMPS,1,0.3,Point,"[2.19718165044305, 48.52726809075556]"
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,2016-09-21T00:29:06.175+02:00,243,SEINE-ET-MARNE,Chef-lieu canton,71.0,ILE-DE-FRANCE,11,77243,...,17000,14,Polygon,"[[[2.727542158243183, 48.85975862454365], [2.7...",579.0,LAGNY-SUR-MARNE,5,20.2,Point,"[2.7097808131278462, 48.87307018579678]"
4,correspondances-code-insee-code-postal,21e809b1d4480333c8b6fe7addd8f3b06f343e2c,2016-09-21T00:29:06.175+02:00,003,VAL-DE-MARNE,Chef-lieu canton,70.0,ILE-DE-FRANCE,11,94003,...,32123,34,Polygon,"[[[2.34385114554979, 48.79766105911435], [2.32...",232.0,ARCUEIL,3,19.5,Point,"[2.333510249842654, 48.80588035965699]"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1295,correspondances-code-insee-code-postal,e48340f14024559a7602be7aa5167cf2af29b459,2016-09-21T00:29:06.175+02:00,068,SEINE-ET-MARNE,Commune simple,137.0,ILE-DE-FRANCE,11,77068,...,21587,10,Polygon,"[[[3.161508435480842, 48.49807082682062], [3.1...",529.0,CESSOY-EN-MONTOIS,3,0.2,Point,"[3.138844194183689, 48.50730730461658]"
1296,correspondances-code-insee-code-postal,64afe3728721b9954d7f2da353419df0d4b88b4e,2016-09-21T00:29:06.175+02:00,078,SEINE-SAINT-DENIS,Chef-lieu canton,65.0,ILE-DE-FRANCE,11,93078,...,24704,40,Polygon,"[[[2.557045023117815, 48.935302946618414], [2....",1042.0,VILLEPINTE,2,35.7,Point,"[2.536306342059409, 48.95902025378707]"
1297,correspondances-code-insee-code-postal,24353a5117491797d2ef35d0ab6a179b6d9c254f,2016-09-21T00:29:06.175+02:00,061,SEINE-ET-MARNE,Commune simple,60.0,ILE-DE-FRANCE,11,77061,...,20172,20,Polygon,"[[[3.004939078607779, 48.33869986171514], [3.0...",862.0,CANNES-ECLUSE,3,2.6,Point,"[2.990786679832767, 48.36403767307805]"
1298,correspondances-code-insee-code-postal,47a9cca82e7c9fdea46fa74a7731f9be64785b09,2016-09-21T00:29:06.175+02:00,677,YVELINES,Commune simple,96.0,ILE-DE-FRANCE,11,78677,...,24364,07,Polygon,"[[[1.702290092689364, 48.91216884312589], [1.6...",462.0,VILLETTE,1,0.5,Point,"[1.6937417245662671, 48.92627887061508]"


## Select Features

In [160]:
# Take an explicit copy so that the columns added below do not act on a view of df
communes_paris_df = df[['fields.postal_code','fields.nom_comm','fields.nom_dept','fields.geo_point_2d']].copy()
communes_paris_df.columns = ['postal_code','nom_comm','nom_dept','geo_point_2d']
communes_paris_df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,91370,VERRIERES-LE-BUISSON,ESSONNE,"[48.750443119964764, 2.251712972144151]"
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,"[48.41256065214989, 3.052940505560729]"
2,91730,MAUCHAMPS,ESSONNE,"[48.52726809075556, 2.19718165044305]"
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,"[48.87307018579678, 2.7097808131278462]"
4,94110,ARCUEIL,VAL-DE-MARNE,"[48.80588035965699, 2.333510249842654]"


## Geolocations of Paris Neighbourhoods

In [161]:
communes_paris_df['latitude'] = communes_paris_df['geo_point_2d'].apply(lambda x: float(x[0]))
communes_paris_df['longitude'] = communes_paris_df['geo_point_2d'].apply(lambda x: float(x[1]))
communes_paris_df.drop(['geo_point_2d'], axis=1, inplace=True)
communes_paris_df.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,latitude,longitude
0,91370,VERRIERES-LE-BUISSON,ESSONNE,48.750443,2.251713
1,77126,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,48.412561,3.052941
2,91730,MAUCHAMPS,ESSONNE,48.527268,2.197182
3,77400,LAGNY-SUR-MARNE,SEINE-ET-MARNE,48.87307,2.709781
4,94110,ARCUEIL,VAL-DE-MARNE,48.80588,2.33351
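
One detail worth keeping in mind: in the raw records shown earlier, `fields.geo_point_2d` is ordered latitude then longitude, whereas `geometry.coordinates` is ordered longitude then latitude (the usual GeoJSON order). The sketch below, using values copied from the first record, checks that the two agree once the order is swapped.

```python
# Values copied from the first record displayed earlier in this notebook
geo_point_2d = [48.750443119964764, 2.251712972144151]          # [latitude, longitude]
geometry_coordinates = [2.251712972144151, 48.750443119964764]  # [longitude, latitude]

latitude, longitude = geo_point_2d
lon_from_geometry, lat_from_geometry = geometry_coordinates

print(latitude == lat_from_geometry and longitude == lon_from_geometry)
```

Using index 0 as latitude is therefore correct for `geo_point_2d`, but would be wrong for `geometry.coordinates`.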


In [172]:
communes_paris_df.shape

(1300, 5)

The free Foursquare API only offers 950 regular calls per day, so we have to split the dataframe into two.

In [173]:
communes_paris_df1 = communes_paris_df.head(700)
communes_paris_df2 = communes_paris_df.tail(600)
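
The split above hard-codes 700 and 600 rows. As a sketch of a more general approach, the helper below (a hypothetical `split_into_chunks`, not part of the notebook) cuts a dataframe into consecutive pieces of at most `max_rows` rows, so each piece stays under a daily quota whatever the dataframe size.

```python
import pandas as pd

def split_into_chunks(df, max_rows):
    """Return consecutive slices of df, each holding at most max_rows rows."""
    return [df.iloc[start:start + max_rows] for start in range(0, len(df), max_rows)]

# Toy dataframe with the same number of rows as communes_paris_df
toy = pd.DataFrame({"postal_code": range(1300)})
chunks = split_into_chunks(toy, 700)
print([len(chunk) for chunk in chunks])
```

With `max_rows=700` this reproduces the 700/600 split used here; any value at or below the quota works the same way.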

## Function to get the geo 2D position

In [162]:
# For France
def get_2D_FR(address):
    lat_coords = 0
    lng_coords = 0
    g = geocode(address='{}, France'.format(address))[0]
    lng_coords = g['location']['x']
    lat_coords = g['location']['y']
    return [str(lat_coords), str(lng_coords)]

In [163]:
paris_loc = get_2D_FR('paris')
paris_loc

['48.85717000000005', '2.3414000000000215']