# IBM Capstone Project - Neighborhoods comparison between Paris and London

This notebook will include the IBM capstone project.

## Table of contents
* [Introduction](#Introduction)
* [Data](#Data)
* [Methodology](#Methodology)
* [Analysis](#Analysis)
* [Conclusion](#Conclusion)

## Introduction 

In this project, I will compare the neighbourhoods of Paris and London in terms of common places and venues. I will then cluster the neighbourhoods within each city using **DBSCAN** algorithm which is an unsupervised machine learning algorithm and then I will explore what are the common venues within each cluster. 

### Interset 

The findings of this project will help **travellers** and people who never been to these cities to get to know them very well and choose which city to visit based on the venues within them. 

## Data 

In this project, I will use the following data: 
* Neighbourhood data using web scraping. 
* Coordinates data using geopy library.
* Venues using Foursqure API.

First, I will install the libraries that I'm going to use in web scrabing.

In [1]:
!pip install bs4
!pip install requests

  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... [?25ldone
[?25h  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=25afcee848fa89a7ce141e1a0f9c29ec09fb1a8c598bf50ed9b3f89d09fa7c30
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/0a/9e/ba/20e5bbc1afef3a491f0b3bb74d508f99403aabe76eda2167ca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1
  from cryptography.utils import int_from_bytes
  from cryptography.utils import int_from_bytes


Then, I will import the libraries I will be using in web scrabing. 

In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import numpy as np

## Neighbourhood Data

I will start with **Paris** neighbourhoods.

First, the url that contains the data to be web scrabed will be assigned to a variable 'url1'.

In [3]:
url1 = 'https://www.worldpostalcodes.org/en/france/department/list-of-postal-codes-in-paris'

Data will then be pulled from the url and converted into text.

In [4]:
data1 = requests.get(url1).text


A beautifulsoap object will be created and then I will retrieve all the tables using the find_all method.

In [5]:
soup = BeautifulSoup(data1,"html5lib")
tables1 = soup.find_all('table')


The table of interest is extracted.

In [6]:
pd.read_html(str(tables1[0]), flavor='bs4')

[    Postal Code                  Place Name         Region Department  \
 0       75001.0    1er Arrondissement Paris  Île-de-France      Paris   
 1       75002.0   2ème Arrondissement Paris  Île-de-France      Paris   
 2       75003.0   3ème Arrondissement Paris  Île-de-France      Paris   
 3       75004.0   4ème Arrondissement Paris  Île-de-France      Paris   
 4       75005.0   5ème Arrondissement Paris  Île-de-France      Paris   
 5       75006.0   6ème Arrondissement Paris  Île-de-France      Paris   
 6       75007.0   7ème Arrondissement Paris  Île-de-France      Paris   
 7       75008.0   8ème Arrondissement Paris  Île-de-France      Paris   
 8       75009.0   9ème Arrondissement Paris  Île-de-France      Paris   
 9           NaN                         NaN            NaN        NaN   
 10      75011.0  11ème Arrondissement Paris  Île-de-France      Paris   
 11      75012.0  12ème Arrondissement Paris  Île-de-France      Paris   
 12      75013.0  13ème Arrondissement


Then, I will convert the table from a list to a panda dataframe.

In [7]:
table1 = pd.read_html(str(tables1[0]), flavor='bs4')[0]
table1

Unnamed: 0,Postal Code,Place Name,Region,Department,Arrondissement,Canton
0,75001.0,1er Arrondissement Paris,Île-de-France,Paris,Paris,Paris
1,75002.0,2ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
2,75003.0,3ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
3,75004.0,4ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
4,75005.0,5ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
5,75006.0,6ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
6,75007.0,7ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
7,75008.0,8ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
8,75009.0,9ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
9,,,,,,


In [8]:
table1 = table1.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
table1 = table1.reset_index(drop = True)
table1

Unnamed: 0,Postal Code,Place Name,Region,Department,Arrondissement,Canton
0,75001.0,1er Arrondissement Paris,Île-de-France,Paris,Paris,Paris
1,75002.0,2ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
2,75003.0,3ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
3,75004.0,4ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
4,75005.0,5ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
5,75006.0,6ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
6,75007.0,7ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
7,75008.0,8ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
8,75009.0,9ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris
9,75011.0,11ème Arrondissement Paris,Île-de-France,Paris,Paris,Paris


Since, we don't have the loation data of **Paris** neighbourhoods, I will add them manullly from this website and create a new dataframe.

In [9]:
paris_data = pd.DataFrame(np.array([['Paris 1er Arrondissement','75101', '48.8634724','2.3485682'], 
                                        ['Paris 2e Arrondissement','75102', '48.8675136','2.34401'],
                                        ['Paris 3e Arrondissement','75103', '48.8654263','2.3600595'],
                                        ['Paris 4e Arrondissement','75104', '48.8541559','2.3567892'],
                                        ['Paris 5e Arrondissement','75105', '48.8463943','2.3483645'],
                                        ['Paris 6e Arrondissement','75106', '48.848913','2.3392942'],
                                        ['Paris 7e Arrondissement','75107', '48.8547572','2.3247895'],
                                        ['Paris 8e Arrondissement','75108', '48.8736011','2.307613'],
                                        ['Paris 9e Arrondissement','75109', '48.880069','2.3319307'],
                                        ['Paris 10e Arrondissement','75110', '48.8792014','2.3543906'],
                                        ['Paris 11e Arrondissement','75111', '48.8642864','2.3739555'],
                                        ['Paris 12e Arrondissement','75112', '48.8395675','2.3959529'],
                                        ['Paris 13e Arrondissement','75113', '48.8308786','2.3558807'],
                                        ['Paris 14e Arrondissement','75114', '48.8377124','2.3324264'],
                                        ['Paris 15e Arrondissement','75115', '48.8363015','2.2826809'],
                                        ['Paris 16e Arrondissement','75116', '48.8564994','2.2748522'],
                                        ['Paris 17e Arrondissement','75117', '48.8762189','2.2896492'],
                                        ['Paris 18e Arrondissement','75118', '48.8834217','2.335236'],
                                        ['Paris 19e Arrondissement','75119', '48.8780763','2.3761978'],
                                        ['Paris 20e Arrondissement','75120', '48.8603441','2.4029441']
                                       ]),
                              columns=['Neighbourhood','postal_code', 'latitude','longitude'])
paris_data

Unnamed: 0,Neighbourhood,postal_code,latitude,longitude
0,Paris 1er Arrondissement,75101,48.8634724,2.3485682
1,Paris 2e Arrondissement,75102,48.8675136,2.34401
2,Paris 3e Arrondissement,75103,48.8654263,2.3600595
3,Paris 4e Arrondissement,75104,48.8541559,2.3567892
4,Paris 5e Arrondissement,75105,48.8463943,2.3483645
5,Paris 6e Arrondissement,75106,48.848913,2.3392942
6,Paris 7e Arrondissement,75107,48.8547572,2.3247895
7,Paris 8e Arrondissement,75108,48.8736011,2.307613
8,Paris 9e Arrondissement,75109,48.880069,2.3319307
9,Paris 10e Arrondissement,75110,48.8792014,2.3543906


In [10]:
paris_data['latitude'] = pd.to_numeric(paris_data['latitude'])
paris_data['longitude'] = pd.to_numeric(paris_data['longitude'])
paris_data

Unnamed: 0,Neighbourhood,postal_code,latitude,longitude
0,Paris 1er Arrondissement,75101,48.863472,2.348568
1,Paris 2e Arrondissement,75102,48.867514,2.34401
2,Paris 3e Arrondissement,75103,48.865426,2.36006
3,Paris 4e Arrondissement,75104,48.854156,2.356789
4,Paris 5e Arrondissement,75105,48.846394,2.348365
5,Paris 6e Arrondissement,75106,48.848913,2.339294
6,Paris 7e Arrondissement,75107,48.854757,2.32479
7,Paris 8e Arrondissement,75108,48.873601,2.307613
8,Paris 9e Arrondissement,75109,48.880069,2.331931
9,Paris 10e Arrondissement,75110,48.879201,2.354391


Now, let's get **London** neighbourhood information. I will redo the same steps I did for **Paris**.

In [11]:
url2 = 'https://www.doogal.co.uk/AdministrativeAreas.php?district=E09000001'

In [12]:
data2 = requests.get(url2).text

In [13]:
soup = BeautifulSoup(data2,"html5lib")
tables2 = soup.find_all('table')

In [14]:
pd.read_html(str(tables2[0]), flavor='bs4')

[     Postcode               Ward   Latitude  Longitude  Easting  Northing  \
 0      E1 6AN        Bishopsgate  51.518852  -0.078510   533416    181742   
 1      E1 7AA          Portsoken  51.515567  -0.075635   533625    181382   
 2      E1 7AD          Portsoken  51.515457  -0.076718   533550    181368   
 3      E1 7AE          Portsoken  51.515613  -0.076899   533537    181385   
 4      E1 7AF          Portsoken  51.515613  -0.076899   533537    181385   
 ..        ...                ...        ...        ...      ...       ...   
 195  EC1A 4NT              Cheap  51.516411  -0.097721   532090    181436   
 196  EC1A 4PS              Cheap  51.516411  -0.097721   532090    181436   
 197  EC1A 4WA              Cheap  51.516411  -0.097721   532090    181436   
 198  EC1A 7AA  Farringdon Within  51.516249  -0.101591   531822    181411   
 199  EC1A 7AB  Farringdon Within  51.519495  -0.098775   532008    181777   
 
      Grid ref  
 0    TQ334817  
 1    TQ336813  
 2    TQ335

In [15]:
table2 = pd.read_html(str(tables2[0]), flavor='bs4')[0]
table2

Unnamed: 0,Postcode,Ward,Latitude,Longitude,Easting,Northing,Grid ref
0,E1 6AN,Bishopsgate,51.518852,-0.078510,533416,181742,TQ334817
1,E1 7AA,Portsoken,51.515567,-0.075635,533625,181382,TQ336813
2,E1 7AD,Portsoken,51.515457,-0.076718,533550,181368,TQ335813
3,E1 7AE,Portsoken,51.515613,-0.076899,533537,181385,TQ335813
4,E1 7AF,Portsoken,51.515613,-0.076899,533537,181385,TQ335813
...,...,...,...,...,...,...,...
195,EC1A 4NT,Cheap,51.516411,-0.097721,532090,181436,TQ320814
196,EC1A 4PS,Cheap,51.516411,-0.097721,532090,181436,TQ320814
197,EC1A 4WA,Cheap,51.516411,-0.097721,532090,181436,TQ320814
198,EC1A 7AA,Farringdon Within,51.516249,-0.101591,531822,181411,TQ318814


In [16]:
London_data = table2.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
London_data = table2.reset_index(drop = True)
London_data

Unnamed: 0,Postcode,Ward,Latitude,Longitude,Easting,Northing,Grid ref
0,E1 6AN,Bishopsgate,51.518852,-0.078510,533416,181742,TQ334817
1,E1 7AA,Portsoken,51.515567,-0.075635,533625,181382,TQ336813
2,E1 7AD,Portsoken,51.515457,-0.076718,533550,181368,TQ335813
3,E1 7AE,Portsoken,51.515613,-0.076899,533537,181385,TQ335813
4,E1 7AF,Portsoken,51.515613,-0.076899,533537,181385,TQ335813
...,...,...,...,...,...,...,...
195,EC1A 4NT,Cheap,51.516411,-0.097721,532090,181436,TQ320814
196,EC1A 4PS,Cheap,51.516411,-0.097721,532090,181436,TQ320814
197,EC1A 4WA,Cheap,51.516411,-0.097721,532090,181436,TQ320814
198,EC1A 7AA,Farringdon Within,51.516249,-0.101591,531822,181411,TQ318814


In [17]:
London_data = London_data.rename(columns = {'Ward': 'Neighbourhood'})
London_data.head()

Unnamed: 0,Postcode,Neighbourhood,Latitude,Longitude,Easting,Northing,Grid ref
0,E1 6AN,Bishopsgate,51.518852,-0.07851,533416,181742,TQ334817
1,E1 7AA,Portsoken,51.515567,-0.075635,533625,181382,TQ336813
2,E1 7AD,Portsoken,51.515457,-0.076718,533550,181368,TQ335813
3,E1 7AE,Portsoken,51.515613,-0.076899,533537,181385,TQ335813
4,E1 7AF,Portsoken,51.515613,-0.076899,533537,181385,TQ335813


Now, I will import the geolocation library to help me retrieve the latitude and longitude of Paris and London.

### First, I will start with Paris location data.

In [18]:
from geopy.geocoders import Nominatim

In [19]:
address = 'Paris, PAR'

geolocator = Nominatim(user_agent="par_explorer")
location = geolocator.geocode(address)
latitude_Paris = location.latitude
longitude_Paris = location.longitude
print('The geograpical coordinate of Paris are {}, {}.'.format(latitude_Paris, longitude_Paris))

The geograpical coordinate of Paris are 48.8602244, 2.3335177595398875.


Then, I will import the visualization libraries.

In [20]:
import matplotlib.cm as cm
import matplotlib.colors as colors
!conda install -c conda-forge folium=0.5.0 --yes
import folium

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-OpenCE

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.2               |     pyhd8ed1ab_0          26 KB  conda-forge
    certifi-2021.5.30          |   py37h89c1867_0         141 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    openssl-1.1.1k             |       h7f98852_0         2.1 MB  conda-forge
    python_abi-3.7             |          2_cp37m           4 KB  conda-forge
    vincent-0.4.4              |           

Now, I will create a map of Paris that have the neighbourhoods on it.

In [21]:
map_paris = folium.Map(location=[latitude_Paris, longitude_Paris], zoom_start=10)

for lat, lng, Neighbourhood in zip(paris_data['latitude'], paris_data['longitude'],  paris_data['Neighbourhood'] ):
    label = '{}'.format(Neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)  
    
map_paris

### Now, let's get London location data.

In [22]:
address = 'London, LDN'

geolocator = Nominatim(user_agent="ldn_explorer")
location = geolocator.geocode(address)
latitude_London = location.latitude
longitude_London = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude_London, longitude_London))

The geograpical coordinate of London are 51.4806033, -0.1086343.


In [23]:
map_london = folium.Map(location=[latitude_London, longitude_London], zoom_start=10)

for lat, lng, Neighbourhood in zip(London_data['Latitude'], London_data['Longitude'],  London_data['Neighbourhood'] ):
    label = '{}'.format(Neighbourhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

# Methodology 

In this project, I will use the Foursquare API to explore the neighborhoods within each city. Then, I will use the DBSCAN, which is an unsupervised machine learning algorithm. DBSCAN will identify a number of clusters and outliers within neighborhoods. The DBSCAN did not work with this data because we don't have many data points and DBSCAN requires high-density data, So I will use Kmeans instead. 

First,  I will utilize the Foursquare API to explore the neighbourhoods and segment them.

In [24]:
CLIENT_ID = 'B0TGUAOTILJSBOU0TQ0K0CFXXT0JXH3B1G4NDQR1P0Z4MPUC'
CLIENT_SECRET = 'QYK513CD2HTSWL2RQ5Z03Q2AAON3DRT3GNSLSBPVLSYAJVPH'
VERSION = '20180605'
LIMIT = 100

#### Now, I will define a function that get the nearby venues for each neighbourhood.

In [25]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
    
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

## Fist, I will start with Paris data. 

In [26]:
Paris_venues = getNearbyVenues(names = paris_data['Neighbourhood'],
                                latitudes = paris_data['latitude'],
                                longitudes = paris_data['longitude']
                                 )

Paris 1er Arrondissement


KeyError: 'groups'

In [None]:
Paris_venues.head()

Now, let's check how many venues are for each neoghbourhood.

In [None]:
Paris_venues.groupby('Neighbourhood').count()

Let's check how many unique venues categories do we have.

In [None]:
print('There are {} uniques categories.'.format(len(Paris_venues['Venue Category'].unique())))

#### Now, let's analyze each neighbourhood individually to see venues within and their categories.

In [None]:
Paris_onehot = pd.get_dummies(Paris_venues[['Venue Category']], prefix="", prefix_sep="")

Paris_onehot['Neighbourhood'] = Paris_venues['Neighbourhood'] 


fixed_columns = [Paris_onehot.columns[-1]] + list(Paris_onehot.columns[:-1])
Paris_onehot = Paris_onehot[fixed_columns]

Paris_onehot.head()

In [None]:
Paris_onehot.shape

Now, let's group the resulted dataframe by neighbourhood and take the mean and the frequency of each category.

In [None]:
Paris_grouped = Paris_onehot.groupby('Neighbourhood').mean().reset_index()
Paris_grouped

In [None]:
Paris_grouped.shape

### Now, let's create a dataframe that have top 10 most common venues for each neighbourhood.

First I will a create a function that sorts venues in descending order.

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighbourhoods_venues_sorted_Paris = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted_Paris['Neighbourhood'] = Paris_grouped['Neighbourhood']

for ind in np.arange(Paris_grouped.shape[0]):
    neighbourhoods_venues_sorted_Paris.iloc[ind, 1:] = return_most_common_venues(Paris_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted_Paris.head()

## Now, I will redo the same analysis for **London**. 

In [None]:
London_venues = getNearbyVenues(names = London_data['Neighbourhood'],
                                latitudes = London_data['Latitude'],
                                longitudes = London_data['Longitude']
                                 )

In [None]:
London_venues.head()

In [None]:
London_venues.groupby('Neighbourhood').count()

In [None]:
print('There are {} uniques categories.'.format(len(London_venues['Venue Category'].unique())))

In [None]:
London_onehot = pd.get_dummies(London_venues[['Venue Category']], prefix="", prefix_sep="")

London_onehot['Neighbourhood'] = London_venues['Neighbourhood'] 


fixed_columns = [London_onehot.columns[-1]] + list(London_onehot.columns[:-1])
London_onehot = London_onehot[fixed_columns]

London_onehot.head()

In [None]:
London_onehot.shape

In [None]:
London_grouped = London_onehot.groupby('Neighbourhood').mean().reset_index()
London_grouped

In [None]:
London_grouped.shape

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighbourhoods_venues_sorted_London = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted_London['Neighbourhood'] = London_grouped['Neighbourhood']

for ind in np.arange(London_grouped.shape[0]):
    neighbourhoods_venues_sorted_London.iloc[ind, 1:] = return_most_common_venues(London_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted_London.head()

# Analysis 

# Clustering the neighbourhoods

Now, I will cluster the neighbourhoods in each city using the DBSCAN algorithm. 

In [None]:
from sklearn.cluster import  KMeans
from sklearn.preprocessing import StandardScaler

#### First, I will start with Paris data. 

In [None]:
kclusters = 4
Paris_grouped_clustering = Paris_grouped.drop('Neighbourhood', 1)

Paris_grouped_clustering = StandardScaler().fit_transform(Paris_grouped_clustering)

kmeans = KMeans(init="k-means++", n_clusters=kclusters, random_state=0).fit(Paris_grouped_clustering)

kmeans.labels_[0:10]


Now, let's create a datafame that include the cluster and the top 10 venues for each neighbourhood.

In [None]:
neighbourhoods_venues_sorted_Paris.insert(0, 'Cluster Labels', kmeans.labels_)

Paris_merged = paris_data

Paris_merged = Paris_merged.join(neighbourhoods_venues_sorted_Paris.set_index('Neighbourhood'), on='Neighbourhood')

Paris_merged.head()

Now, let's visiualize the results.

In [None]:
map_clusters_Paris = folium.Map(location=[latitude_Paris, longitude_Paris], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(Paris_merged['latitude'], Paris_merged['longitude'], Paris_merged['Neighbourhood'], Paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_Paris)
       
map_clusters_Paris

Now, let'ts examine each cluster and determine what venues best describe this category.

## Cluster 1

In [None]:
Paris_merged.loc[Paris_merged['Cluster Labels'] == 0, Paris_merged.columns[[0] + list(range(4, Paris_merged.shape[1]))]]

## Cluster 2

In [None]:
Paris_merged.loc[Paris_merged['Cluster Labels'] == 1, Paris_merged.columns[[0] + list(range(4, Paris_merged.shape[1]))]]

## Cluster 3

In [None]:
Paris_merged.loc[Paris_merged['Cluster Labels'] == 2, Paris_merged.columns[[0] + list(range(4, Paris_merged.shape[1]))]]

## Cluster 4

In [None]:
Paris_merged.loc[Paris_merged['Cluster Labels'] == 3, Paris_merged.columns[[0] + list(range(4, Paris_merged.shape[1]))]]

## Now, I will redo the same analysis for London.

In [None]:
kclusters = 4
London_grouped_clustering = London_grouped.drop('Neighbourhood', 1)

London_grouped_clustering = StandardScaler().fit_transform(Paris_grouped_clustering)

kmeans = KMeans(init="k-means++", n_clusters=kclusters, random_state=0).fit(London_grouped_clustering)

kmeans.labels_[0:10]


In [None]:
neighbourhoods_venues_sorted_London.insert(0, 'Cluster Labels1', kmeans.labels_)

London_merged = London_data

London_merged = London_merged.join(neighbourhoods_venues_sorted_London.set_index('Neighbourhood'), on='Neighbourhood')

London_merged.head()

In [None]:
map_clusters_London = folium.Map(location=[latitude_London, longitude_London], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(London_merged['latitude'], London_merged['longitude'], London_merged['Neighbourhood'], London_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_Paris)
       
map_clusters_London

## Cluster 1

In [None]:
London_merged.loc[London_merged['Cluster Labels'] == 0, London_merged.columns[[0] + list(range(4, London_merged.shape[1]))]]

## Cluster 2

In [None]:
London_merged.loc[London_merged['Cluster Labels'] == 1, London_merged.columns[[0] + list(range(4, London_merged.shape[1]))]]

## Cluster 3

In [None]:
London_merged.loc[London_merged['Cluster Labels'] == 2, London_merged.columns[[0] + list(range(4, London_merged.shape[1]))]]

## Cluster 4

In [None]:
London_merged.loc[London_merged['Cluster Labels'] == 3, London_merged.columns[[0] + list(range(4, London_merged.shape[1]))]]

# Conclusion

In this project, I tried to compare the neighbourhoods of Paris and London. I used web scraping to get neighbourhood data for the cities. Then, I used the Foursquare API to discover venues within neighbourhoods. After that, I tried to use the DBSCAN algorithm to cluster the neighbourhoods based on their latitude, longitude, and the top 10 venues. Unfortunately, due to the low density of the data points, DBSCAN did not provide and meaningful results. So, I used k-means instead. 

**Note: In the final execution of the notebook, I faced problems with Foursquare API; from the API not from my side; so I couldn't get the output cells in the final notebook. Please take this into consideration when grading my work. Thank you.**