# Capstone Project - The Battle of the Neighborhoods: Prague vs Paris+Saint Denis
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Analysis](#data)
* [Data Visualization](#methodology)
* [Clustering the neighborhoods](#analysis)
* [Exploring the clusters, discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The aim of this Capstone project is to compare two cities: Paris and Prague, my hometown. Both are capital cities, and their centres are similar and attractive to tourists. For this project I define Paris as the centre of Paris plus the adjacent area of Seine-Saint-Denis (called Saint Denis below); the whole Paris agglomeration would be too large for this project. The centre of Paris is distinctive: it is historical, very expensive to live in, and full of cafés, museums, tourist attractions and so on. Saint Denis is an area where many people live. It has many stadiums and apartment blocks, as well as cheap hotels and hostels. Prague has a similar layout: a very expensive city centre and, further from the centre, many apartment blocks, stadiums and so on. I know both cities from personal experience and I think they are similar enough to make a good comparison.
This project should help people decide which city to visit, how many tourist attractions there are, and how long to stay. It can also help people who want to move to a different neighborhood within their city, or who are thinking about relocating to one of these cities. The idea is to look up venues in the different neighborhoods, cluster the neighborhoods by their venues, and compare them.

## Data <a name="data"></a>

First, let's import all the necessary libraries.

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 


from IPython.display import display_html

# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install opendatasets
import opendatasets as od

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed
Libraries imported.


### Prague Dataset <a name="introduction"></a>

We need geographical coordinates for the neighborhoods of Paris and Prague. 
For the Prague neighborhoods I created a CSV dataset, web-scraped from Wikipedia. The dataset is available here: https://www.kaggle.com/konecfil/prague-neighborhoods-dataset. The first column is the name of the neighborhood; the second and third are Lat and Lon, respectively. We will use these coordinates as the centroids of the Prague neighborhoods. 



We will download it with od.download("https://www.kaggle.com/konecfil/prague-neighborhoods-dataset"). It asks for your Kaggle username and key, which you can find on your Kaggle account page (you have to create an account first, then click "your account"). The download creates a new directory containing the file. 
The call only works after "!pip install opendatasets" and "import opendatasets as od" have been run.



In [None]:
od.download("https://www.kaggle.com/konecfil/prague-neighborhoods-dataset")

In [77]:
prague_data=pd.read_csv('prague neighborhoods.csv')
prague_data.head(10)


Unnamed: 0,Neighborhood,Lat,Lon
0,Prague 1,50.086389,14.411111
1,Prague 2,50.074167,14.442778
2,Prague 3,50.084444,14.454167
3,Prague 4,50.062222,14.440278
4,Prague 5,50.06,14.393333
5,Prague 6,50.100833,14.394722
6,Prague 7,50.100556,14.435556
7,Prague 8,50.107778,14.471389
8,Prague 9,50.110556,14.5
9,Prague 10,50.06667,14.46417


We have 57 neighborhoods in Prague.

### Paris Dataset <a name="introduction"></a>

The Paris dataset is available here: https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e. The JSON file covers the whole of France, so we have to limit it to Paris only. The columns we use are:
* postal_code: postal codes for France
* nom_comm: names of the neighborhoods in France
* nom_dept: names of the boroughs
* geo_point_2d: pair containing the latitude and longitude of the neighborhood

### Foursquare API <a name="introduction"></a>

For the locations of venues we will use the Foursquare API, which provides information about venues in the neighborhoods within an area of interest. We will use a radius of 500 meters around each neighborhood centroid. The Foursquare API is the only data source we use for venue data. 

### Data preprocessing <a name="introduction"></a>

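We download the JSON file and save it as france-data.json. The file name is written without single quotes in both commands, because some shells keep single quotes as part of the name, and the name passed to pd.read_json must match the file on disk exactly.

In [3]:
!wget -q -O france-data.json https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
print("Data Downloaded!")
paris_raw = pd.read_json("france-data.json")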
paris_raw.head()

Data Downloaded!


Unnamed: 0,datasetid,recordid,fields,geometry,record_timestamp
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,"{'code_comm': '645', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.2517129721...",2016-09-21T00:29:06.175+02:00
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,"{'code_comm': '133', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.0529405055...",2016-09-21T00:29:06.175+02:00
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,"{'code_comm': '378', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.1971816504...",2016-09-21T00:29:06.175+02:00
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,"{'code_comm': '243', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [2.7097808131...",2016-09-21T00:29:06.175+02:00
4,correspondances-code-insee-code-postal,1bbcee92101fdb50f5f5fceb052681f2421ff961,"{'code_comm': '414', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.2582355268...",2016-09-21T00:29:06.175+02:00


In [4]:
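# build one row per record from the list of `fields` dictionaries
# (DataFrame.append was removed in pandas 2.0; columns are sorted by name to match the table below)
paris_field_data = pd.DataFrame(paris_raw['fields'].tolist()).sort_index(axis=1)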
 
paris_field_data.head()

Unnamed: 0,code_arr,code_cant,code_comm,code_dept,code_reg,geo_point_2d,geo_shape,id_geofla,insee_com,nom_comm,nom_dept,nom_region,population,postal_code,statut,superficie,z_moyen
0,3,3,645,91,11,"[48.750443119964764, 2.251712972144151]","{'type': 'Polygon', 'coordinates': [[[2.238024...",16275,91645,VERRIERES-LE-BUISSON,ESSONNE,ILE-DE-FRANCE,15.5,91370,Commune simple,999.0,121.0
1,3,20,133,77,11,"[48.41256065214989, 3.052940505560729]","{'type': 'Polygon', 'coordinates': [[[3.076046...",31428,77133,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,ILE-DE-FRANCE,0.2,77126,Commune simple,1082.0,88.0
2,1,9,378,91,11,"[48.52726809075556, 2.19718165044305]","{'type': 'Polygon', 'coordinates': [[[2.203466...",30975,91378,MAUCHAMPS,ESSONNE,ILE-DE-FRANCE,0.3,91730,Commune simple,313.0,150.0
3,5,14,243,77,11,"[48.87307018579678, 2.7097808131278462]","{'type': 'Polygon', 'coordinates': [[[2.727542...",17000,77243,LAGNY-SUR-MARNE,SEINE-ET-MARNE,ILE-DE-FRANCE,20.2,77400,Chef-lieu canton,579.0,71.0
4,3,25,414,77,11,"[48.62891464105825, 3.2582355268439223]","{'type': 'Polygon', 'coordinates': [[[3.294591...",34949,77414,SAINT-HILLIERS,SEINE-ET-MARNE,ILE-DE-FRANCE,0.4,77160,Commune simple,1907.0,158.0
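Each `fields` entry is a dictionary that becomes one row of the table. As a minimal sketch of that idea, the two records below are made up for illustration (they are not taken from the dataset) and are passed to `pd.DataFrame` as a list in a single call:

```python
import pandas as pd

# Two hypothetical records shaped like the `fields` entries (illustrative values only)
sample_fields = [
    {"nom_comm": "COMMUNE-A", "nom_dept": "PARIS", "geo_point_2d": [48.85, 2.35]},
    {"nom_comm": "COMMUNE-B", "nom_dept": "ESSONNE", "geo_point_2d": [48.75, 2.25]},
]

# One row per dictionary, one column per key
sample_df = pd.DataFrame(sample_fields)
print(sample_df.shape)
```

Building the frame in one call avoids growing it row by row, which gets slow on a file with many records.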


In [5]:
df_2 = paris_field_data[['postal_code','nom_comm','nom_dept','geo_point_2d']]

Then we filter the dataset to keep only the rows whose nom_dept is 'PARIS' or 'SEINE-SAINT-DENIS'. 

In [6]:
df_denis = df_2[df_2['nom_dept'].str.contains('SEINE-SAINT-DENIS')].reset_index(drop=True)
df_paris_ = df_2[df_2['nom_dept'].str.contains('PARIS')].reset_index(drop=True)
df_paris=pd.concat([df_denis,df_paris_])
df_paris.head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,93410,VAUJOURS,SEINE-SAINT-DENIS,"[48.932477260516166, 2.58100257040038]"
1,93250,VILLEMOMBLE,SEINE-SAINT-DENIS,"[48.884837002092105, 2.508934060353894]"
2,93270,SEVRAN,SEINE-SAINT-DENIS,"[48.93860701530393, 2.531240575670606]"
3,93450,L'ILE-SAINT-DENIS,SEINE-SAINT-DENIS,"[48.93956937690977, 2.325452527639678]"
4,93140,BONDY,SEINE-SAINT-DENIS,"[48.9023234526246, 2.483727693897052]"
5,93430,VILLETANEUSE,SEINE-SAINT-DENIS,"[48.957297650147964, 2.345066336514906]"
6,93120,LA COURNEUVE,SEINE-SAINT-DENIS,"[48.93225695457796, 2.399780648014392]"
7,93370,MONTFERMEIL,SEINE-SAINT-DENIS,"[48.898261667942165, 2.567143547956258]"
8,93800,EPINAY-SUR-SEINE,SEINE-SAINT-DENIS,"[48.95501320616889, 2.3145304323082883]"
9,93170,BAGNOLET,SEINE-SAINT-DENIS,"[48.86908363081595, 2.4227409668793163]"
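The same kind of filter can also be written as a single exact-match test. The sketch below uses a made-up miniature table (the commune names are placeholders) and `Series.isin`. Unlike `str.contains`, which matches substrings, `isin` keeps only exact department names, and it preserves the original row order instead of listing one department first.

```python
import pandas as pd

# Hypothetical miniature of df_2 (illustrative rows only)
toy = pd.DataFrame({
    "nom_comm": ["A", "B", "C", "D"],
    "nom_dept": ["PARIS", "ESSONNE", "SEINE-SAINT-DENIS", "SEINE-ET-MARNE"],
})

wanted = ["SEINE-SAINT-DENIS", "PARIS"]

# Keep rows whose department is exactly one of the wanted names
toy_filtered = toy[toy["nom_dept"].isin(wanted)].reset_index(drop=True)
print(toy_filtered)
```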


Now we split geo_point_2d into separate lat and lon columns. 

In [7]:
paris_lat = df_paris['geo_point_2d'].apply(lambda x: x[0])
paris_lng = df_paris['geo_point_2d'].apply(lambda x: x[1])
paris_combined_data=df_paris


In [8]:
paris_combined_data["lat"]=paris_lat
paris_combined_data["lon"]=paris_lng
paris_combined_data.drop("geo_point_2d",axis=1)

Unnamed: 0,postal_code,nom_comm,nom_dept,lat,lon
0,93410,VAUJOURS,SEINE-SAINT-DENIS,48.932477,2.581003
1,93250,VILLEMOMBLE,SEINE-SAINT-DENIS,48.884837,2.508934
2,93270,SEVRAN,SEINE-SAINT-DENIS,48.938607,2.531241
3,93450,L'ILE-SAINT-DENIS,SEINE-SAINT-DENIS,48.939569,2.325453
4,93140,BONDY,SEINE-SAINT-DENIS,48.902323,2.483728
5,93430,VILLETANEUSE,SEINE-SAINT-DENIS,48.957298,2.345066
6,93120,LA COURNEUVE,SEINE-SAINT-DENIS,48.932257,2.399781
7,93370,MONTFERMEIL,SEINE-SAINT-DENIS,48.898262,2.567144
8,93800,EPINAY-SUR-SEINE,SEINE-SAINT-DENIS,48.955013,2.31453
9,93170,BAGNOLET,SEINE-SAINT-DENIS,48.869084,2.422741
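As an alternative sketch of the split, the pair column can be expanded into two columns in one step. The two rows below reuse the rounded coordinates shown in the table; the variable names `toy` and `coords` are only for illustration. Note that `drop` returns a new DataFrame, so its result has to be kept for the column to actually disappear.

```python
import pandas as pd

toy = pd.DataFrame({
    "nom_comm": ["BONDY", "BAGNOLET"],
    "geo_point_2d": [[48.902323, 2.483728], [48.869084, 2.422741]],
})

# Expand each [lat, lon] pair into two named columns
coords = pd.DataFrame(toy["geo_point_2d"].tolist(),
                      columns=["lat", "lon"], index=toy.index)

# drop() does not modify `toy` in place; the result is combined with the new columns
toy = pd.concat([toy.drop(columns="geo_point_2d"), coords], axis=1)
print(toy)
```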


We now have 60 neighborhoods for Paris and 57 for Prague, so the number of neighborhoods is almost the same.


## Visualization <a name="methodology"></a>

Let's visualize the neighborhoods of Prague. We will use the Folium library for visualization.

In [79]:
map_prague = folium.Map(location=[50.083333, 14.416667],zoom_start=11)

for lat,lng,neighborhood in zip(prague_data["Lat"],prague_data["Lon"],prague_data["Neighborhood"]):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_prague)
map_prague

Now let's visualize the Paris neighborhoods.


In [80]:
map_paris = folium.Map(location=[48.856613, 2.352222],zoom_start=11)

for lat,lng,neighborhood in zip(paris_combined_data['lat'],paris_combined_data['lon'],paris_combined_data['nom_comm']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_paris)
map_paris

The maps show a similar number of neighborhoods in both cities: 57 in Prague and 60 in Paris. 

Now we will use the Foursquare API to find out which venues are in each neighborhood, using a radius of 500 metres. First we have to define the client ID, client secret and API version.

In [11]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


Let's create a function to get nearby venues. 

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
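The parsing step of this function can be tried without calling the API. The sketch below runs on a made-up, heavily trimmed item list shaped like `response['groups'][0]['items']`. The helper name `extract_venue_rows` is hypothetical, and whether the API ever returns a venue with an empty `categories` list is an assumption; the sketch only shows how such a case could be handled without an `IndexError`.

```python
# Hypothetical, trimmed items shaped like response['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 50.08, "lng": 14.41},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unnamed spot", "location": {"lat": 50.09, "lng": 14.42},
               "categories": []}},  # assumed edge case: no category attached
]

def extract_venue_rows(name, lat, lng, items):
    """Turn raw items into (neighborhood, lat, lng, venue, venue lat, venue lng, category) tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of failing on an empty category list
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

rows = extract_venue_rows("Prague 1", 50.086389, 14.411111, sample_items)
print(rows)
```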

In [13]:
prague_venues = getNearbyVenues(names=prague_data['Neighborhood'],
                                   latitudes=prague_data['Lat'],
                                   longitudes=prague_data['Lon']
                                  )

Prague 1
Prague 2
Prague 3
Prague 4
Prague 5
Prague 6
Prague 7
Prague 8
Prague 9
Prague 10
Prague 11
Prague 12
Prague 13
Prague 14
Prague 15
Prague 16
Prague 17
Prague 18
Prague 19
Prague 20
Prague 21
Prague 22
Prague Bechovice 
Prague Benice
Prague Brezineves
Prague Cakovice
Prague Dablice
Prague Dolni Chabry
Prague Dolni Mecholupy
Prague Dolni Pocernice
Prague Dubec
Prague Klanovice
Prague Kolodeje
Prague Kolovraty
Prague Kralovice
Prague Kreslice
Prague Kunratice
Prague Libus
Prague Lipence
Prague Lochkov
Prague Lysolaje
Prague Nebusice
Prague Nedvezi
Prague Petrovice
Prague Predni Kopanina
Prague Reporyje
Prague Satalice
Prague Slivenec
Prague Suchdol
Prague Seberov
Prague Sterboholy
Prague Troja
Prague Ujezd
Prague Velka Chuchle
Prague Vinor
Prague Zbraslav
Prague Zlicin


There are 1141 venues in Prague's neighborhoods (the table below is indexed from 0 to 1140).

In [14]:
prague_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Prague 1,50.086389,14.411111,Karlův most | Charles Bridge (Karlův most),50.086480,14.411442,Bridge
1,Prague 1,50.086389,14.411111,Staroměstská mostecká věž,50.086177,14.413569,Monument / Landmark
2,Prague 1,50.086389,14.411111,Mlýnec,50.085389,14.413620,Mediterranean Restaurant
3,Prague 1,50.086389,14.411111,Kampa Park,50.087364,14.409678,Modern European Restaurant
4,Prague 1,50.086389,14.411111,Shakespeare & synové,50.087617,14.408628,Bookstore
...,...,...,...,...,...,...,...
1137,Prague Zlicin,50.061667,14.278333,Stezka Sobín-->Zličín,50.060649,14.281544,Trail
1138,Prague Zlicin,50.061667,14.278333,Pískoviště u domu M Zličín,50.058996,14.281978,Playground
1139,Prague Zlicin,50.061667,14.278333,Arri Rental Zličín,50.058059,14.278046,Video Store
1140,Prague Zlicin,50.061667,14.278333,U rybnicku,50.062550,14.284280,Hot Spring


Let's see how many venues there are in each neighborhood.

In [85]:
prague_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Prague 1,81,81,81,81,81,81
Prague 10,39,39,39,39,39,39
Prague 11,27,27,27,27,27,27
Prague 12,17,17,17,17,17,17
Prague 13,23,23,23,23,23,23
Prague 14,7,7,7,7,7,7
Prague 15,16,16,16,16,16,16
Prague 16,29,29,29,29,29,29
Prague 17,16,16,16,16,16,16
Prague 18,14,14,14,14,14,14
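Because `count()` repeats the same number in every column, a single count per neighborhood may be easier to read. This is a minimal sketch on made-up rows (the venue names are placeholders), using `groupby(...).size()` and sorting so the busiest neighborhoods come first:

```python
import pandas as pd

# Hypothetical miniature of the venues table
toy_venues = pd.DataFrame({
    "Neighborhood": ["Prague 1", "Prague 1", "Prague 14", "Prague 1"],
    "Venue": ["V1", "V2", "V3", "V4"],
})

# One number per neighborhood, largest first
venue_counts = toy_venues.groupby("Neighborhood").size().sort_values(ascending=False)
print(venue_counts)
```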


In [16]:
paris_venues = getNearbyVenues(names=paris_combined_data['nom_comm'],
                                   latitudes=paris_combined_data['lat'],
                                   longitudes=paris_combined_data['lon']
                                  )


paris_venues

VAUJOURS
VILLEMOMBLE
SEVRAN
L'ILE-SAINT-DENIS
BONDY
VILLETANEUSE
LA COURNEUVE
MONTFERMEIL
EPINAY-SUR-SEINE
BAGNOLET
TREMBLAY-EN-FRANCE
BOBIGNY
COUBRON
LIVRY-GARGAN
LE PRE-SAINT-GERVAIS
PIERREFITTE-SUR-SEINE
GOURNAY-SUR-MARNE
NOISY-LE-GRAND
LE BOURGET
MONTREUIL
LES LILAS
DRANCY
PANTIN
LE RAINCY
SAINT-OUEN
AUBERVILLIERS
SAINT-DENIS
LES PAVILLONS-SOUS-BOIS
DUGNY
ROSNY-SOUS-BOIS
ROMAINVILLE
NEUILLY-PLAISANCE
STAINS
CLICHY-SOUS-BOIS
LE BLANC-MESNIL
GAGNY
NEUILLY-SUR-MARNE
AULNAY-SOUS-BOIS
NOISY-LE-SEC
VILLEPINTE
PARIS-9E-ARRONDISSEMENT
PARIS-2E-ARRONDISSEMENT
PARIS-11E-ARRONDISSEMENT
PARIS-8E-ARRONDISSEMENT
PARIS-13E-ARRONDISSEMENT
PARIS-12E-ARRONDISSEMENT
PARIS-3E-ARRONDISSEMENT
PARIS-6E-ARRONDISSEMENT
PARIS-4E-ARRONDISSEMENT
PARIS-10E-ARRONDISSEMENT
PARIS-16E-ARRONDISSEMENT
PARIS-5E-ARRONDISSEMENT
PARIS-19E-ARRONDISSEMENT
PARIS-20E-ARRONDISSEMENT
PARIS-7E-ARRONDISSEMENT
PARIS-18E-ARRONDISSEMENT
PARIS-17E-ARRONDISSEMENT
PARIS-15E-ARRONDISSEMENT
PARIS-1ER-ARRONDISSEMENT
PARIS-14E-ARRONDISSE

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,VAUJOURS,48.932477,2.581003,Casino supermarché,48.935020,2.580431,Supermarket
1,VAUJOURS,48.932477,2.581003,Arrêt Alsace [Bus 8],48.934929,2.583707,Bus Stop
2,VILLEMOMBLE,48.884837,2.508934,BP,48.885230,2.504930,Gas Station
3,VILLEMOMBLE,48.884837,2.508934,Marché Villemomble Outrebon,48.887701,2.511067,Market
4,VILLEMOMBLE,48.884837,2.508934,Parc De La Garenne,48.882743,2.504914,Park
...,...,...,...,...,...,...,...
1503,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Laverie,48.824721,2.328518,Laundromat
1504,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Parc Hotel Paris,48.824567,2.326784,Hotel
1505,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Dog Club,48.829572,2.333786,Pet Store
1506,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Bonjour Bakery,48.830163,2.333681,Bakery


Paris has 1507 venues in its neighborhoods (the table above is indexed from 0 to 1506).

In [86]:
paris_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
AUBERVILLIERS,9,9,9,9,9,9
AULNAY-SOUS-BOIS,4,4,4,4,4,4
BAGNOLET,6,6,6,6,6,6
BOBIGNY,4,4,4,4,4,4
BONDY,7,7,7,7,7,7
CLICHY-SOUS-BOIS,4,4,4,4,4,4
COUBRON,1,1,1,1,1,1
DRANCY,4,4,4,4,4,4
EPINAY-SUR-SEINE,7,7,7,7,7,7
GAGNY,6,6,6,6,6,6


Let's see how many unique categories there are.


In [18]:
print('There are {} unique categories in Prague.'.format(len(prague_venues['Venue Category'].unique())))
print('There are {} unique categories in Paris.'.format(len(paris_venues['Venue Category'].unique())))

There are 230 unique categories in Prague.
There are 237 unique categories in Paris.
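Since the goal is to compare the two cities, it can also help to see which categories they share. The sketch below uses two short made-up category lists (not the real venue data) and plain set operations; `nunique()` gives the same count as `len(... .unique())`.

```python
import pandas as pd

# Hypothetical category columns for the two cities
toy_prague = pd.Series(["Café", "Pub", "Park", "Café"], name="Venue Category")
toy_paris = pd.Series(["Café", "Bakery", "Park"], name="Venue Category")

# Categories present in both cities, and those found only in one of them
shared = sorted(set(toy_prague) & set(toy_paris))
only_prague = sorted(set(toy_prague) - set(toy_paris))

print(toy_prague.nunique(), shared, only_prague)
```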


### Analyzing each neighborhood <a name="introduction"></a>

One hot encoding of venue categories

In [19]:
# one hot encoding
prague_onehot = pd.get_dummies(prague_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
prague_onehot['Neighborhood'] = prague_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [prague_onehot.columns[-1]] + list(prague_onehot.columns[:-1])
prague_onehot = prague_onehot[fixed_columns]

prague_onehot.head(5)

Unnamed: 0,Neighborhood,ATM,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,...,Vehicle Inspection Station,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Paris one hot encoding

In [20]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,VAUJOURS,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,VAUJOURS,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,VILLEMOMBLE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,VILLEMOMBLE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,VILLEMOMBLE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [43]:
prague_grouped = prague_onehot.groupby('Neighborhood').mean().reset_index()
prague_grouped.head()

Unnamed: 0,Neighborhood,ATM,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,...,Vehicle Inspection Station,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Prague 1,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.012346,0.0,0.037037,0.024691,0.012346,0.012346,0.0
1,Prague 10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,...,0.0,0.025641,0.0,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0
2,Prague 11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0
3,Prague 12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,...,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Prague 13,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0


In [44]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped.head(10)

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,AUBERVILLIERS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AULNAY-SOUS-BOIS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BAGNOLET,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BOBIGNY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BONDY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,CLICHY-SOUS-BOIS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,COUBRON,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,DRANCY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,EPINAY-SUR-SEINE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,GAGNY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
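To make the meaning of these numbers concrete, here is a small sketch of the one-hot and group-mean steps on made-up data (two placeholder neighborhoods, six venues). Each value in the grouped table is the share of a neighborhood's venues that fall in that category, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venues: N1 has four venues, N2 has two
toy_venues = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N1", "N1", "N2", "N2"],
    "Venue Category": ["Café", "Café", "Pub", "Park", "Pub", "Pub"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of the indicators = frequency of each category within the neighborhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```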


First, let's write a function to sort the venues in descending order.


In [45]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
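A quick check of this function on a single made-up row (the frequencies are illustrative) shows what it returns. The first entry of the row is the neighborhood name, which is skipped by `row.iloc[1:]`.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical grouped row with three category frequencies
toy_row = pd.Series({"Neighborhood": "N1", "Café": 0.5, "Park": 0.2, "Pub": 0.3})
top_two = return_most_common_venues(toy_row, 2)
print(list(top_two))
```

When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining positions are filled with categories whose frequency is 0, so the lower-ranked columns should be read with care for neighborhoods with few venues.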

Top venues for Prague Neighborhoods.

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
prague_venues_sorted = pd.DataFrame(columns=columns)
prague_venues_sorted['Neighborhood'] = prague_grouped['Neighborhood']

for ind in np.arange(prague_grouped.shape[0]):
    prague_venues_sorted.iloc[ind, 1:] = return_most_common_venues(prague_grouped.iloc[ind, :], num_top_venues)

prague_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,Café,Hotel,Park,Pub,Theater,Beer Bar,Plaza,Italian Restaurant,Restaurant,Waterfront
1,Prague 10,Stadium,Sporting Goods Shop,Drugstore,Fried Chicken Joint,Leather Goods Store,Bookstore,Fountain,Bus Station,Sandwich Place,Bus Stop
2,Prague 11,Supermarket,Pizza Place,Bus Stop,Park,Grocery Store,Bakery,Restaurant,Gym,Flower Shop,Food & Drink Shop
3,Prague 12,Restaurant,Ski Shop,Scenic Lookout,Salon / Barbershop,Electronics Store,Music Store,Bookstore,Dessert Shop,Tram Station,Stadium
4,Prague 13,Gastropub,Reservoir,Sushi Restaurant,Bistro,Market,Bus Stop,Salad Place,Restaurant,Indian Restaurant,Theme Restaurant
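The column names can also be generated with a small ordinal helper instead of a suffix list plus an exception handler. The function `ordinal` below is a hypothetical helper, not part of the notebook; it additionally handles 11th, 12th and 13th, which matters only if `num_top_venues` is raised above 10.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```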


Top venues for Paris Neighborhoods.

In [94]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_venues_sorted = pd.DataFrame(columns=columns)
paris_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    paris_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

paris_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AUBERVILLIERS,French Restaurant,Market,Park,Grocery Store,Fast Food Restaurant,Pizza Place,Coffee Shop,Supermarket,Theater,Fish Market
1,AULNAY-SOUS-BOIS,Gas Station,Kebab Restaurant,Chinese Restaurant,Thai Restaurant,Falafel Restaurant,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
2,BAGNOLET,Furniture / Home Store,French Restaurant,Hotel,Bar,Electronics Store,Fast Food Restaurant,Falafel Restaurant,Frozen Yogurt Shop,Fountain,Food & Drink Shop
3,BOBIGNY,Performing Arts Venue,Supermarket,Chinese Restaurant,Tram Station,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit
4,BONDY,Bank,Pet Store,French Restaurant,Supermarket,Electronics Store,Middle Eastern Restaurant,Pizza Place,Farmers Market,Fast Food Restaurant,Fish & Chips Shop


## Clustering Neighborhoods <a name="analysis"></a>

I will create 5 clusters using the K-means algorithm, first for Prague.



In [48]:
# set number of clusters
kclusters = 5

prague_grouped_clustering = prague_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(prague_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 3, 3, 3, 3, 1, 1, 3, 1, 1, 3, 3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3,
       1, 3, 3, 1, 2, 1, 1, 3, 1, 3, 1, 1, 0, 1, 3, 1, 1, 1, 1, 1, 3, 1,
       2, 3, 1, 1, 3, 3, 3, 3, 1, 1, 1, 3, 4])
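Before interpreting the clusters it is useful to know how many neighborhoods fall into each one. The sketch below copies the label array printed above into a hypothetical variable `prague_labels` and counts the labels with `np.bincount`:

```python
import numpy as np

# Cluster labels copied from the k-means output above
prague_labels = np.array([
    3, 3, 3, 3, 3, 1, 1, 3, 1, 1, 3, 3, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3,
    1, 3, 3, 1, 2, 1, 1, 3, 1, 3, 1, 1, 0, 1, 3, 1, 1, 1, 1, 1, 3, 1,
    2, 3, 1, 1, 3, 3, 3, 3, 1, 1, 1, 3, 4,
])

# Position i holds the number of neighborhoods assigned to cluster i
cluster_sizes = np.bincount(prague_labels)
print(cluster_sizes)
```

The counts are very uneven: clusters 1 and 3 hold 26 and 27 of the 57 neighborhoods, cluster 2 holds two, and clusters 0 and 4 contain a single neighborhood each.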

In [49]:
prague_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

prague_merged = prague_data

# merge prague_grouped with prague_data to add latitude/longitude for each neighborhood
prague_merged = prague_merged.join(prague_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

prague_merged.head(5)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,50.086389,14.411111,3,Café,Hotel,Park,Pub,Theater,Beer Bar,Plaza,Italian Restaurant,Restaurant,Waterfront
1,Prague 2,50.074167,14.442778,3,Café,Bar,Bistro,Pub,Wine Bar,Beer Bar,Ice Cream Shop,Park,Yoga Studio,Escape Room
2,Prague 3,50.084444,14.454167,3,Pub,Café,Bakery,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Hostel,Wine Bar,Gym / Fitness Center,Russian Restaurant
3,Prague 4,50.062222,14.440278,3,Café,Bar,Pizza Place,Plaza,Gastropub,Kebab Restaurant,Theater,Pub,Restaurant,Vietnamese Restaurant
4,Prague 5,50.06,14.393333,3,Restaurant,Pub,Coffee Shop,Café,Roof Deck,Park,Gym / Fitness Center,Gym Pool,Tram Station,Grocery Store


K-means for Paris

In [50]:
# set number of clusters
kclusters = 5

paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
labels=kmeans2.labels_
labels


array([2, 2, 0, 2, 0, 2, 1, 4, 2, 2, 0, 0, 0, 2, 4, 2, 0, 2, 2, 2, 4, 2,
       0, 2, 2, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 2, 2, 2, 0, 2, 2, 0, 4, 2, 3, 2])

In [51]:
paris_venues_sorted.insert(0, 'Cluster Labels', kmeans2.labels_)

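paris_merged = paris_combined_data
# note: drop() returns a new DataFrame and the result is not assigned here,
# so geo_point_2d is still present in the table below
paris_merged.drop("geo_point_2d",axis=1)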

# merge paris_grouped with paris_data to add latitude/longitude for each neighborhood
paris_merged = paris_merged.join(paris_venues_sorted.set_index('Neighborhood'), on='nom_comm')

paris_merged.head(5)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,93410,VAUJOURS,SEINE-SAINT-DENIS,"[48.932477260516166, 2.58100257040038]",48.932477,2.581003,4.0,Supermarket,Bus Stop,Garden,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop,Flea Market
1,93250,VILLEMOMBLE,SEINE-SAINT-DENIS,"[48.884837002092105, 2.508934060353894]",48.884837,2.508934,2.0,Gas Station,Tattoo Parlor,Park,Market,Farm,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
2,93270,SEVRAN,SEINE-SAINT-DENIS,"[48.93860701530393, 2.531240575670606]",48.938607,2.531241,2.0,Stadium,Convenience Store,Gas Station,Fast Food Restaurant,Food & Drink Shop,Train Station,Fish Market,Farm,Farmers Market,Fish & Chips Shop
3,93450,L'ILE-SAINT-DENIS,SEINE-SAINT-DENIS,"[48.93956937690977, 2.325452527639678]",48.939569,2.325453,0.0,Pool,Hotel,Business Service,Farm,Zoo Exhibit,Exhibit,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
4,93140,BONDY,SEINE-SAINT-DENIS,"[48.9023234526246, 2.483727693897052]",48.902323,2.483728,0.0,Bank,Pet Store,French Restaurant,Supermarket,Electronics Store,Middle Eastern Restaurant,Pizza Place,Farmers Market,Fast Food Restaurant,Fish & Chips Shop


Let's visualize the clusters on a map.

Prague Clusters

In [64]:
# create map
map_clusters_prague = folium.Map(location=[50.083333, 14.416667], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(prague_merged['Lat'], prague_merged['Lon'], prague_merged['Neighborhood'], prague_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters_prague)
       
map_clusters_prague

In [99]:
# create map
map_clusters_paris = folium.Map(location=[48.856613, 2.352222], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['lat'], paris_merged['lon'], paris_merged['nom_comm'], labels):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters_paris)
       
map_clusters_paris
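
Both map cells pick the marker color with `rainbow[cluster-1]`. The short sketch below shows which palette position each cluster ends up with under that lookup. The palette entries are placeholders (the notebook uses hex colors from `cm.rainbow`), and `kclusters = 5` is taken from the number of clusters stated in the Conclusion.

```python
# Placeholder palette with one entry per cluster; in the notebook these are
# hex colors produced by cm.rainbow, here they are just position names.
kclusters = 5
palette = [f"color_{i}" for i in range(kclusters)]

for cluster in range(kclusters):
    # Same lookup as in the map cells: rainbow[cluster-1]
    print(f"cluster {cluster} -> {palette[cluster - 1]}")

# Palette position actually used by each cluster label.
position_used = {cluster: (cluster - 1) % kclusters for cluster in range(kclusters)}
```

Because Python accepts negative indices, cluster 0 takes the last palette entry and every other cluster is shifted by one position. Each cluster still gets its own color, so the maps stay unambiguous.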

## Exploring the Clusters

### Cluster 0 of Prague

In [100]:
prague_merged.loc[prague_merged['Cluster Labels'] == 0].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,Prague Kralovice,50.037778,14.635556,0,Field,Auto Workshop,Zoo,Donut Shop,Food Truck,Food & Drink Shop,Flower Shop,Fishing Spot,Fast Food Restaurant,Farmers Market


### Cluster 1 of Prague

In [101]:
prague_merged.loc[prague_merged['Cluster Labels'] == 1].head(5)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Prague 14,50.102778,14.552222,1,Bus Stop,Caucasian Restaurant,Reservoir,Restaurant,Eastern European Restaurant,Zoo,Flower Shop,Fishing Spot,Field,Fast Food Restaurant
14,Prague 15,50.046667,14.556667,1,Bus Stop,Golf Course,Italian Restaurant,Supermarket,Restaurant,Grocery Store,Gym,Bar,Café,Mexican Restaurant
16,Prague 17,50.068889,14.303611,1,Beer Bar,Grocery Store,Czech Restaurant,Supermarket,Bus Stop,Steakhouse,Gym,Tram Station,Chinese Restaurant,Park
17,Prague 18,50.135611,14.511694,1,Bus Stop,Pub,Garden Center,Hotel,Park,Burger Joint,Spa,Steakhouse,Bridal Shop,Playground
19,Prague 20,50.114722,14.6125,1,Bus Stop,Pub,Clothing Store,Burger Joint,Go Kart Track,Chinese Restaurant,Train Station,Park,Gym / Fitness Center,Museum


### Cluster 2 of Prague

In [102]:
prague_merged.loc[prague_merged['Cluster Labels'] == 2].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Prague Dablice,50.145,14.4825,2,Bus Stop,Restaurant,Soccer Field,Zoo,Donut Shop,Flower Shop,Fishing Spot,Field,Fast Food Restaurant,Farmers Market
44,Prague Predni Kopanina,50.11639,14.29583,2,Restaurant,Plaza,Soccer Field,Bus Stop,Zoo,Donut Shop,Fishing Spot,Field,Fast Food Restaurant,Farmers Market


### Cluster 3 of Prague

In [103]:
prague_merged.loc[prague_merged['Cluster Labels'] == 3].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,50.086389,14.411111,3,Café,Hotel,Park,Pub,Theater,Beer Bar,Plaza,Italian Restaurant,Restaurant,Waterfront
1,Prague 2,50.074167,14.442778,3,Café,Bar,Bistro,Pub,Wine Bar,Beer Bar,Ice Cream Shop,Park,Yoga Studio,Escape Room
2,Prague 3,50.084444,14.454167,3,Pub,Café,Bakery,Vietnamese Restaurant,Italian Restaurant,Asian Restaurant,Hostel,Wine Bar,Gym / Fitness Center,Russian Restaurant
3,Prague 4,50.062222,14.440278,3,Café,Bar,Pizza Place,Plaza,Gastropub,Kebab Restaurant,Theater,Pub,Restaurant,Vietnamese Restaurant
4,Prague 5,50.06,14.393333,3,Restaurant,Pub,Coffee Shop,Café,Roof Deck,Park,Gym / Fitness Center,Gym Pool,Tram Station,Grocery Store
5,Prague 6,50.100833,14.394722,3,Coffee Shop,Café,Pizza Place,Hotel,Vietnamese Restaurant,Bakery,Italian Restaurant,Public Art,Paper / Office Supplies Store,Pedestrian Plaza
6,Prague 7,50.100556,14.435556,3,Café,Czech Restaurant,Asian Restaurant,Pub,Coffee Shop,Pizza Place,Dessert Shop,Vietnamese Restaurant,Burger Joint,Steakhouse
7,Prague 8,50.107778,14.471389,3,Restaurant,Beer Garden,Coffee Shop,Historic Site,Bakery,Gastropub,Vietnamese Restaurant,Café,Pub,Playground
8,Prague 9,50.110556,14.5,3,Coffee Shop,Restaurant,Gastropub,Electronics Store,Gym,Hotel,Czech Restaurant,Clothing Store,Dessert Shop,Sushi Restaurant
9,Prague 10,50.06667,14.46417,3,Stadium,Sporting Goods Shop,Drugstore,Fried Chicken Joint,Leather Goods Store,Bookstore,Fountain,Bus Station,Sandwich Place,Bus Stop


### Cluster 4 of Prague

In [108]:
prague_merged.loc[prague_merged['Cluster Labels'] == 4].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
56,Prague Zlicin,50.061667,14.278333,4,Playground,Ice Cream Shop,Outdoors & Recreation,Video Store,Trail,Hot Spring,Cupcake Shop,Flower Shop,Fishing Spot,Field


### Clusters of Paris

### Cluster 0 of Paris

In [112]:
# note: the result of drop() is not assigned, so 'geo_point_2d' still appears in the output
paris_merged.drop("geo_point_2d",axis=1)
paris_merged.loc[paris_merged['Cluster Labels'] == 0].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,93450,L'ILE-SAINT-DENIS,SEINE-SAINT-DENIS,"[48.93956937690977, 2.325452527639678]",48.939569,2.325453,0.0,Pool,Hotel,Business Service,Farm,Zoo Exhibit,Exhibit,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
4,93140,BONDY,SEINE-SAINT-DENIS,"[48.9023234526246, 2.483727693897052]",48.902323,2.483728,0.0,Bank,Pet Store,French Restaurant,Supermarket,Electronics Store,Middle Eastern Restaurant,Pizza Place,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
6,93120,LA COURNEUVE,SEINE-SAINT-DENIS,"[48.93225695457796, 2.399780648014392]",48.932257,2.399781,0.0,Soccer Field,Intersection,Concert Hall,Auto Garage,Hotel,Martial Arts School,Zoo Exhibit,Falafel Restaurant,Farm,Farmers Market
9,93170,BAGNOLET,SEINE-SAINT-DENIS,"[48.86908363081595, 2.4227409668793163]",48.869084,2.422741,0.0,Furniture / Home Store,French Restaurant,Hotel,Bar,Electronics Store,Fast Food Restaurant,Falafel Restaurant,Frozen Yogurt Shop,Fountain,Food & Drink Shop
10,93290,TREMBLAY-EN-FRANCE,SEINE-SAINT-DENIS,"[48.97843041205846, 2.554685015432852]",48.97843,2.554685,0.0,Restaurant,Indian Restaurant,French Restaurant,Gastropub,Italian Restaurant,Airport Terminal,Indie Movie Theater,Frozen Yogurt Shop,Fountain,Food & Drink Shop
16,93460,GOURNAY-SUR-MARNE,SEINE-SAINT-DENIS,"[48.86058179017992, 2.575433137961149]",48.860582,2.575433,0.0,Motorcycle Shop,Miscellaneous Shop,Greek Restaurant,Farmers Market,Seafood Restaurant,Health & Beauty Service,Pizza Place,French Restaurant,Bank,Flea Market
22,93500,PANTIN,SEINE-SAINT-DENIS,"[48.89830938758385, 2.4087214747535]",48.898309,2.408721,0.0,French Restaurant,Trail,Italian Restaurant,Art Gallery,Pool,Mediterranean Restaurant,Bar,Dance Studio,Hotel Bar,Frozen Yogurt Shop
23,93340,LE RAINCY,SEINE-SAINT-DENIS,"[48.89674475850312, 2.519736640206343]",48.896745,2.519737,0.0,Japanese Restaurant,Sushi Restaurant,Bistro,French Restaurant,Health Food Store,Falafel Restaurant,Frozen Yogurt Shop,Fountain,Food & Drink Shop,Flower Shop
24,93400,SAINT-OUEN,SEINE-SAINT-DENIS,"[48.90980657500511, 2.332570422050525]",48.909807,2.33257,0.0,Bakery,French Restaurant,Resort,Diner,Sandwich Place,Park,Fast Food Restaurant,Movie Theater,Bookstore,Supermarket
31,93360,NEUILLY-PLAISANCE,SEINE-SAINT-DENIS,"[48.864328785155266, 2.510402498982637]",48.864329,2.510402,0.0,Italian Restaurant,Playground,Gym,Café,Falafel Restaurant,French Restaurant,Fountain,Food & Drink Shop,Flower Shop,Flea Market


### Cluster 1 of Paris

In [105]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,93470,COUBRON,SEINE-SAINT-DENIS,"[48.91765195551786, 2.576312316520748]",48.917652,2.576312,1.0,Flea Market,Zoo Exhibit,Exhibit,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop,Fish Market,Fish & Chips Shop


### Cluster 2 of Paris

In [115]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,93250,VILLEMOMBLE,SEINE-SAINT-DENIS,"[48.884837002092105, 2.508934060353894]",48.884837,2.508934,2.0,Gas Station,Tattoo Parlor,Park,Market,Farm,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
2,93270,SEVRAN,SEINE-SAINT-DENIS,"[48.93860701530393, 2.531240575670606]",48.938607,2.531241,2.0,Stadium,Convenience Store,Gas Station,Fast Food Restaurant,Food & Drink Shop,Train Station,Fish Market,Farm,Farmers Market,Fish & Chips Shop
5,93430,VILLETANEUSE,SEINE-SAINT-DENIS,"[48.957297650147964, 2.345066336514906]",48.957298,2.345066,2.0,Thrift / Vintage Store,Light Rail Station,Tram Station,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit,Exhibit
8,93800,EPINAY-SUR-SEINE,SEINE-SAINT-DENIS,"[48.95501320616889, 2.3145304323082883]",48.955013,2.31453,2.0,Skate Park,Supermarket,Chinese Restaurant,Hotel,Laundromat,Shopping Mall,Asian Restaurant,Fast Food Restaurant,Falafel Restaurant,Farm
11,93000,BOBIGNY,SEINE-SAINT-DENIS,"[48.907688243955754, 2.438639827268387]",48.907688,2.43864,2.0,Performing Arts Venue,Supermarket,Chinese Restaurant,Tram Station,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit
13,93190,LIVRY-GARGAN,SEINE-SAINT-DENIS,"[48.91976332125543, 2.534865923320668]",48.919763,2.534866,2.0,Tourist Information Center,Park,Sandwich Place,Art Gallery,Turkish Restaurant,Fast Food Restaurant,Falafel Restaurant,Farm,Farmers Market,Fish & Chips Shop
14,93310,LE PRE-SAINT-GERVAIS,SEINE-SAINT-DENIS,"[48.88467348774406, 2.405422329606518]",48.884673,2.405422,2.0,Supermarket,French Restaurant,Pharmacy,Print Shop,Pool,Recording Studio,Bike Rental / Bike Share,Bus Stop,Farmers Market,Bakery
15,93380,PIERREFITTE-SUR-SEINE,SEINE-SAINT-DENIS,"[48.96098333553691, 2.363281254453645]",48.960983,2.363281,2.0,Tram Station,Light Rail Station,Dog Run,Zoo Exhibit,Fish & Chips Shop,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Exhibit
17,93160,NOISY-LE-GRAND,SEINE-SAINT-DENIS,"[48.83618254008734, 2.564437368137358]",48.836183,2.564437,2.0,Park,Auto Dealership,Fast Food Restaurant,Automotive Shop,Zoo Exhibit,Farm,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant,Fountain
19,93100,MONTREUIL,SEINE-SAINT-DENIS,"[48.863317505427545, 2.448162118570861]",48.863318,2.448162,2.0,Plaza,Hotel,Supermarket,Indian Restaurant,Turkish Restaurant,Theater,Diner,Sushi Restaurant,Pharmacy,Bar


### Cluster 3 of Paris

In [107]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,93420,VILLEPINTE,SEINE-SAINT-DENIS,"[48.95902025378707, 2.536306342059409]",48.95902,2.536306,3.0,Middle Eastern Restaurant,Fast Food Restaurant,Zoo Exhibit,Gas Station,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop


### Cluster 4 of Paris

In [116]:
paris_merged.loc[paris_merged['Cluster Labels'] == 4].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,93410,VAUJOURS,SEINE-SAINT-DENIS,"[48.932477260516166, 2.58100257040038]",48.932477,2.581003,4.0,Supermarket,Bus Stop,Garden,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop,Flea Market
7,93370,MONTFERMEIL,SEINE-SAINT-DENIS,"[48.898261667942165, 2.567143547956258]",48.898262,2.567144,4.0,Supermarket,Park,Furniture / Home Store,Frozen Yogurt Shop,French Restaurant,Fountain,Food & Drink Shop,Flower Shop,Flea Market,Fish Market
18,93350,LE BOURGET,SEINE-SAINT-DENIS,"[48.936184745775115, 2.428278555934854]",48.936185,2.428279,4.0,Supermarket,Hotel,Optical Shop,Park,Bakery,Zoo Exhibit,Farm,French Restaurant,Fountain,Food & Drink Shop
21,93700,DRANCY,SEINE-SAINT-DENIS,"[48.92342462588384, 2.44492688692071]",48.923425,2.444927,4.0,Supermarket,Business Service,Park,Pastry Shop,Zoo Exhibit,Farm,French Restaurant,Fountain,Food & Drink Shop,Flower Shop
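
A cluster can be summarized by checking how often each venue category appears as the most common venue of its neighborhoods. Here is a minimal sketch with the standard library, using the four rows of the Cluster 4 table above; the pairs are copied by hand and the variable names are illustrative only.

```python
from collections import Counter

# (neighborhood, 1st most common venue) pairs copied from the Cluster 4 table above.
cluster_4 = [
    ("VAUJOURS", "Supermarket"),
    ("MONTFERMEIL", "Supermarket"),
    ("LE BOURGET", "Supermarket"),
    ("DRANCY", "Supermarket"),
]

# Count how often each category is ranked first within the cluster.
top_venues = Counter(venue for _, venue in cluster_4)
venue, count = top_venues.most_common(1)[0]
share = count / len(cluster_4)

print(f"{venue}: {count} of {len(cluster_4)} neighborhoods ({share:.0%})")
```

For this cluster the result is Supermarket in 4 of 4 neighborhoods. The same tally applied to the larger clusters would show how dominant their leading categories really are.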


## Discussion (see the report)

Prague Clusters:
Cluster 0:
The first cluster consists of only one neighborhood, Prague Kralovice. It is located in the suburbs, and its three most common venues are Field, Auto Workshop and Zoo.
Cluster 1:
Cluster 1 is much more interesting. It has 26 neighborhoods and is the dark blue cluster on the map. The most common venues there are bus stops, gardens, restaurants and so on, which suggests that this is a cluster of suburbs.
Cluster 2:
Cluster 2 consists of only two neighborhoods, Prague Dablice and Prague Predni Kopanina. Both neighborhoods have Soccer Field, Bus Stop, Zoo and Restaurant among their top five venues. On the map it is light blue.
Cluster 3:
Cluster 3 is the most interesting one. It is the yellow cluster in the city center and has 27 neighborhoods. The most common venues are restaurants, pubs, cafés, hotels and similar, which makes this cluster the most attractive one for tourists, investors and business people.
Cluster 4:
This cluster consists of Prague Zlicin only. It is very specific as well: its most common venues include Playground, Trail and Outdoors & Recreation. It is the yellow cluster.
Paris Clusters:
Cluster 0:
There are 28 neighborhoods in Cluster 0 (the red one). There are many restaurants, banks, hotels, pools and shops. This cluster is spread across the whole of Paris.
Cluster 1:
The dark blue cluster consists of 1 neighborhood only. The most frequent venues are Flea market, Zoo Exhibit and Exhibit. This cluster is very specific. 
Cluster 2:
There are 25 neighborhoods in Cluster 2. It is the light blue one. There are many supermarkets, parks, gas stations and shops. I think that this cluster is very convenient for living. 
Cluster 3:
There is only one neighborhood in Cluster 3. Its most common venue is Middle Eastern Restaurant, so I suppose it is a Middle Eastern neighborhood. Interestingly, this cluster is in the center of Paris (the green one).
Cluster 4:
It is the brown cluster. The most common venue there is Supermarket (in every neighborhood). This cluster is probably good for living.


## Conclusion

The aim of the project was to compare two cities, Prague and Paris (the center of Paris and Saint Denis). In both cases we made 5 clusters using the same algorithm, k-means.
In Prague there were 2 main clusters. One of them was confined to the center of the city, with many pubs, cafés, restaurants, banks and so on. The second main cluster had many bus stops, gardens and restaurants; it is the cluster convenient for living. Then we had 3 clusters, located in the suburbs, which were very specific and had few neighborhoods.
The situation in Paris was much more interesting. We also had 2 main clusters, but these clusters were mixed across the city. The first main cluster had many restaurants, banks, hotels, pools and shops. I would say this is the “business” cluster. The second main cluster had many supermarkets, parks, gas stations and shops. From my point of view this cluster is for living. Then we had 3 very specific clusters, among them one dominated by Middle Eastern restaurants and one dominated by supermarkets.
