# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The aim of this Capstone project is to compare two cities: Paris and Prague, my hometown. Both are capital cities, and their city centres are similar and attractive to tourists. The project should help people decide which city to visit, see how many tourist attractions there are, and judge how long to stay. It can also be useful for people who want to move to a different neighborhood within one of the cities, or who are thinking about relocating to one of them. The idea is to collect the venues in each neighborhood, cluster the neighborhoods by their venues, and compare the results.

## Data <a name="data"></a>

First, let's import all the necessary libraries.

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transforming a JSON file into a pandas dataframe
# (pandas >= 1.0 exposes json_normalize at the top level; pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install opendatasets
import opendatasets as od

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed
Libraries imported.


### Prague Dataset <a name="introduction"></a>

We need to have geographical coordinates for the neighborhoods of Paris and Prague. 
For the Prague neighborhoods I created a CSV dataset, scraped from Wikipedia. The dataset is available here: https://www.kaggle.com/konecfil/prague-neighborhoods-dataset. The first column is the name of the neighborhood; the second and third are Lat and Lon, respectively. We will use these geographical coordinates as the centroids of the Prague neighborhoods. 



We download the dataset with od.download("https://www.kaggle.com/konecfil/prague-neighborhoods-dataset"). It asks for your Kaggle username and key, which you can find in your Kaggle account (you have to create an account first, then click "your account"). The call creates a new directory containing the file. 
It requires the opendatasets package, which is installed with "!pip install opendatasets" and imported with "import opendatasets as od" in the first cell.



In [None]:
od.download("https://www.kaggle.com/konecfil/prague-neighborhoods-dataset")

In [2]:
prague_data=pd.read_csv('prague neighborhoods.csv')
prague_data.head(10)

Unnamed: 0,Neighborhood,Lat,Lon
0,Prague 1,50.086389,14.411111
1,Prague 2,50.074167,14.442778
2,Prague 3,50.084444,14.454167
3,Prague 4,50.062222,14.440278
4,Prague 5,50.06,14.393333
5,Prague 6,50.100833,14.394722
6,Prague 7,50.100556,14.435556
7,Prague 8,50.107778,14.471389
8,Prague 9,50.110556,14.5
9,Prague 10,50.06667,14.46417


### Paris Dataset <a name="introduction"></a>

The Paris dataset is available here: https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e. The JSON file covers the whole of France, so we have to restrict it to Paris. The columns we use are:
* postal_code: postal code of the commune
* nom_comm: name of the commune; for Paris these are the arrondissements, which we treat as neighborhoods
* nom_dept: name of the department, which we use to select Paris
* geo_point_2d: pair containing the latitude and longitude of the neighborhood

### Foursquare API <a name="introduction"></a>

For the locations of venues we will use the Foursquare API, which provides information about the venues within a given distance of a point of interest. We will use a radius of 500 metres around each neighborhood centroid. The Foursquare API is the only data source we will use to obtain venue data. 
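
As a rough sketch of what such a request looks like, the snippet below builds the query URL for one centroid without sending it. The endpoint and parameter names are the ones used by the notebook's `getNearbyVenues` function, and the defaults mirror the values in that code (radius 500, limit 100, version 20180605). The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration only.

```python
from urllib.parse import urlencode

# Endpoint used by the notebook's getNearbyVenues function.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Return the explore URL for one centroid (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude,longitude of the centroid
        "radius": radius,       # search radius in metres
        "limit": limit,         # maximum number of venues returned
    }
    # urlencode escapes the values, e.g. the comma in "ll" becomes %2C.
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials; the coordinates are those of Prague 1.
url = build_explore_url(50.086389, 14.411111, "MY_ID", "MY_SECRET")
print(url)
```

Building the query string from a dictionary keeps every parameter visible in one place and avoids mistakes when values need escaping.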

### Data preprocessing <a name="introduction"></a>

We download the JSON file and save it as france-data.json. Pay attention to the file name: it is written without quotes in both commands, because some shells keep the quotes as part of the name.

In [3]:
!wget -q -O france-data.json https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
print("Data Downloaded!")
paris_raw = pd.read_json("france-data.json")
paris_raw.head()

Data Downloaded!


Unnamed: 0,datasetid,recordid,fields,geometry,record_timestamp
0,correspondances-code-insee-code-postal,2bf36b38314b6c39dfbcd09225f97fa532b1fc45,"{'code_comm': '645', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.2517129721...",2016-09-21T00:29:06.175+02:00
1,correspondances-code-insee-code-postal,7ee82e74e059b443df18bb79fc5a19b1f05e5a88,"{'code_comm': '133', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.0529405055...",2016-09-21T00:29:06.175+02:00
2,correspondances-code-insee-code-postal,e2cd3186f07286705ed482a10b6aebd9de633c81,"{'code_comm': '378', 'nom_dept': 'ESSONNE', 's...","{'type': 'Point', 'coordinates': [2.1971816504...",2016-09-21T00:29:06.175+02:00
3,correspondances-code-insee-code-postal,868bf03527a1d0a9defe5cf4e6fa0a730d725699,"{'code_comm': '243', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [2.7097808131...",2016-09-21T00:29:06.175+02:00
4,correspondances-code-insee-code-postal,1bbcee92101fdb50f5f5fceb052681f2421ff961,"{'code_comm': '414', 'nom_dept': 'SEINE-ET-MAR...","{'type': 'Point', 'coordinates': [3.2582355268...",2016-09-21T00:29:06.175+02:00


In [4]:
# each entry of the 'fields' column is a dict; build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
paris_field_data = pd.DataFrame(paris_raw['fields'].tolist())
 
paris_field_data.head()

Unnamed: 0,code_arr,code_cant,code_comm,code_dept,code_reg,geo_point_2d,geo_shape,id_geofla,insee_com,nom_comm,nom_dept,nom_region,population,postal_code,statut,superficie,z_moyen
0,3,3,645,91,11,"[48.750443119964764, 2.251712972144151]","{'type': 'Polygon', 'coordinates': [[[2.238024...",16275,91645,VERRIERES-LE-BUISSON,ESSONNE,ILE-DE-FRANCE,15.5,91370,Commune simple,999.0,121.0
1,3,20,133,77,11,"[48.41256065214989, 3.052940505560729]","{'type': 'Polygon', 'coordinates': [[[3.076046...",31428,77133,COURCELLES-EN-BASSEE,SEINE-ET-MARNE,ILE-DE-FRANCE,0.2,77126,Commune simple,1082.0,88.0
2,1,9,378,91,11,"[48.52726809075556, 2.19718165044305]","{'type': 'Polygon', 'coordinates': [[[2.203466...",30975,91378,MAUCHAMPS,ESSONNE,ILE-DE-FRANCE,0.3,91730,Commune simple,313.0,150.0
3,5,14,243,77,11,"[48.87307018579678, 2.7097808131278462]","{'type': 'Polygon', 'coordinates': [[[2.727542...",17000,77243,LAGNY-SUR-MARNE,SEINE-ET-MARNE,ILE-DE-FRANCE,20.2,77400,Chef-lieu canton,579.0,71.0
4,3,25,414,77,11,"[48.62891464105825, 3.2582355268439223]","{'type': 'Polygon', 'coordinates': [[[3.294591...",34949,77414,SAINT-HILLIERS,SEINE-ET-MARNE,ILE-DE-FRANCE,0.4,77160,Commune simple,1907.0,158.0


In [5]:
df_2 = paris_field_data[['postal_code','nom_comm','nom_dept','geo_point_2d']]

Then we filter the dataset, keeping only the rows whose nom_dept contains PARIS. 

In [6]:
df_paris = df_2[df_2['nom_dept'].str.contains('PARIS')].reset_index(drop=True)
df_paris.head()

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,"[48.87689616237872, 2.337460241388529]"
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,"[48.86790337886785, 2.344107166658533]"
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,"[48.85941549762748, 2.378741060237548]"
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,"[48.87252726662346, 2.312582560420059]"
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,"[48.82871768452136, 2.362468228516128]"


Now we split geo_point_2d into separate latitude and longitude columns. 

In [10]:
paris_lat = df_paris['geo_point_2d'].apply(lambda x: x[0])
paris_lng = df_paris['geo_point_2d'].apply(lambda x: x[1])
paris_combined_data=df_paris


In [11]:
paris_combined_data["lat"]=paris_lat
paris_combined_data["lon"]=paris_lng
# drop() returns a copy, so this only hides geo_point_2d in the displayed table
paris_combined_data.drop("geo_point_2d",axis=1)

Unnamed: 0,postal_code,nom_comm,nom_dept,lat,lon
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,48.876896,2.33746
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,48.867903,2.344107
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,48.859415,2.378741
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,48.872527,2.312583
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,48.828718,2.362468
5,75012,PARIS-12E-ARRONDISSEMENT,PARIS,48.835156,2.419807
6,75003,PARIS-3E-ARRONDISSEMENT,PARIS,48.863054,2.359361
7,75006,PARIS-6E-ARRONDISSEMENT,PARIS,48.848968,2.332671
8,75004,PARIS-4E-ARRONDISSEMENT,PARIS,48.854228,2.357362
9,75010,PARIS-10E-ARRONDISSEMENT,PARIS,48.876029,2.361113


We now have the 20 arrondissements of Paris, which we treat as its neighborhoods. 
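
The coordinate split above can also be written as a single expansion step. The following is a minimal sketch on two rows copied from the Paris table; the variable names `coords` and `tidy` are illustrative and not part of the notebook.

```python
import pandas as pd

# Two rows copied from the notebook's Paris table.
df = pd.DataFrame({
    "nom_comm": ["PARIS-9E-ARRONDISSEMENT", "PARIS-2E-ARRONDISSEMENT"],
    "geo_point_2d": [[48.87689616237872, 2.337460241388529],
                     [48.86790337886785, 2.344107166658533]],
})

# Expand each [lat, lon] pair into two columns, keeping the original index.
coords = pd.DataFrame(df["geo_point_2d"].tolist(),
                      columns=["lat", "lon"], index=df.index)

# drop() returns a new frame, so the result has to be assigned to take effect.
tidy = pd.concat([df.drop(columns="geo_point_2d"), coords], axis=1)
print(tidy)
```

Assigning the result of `drop` matters here: without the assignment the raw coordinate column stays in the dataframe.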


## Visualization <a name="introduction"></a>

Let's visualize the neighborhoods of Prague. We will use the Folium library for visualization.

In [12]:
map_prague = folium.Map(location=[50.083333, 14.416667],zoom_start=11)

for lat,lng,neighborhood in zip(prague_data["Lat"],prague_data["Lon"],prague_data["Neighborhood"]):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_prague)
map_prague

Now let's visualize the Paris neighborhoods.


In [13]:
map_paris = folium.Map(location=[48.856613, 2.352222],zoom_start=11)

for lat,lng,neighborhood in zip(paris_combined_data['lat'],paris_combined_data['lon'],paris_combined_data['nom_comm']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_paris)
map_paris

From the maps it is obvious that there are more neighborhoods in Prague (57) than in Paris (20). 

Now we will use the Foursquare API to find the venues in each neighborhood, using a radius of 500 metres around each centroid. First we have to define the client ID, client secret and API version.

In [14]:
import os

# read the credentials from environment variables instead of hard-coding them
# (the variable names are our own choice)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials loaded.')

Credentials loaded.


Let's create a function that returns the venues near each neighborhood centroid. 

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
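        # create the API request; requests builds and escapes the query string
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}
            
        # make the GET request and fail loudly on an HTTP error
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']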
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [16]:
prague_venues = getNearbyVenues(names=prague_data['Neighborhood'],
                                   latitudes=prague_data['Lat'],
                                   longitudes=prague_data['Lon']
                                  )

Prague 1
Prague 2
Prague 3
Prague 4
Prague 5
Prague 6
Prague 7
Prague 8
Prague 9
Prague 10
Prague 11
Prague 12
Prague 13
Prague 14
Prague 15
Prague 16
Prague 17
Prague 18
Prague 19
Prague 20
Prague 21
Prague 22
Prague Bechovice 
Prague Benice
Prague Brezineves
Prague Cakovice
Prague Dablice
Prague Dolni Chabry
Prague Dolni Mecholupy
Prague Dolni Pocernice
Prague Dubec
Prague Klanovice
Prague Kolodeje
Prague Kolovraty
Prague Kralovice
Prague Kreslice
Prague Kunratice
Prague Libus
Prague Lipence
Prague Lochkov
Prague Lysolaje
Prague Nebusice
Prague Nedvezi
Prague Petrovice
Prague Predni Kopanina
Prague Reporyje
Prague Satalice
Prague Slivenec
Prague Suchdol
Prague Seberov
Prague Sterboholy
Prague Troja
Prague Ujezd
Prague Velka Chuchle
Prague Vinor
Prague Zbraslav
Prague Zlicin


There are 1152 venues in Prague's neighborhoods.

In [17]:
prague_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Prague 1,50.086389,14.411111,Karlův most | Charles Bridge (Karlův most),50.086480,14.411442,Bridge
1,Prague 1,50.086389,14.411111,Staroměstská mostecká věž,50.086177,14.413569,Monument / Landmark
2,Prague 1,50.086389,14.411111,Mlýnec,50.085389,14.413620,Mediterranean Restaurant
3,Prague 1,50.086389,14.411111,Kampa Park,50.087364,14.409678,Modern European Restaurant
4,Prague 1,50.086389,14.411111,Shakespeare & synové,50.087617,14.408628,Bookstore
...,...,...,...,...,...,...,...
1147,Prague Zlicin,50.061667,14.278333,Stezka Sobín-->Zličín,50.060649,14.281544,Trail
1148,Prague Zlicin,50.061667,14.278333,Pískoviště u domu M Zličín,50.058996,14.281978,Playground
1149,Prague Zlicin,50.061667,14.278333,Arri Rental Zličín,50.058059,14.278046,Video Store
1150,Prague Zlicin,50.061667,14.278333,U rybnicku,50.062550,14.284280,Hot Spring


Let's see how many venues there are in each neighborhood.

In [18]:
prague_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Prague 1,79,79,79,79,79,79
Prague 10,49,49,49,49,49,49
Prague 11,28,28,28,28,28,28
Prague 12,21,21,21,21,21,21
Prague 13,23,23,23,23,23,23
Prague 14,8,8,8,8,8,8
Prague 15,17,17,17,17,17,17
Prague 16,32,32,32,32,32,32
Prague 17,15,15,15,15,15,15
Prague 18,16,16,16,16,16,16


In [19]:
paris_venues = getNearbyVenues(names=paris_combined_data['nom_comm'],
                                   latitudes=paris_combined_data['lat'],
                                   longitudes=paris_combined_data['lon']
                                  )


paris_venues

PARIS-9E-ARRONDISSEMENT
PARIS-2E-ARRONDISSEMENT
PARIS-11E-ARRONDISSEMENT
PARIS-8E-ARRONDISSEMENT
PARIS-13E-ARRONDISSEMENT
PARIS-12E-ARRONDISSEMENT
PARIS-3E-ARRONDISSEMENT
PARIS-6E-ARRONDISSEMENT
PARIS-4E-ARRONDISSEMENT
PARIS-10E-ARRONDISSEMENT
PARIS-16E-ARRONDISSEMENT
PARIS-5E-ARRONDISSEMENT
PARIS-19E-ARRONDISSEMENT
PARIS-20E-ARRONDISSEMENT
PARIS-7E-ARRONDISSEMENT
PARIS-18E-ARRONDISSEMENT
PARIS-17E-ARRONDISSEMENT
PARIS-15E-ARRONDISSEMENT
PARIS-1ER-ARRONDISSEMENT
PARIS-14E-ARRONDISSEMENT


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Farine & O,48.877209,2.339464,Bakery
1,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,RAP,48.876628,2.339359,Gourmet Shop
2,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,La Compagnie du Café,48.877916,2.337997,Café
3,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,Le Bouclier de Bacchus,48.876834,2.337843,Wine Bar
4,PARIS-9E-ARRONDISSEMENT,48.876896,2.337460,So Nat,48.876277,2.338614,Vegetarian / Vegan Restaurant
...,...,...,...,...,...,...,...
1279,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Hotel Chatillon,48.825725,2.326107,Hotel
1280,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Vélib' [14-19],48.825658,2.326358,Bike Rental / Bike Share
1281,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,U Express,48.827319,2.332178,Supermarket
1282,PARIS-14E-ARRONDISSEMENT,48.828993,2.327101,Eclat Laverie,48.827228,2.332140,Laundromat


Paris has 1284 venues in its neighborhoods.

In [20]:
paris_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
PARIS-10E-ARRONDISSEMENT,100,100,100,100,100,100
PARIS-11E-ARRONDISSEMENT,44,44,44,44,44,44
PARIS-12E-ARRONDISSEMENT,5,5,5,5,5,5
PARIS-13E-ARRONDISSEMENT,59,59,59,59,59,59
PARIS-14E-ARRONDISSEMENT,25,25,25,25,25,25
PARIS-15E-ARRONDISSEMENT,60,60,60,60,60,60
PARIS-16E-ARRONDISSEMENT,10,10,10,10,10,10
PARIS-17E-ARRONDISSEMENT,64,64,64,64,64,64
PARIS-18E-ARRONDISSEMENT,70,70,70,70,70,70
PARIS-19E-ARRONDISSEMENT,45,45,45,45,45,45


Let's see how many unique venue categories there are in each city.


In [21]:
print('There are {} unique categories in Prague.'.format(len(prague_venues['Venue Category'].unique())))
print('There are {} unique categories in Paris.'.format(len(paris_venues['Venue Category'].unique())))

There are 230 unique categories in Prague.
There are 206 unique categories in Paris.
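
Since the goal is to compare the two cities, it may also help to check how much the two category lists overlap. The sketch below does this with plain sets on a few made-up category values; the function name `compare_categories` is hypothetical, and in the notebook the inputs would be `prague_venues['Venue Category']` and `paris_venues['Venue Category']`.

```python
def compare_categories(a, b):
    """Return shared and city-specific categories plus their Jaccard overlap."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    return {
        "shared": shared,
        "only_a": set_a - set_b,
        "only_b": set_b - set_a,
        # Jaccard index: size of the intersection over size of the union.
        "jaccard": len(shared) / len(set_a | set_b),
    }


# Toy inputs for illustration, not the real Foursquare results.
prague_sample = ["Café", "Bus Stop", "Pub", "Hotel", "Café"]
paris_sample = ["French Restaurant", "Café", "Hotel", "Bistro"]

result = compare_categories(prague_sample, paris_sample)
print(sorted(result["shared"]), round(result["jaccard"], 3))
```

A Jaccard value near 1 would mean the two cities offer almost the same kinds of venues, while a value near 0 would mean their category lists barely overlap.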


## Analyzing each neighborhood <a name="introduction"></a>

One-hot encoding of the Prague venue categories

In [22]:
# one hot encoding
prague_onehot = pd.get_dummies(prague_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
prague_onehot['Neighborhood'] = prague_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [prague_onehot.columns[-1]] + list(prague_onehot.columns[:-1])
prague_onehot = prague_onehot[fixed_columns]

prague_onehot.head(5)

Unnamed: 0,Neighborhood,ATM,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,...,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Volleyball Court,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Prague 1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


One-hot encoding of the Paris venue categories

In [23]:
# one hot encoding
paris_onehot = pd.get_dummies(paris_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
paris_onehot['Neighborhood'] = paris_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [paris_onehot.columns[-1]] + list(paris_onehot.columns[:-1])
paris_onehot = paris_onehot[fixed_columns]

paris_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,PARIS-9E-ARRONDISSEMENT,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


#### Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category

In [24]:
prague_grouped = prague_onehot.groupby('Neighborhood').mean().reset_index()
prague_grouped.head()

Unnamed: 0,Neighborhood,ATM,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,...,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Volleyball Court,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zoo
0,Prague 1,0.0,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.012658,0.0,0.0,0.037975,0.025316,0.012658,0.012658,0.0
1,Prague 10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,...,0.020408,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0
2,Prague 11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0
3,Prague 12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,...,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0
4,Prague 13,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0


In [25]:
paris_grouped = paris_onehot.groupby('Neighborhood').mean().reset_index()
paris_grouped.head(10)

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Zoo,Zoo Exhibit
0,PARIS-10E-ARRONDISSEMENT,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0
1,PARIS-11E-ARRONDISSEMENT,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.045455,...,0.0,0.022727,0.0,0.022727,0.045455,0.045455,0.0,0.0,0.0,0.0
2,PARIS-12E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2
3,PARIS-13E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.186441,...,0.0,0.0,0.0,0.0,0.220339,0.0,0.0,0.0,0.0,0.0
4,PARIS-14E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,PARIS-15E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,PARIS-16E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,PARIS-17E-ARRONDISSEMENT,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0
8,PARIS-18E-ARRONDISSEMENT,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.014286,0.0,0.0,0.028571,0.014286,0.014286,0.0,0.0,0.0
9,PARIS-19E-ARRONDISSEMENT,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0
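
To make the one-hot encoding and grouping steps above easier to follow, here is a minimal sketch on a toy venue table with two made-up neighborhoods. It is an illustration under those assumptions, not the notebook's data; the `pd.crosstab` call is shown as one possible shortcut that should give the same frequencies.

```python
import pandas as pd

# Toy venue table: neighborhood A has four venues, B has two.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Pub", "Hotel", "Bus Stop", "Pub"],
})

# One column per category, one row per venue, 1.0 where the venue matches.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# insert() raises an error if a category is itself called "Neighborhood",
# instead of silently overwriting that column.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the 0/1 columns is the share of each category per neighborhood.
grouped = onehot.groupby("Neighborhood").mean()

# The same frequencies computed directly with a normalized cross-tabulation.
freq = pd.crosstab(venues["Neighborhood"], venues["Venue Category"],
                   normalize="index")
print(grouped)
```

Each row of `grouped` sums to 1, so the values can be read as the share of a neighborhood's venues that fall into each category.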


Next, let's write a function that returns the most frequent venue categories of a neighborhood, sorted in descending order of frequency.


In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Top venues for Prague Neighborhoods.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
prague_venues_sorted = pd.DataFrame(columns=columns)
prague_venues_sorted['Neighborhood'] = prague_grouped['Neighborhood']

for ind in np.arange(prague_grouped.shape[0]):
    prague_venues_sorted.iloc[ind, 1:] = return_most_common_venues(prague_grouped.iloc[ind, :], num_top_venues)

prague_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,Café,Hotel,Park,Historic Site,Italian Restaurant,Beer Bar,Plaza,Restaurant,Theater,Waterfront
1,Prague 10,Bus Stop,Sporting Goods Shop,Playground,Chinese Restaurant,Stadium,Mobile Phone Shop,Coffee Shop,Drugstore,Leather Goods Store,Fried Chicken Joint
2,Prague 11,Supermarket,Drugstore,Pizza Place,Bus Stop,Bakery,Gastropub,Food & Drink Shop,Snack Place,Electronics Store,Pub
3,Prague 12,Bus Stop,Restaurant,Pub,Tram Station,Ski Shop,Stadium,Scenic Lookout,Bookstore,Salon / Barbershop,Music Store
4,Prague 13,Gastropub,Pizza Place,Coffee Shop,Bus Stop,Sporting Goods Shop,Salad Place,Sushi Restaurant,Bistro,Restaurant,Chinese Restaurant
5,Prague 14,Bus Stop,Restaurant,Reservoir,Pharmacy,Eastern European Restaurant,Caucasian Restaurant,Fishing Spot,Field,Flower Shop,Food & Drink Shop
6,Prague 15,Bus Stop,ATM,Gym,Pizza Place,Restaurant,Café,Soccer Field,Mexican Restaurant,Supermarket,Swim School
7,Prague 16,Café,Plaza,Restaurant,Pizza Place,Pub,Modern European Restaurant,Chinese Restaurant,Movie Theater,Bowling Alley,Brewery
8,Prague 17,Czech Restaurant,Beer Bar,Bus Stop,Park,Supermarket,Gym,Tram Station,Chinese Restaurant,Grocery Store,Steakhouse
9,Prague 18,Bus Stop,Pub,Bridal Shop,Burger Joint,Spa,Park,Garden Center,Steakhouse,Athletics & Sports,Asian Restaurant


Top venues for Paris Neighborhoods.

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
paris_venues_sorted = pd.DataFrame(columns=columns)
paris_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']

for ind in np.arange(paris_grouped.shape[0]):
    paris_venues_sorted.iloc[ind, 1:] = return_most_common_venues(paris_grouped.iloc[ind, :], num_top_venues)

paris_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,PARIS-10E-ARRONDISSEMENT,French Restaurant,Bistro,Coffee Shop,Café,Hotel,Pizza Place,Indian Restaurant,Japanese Restaurant,Asian Restaurant,Restaurant
1,PARIS-11E-ARRONDISSEMENT,Café,Restaurant,Asian Restaurant,Pastry Shop,Wine Bar,Vietnamese Restaurant,Bakery,Italian Restaurant,French Restaurant,Bistro
2,PARIS-12E-ARRONDISSEMENT,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,African Restaurant,French Restaurant,Fountain,Food Court
3,PARIS-13E-ARRONDISSEMENT,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Grocery Store,Gourmet Shop,Furniture / Home Store,Cambodian Restaurant
4,PARIS-14E-ARRONDISSEMENT,French Restaurant,Hotel,Bakery,Food & Drink Shop,Brasserie,Fast Food Restaurant,Supermarket,Sushi Restaurant,Bistro,Bike Rental / Bike Share
5,PARIS-15E-ARRONDISSEMENT,Italian Restaurant,French Restaurant,Hotel,Bistro,Thai Restaurant,Lebanese Restaurant,Brasserie,Restaurant,Japanese Restaurant,Coffee Shop
6,PARIS-16E-ARRONDISSEMENT,Lake,French Restaurant,Art Museum,Trail,Boat or Ferry,Bus Station,Plaza,Pool,Park,Flower Shop
7,PARIS-17E-ARRONDISSEMENT,French Restaurant,Hotel,Italian Restaurant,Bakery,Café,Bistro,Plaza,Japanese Restaurant,Restaurant,Furniture / Home Store
8,PARIS-18E-ARRONDISSEMENT,French Restaurant,Bar,Italian Restaurant,Café,Restaurant,Plaza,Pizza Place,Bistro,Hotel,Vietnamese Restaurant
9,PARIS-19E-ARRONDISSEMENT,French Restaurant,Bar,Hotel,Beer Bar,Seafood Restaurant,Supermarket,Bistro,Spa,Steakhouse,Restaurant
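
The two top-venue cells above repeat the same logic for each city. One way to avoid the repetition is a single helper; the sketch below shows the idea on a toy frequency table. The names `ordinal` and `top_categories` are hypothetical, and the table is assumed to be indexed by neighborhood, as after `groupby('Neighborhood').mean()` without `reset_index()`.

```python
import pandas as pd


def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... (11th to 13th handled too)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def top_categories(freq, n):
    """Return the n most frequent categories for every row of freq."""
    columns = [f"{ordinal(i)} Most Common Venue" for i in range(1, n + 1)]
    # nlargest sorts each row by frequency and keeps the n biggest values.
    rows = {name: row.nlargest(n).index.tolist()
            for name, row in freq.iterrows()}
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)


# Toy frequency table for illustration, not the real Foursquare results.
freq = pd.DataFrame(
    {"Café": [0.5, 0.1], "Pub": [0.3, 0.2], "Bus Stop": [0.2, 0.7]},
    index=pd.Index(["Prague 1", "Prague 14"], name="Neighborhood"),
)
top = top_categories(freq, 2)
print(top)
```

With a helper like this, the same call could be applied to both cities' frequency tables.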


## Clustering Neighborhoods

I will create 4 clusters using the K-means algorithm, first for Prague.



In [29]:
# set number of clusters
kclusters = 4

prague_grouped_clustering = prague_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(prague_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 2, 2, 1, 2, 1, 2, 1, 3, 2, 1, 2, 1, 2, 2, 2, 0, 2,
       0, 1, 2, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1])
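
The number of clusters is fixed at 4 in this notebook. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The following is a minimal sketch on synthetic two-dimensional points with three well-separated groups; the data and the range of k are assumptions for illustration, and in the notebook the input would be `prague_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 20 points around distant centres.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit K-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score separates the points most cleanly.
best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real venue frequencies the scores are usually much less clear-cut than on this toy data, so the result is a hint rather than a rule.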

In [30]:
prague_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

prague_merged = prague_data

# merge prague_grouped with prague_data to add latitude/longitude for each neighborhood
prague_merged = prague_merged.join(prague_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

prague_merged.head(5)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,50.086389,14.411111,1,Café,Hotel,Park,Historic Site,Italian Restaurant,Beer Bar,Plaza,Restaurant,Theater,Waterfront
1,Prague 2,50.074167,14.442778,1,Café,Bistro,Bar,Pub,Wine Bar,Ice Cream Shop,Beer Bar,Playground,Cocktail Bar,Yoga Studio
2,Prague 3,50.084444,14.454167,1,Pub,Café,Bakery,Wine Bar,Gym / Fitness Center,Hostel,Vietnamese Restaurant,Asian Restaurant,Italian Restaurant,Czech Restaurant
3,Prague 4,50.062222,14.440278,1,Café,Pizza Place,Bar,Theater,Kebab Restaurant,Pub,Restaurant,Gastropub,Vietnamese Restaurant,Plaza
4,Prague 5,50.06,14.393333,1,Restaurant,Furniture / Home Store,Coffee Shop,Buffet,Café,Gym / Fitness Center,Gym Pool,Park,Tram Station,Hill


Now the same K-means clustering for Paris.

In [31]:
# set number of clusters
kclusters = 4

paris_grouped_clustering = paris_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(paris_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans2.labels_

array([1, 1, 3, 1, 0, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1])

In [32]:
paris_venues_sorted.insert(0, 'Cluster Labels', kmeans2.labels_)

# geo_point_2d is kept here; lat and lon already hold the same coordinates
paris_merged = paris_combined_data

# join the top-venue table onto the Paris data, which already has latitude/longitude for each neighborhood
paris_merged = paris_merged.join(paris_venues_sorted.set_index('Neighborhood'), on='nom_comm')

paris_merged.head(5)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,"[48.87689616237872, 2.337460241388529]",48.876896,2.33746,1,French Restaurant,Hotel,Japanese Restaurant,Bistro,Cocktail Bar,Wine Bar,Restaurant,Bakery,Tea Room,Lounge
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,"[48.86790337886785, 2.344107166658533]",48.867903,2.344107,1,French Restaurant,Cocktail Bar,Wine Bar,Bakery,Coffee Shop,Italian Restaurant,Plaza,Salad Place,Hotel,Pastry Shop
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,"[48.85941549762748, 2.378741060237548]",48.859415,2.378741,1,Café,Restaurant,Asian Restaurant,Pastry Shop,Wine Bar,Vietnamese Restaurant,Bakery,Italian Restaurant,French Restaurant,Bistro
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,"[48.87252726662346, 2.312582560420059]",48.872527,2.312583,0,French Restaurant,Hotel,Art Gallery,Cocktail Bar,Spa,Corsican Restaurant,Plaza,Theater,Furniture / Home Store,Mediterranean Restaurant
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,"[48.82871768452136, 2.362468228516128]",48.828718,2.362468,1,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Grocery Store,Gourmet Shop,Furniture / Home Store,Cambodian Restaurant


Let's visualize the clusters on a map.

Prague Clusters

In [33]:
# create map
map_clusters_prague = folium.Map(location=[50.083333, 14.416667], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i*i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(prague_merged['Lat'], prague_merged['Lon'], prague_merged['Neighborhood'], prague_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters_prague)
       
map_clusters_prague

Paris Clusters

In [34]:
# create map
map_clusters_paris = folium.Map(location=[48.856613, 2.352222], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i*i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(paris_merged['lat'], paris_merged['lon'], paris_merged['nom_comm'], paris_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters_paris)
       
map_clusters_paris
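
Both map cells pick the marker color with `rainbow[cluster-1]`. Because Python lists accept negative indices, cluster 0 maps to index -1, which is the last color. Each cluster therefore still receives its own color, only shifted by one position. Here is a small sketch with placeholder color names. The `palette` list is hypothetical and stands in for `rainbow`, and `kclusters = 4` is an assumption based on the cluster labels 0 to 3 seen in this notebook.

```python
kclusters = 4  # assumed: labels 0..3 appear in the outputs of this notebook
palette = ["color_0", "color_1", "color_2", "color_3"]  # stand-in for `rainbow`

# Same indexing as in the map cells: cluster - 1, so cluster 0 wraps to the last entry.
color_for_cluster = {cluster: palette[cluster - 1] for cluster in range(kclusters)}

for cluster, color in color_for_cluster.items():
    print(cluster, "->", color)
```

Indexing with `rainbow[cluster]` instead would give the more direct mapping of cluster k to color k.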

## Examining Clusters

### Cluster 0 of Prague

In [44]:
prague_merged.loc[prague_merged['Cluster Labels'] == 0].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Prague Dablice,50.145,14.4825,0,Restaurant,Bus Stop,Soccer Field,Zoo,Electronics Store,Fried Chicken Joint,Food Truck,Food & Drink Shop,Flower Shop,Fishing Spot
42,Prague Nedvezi,50.0181,14.6528,0,Trail,Bus Stop,Restaurant,Cocktail Bar,Zoo,Fried Chicken Joint,Food Truck,Food & Drink Shop,Flower Shop,Fishing Spot
44,Prague Predni Kopanina,50.11639,14.29583,0,Restaurant,Plaza,Soccer Field,Bus Stop,Zoo,Electronics Store,Food Truck,Food & Drink Shop,Flower Shop,Fishing Spot


### Cluster 1 of Prague

In [45]:
prague_merged.loc[prague_merged['Cluster Labels'] == 1].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Prague 1,50.086389,14.411111,1,Café,Hotel,Park,Historic Site,Italian Restaurant,Beer Bar,Plaza,Restaurant,Theater,Waterfront
1,Prague 2,50.074167,14.442778,1,Café,Bistro,Bar,Pub,Wine Bar,Ice Cream Shop,Beer Bar,Playground,Cocktail Bar,Yoga Studio
2,Prague 3,50.084444,14.454167,1,Pub,Café,Bakery,Wine Bar,Gym / Fitness Center,Hostel,Vietnamese Restaurant,Asian Restaurant,Italian Restaurant,Czech Restaurant
3,Prague 4,50.062222,14.440278,1,Café,Pizza Place,Bar,Theater,Kebab Restaurant,Pub,Restaurant,Gastropub,Vietnamese Restaurant,Plaza
4,Prague 5,50.06,14.393333,1,Restaurant,Furniture / Home Store,Coffee Shop,Buffet,Café,Gym / Fitness Center,Gym Pool,Park,Tram Station,Hill
5,Prague 6,50.100833,14.394722,1,Coffee Shop,Café,Pizza Place,ATM,Vietnamese Restaurant,Italian Restaurant,Bakery,Hotel,Bus Stop,Gym
6,Prague 7,50.100556,14.435556,1,Café,Czech Restaurant,Dessert Shop,Pizza Place,Asian Restaurant,Pub,Coffee Shop,Plaza,Chinese Restaurant,Thrift / Vintage Store
7,Prague 8,50.107778,14.471389,1,Restaurant,Coffee Shop,Beer Garden,Playground,Vietnamese Restaurant,Gastropub,Pub,Café,Historic Site,Pizza Place
8,Prague 9,50.110556,14.5,1,Restaurant,Coffee Shop,Pub,Czech Restaurant,Gastropub,Electronics Store,Hotel,Clothing Store,Pizza Place,Gym
9,Prague 10,50.06667,14.46417,1,Bus Stop,Sporting Goods Shop,Playground,Chinese Restaurant,Stadium,Mobile Phone Shop,Coffee Shop,Drugstore,Leather Goods Store,Fried Chicken Joint


### Cluster 2 of Prague

In [46]:
prague_merged.loc[prague_merged['Cluster Labels'] == 2].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Prague 14,50.102778,14.552222,2,Bus Stop,Restaurant,Reservoir,Pharmacy,Eastern European Restaurant,Caucasian Restaurant,Fishing Spot,Field,Flower Shop,Food & Drink Shop
17,Prague 18,50.135611,14.511694,2,Bus Stop,Pub,Bridal Shop,Burger Joint,Spa,Park,Garden Center,Steakhouse,Athletics & Sports,Asian Restaurant
20,Prague 21,50.075833,14.659444,2,Bus Stop,Bar,Pub,Soccer Field,Tea Room,Dessert Shop,Supermarket,Bakery,Zoo,Fast Food Restaurant
27,Prague Dolni Chabry,50.14639,14.44778,2,Bus Stop,Restaurant,Plaza,Bowling Alley,Soccer Field,Reservoir,Supermarket,Flower Shop,Pharmacy,Czech Restaurant
28,Prague Dolni Mecholupy,50.059,14.558,2,Bus Stop,Pizza Place,Pub,Steakhouse,Rental Car Location,Music Store,Grocery Store,Electronics Store,Food & Drink Shop,Flower Shop
30,Prague Dubec,50.061667,14.590556,2,Bus Stop,Zoo,History Museum,Eastern European Restaurant,Toy / Game Store,Park,Diner,Tennis Stadium,Historic Site,Reservoir
32,Prague Kolodeje,50.063056,14.640833,2,Reservoir,Italian Restaurant,Bus Stop,Historic Site,Zoo,Electronics Store,Fried Chicken Joint,Food Truck,Food & Drink Shop,Flower Shop
35,Prague Kreslice,50.0231,14.5647,2,Bus Stop,Scenic Lookout,Trail,Pub,Czech Restaurant,Dance Studio,Fried Chicken Joint,Food Truck,Food & Drink Shop,Flower Shop
37,Prague Libus,50.009167,14.462222,2,Bus Stop,Buffet,Dessert Shop,Music Store,Park,Dog Run,Steakhouse,Rental Service,Vehicle Inspection Station,Asian Restaurant
39,Prague Lochkov,50.00306,14.35222,2,Czech Restaurant,Soccer Field,Auto Garage,Bus Stop,Zoo,Escape Room,Fruit & Vegetable Store,Fried Chicken Joint,Food Truck,Food & Drink Shop


### Cluster 3 of Prague

In [47]:
prague_merged.loc[prague_merged['Cluster Labels'] == 3].head(10)

Unnamed: 0,Neighborhood,Lat,Lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,Prague Kralovice,50.037778,14.635556,3,Auto Workshop,Field,Zoo,Electronics Store,Furniture / Home Store,Fruit & Vegetable Store,Fried Chicken Joint,Food Truck,Food & Drink Shop,Flower Shop


### Clusters of Paris

### Cluster 0 of Paris

In [48]:
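# note: this drop() has no effect because its result is not assigned; geo_point_2d remains in the output
paris_merged.drop("geo_point_2d",axis=1)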
paris_merged.loc[paris_merged['Cluster Labels'] == 0].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,75008,PARIS-8E-ARRONDISSEMENT,PARIS,"[48.87252726662346, 2.312582560420059]",48.872527,2.312583,0,French Restaurant,Hotel,Art Gallery,Cocktail Bar,Spa,Corsican Restaurant,Plaza,Theater,Furniture / Home Store,Mediterranean Restaurant
14,75007,PARIS-7E-ARRONDISSEMENT,PARIS,"[48.85608259819694, 2.312438687733857]",48.856083,2.312439,0,French Restaurant,Hotel,Italian Restaurant,Café,Plaza,Bistro,Cocktail Bar,Coffee Shop,Art Museum,History Museum
16,75017,PARIS-17E-ARRONDISSEMENT,PARIS,"[48.88733716648682, 2.307485559493426]",48.887337,2.307486,0,French Restaurant,Hotel,Italian Restaurant,Bakery,Café,Bistro,Plaza,Japanese Restaurant,Restaurant,Furniture / Home Store
19,75014,PARIS-14E-ARRONDISSEMENT,PARIS,"[48.82899321160942, 2.327100883257538]",48.828993,2.327101,0,French Restaurant,Hotel,Bakery,Food & Drink Shop,Brasserie,Fast Food Restaurant,Supermarket,Sushi Restaurant,Bistro,Bike Rental / Bike Share


### Cluster 1 of Paris

In [49]:
paris_merged.loc[paris_merged['Cluster Labels'] == 1].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,75009,PARIS-9E-ARRONDISSEMENT,PARIS,"[48.87689616237872, 2.337460241388529]",48.876896,2.33746,1,French Restaurant,Hotel,Japanese Restaurant,Bistro,Cocktail Bar,Wine Bar,Restaurant,Bakery,Tea Room,Lounge
1,75002,PARIS-2E-ARRONDISSEMENT,PARIS,"[48.86790337886785, 2.344107166658533]",48.867903,2.344107,1,French Restaurant,Cocktail Bar,Wine Bar,Bakery,Coffee Shop,Italian Restaurant,Plaza,Salad Place,Hotel,Pastry Shop
2,75011,PARIS-11E-ARRONDISSEMENT,PARIS,"[48.85941549762748, 2.378741060237548]",48.859415,2.378741,1,Café,Restaurant,Asian Restaurant,Pastry Shop,Wine Bar,Vietnamese Restaurant,Bakery,Italian Restaurant,French Restaurant,Bistro
4,75013,PARIS-13E-ARRONDISSEMENT,PARIS,"[48.82871768452136, 2.362468228516128]",48.828718,2.362468,1,Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,French Restaurant,Juice Bar,Grocery Store,Gourmet Shop,Furniture / Home Store,Cambodian Restaurant
6,75003,PARIS-3E-ARRONDISSEMENT,PARIS,"[48.86305413181178, 2.359361058970589]",48.863054,2.359361,1,French Restaurant,Japanese Restaurant,Coffee Shop,Gourmet Shop,Art Gallery,Bakery,Italian Restaurant,Wine Bar,Sandwich Place,Cocktail Bar
7,75006,PARIS-6E-ARRONDISSEMENT,PARIS,"[48.84896809191946, 2.332670898588416]",48.848968,2.332671,1,Bakery,Chocolate Shop,French Restaurant,Pastry Shop,Restaurant,Theater,Italian Restaurant,Fountain,Market,Deli / Bodega
8,75004,PARIS-4E-ARRONDISSEMENT,PARIS,"[48.854228281954754, 2.357361938142205]",48.854228,2.357362,1,French Restaurant,Ice Cream Shop,Pastry Shop,Clothing Store,Italian Restaurant,Park,Hotel,Wine Bar,Gay Bar,Pedestrian Plaza
9,75010,PARIS-10E-ARRONDISSEMENT,PARIS,"[48.87602855694339, 2.361112904561707]",48.876029,2.361113,1,French Restaurant,Bistro,Coffee Shop,Café,Hotel,Pizza Place,Indian Restaurant,Japanese Restaurant,Asian Restaurant,Restaurant
11,75005,PARIS-5E-ARRONDISSEMENT,PARIS,"[48.844508659617546, 2.349859385560182]",48.844509,2.349859,1,French Restaurant,Hotel,Italian Restaurant,Bar,Café,Bakery,Plaza,Pub,Coffee Shop,Creperie
12,75019,PARIS-19E-ARRONDISSEMENT,PARIS,"[48.88686862295828, 2.384694327870042]",48.886869,2.384694,1,French Restaurant,Bar,Hotel,Beer Bar,Seafood Restaurant,Supermarket,Bistro,Spa,Steakhouse,Restaurant


### Cluster 2 of Paris

In [50]:
paris_merged.loc[paris_merged['Cluster Labels'] == 2].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,75016,PARIS-16E-ARRONDISSEMENT,PARIS,"[48.86039876035177, 2.262099559395783]",48.860399,2.2621,2,Lake,French Restaurant,Art Museum,Trail,Boat or Ferry,Bus Station,Plaza,Pool,Park,Flower Shop


### Cluster 3 of Paris

In [51]:
paris_merged.loc[paris_merged['Cluster Labels'] == 3].head(10)

Unnamed: 0,postal_code,nom_comm,nom_dept,geo_point_2d,lat,lon,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,75012,PARIS-12E-ARRONDISSEMENT,PARIS,"[48.83515623066034, 2.419807034965275]",48.835156,2.419807,3,Zoo Exhibit,Bistro,Monument / Landmark,Supermarket,Zoo,Argentinian Restaurant,African Restaurant,French Restaurant,Fountain,Food Court
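
The per-cluster cells above all follow the same pattern, so the inspection could also be written as one loop over `groupby`. The sketch below uses a toy frame and assumes the same `Cluster Labels` column name. The four rows are taken from the Prague tables above, and the names `toy` and `cluster_tables` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["Prague 1", "Prague 14", "Prague Dablice", "Prague 2"],
    "Cluster Labels": [1, 2, 0, 1],
    "1st Most Common Venue": ["Café", "Bus Stop", "Restaurant", "Café"],
})

# Collect up to 10 rows per cluster, mirroring .loc[...].head(10) in the cells above.
cluster_tables = {
    cluster: rows.head(10)
    for cluster, rows in toy.groupby("Cluster Labels")
}

for cluster, table in cluster_tables.items():
    print(f"Cluster {cluster}")
    print(table.to_string(index=False))
```

Applied to `prague_merged` or `paris_merged`, the same loop would print every cluster in turn without repeating the filter cell for each label.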


We have clustered the neighborhoods of Prague and Paris. The conclusions and further discussion are given in the presentation.