# Clustering Capitals From Europe
In the first part of this Notebook we will try to cluster the capitals from Europe based in the frequency of its Venues categories. For that, we will use the K-Means algorithm from sklearn.
In the second part, we will choose a city in the world, and we will try answer wich capitals from Europe are more similars to this city based in its more frequent venues categories.


# Part 1: Clustering Capitals From Europe

### 1.1 Importing libraries

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
import folium
import json

### 1.2 Reading our Data

In [2]:
europe_venues= pd.read_csv('europe_venues_complete.csv')
europe_venues = europe_venues.drop('Unnamed: 0',axis=1)
europe_venues.head()

FileNotFoundError: File b'europe_venues_complete.csv' does not exist

### 1.3 Preparing our Data
The code below makes one column per Venue Category

In [3]:
europe_onehot = pd.get_dummies(europe_venues['Venue Category'],prefix="",prefix_sep="")
europe_onehot['Capital'] = europe_venues['Capital']
europe_onehot = europe_onehot[[europe_onehot.columns[-1]]+list(europe_onehot.columns[0:-1])]
europe_onehot.head()

Unnamed: 0,Capital,ATM,Accessories Store,Adult Boutique,African Restaurant,Agriturismo,American Restaurant,Amphitheater,Antique Shop,Apres Ski Bar,...,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


The code below groups rows by Capital, taking the mean of the frequency of each venue category

In [4]:
europe_grouped = europe_onehot.groupby('Capital').mean().reset_index()
europe_grouped.head()

Unnamed: 0,Capital,ATM,Accessories Store,Adult Boutique,African Restaurant,Agriturismo,American Restaurant,Amphitheater,Antique Shop,Apres Ski Bar,...,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Amsterdam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.001742,0.008711,0.0,0.0,0.0,0.001742,0.020906,0.001742,0.010453
1,Andorra la Vella,0.0,0.002545,0.0,0.0,0.0,0.002545,0.0,0.0,0.007634,...,0.0,0.0,0.002545,0.0,0.0,0.0,0.002545,0.0,0.0,0.0
2,Ankara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.001675,0.001675,0.0,0.0,0.00335,0.00335,0.0,0.0
3,Astana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.005764,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Athens,0.0,0.0,0.0,0.0,0.0,0.003311,0.0,0.0,0.0,...,0.0,0.006623,0.013245,0.0,0.0,0.0,0.001656,0.003311,0.0,0.0


The function below sort row based in the frequency of the venue's category and return the 'n' top venues.

In [5]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

The code below creates a new dataframe with the 'n' most popular venues of each Capital.

In [6]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

columns = ['Capital']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


capitals_venues_sorted = pd.DataFrame(columns=columns)
capitals_venues_sorted['Capital'] = europe_grouped['Capital']

for ind in np.arange(europe_grouped.shape[0]):
    capitals_venues_sorted.iloc[ind, 1:] = return_most_common_venues(europe_grouped.iloc[ind, :], num_top_venues)

capitals_venues_sorted.head(10)

Unnamed: 0,Capital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Amsterdam,Bar,Coffee Shop,Café,Supermarket,Park,Marijuana Dispensary,Museum,Gym / Fitness Center,Restaurant,...,Art Museum,Yoga Studio,Sandwich Place,Gym,Music Venue,Plaza,Theater,Pub,Department Store,Italian Restaurant
1,Andorra la Vella,Restaurant,Ski Area,Café,Plaza,Coffee Shop,Ski Chairlift,Bar,Pub,Clothing Store,...,Perfume Shop,Ski Trail,Shopping Mall,Cocktail Bar,Spanish Restaurant,BBQ Joint,Mediterranean Restaurant,Pizza Place,Mountain,Lounge
2,Ankara,Café,Pub,Bar,Coffee Shop,Gym / Fitness Center,Supermarket,Park,Theater,Dance Studio,...,Music Venue,Gym,Grocery Store,Art Gallery,History Museum,Restaurant,Scenic Lookout,Bookstore,Seafood Restaurant,Gastropub
3,Astana,Coffee Shop,Café,Shopping Mall,Park,Supermarket,Gym / Fitness Center,Restaurant,Italian Restaurant,Electronics Store,...,Asian Restaurant,Hotel Bar,Grocery Store,Bar,Eastern European Restaurant,Plaza,Pharmacy,Convenience Store,Multiplex,Karaoke Bar
4,Athens,Café,Bar,Supermarket,Theater,Coffee Shop,Cocktail Bar,Park,Gym / Fitness Center,Gym,...,Plaza,Greek Restaurant,Movie Theater,Electronics Store,Historic Site,History Museum,Wine Bar,Museum,Multiplex,Gourmet Shop
5,Baku,Café,Park,Coffee Shop,Lounge,Restaurant,Pub,Tea Room,Movie Theater,Supermarket,...,Shopping Mall,Gym / Fitness Center,Concert Hall,Gym,Italian Restaurant,Theater,Big Box Store,Plaza,Music Venue,Multiplex
6,Belgrade,Bar,Coffee Shop,Restaurant,Supermarket,Café,Park,Gym / Fitness Center,Art Gallery,Museum,...,Cosmetics Shop,Gym,Jazz Club,Pizza Place,Plaza,Eastern European Restaurant,Cocktail Bar,Multiplex,Bistro,BBQ Joint
7,Berlin,Coffee Shop,Café,Park,Bar,Supermarket,Cocktail Bar,Gym / Fitness Center,Wine Bar,Plaza,...,Indie Movie Theater,History Museum,Organic Grocery,Art Museum,Art Gallery,Concert Hall,Italian Restaurant,Drugstore,Theater,Bookstore
8,Bern,Supermarket,Café,Plaza,Bar,Swiss Restaurant,Grocery Store,Park,Restaurant,Coffee Shop,...,Shopping Mall,Movie Theater,Bakery,Scenic Lookout,Electronics Store,Science Museum,Pizza Place,Asian Restaurant,Athletics & Sports,Music Venue
9,Bratislava,Café,Supermarket,Art Gallery,Coffee Shop,Gym / Fitness Center,Pub,Park,Theater,Wine Bar,...,Bar,Plaza,Clothing Store,Vegetarian / Vegan Restaurant,Brewery,Italian Restaurant,Beer Bar,Restaurant,Outdoor Sculpture,Shopping Mall


## 1.4. Clustering with Kmeans

In [7]:
from sklearn.cluster import AgglomerativeClustering
k = 10
europe_clusters = europe_grouped.drop('Capital',axis=1)
Hierar = AgglomerativeClustering(n_clusters= k).fit(europe_clusters)
Hierar.labels_

array([2, 9, 1, 6, 1, 3, 2, 2, 4, 3, 2, 6, 2, 3, 1, 3, 1, 6, 7, 0, 6, 5,
       7, 6, 8, 6, 1, 1, 7, 0, 3, 1, 3, 7, 0, 0, 0, 2, 1, 3, 3, 0, 4, 8,
       3, 2, 1, 3, 0])

In [8]:
capitals_venues_sorted.insert(0,'Cluster Labels',Hierar.labels_)

europe_merged = europe_venues.drop_duplicates('Country').iloc[:,0:4]

europe_merged = europe_merged.join(capitals_venues_sorted.set_index('Capital'), on = 'Capital')

europe_merged

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,...,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Albania,Tirana,41.327946,19.818532,0,Café,Coffee Shop,Italian Restaurant,Bar,Shopping Mall,...,Park,Lounge,Supermarket,Pizza Place,Boutique,Clothing Store,Market,Bistro,Multiplex,Pub
235,Andorra,Andorra la Vella,42.506939,1.521247,9,Restaurant,Ski Area,Café,Plaza,Coffee Shop,...,Perfume Shop,Ski Trail,Shopping Mall,Cocktail Bar,Spanish Restaurant,BBQ Joint,Mediterranean Restaurant,Pizza Place,Mountain,Lounge
628,Armenia,Yerevan,40.177612,44.512585,3,Café,Park,Restaurant,Pub,Grocery Store,...,Gym,Coffee Shop,Theater,Museum,Bakery,Movie Theater,Italian Restaurant,Electronics Store,Art Gallery,Pool
1033,Austria,Vienna,48.208354,16.372504,3,Café,Supermarket,Park,Plaza,Austrian Restaurant,...,History Museum,Concert Hall,Gym / Fitness Center,Italian Restaurant,Zoo Exhibit,Clothing Store,Art Museum,Gym,Museum,Palace
1602,Azerbaijan,Baku,40.375443,49.832675,3,Café,Park,Coffee Shop,Lounge,Restaurant,...,Shopping Mall,Gym / Fitness Center,Concert Hall,Gym,Italian Restaurant,Theater,Big Box Store,Plaza,Music Venue,Multiplex
2155,Belarus,Minsk,53.902334,27.561879,6,Coffee Shop,Café,Park,Gym / Fitness Center,Shopping Mall,...,Restaurant,Hookah Bar,Supermarket,Bookstore,Dance Studio,Concert Hall,Bar,History Museum,Multiplex,Electronics Store
2716,Belgium,Brussels,50.846557,4.351697,2,Supermarket,Bar,Coffee Shop,Gym / Fitness Center,Plaza,...,Park,Art Museum,Café,Cocktail Bar,Museum,History Museum,Soccer Field,Bakery,Theater,Movie Theater
3330,Bosnia and Herzegovina,Sarajevo,43.851977,18.386687,0,Café,Shopping Mall,Restaurant,Eastern European Restaurant,Bar,...,Italian Restaurant,Hookah Bar,Gym / Fitness Center,Fast Food Restaurant,Bakery,Mountain,Coffee Shop,Dessert Shop,Theater,Gym
3685,Bulgaria,Sofia,42.697863,23.322179,2,Bar,Café,Supermarket,Coffee Shop,Park,...,Cocktail Bar,Dance Studio,Gym,Dessert Shop,Plaza,Art Gallery,Pharmacy,Movie Theater,Nightclub,Clothing Store
4253,Croatia,Zagreb,45.813177,15.977048,0,Café,Bar,Supermarket,Plaza,Restaurant,...,BBQ Joint,Trail,Pub,Gym / Fitness Center,Gym,Shopping Mall,Farmers Market,Pizza Place,Coffee Shop,Bistro


## 1.5 Visualizing the Results on a Map

In [9]:
import matplotlib.cm as cm
import matplotlib.colors as colors

adress = "Europe"
geolocator = Nominatim(user_agent = "europe_explorer",timeout=3)
local = geolocator.geocode(adress)
latitude = local.latitude
longitude = local.longitude

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=3)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, capital, cluster in zip(europe_merged['Capital Latitude'], europe_merged['Capital Longitude'], europe_merged['Capital'], europe_merged['Cluster Labels']):
    if( np.isnan(cluster)): cluster = -1
    label = folium.Popup(str(capital) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 1.5 Seeing the cluster for each label

### Cluster 0

In [11]:
pd.set_option('display.max_columns', 30)
europe_merged.loc[europe_merged['Cluster Labels']== 0,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Albania,Tirana,41.327946,19.818532,0,Café,Coffee Shop,Italian Restaurant,Bar,Shopping Mall,Cocktail Bar,Plaza,Eastern European Restaurant,Restaurant,Nightclub,Park,Lounge,Supermarket,Pizza Place,Boutique,Clothing Store,Market,Bistro,Multiplex,Pub
3330,Bosnia and Herzegovina,Sarajevo,43.851977,18.386687,0,Café,Shopping Mall,Restaurant,Eastern European Restaurant,Bar,Park,History Museum,Pub,Grocery Store,Plaza,Italian Restaurant,Hookah Bar,Gym / Fitness Center,Fast Food Restaurant,Bakery,Mountain,Coffee Shop,Dessert Shop,Theater,Gym
4253,Croatia,Zagreb,45.813177,15.977048,0,Café,Bar,Supermarket,Plaza,Restaurant,Park,Theater,Grocery Store,Dessert Shop,Museum,BBQ Joint,Trail,Pub,Gym / Fitness Center,Gym,Shopping Mall,Farmers Market,Pizza Place,Coffee Shop,Bistro
15404,Montenegro,Podgorica,42.441524,19.262108,0,Café,Restaurant,Park,Gym / Fitness Center,Bar,Supermarket,Pub,Cosmetics Shop,Pizza Place,Italian Restaurant,Market,Lounge,Furniture / Home Store,Athletics & Sports,Gym,Shopping Mall,Soccer Field,Bookstore,River,Bakery
16351,North Macedonia,Skopje,41.996092,21.431649,0,Café,Bar,Gym / Fitness Center,Park,Supermarket,Lounge,Restaurant,Shopping Mall,Grocery Store,Market,Coffee Shop,Bakery,Plaza,BBQ Joint,Dance Studio,Movie Theater,Pizza Place,Playground,Electronics Store,Italian Restaurant
19744,San Marino,San Marino,43.945862,12.458306,0,Café,Italian Restaurant,Beach,Plaza,Bar,Park,Nightclub,Pub,Cocktail Bar,Ice Cream Shop,Restaurant,Clothing Store,Pizza Place,Coffee Shop,Gym,Seafood Restaurant,Supermarket,Food & Drink Shop,Shopping Mall,Wine Bar
21378,Slovenia,Ljubljana,46.049815,14.506782,0,Café,Bar,Convenience Store,Restaurant,Plaza,Theater,Pub,Supermarket,Grocery Store,Park,Trail,Art Gallery,Coffee Shop,Gym,Eastern European Restaurant,Pizza Place,Burger Joint,Dance Studio,Museum,Lounge


### Cluster 1

In [12]:
europe_merged.loc[europe_merged['Cluster Labels']== 1,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
4784,Cyprus,Nicosia,35.180282,33.373696,1,Café,Coffee Shop,Bar,Gym,Supermarket,Gym / Fitness Center,Music Venue,Park,Restaurant,Art Gallery,Theater,Nightclub,Soccer Field,Dance Studio,History Museum,Sandwich Place,Grocery Store,Furniture / Home Store,Wine Bar,Market
5929,Denmark,Copenhagen,55.686724,12.570072,1,Café,Coffee Shop,Supermarket,Grocery Store,Park,Theater,Cocktail Bar,Wine Bar,Music Venue,Scandinavian Restaurant,Plaza,Gym / Fitness Center,Beer Bar,Bar,Bakery,Concert Hall,Gym,Movie Theater,Beach,Playground
7014,Finland,Helsinki,60.17132,24.941457,1,Café,Supermarket,Gym / Fitness Center,Grocery Store,Coffee Shop,Scandinavian Restaurant,Bar,Park,Theater,Art Museum,Gym,History Museum,Bakery,Pizza Place,Art Gallery,Dance Studio,Wine Bar,Beer Bar,Restaurant,Museum
9341,Greece,Athens,37.984149,23.727984,1,Café,Bar,Supermarket,Theater,Coffee Shop,Cocktail Bar,Park,Gym / Fitness Center,Gym,Clothing Store,Plaza,Greek Restaurant,Movie Theater,Electronics Store,Historic Site,History Museum,Wine Bar,Museum,Multiplex,Gourmet Shop
10534,Iceland,Reykjavík,64.244427,-21.768106,1,Café,Bar,Coffee Shop,Grocery Store,Supermarket,Seafood Restaurant,Gym,Park,Restaurant,Pool,Theater,Pizza Place,Art Museum,Burger Joint,History Museum,Gym / Fitness Center,Scandinavian Restaurant,Electronics Store,Movie Theater,Soccer Field
16850,Norway,Oslo,59.91333,10.73897,1,Grocery Store,Coffee Shop,Bar,Café,Park,Gym / Fitness Center,History Museum,Cocktail Bar,Bakery,Theater,Ski Lodge,Music Venue,Movie Theater,Burger Joint,Wine Shop,Supermarket,Art Gallery,Scandinavian Restaurant,Concert Hall,Gym
17400,Poland,Warsaw,52.231924,21.006727,1,Café,Park,Coffee Shop,Supermarket,Theater,Cocktail Bar,Plaza,Bar,Gym / Fitness Center,History Museum,Beer Bar,Market,Clothing Store,Shopping Mall,Grocery Store,Dance Studio,Music Venue,Vegetarian / Vegan Restaurant,Bistro,Gym
22508,Sweden,Stockholm,59.325117,18.071093,1,Café,Scandinavian Restaurant,Park,Grocery Store,Supermarket,Gym / Fitness Center,Coffee Shop,Theater,Plaza,Museum,Movie Theater,Art Gallery,History Museum,Clothing Store,Bakery,Bar,Cocktail Bar,Art Museum,Liquor Store,Wine Bar
23504,Turkey,Ankara,39.921522,32.853793,1,Café,Pub,Bar,Coffee Shop,Gym / Fitness Center,Supermarket,Park,Theater,Dance Studio,Shopping Mall,Music Venue,Gym,Grocery Store,Art Gallery,History Museum,Restaurant,Scenic Lookout,Bookstore,Seafood Restaurant,Gastropub


### Cluster 2

In [13]:
europe_merged.loc[europe_merged['Cluster Labels']== 2,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
2716,Belgium,Brussels,50.846557,4.351697,2,Supermarket,Bar,Coffee Shop,Gym / Fitness Center,Plaza,Sandwich Place,Italian Restaurant,Concert Hall,Tea Room,Wine Bar,Park,Art Museum,Café,Cocktail Bar,Museum,History Museum,Soccer Field,Bakery,Theater,Movie Theater
3685,Bulgaria,Sofia,42.697863,23.322179,2,Bar,Café,Supermarket,Coffee Shop,Park,Gym / Fitness Center,Bakery,Restaurant,Theater,Italian Restaurant,Cocktail Bar,Dance Studio,Gym,Dessert Shop,Plaza,Art Gallery,Pharmacy,Movie Theater,Nightclub,Clothing Store
8727,Germany,Berlin,52.517036,13.38886,2,Coffee Shop,Café,Park,Bar,Supermarket,Cocktail Bar,Gym / Fitness Center,Wine Bar,Plaza,Clothing Store,Indie Movie Theater,History Museum,Organic Grocery,Art Museum,Art Gallery,Concert Hall,Italian Restaurant,Drugstore,Theater,Bookstore
9945,Hungary,Budapest,47.498382,19.040471,2,Coffee Shop,Supermarket,Park,Café,Bar,Theater,Gym / Fitness Center,Grocery Store,Wine Bar,Plaza,Pub,Italian Restaurant,Gastropub,History Museum,Breakfast Spot,Scenic Lookout,Beer Bar,Farmers Market,Restaurant,Indie Movie Theater
13393,Lithuania,Vilnius,54.687046,25.282911,2,Coffee Shop,Park,Art Gallery,Supermarket,Museum,Bar,Gym,Café,Restaurant,Grocery Store,Theater,Gym / Fitness Center,Cocktail Bar,Clothing Store,Pub,Plaza,Convenience Store,Shopping Mall,Pizza Place,Bakery
15777,Netherlands,Amsterdam,52.37454,4.897976,2,Bar,Coffee Shop,Café,Supermarket,Park,Marijuana Dispensary,Museum,Gym / Fitness Center,Restaurant,Clothing Store,Art Museum,Yoga Studio,Sandwich Place,Gym,Music Venue,Plaza,Theater,Pub,Department Store,Italian Restaurant
20269,Serbia,Belgrade,44.817813,20.456897,2,Bar,Coffee Shop,Restaurant,Supermarket,Café,Park,Gym / Fitness Center,Art Gallery,Museum,Theater,Cosmetics Shop,Gym,Jazz Club,Pizza Place,Plaza,Eastern European Restaurant,Cocktail Bar,Multiplex,Bistro,BBQ Joint


### Cluster 3

In [14]:
europe_merged.loc[europe_merged['Cluster Labels']== 3,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
628,Armenia,Yerevan,40.177612,44.512585,3,Café,Park,Restaurant,Pub,Grocery Store,Clothing Store,Shopping Mall,Plaza,Supermarket,Gym / Fitness Center,Gym,Coffee Shop,Theater,Museum,Bakery,Movie Theater,Italian Restaurant,Electronics Store,Art Gallery,Pool
1033,Austria,Vienna,48.208354,16.372504,3,Café,Supermarket,Park,Plaza,Austrian Restaurant,Restaurant,Coffee Shop,Cocktail Bar,Bar,Theater,History Museum,Concert Hall,Gym / Fitness Center,Italian Restaurant,Zoo Exhibit,Clothing Store,Art Museum,Gym,Museum,Palace
1602,Azerbaijan,Baku,40.375443,49.832675,3,Café,Park,Coffee Shop,Lounge,Restaurant,Pub,Tea Room,Movie Theater,Supermarket,Grocery Store,Shopping Mall,Gym / Fitness Center,Concert Hall,Gym,Italian Restaurant,Theater,Big Box Store,Plaza,Music Venue,Multiplex
5353,Czech Republic,Prague,50.087465,14.421254,3,Café,Supermarket,Theater,Park,Cocktail Bar,Coffee Shop,Pub,Wine Bar,Zoo Exhibit,Clothing Store,Italian Restaurant,Gym / Fitness Center,Art Gallery,Plaza,Restaurant,Scenic Lookout,Garden,Bar,Drugstore,Gym
6490,Estonia,Tallinn,59.437216,24.745369,3,Café,Park,Restaurant,Supermarket,Museum,Coffee Shop,Gym / Fitness Center,Grocery Store,History Museum,Movie Theater,Wine Bar,Scenic Lookout,Shopping Mall,Trail,Beach,Bar,Theater,Gym,Cocktail Bar,Burger Joint
8192,Georgia,Tbilisi,41.693459,44.801449,3,Café,Park,Plaza,Caucasian Restaurant,Restaurant,Supermarket,Bar,Coffee Shop,Outdoor Sculpture,Gym / Fitness Center,Theater,Wine Bar,Bakery,Pub,Shopping Mall,Lounge,Movie Theater,Concert Hall,Art Gallery,Nightclub
11035,Ireland,Dublin,53.349764,-6.260273,3,Pub,Café,Coffee Shop,Park,Supermarket,Clothing Store,Museum,Theater,Restaurant,Grocery Store,Gym,Beach,Gym / Fitness Center,Art Gallery,Outdoor Sculpture,Italian Restaurant,History Museum,Movie Theater,Plaza,Gastropub
12547,Latvia,Riga,56.949398,24.105185,3,Café,Bar,Coffee Shop,Gym / Fitness Center,Park,Restaurant,Shopping Mall,Concert Hall,History Museum,Clothing Store,Bakery,Theater,Museum,Gym,Eastern European Restaurant,Supermarket,Cocktail Bar,Grocery Store,Lounge,Art Gallery
14557,Moldova,Chișinău,47.024471,28.832253,3,Supermarket,Café,Gym / Fitness Center,Park,Pub,Gym,Coffee Shop,Bar,Theater,Electronics Store,Romanian Restaurant,Lake,Plaza,Bakery,Mobile Phone Shop,Athletics & Sports,History Museum,Grocery Store,Karaoke Bar,Modern European Restaurant
20831,Slovakia,Bratislava,48.135908,17.159744,3,Café,Supermarket,Art Gallery,Coffee Shop,Gym / Fitness Center,Pub,Park,Theater,Wine Bar,Gym,Bar,Plaza,Clothing Store,Vegetarian / Vegan Restaurant,Brewery,Italian Restaurant,Beer Bar,Restaurant,Outdoor Sculpture,Shopping Mall


### Cluster 4

In [15]:
europe_merged.loc[europe_merged['Cluster Labels']== 4,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
13118,Liechtenstein,Vaduz,47.139286,9.522796,4,Supermarket,Café,Restaurant,Grocery Store,Swiss Restaurant,Ski Area,Bar,Shopping Mall,Clothing Store,Trail,History Museum,Gastropub,Electronics Store,Museum,Austrian Restaurant,Furniture / Home Store,Mountain,Ski Chairlift,Italian Restaurant,Fast Food Restaurant
23078,Switzerland,Bern,46.948271,7.451451,4,Supermarket,Café,Plaza,Bar,Swiss Restaurant,Grocery Store,Park,Restaurant,Coffee Shop,Italian Restaurant,Shopping Mall,Movie Theater,Bakery,Scenic Lookout,Electronics Store,Science Museum,Pizza Place,Asian Restaurant,Athletics & Sports,Music Venue


### Cluster 5

In [16]:
europe_merged.loc[europe_merged['Cluster Labels']== 5,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
13910,Luxembourg,Luxembourg,49.815868,6.129675,5,Campground,Italian Restaurant,French Restaurant,Bar,Shopping Mall,Supermarket,Restaurant,Café,Bakery,Castle,Pizza Place,Gym,Scenic Lookout,History Museum,Gastropub,Diner,Hotel Bar,German Restaurant,Pub,Pool


### Cluster 6

In [17]:
europe_merged.loc[europe_merged['Cluster Labels']== 6,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
2155,Belarus,Minsk,53.902334,27.561879,6,Coffee Shop,Café,Park,Gym / Fitness Center,Shopping Mall,Big Box Store,Gym,Movie Theater,Theater,Grocery Store,Restaurant,Hookah Bar,Supermarket,Bookstore,Dance Studio,Concert Hall,Bar,History Museum,Multiplex,Electronics Store
12200,Kazakhstan,Astana,51.128258,71.43055,6,Coffee Shop,Café,Shopping Mall,Park,Supermarket,Gym / Fitness Center,Restaurant,Italian Restaurant,Electronics Store,Gym,Asian Restaurant,Hotel Bar,Grocery Store,Bar,Eastern European Restaurant,Plaza,Pharmacy,Convenience Store,Multiplex,Karaoke Bar
18548,Romania,Bucharest,44.436141,26.10272,6,Coffee Shop,Supermarket,Gym,Café,Gym / Fitness Center,Theater,Park,Pub,Restaurant,Art Gallery,Italian Restaurant,Lounge,Multiplex,Plaza,Bakery,Shopping Mall,History Museum,Department Store,Mediterranean Restaurant,Nightclub
19118,Russia,Moscow,55.750446,37.617494,6,Coffee Shop,Park,Theater,Gourmet Shop,Café,Plaza,Gym / Fitness Center,Cocktail Bar,Wine Bar,Supermarket,Shopping Mall,Convenience Store,Bar,Gastropub,Art Museum,Caucasian Restaurant,Tea Room,Seafood Restaurant,Yoga Studio,Restaurant
24101,Ukraine,Kiev,50.450064,30.524104,6,Coffee Shop,Supermarket,Park,Café,Gym / Fitness Center,Cocktail Bar,Dance Studio,Theater,Hookah Bar,Bar,Caucasian Restaurant,Wine Bar,Art Gallery,Plaza,Bakery,Gym,Italian Restaurant,History Museum,Clothing Store,Furniture / Home Store
24703,United Kingdom,London,51.507322,-0.127647,6,Coffee Shop,Theater,Cocktail Bar,Café,Park,Supermarket,Pub,Gym / Fitness Center,Clothing Store,Art Gallery,Grocery Store,Plaza,Wine Bar,Movie Theater,Hotel Bar,Steakhouse,Tea Room,Indian Restaurant,Lounge,Scenic Lookout


### Cluster 7

In [18]:
europe_merged.loc[europe_merged['Cluster Labels']== 7,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
7582,France,Paris,48.85661,2.351499,7,Wine Bar,Plaza,Coffee Shop,French Restaurant,Cocktail Bar,Art Museum,Park,Café,Garden,Bar,Tea Room,Clothing Store,Museum,Bookstore,Music Venue,Boutique,Bakery,Bistro,Theater,Gourmet Shop
11597,Italy,Rome,41.894802,12.485338,7,Café,Plaza,Italian Restaurant,Supermarket,Art Museum,Cocktail Bar,Wine Bar,History Museum,Park,Sandwich Place,Museum,Ice Cream Shop,Pizza Place,Theater,Historic Site,Clothing Store,Fountain,Pub,Bookstore,Trattoria/Osteria
17967,Portugal,Lisbon,38.707751,-9.136592,7,Café,Portuguese Restaurant,Supermarket,Plaza,Coffee Shop,Bar,Park,Wine Bar,Theater,Museum,Scenic Lookout,Restaurant,History Museum,Gym,Clothing Store,Garden,Electronics Store,Grocery Store,Gym / Fitness Center,Art Gallery
21919,Spain,Madrid,40.416705,-3.703582,7,Café,Plaza,Bar,Coffee Shop,Spanish Restaurant,Park,Restaurant,Supermarket,Tapas Restaurant,Theater,Cocktail Bar,Gym / Fitness Center,Art Museum,Art Gallery,Clothing Store,Grocery Store,Museum,Garden,Gym,Sporting Goods Shop


### Cluster 8

In [20]:
europe_merged.loc[europe_merged['Cluster Labels']== 8,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
14107,Malta,Valletta,35.898982,14.513676,8,Café,Mediterranean Restaurant,Supermarket,History Museum,Plaza,Beach,Bar,Italian Restaurant,Bay,Clothing Store,Restaurant,Pub,Grocery Store,Convenience Store,Park,Lounge,Garden,Wine Bar,Harbor / Marina,Coffee Shop
14919,Monaco,Monaco,43.731142,7.419758,8,Beach,French Restaurant,Café,Supermarket,Italian Restaurant,Bar,Park,Restaurant,Museum,Art Museum,Coffee Shop,Clothing Store,Mediterranean Restaurant,Garden,Plaza,Cocktail Bar,Boutique,Pedestrian Plaza,Scenic Lookout,Lounge


### Cluster 9

In [21]:
europe_merged.loc[europe_merged['Cluster Labels']== 9,:]

Unnamed: 0,Country,Capital,Capital Latitude,Capital Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
235,Andorra,Andorra la Vella,42.506939,1.521247,9,Restaurant,Ski Area,Café,Plaza,Coffee Shop,Ski Chairlift,Bar,Pub,Clothing Store,Sporting Goods Shop,Perfume Shop,Ski Trail,Shopping Mall,Cocktail Bar,Spanish Restaurant,BBQ Joint,Mediterranean Restaurant,Pizza Place,Mountain,Lounge


# Part 2: Comparing a City with the European Capitals

### Let's compare the Capitals from Europe with: 
 * Toronto, Canada
 * New York, USA
 * Sidney , Australia
 * Rio de Janeiro, Brazil

## 2.1 Getting the Latitude and Longitude of our cities

In [22]:
World = ['Canada','USA','Australia','Brazil']
Cities= ['Toronto','New York', 'Sidney','Rio de Janeiro']
Lat = []
Lon = []
for w, c in zip(World,Cities):
    adress = "{}, {}".format(c,w)
    geolocator = Nominatim(user_agent = "europe_explorer",timeout=3)
    local = geolocator.geocode(adress)
    latitude = local.latitude
    longitude = local.longitude
    print(latitude,longitude)
    Lat.append(latitude)
    Lon.append(longitude)

43.653963 -79.387207
40.7308619 -73.9871558
-33.8548157 151.2164539
-22.9110137 -43.2093727


In [23]:
df_cities = pd.DataFrame({'Country':World,'City':Cities,'Latitude':Lat,'Longitude':Lon})
df_cities.head()

Unnamed: 0,Country,City,Latitude,Longitude
0,Canada,Toronto,43.653963,-79.387207
1,USA,New York,40.730862,-73.987156
2,Australia,Sidney,-33.854816,151.216454
3,Brazil,Rio de Janeiro,-22.911014,-43.209373


## 2.2 getting the Venues with Foursquare API

In [24]:
with open('Foursquare_Developer.json') as fs:
    credentials = json.load(fs)
CLIENT_ID = credentials["Client ID"] 
CLIENT_SECRET = credentials["Client SECRET"] 
VERSION = '20180605'
RADIUS = 20000
LIMIT = 200

In [25]:
def getNearbyVenues(countries, cities, latitudes, longitudes, radius):
    
    venues_list=[]
    section = ['food','drinks','coffee','shops','arts','outdoors','sights'] 
    for country, city, lat, lng in zip(countries, cities, latitudes, longitudes):
        #print(country)
        for s in section:
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT,
                s)
            
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                country,
                city, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Country',
                             'Capital', 
                  'Capital Latitude', 
                  'Capital Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Getting the venues

In [26]:
world_venues = getNearbyVenues(countries =df_cities['Country'],
                                cities = df_cities['City'], 
                               latitudes= df_cities['Latitude'],
                               longitudes=df_cities['Longitude'], radius = RADIUS)
world_venues.shape

(2800, 8)

Removing possible duplicates

In [27]:
world_venues = world_venues.drop_duplicates()
world_venues.shape

(2401, 8)

## 2.3 Preparing our Data

In [28]:
world_onehot = pd.get_dummies(world_venues['Venue Category'],prefix="",prefix_sep="")
world_onehot['Capital'] = world_venues['Capital']
world_onehot = world_onehot[[world_onehot.columns[-1]]+list(world_onehot.columns[0:-1])]
world_grouped = world_onehot.groupby('Capital').mean().reset_index()
world_grouped.head()

Unnamed: 0,Capital,American Restaurant,Amphitheater,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,...,Toy / Game Store,Track,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,New York,0.007937,0.0,0.0,0.0,0.0,0.009524,0.009524,0.0,0.003175,0.0,0.001587,0.001587,0.001587,0.009524,...,0.004762,0.0,0.001587,0.001587,0.001587,0.001587,0.0,0.0,0.004762,0.038095,0.004762,0.001587,0.007937,0.0,0.0
1,Rio de Janeiro,0.0,0.00161,0.0,0.0,0.00161,0.008052,0.009662,0.0,0.00161,0.00161,0.0,0.0,0.0,0.011272,...,0.0,0.00161,0.003221,0.008052,0.0,0.0,0.003221,0.00161,0.0,0.006441,0.0,0.0,0.0,0.0,0.0
2,Sidney,0.0,0.0,0.0,0.0,0.0,0.01049,0.003497,0.0,0.003497,0.0,0.017483,0.003497,0.0,0.01049,...,0.0,0.0,0.005245,0.005245,0.0,0.0,0.0,0.0,0.005245,0.012238,0.0,0.0,0.001748,0.001748,0.01049
3,Toronto,0.00173,0.0,0.00173,0.00173,0.0,0.022491,0.00346,0.00346,0.00346,0.00346,0.0,0.00346,0.0,0.010381,...,0.0,0.0,0.00173,0.008651,0.0,0.00173,0.010381,0.0,0.00173,0.00692,0.0,0.0,0.00519,0.0,0.0


Concat with europe_grouped

In [37]:
europe_grouped.head()

Unnamed: 0,Capital,ATM,Accessories Store,Adult Boutique,African Restaurant,Agriturismo,American Restaurant,Amphitheater,Antique Shop,Apres Ski Bar,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Vineyard,Volleyball Court,Warehouse Store,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Amsterdam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010453,0.020906,...,0.0,0.0,0.001742,0.0,0.0,0.0,0.001742,0.008711,0.0,0.0,0.0,0.001742,0.020906,0.001742,0.010453
1,Andorra la Vella,0.0,0.002545,0.0,0.0,0.0,0.002545,0.0,0.0,0.007634,0.0,0.0,0.005089,0.005089,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002545,0.0,0.0,0.0,0.002545,0.0,0.0,0.0
2,Ankara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020101,0.001675,...,0.0,0.001675,0.0,0.0,0.0,0.0,0.0,0.001675,0.001675,0.0,0.0,0.00335,0.00335,0.0,0.0
3,Astana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.005764,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Athens,0.0,0.0,0.0,0.0,0.0,0.003311,0.0,0.0,0.0,0.0,0.0,0.0,0.004967,0.006623,...,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.013245,0.0,0.0,0.0,0.001656,0.003311,0.0,0.0


In [35]:
all_grouped = pd.concat([world_grouped,europe_grouped])

of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.


  """Entry point for launching an IPython kernel.


In [38]:
all_grouped=all_grouped.set_index('Capital')
all_grouped.head()

Unnamed: 0_level_0,ATM,Accessories Store,Adult Boutique,African Restaurant,Agriturismo,American Restaurant,Amphitheater,Antique Shop,Apres Ski Bar,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Vineyard,Volleyball Court,Warehouse Store,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
Capital,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1
New York,0.0,0.0,0.0,0.0,0.0,0.007937,0.0,0.0,0.0,0.0,0.0,0.0,0.009524,0.009524,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.004762,0.038095,0.004762,0.0,0.0,0.001587,0.007937,0.0,0.0
Rio de Janeiro,0.0,0.0,0.0,0.0,0.0,0.0,0.00161,0.0,0.0,0.0,0.0,0.00161,0.008052,0.009662,0.0,...,0.0,0.0,0.003221,0.0,0.0,0.00161,0.0,0.006441,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Sidney,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01049,0.003497,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.005245,0.012238,0.0,0.0,0.0,0.0,0.001748,0.001748,0.01049
Toronto,0.0,0.0,0.0,0.0,0.0,0.00173,0.0,0.0,0.0,0.00173,0.0,0.0,0.022491,0.00346,0.00346,...,0.0,0.0,0.010381,0.0,0.0,0.0,0.00173,0.00692,0.0,0.0,0.0,0.0,0.00519,0.0,0.0
Amsterdam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010453,0.020906,0.0,...,0.0,0.0,0.001742,0.0,0.0,0.0,0.001742,0.008711,0.0,0.0,0.0,0.001742,0.020906,0.001742,0.010453


In [40]:
k = 10
all_clusters = all_grouped
Hierar = AgglomerativeClustering(n_clusters= k).fit(all_clusters)
Hierar.labels_

array([5, 4, 3, 1, 4, 9, 1, 5, 1, 3, 4, 4, 7, 3, 4, 5, 4, 3, 1, 3, 1, 5,
       2, 0, 5, 6, 2, 5, 8, 5, 1, 1, 2, 0, 3, 1, 3, 2, 0, 0, 0, 4, 1, 3,
       3, 0, 7, 8, 3, 4, 1, 3, 0])

In [42]:
all_clusters.insert(0,'Cluster Labels',Hierar.labels_)



ValueError: cannot insert Cluster Labels, already exists

In [46]:
all_clusters.loc[all_clusters['Cluster Labels']== 1]

Unnamed: 0_level_0,Cluster Labels,ATM,Accessories Store,Adult Boutique,African Restaurant,Agriturismo,American Restaurant,Amphitheater,Antique Shop,Apres Ski Bar,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,...,Vineyard,Volleyball Court,Warehouse Store,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
Capital,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1
Toronto,1,0.0,0.0,0.0,0.0,0.0,0.00173,0.0,0.0,0.0,0.00173,0.0,0.0,0.022491,0.00346,...,0.0,0.0,0.010381,0.0,0.0,0.0,0.00173,0.00692,0.0,0.0,0.0,0.0,0.00519,0.0,0.0
Ankara,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020101,0.001675,...,0.0,0.001675,0.0,0.0,0.0,0.0,0.0,0.001675,0.001675,0.0,0.0,0.00335,0.00335,0.0,0.0
Athens,1,0.0,0.0,0.0,0.0,0.0,0.003311,0.0,0.0,0.0,0.0,0.0,0.0,0.004967,0.006623,...,0.0,0.0,0.0,0.0,0.0,0.0,0.006623,0.013245,0.0,0.0,0.0,0.001656,0.003311,0.0,0.0
Copenhagen,1,0.0,0.0,0.0,0.001783,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012478,0.012478,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026738,0.001783,0.0,0.0,0.001783,0.001783,0.0,0.0
Helsinki,1,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.001761,0.0,0.0,0.0,0.0,0.014085,0.017606,...,0.0,0.001761,0.001761,0.0,0.0,0.007042,0.0,0.014085,0.001761,0.0,0.0,0.0,0.003521,0.0,0.0
Nicosia,1,0.0,0.0,0.0,0.0,0.0,0.001757,0.001757,0.0,0.0,0.0,0.0,0.0,0.02109,0.001757,...,0.0,0.0,0.0,0.0,0.0,0.001757,0.0,0.01406,0.001757,0.0,0.0,0.001757,0.001757,0.0,0.0
Oslo,1,0.0,0.0,0.0,0.0,0.0,0.001818,0.0,0.0,0.0,0.0,0.0,0.0,0.016364,0.010909,...,0.0,0.0,0.0,0.0,0.0,0.001818,0.001818,0.003636,0.016364,0.0,0.0,0.0,0.007273,0.0,0.0
Reykjavík,1,0.0,0.0,0.0,0.0,0.0,0.001996,0.0,0.0,0.0,0.0,0.0,0.0,0.005988,0.015968,...,0.0,0.0,0.0,0.0,0.0,0.0,0.001996,0.005988,0.011976,0.0,0.0,0.001996,0.001996,0.0,0.001996
Stockholm,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.015789,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014035,0.0,0.0,0.0,0.001754,0.003509,0.0,0.001754
Warsaw,1,0.0,0.0,0.0,0.0,0.0,0.003527,0.0,0.0,0.0,0.0,0.0,0.0,0.007055,0.010582,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010582,0.001764,0.0,0.0,0.0,0.001764,0.0,0.005291


In [31]:
num_top_venues = 20

indicators = ['st', 'nd', 'rd']

columns = ['Capital']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


world_venues_sorted = pd.DataFrame(columns=columns)
world_venues_sorted['Capital'] = world_grouped['Capital']

for ind in np.arange(world_grouped.shape[0]):
    world_venues_sorted.iloc[ind, 1:] = return_most_common_venues(world_grouped.iloc[ind, :], num_top_venues)

world_venues_sorted.head(10)

Unnamed: 0,Capital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,New York,Coffee Shop,Theater,Cocktail Bar,Gym,Café,Grocery Store,Wine Bar,Park,Italian Restaurant,Gym / Fitness Center,Speakeasy,Tea Room,Cycle Studio,Clothing Store,Bar,Bookstore,Pizza Place,Furniture / Home Store,Music Venue,Donut Shop
1,Rio de Janeiro,Bar,Coffee Shop,Gym / Fitness Center,Theater,Supermarket,Café,Fruit & Vegetable Store,Music Venue,Beach,Shopping Mall,Brazilian Restaurant,Chocolate Shop,Plaza,Dive Bar,Japanese Restaurant,Bookstore,Pizza Place,Park,Scenic Lookout,Steakhouse
2,Sidney,Café,Park,Supermarket,Beach,Coffee Shop,Bar,Pub,Cocktail Bar,Multiplex,Shopping Mall,Theater,Movie Theater,Grocery Store,Australian Restaurant,Italian Restaurant,Pool,Gym,Museum,Wine Bar,Thai Restaurant
3,Toronto,Café,Park,Coffee Shop,Grocery Store,Bar,Supermarket,Gym,Theater,Art Gallery,Movie Theater,Music Venue,Beer Bar,Clothing Store,Restaurant,Gastropub,Italian Restaurant,Cocktail Bar,Pharmacy,Japanese Restaurant,Furniture / Home Store


## 2.4 Comparing the cities with the European Capitals 

In [34]:
world_clusters = world_grouped.drop('Capital',axis=1)
labels = Hierar.fit_predict(world_clusters)
labels

ValueError: Cannot extract more clusters than samples: 10 clusters where given for a tree with 4 leaves.

In [316]:
world_venues_sorted.insert(0,'Cluster Labels',labels)

## 2.5 Results

As we can see in the table below, all cities that we choose are in the same cluster.

In [317]:
world_venues_sorted = world_venues_sorted.set_index('Capital')
world_venues_sorted.head()

Unnamed: 0_level_0,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
Capital,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
New York,1,Coffee Shop,Theater,Cocktail Bar,Gym,Café,Grocery Store,Wine Bar,Park,Italian Restaurant,Gym / Fitness Center,Speakeasy,Tea Room,Cycle Studio,Clothing Store,Bar,Bookstore,Pizza Place,Furniture / Home Store,Music Venue,Donut Shop
Rio de Janeiro,1,Bar,Coffee Shop,Gym / Fitness Center,Theater,Supermarket,Café,Fruit & Vegetable Store,Music Venue,Beach,Shopping Mall,Brazilian Restaurant,Chocolate Shop,Plaza,Dive Bar,Japanese Restaurant,Bookstore,Pizza Place,Park,Scenic Lookout,Steakhouse
Sidney,1,Café,Park,Supermarket,Beach,Coffee Shop,Bar,Pub,Cocktail Bar,Multiplex,Shopping Mall,Theater,Movie Theater,Grocery Store,Australian Restaurant,Italian Restaurant,Pool,Gym,Museum,Wine Bar,Thai Restaurant
Toronto,1,Café,Park,Coffee Shop,Grocery Store,Bar,Supermarket,Gym,Theater,Art Gallery,Movie Theater,Music Venue,Beer Bar,Clothing Store,Restaurant,Gastropub,Italian Restaurant,Cocktail Bar,Pharmacy,Japanese Restaurant,Furniture / Home Store


### MAP

In [318]:
adress = "Europe"
geolocator = Nominatim(user_agent = "europe_explorer",timeout=3)
local = geolocator.geocode(adress)
latitude = local.latitude
longitude = local.longitude
print(latitude,longitude)
cluster = europe_merged.loc[europe_merged['Cluster Labels']== world_venues_sorted.loc['New York','Cluster Labels']]
europe = folium.Map(location=[latitude,longitude], zoom_start = 3)
for lat,long,country,city in zip(cluster['Capital Latitude'],cluster['Capital Longitude'],cluster['Country'],cluster['Capital']):
    label = "{}, {}".format(city,country)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat,long],
        radius = 5,
        popup = label,
        color= 'red',
        fill = True,
        fill_opacity=0.7,
        parse_html= False).add_to(europe)
europe

51.0 10.0
