# Coursera Capstone project
This notebook contains the code for the IBM applied capstone project course.

In [58]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from numpy.random import randint
import folium
from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors
from IPython.display import display

In [66]:
print(" Hello Capstone Project Course!")

 Hello Capstone Project Course!


## Purpose of the project
In this project I collect information about Milan's boroughs and analyse it in the same way as we did for Toronto and New York. I'll then compare Milan with other cities.

Borough data for Milan are not as readily available as for New York or Toronto. To build a suitable table of boroughs and coordinates, I take the main boroughs listed on the Wikipedia page and derive their coordinates from an institutional database that holds the coordinates of every single address in Milan, averaging the coordinates of the addresses listed in a given area. This operation is somewhat arbitrary, since the borders between areas are not always well defined, but the results are good enough for this analysis.

In [4]:
url = "https://en.wikipedia.org/wiki/Zones_of_Milan"
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml') 
#print(soup.prettify())
My_table = soup.find('table',{'class':'wikitable sortable'})
#My_table
ths = My_table.find_all('th')
headings = [th.text.strip() for th in ths]
headings[0:6]

['Borough',
 'Name',
 'Area(km2)',
 'Population(2014)',
 'Population density(inhabitants/km2)',
 'Quartieri (districts)']

In [5]:
rows=[]
tools=[]
cols=[]
for link in My_table.find_all('tr'):
    tools = link.find_all('td')
    #print(tools.extend(th.find_all(text='a')))
    #zone = name.find_all('td')
    #print(type(name))
    rows.append(tools)
    #print(link.get_text('title'))
rows[4][1].get_text()
#print(tools)
#cols

'Porta Vittoria, Forlanini'

In [6]:
# Collect one record per borough; rows[0] is the header row (it has no <td> cells), so start at 1
records = []
for rr in range(1, 10):
    records.append({
        "Code": rows[rr][0].get_text('title'),
        "Borough": rows[rr][1].get_text(),
        "Neighbourhood": rows[rr][5].get_text()
    })

# Build the dataframe in one step from the list of records
table_df = pd.DataFrame(records, columns=["Code", "Borough", "Neighbourhood"])
table_df

Unnamed: 0,Code,Borough,Neighbourhood
0,1,Centro storico,"Brera, Centro Storico, Conca del Naviglio, Gua..."
1,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...","Adriano, Crescenzago, Gorla, Greco, Loreto, Ma..."
2,3,"Città Studi, Lambrate, Porta Venezia","Casoretto, Cimiano, Città Studi, Dosso, Lambra..."
3,4,"Porta Vittoria, Forlanini","Acquabella, Calvairate, Castagnedo, Cavriano, ..."
4,5,"Vigentino, Chiaravalle, Gratosoglio","Basmetto, Cantalupa, Case Nuove, Chiaravalle, ..."
5,6,"Barona, Lorenteggio","Arzaga, Barona, Boffalora, Cascina Bianca, Con..."
6,7,"Baggio, De Angeli, San Siro","Assiano, Baggio, Figino, Fopponino, Forze Arma..."
7,8,"Fiera, Gallaratese, Quarto Oggiaro","Boldinasco, Bullona, Cagnola, Campo dei Fiori,..."
8,9,"Porta Garibaldi, Niguarda","Affori, Bicocca, Bovisa, Bovisasca, Bruzzano, ..."


### NB
We cannot analyse the boroughs as listed: they are large and overly generic, and their elongated geographical shapes would make the analysis less accurate. They have this form because earlier, smaller boroughs were merged for the sake of administrative efficiency. We can go back to that earlier division for better accuracy.

In [8]:
areas = []
for rrx in range(9):
    rr=rrx+1
    areas = areas+rows[rr][1].get_text().split (",")
    #areas.append(rows[rr][1].get_text().split (","))
areas = [x.strip(' ') for x in areas]
areas

['Centro storico',
 'Stazione Centrale',
 'Gorla',
 'Turro',
 'Greco',
 'Crescenzago',
 'Città Studi',
 'Lambrate',
 'Porta Venezia',
 'Porta Vittoria',
 'Forlanini',
 'Vigentino',
 'Chiaravalle',
 'Gratosoglio',
 'Barona',
 'Lorenteggio',
 'Baggio',
 'De Angeli',
 'San Siro',
 'Fiera',
 'Gallaratese',
 'Quarto Oggiaro',
 'Porta Garibaldi',
 'Niguarda']

In [9]:
coordinates_df = pd.DataFrame(columns=[ "Borough","Latitude","Longitude"])
coordinates_df.Borough = areas

In [10]:
other_boroughs = ['Duomo','Brera','Sarpi','Ticinese']
# Append the extra central areas after the 24 scraped names, with empty coordinates
for i, extra in enumerate(other_boroughs):
    coordinates_df.loc[i+24] = [extra, np.nan, np.nan]

In [11]:
coordinates_df

Unnamed: 0,Borough,Latitude,Longitude
0,Centro storico,,
1,Stazione Centrale,,
2,Gorla,,
3,Turro,,
4,Greco,,
5,Crescenzago,,
6,Città Studi,,
7,Lambrate,,
8,Porta Venezia,,
9,Porta Vittoria,,


### Inserting spatial coordinates
Because the geocoder was unavailable, I used a database of Milan coordinates (covering each and every address within the city area) and extract each borough's coordinates from the addresses that match it. The database is available at:

http://dati.comune.milano.it/dataset/5c6519f6-6d26-41c9-b53b-6106e08d1b90/resource/533b4e63-3d78-4bb5-aeb4-6c5f648f7f21/download/ds634_civici_coordinategeografiche_20190902_csv.zip

To do that, I average the coordinates of all rows whose borough name matches between the two tables. The exception is "Centro storico", which corresponds to "MUNICIPIO" = 1 in the Milan_coordinates table.
Some borough names also need correcting, because the two tables name them slightly differently.
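
The matching-and-averaging step described above can be sketched on a toy table. This is only an illustration under stated assumptions: the `addresses` frame and its numbers are made up, `ALIASES` and `borough_centroid` are hypothetical names, and only the two alias pairs shown (taken from the renaming rules used in this notebook) are included.

```python
import pandas as pd

# Hypothetical miniature of the address table: one row per street address.
addresses = pd.DataFrame({
    "borough": ["CENTRALE", "CENTRALE", "VIGENTINA", "VIGENTINA"],
    "Lat": [45.48, 45.49, 45.43, 45.45],
    "Lon": [9.20, 9.21, 9.19, 9.21],
})

# Wikipedia spelling -> address-table spelling, only for names that differ.
ALIASES = {
    "STAZIONE CENTRALE": "CENTRALE",
    "VIGENTINO": "VIGENTINA",
}

def borough_centroid(name, table):
    """Return the mean (lat, lon) of all addresses filed under a borough."""
    key = name.upper()
    key = ALIASES.get(key, key)  # fall back to the upper-cased name itself
    rows = table[table["borough"] == key]
    return rows["Lat"].mean(), rows["Lon"].mean()

centroids = {
    name: borough_centroid(name, addresses)
    for name in ["Stazione Centrale", "Vigentino"]
}
print(centroids)
```

A lookup table keeps the name corrections in one place; a name with no matching rows yields `NaN` coordinates, which makes unmatched boroughs easy to spot.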

In [12]:
milan_db = pd.read_excel('Milan_coordinates.xlsx')
milan_db.columns

Index(['MUNICIPIO', 'OPENSTREETMAP', 'PROGANNCSU', 'code', 'borough', 'Lat',
       'Lon'],
      dtype='object')

In [13]:
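centro = milan_db[milan_db.MUNICIPIO == 1]
# Use .loc so the values are written into coordinates_df itself, not into a temporary copy
coordinates_df.loc[0, 'Latitude'] = centro.Lat.mean()
coordinates_df.loc[0, 'Longitude'] = centro.Lon.mean()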

Renaming some boroughs

In [14]:
coordinates_df.loc[3, 'Borough'] = 'Vle Monza'
coordinates_df.loc[2, 'Borough'] = 'Isola'
coordinates_df.loc[5, 'Borough'] = 'Padova'
coordinates_df.loc[9, 'Borough'] = 'Guastalla'
coordinates_df.loc[18, 'Borough'] = 'S. Siro'
coordinates_df.loc[19, 'Borough'] = 'Tre Torri'

In [15]:
for xx in range(1,28):
    name = coordinates_df.Borough[xx].upper()
    if name == 'STAZIONE CENTRALE':
        name = 'CENTRALE'
    elif name == 'PORTA VENEZIA':
        name = 'BUENOS AIRES - VENEZIA'
    elif name =='CITTÀ STUDI':
        name = 'CITTA\' STUDI'
    elif name=='FORLANINI':
        name='Parco Forlanini - ORTICA'
    elif name=='VIGENTINO':
        name='VIGENTINA'
    elif name=='GRATOSOGLIO':
        name ='GRATOSOGLIO - TICINELLO'
    elif name == 'DE ANGELI':
        name = 'DE ANGELI - MONTE ROSA'
    elif name == 'PORTA GARIBALDI':
        name = 'GARIBALDI REPUBBLICA'
    elif name == 'NIGUARDA':
        name = "NIGUARDA - CA' GRANDA"
    bor = milan_db[milan_db.borough == name]
    coordinates_df.loc[xx, 'Latitude'] = bor.Lat.mean()
    coordinates_df.loc[xx, 'Longitude'] = bor.Lon.mean()

In [16]:
coordinates_df

Unnamed: 0,Borough,Latitude,Longitude
0,Centro storico,45.3917,9.18577
1,Stazione Centrale,45.486,9.20433
2,Isola,45.4907,9.18961
3,Vle Monza,45.5116,9.22721
4,Greco,45.5033,9.20728
5,Padova,45.5021,9.23471
6,Città Studi,45.4786,9.23005
7,Lambrate,45.479,9.24471
8,Porta Venezia,45.4775,9.21479
9,Guastalla,45.4635,9.20229


### Acquisition of venue information regarding Milan boroughs

We are now going to use the Foursquare API to get information on Milan's boroughs.

In [18]:
latitude, longitude = 45.4632, 9.18704

# create map of Milan using latitude and longitude values
map_milan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(coordinates_df['Latitude'], coordinates_df['Longitude'], coordinates_df['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
map_milan

### Foursquare credentials and version

In [19]:
CLIENT_ID = 'Q5JFHWQ0A5KGBLIHSUDYF0FLVARXYNMOTVJXPFR300E20NTW' # your Foursquare ID
CLIENT_SECRET = 'URBNDCCDV0MEZ4OPEZSHHRUUHRW4ZXQZUVLZ5OEEEDJY5NP4' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
radius=500
LIMIT=100

Your credentials:
CLIENT_ID: Q5JFHWQ0A5KGBLIHSUDYF0FLVARXYNMOTVJXPFR300E20NTW
CLIENT_SECRET:URBNDCCDV0MEZ4OPEZSHHRUUHRW4ZXQZUVLZ5OEEEDJY5NP4


In [38]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
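
The parsing step inside `getNearbyVenues` can be tried without any network access. The sketch below is an assumption-laden illustration, not part of the original workflow: `flatten_items` is a hypothetical helper, and `fake_payload` is a hand-written stand-in that only mirrors the keys the function above reads (`response`, `groups`, `items`, `venue`). It also shows one way to tolerate a reply with no groups or a venue with no category, both of which would raise an error in the direct indexing used above.

```python
def flatten_items(borough, lat, lng, payload):
    """Turn one explore-style payload into a list of venue tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # An empty category list becomes a single empty dict, so the
        # category name falls back to None instead of raising IndexError.
        categories = venue.get("categories") or [{}]
        rows.append((
            borough, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name"),
        ))
    return rows

# Hand-written stand-in for an API reply (not real Foursquare data).
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Bar",
               "location": {"lat": 45.46, "lng": 9.19},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Uncategorised Spot",
               "location": {"lat": 45.47, "lng": 9.18},
               "categories": []}},
]}]}}

rows = flatten_items("Duomo", 45.4632, 9.18704, fake_payload)
empty = flatten_items("Duomo", 45.4632, 9.18704, {"meta": {"code": 400}})
print(rows)
print(empty)
```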

We can now get information about Milan's venues.

In [39]:
milan_venues = getNearbyVenues(names=coordinates_df['Borough'],
                                   latitudes=coordinates_df['Latitude'],
                                   longitudes=coordinates_df['Longitude']
                                  )

Centro storico
Stazione Centrale
Isola
Vle Monza
Greco
Padova
Città Studi
Lambrate
Porta Venezia
Guastalla
Forlanini
Vigentino
Chiaravalle
Gratosoglio
Barona
Lorenteggio
Baggio
De Angeli
S. Siro
Tre Torri
Gallaratese
Quarto Oggiaro
Porta Garibaldi
Niguarda
Duomo
Brera
Sarpi
Ticinese


In [41]:
print(milan_venues.shape)
milan_venues.head()

(1045, 7)


Unnamed: 0,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro storico,45.391712,9.185767,pontesestracc,45.390277,9.186062,Trail
1,Centro storico,45.391712,9.185767,Bar snack,45.390775,9.188436,Restaurant
2,Centro storico,45.391712,9.185767,Market,45.3914,9.1895,Restaurant
3,Centro storico,45.391712,9.185767,"Area di Servizio ""Rozzano Est""",45.391432,9.189544,Gas Station
4,Stazione Centrale,45.485976,9.204331,Ostello Bello Grande,45.484188,9.205571,Hostel


## Analyse neighbourhoods
Using one hot encoding to organise information about venue categories.
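
To make the idea concrete, here is a minimal sketch on a made-up six-row table (the boroughs `A` and `B` and their venues are invented for illustration). Each venue becomes a row of 0/1 indicators, and averaging those rows per borough gives the share of each category in that borough.

```python
import pandas as pd

# Invented toy data: four venues in borough A, two in borough B.
venues = pd.DataFrame({
    "Borough": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Pizza Place", "Hotel", "Hotel", "Hotel"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Borough", venues["Borough"])

# The mean of a 0/1 column is the fraction of venues in that category.
freq = onehot.groupby("Borough").mean()
print(freq)
```

Because every venue has exactly one category here, each row of `freq` sums to 1.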

In [42]:
# one hot encoding
milan_onehot = pd.get_dummies(milan_venues[['Venue Category']], prefix="", prefix_sep="")

# add borough column back to dataframe
milan_onehot['Borough'] = milan_venues['Borough'] 

# move borough column to the first column
fixed_columns = [milan_onehot.columns[-1]] + list(milan_onehot.columns[:-1])
milan_onehot = milan_onehot[fixed_columns]

milan_onehot.head()

Unnamed: 0,Borough,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,...,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Women's Store
0,Centro storico,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
1,Centro storico,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Centro storico,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Centro storico,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Stazione Centrale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [43]:
milan_grouped = milan_onehot.groupby('Borough').mean().reset_index()
milan_grouped.head()

Unnamed: 0,Borough,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,...,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Women's Store
0,Baggio,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barona,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,...,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0
2,Brera,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0
3,Centro storico,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Chiaravalle,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Printing the top 5 venue categories of each borough, and building a dataframe with the top 10 per borough.
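
One caveat when ranking categories: a borough with only a handful of venues has many categories at frequency 0, and the order among those ties says nothing about the borough. The sketch below is a hypothetical variant (the helper `top_categories` and the toy `freq` table are assumptions, not the notebook's code) that drops zero-frequency categories before ranking, so a sparse borough simply gets a shorter list.

```python
import pandas as pd

# Invented frequency table: borough B has no cafés at all.
freq = pd.DataFrame(
    {"Café": [0.5, 0.0], "Hotel": [0.2, 0.7], "Pizza Place": [0.3, 0.3]},
    index=["A", "B"],
)

def top_categories(row, n):
    """Return up to n category names, most frequent first, skipping zeros."""
    nonzero = row[row > 0]
    return nonzero.sort_values(ascending=False).index[:n].tolist()

top3 = {borough: top_categories(freq.loc[borough], 3) for borough in freq.index}
print(top3)
```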

In [44]:
num_top_venues = 5

for hood in milan_grouped['Borough']:
    print("----"+hood+"----")
    temp = milan_grouped[milan_grouped['Borough'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Baggio----
                venue  freq
0  Italian Restaurant  0.17
1         Pizza Place  0.17
2                Café  0.08
3        Soccer Field  0.08
4           Gastropub  0.08


----Barona----
            venue  freq
0    Soccer Field  0.18
1         Theater  0.09
2          Bakery  0.09
3  Tennis Stadium  0.09
4      Food Court  0.09


----Brera----
                venue  freq
0  Italian Restaurant  0.20
1      Ice Cream Shop  0.08
2               Hotel  0.05
3          Restaurant  0.04
4            Wine Bar  0.04


----Centro storico----
                 venue  freq
0           Restaurant  0.50
1          Gas Station  0.25
2                Trail  0.25
3  American Restaurant  0.00
4  Peruvian Restaurant  0.00


----Chiaravalle----
                venue  freq
0          Restaurant  0.17
1  Italian Restaurant  0.17
2                Café  0.17
3         Beer Garden  0.17
4   Convenience Store  0.17


----Città Studi----
            venue  freq
0     Pizza Place  0.12
1  Ice Cream 

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
milan_venues_sorted = pd.DataFrame(columns=columns)
milan_venues_sorted['Borough'] = milan_grouped['Borough']

for ind in np.arange(milan_grouped.shape[0]):
    milan_venues_sorted.iloc[ind, 1:] = return_most_common_venues(milan_grouped.iloc[ind, :], num_top_venues)

milan_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Baggio,Italian Restaurant,Pizza Place,Japanese Restaurant,Supermarket,Gastropub,Convenience Store,Café,Bar,Soccer Field,Park
1,Barona,Soccer Field,Japanese Restaurant,Café,Food Court,Tennis Stadium,Bakery,Theater,Athletics & Sports,Trattoria/Osteria,Arts & Crafts Store
2,Brera,Italian Restaurant,Ice Cream Shop,Hotel,Restaurant,Wine Bar,Plaza,Pizza Place,Bakery,Dessert Shop,Café
3,Centro storico,Restaurant,Trail,Gas Station,Women's Store,Fish & Chips Shop,Furniture / Home Store,Fried Chicken Joint,Fountain,Food Truck,Food Court
4,Chiaravalle,Beer Garden,General Entertainment,Italian Restaurant,Restaurant,Convenience Store,Café,Women's Store,Furniture / Home Store,Fried Chicken Joint,Fountain


# Clustering boroughs

In [54]:
kclusters = 10

milan_grouped_clustering = milan_grouped.drop(columns='Borough')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(milan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:40]

array([1, 9, 1, 3, 0, 9, 1, 1, 1, 7, 8, 1, 1, 1, 1, 5, 4, 9, 1, 1, 2, 6, 1,
       1, 1, 1, 9, 1])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each borough.

In [55]:
# add clustering labels
milan_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

milan_merged = coordinates_df

# join the sorted venues (with cluster labels) onto coordinates_df to add latitude/longitude for each borough
milan_merged = milan_merged.join(milan_venues_sorted.set_index('Borough'), on='Borough')

milan_merged.head()

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Centro storico,45.3917,9.18577,3,Restaurant,Trail,Gas Station,Women's Store,Fish & Chips Shop,Furniture / Home Store,Fried Chicken Joint,Fountain,Food Truck,Food Court
1,Stazione Centrale,45.486,9.20433,1,Hotel,Café,Italian Restaurant,Pizza Place,Restaurant,Ice Cream Shop,Sandwich Place,Bistro,Breakfast Spot,Plaza
2,Isola,45.4907,9.18961,1,Italian Restaurant,Café,Ice Cream Shop,Pizza Place,Lombard Restaurant,Cocktail Bar,Burger Joint,Bakery,Ramen Restaurant,Pub
3,Vle Monza,45.5116,9.22721,1,Italian Restaurant,Café,Pizza Place,Puglia Restaurant,Supermarket,Pharmacy,Cupcake Shop,Cosmetics Shop,Restaurant,Food & Drink Shop
4,Greco,45.5033,9.20728,1,Pizza Place,Italian Restaurant,Supermarket,Pet Store,Cocktail Bar,Chinese Restaurant,Other Repair Shop,Steakhouse,Plaza,Tunnel


# Clusters map

In [56]:
# create map
map_milan_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(milan_merged['Latitude'], milan_merged['Longitude'], milan_merged['Borough'], milan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_milan_clusters)
       
map_milan_clusters

## Examining clusters
By checking the composition of each cluster we can understand something about how the boroughs were grouped.
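
Before reading the per-cluster tables, it can help to count how many boroughs fall in each cluster. The sketch below is a small illustration that copies the label array from the `kmeans.labels_` output shown in the clustering section; the variable names `sizes` and `singletons` are my own.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output printed earlier in the notebook.
labels = np.array([1, 9, 1, 3, 0, 9, 1, 1, 1, 7, 8, 1, 1, 1, 1, 5, 4, 9, 1, 1,
                   2, 6, 1, 1, 1, 1, 9, 1])

# sizes[k] is the number of boroughs assigned to cluster k.
sizes = np.bincount(labels, minlength=10)

# Clusters that contain a single borough.
singletons = np.flatnonzero(sizes == 1)

print(sizes)
print(singletons)
```

With these labels, one cluster holds 16 of the 28 boroughs, one holds 4, and the remaining eight each hold a single borough, which suggests that 10 clusters may be more than this dataset supports.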

In [65]:
for clst in range(kclusters):
    
    display(milan_merged.loc[milan_merged['Cluster Labels'] == clst, milan_merged.columns[[1] + list(range(4, milan_merged.shape[1]))]])

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,45.4166,Beer Garden,General Entertainment,Italian Restaurant,Restaurant,Convenience Store,Café,Women's Store,Furniture / Home Store,Fried Chicken Joint,Fountain


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,45.486,Hotel,Café,Italian Restaurant,Pizza Place,Restaurant,Ice Cream Shop,Sandwich Place,Bistro,Breakfast Spot,Plaza
2,45.4907,Italian Restaurant,Café,Ice Cream Shop,Pizza Place,Lombard Restaurant,Cocktail Bar,Burger Joint,Bakery,Ramen Restaurant,Pub
3,45.5116,Italian Restaurant,Café,Pizza Place,Puglia Restaurant,Supermarket,Pharmacy,Cupcake Shop,Cosmetics Shop,Restaurant,Food & Drink Shop
4,45.5033,Pizza Place,Italian Restaurant,Supermarket,Pet Store,Cocktail Bar,Chinese Restaurant,Other Repair Shop,Steakhouse,Plaza,Tunnel
7,45.479,Plaza,Café,Italian Restaurant,Supermarket,Basketball Court,Buffet,Pizza Place,Social Club,Mobile Phone Shop,Steakhouse
8,45.4775,Italian Restaurant,Pizza Place,Cocktail Bar,Hotel,Clothing Store,Dessert Shop,Bakery,Steakhouse,Chinese Restaurant,Pub
9,45.4635,Pizza Place,Italian Restaurant,Sandwich Place,Clothing Store,Hotel,Ice Cream Shop,Japanese Restaurant,Furniture / Home Store,Sporting Goods Shop,Bistro
10,45.4681,Italian Restaurant,Nightclub,Art Gallery,Café,Steakhouse,Historic Site,Beer Bar,Women's Store,Food,Furniture / Home Store
16,45.4612,Italian Restaurant,Pizza Place,Japanese Restaurant,Supermarket,Gastropub,Convenience Store,Café,Bar,Soccer Field,Park
17,45.4739,Italian Restaurant,Plaza,Café,Pizza Place,Asian Restaurant,Hotel,Bar,Sushi Restaurant,Dessert Shop,Chinese Restaurant


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,45.5146,Supermarket,Sports Club,Park,Women's Store,Flea Market,Furniture / Home Store,Fried Chicken Joint,Fountain,Food Truck,Food Court


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,45.3917,Restaurant,Trail,Gas Station,Women's Store,Fish & Chips Shop,Furniture / Home Store,Fried Chicken Joint,Fountain,Food Truck,Food Court


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,45.5168,Pizza Place,Cafeteria,Café,Hotel,Sushi Restaurant,Women's Store,Flea Market,Fried Chicken Joint,Fountain,Food Truck


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,45.4514,Italian Restaurant,Café,Pizza Place,Hockey Arena,Park,Women's Store,Fish & Chips Shop,Fried Chicken Joint,Fountain,Food Truck


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,45.4759,Soccer Stadium,Bar,Snack Place,Diner,Sporting Goods Shop,Bistro,Metro Station,Bed & Breakfast,Burger Joint,Pizza Place


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,45.4973,Ice Cream Shop,Pizza Place,Fast Food Restaurant,Fried Chicken Joint,Asian Restaurant,Bookstore,Metro Station,Clothing Store,Park,Women's Store


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,45.4121,Italian Restaurant,Light Rail Station,Tram Station,Food Court,Restaurant,Park,Supermarket,Women's Store,Fish & Chips Shop,Fried Chicken Joint


Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,45.5021,Café,Italian Restaurant,Restaurant,Gay Bar,Food & Drink Shop,Chinese Restaurant,Supermarket,Park,Performing Arts Venue,Clothing Store
6,45.4786,Ice Cream Shop,Pizza Place,Pub,Restaurant,Café,Bar,Cupcake Shop,Supermarket,Pool,Radio Station
11,45.4516,Café,Wine Bar,Pizza Place,Burger Joint,Cocktail Bar,Restaurant,Bistro,Italian Restaurant,Bed & Breakfast,Food Truck
14,45.4325,Soccer Field,Japanese Restaurant,Café,Food Court,Tennis Stadium,Bakery,Theater,Athletics & Sports,Trattoria/Osteria,Arts & Crafts Store


# Comparison between cities
I'll now compare Milan with other cities. Interesting comparisons would be with other European economic capitals of similar size (Munich, Barcelona, Vienna...) or with Toronto itself.
Possible ideas:
- compare different cities to see whether they are distinguishable in terms of neighbourhood composition
- see where given kinds of restaurant or other activity are more popular

For instance, I could import the grouped table from the Toronto exercise, join it to the one from Milan and see whether, when clustered, the neighbourhoods/boroughs of the two cities get mixed together or stay separate.
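
The joined-clustering idea can be sketched on toy data. Everything here is an assumption for illustration: the two small frames stand in for the real grouped tables, the category columns and frequencies are invented, and the choice of two clusters is arbitrary. The key steps are stacking the tables, filling categories that exist in only one city with 0, and cross-tabulating city against cluster.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Invented stand-ins for the two grouped (frequency) tables.
milan = pd.DataFrame({"Borough": ["M1", "M2"],
                      "Café": [0.6, 0.5], "Pizza Place": [0.4, 0.5]})
toronto = pd.DataFrame({"Borough": ["T1", "T2"],
                        "Café": [0.1, 0.0], "Coffee Shop": [0.9, 1.0]})

milan["City"] = "Milan"
toronto["City"] = "Toronto"

# Stack the tables; a category missing from one city gets frequency 0.
combined = pd.concat([milan, toronto], ignore_index=True, sort=False).fillna(0.0)

features = combined.drop(columns=["Borough", "City"])
model = KMeans(n_clusters=2, n_init=10, random_state=0)
combined["Cluster"] = model.fit_predict(features)

# Rows: city, columns: cluster. Mixed rows would mean the cities overlap.
mix = pd.crosstab(combined["City"], combined["Cluster"])
print(mix)
```

If each city occupies its own column of `mix`, the clustering separates the cities; counts spread across columns would indicate boroughs that resemble those of the other city.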

In [67]:
trno_grouped = pd.read_excel('toronto_grouped_table.xls')