# New Hotel in Milan

----

## Instructions

Now that you have been equipped with the skills and tools to use location data to explore a geographical location, you will have two weeks to be as creative as you want. Come up with an idea that leverages the Foursquare location data to explore or compare neighborhoods or cities of your choice, or with a problem that the Foursquare location data can help solve. If you cannot think of an idea or a problem, here are some to get you started:
 + In Module 3, we explored New York City and the city of Toronto and segmented and clustered their neighborhoods. Both cities are very diverse and are the financial capitals of their respective countries. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city? I will leave it to you to refine this idea.
 + In a city of your choice, if someone is looking to open a restaurant, where would you recommend that they open it? Similarly, if a contractor is trying to start their own business, where would you recommend that they set up their office?

These are just a couple of the many ideas and problems that can be solved using location data in addition to other datasets. No matter what you decide to do, make sure to justify why the problem you want to solve is important and why a client or a group of people would be interested in your project.


For this week, you will be required to submit the following:

 + A description of the problem and a discussion of the background.
 + A description of the data and how it will be used to solve the problem.


For the second week, the final deliverables of the project will be:

+ A link to your Notebook on your GitHub repository, showing your code.
+ A full report consisting of all of the following components:
    + Introduction where you discuss the business problem and who would be interested in this project.
    + Data where you describe the data that will be used to solve the problem and the source of the data.
    + Methodology section, which represents the main component of the report, where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and which machine learning techniques were used and why.
    + Results section where you discuss the results.
    + Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
    + Conclusion section where you conclude the report.
+ Your choice of a presentation or blog post.

## Table of Contents

1.  <a href="#item1">Introduction / Business Problem</a>
2.  <a href="#item2">Data</a>


### Import needed dependencies

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import shapefile #library to handle shapefile

!pip install geopandas # comment out this line if geopandas is already installed
import geopandas as gpd  #library to read geofiles, used to convert polygon data to point

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')


## 1. Introduction / Business Problem

Since Expo 2015, Milan has been on the rise. The city is growing fast, and so is its appeal to tourists and business travelers. For this reason, an investor from abroad wants to open a brand-new hotel in town. He doesn't know the city, so he wants to understand which locations are the best ones to look at.

Prior to collecting any data, we have a conversation about what "best location" means to him. The investor is an art expert and wants to build a hotel that reflects his passion for art. For this reason, he believes his hotel should be **near a Metro Station** and have enough **Arts & Entertainment** venues nearby. From this analysis he expects me to give him a **top-three list of the best Milan Neighborhoods** for his hotel location, so he can proceed with contacting the relevant stakeholders.

## 2. Data

Now that the business problem is clearly defined, we know which data we need:

1. List of Neighborhoods in Milan
2. List of Metro Stations and Arts & Entertainment venues

For point 1, I will download the Neighborhoods list from the Milan Municipality website, while for point 2 I will use the Foursquare API.

### Get Milan Neighborhoods

On the Milan Municipality website, a CSV list of all neighborhoods is available for download:

In [2]:
!wget -q -O 'milan_neigh.csv' https://dati.comune.milano.it/dataset/e8e765fc-d882-40b8-95d8-16ff3d39eb7c/resource/3fce7202-0076-4a7b-ac2c-d2ab9b5dc658/download/ds964_nil_wm_.csv
print('Data downloaded!')

Data downloaded!


In [3]:
import pandas as pd

milan_neigh = pd.read_csv('milan_neigh.csv', sep=';')

print('dataframe shape is ',milan_neigh.shape)
milan_neigh.head()

dataframe shape is  (88, 8)


Unnamed: 0,ID_NIL,NIL,Valido_dal,Valido_al,Fonte,Shape_Length,Shape_Area,OBJECTID
0,48,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,05/02/2020,Vigente,Milano 2030 - PGT Approvato,8723.368714,2406306.0,89
1,64,TRENNO,05/02/2020,Vigente,Milano 2030 - PGT Approvato,3309.9988,489692.1,90
2,67,PORTELLO,05/02/2020,Vigente,Milano 2030 - PGT Approvato,3800.750663,909602.2,91
3,81,BOVISASCA,05/02/2020,Vigente,Milano 2030 - PGT Approvato,7105.469715,1578028.0,92
4,84,PARCO NORD,05/02/2020,Vigente,Milano 2030 - PGT Approvato,11741.717005,1532331.0,93
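
The `sep=';'` argument matters because the file is semicolon-delimited. As a small stdlib-only illustration (not the notebook's own code), the sketch below parses two rows reconstructed from the preview above; the exact layout of the real file is an assumption.

```python
import csv
import io

# Two rows reconstructed from the preview above; the real file may differ in detail.
sample = (
    "ID_NIL;NIL;Valido_dal;Valido_al;Fonte;Shape_Length;Shape_Area;OBJECTID\n"
    "64;TRENNO;05/02/2020;Vigente;Milano 2030 - PGT Approvato;3309.9988;489692.1;90\n"
    "67;PORTELLO;05/02/2020;Vigente;Milano 2030 - PGT Approvato;3800.750663;909602.2;91\n"
)

# Parse with the semicolon delimiter: one dict per row, keyed by column name.
records = list(csv.DictReader(io.StringIO(sample), delimiter=";"))
names = [r["NIL"] for r in records]

# With the default comma delimiter, each line collapses into a single field.
wrong = list(csv.reader(io.StringIO(sample)))
```

With the right delimiter each record exposes the eight named columns; with the default comma every line is read as one long string.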


The same website also provides a GeoJSON file and a .zip folder containing shapefiles. Since the GeoJSON file contains polygon coordinates, for this analysis I will download the .zip folder and use the geopandas library to easily convert the polygons into points.

In [4]:
sf = shapefile.Reader("NIL_WM.shp")
sf.shapeTypeName

'POLYGON'

In [5]:
# read the shapefile into a geopandas dataframe with the polygon information

polys = gpd.read_file("NIL_WM.shp")
polys.head()

Unnamed: 0,geometry
0,"POLYGON ((9.15422 45.43775, 9.15419 45.43707, ..."
1,"POLYGON ((9.10623 45.49016, 9.10295 45.48939, ..."
2,"POLYGON ((9.15636 45.48785, 9.15724 45.48721, ..."
3,"POLYGON ((9.16803 45.52234, 9.16687 45.52027, ..."
4,"POLYGON ((9.20040 45.52848, 9.20055 45.52828, ..."


In [6]:
#convert polygon to points

points = polys.copy()
points['geometry'] = points['geometry'].centroid
points.head()

Unnamed: 0,geometry
0,POINT (9.13726 45.43846)
1,POINT (9.10167 45.49282)
2,POINT (9.15395 45.48449)
3,POINT (9.15673 45.51743)
4,POINT (9.18424 45.52351)
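
For intuition about what a polygon centroid is, here is a minimal plain-Python sketch of the area-weighted centroid using the shoelace formula. It is not the geopandas implementation; `polygon_centroid` is a hypothetical helper, and the rectangle is made-up test data rather than a real Milan neighborhood.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    # Close the ring if the last vertex does not repeat the first one.
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Cx = sum / (6 * A) with A = area2 / 2, hence the factor 3 * area2.
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Made-up rectangle in (longitude, latitude) order, like the POLYGON output above.
rectangle = [(9.0, 45.0), (9.2, 45.0), (9.2, 45.4), (9.0, 45.4)]
lon, lat = polygon_centroid(rectangle)
```

The sketch treats longitude and latitude as planar coordinates, as the cell above does. For neighborhood-sized polygons this is usually a reasonable approximation, although recent geopandas versions may warn about centroids computed in a geographic coordinate system.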


Now that I have points, I need to create two columns, one for latitude and one for longitude. To do this, I can use the pandas string split method:

In [7]:
coordinates = points.copy()
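# strip the parentheses literally (regex=False avoids treating '(' as a regex pattern)
coordinates['geometry'] = coordinates['geometry'].astype(str).str.replace('(', '', regex=False).str.replace(')', '', regex=False)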

coordinates[['Drop','Longitude','Latitude']] = coordinates.geometry.str.split(" ",expand=True)
coordinates = pd.DataFrame(coordinates[['Latitude','Longitude']]).astype(float)

print('data type is ', coordinates.dtypes)
print('df shape is ', coordinates.shape)
coordinates.head()

data type is  Latitude     float64
Longitude    float64
dtype: object
df shape is  (88, 2)




Unnamed: 0,Latitude,Longitude
0,45.43846,9.13726
1,45.492822,9.101675
2,45.48449,9.153947
3,45.517433,9.156731
4,45.523514,9.184235
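
The string handling above relies on the text form `POINT (x y)`, where x is the longitude and y is the latitude. A stricter way to parse one such string is sketched below with the standard library; `parse_wkt_point` is a hypothetical helper, not part of pandas or geopandas.

```python
import re

# Matches text such as "POINT (9.13726 45.43846)": x (longitude) first, then y (latitude).
POINT_RE = re.compile(r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$")


def parse_wkt_point(text):
    """Return (latitude, longitude) from a 'POINT (x y)' string."""
    match = POINT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a point string: {text!r}")
    lon, lat = (float(g) for g in match.groups())
    return lat, lon


# First centroid from the output above.
lat, lon = parse_wkt_point("POINT (9.13726 45.43846)")
```

With geopandas, reading `points.geometry.x` and `points.geometry.y` should give the same longitude and latitude values without any string handling.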


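The last steps are to merge the CSV dataframe with the coordinates dataframe and to keep only the relevant columns. Note that `join` aligns rows on their index, so this step assumes both files list the neighborhoods in the same order: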

In [8]:
df_milan = milan_neigh.join(coordinates)
df_milan = df_milan[['ID_NIL','NIL','Latitude','Longitude']]

print('df_milan shape is ', df_milan.shape)
df_milan.head()

df_milan shape is  (88, 4)


Unnamed: 0,ID_NIL,NIL,Latitude,Longitude
0,48,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726
1,64,TRENNO,45.492822,9.101675
2,67,PORTELLO,45.48449,9.153947
3,81,BOVISASCA,45.517433,9.156731
4,84,PARCO NORD,45.523514,9.184235


Finally, let's plot the results on a map to see where these neighborhoods are located:

In [9]:
#get Milan coordinates using Nominatim
address = 'Milan, IT'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Milan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Milan are 45.4668, 9.1905.


In [10]:
# create map of Milan using latitude and longitude values
map_milan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_milan['Latitude'], df_milan['Longitude'], df_milan['NIL']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
map_milan

### Get Metro Station and Arts & Entertainment venues

Now that I have collected the name and location of each neighborhood, it's time to use the Foursquare API to get a list of Metro Station and Arts & Entertainment venues for every location.

In [93]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 300

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 

In [79]:
# function that extracts the category of the venue
#def get_category_type(row):
#    try:
#        categories_list = row['categories']
#    except:
#        categories_list = row['venue.categories']
#
#    if len(categories_list) == 0:
#        return None
#    else:
#        return categories_list[0]['name']

To quickly get a list of Metro Station and Arts & Entertainment venues, I have adapted the function created in the lab to include categoryId as a parameter. This reduces computation time and returns a pre-filtered, ready-to-use dataset:

In [90]:
def getNearbyVenues(names, latitudes, longitudes, categoryId, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            categoryId,
            radius, 
            LIMIT)  # use the LIMIT constant defined with the credentials
        
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
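
As a side sketch (not the lab's code), the two steps inside `getNearbyVenues` can be written so that they are testable without network access: building the query string with `urllib.parse.urlencode`, and flattening a response while tolerating an empty or unexpected payload. The helper names `build_explore_url` and `flatten_items` are hypothetical, and the payload is hand-made to mimic the response shape the function above indexes into; it is not real API output.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius, limit):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def flatten_items(name, lat, lng, payload):
    """Turn one explore response into flat rows; return [] if nothing is found."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows


# Hand-made payload in the shape used above (assumption, not real API output).
mock = {"response": {"groups": [{"items": [{"venue": {
    "name": "Metro Bonola (M1)",
    "location": {"lat": 45.496913, "lng": 9.11045},
    "categories": [{"name": "Metro Station"}],
}}]}]}}

# "ID" and "SECRET" are placeholders; 100 is an arbitrary illustrative limit.
url = build_explore_url("ID", "SECRET", "20180604", 45.492822, 9.101675,
                        "4bf58dd8d48988d1fd931735", 1000, 100)
rows = flatten_items("TRENNO", 45.492822, 9.101675, mock)
# Placeholder coordinates; an empty response yields no rows instead of an error.
empty = flatten_items("FIGINO", 45.0, 9.0, {"response": {}})
```

Separating URL construction from parsing keeps the credentials out of string formatting and makes the parsing step easy to check against a saved response.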

First, I request the Metro Station venues:

In [100]:
milan_metro = getNearbyVenues(names=df_milan['NIL'], latitudes=df_milan['Latitude'], longitudes=df_milan['Longitude'], categoryId='4bf58dd8d48988d1fd931735', radius=1000)
milan_metro.tail()

RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO
TRENNO
PORTELLO
BOVISASCA
PARCO NORD
FIGINO
LORETO - CASORETTO - NOLO
QUARTO OGGIARO - VIALBA - MUSOCCO
ISOLA
QUARTO CAGNINO
STADIO - IPPODROMI
QUINTO ROMANO
DUOMO
GUASTALLA
SAN SIRO
COMASINA
TIBALDI
GRECO - SEGNANO
DE ANGELI - MONTE ROSA
FARINI
BRUZZANO
QT 8
STEPHENSON
CANTALUPA
QUINTOSOLE
PARCO SEMPIONE
BARONA
VILLAPIZZONE - CAGNOLA - BOLDINASCO
PARCO BOSCO IN CITTA'
GORLA - PRECOTTO
NIGUARDA - CA' GRANDA - PRATO CENTENARO - Q.RE FULVIO TESTI
TRIULZO SUPERIORE
PTA ROMANA
TALIEDO - MORSENCHIO - Q.RE FORLANINI
PORTA TICINESE - CONCA DEL NAVIGLIO
TRE TORRI
ASSIANO
MORIVIONE
VIGENTINO - Q.RE FATIMA
BICOCCA
ORTOMERCATO
LODI - CORVETTO
MUGGIANO
PORTA TICINESE - CONCHETTA
UMBRIA - MOLISE - CALVAIRATE
ROSERIO
RONCHETTO DELLE RANE
Q.RE GALLARATESE - Q.RE SAN LEONARDO - LAMPUGNANO
MONLUE' - PONTE LAMBRO
PADOVA - TURRO - CRESCENZAGO
GRATOSOGLIO - Q.RE MISSAGLIA - Q.RE TERRAZZE
PORTA MAGENTA
FORZE ARMATE
GHISOLFA
CHIARAVALLE
PARCO DELLE ABBAZIE
MACI

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
233,ADRIANO,45.514093,9.248356,Metro Crescenzago (M2),45.505187,9.248058,Metro Station
234,LORENTEGGIO,45.451353,9.119323,Metro Bisceglie (M1),45.455345,9.113312,Metro Station
235,LORENTEGGIO,45.451353,9.119323,Bisceglie Bus Station,45.455163,9.113337,Bus Station
236,LORENTEGGIO,45.451353,9.119323,Metro Inganni (M1),45.457262,9.121497,Metro Station
237,LORENTEGGIO,45.451353,9.119323,Edicola Metro Bisceglie,45.455623,9.113303,Newsstand


In [103]:
milan_metro.shape

(238, 7)
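
The tail of `milan_metro` above includes rows whose category is not Metro Station (a Bus Station and a Newsstand), so the category filter in the request does not guarantee a single category in the result. If only true metro stations should count, one option is to filter on `Venue Category` afterwards. The sketch below uses plain Python on the five rows shown above; it is an illustration, not a step the notebook performs.

```python
from collections import Counter

# Rows copied from the tail of milan_metro shown above (only the columns needed here).
rows = [
    {"Neighborhood": "ADRIANO", "Venue": "Metro Crescenzago (M2)", "Venue Category": "Metro Station"},
    {"Neighborhood": "LORENTEGGIO", "Venue": "Metro Bisceglie (M1)", "Venue Category": "Metro Station"},
    {"Neighborhood": "LORENTEGGIO", "Venue": "Bisceglie Bus Station", "Venue Category": "Bus Station"},
    {"Neighborhood": "LORENTEGGIO", "Venue": "Metro Inganni (M1)", "Venue Category": "Metro Station"},
    {"Neighborhood": "LORENTEGGIO", "Venue": "Edicola Metro Bisceglie", "Venue Category": "Newsstand"},
]

# Keep only venues categorized as metro stations.
metro_only = [r for r in rows if r["Venue Category"] == "Metro Station"]

# Count metro stations per neighborhood.
stations_per_neighborhood = Counter(r["Neighborhood"] for r in metro_only)
```

On the dataframe itself, the equivalent filter would be `milan_metro[milan_metro['Venue Category'] == 'Metro Station']`.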

Then I make a second call for Arts & Entertainment venues:

In [102]:
milan_arts = getNearbyVenues(names=df_milan['NIL'], latitudes=df_milan['Latitude'], longitudes=df_milan['Longitude'], categoryId='4d4b7104d754a06370d81259', radius=1000)
milan_arts.head()

RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO
TRENNO
PORTELLO
BOVISASCA
PARCO NORD
FIGINO
LORETO - CASORETTO - NOLO
QUARTO OGGIARO - VIALBA - MUSOCCO
ISOLA
QUARTO CAGNINO
STADIO - IPPODROMI
QUINTO ROMANO
DUOMO
GUASTALLA
SAN SIRO
COMASINA
TIBALDI
GRECO - SEGNANO
DE ANGELI - MONTE ROSA
FARINI
BRUZZANO
QT 8
STEPHENSON
CANTALUPA
QUINTOSOLE
PARCO SEMPIONE
BARONA
VILLAPIZZONE - CAGNOLA - BOLDINASCO
PARCO BOSCO IN CITTA'
GORLA - PRECOTTO
NIGUARDA - CA' GRANDA - PRATO CENTENARO - Q.RE FULVIO TESTI
TRIULZO SUPERIORE
PTA ROMANA
TALIEDO - MORSENCHIO - Q.RE FORLANINI
PORTA TICINESE - CONCA DEL NAVIGLIO
TRE TORRI
ASSIANO
MORIVIONE
VIGENTINO - Q.RE FATIMA
BICOCCA
ORTOMERCATO
LODI - CORVETTO
MUGGIANO
PORTA TICINESE - CONCHETTA
UMBRIA - MOLISE - CALVAIRATE
ROSERIO
RONCHETTO DELLE RANE
Q.RE GALLARATESE - Q.RE SAN LEONARDO - LAMPUGNANO
MONLUE' - PONTE LAMBRO
PADOVA - TURRO - CRESCENZAGO
GRATOSOGLIO - Q.RE MISSAGLIA - Q.RE TERRAZZE
PORTA MAGENTA
FORZE ARMATE
GHISOLFA
CHIARAVALLE
PARCO DELLE ABBAZIE
MACI

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726,FE Fabbrica Esperienza,45.442009,9.140764,Theater
1,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726,Scuola di danza Arteka ASD,45.442993,9.139563,Dance Studio
2,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726,Circo delle Pulci,45.443317,9.137659,Public Art
3,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726,Connecting Cultures,45.434962,9.131393,Arts & Entertainment
4,RONCHETTO SUL NAVIGLIO - Q.RE LODOVICO IL MORO,45.43846,9.13726,Silverwhood,45.443533,9.129999,Music Venue


In [104]:
milan_arts.shape

(763, 7)
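
Both calls use `radius=1000`, that is, venues within roughly one kilometre of each neighborhood centroid. As a quick sanity check, the great-circle distance between a centroid and a returned venue can be computed with the haversine formula. The sketch below is stdlib-only; `haversine_m` is a hypothetical helper, and the coordinates are the first row of `milan_arts` above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Neighborhood centroid vs. venue "FE Fabbrica Esperienza" from the first row above.
d = haversine_m(45.43846, 9.13726, 45.442009, 9.140764)
within_radius = d <= 1000
```

Because the search circles of adjacent neighborhoods can overlap, the same venue may appear under more than one neighborhood.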

Finally, I combine both results into a single dataframe with all the venue categories I want to analyze:

In [105]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
milan_venues = pd.concat([milan_metro, milan_arts], ignore_index=True)
milan_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,TRENNO,45.492822,9.101675,Metro Bonola (M1),45.496913,9.11045,Metro Station
1,TRENNO,45.492822,9.101675,Metro San Leonardo (M1),45.501083,9.1014,Metro Station
2,PORTELLO,45.48449,9.153947,Metro Portello (M5),45.481807,9.150506,Metro Station
3,PORTELLO,45.48449,9.153947,Metro Tre Torri (M5),45.477957,9.156873,Metro Station
4,PORTELLO,45.48449,9.153947,Metro Domodossola FN (M5),45.48186,9.162487,Metro Station


In [106]:
milan_venues.shape

(1001, 7)

In [122]:
# sanity check: the combined dataframe should hold every row from both sources
if len(milan_venues) == len(milan_arts) + len(milan_metro):
    print('Database append done correctly')
else:
    print('Check and try again')

Database append done correctly


*Thank you*