# Capstone Project - The Battle of Neighborhoods

### Table of contents
___
1. *Problem Statement*
2. *Data description*
3. *Methodology*
4. *Results*
5. *Discussion and Conclusions*

### 1. Problem Statement 
___
I work for a global multinational based in Milan, Italy, but because of COVID I left my apartment. With the vaccination campaigns under way, we will probably return to the office, so I am looking for a new apartment in Milan. I would like to use this opportunity to practice what I have learned on Coursera, particularly the Foursquare API, to answer the questions that arise.
The key question this project addresses is: how can I find a convenient and enjoyable place that fits my interests?
To compare and evaluate the rental options, below is a list of "constraints" that reflect what I am looking for:
- The apartment must be a one- or two-room flat
- It should be near a metro station, within a 500 m radius
- The rent must not exceed €1,200 per month
- Nice to have: nearby venues such as gyms, food shops and restaurants

Finding an apartment in Milan is always hard, especially a one- or two-room flat. I therefore believe this work could be useful, first to help me find a solution and, more generally, for anyone moving to another large Italian city.

### 2. Data description 
___
To empirically investigate the research question identified in this study, the following data is required:
- List of Boroughs and Neighborhoods of Milan with their geodata (latitude and longitude)
- List of Subway metro stations in Milan with their address location
- List of apartments for rent in Milan including their price
- List of venues for each Milan neighborhood

To retrieve the list of boroughs and neighborhoods of Milan, the Wikipedia page (URL: https://en.wikipedia.org/wiki/Municipalities_of_Milan) will be scraped with the Python library BeautifulSoup. To get the list of metro stations in Milan, the CKAN Data API will be used to query the open data published by the Municipality of Milan. A detailed view of the dataset used in this study can be found at the following [link](https://dati.comune.milano.it/dataset/ds535_atm-fermate-linee-metropolitane). Apartments for rent, including their price, will be fetched through a real estate API (Idealista). Finally, venues will be collected via the Foursquare API.

#### 2.1 List of Boroughs and Neighborhoods of Milan with their geodata (latitude and longitude)

```python
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Municipalities_of_Milan"
page = requests.get(url)
page.raise_for_status()  # fail early if the page could not be downloaded
soup = BeautifulSoup(page.text, 'html.parser')

# The municipalities are listed in the "wikitable sortable" table
table = soup.find("table", {"class": "wikitable sortable"})
rows = table.find('tbody').find_all('tr')

table_contents = []
for row in rows:
    cols = [ele.text.strip() for ele in row.find_all('td')]
    if cols:  # rows without <td> cells (e.g. the header row) are skipped
        table_contents.append({
            'Borough': cols[0],
            'BoroughName': cols[1],
            'Neighborhood': cols[5],
        })

milan_neighborhood = pd.DataFrame(table_contents)
milan_neighborhood['Borough'] = milan_neighborhood['Borough'].astype(int)
milan_neighborhood.head()
```

|   | Borough | BoroughName | Neighborhood |
|---|---|---|---|
| 0 | 1 | Centro storico | Brera, Centro Storico, Conca del Naviglio, Gua... |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | Adriano, Crescenzago, Gorla, Greco, Loreto, Ma... |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | Casoretto, Cimiano, Città Studi, Dosso, Lambra... |
| 3 | 4 | Porta Vittoria, Forlanini | Acquabella, Calvairate, Castagnedo, Cavriano, ... |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | Basmetto, Cantalupa, Case Nuove, Chiaravalle, ... |


The next cell, removed by Watson Studio for sharing, loads a local CSV file containing the latitude and longitude of each borough into `geo_coordinates`:


|   | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | 1 | 45.465362 | 9.188748 |
| 1 | 2 | 45.492814 | 9.203981 |
| 2 | 3 | 45.481547 | 9.218666 |
| 3 | 4 | 45.440969 | 9.217621 |
| 4 | 5 | 45.445495 | 9.183412 |


```python
# Attach the borough coordinates to the neighborhood table
milan_neighborhood = pd.merge(milan_neighborhood, geo_coordinates, on='Borough')
milan_neighborhood.head()
```

|   | Borough | BoroughName | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 1 | Centro storico | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | Adriano, Crescenzago, Gorla, Greco, Loreto, Ma... | 45.492814 | 9.203981 |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | Casoretto, Cimiano, Città Studi, Dosso, Lambra... | 45.481547 | 9.218666 |
| 3 | 4 | Porta Vittoria, Forlanini | Acquabella, Calvairate, Castagnedo, Cavriano, ... | 45.440969 | 9.217621 |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | Basmetto, Cantalupa, Case Nuove, Chiaravalle, ... | 45.445495 | 9.183412 |
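By default `pd.merge` performs an inner join. A borough that is missing from the coordinates file is therefore dropped silently, and a borough repeated in that file duplicates rows. The sketch below makes both assumptions explicit with the `validate` and `indicator` parameters. It runs on toy excerpts of the two tables above, with shortened borough names.

```python
import pandas as pd

# Toy excerpts of the two tables above (borough names shortened)
neighborhoods = pd.DataFrame({
    'Borough': [1, 2, 3],
    'BoroughName': ['Centro storico', 'Stazione Centrale', 'Città Studi'],
})
coordinates = pd.DataFrame({
    'Borough': [1, 2, 3],
    'Latitude': [45.465362, 45.492814, 45.481547],
    'Longitude': [9.188748, 9.203981, 9.218666],
})

# validate='one_to_one' raises a MergeError if 'Borough' is duplicated on either side;
# indicator=True adds a '_merge' column telling whether each row found a match
merged = pd.merge(neighborhoods, coordinates, on='Borough', how='left',
                  validate='one_to_one', indicator=True)

missing = merged.loc[merged['_merge'] != 'both', 'Borough'].tolist()
if missing:
    raise ValueError(f'Boroughs without coordinates: {missing}')
merged = merged.drop(columns='_merge')
print(merged)
```

In the notebook, the same call would take `milan_neighborhood` and `geo_coordinates`.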


```python
# Create map of Milan using latitude and longitude values
# !conda install -c conda-forge folium=0.5.0 --yes
import folium
from branca.element import Figure

latitude = 45.46993590738357
longitude = 9.189059689797839
map_milan = folium.Map(location=[latitude, longitude],
                       zoom_start=12)

# Add one marker per borough, labelled with its neighborhoods and borough number
for lat, lng, borough, neighborhood in zip(milan_neighborhood['Latitude'], milan_neighborhood['Longitude'], milan_neighborhood['Borough'], milan_neighborhood['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        fill_color='#FFFFFF',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)

fig = Figure(width=650, height=350)
fig.add_child(map_milan)
```

#### 2.2 List of Subway metro stations in Milan with their address location

```python
import requests
import pandas as pd

# datastore_search returns at most 100 rows by default, so the limit is set explicitly
params = {'resource_id': '0f4d4d05-b379-45a4-9a10-412a34708484', 'limit': 1000}
req = requests.get('https://dati.comune.milano.it/api/3/action/datastore_search', params=params).json()

milan_subway = pd.DataFrame(req['result']['records'])

# Drop the '_id' and 'Location' columns, which are not needed for the analysis
milan_subway = milan_subway.drop(columns=['_id', 'Location'])

# Rename columns
milan_subway.rename(columns={'id_amat': 'id', 'nome': 'Name', 'linee': 'Lines', 'LONG_X_4326': 'Longitude', 'LAT_Y_4326': 'Latitude'}, inplace=True)

milan_subway.head()
```

|   | id | Name | Lines | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | 889 | TRE TORRI | 5 | 9.156675 | 45.47814 |
| 1 | 890 | ZARA | 35 | 9.192601 | 45.492664 |
| 2 | 891 | WAGNER | 1 | 9.155914 | 45.46795 |
| 3 | 892 | VIMODRONE | 2 | 9.285989 | 45.515783 |
| 4 | 893 | VILLA S.G. | 1 | 9.22613 | 45.517455 |
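CKAN's `datastore_search` action returns a limited page of rows per call (100 by default). Paging is controlled by the `limit` and `offset` parameters, and the `result` object reports the `total` number of rows. The minimal sketch below collects a whole resource page by page. The helper names `fetch_all_records` and `ckan_page` are hypothetical. The pagination logic takes a callable, so it can be exercised without network access.

```python
def fetch_all_records(fetch_page, page_size=100):
    """Collect every record of a CKAN datastore resource, one page at a time.

    fetch_page(offset, limit) must return the 'result' object of a
    datastore_search call, i.e. a dict with 'records' and 'total'.
    """
    records, offset = [], 0
    while True:
        result = fetch_page(offset, page_size)
        records.extend(result['records'])
        offset += page_size
        # stop once the reported total is reached (or a page comes back empty)
        if offset >= result['total'] or not result['records']:
            return records


def ckan_page(offset, limit):
    """Fetch one page of the Milan metro stations resource (network call)."""
    import requests
    response = requests.get(
        'https://dati.comune.milano.it/api/3/action/datastore_search',
        params={'resource_id': '0f4d4d05-b379-45a4-9a10-412a34708484',
                'limit': limit, 'offset': offset},
        timeout=30)
    response.raise_for_status()
    return response.json()['result']

# Usage (requires network access):
# milan_subway = pd.DataFrame(fetch_all_records(ckan_page))
```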


#### 2.3 List of apartments for rent in Milan including their price
To retrieve apartments for rent in Milan, the Idealista API is used. Since the API requires authorization (an API key and secret), the authentication cell is hidden. The search endpoint is then queried, returning a list of 100 apartments in Milan.
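The authentication code is hidden, but the API answers with an OAuth 2.0 bearer token with scope `read`. That suggests the standard client-credentials flow: the key and secret are sent as HTTP Basic credentials, and the returned token is used as a `Bearer` header. The minimal sketch below only builds these requests with the standard library. The endpoint `TOKEN_URL`, and the `SEARCH_URL` and `search_params` names in the usage comment, are assumptions to check against the Idealista documentation.

```python
import base64
from urllib.parse import urlencode

# Assumed token endpoint; verify against the Idealista API documentation
TOKEN_URL = 'https://api.idealista.com/oauth/token'


def build_token_request(api_key, secret):
    """Return (url, headers, body) for an OAuth 2.0 client-credentials token request."""
    credentials = base64.b64encode(f'{api_key}:{secret}'.encode()).decode()
    headers = {
        'Authorization': f'Basic {credentials}',
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    }
    body = urlencode({'grant_type': 'client_credentials', 'scope': 'read'})
    return TOKEN_URL, headers, body


def bearer_header(token_response):
    """Build the Authorization header for later calls from the token JSON."""
    return {'Authorization': f"Bearer {token_response['access_token']}"}

# Usage (network calls, real credentials required; SEARCH_URL and search_params are hypothetical):
# url, headers, body = build_token_request(API_KEY, SECRET)
# token = requests.post(url, headers=headers, data=body).json()
# res = requests.post(SEARCH_URL, headers=bearer_header(token), data=search_params)
```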

The cells that authenticate against the API and send the search request were removed by Watson Studio for sharing. Authentication returns an OAuth bearer access token (scope `read`), and the search request returns `<Response [200]>`. The response object `res` is parsed in the next cell.

```python
# Create a pandas DataFrame from the JSON response content
import json
import pandas as pd

json_data = json.loads(res.text)
# Flatten nested fields (e.g. parkingSpace) into dotted column names
apartments_for_rent = pd.json_normalize(json_data['elementList'])
```

|   | propertyCode | thumbnail | externalReference | numPhotos | floor | price | propertyType | operation | size | exterior | ... | has360 | hasStaging | topNewDevelopment | detailedType.typology | detailedType.subTypology | suggestedTexts.subtitle | suggestedTexts.title | parkingSpace.hasParkingSpace | parkingSpace.isParkingSpaceIncludedInPrice | parkingSpace.parkingSpacePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22804953 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | gvr1375 - Via San Maurilio | 19 | en | 1200.0 | studio | rent | 45.0 | False | ... | False | False | False | flat | studio | Zona Sant'Ambrogio-Università Cattolica, Milano | Monolocale in Via San Maurilio |  |  |  |
| 1 | 22550241 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 201 | 21 | 7 | 950.0 | studio | rent | 35.0 | False | ... | False | False | False | flat | studio | Vittorio Emanuele-Augusto, Milano | Monolocale in Via Larga, 10 |  |  |  |
| 2 | 22361401 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | MRV5 | 17 | 5 | 900.0 | studio | rent | 35.0 | False | ... | False | False | False | flat | studio | Duomo-Castello, Milano | Monolocale in Via Meravigli s.c.n |  |  |  |
| 3 | 21289384 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 8067RA64868 | 17 | 1 | 1200.0 | studio | rent | 37.0 | False | ... | False | False | False | flat | studio | Duomo-Castello, Milano | Monolocale in Via Meravigli s.c.n |  |  |  |
| 4 | 23192443 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | Mono Erculea | 12 | 5 | 790.0 | studio | rent | 41.0 | False | ... | False | False | False | flat | studio | Vetra-Missori, Milano | Monolocale in Piazza Erculea | True | False | 150.0 |
| 5 | 23214843 | https://img3.idealista.it/blur/WEB_LISTING/0/i... |  | 4 | 1 | 1000.0 | studio | rent | 32.0 | False | ... | False | False | False | flat | studio | Brera-Montenapoleone, Milano | Monolocale in Via dell'Orso, 12 |  |  |  |
| 6 | 19107458 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | CUSANI. | 22 | 1 | 950.0 | studio | rent | 35.0 | False | ... | False | False | False | flat | studio | Duomo-Castello, Milano | Monolocale in cusani, 10 |  |  |  |
| 7 | 19383423 | https://img3.idealista.it/blur/WEB_LISTING/0/i... |  | 15 | 2 | 1200.0 | studio | rent | 40.0 | False | ... | False | False | False | flat | studio | Vetra-Missori, Milano | Monolocale in Via Torino, 57 |  |  |  |
| 8 | 20789213 | https://img3.idealista.it/blur/WEB_LISTING/0/i... |  | 10 | 3 | 800.0 | studio | rent | 28.0 | False | ... | False | False | False | flat | studio | Brera-Montenapoleone, Milano | Monolocale in san carpoforo |  |  |  |
| 9 | 22169689 | https://img3.idealista.it/blur/WEB_LISTING/0/i... |  | 15 | 4 | 960.0 | studio | rent | 36.0 | False | ... | False | False | False | flat | studio | Zona Sant'Ambrogio-Università Cattolica, Milano | Monolocale in Corso Magenta s.c.n |  |  |  |
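`pd.json_normalize` flattens nested objects such as `parkingSpace` into dotted column names. Listings without a nested field get `NaN`, which is why most of the parking columns above are empty. The small illustration below uses two listings reduced to a few fields, with values taken from rows 4 and 5 of the table.

```python
import pandas as pd

# Two listings reduced to a few fields; values taken from rows 4 and 5 above
elements = [
    {'propertyCode': 23192443, 'price': 790.0,
     'parkingSpace': {'hasParkingSpace': True,
                      'isParkingSpaceIncludedInPrice': False,
                      'parkingSpacePrice': 150.0}},
    {'propertyCode': 23214843, 'price': 1000.0},  # no parkingSpace object
]

# Nested keys become 'parkingSpace.<field>' columns; missing ones are NaN
flat = pd.json_normalize(elements)
print(flat.columns.tolist())
print(flat)
```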


```python
# Since the Idealista API search licence is for academic use only, the obtained DataFrame is stored as a CSV file in the project folder.
from project_lib import Project

# The third argument is the Watson Studio project access token; keep it out of shared notebooks
project = Project(None, "4e4b6114-5f23-4afc-ab45-e5b3bef37191", "<PROJECT_ACCESS_TOKEN>")
project.save_data(file_name="Milan_Apartments.csv", data=apartments_for_rent.to_csv(index=False))
```

The next cell, also removed by Watson Studio for sharing, loads the local CSV file containing the Milan apartments for rent:


|   | propertyCode | thumbnail | numPhotos | floor | price | propertyType | operation | size | exterior | rooms | ... | has360 | hasStaging | topNewDevelopment | detailedType.typology | detailedType.subTypology | suggestedTexts.subtitle | suggestedTexts.title | externalReference | parkingSpace.hasParkingSpace | parkingSpace.isParkingSpaceIncludedInPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19768062 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 10 | 5 | 900.0 | studio | rent | 55.0 | False | 1 | ... | False | False | False | flat | studio | Vetra-Missori, Milano | Monolocale in Via DELL'UNIONE, 8 |  |  |  |
| 1 | 22804953 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 19 | en | 1200.0 | studio | rent | 45.0 | False | 1 | ... | False | False | False | flat | studio | Zona Sant'Ambrogio-Università Cattolica, Milano | Monolocale in Via San Maurilio | gvr1375 - Via San Maurilio |  |  |
| 2 | 20881977 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 20 | 2 | 1150.0 | studio | rent | 40.0 | False | 1 | ... | False | False | False | flat | studio | Vittorio Emanuele-Augusto, Milano | Monolocale in Corso Vittorio Emanuele II, 2 | 116 |  |  |
| 3 | 20674000 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 5 | 2 | 1000.0 | studio | rent | 25.0 | False | 1 | ... | False | False | False | flat | studio | Vittorio Emanuele-Augusto, Milano | Monolocale in Via Agnello | 21/002 Agnello |  |  |
| 4 | 21719294 | https://img3.idealista.it/blur/WEB_LISTING/0/i... | 10 | 2 | 1200.0 | studio | rent | 40.0 | False | 1 | ... | False | False | False | flat | studio | Vittorio Emanuele-Augusto, Milano | Monolocale in Corso Vittorio Emanuele II | 11572 |  |  |
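Two of the constraints from section 1 can be applied directly to the listings: a one- or two-room flat, and a rent of at most €1,200 per month. The minimal sketch below assumes the `price` and `rooms` columns shown above. The helper `filter_apartments` is hypothetical. The first three toy rows are copied from the table above, and the last two are invented to show exclusions.

```python
import pandas as pd

MAX_RENT = 1200         # € per month, from the constraints in section 1
ALLOWED_ROOMS = {1, 2}  # one- or two-room flats


def filter_apartments(df, max_rent=MAX_RENT, allowed_rooms=ALLOWED_ROOMS):
    """Keep listings within budget and room count, cheapest first."""
    mask = (df['price'] <= max_rent) & df['rooms'].isin(allowed_rooms)
    return df.loc[mask].sort_values('price').reset_index(drop=True)


# First three rows copied from the table above; codes 1 and 2 are hypothetical
sample = pd.DataFrame({
    'propertyCode': [19768062, 22804953, 20881977, 1, 2],
    'price': [900.0, 1200.0, 1150.0, 1350.0, 1000.0],
    'rooms': [1, 1, 1, 2, 3],
})
print(filter_apartments(sample))
```

In the notebook, the call would be `filter_apartments(apartments_for_rent)`.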


#### 2.4 List of venues for each Milan neighborhood
Let's retrieve Milan's venues using the Foursquare API. The `getNearbyVenues` function queries the `venues/explore` endpoint for venues within a 1 km radius of each borough's coordinates. Since Foursquare requires a CLIENT_ID and a CLIENT_SECRET, the cell that defines them is hidden.

```python
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    """Return the Foursquare venues within `radius` metres of each (latitude, longitude).

    CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are defined in the hidden cell.
    """
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # let requests URL-encode the query parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail early on HTTP errors
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # venues without a category get None instead of raising an IndexError
        for v in results:
            venue = v['venue']
            categories = venue.get('categories') or [{}]
            venues_list.append((
                name,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                categories[0].get('name')))

    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood',
                                                       'Neighborhood Latitude',
                                                       'Neighborhood Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])

    return nearby_venues
```

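The next cell, removed by Watson Studio for sharing, sets the Foursquare credentials and calls `getNearbyVenues` for each borough. The result is stored in `milan_venues`: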

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | Piazza del Duomo | 45.46419 | 9.189527 | Plaza |
| 1 | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | Galleria Vittorio Emanuele II | 45.465276 | 9.190043 | Monument / Landmark |
| 2 | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | Room Mate Giulia Hotel | 45.46525 | 9.189396 | Hotel |
| 3 | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | Starbucks Reserve Roastery | 45.46492 | 9.186153 | Coffee Shop |
| 4 | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | Park Hyatt Milan | 45.465532 | 9.188911 | Hotel |
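Each request returns at most `LIMIT` venues, a constant set in the hidden cell. A neighborhood whose venue count reaches that limit may therefore have been truncated. The minimal sketch below counts venues per neighborhood and flags such cases. The function name `venue_counts` and the toy data are hypothetical.

```python
import pandas as pd


def venue_counts(venues, limit):
    """Count venues per neighborhood and flag counts that reached the request limit."""
    counts = venues.groupby('Neighborhood').size().rename('Venues').to_frame()
    counts['PossiblyTruncated'] = counts['Venues'] >= limit
    return counts


# Hypothetical toy data, checked against a limit of 3 venues per request
toy = pd.DataFrame({'Neighborhood': ['A', 'A', 'A', 'B'],
                    'Venue': ['v1', 'v2', 'v3', 'v4']})
print(venue_counts(toy, limit=3))
# In the notebook: venue_counts(milan_venues, LIMIT)
```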


### 3. Methodology
___

To address the key question of this study, the venues are first encoded by category and grouped by neighborhood. The neighborhoods are then clustered by their venue profiles, with the aim of identifying the area that best fits the needs listed in section 1.

#### 3.1 Extract venue categories by creating dummy columns

```python
# One-hot encode each 'Venue Category': one column per category
milan_onehot = pd.get_dummies(milan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
milan_onehot['Neighborhood'] = milan_venues['Neighborhood']

# move neighborhood column to the first column
cols = [milan_onehot.columns[-1]] + list(milan_onehot.columns[:-1])
milan_onehot = milan_onehot[cols]

milan_onehot.head()
```

|   | Neighborhood | Accessories Store | African Restaurant | American Restaurant | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Asian Restaurant | Bakery | ... | Tea Room | Theater | Train Station | Tram Station | Trattoria/Osteria | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brera, Centro Storico, Conca del Naviglio, Gua... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Brera, Centro Storico, Conca del Naviglio, Gua... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Brera, Centro Storico, Conca del Naviglio, Gua... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Brera, Centro Storico, Conca del Naviglio, Gua... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Brera, Centro Storico, Conca del Naviglio, Gua... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


#### 3.2 Group rows by neighborhood, taking the mean frequency of occurrence of each category

```python
milan_grouped = milan_onehot.groupby('Neighborhood').mean().reset_index()
milan_grouped.head()
```

|   | Neighborhood | Accessories Store | African Restaurant | American Restaurant | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Asian Restaurant | Bakery | ... | Tea Room | Theater | Train Station | Tram Station | Trattoria/Osteria | Turkish Restaurant | Vegetarian / Vegan Restaurant | Vietnamese Restaurant | Wine Bar | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acquabella, Calvairate, Castagnedo, Cavriano, ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.011111 | 0.022222 | ... | 0.0 | 0.022222 | 0.0 | 0.011111 | 0.011111 | 0.0 | 0.011111 | 0.0 | 0.0 | 0.0 |
| 1 | Adriano, Crescenzago, Gorla, Greco, Loreto, Ma... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | ... | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 2 | Affori, Bicocca, Bovisa, Bovisasca, Bruzzano, ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.021053 | ... | 0.0 | 0.0 | 0.0 | 0.021053 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Arzaga, Barona, Boffalora, Cascina Bianca, Con... | 0.0 | 0.0 | 0.0 | 0.018519 | 0.0 | 0.0 | 0.0 | 0.0 | 0.018519 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Assiano, Baggio, Figino, Fopponino, Forze Arma... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
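Sections 3.1 and 3.2 turn each venue into a one-hot row and then average those rows per neighborhood. Each resulting value is therefore the share of that neighborhood's venues falling in a category, and each row sums to 1. The toy illustration below uses hypothetical data.

```python
import pandas as pd

# Hypothetical toy venues: two neighborhoods, three categories
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Café', 'Café', 'Pizza Place', 'Gym', 'Gym', 'Pizza Place'],
})

# One column per category; astype(int) turns the booleans of recent pandas into 0/1
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of each 0/1 column is the share of that category in the neighborhood
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

For neighborhood A, half of the four venues are cafés, so its `Café` value is 0.5.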


```python
# Return the num_top_venues most frequent categories of a row, in descending order;
# the first entry (the neighborhood name) is skipped
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

#### 3.3 Retrieve the top 10 venues for each neighborhood

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues: 1st, 2nd, 3rd, then 4th ... 10th
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe with one row per neighborhood
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = milan_grouped['Neighborhood']

# fill each row with the neighborhood's most common venue categories
for ind in np.arange(milan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(milan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acquabella, Calvairate, Castagnedo, Cavriano, ... | Pizza Place | Italian Restaurant | Café | Supermarket | Plaza | Hotel | Japanese Restaurant | Dessert Shop | Theater | Gym |
| 1 | Adriano, Crescenzago, Gorla, Greco, Loreto, Ma... | Café | Italian Restaurant | Hotel | Seafood Restaurant | Restaurant | Pizza Place | Ice Cream Shop | Bistro | Sushi Restaurant | Beer Bar |
| 2 | Affori, Bicocca, Bovisa, Bovisasca, Bruzzano, ... | Pizza Place | Café | Italian Restaurant | Diner | Supermarket | Plaza | Ice Cream Shop | Japanese Restaurant | Chinese Restaurant | Nightclub |
| 3 | Arzaga, Barona, Boffalora, Cascina Bianca, Con... | Pizza Place | Café | Italian Restaurant | Ice Cream Shop | Food Court | Supermarket | Bus Stop | Restaurant | Japanese Restaurant | Plaza |
| 4 | Assiano, Baggio, Figino, Fopponino, Forze Arma... | Pizza Place | Café | Italian Restaurant | Bus Station | Japanese Restaurant | Bar | Park | Plaza | Dance Studio | Department Store |
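The `try`/`except` trick above labels every position after the third with "th". That is correct up to 20, but it would produce labels such as "21th" if `num_top_venues` were ever raised above 20. The small helper below, `ordinal` (a hypothetical name), handles English ordinals in general.

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ... including 11th-13th and 21st, 22nd, 23rd."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top_venues = 10
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue'
                              for i in range(1, num_top_venues + 1)]
print(columns)
```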


#### 3.4 Cluster neighborhoods

```python
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# cluster on the category frequencies only
milan_grouped_clustering = milan_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init is set explicitly because its default changed in scikit-learn 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(milan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_
```

array([2, 0, 2, 2, 1, 0, 4, 3, 0], dtype=int32)
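The notebook sets `kclusters = 5` for nine boroughs without stating how that value was chosen. One common check is the silhouette score: fit k-means for several values of k and prefer the one with the higher score. The sketch below runs on hypothetical toy points; the helper name `silhouette_by_k` is also hypothetical. With the notebook's data, `X` would be `milan_grouped_clustering`, and k could range from 2 to 8, because the silhouette needs fewer clusters than samples. With only nine samples, the scores should be read with caution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Hypothetical toy data: two well-separated groups of three points
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [1.0, 1.0], [1.0, 1.1], [1.1, 1.0]])
scores = silhouette_by_k(X, range(2, 5))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```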

```python
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

milan_merged = milan_neighborhood

# join the cluster labels and top venues onto the borough table, which holds latitude/longitude
milan_merged = milan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

milan_merged.head()
```

|   | Borough | BoroughName | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Centro storico | Brera, Centro Storico, Conca del Naviglio, Gua... | 45.465362 | 9.188748 | 3 | Hotel | Plaza | Ice Cream Shop | Italian Restaurant | Boutique | Bakery | Pizza Place | Art Museum | Cocktail Bar | Gourmet Shop |
| 1 | 2 | Stazione Centrale, Gorla, Turro, Greco, Cresce... | Adriano, Crescenzago, Gorla, Greco, Loreto, Ma... | 45.492814 | 9.203981 | 0 | Café | Italian Restaurant | Hotel | Seafood Restaurant | Restaurant | Pizza Place | Ice Cream Shop | Bistro | Sushi Restaurant | Beer Bar |
| 2 | 3 | Città Studi, Lambrate, Porta Venezia | Casoretto, Cimiano, Città Studi, Dosso, Lambra... | 45.481547 | 9.218666 | 0 | Italian Restaurant | Café | Chinese Restaurant | Hotel | Pizza Place | Ice Cream Shop | Dessert Shop | Cocktail Bar | Restaurant | Plaza |
| 3 | 4 | Porta Vittoria, Forlanini | Acquabella, Calvairate, Castagnedo, Cavriano, ... | 45.440969 | 9.217621 | 2 | Pizza Place | Italian Restaurant | Café | Supermarket | Plaza | Hotel | Japanese Restaurant | Dessert Shop | Theater | Gym |
| 4 | 5 | Vigentino, Chiaravalle, Gratosoglio | Basmetto, Cantalupa, Case Nuove, Chiaravalle, ... | 45.445495 | 9.183412 | 0 | Italian Restaurant | Cocktail Bar | Ice Cream Shop | Pizza Place | Café | Bar | Japanese Restaurant | Lounge | Pub | Restaurant |


### 4. Results
___

#### 4.1 Visualize the clustered boroughs on the map

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
from branca.element import Figure

# create map
milan_result = folium.Map(location=[latitude, longitude],
                          zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle per borough, colored by its cluster label
for lat, lon, poi, cluster in zip(milan_merged['Latitude'], milan_merged['Longitude'], milan_merged['Neighborhood'], milan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.Circle(
        [lat, lon],
        radius=150,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        weight=1,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(milan_result)

fig = Figure(width=650, height=350)
fig.add_child(milan_result)
```

#### 4.2 Visualize the subway stations on the map

```python
# Draw a 500 m circle around each station, matching the distance constraint
for name, lat, lng in zip(milan_subway['Name'], milan_subway['Latitude'], milan_subway['Longitude']):
    label = 'Station: {}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=500,
        popup=label,
        weight=1,
        color='#000000',
        fill=True,
        fill_opacity=0.1,
        parse_html=False).add_to(milan_result)

fig = Figure(width=650, height=350)
fig.add_child(milan_result)
```

#### 4.3 Visualize the apartments on the map

```python
# Add one yellow marker per apartment, labelled with its rooms and monthly rent
for rooms, price, lat, lng in zip(apartments_for_rent['rooms'], apartments_for_rent['price'], apartments_for_rent['latitude'], apartments_for_rent['longitude']):
    label = 'Rooms: {}, Price: {} €/month'.format(rooms, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        weight=1,
        color='#999900',
        fill_color='#CCCC00',
        fill=True,
        parse_html=False).add_to(milan_result)

fig = Figure(width=650, height=350)
fig.add_child(milan_result)
```
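Section 4.2 checks the 500 m metro constraint visually, with circles. The same check can also be computed. The minimal sketch below uses the haversine (great-circle) distance on a spherical Earth with a mean radius of 6,371 km. It assumes `apartments_for_rent` has the lowercase `latitude`/`longitude` columns used in 4.3, and `milan_subway` the `Latitude`/`Longitude` columns from 2.2. The helper name `nearest_station_distance` is hypothetical. The two stations below are copied from the metro table, and the apartment coordinates are invented.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def nearest_station_distance(apartments, stations):
    """Distance in metres from each apartment to its nearest station (haversine)."""
    lat1 = np.radians(apartments['latitude'].to_numpy())[:, None]
    lon1 = np.radians(apartments['longitude'].to_numpy())[:, None]
    lat2 = np.radians(stations['Latitude'].to_numpy())[None, :]
    lon2 = np.radians(stations['Longitude'].to_numpy())[None, :]
    # haversine formula, broadcast to an (apartments x stations) matrix
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    distances = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    return distances.min(axis=1)


# Two stations copied from the metro table; apartment coordinates are hypothetical
stations = pd.DataFrame({'Name': ['ZARA', 'WAGNER'],
                         'Latitude': [45.492664, 45.46795],
                         'Longitude': [9.192601, 9.155914]})
apartments = pd.DataFrame({'latitude': [45.4950, 45.4650],
                           'longitude': [9.1926, 9.1890]})
apartments['nearest_station_m'] = nearest_station_distance(apartments, stations)
apartments['near_metro'] = apartments['nearest_station_m'] <= 500
print(apartments)
```

In the notebook, the call would be `nearest_station_distance(apartments_for_rent, milan_subway)`.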

### 5. Discussion and Conclusions
___

In this study, a real problem was identified and addressed in order to practice the knowledge acquired during this course. More specifically, the goal was to rent an apartment in Milan that meets the constraints identified at the start of the project. Unfortunately, the Idealista API proved to have some limitations: the academic user credentials did not allow the data to be retrieved properly. Nevertheless, the results show that the best solution, in terms of apartments, lies within Zone 1 (highlighted with a green point). Indeed, many listings there are close to metro stations as well as to venues such as hotels, Italian restaurants, boutiques and so on. The study also highlighted that Zones 2, 3 and 5 belong to the same cluster, which means that they have similar mixes of venue categories.
Finally, I would like to remind the reader that this work is only a starting point for a more complete study. As future work, it would be useful to request full access to the Idealista API or to another real estate API. It would also be useful to use a map showing the borough boundaries instead of only their centroids.