# Capstone Project - The Battle of Neighborhoods

### Table of contents
___
1. *Problem Statement*
2. *Data description*
3. *Methodology*
4. *Results*
5. *Discussion and Conclusions*

### 1. Problem Statement 
___
I work for a multinational company based in Milan, Italy, but I gave up my apartment because of COVID. With the vaccination campaigns under way, we will probably return to the office. I am therefore looking for a new apartment in Milan, and I would like to use this opportunity to practice what I learned on Coursera, particularly with Foursquare, to answer the questions that arise.
The key question I address in this project is: how can I find a convenient and enjoyable place that fits my interests?
To compare and evaluate the rental options, I use the following constraints, based on what I am looking for:
- The apartment must be a one- or two-room flat
- The location must be within a 1 km radius of a metro station
- The rent must not exceed €1,200 per month
- Nice-to-have nearby venues include gyms, food shops and restaurants

Finding an apartment in Milan is always a hard job, especially for one- and two-room flats. I therefore believe this work can be useful, first to help me find a solution and, more generally, to anyone moving to another large city in Italy.

### 2. Data description 
___
To empirically investigate the research question identified in this study, the following data is required:
- List of Boroughs and Neighborhoods of Milan with their geodata (latitude and longitude)
- List of Subway metro stations in Milan with their address location
- List of apartments for rent in Milan including their price
- List of venues for each Milan neighborhood

To retrieve the list of boroughs and neighborhoods of Milan, the Wikipedia page (URL: https://en.wikipedia.org/wiki/Municipalities_of_Milan) is scraped with the Python library BeautifulSoup. To get the list of metro stations in Milan, the CKAN Data API is used to query the open data published by the Municipality of Milan. A detailed view of the dataset used in this study can be found at the following [link](https://dati.comune.milano.it/dataset/ds535_atm-fermate-linee-metropolitane). To fetch apartments for rent, including their price, a real estate API is used. Finally, venues are collected via the Foursquare API.
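The CKAN query mentioned above can be sketched as a small paging loop. This is only an illustration under stated assumptions: `fetch_all_records` and `http_fetch` are hypothetical helper names, not part of the notebook, and the loop relies on the `limit`, `offset` and `total` fields of CKAN's `datastore_search` action. The resource ID is the one used in section 2.2.

```python
from typing import Callable, Dict, List, Optional

CKAN_URL = "https://dati.comune.milano.it/api/3/action/datastore_search"
RESOURCE_ID = "0f4d4d05-b379-45a4-9a10-412a34708484"  # same ID as in section 2.2


def fetch_all_records(fetch_page: Callable[[Dict], Dict],
                      resource_id: str,
                      page_size: int = 100) -> List[Dict]:
    """Collect every record of a CKAN datastore resource, page by page.

    `fetch_page` (hypothetical) takes the query parameters and returns the
    decoded JSON body of one `datastore_search` call.
    """
    records: List[Dict] = []
    offset = 0
    while True:
        params = {"resource_id": resource_id, "limit": page_size, "offset": offset}
        result = fetch_page(params)["result"]
        batch = result["records"]
        records.extend(batch)
        offset += len(batch)
        total: Optional[int] = result.get("total")
        # Stop on an empty page, or once the reported total has been reached.
        if not batch or (total is not None and offset >= total):
            break
    return records


def http_fetch(params: Dict) -> Dict:
    """One GET request against the Milan open-data portal."""
    import requests  # imported lazily so the paging logic has no hard dependency
    response = requests.get(CKAN_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Usage (needs network access):
# stations = fetch_all_records(http_fetch, RESOURCE_ID)
```

Passing the fetcher in as an argument keeps the paging logic independent of the network, so it can be checked with a fake fetcher before hitting the real portal.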

#### 2.1 List of Boroughs and Neighborhoods of Milan with their geodata (latitude and longitude)

In [16]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

url= "https://en.wikipedia.org/wiki/Municipalities_of_Milan"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

table_contents = []
table = soup.find("table",{"class":"wikitable sortable"})
table_body = table.find('tbody')

rows = table_body.find_all('tr')

for row in rows:
    cell = {}
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    if cols:
        cell['Borough'] = cols[0]
        cell['BoroughName'] = cols[1]
        cell['Neighborhood'] = cols[5]
        table_contents.append(cell)

milan_neighborhood=pd.DataFrame(table_contents)
milan_neighborhood['Borough'] = milan_neighborhood['Borough'].astype(int)
milan_neighborhood.head()

Unnamed: 0,Borough,BoroughName,Neighborhood
0,1,Centro storico,"Brera, Centro Storico, Conca del Naviglio, Gua..."
1,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...","Adriano, Crescenzago, Gorla, Greco, Loreto, Ma..."
2,3,"Città Studi, Lambrate, Porta Venezia","Casoretto, Cimiano, Città Studi, Dosso, Lambra..."
3,4,"Porta Vittoria, Forlanini","Acquabella, Calvairate, Castagnedo, Cavriano, ..."
4,5,"Vigentino, Chiaravalle, Gratosoglio","Basmetto, Cantalupa, Case Nuove, Chiaravalle, ..."
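Each `Neighborhood` cell above holds a comma-separated string. If one row per neighborhood were needed later, a possible approach is to split and explode that column. The sketch below uses a shortened toy copy of two rows; the variable names `boroughs` and `neighborhoods_long` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for two rows of `milan_neighborhood` (neighborhood lists shortened).
boroughs = pd.DataFrame({
    "Borough": [1, 3],
    "BoroughName": ["Centro storico", "Città Studi, Lambrate, Porta Venezia"],
    "Neighborhood": ["Brera, Centro Storico", "Casoretto, Cimiano, Città Studi"],
})

# Split the string into a list, emit one row per item, then trim whitespace.
neighborhoods_long = (
    boroughs
    .assign(Neighborhood=boroughs["Neighborhood"].str.split(","))
    .explode("Neighborhood")
    .assign(Neighborhood=lambda df: df["Neighborhood"].str.strip())
    .reset_index(drop=True)
)
print(neighborhoods_long)
```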


In [17]:
# The code was removed by Watson Studio for sharing.

Load local csv file containing latitude and longitude for each borough


Unnamed: 0,Borough,Latitude,Longitude
0,1,45.465362,9.188748
1,2,45.492814,9.203981
2,3,45.481547,9.218666
3,4,45.440969,9.217621
4,5,45.445495,9.183412


In [18]:
milan_neighborhood = pd.merge(milan_neighborhood, geo_coordinates, on='Borough')
milan_neighborhood.head()

Unnamed: 0,Borough,BoroughName,Neighborhood,Latitude,Longitude
0,1,Centro storico,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748
1,2,"Stazione Centrale, Gorla, Turro, Greco, Cresce...","Adriano, Crescenzago, Gorla, Greco, Loreto, Ma...",45.492814,9.203981
2,3,"Città Studi, Lambrate, Porta Venezia","Casoretto, Cimiano, Città Studi, Dosso, Lambra...",45.481547,9.218666
3,4,"Porta Vittoria, Forlanini","Acquabella, Calvairate, Castagnedo, Cavriano, ...",45.440969,9.217621
4,5,"Vigentino, Chiaravalle, Gratosoglio","Basmetto, Cantalupa, Case Nuove, Chiaravalle, ...",45.445495,9.183412
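`pd.merge` defaults to an inner join, so a borough with no row in the coordinates file would silently disappear from the result. A hedged sketch of how that could be detected, using toy frames (`left` and `right` are illustrative stand-ins for `milan_neighborhood` and `geo_coordinates`):

```python
import pandas as pd

left = pd.DataFrame({"Borough": [1, 2, 3], "BoroughName": ["A", "B", "C"]})
right = pd.DataFrame({
    "Borough": [1, 2],
    "Latitude": [45.465362, 45.492814],
    "Longitude": [9.188748, 9.203981],
})

# A left join keeps every borough; `validate` raises if a key is duplicated
# on either side, and `indicator` adds a `_merge` column showing the row origin.
merged = pd.merge(left, right, on="Borough", how="left",
                  validate="one_to_one", indicator=True)

# Boroughs that found no coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "Borough"].tolist()
print(missing)
```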


In [21]:
# Create map of Milan using latitude and longitude values
# !conda install -c conda-forge folium=0.5.0 --yes
import folium
from branca.element import Figure

latitude = 45.46993590738357
longitude = 9.189059689797839
map_milan = folium.Map(location=[latitude, longitude],
                       zoom_start=12,
                       width='55%',
                       height='55%')

# Add markers to map
for lat, lng, borough, neighborhood in zip(milan_neighborhood['Latitude'], milan_neighborhood['Longitude'], milan_neighborhood['Borough'], milan_neighborhood['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        fill_color='#FFFFFF',
        fill_opacity=0.7,
        ).add_to(map_milan)

map_milan

#### 2.2 List of Subway metro stations in Milan with their address location

In [15]:
import requests
import pandas as pd

req = requests.get('https://dati.comune.milano.it/api/3/action/datastore_search?resource_id=0f4d4d05-b379-45a4-9a10-412a34708484').json()

milan_subway = pd.DataFrame(req['result']['records'])

# Drop the CKAN row id and the Location column
milan_subway = milan_subway.drop(columns=['_id', 'Location'])

#Rename columns
milan_subway.rename(columns={'id_amat':'id', 'nome':'Name', 'linee':'Lines', 'LONG_X_4326':'Longitude', 'LAT_Y_4326':'Latitude'}, inplace=True)

milan_subway

Unnamed: 0,id,Name,Lines,Longitude,Latitude
0,889,TRE TORRI,5,9.156675,45.478140
1,890,ZARA,35,9.192601,45.492664
2,891,WAGNER,1,9.155914,45.467950
3,892,VIMODRONE,2,9.285989,45.515783
4,893,VILLA S.G.,1,9.226130,45.517455
...,...,...,...,...,...
95,985,CADORNA FN M1,1,9.176426,45.468195
96,986,BUSSERO,2,9.375897,45.525302
97,987,BUONARROTI,1,9.155292,45.470402
98,988,BRENTA,3,9.218481,45.442811
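With station coordinates available, the "within 1 km of a metro station" constraint can be checked with a great-circle distance. The following is a minimal sketch using the haversine formula; `haversine_km` and `nearest_station` are hypothetical helpers, and the three stations and the Borough 2 coordinates are copied from the tables in this notebook.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_station(lat, lon, stations):
    """Return (name, distance_km) of the station closest to the given point."""
    distances = haversine_km(lat, lon,
                             stations["Latitude"].to_numpy(),
                             stations["Longitude"].to_numpy())
    idx = int(np.argmin(distances))
    return stations["Name"].iloc[idx], float(distances[idx])


# Three rows taken from the `milan_subway` output above.
stations = pd.DataFrame({
    "Name": ["TRE TORRI", "ZARA", "WAGNER"],
    "Latitude": [45.478140, 45.492664, 45.467950],
    "Longitude": [9.156675, 9.192601, 9.155914],
})

# Borough 2 coordinates from section 2.1.
name, dist_km = nearest_station(45.492814, 9.203981, stations)
print(name, round(dist_km, 2), dist_km <= 1.0)
```

The same function could be applied to each apartment row, assuming the listing data exposes latitude and longitude columns (they are not visible in the truncated output shown in section 2.3).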


#### 2.3 List of apartments for rent in Milan including their price
To retrieve apartments for rent in Milan, the Idealista API is used. Since the API requires authorization (i.e., an API key and secret), the following cells are hidden. The search endpoint is then queried, and a list of 100 apartments in Milan is retrieved.

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
# Creating Pandas DataFrame from Json response content
import json
import pandas as pd
json_data = json.loads(res.text)
apartments_for_rent = pd.json_normalize(json_data['elementList'])
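`pd.json_normalize` flattens nested objects into dotted column names such as `detailedType.typology`. The sketch below shows the behaviour on a toy payload; the nesting is inferred from the column names of the resulting table, and every value here is invented for illustration.

```python
import pandas as pd

# Toy payload shaped like an assumed `elementList` response (all values invented).
payload = {
    "elementList": [
        {
            "propertyCode": "demo-1",
            "price": 900.0,
            "rooms": 1,
            "detailedType": {"typology": "flat", "subTypology": "studio"},
        },
        {
            "propertyCode": "demo-2",
            "price": 1200.0,
            "rooms": 2,
            "detailedType": {"typology": "flat"},
            "parkingSpace": {"hasParkingSpace": True},
        },
    ]
}

flat = pd.json_normalize(payload["elementList"])

# Nested keys become "parent.child" columns; keys absent from a record become NaN.
print(sorted(flat.columns))
print(flat[["propertyCode", "detailedType.typology", "parkingSpace.hasParkingSpace"]])
```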

In [None]:
# The Idealista search API licence is for academic use only, so the resulting DataFrame is stored as a CSV file in the project folder.
from project_lib import Project
project = Project(None,"4e4b6114-5f23-4afc-ab45-e5b3bef37191","p-3cac1c4256e68635a057c1a3d962ddecb0743438")
project.save_data(file_name = "Milan_Apartments.csv",data = apartments_for_rent.to_csv(index=False))

In [22]:
# The code was removed by Watson Studio for sharing.

Load local csv file containing the Milan apartments for rent.


Unnamed: 0,propertyCode,thumbnail,numPhotos,floor,price,propertyType,operation,size,exterior,rooms,...,has360,hasStaging,topNewDevelopment,detailedType.typology,detailedType.subTypology,suggestedTexts.subtitle,suggestedTexts.title,externalReference,parkingSpace.hasParkingSpace,parkingSpace.isParkingSpaceIncludedInPrice
0,19768062,https://img3.idealista.it/blur/WEB_LISTING/0/i...,10,5,900.0,studio,rent,55.0,False,1,...,False,False,False,flat,studio,"Vetra-Missori, Milano","Monolocale in Via DELL'UNIONE, 8",,,
1,22804953,https://img3.idealista.it/blur/WEB_LISTING/0/i...,19,en,1200.0,studio,rent,45.0,False,1,...,False,False,False,flat,studio,"Zona Sant'Ambrogio-Università Cattolica, Milano",Monolocale in Via San Maurilio,gvr1375 - Via San Maurilio,,
2,20881977,https://img3.idealista.it/blur/WEB_LISTING/0/i...,20,2,1150.0,studio,rent,40.0,False,1,...,False,False,False,flat,studio,"Vittorio Emanuele-Augusto, Milano","Monolocale in Corso Vittorio Emanuele II, 2",116,,
3,20674000,https://img3.idealista.it/blur/WEB_LISTING/0/i...,5,2,1000.0,studio,rent,25.0,False,1,...,False,False,False,flat,studio,"Vittorio Emanuele-Augusto, Milano",Monolocale in Via Agnello,21/002 Agnello,,
4,21719294,https://img3.idealista.it/blur/WEB_LISTING/0/i...,10,2,1200.0,studio,rent,40.0,False,1,...,False,False,False,flat,studio,"Vittorio Emanuele-Augusto, Milano",Monolocale in Corso Vittorio Emanuele II,11572,,
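The rent and room constraints from the problem statement map directly onto the `price` and `rooms` columns shown above. A minimal sketch of that filter follows; `shortlist` is a hypothetical helper, and the toy `listings` frame only mimics the relevant columns.

```python
import pandas as pd

MAX_RENT_EUR = 1200  # "rent must not exceed €1,200 per month"
MAX_ROOMS = 2        # "one- or two-room flat"


def shortlist(apartments, max_rent=MAX_RENT_EUR, max_rooms=MAX_ROOMS):
    """Keep listings within budget and room count, cheapest first."""
    mask = (apartments["price"] <= max_rent) & (apartments["rooms"] <= max_rooms)
    return apartments.loc[mask].sort_values("price").reset_index(drop=True)


# Toy listings (invented) with the same column names as the Idealista frame.
listings = pd.DataFrame({
    "propertyCode": ["a", "b", "c", "d"],
    "price": [900.0, 1200.0, 1350.0, 1000.0],
    "rooms": [1, 2, 1, 3],
})

selected = shortlist(listings)
print(selected)
```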


#### 2.4 List of venues for each Milan neighborhood
Let's retrieve Milan's venues using the Foursquare explore endpoint. The 'getNearbyVenues' function retrieves a list of venues within a 1 km radius of each borough's coordinates. Since Foursquare requires the CLIENT_ID as well as the CLIENT_SECRET, the cell that sets them is hidden.

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
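The list comprehension in the function above indexes `categories[0]` directly, which would raise an `IndexError` if a venue were returned with an empty category list. A more defensive version of the row extraction might look like the sketch below; `venue_rows` is a hypothetical helper, and the second item is invented to show the empty-category case.

```python
from typing import Dict, List, Tuple


def venue_rows(items: List[Dict], name: str, lat: float, lng: float) -> List[Tuple]:
    """Turn Foursquare-style `items` into flat tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None instead of failing when no category is present.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


items = [
    {"venue": {"name": "Piazza del Duomo",
               "location": {"lat": 45.46419, "lng": 9.189527},
               "categories": [{"name": "Plaza"}]}},
    # Invented item without categories, to exercise the fallback.
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 45.465, "lng": 9.189},
               "categories": []}},
]

rows = venue_rows(items, "Borough 1", 45.465362, 9.188748)
print(rows)
```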

In [24]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748,Piazza del Duomo,45.46419,9.189527,Plaza
1,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748,Galleria Vittorio Emanuele II,45.465276,9.190043,Monument / Landmark
2,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748,Room Mate Giulia Hotel,45.46525,9.189396,Hotel
3,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748,Teatro alla Scala,45.467027,9.189686,Opera House
4,"Brera, Centro Storico, Conca del Naviglio, Gua...",45.465362,9.188748,Park Hyatt Milan,45.465532,9.188911,Hotel
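To relate this table to the "nice to have" venues from the problem statement, one option is to count matching venue categories per neighborhood. The sketch below is only an assumption about how that could be done: the keyword list is a guess at relevant Foursquare category names, and the toy `venues` frame is invented.

```python
import pandas as pd

# Assumed keywords for gyms, food shops and restaurants (not from the notebook).
KEYWORDS = ["gym", "restaurant", "supermarket", "grocery", "food"]

# Toy stand-in for the venues DataFrame shown above.
venues = pd.DataFrame({
    "Neighborhood": ["Borough 1", "Borough 1", "Borough 1", "Borough 2", "Borough 2"],
    "Venue Category": ["Plaza", "Italian Restaurant", "Gym", "Hotel", "Supermarket"],
})

# Case-insensitive match of any keyword inside the category name.
pattern = "|".join(KEYWORDS)
is_wanted = venues["Venue Category"].str.contains(pattern, case=False, regex=True)

# Number of matching venues per neighborhood.
counts = venues[is_wanted].groupby("Neighborhood").size()
print(counts)
```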


### 3. Methodology
___


### 4. Results
___

### 5. Discussion and Conclusions
___