# Capstone Project - The Battle of Neighborhoods

## Introduction

#### "How to select the best home location in Milan?"  
My customer asked me to find a suitable place to buy a home in Milan. He wants to choose the best place to live based on the nearby venues. In particular, he is looking for a location that offers leisure options, such as cinemas, theaters, shops and restaurants, and public transport, such as metro, tram and bus stops. He has a clear idea of his priorities and asked me to analyze the neighborhood of each of the five candidate home locations.

#### Data section
The data available for fulfilling the task are:
 - Milan city centre coordinates and a radius around the centre that defines the search area
 - the list of the venue categories that are important for the neighborhood analysis
 - the weight that every venue category has in the decision

#### Methodology
This section describes the steps taken to find the best home location. First, the venues around Milan city center are retrieved from Foursquare. Then a k-means clustering algorithm groups the venues by position and locates the centroid of each cluster. Finally, each cluster is scored by the weighted sum of its venues, and the centroid of the highest-scoring cluster is taken as the best position.
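
The rule used to pick the best cluster can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the notebook: $C_c$ is the set of venues assigned to cluster $c$, $t(v)$ is the category of venue $v$, and $w_{t(v)}$ is the weight the customer gave to that category.

$$
S_c = \sum_{v \in C_c} w_{t(v)}, \qquad c^{*} = \arg\max_{c} S_c
$$

The suggested home location is then the centroid of cluster $c^{*}$.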

In [27]:
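# Import the necessary libraries
import json
import pandas as pd
# Flattens nested JSON into a pandas DataFrame (pandas >= 1.0 also exposes it as pd.json_normalize)
from pandas.io.json import json_normalize

import numpy as np
import sklearn.utils
from sklearn.preprocessing import StandardScaler
# Note: this returns a seeded RandomState object but does not seed KMeans; the result is unused
sklearn.utils.check_random_state(1000)
from sklearn.cluster import KMeans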

!conda install -c conda-forge folium=0.5 --yes
import folium

# Install FourSquare client library
!pip install foursquare
import foursquare

Solving environment: done

# All requested packages already installed.

distributed 1.21.8 requires msgpack, which is not installed.


In [28]:
# Input data
radius = 1000
milan_coordinates = [45.464919, 9.186645]

# The weights reflect the customer's preferences: the higher the weight, the more important the category
weigth = {
    'Food': 1,
    'Shop & Service': 1.1,
    'Bus Stop': 1,
    'Metro Station': 1.2,
    'Cinema': .9,
    'Theater': .9,
    'Tram': .9
}

In [29]:
# The categories to be used to find the relevant venues
fs_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
    'Cinema' : '4bf58dd8d48988d17f941735',
    'Theater' : '4bf58dd8d48988d1f2931735',
    'Tram' : '52f2ab2ebcbc57f1066b8b51'
}
', '.join(fs_categories)

'Food, Shop & Service, Bus Stop, Metro Station, Cinema, Theater, Tram'

In [37]:
# Create connection to Foursquare
fs_client_id='................' #replace with your own id
fs_client_secret='..............' #replace with your own secret

fs = foursquare.Foursquare(client_id=fs_client_id, client_secret=fs_client_secret)

In [31]:
# Search for venues of one category near a location and return the result as a DataFrame
def venues(latitude, longitude, location, category, radius, verbose=True):
    results = fs.venues.search(
        params={
            # The search term is applied to venue names, which narrows
            # the results beyond the categoryId filter
            'query': category,
            'll': '{},{}'.format(latitude, longitude),
            'radius': radius,
            'categoryId': fs_categories[category],
            'limit': 50  # request at most 50 venues per category
        }
    )
    df = json_normalize(results['venues'])
    cols = ['Name', 'Latitude', 'Longitude', 'Tips', 'Users', 'Visits']
    if df.empty:
        # No venue found: return an empty frame with the expected columns
        df = pd.DataFrame(columns=cols)
    else:
        df = df[['name', 'location.lat', 'location.lng',
                 'stats.tipCount', 'stats.usersCount', 'stats.visitsCount']]
        df.columns = cols
    if verbose:
        print('{} "{}" venues are found within {}m of location "{}"'.format(len(df), category, radius, location))
    return df

In [32]:
# Search for the venues around Milan city center, one category at a time
frames = []

for category in fs_categories:
    pd_temp = venues(milan_coordinates[0], milan_coordinates[1], 'Milan', category, radius)
    pd_temp['Type'] = category
    frames.append(pd_temp)
    

data = pd.concat(frames)

32 "Food" venues are found within 1000m of location "Milan"
44 "Shop & Service" venues are found within 1000m of location "Milan"
5 "Bus Stop" venues are found within 1000m of location "Milan"
10 "Metro Station" venues are found within 1000m of location "Milan"
17 "Cinema" venues are found within 1000m of location "Milan"
1 "Theater" venues are found within 1000m of location "Milan"
37 "Tram" venues are found within 1000m of location "Milan"


In [33]:
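# Cluster the venues by position using k-means

Clus_dataSet = data[['Latitude','Longitude']]

# Number of clusters
n_clusters = 12
# No random_state is passed, so the cluster labels can differ between runs
kmeans = KMeans(n_clusters=n_clusters)
# Fit the model to the venue coordinates
kmeans = kmeans.fit(Clus_dataSet)
# Cluster label of every venue
labels = kmeans.predict(Clus_dataSet)
# Centroid coordinates (latitude, longitude) of every cluster
centroids = kmeans.cluster_centers_
data["Cluster"] = labels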
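
The clustering above measures Euclidean distance directly on latitude and longitude in degrees. At Milan's latitude a degree of longitude covers roughly 70% of the ground distance of a degree of latitude, so east-west distances are slightly underweighted. A minimal sketch of one possible remedy, assuming an equirectangular approximation is accurate enough within a 1 km radius, is to convert the coordinates to local metres first. The function name `to_local_metres` and the Earth radius value are assumptions introduced here, not part of the notebook.

```python
import math

def to_local_metres(lat, lon, origin_lat, origin_lon, earth_radius_m=6371000.0):
    """Equirectangular approximation: (east, north) offsets in metres from the origin."""
    north = math.radians(lat - origin_lat) * earth_radius_m
    # Longitude degrees shrink with the cosine of the latitude
    east = math.radians(lon - origin_lon) * earth_radius_m * math.cos(math.radians(origin_lat))
    return east, north

# Milan city centre, as used in the notebook
origin = (45.464919, 9.186645)

# Three venue positions taken from the data.head() output
venues_deg = [(45.46228, 9.19438), (45.47015, 9.185552), (45.459454, 9.19028)]
venues_m = [to_local_metres(lat, lon, *origin) for lat, lon in venues_deg]

for (lat, lon), (east, north) in zip(venues_deg, venues_m):
    print("({}, {}) -> east {:.0f} m, north {:.0f} m".format(lat, lon, east, north))
```

In the notebook, the projected pairs could replace `data[['Latitude','Longitude']]` as the input to `KMeans`. This would likely change the cluster assignments and the scores reported here.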

In [34]:
data.head()

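|   | Name | Latitude | Longitude | Tips | Users | Visits | Type | Cluster |
|---|------|----------|-----------|------|-------|--------|------|---------|
| 0 | Poker Food | 45.46228 | 9.19438 | 0 | 0 | 0 | Food | 0 |
| 1 | God Save The Food | 45.47015 | 9.185552 | 0 | 0 | 0 | Food | 2 |
| 2 | Love It Real Italian Food | 45.459454 | 9.19028 | 0 | 0 | 0 | Food | 7 |
| 3 | Food Good | 45.466132 | 9.186847 | 0 | 0 | 0 | Food | 3 |
| 4 | fluid fresh food | 45.460037 | 9.188208 | 0 | 0 | 0 | Food | 7 |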


In [35]:

# Plot every venue, colored by cluster, together with the cluster centroids
venues_map = folium.Map(location=milan_coordinates, zoom_start=14)

targets_fg = folium.FeatureGroup()
centroids_fg = folium.FeatureGroup()

# Color names accepted by folium.Icon; the list is doubled so that up to 38 clusters can be indexed
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred','lightred', 'beige', 'darkblue', 'darkgreen', 'cadetblue', 'darkpurple', 'white', 'pink', 'lightblue', 'lightgreen', 'gray', 'black', 'lightgray']
colors = colors + colors

# One circle per venue, in the color of its cluster
for (name, latitude, longitude, tips, users, visits, Type, cluster) in data.itertuples(index=False):
    targets_fg.add_child(
        folium.features.CircleMarker(
            location=(latitude, longitude),
            radius=5,
            fill=True,
            color=colors[cluster],
            fill_opacity=1
        )
    )

# One marker per centroid, in the color of its cluster
for i, (latitude, longitude) in enumerate(centroids):
    centroids_fg.add_child(
        folium.Marker([latitude, longitude], icon=folium.Icon(color=colors[i]))
    )

venues_map.add_child(targets_fg)
venues_map.add_child(centroids_fg)

In [36]:
# Find the best cluster according to the preferences
res = [0] * n_clusters

for (name, latitude, longitude, tips, users, visits, Type, cluster) in data.itertuples(index=False):
    res[cluster] = res[cluster] + weigth[Type]  # weighted sum of every venue in the cluster

print(res)

best_cluster = res.index(max(res))
print(best_cluster)  # best cluster
print(centroids[best_cluster])  # best cluster coordinates

[7.000000000000001, 6.299999999999999, 15.2, 19.299999999999997, 15.2, 21.19999999999999, 8.4, 14.1, 18.199999999999996, 4.2, 5.9, 11.900000000000002]
5
[45.46585786  9.19044228]
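
As a quick sanity check, the distance between the city centre and the best centroid can be computed to confirm that the result lies inside the 1000 m search radius. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the helper name `haversine_m` is an assumption introduced here, and the two coordinate pairs are copied from the notebook's input and output.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

milan_centre = (45.464919, 9.186645)      # milan_coordinates in the notebook
best_centroid = (45.46585786, 9.19044228)  # printed best cluster coordinates
search_radius_m = 1000                     # radius in the notebook

distance_m = haversine_m(*milan_centre, *best_centroid)
print("Best centroid is {:.0f} m from the centre".format(distance_m))
print("Inside search radius:", distance_m <= search_radius_m)
```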


## Result

According to this analysis, the best place is the one highlighted on the map.

In [39]:
# Show the best place
best = res.index(max(res))  # index of the highest-scoring cluster
best_map = folium.Map(location=[centroids[best, 0], centroids[best, 1]], zoom_start=15)

targets_fg = folium.FeatureGroup()
centroids_fg = folium.FeatureGroup()

# Color names accepted by folium.Icon; the list is doubled so that up to 38 clusters can be indexed
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred','lightred', 'beige', 'darkblue', 'darkgreen', 'cadetblue', 'darkpurple', 'white', 'pink', 'lightblue', 'lightgreen', 'gray', 'black', 'lightgray']
colors = colors + colors

# One circle per venue, in the color of its cluster
for (name, latitude, longitude, tips, users, visits, Type, cluster) in data.itertuples(index=False):
    targets_fg.add_child(
        folium.features.CircleMarker(
            location=(latitude, longitude),
            radius=5,
            fill=True,
            color=colors[cluster],
            fill_opacity=1
        )
    )

# Mark only the centroid of the best cluster
centroids_fg.add_child(
    folium.Marker([centroids[best, 0], centroids[best, 1]], icon=folium.Icon(color=colors[best]))
)

best_map.add_child(targets_fg)
best_map.add_child(centroids_fg)

## Discussion

We were able to find the best place according to the chosen algorithm. There are, however, some critical points. First of all, 12 clusters were used. This number was chosen empirically, since splitting the data into training and test sets to select the right k is meaningless in this case. If this number changes, the position of the best place changes as well. We ran some experiments, and in the end the position stayed more or less in the same zone.
In addition, the choice of the weights plays a role in the final position.
Nevertheless, the methodology is able to find the best position in the city because it selects the center of a zone dense with venues. Even when the number of clusters and the weight of each venue category change, the result stays quite close to the position obtained as the best.
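
The dependence on the weights can be probed with a simple sensitivity check: scale each weight up and down by a fixed fraction and record which cluster wins each time. The sketch below is a minimal illustration on hypothetical toy data, not on the Foursquare results; the function names, the toy venue list and the perturbation sizes are all assumptions introduced here.

```python
def cluster_scores(venue_rows, weights):
    """Sum the weight of each venue's type per cluster; returns {cluster: score}."""
    scores = {}
    for venue_type, cluster in venue_rows:
        scores[cluster] = scores.get(cluster, 0.0) + weights[venue_type]
    return scores

def best_cluster(venue_rows, weights):
    """Cluster with the highest weighted score."""
    scores = cluster_scores(venue_rows, weights)
    return max(scores, key=scores.get)

def winners_under_perturbation(venue_rows, weights, delta=0.2):
    """Set of winning clusters when each weight in turn is scaled by (1 - delta) and (1 + delta)."""
    winners = set()
    for venue_type in weights:
        for sign in (-1, 1):
            perturbed = dict(weights)
            perturbed[venue_type] = weights[venue_type] * (1 + sign * delta)
            winners.add(best_cluster(venue_rows, perturbed))
    return winners

# Hypothetical toy data: (venue type, cluster label) pairs
toy_rows = (
    [("Food", 0)] * 6 + [("Tram", 0)] * 2
    + [("Metro Station", 1)] * 4 + [("Food", 1)] * 2
    + [("Cinema", 2)] * 3
)
base_weights = {"Food": 1.0, "Tram": 0.9, "Metro Station": 1.2, "Cinema": 0.9}

print("Baseline winner:", best_cluster(toy_rows, base_weights))
print("Winners with +/-20% weights:", winners_under_perturbation(toy_rows, base_weights, delta=0.2))
print("Winners with +/-50% weights:", winners_under_perturbation(toy_rows, base_weights, delta=0.5))
```

If the set of winners contains a single cluster, the choice is stable for perturbations of that size; more than one winner means the customer's weights deserve closer attention.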

## Conclusion

Using the information obtained from Foursquare and a machine learning algorithm, we were able to identify the best place to buy a home in Milan.