# The battle of Neighborhoods - Week 2
### Capstone Project - IBM Data Science Professional Certificate

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion and Conclusion](#conclusion)

## 1. Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find the best location to **open a new restaurant in the city of Milan.** 

In Milan there are a lot of restaurants, each specialized in a particular cuisine, such as Italian, Japanese, Chinese, Indian or Brazilian. Our objective is to **find locations that are not well covered with restaurants**, and to discover the **type of cuisine that is still missing or less present in those locations**, in order to face the least competition when opening the new restaurant.

We will analyze the top venues around the potential locations and determine what type of cuisine our restaurant should serve in each location, and **decide the best one based on the number of potential customers**, i.e. how popular the type of cuisine is in the city and how close to the city center each potential location is. We will **estimate the popularity of a type of cuisine from the number of venues** that serve it, and we will also assume that **the closer the neighborhood is to the city center, the larger the number of potential customers**.

The project may be of interest to prospective restaurateurs who want to open a new business in Milan.

## 2. Data <a name="data"></a>

We will take into consideration the following aspects:
* **number of already existing restaurants** around each location/neighborhood;
* **most popular restaurants and their type of cuisine** in each location/neighborhood;
* **distance** of each neighborhood **from the city center**.
    
We will divide the city into neighborhoods, based on a geojson file published on the website of the municipality of Milan, which already provides the border coordinates of each neighborhood. Using these coordinates, we will be able to **determine the center of each neighborhood and its distance from the city center**.

We will use the **Foursquare API** to **retrieve the number of restaurants and their type** in each neighborhood, and to **get the most popular restaurants** in every location, in order to choose the candidate cuisine for our restaurant.

## 3. Methodology <a name="methodology"></a>

First of all, it is necessary to import and download all the dependencies needed.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda update --all --yes
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


#### Obtaining the coordinates of Milan using the geopy library

In [2]:
address = 'Milano, Italy'

geolocator = Nominatim(user_agent="mi_explorer")
location = geolocator.geocode(address)
milan_latitude = location.latitude
milan_longitude = location.longitude
print('The geographical coordinates of Milan are {}, {}.'.format(milan_latitude, milan_longitude))

The geographical coordinates of Milan are 45.4667971, 9.1904984.


#### Defining Foursquare Credentials and Version

In [3]:
# The code was removed by Watson Studio for sharing.

### 3.1 Downloading and exploring the dataset

#### Downloading the data

In [4]:
!wget -q -O 'milan_data.geojson' http://dati.comune.milano.it/dataset/e5a0d956-2eff-454d-b0ea-659cb7b55c0b/resource/af78bd3f-ea45-403a-8882-91cca05087f0/download/nilzone.geojson
print('Data downloaded')

Data downloaded


#### Loading the data

In [5]:
with open('milan_data.geojson') as geojson_data:
    milan_data = json.load(geojson_data)
    
milan_data

{'type': 'FeatureCollection',
 'name': 'NILZone',
 'crs': {'type': 'name',
  'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
 'features': [{'type': 'Feature',
   'properties': {'FID_1': 0,
    'FID_1_1': 0,
    'ID_NIL': 74,
    'NIL': 'SACCO',
    'AreaHA': 70.84658,
    'AreaMQ': 708465.80062},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[9.121949239998024, 45.51602089945655],
      [9.121632914402111, 45.51589000208662],
      [9.121201704878128, 45.51576930650467],
      [9.120921328228487, 45.515713977012595],
      [9.120109671194136, 45.51610187134214],
      [9.11694857883526, 45.51764961305016],
      [9.116076539087741, 45.51805756729788],
      [9.115917926619067, 45.51813162558219],
      [9.114769833539013, 45.5186701373432],
      [9.114144535098607, 45.519004561714354],
      [9.107672665851712, 45.52260116301071],
      [9.109105399535705, 45.52274886315487],
      [9.108658177797333, 45.52397260422105],
      [9.10896640312444, 45.52404508401885],

All the relevant information is stored under the key "features", so let's define a variable that holds only that part of the data.

In [6]:
neighborhood_data = milan_data['features']
# looking at the first item in the list
neighborhood_data[0]

{'type': 'Feature',
 'properties': {'FID_1': 0,
  'FID_1_1': 0,
  'ID_NIL': 74,
  'NIL': 'SACCO',
  'AreaHA': 70.84658,
  'AreaMQ': 708465.80062},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[9.121949239998024, 45.51602089945655],
    [9.121632914402111, 45.51589000208662],
    [9.121201704878128, 45.51576930650467],
    [9.120921328228487, 45.515713977012595],
    [9.120109671194136, 45.51610187134214],
    [9.11694857883526, 45.51764961305016],
    [9.116076539087741, 45.51805756729788],
    [9.115917926619067, 45.51813162558219],
    [9.114769833539013, 45.5186701373432],
    [9.114144535098607, 45.519004561714354],
    [9.107672665851712, 45.52260116301071],
    [9.109105399535705, 45.52274886315487],
    [9.108658177797333, 45.52397260422105],
    [9.10896640312444, 45.52404508401885],
    [9.109341602752306, 45.523423302582394],
    [9.109803149786279, 45.52309266217421],
    [9.110530049465927, 45.52281851507242],
    [9.111013839126041, 45.52224635334161],
    [9.111578

#### Moving the data into a *pandas* dataframe

In [7]:
# defining the dataframe columns
column_names = ['Neighborhood', 'Latitude', 'Longitude'] 

# instantiating the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

The coordinates in the geojson file describe the borders of the neighborhoods. Therefore, as a **first approach** to obtaining the approximate coordinates of the centers, we will take the **spatial mean** of the border coordinates of each neighborhood.

In [8]:
from statistics import mean

rows = []
for row in neighborhood_data:
    neighborhood_name = row['properties']['NIL']
    # outer ring of the polygon: a list of [longitude, latitude] pairs
    neighborhood_ll = row['geometry']['coordinates'][0]

    latitudes = [ll[1] for ll in neighborhood_ll]
    longitudes = [ll[0] for ll in neighborhood_ll]

    rows.append({'Neighborhood': neighborhood_name,
                 'Latitude': mean(latitudes),
                 'Longitude': mean(longitudes)})

# DataFrame.append was removed in pandas 2.0, so the dataframe is built from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | SACCO | 45.521228 | 9.122654 |
| 1 | COMASINA | 45.526585 | 9.159486 |
| 2 | STEPHENSON | 45.511943 | 9.121515 |
| 3 | QT 8 | 45.485983 | 9.137247 |
| 4 | ORTOMERCATO | 45.452505 | 9.230223 |


#### Creating a map of Milan with neighborhoods superimposed on top

In [9]:
# create map of Milan using latitude and longitude values 
map_milan = folium.Map(location=[milan_latitude, milan_longitude], zoom_start=12)

map_milan.choropleth(
    geo_data=milan_data,
    key_on='features.properties.NIL',
    fill_color='Red', 
    fill_opacity=0.7, 
)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
# display map
map_milan

The centers shown above were calculated as the spatial mean of all border coordinates of each neighborhood. As a result, **some of the centers do not respect the neighborhood division**.

Let's try to get the coordinates of the centers using the **geopy** library instead.

In [10]:
lat = []
long = []

# reuse a single geocoder instead of creating a new one for every neighborhood
geolocator = Nominatim(user_agent="mi_explorer")

for neighborhood_name in neighborhoods['Neighborhood']:
    address = '{}, Milano, Italy'.format(neighborhood_name)
    location = geolocator.geocode(address)
    lat.append(location.latitude)
    long.append(location.longitude)
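
geopy's `Nominatim.geocode` returns `None` when it cannot resolve an address, in which case reading `.latitude` in the loop above would raise an `AttributeError`. The following is a minimal sketch of a more defensive loop. The helper `geocode_all` and the offline stand-in geocoder are hypothetical names introduced only for this example, so that it runs without network access.

```python
from collections import namedtuple


def geocode_all(names, geocode, city="Milano, Italy"):
    """Return ({name: (lat, lon)}, [names that could not be resolved]).

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` / `.longitude`, or None when nothing is found.
    """
    found, missing = {}, []
    for name in names:
        location = geocode("{}, {}".format(name, city))
        if location is None:
            missing.append(name)
        else:
            found[name] = (location.latitude, location.longitude)
    return found, missing


# Offline stand-in for a real geocoder, for illustration only
Location = namedtuple("Location", ["latitude", "longitude"])
fake_index = {"SACCO, Milano, Italy": Location(45.520365, 9.123899)}

found, missing = geocode_all(["SACCO", "NOWHERE"], fake_index.get)
print(found)    # {'SACCO': (45.520365, 9.123899)}
print(missing)  # ['NOWHERE']
```

In the notebook, `geolocator.geocode` could be passed as the `geocode` argument, and the `missing` list would show which neighborhoods need a fallback such as the spatial-mean coordinates.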

Let's move the coordinates obtained with geopy into a *pandas* dataframe.

In [11]:
neighborhoods2 = pd.DataFrame(columns = column_names)
neighborhoods2['Neighborhood'] = neighborhoods['Neighborhood']
neighborhoods2['Latitude'] = lat
neighborhoods2['Longitude'] = long
neighborhoods2.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | SACCO | 45.520365 | 9.123899 |
| 1 | COMASINA | 45.52693 | 9.161565 |
| 2 | STEPHENSON | 45.511392 | 9.122565 |
| 3 | QT 8 | 45.48694 | 9.13666 |
| 4 | ORTOMERCATO | 45.452223 | 9.232368 |


#### Displaying the neighborhoods with the new center coordinates calculated

In [12]:
# create map of Milan using latitude and longitude values
map_milan = folium.Map(location=[milan_latitude, milan_longitude], zoom_start=12)

map_milan.choropleth(
    geo_data=milan_data,
    key_on='features.properties.NIL',
    fill_color = 'Green', 
    fill_opacity = 0.3, 
)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods2['Latitude'], neighborhoods2['Longitude'], neighborhoods2['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_milan)  

# add Milan coordinates
label = 'Milan Coordinates'
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [milan_latitude, milan_longitude],
    radius = 5,
    popup = label,
    color = 'red',
    fill = True,
    fill_opacity = 0.7,
    parse_html = False).add_to(map_milan)

    
# display map
map_milan

These values look more reasonable, even though some centers still do not allow a proper coverage of their area. The neighborhoods whose geopy centers fall outside their borders are, however, well represented by the coordinates obtained earlier with the spatial mean. Thus, for those neighborhoods, let's **replace the coordinates in the dataframe _neighborhoods2_** **with the corresponding values from the dataframe _neighborhoods_**.

In [13]:
# list containing neighborhoods whose center coordinates need to be substituted
correction_list = ['PORTELLO', 'LORENTEGGIO', 'S. CRISTOFORO', 'BARONA', 'LORETO', 'VIALE MONZA', 'RIPAMONTI', 'PORTA ROMANA', 'GHISOLFA']

# both dataframes share the same row order and index, so a boolean mask selects the rows to correct
mask = neighborhoods2['Neighborhood'].isin(correction_list)

# overwrite the geopy coordinates with the spatial-mean coordinates for those neighborhoods
neighborhoods2.loc[mask, ['Latitude', 'Longitude']] = neighborhoods.loc[mask, ['Latitude', 'Longitude']]
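
The correction list above was compiled by inspecting the map. As a possible alternative, a point-in-polygon test could flag centers that fall outside their own borders automatically. The sketch below uses the standard ray-casting technique on a hypothetical U-shaped polygon; the function name and the toy data are assumptions, not part of the original notebook.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is the point (lon, lat) inside the polygon ring?"""
    inside = False
    # pair every vertex with the next one, wrapping around at the end
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        # does this edge straddle the horizontal line through the point?
        if (y0 > lat) != (y1 > lat):
            # longitude at which the edge crosses that horizontal line
            x_cross = x0 + (lat - y0) * (x1 - x0) / (y1 - y0)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical U-shaped neighborhood: the vertex mean lands in the notch
ring = [[0, 0], [3, 0], [3, 3], [2, 3], [2, 1], [1, 1], [1, 3], [0, 3]]
mean_lon = sum(p[0] for p in ring) / len(ring)  # 1.5
mean_lat = sum(p[1] for p in ring) / len(ring)  # 1.75

print(point_in_ring(mean_lon, mean_lat, ring))  # False: the center is outside the borders
print(point_in_ring(0.5, 2.0, ring))            # True: a point in the left arm of the U
```

Applied to the notebook's data, `ring` would be `row['geometry']['coordinates'][0]` for each feature, and the test would be run on the candidate center of the same neighborhood.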

#### Displaying the neighborhoods with their definitive center coordinates

In [14]:
# create map of Milan using latitude and longitude values
map_milan = folium.Map(location=[milan_latitude, milan_longitude], zoom_start=12)

map_milan.choropleth(
    geo_data=milan_data,
    key_on='features.properties.NIL',
    fill_color = 'Green', 
    fill_opacity = 0.3, 
)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods2['Latitude'], neighborhoods2['Longitude'], neighborhoods2['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_milan)  

# add Milan coordinates
label = 'Milan Coordinates'
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    [milan_latitude, milan_longitude],
    radius = 5,
    popup = label,
    color = 'red',
    fill = True,
    fill_opacity = 0.7,
    parse_html = False).add_to(map_milan)

    
# display map
map_milan

#### Calculating the distance from the city center

We are going to assume that the city center is located at the coordinates of Milan given by geopy. We will calculate the distance of each neighborhood from the city center and we will insert the distance values for each neighborhood in a dataframe.

In [15]:
import geopy.distance

# creating a list of the values of distance
center_coords = (milan_latitude, milan_longitude)
distances = []

# filling the list with values
for index, row in neighborhoods2.iterrows():
    coords_1 = (row['Latitude'],row['Longitude'])
    # calculating the distance
    distances.append(geopy.distance.geodesic(coords_1, center_coords).km)

# putting the distances in a new dataframe
neighborhoods3 = pd.DataFrame(columns = ['Neighborhood','Latitude','Longitude','Dist_from_center'])
neighborhoods3['Neighborhood'] = neighborhoods2['Neighborhood']
neighborhoods3['Latitude'] = neighborhoods2['Latitude']
neighborhoods3['Longitude'] = neighborhoods2['Longitude']
neighborhoods3['Dist_from_center'] = distances
neighborhoods3.head(20)

| | Neighborhood | Latitude | Longitude | Dist_from_center |
|---|---|---|---|---|
| 0 | SACCO | 45.520365 | 9.123899 | 7.908696 |
| 1 | COMASINA | 45.52693 | 9.161565 | 7.05548 |
| 2 | STEPHENSON | 45.511392 | 9.122565 | 7.264144 |
| 3 | QT 8 | 45.48694 | 9.13666 | 4.767919 |
| 4 | ORTOMERCATO | 45.452223 | 9.232368 | 3.653469 |
| 5 | MAGGIORE - MUSOCCO | 45.505177 | 9.117462 | 7.127205 |
| 6 | PARCO LAMBRO - CIMIANO | 45.499372 | 9.250219 | 5.908261 |
| 7 | GALLARATESE | 45.496641 | 9.108251 | 7.235443 |
| 8 | S. SIRO | 45.4782 | 9.123964 | 5.35487 |
| 9 | GHISOLFA | 45.490783 | 9.163536 | 3.39863 |
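
As a sanity check on the distances above, the great-circle (haversine) formula can be evaluated by hand. This is a sketch of a different, simpler technique than `geopy.distance.geodesic`, which works on an ellipsoid; the function name and the mean Earth radius used here are assumptions of this example.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


center = (45.4667971, 9.1904984)  # Milan coordinates returned by geopy above
sacco = (45.520365, 9.123899)     # SACCO center from the table above

distance = haversine_km(*sacco, *center)
print(round(distance, 2))
```

For SACCO the result should land within a few tens of meters of the geodesic value of 7.908696 km reported in the table, which suggests that at city scale either method would serve the purpose of this analysis.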


### 3.2 Using Foursquare

#### Getting the top 100 restaurants in each neighborhood

We now explore up to 100 top venues in each neighborhood, searching within a radius of 800 m around each center. We also create a function that puts the results returned by Foursquare into a _pandas_ dataframe. Since the ***food* category in Foursquare includes places other than restaurants**, such as bakeries and bagel shops, we make sure that our function **only considers restaurant categories**.

In [16]:
LIMIT = 100

def getvenues(category,radius): # category indicates the type of venues, radius the distance from the location until where to search
    
    venues = []
    
    for index, row in neighborhoods3.iterrows():
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, row['Latitude'], row['Longitude'], category, radius, LIMIT)
        # get request
        results = requests.get(url).json()['response']['groups'][0]['items']
    
        # return relevant information for each nearby venue, selecting only restaurants
        venues.append([(row['Neighborhood'], venue['venue']['name'], venue['venue']['categories'][0]['name']) for venue in results
                      if venue['venue']['categories'][0]['name'].find(" Restaurant")!=-1])
    nearby_venues = pd.DataFrame([item for venue in venues for item in venue])
    nearby_venues.columns = ['Neighborhood','Venue','Venue Category']
    
    return(nearby_venues)
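
To make the filtering step easier to follow, here is a small offline sketch of the same idea applied to a hand-written list that mimics the structure the function reads (`items[i]['venue']['categories'][0]['name']`). The items are made up for illustration and are not real API output; `restaurant_rows` is a hypothetical helper name.

```python
def restaurant_rows(neighborhood, items):
    """Keep only venues whose primary category name contains ' Restaurant'."""
    rows = []
    for item in items:
        venue = item['venue']
        category = venue['categories'][0]['name']
        if ' Restaurant' in category:
            rows.append((neighborhood, venue['name'], category))
    return rows


# Hand-written stand-in for response['groups'][0]['items'] (not real API output)
items = [
    {'venue': {'name': "McDonald's", 'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Panificio Esempio', 'categories': [{'name': 'Bakery'}]}},
    {'venue': {'name': 'Rossopomodoro', 'categories': [{'name': 'Italian Restaurant'}]}},
]

print(restaurant_rows('COMASINA', items))
# [('COMASINA', "McDonald's", 'Fast Food Restaurant'), ('COMASINA', 'Rossopomodoro', 'Italian Restaurant')]
```

Note that the match string starts with a space, so a venue whose category is the bare word "Restaurant" would be excluded by this filter.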

Let's now create the dataframe.

In [17]:
milan_venues = getvenues(category='4d4b7105d754a06374d81259',radius=800) # '4d4b7105d754a06374d81259' = category indicating food in Foursquare
milan_venues.head(20)

| | Neighborhood | Venue | Venue Category |
|---|---|---|---|
| 0 | SACCO | Shi So Restaurant Sushi | Japanese Restaurant |
| 1 | COMASINA | McDonald's | Fast Food Restaurant |
| 2 | COMASINA | McDonald's | Fast Food Restaurant |
| 3 | STEPHENSON | Rossopomodoro | Italian Restaurant |
| 4 | QT 8 | Ristorante Ribot | Italian Restaurant |
| 5 | QT 8 | Unico Restaurant | Italian Restaurant |
| 6 | QT 8 | Ristorante Pizzeria Monte Stella | Italian Restaurant |
| 7 | QT 8 | McDonald's | Fast Food Restaurant |
| 8 | QT 8 | L'Arca | Italian Restaurant |
| 9 | ORTOMERCATO | Oste Italiano | Italian Restaurant |


Now let's count how many venues were returned for each neighborhood.

In [18]:
milan_count = milan_venues[['Neighborhood','Venue']]
venues_number = milan_count.groupby('Neighborhood').count().reset_index()
venues_number.columns = ['Neighborhood', 'Number_venues']
venues_number.head()

| | Neighborhood | Number_venues |
|---|---|---|
| 0 | ADRIANO | 3 |
| 1 | AFFORI | 4 |
| 2 | BAGGIO | 3 |
| 3 | BANDE NERE | 12 |
| 4 | BARONA | 1 |
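
One consequence of counting with `groupby` is that a neighborhood for which no restaurant was returned has no rows in `milan_venues`, so it does not appear in the count at all. Since poorly covered neighborhoods are exactly what this project is looking for, it may be worth keeping them with a count of zero. The sketch below shows one way to do that on toy data; the neighborhood name `EMPTYVILLE` is made up for the example.

```python
import pandas as pd

all_names = ['SACCO', 'COMASINA', 'EMPTYVILLE']  # 'EMPTYVILLE' is a made-up name
venues = pd.DataFrame({
    'Neighborhood': ['SACCO', 'COMASINA', 'COMASINA'],
    'Venue': ['Shi So Restaurant Sushi', "McDonald's", "McDonald's"],
})

counts = (
    venues.groupby('Neighborhood')['Venue'].count()
    .reindex(all_names, fill_value=0)   # add neighborhoods without any venue
    .rename_axis('Neighborhood')
    .rename('Number_venues')
    .reset_index()
)
print(counts)
```

In the notebook, `all_names` would be the `Neighborhood` column of the neighborhoods dataframe.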


### 3.3 Determining the recommendation in each neighborhood

#### One hot encoding

In [19]:
# one hot encoding
milan_onehot = pd.get_dummies(milan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
milan_onehot['Neighborhood'] = milan_venues['Neighborhood'] 

# move neighborhood column to the first column
new_columns = [milan_onehot.columns[-1]] + list(milan_onehot.columns[:-1])
milan_onehot = milan_onehot[new_columns]
milan_onehot.head()

Unnamed: 0,Neighborhood,Abruzzo Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Belgian Restaurant,Brazilian Restaurant,Campanian Restaurant,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Doner Restaurant,Eastern European Restaurant,Emilia Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lombard Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mongolian Restaurant,Moroccan Restaurant,New American Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Piedmontese Restaurant,Puglia Restaurant,Ramen Restaurant,Roman Restaurant,Russian Restaurant,Sardinian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Sicilian Restaurant,South American Restaurant,South Tyrolean Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Tuscan Restaurant,Umbrian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,SACCO,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,COMASINA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,COMASINA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,STEPHENSON,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,QT 8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Calculating the popularity of each kind of restaurant

Let's determine how popular each type of restaurant is in Milan, assuming that **popularity is directly proportional to the total number of restaurants of that type in the city**.

In [20]:
milan_grouped = milan_onehot[milan_onehot.columns[1:]].sum().reset_index()
milan_grouped.columns = ['Category','Popularity']
milan_grouped.head()

| | Category | Popularity |
|---|---|---|
| 0 | Abruzzo Restaurant | 2 |
| 1 | African Restaurant | 7 |
| 2 | American Restaurant | 12 |
| 3 | Argentinian Restaurant | 13 |
| 4 | Asian Restaurant | 42 |


Let's **sort the dataframe** based on the column 'Popularity', in **descending** order.

In [21]:
milan_grouped.sort_values(by=['Popularity'], ascending=False, inplace=True)
milan_grouped.reset_index(drop=True, inplace=True)
milan_grouped.head(15)

| | Category | Popularity |
|---|---|---|
| 0 | Italian Restaurant | 732 |
| 1 | Japanese Restaurant | 138 |
| 2 | Seafood Restaurant | 111 |
| 3 | Sushi Restaurant | 96 |
| 4 | Chinese Restaurant | 84 |
| 5 | Asian Restaurant | 42 |
| 6 | Kebab Restaurant | 37 |
| 7 | Indian Restaurant | 33 |
| 8 | Vegetarian / Vegan Restaurant | 30 |
| 9 | Mediterranean Restaurant | 25 |


#### Determining the frequency of occurrence of each type of restaurant in each neighborhood

Let's determine the **frequency of occurrence** of every kind of restaurant in each neighborhood by averaging the one-hot encoded columns within each neighborhood.

In [22]:
milan_grouped2 = milan_onehot.groupby('Neighborhood').mean().reset_index()
milan_grouped2.head()

Unnamed: 0,Neighborhood,Abruzzo Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Belgian Restaurant,Brazilian Restaurant,Campanian Restaurant,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Doner Restaurant,Eastern European Restaurant,Emilia Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lombard Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Mongolian Restaurant,Moroccan Restaurant,New American Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Piedmontese Restaurant,Puglia Restaurant,Ramen Restaurant,Roman Restaurant,Russian Restaurant,Sardinian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Sicilian Restaurant,South American Restaurant,South Tyrolean Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Tuscan Restaurant,Umbrian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,ADRIANO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AFFORI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BAGGIO,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BANDE NERE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.416667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BARONA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
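
The three steps above (one-hot encoding, citywide popularity as a column sum, per-neighborhood frequency as a group mean) can be reproduced on a tiny example. The toy rows below are chosen to mirror the AFFORI and BARONA rows of the output above, but they are written by hand for illustration.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['AFFORI', 'AFFORI', 'AFFORI', 'AFFORI', 'BARONA'],
    'Venue Category': ['Italian Restaurant'] * 3 + ['Japanese Restaurant'] * 2,
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# citywide popularity: number of venues per category
popularity = onehot.drop(columns='Neighborhood').sum()

# per-neighborhood frequency: share of each category among the neighborhood's venues
frequency = onehot.groupby('Neighborhood').mean()

print(popularity)
print(frequency)
```

Here AFFORI has three Italian restaurants out of four venues, so its frequencies are 0.75 and 0.25, while BARONA's single Japanese restaurant gives a frequency of 1.0.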


#### Displaying each neighborhood along with the first ten most common restaurant types

In order to look for the candidate type of restaurant in each neighborhood, we will **display all neighborhoods along with the top ten most common restaurant types**. Thus, let's create a dataframe with all the information we need.

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False) 
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = milan_grouped2['Neighborhood']

for ind in np.arange(milan_grouped2.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(milan_grouped2.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADRIANO,Italian Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant
1,AFFORI,Italian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant
2,BAGGIO,Italian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant
3,BANDE NERE,Italian Restaurant,Sushi Restaurant,Japanese Restaurant,Seafood Restaurant,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Filipino Restaurant,French Restaurant,Vietnamese Restaurant
4,BARONA,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Italian Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant
5,BICOCCA,Italian Restaurant,Sushi Restaurant,Kebab Restaurant,Seafood Restaurant,Sardinian Restaurant,Vietnamese Restaurant,French Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant
6,BOVISA,Italian Restaurant,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Kebab Restaurant,Doner Restaurant,Sicilian Restaurant,Chinese Restaurant,German Restaurant,Filipino Restaurant,French Restaurant
7,BOVISASCA,Italian Restaurant,Chinese Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant
8,BRERA,Italian Restaurant,Japanese Restaurant,Seafood Restaurant,Asian Restaurant,Mediterranean Restaurant,Sushi Restaurant,Modern European Restaurant,Shabu-Shabu Restaurant,French Restaurant,Puglia Restaurant
9,BRUZZANO,Italian Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant


#### Attaining the recommended restaurant type in each neighborhood

Let's determine the candidate type of restaurant for each neighborhood by going through the types of restaurant sorted by citywide popularity and **choosing the first one** that is **not contained in the top ten venues** of that neighborhood.

In [24]:
# create a list of recommended
recommended = []

for index, row in neighborhoods_venues_sorted.iterrows():
    new_list = [item for item in milan_grouped['Category'].tolist() if item not in row.tolist()[1:]]
    recommended.append(new_list[0])
    
print(recommended)

['Seafood Restaurant', 'Seafood Restaurant', 'Seafood Restaurant', 'Chinese Restaurant', 'Seafood Restaurant', 'Japanese Restaurant', 'Japanese Restaurant', 'Seafood Restaurant', 'Chinese Restaurant', 'Seafood Restaurant', 'Asian Restaurant', 'Seafood Restaurant', 'Seafood Restaurant', 'Asian Restaurant', 'Seafood Restaurant', 'Sushi Restaurant', 'Seafood Restaurant', 'Asian Restaurant', 'Chinese Restaurant', 'Sushi Restaurant', 'Seafood Restaurant', 'Japanese Restaurant', 'Sushi Restaurant', 'Italian Restaurant', 'Sushi Restaurant', 'Indian Restaurant', 'Seafood Restaurant', 'Asian Restaurant', 'Seafood Restaurant', 'Japanese Restaurant', 'Chinese Restaurant', 'Asian Restaurant', 'Japanese Restaurant', 'Seafood Restaurant', 'Seafood Restaurant', 'Indian Restaurant', 'Asian Restaurant', 'Seafood Restaurant', 'Italian Restaurant', 'Seafood Restaurant', 'Seafood Restaurant', 'Asian Restaurant', 'Japanese Restaurant', 'Seafood Restaurant', 'Japanese Restaurant', 'Seafood Restaurant', 'Jap
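
The selection rule can be checked by hand on a single neighborhood. The sketch below wraps the same rule in a hypothetical helper, `recommend`, and applies it to the BANDE NERE row and the five most popular categories shown in the tables above. Unlike the list indexing in the cell above, it returns `None` when every category is already in the top venues.

```python
def recommend(popularity_order, top_venues):
    """First category, in decreasing citywide popularity, that is absent from top_venues."""
    top = set(top_venues)
    for category in popularity_order:
        if category not in top:
            return category
    return None


# five most popular categories citywide, in descending order
popularity_order = [
    'Italian Restaurant', 'Japanese Restaurant', 'Seafood Restaurant',
    'Sushi Restaurant', 'Chinese Restaurant',
]

# ten most common venue types listed for BANDE NERE
bande_nere_top = [
    'Italian Restaurant', 'Sushi Restaurant', 'Japanese Restaurant',
    'Seafood Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant',
    'German Restaurant', 'Filipino Restaurant', 'French Restaurant',
    'Vietnamese Restaurant',
]

print(recommend(popularity_order, bande_nere_top))  # Chinese Restaurant
```

One caveat worth keeping in mind: the top-ten lists also contain categories with a frequency of zero whenever a neighborhood has fewer than ten restaurant types, so a category can be skipped even though no such restaurant exists there.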

Let's add the column 'Recommendation' in the dataframe. 

In [25]:
neighborhoods_venues_sorted['Recommendation'] = recommended
neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Recommendation
0,ADRIANO,Italian Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant,Seafood Restaurant
1,AFFORI,Italian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant,Seafood Restaurant
2,BAGGIO,Italian Restaurant,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant,Seafood Restaurant
3,BANDE NERE,Italian Restaurant,Sushi Restaurant,Japanese Restaurant,Seafood Restaurant,Falafel Restaurant,Fast Food Restaurant,German Restaurant,Filipino Restaurant,French Restaurant,Vietnamese Restaurant,Chinese Restaurant
4,BARONA,Japanese Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Italian Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Greek Restaurant,Seafood Restaurant
5,BICOCCA,Italian Restaurant,Sushi Restaurant,Kebab Restaurant,Seafood Restaurant,Sardinian Restaurant,Vietnamese Restaurant,French Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant
6,BOVISA,Italian Restaurant,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Kebab Restaurant,Doner Restaurant,Sicilian Restaurant,Chinese Restaurant,German Restaurant,Filipino Restaurant,French Restaurant,Japanese Restaurant
7,BOVISASCA,Italian Restaurant,Chinese Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant,Seafood Restaurant
8,BRERA,Italian Restaurant,Japanese Restaurant,Seafood Restaurant,Asian Restaurant,Mediterranean Restaurant,Sushi Restaurant,Modern European Restaurant,Shabu-Shabu Restaurant,French Restaurant,Puglia Restaurant,Chinese Restaurant
9,BRUZZANO,Italian Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Latin American Restaurant,Korean Restaurant,Kebab Restaurant,Japanese Restaurant,Indian Restaurant,Himalayan Restaurant,Hawaiian Restaurant,Seafood Restaurant


### 3.4 Selecting the potential neighborhood along with the type of restaurant

To select the neighborhood in which to open the restaurant and to choose the type of cuisine, we will focus on two parameters: the distance from the center and the number of existing restaurants. In particular, **the smaller the distance and the fewer the existing restaurants, the better**.

First of all, let's merge the different dataframes created in order to have all the necessary information displayed.

In [26]:
neighborhoods4 = pd.merge(neighborhoods_venues_sorted, neighborhoods3, how='inner', on='Neighborhood')
neighborhoods5 = pd.merge(neighborhoods4, venues_number, how='inner', on='Neighborhood')

num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
# columns = ['Latitude','Longitude']
columns = []
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
neighborhoods5.drop(labels=columns, axis=1, inplace=True)
neighborhoods5.head(20)

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues
0,ADRIANO,Seafood Restaurant,45.513572,9.251202,7.038687,3
1,AFFORI,Seafood Restaurant,45.517029,9.169653,5.815828,4
2,BAGGIO,Seafood Restaurant,45.461384,9.089843,7.894986,3
3,BANDE NERE,Chinese Restaurant,45.461504,9.136484,4.265065,12
4,BARONA,Seafood Restaurant,45.431686,9.155134,4.783389,1
5,BICOCCA,Japanese Restaurant,45.514917,9.211138,5.586139,13
6,BOVISA,Japanese Restaurant,45.50277,9.161264,4.605204,13
7,BOVISASCA,Seafood Restaurant,45.515842,9.153778,6.160506,4
8,BRERA,Chinese Restaurant,45.471519,9.187735,0.567514,56
9,BRUZZANO,Seafood Restaurant,45.527369,9.173292,6.865102,4


From the dataframe above it is not easy to make a decision about the best location for the new restaurant, i.e. it is **difficult to set threshold values for the distance from the center and for the number of existing venues** in order to choose the best neighborhood. For this purpose, it may be helpful to use **clustering** in order to simplify the choice.

#### Clustering neighborhoods

Let's cluster the neighborhoods into **4 clusters**. We will use the *k-means* clustering technique. 

In [27]:
milan_grouped3 = pd.merge(milan_grouped2, venues_number, how='inner', on='Neighborhood')
milan_grouped4 = pd.merge(milan_grouped3, neighborhoods3, how='inner', on='Neighborhood')
milan_grouped4.drop(labels=['Latitude', 'Longitude'], axis=1, inplace=True)

# set the number of clusters
num_clusters = 4

milan_grouped_clustering = milan_grouped4.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=num_clusters, random_state=0).fit(milan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:30]

array([0, 0, 0, 3, 0, 3, 3, 0, 2, 0, 2, 0, 0, 2, 0, 1, 0, 3, 3, 3, 2, 3,
       3, 0, 2, 1, 3, 2, 0, 3], dtype=int32)
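
A caveat worth keeping in mind: k-means groups points by Euclidean distance, so a column with a large numeric range (here `Number_venues`, which runs from 1 to over 60) weighs far more than columns with small ranges. The following is a minimal sketch, not the method used in this notebook, of how the features could be standardized before clustering. It assumes only two features, `Dist_from_center` and `Number_venues`, copied from rows shown in this section; the real `milan_grouped_clustering` matrix may contain additional columns from `milan_grouped2`, and the names `features`, `scaled` and `kmeans_scaled` are introduced here purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# (Dist_from_center in km, Number_venues) for a few neighborhoods in this section
features = np.array([
    [7.038687, 3],    # ADRIANO
    [7.894986, 3],    # BAGGIO
    [4.783389, 1],    # BARONA
    [4.265065, 12],   # BANDE NERE
    [5.586139, 13],   # BICOCCA
    [4.111718, 17],   # DERGANO
    [3.042737, 30],   # CITTA' STUDI
    [2.304995, 41],   # PAGANO
    [1.748021, 35],   # VIGENTINA
    [0.567514, 56],   # BRERA
    [0.332283, 54],   # DUOMO
    [2.400625, 61],   # NAVIGLI
])

# Rescale each column to zero mean and unit variance so that
# kilometres and venue counts contribute on a comparable scale
scaled = StandardScaler().fit_transform(features)

# Same settings as above (4 clusters, fixed seed), applied to the scaled data
kmeans_scaled = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)
print(kmeans_scaled.labels_)
```

The label numbers produced this way are arbitrary and need not match the ones above. The rest of the analysis continues with the unscaled labels computed by the notebook.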

Let's add the cluster labels to the previous dataframe.

In [28]:
# add clustering labels as the last column
# (assumes neighborhoods5 rows are in the same order as milan_grouped4)
neighborhoods5['Cluster'] = kmeans.labels_
neighborhoods5.head(15)

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
0,ADRIANO,Seafood Restaurant,45.513572,9.251202,7.038687,3,0
1,AFFORI,Seafood Restaurant,45.517029,9.169653,5.815828,4,0
2,BAGGIO,Seafood Restaurant,45.461384,9.089843,7.894986,3,0
3,BANDE NERE,Chinese Restaurant,45.461504,9.136484,4.265065,12,3
4,BARONA,Seafood Restaurant,45.431686,9.155134,4.783389,1,0
5,BICOCCA,Japanese Restaurant,45.514917,9.211138,5.586139,13,3
6,BOVISA,Japanese Restaurant,45.50277,9.161264,4.605204,13,3
7,BOVISASCA,Seafood Restaurant,45.515842,9.153778,6.160506,4,0
8,BRERA,Chinese Restaurant,45.471519,9.187735,0.567514,56,2
9,BRUZZANO,Seafood Restaurant,45.527369,9.173292,6.865102,4,0


Let's display the neighborhoods along with their cluster labels.

In [29]:
# create map
map_clusters = folium.Map(location=[milan_latitude, milan_longitude], zoom_start=12)

# set color scheme for the clusters 
x = np.arange(num_clusters)
ys = [i + x + (i*x)**2  for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)*2))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoods5['Latitude'], neighborhoods5['Longitude'], neighborhoods5['Neighborhood'], neighborhoods5['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examining Clusters

Let's examine each cluster and try to **define its distinguishing features**.

#### Cluster 0

In [30]:
neighborhoods5[neighborhoods5['Cluster'] == 0]

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
0,ADRIANO,Seafood Restaurant,45.513572,9.251202,7.038687,3,0
1,AFFORI,Seafood Restaurant,45.517029,9.169653,5.815828,4,0
2,BAGGIO,Seafood Restaurant,45.461384,9.089843,7.894986,3,0
4,BARONA,Seafood Restaurant,45.431686,9.155134,4.783389,1,0
7,BOVISASCA,Seafood Restaurant,45.515842,9.153778,6.160506,4,0
9,BRUZZANO,Seafood Restaurant,45.527369,9.173292,6.865102,4,0
11,CANTALUPA,Seafood Restaurant,45.421965,9.156848,5.635435,1,0
12,CASCINA TRIULZA - EXPO,Seafood Restaurant,45.523592,9.099068,9.535147,1,0
14,CHIARAVALLE,Seafood Restaurant,45.416697,9.237421,6.669467,1,0
16,COMASINA,Seafood Restaurant,45.52693,9.161565,7.05548,2,0


We can see that Cluster 0 contains neighborhoods that are **far from the city center and not well covered with restaurants**. While the low number of existing restaurants is favorable for opening a new one, the distance from the city center is a disadvantage: the closest neighborhood shown above is almost 5 km away. Furthermore, the recommendations are nearly identical, because there are few restaurants in these areas.

#### Cluster 1

In [31]:
neighborhoods5[neighborhoods5['Cluster'] == 1]

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
15,CITTA' STUDI,Sushi Restaurant,45.477056,9.226575,3.042737,30,1
25,GHISOLFA,Indian Restaurant,45.490783,9.163536,3.39863,40,1
35,LORETO,Indian Restaurant,45.49051,9.222467,3.632286,43,1
37,MAGENTA - S. VITTORE,Seafood Restaurant,45.464689,9.169665,1.646077,32,1
45,PAGANO,Seafood Restaurant,45.468285,9.1611,2.304995,41,1
54,PORTELLO,Asian Restaurant,45.484139,9.154279,3.425705,42,1
72,TIBALDI,Seafood Restaurant,45.441302,9.180175,2.946347,34,1
75,TRE TORRI,Chinese Restaurant,45.478374,9.155361,3.033995,27,1
77,UMBRIA - MOLISE,Seafood Restaurant,45.453199,9.219422,2.720618,34,1
79,VIGENTINA,Vegetarian / Vegan Restaurant,45.451087,9.191564,1.748021,35,1


Cluster 1 contains neighborhoods that are **moderately close to the city center**, with a **number of existing venues greater than in Cluster 0**.

#### Cluster 2

In [32]:
neighborhoods5[neighborhoods5['Cluster'] == 2]

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
8,BRERA,Chinese Restaurant,45.471519,9.187735,0.567514,56,2
10,BUENOS AIRES - VENEZIA,Asian Restaurant,45.477892,9.212902,2.142291,61,2
13,CENTRALE,Asian Restaurant,45.484352,9.203372,2.195455,54,2
20,DUOMO,Seafood Restaurant,45.464138,9.188555,0.332283,54,2
24,GARIBALDI REPUBBLICA,Sushi Restaurant,45.483527,9.189933,1.859916,63,2
27,GIARDINI PORTA VENEZIA,Asian Restaurant,45.474727,9.20075,1.191326,53,2
30,GUASTALLA,Chinese Restaurant,45.458252,9.200023,1.206977,48,2
31,ISOLA,Asian Restaurant,45.487565,9.188972,2.311273,52,2
41,NAVIGLI,Asian Restaurant,45.450176,9.170897,2.400625,61,2
52,PARCO SEMPIONE,Seafood Restaurant,45.473033,9.17697,1.264677,50,2


From what we can see, Cluster 2 includes those neighborhoods that are **close to the city center**, but with a **high number of existing restaurants**, as one would expect.

#### Cluster 3

In [33]:
neighborhoods5[neighborhoods5['Cluster'] == 3]

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
3,BANDE NERE,Chinese Restaurant,45.461504,9.136484,4.265065,12,3
5,BICOCCA,Japanese Restaurant,45.514917,9.211138,5.586139,13,3
6,BOVISA,Japanese Restaurant,45.50277,9.161264,4.605204,13,3
17,CORSICA,Asian Restaurant,45.463909,9.230802,3.168273,13,3
18,DE ANGELI - MONTE ROSA,Chinese Restaurant,45.47613,9.147302,3.533522,23,3
19,DERGANO,Sushi Restaurant,45.502513,9.176784,4.111718,17,3
21,EX OM - MORIVIONE,Japanese Restaurant,45.440539,9.193754,2.929383,18,3
22,FARINI,Sushi Restaurant,45.49365,9.17348,3.267605,19,3
26,GIAMBELLINO,Seafood Restaurant,45.446969,9.137871,4.669172,11,3
29,GRECO,Japanese Restaurant,45.502184,9.211233,4.253921,10,3


In Cluster 3 there are neighborhoods with a **number of existing restaurants that is higher than in Cluster 0 but lower than in Cluster 1**. These neighborhoods are **moderately far from the city center**.
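
The per-cluster descriptions above can also be condensed into a single summary table. The following is a minimal sketch that rebuilds a small dataframe from a subset of the rows displayed above (five per cluster) and aggregates it by cluster; the names `sample` and `summary` are introduced here for illustration, and on the full data the same `groupby` could be applied directly to `neighborhoods5`.

```python
import pandas as pd

# Subset of the rows displayed above: (Neighborhood, Dist_from_center, Number_venues, Cluster)
rows = [
    ("ADRIANO", 7.038687, 3, 0), ("AFFORI", 5.815828, 4, 0),
    ("BAGGIO", 7.894986, 3, 0), ("BARONA", 4.783389, 1, 0),
    ("BOVISASCA", 6.160506, 4, 0),
    ("CITTA' STUDI", 3.042737, 30, 1), ("GHISOLFA", 3.39863, 40, 1),
    ("LORETO", 3.632286, 43, 1), ("MAGENTA - S. VITTORE", 1.646077, 32, 1),
    ("PAGANO", 2.304995, 41, 1),
    ("BRERA", 0.567514, 56, 2), ("BUENOS AIRES - VENEZIA", 2.142291, 61, 2),
    ("CENTRALE", 2.195455, 54, 2), ("DUOMO", 0.332283, 54, 2),
    ("GARIBALDI REPUBBLICA", 1.859916, 63, 2),
    ("BANDE NERE", 4.265065, 12, 3), ("BICOCCA", 5.586139, 13, 3),
    ("BOVISA", 4.605204, 13, 3), ("CORSICA", 3.168273, 13, 3),
    ("DE ANGELI - MONTE ROSA", 3.533522, 23, 3),
]
sample = pd.DataFrame(
    rows, columns=["Neighborhood", "Dist_from_center", "Number_venues", "Cluster"]
)

# One row per cluster: average distance and the range of venue counts
summary = sample.groupby("Cluster").agg(
    mean_dist=("Dist_from_center", "mean"),
    min_venues=("Number_venues", "min"),
    max_venues=("Number_venues", "max"),
)

# Order the clusters from closest to farthest from the city center
print(summary.sort_values("mean_dist"))
```

A table like this makes it easier to compare the clusters side by side than reading four separate listings.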

## 4. Results <a name="results"></a>

From the clusters, we can see that the **ideal neighborhoods are those in Cluster 1**. These neighborhoods are not as crowded with restaurants as those in Cluster 2; they are generally farther from the city center than those in Cluster 2, but still within an acceptable distance.

Having said that, let's analyze this cluster further.

In [34]:
candidates = neighborhoods5[neighborhoods5['Cluster'] == 1]
candidates

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
15,CITTA' STUDI,Sushi Restaurant,45.477056,9.226575,3.042737,30,1
25,GHISOLFA,Indian Restaurant,45.490783,9.163536,3.39863,40,1
35,LORETO,Indian Restaurant,45.49051,9.222467,3.632286,43,1
37,MAGENTA - S. VITTORE,Seafood Restaurant,45.464689,9.169665,1.646077,32,1
45,PAGANO,Seafood Restaurant,45.468285,9.1611,2.304995,41,1
54,PORTELLO,Asian Restaurant,45.484139,9.154279,3.425705,42,1
72,TIBALDI,Seafood Restaurant,45.441302,9.180175,2.946347,34,1
75,TRE TORRI,Chinese Restaurant,45.478374,9.155361,3.033995,27,1
77,UMBRIA - MOLISE,Seafood Restaurant,45.453199,9.219422,2.720618,34,1
79,VIGENTINA,Vegetarian / Vegan Restaurant,45.451087,9.191564,1.748021,35,1


Let's put some restrictions on the cluster. Let's say we want those neighborhoods whose distance from the center is **not greater than 3 km** and whose **number of existing restaurants is less than or equal to 40**.

In [35]:
new_candidates = candidates[candidates['Dist_from_center']<=3].reset_index(drop=True)
new_candidates = new_candidates[new_candidates['Number_venues']<=40].reset_index(drop=True)
new_candidates

Unnamed: 0,Neighborhood,Recommendation,Latitude,Longitude,Dist_from_center,Number_venues,Cluster
0,MAGENTA - S. VITTORE,Seafood Restaurant,45.464689,9.169665,1.646077,32,1
1,TIBALDI,Seafood Restaurant,45.441302,9.180175,2.946347,34,1
2,UMBRIA - MOLISE,Seafood Restaurant,45.453199,9.219422,2.720618,34,1
3,VIGENTINA,Vegetarian / Vegan Restaurant,45.451087,9.191564,1.748021,35,1
4,WASHINGTON,Kebab Restaurant,45.461206,9.15631,2.745026,35,1


Looking at the dataframe above, it is clear that the **best neighborhoods are 'MAGENTA - S. VITTORE' and 'VIGENTINA'**. In fact, they are the **closest neighborhoods to the city center**. The number of existing restaurants is basically the same across all 5 neighborhoods.
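
The selection and ranking can be made explicit in a few lines. The following is a minimal sketch, assuming a small dataframe rebuilt by hand from Cluster 1 rows shown in this section; the names `cluster1`, `mask` and `ranked` are introduced here for illustration. It applies both restrictions with a single boolean mask and then sorts the result by distance from the center.

```python
import pandas as pd

# Cluster 1 rows copied from the tables in this section
cluster1 = pd.DataFrame(
    [
        ("CITTA' STUDI", 3.042737, 30),
        ("MAGENTA - S. VITTORE", 1.646077, 32),
        ("PAGANO", 2.304995, 41),
        ("TIBALDI", 2.946347, 34),
        ("TRE TORRI", 3.033995, 27),
        ("UMBRIA - MOLISE", 2.720618, 34),
        ("VIGENTINA", 1.748021, 35),
        ("WASHINGTON", 2.745026, 35),
    ],
    columns=["Neighborhood", "Dist_from_center", "Number_venues"],
)

# Both restrictions combined in one mask: at most 3 km and at most 40 venues
mask = (cluster1["Dist_from_center"] <= 3) & (cluster1["Number_venues"] <= 40)

# Rank the remaining neighborhoods from closest to farthest
ranked = cluster1[mask].sort_values("Dist_from_center").reset_index(drop=True)
print(ranked)
```

Sorting by `Dist_from_center` puts the two closest neighborhoods at the top of the table, which is the comparison the text relies on.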

The two neighborhoods are also at a similar distance from the city center. Looking at 'Recommendation', however, we can say, based on the __milan_grouped__ dataframe, that the **recommended restaurant for 'MAGENTA - S. VITTORE' is more popular**: a Seafood restaurant has a higher rank in 'Popularity' **than the Vegetarian / Vegan restaurant recommended for 'VIGENTINA'**.

Therefore, the best location is 'MAGENTA - S. VITTORE', where the new restaurant should serve seafood.

## 5. Discussion and Conclusion <a name="conclusion"></a>

The results obtained rest on some **simplifications**. For example, we assumed that the popularity of a certain type of cuisine is proportional to the number of existing restaurants that serve it. In reality, one should have a database **containing all the restaurants and their ratings** in order to decide which type of cuisine is most popular among people.

Another aspect to take into consideration is how many people are **willing to frequent** the neighborhoods that appear at first sight to be good candidates. It is possible that a neighborhood with very good values for the distance from the city center and the number of existing venues is still not a place where many people would want to spend their evening. It is therefore also **necessary to take into account** other aspects, such as the **safety of the neighborhood and the presence of other places to spend time** before or after eating at the restaurant. Moreover, the **costs of opening the new restaurant** should be accounted for: **land prices vary from place to place**, and this should be considered when choosing the location.

It is also possible to **divide the neighborhoods into more clusters**, in order to find further connections between neighborhoods in the same cluster.

We considered a radius of 800 metres, since this can be assumed to be, on average, the distance one would be willing to walk after reaching the area. Of course, such a radius **implies that certain venues are counted not just for one neighborhood, but also for the surrounding neighborhoods**.
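
To illustrate this overlap, here is a minimal sketch that computes the great-circle distance between the centers of BRERA and DUOMO (coordinates taken from the tables above) and compares it with twice the 800 m radius. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for this illustration; the notebook may compute distances differently.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed for this sketch)
SEARCH_RADIUS_KM = 0.8    # venue search radius considered in this project


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Neighborhood centers as listed in the Cluster 2 table
brera = (45.471519, 9.187735)
duomo = (45.464138, 9.188555)

distance = haversine_km(*brera, *duomo)

# Two circles of equal radius overlap when their centers are
# closer than twice that radius
circles_overlap = distance < 2 * SEARCH_RADIUS_KM
print(f"BRERA-DUOMO: {distance:.2f} km, search circles overlap: {circles_overlap}")
```

Since the two centers are less than 1 km apart, their 800 m search circles share a large area, so some venues are necessarily counted for both neighborhoods.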

In general, we can see that it is possible to **make a decision based on two simple factors**: the **distance from the city center** and the **number of existing restaurants**.