<h1 style="color:blue">Opening a New Japanese Restaurant in Milan, Italy</h1>

## Table of Contents
* [Introduction: the Business Problem](#introduction)
* [The Dataset](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: the Business Problem <a name="introduction"></a>

<p>Writers, artists, travelers, and explorers have extolled the beauty of Italy and its alluring lifestyle for centuries. Located in Northern Italy, the <b>city of Milan</b> is one of the world capitals of fashion, design, and stunning food and beverage experiences. One of the easiest ways to get to know this charming city is to spend some time in a local restaurant or café, sipping a lively cup of <i>espresso</i> in a <i>caffetteria</i> looking out on Piazza Duomo, or enjoying a reviving plate of <i>risotto alla milanese</i> with a scented glass of Chianti.</p>
<p><b>FoodExperiences Inc.</b>, a fictitious American corporation that owns and manages a number of restaurants around the world, knows this well and is planning to open one of its restaurants in Milan. Extensive market research has already established that an innovative, high-quality, premium-price Japanese restaurant concept would gain a lot of traction in the city and would therefore be a great business opportunity. As FoodExperiences Inc. makes rapid progress with its business plan, it is now time to start identifying potentially good locations for the restaurant. Picking a location in the city center, which guarantees heavy foot traffic, seems a no-brainer; at the same time, however, the management team does not want a location where other Japanese and/or Asian restaurants are within reach.</p>
To address this key business problem, FoodExperiences Inc. engages <b>DataScienceWizards srl</b>, a fictitious Italian data science boutique that specializes in answering key business questions through data, and location data in particular, to:
<ol><li>conduct a quick, preliminary analysis of the current food and restaurant offering in Milan; and</li><li>provide FoodExperiences Inc. with a first recommendation of good restaurant location neighborhoods in the city center.</li></ol> 

## The Dataset <a name="data"></a>

<p>DataScienceWizards srl offers analytics services that cater exactly to the needs of small businesses, or of larger companies like FoodExperiences Inc., that face business questions which can be answered, at least partly, by exploring and analyzing location data and performing geospatial intelligence. The Foursquare API is a gold standard for this task. Querying the Foursquare API gives Data Scientists a rich description of locations and venues, including the name of the venue, its geospatial coordinates, its full physical address, its distance from the query point, etc.</p> 
<p>The Team at DataScienceWizards srl immediately gets down to work. It gets the most recent copy of a <a href='https://theculturetrip.com/europe/italy/articles/the-coolest-neighbourhoods-in-milan/'>highly influential report that lists the trendiest neighborhoods in Milan</a> in terms of lifestyle and night life, opens up a Jupyter Notebook, and creates a Pandas dataframe describing those 12 neighborhoods, along with their exact latitude and longitude. For each of them, the different types of food and beverage experiences on offer are then determined, again using data provided by the Foursquare API.</p>
<p>Finally, the neighborhoods are clustered using unsupervised Machine Learning (K-Means) in order to identify the neighborhoods that offer similar food experiences. The clusters are analyzed to determine their differentiating features, and a narrowed-down list of Milan center neighborhoods that best match the business requirements is recommended to FoodExperiences Inc.</p>

<p> We start by importing the necessary libraries:</p>

In [1]:
import pandas as pd
import requests
import json
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # pandas >= 1.0; the old pandas.io.json location is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium
%matplotlib inline


### The trendiest neighborhoods in the City of Milan, Italy

We kick the exercise off by capturing the trendiest neighborhoods in the center of Milan (based on information provided in lifestyle reports like <a href="https://www.milanopocket.it/quartieri-belli-milano/">this one</a>) into a Pandas dataframe.

In [2]:
# Build the dataset with the neighborhoods in Milan center
# (each neighborhood is identified by its postal code):
milan_json = {'Borough': ['Milan City Center'] * 12,
        'Neighborhood': ['20121','20122','20123','20124','20125','20126','20127','20128','20129','20145','20144','20143'],
        'latitude': [45.473098,45.459848,45.463481,45.484534,45.498444,45.515986,45.499361,45.511964,45.469305,45.470361,45.456529,45.449025],
        'longitude': [9.191635,9.199988,9.174259,9.202812,9.208493,9.216185,9.226081,9.239731,9.215967,9.155620,9.167259,9.170601]
        }
df_milan = pd.DataFrame(milan_json, columns = ['Borough', 'Neighborhood', 'latitude', 'longitude'])
df_milan

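| | Borough | Neighborhood | latitude | longitude |
|---|---|---|---|---|
| 0 | Milan City Center | 20121 | 45.473098 | 9.191635 |
| 1 | Milan City Center | 20122 | 45.459848 | 9.199988 |
| 2 | Milan City Center | 20123 | 45.463481 | 9.174259 |
| 3 | Milan City Center | 20124 | 45.484534 | 9.202812 |
| 4 | Milan City Center | 20125 | 45.498444 | 9.208493 |
| 5 | Milan City Center | 20126 | 45.515986 | 9.216185 |
| 6 | Milan City Center | 20127 | 45.499361 | 9.226081 |
| 7 | Milan City Center | 20128 | 45.511964 | 9.239731 |
| 8 | Milan City Center | 20129 | 45.469305 | 9.215967 |
| 9 | Milan City Center | 20145 | 45.470361 | 9.15562 |
| 10 | Milan City Center | 20144 | 45.456529 | 9.167259 |
| 11 | Milan City Center | 20143 | 45.449025 | 9.170601 |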


With Folium, we can easily visualize Milan center and the neighborhoods in our dataframe. For the center of Milan, we use the geospatial coordinates of "Piazza Duomo", one of the key symbols of the city.

In [3]:
# Create map of Milan center using its latitude and longitude values:
map_milan = folium.Map(location=[45.464195, 9.190017], zoom_start=12)

# Add markers for the neighborhoods in Milan center:
for lat, lng, borough, neighborhood in zip(df_milan.latitude, df_milan.longitude, df_milan.Borough, df_milan.Neighborhood):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
map_milan

Next, we set up the Foursquare API credentials so that we can start exploring the restaurants in these neighborhoods.

In [4]:
# @hidden_cell
# Replace the placeholders with your own Foursquare credentials
# (and avoid printing the secret in a shared notebook):
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID


To get a sense of what the API returns, let's explore the first neighborhood in our dataframe.

In [5]:
df_milan.loc[0, 'Neighborhood']

'20121'

In [6]:
neighborhood_latitude = df_milan.loc[0, 'latitude']
neighborhood_longitude = df_milan.loc[0, 'longitude']
neighborhood_name = df_milan.loc[0, 'Neighborhood']
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude values of 20121 are 45.473098, 9.191635.


In [7]:
# Maximum number of venues returned by the Foursquare API:
LIMIT = 100 
# Search for food venues within 500 meters of the neighborhood coordinates:
radius = 500
# Build the URL with the Foursquare categoryId for 'Food':
url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius,
    LIMIT)
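The same `venues/explore` URL can also be assembled from a dictionary of query parameters with the standard library's `urllib.parse.urlencode`, which takes care of escaping instead of relying on `str.format`. The sketch below assumes the parameter names used in the cell above; `build_explore_url` is a hypothetical helper, and the credentials are placeholders. (The `requests` library offers the same convenience through `requests.get(url, params=...)`.)

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Foursquare categoryId for 'Food', as used in the notebook's URL.
FOOD_CATEGORY_ID = '4d4b7105d754a06374d81259'
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100, category_id=FOOD_CATEGORY_ID):
    """Hypothetical helper: build the 'venues/explore' URL from a dict
    of parameters, letting urlencode handle the escaping."""
    params = {
        'categoryId': category_id,
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials: replace them with your own.
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                        45.473098, 9.191635)
print(url)

# Round-trip check: the query string decodes back to the original values.
print(parse_qs(urlparse(url).query)['ll'])  # ['45.473098,9.191635']
```

The comma in `ll` is percent-encoded as `%2C`, which the server decodes back to a comma.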

In [8]:
# Send the GET request:
results = requests.get(url).json()

Now we are ready to clean the JSON response and structure it into a *pandas* dataframe.

In [9]:
# This function extracts the (first) category name of a food venue:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [10]:
venues = results['response']['groups'][0]['items']
nearby_restaurants = json_normalize(venues)
# Filter columns:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_restaurants = nearby_restaurants.loc[:, filtered_columns]
# Filter the category for each row:
nearby_restaurants['venue.categories'] = nearby_restaurants.apply(get_category_type, axis=1)
# Clean the columns:
nearby_restaurants.columns = [col.split(".")[-1] for col in nearby_restaurants.columns]
nearby_restaurants.head()

  


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Nobu | Japanese Restaurant | 45.470974 | 9.192958 |
| 1 | Ristorante Yazawa | Shabu-Shabu Restaurant | 45.475806 | 9.189566 |
| 2 | Bice | Italian Restaurant | 45.470152 | 9.19414 |
| 3 | 22 | Bistro | 45.474928 | 9.193852 |
| 4 | Antica Osteria Stendhal | Italian Restaurant | 45.473978 | 9.187678 |
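The flattening done above with `json_normalize` and `get_category_type` can be sketched in plain Python to make the structure of the response explicit. This is a minimal sketch: the `sample` dictionary is a hypothetical, heavily trimmed response that keeps only the keys the notebook reads (`response → groups → items → venue`), and the second venue is invented to show a venue with no categories.

```python
def flatten_explore_items(response_json):
    """Turn a 'venues/explore' response into a list of flat records,
    keeping the first category name (or None when a venue has none)."""
    items = response_json['response']['groups'][0]['items']
    records = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        records.append({
            'name': venue['name'],
            'categories': categories[0]['name'] if categories else None,
            'lat': venue['location']['lat'],
            'lng': venue['location']['lng'],
        })
    return records


# Hypothetical, trimmed response with the same nesting as the real one.
sample = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Nobu',
                   'categories': [{'name': 'Japanese Restaurant'}],
                   'location': {'lat': 45.470974, 'lng': 9.192958}}},
        {'venue': {'name': 'Hypothetical venue',
                   'categories': [],
                   'location': {'lat': 45.47, 'lng': 9.19}}},
    ]}]}
}

for record in flatten_explore_items(sample):
    print(record)
```

Passing the resulting list of dictionaries to `pd.DataFrame` would give the same `name`, `categories`, `lat`, `lng` columns as the cleaned dataframe above.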


How many real restaurants do we have in this neighborhood? Coffee shops, pizza places, bakeries, etc. are not direct competitors, so we don't care about those. Let's then pick the venues that actually have the string 'Restaurant' in the 'categories' column.

In [11]:
# Let's keep only the venues that are real restaurants
# (na=False drops venues that have no category at all):
nearby_restaurants = nearby_restaurants[nearby_restaurants['categories'].str.contains('Restaurant', na=False)]
nearby_restaurants.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Nobu | Japanese Restaurant | 45.470974 | 9.192958 |
| 1 | Ristorante Yazawa | Shabu-Shabu Restaurant | 45.475806 | 9.189566 |
| 2 | Bice | Italian Restaurant | 45.470152 | 9.19414 |
| 4 | Antica Osteria Stendhal | Italian Restaurant | 45.473978 | 9.187678 |
| 6 | SUSHI B | Japanese Restaurant | 45.472153 | 9.186883 |
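The filtering rule is a simple, case-sensitive substring test on the category label. The sketch below applies the same rule to a short list of labels (taken from the outputs above, plus a hypothetical 'Pizza Place' and a missing category) to show what it keeps: every '... Restaurant' label, including the generic 'Restaurant' category, while 'Bistro' and 'Pizza Place' are dropped.

```python
def is_restaurant(category):
    """True when the category label contains 'Restaurant'.
    A missing category (None) counts as a non-restaurant, mirroring
    str.contains('Restaurant', na=False) in pandas."""
    return category is not None and 'Restaurant' in category


categories = ['Japanese Restaurant', 'Shabu-Shabu Restaurant',
              'Italian Restaurant', 'Bistro', 'Restaurant',
              'Pizza Place', None]
kept = [c for c in categories if is_restaurant(c)]
print(kept)
```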


In [12]:
print('This neighborhood has {} restaurants.'.format(nearby_restaurants.shape[0]))

This neighborhood has 32 restaurants.


Good. Now let's write a function that gets the restaurants in all the neighborhoods.

In [13]:
def getNearbyRestaurants(names, latitudes, longitudes, radius=500):

    restaurant_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # Create the API request URL. Unlike the single-neighborhood query above,
        # no 'limit' is passed here, so the API's default number of results applies
        # (fewer venues may therefore come back per neighborhood):
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4d4b7105d754a06374d81259&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius)

        # Make the GET request:
        results = requests.get(url).json()['response']['groups'][0]['items']

        # Keep only the relevant information for each venue; venues without
        # a category get None instead of raising an IndexError:
        restaurant_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue'].get('categories') else None) for v in results])

    # Flatten the per-neighborhood lists into a single list of rows:
    nearby_restaurants = pd.DataFrame([item for venue_list in restaurant_list for item in venue_list])
    nearby_restaurants.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_restaurants
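One way to make this loop testable without network access is to pass the request in as a function. The sketch below is a hypothetical variant, not the notebook's code: `collect_neighborhood_venues` takes a `fetch_items(lat, lng)` callable that returns the list of explore `items`, and `fake_fetch` is an invented stub that returns one venue per neighborhood. It also builds a flat list of rows directly, so no second flattening step is needed.

```python
def collect_neighborhood_venues(neighborhoods, fetch_items):
    """Gather one row per venue for every neighborhood.

    neighborhoods: iterable of (name, lat, lng) tuples.
    fetch_items: callable(lat, lng) -> list of explore 'items'; in the
    notebook this role is played by the Foursquare GET request.
    """
    rows = []
    for name, lat, lng in neighborhoods:
        for item in fetch_items(lat, lng):
            venue = item['venue']
            categories = venue.get('categories', [])
            rows.append((name, lat, lng, venue['name'],
                         venue['location']['lat'], venue['location']['lng'],
                         categories[0]['name'] if categories else None))
    return rows


def fake_fetch(lat, lng):
    # Hypothetical stub standing in for the API: one venue per query.
    return [{'venue': {'name': 'Venue near {:.3f}'.format(lat),
                       'categories': [{'name': 'Italian Restaurant'}],
                       'location': {'lat': lat, 'lng': lng}}}]


hoods = [('20121', 45.473098, 9.191635), ('20122', 45.459848, 9.199988)]
rows = collect_neighborhood_venues(hoods, fake_fetch)
for row in rows:
    print(row)
```

In the notebook, `fetch_items` would wrap the `requests.get(url).json()['response']['groups'][0]['items']` call, and the rows could be passed to `pd.DataFrame` with the same seven column names.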

With the following line of code, we capture all the food venues in the selected neighborhoods into the Pandas dataframe 'milan_restaurants'.

In [14]:
milan_restaurants = getNearbyRestaurants(names=df_milan['Neighborhood'],latitudes=df_milan['latitude'],longitudes=df_milan['longitude'])

In [15]:
print(milan_restaurants.shape)
milan_restaurants.head()

(289, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 20121 | 45.473098 | 9.191635 | Nobu | 45.470974 | 9.192958 | Japanese Restaurant |
| 1 | 20121 | 45.473098 | 9.191635 | Ristorante Yazawa | 45.475806 | 9.189566 | Shabu-Shabu Restaurant |
| 2 | 20121 | 45.473098 | 9.191635 | Bice | 45.470152 | 9.19414 | Italian Restaurant |
| 3 | 20121 | 45.473098 | 9.191635 | 22 | 45.474928 | 9.193852 | Bistro |
| 4 | 20121 | 45.473098 | 9.191635 | Antica Osteria Stendhal | 45.473978 | 9.187678 | Italian Restaurant |


Once again, let's focus on the real restaurants.

In [16]:
milan_restaurants = milan_restaurants[milan_restaurants['Venue Category'].str.contains('Restaurant', na=False)]
milan_restaurants.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 20121 | 45.473098 | 9.191635 | Nobu | 45.470974 | 9.192958 | Japanese Restaurant |
| 1 | 20121 | 45.473098 | 9.191635 | Ristorante Yazawa | 45.475806 | 9.189566 | Shabu-Shabu Restaurant |
| 2 | 20121 | 45.473098 | 9.191635 | Bice | 45.470152 | 9.19414 | Italian Restaurant |
| 4 | 20121 | 45.473098 | 9.191635 | Antica Osteria Stendhal | 45.473978 | 9.187678 | Italian Restaurant |
| 6 | 20121 | 45.473098 | 9.191635 | SUSHI B | 45.472153 | 9.186883 | Japanese Restaurant |


Let's get an idea of how many restaurants we have, by restaurant category, in our target area.

In [17]:
milan_restaurants['Venue Category'].value_counts()

Italian Restaurant                 68
Restaurant                         20
Seafood Restaurant                 12
Japanese Restaurant                11
Chinese Restaurant                  7
Sushi Restaurant                    6
Spanish Restaurant                  3
Vegetarian / Vegan Restaurant       3
Asian Restaurant                    3
Campanian Restaurant                2
Mediterranean Restaurant            2
Empanada Restaurant                 1
Dim Sum Restaurant                  1
Kebab Restaurant                    1
Sardinian Restaurant                1
Southern / Soul Food Restaurant     1
Shabu-Shabu Restaurant              1
Filipino Restaurant                 1
Tuscan Restaurant                   1
Thai Restaurant                     1
French Restaurant                   1
Greek Restaurant                    1
Abruzzo Restaurant                  1
Fast Food Restaurant                1
Indian Restaurant                   1
American Restaurant                 1
Name: Venue 

In [18]:
print('Our focus area in Milan center has {} restaurants in total.'.format(milan_restaurants.shape[0]))

Our focus area in Milan center has 152 restaurants in total.


The first observation is that, not surprisingly, Italian food is the dominant food experience in our focus area. However, a number of restaurants offer Japanese or, more generally, Asian food.

In [19]:
# Let's define the list of restaurants offering Oriental food:
asian_rest_types = ['Japanese Restaurant','Chinese Restaurant','Sushi Restaurant','Asian Restaurant','Shabu-Shabu Restaurant','Filipino Restaurant','Thai Restaurant']

In [20]:
num_oriental_rest = milan_restaurants[milan_restaurants['Venue Category'].isin(asian_rest_types)].shape[0]
print('Our focus area in Milan center has {} Oriental restaurants in total.'.format(num_oriental_rest))

Our focus area in Milan center has 30 Oriental restaurants in total.
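These two totals can be turned into shares with a few lines of arithmetic. The sketch below copies the category counts from the `value_counts()` output above into a `collections.Counter` and recomputes the total, the number of Oriental restaurants (using the same `asian_rest_types` list), and the two percentages.

```python
from collections import Counter

# Category counts copied from the value_counts() output above.
counts = Counter({
    'Italian Restaurant': 68, 'Restaurant': 20, 'Seafood Restaurant': 12,
    'Japanese Restaurant': 11, 'Chinese Restaurant': 7, 'Sushi Restaurant': 6,
    'Spanish Restaurant': 3, 'Vegetarian / Vegan Restaurant': 3,
    'Asian Restaurant': 3, 'Campanian Restaurant': 2,
    'Mediterranean Restaurant': 2,
})
# The remaining 15 categories appear once each.
for name in ['Empanada', 'Dim Sum', 'Kebab', 'Sardinian', 'Southern / Soul Food',
             'Shabu-Shabu', 'Filipino', 'Tuscan', 'Thai', 'French', 'Greek',
             'Abruzzo', 'Fast Food', 'Indian', 'American']:
    counts[name + ' Restaurant'] = 1

asian_rest_types = ['Japanese Restaurant', 'Chinese Restaurant', 'Sushi Restaurant',
                    'Asian Restaurant', 'Shabu-Shabu Restaurant',
                    'Filipino Restaurant', 'Thai Restaurant']

total = sum(counts.values())
oriental = sum(counts[c] for c in asian_rest_types)
print(total, oriental)
print('Italian share: {:.1%}'.format(counts['Italian Restaurant'] / total))
print('Oriental share: {:.1%}'.format(oriental / total))
```

Italian restaurants account for roughly 45% of the 152 restaurants and Oriental ones for just under 20%. Note that 'Dim Sum Restaurant' is not part of `asian_rest_types`; including it would raise the Oriental count to 31.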


Let's now plot all the collected restaurants in our area of interest on a map, showing the Oriental restaurants in a different color (red) to better appreciate their density.

In [21]:
# Create map of Milan center using its latitude and longitude values:
map_milan = folium.Map(location=[45.464195, 9.190017], zoom_start=13)

# Add markers for the restaurants:
for lat, lng, rest, neighborhood, cat in zip(milan_restaurants['Venue Latitude'], milan_restaurants['Venue Longitude'], milan_restaurants['Venue'], milan_restaurants['Neighborhood'], milan_restaurants['Venue Category']):
    if cat in asian_rest_types:
        marker_color = 'red'
    else:
        marker_color = 'blue'
    label = '{}, {} [{}]'.format(rest, cat, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=marker_color,
        fill=True,
        fill_color=marker_color,
        fill_opacity=0.7,
        parse_html=False).add_to(map_milan)  
    
map_milan

<p>Looking good. To recap, we now have all the restaurants in the trendiest neighborhoods of Milan center, and we know how many of them serve Oriental food and where exactly they are located across the neighborhoods.</p>

<p>This concludes the data gathering and preliminary exploration phase. We're now ready to use this data for further analysis, in order to identify ideal locations for FoodExperiences Inc.'s Japanese restaurant in Milan.</p>

## Methodology <a name="methodology"></a>

In this project we focus on understanding the food offering in the trendiest neighborhoods of Milan center, in particular the offering of Oriental food as opposed to non-Oriental food experiences.

In the first part of the exercise, we collected the required data, namely the **location and type (category) of every restaurant in the selected neighborhoods of Milan center (around "Piazza Duomo")**. We also specifically **identified the Oriental food restaurants**, as these might be seen as competitors of the restaurant that FoodExperiences Inc. wants to open in Milan.

In the second and last part, we use **Machine Learning (k-means clustering)** to identify neighborhoods / addresses that should be the starting point for a final 'street-level' exploration and search for the optimal venue location by the management team of our Client, FoodExperiences Inc.

## Analysis <a name="analysis"></a>

Let's check how many venues were returned for each neighborhood.

In [22]:
milan_restaurants.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| 20121 | 23 | 23 | 23 | 23 | 23 | 23 |
| 20122 | 14 | 14 | 14 | 14 | 14 | 14 |
| 20123 | 11 | 11 | 11 | 11 | 11 | 11 |
| 20124 | 17 | 17 | 17 | 17 | 17 | 17 |
| 20125 | 11 | 11 | 11 | 11 | 11 | 11 |
| 20126 | 4 | 4 | 4 | 4 | 4 | 4 |
| 20127 | 8 | 8 | 8 | 8 | 8 | 8 |
| 20128 | 1 | 1 | 1 | 1 | 1 | 1 |
| 20129 | 8 | 8 | 8 | 8 | 8 | 8 |
| 20143 | 21 | 21 | 21 | 21 | 21 | 21 |


This quick picture tells us that some neighborhoods have noticeably fewer restaurants overall, such as '20125' (11), '20126' (4), '20127' (8), '20128' (1) and '20129' (8), compared with, for example, 23 in '20121'.
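A simple way to make "fewer restaurants" less subjective is to rank the neighborhoods by restaurant count and compare each one with the median. The sketch below uses the counts from the table above (the two neighborhoods cut off in that output are not included); the median cut-off is an illustrative choice, not part of the original analysis.

```python
from statistics import median

# Restaurants per neighborhood, from the groupby count above.
counts_by_hood = {
    '20121': 23, '20122': 14, '20123': 11, '20124': 17, '20125': 11,
    '20126': 4, '20127': 8, '20128': 1, '20129': 8, '20143': 21,
}

# Sort by count, breaking ties by postal code.
ranked = sorted(counts_by_hood.items(), key=lambda kv: (kv[1], kv[0]))
for hood, n in ranked:
    print(hood, n)

# Illustrative cut-off: neighborhoods at or below the median count.
median_count = median(counts_by_hood.values())
low_density = [hood for hood, n in ranked if n <= median_count]
print(median_count, low_density)
```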

Let's start our work towards clustering the neighborhoods in a meaningful way to draw some conclusions.

In [23]:
import numpy as np

In [24]:
# One-hot encoding:
restaurants_onehot = pd.get_dummies(milan_restaurants[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe:
restaurants_onehot['Neighborhood'] = milan_restaurants['Neighborhood'] 

# Move neighborhood column to the first column:
fixed_columns = [restaurants_onehot.columns[-1]] + list(restaurants_onehot.columns[:-1])
restaurants_onehot = restaurants_onehot[fixed_columns]

# Next, group rows by neighborhood and take the mean of each one-hot column, i.e. the relative frequency of each category in the neighborhood:
milan_grouped = restaurants_onehot.groupby('Neighborhood').mean().reset_index()
milan_grouped

| | Neighborhood | Abruzzo Restaurant | American Restaurant | Asian Restaurant | Campanian Restaurant | Chinese Restaurant | Dim Sum Restaurant | Empanada Restaurant | Fast Food Restaurant | Filipino Restaurant | ... | Restaurant | Sardinian Restaurant | Seafood Restaurant | Shabu-Shabu Restaurant | Southern / Soul Food Restaurant | Spanish Restaurant | Sushi Restaurant | Thai Restaurant | Tuscan Restaurant | Vegetarian / Vegan Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20121 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.217391 | 0.0 | 0.043478 | 0.043478 | 0.0 | 0.0 | 0.043478 | 0.0 | 0.0 | 0.0 |
| 1 | 20122 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.285714 | 0.071429 | 0.0 | 0.0 | 0.0 | 0.071429 | 0.0 | 0.0 | 0.0 | 0.071429 |
| 2 | 20123 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.181818 | 0.0 | 0.0 | 0.0 | 0.0 | 0.090909 | 0.181818 | 0.0 | 0.0 | 0.0 |
| 3 | 20124 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.117647 | 0.058824 | 0.0 | 0.0 | 0.0 | ... | 0.058824 | 0.0 | 0.0 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 20125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.181818 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.272727 | 0.0 | 0.181818 | 0.0 | 0.0 | 0.0 | 0.090909 | 0.0 | 0.0 | 0.0 |
| 5 | 20126 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | 20127 | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 0.125 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 20128 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 20129 | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.375 | 0.0 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 20143 | 0.047619 | 0.0 | 0.0 | 0.0 | 0.047619 | 0.0 | 0.047619 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.047619 | 0.0 | 0.047619 | 0.0 | 0.0 | 0.047619 | 0.047619 | 0.047619 |
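What `get_dummies` followed by `groupby(...).mean()` computes is, for each neighborhood, the share of its restaurants that fall in each category. The plain-Python sketch below reproduces that computation on a hypothetical toy input (three venues in neighborhood 'A', one in 'B'), using a shared, sorted category vocabulary so that all vectors line up column by column.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """rows: (neighborhood, category) pairs.
    Returns (categories, {neighborhood: [share of each category]}),
    i.e. the per-neighborhood mean of the one-hot encoded categories."""
    categories = sorted({cat for _, cat in rows})
    by_hood = defaultdict(list)
    for hood, cat in rows:
        by_hood[hood].append(cat)
    vectors = {}
    for hood, cats in by_hood.items():
        counter = Counter(cats)
        vectors[hood] = [counter[c] / len(cats) for c in categories]
    return categories, vectors


# Hypothetical toy input.
rows = [('A', 'Italian Restaurant'), ('A', 'Italian Restaurant'),
        ('A', 'Sushi Restaurant'), ('B', 'Italian Restaurant')]
categories, vectors = category_frequencies(rows)
print(categories)
print(vectors)
```

Each vector sums to 1, so the clustering compares the mix of categories rather than the number of restaurants. A neighborhood with a single Italian restaurant, like '20128', therefore looks like a 100% Italian neighborhood.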


Then we print each neighborhood along with the top 3 most common restaurant types.

In [25]:
# Let's print each neighborhood along with the top 3 most common restaurants:
num_top_venues = 3

for hood in milan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = milan_grouped[milan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----20121----
                 venue  freq
0   Italian Restaurant  0.43
1           Restaurant  0.22
2  Japanese Restaurant  0.17


----20122----
                 venue  freq
0   Italian Restaurant  0.36
1           Restaurant  0.29
2  Japanese Restaurant  0.14


----20123----
                venue  freq
0  Italian Restaurant  0.55
1    Sushi Restaurant  0.18
2          Restaurant  0.18


----20124----
                 venue  freq
0   Italian Restaurant  0.59
1   Chinese Restaurant  0.12
2  Japanese Restaurant  0.06


----20125----
                venue  freq
0          Restaurant  0.27
1  Chinese Restaurant  0.18
2  Seafood Restaurant  0.18


----20126----
                venue  freq
0    Kebab Restaurant  0.25
1          Restaurant  0.25
2  Italian Restaurant  0.25


----20127----
                  venue  freq
0    Italian Restaurant  0.75
1    Chinese Restaurant  0.12
2  Fast Food Restaurant  0.12


----20128----
                venue  freq
0  Italian Restaurant   1.0
1  Abruzzo Res

In [26]:
# A function to sort the restaurants in descending order:
def return_most_common_restaurants(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

<p>Now let's create a new dataframe and display the top 5 restaurant categories for each neighborhood.</p>

In [27]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top restaurants:
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Restaurant'.format(ind+1))

# Create a new dataframe:
neighborhoods_rest_sorted = pd.DataFrame(columns=columns)
neighborhoods_rest_sorted['Neighborhood'] = milan_grouped['Neighborhood']

for ind in np.arange(milan_grouped.shape[0]):
    neighborhoods_rest_sorted.iloc[ind, 1:] = return_most_common_restaurants(milan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_rest_sorted.head()

| | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|
| 0 | 20121 | Italian Restaurant | Restaurant | Japanese Restaurant | French Restaurant | Sushi Restaurant |
| 1 | 20122 | Italian Restaurant | Restaurant | Japanese Restaurant | Sardinian Restaurant | Vegetarian / Vegan Restaurant |
| 2 | 20123 | Italian Restaurant | Sushi Restaurant | Restaurant | Spanish Restaurant | Greek Restaurant |
| 3 | 20124 | Italian Restaurant | Chinese Restaurant | Asian Restaurant | Spanish Restaurant | Dim Sum Restaurant |
| 4 | 20125 | Restaurant | Italian Restaurant | Chinese Restaurant | Seafood Restaurant | Greek Restaurant |
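When several categories share the same frequency, their order in a top-N list carries no information; pandas' default `sort_values` algorithm (quicksort) is also not guaranteed to be stable. This matters for neighborhoods with very few restaurants: in '20128' every category except Italian has frequency 0. The sketch below is a hypothetical alternative to `return_most_common_restaurants` that breaks ties alphabetically, so the ranking at least no longer depends on column order.

```python
def top_categories(freqs, n):
    """Return the n most common categories, breaking ties alphabetically
    so the ranking does not depend on column order."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:n]]


# Shares for a neighborhood whose only restaurant is Italian (like '20128').
freqs = {'Tuscan Restaurant': 0.0, 'Italian Restaurant': 1.0,
         'American Restaurant': 0.0, 'Asian Restaurant': 0.0}
print(top_categories(freqs, 3))
```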


#### We run k-means to cluster the neighborhoods into 3 clusters.

In [28]:
# Set number of clusters:
kclusters = 3

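milan_grouped_clustering = milan_grouped.drop(columns='Neighborhood')

# Run k-means clustering (n_init=10 pins the number of restarts, whose default changed in recent scikit-learn versions):
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(milan_grouped_clustering)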

# Check cluster labels generated for each row in the dataframe:
kmeans.labels_[0:10]

array([1, 0, 1, 1, 0, 0, 2, 2, 0, 1], dtype=int32)
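Under the hood, k-means alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. The sketch below is a minimal pure-Python version of this iteration (Lloyd's algorithm), run on hypothetical 2-D "category share" vectors. It is illustrative only: scikit-learn's `KMeans` uses k-means++ initialization and several restarts (`n_init`), so its labels need not match this sketch, and label numbers themselves are arbitrary.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    # Assignment step: index of the nearest centroid for each point.
    return [min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return assign(points, centroids), centroids


# Hypothetical (Italian share, Oriental share) vectors for four neighborhoods.
points = [(1.0, 0.0), (0.9, 0.1), (0.2, 0.8), (0.1, 0.9)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```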

<p>Let's create a new dataframe that includes the cluster label, as well as the top 5 restaurant categories, for each of the 12 neighborhoods in our dataframe.</p>

In [29]:
# Add clustering labels:
neighborhoods_rest_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

milan_merged = df_milan.copy()

# Join neighborhoods_rest_sorted to df_milan to add the cluster label and top categories to each neighborhood's latitude/longitude:
milan_merged = milan_merged.join(neighborhoods_rest_sorted.set_index('Neighborhood'), on='Neighborhood')

milan_merged

| | Borough | Neighborhood | latitude | longitude | Cluster Labels | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Milan City Center | 20121 | 45.473098 | 9.191635 | 1 | Italian Restaurant | Restaurant | Japanese Restaurant | French Restaurant | Sushi Restaurant |
| 1 | Milan City Center | 20122 | 45.459848 | 9.199988 | 0 | Italian Restaurant | Restaurant | Japanese Restaurant | Sardinian Restaurant | Vegetarian / Vegan Restaurant |
| 2 | Milan City Center | 20123 | 45.463481 | 9.174259 | 1 | Italian Restaurant | Sushi Restaurant | Restaurant | Spanish Restaurant | Greek Restaurant |
| 3 | Milan City Center | 20124 | 45.484534 | 9.202812 | 1 | Italian Restaurant | Chinese Restaurant | Asian Restaurant | Spanish Restaurant | Dim Sum Restaurant |
| 4 | Milan City Center | 20125 | 45.498444 | 9.208493 | 0 | Restaurant | Italian Restaurant | Chinese Restaurant | Seafood Restaurant | Greek Restaurant |
| 5 | Milan City Center | 20126 | 45.515986 | 9.216185 | 0 | Italian Restaurant | American Restaurant | Restaurant | Kebab Restaurant | Indian Restaurant |
| 6 | Milan City Center | 20127 | 45.499361 | 9.226081 | 2 | Italian Restaurant | Chinese Restaurant | Fast Food Restaurant | Tuscan Restaurant | American Restaurant |
| 7 | Milan City Center | 20128 | 45.511964 | 9.239731 | 2 | Italian Restaurant | Tuscan Restaurant | American Restaurant | Asian Restaurant | Campanian Restaurant |
| 8 | Milan City Center | 20129 | 45.469305 | 9.215967 | 0 | Italian Restaurant | Restaurant | Campanian Restaurant | Seafood Restaurant | Indian Restaurant |
| 9 | Milan City Center | 20145 | 45.470361 | 9.15562 | 1 | Italian Restaurant | Asian Restaurant | Restaurant | Campanian Restaurant | Chinese Restaurant |
| 10 | Milan City Center | 20144 | 45.456529 | 9.167259 | 1 | Italian Restaurant | Seafood Restaurant | Japanese Restaurant | Sushi Restaurant | Filipino Restaurant |
| 11 | Milan City Center | 20143 | 45.449025 | 9.170601 | 1 | Italian Restaurant | Mediterranean Restaurant | Chinese Restaurant | Empanada Restaurant | Indian Restaurant |


<p>The final step is to display the resulting clusters on a map, which will allow us to derive some insights.</p>

In [30]:
# Create map:
map_clusters = folium.Map(location=[45.464195, 9.190017], zoom_start=11)

# Set color scheme for the clusters (one evenly spaced rainbow color per cluster):
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map, colored by cluster label:
for lat, lon, poi, cluster in zip(milan_merged.latitude, milan_merged.longitude, milan_merged.Neighborhood, milan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

We have already seen that the dominant food experience in this part of Milan is the Italian one, which of course is no surprise. However, we can now examine each cluster and label it based on its predominant "food experience" <u>after</u> the Italian one.

#### Cluster 0

In [31]:
milan_merged.loc[milan_merged['Cluster Labels'] == 0, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|
| 1 | 20122 | Italian Restaurant | Restaurant | Japanese Restaurant | Sardinian Restaurant | Vegetarian / Vegan Restaurant |
| 4 | 20125 | Restaurant | Italian Restaurant | Chinese Restaurant | Seafood Restaurant | Greek Restaurant |
| 5 | 20126 | Italian Restaurant | American Restaurant | Restaurant | Kebab Restaurant | Indian Restaurant |
| 8 | 20129 | Italian Restaurant | Restaurant | Campanian Restaurant | Seafood Restaurant | Indian Restaurant |


We could label this cluster as **"Blending Italian with other food experiences and suggestions"**.

#### Cluster 1

In [32]:
milan_merged.loc[milan_merged['Cluster Labels'] == 1, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|
| 0 | 20121 | Italian Restaurant | Restaurant | Japanese Restaurant | French Restaurant | Sushi Restaurant |
| 2 | 20123 | Italian Restaurant | Sushi Restaurant | Restaurant | Spanish Restaurant | Greek Restaurant |
| 3 | 20124 | Italian Restaurant | Chinese Restaurant | Asian Restaurant | Spanish Restaurant | Dim Sum Restaurant |
| 9 | 20145 | Italian Restaurant | Asian Restaurant | Restaurant | Campanian Restaurant | Chinese Restaurant |
| 10 | 20144 | Italian Restaurant | Seafood Restaurant | Japanese Restaurant | Sushi Restaurant | Filipino Restaurant |
| 11 | 20143 | Italian Restaurant | Mediterranean Restaurant | Chinese Restaurant | Empanada Restaurant | Indian Restaurant |


We could label this cluster as **"Young and easy food experiences"**.

#### Cluster 2

In [33]:
milan_merged.loc[milan_merged['Cluster Labels'] == 2, milan_merged.columns[[1] + list(range(5, milan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Restaurant | 2nd Most Common Restaurant | 3rd Most Common Restaurant | 4th Most Common Restaurant | 5th Most Common Restaurant |
|---|---|---|---|---|---|---|
| 6 | 20127 | Italian Restaurant | Chinese Restaurant | Fast Food Restaurant | Tuscan Restaurant | American Restaurant |
| 7 | 20128 | Italian Restaurant | Tuscan Restaurant | American Restaurant | Asian Restaurant | Campanian Restaurant |


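We could label this cluster as **"Dominant Italian and Oriental food experiences"**. Note, however, that '20128' has a single restaurant (Italian), so its 2nd to 5th categories all have zero frequency and their order carries no information.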

## Results and Discussion <a name="results"></a>

<p>Our work has shown that <b>Milan center</b> has a lot to offer to those planning to dine out. Italian restaurants are the mainstream offering (68 of the 152 restaurants, roughly 45%), while Oriental restaurants as a whole represent close to 20% of the offering (30 of 152). Venues are not distributed evenly across the neighborhoods, and in some of them, like '20125', '20126', '20127', '20128' or '20129', the overall restaurant density is lower, as noted above.</p>
<p>We also processed the data to reveal, for each targeted neighborhood, the five most common restaurant categories, thus spotting the ones where Oriental food is a significant part of the offering. In some neighborhoods this is quite apparent and, all other factors being equal, these would not be ideal candidates for a new Japanese restaurant.</p>
<p>Next, we used k-means to group the neighborhoods into 3 clusters in order to establish their predominant "food color(s)". This allowed us to identify the neighborhoods with "Dominant Italian and Oriental food experiences" (Cluster 2), therefore not a good pool of location candidates, as well as areas where the predominant food experience is non-Oriental (Cluster 0 and Cluster 1).</p>
<p>Based on the combined evidence of the analysis, DataScienceWizards srl would recommend neighborhood '20129' as the ideal location candidate, followed by '20126' and '20127'. The differential advantage of '20129' is that it is the closest of the three to Milan center ("Piazza Duomo") and within walking distance of it.</p>
<p>This, of course, does not automatically imply that neighborhoods '20129', '20126' and '20127' are indeed optimal locations for the new Japanese restaurant that FoodExperiences Inc. wants to open. There may well be good reasons why these areas have fewer restaurants, and fewer Oriental restaurants in particular, reasons that would make them unsuitable for a new restaurant regardless of the apparently reduced competition.</p>
<p>These recommendations are only a starting point and will require more in-depth analysis and evaluations.</p>
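The proximity argument for '20129' can be checked with the haversine formula, which gives the great-circle distance between two latitude/longitude points. The sketch below measures the straight-line distance from the map center used throughout the notebook (Piazza Duomo, 45.464195, 9.190017) to the coordinates of the three recommended neighborhoods, assuming a mean Earth radius of 6371 km. Actual walking routes are longer than these straight-line figures.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of the sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


duomo = (45.464195, 9.190017)  # map center used throughout the notebook
recommended = {
    '20129': (45.469305, 9.215967),
    '20126': (45.515986, 9.216185),
    '20127': (45.499361, 9.226081),
}
for hood, (lat, lon) in recommended.items():
    print('{}: {:.1f} km from Piazza Duomo'.format(hood, haversine_km(*duomo, lat, lon)))
```

With these coordinates, '20129' comes out at roughly 2 km from Piazza Duomo, clearly closer than '20127' and '20126'.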

## Conclusion <a name="conclusion"></a>

<p>The goal of this project was to analyze <b>the trendiest neighborhoods, for lifestyle and night life, in the city center of Milan</b> in order to better understand the food experiences offered there (particularly Oriental as opposed to Italian) and thus help the management team of FoodExperiences Inc. narrow down the search for an optimal location for a new Japanese restaurant in Milan.</p>
<p>By performing an extensive location data analysis, made possible by the Foursquare API, DataScienceWizards srl was able to come up with a recommended neighborhood, '20129', which satisfies a number of basic requirements for the location selection. Neighborhoods '20126' and '20127' were also identified as 'second-' and 'third-best' recommendations.</p>
<p>The final decision of the optimal restaurant location will be made by FoodExperiences Inc.'s management team based on other features of the locations in the three recommended neighborhoods, taking into consideration additional elements like attractiveness of each location, level of noise, proximity to major roads, real estate availability, prices, social and economic factors, and the like.</p>