# Capstone Project: Choose the best place for your new Pizza Place: A data science project in Italy.

### __@Zafer Uzun__

### Introduction

- Location is one of the most important decisions when establishing a new business. For example, a densely populated area offers social gatherings, entertainment, performances, festivals, and tourists, all of which bring potential customers and can help the business earn money. So we have to choose the right place.



### Business Problem

- The objective of this capstone project is to analyze and select the best locations in the city of Rome, Italy to open a new __pizza place__. Using data science methodology and machine learning techniques such as clustering, this project aims to answer the business question: if a property developer is looking to open a new pizza place in the city of Rome, Italy, where would you recommend that they open it?


### Data


- In this section we use the Wikipedia page "https://en.wikipedia.org/wiki/Category:Subdivisions_of_Rome", which contains a list of neighborhoods in Rome, Italy.

- We use the latitude and longitude coordinates of those neighborhoods to explore the city.

- We use the Python Geocoder package to obtain the coordinates of the neighborhoods in the city.

- We also need venue data, particularly data related to restaurants.

- To get the venue data for those neighborhoods we use the Foursquare API (https://foursquare.com/).

### Methodology 

##### At this stage, we take a holistic approach to the analysis so that we stay focused on the research problem.

- The first step is preparing the data:
    - Get the list of neighborhoods in the city of Rome, available on the Wikipedia page "https://en.wikipedia.org/wiki/Category:Subdivisions_of_Rome".
    - Scrape the page using the Python requests and "beautifulsoup" packages to extract the list of neighborhoods.
    - Use the Geocoder package to convert each address into geographical coordinates in the form of latitude and longitude.
    
- The second step is checking the data:
    - To make sure the data is in the right form, we inspect it as a pandas DataFrame and plot it with the geographical visualization library Folium.
    
- The third step is analyzing the data:

    - First, we convert the data into a pandas DataFrame and visualize the neighborhoods on a map using the Folium package.

    - Next, we use the Foursquare API to get the top 100 venues within a radius of 2000 meters of each neighborhood. Access to the Foursquare API requires our Foursquare ID and Foursquare secret key.

    - Then, we group the rows by neighborhood and take the mean of the frequency of occurrence of each venue category to analyze each neighborhood.

    - To address our business problem, we filter on “Pizza Place” as the venue category for the neighborhoods.

    - Next, we use k-means clustering to group the neighborhoods into 3 clusters based on their frequency of occurrence for “Pizza Place”.

    - Finally, by choosing among the resulting clusters of neighborhoods, we support the decision with data science.


### Results

- We categorize the neighborhoods of Rome into 3 clusters based on the frequency of occurrence of "Pizza Place (Restaurants)".
    - Cluster 0: Neighborhoods with a moderate number of Pizza Places.
    - Cluster 1: Neighborhoods with a low number of Pizza Places.
    - Cluster 2: Neighborhoods with a high number of Pizza Places.

### Discussion

- Cluster 2 has the highest number of Pizza Places, and cluster 0 has a moderate number.
- On the other hand, the neighborhoods in cluster 1 have very few Pizza Places or none at all, so a new Pizza Place would face the least competition there.
- As a recommendation: 
    - __The neighborhoods in cluster 1__ are the most preferred locations to open a new Pizza Place.

### Conclusion 

- In this project we use the power of data science to make a decision and choose the right place for a Pizza Place.

- To answer the business question:

    - The neighborhoods in cluster 1 are the most preferred locations to open a Pizza Place.
    - The findings of this project will help the relevant stakeholders capitalize on the opportunities in high-potential locations, while avoiding overcrowded areas with high competition, when they decide where to set up a new business.

### PREPARING THE ENVIRONMENT

In [3]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("All the Libraries imported.")

All the Libraries imported.


### Importing the Data

In [5]:
# send the GET request
data = requests.get('https://en.wikipedia.org/wiki/Category:Subdivisions_of_Rome').text
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
# create a list to store neighborhood data
neighborhoodList = []
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)
# create a new DataFrame from the list
df = pd.DataFrame({"Neighborhood": neighborhoodList})

df.head(10)

Unnamed: 0,Neighborhood
0,Administrative subdivision of Rome
1,Colle Salario
2,List of shopping areas and markets in Rome
3,14 regions of Augustan Rome
4,14 regions of Medieval Rome
5,Acilia
6,Balduina
7,Casalotti
8,Castel Giubileo (zone of Rome)
9,Cinecittà


In [9]:
df.shape

(49, 1)

#### Getting the geographical coordinates of the city of Rome, Italy

In [6]:
address = 'rome , italy'

geolocator = Nominatim(user_agent="can_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Rome are Latitude: {}, Longitude: {}.'.format(latitude, longitude))

The geographical coordinates of Rome are Latitude: 41.8933203, Longitude: 12.4829321.


#### Getting the geographical coordinates of the neighborhoods of the city of Rome.

In [7]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, rome , italy'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
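
The `while` loop above keeps retrying until the geocoder returns a result, so it never finishes if a name cannot be geocoded at all. A minimal sketch of a bounded variant follows. The names `get_latlng_bounded`, `geocode_fn`, `max_attempts`, and `delay_seconds` are assumptions made for illustration, and the geocoder is replaced by a stand-in function so that the example runs without network access.

```python
import time


def get_latlng_bounded(query, geocode_fn, max_attempts=3, delay_seconds=0.0):
    """Return [lat, lng] from geocode_fn(query), or None after max_attempts tries.

    geocode_fn is a hypothetical callable that returns [lat, lng] or None,
    for example: lambda q: geocoder.arcgis(q).latlng
    """
    for _ in range(max_attempts):
        coords = geocode_fn(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # short pause before the next attempt
    return None  # give up instead of looping forever


# Stand-in geocoder: fails twice, then returns the Balduina coordinates.
calls = {"count": 0}


def fake_geocode(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [41.92286, 12.43884]


found = get_latlng_bounded("Balduina, rome , italy", fake_geocode)
missing = get_latlng_bounded("no such place", lambda q: None, max_attempts=2)
print(found, missing)
```

With this variant a failed lookup yields `None`, so the resulting rows would need to be dropped or reviewed before the coordinates are used.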

In [8]:
coords = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]

In [9]:
coords

[[41.90322000000003, 12.495650000000069],
 [41.91331000000008, 12.502170000000035],
 [41.90322000000003, 12.495650000000069],
 [41.92300590349251, 12.609024108952255],
 [41.92300590349251, 12.609024108952255],
 [41.78340000000003, 12.365110000000072],
 [41.92286000000007, 12.438840000000027],
 [41.91589000000005, 12.36955000000006],
 [41.98849004320263, 12.498590053291338],
 [41.849410000000034, 12.574350000000038],
 [41.902120000000025, 12.46100000000007],
 [41.94372000000004, 12.474740000000054],
 [41.862430000000074, 12.49007000000006],
 [41.73499000000004, 12.362240000000043],
 [42.009870000000035, 12.378480000000025],
 [41.99281000000008, 12.489860000000022],
 [41.90322000000003, 12.495650000000069],
 [41.83689000000004, 12.430050000000051],
 [41.87903000000006, 12.350150000000042],
 [41.75937000000005, 12.30066000000005],
 [41.76027000000005, 12.301680000000033],
 [42.124450000000024, 12.289570000000026],
 [41.82144000000005, 12.350910000000056],
 [42.003720000000044, 12.48195000

Note: The geocoding code ran without errors (for example, one returned pair is [41.89251, 12.48417]).

Note: "coords" is currently a Python list, so we have to convert it into a DataFrame.

In [10]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [11]:
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [12]:
df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Administrative subdivision of Rome,41.90322,12.49565
1,Colle Salario,41.91331,12.50217
2,List of shopping areas and markets in Rome,41.90322,12.49565
3,14 regions of Augustan Rome,41.923006,12.609024
4,14 regions of Medieval Rome,41.923006,12.609024
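
In the output above, rows 0 and 2 share the same coordinates, and so do rows 3 and 4. This may mean the geocoder returned a generic location for names that are not real neighborhoods. The sketch below shows one way to list such shared coordinates, using only the five rows shown. The helper name `find_shared_coordinates` and the rounding precision are assumptions.

```python
from collections import defaultdict

# First five rows of df as displayed above: (name, latitude, longitude).
rows = [
    ("Administrative subdivision of Rome", 41.90322, 12.49565),
    ("Colle Salario", 41.91331, 12.50217),
    ("List of shopping areas and markets in Rome", 41.90322, 12.49565),
    ("14 regions of Augustan Rome", 41.923006, 12.609024),
    ("14 regions of Medieval Rome", 41.923006, 12.609024),
]


def find_shared_coordinates(rows, decimals=5):
    """Group names by rounded coordinates and keep groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(round(lat, decimals), round(lng, decimals))].append(name)
    return {coords: names for coords, names in groups.items() if len(names) > 1}


shared = find_shared_coordinates(rows)
for coords, names in shared.items():
    print(coords, names)
```

Entries that share coordinates would pull identical venue lists from the venue search, so they are worth reviewing before clustering.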


In [25]:
df.shape

(49, 3)

#### Checking the Data

In [13]:
# create map of rome using latitude and longitude values
map_da = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'],df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_da)  
    
map_da

### Analyzing the Data

In [14]:

# Define Foursquare Credentials and Version
import os # read credentials from the environment instead of hard-coding them

LIMIT = 100
# The environment variable names below are placeholders; use whatever names you set locally.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# confirm that the credentials are set without printing them
print('Credentials set:', bool(CLIENT_ID and CLIENT_SECRET))


Note: Because of the API limit, we request the top 100 venues within a radius of 2000 meters.

In [15]:

radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
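
The loop above indexes straight into the response, so an empty `groups` or `categories` list would raise an error and stop the run. The sketch below is a more tolerant way to flatten one response. It assumes the same JSON layout that the loop already relies on. The helper name `extract_venue_rows` and the second sample venue are hypothetical.

```python
def extract_venue_rows(neighborhood, lat, lng, payload):
    """Flatten one explore response into tuples, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Sample payload: the first venue is taken from the output shown in this notebook,
# the second is a made-up venue without any category.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Piazza della Repubblica",
               "location": {"lat": 41.902422, "lng": 12.496367},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Uncategorized venue (hypothetical)",
               "location": {"lat": 41.9, "lng": 12.5},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Administrative subdivision of Rome", 41.90322, 12.49565, sample)
empty = extract_venue_rows("Nowhere", 0.0, 0.0, {"response": {"groups": []}})
print(rows, empty)
```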

In [16]:

# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2990, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Administrative subdivision of Rome,41.90322,12.49565,Piazza della Repubblica,41.902422,12.496367,Plaza
1,Administrative subdivision of Rome,41.90322,12.49565,The St. Regis Rome,41.904072,12.494873,Hotel
2,Administrative subdivision of Rome,41.90322,12.49565,Museo delle Terme di Diocleziano,41.902912,12.498882,History Museum
3,Administrative subdivision of Rome,41.90322,12.49565,Culinaria,41.903718,12.49961,Italian Restaurant
4,Administrative subdivision of Rome,41.90322,12.49565,Come Il Latte,41.907164,12.495989,Ice Cream Shop


In [19]:
venues_df.groupby(["Neighborhood"]).count().head()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
14 regions of Augustan Rome,26,26,26,26,26,26
14 regions of Medieval Rome,26,26,26,26,26,26
Acilia,42,42,42,42,42,42
Administrative subdivision of Rome,100,100,100,100,100,100
Balduina,100,100,100,100,100,100


In [22]:
# to explore unique categories
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 188 unique categories.


In [25]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:20]

array(['Plaza', 'Hotel', 'History Museum', 'Italian Restaurant',
       'Ice Cream Shop', 'Church', 'Winery', 'Pizza Place', 'Art Museum',
       'Roman Restaurant', 'Coffee Shop', 'Trattoria/Osteria', 'Wine Bar',
       'American Restaurant', 'Market', 'Pastry Shop', 'Café', 'Fountain',
       'Museum', 'Seafood Restaurant'], dtype=object)

In [26]:

# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(2990, 189)


Unnamed: 0,Neighborhoods,Accessories Store,Airport,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auditorium,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Burger Joint,Bus Station,Business Service,Café,Camera Store,Campground,Castle,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Courthouse,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kebab Restaurant,Kids Store,Lake,Library,Light Rail Station,Lighthouse,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Miscellaneous Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Venue,Nightclub,Noodle House,Office,Opera House,Other Nightlife,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Recording Studio,Resort,Rest Area,Restaurant,Road,Rock Club,Roman Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Soccer Stadium,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Club,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Track Stadium,Train Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery
0,Administrative subdivision of Rome,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Administrative subdivision of Rome,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Administrative subdivision of Rome,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Administrative subdivision of Rome,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Administrative subdivision of Rome,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [27]:
kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped.head()

(49, 189)


Unnamed: 0,Neighborhoods,Accessories Store,Airport,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auditorium,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Burger Joint,Bus Station,Business Service,Café,Camera Store,Campground,Castle,Chinese Restaurant,Church,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cosmetics Shop,Courthouse,Cuban Restaurant,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Field,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kebab Restaurant,Kids Store,Lake,Library,Light Rail Station,Lighthouse,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Miscellaneous Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Venue,Nightclub,Noodle House,Office,Opera House,Other Nightlife,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Recording Studio,Resort,Rest Area,Restaurant,Road,Rock Club,Roman Restaurant,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Soccer Field,Soccer Stadium,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Club,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Track Stadium,Train Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vegetarian / Vegan Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery
0,14 regions of Augustan Rome,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.192308,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.038462,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,14 regions of Medieval Rome,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.192308,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.038462,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Acilia,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.02381,0.0,0.02381,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.119048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.047619,0.0,0.0,0.0,0.0,0.0,0.02381,0.047619,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.047619,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Administrative subdivision of Rome,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.02,0.0,0.0,0.0,0.14,0.0,0.08,0.0,0.0,0.0,0.1,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01
4,Balduina,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.08,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.07,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
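
Each value in the grouped table is the mean of a 0/1 column, which equals the share of a neighborhood's venues that fall in that category. The sketch below reproduces this with plain Python. The venue list is a stand-in: it has the 42 venues reported for Acilia and is assumed to contain two pizza places and four cafés, which is consistent with the Acilia row above, but the list itself is not the real data. The function name `category_shares` is also an assumption.

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share of the venue list (the mean of its one-hot column)."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# Stand-in for a neighborhood with 42 venues: 2 pizza places, 4 cafés, 36 others.
venues = ["Pizza Place"] * 2 + ["Café"] * 4 + ["Other"] * 36

shares = category_shares(venues)
print(round(shares["Pizza Place"], 6), round(shares["Café"], 6))
```

Because these are shares and not counts, a neighborhood with few venues overall can show a high Pizza Place value from only one or two pizza places.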


__Note: Our business problem is finding the right location for the Pizza Place.__ 

In [28]:
len(kl_grouped[kl_grouped["Pizza Place"] > 0])

39

In [29]:
kl_pizza = kl_grouped[["Neighborhoods","Pizza Place"]]

In [32]:
kl_pizza.head(20)

Unnamed: 0,Neighborhoods,Pizza Place
0,14 regions of Augustan Rome,0.076923
1,14 regions of Medieval Rome,0.076923
2,Acilia,0.047619
3,Administrative subdivision of Rome,0.02
4,Balduina,0.07
5,Casalotti,0.052632
6,Castel Giubileo (zone of Rome),0.029412
7,Cinecittà,0.115385
8,Colle Salario,0.02
9,Columbus (Rome),0.03


-------------------------------------CLUSTERING---------------------------------------------
- We now have the Pizza Place frequency for each neighborhood.
- To cluster the neighborhoods we will use k-means.
- Run k-means to cluster the neighborhoods in Rome into 3 clusters.

In [33]:
# set number of clusters
kclusters = 3

kl_clustering = kl_pizza.drop(columns=["Neighborhoods"]) # keyword form; the positional axis argument was removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 1, 0, 0, 1, 2, 1, 1])
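
To show what k-means does on a single column, here is a small one-dimensional sketch in plain Python. It is not scikit-learn's implementation: the function name `kmeans_1d` and the deterministic choice of starting centroids are assumptions made for illustration. The input is the ten Pizza Place values displayed earlier in this notebook.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Lloyd's algorithm on one-dimensional data (requires k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Pizza Place frequencies of the first ten neighborhoods, in table order.
pizza = [0.076923, 0.076923, 0.047619, 0.02, 0.07,
         0.052632, 0.029412, 0.115385, 0.02, 0.03]

labels, centroids = kmeans_1d(pizza, k=3)
print(labels)
print([round(c, 4) for c in centroids])
```

Cluster numbers from k-means are arbitrary names, so two runs can produce the same grouping under different numbers. Which cluster is "low", "moderate", or "high" should be read from the centroids or the member values, not from the label itself.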

In [34]:
# create a new dataframe that includes the cluster label as well as the Pizza Place frequency for each neighborhood.
kl_merged = kl_pizza.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [35]:
kl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
kl_merged.head()

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels
0,14 regions of Augustan Rome,0.076923,0
1,14 regions of Medieval Rome,0.076923,0
2,Acilia,0.047619,0
3,Administrative subdivision of Rome,0.02,1
4,Balduina,0.07,0


In [36]:
# merge data to add latitude/longitude for each neighborhood
kl_merged = kl_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(kl_merged.shape)
kl_merged.head() # check the last columns!

(49, 5)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,14 regions of Augustan Rome,0.076923,0,41.923006,12.609024
1,14 regions of Medieval Rome,0.076923,0,41.923006,12.609024
2,Acilia,0.047619,0,41.7834,12.36511
3,Administrative subdivision of Rome,0.02,1,41.90322,12.49565
4,Balduina,0.07,0,41.92286,12.43884


In [37]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(49, 5)


Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,14 regions of Augustan Rome,0.076923,0,41.923006,12.609024
45,Tor Tre Teste,0.046512,0,41.886888,12.590621
44,Tor Cervara,0.088235,0,41.916692,12.589213
42,Spinaceto,0.086957,0,41.78045,12.4393
41,Settecamini,0.045455,0,41.93795,12.6209
40,San Lorenzo (Rome),0.08,0,41.89663,12.51482
36,Regio XI Circus Maximus,0.05,0,41.88595,12.4857
25,Rebibbia,0.095238,0,41.92711,12.57467
17,Magliana,0.055556,0,41.83689,12.43005
13,La Storta,0.045455,0,42.00987,12.37848


In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhood'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
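
The map code picks colors with `rainbow[cluster-1]`. Because cluster labels start at 0, cluster 0 uses index -1, which in Python selects the last color in the list. Each cluster still gets its own color, but the order is shifted by one. The sketch below compares that indexing with direct indexing. The hex values are placeholders standing in for the generated `rainbow` list.

```python
kclusters = 3

# Placeholder palette standing in for the `rainbow` list built above.
rainbow = ["#8000ff", "#80ffb4", "#ff0000"]

# Indexing used in the map code: cluster 0 wraps around to the last color.
color_by_offset = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}

# Direct indexing: cluster n takes the n-th color.
color_by_label = {cluster: rainbow[cluster] for cluster in range(kclusters)}

for cluster in range(kclusters):
    print(cluster, color_by_offset[cluster], color_by_label[cluster])
```

Either mapping works for the map as long as any legend or description of the colors uses the same one.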

 - Cluster 0: Neighborhoods with a moderate number of Pizza Places.

In [39]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
0,14 regions of Augustan Rome,0.076923,0,41.923006,12.609024
45,Tor Tre Teste,0.046512,0,41.886888,12.590621
44,Tor Cervara,0.088235,0,41.916692,12.589213
42,Spinaceto,0.086957,0,41.78045,12.4393
41,Settecamini,0.045455,0,41.93795,12.6209
40,San Lorenzo (Rome),0.08,0,41.89663,12.51482
36,Regio XI Circus Maximus,0.05,0,41.88595,12.4857
25,Rebibbia,0.095238,0,41.92711,12.57467
17,Magliana,0.055556,0,41.83689,12.43005
13,La Storta,0.045455,0,42.00987,12.37848


- Cluster 1: Neighborhoods with a low number of Pizza Places.

In [40]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
31,Regio V Esquiliae,0.0,1,42.198415,12.54473
33,Regio VII Via Lata,0.03,1,41.89822,12.48119
34,Regio VIII Forum Romanum,0.03,1,41.89251,12.48417
35,Regio X Palatium,0.01,1,41.90398,12.48013
6,Castel Giubileo (zone of Rome),0.029412,1,41.98849,12.49859
38,Regio XIII Aventinus,0.02,1,41.881039,12.486777
10,Fleming (Rome),0.01,1,41.94372,12.47474
39,Regio XIV Transtiberim,0.02,1,41.90322,12.49565
3,Administrative subdivision of Rome,0.02,1,41.90322,12.49565
43,Suburbs of Rome,0.02,1,41.90322,12.49565


 - Cluster 2: Neighborhoods with a high number of Pizza Places.

In [41]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
47,Val Melaina,0.12,2,41.948567,12.525914
7,Cinecittà,0.115385,2,41.84941,12.57435
32,Regio VI Alta Semita,0.151515,2,38.11047,15.66129
46,Torrenova (Rome),0.142857,2,41.85924,12.62248


In [42]:
kl_merged_final=kl_merged.loc[kl_merged['Cluster Labels'] == 1]

In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged_final['Latitude'], kl_merged_final['Longitude'], kl_merged_final['Neighborhood'], kl_merged_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [53]:
Pizza_Place_List=kl_merged.loc[kl_merged['Cluster Labels'] == 1]
Pizza_Place_List.sort_values("Pizza Place", axis=0, ascending=True)

Unnamed: 0,Neighborhood,Pizza Place,Cluster Labels,Latitude,Longitude
31,Regio V Esquiliae,0.0,1,42.198415,12.54473
23,Prima Porta,0.0,1,42.00372,12.48195
22,Ponte Galeria,0.0,1,41.82144,12.35091
21,Polline Martignano,0.0,1,42.12445,12.28957
19,Ostia (Rome),0.0,1,41.75937,12.30066
18,Massimina,0.0,1,41.87903,12.35015
37,Regio XII Piscina Publica,0.0,1,41.88404,12.81063
12,Infernetto,0.0,1,41.73499,12.36224
20,Ostia Antica (district),0.0,1,41.76027,12.30168
14,Labaro,0.0,1,41.99281,12.48986
