# Where to open a new Italian restaurant in Central Yokohama, Naka-ward?

# Introduction
Restaurant business is easy to start and can be profitable when it gets many customers.  I have a client who spent long time in Venice, Italy, to run small Italian cafe/restaurant as a chef who is thinking to open a new Italian restaurant in Yokohama because the client thinks the city has enough customers for good Italian foods but misses good Italian restaurants.

# Business Problem
I, as a business consultant, agree that Yokohama has lots of potential for Italian restaurant business, but the city is large, and the location is very important to the restaurant business.  Within Yokohama, Central Yokohama, especially Naka-ward, is one of the most commercially developed wards in the city, thus it is considered attractive area to open a new Italian restaurant.  Yet, Naka-ward has 21 km2 (= square kilometer) with more than 150 thousands people living in the area and day time population becomes 240 thousands people with tourists and office workers visiting, a large and densed area.  Within Naka-ward, there are several commercial districts, and it is important to understand their characteristics, and even which Town in disctrict is a better location for Italian restaurant.

## Analytics approach
1. Identify best District within Naka-ward
    - 1.1 Segment Naka-ward by Town: using Japan Post Office postal code to segment the ward into multiple Towns
    - 1.2 Attach Coordinates to Town for Folium map visualization and Foursquare venue data download 
    - 1.3 Map Towns over Naka-ward geography with Folium 
    - 1.4 Cluster Towns rolling up to District: Use Foursquare venue data to cluster Towns to District by k-mean method
    - 1.5 Review Districts to understand their characteristics and name them
2. Identify successful Italian restaurants in Naka-ward
    - 2.1 Identify Italian Restaurants in Naka-ward using Foursquare 
    - 2.2 Pick top Italian restaurants in Naka-ward in Tabelog.com and Retty, two most popular restaurant review sites in Japan
    - 

## 1. Identify best District within Naka-ward

### 1.1 Segment Naka-ward by Town: using Japan Post Office postal code
Download the Town postal code list of Naka-ward from Post Office web site to local CSV file (= Town.csv)

https://api.nipponsoft.co.jp/zipcode/%E7%A5%9E%E5%A5%88%E5%B7%9D%E7%9C%8C%E6%A8%AA%E6%B5%9C%E5%B8%82%E4%B8%AD%E5%8C%BA

Use Geo-coding web service by uploading the CSV file to get latitude and longitude coordinates by postal code

https://www.tree-maps.com/zip-code-to-coordinate/

Copy the coordinates and paste onto local CSV file (= Coordinate.csv)

In [1]:
# import pandas library
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

In [2]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_c3900fb2224740ba89b0f99a3170af4a = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='UKeuaRzo2Qc0Z_gawPPfM9iFdRjdo4zN_4G3aSt9VvUQ',
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_c3900fb2224740ba89b0f99a3170af4a.get_object(Bucket='segmentingandclusteringneighborho-donotdelete-pr-v8wucrlwwluvmh',Key='Town.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_Town = pd.read_csv(body)
df_Town.head()


Unnamed: 0,Postal_Code,Pref,Ward,Town,Address
0,231-0001,神奈川県,横浜市中区,新港,神奈川県横浜市中区新港
1,231-0002,神奈川県,横浜市中区,海岸通,神奈川県横浜市中区海岸通
2,231-0003,神奈川県,横浜市中区,北仲通,神奈川県横浜市中区北仲通
3,231-0004,神奈川県,横浜市中区,元浜町,神奈川県横浜市中区元浜町
4,231-0005,神奈川県,横浜市中区,本町,神奈川県横浜市中区本町


### 1.2 Attach Coordinates to Town for Folium map visualization and Foursquare venue data download

In [3]:
# Import local CSV file, Coordinate.csv, to pandas dataframe on the notebook 
body = client_c3900fb2224740ba89b0f99a3170af4a.get_object(Bucket='segmentingandclusteringneighborho-donotdelete-pr-v8wucrlwwluvmh',Key='Coordinate.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_Coordinate = pd.read_csv(body)
df_Coordinate.head()


Unnamed: 0,Latitude,Longitude,Postal_Code
0,35.447439,139.636824,231-0012
1,35.442478,139.622418,231-0051
2,35.439313,139.626795,231-0057
3,35.420654,139.648998,231-0834
4,35.436849,139.641109,231-0868


In [4]:
# Check df_Town dataframe -> there are many NaN rows imported
df_Town

Unnamed: 0,Postal_Code,Pref,Ward,Town,Address
0,231-0001,神奈川県,横浜市中区,新港,神奈川県横浜市中区新港
1,231-0002,神奈川県,横浜市中区,海岸通,神奈川県横浜市中区海岸通
2,231-0003,神奈川県,横浜市中区,北仲通,神奈川県横浜市中区北仲通
3,231-0004,神奈川県,横浜市中区,元浜町,神奈川県横浜市中区元浜町
4,231-0005,神奈川県,横浜市中区,本町,神奈川県横浜市中区本町
...,...,...,...,...,...
208,,,,,
209,,,,,
210,,,,,
211,,,,,


In [5]:
# Remove NaN rows on df_Town
df_Town.dropna(how='all',axis=0)


Unnamed: 0,Postal_Code,Pref,Ward,Town,Address
0,231-0001,神奈川県,横浜市中区,新港,神奈川県横浜市中区新港
1,231-0002,神奈川県,横浜市中区,海岸通,神奈川県横浜市中区海岸通
2,231-0003,神奈川県,横浜市中区,北仲通,神奈川県横浜市中区北仲通
3,231-0004,神奈川県,横浜市中区,元浜町,神奈川県横浜市中区元浜町
4,231-0005,神奈川県,横浜市中区,本町,神奈川県横浜市中区本町
...,...,...,...,...,...
101,231-0864,神奈川県,横浜市中区,千代崎町,神奈川県横浜市中区千代崎町
102,231-0865,神奈川県,横浜市中区,北方町,神奈川県横浜市中区北方町
103,231-0866,神奈川県,横浜市中区,柏葉,神奈川県横浜市中区柏葉
104,231-0867,神奈川県,横浜市中区,打越,神奈川県横浜市中区打越


In [6]:
# Check df_Coordinate dataframe -> looks fine
df_Coordinate

Unnamed: 0,Latitude,Longitude,Postal_Code
0,35.447439,139.636824,231-0012
1,35.442478,139.622418,231-0051
2,35.439313,139.626795,231-0057
3,35.420654,139.648998,231-0834
4,35.436849,139.641109,231-0868
...,...,...,...
101,35.443559,139.640082,231-0022
102,35.446519,139.632680,231-0041
103,35.439292,139.642868,231-0024
104,35.441652,139.627941,231-0056


In [7]:
# Merge two tables joined by Postal_Code
df_merged=pd.merge(df_Town,df_Coordinate,on='Postal_Code')
df_merged

Unnamed: 0,Postal_Code,Pref,Ward,Town,Address,Latitude,Longitude
0,231-0001,神奈川県,横浜市中区,新港,神奈川県横浜市中区新港,35.454370,139.641190
1,231-0002,神奈川県,横浜市中区,海岸通,神奈川県横浜市中区海岸通,35.450641,139.642730
2,231-0003,神奈川県,横浜市中区,北仲通,神奈川県横浜市中区北仲通,35.449926,139.637674
3,231-0004,神奈川県,横浜市中区,元浜町,神奈川県横浜市中区元浜町,35.449272,139.640069
4,231-0005,神奈川県,横浜市中区,本町,神奈川県横浜市中区本町,35.449386,139.637551
...,...,...,...,...,...,...,...
101,231-0864,神奈川県,横浜市中区,千代崎町,神奈川県横浜市中区千代崎町,35.434236,139.655064
102,231-0865,神奈川県,横浜市中区,北方町,神奈川県横浜市中区北方町,35.433637,139.658545
103,231-0866,神奈川県,横浜市中区,柏葉,神奈川県横浜市中区柏葉,35.432117,139.643034
104,231-0867,神奈川県,横浜市中区,打越,神奈川県横浜市中区打越,35.434222,139.637952


In [8]:
# Remove unnecessary columns from the dataframe
drop_col=['Pref','Ward','Address']
df_Town=df_merged.drop(drop_col,axis=1)

In [9]:
df_Town

Unnamed: 0,Postal_Code,Town,Latitude,Longitude
0,231-0001,新港,35.454370,139.641190
1,231-0002,海岸通,35.450641,139.642730
2,231-0003,北仲通,35.449926,139.637674
3,231-0004,元浜町,35.449272,139.640069
4,231-0005,本町,35.449386,139.637551
...,...,...,...,...
101,231-0864,千代崎町,35.434236,139.655064
102,231-0865,北方町,35.433637,139.658545
103,231-0866,柏葉,35.432117,139.643034
104,231-0867,打越,35.434222,139.637952


### 1.3 Map Towns over Naka-ward geography with Folium

In [10]:
# Import libraries for Mapping (and following data analysis and charting)
!pip install folium
import folium
from folium import plugins
from geopy.geocoders import Nominatim
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
from sklearn.cluster import KMeans
!pip install msgpack
import matplotlib.cm as cm
import matplotlib.colors as colors

Collecting folium
  Downloading folium-0.12.0-py2.py3-none-any.whl (94 kB)
[K     |████████████████████████████████| 94 kB 6.6 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.0
Collecting msgpack
  Downloading msgpack-1.0.2-cp37-cp37m-manylinux1_x86_64.whl (273 kB)
[K     |████████████████████████████████| 273 kB 13.7 MB/s eta 0:00:01
[?25hInstalling collected packages: msgpack
Successfully installed msgpack-1.0.2


In [11]:
# Define a center of Naka-ward map = the 84th row '231-0845 立野' is the center of Naka-ward
LAT=df_Town.Latitude.iloc[84]
LNG=df_Town.Longitude.iloc[84]

In [12]:
# Lay out each Ward coordinate on Yokohama map
loc=np.array([df_Town.Latitude,df_Town.Longitude]).T
map_Naka=folium.Map([LAT,LNG],zoom_start=14)
plugins.MarkerCluster(loc).add_to(map_Naka)
map_Naka

### 1.4 Cluster Towns rolling up to District: Use Foursquare venue data to cluster Towns to District by k-mean method

In [13]:
# Access Foursquare API to get venue data.  Define Client_ID, Client_Secret of mine.and VERSION
CLIENT_ID = 'Y2QO2VYM43BWS2OEAQ1PWCWVDKKELAE5S1QWBKQ3E2XWVNH4' 
CLIENT_SECRET = 'HTZH3ORGXVMLQKSLQWB3EW2BJCJBPFSR1IDIICWSIHTPRO0G' 
VERSION = '20201124'

In [14]:
# Set the function to retrieve venues
import requests
 
radius = 500
LIMIT = 500
 
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
 
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town', 
                  'Town_Latitude', 
                  'Town_Longitude', 
                  'Venue', 
                  'Venue_Lat', 
                  'Venue_Long', 
                  'Venue_Category']
    
    return(nearby_venues)

In [15]:
# Apply above function to Town list 
Naka_venue = getNearbyVenues(names=df_Town['Town'],
                                   latitudes=df_Town['Latitude'],
                                   longitudes=df_Town['Longitude']
                                  )

新港
海岸通
北仲通
元浜町
本町
南仲通
弁天通
太田町
相生町
住吉町
常盤町
尾上町
真砂町
港町
日本大通
横浜公園
山下町
吉浜町
松影町
寿町
扇町
翁町
万代町
不老町
長者町
三吉町
千歳町
山田町
富士見町
山吹町
吉田町
福富町西通
福富町仲通
福富町東通
伊勢佐木町
末広町
羽衣町
蓬莱町
赤門町
英町
初音町
黄金町
末吉町
若葉町
曙町
弥生町
内田町
桜木町
花咲町
野毛町
宮川町
日ノ出町
新山下
小港町
本牧十二天
本牧宮原
和田山
本牧町
本牧ふ頭
錦町
かもめ町
豊浦町
千鳥町
南本牧
本牧原
本牧元町
本牧大里町
本牧三之谷
本牧間門
本牧荒井
本牧和田
矢口台
本牧緑ケ丘
本牧満坂
池袋
根岸加曽台
根岸町
滝之上
豆口台
仲尾台
妙香寺台
上野町
本郷町
西之谷町
立野
大和町
竹之丸
鷺山
麦田町
山元町
西竹之丸
根岸台
根岸旭台
寺久保
簑沢
塚越
大芝台
大平町
元町
山手町
諏訪町
千代崎町
北方町
柏葉
打越
石川町


In [16]:
Naka_venue.shape

(5032, 7)

In [17]:
Naka_venue

Unnamed: 0,Town,Town_Latitude,Town_Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category
0,新港,35.454370,139.641190,Yokohama Hammerhead (横浜ハンマーヘッド),35.455791,139.641823,Shopping Mall
1,新港,35.454370,139.641190,Port Terrace Cafe,35.454525,139.640704,Café
2,新港,35.454370,139.641190,Yokohama Red Brick Warehouse (横浜赤レンガ倉庫),35.452296,139.642874,Historic Site
3,新港,35.454370,139.641190,Akarenga Park (赤レンガパーク),35.454207,139.643201,Park
4,新港,35.454370,139.641190,Motion Blue Yokohama,35.452713,139.643582,Jazz Club
...,...,...,...,...,...,...,...
5027,石川町,35.436849,139.641109,石川小学校下交差点,35.436504,139.636230,Intersection
5028,石川町,35.436849,139.641109,タンドリーハウス インド料理,35.440424,139.637979,Indian Restaurant
5029,石川町,35.436849,139.641109,柏葉 バス停,35.432482,139.641809,Bus Stop
5030,石川町,35.436849,139.641109,forgame,35.432570,139.639677,Shoe Store


In [18]:
# Count Venues by Town
Naka_venue.groupby('Town').count()

Unnamed: 0_level_0,Town_Latitude,Town_Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category
Town,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
かもめ町,4,4,4,4,4,4
万代町,57,57,57,57,57,57
三吉町,26,26,26,26,26,26
上野町,23,23,23,23,23,23
不老町,50,50,50,50,50,50
...,...,...,...,...,...,...
錦町,1,1,1,1,1,1
長者町,75,75,75,75,75,75
鷺山,18,18,18,18,18,18
麦田町,19,19,19,19,19,19


In [19]:
# How many unique categories can be found from venues
print('There are {} uniques categories.'.format(len(Naka_venue['Venue_Category'].unique())))

There are 210 uniques categories.


In [20]:
# what are those 210 unique venue categories?
print(Naka_venue['Venue_Category'].unique())

['Shopping Mall' 'Café' 'Historic Site' 'Park' 'Jazz Club' 'Pie Shop'
 'Museum' 'Hot Spring' 'Music Venue' 'Australian Restaurant' 'Candy Store'
 'Restaurant' 'American Restaurant' 'Italian Restaurant' 'Playground'
 'Donut Shop' 'Hotel' 'Theme Park' 'Gift Shop' 'Coffee Shop'
 'Theme Park Ride / Attraction' 'Chocolate Shop' 'History Museum'
 'Paella Restaurant' 'Sukiyaki Restaurant' 'Hotel Bar'
 'Shabu-Shabu Restaurant' 'Ice Cream Shop' 'Japanese Restaurant'
 'Wedding Hall' 'Food Court' 'Discount Store' 'Multiplex' 'Arcade'
 'Hobby Shop' 'Pizza Place' 'Yoshoku Restaurant' 'Burger Joint'
 'Seafood Restaurant' 'Indian Restaurant' 'Mexican Restaurant' 'Buffet'
 'Takoyaki Place' 'Beer Bar' 'Chinese Restaurant' 'Convenience Store'
 'Fast Food Restaurant' 'Japanese Family Restaurant' 'Hawaiian Restaurant'
 'Gourmet Shop' 'Arts & Crafts Store' 'Marine Terminal' 'Clothing Store'
 'Pier' 'Pastry Shop' 'Furniture / Home Store' 'Luggage Store'
 'Shoe Store' 'Plaza' 'Bakery' 'Scenic Lookout' 'Outdo

# 1.4 Cluster Towns with K-mean method

In [21]:
# one hot encoding
Naka_onehot = pd.get_dummies(Naka_venue[['Venue_Category']], prefix="", prefix_sep="")
# add Town column back to dataframe
Naka_onehot['Town'] = Naka_venue['Town'] 
# move Town column to the first column
fixed_columns = [Naka_onehot.columns[-1]] + list(Naka_onehot.columns[:-1])
Naka_onehot=Naka_onehot[fixed_columns]
Naka_onehot.head()


Unnamed: 0,Town,ATM,Accessories Store,American Restaurant,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Store,Wagashi Place,Wedding Hall,Whisky Bar,Wine Bar,Wings Joint,Yakitori Restaurant,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,新港,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,新港,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,新港,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,新港,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,新港,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
# Check Naka_onehot dataframe size -> rows and columns look okay
Naka_onehot.shape

(5032, 211)

In [23]:
# Group rows by Town and by the mean of occurance of each Category
Naka_grouped = Naka_onehot.groupby('Town').mean().reset_index()
Naka_grouped.head()

Unnamed: 0,Town,ATM,Accessories Store,American Restaurant,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Video Store,Wagashi Place,Wedding Hall,Whisky Bar,Wine Bar,Wings Joint,Yakitori Restaurant,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,かもめ町,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,万代町,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0
2,三吉町,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0
3,上野町,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,不老町,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0


In [24]:
# Check the dataframe shape -> looks okay as there are 106 Towns and 211 categories
Naka_grouped.shape

(106, 211)

In [25]:
# print each Town along with the top 5 most common venues
num_top_venues = 5

for hood in Naka_grouped['Town']:
    print("----"+hood+"----")
    temp = Naka_grouped[Naka_grouped['Town'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----かもめ町----
               venue  freq
0        Bus Station  0.25
1           Bus Stop  0.25
2       Intersection  0.25
3  Food & Drink Shop  0.25
4                ATM  0.00


----万代町----
                venue  freq
0    Ramen Restaurant  0.12
1   Convenience Store  0.09
2    Baseball Stadium  0.05
3                Café  0.05
4  Chinese Restaurant  0.05


----三吉町----
                venue  freq
0   Convenience Store  0.46
1    Ramen Restaurant  0.12
2  Donburi Restaurant  0.08
3        Liquor Store  0.04
4       Grocery Store  0.04


----上野町----
                venue  freq
0       Historic Site  0.22
1   Convenience Store  0.17
2  Chinese Restaurant  0.13
3                Park  0.09
4              Museum  0.04


----不老町----
                venue  freq
0   Convenience Store  0.16
1    Baseball Stadium  0.10
2    Ramen Restaurant  0.10
3                Café  0.06
4  Chinese Restaurant  0.04


----仲尾台----
               venue  freq
0  Convenience Store  0.13
1                ATM  0.07
2 

                 venue  freq
0             Bus Stop  0.29
1   Chinese Restaurant  0.07
2  Dumpling Restaurant  0.07
3                 Café  0.07
4           Restaurant  0.07


----根岸加曽台----
               venue  freq
0               Park  0.25
1  Convenience Store  0.25
2           Bus Stop  0.25
3        Sports Club  0.25
4                Pub  0.00


----根岸台----
                venue  freq
0            Bus Stop  0.29
1  Seafood Restaurant  0.07
2               Hotel  0.07
3                Park  0.07
4       Grocery Store  0.07


----根岸旭台----
               venue  freq
0  Convenience Store  0.25
1       Intersection  0.08
2               Park  0.08
3                ATM  0.04
4    Pachinko Parlor  0.04


----根岸町----
                venue  freq
0   Convenience Store   0.3
1                Park   0.2
2            Bus Stop   0.2
3  Chinese Restaurant   0.1
4    Ramen Restaurant   0.1


----桜木町----
                 venue  freq
0             Sake Bar  0.08
1  Japanese Restaurant  0.08
2     

In [26]:
# sort the categories by Town in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
# Put this to new data frame
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top 10 common venues (= common category type)
columns = ['Town']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Town_venues_sorted = pd.DataFrame(columns=columns)
Town_venues_sorted['Town'] = Naka_grouped['Town']

for ind in np.arange(Naka_grouped.shape[0]):
    Town_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Naka_grouped.iloc[ind, :], num_top_venues)

Town_venues_sorted.head()

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,かもめ町,Intersection,Bus Station,Bus Stop,Food & Drink Shop,Zoo Exhibit,Fast Food Restaurant,Garden,Furniture / Home Store,Fried Chicken Joint,French Restaurant
1,万代町,Ramen Restaurant,Convenience Store,Baseball Stadium,Café,Chinese Restaurant,Hotel,Indian Restaurant,Bookstore,Sporting Goods Shop,Brewery
2,三吉町,Convenience Store,Ramen Restaurant,Donburi Restaurant,Bus Station,Cantonese Restaurant,Grocery Store,Drugstore,Liquor Store,Park,Wine Bar
3,上野町,Historic Site,Convenience Store,Chinese Restaurant,Park,Trail,Sushi Restaurant,Japanese Restaurant,Restaurant,Tennis Stadium,Ramen Restaurant
4,不老町,Convenience Store,Baseball Stadium,Ramen Restaurant,Café,Chinese Restaurant,Indian Restaurant,Hotel,Bookstore,Mobile Phone Shop,Sporting Goods Shop


In [28]:
# set number of clusters
kclusters = 5

Naka_grouped_clustering = Naka_grouped.drop('Town', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Naka_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([3, 0, 2, 2, 2, 2, 2, 0, 0, 0], dtype=int32)

In [29]:
# Create a new dataframe that includes the cluster and top 10 venues for each Town
# add clustering Labels
Town_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Naka_merged = df_Town

# merge Naka_grouped with df_Town to add latitude/longitude for each Town
Naka_merged = Naka_merged.join(Town_venues_sorted.set_index('Town'), on='Town')

Naka_merged.head()

Unnamed: 0,Postal_Code,Town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,231-0001,新港,35.45437,139.64119,0,Café,Shopping Mall,Convenience Store,Park,Italian Restaurant,Hotel,Coffee Shop,Seafood Restaurant,Pie Shop,Yoshoku Restaurant
1,231-0002,海岸通,35.450641,139.64273,0,Café,History Museum,Park,American Restaurant,Historic Site,Shopping Mall,Italian Restaurant,Convenience Store,Seafood Restaurant,Sporting Goods Shop
2,231-0003,北仲通,35.449926,139.637674,0,Café,Hotel,Italian Restaurant,Japanese Restaurant,Bed & Breakfast,BBQ Joint,Soba Restaurant,Sake Bar,History Museum,Tonkatsu Restaurant
3,231-0004,元浜町,35.449272,139.640069,0,Café,Hotel,Italian Restaurant,Jazz Club,BBQ Joint,Convenience Store,Sake Bar,Shopping Mall,Soba Restaurant,Bar
4,231-0005,本町,35.449386,139.637551,0,Café,Hotel,Italian Restaurant,Coffee Shop,Sake Bar,Bed & Breakfast,Japanese Restaurant,Soba Restaurant,Tonkatsu Restaurant,History Museum


In [88]:
# Check Naka_merged dataframe if Common venues are reasonably differentiated among Towns
pd.set_option('display.max_rows',500)
Naka_merged

Unnamed: 0,Postal_Code,Town,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,231-0001,新港,35.45437,139.64119,0,Café,Shopping Mall,Convenience Store,Park,Italian Restaurant,Hotel,Coffee Shop,Seafood Restaurant,Pie Shop,Yoshoku Restaurant
1,231-0002,海岸通,35.450641,139.64273,0,Café,History Museum,Park,American Restaurant,Historic Site,Shopping Mall,Italian Restaurant,Convenience Store,Seafood Restaurant,Sporting Goods Shop
2,231-0003,北仲通,35.449926,139.637674,0,Café,Hotel,Italian Restaurant,Japanese Restaurant,Bed & Breakfast,BBQ Joint,Soba Restaurant,Sake Bar,History Museum,Tonkatsu Restaurant
3,231-0004,元浜町,35.449272,139.640069,0,Café,Hotel,Italian Restaurant,Jazz Club,BBQ Joint,Convenience Store,Sake Bar,Shopping Mall,Soba Restaurant,Bar
4,231-0005,本町,35.449386,139.637551,0,Café,Hotel,Italian Restaurant,Coffee Shop,Sake Bar,Bed & Breakfast,Japanese Restaurant,Soba Restaurant,Tonkatsu Restaurant,History Museum
5,231-0006,南仲通,35.448198,139.638584,0,Café,Coffee Shop,Convenience Store,Bed & Breakfast,Tonkatsu Restaurant,History Museum,Museum,Sake Bar,Park,Ramen Restaurant
6,231-0007,弁天通,35.448037,139.637668,0,Coffee Shop,Café,Bed & Breakfast,Convenience Store,Tonkatsu Restaurant,History Museum,Sake Bar,Steakhouse,Ramen Restaurant,Udon Restaurant
7,231-0011,太田町,35.447828,139.637055,0,Coffee Shop,Café,Convenience Store,Bed & Breakfast,Tonkatsu Restaurant,Udon Restaurant,Bar,Ramen Restaurant,Bookstore,Sake Bar
8,231-0012,相生町,35.447439,139.636824,0,Coffee Shop,Café,Ramen Restaurant,Bed & Breakfast,Convenience Store,Tonkatsu Restaurant,History Museum,Udon Restaurant,Bar,Bookstore
9,231-0013,住吉町,35.447036,139.636577,0,Café,Coffee Shop,Ramen Restaurant,Bed & Breakfast,Tonkatsu Restaurant,Convenience Store,History Museum,Udon Restaurant,Bar,Bookstore


In [30]:
# Prepare latitude and longitude of Naka-ku for Map creation
address = 'Yokohama'

geolocator = Nominatim(user_agent="Naka_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Naka-ward are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Naka-ward are 35.444991, 139.636768.


In [31]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Naka_merged['Latitude'], Naka_merged['Longitude'], Naka_merged['Town'], Naka_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster Labels ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 1.5 Review Districts to understand their characteristics and name them

In [32]:
# Cluster Label 0: Business & Commercial District
Naka_merged.loc[Naka_merged['Cluster Labels'] == 0, Naka_merged.columns[[1] + list(range(5, Naka_merged.shape[1]))]]

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,新港,Café,Shopping Mall,Convenience Store,Park,Italian Restaurant,Hotel,Coffee Shop,Seafood Restaurant,Pie Shop,Yoshoku Restaurant
1,海岸通,Café,History Museum,Park,American Restaurant,Historic Site,Shopping Mall,Italian Restaurant,Convenience Store,Seafood Restaurant,Sporting Goods Shop
2,北仲通,Café,Hotel,Italian Restaurant,Japanese Restaurant,Bed & Breakfast,BBQ Joint,Soba Restaurant,Sake Bar,History Museum,Tonkatsu Restaurant
3,元浜町,Café,Hotel,Italian Restaurant,Jazz Club,BBQ Joint,Convenience Store,Sake Bar,Shopping Mall,Soba Restaurant,Bar
4,本町,Café,Hotel,Italian Restaurant,Coffee Shop,Sake Bar,Bed & Breakfast,Japanese Restaurant,Soba Restaurant,Tonkatsu Restaurant,History Museum
5,南仲通,Café,Coffee Shop,Convenience Store,Bed & Breakfast,Tonkatsu Restaurant,History Museum,Museum,Sake Bar,Park,Ramen Restaurant
6,弁天通,Coffee Shop,Café,Bed & Breakfast,Convenience Store,Tonkatsu Restaurant,History Museum,Sake Bar,Steakhouse,Ramen Restaurant,Udon Restaurant
7,太田町,Coffee Shop,Café,Convenience Store,Bed & Breakfast,Tonkatsu Restaurant,Udon Restaurant,Bar,Ramen Restaurant,Bookstore,Sake Bar
8,相生町,Coffee Shop,Café,Ramen Restaurant,Bed & Breakfast,Convenience Store,Tonkatsu Restaurant,History Museum,Udon Restaurant,Bar,Bookstore
9,住吉町,Café,Coffee Shop,Ramen Restaurant,Bed & Breakfast,Tonkatsu Restaurant,Convenience Store,History Museum,Udon Restaurant,Bar,Bookstore


In [33]:
# Cluster Label 1: Port District
Naka_merged.loc[Naka_merged['Cluster Labels'] == 1, Naka_merged.columns[[1] + list(range(5, Naka_merged.shape[1]))]]

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,本牧ふ頭,Pier,Harbor / Marina,Cafeteria,Port,Zoo Exhibit,Fast Food Restaurant,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
63,南本牧,Pier,Convenience Store,Harbor / Marina,Zoo Exhibit,Exhibit,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court


In [34]:
# Cluster Label 2: Shopping & Restaurant District
Naka_merged.loc[Naka_merged['Cluster Labels'] == 2, Naka_merged.columns[[1] + list(range(5, Naka_merged.shape[1]))]]

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,松影町,Convenience Store,Baseball Stadium,Ramen Restaurant,Chinese Restaurant,Italian Restaurant,Indian Restaurant,Historic Site,Bed & Breakfast,Intersection,Restaurant
19,寿町,Convenience Store,Ramen Restaurant,Baseball Stadium,Chinese Restaurant,Café,Bar,Indian Restaurant,Coffee Shop,Grocery Store,Sporting Goods Shop
20,扇町,Convenience Store,Baseball Stadium,Chinese Restaurant,Ramen Restaurant,Grocery Store,Café,Donburi Restaurant,Indian Restaurant,Bed & Breakfast,Sporting Goods Shop
21,翁町,Convenience Store,Baseball Stadium,Grocery Store,Ramen Restaurant,Intersection,Café,Sporting Goods Shop,Donburi Restaurant,Japanese Restaurant,Soba Restaurant
23,不老町,Convenience Store,Baseball Stadium,Ramen Restaurant,Café,Chinese Restaurant,Indian Restaurant,Hotel,Bookstore,Mobile Phone Shop,Sporting Goods Shop
25,三吉町,Convenience Store,Ramen Restaurant,Donburi Restaurant,Bus Station,Cantonese Restaurant,Grocery Store,Drugstore,Liquor Store,Park,Wine Bar
26,千歳町,Convenience Store,Ramen Restaurant,Teishoku Restaurant,Grocery Store,Donburi Restaurant,Japanese Family Restaurant,Hotel,Liquor Store,Middle Eastern Restaurant,Park
27,山田町,Convenience Store,Ramen Restaurant,Grocery Store,Japanese Curry Restaurant,Teishoku Restaurant,Donburi Restaurant,Intersection,Liquor Store,Bus Station,Park
28,富士見町,Convenience Store,Ramen Restaurant,Grocery Store,Donburi Restaurant,Chinese Restaurant,Japanese Curry Restaurant,Coffee Shop,Teishoku Restaurant,Italian Restaurant,Seafood Restaurant
29,山吹町,Convenience Store,Ramen Restaurant,Grocery Store,Chinese Restaurant,Coffee Shop,Teishoku Restaurant,Japanese Curry Restaurant,Donburi Restaurant,Tonkatsu Restaurant,Discount Store


In [35]:
# Cluster Label 3: Park & Residential District
Naka_merged.loc[Naka_merged['Cluster Labels'] == 3, Naka_merged.columns[[1] + list(range(5, Naka_merged.shape[1]))]]

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,本牧町,Bus Stop,Convenience Store,Park,Grocery Store,Szechuan Restaurant,Bakery,Pharmacy,Chinese Restaurant,Snack Place,Bar
60,かもめ町,Intersection,Bus Station,Bus Stop,Food & Drink Shop,Zoo Exhibit,Fast Food Restaurant,Garden,Furniture / Home Store,Fried Chicken Joint,French Restaurant
61,豊浦町,Train Station,Intersection,Bus Stop,Event Space,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop
62,千鳥町,Park,Tennis Court,Bus Stop,Toll Booth,Pool,Historic Site,History Museum,Baseball Field,Bus Station,Intersection
67,本牧三之谷,Historic Site,Park,Bus Stop,Convenience Store,Bakery,Garden,Café,Tea Room,Grocery Store,Snack Place
68,本牧間門,Bus Stop,Park,Snack Place,Bar,Bakery,Historic Site,Coffee Shop,Grocery Store,Lake,Steakhouse
69,本牧荒井,Sports Club,Bus Stop,Bar,Supermarket,Grocery Store,Coffee Shop,Ramen Restaurant,Indian Restaurant,Park,Steakhouse
73,本牧満坂,Japanese Restaurant,Coffee Shop,Park,Grocery Store,Bus Stop,Plaza,Snack Place,Scenic Lookout,Donburi Restaurant,Donut Shop
75,根岸加曽台,Convenience Store,Park,Bus Stop,Sports Club,Zoo Exhibit,Fast Food Restaurant,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain
77,滝之上,Bus Stop,Park,Ramen Restaurant,Plaza,Chinese Restaurant,Snack Place,Museum,Bakery,Stables,Seafood Restaurant


In [36]:
# Cluster Label 4: Industrial District
Naka_merged.loc[Naka_merged['Cluster Labels'] == 4, Naka_merged.columns[[1] + list(range(5, Naka_merged.shape[1]))]]

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
59,錦町,Shipping Store,Zoo Exhibit,Event Space,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Fountain,Food Court,Food & Drink Shop,Food


In [37]:
Naka_merged.groupby('Cluster Labels').size()

Cluster Labels
0    50
1     2
2    36
3    17
4     1
dtype: int64

### More suited for Italian restaurant District is either Label 0 or 2: Business & Commercial District or Shopping & Restaurant District

In [38]:
# Creating a new data frame containing only Cluster Labels 0 or 2 as Pos_Towns (=Possible Towns)
Pos_Town=Naka_merged[(Naka_merged['Cluster Labels']==0) | (Naka_merged['Cluster Labels']==2)]
Pos_Towns=Pos_Town[['Town','Latitude','Longitude','Cluster Labels']]
Pos_Towns

Unnamed: 0,Town,Latitude,Longitude,Cluster Labels
0,新港,35.454370,139.641190,0
1,海岸通,35.450641,139.642730,0
2,北仲通,35.449926,139.637674,0
3,元浜町,35.449272,139.640069,0
4,本町,35.449386,139.637551,0
...,...,...,...,...
100,諏訪町,35.436818,139.653822,0
101,千代崎町,35.434236,139.655064,2
102,北方町,35.433637,139.658545,2
104,打越,35.434222,139.637952,2


## 2. Identify successful Italian restaurants in the 2 Disticts: Business & Commercial District and Shopping & Restaurant District

### 2.1 Identify Italian Restaurants in Cluster Labels 0 or 2 Districts using Foursquare

In [53]:
# Set the function to retrieve venues
import requests
 
radius = 150
LIMIT = 100
 
def getNearbyVenues(names, latitudes, longitudes, radius=150):
    
    venues_list=[]
    categoryID='4bf58dd8d48988d110941735' # Italian Restaurant 
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryID)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
 
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town', 
                  'Town_Latitude', 
                  'Town_Longitude', 
                  'Venue', 
                  'Venue_Lat', 
                  'Venue_Long', 
                  'Venue_Category']
    
    return(nearby_venues)

In [54]:
# Apply above function to Town list 
Naka_Italian_venue = getNearbyVenues(names=Pos_Towns['Town'],
                                   latitudes=df_Town['Latitude'],
                                   longitudes=df_Town['Longitude']
                                  )

新港
海岸通
北仲通
元浜町
本町
南仲通
弁天通
太田町
相生町
住吉町
常盤町
尾上町
真砂町
港町
日本大通
横浜公園
山下町
吉浜町
松影町
寿町
扇町
翁町
万代町
不老町
長者町
三吉町
千歳町
山田町
富士見町
山吹町
吉田町
福富町西通
福富町仲通
福富町東通
伊勢佐木町
末広町
羽衣町
蓬莱町
赤門町
英町
初音町
黄金町
末吉町
若葉町
曙町
弥生町
内田町
桜木町
花咲町
野毛町
宮川町
日ノ出町
新山下
小港町
本牧十二天
本牧宮原
和田山
本牧原
本牧元町
本牧大里町
本牧和田
矢口台
本牧緑ケ丘
池袋
根岸町
豆口台
仲尾台
妙香寺台
上野町
本郷町
西之谷町
立野
大和町
竹之丸
鷺山
麦田町
根岸旭台
大芝台
大平町
元町
山手町
諏訪町
千代崎町
北方町
打越
石川町


In [55]:
# Check how many venues are captured -> 94 venues
Naka_Italian_venue.shape

(94, 7)

In [56]:
# Check briefly the names of Italian restaurants -> Venue_Category looks fine
Naka_Italian_venue

Unnamed: 0,Town,Town_Latitude,Town_Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category
0,新港,35.45437,139.64119,A16 YOKOHAMA,35.454704,139.642311,Italian Restaurant
1,北仲通,35.449926,139.637674,ROJI,35.449998,139.637779,Italian Restaurant
2,北仲通,35.449926,139.637674,Osteria Austro,35.44941,139.63853,Italian Restaurant
3,北仲通,35.449926,139.637674,La Brezza BASHAMICHI (ラブレッツア馬車道),35.449959,139.637685,Italian Restaurant
4,元浜町,35.449272,139.640069,Osteria Austro,35.44941,139.63853,Italian Restaurant
5,元浜町,35.449272,139.640069,Taverna Pollone (タベルナポローネ),35.449406,139.640001,Italian Restaurant
6,本町,35.449386,139.637551,ROJI,35.449998,139.637779,Italian Restaurant
7,本町,35.449386,139.637551,Osteria Austro,35.44941,139.63853,Italian Restaurant
8,本町,35.449386,139.637551,La Brezza BASHAMICHI (ラブレッツア馬車道),35.449959,139.637685,Italian Restaurant
9,南仲通,35.448198,139.638584,Cafe&Kitchen. 333,35.447586,139.638965,Italian Restaurant


In [63]:
# Add back Cluster Labels to above data frame by merging it with Pos_Town data frame
Naka_Italian=pd.merge(Naka_Italian_venue,Pos_Towns,on='Town')
Italian_list=Naka_Italian.drop(['Latitude','Longitude'], axis=1)
Italian_list

Unnamed: 0,Town,Town_Latitude,Town_Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category,Cluster Labels
0,新港,35.45437,139.64119,A16 YOKOHAMA,35.454704,139.642311,Italian Restaurant,0
1,北仲通,35.449926,139.637674,ROJI,35.449998,139.637779,Italian Restaurant,0
2,北仲通,35.449926,139.637674,Osteria Austro,35.44941,139.63853,Italian Restaurant,0
3,北仲通,35.449926,139.637674,La Brezza BASHAMICHI (ラブレッツア馬車道),35.449959,139.637685,Italian Restaurant,0
4,元浜町,35.449272,139.640069,Osteria Austro,35.44941,139.63853,Italian Restaurant,0
5,元浜町,35.449272,139.640069,Taverna Pollone (タベルナポローネ),35.449406,139.640001,Italian Restaurant,0
6,本町,35.449386,139.637551,ROJI,35.449998,139.637779,Italian Restaurant,0
7,本町,35.449386,139.637551,Osteria Austro,35.44941,139.63853,Italian Restaurant,0
8,本町,35.449386,139.637551,La Brezza BASHAMICHI (ラブレッツア馬車道),35.449959,139.637685,Italian Restaurant,0
9,南仲通,35.448198,139.638584,Cafe&Kitchen. 333,35.447586,139.638965,Italian Restaurant,0


In [64]:
# Count Italian Venues by Town -> Good to check Italian restaurants are fairly distributed across Towns
Italian_list.groupby('Town').count()

Unnamed: 0_level_0,Town_Latitude,Town_Longitude,Venue,Venue_Lat,Venue_Long,Venue_Category,Cluster Labels
Town,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
万代町,1,1,1,1,1,1,1
不老町,1,1,1,1,1,1,1
住吉町,5,5,5,5,5,5,5
元浜町,2,2,2,2,2,2,2
内田町,1,1,1,1,1,1,1
北仲通,3,3,3,3,3,3,3
南仲通,3,3,3,3,3,3,3
吉田町,5,5,5,5,5,5,5
太田町,5,5,5,5,5,5,5
宮川町,6,6,6,6,6,6,6


In [67]:
# Check the number of unique venue names -> 94 venues captured vs. 46 unique names.  There are fair amount of duplications in the data frame, although some of them are chains and okay for duplication
uc=Italian_list['Venue'].nunique()
uc

46

In [65]:
# Check which Italian Restaurants are suspected for duplication in the data frame
pd.set_option('display.max_rows',200)
vc=Italian_list['Venue'].value_counts()
vc

Piacere                              5
Via Toscanella                       5
VINOTECA SAKURA                      5
Pizza Cozou - ぴざこぞう                  4
Osteria Austro                       4
ハマラジャ                                4
La Pausa (ラパウザ)                      4
Saizeriya (サイゼリヤ)                    3
スポーツカフェ ヤンキィース                       2
Italian Bar BACCO                    2
ベイサイド Ducky Duck キッチン                2
OiNOS                                2
Italian Bar BASIL                    2
L'isola del Brio                     2
ROJI                                 2
イタリアンバル ぽると 関内駅前店                    2
OREZZO                               2
イタリア料理 プリモ                           2
kinpira kitchen                      2
La Brezza BASHAMICHI (ラブレッツア馬車道)     2
ガーリックハウス                             2
iL-CHIANTI 横浜店                       2
レストラン Vino                           2
ヤンキース                                2
MILANO                               2
Cafe&Kitchen. 333        

In [69]:
# Visualize where those Italian restaurants are today in Cluster Lables 0 and 2 -> Almost all Italian restaurants are actually in Cluster Labels 0 = Business & Commercial District
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Italian_list['Venue_Lat'], Italian_list['Venue_Long'], Italian_list['Venue'], Italian_list['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster Labels ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters