# Capstone Project - A Comparative Analysis of Venue Interest in Kobe City and Osaka City, Japan, Using Foursquare API Data
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

The main objective of this study is to identify the key venue categories, and the top 25 venues within each category, that remain popular despite the state-of-emergency declarations and the restrictions placed on restaurants, shops, and other places of interest around the following major train stations:
-	Umeda Station, Osaka
-	Sannomiya Station, Kobe

The author argues that such a study could interest the following stakeholder groups when making strategic decisions:

- City Hall, which could use the data to identify areas at risk of infection spread during peak times, when large groups of people move frequently, at least until a satisfactory share of the target demographic has been vaccinated.
- Business owners and individuals seeking to relocate their offices or start a new business, who could identify shifts in the key areas of commercial activity.
- Real-estate and construction companies, when considering land value, demand, and the repurposing of existing buildings.
- Business owners and entrepreneurs seeking to understand which types of venues are popular before investing in or setting up new ventures.


## Data Sources and Methodology  <a name="Data Sources and Methodology"></a>

To determine the key venue categories in each city, the first stage uses data from the Foursquare API to examine venues and their popularity within a 1,000 m (1 km) radius of each station, based on the following conditions:
- 25 most popular venue categories
- 25 most popular venues within each category
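One way to read the two conditions above, once the venues are in a DataFrame with a `categories` column (as in the Stage 1 cells below), is to count venues per category and then keep at most 25 venues from each of the top categories. The helper names and sample rows here are hypothetical; this is a minimal pandas sketch, not the notebook's own code:

```python
import pandas as pd

def top_categories_by_count(venues, n=25):
    """Return the n most common categories together with their venue counts."""
    return venues["categories"].value_counts().head(n)

def top_venues_per_category(venues, categories, n=25):
    """Keep at most n venues per category, preserving the order the API returned them in."""
    subset = venues[venues["categories"].isin(categories)]
    return subset.groupby("categories", sort=False).head(n)

# Hypothetical sample rows, listed in API ranking order
venues = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "categories": ["Café", "Bakery", "Café", "Café", "Bakery"],
})
top = top_categories_by_count(venues, n=1)
print(top)
print(top_venues_per_category(venues, top.index, n=2))
```

`groupby(...).head(n)` keeps the original row order, so the API's recommendation ranking decides which venues survive the per-category cap.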

The second stage compares and contrasts the results for the two cities using a machine-learning approach: K-means clustering applied to the same Stage 1 data to generate the 10 most popular venues.

The final stage tests the K-means clustering model on a larger data set by increasing the following variables for the same two cities:
- the number of most popular venue categories
- the number of most popular venues within each category
- the search radius around each station, to 2,000 m (2 km)
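Each venue returned by the explore endpoint carries a `distance` field in metres (the `distance` column in the tables below). To check that a venue really lies inside the 1 km or 2 km search radius, the great-circle distance can be recomputed from the coordinates. A minimal sketch using the haversine formula with a mean Earth radius of 6,371 km; this is an assumption, since how Foursquare computes `distance` is not documented here:

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(station, venue, radius_m):
    """True if the venue (lat, lng) lies within radius_m metres of the station (lat, lng)."""
    return haversine_m(*station, *venue) <= radius_m

umeda = (34.7053631, 135.4979778)       # geocoded Umeda Station (Stage 1 output)
indian_curry = (34.705035, 135.498755)  # first Umeda venue in the Stage 1 table
print(round(haversine_m(*umeda, *indian_curry)))
```

For the first Umeda venue this gives about 80 m, close to the 79 m the API reports in its `distance` column.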

### Library Import List  <a name="Library Import List"></a>

In [84]:
# Install and import the libraries used in this notebook
!pip install folium==0.5.0
!conda install -c conda-forge geopy --yes
import requests
import numpy as np
import pandas as pd
from pandas import json_normalize  # flattens JSON into a DataFrame (pandas >= 1.0)
from geopy.geocoders import Nominatim  # converts an address into latitude/longitude
# libraries for displaying images
from IPython.display import Image, HTML
import folium  # map plotting library
import matplotlib.pyplot as plt  # plotting library
import matplotlib.cm as cm
import matplotlib.colors as colors
# backend for rendering plots within the browser
%matplotlib inline
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


### Foursquare API Configuration <a name="Foursquare API Configuration"></a>

In [85]:
# Replace the placeholders with your own Foursquare credentials and never publish real keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180604'  # Foursquare API version date
LIMIT = 30  # maximum number of venues returned per request
print('Foursquare API credentials set.')

Foursquare API credentials set.
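Client credentials should not be committed with a notebook. A minimal sketch that reads them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are hypothetical choices, not names Foursquare defines:

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) from environment variables, failing loudly if unset."""
    env = os.environ if env is None else env
    # hypothetical variable names; set them in your shell before starting Jupyter
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
```

Usage: `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`. Passing a plain dict as `env` makes the function easy to try without touching the real environment.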


### Stage 1 - Initial Data Extraction <a name="Stage 1 - Initial Data Extraction"></a>

In [86]:
# Identify the most popular venues (the top 25 returned by the API) within a 1,000 m radius of the following train stations:
# - Umeda Station, Osaka, Japan
# - Sannomiya Station, Kobe, Japan

# The coordinates of each station are used as the search centre point

#### Umeda Station, Osaka, Japan Analysis <a name="Umeda Station, Osaka, Japan Analysis "></a>

In [87]:
## Get Coordinates of Umeda Station, Osaka, Japan

osaka_address = 'Umeda Station, Osaka, Japan'

geolocator = Nominatim(user_agent="foursquare_agent")
osaka_location = geolocator.geocode(osaka_address)
osaka_latitude = osaka_location.latitude
osaka_longitude = osaka_location.longitude
print('The latitude and longitude for Umeda Station is:', osaka_latitude, osaka_longitude)

The latitude and longitude for Umeda Station is: 34.7053631 135.4979778


In [88]:
# Retrieve the top 25 recommended venues within 1,000 m of Umeda Station, Osaka, Japan
radius = 1000
LIMIT = 25
osaka_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, osaka_latitude, osaka_longitude, VERSION, radius, LIMIT)

osaka_results = requests.get(osaka_url).json()
osaka_items = osaka_results['response']['groups'][0]['items']
'There are {} venues around Umeda Station, Osaka, Japan.'.format(len(osaka_items))

osaka_dataframe = json_normalize(osaka_items)  # flatten JSON

def get_category_type(row):
    """Return the name of a venue's primary (first) category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# keep the venue name, categories, location fields, and id
osaka_filtered_columns = ['venue.name', 'venue.categories'] + [col for col in osaka_dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
osaka_dataframe_filtered = osaka_dataframe.loc[:, osaka_filtered_columns].copy()

# replace each category list with its primary category name
osaka_dataframe_filtered['venue.categories'] = osaka_dataframe_filtered.apply(get_category_type, axis=1)

# shorten column names, e.g. 'venue.location.lat' -> 'lat'
osaka_dataframe_filtered.columns = [col.split('.')[-1] for col in osaka_dataframe_filtered.columns]

osaka_dataframe_filtered.head(25)



Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,neighborhood,city,state,country,formattedAddress,id
0,Indian Curry (インデアンカレー),Japanese Curry Restaurant,北区芝田1-3,阪急三番街 南館 B2F,34.705035,135.498755,"[{'label': 'display', 'lat': 34.70503520001439...",79,530-0012,JP,梅田,大阪市,大阪府,日本,"[北区芝田1-3 (阪急三番街 南館 B2F), 大阪市, 大阪府, 530-0012, 日本]",4b5bb438f964a520181129e3
1,Kyu Yam Tetsudou (旧ヤム鐵道),Japanese Curry Restaurant,北区梅田3-1-3,LUCUA B2F,34.703723,135.496843,"[{'label': 'display', 'lat': 34.70372271536428...",210,530-8217,JP,,大阪市,大阪府,日本,"[北区梅田3-1-3 (LUCUA B2F), 大阪市, 大阪府, 530-8217, 日本]",5517b1da498e650d9b08ff4e
2,Rilakkuma Store (リラックマストア),Hobby Shop,北区芝田1-1-3,阪急三番街 北館 1F,34.705529,135.49833,"[{'label': 'display', 'lat': 34.70552912959402...",37,530-0012,JP,,大阪市,大阪府,日本,"[北区芝田1-1-3 (阪急三番街 北館 1F), 大阪市, 大阪府, 530-0012, 日本]",4c0e1c46d64c0f47269c275d
3,Shin-Umeda Shokudogai (新梅田食道街),Food Court,北区角田町9-26,,34.703826,135.497891,"[{'label': 'display', 'lat': 34.70382642951151...",171,530-0017,JP,,大阪市,大阪府,日本,"[北区角田町9-26, 大阪市, 大阪府, 530-0017, 日本]",4b63b4adf964a5203b8d2ae3
4,Honmiyake (本みやけ),Nabe Restaurant,北区芝田1-1-3,阪急三番街 南館 B2F,34.704814,135.498132,"[{'label': 'display', 'lat': 34.704814, 'lng':...",62,530-0012,JP,梅田,大阪市,大阪府,日本,"[北区芝田1-1-3 (阪急三番街 南館 B2F), 大阪市, 大阪府, 530-0012,...",4b9092b6f964a520999033e3
5,Y・C 梅田店,Café,北区角田町9-21,新梅田食道街 1F,34.703924,135.497496,"[{'label': 'display', 'lat': 34.70392392093758...",166,530-0017,JP,梅田,大阪市,大阪府,日本,"[北区角田町9-21 (新梅田食道街 1F), 大阪市, 大阪府, 530-0017, 日本]",4d92ec00fa1ef04d15c160c7
6,Kushikatsu Matsuba (串かつ松葉 総本店),Kushikatsu Restaurant,北区角田町9-20,新梅田食道街 1F,34.703819,135.497767,"[{'label': 'display', 'lat': 34.70381887442366...",172,530-0017,JP,,大阪市,大阪府,日本,"[北区角田町9-20 (新梅田食道街 1F), 大阪市, 大阪府, 530-0017, 日本]",4c482e8776d72d7faf733f4d
7,Hanadako (はなだこ),Takoyaki Place,北区角田町9-16,新梅田食道街 1F,34.702923,135.497682,"[{'label': 'display', 'lat': 34.70292288174605...",272,530-0017,JP,,大阪市,大阪府,日本,"[北区角田町9-16 (新梅田食道街 1F), 大阪市, 大阪府, 530-0017, 日本]",4b57c014f964a520333f28e3
8,ポンガラカレー,Sri Lankan Restaurant,北区角田町8-47,阪急サン広場 B1F,34.703289,135.498512,"[{'label': 'display', 'lat': 34.70328883050338...",236,530-0017,JP,,大阪市,大阪府,日本,"[北区角田町8-47 (阪急サン広場 B1F), 大阪市, 大阪府, 530-0017, 日本]",5701e824cd10cda211bd0c72
9,Yodobashi-Umeda (ヨドバシカメラ マルチメディア梅田),Electronics Store,北区大深町1-1,,34.704117,135.496552,"[{'label': 'display', 'lat': 34.70411746895576...",190,530-0011,JP,北区,大阪市,大阪府,日本,"[北区大深町1-1, 大阪市, 大阪府, 530-0011, 日本]",4b5aad26f964a52068d028e3
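The flatten-and-filter steps in the cell above can be checked offline on a small payload. The sketch below is a vectorised variant of the same logic: it applies the category extraction to the `venue.categories` column instead of a row-wise `apply`. The helper names are hypothetical, and the two sample items are invented to mimic the shape of `response['groups'][0]['items']`:

```python
import pandas as pd

def first_category_name(categories):
    """Return the name of the first (primary) category, or None for an empty list."""
    return categories[0]["name"] if categories else None

def venues_to_frame(items):
    """Flatten Foursquare explore items into one row per venue with short column names."""
    df = pd.json_normalize(items)
    keep = (["venue.name", "venue.categories"]
            + [c for c in df.columns if c.startswith("venue.location.")]
            + ["venue.id"])
    df = df.loc[:, keep].copy()
    df["venue.categories"] = df["venue.categories"].apply(first_category_name)
    df.columns = [c.split(".")[-1] for c in df.columns]
    return df

# Hypothetical payload shaped like response['groups'][0]['items']
sample_items = [
    {"venue": {"id": "a1", "name": "Curry House",
               "categories": [{"name": "Japanese Curry Restaurant"}],
               "location": {"lat": 34.705, "lng": 135.498, "distance": 79, "city": "大阪市"}}},
    {"venue": {"id": "b2", "name": "Unlabelled Spot", "categories": [],
               "location": {"lat": 34.704, "lng": 135.497, "distance": 190, "city": "大阪市"}}},
]
print(venues_to_frame(sample_items))
```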


In [89]:
osaka_venues_map = folium.Map(location=[osaka_latitude, osaka_longitude], zoom_start=15) # generate map centred on Umeda Station


# add Umeda Station as a red circle marker
folium.CircleMarker(
    [osaka_latitude, osaka_longitude],
    radius=10,
    popup='Umeda Station',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(osaka_venues_map)


# add the 10 most popular venues to the map as blue circle markers
osaka_top10 = osaka_dataframe_filtered.head(10)
for lat, lng, label in zip(osaka_top10.lat, osaka_top10.lng, osaka_top10.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(osaka_venues_map)

# display map
osaka_venues_map

#### Sannomiya Station, Kobe, Japan Analysis <a name="Sannomiya Station, Kobe, Japan Analysis "></a>

In [90]:
## Get Coordinates of Sannomiya Station, Kobe, Japan

kobe_address = 'Sannomiya Station, Kobe, Japan'

geolocator = Nominatim(user_agent="foursquare_agent")
kobe_location = geolocator.geocode(kobe_address)
kobe_latitude = kobe_location.latitude
kobe_longitude = kobe_location.longitude
print('The latitude and longitude for Sannomiya Station is:', kobe_latitude, kobe_longitude)

The latitude and longitude for Sannomiya Station is: 34.6933427 135.1952946


In [91]:
# Retrieve the top 25 recommended venues within 1,000 m of Sannomiya Station, Kobe, Japan
radius = 1000
LIMIT = 25
kobe_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, kobe_latitude, kobe_longitude, VERSION, radius, LIMIT)

kobe_results = requests.get(kobe_url).json()
kobe_items = kobe_results['response']['groups'][0]['items']
'There are {} venues around Sannomiya Station, Kobe, Japan.'.format(len(kobe_items))

kobe_dataframe = json_normalize(kobe_items)  # flatten JSON

# get_category_type() is defined in the Umeda Station cell above

# keep the venue name, categories, location fields, and id
kobe_filtered_columns = ['venue.name', 'venue.categories'] + [col for col in kobe_dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
kobe_dataframe_filtered = kobe_dataframe.loc[:, kobe_filtered_columns].copy()

# replace each category list with its primary category name
kobe_dataframe_filtered['venue.categories'] = kobe_dataframe_filtered.apply(get_category_type, axis=1)

# shorten column names, e.g. 'venue.location.lat' -> 'lat'
kobe_dataframe_filtered.columns = [col.split('.')[-1] for col in kobe_dataframe_filtered.columns]

kobe_dataframe_filtered.head(25)



Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Nishimura's Coffee (にしむら珈琲店 三宮店),Coffee Shop,中央区琴ノ緒町5-3-5,グリーンシャポービル 1F-2F,34.695046,135.194363,"[{'label': 'display', 'lat': 34.69504555207305...",207,651-0094,JP,神戸市,兵庫県,日本,"[中央区琴ノ緒町5-3-5 (グリーンシャポービル 1F-2F), 神戸市, 兵庫県, 65...",,4b7bbb6df964a5203f6c2fe3
1,Juban (十番 三ノ宮店),Donburi Restaurant,中央区琴ノ緒町4-1-396,高架下,34.695909,135.195769,"[{'label': 'display', 'lat': 34.695909, 'lng':...",288,651-0094,JP,神戸市,兵庫県,日本,"[中央区琴ノ緒町4-1-396 (高架下), 神戸市, 兵庫県, 651-0094, 日本]",,4bc6fb706501c9b60e8c3d29
2,ビゴの店,Bakery,中央区御幸通8-1-6,神戸国際会館 B2F,34.691689,135.195518,"[{'label': 'display', 'lat': 34.69168869768715...",185,651-0087,JP,神戸市,兵庫県,日本,"[中央区御幸通8-1-6 (神戸国際会館 B2F), 神戸市, 兵庫県, 651-0087,...",,4b9b3a42f964a52008fb35e3
3,CAFE KESHiPEARL,Café,中央区御幸通6-1-25,もものき三宮ビル 2F,34.693536,135.197889,"[{'label': 'display', 'lat': 34.69353586057077...",238,651-0087,JP,神戸市,兵庫県,日本,"[中央区御幸通6-1-25 (もものき三宮ビル 2F), 神戸市, 兵庫県, 651-008...",,4f3550da0cd6e71afa08b63c
4,Kobe Kokusai Hall (こくさいホール),Concert Hall,中央区御幸通8-1-6,神戸国際会館 2F,34.692156,135.195726,"[{'label': 'display', 'lat': 34.69215640015317...",137,651-0087,JP,神戸市,兵庫県,日本,"[中央区御幸通8-1-6 (神戸国際会館 2F), 神戸市, 兵庫県, 651-0087, 日本]",,4d0092f90457b1f7fc053278
5,モロゾフ 神戸本店,Café,中央区三宮町1-8-1,,34.69186,135.193589,"[{'label': 'display', 'lat': 34.69186013890782...",227,650-0021,JP,神戸市,兵庫県,日本,"[中央区三宮町1-8-1, 神戸市, 兵庫県, 650-0021, 日本]",,4e5f2b4bd4c08cf7f597b98c
6,グリル一平 三宮店,Yoshoku Restaurant,中央区琴ノ緒町5-5-26,,34.69592,135.194961,"[{'label': 'display', 'lat': 34.69592032388635...",288,651-0094,JP,神戸市,兵庫県,日本,"[中央区琴ノ緒町5-5-26, 神戸市, 兵庫県, 651-0094, 日本]",,4e5647011495eb38e2057cb7
7,Baan Thai (バーンタイ),Thai Restaurant,中央区北長狭通1-8-8,しんせい堂ビル 1F,34.693155,135.191862,"[{'label': 'display', 'lat': 34.69315470395841...",314,650-0012,JP,神戸市,兵庫県,日本,"[中央区北長狭通1-8-8 (しんせい堂ビル 1F), 神戸市, 兵庫県, 650-0012...",,4d73496227ddb60c8fa9da1b
8,スタンドGONTA,Gastropub,中央区北長狭通1-31-26,,34.692407,135.192631,"[{'label': 'display', 'lat': 34.69240697352583...",265,650-0012,JP,神戸市,兵庫県,日本,"[中央区北長狭通1-31-26, 神戸市, 兵庫県, 650-0012, 日本]",,51ea2a8f498ed094a339ccab
9,たこ焼たちばな さんプラザ店,Takoyaki Place,中央区三宮町1-8-1,,34.691845,135.193055,"[{'label': 'display', 'lat': 34.69184497177562...",264,,JP,神戸市,兵庫県,日本,"[中央区三宮町1-8-1, 神戸市, 兵庫県, 日本]",三宮,4cf32b22e3b9a0930f924853


In [92]:
kobe_venues_map = folium.Map(location=[kobe_latitude, kobe_longitude], zoom_start=15) # generate map centred on Sannomiya Station


# add Sannomiya Station as a red circle marker
folium.CircleMarker(
    [kobe_latitude, kobe_longitude],
    radius=10,
    popup='Sannomiya',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(kobe_venues_map)


# add the 10 most popular venues to the map as blue circle markers
kobe_top10 = kobe_dataframe_filtered.head(10)
for lat, lng, label in zip(kobe_top10.lat, kobe_top10.lng, kobe_top10.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(kobe_venues_map)

# display map
kobe_venues_map

### Stage 2 - Model Building and Comparison <a name="Stage 2 - Manual Interpretation, Model Building and Comparison"></a>

In [1]:
# Compare the two cities using K-means clustering on the Stage 1 data to generate the 10 most popular venues

#### Umeda Station, Osaka, Japan Analysis (K-means) <a name="Umeda Station, Osaka, Japan Analysis (K-means) "></a>

In [93]:
# one-hot encode the venue categories
osaka_onehot = pd.get_dummies(osaka_dataframe_filtered[['categories']], prefix="", prefix_sep="")

# add the city column back to the dataframe
osaka_onehot['city'] = osaka_dataframe_filtered['city']

# move the city column to the first position
osaka_fixed_columns = [osaka_onehot.columns[-1]] + list(osaka_onehot.columns[:-1])
osaka_onehot = osaka_onehot[osaka_fixed_columns]

osaka_onehot.head()

osaka_onehot.shape

osaka_grouped = osaka_onehot.groupby('city').mean().reset_index()
osaka_grouped

osaka_grouped.shape


num_top_venues = 25

for city in osaka_grouped['city']:
    print("----"+city+"----")
    temp = osaka_grouped[osaka_grouped['city'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----大阪市----
                        venue  freq
0      Okonomiyaki Restaurant  0.12
1   Japanese Curry Restaurant  0.12
2       Kushikatsu Restaurant  0.08
3                    Beer Bar  0.04
4                   Bookstore  0.04
5              Takoyaki Place  0.04
6       Sri Lankan Restaurant  0.04
7                 Record Shop  0.04
8             Nabe Restaurant  0.04
9               Movie Theater  0.04
10                      Hotel  0.04
11                 Hobby Shop  0.04
12                 Food Court  0.04
13          Electronics Store  0.04
14             Discount Store  0.04
15               Dessert Shop  0.04
16           Department Store  0.04
17                Coffee Shop  0.04
18                       Café  0.04
19           Toy / Game Store  0.04
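The one-hot/mean step turns each city's venue list into category frequencies; with `LIMIT = 25`, a frequency of 0.12 corresponds to 3 venues. Because each city is processed in its own frame, the two cities never share a table. A minimal sketch, on hypothetical rows, of stacking both cities first (for example with `pd.concat([osaka_dataframe_filtered, kobe_dataframe_filtered])`) so that their frequencies line up column by column:

```python
import pandas as pd

# Hypothetical rows; in the notebook these would be the 'city' and 'categories'
# columns of osaka_dataframe_filtered and kobe_dataframe_filtered stacked together.
venues = pd.DataFrame({
    "city": ["Osaka", "Osaka", "Osaka", "Osaka", "Kobe", "Kobe"],
    "categories": ["Café", "Bakery", "Café", "Bookstore", "Bakery", "Bakery"],
})

# one column per category, 1.0 where the venue belongs to that category
onehot = pd.get_dummies(venues["categories"], dtype=float)
onehot.insert(0, "city", venues["city"])

# mean of the one-hot columns = share of each city's venues in each category
freq = onehot.groupby("city").mean()
print(freq)
```

Categories that appear in only one city get a frequency of 0 in the other, so both rows have the same columns and can be fed to a single clustering or comparison step.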




In [94]:
def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in a grouped row."""
    row_categories = row.iloc[1:]  # skip the 'city' column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names: '1st', '2nd', '3rd', then '4th' onwards
columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Popular Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Popular Venue'.format(ind+1))

# create a new dataframe
osaka_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
osaka_neighborhoods_venues_sorted['city'] = osaka_grouped['city']

for ind in np.arange(osaka_grouped.shape[0]):
    osaka_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(osaka_grouped.iloc[ind, :], num_top_venues)

osaka_neighborhoods_venues_sorted.head()

Unnamed: 0,city,1st Most Popular Venue,2nd Most Popular Venue,3rd Most Popular Venue,4th Most Popular Venue,5th Most Popular Venue,6th Most Popular Venue,7th Most Popular Venue,8th Most Popular Venue,9th Most Popular Venue,10th Most Popular Venue
0,大阪市,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater
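Most categories above tie at 0.04, and `sort_values` uses a non-stable sort by default, so the order of the 4th to 10th places is not guaranteed and may change with the input order or the pandas/NumPy version. A minimal sketch that breaks ties alphabetically; the helper name is hypothetical and the alphabetical rule is an arbitrary but reproducible choice, not part of the original analysis. In the notebook the input would be a row without the city, e.g. `osaka_grouped.iloc[0, 1:]`:

```python
import pandas as pd

def top_categories(freq_row, n):
    """Return the n most frequent categories, breaking ties alphabetically for reproducibility."""
    ranked = (
        freq_row.rename("freq")          # Series name becomes the value column
        .rename_axis("category")         # index name becomes the label column
        .reset_index()
        .sort_values(["freq", "category"], ascending=[False, True])
    )
    return ranked["category"].head(n).tolist()

row = pd.Series({"Beer Bar": 0.04, "Okonomiyaki Restaurant": 0.12,
                 "Bookstore": 0.04, "Japanese Curry Restaurant": 0.12})
print(top_categories(row, 3))
```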


In [95]:
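# set number of clusters: osaka_grouped has a single row (one city), and KMeans
# needs at least as many rows as clusters, so k = 1 is the only valid choice here
kclusters = 1

osaka_grouped_clustering = osaka_grouped.drop(columns='city')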

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(osaka_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0], dtype=int32)
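With one row per city, `KMeans` has nothing to separate: scikit-learn requires at least as many samples as clusters, so `kclusters = 1` is forced and every label is 0. To make the mechanics concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the assign-then-update iteration K-means is built on. It is not scikit-learn's implementation: it uses the first k rows as initial centroids for determinism, whereas scikit-learn uses k-means++ initialisation (and, depending on the version, several restarts), so labels may differ on harder data:

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means: assign rows to the nearest centroid, then recompute centroids."""
    X = np.asarray(X, dtype=float)
    if k > len(X):
        raise ValueError(f"k={k} exceeds the number of rows ({len(X)})")
    centroids = X[:k].copy()  # deterministic init: first k rows (sklearn uses k-means++)
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its rows; keep it in place if it lost all rows
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centroids = lloyd_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
print(labels, centroids)
```

A single city row with `k = 1` simply gets label 0, which matches the `array([0])` output above.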

In [73]:
# add clustering labels
osaka_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

osaka_merged = osaka_dataframe_filtered

# join the top-venue table onto each venue row by city (each venue row already has its lat/lng)
osaka_merged = osaka_merged.join(osaka_neighborhoods_venues_sorted.set_index('city'), on='city')

osaka_merged.head() # check the last columns!

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,...,1st Most Popular Venue,2nd Most Popular Venue,3rd Most Popular Venue,4th Most Popular Venue,5th Most Popular Venue,6th Most Popular Venue,7th Most Popular Venue,8th Most Popular Venue,9th Most Popular Venue,10th Most Popular Venue
0,Indian Curry (インデアンカレー),Japanese Curry Restaurant,北区芝田1-3,阪急三番街 南館 B2F,34.705035,135.498755,"[{'label': 'display', 'lat': 34.70503520001439...",79,530-0012,JP,...,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater
1,Kyu Yam Tetsudou (旧ヤム鐵道),Japanese Curry Restaurant,北区梅田3-1-3,LUCUA B2F,34.703723,135.496843,"[{'label': 'display', 'lat': 34.70372271536428...",210,530-8217,JP,...,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater
2,Rilakkuma Store (リラックマストア),Hobby Shop,北区芝田1-1-3,阪急三番街 北館 1F,34.705529,135.49833,"[{'label': 'display', 'lat': 34.70552912959402...",37,530-0012,JP,...,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater
3,Shin-Umeda Shokudogai (新梅田食道街),Food Court,北区角田町9-26,,34.703826,135.497891,"[{'label': 'display', 'lat': 34.70382642951151...",171,530-0017,JP,...,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater
4,Honmiyake (本みやけ),Nabe Restaurant,北区芝田1-1-3,阪急三番街 南館 B2F,34.704814,135.498132,"[{'label': 'display', 'lat': 34.704814, 'lng':...",62,530-0012,JP,...,Okonomiyaki Restaurant,Japanese Curry Restaurant,Kushikatsu Restaurant,Beer Bar,Bookstore,Takoyaki Place,Sri Lankan Restaurant,Record Shop,Nabe Restaurant,Movie Theater


In [96]:
# create map (matplotlib.cm and matplotlib.colors are imported in the first cell)
osaka_map_clusters = folium.Map(location=[osaka_latitude, osaka_longitude], zoom_start=11)

# set color scheme: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; cluster labels start at 0, so they index the palette directly
for lat, lon, poi, cluster in zip(osaka_merged['lat'], osaka_merged['lng'], osaka_merged['city'], osaka_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(osaka_map_clusters)
       
osaka_map_clusters
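K-means labels start at 0, so a label can index a color list directly; the template's `rainbow[cluster-1]` only works with one cluster because index -1 wraps around to the last (and only) color. As a dependency-free alternative to `cm.rainbow` (swapped in here, not what the notebook uses), a sketch that builds one evenly spaced hue per cluster with the standard-library `colorsys` module; `cluster_palette` is a hypothetical helper:

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues, one per label 0..n_clusters-1."""
    palette = []
    for label in range(n_clusters):
        # hue in [0, 1); full saturation and brightness
        r, g, b = colorsys.hsv_to_rgb(label / n_clusters, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(3)
labels = [0, 2, 1, 0]
print([palette[label] for label in labels])  # index with the label itself, not label - 1
```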

#### Sannomiya Station, Kobe, Japan Analysis (K-means) <a name="Sannomiya Station, Kobe, Japan Analysis (K-means)"></a>

In [97]:
# one-hot encode the venue categories
kobe_onehot = pd.get_dummies(kobe_dataframe_filtered[['categories']], prefix="", prefix_sep="")

# add the city column back to the dataframe
kobe_onehot['city'] = kobe_dataframe_filtered['city']

# move the city column to the first position
fixed_columns = [kobe_onehot.columns[-1]] + list(kobe_onehot.columns[:-1])
kobe_onehot = kobe_onehot[fixed_columns]

kobe_onehot.head()

kobe_onehot.shape

kobe_grouped = kobe_onehot.groupby('city').mean().reset_index()
kobe_grouped

kobe_grouped.shape


num_top_venues = 10

for city in kobe_grouped['city']:
    print("----"+city+"----")
    temp = kobe_grouped[kobe_grouped['city'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----神戸市----
                       venue  freq
0                     Bakery  0.24
1                       Café  0.12
2             Ice Cream Shop  0.04
3            Thai Restaurant  0.04
4             Takoyaki Place  0.04
5                 Steakhouse  0.04
6         Seafood Restaurant  0.04
7        Japanese Restaurant  0.04
8  Japanese Curry Restaurant  0.04
9                 Hobby Shop  0.04
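A single-cluster K-means run does not by itself say how similar the two cities are. One simple complement, a different technique from K-means and named plainly here, is the cosine similarity between the cities' category-frequency vectors after aligning them on the union of categories. A sketch with hypothetical frequencies and a hypothetical helper name; in the notebook the inputs could be `osaka_grouped.iloc[0, 1:]` and `kobe_grouped.iloc[0, 1:]`:

```python
import numpy as np
import pandas as pd

def category_cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two category-frequency Series; missing categories count as 0."""
    categories = freq_a.index.union(freq_b.index)
    a = freq_a.reindex(categories, fill_value=0.0).to_numpy(dtype=float)
    b = freq_b.reindex(categories, fill_value=0.0).to_numpy(dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

osaka_freq = pd.Series({"Café": 0.5, "Bakery": 0.5})
kobe_freq = pd.Series({"Bakery": 1.0})
print(category_cosine_similarity(osaka_freq, kobe_freq))
```

A value of 1.0 means the two cities have the same category mix and 0.0 means they share no categories.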




In [98]:
# return_most_common_venues() is defined in the Umeda Station cell above

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names: '1st', '2nd', '3rd', then '4th' onwards
columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Popular Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Popular Venue'.format(ind+1))

# create a new dataframe
kobe_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
kobe_neighborhoods_venues_sorted['city'] = kobe_grouped['city']

for ind in np.arange(kobe_grouped.shape[0]):
    kobe_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(kobe_grouped.iloc[ind, :], num_top_venues)

kobe_neighborhoods_venues_sorted.head()

Unnamed: 0,city,1st Most Popular Venue,2nd Most Popular Venue,3rd Most Popular Venue,4th Most Popular Venue,5th Most Popular Venue,6th Most Popular Venue,7th Most Popular Venue,8th Most Popular Venue,9th Most Popular Venue,10th Most Popular Venue
0,神戸市,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop


In [99]:
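# set number of clusters: kobe_grouped has a single row (one city), and KMeans
# needs at least as many rows as clusters, so k = 1 is the only valid choice here
kclusters = 1

kobe_grouped_clustering = kobe_grouped.drop(columns='city')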

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kobe_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0], dtype=int32)

In [52]:
# add clustering labels
kobe_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

kobe_merged = kobe_dataframe_filtered

# join the top-venue table onto each venue row by city (each venue row already has its lat/lng)
kobe_merged = kobe_merged.join(kobe_neighborhoods_venues_sorted.set_index('city'), on='city')

kobe_merged.head() # check the last columns!

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,...,1st Most Popular Venue,2nd Most Popular Venue,3rd Most Popular Venue,4th Most Popular Venue,5th Most Popular Venue,6th Most Popular Venue,7th Most Popular Venue,8th Most Popular Venue,9th Most Popular Venue,10th Most Popular Venue
0,Nishimura's Coffee (にしむら珈琲店 三宮店),Coffee Shop,中央区琴ノ緒町5-3-5,グリーンシャポービル 1F-2F,34.695046,135.194363,"[{'label': 'display', 'lat': 34.69504555207305...",207,651-0094,JP,...,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop
1,Juban (十番 三ノ宮店),Donburi Restaurant,中央区琴ノ緒町4-1-396,高架下,34.695909,135.195769,"[{'label': 'display', 'lat': 34.695909, 'lng':...",288,651-0094,JP,...,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop
2,ビゴの店,Bakery,中央区御幸通8-1-6,神戸国際会館 B2F,34.691689,135.195518,"[{'label': 'display', 'lat': 34.69168869768715...",185,651-0087,JP,...,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop
3,CAFE KESHiPEARL,Café,中央区御幸通6-1-25,もものき三宮ビル 2F,34.693536,135.197889,"[{'label': 'display', 'lat': 34.69353586057077...",238,651-0087,JP,...,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop
4,Kobe Kokusai Hall (こくさいホール),Concert Hall,中央区御幸通8-1-6,神戸国際会館 2F,34.692156,135.195726,"[{'label': 'display', 'lat': 34.69215640015317...",137,651-0087,JP,...,Bakery,Café,Ice Cream Shop,Thai Restaurant,Takoyaki Place,Steakhouse,Seafood Restaurant,Japanese Restaurant,Japanese Curry Restaurant,Hobby Shop


In [100]:
# create map
kobe_map_clusters = folium.Map(location=[kobe_latitude, kobe_longitude], zoom_start=11)

# set color scheme: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; cluster labels start at 0, so they index the palette directly
for lat, lon, poi, cluster in zip(kobe_merged['lat'], kobe_merged['lng'], kobe_merged['city'], kobe_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(kobe_map_clusters)
       
kobe_map_clusters

### Stage 3 - Model Execution <a name="Stage 2 - Model Execution"></a>

In [8]:
# Test the K-means clustering model built in Stage 2 by extending the parameters to include:
# - more of the most popular venue categories
# - more of the most popular venues within each category
# - a search radius of 2,000 m
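Stage 3 repeats the Stage 1 request with different parameters, and the notebook does so by copying the whole cell for each station. A sketch of a parameterised helper that builds the same explore URL for any point and radius. `explore_items` and `fetch_json` are hypothetical names: `fetch_json` is any callable that takes a URL and returns parsed JSON (for example `lambda url: requests.get(url).json()`), which also lets the helper be tried offline with a stub. `urlencode` percent-encodes the comma in `ll` as `%2C`, which standard query parsing decodes back:

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def explore_items(lat, lng, radius, limit, client_id, client_secret, version, fetch_json):
    """Return the 'items' list of one /venues/explore response for a point and radius."""
    query = urlencode({
        "client_id": client_id, "client_secret": client_secret,
        "ll": f"{lat},{lng}", "v": version, "radius": radius, "limit": limit,
    })
    payload = fetch_json(f"{EXPLORE_URL}?{query}")
    return payload["response"]["groups"][0]["items"]

# Station coordinates geocoded earlier in this notebook
stations = {
    "Umeda": (34.7053631, 135.4979778),
    "Sannomiya": (34.6933427, 135.1952946),
}

def fake_fetch(url):
    """Offline stand-in for the API that echoes the requested URL back as a single venue."""
    return {"response": {"groups": [{"items": [{"venue": {"name": url}}]}]}}

# one request per (station, radius) pair: the Stage 1 radius and the Stage 3 radius
results = {
    (name, radius): explore_items(lat, lng, radius, 25, "ID", "SECRET", "20180604", fake_fetch)
    for name, (lat, lng) in stations.items()
    for radius in (1000, 2000)
}
print(sorted(results))
```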

#### Check on Shinagawa Station, Shinagawa, Japan  <a name="Check on Shinagawa Station, Shinagawa, Japan"></a>

In [101]:
## Get Coordinates of Shinagawa Station, Shinagawa, Japan

shinagawa_address = 'Shinagawa Station, Shinagawa, Japan'

geolocator = Nominatim(user_agent="foursquare_agent")
shinagawa_location = geolocator.geocode(shinagawa_address)
shinagawa_latitude = shinagawa_location.latitude
shinagawa_longitude = shinagawa_location.longitude
print('The latitude and longitude for Shinagawa Station is:', shinagawa_latitude, shinagawa_longitude)

The latitude and longitude for Shinagawa Station is: 35.6293681 139.73926908273415


In [102]:
# Retrieve the top 25 recommended venues within 2,000 m of Shinagawa Station, Japan
radius = 2000
LIMIT = 25
shinagawa_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, shinagawa_latitude, shinagawa_longitude, VERSION, radius, LIMIT)

shinagawa_results = requests.get(shinagawa_url).json()
shinagawa_items = shinagawa_results['response']['groups'][0]['items']
'There are {} venues around Shinagawa Station, Japan.'.format(len(shinagawa_items))

shinagawa_nearby_venues = json_normalize(shinagawa_items)  # flatten JSON

# get_category_type() is defined in the Umeda Station cell above

# keep the venue name, categories, location fields, and id
shinagawa_filtered_columns = ['venue.name', 'venue.categories'] + [col for col in shinagawa_nearby_venues.columns if col.startswith('venue.location.')] + ['venue.id']
shinagawa_dataframe_filtered = shinagawa_nearby_venues.loc[:, shinagawa_filtered_columns].copy()

# replace each category list with its primary category name
shinagawa_dataframe_filtered['venue.categories'] = shinagawa_dataframe_filtered.apply(get_category_type, axis=1)

# shorten column names, e.g. 'venue.location.lat' -> 'lat'
shinagawa_dataframe_filtered.columns = [col.split('.')[-1] for col in shinagawa_dataframe_filtered.columns]

shinagawa_dataframe_filtered.head(25)



| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | neighborhood | city | state | country | formattedAddress | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barbacoa | Brazilian Restaurant | 高輪4-10-18 | ウィング高輪WEST 3F | 35.628515 | 139.736955 | [{'label': 'display', 'lat': 35.6285145, 'lng'... | 229 | 108-0074 | JP | 高輪, 東京, 東京都 | 東京 | 東京都 | 日本 | [高輪4-10-18 (ウィング高輪WEST 3F), 港区, 東京都, 108-0074,... | 55530f91498eca9bfbb5737b |
| 1 | Rojiura (路地裏) | Sake Bar | 港南2-2-4 | | 35.629259 | 139.742355 | [{'label': 'display', 'lat': 35.62925904771133... | 279 | 108-0075 | JP | | 東京 | 東京都 | 日本 | [港南2-2-4, 港区, 東京都, 108-0075, 日本] | 4b864e99f964a520798531e3 |
| 2 | Strings by InterContinental Tokyo (ストリングスホテル東京... | Hotel | 港南2-16-1 | 品川イーストワンタワー 26F-32F | 35.627907 | 139.740726 | [{'label': 'display', 'lat': 35.6279075, 'lng'... | 209 | 108-8282 | JP | | 港区 | 東京都 | 日本 | [港南2-16-1 (品川イーストワンタワー 26F-32F), 港区, 東京都, 108-... | 4b1df999f964a520971624e3 |
| 3 | Saza Coffee (サザコーヒー) | Coffee Shop | 高輪3-26-27 | ecute品川 1F | 35.628469 | 139.739426 | [{'label': 'display', 'lat': 35.62846913172101... | 101 | 108-0074 | JP | | 東京 | 東京都 | 日本 | [高輪3-26-27 (ecute品川 1F), 港区, 東京都, 108-0074, 日本] | 4b8da688f964a5206f0633e3 |
| 4 | El Caliente | Mexican Restaurant | 港南2-18-1 | アトレ品川 4F | 35.629079 | 139.740651 | [{'label': 'display', 'lat': 35.629079, 'lng':... | 129 | 108-0075 | JP | | 東京 | 東京都 | 日本 | [港南2-18-1 (アトレ品川 4F), 港区, 東京都, 108-0075, 日本] | 517b9623498e982bac210a7a |
| 5 | Antenna America | Beer Bar | 港南2-18-1 | アトレ品川 3F | 35.629074 | 139.740635 | [{'label': 'display', 'lat': 35.62907378251761... | 127 | 108-0075 | JP | | 東京 | 東京都 | 日本 | [港南2-18-1 (アトレ品川 3F), 港区, 東京都, 108-0075, 日本] | 582aeb758f3b464e8309c081 |
| 6 | T・ジョイPRINCE品川 IMAXシアター | Movie Theater | 高輪4-10-30 | 品川プリンスホテル アネックスタワー 6F | 35.627634 | 139.735805 | [{'label': 'display', 'lat': 35.62763423300281... | 368 | 108-0074 | JP | 港区 | 東京 | 東京都 | 日本 | [高輪4-10-30 (品川プリンスホテル アネックスタワー 6F), 港区, 東京都, 1... | 5780cbb3498e2b2953661723 |
| 7 | Okonomiyaki Kiji (お好み焼 きじ) | Okonomiyaki Restaurant | 港南2-3-13 | 品川フロントビル 2F | 35.629213 | 139.744264 | [{'label': 'display', 'lat': 35.62921307281103... | 452 | 108-0075 | JP | | 東京 | 東京都 | 日本 | [港南2-3-13 (品川フロントビル 2F), 港区, 東京都, 108-0075, 日本] | 4d05884937036dcb5d4a17fb |
| 8 | Tsubame Grill (つばめグリル) | German Restaurant | 高輪4-10-26 | | 35.62769 | 139.737364 | [{'label': 'display', 'lat': 35.62768970765473... | 254 | 108-8474 | JP | 五反田 | 東京 | 東京都 | 日本 | [高輪4-10-26, 港区, 東京都, 108-8474, 日本] | 4b495939f964a5200f6e26e3 |
| 9 | Maxell Aqua Park Shinagawa (マクセル アクアパーク品川) | Aquarium | 高輪4-10-30 | 品川プリンスホテル内 | 35.62783 | 139.73509 | [{'label': 'display', 'lat': 35.62783, 'lng': ... | 415 | 108-8611 | JP | 高輪 | 東京 | 東京都 | 日本 | [高輪4-10-30 (品川プリンスホテル内), 港区, 東京都, 108-8611, 日本] | 4b7775a0f964a520bf9b2ee3 |

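The filtering cell above relies on `json_normalize` to flatten each item and on `get_category_type` to keep only the first listed category. A minimal stdlib-only sketch of the same idea follows, assuming the Foursquare v2 `explore` item shape visible in the output (`item['venue']['name']`, `item['venue']['categories'][i]['name']`, `item['venue']['location']['lat']`). The helper names `primary_category` and `flatten_items` are hypothetical, and the sample items are made up for illustration.

```python
from typing import Optional


def primary_category(venue: dict) -> Optional[str]:
    """Return the name of the first listed category, or None if there are none."""
    categories = venue.get("categories", [])
    return categories[0]["name"] if categories else None


def flatten_items(items: list) -> list:
    """Flatten explore-style items into one flat dict per venue (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        rows.append({
            "name": venue.get("name"),
            "categories": primary_category(venue),
            "lat": location.get("lat"),
            "lng": location.get("lng"),
            "distance": location.get("distance"),
            "id": venue.get("id"),
        })
    return rows


# Made-up items shaped like the explore response (not real API output)
sample_items = [
    {"venue": {"id": "v1", "name": "Example Cafe",
               "categories": [{"name": "Coffee Shop"}, {"name": "Bakery"}],
               "location": {"lat": 35.0, "lng": 135.0, "distance": 120}}},
    {"venue": {"id": "v2", "name": "Unlabelled Spot", "categories": [],
               "location": {"lat": 35.1, "lng": 135.1, "distance": 300}}},
]

for row in flatten_items(sample_items):
    print(row["name"], "->", row["categories"])
```

Only the first category is kept, matching the notebook's choice; venues with an empty category list get `None`.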

In [103]:
shinagawa_venues_map = folium.Map(location=[shinagawa_latitude, shinagawa_longitude], zoom_start=15) # generate map centred around Shinagawa Station


# add Shinagawa Station as a red circle marker
folium.CircleMarker(
    [shinagawa_latitude, shinagawa_longitude],
    radius=10,
    popup='Shinagawa Station',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(shinagawa_venues_map)


# add the top 25 venues to the map as blue circle markers, labelled by category
top_venues = shinagawa_dataframe_filtered.head(25)
for lat, lng, label in zip(top_venues.lat, top_venues.lng, top_venues.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(shinagawa_venues_map)

# display map
shinagawa_venues_map

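The map shows that venues cluster around the station, but it does not confirm numerically that each venue lies inside the intended search radius. The Introduction specifies 1000 m, while the Sapporo request further down uses `radius = 2000`, so an explicit check can be useful. The sketch below computes the great-circle (haversine) distance from each venue's `lat`/`lng`; it assumes a mean Earth radius of 6,371,008.8 m, and the station and venue coordinates in the usage lines are placeholders rather than the geocoded values from this notebook. `haversine_m` and `outside_radius` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def outside_radius(station, venues, radius_m):
    """Return the (name, lat, lng) venues lying farther than radius_m from the station."""
    return [v for v in venues if haversine_m(station[0], station[1], v[1], v[2]) > radius_m]


# Placeholder coordinates, not the geocoded station used in this notebook
station = (35.0, 135.0)
venues = [("near", 35.001, 135.0), ("far", 35.02, 135.0)]
print(outside_radius(station, venues, 1000))
```

The Foursquare `distance` column already reports metres from the query point, so comparing it against `haversine_m` is also a quick consistency check.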
In [27]:
# one-hot encode the primary category of each venue
shinagawa_onehot = pd.get_dummies(shinagawa_dataframe_filtered[['categories']], prefix="", prefix_sep="", dtype=int)

# add the city column to the one-hot dataframe
shinagawa_onehot['city'] = shinagawa_dataframe_filtered['city']

# move the city column to the first position
shinagawa_fixed_columns = [shinagawa_onehot.columns[-1]] + list(shinagawa_onehot.columns[:-1])
shinagawa_onehot = shinagawa_onehot[shinagawa_fixed_columns]

shinagawa_onehot.head()

shinagawa_onehot.shape

shinagawa_grouped = shinagawa_onehot.groupby('city').mean().reset_index()
shinagawa_grouped

shinagawa_grouped.shape


num_top_venues = 25

for city in shinagawa_grouped['city']:
    print("----"+city+"----")
    temp = shinagawa_grouped[shinagawa_grouped['city'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----東京----
                        venue  freq
0                 Coffee Shop  0.08
1                      Bakery  0.08
2                    Aquarium  0.04
3   Japanese Curry Restaurant  0.04
4                  Steakhouse  0.04
5          Seafood Restaurant  0.04
6                    Sake Bar  0.04
7                    Pie Shop  0.04
8      Okonomiyaki Restaurant  0.04
9               Movie Theater  0.04
10         Mexican Restaurant  0.04
11        Japanese Restaurant  0.04
12          Indian Restaurant  0.04
13                      Hotel  0.04
14              Grocery Store  0.04
15          German Restaurant  0.04
16                  Gastropub  0.04
17                       Café  0.04
18                     Buffet  0.04
19       Brazilian Restaurant  0.04
20                   Beer Bar  0.04
21           Sushi Restaurant  0.04


----港区----
                        venue  freq
0                       Hotel   1.0
1                    Aquarium   0.0
2   Japanese Curry Restaurant   0.0
3   

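One-hot encoding followed by `groupby('city').mean()` yields, for each `city` value, the share of that group's venues falling in each category. Note that the grouping key is Foursquare's `city` field, which is populated inconsistently around Shinagawa: most venues report `東京` while at least one reports `港区`, so the station area is split into two groups. Because each group's shares are counts divided by that group's venue count, they sum to 1, which is why the `港区` group, whose venues are all hotels, shows `Hotel 1.0` and `0.0` elsewhere. The stdlib sketch below reproduces the same shares with `collections.Counter`; `category_shares` is a hypothetical helper and the `(city, category)` pairs are made up.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each city to {category: share of that city's venues}.

    Equivalent in spirit to one-hot encoding followed by groupby(city).mean().
    """
    counts = defaultdict(Counter)
    for city, category in pairs:
        counts[city][category] += 1
    shares = {}
    for city, counter in counts.items():
        total = sum(counter.values())
        shares[city] = {cat: n / total for cat, n in counter.items()}
    return shares


# Made-up (city, category) pairs for illustration only
pairs = [("A", "Cafe"), ("A", "Cafe"), ("A", "Bakery"), ("A", "Hotel"), ("B", "Hotel")]
shares = category_shares(pairs)
print(sorted(shares["A"].items(), key=lambda kv: kv[1], reverse=True))
```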
In [104]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Popular Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Popular Venue'.format(ind+1))

# create a new dataframe
shinagawa_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
shinagawa_neighborhoods_venues_sorted['city'] = shinagawa_grouped['city']

for ind in np.arange(shinagawa_grouped.shape[0]):
    shinagawa_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(shinagawa_grouped.iloc[ind, :], num_top_venues)

shinagawa_neighborhoods_venues_sorted.head()

| | city | 1st Most Popular Venue | 2nd Most Popular Venue | 3rd Most Popular Venue | 4th Most Popular Venue | 5th Most Popular Venue | 6th Most Popular Venue | 7th Most Popular Venue | 8th Most Popular Venue | 9th Most Popular Venue | 10th Most Popular Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 東京 | Coffee Shop | Bakery | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater |
| 1 | 港区 | Hotel | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater | Mexican Restaurant |


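The column-naming loop above hard-codes `['st', 'nd', 'rd']` and falls back to `th`. That is correct for the ten ranks used here, but with `num_top_venues = 25` (as in the earlier frequency printout) it would produce `21th` and `22th`. A small, hypothetical `ordinal` helper following the usual English rules (11th to 13th are exceptions) is sketched below, together with a hypothetical `top_venue_columns` that builds the same column list.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def top_venue_columns(num_top_venues: int) -> list:
    """Build the column names used for the ranked-venue table."""
    return ["city"] + [f"{ordinal(i)} Most Popular Venue" for i in range(1, num_top_venues + 1)]


print(top_venue_columns(3))
```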
In [105]:
# set number of clusters (one per city row: 東京 and 港区)
kclusters = 2

shinagawa_grouped_clustering = shinagawa_grouped.drop(columns='city')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(shinagawa_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1], dtype=int32)

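`shinagawa_grouped` has only two rows, one per `city` value, so k-means with `kclusters = 2` cannot discover structure: each row becomes its own cluster with zero within-cluster variance. The Sapporo run further down, with a single `札幌市` row and `kclusters = 1`, is trivial in the same way; meaningful clustering would need more rows than clusters. The pure-Python sketch of Lloyd's algorithm below is a simplified stand-in for scikit-learn's `KMeans` (it initialises centroids with the first k points instead of k-means++, and `kmeans` is a hypothetical name). It shows the k = n case on made-up vectors.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with the first k points as initial centroids.

    Returns (labels, centroids, inertia). Simplified sketch, not sklearn's implementation.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Two made-up "category share" vectors, mimicking two city rows
rows = [[0.08, 0.04, 0.0], [0.0, 0.0, 1.0]]
print(kmeans(rows, k=2))
```

With as many clusters as rows, the labels are simply `[0, 1]` and the inertia is zero, which mirrors the `array([0, 1])` result above.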
In [106]:
# add clustering labels
shinagawa_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

shinagawa_merged = shinagawa_dataframe_filtered

# join the cluster label and ranked-venue columns onto each venue row by its city value
shinagawa_merged = shinagawa_merged.join(shinagawa_neighborhoods_venues_sorted.set_index('city'), on='city')

shinagawa_merged.head(5) # check the last columns!

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | ... | 1st Most Popular Venue | 2nd Most Popular Venue | 3rd Most Popular Venue | 4th Most Popular Venue | 5th Most Popular Venue | 6th Most Popular Venue | 7th Most Popular Venue | 8th Most Popular Venue | 9th Most Popular Venue | 10th Most Popular Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barbacoa | Brazilian Restaurant | 高輪4-10-18 | ウィング高輪WEST 3F | 35.628515 | 139.736955 | [{'label': 'display', 'lat': 35.6285145, 'lng'... | 229 | 108-0074 | JP | ... | Coffee Shop | Bakery | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater |
| 1 | Rojiura (路地裏) | Sake Bar | 港南2-2-4 | | 35.629259 | 139.742355 | [{'label': 'display', 'lat': 35.62925904771133... | 279 | 108-0075 | JP | ... | Coffee Shop | Bakery | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater |
| 2 | Strings by InterContinental Tokyo (ストリングスホテル東京... | Hotel | 港南2-16-1 | 品川イーストワンタワー 26F-32F | 35.627907 | 139.740726 | [{'label': 'display', 'lat': 35.6279075, 'lng'... | 209 | 108-8282 | JP | ... | Hotel | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater | Mexican Restaurant |
| 3 | Saza Coffee (サザコーヒー) | Coffee Shop | 高輪3-26-27 | ecute品川 1F | 35.628469 | 139.739426 | [{'label': 'display', 'lat': 35.62846913172101... | 101 | 108-0074 | JP | ... | Coffee Shop | Bakery | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater |
| 4 | El Caliente | Mexican Restaurant | 港南2-18-1 | アトレ品川 4F | 35.629079 | 139.740651 | [{'label': 'display', 'lat': 35.629079, 'lng':... | 129 | 108-0075 | JP | ... | Coffee Shop | Bakery | Aquarium | Japanese Curry Restaurant | Steakhouse | Seafood Restaurant | Sake Bar | Pie Shop | Okonomiyaki Restaurant | Movie Theater |


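The merge cell uses `DataFrame.join(..., on='city')`, which is a left join: every venue row keeps its own columns and gains the `Cluster Labels` and ranked-venue columns of its `city` value. That is why rows 0, 1, 3 and 4 (city `東京`) share one ranking while row 2 (city `港区`) carries the hotel-led one; a venue whose `city` had no match would get `NaN` in the added columns. The stdlib sketch below mirrors those semantics with dictionaries. `left_join_on` is a hypothetical helper, the data are made up, and `None` stands in for `NaN`.

```python
def left_join_on(rows, lookup, key, columns):
    """Left-join each row to lookup[row[key]], filling None where the key has no match.

    A stdlib sketch of DataFrame.join(other.set_index(key), on=key), which would fill NaN.
    """
    joined = []
    for row in rows:
        extra = lookup.get(row.get(key), {})
        joined.append({**row, **{col: extra.get(col) for col in columns}})
    return joined


# Made-up venue rows and per-city rankings for illustration only
venues = [
    {"name": "Venue 1", "city": "A"},
    {"name": "Venue 2", "city": "B"},
    {"name": "Venue 3", "city": None},
]
rankings = {
    "A": {"Cluster Labels": 0, "1st Most Popular Venue": "Coffee Shop"},
    "B": {"Cluster Labels": 1, "1st Most Popular Venue": "Hotel"},
}
columns = ["Cluster Labels", "1st Most Popular Venue"]
for row in left_join_on(venues, rankings, "city", columns):
    print(row)
```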
In [107]:
# create map

import matplotlib.cm as cm
import matplotlib.colors as colors

shinagawa_map_clusters = folium.Map(location=[shinagawa_latitude, shinagawa_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(shinagawa_merged['lat'], shinagawa_merged['lng'], shinagawa_merged['city'], shinagawa_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(shinagawa_map_clusters)

shinagawa_map_clusters

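The cluster map needs one distinct color per cluster label. A stdlib-only sketch of the same idea follows: hues evenly spaced around the color wheel, with the palette indexed directly by label so that cluster 0 gets the first color. It uses `colorsys.hsv_to_rgb` instead of matplotlib's `rainbow` colormap, so the exact colors differ; `cluster_palette` and the saturation/value settings (0.85, 0.95) are illustrative assumptions.

```python
import colorsys


def cluster_palette(k: int) -> list:
    """Return k distinct hex colors with hues evenly spaced around the color wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / max(k, 1), 0.85, 0.95)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(2)
for label in [0, 1, 0]:
    print(label, palette[label])  # index by label directly, with no offset
```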
#### Check on Sapporo Station, Sapporo, Japan  <a name="Check on Sapporo Station, Sapporo, Japan"></a>

In [108]:
## Get Coordinates of Sapporo Station, Sapporo, Japan

sapporo_address = 'Sapporo Station, Sapporo, Japan'

geolocator = Nominatim(user_agent="foursquare_agent")
sapporo_location = geolocator.geocode(sapporo_address)
sapporo_latitude = sapporo_location.latitude
sapporo_longitude = sapporo_location.longitude
print('The latitude and longitude for Sapporo Station are:', sapporo_latitude, sapporo_longitude)

The latitude and longitude for Sapporo Station are: 43.06860365 141.35079914476853


In [109]:
# Identify the top 25 venues around Sapporo Station, Sapporo, Japan
sapporo_latitude = sapporo_location.latitude
sapporo_longitude = sapporo_location.longitude
radius = 2000  # search radius in metres
LIMIT = 25     # maximum number of venues returned
sapporo_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, sapporo_latitude, sapporo_longitude, VERSION, radius, LIMIT)
sapporo_url

sapporo_results = requests.get(sapporo_url).json()
'There are {} venues around Sapporo Station, Sapporo, Japan.'.format(len(sapporo_results['response']['groups'][0]['items']))

sapporo_items = sapporo_results['response']['groups'][0]['items']
sapporo_items[0]

sapporo_nearby_venues = pd.json_normalize(sapporo_items) # flatten JSON

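# same helper as in the Shinagawa section, redefined so this cell runs on its own
def get_category_type(row):
    """Return the name of a venue's first listed category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']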

# keep the venue name, category list, all location fields and the venue id
sapporo_filtered_columns = ['venue.name', 'venue.categories'] + [col for col in sapporo_nearby_venues.columns if col.startswith('venue.location.')] + ['venue.id']
sapporo_dataframe_filtered = sapporo_nearby_venues.loc[:, sapporo_filtered_columns].copy()

# keep only the primary (first) category name for each venue
sapporo_dataframe_filtered['venue.categories'] = sapporo_dataframe_filtered.apply(get_category_type, axis=1)

# clean columns: strip the 'venue.' and 'venue.location.' prefixes
sapporo_dataframe_filtered.columns = [col.split('.')[-1] for col in sapporo_dataframe_filtered.columns]

sapporo_dataframe_filtered.head(25)



| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | neighborhood | city | state | country | formattedAddress | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JR Tower Observatory T38 (JRタワー展望室 タワー・スリーエイト) | Scenic Lookout | 中央区北5条西2-5 | JRタワー 38F | 43.068179 | 141.352369 | [{'label': 'display', 'lat': 43.06817938448641... | 136 | 060-8503 | JP | 中央区 | 札幌市 | 北海道 | 日本 | [中央区北5条西2-5 (JRタワー 38F), 札幌市, 北海道, 060-8503, 日本] | 4ba4664df964a520db9a38e3 |
| 1 | Nemuro Hanamaru (回転寿司 根室花まる) | Sushi Restaurant | 中央区北5条西2-5 | 札幌ステラプレイス 6F | 43.067916 | 141.350547 | [{'label': 'display', 'lat': 43.067916, 'lng':... | 79 | 060-0005 | JP | | 札幌市 | 北海道 | 日本 | [中央区北5条西2-5 (札幌ステラプレイス 6F), 札幌市, 北海道, 060-0005... | 4b6d3cb6f964a520f16c2ce3 |
| 2 | Rikyu (牛たん炭焼 利久) | Japanese Restaurant | 北区北6条西2-1-7 | 札幌パセオ WEST B1F | 43.068355 | 141.349618 | [{'label': 'display', 'lat': 43.06835507085756... | 99 | 060-0806 | JP | | 札幌市 | 北海道 | 日本 | [北区北6条西2-1-7 (札幌パセオ WEST B1F), 札幌市, 北海道, 060-0... | 50ac7a73e4b02888e570ff02 |
| 3 | 味百仙 | Sake Bar | 北区北7条西4 | 宮澤興業ビル B1F | 43.068965 | 141.348575 | [{'label': 'display', 'lat': 43.068965, 'lng':... | 185 | 060-0807 | JP | | 札幌市 | 北海道 | 日本 | [北区北7条西4 (宮澤興業ビル B1F), 札幌市, 北海道, 060-0807, 日本] | 4b975ea3f964a520370135e3 |
| 4 | Books Kinokuniya (紀伊國屋書店) | Bookstore | 中央区北5条西5-7 | sapporo55 1F-2F | 43.06742 | 141.348364 | [{'label': 'display', 'lat': 43.06742, 'lng': ... | 237 | 060-0005 | JP | 札幌市 | 札幌市 | 北海道 | 日本 | [中央区北5条西5-7 (sapporo55 1F-2F), 札幌市, 北海道, 060-0... | 4b5726c5f964a520762828e3 |
| 5 | Tokachi Butadon Ippin (十勝豚丼 いっぴん) | Donburi Restaurant | 中央区北5条西2-5 | 札幌ステラプレイス 6F | 43.06807 | 141.351404 | [{'label': 'display', 'lat': 43.06806988507564... | 77 | 060-0005 | JP | | 札幌市 | 北海道 | 日本 | [中央区北5条西2-5 (札幌ステラプレイス 6F), 札幌市, 北海道, 060-0005... | 5468286d498e87b190fbbb9c |
| 6 | SKY J | Bar | 中央区北5条西2-5 | JRタワーホテル日航札幌 35F | 43.06813 | 141.352432 | [{'label': 'display', 'lat': 43.06812952775766... | 142 | 060-0005 | JP | | 札幌市 | 北海道 | 日本 | [中央区北5条西2-5 (JRタワーホテル日航札幌 35F), 札幌市, 北海道, 060-... | 4bca74b70687ef3b2c30dccc |
| 7 | Tonkatsu Tamafuji (とんかつ玉藤) | Tonkatsu Restaurant | 中央区北5条西2-1 | 札幌エスタ 10F | 43.067482 | 141.35274 | [{'label': 'display', 'lat': 43.067482, 'lng':... | 201 | 060-0005 | JP | | 札幌市 | 北海道 | 日本 | [中央区北5条西2-1 (札幌エスタ 10F), 札幌市, 北海道, 060-0005, 日本] | 4e2113331838712abe6e83e8 |
| 8 | Curry House Colombo (カリーハウス コロンボ) | Japanese Curry Restaurant | 中央区北4条西4 | 札幌国際ビル B1F | 43.066001 | 141.350587 | [{'label': 'display', 'lat': 43.06600088601527... | 290 | 060-0003 | JP | 中央区 | 札幌市 | 北海道 | 日本 | [中央区北4条西4 (札幌国際ビル B1F), 札幌市, 北海道, 060-0003, 日本] | 4b626b9df964a520ec462ae3 |
| 9 | 175°DENO〜担担麺〜札幌駅北口店 | Noodle House | 中央区北7条西4-1-1 | 東カン札幌ビル 1F | 43.069043 | 141.349416 | [{'label': 'display', 'lat': 43.069043, 'lng':... | 122 | 060-0007 | JP | | 札幌市 | 北海道 | 日本 | [中央区北7条西4-1-1 (東カン札幌ビル 1F), 札幌市, 北海道, 060-0007... | 56383361cd102b4591fae138 |

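The request URL above is assembled with `str.format`, which does not escape parameter values. An alternative sketch builds the same query string with `urllib.parse.urlencode`. The parameter names (`client_id`, `client_secret`, `ll`, `v`, `radius`, `limit`) and the endpoint are copied from the URL above; `explore_url` is a hypothetical helper, and the credentials and version string in the usage lines are placeholders. The code only builds the string and does not contact the API.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, lat, lng, version, radius_m, limit):
    """Build the explore request URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "radius": radius_m,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and version string, not real values
url = explore_url("MY_ID", "MY_SECRET", 43.0686, 141.3508, "20210101", 1000, 25)
print(url)
print(parse_qs(urlparse(url).query)["ll"])  # the encoded comma decodes back to "lat,lng"
```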

In [110]:
sapporo_venues_map = folium.Map(location=[sapporo_latitude, sapporo_longitude], zoom_start=15) # generate map centred around Sapporo Station


# add Sapporo Station as a red circle marker
folium.CircleMarker(
    [sapporo_latitude, sapporo_longitude],
    radius=10,
    popup='Sapporo Station',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(sapporo_venues_map)


# add the top 25 venues to the map as blue circle markers, labelled by category
top_venues = sapporo_dataframe_filtered.head(25)
for lat, lng, label in zip(top_venues.lat, top_venues.lng, top_venues.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(sapporo_venues_map)

# display map
sapporo_venues_map

In [111]:
# one-hot encode the primary category of each venue
sapporo_onehot = pd.get_dummies(sapporo_dataframe_filtered[['categories']], prefix="", prefix_sep="", dtype=int)

# add the city column to the one-hot dataframe
sapporo_onehot['city'] = sapporo_dataframe_filtered['city']

# move the city column to the first position
sapporo_fixed_columns = [sapporo_onehot.columns[-1]] + list(sapporo_onehot.columns[:-1])
sapporo_onehot = sapporo_onehot[sapporo_fixed_columns]

sapporo_onehot.head()

sapporo_onehot.shape

sapporo_grouped = sapporo_onehot.groupby('city').mean().reset_index()
sapporo_grouped

sapporo_grouped.shape


num_top_venues = 25

for city in sapporo_grouped['city']:
    print("----"+city+"----")
    temp = sapporo_grouped[sapporo_grouped['city'] == city].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----札幌市----
                        venue  freq
0            Sushi Restaurant  0.12
1                         Bar  0.08
2                        Café  0.08
3         Tonkatsu Restaurant  0.08
4   Japanese Curry Restaurant  0.08
5                Noodle House  0.04
6               Shopping Mall  0.04
7              Scenic Lookout  0.04
8                    Sake Bar  0.04
9                 Pastry Shop  0.04
10        Japanese Restaurant  0.04
11                  Multiplex  0.04
12                  Bookstore  0.04
13                      Hotel  0.04
14               Gourmet Shop  0.04
15          Electronics Store  0.04
16         Donburi Restaurant  0.04
17               Dessert Shop  0.04
18              Train Station  0.04




In [112]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['city']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Popular Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Popular Venue'.format(ind+1))

# create a new dataframe
sapporo_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
sapporo_neighborhoods_venues_sorted['city'] = sapporo_grouped['city']

for ind in np.arange(sapporo_grouped.shape[0]):
    sapporo_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sapporo_grouped.iloc[ind, :], num_top_venues)

sapporo_neighborhoods_venues_sorted.head()

| | city | 1st Most Popular Venue | 2nd Most Popular Venue | 3rd Most Popular Venue | 4th Most Popular Venue | 5th Most Popular Venue | 6th Most Popular Venue | 7th Most Popular Venue | 8th Most Popular Venue | 9th Most Popular Venue | 10th Most Popular Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 札幌市 | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |


In [113]:
# set number of clusters (only one city row, 札幌市, exists here)
kclusters = 1

sapporo_grouped_clustering = sapporo_grouped.drop(columns='city')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sapporo_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0], dtype=int32)

In [114]:
# add clustering labels
sapporo_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sapporo_merged = sapporo_dataframe_filtered

# join the cluster label and ranked-venue columns onto each venue row by its city value
sapporo_merged = sapporo_merged.join(sapporo_neighborhoods_venues_sorted.set_index('city'), on='city')

sapporo_merged.head(5) # check the last columns!

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | ... | 1st Most Popular Venue | 2nd Most Popular Venue | 3rd Most Popular Venue | 4th Most Popular Venue | 5th Most Popular Venue | 6th Most Popular Venue | 7th Most Popular Venue | 8th Most Popular Venue | 9th Most Popular Venue | 10th Most Popular Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JR Tower Observatory T38 (JRタワー展望室 タワー・スリーエイト) | Scenic Lookout | 中央区北5条西2-5 | JRタワー 38F | 43.068179 | 141.352369 | [{'label': 'display', 'lat': 43.06817938448641... | 136 | 060-8503 | JP | ... | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |
| 1 | Nemuro Hanamaru (回転寿司 根室花まる) | Sushi Restaurant | 中央区北5条西2-5 | 札幌ステラプレイス 6F | 43.067916 | 141.350547 | [{'label': 'display', 'lat': 43.067916, 'lng':... | 79 | 060-0005 | JP | ... | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |
| 2 | Rikyu (牛たん炭焼 利久) | Japanese Restaurant | 北区北6条西2-1-7 | 札幌パセオ WEST B1F | 43.068355 | 141.349618 | [{'label': 'display', 'lat': 43.06835507085756... | 99 | 060-0806 | JP | ... | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |
| 3 | 味百仙 | Sake Bar | 北区北7条西4 | 宮澤興業ビル B1F | 43.068965 | 141.348575 | [{'label': 'display', 'lat': 43.068965, 'lng':... | 185 | 060-0807 | JP | ... | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |
| 4 | Books Kinokuniya (紀伊國屋書店) | Bookstore | 中央区北5条西5-7 | sapporo55 1F-2F | 43.06742 | 141.348364 | [{'label': 'display', 'lat': 43.06742, 'lng': ... | 237 | 060-0005 | JP | ... | Sushi Restaurant | Bar | Café | Tonkatsu Restaurant | Japanese Curry Restaurant | Noodle House | Shopping Mall | Scenic Lookout | Sake Bar | Pastry Shop |


In [115]:
# create map

import matplotlib.cm as cm
import matplotlib.colors as colors

sapporo_map_clusters = folium.Map(location=[sapporo_latitude, sapporo_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(sapporo_merged['lat'], sapporo_merged['lng'], sapporo_merged['city'], sapporo_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(sapporo_map_clusters)

sapporo_map_clusters

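The Sapporo cells repeat the Shinagawa cells step for step with only the variable prefix changed, which makes copy-paste slips easy to introduce. One way to reduce that risk is to run the same steps for every station from a single mapping. The sketch below is hypothetical: `STATION_ITEMS` stands in for the per-station `...['response']['groups'][0]['items']` lists, `top_categories` and `summarize_stations` are illustrative names, and it covers only the top-category step in stdlib code, not the mapping or clustering.

```python
from collections import Counter


def top_categories(items, n=10):
    """Return the n most common primary categories in explore-style items."""
    counts = Counter()
    for item in items:
        categories = item["venue"].get("categories", [])
        if categories:  # venues without a category are not counted toward any category
            counts[categories[0]["name"]] += 1
    return counts.most_common(n)


def summarize_stations(station_items, n=10):
    """Apply the same summary to every station instead of copying cells per station."""
    return {station: top_categories(items, n) for station, items in station_items.items()}


# Hypothetical stand-in data; in the notebook these would be the API 'items' lists
STATION_ITEMS = {
    "Station A": [{"venue": {"categories": [{"name": "Cafe"}]}},
                  {"venue": {"categories": [{"name": "Cafe"}]}},
                  {"venue": {"categories": [{"name": "Bar"}]}}],
    "Station B": [{"venue": {"categories": []}},
                  {"venue": {"categories": [{"name": "Hotel"}]}}],
}

for station, ranking in summarize_stations(STATION_ITEMS, n=3).items():
    print(station, ranking)
```

Keeping one function per step and looping over stations means a fix (for example, to the station marker coordinates) only has to be made once.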
## Analysis <a name="analysis"></a>

## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>