# Tokyo analysis - Where to set up a new business?

## Introduction

### 1. Background and problem description:

Tokyo, officially the Tokyo Metropolis, is the capital of Japan and the most populous prefecture in the country. It is home to around 13,960,236 people, most of whom live in its 23 special wards [[1]](https://en.wikipedia.org/wiki/Tokyo). With so many people concentrated in one metropolis, its population density reaches around 6,363 people per square kilometer [[2]](https://www.metro.tokyo.lg.jp/tosei/hodohappyo/press/2021/01/28/01.html).

Densely populated areas tend to create highly diversified market demand for food and other catering services. This can easily become a double-edged sword. On the one hand, successful businesses can grow and expand at a faster pace. On the other hand, businesses face added pressure to keep up with world trends and cater to new customer needs in order to outcompete a large number of rivals. New businesses have an even harder time entering this already established ecosystem.

*Location, location, location.* **Where should one start?** 
* From a shop owner's perspective, a good starting point would be a densely populated area with, hopefully, lower land costs and, even more hopefully, less direct competition.
* From an investor's perspective, the same information can offer insight into a business's potential longevity and the challenges it faces from competition in the short to mid term.

This project aims to provide a solution that relies heavily on easy-to-read data visualizations, so that both new shop owners and investors can quickly gain insight into viable new opportunities.

### 2. Data description:

**Data Plan:**

1) Initially, a potential ward of interest will be shortlisted based on population density and land price factors;

2) From this, its respective boroughs will be analysed in terms of common venues (indirect/direct competitors and potential synergies with other businesses);

3) Lastly, the land price values for each borough will be overlaid on a map together with the common-venue clustering information, to further narrow down the potential areas for setting up a new business.

The required information will be extracted from publicly available resources:
* Wards Density information; [[3]](https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards)
* Tokyo Land market value list; [[4]](https://utinokati.com/en/details/land-market-value/area/Tokyo/)
* ZIP codes within Tokyo; [[5]](https://japan-postcode.810popo.net/tokyoto/)
* Location coordinates using Geocoder Python Package; [[6]](https://geocoder.readthedocs.io/)
* **Foursquare location** to extract respective borough information. [[7]](https://foursquare.com/)

## Methodology

* Required libraries to run the notebook:

In [1]:
import requests
import numpy as np
import pandas as pd

import plotly.express as go # Note: plotly.express is more commonly aliased as px; 'go' usually refers to plotly.graph_objects
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium
import geocoder
from geopy.geocoders import Nominatim

from bs4 import BeautifulSoup

* Web-scrape data for Tokyo's wards:
    1. Name & density
    2. Average land price (JPY per square meter)

In [2]:
# Ward name and density
url = 'https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'})

ward_info = pd.read_html(str(table))[0]

ward_info.drop(ward_info.columns[[0,1,4,6,7]], axis = 1, inplace = True) # Remove extra columns from the original table

In [3]:
ward_info.head(5)

| | Name | Kanji | Density(/km2) |
|---|---|---|---|
| 0 | Chiyoda | 千代田区 | 5100 |
| 1 | Chūō | 中央区 | 14460 |
| 2 | Minato | 港区 | 12180 |
| 3 | Shinjuku | 新宿区 | 18620 |
| 4 | Bunkyō | 文京区 | 19790 |
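
Dropping columns by position, as in the scraping cell above, depends on the exact layout of the Wikipedia table at the time of scraping. As a hedged alternative, the sketch below keeps columns by name and fails loudly if one is missing. The `keep_columns` helper and the extra `No.` and `Flag` columns in the stand-in table are assumptions made for illustration; only `Name`, `Kanji`, and `Density(/km2)` come from the output shown above.

```python
import pandas as pd


def keep_columns(df: pd.DataFrame, wanted: list) -> pd.DataFrame:
    """Return a copy of df with only the wanted columns (hypothetical helper)."""
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        # Fail early if the scraped table layout has changed
        raise KeyError(f"Expected columns not found: {missing}")
    return df.loc[:, wanted].copy()


# Stand-in for the scraped table; 'No.' and 'Flag' are placeholder columns
scraped = pd.DataFrame({
    "No.": ["01", "02"],
    "Flag": ["", ""],
    "Name": ["Chiyoda", "Chūō"],
    "Kanji": ["千代田区", "中央区"],
    "Density(/km2)": [5100, 14460],
})

ward_info_by_name = keep_columns(scraped, ["Name", "Kanji", "Density(/km2)"])
print(ward_info_by_name)
```

Selecting by name keeps the cell working if columns are reordered, and turns a silent mis-selection into an explicit error if one is renamed.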


In [4]:
# Average land price per ward
url = 'https://utinokati.com/en/details/land-market-value/area/Tokyo/'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id':'region_overview'})

ward_price = pd.read_html(str(table))[0]

# Extra cleaning steps with the dataframe
ward_price.drop(ward_price.columns[[1,3]], axis = 1, inplace = True) # Remove added columns within the dataframe
ward_price.drop(ward_price.index[23:len(ward_price)], axis = 0, inplace = True) # Remove cities information
ward_price['Average Unit Price'] = ward_price['Average Unit Price'].str.replace('JPY/sq.m','', regex = False) # Literal match: with regex = True the '.' would match any character
ward_price['Average Unit Price'] = ward_price['Average Unit Price'].str.replace(',','', regex = False)
ward_price = ward_price.rename(columns = {'Average Unit Price': 'Average Price(JPY/sq.m)'})

In [5]:
ward_price.head(5)

| | Area | Average Price(JPY/sq.m) |
|---|---|---|
| 0 | Chiyoda-Ku | 2839779 |
| 1 | Chuo-Ku | 1876597 |
| 2 | Minato-Ku | 2075876 |
| 3 | Shinjuku-Ku | 875098 |
| 4 | Bunkyo-Ku | 952715 |
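
The two `str.replace` calls used to clean the price column can also be collapsed into a single pass that keeps only the digits. This is a minimal sketch, assuming the raw cells look like `2,839,779JPY/sq.m` (the exact raw format on the source page is an assumption here); `clean_price` is a hypothetical helper name.

```python
import pandas as pd


def clean_price(column: pd.Series) -> pd.Series:
    """Strip everything except digits, then convert to a numeric dtype."""
    digits_only = column.str.replace(r"[^0-9]", "", regex=True)
    # errors="coerce" turns anything that is still not a number into NaN
    return pd.to_numeric(digits_only, errors="coerce")


# Assumed raw format; the numbers match the cleaned output shown above
raw_prices = pd.Series(["2,839,779JPY/sq.m", "875,098JPY/sq.m"],
                       name="Average Unit Price")
clean_prices = clean_price(raw_prices)
print(clean_prices)
```

Note that this approach would also swallow any digit that is part of a unit label (for example `m2`), so it only fits a format like the one assumed here.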


* Merge both dataframes (concatenated column-wise by position, which assumes both tables list the wards in the same order)

In [6]:
wards_df = pd.concat([ward_info, ward_price], axis = 1)
wards_df.drop(['Area'], axis = 1, inplace = True)
wards_df.head(5)

| | Name | Kanji | Density(/km2) | Average Price(JPY/sq.m) |
|---|---|---|---|---|
| 0 | Chiyoda | 千代田区 | 5100 | 2839779 |
| 1 | Chūō | 中央区 | 14460 | 1876597 |
| 2 | Minato | 港区 | 12180 | 2075876 |
| 3 | Shinjuku | 新宿区 | 18620 | 875098 |
| 4 | Bunkyō | 文京区 | 19790 | 952715 |
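
`pd.concat(..., axis = 1)` pairs rows purely by position, so it silently produces wrong pairs if the two sources ever list the wards in a different order. A hedged alternative is to join on a normalized ward name. The sketch below is an illustration under assumptions, not the notebook's method: `ward_key` is a hypothetical helper that strips macrons (`Chūō` becomes `chuo`) and the `-Ku` suffix, and the price table is deliberately shuffled to show that the order no longer matters.

```python
import unicodedata

import pandas as pd


def ward_key(name: str) -> str:
    """Build a join key: no diacritics, lower case, no '-ku' suffix."""
    # NFKD splits 'ū' into 'u' plus a combining macron, which is then dropped
    decomposed = unicodedata.normalize("NFKD", name)
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    if key.endswith("-ku"):
        key = key[:-3]
    return key


ward_info = pd.DataFrame({
    "Name": ["Chiyoda", "Chūō", "Bunkyō"],
    "Density(/km2)": [5100, 14460, 19790],
})
# Same values as above, but in a different row order
ward_price = pd.DataFrame({
    "Area": ["Bunkyo-Ku", "Chiyoda-Ku", "Chuo-Ku"],
    "Average Price(JPY/sq.m)": [952715, 2839779, 1876597],
})

merged = (
    ward_info.assign(key=ward_info["Name"].map(ward_key))
    .merge(
        ward_price.assign(key=ward_price["Area"].map(ward_key)),
        on="key",
        how="left",
        validate="one_to_one",  # raise if a key appears more than once
    )
    .drop(columns=["key", "Area"])
)
print(merged)
```

The key only handles the spelling differences visible in the two outputs above (macrons and the `-Ku` suffix); other romanization differences would need extra rules.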


In [7]:
wards_df.dtypes

Name                       object
Kanji                      object
Density(/km2)               int64
Average Price(JPY/sq.m)    object
dtype: object

In [8]:
# Convert the Average Price column from strings to a numeric dtype
wards_df['Average Price(JPY/sq.m)'] = pd.to_numeric(wards_df['Average Price(JPY/sq.m)'])
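
`pd.to_numeric` returns `int64` only when every value converts cleanly; as soon as one entry is missing, the column becomes `float64`, which is one likely reason prices can show up with a trailing `.0`. The sketch below illustrates this with a stand-in series (the `n/a` entry is a made-up placeholder, not a value from the scraped data) and shows pandas' nullable `Int64` dtype as one way to keep whole numbers.

```python
import pandas as pd

# Stand-in data: two prices from the table above plus a placeholder gap
raw = pd.Series(["2839779", "875098", "n/a"], name="Average Price(JPY/sq.m)")

# Unparseable entries become NaN, which forces a float64 result
as_float = pd.to_numeric(raw, errors="coerce")

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>
as_int = as_float.astype("Int64")

print(as_float.dtype, as_int.dtype)
print("Missing values:", int(as_int.isna().sum()))
```

Counting the missing values right after the conversion is a cheap way to notice rows that failed to parse before they affect later calculations.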

In [14]:
fig = go.scatter(data_frame = wards_df,
                 x = 'Density(/km2)',
                 y = 'Average Price(JPY/sq.m)',
                 color = 'Name')

fig.update_traces(marker = dict(size = 15))
fig.show()

Based on this scatter plot, Toshima is the most densely populated ward, while Chiyoda is the least densely populated yet by far the most expensive. Let's quickly check the density-to-price ratio:

In [25]:
wards_df['Density/Price ratio'] = wards_df['Density(/km2)']/wards_df['Average Price(JPY/sq.m)']
wards_df.sort_values(by = 'Density/Price ratio', ascending = False).head(10)

| | Name | Kanji | Density(/km2) | Average Price(JPY/sq.m) | Density/Price ratio |
|---|---|---|---|---|---|
| 17 | Arakawa | 荒川区 | 21030 | 475876.0 | 0.044192 |
| 20 | Adachi | 足立区 | 12660 | 293587.0 | 0.043122 |
| 21 | Katsushika | 葛飾区 | 12850 | 308694.0 | 0.041627 |
| 22 | Edogawa | 江戸川区 | 13750 | 334899.0 | 0.041057 |
| 18 | Itabashi | 板橋区 | 17670 | 432895.0 | 0.040818 |
| 19 | Nerima | 練馬区 | 15120 | 408850.0 | 0.036982 |
| 13 | Nakano | 中野区 | 21350 | 584593.0 | 0.036521 |
| 15 | Toshima | 豊島区 | 22650 | 678396.0 | 0.033388 |
| 16 | Kita | 北区 | 16740 | 526373.0 | 0.031803 |
| 6 | Sumida | 墨田区 | 18910 | 605382.0 | 0.031236 |
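
As a quick sanity check on the ratio, the arithmetic can be reproduced by hand for a few wards. The sketch below uses only density and price values that appear in the outputs above and needs no third-party packages; the selection of four wards is arbitrary and for illustration only.

```python
# (density in people/km2, average land price in JPY/sq.m), taken from the tables above
wards = {
    "Arakawa": (21030, 475876),
    "Adachi": (12660, 293587),
    "Toshima": (22650, 678396),
    "Chiyoda": (5100, 2839779),
}

# Higher ratio = more residents per square kilometer for each yen of land price
ratios = {name: density / price for name, (density, price) in wards.items()}

# Rank from most to least favorable ratio
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name:<8} {ratios[name]:.6f}")
```

The densest ward of the four (Toshima) does not rank first: Arakawa and Adachi reach a higher ratio because their land is considerably cheaper, which is the trade-off the ratio is meant to expose.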


## References

[1-5] - Accessed on 13/06/2021.