# Capstone Project - New Hotel Opportunity in Thailand
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

Tourism is an economic contributor to Thailand (with more than 20% of Thailand's GDP), Travel restriction due to COVID-19 pandemic will for sure causes tremendous shrink in Thailand's GDP. But this situation will not stay forever. When the pandemic is over, lots of traveller will begin thier journey to Thailand again. Therefore, this is a good opportunity to find an optimal location to start new hotel in Thailand.

There are lots of traveling province in Thailand. Start from mountainous area in Northern part like Chaing mai, Chiang rai down to the Southern part where tons of beautiful tropical beach lay there (suchas Koh samui : Surat Thaini, Krabi, Phuket). So in beginning part of our project we'll try to figure out which province will be focused based on thier traveling income per number of hotel rooms.

For focused province, we will try to detect **locations that are not already crowded with Hotels**. We are also particularly interested in **areas with filled with community and facilities**. We would also prefer location **as close to city center as possible**, assuming that first two conditions are met.

We will use our data science powers to generate a few most promissing neighborhoods based on this criteria. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decission are:
* traveling income per number of hotel rooms of each province
* number of existing hotels in the neighborhood 
* number of and distance to Italian restaurants in the neighborhood, if any
* distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

Following data sources will be needed to extract/generate the required information:
* Travelling income and number of hotel rooms by province from National Statistical Office of Thailand
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Google Maps API reverse geocoding**
* number of hotels and their type and location in every neighborhood will be obtained using **Foursquare API**
* coordinate of focused province city center will be obtained using **Google Maps API geocoding**

### Potential province 

Let's start by import data from National Statistical Office of Thailand

First of all, let's import required module for our work

In [1]:
!pip install xlrd



In [2]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import json
import urllib
import folium
print('Module sucessfully imported!')

Module sucessfully imported!


In [3]:
df = pd.read_excel('http://statbbi.nso.go.th/staticreport/Page/sector/TH/report/sector_17_19_TH_.xlsx')

Since excel files consist of lots of merged cells, Let's start by unmerged it and fill blank data with ffill method

In [4]:
df = df.iloc[2:,:]
df.columns = ['Area','Province','Main','Sub','Type','2552','2553','2554','2555','2556','2557','2558','2559','2560','2561']
df.reset_index()
df=df.iloc[1:,:]
df.fillna(method='ffill',inplace = True)
df.reset_index()
df.drop(df.index[2497], inplace = True)
df.set_index(['Province'],inplace = True)
df.head(20)

Unnamed: 0_level_0,Area,Main,Sub,Type,2552,2553,2554,2555,2556,2557,2558,2559,2560,2561
Province,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,รวม,124869207.0,156437103.0,174118400.0,198987500.0,217112400.0,227654100.0,249074200.0,265387100.0,289823300.0,303019200.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,ชาวไทย,97998957.0,122522114.0,133177700.0,150509400.0,161724700.0,170248100.0,185110300.0,198787600.0,217996600.0,227774100.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,ชาวต่างประเทศ,26870250.0,33914989.0,40940650.0,48478140.0,55387750.0,57405950.0,63963880.0,66599510.0,71826720.0,75245080.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,รวม,77235686.0,96933869.0,111575000.0,128115600.0,141849900.0,147408600.0,159191400.0,168971600.0,184094800.0,192475000.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,ชาวไทย,54698887.0,68463373.0,75698420.0,86413450.0,94130690.0,98902210.0,106841300.0,114552800.0,125471300.0,130867900.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,ชาวต่างประเทศ,22536799.0,28470496.0,35876580.0,41702110.0,47719170.0,48506420.0,52350070.0,54418830.0,58623500.0,61607050.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,รวม,47633521.0,59503234.0,62543390.0,70871940.0,75262580.0,80245420.0,89882860.0,96415460.0,105728500.0,110544300.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,ชาวไทย,43300070.0,54058741.0,57479310.0,64095910.0,67594000.0,71345890.0,78269050.0,84234780.0,92525300.0,96906240.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,ชาวต่างประเทศ,4333451.0,5444493.0,5064074.0,6776036.0,7668583.0,8899529.0,11613810.0,12180680.0,13203220.0,13638030.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,ระยะเวลาพำนักโดยเฉลี่ย (วัน),รวม,รวม,3.16838,3.11153,3.226885,3.26,3.2,3.13,3.1,3.15,3.11,3.07


Dataframe df contain too much information for our study, let's drop out unnecessary information

In [5]:
year_to_drop = list(map(str,range(2552,2561)))
year_to_drop

['2552', '2553', '2554', '2555', '2556', '2557', '2558', '2559', '2560']

In [6]:
df = df.drop(columns = year_to_drop)
df.head(20)

Unnamed: 0_level_0,Area,Main,Sub,Type,2561
Province,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,รวม,303019200.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,ชาวไทย,227774100.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,รวม,ชาวต่างประเทศ,75245080.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,รวม,192475000.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,ชาวไทย,130867900.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักท่องเที่ยว,ชาวต่างประเทศ,61607050.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,รวม,110544300.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,ชาวไทย,96906240.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,จำนวนผู้เยี่ยมเยือน,จำนวนนักทัศนาจร,ชาวต่างประเทศ,13638030.0
ทั่วราชอาณาจักร,ทั่วราชอาณาจักร,ระยะเวลาพำนักโดยเฉลี่ย (วัน),รวม,รวม,3.07


In [7]:
df = df[(df.Type == 'รวม')
#         |(df.Main == 'จำนวนผู้เยี่ยมเยือน') | 
#         (df.Main == r'ระยะเวลาพำนักโดยเฉลี่ย (วัน)') | 
#         (df.Main == r'รายได้การท่องเที่ยว (ล้านบาท)') | 
#         (df.Main == 'สถานประกอบการที่พักแรม' )
       ]

In [8]:
df['Overnight_visitor'] = np.where(df['Sub']=='จำนวนผู้ที่มาเข้าพัก', df['2561'], 0)
df['Nights'] = np.where(df['Main']=='ระยะเวลาพำนักโดยเฉลี่ย (วัน)', df['2561'], 0)
df['Travel_income'] = np.where(df['Main']=='รายได้การท่องเที่ยว (ล้านบาท)', df['2561'], 0)
df['Rooms'] = np.where(df['Sub']=='จำนวนห้อง', df['2561'], 0)
df.drop(columns=['Main','Sub','Type','2561'],inplace = True)

Finally, we'll get desired dataframe as follow

In [9]:
df = df.groupby(df.index).sum()
df.head(20)

Unnamed: 0_level_0,Overnight_visitor,Nights,Travel_income,Rooms
Province,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
กระบี่,4186576.0,4.41,115176.7,21853.0
กรุงเทพมหานคร,35810567.0,3.87,1040509.51,152616.0
กาญจนบุรี,3293965.0,2.29,26796.4,16451.0
กาฬสินธุ์,247539.0,2.54,1223.11,1109.0
กำแพงเพชร,442748.0,2.05,1631.7,2382.0
ขอนแก่น,2234818.0,2.63,17231.62,9676.0
จันทบุรี,1598231.0,2.19,8520.18,6227.0
ฉะเชิงเทรา,487443.0,1.8,4994.01,1506.0
ชลบุรี,14880369.0,3.4,264543.05,66532.0
ชัยนาท,281801.0,1.96,1321.59,927.0


In [10]:
df['Income/rooms/year'] = df['Travel_income']/df['Rooms']
df.sort_values('Province', ascending=False, inplace = True)

Let's drop the row that contain information for whole country

In [11]:
df.drop(index='ทั่วราชอาณาจักร', inplace = True)


Confirm row has been dropped out by dataframe shape

In [12]:
df.shape

(77, 5)

In [13]:
df.index

Index(['แม่ฮ่องสอน', 'แพร่', 'เลย', 'เพชรบูรณ์', 'เพชรบุรี', 'เชียงใหม่',
       'เชียงราย', 'อ่างทอง', 'อุบลราชธานี', 'อุทัยธานี', 'อุตรดิตถ์',
       'อุดรธานี', 'อำนาจเจริญ', 'หนองบัวลำภู', 'หนองคาย', 'สุโขทัย',
       'สุรินทร์', 'สุราษฎร์ธานี', 'สุพรรณบุรี', 'สิงห์บุรี', 'สระแก้ว',
       'สระบุรี', 'สมุทรสาคร', 'สมุทรสงคราม', 'สมุทรปราการ', 'สตูล', 'สงขลา',
       'สกลนคร', 'ศรีสะเกษ', 'ลำพูน', 'ลำปาง', 'ลพบุรี', 'ร้อยเอ็ด', 'ราชบุรี',
       'ระยอง', 'ระนอง', 'ยโสธร', 'ยะลา', 'มุกดาหาร', 'มหาสารคาม', 'ภูเก็ต',
       'พิษณุโลก', 'พิจิตร', 'พัทลุง', 'พังงา', 'พะเยา', 'พระนครศรีอยุธยา',
       'ปัตตานี', 'ปราจีนบุรี', 'ประจวบคีรีขันธ์', 'ปทุมธานี', 'บุรีรัมย์',
       'บึงกาฬ', 'น่าน', 'นราธิวาส', 'นนทบุรี', 'นครสวรรค์', 'นครศรีธรรมราช',
       'นครราชสีมา', 'นครพนม', 'นครปฐม', 'นครนายก', 'ตาก', 'ตราด', 'ตรัง',
       'ชุมพร', 'ชัยภูมิ', 'ชัยนาท', 'ชลบุรี', 'ฉะเชิงเทรา', 'จันทบุรี',
       'ขอนแก่น', 'กำแพงเพชร', 'กาฬสินธุ์', 'กาญจนบุรี', 'กรุงเทพมหานคร',
       'กระบี่'],
      d

#### Import  Thailand's geojson file to plot choropleth map

In [14]:
# download countries geojson file
!wget --quiet https://raw.githubusercontent.com/apisit/thailand.json/master/thailand.json -O thailand.json
    
print('GeoJSON file downloaded!')

GeoJSON file downloaded!


In [15]:
thailand_geo = r'thailand.json'
url = r'https://raw.githubusercontent.com/apisit/thailand.json/master/thailand.json'
json_url = urllib.request.urlopen(url)
data = json.loads(json_url.read())

In [16]:
Province_eng = list()
for feat in data['features']:
    Province_eng.append(feat['properties']['name'])

In [17]:
Province_eng.sort()
Province_eng

['Amnat Charoen',
 'Ang Thong',
 'Bangkok Metropolis',
 'Bueng Kan',
 'Buri Ram',
 'Chachoengsao',
 'Chai Nat',
 'Chaiyaphum',
 'Chanthaburi',
 'Chiang Mai',
 'Chiang Rai',
 'Chon Buri',
 'Chumphon',
 'Kalasin',
 'Kamphaeng Phet',
 'Kanchanaburi',
 'Khon Kaen',
 'Krabi',
 'Lampang',
 'Lamphun',
 'Loei',
 'Lop Buri',
 'Mae Hong Son',
 'Maha Sarakham',
 'Mukdahan',
 'Nakhon Nayok',
 'Nakhon Pathom',
 'Nakhon Phanom',
 'Nakhon Ratchasima',
 'Nakhon Sawan',
 'Nakhon Si Thammarat',
 'Nan',
 'Narathiwat',
 'Nong Bua Lam Phu',
 'Nong Khai',
 'Nonthaburi',
 'Pathum Thani',
 'Pattani',
 'Phangnga',
 'Phatthalung',
 'Phayao',
 'Phetchabun',
 'Phetchaburi',
 'Phichit',
 'Phitsanulok',
 'Phra Nakhon Si Ayutthaya',
 'Phrae',
 'Phuket',
 'Prachin Buri',
 'Prachuap Khiri Khan',
 'Ranong',
 'Ratchaburi',
 'Rayong',
 'Roi Et',
 'Sa Kaeo',
 'Sakon Nakhon',
 'Samut Prakan',
 'Samut Sakhon',
 'Samut Songkhram',
 'Saraburi',
 'Satun',
 'Si Sa Ket',
 'Sing Buri',
 'Songkhla',
 'Sukhothai',
 'Suphan Buri',
 

Mapping province name which only provided in Thai with English name according to GeoJSON file

In [18]:
Province = ['Mae Hong Son','Phrae','Loei','Phetchabun','Phetchaburi', 'Chiang Mai',
 'Chiang Rai','Ang Thong','Ubon Ratchathani','Uthai Thani','Uttaradit','Udon Thani','Amnat Charoen','Nong Bua Lam Phu',
 'Nong Khai','Sukhothai','Surin','Surat Thani','Suphan Buri', 'Sing Buri', 'Sa Kaeo', 'Saraburi','Samut Sakhon','Samut Songkhram', 'Samut Prakan',
            'Satun','Songkhla', 'Sakon Nakhon','Si Sa Ket', 'Lamphun','Lampang', 'Lop Buri', 'Roi Et', 'Ratchaburi',
 'Rayong', 'Ranong', 'Yasothon','Yala','Mukdahan', 'Maha Sarakham','Phuket','Phitsanulok','Phichit','Phatthalung', 'Phangnga', 'Phayao', 'Phra Nakhon Si Ayutthaya','Pattani', 'Prachin Buri', 'Prachuap Khiri Khan',
            'Pathum Thani', 'Buri Ram','Bueng Kan', 'Nan',
 'Narathiwat', 'Nonthaburi','Nakhon Sawan',
 'Nakhon Si Thammarat', 'Nakhon Ratchasima', 'Nakhon Phanom', 'Nakhon Pathom', 'Nakhon Nayok','Tak','Trat','Trang', 'Chumphon','Chaiyaphum',
           'Chai Nat', 'Chon Buri', 'Chachoengsao','Chanthaburi', 'Khon Kaen','Kamphaeng Phet','Kalasin','Kanchanaburi','Bangkok Metropolis', 'Krabi',]

In [19]:
df['Province_eng'] = Province
df.reset_index(inplace = True)
df.head(10)

Unnamed: 0,Province,Overnight_visitor,Nights,Travel_income,Rooms,Income/rooms/year,Province_eng
0,แม่ฮ่องสอน,1017154.0,2.23,5216.28,6156.0,0.847349,Mae Hong Son
1,แพร่,365047.0,1.89,1722.39,1714.0,1.004895,Phrae
2,เลย,1242447.0,2.26,4610.14,5933.0,0.777034,Loei
3,เพชรบูรณ์,2000241.0,2.3,7533.7,6332.0,1.189782,Phetchabun
4,เพชรบุรี,3895640.0,2.29,31574.46,11096.0,2.845571,Phetchaburi
5,เชียงใหม่,8360997.0,2.96,107625.32,36186.0,2.974225,Chiang Mai
6,เชียงราย,3142005.0,2.54,28617.71,17003.0,1.683098,Chiang Rai
7,อ่างทอง,183550.0,1.59,993.54,498.0,1.99506,Ang Thong
8,อุบลราชธานี,1428571.0,2.49,7999.25,4694.0,1.704144,Ubon Ratchathani
9,อุทัยธานี,375449.0,1.98,1397.39,2117.0,0.66008,Uthai Thani


#### Sort dataframe by Income/room/year to show potential province for investment

In [20]:
df['Income/room/year'] = df['Travel_income']/df['Rooms']
df.sort_values('Income/room/year', ascending = False, inplace = True)
df.reindex(columns = ['Province_eng','Province','Overnight_visitor','Nights','Travel_income','Rooms','Income/room/year'])

Unnamed: 0,Province_eng,Province,Overnight_visitor,Nights,Travel_income,Rooms,Income/room/year
75,Bangkok Metropolis,กรุงเทพมหานคร,35810567.0,3.87,1040509.51,152616.0,6.817827
40,Phuket,ภูเก็ต,12834961.0,4.18,449100.73,84707.0,5.301814
76,Krabi,กระบี่,4186576.0,4.41,115176.70,21853.0,5.270521
44,Phangnga,พังงา,1081049.0,5.27,52014.56,12356.0,4.209660
68,Chon Buri,ชลบุรี,14880369.0,3.40,264543.05,66532.0,3.976178
...,...,...,...,...,...,...,...
27,Sakon Nakhon,สกลนคร,697419.0,2.47,2407.00,3555.0,0.677075
9,Uthai Thani,อุทัยธานี,375449.0,1.98,1397.39,2117.0,0.660080
32,Roi Et,ร้อยเอ็ด,445989.0,2.36,1465.83,2280.0,0.642908
13,Nong Bua Lam Phu,หนองบัวลำภู,153007.0,2.25,418.29,660.0,0.633773


#### Create choropleth map visualizing potential province

In [21]:
# create a plain world map
Thailand_map = folium.Map(location=[13.03887, 101.490104], zoom_start=5)

In [24]:
# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
Thailand_map.choropleth(
    geo_data=thailand_geo,
    data=df,
    columns=['Province_eng', 'Income/room/year'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Traveling income per room per year (MB/rooms/year)'
)

# display map
Thailand_map

From above map and df, apart from Bangkok metropolis, potential provinces are spread along southern part of Thailand, the most potential province we'll focus on our further study will be **Phuket**

### Define Neighborhood Candidate
First, let's import List of postal codes of Phuket (Manually created)

In [25]:
df_phuket = pd.read_csv('Phuket_Postal.csv')

In [28]:
df_phuket = df_phuket.groupby(['Postal Code','Borough'])['Neighborhood'].apply(','.join).reset_index()
df_phuket

Unnamed: 0,Postal Code,Borough,Neighborhood
0,83000,Mueang Phuket,"Mueang Phuket,Tambon Chalong,Tambon Ko Kaeo,Ta..."
1,83100,Rawai,"Karon,Rawai,Tambon Karon,Tambon Rawai"
2,83110,Thalang,"Choeng Thale,Tambon Choeng Thale,Tambon Mai Kh..."
3,83120,Kathu,"Kathu,Tambon Kamala,Tambon Kathu,Tambon Patong"
4,83130,Chalong,Tambon Chalong
5,83140,Tambon Sa Khu,Tambon Sa Khu
6,83150,Patong,"Kathu,Patong,Tambon Patong"


Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest which is aprox. 12x12 killometers centered around Phuket city center

Let's first find the latitude & longitude of Phuket city center, using Arcgis geocoding API.

In [29]:
import geocoder #import geocoder

In [30]:
#Define get_latlng function to get geospatial location from arcgis (since google is not free now)
def get_latlng(postal_code):
    lati_long_coords = None
    while(lati_long_coords is None):
        g = geocoder.arcgis('{}, Phuket,Thailand'.format(postal_code))
        lati_long_coords = g.latlng
    return lati_long_coords

In [32]:
Postal = df_phuket['Postal Code'].tolist()
coords = [ get_latlng(postal_code) for postal_code in Postal ]

In [36]:
lat = list()
long = list()
for co in coords:
    lat.append(co[0])
    long.append(co[1])
#inser lat long into df
df_phuket['Latitude'] = lat
df_phuket['Longtitude'] = long

In [37]:
df_phuket.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longtitude
0,83000,Mueang Phuket,"Mueang Phuket,Tambon Chalong,Tambon Ko Kaeo,Ta...",7.89837,98.408274
1,83100,Rawai,"Karon,Rawai,Tambon Karon,Tambon Rawai",7.834206,98.300274
2,83110,Thalang,"Choeng Thale,Tambon Choeng Thale,Tambon Mai Kh...",8.053268,98.345658
3,83120,Kathu,"Kathu,Tambon Kamala,Tambon Kathu,Tambon Patong",7.91349,98.337731
4,83130,Chalong,Tambon Chalong,7.788775,98.336667


In [59]:
phuket_center = [df_phuket.Latitude[0],df_phuket.Longtitude[0]]

In [61]:
import warnings
warnings.filterwarnings('ignore')

In [62]:
# !pip install shapely
# import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Phuket center longitude={}, latitude={}'.format(phuket_center[1], phuket_center[0]))
x, y = lonlat_to_xy(phuket_center[1], phuket_center[0])
print('Phuket center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Phuket center longitude={}, latitude={}'.format(lo, la))


-------------------------------
Phuket center longitude=98.40827434079222, latitude=7.8983698143894685
Phuket center UTM X=15767742.232362216, Y=5885918.370668026
Phuket center longitude=98.40822121779158, latitude=7.898441900786866


Now let's create a grid of area candidates, equaly spaced, centered around city center and within ~6km from Phuket center. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To accurately calculate distances we need to create our grid of locations in Cartesian 2D coordinate system which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on Folium map. So let's create functions to convert between WGS84 spherical coordinate system (latitude/longitude degrees) and UTM Cartesian coordinate system (X/Y coordinates in  meters).

In [63]:
phuket_center_x, phuket_center_y = lonlat_to_xy(phuket_center[1], phuket_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = phuket_center_x - 6000
x_step = 600
y_min = phuket_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(phuket_center_x, phuket_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.


In [86]:
i=0

Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [96]:
map_phuket = folium.Map(location=phuket_center, zoom_start=13)
folium.Marker(phuket_center, popup='phuket city center').add_to(map_phuket)
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phuket) 
    folium.Circle([lat, lon], radius=55, color='blue', fill=False).add_to(map_phuket)
    folium.Marker([lat, lon]).add_to(map_phuket)
    i = i+1
map_phuket

In [88]:
i

364

OK, we now have the coordinates of centers of neighborhoods/areas to be evaluated, equally spaced (distance from every point to it's neighbors is exactly the same) and within ~6km from Alexanderplatz. 

Let's now use Google Maps API to get approximate addresses of those locations.

In [73]:
import getpass
#Credential of Arcgis API
username = getpass.getpass('Enter your username')
password = getpass.getpass('Enter your password')

Enter your username········
Enter your password········


In [74]:
!pip install arcgis

from arcgis.geocoding import reverse_geocode
from arcgis.gis import GIS
gis = GIS("https://www.arcgis.com", username, password)
print("Sucessful")



Sucessful


In [80]:
addr = reverse_geocode([phuket_center[1],phuket_center[0]],lang_code='En')

In [81]:
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(phuket_center[0], phuket_center[1], addr['address']['LongLabel']))

Reverse geocoding check
-----------------------
Address of [7.8983698143894685, 98.40827434079222] is: Thung Kha Construction Material Co., Ltd. Ratsadanuson Rd, Ratsada, Mueang Phuket, Phuket 83000, THA


In [82]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = reverse_geocode([lon,lat],lang_code='En')
    if address is None:
        address = 'NO ADDRESS'
#     address = address.replace(', Germany', '') # We don't need country part of address
    addresses.append(address['address']['LongLabel'])
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [83]:
addresses[150:170]

['83000, THA',
 '83000, THA',
 '83000, THA',
 'Plus Mansion Soi Ruam Phatthana 1, Ratsada, Mueang Phuket, Phuket 83000, THA',
 'Ruam Phatthana Rd, Ratsada, Mueang Phuket, Phuket 83000, THA',
 'Soi Muban Ban Rom Mai Chai Le, Ratsada, Mueang Phuket, Phuket 83000, THA',
 '79/21 Moo 3 Ruam Phatthana Rd, Ratsada, Mueang Phuket, Phuket 83000, THA',
 'Soi Muban Ban Rom Mai Chai Le, Ratsada, Mueang Phuket, Phuket 83000, THA',
 'Soi Muban Ban Rom Mai Chai Le, Ratsada, Mueang Phuket, Phuket 83000, THA',
 '83000, THA',
 '83000, THA',
 '83000, THA',
 '83000, THA',
 '83000, THA',
 'Ratsadanuson Rd, Ratsada, Mueang Phuket, Phuket 83000, THA',
 '2/99 Moo 3 Trang Rd, Ratsada, Mueang Phuket, Phuket 83000, THA',
 'Soi Muban Suwanna Khan 2/1, Ratsada, Mueang Phuket, Phuket 83000, THA',
 '83000, THA',
 '83000, THA',
 '83000, THA']

In [84]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"Ban Bang Ngua-Witthayalai Technic 2 Rd, Talat ...",7.895453,98.398684,15765940.0,5880203.0,5992.495307
1,"Ban Bang Ngua-Witthayalai Technic 2 Rd, Talat ...",7.894635,98.399257,15766540.0,5880203.0,5840.3767
2,"Ban Bang Ngua-Witthayalai Technic 2 Rd, Talat ...",7.893817,98.399831,15767140.0,5880203.0,5747.173218
3,"Damrong Rd, Talat Yai, Mueang Phuket, Phuket 8...",7.892999,98.400404,15767740.0,5880203.0,5715.767665
4,"Phuket Public Library Damrong Road, Ratsada, P...",7.892181,98.400978,15768340.0,5880203.0,5747.173218
5,"2/2 Kamnan Rd, Talat Yai, Mueang Phuket, Phuke...",7.891364,98.401551,15768940.0,5880203.0,5840.3767
6,"Narison Rd, Talat Yai, Mueang Phuket, Phuket 8...",7.890546,98.402124,15769540.0,5880203.0,5992.495307
7,"To Sae Rd, Ratsada, Mueang Phuket, Phuket 8300...",7.897176,98.398534,15765040.0,5880722.0,5855.766389
8,"Ban Bang Ngua-Witthayalai Technic 2 Rd, Talat ...",7.896358,98.399108,15765640.0,5880722.0,5604.462508
9,"Ban Bang Ngua-Witthayalai Technic 2 Rd, Talat ...",7.895539,98.399681,15766240.0,5880722.0,5408.326913


...and let's now save/persist this data into local file.

In [16]:
df_locations.to_pickle('./locations.pkl')    

### Foursquare
Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are proper restaurants - coffe shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in out list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Italian restaurant' category, as we need info on Italian restaurants in the neighborhood.

Foursquare credentials are defined in hidden cell bellow.

In [17]:
# The code was removed by Watson Studio for sharing.

In [18]:
# Category IDs corresponding to Italian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

hotel_category = '4bf58dd8d48988d1fa931735' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues