# IBM Applied Data Science Capstone Project

In this project, data science skills are applied to the following business problem: where in the city of Bangkok should a property developer build a new shopping mall?

The code in the sections that follow walks through how the final location was determined.

## Code

### Import pandas and NumPy Libraries

In [1]:
# import libraries
import pandas as pd
import numpy as np

### Retrieve Bangkok Data from Wikipedia

In [2]:
import requests

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_districts_of_Bangkok'
wiki_url = requests.get(url)
wiki_url

<Response [200]>

In [4]:
wiki_info = pd.read_html(wiki_url.text)
wiki_info

[          District(Khet)  MapNr  Post-code               Thai  Popu-lation  \
 0               Bang Bon     50      10150             บางบอน       105161   
 1              Bang Kapi      6      10240            บางกะปิ       148465   
 2              Bang Khae     40      10160              บางแค       191781   
 3              Bang Khen      5      10220             บางเขน       189539   
 4          Bang Kho Laem     31      10120          บางคอแหลม        94956   
 5        Bang Khun Thian     21      10150        บางขุนเทียน       165491   
 6                Bang Na     47      10260              บางนา        95912   
 7             Bang Phlat     25      10700            บางพลัด        99273   
 8               Bang Rak      4      10500             บางรัก        45875   
 9               Bang Sue     29      10800            บางซื่อ       132234   
 10           Bangkok Noi     20      10700         บางกอกน้อย       117793   
 11           Bangkok Yai     16      10600         

In [5]:
print(len(wiki_info)) # number of tables
print(type(wiki_info))

2
<class 'list'>


In [6]:
# obtain relevant table
wiki_info = wiki_info[0]

In [7]:
wiki_info.head()

Unnamed: 0,District(Khet),MapNr,Post-code,Thai,Popu-lation,No. ofSubdis-trictsKhwaeng,Latitude,Longitude
0,Bang Bon,50,10150,บางบอน,105161,4,13.6592,100.3991
1,Bang Kapi,6,10240,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,40,10160,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,5,10220,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,31,10120,บางคอแหลม,94956,3,13.693333,100.5025


In [8]:
# drop columns: MapNr, Thai, No. of Subdistricts Khwaeng
wiki_info_drop = wiki_info.drop(columns=['MapNr', 'Thai', 'No. ofSubdis-trictsKhwaeng'])
wiki_info_drop.head()

Unnamed: 0,District(Khet),Post-code,Popu-lation,Latitude,Longitude
0,Bang Bon,10150,105161,13.6592,100.3991
1,Bang Kapi,10240,148465,13.765833,100.647778
2,Bang Khae,10160,191781,13.696111,100.409444
3,Bang Khen,10220,189539,13.873889,100.596389
4,Bang Kho Laem,10120,94956,13.693333,100.5025


In [9]:
# rename columns
bangkok_data = wiki_info_drop.rename(columns = {"District(Khet)": "District", "Post-code": "Postal Code", 'Popu-lation': 'Population'})
bangkok_data.head()

Unnamed: 0,District,Postal Code,Population,Latitude,Longitude
0,Bang Bon,10150,105161,13.6592,100.3991
1,Bang Kapi,10240,148465,13.765833,100.647778
2,Bang Khae,10160,191781,13.696111,100.409444
3,Bang Khen,10220,189539,13.873889,100.596389
4,Bang Kho Laem,10120,94956,13.693333,100.5025
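
The drop and rename steps above can also be written as a single chain. The sketch below runs on a two-row stand-in for the Wikipedia table (values copied from the output above); the helper name `tidy_districts` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Two-row stand-in for the scraped table, with the original (hyphenated) headers.
raw = pd.DataFrame({
    "District(Khet)": ["Bang Bon", "Bang Kapi"],
    "MapNr": [50, 6],
    "Post-code": [10150, 10240],
    "Thai": ["บางบอน", "บางกะปิ"],
    "Popu-lation": [105161, 148465],
    "No. ofSubdis-trictsKhwaeng": [4, 2],
    "Latitude": [13.6592, 13.765833],
    "Longitude": [100.3991, 100.647778],
})


def tidy_districts(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns and give the rest readable names (hypothetical helper)."""
    return (
        df.drop(columns=["MapNr", "Thai", "No. ofSubdis-trictsKhwaeng"])
          .rename(columns={
              "District(Khet)": "District",
              "Post-code": "Postal Code",
              "Popu-lation": "Population",
          })
    )


tidy = tidy_districts(raw)
print(tidy.columns.tolist())
```

Chaining avoids the intermediate `wiki_info_drop` variable, though keeping the steps separate, as the notebook does, makes each intermediate result easier to inspect.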


### Access the Map of Bangkok

In [10]:
# geocoder used to look up the latitude and longitude of Bangkok
from geopy.geocoders import Nominatim

In [11]:
address = 'Bangkok, Thailand'

geolocator = Nominatim(user_agent="bangkok_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'Bangkok\'s coordinates are {latitude} and {longitude}')

Bangkok's coordinates are 13.7544238 and 100.4930399


In [12]:
# visualise the map of Bangkok

!pip install folium
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors



In [13]:
# map of Bangkok
bangkok_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers
for Latitude, Longitude, District in zip(bangkok_data['Latitude'], bangkok_data['Longitude'], bangkok_data['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(bangkok_map)  
    
bangkok_map

### Locate Different Venues and their Categories in Bangkok using Foursquare

In [14]:
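# @hidden_cell

# read Foursquare credentials from environment variables so that secrets stay out of the notebook
# (the variable names below are placeholders; use whatever names your environment defines)
import os

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605' # Foursquare API version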

In [15]:
radius = 10000 # search radius in metres

venues = []

for Latitude, Longitude, District in zip(bangkok_data['Latitude'], bangkok_data['Longitude'], bangkok_data['District']):
    
    # create API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        Latitude,
        Longitude,
        radius)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            District,
            Latitude,
            Longitude, 
            venue['venue']['name'],
            venue['venue']['categories'][0]['name']))
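
The request loop above assumes that every response contains at least one group and that every venue has at least one category. A minimal sketch of a more defensive parser follows; `extract_venues` and the sample payload are hypothetical, and the payload only mimics the `response → groups → items → venue` shape that the loop already relies on.

```python
def extract_venues(payload: dict, district: str, lat: float, lng: float) -> list:
    """Flatten an explore-style payload into (district, lat, lng, name, category) rows.

    Hypothetical helper: venues without a category are labelled 'Unknown'
    instead of raising IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((district, lat, lng, venue.get("name"), category))
    return rows


# Made-up payload for illustration only (not real API output).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "KFC", "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Unnamed stall", "categories": []}},
]}]}}

rows = extract_venues(sample, "Bang Bon", 13.6592, 100.3991)
print(rows)
```

Separating parsing from the HTTP call also makes the parsing logic testable without network access.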

In [16]:
# convert venues list to dataframe
venues_bangkok = pd.DataFrame(venues)
venues_bangkok.head()

Unnamed: 0,0,1,2,3,4
0,Bang Bon,13.6592,100.3991,ขาหมูบางหว้า,Thai Restaurant
1,Bang Bon,13.6592,100.3991,MK (เอ็มเค),Hotpot Restaurant
2,Bang Bon,13.6592,100.3991,KFC,Fast Food Restaurant
3,Bang Bon,13.6592,100.3991,สอาด ลูกชิ้นปลา (สอาด ลูกชิ้นปลาเฉาโจว),Noodle House
4,Bang Bon,13.6592,100.3991,Burger King (เบอร์เกอร์คิง),Fast Food Restaurant


In [17]:
# rename columns
venues_bangkok.columns = ['District', 'Latitude', 'Longitude', 'Venue Name','Venue Category']
venues_bangkok.head()

Unnamed: 0,District,Latitude,Longitude,Venue Name,Venue Category
0,Bang Bon,13.6592,100.3991,ขาหมูบางหว้า,Thai Restaurant
1,Bang Bon,13.6592,100.3991,MK (เอ็มเค),Hotpot Restaurant
2,Bang Bon,13.6592,100.3991,KFC,Fast Food Restaurant
3,Bang Bon,13.6592,100.3991,สอาด ลูกชิ้นปลา (สอาด ลูกชิ้นปลาเฉาโจว),Noodle House
4,Bang Bon,13.6592,100.3991,Burger King (เบอร์เกอร์คิง),Fast Food Restaurant


In [18]:
# number of unique categories
len(venues_bangkok['Venue Category'].unique())

130

In [19]:
# check if Shopping Mall is a category
venues_bangkok['Venue Category'].unique()

array(['Thai Restaurant', 'Hotpot Restaurant', 'Fast Food Restaurant',
       'Noodle House', 'Garden Center', 'Bowling Alley', 'Ice Cream Shop',
       'Asian Restaurant', 'Coffee Shop', 'BBQ Joint', 'Pet Café',
       'Water Park', 'Japanese Restaurant', 'Factory',
       'Recreation Center', 'Satay Restaurant', 'Clothing Store',
       'Furniture / Home Store', 'Soup Place', 'Flea Market',
       'Dessert Shop', 'Pet Store', 'Design Studio', 'Som Tum Restaurant',
       'Pub', 'Gym / Fitness Center', 'Park', 'Supermarket', 'Track',
       'Soccer Field', 'Massage Studio', 'Train Station',
       'Shabu-Shabu Restaurant', 'Food Service', 'Shopping Mall',
       'Department Store', 'Bookstore', 'Convenience Store',
       'Golf Course', 'Gun Range', 'Salad Place', 'Chinese Restaurant',
       'Seafood Restaurant', 'Butcher', 'Night Market', 'Hotel', 'Bistro',
       'Bakery', 'Indian Restaurant', 'Hotel Bar', 'Multiplex',
       'Food & Drink Shop', 'French Restaurant', 'Café',
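
Rather than scanning the printed array by eye, the membership check can be done directly. A small sketch on a stand-in DataFrame follows; the first two rows are copied from the sample output above and the third row is invented for illustration.

```python
import pandas as pd

# Stand-in for venues_bangkok; the last row is invented for illustration.
venues = pd.DataFrame({
    "District": ["Bang Bon", "Bang Bon", "Bang Khae"],
    "Venue Name": ["KFC", "MK (เอ็มเค)", "Example Mall"],
    "Venue Category": ["Fast Food Restaurant", "Hotpot Restaurant", "Shopping Mall"],
})

# Number of distinct categories.
n_categories = venues["Venue Category"].nunique()
# True if at least one venue is categorised as a shopping mall.
has_mall = venues["Venue Category"].eq("Shopping Mall").any()

print(n_categories, has_mall)
```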
      

### Analysis of each District in Bangkok

In [20]:
# one hot encoding
onehotencoding = pd.get_dummies(venues_bangkok[['Venue Category']], prefix="", prefix_sep="")

# add district column to dataframe
onehotencoding['District'] = venues_bangkok['District'] 

# move district column to the first column
fixed_columns = [onehotencoding.columns[-1]] + list(onehotencoding.columns[:-1])
onehotencoding = onehotencoding[fixed_columns]
bangkok_data_ohe = onehotencoding

print(bangkok_data_ohe.shape)
bangkok_data_ohe.head()

(1500, 131)


Unnamed: 0,District,Art Gallery,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bar,Beer Bar,Beer Garden,...,Tea Room,Thai Restaurant,Tonkatsu Restaurant,Tour Provider,Track,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Zoo
0,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
# group by district and average each one-hot column, giving each category's share of the district's venues
bangkok_data_grouped = bangkok_data_ohe.groupby(["District"]).mean().reset_index()

print(bangkok_data_grouped.shape)
bangkok_data_grouped.head()

(50, 131)


Unnamed: 0,District,Art Gallery,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bar,Beer Bar,Beer Garden,...,Tea Room,Thai Restaurant,Tonkatsu Restaurant,Tour Provider,Track,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Zoo
0,Bang Bon,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0
1,Bang Kapi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
2,Bang Khae,0.0,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0
3,Bang Khen,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bang Kho Laem,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
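
Taking the mean of the one-hot columns per district gives each category's share of that district's venues. The sketch below checks this on a tiny made-up sample and shows that `pd.crosstab` with `normalize="index"` yields the same shares; the sample rows are illustrative, not Foursquare data.

```python
import pandas as pd

# Made-up sample: district A has 3 venues, district B has 2.
venues = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Shopping Mall", "Coffee Shop", "Coffee Shop", "Coffee Shop", "Park"],
})

# Route 1 (as in the notebook): one-hot encode, then average per district.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "District", venues["District"])
shares = onehot.groupby("District").mean()

# Route 2: row-normalised contingency table.
shares_xtab = pd.crosstab(venues["District"], venues["Venue Category"], normalize="index")

print(shares)
```

In this sample, district A has one mall among three venues, so its `Shopping Mall` share is one third. The same reading applies to values such as 0.033333 in the notebook output, which correspond to one mall among 30 returned venues.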


In [22]:
# no. of districts with shopping malls
len(bangkok_data_grouped[bangkok_data_grouped["Shopping Mall"] > 0])

29

### Extract Districts and their Shopping Malls

In [23]:
bk_mall = bangkok_data_grouped[["District","Shopping Mall"]]
bk_mall.head()

Unnamed: 0,District,Shopping Mall
0,Bang Bon,0.0
1,Bang Kapi,0.0
2,Bang Khae,0.033333
3,Bang Khen,0.0
4,Bang Kho Laem,0.033333


### Cluster Districts in Bangkok using K-means

In [24]:
from sklearn.cluster import KMeans

# number of clusters
kclusters = 3

bk_cluster = bk_mall.drop(columns="District")

# k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bk_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 0, 2, 0, 0, 2, 2, 0, 2], dtype=int32)
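
K-means assigns label numbers arbitrarily, so label 0 need not be the cluster with the fewest malls. Below is a minimal sketch, on made-up mall shares, of renumbering the labels by ascending centroid so that a higher label means a higher share. The variable names are illustrative, and `n_init=10` is set explicitly only to keep behaviour consistent across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up shares with three clearly separated levels.
bk = pd.DataFrame({"Shopping Mall": [0.0, 0.0, 0.0, 0.033, 0.033, 0.1, 0.1]})

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(bk)

# order[j] is the original label of the j-th smallest centroid.
order = np.argsort(kmeans.cluster_centers_.ravel())
# remap[old_label] is the rank of that label's centroid (0 = lowest mall share).
remap = np.empty_like(order)
remap[order] = np.arange(len(order))

bk["Cluster Label"] = remap[kmeans.labels_]
print(bk)
```

With ordered labels, the cluster tables and map colours can be read without first checking which label corresponds to which level of mall concentration.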

In [25]:
# new dataframe holding the mall share and cluster label for each district
bk_merged = bk_mall.copy()

# add clustering labels
bk_merged["Cluster Label"] = kmeans.labels_

In [26]:
bk_merged.head()

Unnamed: 0,District,Shopping Mall,Cluster Label
0,Bang Bon,0.0,2
1,Bang Kapi,0.0,2
2,Bang Khae,0.033333,0
3,Bang Khen,0.0,2
4,Bang Kho Laem,0.033333,0


In [27]:
# add latitude/longitude for each district
bk_merged = bk_merged.join(bangkok_data.set_index("District"), on="District")

print(bk_merged.shape)
bk_merged.head()

(50, 7)


Unnamed: 0,District,Shopping Mall,Cluster Label,Postal Code,Population,Latitude,Longitude
0,Bang Bon,0.0,2,10150,105161,13.6592,100.3991
1,Bang Kapi,0.0,2,10240,148465,13.765833,100.647778
2,Bang Khae,0.033333,0,10160,191781,13.696111,100.409444
3,Bang Khen,0.0,2,10220,189539,13.873889,100.596389
4,Bang Kho Laem,0.033333,0,10120,94956,13.693333,100.5025
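
The join above would silently produce missing coordinates if a district name differed between the two tables, and would duplicate rows if a district appeared twice. Below is a hedged sketch that uses `DataFrame.merge` with `validate` and `indicator` to surface both problems; the two small frames are stand-ins built from values shown above.

```python
import pandas as pd

clusters = pd.DataFrame({
    "District": ["Bang Bon", "Bang Khae"],
    "Shopping Mall": [0.0, 0.033333],
    "Cluster Label": [2, 0],
})
districts = pd.DataFrame({
    "District": ["Bang Bon", "Bang Khae", "Bang Khen"],
    "Population": [105161, 191781, 189539],
    "Latitude": [13.6592, 13.696111, 13.873889],
    "Longitude": [100.3991, 100.409444, 100.596389],
})

merged = clusters.merge(
    districts,
    on="District",
    how="left",
    validate="one_to_one",  # raises MergeError if a district is duplicated on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)

# Districts that found no match in the reference table.
unmatched = merged.loc[merged["_merge"] != "both", "District"].tolist()
print(unmatched)
```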


In [28]:
# sort by Cluster Labels
bk_merged.sort_values(["Cluster Label"], inplace=True)
bk_merged

Unnamed: 0,District,Shopping Mall,Cluster Label,Postal Code,Population,Latitude,Longitude
39,Samphanthawong,0.033333,0,10100,27452,13.731389,100.514167
35,Prawet,0.033333,0,10250,160671,13.716944,100.694444
27,Nong Chok,0.033333,0,10530,157138,13.855556,100.8625
36,Rat Burana,0.033333,0,10140,86695,13.682222,100.505556
21,Khlong San,0.033333,0,10600,76446,13.730278,100.509722
32,Phra Khanong,0.033333,0,10260,93482,13.702222,100.601667
13,Chatuchak,0.033333,0,10900,160906,13.828611,100.559722
41,Sathon,0.033333,0,10120,84916,13.708056,100.526389
11,Bangkok Yai,0.033333,0,10600,72321,13.722778,100.476389
34,Pom Prap Sattru Phai,0.033333,0,10100,51006,13.758056,100.513056


### Visualise the resulting clusters

In [29]:
# create map centred on Bangkok (geocoded latitude/longitude, not the Latitude/Longitude loop variables)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
# note: rainbow[cluster-1] wraps around for cluster 0, so cluster 0 takes the last colour in the list
for Latitude, Longitude, poi, cluster in zip(bk_merged['Latitude'], bk_merged['Longitude'], bk_merged['District'], bk_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
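
Because the markers are coloured with `rainbow[cluster-1]`, cluster 0 wraps around to the last colour in the list. The sketch below makes the resulting cluster-to-colour mapping explicit; it assumes the same three-colour `rainbow` palette as the cell above, and the dictionary name `cluster_colour` is illustrative.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 3

# Same palette construction as in the notebook: evenly spaced samples of the rainbow colormap.
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Explicit lookup reproducing rainbow[cluster-1]; index -1 is the last colour.
cluster_colour = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}

for cluster, hex_colour in cluster_colour.items():
    print(cluster, hex_colour)
```

Printing the lookup once is a quick way to confirm which colour belongs to which cluster before interpreting the map.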

### Examine Cluster

In [30]:
# red: cluster 0
bk_merged.loc[bk_merged['Cluster Label'] == 0]

Unnamed: 0,District,Shopping Mall,Cluster Label,Postal Code,Population,Latitude,Longitude
39,Samphanthawong,0.033333,0,10100,27452,13.731389,100.514167
35,Prawet,0.033333,0,10250,160671,13.716944,100.694444
27,Nong Chok,0.033333,0,10530,157138,13.855556,100.8625
36,Rat Burana,0.033333,0,10140,86695,13.682222,100.505556
21,Khlong San,0.033333,0,10600,76446,13.730278,100.509722
32,Phra Khanong,0.033333,0,10260,93482,13.702222,100.601667
13,Chatuchak,0.033333,0,10900,160906,13.828611,100.559722
41,Sathon,0.033333,0,10120,84916,13.708056,100.526389
11,Bangkok Yai,0.033333,0,10600,72321,13.722778,100.476389
34,Pom Prap Sattru Phai,0.033333,0,10100,51006,13.758056,100.513056


In [31]:
# purple: cluster 1
bk_merged.loc[bk_merged['Cluster Label'] == 1]

Unnamed: 0,District,Shopping Mall,Cluster Label,Postal Code,Population,Latitude,Longitude
29,Pathum Wan,0.1,1,10330,53263,13.744942,100.5222
25,Lat Phrao,0.1,1,10230,122182,13.803611,100.6075
48,Watthana,0.1,1,10110,81623,13.742222,100.585833
37,Ratchathewi,0.1,1,10400,73035,13.758889,100.534444
22,Khlong Toei,0.066667,1,10110,109041,13.708056,100.583889
19,Khan Na Yao,0.066667,1,10230,88678,13.8271,100.6743
47,Wang Thonglang,0.1,1,10310,114768,13.7864,100.6087
17,Dusit,0.066667,1,10300,107655,13.776944,100.520556
15,Din Daeng,0.066667,1,10400,130220,13.769722,100.552778
31,Phaya Thai,0.066667,1,10400,72952,13.78,100.542778


In [32]:
# green: cluster 2
bk_merged.loc[bk_merged['Cluster Label'] == 2]

Unnamed: 0,District,Shopping Mall,Cluster Label,Postal Code,Population,Latitude,Longitude
42,Suan Luang,0.0,2,10250,115658,13.730278,100.651389
38,Sai Mai,0.0,2,10220,188123,13.919167,100.645833
40,Saphan Sung,0.0,2,10240,89825,13.77,100.684722
46,Thung Khru,0.0,2,10140,116473,13.6472,100.4958
0,Bang Bon,0.0,2,10150,105161,13.6592,100.3991
24,Lat Krabang,0.0,2,10520,163175,13.722317,100.759669
30,Phasi Charoen,0.0,2,10160,129827,13.714722,100.437222
26,Min Buri,0.0,2,10510,137251,13.813889,100.748056
23,Lak Si,0.0,2,10210,109770,13.8875,100.578889
18,Huai Khwang,0.0,2,10310,78175,13.776667,100.579444


### Conclusion

On the cluster map, the green cluster (label 2) contains the districts with the lowest concentration of shopping malls, while the purple cluster (label 1) contains the districts with the highest concentration. The red cluster (label 0) sits in between.

This suggests that the green districts are candidate locations for a new shopping mall, since competition from existing malls is low. The purple districts, by contrast, already have many shopping malls, so a new mall there would face intense competition.

Zooming in on the map shows that the purple districts appear more densely populated, with many residential buildings and several essential services such as hospitals and public transport stations. Population density in these areas is presumably higher, so retailers can expect high volumes of people, which is consistent with the higher concentration of shopping malls. In the green districts, on the other hand, residential buildings are more spread out, which is consistent with the lower concentration of shopping malls. This nevertheless presents an opportunity to enter the market: with few existing malls and few alternatives for consumers, a new shopping mall in these districts could be very attractive.
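
One possible way to narrow the green cluster down to a shortlist is to rank its districts by population. This is an assumption made for illustration, not a step from the notebook: it treats a larger resident population as a proxy for demand. The sketch uses only the cluster 2 rows shown in the output above, and the full cluster may contain more districts.

```python
import pandas as pd

# Rows copied from the cluster 2 (green) output above.
green = pd.DataFrame({
    "District": ["Suan Luang", "Sai Mai", "Saphan Sung", "Thung Khru", "Bang Bon",
                 "Lat Krabang", "Phasi Charoen", "Min Buri", "Lak Si", "Huai Khwang"],
    "Population": [115658, 188123, 89825, 116473, 105161,
                   163175, 129827, 137251, 109770, 78175],
})

# Assumption for illustration: more residents means more potential shoppers.
shortlist = green.nlargest(3, "Population").reset_index(drop=True)
print(shortlist)
```

Among the rows shown, this ranks Sai Mai, Lat Krabang and Min Buri highest. Other factors such as land cost, transport access and household income would need to be weighed before settling on a final location.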