# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem

This project is for anyone planning to open a coffee house in Seoul, Korea.
It suggests the best locations for coffee houses in the city.
Seoul is the capital of Korea and has a population of about 10 million.

Korea's coffee culture has developed rapidly over the past 20 years. The number of coffee shops has increased dramatically, and coffee keeps gaining popularity. Annual coffee consumption is also increasing steadily. According to a survey by the Hyundai Economic Research Institute, the number of cups of coffee an adult drinks per year rose from 291 in 2015 to 317 in 2016, 336 in 2017, and 353 in 2018.

This report explores which districts of Seoul have the most, and the best, coffee houses. It also answers two questions: “Where should I open a coffee house?” and “Where should I stay if I want tasty coffee?”

## Data

* The districts of Seoul are obtained from https://en.wikipedia.org/wiki/List_of_districts_of_Seoul

* Latitude and longitude values are obtained by using "geocoder".

* All venue data for these locations is obtained by using the Foursquare API and Python libraries.

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
import geocoder

In [2]:
wiki_link = 'https://en.wikipedia.org/wiki/List_of_districts_of_Seoul'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wiki_page = requests.get(wiki_link, headers = headers)
wiki_page

<Response [200]>

In [3]:
soup = BeautifulSoup(wiki_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody

In [4]:
rows = table.find_all('tr')

In [5]:
columns = [i.text.replace('\n', '') for i in rows[0].find_all('th')]
columns

['Name', 'Population', 'Area', 'Population density']

In [6]:
df_seoul = pd.DataFrame(columns = columns)

In [7]:
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')

    # Strip newlines and non-breaking spaces from every cell.
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]

    # Keep only rows that match the four header columns.
    if len(values) == len(columns):
        row = pd.DataFrame([values], columns=columns)
        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
        df_seoul = pd.concat([df_seoul, row], ignore_index=True)

In [8]:
df_seoul.head()

Unnamed: 0,Name,Population,Area,Population density
0,Dobong-gu (도봉구; 道峰區),355712,20.70km²,17184/km²
1,Dongdaemun-gu (동대문구; 東大門區),376319,14.21km²,26483/km²
2,Dongjak-gu (동작구; 銅雀區),419261,16.35km²,25643/km²
3,Eunpyeong-gu (은평구; 恩平區),503243,29.70km²,16944/km²
4,Gangbuk-gu (강북구; 江北區),338410,23.60km²,14339/km²


In [9]:
df_seoul['District'] = df_seoul.Name.str.split('(').str[0]
df_seoul['District'] = df_seoul['District'].str.strip()

In [16]:
# Data cleansing: drop the row at position 25, leaving the 25 districts
df_seoul = df_seoul.drop([df_seoul.index[25]])
df_seoul.tail()

Unnamed: 0,Name,Population,Area,Population density,District
20,Seongdong-gu (성동구; 城東區),303891,16.86km²,19364/km²,Seongdong-gu
21,Songpa-gu (송파구; 松坡區),671794,33.88km²,19829/km²,Songpa-gu
22,Yangcheon-gu (양천구; 陽川區),490708,17.40km²,28202/km²,Yangcheon-gu
23,Yeongdeungpo-gu (영등포구; 永登浦區),421436,24.53km²,17180/km²,Yeongdeungpo-gu
24,Yongsan-gu (용산구; 龍山區),249914,21.87km²,11427/km²,Yongsan-gu


In [17]:
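def get_latlng(district):
    """Return [latitude, longitude] for a Seoul district via the ArcGIS geocoder."""
    lat_lng_coords = None

    # Retry until the geocoder returns coordinates.
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Seoul, Korea'.format(district))
        lat_lng_coords = g.latlng
    return lat_lng_coords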

In [18]:
districts = df_seoul['District']
coordinates = [get_latlng(district) for district in districts.tolist()]

In [19]:
# Note: this is an alias of df_seoul, not a copy, so the changes below also apply to df_seoul.
df_seoul_loc = df_seoul

df_seoul_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_seoul_loc['Latitude'] = df_seoul_coordinates['Latitude']
df_seoul_loc['Longitude'] = df_seoul_coordinates['Longitude']

In [20]:
df_seoul_loc.drop(columns=["Name", "Population", "Population density", "Area"], inplace=True)

In [21]:
df_seoul_loc.head()

Unnamed: 0,District,Latitude,Longitude
0,Dobong-gu,37.65066,127.03011
1,Dongdaemun-gu,37.58189,127.05408
2,Dongjak-gu,37.50056,126.95149
3,Eunpyeong-gu,37.61846,126.9278
4,Gangbuk-gu,37.6349,127.02015


In [22]:
import numpy as np
import json 
from geopy.geocoders import Nominatim 

from pandas import json_normalize  # replaces the deprecated pandas.io.json import (pandas >= 1.0)

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium

print("Libraries imported")

Libraries imported


In [23]:
address = "Gangnam-gu, Seoul"

geolocator = Nominatim(user_agent = "Seoul_explorer")

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print("The geographical coordinates of Seoul are {}, {}.".format(latitude, longitude))

The geographical coordinates of Seoul are 37.5177, 127.0473.


In [24]:
map_seoul = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, label in zip(df_seoul_loc["Latitude"], df_seoul_loc["Longitude"], df_seoul_loc["District"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_seoul)  
    
map_seoul

## Methodology

The master data contains the “District”, “Latitude” and “Longitude” of each of the 25 districts of Seoul.

The Python Folium library was used to visualize Seoul and its districts on a map, and geopy and geocoder were used to obtain the latitudes and longitudes. The resulting map is rendered by the Folium cell above.

The Foursquare API was used to explore and categorize venues. For the first query, the limit was set to 100 and the radius to 500 m around the district coordinates, and the name, category, latitude and longitude of each venue were extracted from the response. For the first district, Dobong-gu, 6 venues are returned. The query over all 25 districts uses the default radius of 1500 m of `getNearbyVenues`.

One-hot encoding is then used to replace the categorical venue data with numbers, and k-means clustering is performed on the result.
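
The one-hot encoding step described above can be illustrated on a tiny, made-up data set. The district names `A-gu` and `B-gu` and their venues are placeholders, not Foursquare results; the sketch only shows how averaging the one-hot rows per district yields a category frequency profile.

```python
import pandas as pd

# Toy venue list (placeholder data, not from Foursquare).
venues = pd.DataFrame({
    "District": ["A-gu", "A-gu", "A-gu", "B-gu", "B-gu"],
    "Venue Category": ["Coffee Shop", "Bakery", "Coffee Shop", "Park", "Coffee Shop"],
})

# One column per category, with 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "District", venues["District"])

# The mean of the 0/1 columns is the share of each category per district.
profile = onehot.groupby("District").mean()
print(profile)
```

Each row of `profile` sums to 1, so districts with very different venue counts become comparable before clustering.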

In [25]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # enter your Foursquare ID here!
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # enter your Foursquare Secret here!

VERSION = '20180604' # Foursquare API version date
LIMIT = 100
radius = 500

In [26]:
df_seoul_loc.loc[0, "District"]

'Dobong-gu'

In [27]:
neighborhood_latitude = df_seoul_loc.loc[0, "Latitude"]
neighborhood_longitude = df_seoul_loc.loc[0, "Longitude"]

neighborhood_name = df_seoul_loc.loc[0, "District"]

print("Latitude and longitude values of the neighborhood {} are {}, {}.".format(neighborhood_name, neighborhood_latitude, neighborhood_longitude))

Latitude and longitude values of the neighborhood Dobong-gu are 37.65066000000007, 127.03011000000004.


In [28]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&v=20180604&ll=37.65066000000007,127.03011000000004&radius=500&limit=100'
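
API credentials are usually kept out of a notebook. The sketch below builds the same explore URL from a parameter dictionary and reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions chosen for this example, not names defined by Foursquare.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius=500, limit=100, version="20180604"):
    """Build the explore URL; credentials come from (hypothetical) env vars."""
    params = {
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable.
    return "{}?{}".format(EXPLORE_ENDPOINT, urlencode(params, safe=","))


example_url = build_explore_url(37.65066, 127.03011)
print(example_url)
```

Building the query string with `urlencode` also escapes any special characters in the parameter values, which plain string formatting does not.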

## Analysis
Let's perform some basic exploratory data analysis and derive some additional information from our raw data.

In [29]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60ffc75a74f60452f6f4e066'},
 'response': {'headerLocation': 'Ssang-mun 3 dong',
  'headerFullLocation': 'Ssang-mun 3 dong, Seoul',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 6,
  'suggestedBounds': {'ne': {'lat': 37.655160004500075,
    'lng': 127.03578300702138},
   'sw': {'lat': 37.64615999550007, 'lng': 127.02443699297869}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '55b6da86498e7d4ef2e46871',
       'name': 'Dooly Museum (둘리뮤지엄)',
       'location': {'address': '도봉구 시루봉로1길 6',
        'lat': 37.65207492843499,
        'lng': 127.02774946932487,
        'labeledLatLngs': [{'label': 'display',
          'lat': 37.65207492843499,
          'lng': 127.02774946932487}],
        'distance': 260,
        'cc': 

In [30]:
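def get_category_type(row):
    # The category list is stored under 'categories' or, after flattening, 'venue.categories'.
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None if the venue has none.
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']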

In [31]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues)  # flatten JSON

# keep only the columns of interest
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# replace each category list with the name of its first category
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# keep only the last part of each dotted column name
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(20)

Unnamed: 0,name,categories,lat,lng
0,Dooly Museum (둘리뮤지엄),Museum,37.652075,127.027749
1,Starbucks (스타벅스),Coffee Shop,37.648425,127.034681
2,둥근달 어린이 공원,Playground,37.649673,127.030769
3,쌍문근린공원,Park,37.653162,127.028883
4,쌍문역 (10-015),Bus Stop,37.648791,127.034782
5,채랑,Korean Restaurant,37.6473,127.033502


In [32]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

6 venues were returned by Foursquare.


In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
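
The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for a venue without categories. A minimal sketch of a more defensive row extraction follows. The helper name `venue_rows` and the second sample item are hypothetical; the first sample item reuses values printed earlier in this notebook.

```python
def venue_rows(district, lat, lng, items):
    """Turn Foursquare-style items into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when the venue has no category instead of failing.
        category = categories[0]["name"] if categories else None
        rows.append((
            district, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


sample_items = [
    {"venue": {"name": "Starbucks (스타벅스)",
               "location": {"lat": 37.648425, "lng": 127.034681},
               "categories": [{"name": "Coffee Shop"}]}},
    # Hypothetical venue with an empty category list.
    {"venue": {"name": "Hypothetical venue",
               "location": {"lat": 37.65, "lng": 127.03},
               "categories": []}},
]

rows = venue_rows("Dobong-gu", 37.65066, 127.03011, sample_items)
for row in rows:
    print(row)
```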

In [34]:
seoul_venues = getNearbyVenues(names=df_seoul_loc['District'],
                                   latitudes=df_seoul_loc['Latitude'],
                                   longitudes=df_seoul_loc['Longitude']
                                  )

Dobong-gu
Dongdaemun-gu
Dongjak-gu
Eunpyeong-gu
Gangbuk-gu
Gangdong-gu
Gangnam-gu
Gangseo-gu
Geumcheon-gu
Guro-gu
Gwanak-gu
Gwangjin-gu
Jongno-gu
Jung-gu
Jungnang-gu
Mapo-gu
Nowon-gu
Seocho-gu
Seodaemun-gu
Seongbuk-gu
Seongdong-gu
Songpa-gu
Yangcheon-gu
Yeongdeungpo-gu
Yongsan-gu


In [35]:
print(seoul_venues.shape)
seoul_venues.head(250)

(1823, 7)


Unnamed: 0,District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Dobong-gu,37.65066,127.03011,Dooly Museum (둘리뮤지엄),37.652075,127.027749,Museum
1,Dobong-gu,37.65066,127.03011,Starbucks (스타벅스),37.648425,127.034681,Coffee Shop
2,Dobong-gu,37.65066,127.03011,스시혼,37.646351,127.034020,Sushi Restaurant
3,Dobong-gu,37.65066,127.03011,MEGABOX Changdong (메가박스 창동),37.654821,127.038507,Multiplex
4,Dobong-gu,37.65066,127.03011,수정궁,37.662404,127.032934,Chinese Restaurant
...,...,...,...,...,...,...,...
245,Gangbuk-gu,37.63490,127.02015,PARIS BAGUETTE,37.637766,127.025640,Bakery
246,Gangbuk-gu,37.63490,127.02015,PENELOPE,37.637768,127.023048,Brewery
247,Gangbuk-gu,37.63490,127.02015,어니언,37.623902,127.028078,Coffee Shop
248,Gangbuk-gu,37.63490,127.02015,Burger King (버거킹),37.625123,127.026403,Fast Food Restaurant


In [36]:
seoul_venues.groupby("District").count()

District,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Dobong-gu,46,46,46,46,46,46
Dongdaemun-gu,69,69,69,69,69,69
Dongjak-gu,61,61,61,61,61,61
Eunpyeong-gu,53,53,53,53,53,53
Gangbuk-gu,35,35,35,35,35,35
Gangdong-gu,43,43,43,43,43,43
Gangnam-gu,100,100,100,100,100,100
Gangseo-gu,84,84,84,84,84,84
Geumcheon-gu,100,100,100,100,100,100
Guro-gu,21,21,21,21,21,21


In [37]:
print('There are {} unique venue categories.'.format(len(seoul_venues['Venue Category'].unique())))

There are 188 unique venue categories.


One-hot encoding

In [38]:
seoul_onehot = pd.get_dummies(seoul_venues[['Venue Category']], prefix="", prefix_sep="")

seoul_onehot['Neighborhood'] = seoul_venues['District'] 

fixed_columns = [seoul_onehot.columns[-1]] + list(seoul_onehot.columns[:-1])
seoul_onehot = seoul_onehot[fixed_columns]

seoul_onehot

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Arcade,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Village,Warehouse Store,Water Park,Wine Bar,Women's Store,Yoga Studio,Zoo
0,Dobong-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Dobong-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Dobong-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Dobong-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Dobong-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1818,Yongsan-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1819,Yongsan-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1820,Yongsan-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1821,Yongsan-gu,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
seoul_onehot.shape

(1823, 189)

In [40]:
seoul_grouped = seoul_onehot.groupby('Neighborhood').mean().reset_index()
seoul_grouped

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Aquarium,Arcade,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Village,Warehouse Store,Water Park,Wine Bar,Women's Store,Yoga Studio,Zoo
0,Dobong-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Dongdaemun-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Dongjak-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,...,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Eunpyeong-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Gangbuk-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Gangdong-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Gangnam-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0
7,Gangseo-gu,0.011905,0.011905,0.011905,0.059524,0.059524,0.02381,0.011905,0.0,0.0,...,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Geumcheon-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0
9,Guro-gu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [41]:
seoul_grouped.shape

(25, 189)

In [42]:
num_top_venues = 10

for hood in seoul_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = seoul_grouped[seoul_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Dobong-gu----
                  venue  freq
0           Coffee Shop  0.15
1                Bakery  0.15
2     Korean Restaurant  0.13
3  Fast Food Restaurant  0.09
4            Donut Shop  0.07
5        Ice Cream Shop  0.04
6   Japanese Restaurant  0.04
7                  Café  0.04
8          Soccer Field  0.02
9      Sushi Restaurant  0.02


----Dongdaemun-gu----
                 venue  freq
0    Korean Restaurant  0.16
1          Coffee Shop  0.12
2  Japanese Restaurant  0.06
3               Bakery  0.04
4           Donut Shop  0.04
5                 Café  0.04
6       Ice Cream Shop  0.04
7                Hotel  0.03
8         Noodle House  0.03
9      Bubble Tea Shop  0.03


----Dongjak-gu----
                venue  freq
0   Korean Restaurant  0.15
1         Coffee Shop  0.10
2              Bakery  0.08
3  Chinese Restaurant  0.07
4                Café  0.07
5          Donut Shop  0.05
6      Ice Cream Shop  0.05
7                Park  0.05
8   Bunsik Restaurant  0.03
9   Conv

9                 Park  0.03


----Yongsan-gu----
                   venue  freq
0                   Café  0.11
1      Korean Restaurant  0.10
2            Coffee Shop  0.09
3  Korean BBQ Restaurant  0.06
4                 Bakery  0.06
5              BBQ Joint  0.04
6           Noodle House  0.04
7     Chinese Restaurant  0.03
8                  Hotel  0.03
9              Multiplex  0.02




In [43]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [44]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
seoul_venues_sorted = pd.DataFrame(columns=columns)
seoul_venues_sorted['Neighborhood'] = seoul_grouped['Neighborhood']

for ind in np.arange(seoul_grouped.shape[0]):
    seoul_venues_sorted.iloc[ind, 1:] = return_most_common_venues(seoul_grouped.iloc[ind, :], num_top_venues)

seoul_venues_sorted.head(27)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dobong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Donut Shop,Café,Japanese Restaurant,Ice Cream Shop,Noodle House,Museum
1,Dongdaemun-gu,Korean Restaurant,Coffee Shop,Japanese Restaurant,Ice Cream Shop,Donut Shop,Bakery,Café,Noodle House,Bubble Tea Shop,Italian Restaurant
2,Dongjak-gu,Korean Restaurant,Coffee Shop,Bakery,Chinese Restaurant,Café,Ice Cream Shop,Donut Shop,Park,Tunnel,Vietnamese Restaurant
3,Eunpyeong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Ice Cream Shop,Park,Trail,Metro Station,Burger Joint,Buffet
4,Gangbuk-gu,Coffee Shop,Korean Restaurant,Donut Shop,Bakery,Bus Station,Fast Food Restaurant,Ice Cream Shop,Market,Flea Market,Brewery
5,Gangdong-gu,Coffee Shop,Bakery,Café,Fast Food Restaurant,Ice Cream Shop,Donut Shop,Supermarket,Japanese Restaurant,Park,Vietnamese Restaurant
6,Gangnam-gu,Korean Restaurant,Coffee Shop,Bakery,BBQ Joint,Japanese Restaurant,Chinese Restaurant,Café,Seafood Restaurant,Bunsik Restaurant,Noodle House
7,Gangseo-gu,Coffee Shop,Korean Restaurant,Bakery,Airport Service,Donut Shop,Airport Lounge,Japanese Restaurant,Hotel,Chinese Restaurant,Bubble Tea Shop
8,Geumcheon-gu,Coffee Shop,Bakery,Korean Restaurant,Fast Food Restaurant,BBQ Joint,Outlet Store,Donut Shop,Chinese Restaurant,Buffet,Bubble Tea Shop
9,Guro-gu,Bakery,Donut Shop,Metro Station,Convenience Store,Auto Workshop,Fruit & Vegetable Store,Outlet Store,Café,Noodle House,Chinese Restaurant


In [59]:
ks = 3

seoul_grouped_clustering = seoul_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=ks, random_state=0).fit(seoul_grouped_clustering)

kmeans.labels_[1:10]

array([0, 0, 1, 1, 1, 0, 0, 1, 1], dtype=int32)
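
The notebook fixes the number of clusters at 3. One common way to check such a choice is the silhouette score; the sketch below runs it on synthetic, well separated points rather than on the Seoul data, so the numbers it prints say nothing about the districts themselves. The data and the range of k values are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of 20 points each (not the Seoul data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

With the notebook's data, `X` would be `seoul_grouped_clustering`; with only 25 districts the scores should be read with caution.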

In [60]:
seoul_merged = df_seoul
seoul_merged['Cluster Labels'] = kmeans.labels_

seoul_merged = seoul_merged.join(seoul_venues_sorted.set_index('Neighborhood'), on='District')
seoul_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dobong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Donut Shop,Café,Japanese Restaurant,Ice Cream Shop,Noodle House,Museum
1,Dongdaemun-gu,Korean Restaurant,Coffee Shop,Japanese Restaurant,Ice Cream Shop,Donut Shop,Bakery,Café,Noodle House,Bubble Tea Shop,Italian Restaurant
2,Dongjak-gu,Korean Restaurant,Coffee Shop,Bakery,Chinese Restaurant,Café,Ice Cream Shop,Donut Shop,Park,Tunnel,Vietnamese Restaurant
3,Eunpyeong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Ice Cream Shop,Park,Trail,Metro Station,Burger Joint,Buffet
4,Gangbuk-gu,Coffee Shop,Korean Restaurant,Donut Shop,Bakery,Bus Station,Fast Food Restaurant,Ice Cream Shop,Market,Flea Market,Brewery
5,Gangdong-gu,Coffee Shop,Bakery,Café,Fast Food Restaurant,Ice Cream Shop,Donut Shop,Supermarket,Japanese Restaurant,Park,Vietnamese Restaurant
6,Gangnam-gu,Korean Restaurant,Coffee Shop,Bakery,BBQ Joint,Japanese Restaurant,Chinese Restaurant,Café,Seafood Restaurant,Bunsik Restaurant,Noodle House
7,Gangseo-gu,Coffee Shop,Korean Restaurant,Bakery,Airport Service,Donut Shop,Airport Lounge,Japanese Restaurant,Hotel,Chinese Restaurant,Bubble Tea Shop
8,Geumcheon-gu,Coffee Shop,Bakery,Korean Restaurant,Fast Food Restaurant,BBQ Joint,Outlet Store,Donut Shop,Chinese Restaurant,Buffet,Bubble Tea Shop
9,Guro-gu,Bakery,Donut Shop,Metro Station,Convenience Store,Auto Workshop,Fruit & Vegetable Store,Outlet Store,Café,Noodle House,Chinese Restaurant
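
The cell above assigns `kmeans.labels_` to the district table by row position, which is only correct while both tables list the districts in the same order. A minimal sketch of attaching the labels by district name instead is shown below. The three districts and their labels are taken from the cluster tables in this notebook; the shuffled order of `districts_demo` is an assumption made to show the effect.

```python
import pandas as pd

# Rows in the order used for clustering (same order as the label array).
grouped_demo = pd.DataFrame({"Neighborhood": ["Dobong-gu", "Dongdaemun-gu", "Dongjak-gu"]})
labels_demo = [1, 0, 0]  # stands in for kmeans.labels_

# A district table in a different order (assumed for this example).
districts_demo = pd.DataFrame({"District": ["Dongjak-gu", "Dobong-gu", "Dongdaemun-gu"]})

# Map each district name to its label, independent of row order.
label_by_name = dict(zip(grouped_demo["Neighborhood"], labels_demo))
districts_demo["Cluster Labels"] = districts_demo["District"].map(label_by_name)
print(districts_demo)
```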


## Results and Discussion
The results of the k-means clustering are shown below.

In [61]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(ks)
ys = [i+x+(i*x)**2 for i in range(ks)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(seoul_merged['Latitude'], seoul_merged['Longitude'], seoul_merged['District'], seoul_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [62]:
seoul_merged.loc[seoul_merged['Cluster Labels'] == 0, seoul_merged.columns[[0] + list(range(4, seoul_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Dongdaemun-gu,Korean Restaurant,Coffee Shop,Japanese Restaurant,Ice Cream Shop,Donut Shop,Bakery,Café,Noodle House,Bubble Tea Shop,Italian Restaurant
2,Dongjak-gu,Korean Restaurant,Coffee Shop,Bakery,Chinese Restaurant,Café,Ice Cream Shop,Donut Shop,Park,Tunnel,Vietnamese Restaurant
6,Gangnam-gu,Korean Restaurant,Coffee Shop,Bakery,BBQ Joint,Japanese Restaurant,Chinese Restaurant,Café,Seafood Restaurant,Bunsik Restaurant,Noodle House
7,Gangseo-gu,Coffee Shop,Korean Restaurant,Bakery,Airport Service,Donut Shop,Airport Lounge,Japanese Restaurant,Hotel,Chinese Restaurant,Bubble Tea Shop
10,Gwanak-gu,Coffee Shop,Korean Restaurant,Bakery,Japanese Restaurant,Chinese Restaurant,Café,BBQ Joint,Market,Pub,Donut Shop
11,Gwangjin-gu,Coffee Shop,Ice Cream Shop,Chinese Restaurant,Bakery,BBQ Joint,Korean Restaurant,Café,Donut Shop,Park,Italian Restaurant
12,Jongno-gu,Coffee Shop,Korean Restaurant,Art Gallery,Historic Site,Italian Restaurant,Art Museum,Tea Room,Palace,Café,History Museum
13,Jung-gu,Hotel,Coffee Shop,Korean Restaurant,Café,Park,Bistro,French Restaurant,Italian Restaurant,Chinese Restaurant,Trail
15,Mapo-gu,Coffee Shop,BBQ Joint,Korean Restaurant,Café,Japanese Restaurant,Dessert Shop,Bakery,Ramen Restaurant,Udon Restaurant,Chinese Restaurant
17,Seocho-gu,Coffee Shop,Korean Restaurant,BBQ Joint,Bakery,Japanese Restaurant,Dessert Shop,Noodle House,Café,Italian Restaurant,Park


In [63]:
seoul_merged.loc[seoul_merged['Cluster Labels'] == 1, seoul_merged.columns[[0] + list(range(4, seoul_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dobong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Donut Shop,Café,Japanese Restaurant,Ice Cream Shop,Noodle House,Museum
3,Eunpyeong-gu,Bakery,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Ice Cream Shop,Park,Trail,Metro Station,Burger Joint,Buffet
4,Gangbuk-gu,Coffee Shop,Korean Restaurant,Donut Shop,Bakery,Bus Station,Fast Food Restaurant,Ice Cream Shop,Market,Flea Market,Brewery
5,Gangdong-gu,Coffee Shop,Bakery,Café,Fast Food Restaurant,Ice Cream Shop,Donut Shop,Supermarket,Japanese Restaurant,Park,Vietnamese Restaurant
8,Geumcheon-gu,Coffee Shop,Bakery,Korean Restaurant,Fast Food Restaurant,BBQ Joint,Outlet Store,Donut Shop,Chinese Restaurant,Buffet,Bubble Tea Shop
9,Guro-gu,Bakery,Donut Shop,Metro Station,Convenience Store,Auto Workshop,Fruit & Vegetable Store,Outlet Store,Café,Noodle House,Chinese Restaurant
14,Jungnang-gu,Korean Restaurant,Park,Bunsik Restaurant,Fast Food Restaurant,Supermarket,Bakery,Buffet,Coffee Shop,Soccer Field,Ice Cream Shop
16,Nowon-gu,Coffee Shop,Bakery,Donut Shop,Korean Restaurant,Ice Cream Shop,Chinese Restaurant,Fast Food Restaurant,Café,Bubble Tea Shop,Steakhouse
19,Seongbuk-gu,Coffee Shop,Café,Ice Cream Shop,Bakery,Supermarket,Buffet,Donut Shop,Golf Course,Fast Food Restaurant,Department Store
22,Yangcheon-gu,Bakery,Coffee Shop,Park,Korean Restaurant,Market,Ice Cream Shop,Fast Food Restaurant,Burger Joint,Fried Chicken Joint,Multiplex


In [64]:
seoul_merged.loc[seoul_merged['Cluster Labels'] == 2, seoul_merged.columns[[0] + list(range(4, seoul_merged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Seodaemun-gu,Trail,Bakery,Mountain,Fast Food Restaurant,Coffee Shop,Café,Art Museum,Scenic Lookout,Noodle House,Park


In [65]:
seoul_merged.loc[seoul_merged['Cluster Labels'] == 3, seoul_merged.columns[[0] + list(range(5, seoul_merged.shape[1]))]]

Unnamed: 0,District,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Conclusion
    
The k-means algorithm was used for this clustering study, with k set to 3. For more detailed and accurate guidance, the data set can be expanded and the analysis can drill down to the neighbourhood or street level.

The areas in Cluster 0 and Cluster 1 seem to be suitable places to open new cafes.

The Cluster 0 districts are close to the center of Seoul, while the Cluster 1 districts are judged to lie on the periphery. Additional data, such as real estate prices or floating population, could be obtained to characterize the Cluster 0 and Cluster 1 areas further. In general, Cluster 0, the center of Seoul, has high store rents and a large floating population, while Cluster 1 has low store rents and a small floating population.
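
As a rough follow-up, districts can be ranked by the combined share of "Coffee Shop" and "Café" venues. The sketch below uses only the four districts whose frequencies are printed in full in the Analysis section; treating the two categories as one "coffee share" is an assumption of this example, and a high share may indicate either strong demand or strong competition.

```python
import pandas as pd

# Frequencies copied from the per-district output in the Analysis section.
freq = pd.DataFrame(
    {
        "Coffee Shop": [0.15, 0.12, 0.10, 0.09],
        "Café": [0.04, 0.04, 0.07, 0.11],
    },
    index=["Dobong-gu", "Dongdaemun-gu", "Dongjak-gu", "Yongsan-gu"],
)

# Combined share of coffee-related venues, rounded to two decimals.
freq["Coffee share"] = (freq["Coffee Shop"] + freq["Café"]).round(2)
ranking = freq.sort_values("Coffee share", ascending=False)
print(ranking)
```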