# Analyzing Venues Data of Seoul, South Korea

## Part 1: Introduction

This report is aimed at people who are interested in exploring Seoul, South Korea, and more specifically at foodies who like to try a different eatery every week or even every day.

I will use my limited data science knowledge to do the following:

1) Prepare the data (loading the data, plotting a map, and finding venues with the help of the Foursquare API)

2) Analyze the data (k-means clustering)
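As a rough preview of step 2, the sketch below shows one common way to prepare venue data for k-means: one-hot encode each venue's category, average that encoding per district, and cluster the resulting category profiles. The toy rows, `n_clusters=2` and `random_state=0` are assumptions for illustration only, not the settings used later in this report.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data in the same shape as the venue table built later:
# one row per venue, with its district and Foursquare category.
venues = pd.DataFrame({
    "District": ["Jongno", "Jongno", "Jung", "Jung", "Mapo", "Mapo"],
    "Venue Category": ["Coffee Shop", "Bakery", "Coffee Shop", "Bakery",
                       "BBQ Joint", "Korean Restaurant"],
})

# 1) One-hot encode the categories (one float column per category).
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot["District"] = venues["District"]

# 2) Average per district: the share of each category among that district's venues.
district_profiles = onehot.groupby("District").mean()

# 3) Cluster the districts on their category profiles (assumed k=2 for the toy data).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(district_profiles)

clusters = pd.Series(labels, index=district_profiles.index, name="Cluster")
print(clusters)
```

Here Jongno and Jung have identical profiles (half coffee shops, half bakeries), so they end up in the same cluster, while Mapo forms its own.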

## Part 2: Data

Preparing the data for the districts in Seoul.

Import the necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt 

# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

import folium # map rendering library

Load the dataframe containing the districts and their corresponding geographical coordinates

In [2]:
seoul = pd.read_excel("Seoul.xlsx") 
print(seoul.shape)
seoul.head()

(25, 4)


|   | Korean | District | Latitude | Longitude |
|---|---|---|---|---|
| 0 | 종로구 | Jongno | 37.58031 | 126.983079 |
| 1 | 중구 | Jung | 37.563656 | 126.99751 |
| 2 | 용산구 | Yongsan | 37.5323 | 126.99 |
| 3 | 성동구 | Seongdong | 37.5635 | 127.0365 |
| 4 | 광진구 | Gwangjin | 37.5384 | 127.0828 |
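If `Seoul.xlsx` is not at hand, the five rows shown above can be typed in directly to try out the later steps. The sketch below does that and adds a quick sanity check on the coordinates; the bounding-box limits are an assumed, approximate box around Seoul, not official district boundaries.

```python
import pandas as pd

# The first five rows shown above, typed in by hand
# (the full Seoul.xlsx file has 25 districts).
sample = pd.DataFrame(
    {
        "Korean": ["종로구", "중구", "용산구", "성동구", "광진구"],
        "District": ["Jongno", "Jung", "Yongsan", "Seongdong", "Gwangjin"],
        "Latitude": [37.58031, 37.563656, 37.5323, 37.5635, 37.5384],
        "Longitude": [126.983079, 126.99751, 126.99, 127.0365, 127.0828],
    }
)

# Sanity check: every district should fall inside a rough box around Seoul.
# The limits below are an illustrative assumption, not official boundaries.
in_seoul = (sample["Latitude"].between(37.4, 37.7)
            & sample["Longitude"].between(126.7, 127.2))
assert in_seoul.all(), sample.loc[~in_seoul, "District"].tolist()

print(sample.shape)  # (5, 4)
```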


Get the geographical coordinates of Seoul using the geopy library, and create a map of Seoul with the districts superimposed on top.

In [3]:
# Getting the geographical coordinates of Seoul
address = '서울, 대한민국' # 'Seoul, Republic of Korea'
geolocator = Nominatim(user_agent="KR_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Plotting the map
map_seoul = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(seoul['Latitude'], seoul['Longitude'], seoul['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_seoul)  

print('The geographical coordinates of Seoul are {}, {}.'.format(latitude, longitude))    
map_seoul

The geographical coordinates of Seoul are 37.5666791, 126.9782914.
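Nominatim is an external service, so the lookup above needs network access (and Nominatim's usage policy expects an identifying `user_agent`, which is why one is passed above). As an offline alternative, a reasonable map centre can be approximated by averaging the district coordinates. The sketch below assumes only the five sample rows shown earlier; with all 25 districts the centre would shift accordingly.

```python
import pandas as pd

# Assumed input: the five sample districts from the table above.
districts = pd.DataFrame({
    "District": ["Jongno", "Jung", "Yongsan", "Seongdong", "Gwangjin"],
    "Latitude": [37.58031, 37.563656, 37.5323, 37.5635, 37.5384],
    "Longitude": [126.983079, 126.99751, 126.99, 127.0365, 127.0828],
})

# The arithmetic mean of the coordinates is a fine approximation over a
# city-sized area, where the Earth's curvature is negligible.
center_lat = districts["Latitude"].mean()
center_lng = districts["Longitude"].mean()

print(round(center_lat, 4), round(center_lng, 4))  # 37.5556 127.018
```

The resulting values could then be passed as `location=[center_lat, center_lng]` to `folium.Map`, as in the cell above.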


#### Foursquare API
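The request URL in the next cell is assembled with `str.format`. A slightly more robust option, sketched below, is to let `urllib.parse.urlencode` escape each query parameter; `requests.get(base_url, params=params)` performs the same encoding. The credentials are placeholders and `build_explore_url` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode

# Placeholder credentials: substitute your own and keep real ones out of notebooks.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # same API version string as in the cell below

def build_explore_url(lat, lng, radius, limit):
    """Build a v2 /venues/explore URL; urlencode escapes every value."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",  # the comma is encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url(37.58031, 126.983079, radius=500, limit=200)
print(url)
```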

In [4]:
# Define FourSquare API credentials
CLIENT_ID = 'Hidden' # your Foursquare ID
CLIENT_SECRET = 'Hidden' # will be reset periodically
VERSION = '20180605' # Foursquare API version

# Define a function to get the nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# Run the above function on each district and create a new dataframe
LIMIT = 200 # maximum number of venues requested from the Foursquare API per district
radius = 500 # search radius in metres (matches the function's default of 500)

seoul_venues = getNearbyVenues(names=seoul['District'],
                               latitudes=seoul['Latitude'],
                               longitudes=seoul['Longitude'],
                               radius=radius)

Jongno
Jung
Yongsan
Seongdong
Gwangjin
Dongdaemun
Jungnang
Seongbuk
Gangbuk
Dobong
Nowon
Eunpyeong
Seodaemun
Mapo
Yangcheon
Gangseo
Guro
Geumcheon
Yeongdeung
Dongjak
Gwanak
Seocho
Gangnam
Songpa
Gangdong
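`getNearbyVenues` indexes straight into `['response']['groups'][0]['items']` and into `categories[0]`, so a response that lacks those keys (for example, after a failed request) or a venue without a category would stop the loop with a `KeyError` or `IndexError`. The sketch below shows a more defensive parser run against a hand-written, trimmed-down payload; the payload and the `parse_items` helper are hypothetical, and only the fields read by `getNearbyVenues` are assumed to exist in real responses.

```python
# Hypothetical payload mirroring only the fields getNearbyVenues reads.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Blue Bottle Coffee (블루보틀)",
                               "location": {"lat": 37.580143, "lng": 126.980845},
                               "categories": [{"name": "Coffee Shop"}]}},
                    {"venue": {"name": "Uncategorised venue",  # hypothetical
                               "location": {"lat": 37.58, "lng": 126.98},
                               "categories": []}},
                ]
            }
        ]
    }
}

def parse_items(district, lat, lng, payload):
    """Flatten explore items into row tuples, skipping uncategorised venues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:  # categories[0] would raise IndexError here
            continue
        rows.append((district, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows

rows = parse_items("Jongno", 37.58031, 126.983079, sample_response)
print(len(rows))  # 1
```

An empty `{"response": {}}` payload simply yields an empty list, so one bad district no longer aborts the whole run.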


Check the size of the resulting dataframe

In [5]:
print(seoul_venues.shape)
seoul_venues.head()

(722, 7)


|   | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Jongno | 37.58031 | 126.983079 | Baek In-Je House Museum (백인제가옥) | 37.580508 | 126.984164 | Historic Site |
| 1 | Jongno | 37.58031 | 126.983079 | KIWA TAPROOM (기와탭룸) | 37.578711 | 126.98177 | Brewery |
| 2 | Jongno | 37.58031 | 126.983079 | Blue Bottle Coffee (블루보틀) | 37.580143 | 126.980845 | Coffee Shop |
| 3 | Jongno | 37.58031 | 126.983079 | Wood & Brick (우드앤브릭) | 37.579413 | 126.984166 | Bakery |
| 4 | Jongno | 37.58031 | 126.983079 | MIRROR ROOM (미러룸) | 37.579933 | 126.981078 | Coffee Shop |


The search returned 722 venues in total.

Find out how many venues were returned for each district, and into how many unique categories the venues are classified.
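`groupby(...).count()` reports the non-null count separately for every column, which is why the output below repeats the same number six times. For a single number per district, `GroupBy.size()` is more direct, and `Series.nunique()` gives the same result as `len(...unique())` when no category is missing (it skips NaN by default). A minimal sketch on hypothetical toy rows with the same column names as `seoul_venues`:

```python
import pandas as pd

# Hypothetical stand-in for seoul_venues ("Venue A"/"Venue B" are made-up names).
venues = pd.DataFrame({
    "District": ["Jongno", "Jongno", "Jongno", "Jung", "Jung"],
    "Venue": ["Blue Bottle Coffee", "MIRROR ROOM", "KIWA TAPROOM", "Venue A", "Venue B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Brewery", "Bakery", "Coffee Shop"],
})

# One row count per district, largest first.
per_district = venues.groupby("District").size().sort_values(ascending=False)

# Number of distinct venue categories.
n_categories = venues["Venue Category"].nunique()

print(per_district.to_dict())  # {'Jongno': 3, 'Jung': 2}
print(n_categories)            # 3
```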

In [6]:
print('There are {} unique categories of venues.'.format(len(seoul_venues['Venue Category'].unique())))
seoul_venues.groupby('District').count()

There are 133 unique categories of venues.


| District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Dobong | 10 | 10 | 10 | 10 | 10 | 10 |
| Dongdaemun | 17 | 17 | 17 | 17 | 17 | 17 |
| Dongjak | 36 | 36 | 36 | 36 | 36 | 36 |
| Eunpyeong | 8 | 8 | 8 | 8 | 8 | 8 |
| Gangbuk | 19 | 19 | 19 | 19 | 19 | 19 |
| Gangdong | 19 | 19 | 19 | 19 | 19 | 19 |
| Gangnam | 28 | 28 | 28 | 28 | 28 | 28 |
| Gangseo | 11 | 11 | 11 | 11 | 11 | 11 |
| Geumcheon | 4 | 4 | 4 | 4 | 4 | 4 |
| Guro | 8 | 8 | 8 | 8 | 8 | 8 |
