# Capstone Project - The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem

This project is for anyone planning to open a coffee house in Seoul, Korea.
It suggests the best locations for coffee houses in the city.
Seoul is the capital of Korea, with a population of about 10 million.

Korea's coffee culture has developed rapidly over the past 20 years. The number of coffee shops has increased dramatically, and they continue to gain popularity. Annual coffee consumption is also rising steadily. According to a survey by the Hyundai Economic Research Institute, the number of cups of coffee an adult drinks per year rose from 291 in 2015 to 317 in 2016, 336 in 2017, and 353 in 2018.
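
As a quick check on that trend, the four survey figures can be turned into year-over-year growth rates. This is only an illustrative calculation on the numbers quoted above; the variable names are made up for this sketch.

```python
# Annual coffee consumption per adult, as quoted from the survey above
coffees_per_year = {2015: 291, 2016: 317, 2017: 336, 2018: 353}

years = sorted(coffees_per_year)
# Year-over-year growth in percent, keyed by the later year of each pair
yoy_growth = {
    year: (coffees_per_year[year] / coffees_per_year[prev] - 1) * 100
    for prev, year in zip(years, years[1:])
}
# Growth over the whole 2015-2018 span, in percent
total_growth = (coffees_per_year[2018] / coffees_per_year[2015] - 1) * 100

for year, pct in yoy_growth.items():
    print(f"{year}: {pct:.1f}% over the previous year")
print(f"2015-2018: {total_growth:.1f}% overall")
```

Consumption keeps rising, but the yearly increase slows from roughly 9% to roughly 5%.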

This report explores which neighborhoods of Seoul have the most, as well as the best, coffee houses. It also answers the questions “Where should I open a coffee house?” and “Where should I stay if I want tasty coffee?”

## Data

* The districts of Seoul are obtained from https://en.wikipedia.org/wiki/List_of_districts_of_Seoul

* Latitude and longitude values are obtained using "geocoder".

* All location-related data will be obtained using the Foursquare API and Python libraries.

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
#!conda install -c conda-forge geopy --yes
import geocoder

In [2]:
wiki_link = 'https://en.wikipedia.org/wiki/List_of_districts_of_Seoul'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wiki_page = requests.get(wiki_link, headers = headers)
wiki_page

<Response [200]>

In [3]:
soup = BeautifulSoup(wiki_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody

In [4]:
rows = table.find_all('tr')

In [5]:
columns = [i.text.replace('\n', '') for i in rows[0].find_all('th')]
columns

['Name', 'Population', 'Area', 'Population density']

In [6]:
df_seoul = pd.DataFrame(columns = columns)

In [7]:
# Collect one list of cleaned cell values per table row.
# Rows whose cell count does not match the header are skipped.
records = []
for row in rows[1:]:
    tds = row.find_all('td')
    # Chain the replace calls so both newlines and non-breaking spaces are removed
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    if len(values) == len(columns):
        records.append(values)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
df_seoul = pd.DataFrame(records, columns=columns)
df_seoul
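
Scraped cell text usually carries trailing newlines and non-breaking spaces. The following is a minimal sketch of a cleaning helper, assuming raw cells shaped like the rows in this notebook; `clean_cell` and `raw_row` are hypothetical names, and the raw strings are invented for illustration.

```python
def clean_cell(text):
    """Strip newlines and non-breaking spaces from scraped cell text."""
    # Each replace is applied to the result of the previous one
    return text.replace('\n', '').replace('\xa0', '').strip()


# Hypothetical raw cells, shaped like one row of the districts table
raw_row = ['Dobong-gu (도봉구; 道峰區)\n', '355712\n', '20.70\xa0km²\n', '17184/km²\n']
cleaned = [clean_cell(cell) for cell in raw_row]
print(cleaned)
```

The two `replace` calls are chained on the string itself. Nesting the second call inside the first call's argument, as in `text.replace('\n', ''.replace('\xa0', ''))`, would remove only the newlines, because the inner expression evaluates to an empty string.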

In [8]:
df_seoul.head()

| | Name | Population | Area | Population density |
|---|---|---|---|---|
| 0 | Dobong-gu (도봉구; 道峰區) | 355712 | 20.70km² | 17184/km² |
| 1 | Dongdaemun-gu (동대문구; 東大門區) | 376319 | 14.21km² | 26483/km² |
| 2 | Dongjak-gu (동작구; 銅雀區) | 419261 | 16.35km² | 25643/km² |
| 3 | Eunpyeong-gu (은평구; 恩平區) | 503243 | 29.70km² | 16944/km² |
| 4 | Gangbuk-gu (강북구; 江北區) | 338410 | 23.60km² | 14339/km² |


In [9]:
df_seoul['District'] = df_seoul.Name.str.split('(').str[0]
df_seoul['District'] = df_seoul['District'].str.strip()
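
To make the extraction step concrete, here is the same split-and-strip logic on plain strings. It is a sketch only: `district_name` is a hypothetical helper, and the pandas `.str.split('(').str[0]` and `.str.strip()` calls above apply the equivalent operation to every element of the column.

```python
def district_name(full_name):
    """Return the romanised name that precedes the first '('."""
    return full_name.split('(')[0].strip()


# Two names taken from the table in this notebook
names = ['Dobong-gu (도봉구; 道峰區)', 'Yongsan-gu (용산구; 龍山區)']
short_names = [district_name(name) for name in names]
print(short_names)
```

A name without parentheses passes through unchanged, since `split('(')` then returns the whole string as its only element.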

In [16]:
# Data cleansing: drop the extra row at index 25, leaving the 25 districts (indices 0-24)
df_seoul = df_seoul.drop(df_seoul.index[25])
df_seoul.tail()

| | Name | Population | Area | Population density | District |
|---|---|---|---|---|---|
| 20 | Seongdong-gu (성동구; 城東區) | 303891 | 16.86km² | 19364/km² | Seongdong-gu |
| 21 | Songpa-gu (송파구; 松坡區) | 671794 | 33.88km² | 19829/km² | Songpa-gu |
| 22 | Yangcheon-gu (양천구; 陽川區) | 490708 | 17.40km² | 28202/km² | Yangcheon-gu |
| 23 | Yeongdeungpo-gu (영등포구; 永登浦區) | 421436 | 24.53km² | 17180/km² | Yeongdeungpo-gu |
| 24 | Yongsan-gu (용산구; 龍山區) | 249914 | 21.87km² | 11427/km² | Yongsan-gu |


In [17]:
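def get_latlng(district):
    """Return [latitude, longitude] for a Seoul district via the ArcGIS geocoder."""
    lat_lng_coords = None

    # Retry until the geocoder returns a result (loops forever if it never does)
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Seoul, Korea'.format(district))
        lat_lng_coords = g.latlng
    return lat_lng_coords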
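
Because the loop above retries until the geocoder answers, a persistent failure would hang the notebook. One possible variant, sketched here under assumptions, caps the number of attempts. `get_latlng_bounded`, `lookup`, `max_tries`, and `pause_s` are hypothetical names, and the lookup is passed in as a callable so the sketch runs without network access.

```python
import time


def get_latlng_bounded(query, lookup, max_tries=5, pause_s=0.0):
    """Call lookup(query) until it returns coordinates or max_tries is reached.

    `lookup` is any callable returning [lat, lng] or None, for example
    lambda q: geocoder.arcgis(q).latlng (assumed usage, not run here).
    """
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_s)  # brief pause before the next attempt
    return None  # the caller decides how to handle a missing result


# Stand-in lookup that fails twice, then succeeds, to exercise the retry path
calls = {'n': 0}


def flaky_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [37.65066, 127.03011]


result = get_latlng_bounded('Dobong-gu, Seoul, Korea', flaky_lookup)
print(result)
```

Returning `None` after the last attempt means any district that could not be geocoded has to be filtered out or handled before the coordinates are loaded into a DataFrame.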

In [18]:
districts = df_seoul['District']
# Use a distinct loop variable so the Series name is not shadowed
coordinates = [get_latlng(district) for district in districts.tolist()]

In [19]:
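# Work on a copy so df_seoul itself is left unchanged
df_seoul_loc = df_seoul.copy()

df_seoul_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
# Assign by position (.values) so the result does not depend on index alignment
df_seoul_loc['Latitude'] = df_seoul_coordinates['Latitude'].values
df_seoul_loc['Longitude'] = df_seoul_coordinates['Longitude'].values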

In [20]:
# Keep only District, Latitude and Longitude
df_seoul_loc.drop(columns=["Name", "Population", "Population density", "Area"], inplace=True)

In [21]:
df_seoul_loc.head()

| | District | Latitude | Longitude |
|---|---|---|---|
| 0 | Dobong-gu | 37.65066 | 127.03011 |
| 1 | Dongdaemun-gu | 37.58189 | 127.05408 |
| 2 | Dongjak-gu | 37.50056 | 126.95149 |
| 3 | Eunpyeong-gu | 37.61846 | 126.9278 |
| 4 | Gangbuk-gu | 37.6349 | 127.02015 |


In [22]:
import numpy as np
import json
from geopy.geocoders import Nominatim

# pandas.io.json.json_normalize is deprecated; import it from pandas (1.0+) instead
from pandas import json_normalize

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium

print("Libraries imported")

Libraries imported


In [23]:
address = "Gangnam-gu, Seoul"

geolocator = Nominatim(user_agent = "Seoul_explorer")

# Geocode Gangnam-gu; its coordinates are used below to centre the map
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print("The geographical coordinates of {} are {}, {}.".format(address, latitude, longitude))

The geographical coordinates of Gangnam-gu, Seoul are 37.5177, 127.0473.


In [24]:
# Base map centred on the coordinates geocoded above
map_seoul = folium.Map(location=[latitude, longitude], zoom_start=10)

# One circle marker per district, with the district name as its popup
for lat, lng, label in zip(df_seoul_loc["Latitude"], df_seoul_loc["Longitude"], df_seoul_loc["District"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=25,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3).add_to(map_seoul)

map_seoul
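
The map above is centred on a single geocoded address. An alternative, sketched here as an assumption rather than the notebook's method, is to centre it on the mean of the district coordinates. The example uses only the five rows shown in the `df_seoul_loc.head()` output; `district_coords`, `center_lat`, and `center_lng` are hypothetical names.

```python
from statistics import mean

# The five district centres shown in the df_seoul_loc.head() output
district_coords = {
    'Dobong-gu': (37.65066, 127.03011),
    'Dongdaemun-gu': (37.58189, 127.05408),
    'Dongjak-gu': (37.50056, 126.95149),
    'Eunpyeong-gu': (37.61846, 126.9278),
    'Gangbuk-gu': (37.6349, 127.02015),
}

# Average latitude and longitude across the listed districts
center_lat = mean(lat for lat, _ in district_coords.values())
center_lng = mean(lng for _, lng in district_coords.values())
print(round(center_lat, 5), round(center_lng, 5))
```

With the full frame, `df_seoul_loc[['Latitude', 'Longitude']].mean()` gives the same kind of average over all 25 districts, and the result can be passed as `location=[center_lat, center_lng]` to `folium.Map`.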