# Capstone Project - The Battle of Neighborhoods

## Prospects of a Lunch Restaurant, Close to Office Areas in Seoul, Korea.

### 1. Introduction/Business Problem

A friend of mine wants to open a lunch restaurant in Seoul and asked me for help.

To help him choose a location, I analyze the districts of Seoul.
I consider three options:
+ Open a restaurant near major office buildings
+ Open a fast food restaurant near transport stations
+ Open a restaurant in an area with few existing restaurants, to avoid competition

### 2. Data
I first scrape the district table from https://en.wikipedia.org/wiki/List_of_districts_of_Seoul and load it into a data frame.
After that, I get the coordinates of each district using the Geopy client and prepare the data.

In [1]:
import sys
import requests
import json

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors


import io
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

from sklearn.cluster import KMeans

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="Seoul_explorer", timeout = 10)

**Using BeautifulSoup to find the table**

In [2]:
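# Download the page and fail early on an HTTP error
response = requests.get('https://en.wikipedia.org/wiki/List_of_districts_of_Seoul')
response.raise_for_status()
soup = BeautifulSoup(response.text, 'lxml')
# Take the first table whose class attribute is "wikitable sortable"
Districts_Seoul_Table = soup.find('table', {'class':'wikitable sortable'})
print(Districts_Seoul_Table.tr.text)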


Name
Population
Area
Population density



**Saving the data I need**

In [4]:
Name = []
Population = []
Area = []
Popdensity = []

# Walk the table rows; rows without four <td> cells (such as the header) are skipped
for tr in Districts_Seoul_Table.find_all('tr'):
    # strip() removes the trailing newline without cutting a real character
    cells = [td.text.strip() for td in tr.find_all('td')]
    if len(cells) < 4:
        continue
    Name.append(cells[0])
    Population.append(cells[1])
    Area.append(cells[2])
    Popdensity.append(cells[3])

df = pd.DataFrame({"Name": Name, "Population": Population, "Area": Area, "Population_density": Popdensity})
df.to_csv('Seoul.csv', index = False)
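
The scrape depends on the page layout staying the same, so a quick check before saving may be worthwhile. The sketch below is not part of the original notebook: `check_columns` is a hypothetical helper that verifies each column list has the expected number of rows (Seoul has 25 districts) and no empty cells. The sample data is a made-up two-row excerpt, including a deliberately empty cell.

```python
def check_columns(columns, expected_rows=25):
    """Return a list of problems found in a dict of column name -> list of cell strings."""
    problems = []
    for label, values in columns.items():
        if len(values) != expected_rows:
            problems.append(f"{label}: expected {expected_rows} rows, got {len(values)}")
        empty = sum(1 for v in values if not str(v).strip())
        if empty:
            problems.append(f"{label}: {empty} empty cell(s)")
    return problems


# Made-up two-row sample, so expected_rows is set to 2 here
sample = {
    "Name": ["Dobong-gu (도봉구; 道峰區)", "Dongdaemun-gu (동대문구; 東大門區)"],
    "Population": ["355712", ""],
}
print(check_columns(sample, expected_rows=2))
```

In the notebook, the same helper could be called with the four scraped lists, for example `check_columns({"Name": Name, "Population": Population, "Area": Area, "Population_density": Popdensity})`, and an empty result would mean nothing suspicious was found.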

In [5]:
import pandas as pd
df = pd.read_csv('Seoul.csv')
df.head()

Unnamed: 0,Name,Population,Area,Population_density
0,Dobong-gu (도봉구; 道峰區),355712,20.70 km²,17184/km²
1,Dongdaemun-gu (동대문구; 東大門區),376319,14.21 km²,26483/km²
2,Dongjak-gu (동작구; 銅雀區),419261,16.35 km²,25643/km²
3,Eunpyeong-gu (은평구; 恩平區),503243,29.70 km²,16944/km²
4,Gangbuk-gu (강북구; 江北區),338410,23.60 km²,14339/km²


**Dropping the Korean and Chinese characters from the Name column**

In [6]:
# Keep only the romanized name, e.g. "Dobong-gu (도봉구; 道峰區)" -> "Dobong-gu"
df['Name'] = df['Name'].str.split(' ').str[0]
df.head()

Unnamed: 0,Name,Population,Area,Population_density
0,Dobong-gu,355712,20.70 km²,17184/km²
1,Dongdaemun-gu,376319,14.21 km²,26483/km²
2,Dongjak-gu,419261,16.35 km²,25643/km²
3,Eunpyeong-gu,503243,29.70 km²,16944/km²
4,Gangbuk-gu,338410,23.60 km²,14339/km²
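
The `Area` and `Population_density` columns are still strings with units attached (`20.70 km²`, `17184/km²`), so they cannot be used in arithmetic yet. As a minimal sketch of one way to prepare them, the hypothetical helper `to_number` below pulls out the first number in a cell and returns it as a float. It is an assumption about how the cleanup could be done, not a step taken from the original notebook.

```python
import re


def to_number(text):
    """Extract the first number from strings such as '20.70 km²' or '17184/km²'."""
    # Remove thousands separators, then look for digits with an optional decimal part
    match = re.search(r"\d+(?:\.\d+)?", str(text).replace(",", ""))
    if match is None:
        return None  # no number present in the cell
    return float(match.group())


print(to_number("20.70 km²"))
print(to_number("17184/km²"))
```

Applied to the data frame, this could look like `df['Area_km2'] = df['Area'].map(to_number)`, where `Area_km2` is an illustrative column name.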


**Getting the district coordinates with the Geopy client and saving them**

In [15]:
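Latitude = []
Longitude = []

for name in df['Name']:
    location = geolocator.geocode(name)
    # geocode() returns None when nothing is found, so guard against that
    Latitude.append(location.latitude if location else None)
    Longitude.append(location.longitude if location else None)

df['Latitude'] = Latitude
df['Longitude'] = Longitude
df.to_csv('Seoul_co.csv', index = False)
# Last expression in the cell, so the notebook displays it
df.head()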

Unnamed: 0,Name,Population,Area,Population_density,Latitude,Longitude
0,Dobong-gu,355712,20.70 km²,17184/km²,37.6686,127.0466
1,Dongdaemun-gu,376319,14.21 km²,26483/km²,37.5742,127.0395
2,Dongjak-gu,419261,16.35 km²,25643/km²,37.5121,126.9395
3,Eunpyeong-gu,503243,29.70 km²,16944/km²,37.6024,126.9293
4,Gangbuk-gu,338410,23.60 km²,14339/km²,37.6395,127.0255
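
Geocoding by bare district name can fail or match a same-named place elsewhere, and the public Nominatim service asks for at most one request per second. The sketch below shows one possible way to make the lookup more defensive. `geocode_districts`, `SEOUL_BOUNDS` and the query suffix are hypothetical names introduced here, and the bounding box is only an approximation of Seoul's extent. A fake geocoder stands in for `geolocator.geocode` so the example runs without network access.

```python
import time
from types import SimpleNamespace

# Approximate bounding box of Seoul (an assumption, used only as a sanity check)
SEOUL_BOUNDS = {"lat": (37.40, 37.72), "lon": (126.70, 127.30)}


def geocode_districts(names, geocode, delay_seconds=1.0, suffix=", Seoul, South Korea"):
    """Return {name: (lat, lon)} or {name: None} for each district name.

    `geocode` is any callable shaped like geolocator.geocode: it takes a query
    string and returns an object with .latitude and .longitude, or None.
    """
    results = {}
    for index, name in enumerate(names):
        if index > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(name + suffix)
        if location is None:
            results[name] = None  # nothing found for this query
            continue
        lat, lon = location.latitude, location.longitude
        in_lat = SEOUL_BOUNDS["lat"][0] <= lat <= SEOUL_BOUNDS["lat"][1]
        in_lon = SEOUL_BOUNDS["lon"][0] <= lon <= SEOUL_BOUNDS["lon"][1]
        # Reject matches that fall outside the rough Seoul bounding box
        results[name] = (lat, lon) if in_lat and in_lon else None
    return results


def fake_geocode(query):
    """Stand-in for geolocator.geocode that knows a single district."""
    known = {
        "Dobong-gu, Seoul, South Korea": SimpleNamespace(latitude=37.6686, longitude=127.0466),
    }
    return known.get(query)


print(geocode_districts(["Dobong-gu", "Nowhere-gu"], fake_geocode, delay_seconds=0))
```

With the real client, the call would be `geocode_districts(df['Name'], geolocator.geocode)`. Districts that come back as `None` can then be inspected by hand instead of raising an error in the middle of the loop.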
