# Capstone Project - The Battle of Neighborhoods

## Prospects of a Lunch Restaurant in Seoul, Korea.

## 1. Introduction/Business Problem

![Seoul](seoul.jpg)

A friend of mine wants to open a lunch restaurant in Seoul and asked me for help.

To help him, I analyzed the districts of Seoul and considered two options:
+ Open a restaurant near major office buildings
+ Open a fast food restaurant near transport stations

Target audience:
+ People who, like my friend, want to open a restaurant or a cafe and want to compare the pros and cons of candidate locations.
+ Tourists looking for restaurants in Seoul.
+ Anyone who wants to follow a small data science project from start to finish.

## 2. Data

I scrape the table on https://en.wikipedia.org/wiki/List_of_districts_of_Seoul and load it into a data frame.

I then look up the coordinates of each district with the Geopy client and prepare the data.

I first mark the locations of the districts on a map and then use Foursquare data for the analysis that follows.

**Using BeautifulSoup to find the table and save it to a file**
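
The scraping code is not shown in this report. The following is a minimal sketch of how such a step could look, not the original implementation. The function name `parse_first_wikitable` and the inline sample HTML are hypothetical, and the sketch assumes the target table has the `wikitable` class and that every row has the same number of cells.

```python
from bs4 import BeautifulSoup
import pandas as pd


def parse_first_wikitable(html: str) -> pd.DataFrame:
    """Return the first <table class="wikitable"> found in `html` as a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    if table is None:
        raise ValueError("no wikitable found in the page")

    rows = []
    for tr in table.find_all("tr"):
        # Header rows use <th>, data rows use <td>; collect the text of both.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)

    # The first collected row is treated as the header.
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=header)


# Tiny inline sample (hypothetical) so the sketch runs without network access.
sample_html = """
<table class="wikitable">
  <tr><th>Name</th><th>Population</th></tr>
  <tr><td>Dobong-gu (도봉구; 道峰區)</td><td>355712</td></tr>
  <tr><td>Dongdaemun-gu (동대문구; 東大門區)</td><td>376319</td></tr>
</table>
"""
districts = parse_first_wikitable(sample_html)
print(districts)
# To persist the result as in this report: districts.to_csv("Seoul.csv")
```

For the real page, the HTML string would come from an HTTP request to the Wikipedia URL given above.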

**Dropping the Korean characters from the table**

In [4]:
import pandas as pd
df = pd.read_csv('Seoul.csv')
df.head()

Unnamed: 0,Name,Population,Area,Population_density
0,Dobong-gu (도봉구; 道峰區),355712,20.70 km²,17184/km²
1,Dongdaemun-gu (동대문구; 東大門區),376319,14.21 km²,26483/km²
2,Dongjak-gu (동작구; 銅雀區),419261,16.35 km²,25643/km²
3,Eunpyeong-gu (은평구; 恩平區),503243,29.70 km²,16944/km²
4,Gangbuk-gu (강북구; 江北區),338410,23.60 km²,14339/km²


In [5]:
df[['Name','Korean_language1', 'Korean_language2']] = df['Name'].str.split(' ',expand=True)
df.drop(['Korean_language1'], axis=1, inplace=True)
df.drop(['Korean_language2'], axis=1, inplace=True)
df.head()

Unnamed: 0,Name,Population,Area,Population_density
0,Dobong-gu,355712,20.70 km²,17184/km²
1,Dongdaemun-gu,376319,14.21 km²,26483/km²
2,Dongjak-gu,419261,16.35 km²,25643/km²
3,Eunpyeong-gu,503243,29.70 km²,16944/km²
4,Gangbuk-gu,338410,23.60 km²,14339/km²
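
As an alternative sketch, the romanized name can be kept by splitting only once at the first space, which does not depend on the name having exactly three space-separated parts. The same pass can turn the unit-suffixed columns into numbers. The column names `Area_km2` and `Density_per_km2` are hypothetical additions; the two sample rows are copied from the table above.

```python
import pandas as pd

raw = pd.DataFrame({
    "Name": ["Dobong-gu (도봉구; 道峰區)", "Dongdaemun-gu (동대문구; 東大門區)"],
    "Area": ["20.70 km²", "14.21 km²"],
    "Population_density": ["17184/km²", "26483/km²"],
})

clean = raw.copy()
# Keep only the text before the first space, i.e. the romanized district name.
clean["Name"] = clean["Name"].str.split(" ", n=1).str[0]
# Extract the leading number from strings such as "20.70 km²" and "17184/km²".
clean["Area_km2"] = clean["Area"].str.extract(r"([\d.]+)", expand=False).astype(float)
clean["Density_per_km2"] = (
    clean["Population_density"].str.extract(r"(\d+)", expand=False).astype(int)
)
print(clean[["Name", "Area_km2", "Density_per_km2"]])
```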


**Getting the district coordinates with the Geopy client and saving them**

In [6]:
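from geopy.geocoders import Nominatim

# The user_agent string is a placeholder; Nominatim requires some identifier.
geolocator = Nominatim(user_agent="seoul_districts")

Latitude = []
Longitude = []

# Geocode each district name and collect its coordinates
for i in df['Name']:
    location = geolocator.geocode(i)
    Latitude.append(location.latitude)
    Longitude.append(location.longitude)

df['Latitude'] = Latitude
df['Longitude'] = Longitude
df.head()

df.to_csv('Seoul_co.csv', index = False)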

In [7]:
df = pd.read_csv('Seoul_co.csv')
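
Two details of the geocoding step are worth guarding against: a bare name such as Jung-gu also exists in other Korean cities, and a geocoder can return `None` when it finds nothing. The sketch below is one hedged way to handle both. The helper `geocode_district`, its `suffix` default, and the fake geocoder are hypothetical; in the notebook, `geolocator.geocode` would be passed as the `geocode` argument.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Tuple


def geocode_district(
    name: str,
    geocode: Callable[[str], object],
    suffix: str = ", Seoul, South Korea",
) -> Tuple[Optional[float], Optional[float]]:
    """Look up `name` qualified by `suffix`; return (None, None) if nothing is found."""
    location = geocode(name + suffix)
    if location is None:
        return None, None
    return location.latitude, location.longitude


def fake_geocode(query: str):
    """Stand-in for a real geocoder so the sketch runs offline."""
    known = {
        # Coordinates copied from the Jung-gu row later in this report.
        "Jung-gu, Seoul, South Korea": SimpleNamespace(latitude=37.563656, longitude=126.99751),
    }
    return known.get(query)


found = geocode_district("Jung-gu", fake_geocode)
missing = geocode_district("Nowhere-gu", fake_geocode)
print(found, missing)
```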

**Using the folium library to add the districts to a map:**

![Map all Districts](mapalldistrict.PNG)

## 3. Visualization and Data Exploration:

### 3a. Open a restaurant near major office buildings

After searching online for the districts with major office buildings, I settled on 5 locations: Gangnam-gu, Jung-gu, Seocho-gu, Yeongdeungpo-gu, Yongsan-gu.

In [61]:
Districs_list = ['Gangnam-gu', 'Jung-gu', 'Seocho-gu', 'Yeongdeungpo-gu', 'Yongsan-gu']
Seoul_df_selected = df.loc[df['Name'].isin(Districs_list)]
Seoul_df_selected

Unnamed: 0,Name,Population,Area,Population_density,Latitude,Longitude
6,Gangnam-gu,583446,39.50 km²,14771/km²,37.5177,127.0473
13,Jung-gu,136227,9.96 km²,13677/km²,37.563656,126.99751
17,Seocho-gu,454288,47.00 km²,9666/km²,37.4835,127.0322
23,Yeongdeungpo-gu,421436,24.53 km²,17180/km²,37.5262,126.8959
24,Yongsan-gu,249914,21.87 km²,11427/km²,37.5323,126.99


![Map 5 districts](map5d.PNG)

I use the Foursquare API to obtain the most common venues in the Food category within 1 kilometer of the center of each of these districts.
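
The request code is not shown here. The sketch below assumes the legacy Foursquare v2 `venues/explore` endpoint and the response layout it returned. Whether the original notebook used this exact endpoint is an assumption, and the endpoint may no longer be available. The helper names, the version date, and the hand-written sample payload are hypothetical.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (assumed; check the current Foursquare documentation).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=1000, limit=50, version="20180605"):
    """Build a food-venue query around (lat, lng); radius is in meters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, an assumed value
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
        "section": "food",       # restrict results to food venues
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def venues_from_response(district, payload):
    """Flatten an explore response into rows with the columns used in this report."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append({
            "Neighbourhood": district,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0].get("name"),
        })
    return rows


url = build_explore_url(37.5177, 127.0473, "YOUR_ID", "YOUR_SECRET")

# Hand-written payload (hypothetical) in the assumed response shape.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Place",
               "location": {"lat": 37.518, "lng": 127.047},
               "categories": [{"name": "Korean Restaurant"}]}},
]}]}}
rows = venues_from_response("Gangnam-gu", sample_payload)
print(url)
print(rows)
```

Passing the rows for all five districts to `pandas.DataFrame` would give a table shaped like `Seoul_5_district_venues`.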

In [19]:
print (Seoul_5_district_venues['Venue Category'].value_counts())

Korean Restaurant                35
BBQ Joint                        19
Noodle House                     12
Bakery                           12
Café                             11
Chinese Restaurant                8
Fried Chicken Joint               5
Pizza Place                       5
Seafood Restaurant                5
Dumpling Restaurant               4
Vietnamese Restaurant             4
Burger Joint                      3
Breakfast Spot                    3
Fast Food Restaurant              3
Mexican Restaurant                3
Udon Restaurant                   2
Indian Restaurant                 2
Thai Restaurant                   2
Restaurant                        2
Italian Restaurant                2
Cantonese Restaurant              2
Sushi Restaurant                  2
Bunsik Restaurant                 2
Modern European Restaurant        2
German Restaurant                 2
Japanese Restaurant               2
Bistro                            1
Dim Sum Restaurant          

![map restaurant](mapres.PNG)

I list the top 10 venue categories across the 5 districts to see which kinds of food are most common.

In [20]:
Seoul_5d_restaurant_Top10 = Seoul_5_district_venues['Venue Category'].value_counts()[0:10].to_frame(name='frequency')
Seoul_5d_restaurant_Top10 = Seoul_5d_restaurant_Top10.reset_index()

Seoul_5d_restaurant_Top10.rename(index=str, columns={"index": "Venue_Category", "frequency": "Frequency"}, inplace=True)
Seoul_5d_restaurant_Top10

Unnamed: 0,Venue_Category,Frequency
0,Korean Restaurant,35
1,BBQ Joint,19
2,Noodle House,12
3,Bakery,12
4,Café,11
5,Chinese Restaurant,8
6,Fried Chicken Joint,5
7,Pizza Place,5
8,Seafood Restaurant,5
9,Dumpling Restaurant,4


![Most restaurant](Most_restaurant.png)

I also count the number of restaurants in each district.

![Number of restaurant in 5 Districts of Seoul](Most_restaurant_in_5_Districts.png)

I also explored the most common restaurant categories in each district.

In [24]:
num_top_venues = 5

for hood in Seoul_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = Seoul_grouped[Seoul_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Gangnam-gu----
                        venue  freq
0           Korean Restaurant  0.20
1                      Bakery  0.16
2          Chinese Restaurant  0.12
3                Noodle House  0.12
4  Modern European Restaurant  0.08


----Jung-gu----
                 venue  freq
0    Korean Restaurant  0.34
1         Noodle House  0.15
2               Bakery  0.12
3  Fried Chicken Joint  0.05
4   Italian Restaurant  0.05


----Seocho-gu----
                venue  freq
0           BBQ Joint  0.22
1   Korean Restaurant  0.22
2  Seafood Restaurant  0.11
3  Chinese Restaurant  0.07
4        Burger Joint  0.04


----Yeongdeungpo-gu----
               venue  freq
0          BBQ Joint   0.3
1  Korean Restaurant   0.2
2         Food Court   0.1
3               Café   0.1
4         Bagel Shop   0.1


----Yongsan-gu----
                   venue  freq
0              BBQ Joint  0.10
1      Korean Restaurant  0.10
2                   Café  0.10
3    Dumpling Restaurant  0.05
4  Vietnamese Restaur
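
The loop above reads from a `Seoul_grouped` table whose construction is not shown. One plausible way to build such a table is to one-hot encode the venue categories and average them per district, which yields the share of each category. This is a sketch under that assumption; the function name `category_frequencies` and the toy venues are hypothetical.

```python
import pandas as pd


def category_frequencies(venues: pd.DataFrame) -> pd.DataFrame:
    """Return one row per district with the share of each venue category."""
    # One indicator column per category, 1.0 where the venue has that category.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot.insert(0, "Neighbourhood", venues["Neighbourhood"].values)
    # The mean of the indicators within a district is the category's frequency.
    return onehot.groupby("Neighbourhood").mean().reset_index()


# Toy venues (invented for illustration, not taken from the Foursquare results).
toy = pd.DataFrame({
    "Neighbourhood": ["A-gu", "A-gu", "A-gu", "A-gu", "B-gu", "B-gu"],
    "Venue Category": ["Korean Restaurant", "Korean Restaurant", "Bakery",
                       "BBQ Joint", "BBQ Joint", "Korean Restaurant"],
})
grouped = category_frequencies(toy)
print(grouped)
```

Each row of the result sums to 1 across the category columns, which matches the `freq` values printed by the loop.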

## Clustering the Major Districts of Seoul

Finally, I cluster these 5 districts by the frequency of their restaurant venue categories. Using the K-Means algorithm from the Scikit-learn library, I obtain the 2 clusters shown below.

![map clustering](mapclu.PNG)

The most common venues in each cluster are:
- Seocho-gu, Yeongdeungpo-gu and Yongsan-gu are dominated by BBQ Joint and Korean Restaurant (red cluster).
- Gangnam-gu and Jung-gu are dominated by Korean Restaurant, Bakery and Noodle House (purple cluster).
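
The clustering step can be sketched as follows, assuming a frequency table with one row per district and one column per venue category. The function name `cluster_districts`, the seed, and the toy frequencies are hypothetical and invented for illustration, so the resulting labels are not the ones shown on the map.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_districts(grouped: pd.DataFrame, k: int = 2, seed: int = 0) -> pd.DataFrame:
    """Assign each district to one of `k` clusters based on its category frequencies."""
    features = grouped.drop(columns="Neighbourhood")
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(features)
    return grouped[["Neighbourhood"]].assign(Cluster=labels)


# Toy frequency table: two bakery-leaning and three BBQ-leaning districts.
toy_grouped = pd.DataFrame({
    "Neighbourhood": ["District A", "District B", "District C", "District D", "District E"],
    "Korean Restaurant": [0.40, 0.35, 0.15, 0.20, 0.10],
    "BBQ Joint": [0.05, 0.05, 0.30, 0.35, 0.25],
    "Bakery": [0.20, 0.15, 0.02, 0.00, 0.05],
})
clusters = cluster_districts(toy_grouped)
print(clusters)
```

The numeric cluster labels are arbitrary; only the grouping of districts is meaningful.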

### 3b. Open fast food restaurants near the transport stations

**In this report, I focus only on bus stops and bus stations.**

In [71]:
Seoul_transport = Seoul_transport[Seoul_transport['Venue Category'].str.contains('Bus')]
print(Seoul_transport['Neighbourhood'].value_counts())

Seocho-gu          16
Gwanak-gu          11
Yongsan-gu         10
Dongjak-gu          9
Gangnam-gu          9
Seongbuk-gu         8
Seongdong-gu        7
Songpa-gu           6
Jung-gu             6
Gangdong-gu         6
Gangbuk-gu          6
Gwangjin-gu         5
Dongdaemun-gu       4
Nowon-gu            4
Eunpyeong-gu        4
Jongno-gu           3
Jungnang-gu         2
Seodaemun-gu        2
Mapo-gu             2
Yeongdeungpo-gu     2
Geumcheon-gu        2
Gangseo-gu          1
Yangcheon-gu        1
Name: Neighbourhood, dtype: int64
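
To relate these counts to the five office districts from section 3a, the relevant rows can be selected and ranked. The sketch below copies the five counts from the output above. The variable names are hypothetical, and `fill_value=0` is an assumption for districts that do not appear in the counts at all.

```python
import pandas as pd

# Bus-stop counts copied from the printed output above (office districts only).
bus_counts = pd.Series(
    {"Seocho-gu": 16, "Yongsan-gu": 10, "Gangnam-gu": 9, "Jung-gu": 6, "Yeongdeungpo-gu": 2},
    name="bus_stops",
)
office_districts = ["Gangnam-gu", "Jung-gu", "Seocho-gu", "Yeongdeungpo-gu", "Yongsan-gu"]

# A district missing from the counts would get 0 instead of NaN.
ranked = bus_counts.reindex(office_districts, fill_value=0).sort_values(ascending=False)
print(ranked)
```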


**I find the districts with the most bus stops.**

![Most bus](Most_bus.png)

**I mark all bus stops in the districts with the highest number of bus stops.**

![Map bus](mapbus.png)

I stop here, since K-Means clustering would add little to this part of the analysis.

## 4. Results
The results of the exploratory data analysis and clustering are summarized below.
### a. Open a restaurant near major office buildings
- Korean restaurants are the most common venue category in the 5 districts.
- Seocho-gu, Yeongdeungpo-gu and Yongsan-gu are dominated by BBQ Joint and Korean Restaurant.
- Gangnam-gu and Jung-gu are dominated by Korean Restaurant, Bakery and Noodle House.
- Yongsan-gu has the highest number of restaurants, whereas Yeongdeungpo-gu has the fewest.

### b. Open fast food restaurants near the transport stations
- Seocho-gu has the highest number of bus stops (16), whereas Gangseo-gu and Yangcheon-gu have the fewest (1 each). Gangnam-gu has 9.

#### In my opinion, my friend should open a restaurant in Seocho-gu: it does not have too many restaurants, so he can avoid heavy competition, and it has the highest number of bus stops, so many people are likely to pass through.

## 5. Discussion
A drawback of this analysis is that the clustering is based entirely on the most common venues obtained from Foursquare data.

## 6. Conclusion

In this project I:
- scraped data from the web,
- used the Foursquare API to explore venues in each district,
- showed the results on Folium Leaflet maps.

This analysis also gives me a starting point if someone asks me to help them open a restaurant in Seoul or another city.