# Capstone Project - The Battle of Neighborhoods (Week 1)

### Introduction

The goal of this project is to explore the similarities and differences between the "Koreatowns" of cities in the US and Canada and the Korean capital, Seoul. The locations considered are:

- Seoul, Korea
- Toronto, Canada
- Los Angeles, USA
- New York, USA

In this report, the neighborhoods referred to as Koreatowns in cities outside of Korea are compared with Seoul based on the distribution of venues. This allows for an analysis of how similar these "Koreatowns" actually are to the modern-day Korean capital.

### Imports needed for data collection and exploration

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Data examples and exploration

First, the coordinates of the Koreatown in each city need to be determined.

In [2]:
# Coordinates in latitude and longitude for the Koreatown in each city
Toronto_latlong = [43.664516, -79.413005]
LA_latlong = [34.0618, -118.3006]
Newyork_latlong = [40.741997032, -73.985496058]

# Coordinates for Seoul
Seoul_latlong = [37.5665, 126.9780]

In [3]:
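# Use a radius of 800 meters for the Koreatowns
radius_town = 800

# Use a radius of 3 km for Seoul, since it is a whole city and not just a neighborhood
radius_city = 3000

# One radius per location: the three Koreatowns first, then Seoul
radius = [radius_town, radius_town, radius_town, radius_city]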

In [4]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (placeholder; do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (placeholder; do not publish real credentials)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [5]:
# Function to extract venues from each location

def getNearbyVenues(city, latitudes, longitudes, radius):
    """Return a DataFrame of Foursquare venues found within each location's radius."""
    
    venues_list=[]
    for name, lat, lng, rad in zip(city, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
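
The function above indexes `categories[0]` directly and formats the credentials into the URL by hand. As a minimal sketch of a more defensive variant, assuming the same response layout (`response` → `groups[0]` → `items`, each with a `venue` entry), the query parameters can be built as a dictionary and the rows extracted by a separate helper that tolerates venues without a category. The names `build_explore_params` and `parse_explore_items` are hypothetical and not part of the original notebook.

```python
def build_explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the query parameters for the venues/explore endpoint as a dict."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }


def parse_explore_items(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat row tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# Hand-written stand-in for a response, used only to illustrate the parsing step
sample_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 1.5, 'lng': 2.5},
               'categories': []}},
]
sample_rows = parse_explore_items('Example', 1.0, 2.0, sample_items)
```

The dictionary returned by `build_explore_params` could be passed as the `params` argument of `requests.get`, which lets `requests` handle the URL encoding.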

In [6]:
cities = ['LA','NewYork', 'Toronto', 'Seoul']
lats = [LA_latlong[0], Newyork_latlong[0], Toronto_latlong[0], Seoul_latlong[0]]
longs = [LA_latlong[1], Newyork_latlong[1], Toronto_latlong[1], Seoul_latlong[1]]

In [7]:
koreatowns_venues = getNearbyVenues(city = cities,
                                  latitudes = lats,
                                  longitudes = longs,
                                   radius = radius)
# reset_index returns a new DataFrame, so assign the result back
koreatowns_venues = koreatowns_venues.reset_index(drop=True)

LA
NewYork
Toronto
Seoul


In [8]:
koreatowns_venues.shape

(386, 7)

### Example of data for the different cities

In [12]:
koreatowns_venues[koreatowns_venues['City'] == 'LA'].head(5)

| | City | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | LA | 34.0618 | -118.3006 | BCD Tofu House | 34.061961 | -118.302713 | Korean Restaurant |
| 1 | LA | 34.0618 | -118.3006 | The LINE Hotel | 34.06204 | -118.300909 | Hotel |
| 2 | LA | 34.0618 | -118.3006 | Alfred Coffee Koreatown | 34.061756 | -118.300938 | Café |
| 3 | LA | 34.0618 | -118.3006 | Poketo | 34.061798 | -118.300865 | Clothing Store |
| 4 | LA | 34.0618 | -118.3006 | Cassell's Hamburgers | 34.063417 | -118.300411 | Burger Joint |
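
As a rough sanity check on the search radius, the great-circle distance between a search center and a returned venue can be computed with the haversine formula. The sketch below is not part of the original notebook: the helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions, and the coordinates are taken from the first LA row above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Search center for LA and the location of BCD Tofu House from the sample output
center = (34.0618, -118.3006)
venue = (34.061961, -118.302713)

distance = haversine_m(center[0], center[1], venue[0], venue[1])
print('Distance from center: {:.0f} m'.format(distance))
print('Within 800 m radius:', distance <= 800)
```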


In [13]:
koreatowns_venues[koreatowns_venues['City'] == 'NewYork'].head(5)

| | City | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 91 | NewYork | 40.741997 | -73.985496 | Equinox Gramercy | 40.740749 | -73.985771 | Gym |
| 92 | NewYork | 40.741997 | -73.985496 | CAVA | 40.740842 | -73.98551 | Mediterranean Restaurant |
| 93 | NewYork | 40.741997 | -73.985496 | Upland | 40.741891 | -73.98464 | New American Restaurant |
| 94 | NewYork | 40.741997 | -73.985496 | Madison Square Park | 40.742262 | -73.988006 | Park |
| 95 | NewYork | 40.741997 | -73.985496 | Barry's Bootcamp | 40.742532 | -73.984152 | Gym / Fitness Center |


In [14]:
koreatowns_venues[koreatowns_venues['City'] == 'Seoul'].head(5)

| | City | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 286 | Seoul | 37.5665 | 126.978 | 무교동북어국집 | 37.567852 | 126.979753 | Korean Restaurant |
| 287 | Seoul | 37.5665 | 126.978 | Läderach chocolatier suisse (레더라) | 37.568153 | 126.978265 | Chocolate Shop |
| 288 | Seoul | 37.5665 | 126.978 | 철철복집 | 37.567393 | 126.98131 | Seafood Restaurant |
| 289 | Seoul | 37.5665 | 126.978 | The Plaza Hotel (더 플라자) | 37.564621 | 126.97806 | Hotel |
| 290 | Seoul | 37.5665 | 126.978 | Seoul Plaza (서울광장) | 37.565475 | 126.977937 | Pedestrian Plaza |


### For this project, the venue category will be used to characterize each area

In [15]:
koreatown_NY = koreatowns_venues[koreatowns_venues['City'] == 'NewYork']
koreatown_LA = koreatowns_venues[koreatowns_venues['City'] == 'LA']
koreatown_TO = koreatowns_venues[koreatowns_venues['City'] == 'Toronto']
seoul = koreatowns_venues[koreatowns_venues['City'] == 'Seoul']

In [25]:
print("Number of unique categories for venues" , len(koreatowns_venues['Venue Category'].unique()))

Number of unique categories for venues 125
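
Since the areas are to be characterized by venue category, one possible next step is to turn the raw venue list into per-city category shares, so that areas with different venue counts become comparable. The following is a minimal sketch using a small hand-made DataFrame with the same `City` and `Venue Category` columns; the rows are illustrative placeholders, not results from the API call.

```python
import pandas as pd

# Illustrative rows only; the real input would be koreatowns_venues
toy_venues = pd.DataFrame({
    'City': ['LA', 'LA', 'LA', 'LA', 'Seoul', 'Seoul'],
    'Venue Category': ['Korean Restaurant', 'Korean Restaurant', 'Café', 'Hotel',
                       'Korean Restaurant', 'Hotel'],
})

# Rows are cities, columns are categories, values are the share of each
# category among that city's venues (each row sums to 1)
category_share = pd.crosstab(
    toy_venues['City'],
    toy_venues['Venue Category'],
    normalize='index',
)
print(category_share)
```

Normalizing by row keeps a city with 100 venues from dominating a comparison with one that returned fewer.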


Note: Category labels may be defined differently in different cities. A restaurant in Seoul would most likely fall under the category "Korean Restaurant" in another city. However, this will be ignored for now.
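
If the labeling difference described in the note turns out to matter, one option would be to collapse fine-grained labels into broader ones before comparing cities. The mapping below is purely hypothetical and would need to be chosen by inspecting the actual categories; it is shown on a toy Series rather than on the notebook's data.

```python
import pandas as pd

# Hypothetical mapping from specific labels to a broader label
category_map = {
    'Korean Restaurant': 'Restaurant',
    'Seafood Restaurant': 'Restaurant',
    'New American Restaurant': 'Restaurant',
}

toy_categories = pd.Series(
    ['Korean Restaurant', 'Hotel', 'Seafood Restaurant', 'Café'],
    name='Venue Category',
)

# Labels found in the mapping are replaced; all others are left unchanged
harmonized = toy_categories.replace(category_map)
print(harmonized.tolist())
```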

# Analysis

In [None]:


# import k-means from clustering stage
from sklearn.cluster import KMeans
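
How k-means will be applied is not shown in this part of the notebook. As one possible sketch, assuming each area is represented by a vector of category shares, `KMeans` could group areas with similar venue mixes. The feature matrix, the area labels, and the choice of two clusters below are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category-share vectors (rows: areas, columns: categories)
areas = ['Area A', 'Area B', 'Area C', 'Area D']
shares = np.array([
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.30, 0.60],
    [0.15, 0.25, 0.60],
])

# n_clusters=2 is an arbitrary choice for this toy example
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(shares)

for area, label in zip(areas, labels):
    print(area, '-> cluster', label)
```

Cluster numbers themselves are arbitrary; only which areas share a label is meaningful.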