# Capstone Project - The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

As a major American city, **Boston** is home to a wide variety of business establishments. 

From Central Boston’s financial hub to the large hospitals of the Longwood Medical Area to the major research universities in Fenway, **Boston’s unique neighborhoods** help the city as a whole serve a global market. Each of Boston’s neighborhoods has something unique to offer businesses, and this factors into where industries choose to settle. 

In this report, we will try to find an optimal location for a **Korean restaurant** in Boston.

Since Boston already has many Korean restaurants, we will look for **neighborhoods that are not already crowded with restaurants and that have no top Korean restaurants in the vicinity**.

We will then present the most promising neighborhoods according to these criteria so that stakeholders can make the final decision.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* neighborhood information, such as **zip code** and **coordinates**, for the Boston area
* **number of restaurants** in each neighborhood
* **number of Korean restaurants** in each neighborhood, if any

The following data sources will be needed to extract or generate the required information:
* the number of restaurants in every neighborhood, with their type and location, will be obtained using the **Foursquare API**
* the zip codes and coordinates of Boston neighborhoods will be obtained from the *US Zip Code Latitude and Longitude* dataset: [https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/)

In [133]:
import pandas as pd
Boston = pd.read_csv("Boston.csv",dtype=str)
Boston['Neighborhood'] = Boston['Neighborhood'].str.zfill(5)
Boston = Boston.drop_duplicates(subset = ['Latitude','Longitude'])
Boston.head()

Unnamed: 0,Neighborhood,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Borough
0,2101,Boston,MA,42.3566,-71.0565,-5,1,Central Boston
1,2102,Boston,MA,42.338947,-70.919635,-5,1,
5,2106,Boston,MA,42.354318,-71.073449,-5,1,
7,2108,Boston,MA,42.3549,-71.06408,-5,1,Back Bay/Beacon Hill
8,2109,Boston,MA,42.361477,-71.05417,-5,1,Central Boston
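
Reading the zip codes as strings matters here because Massachusetts zip codes begin with a zero. The sketch below uses a small inline CSV (made-up rows shaped like `Boston.csv`, not the real file) to show what default parsing does to them, and how `dtype=str` together with `str.zfill(5)` keeps or restores the leading zero.

```python
import io
import pandas as pd

# Hypothetical sample shaped like Boston.csv; the real file is not reproduced here.
sample_csv = """Neighborhood,Latitude,Longitude
2101,42.3566,-71.0565
02108,42.3549,-71.06408
"""

# Default parsing infers integers, so the leading zero is lost.
as_numbers = pd.read_csv(io.StringIO(sample_csv))

# Reading every column as text keeps the digits exactly as written.
as_text = pd.read_csv(io.StringIO(sample_csv), dtype=str)

# Left-pad to five characters so every zip code has the same form.
as_text["Neighborhood"] = as_text["Neighborhood"].str.zfill(5)

print(as_numbers["Neighborhood"].tolist())
print(as_text["Neighborhood"].tolist())
```

Because everything is read as text, the coordinate columns also arrive as strings, which is why the notebook converts `Latitude` and `Longitude` to `float` in a later cell.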


Drop the rows whose `Borough` value is null.

In [134]:
Boston= Boston[pd.notnull(Boston["Borough"])]
Boston.reset_index(drop=True, inplace=True)
Boston.head()

Unnamed: 0,Neighborhood,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Borough
0,2101,Boston,MA,42.3566,-71.0565,-5,1,Central Boston
1,2108,Boston,MA,42.3549,-71.06408,-5,1,Back Bay/Beacon Hill
2,2109,Boston,MA,42.361477,-71.05417,-5,1,Central Boston
3,2110,Boston,MA,42.356532,-71.05365,-5,1,Central Boston
4,2111,Boston,MA,42.349838,-71.06101,-5,1,Central Boston


In [135]:
Boston['Latitude'] = Boston['Latitude'].astype(float)
Boston['Longitude'] = Boston['Longitude'].astype(float)
Boston.head()

Unnamed: 0,Neighborhood,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Borough
0,2101,Boston,MA,42.3566,-71.0565,-5,1,Central Boston
1,2108,Boston,MA,42.3549,-71.06408,-5,1,Back Bay/Beacon Hill
2,2109,Boston,MA,42.361477,-71.05417,-5,1,Central Boston
3,2110,Boston,MA,42.356532,-71.05365,-5,1,Central Boston
4,2111,Boston,MA,42.349838,-71.06101,-5,1,Central Boston


In [136]:
Boston

Unnamed: 0,Neighborhood,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Borough
0,2101,Boston,MA,42.3566,-71.0565,-5,1,Central Boston
1,2108,Boston,MA,42.3549,-71.06408,-5,1,Back Bay/Beacon Hill
2,2109,Boston,MA,42.361477,-71.05417,-5,1,Central Boston
3,2110,Boston,MA,42.356532,-71.05365,-5,1,Central Boston
4,2111,Boston,MA,42.349838,-71.06101,-5,1,Central Boston
5,2113,Boston,MA,42.365028,-71.05636,-5,1,Central Boston
6,2114,Boston,MA,42.361792,-71.06774,-5,1,Central Boston
7,2115,Boston,MA,42.34308,-71.09268,-5,1,Fenway/Kenmore
8,2116,Boston,MA,42.349622,-71.07372,-5,1,Back Bay/Beacon Hill
9,2117,Boston,MA,42.338947,-71.07372,-5,1,Back Bay/Beacon Hill


In [21]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


## Methodology <a name="methodology"></a>



1. Use geopy library to get the latitude and longitude values of Boston

In [22]:
address = 'Boston,MA'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Boston are 42.3602534, -71.0582912.


2. Create a map of Boston using the latitude and longitude values

In [110]:
map_boston = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood, borough in zip(Boston['Latitude'], Boston['Longitude'], Boston['Neighborhood'], Boston['Borough']):
    label = '{},{}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boston)  
    
map_boston

3. Import the Foursquare credentials and version

In [35]:
CLIENT_ID = 'JKVUJ0GBGE4YNAEAZU1LIAEZ43LRS2EZX5HTZCGPFTCPVRZO' # your Foursquare ID
CLIENT_SECRET = '5GURXTPXEEC3OAN1IPDNRPDSOQIXR5HMKNENGOYJ5XKCGHYH' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JKVUJ0GBGE4YNAEAZU1LIAEZ43LRS2EZX5HTZCGPFTCPVRZO
CLIENT_SECRET:5GURXTPXEEC3OAN1IPDNRPDSOQIXR5HMKNENGOYJ5XKCGHYH


In [36]:
LIMIT = 100

In [82]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
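
The function above indexes straight into the JSON response, so a neighborhood with an empty result or a venue without a category would raise an error. As a minimal sketch of the same flattening step with those cases guarded, the example below works offline on a hypothetical payload: `sample_response` and `flatten_items` are illustrative names, and the payload only mirrors the keys that `getNearbyVenues` reads, with invented venues.

```python
import pandas as pd

# Hypothetical payload: it only mirrors the keys that getNearbyVenues reads,
# and the venue names and coordinates are invented for illustration.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Cafe",
                               "location": {"lat": 42.357, "lng": -71.058},
                               "categories": [{"name": "Coffee Shop"}]}},
                    {"venue": {"name": "Example Grill",
                               "location": {"lat": 42.356, "lng": -71.057},
                               "categories": [{"name": "Korean Restaurant"}]}},
                ]
            }
        ]
    }
}


def flatten_items(name, lat, lng, payload):
    """Turn one explore-style payload into one row per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Skip venues without a category instead of raising IndexError.
        if not categories:
            continue
        rows.append({
            "Neighborhood": name,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"],
        })
    return pd.DataFrame(rows)


venues = flatten_items("02101", 42.3566, -71.0565, sample_response)
print(venues[["Venue", "Venue Category"]])
```

An empty payload simply yields an empty dataframe, so one neighborhood without results does not stop the loop over all zip codes.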

In [137]:
boston_venues = getNearbyVenues(names=Boston['Neighborhood'],
                                   latitudes=Boston['Latitude'],
                                   longitudes=Boston['Longitude']
                                  )

02101
02108
02109
02110
02111
02113
02114
02115
02116
02117
02118
02119
02120
02121
02122
02123
02124
02125
02127
02128
02133
02163
02196
02199
02203
02205
02210
02212
02215
02217
02222


In [138]:
print(boston_venues.shape)
boston_venues.head()

(1545, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,2101,42.3566,-71.0565,DAVIDsTEA,42.357263,-71.058332,Tea Room
1,2101,42.3566,-71.0565,sweetgreen,42.357704,-71.058713,Salad Place
2,2101,42.3566,-71.0565,Ogawa Coffee Boston,42.356674,-71.058109,Coffee Shop
3,2101,42.3566,-71.0565,Boston Wine Exchange,42.35607,-71.05742,Wine Shop
4,2101,42.3566,-71.0565,The Langham Boston Hotel,42.356482,-71.054501,Hotel


4. Check how many venues were returned for each neighborhood

In [139]:
boston_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2101,62,62,62,62,62,62
2108,100,100,100,100,100,100
2109,100,100,100,100,100,100
2110,60,60,60,60,60,60
2111,100,100,100,100,100,100
2113,77,77,77,77,77,77
2114,37,37,37,37,37,37
2115,29,29,29,29,29,29
2116,100,100,100,100,100,100
2117,64,64,64,64,64,64


In [140]:
print('There are {} unique categories.'.format(len(boston_venues['Venue Category'].unique())))

There are 197 unique categories.


5. Analyze Each Neighborhood

In [141]:
# one hot encoding
boston_onehot = pd.get_dummies(boston_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
boston_onehot['Neighborhood'] = boston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [boston_onehot.columns[-1]] + list(boston_onehot.columns[:-1])
boston_onehot = boston_onehot[fixed_columns]

boston_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,African Restaurant,Airport,American Restaurant,Aquarium,Arepa Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Beer Bar,Beer Garden,Belgian Restaurant,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Cambodian Restaurant,Caribbean Restaurant,Chinese Restaurant,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Doctor's Office,Donut Shop,Ethiopian Restaurant,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hockey Arena,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Insurance Office,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Opera House,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Park,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Ski Chalet,Soccer Field,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Track,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [142]:
boston_onehot.shape

(1545, 197)
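
One detail in the output above is worth noting: the category list contains a venue category literally named `Neighborhood`. Assigning `boston_onehot['Neighborhood']` therefore appears to overwrite that indicator column in place instead of appending a new one, which would explain why the shape stays at 197 columns and why `Yoga Studio` rather than `Neighborhood` ends up first. The later steps still work because they select columns by name. The sketch below, on toy data, shows one possible way to avoid the collision; the renamed column label is an illustrative choice, not something the notebook uses.

```python
import pandas as pd

# Toy data; the category named "Neighborhood" mimics the collision seen above.
venues = pd.DataFrame({
    "Neighborhood": ["02101", "02101", "02108"],
    "Venue Category": ["Coffee Shop", "Neighborhood", "Korean Restaurant"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Rename the dummy column that clashes with the key column before adding the key.
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() places the key first and raises if the name already exists.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

print(onehot.columns.tolist())
```

Using `insert` also removes the need to reorder the columns afterwards, since the key column is placed at position 0 directly.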

6. Get only restaurant venues in all of the Neighborhoods

In [143]:
cols = [col for col in boston_onehot.columns if 'Neighborhood' in col]+[col for col in boston_onehot.columns if 'Restaurant' in col]
boston_onehot = boston_onehot[cols]
print(boston_onehot.shape)
boston_onehot.head()

(1545, 41)


Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Peruvian Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,2101,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [144]:
boston_grouped = boston_onehot.groupby('Neighborhood').mean().reset_index()
boston_grouped

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cambodian Restaurant,Caribbean Restaurant,Chinese Restaurant,Dim Sum Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,New American Restaurant,Peruvian Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,2101,0.0,0.048387,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0
1,2108,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.03,0.03,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01
2,2109,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.18,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
3,2110,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0
4,2111,0.0,0.02,0.0,0.07,0.0,0.0,0.0,0.0,0.12,0.02,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.01,0.01
5,2113,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,2114,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,2115,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0
8,2116,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.04,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0
9,2117,0.015625,0.015625,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.0,0.015625,0.015625,0.0,0.015625,0.0,0.03125,0.03125,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.03125,0.03125,0.0,0.0,0.0


In [145]:
boston_grouped.shape

(31, 41)
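
The selection criteria from the introduction (few restaurants overall, no Korean restaurant nearby) can also be read directly from a frequency table like `boston_grouped`. The following is a minimal sketch on invented numbers, not the notebook's data; the column names `Restaurant Share` and `Korean Share` are hypothetical labels introduced for illustration.

```python
import pandas as pd

# Toy frequency table in the same layout as boston_grouped; values are invented.
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Italian Restaurant": [0.18, 0.02, 0.00],
    "Korean Restaurant": [0.00, 0.01, 0.00],
    "Chinese Restaurant": [0.00, 0.12, 0.03],
})

restaurant_cols = [c for c in grouped.columns if "Restaurant" in c]

summary = pd.DataFrame({
    "Neighborhood": grouped["Neighborhood"],
    # Share of all returned venues that are restaurants of any kind.
    "Restaurant Share": grouped[restaurant_cols].sum(axis=1),
    "Korean Share": grouped["Korean Restaurant"],
})

# Keep neighborhoods with no Korean restaurant, least crowded first.
candidates = (summary[summary["Korean Share"] == 0]
              .sort_values("Restaurant Share")
              .reset_index(drop=True))
print(candidates)
```

Because each row is a mean over all venues returned for a zip code, the summed value is a share of venues rather than an absolute count, so it should be read alongside the per-neighborhood venue counts from step 4.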

7. Define a function that sorts the restaurant categories of a neighborhood in descending order of frequency

In [146]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
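
To make the behavior of this helper concrete, the sketch below repeats the function and applies it to a single toy row laid out like one row of `boston_grouped` (the frequencies are invented).

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood label) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Toy row in the same layout as one row of boston_grouped; values are invented.
row = pd.Series({
    "Neighborhood": "02111",
    "Chinese Restaurant": 0.12,
    "Asian Restaurant": 0.07,
    "Korean Restaurant": 0.01,
})

top_two = return_most_common_venues(row, 2)
print(list(top_two))
```

Categories with equal frequency, including 0.0, have no guaranteed order among themselves, so the lower ranks for a sparse neighborhood may list categories that do not occur there at all.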

8. create the new dataframe and display the top 10 venues for each neighborhood.

In [147]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = boston_grouped['Neighborhood']

for ind in np.arange(boston_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(boston_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2101,American Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant,Asian Restaurant,Belgian Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,Restaurant,Mediterranean Restaurant
1,2108,Chinese Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Restaurant,New American Restaurant,Falafel Restaurant,French Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
2,2109,Italian Restaurant,Seafood Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Mexican Restaurant,Japanese Restaurant,Belgian Restaurant,Caribbean Restaurant
3,2110,Italian Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Seafood Restaurant,Restaurant,Ethiopian Restaurant,Hotpot Restaurant,French Restaurant
4,2111,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,American Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Vietnamese Restaurant,Middle Eastern Restaurant,French Restaurant,Italian Restaurant


In [148]:
neighborhoods_venues_sorted.shape

(31, 11)

9. Use k-means to cluster the neighborhoods according to the frequency of each restaurant category. K-means is one of the most common clustering methods in unsupervised learning.

In [149]:
# set number of clusters
kclusters = 5

boston_grouped_clustering = boston_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(boston_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 2, 1, 4, 2, 1, 1, 1, 1, 1, 3, 1, 0, 4, 1, 1, 4, 1, 2, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1])
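
The cell above fixes `kclusters = 5` without stating how that value was chosen. One common sanity check, which the notebook itself does not perform, is to compare the k-means inertia for several values of k and look for the point where further clusters stop helping (the "elbow" heuristic). The sketch below illustrates that check on synthetic data that stands in for the frequency table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency table: three loose groups of rows.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = np.vstack([c + 0.05 * rng.standard_normal((10, 2)) for c in centers])

# Inertia is the within-cluster sum of squared distances; lower is tighter.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, inertia in zip(range(1, 7), inertias):
    print(k, round(inertia, 4))
```

On data like this the inertia drops sharply up to the true number of groups and only slowly afterwards. Applying the same loop to `boston_grouped_clustering` would be one way to check whether five clusters is a reasonable choice for 31 zip codes.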

10. Create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood

In [150]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

boston_merged = Boston

# join the sorted venue table onto the Boston zip code table to add latitude/longitude for each neighborhood
boston_merged = boston_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

boston_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2101,Boston,MA,42.3566,-71.0565,-5,1,Central Boston,1,American Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant,Asian Restaurant,Belgian Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,Restaurant,Mediterranean Restaurant
1,2108,Boston,MA,42.3549,-71.06408,-5,1,Back Bay/Beacon Hill,1,Chinese Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Restaurant,New American Restaurant,Falafel Restaurant,French Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
2,2109,Boston,MA,42.361477,-71.05417,-5,1,Central Boston,2,Italian Restaurant,Seafood Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Mexican Restaurant,Japanese Restaurant,Belgian Restaurant,Caribbean Restaurant
3,2110,Boston,MA,42.356532,-71.05365,-5,1,Central Boston,1,Italian Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Seafood Restaurant,Restaurant,Ethiopian Restaurant,Hotpot Restaurant,French Restaurant
4,2111,Boston,MA,42.349838,-71.06101,-5,1,Central Boston,4,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,American Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Vietnamese Restaurant,Middle Eastern Restaurant,French Restaurant,Italian Restaurant


11. Finally, let's visualize the resulting clusters

In [163]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(boston_merged['Latitude'], boston_merged['Longitude'], boston_merged['Neighborhood'], boston_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
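
In the color-scheme lines of the cell above, `ys` is used only for its length, which equals `kclusters`, and `rainbow[cluster-1]` maps label 0 to the last color in the list. As a hedged alternative, the sketch below builds the same kind of palette directly and indexes it by the label itself; `color_for_label` is an illustrative name, not part of the notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced sample of the rainbow colormap per cluster.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Index by the label itself so cluster 0 gets the first color.
color_for_label = {label: rainbow[label] for label in range(kclusters)}
print(color_for_label)
```

Either indexing gives every cluster a distinct color; indexing by the label just makes it easier to match a marker color to a cluster number.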

## Results and Discussion <a name="results"></a>

With the clusters shown above, we now examine each cluster to determine the best location for opening a Korean restaurant.

### Cluster 1

In [152]:
boston_merged.loc[boston_merged['Cluster Labels'] == 0, boston_merged.columns[[0] + list(range(5, boston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,2121,-5,1,Roxbury,0,Fast Food Restaurant,Southern / Soul Food Restaurant,Caribbean Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant,Falafel Restaurant


Cluster 1 is in Roxbury, where the five most common restaurant categories are fast food, soul food, Caribbean, Vietnamese, and dim sum. However, this area is somewhat far from downtown, so we will not consider this cluster further.

### Cluster 2

In [156]:
boston_merged.loc[ boston_merged['Cluster Labels'] == 1,  boston_merged.columns[[0] + list(range(5,  boston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2101,-5,1,Central Boston,1,American Restaurant,Italian Restaurant,Greek Restaurant,Falafel Restaurant,Asian Restaurant,Belgian Restaurant,Vegetarian / Vegan Restaurant,Seafood Restaurant,Restaurant,Mediterranean Restaurant
1,2108,-5,1,Back Bay/Beacon Hill,1,Chinese Restaurant,American Restaurant,Sushi Restaurant,Seafood Restaurant,Restaurant,New American Restaurant,Falafel Restaurant,French Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
3,2110,-5,1,Central Boston,1,Italian Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Seafood Restaurant,Restaurant,Ethiopian Restaurant,Hotpot Restaurant,French Restaurant
6,2114,-5,1,Central Boston,1,American Restaurant,Italian Restaurant,Hotpot Restaurant,Mediterranean Restaurant,Mexican Restaurant,French Restaurant,Vietnamese Restaurant,Falafel Restaurant,Indian Restaurant,Greek Restaurant
7,2115,-5,1,Fenway/Kenmore,1,Japanese Restaurant,American Restaurant,Thai Restaurant,Sushi Restaurant,Restaurant,Middle Eastern Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant
8,2116,-5,1,Back Bay/Beacon Hill,1,Italian Restaurant,Seafood Restaurant,American Restaurant,French Restaurant,New American Restaurant,Mexican Restaurant,Thai Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Southern / Soul Food Restaurant
9,2117,-5,1,Back Bay/Beacon Hill,1,Thai Restaurant,Tapas Restaurant,Mediterranean Restaurant,Mexican Restaurant,Japanese Restaurant,Middle Eastern Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Cambodian Restaurant
10,2118,-5,1,South End,1,Thai Restaurant,Mediterranean Restaurant,Mexican Restaurant,Japanese Restaurant,Middle Eastern Restaurant,American Restaurant,Arepa Restaurant,Cambodian Restaurant,French Restaurant,Greek Restaurant
12,2120,-5,1,Roxbury,1,African Restaurant,Italian Restaurant,New American Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
15,2123,-5,1,Fenway/Kenmore,1,Fast Food Restaurant,Italian Restaurant,American Restaurant,Indian Restaurant,Falafel Restaurant,Vegetarian / Vegan Restaurant,Ethiopian Restaurant,Israeli Restaurant,Hotpot Restaurant,Greek Restaurant


Cluster 2 covers the main dining areas and contains many restaurants serving a wide variety of cuisines. We will not choose these neighborhoods because they are already crowded with different kinds of restaurants.

### Cluster 3

In [160]:
boston_merged.loc[ boston_merged['Cluster Labels'] == 2,  boston_merged.columns[[0] + list(range(5,  boston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,2109,-5,1,Central Boston,2,Italian Restaurant,Seafood Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Greek Restaurant,Mediterranean Restaurant,Mexican Restaurant,Japanese Restaurant,Belgian Restaurant,Caribbean Restaurant
5,2113,-5,1,Central Boston,2,Italian Restaurant,Seafood Restaurant,Mexican Restaurant,Belgian Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant
19,2128,-5,1,East Boston,2,Italian Restaurant,Latin American Restaurant,Brazilian Restaurant,Chinese Restaurant,Peruvian Restaurant,Mexican Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant


Cluster 3 contains only three zip codes and covers a smaller area that is close to downtown Boston. However, a closer look at the most common restaurants shows that Italian, seafood, and American restaurants are the top three categories in this area, so we will not choose to open a Korean restaurant here either.

### Cluster 4

In [161]:
boston_merged.loc[ boston_merged['Cluster Labels'] == 3,  boston_merged.columns[[0] + list(range(5,  boston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,2119,-5,1,Roxbury,3,American Restaurant,African Restaurant,Ethiopian Restaurant,Israeli Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant


Cluster 4 is a little far from the downtown area, so we will not consider it.

### Cluster 5

In [162]:
boston_merged.loc[ boston_merged['Cluster Labels'] == 4,  boston_merged.columns[[0] + list(range(5,  boston_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Timezone,Daylight savings time flag,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,2111,-5,1,Central Boston,4,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,American Restaurant,Dim Sum Restaurant,Hotpot Restaurant,Vietnamese Restaurant,Middle Eastern Restaurant,French Restaurant,Italian Restaurant
14,2122,-5,1,Dorchester,4,Vietnamese Restaurant,Chinese Restaurant,Ethiopian Restaurant,Israeli Restaurant,Indian Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant,Fast Food Restaurant,Falafel Restaurant
17,2125,-5,1,Dorchester,4,Vietnamese Restaurant,Indian Restaurant,Caribbean Restaurant,Chinese Restaurant,Ethiopian Restaurant,Israeli Restaurant,Hotpot Restaurant,Greek Restaurant,French Restaurant,Fast Food Restaurant


Cluster 5 is our choice. First, it is near the downtown area but not overcrowded with other restaurants. Second, the most common restaurants in this cluster serve Asian cuisine (Chinese and Vietnamese), which suggests that the local dining culture suits a Korean restaurant.

## Conclusion <a name="conclusion"></a>

From Central Boston’s financial hub to the large hospitals of the Longwood Medical Area to the major research universities in Fenway, Boston’s unique neighborhoods help the city as a whole serve a global market. Each neighborhood does its part to make the city of Boston what it is, while each maintains its own character. Boston’s neighborhoods are not static, however, and have changed with the prevailing trends of the city.

![](boston.jpg) 
This report aims to describe the restaurant landscape in the Boston area and to help stakeholders decide where to open a Korean restaurant. First, we import the zip codes, neighborhood names, and coordinates (latitude and longitude) from the *US Zip Code Latitude and Longitude* dataset: [https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/](https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/). We then clean the dataframe by removing the rows with NaN values. Second, we use geopy to obtain the coordinates of Boston and folium to draw a map of the neighborhoods. Third, we connect to the Foursquare API to get the venue information and keep only the restaurant categories for further analysis.

Finally, we use k-means to cluster similar neighborhoods based on the frequency of their restaurant categories. This shows the dining culture of each cluster, which lets us identify the most suitable area for our stakeholders to open a Korean restaurant.

