# IBM Data Science Capstone Project      
# Opening a Chinese Restaurant in Boston


## I. Introduction    
Boston is the capital and most populous city of the Commonwealth of Massachusetts in the United States, as well as the 21st most populous city in the United States. Boston is one of the oldest municipalities in the United States, founded on the Shawmut Peninsula in 1630 by Puritan settlers from England. Today, Boston is a thriving port city. The Boston area's many colleges and universities make it an international center of higher education, including law, medicine, engineering, and business, and the city is considered to be a world leader in innovation and entrepreneurship, with nearly 2,000 startups.
Along with its long history, food is a quintessential part of Boston. The city's cuisine resembles that of the rest of New England, with a strong emphasis on seafood and dairy products. Its best-known dishes are New England clam chowder, fish and chips (usually with cod or scrod), baked beans, lobsters, steamed clams, and fried clams.

The **Union Oyster House** is the oldest operating restaurant in the United States. Their menu includes oysters on the half-shell served straight from an oyster bar, New England clam chowder, and other seafood dishes. Quincy Market, part of Faneuil Hall Marketplace, has a variety of restaurants and food shops. Nearby Cheers is a popular tourist dining spot.         

Boston's **Chinatown** has a variety of Asian restaurants, bakeries, grocery stores, and medicinal herb and spice vendors. In addition to dim sum and other Chinese dining styles, there are Vietnamese, Japanese, Korean and Thai restaurants in the neighborhood.           

The **North End** has a variety of Italian restaurants, pizzerias, and bakeries and is well known as Boston's "Little Italy." A favorite tourist spot is Mike's Pastry on Hanover Street, which is extremely popular for its cannoli. Newbury Street has many ethnic street cafes, while Copley Place houses a multitude of restaurants, including Legal Sea Foods, a New England institution that offers gourmet seafood dishes.

The objective of this project is to identify which Boston neighborhood would be the best choice for opening a Chinese restaurant and to explain the rationale behind the recommendation.

## II. DATA ACQUISITION

This demonstration will make use of the following data sources:

**Boston Neighborhoods Data**       

Data will be retrieved from the Boston open data portal at https://data.boston.gov.

The Neighborhood boundaries data is a combination of zoning neighborhood boundaries, zip code boundaries and 2010 Census tract boundaries.  These boundaries are used in the broad sense for visualization purposes for zoning and planning studies.  

**Boston location data retrieved using the Google Maps API**

The coordinates of each neighborhood will be retrieved using the Google Maps Geocoding API. I use the coordinates of each neighborhood's subway station as its center, since the station is a more meaningful focal point for venue recommendations.

**Boston Top Venue Recommendations from Foursquare API**

(Foursquare website: www.foursquare.com)

I will be using the Foursquare API to explore neighborhoods in Boston. The Foursquare explore function will be used to get the most common venue categories in each neighborhood, and these categories will then be used to group the neighborhoods into clusters. The following information is retrieved by the first query:

- Venue ID
- Venue Name
- Coordinates: Latitude and Longitude
- Category Name        

## III. METHODOLOGY

**Load libraries**

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


## 1. Download and Explore Dataset

### Download the Boston Neighborhood data from https://data.boston.gov

In [5]:
!wget -q -O 'boston_data.csv' http://bostonopendata-boston.opendata.arcgis.com/datasets/3525b0ee6e6b427f9aab5d0a1d0a1a28_0.csv
print('Data downloaded!')

Data downloaded!


### Load and explore the data

In [8]:
df = pd.read_csv('boston_data.csv')
df

Unnamed: 0,OBJECTID,Name,Acres,Neighborhood_ID,SqMiles,ShapeSTArea,ShapeSTLength
0,27,Roslindale,1605.568237,15,2.51,69938270.0,53563.912597
1,28,Jamaica Plain,2519.245394,11,3.94,109737900.0,56349.937161
2,29,Mission Hill,350.853564,13,0.55,15283120.0,17918.724113
3,30,Longwood,188.611947,28,0.29,8215904.0,11908.757148
4,31,Bay Village,26.539839,33,0.04,1156071.0,4650.635493
5,32,Leather District,15.639908,27,0.02,681271.7,3237.140537
6,33,Chinatown,76.32441,26,0.12,3324678.0,9736.590413
7,34,North End,126.910439,14,0.2,5527506.0,16177.826815
8,35,Roxbury,2108.469072,16,3.29,91844550.0,49488.800485
9,36,South End,471.535356,32,0.74,20540000.0,17912.333569


In [52]:
df['Latitude'] = 0.0
df['Longitude'] = 0.0

# google_key must hold a valid Google Geocoding API key (defined elsewhere in the notebook)
for idx, town in df['Name'].items():
    # Use the subway station as the central location of each neighborhood
    address = town + " subway station, Boston"
    params = {'address': address, 'key': google_key}
    # One request per neighborhood; requests URL-encodes the parameters
    payload = requests.get('https://maps.googleapis.com/maps/api/geocode/json', params=params).json()
    coords = payload["results"][0]["geometry"]["location"]
    df.loc[idx, 'Latitude'] = coords['lat']
    df.loc[idx, 'Longitude'] = coords['lng']
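
The loop indexes `results[0]` directly, which raises an `IndexError` when the geocoder finds nothing. A hedged sketch of a more defensive parser follows; it relies on the `status` and `results` fields of the Geocoding API's JSON response, and the helper name and sample payloads are invented for illustration.

```python
def parse_geocode(payload):
    """Return (lat, lng) from a Geocoding API JSON payload, or None if there is no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Invented payloads shaped like the API response
ok = {"status": "OK",
      "results": [{"geometry": {"location": {"lat": 42.352392, "lng": -71.062573}}}]}
empty = {"status": "ZERO_RESULTS", "results": []}

print(parse_geocode(ok))
print(parse_geocode(empty))
```

Returning `None` lets the caller decide whether to skip the neighborhood or retry with a different address string.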

In [53]:
df

Unnamed: 0,OBJECTID,Name,Acres,Neighborhood_ID,SqMiles,ShapeSTArea,ShapeSTLength,Latitude,Longitude
0,27,Roslindale,1605.568237,15,2.51,69938270.0,53563.912597,42.30069,-71.113972
1,28,Jamaica Plain,2519.245394,11,3.94,109737900.0,56349.937161,42.317265,-71.10416
2,29,Mission Hill,350.853564,13,0.55,15283120.0,17918.724113,42.331341,-71.095499
3,30,Longwood,188.611947,28,0.29,8215904.0,11908.757148,42.336046,-71.099727
4,31,Bay Village,26.539839,33,0.04,1156071.0,4650.635493,42.34735,-71.075727
5,32,Leather District,15.639908,27,0.02,681271.7,3237.140537,42.351922,-71.05507
6,33,Chinatown,76.32441,26,0.12,3324678.0,9736.590413,42.352392,-71.062573
7,34,North End,126.910439,14,0.2,5527506.0,16177.826815,42.366352,-71.06215
8,35,Roxbury,2108.469072,16,3.29,91844550.0,49488.800485,42.331341,-71.095499
9,36,South End,471.535356,32,0.74,20540000.0,17912.333569,42.34735,-71.075727
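
Geocoding by station name can resolve two neighborhoods to the same point: in the table above, Mission Hill and Roxbury share one coordinate pair, as do Bay Village and South End. A minimal sketch of a check for this, using plain Python on a few rows copied from the table (the helper name `find_shared_coordinates` is hypothetical):

```python
from collections import defaultdict

def find_shared_coordinates(rows):
    """Group neighborhood names by (lat, lng) and keep groups with more than one name."""
    by_point = defaultdict(list)
    for name, lat, lng in rows:
        by_point[(lat, lng)].append(name)
    return {point: names for point, names in by_point.items() if len(names) > 1}

# Rows copied from the geocoded table above
rows = [
    ("Mission Hill", 42.331341, -71.095499),
    ("Longwood", 42.336046, -71.099727),
    ("Bay Village", 42.34735, -71.075727),
    ("Roxbury", 42.331341, -71.095499),
    ("South End", 42.34735, -71.075727),
]

shared = find_shared_coordinates(rows)
for point, names in shared.items():
    print(point, names)
```

Neighborhoods that share a point will tend to receive the same venue list from a radius search, so they are worth geocoding again with a more specific address.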


### Generate Boston basemap

In [54]:
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Boston'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))

# create map of Boston using latitude and longitude values
map_boston = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, town in zip(
    df['Latitude'],
    df['Longitude'],
    df['Name']):
    label = town
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5).add_to(map_boston)
map_boston

The geographical coordinates of Boston are 42.3602534, -71.0582912.


## Segmenting and Clustering Towns in Boston

### Retrieving FourSquare Places of interest

Using the Foursquare API, the explore function was used to get the most common venue categories in each neighborhood, and these categories were then used to group the neighborhoods into clusters. The k-means clustering algorithm was used for the analysis. Finally, the Folium library was used to visualize the recommended neighborhoods and their emerging clusters.

In the ipynb notebook, the function **getNearbyVenues** extracts the following information for the dataframe it generates:

- Venue ID
- Venue Name
- Coordinates : Latitude and Longitude
- Category Name    

The function **getVenuesByCategory** performs the following:

1. **category** based venue search to simulate user venue searches based on certain places of interest. This search extracts the following information:    
 - Venue ID
 - Venue Name
 - Coordinates : Latitude and Longitude
 - Category Name   
 
 
2. For each retrieved **venueID**, retrieve the venue's category rating.
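
The notebook's `getNearbyVenues` is not reproduced here. As a rough sketch of the extraction step it describes, the hypothetical helper below pulls the four listed fields out of one item shaped like the `explore` response parsed later in this section (`item['venue']` with `id`, `name`, `location`, and `categories` keys). The sample item is invented for illustration.

```python
def extract_venue_fields(item):
    """Return (id, name, lat, lng, category) from one explore-style item."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Some venues may carry no category; fall back to None instead of raising IndexError
    category = categories[0]["name"] if categories else None
    return (
        venue.get("id"),
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )

# Invented sample item, for illustration only
sample_item = {
    "venue": {
        "id": "example-id",
        "name": "Example Cafe",
        "location": {"lat": 42.3, "lng": -71.1},
        "categories": [{"name": "Coffee Shop"}],
    }
}
print(extract_venue_fields(sample_item))
```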

In [55]:
# define Foursquare credentials and version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


### 1. Exploring neighborhoods in Boston

**Using the following Foursquare API query URL, explore venues around every Boston neighborhood.**  
> https://<i></i>api.foursquare.com/v2/venues/**explore**?client_id=**CLIENT_ID**&client_secret=**CLIENT_SECRET**&v=**VERSION**&ll=**LATITUDE**,**LONGITUDE**&radius=**RADIUS**&limit=**LIMIT**
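
The request loop in this section calls the `explore` endpoint. One way to assemble its query string without manual formatting is `urllib.parse.urlencode`, which also escapes special characters. A small sketch, with a hypothetical helper name and placeholder credentials:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a Foursquare v2 explore URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates are Chinatown's from the table above
url = build_explore_url("ID", "SECRET", "20180605", 42.352392, -71.062573, 500, 100)
print(url)
```

Passing the same dictionary to `requests.get(base_url, params=params)` would achieve the same encoding.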

In [58]:
radius = 500
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Name']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [61]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'BoroughLatitude', 'BoroughLongitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1227, 7)


Unnamed: 0,Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Roslindale,42.30069,-71.113972,Brassica Kitchen & Cafe,42.300266,-71.11316,New American Restaurant
1,Roslindale,42.30069,-71.113972,Mike's Donuts,42.300735,-71.114029,Donut Shop
2,Roslindale,42.30069,-71.113972,The Dogwood,42.300279,-71.113281,American Restaurant
3,Roslindale,42.30069,-71.113972,Forest Hills Diner,42.30073,-71.112889,Breakfast Spot
4,Roslindale,42.30069,-71.113972,Simpli Bar & Bites,42.297241,-71.1166,Bar


### 2. Check venue count per neighborhood

In [62]:
venues_df.groupby('Neighborhood').count()

Neighborhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Allston,14,14,14,14,14,14
Back Bay,100,100,100,100,100,100
Bay Village,100,100,100,100,100,100
Beacon Hill,100,100,100,100,100,100
Brighton,14,14,14,14,14,14
Charlestown,21,21,21,21,21,21
Chinatown,100,100,100,100,100,100
Dorchester,4,4,4,4,4,4
Downtown,100,100,100,100,100,100
East Boston,6,6,6,6,6,6


In [64]:
# Verify the dtypes 
venues_df.dtypes

Neighborhood         object
BoroughLatitude     float64
BoroughLongitude    float64
VenueName            object
VenueLatitude       float64
VenueLongitude      float64
VenueCategory        object
dtype: object

**How many unique categories can be curated from all the returned venues?**

In [66]:
# Count number of categories that can be curated.
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 161 unique categories.


**What are the top 20 most common venue types?**

In [67]:
# Check top 20 most frequently occurring venue types
venues_df.groupby('VenueCategory')['VenueName'].count().sort_values(ascending=False)[:20]

VenueCategory
Coffee Shop                54
American Restaurant        48
Sandwich Place             38
Chinese Restaurant         35
Asian Restaurant           32
Gym                        31
Pizza Place                30
Hotel                      30
Bakery                     27
Italian Restaurant         26
Gym / Fitness Center       26
Café                       24
Bar                        24
Seafood Restaurant         23
Mexican Restaurant         23
New American Restaurant    22
Park                       22
Donut Shop                 21
Steakhouse                 16
Salad Place                15
Name: VenueName, dtype: int64

### 3. Analyze recommended venues near each Boston neighborhood

In [80]:
# one hot encoding
bos_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add Town column back to dataframe
bos_onehot['Neighborhood'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bos_onehot.columns[-1]] + list(bos_onehot.columns[:-1])
bos_onehot = bos_onehot[fixed_columns]

# Check returned one hot encoding data:
print('One hot encoding returned "{}" rows.'.format(bos_onehot.shape[0]))

# Regroup rows by town and mean of frequency occurrence per category.
bos_grouped = bos_onehot.groupby('Neighborhood').mean().reset_index()

print('One hot encoding re-group returned "{}" rows.'.format(bos_grouped.shape[0]))
bos_grouped.head()

One hot encoding returned "1227" rows.
One hot encoding re-group returned "26" rows.


Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Big Box Store,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Colombian Restaurant,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gastropub,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hockey Arena,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Insurance Office,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Museum,Nail Salon,New American Restaurant,Noodle House,Office,Opera House,Outdoor Sculpture,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Planetarium,Playground,Plaza,Pub,Rental Car Location,Restaurant,River,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Southern / Soul Food Restaurant,Souvenir Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Tour Provider,Tourist Information Center,Track,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,Allston,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Back Bay,0.02,0.0,0.06,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.05,0.03,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0
2,Bay Village,0.02,0.0,0.06,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.05,0.03,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0
3,Beacon Hill,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.02,0.0,0.02,0.06,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.02,0.03,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.05,0.01,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0
4,Brighton,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
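
Averaging one-hot columns within a neighborhood gives the share of that neighborhood's venues in each category. Allston's 0.071429 entries, for example, are 1/14 and its 0.142857 entry is 2/14, matching its 14 venues. The same computation can be sketched with `collections.Counter` on a toy list (the venue records and the helper name below are invented):

```python
from collections import Counter, defaultdict

def category_shares(venues):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares

# Invented (neighborhood, category) pairs for illustration
venues = [
    ("A", "Bar"), ("A", "Bar"), ("A", "Pharmacy"), ("A", "Coffee Shop"),
    ("B", "Park"), ("B", "Metro Station"),
]
shares = category_shares(venues)
print(shares["A"])  # Bar accounts for 2 of A's 4 venues
```

Because each row sums to 1, neighborhoods with very few venues (such as Dorchester with 4) end up with coarse, high-valued shares, which is worth keeping in mind when comparing rows.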


### 4. Analyze the most common venues in each Boston neighborhood

In [81]:
num_top_venues = 10
for town in bos_grouped['Neighborhood']:
    print("# Town=< "+town+" >")
    temp = bos_grouped[bos_grouped['Neighborhood'] == town].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

# Town=< Allston >
                  venue  freq
0                   Bar  0.14
1              Pharmacy  0.07
2               Dog Run  0.07
3           Coffee Shop  0.07
4  Gym / Fitness Center  0.07
5             Gastropub  0.07
6           Pizza Place  0.07
7                  Food  0.07
8          Liquor Store  0.07
9    Athletics & Sports  0.07


# Town=< Back Bay >
                     venue  freq
0      American Restaurant  0.06
1                      Gym  0.05
2                    Hotel  0.05
3       Seafood Restaurant  0.04
4     Gym / Fitness Center  0.03
5         Department Store  0.03
6           Ice Cream Shop  0.02
7             Dessert Shop  0.02
8  New American Restaurant  0.02
9              Coffee Shop  0.02


# Town=< Bay Village >
                     venue  freq
0      American Restaurant  0.06
1                      Gym  0.05
2                    Hotel  0.05
3       Seafood Restaurant  0.04
4     Gym / Fitness Center  0.03
5         Department Store  0.03
6         

In [82]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [87]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
town_venues_sorted = pd.DataFrame(columns=columns)
town_venues_sorted['Neighborhood'] = bos_grouped['Neighborhood']

for ind in np.arange(bos_grouped.shape[0]):
    town_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bos_grouped.iloc[ind, :], num_top_venues)

print(town_venues_sorted.shape)
town_venues_sorted

(26, 11)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allston,Bar,Gym / Fitness Center,Chinese Restaurant,Gastropub,Dog Run,Liquor Store,Coffee Shop,Food,Plaza,Athletics & Sports
1,Back Bay,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
2,Bay Village,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
3,Beacon Hill,Coffee Shop,Sandwich Place,American Restaurant,New American Restaurant,Italian Restaurant,Seafood Restaurant,Hotel,Falafel Restaurant,Historic Site,Steakhouse
4,Brighton,Bar,Gym / Fitness Center,Chinese Restaurant,Gastropub,Dog Run,Liquor Store,Coffee Shop,Food,Plaza,Athletics & Sports
5,Charlestown,American Restaurant,Coffee Shop,Skate Park,Convenience Store,Gastropub,Pet Store,Park,Chinese Restaurant,Donut Shop,Shopping Mall
6,Chinatown,Chinese Restaurant,Asian Restaurant,Bakery,Coffee Shop,Theater,Performing Arts Venue,Pizza Place,Sandwich Place,Seafood Restaurant,Sushi Restaurant
7,Dorchester,Park,Metro Station,Chinese Restaurant,Liquor Store,Dog Run,Food,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Donut Shop
8,Downtown,Coffee Shop,Sandwich Place,American Restaurant,New American Restaurant,Falafel Restaurant,Restaurant,Hotel,Historic Site,Gym / Fitness Center,Steakhouse
9,East Boston,Park,River,Metro Station,Business Service,Colombian Restaurant,Ski Area,Dive Bar,Farmers Market,Falafel Restaurant,Donut Shop


### 5. Clustering Neighborhoods  

Run k-means to cluster the Neighborhoods into 5 clusters.
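
scikit-learn's `KMeans` does considerably more than the bare algorithm (smarter initialization and multiple restarts, for instance). As a bare-bones illustration of the underlying assign-then-update loop (Lloyd's algorithm), here is a sketch on invented 2-D points with hand-picked starting centroids; the function name is hypothetical and this is not the notebook's implementation.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid by squared Euclidean distance
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: mean of the points assigned to each centroid
        new_centroids = []
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Invented points forming two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
labels, centroids = lloyd_kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

In the notebook, each "point" is a neighborhood's vector of category shares rather than a 2-D coordinate.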

In [93]:
# set number of clusters
kclusters = 5
bos_grouped_clustering = bos_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(bos_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:10])
print(len(kmeans.labels_))

[1 1 1 1 1 1 1 4 1 2]
26


In [94]:
town_venues_sorted.head()

Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Allston,Bar,Gym / Fitness Center,Chinese Restaurant,Gastropub,Dog Run,Liquor Store,Coffee Shop,Food,Plaza,Athletics & Sports
Back Bay,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
Bay Village,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
Beacon Hill,Coffee Shop,Sandwich Place,American Restaurant,New American Restaurant,Italian Restaurant,Seafood Restaurant,Hotel,Falafel Restaurant,Historic Site,Steakhouse
Brighton,Bar,Gym / Fitness Center,Chinese Restaurant,Gastropub,Dog Run,Liquor Store,Coffee Shop,Food,Plaza,Athletics & Sports


In [97]:
bos_merged = df.set_index("Name")
# add clustering labels
bos_merged['Cluster Labels'] = kmeans.labels_
bos_merged = bos_merged.join(town_venues_sorted)
bos_merged

Name,OBJECTID,Acres,Neighborhood_ID,SqMiles,ShapeSTArea,ShapeSTLength,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Roslindale,27,1605.568237,15,2.51,69938270.0,53563.912597,42.30069,-71.113972,1,Bar,Pub,Breakfast Spot,Grocery Store,Liquor Store,Donut Shop,New American Restaurant,Pet Store,Pizza Place,Rental Car Location
Jamaica Plain,28,2519.245394,11,3.94,109737900.0,56349.937161,42.317265,-71.10416,1,Brewery,Coffee Shop,Gym,American Restaurant,Tennis Court,Shopping Mall,Chinese Restaurant,Farmers Market,Liquor Store,Art Gallery
Mission Hill,29,350.853564,13,0.55,15283120.0,17918.724113,42.331341,-71.095499,1,Pizza Place,Donut Shop,Furniture / Home Store,Track,New American Restaurant,Liquor Store,Light Rail Station,Gym,Burger Joint,Italian Restaurant
Longwood,30,188.611947,28,0.29,8215904.0,11908.757148,42.336046,-71.099727,1,Donut Shop,Italian Restaurant,Sandwich Place,Pizza Place,Pub,Gym,Sushi Restaurant,Liquor Store,Bookstore,Café
Bay Village,31,26.539839,33,0.04,1156071.0,4650.635493,42.34735,-71.075727,1,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
Leather District,32,15.639908,27,0.02,681271.7,3237.140537,42.351922,-71.05507,1,Coffee Shop,Sandwich Place,Chinese Restaurant,Asian Restaurant,Bakery,Food Truck,American Restaurant,Café,Park,Dive Bar
Chinatown,33,76.32441,26,0.12,3324678.0,9736.590413,42.352392,-71.062573,1,Chinese Restaurant,Asian Restaurant,Bakery,Coffee Shop,Theater,Performing Arts Venue,Pizza Place,Sandwich Place,Seafood Restaurant,Sushi Restaurant
North End,34,126.910439,14,0.2,5527506.0,16177.826815,42.366352,-71.06215,4,Pizza Place,Hotel,Donut Shop,Italian Restaurant,Sandwich Place,Bar,Brewery,Mexican Restaurant,Sports Bar,Coffee Shop
Roxbury,35,2108.469072,16,3.29,91844550.0,49488.800485,42.331341,-71.095499,1,Pizza Place,Donut Shop,Furniture / Home Store,Track,New American Restaurant,Liquor Store,Light Rail Station,Gym,Burger Joint,Italian Restaurant
South End,36,471.535356,32,0.74,20540000.0,17912.333569,42.34735,-71.075727,2,American Restaurant,Gym,Hotel,Seafood Restaurant,Gym / Fitness Center,Department Store,Accessories Store,Juice Bar,Plaza,Playground
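
`kmeans.labels_` follows the row order of `bos_grouped` (alphabetical by neighborhood), while `df` keeps the order of the source file, so assigning the array positionally can attach a label to the wrong neighborhood. For instance, the label 4 printed at position 7 belongs to Dorchester in `bos_grouped`, yet it appears next to North End in the merged table. A minimal sketch of name-based alignment in plain Python (the helper name is hypothetical):

```python
def align_labels(grouped_names, labels, target_names):
    """Look up each target row's label by neighborhood name instead of by position."""
    label_by_name = dict(zip(grouped_names, labels))
    # Names absent from the clustering input get None rather than a borrowed label
    return [label_by_name.get(name) for name in target_names]

# First ten names and labels in bos_grouped order, as printed earlier in this section
grouped_names = ["Allston", "Back Bay", "Bay Village", "Beacon Hill", "Brighton",
                 "Charlestown", "Chinatown", "Dorchester", "Downtown", "East Boston"]
labels = [1, 1, 1, 1, 1, 1, 1, 4, 1, 2]

# A few names in the order of the neighborhood table
target_names = ["Bay Village", "Chinatown", "North End"]
print(align_labels(grouped_names, labels, target_names))
```

In pandas, a comparable effect could be had by building `pd.Series(kmeans.labels_, index=bos_grouped['Neighborhood'])` and assigning that Series to the `Cluster Labels` column, since Series assignment aligns on the index.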


In [99]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bos_merged['Latitude'], bos_merged['Longitude'], bos_merged.index.values,kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters

### 6. Examine Clusters

In [100]:
#Cluster 0
bos_merged.loc[bos_merged['Cluster Labels'] == 0]

Name,OBJECTID,Acres,Neighborhood_ID,SqMiles,ShapeSTArea,ShapeSTLength,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Downtown,42,397.472846,7,0.62,17313850.0,34612.804441,42.355453,-71.060453,0,Coffee Shop,Sandwich Place,American Restaurant,New American Restaurant,Falafel Restaurant,Restaurant,Hotel,Historic Site,Gym / Fitness Center,Steakhouse
Brighton,44,1840.408596,25,2.88,80167880.0,48787.519652,42.348688,-71.138024,0,Bar,Gym / Fitness Center,Chinese Restaurant,Gastropub,Dog Run,Liquor Store,Coffee Shop,Food,Plaza,Athletics & Sports
Mattapan,47,1352.098354,12,2.11,58897170.0,42005.773707,42.267784,-71.091829,0,Bakery,Southern / Soul Food Restaurant,Pharmacy,Fast Food Restaurant,Caribbean Restaurant,Mobile Phone Shop,Dog Run,Food,Farmers Market,Falafel Restaurant


In [101]:
#Cluster 1
bos_merged.loc[bos_merged['Cluster Labels'] == 1]

| Name | OBJECTID | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Roslindale | 27 | 1605.568237 | 15 | 2.51 | 69938270.0 | 53563.912597 | 42.30069 | -71.113972 | 1 | Bar | Pub | Breakfast Spot | Grocery Store | Liquor Store | Donut Shop | New American Restaurant | Pet Store | Pizza Place | Rental Car Location |
| Jamaica Plain | 28 | 2519.245394 | 11 | 3.94 | 109737900.0 | 56349.937161 | 42.317265 | -71.10416 | 1 | Brewery | Coffee Shop | Gym | American Restaurant | Tennis Court | Shopping Mall | Chinese Restaurant | Farmers Market | Liquor Store | Art Gallery |
| Mission Hill | 29 | 350.853564 | 13 | 0.55 | 15283120.0 | 17918.724113 | 42.331341 | -71.095499 | 1 | Pizza Place | Donut Shop | Furniture / Home Store | Track | New American Restaurant | Liquor Store | Light Rail Station | Gym | Burger Joint | Italian Restaurant |
| Longwood | 30 | 188.611947 | 28 | 0.29 | 8215904.0 | 11908.757148 | 42.336046 | -71.099727 | 1 | Donut Shop | Italian Restaurant | Sandwich Place | Pizza Place | Pub | Gym | Sushi Restaurant | Liquor Store | Bookstore | Café |
| Bay Village | 31 | 26.539839 | 33 | 0.04 | 1156071.0 | 4650.635493 | 42.34735 | -71.075727 | 1 | American Restaurant | Gym | Hotel | Seafood Restaurant | Gym / Fitness Center | Department Store | Accessories Store | Juice Bar | Plaza | Playground |
| Leather District | 32 | 15.639908 | 27 | 0.02 | 681271.7 | 3237.140537 | 42.351922 | -71.05507 | 1 | Coffee Shop | Sandwich Place | Chinese Restaurant | Asian Restaurant | Bakery | Food Truck | American Restaurant | Café | Park | Dive Bar |
| Chinatown | 33 | 76.32441 | 26 | 0.12 | 3324678.0 | 9736.590413 | 42.352392 | -71.062573 | 1 | Chinese Restaurant | Asian Restaurant | Bakery | Coffee Shop | Theater | Performing Arts Venue | Pizza Place | Sandwich Place | Seafood Restaurant | Sushi Restaurant |
| Roxbury | 35 | 2108.469072 | 16 | 3.29 | 91844550.0 | 49488.800485 | 42.331341 | -71.095499 | 1 | Pizza Place | Donut Shop | Furniture / Home Store | Track | New American Restaurant | Liquor Store | Light Rail Station | Gym | Burger Joint | Italian Restaurant |
| Back Bay | 37 | 399.314411 | 2 | 0.62 | 17394070.0 | 19455.671146 | 42.34735 | -71.075727 | 1 | American Restaurant | Gym | Hotel | Seafood Restaurant | Gym / Fitness Center | Department Store | Accessories Store | Juice Bar | Plaza | Playground |
| Charlestown | 39 | 871.541223 | 4 | 1.36 | 37964180.0 | 57509.688645 | 42.373678 | -71.069654 | 1 | American Restaurant | Coffee Shop | Skate Park | Convenience Store | Gastropub | Pet Store | Park | Chinese Restaurant | Donut Shop | Shopping Mall |


In [102]:
#Cluster 2
bos_merged.loc[bos_merged['Cluster Labels'] == 2]

| Name | OBJECTID | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| South End | 36 | 471.535356 | 32 | 0.74 | 20540000.0 | 17912.333569 | 42.34735 | -71.075727 | 2 | American Restaurant | Gym | Hotel | Seafood Restaurant | Gym / Fitness Center | Department Store | Accessories Store | Juice Bar | Plaza | Playground |
| East Boston | 38 | 3012.059593 | 8 | 4.71 | 131384500.0 | 121089.100852 | 42.390501 | -70.997123 | 2 | Park | River | Metro Station | Business Service | Colombian Restaurant | Ski Area | Dive Bar | Farmers Market | Falafel Restaurant | Donut Shop |


In [103]:
#Cluster 3
bos_merged.loc[bos_merged['Cluster Labels'] == 3]

| Name | OBJECTID | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fenway | 43 | 560.618461 | 34 | 0.88 | 24420440.0 | 24620.876452 | 42.345399 | -71.104332 | 3 | American Restaurant | Furniture / Home Store | Mexican Restaurant | Café | Chinese Restaurant | Bakery | Greek Restaurant | Thai Restaurant | Cycle Studio | Movie Theater |


In [104]:
#Cluster 4
bos_merged.loc[bos_merged['Cluster Labels'] == 4]

| Name | OBJECTID | Acres | Neighborhood_ID | SqMiles | ShapeSTArea | ShapeSTLength | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| North End | 34 | 126.910439 | 14 | 0.2 | 5527506.0 | 16177.826815 | 42.366352 | -71.06215 | 4 | Pizza Place | Hotel | Donut Shop | Italian Restaurant | Sandwich Place | Bar | Brewery | Mexican Restaurant | Sports Bar | Coffee Shop |
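
The five cluster cells above repeat the same filter with a different label each time. As a rough sketch, the same overview could be produced in one pass by grouping on the label column. The `sample` frame below is only a stand-in for `bos_merged`, built from a few rows copied from the tables above, and `summarize_clusters` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

# Stand-in for bos_merged: a few rows copied from the cluster tables above.
sample = pd.DataFrame(
    {
        "Cluster Labels": [0, 0, 1, 1, 3, 4],
        "1st Most Common Venue": [
            "Coffee Shop",
            "Bar",
            "Chinese Restaurant",
            "Coffee Shop",
            "American Restaurant",
            "Pizza Place",
        ],
    },
    index=pd.Index(
        ["Downtown", "Brighton", "Chinatown", "Leather District", "Fenway", "North End"],
        name="Name",
    ),
)


def summarize_clusters(df, label_col="Cluster Labels"):
    """Hypothetical helper: one row per cluster with its size and member names."""
    rows = []
    for label, group in df.groupby(label_col):
        rows.append(
            {
                "Cluster": label,
                "Neighborhoods": len(group),
                "Members": ", ".join(group.index),
            }
        )
    return pd.DataFrame(rows).set_index("Cluster")


cluster_summary = summarize_clusters(sample)
print(cluster_summary)
```

Assuming `bos_merged` is indexed by neighborhood name as in the tables above, calling `summarize_clusters(bos_merged)` should give the same kind of overview for the full data.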


## Discussion

**This notebook presented an analysis of neighborhood recommendations based on food venue categories. In the cluster tables above, Chinese restaurants appear among the ten most common venues in Clusters 0, 1 and 3. In Chinatown, Chinese Restaurant is the most common venue, which is to be expected. In Dorchester, South Boston Waterfront, South Boston and Allston, Chinese Restaurant is the third most common venue. Therefore, apart from Chinatown, which is the obvious choice for opening a Chinese restaurant, Dorchester, South Boston Waterfront, South Boston and Allston could also be reasonable options.**
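
The ranking argument above can be checked programmatically rather than by reading the tables. The sketch below is one possible way to do it: it scans the "Most Common Venue" columns of each row and reports the position of a given category. It assumes those columns are ordered from 1st to 10th, as in the tables above. The `sample` frame holds only the top three venues of four neighborhoods copied from those tables, and `venue_rank` is a hypothetical helper, not a function from the original notebook.

```python
import pandas as pd

# Stand-in for bos_merged: top three venues of four neighborhoods,
# copied from the cluster tables above.
sample = pd.DataFrame(
    {
        "1st Most Common Venue": ["Chinese Restaurant", "Coffee Shop", "Bar", "Pizza Place"],
        "2nd Most Common Venue": ["Asian Restaurant", "Sandwich Place", "Gym / Fitness Center", "Hotel"],
        "3rd Most Common Venue": ["Bakery", "Chinese Restaurant", "Chinese Restaurant", "Donut Shop"],
    },
    index=pd.Index(["Chinatown", "Leather District", "Brighton", "North End"], name="Name"),
)


def venue_rank(df, category="Chinese Restaurant"):
    """Hypothetical helper: 1-based rank of `category` per neighborhood.

    Neighborhoods where the category is not among the listed venues are omitted.
    Assumes the venue columns appear in rank order (1st, 2nd, 3rd, ...).
    """
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    ranks = {}
    for name, row in df[venue_cols].iterrows():
        for position, venue in enumerate(row, start=1):
            if venue == category:
                ranks[name] = position
                break
    return pd.Series(ranks, name=f"{category} rank").sort_values()


chinese_ranks = venue_rank(sample)
print(chinese_ranks)
```

On the full `bos_merged` frame, filtering the result with something like `chinese_ranks[chinese_ranks <= 3]` would list every neighborhood where Chinese Restaurant is among the three most common venues.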