# Capstone Project - The Battle of Neighborhoods

## ToC

* [Introduction](#intro)
* [Data](#data)
* [Methodology and Analysis](#analysis)
* [Results and Discussion](#result)
* [Conclusion](#conclusion)

## Introduction <span id="intro"></span>
This report is the capstone project for the Coursera course [Applied Data Science Capstone](https://www.coursera.org/learn/applied-data-science-capstone).

### Business Problem
Suppose we want to open a successful Chinese restaurant in a U.S. city.

The first step is to choose the city. Because the restaurant is Chinese, a multicultural metropolis is a promising choice, so we select San Francisco.
![SF Map](https://ljmoore.files.wordpress.com/2013/02/san-francisco-autofill-map1.jpg?w=800)

The second step is to choose a good location for the restaurant within the city. We can use Foursquare location data and machine learning to find the areas where Chinese restaurants are most popular.

### Who would be interested in this project?
Anyone interested in opening a successful restaurant in a U.S. city.


## Data  <span id="data"></span>

Our goal is to find the areas of San Francisco where Chinese restaurants are most popular.

The first step is to divide the city into areas. We do this using postal code (ZIP) information available online.

The second step is to collect venue data for every area with the Foursquare API. After cleaning the data, we cluster the areas into types and pick the cluster in which Chinese restaurants are most popular.


## Methodology and Analysis <span id="analysis"></span>

### A Basic View on San Francisco City

In [1]:
import pandas as pd
import numpy as np

In [17]:
# get the coordinates of the city
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

city = "San Francisco City"

geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Francisco City are 37.7790262, -122.4199061.
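
`geolocator.geocode` returns `None` when it cannot resolve a query, in which case reading `location.latitude` would raise an `AttributeError`. The sketch below adds a small guard around the lookup. The helper name `geocode_city` and the `StubGeocoder` class are hypothetical and exist only so the example runs without network access; with geopy, the `Nominatim` instance would be passed in instead.

```python
from collections import namedtuple

# Minimal stand-in for the object returned by a geocoder (hypothetical).
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeocoder:
    """Offline stand-in for a geocoder; it knows a single query."""

    def geocode(self, query):
        known = {"San Francisco City": Location(37.7790262, -122.4199061)}
        return known.get(query)  # None when the query is unknown


def geocode_city(geolocator, query):
    """Return (latitude, longitude) or raise if the query is not resolved."""
    location = geolocator.geocode(query)
    if location is None:
        raise ValueError("could not geocode {!r}".format(query))
    return location.latitude, location.longitude


latitude, longitude = geocode_city(StubGeocoder(), "San Francisco City")
print(latitude, longitude)
```

Failing early with a clear message is easier to debug than an `AttributeError` several lines later.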


In [2]:
# get the postal code and coordinate info
#   from united-states.postcode.info
from urllib.request import urlopen
import re

url = "http://united-states.postcode.info/california/san-francisco"
html = urlopen(url).read().decode("utf-8")

# example: California, <a href="/p/94102">94102</a> San Francisco, San Francisco, GPS coordinates: 37.7813,-122.4167
res = re.findall(r'>(\d+)</a>.* GPS coordinates: (.+),(.+)\r', html)
print("Coordinates: ", res[0])

Coordinates:  ('94101', '37.7848', '-122.7278')
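
The pattern above relies on each line ending in `\r` and on greedy `.+` groups, so it can break if the page layout changes slightly. As a sketch under the assumption that every entry looks like the example line in the cell, a stricter pattern can be tried offline on a sample string. The sample text below is hand-written from that example and the second sample row reuses values from the postal code table; it is not a download of the page.

```python
import re

# Sample lines modeled on the example in the scraping cell (assumed layout).
html = (
    'California, <a href="/p/94102">94102</a> San Francisco, San Francisco, '
    'GPS coordinates: 37.7813,-122.4167\r\n'
    'California, <a href="/p/94103">94103</a> San Francisco, San Francisco, '
    'GPS coordinates: 37.7725,-122.4147\r\n'
)

# Five-digit code inside the link, then signed decimal latitude and longitude.
pattern = re.compile(
    r'>(\d{5})</a>.*?GPS coordinates: (-?\d+\.\d+),(-?\d+\.\d+)'
)

# Convert the captured strings to numbers while building the rows.
rows = [(int(code), float(lat), float(lng))
        for code, lat, lng in pattern.findall(html)]
print(rows)
```

Because `.` does not match a newline by default, each match stays within one line even though the pattern no longer anchors on `\r`.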


In [10]:
# create the postal code and coordinate dataframe

hash_sf = {'PostalCode': [], 
           'Latitude': [], 
           'Longitude': []
          }

for row in res:
    hash_sf['PostalCode'].append(int(row[0]))
    hash_sf['Latitude'].append(float(row[1]))
    hash_sf['Longitude'].append(float(row[2]))


column_names = ['PostalCode', 'Latitude', 'Longitude']

# build the dataframe directly from the dict, fixing the column order
df_sf = pd.DataFrame(hash_sf, columns=column_names)
df_sf.head()

| | PostalCode | Latitude | Longitude |
| - | - | - | - |
| 0 | 94101 | 37.7848 | -122.7278 |
| 1 | 94102 | 37.7813 | -122.4167 |
| 2 | 94103 | 37.7725 | -122.4147 |
| 3 | 94104 | 37.7915 | -122.4018 |
| 4 | 94105 | 37.7864 | -122.3892 |


In [11]:
# remove abnormal postal codes
# several codes in the dataframe share the same coordinates: [37.7848, -122.7278]
# in addition, this location lies offshore, in the gulf
# so we remove these codes

df_sf = df_sf[df_sf['Longitude']!=-122.7278].reset_index(drop=True)
df_sf

| | PostalCode | Latitude | Longitude |
| - | - | - | - |
| 0 | 94102 | 37.7813 | -122.4167 |
| 1 | 94103 | 37.7725 | -122.4147 |
| 2 | 94104 | 37.7915 | -122.4018 |
| 3 | 94105 | 37.7864 | -122.3892 |
| 4 | 94107 | 37.7621 | -122.3971 |
| 5 | 94108 | 37.7929 | -122.4079 |
| 6 | 94109 | 37.7917 | -122.4186 |
| 7 | 94110 | 37.7509 | -122.4153 |
| 8 | 94111 | 37.7974 | -122.4001 |
| 9 | 94112 | 37.7195 | -122.4411 |
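
The filter above hard-codes the placeholder longitude. A more general variant, sketched here on toy rows, drops every postal code whose coordinate pair is shared with another code. The rows below are illustrative only: the first two reuse the placeholder coordinate mentioned in the comment, and the second postal code in them is an assumption.

```python
import pandas as pd

# Toy data: the first two rows share the placeholder coordinate.
df = pd.DataFrame({
    'PostalCode': [94101, 94106, 94102, 94103],
    'Latitude': [37.7848, 37.7848, 37.7813, 37.7725],
    'Longitude': [-122.7278, -122.7278, -122.4167, -122.4147],
})

# Mark every row whose (Latitude, Longitude) pair occurs more than once.
shared = df.duplicated(subset=['Latitude', 'Longitude'], keep=False)

# Keep only rows with a unique coordinate pair.
df_clean = df[~shared].reset_index(drop=True)
print(df_clean)
```

This avoids comparing against a specific float value, at the cost of also dropping two genuine areas if they ever happened to share a centroid.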


In [8]:
# render the map for areas marked by postal code
import folium

map_sf = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, pcode in zip(df_sf['Latitude'], df_sf['Longitude'], df_sf['PostalCode']):
    label = '{}'.format(pcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sf)
    
map_sf

In case the rendered map is not shown, please see this snapshot.  
![sf_postalcode](sf_postalcode.png)

### Request Data Using Foursquare API

In [9]:
# define Foursquare credentials (placeholders; do not publish real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [15]:
# define API request parameters and function
import requests # library to handle requests

LIMIT = 100
radius = 1000

def getNearbyVenues(codes, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for code, lat, lng in zip(codes, latitudes, longitudes):
        print(code)
            
        # create the API request URL and query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail early on an HTTP error
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            code, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'PostalCode Latitude', 
                  'PostalCode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
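
The list comprehension in `getNearbyVenues` assumes every venue has at least one category and that the response always contains a group. The sketch below shows a more defensive way to flatten one response, tested on a hand-written payload. The payload only imitates the fields the function reads; it is not real Foursquare data, and the helper name `extract_venues` is hypothetical.

```python
# Hand-written payload imitating the fields the function reads (not real data).
sample = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A",
                   "location": {"lat": 37.78, "lng": -122.41},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Venue B",
                   "location": {"lat": 37.79, "lng": -122.42},
                   "categories": []}},  # no category assigned
    ]}]}
}


def extract_venues(payload, code, lat, lng):
    """Flatten one explore-style payload into row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # skip venues without a category
        rows.append((code, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


rows = extract_venues(sample, 94102, 37.7813, -122.4167)
print(rows)
```

Skipping uncategorized venues is one possible choice; labeling them with a placeholder category would keep the row counts intact instead.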

In [17]:
# API request
sf_venues = getNearbyVenues(df_sf['PostalCode'],
                            df_sf['Latitude'], 
                            df_sf['Longitude'],
                            radius)
sf_venues.head()

94102
94103
94104
94105
94107
94108
94109
94110
94111
94112
94114
94115
94116
94117
94118
94121
94122
94123
94124
94127
94128
94129
94130
94131
94132
94133
94134
94143
94158
94199


| | PostalCode | PostalCode Latitude | PostalCode Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
| - | - | - | - | - | - | - | - |
| 0 | 94102 | 37.7813 | -122.4167 | Asian Art Museum | 37.780178 | -122.416505 | Art Museum |
| 1 | 94102 | 37.7813 | -122.4167 | Philz Coffee | 37.781433 | -122.417073 | Coffee Shop |
| 2 | 94102 | 37.7813 | -122.4167 | Ales Unlimited: Beer Basement | 37.782751 | -122.415656 | Beer Bar |
| 3 | 94102 | 37.7813 | -122.4167 | Saigon Sandwich | 37.783084 | -122.41765 | Sandwich Place |
| 4 | 94102 | 37.7813 | -122.4167 | Whitechapel | 37.78223 | -122.418884 | Cocktail Bar |
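
The request uses a 1000 m radius around each postal code centroid. To judge how much neighbouring search circles overlap, the distance between two centroids can be estimated with the haversine formula. This is a sketch on a spherical Earth; the helper name `haversine_m` is hypothetical, and the two points are the 94102 and 94103 centroids from the postal code table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Centroids of 94102 and 94103 as listed in the postal code table.
d = haversine_m(37.7813, -122.4167, 37.7725, -122.4147)
print(round(d), "m between centroids; search radius is 1000 m")
```

These two centroids are roughly 1 km apart, well under twice the search radius, so their circles overlap and the same venue can be returned for both areas.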


In [18]:
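# save the dataframe
# note: to_csv also writes the index by default, so it comes back as an extra "Unnamed: 0" column on reload
csv_file = "sf_venues.csv"
sf_venues.to_csv(csv_file)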

In [4]:
# restore the dataframe
csv_file = "sf_venues.csv"
sf_venues = pd.read_csv(csv_file)

print(sf_venues.shape)
sf_venues.head()

(2573, 8)


| | Unnamed: 0 | PostalCode | PostalCode Latitude | PostalCode Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
| - | - | - | - | - | - | - | - | - |
| 0 | 0 | 94102 | 37.7813 | -122.4167 | Asian Art Museum | 37.780178 | -122.416505 | Art Museum |
| 1 | 1 | 94102 | 37.7813 | -122.4167 | Philz Coffee | 37.781433 | -122.417073 | Coffee Shop |
| 2 | 2 | 94102 | 37.7813 | -122.4167 | Ales Unlimited: Beer Basement | 37.782751 | -122.415656 | Beer Bar |
| 3 | 3 | 94102 | 37.7813 | -122.4167 | Saigon Sandwich | 37.783084 | -122.41765 | Sandwich Place |
| 4 | 4 | 94102 | 37.7813 | -122.4167 | Whitechapel | 37.78223 | -122.418884 | Cocktail Bar |
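
The restored dataframe has eight columns because the saved index is read back as an `Unnamed: 0` column. A minimal sketch of the effect, using an in-memory buffer and made-up rows instead of `sf_venues.csv`:

```python
import io

import pandas as pd

df = pd.DataFrame({'PostalCode': [94102, 94103],
                   'Venue': ['Venue A', 'Venue B']})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False: only the real columns are written.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

The extra column is harmless in this report because only `PostalCode` and `Venue Category` are used afterwards, but passing `index=False` when saving keeps the round trip clean.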


### Data Analysis

In [5]:
# one hot encoding
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']], prefix="", prefix_sep="")

# add code column back to dataframe
sf_onehot['PostalCode'] = sf_venues['PostalCode'] 

# move code column to the first column
fixed_columns = [sf_onehot.columns[-1]] + list(sf_onehot.columns[:-1])
sf_onehot = sf_onehot[fixed_columns]

print(sf_onehot.shape)
sf_onehot.head()

(2573, 323)


Unnamed: 0,PostalCode,Acai House,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Alternative Healer,American Restaurant,...,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,94102,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,94102,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,94102,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,94102,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,94102,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [6]:
# group by postal code
sf_grouped = sf_onehot.groupby('PostalCode').mean().reset_index()

sf_grouped

Unnamed: 0,PostalCode,Acai House,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Alternative Healer,American Restaurant,...,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,94102,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
1,94103,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0
2,94104,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01
3,94105,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,...,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01
4,94107,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02
5,94108,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01
6,94109,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,...,0.0,0.04,0.0,0.0,0.0,0.03,0.01,0.01,0.0,0.02
7,94110,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
8,94111,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.02,0.0,0.0,0.01,0.04,0.01,0.0,0.0,0.0
9,94112,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,...,0.0,0.034091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
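
Averaging the one-hot rows per postal code gives the share of that area's venues in each category. Areas that hit the 100-venue limit therefore show multiples of 0.01, while 0.011364 corresponds to one venue out of 88. The toy example below uses made-up venues to illustrate the calculation and cross-checks it against `pd.crosstab`.

```python
import pandas as pd

# Toy venue list (made up): three venues in one area, one in another.
venues = pd.DataFrame({
    'PostalCode': [94102, 94102, 94102, 94103],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Beer Bar', 'Gym'],
})

# One 0/1 indicator column per category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="",
                        dtype=int)
onehot.insert(0, 'PostalCode', venues['PostalCode'])

# Mean of the 0/1 indicators = share of the area's venues in each category.
freq = onehot.groupby('PostalCode').mean()
print(freq)

# The same table computed directly, as a cross-check.
direct = pd.crosstab(venues['PostalCode'], venues['Venue Category'],
                     normalize='index')
print(direct)
```

Because each row of the grouped table sums to 1, areas with few venues are weighted the same as areas with many, which is worth keeping in mind when comparing them.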


### Cluster Postal Code Areas

In [7]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
k = 5

# drop the postal code so that only venue frequencies are clustered
sf_grouped_clustering = sf_grouped.drop(columns='PostalCode')

# run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(sf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 3, 0, 0, 3, 0, 3])
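
The number of clusters is fixed at 5 in this report. One common heuristic for checking such a choice is to compare silhouette scores over a range of cluster counts. The sketch below is not part of the original analysis: it runs on synthetic data from `make_blobs` as a stand-in for the venue-frequency matrix, so the scores say nothing about the San Francisco data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three well-separated groups.
X, _ = make_blobs(n_samples=90, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)

# Fit k-means for several cluster counts and record the silhouette score.
scores = {}
for n_clusters in range(2, 7):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(X)
    scores[n_clusters] = silhouette_score(X, labels)

# Pick the cluster count with the highest score.
best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

To apply the same loop to the report's data, `X` would be replaced by `sf_grouped_clustering`. With only 30 rows and several hundred sparse columns the scores may be noisy, so they are a guide, not a verdict.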

In [25]:
# add coordinate info
sf_merged = sf_grouped.join(df_sf.set_index('PostalCode'), on='PostalCode')

# add clustering labels
sf_merged.insert(0, 'ClusterLabel', kmeans.labels_)

print(sf_merged.shape)
sf_merged.head() # check the last columns!

(30, 326)


Unnamed: 0,ClusterLabel,PostalCode,Acai House,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,Alternative Healer,...,Vineyard,Wagashi Place,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Latitude,Longitude
0,0,94102,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,37.7813,-122.4167
1,0,94103,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,37.7725,-122.4147
2,0,94104,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,37.7915,-122.4018
3,0,94105,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,37.7864,-122.3892
4,3,94107,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.02,37.7621,-122.3971


In [19]:
# render the map for clusters marked by postal code
import folium
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, code, cluster in zip(sf_merged['Latitude'], sf_merged['Longitude'], sf_merged['PostalCode'], sf_merged['ClusterLabel']):
    label = folium.Popup('Code ' + str(code) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In case the rendered map is not shown, please see this snapshot.  
![sf_cluster](sf_cluster.png)

### Examine Area Top Venues

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
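
To see what the function returns, it can be applied to a small hand-made row. The frequencies below are made up and chosen to be distinct, because the order of tied values after `sort_values` is not guaranteed.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the postal code) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Toy row with distinct frequencies (made-up numbers).
row = pd.Series({'PostalCode': 94102, 'Beer Bar': 0.03,
                 'Coffee Shop': 0.10, 'Theater': 0.05, 'Gym': 0.01})

top3 = list(return_most_common_venues(row, 3))
print(top3)

# An equivalent one-liner for the same ranking.
top3_alt = row.iloc[1:].nlargest(3).index.tolist()
print(top3_alt)
```

In the real data many categories tie at 0.0 or 0.01, so the lower ranks of a top-10 list can be arbitrary among tied categories.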

In [24]:
# sort top 10 venues for each area
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
sf_venues_sorted = pd.DataFrame(columns=columns)
sf_venues_sorted['PostalCode'] = sf_grouped['PostalCode']

for ind in np.arange(sf_grouped.shape[0]):
    sf_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

sf_venues_sorted.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,94102,Coffee Shop,Theater,Vietnamese Restaurant,Beer Bar,Cocktail Bar,Music Venue,Sushi Restaurant,Sandwich Place,Performing Arts Venue,Speakeasy
1,94103,Coffee Shop,Cocktail Bar,Beer Bar,Wine Bar,New American Restaurant,Motorcycle Shop,Gym,Gay Bar,Art Gallery,Bar
2,94104,Coffee Shop,Men's Store,Boutique,Seafood Restaurant,Hotel,Restaurant,Park,Gym / Fitness Center,Gym,Shoe Store
3,94105,Coffee Shop,Café,Gym,Scenic Lookout,Park,Art Gallery,Seafood Restaurant,New American Restaurant,Burger Joint,Cycle Studio
4,94107,Café,Coffee Shop,Brewery,Park,Sushi Restaurant,Breakfast Spot,Sandwich Place,Gym,Bakery,Bar
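
The column labels above are built by indexing a three-element suffix list and falling back to `th` when the index runs out. That works for ten columns but would produce labels such as `21th` for longer lists. A small helper, hypothetical and not part of the original notebook, handles the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['PostalCode'] + ['{} Most Common Venue'.format(ordinal(i))
                            for i in range(1, num_top_venues + 1)]
print(columns)
```

For `num_top_venues = 10` this yields the same column names as the loop in the cell above.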


In [26]:
# add clustering labels
sf_venues_sorted.insert(0, 'ClusterLabel', kmeans.labels_)

print(sf_venues_sorted.shape)
sf_venues_sorted.head()

(30, 12)


Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,94102,Coffee Shop,Theater,Vietnamese Restaurant,Beer Bar,Cocktail Bar,Music Venue,Sushi Restaurant,Sandwich Place,Performing Arts Venue,Speakeasy
1,0,94103,Coffee Shop,Cocktail Bar,Beer Bar,Wine Bar,New American Restaurant,Motorcycle Shop,Gym,Gay Bar,Art Gallery,Bar
2,0,94104,Coffee Shop,Men's Store,Boutique,Seafood Restaurant,Hotel,Restaurant,Park,Gym / Fitness Center,Gym,Shoe Store
3,0,94105,Coffee Shop,Café,Gym,Scenic Lookout,Park,Art Gallery,Seafood Restaurant,New American Restaurant,Burger Joint,Cycle Studio
4,3,94107,Café,Coffee Shop,Brewery,Park,Sushi Restaurant,Breakfast Spot,Sandwich Place,Gym,Bakery,Bar


### Examine Clusters <a name="examine_cluster"></a>

In [29]:
# Cluster 0
sf_venues_sorted.loc[sf_venues_sorted['ClusterLabel']==0]
# In this cluster, the most common venues are coffee shops and bars

Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,94102,Coffee Shop,Theater,Vietnamese Restaurant,Beer Bar,Cocktail Bar,Music Venue,Sushi Restaurant,Sandwich Place,Performing Arts Venue,Speakeasy
1,0,94103,Coffee Shop,Cocktail Bar,Beer Bar,Wine Bar,New American Restaurant,Motorcycle Shop,Gym,Gay Bar,Art Gallery,Bar
2,0,94104,Coffee Shop,Men's Store,Boutique,Seafood Restaurant,Hotel,Restaurant,Park,Gym / Fitness Center,Gym,Shoe Store
3,0,94105,Coffee Shop,Café,Gym,Scenic Lookout,Park,Art Gallery,Seafood Restaurant,New American Restaurant,Burger Joint,Cycle Studio
5,0,94108,Coffee Shop,Hotel,Men's Store,Boutique,Restaurant,Shoe Store,Cocktail Bar,Church,Szechuan Restaurant,New American Restaurant
6,0,94109,Coffee Shop,Vietnamese Restaurant,Italian Restaurant,Grocery Store,Steakhouse,Wine Bar,American Restaurant,Gym / Fitness Center,Clothing Store,Bakery
8,0,94111,Coffee Shop,Cocktail Bar,Wine Bar,New American Restaurant,Men's Store,Seafood Restaurant,Scenic Lookout,Dessert Shop,Restaurant,Italian Restaurant
29,0,94199,Cocktail Bar,Wine Bar,Coffee Shop,Performing Arts Venue,Dessert Shop,New American Restaurant,Beer Bar,French Restaurant,Marijuana Dispensary,Optical Shop


In [31]:
# Cluster 1
sf_venues_sorted.loc[sf_venues_sorted['ClusterLabel']==1]
# In this cluster, the most common venues are restaurants, mostly Asian (Chinese, Japanese, Korean, Vietnamese)

Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,1,94116,Chinese Restaurant,Sandwich Place,Park,Dumpling Restaurant,Pizza Place,Bubble Tea Shop,Sushi Restaurant,Korean Restaurant,Café,Bus Stop
14,1,94118,Japanese Restaurant,Bakery,Thai Restaurant,Korean Restaurant,Italian Restaurant,Coffee Shop,Chinese Restaurant,Burmese Restaurant,Sushi Restaurant,Vietnamese Restaurant
15,1,94121,Café,Chinese Restaurant,Grocery Store,Sushi Restaurant,Korean Restaurant,Vietnamese Restaurant,Playground,Pizza Place,Bakery,Japanese Restaurant
16,1,94122,Chinese Restaurant,Bubble Tea Shop,Vietnamese Restaurant,Bakery,Japanese Restaurant,Bank,Dim Sum Restaurant,Thai Restaurant,Deli / Bodega,Bar


In [35]:
# Cluster 2
sf_venues_sorted.loc[sf_venues_sorted['ClusterLabel']==2]
# In this cluster, the most common venues are food trucks and sports facilities

Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,2,94130,Food Truck,Athletics & Sports,Park,Rugby Pitch,Baseball Field,Music Venue,Breakfast Spot,History Museum,Bus Station,American Restaurant


In [33]:
# Cluster 3
sf_venues_sorted.loc[sf_venues_sorted['ClusterLabel']==3]
# In this cluster, the most common venues are parks and pizza places

Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,3,94107,Café,Coffee Shop,Brewery,Park,Sushi Restaurant,Breakfast Spot,Sandwich Place,Gym,Bakery,Bar
7,3,94110,Mexican Restaurant,Grocery Store,Coffee Shop,Latin American Restaurant,New American Restaurant,Dive Bar,Italian Restaurant,Cocktail Bar,Bar,Bakery
9,3,94112,Mexican Restaurant,Pizza Place,Liquor Store,Bakery,Latin American Restaurant,Sandwich Place,Vietnamese Restaurant,Bank,Park,Bar
10,3,94114,Gay Bar,Park,Thai Restaurant,Coffee Shop,Bakery,New American Restaurant,Café,Japanese Restaurant,Deli / Bodega,Pizza Place
11,3,94115,Cosmetics Shop,Bakery,Gift Shop,Ice Cream Shop,Boutique,Sandwich Place,Tea Room,Yoga Studio,Park,Spa
13,3,94117,Park,Coffee Shop,Gift Shop,Pizza Place,Ice Cream Shop,Yoga Studio,Liquor Store,Bakery,Sushi Restaurant,Bookstore
17,3,94123,Italian Restaurant,Gym / Fitness Center,French Restaurant,Cosmetics Shop,Yoga Studio,Burger Joint,Salad Place,Sandwich Place,Wine Bar,Taco Place
18,3,94124,Southern / Soul Food Restaurant,Park,Bakery,Pizza Place,Light Rail Station,Mexican Restaurant,Playground,Bistro,Theater,Gym
19,3,94127,Park,Grocery Store,Convenience Store,Burger Joint,Breakfast Spot,Mexican Restaurant,Café,Bar,Playground,Mediterranean Restaurant
20,3,94128,Rental Car Location,Airport Lounge,Airport Service,Spa,Bookstore,Japanese Restaurant,Boutique,Museum,Gift Shop,Exhibit


In [34]:
# Cluster 4
sf_venues_sorted.loc[sf_venues_sorted['ClusterLabel']==4]
# In this cluster, the most common venues are parks and spas

Unnamed: 0,ClusterLabel,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,4,94134,Park,Spa,Bus Station,Baseball Field,Cantonese Restaurant,Bakery,Coffee Shop,Trail,Library,Convenience Store


## Results and Discussion <span id="result"></span>  

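As shown in the [Examine Clusters](#examine_cluster) section, Chinese restaurants are most prominent in cluster 1: "Chinese Restaurant" is the most common venue category in 94116 and 94122, the 2nd most common in 94121, and the 7th most common in 94118.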
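
The same comparison can be made numerically instead of by reading the top-10 tables, by averaging the `Chinese Restaurant` frequency per cluster. The sketch below shows the calculation on a toy stand-in for `sf_merged`. All numbers and postal codes in it are made up for illustration and are not results from this report.

```python
import pandas as pd

# Toy stand-in for sf_merged (made-up frequencies, not the report's data).
merged = pd.DataFrame({
    'ClusterLabel': [0, 0, 1, 1, 3],
    'PostalCode': [1, 2, 3, 4, 5],
    'Chinese Restaurant': [0.00, 0.02, 0.10, 0.14, 0.01],
})

# Average Chinese-restaurant share per cluster, highest first.
share = (merged.groupby('ClusterLabel')['Chinese Restaurant']
               .mean()
               .sort_values(ascending=False))
print(share)
```

Running the same `groupby` on the real `sf_merged` would give one number per cluster to support the ranking.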

## Conclusion <span id="conclusion"></span>  

In conclusion, to open a successful Chinese restaurant in San Francisco, we should choose a location in one of the following areas:

| Area | Postal Code |
| - | - |
| Lake Merced | 94116 |
| Sunset | 94122 |
| Richmond | 94118, 94121 |

![SF Map](https://ljmoore.files.wordpress.com/2013/02/san-francisco-autofill-map1.jpg?w=800)