  
    
----      
<a id='toc'></a>
<center>
    <h1>Capstone Project - The Battle of Neighborhoods</font></h1>
    <h2>Coding Section</h2>
    By Pruthvi Reddy
</center>

----

## Table of Contents
- [Philadelphia](#Philadelphia)
    - [Download and Explore Philadelphia Dataset](#PHL1)
    - [Explore Neighborhoods in Philadelphia](#PHL2)
    - [Analyze Each Philadelphia Neighborhood](#PHL3)
    - [Cluster Philadelphia Neighborhoods](#PHL4)
    - [Examine Philadelphia Clusters](#PHL5)
- [San Francisco](#SanFrancisco)
    - [Download and Explore San Francisco Dataset](#SF1)
    - [Explore Neighborhoods in San Francisco](#SF2)
    - [Analyze Each San Francisco Neighborhood](#SF3)
    - [Cluster San Francisco Neighborhoods](#SF4)
    - [Examine San Francisco Clusters](#SF5)

<a id='Philadelphia'></a>
# Philadelphia
In this section, we will perform data extraction, data wrangling and data analysis required for Philadelphia.

#### Import Libraries

In [227]:
# Import Libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np
import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium
print('Libraries imported.')

Libraries imported.


<a id='PHL1'></a>
### 1. Download and Explore Philadelphia Dataset

#### Use Beauatiful Soup to extract the Philadelphia data into Pandas data frame

In [206]:
# Use Beauatiful Soup to extract the Philadelphia data into Pandas data frame

res = requests.get("https://en.wikipedia.org/wiki/List_of_Philadelphia_neighborhoods")
soup = BeautifulSoup(res.content,'lxml')

phl_neigh_ds1 = pd.DataFrame(item.get_text(strip=True) for item in soup.select("span.mw-headline"))
phl_neigh_ds1.columns = ['neighborhood']

phl_neigh_ds1 = phl_neigh_ds1[~phl_neigh_ds1['neighborhood'].isin(['References', 'External links'])]

phl_neigh_ds1.head()

Unnamed: 0,neighborhood
0,Center City
1,South Philadelphia
2,Southwest Philadelphia
3,West Philadelphia
4,Lower North Philadelphia


#### Use the neighborhood data to get latitude and longitude 

In [207]:
# Use the neighborhood data to get latitude and longitude 

#function to get latitude and longitude
def get_lat_long(address,city):
    google_api_key ='AIzaSyAQWqMTOcyLBRDR2skO4F_5QEWzNDOlUHw'
    lat_lng = None
    while(lat_lng is None):
        g = geocoder.google('{}, {}'.format(address,city), key=google_api_key)
        lat_lng = g.latlng
    return lat_lng

phl_latitude = []
phl_longitude = []

for index, row in phl_neigh_ds1.iterrows():
    phl_lat_long = get_lat_long(row["neighborhood"],'Philadelphia')
    phl_latitude.append(phl_lat_long[0])
    phl_longitude.append(phl_lat_long[1])
    
phl_neigh_ds1['latitude'] = phl_latitude
phl_neigh_ds1['longitude'] = phl_longitude
phl_neigh_ds1.head()

Unnamed: 0,neighborhood,latitude,longitude
0,Center City,39.950904,-75.157457
1,South Philadelphia,39.909315,-75.166212
2,Southwest Philadelphia,39.898299,-75.236238
3,West Philadelphia,39.975709,-75.2129
4,Lower North Philadelphia,40.006762,-75.142863


#### Get generic coordinates of Philadelphia

In [315]:
# Get generic coordinates of Philadelphia
address = 'Philadelphia, Pennsylvania'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
phl_latitude = location.latitude
phl_longitude = location.longitude
print('The geograpical coordinate of Philadelphia are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Philadelphia are 39.9524152, -75.1635755.


#### Map of Philadelphia using latitude and longitude values

In [317]:
# create map of Philadelphia using latitude and longitude values
map_phl = folium.Map(location=[phl_latitude, phl_longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(phl_neigh_ds1['latitude'], phl_neigh_ds1['longitude'], phl_neigh_ds1['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_phl)  
    
map_phl

#### Foursquare Details

In [189]:
# Foursquare Details

client_id = 'QA3G22AQUC2H3BZK0QZDSPRRL5WCCZR2ZMYA12SNRKVK0OT4' # your Foursquare ID
client_secret = '302LWI5L0WSIROCPP52ZP1BQGLNALDP04V2NYJRJKBSGPMJV' # your Foursquare Secret
version = '20180605' # Foursquare API version

print('Your credentails:')
print('client_id: ' + client_id)
print('client_secret: ' + client_secret)

Your credentails:
client_id: QA3G22AQUC2H3BZK0QZDSPRRL5WCCZR2ZMYA12SNRKVK0OT4
client_secret: 302LWI5L0WSIROCPP52ZP1BQGLNALDP04V2NYJRJKBSGPMJV


<a id='PHL2'></a>
### 2. Explore Neighborhoods in Philadelphia

In [214]:
# Let's explore the first neighborhood in our dataframe.
phl_neigh_ds1.loc[0, 'neighborhood']

'Center City'

In [215]:
neighborhood_latitude = phl_neigh_ds1.loc[0, 'latitude'] # neighborhood latitude value
neighborhood_longitude = phl_neigh_ds1.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = phl_neigh_ds1.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Center City are 39.9509036, -75.1574567.


In [237]:
# top 100 venues with in a radius of 500 meters
limit = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    limit)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=QA3G22AQUC2H3BZK0QZDSPRRL5WCCZR2ZMYA12SNRKVK0OT4&client_secret=302LWI5L0WSIROCPP52ZP1BQGLNALDP04V2NYJRJKBSGPMJV&v=20180605&ll=39.9509036,-75.1574567&radius=500&limit=100'

In [None]:
results = requests.get(url).json()
# results

In [228]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [236]:
# Lets clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

100 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,MOM's Organic Market,Organic Grocery,39.950918,-75.158815
1,Di Bruno Bros.,Gourmet Shop,39.949148,-75.155587
2,Reading Terminal Market,Market,39.953341,-75.159306
3,Luke's Lobster Market East,Seafood Restaurant,39.950857,-75.158476
4,Meltkraft,Sandwich Place,39.95305,-75.158941


In [238]:
# Let's create a function to repeat the same process to all the neighborhoods in Philadelphia
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [239]:
# Get venues for all PHL neighborhoods
phl_venues = getNearbyVenues(names=phl_neigh_ds1['neighborhood'],
                                   latitudes=phl_neigh_ds1['latitude'],
                                   longitudes=phl_neigh_ds1['longitude']
                                  )

Center City
South Philadelphia
Southwest Philadelphia
West Philadelphia
Lower North Philadelphia
Upper North Philadelphia
Bridesburg-Kensington-Port Richmond
Roxborough-Manayunk
Germantown-Chestnut Hill
Olney-Oak Lane
Near Northeast Philadelphia
Far Northeast Philadelphia


In [240]:
# check the size of the resulting dataframe
print(phl_venues.shape)
phl_venues.head()

(333, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Center City,39.950904,-75.157457,MOM's Organic Market,39.950918,-75.158815,Organic Grocery
1,Center City,39.950904,-75.157457,Luke's Lobster Market East,39.950857,-75.158476,Seafood Restaurant
2,Center City,39.950904,-75.157457,MilkBoy Philadelphia,39.950054,-75.158627,Music Venue
3,Center City,39.950904,-75.157457,Di Bruno Bros.,39.949148,-75.155587,Gourmet Shop
4,Center City,39.950904,-75.157457,Primo Hoagies,39.949216,-75.159052,Sandwich Place


In [244]:
# Check to see how many Thai Restaurant are near center city 
phl_venues[phl_venues['Venue Category'].str.contains('Thai', case = False)].head(5)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
34,Center City,39.950904,-75.157457,Xiandu Thai Fusion,39.948893,-75.160011,Thai Restaurant
36,Center City,39.950904,-75.157457,Little Thai Market,39.953202,-75.159499,Thai Restaurant
192,Roxborough-Manayunk,40.026001,-75.223111,Chabaa Thai Bistro,40.025885,-75.224442,Thai Restaurant


In [245]:
# Let's check how many venues were returned for each neighborhood
phl_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bridesburg-Kensington-Port Richmond,15,15,15,15,15,15
Center City,100,100,100,100,100,100
Far Northeast Philadelphia,5,5,5,5,5,5
Germantown-Chestnut Hill,53,53,53,53,53,53
Lower North Philadelphia,8,8,8,8,8,8
Near Northeast Philadelphia,18,18,18,18,18,18
Olney-Oak Lane,3,3,3,3,3,3
Roxborough-Manayunk,68,68,68,68,68,68
South Philadelphia,28,28,28,28,28,28
Southwest Philadelphia,13,13,13,13,13,13


In [247]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(phl_venues['Venue Category'].unique())))

There are 138 uniques categories.


<a id='PHL3'></a>
### 3. Analyze Each Philadelphia Neighborhood

In [252]:
# one hot encoding
phl_onehot = pd.get_dummies(phl_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
phl_onehot['Neighborhood'] = phl_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [phl_onehot.columns[-1]] + list(phl_onehot.columns[:-1])
phl_onehot = phl_onehot[fixed_columns]

phl_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Bed & Breakfast,Beer Garden,Betting Shop,Board Shop,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Deli / Bodega,Department Store,Dessert Shop,Discount Store,Dive Bar,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Hookah Bar,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Museum,Music Venue,New American Restaurant,Optical Shop,Organic Grocery,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Poke Place,Pub,Racetrack,Record Shop,Rental Car Location,Rental Service,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wine Shop,Yoga Studio
0,Center City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Center City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Center City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Center City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Center City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [253]:
phl_onehot.shape

(333, 139)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [256]:
phl_grouped = phl_onehot.groupby('Neighborhood').mean().reset_index()
phl_grouped.head(5)

Unnamed: 0,Neighborhood,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Bed & Breakfast,Beer Garden,Betting Shop,Board Shop,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Deli / Bodega,Department Store,Dessert Shop,Discount Store,Dive Bar,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,French Restaurant,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Hookah Bar,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Mobile Phone Shop,Monument / Landmark,Museum,Music Venue,New American Restaurant,Optical Shop,Organic Grocery,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Poke Place,Pub,Racetrack,Record Shop,Rental Car Location,Rental Service,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Plaza,Smoke Shop,Smoothie Shop,Snack Place,Souvenir Shop,Spa,Sporting Goods Shop,Sports Bar,Sushi Restaurant,Taco Place,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wine Shop,Yoga Studio
0,Bridesburg-Kensington-Port Richmond,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Center City,0.01,0.01,0.0,0.02,0.0,0.0,0.06,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.01,0.02,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.02,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0
2,Far Northeast Philadelphia,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Germantown-Chestnut Hill,0.056604,0.018868,0.018868,0.0,0.018868,0.0,0.075472,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.037736,0.0,0.0,0.0,0.0,0.018868,0.0,0.037736,0.018868,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.037736,0.0,0.018868,0.0,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.037736,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Lower North Philadelphia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's print each neighborhood along with the top 5 most common venues

In [260]:
num_top_venues = 5

for hood in phl_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = phl_grouped[phl_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bridesburg-Kensington-Port Richmond----
                 venue  freq
0          Pizza Place  0.13
1       Clothing Store  0.13
2  American Restaurant  0.07
3            Bookstore  0.07
4    Mobile Phone Shop  0.07


----Center City----
            venue  freq
0          Bakery  0.06
1  Sandwich Place  0.04
2     Pizza Place  0.03
3             Bar  0.03
4             Pub  0.03


----Far Northeast Philadelphia----
                     venue  freq
0             Credit Union   0.2
1        Mobile Phone Shop   0.2
2                   Bakery   0.2
3               Smoke Shop   0.2
4  Health & Beauty Service   0.2


----Germantown-Chestnut Hill----
                 venue  freq
0               Bakery  0.08
1  American Restaurant  0.06
2                 Park  0.04
3          Cheese Shop  0.04
4       Farmers Market  0.04


----Lower North Philadelphia----
            venue  freq
0    Intersection  0.25
1  Cosmetics Shop  0.12
2      Smoke Shop  0.12
3        Pharmacy  0.12
4             Bar

#### Let's put that into a *pandas* dataframe

In [261]:
# Function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [276]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = phl_grouped['Neighborhood']

for ind in np.arange(phl_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(phl_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bridesburg-Kensington-Port Richmond,Pizza Place,Clothing Store,American Restaurant,Bookstore,Mobile Phone Shop,Fast Food Restaurant,Donut Shop,Sandwich Place,Chinese Restaurant,Sporting Goods Shop
1,Center City,Bakery,Sandwich Place,Bar,Burger Joint,Pub,Pizza Place,Indian Restaurant,Salad Place,Gym,Hot Dog Joint
2,Far Northeast Philadelphia,Credit Union,Mobile Phone Shop,Health & Beauty Service,Smoke Shop,Bakery,Electronics Store,Flower Shop,Fish Market,Fast Food Restaurant,Farmers Market
3,Germantown-Chestnut Hill,Bakery,American Restaurant,Boutique,Gym / Fitness Center,Ice Cream Shop,French Restaurant,Farmers Market,Park,Coffee Shop,Cheese Shop
4,Lower North Philadelphia,Intersection,Cosmetics Shop,Bar,Smoke Shop,Chinese Restaurant,Pharmacy,Hardware Store,Department Store,Dessert Shop,Discount Store


<a id='PHL4'></a>
### 4. Cluster Philadelphia Neighborhoods

In [265]:
# set number of clusters
kclusters = 5

phl_grouped_clustering = phl_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(phl_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 3, 4, 0, 0, 2, 4, 4, 4], dtype=int32)

New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [277]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

phl_merged = phl_neigh_ds1.rename(index=str, columns={"neighborhood": "Neighborhood", "latitude": "Latitude", "longitude": "Longitude"})
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
phl_merged = phl_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

phl_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Center City,39.950904,-75.157457,4,Bakery,Sandwich Place,Bar,Burger Joint,Pub,Pizza Place,Indian Restaurant,Salad Place,Gym,Hot Dog Joint
1,South Philadelphia,39.909315,-75.166212,4,Baseball Stadium,Outdoor Sculpture,American Restaurant,Baseball Field,Sandwich Place,Lounge,Ice Cream Shop,BBQ Joint,Betting Shop,Restaurant
2,Southwest Philadelphia,39.898299,-75.236238,4,Hotel,Asian Restaurant,Rental Service,Cosmetics Shop,Discount Store,Shoe Store,Fast Food Restaurant,Flower Shop,Rental Car Location,Intersection
3,West Philadelphia,39.975709,-75.2129,4,Light Rail Station,Intersection,Board Shop,Art Gallery,Pet Store,Athletics & Sports,Museum,Sculpture Garden,Thrift / Vintage Store,Scenic Lookout
4,Lower North Philadelphia,40.006762,-75.142863,0,Intersection,Cosmetics Shop,Bar,Smoke Shop,Chinese Restaurant,Pharmacy,Hardware Store,Department Store,Dessert Shop,Discount Store


#### Visualize the resulting clusters

In [307]:
# create map
phl_map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11, tiles="Mapbox Bright")

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(phl_merged['Latitude'], phl_merged['Longitude'], phl_merged['Neighborhood'], phl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(phl_map_clusters)
       
phl_map_clusters

<a id='PHL5'></a>
### 5. Examine Philadelphia Clusters

Lets examine each cluster and determine the discriminating venue categories that distinguish each cluster.

##### Cluster 1

In [279]:
phl_merged.loc[phl_merged['Cluster Labels'] == 0, phl_merged.columns[[1] + list(range(5, phl_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,40.006762,Cosmetics Shop,Bar,Smoke Shop,Chinese Restaurant,Pharmacy,Hardware Store,Department Store,Dessert Shop,Discount Store
10,40.068629,Bar,Pharmacy,Discount Store,Chinese Restaurant,Liquor Store,Shopping Plaza,Sporting Goods Shop,Golf Course,Grocery Store


##### Cluster 2

In [280]:
phl_merged.loc[phl_merged['Cluster Labels'] == 1, phl_merged.columns[[1] + list(range(5, phl_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,40.004028,Discount Store,Fast Food Restaurant,Donut Shop,Breakfast Spot,Dessert Shop,Dive Bar,Department Store,Electronics Store,Frozen Yogurt Shop


##### Cluster 3

In [281]:
phl_merged.loc[phl_merged['Cluster Labels'] == 2, phl_merged.columns[[1] + list(range(5, phl_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,40.029809,Gas Station,Grocery Store,Deli / Bodega,Department Store,Dessert Shop,Discount Store,Dive Bar,Donut Shop,French Restaurant


##### Cluster 4

In [282]:
phl_merged.loc[phl_merged['Cluster Labels'] == 3, phl_merged.columns[[1] + list(range(5, phl_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,40.103866,Mobile Phone Shop,Health & Beauty Service,Smoke Shop,Bakery,Electronics Store,Flower Shop,Fish Market,Fast Food Restaurant,Farmers Market


##### Cluster 5

In [283]:
phl_merged.loc[phl_merged['Cluster Labels'] == 4, phl_merged.columns[[1] + list(range(5, phl_merged.shape[1]))]]

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,39.950904,Sandwich Place,Bar,Burger Joint,Pub,Pizza Place,Indian Restaurant,Salad Place,Gym,Hot Dog Joint
1,39.909315,Outdoor Sculpture,American Restaurant,Baseball Field,Sandwich Place,Lounge,Ice Cream Shop,BBQ Joint,Betting Shop,Restaurant
2,39.898299,Asian Restaurant,Rental Service,Cosmetics Shop,Discount Store,Shoe Store,Fast Food Restaurant,Flower Shop,Rental Car Location,Intersection
3,39.975709,Intersection,Board Shop,Art Gallery,Pet Store,Athletics & Sports,Museum,Sculpture Garden,Thrift / Vintage Store,Scenic Lookout
6,39.995553,Clothing Store,American Restaurant,Bookstore,Mobile Phone Shop,Fast Food Restaurant,Donut Shop,Sandwich Place,Chinese Restaurant,Sporting Goods Shop
7,40.026001,Pizza Place,New American Restaurant,Grocery Store,Bakery,Mexican Restaurant,Trail,Gym / Fitness Center,Chinese Restaurant,Coffee Shop
8,40.074334,American Restaurant,Boutique,Gym / Fitness Center,Ice Cream Shop,French Restaurant,Farmers Market,Park,Coffee Shop,Cheese Shop


<a id='SanFrancisco'></a>
# San Francisco
In this section, we will perform data extraction, data wrangling and data analysis required for San Francisco.

<a id='SF1'></a>
### 1. Download and Explore San Francisco Dataset

#### Use Beauatiful Soup to extract the San Francisco data into Pandas data frame

In [300]:
# Use Beauatiful Soup to extract the San Francisco data into Pandas data frame

sf_res = requests.get("https://en.wikipedia.org/wiki/List_of_neighborhoods_in_San_Francisco")
sf_soup = BeautifulSoup(sf_res.content,'lxml')

In [306]:
sf_neigh = pd.DataFrame(item.get_text(strip=True) for item in sf_soup.select("span.mw-headline"))
sf_neigh.columns = ['neighborhood']

sf_neigh = sf_neigh[~sf_neigh['neighborhood'].isin(['References', 'External links','See also','Specific neighborhoods'])]

sf_neigh.head()

Unnamed: 0,neighborhood
0,Alamo Square
1,Anza Vista
2,Ashbury Heights
3,Balboa Park
4,Balboa Terrace


#### Use the neighborhood data to get latitude and longitude 

In [308]:
# Use the neighborhood data to get latitude and longitude 

#function to get latitude and longitude
def get_lat_long(address,city):
    google_api_key ='AIzaSyAQWqMTOcyLBRDR2skO4F_5QEWzNDOlUHw'
    lat_lng = None
    while(lat_lng is None):
        g = geocoder.google('{}, {}'.format(address,city), key=google_api_key)
        lat_lng = g.latlng
    return lat_lng

sf_latitude = []
sf_longitude = []

for index, row in sf_neigh.iterrows():
    sf_lat_long = get_lat_long(row["neighborhood"],'San Francisco')
    sf_latitude.append(sf_lat_long[0])
    sf_longitude.append(sf_lat_long[1])
    
sf_neigh['latitude'] = sf_latitude
sf_neigh['longitude'] = sf_longitude
sf_neigh.head()

Unnamed: 0,neighborhood,latitude,longitude
0,Alamo Square,37.777499,-122.433252
1,Anza Vista,37.780868,-122.443185
2,Ashbury Heights,37.76922,-122.448139
3,Balboa Park,37.724569,-122.443357
4,Balboa Terrace,37.731333,-122.468661


#### Get generic coordinates of San Francisco

In [312]:
# Get generic coordinates of San Francisco
address = 'San Francisco, California'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
sf_latitude = location.latitude
sf_longitude = location.longitude
print('The geograpical coordinate of San Francisco are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of San Francisco are 39.9524152, -75.1635755.


#### Map of San Francisco using latitude and longitude values

In [359]:
# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[sf_latitude, sf_longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(sf_neigh['latitude'], sf_neigh['longitude'], sf_neigh['neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sf)  
    
map_sf

<a id='SF2'></a>
### 2. Explore San Francisco Neighborhoods

In [318]:
# Let's explore the first neighborhood in our dataframe.
sf_neigh.loc[0, 'neighborhood']

'Alamo Square'

In [319]:
sf_neighborhood_latitude = sf_neigh.loc[0, 'latitude'] # neighborhood latitude value
sf_neighborhood_longitude = sf_neigh.loc[0, 'longitude'] # neighborhood longitude value

sf_neighborhood_name = sf_neigh.loc[0, 'neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(sf_neighborhood_name, 
                                                               sf_neighborhood_latitude, 
                                                               sf_neighborhood_longitude))

Latitude and longitude values of Alamo Square are 37.7774994, -122.433252.


In [320]:
# top 100 venues with in a radius of 500 meters
limit = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    client_id, 
    client_secret, 
    version, 
    sf_neighborhood_latitude, 
    sf_neighborhood_longitude, 
    radius, 
    limit)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=QA3G22AQUC2H3BZK0QZDSPRRL5WCCZR2ZMYA12SNRKVK0OT4&client_secret=302LWI5L0WSIROCPP52ZP1BQGLNALDP04V2NYJRJKBSGPMJV&v=20180605&ll=37.7774994,-122.433252&radius=500&limit=100'

In [321]:
results = requests.get(url).json()
# results

In [322]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [323]:
# Lets clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
nearby_venues.head()

61 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,Alamo Square,Park,37.775906,-122.434047
1,Painted Ladies,Historic Site,37.77601,-122.433179
2,Alamo Square Dog Park,Dog Run,37.775878,-122.43574
3,Originals Vinyl,Record Shop,37.775835,-122.431227
4,Kebab King,Pakistani Restaurant,37.779786,-122.431589


In [None]:
# Let's create a function to repeat the same process to all the neighborhoods in Philadelphia
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
# Get venues for all PHL neighborhoods
sf_venues = getNearbyVenues(names=sf_neigh['neighborhood'],
                                   latitudes=sf_neigh['latitude'],
                                   longitudes=sf_neigh['longitude']
                                  )

In [325]:
# check the size of the resulting dataframe
print(sf_venues.shape)
sf_venues.head()

(5517, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Alamo Square,37.777499,-122.433252,Alamo Square,37.775906,-122.434047,Park
1,Alamo Square,37.777499,-122.433252,Painted Ladies,37.77601,-122.433179,Historic Site
2,Alamo Square,37.777499,-122.433252,Alamo Square Dog Park,37.775878,-122.43574,Dog Run
3,Alamo Square,37.777499,-122.433252,Originals Vinyl,37.775835,-122.431227,Record Shop
4,Alamo Square,37.777499,-122.433252,Kebab King,37.779786,-122.431589,Pakistani Restaurant


In [326]:
# Check to see how many Thai Restaurant are near center city 
sf_venues[sf_venues['Venue Category'].str.contains('Thai', case = False)].head(5)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
134,Ashbury Heights,37.76922,-122.448139,Siam Lotus Thai Cuisine,37.769495,-122.450901,Thai Restaurant
146,Ashbury Heights,37.76922,-122.448139,Ploy II Thai Cuisine,37.769514,-122.4514,Thai Restaurant
164,Ashbury Heights,37.76922,-122.448139,Hippie Thai Street Food,37.770218,-122.445708,Thai Restaurant
200,Balboa Terrace,37.731333,-122.468661,Jitra Thai Cuisine,37.731477,-122.472941,Thai Restaurant
201,Balboa Terrace,37.731333,-122.468661,Ocean Thai,37.731317,-122.473016,Thai Restaurant


In [329]:
# Let's check how many venues were returned for each neighborhood
sf_venues.groupby('Neighborhood').count().head(10)

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alamo Square,61,61,61,61,61,61
Anza Vista,21,21,21,21,21,21
Ashbury Heights,94,94,94,94,94,94
Balboa Park,7,7,7,7,7,7
Balboa Terrace,21,21,21,21,21,21
Bayview,3,3,3,3,3,3
Belden Place,100,100,100,100,100,100
Bernal Heights,39,39,39,39,39,39
Buena Vista,65,65,65,65,65,65
Butchertown (Old and New),15,15,15,15,15,15


In [330]:
# Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(sf_venues['Venue Category'].unique())))

There are 351 uniques categories.


<a id='SF3'></a>
### 3. Analyze Each San Francisco Neighborhood

In [331]:
# one hot encoding
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sf_onehot['Neighborhood'] = sf_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sf_onehot.columns[-1]] + list(sf_onehot.columns[:-1])
sf_onehot = sf_onehot[fixed_columns]

sf_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Acai House,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Alternative Healer,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cave,Cha Chaan Teng,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Cafeteria,College Gym,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Service,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Hill,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nabe Restaurant,Nail Salon,National Park,Neighborhood,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Reservoir,Residential Building (Apartment / Condo),Restaurant,Road,Rock Club,Roller Rink,Russian Restaurant,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Shopping Plaza,Sicilian Restaurant,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track Stadium,Trade School,Trail,Train Station,Tree,Tunnel,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Alamo Square,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [333]:
sf_onehot.shape

(5517, 351)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [334]:
sf_grouped = sf_onehot.groupby('Neighborhood').mean().reset_index()
sf_grouped.head(5)

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Acai House,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Alternative Healer,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Cave,Cha Chaan Teng,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Church,Churrascaria,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Bookstore,College Cafeteria,College Gym,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Financial or Legal Service,Flea Market,Flower Shop,Fondue Restaurant,Food,Food & Drink Shop,Food Service,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Hill,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Luggage Store,Mac & Cheese Joint,Malay Restaurant,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nabe Restaurant,Nail Salon,National Park,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Reservoir,Residential Building (Apartment / Condo),Restaurant,Road,Rock Club,Roller Rink,Russian Restaurant,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Shopping Plaza,Sicilian Restaurant,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track Stadium,Trade School,Trail,Train Station,Tree,Tunnel,Turkish Restaurant,Tuscan Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Wagashi Place,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store
0,Alamo Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.016393,0.0,0.0,0.04918,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.016393,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.04918,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anza Vista,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.190476,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ashbury Heights,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.031915,0.0,0.06383,0.0,0.0,0.0,0.031915,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.031915,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.031915,0.010638,0.053191,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.010638,0.0,0.0,0.031915,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031915,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.010638,0.010638,0.0,0.0,0.0,0.031915,0.0,0.0,0.042553,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.010638
3,Balboa Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Balboa Terrace,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's print each neighborhood along with the top 5 most common venues

In [335]:
num_top_venues = 5

for hood in sf_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sf_grouped[sf_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alamo Square----
          venue  freq
0           Bar  0.07
1          Café  0.05
2          Park  0.05
3  Liquor Store  0.03
4         Hotel  0.03


----Anza Vista----
                     venue  freq
0                     Café  0.19
1  Health & Beauty Service  0.10
2              Coffee Shop  0.10
3       Mexican Restaurant  0.05
4        Electronics Store  0.05


----Ashbury Heights----
                    venue  freq
0                Boutique  0.06
1             Coffee Shop  0.05
2  Thrift / Vintage Store  0.04
3             Pizza Place  0.04
4                    Café  0.03


----Balboa Park----
          venue  freq
0          Park  0.14
1     BBQ Joint  0.14
2          Pool  0.14
3  Dessert Shop  0.14
4  Tennis Court  0.14


----Balboa Terrace----
                venue  freq
0     Thai Restaurant  0.10
1  Light Rail Station  0.10
2          Playground  0.05
3                 Gym  0.05
4              Bakery  0.05


----Bayview----
                   venue  freq
0             

#### Let's put that into a *pandas* dataframe

In [337]:
# Function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [338]:
# Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sf_grouped['Neighborhood']

for ind in np.arange(phl_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Square,Bar,Café,Park,Pizza Place,Seafood Restaurant,Liquor Store,Sushi Restaurant,Hotel,Mediterranean Restaurant,Boutique
1,Anza Vista,Café,Coffee Shop,Health & Beauty Service,Sandwich Place,Tunnel,Big Box Store,Mexican Restaurant,Bus Station,Bus Line,Electronics Store
2,Ashbury Heights,Boutique,Coffee Shop,Thrift / Vintage Store,Pizza Place,Thai Restaurant,Shoe Store,Gift Shop,Bookstore,Breakfast Spot,Clothing Store
3,Balboa Park,Park,Pool,BBQ Joint,Tennis Court,Sandwich Place,Light Rail Station,Dessert Shop,Fast Food Restaurant,Ethiopian Restaurant,Event Space
4,Balboa Terrace,Thai Restaurant,Light Rail Station,Japanese Restaurant,Dessert Shop,Sushi Restaurant,Shoe Repair,Gym,Park,Circus,Korean Restaurant


<a id='SF4'></a>
### 4. Cluster San Francisco Neighborhoods

In [339]:
# set number of clusters
kclusters = 5

sf_grouped_clustering = sf_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 3, 0, 0, 0, 0], dtype=int32)

New dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [340]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sf_merged = sf_neigh.rename(index=str, columns={"neighborhood": "Neighborhood", "latitude": "Latitude", "longitude": "Longitude"})
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
sf_merged = sf_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

sf_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alamo Square,37.777499,-122.433252,0,Bar,Café,Park,Pizza Place,Seafood Restaurant,Liquor Store,Sushi Restaurant,Hotel,Mediterranean Restaurant,Boutique
1,Anza Vista,37.780868,-122.443185,0,Café,Coffee Shop,Health & Beauty Service,Sandwich Place,Tunnel,Big Box Store,Mexican Restaurant,Bus Station,Bus Line,Electronics Store
2,Ashbury Heights,37.76922,-122.448139,0,Boutique,Coffee Shop,Thrift / Vintage Store,Pizza Place,Thai Restaurant,Shoe Store,Gift Shop,Bookstore,Breakfast Spot,Clothing Store
3,Balboa Park,37.724569,-122.443357,0,Park,Pool,BBQ Joint,Tennis Court,Sandwich Place,Light Rail Station,Dessert Shop,Fast Food Restaurant,Ethiopian Restaurant,Event Space
4,Balboa Terrace,37.731333,-122.468661,0,Thai Restaurant,Light Rail Station,Japanese Restaurant,Dessert Shop,Sushi Restaurant,Shoe Repair,Gym,Park,Circus,Korean Restaurant


#### Visualize the resulting clusters

In [345]:
# create map
sf_map_clusters = folium.Map(location=[sf_latitude, sf_longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sf_merged['Latitude'], sf_merged['Longitude'], sf_merged['Neighborhood'], sf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(sf_map_clusters)
       
sf_map_clusters

<a id='SF5'></a>
### 5. Examine San Francisco Clusters

Lets examine each cluster and determine the discriminating venue categories that distinguish each cluster.

##### Cluster 1

In [347]:
sf_merged.loc[sf_merged['Cluster Labels'] == 0, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]].head()

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,37.777499,Café,Park,Pizza Place,Seafood Restaurant,Liquor Store,Sushi Restaurant,Hotel,Mediterranean Restaurant,Boutique
1,37.780868,Coffee Shop,Health & Beauty Service,Sandwich Place,Tunnel,Big Box Store,Mexican Restaurant,Bus Station,Bus Line,Electronics Store
2,37.76922,Coffee Shop,Thrift / Vintage Store,Pizza Place,Thai Restaurant,Shoe Store,Gift Shop,Bookstore,Breakfast Spot,Clothing Store
3,37.724569,Pool,BBQ Joint,Tennis Court,Sandwich Place,Light Rail Station,Dessert Shop,Fast Food Restaurant,Ethiopian Restaurant,Event Space
4,37.731333,Light Rail Station,Japanese Restaurant,Dessert Shop,Sushi Restaurant,Shoe Repair,Gym,Park,Circus,Korean Restaurant


##### Cluster 2

In [350]:
sf_merged.loc[sf_merged['Cluster Labels'] == 1, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]].head(5)

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,37.758588,,,,,,,,,
36,37.752719,,,,,,,,,
58,37.793818,,,,,,,,,
89,37.798874,,,,,,,,,
96,37.736767,,,,,,,,,


##### Cluster 3

In [353]:
sf_merged.loc[sf_merged['Cluster Labels'] == 2, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]].head(5)

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,37.755043,,,,,,,,,
72,37.732321,,,,,,,,,
108,37.75251,,,,,,,,,
111,37.754324,,,,,,,,,


##### Cluster 4

In [354]:
sf_merged.loc[sf_merged['Cluster Labels'] == 3, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]].head(5)

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,37.730416,Art Gallery,Gym,Women's Store,Flower Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
39,37.730416,,,,,,,,,


##### Cluster 5

In [355]:
sf_merged.loc[sf_merged['Cluster Labels'] == 4, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]].head(5)

Unnamed: 0,Latitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,37.761812,,,,,,,,,
28,37.724415,,,,,,,,,
33,37.747315,,,,,,,,,
67,37.738927,,,,,,,,,
73,37.738333,,,,,,,,,
