# Battle of the Neighborhoods - Week 4

## Introduction/Problem

Imagine a person who has worked in London, UK for a long time and has recently been reassigned by Microsoft to San Francisco, California in the United States. The person and their family will therefore be relocating to San Francisco, and they would prefer to live in an area that is similar to where they live now in London. The family currently lives near the Richmond Underground station, and the Microsoft Headquarters in San Francisco is at 555 California St 200, San Francisco, CA 94104, United States.

## Data

To address this issue, the person decided to use Foursquare data to analyze which San Francisco areas can meet the family's requirement. The plan of action is as follows:
1. Obtain the location of Richmond Underground Station and explore the area within 500m of the station.
2. Obtain the locations of the neighborhoods in San Francisco.
3. Explore each neighborhood and add the explored data from Richmond.
4. Cluster the neighborhoods and highlight which San Francisco cluster is similar to Richmond.
5. Within that cluster, sort neighborhoods by distance to Microsoft Headquarters.

We will be using the neighborhood data for San Francisco from Wikipedia - https://en.wikipedia.org/wiki/List_of_neighborhoods_in_San_Francisco.

We will be using venue data from the Foursquare API to explore the area around Richmond Underground Station and the neighborhoods in San Francisco.

The data will then be combined and clustered to reveal which neighborhoods are similar to Richmond. Those neighborhoods are then sorted by distance to the new workplace.
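
The final ranking step can be illustrated with a minimal sketch. It uses the haversine great-circle formula with only the standard library (the notebook imports `geopy.distance`, which could be used instead). The workplace coordinates below are an approximate placeholder for 555 California St, not a geocoded value, and the three candidates are just sample neighborhoods with the coordinates that appear in the geocoding output of this notebook.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


# ASSUMPTION: approximate placeholder coordinates for the workplace.
WORKPLACE = (37.7921, -122.4038)

# Sample candidates (name, latitude, longitude).
candidates = [
    ("Bayview", 37.728889, -122.3925),
    ("Alamo Square", 37.77636, -122.434689),
    ("Anza Vista", 37.780836, -122.443149),
]

# Sort candidates from nearest to farthest from the workplace.
ranked = sorted(candidates, key=lambda row: haversine_km(row[1], row[2], *WORKPLACE))
for name, lat, lon in ranked:
    print(f"{name}: {haversine_km(lat, lon, *WORKPLACE):.2f} km")
```

In the full analysis the same sort would be applied only to the neighborhoods that land in Richmond's cluster.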

## Import all the relevant libraries

In [1]:
# import pandas library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import library to handle requests
import requests

# transform JSON file into a pandas dataframe
from pandas import json_normalize

# import numpy
import numpy as np

# import regex
import re

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import urlopen to access internet data
from urllib.request import urlopen

# import Beautiful soup to parse html page
from bs4 import BeautifulSoup

#import folium for generating map
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

# import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from geopy import distance

# import os.path
import os.path

# confirm that all libraries have been imported
print('Libraries imported!')

Libraries imported!


## Obtain long-lat of Richmond Underground Station, UK

We will use geopy to obtain the latitude and longitude of Richmond Underground station, which has the following address: The Quadrant, Richmond TW9 1EZ.

In [2]:
geolocator = Nominatim(user_agent="My_IBM_Week_4_Submission")
# wrap geocode in a rate limiter and call the wrapped version so lookups are spaced out
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=5)
location = geocode("The Quadrant, Richmond TW9 1EZ")
rich_long = location.longitude
rich_lat = location.latitude
print("Richmond Underground Station is located at {} with latitude of {} and longitude of {}.".format(location.address,
                                                                                                     rich_lat,rich_long))

Richmond Underground Station is located at The Quadrant, Petersham, London Borough of Richmond upon Thames, London, Greater London, England, TW9 1DJ, United Kingdom with latitude of 51.4623107 and longitude of -0.3027858.
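
To make the idea behind a rate limiter concrete, here is a simplified stand-in written with the standard library only. It is not geopy's implementation: `rate_limited` and `fake_geocode` are hypothetical names, and the clock and sleep functions are injected so the example runs instantly and makes no network call.

```python
import time


def rate_limited(func, min_delay_seconds, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls are at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Hypothetical offline stand-in for a geocoder call.
def fake_geocode(query):
    return (query, 51.4623107, -0.3027858)


# Inject a frozen clock and record sleeps instead of actually waiting.
slept = []
geocode = rate_limited(fake_geocode, min_delay_seconds=5,
                       clock=lambda: 0.0, sleep=slept.append)
geocode("The Quadrant, Richmond TW9 1EZ")  # first call: no wait
geocode("The Quadrant, Richmond TW9 1EZ")  # second call: waits the full delay
print(slept)
```

The point of the wrapper is that the delay only takes effect when calls go through the wrapped function rather than the original one.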


## What's around Richmond Underground Station, UK?

In [3]:
# @hidden_cell
CLIENT_ID = 'your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'your Foursquare client secret' # your Foursquare Secret
VERSION = 'your version' # Foursquare API version

## Exploring around Richmond Underground Station

Let's explore 500m around Richmond and see what results we get.

In [46]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},\
{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, rich_lat, rich_long, VERSION, radius, LIMIT)
results = requests.get(url).json()
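
As an alternative to formatting the query string by hand, the request URL can be assembled from a dictionary of parameters. The sketch below uses the standard library's `urlencode`, with the same endpoint and parameter names as the cell above. `build_explore_url` is a hypothetical helper, and the credentials are the same placeholders used earlier.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL with every query parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; the coordinates are those geocoded for Richmond.
url = build_explore_url("your Foursquare ID", "your Foursquare client secret",
                        "your version", 51.4623107, -0.3027858)
print(url)
```

Keeping the parameters in a dictionary avoids line-continuation characters inside the string and makes sure special characters are escaped.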

Borrowing the get category function from earlier assignments.

In [47]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the flattened column name produced by json_normalize
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Clean and restructure json data into a dataframe.

In [48]:
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Butter Beans | Coffee Shop | 51.46359 | -0.301869 |
| 1 | Kiss the Hippo Coffee | Coffee Shop | 51.460919 | -0.30423 |
| 2 | Richmond Green | Park | 51.46125 | -0.305918 |
| 3 | Richmond Theatre | Theater | 51.46213 | -0.304009 |
| 4 | Digme Fitness | Cycle Studio | 51.461212 | -0.301726 |
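
The flattening step can also be seen on a tiny hand-written payload. The sample below only mimics the nesting that the code above indexes into (`venue.name`, `venue.categories`, `venue.location`); a real response carries many more fields. The second item is a made-up venue included to show the empty-category case, and `flatten_venue` is a hypothetical helper.

```python
# Hand-written sample in the shape of results['response']['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Butter Beans",
               "categories": [{"name": "Coffee Shop"}],
               "location": {"lat": 51.46359, "lng": -0.301869}}},
    # Hypothetical venue with no category, to exercise the None branch.
    {"venue": {"name": "Example Venue Without Category",
               "categories": [],
               "location": {"lat": 51.4623107, "lng": -0.3027858}}},
]


def flatten_venue(item):
    """Reduce one nested item to the four fields kept in the dataframe."""
    venue = item["venue"]
    categories = venue.get("categories", [])
    return {
        "name": venue["name"],
        "categories": categories[0]["name"] if categories else None,
        "lat": venue["location"]["lat"],
        "lng": venue["location"]["lng"],
    }


rows = [flatten_venue(item) for item in sample_items]
for row in rows:
    print(row)
```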


Let's find out how many venues were returned.

In [49]:
print('{} venues were returned by Foursquare around Richmond'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare around Richmond


Let's add Richmond as a neighborhood, along with its latitude and longitude, as we will use this information later.

In [50]:
nearby_venues['Neighborhood']='Richmond'
nearby_venues['Neighborhood Latitude']=rich_lat
nearby_venues['Neighborhood Longitude']=rich_long
# rename columns and rearrange
nearby_venues.rename(columns={'name':'Venue','categories':'Venue Category','lat':'Venue Latitude','lng':'Venue Longitude'},
                    inplace=True)
arranged_columns = list(nearby_venues.columns[-3:])+list(nearby_venues.columns[:-3])
nearby_venues=nearby_venues[arranged_columns]
nearby_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Category | Venue Latitude | Venue Longitude |
|---|---|---|---|---|---|---|---|
| 0 | Richmond | 51.462311 | -0.302786 | Butter Beans | Coffee Shop | 51.46359 | -0.301869 |
| 1 | Richmond | 51.462311 | -0.302786 | Kiss the Hippo Coffee | Coffee Shop | 51.460919 | -0.30423 |
| 2 | Richmond | 51.462311 | -0.302786 | Richmond Green | Park | 51.46125 | -0.305918 |
| 3 | Richmond | 51.462311 | -0.302786 | Richmond Theatre | Theater | 51.46213 | -0.304009 |
| 4 | Richmond | 51.462311 | -0.302786 | Digme Fitness | Cycle Studio | 51.461212 | -0.301726 |


## Now let's get all the neighborhoods in San Francisco

We use urlopen and read() to fetch the page's HTML and then parse it with BeautifulSoup.

In [10]:
url = 'https://en.wikipedia.org/wiki/List_of_neighborhoods_in_San_Francisco'
# use urlopen to open url and read() to obtain html codes
html = urlopen(url).read()
# use BeautifulSoup to parse html
raw_data = BeautifulSoup(html,'html.parser')

## Initialize dataframe

The dataframe is then initialised with a fixed set of column headers.

In [11]:
column_header = ["Neighborhood","Latitude","Longitude"]
df = pd.DataFrame(columns = column_header)
df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|


## Obtain row data from each list using the html tags 'li'

Each neighborhood's name is found inside an html 'li' tag, and there are 119 neighborhoods listed.

In [12]:
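row_tags = raw_data('li')
# collect the name from each of the first 119 list items and build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0, so the rows are no longer appended one by one)
names = [tags.text.split(maxsplit=1)[1] for tags in row_tags[0:119]]
df = pd.DataFrame({'Neighborhood': names}, columns=column_header)
df.head(10)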

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Alamo Square | | |
| 1 | Anza Vista | | |
| 2 | Ashbury Heights | | |
| 3 | Balboa Park | | |
| 4 | Balboa Terrace | | |
| 5 | Bayview | | |
| 6 | Belden Place | | |
| 7 | Bernal Heights | | |
| 8 | Buena Vista | | |
| 9 | Butchertown (Old and New) | | |


Let's clean up neighborhood names with parentheses

In [13]:
# Find neighborhood names containing parentheses and keep only the text before the opening parenthesis
for neigh in df['Neighborhood']:
    match = re.match(r'(.*?)\(.*\)', neigh)
    if match is not None:
        # strip() removes the space left in front of the opening parenthesis
        df.replace({'Neighborhood': neigh}, value=match[1].strip(), inplace=True)
df.head(10)

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Alamo Square | | |
| 1 | Anza Vista | | |
| 2 | Ashbury Heights | | |
| 3 | Balboa Park | | |
| 4 | Balboa Terrace | | |
| 5 | Bayview | | |
| 6 | Belden Place | | |
| 7 | Bernal Heights | | |
| 8 | Buena Vista | | |
| 9 | Butchertown | | |
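
The same cleanup can be tried on single strings. This is a small variant that uses `re.sub` rather than the `re.match` loop above; `strip_parenthetical` is a hypothetical helper name, and it assumes the qualifier in parentheses always comes at the end of the name.

```python
import re


def strip_parenthetical(name):
    """Drop a trailing '(...)' qualifier and any surrounding whitespace."""
    return re.sub(r"\s*\(.*\)\s*$", "", name).strip()


for raw in ["Butchertown (Old and New)", "Alamo Square"]:
    print(repr(strip_parenthetical(raw)))
```

Names without parentheses pass through unchanged, so the function can be applied to the whole column.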


## Obtaining data from geopy using neighborhood name

Now let's use geopy to find the latitude and longitude of each neighborhood in San Francisco. We restrict the search to the San Francisco area with a bounding box. Because the geocoding service is rate-limited, we save what we have found to a local csv. If the csv already exists, the code loads it and geocodes only the rows that still lack a latitude / longitude; otherwise it geocodes every neighborhood and creates the file.

In [14]:
if os.path.exists('./SanFran_neighborhoods.csv')==True:
    print('File Exist...using file')
    df = pd.read_csv('SanFran_neighborhoods.csv')
    geolocator = Nominatim(user_agent="My_IBM_Week_4_Submission")
    for i,neigh in enumerate(df['Neighborhood']):
        if pd.isna(df.at[i,'Latitude']) == True:
            try:
                location = geolocator.geocode(neigh,
                                              country_codes="us",exactly_one=False,viewbox=[(37.82,-122.53),(37.68,-122.31)],
                                             bounded=True)
                neigh_long = location[0].longitude
                neigh_lat = location[0].latitude
                df.at[i,'Latitude'] = neigh_lat
                df.at[i,'Longitude'] = neigh_long
            except:
                print('Location details not available for',neigh)
                continue
    df.to_csv('SanFran_neighborhoods.csv',index=False)
else:
    geolocator = Nominatim(user_agent="My_IBM_Week_4_Submission")
    for i,neigh in enumerate(df['Neighborhood']):
        try:
            location = geolocator.geocode(neigh,
                                          country_codes="us",exactly_one=False,viewbox=[(37.82,-122.53),(37.68,-122.31)],
                                         bounded=True)
            neigh_long = location[0].longitude
            neigh_lat = location[0].latitude
            df.at[i,'Latitude'] = neigh_lat
            df.at[i,'Longitude'] = neigh_long
        except:
            print('Location details not available for',neigh)
            continue
    df.to_csv('SanFran_neighborhoods.csv',index=False)

File Exist...using file
Location details not available for Ashbury Heights
Location details not available for Butchertown 
Location details not available for Clarendon Heights
Location details not available for Golden Gate Heights
Location details not available for Ingleside Terraces
Location details not available for Lower Nob Hill
Location details not available for Monterey Heights
Location details not available for Outer Sunset
Location details not available for Polk Gulch
Location details not available for Westwood Highlands
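
The load-cache, fill-gaps, save pattern described above can be sketched without pandas or network access. Everything here is an illustrative assumption: `fill_missing_coordinates` and `fake_geocode` are hypothetical names, the geocoder is a stub passed in as a function, and the cache is written to a temporary file rather than the notebook's csv.

```python
import csv
import os
import tempfile

FIELDS = ["Neighborhood", "Latitude", "Longitude"]


def fill_missing_coordinates(names, cache_path, geocode_fn):
    """Geocode only the names without cached coordinates, then rewrite the cache."""
    cached = {}
    if os.path.exists(cache_path):
        with open(cache_path, newline="") as fh:
            cached = {row["Neighborhood"]: row for row in csv.DictReader(fh)}

    rows = []
    for name in names:
        row = cached.get(name, {"Neighborhood": name, "Latitude": "", "Longitude": ""})
        if row["Latitude"] == "":
            result = geocode_fn(name)  # expected to return (lat, lon) or None
            if result is None:
                print("Location details not available for", name)
            else:
                row["Latitude"], row["Longitude"] = result
        rows.append(row)

    with open(cache_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Hypothetical offline geocoder that knows a single neighborhood.
calls = []


def fake_geocode(name):
    calls.append(name)
    known = {"Alamo Square": (37.77636, -122.434689)}
    return known.get(name)


cache_file = os.path.join(tempfile.mkdtemp(), "neighborhoods_cache.csv")
names = ["Alamo Square", "Polk Gulch"]
first = fill_missing_coordinates(names, cache_file, fake_geocode)
second = fill_missing_coordinates(names, cache_file, fake_geocode)
print(calls)
```

On the second run only the neighborhood that is still missing coordinates is sent to the geocoder, which is what keeps repeated runs within the service's limits.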


Some locations could not be populated via geopy. For the purpose of this assignment, we'll drop the neighborhoods without coordinates.

In [16]:
df.dropna(inplace=True)
df.head()

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Alamo Square | 37.77636 | -122.434689 |
| 1 | Anza Vista | 37.780836 | -122.443149 |
| 3 | Balboa Park | 37.721427 | -122.447547 |
| 4 | Balboa Terrace | 37.721427 | -122.447547 |
| 5 | Bayview | 37.728889 | -122.3925 |


Let's plot the geocoded neighborhoods in folium to check the accuracy of the location details. To center the map, we first need the location of San Francisco itself.

In [17]:
address = 'San Francisco, CA'

geolocator = Nominatim(user_agent="My_IBM_Week_4_Submission")
sanfran_location = geolocator.geocode(address,country_codes="us",exactly_one=True,viewbox=[(37.82,-122.53),(37.68,-122.31)])
sanfran_latitude = sanfran_location.latitude
sanfran_longitude = sanfran_location.longitude
print('The geographical coordinates of San Francisco, CA are {}, {}.'.format(sanfran_latitude, sanfran_longitude))

The geographical coordinates of San Francisco, CA are 37.7790262, -122.4199061.


## Creating folium map of San Francisco, CA

In [18]:
# create map of San Francisco using latitude and longitude values
map_sanfran = folium.Map(location=[sanfran_latitude, sanfran_longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    # skip any neighborhood that has no coordinates
    if not pd.isna(lat):
        label = '{}'.format(neighborhood)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_sanfran)
    
map_sanfran
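
Alongside the visual check on the map, two quick programmatic checks can flag suspicious geocoding results. This is a minimal sketch: the bounding box reuses the viewbox corners from the geocoding cell, the sample points are a few rows from the output above plus the London station as a deliberate outlier, and `inside_viewbox` is a hypothetical helper.

```python
from collections import defaultdict

# Bounding box taken from the viewbox corners used when geocoding.
LAT_MIN, LAT_MAX = 37.68, 37.82
LON_MIN, LON_MAX = -122.53, -122.31


def inside_viewbox(lat, lon):
    """True when the point lies inside the San Francisco bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


points = [
    ("Alamo Square", 37.77636, -122.434689),
    ("Balboa Park", 37.721427, -122.447547),
    ("Balboa Terrace", 37.721427, -122.447547),
    ("Richmond (London)", 51.4623107, -0.3027858),  # deliberately outside the box
]

# Check 1: points that fall outside the expected area.
outside = [name for name, lat, lon in points if not inside_viewbox(lat, lon)]

# Check 2: different neighborhoods that were given identical coordinates.
by_coord = defaultdict(list)
for name, lat, lon in points:
    by_coord[(lat, lon)].append(name)
shared = [names for names in by_coord.values() if len(names) > 1]

print(outside)
print(shared)
```

Neighborhoods that share coordinates will return identical venue lists later on, so they are worth a second look before clustering.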

## Time to explore the San Francisco neighborhoods

Borrowing function from earlier labs to find venues for all neighborhoods in San Francisco

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
                   
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Run the function defined above for all neighborhoods in San Francisco and store the result in *san_fran_venues*.

In [20]:
print(df.shape)
san_fran_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

(109, 3)
Alamo Square
Anza Vista
Balboa Park
Balboa Terrace
Bayview
Belden Place
Bernal Heights
Buena Vista
Castro
Cathedral Hill
Cayuga Terrace
China Basin
Chinatown
Civic Center
Cole Valley
Corona Heights
Cow Hollow
Crocker-Amazon
Design District
Diamond Heights
Dogpatch
Dolores Heights
Duboce Triangle
Embarcadero
Eureka Valley
Excelsior
Fillmore
Financial District
Financial District South
Fisherman's Wharf
Forest Hill
Forest Knolls
Glen Park
Haight-Ashbury
Hayes Valley
Hunters Point
India Basin
Ingleside
Inner Sunset
Irish Hill
Islais Creek
Jackson Square
Japantown
Jordan Park
Laguna Honda
Lake Street
Lakeside
Lakeshore
Laurel Heights
Lincoln Manor
Little Hollywood
Little Russia
Little Saigon
Lone Mountain
Lower Haight
Lower Pacific Heights
Marina District
Merced Heights
Merced Manor
Midtown Terrace
Mid-Market
Miraloma Park
Mission Bay
Mission District
Mission Dolores
Mission Terrace
Mount Davidson
Nob Hill
Noe Valley
North Beach
North of Panhandle
Oceanview
Outer Mission
Pacific He

Checking the size of the resulting dataframe

In [52]:
print(san_fran_venues.shape)
san_fran_venues.head()

(4947, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Alamo Square | 37.77636 | -122.434689 | Alamo Square | 37.776045 | -122.434363 | Park |
| 1 | Alamo Square | 37.77636 | -122.434689 | Alamo Square Dog Park | 37.775878 | -122.43574 | Dog Run |
| 2 | Alamo Square | 37.77636 | -122.434689 | Painted Ladies | 37.77612 | -122.433389 | Historic Site |
| 3 | Alamo Square | 37.77636 | -122.434689 | The Independent | 37.775573 | -122.437835 | Rock Club |
| 4 | Alamo Square | 37.77636 | -122.434689 | The Mill | 37.776425 | -122.43797 | Bakery |


Group the venues by neighborhood to check how many venues are found per neighborhood

In [51]:
print(san_fran_venues.groupby('Neighborhood').count().shape)
san_fran_venues.groupby('Neighborhood').count().head()

(109, 6)


| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Alamo Square | 74 | 74 | 74 | 74 | 74 | 74 |
| Anza Vista | 19 | 19 | 19 | 19 | 19 | 19 |
| Balboa Park | 18 | 18 | 18 | 18 | 18 | 18 |
| Balboa Terrace | 18 | 18 | 18 | 18 | 18 | 18 |
| Bayview | 15 | 15 | 15 | 15 | 15 | 15 |


Investigate how many unique categories there are from all returned values

In [23]:
print('There are {} unique categories.'.format(len(san_fran_venues['Venue Category'].unique())))

There are 342 unique categories.


## Adding Richmond into dataframe before clustering

Now that we have the venues data for San Francisco, let's append the data from Richmond before clustering.

In [25]:
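# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
# (the column order it produces may differ from the output recorded below)
san_fran_rich_venues = pd.concat([nearby_venues, san_fran_venues], ignore_index=True)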
san_fran_rich_venues.head()

| | Venue Category | Venue Latitude | Venue Longitude | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue |
|---|---|---|---|---|---|---|---|
| 0 | Coffee Shop | 51.46359 | -0.301869 | Richmond | 51.462311 | -0.302786 | Butter Beans |
| 1 | Coffee Shop | 51.460919 | -0.30423 | Richmond | 51.462311 | -0.302786 | Kiss the Hippo Coffee |
| 2 | Park | 51.46125 | -0.305918 | Richmond | 51.462311 | -0.302786 | Richmond Green |
| 3 | Theater | 51.46213 | -0.304009 | Richmond | 51.462311 | -0.302786 | Richmond Theatre |
| 4 | Cycle Studio | 51.461212 | -0.301726 | Richmond | 51.462311 | -0.302786 | Digme Fitness |


## Prepping data for clustering through one-hot encoding

Analyzing each neighborhood through one hot encoding

In [26]:
# one hot encoding
san_fran_onehot = pd.get_dummies(san_fran_rich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
san_fran_onehot['Neighborhood'] = san_fran_rich_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = list(san_fran_onehot)
fixed_columns.insert(0,fixed_columns.pop(fixed_columns.index('Neighborhood')))
san_fran_onehot = san_fran_onehot[fixed_columns]

san_fran_onehot.head()

Unnamed: 0,Neighborhood,Acai House,Accessories Store,Acupuncturist,Adult Boutique,Alternative Healer,American Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware 
Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Hill,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Middle School,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Mountain,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nabe Restaurant,Nail Salon,National Park,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Pub,Public Art,Public Bathroom,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Roller Rink,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Skating 
Rink,Smoke Shop,Soccer Field,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trade School,Trail,Train Station,Trattoria/Osteria,Travel Agency,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Richmond,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Richmond,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Richmond,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Richmond,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Richmond,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [27]:
san_fran_onehot.tail()

Unnamed: 0,Neighborhood,Acai House,Accessories Store,Acupuncturist,Adult Boutique,Alternative Healer,American Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware 
Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Hill,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Middle School,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Mountain,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nabe Restaurant,Nail Salon,National Park,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Pub,Public Art,Public Bathroom,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Roller Rink,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Skating 
Rink,Smoke Shop,Soccer Field,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trade School,Trail,Train Station,Trattoria/Osteria,Travel Agency,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
5042,Yerba Buena,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5043,Yerba Buena,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5044,Yerba Buena,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5045,Yerba Buena,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5046,Yerba Buena,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now let's group the rows by neighborhood and take the mean of each column, which gives the frequency of venue occurrences in each category

In [28]:
san_fran_grouped = san_fran_onehot.groupby('Neighborhood').mean().reset_index()
san_fran_grouped.head()

Unnamed: 0,Neighborhood,Acai House,Accessories Store,Acupuncturist,Adult Boutique,Alternative Healer,American Restaurant,Animal Shelter,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cable Car,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Car Wash,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dive Shop,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Hardware 
Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,Hill,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Jiangsu Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Middle School,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motel,Mountain,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nabe Restaurant,Nail Salon,National Park,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Pakistani Restaurant,Paper / Office Supplies Store,Park,Parking,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Photography Lab,Physical Therapist,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Pub,Public Art,Public Bathroom,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Rental Service,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Roller Rink,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Salvadoran Restaurant,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Skating 
Rink,Smoke Shop,Soccer Field,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Student Center,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tour Provider,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trade School,Trail,Train Station,Trattoria/Osteria,Travel Agency,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Warehouse,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Alamo Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.013514,0.013514,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.013514,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.013514,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.013514
1,Anza Vista,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Balboa Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Balboa Terrace,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bayview,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
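
To see why the mean of one-hot rows gives a frequency, here is a minimal sketch on a made-up table. The neighborhood names 'A' and 'B' and the three categories are illustrative assumptions, not Foursquare data.

```python
import pandas as pd

# Toy one-hot table: one row per venue, one column per category (illustrative data)
toy_onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Café':         [1, 1, 0, 0, 0, 1],
    'Park':         [0, 0, 1, 0, 1, 0],
    'Pub':          [0, 0, 0, 1, 0, 0],
})

# The mean of a 0/1 column within a group is the share of that group's venues
# that belong to the category, so each row of the result sums to 1
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Read this way, the value 0.013514 that recurs in the Alamo Square row is consistent with one venue out of 74.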


Let's check the shape of the dataframe

In [29]:
san_fran_grouped.shape

(110, 345)

Next, a helper function (borrowed from the course material) that returns the most common venue categories for a neighborhood

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
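
As a quick illustration of what the helper returns, the sketch below applies it to a hand-made row. The row values are assumptions chosen for the example; the function body is the same as above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep only the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Illustrative row shaped like one row of san_fran_grouped
toy_row = pd.Series({'Neighborhood': 'A', 'Café': 0.5, 'Park': 0.25,
                     'Pub': 0.125, 'Gym': 0.125})

top_two = return_most_common_venues(toy_row, 2)
print(top_two)
```

Categories with equal frequency (here Pub and Gym) may come back in either order, because the default sort is not stable. Lower-ranked columns in the real table should therefore be read with some caution.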

Now to create a dataframe of neighborhoods with the top 15 most common venue categories

In [31]:
num_top_venues = 15

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
# match new dataframe 'Neighborhood' column with the san_fran_grouped dataframe
neighborhoods_venues_sorted['Neighborhood'] = san_fran_grouped['Neighborhood']

for ind in np.arange(san_fran_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(san_fran_grouped.iloc[ind, :], num_top_venues)

print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(110, 16)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alamo Square,Bar,BBQ Joint,Record Shop,Pizza Place,Seafood Restaurant,Sushi Restaurant,Liquor Store,Café,Park,Wine Bar,Hotel,Ethiopian Restaurant,Diner,Pharmacy,Pet Store
1,Anza Vista,Café,Donut Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Big Box Store,Burger Joint,Grocery Store,Coffee Shop,Sandwich Place,Tunnel,Liquor Store,Arts & Crafts Store,Health & Beauty Service,Cosmetics Shop,Juice Bar
2,Balboa Park,Baseball Field,Café,Asian Restaurant,Bus Station,Bus Stop,Skate Park,Flower Shop,Filipino Restaurant,College Gym,BBQ Joint,Breakfast Spot,Burger Joint,Vietnamese Restaurant,Dessert Shop,Light Rail Station
3,Balboa Terrace,Baseball Field,Café,Asian Restaurant,Bus Station,Bus Stop,Skate Park,Flower Shop,Filipino Restaurant,College Gym,BBQ Joint,Breakfast Spot,Burger Joint,Vietnamese Restaurant,Dessert Shop,Light Rail Station
4,Bayview,Southern / Soul Food Restaurant,Bakery,Dance Studio,Pool,Café,Gym,Pharmacy,Mexican Restaurant,Home Service,Coffee Shop,Dumpling Restaurant,Piercing Parlor,Light Rail Station,Fast Food Restaurant,Field
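
The column labels above rely on an IndexError to fall back to the 'th' suffix, which works for 15 columns but would produce '21th' for a longer list. A small sketch of a more general helper follows; the name ordinal is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column labels as the cell above
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, 16)]
print(columns[1], '|', columns[11], '|', columns[15])
```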


## Now time to cluster them

Run k-means to group the neighborhoods into 5 clusters.

In [33]:
# set number of clusters
kclusters = 5

# drop the label column so that only the numeric frequency columns are clustered
san_fran_grouped_clustering = san_fran_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(san_fran_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100] 

array([1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 2, 0, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       0, 0, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1])
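
Before using the labels, it can help to look at how many neighborhoods land in each cluster. The sketch below counts the 100 labels printed above with NumPy; it is only an inspection aid, not part of the original analysis.

```python
import numpy as np

# First 100 cluster labels, copied from the output above
labels = np.array([
    1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
    0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 2, 0, 1, 1, 1, 1,
    0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 1, 2, 1, 1, 1, 0, 1, 1, 1, 1,
])

# bincount returns the number of occurrences of each label 0..4
counts = np.bincount(labels, minlength=5)
for cluster, count in enumerate(counts):
    print('Cluster {}: {} neighborhoods'.format(cluster, count))
```

Cluster 1 is by far the largest here. A cluster that contains most of the city narrows the search only a little, so the later sort by distance does much of the work.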

Now let's combine the cluster labels with the top 15 venues

In [34]:
# add clustering labels; insert() raises ValueError if the column already exists,
# which happens when this cell is run a second time
try:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
except ValueError:
    pass
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,1,Alamo Square,Bar,BBQ Joint,Record Shop,Pizza Place,Seafood Restaurant,Sushi Restaurant,Liquor Store,Café,Park,Wine Bar,Hotel,Ethiopian Restaurant,Diner,Pharmacy,Pet Store
1,1,Anza Vista,Café,Donut Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Big Box Store,Burger Joint,Grocery Store,Coffee Shop,Sandwich Place,Tunnel,Liquor Store,Arts & Crafts Store,Health & Beauty Service,Cosmetics Shop,Juice Bar
2,4,Balboa Park,Baseball Field,Café,Asian Restaurant,Bus Station,Bus Stop,Skate Park,Flower Shop,Filipino Restaurant,College Gym,BBQ Joint,Breakfast Spot,Burger Joint,Vietnamese Restaurant,Dessert Shop,Light Rail Station
3,4,Balboa Terrace,Baseball Field,Café,Asian Restaurant,Bus Station,Bus Stop,Skate Park,Flower Shop,Filipino Restaurant,College Gym,BBQ Joint,Breakfast Spot,Burger Joint,Vietnamese Restaurant,Dessert Shop,Light Rail Station
4,1,Bayview,Southern / Soul Food Restaurant,Bakery,Dance Studio,Pool,Café,Gym,Pharmacy,Mexican Restaurant,Home Service,Coffee Shop,Dumpling Restaurant,Piercing Parlor,Light Rail Station,Fast Food Restaurant,Field
5,1,Belden Place,Coffee Shop,Café,Food Truck,Hotel,Boutique,Sandwich Place,Sushi Restaurant,Gym / Fitness Center,Italian Restaurant,Bubble Tea Shop,American Restaurant,Men's Store,Clothing Store,Vegetarian / Vegan Restaurant,Japanese Restaurant
6,1,Bernal Heights,Coffee Shop,Playground,Italian Restaurant,Mexican Restaurant,Bakery,Pizza Place,Peruvian Restaurant,Park,Cocktail Bar,Food Truck,Gourmet Shop,Grocery Store,Trail,Yoga Studio,Vietnamese Restaurant
7,1,Buena Vista,Historic Site,Bike Rental / Bike Share,Ice Cream Shop,Chocolate Shop,Tour Provider,Harbor / Marina,Park,Seafood Restaurant,Pharmacy,New American Restaurant,Clothing Store,Beach,Gym / Fitness Center,Fast Food Restaurant,Diner
8,1,Castro,Gay Bar,Coffee Shop,Thai Restaurant,Yoga Studio,Indian Restaurant,Intersection,Gym,Mediterranean Restaurant,New American Restaurant,Pet Store,Deli / Bodega,Playground,Convenience Store,Clothing Store,Seafood Restaurant
9,1,Cathedral Hill,Hotel,Café,Italian Restaurant,American Restaurant,Spa,Bar,Cocktail Bar,Beer Bar,Sushi Restaurant,Grocery Store,Breakfast Spot,Coffee Shop,Yoga Studio,Thai Restaurant,Convenience Store


Now let's find out which cluster Richmond is in

In [35]:
neighborhoods_venues_sorted[neighborhoods_venues_sorted['Neighborhood']=='Richmond']

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
82,1,Richmond,Pub,Italian Restaurant,Coffee Shop,Café,Bakery,Grocery Store,Restaurant,Movie Theater,Sushi Restaurant,Clothing Store,French Restaurant,Burger Joint,Sandwich Place,Theater,Thai Restaurant


Let's keep only the neighborhoods in the same cluster as Richmond, then drop Richmond itself

In [53]:
cluster_no = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['Neighborhood']=='Richmond',
                                             ['Cluster Labels']].values[0][0]
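
The lookup above indexes into a 2-D array with .values[0][0], which raises an IndexError that is hard to interpret if the name is missing. A minimal alternative sketch on a made-up three-row table (the rows are assumptions for illustration):

```python
import pandas as pd

# Illustrative stand-in for neighborhoods_venues_sorted
toy = pd.DataFrame({
    'Cluster Labels': [1, 4, 1],
    'Neighborhood': ['Alamo Square', 'Balboa Park', 'Richmond'],
})

# Select the single column as a Series, then take its first match
matches = toy.loc[toy['Neighborhood'] == 'Richmond', 'Cluster Labels']
if matches.empty:
    raise ValueError('Richmond was not found in the dataframe')
cluster_no = int(matches.iloc[0])
print(cluster_no)
```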

In [54]:
cluster_neigh = neighborhoods_venues_sorted[neighborhoods_venues_sorted['Cluster Labels']==cluster_no]
cluster_neigh = cluster_neigh[cluster_neigh.Neighborhood != 'Richmond']
cluster_neigh.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,1,Alamo Square,Bar,BBQ Joint,Record Shop,Pizza Place,Seafood Restaurant,Sushi Restaurant,Liquor Store,Café,Park,Wine Bar,Hotel,Ethiopian Restaurant,Diner,Pharmacy,Pet Store
1,1,Anza Vista,Café,Donut Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Big Box Store,Burger Joint,Grocery Store,Coffee Shop,Sandwich Place,Tunnel,Liquor Store,Arts & Crafts Store,Health & Beauty Service,Cosmetics Shop,Juice Bar
4,1,Bayview,Southern / Soul Food Restaurant,Bakery,Dance Studio,Pool,Café,Gym,Pharmacy,Mexican Restaurant,Home Service,Coffee Shop,Dumpling Restaurant,Piercing Parlor,Light Rail Station,Fast Food Restaurant,Field
5,1,Belden Place,Coffee Shop,Café,Food Truck,Hotel,Boutique,Sandwich Place,Sushi Restaurant,Gym / Fitness Center,Italian Restaurant,Bubble Tea Shop,American Restaurant,Men's Store,Clothing Store,Vegetarian / Vegan Restaurant,Japanese Restaurant
6,1,Bernal Heights,Coffee Shop,Playground,Italian Restaurant,Mexican Restaurant,Bakery,Pizza Place,Peruvian Restaurant,Park,Cocktail Bar,Food Truck,Gourmet Shop,Grocery Store,Trail,Yoga Studio,Vietnamese Restaurant


Let's merge the cluster neighborhoods with the San Francisco dataframe that holds the geo data

In [56]:
# merge cluster with df data to add latitude/longitude for each neighborhood
cluster_merged = df.join(cluster_neigh.set_index('Neighborhood'), on='Neighborhood',how='right')
cluster_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Alamo Square,37.77636,-122.434689,1,Bar,BBQ Joint,Record Shop,Pizza Place,Seafood Restaurant,Sushi Restaurant,Liquor Store,Café,Park,Wine Bar,Hotel,Ethiopian Restaurant,Diner,Pharmacy,Pet Store
1,Anza Vista,37.780836,-122.443149,1,Café,Donut Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Big Box Store,Burger Joint,Grocery Store,Coffee Shop,Sandwich Place,Tunnel,Liquor Store,Arts & Crafts Store,Health & Beauty Service,Cosmetics Shop,Juice Bar
5,Bayview,37.728889,-122.3925,1,Southern / Soul Food Restaurant,Bakery,Dance Studio,Pool,Café,Gym,Pharmacy,Mexican Restaurant,Home Service,Coffee Shop,Dumpling Restaurant,Piercing Parlor,Light Rail Station,Fast Food Restaurant,Field
6,Belden Place,37.791744,-122.403886,1,Coffee Shop,Café,Food Truck,Hotel,Boutique,Sandwich Place,Sushi Restaurant,Gym / Fitness Center,Italian Restaurant,Bubble Tea Shop,American Restaurant,Men's Store,Clothing Store,Vegetarian / Vegan Restaurant,Japanese Restaurant
7,Bernal Heights,37.742986,-122.415804,1,Coffee Shop,Playground,Italian Restaurant,Mexican Restaurant,Bakery,Pizza Place,Peruvian Restaurant,Park,Cocktail Bar,Food Truck,Gourmet Shop,Grocery Store,Trail,Yoga Studio,Vietnamese Restaurant


Checking to make sure the merge is correct

In [57]:
if cluster_neigh.shape[0] == cluster_merged.shape[0]:
    print('Merge is correct')

Merge is correct
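
Equal row counts confirm that no rows were duplicated, but with a right join a neighborhood that has no match in the geo data would still be kept, with missing coordinates. A hedged sketch of a stricter check uses the indicator option of pandas merge; the two small frames are made up for illustration.

```python
import pandas as pd

# Illustrative stand-ins for df (geo data) and cluster_neigh
geo = pd.DataFrame({'Neighborhood': ['A', 'B'],
                    'Latitude': [37.77, 37.78],
                    'Longitude': [-122.43, -122.44]})
cluster = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                        'Cluster Labels': [1, 1, 1]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = geo.merge(cluster, on='Neighborhood', how='right', indicator=True)

# Rows marked 'right_only' had no coordinates to join against
unmatched = checked.loc[checked['_merge'] == 'right_only', 'Neighborhood'].tolist()
print('Row counts match:', len(checked) == len(cluster))
print('Neighborhoods without coordinates:', unmatched)
```

In this toy case the row counts match even though 'C' has no latitude or longitude, which is exactly the situation the simple count check cannot detect.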


## Calculating the distance from each neighborhood to Microsoft Headquarters

Now let's get the latitude and longitude of the Microsoft Headquarters in San Francisco (555 California St 200, San Francisco, CA 94104, United States)

In [58]:
microsoft_address = '555 California St, San Francisco, CA 94104, United States'

geolocator = Nominatim(user_agent="My_IBM_Week_4_Submission")
microsoft_location = geolocator.geocode(microsoft_address,country_codes="us",
                                        exactly_one=True,viewbox=[(37.82,-122.53),(37.68,-122.31)])
microsoft_latitude = microsoft_location.latitude
microsoft_longitude = microsoft_location.longitude
print('The geographical coordinates of 555 California St 200, San Francisco, CA 94104, \
United States are {}, {}.'.format(microsoft_latitude, microsoft_longitude))

The geographical coordinates of 555 California St 200, San Francisco, CA 94104, United States are 37.792548350000004, -122.4042699625.


To calculate the distance between two points, I will use geopy's distance module, whose distance.distance class computes the geodesic distance. Below is an example using Alamo Square.

In [59]:
# first, let's put the Microsoft geodata in a tuple
microsoft_lat_long = (microsoft_latitude, microsoft_longitude)
print('Microsoft lat-long is', microsoft_lat_long)

# now let's put the Alamo Square geodata in a tuple
alamo_lat_long = (cluster_merged.at[0,'Latitude'],cluster_merged.at[0,'Longitude'])
print('Alamo Square lat-long is', alamo_lat_long)

print('Distance between Microsoft Headquarters and Alamo Square is',distance.distance(microsoft_lat_long,alamo_lat_long).miles,
     'miles')

Microsoft lat-long is (37.792548350000004, -122.4042699625)
Alamo Square lat-long is (37.77635985, -122.43468852023723)
Distance between Microsoft Headquarters and Alamo Square is 2.0046723837352007 miles
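
As a sanity check on this figure, the distance can be approximated without geopy using the haversine formula. This is a different technique from geopy's geodesic calculation: it treats the Earth as a sphere (a mean radius of 3958.8 miles is assumed here), so small differences from the value above are expected. The function name haversine_miles is hypothetical.

```python
import math

def haversine_miles(point_a, point_b, radius_miles=3958.8):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # haversine of the central angle between the two points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))

# Coordinates copied from the output above
microsoft = (37.792548350000004, -122.4042699625)
alamo = (37.77635985, -122.43468852023723)

approx = haversine_miles(microsoft, alamo)
print('Approximate distance: {:.2f} miles'.format(approx))
```

The spherical approximation should land within a few hundredths of a mile of the geodesic result at this scale, which is enough to confirm the order of magnitude.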


Now let's do that for every neighborhood and add the result to the cluster_merged dataframe

In [63]:
# reset index on cluster_merged
cluster_merged.reset_index(drop=True,inplace=True)
cluster_merged['Distance from Microsoft'] = None
# iterate across each row, calculate distance and add data to each row
for neigh, lat, long in zip(cluster_merged.Neighborhood, cluster_merged.Latitude,cluster_merged.Longitude):
    neigh_lat_long = (lat,long)
    neigh_distance = distance.distance(microsoft_lat_long,neigh_lat_long).miles
    cluster_merged.loc[cluster_merged['Neighborhood']==neigh,['Distance from Microsoft']] = neigh_distance
cluster_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,Distance from Microsoft
0,Alamo Square,37.77636,-122.434689,1,Bar,BBQ Joint,Record Shop,Pizza Place,Seafood Restaurant,Sushi Restaurant,Liquor Store,Café,Park,Wine Bar,Hotel,Ethiopian Restaurant,Diner,Pharmacy,Pet Store,2.00467
1,Anza Vista,37.780836,-122.443149,1,Café,Donut Shop,Southern / Soul Food Restaurant,Mexican Restaurant,Big Box Store,Burger Joint,Grocery Store,Coffee Shop,Sandwich Place,Tunnel,Liquor Store,Arts & Crafts Store,Health & Beauty Service,Cosmetics Shop,Juice Bar,2.27615
2,Bayview,37.728889,-122.3925,1,Southern / Soul Food Restaurant,Bakery,Dance Studio,Pool,Café,Gym,Pharmacy,Mexican Restaurant,Home Service,Coffee Shop,Dumpling Restaurant,Piercing Parlor,Light Rail Station,Fast Food Restaurant,Field,4.43746
3,Belden Place,37.791744,-122.403886,1,Coffee Shop,Café,Food Truck,Hotel,Boutique,Sandwich Place,Sushi Restaurant,Gym / Fitness Center,Italian Restaurant,Bubble Tea Shop,American Restaurant,Men's Store,Clothing Store,Vegetarian / Vegan Restaurant,Japanese Restaurant,0.059319
4,Bernal Heights,37.742986,-122.415804,1,Coffee Shop,Playground,Italian Restaurant,Mexican Restaurant,Bakery,Pizza Place,Peruvian Restaurant,Park,Cocktail Bar,Food Truck,Gourmet Shop,Grocery Store,Trail,Yoga Studio,Vietnamese Restaurant,3.47601


Now let's show the dataframe with fewer columns

In [64]:
# reduce the columns in the dataframe; .copy() makes an independent dataframe,
# which avoids pandas' SettingWithCopyWarning on the assignment below
fixed_columns = ['Neighborhood','Latitude','Longitude','Distance from Microsoft']
cluster_fixed = cluster_merged[fixed_columns].copy()
# round the distance column
cluster_fixed['Distance from Microsoft'] = cluster_fixed['Distance from Microsoft'].apply(lambda x: round(x,2))
cluster_fixed.head(15)


Unnamed: 0,Neighborhood,Latitude,Longitude,Distance from Microsoft
0,Alamo Square,37.77636,-122.434689,2.0
1,Anza Vista,37.780836,-122.443149,2.28
2,Bayview,37.728889,-122.3925,4.44
3,Belden Place,37.791744,-122.403886,0.06
4,Bernal Heights,37.742986,-122.415804,3.48
5,Buena Vista,37.806532,-122.420649,1.32
6,Castro,37.760856,-122.434957,2.76
7,Cathedral Hill,37.79182,-122.413495,0.51
8,Cayuga Terrace,37.730297,-122.432929,4.57
9,China Basin,37.776329,-122.391839,1.31


Now let's sort by distance from Microsoft

In [65]:
cluster_fixed.sort_values(by='Distance from Microsoft', ascending=True)

Unnamed: 0,Neighborhood,Latitude,Longitude,Distance from Microsoft
3,Belden Place,37.791744,-122.403886,0.06
10,Chinatown,37.794301,-122.406376,0.17
24,Financial District South,37.793647,-122.398938,0.3
23,Financial District,37.793647,-122.398938,0.3
34,Jackson Square,37.795147,-122.409798,0.35
80,Union Square,37.787936,-122.407517,0.36
19,Embarcadero,37.792864,-122.396912,0.4
7,Cathedral Hill,37.79182,-122.413495,0.51
77,Telegraph Hill,37.800785,-122.404091,0.57
54,Nob Hill,37.793262,-122.415249,0.6
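
The calculation behind the `Distance from Microsoft` column is not shown in this part of the notebook. As a rough sanity check on such distances, here is a minimal sketch that assumes straight-line (great-circle) distance in miles and uses the haversine formula. The helper `haversine_miles` is a hypothetical name introduced for illustration and is not necessarily how the notebook computed the column. The two coordinate pairs are copied from the table above.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    earth_radius_miles = 3958.8  # mean Earth radius (assumed constant)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    # haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

# coordinates copied from the sorted table above
belden_place = (37.791744, -122.403886)
chinatown = (37.794301, -122.406376)

# distance between the two closest neighborhoods in the list
print(round(haversine_miles(*belden_place, *chinatown), 2))
```

Because this measures distance "as the crow flies", actual walking or driving distances to the workplace will be somewhat longer.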


## Using Folium, let's visualise the data above

In [66]:
# recreate folium map
cluster_map = folium.Map(location=[sanfran_latitude, sanfran_longitude], zoom_start=12)

# add a red circle marker to represent the Microsoft Headquarters
folium.vector_layers.CircleMarker(
    [microsoft_latitude, microsoft_longitude],
    radius=8,
    color='red',
    popup='Microsoft Headquarters',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(cluster_map)

# add neighborhoods in the cluster as blue circle markers
for lat, lng, label, label2 in zip(cluster_fixed.Latitude, cluster_fixed.Longitude,
                                   cluster_fixed.Neighborhood, cluster_fixed['Distance from Microsoft']):
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=f'{label}, {label2} miles away',  # popup expects a single string, not a list
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(cluster_map)

cluster_map

The person now has a list of San Francisco neighborhoods that fall in the same cluster as Richmond, where they currently live, ordered by distance from the Microsoft Headquarters. Further work could investigate the availability and price of houses in those neighborhoods to find ones that meet the family's needs for the move.