<h1 align=Left><font size = 5>Segmenting and Clustering Neighborhoods in San Francisco - Housing and Community Development</font></h1>

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Download and Explore Dataset</a>

2.  <a href="#item2">Explore Neighborhoods in San Francisco</a>

3.  <a href="#item3">Analyze Each Neighborhood</a>

4.  <a href="#item4">Cluster Neighborhoods</a>

5.  <a href="#item5">Examine Clusters</a>  
    </font>
    </div>


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize
!pip install folium
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library


print('Libraries imported.')

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.11.0
Libraries imported.


In [2]:
!wget -q -O 'sf_housing_data.json' https://data.sfgov.org/resource/9rdx-httc.geojson
print('Data downloaded!')

Data downloaded!


In [3]:
with open('sf_housing_data.json') as json_data:
    sf_data = json.load(json_data)

neighborhoods_data = sf_data['features']

# define the dataframe columns
column_names = ['Neighborhood', 'Project_Name', 'Latitude', 'Longitude']

# collect one row per housing project, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    properties = data['properties']
    rows.append({'Neighborhood': properties['neighborhood'],
                 'Project_Name': properties['project_name'],
                 'Latitude': properties['latitude'],
                 'Longitude': properties['longitude']})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods

Unnamed: 0,Neighborhood,Project_Name,Latitude,Longitude
0,Confidential,SafeHouse,,
1,Tenderloin,William Penn Hotel,37.78,-122.4102631
2,Noe Valley,60 28th Street,37.75,-122.4234398
3,Bayview Hunters Point,Geraldine Johnson Manor,37.73,-122.3931198
4,South of Market,1028 Howard,37.78,-122.4079208
...,...,...,...,...
371,Tenderloin,990 Polk,37.79,-122.4193802
372,Lone Mountain/USF,The Coronet,37.78,-122.4579163
373,Mission,3353 26th Street - Small Sites,37.75,-122.4171635
374,Mission,Mission Hotel,37.76,-122.4178848
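
The row-by-row loop above can also be written with `pd.json_normalize`, which flattens the nested `properties` dictionaries in one call. The snippet below is a minimal sketch on a hypothetical two-feature sample shaped like the downloaded GeoJSON; storing the coordinates as strings is an assumption, not something the source confirms.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the table above;
# the string-typed coordinates are an assumption for illustration.
sample = {
    "features": [
        {"properties": {"neighborhood": "Tenderloin",
                        "project_name": "William Penn Hotel",
                        "latitude": "37.78",
                        "longitude": "-122.4102631"}},
        {"properties": {"neighborhood": "Confidential",
                        "project_name": "SafeHouse",
                        "latitude": None,
                        "longitude": None}},
    ]
}

# json_normalize flattens nested keys into dotted column names
flat = pd.json_normalize(sample["features"])

# rename the four fields of interest to the column names used in this notebook
flat = flat.rename(columns={
    "properties.neighborhood": "Neighborhood",
    "properties.project_name": "Project_Name",
    "properties.latitude": "Latitude",
    "properties.longitude": "Longitude",
})[["Neighborhood", "Project_Name", "Latitude", "Longitude"]]

print(flat)
```

On the real file, `sample` would be replaced by `sf_data`; any extra property columns are dropped by the final column selection.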


In [4]:
# treat the string "None" as missing, then drop projects without coordinates
neighborhoods['Latitude'] = neighborhoods['Latitude'].replace("None", np.nan)
neighborhoods.dropna(subset=["Latitude"], axis=0, inplace=True)
# reset index, because rows without coordinates were dropped
neighborhoods.reset_index(drop=True, inplace=True)
neighborhoods.describe()

Unnamed: 0,Neighborhood,Project_Name,Latitude,Longitude
count,371,371,371.0,371.0
unique,36,371,13.0,356.0
top,Tenderloin,Dalt Hotel,37.78,-122.4223862
freq,70,1,147.0,2.0
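
The `describe()` output reports `unique`, `top` and `freq` for Latitude and Longitude, which suggests these columns hold text rather than numbers. A minimal sketch of one way to coerce them is shown below; the sample frame is hypothetical and the choice of `pd.to_numeric` is a suggestion, not a step from the original notebook.

```python
import pandas as pd

# Hypothetical sample: coordinates stored as text, with missing values
# appearing both as the string "None" and as a real None.
df = pd.DataFrame({
    "Project_Name": ["William Penn Hotel", "SafeHouse", "60 28th Street"],
    "Latitude": ["37.78", "None", "37.75"],
    "Longitude": ["-122.4102631", None, "-122.4234398"],
})

# errors="coerce" turns anything that cannot be parsed into NaN
for col in ["Latitude", "Longitude"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# drop rows that have no usable coordinates and renumber the index
df = df.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)

print(df.dtypes)
```

With numeric columns, `describe()` would report mean, min and max instead of string frequencies.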


In [5]:
print('The dataframe has {} neighborhoods and {} different projects.'.format(
        len(neighborhoods['Neighborhood'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 36 neighborhoods and 371 different projects.


In [6]:
address = 'San Francisco, CA'

geolocator = Nominatim(user_agent="sf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Francisco are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Francisco are 37.7790262, -122.4199061.


In [7]:
# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood, project_name in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Neighborhood'], neighborhoods['Project_Name']):
    label = '{}, {}'.format(project_name, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sf)  
    
map_sf

In [8]:
neighborhoods['Neighborhood'].unique()

array(['Tenderloin', 'Noe Valley', 'Bayview Hunters Point',
       'South of Market', 'Mission', 'Lone Mountain/USF',
       'Western Addition', 'Japantown', 'Bernal Heights', 'Chinatown',
       'Hayes Valley', 'Potrero Hill', 'Treasure Island', 'North Beach',
       'Financial District/South Beach', 'Outer Richmond', 'Glen Park',
       'Haight Ashbury', 'Nob Hill', 'Mission Bay', 'Castro/Upper Market',
       'Portola', 'Russian Hill', 'Pacific Heights', 'West of Twin Peaks',
       'Twin Peaks', 'Oceanview/Merced/Ingleside', 'Outer Mission',
       'Presidio', 'McLaren Park', 'SOUTH OF MARKET', 'Visitacion Valley',
       'Excelsior', 'Inner Richmond', 'Marina', 'Richmond'], dtype=object)
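
The list above contains both 'South of Market' and 'SOUTH OF MARKET', so the count of 36 neighborhoods includes at least one duplicate that differs only in letter case. The following is a minimal sketch of one way to merge such spellings; the sample Series and the rule "keep the first spelling seen" are assumptions for illustration.

```python
import pandas as pd

# Small sample taken from the neighborhood names listed above
names = pd.Series(["South of Market", "SOUTH OF MARKET", "Tenderloin"])

# compare names on a whitespace-trimmed, case-folded key
trimmed = names.str.strip()
key = trimmed.str.casefold()

# within each key, reuse the first spelling encountered as the canonical one
canonical = trimmed.groupby(key).transform("first")

print(canonical.unique())
```

This only merges case and whitespace variants; deciding whether 'Richmond' belongs with 'Inner Richmond' or 'Outer Richmond' would still need a manual mapping.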

In [9]:
neighborhood1_data = neighborhoods[neighborhoods['Neighborhood'] == 'Tenderloin'].reset_index(drop=True)
neighborhood1_data.head(20)

Unnamed: 0,Neighborhood,Project_Name,Latitude,Longitude
0,Tenderloin,William Penn Hotel,37.78,-122.4102631
1,Tenderloin,125 Mason Street,37.78,-122.4097443
2,Tenderloin,555 Ellis Street Family Apartments,37.78,-122.4154129
3,Tenderloin,Madonna Residence,37.78,-122.4163818
4,Tenderloin,The Nathan Building,37.78,-122.4134521
5,Tenderloin,Marathon Hotel,37.78,-122.41803
6,Tenderloin,Curran House,37.78,-122.4112244
7,Tenderloin,Yosemite Apartments,37.78,-122.4155045
8,Tenderloin,Maria Manor,37.79,-122.40905
9,Tenderloin,Presentation Senior Community,37.78,-122.411438


In [10]:
address = 'Tenderloin, CA'

geolocator = Nominatim(user_agent="sf_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tenderloin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Tenderloin are 37.7842493, -122.4139933.


In [11]:
# create map of Neighborhood using latitude and longitude values
neighborhood1_map = folium.Map(location=[latitude, longitude], zoom_start=16)

# add markers to map
for lat, lng, label in zip(neighborhood1_data['Latitude'], neighborhood1_data['Longitude'], neighborhood1_data['Project_Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(neighborhood1_map)  
    
neighborhood1_map

In [12]:
# The code was removed by Watson Studio for sharing.
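
The hidden cell presumably defines the names that the following cells rely on (`CLIENT_ID`, `CLIENT_SECRET`, `ACCESS_TOKEN`, `VERSION` and `LIMIT`). One possible way to provide them without writing secrets into the notebook is sketched below. The environment variable names and the version date are assumptions, not the values used by the author.

```python
import os

# The environment variable names below are assumptions made for this sketch.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
ACCESS_TOKEN = os.environ.get("FOURSQUARE_ACCESS_TOKEN", "")

VERSION = "20180605"  # placeholder API version date in YYYYMMDD form
LIMIT = 100           # same value that a later cell assigns

# report which credentials are still missing, without printing their values
missing = [name for name, value in [
    ("FOURSQUARE_CLIENT_ID", CLIENT_ID),
    ("FOURSQUARE_CLIENT_SECRET", CLIENT_SECRET),
    ("FOURSQUARE_ACCESS_TOKEN", ACCESS_TOKEN),
] if not value]

if missing:
    print("Missing credentials:", ", ".join(missing))
```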

In [13]:
search_query = 'grocery'
radius = 500

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
#url

In [14]:
import requests # library to handle requests
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff49245db73076d0e5c69a9'},
 'notifications': [{'type': 'notificationTray', 'item': {'unreadCount': 0}}],
 'response': {'venues': [{'id': '4c2045cf920076b0b20fc6e9',
    'name': 'G & H Liquor & Grocery',
    'location': {'address': '201 Jones St',
     'crossStreet': 'Turk Street',
     'lat': 37.78324575324384,
     'lng': -122.41246663427593,
     'labeledLatLngs': [{'label': 'display',
       'lat': 37.78324575324384,
       'lng': -122.41246663427593}],
     'distance': 174,
     'postalCode': '94102',
     'cc': 'US',
     'city': 'San Francisco',
     'state': 'CA',
     'country': 'United States',
     'formattedAddress': ['201 Jones St (Turk Street)',
      'San Francisco, CA 94102']},
    'categories': [{'id': '4bf58dd8d48988d186941735',
      'name': 'Liquor Store',
      'pluralName': 'Liquor Stores',
      'shortName': 'Liquor Store',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/shops/food_liquor_',
       'suffix': '

In [15]:
neighborhood_latitude = neighborhood1_data.loc[0, 'Latitude'] # latitude of the first project
neighborhood_longitude = neighborhood1_data.loc[0, 'Longitude'] # longitude of the first project

neighborhood_name = neighborhood1_data.loc[0, 'Project_Name'] # name of the first project

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of William Penn Hotel are 37.78, -122.4102631.


In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
#url # display URL
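
Building the query string by hand works, but it is easy to misplace a separator or forget to escape a value. As a hedged alternative, the sketch below assembles the same explore URL with the standard library's `urlencode`. The helper name `build_explore_url` and the dummy credentials are hypothetical; no request is sent.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes every value and joins the pairs with '&'
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# dummy credentials, for illustration only
example_url = build_explore_url("ID", "SECRET", "20180605", 37.78, -122.4102631)
print(example_url)
```

The resulting string can be passed to `requests.get` exactly like the hand-built `url` above.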

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff49245d8b55343b9f2789c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'SoMa',
  'headerFullLocation': 'SoMa, San Francisco',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 71,
  'suggestedBounds': {'ne': {'lat': 37.7845000045, 'lng': -122.4045801810179},
   'sw': {'lat': 37.7754999955, 'lng': -122.41594601898208}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5ae925e7c0f163002cc57349',
       'name': 'Birdsong',
       'location': {'address': '1085 Mission St',
        'crossStreet': '7th',
        'lat': 37.77942453003254,
        'lng': -122.41047319701762,
        'labeledLatLn

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    # search results store categories under 'categories',
    # explore results under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the primary category name for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# list the distinct venue categories around the first project
nearby_venues['categories'].unique()



array(['Restaurant', 'Pizza Place', 'Bakery', 'Dance Studio',
       'Coffee Shop', 'Mediterranean Restaurant', 'Beer Bar',
       'Art Gallery', 'Theater', 'Breakfast Spot',
       'Vietnamese Restaurant', 'Burrito Place', 'Marijuana Dispensary',
       'Bar', 'Hotel', 'Hotel Bar', 'Food Truck', 'Design Studio',
       'Arts & Crafts Store', 'Coworking Space', 'Brewery',
       'Gym / Fitness Center', 'Music Venue', 'Plaza', 'Gym',
       'Sandwich Place', 'Deli / Bodega', 'Burger Joint', 'Cocktail Bar',
       'Japanese Restaurant', 'American Restaurant', 'Burmese Restaurant',
       'South Indian Restaurant', 'Farmers Market', 'Italian Restaurant',
       'Shoe Store', 'Luggage Store', 'Street Food Gathering',
       'Performing Arts Venue', 'Karaoke Bar', 'Mexican Restaurant',
       'Diner', 'Museum', 'Thai Restaurant', 'Café', 'Bank'], dtype=object)

In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

71 venues were returned by Foursquare.


### 2. Explore the Tenderloin, San Francisco


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per (project, nearby venue) pair from the Foursquare explore endpoint."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name, lat, lng)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue;
        # a venue without categories gets None instead of raising an IndexError
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    # the 'Neighborhood' coordinates are the coordinates of each housing project
    nearby_venues.columns = ['Project_Name',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

In [22]:
tenderloin_venues = getNearbyVenues(names=neighborhood1_data['Project_Name'],
                                   latitudes=neighborhood1_data['Latitude'],
                                   longitudes=neighborhood1_data['Longitude']
                                  )

William Penn Hotel 37.78 -122.4102631
125 Mason Street 37.78 -122.4097443
555 Ellis Street Family Apartments 37.78 -122.4154129
Madonna Residence 37.78 -122.4163818
The Nathan Building 37.78 -122.4134521
Marathon Hotel 37.78 -122.41803
Curran House 37.78 -122.4112244
Yosemite Apartments 37.78 -122.4155045
Maria Manor 37.79 -122.40905
Presentation Senior Community 37.78 -122.411438
Cadillac Hotel 37.78 -122.4139404
Tenderloin Family Housing 37.78 -122.412941
Lyric Hotel 37.78 -122.4120712
Hamlin Hotel 37.78 -122.4138641
Padre Apartments 37.78 -122.4128647
Vera Haile Senior Housing 37.78 -122.4125443
Civic Center Residence 37.78 -122.4127884
Ritz Hotel 37.78 -122.4112473
Cameo Apartments 37.78 -122.4155045
St. Claire Residence 37.79 -122.4128418
Herald Hotel Apartments 37.78 -122.4129181
Hotel Essex 37.78 -122.417511
Eddy & Taylor Family Housing (210 Taylor or 168 Eddy) 37.78 -122.4106615
Tower 737 (aka Post St Towers) 37.79 -122.414032
350 Ellis 37.79 -122.4121244
Larkin St Assisted Car

In [23]:
print(tenderloin_venues.shape)
tenderloin_venues.head()

(5886, 7)


Unnamed: 0,Project_Name,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,William Penn Hotel,37.78,-122.4102631,Birdsong,37.779425,-122.410473,Restaurant
1,William Penn Hotel,37.78,-122.4102631,Square Pie Guys,37.779229,-122.41087,Pizza Place
2,William Penn Hotel,37.78,-122.4102631,Frena Bakery and Cafe,37.7805,-122.40825,Bakery
3,William Penn Hotel,37.78,-122.4102631,Alonzo King LINES Dance Center,37.780116,-122.412187,Dance Studio
4,William Penn Hotel,37.78,-122.4102631,Saint Frank,37.779519,-122.410432,Coffee Shop


In [24]:
tenderloin_venues.groupby('Project_Name').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Project_Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
111 Jones,94,94,94,94,94,94
125 Mason Street,66,66,66,66,66,66
149 Mason Street Apartments,66,66,66,66,66,66
205 Jones,95,95,95,95,95,95
308 Turk Street,75,75,75,75,75,75
...,...,...,...,...,...,...
Turk & Eddy Apartments,86,86,86,86,86,86
Vera Haile Senior Housing,93,93,93,93,93,93
West Hotel,65,65,65,65,65,65
William Penn Hotel,71,71,71,71,71,71


In [25]:
print('There are {} unique categories.'.format(len(tenderloin_venues['Venue Category'].unique())))

There are 154 unique categories.


In [26]:
# one hot encoding
tendorloin_onehot = pd.get_dummies(tenderloin_venues[['Venue Category']], prefix="", prefix_sep="")

# add project name column back to dataframe
tendorloin_onehot['Project_Name'] = tenderloin_venues['Project_Name']

# move project name column to the first column
fixed_columns = [tendorloin_onehot.columns[-1]] + list(tendorloin_onehot.columns[:-1])
tendorloin_onehot = tendorloin_onehot[fixed_columns]

tendorloin_onehot.shape

(5886, 155)

In [27]:
tenderloin_grouped = tendorloin_onehot.groupby('Project_Name').mean().reset_index()
tenderloin_grouped

Unnamed: 0,Project_Name,Adult Boutique,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Bank,...,Thai Restaurant,Theater,Tiki Bar,Toy / Game Store,Used Bookstore,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,111 Jones,0.000000,0.031915,0.021277,0.010638,0.010638,0.010638,0.0,0.031915,0.010638,...,0.010638,0.031915,0.0,0.0,0.0,0.021277,0.031915,0.000000,0.0,0.0
1,125 Mason Street,0.015152,0.030303,0.030303,0.000000,0.015152,0.000000,0.0,0.045455,0.015152,...,0.015152,0.030303,0.0,0.0,0.0,0.000000,0.045455,0.000000,0.0,0.0
2,149 Mason Street Apartments,0.015152,0.030303,0.030303,0.000000,0.015152,0.000000,0.0,0.045455,0.015152,...,0.015152,0.030303,0.0,0.0,0.0,0.000000,0.045455,0.000000,0.0,0.0
3,205 Jones,0.000000,0.031579,0.021053,0.010526,0.010526,0.010526,0.0,0.031579,0.010526,...,0.010526,0.031579,0.0,0.0,0.0,0.021053,0.031579,0.000000,0.0,0.0
4,308 Turk Street,0.000000,0.013333,0.026667,0.013333,0.026667,0.013333,0.0,0.026667,0.000000,...,0.000000,0.040000,0.0,0.0,0.0,0.026667,0.026667,0.013333,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Turk & Eddy Apartments,0.000000,0.046512,0.023256,0.011628,0.011628,0.011628,0.0,0.034884,0.011628,...,0.011628,0.023256,0.0,0.0,0.0,0.023256,0.023256,0.000000,0.0,0.0
66,Vera Haile Senior Housing,0.000000,0.032258,0.021505,0.010753,0.010753,0.010753,0.0,0.032258,0.010753,...,0.010753,0.032258,0.0,0.0,0.0,0.021505,0.032258,0.000000,0.0,0.0
67,West Hotel,0.015385,0.030769,0.030769,0.000000,0.015385,0.000000,0.0,0.046154,0.015385,...,0.015385,0.030769,0.0,0.0,0.0,0.000000,0.046154,0.000000,0.0,0.0
68,William Penn Hotel,0.000000,0.042254,0.028169,0.000000,0.014085,0.000000,0.0,0.042254,0.014085,...,0.014085,0.028169,0.0,0.0,0.0,0.000000,0.042254,0.000000,0.0,0.0


In [28]:
tenderloin_grouped.shape

(70, 155)
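
The two steps above (one-hot encoding, then averaging per project) turn a long list of venues into one row of category frequencies per project. The toy example below sketches the same idea on a hypothetical six-venue table, so the arithmetic can be checked by hand.

```python
import pandas as pd

# Hypothetical data: project A has four nearby venues, project B has two
venues = pd.DataFrame({
    "Project_Name": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Bakery", "Bar", "Bar", "Hotel"],
})

# one indicator column per category (1.0 where the venue has that category)
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Project_Name", venues["Project_Name"])

# the mean of each indicator column is the share of venues in that category
grouped = onehot.groupby("Project_Name").mean().reset_index()

print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, which is why the values in the table above read as fractions of the venues found around a project.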

In [29]:
num_top_venues = 5

for hood in tenderloin_grouped['Project_Name']:
    print("----"+hood+"----")
    temp = tenderloin_grouped[tenderloin_grouped['Project_Name'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----111 Jones----
                  venue  freq
0           Coffee Shop  0.10
1  Marijuana Dispensary  0.04
2                  Café  0.03
3           Music Venue  0.03
4                Bakery  0.03


----125 Mason Street----
                   venue  freq
0            Coffee Shop  0.11
1  Vietnamese Restaurant  0.05
2                 Bakery  0.05
3     Mexican Restaurant  0.03
4                Theater  0.03


----149 Mason Street Apartments----
                   venue  freq
0            Coffee Shop  0.11
1  Vietnamese Restaurant  0.05
2                 Bakery  0.05
3     Mexican Restaurant  0.03
4                Theater  0.03


----205 Jones----
                  venue  freq
0           Coffee Shop  0.09
1  Marijuana Dispensary  0.04
2                  Café  0.03
3           Music Venue  0.03
4                Bakery  0.03


----308 Turk Street----
         venue  freq
0  Coffee Shop  0.08
1     Beer Bar  0.05
2        Hotel  0.04
3      Theater  0.04
4  Music Venue  0.04


----350 Ell

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
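
To see what `return_most_common_venues` returns, it can be applied to a single hand-made row. The sketch below repeats the function so that it runs on its own; the row values are hypothetical.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the project name), then sort frequencies descending
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical frequency row for one project, with no tied values
row = pd.Series({
    "Project_Name": "Example Project",
    "Bakery": 0.25,
    "Bar": 0.10,
    "Coffee Shop": 0.50,
    "Hotel": 0.15,
})

top3 = return_most_common_venues(row, 3)
print(list(top3))
```

When several categories share the same frequency, their relative order depends on the sort and should not be read as a ranking. This may explain why tied categories appear in different orders in the outputs of this notebook.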

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Project_Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Project_Name'] = tenderloin_grouped['Project_Name']

for ind in np.arange(tenderloin_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tenderloin_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,111 Jones,Coffee Shop,Marijuana Dispensary,American Restaurant,Vietnamese Restaurant,Café,Music Venue,Theater,Beer Bar,Bakery,Performing Arts Venue
1,125 Mason Street,Coffee Shop,Vietnamese Restaurant,Bakery,Marijuana Dispensary,Theater,Hotel,Bar,Sandwich Place,Mexican Restaurant,Music Venue
2,149 Mason Street Apartments,Coffee Shop,Vietnamese Restaurant,Bakery,Marijuana Dispensary,Theater,Hotel,Bar,Sandwich Place,Mexican Restaurant,Music Venue
3,205 Jones,Coffee Shop,Marijuana Dispensary,Bakery,Beer Bar,Music Venue,Theater,Café,Vietnamese Restaurant,American Restaurant,Burrito Place
4,308 Turk Street,Coffee Shop,Beer Bar,Café,Theater,Sandwich Place,Music Venue,Hotel,Hotel Bar,Bar,Pizza Place


### 3. Cluster Analysis

In [32]:
# set number of clusters
kclusters = 5

# drop the label column so that only the venue-frequency features remain
tenderloin_grouped_clustering = tenderloin_grouped.drop(columns='Project_Name')

# run k-means clustering (n_init=10 matches the default of older scikit-learn releases)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(tenderloin_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 4, 4, 4, 1, 3, 1, 0, 0, 3], dtype=int32)
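
The notebook fixes `kclusters = 5` without showing how that value was chosen. One common way to compare candidate values is the silhouette score; the sketch below illustrates it on synthetic data, since the real frequency matrix is not reproduced here. The data, the range of `k` and the use of the silhouette score are all assumptions for illustration, not the author's method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# synthetic stand-in for the frequency matrix: three well-separated groups
X = np.vstack([
    rng.normal(loc=center, scale=0.05, size=(20, 4))
    for center in (0.0, 1.0, 2.0)
])

# fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# a higher silhouette score means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the real data, `X` would be the frequency matrix with the `Project_Name` column dropped. Projects that sit a few doors apart share most of their venues, so the scores may turn out close to each other.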

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from a copy of the project table so the original stays unchanged
tenderloin_merged = neighborhood1_data.copy()

# join the cluster labels and top venues onto each project's latitude/longitude
tenderloin_merged = tenderloin_merged.join(neighborhoods_venues_sorted.set_index('Project_Name'), on='Project_Name')

tenderloin_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Project_Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Tenderloin,William Penn Hotel,37.78,-122.4102631,4,Coffee Shop,American Restaurant,Vietnamese Restaurant,Bakery,Sandwich Place,Marijuana Dispensary,Beer Bar,Art Gallery,Pizza Place,Mexican Restaurant
1,Tenderloin,125 Mason Street,37.78,-122.4097443,4,Coffee Shop,Vietnamese Restaurant,Bakery,Marijuana Dispensary,Theater,Hotel,Bar,Sandwich Place,Mexican Restaurant,Music Venue
2,Tenderloin,555 Ellis Street Family Apartments,37.78,-122.4154129,0,Café,Theater,Vietnamese Restaurant,Beer Bar,Coffee Shop,Sandwich Place,Hotel,Vegetarian / Vegan Restaurant,Dance Studio,Indian Restaurant
3,Tenderloin,Madonna Residence,37.78,-122.4163818,0,Vietnamese Restaurant,Beer Bar,Café,Hotel,Vegetarian / Vegan Restaurant,Coffee Shop,Theater,Indian Restaurant,Concert Hall,Sandwich Place
4,Tenderloin,The Nathan Building,37.78,-122.4134521,1,Coffee Shop,Theater,Bakery,Cocktail Bar,Beer Bar,Bar,Marijuana Dispensary,Music Venue,Art Gallery,American Restaurant


In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, labelled with the project name and its cluster
for lat, lon, poi, cluster in zip(tenderloin_merged['Latitude'], tenderloin_merged['Longitude'], tenderloin_merged['Project_Name'], tenderloin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [35]:
tenderloin_merged.loc[tenderloin_merged['Cluster Labels'] == 0, tenderloin_merged.columns[[1] + list(range(5, tenderloin_merged.shape[1]))]]

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,555 Ellis Street Family Apartments,Café,Theater,Vietnamese Restaurant,Beer Bar,Coffee Shop,Sandwich Place,Hotel,Vegetarian / Vegan Restaurant,Dance Studio,Indian Restaurant
3,Madonna Residence,Vietnamese Restaurant,Beer Bar,Café,Hotel,Vegetarian / Vegan Restaurant,Coffee Shop,Theater,Indian Restaurant,Concert Hall,Sandwich Place
5,Marathon Hotel,Vietnamese Restaurant,Hotel,Café,Coffee Shop,Vegetarian / Vegan Restaurant,Theater,Beer Bar,Sandwich Place,Concert Hall,Boutique
7,Yosemite Apartments,Café,Coffee Shop,Vietnamese Restaurant,Beer Bar,Theater,Hotel,Sandwich Place,Vegetarian / Vegan Restaurant,Indian Restaurant,Hotel Bar
18,Cameo Apartments,Café,Coffee Shop,Vietnamese Restaurant,Beer Bar,Theater,Hotel,Sandwich Place,Vegetarian / Vegan Restaurant,Indian Restaurant,Hotel Bar
21,Hotel Essex,Vietnamese Restaurant,Café,Beer Bar,Hotel,Sandwich Place,Theater,Coffee Shop,Vegetarian / Vegan Restaurant,Southern / Soul Food Restaurant,Poke Place
25,Larkin St Assisted Care Program,Vietnamese Restaurant,Café,Hotel,Coffee Shop,Beer Bar,Vegetarian / Vegan Restaurant,Theater,Hotel Bar,Indian Restaurant,Concert Hall
29,735 Ellis,Vietnamese Restaurant,Café,Hotel,Sandwich Place,Coffee Shop,Beer Bar,Theater,Vegetarian / Vegan Restaurant,Indian Restaurant,Southern / Soul Food Restaurant
36,Arnett Watson Apartments,Vietnamese Restaurant,Café,Hotel,Coffee Shop,Vegetarian / Vegan Restaurant,Sandwich Place,Theater,Beer Bar,Park,Indian Restaurant
37,575 Eddy,Vietnamese Restaurant,Café,Hotel,Beer Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Theater,Sandwich Place,Park,Southern / Soul Food Restaurant


### Cluster Cafe & Beer

In [36]:
tenderloin_merged.loc[tenderloin_merged['Cluster Labels'] == 1, tenderloin_merged.columns[[1] + list(range(5, tenderloin_merged.shape[1]))]]

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,The Nathan Building,Coffee Shop,Theater,Bakery,Cocktail Bar,Beer Bar,Bar,Marijuana Dispensary,Music Venue,Art Gallery,American Restaurant
10,Cadillac Hotel,Coffee Shop,Bar,Cocktail Bar,Theater,Beer Bar,Bakery,Vietnamese Restaurant,Performing Arts Venue,Sandwich Place,Music Venue
11,Tenderloin Family Housing,Coffee Shop,Art Gallery,Theater,Bakery,Vietnamese Restaurant,American Restaurant,Marijuana Dispensary,Cocktail Bar,Beer Bar,Bar
13,Hamlin Hotel,Coffee Shop,Theater,Bar,Bakery,Sandwich Place,Vietnamese Restaurant,Music Venue,Beer Bar,Performing Arts Venue,Cocktail Bar
14,Padre Apartments,Coffee Shop,Theater,Art Gallery,Vietnamese Restaurant,Music Venue,American Restaurant,Marijuana Dispensary,Bakery,Cocktail Bar,Beer Bar
16,Civic Center Residence,Coffee Shop,Art Gallery,Theater,Marijuana Dispensary,Bakery,Vietnamese Restaurant,American Restaurant,Cocktail Bar,Beer Bar,Music Venue
20,Herald Hotel Apartments,Coffee Shop,Theater,Art Gallery,Vietnamese Restaurant,Music Venue,American Restaurant,Marijuana Dispensary,Bakery,Cocktail Bar,Beer Bar
27,Curry Senior Center Apartments,Coffee Shop,Beer Bar,Café,Theater,Sandwich Place,Music Venue,Hotel,Hotel Bar,Bar,Pizza Place
28,375 Eddy,Coffee Shop,Bar,Theater,Bakery,Cocktail Bar,Music Venue,Beer Bar,Performing Arts Venue,Sandwich Place,Vietnamese Restaurant
32,Arlington Hotel,Coffee Shop,Beer Bar,Hotel,Music Venue,Sandwich Place,Vietnamese Restaurant,Theater,Pizza Place,Cocktail Bar,Hotel Bar


In [37]:
tenderloin_merged.loc[tenderloin_merged['Cluster Labels'] == 2, tenderloin_merged.columns[[1] + list(range(5, tenderloin_merged.shape[1]))]]

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Jordan Apartments,Bar,Café,Grocery Store,Massage Studio,Bakery,Wine Bar,Pet Store,Thai Restaurant,Sushi Restaurant,Art Gallery
41,Hartland Hotel,Bar,Café,Grocery Store,Wine Bar,Massage Studio,Sushi Restaurant,Bakery,Thai Restaurant,Diner,Convenience Store
69,990 Polk,Bar,Grocery Store,Café,Wine Bar,Pet Store,Massage Studio,Bakery,Thai Restaurant,Sushi Restaurant,Yoga Studio


### Cluster Coffee Shop

In [38]:
tenderloin_merged.loc[tenderloin_merged['Cluster Labels'] == 3, tenderloin_merged.columns[[1] + list(range(5, tenderloin_merged.shape[1]))]]

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Maria Manor,Boutique,Hotel,Clothing Store,Coffee Shop,Bubble Tea Shop,Sushi Restaurant,Cocktail Bar,Jewelry Store,Art Gallery,Optical Shop
19,St. Claire Residence,Hotel,Cocktail Bar,Italian Restaurant,Café,Theater,Grocery Store,Spa,Speakeasy,French Restaurant,Breakfast Spot
23,Tower 737 (aka Post St Towers),Hotel,Italian Restaurant,Café,Cocktail Bar,Bar,Theater,Grocery Store,Speakeasy,American Restaurant,Art Gallery
24,350 Ellis,Hotel,Bar,Café,Speakeasy,Cocktail Bar,Breakfast Spot,Italian Restaurant,French Restaurant,Pizza Place,Kitchen Supply Store
26,O'Farrell Towers,Hotel,Bar,Café,Cocktail Bar,Speakeasy,French Restaurant,Italian Restaurant,Spa,Breakfast Spot,Electronics Store
30,Cecil Williams Glide Community House,Hotel,Bar,Cocktail Bar,Speakeasy,Café,Breakfast Spot,French Restaurant,Italian Restaurant,Pizza Place,Liquor Store
61,450 Ellis St,Hotel,Italian Restaurant,Cocktail Bar,Café,Theater,Breakfast Spot,American Restaurant,Grocery Store,Beer Bar,Spa
62,Geary Courtyard,Hotel,Italian Restaurant,Cocktail Bar,Café,Theater,Breakfast Spot,American Restaurant,Grocery Store,Beer Bar,Spa
68,525 O'Farrell Street,Hotel,Italian Restaurant,Cocktail Bar,Café,Theater,Grocery Store,Beer Bar,Spa,Art Gallery,Breakfast Spot


### Cluster Grocery Store

In [39]:
tenderloin_merged.loc[tenderloin_merged['Cluster Labels'] == 4, tenderloin_merged.columns[[1] + list(range(5, tenderloin_merged.shape[1]))]]

Unnamed: 0,Project_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,William Penn Hotel,Coffee Shop,American Restaurant,Vietnamese Restaurant,Bakery,Sandwich Place,Marijuana Dispensary,Beer Bar,Art Gallery,Pizza Place,Mexican Restaurant
1,125 Mason Street,Coffee Shop,Vietnamese Restaurant,Bakery,Marijuana Dispensary,Theater,Hotel,Bar,Sandwich Place,Mexican Restaurant,Music Venue
6,Curran House,Coffee Shop,American Restaurant,Marijuana Dispensary,Beer Bar,Bakery,Vietnamese Restaurant,Diner,Mexican Restaurant,Mediterranean Restaurant,Bar
9,Presentation Senior Community,Coffee Shop,American Restaurant,Marijuana Dispensary,Bakery,Beer Bar,Performing Arts Venue,Sandwich Place,Music Venue,Diner,Mexican Restaurant
12,Lyric Hotel,Coffee Shop,Marijuana Dispensary,American Restaurant,Vietnamese Restaurant,Café,Beer Bar,Bakery,Bar,Burrito Place,Music Venue
15,Vera Haile Senior Housing,Coffee Shop,Marijuana Dispensary,American Restaurant,Vietnamese Restaurant,Café,Music Venue,Theater,Beer Bar,Bakery,Performing Arts Venue
17,Ritz Hotel,Coffee Shop,American Restaurant,Marijuana Dispensary,Beer Bar,Bakery,Vietnamese Restaurant,Diner,Mexican Restaurant,Mediterranean Restaurant,Bar
22,Eddy & Taylor Family Housing (210 Taylor or 16...,Coffee Shop,Bakery,American Restaurant,Vietnamese Restaurant,Marijuana Dispensary,Sandwich Place,Beer Bar,Mexican Restaurant,Mediterranean Restaurant,Bar
31,Alexander Residence,Coffee Shop,Marijuana Dispensary,American Restaurant,Beer Bar,Bakery,Mediterranean Restaurant,Café,Performing Arts Venue,Sandwich Place,Music Venue
33,Franciscan Towers,Coffee Shop,American Restaurant,Marijuana Dispensary,Beer Bar,Bakery,Vietnamese Restaurant,Diner,Mexican Restaurant,Mediterranean Restaurant,Bar
