  
    
      
<a id='toc'></a>
<center><h1>The Battle of Neighborhood</h1>
Segmenting and Clustering Neighborhoods of San Francisco and Los Angeles  
by Charles C Gomes</center>

----

## Table of Contents
- [Introduction](#introduction)
- [Objective](#objective)
- [Data](#data)
- [Methodology](#methodology)
    - [Analyze San Francisco](#analyzeSF)
    - [K-means Clustering San Francisco](#kmeanSF)
    - [Analyze Los Angeles](#analyzeLA)
    - [K-means Clustering Los Angeles](#kmeanLA)
- [Results](#results)
- [Discussion](#discussion)
- [Conclusion](#conclusion)

<a id='introduction'></a>
# Introduction
San Francisco and Los Angeles are two major cities in California. 

Brief information about both cities:
- San Francisco: Officially the City and County of San Francisco, it is the cultural, commercial, and financial center of Northern California. It is the 13th-most populous city in the United States and the fourth-most populous in California, with 883,305 residents as of 2018.
- Los Angeles: Officially the City of Los Angeles and often known by its initials L.A., it is the most populous city in California, the second-most populous city in the United States after New York City, and the third-most populous city in North America. With an estimated population of nearly four million, it is the cultural, financial, and commercial center of Southern California.

<a id='objective'></a>
# Objective
In this project, we study area classification in detail using Foursquare data together with machine-learning segmentation and clustering.
The aim is to segment the neighborhoods of San Francisco and Los Angeles based on the most common venue categories returned by Foursquare.

Using segmentation and clustering, we hope to determine:
1. how similar or dissimilar the two cities are
2. how areas within each city can be classified, for example as residential, tourist, or other

<a id='data'></a>
# Data
The neighborhood data is acquired from the following sources:
- Los Angeles: https://data.lacity.org/api/views/nwj3-ufba/rows.csv?accessType=DOWNLOAD
- San Francisco: https://data.sfgov.org/api/views/xfcw-9evu/rows.csv?accessType=DOWNLOAD

We will use the Foursquare API to retrieve the venues in each neighborhood for segmentation and clustering.

In [1]:
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup

In [2]:
import pandas as pd

In [3]:
# Download the San Francisco neighborhood data
sfneighdata = pd.read_csv("https://github.com/charles2588/Coursera_Capstone/raw/master/SFNeighbourhood.csv")

In [230]:
sfneighdata.shape

(21, 3)

In [5]:
# For Los Angeles, we need to scrape the page and use BeautifulSoup to build the dataframe we need.
raw_html = get('http://www.laalmanac.com/communications/cm02_communities.php').text

In [6]:
html = BeautifulSoup(raw_html,'html.parser')

In [7]:
table = html.find('table')

In [8]:
data = []
table_body = table.find('tbody')
table_cols = table.find('thead')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values

In [191]:
# The dataframe consists of two columns: Neighborhood and Postal Code
dfcolumns = ['Neighborhood','Postal Code']
laneigh = pd.DataFrame(data[1:],columns=dfcolumns)

<h3>We will do some cleaning</h3>

In [201]:
laneigh_LA = laneigh[laneigh['Neighborhood'].str.contains("Los Angeles")].reset_index(drop=True)

In [203]:
laneigh_LA['Neighborhood'] = laneigh_LA['Neighborhood'].map(lambda x: x.rstrip(' (Los Angeles)'))

In [204]:
laneigh_LA.head()

Unnamed: 0,Neighborhood,Postal Code
0,Arleta,91331
1,Arlington Height,90019
2,Atwater Villa,90039
3,Baldwin Hi,90008
4,Bel Air Estat,90049
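
The truncated names in the output above ("Arlington Height", "Atwater Villa", "Baldwin Hi") are a side effect of `str.rstrip`, which treats its argument as a set of characters to strip rather than as a literal suffix. A minimal sketch of removing the literal suffix instead is shown below; `strip_city_suffix` is a hypothetical helper name, and the sample strings are illustrative inputs assumed to resemble the scraped names.

```python
def strip_city_suffix(name, suffix=" (Los Angeles)"):
    """Remove a literal trailing suffix; names without it are returned unchanged."""
    if name.endswith(suffix):
        return name[: -len(suffix)]
    return name


sample = "Arlington Heights (Los Angeles)"  # illustrative input

# rstrip keeps removing trailing characters as long as they are in the given set,
# so the final "s" of "Heights" is stripped as well.
truncated = sample.rstrip(" (Los Angeles)")

# Suffix removal only drops the exact trailing text.
exact = strip_city_suffix(sample)

print(truncated)
print(exact)
```

In the notebook this could be applied with `laneigh_LA['Neighborhood'].map(strip_city_suffix)`. Names written in the other order, such as "Los Angeles (Canoga Park)", are left untouched by this helper and would need their own rule.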


In [206]:
laneigh_LA_df = pd.DataFrame(laneigh_LA['Postal Code'].str.split(',').tolist(), index=laneigh_LA.Neighborhood).stack()

In [207]:
laneigh_LA_df = laneigh_LA_df.reset_index([0, 'Neighborhood'])

In [208]:
laneigh_LA_df.columns = ['Neighborhood', 'ZIP']

In [209]:
laneigh_LA_df = laneigh_LA_df.drop_duplicates(subset=['Neighborhood'],keep='first')

In [210]:
laneigh_LA_df.shape

(332, 2)
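
The split-and-stack steps above turn a comma-separated `Postal Code` string into one row per ZIP code. The sketch below performs the same reshaping on a toy frame, assuming pandas 0.25 or later for `DataFrame.explode`; the rows are illustrative and not taken from the scraped table.

```python
import pandas as pd

# Toy input: one neighborhood with two ZIP codes, one with a single ZIP code.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Postal Code": ["90001, 90002", "90003"],
})

long_form = (
    toy.assign(ZIP=toy["Postal Code"].str.split(","))  # list of ZIP strings per row
       .explode("ZIP")                                  # one row per list element
       .drop(columns="Postal Code")
       .reset_index(drop=True)
)

# Strip stray whitespace left by the split and convert to integers.
long_form["ZIP"] = pd.to_numeric(long_form["ZIP"].str.strip())
print(long_form)
```

Note that the notebook then calls `drop_duplicates(subset=['Neighborhood'], keep='first')`, which keeps only the first ZIP code of each neighborhood.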

In [33]:
#Get US Zip Code data for geo codes
pd_us_zip_geo = pd.read_csv("https://gist.githubusercontent.com/erichurst/7882666/raw/5bdc46db47d9515269ab12ed6fb2850377fd869e/US%2520Zip%2520Codes%2520from%25202013%2520Government%2520Data")

In [36]:
pd_us_zip_geo.shape

(33144, 3)

In [37]:
pd_us_zip_geo.columns

Index(['ZIP', 'LAT', 'LNG'], dtype='object')

In [43]:
pd_us_zip_geo.tail()

Unnamed: 0,ZIP,LAT,LNG
33139,99923,56.002315,-130.041026
33140,99925,55.550204,-132.945933
33141,99926,55.138352,-131.470424
33142,99927,56.239062,-133.457924
33143,99929,56.370751,-131.693301


In [211]:
laneigh_LA_df.dtypes

Neighborhood    object
ZIP             object
dtype: object

In [212]:
laneigh_LA_df['ZIP'] = laneigh_LA_df['ZIP'].astype('Int64')

In [234]:
laneigh_LA_df.shape

(332, 2)

In [235]:
laneigh_LA_df = laneigh_LA_df.drop_duplicates(subset=['ZIP'],keep='first')

In [237]:
laneigh_LA_df_geo = laneigh_LA_df.merge(pd_us_zip_geo, on = 'ZIP')

In [315]:
laneigh_LA_df_geo.head()

Unnamed: 0,Neighborhood,ZIP,LAT,LNG
0,Arleta,91331,34.255442,-118.421314
1,Arlington Height,90019,34.049841,-118.33846
2,Atwater Villa,90039,34.111885,-118.261033
3,Baldwin Hi,90008,34.009552,-118.346724
4,Bel Air Estat,90049,34.09254,-118.491064
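
`merge` defaults to an inner join, so a neighborhood whose ZIP code is missing from the geocode table is dropped without warning. The sketch below, on illustrative toy data, uses `how='left'` with `indicator=True` to list unmatched rows before discarding them; the frame names and values are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for the neighborhood table and the ZIP geocode table.
neigh = pd.DataFrame({"Neighborhood": ["A", "B", "C"],
                      "ZIP": [90001, 90002, 99999]})
geo = pd.DataFrame({"ZIP": [90001, 90002],
                    "LAT": [1.0, 2.0],
                    "LNG": [-1.0, -2.0]})

# A left join keeps every neighborhood; the _merge column records where each row matched.
checked = neigh.merge(geo, on="ZIP", how="left", indicator=True)

unmatched = checked.loc[checked["_merge"] == "left_only", "Neighborhood"].tolist()
matched = checked[checked["_merge"] == "both"].drop(columns="_merge")

print("No coordinates found for:", unmatched)
print(matched)
```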


In [54]:
sfneighdata.columns = ['ZIP', 'Neighborhood','Population']

In [94]:
sfneighdata_geo = sfneighdata.merge(pd_us_zip_geo, on='ZIP')

In [67]:
sfneighdata_geo.head()

Unnamed: 0,ZIP,Neighborhood,Population,LAT,LNG
0,94102,Hayes Valley/Tenderloin/North of Market,28991,37.779588,-122.419318
1,94103,South of Market,23016,37.773134,-122.411167
2,94107,Potrero Hill,17368,37.76046,-122.399724
3,94108,Chinatown,13716,37.792007,-122.408575
4,94109,Polk/Russian Hill (Nob Hill),56322,37.795388,-122.422453


In [316]:
# Now let's visualize this data

In [71]:
!pip install folium

Collecting folium
  Downloading https://files.pythonhosted.org/packages/4f/86/1ab30184cb60bc2b95deffe2bd86b8ddbab65a4fac9f7313c278c6e8d049/folium-0.9.1-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
tensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.
Installing collected packages: branca, folium
Successfully installed branca-0.3.1 folium-0.9.1


In [72]:
import folium

In [154]:
sf_latitude = 37.773972
sf_longitude = -122.431297
#Lets put this data on map
# create map of San Francisco using latitude and longitude values
map_sf = folium.Map(location=[sf_latitude, sf_longitude], zoom_start=10)

# add markers to map
for lat, lng, ZIP, neighborhood in zip(sfneighdata_geo['LAT'], sfneighdata_geo['LNG'], sfneighdata_geo['ZIP'], sfneighdata_geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, ZIP)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sf)  
    
map_sf

In [242]:
la_latitude = 34.052235
la_longitude = -118.243683
#Lets put this data on map
# create map of Los Angeles using latitude and longitude values
map_la = folium.Map(location=[la_latitude, la_longitude], zoom_start=10)

# add markers to map
for lat, lng, ZIP, neighborhood in zip(laneigh_LA_df_geo['LAT'], laneigh_LA_df_geo['LNG'], laneigh_LA_df_geo['ZIP'], laneigh_LA_df_geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, ZIP)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la)  
    
map_la

In [79]:
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

#Define Foursquare Credentials and Version
CLIENT_ID = 'P2MFHZX0I2BQ24T20OY50COFZQ2VWKH3WXYCV3KM5QRTRPP3' # your Foursquare ID
CLIENT_SECRET = '20MHYBRADOQ5E0FEWYJ22AGD20QVW1DWPWHFV5C2CESUIAZ3' # your Foursquare Secret
VERSION = '20190604'

#explore the first neighborhood in our dataframe for San Francisco which is Hayes Valley/Tenderloin/North of Market
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = sfneighdata_geo.loc[0, 'LAT'] # neighborhood latitude value
neighborhood_longitude = sfneighdata_geo.loc[0, 'LNG'] # neighborhood longitude value
neighborhood_name = sfneighdata_geo.loc[0, 'Neighborhood'] # neighborhood name

#get the top 100 venues in this neighborhood within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Hayes Valley/Tenderloin/North of Market - San Francisco.'.format(nearby_venues.shape[0]))
nearby_venues.head()

100 venues were returned by Foursquare for Hayes Valley/Tenderloin/North of Market - San Francisco.


Unnamed: 0,name,categories,lat,lng
0,Louise M. Davies Symphony Hall,Concert Hall,37.777976,-122.420157
1,Herbst Theater,Concert Hall,37.779548,-122.420953
2,War Memorial Opera House,Opera House,37.778601,-122.420816
3,San Francisco Ballet,Dance Studio,37.77858,-122.420798
4,Asian Art Museum,Art Museum,37.780178,-122.416505
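
The cell above embeds the Foursquare credentials directly in the notebook and assembles the query string by hand. A possible alternative, sketched below, reads the credentials from environment variables and lets `urllib.parse.urlencode` build the query string. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are assumptions made for this example; the endpoint and parameters are the ones used above.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, radius=500, limit=100, version="20190604"):
    """Build the explore URL, taking credentials from the environment."""
    params = {
        # Assumed environment variable names; set them outside the notebook.
        "client_id": os.environ["FOURSQUARE_CLIENT_ID"],
        "client_secret": os.environ["FOURSQUARE_CLIENT_SECRET"],
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values, including the comma in "ll".
    return EXPLORE_ENDPOINT + "?" + urlencode(params)
```

With the two variables set, `requests.get(build_explore_url(lat, lng)).json()` would take the place of the hand-formatted URL.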


In [256]:
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

#Define Foursquare Credentials and Version
CLIENT_ID = 'J5GENTOLGWXB5L51XSCQX0OYIWOZMGBNEV54LDWYAOSFN3GZ' # your Foursquare ID
CLIENT_SECRET = 'ESTJOIPJZ4H0BTDSIMA0HS20J5DIN2ENWE4503ORKVHHMQOU' # your Foursquare Secret
VERSION = '20190604'



#explore the first neighborhood in our dataframe for Los Angeles which is Arleta
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = laneigh_LA_df_geo.loc[0, 'LAT'] # neighborhood latitude value
neighborhood_longitude = laneigh_LA_df_geo.loc[0, 'LNG'] # neighborhood longitude value
neighborhood_name = laneigh_LA_df_geo.loc[0, 'Neighborhood'] # neighborhood name

#get the top 100 venues in this neighborhood within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Arleta - Los Angeles.'.format(nearby_venues.shape[0]))
nearby_venues.head()

2 venues were returned by Foursquare for Arleta - Los Angeles.


Unnamed: 0,name,categories,lat,lng
0,ServersBookinIt,Accessories Store,34.258506,-118.419259
1,Birreria Apatzingan,Mexican Restaurant,34.252693,-118.42536


In [86]:
#laneigh_LA_df_geo.loc[laneigh_LA_df_geo['Neighborhood'].str.contains("Beverly")]

Unnamed: 0,Neighborhood,ZIP,LAT,LNG
28,"Bel Air Estates, Beverly G",90077,34.108023,-118.456964
29,"Beverly Glen, Bel Air Estat",90077,34.108023,-118.456964
31,Los Angeles (Beverly G,90077,34.108023,-118.456964
225,Los Angeles (West Beverly,90048,34.072924,-118.37271


In [301]:
# function to repeat the same process for every neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#run the above function on each neighborhood and create a new dataframe
sf_venues = getNearbyVenues(names=sfneighdata_geo['Neighborhood'],
                                   latitudes=sfneighdata_geo['LAT'],
                                   longitudes=sfneighdata_geo['LNG']
                                  )

#check the size of the resulting dataframe
print(sf_venues.shape)
sf_venues.head()

Hayes Valley/Tenderloin/North of Market
South of Market
Potrero Hill
Chinatown
Polk/Russian Hill (Nob Hill)
Inner Mission/Bernal Heights
Ingelside-Excelsior/Crocker-Amazon
Castro/Noe Valley
Western Addition/Japantown
Parkside/Forest Hill
Haight-Ashbury
Inner Richmond
Outer Richmond
Sunset
Marina
Bayview-Hunters Point
St. Francis Wood/Miraloma/West Portal
Twin Peaks-Glen Park
Lake Merced
North Beach/Chinatown
Visitacion Valley/Sunnydale
(1254, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hayes Valley/Tenderloin/North of Market,37.779588,-122.419318,Louise M. Davies Symphony Hall,37.777976,-122.420157,Concert Hall
1,Hayes Valley/Tenderloin/North of Market,37.779588,-122.419318,Herbst Theater,37.779548,-122.420953,Concert Hall
2,Hayes Valley/Tenderloin/North of Market,37.779588,-122.419318,War Memorial Opera House,37.778601,-122.420816,Opera House
3,Hayes Valley/Tenderloin/North of Market,37.779588,-122.419318,San Francisco Ballet,37.77858,-122.420798,Dance Studio
4,Hayes Valley/Tenderloin/North of Market,37.779588,-122.419318,Asian Art Museum,37.780178,-122.416505,Art Museum
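
`getNearbyVenues` above indexes `groups[0]` and `categories[0]` directly, so a response without groups or a venue without a category would raise an exception partway through the loop. The sketch below shows a more defensive extraction, assuming the response has the same shape as the one used above. `safe_items` and `extract_venue_rows` are hypothetical helper names, and the sample payload is made up for illustration.

```python
def safe_items(payload):
    """Return the first group's items, or an empty list if the response has none."""
    groups = payload.get("response", {}).get("groups") or []
    return groups[0].get("items", []) if groups else []


def extract_venue_rows(name, lat, lng, items):
    """Flatten venue items into tuples, using None where a category is missing."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Made-up payload: the second venue has no category.
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Example", 0.0, 0.0, safe_items(payload))
print(rows)
```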


In [258]:
#run the above function on each neighborhood and create a new dataframe
la_venues = getNearbyVenues(names=laneigh_LA_df_geo['Neighborhood'],
                                   latitudes=laneigh_LA_df_geo['LAT'],
                                   longitudes=laneigh_LA_df_geo['LNG']
                                  )

#check the size of the resulting dataframe
print(la_venues.shape)
la_venues.head()

Arleta
Arlington Height
Atwater Villa
Baldwin Hi
Bel Air Estat
Bel Air Estates, Beverly G
Boyle Height
Byzantine-Latino Quarter
California State Univ Northrid
Canoga Park
Castellemar
Century City
Chatsworth
Cheviot Hi
Chinatow
Commerce, East
Cypress Park
Downtown City West
Downtown Fashion District
Downtown South Park
Downtown Southeast
Eagle Rock
East Hollywood
East
Echo Park
El Sereno, Monterey Hills, University Hi
Enci
Fairfax
Florence-Graham, South
Granada Hi
Griffith Park
Hancock Park
Harbor City
Harbor Gateway/Shoestri
Highland Park
Hyde Park
Jefferson Park
Koreatow
La Tuna Cany
Lake Balboa
Lak
Lake View Terrac
Lincoln Heights, Montecito Hieght
Los Angeles (Canoga Park
Los Angeles (Downtown Arts District
Los Angeles (Los Angeles International Airport
Los Angeles (Mid-City West
Los Angeles (Mission Hi
Los Angeles (Mt Olympu
Los Angeles (North Hi
Los Angeles (North Hollywood
Los Angeles (Northrid
Los Angeles (Palm
Los Angeles (Panorama City
Los Angeles (Pierce C
Los Angeles (Playa 

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arleta,34.255442,-118.421314,ServersBookinIt,34.258506,-118.419259,Accessories Store
1,Arleta,34.255442,-118.421314,Birreria Apatzingan,34.252693,-118.42536,Mexican Restaurant
2,Arlington Height,34.049841,-118.33846,PizzaRev,34.048585,-118.336439,Pizza Place
3,Arlington Height,34.049841,-118.33846,Planet Fitness,34.047774,-118.338605,Gym / Fitness Center
4,Arlington Height,34.049841,-118.33846,Jersey Mike's Subs,34.048449,-118.337419,Sandwich Place


In [259]:
#check how many venues were returned for each Neighborhood in San Francisco
print('There are {} unique categories in San Francisco.'.format(len(sf_venues['Venue Category'].unique())))
sf_venues.groupby('Neighborhood').count()

There are 241 unique categories in San Francisco.


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bayview-Hunters Point,4,4,4,4,4,4
Castro/Noe Valley,81,81,81,81,81,81
Chinatown,100,100,100,100,100,100
Haight-Ashbury,100,100,100,100,100,100
Hayes Valley/Tenderloin/North of Market,100,100,100,100,100,100
Ingelside-Excelsior/Crocker-Amazon,33,33,33,33,33,33
Inner Mission/Bernal Heights,83,83,83,83,83,83
Inner Richmond,64,64,64,64,64,64
Lake Merced,17,17,17,17,17,17
Marina,100,100,100,100,100,100


In [260]:
#check how many venues were returned for each Neighborhood in Los Angeles
print('There are {} unique categories in Los Angeles.'.format(len(la_venues['Venue Category'].unique())))
la_venues.groupby('Neighborhood').count()

There are 279 unique categories in Los Angeles.


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Arleta,2,2,2,2,2,2
Arlington Height,31,31,31,31,31,31
Atwater Villa,4,4,4,4,4,4
Baldwin Hi,2,2,2,2,2,2
Bel Air Estat,1,1,1,1,1,1
"Bel Air Estates, Beverly G",2,2,2,2,2,2
Boyle Height,11,11,11,11,11,11
Byzantine-Latino Quarter,12,12,12,12,12,12
Canoga Park,21,21,21,21,21,21
Century City,71,71,71,71,71,71


<h3><a id="analyzeSF">Analyze San Francisco</a></h3>


In [302]:
# one hot encoding
sf_onehot = pd.get_dummies(sf_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sf_onehot['Neighborhood'] = sf_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sf_onehot.columns[-1]] + list(sf_onehot.columns[:-1])
sf_onehot = sf_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(sf_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
sf_grouped = sf_onehot.groupby('Neighborhood').mean().reset_index()

#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(sf_grouped.shape[0]))

1254 rows were returned after one hot encoding.
21 rows were returned after grouping.
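
One-hot encoding followed by a per-neighborhood mean turns the venue rows into category frequencies, so each neighborhood's row sums to 1. A small sketch on illustrative toy data shows the effect; `dtype=float` is passed only so the example behaves the same across pandas versions.

```python
import pandas as pd

# Toy venue list: neighborhood A has four venues, B has two.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Gym", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Here neighborhood A gets 0.5 for Coffee Shop and 0.25 each for Park and Gym, while B gets 1.0 for Park.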


In [303]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in sf_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sf_grouped[sf_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bayview-Hunters Point----
                venue  freq
0             Brewery  0.25
1                Food  0.25
2  Seafood Restaurant  0.25
3     Motorcycle Shop  0.25
4   Outdoor Sculpture  0.00


----Castro/Noe Valley----
            venue  freq
0         Gay Bar  0.11
1     Coffee Shop  0.05
2  Scenic Lookout  0.04
3     Yoga Studio  0.02
4   Deli / Bodega  0.02


----Chinatown----
          venue  freq
0   Coffee Shop  0.06
1         Hotel  0.06
2           Gym  0.03
3          Café  0.03
4  Cocktail Bar  0.03


----Haight-Ashbury----
                    venue  freq
0                Boutique  0.06
1             Coffee Shop  0.05
2  Thrift / Vintage Store  0.05
3                    Café  0.04
4          Clothing Store  0.04


----Hayes Valley/Tenderloin/North of Market----
          venue  freq
0  Cocktail Bar  0.04
1   Coffee Shop  0.04
2          Café  0.04
3       Theater  0.03
4      Beer Bar  0.03


----Ingelside-Excelsior/Crocker-Amazon----
                   venue  freq
0  

In [150]:
import numpy as np

In [304]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#create the new dataframe and display the top 8 venues for each neighborhood
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Neighborhood'] = sf_grouped['Neighborhood']

for ind in np.arange(sf_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sf_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Bayview-Hunters Point,Brewery,Motorcycle Shop,Food,Seafood Restaurant,Yoga Studio,Garden,Furniture / Home Store,Frozen Yogurt Shop
1,Castro/Noe Valley,Gay Bar,Coffee Shop,Scenic Lookout,Yoga Studio,Pet Store,Deli / Bodega,Clothing Store,Café
2,Chinatown,Hotel,Coffee Shop,Spa,Café,French Restaurant,Cocktail Bar,Gym,Gym / Fitness Center
3,Haight-Ashbury,Boutique,Thrift / Vintage Store,Coffee Shop,Breakfast Spot,Clothing Store,Café,Gift Shop,Shoe Store
4,Hayes Valley/Tenderloin/North of Market,Cocktail Bar,Coffee Shop,Café,Vietnamese Restaurant,Performing Arts Venue,French Restaurant,Furniture / Home Store,Beer Bar
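
The column names above come from a three-element suffix list with a fallback to "th", which is sufficient for the eight columns used here. If more columns were ever needed, a small helper could produce the correct English suffix for any rank (11th, 21st, 22nd). The `ordinal` function below is a hypothetical helper written for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, 110th to 120th, and so on
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 8
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```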


<h3><a id="kmeanSF">K-means Clustering San Francisco</a></h3>

In [305]:
sfneighdata_geo

Unnamed: 0,ZIP,Neighborhood,Population,LAT,LNG,Cluster Labels
0,94102,Hayes Valley/Tenderloin/North of Market,28991,37.779588,-122.419318,2
1,94103,South of Market,23016,37.773134,-122.411167,0
2,94107,Potrero Hill,17368,37.76046,-122.399724,0
3,94108,Chinatown,13716,37.792007,-122.408575,0
4,94109,Polk/Russian Hill (Nob Hill),56322,37.795388,-122.422453,0
5,94110,Inner Mission/Bernal Heights,74633,37.750021,-122.415201,0
6,94112,Ingelside-Excelsior/Crocker-Amazon,73104,37.720375,-122.44295,0
7,94114,Castro/Noe Valley,30574,37.758057,-122.43541,0
8,94115,Western Addition/Japantown,33115,37.785969,-122.437253,0
9,94116,Parkside/Forest Hill,42958,37.745399,-122.486065,0


In [306]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

sf_grouped_clustering = sf_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sf_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

#create a new dataframe that includes the cluster as well as the top 8 venues for each neighborhood.
sf_merged = sfneighdata_geo

# add clustering labels
sf_merged['Cluster Labels'] = kmeans.labels_

# merge SF_grouped with SF_data to add latitude/longitude for each neighborhood
sf_merged = sf_merged.join(areas_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

sf_merged.head()

Unnamed: 0,ZIP,Neighborhood,Population,LAT,LNG,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,94102,Hayes Valley/Tenderloin/North of Market,28991,37.779588,-122.419318,2,Cocktail Bar,Coffee Shop,Café,Vietnamese Restaurant,Performing Arts Venue,French Restaurant,Furniture / Home Store,Beer Bar
1,94103,South of Market,23016,37.773134,-122.411167,0,Nightclub,Cocktail Bar,Gay Bar,Coffee Shop,Café,Food Truck,Motorcycle Shop,Art Gallery
2,94107,Potrero Hill,17368,37.76046,-122.399724,0,Park,Café,Grocery Store,Breakfast Spot,Brewery,Garden,Yoga Studio,Bus Station
3,94108,Chinatown,13716,37.792007,-122.408575,0,Hotel,Coffee Shop,Spa,Café,French Restaurant,Cocktail Bar,Gym,Gym / Fitness Center
4,94109,Polk/Russian Hill (Nob Hill),56322,37.795388,-122.422453,0,Italian Restaurant,Spa,Deli / Bodega,Gym,Bar,Wine Bar,Gym / Fitness Center,Cosmetics Shop
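
One caveat about the cell above: `kmeans.labels_` follows the row order of `sf_grouped`, which `groupby` sorts by neighborhood name, whereas `sfneighdata_geo` is ordered by ZIP code. Assigning the array directly to `sfneighdata_geo` pairs labels with rows by position rather than by name, so labels can end up on the wrong neighborhoods. The sketch below aligns them by name instead; the frames and label values are illustrative stand-ins, not the notebook's data.

```python
import numpy as np
import pandas as pd

# 'grouped' is sorted by name (as groupby returns it); 'geo' keeps its ZIP order.
grouped = pd.DataFrame({"Neighborhood": ["Bayview", "Chinatown", "Marina"]})
geo = pd.DataFrame({"ZIP": [94108, 94123, 94124],
                    "Neighborhood": ["Chinatown", "Marina", "Bayview"]})

# Hypothetical k-means output: one label per row of 'grouped', in that order.
labels = np.array([2, 0, 1])

# Attach each label to its neighborhood name first, then join on the name.
labelled = grouped.assign(**{"Cluster Labels": labels})
merged = geo.merge(labelled, on="Neighborhood", how="left")
print(merged)
```

Merging into a new frame also leaves the original geocode table unmodified, unlike `sf_merged = sfneighdata_geo`, which only creates a second name for the same object.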


In [307]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map for SF
sf_clusters = folium.Map(location=[sf_latitude, sf_longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sf_merged['LAT'], sf_merged['LNG'], sf_merged['Neighborhood'], sf_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(sf_clusters)
       
sf_clusters
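
The clustering above fixes `kclusters = 3` without comparing alternatives. One common way to sanity-check that choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data with scikit-learn; `silhouette_by_k` is a hypothetical helper, and the three synthetic blobs are an assumption made so the example has a known answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        scores[k] = silhouette_score(X, model.fit_predict(X))
    return scores


# Synthetic demo data: three tight, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the notebook's data the call would be `silhouette_by_k(sf_grouped_clustering, range(2, 8))`. With only 21 neighborhoods the scores may be noisy, so they are better read as a rough guide than as a definitive answer.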

<h3><a id="analyzeLA">Analyze Los Angeles</a></h3>

In [264]:
# one hot encoding
la_onehot = pd.get_dummies(la_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
la_onehot['Neighborhood'] = la_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [la_onehot.columns[-1]] + list(la_onehot.columns[:-1])
la_onehot = la_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(la_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
la_grouped = la_onehot.groupby('Neighborhood').mean().reset_index()

#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(la_grouped.shape[0]))

1751 rows were returned after one hot encoding.
81 rows were returned after grouping.


In [265]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in la_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = la_grouped[la_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Arleta----
                venue  freq
0   Accessories Store   0.5
1  Mexican Restaurant   0.5
2  Yoshoku Restaurant   0.0
3         Music Venue   0.0
4     Organic Grocery   0.0


----Arlington Height----
                    venue  freq
0      Chinese Restaurant  0.06
1          Sandwich Place  0.06
2                    Bank  0.06
3  Furniture / Home Store  0.06
4                     Gym  0.03


----Atwater Villa----
                    venue  freq
0         Laundry Service  0.25
1              Donut Shop  0.25
2  Furniture / Home Store  0.25
3                     Gym  0.25
4      Yoshoku Restaurant  0.00


----Baldwin Hi----
                venue  freq
0      Scenic Lookout   0.5
1      Clothing Store   0.5
2  Yoshoku Restaurant   0.0
3   Outdoor Sculpture   0.0
4     Organic Grocery   0.0


----Bel Air Estat----
                     venue  freq
0         Insurance Office   1.0
1        Outdoor Sculpture   0.0
2          Organic Grocery   0.0
3             Optical Shop   0.0
4  N

                venue  freq
0             Airport  0.25
1    Airport Terminal  0.25
2               Plane  0.25
3    Sushi Restaurant  0.25
4  Yoshoku Restaurant  0.00


----Los Angeles (Mid-City West----
                  venue  freq
0        Clothing Store  0.09
1             Juice Bar  0.03
2  Gym / Fitness Center  0.03
3                  Café  0.03
4    Mexican Restaurant  0.03


----Los Angeles (Mission Hi----
                venue  freq
0  Mexican Restaurant  0.29
1               Plaza  0.12
2      Breakfast Spot  0.06
3           BBQ Joint  0.06
4      Ice Cream Shop  0.06


----Los Angeles (Mt Olympu----
                venue  freq
0      Scenic Lookout   0.5
1       Grocery Store   0.5
2  Yoshoku Restaurant   0.0
3   Outdoor Sculpture   0.0
4     Organic Grocery   0.0


----Los Angeles (North Hi----
                 venue  freq
0     Asian Restaurant  0.15
1        Grocery Store  0.08
2  Filipino Restaurant  0.08
3         Liquor Store  0.08
4                  Bar  0.08


----

In [228]:
#create the new dataframe and display the top 8 venues for each neighborhood
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Neighborhood'] = la_grouped['Neighborhood']

for ind in np.arange(la_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(la_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Angeles (Arleta,Mexican Restaurant,Accessories Store,Electronics Store,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
1,Angeles (Arlington Height,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bank,Gym,Shipping Store,Burger Joint,Salon / Barbershop
2,Angeles (Atwater Villa,Furniture / Home Store,Laundry Service,Donut Shop,Gym,Yoga Studio,Eastern European Restaurant,Donburi Restaurant,Doner Restaurant
3,Angeles (Baldwin Hi,Scenic Lookout,Clothing Store,Yoga Studio,Eastern European Restaurant,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop
4,Angeles (Bel Air Estat,Bowling Alley,Football Stadium,Event Space,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


<h3><a id="kmeanLA">K-Means Clustering of Los Angeles</a></h3>

In [266]:
la_grouped.shape

(81, 279)

In [267]:
laneigh_LA_df_geo.shape

(87, 4)

In [None]:
laneigh_LA_df_geo["Neighborhood"]
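The two shapes above differ (81 grouped neighborhoods against 87 rows with coordinates), so some neighborhoods have no venue data. One way to list them is a left merge with an indicator column. This is only a sketch: `geo` and `grouped` are hypothetical toy stand-ins for `laneigh_LA_df_geo` and `la_grouped`.

```python
import pandas as pd

# Hypothetical stand-ins for laneigh_LA_df_geo and la_grouped
geo = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'LAT': [34.0, 34.1, 34.2, 34.3],
    'LNG': [-118.2, -118.3, -118.4, -118.5],
})
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'C', 'D'],
    'Park': [0.5, 0.0, 1.0],
})

# indicator=True adds a '_merge' column; 'left_only' marks rows of geo
# that have no matching neighborhood in grouped
check = geo.merge(grouped[['Neighborhood']], on='Neighborhood',
                  how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(missing)
```

Running the same check on the real frames would show which neighborhoods returned no venues and are therefore absent from the clustering input.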

In [275]:
# set number of clusters
kclusters = 3

# keep only the category frequency columns as clustering features
la_grouped_clustering = la_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(la_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

#create a new dataframe that includes the cluster as well as the top 8 venues for each neighborhood.
# .copy() makes la_merged an independent dataframe, so the assignment below does not raise a SettingWithCopyWarning
la_merged = laneigh_LA_df_geo[laneigh_LA_df_geo['Neighborhood'].isin(la_grouped['Neighborhood'])].copy()

# add clustering labels (assigned by position: this assumes the rows of la_merged are in the same order as la_grouped)
la_merged['Cluster Labels'] = kmeans.labels_

# join areas_venues_sorted to add the most common venues for each neighborhood
la_merged = la_merged.join(areas_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

la_merged.head() # check the last columns!


Unnamed: 0,Neighborhood,ZIP,LAT,LNG,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Arleta,91331,34.255442,-118.421314,0,Mexican Restaurant,Accessories Store,Electronics Store,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
1,Arlington Height,90019,34.049841,-118.33846,0,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bank,Gym,Shipping Store,Burger Joint,Salon / Barbershop
2,Atwater Villa,90039,34.111885,-118.261033,0,Furniture / Home Store,Laundry Service,Donut Shop,Gym,Yoga Studio,Eastern European Restaurant,Donburi Restaurant,Doner Restaurant
3,Baldwin Hi,90008,34.009552,-118.346724,0,Scenic Lookout,Clothing Store,Yoga Studio,Eastern European Restaurant,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop
4,Bel Air Estat,90049,34.09254,-118.491064,0,Insurance Office,Yoga Studio,Diner,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
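The labels in this cell are attached by row position, which is only correct when both frames list the neighborhoods in the same order. A more defensive variant, sketched here on hypothetical toy frames, attaches the labels by neighborhood name. The fixed `labels` array stands in for `kmeans.labels_`, and `grouped` and `geo` stand in for `la_grouped` and `laneigh_LA_df_geo`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: geo has a different row order and one extra row
grouped = pd.DataFrame({'Neighborhood': ['A', 'B', 'C', 'D']})
geo = pd.DataFrame({
    'Neighborhood': ['D', 'C', 'E', 'B', 'A'],
    'LAT': [34.4, 34.3, 34.5, 34.2, 34.1],
    'LNG': [-118.4, -118.3, -118.5, -118.2, -118.1],
})

# Fixed labels standing in for kmeans.labels_ (one per row of grouped)
labels = np.array([0, 0, 1, 2])

# Pair each label with the neighborhood it was computed for
label_df = pd.DataFrame({'Neighborhood': grouped['Neighborhood'],
                         'Cluster Labels': labels})

# Merge on the name, so row order no longer matters; the inner join
# drops neighborhoods that were not clustered
merged = geo.merge(label_df, on='Neighborhood', how='inner')
print(merged)
```

With this pattern a neighborhood without venue data simply drops out of the result instead of shifting the labels of the rows after it.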


In [277]:
#Finally, let's visualize the resulting clusters
# create map
la_clusters = folium.Map(location=[la_latitude, la_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(la_merged['LAT'], la_merged['LNG'], la_merged['Neighborhood'], la_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(la_clusters)
       
la_clusters
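The value `kclusters = 3` is set by hand. One common way to check such a choice is to compare silhouette scores for several values of k. The sketch below is not part of the original analysis and uses synthetic data `X` (three well separated groups) as a hypothetical stand-in for `la_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for la_grouped_clustering: three separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most compact, separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

On real venue frequencies the scores are usually much less clear cut than on this toy data, so the result is a hint to weigh alongside the cluster tables rather than a definitive answer.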

<h3><a id="results">Results</a></h3>

In [308]:
#Cluster 1 for San Francisco
sf_merged.loc[sf_merged['Cluster Labels'] == 0, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
1,South of Market,0,Nightclub,Cocktail Bar,Gay Bar,Coffee Shop,Café,Food Truck,Motorcycle Shop,Art Gallery
2,Potrero Hill,0,Park,Café,Grocery Store,Breakfast Spot,Brewery,Garden,Yoga Studio,Bus Station
3,Chinatown,0,Hotel,Coffee Shop,Spa,Café,French Restaurant,Cocktail Bar,Gym,Gym / Fitness Center
4,Polk/Russian Hill (Nob Hill),0,Italian Restaurant,Spa,Deli / Bodega,Gym,Bar,Wine Bar,Gym / Fitness Center,Cosmetics Shop
5,Inner Mission/Bernal Heights,0,Mexican Restaurant,Coffee Shop,Grocery Store,Pizza Place,Art Gallery,Park,Gym / Fitness Center,Latin American Restaurant
6,Ingelside-Excelsior/Crocker-Amazon,0,Pizza Place,Bus Station,Light Rail Station,Mexican Restaurant,Vietnamese Restaurant,Sandwich Place,Steakhouse,Filipino Restaurant
7,Castro/Noe Valley,0,Gay Bar,Coffee Shop,Scenic Lookout,Yoga Studio,Pet Store,Deli / Bodega,Clothing Store,Café
8,Western Addition/Japantown,0,Spa,Cosmetics Shop,Bakery,Ice Cream Shop,Chinese Restaurant,Café,Boutique,Yoga Studio
9,Parkside/Forest Hill,0,Chinese Restaurant,Japanese Restaurant,Dance Studio,Dumpling Restaurant,Italian Restaurant,Shoe Repair,Sandwich Place,Pizza Place
10,Haight-Ashbury,0,Boutique,Thrift / Vintage Store,Coffee Shop,Breakfast Spot,Clothing Store,Café,Gift Shop,Shoe Store


In [309]:
#Cluster 2 for San Francisco
sf_merged.loc[sf_merged['Cluster Labels'] == 1, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
16,St. Francis Wood/Miraloma/West Portal,1,Monument / Landmark,Gun Range,Trail,Park,Yoga Studio,Field,Furniture / Home Store,Frozen Yogurt Shop
19,North Beach/Chinatown,1,Coffee Shop,Hotel,Pizza Place,Park,Café,Italian Restaurant,Sandwich Place,Bakery


In [310]:
#Cluster 3 for San Francisco
sf_merged.loc[sf_merged['Cluster Labels'] == 2, sf_merged.columns[[1] + list(range(5, sf_merged.shape[1]))]]


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Hayes Valley/Tenderloin/North of Market,2,Cocktail Bar,Coffee Shop,Café,Vietnamese Restaurant,Performing Arts Venue,French Restaurant,Furniture / Home Store,Beer Bar


In [292]:
#Cluster 1 for Los Angeles
la_merged.loc[la_merged['Cluster Labels'] == 0, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Arleta,Mexican Restaurant,Accessories Store,Electronics Store,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
1,Arlington Height,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bank,Gym,Shipping Store,Burger Joint,Salon / Barbershop
2,Atwater Villa,Furniture / Home Store,Laundry Service,Donut Shop,Gym,Yoga Studio,Eastern European Restaurant,Donburi Restaurant,Doner Restaurant
3,Baldwin Hi,Scenic Lookout,Clothing Store,Yoga Studio,Eastern European Restaurant,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop
4,Bel Air Estat,Insurance Office,Yoga Studio,Diner,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
5,"Bel Air Estates, Beverly G",Bowling Alley,Football Stadium,Event Space,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
6,Boyle Height,Mexican Restaurant,Thai Restaurant,Pizza Place,Fast Food Restaurant,Burger Joint,Taco Place,Seafood Restaurant,Sandwich Place
7,Byzantine-Latino Quarter,Donut Shop,Pizza Place,Video Game Store,Spa,Food Truck,Bus Station,Diner,Cosmetics Shop
9,Canoga Park,Bank,Coffee Shop,Pet Store,Automotive Shop,Flower Shop,Mexican Restaurant,Sporting Goods Shop,Motorcycle Shop
11,Century City,Food Truck,Coffee Shop,Café,Mexican Restaurant,Salad Place,Chinese Restaurant,Department Store,Hotel


In [293]:
#Cluster 2 for Los Angeles
la_merged.loc[la_merged['Cluster Labels'] == 1, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
25,"El Sereno, Monterey Hills, University Hi",Park,Yoga Studio,Eastern European Restaurant,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
33,Harbor Gateway/Shoestri,Park,Board Shop,Convenience Store,Fast Food Restaurant,Yoga Studio,Donburi Restaurant,Doner Restaurant,Donut Shop
61,Los Angeles (Shadow Hi,,,,,,,,
65,Los Angeles (Southeast,,,,,,,,
85,South Los Angeles/Broadway Manchester,Fast Food Restaurant,Southern / Soul Food Restaurant,Yoga Studio,Eastern European Restaurant,Dog Run,Donburi Restaurant,Doner Restaurant,Donut Shop


In [294]:
#Cluster 3 for Los Angeles
la_merged.loc[la_merged['Cluster Labels'] == 2, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
29,Granada Hi,Cosmetics Shop,Yoga Studio,Event Space,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant


<h3><a id="discussion">Discussion</a></h3>

<pre>Based on the clusters found for each city above, we believe that each cluster could be classified better by calculating the most common venue categories in each city.
Looking at the clusters, we cannot clearly determine what each one represents using only the Foursquare most common venue data.

However, we have assumed that the clusters are as follows:

Cluster 1: San Francisco: Tourism 
Cluster 2: San Francisco: Residential and Tourism
Cluster 3: San Francisco: Tourism
Cluster 1: Los Angeles: Tourism
Cluster 2: Los Angeles: Residential based on the Park, Convenience Store and Yoga Studio.
Cluster 3: Los Angeles: Mixed.
        
What is lacking at this point is a systematic, quantitative way to identify and distinguish different districts and to describe how the most common venues recorded in Foursquare correlate with them. The reality is more complex, however: similar cities may or may not have similar common venues. A further step in this classification would be to find a method to extract these common venues and integrate the spatial correlations between different areas or districts.

We believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of the two cities. Further studies are needed to relate the acquired data to more meaningful and objective results.

</pre>
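As a first step towards the quantitative description mentioned above, the category frequencies could be averaged per cluster, so that each cluster is described by its dominant categories rather than by reading individual rows. This is a minimal sketch on hypothetical toy values; `profile_input` stands in for the grouped frequency frame with a `Cluster Labels` column added.

```python
import pandas as pd

# Hypothetical stand-in: category frequencies plus a cluster label per row
profile_input = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 1],
    'Park': [0.6, 0.4, 0.0, 0.1],
    'Hotel': [0.1, 0.1, 0.7, 0.5],
    'Cafe': [0.3, 0.5, 0.3, 0.4],
})

# Mean frequency of every category within each cluster
cluster_profile = (profile_input
                   .drop(columns='Neighborhood')
                   .groupby('Cluster Labels')
                   .mean())

# The two most frequent categories for each cluster
top = {cluster: row.sort_values(ascending=False).head(2).index.tolist()
       for cluster, row in cluster_profile.iterrows()}
print(cluster_profile)
print(top)
```

In the toy data the first cluster is dominated by parks and the second by hotels, which is the kind of evidence a label such as residential or tourism could be based on.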

<h3><a id="conclusion">Conclusion</a></h3>

<pre>
With the help of the Foursquare API, we were able to capture venue information and use it to explore
the similarities and dissimilarities between San Francisco and Los Angeles.
We classified neighbourhoods as residential, tourism, or mixed.

In conclusion, San Francisco and Los Angeles are similar in that both have tourism areas as well as
some residential areas.
It is somewhat clear that in San Francisco the residential and tourism neighbourhoods are more mixed than in Los Angeles.

Thank you.
By,
Charles Gomes.
charles2588@gmail.com
</pre>