# Clustering of Japanese or Sushi Restaurants in LA Neighborhoods

## For the first week:
### 1- The problem:
We will test whether Japanese and Sushi restaurants in the neighborhoods of Los Angeles (LA) cluster around bars. If they do, the location of nearby bars matters when choosing a location for a new Japanese restaurant.

### 2- A description of the data:
We will extract the list of LA neighborhoods and keep those with the most Japanese or Sushi restaurants (call this set JPLA).


## For the second week:
### The full report:

#### Introduction:
We will test whether Japanese and Sushi restaurants in the neighborhoods of Los Angeles (LA) cluster around bars, and whether this happens only in certain LA regions. If they do, the location of nearby bars matters when choosing a location for a new Japanese restaurant. Clustering around bars is compared with clustering around other restaurants (a positive control, expected to hold) and around any other venue (a negative control, not expected to hold because that association should be random).

#### Data:
I extracted the neighborhoods of Los Angeles (LA) and kept those where Japanese or Sushi restaurants are among the five most common venue categories (hereafter, JPLA).

#### Methodology:
For each neighborhood I computed three indexes, each the mean venue frequency over a group of categories: a bar index (the test), a restaurant index (positive control), and an index of all other venues (negative control). I used k-means to cluster the JPLA neighborhoods on these three indexes, then calculated the mean of each index per cluster and per region. I used one-way ANOVA for the statistical analysis.

#### Results:
When I clustered the JPLA neighborhoods into 4 clusters, only one cluster (cluster 3, `Cluster_Labels` = 2) had a non-zero bar index; the bar index was zero in every other cluster. In this cluster, the bar index was also significantly above the other index (one-way ANOVA, p = 0.003). Neighborhoods in this cluster come from different regions of LA.
In all clusters, the restaurant index was significantly higher than the other indexes (one-way ANOVA, p = 0.0096).

#### Discussion:
In some neighborhoods (see the names in cluster 3, `Cluster_Labels` = 2), Japanese and Sushi restaurants cluster around bars. However, this is not the general trend. As expected, Japanese and Sushi restaurants cluster around other restaurants.

#### Conclusion:
Clustering of Japanese and Sushi restaurants around bars is not a general trend in LA County. There is no need to consider nearby bar locations when deciding where to open a Japanese restaurant.


In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import requests # library to handle HTTP requests
from bs4 import BeautifulSoup # not called directly; pd.read_html(flavor='bs4') relies on the bs4 package

## 1- Create DataFrame

#### Scrape list of Neighborhoods in LA county

In [2]:
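link = 'http://maps.latimes.com/neighborhoods/neighborhood/list/'
dfs = pd.read_html(link, flavor='bs4') # one DataFrame per table found on the page
df = dfs[0].iloc[1:] # skip the first row of the table
df.columns = ['Neighborhood', 'Region']
df.size # number of cells (rows x columns), not number of rows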

542

In [3]:
df.head()

Unnamed: 0,Neighborhood,Region
1,Adams-Normandie,South L.A.
2,Agoura Hills,Santa Monica Mountains
3,Agua Dulce,Northwest County
4,Alhambra,San Gabriel Valley
5,Alondra Park,South Bay


### Install geocoders

In [4]:
!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library


Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



### Add geographical coordinates

In [5]:
# create the geocoder once; the user_agent string is an arbitrary example name
geolocator = Nominatim(user_agent='la_japanese_restaurants')

d = []
for addrs in df.Neighborhood:
    address = addrs + ', Los Angeles, USA'

    try:
        location = geolocator.geocode(address)
    except Exception:
        # network or service error: skip this neighborhood
        continue

    if location is None:
        # no match found: skip this neighborhood
        continue

    print(location)
    d.append({'Neighborhood': addrs, 'Location': location,
              'Latitude': location.latitude, 'Longitude': location.longitude})

loc_df = pd.DataFrame(d)


Agoura Hills, Los Angeles County, California, 91301, USA
Agua Dulce, Los Angeles County, California, USA
Alhambra, Los Angeles County, California, USA
Alondra Park, Los Angeles County, California, 90506, USA
Altadena, Los Angeles County, California, 91001, USA
Angeles Crest Station, Altacanyada, Los Angeles County, California, 91011, USA
Arcadia, Los Angeles County, California, USA
Arleta, LA, Los Angeles County, California, USA
Arlington Heights Elementary School, 7th Avenue, Country Club Park, Cienega, LA, Los Angeles County, California, 90019, USA
Artesia, Los Angeles County, California, USA
Athens, Los Angeles County, California, 90061, USA
Atwater Village, Atwater, LA, Los Angeles County, California, 90039, USA
Avalon, Los Angeles County, California, USA
Avocado Heights, Los Angeles County, California, 91746, USA
Azusa, Los Angeles County, California, USA
Baldwin Park, Los Angeles County, California, 91706, USA
Bel Air, Westwood, LA, Los Angeles County, California, 90024-2613, USA

In [6]:
loc_df.shape

(245, 4)
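
Only 245 rows come back from geocoding, so some scraped neighborhoods were dropped silently (for example, Adams-Normandie is in the scraped list but not in the geocoded output). A minimal sketch of how the dropped names could be listed, assuming toy stand-ins for `df` and `loc_df`; the frames `scraped` and `geocoded` are illustrative, not the notebook's data:

```python
import pandas as pd

# Toy stand-ins for `df` (scraped list) and `loc_df` (geocoded subset)
scraped = pd.DataFrame({
    'Neighborhood': ['Adams-Normandie', 'Agoura Hills', 'Agua Dulce'],
    'Region': ['South L.A.', 'Santa Monica Mountains', 'Northwest County'],
})
geocoded = pd.DataFrame({
    'Neighborhood': ['Agoura Hills', 'Agua Dulce'],
    'Latitude': [34.136395, 34.496382],
    'Longitude': [-118.774535, -118.325635],
})

# A left merge with indicator=True tags rows without a geocoding result as 'left_only'
check = scraped.merge(geocoded, on='Neighborhood', how='left', indicator=True)
missing = check.loc[check['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(missing)
```

Running the same merge on the real `df` and `loc_df` would show which neighborhoods never reach the clustering step.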

In [8]:
compo = pd.merge(loc_df, df, on='Neighborhood', how='inner') # Neighborhood is the only shared column
compo.head()

Unnamed: 0,Latitude,Location,Longitude,Neighborhood,Region
0,34.136395,"(Agoura Hills, Los Angeles County, California,...",-118.774535,Agoura Hills,Santa Monica Mountains
1,34.496382,"(Agua Dulce, Los Angeles County, California, U...",-118.325635,Agua Dulce,Northwest County
2,34.093042,"(Alhambra, Los Angeles County, California, USA...",-118.12706,Alhambra,San Gabriel Valley
3,33.88946,"(Alondra Park, Los Angeles County, California,...",-118.330907,Alondra Park,South Bay
4,34.186316,"(Altadena, Los Angeles County, California, 910...",-118.135233,Altadena,Verdugos


In [9]:
address = 'Los Angeles, USA'

geolocator = Nominatim(user_agent='la_japanese_restaurants') # the user_agent string is an arbitrary example name
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of LA are {}, {}.'.format(latitude, longitude))


The geographical coordinates of LA are 34.0536834, -118.2427669.


## 2- Visualization of LA neighborhoods

In [10]:
# create map of LA using latitude and longitude values
map_LA = folium.Map(location=[latitude, longitude], zoom_start=11)

LA_data = compo

In [11]:
# add markers to map
for lat, lng, label in zip(LA_data['Latitude'], LA_data['Longitude'], LA_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_LA)  
    
map_LA

#### Define Foursquare Credentials and Version

In [12]:
import os

# Read the credentials from environment variables instead of hard-coding them in the notebook.
# The variable names below are an arbitrary choice.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20181223' # Foursquare API version


## 3. Explore Neighborhoods in LA

In [13]:
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe


In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` meters of each neighborhood center."""

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (LIMIT is a global that must be set before the first call)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
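
The function above reads `response -> groups[0] -> items` and takes the first category of every venue, which raises `IndexError` if a venue has no category. A minimal sketch of the same flattening step without any network call, assuming a hand-written payload (`mock_payload` and `flatten_items` are hypothetical names, and the second venue is invented to show the empty-category case):

```python
import pandas as pd

# Hypothetical payload trimmed to the fields the function reads; real responses carry more keys
mock_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Charm Thai Cuisine',
                   'location': {'lat': 34.136797, 'lng': -118.774384},
                   'categories': [{'name': 'Thai Restaurant'}]}},
        {'venue': {'name': 'Unlabelled Venue',
                   'location': {'lat': 34.1360, 'lng': -118.7740},
                   'categories': []}},
    ]}]}
}

def flatten_items(name, lat, lng, payload):
    """Turn one explore-style payload into a list of venue rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty category list
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
           'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
venues = pd.DataFrame(flatten_items('Agoura Hills', 34.136395, -118.774535, mock_payload),
                      columns=columns)
print(venues[['Venue', 'Venue Category']])
```

Rows with a missing category can then be dropped or kept explicitly before the one-hot encoding step.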

In [16]:
LIMIT = 200 # maximum number of venues requested from the Foursquare API per neighborhood
radius = 500 # search radius in meters

LA_venues = getNearbyVenues(names=LA_data['Neighborhood'],
                            latitudes=LA_data['Latitude'],
                            longitudes=LA_data['Longitude'],
                            radius=radius)

Agoura Hills
Agua Dulce
Alhambra
Alondra Park
Altadena
Angeles Crest
Arcadia
Arleta
Arlington Heights
Artesia
Athens
Atwater Village
Avalon
Avocado Heights
Azusa
Baldwin Park
Bel-Air
Bell
Bellflower
Bell Gardens
Beverly Hills
Beverlywood
Boyle Heights
Bradbury
Brentwood
Burbank
Calabasas
Canoga Park
Carson
Carthay
Castaic
Central-Alameda
Century City
Cerritos
Charter Oak
Chatsworth
Chatsworth Reservoir
Cheviot Hills
Chinatown
Citrus
Claremont
Commerce
Compton
Covina
Cudahy
Culver City
Cypress Park
Del Aire
Del Rey
Desert View Highlands
Diamond Bar
Downey
Downtown
Duarte
Eagle Rock
East Compton
East Hollywood
East La Mirada
East Los Angeles
East Pasadena
East San Gabriel
Echo Park
El Monte
El Segundo
El Sereno
Elysian Park
Elysian Valley
Encino
Exposition Park
Fairfax
Florence
Florence-Firestone
Gardena
Glassell Park
Glendale
Glendora
Gramercy Park
Granada Hills
Green Valley
Griffith Park
Hacienda Heights
Hancock Park
Hansen Dam
Harbor City
Harbor Gateway
Harvard Heights
Hasley Canyon
H

#### How many unique categories were returned?

In [17]:
print('There are {} unique categories.'.format(len(LA_venues['Venue Category'].unique())))

There are 339 unique categories.


In [18]:
print(LA_venues.shape)
LA_venues.head()

(4582, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agoura Hills,34.136395,-118.774535,Key Service Agoura Hills,34.136378,-118.773805,Locksmith
1,Agoura Hills,34.136395,-118.774535,Charm Thai Cuisine,34.136797,-118.774384,Thai Restaurant
2,Agua Dulce,34.496382,-118.325635,Maria Bonita Mexican Restaurant,34.494858,-118.326727,Mexican Restaurant
3,Agua Dulce,34.496382,-118.325635,Sweetwater Cafe,34.49483,-118.325997,Café
4,Agua Dulce,34.496382,-118.325635,Big Mouth Pizza,34.494961,-118.326481,Pizza Place


In [19]:
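# one hot encoding
LA_onehot = pd.get_dummies(LA_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# note: the shape below stays at 339 columns (not 340), so one venue category is itself
# named 'Neighborhood' and its one-hot column is overwritten by this assignment
LA_onehot['Neighborhood'] = LA_venues['Neighborhood']
LA_onehot.shape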

(4582, 339)

In [20]:
LA_grouped = LA_onehot.groupby('Neighborhood').mean().reset_index()
LA_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Advertising Agency,Airport,Airport Lounge,American Restaurant,Amphitheater,Antique Shop,...,Volleyball Court,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio
0,Agoura Hills,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
1,Agua Dulce,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
2,Alhambra,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.023810,0.000000,0.0,0.02381,0.000000,0.000000
3,Alondra Park,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
4,Altadena,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
5,Angeles Crest,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
6,Arcadia,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.111111,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
7,Arleta,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
8,Arlington Heights,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
9,Artesia,0.000000,0.000000,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,...,0.0,0.000000,0.0,0.0,0.000000,0.000000,0.0,0.00000,0.000000,0.000000
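
Each value in this table is the share of a neighborhood's venues that fall into a category, so every row sums to 1. A minimal sketch on made-up venues (the `toy_venues` frame is an assumption); it groups by an external Series rather than adding a `Neighborhood` column, which avoids clashing with a category of the same name:

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighborhood': ['Agoura Hills', 'Agoura Hills',
                     'Agua Dulce', 'Agua Dulce', 'Agua Dulce', 'Agua Dulce'],
    'Venue Category': ['Locksmith', 'Thai Restaurant',
                       'Pizza Place', 'Pizza Place', 'Café', 'Bakery'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)

# The mean of the 0/1 columns per neighborhood is the category frequency
freq = onehot.groupby(toy_venues['Neighborhood']).mean()
print(freq.round(2))
```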


In [21]:
num_top_venues = 5

for hood in LA_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = LA_grouped[LA_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agoura Hills----
                venue  freq
0           Locksmith   0.5
1     Thai Restaurant   0.5
2                 ATM   0.0
3            Pet Café   0.0
4  Persian Restaurant   0.0


----Agua Dulce----
           venue  freq
0    Pizza Place  0.25
1         Bakery  0.12
2  Grocery Store  0.12
3           Café  0.12
4      Gift Shop  0.12


----Alhambra----
               venue  freq
0             Bakery  0.07
1     Ice Cream Shop  0.07
2  Korean Restaurant  0.05
3   Sushi Restaurant  0.05
4       Burger Joint  0.05


----Alondra Park----
                  venue  freq
0        Breakfast Spot  0.29
1    Mexican Restaurant  0.29
2  Fast Food Restaurant  0.14
3            Hookah Bar  0.14
4      Asian Restaurant  0.14


----Altadena----
           venue  freq
0      Gift Shop  0.05
1    Pizza Place  0.05
2   Burger Joint  0.05
3          Diner  0.05
4  Grocery Store  0.05


----Angeles Crest----
                   venue  freq
0   Other Great Outdoors  0.67
1         Scenic Lookout 

In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # ranks beyond 'rd' take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = LA_grouped['Neighborhood']

for ind in np.arange(LA_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(LA_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agoura Hills,Locksmith,Thai Restaurant,Yoga Studio,Exhibit,Falafel Restaurant
1,Agua Dulce,Pizza Place,Gift Shop,Convenience Store,Bakery,Grocery Store
2,Alhambra,Bakery,Ice Cream Shop,Seafood Restaurant,Korean Restaurant,Sushi Restaurant
3,Alondra Park,Breakfast Spot,Mexican Restaurant,Fast Food Restaurant,Asian Restaurant,Hookah Bar
4,Altadena,Food Truck,Home Service,Grocery Store,Bank,Bakery
5,Angeles Crest,Other Great Outdoors,Scenic Lookout,Yoga Studio,Film Studio,Falafel Restaurant
6,Arcadia,Racetrack,Optical Shop,Track,American Restaurant,Night Market
7,Arleta,Movie Theater,Dog Run,Historic Site,Mexican Restaurant,Yoga Studio
8,Arlington Heights,Bakery,Restaurant,Art Gallery,Korean Restaurant,Health & Beauty Service
9,Artesia,Indian Restaurant,Fast Food Restaurant,Korean Restaurant,Bubble Tea Shop,Sandwich Place


### Get Neighborhoods with Japanese or Sushi restaurants

In [23]:
JP1 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['1st Most Common Venue'].isin(['Japanese Restaurant'])]
JP2 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['2nd Most Common Venue'].isin(['Japanese Restaurant'])]
JP3 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['3rd Most Common Venue'].isin(['Japanese Restaurant'])]
JP4 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['4th Most Common Venue'].isin(['Japanese Restaurant'])]
JP5 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['5th Most Common Venue'].isin(['Japanese Restaurant'])]

In [24]:
Su1 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['1st Most Common Venue'].isin(['Sushi Restaurant'])]
Su2 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['2nd Most Common Venue'].isin(['Sushi Restaurant'])]
Su3 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['3rd Most Common Venue'].isin(['Sushi Restaurant'])]
Su4 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['4th Most Common Venue'].isin(['Sushi Restaurant'])]
Su5 = neighborhoods_venues_sorted.loc[neighborhoods_venues_sorted['5th Most Common Venue'].isin(['Sushi Restaurant'])]

In [25]:
Japanese = pd.concat([JP1, JP2, JP3, JP4, JP5, Su1, Su2, Su3, Su4, Su5], ignore_index=True)
Japanese.shape

(23, 6)
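
Because ten separate selections are concatenated, a neighborhood that lists both categories in its top five (Gardena, in the cluster tables further down) appears twice among the 23 rows. A minimal sketch of a single-pass selection that keeps each neighborhood once, assuming a toy version of `neighborhoods_venues_sorted` with three rank columns; this is an alternative for illustration, not the selection used for the reported results:

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['Gardena', 'Azusa', 'Arcadia'],
    '1st Most Common Venue': ['Japanese Restaurant', 'Mexican Restaurant', 'Racetrack'],
    '2nd Most Common Venue': ['Sushi Restaurant', 'Coffee Shop', 'Optical Shop'],
    '3rd Most Common Venue': ['Bakery', 'Japanese Restaurant', 'Track'],
})

targets = ['Japanese Restaurant', 'Sushi Restaurant']
rank_cols = [c for c in toy.columns if c.endswith('Most Common Venue')]

# True for a row when any ranked column holds one of the target categories
mask = toy[rank_cols].isin(targets).any(axis=1)
selected = toy[mask].reset_index(drop=True)
print(selected['Neighborhood'].tolist())
```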

### Visualization of LA neighborhoods with Japanese or Sushi restaurants


In [27]:
JP_data = pd.merge(LA_data, Japanese,  how='inner')
JP_data.head()

Unnamed: 0,Latitude,Location,Longitude,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,34.093042,"(Alhambra, Los Angeles County, California, USA...",-118.12706,Alhambra,San Gabriel Valley,Bakery,Ice Cream Shop,Seafood Restaurant,Korean Restaurant,Sushi Restaurant
1,34.133875,"(Azusa, Los Angeles County, California, USA, (...",-117.905605,Azusa,San Gabriel Valley,Mexican Restaurant,Coffee Shop,Japanese Restaurant,Pharmacy,Pizza Place
2,33.98529,"(Del Rey, Culver Garden, LA, Los Angeles Count...",-118.425355,Del Rey,Westside,Japanese Restaurant,Lounge,Liquor Store,Gym / Fitness Center,Diner
3,33.940014,"(Downey, Los Angeles County, California, USA, ...",-118.132569,Downey,Southeast,Mexican Restaurant,Burger Joint,Chinese Restaurant,Asian Restaurant,Sushi Restaurant
4,34.147645,"(Pasadena, Los Angeles County, California, USA...",-118.144478,East Pasadena,San Gabriel Valley,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Burger Joint


In [28]:
# add markers to map
for lat, lng, label in zip(JP_data['Latitude'], JP_data['Longitude'], JP_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_LA)  
    
map_LA

## 4. Cluster Neighborhoods

In [30]:
JP_grouped = pd.merge(LA_grouped, Japanese,  how='inner')
JP_grouped.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Adult Boutique,Advertising Agency,Airport,Airport Lounge,American Restaurant,Amphitheater,Antique Shop,...,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alhambra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02381,0.0,0.0,Bakery,Ice Cream Shop,Seafood Restaurant,Korean Restaurant,Sushi Restaurant
1,Azusa,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,Mexican Restaurant,Coffee Shop,Japanese Restaurant,Pharmacy,Pizza Place
2,Del Rey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,Japanese Restaurant,Lounge,Liquor Store,Gym / Fitness Center,Diner
3,Downey,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,...,0.0,0.0,0.020833,0.0,0.0,Mexican Restaurant,Burger Joint,Chinese Restaurant,Asian Restaurant,Sushi Restaurant
4,East Pasadena,0.0,0.0,0.0,0.013699,0.0,0.0,0.082192,0.0,0.013699,...,0.013699,0.013699,0.0,0.0,0.0,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Burger Joint


In [31]:
print(list(JP_grouped.columns))

['Neighborhood', 'ATM', 'Accessories Store', 'Adult Boutique', 'Advertising Agency', 'Airport', 'Airport Lounge', 'American Restaurant', 'Amphitheater', 'Antique Shop', 'Aquarium', 'Arcade', 'Art Gallery', 'Art Museum', 'Arts & Crafts Store', 'Arts & Entertainment', 'Asian Restaurant', 'Athletics & Sports', 'Australian Restaurant', 'Auto Dealership', 'Auto Garage', 'Auto Workshop', 'Automotive Shop', 'BBQ Joint', 'Baby Store', 'Bagel Shop', 'Bakery', 'Bank', 'Bar', 'Baseball Field', 'Basketball Court', 'Basketball Stadium', 'Beach', 'Beer Bar', 'Beer Garden', 'Beer Store', 'Big Box Store', 'Bike Rental / Bike Share', 'Bike Shop', 'Bistro', 'Board Shop', 'Boat Rental', 'Boat or Ferry', 'Bookstore', 'Boutique', 'Bowling Alley', 'Brazilian Restaurant', 'Breakfast Spot', 'Brewery', 'Bridal Shop', 'Bubble Tea Shop', 'Buffet', 'Building', 'Burger Joint', 'Burrito Place', 'Bus Line', 'Bus Station', 'Bus Stop', 'Business Service', 'Butcher', 'Cafeteria', 'Café', 'Cajun / Creole Restaurant', 'C

### Get features for clustering

### 1- Bar feature

In [32]:
ft_col1 = [col for col in JP_grouped.columns if 'Wine' in col]
print(ft_col1)

['Wine Bar', 'Wine Shop', 'Winery']


In [33]:
ft_col2 = [col for col in JP_grouped.columns if 'Bar' in col]
print(ft_col2)

['Bar', 'Beer Bar', 'Cocktail Bar', 'Dive Bar', 'Gay Bar', 'Hookah Bar', 'Hotel Bar', 'Juice Bar', 'Karaoke Bar', 'Salon / Barbershop', 'Sports Bar', 'Whisky Bar', 'Wine Bar']


In [34]:
ft_col3 = [col for col in JP_grouped.columns if 'Beer' in col]
print(ft_col3)

['Beer Bar', 'Beer Garden', 'Beer Store']


In [35]:
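# note: 'Beer Bar' appears twice in this list, so it is counted twice in the row mean of the bar index;
# the list also includes shops ('Wine Shop', 'Beer Store') and leaves out 'Dive Bar', 'Gay Bar' and 'Sports Bar'
Bar_ft = ['Wine Bar', 'Wine Shop', 'Winery', 'Bar', 'Beer Bar', 'Cocktail Bar', 'Hotel Bar', 'Karaoke Bar' , 'Whisky Bar', 'Beer Bar', 'Beer Garden', 'Beer Store']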

In [36]:
JP_bar = JP_grouped[Bar_ft]
JP_bar_mean = JP_bar.mean(axis=1)

### 2- Restaurant feature

In [38]:
Restaurant_ft = [col for col in JP_grouped.columns if 'Restaurant' in col]
print(Restaurant_ft)

['American Restaurant', 'Asian Restaurant', 'Australian Restaurant', 'Brazilian Restaurant', 'Cajun / Creole Restaurant', 'Caribbean Restaurant', 'Chinese Restaurant', 'Cuban Restaurant', 'Dim Sum Restaurant', 'Dongbei Restaurant', 'Dumpling Restaurant', 'Eastern European Restaurant', 'English Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant', 'Filipino Restaurant', 'French Restaurant', 'Greek Restaurant', 'Hawaiian Restaurant', 'Hotpot Restaurant', 'Indian Restaurant', 'Indonesian Restaurant', 'Italian Restaurant', 'Japanese Restaurant', 'Korean Restaurant', 'Kosher Restaurant', 'Latin American Restaurant', 'Malay Restaurant', 'Mediterranean Restaurant', 'Mexican Restaurant', 'Middle Eastern Restaurant', 'Mongolian Restaurant', 'Moroccan Restaurant', 'New American Restaurant', 'North Indian Restaurant', 'Persian Restaurant', 'Peruvian Restaurant', 'Polish Restaurant', 'Ramen Restaurant', 'Restaurant', 'Russian Restaurant', 'Salvadoran Restaurant', 'Seafood Restaurant', 'Shabu-

In [39]:
JP_Restaurant = JP_grouped[Restaurant_ft]
JP_Restaurant_mean = JP_Restaurant.mean(axis=1)

### 3- All other features

In [40]:
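# drop the bar and restaurant columns; what remains is every other venue category
JP_Other = JP_grouped.drop(columns=Bar_ft + Restaurant_ft)

# the frame still holds text columns (Neighborhood, top-5 venues), so average the numeric columns only
JP_Other_mean = JP_Other.mean(axis=1, numeric_only=True)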

In [41]:
JP_grouped_clustering = pd.DataFrame({'Bar': JP_bar_mean, 'Restaurant': JP_Restaurant_mean, 'Others': JP_Other_mean})
JP_grouped_clustering.head()

Unnamed: 0,Bar,Restaurant,Others
0,0.003968,0.005291,0.002442
1,0.0,0.004444,0.002784
2,0.0,0.003704,0.00293
3,0.003472,0.007716,0.001984
4,0.006849,0.005327,0.002308
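
Each index is a row-wise mean over one group of frequency columns. A minimal sketch on a made-up frequency table (the column names and numbers in `freq` are assumptions), with each category assigned to exactly one group:

```python
import pandas as pd

# Toy frequency table; the numbers are invented for illustration
freq = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.10, 0.00],
    'Wine Bar': [0.10, 0.00],
    'Sushi Restaurant': [0.20, 0.50],
    'Thai Restaurant': [0.20, 0.10],
    'Bakery': [0.30, 0.20],
    'Gym': [0.10, 0.20],
})

bar_cols = ['Bar', 'Wine Bar']
restaurant_cols = [c for c in freq.columns if 'Restaurant' in c]
other_cols = [c for c in freq.columns
              if c not in bar_cols + restaurant_cols + ['Neighborhood']]

# Row-wise mean of each column group gives one index per neighborhood
indexes = pd.DataFrame({
    'Bar': freq[bar_cols].mean(axis=1),
    'Restaurant': freq[restaurant_cols].mean(axis=1),
    'Others': freq[other_cols].mean(axis=1),
})
print(indexes)
```

Since each index averages over a different number of columns, the values compare groups of unequal size; that is worth keeping in mind when reading the table above.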


In [42]:
# set number of clusters
kclusters = 4

#JP_grouped_clustering = JP_grouped.drop(['Neighborhood', '1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue',], 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(JP_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 2, 2, 0, 1, 1, 0, 3], dtype=int32)
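
To see what k-means does with these three indexes, here is a minimal sketch that fits the five rows shown in the table above. It uses 2 clusters only because five points cannot support 4 meaningful clusters, and sets `n_init` explicitly; both choices are assumptions for illustration and differ from the run above:

```python
import pandas as pd
from sklearn.cluster import KMeans

# First five rows of JP_grouped_clustering as printed in this notebook
rows = pd.DataFrame({
    'Bar': [0.003968, 0.0, 0.0, 0.003472, 0.006849],
    'Restaurant': [0.005291, 0.004444, 0.003704, 0.007716, 0.005327],
    'Others': [0.002442, 0.002784, 0.00293, 0.001984, 0.002308],
})

# Fit on the three index columns, then attach one label per row
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(rows)
labeled = rows.assign(Cluster_Labels=model.labels_)
print(labeled)
print(model.cluster_centers_)
```

The label numbers themselves are arbitrary; only the grouping of rows carries meaning.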

In [43]:
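JP_merged = JP_data.copy() # work on a copy so JP_data keeps its original columns

# add clustering labels (assumes the rows of JP_data are in the same order as JP_grouped)
JP_merged['Cluster_Labels'] = kmeans.labels_
JP_merge = JP_merged.drop(columns='Location')

JP_merge.head()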

Unnamed: 0,Latitude,Longitude,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster_Labels
0,34.093042,-118.12706,Alhambra,San Gabriel Valley,Bakery,Ice Cream Shop,Seafood Restaurant,Korean Restaurant,Sushi Restaurant,2
1,34.133875,-117.905605,Azusa,San Gabriel Valley,Mexican Restaurant,Coffee Shop,Japanese Restaurant,Pharmacy,Pizza Place,0
2,33.98529,-118.425355,Del Rey,Westside,Japanese Restaurant,Lounge,Liquor Store,Gym / Fitness Center,Diner,0
3,33.940014,-118.132569,Downey,Southeast,Mexican Restaurant,Burger Joint,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,2
4,34.147645,-118.144478,East Pasadena,San Gabriel Valley,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Burger Joint,2


In [45]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(JP_merged['Latitude'], JP_merged['Longitude'], JP_merged['Neighborhood'], JP_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5- Characteristics of each cluster:

In [46]:
JP_merge['Bar'] = JP_grouped_clustering.Bar
JP_merge['Restaurant'] = JP_grouped_clustering.Restaurant
JP_merge['Others'] = JP_grouped_clustering.Others

### Group by cluster

In [49]:
agg_clus = JP_merge.groupby('Cluster_Labels')[['Bar', 'Restaurant', 'Others']].mean()
agg_clus['Num Of Neighborhoods'] = JP_merge.groupby('Cluster_Labels')['Cluster_Labels'].count()
agg_clus

Cluster_Labels,Bar,Restaurant,Others,Num Of Neighborhoods
0,0.0,0.00469,0.002726,9
1,0.0,0.008436,0.001994,7
2,0.004921,0.00582,0.002296,6
3,0.0,0.012963,0.001099,1
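
The report cites one-way ANOVA, but the notebook does not show the call. A minimal sketch, assuming SciPy's `scipy.stats.f_oneway` and comparing the bar index with the other index for the six neighborhoods with `Cluster_Labels == 2`; whether this is the exact comparison behind the reported p-values is an assumption:

```python
from scipy.stats import f_oneway

# Index values of the six neighborhoods with Cluster_Labels == 2,
# copied from the cluster 3 detail table in this notebook
bar_index = [0.003968, 0.003472, 0.006849, 0.006849, 0.003012, 0.005376]
other_index = [0.002442, 0.001984, 0.002308, 0.002308, 0.002604, 0.002127]

# One-way ANOVA across the two groups of index values
f_stat, p_value = f_oneway(bar_index, other_index)
print('F = {:.2f}, p = {:.4f}'.format(f_stat, p_value))
```

With only two groups, one-way ANOVA is equivalent to an independent two-sample t-test with equal variances, so a sample of six values per group gives limited power.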


### Group by Region

In [50]:
agg_region = JP_merge.groupby('Region')[['Bar', 'Restaurant', 'Others']].mean()
agg_region['Num Of Neighborhoods'] = JP_merge.groupby('Region')['Region'].count()
agg_region

Region,Bar,Restaurant,Others,Num Of Neighborhoods
Antelope Valley,0.0,0.008715,0.001939,2
Central L.A.,0.0,0.002946,0.002997,1
Harbor,0.0,0.007407,0.002198,1
Northwest County,0.0,0.006173,0.002442,1
San Fernando Valley,0.000753,0.004757,0.002689,4
San Gabriel Valley,0.001803,0.006952,0.002209,6
South Bay,0.0,0.008509,0.00198,2
Southeast,0.003472,0.007716,0.001984,1
Verdugos,0.003425,0.00575,0.002375,2
Westside,0.001792,0.00607,0.002383,3
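
The per-region means and the neighborhood count can also be produced in one call. A minimal sketch using pandas named aggregation (available from pandas 0.25), on three rows taken from the cluster tables in this notebook; the frame `toy` and the output column name `num_neighborhoods` are illustrative choices:

```python
import pandas as pd

toy = pd.DataFrame({
    'Region': ['Westside', 'Westside', 'South Bay'],
    'Bar': [0.0, 0.005376, 0.0],
    'Restaurant': [0.003704, 0.006571, 0.008509],
})

# Each keyword names an output column as (input column, aggregation)
summary = toy.groupby('Region').agg(
    Bar=('Bar', 'mean'),
    Restaurant=('Restaurant', 'mean'),
    num_neighborhoods=('Bar', 'count'),
)
print(summary)
```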


### Details of each cluster

In [52]:
# cluster 1
JP_merge.loc[JP_merge['Cluster_Labels'] == 0, JP_merge.columns[list(range(2, JP_merge.shape[1]))]]

Unnamed: 0,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster_Labels,Bar,Restaurant,Others
1,Azusa,San Gabriel Valley,Mexican Restaurant,Coffee Shop,Japanese Restaurant,Pharmacy,Pizza Place,0,0.0,0.004444,0.002784
2,Del Rey,Westside,Japanese Restaurant,Lounge,Liquor Store,Gym / Fitness Center,Diner,0,0.0,0.003704,0.00293
5,Encino,San Fernando Valley,Pizza Place,Japanese Restaurant,ATM,Mediterranean Restaurant,Chinese Restaurant,0,0.0,0.00463,0.002747
8,Hasley Canyon,Northwest County,Sushi Restaurant,Coffee Shop,Shopping Mall,Mexican Restaurant,Sandwich Place,0,0.0,0.006173,0.002442
15,Sepulveda Basin,San Fernando Valley,Intersection,Lake,Bagel Shop,Sushi Restaurant,Farm,0,0.0,0.00463,0.002747
17,South Pasadena,San Gabriel Valley,Pizza Place,Grocery Store,Chinese Restaurant,Japanese Restaurant,Restaurant,0,0.0,0.004428,0.002787
18,Tujunga,Verdugos,Mexican Restaurant,Sports Bar,Sushi Restaurant,Pizza Place,Bakery,0,0.0,0.006173,0.002442
21,Windsor Square,Central L.A.,Coffee Shop,Bakery,Italian Restaurant,Juice Bar,Sushi Restaurant,0,0.0,0.002946,0.002997
22,Woodland Hills,San Fernando Valley,Sushi Restaurant,Ice Cream Shop,Mediterranean Restaurant,Burger Joint,Spa,0,0.0,0.005084,0.002657


In [53]:
# cluster 2
JP_merge.loc[JP_merge['Cluster_Labels'] == 1, JP_merge.columns[list(range(2, JP_merge.shape[1]))]]

Unnamed: 0,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster_Labels,Bar,Restaurant,Others
6,Gardena,South Bay,Japanese Restaurant,Sushi Restaurant,Bakery,Noodle House,Pizza Place,1,0.0,0.008509,0.00198
7,Gardena,South Bay,Japanese Restaurant,Sushi Restaurant,Bakery,Noodle House,Pizza Place,1,0.0,0.008509,0.00198
10,Lakewood,Harbor,Fast Food Restaurant,Japanese Restaurant,Cosmetics Shop,Convenience Store,Supermarket,1,0.0,0.007407,0.002198
11,Northwest Palmdale,Antelope Valley,Pizza Place,Fast Food Restaurant,Japanese Restaurant,Dance Studio,Optical Shop,1,0.0,0.008715,0.001939
12,Palmdale,Antelope Valley,Pizza Place,Fast Food Restaurant,Japanese Restaurant,Dance Studio,Optical Shop,1,0.0,0.008715,0.001939
13,Palms,Westside,Yoga Studio,Italian Restaurant,Japanese Restaurant,Café,Asian Restaurant,1,0.0,0.007937,0.002093
19,Walnut,San Gabriel Valley,Asian Restaurant,Pizza Place,Donut Shop,Thai Restaurant,Sushi Restaurant,1,0.0,0.009259,0.001832


In [54]:
# cluster 3
JP_merge.loc[JP_merge['Cluster_Labels'] == 2, JP_merge.columns[list(range(2, JP_merge.shape[1]))]]

Unnamed: 0,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster_Labels,Bar,Restaurant,Others
0,Alhambra,San Gabriel Valley,Bakery,Ice Cream Shop,Seafood Restaurant,Korean Restaurant,Sushi Restaurant,2,0.003968,0.005291,0.002442
3,Downey,Southeast,Mexican Restaurant,Burger Joint,Chinese Restaurant,Asian Restaurant,Sushi Restaurant,2,0.003472,0.007716,0.001984
4,East Pasadena,San Gabriel Valley,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Burger Joint,2,0.006849,0.005327,0.002308
14,Pasadena,Verdugos,American Restaurant,Sushi Restaurant,Steakhouse,Coffee Shop,Burger Joint,2,0.006849,0.005327,0.002308
16,Sherman Oaks,San Fernando Valley,Sushi Restaurant,Burger Joint,Coffee Shop,Clothing Store,Pet Store,2,0.003012,0.004685,0.002604
20,West Los Angeles,Westside,Middle Eastern Restaurant,Indie Movie Theater,Japanese Restaurant,Pizza Place,Szechuan Restaurant,2,0.005376,0.006571,0.002127


In [55]:
# cluster 4
JP_merge.loc[JP_merge['Cluster_Labels'] == 3, JP_merge.columns[list(range(2, JP_merge.shape[1]))]]

Unnamed: 0,Neighborhood,Region,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster_Labels,Bar,Restaurant,Others
9,Irwindale,San Gabriel Valley,Fast Food Restaurant,Mexican Restaurant,Sandwich Place,Japanese Restaurant,Middle Eastern Restaurant,3,0.0,0.012963,0.001099
