# Korean Restaurants in Los Angeles

A reminder of the purpose of this exercise:
A company that operates a chain of Korean Restaurants is considering expansion into Los Angeles. They do not know the city, and have asked us to identify the best Districts for them to set up, based on:
 - As few as possible nearby Korean Restaurants
 - Reasonable population density
 - Good level of local non-restaurant amenities

I have used the list of Districts and the 2010 Census data available at http://www.laalmanac.com/, in particular:
 - the list of zip codes: http://www.laalmanac.com/communications/cm02_communities.php
 - the list of Districts' populations: http://www.laalmanac.com/population/po24la.php

The data was messy and required cleaning so that naming conventions (e.g. District names) were identical across both sources. This was not practical to do in pandas. In any case, we now have 105 Districts to work with. Let's start by bringing in our data on Los Angeles Districts.

In [72]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,District,Zip Code(s)
0,Arleta,91331
1,Arlington Heights,90019
2,Atwater Village,90039
3,Baldwin Hills,90008
4,Bel-Air,90049
5,Beverly Glen,90077
6,Boyle Heights,"90033, 90063"
7,Brentwood,90049
8,Byzantine-Latino Quarter,90006
9,Canoga Park,91304


In [73]:
#We can see that some Districts have several zip codes attributed. We will assume that the first zip code
#is the main one for that area, and remove the others
df_x = df_dist['Zip Code(s)'].str.split(pat=',',expand=True)
df_x

Unnamed: 0,0,1,2,3,4,5
0,91331,,,,,
1,90019,,,,,
2,90039,,,,,
3,90008,,,,,
4,90049,,,,,
5,90077,,,,,
6,90033,90063.0,,,,
7,90049,,,,,
8,90006,,,,,
9,91304,,,,,


In [74]:
df_dist['Zip Code(s)'] = df_x[0]
df_dist

Unnamed: 0,District,Zip Code(s)
0,Arleta,91331
1,Arlington Heights,90019
2,Atwater Village,90039
3,Baldwin Hills,90008
4,Bel-Air,90049
5,Beverly Glen,90077
6,Boyle Heights,90033
7,Brentwood,90049
8,Byzantine-Latino Quarter,90006
9,Canoga Park,91304
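
The split-then-select steps above can also be written as one chained expression. This is a minimal sketch on a hypothetical stand-in frame `demo` (not the notebook's `df_dist`); the extra `.str.strip()` is an assumption on my part, added to guard against stray spaces around a code.

```python
import pandas as pd

# Hypothetical stand-in for df_dist, including one district with two zip codes
demo = pd.DataFrame({
    "District": ["Arleta", "Boyle Heights"],
    "Zip Code(s)": ["91331", "90033, 90063"],
})

# Keep only the first zip code listed for each district
demo["Zip"] = demo["Zip Code(s)"].str.split(",").str[0].str.strip()
first_zips = demo["Zip"].tolist()
print(first_zips)
```
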


In [75]:
#Let's rename the Zip Codes column to something easier
df_dist.rename({'Zip Code(s)': 'Zip'}, axis=1, inplace=True)
df_dist

Unnamed: 0,District,Zip
0,Arleta,91331
1,Arlington Heights,90019
2,Atwater Village,90039
3,Baldwin Hills,90008
4,Bel-Air,90049
5,Beverly Glen,90077
6,Boyle Heights,90033
7,Brentwood,90049
8,Byzantine-Latino Quarter,90006
9,Canoga Park,91304


Ok so we are happy with the dataframe showing Districts and their Zip locations. Let's move on to the population data.

In [76]:
body = client_3a06e2bbe0894a1cb24c04dd63182127.get_object(Bucket='courseracapstone-donotdelete-pr-vtxgmrryvvil1p',Key='LA_Pop_Clean.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_pop = pd.read_csv(body)
df_pop


Unnamed: 0,District,Population
0,Arleta,34492
1,Arlington Heights,17618
2,Atwater Village,14101
3,Baldwin Hills,26303
4,Bel-Air,8261
5,Beverly Glen,4341
6,Boyle Heights,91193
7,Brentwood,30964
8,Byzantine-Latino Quarter,20919
9,Canoga Park,48272


In [77]:
# The population data looks good, but let's remove the comma separator and ensure that Population is stored as an integer
df_pop['Population'] = df_pop['Population'].str.replace(',', '')

In [78]:
df_pop.dtypes

District      object
Population    object
dtype: object

In [79]:
#Cast population as an integer
df_pop['Population'] = df_pop['Population'].astype(int)
df_pop.dtypes

District      object
Population     int64
dtype: object

In [80]:
df_pop.head()

Unnamed: 0,District,Population
0,Arleta,34492
1,Arlington Heights,17618
2,Atwater Village,14101
3,Baldwin Hills,26303
4,Bel-Air,8261
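
The comma removal and integer cast can also be handled at load time. The sketch below assumes a small hypothetical CSV held in memory (the real file lives in cloud object storage) and uses the `thousands` option of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical CSV text with comma-formatted populations
raw = io.StringIO('District,Population\nArleta,"34,492"\nBel-Air,"8,261"\n')

# thousands=',' tells the parser to treat the comma as a digit separator
pop = pd.read_csv(raw, thousands=",")
populations = pop["Population"].tolist()
print(pop.dtypes)
```

With this option the Population column is parsed as a number directly, so no separate string replacement or cast is needed.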


Ok so we are happy with the dataframe showing Populations by District. Now let's join the 2 dataframes together before we move on to location data.

In [81]:
df_la = pd.concat((df_dist, df_pop['Population']), axis=1)
df_la

Unnamed: 0,District,Zip,Population
0,Arleta,91331,34492
1,Arlington Heights,90019,17618
2,Atwater Village,90039,14101
3,Baldwin Hills,90008,26303
4,Bel-Air,90049,8261
5,Beverly Glen,90077,4341
6,Boyle Heights,90033,91193
7,Brentwood,90049,30964
8,Byzantine-Latino Quarter,90006,20919
9,Canoga Park,91304,48272
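
Note that `pd.concat(..., axis=1)` pairs rows by position, so it relies on both frames listing the Districts in exactly the same order. As a hedged alternative, the sketch below joins on the District name instead. The frames `dist` and `pop` are hypothetical stand-ins, deliberately given in different row orders.

```python
import pandas as pd

# Hypothetical frames whose rows are in different orders
dist = pd.DataFrame({"District": ["Arleta", "Bel-Air"], "Zip": ["91331", "90049"]})
pop = pd.DataFrame({"District": ["Bel-Air", "Arleta"], "Population": [8261, 34492]})

# Join on the shared key; validate raises an error if a district is duplicated on either side
merged = dist.merge(pop, on="District", how="left", validate="one_to_one")
print(merged)
```
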


We need to populate the Latitude/Longitude data into our table next. We have this data, by zip code, at this location - https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=CA. We have downloaded it as a CSV.

In [82]:
body = client_3a06e2bbe0894a1cb24c04dd63182127.get_object(Bucket='courseracapstone-donotdelete-pr-vtxgmrryvvil1p',Key='us-zip-code-latitude-and-longitude.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_geo = pd.read_csv(body)
df_geo.head()

Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,95717,Gold Run,CA,39.177026,-120.8451,-8,1,39.177026
1,94564,Pinole,CA,37.997509,-122.29208,-8,1,37.997509
2,91605,North Hollywood,CA,34.208142,-118.4011,-8,1,34.208142
3,91102,Pasadena,CA,33.786594,-118.298662,-8,1,33.786594
4,95019,Freedom,CA,36.935552,-121.77972,-8,1,36.935552


In [83]:
#Let's drop all columns apart from Zip and Latitude/Longitude
df_geo = df_geo.drop(['City','State','Timezone','Daylight savings time flag','geopoint'], axis=1)


In [84]:
df_geo.head()

Unnamed: 0,Zip,Latitude,Longitude
0,95717,39.177026,-120.8451
1,94564,37.997509,-122.29208
2,91605,34.208142,-118.4011
3,91102,33.786594,-118.298662
4,95019,36.935552,-121.77972


In [85]:
#Let's make zip the index
df_geo.set_index('Zip', inplace=True)

In [86]:
df_la['Zip'] = df_la['Zip'].astype(int)

In [87]:
#Let's also make zip the index of our original combined dataframe

In [88]:
df_la.set_index('Zip', inplace=True)
df_la

Zip,District,Population
91331,Arleta,34492
90019,Arlington Heights,17618
90039,Atwater Village,14101
90008,Baldwin Hills,26303
90049,Bel-Air,8261
90077,Beverly Glen,4341
90033,Boyle Heights,91193
90049,Brentwood,30964
90006,Byzantine-Latino Quarter,20919
91304,Canoga Park,48272


In [89]:
df_comb = df_la.join(df_geo, lsuffix='1', rsuffix='2')

In [90]:
df_comb.reset_index(inplace=True)

In [91]:
df_comb

Unnamed: 0,Zip,District,Population,Latitude,Longitude
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845
1,90002,Watts,3513,33.948315,-118.24845
2,90004,Hancock Park,4615,34.07711,-118.30755
3,90004,Rampart Village,21060,34.07711,-118.30755
4,90004,Virgil Village,32625,34.07711,-118.30755
5,90004,Wilshire Center,65232,34.07711,-118.30755
6,90004,Windsor Square,5275,34.07711,-118.30755
7,90005,Wilshire Park,10845,34.058911,-118.30848
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943
9,90006,Harvard Heights,7848,34.048351,-118.2943


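df_comb now contains all the data we need on these LA Districts. Let's quickly add a rough proxy for population density: each District's population divided by the total population of all Districts. Strictly speaking this is a population share rather than a density (people per unit area), since the data above does not include District areas.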

In [92]:
df_comb['Pop Density'] = df_comb['Population']/df_comb['Population'].sum()

In [93]:
df_comb

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056
1,90002,Watts,3513,33.948315,-118.24845,0.001006
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248
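
A minimal sketch of the same calculation on a hypothetical three-row sample (populations taken from the table above). Because each value is a fraction of the total, the column should sum to 1, which makes a quick sanity check.

```python
import pandas as pd

# Hypothetical three-district sample
sample = pd.DataFrame({
    "District": ["Watts", "Hancock Park", "Wilshire Park"],
    "Population": [3513, 4615, 10845],
})

# Each district's fraction of the sample's total population
sample["Pop Share"] = sample["Population"] / sample["Population"].sum()
total_share = sample["Pop Share"].sum()
print(sample)
```
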


Let's move on to getting Foursquare data. Recall that we want specific information on Korean Restaurants, and specific information on non-restaurant locations.

In [2]:
#FourSquare credentials - redacted
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [95]:
#Import required modules

import numpy as np # library to handle data in a vectorized manner

!conda install -c conda-forge geopy --yes # install geopy if it is not already available
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors




In [96]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


In [97]:
import folium

In [98]:
#Geocode Los Angeles with Nominatim (a user_agent is required) to get coordinates for centring the map

address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="la_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Los Angeles are 34.0536909, -118.242766.


In [99]:
#Mapping out the Los Angeles neighbourhoods we have
map_la = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, district in zip(df_comb['Latitude'], df_comb['Longitude'], df_comb['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la)  
    
map_la

In [100]:
#Find the first District for venue exploration
df_comb.loc[0,'District']

'Southeast Los Angeles'

In [101]:
#Details of the first District
neighborhood_latitude = df_comb.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_comb.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_comb.loc[0, 'District'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Southeast Los Angeles are 33.948315, -118.24845.


In [3]:
#Make the first call to FourSquare API for Southeast Los Angeles

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # search radius in metres

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude,
    neighborhood_longitude,
    radius,
    LIMIT)
url # display URL

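
The URL above is assembled with string formatting. As an illustration only, the same query string can be built from a dictionary so that each parameter is URL-encoded automatically. The helper name `build_explore_url` and the placeholder credentials are hypothetical and not part of the original notebook.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return a venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_explore_url("ID", "SECRET", "20180604", 33.948315, -118.24845, 500, 100)
print(example_url)
```

The comma in the `ll` value is percent-encoded as `%2C`, which is the standard encoding for a comma in a query string.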

In [103]:
#Get requests Library
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

In [104]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff9ab0dcde0a63d8a9d3b92'},
  'headerLocation': 'Watts',
  'headerFullLocation': 'Watts, Los Angeles',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 33.9528150045, 'lng': -118.24303544079379},
   'sw': {'lat': 33.9438149955, 'lng': -118.25386455920622}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5afdaa7d464d65002ca6468e',
       'name': 'Gemini Squad',
       'location': {'address': 'Gemini Sqaud',
        'crossStreet': 'California',
        'lat': 33.948041499079906,
        'lng': -118.24777007102966,
        'labeledLatLngs': [{'label': 'display',
          'lat': 33.948041499079906,
          'lng': -118.24777007102966}],
        'distance': 69,
        'postalCode':

In [105]:
# function that extracts the category of the venue
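def get_category_type(row):
    # the column is named 'categories' or 'venue.categories', depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']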

In [106]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Gemini Squad,IT Services,33.948041,-118.24777
1,McKevins Pub,Bar,33.944476,-118.246948
2,Watts Senior Center & Rose Garden,Park,33.9459,-118.244087


In [107]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
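
The list comprehension in the function above indexes `categories[0]`, which would raise an `IndexError` for any venue returned without a category. The sketch below shows one way to flatten the items defensively. The helper `venue_rows` and the `fake_items` payload are hypothetical, shaped after the response shown earlier, and no network call is made.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hypothetical payload: one venue with a category, one without
fake_items = [
    {"venue": {"name": "McKevins Pub",
               "location": {"lat": 33.944476, "lng": -118.246948},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Unlabelled Spot",
               "location": {"lat": 33.9459, "lng": -118.244087},
               "categories": []}},
]
rows = venue_rows("Watts", 33.948315, -118.24845, fake_items)
print(rows)
```
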

In [108]:
la_venues = getNearbyVenues(names=df_comb['District'],
                                   latitudes=df_comb['Latitude'],
                                   longitudes=df_comb['Longitude']
                                  )

Southeast Los Angeles
Watts
Hancock Park
Rampart Village
Virgil Village
Wilshire Center
Windsor Square
Wilshire Park
Byzantine-Latino Quarter
Harvard Heights
Koreatown
University Park
Baldwin Hills
Crenshaw
Leimert Park
Chinatown
Downtown Civic Center
Downtown Little Tokyo
Downtown Fashion District
Downtown Arts District
Downtown Historic Core
Downtown South Park
West Adams
Downtown Bunker Hill
Downtown City West
Jefferson Park
Arlington Heights
Country Club Park
Mid-City
Western Wilton
Downtown Southeast
Westwood
West Los Angeles
Echo Park
Silver Lake
Griffith Park
Hollywood
Los Feliz
East Hollywood
Lincoln Heights
El Sereno
Boyle Heights
Palms
Fairfax
Melrose
Miracle Mile
South Los Angeles
Atwater Village
Elysian Valley
Eagle Rock
Highland Park
Hyde Park
Westchester
Mid-City West
Bel-Air
Brentwood
Westlake
Harbor Gateway
Cheviot Hills
Rancho Park
Cypress Park
Glassell Park
Mount Washington
Century City
Beverly Glen
Playa Vista
Pacific Palisades
Topanga State Park
Venice
Playa del Rey

In [109]:
la_venues.head(50)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Southeast Los Angeles,33.948315,-118.24845,Gemini Squad,33.948041,-118.24777,IT Services
1,Southeast Los Angeles,33.948315,-118.24845,McKevins Pub,33.944476,-118.246948,Bar
2,Southeast Los Angeles,33.948315,-118.24845,Watts Senior Center & Rose Garden,33.9459,-118.244087,Park
3,Watts,33.948315,-118.24845,Gemini Squad,33.948041,-118.24777,IT Services
4,Watts,33.948315,-118.24845,McKevins Pub,33.944476,-118.246948,Bar
5,Watts,33.948315,-118.24845,Watts Senior Center & Rose Garden,33.9459,-118.244087,Park
6,Hancock Park,34.07711,-118.30755,Kim Sun Young Hair Beauty Salon (Kim Sun Young...,34.076453,-118.308921,Salon / Barbershop
7,Hancock Park,34.07711,-118.30755,Jaraguá,34.076364,-118.306646,Cocktail Bar
8,Hancock Park,34.07711,-118.30755,Noshi Sushi,34.076159,-118.305374,Sushi Restaurant
9,Hancock Park,34.07711,-118.30755,Lab Coffee and Roasters,34.078942,-118.309562,Coffee Shop


In [110]:
# one hot encoding
la_onehot = pd.get_dummies(la_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
la_onehot['Neighborhood'] = la_venues['Neighborhood']

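# move neighborhood column to the first column
# (the hard-coded -92 is the position of 'Neighborhood'; the [:-92] slice keeps only the columns before it)
fixed_columns = [la_onehot.columns[-92]] + list(la_onehot.columns[:-92])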
la_onehot = la_onehot[fixed_columns]
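
A position-independent way to put the label column first avoids relying on a hard-coded offset, which changes whenever the set of venue categories changes. This is a sketch on a hypothetical `venues` frame. Naming the label column `District` is my assumption, chosen so it is unlikely to collide with a venue category of the same name.

```python
import pandas as pd

# Hypothetical venues table
venues = pd.DataFrame({
    "Neighborhood": ["Watts", "Watts", "Hancock Park"],
    "Venue Category": ["Bar", "Park", "Sushi Restaurant"],
})

onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")

# Insert the label at position 0; every category column is kept
onehot.insert(0, "District", venues["Neighborhood"])
columns = list(onehot.columns)
print(columns)
```
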

In [111]:
la_onehot.head()

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Carpet Store,Casino,Check Cashing Service,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Residence Hall,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Shop,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market,Flight School,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Motorsports Shop,Movie Theater,Multiplex,Music Venue
0,Southeast Los Angeles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Southeast Los Angeles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Southeast Los Angeles,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Watts,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Watts,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [112]:
la_onehot.shape

(2632, 173)

Let's grab some useful statistics at this point - namely:
a) Count of Korean Restaurants per district
b) Count of non-Korean Restaurants per district
c) Count of non-restaurant amenities per district
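
As a compact illustration of these three statistics, the sketch below derives them from a hypothetical per-district count table `grouped` (invented values, not the notebook's `la_grouped`). It treats any column whose name contains "Restaurant" as a restaurant category, and computes the non-Korean count as total restaurants minus Korean restaurants.

```python
import pandas as pd

# Hypothetical per-district category counts
grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Korean Restaurant": [1, 0],
    "Mexican Restaurant": [2, 1],
    "Park": [1, 3],
}).set_index("Neighborhood")

# Boolean mask marking the restaurant columns
is_restaurant = grouped.columns.str.contains("Restaurant")

korean = grouped["Korean Restaurant"]
total_restaurants = grouped.loc[:, is_restaurant].sum(axis=1)
other = grouped.loc[:, ~is_restaurant].sum(axis=1)

summary = pd.DataFrame({
    "Korean": korean,
    "Non-Korean": total_restaurants - korean,
    "Other Amenities": other,
})
print(summary)
```
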

In [113]:
#Use sum of occurrences of each category
la_grouped = la_onehot.groupby('Neighborhood').sum().reset_index()
la_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Basketball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bistro,Board Shop,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Line,Bus Station,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Campground,Candy Store,Cantonese Restaurant,Carpet Store,Casino,Check Cashing Service,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Residence Hall,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Shop,Doctor's Office,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market,Flight School,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,Hobby Shop,Home Service,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Motorsports Shop,Movie Theater,Multiplex,Music Venue
0,Arleta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,Arlington Heights,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,2,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,2,0,0,0,0,0,0,0,0
2,Atwater Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
3,Baldwin Hills,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0
4,Bel-Air,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Beverly Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Boyle Heights,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
7,Brentwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Byzantine-Latino Quarter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
9,Canoga Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [114]:
#Create a dataframe with only Restaurants
df_rest = la_grouped.filter(regex='Restaurant')
df_rest

Unnamed: 0,American Restaurant,Asian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
3,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0


In [115]:
#Preview the restaurants-only dataframe indexed by Neighborhood (the result is not assigned, so df_rest keeps its default index)
df_rest.set_index(la_grouped['Neighborhood'])

Neighborhood,American Restaurant,Asian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Restaurant,Cuban Restaurant,Dim Sum Restaurant,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant
Arleta,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
Arlington Heights,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0
Atwater Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
Baldwin Hills,0,0,0,0,0,1,0,0,0,0,0,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0
Bel-Air,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Beverly Glen,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Boyle Heights,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
Brentwood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Byzantine-Latino Quarter,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0
Canoga Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0


In [116]:
# Let's create a new dataframe with the statistics we want
df_counts = pd.DataFrame() 

In [117]:
df_counts['Neighborhood'] = la_grouped['Neighborhood']

In [118]:
df_counts['Korean Total'] = df_rest['Korean Restaurant']

In [119]:
df_counts['Total Restaurants'] = df_rest.sum(axis=1, numeric_only=True)

In [120]:
df_counts

Unnamed: 0,Neighborhood,Korean Total,Total Restaurants
0,Arleta,0,1
1,Arlington Heights,0,5
2,Atwater Village,0,1
3,Baldwin Hills,0,6
4,Bel-Air,0,0
5,Beverly Glen,0,0
6,Boyle Heights,0,1
7,Brentwood,0,0
8,Byzantine-Latino Quarter,1,2
9,Canoga Park,0,3


In [121]:
#Lastly let's add total amenities per Neighbourhood, and from that easily derive total non-restaurant amenities too
df_counts['Total Locations'] = la_grouped.sum(axis=1, numeric_only=True)

In [122]:
df_counts['Other Amenities'] = df_counts['Total Locations'] - df_counts['Total Restaurants']
df_counts

Unnamed: 0,Neighborhood,Korean Total,Total Restaurants,Total Locations,Other Amenities
0,Arleta,0,1,3,2
1,Arlington Heights,0,5,26,21
2,Atwater Village,0,1,4,3
3,Baldwin Hills,0,6,24,18
4,Bel-Air,0,0,4,4
5,Beverly Glen,0,0,3,3
6,Boyle Heights,0,1,3,2
7,Brentwood,0,0,4,4
8,Byzantine-Latino Quarter,1,2,10,8
9,Canoga Park,0,3,6,3
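
The counting steps above can be tried on a small made-up table. This is only a sketch: it assumes that `la_grouped` holds one row per Neighborhood with venue counts per category, and that the restaurant categories are the columns whose names end in "Restaurant". The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for la_grouped: venue counts per category.
la_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Korean Restaurant": [0, 2, 1],
    "Mexican Restaurant": [1, 3, 0],
    "Park": [2, 0, 4],
    "Gym": [0, 1, 1],
})

# Assumption: restaurant categories are the columns ending in "Restaurant".
rest_cols = [c for c in la_grouped.columns if c.endswith("Restaurant")]

df_counts = pd.DataFrame({
    "Neighborhood": la_grouped["Neighborhood"],
    "Korean Total": la_grouped["Korean Restaurant"],
    "Total Restaurants": la_grouped[rest_cols].sum(axis=1),
})

# Everything that is not a restaurant counts as an "other amenity".
total_locations = la_grouped.drop(columns="Neighborhood").sum(axis=1)
df_counts["Other Amenities"] = total_locations - df_counts["Total Restaurants"]
print(df_counts)
```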


In [123]:
#We don't need Total Locations any longer. Let's drop it. We will keep Total Restaurants as this could come in useful later.
df_counts.drop(['Total Locations'], axis=1, inplace=True)

In [124]:
df_counts

Unnamed: 0,Neighborhood,Korean Total,Total Restaurants,Other Amenities
0,Arleta,0,1,2
1,Arlington Heights,0,5,21
2,Atwater Village,0,1,3
3,Baldwin Hills,0,6,18
4,Bel-Air,0,0,4
5,Beverly Glen,0,0,3
6,Boyle Heights,0,1,2
7,Brentwood,0,0,4
8,Byzantine-Latino Quarter,1,2,8
9,Canoga Park,0,3,3


We're getting very close to what we need to move forward. Let's add this data back into our df_comb dataframe.

In [127]:
#Let's rename the Neighborhood column, since we are referring to "District" throughout this project
df_counts.rename(columns={'Neighborhood' : 'District'}, inplace=True)

In [128]:
df_counts.head()

Unnamed: 0,District,Korean Total,Total Restaurants,Other Amenities
0,Arleta,0,1,2
1,Arlington Heights,0,5,21
2,Atwater Village,0,1,3
3,Baldwin Hills,0,6,18
4,Bel-Air,0,0,4


In [129]:
#We are going to join df_counts onto df_comb. This is easier if we set District as the index of df_counts, then join on District in df_comb
df_counts.set_index('District', inplace=True)

In [130]:
df_la2 = df_comb.join(df_counts, on='District', how='left')

In [131]:
df_la2.head()

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0
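
As a minimal sketch of this kind of join on made-up data (the district names and counts below are hypothetical): `DataFrame.join` with `on='District'` matches the `District` column of the left frame against the index of the right frame. Districts with no match receive `NaN`, which is why the count columns are displayed as floats after the join.

```python
import pandas as pd

left = pd.DataFrame({
    "Zip": ["90001", "90002", "90003"],
    "District": ["A", "B", "C"],
})
right = pd.DataFrame({
    "District": ["A", "B"],
    "Korean Total": [0, 2],
}).set_index("District")

# Left join: keep every row of `left`, look up counts by District name.
joined = left.join(right, on="District", how="left")
print(joined)

# District "C" has no counts, so the integer column is upcast to float.
print(joined["Korean Total"].dtype)
```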


In [132]:
#We will compute a Korean restaurant density: the number of Korean restaurants divided by the total number of venues (restaurants plus other amenities) in the area
df_la2['Korean Density'] = df_la2['Korean Total'] / (df_la2['Total Restaurants'] + df_la2['Other Amenities'])

In [133]:
df_la2

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1


In [134]:
#And an amenity density, based on Population
df_la2['Amenity Density'] = df_la2['Other Amenities'] / df_la2['Population']

In [135]:
df_la2

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019


In [136]:
#Let's compute a score for each District, which we will call Restaurant Desirability. We define it as Pop Density * Amenity Density * (1 / (1 + Korean Density)), multiplied by 1,000,000
# We do this because more Population and more Amenities are good, but more Korean restaurants are bad
df_la2['Desirability'] = 1000000 * (df_la2['Pop Density'] * (df_la2['Amenity Density']) * (1 / (1 + df_la2['Korean Density'])))
df_la2

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973
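
One property of this score is worth checking in numbers. In the rows displayed above, `Pop Density` looks like `Population` divided by a fixed constant of roughly 3.49 million. That constant is inferred from the output, not stated in this part of the notebook. If it holds, `Pop Density * Amenity Density` reduces to `Other Amenities / constant`, so Population cancels out. That would explain why Southeast Los Angeles and Watts share the score 0.572818 despite very different populations. The sketch below recomputes the score for three displayed rows under that assumption.

```python
import pandas as pd

# Three rows copied from the output above.
df = pd.DataFrame({
    "District": ["Southeast Los Angeles", "Watts", "Hancock Park"],
    "Population": [192229, 3513, 4615],
    "Korean Total": [0.0, 0.0, 5.0],
    "Total Restaurants": [0.0, 0.0, 8.0],
    "Other Amenities": [2.0, 2.0, 18.0],
})

# Assumption: 'Pop Density' is Population divided by one constant,
# estimated here from the first displayed row (192229 / 0.055056).
POP_TOTAL = 192229 / 0.055056

df["Pop Density"] = df["Population"] / POP_TOTAL
df["Amenity Density"] = df["Other Amenities"] / df["Population"]
df["Korean Density"] = df["Korean Total"] / (
    df["Total Restaurants"] + df["Other Amenities"]
)

# The score as defined in the notebook.
df["Desirability"] = (
    1_000_000 * df["Pop Density"] * df["Amenity Density"] / (1 + df["Korean Density"])
)

# The same score written without Population, which cancels in the product.
df["Simplified"] = (
    1_000_000 * df["Other Amenities"] / POP_TOTAL / (1 + df["Korean Density"])
)
print(df[["District", "Desirability", "Simplified"]].round(3))
```

If a size-sensitive ranking is wanted, this suggests looking at Population or Pop Density alongside the score rather than relying on the score alone.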


In [137]:
df_la2.sort_values('Desirability')

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
40,90032,El Sereno,37931,34.08166,-118.17568,0.010864,0.0,2.0,1.0,0.0,2.6e-05,0.286409
97,91401,Van Nuys,13243,34.176505,-118.43308,0.003793,0.0,2.0,1.0,0.0,7.6e-05,0.286409
96,91401,Valley Glen,23493,34.176505,-118.43308,0.006729,0.0,2.0,1.0,0.0,4.3e-05,0.286409
94,91352,Sun Valley,38836,34.224089,-118.37563,0.011123,0.0,0.0,1.0,0.0,2.6e-05,0.286409
93,91352,La Tuna Canyon,45767,34.224089,-118.37563,0.013108,0.0,0.0,1.0,0.0,2.2e-05,0.286409
67,90272,Topanga State Park,3984,34.050505,-118.53374,0.001141,0.0,0.0,2.0,0.0,0.000502,0.572818
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
66,90272,Pacific Palisades,75797,34.050505,-118.53374,0.021709,0.0,0.0,2.0,0.0,2.6e-05,0.572818
41,90033,Boyle Heights,91193,34.050411,-118.21195,0.026118,0.0,1.0,2.0,0.0,2.2e-05,0.572818
82,91316,Encino,48619,34.168753,-118.51636,0.013925,0.0,0.0,2.0,0.0,4.1e-05,0.572818


We can see that our desirability metric ranks El Sereno joint lowest, tied with Van Nuys, Valley Glen, Sun Valley and La Tuna Canyon at a score of 0.286409. This is due to a near-complete lack of non-restaurant amenities: each of these Districts has just one.
Downtown Arts District and Downtown Historic Core look like very strong prospects: they have lots of Amenities and many restaurants already, but no Korean ones!

Our client asked us to cluster similar neighbourhoods. We will use k-means to do this, clustering into 5 groups that range from most to least desirable. Before we do that, let's visualise the Neighbourhoods by quantity of Korean Restaurants and then by quantity of Amenities.



In [138]:
#Create a table with only Districts that have at least 1 Korean Restaurant
df_la_kor = df_la2.loc[df_la2['Korean Total'] > 0]
df_la_kor

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973
10,90006,Koreatown,25088,34.048351,-118.2943,0.007185,1.0,2.0,8.0,0.1,0.000319,2.082973
11,90007,University Park,25870,34.026448,-118.2829,0.007409,1.0,8.0,14.0,0.045455,0.000541,3.835387


In [139]:
#Quickly remove the space from this column name. rename() returns a new dataframe, so reassigning it avoids pandas' SettingWithCopyWarning on the filtered slice
df_la_kor = df_la_kor.rename(columns={'Korean Total': 'Korean_Total'})
df_la_kor

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean_Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973
10,90006,Koreatown,25088,34.048351,-118.2943,0.007185,1.0,2.0,8.0,0.1,0.000319,2.082973
11,90007,University Park,25870,34.026448,-118.2829,0.007409,1.0,8.0,14.0,0.045455,0.000541,3.835387


In [140]:
#Mapping out the Korean restaurants we have
map_la_kor = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_la_kor['Latitude'], df_la_kor['Longitude'], df_la_kor['Korean_Total']):
    label = folium.Popup(str(int(label)), parse_html=True)  # Popup expects text, so convert the float count first
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la_kor)  

In [141]:
# Let's add a circle marker for each District that has Korean Restaurants
Kor_Count = folium.map.FeatureGroup()

# loop through the Districts and add each to the feature group
for lat, lng in zip(df_la_kor.Latitude, df_la_kor.Longitude):
    Kor_Count.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_la_kor.Latitude)
longitudes = list(df_la_kor.Longitude)
labels = list(df_la_kor.Korean_Total)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(map_la_kor)    
    
# add the feature group to the map
map_la_kor.add_child(Kor_Count)

In [155]:
#The map above shows the Districts with at least 1 Korean restaurant, but is not a convenient way to see how many there are per District.
#Let's create a map with clustered counts of Korean Restaurants. First create a dataframe with a new line for each Korean Restaurant in each District. This will make clustering Korean Restaurant counts easier in Folium
df_kor_exp = df_la_kor.loc[df_la_kor.index.repeat(df_la_kor.Korean_Total.astype(int))]
df_kor_exp

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean_Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
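
The row-expansion step above can be seen on a toy frame (the values are hypothetical). `Index.repeat` repeats each index label a given number of times, and `.loc` then selects one row per repeated label. The counts are cast with `astype(int)` because they are stored as floats after the join.

```python
import pandas as pd

df = pd.DataFrame(
    {"District": ["A", "B"], "Korean_Total": [2.0, 3.0]},
    index=[10, 11],
)

# Repeat each index label once per restaurant, then select those rows.
repeated_index = df.index.repeat(df["Korean_Total"].astype(int))
expanded = df.loc[repeated_index]
print(expanded)
```

Each District now appears once per restaurant, which is the shape a marker cluster needs in order to show counts.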


In [156]:
#Now let's create a new map with the Korean Restaurants clustered
from folium import plugins

# let's start again with a clean map of Los Angeles
map_la_kor2 = folium.Map(location = [latitude, longitude], zoom_start = 10)

# instantiate a marker cluster object for the Korean Restaurants in the dataframe
counts = plugins.MarkerCluster().add_to(map_la_kor2)

# loop through the dataframe and add each data point to the mark cluster
for lat, lng, label in zip(df_kor_exp['Latitude'], df_kor_exp['Longitude'], df_kor_exp['Korean_Total']):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(counts)

# display map
map_la_kor2

In [144]:
#Let's now map out the prevalence of Non-restaurant Amenities across Los Angeles
df_la_amen = df_la2.loc[df_la2['Other Amenities'] > 0]
df_la_amen

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973


In [148]:
#rename() returns a new dataframe, so reassigning it avoids pandas' SettingWithCopyWarning on the filtered slice
df_la_amen = df_la_amen.rename(columns={'Other Amenities':'Other_Amenities'})
df_la_amen

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other_Amenities,Korean Density,Amenity Density,Desirability
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
8,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973


In [149]:
#Mapping out the non-restaurant Amenities we have
map_la_amen = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_la_amen['Latitude'], df_la_amen['Longitude'], df_la_amen['Other_Amenities']):
    label = folium.Popup(str(int(label)), parse_html=True)  # Popup expects text, so convert the float count first
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la_amen)

In [150]:
# Let's add a circle marker for each District that has non-restaurant Amenities
Amen_Count = folium.map.FeatureGroup()

# loop through the Districts and add each to the feature group
for lat, lng in zip(df_la_amen.Latitude, df_la_amen.Longitude):
    Amen_Count.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_la_amen.Latitude)
longitudes = list(df_la_amen.Longitude)
labels = list(df_la_amen.Other_Amenities)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(map_la_amen)    
    
# add the feature group to the map
map_la_amen.add_child(Amen_Count)

In [152]:
#The amenities map is too congested. Let's create a dataframe with a new line for each Amenity in each district. This will make clustering Amenity counts easier in Folium
df_amen_exp = df_la_amen.loc[df_la_amen.index.repeat(df_la_amen.Other_Amenities.astype(int))]
df_amen_exp

Unnamed: 0,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other_Amenities,Korean Density,Amenity Density,Desirability
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
0,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
1,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
2,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849


In [153]:
#Now let's create a new map with the Amenities clustered
from folium import plugins

# let's start again with a clean map of Los Angeles
map_la_amen2 = folium.Map(location = [latitude, longitude], zoom_start = 10)

# instantiate a marker cluster object for the Amenities in the dataframe
counts = plugins.MarkerCluster().add_to(map_la_amen2)

# loop through the dataframe and add each data point to the mark cluster
for lat, lng, label in zip(df_amen_exp['Latitude'], df_amen_exp['Longitude'], df_amen_exp['Other_Amenities']):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(counts)

# display map
map_la_amen2

In [158]:
#Prepare the dataframe for K-means clustering
df_cluster = df_la2[['Desirability']]
df_cluster

Unnamed: 0,Desirability
0,0.572818
1,0.572818
2,4.323849
3,4.323849
4,4.323849
5,4.323849
6,4.323849
7,3.927892
8,2.082973
9,2.082973


In [160]:
#Inspecting the above shows we have 2 NaN rows - 76 and 95. Let's drop them. Reassigning (rather than using inplace=True) avoids pandas' SettingWithCopyWarning
df_cluster = df_cluster.drop([76, 95])

In [161]:
#Run K Means
from sklearn.cluster import KMeans
from sklearn import preprocessing

# set number of clusters
kclusters = 5

lanew_grouped_clustering = df_cluster

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(lanew_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:1000]

array([2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 3, 3, 3, 3, 4, 4, 3,
       2, 3, 3, 2, 3, 3, 3, 1, 1, 2, 1, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 1,
       1, 1, 2, 2, 2, 1, 1, 1, 3, 0, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1,
       2, 2, 4, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1,
       1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 1, 1, 1], dtype=int32)

In [164]:
df_la2.drop([76, 95], inplace=True)

In [165]:
# add clustering labels
df_la2.insert(0, 'Cluster Labels', kmeans.labels_)
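
Hard-coding the row numbers 76 and 95 works for this snapshot of the data, but it would need updating if the input changed. A more defensive variant, sketched below on toy data, selects the rows with a valid score using a boolean mask and attaches the labels to exactly those rows, so labels and rows stay aligned. The `labels` array here is a hypothetical placeholder for `kmeans.labels_`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "District": ["A", "B", "C", "D"],
    "Desirability": [0.5, np.nan, 4.3, 12.0],
})

# Rows that can be clustered: those with a Desirability value.
valid = df["Desirability"].notna()
features = df.loc[valid, ["Desirability"]]

# Placeholder for kmeans.labels_ (one label per row of `features`).
labels = np.array([2, 1, 0])

# Keep only the clustered rows and attach the labels in the same order.
df_clustered = df.loc[valid].copy()
df_clustered.insert(0, "Cluster Labels", labels)
print(df_clustered)
```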

In [175]:
df_la2.sort_values('Desirability', ascending=True)

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
40,2,90032,El Sereno,37931,34.08166,-118.17568,0.010864,0.0,2.0,1.0,0.0,2.6e-05,0.286409
97,2,91401,Van Nuys,13243,34.176505,-118.43308,0.003793,0.0,2.0,1.0,0.0,7.6e-05,0.286409
96,2,91401,Valley Glen,23493,34.176505,-118.43308,0.006729,0.0,2.0,1.0,0.0,4.3e-05,0.286409
94,2,91352,Sun Valley,38836,34.224089,-118.37563,0.011123,0.0,0.0,1.0,0.0,2.6e-05,0.286409
93,2,91352,La Tuna Canyon,45767,34.224089,-118.37563,0.013108,0.0,0.0,1.0,0.0,2.2e-05,0.286409
67,2,90272,Topanga State Park,3984,34.050505,-118.53374,0.001141,0.0,0.0,2.0,0.0,0.000502,0.572818
0,2,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
66,2,90272,Pacific Palisades,75797,34.050505,-118.53374,0.021709,0.0,0.0,2.0,0.0,2.6e-05,0.572818
41,2,90033,Boyle Heights,91193,34.050411,-118.21195,0.026118,0.0,1.0,2.0,0.0,2.2e-05,0.572818
82,2,91316,Encino,48619,34.168753,-118.51636,0.013925,0.0,0.0,2.0,0.0,4.1e-05,0.572818


In [167]:
# create map of clusters
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_la2['Latitude'], df_la2['Longitude'], df_la2['District'], df_la2['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

So we've mapped our clusters, but what does each cluster actually represent?
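
One way to answer this without reading each cluster by eye is to summarise Desirability per cluster label. The sketch below uses a few rows copied from the cluster tables in this notebook; the aggregation itself is a suggestion, not something the original analysis ran.

```python
import pandas as pd

# Two rows per cluster, copied from the cluster tables in this notebook.
rows = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "District": [
        "Mid-City West", "North Hollywood", "Hancock Park", "Baldwin Hills",
        "Watts", "Koreatown", "Chinatown", "Mid-City",
        "Downtown Arts District", "Venice",
    ],
    "Desirability": [
        12.315578, 12.029169, 4.323849, 5.155358,
        0.572818, 2.082973, 8.019446, 6.014585,
        16.898118, 15.752483,
    ],
})

# Count, range and mean of the score per cluster, best cluster first.
summary = (
    rows.groupby("Cluster Labels")["Desirability"]
    .agg(["count", "min", "mean", "max"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

On these rows the clusters rank 4, 0, 3, 1, 2 by mean score, which matches the reading given in the cells that follow (highest, high, medium, low, lowest).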

In [168]:
#1st Cluster - this is our High Desirability, but not Highest
df_la2.loc[df_la2['Cluster Labels'] == 0]

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
53,0,90048,Mid-City West,27326,34.073759,-118.37376,0.007826,0.0,14.0,43.0,0.0,0.001574,12.315578
101,0,91601,North Hollywood,54086,34.168206,-118.37246,0.015491,0.0,14.0,42.0,0.0,0.000777,12.029169


In [169]:
#2nd Cluster - these are low Desirability, but not Lowest
df_la2.loc[df_la2['Cluster Labels'] == 1]

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
2,1,90004,Hancock Park,4615,34.07711,-118.30755,0.001322,5.0,8.0,18.0,0.192308,0.0039,4.323849
3,1,90004,Rampart Village,21060,34.07711,-118.30755,0.006032,5.0,8.0,18.0,0.192308,0.000855,4.323849
4,1,90004,Virgil Village,32625,34.07711,-118.30755,0.009344,5.0,8.0,18.0,0.192308,0.000552,4.323849
5,1,90004,Wilshire Center,65232,34.07711,-118.30755,0.018683,5.0,8.0,18.0,0.192308,0.000276,4.323849
6,1,90004,Windsor Square,5275,34.07711,-118.30755,0.001511,5.0,8.0,18.0,0.192308,0.003412,4.323849
7,1,90005,Wilshire Park,10845,34.058911,-118.30848,0.003106,10.0,14.0,18.0,0.3125,0.00166,3.927892
11,1,90007,University Park,25870,34.026448,-118.2829,0.007409,1.0,8.0,14.0,0.045455,0.000541,3.835387
12,1,90008,Baldwin Hills,26303,34.009754,-118.33705,0.007533,0.0,6.0,18.0,0.0,0.000684,5.155358
13,1,90008,Crenshaw,10450,34.009754,-118.33705,0.002993,0.0,6.0,18.0,0.0,0.001722,5.155358
14,1,90008,Leimert Park,12363,34.009754,-118.33705,0.003541,0.0,6.0,18.0,0.0,0.001456,5.155358


In [170]:
#3rd Cluster - these are lowest Desirability
df_la2.loc[df_la2['Cluster Labels'] == 2]

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
0,2,90002,Southeast Los Angeles,192229,33.948315,-118.24845,0.055056,0.0,0.0,2.0,0.0,1e-05,0.572818
1,2,90002,Watts,3513,33.948315,-118.24845,0.001006,0.0,0.0,2.0,0.0,0.000569,0.572818
8,2,90006,Byzantine-Latino Quarter,20919,34.048351,-118.2943,0.005991,1.0,2.0,8.0,0.1,0.000382,2.082973
9,2,90006,Harvard Heights,7848,34.048351,-118.2943,0.002248,1.0,2.0,8.0,0.1,0.001019,2.082973
10,2,90006,Koreatown,25088,34.048351,-118.2943,0.007185,1.0,2.0,8.0,0.1,0.000319,2.082973
22,2,90016,West Adams,39593,34.029711,-118.35255,0.01134,0.0,1.0,4.0,0.0,0.000101,1.145635
25,2,90018,Jefferson Park,26176,34.029112,-118.3183,0.007497,0.0,1.0,5.0,0.0,0.000191,1.432044
31,2,90024,Westwood,48500,34.063209,-118.43643,0.013891,0.0,0.0,6.0,0.0,0.000124,1.718453
39,2,90031,Lincoln Heights,11809,34.07871,-118.2161,0.003382,0.0,1.0,3.0,0.0,0.000254,0.859226
40,2,90032,El Sereno,37931,34.08166,-118.17568,0.010864,0.0,2.0,1.0,0.0,2.6e-05,0.286409


In [171]:
#4th Cluster - these are Medium Desirability
df_la2.loc[df_la2['Cluster Labels'] == 3]

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
15,3,90012,Chinatown,18687,34.061611,-118.23944,0.005352,0.0,33.0,28.0,0.0,0.001498,8.019446
16,3,90012,Downtown Civic Center,1363,34.061611,-118.23944,0.00039,0.0,33.0,28.0,0.0,0.020543,8.019446
17,3,90012,Downtown Little Tokyo,3386,34.061611,-118.23944,0.00097,0.0,33.0,28.0,0.0,0.008269,8.019446
18,3,90013,Downtown Fashion District,3672,34.044662,-118.24255,0.001052,1.0,9.0,28.0,0.027027,0.007625,7.808408
21,3,90015,Downtown South Park,8103,34.038993,-118.26516,0.002321,0.0,2.0,27.0,0.0,0.003332,7.733037
23,3,90017,Downtown Bunker Hill,4807,34.052561,-118.26434,0.001377,0.0,3.0,24.0,0.0,0.004993,6.873811
24,3,90017,Downtown City West,2785,34.052561,-118.26434,0.000798,0.0,3.0,24.0,0.0,0.008618,6.873811
26,3,90019,Arlington Heights,17618,34.048411,-118.34015,0.005046,0.0,5.0,21.0,0.0,0.001192,6.014585
27,3,90019,Country Club Park,6795,34.048411,-118.34015,0.001946,0.0,5.0,21.0,0.0,0.003091,6.014585
28,3,90019,Mid-City,13864,34.048411,-118.34015,0.003971,0.0,5.0,21.0,0.0,0.001515,6.014585


In [172]:
#5th Cluster - these are Highest Desirability
df_la2.loc[df_la2['Cluster Labels'] == 4]

Unnamed: 0,Cluster Labels,Zip,District,Population,Latitude,Longitude,Pop Density,Korean Total,Total Restaurants,Other Amenities,Korean Density,Amenity Density,Desirability
19,4,90014,Downtown Arts District,2287,34.042912,-118.25193,0.000655,0.0,15.0,59.0,0.0,0.025798,16.898118
20,4,90014,Downtown Historic Core,8312,34.042912,-118.25193,0.002381,0.0,15.0,59.0,0.0,0.007098,16.898118
68,4,90291,Venice,93258,33.992411,-118.46531,0.02671,0.0,2.0,55.0,0.0,0.00059,15.752483
