
## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data - Finding the State](#data1)
* [Data - Ideal Location](#data2)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The recent Covid outbreak across the world has impacted all aspects of our daily lives. However, an enterprising entrepreneur has decided to capitalise on the uncertainty and open an office for their new startup in a US state. Because a large number of their employees are 'at-risk', the entrepreneur wants to find the safest state, measured by recent covid cases relative to each state's population.

## Data - Finding the State <a name="data1"></a>


The data we'll be using is sourced from the NY Times, linked below. The historical data used in this report is from 1 Jan 2020 to 2 Jun 2021, though we'll only be looking at the last 2 months (i.e. 1 Apr 2021 - 2 Jun 2021) for this analysis.

Source: https://github.com/nytimes/covid-19-data

Historical Data for US - https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv

Historical Data for States - https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv

Historical Data by County - https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load historical covid data for the US, as well as data per state and per county
us_data = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us.csv").dropna()
us_state_data = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv").dropna()
us_county_data = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv").dropna()

In [3]:
# As we're working with only the 50 states and D.C., we drop the territories.
territories = ["Northern Mariana Islands", "Guam", "Puerto Rico", "Virgin Islands"]
us_state_data = us_state_data[~us_state_data.state.isin(territories)]
us_county_data = us_county_data[~us_county_data.state.isin(territories)]

In [4]:
# Limit the data to the start of this year (1 Jan 2021) through the end of the data set, at 2 Jun 2021
us_data = us_data[(us_data.date >= '2021-01-01') & (us_data.date <= '2021-06-02')]
us_state_data = us_state_data[(us_state_data.date >= '2021-01-01') & (us_state_data.date <= '2021-06-02')]
us_county_data = us_county_data[(us_county_data.date >= '2021-01-01') & (us_county_data.date <= '2021-06-02')]
us_state_data.tail()

Unnamed: 0,date,state,fips,cases,deaths
25144,2021-06-02,Virginia,51,675783,11206
25145,2021-06-02,Washington,53,439675,5845
25146,2021-06-02,West Virginia,54,161967,2800
25147,2021-06-02,Wisconsin,55,674952,7894
25148,2021-06-02,Wyoming,56,60433,720


Plotly's US maps require state codes rather than state names, so we'll need to replace these in the dataframe.

In [5]:
# Dictionary to replace state names with their code, in order to map the states: sourced from https://gist.github.com/rogerallen/1583593 with thanks
us_state_abbrev = {
'Alabama': 'AL','Alaska': 'AK','American Samoa': 'AS','Arizona': 'AZ','Arkansas': 'AR','California': 'CA','Colorado': 'CO','Connecticut': 'CT','Delaware': 'DE','District of Columbia': 'DC','Florida': 'FL','Georgia': 'GA','Guam': 'GU','Hawaii': 'HI','Idaho': 'ID','Illinois': 'IL','Indiana': 'IN','Iowa': 'IA','Kansas': 'KS','Kentucky': 'KY','Louisiana': 'LA','Maine': 'ME','Maryland': 'MD','Massachusetts': 'MA','Michigan': 'MI','Minnesota': 'MN','Mississippi': 'MS','Missouri': 'MO','Montana': 'MT','Nebraska': 'NE','Nevada': 'NV','New Hampshire': 'NH','New Jersey': 'NJ','New Mexico': 'NM','New York': 'NY','North Carolina': 'NC','North Dakota': 'ND','Northern Mariana Islands':'MP','Ohio': 'OH','Oklahoma': 'OK','Oregon': 'OR','Pennsylvania': 'PA','Puerto Rico': 'PR','Rhode Island': 'RI','South Carolina': 'SC','South Dakota': 'SD','Tennessee': 'TN','Texas': 'TX','Utah': 'UT','Vermont': 'VT','Virgin Islands': 'VI','Virginia': 'VA','Washington': 'WA','West Virginia': 'WV','Wisconsin': 'WI','Wyoming': 'WY'
}
abbrev_us_state = dict(map(reversed, us_state_abbrev.items()))

In [6]:
us_state_data_coded = us_state_data.copy(deep=True)
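# Assign the result back: calling replace(..., inplace=True) on a selected column may not update the dataframe
us_state_data_coded['state'] = us_state_data_coded['state'].replace(us_state_abbrev)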
us_state_data_coded.dropna().head()

Unnamed: 0,date,state,fips,cases,deaths
16734,2021-01-01,AL,1,365747,4872
16735,2021-01-01,AK,2,46740,198
16736,2021-01-01,AZ,4,530267,9015
16737,2021-01-01,AR,5,229442,3711
16738,2021-01-01,CA,6,2345811,26236
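
As a small illustration of the name-to-code step above, the sketch below uses `Series.map` on a toy dataframe. Unlike `replace`, `map` yields `NaN` for any name missing from the dictionary, which makes unmapped entries easy to spot before plotting. The three-entry dictionary `abbrev`, the dataframe `df`, and the made-up state name are assumptions for illustration only; the notebook's full dictionary is `us_state_abbrev`.

```python
import pandas as pd

# Hypothetical three-entry lookup; the notebook's full dictionary is us_state_abbrev
abbrev = {"Alabama": "AL", "Alaska": "AK", "Arizona": "AZ"}

# Toy data with one name that is deliberately absent from the lookup
df = pd.DataFrame({"state": ["Alabama", "Alaska", "Atlantis"], "cases": [10, 20, 30]})

# Series.map returns NaN for names that are not keys of the dictionary
df["code"] = df["state"].map(abbrev)

# Names that failed to map, useful as a sanity check before plotting
unmapped = df.loc[df["code"].isna(), "state"].tolist()
print(unmapped)
```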


In [7]:
us_state_data_jun = us_state_data_coded.copy(deep=True)
us_state_data_jun = us_state_data_jun[us_state_data_jun.date == '2021-06-02']
us_state_data_jun.head()

Unnamed: 0,date,state,fips,cases,deaths
25094,2021-06-02,AL,1,544598,11167
25095,2021-06-02,AK,2,69773,352
25096,2021-06-02,AZ,4,882369,17648
25097,2021-06-02,AR,5,341692,5835
25098,2021-06-02,CA,6,3792111,63284


Let's take a look at the total number of cases that each state (plus D.C.) has accumulated.

In [8]:
import plotly.graph_objects as go

# Mapping the # of cumulative cases as of the 2nd of June, 2021, by state
fig = go.Figure(data=go.Choropleth(
    locations=us_state_data_jun['state'], # Spatial coordinates
    z = us_state_data_jun['cases'].astype(float), # Data to be color-coded
    locationmode = 'USA-states', # set of locations match entries in `locations`
    colorscale = 'Blues',
    colorbar_title = "# of cases",
))

fig.update_layout(
    title_text = 'Cumulative Covid Cases on 2021-06-02 by State',
    geo_scope='usa', # limit map scope to USA
)

fig.show()

However, the total number of cases doesn't give us the whole picture: a state's population affects how many cases it has, as do the steps it has taken since the pandemic started. Let's look at the number of new cases recorded since the start of April, about 2 months before the end of the data. First, we'll have to reshape the data.

In [9]:
# New dataframe for data on the 1st of April 2021
us_state_data_apr = us_state_data_coded.copy(deep=True)
us_state_data_apr = us_state_data_apr[us_state_data_apr.date == '2021-04-01']

us_state_data_apr.head()

Unnamed: 0,date,state,fips,cases,deaths
21684,2021-04-01,AL,1,515866,10553
21685,2021-04-01,AK,2,62785,299
21686,2021-04-01,AZ,4,842200,16977
21687,2021-04-01,AR,5,330609,5636
21688,2021-04-01,CA,6,3671690,59396


In [10]:
# For each state, find the difference in # of cases from April to June, and save it as a dataframe
states_list = us_state_data_jun["state"].unique()
new_cases_apr_jun_list = []

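for state in states_list:
  # .item() extracts the single matching value; calling int() directly on a Series is deprecated
  cases_jun = us_state_data_jun.loc[us_state_data_jun.state == state, "cases"].item()
  cases_apr = us_state_data_apr.loc[us_state_data_apr.state == state, "cases"].item()
  new_cases_apr_jun_list.append(int(cases_jun - cases_apr))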

df_new_cases_jun_apr = pd.DataFrame({"state": states_list,
                                     "new cases": new_cases_apr_jun_list})

df_new_cases_jun_apr.head()

Unnamed: 0,state,new cases
0,AL,28732
1,AK,6988
2,AZ,40169
3,AR,11083
4,CA,120421
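
The per-state loop above can also be sketched without explicit iteration. The example below is a minimal illustration on made-up numbers, not the notebook's data: it pivots cumulative counts so that each date becomes a column, then subtracts the earlier column from the later one. The dataframe `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative counts for two states on the two dates of interest
toy = pd.DataFrame({
    "date": ["2021-04-01", "2021-04-01", "2021-06-02", "2021-06-02"],
    "state": ["AL", "AK", "AL", "AK"],
    "cases": [100, 50, 130, 55],
})

# One row per state, one column per date
wide = toy.pivot(index="state", columns="date", values="cases")

# New cases = later cumulative count minus earlier cumulative count
new_cases = (wide["2021-06-02"] - wide["2021-04-01"]).rename("new cases").reset_index()
print(new_cases)
```

Because the subtraction is aligned on the state index, this form does not depend on the two snapshots listing states in the same order.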


In [11]:
# Mapping the # of new cases from April to June, by state
fig = go.Figure(data=go.Choropleth(
    locations=df_new_cases_jun_apr['state'], # Spatial coordinates
    z = df_new_cases_jun_apr['new cases'].astype(float), # Data to be color-coded
    locationmode = 'USA-states', # set of locations match entries in `locations`
    colorscale = 'Blues',
    colorbar_title = "# of new cases, APR to JUN",
))

fig.update_layout(
    title_text = 'Number of New Covid Cases from APR to JUN by State',
    geo_scope='usa', # limit map scope to USA
)

fig.show()

As we can see, Florida and some north-eastern states have the most new cases in the last 2 months. Again, this doesn't tell the whole story: we'll need to factor in each state's population to get a better picture of how widespread covid is in each state.

The most recent US population data seems to be from 2019, so we'll have to extrapolate it to 2021.

In [12]:
us_pop_data = pd.read_excel("https://www2.census.gov/programs-surveys/popest/tables/2010-2019/state/totals/nst-est2019-01.xlsx", header=3)
us_pop_data = us_pop_data.dropna()
# As the data contains the population estimates for the entire US and its four regions, we'll need to remove the first five rows.
us_pop_data = us_pop_data.iloc[5:]

In [13]:
# Refresh the df index after removing five rows, and rename the first column
us_pop_data = us_pop_data.reset_index(drop=True)
us_pop_data = us_pop_data.rename(columns={"Unnamed: 0": "state"})
# Remove Puerto Rico
us_pop_data = us_pop_data.query('state != "Puerto Rico"')
us_pop_data.head()

Unnamed: 0,state,Census,Estimates Base,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,.Alabama,4779736.0,4780125.0,4785437.0,4799069.0,4815588.0,4830081.0,4841799.0,4852347.0,4863525.0,4874486.0,4887681.0,4903185.0
1,.Alaska,710231.0,710249.0,713910.0,722128.0,730443.0,737068.0,736283.0,737498.0,741456.0,739700.0,735139.0,731545.0
2,.Arizona,6392017.0,6392288.0,6407172.0,6472643.0,6554978.0,6632764.0,6730413.0,6829676.0,6941072.0,7044008.0,7158024.0,7278717.0
3,.Arkansas,2915918.0,2916031.0,2921964.0,2940667.0,2952164.0,2959400.0,2967392.0,2978048.0,2989918.0,3001345.0,3009733.0,3017804.0
4,.California,37253956.0,37254519.0,37319502.0,37638369.0,37948800.0,38260787.0,38596972.0,38918045.0,39167117.0,39358497.0,39461588.0,39512223.0


Since the estimates stop at 1 Jul 2019, we'll compute each state's average yearly growth rate over 2010-2019 and use it to project the population forward to 1 Jun 2021.
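
As a hedged sketch of the compound-growth idea, the helper below derives an average yearly growth factor from two population figures and applies it for a further number of years. The function name `project_population` and its parameters are hypothetical and not part of the notebook. For the census table, assuming the `2010` and `2019` columns are the 1 July estimates, `years_observed` would be 9 and `years_forward` to 1 Jun 2021 would be 23/12.

```python
def project_population(pop_start, pop_end, years_observed, years_forward):
    """Extrapolate pop_end forward assuming constant compound yearly growth."""
    # Average yearly growth factor over the observed period
    yearly_growth = (pop_end / pop_start) ** (1 / years_observed)
    # Apply that factor for the additional years
    return pop_end * yearly_growth ** years_forward

# Hypothetical example: growth from 100 to 121 over 2 years is 10% per year,
# so projecting one more year forward gives roughly 133.1
print(round(project_population(100, 121, 2, 1), 1))
```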

In [14]:
# Verify that our data has the right number of states (50 + D.C.)
print(df_new_cases_jun_apr.shape)
print(us_pop_data.shape)

(51, 2)
(51, 13)


In [15]:
import math

projected_pop_2021 = []

for i in us_pop_data.index:
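  # Column 3 holds the 1 Jul 2010 estimate (columns 1 and 2 are the 1 Apr 2010 census figures),
  # so this ratio covers 1 Jul 2010 to 1 Jul 2019, i.e. 9 years
  growth_10_19 = int(us_pop_data.iloc[i, -1]) / int(us_pop_data.iloc[i, 3])
  # The exponent 3/28 = 1 / (9 + 1/3) treats the span as 9 1/3 years, slightly understating yearly growth
  growth_10_19_yearly = math.pow(growth_10_19, 3/28)
  # 1 Jul 2019 to 1 Jun 2021 is 23/12 years; the exponent 23/24 used here applies only about half of that growth
  extrap_pop_2021 = us_pop_data.iloc[i, -1] * pow(growth_10_19_yearly, 23/24)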
  rounded_projection = int(extrap_pop_2021)
  projected_pop_2021.append(rounded_projection)

print(projected_pop_2021)

[4915438, 733380, 7374660, 3027820, 39744537, 5837231, 3563870, 981717, 716972, 21768004, 10715054, 1421312, 1810897, 12654626, 6757549, 3165981, 2919033, 4480126, 4659633, 1345926, 6072709, 6926900, 9998152, 5674524, 2976724, 6152139, 1077135, 1945510, 3121814, 1364199, 8890729, 2100171, 19459081, 10586709, 771647, 11704587, 3977776, 4258852, 12811351, 1059917, 5204508, 892009, 6879786, 29411616, 3253792, 623795, 8589885, 7710587, 1785890, 5836155, 580244]


Now, let's find the number of new cases from Apr to Jun as a percentage of each state's population, and save it to a dataframe.

In [16]:
pcent_new_cases_list = []

for i in range(0,51):
  pcent_new_cases = new_cases_apr_jun_list[i] / projected_pop_2021[i]
  pcent_new_cases_list.append(pcent_new_cases*100) # multiply by 100 to convert from decimal to percentage

percentage_new_cases_apr_jun_df = pd.DataFrame({"state": states_list,
                                                "% new cases": pcent_new_cases_list})
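
The cell above pairs new cases with populations by list position, which works as long as both lists are in the same state order. A minimal sketch of an alternative, assuming both quantities are available as dataframes keyed by a `state` column, is to join on that key instead. The dataframes `new_cases` and `population` and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical inputs: new cases and projected population, each keyed by state code
new_cases = pd.DataFrame({"state": ["CA", "MI"], "new cases": [120, 240]})
population = pd.DataFrame({"state": ["MI", "CA"], "population": [10_000, 40_000]})

# Joining on the state column avoids relying on both tables sharing a row order;
# validate="one_to_one" raises if a state appears more than once on either side
merged = new_cases.merge(population, on="state", how="inner", validate="one_to_one")

# Multiply by 100 to convert from a fraction to a percentage
merged["% new cases"] = merged["new cases"] / merged["population"] * 100
print(merged)
```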

In [17]:
# Mapping new cases from April to June as a percentage of each state's population
fig = go.Figure(data=go.Choropleth(
    locations=percentage_new_cases_apr_jun_df['state'], # Spatial coordinates
    z = percentage_new_cases_apr_jun_df['% new cases'].astype(float), # Data to be color-coded
    locationmode = 'USA-states', # set of locations match entries in `locations`
    colorscale = 'Blues',
    colorbar_title = "% of new cases of state population",
))

fig.update_layout(
    title_text = "Percentage of new covid cases, from APR to JUN, of a state's population",
    geo_scope='usa', # limit map scope to USA
)

fig.show()

As we can see, Michigan is the state with the highest percentage of its inhabitants contracting covid from April to June. Compared to the previous maps, California has a very low percentage even though it had a high absolute number of cases: its much larger population drags the percentage down. Other states with high percentages include Minnesota, Delaware, Colorado, and Pennsylvania.

In [18]:
percentage_new_cases_apr_jun_df.sort_values(by=["% new cases"], ascending=False).head()

Unnamed: 0,state,% new cases
22,MI,2.397393
23,MN,1.409687
7,DE,1.397857
5,CO,1.383533
38,PA,1.357156


Looks like California has the lowest percentage of new cases from April to June by a decent margin.

In [19]:
percentage_new_cases_apr_jun_df.sort_values(by=["% new cases"], ascending=True).head()

Unnamed: 0,state,% new cases
4,CA,0.302988
36,OK,0.364726
3,AR,0.366039
11,HI,0.367477
16,KS,0.397597


## Data - Ideal Location <a name="data2"></a>


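So we've decided on California as the state of choice. Within California, we'll look at neighbourhoods in Los Angeles County and the venues around each of them.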

In [20]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

In [21]:
r = requests.get('https://apps.gis.ucla.edu/geodata/dataset/93d71e41-6196-4ecb-9ddd-15f1a4a7630c/resource/6cde4e9e-307c-477d-9089-cae9484c8bc1/download/la-county-neighborhoods-v6.geojson')

In [22]:
la_data_df = pd.json_normalize(r.json()['features'], record_prefix=False)[['properties.metadata.name', 'properties.metadata.region', 'properties.metadata.county', 'properties.metadata.type']]
la_data_df = la_data_df.rename(columns = {"properties.metadata.name": "Name",
                             "properties.metadata.region": "Region",
                             "properties.metadata.county": "County",
                             "properties.metadata.type": "Type"})
la_data_df.tail()

Unnamed: 0,Name,Region,County,Type
313,Wilmington,harbor,los-angeles,segment-of-a-city
314,Windsor Square,central-la,los-angeles,segment-of-a-city
315,Winnetka,san-fernando-valley,los-angeles,segment-of-a-city
316,Woodland Hills,san-fernando-valley,los-angeles,segment-of-a-city
317,Yorba Linda,north-county,orange,standalone-city


In [23]:
# Limit the dataset to Los Angeles county, removing entries from Orange county
la_data_df = la_data_df[la_data_df.County == 'los-angeles']

In [24]:
# Remove regions of LA that aren't in the heart of the city (outlying areas, mis-attributed lats/longs, etc.)
removed_regions = ["antelope-valley", "northwest-county", "santa-monica-mountains", "harbor", "san-gabriel-valley", "pomona-valley", "san-fernando-valley", "angeles-forest", "verdugos"]
la_data_df = la_data_df[~la_data_df.Region.isin(removed_regions)]

In [25]:
# Same as above for specific locations
removed_neighbourhoods = ["Mount Washington", "Fairfax", "Chinatown", "Del Rey", "Green Meadows", "Florence", "South Park", "Brentwood", "Cypress Park"]
la_data_df = la_data_df[~la_data_df.Name.isin(removed_neighbourhoods)]

In [26]:
la_data_df.reset_index(drop=True, inplace=True)
la_data_df.tail()

Unnamed: 0,Name,Region,County,Type
118,Westmont,south-la,los-angeles,unincorporated-area
119,West Whittier-Los Nietos,southeast,los-angeles,unincorporated-area
120,Westwood,westside,los-angeles,segment-of-a-city
121,Willowbrook,south-la,los-angeles,unincorporated-area
122,Windsor Square,central-la,los-angeles,segment-of-a-city


In [27]:
la_data_df.describe() 

Unnamed: 0,Name,Region,County,Type
count,123,123,123,123
unique,123,7,1,3
top,Lennox,south-la,los-angeles,segment-of-a-city
freq,1,25,123,68


In [28]:
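import time
from geopy.exc import GeopyError

la_names = []
la_regions = []
la_latitudes = []
la_longitudes = []

# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="ca_explorer")

for name, region in zip(la_data_df['Name'], la_data_df['Region']):
  address = f"{name}, CA"
  try:
    location = geolocator.geocode(address)
  except GeopyError:
    location = None  # treat a failed lookup like an unresolved address
  time.sleep(2)  # pause between requests to respect Nominatim's usage policy
  if location is None:
    continue  # skip neighbourhoods that could not be geocoded
  la_names.append(address)
  la_regions.append(region)
  la_latitudes.append(location.latitude)
  la_longitudes.append(location.longitude)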


In [29]:
la_location_data = pd.DataFrame({"Name": la_names,
                                 "Region": la_regions,
                                 "Latitude": la_latitudes,
                                 "Longitude": la_longitudes})
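
As an illustration of the collect-then-build pattern used in the two cells above, the sketch below gathers one record per resolved address and builds the dataframe in a single step, so the columns cannot drift out of alignment. The helper `geocode_names`, the `FakeLocation` class, and `fake_geocode` are hypothetical stand-ins so the example runs without network access; in the notebook the callable would be `geolocator.geocode`.

```python
import pandas as pd

def geocode_names(names, geocode):
    """Return a dataframe of names with coordinates, skipping unresolved names.

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    rows = []
    for name in names:
        location = geocode(f"{name}, CA")
        if location is None:
            continue  # nothing found for this address
        rows.append({"Name": name,
                     "Latitude": location.latitude,
                     "Longitude": location.longitude})
    return pd.DataFrame(rows, columns=["Name", "Latitude", "Longitude"])

# Stand-in geocoder for illustration only; the coordinates are made up
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

def fake_geocode(address):
    return FakeLocation(1.0, 2.0) if address.startswith("Known") else None

print(geocode_names(["Known Place", "Nowhere"], fake_geocode))
```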

In [30]:
la_location_data['Region'].unique()

array(['south-la', 'south-bay', 'central-la', 'southeast', 'northeast-la',
       'westside', 'eastside'], dtype=object)

In [54]:
map_la = folium.Map(location=[la_location_data.Latitude.iloc[0], la_location_data.Longitude.iloc[0]], zoom_start=10)

# add markers to map
for lat, lng, name, region in zip(la_location_data['Latitude'], la_location_data['Longitude'], la_location_data['Name'], la_location_data['Region']):
    label = '{}, {}'.format(name, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_la)  
    
map_la

In [32]:
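import os

# Read the Foursquare credentials from the environment rather than hard-coding them in the notebook
# (the environment variable names below are placeholders)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret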
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of Foursquare venues within `radius` metres of each location."""

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue,
        # skipping venues that have no category attached
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])

    # flatten the per-location lists into one dataframe with named columns
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Name', 'Latitude', 'Longitude', 'Venue',
                 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

    return nearby_venues

In [34]:
la_venues = getNearbyVenues(names=la_location_data['Name'],
                            latitudes=la_location_data['Latitude'],
                            longitudes=la_location_data['Longitude'])

In [35]:
la_venues.tail()

Unnamed: 0,Name,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
2307,"Windsor Square, CA",34.072593,-118.32081,Chase Bank,34.073221,-118.323844,Bank
2308,"Windsor Square, CA",34.072593,-118.32081,Wells Fargo,34.0759,-118.323993,Bank
2309,"Windsor Square, CA",34.072593,-118.32081,Uncool Burgers,34.074424,-118.323671,Burger Joint
2310,"Windsor Square, CA",34.072593,-118.32081,Buck Mason,34.073307,-118.323818,Clothing Store
2311,"Windsor Square, CA",34.072593,-118.32081,Albert's Mexican Grill,34.076034,-118.323993,Mexican Restaurant


In [36]:
print('There are {} unique categories.'.format(len(la_venues['Venue Category'].unique())))
la_venues.shape

There are 297 unique categories.


(2312, 7)

In [37]:
# one hot encoding
la_onehot = pd.get_dummies(la_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
la_onehot['Name'] = la_venues['Name'] 

# move neighborhood column to the first column
neigh_col = la_onehot['Name']
la_onehot.drop(labels=['Name'], axis=1, inplace=True)
la_onehot.insert(0, 'Name', neigh_col)
la_onehot.head()

Unnamed: 0,Name,ATM,Accessories Store,African Restaurant,Airport,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Assisted Living,Athletics & Sports,Australian Restaurant,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Canal,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Check Cashing Service,Chinese Restaurant,Chocolate Shop,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Football Field,College Residence Hall,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Disc Golf,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donburi Restaurant,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Financial or Legal Service,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,Fountain,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Print Shop,Pub,Rafting,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adams-Normandie, CA",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Adams-Normandie, CA",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Adams-Normandie, CA",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Adams-Normandie, CA",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Adams-Normandie, CA",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [38]:
la_onehot.shape

(2312, 298)
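
To make the one-hot encoding above and the per-neighbourhood averaging that follows more concrete, here is a toy sketch with two made-up neighbourhoods. The dataframe `venues` and its contents are hypothetical. Each category becomes a 0/1 indicator column, and the mean of an indicator within a neighbourhood is the share of that neighbourhood's venues falling in the category.

```python
import pandas as pd

# Hypothetical venues for two neighbourhoods
venues = pd.DataFrame({
    "Name": ["A", "A", "A", "B"],
    "Venue Category": ["Bank", "Bank", "Park", "Park"],
})

# One 0/1 indicator column per category
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Name", venues["Name"])

# The mean of each indicator is the share of a neighbourhood's venues in that category
shares = onehot.groupby("Name").mean().reset_index()
print(shares)
```

Here neighbourhood `A` has two banks out of three venues, so its `Bank` share is 2/3, while `B` has only a park.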

In [39]:
la_grouped = la_onehot.groupby('Name').mean().reset_index()
la_grouped.head()

Unnamed: 0,Name,ATM,Accessories Store,African Restaurant,Airport,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Assisted Living,Athletics & Sports,Australian Restaurant,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Canal,Cantonese Restaurant,Caribbean Restaurant,Carpet Store,Check Cashing Service,Chinese Restaurant,Chocolate Shop,City Hall,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Football Field,College Residence Hall,Comedy Club,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Disc Golf,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donburi Restaurant,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Film Studio,Financial or Legal Service,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Service,Food Truck,Football Stadium,Fountain,Fraternity House,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Government Building,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Historic Site,History Museum,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Insurance Office,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Lawyer,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoor Supply Store,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Print Shop,Pub,Rafting,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,State / Provincial Park,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Water Park,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adams-Normandie, CA",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alondra Park, CA",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Arlington Heights, CA",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Artesia, CA",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.096774,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Athens, CA",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [40]:
list(la_grouped.columns)

['Name',
 'ATM',
 'Accessories Store',
 'African Restaurant',
 'Airport',
 'American Restaurant',
 'Amphitheater',
 'Antique Shop',
 'Aquarium',
 'Arcade',
 'Art Gallery',
 'Arts & Crafts Store',
 'Arts & Entertainment',
 'Asian Restaurant',
 'Assisted Living',
 'Athletics & Sports',
 'Australian Restaurant',
 'Auto Garage',
 'Automotive Shop',
 'BBQ Joint',
 'Baby Store',
 'Bagel Shop',
 'Bakery',
 'Bank',
 'Bar',
 'Baseball Field',
 'Baseball Stadium',
 'Basketball Court',
 'Beach',
 'Bed & Breakfast',
 'Beer Bar',
 'Beer Garden',
 'Beer Store',
 'Big Box Store',
 'Bike Rental / Bike Share',
 'Bistro',
 'Board Shop',
 'Boat or Ferry',
 'Bookstore',
 'Boutique',
 'Bowling Alley',
 'Brazilian Restaurant',
 'Breakfast Spot',
 'Brewery',
 'Bubble Tea Shop',
 'Buffet',
 'Building',
 'Burger Joint',
 'Burrito Place',
 'Bus Line',
 'Bus Station',
 'Bus Stop',
 'Business Service',
 'Butcher',
 'Cafeteria',
 'Café',
 'Canal',
 'Cantonese Restaurant',
 'Caribbean Restaurant',
 'Carpet Store',


In [68]:
la_grouped.shape

(122, 298)

In [69]:
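def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent venue categories in a row."""
    # Skip the first entry ('Name'); the remaining entries are category frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]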
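
One thing to watch with this helper: it always returns `num_top_venues` names, even when a neighbourhood has fewer categories with a non-zero frequency, so the tail of a ranking can be padded with zero-frequency categories in arbitrary order. Below is a minimal sketch of a variant that keeps only non-zero categories. The name `top_nonzero_venues` is hypothetical, and the toy row uses the two non-zero values that the Alondra Park row appears to contain, with most categories omitted for brevity.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues category names whose frequency is above zero."""
    # Skip the first entry ('Name'); the rest are category frequencies.
    frequencies = row.iloc[1:].astype(float)
    nonzero = frequencies[frequencies > 0]
    return list(nonzero.sort_values(ascending=False).index[:num_top_venues])


# Toy row (assumption: a small subset of the real category columns).
row = pd.Series({
    'Name': 'Alondra Park, CA',
    'Farm': 0.0,
    'Home Service': 0.5,
    'Park': 0.5,
    'Yoga Studio': 0.0,
})

top = top_nonzero_venues(row, 10)
print(top)  # only the two categories with a non-zero frequency remain
```

A shorter list per neighbourhood makes it easier to see which rankings rest on only one or two recorded venues.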

In [70]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Name'] = la_grouped['Name']

for ind in np.arange(la_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(la_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adams-Normandie, CA",Sushi Restaurant,Convenience Store,Taco Place,Grocery Store,Gas Station,Latin American Restaurant,Park,Playground,Home Service,Arcade
1,"Alondra Park, CA",Home Service,Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
2,"Arlington Heights, CA",Food,Restaurant,Convenience Store,Latin American Restaurant,Dance Studio,Shop & Service,Art Gallery,Café,Rental Car Location,English Restaurant
3,"Artesia, CA",Fast Food Restaurant,Indian Restaurant,Hotel,Sandwich Place,Bubble Tea Shop,Vietnamese Restaurant,Korean Restaurant,Frozen Yogurt Shop,Gift Shop,Taiwanese Restaurant
4,"Athens, CA",Food Truck,Mexican Restaurant,Park,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
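
The column names above come from a three-element suffix list with an exception fallback. That works for 1 to 10, but larger values would get labels such as "21th". As a sketch, a small helper (hypothetical name `ordinal`) can compute the suffix directly:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # covers 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top_venues = 10
columns = ['Name'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same column names as the table above.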


In [71]:
# set number of clusters
kclusters = 5

# drop the label column so that only the numeric category frequencies are clustered
la_grouped_clustering = la_grouped.drop(columns='Name')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(la_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 1, 1, 1, 0, 1, 1, 1], dtype=int32)
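
Before interpreting the clusters, it can help to check how many neighbourhoods fall into each one, since k-means may produce very uneven groups. The sketch below tallies the ten labels printed above using only the standard library. In the notebook, the same call could presumably be applied to the full array via `kmeans.labels_.tolist()`.

```python
from collections import Counter

# The first ten cluster labels, copied from the output above.
first_labels = [1, 0, 1, 1, 1, 1, 0, 1, 1, 1]

cluster_sizes = Counter(first_labels)
for cluster, size in cluster_sizes.most_common():
    print(f'cluster {cluster}: {size} neighbourhoods')
```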

In [72]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

new_la_data = la_location_data.copy()

la_merged = new_la_data

# join the location data with neighborhoods_venues_sorted to add cluster labels and top venues for each neighbourhood
la_merged = la_merged.join(neighborhoods_venues_sorted.set_index('Name'), on='Name')

la_merged.head() # check the last columns!

Unnamed: 0,Name,Region,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adams-Normandie, CA",south-la,34.031788,-118.300247,1.0,Sushi Restaurant,Convenience Store,Taco Place,Grocery Store,Gas Station,Latin American Restaurant,Park,Playground,Home Service,Arcade
1,"Alondra Park, CA",south-bay,33.890134,-118.335139,0.0,Home Service,Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
2,"Arlington Heights, CA",central-la,34.043494,-118.321374,1.0,Food,Restaurant,Convenience Store,Latin American Restaurant,Dance Studio,Shop & Service,Art Gallery,Café,Rental Car Location,English Restaurant
3,"Artesia, CA",southeast,33.86902,-118.07962,1.0,Fast Food Restaurant,Indian Restaurant,Hotel,Sandwich Place,Bubble Tea Shop,Vietnamese Restaurant,Korean Restaurant,Frozen Yogurt Shop,Gift Shop,Taiwanese Restaurant
4,"Athens, CA",south-la,33.920407,-118.279049,1.0,Food Truck,Mexican Restaurant,Park,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm


In [73]:
la_merged.dropna(inplace=True)
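
The cluster labels appear as `1.0` and `0.0` in the merged table, most likely because some neighbourhoods in the location data have no row in the venue table. The join then fills their label with `NaN`, which forces the column to a float type. The sketch below reproduces this on a tiny made-up frame (the names `A`, `B`, `C` are placeholders, not real data) and shows one way to restore integer labels after dropping the unmatched rows.

```python
import pandas as pd

# Hypothetical stand-ins for the location table and the labelled venue table.
locations = pd.DataFrame({'Name': ['A', 'B', 'C']})
labels = pd.DataFrame({'Name': ['A', 'C'], 'Cluster Labels': [1, 0]})

# 'B' has no match, so its label becomes NaN and the column turns into float64.
merged = locations.join(labels.set_index('Name'), on='Name')
print(merged['Cluster Labels'].dtype)

# Drop unmatched rows, then cast the labels back to integers.
cleaned = merged.dropna().copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

With integer labels, later filters and colour lookups no longer need an explicit `int(...)` conversion.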

In [81]:
# create map
map_clusters = folium.Map(location=[la_location_data.Latitude.iloc[0], la_location_data.Longitude.iloc[0]], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(la_merged['Latitude'], la_merged['Longitude'], la_merged['Name'], la_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [75]:
la_merged.loc[la_merged['Cluster Labels'] == 0, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Alondra Park, CA",Home Service,Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
6,"Baldwin Hills/Crenshaw, CA",Playground,Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
14,"Beverlywood, CA",Business Service,Park,Yoga Studio,Farm,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market
47,"Harvard Park, CA",Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Fast Food Restaurant
78,"Montecito Heights, CA",Food,Park,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
83,"Paramount, CA",Mexican Restaurant,Park,Business Service,Burger Joint,Flower Shop,Flea Market,Financial or Legal Service,Film Studio,Fast Food Restaurant,Electronics Store
92,"Rolling Hills Estates, CA",Bank,Business Service,Farm,Park,Yoga Studio,Farmers Market,Ethiopian Restaurant,Event Space,Falafel Restaurant,Fast Food Restaurant
93,"Rolling Hills, CA",Business Service,Yoga Studio,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


In [76]:
la_merged.loc[la_merged['Cluster Labels'] == 1, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adams-Normandie, CA",Sushi Restaurant,Convenience Store,Taco Place,Grocery Store,Gas Station,Latin American Restaurant,Park,Playground,Home Service,Arcade
2,"Arlington Heights, CA",Food,Restaurant,Convenience Store,Latin American Restaurant,Dance Studio,Shop & Service,Art Gallery,Café,Rental Car Location,English Restaurant
3,"Artesia, CA",Fast Food Restaurant,Indian Restaurant,Hotel,Sandwich Place,Bubble Tea Shop,Vietnamese Restaurant,Korean Restaurant,Frozen Yogurt Shop,Gift Shop,Taiwanese Restaurant
4,"Athens, CA",Food Truck,Mexican Restaurant,Park,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm
5,"Atwater Village, CA",Arts & Crafts Store,Pet Store,Pizza Place,Gym,Restaurant,Chinese Restaurant,Sporting Goods Shop,Latin American Restaurant,Liquor Store,Farmers Market
7,"Bel-Air, CA",Hotel Bar,Golf Course,Restaurant,Café,Spa,Hotel Pool,Cycle Studio,Farmers Market,English Restaurant,Escape Room
9,"Bell Gardens, CA",Mexican Restaurant,Men's Store,Park,Donut Shop,Burger Joint,Fried Chicken Joint,Sporting Goods Shop,Convenience Store,Flower Shop,Flea Market
10,"Bell, CA",Mexican Restaurant,Pizza Place,Hotel,Burger Joint,Bank,Mediterranean Restaurant,Chinese Restaurant,Grocery Store,Fast Food Restaurant,Coffee Shop
12,"Beverly Grove, CA",Hotel,Café,Mexican Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Gift Shop,Seafood Restaurant,Chinese Restaurant,Clothing Store,Sushi Restaurant
13,"Beverly Hills, CA",Hotel,Italian Restaurant,New American Restaurant,American Restaurant,Park,Café,Sushi Restaurant,Coffee Shop,Mexican Restaurant,Steakhouse


In [77]:
la_merged.loc[la_merged['Cluster Labels'] == 2, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Bellflower, CA",Trail,Boutique,Yoga Studio,Fast Food Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market
44,"Griffith Park, CA",Trail,Park,Tea Room,Scenic Lookout,Flea Market,Financial or Legal Service,Film Studio,Fast Food Restaurant,Farmers Market,Farm
52,"Hollywood Hills, CA",Trail,Yoga Studio,Farmers Market,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Fast Food Restaurant


In [78]:
la_merged.loc[la_merged['Cluster Labels'] == 3, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
109,"View Park-Windsor Hills, CA",Gift Shop,Yoga Studio,Dumpling Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market


In [79]:
la_merged.loc[la_merged['Cluster Labels'] == 4, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head(10)

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,"Cheviot Hills, CA",Tennis Court,Farmers Market,Electronics Store,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Yoga Studio


In [80]:
# with kclusters = 5 the labels run from 0 to 4, so this filter returns no rows
la_merged.loc[la_merged['Cluster Labels'] == 5, la_merged.columns[[0] + list(range(5, la_merged.shape[1]))]].head()

Unnamed: 0,Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Conclusion <a name="conclusion"></a>


In summary, an entrepreneur looking to open an office in a relatively Covid-safe US state should choose California, which had the lowest number of new cases as a percentage of the state's population over the period analysed. Among the neighbourhoods of Los Angeles, California's largest city, those assigned to cluster 1 would be the most suitable in terms of nearby venues for a business that wants physical-activity and wellbeing locations close at hand for its employees.