### Introduction/Business Problem

The goal of this project is to determine whether neighborhood composition (restaurants, outdoor activities, and economic specialties) correlates with the impact of COVID-19 on those neighborhoods.  I will run a clustering very similar to the one we did for Toronto and New York, then add COVID case rates and overall population to determine whether certain neighborhoods were disproportionately affected, and if so, which ones. If a correlation is found, I'll also try to determine which economic factors may have contributed to those case rates. 

Another interesting experiment may be to also run the clustering INCLUDING scaled COVID data (i.e., the percent of the population affected) to see if we get a different grouping of neighborhoods.  
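As a minimal sketch of what "scaled" could mean here, the snippet below converts total cases to a percent of population and then min-max scales the result to the range 0 to 1. The three rows are copied from the case table shown later in this notebook; the min-max step is an assumption about one reasonable way to put the rate on the same footing as venue frequencies, not necessarily the scaling the final analysis uses.

```python
# Total cases and 2018 population for three neighborhoods,
# copied from the LA County table shown later in this notebook.
cases_and_population = {
    "Adams-Normandie": (50, 8202),
    "Arleta": (318, 34370),
    "Winnetka": (314, 51786),
}

# Percent of each neighborhood's population with a recorded case.
percent_affected = {
    name: cases / population * 100
    for name, (cases, population) in cases_and_population.items()
}

# Min-max scale to [0, 1] (an assumed choice) so the rate is on a
# similar scale to one-hot venue frequencies.
low = min(percent_affected.values())
high = max(percent_affected.values())
scaled = {
    name: (pct - low) / (high - low)
    for name, pct in percent_affected.items()
}

for name in scaled:
    print(name, round(percent_affected[name], 3), round(scaled[name], 3))
```

With only three rows the extremes map to exactly 0 and 1; on the full table the same arithmetic applies column-wise.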

The target audience for this report is municipalities or local governments preparing for future pandemic scenarios, who need to identify the areas where spread is most likely and the activities that may need to be limited first.  It may also indicate which people and businesses are most at risk, and hence which kinds of businesses and areas need to close or distance first.  

The challenge/assumption here is that people actually *live* in the neighborhoods where they work and where certain kinds of businesses are located.  It may also be that crowds are more likely to bring COVID to high traffic areas either way. I'm curious to see the outcome. 



### Data and Methods

For COVID case rate data, I will use the data publicly available on the LA County public health website, and focus only on communities within LA County:  http://publichealth.lacounty.gov/media/coronavirus/locations.htm

For the geographical coordinates of those neighborhoods, I will rely on USC's Neighborhood Data for Social Change, located here, which can be exported as a CSV:
https://usc.data.socrata.com/dataset/Los-Angeles-Neighborhood-Map/r8qd-yxsr

I will focus only on neighborhoods in the city of Los Angeles, as opposed to the surrounding suburbs and exurbs.  To do so, I will simply filter the LA County public health data down to the rows whose community name starts with "Los Angeles -".

To determine the clusters of similar neighborhoods with which to pair COVID data, I will use the Foursquare API in a manner similar to the previous exercises (i.e., cluster by neighborhood using one-hot encoding) and include COVID case rate data to see if we get obvious clusters by economic driver. 
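The one-hot step described above can be sketched roughly as follows. The venue rows and category names are hypothetical stand-ins for what the Foursquare responses would provide; the case counts and populations for Arleta and Westwood come from the tables in this notebook. The column names `case_rate_pct` and `case_rate_scaled` are made up for illustration, and the resulting `features` table is what a clustering step would consume.

```python
import pandas as pd

# Hypothetical venue rows; real rows would come from Foursquare responses.
venues = pd.DataFrame({
    "Neighborhood": ["Arleta", "Arleta", "Westwood", "Westwood", "Westwood"],
    "Venue Category": ["Taco Place", "Park", "Coffee Shop", "Park", "Coffee Shop"],
})

# Case counts and populations taken from the tables in this notebook.
rates = pd.DataFrame({
    "Neighborhood": ["Arleta", "Westwood"],
    "Total Cases": [318, 141],
    "2018 PEPS Population": [34370, 54109],
})
rates["case_rate_pct"] = rates["Total Cases"] / rates["2018 PEPS Population"] * 100

# One-hot encode the categories, then average per neighborhood so each
# row becomes the share of venues in each category.
one_hot = pd.get_dummies(venues["Venue Category"], dtype=float)
one_hot.insert(0, "Neighborhood", venues["Neighborhood"])
profile = one_hot.groupby("Neighborhood").mean().reset_index()

# Attach the case rate and min-max scale it (an assumed scaling choice).
features = profile.merge(rates[["Neighborhood", "case_rate_pct"]], on="Neighborhood")
rate = features["case_rate_pct"]
features["case_rate_scaled"] = (rate - rate.min()) / (rate.max() - rate.min())

print(features)
```

Dropping the `Neighborhood` and `case_rate_pct` columns would leave a purely numeric matrix for clustering, with or without `case_rate_scaled` depending on which experiment is being run.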

### Quick Data Gathering Exercise: COVID in LA

In [43]:
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium 
import geocoder

In [2]:
url = "http://publichealth.lacounty.gov/media/coronavirus/locations.htm"
tables_from_page = pd.read_html(url)

In [7]:
covid_by_neighborhood = tables_from_page[4]

In [20]:
los_angeles_covid_rates = covid_by_neighborhood[covid_by_neighborhood["City/Community"].str.contains("Los Angeles -")]

In [21]:
los_angeles_covid_rates

Unnamed: 0,City/Community,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
85,Los Angeles - Adams-Normandie,50,610,589,,8202
86,Los Angeles - Alsace,37,297,302,,12445
87,Los Angeles - Angeles National Forest,0,0,0,,40
88,Los Angeles - Angelino Heights,14,560,614,^,2502
89,Los Angeles - Arleta,318,925,924,,34370
...,...,...,...,...,...,...
219,Los Angeles - Wholesale District,167,462,461,,36129
220,Los Angeles - Wilmington,360,637,633,,56487
221,Los Angeles - Wilshire Center,298,594,591,,50170
222,Los Angeles - Winnetka,314,606,598,,51786


### Quick Data Gathering Exercise: LA Neighborhood Data

In [35]:
los_angeles_covid_rates["City/Community"]=los_angeles_covid_rates["City/Community"].str.split(" - ").str[1]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  los_angeles_covid_rates["City/Community"]=los_angeles_covid_rates["City/Community"].str.split(" - ").str[1]
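The warning above appears because the filtered dataframe may be a view of the original table, so pandas cannot tell whether the assignment is meant to reach back into it. One common way to avoid it, sketched below on a tiny stand-in table, is to take an explicit `.copy()` of the filtered rows before modifying them. The "City of Alhambra" row and its case count are placeholders for illustration only.

```python
import pandas as pd

# Small stand-in for the county table; the Alhambra row is a placeholder.
covid = pd.DataFrame({
    "City/Community": [
        "City of Alhambra",
        "Los Angeles - Arleta",
        "Los Angeles - Winnetka",
    ],
    "Total Cases": [0, 318, 314],
})

# regex=False treats the pattern as a literal string rather than a regex.
is_la = covid["City/Community"].str.contains("Los Angeles - ", regex=False)

# .copy() makes the filtered frame independent of `covid`, so the
# assignment below is unambiguous and does not trigger the warning.
la_only = covid[is_la].copy()
la_only["City/Community"] = la_only["City/Community"].str.split(" - ").str[1]

print(la_only)
```

The original `covid` table keeps its "Los Angeles - " prefixes, which is useful if the filter needs to be rerun later.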


In [36]:
los_angeles_covid_rates

Unnamed: 0,City/Community,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
85,Adams-Normandie,50,610,589,,8202
86,Alsace,37,297,302,,12445
87,Angeles National Forest,0,0,0,,40
88,Angelino Heights,14,560,614,^,2502
89,Arleta,318,925,924,,34370
...,...,...,...,...,...,...
219,Wholesale District,167,462,461,,36129
220,Wilmington,360,637,633,,56487
221,Wilshire Center,298,594,591,,50170
222,Winnetka,314,606,598,,51786


In [12]:
la_neighborhoods=pd.read_csv('la_neighborhoods.csv')

In [13]:
la_neighborhoods

Unnamed: 0,set,slug,the_geom,kind,external_i,name,display_na,sqmi,type,name_1,slug_1,latitude,longitude,location
0,L.A. County Neighborhoods (Current),acton,MULTIPOLYGON (((-118.20261747920541 34.5389897...,L.A. County Neighborhood (Current),acton,Acton,Acton L.A. County Neighborhood (Current),39.339109,unincorporated-area,,,-118.169810,34.497355,POINT(34.497355239240846 -118.16981019229348)
1,L.A. County Neighborhoods (Current),adams-normandie,MULTIPOLYGON (((-118.30900800000012 34.0374109...,L.A. County Neighborhood (Current),adams-normandie,Adams-Normandie,Adams-Normandie L.A. County Neighborhood (Curr...,0.805350,segment-of-a-city,,,-118.300208,34.031461,POINT(34.031461499124156 -118.30020800000011)
2,L.A. County Neighborhoods (Current),agoura-hills,MULTIPOLYGON (((-118.76192500000009 34.1682029...,L.A. County Neighborhood (Current),agoura-hills,Agoura Hills,Agoura Hills L.A. County Neighborhood (Current),8.146760,standalone-city,,,-118.759885,34.146736,POINT(34.146736499122795 -118.75988450000015)
3,L.A. County Neighborhoods (Current),agua-dulce,MULTIPOLYGON (((-118.2546773959221 34.55830403...,L.A. County Neighborhood (Current),agua-dulce,Agua Dulce,Agua Dulce L.A. County Neighborhood (Current),31.462632,unincorporated-area,,,-118.317104,34.504927,POINT(34.504926999796837 -118.3171036690717)
4,L.A. County Neighborhoods (Current),alhambra,MULTIPOLYGON (((-118.12174700000014 34.1050399...,L.A. County Neighborhood (Current),alhambra,Alhambra,Alhambra L.A. County Neighborhood (Current),7.623814,standalone-city,,,-118.136512,34.085539,POINT(34.085538999123571 -118.13651200000021)
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
267,L.A. County Neighborhoods (Current),willowbrook,MULTIPOLYGON (((-118.2300539720206 33.92809400...,L.A. County Neighborhood (Current),willowbrook,Willowbrook,Willowbrook L.A. County Neighborhood (Current),3.766361,unincorporated-area,,,-118.252312,33.915711,POINT(33.915710503828592 -118.25231247908229)
268,L.A. County Neighborhoods (Current),wilmington,"MULTIPOLYGON (((-118.224761 33.82460699912682,...",L.A. County Neighborhood (Current),wilmington,Wilmington,Wilmington L.A. County Neighborhood (Current),9.141293,segment-of-a-city,,,-118.259187,33.791294,POINT(33.79129350128175 -118.25918700000008)
269,L.A. County Neighborhoods (Current),windsor-square,MULTIPOLYGON (((-118.313709 34.076309999123666...,L.A. County Neighborhood (Current),windsor-square,Windsor Square,Windsor Square L.A. County Neighborhood (Current),0.683464,segment-of-a-city,,,-118.319909,34.069108,POINT(34.069108499123722 -118.31990900000005)
270,L.A. County Neighborhoods (Current),winnetka,MULTIPOLYGON (((-118.562213 34.231502999121666...,L.A. County Neighborhood (Current),winnetka,Winnetka,Winnetka L.A. County Neighborhood (Current),4.777241,segment-of-a-city,,,-118.575220,34.210459,POINT(34.210459499121988 -118.57521950000014)


In [18]:
la_neighborhoods_df = la_neighborhoods[["name","latitude","longitude"]]

In [19]:
la_neighborhoods_df

Unnamed: 0,name,latitude,longitude
0,Acton,-118.169810,34.497355
1,Adams-Normandie,-118.300208,34.031461
2,Agoura Hills,-118.759885,34.146736
3,Agua Dulce,-118.317104,34.504927
4,Alhambra,-118.136512,34.085539
...,...,...,...
267,Willowbrook,-118.252312,33.915711
268,Wilmington,-118.259187,33.791294
269,Windsor Square,-118.319909,34.069108
270,Winnetka,-118.575220,34.210459


In [37]:
los_angeles_covid_rates.rename(columns={"City/Community":"name"}, inplace = True)
los_angeles_covid_rates

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


Unnamed: 0,name,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
85,Adams-Normandie,50,610,589,,8202
86,Alsace,37,297,302,,12445
87,Angeles National Forest,0,0,0,,40
88,Angelino Heights,14,560,614,^,2502
89,Arleta,318,925,924,,34370
...,...,...,...,...,...,...
219,Wholesale District,167,462,461,,36129
220,Wilmington,360,637,633,,56487
221,Wilshire Center,298,594,591,,50170
222,Winnetka,314,606,598,,51786


In [38]:
merged_LA_data = pd.merge(la_neighborhoods_df,los_angeles_covid_rates, on="name")

### The inner merge above ensures we only keep neighborhoods for which we have data in both dataframes
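Because an inner merge silently drops rows whose names do not match exactly, it can be worth checking what was lost. The sketch below uses an outer merge with `indicator=True` on a few names taken from the two tables above; the variable names are illustrative only.

```python
import pandas as pd

# A few names from each source table shown above.
coords = pd.DataFrame({"name": ["Arleta", "Koreatown", "Acton"]})
covid = pd.DataFrame({"name": ["Arleta", "Alsace", "Koreatown"]})

# An outer merge keeps every row; the _merge column records its origin.
check = coords.merge(covid, on="name", how="outer", indicator=True)

only_in_covid = check.loc[check["_merge"] == "right_only", "name"].tolist()
only_in_coords = check.loc[check["_merge"] == "left_only", "name"].tolist()

print("COVID rows without coordinates:", only_in_covid)
print("Coordinate rows without COVID data:", only_in_coords)
```

Names that appear only on the COVID side are candidates for manual renaming before the merge, since they may be spelling or boundary differences rather than truly missing neighborhoods.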

In [39]:
merged_LA_data

Unnamed: 0,name,latitude,longitude,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
0,Adams-Normandie,-118.300208,34.031461,50,610,589,,8202
1,Arleta,-118.430757,34.243100,318,925,924,,34370
2,Atwater Village,-118.262373,34.131066,59,402,394,,14666
3,Beverly Crest,-118.423263,34.106007,17,136,135,^,12525
4,Koreatown,-118.304958,34.064510,274,530,526,,51693
...,...,...,...,...,...,...,...,...
81,West Los Angeles,-118.430745,34.047220,108,287,296,,37636
82,Westwood,-118.440480,34.065235,141,261,245,,54109
83,Wilmington,-118.259187,33.791294,360,637,633,,56487
84,Winnetka,-118.575220,34.210459,314,606,598,,51786


In [40]:
merged_LA_data.rename(columns = {"name":"Neighborhood"}, inplace=True)

### So it turns out the latitude and longitude columns were swapped in our source data.  Fixing that. 

In [57]:
corrected_long_lat_LA = merged_LA_data.rename(columns={"longitude":"lat", "latitude":"long"})

In [58]:
corrected_long_lat_LA

Unnamed: 0,Neighborhood,long,lat,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
0,Adams-Normandie,-118.300208,34.031461,50,610,589,,8202
1,Arleta,-118.430757,34.243100,318,925,924,,34370
2,Atwater Village,-118.262373,34.131066,59,402,394,,14666
3,Beverly Crest,-118.423263,34.106007,17,136,135,^,12525
4,Koreatown,-118.304958,34.064510,274,530,526,,51693
...,...,...,...,...,...,...,...,...
81,West Los Angeles,-118.430745,34.047220,108,287,296,,37636
82,Westwood,-118.440480,34.065235,141,261,245,,54109
83,Wilmington,-118.259187,33.791294,360,637,633,,56487
84,Winnetka,-118.575220,34.210459,314,606,598,,51786
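A swap like this can be caught automatically, since a valid latitude must lie between -90 and 90 and Los Angeles longitudes are around -118. The helper below is a hypothetical sanity check, not part of the original workflow; the two sample rows use the values from the source CSV as they were originally labeled.

```python
def looks_swapped(lat, lon):
    """Return True when `lat` is not a valid latitude but `lon` would be one."""
    return abs(lat) > 90 and abs(lon) <= 90


# (name, value labeled latitude, value labeled longitude) as in the source CSV.
rows = [
    ("Adams-Normandie", -118.300208, 34.031461),
    ("Winnetka", -118.575220, 34.210459),
]

swapped = [name for name, lat, lon in rows if looks_swapped(lat, lon)]
print(swapped)
```

If every row is flagged, as here, renaming the columns is enough; a mix of flagged and unflagged rows would point to a different kind of data problem.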


### Pulling Foursquare Neighborhood Data: A Known Problem

In [42]:
CLIENT_ID = 'REDACTED' # your Foursquare ID
CLIENT_SECRET = 'REDACTED' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: REDACTED
CLIENT_SECRET: REDACTED
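Printing credentials places them in the saved notebook output, so one alternative is to read them from environment variables and only ever display a masked form. The sketch below assumes hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; any names would work as long as they are set before the notebook starts.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from the given mapping (default: os.environ)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")          # hypothetical name
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")  # hypothetical name
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Example with a stand-in mapping instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "ABCDEFGH",
    "FOURSQUARE_CLIENT_SECRET": "12345678",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(demo_id))
print("CLIENT_SECRET:", mask(demo_secret))
```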


In [59]:
latitude = 34.052235
longitude = -118.243683
la = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map; covid_rate is total cases as a percentage of the 2018 population
covid_rate_pct = (corrected_long_lat_LA["Total Cases"] / corrected_long_lat_LA["2018 PEPS Population"]) * 100
for lat, lng, neighborhood, covid_rate in zip(corrected_long_lat_LA['lat'], corrected_long_lat_LA['long'], corrected_long_lat_LA['Neighborhood'], covid_rate_pct):
    # popup text: neighborhood name, case rate rounded to two decimals, and coordinates
    label = '{}, {:.2f}%, {}, {}'.format(neighborhood, covid_rate, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(la)
    
la

### Randomly pick West LA for lat/long

In [61]:
west_la_latitude = corrected_long_lat_LA.loc[81,'lat']
west_la_longitude = corrected_long_lat_LA.loc[81, 'long']

url = "https://api.foursquare.com/v2/venues/explore"

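params = dict(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    v='20180323',  # API version date sent with this request
    ll='{},{}'.format(west_la_latitude, west_la_longitude),  # "lat,lng" of the search center
    radius=500,  # search radius in meters
    limit=LIMIT  # maximum number of venues to return
)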
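To repeat this request for every neighborhood rather than just West LA, the parameter dictionary can be built by a small helper. The function name `explore_params` and its defaults are hypothetical; the coordinates are the West Los Angeles and Westwood values from the merged table above, and no network call is made in this sketch.

```python
def explore_params(lat, lng, client_id, client_secret,
                   version="20180323", radius=500, limit=100):
    """Build the query parameters for one venues/explore request."""
    return dict(
        client_id=client_id,
        client_secret=client_secret,
        v=version,
        ll="{},{}".format(lat, lng),  # "lat,lng" of the search center
        radius=radius,                # meters
        limit=limit,
    )


# (name, latitude, longitude) taken from the corrected table above.
neighborhoods = [
    ("West Los Angeles", 34.047220, -118.430745),
    ("Westwood", 34.065235, -118.440480),
]

params_by_neighborhood = {
    name: explore_params(lat, lng, "REDACTED", "REDACTED")
    for name, lat, lng in neighborhoods
}

for name, p in params_by_neighborhood.items():
    print(name, p["ll"])
```

Each dictionary could then be passed to `requests.get(url, params=...)` in a loop, ideally with a short pause between calls to stay within API rate limits.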


### Boom, some data on West LA! 

In [62]:
import requests
import json

results = requests.get(url, params).json()
results

{'meta': {'code': 200, 'requestId': '602c7ac8d8ba7776df7c237a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Westside',
  'headerFullLocation': 'Westside, Los Angeles',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 31,
  'suggestedBounds': {'ne': {'lat': 34.051720503623926,
    'lng': -118.42532363318354},
   'sw': {'lat': 34.04272049462392, 'lng': -118.43616536681647}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '49fb4ccaf964a520326e1fe3',
       'name': 'California Chicken Cafe',
       'contact': {},
       'location': {'address': '2005 Westwood Blvd',
        'crossStreet': 'at La Grange Ave',
        'lat': 34.