### Introduction/Business Problem

The goal of this project is to determine whether a neighborhood's composition (its restaurants, outdoor activities, and economic specialties) correlates with the impact of COVID-19 on that neighborhood. I will cluster neighborhoods much as we did for Toronto and New York, then add COVID case rates and overall population as extra dimensions to determine whether certain neighborhoods were disproportionately affected and, if so, which ones. If a correlation is found, I'll also try to determine which economic factors may have contributed to those case rates.

Another interesting experiment would be to run the clustering with scaled COVID data included (i.e., the percentage of the population affected) to see whether we get a different grouping of neighborhoods.
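
As a rough sketch of what "scaled" could mean here, the snippet below computes the percentage of each neighborhood's population with a confirmed case and then min-max scales it to the range 0 to 1. The three rows are copied from the county table shown later in this notebook; the choice of min-max scaling and the names `sample`, `pct_affected`, and `pct_affected_scaled` are assumptions made for illustration, not a settled part of the method.

```python
import pandas as pd

# Three rows copied from the county table shown later in this notebook.
sample = pd.DataFrame({
    "City/Community": [
        "Los Angeles - Adams-Normandie",
        "Los Angeles - Alsace",
        "Los Angeles - Arleta",
    ],
    "Total Cases": [50, 37, 318],
    "2018 PEPS Population": [8202, 12445, 34370],
})

# Percentage of each neighborhood's population with a confirmed case.
sample["pct_affected"] = 100 * sample["Total Cases"] / sample["2018 PEPS Population"]

# Min-max scaling (an assumption; any comparable scaling would do) so the
# COVID feature lives on the same 0-to-1 range as the other cluster features.
lowest = sample["pct_affected"].min()
highest = sample["pct_affected"].max()
sample["pct_affected_scaled"] = (sample["pct_affected"] - lowest) / (highest - lowest)

print(sample[["City/Community", "pct_affected", "pct_affected_scaled"]])
```

Scaling matters because an unscaled rate would otherwise dominate any distance-based clustering of features that lie between 0 and 1.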

### Data and Methods

For COVID case rate data, I will use the data publicly available on the LA County public health website, focusing only on communities within LA County: http://publichealth.lacounty.gov/media/coronavirus/locations.htm

For the geographical coordinates of those neighborhoods, I will rely on USC's Neighborhood Data for Social Change, which can be exported as a CSV from:
https://usc.data.socrata.com/dataset/Los-Angeles-Neighborhood-Map/r8qd-yxsr

I will focus only on neighborhoods in Los Angeles, as opposed to the surrounding suburbs and exurbs. To do so, I will simply filter the community data from the LA County public health website, keeping the rows whose names contain "Los Angeles".

To determine the clusters of similar neighborhoods with which to pair COVID data, I will use the Foursquare API in a manner similar to the previous exercises (i.e., cluster by neighborhood using one-hot encoding) and then include COVID case rate data to see whether obvious clusters emerge by economic driver.
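
The one-hot step can be sketched on a toy table as follows. The venue rows and category names are hypothetical stand-ins for what the Foursquare API would return, and the names `venues`, `profile`, and `features` are illustrative; only the two case rates are taken from the county table shown later in this notebook.

```python
import pandas as pd

# Hypothetical venue listing: one row per venue found in a neighborhood.
venues = pd.DataFrame({
    "Neighborhood": [
        "Los Angeles - Adams-Normandie",
        "Los Angeles - Adams-Normandie",
        "Los Angeles - Arleta",
        "Los Angeles - Arleta",
        "Los Angeles - Arleta",
    ],
    "Venue Category": ["Taco Place", "Park", "Taco Place", "Taco Place", "Gym"],
})

# One-hot encode the category column, keeping the neighborhood label alongside.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Averaging per neighborhood gives the share of venues in each category.
profile = onehot.groupby("Neighborhood").mean()

# Append the crude case rate as one more feature column.
case_rate = pd.Series(
    {"Los Angeles - Adams-Normandie": 610, "Los Angeles - Arleta": 925},
    name="Crude Case Rate",
)
features = profile.join(case_rate)
print(features)
```

The resulting `features` frame has one row per neighborhood and could be handed to a clustering step. The case rate column is on a much larger scale than the category shares, so it would likely need scaling first.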

In [1]:
import numpy as np
import pandas as pd

In [2]:
url = "http://publichealth.lacounty.gov/media/coronavirus/locations.htm"
# read_html returns a list of DataFrames, one per HTML table found on the page
tables_from_page = pd.read_html(url)

In [3]:
tables_from_page

[                                                        New Daily Counts  \
                                               Laboratory Confirmed Cases   
                                                                   Deaths   
                Age Group (Los Angeles County Cases Only-excl LB and Pas)   
                   Gender (Los Angeles County Cases Only-excl LB and Pas)   
           Race/Ethnicity (Los Angeles County Cases Only-excl LB and Pas)   
            Hospitalization LAC cases only (excl Long Beach and Pasadena)   
    Deaths Race/Ethnicity (Los Angeles County Cases Only-excl LB and Pas)   
 0                                             Cases**                      
 1                                              Deaths                      
 2                                                 NaN                      
 3           -- Los Angeles County (excl. LB and Pas)*                      
 4                                       -- Long Beach                      

In [7]:
# The community-level case table is the fifth table on the page (index 4)
covid_by_neighborhood = tables_from_page[4]
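
Selecting the table by position works only as long as the page keeps its tables in the same order. A slightly more defensive sketch is to pick the first table that has the expected column. The helper name `find_table_with_column` and the `demo_tables` list are hypothetical; the demo list merely stands in for the list returned by `pd.read_html`.

```python
import pandas as pd


def find_table_with_column(tables, column):
    """Return the first DataFrame in `tables` that has a column named `column`."""
    for table in tables:
        if column in table.columns:
            return table
    raise ValueError(f"no table with column {column!r}")


# Hypothetical stand-ins for the list of tables scraped from the page.
demo_tables = [
    pd.DataFrame({"New Daily Counts": ["Cases**", "Deaths"]}),
    pd.DataFrame({"City/Community": ["Los Angeles - Arleta"], "Total Cases": [318]}),
]

demo_covid_by_neighborhood = find_table_with_column(demo_tables, "City/Community")
print(demo_covid_by_neighborhood)
```

With the real data, the equivalent call would be `find_table_with_column(tables_from_page, "City/Community")`.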

In [10]:
# Keep rows whose community name contains "Los Angeles" (substring match)
los_angeles_covid_rates = covid_by_neighborhood[covid_by_neighborhood["City/Community"].str.contains("Los Angeles")]

In [11]:
los_angeles_covid_rates

Unnamed: 0,City/Community,Total Cases,Crude Case Rate3,"Adjusted Case Rate3,4",Unstable Adjusted Rate,2018 PEPS Population
85,Los Angeles - Adams-Normandie,50,610,589,,8202
86,Los Angeles - Alsace,37,297,302,,12445
87,Los Angeles - Angeles National Forest,0,0,0,,40
88,Los Angeles - Angelino Heights,14,560,614,^,2502
89,Los Angeles - Arleta,318,925,924,,34370
...,...,...,...,...,...,...
221,Los Angeles - Wilshire Center,298,594,591,,50170
222,Los Angeles - Winnetka,314,606,598,,51786
223,Los Angeles - Woodland Hills,258,379,383,,68055
252,Unincorporated - East Los Angeles,888,709,721,,125269
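
Note that the last row above, "Unincorporated - East Los Angeles", passes the substring filter even though it is not labeled as a Los Angeles neighborhood. If it should be excluded, one option is to match on the prefix instead. The sketch below assumes that the wanted rows all start with "Los Angeles - ", which holds for the rows shown above but has not been checked against the full table. The names `demo` and `city_only` are illustrative.

```python
import pandas as pd

# Three rows copied from the output above.
demo = pd.DataFrame({
    "City/Community": [
        "Los Angeles - Winnetka",
        "Los Angeles - Woodland Hills",
        "Unincorporated - East Los Angeles",
    ],
    "Total Cases": [314, 258, 888],
})

# Prefix match: keeps only names that begin with "Los Angeles - ".
is_city_neighborhood = demo["City/Community"].str.startswith("Los Angeles - ")
city_only = demo[is_city_neighborhood]
print(city_only)
```

Whether East Los Angeles belongs in the analysis is a judgment call; the prefix filter simply makes that choice explicit.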
