## Combining Demographic Data with Johns Hopkins COVID-19 Data

This is the second notebook in the data set creation process; it uses the output of demographic_data.ipynb.

COVID-19 data was downloaded from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data:
+ US county-level confirmed cases data: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv
+ US county-level deaths data: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv

Because the Johns Hopkins dataset combines the confirmed case and death counts for the five counties that make up New York City (New York, Queens, Kings, Richmond, and Bronx) under New York County, I manually updated the counts for each of these counties with data released by the NYC Health Department for the individual dates used:
+ 03/24/20 (earliest date available; data from 03/24, published on 03/25): cases- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-03252020-1.pdf; deaths- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-03252020-1.pdf
+ 04/01/20 (data from 04/01, published on 04/02): cases- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-04022020-1.pdf; deaths- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-04022020-1.pdf
+ 04/23/20 (data from 04/23, published on 04/24): cases- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-04242020-1.pdf; deaths- https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-04242020-1.pdf

In [82]:
import pandas as pd
from datetime import date
from datetime import datetime
import re

In [83]:
pd.set_option('display.expand_frame_repr', False) # the frame will be huge, don't expand
pd.set_option('display.precision', 4)

In [84]:
demo = pd.read_csv("counties.csv", dtype={'FIPS':float})
# looks like JH data doesn't have leading zeros in FIPS codes
confirmed = pd.read_csv("time_series_covid19_confirmed_US_20200424.csv")
deaths = pd.read_csv("time_series_covid19_deaths_US_20200424.csv")
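
The JH files store FIPS codes as numbers, which is why `FIPS` is read as a float above. If string codes are ever preferred, a minimal sketch of restoring the conventional 5-digit, zero-padded county form could look like this. `format_fips` is a hypothetical helper, not part of the original notebook.

```python
import math

def format_fips(value):
    """Return a 5-digit, zero-padded county FIPS string, or None if missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return f"{int(value):05d}"

print(format_fips(1103.0))   # Morgan County, AL
print(format_fips(36061.0))  # New York County, NY
```

Keeping both tables on the same representation (both float, or both padded strings) is what matters for the merge on `FIPS`.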

In [85]:
demo.head()

Unnamed: 0.1,Unnamed: 0,household_size,empl_agriculture,empl_professional,empl_social,empl_services,empl_manufacturing,empl_retail,empl_transp_utilities,employed,...,health_ins,county,state,FIPS,area,prc_obese,incarcerated,domestic_passengers,intl_passengers,order started
0,"Morgan County, Alabama: Summary level: 050, st...",2.56,1.0792,11.1812,19.4094,8.3231,22.2135,10.2471,5.6548,53742,...,98.8143,Morgan County,AL,1103.0,579.34,32.1,604.0,580000,0,04/04/20
1,"Kings County, California: Summary level: 050, ...",3.15,14.8108,7.4102,21.6017,8.9412,7.1271,9.3059,4.4241,52644,...,90.0942,Kings County,CA,6031.0,1389.42,29.4,465.0,0,0,03/19/20
2,"Monterey County, California: Summary level: 05...",3.31,15.99,10.0846,19.6731,10.8732,6.5126,8.9714,3.8882,190707,...,96.2853,Monterey County,CA,6053.0,3280.6,27.6,929.0,186000,0,03/19/20
3,"Nevada County, California: Summary level: 050,...",2.37,1.3392,16.3689,20.6696,11.5335,3.5097,10.5988,6.7902,44505,...,98.7723,Nevada County,CA,6057.0,957.77,21.5,197.0,0,0,03/19/20
4,"Shasta County, California: Summary level: 050,...",2.59,1.0668,9.3942,25.462,11.4847,4.4179,12.8545,5.008,69649,...,99.1735,Shasta County,CA,6089.0,3775.4,23.3,339.0,0,0,03/19/20


In [86]:
confirmed.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,4/14/20,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20
0,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,...,0,0,0,0,0,0,0,0,0,0
1,316.0,GU,GUM,316,66.0,,Guam,US,13.4443,144.7937,...,133,135,135,136,136,136,136,136,136,139
2,580.0,MP,MNP,580,69.0,,Northern Mariana Islands,US,15.0979,145.6739,...,11,13,13,13,14,14,14,14,14,14
3,630.0,PR,PRI,630,72.0,,Puerto Rico,US,18.2208,-66.5901,...,923,974,1043,1068,1118,1213,1252,1298,1252,1416
4,850.0,VI,VIR,850,78.0,,Virgin Islands,US,18.3358,-64.8963,...,51,51,51,51,53,53,53,53,54,54


In [87]:
last_date = confirmed.columns.values[-1]
last_date

'4/23/20'

In [88]:
# get the first date on which each county had at least 10 confirmed cases
date_10_cases = list()

for x in range(confirmed.shape[0]):
    trans = confirmed.iloc[x].T
    trans = trans.iloc[11:] # just use the date fields
    trans = trans[trans >= 10]
    
    if len(trans) > 0:
        date_10_cases.append(trans.keys()[0])
    else:
        date_10_cases.append(last_date) # if county hasn't yet reached 10 cases, use the latest date in the data
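
The row-by-row loop above can also be written with vectorized pandas operations. The sketch below is one possible alternative, not the notebook's method; `first_date_reaching` and the toy frame are illustrative assumptions. Like the loop, it falls back to the last date for rows that never reach the threshold.

```python
import pandas as pd

def first_date_reaching(df, date_cols, threshold=10):
    """For each row, return the label of the first date column >= threshold."""
    reached = df[date_cols].ge(threshold)
    # idxmax returns the first column holding the row maximum (the first True);
    # rows with no True would get the first column, so they are replaced below.
    first = reached.astype(int).idxmax(axis=1)
    return first.where(reached.any(axis=1), date_cols[-1])

# toy data: made-up counts, only to exercise the three cases
toy = pd.DataFrame(
    {"1/1/20": [0, 0, 10], "1/2/20": [5, 0, 11], "1/3/20": [12, 3, 15]},
    index=["a", "b", "c"],
)
result = first_date_reaching(toy, ["1/1/20", "1/2/20", "1/3/20"])
print(result.tolist())
```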

In [89]:
confirmed["ten plus cases"] = date_10_cases

In [90]:
confirmed.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20,ten plus cases
0,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,...,0,0,0,0,0,0,0,0,0,4/23/20
1,316.0,GU,GUM,316,66.0,,Guam,US,13.4443,144.7937,...,135,135,136,136,136,136,136,136,139,3/19/20
2,580.0,MP,MNP,580,69.0,,Northern Mariana Islands,US,15.0979,145.6739,...,13,13,13,14,14,14,14,14,14,4/8/20
3,630.0,PR,PRI,630,72.0,,Puerto Rico,US,18.2208,-66.5901,...,974,1043,1068,1118,1213,1252,1298,1252,1416,3/20/20
4,850.0,VI,VIR,850,78.0,,Virgin Islands,US,18.3358,-64.8963,...,51,51,51,53,53,53,53,54,54,3/24/20


In [91]:
confirmed[(confirmed["Province_State"] == "New York") & 
         ((confirmed["Admin2"] == "Bronx") | (confirmed["Admin2"] == "Kings") |
          (confirmed["Admin2"] == "New York") | (confirmed["Admin2"] == "Queens") |
          (confirmed["Admin2"] == "Richmond"))]
# looks like the NYC counties (Bronx, Kings, Queens, etc.) are all reported under New York County

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20,ten plus cases
1835,84036000.0,US,USA,840,36005.0,Bronx,New York,US,40.8521,-73.8628,...,0,0,0,0,0,0,0,0,0,4/23/20
1856,84036000.0,US,USA,840,36047.0,Kings,New York,US,40.6362,-73.9494,...,0,0,0,0,0,0,0,0,0,4/23/20
1863,84036000.0,US,USA,840,36061.0,New York,New York,US,40.7673,-73.9715,...,118302,123146,127352,135572,138700,141235,144190,147297,145855,3/6/20
1873,84036000.0,US,USA,840,36081.0,Queens,New York,US,40.7109,-73.8168,...,0,0,0,0,0,0,0,0,0,4/23/20
1875,84036000.0,US,USA,840,36085.0,Richmond,New York,US,40.5858,-74.1481,...,0,0,0,0,0,0,0,0,0,4/23/20
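
The same five-county filter is repeated several times in this notebook. A hedged sketch of a more compact form uses `Series.isin`; the helper name `select_nyc` and the toy frame are assumptions for illustration only.

```python
import pandas as pd

NYC_COUNTIES = ["Bronx", "Kings", "New York", "Queens", "Richmond"]

def select_nyc(df, state_col="Province_State", county_col="Admin2"):
    """Return the rows for the five NYC counties in New York State."""
    mask = (df[state_col] == "New York") & df[county_col].isin(NYC_COUNTIES)
    return df[mask]

# toy data: "Kings" also exists in California, so the state check matters
toy = pd.DataFrame({
    "Admin2": ["Bronx", "Kings", "Albany", "Kings"],
    "Province_State": ["New York", "New York", "New York", "California"],
})
nyc = select_nyc(toy)
print(nyc)
```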


In [92]:
deaths.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,4/14/20,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20
0,16.0,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,...,0,0,0,0,0,0,0,0,0,0
1,316.0,GU,GUM,316,66.0,,Guam,US,13.4443,144.7937,...,5,5,5,5,5,5,5,5,5,5
2,580.0,MP,MNP,580,69.0,,Northern Mariana Islands,US,15.0979,145.6739,...,2,2,2,2,2,2,2,2,2,2
3,630.0,PR,PRI,630,72.0,,Puerto Rico,US,18.2208,-66.5901,...,45,51,56,58,60,62,63,64,63,69
4,850.0,VI,VIR,850,78.0,,Virgin Islands,US,18.3358,-64.8963,...,1,1,1,2,3,3,3,3,3,3


In [93]:
deaths.shape

(3262, 105)

In [94]:
deaths.columns

Index(['UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State',
       'Country_Region', 'Lat', 'Long_',
       ...
       '4/14/20', '4/15/20', '4/16/20', '4/17/20', '4/18/20', '4/19/20',
       '4/20/20', '4/21/20', '4/22/20', '4/23/20'],
      dtype='object', length=105)

In [95]:
# nyc_counties = ["Bronx", "Kings", "New York", "Queens", "Richmond"]
deaths[(deaths["Province_State"] == "New York") & ((deaths["Admin2"] == "Bronx") | (deaths["Admin2"] == "Kings") |
                                                  (deaths["Admin2"] == "New York") | (deaths["Admin2"] == "Queens") |
                                                  (deaths["Admin2"] == "Richmond"))]
# it appears that all 5 NYC counties are being reported together under New York County
# so the NYC numbers will need special handling instead of using these per-county rows as-is

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,4/14/20,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20
1835,84036000.0,US,USA,840,36005.0,Bronx,New York,US,40.8521,-73.8628,...,0,0,0,0,0,0,0,0,0,0
1856,84036000.0,US,USA,840,36047.0,Kings,New York,US,40.6362,-73.9494,...,0,0,0,0,0,0,0,0,0,0
1863,84036000.0,US,USA,840,36061.0,New York,New York,US,40.7673,-73.9715,...,7905,8455,11477,13202,13202,14451,14604,14887,15074,16388
1873,84036000.0,US,USA,840,36081.0,Queens,New York,US,40.7109,-73.8168,...,0,0,0,0,0,0,0,0,0,0
1875,84036000.0,US,USA,840,36085.0,Richmond,New York,US,40.5858,-74.1481,...,0,0,0,0,0,0,0,0,0,0


In [96]:
merged = pd.merge(demo, deaths, how='inner', on="FIPS", 
                  left_index=False, right_index=False)
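
An inner join silently drops any county whose FIPS code is missing from either table. As a sketch of how such rows could be surfaced, `pd.merge` accepts `indicator=True` (adds a `_merge` column) and `validate` (checks key uniqueness). The frames below are toy data, and FIPS 99999 is a made-up code used only to force a mismatch.

```python
import pandas as pd

left = pd.DataFrame({
    "FIPS": [1103.0, 6031.0, 99999.0],
    "county": ["Morgan County", "Kings County", "Made-up County"],
})
right = pd.DataFrame({"FIPS": [1103.0, 6031.0], "4/23/20": [0, 1]})

# a left join keeps every demographic row; _merge records where each row matched
checked = pd.merge(left, right, how="left", on="FIPS",
                   indicator=True, validate="one_to_one")
unmatched = checked.loc[checked["_merge"] == "left_only", "FIPS"].tolist()
print(unmatched)
```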

In [97]:
merged.head()

Unnamed: 0.1,Unnamed: 0,household_size,empl_agriculture,empl_professional,empl_social,empl_services,empl_manufacturing,empl_retail,empl_transp_utilities,employed,...,4/14/20,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20
0,"Morgan County, Alabama: Summary level: 050, st...",2.56,1.0792,11.1812,19.4094,8.3231,22.2135,10.2471,5.6548,53742,...,0,0,0,0,0,0,0,0,0,0
1,"Kings County, California: Summary level: 050, ...",3.15,14.8108,7.4102,21.6017,8.9412,7.1271,9.3059,4.4241,52644,...,1,1,1,1,1,1,1,1,1,1
2,"Monterey County, California: Summary level: 05...",3.31,15.99,10.0846,19.6731,10.8732,6.5126,8.9714,3.8882,190707,...,3,3,3,3,3,3,4,4,4,4
3,"Nevada County, California: Summary level: 050,...",2.37,1.3392,16.3689,20.6696,11.5335,3.5097,10.5988,6.7902,44505,...,1,1,1,1,1,1,1,1,1,1
4,"Shasta County, California: Summary level: 050,...",2.59,1.0668,9.3942,25.462,11.4847,4.4179,12.8545,5.008,69649,...,3,3,3,3,3,3,3,3,3,3


In [98]:
merged.describe()

Unnamed: 0,household_size,empl_agriculture,empl_professional,empl_social,empl_services,empl_manufacturing,empl_retail,empl_transp_utilities,employed,prc_fam_poverty,...,4/14/20,4/15/20,4/16/20,4/17/20,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20
count,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0,...,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0,827.0
mean,2.5905,1.9294,10.1016,23.9089,9.6716,11.0223,11.5296,5.3078,163970.0,9.0312,...,29.4522,32.474,37.9033,41.7146,43.2902,46.3954,48.3059,50.7086,53.3289,56.8827
std,0.2503,2.7136,3.5975,4.7552,2.7863,5.855,2.02,1.913,293100.0,4.2493,...,283.8104,304.0147,407.3916,467.14,467.8789,511.1856,517.9138,528.4922,536.6174,582.4061
min,1.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17763.0,1.3,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2.41,0.5405,7.6825,20.9051,7.968,6.7159,10.2838,3.9946,44176.0,5.9,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
50%,2.55,1.0454,9.5856,23.2224,9.3013,10.0312,11.4295,5.0199,75494.0,8.4,...,3.0,3.0,3.0,4.0,4.0,4.0,4.0,4.0,5.0,5.0
75%,2.71,2.2877,11.926,26.3873,10.8736,14.3158,12.6416,6.3624,168630.0,11.4,...,10.0,11.0,12.5,13.0,14.0,15.0,15.5,17.0,17.5,18.5
max,4.11,25.7236,30.1044,46.2286,31.9878,43.88,21.3085,17.1676,5001400.0,29.4,...,7905.0,8455.0,11477.0,13202.0,13202.0,14451.0,14604.0,14887.0,15074.0,16388.0


In [99]:
merged.columns.values

array(['Unnamed: 0', 'household_size', 'empl_agriculture',
       'empl_professional', 'empl_social', 'empl_services',
       'empl_manufacturing', 'empl_retail', 'empl_transp_utilities',
       'employed', 'prc_fam_poverty', 'avg_income', 'prc_public_transp',
       'population', 'pop_65_plus', 'health_ins', 'county', 'state',
       'FIPS', 'area', 'prc_obese', 'incarcerated', 'domestic_passengers',
       'intl_passengers', 'order started', 'UID', 'iso2', 'iso3', 'code3',
       'Admin2', 'Province_State', 'Country_Region', 'Lat', 'Long_',
       'Combined_Key', 'Population', '1/22/20', '1/23/20', '1/24/20',
       '1/25/20', '1/26/20', '1/27/20', '1/28/20', '1/29/20', '1/30/20',
       '1/31/20', '2/1/20', '2/2/20', '2/3/20', '2/4/20', '2/5/20',
       '2/6/20', '2/7/20', '2/8/20', '2/9/20', '2/10/20', '2/11/20',
       '2/12/20', '2/13/20', '2/14/20', '2/15/20', '2/16/20', '2/17/20',
       '2/18/20', '2/19/20', '2/20/20', '2/21/20', '2/22/20', '2/23/20',
       '2/24/20', '2/25/2

In [100]:
merged.shape

(827, 129)

In [101]:
merged[["population", "Population"]] 
# ACS population and that used by JH data very close, though not exactly the same
# just use ACS population for consistency
# ultimately drop columns 'UID', 'iso2', 'iso3', 'code3', 'Admin2', 'Province_State',
#        'Country_Region', 'Lat', 'Long_', 'Combined_Key', 'Population', "Unnamed: 0"
# and drop/ignore for model "county", "state", "FIPS"

Unnamed: 0,population,Population
0,119089,119679
1,151366,152940
2,435594,434061
3,99696,99755
4,180040,180080
...,...,...
822,814901,822083
823,85129,84769
824,948201,945726
825,187365,187885
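
To put a number on "very close", the relative difference between the two population columns can be computed. The sketch below uses only the first five pairs shown in the output above, so it illustrates the calculation rather than summarizing all 827 rows.

```python
# (ACS population, JH Population) pairs copied from the first rows of the output above
pairs = [
    (119089, 119679),
    (151366, 152940),
    (435594, 434061),
    (99696, 99755),
    (180040, 180080),
]

# relative difference, taking the ACS value as the reference
rel_diffs = [abs(acs - jh) / acs for acs, jh in pairs]
largest = max(rel_diffs)
print(f"largest relative difference: {largest:.2%}")
```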


In [102]:
merged = merged.drop(['UID', 'iso2', 'iso3', 'code3', 'Admin2', 'Province_State','Country_Region', 
                      'Lat', 'Long_', 'Combined_Key', 'Population', "Unnamed: 0"], axis=1)

In [103]:
confirmed[["3/24/20", "4/1/20", "4/23/20"]].describe() 
# in a first pass, found that the number of cases 2 weeks prior was almost perfectly correlated with
# the number of deaths, so will just use the count of confirmed cases 4 weeks prior. I would
# have used earlier data, too, if I could get the breakdown by county for NYC

Unnamed: 0,3/24/20,4/1/20,4/23/20
count,3262.0,3262.0,3262.0
mean,16.4733,65.3716,266.4531
std,280.9347,907.6414,2888.1093
min,0.0,0.0,0.0
25%,0.0,0.0,2.0
50%,0.0,2.0,12.0
75%,2.0,11.0,59.0
max,14904.0,47439.0,145855.0


In [104]:
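# keep the confirmed case counts for the dates of interest, plus the date of 10+ confirmed cases
# NOTE: the first date column selected here is "3/23/20", although it is labeled cases_march24
confirmed = confirmed[["FIPS", "3/23/20", "4/1/20", "4/23/20", "ten plus cases"]]
confirmed.columns = ["FIPS", "cases_march24", "cases_april1", "cases_april23", "ten plus cases"]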
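
Assigning a new list to `confirmed.columns` relies on the selected columns and the new names staying in step with each other. One hedged alternative is to derive each name from the date label it replaces, so the two cannot drift apart. `case_label` and `MONTHS` are hypothetical names introduced for this sketch.

```python
from datetime import datetime

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def case_label(jh_date):
    """Turn a JH date label such as '3/24/20' into 'cases_march24'."""
    parsed = datetime.strptime(jh_date, "%m/%d/%y")
    return f"cases_{MONTHS[parsed.month - 1]}{parsed.day}"

dates = ["3/24/20", "4/1/20", "4/23/20"]
rename_map = {d: case_label(d) for d in dates}
print(rename_map)
```

A mapping like this could then be passed to `DataFrame.rename(columns=rename_map)` after selecting `["FIPS"] + dates + ["ten plus cases"]`.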

In [105]:
merged = pd.merge(merged, confirmed, how='inner', on="FIPS",
                 left_index=False, right_index=False)

In [106]:
merged.head()

Unnamed: 0,household_size,empl_agriculture,empl_professional,empl_social,empl_services,empl_manufacturing,empl_retail,empl_transp_utilities,employed,prc_fam_poverty,...,4/18/20,4/19/20,4/20/20,4/21/20,4/22/20,4/23/20,cases_march24,cases_april1,cases_april23,ten plus cases
0,2.56,1.0792,11.1812,19.4094,8.3231,22.2135,10.2471,5.6548,53742,9.9,...,0,0,0,0,0,0,0,19,50,3/28/20
1,3.15,14.8108,7.4102,21.6017,8.9412,7.1271,9.3059,4.4241,52644,15.6,...,1,1,1,1,1,1,0,4,35,4/12/20
2,3.31,15.99,10.0846,19.6731,10.8732,6.5126,8.9714,3.8882,190707,10.5,...,3,3,4,4,4,4,14,42,154,3/21/20
3,2.37,1.3392,16.3689,20.6696,11.5335,3.5097,10.5988,6.7902,44505,5.1,...,1,1,1,1,1,1,2,26,36,3/28/20
4,2.59,1.0668,9.3942,25.462,11.4847,4.4179,12.8545,5.008,69649,9.5,...,3,3,3,3,3,3,2,7,28,4/3/20


In [107]:
merged.shape

(827, 121)

#### Manually Update Numbers for NYC Counties

Johns Hopkins reports cases and deaths for all 5 NYC counties (New York County, Bronx County, Kings County, Queens County, Richmond County) under New York County only, so the numbers for each county are filled in manually from the NYC Health Department summaries.

In [108]:
merged[(merged["state"] == "NY") & ((merged["county"] == "Bronx County") | 
                                   (merged["county"] == "Kings County") |
                                   (merged["county"] == "New York County") | 
                                   (merged["county"] == "Queens County") |
                                   (merged["county"] == "Richmond County"))][["FIPS", "county", "state",
                                                                              "ten plus cases", "3/24/20",
                                                                              "4/1/20", "4/23/20", "cases_march24",
                                                                              "cases_april1", "cases_april23"]]

Unnamed: 0,FIPS,county,state,ten plus cases,3/24/20,4/1/20,4/23/20,cases_march24,cases_april1,cases_april23
284,36005.0,Bronx County,NY,4/23/20,0,0,0,0,0,0
489,36061.0,New York County,NY,3/6/20,131,1374,16388,12305,47439,145855
588,36047.0,Kings County,NY,4/23/20,0,0,0,0,0,0
688,36081.0,Queens County,NY,4/23/20,0,0,0,0,0,0
689,36085.0,Richmond County,NY,4/23/20,0,0,0,0,0,0


In [109]:
# update so all have same "ten plus cases" of 3/6/20
for a in [284, 588, 688, 689]:
    merged.at[a,'ten plus cases'] = "3/6/20"

In [110]:
merged[(merged["state"] == "NY") & ((merged["county"] == "Bronx County") | 
                                   (merged["county"] == "Kings County") |
                                   (merged["county"] == "New York County") | 
                                   (merged["county"] == "Queens County") |
                                   (merged["county"] == "Richmond County"))][["FIPS", "county", "state",
                                                                              "ten plus cases", "3/24/20",
                                                                              "4/1/20", "4/23/20", "cases_march24",
                                                                              "cases_april1", "cases_april23"]]

Unnamed: 0,FIPS,county,state,ten plus cases,3/24/20,4/1/20,4/23/20,cases_march24,cases_april1,cases_april23
284,36005.0,Bronx County,NY,3/6/20,0,0,0,0,0,0
489,36061.0,New York County,NY,3/6/20,131,1374,16388,12305,47439,145855
588,36047.0,Kings County,NY,3/6/20,0,0,0,0,0,0
688,36081.0,Queens County,NY,3/6/20,0,0,0,0,0,0
689,36085.0,Richmond County,NY,3/6/20,0,0,0,0,0,0


In [111]:
# deaths for 4/23: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-04242020-1.pdf
merged.at[284, "4/23/20"] = 2342 
merged.at[489, "4/23/20"] = 1390
merged.at[588, "4/23/20"] = 3190
merged.at[688, "4/23/20"] = 3304
merged.at[689, "4/23/20"] = 515
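
The row labels 284, 489, 588, 688, and 689 depend on the order of the merged frame and would change if the input files did. A sketch of the same update keyed by FIPS code instead is shown below; `set_by_fips` is a hypothetical helper, and the toy frame stands in for `merged`.

```python
import pandas as pd

def set_by_fips(df, column, values_by_fips):
    """Set df[column] for each FIPS code, requiring exactly one matching row."""
    for fips, value in values_by_fips.items():
        mask = df["FIPS"] == fips
        if mask.sum() != 1:
            raise ValueError(f"expected one row for FIPS {fips}, found {mask.sum()}")
        df.loc[mask, column] = value
    return df

# 4/23 deaths from the cell above, keyed by county FIPS instead of row label
deaths_apr23 = {36005.0: 2342, 36061.0: 1390, 36047.0: 3190,
                36081.0: 3304, 36085.0: 515}

toy = pd.DataFrame({
    "FIPS": [36005.0, 36061.0, 36047.0, 36081.0, 36085.0],
    "4/23/20": [0, 16388, 0, 0, 0],
})
set_by_fips(toy, "4/23/20", deaths_apr23)
print(toy)
```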

In [112]:
# deaths for 4/1: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-04022020-1.pdf
merged.at[284, "4/1/20"] = 382
merged.at[489, "4/1/20"] = 165
merged.at[588, "4/1/20"] = 328
merged.at[688, "4/1/20"] = 448
merged.at[689, "4/1/20"] = 67

In [113]:
# deaths for 3/24: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-deaths-03252020-1.pdf

merged.at[284, "3/24/20"] = 43
merged.at[489, "3/24/20"] = 30
merged.at[588, "3/24/20"] = 43
merged.at[688, "3/24/20"] = 65
merged.at[689, "3/24/20"] = 18

In [114]:
merged[(merged["state"] == "NY") & ((merged["county"] == "Bronx County") | 
                                   (merged["county"] == "Kings County") |
                                   (merged["county"] == "New York County") | 
                                   (merged["county"] == "Queens County") |
                                   (merged["county"] == "Richmond County"))][["FIPS", "county", "state",
                                                                              "ten plus cases", "3/24/20",
                                                                              "4/1/20", "4/23/20", "cases_march24",
                                                                              "cases_april1", "cases_april23"]]

Unnamed: 0,FIPS,county,state,ten plus cases,3/24/20,4/1/20,4/23/20,cases_march24,cases_april1,cases_april23
284,36005.0,Bronx County,NY,3/6/20,43,382,2342,0,0,0
489,36061.0,New York County,NY,3/6/20,30,165,1390,12305,47439,145855
588,36047.0,Kings County,NY,3/6/20,43,328,3190,0,0,0
688,36081.0,Queens County,NY,3/6/20,65,448,3304,0,0,0
689,36085.0,Richmond County,NY,3/6/20,18,67,515,0,0,0


In [115]:
# cases march 24: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-03252020-1.pdf

merged.at[284, "cases_march24"] = 2789
merged.at[489, "cases_march24"] = 3187
merged.at[588, "cases_march24"] = 4656
merged.at[688, "cases_march24"] = 5066
merged.at[689, "cases_march24"] = 1084

In [116]:
# cases april 1: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-04022020-1.pdf

merged.at[284, "cases_april1"] = 9107
merged.at[489, "cases_april1"] = 7278
merged.at[588, "cases_april1"] = 12983
merged.at[688, "cases_april1"] = 16336
merged.at[689, "cases_april1"] = 2723

In [117]:
# cases april 23: https://www1.nyc.gov/assets/doh/downloads/pdf/imm/covid-19-daily-data-summary-04242020-1.pdf

merged.at[284, "cases_april23"] = 32862
merged.at[489, "cases_april23"] = 18252
merged.at[588, "cases_april23"] = 38727
merged.at[688, "cases_april23"] = 45313
merged.at[689, "cases_april23"] = 10917

In [118]:
merged[(merged["state"] == "NY") & ((merged["county"] == "Bronx County") | 
                                   (merged["county"] == "Kings County") |
                                   (merged["county"] == "New York County") | 
                                   (merged["county"] == "Queens County") |
                                   (merged["county"] == "Richmond County"))][["FIPS", "county", "state",
                                                                              "ten plus cases", "3/24/20",
                                                                              "4/1/20", "4/23/20", "cases_march24",
                                                                              "cases_april1", "cases_april23"]]

Unnamed: 0,FIPS,county,state,ten plus cases,3/24/20,4/1/20,4/23/20,cases_march24,cases_april1,cases_april23
284,36005.0,Bronx County,NY,3/6/20,43,382,2342,2789,9107,32862
489,36061.0,New York County,NY,3/6/20,30,165,1390,3187,7278,18252
588,36047.0,Kings County,NY,3/6/20,43,328,3190,4656,12983,38727
688,36081.0,Queens County,NY,3/6/20,65,448,3304,5066,16336,45313
689,36085.0,Richmond County,NY,3/6/20,18,67,515,1084,2723,10917
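
As a quick sanity check on the manual entries, the per-county numbers can be summed and compared with the figure JH reported under New York County. The sketch below does this for the April 23 confirmed cases only, using values copied from the tables above; the two sources are not expected to agree exactly.

```python
# April 23 confirmed cases per county, copied from the table above
cases_april23 = {
    "Bronx": 32862,
    "New York": 18252,
    "Kings": 38727,
    "Queens": 45313,
    "Richmond": 10917,
}
jh_citywide = 145855  # value JH reported under New York County for 4/23/20

nyc_total = sum(cases_april23.values())
rel_gap = abs(nyc_total - jh_citywide) / jh_citywide
print(nyc_total, f"{rel_gap:.2%}")
```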


Remove all the unused date (death count) columns, keeping only 3/24/20, 4/1/20, and 4/23/20.

In [119]:
to_drop = ['1/22/20', '1/23/20','1/24/20', '1/25/20', '1/26/20', '1/27/20', '1/28/20', '1/29/20',
           '1/30/20', '1/31/20', '2/1/20', '2/2/20', '2/3/20', '2/4/20',
           '2/5/20', '2/6/20', '2/7/20', '2/8/20', '2/9/20', '2/10/20',
           '2/11/20', '2/12/20', '2/13/20', '2/14/20', '2/15/20', '2/16/20',
           '2/17/20', '2/18/20', '2/19/20', '2/20/20', '2/21/20', '2/22/20',
           '2/23/20', '2/24/20', '2/25/20', '2/26/20', '2/27/20', '2/28/20', 
           '2/29/20', '3/1/20', '3/2/20', '3/3/20', '3/4/20', '3/5/20',
           '3/6/20', '3/7/20', '3/8/20', '3/9/20', '3/10/20', '3/11/20',
           '3/12/20', '3/13/20', '3/14/20', '3/15/20', '3/16/20', '3/17/20',
           '3/18/20', '3/19/20', '3/20/20', '3/21/20', '3/22/20', '3/23/20', 
           '3/25/20', '3/26/20', '3/27/20', '3/28/20', '3/29/20',
           '3/30/20', '3/31/20', '4/2/20', '4/3/20', '4/4/20',
           '4/5/20', '4/6/20', '4/7/20', '4/8/20', '4/9/20', '4/10/20',
           '4/11/20', '4/12/20', '4/13/20', '4/14/20', '4/15/20', '4/16/20',
           '4/17/20', '4/18/20', '4/19/20', '4/20/20', '4/21/20', '4/22/20']
merged.drop(to_drop, axis=1, inplace=True)
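
The hand-typed list above has to be edited whenever the input file gains new dates. A hedged alternative is to build it from the column names with a regular expression, as sketched here; `unused_date_columns` and the shortened column list are illustrative assumptions.

```python
import re

# matches JH date labels such as 1/22/20 or 4/23/20
DATE_LABEL = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}$")

def unused_date_columns(columns, keep):
    """Return the date-like column names that are not in `keep`."""
    return [c for c in columns if DATE_LABEL.match(c) and c not in keep]

# shortened stand-in for merged.columns
columns = ["population", "FIPS", "3/23/20", "3/24/20", "3/25/20",
           "4/1/20", "4/23/20", "cases_april1", "ten plus cases"]
to_drop = unused_date_columns(columns, keep={"3/24/20", "4/1/20", "4/23/20"})
print(to_drop)
```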

In [120]:
merged.columns.values

array(['household_size', 'empl_agriculture', 'empl_professional',
       'empl_social', 'empl_services', 'empl_manufacturing',
       'empl_retail', 'empl_transp_utilities', 'employed',
       'prc_fam_poverty', 'avg_income', 'prc_public_transp', 'population',
       'pop_65_plus', 'health_ins', 'county', 'state', 'FIPS', 'area',
       'prc_obese', 'incarcerated', 'domestic_passengers',
       'intl_passengers', 'order started', '3/24/20', '4/1/20', '4/23/20',
       'cases_march24', 'cases_april1', 'cases_april23', 'ten plus cases'],
      dtype=object)

In [121]:
merged.to_csv("../combined_data.csv")
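
By default `to_csv` writes the row index as an extra unnamed column, which is the likely origin of the `Unnamed: 0` columns seen in `demo.head()` earlier. The sketch below, using a toy frame and in-memory buffers, shows the difference `index=False` makes when the file is read back.

```python
import io
import pandas as pd

frame = pd.DataFrame({"FIPS": [1103.0, 6031.0], "population": [119089, 151366]})

with_index = io.StringIO()
frame.to_csv(with_index)                  # default: index becomes the first column
with_index.seek(0)

without_index = io.StringIO()
frame.to_csv(without_index, index=False)  # index is not written
without_index.seek(0)

cols_with = pd.read_csv(with_index).columns.tolist()
cols_without = pd.read_csv(without_index).columns.tolist()
print(cols_with)
print(cols_without)
```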