# Data Treatment for County-Level COVID-19 Analysis
## Curtis Peterson

This notebook creates a .csv file that combines county-level COVID-19 data from the [Covid Act Now](https://covidactnow.org/?s=27521640) API with 2020 census data from the [Redistricting Data Hub](https://redistrictingdatahub.org/) in a simplified format. Downloading the COVID-19 data requires a Covid Act Now API key.

In [6]:
import glob
import requests

import pandas as pd
import numpy as np

In [7]:
update_data = False

if update_data:
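    response_API = requests.get('https://api.covidactnow.org/v2/counties.timeseries.csv?apiKey=****ADD YOUR API KEY HERE****')
    response_API.raise_for_status()  # stop here rather than saving an HTTP error page as data
    with open('covid_data.csv', 'w') as file:  # the context manager closes the file even on error
        file.write(response_API.text)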

covid_df = pd.read_csv('covid_data.csv')

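The download step can also be wrapped in a function that keeps the API key out of the notebook. This is a minimal sketch, assuming the key is stored in an environment variable; the variable name `COVID_ACT_NOW_API_KEY` and the helper names `build_timeseries_url` and `download_covid_csv` are hypothetical, and only the endpoint URL comes from the cell above.

```python
import os
from typing import Optional

# Endpoint taken from the notebook; COVID_ACT_NOW_API_KEY is a hypothetical variable name.
BASE_URL = 'https://api.covidactnow.org/v2/counties.timeseries.csv'


def build_timeseries_url(api_key: str) -> str:
    """Return the county timeseries URL with the key passed as the apiKey query parameter."""
    return f'{BASE_URL}?apiKey={api_key}'


def download_covid_csv(path: str = 'covid_data.csv', api_key: Optional[str] = None) -> None:
    """Fetch the CSV and write it to `path`, raising on HTTP errors."""
    import requests  # imported here so the URL helper works even without requests installed

    key = api_key if api_key is not None else os.environ['COVID_ACT_NOW_API_KEY']
    response = requests.get(build_timeseries_url(key), timeout=300)
    response.raise_for_status()  # avoid writing an error page to disk as if it were data
    with open(path, 'w', encoding='utf-8') as f:
        f.write(response.text)
```

Calling `download_covid_csv()` with the environment variable set would replace the body of the `if update_data:` branch.
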
In [8]:
covid_df.head()

Unnamed: 0,date,country,state,county,fips,lat,long,locationId,actuals.cases,actuals.deaths,...,unused3,unused4,metrics.icuCapacityRatio,riskLevels.overall,metrics.vaccinationsInitiatedRatio,metrics.vaccinationsCompletedRatio,actuals.newDeaths,actuals.vaccinesAdministered,riskLevels.caseDensity,cdcTransmissionLevel
0,2020-04-09,US,AK,Aleutians East Borough,2013,,,iso1:us#iso2:us-ak#fips:02013,,,...,,,,0,,,,,0,0
1,2020-04-10,US,AK,Aleutians East Borough,2013,,,iso1:us#iso2:us-ak#fips:02013,,,...,,,,0,,,,,0,0
2,2020-04-11,US,AK,Aleutians East Borough,2013,,,iso1:us#iso2:us-ak#fips:02013,,,...,,,,0,,,,,0,0
3,2020-04-12,US,AK,Aleutians East Borough,2013,,,iso1:us#iso2:us-ak#fips:02013,,,...,,,,0,,,,,0,0
4,2020-04-13,US,AK,Aleutians East Borough,2013,,,iso1:us#iso2:us-ak#fips:02013,,,...,,,,0,,,,,0,0


In [9]:
covid_df.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2079240 entries, 0 to 2079239
Data columns (total 41 columns):
 #   Column                                  Non-Null Count    Dtype  
---  ------                                  --------------    -----  
 0   date                                    2079240 non-null  object 
 1   country                                 2079240 non-null  object 
 2   state                                   2079240 non-null  object 
 3   county                                  2079240 non-null  object 
 4   fips                                    2079240 non-null  int64  
 5   lat                                     0 non-null        float64
 6   long                                    0 non-null        float64
 7   locationId                              2079240 non-null  object 
 8   actuals.cases                           2036661 non-null  float64
 9   actuals.deaths                          1991668 non-null  float64
 10  actuals.positiveTests         

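The `info()` summary shows that `lat` and `long` have 0 non-null entries. A compact way to rank every column by missingness before choosing which ones to keep is sketched below on a small hypothetical frame standing in for `covid_df`; the helper name `null_fraction` is illustrative.

```python
import numpy as np
import pandas as pd


def null_fraction(frame: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, highest first."""
    return frame.isna().mean().sort_values(ascending=False)


# Small hypothetical frame standing in for covid_df
toy = pd.DataFrame({
    'lat': [np.nan, np.nan, np.nan, np.nan],
    'actuals.cases': [1.0, np.nan, 3.0, 4.0],
    'county': ['A', 'B', 'C', 'D'],
})
fractions = null_fraction(toy)
print(fractions)

# Columns with no data at all are safe to drop
all_null = fractions[fractions == 1.0].index.tolist()
print(all_null)  # ['lat']
```

On the real data, `null_fraction(covid_df).head(15)` would list the sparsest columns first.
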
In [10]:
use_columns = ['date', 'country', 'state', 'county', 'locationId', 'actuals.newCases', 'actuals.newDeaths', 'actuals.cases', 'actuals.deaths', 'actuals.icuBeds.currentUsageCovid', 'actuals.icuBeds.currentUsageTotal', 'actuals.icuBeds.capacity', 'actuals.vaccinationsInitiated', 'actuals.vaccinationsCompleted']
# .copy() gives an independent DataFrame, so adding or renaming columns later does not trigger SettingWithCopyWarning
cleaned_covid_df = covid_df[use_columns].copy()

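Selecting a list of columns and then assigning to the result is what produces the `SettingWithCopyWarning` messages in the cells below. A minimal sketch of the pattern with an explicit `.copy()`, using a hypothetical stand-in frame named `raw_sample` that borrows a few of the real column names:

```python
import warnings

import pandas as pd

# Hypothetical stand-in for covid_df with a few of the real column names
raw_sample = pd.DataFrame({
    'date': ['2021-12-25', '2021-12-26'],
    'county': ['Weston County', 'Weston County'],
    'actuals.newCases': [0.0, 0.0],
    'fips': [56045, 56045],
})
sample_columns = ['date', 'county', 'actuals.newCases']

# .copy() makes the selection an independent frame, so later assignments
# modify it directly instead of something pandas may treat as a view of raw_sample.
subset = raw_sample[sample_columns].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    subset['county_pop'] = 6838.0

print([w.category.__name__ for w in caught])
```

The original frame is left untouched, and no `SettingWithCopyWarning` is recorded for the assignment.
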

In [11]:
# Without recursive=True, '**' matches exactly one subdirectory level
file_list = glob.glob('census_data_by_county/**/*.csv')
# DataFrame.append was removed in pandas 2.0; concatenate all files in a single call instead
census_df = pd.concat([pd.read_csv(path) for path in file_list], ignore_index=True)

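`pd.concat` is the replacement for the removed `DataFrame.append`, and it is also cheaper because it builds the combined frame once instead of copying it on every iteration. The sketch below shows the behavior on two in-memory CSV strings with hypothetical contents standing in for the per-state census files.

```python
import io

import pandas as pd

# Two small in-memory CSVs standing in for per-state census files (hypothetical contents)
csv_texts = [
    'STUSAB,NAME,POP100\nWY,Weston County,6838\n',
    'STUSAB,NAME,POP100\nAK,Chugach Census Area,7102\nAK,Copper River Census Area,2617\n',
]
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# ignore_index=True renumbers rows 0..n-1 instead of repeating each file's own index
combined = pd.concat(frames, ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2]
```

Without `ignore_index=True` the result would carry duplicate labels (0, 0, 1), which would break the `.loc[len(...)]` row insertion used later for DC.
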
In [12]:
print(len(census_df.columns))

332


In [13]:
print(census_df.columns[:100])
print(census_df.columns[100:200])
print(census_df.columns[200:300])
print(census_df.columns[300:])

Index(['FILEID', 'STUSAB', 'SUMLEV', 'GEOVAR', 'GEOCOMP', 'CHARITER',
       'LOGRECNO', 'GEOID', 'GEOCODE', 'REGION', 'DIVISION', 'STATE',
       'STATENS', 'COUNTY', 'COUNTYCC', 'COUNTYNS', 'CBSA', 'MEMI', 'CSA',
       'METDIV', 'AREALAND', 'AREAWATR', 'BASENAME', 'NAME', 'FUNCSTAT',
       'POP100', 'HU100', 'INTPTLAT', 'INTPTLON', 'LSADC', 'GEOID20',
       'P0010001', 'P0010002', 'P0010003', 'P0010004', 'P0010005', 'P0010006',
       'P0010007', 'P0010008', 'P0010009', 'P0010010', 'P0010011', 'P0010012',
       'P0010013', 'P0010014', 'P0010015', 'P0010016', 'P0010017', 'P0010018',
       'P0010019', 'P0010020', 'P0010021', 'P0010022', 'P0010023', 'P0010024',
       'P0010025', 'P0010026', 'P0010027', 'P0010028', 'P0010029', 'P0010030',
       'P0010031', 'P0010032', 'P0010033', 'P0010034', 'P0010035', 'P0010036',
       'P0010037', 'P0010038', 'P0010039', 'P0010040', 'P0010041', 'P0010042',
       'P0010043', 'P0010044', 'P0010045', 'P0010046', 'P0010047', 'P0010048',
       'P0

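Only four of the 332 census columns are used below. Passing `usecols` to `pd.read_csv` inside the loading loop would skip parsing the rest. A minimal sketch on a hypothetical one-row CSV that includes a few of the unused columns:

```python
import io

import pandas as pd

fields = ['STUSAB', 'NAME', 'POP100', 'AREALAND']

# Hypothetical CSV text with a few extra columns that the analysis never uses
csv_text = (
    'FILEID,STUSAB,SUMLEV,NAME,POP100,AREALAND,P0010001\n'
    'PLST,WY,050,Weston County,6838,6210804000,6838\n'
)

# usecols limits parsing to the listed columns; the result keeps the file's column order
census_sample = pd.read_csv(io.StringIO(csv_text), usecols=fields)
print(census_sample.columns.tolist())  # ['STUSAB', 'NAME', 'POP100', 'AREALAND']
```

In the loading cell this would read `pd.read_csv(path, usecols=fields)`, assuming every census file contains all four columns.
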
In [14]:
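fields = ['STUSAB', 'NAME', 'POP100', 'AREALAND']
pop_by_county_df = census_df[fields].copy()
# DC is inserted manually because it wasn't available on redistrictingdatahub.
# AREALAND is in square meters; 177,000,000 m² matches DC's total area (land plus water), while its land-only area is about 158 km².
pop_by_county_df.loc[len(pop_by_county_df.index)] = ['DC', 'District of Columbia', 689545, 177000000]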


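Setting `.loc[len(df.index)]` works here because the concatenated census frame has a 0..n-1 index; with any other index that label could already exist and the row would be overwritten. A sketch of the same DC insertion via `pd.concat`, using a hypothetical one-row frame `pop_sample` in place of `pop_by_county_df` (the DC values are copied from the cell above):

```python
import pandas as pd

# Hypothetical one-row stand-in for pop_by_county_df
pop_sample = pd.DataFrame({
    'STUSAB': ['WY'],
    'NAME': ['Weston County'],
    'POP100': [6838],
    'AREALAND': [6210804000],
})

# Values copied from the notebook's manual DC entry
dc_row = pd.DataFrame([{
    'STUSAB': 'DC',
    'NAME': 'District of Columbia',
    'POP100': 689545,
    'AREALAND': 177000000,
}])

# concat returns a new frame and never overwrites an existing row, whatever the index labels are
pop_sample = pd.concat([pop_sample, dc_row], ignore_index=True)
print(pop_sample.tail(1))
```
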

In [15]:
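# For each census county, count the COVID rows with the same county name (state is not compared here)
match_counts = np.array([np.sum(cleaned_covid_df['county'] == name) for name in pop_by_county_df['NAME']])
print(np.sum(match_counts == 0))

# np.where returns a tuple of arrays; [0] takes the row positions
missing_index = np.where(match_counts == 0)[0]
print('missing counties:')
print(pop_by_county_df.iloc[missing_index])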

2
missing counties:
  STUSAB                      NAME  POP100     AREALAND
5     AK       Chugach Census Area    7102  24682168359
6     AK  Copper River Census Area    2617  63952335592

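The check above compares county names only, so a name that exists in another state would count as matched. A sketch that matches on the (state, county) pair using `merge` with `indicator=True`, on hypothetical miniature versions of the two tables:

```python
import pandas as pd

# Hypothetical miniature versions of the census and COVID tables
census_sample = pd.DataFrame({
    'STUSAB': ['AK', 'AK', 'WY'],
    'NAME': ['Chugach Census Area', 'Copper River Census Area', 'Weston County'],
})
covid_sample = pd.DataFrame({
    'state': ['WY', 'WY', 'AK'],
    'county': ['Weston County', 'Weston County', 'Aleutians East Borough'],
})

# One row per distinct (state, county) pair on the COVID side
covid_keys = covid_sample[['state', 'county']].drop_duplicates()

matched = census_sample.merge(
    covid_keys,
    left_on=['STUSAB', 'NAME'],
    right_on=['state', 'county'],
    how='left',
    indicator=True,  # adds a _merge column with 'both' or 'left_only'
)
missing = matched.loc[matched['_merge'] == 'left_only', ['STUSAB', 'NAME']]
print(missing)
```

Any county listed in `missing` will end up with NaN population and land area in the next step.
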

In [16]:
cleaned_covid_df['county_pop'] = np.nan
cleaned_covid_df['land_area'] = np.nan
# Copy each county's population and land area onto every row with the same (state, county) pair
for row in pop_by_county_df.itertuples(index=False):
    filter_temp = (cleaned_covid_df['state'] == row.STUSAB) & (cleaned_covid_df['county'] == row.NAME)
    cleaned_covid_df.loc[filter_temp, 'county_pop'] = row.POP100
    cleaned_covid_df.loc[filter_temp, 'land_area'] = row.AREALAND


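The loop scans all of the roughly two million COVID rows twice per county. A single left merge on (state, county) attaches the same two columns in one pass and leaves NaN for unmatched counties, as the loop does. This is a sketch of that swapped-in technique on hypothetical sample frames, not the notebook's own method; `validate='many_to_one'` makes pandas raise if a (state, county) pair appears twice in the lookup.

```python
import pandas as pd

# Hypothetical stand-ins for cleaned_covid_df and pop_by_county_df
covid_sample = pd.DataFrame({
    'date': ['2021-12-25', '2021-12-26', '2021-12-26'],
    'state': ['WY', 'WY', 'AK'],
    'county': ['Weston County', 'Weston County', 'Aleutians East Borough'],
    'actuals.newCases': [0.0, 0.0, 1.0],
})
pop_sample = pd.DataFrame({
    'STUSAB': ['WY'],
    'NAME': ['Weston County'],
    'POP100': [6838],
    'AREALAND': [6210804000],
})

# Rename the census columns to the COVID key names and the output column names
lookup = pop_sample.rename(columns={'STUSAB': 'state', 'NAME': 'county',
                                    'POP100': 'county_pop', 'AREALAND': 'land_area'})

# how='left' keeps every COVID row in its original order; unmatched counties get NaN
merged = covid_sample.merge(lookup, on=['state', 'county'], how='left', validate='many_to_one')
print(merged)
```
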

In [17]:
name_map = {
    'actuals.newCases': 'new_cases',
    'actuals.newDeaths': 'new_deaths',
    'actuals.cases': 'total_cases',
    'actuals.deaths': 'total_deaths',
    'actuals.icuBeds.currentUsageCovid': 'icu_beds_used_covid',
    'actuals.icuBeds.currentUsageTotal': 'icu_beds_used_total',
    'actuals.icuBeds.capacity': 'icu_beds_total',
    'actuals.vaccinationsInitiated': 'vax_initiated',
    'actuals.vaccinationsCompleted': 'vax_completed',
}

cleaned_covid_df.rename(columns=name_map, inplace=True)
cleaned_covid_df.tail()




Unnamed: 0,date,country,state,county,locationId,new_cases,new_deaths,total_cases,total_deaths,icu_beds_used_covid,icu_beds_used_total,icu_beds_total,vax_initiated,vax_completed,county_pop,land_area
2079235,2021-12-22,US,WY,Weston County,iso1:us#iso2:us-wy#fips:56045,0.0,0.0,1243.0,14.0,,,,2655.0,2359.0,6838.0,6210804000.0
2079236,2021-12-23,US,WY,Weston County,iso1:us#iso2:us-wy#fips:56045,3.0,0.0,1246.0,14.0,,,,2662.0,2366.0,6838.0,6210804000.0
2079237,2021-12-24,US,WY,Weston County,iso1:us#iso2:us-wy#fips:56045,0.0,0.0,1246.0,14.0,,,,,,6838.0,6210804000.0
2079238,2021-12-25,US,WY,Weston County,iso1:us#iso2:us-wy#fips:56045,0.0,0.0,1246.0,14.0,,,,,,6838.0,6210804000.0
2079239,2021-12-26,US,WY,Weston County,iso1:us#iso2:us-wy#fips:56045,0.0,0.0,1246.0,14.0,,,,,,6838.0,6210804000.0

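Column selection and renaming can also be written as one expression, since `rename` returns a new frame. By default `rename` silently ignores keys that are not present; `errors='raise'` turns a typo in the mapping into a `KeyError`. A sketch with hypothetical sample names (`raw_sample`, `sample_columns`, `sample_map`):

```python
import pandas as pd

# Hypothetical subset of the notebook's use_columns and name_map
sample_columns = ['date', 'actuals.newCases', 'actuals.newDeaths']
sample_map = {'actuals.newCases': 'new_cases', 'actuals.newDeaths': 'new_deaths'}

raw_sample = pd.DataFrame({
    'date': ['2021-12-25'],
    'fips': [56045],
    'actuals.newCases': [0.0],
    'actuals.newDeaths': [0.0],
})

# Select, then rename into a new frame; errors='raise' rejects mapping keys that do not exist
renamed = raw_sample[sample_columns].rename(columns=sample_map, errors='raise')
print(renamed.columns.tolist())  # ['date', 'new_cases', 'new_deaths']
```
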

In [18]:
cleaned_covid_df.to_csv('covid_data_withpop.csv', index=False)
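
One way a downstream analysis might use the two added columns is to normalize counts by population and compute density. This is a sketch, assuming the output columns shown above and that `land_area` keeps the census `AREALAND` unit of square meters; the derived column names are hypothetical, and the sample values are taken from the Weston County rows printed earlier.

```python
import pandas as pd

# One row mirroring the output columns of covid_data_withpop.csv (Weston County, 2021-12-23)
rates = pd.DataFrame({
    'county': ['Weston County'],
    'new_cases': [3.0],
    'county_pop': [6838.0],
    'land_area': [6210804000.0],  # AREALAND, square meters
})

# New cases per 100,000 residents
rates['new_cases_per_100k'] = rates['new_cases'] / rates['county_pop'] * 100_000
# Residents per square kilometer (1 km² = 1,000,000 m²)
rates['pop_density_km2'] = rates['county_pop'] / (rates['land_area'] / 1_000_000)
print(rates.round(2))
```

Rows for counties without census data (NaN `county_pop`) will simply yield NaN rates.
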