# Contents

### 1. Import Libraries
### 2. Import Mask and Stay at Home Data
### 3. Merge Stay at Home and Mask Data
### 4. Prepare Mask and Home Data for Merge
* Derive New Categorical Variables
* Add Variables for Dataset Agreement

### 5. Import COVID Data: Cases Per 100K
### 6. Prepare COVID Data for Merge
* Rename Columns

### 7. Merge Mask/Home with Cases Per 100K
### 8. Export Merged Cases Per 100K and Mask/Home Order Data
### 9. Create Sub-Set for Merge with COVID Case and Death Data
* Derive New Grouped Variables
* Merge Subdata Sets
* Replace Values

### 10. Import COVID Case and Death Data
### 11. Prepare COVID Case and Death Data for Merge
* Drop Columns
* Rename Columns for Merge
* Change Data Format

### 12. Merge COVID Mask/Home/Case Per 100K with Case and Death Data
### 13. Export Merged COVID Mask/Home/Case Per 100K and Case and Death Data

## 1. Import Libraries

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
import scipy

## 2. Import Mask and Stay at Home Data

In [2]:
# Create path to folder
path = r'/Users/caitlin/iCloud/Caitlin/COVID Data/Prepared Data'

In [3]:
path

'/Users/caitlin/iCloud/Caitlin/COVID Data/Prepared Data'

#### Stay at home order data

In [6]:
# Import stay at home data file
COVID_home = pd.read_pickle(os.path.join(path, 'COVID_home_clean.pkl'))

In [7]:
COVID_home.head(2)

| | state | County_Name | FIPS_State | FIPS_County | date | order_code_home | Stay_at_Home_Order_Recommendation |
|---|---|---|---|---|---|---|---|
| 26 | AL | Autauga County | 1 | 1 | 2020-04-10 | 1 | Mandatory for all individuals |
| 27 | AL | Autauga County | 1 | 1 | 2020-04-11 | 1 | Mandatory for all individuals |


In [8]:
COVID_home.shape

(779216, 7)

#### Mask mandate data

In [9]:
# Import mask mandate data file
COVID_mask = pd.read_pickle(os.path.join(path, 'COVID_mask_clean.pkl'))

In [10]:
COVID_mask.head(2)

| | state | County_Name | FIPS_State | FIPS_County | date | order_code_mask | Face_Masks_Required_in_Public |
|---|---|---|---|---|---|---|---|
| 0 | AL | Autauga County | 1 | 1 | 2020-04-10 | 2 | No |
| 1 | AL | Autauga County | 1 | 1 | 2020-04-11 | 2 | No |


In [11]:
COVID_mask.shape

(779216, 7)

## 3. Merge Stay at Home and Mask Data

In [12]:
# Merge dataframes
COVID_mask_home = COVID_home.merge(COVID_mask, on = ['state', 'County_Name', 'FIPS_State', 'FIPS_County', 'date'])

In [13]:
COVID_mask_home.head(5)

| | state | County_Name | FIPS_State | FIPS_County | date | order_code_home | Stay_at_Home_Order_Recommendation | order_code_mask | Face_Masks_Required_in_Public |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Autauga County | 1 | 1 | 2020-04-10 | 1 | Mandatory for all individuals | 2 | No |
| 1 | AL | Autauga County | 1 | 1 | 2020-04-11 | 1 | Mandatory for all individuals | 2 | No |
| 2 | AL | Autauga County | 1 | 1 | 2020-04-12 | 1 | Mandatory for all individuals | 2 | No |
| 3 | AL | Autauga County | 1 | 1 | 2020-04-13 | 1 | Mandatory for all individuals | 2 | No |
| 4 | AL | Autauga County | 1 | 1 | 2020-04-14 | 1 | Mandatory for all individuals | 2 | No |


In [14]:
COVID_mask_home.shape

(779216, 9)
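The merged frame has the same number of rows (779216) as each input, which is consistent with every county-date matching exactly once. A minimal sketch, on small made-up frames rather than the real data, of how the `validate` argument of `DataFrame.merge` can make that expectation explicit: with `validate="one_to_one"`, pandas raises `pandas.errors.MergeError` if a key combination repeats on either side.

```python
import pandas as pd

# Toy stand-ins for COVID_home and COVID_mask (hypothetical values)
home = pd.DataFrame({
    "state": ["AL", "AL"],
    "County_Name": ["Autauga County", "Autauga County"],
    "date": pd.to_datetime(["2020-04-10", "2020-04-11"]),
    "order_code_home": [1, 1],
})
mask = pd.DataFrame({
    "state": ["AL", "AL"],
    "County_Name": ["Autauga County", "Autauga County"],
    "date": pd.to_datetime(["2020-04-10", "2020-04-11"]),
    "order_code_mask": [2, 2],
})

keys = ["state", "County_Name", "date"]

# Passes: each key combination appears once on each side
merged = home.merge(mask, on=keys, validate="one_to_one")
print(merged.shape)  # (2, 5)

# Duplicating one key row on the right side makes the check fail
mask_dup = pd.concat([mask, mask.iloc[[0]]], ignore_index=True)
duplicate_detected = False
try:
    home.merge(mask_dup, on=keys, validate="one_to_one")
except pd.errors.MergeError as err:
    duplicate_detected = True
    print("MergeError:", err)
```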

In [15]:
COVID_mask_home.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 779216 entries, 0 to 779215
Data columns (total 9 columns):
 #   Column                             Non-Null Count   Dtype         
---  ------                             --------------   -----         
 0   state                              779216 non-null  object        
 1   County_Name                        779216 non-null  object        
 2   FIPS_State                         779216 non-null  int8          
 3   FIPS_County                        779216 non-null  int16         
 4   date                               779216 non-null  datetime64[ns]
 5   order_code_home                    779216 non-null  int8          
 6   Stay_at_Home_Order_Recommendation  779216 non-null  object        
 7   order_code_mask                    779216 non-null  int8          
 8   Face_Masks_Required_in_Public      779216 non-null  object        
dtypes: datetime64[ns](1), int16(1), int8(3), object(4)
memory usage: 39.4+ MB


## 4. Prepare Mask and Home Data for Merge

### Derive New Categorical Variables

A Yes/No/Recommended stay-at-home variable will be analyzed later, so it is derived here from the order codes.

#### Stay at Home Orders

In [16]:
# Group order code and stay at home recommendation to see how many of each
COVID_mask_home.groupby(['order_code_home'])['Stay_at_Home_Order_Recommendation'].value_counts()

order_code_home  Stay_at_Home_Order_Recommendation                                          
1                Mandatory for all individuals                                                   61325
2                Mandatory only for all individuals in certain areas of the jurisdiction          3153
3                Mandatory only for at-risk individuals in the jurisdiction                      55478
5                Mandatory only for at-risk individuals in certain areas of the jurisdiction        30
6                Advisory/Recommendation                                                        405566
7                No order for individuals to stay home                                          253664
Name: Stay_at_Home_Order_Recommendation, dtype: int64

In [17]:
# Derive home_order: 'No' for code 7, 'Yes' for codes 1-5, 'Recommended' for code 6
COVID_mask_home.loc[COVID_mask_home['order_code_home'] == 7, 'home_order'] = "No"
COVID_mask_home.loc[COVID_mask_home['order_code_home'] <= 5, 'home_order'] = "Yes"
COVID_mask_home.loc[COVID_mask_home['order_code_home'] == 6, 'home_order'] = "Recommended"

In [18]:
# Check the new variable: 'No' should equal the code 7 count above (253664) and 'Yes' the sum of codes 1, 2, 3 and 5 (119986)
COVID_mask_home['home_order'].value_counts()

Recommended    405566
No             253664
Yes            119986
Name: home_order, dtype: int64
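The three `.loc` assignments above can also be written as one `numpy.select` call. A sketch on a small hypothetical Series of order codes, applying the same three rules; unlike the original cells, which leave unmatched codes as NaN, this version labels any code matching none of the conditions (for example, a code above 7) as "Unknown", an illustrative choice that keeps the result a plain string array.

```python
import numpy as np
import pandas as pd

# Hypothetical order codes covering every value seen in the data
codes = pd.Series([1, 2, 3, 5, 6, 7])

# Same rules as the .loc cells: 7 -> No, <= 5 -> Yes, 6 -> Recommended
conditions = [codes == 7, codes <= 5, codes == 6]
choices = ["No", "Yes", "Recommended"]

# Codes that match no condition get "Unknown" instead of NaN (assumption for this sketch)
home_order = pd.Series(np.select(conditions, choices, default="Unknown"), index=codes.index)
print(home_order.tolist())
```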

#### Mask Mandates

In [19]:
# Group order code and mask recommendation to see how many of each
COVID_mask_home.groupby(['order_code_mask'])['Face_Masks_Required_in_Public'].value_counts()

order_code_mask  Face_Masks_Required_in_Public
1                Yes                              370675
2                No                               408541
Name: Face_Masks_Required_in_Public, dtype: int64

In [20]:
# Show all rows
pd.options.display.max_rows = None

In [21]:
# Check new values
COVID_mask_home.head(3)

| | state | County_Name | FIPS_State | FIPS_County | date | order_code_home | Stay_at_Home_Order_Recommendation | order_code_mask | Face_Masks_Required_in_Public | home_order |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Autauga County | 1 | 1 | 2020-04-10 | 1 | Mandatory for all individuals | 2 | No | Yes |
| 1 | AL | Autauga County | 1 | 1 | 2020-04-11 | 1 | Mandatory for all individuals | 2 | No | Yes |
| 2 | AL | Autauga County | 1 | 1 | 2020-04-12 | 1 | Mandatory for all individuals | 2 | No | Yes |


### Add Variables for Dataset Agreement

The COVID case dataset identifies states by full name rather than abbreviation, so a column of full state names is needed for the merge.

In [22]:
# Rename current state abbreviation state column
COVID_mask_home = COVID_mask_home.rename(columns = {'state':'state_abbreviated'})

In [23]:
# Map state abbreviations to full state names
us_state_abbrev = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
        'IN': 'Indiana',
        'KS': 'Kansas',
        'KY': 'Kentucky',
        'LA': 'Louisiana',
        'MA': 'Massachusetts',
        'MD': 'Maryland',
        'ME': 'Maine',
        'MI': 'Michigan',
        'MN': 'Minnesota',
        'MO': 'Missouri',
        'MS': 'Mississippi',
        'MT': 'Montana',
        'NA': 'National',
        'NC': 'North Carolina',
        'ND': 'North Dakota',
        'NE': 'Nebraska',
        'NH': 'New Hampshire',
        'NJ': 'New Jersey',
        'NM': 'New Mexico',
        'NV': 'Nevada',
        'NY': 'New York',
        'OH': 'Ohio',
        'OK': 'Oklahoma',
        'OR': 'Oregon',
        'PA': 'Pennsylvania',
        'RI': 'Rhode Island',
        'SC': 'South Carolina',
        'SD': 'South Dakota',
        'TN': 'Tennessee',
        'TX': 'Texas',
        'UT': 'Utah',
        'VA': 'Virginia',
        'VT': 'Vermont',
        'WA': 'Washington',
        'WI': 'Wisconsin',
        'WV': 'West Virginia',
        'WY': 'Wyoming'}

In [24]:
# Create new column with full state names
COVID_mask_home['state'] = COVID_mask_home['state_abbreviated'].map(us_state_abbrev).fillna(COVID_mask_home['state_abbreviated'])
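Because of the `.fillna(...)` fallback, any abbreviation missing from `us_state_abbrev` is silently kept as-is in `state` and would later fail to match the full names in the case data. A minimal check, sketched on a toy Series with a trimmed-down mapping (`abbrev_map_sample`, a hypothetical name chosen so it does not overwrite the notebook's dictionary) and a hypothetical unmapped code, "PR":

```python
import pandas as pd

# Trimmed-down mapping for illustration; the notebook's us_state_abbrev is complete
abbrev_map_sample = {"AL": "Alabama", "AK": "Alaska", "DC": "District of Columbia"}

# "PR" is a hypothetical abbreviation that is not in the mapping
abbrevs = pd.Series(["AL", "AK", "PR", "DC"])

# Same pattern as the cell above: map, then fall back to the original value
full_names = abbrevs.map(abbrev_map_sample).fillna(abbrevs)

# Abbreviations that fell through to the fillna fallback
unmapped = sorted(set(abbrevs) - set(abbrev_map_sample))
print(full_names.tolist())  # ['Alabama', 'Alaska', 'PR', 'District of Columbia']
print(unmapped)             # ['PR']
```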

In [26]:
# Check addition
COVID_mask_home.head(5)

| | state_abbreviated | County_Name | FIPS_State | FIPS_County | date | order_code_home | Stay_at_Home_Order_Recommendation | order_code_mask | Face_Masks_Required_in_Public | home_order | state |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Autauga County | 1 | 1 | 2020-04-10 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama |
| 1 | AL | Autauga County | 1 | 1 | 2020-04-11 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama |
| 2 | AL | Autauga County | 1 | 1 | 2020-04-12 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama |
| 3 | AL | Autauga County | 1 | 1 | 2020-04-13 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama |
| 4 | AL | Autauga County | 1 | 1 | 2020-04-14 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama |


In [27]:
# Check value counts to verify the new column (dropna=False also shows any missing values)
COVID_mask_home['state'].value_counts(dropna = False)

Texas                   62992
Georgia                 39432
Virginia                32984
Kentucky                29760
Missouri                28520
Kansas                  26040
Illinois                25296
North Carolina          24800
Iowa                    24552
Tennessee               23560
Nebraska                23064
Indiana                 22816
Ohio                    21824
Minnesota               21576
Michigan                20584
Mississippi             20336
Oklahoma                19096
Arkansas                18600
Wisconsin               17856
Pennsylvania            16616
Alabama                 16616
Florida                 16616
South Dakota            16368
Louisiana               15872
Colorado                15872
New York                15376
California              14384
Montana                 13888
West Virginia           13640
North Dakota            13144
South Carolina          11408
Idaho                   10912
Washington               9672
Oregon    

## 5. Import COVID Data: Cases Per 100K

In [57]:
# Import COVID cases per 100K data file
COVID_cases = pd.read_csv(os.path.join(path, 'COVID_cases_per_100K.csv'))

In [58]:
COVID_cases.head(5)

| | Unnamed: 0 | state_name | county_name | fips_code | date | cases_per_100K | community_transmission_level |
|---|---|---|---|---|---|---|---|
| 0 | 8 | Kansas | Riley County | 20161 | 2020-07-04 | 119.894 | high |
| 1 | 9 | Indiana | Posey County | 18129 | 2020-07-04 | 39.328 | moderate |
| 2 | 10 | Georgia | Hart County | 13147 | 2020-07-04 | 95.402 | high |
| 3 | 11 | New Jersey | Ocean County | 34029 | 2020-07-04 | 24.704 | moderate |
| 4 | 12 | Florida | Lee County | 12071 | 2020-07-04 | 293.157 | high |


In [59]:
# Drop the leftover 'Unnamed: 0' index column saved in the CSV
COVID_cases = COVID_cases.drop(columns = ['Unnamed: 0'])

In [60]:
COVID_cases.head(3)

| | state_name | county_name | fips_code | date | cases_per_100K | community_transmission_level |
|---|---|---|---|---|---|---|
| 0 | Kansas | Riley County | 20161 | 2020-07-04 | 119.894 | high |
| 1 | Indiana | Posey County | 18129 | 2020-07-04 | 39.328 | moderate |
| 2 | Georgia | Hart County | 13147 | 2020-07-04 | 95.402 | high |


In [61]:
COVID_cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 799056 entries, 0 to 799055
Data columns (total 6 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   state_name                    799056 non-null  object 
 1   county_name                   799056 non-null  object 
 2   fips_code                     799056 non-null  int64  
 3   date                          799056 non-null  object 
 4   cases_per_100K                579930 non-null  float64
 5   community_transmission_level  799056 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 36.6+ MB


In [62]:
COVID_cases.shape

(799056, 6)

In [63]:
# Convert date to datetime so it matches the datetime64 date column in COVID_mask_home (required for the merge on date)
COVID_cases['date'] = pd.to_datetime(COVID_cases['date'])

In [64]:
COVID_cases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 799056 entries, 0 to 799055
Data columns (total 6 columns):
 #   Column                        Non-Null Count   Dtype         
---  ------                        --------------   -----         
 0   state_name                    799056 non-null  object        
 1   county_name                   799056 non-null  object        
 2   fips_code                     799056 non-null  int64         
 3   date                          799056 non-null  datetime64[ns]
 4   cases_per_100K                579930 non-null  float64       
 5   community_transmission_level  799056 non-null  object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 36.6+ MB
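Converting `date` matters for the merge in section 7: `COVID_mask_home['date']` is already `datetime64[ns]`, and pandas refuses to merge a datetime key with a string (object) key rather than guessing how to compare them. A small sketch with toy frames (values invented):

```python
import pandas as pd

left = pd.DataFrame({"date": pd.to_datetime(["2020-07-04"]), "state": ["Kansas"], "x": [1]})
right = pd.DataFrame({"date": ["2020-07-04"], "state": ["Kansas"], "y": [2]})  # date stored as text

# datetime64 key vs. text key: pandas raises ValueError
mismatch_rejected = False
try:
    left.merge(right, on=["date", "state"])
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)

# After conversion the key dtypes match and the rows join
right["date"] = pd.to_datetime(right["date"])
merged = left.merge(right, on=["date", "state"])
print(len(merged))  # 1
```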


## 6. Prepare COVID Data for Merge

### Rename Columns

In [65]:
# Rename Columns
COVID_cases_2 = COVID_cases.rename(columns = {'state_name':'state','county_name':'County_Name'})

In [66]:
COVID_cases_2.head(3)

| | state | County_Name | fips_code | date | cases_per_100K | community_transmission_level |
|---|---|---|---|---|---|---|
| 0 | Kansas | Riley County | 20161 | 2020-07-04 | 119.894 | high |
| 1 | Indiana | Posey County | 18129 | 2020-07-04 | 39.328 | moderate |
| 2 | Georgia | Hart County | 13147 | 2020-07-04 | 95.402 | high |
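The merge in section 7 joins on `state`, `County_Name` and `date`, so any difference in county spelling between the two sources drops rows. Both datasets also carry FIPS codes, and the merged Autauga County rows (`FIPS_State` 1, `FIPS_County` 1, `fips_code` 1001) are consistent with the standard five-digit county FIPS being `FIPS_State * 1000 + FIPS_County`. A sketch of building that key as an alternative join key (an option, not what the notebook does), on toy rows with a hypothetical spelling mismatch:

```python
import pandas as pd

mask_home = pd.DataFrame({
    "FIPS_State": [1, 20],
    "FIPS_County": [1, 161],
    "County_Name": ["Autauga County", "Riley County"],
    "date": pd.to_datetime(["2020-07-04", "2020-07-04"]),
})
cases = pd.DataFrame({
    "fips_code": [1001, 20161],
    "county_name": ["Autauga", "Riley County"],  # hypothetical spelling mismatch
    "date": pd.to_datetime(["2020-07-04", "2020-07-04"]),
    "cases_per_100K": [50.0, 119.894],            # invented values
})

# Cast first: FIPS_State is int8 in the real data, too small to hold state * 1000
mask_home["fips_code"] = (
    mask_home["FIPS_State"].astype("int64") * 1000 + mask_home["FIPS_County"].astype("int64")
)

# Joining on the numeric code is unaffected by the county-name mismatch
merged = mask_home.merge(cases, on=["fips_code", "date"])
print(merged[["fips_code", "County_Name", "cases_per_100K"]])
```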


## 7. Merge Mask/Home with Cases Per 100K

In [67]:
# Merge dataframes
COVID_mask_home_100K = COVID_mask_home.merge(COVID_cases_2, on = ['state', 'County_Name','date'])

In [68]:
COVID_mask_home_100K.head(3)

| | state_abbreviated | County_Name | FIPS_State | FIPS_County | date | order_code_home | Stay_at_Home_Order_Recommendation | order_code_mask | Face_Masks_Required_in_Public | home_order | state | fips_code | cases_per_100K | community_transmission_level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL | Autauga County | 1 | 1 | 2020-04-10 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama | 1001 | NaN | moderate |
| 1 | AL | Autauga County | 1 | 1 | 2020-04-11 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama | 1001 | NaN | moderate |
| 2 | AL | Autauga County | 1 | 1 | 2020-04-12 | 1 | Mandatory for all individuals | 2 | No | Yes | Alabama | 1001 | NaN | moderate |


In [69]:
COVID_mask_home_100K.shape

(769296, 14)
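The inner merge returns 769296 rows against 779216 in `COVID_mask_home`, a net loss of 9920 rows, so at least that many county-date rows found no partner in the case data. A sketch of how `merge(..., how="left", indicator=True)` can list such rows before deciding whether to drop them; the frames are toy data and "Hypothetical Borough" is invented:

```python
import pandas as pd

left = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska"],
    "County_Name": ["Autauga County", "Baldwin County", "Hypothetical Borough"],
    "date": pd.to_datetime(["2020-07-04"] * 3),
})
right = pd.DataFrame({
    "state": ["Alabama", "Alabama"],
    "County_Name": ["Autauga County", "Baldwin County"],
    "date": pd.to_datetime(["2020-07-04"] * 2),
    "cases_per_100K": [50.0, 40.0],
})

keys = ["state", "County_Name", "date"]
check = left.merge(right, on=keys, how="left", indicator=True)

# "_merge" is "both" for matched rows and "left_only" for rows an inner merge would drop
unmatched = check.loc[check["_merge"] == "left_only", keys]
print(check["_merge"].value_counts())
print(unmatched)
```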

## 8. Export Merged Cases Per 100K and Mask/Home Order Data

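The county-level data may be useful on its own, so it is exported here. Merging with the case and death data, however, requires aggregating to the state level, with state and date as the keys, so two separate datasets are kept: this county-level file and the state-level file built in the following sections.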

In [70]:
# Export the merged county-level data
COVID_mask_home_100K.to_csv(os.path.join(path, 'COVID_state_mask_100K.csv'))
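By default `to_csv` writes the DataFrame index as an unnamed first column, and reading the file back turns it into an `Unnamed: 0` column; that is likely where the `Unnamed: ...` columns dropped in sections 5 and 11 come from. A sketch of the round trip, using an in-memory buffer instead of a file, showing that `index=False` avoids the extra column:

```python
import io

import pandas as pd

df = pd.DataFrame({"state": ["Alabama", "Alaska"], "avg_per_100K": [49.177, 2.027]})

# Default: the index is written and comes back as 'Unnamed: 0'
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'state', 'avg_per_100K']

# index=False: only the real columns are written
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(without_index.columns.tolist())  # ['state', 'avg_per_100K']
```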

In [71]:
# Re-check stay-at-home order counts after the merge (compare with the counts in section 4)
COVID_mask_home_100K.groupby(['order_code_home'])['Stay_at_Home_Order_Recommendation'].value_counts()

order_code_home  Stay_at_Home_Order_Recommendation                                          
1                Mandatory for all individuals                                                   59938
2                Mandatory only for all individuals in certain areas of the jurisdiction          3069
3                Mandatory only for at-risk individuals in the jurisdiction                      55478
5                Mandatory only for at-risk individuals in certain areas of the jurisdiction        30
6                Advisory/Recommendation                                                        405523
7                No order for individuals to stay home                                          245258
Name: Stay_at_Home_Order_Recommendation, dtype: int64

## 9. Create Sub-Set for Merge with COVID Case and Death Data

The datasets above are at the county level, while the case and death data are at the state level, so the county rows are aggregated by date and state before merging.

### Derive New Grouped Variables

In [72]:
# Variable creation: count of counties with a stay-at-home order ('Yes')
COVID_home_by_state_yes = COVID_mask_home_100K.groupby(['date','state'])['home_order'].apply(lambda x: (x=='Yes').sum()).reset_index(name='home_yes_count')

In [73]:
# Variable creation: count of counties with no stay-at-home order ('No')
COVID_home_by_state_no = COVID_mask_home_100K.groupby(['date','state'])['home_order'].apply(lambda x: (x=='No').sum()).reset_index(name='home_no_count')

In [74]:
# Variable creation: count of counties where staying home was recommended ('Recommended')
COVID_home_by_state_recommend = COVID_mask_home_100K.groupby(['date','state'])['home_order'].apply(lambda x: (x=='Recommended').sum()).reset_index(name='home_recommend_count')

In [75]:
# Variable creation: Count of yes mask orders
COVID_mask_by_state_yes = COVID_mask_home_100K.groupby(['date','state'])['Face_Masks_Required_in_Public'].apply(lambda x: (x=='Yes').sum()).reset_index(name='yes_mask_count')

In [76]:
# Variable creation: Count of no mask orders
COVID_mask_by_state_no = COVID_mask_home_100K.groupby(['date','state'])['Face_Masks_Required_in_Public'].apply(lambda x: (x=='No').sum()).reset_index(name='no_mask_count')

In [77]:
# Variable creation: mean of county cases per 100K (unweighted by county population)
COVID_home_by_state_avgcases100K = COVID_mask_home_100K.groupby(['date','state'])['cases_per_100K'].mean().reset_index(name='avg_per_100K')

### Merge Subdata Sets

In [78]:
# Merge dataframes
COVID_home_by_state_total = COVID_home_by_state_yes.merge(COVID_home_by_state_no, on = ['date', 'state']).merge(COVID_home_by_state_recommend, on = ['date', 'state']).merge(COVID_mask_by_state_yes, on = ['date', 'state']).merge(COVID_mask_by_state_no, on = ['date', 'state']).merge(COVID_home_by_state_avgcases100K, on = ['date', 'state'])

In [79]:
# Check merge
COVID_home_by_state_total.head(5)

| | date | state | home_yes_count | home_no_count | home_recommend_count | yes_mask_count | no_mask_count | avg_per_100K |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-10 | Alabama | 67 | 0 | 0 | 0 | 67 | 49.177 |
| 1 | 2020-04-10 | Alaska | 29 | 0 | 0 | 0 | 29 | 2.027 |
| 2 | 2020-04-10 | Arizona | 15 | 0 | 0 | 0 | 15 | 31.3668 |
| 3 | 2020-04-10 | Arkansas | 0 | 75 | 0 | 0 | 75 | 23.702227 |
| 4 | 2020-04-10 | California | 58 | 0 | 0 | 0 | 58 | 11.686698 |
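The six grouped frames and the chained merge above can be collapsed into a single `groupby` with named aggregation. A sketch on toy county rows; it builds boolean indicator columns first (a choice made here, not in the original cells) so that every count is a plain `sum`, and keeps the same output column names:

```python
import pandas as pd

counties = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-10"] * 3),
    "state": ["Alabama", "Alabama", "Arkansas"],
    "home_order": ["Yes", "Recommended", "No"],
    "Face_Masks_Required_in_Public": ["No", "Yes", "No"],
    "cases_per_100K": [40.0, 60.0, 23.7],
})

# One True/False column per category; summing them counts matching counties
flags = counties.assign(
    home_yes=counties["home_order"].eq("Yes"),
    home_no=counties["home_order"].eq("No"),
    home_recommend=counties["home_order"].eq("Recommended"),
    mask_yes=counties["Face_Masks_Required_in_Public"].eq("Yes"),
    mask_no=counties["Face_Masks_Required_in_Public"].eq("No"),
)

by_state = flags.groupby(["date", "state"]).agg(
    home_yes_count=("home_yes", "sum"),
    home_no_count=("home_no", "sum"),
    home_recommend_count=("home_recommend", "sum"),
    yes_mask_count=("mask_yes", "sum"),
    no_mask_count=("mask_no", "sum"),
    avg_per_100K=("cases_per_100K", "mean"),  # unweighted mean of county rates
).reset_index()
print(by_state)
```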


### Replace Values

Counts are central to this analysis, and pandas treats 0 as a present value when counting non-missing entries (for example with `count()`), so every 0 in the categorical count columns is replaced with NaN.
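A quick illustration of that reasoning on a made-up Series: `count()` counts every non-missing entry, so zeros inflate it until they are replaced with NaN. A side effect is that the columns become float (67 becomes 67.0 in the output of In [85]), because NaN is a float value.

```python
import numpy as np
import pandas as pd

home_yes_count = pd.Series([67, 0, 0, 29])

print(home_yes_count.count())                     # 4: zeros are counted
print(home_yes_count.replace(0, np.nan).count())  # 2: only states with a 'Yes' county
```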

In [80]:
# Replace 0 with NaN in the categorical count columns
count_cols = ['home_yes_count', 'home_no_count', 'home_recommend_count', 'yes_mask_count', 'no_mask_count']
COVID_home_by_state_total[count_cols] = COVID_home_by_state_total[count_cols].replace(0, np.nan)

In [85]:
COVID_home_by_state_total.head(20)

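| | date | state | home_yes_count | home_no_count | home_recommend_count | yes_mask_count | no_mask_count | avg_per_100K |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-10 | Alabama | 67.0 | NaN | NaN | NaN | 67.0 | 49.177 |
| 1 | 2020-04-10 | Alaska | 29.0 | NaN | NaN | NaN | 29.0 | 2.027 |
| 2 | 2020-04-10 | Arizona | 15.0 | NaN | NaN | NaN | 15.0 | 31.3668 |
| 3 | 2020-04-10 | Arkansas | NaN | 75.0 | NaN | NaN | 75.0 | 23.702227 |
| 4 | 2020-04-10 | California | 58.0 | NaN | NaN | NaN | 58.0 | 11.686698 |
| 5 | 2020-04-10 | Colorado | 64.0 | NaN | NaN | NaN | 64.0 | 29.503756 |
| 6 | 2020-04-10 | Connecticut | NaN | 8.0 | NaN | NaN | 8.0 | 105.99475 |
| 7 | 2020-04-10 | Delaware | 3.0 | NaN | NaN | NaN | 3.0 | 88.897333 |
| 8 | 2020-04-10 | District of Columbia | 1.0 | NaN | NaN | NaN | 1.0 | 127.949 |
| 9 | 2020-04-10 | Florida | 67.0 | NaN | NaN | NaN | 67.0 | 25.399681 |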


## 10. Import COVID Case and Death Data

In [98]:
# Import state-level COVID case and death data file
COVID_cases_deaths = pd.read_csv(os.path.join(path, 'COVID_final_updated_2.csv'))

In [99]:
COVID_cases_deaths.head(10)

| | Unnamed: 0 | Unnamed: 0.1 | date | state_abbreviated | Total Cases | New Cases | Total Deaths | New Deaths | State |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 5 | 2020-06-10 | VT | 1009 | 10 | 54 | 0 | Vermont |
| 1 | 1 | 6 | 2020-04-28 | MI | 51401 | 1218 | 4310 | 105 | Michigan |
| 2 | 2 | 7 | 2020-05-03 | NH | 2518 | 89 | 86 | 2 | New Hampshire |
| 3 | 3 | 8 | 2020-07-31 | ND | 6602 | 133 | 103 | 0 | North Dakota |
| 4 | 4 | 10 | 2020-09-24 | AL | 148606 | 2188 | 3894 | 13 | Alabama |
| 5 | 5 | 11 | 2020-05-16 | MI | 81494 | 702 | 5444 | 31 | Michigan |
| 6 | 6 | 15 | 2020-05-15 | CT | 36085 | 621 | 3285 | 66 | Connecticut |
| 7 | 7 | 17 | 2020-10-23 | NE | 62931 | 1303 | 591 | 4 | Nebraska |
| 8 | 8 | 24 | 2020-11-05 | MO | 226051 | 5996 | 3700 | 44 | Missouri |
| 9 | 9 | 28 | 2020-12-08 | MS | 173907 | 2476 | 4705 | 42 | Mississippi |


## 11. Prepare COVID Case and Death Data for Merge

### Drop Columns

In [100]:
# Drop leftover index columns from earlier CSV exports
COVID_cases_deaths_2 = COVID_cases_deaths.drop(columns = ['Unnamed: 0','Unnamed: 0.1'],)

In [101]:
COVID_cases_deaths_2.head(3)

| | date | state_abbreviated | Total Cases | New Cases | Total Deaths | New Deaths | State |
|---|---|---|---|---|---|---|---|
| 0 | 2020-06-10 | VT | 1009 | 10 | 54 | 0 | Vermont |
| 1 | 2020-04-28 | MI | 51401 | 1218 | 4310 | 105 | Michigan |
| 2 | 2020-05-03 | NH | 2518 | 89 | 86 | 2 | New Hampshire |
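Listing the `Unnamed` columns by hand works, but their number grows each time a file is re-saved with its index (here there are two). A sketch that drops every column whose name starts with `Unnamed`, shown on a toy frame:

```python
import pandas as pd

raw = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Unnamed: 0.1": [5, 6],
    "date": ["2020-06-10", "2020-04-28"],
    "state_abbreviated": ["VT", "MI"],
})

# Keep only columns whose names do not start with "Unnamed"
cleaned = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(cleaned.columns.tolist())  # ['date', 'state_abbreviated']
```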


### Rename Columns for Merge

In [102]:
# Make a working copy; the abbreviation column is already named state_abbreviated, so no rename is needed
COVID_cases_deaths_3 = COVID_cases_deaths_2.copy()

In [104]:
# Create new column with full state names
COVID_cases_deaths_3['state'] = COVID_cases_deaths_3['state_abbreviated'].map(us_state_abbrev).fillna(COVID_cases_deaths_3['state_abbreviated'])

In [105]:
COVID_cases_deaths_3.head(5)

| | date | state_abbreviated | Total Cases | New Cases | Total Deaths | New Deaths | State | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-06-10 | VT | 1009 | 10 | 54 | 0 | Vermont | Vermont |
| 1 | 2020-04-28 | MI | 51401 | 1218 | 4310 | 105 | Michigan | Michigan |
| 2 | 2020-05-03 | NH | 2518 | 89 | 86 | 2 | New Hampshire | New Hampshire |
| 3 | 2020-07-31 | ND | 6602 | 133 | 103 | 0 | North Dakota | North Dakota |
| 4 | 2020-09-24 | AL | 148606 | 2188 | 3894 | 13 | Alabama | Alabama |


In [106]:
COVID_cases_deaths_3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13939 entries, 0 to 13938
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   date               13939 non-null  object
 1   state_abbreviated  13939 non-null  object
 2   Total Cases        13939 non-null  int64 
 3   New Cases          13939 non-null  int64 
 4   Total Deaths       13939 non-null  int64 
 5   New Deaths         13939 non-null  int64 
 6   State              13939 non-null  object
 7   state              13939 non-null  object
dtypes: int64(4), object(4)
memory usage: 871.3+ KB


### Change Data Format

In [107]:
# Convert date to datetime so it matches the datetime64 date column in COVID_home_by_state_total (required for the merge on date)
COVID_cases_deaths_3['date'] = pd.to_datetime(COVID_cases_deaths_3['date'])

## 12. Merge COVID Mask/Home/Case Per 100K with Case and Death Data

In [108]:
# Merge dataframes
COVID_by_state_final_merged = COVID_home_by_state_total.merge(COVID_cases_deaths_3, on = ['date', 'state'])

In [109]:
COVID_by_state_final_merged.head(5)

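| | date | state | home_yes_count | home_no_count | home_recommend_count | yes_mask_count | no_mask_count | avg_per_100K | state_abbreviated | Total Cases | New Cases | Total Deaths | New Deaths | State |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-10 | Alabama | 67.0 | NaN | NaN | NaN | 67.0 | 49.177 | AL | 3103 | 158 | 141 | 12 | Alabama |
| 1 | 2020-04-10 | Alaska | 29.0 | NaN | NaN | NaN | 29.0 | 2.027 | AK | 246 | 11 | 9 | 0 | Alaska |
| 2 | 2020-04-10 | Arizona | 15.0 | NaN | NaN | NaN | 15.0 | 31.3668 | AZ | 3112 | 94 | 97 | 8 | Arizona |
| 3 | 2020-04-10 | Arkansas | NaN | 75.0 | NaN | NaN | 75.0 | 23.702227 | AR | 1202 | 75 | 24 | 3 | Arkansas |
| 4 | 2020-04-10 | California | 58.0 | NaN | NaN | NaN | 58.0 | 11.686698 | CA | 19472 | 1163 | 541 | 49 | California |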


In [110]:
# Drop unnecessary columns
COVID_by_state_final_merged = COVID_by_state_final_merged.drop(columns = ['state_abbreviated'],)

In [111]:
COVID_by_state_final_merged.head(5)

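| | date | state | home_yes_count | home_no_count | home_recommend_count | yes_mask_count | no_mask_count | avg_per_100K | Total Cases | New Cases | Total Deaths | New Deaths | State |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-04-10 | Alabama | 67.0 | NaN | NaN | NaN | 67.0 | 49.177 | 3103 | 158 | 141 | 12 | Alabama |
| 1 | 2020-04-10 | Alaska | 29.0 | NaN | NaN | NaN | 29.0 | 2.027 | 246 | 11 | 9 | 0 | Alaska |
| 2 | 2020-04-10 | Arizona | 15.0 | NaN | NaN | NaN | 15.0 | 31.3668 | 3112 | 94 | 97 | 8 | Arizona |
| 3 | 2020-04-10 | Arkansas | NaN | 75.0 | NaN | NaN | 75.0 | 23.702227 | 1202 | 75 | 24 | 3 | Arkansas |
| 4 | 2020-04-10 | California | 58.0 | NaN | NaN | NaN | 58.0 | 11.686698 | 19472 | 1163 | 541 | 49 | California |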
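The final frame still carries both `State` (from the source file) and `state` (built from the abbreviation mapping), and in the rows shown they hold the same names. A sketch of confirming the two agree on every row before dropping the redundant one, on toy rows:

```python
import pandas as pd

final = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "State": ["Alabama", "Alaska"],
    "New Cases": [158, 11],
})

# Drop the source column only if it matches the derived one on every row
if final["State"].equals(final["state"]):
    final = final.drop(columns=["State"])
print(final.columns.tolist())  # ['state', 'New Cases']
```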


## 13. Export Merged COVID Mask/Home/Case Per 100K and Case and Death Data

In [112]:
# Export the final state-level dataset
COVID_by_state_final_merged.to_csv(os.path.join(path, 'COVID_death_case_mandate_by_state.csv'))