## Minnesota State COVID Response Analysis
This notebook looks for associations between the Minnesota state government's response and the state's COVID-19 case counts over the course of the pandemic.


## Data Cleanup
As with most data mining projects, we first need to clean up the data file so that it serves the goal at hand. The "all-states-history.csv" file, sourced from [The Covid Tracking Project](https://covidtracking.com/data), contains daily per-state U.S. COVID-19 data (cases, deaths, hospitalizations and test results) from the start of the pandemic through 11/29/20. We analyze three periods within this timeline:

- Early Breakout (Early March -> May)
- Summer (June -> August)
- Fall/Present (September -> Late November)

We divide the data into three frames, one for each of these periods.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import squarify
import seaborn as sns

In [None]:
data = pd.read_csv('all-states-history.csv')

Clean up the data so it contains only Minnesota rows and the attributes relevant to this analysis, then aggregate it by week:

In [None]:
# isolating the columns we need
columns_to_show = ['date','state','death','deathConfirmed','deathIncrease','hospitalized','hospitalizedIncrease','negative'
                   ,'negativeIncrease','positive','positiveIncrease','totalTestResults','totalTestResultsIncrease']

# isolating only the MN rows; .copy() so later column edits don't modify a view of `data`
clean_data = data.loc[data['state'] == 'MN', columns_to_show].copy()

# parsing dates and indexing by them for weekly resampling;
# sorting puts the rows in chronological order (March -> November)
clean_data['date'] = pd.to_datetime(clean_data['date'])
clean_data = clean_data.set_index('date').sort_index()

# daily increments are summed within each week (weeks end on Sunday)
columns_to_sum = clean_data[['deathIncrease','hospitalizedIncrease','negativeIncrease','positiveIncrease','totalTestResultsIncrease']]
weekly_data = columns_to_sum.resample('W', label='right', closed='right').sum()
weekly_data = weekly_data.reset_index()

# cumulative columns take the value reported on each week's closing Sunday
# (or the next available day if that Sunday is missing)
remaining_cols = clean_data[['state','death','deathConfirmed','hospitalized', 'negative','positive','totalTestResults']]
remaining_cols = remaining_cols.resample('W').bfill().reset_index()

# merging and reordering the dataframe columns to be clearer
clean_data = pd.merge(remaining_cols, weekly_data, on='date').fillna(0)
clean_data = clean_data[['date','state','death','deathIncrease','deathConfirmed','hospitalized', 'hospitalizedIncrease','negative',
                        'negativeIncrease','positive', 'positiveIncrease','totalTestResults','totalTestResultsIncrease']]
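The cell above rolls daily rows up to weeks in two different ways: daily increments (the `*Increase` columns) are summed over each week, while cumulative totals keep a single value per week. A minimal self-contained sketch of that idea on synthetic data, assuming weeks end on Sunday (pandas' `'W'` alias, i.e. `'W-SUN'`). The helper `to_weekly` is hypothetical, and it uses `.last()` for the cumulative columns where the notebook uses a backfill; the two should agree whenever each week's closing Sunday has a non-missing value.

```python
import pandas as pd

def to_weekly(daily: pd.DataFrame, sum_cols: list, last_cols: list) -> pd.DataFrame:
    """Hypothetical helper: roll a date-indexed daily frame up to Sunday-ending weeks.

    sum_cols  -- per-day increments, added up over each week
    last_cols -- running totals, keep the latest non-missing value in each week
    """
    sums = daily[sum_cols].resample('W').sum()     # 'W' = weeks ending Sunday, right-closed, right-labelled
    lasts = daily[last_cols].resample('W').last()  # last non-missing value inside each week
    return pd.concat([lasts, sums], axis=1).reset_index()

# Two weeks of synthetic data: Mon 2020-03-02 through Sun 2020-03-15, one new case per day.
days = pd.date_range('2020-03-02', '2020-03-15', freq='D', name='date')
daily = pd.DataFrame({'positiveIncrease': 1}, index=days)
daily['positive'] = daily['positiveIncrease'].cumsum()  # running total 1..14

weekly = to_weekly(daily, ['positiveIncrease'], ['positive'])
# weeks ending 2020-03-08 and 2020-03-15: positiveIncrease 7 and 7, positive 7 and 14
print(weekly)
```

Summing a cumulative column instead would overcount badly (the first week would report 28 rather than 7), which is why the two column groups are resampled separately.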

In [None]:
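# Breaking down the clean data into each period (earliest weeks at the top of the dataset).
# Each period is 13 weekly rows; weeks end on Sundays.

early_breakout_data = clean_data[0:13]   # weeks ending 2020-03-08 through 2020-05-31

summer_data = clean_data[13:26]          # weeks ending 2020-06-07 through 2020-08-30

fall_data = clean_data[26:]              # weeks ending 2020-09-06 through 2020-11-29

# quick check: equal-width binning of the early period's weekly new positives (one label per week)
bins = pd.cut(early_breakout_data['positiveIncrease'], 4)

print(bins.shape)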
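The positional slices `[0:13]`, `[13:26]` and `[26:]` only line up with the three periods if the weekly frame starts on the week ending 2020-03-08 and has no missing weeks. A sketch of a date-based alternative that selects each period by its week-ending date instead; the boundary dates are assumptions read off the period list (March to May, June to August, September onward), and `split_periods` is a hypothetical helper.

```python
import pandas as pd

# Assumed period boundaries (inclusive), compared against each week's ending Sunday.
PERIODS = {
    'early_breakout': ('2020-03-01', '2020-05-31'),
    'summer': ('2020-06-01', '2020-08-31'),
    'fall': ('2020-09-01', '2020-11-30'),
}

def split_periods(weekly: pd.DataFrame) -> dict:
    """Hypothetical helper: return one sub-frame per period, keyed by period name."""
    dates = pd.to_datetime(weekly['date'])
    return {
        name: weekly[dates.between(pd.Timestamp(start), pd.Timestamp(end))]
        for name, (start, end) in PERIODS.items()
    }

# Usage with synthetic weekly rows spanning the same weeks as the notebook's data.
weeks = pd.DataFrame({'date': pd.date_range('2020-03-08', '2020-11-29', freq='W')})
periods = split_periods(weeks)
print({name: len(frame) for name, frame in periods.items()})
```

On these 39 synthetic weeks each period gets 13 rows, matching the positional slices; the difference is that a gap or an extra week in the real data would shift the positional split but not this one.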


## Analysis

Important MN Stats:

- Population (mn.gov estimate): 5,680,337
- Land Area (estimate): 79,610.08 sq. mi.
- Population Density: 71.35 people/sq. mi.
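The density figure follows directly from the other two (5,680,337 / 79,610.08 ≈ 71.35 people per sq. mi.). The population estimate is also the natural denominator for putting weekly counts on a per-capita scale; a small sketch, where the per-100,000 convention, the helper `per_100k`, and the illustrative count of 5,000 are assumptions rather than something this notebook computes:

```python
MN_POPULATION = 5_680_337      # mn.gov estimate, from the stats above
MN_LAND_AREA_SQMI = 79_610.08  # land area estimate, square miles

def per_100k(count: float, population: int = MN_POPULATION) -> float:
    """Hypothetical helper: express a raw count as a rate per 100,000 residents."""
    return count / population * 100_000

density = MN_POPULATION / MN_LAND_AREA_SQMI
print(f"density: {density:.2f} people/sq. mi.")  # density: 71.35 people/sq. mi.
print(f"{per_100k(5_000):.1f} new cases per 100k")  # illustrative weekly count only
```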

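Since we are performing a market basket analysis with the Apriori algorithm, which finds items that frequently occur together across a set of transactions, we first need to discretize the numeric data. To do so, we've implemented a function `discretize_data`, which splits each numeric attribute into k equal-width bins and one-hot encodes the bin each week falls into, so that every week becomes a transaction whose items are its bins: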

In [125]:
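# arr is the dataframe to discretize; its first two columns are 'date' and 'state'
# k is the number of equal-width bins per attribute (pd.cut splits each column's range into k equal intervals)
def discretize_data(arr, k):
    out = pd.DataFrame({'date': arr['date'], 'state': arr['state']})
    cols = arr.columns[2:]
    for i in cols:
        # one pd.cut call gives both the interval each row falls in and its integer bin code,
        # so the bin assignment and the column name always agree
        bin_range = pd.cut(arr[i], k)
        bins = bin_range.cat.codes
        for j in range(k):
            # rows are looked up by index label, so frames not indexed from 0 (e.g. summer_data) also work
            for row in bins[bins == j].index:
                out.loc[row, i + " bin " + str(bin_range.loc[row])] = 1
    # 0 marks weeks that do not fall in a given bin
    out = out.fillna(0)
    return out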
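`discretize_data` builds its one-hot columns one cell at a time. The same kind of encoding can be produced a column at a time with `pd.cut` plus `pd.get_dummies`. The sketch below is a hypothetical alternative (`discretize_vectorized` is not part of the notebook) that keeps only non-empty bins, as the loop does, and adds an `equal_frequency` switch that uses `pd.qcut` in case quantile bins are wanted. `pd.cut` splits each column's range into k intervals of equal width, so a skewed column can leave most weeks in a single bin; `pd.qcut` aims for k bins with roughly equal counts, and with `duplicates='drop'` repeated edges (for example many weeks at zero) can leave fewer than k bins.

```python
import pandas as pd

def discretize_vectorized(arr: pd.DataFrame, k: int, equal_frequency: bool = False) -> pd.DataFrame:
    """Hypothetical alternative to discretize_data: one-hot encode k bins per numeric column.

    Assumes the first two columns are 'date' and 'state', as in the notebook's frames.
    equal_frequency=False -> pd.cut  (equal-width bins, as discretize_data uses)
    equal_frequency=True  -> pd.qcut (quantile bins, roughly equal counts per bin)
    """
    pieces = [arr[['date', 'state']]]
    for col in arr.columns[2:]:
        if equal_frequency:
            intervals = pd.qcut(arr[col], k, duplicates='drop')
        else:
            intervals = pd.cut(arr[col], k)
        # column names follow the notebook's "<column> bin <interval>" pattern
        dummies = pd.get_dummies(intervals, prefix=f"{col} bin", prefix_sep=" ", dtype=float)
        pieces.append(dummies.loc[:, dummies.sum() > 0])  # drop bins that no week falls into
    return pd.concat(pieces, axis=1)

# Usage on a small synthetic frame with one skewed numeric column.
demo = pd.DataFrame({
    'date': pd.date_range('2020-03-08', periods=8, freq='W'),
    'state': 'MN',
    'deathIncrease': [0, 1, 2, 3, 4, 5, 6, 100],
})
width_bins = discretize_vectorized(demo, 4)
freq_bins = discretize_vectorized(demo, 4, equal_frequency=True)
print(width_bins.iloc[:, 2:].sum().tolist())  # weeks per non-empty equal-width bin
print(freq_bins.iloc[:, 2:].sum().tolist())   # weeks per equal-frequency bin
```

On this toy column the equal-width version puts seven of the eight weeks into the lowest bin (the two middle bins are empty and dropped), while the quantile version spreads them two per bin.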

Early Breakout Analysis:

In [126]:
early_break_disc = discretize_data(early_breakout_data,4)
early_break_disc.head(10)

| | date | state | death bin (-1.05, 262.5] | death bin (262.5, 525.0] | death bin (525.0, 787.5] | death bin (787.5, 1050.0] | deathIncrease bin (-0.172, 43.0] | deathIncrease bin (43.0, 86.0] | deathIncrease bin (129.0, 172.0] | deathConfirmed bin (-1.04, 260.0] | ... | positiveIncrease bin (2553.0, 3829.0] | positiveIncrease bin (3829.0, 5105.0] | totalTestResults bin (-233.405, 70901.25] | totalTestResults bin (70901.25, 141752.5] | totalTestResults bin (141752.5, 212603.75] | totalTestResults bin (212603.75, 283455.0] | totalTestResultsIncrease bin (-45.755, 14701.75] | totalTestResultsIncrease bin (14701.75, 29390.5] | totalTestResultsIncrease bin (29390.5, 44079.25] | totalTestResultsIncrease bin (44079.25, 58768.0] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-08 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 1 | 2020-03-15 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2020-03-22 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2020-03-29 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2020-04-05 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 5 | 2020-04-12 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 6 | 2020-04-19 | MN | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 7 | 2020-04-26 | MN | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 8 | 2020-05-03 | MN | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 9 | 2020-05-10 | MN | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
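In a frame like this, each week is a transaction and each `... bin ...` column is an item that is either present (1.0) or absent (0.0). The following is a minimal from-scratch sketch of Apriori on such a 0/1 frame, not necessarily the implementation used in the rest of this analysis: it grows frequent itemsets level by level (join pairs of frequent itemsets, prune any candidate with an infrequent subset, then count support) and derives association rules from them by confidence. The function names and the thresholds are illustrative assumptions.

```python
from itertools import combinations
import pandas as pd

def apriori_itemsets(onehot: pd.DataFrame, min_support: float) -> dict:
    """Return {frozenset(items): support} for every itemset with support >= min_support.

    onehot: one row per transaction (week), one 0/1 column per item (bin).
    """
    tx = onehot.astype(bool)
    n = len(tx)
    frequent = {}
    # Level 1: single items.
    level = []
    for item in tx.columns:
        support = tx[item].sum() / n
        if support >= min_support:
            frequent[frozenset([item])] = support
            level.append(frozenset([item]))
    size = 2
    while level:
        # Join: merge pairs of frequent (size-1)-itemsets that differ by exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune: every (size-1)-subset of a frequent itemset must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, size - 1))]
        level = []
        for c in candidates:
            # Support = fraction of transactions that contain every item in c.
            support = tx[list(c)].all(axis=1).sum() / n
            if support >= min_support:
                frequent[c] = support
                level.append(c)
        size += 1
    return frequent

def rules_from_itemsets(frequent: dict, min_confidence: float) -> list:
    """Return (antecedent, consequent, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]  # subsets of frequent itemsets are frequent
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules

# Toy 0/1 frame: 4 weeks, 3 items (stand-ins for bin columns).
toy = pd.DataFrame({'a': [1, 1, 1, 0], 'b': [1, 1, 0, 0], 'c': [0, 1, 1, 1]})
itemsets = apriori_itemsets(toy, min_support=0.5)
for lhs, rhs, support, confidence in rules_from_itemsets(itemsets, min_confidence=0.9):
    print(lhs, '->', rhs, f'support={support:.2f}', f'confidence={confidence:.2f}')
```

On the notebook's discretized frames, the non-item columns would be dropped first, e.g. `apriori_itemsets(early_break_disc.drop(columns=['date', 'state']), min_support=0.5)`; with only 13 weeks per period, each week moves support by roughly 0.077, so the thresholds need to be chosen with that granularity in mind.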
