# Data analysis of COVID-19 statistics

This notebook contains code to scrape COVID-19 statistics from the web, along with state attributes including land area and political affiliation (based on the 2020 Electoral College vote). It then uses various tools to analyze and visualize the downloaded data.

__IMPORTANT__: Most of the code in this notebook is time-sensitive, because the site used to collect the data updates at approximately 9:00 PM Eastern Time. Typically, when working with the site interactively, I wait until at least that time of day to collect stats. However, I can achieve the same results by executing this notebook the following day, as long as I do so before the site updates again. In that case, I need to retrieve the previous day's data (specifically for the individual US state-level data).

In [1]:
# Set the global variable on whether to use current data or yesterday's data
USE_YESTERDAYS_DATA = True
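The rule described in the note above (the site refreshes at about 9:00 PM Eastern, so before that time the most recent complete figures are in the "yesterday" table) could be encoded in a small helper instead of setting the flag by hand. This is a sketch under assumptions: `should_use_yesterdays_data` is a hypothetical name, the 21:00 cutoff is the approximate update time stated above, and the caller must pass the current time already expressed in US Eastern time.

```python
from datetime import datetime, time

# Approximate time (US Eastern) at which the site refreshes, per the note above
SITE_UPDATE_TIME_ET = time(21, 0)


def should_use_yesterdays_data(now_eastern: datetime) -> bool:
    """Return True when today's table is likely still incomplete.

    Hypothetical helper: `now_eastern` must already be in US Eastern time.
    Before the cutoff, the latest complete figures are in the "yesterday" table.
    """
    return now_eastern.time() < SITE_UPDATE_TIME_ET
```

For example, `USE_YESTERDAYS_DATA = should_use_yesterdays_data(datetime.now(ZoneInfo("America/New_York")))`, with `ZoneInfo` imported from the standard-library `zoneinfo` module (Python 3.9 or later).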

## Background

I have been collecting country- and state-level data on COVID cases and deaths since the start of the pandemic. More recently, I became interested in how people responded to the counter-measures recommended by the CDC and other health organizations, and specifically in how people who tended to vote Republican fared against the virus compared with those who tended to vote Democratic. Although imperfect, I used the 2020 Electoral College vote as an indicator of each state's political leaning. This, of course, has shortcomings, such as lumping each state's entire population into one basket or the other, and ignoring any effect that each state's Governor may have had on the response to the pandemic.

To compare states on a level playing field, I used the per capita death rate instead of total deaths in each state. Additionally, as is well known, disease (especially airborne disease) tends to spread more quickly and with harsher effect in densely populated areas than in sparsely populated ones. To account for this difference in population density, we also compute a density-adjusted per capita death rate, as described below.

## Import the required libraries used in this notebook

We need to access URLs for content, parse and process the HTML, and load the data into tables for analysis. We also use numpy for some basic numerical operations, scikit-learn for the linear regression, and statsmodels for OLS regression with full statistical output and for ANOVA.

In [2]:
import urllib3
from lxml import html
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import statsmodels.api as sm
from statsmodels.formula.api import ols

Create the HTTP connection pool that we will use to pull down data from the various sources.

In [3]:
http = urllib3.PoolManager()

The website worldometers.info publishes data compiled for many countries and, for the USA (and possibly other countries), state-level data, which is what we're interested in.

In [4]:
u = http.request("GET", "https://www.worldometers.info/coronavirus/country/us/")
if u.status == 200:
    strStateData = u.data.decode('utf-8', errors='ignore')
    treeStateData = html.fromstring(strStateData)
    # Select the rows of either yesterday's or today's state table
    tableId = "usa_table_countries_yesterday" if USE_YESTERDAYS_DATA else "usa_table_countries_today"
    tblStateData = treeStateData.xpath(f'//table[contains(@id, "{tableId}")]/tbody[1]/tr')
else:
    raise RuntimeError(f"worldometers.info returned HTTP {u.status}")
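The request-and-check pattern in this cell is repeated below for census.gov and archives.gov. A small helper could centralize it and add a timeout and retry count. This is a sketch: `fetch_page_text` is a hypothetical name, and the 10-second timeout and 3 retries are arbitrary defaults, passed through to urllib3's `request` call.

```python
def fetch_page_text(http, url, timeout=10.0, retries=3):
    """Download `url` with an existing urllib3 PoolManager and return the decoded HTML.

    Hypothetical helper: raises instead of silently continuing when the server
    does not answer with HTTP 200.
    """
    # urllib3 accepts a float timeout (seconds) and an int retry count as keyword arguments
    response = http.request("GET", url, timeout=timeout, retries=retries)
    if response.status != 200:
        raise RuntimeError(f"GET {url} returned HTTP {response.status}")
    return response.data.decode("utf-8", errors="ignore")
```

Usage in this notebook would look like `treeStateData = html.fromstring(fetch_page_text(http, "https://www.worldometers.info/coronavirus/country/us/"))`.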

Now that the table rows are loaded as HTML elements, we can parse them to pull out the required data. On occasion (twice so far in two years), I have had to adjust this section when the page's format changed. Here we collect the columns for the state name, the number of deaths, and the state population, and place each into a separate Python list for use in the next cell.

In [5]:
lstStateName = []
lstStatePop = []
lstStateDeaths = []

# Start at the second row (the first is assumed to be the USA total row) and reuse
# the rows already selected in tblStateData instead of re-running the XPath query
for row in tblStateData[1:]:
    cells = row.xpath('td')
    # td index 1: state name, inside a link
    lstStateName.append(cells[1].find("a").text)
    # td index 4: total deaths, e.g. "16,418"
    lstStateDeaths.append(int(cells[4].text.replace(",", "").strip()))
    # td index 12: population
    lstStatePop.append(int(cells[12].text.replace(",", "").strip()))
    #print("{} Population:{} ; Deaths:{}".format(lstStateName[-1], lstStatePop[-1], lstStateDeaths[-1]))
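The cleanup in this cell (removing thousands separators and surrounding whitespace before calling `int`) is repeated for each numeric column. A small helper could centralize it. This is a sketch; `parse_count` is a hypothetical name, and unlike the cell above it returns `None` for a blank cell (lxml's `.text` is `None` for an empty `<td>`) instead of raising.

```python
def parse_count(cell_text):
    """Convert a scraped numeric cell such as '16,418' (possibly padded with
    spaces or newlines) to an int.

    Hypothetical helper: returns None for empty cells so the caller can decide
    how to handle missing values.
    """
    if cell_text is None:
        return None
    cleaned = cell_text.replace(",", "").strip()
    return int(cleaned) if cleaned else None
```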

After building the lists, we combine them into a single dictionary keyed by column name and use it to create a pandas dataframe. The dataframe is indexed by state name and sorted by index in (default) ascending order. Finally, we compute PerCapitaDeaths as deaths per 100,000 residents.

In [6]:
dicStateData = {}

dicStateData["State"] = lstStateName
dicStateData["Population"] = lstStatePop
dicStateData["Deaths"] = lstStateDeaths

dfStateData = pd.DataFrame(dicStateData)
dfStateData.set_index("State", inplace=True)
dfStateData.sort_index(axis=0, inplace=True)

# Deaths per 100,000 residents
dfStateData["PerCapitaDeaths"] = dfStateData["Deaths"] / dfStateData["Population"] * 100000
dfStateData

| State | Population | Deaths | PerCapitaDeaths |
|---|---|---|---|
| Alabama | 4903185 | 16418 | 334.843576 |
| Alaska | 731545 | 945 | 129.178656 |
| Arizona | 7278717 | 23983 | 329.494882 |
| Arkansas | 3017804 | 9058 | 300.152031 |
| California | 39512223 | 76349 | 193.228814 |
| Colorado | 5758736 | 10383 | 180.299982 |
| Connecticut | 3565287 | 9077 | 254.59381 |
| Delaware | 973764 | 2271 | 233.218727 |
| District Of Columbia | 705749 | 1207 | 171.023976 |
| Florida | 21477737 | 62347 | 290.286635 |
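The list-to-dataframe steps above can be packaged as one function and checked against two rows of the output (for Alabama, 16,418 / 4,903,185 × 100,000 ≈ 334.84). This is a sketch: `build_state_frame` is a hypothetical name, and passing the dictionary literal straight to `pd.DataFrame` is equivalent to filling `dicStateData` key by key.

```python
import pandas as pd


def build_state_frame(names, populations, deaths):
    """Combine parallel lists into a DataFrame indexed and sorted by state name,
    with deaths per 100,000 residents in PerCapitaDeaths."""
    df = pd.DataFrame({"State": names, "Population": populations, "Deaths": deaths})
    df = df.set_index("State").sort_index()
    df["PerCapitaDeaths"] = df["Deaths"] / df["Population"] * 100_000
    return df


# Two rows taken from the output above, deliberately passed out of order
sample = build_state_frame(["Alaska", "Alabama"], [731545, 4903185], [945, 16418])
print(sample)
```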


This next section adds a compensating factor for population density, starting with each state's land area from census.gov.

In [7]:
u = http.request("GET", "https://www.census.gov/geographies/reference-files/2010/geo/state-area.html")
if u.status == 200:
    strStatePop = u.data.decode('utf-8', errors='ignore')
    treeStatePop = html.fromstring(strStatePop)
    # Keep only rows that contain td cells (this skips header rows)
    rowsStatePop = treeStatePop.xpath('//div[contains(@class, "uscb-text-image-text uscb-text-media-text uscb-padding-LR-0")]/table/tbody/tr[td]')
else:
    raise RuntimeError(f"census.gov returned HTTP {u.status}")

If we successfully connected to the census.gov page that lists the area of each state, we can use this data to build another dictionary. We add a state to the dictionary only if it is one we are tracking, so that the area data covers the same states as the existing dataframe. Otherwise, the concatenation in the next cell would add extra rows filled with NaN.

In [8]:
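dicStateArea = {}

for row in rowsStatePop:
    cells = row.xpath('td')
    # td index 0 holds the state name; title-case it to match the worldometers names
    stateName = str(cells[0].text).title()
    # Only keep states already in our dataframe
    if stateName in lstStateName:
        # td index 3 holds the land area in square miles
        dicStateArea[stateName] = int(cells[3].text.replace(",", ""))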
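The filtering and name matching in this cell can be tried without a network connection. This sketch uses a hypothetical helper, `build_area_lookup`, that takes `(name_text, area_text)` pairs standing in for the census.gov cells at td index 0 and 3. The "Some Territory" row and its value are placeholders for a row we do not track; the other two values come from the tables in this notebook. Note that `str.title()` turns "District of Columbia" into "District Of Columbia", the spelling used in the dataframe index.

```python
def build_area_lookup(rows, tracked_states):
    """Map title-cased state names to land area for states in `tracked_states`.

    Hypothetical helper: `rows` is a list of (name_text, area_text) pairs.
    """
    lookup = {}
    for name_text, area_text in rows:
        # "District of Columbia" -> "District Of Columbia"
        name = str(name_text).title()
        if name in tracked_states:
            lookup[name] = int(area_text.replace(",", ""))
    return lookup


toy_rows = [
    ("District of Columbia", "61"),
    ("Alaska", "570,641"),
    ("Some Territory", "1,234"),  # placeholder row that is not tracked
]
area_lookup = build_area_lookup(toy_rows, {"Alaska", "District Of Columbia"})
print(area_lookup)
```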

Now we add this new "Area" column (land area in square miles) to the existing dataframe and use it to calculate the "Density" column (people per square mile) by dividing "Population" by "Area".

In [9]:
dfStateArea = pd.DataFrame.from_dict(dicStateArea, orient='index', columns=["Area"])
dfStateData = pd.concat([dfStateData, dfStateArea], axis=1)
dfStateData["Density"] = dfStateData["Population"] / dfStateData["Area"]
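`pd.concat(..., axis=1)` aligns rows on the index rather than by position. This is why the filtering above matters: a label present in only one frame still gets a row, with NaN in the other frame's columns. The toy frames below illustrate this ("Some Territory" and its area are placeholders). Passing `join="inner"` is an alternative to filtering beforehand.

```python
import pandas as pd

# Toy frames: "Some Territory" appears only in the area frame
deaths = pd.DataFrame({"Deaths": [16418, 945]}, index=["Alabama", "Alaska"])
area = pd.DataFrame({"Area": [50645, 570641, 1234]},
                    index=["Alabama", "Alaska", "Some Territory"])

# Default join="outer": union of index labels, gaps filled with NaN
outer = pd.concat([deaths, area], axis=1)
# join="inner": only labels present in both frames
inner = pd.concat([deaths, area], axis=1, join="inner")

print(outer)
print(inner)
```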

In [10]:
densityDC = round(dfStateData.loc['District Of Columbia', 'Density'], 0)
densityAK = round(dfStateData.loc['Alaska', 'Density'], 0)
print(f"Rather than using Density directly as a factor, we want to 'temper' the wide differences in densities (for example, the District of Columbia, the most densely populated 'state', has a density of {densityDC} people per square mile, while Alaska, the most thinly populated state, has a density of {densityAK}). We can do so by applying the base-10 log function (in fact log + 1, so that a density of 1, whose log is 0, does not produce a zero factor). Once we have the log value for each state's density, we divide each state's log value by the average of that column to obtain a scaling factor for each state. This scaling factor is the divisor for each state's Per Capita Deaths figure, so states with a factor less than 1 are scaled up (increasing their relative number of deaths) and, conversely, states with a factor greater than 1 are scaled down, giving the Adjusted Per Capita Deaths.")

Rather than using Density directly as a factor, we want to 'temper' the wide differences in densities (for example, the District of Columbia, the most densely populated 'state', has a density of 11570.0 people per square mile, while Alaska, the most thinly populated state, has a density of 1.0). We can do so by applying the base-10 log function (in fact log + 1, so that a density of 1, whose log is 0, does not produce a zero factor). Once we have the log value for each state's density, we divide each state's log value by the average of that column to obtain a scaling factor for each state. This scaling factor is the divisor for each state's Per Capita Deaths figure, so states with a factor less than 1 are scaled up (increasing their relative number of deaths) and, conversely, states with a factor greater than 1 are scaled down, giving the Adjusted Per Capita Deaths.
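In symbols, let $d_s$ be the density of state $s$, $P_s$ its Per Capita Deaths (per 100,000 residents), and $n$ the number of rows in the dataframe. The adjustment computed in the next cell is then:

$$L_s = \log_{10} d_s + 1, \qquad \bar{L} = \frac{1}{n}\sum_{j=1}^{n} L_j, \qquad f_s = \frac{L_s}{\bar{L}}, \qquad A_s = \frac{P_s}{f_s}$$

Because each $L_s$ is divided by the mean, the factors $f_s$ average exactly 1. Worked through for Alaska, using values from the table that follows (the mean $\bar{L} \approx 3.016$ is implied by that table as $L_s / f_s$):

$$L_{AK} = \log_{10}(1.281971) + 1 \approx 1.1079, \qquad f_{AK} \approx \frac{1.1079}{3.016} \approx 0.3673, \qquad A_{AK} \approx \frac{129.18}{0.3673} \approx 351.7$$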


In [11]:
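# Base-10 log of density, plus 1 so that a density of 1 does not yield a zero factor
dfStateData["Density_Log"] = np.log10(dfStateData["Density"]) + 1
# Divide by the mean so that the factors average to 1 across all states
dfStateData["Density_Factor"] = dfStateData["Density_Log"] / np.mean(dfStateData["Density_Log"])
# Sparse states (factor < 1) are scaled up; dense states (factor > 1) are scaled down
dfStateData["AdjustedPerCapitaDeaths"] = dfStateData["PerCapitaDeaths"] / dfStateData["Density_Factor"]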
dfStateData

| State | Population | Deaths | PerCapitaDeaths | Area | Density | Density_Log | Density_Factor | AdjustedPerCapitaDeaths |
|---|---|---|---|---|---|---|---|---|
| Alabama | 4903185 | 16418 | 334.843576 | 50645 | 96.814789 | 2.985942 | 0.989896 | 338.261395 |
| Alaska | 731545 | 945 | 129.178656 | 570641 | 1.281971 | 1.107878 | 0.367282 | 351.714751 |
| Arizona | 7278717 | 23983 | 329.494882 | 113594 | 64.076597 | 2.806699 | 0.930474 | 354.115188 |
| Arkansas | 3017804 | 9058 | 300.152031 | 52035 | 57.995657 | 2.763395 | 0.916118 | 327.634807 |
| California | 39512223 | 76349 | 193.228814 | 155779 | 253.642808 | 3.404223 | 1.128564 | 171.216547 |
| Colorado | 5758736 | 10383 | 180.299982 | 103642 | 55.563729 | 2.744791 | 0.90995 | 198.142725 |
| Connecticut | 3565287 | 9077 | 254.59381 | 4842 | 736.325279 | 3.86707 | 1.282006 | 198.590115 |
| Delaware | 973764 | 2271 | 233.218727 | 1949 | 499.62237 | 3.698642 | 1.226169 | 190.201057 |
| District Of Columbia | 705749 | 1207 | 171.023976 | 61 | 11569.655738 | 5.06332 | 1.678586 | 101.885733 |
| Florida | 21477737 | 62347 | 290.286635 | 53625 | 400.51724 | 3.602621 | 1.194337 | 243.052576 |


The next step is to add each state's vote for President, as represented by the 2020 Electoral College vote. These data come from the archives.gov website.

In [12]:
u = http.request("GET", "https://www.archives.gov/electoral-college/2020")
if u.status == 200:
    strStateECol = u.data.decode('utf-8', errors='ignore')
    treeStateECol = html.fromstring(strStateECol)
    # Keep only rows with a td cell containing a link (the state rows)
    rowsStateECol = treeStateECol.xpath('//div[contains(@class, "region-content")]//table[contains(@width, "100%")]//tr[td[a]]')
else:
    raise RuntimeError(f"archives.gov returned HTTP {u.status}")

The archives.gov page doesn't state which candidate won each state; it only lists the number of electoral votes cast for each candidate. Some states (Maine and Nebraska) can split their Electoral College votes between candidates, so we compare the votes cast for one candidate against those cast for the other and label the state by whichever received more. Cells with no votes show "-", which we treat as 0. We then store each state's political affiliation in a dictionary and use it to create a new column in our existing dataframe.

In [13]:
dicStateECol = {}

for i in range(len(rowsStateECol)):
    ecolState = str(rowsStateECol[i].xpath('td/a')[0].text).title()
    if ecolState in lstStateName:
        voteDem = int(str(rowsStateECol[i].xpath('td')[2].text_content()).replace("-", "0"))
        voteRep = int(str(rowsStateECol[i].xpath('td')[3].text_content()).replace("-", "0"))
        dicStateECol[ecolState] = "Republican" if voteRep > voteDem else "Democratic"

dfStateECol = pd.DataFrame.from_dict(dicStateECol, orient='index', columns=["ElectoralCollege2020"])
dfStateData = pd.concat([dfStateData, dfStateECol], axis=1)
dfStateData

| State | Population | Deaths | PerCapitaDeaths | Area | Density | Density_Log | Density_Factor | AdjustedPerCapitaDeaths | ElectoralCollege2020 |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | 4903185 | 16418 | 334.843576 | 50645 | 96.814789 | 2.985942 | 0.989896 | 338.261395 | Republican |
| Alaska | 731545 | 945 | 129.178656 | 570641 | 1.281971 | 1.107878 | 0.367282 | 351.714751 | Republican |
| Arizona | 7278717 | 23983 | 329.494882 | 113594 | 64.076597 | 2.806699 | 0.930474 | 354.115188 | Democratic |
| Arkansas | 3017804 | 9058 | 300.152031 | 52035 | 57.995657 | 2.763395 | 0.916118 | 327.634807 | Republican |
| California | 39512223 | 76349 | 193.228814 | 155779 | 253.642808 | 3.404223 | 1.128564 | 171.216547 | Democratic |
| Colorado | 5758736 | 10383 | 180.299982 | 103642 | 55.563729 | 2.744791 | 0.90995 | 198.142725 | Democratic |
| Connecticut | 3565287 | 9077 | 254.59381 | 4842 | 736.325279 | 3.86707 | 1.282006 | 198.590115 | Democratic |
| Delaware | 973764 | 2271 | 233.218727 | 1949 | 499.62237 | 3.698642 | 1.226169 | 190.201057 | Democratic |
| District Of Columbia | 705749 | 1207 | 171.023976 | 61 | 11569.655738 | 5.06332 | 1.678586 | 101.885733 | Democratic |
| Florida | 21477737 | 62347 | 290.286635 | 53625 | 400.51724 | 3.602621 | 1.194337 | 243.052576 | Republican |
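The majority rule used in the cell above can be isolated into small functions and checked against the two states that split their electoral votes in 2020: Maine (3 for Biden, 1 for Trump) and Nebraska (1 for Biden, 4 for Trump). This is a sketch. `parse_ec_votes` and `state_affiliation` are hypothetical names, and unlike the cell above, a blank cell is treated as 0 rather than raising. As in the original rule, a tie would be labeled "Democratic".

```python
def parse_ec_votes(cell_text):
    """Electoral votes from an archives.gov cell; '-' (no votes) or blank counts as 0."""
    cleaned = str(cell_text).strip()
    return 0 if cleaned in ("", "-") else int(cleaned)


def state_affiliation(dem_text, rep_text):
    """Label a state by whichever candidate received more of its electoral votes.

    Mirrors the notebook's rule, including that a tie falls through to "Democratic".
    """
    vote_dem = parse_ec_votes(dem_text)
    vote_rep = parse_ec_votes(rep_text)
    return "Republican" if vote_rep > vote_dem else "Democratic"
```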


Although none are expected, replace any NaN values with "Not Applicable".

In [14]:
dfStateData.fillna('Not Applicable', inplace=True)

To more easily identify political affiliation, let's color-code the cells in the dataframe where the political party is named.

In [15]:
def colorCodeParties(val):
    bgcolor = ""
    if (val == "Republican"):
        bgcolor = "background-color:red; color:white"
    elif (val == "Democratic"):
        bgcolor = "background-color:blue; color:white"
    elif (val == "Not Applicable"):
        bgcolor = "background-color:white; color:black"
    return bgcolor


dfStateData.style.applymap(colorCodeParties)

| State | Population | Deaths | PerCapitaDeaths | Area | Density | Density_Log | Density_Factor | AdjustedPerCapitaDeaths | ElectoralCollege2020 |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | 4903185 | 16418 | 334.843576 | 50645 | 96.814789 | 2.985942 | 0.989896 | 338.261395 | Republican |
| Alaska | 731545 | 945 | 129.178656 | 570641 | 1.281971 | 1.107878 | 0.367282 | 351.714751 | Republican |
| Arizona | 7278717 | 23983 | 329.494882 | 113594 | 64.076597 | 2.806699 | 0.930474 | 354.115188 | Democratic |
| Arkansas | 3017804 | 9058 | 300.152031 | 52035 | 57.995657 | 2.763395 | 0.916118 | 327.634807 | Republican |
| California | 39512223 | 76349 | 193.228814 | 155779 | 253.642808 | 3.404223 | 1.128564 | 171.216547 | Democratic |
| Colorado | 5758736 | 10383 | 180.299982 | 103642 | 55.563729 | 2.744791 | 0.90995 | 198.142725 | Democratic |
| Connecticut | 3565287 | 9077 | 254.59381 | 4842 | 736.325279 | 3.86707 | 1.282006 | 198.590115 | Democratic |
| Delaware | 973764 | 2271 | 233.218727 | 1949 | 499.62237 | 3.698642 | 1.226169 | 190.201057 | Democratic |
| District Of Columbia | 705749 | 1207 | 171.023976 | 61 | 11569.655738 | 5.06332 | 1.678586 | 101.885733 | Democratic |
| Florida | 21477737 | 62347 | 290.286635 | 53625 | 400.51724 | 3.602621 | 1.194337 | 243.052576 | Republican |


Now that we have the full set of data in our dataframe, let's take a subset of it containing only the (unadjusted) Per Capita Deaths figure and the political affiliation column, then sort the result by Per Capita Deaths in descending order.

In [16]:
dfPerCapitaDeaths = dfStateData.sort_values(by=["PerCapitaDeaths"], ascending=False).drop(
    ["Population", "Deaths", "Area", "Density", "Density_Log", "Density_Factor", "AdjustedPerCapitaDeaths"], axis=1)
dfPerCapitaDeaths.style.applymap(colorCodeParties)

| State | PerCapitaDeaths | ElectoralCollege2020 |
|---|---|---|
| Mississippi | 349.209667 | Republican |
| Alabama | 334.843576 | Republican |
| Arizona | 329.494882 | Democratic |
| New Jersey | 324.852317 | Democratic |
| Louisiana | 321.653315 | Republican |
| New York | 305.707526 | Democratic |
| Arkansas | 300.152031 | Republican |
| Tennessee | 299.43592 | Republican |
| Georgia | 293.941383 | Democratic |
| West Virginia | 293.000518 | Republican |
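Selecting the two columns to keep, instead of dropping the seven others, gives the same result and keeps working if more columns are added to the dataframe later. This sketch uses a hypothetical helper, `ranked_subset`, on toy rows copied from the tables above, including an extra `Area` column that should not appear in the result.

```python
import pandas as pd


def ranked_subset(df, value_col, label_col="ElectoralCollege2020"):
    """Return only `value_col` and `label_col`, sorted by `value_col` descending."""
    return df[[value_col, label_col]].sort_values(by=value_col, ascending=False)


toy = pd.DataFrame(
    {
        "PerCapitaDeaths": [334.843576, 129.178656, 329.494882],
        "Area": [50645, 570641, 113594],
        "ElectoralCollege2020": ["Republican", "Republican", "Democratic"],
    },
    index=["Alabama", "Alaska", "Arizona"],
)
ranked = ranked_subset(toy, "PerCapitaDeaths")
print(ranked)
```

In the notebook, the same helper would serve both subsets, e.g. `ranked_subset(dfStateData, "AdjustedPerCapitaDeaths")`.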


If there are any trends, we may be able to see them developing now. Let's take another subset of the data, choosing only the Adjusted Per Capita Deaths figure and the political affiliation column, then sort the result by Adjusted Per Capita Deaths in descending order.

In [17]:
dfAdjustedPerCapitaDeaths = dfStateData.sort_values(by=["AdjustedPerCapitaDeaths"], ascending=False).drop(
    ["Population", "Deaths", "PerCapitaDeaths", "Area", "Density", "Density_Log", "Density_Factor"], axis=1)
dfAdjustedPerCapitaDeaths.style.applymap(colorCodeParties)

| State | AdjustedPerCapitaDeaths | ElectoralCollege2020 |
|---|---|---|
| Wyoming | 447.996277 | Republican |
| Montana | 437.892451 | Republican |
| South Dakota | 405.130181 | Republican |
| North Dakota | 386.691809 | Republican |
| Mississippi | 375.896463 | Republican |
| New Mexico | 369.330459 | Democratic |
| Arizona | 354.115188 | Democratic |
| Alaska | 351.714751 | Republican |
| Alabama | 338.261395 | Republican |
| Nevada | 334.590503 | Democratic |


Again, if there are any trends in the data, they may be visible now, after adjusting for population density. Next, we want to see whether these trends (if any) in the unadjusted and adjusted Per Capita Death rates are confirmed by regression. We will create two models, one for the unadjusted rate and one for the adjusted rate. Because the ElectoralCollege2020 column contains categorical data, we construct dummy variables for X: the ElectoralCollege2020_Republican column is 1 when the state's value is "Republican" and 0 otherwise. We keep only this column; the Democratic dummy is its complement, so including both alongside the model's intercept would make the predictors perfectly collinear.

In [18]:
# dtype=int keeps the dummies as 0/1 integers (pandas 2.0+ would otherwise return bool)
X0 = pd.get_dummies(data=dfPerCapitaDeaths[['ElectoralCollege2020']], dtype=int)
X0 = X0[['ElectoralCollege2020_Republican']]
X0.head(10)

| State | ElectoralCollege2020_Republican |
|---|---|
| Mississippi | 1 |
| Alabama | 1 |
| Arizona | 0 |
| New Jersey | 0 |
| Louisiana | 1 |
| New York | 0 |
| Arkansas | 1 |
| Tennessee | 1 |
| Georgia | 0 |
| West Virginia | 1 |
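A toy illustration of the dummy encoding (three rows, not the notebook's full data). It passes `dtype=int`, which keeps the 0/1 integers shown above on pandas 2.0 and later, where `get_dummies` otherwise returns boolean columns. It also shows why one indicator is enough: the two dummy columns always sum to 1.

```python
import pandas as pd

toy = pd.DataFrame(
    {"ElectoralCollege2020": ["Republican", "Democratic", "Republican"]},
    index=["Alabama", "Arizona", "Alaska"],
)

# One 0/1 column per category, named <column>_<value>
dummies = pd.get_dummies(toy[["ElectoralCollege2020"]], dtype=int)
# Keep only the Republican indicator; the Democratic column is its complement
X_toy = dummies[["ElectoralCollege2020_Republican"]]
print(dummies)
```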


For the y value in the regression model, we simply use the (unadjusted) Per Capita Deaths.

In [19]:
y0 = dfPerCapitaDeaths['PerCapitaDeaths']
y0.head(10)

State
Mississippi      349.209667
Alabama          334.843576
Arizona          329.494882
New Jersey       324.852317
Louisiana        321.653315
New York         305.707526
Arkansas         300.152031
Tennessee        299.435920
Georgia          293.941383
West Virginia    293.000518
Name: PerCapitaDeaths, dtype: float64

Now we fit the regression and report the coefficient.

In [20]:
model0 = LinearRegression()
model0.fit(X0, y0)
coeff_parameter_0 = pd.DataFrame(model0.coef_, X0.columns, columns=['Coefficient'])
coeff_parameter_0

| | Coefficient |
|---|---|
| ElectoralCollege2020_Republican | 42.577756 |


We can also fit the same model with statsmodels' OLS. This both validates the model created with scikit-learn's LinearRegression and provides additional statistical information (standard errors, p-values, R-squared, and more).

In [21]:
X0ols = sm.add_constant(X0)
mod0 = sm.OLS(y0, X0ols)
fit0 = mod0.fit()
fit0.summary2()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.075 |
| Dependent Variable: | PerCapitaDeaths | AIC: | 576.5075 |
| Date: | 2021-12-26 19:11 | BIC: | 580.3711 |
| No. Observations: | 51 | Log-Likelihood: | -286.25 |
| Df Model: | 1 | F-statistic: | 5.053 |
| Df Residuals: | 49 | Prob (F-statistic): | 0.0291 |
| R-squared: | 0.093 | Scale: | 4572.3 |

| | Coef. | Std.Err. | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 215.9128 | 13.2611 | 16.2817 | 0.0000 | 189.2636 | 242.5620 |
| ElectoralCollege2020_Republican | 42.5778 | 18.9406 | 2.2480 | 0.0291 | 4.5151 | 80.6404 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 2.701 | Durbin-Watson: | 0.238 |
| Prob(Omnibus): | 0.259 | Jarque-Bera (JB): | 2.577 |
| Skew: | -0.498 | Prob(JB): | 0.276 |
| Kurtosis: | 2.53 | Condition No.: | 3.0 |


In [22]:
# View just the params instead of the whole summary
fit0.params

const                              215.912787
ElectoralCollege2020_Republican     42.577756
dtype: float64
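With one 0/1 regressor, OLS fits each group at its sample mean. The intercept is therefore the average Per Capita Deaths (per 100,000 residents) of states that did not vote Republican, and intercept plus coefficient is the average for states that did. A quick check using the values printed above, copied by hand into a plain dictionary standing in for `fit0.params`:

```python
# Values copied from the fit0.params output above
params = {"const": 215.912787, "ElectoralCollege2020_Republican": 42.577756}

# Fitted value (= group mean) for states with ElectoralCollege2020_Republican == 0
mean_not_republican = params["const"]
# Fitted value (= group mean) for states with ElectoralCollege2020_Republican == 1
mean_republican = params["const"] + params["ElectoralCollege2020_Republican"]

print(f"Not Republican: {mean_not_republican:.2f}, Republican: {mean_republican:.2f}")
```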

In [23]:
# View just the pvalues instead of the whole summary
fit0.pvalues

const                              2.097860e-21
ElectoralCollege2020_Republican    2.911230e-02
dtype: float64

We can also perform an ANOVA test to confirm the P values returned above.

In [24]:
dfANOVA0 = pd.concat([X0, y0], axis=1)
modelANOVA0 = ols('PerCapitaDeaths ~ C(ElectoralCollege2020_Republican)', data=dfANOVA0).fit()
dfANOVAresults0 = sm.stats.anova_lm(modelANOVA0, typ=2)
dfANOVAresults0

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(ElectoralCollege2020_Republican) | 23105.145827 | 1.0 | 5.053319 | 0.029112 |
| Residual | 224041.303831 | 49.0 | | |
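For a single 0/1 predictor, the one-way ANOVA F statistic equals the square of the regression t statistic for that predictor, which is why the p-values agree. In the outputs above, 2.2480² ≈ 5.053 (and later, for the adjusted model, 4.4291² ≈ 19.62). The numpy sketch below demonstrates the identity on toy data (not the notebook's data). `slope_t_statistic` and `one_way_anova_f` are hypothetical helpers written out by hand rather than taken from statsmodels.

```python
import numpy as np


def slope_t_statistic(x, y):
    """t statistic of the slope in an OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, k = X.shape
    sigma2 = residuals @ residuals / (n - k)      # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    return beta[1] / np.sqrt(cov_beta[1, 1])


def one_way_anova_f(labels, y):
    """F statistic of a one-way ANOVA of y across the distinct labels."""
    groups = [y[labels == g] for g in np.unique(labels)]
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(y) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy data: a 0/1 indicator and a response
x = np.array([0, 0, 0, 0, 1, 1, 1])
y = np.array([200.0, 215.0, 190.0, 230.0, 260.0, 245.0, 270.0])

t = slope_t_statistic(x, y)
F = one_way_anova_f(x, y)
print(t ** 2, F)  # equal up to floating-point rounding
```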


Next, we want to use regression to confirm any trends observed visually in the adjusted Per Capita Death rates. As with the first model, we construct a dummy variable for X because the ElectoralCollege2020 column contains categorical data: the value is 1 when the column contains "Republican" and 0 otherwise.

In [25]:
# dtype=int keeps the dummies as 0/1 integers (pandas 2.0+ would otherwise return bool)
X1 = pd.get_dummies(data=dfAdjustedPerCapitaDeaths[['ElectoralCollege2020']], dtype=int)
X1 = X1[['ElectoralCollege2020_Republican']]
X1.head(10)

| State | ElectoralCollege2020_Republican |
|---|---|
| Wyoming | 1 |
| Montana | 1 |
| South Dakota | 1 |
| North Dakota | 1 |
| Mississippi | 1 |
| New Mexico | 0 |
| Arizona | 0 |
| Alaska | 1 |
| Alabama | 1 |
| Nevada | 0 |


For the y value in the regression model, we simply use the (adjusted) Per Capita Deaths.

In [26]:
y1 = dfAdjustedPerCapitaDeaths['AdjustedPerCapitaDeaths']
y1.head(10)

State
Wyoming         447.996277
Montana         437.892451
South Dakota    405.130181
North Dakota    386.691809
Mississippi     375.896463
New Mexico      369.330459
Arizona         354.115188
Alaska          351.714751
Alabama         338.261395
Nevada          334.590503
Name: AdjustedPerCapitaDeaths, dtype: float64

Now we can regress the adjusted data.

In [27]:
model1 = LinearRegression()
model1.fit(X1, y1)
coeff_parameter_1 = pd.DataFrame(model1.coef_, X1.columns, columns=['Coefficient'])
coeff_parameter_1

| | Coefficient |
|---|---|
| ElectoralCollege2020_Republican | 96.175532 |


Once again, let's use OLS to confirm the most recent model.

In [28]:
X1ols = sm.add_constant(X1)
mod1 = sm.OLS(y1, X1ols)
fit1 = mod1.fit()
fit1.summary2()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.271 |
| Dependent Variable: | AdjustedPerCapitaDeaths | AIC: | 590.4476 |
| Date: | 2021-12-26 19:11 | BIC: | 594.3112 |
| No. Observations: | 51 | Log-Likelihood: | -293.22 |
| Df Model: | 1 | F-statistic: | 19.62 |
| Df Residuals: | 49 | Prob (F-statistic): | 5.3e-05 |
| R-squared: | 0.286 | Scale: | 6009.5 |

| | Coef. | Std.Err. | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 200.9288 | 15.2031 | 13.2163 | 0.0000 | 170.3769 | 231.4806 |
| ElectoralCollege2020_Republican | 96.1755 | 21.7144 | 4.4291 | 0.0001 | 52.5388 | 139.8123 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 0.759 | Durbin-Watson: | 0.551 |
| Prob(Omnibus): | 0.684 | Jarque-Bera (JB): | 0.749 |
| Skew: | 0.272 | Prob(JB): | 0.688 |
| Kurtosis: | 2.761 | Condition No.: | 3.0 |


In [29]:
# View just the params instead of the whole summary
fit1.params

const                              200.928762
ElectoralCollege2020_Republican     96.175532
dtype: float64

In [30]:
# View just the pvalues instead of the whole summary
fit1.pvalues

const                              8.920548e-18
ElectoralCollege2020_Republican    5.303262e-05
dtype: float64

Once more, use ANOVA to confirm our results.

In [31]:
dfANOVA1 = pd.concat([X1, y1], axis=1)
modelANOVA1 = ols('AdjustedPerCapitaDeaths ~ C(ElectoralCollege2020_Republican)', data=dfANOVA1).fit()
dfANOVAresults1 = sm.stats.anova_lm(modelANOVA1, typ=2)
dfANOVAresults1

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(ElectoralCollege2020_Republican) | 117888.753279 | 1.0 | 19.616991 | 5.3e-05 |
| Residual | 294466.618827 | 49.0 | | |


Finally, report the results from both the unadjusted and adjusted models.

In [32]:
coefEColRep0 = round(fit0.params.ElectoralCollege2020_Republican, 2)
pEColRep0 = fit0.pvalues.ElectoralCollege2020_Republican
# The dependent variable is deaths per 100,000 residents, so the coefficient is in those units
print(f"The unadjusted model predicts that a state that voted Republican in the 2020 Electoral College has {abs(coefEColRep0)} {'more' if coefEColRep0 > 0 else 'fewer'} deaths per 100,000 residents than a state that did not vote Republican in the 2020 Electoral College")
print(f"The significance test (P-value) for the predictor variable 'ElectoralCollege2020_Republican' is {round(pEColRep0, 4)}, which is {'NOT ' if pEColRep0 > 0.05 else ''}statistically significant at the 0.05 level")

The unadjusted model predicts that a state that voted Republican in the 2020 Electoral College has 42.58 more deaths per 100,000 residents than a state that did not vote Republican in the 2020 Electoral College
The significance test (P-value) for the predictor variable 'ElectoralCollege2020_Republican' is 0.0291, which is statistically significant at the 0.05 level


In [33]:
coefEColRep1 = round(fit1.params.ElectoralCollege2020_Republican, 2)
pEColRep1 = fit1.pvalues.ElectoralCollege2020_Republican
# The dependent variable is density-adjusted deaths per 100,000 residents
print(f"The density-adjusted model predicts that a state that voted Republican in the 2020 Electoral College has {abs(coefEColRep1)} {'more' if coefEColRep1 > 0 else 'fewer'} density-adjusted deaths per 100,000 residents than a state that did not vote Republican in the 2020 Electoral College")
print(f"The significance test (P-value) for the predictor variable 'ElectoralCollege2020_Republican' is {round(pEColRep1, 4)}, which is {'NOT ' if pEColRep1 > 0.05 else ''}statistically significant at the 0.05 level")

The density-adjusted model predicts that a state that voted Republican in the 2020 Electoral College has 96.18 more density-adjusted deaths per 100,000 residents than a state that did not vote Republican in the 2020 Electoral College
The significance test (P-value) for the predictor variable 'ElectoralCollege2020_Republican' is 0.0001, which is statistically significant at the 0.05 level
