# Statistical Tests for Statewide Vaccine Progress Section of Report

<ul>
<li>Independent two-sample t-tests (equal variances assumed) were performed at a significance level of 0.05.</li>
<li>Null hypothesis: mean sentiment is the same in the month before and the month after the event.</li>
</ul>
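
For reference, this is the statistic that `stats.ttest_ind` computes when `equal_var=True` (the pooled-variance Student t-test). The symbols are notation introduced here, not names from the notebook: $x$ is the "before" sample, $y$ is the "after" sample, $n_x$ and $n_y$ are the sample sizes, and $s_x^2$ and $s_y^2$ are the sample variances.

$$
t = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}},
\qquad
s_p^2 = \frac{(n_x - 1)\, s_x^2 + (n_y - 1)\, s_y^2}{n_x + n_y - 2}
$$

The p-value is two-sided and uses $n_x + n_y - 2$ degrees of freedom. Because the "before" sample is passed first, a negative $t$ means mean sentiment was higher after the event.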

## COVID-19 vaccine availability by age bracket

### 1. EUA of Pfizer vaccine issued for adolescents, 12-15 year olds

p = 0.658 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats

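# Load the mean sentiment data (columns used below: 'Before/After', 'Event', 'Mean Sentiment')
new_df = pd.read_csv('meansentimentbystate_event.csv')

# Select the rows for this event, split into the month before and the month after
EUA_12_15yo_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'EUA for Pfizer vaccine for 12-15yo')]
EUA_12_15yo_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'EUA for Pfizer vaccine for 12-15yo')]

# x = sentiment before the event, y = sentiment after the event
x = np.array(EUA_12_15yo_b['Mean Sentiment'])
y = np.array(EUA_12_15yo_a['Mean Sentiment'])

# Print both variances and their ratio to check the equal-variance assumption
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
# Two-sample t-test with pooled variance (equal_var=True)
stats.ttest_ind(a=x, b=y, equal_var=True)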

0.05281053808337061 0.060415665284980664
0.8741199461143617


Ttest_indResult(statistic=0.44437481854548555, pvalue=0.6577513949779046)
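
The cell above prints both variances and their ratio before running the pooled test. A minimal sketch of how that check could be made explicit follows. The helper name `compare_means` and the cutoff `max_ratio=4.0` are assumptions for illustration (a common rule of thumb, not a value stated in this notebook), and the data are synthetic. When the ratio exceeds the cutoff, the sketch falls back to Welch's test (`equal_var=False`).

```python
import numpy as np
from scipy import stats

def compare_means(x, y, max_ratio=4.0):
    """Two-sample t-test that switches to Welch's test when variances differ a lot.

    max_ratio=4.0 is an assumed rule-of-thumb cutoff, not taken from the notebook.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # ddof=1 gives the sample variance; np.var defaults to ddof=0.
    var_x, var_y = np.var(x, ddof=1), np.var(y, ddof=1)
    # Put the larger variance on top so the ratio is always >= 1.
    ratio = max(var_x, var_y) / min(var_x, var_y)
    equal_var = ratio < max_ratio
    result = stats.ttest_ind(a=x, b=y, equal_var=equal_var)
    return ratio, equal_var, result.statistic, result.pvalue

# Synthetic data for illustration only (not from meansentimentbystate_event.csv).
rng = np.random.default_rng(0)
before = rng.normal(loc=0.10, scale=0.20, size=50)
after = rng.normal(loc=0.12, scale=0.25, size=50)
ratio, equal_var, t_stat, p_value = compare_means(before, after)
print(f"ratio={ratio:.2f} equal_var={equal_var} t={t_stat:.3f} p={p_value:.3f}")
```

Taking the larger variance over the smaller means the ratios printed in this notebook that fall below 1 (for example 0.316) would be read as their reciprocals.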

### 2. FDA approval of Comirnaty for 16+

p = 0.092 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [2]:
comirnaty_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Comirnaty for 16yo+ FDA approval')]
comirnaty_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Comirnaty for 16yo+ FDA approval')]
x = np.array(comirnaty_b['Mean Sentiment'])
y = np.array(comirnaty_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.04281375458591164 0.13562580169303806
0.3156755871778147


Ttest_indResult(statistic=1.7005196047209021, pvalue=0.09217264569675973)

### 3. Pfizer booster for 65+ and for 18-64 at high risk

p = 0.432 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [3]:
pfizerbooster_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Pfizer Booster for 65yo+ & 18-64 at high risk')]
pfizerbooster_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Pfizer Booster for 65yo+ & 18-64 at high risk')]
x = np.array(pfizerbooster_b['Mean Sentiment'])
y = np.array(pfizerbooster_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.13883615644759187 0.07460995893894962
1.8608260669489982


Ttest_indResult(statistic=0.7894574475001506, pvalue=0.43171256847029393)

### 4. EUA of Pfizer vaccine issued for children, 5-11 years old

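p = 0.0006 < 0.05, so we reject the null hypothesis: mean sentiment differs significantly between the month before and the month after the event. The t statistic is negative (-3.56) with the "before" sample passed first, so mean sentiment was higher in the month after.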

In [4]:
eua5_11_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'EUA for Pfizer vaccine for 5-11yo')]
eua5_11_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'EUA for Pfizer vaccine for 5-11yo')]
x = np.array(eua5_11_b['Mean Sentiment'])
y = np.array(eua5_11_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.06711237216949087 0.0588032112299724
1.141304543845783


Ttest_indResult(statistic=-3.555663968604342, pvalue=0.0005777530999668665)
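
Every cell in this notebook repeats the same filter-and-test pattern with a different event label. The sketch below shows one way that pattern could be wrapped in a function. It reuses the notebook's column names (`'Event'`, `'Before/After'`, `'Mean Sentiment'`), but the function name `before_after_ttest`, the toy DataFrame, and its values are made up for illustration and are not part of the original analysis.

```python
import numpy as np
import pandas as pd
from scipy import stats

def before_after_ttest(df, event):
    """Run the equal-variance two-sample t-test for one event label."""
    rows = df[df['Event'] == event]
    x = rows.loc[rows['Before/After'] == 'One month before', 'Mean Sentiment'].to_numpy()
    y = rows.loc[rows['Before/After'] == 'One month after', 'Mean Sentiment'].to_numpy()
    result = stats.ttest_ind(a=x, b=y, equal_var=True)
    return {
        'event': event,
        'var_ratio': np.var(x) / np.var(y),  # same ratio the notebook prints
        'statistic': result.statistic,
        'pvalue': result.pvalue,
    }

# Toy frame with the notebook's column names; the values are invented.
toy_df = pd.DataFrame({
    'Event': ['Event A'] * 6 + ['Event B'] * 6,
    'Before/After': (['One month before'] * 3 + ['One month after'] * 3) * 2,
    'Mean Sentiment': [0.1, 0.2, 0.3, 0.2, 0.3, 0.4,
                       0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
})

# One summary row per event instead of one notebook cell per event.
summary = pd.DataFrame(
    [before_after_ttest(toy_df, event) for event in toy_df['Event'].unique()]
)
print(summary)
```

With the real data, the same call could be made as `before_after_ttest(new_df, 'EUA for Pfizer vaccine for 5-11yo')`, assuming `new_df` is loaded as in the first cell.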

## COVID Variants of Concern

### 1. Initial cases of the Gamma Variant

p = 0.417 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [5]:
gamma_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Gamma Variant')]
gamma_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Gamma Variant')]
x = np.array(gamma_b['Mean Sentiment'])
y = np.array(gamma_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.03876289484245502 0.03617216836680471
1.0716220949039879


Ttest_indResult(statistic=-0.8153377335058878, pvalue=0.4168362474917261)

### 2. Initial cases of the Beta Variant

p = 0.579 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [6]:
beta_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Beta Variant')]
beta_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Beta Variant')]
x = np.array(beta_b['Mean Sentiment'])
y = np.array(beta_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.026798734394866575 0.03810960605015825
0.7032015591973114


Ttest_indResult(statistic=-0.5568882125680231, pvalue=0.5788605925080398)

### 3. Initial cases of the Delta Variant

p = 0.070 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [7]:
delta_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Delta Variant')]
delta_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Delta Variant')]
x = np.array(delta_b['Mean Sentiment'])
y = np.array(delta_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.12345591474166683 0.040467584628188585
3.0507359378120853


Ttest_indResult(statistic=-1.8308195176513484, pvalue=0.07010541156628106)

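### 4. Initial cases of the Omicron Variant

p = 0.009 < 0.05, so we reject the null hypothesis: mean sentiment differs significantly between the month before and the month after the event. The t statistic is positive (2.66) with the "before" sample passed first, so mean sentiment was lower in the month after.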

In [8]:
omicron_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Omicron Variant')]
omicron_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Omicron Variant')]
x = np.array(omicron_b['Mean Sentiment'])
y = np.array(omicron_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.056326545475833226 0.08087432657472256
0.6964700401405037


Ttest_indResult(statistic=2.6556082515385224, pvalue=0.009241211749888107)

## Pause in Vaccine Use

### 1. Pause in Use of the Johnson & Johnson Vaccine

p = 0.106 > 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference in mean sentiment between the month before and the month after the event.

In [9]:
jjpause_b = new_df.loc[(new_df['Before/After'] == 'One month before') & (new_df['Event'] == 'Pause in J&J')]
jjpause_a = new_df.loc[(new_df['Before/After'] == 'One month after') & (new_df['Event'] == 'Pause in J&J')]
x = np.array(jjpause_b['Mean Sentiment'])
y = np.array(jjpause_a['Mean Sentiment'])
print(np.var(x), np.var(y))
print(np.var(x)/np.var(y))
stats.ttest_ind(a=x, b=y, equal_var=True)

0.034034987398303615 0.06611107364147988
0.5148152272170806


Ttest_indResult(statistic=1.6290253643479, pvalue=0.10648574994126875)
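
Nine tests were run, each at the 0.05 level, so it may be worth checking how the conclusions hold up under a multiple-comparison adjustment. The sketch below applies a Bonferroni correction to the p-values printed above. The original analysis does not include this step, so it is an optional sensitivity check and not part of the reported method.

```python
# p-values copied from the nine Ttest_indResult outputs in this notebook.
p_values = {
    'EUA for Pfizer vaccine for 12-15yo': 0.6577513949779046,
    'Comirnaty for 16yo+ FDA approval': 0.09217264569675973,
    'Pfizer Booster for 65yo+ & 18-64 at high risk': 0.43171256847029393,
    'EUA for Pfizer vaccine for 5-11yo': 0.0005777530999668665,
    'Gamma Variant': 0.4168362474917261,
    'Beta Variant': 0.5788605925080398,
    'Delta Variant': 0.07010541156628106,
    'Omicron Variant': 0.009241211749888107,
    'Pause in J&J': 0.10648574994126875,
}

alpha = 0.05
# Bonferroni: divide the significance level by the number of tests.
bonferroni_alpha = alpha / len(p_values)

unadjusted = [event for event, p in p_values.items() if p < alpha]
significant = [event for event, p in p_values.items() if p < bonferroni_alpha]

print(f"Bonferroni threshold: {bonferroni_alpha:.5f}")
print("Significant at 0.05 (unadjusted):", unadjusted)
print("Significant after Bonferroni:", significant)
```

Under this adjustment the threshold is about 0.0056. The 5-11 EUA result (p = 0.0006) stays below it, while the Omicron result (p = 0.009) does not.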