# Welcome to Python!

<img src="Image/logo.png" width="400" height="600">

## Visualizing COVID-19 Cases

In [None]:
import pandas as pd
df = pd.read_csv('data/covid_country.csv')
df = df.set_index('date')  # use the date column as the index

In [None]:
import bar_chart_race as bcr
bcr.bar_chart_race(df)
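
`bar_chart_race` works on wide-format data: one row per time period and one column per category, which is why the cell above moves `date` into the index. If the source data were in long format instead, it could be reshaped first. The sketch below assumes a hypothetical long table (the column names `date`, `country`, `cases` and all values are invented for illustration, not taken from `covid_country.csv`).

```python
import pandas as pd

# Hypothetical long-format data; column names and values are illustrative only.
long_df = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "country": ["A", "B", "A", "B"],
    "cases": [10, 5, 15, 9],
})

# Reshape to wide format: dates in the index, one column per country.
wide_df = long_df.pivot(index="date", columns="country", values="cases")
print(wide_df)
```

The resulting `wide_df` has the same layout as `df` above and could be passed to `bcr.bar_chart_race` in the same way.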

## COVID-19 Death Cases by Race

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2_contingency

data = pd.read_csv('Data/race.csv')
case = data['Per100,000']
data

In [None]:
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
label = ['White','Black','Hispanic','Asian','AIAN','NHPI','Other']
location = [1,2,3,4,5,6,7]  # one position per label
ax.bar(label, case, alpha=0.5)
ax.set_ylabel('Number of deaths per 100,000')
ax.set_xlabel('Race/Ethnicity')
#ax.set_ylim(0,78000)
ax.set_title('Number of COVID-19 deaths per 100,000 Americans', fontweight="bold", fontsize=15, pad=20)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

for a,b in zip(location, np.round(case, decimals=1)):
    plt.text(a-1.25, b+1, str(b), color='b', size=14)
    
plt.grid(axis='y')

plt.show()

freq = data['Relative'][0:6]
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
label = ['White','Black','Hispanic','Asian','AIAN','NHPI']


ax.barh(label, freq,color=('#c1ba9d','#733d47','#733d47','#733d47','#733d47','#733d47'))
ax.invert_yaxis()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.get_xaxis().set_visible(False)
ax.tick_params(axis='y', labelsize=15)  # enlarge the existing category labels
ax.set_title('Relative COVID-19 Death Rates', fontweight="bold", fontsize=20)

plt.text(0.8,0,'1.0',color='black',va="center",size=15)
plt.text(2,1,'2.2',color='white',va="center",size=15)
plt.text(1,2,'1.2',color='white',va="center",size=15)
plt.text(0.75,3,'0.9',color='white',va="center",size=15)
plt.text(1.25,4,'1.4',color='white',va="center",size=15)
plt.text(0.76,5,'0.9',color='white',va="center",size=15)

plt.show()

As the charts show, Black Americans are dying from COVID-19 at a higher rate than White Americans: about 2.2 times as high, according to the relative death rates.
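
The `Relative` column presumably expresses each group's death rate as a multiple of the White rate; that is an assumption, since the notebook does not show how the column was computed. Under that assumption, a minimal sketch of the calculation looks like this (the rates below are made up and are not the values in `Data/race.csv`).

```python
import pandas as pd

# Hypothetical death rates per 100,000; not taken from Data/race.csv.
rates = pd.Series({"White": 50.0, "Black": 110.0, "Asian": 45.0})

# Relative rate: each group's rate divided by the reference group's rate.
relative = rates / rates["White"]
print(relative.round(1))
```

A value above 1 means the group's rate is higher than the reference group's, and a value below 1 means it is lower.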

## Diabetes by Race

Since diabetes is associated with COVID-19 deaths, we will compare diabetes prevalence between African Americans and whites.

In [None]:
data = pd.read_csv('Data/NHANES.csv')

In [None]:
data[['Diabetes','Race1']]

The original dataset looks like this. We will summarize it as a table of diabetes counts for White and Black adults.

In [None]:
# Keep adults only, then split by race
adult = data[data['Age']>=18]
white = adult[adult['Race1'] == 'White']
black = adult[adult['Race1'] == 'Black']

# Count diabetes status within each group
white_counts = white['Diabetes'].value_counts()
black_counts = black['Diabetes'].value_counts()

diabetes = {'Diabetes': [white_counts['Yes'], black_counts['Yes']],
            'Not Diabetes': [white_counts['No'], black_counts['No']]}

# Rows are races; Percentage is the share of each group with diabetes
df = pd.DataFrame(diabetes, columns=['Diabetes','Not Diabetes'], index=['White', 'Black'])
df['Percentage'] = df['Diabetes']/(df['Diabetes']+df['Not Diabetes']) * 100

In [None]:
df
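
As an alternative to counting each group by hand, `pd.crosstab` can build the same kind of table in one call. The sketch below uses a small made-up frame with the same column names (`Race1`, `Diabetes`) as a stand-in for the NHANES data; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the NHANES columns used above; values are made up.
toy = pd.DataFrame({
    "Race1": ["White", "White", "White", "Black", "Black", "Black"],
    "Diabetes": ["Yes", "No", "No", "Yes", "Yes", "No"],
})

# Count rows for each Race1 x Diabetes combination.
counts = pd.crosstab(toy["Race1"], toy["Diabetes"])

# Share of each group with diabetes, in percent.
counts["Percentage"] = counts["Yes"] / (counts["Yes"] + counts["No"]) * 100
print(counts)
```

Note that `crosstab` sorts the row and column labels alphabetically, so the row order may differ from the hand-built table.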

In [None]:
fig = plt.figure()
ax = fig.add_axes([0,0,1,1])
label = ['White','Black']
location = [1,2]
ax.bar(label, df['Percentage'], alpha=0.5, width=0.5)
ax.set_ylabel('Percentage [%]', size=15)
ax.set_xlabel('Race/Ethnicity', size=15)
ax.set_title('Diabetes Percentage', fontweight="bold", fontsize=15, pad=20)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

for a,b in zip(location, np.round(df['Percentage'], decimals=1)):
    plt.text(a-1.1, b+0.2, str(b)+'%', color='b', size=20)
    
plt.grid(axis='y')

plt.show()

## Statistical Analysis

In [None]:
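# Contingency table of counts (rows: race, columns: diabetes status)
contig_table = pd.DataFrame(diabetes, columns = ['Diabetes','Not Diabetes'], index=['White', 'Black'])

# Chi-square test of independence
chi2, p, dof, expected = chi2_contingency(contig_table)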

In [None]:
print("chi2: ",chi2)
print("P-value: ", p)
print("Degrees of freedom: ", dof)
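
One detail worth knowing when reading the statistic: for a 2x2 table (one degree of freedom), `chi2_contingency` applies Yates' continuity correction by default, which makes the statistic slightly smaller. The sketch below compares the two settings on hypothetical counts (not the NHANES counts) using the function's `correction` parameter.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 counts (rows: groups, columns: outcome yes/no).
table = np.array([[30, 70],
                  [50, 50]])

# Default: Yates' continuity correction is applied when dof == 1.
chi2_corr, p_corr, dof, _ = chi2_contingency(table)

# Same test without the continuity correction.
chi2_raw, p_raw, _, _ = chi2_contingency(table, correction=False)

print(dof, chi2_corr, chi2_raw)
```

With large samples the two versions usually lead to the same conclusion, but the reported statistic depends on which one was used.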

In [None]:
df

In [None]:
def func(pi, n):
    # Variance of a sample proportion pi estimated from n observations
    return pi * (1 - pi) / n

white_pi = df['Percentage']['White']/100
black_pi = df['Percentage']['Black']/100

white_n = df['Diabetes']['White'] + df['Not Diabetes']['White']
black_n = df['Diabetes']['Black'] + df['Not Diabetes']['Black']

se = np.sqrt(func(white_pi, white_n) + func(black_pi, black_n))

low_ci_bound = (black_pi - white_pi) - 1.96*se
high_ci_bound = (black_pi - white_pi) + 1.96*se

mean_diff_ci = [low_ci_bound, high_ci_bound]
print("95% CI:", mean_diff_ci)
print("Proportion difference: ", black_pi - white_pi)
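
Written out, the cell above computes the usual large-sample (Wald) interval for a difference of two proportions. The subscripts $W$ and $B$ are notation introduced here for the White and Black groups; $\hat{p}$ corresponds to `white_pi` / `black_pi` and $n$ to `white_n` / `black_n`.

$$
SE = \sqrt{\frac{\hat{p}_W(1-\hat{p}_W)}{n_W} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}},
\qquad
95\%\ \text{CI} = (\hat{p}_B - \hat{p}_W) \pm 1.96 \cdot SE
$$

If the interval does not contain 0, the difference in proportions is significant at the 5% level.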

In [None]:
white_n + black_n

## Report Results

<img src="Image/diabetes_data.png" width=500>

Previous research suggests that diabetes raises the risk of death from COVID-19. The prevalence of diabetes was 8.82% among whites and 15.20% among African Americans. A chi-square test of independence indicated that this difference is significant ($\chi^2(1, N=5835)=33.128$, $p < 0.001$), and the difference in proportions was 0.064 (95% CI = [0.039, 0.089]).

This suggests that higher diabetes prevalence may be one reason African Americans are more likely than whites to die from COVID-19.

## Next Steps

- Investigate the reasons behind the difference in diabetes prevalence
- Consider other factors behind COVID-19 deaths

We will explore these questions in future labs.