## Covid Morbidity Factors Exploration Project
### Chris Weilacker, Kirk Kosinski, Patrick Cao, OJ Alcaraz
### 9 February 2021

#### Datasets are originally from the following:

##### COVID-19 Case Mortality Ratios by Country

https://www.kaggle.com/paultimothymooney/coronavirus-covid19-mortality-rate-by-country

##### Percentage of Population 65 and Over by Country (2019)

https://www.kaggle.com/krukmat/demographic-and-socioeconomic-unesco

##### Percentage of Obesity among Adults by Country (2016)

https://www.kaggle.com/amanarora/obesity-among-adults-by-country-19752016

### Problem Question: Can we predict the Case Mortality Ratio of a country using its percentage of population aged 65 and over and its percentage of adult obesity?

#### We chose population over age 65 and obesity because, in the US, over 80% of the deaths were in the population aged 65 and over, and the CDC has stated that 94% of deaths involved some underlying health condition. In its vaccine distribution plan, California lists several conditions, such as heart disease and diabetes, which are known to be exacerbated by obesity, as priorities for receiving a vaccine. Our idea is that we can predict the Covid-19 Mortality Ratio more accurately by using both population 65+ and obesity than by using population 65+ alone. If so, that would suggest that creating a healthier population may be the best way to prevent, in future pandemics, the devastation that the US is currently facing.

### On the following slide we run some data preprocessing to get all of our data into one DataFrame, df.
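
One step of this preprocessing is turning the obesity column from text into numbers. The sketch below walks through that step on a few made-up rows. The country names and values are hypothetical, and the assumed text format (a point estimate followed by a bracketed range, or `No data`) is inferred from how the preprocessing code splits on the first space and drops `'No'`, not confirmed against the raw file.

```python
import pandas as pd

# Hypothetical sample rows; the real obesity-data.csv values are only
# assumed to look like these.
sample = pd.DataFrame({
    'Country': ['Aland', 'Borduria', 'Cascadia'],
    'Both sexes': ['5.5 [3.4-8.1]', 'No data', '21.7 [17.4-26.1]'],
})

# Keep only the text before the first space: the point estimate,
# or 'No' when the cell reads 'No data'.
sample['Obesity'] = sample['Both sexes'].str.split(' ', n=1, expand=True)[0]

# Drop rows without data, then convert the remaining strings to numbers.
sample = sample[sample['Obesity'] != 'No'].copy()
sample['Obesity'] = pd.to_numeric(sample['Obesity'])

print(sample[['Country', 'Obesity']])
```

Filtering out the `'No'` rows before calling `pd.to_numeric` matters here, because the conversion would otherwise raise an error on the non-numeric string.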

In [4]:

import numpy as np
import pandas as pd
from matplotlib import rcParams
import seaborn as sns
from scipy.stats import zscore

# show up to 500 columns so wide DataFrames are not truncated in the output
pd.set_option('display.max_columns', 500)

# switch to seaborn default stylistic parameters
# see the very useful https://seaborn.pydata.org/tutorial/aesthetics.html
sns.set()
sns.set_context('talk')

# change default plot size
rcParams['figure.figsize'] = 10,8

# Covid Data
covidDF = pd.read_csv('https://raw.githubusercontent.com/chrisweilacker/CovidMachineLearningProject/master/global_covid19_mortality_rates.csv', index_col=0)
# 2016 Data for Both Sexes is in Column Both Sexes
obesityDF = pd.read_csv('https://raw.githubusercontent.com/chrisweilacker/CovidMachineLearningProject/master/obesity-data.csv', skiprows=3)
# Demographics Data which includes the percentage of population over 65 per country.
demoDF = pd.read_csv('https://raw.githubusercontent.com/chrisweilacker/CovidMachineLearningProject/master/DEMO_Global.csv')

# Create our DataFrame with the necessary data from all three datasets
df = covidDF.merge(obesityDF[['Country', 'Both sexes']], how='inner') #Copy in the whole Covid Data and the Obesity Data
# Modify name of Both Sexes to Obesity and get just the number
df.rename(columns={'Both sexes': 'Obesity'}, inplace=True)
obNumber = df['Obesity'].str.split(" ", n = 1, expand = True) 
df['Obesity'] = obNumber[0]

#Drop Rows with no Obesity Data
df.drop(df.loc[df['Obesity']=='No'].index, inplace=True)
#Drop Rows with no Covid Data
df.drop(df.loc[df['Mortality Ratio']==0].index, inplace=True)
df.rename(columns={'Mortality Ratio': 'Covid-19 Mortality Ratio'}, inplace=True)
# Convert Obesity data to float
df['Obesity'] = pd.to_numeric(df['Obesity'], downcast="float")

# Copy in the population aged 65 and over for 2017 (the indicator name ends with a space, as in the original filter)
is65Over = (demoDF['Time']==2017) & (demoDF['Indicator']=='Population aged 65 years or older ')
df = df.merge(demoDF.loc[is65Over, ['Country', 'Value']], how='inner')
df.rename(columns={'Value': 'Pop65Over'}, inplace=True)
# Copy in the total population for 2017
isTotPop = (demoDF['Time']==2017) & (demoDF['Indicator']=='Total population ')
df = df.merge(demoDF.loc[isTotPop, ['Country', 'Value']], how='inner')
df.rename(columns={'Value': 'TotPop'}, inplace=True)

# Create Perc 65 and Over
df['Perc65Over'] = (100 * df['Pop65Over']/df['TotPop'])

corr = df[['Covid-19 Mortality Ratio', 'Perc65Over', 'Obesity']].corr()
# Both Perc65Over and Obesity are positively correlated with Covid-19 Mortality Ratio, but only weakly (see the table below).
# Perc65Over is more strongly correlated than Obesity, but the data has not been normalized yet.
corr.style.background_gradient(cmap='coolwarm')


| | Covid-19 Mortality Ratio | Perc65Over | Obesity |
|---|---|---|---|
| Covid-19 Mortality Ratio | 1.0 | 0.12778 | 0.037187 |
| Perc65Over | 0.12778 | 1.0 | 0.382828 |
| Obesity | 0.037187 | 0.382828 | 1.0 |
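
The code comments note that the data has not been normalized yet. For the correlation table this should not matter: Pearson correlation is unchanged when a column is shifted or rescaled by a positive factor, so z-scoring each column leaves the table as it is. The sketch below checks this on synthetic data. The numbers are made up and are not the project's dataset, and the z-score is computed by hand rather than with `scipy.stats.zscore` to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; these values are invented for illustration only.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Perc65Over': rng.uniform(2, 25, size=50),
    'Obesity': rng.uniform(5, 40, size=50),
})
toy['Mortality'] = 0.05 * toy['Perc65Over'] + rng.normal(0, 1, size=50)

# Z-score each column: subtract its mean, divide by its standard deviation.
toy_z = (toy - toy.mean()) / toy.std(ddof=0)

raw_corr = toy.corr()
z_corr = toy_z.corr()

# Compare the two correlation matrices element by element.
print(np.allclose(raw_corr, z_corr))
```

Normalization can still matter for later modeling steps that are sensitive to feature scale, even though it does not change these correlations.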
