### Table of Contents

* [Data Sources](#datasources)
* [Data Importing](#dataimporting)
* [Data Cleaning](#datacleaning)
* [Visualizations of the global malnutrition problem](#visualizations)
     1. [Histograms](#histogram)
     2. [Boxplots](#boxplot)
     3. [Heatmap](#heatmap)
     4. [US malnutrition graph](#US)
     5. [Top 5 countries with the LEAST malnutrition problem graphs](#top5)
     6. [Top 5 countries with the MOST malnutrition problem graphs](#bot5)
     7. [PCA on Global malnutrition data](#PCA)
     8. [Continent Analysis](#continent)
     9. [Mapping with Geopandas](#map)
     10. [Animation of Map over time](#animation)
* [COVID-19 Analysis](#covid)
     1. [Histogram of COVID-19 deaths (min-max normalized)](#histcovid)
     2. [Highest % of COVID-19 deaths in the world](#highestdeath)
     3. [Barplot comparing 2020 and 2021](#covidbar)
     4. [Linear regression of malnutrition % and COVID-19 death %](#covidlinear)
     5. [Creating a scoring system to compare countries' malnutrition with their COVID-19 death rate](#covidscore)
    
    

## I. Data Sources <a class="anchor" id="datasources"></a>

1. The malnutrition data is from UNICEF.org. It gives the proportion of children under 5 affected by stunting/severe stunting, wasting/severe wasting, and overweight at the country level from 1970 to 2021.\
(datalink: https://data.unicef.org/resources/data_explorer/unicef_f/?ag=UNICEF&df=GLOBAL_DATAFLOW&ver=1.0&dq=.NT_ANT_HAZ_NE2+NT_ANT_HAZ_NE3..&startPeriod=2016&endPeriod=2021).

2. COVID-19 death data from OurWorldInData.org.\
(datalink: https://ourworldindata.org/coronavirus-source-data)

3. Continent & Country Data from Kaggle.\
(datalink: https://www.kaggle.com/statchaitya/country-to-continent) 

We extract the variables relevant to our analysis from these sources.

## II. Data Importing<a class="anchor" id="dataimporting"></a>

```python
# import the packages used throughout the notebook

# basic packages
import os  # file paths
import pandas as pd  # dataframes
import numpy as np  # numerical calculation
import scipy as sp  # scientific calculation
from IPython.display import display  # built into Jupyter; imported so the code also runs as a script
# regression analysis
from sklearn import linear_model  # linear regression
import statsmodels.api as sm  # linear regression models with statistical summaries
# graphing
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # statistical plotting
from scipy.interpolate import interp1d  # interpolation
from seaborn import heatmap  # heatmap
# geopandas (currently disabled)
#import geopandas as gpd
#import descartes
# preprocessing
from sklearn.preprocessing import MinMaxScaler, normalize
# suppress warning messages in the notebook output
import warnings
warnings.filterwarnings('ignore')
```

```python
malnutrition_data = pd.read_excel(os.path.join(os.getcwd(), 'Malnutrition1.xlsx'))
display(malnutrition_data.head(3))
```

```python
malnutrition_data['World Bank Region'].unique()
```

```python
gini_data = pd.read_excel(os.path.join(os.getcwd(), 'gini_africa.xlsx'))
display(gini_data.head(3))
```

```python
gini_data['Subregion'].unique()
```

```python
gdp_data = pd.read_excel(os.path.join(os.getcwd(), 'GDP_africa.xlsx'))
display(gdp_data.head(3))
```

```python
income_per_person = pd.read_csv(os.path.join(os.getcwd(), 'income_per_person_gdppercapita_ppp_inflation_adjusted.csv'))
display(income_per_person.head(3))
```

## III. Data Cleaning<a class="anchor" id="datacleaning"></a>

#### Malnutrition Dataset

```python
# count missing values in each column of the malnutrition dataset
malnutrition_data.isnull().sum()
```
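Raw missing-value counts are easier to judge next to the share of rows they represent. A minimal sketch on a small made-up frame (the column names mirror the malnutrition columns, but every value is invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps; values are invented for illustration only
df = pd.DataFrame({
    "Wasting": [5.1, np.nan, 7.3, 2.0],
    "Overweight": [np.nan, np.nan, 4.4, 3.1],
})

# Count and percentage of missing values per column
missing = pd.DataFrame({
    "n_missing": df.isnull().sum(),
    "pct_missing": df.isnull().mean() * 100,
})
print(missing)
```

Applied to `malnutrition_data`, the same two expressions show which columns are sparse enough that dropping rows would discard a large share of the data.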

```python
from sklearn.impute import SimpleImputer

# drop rows missing any of the three core indicators
malnutrition_data = malnutrition_data.dropna(subset=['Wasting', 'Stunting', 'Underweight']).copy()

# fill the remaining gaps in the two sparser columns with each column's median
imputer = SimpleImputer(missing_values=np.nan, strategy='median')
impute_cols = ['Severe Wasting', 'Overweight']
malnutrition_data[impute_cols] = imputer.fit_transform(malnutrition_data[impute_cols])

print("Null values after updating dataset\n")
print(malnutrition_data.isnull().sum())
```
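Median imputation with `SimpleImputer` learns one median per column and writes it into that column's gaps. The sketch below, on invented values, checks that it matches the plain pandas `fillna` equivalent; the data are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data standing in for the malnutrition columns (values invented)
df = pd.DataFrame({
    "Severe Wasting": [1.0, np.nan, 3.0, 5.0],
    "Overweight": [2.0, 4.0, np.nan, 10.0],
})
cols = ["Severe Wasting", "Overweight"]

# SimpleImputer computes one median per column and fills NaNs with it
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
imputed = df.copy()
imputed[cols] = imputer.fit_transform(df[cols])

# Equivalent pandas version: fill each column with its own median
pandas_imputed = df.fillna(df[cols].median())

print(imputer.statistics_)  # per-column medians learned during fit
print(imputed)
```

The scikit-learn version is mainly useful when the same fitted medians (`imputer.statistics_`) must be reapplied to other data later; for a one-off cleaning step, `fillna` gives the same result.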

#### Gini Dataset

```python
# count missing values in each column of the Gini dataset
gini_data.isnull().sum()
```

```python
# drop rows missing the year or either Gini estimate
gini_data = gini_data.dropna(subset=['Years', 'WB Gini[4] %', 'CIA Gini[6] %'])
print("Null values after updating dataset\n")
print(gini_data.isnull().sum())
```
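Several column names here carry bracketed footnote markers, such as `WB Gini[4] %` and `Per Capita (US$)[7][8] ` (the latter with a trailing space), which look like leftovers from a copied table and are easy to mistype. A minimal sketch of normalizing them with a regular expression; the empty frame only reuses column names that appear in this notebook, and renaming the real columns would require updating every later reference to them.

```python
import re
import pandas as pd

# Empty frame reusing column names from the Gini/GDP tables in this notebook
df = pd.DataFrame(columns=["Country_gini", "WB Gini[4] %", "CIA Gini[6] %",
                           "Per Capita (US$)[7][8] "])

def clean_column(name: str) -> str:
    """Drop bracketed footnote markers like [4] and trim surrounding whitespace."""
    return re.sub(r"\[\d+\]", "", name).strip()

df = df.rename(columns=clean_column)
print(list(df.columns))
```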

#### Income per Person Dataset

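```python
# count missing values in each column of the income-per-person dataset
income_per_person.isnull().sum()
```

```python
# reshape from wide (one column per year) to long (one row per country-year)
income_per_person = income_per_person.melt(id_vars=['country'], var_name='year', value_name='Income_per_person')
```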
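`melt` turns every year column into rows, so each (country, year) pair becomes one record. Because the year labels come from column headers, they stay strings after melting. The sketch below uses an invented two-country table and assumes the real year headers are plain integers like `2019`; it converts them so years can be filtered numerically or joined with integer year columns.

```python
import pandas as pd

# Toy wide table shaped like the income-per-person CSV: one row per country,
# one column per year (numbers invented for illustration)
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [1000, 2000],
    "2020": [1100, 2100],
})

# Wide -> long: each (country, year) pair becomes its own row
long = wide.melt(id_vars=["country"], var_name="year", value_name="Income_per_person")

# Year labels came from column headers, so they are strings; convert to integers
long["year"] = long["year"].astype(int)
print(long)
```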

#### GDP Dataset

```python
# count missing values in each column of the GDP dataset
gdp_data.isnull().sum()
```

#### Concatenating Datasets

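```python
# combine the GDP and Gini datasets side by side
# NOTE: concat(axis=1) aligns rows by index label, not by country name, so this
# assumes both files list the same countries in the same row order
economic_data = pd.concat([gdp_data, gini_data], axis=1)
economic_data.head()
```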
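`pd.concat(..., axis=1)` places two frames side by side by matching index labels, so it only pairs the right rows when both files list countries in the same order. Joining on the country columns with `merge` avoids that assumption. A minimal sketch with invented values and placeholder country names; `Country_gdp` and `Country_gini` follow the column names used in this notebook, and everything else is illustrative.

```python
import pandas as pd

# Two toy tables listing the same countries in different row orders (values invented)
gdp = pd.DataFrame({"Country_gdp": ["A", "B", "C"], "gdp": [100, 70, 12]})
gini = pd.DataFrame({"Country_gini": ["C", "A", "B"], "gini": [43.3, 40.8, 43.5]})

# concat(axis=1) pairs rows by index label (0, 1, 2), ignoring country names
side_by_side = pd.concat([gdp, gini], axis=1)
mismatched = (side_by_side["Country_gdp"] != side_by_side["Country_gini"]).sum()

# merge pairs rows by matching country names instead
merged = gdp.merge(gini, left_on="Country_gdp", right_on="Country_gini", how="inner")
print(mismatched)  # number of rows whose two country columns disagree
print(merged)
```

An inner merge also drops countries that appear in only one file, which covers part of what the `dropna` step on the concatenated frame does.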

```python
economic_data.isnull().sum()
```

```python
# drop rows without a country, region, year, or Gini value
economic_data = economic_data.dropna(subset=['Country_gini', 'Subregion', 'Region', 'Years', 'WB Gini[4] %', 'CIA Gini[6] %'])
print("Null values after updating dataset\n")
print(economic_data.isnull().sum())
```

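```python
# combine the malnutrition and GDP datasets side by side
# NOTE: concat(axis=1) pairs rows by index label rather than by country; index labels
# present in only one frame (e.g. gaps left by dropna) produce rows filled with NaN
malnutrition_gdp_data = pd.concat([malnutrition_data, gdp_data], axis=1)
malnutrition_gdp_data.head()
```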
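The malnutrition table can hold several survey years per country, while the GDP table has one row per country, so this is a many-to-one relationship. A key-based merge expresses it directly. The sketch below is a hedged alternative to the index-based concat, not the method used in this notebook: `Country` is a hypothetical name for the malnutrition country column, and all values are invented.

```python
import pandas as pd

# Toy long-format malnutrition table: several survey years per country (values invented).
# "Country" is a hypothetical key name; the real column name may differ.
maln = pd.DataFrame({
    "Country": ["A", "A", "B", "C"],
    "Year": [2015, 2019, 2018, 2017],
    "Stunting": [30.1, 27.5, 18.2, 40.0],
})
# One GDP row per country
gdp = pd.DataFrame({"Country_gdp": ["A", "B"], "gdp": [50, 80]})

# validate="many_to_one" raises MergeError if a country appears twice in gdp;
# indicator=True adds a _merge column showing which rows found a GDP match
merged = maln.merge(gdp, left_on="Country", right_on="Country_gdp",
                    how="left", validate="many_to_one", indicator=True)
print(merged[["Country", "Year", "gdp", "_merge"]])
```

Rows marked `left_only` have no GDP match; dropping them (or using `how="inner"`) plays the same role as the `dropna(subset=['Country_gdp'])` step below.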

```python
malnutrition_gdp_data.isnull().sum()
```

```python
# keep only rows that received GDP data from the concatenation
malnutrition_gdp_data = malnutrition_gdp_data.dropna(subset=['Country_gdp'])

# alternative (not used): median-impute the GDP columns instead of dropping rows
#from sklearn.impute import SimpleImputer
#imputer = SimpleImputer(missing_values = np.nan, strategy = 'median')
#malnutrition_gdp_data[['Rank']] = imputer.fit_transform(malnutrition_gdp_data[['Rank']])
#malnutrition_gdp_data[['Nominal GDP (Billion US$)[7][8]']]= imputer.fit_transform(malnutrition_gdp_data[['Nominal GDP (Billion US$)[7][8]']])
#malnutrition_gdp_data[['Per Capita (US$)[7][8] ']]= imputer.fit_transform(malnutrition_gdp_data[['Per Capita (US$)[7][8] ']])
print("Null values after updating dataset\n")
print(malnutrition_gdp_data.isnull().sum())
```