### Table of Contents

* [Data Sources](#datasources)
* [Data Importing](#dataimporting)
* [Data Cleaning](#datacleaning)
* [Visualizations for global malnutrition problem](#visualizations)
     1. [Histograms](#histogram)
     2. [Boxplots](#boxplot)
     3. [Heatmap](#heatmap)
     4. [US malnutrition graph](#US)
     5. [Top 5 countries with the LEAST malnutrition problem graphs](#top5)
     6. [Top 5 countries with the MOST malnutrition problem graphs](#bot5)
     7. [PCA on Global malnutrition data](#PCA)
     8. [Continent Analysis](#continent)
     9. [Mapping with Geopandas](#map)
     10. [Animation of Map over time](#animation)
* [COVID-19 Analysis](#covid)
     1. [Histogram of COVID-19 death (min-max normalized)](#histcovid)
     2. [Highest % of COVID-19 deaths in the world](#highestdeath)
     3. [Barplot comparing 2020 and 2021](#covidbar)
     4. [Linear regression of malnutrition % and COVID-19 death %](#covidlinear)
     5. [Creating a scoring system for the countries to compare malnutrition to COVID-19 death rate](#covidscore)
    
    

## I. Data Sources <a class="anchor" id="datasources"></a>

1. The malnutrition data is from UNICEF.org. It reports, at country level from 1970 to 2021, the share of children under 5 who are stunted or severely stunted, wasted or severely wasted, or overweight.\
(datalink: https://data.unicef.org/resources/data_explorer/unicef_f/?ag=UNICEF&df=GLOBAL_DATAFLOW&ver=1.0&dq=.NT_ANT_HAZ_NE2+NT_ANT_HAZ_NE3..&startPeriod=2016&endPeriod=2021).

2. COVID deaths data from OurWorldInData.org.\
(datalink: https://ourworldindata.org/coronavirus-source-data)

3. Continent & Country Data from Kaggle.\
(datalink: https://www.kaggle.com/statchaitya/country-to-continent) 

We extract the data for our analysis from the sources listed above.

## II. Data Importing<a class="anchor" id="dataimporting"></a>

In [1]:
# import necessary packages

# basic packages
import pandas as pd  #dataframe
import numpy as np  #calculation
import scipy as sp #calculation
# regression analysis
from sklearn import linear_model #linear regression
import statsmodels.api as sm #linear regression model
# graphing
import matplotlib.pyplot as plt #plot
import seaborn as sns #plot
from scipy.interpolate import interp1d #interpolation
from seaborn import heatmap #heatmap
# geopandas
import os
#import geopandas as gpd
#import descartes
# statistics
from sklearn.preprocessing import MinMaxScaler, normalize
# suppress warning messages in the notebook output
import warnings
warnings.filterwarnings('ignore')

In [2]:
malnutrition_data = pd.read_excel(os.getcwd()+'/Malnutrition1.xlsx')
display(malnutrition_data[:3])

Unnamed: 0,Country,Year*,Region,World Bank Income Classification,World Bank Region,Severe Wasting,Wasting,Overweight,Stunting,Underweight,U5 Population ('000s)
0,ANGOLA,1996,Africa,Lower Middle Income,Sub-Saharan Africa,1.8,7.7,1.7,61.1,36.2,2749.75
1,ANGOLA,2007,Africa,Lower Middle Income,Sub-Saharan Africa,4.3,8.2,,29.2,15.6,3998.053955
2,ANGOLA,2015,Africa,Lower Middle Income,Sub-Saharan Africa,1.1,4.9,3.4,37.6,19.0,5192.35791


In [3]:
malnutrition_data['World Bank Region'].unique() 

array(['Sub-Saharan Africa', 'Middle East & North Africa'], dtype=object)

In [4]:
gini_data = pd.read_excel(os.getcwd()+'/gini_africa.xlsx')
display(gini_data[:3])

Unnamed: 0,Country_gini,Subregion,Region,Years,WB Gini[4] %,CIA Gini[6] %
0,Algeria,Northern Africa,Africa,2011.0,27.6,27.6
1,Angola,Middle Africa,Africa,2018.0,51.3,51.3
2,Benin,Western Africa,Africa,2018.0,37.8,47.8


In [5]:
gini_data['Subregion'].unique() 

array(['Northern Africa', 'Middle Africa', 'Western Africa',
       'Southern Africa', 'Eastern Africa'], dtype=object)

In [6]:
gdp_data = pd.read_excel(os.getcwd()+'/GDP_africa.xlsx')
display(gdp_data[:3])

Unnamed: 0,Rank,Country_gdp,Nominal GDP (Billion US$)[7][8],Per Capita (US$)[7][8]
0,1,Nigeria,510.588,2355.688
1,2,Egypt,435.621,4162.081
2,3,South Africa,426.166,6979.44


In [7]:
income_per_person = pd.read_csv(os.getcwd()+'/income_per_person_gdppercapita_ppp_inflation_adjusted.csv')
display(income_per_person[:3])

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Algeria,715,716,717,718,719,720,721,722,723,...,14300,14600,14900,15200,15500,15800,16100,16500,16800,17100
1,Angola,618,620,623,626,628,631,634,637,640,...,6110,6230,6350,6480,6610,6750,6880,7020,7170,7310
2,Benin,597,597,597,597,597,597,597,597,597,...,3310,3380,3450,3520,3590,3660,3740,3810,3890,3970


## III. Data Cleaning<a class="anchor" id="datacleaning"></a>

#### Malnutrition Dataset

In [8]:
#Cleaning Malnutrition dataset
malnutrition_data.isnull().sum()

Country                              0
Year*                                0
Region                               0
World Bank Income Classification     0
World Bank Region                    0
Severe Wasting                      47
Wasting                              7
Overweight                          45
Stunting                             1
Underweight                          6
U5 Population ('000s)                0
dtype: int64

In [9]:
from sklearn.impute import SimpleImputer
# drop rows that are missing any of the core indicators
malnutrition_data = malnutrition_data.dropna(subset=['Wasting', 'Stunting', 'Underweight'])
# fill the remaining gaps in the sparser columns with each column's median
imputer = SimpleImputer(missing_values=np.nan, strategy='median')
malnutrition_data[['Severe Wasting']] = imputer.fit_transform(malnutrition_data[['Severe Wasting']])
malnutrition_data[['Overweight']] = imputer.fit_transform(malnutrition_data[['Overweight']])
print("Null values after updating dataset")
print("\n")
print(malnutrition_data.isnull().sum())

Null values after updating dataset


Country                             0
Year*                               0
Region                              0
World Bank Income Classification    0
World Bank Region                   0
Severe Wasting                      0
Wasting                             0
Overweight                          0
Stunting                            0
Underweight                         0
U5 Population ('000s)               0
dtype: int64
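
The two-step cleaning above (drop rows missing a core indicator, then fill the sparser columns with their medians) can be sketched on a small frame. This is a minimal illustration, not the notebook's own code: the frame `toy` and its values are hypothetical, and it uses pandas `fillna` with the column median, which should give the same result as `SimpleImputer(strategy='median')` for a single numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not taken from the UNICEF file.
toy = pd.DataFrame({
    "Wasting": [7.7, 8.2, np.nan, 6.7],
    "Severe Wasting": [1.8, np.nan, 1.1, 1.3],
    "Overweight": [1.7, np.nan, 3.4, 1.3],
})

# Step 1: drop rows that lack a core indicator.
toy = toy.dropna(subset=["Wasting"]).copy()

# Step 2: fill the remaining gaps with each column's median,
# computed on the rows that survived step 1.
fill_cols = ["Severe Wasting", "Overweight"]
toy[fill_cols] = toy[fill_cols].fillna(toy[fill_cols].median())

print(toy)
```

Because the medians are computed after the rows are dropped, the order of the two steps affects the imputed values.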


#### Gini Dataset

In [10]:
#cleaning Gini Dataset
gini_data.isnull().sum()

Country_gini     0
Subregion        0
Region           0
Years            3
WB Gini[4] %     3
CIA Gini[6] %    3
dtype: int64

In [11]:
gini_data = gini_data.dropna(subset=['Years', 'WB Gini[4] %', 'CIA Gini[6] %'])
print("Null values after updating dataset")
print("\n")
print(gini_data.isnull().sum())

Null values after updating dataset


Country_gini     0
Subregion        0
Region           0
Years            0
WB Gini[4] %     0
CIA Gini[6] %    0
dtype: int64


#### Income per person

In [12]:
#cleaning income per person
income_per_person.isnull().sum()

country    0
1800       0
1801       0
1802       0
1803       0
          ..
2036       0
2037       0
2038       0
2039       0
2040       0
Length: 242, dtype: int64

In [13]:
income_per_person = income_per_person.melt(id_vars=['country'], var_name='year', value_name='Income_per_person')
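
As a rough illustration of what `melt` does here, the sketch below reshapes a two-year slice of the wide table into long form. The values are copied from the preview above; the names `wide` and `long_df` are hypothetical. Since `read_csv` yields string column labels, the melted `year` column holds strings, so the sketch also casts it to integers, which would likely be needed before comparing it with an integer year column.

```python
import pandas as pd

# Two countries and two years from the preview above (hypothetical frame name).
wide = pd.DataFrame({
    "country": ["Algeria", "Angola"],
    "1800": [715, 618],
    "1801": [716, 620],
})

# One row per (country, year) pair instead of one column per year.
long_df = wide.melt(id_vars=["country"], var_name="year",
                    value_name="Income_per_person")

# The former column labels are strings; cast them to integers.
long_df["year"] = long_df["year"].astype(int)

print(long_df)
```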

In [14]:
#checking the GDP dataset for null values
gdp_data.isnull().sum()

Rank                               0
Country_gdp                        0
Nominal GDP (Billion US$)[7][8]    0
Per Capita (US$)[7][8]             0
dtype: int64

#### Concatenating Datasets

In [15]:
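#gdp and gini datasets, placed side by side
#note: pd.concat with axis=1 aligns rows by index position, not by country name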
economic_data = pd.concat([gdp_data,gini_data],axis=1)
economic_data.head()

Unnamed: 0,Rank,Country_gdp,Nominal GDP (Billion US$)[7][8],Per Capita (US$)[7][8],Country_gini,Subregion,Region,Years,WB Gini[4] %,CIA Gini[6] %
0,1,Nigeria,510.588,2355.688,Algeria,Northern Africa,Africa,2011.0,27.6,27.6
1,2,Egypt,435.621,4162.081,Angola,Middle Africa,Africa,2018.0,51.3,51.3
2,3,South Africa,426.166,6979.44,Benin,Western Africa,Africa,2018.0,37.8,47.8
3,4,Algeria,193.601,4294.418,Botswana,Southern Africa,Africa,2015.0,53.3,53.3
4,5,Morocco,133.062,3628.641,Burkina Faso,Western Africa,Africa,2018.0,47.3,35.3
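
In the output above, each GDP row sits next to the Gini row with the same index (Nigeria beside Algeria, for example), because `pd.concat(..., axis=1)` aligns on the index rather than on the country. If the goal is one row per country, a key-based join is one alternative. The sketch below is an assumption about that goal, not the notebook's method; the frames are toy subsets of the previews above with shortened, hypothetical column names.

```python
import pandas as pd

# Toy subsets of the previews above; column names shortened for the example.
gdp = pd.DataFrame({
    "Country_gdp": ["Nigeria", "Egypt", "Algeria"],
    "Per Capita (US$)": [2355.688, 4162.081, 4294.418],
})
gini = pd.DataFrame({
    "Country_gini": ["Algeria", "Angola", "Benin"],
    "WB Gini %": [27.6, 51.3, 37.8],
})

# Join on the country name rather than on row position.
merged = gdp.merge(gini, left_on="Country_gdp", right_on="Country_gini",
                   how="left", indicator=True)

# Rows without a Gini match are flagged "left_only" by the indicator column.
unmatched = merged.loc[merged["_merge"] == "left_only", "Country_gdp"].tolist()

print(merged)
print("No Gini match:", unmatched)
```

Listing the unmatched names is a quick way to spot spelling differences between the two sources before dropping any rows.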


In [16]:
economic_data.isnull().sum()

Rank                               0
Country_gdp                        0
Nominal GDP (Billion US$)[7][8]    0
Per Capita (US$)[7][8]             0
Country_gini                       5
Subregion                          5
Region                             5
Years                              5
WB Gini[4] %                       5
CIA Gini[6] %                      5
dtype: int64

In [17]:
economic_data= economic_data.dropna(subset = ['Country_gini','Subregion','Region','Years','WB Gini[4] %','CIA Gini[6] %' ])
print("Null values after updating dataset")
print("\n")
print(economic_data.isnull().sum())

Null values after updating dataset


Rank                               0
Country_gdp                        0
Nominal GDP (Billion US$)[7][8]    0
Per Capita (US$)[7][8]             0
Country_gini                       0
Subregion                          0
Region                             0
Years                              0
WB Gini[4] %                       0
CIA Gini[6] %                      0
dtype: int64


In [18]:
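#malnutrition and gdp datasets, placed side by side
#note: rows are aligned by index position, so the countries in the two halves of a row differ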
malnutrition_gdp_data = pd.concat([malnutrition_data,gdp_data],axis=1)
malnutrition_gdp_data.head()

Unnamed: 0,Country,Year*,Region,World Bank Income Classification,World Bank Region,Severe Wasting,Wasting,Overweight,Stunting,Underweight,U5 Population ('000s),Rank,Country_gdp,Nominal GDP (Billion US$)[7][8],Per Capita (US$)[7][8]
0,ANGOLA,1996,Africa,Lower Middle Income,Sub-Saharan Africa,1.8,7.7,1.7,61.1,36.2,2749.75,1,Nigeria,510.588,2355.688
1,ANGOLA,2007,Africa,Lower Middle Income,Sub-Saharan Africa,4.3,8.2,3.6,29.2,15.6,3998.053955,2,Egypt,435.621,4162.081
2,ANGOLA,2015,Africa,Lower Middle Income,Sub-Saharan Africa,1.1,4.9,3.4,37.6,19.0,5192.35791,3,South Africa,426.166,6979.44
3,BURUNDI,1987,Africa,Low Income,Sub-Saharan Africa,1.3,6.7,1.3,56.2,33.6,1021.437012,4,Algeria,193.601,4294.418
4,BURUNDI,2000,Africa,Low Income,Sub-Saharan Africa,1.6,8.1,1.5,64.0,39.1,1155.817017,5,Morocco,133.062,3628.641


In [19]:
malnutrition_gdp_data.isnull().sum()

Country                               0
Year*                                 0
Region                                0
World Bank Income Classification      0
World Bank Region                     0
Severe Wasting                        0
Wasting                               0
Overweight                            0
Stunting                              0
Underweight                           0
U5 Population ('000s)                 0
Rank                                324
Country_gdp                         324
Nominal GDP (Billion US$)[7][8]     324
Per Capita (US$)[7][8]              324
dtype: int64
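
The 324 nulls appear to arise because the positional concat can only fill as many rows as `gdp_data` has; every other malnutrition row is left without GDP values. The two tables also spell countries differently (`ANGOLA` versus `Nigeria`). A many-to-one join on a normalised key is one way to attach the same GDP figures to every survey year of a country. The sketch below is only an illustration under that assumption: `country_key` is a hypothetical helper column, and the GDP figure for Angola is a placeholder, not real data.

```python
import pandas as pd

# Malnutrition rows from the preview above (several survey years per country).
mal = pd.DataFrame({
    "Country": ["ANGOLA", "ANGOLA", "BURUNDI"],
    "Year*": [1996, 2007, 1987],
    "Stunting": [61.1, 29.2, 56.2],
})
# One GDP row per country; the Angola figure is a placeholder value.
gdp = pd.DataFrame({
    "Country_gdp": ["Nigeria", "Angola"],
    "Nominal GDP": [510.588, 100.0],
})

# Normalise spelling so "ANGOLA" and "Angola" share a key.
mal["country_key"] = mal["Country"].str.strip().str.lower()
gdp["country_key"] = gdp["Country_gdp"].str.strip().str.lower()

# validate="m:1" raises an error if a country appears twice on the GDP side.
merged = mal.merge(gdp[["country_key", "Nominal GDP"]],
                   on="country_key", how="left", validate="m:1")

print(merged)
```

With a left join, countries that have no GDP entry (Burundi here) keep their malnutrition rows and show `NaN` for GDP, so they can be reviewed before any rows are dropped.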

In [20]:
# keep only the rows that received GDP values from the positional concat
malnutrition_gdp_data = malnutrition_gdp_data.dropna(subset=['Country_gdp'])

#imputer = SimpleImputer(missing_values = np.nan, strategy = 'median')
#malnutrition_gdp_data[['Rank']] = imputer.fit_transform(malnutrition_gdp_data[['Rank']])
#malnutrition_gdp_data[['Nominal GDP (Billion US$)[7][8]']]= imputer.fit_transform(malnutrition_gdp_data[['Nominal GDP (Billion US$)[7][8]']])
#malnutrition_gdp_data[['Per Capita (US$)[7][8] ']]= imputer.fit_transform(malnutrition_gdp_data[['Per Capita (US$)[7][8] ']])
print("Null values after updating dataset")
print("\n")
print(malnutrition_gdp_data.isnull().sum())

Null values after updating dataset


Country                             0
Year*                               0
Region                              0
World Bank Income Classification    0
World Bank Region                   0
Severe Wasting                      0
Wasting                             0
Overweight                          0
Stunting                            0
Underweight                         0
U5 Population ('000s)               0
Rank                                0
Country_gdp                         0
Nominal GDP (Billion US$)[7][8]     0
Per Capita (US$)[7][8]              0
dtype: int64
