# **INTRODUCTION**

## **The Silent Crisis: The Urgent Need to Protect Our Youngest in Africa**
![Child Mortality](images/a_case_study_on_tackling_infants_and.jpeg)


Child and infant mortality remains a significant global health challenge, particularly in Africa. Despite progress in recent years, preventable deaths continue to occur, hindering progress toward the Sustainable Development Goals.

This project aims to leverage data-driven approaches to identify key factors contributing to child and infant mortality in African countries. By analyzing relevant datasets, we will uncover patterns, correlations, and insights that can inform evidence-based interventions and policies. 

Our goal is to contribute to the development of effective strategies to reduce child and infant mortality rates, ultimately improving the health and well-being of children across the continent for generations to come.


#### Dataset Descriptions

 1. **Health Protection Coverage**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Share of population covered by health insurance | Percentage of the population covered by health insurance. |

---

2. **Global Vaccination Coverage**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| BCG (% of one-year-olds immunized) | Percentage of one-year-olds immunized with BCG vaccine.          |
| HepB3 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Hepatitis B vaccine (3rd dose). |
| Hib3 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Haemophilus influenzae type b vaccine (3rd dose). |
| IPV1 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Inactivated Polio Vaccine (1st dose). |
| MCV1 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Measles vaccine (1st dose). |
| PCV3 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Pneumococcal conjugate vaccine (3rd dose). |
| Pol3 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Polio vaccine (3rd dose). |
| RCV1 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Rubella vaccine (1st dose). |
| RotaC (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Rotavirus vaccine.    |
| YFV (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Yellow Fever vaccine. |
| DTP3 (% of one-year-olds immunized) | Percentage of one-year-olds immunized with Diphtheria, Tetanus, and Pertussis vaccine (3rd dose). |

---

3. **Births Attended by Health Staff**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Births attended by skilled health staff (%) | Percentage of total births attended by skilled health staff. |

---

4. **Maternal Deaths by Region**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Estimated maternal deaths          | Estimated number of maternal deaths in the given year.           |

---

5. **Child Mortality by Income Level**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Under-five mortality rate          | Number of deaths of children under five years per 100 live births. |

---

6. **Infant Deaths**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Deaths - Sex: all - Age: 0         | Total number of infant deaths (age 0) for the given year.        |

---

7. **Youth Mortality Rate**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| Entity                             | Name of the country or region.                                   |
| Code                               | Country code (ISO 3-letter format).                              |
| Year                               | Year of observation.                                             |
| Under-fifteen mortality rate       | Number of deaths per 100 live births for children under 15 years. |

---

8. **Causes of Death in Children Under Five**

| **Field**                          | **Description**                                                  |
|------------------------------------|------------------------------------------------------------------|
| IndicatorCode                      | Indicator code for the specific cause of death.                 |
| Indicator                          | Description of the cause of death.                              |
| ValueType                          | Type of the value (e.g., numeric, percentage).                  |
| ParentLocationCode                 | Code for the parent region.                                      |
| ParentLocation                     | Name of the parent region.                                       |
| Location                           | Name of the country.                                             |
| Type                               | Classification type.                                             |
| SpatialDimValueCode                | Spatial dimension code for the country.                         |
| Period                             | Year of data collection.                                         |
| FactValueNumericHigh               | Upper bound of the estimated value.                             |
| FactValueNumericLow                | Lower bound of the estimated value.                             |
| FactValueTranslationID             | Translation ID for value.                                        |
| FactComments                       | Comments or notes related to the fact.                          |
| Language                           | Language of the record.                                          |


# **DATA IMPORT AND CLEANING**

In [2]:
# Importing necessary libraries
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import helpers as hp
# import plotly.express as px
import plotly.io as pio
pio.templates.default = "ggplot2"
plt.style.use('ggplot')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
# Load the datasets
health_protection = pd.read_csv('data/health-protection-coverage.csv')
vaccination_coverage = pd.read_csv('data/global-vaccination-coverage.csv')
births_attended = pd.read_csv('data/births-attended-by-health-staff-sdgs.csv')
maternal_deaths = pd.read_csv('data/number-of-maternal-deaths-by-region.csv')
child_mortality = pd.read_csv('data/child-mortality-by-income-level-of-country.csv')
infant_deaths = pd.read_csv('data/number-of-infant-deaths-unwpp.csv')
youth_mortality = pd.read_csv('data/youth-mortality-rate.csv')
causes_of_death = pd.read_csv('data/Distribution of Causes of Death among Children Aged less than 5 years.csv')

In [4]:
# Display the first few rows of health_protection dataset
health_protection.head()

Unnamed: 0,Entity,Code,Year,Share of population covered by health insurance (ILO (2014))
0,Albania,ALB,2008,23.6
1,Algeria,DZA,2005,85.2
2,American Samoa,ASM,2009,3.0
3,Angola,AGO,2010,0.0
4,Antigua and Barbuda,ATG,2007,51.1


In [5]:
# Display the information about the health_protection dataset
health_protection.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162 entries, 0 to 161
Data columns (total 4 columns):
 #   Column                                                        Non-Null Count  Dtype  
---  ------                                                        --------------  -----  
 0   Entity                                                        162 non-null    object 
 1   Code                                                          162 non-null    object 
 2   Year                                                          162 non-null    int64  
 3   Share of population covered by health insurance (ILO (2014))  162 non-null    float64
dtypes: float64(1), int64(1), object(2)
memory usage: 5.2+ KB


In [6]:
# Function that checks for missing values in a dataset
hp.missing_data(health_protection)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Share of population covered by health insurance (ILO (2014)),0,0.0


The health protection dataset has no missing values in the key column `Code`, and the data types are correct. There is no need for data cleaning. 
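The implementation of `hp.missing_data` lives in `helpers.py` and is not shown in this notebook. As a rough illustration only, a helper producing a similar `Total` / `Percent` table could look like the sketch below. The function name `missing_data_summary`, the `top` parameter, and the toy data are assumptions made for this example, not the project's actual code.

```python
import pandas as pd

def missing_data_summary(df: pd.DataFrame, top: int = 10) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    total = df.isna().sum()              # number of NaNs in each column
    percent = total / len(df) * 100      # share of rows that are NaN
    summary = pd.DataFrame({"Total": total, "Percent": percent})
    return summary.sort_values("Total", ascending=False).head(top)

# Toy data for illustration only (not taken from the real datasets)
demo = pd.DataFrame({
    "Code": ["DZA", None, "AGO", "KEN"],
    "Year": [2005, 2005, 2010, 2010],
    "Coverage": [85.2, None, None, 41.0],
})
summary = missing_data_summary(demo)
print(summary)
```

Limiting the result to the top rows keeps the output short for wide tables, at the cost of hiding columns with few missing values.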

> **Note:** We will not drop NaNs or missing values unless they occur in the key column needed for merging the datasets. Remaining NaNs will be handled after joining the datasets.

In [7]:
# Display the first few rows of vaccination_coverage dataset
vaccination_coverage.head()

Unnamed: 0,Entity,Code,Year,BCG (% of one-year-olds immunized),HepB3 (% of one-year-olds immunized),Hib3 (% of one-year-olds immunized),IPV1 (% of one-year-olds immunized),MCV1 (% of one-year-olds immunized),PCV3 (% of one-year-olds immunized),Pol3 (% of one-year-olds immunized),RCV1 (% of one-year-olds immunized),RotaC (% of one-year-olds immunized),YFV (% of one-year-olds immunized),DTP3 (% of one-year-olds immunized)
0,Afghanistan,AFG,1982,10.0,,,,8.0,,5.0,,,,5.0
1,Afghanistan,AFG,1983,10.0,,,,9.0,,5.0,,,,5.0
2,Afghanistan,AFG,1984,11.0,,,,14.0,,16.0,,,,16.0
3,Afghanistan,AFG,1985,17.0,,,,14.0,,15.0,,,,15.0
4,Afghanistan,AFG,1986,18.0,,,,14.0,,11.0,,,,11.0


In [8]:
# Display the information about the vaccination_coverage dataset
vaccination_coverage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7897 entries, 0 to 7896
Data columns (total 14 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Entity                                7897 non-null   object 
 1   Code                                  7645 non-null   object 
 2   Year                                  7897 non-null   int64  
 3   BCG (% of one-year-olds immunized)    6497 non-null   float64
 4   HepB3 (% of one-year-olds immunized)  4436 non-null   float64
 5   Hib3 (% of one-year-olds immunized)   3629 non-null   float64
 6   IPV1 (% of one-year-olds immunized)   1297 non-null   float64
 7   MCV1 (% of one-year-olds immunized)   7728 non-null   float64
 8   PCV3 (% of one-year-olds immunized)   1497 non-null   float64
 9   Pol3 (% of one-year-olds immunized)   7855 non-null   float64
 10  RCV1 (% of one-year-olds immunized)   4198 non-null   float64
 11  RotaC (% of one-y

In [9]:
# Check for missing values in the vaccination_coverage dataset
hp.missing_data(vaccination_coverage)

Unnamed: 0,Total,Percent
YFV (% of one-year-olds immunized),7073,89.565658
RotaC (% of one-year-olds immunized),6858,86.843105
IPV1 (% of one-year-olds immunized),6600,83.576042
PCV3 (% of one-year-olds immunized),6400,81.043434
Hib3 (% of one-year-olds immunized),4268,54.04584
RCV1 (% of one-year-olds immunized),3699,46.840572
HepB3 (% of one-year-olds immunized),3461,43.82677
BCG (% of one-year-olds immunized),1400,17.728251
Code,252,3.191085
MCV1 (% of one-year-olds immunized),169,2.140053


There are missing values (NaNs) in the key column `Code`. We will drop these rows before proceeding to the next dataset.

In [10]:
# Drop rows with missing values in the key column 'Code'
vaccination_coverage = vaccination_coverage.dropna(subset=['Code'])
# Check if the missing values have been removed
hp.missing_data(vaccination_coverage)

Unnamed: 0,Total,Percent
YFV (% of one-year-olds immunized),6896,90.202747
RotaC (% of one-year-olds immunized),6702,87.665141
IPV1 (% of one-year-olds immunized),6390,83.584042
PCV3 (% of one-year-olds immunized),6232,81.517332
Hib3 (% of one-year-olds immunized),4208,55.042511
RCV1 (% of one-year-olds immunized),3699,48.384565
HepB3 (% of one-year-olds immunized),3431,44.879006
BCG (% of one-year-olds immunized),1400,18.312623
MCV1 (% of one-year-olds immunized),169,2.210595
DTP3 (% of one-year-olds immunized),43,0.562459


The missing values in the key column `Code` have been removed. We will proceed to the next dataset.
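The same drop-then-check step is repeated for each of the remaining datasets. A small wrapper can make the repetition explicit and report how many rows were removed. This is a sketch under assumptions: `drop_missing_key` is a hypothetical helper name, and the toy frame is invented for the example.

```python
import pandas as pd

def drop_missing_key(df: pd.DataFrame, key: str = "Code") -> pd.DataFrame:
    """Return a copy of df without the rows whose key column is NaN."""
    cleaned = df.dropna(subset=[key]).copy()
    print(f"Dropped {len(df) - len(cleaned)} of {len(df)} rows with missing '{key}'")
    return cleaned

# Toy frame for illustration: the first row has no country code
toy_keys = pd.DataFrame({
    "Entity": ["Some aggregate", "Kenya", "Nigeria"],
    "Code": [None, "KEN", "NGA"],
    "Year": [2010, 2010, 2010],
    "Value": [1.0, 2.0, 3.0],
})
toy_keys_clean = drop_missing_key(toy_keys)
```

Returning a copy avoids modifying the original frame, so the raw data stays available if the dropped rows need to be inspected later.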

In [11]:
# Display the first few rows of births_attended dataset
births_attended.head()

Unnamed: 0,Entity,Code,Year,Births attended by skilled health staff (% of total)
0,Afghanistan,AFG,2000,12.4
1,Afghanistan,AFG,2003,14.3
2,Afghanistan,AFG,2006,18.9
3,Afghanistan,AFG,2008,24.0
4,Afghanistan,AFG,2010,34.3


In [12]:
# Display the information about the births_attended dataset
births_attended.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2985 entries, 0 to 2984
Data columns (total 4 columns):
 #   Column                                                Non-Null Count  Dtype  
---  ------                                                --------------  -----  
 0   Entity                                                2985 non-null   object 
 1   Code                                                  2943 non-null   object 
 2   Year                                                  2985 non-null   int64  
 3   Births attended by skilled health staff (% of total)  2985 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 93.4+ KB


In [13]:
# Check for missing values in the births_attended dataset
hp.missing_data(births_attended)

Unnamed: 0,Total,Percent
Code,42,1.407035
Entity,0,0.0
Year,0,0.0
Births attended by skilled health staff (% of total),0,0.0


In [14]:
# Drop rows with missing values in the key column 'Code'
births_attended = births_attended.dropna(subset=['Code'])
# Check if the missing values have been removed
hp.missing_data(births_attended)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Births attended by skilled health staff (% of total),0,0.0


In [15]:
# Display the first few rows of maternal_deaths dataset
maternal_deaths.head()

Unnamed: 0,Entity,Code,Year,Estimated maternal deaths,959828-annotations
0,Afghanistan,AFG,1985,10258.534,
1,Afghanistan,AFG,1986,8671.921,
2,Afghanistan,AFG,1987,8488.96,
3,Afghanistan,AFG,1988,7522.1216,
4,Afghanistan,AFG,1989,7549.705,


In [16]:
# Display the information about the maternal_deaths dataset
maternal_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7056 entries, 0 to 7055
Data columns (total 5 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Entity                     7056 non-null   object 
 1   Code                       6696 non-null   object 
 2   Year                       7056 non-null   int64  
 3   Estimated maternal deaths  7056 non-null   float64
 4   959828-annotations         36 non-null     object 
dtypes: float64(1), int64(1), object(3)
memory usage: 275.8+ KB


In [17]:
# Check for missing values in the maternal_deaths dataset
hp.missing_data(maternal_deaths)

Unnamed: 0,Total,Percent
959828-annotations,7020,99.489796
Code,360,5.102041
Entity,0,0.0
Year,0,0.0
Estimated maternal deaths,0,0.0


In [18]:
# Drop rows with missing values in the key column 'Code'
maternal_deaths = maternal_deaths.dropna(subset=['Code'])
# Drop the '959828-annotations' column (reassign instead of modifying the filtered frame in place)
maternal_deaths = maternal_deaths.drop(columns=['959828-annotations'])
# Check if the missing values have been removed
hp.missing_data(maternal_deaths)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Estimated maternal deaths,0,0.0


In [19]:
# Display the first few rows of child_mortality dataset
child_mortality.head()

Unnamed: 0,Entity,Code,Year,Observation value - Indicator: Under-five mortality rate - Sex: Total - Wealth quintile: Total - Unit of measure: Deaths per 100 live births
0,Afghanistan,AFG,1957,37.245758
1,Afghanistan,AFG,1958,36.626625
2,Afghanistan,AFG,1959,36.04348
3,Afghanistan,AFG,1960,35.45985
4,Afghanistan,AFG,1961,34.89488


In [20]:
# Display the information about the child_mortality dataset
child_mortality.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14200 entries, 0 to 14199
Data columns (total 4 columns):
 #   Column                                                                                                                                        Non-Null Count  Dtype  
---  ------                                                                                                                                        --------------  -----  
 0   Entity                                                                                                                                        14200 non-null  object 
 1   Code                                                                                                                                          12842 non-null  object 
 2   Year                                                                                                                                          14200 non-null  int64  
 3   Observation value - Indicator: U

In [21]:
# Check for missing values in the child_mortality dataset
hp.missing_data(child_mortality)

Unnamed: 0,Total,Percent
Code,1358,9.56338
Entity,0,0.0
Year,0,0.0
Observation value - Indicator: Under-five mortality rate - Sex: Total - Wealth quintile: Total - Unit of measure: Deaths per 100 live births,0,0.0


In [22]:
# Drop rows with missing values in the key column 'Code'
child_mortality = child_mortality.dropna(subset=['Code'])
# Check if the missing values have been removed
hp.missing_data(child_mortality)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Observation value - Indicator: Under-five mortality rate - Sex: Total - Wealth quintile: Total - Unit of measure: Deaths per 100 live births,0,0.0


In [23]:
# Display the first few rows of infant_deaths dataset
infant_deaths.head()

Unnamed: 0,Entity,Code,Year,Deaths - Sex: all - Age: 0 - Variant: estimates
0,Afghanistan,AFG,1950,109220.0
1,Afghanistan,AFG,1951,107971.0
2,Afghanistan,AFG,1952,108140.0
3,Afghanistan,AFG,1953,108248.0
4,Afghanistan,AFG,1954,108241.0


In [24]:
# Display the information about the infant_deaths dataset
infant_deaths.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18944 entries, 0 to 18943
Data columns (total 4 columns):
 #   Column                                           Non-Null Count  Dtype  
---  ------                                           --------------  -----  
 0   Entity                                           18944 non-null  object 
 1   Code                                             17612 non-null  object 
 2   Year                                             18944 non-null  int64  
 3   Deaths - Sex: all - Age: 0 - Variant: estimates  18944 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 592.1+ KB


In [25]:
# Check for missing values in the infant_deaths dataset
hp.missing_data(infant_deaths)

Unnamed: 0,Total,Percent
Code,1332,7.03125
Entity,0,0.0
Year,0,0.0
Deaths - Sex: all - Age: 0 - Variant: estimates,0,0.0


In [26]:
# Drop rows with missing values in the key column 'Code'
infant_deaths = infant_deaths.dropna(subset=['Code'])
# Check if the missing values have been removed
hp.missing_data(infant_deaths)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Deaths - Sex: all - Age: 0 - Variant: estimates,0,0.0


In [27]:
# Display the first few rows of youth_mortality dataset
youth_mortality.head()

Unnamed: 0,Entity,Code,Year,Under-fifteen mortality rate
0,Afghanistan,AFG,1977,30.110573
1,Afghanistan,AFG,1978,29.290777
2,Afghanistan,AFG,1979,28.47901
3,Afghanistan,AFG,1980,27.649078
4,Afghanistan,AFG,1981,26.834482


In [28]:
# Display the information about the youth_mortality dataset
youth_mortality.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10515 entries, 0 to 10514
Data columns (total 4 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Entity                        10515 non-null  object 
 1   Code                          9492 non-null   object 
 2   Year                          10515 non-null  int64  
 3   Under-fifteen mortality rate  10515 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 328.7+ KB


In [29]:
# Check for missing values in the youth_mortality dataset
hp.missing_data(youth_mortality)

Unnamed: 0,Total,Percent
Code,1023,9.728959
Entity,0,0.0
Year,0,0.0
Under-fifteen mortality rate,0,0.0


In [30]:
# Drop rows with missing values in the key column 'Code'
youth_mortality = youth_mortality.dropna(subset=['Code'])
# Check if the missing values have been removed
hp.missing_data(youth_mortality)

Unnamed: 0,Total,Percent
Entity,0,0.0
Code,0,0.0
Year,0,0.0
Under-fifteen mortality rate,0,0.0


In [31]:
# Display the first few rows of causes_of_death dataset
causes_of_death.head()

Unnamed: 0,IndicatorCode,Indicator,ValueType,ParentLocationCode,ParentLocation,Location type,SpatialDimValueCode,Location,Period type,Period,...,FactValueUoM,FactValueNumericLowPrefix,FactValueNumericLow,FactValueNumericHighPrefix,FactValueNumericHigh,Value,FactValueTranslationID,FactComments,Language,DateModified
0,MORT_300,Distribution of causes of death among children...,numeric,EMR,Eastern Mediterranean,Country,AFG,Afghanistan,Year,2017,...,,,,,,0.0,,,EN,2018-11-26T23:00:00.000Z
1,MORT_300,Distribution of causes of death among children...,numeric,EMR,Eastern Mediterranean,Country,AFG,Afghanistan,Year,2017,...,,,,,,0.0,,,EN,2018-11-26T23:00:00.000Z
2,MORT_300,Distribution of causes of death among children...,numeric,EMR,Eastern Mediterranean,Country,AFG,Afghanistan,Year,2017,...,,,,,,0.0,,,EN,2018-11-26T23:00:00.000Z
3,MORT_300,Distribution of causes of death among children...,numeric,EMR,Eastern Mediterranean,Country,AFG,Afghanistan,Year,2017,...,,,,,,0.0,,,EN,2018-11-26T23:00:00.000Z
4,MORT_300,Distribution of causes of death among children...,numeric,EMR,Eastern Mediterranean,Country,AFG,Afghanistan,Year,2017,...,,,,,,0.0,,,EN,2018-11-26T23:00:00.000Z


In [32]:
# Display the information about the causes_of_death dataset
causes_of_death.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146664 entries, 0 to 146663
Data columns (total 34 columns):
 #   Column                      Non-Null Count   Dtype  
---  ------                      --------------   -----  
 0   IndicatorCode               146664 non-null  object 
 1   Indicator                   146664 non-null  object 
 2   ValueType                   146664 non-null  object 
 3   ParentLocationCode          146664 non-null  object 
 4   ParentLocation              146664 non-null  object 
 5   Location type               146664 non-null  object 
 6   SpatialDimValueCode         146664 non-null  object 
 7   Location                    146664 non-null  object 
 8   Period type                 146664 non-null  object 
 9   Period                      146664 non-null  int64  
 10  IsLatestYear                146664 non-null  bool   
 11  Dim1 type                   146664 non-null  object 
 12  Dim1                        146664 non-null  object 
 13  Dim1ValueCode 

In [33]:
# Check for missing values in the causes_of_death dataset
hp.missing_data(causes_of_death)

Unnamed: 0,Total,Percent
FactValueNumericLowPrefix,146664,100.0
Dim3 type,146664,100.0
FactValueNumericLow,146664,100.0
DataSourceDimValueCode,146664,100.0
FactValueUoM,146664,100.0
FactValueNumericPrefix,146664,100.0
FactValueTranslationID,146664,100.0
FactValueNumericHigh,146664,100.0
FactValueNumericHighPrefix,146664,100.0
FactComments,146664,100.0


In [34]:
# List the columns with missing values that will be dropped
cols_to_drop = [
    'DataSourceDimValueCode',
    'Dim3',
    'DataSource',
    'Dim3 type',
    'Dim3ValueCode',
    'FactComments',
    'FactValueNumericHigh',
    'FactValueNumericHighPrefix',
    'FactValueNumericLow',
    'FactValueNumericLowPrefix',
    'FactValueNumericPrefix',
    'FactValueTranslationID',
    'FactValueUoM',
]
# Drop the columns with missing values
causes_of_death.drop(columns=cols_to_drop, inplace=True)
# Check if the missing values have been removed
hp.missing_data(causes_of_death)

Unnamed: 0,Total,Percent
IndicatorCode,0,0.0
Indicator,0,0.0
ValueType,0,0.0
ParentLocationCode,0,0.0
ParentLocation,0,0.0
Location type,0,0.0
SpatialDimValueCode,0,0.0
Location,0,0.0
Period type,0,0.0
Period,0,0.0


In [39]:
# Merge the datasets
merged_data = health_protection.merge(vaccination_coverage, on=['Code', 'Year'], how='inner', suffixes=('_health', '_vacc'))
merged_data = merged_data.merge(births_attended, on=['Code', 'Year'], how='inner', suffixes=('', '_births'))
merged_data = merged_data.merge(maternal_deaths, on=['Code', 'Year'], how='inner', suffixes=('', '_maternal'))
merged_data = merged_data.merge(child_mortality, on=['Code', 'Year'], how='inner', suffixes=('', '_child'))
merged_data = merged_data.merge(infant_deaths, on=['Code', 'Year'], how='inner', suffixes=('', '_infant'))
merged_data = merged_data.merge(youth_mortality, on=['Code', 'Year'], how='inner', suffixes=('', '_youth'))
merged_data = pd.merge(merged_data, causes_of_death, left_on='Code', right_on='SpatialDimValueCode', how='inner')
merged_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75600 entries, 0 to 75599
Data columns (total 47 columns):
 #   Column                                                                                                                                        Non-Null Count  Dtype  
---  ------                                                                                                                                        --------------  -----  
 0   Entity_health                                                                                                                                 75600 non-null  object 
 1   Code                                                                                                                                          75600 non-null  object 
 2   Year                                                                                                                                          75600 non-null  int64  
 3   Share of population covered by 

In [40]:
# Display the first few rows of the merged_data
merged_data.head()

Unnamed: 0,Entity_health,Code,Year,Share of population covered by health insurance (ILO (2014)),Entity_vacc,BCG (% of one-year-olds immunized),HepB3 (% of one-year-olds immunized),Hib3 (% of one-year-olds immunized),IPV1 (% of one-year-olds immunized),MCV1 (% of one-year-olds immunized),...,Dim1 type,Dim1,Dim1ValueCode,Dim2 type,Dim2,Dim2ValueCode,FactValueNumeric,Value,Language,DateModified
0,Antigua and Barbuda,ATG,2007,51.1,Antigua and Barbuda,,97.0,99.0,,99.0,...,Age Group,0-27 days,AGEGROUP_DAYS0-27,Cause of death,Sepsis and other infectious conditions of the ...,CHILDCAUSE_CH12,0.038,0.0,EN,2018-11-26T23:00:00.000Z
1,Antigua and Barbuda,ATG,2007,51.1,Antigua and Barbuda,,97.0,99.0,,99.0,...,Age Group,0-27 days,AGEGROUP_DAYS0-27,Cause of death,Other noncommunicable diseases,CHILDCAUSE_CH16,0.0,0.0,EN,2018-11-26T23:00:00.000Z
2,Antigua and Barbuda,ATG,2007,51.1,Antigua and Barbuda,,97.0,99.0,,99.0,...,Age Group,0-27 days,AGEGROUP_DAYS0-27,Cause of death,Injuries,CHILDCAUSE_CH17,0.0,0.0,EN,2018-11-26T23:00:00.000Z
3,Antigua and Barbuda,ATG,2007,51.1,Antigua and Barbuda,,97.0,99.0,,99.0,...,Age Group,0-27 days,AGEGROUP_DAYS0-27,Cause of death,HIV/AIDS,CHILDCAUSE_CH2,0.0,0.0,EN,2018-11-26T23:00:00.000Z
4,Antigua and Barbuda,ATG,2007,51.1,Antigua and Barbuda,,97.0,99.0,,99.0,...,Age Group,0-27 days,AGEGROUP_DAYS0-27,Cause of death,Diarrhoeal diseases,CHILDCAUSE_CH3,0.0,0.0,EN,2018-11-26T23:00:00.000Z
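Note that the final merge with `causes_of_death` joins on the country code only, so each country-year row is paired with every `Period` available for that country. Whether that is intended is not stated here. The sketch below, on invented toy frames, contrasts a code-only join with a join on code and year, and uses the `validate` argument of `DataFrame.merge` to guard against duplicate keys on the left side. The column names mirror the notebook; the values are made up.

```python
import pandas as pd

# Toy frames for illustration only
left = pd.DataFrame({
    "Code": ["ATG", "KEN"],
    "Year": [2007, 2009],
    "coverage": [1.0, 2.0],
})
right = pd.DataFrame({
    "SpatialDimValueCode": ["ATG", "ATG", "ATG", "KEN"],
    "Period": [2007, 2007, 2008, 2009],
    "cause": ["Injuries", "HIV/AIDS", "Injuries", "Injuries"],
})

# Joining on the country code alone pairs each left row with all years on the right
by_code = left.merge(right, left_on="Code", right_on="SpatialDimValueCode", how="inner")

# Joining on code and year keeps only observations from the same year
by_code_year = left.merge(
    right,
    left_on=["Code", "Year"],
    right_on=["SpatialDimValueCode", "Period"],
    how="inner",
    validate="one_to_many",  # raises MergeError if (Code, Year) repeats on the left
)
print(len(by_code), len(by_code_year))
```

Comparing row counts before and after each merge is a cheap way to spot unintended row multiplication.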


### OPTIONAL: Data Cleaning and Wrangling Function

In [42]:
# Wrangle complete dataset
df = hp.wrangle_health_data()

Loaded health_protection: 162 rows
Loaded vaccination_coverage: 7897 rows
Loaded births_attended: 2985 rows
Loaded maternal_deaths: 7056 rows
Loaded child_mortality: 14200 rows
Loaded infant_deaths: 18944 rows
Loaded youth_mortality: 10515 rows
Loaded causes_of_death: 146664 rows

Merge Statistics:
Final merged dataset: 75600 rows


The `hp.wrangle_health_data()` function used above wraps the data cleaning and merging steps from the previous cells for all the datasets.
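A wrangle function of this kind could be sketched roughly as follows. It is not the code behind `hp.wrangle_health_data()`, which is not shown in this notebook. The name `wrangle`, its parameters, and the toy frames are assumptions, and the sketch covers only the frames that share the `Code` and `Year` keys.

```python
from functools import reduce
import pandas as pd

def wrangle(frames: dict, keys=("Code", "Year")) -> pd.DataFrame:
    """Drop rows with a missing Code, then inner-join all frames on the keys."""
    cleaned = []
    for name, df in frames.items():
        print(f"Loaded {name}: {len(df)} rows")
        df = df.dropna(subset=["Code"])
        # Keep Entity only from the first frame to avoid duplicated name columns
        if cleaned:
            df = df.drop(columns=["Entity"], errors="ignore")
        cleaned.append(df)
    merged = reduce(
        lambda left, right: left.merge(right, on=list(keys), how="inner"),
        cleaned,
    )
    print(f"Final merged dataset: {len(merged)} rows")
    return merged

# Toy frames for illustration only
frame_a = pd.DataFrame({
    "Entity": ["Kenya", "Kenya", "Some aggregate"],
    "Code": ["KEN", "KEN", None],
    "Year": [2009, 2010, 2010],
    "coverage": [1.0, 2.0, 3.0],
})
frame_b = pd.DataFrame({
    "Entity": ["Kenya", "Kenya"],
    "Code": ["KEN", "KEN"],
    "Year": [2010, 2011],
    "births_attended": [4.0, 5.0],
})
toy_merged = wrangle({"frame_a": frame_a, "frame_b": frame_b})
```

Using `reduce` keeps the chain of merges in one place, so adding another dataset only means adding one more entry to the dictionary.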

In [37]:
# Check for missing values in the merged_data
hp.missing_data(merged_data)

Unnamed: 0,Total,Percent
IPV1 (% of one-year-olds immunized),75600,100.0
RotaC (% of one-year-olds immunized),66528,88.0
YFV (% of one-year-olds immunized),63504,84.0
PCV3 (% of one-year-olds immunized),59724,79.0
Hib3 (% of one-year-olds immunized),20412,27.0
BCG (% of one-year-olds immunized),16632,22.0
RCV1 (% of one-year-olds immunized),15120,20.0
HepB3 (% of one-year-olds immunized),9072,12.0
Share of population covered by health insurance (ILO (2014)),0,0.0
Code,0,0.0



# EXPLORATORY DATA ANALYSIS

# MACHINE LEARNING (OPTIONAL)

# RESULTS AND INSIGHTS

# CONCLUSION
