### BUSINESS UNDERSTANDING
Background:

The Sustainable Development Goals (SDGs) are a set of 17 global goals adopted by United Nations member states in 2015 to address social, economic, and environmental challenges.

Predicting progress towards SDGs is essential for monitoring development outcomes, guiding policy interventions, and mobilizing resources effectively.


Business Problem:

- Lack of reliable methods to forecast progress towards SDGs hinders informed decision-making and targeted interventions.
- Developing predictive models based on historical data from the Millennium Development Goals (MDGs) era can provide insights into the factors driving successful development outcomes and inform future policy actions.

Business Objectives:

- Develop predictive models to forecast progress towards the Sustainable Development Goals (SDGs) using machine learning techniques.
- Identify key factors and indicators that contribute to successful development outcomes and progress towards the SDGs.
- Provide actionable insights and policy recommendations based on the predictive models to support decision-making and resource allocation strategies.

### DATA UNDERSTANDING

### IMPORTING OF NECESSARY LIBRARIES

In [86]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### LOADING OF DATASETS

In [87]:
#load dataset from csv
#data = pd.read_csv('dataset/data/africa_millennium_development_goals_xlsx_1.csv')
#data.head(15)

In [88]:
#load dataset from excel file
original_data = pd.read_excel('dataset/original/africa-millennium-development-goals-xlsx-1.xlsx')
original_data.head(15)

Unnamed: 0,CountryName,Country,GoalName,Goal,IndicatorName,Indicator,Social GroupName,Social Group,Units,Scale,Frequency,Date,Value
0,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,1990-01-01,123.0
1,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,1992-01-01,107.2
2,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,1996-01-01,108.9
3,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,2000-01-01,110.0
4,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,2001-01-01,95.0
5,Kenya,25,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,20267305,Women,20136705,,1,A,2007-01-01,70.0
6,Somalia,44,Goal 5: Improve maternal health,KN.1000040,Contraceptive prevalence rate 15-49,20267705,Women,20136705,,1,A,2000-01-01,11.7
7,Somalia,44,Goal 5: Improve maternal health,KN.1000040,Contraceptive prevalence rate 15-49,20267705,Women,20136705,,1,A,2007-01-01,14.6
8,Somalia,44,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Prevalence of underweight children under-five ...,20266005,Women,20136705,,1,A,2005-01-01,26.0
9,Somalia,44,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Prevalence of underweight children under-five ...,20266005,Women,20136705,,1,A,2007-01-01,36.0


In [89]:
#performing data exploration
#print("Dimensions of the dataset:",data.shape)
#print("\nFirst few rows of the dataset:")
#data.head()

In [90]:
#performing data exploration
print("Dimensions of the dataset:",original_data.shape)
print("\nLast few rows of the dataset:")
original_data.tail()

Dimensions of the dataset: (10610, 13)

Last few rows of the dataset:


Unnamed: 0,CountryName,Country,GoalName,Goal,IndicatorName,Indicator,Social GroupName,Social Group,Units,Scale,Frequency,Date,Value
10605,Seychelles,42,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,20270005,Rural,20137005,,1,A,2001-01-01,100.0
10606,Seychelles,42,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,20270005,Rural,20137005,,1,A,2002-01-01,100.0
10607,Seychelles,42,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,20270005,Rural,20137005,,1,A,2003-01-01,100.0
10608,Seychelles,42,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,20270005,Rural,20137005,,1,A,2004-01-01,100.0
10609,Seychelles,42,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,20270005,Rural,20137005,,1,A,2006-01-01,100.0


In [91]:
#descriptive statistics
print("\nSummary statistics:")
original_data.describe()


Summary statistics:


Unnamed: 0,Country,Indicator,Social Group,Units,Scale,Value
count,10610.0,10610.0,10610.0,0.0,10610.0,10610.0
mean,27.962677,20267660.0,20137020.0,,1.0,6365.211
std,15.145405,1734.153,163.0599,,0.0,129894.2
min,1.0,20265000.0,20136600.0,,1.0,0.0
25%,14.0,20266400.0,20137000.0,,1.0,10.5
50%,30.0,20267300.0,20137100.0,,1.0,39.4
75%,39.0,20269300.0,20137100.0,,1.0,73.1
max,53.0,20270700.0,20137100.0,,1.0,4366048.0
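
The summary statistics show a count of 0 for Units and a standard deviation of 0 for Scale, which suggests that neither column carries usable information. A minimal sketch of how such columns could be detected programmatically follows. It uses a small toy frame built from a few rows of the `head()` output rather than the full dataset, and the names `sample`, `empty_columns`, and `constant_columns` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for original_data; values copied from a few rows shown earlier.
sample = pd.DataFrame({
    "CountryName": ["Kenya", "Kenya", "Somalia"],
    "Units": [np.nan, np.nan, np.nan],
    "Scale": [1, 1, 1],
    "Value": [123.0, 107.2, 11.7],
})

# Share of missing values in each column (1.0 means the column is entirely empty).
missing_share = sample.isna().mean()
empty_columns = missing_share[missing_share == 1.0].index.tolist()

# Columns with at most one distinct non-null value cannot help a model either.
constant_columns = [
    col for col in sample.columns if sample[col].nunique(dropna=True) <= 1
]

print("Entirely empty:", empty_columns)
print("Constant or empty:", constant_columns)
```

On the full dataset, the same two checks could be run against `original_data` to decide which columns to drop.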


### DATA CLEANING & PREPROCESSING

In [92]:
#dropping unnecessary columns
#"Units" has no non-null values (count of 0 in the summary statistics above)
#"Country", "Indicator" and "Social Group" are numeric codes that appear to duplicate the matching *Name columns
#"Frequency" shows 'A' in every row displayed above
columns_to_drop = ['Units','Country','Indicator','Social Group','Frequency']
original_data = original_data.drop(columns=columns_to_drop)
original_data.tail(15)
#original_data.info()

Unnamed: 0,CountryName,GoalName,Goal,IndicatorName,Social GroupName,Scale,Date,Value
10595,Kenya,"Goal 6: Combat HIV/AIDS, malaria and other dis...",KN.1000050,Proportion of population with advanced HIV inf...,Total,1,2006-01-01,26.0
10596,Kenya,"Goal 6: Combat HIV/AIDS, malaria and other dis...",KN.1000050,Proportion of population with advanced HIV inf...,Total,1,2007-01-01,46.0
10597,Gambia,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below poverty line (%),Total,1,1990-01-01,31.0
10598,Gambia,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below poverty line (%),Total,1,2006-01-01,58.0
10599,Chad,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Total,1,1995-01-01,5.4
10600,Chad,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Total,1,1996-01-01,5.4
10601,Chad,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Total,1,2005-01-01,6.4
10602,Chad,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Total,1,2006-01-01,6.4
10603,Seychelles,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,Rural,1,1990-01-01,100.0
10604,Seychelles,Goal 7: Ensure environmental sustainability,KN.1000060,Proportion of population using an improved san...,Rural,1,2000-01-01,100.0


In [93]:
country_name = 'Kenya'
kenyan_data=original_data[original_data['CountryName']==country_name]
kenyan_data

Unnamed: 0,CountryName,GoalName,Goal,IndicatorName,Social GroupName,Scale,Date,Value
0,Kenya,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,Women,1,1990-01-01,123.0
1,Kenya,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,Women,1,1992-01-01,107.2
2,Kenya,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,Women,1,1996-01-01,108.9
3,Kenya,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,Women,1,2000-01-01,110.0
4,Kenya,Goal 4: Reduce child mortality,KN.1000030,Infant mortality rate per 1000 live births,Women,1,2001-01-01,95.0
...,...,...,...,...,...,...,...,...
9803,Kenya,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Prevalence of underweight children under-five ...,Rural,1,2007-01-01,20.4
10024,Kenya,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Urban,1,1994-01-01,29.0
10025,Kenya,Goal 1: Eradicate extreme poverty and hunger,KN.1000000,Proportion of pop below $1 PPP per day (%),Urban,1,1997-01-01,49.0
10595,Kenya,"Goal 6: Combat HIV/AIDS, malaria and other dis...",KN.1000050,Proportion of population with advanced HIV inf...,Total,1,2006-01-01,26.0
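
Once one country has been isolated, a simple way to gauge progress on an indicator is to compare its earliest and latest observations. The sketch below does this for the Kenyan infant mortality rows shown in the first output table. It assumes a single social group per indicator, and `kenya_sample` and `first_last_change` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Kenyan infant mortality rows copied from the head() output shown earlier.
kenya_sample = pd.DataFrame({
    "IndicatorName": ["Infant mortality rate per 1000 live births"] * 6,
    "Date": pd.to_datetime([
        "1990-01-01", "1992-01-01", "1996-01-01",
        "2000-01-01", "2001-01-01", "2007-01-01",
    ]),
    "Value": [123.0, 107.2, 108.9, 110.0, 95.0, 70.0],
})


def first_last_change(group: pd.DataFrame) -> float:
    """Return latest value minus earliest value for one indicator."""
    ordered = group.sort_values("Date")
    return float(ordered["Value"].iloc[-1] - ordered["Value"].iloc[0])


# One entry per indicator: negative means the value fell over the period.
change = {
    name: first_last_change(group)
    for name, group in kenya_sample.groupby("IndicatorName")
}
print(change)
```

For a mortality indicator a negative change indicates improvement; for coverage indicators the sign would be read the other way.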


In [51]:
#check the column types
#original_data.dtypes

In [42]:
#convert datetime format
original_data['Date'] = pd.to_datetime(original_data['Date'])
#verify conversion
print(original_data.dtypes)
#original_data.head(15)

CountryName                 object
GoalName                    object
Goal                        object
IndicatorName               object
Indicator                    int64
Social GroupName            object
Social Group                 int64
Scale                        int64
Frequency                   object
Date                datetime64[ns]
Value                      float64
dtype: object
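
Because every date shown falls on 1 January and the Frequency column reads `A`, the year alone may be a more convenient key than the full timestamp. A minimal sketch of deriving it, assuming a new `Year` column is acceptable (that column name is not in the original notebook):

```python
import pandas as pd

# Toy frame with date strings in the same format as the dataset's Date column.
dates = pd.DataFrame({"Date": ["1990-01-01", "1992-01-01", "2007-01-01"]})

# Parse the strings, then pull the calendar year out as an integer column.
dates["Date"] = pd.to_datetime(dates["Date"])
dates["Year"] = dates["Date"].dt.year

print(dates)
```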


In [None]:
necessary_columns = ['CountryName','GoalName']

In [None]:
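#renaming columns
#the original mapping also listed 'Goal' without a new name, so only 'CountryName' is renamed here
original_data = original_data.rename(columns={
    'CountryName': 'Country'
})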

### EXPLORATORY DATA ANALYSIS

In [9]:
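#check correlation
#corr() raises "ValueError: could not convert string to float: 'Kenya'" when string columns are included,
#so restrict it to the numeric columns
correlation_matrix = original_data.select_dtypes(include='number').corr()
sns.heatmap(correlation_matrix,annot=True,cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()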
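
In the long format used here, a single Value column mixes every indicator, so a correlation over the raw columns says little about how indicators relate to each other. One option, sketched below, is to pivot the data so each indicator becomes its own column and then correlate those columns. The toy values, country labels, and indicator names are made up purely for illustration and do not come from the dataset.

```python
import pandas as pd

# Made-up long-format data: two placeholder indicators for two placeholder countries.
toy = pd.DataFrame({
    "CountryName": ["X"] * 4 + ["Y"] * 4,
    "Year": [2000, 2000, 2005, 2005] * 2,
    "IndicatorName": ["indicator_a", "indicator_b"] * 4,
    "Value": [10.0, 50.0, 20.0, 40.0, 30.0, 30.0, 40.0, 20.0],
})

# One row per (country, year), one column per indicator.
# aggfunc="mean" averages duplicates, e.g. several social groups in the same year.
wide = toy.pivot_table(
    index=["CountryName", "Year"],
    columns="IndicatorName",
    values="Value",
    aggfunc="mean",
)

# Pairwise Pearson correlation between indicators.
indicator_corr = wide.corr()
print(indicator_corr)
```

The resulting `indicator_corr` frame could be passed to `sns.heatmap` in the same way as the correlation matrix in the cell above.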