# Implementing Neural Networks

The World Health Organization (WHO)’s Global Health Observatory (GHO) data repository tracks life expectancy for countries worldwide by following health status and many other related factors.

Although there have been a lot of studies undertaken in the past on factors affecting life expectancy considering demographic variables, income composition, and mortality rates, it was found that the effects of immunization and human development index were not taken into account.

This dataset covers a variety of indicators for all countries from 2000 to 2015 including:

* Immunization factors<br>
* Mortality factors<br>
* Economic factors<br>
* Social factors<br>
* Other health-related factors<br>

Ideally, this data will eventually inform countries concerning which factors to change in order to improve the life expectancy of their populations. If we can predict life expectancy well given all the factors, this is a good sign that there are some important patterns in the data. Life expectancy is expressed in years, and hence it is a number. This means that in order to build a predictive model one needs to use regression.

In [41]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

In [12]:
dataset = pd.read_csv('life.csv', index_col=0)
dataset.head().T

Unnamed: 0,0,1,2,3,4
Country,Afghanistan,Afghanistan,Afghanistan,Afghanistan,Afghanistan
Year,2015,2014,2013,2012,2011
Status,Developing,Developing,Developing,Developing,Developing
Adult Mortality,263.0,271.0,268.0,272.0,275.0
infant deaths,62,64,66,69,71
Alcohol,0.01,0.01,0.01,0.01,0.01
percentage expenditure,71.279624,73.523582,73.219243,78.184215,7.097109
Hepatitis B,65.0,62.0,64.0,67.0,68.0
Measles,1154,492,430,2787,3013
BMI,19.1,18.6,18.1,17.6,17.2


In [13]:
dataset.columns

Index(['Country', 'Year', 'Status', 'Adult Mortality', 'infant deaths',
       'Alcohol', 'percentage expenditure', 'Hepatitis B', 'Measles ', ' BMI ',
       'under-five deaths ', 'Polio', 'Total expenditure', 'Diphtheria ',
       ' HIV/AIDS', 'GDP', 'Population', ' thinness  1-19 years',
       ' thinness 5-9 years', 'Income composition of resources', 'Schooling',
       'Life expectancy'],
      dtype='object')

In [11]:
dataset.describe().head().T

Unnamed: 0,count,mean,std,min,25%
Year,2938.0,2007.519,4.613841,2000.0,2004.0
Adult Mortality,2938.0,164.7257,124.0862,1.0,74.0
infant deaths,2938.0,30.30395,117.9265,0.0,0.0
Alcohol,2938.0,4.546875,3.921946,0.01,1.0925
percentage expenditure,2938.0,738.2513,1987.915,0.0,4.685343
Hepatitis B,2938.0,83.02212,22.99698,1.0,82.0
Measles,2938.0,2419.592,11467.27,0.0,0.0
BMI,2938.0,38.38118,19.93537,1.0,19.4
under-five deaths,2938.0,42.03574,160.4455,0.0,0.0
Polio,2938.0,82.61777,23.36717,3.0,78.0


In [14]:
dataset = dataset.drop(['Country'], axis=1)

In [20]:
dataset.shape

(2938, 21)

In [15]:
dataset.columns

Index(['Year', 'Status', 'Adult Mortality', 'infant deaths', 'Alcohol',
       'percentage expenditure', 'Hepatitis B', 'Measles ', ' BMI ',
       'under-five deaths ', 'Polio', 'Total expenditure', 'Diphtheria ',
       ' HIV/AIDS', 'GDP', 'Population', ' thinness  1-19 years',
       ' thinness 5-9 years', 'Income composition of resources', 'Schooling',
       'Life expectancy'],
      dtype='object')

In [22]:
dataset.dtypes

Year                                 int64
Status                              object
Adult Mortality                    float64
infant deaths                        int64
Alcohol                            float64
percentage expenditure             float64
Hepatitis B                        float64
Measles                              int64
 BMI                               float64
under-five deaths                    int64
Polio                              float64
Total expenditure                  float64
Diphtheria                         float64
 HIV/AIDS                          float64
GDP                                float64
Population                         float64
 thinness  1-19 years              float64
 thinness 5-9 years                float64
Income composition of resources    float64
Schooling                          float64
Life expectancy                    float64
dtype: object

##### Train-Test-Split

In [18]:
features = dataset.iloc[:, 0:-1]
labels = dataset.iloc[:, -1]

In [19]:
features.shape

(2938, 20)

In [23]:
features = pd.get_dummies(features)

In [26]:
features.head()

Unnamed: 0,Year,Adult Mortality,infant deaths,Alcohol,percentage expenditure,Hepatitis B,Measles,BMI,under-five deaths,Polio,...,Diphtheria,HIV/AIDS,GDP,Population,thinness 1-19 years,thinness 5-9 years,Income composition of resources,Schooling,Status_Developed,Status_Developing
0,2015,263.0,62,0.01,71.279624,65.0,1154,19.1,83,6.0,...,65.0,0.1,584.25921,33736494.0,17.2,17.3,0.479,10.1,0,1
1,2014,271.0,64,0.01,73.523582,62.0,492,18.6,86,58.0,...,62.0,0.1,612.696514,327582.0,17.5,17.5,0.476,10.0,0,1
2,2013,268.0,66,0.01,73.219243,64.0,430,18.1,89,62.0,...,64.0,0.1,631.744976,31731688.0,17.7,17.7,0.47,9.9,0,1
3,2012,272.0,69,0.01,78.184215,67.0,2787,17.6,93,67.0,...,67.0,0.1,669.959,3696958.0,17.9,18.0,0.463,9.8,0,1
4,2011,275.0,71,0.01,7.097109,68.0,3013,17.2,97,68.0,...,68.0,0.1,63.537231,2978599.0,18.2,18.2,0.454,9.5,0,1


In [28]:
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.25, random_state=1)

In [30]:
numerical_features = features.select_dtypes(include=['float64', 'int64'])
numerical_columns = numerical_features.columns
 
ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')

In [31]:
features_train_scaled = ct.fit_transform(features_train)
features_test_scaled = ct.fit_transform(features_test)

In [35]:
my_model = Sequential()
input = InputLayer(input_shape=(features.shape[1], ))

In [36]:
my_model.add(input)

In [38]:
my_model.add(Dense(64, activation = 'relu'))

In [39]:
my_model.add(Dense(1))

In [40]:
my_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                1408      
                                                                 
 dense_1 (Dense)             (None, 1)                 65        
                                                                 
Total params: 1,473
Trainable params: 1,473
Non-trainable params: 0
_________________________________________________________________


In [42]:
opt = Adam(learning_rate = 0.01)

In [43]:
my_model.compile(loss='mse', metrics=['mae'], optimizer=opt)

In [44]:
my_model.fit(features_train_scaled, labels_train, epochs=40, batch_size=1, verbose=1)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


<keras.callbacks.History at 0x289f8c64640>

In [46]:
res_mse, res_mae = my_model.evaluate(features_test_scaled, labels_test, verbose=0)

In [47]:
print(res_mse, res_mae)

6.806833267211914 1.8508808612823486
