### IMPORTS

In [45]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import math
import os


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_iris
from sklearn import svm
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

import warnings
warnings.filterwarnings('ignore')

### World Happiness Dataset 
Below is the raw World Happiness Index dataset as downloaded from the World Happiness Report (WHR). It contains multiple rows per country, one for each year surveyed (roughly 5-7 years), so we will first clean the dataset and keep only what we need.

In [46]:
world_happiness_df = pd.read_excel('./Datasets/Raw_Datasets/world_happiness/DataPanelWHR2021C2_1.xls',header=0)

In [47]:
display(world_happiness_df.head(15))

| | Country name | year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2008 | 3.72359 | 7.3701 | 0.450662 | 50.799999 | 0.718114 | 0.16764 | 0.881686 | 0.517637 | 0.258195 |
| 1 | Afghanistan | 2009 | 4.401778 | 7.539972 | 0.552308 | 51.200001 | 0.678896 | 0.190099 | 0.850035 | 0.583926 | 0.237092 |
| 2 | Afghanistan | 2010 | 4.758381 | 7.646709 | 0.539075 | 51.599998 | 0.600127 | 0.12059 | 0.706766 | 0.618265 | 0.275324 |
| 3 | Afghanistan | 2011 | 3.831719 | 7.619532 | 0.521104 | 51.919998 | 0.495901 | 0.162427 | 0.731109 | 0.611387 | 0.267175 |
| 4 | Afghanistan | 2012 | 3.782938 | 7.705479 | 0.520637 | 52.240002 | 0.530935 | 0.236032 | 0.77562 | 0.710385 | 0.267919 |
| 5 | Afghanistan | 2013 | 3.5721 | 7.725029 | 0.483552 | 52.560001 | 0.577955 | 0.061148 | 0.823204 | 0.620585 | 0.273328 |
| 6 | Afghanistan | 2014 | 3.130896 | 7.718354 | 0.525568 | 52.880001 | 0.508514 | 0.104013 | 0.871242 | 0.531691 | 0.374861 |
| 7 | Afghanistan | 2015 | 3.982855 | 7.701992 | 0.528597 | 53.200001 | 0.388928 | 0.079864 | 0.880638 | 0.553553 | 0.339276 |
| 8 | Afghanistan | 2016 | 4.220169 | 7.69656 | 0.559072 | 53.0 | 0.522566 | 0.042265 | 0.793246 | 0.564953 | 0.348332 |
| 9 | Afghanistan | 2017 | 2.661718 | 7.697381 | 0.49088 | 52.799999 | 0.427011 | -0.121303 | 0.954393 | 0.496349 | 0.371326 |
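
The preview shows one row per country per year. A minimal sketch of how one might count the rows per country and, if a single row per country is wanted, keep only the most recent year. The frame `toy_df` and its values are made up for illustration, and keeping the latest year is an assumption, not necessarily what this notebook's cleaning script does.

```python
import pandas as pd

# Toy stand-in for world_happiness_df (values are invented for illustration).
toy_df = pd.DataFrame({
    "Country name": ["A", "A", "A", "B", "B"],
    "year": [2008, 2009, 2010, 2015, 2016],
    "Life Ladder": [3.7, 4.4, 4.8, 6.0, 6.1],
})

# How many yearly rows does each country have?
rows_per_country = toy_df["Country name"].value_counts()

# Assumption: keep only the most recent year for each country.
latest_rows = (
    toy_df.sort_values("year")
    .drop_duplicates(subset="Country name", keep="last")
    .reset_index(drop=True)
)

print(rows_per_country)
print(latest_rows)
```

Sorting by `year` before `drop_duplicates(keep="last")` is what makes the surviving row the latest one for each country.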


We see that a couple of columns have null/empty values, so we have some data cleaning to do. A plus point is that all our columns (except Country name and year) are float types, which will make it easy to fill in the null values.
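
Because the measurement columns are floats, one possible way to fill the gaps is to impute each missing value with the mean of that column for the same country. The sketch below shows the idea on a made-up frame (`toy_df` and its values are hypothetical). It is only one option; this notebook itself goes on to drop incomplete rows instead.

```python
import numpy as np
import pandas as pd

# Toy stand-in for world_happiness_df (values are invented for illustration).
toy_df = pd.DataFrame({
    "Country name": ["A", "A", "A", "B", "B"],
    "year": [2013, 2014, 2015, 2014, 2015],
    "Life Ladder": [4.0, 4.2, 4.4, 6.0, 6.2],
    "Generosity": [0.10, np.nan, 0.30, np.nan, 0.50],
})

# Float-typed columns are the candidates for numeric imputation.
float_cols = toy_df.select_dtypes(include="float64").columns

# Replace each missing value with the mean of that column within the same country.
filled_df = toy_df.copy()
filled_df[float_cols] = (
    toy_df.groupby("Country name")[float_cols]
    .transform(lambda col: col.fillna(col.mean()))
)

print(filled_df)
```

A country with no observed value at all for a column would still be left with nulls under this scheme, so a fallback (for example the global column mean) may be needed.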

### Let's observe the null values for each column in the dataset focused on year 2015

In [48]:
world_happiness_df.isnull().sum(axis = 0)

Country name                          0
year                                  0
Life Ladder                           0
Log GDP per capita                   36
Social support                       13
Healthy life expectancy at birth     55
Freedom to make life choices         32
Generosity                           89
Perceptions of corruption           110
Positive affect                      22
Negative affect                      16
dtype: int64

##### It looks like our data has a lot of null values 
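
The counts above span every year in the file. If the focus is a single year such as 2015, one option is to filter the rows first and then count the nulls, optionally as a share of rows so that columns can be compared. This is a sketch on a made-up frame (`toy_df` is hypothetical), not output from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for world_happiness_df (values are invented for illustration).
toy_df = pd.DataFrame({
    "Country name": ["A", "A", "B", "B", "C"],
    "year": [2014, 2015, 2014, 2015, 2015],
    "Life Ladder": [4.0, 4.4, 6.0, 6.2, 5.1],
    "Generosity": [np.nan, 0.30, 0.20, np.nan, np.nan],
})

# Keep only the rows for the year of interest.
rows_2015 = toy_df[toy_df["year"] == 2015]

# Null count per column, for that year only.
null_counts_2015 = rows_2015.isnull().sum(axis=0)

# Fraction of missing values per column, for that year only.
null_share_2015 = rows_2015.isnull().mean(axis=0)

print(null_counts_2015)
print(null_share_2015)
```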

### Data Cleaning

In [49]:
world_happiness_df = world_happiness_df.dropna()
world_happiness_df.isnull().sum(axis = 0)

Country name                        0
year                                0
Life Ladder                         0
Log GDP per capita                  0
Social support                      0
Healthy life expectancy at birth    0
Freedom to make life choices        0
Generosity                          0
Perceptions of corruption           0
Positive affect                     0
Negative affect                     0
dtype: int64
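
`dropna()` removes every row that has at least one null, so it is worth checking how much data is lost. A minimal sketch, again on a made-up frame (`toy_df` is hypothetical), that reports the number of rows dropped and any countries that disappear entirely:

```python
import numpy as np
import pandas as pd

# Toy stand-in for world_happiness_df (values are invented for illustration).
toy_df = pd.DataFrame({
    "Country name": ["A", "A", "B", "C"],
    "year": [2014, 2015, 2015, 2015],
    "Life Ladder": [4.0, 4.4, 6.2, 5.1],
    "Generosity": [np.nan, 0.30, np.nan, 0.10],
})

# Drop every row that contains at least one null value.
cleaned_df = toy_df.dropna()

# Number of rows removed by the cleaning step.
rows_dropped = len(toy_df) - len(cleaned_df)

# Countries that no longer have any rows after cleaning.
countries_lost = set(toy_df["Country name"]) - set(cleaned_df["Country name"])

print(f"rows dropped: {rows_dropped}")
print(f"countries lost: {sorted(countries_lost)}")
```

If too many countries vanish, imputing values or dropping only the sparsest columns may be a better trade-off than dropping rows.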

In [53]:
path = os.getcwd()
print(path)
%run Scripts/data_cleaning.py

/home/ahsan/Semesters/8. Fall_2021/CSC460_DataScience/Project
/home/ahsan/Semesters/8. Fall_2021/CSC460_DataScience/Project
  Country name  year  Life Ladder  Log GDP per capita  Social support  \
0  Afghanistan  2008     3.723590            7.370100        0.450662   
1  Afghanistan  2009     4.401778            7.539972        0.552308   
2  Afghanistan  2010     4.758381            7.646709        0.539075   
3  Afghanistan  2011     3.831719            7.619532        0.521104   
4  Afghanistan  2012     3.782938            7.705479        0.520637   

   Healthy life expectancy at birth  Freedom to make life choices  Generosity  \
0                         50.799999                      0.718114    0.167640   
1                         51.200001                      0.678896    0.190099   
2                         51.599998                      0.600127    0.120590   
3                         51.919998                      0.495901    0.162427   
4                         52.240