Problem statement

In the year 2000, the member states of the United Nations agreed to a set of goals to measure the progress of global development. The aim of these goals was to increase standards of living around the world by emphasizing human capital, infrastructure, and human rights.

The UN measures progress towards these goals using indicators such as the percentage of the population making over one dollar per day. Your task is to predict the change in these indicators one year and five years into the future. Predicting future progress helps us understand how these goals can be achieved by uncovering complex relationships between the goals and other economic indicators. The UN set 2015 as the target for measurable progress. Given data from 1972 to 2007, you need to predict a specific indicator for each of these goals in 2008 and 2012.

Import libraries

In [26]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

Load and read files

In [27]:
#read files
#Reading train file:
train = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/UN_TrainingSet.csv')
#Reading sample submission file:
sample = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/UN_Submission.csv')

In [28]:
train

Unnamed: 0.1,Unnamed: 0,1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],1976 [YR1976],1977 [YR1977],1978 [YR1978],1979 [YR1979],1980 [YR1980],1981 [YR1981],1982 [YR1982],1983 [YR1983],1984 [YR1984],1985 [YR1985],1986 [YR1986],1987 [YR1987],1988 [YR1988],1989 [YR1989],1990 [YR1990],1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],2001 [YR2001],2002 [YR2002],2003 [YR2003],2004 [YR2004],2005 [YR2005],2006 [YR2006],2007 [YR2007],Country Name,Series Code,Series Name
0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.769214,Afghanistan,allsi.bi_q1,(%) Benefits held by 1st 20% population - All ...
1,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,7.027746,Afghanistan,allsp.bi_q1,(%) Benefits held by 1st 20% population - All ...
2,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8.244887,Afghanistan,allsa.bi_q1,(%) Benefits held by 1st 20% population - All ...
3,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,12.933105,Afghanistan,allsi.gen_pop,(%) Generosity of All Social Insurance
4,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,18.996814,Afghanistan,allsp.gen_pop,(%) Generosity of All Social Protection
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
195397,286113,,,,,,,,,,,,,,,,,,,,,,,,,,,,12.0,,,,,,,12.2,,Zimbabwe,SG.VAW.BURN.ZS,Women who believe a husband is justified in be...
195398,286114,,,,,,,,,,,,,,,,,,,,,,,,,,,,27.8,,,,,,,33.0,,Zimbabwe,SG.VAW.GOES.ZS,Women who believe a husband is justified in be...
195399,286115,,,,,,,,,,,,,,,,,,,,,,,,,,,,31.2,,,,,,,30.2,,Zimbabwe,SG.VAW.NEGL.ZS,Women who believe a husband is justified in be...
195400,286116,,,,,,,,,,,,,,,,,,,,,,,,,,,,22.3,,,,,,,24.3,,Zimbabwe,SG.VAW.REFU.ZS,Women who believe a husband is justified in be...


Sample

This file is important! It contains the rows that you need to predict and the format for the predictions. Use the IDs in this file to extract the time series to predict from the Training Set.
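As a minimal sketch of what "use the IDs to extract the time series" can look like, the snippet below filters a toy training frame down to the rows whose ID appears in a toy submission frame. The frames `toy_train` and `toy_sample` and all their values are made up for illustration; only the column names follow the files shown in this notebook.

```python
import pandas as pd

# Toy stand-ins for the training and submission frames (values are made up).
toy_train = pd.DataFrame({
    "Unnamed: 0": [10, 11, 12, 13],
    "2006 [YR2006]": [0.10, 0.20, 0.30, 0.40],
    "2007 [YR2007]": [0.11, 0.21, 0.31, 0.41],
})
toy_sample = pd.DataFrame({
    "Unnamed: 0": [11, 13],
    "2008 [YR2008]": [float("nan")] * 2,
    "2012 [YR2012]": [float("nan")] * 2,
})

# Keep only the training rows whose ID appears in the submission file.
to_predict = toy_train[toy_train["Unnamed: 0"].isin(toy_sample["Unnamed: 0"])]
print(to_predict)
```

Filtering with `isin` keeps the training frame's own row order, whereas a right merge follows the order of the submission file, which is usually what is wanted when predictions must line up with submission rows.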

In [29]:
sample

Unnamed: 0.1,Unnamed: 0,2008 [YR2008],2012 [YR2012]
0,559,,
1,618,,
2,753,,
3,1030,,
4,1896,,
...,...,...,...
732,284474,,
733,285340,,
734,285399,,
735,285534,,


Merge sample with train

In [30]:
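# Keep only the rows listed in the submission file; 'Unnamed: 0' is the shared ID column
Train = pd.merge(train, sample, on='Unnamed: 0', how='right')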
Train

Unnamed: 0.1,Unnamed: 0,1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],1976 [YR1976],1977 [YR1977],1978 [YR1978],1979 [YR1979],1980 [YR1980],1981 [YR1981],1982 [YR1982],1983 [YR1983],1984 [YR1984],1985 [YR1985],1986 [YR1986],1987 [YR1987],1988 [YR1988],1989 [YR1989],1990 [YR1990],1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],2001 [YR2001],2002 [YR2002],2003 [YR2003],2004 [YR2004],2005 [YR2005],2006 [YR2006],2007 [YR2007],Country Name,Series Code,Series Name,2008 [YR2008],2012 [YR2012]
0,559,,,,,,,,,,,,,,,,,,,,0.0480,0.0490,0.0490,0.049000,0.049000,0.084000,0.118000,0.152000,0.187000,0.221000,0.256000,0.291000,0.325000,0.360000,0.395000,0.430000,0.4650,Afghanistan,7.8,Ensure environmental sustainability,,
1,618,,,,,,,,,,,,,,,,,,,0.0000,,,,,,,,,,,0.000047,0.000046,0.000879,0.001058,0.012241,0.021071,0.0190,Afghanistan,8.16,Develop a global partnership for development: ...,,
2,753,0.2960,0.2909,0.2852,0.2798,0.2742,0.2683,0.2624,0.2565,0.2503,0.2439,0.2374,0.2304,0.2229,0.2151,0.2071,0.1993,0.1914,0.1836,0.1762,0.1693,0.1627,0.1571,0.152100,0.147900,0.144600,0.141700,0.139100,0.136600,0.133900,0.131000,0.127700,0.124400,0.121000,0.117700,0.114500,0.1115,Afghanistan,4.1,Reduce child mortality,,
3,1030,,,,,,,,,,,,,,,,,,,0.0010,0.0010,0.0010,0.0010,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.0010,Afghanistan,6.1,Combat HIV/AIDS,,
4,1896,,,,,,,,,,,,,,,,,,,,,,,0.964000,0.964000,0.965000,0.965000,0.965000,0.965000,0.965000,0.964000,0.964000,0.963000,0.963000,0.962000,0.962000,0.9610,Albania,7.8,Ensure environmental sustainability,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
732,284474,,,,,,,,,,,,,,,,,,,0.1040,0.1170,0.1290,0.1380,0.146000,0.151000,0.155000,0.156000,0.156000,0.155000,0.153000,0.151000,0.147000,0.144000,0.141000,0.138000,0.136000,0.1340,Zambia,6.1,Combat HIV/AIDS,,
733,285340,,,,,,,,,,,,,,,,,,,0.7920,0.7930,0.7940,0.7940,0.794000,0.795000,0.795000,0.795000,0.795000,0.795000,0.795000,0.796000,0.796000,0.796000,0.796000,0.797000,0.797000,0.7970,Zimbabwe,7.8,Ensure environmental sustainability,,
734,285399,,,,,,,,,,,,,,,,,,,0.0000,,,,0.000017,0.000077,0.000168,0.000331,0.000816,0.001617,0.004014,0.007998,0.039944,0.063948,0.065640,0.080160,0.097918,0.1085,Zimbabwe,8.16,Develop a global partnership for development: ...,,
735,285534,0.1087,0.1083,0.1081,0.1081,0.1080,0.1077,0.1069,0.1053,0.1024,0.0984,0.0935,0.0883,0.0833,0.0789,0.0755,0.0732,0.0720,0.0722,0.0740,0.0771,0.0811,0.0857,0.090100,0.094400,0.098000,0.100600,0.102000,0.102500,0.102300,0.101400,0.100100,0.098500,0.097300,0.096600,0.095800,0.0960,Zimbabwe,4.1,Reduce child mortality,,


In [31]:
train = Train.drop(['2008 [YR2008]', '2012 [YR2012]'], axis=1)
train

Unnamed: 0.1,Unnamed: 0,1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],1976 [YR1976],1977 [YR1977],1978 [YR1978],1979 [YR1979],1980 [YR1980],1981 [YR1981],1982 [YR1982],1983 [YR1983],1984 [YR1984],1985 [YR1985],1986 [YR1986],1987 [YR1987],1988 [YR1988],1989 [YR1989],1990 [YR1990],1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],2001 [YR2001],2002 [YR2002],2003 [YR2003],2004 [YR2004],2005 [YR2005],2006 [YR2006],2007 [YR2007],Country Name,Series Code,Series Name
0,559,,,,,,,,,,,,,,,,,,,,0.0480,0.0490,0.0490,0.049000,0.049000,0.084000,0.118000,0.152000,0.187000,0.221000,0.256000,0.291000,0.325000,0.360000,0.395000,0.430000,0.4650,Afghanistan,7.8,Ensure environmental sustainability
1,618,,,,,,,,,,,,,,,,,,,0.0000,,,,,,,,,,,0.000047,0.000046,0.000879,0.001058,0.012241,0.021071,0.0190,Afghanistan,8.16,Develop a global partnership for development: ...
2,753,0.2960,0.2909,0.2852,0.2798,0.2742,0.2683,0.2624,0.2565,0.2503,0.2439,0.2374,0.2304,0.2229,0.2151,0.2071,0.1993,0.1914,0.1836,0.1762,0.1693,0.1627,0.1571,0.152100,0.147900,0.144600,0.141700,0.139100,0.136600,0.133900,0.131000,0.127700,0.124400,0.121000,0.117700,0.114500,0.1115,Afghanistan,4.1,Reduce child mortality
3,1030,,,,,,,,,,,,,,,,,,,0.0010,0.0010,0.0010,0.0010,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.0010,Afghanistan,6.1,Combat HIV/AIDS
4,1896,,,,,,,,,,,,,,,,,,,,,,,0.964000,0.964000,0.965000,0.965000,0.965000,0.965000,0.965000,0.964000,0.964000,0.963000,0.963000,0.962000,0.962000,0.9610,Albania,7.8,Ensure environmental sustainability
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
732,284474,,,,,,,,,,,,,,,,,,,0.1040,0.1170,0.1290,0.1380,0.146000,0.151000,0.155000,0.156000,0.156000,0.155000,0.153000,0.151000,0.147000,0.144000,0.141000,0.138000,0.136000,0.1340,Zambia,6.1,Combat HIV/AIDS
733,285340,,,,,,,,,,,,,,,,,,,0.7920,0.7930,0.7940,0.7940,0.794000,0.795000,0.795000,0.795000,0.795000,0.795000,0.795000,0.796000,0.796000,0.796000,0.796000,0.797000,0.797000,0.7970,Zimbabwe,7.8,Ensure environmental sustainability
734,285399,,,,,,,,,,,,,,,,,,,0.0000,,,,0.000017,0.000077,0.000168,0.000331,0.000816,0.001617,0.004014,0.007998,0.039944,0.063948,0.065640,0.080160,0.097918,0.1085,Zimbabwe,8.16,Develop a global partnership for development: ...
735,285534,0.1087,0.1083,0.1081,0.1081,0.1080,0.1077,0.1069,0.1053,0.1024,0.0984,0.0935,0.0883,0.0833,0.0789,0.0755,0.0732,0.0720,0.0722,0.0740,0.0771,0.0811,0.0857,0.090100,0.094400,0.098000,0.100600,0.102000,0.102500,0.102300,0.101400,0.100100,0.098500,0.097300,0.096600,0.095800,0.0960,Zimbabwe,4.1,Reduce child mortality


Create variables

In [32]:
ID = train['Unnamed: 0']
country_name = train['Country Name']
series_code = train['Series Code']
series_name = train['Series Name']

train = train.drop(['Unnamed: 0','Country Name', 'Series Code', 'Series Name'],axis=1)
train

Unnamed: 0,1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],1976 [YR1976],1977 [YR1977],1978 [YR1978],1979 [YR1979],1980 [YR1980],1981 [YR1981],1982 [YR1982],1983 [YR1983],1984 [YR1984],1985 [YR1985],1986 [YR1986],1987 [YR1987],1988 [YR1988],1989 [YR1989],1990 [YR1990],1991 [YR1991],1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],1998 [YR1998],1999 [YR1999],2000 [YR2000],2001 [YR2001],2002 [YR2002],2003 [YR2003],2004 [YR2004],2005 [YR2005],2006 [YR2006],2007 [YR2007]
0,,,,,,,,,,,,,,,,,,,,0.0480,0.0490,0.0490,0.049000,0.049000,0.084000,0.118000,0.152000,0.187000,0.221000,0.256000,0.291000,0.325000,0.360000,0.395000,0.430000,0.4650
1,,,,,,,,,,,,,,,,,,,0.0000,,,,,,,,,,,0.000047,0.000046,0.000879,0.001058,0.012241,0.021071,0.0190
2,0.2960,0.2909,0.2852,0.2798,0.2742,0.2683,0.2624,0.2565,0.2503,0.2439,0.2374,0.2304,0.2229,0.2151,0.2071,0.1993,0.1914,0.1836,0.1762,0.1693,0.1627,0.1571,0.152100,0.147900,0.144600,0.141700,0.139100,0.136600,0.133900,0.131000,0.127700,0.124400,0.121000,0.117700,0.114500,0.1115
3,,,,,,,,,,,,,,,,,,,0.0010,0.0010,0.0010,0.0010,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.001000,0.0010
4,,,,,,,,,,,,,,,,,,,,,,,0.964000,0.964000,0.965000,0.965000,0.965000,0.965000,0.965000,0.964000,0.964000,0.963000,0.963000,0.962000,0.962000,0.9610
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
732,,,,,,,,,,,,,,,,,,,0.1040,0.1170,0.1290,0.1380,0.146000,0.151000,0.155000,0.156000,0.156000,0.155000,0.153000,0.151000,0.147000,0.144000,0.141000,0.138000,0.136000,0.1340
733,,,,,,,,,,,,,,,,,,,0.7920,0.7930,0.7940,0.7940,0.794000,0.795000,0.795000,0.795000,0.795000,0.795000,0.795000,0.796000,0.796000,0.796000,0.796000,0.797000,0.797000,0.7970
734,,,,,,,,,,,,,,,,,,,0.0000,,,,0.000017,0.000077,0.000168,0.000331,0.000816,0.001617,0.004014,0.007998,0.039944,0.063948,0.065640,0.080160,0.097918,0.1085
735,0.1087,0.1083,0.1081,0.1081,0.1080,0.1077,0.1069,0.1053,0.1024,0.0984,0.0935,0.0883,0.0833,0.0789,0.0755,0.0732,0.0720,0.0722,0.0740,0.0771,0.0811,0.0857,0.090100,0.094400,0.098000,0.100600,0.102000,0.102500,0.102300,0.101400,0.100100,0.098500,0.097300,0.096600,0.095800,0.0960


In [33]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 737 entries, 0 to 736
Data columns (total 36 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   1972 [YR1972]  172 non-null    float64
 1   1973 [YR1973]  170 non-null    float64
 2   1974 [YR1974]  170 non-null    float64
 3   1975 [YR1975]  179 non-null    float64
 4   1976 [YR1976]  186 non-null    float64
 5   1977 [YR1977]  192 non-null    float64
 6   1978 [YR1978]  195 non-null    float64
 7   1979 [YR1979]  200 non-null    float64
 8   1980 [YR1980]  201 non-null    float64
 9   1981 [YR1981]  213 non-null    float64
 10  1982 [YR1982]  219 non-null    float64
 11  1983 [YR1983]  219 non-null    float64
 12  1984 [YR1984]  226 non-null    float64
 13  1985 [YR1985]  229 non-null    float64
 14  1986 [YR1986]  235 non-null    float64
 15  1987 [YR1987]  227 non-null    float64
 16  1988 [YR1988]  230 non-null    float64
 17  1989 [YR1989]  231 non-null    float64
 18  1990 [YR19

In [34]:
train.shape

(737, 36)

Impute missing values

In [35]:
# explicitly require this experimental feature
from sklearn.experimental import enable_iterative_imputer  # noqa
# now you can import normally from sklearn.impute
from sklearn.impute import IterativeImputer

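imp = IterativeImputer(random_state=1)

# fit_transform returns a NumPy array, so train is no longer a DataFrame after this line
train = imp.fit_transform(train)
# the imputer can produce negative estimates; clip them to 0
train[train < 0] = 0
train.shape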

(737, 36)
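To make the imputation step easier to inspect, here is a small sketch on a made-up 4x4 matrix (rows as series, columns as years). It assumes the same `IterativeImputer` settings as above and checks that observed values pass through unchanged while every gap gets filled; the array `toy` is purely illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy matrix: rows are series, columns are years; NaN marks a missing year.
toy = np.array([
    [0.10, 0.20, np.nan, 0.40],
    [np.nan, 0.30, 0.40, 0.50],
    [0.20, np.nan, 0.40, 0.50],
    [0.05, 0.15, 0.25, np.nan],
])

imputer = IterativeImputer(random_state=1)
filled = imputer.fit_transform(toy)

# The imputer is unconstrained, so clip at zero as the notebook does.
filled = np.clip(filled, 0, None)

# Observed entries should be untouched; only the NaN cells are estimated.
observed = ~np.isnan(toy)
print(np.allclose(filled[observed], toy[observed]))
print(np.isnan(filled).any())
```

`IterativeImputer` also accepts a `min_value` argument, which could enforce the floor during imputation rather than afterwards; whether that changes the results on this dataset has not been checked here.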

In [36]:
train

array([[0.        , 0.07481325, 0.        , ..., 0.395     , 0.43      ,
        0.465     ],
       [0.27305771, 0.17597141, 0.22986594, ..., 0.01224148, 0.02107124,
        0.019     ],
       [0.296     , 0.2909    , 0.2852    , ..., 0.1177    , 0.1145    ,
        0.1115    ],
       ...,
       [0.07749425, 0.12067247, 0.14993393, ..., 0.08015978, 0.09791841,
        0.1085    ],
       [0.1087    , 0.1083    , 0.1081    , ..., 0.0966    , 0.0958    ,
        0.096     ],
       [0.29635569, 0.16406746, 0.24595407, ..., 0.184     , 0.173     ,
        0.164     ]])

Define X, y and X_test

In [37]:
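# Split the imputed matrix into sliding windows
# Targets: the last 5 columns (2003-2007)
y_train = train[:,-5:]
# Training inputs: every column except the last 5 (1972-2002)
X_train = train[:, :-5]
# Test inputs: every column except the first 5 (1977-2007), so the model's 5 outputs correspond to 2008-2012
X_test = train[:,5:]
# Check the shapes
print(X_train.shape,y_train.shape,X_test.shape)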

(737, 31) (737, 5) (737, 31)
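The split above is a sliding window: the model learns to map a 31-year window to the following 5 years, and the test window is the same width shifted 5 years forward. A small sketch with a made-up 8-column matrix (so the windows are 3 wide) shows the alignment; `toy` and `horizon` are illustrative names.

```python
import numpy as np

# Toy matrix: 2 series over 8 consecutive "years" (2000..2007), values made up.
years = np.arange(2000, 2008)
toy = np.vstack([years - 2000, (years - 2000) * 10]).astype(float)

horizon = 5  # number of future steps the model learns to predict

X_train = toy[:, :-horizon]  # earliest window: years 2000..2002
y_train = toy[:, -horizon:]  # the 5 years that follow it: 2003..2007
X_test = toy[:, horizon:]    # same width, shifted forward: 2005..2007

print(X_train.shape, y_train.shape, X_test.shape)
print(X_train[0], y_train[0], X_test[0])
```

Because `X_test` ends at the last observed year, the five outputs predicted from it correspond to the five years after that year.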


In [38]:
X_train

array([[0.00000000e+00, 7.48132539e-02, 0.00000000e+00, ...,
        2.21000000e-01, 2.56000000e-01, 2.91000000e-01],
       [2.73057708e-01, 1.75971411e-01, 2.29865944e-01, ...,
        5.50137652e-02, 4.72000000e-05, 4.56000000e-05],
       [2.96000000e-01, 2.90900000e-01, 2.85200000e-01, ...,
        1.33900000e-01, 1.31000000e-01, 1.27700000e-01],
       ...,
       [7.74942520e-02, 1.20672465e-01, 1.49933927e-01, ...,
        4.01433500e-03, 7.99846000e-03, 3.99435610e-02],
       [1.08700000e-01, 1.08300000e-01, 1.08100000e-01, ...,
        1.02300000e-01, 1.01400000e-01, 1.00100000e-01],
       [2.96355687e-01, 1.64067458e-01, 2.45954066e-01, ...,
        2.57000000e-01, 2.43000000e-01, 2.28000000e-01]])

In [39]:
y_train

array([[0.325     , 0.36      , 0.395     , 0.43      , 0.465     ],
       [0.00087891, 0.00105809, 0.01224148, 0.02107124, 0.019     ],
       [0.1244    , 0.121     , 0.1177    , 0.1145    , 0.1115    ],
       ...,
       [0.06394787, 0.06564045, 0.08015978, 0.09791841, 0.1085    ],
       [0.0985    , 0.0973    , 0.0966    , 0.0958    , 0.096     ],
       [0.213     , 0.198     , 0.184     , 0.173     , 0.164     ]])

In [40]:
X_test

array([[0.02248484, 0.06614952, 0.04085576, ..., 0.395     , 0.43      ,
        0.465     ],
       [0.05795554, 0.08523337, 0.06270475, ..., 0.01224148, 0.02107124,
        0.019     ],
       [0.2683    , 0.2624    , 0.2565    , ..., 0.1177    , 0.1145    ,
        0.1115    ],
       ...,
       [0.06585548, 0.03505771, 0.00606963, ..., 0.08015978, 0.09791841,
        0.1085    ],
       [0.1077    , 0.1069    , 0.1053    , ..., 0.0966    , 0.0958    ,
        0.096     ],
       [0.29248319, 0.30760464, 0.3216563 , ..., 0.184     , 0.173     ,
        0.164     ]])

Define model

Install CatBoost

In [41]:
!pip install catboost



In [42]:
from sklearn.multioutput import MultiOutputRegressor
from catboost import CatBoostRegressor

model = MultiOutputRegressor(
    CatBoostRegressor(iterations=1000, depth=5, learning_rate=0.1, loss_function='RMSE')
).fit(X_train, y_train)
# R^2 on the training rows (an in-sample score, not a validation score)
print(model.score(X_train, y_train))

Streaming output truncated to the last 5000 lines.
1:	learn: 0.3170775	total: 12.4ms	remaining: 6.19s
2:	learn: 0.2892329	total: 17.7ms	remaining: 5.88s
3:	learn: 0.2631985	total: 22.9ms	remaining: 5.71s
4:	learn: 0.2411323	total: 28.1ms	remaining: 5.6s
5:	learn: 0.2198672	total: 33.4ms	remaining: 5.54s
6:	learn: 0.2007718	total: 38.8ms	remaining: 5.5s
7:	learn: 0.1838880	total: 44.2ms	remaining: 5.48s
8:	learn: 0.1685876	total: 49.4ms	remaining: 5.44s
9:	learn: 0.1543477	total: 54.9ms	remaining: 5.43s
10:	learn: 0.1413773	total: 60.2ms	remaining: 5.41s
11:	learn: 0.1305823	total: 65.5ms	remaining: 5.4s
12:	learn: 0.1201431	total: 70.8ms	remaining: 5.37s
13:	learn: 0.1112723	total: 75.9ms	remaining: 5.34s
14:	learn: 0.1032243	total: 81.2ms	remaining: 5.33s
15:	learn: 0.0953084	total: 86.3ms	remaining: 5.3s
16:	learn: 0.0883660	total: 91.6ms	remaining: 5.3s
17:	learn: 0.0819069	total: 97ms	remaining: 5.29s
18:	learn: 0.0763507	total: 102ms	remaining: 5.27s
19:	learn: 0.071
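The score printed by the cell above is computed on the same rows the model was fit on, so it says little about out-of-sample accuracy. One way to get a rougher but more honest estimate is to hold out some series before fitting. The sketch below does this on synthetic data and swaps in scikit-learn's `GradientBoostingRegressor` for CatBoost so that it runs without extra installs; the data, the 25% split, and the estimator settings are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the imputed matrix: 200 smooth series of 36 "years".
start = rng.uniform(0.0, 1.0, size=(200, 1))
slope = rng.normal(0.0, 0.01, size=(200, 1))
series = np.clip(start + slope * np.arange(36), 0.0, 1.0)

# Same windowing as the notebook: first 31 columns in, last 5 columns out.
X, y = series[:, :-5], series[:, -5:]
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Stand-in estimator (not CatBoost), one regressor per output column.
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=50, random_state=0))
model.fit(X_fit, y_fit)

# Evaluate only on the series the model never saw during fitting.
val_pred = model.predict(X_val)
val_rmse = float(np.sqrt(mean_squared_error(y_val, val_pred)))
val_r2 = model.score(X_val, y_val)
print(f"validation RMSE={val_rmse:.4f}, R^2={val_r2:.4f}")
```

The same pattern should carry over to `CatBoostRegressor` by replacing the inner estimator; passing `verbose=0` to it would also silence the long training log.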

Predict on train set

In [43]:
y_pred = model.predict(X_train)
y_pred

array([[3.24068768e-01, 3.59222923e-01, 3.93938540e-01, 4.29746913e-01,
        4.66198575e-01],
       [4.43176005e-04, 1.09602222e-03, 1.26049176e-02, 2.08304881e-02,
        1.98364503e-02],
       [1.24166400e-01, 1.20086506e-01, 1.15847987e-01, 1.08206895e-01,
        1.05097200e-01],
       ...,
       [6.34322784e-02, 6.47090516e-02, 8.01255309e-02, 9.68656952e-02,
        1.07772423e-01],
       [9.86841859e-02, 9.80663411e-02, 9.68721776e-02, 9.49470271e-02,
        9.48479125e-02],
       [2.13891299e-01, 1.98723532e-01, 1.84920731e-01, 1.74423604e-01,
        1.65516167e-01]])

Predict on test set

In [44]:
prediction = model.predict(X_test)
prediction[prediction < 0] = 0
prediction

array([[0.46327011, 0.47532245, 0.48016062, 0.52545648, 0.53154791],
       [0.03026232, 0.04019702, 0.06155066, 0.09174632, 0.12284057],
       [0.09458579, 0.08359615, 0.0846497 , 0.09178668, 0.08419369],
       ...,
       [0.14684242, 0.19310964, 0.23185903, 0.1992065 , 0.23686053],
       [0.09781427, 0.09026927, 0.08403843, 0.07943116, 0.0738292 ],
       [0.13307629, 0.13938631, 0.13932209, 0.12198894, 0.10591098]])
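Since the test window ends at 2007, the five prediction columns can be read as 2008 through 2012 in order. A short sketch of that bookkeeping follows, using a made-up two-row array in place of `prediction`; the names `forecast_years` and `col_of` are hypothetical helpers.

```python
import numpy as np

last_observed_year = 2007
horizon = 5

# Years covered by the model's output columns, in column order.
forecast_years = [last_observed_year + k for k in range(1, horizon + 1)]
col_of = {year: i for i, year in enumerate(forecast_years)}

# Made-up predictions: one row per series, one column per forecast year.
toy_prediction = np.array([
    [0.46, 0.48, 0.48, 0.53, 0.53],
    [0.03, 0.04, 0.06, 0.09, 0.12],
])

yr_2008 = toy_prediction[:, col_of[2008]]  # first column
yr_2012 = toy_prediction[:, col_of[2012]]  # last column
print(forecast_years, col_of[2008], col_of[2012])
```

Looking columns up by year rather than by a literal index makes it harder to pick the wrong horizon if the window length changes.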

In [45]:
yr_2008 = prediction[:,0]
yr_2008

array([4.63270105e-01, 3.02623202e-02, 9.45857900e-02, 4.94667609e-03,
       9.53672108e-01, 1.64668725e-01, 2.27544238e-02, 9.57093833e-01,
       8.43054652e-01, 9.10298730e-02, 2.68525818e-02, 6.22195703e-01,
       2.99326094e-03, 5.26399513e-01, 1.10677869e-02, 1.91064390e-01,
       1.60581576e-02, 7.53115458e-01, 9.68202225e-01, 4.62731725e-01,
       1.01306564e-02, 9.70966164e-01, 3.05296441e-01, 1.56562743e-02,
       6.75155926e-03, 9.67199135e-01, 7.13647825e-02, 2.27792834e-02,
       4.94667609e-03, 9.58200272e-01, 3.98413074e-01, 9.48238045e-01,
       5.83121630e-01, 5.49450542e-03, 6.48400596e-01, 4.77277654e-03,
       8.49115606e-01, 8.43515923e-01, 2.39256328e-01, 4.48575679e-02,
       5.57240892e-03, 9.70927611e-01, 2.98820538e-01, 1.37804346e-02,
       3.09231765e-02, 3.89947355e-01, 9.80443557e-03, 8.70963041e-01,
       0.00000000e+00, 6.10224853e-02, 4.94667609e-03, 9.84714615e-01,
       6.87037998e-01, 1.51182241e-02, 1.40283732e-02, 9.80695830e-01,
      

In [46]:
yr_2012 = prediction[:,4]
yr_2012

array([5.31547910e-01, 1.22840574e-01, 8.41936938e-02, 2.33461604e-03,
       9.78082467e-01, 2.19137549e-01, 2.90548589e-02, 9.72901466e-01,
       8.52397764e-01, 1.73758146e-01, 3.97447133e-02, 6.61856251e-01,
       3.89022765e-03, 5.78780312e-01, 1.54161382e-04, 1.75140589e-01,
       1.19026134e-02, 7.65842377e-01, 9.87607010e-01, 6.14391094e-01,
       5.23390694e-03, 9.86262024e-01, 3.59478428e-01, 1.49596916e-02,
       0.00000000e+00, 9.84853820e-01, 1.72002787e-01, 2.24893525e-02,
       2.33461604e-03, 9.68359969e-01, 4.94653971e-01, 9.72616340e-01,
       6.44825409e-01, 5.08504784e-03, 6.69107119e-01, 4.85133484e-03,
       8.43037762e-01, 8.47316736e-01, 3.29671842e-01, 4.17092645e-02,
       9.37089721e-04, 9.89908308e-01, 4.24320725e-01, 1.59003566e-02,
       2.66820057e-02, 5.37435591e-01, 1.03679515e-02, 8.70772935e-01,
       0.00000000e+00, 5.08218119e-02, 2.33461604e-03, 9.92869513e-01,
       8.20413140e-01, 1.55939385e-02, 1.80059752e-03, 9.94728452e-01,
      

Prepare submission

In [47]:
# creating dataframe with required columns 
submission = pd.DataFrame({'Unnamed: 0':ID,'2008 [YR2008]':yr_2008,'2012 [YR2012]': yr_2012 })
# creating csv file from dataframe
submission.to_csv('submission.csv',index = False)
submission

Unnamed: 0.1,Unnamed: 0,2008 [YR2008],2012 [YR2012]
0,559,0.463270,0.531548
1,618,0.030262,0.122841
2,753,0.094586,0.084194
3,1030,0.004947,0.002335
4,1896,0.953672,0.978082
...,...,...,...
732,284474,0.116520,0.117809
733,285340,0.829426,0.835609
734,285399,0.146842,0.236861
735,285534,0.097814,0.073829
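Before uploading, it can help to run a few sanity checks on the submission frame. The sketch below builds a three-row toy submission (IDs and values are made up) and checks the column names, ID uniqueness, and the absence of missing or negative predictions. The specific checks are suggestions, not requirements stated by the competition.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for ID, yr_2008 and yr_2012 (values are made up).
toy_ids = pd.Series([559, 618, 753], name="Unnamed: 0")
toy_2008 = np.array([0.46, 0.03, 0.09])
toy_2012 = np.array([0.53, 0.12, 0.08])

submission = pd.DataFrame({
    "Unnamed: 0": toy_ids.to_numpy(),
    "2008 [YR2008]": toy_2008,
    "2012 [YR2012]": toy_2012,
})

expected_columns = ["Unnamed: 0", "2008 [YR2008]", "2012 [YR2012]"]

# Column names and order match the sample submission file.
assert list(submission.columns) == expected_columns
# Every row to predict appears exactly once.
assert submission["Unnamed: 0"].is_unique
# No prediction is missing or negative.
assert not submission.isna().any().any()
assert (submission[expected_columns[1:]] >= 0).all().all()
print(submission)
```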
