## Gaussian Process Regression

### Preprocessing the Monthly Aggregated Data

First we import the necessary packages. We then define the *create_lag* function and use it to build the dataset with time lags.



In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt

In [2]:
import os
cwd = os.getcwd()
os.chdir('C:\\Users\\ekoulier\\Desktop\\GGD\\New Data\\Central_Folder')
data = pd.read_csv('br_regions_gt.csv')
data = data[['HVB', 'Date', 'WB', 'BZO', 'Trends']]
os.chdir(cwd)

In [3]:
data.head()

Unnamed: 0,HVB,Date,WB,BZO,Trends
0,29,2004-01,3,20,0
1,26,2004-02,12,11,0
2,43,2004-03,5,23,57
3,22,2004-04,8,44,47
4,41,2004-05,18,39,0


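We define the create_lag function to build the dataset that contains the time lags. For each of 'HVB', 'WB', 'BZO' and 'Trends' it adds n_lags lagged columns, plus a one-step-ahead target column for every series except 'Trends'.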

In [4]:
def create_lag(df, n_lags):
    """
    Return a copy of df with time-lag columns for 'HVB', 'WB', 'BZO' and 'Trends'.

    For each column, adds '<col>-j' (the value j months earlier, j = 1..n_lags)
    and '<col>+1' (the value one month later, used as the forecast target).
    """
    
    assert isinstance(n_lags, int)

    # Work on a copy so that a sliced input frame is not modified in place.
    df = df.copy()
 
    for i in ['HVB', 'WB', 'BZO', 'Trends']:
        for j in range(1, n_lags + 1):
            df[i+'-'+str(j)] = df[i].shift(j)
        df[i+'+1'] = df[i].shift(-1)
        
    # We don't need to forecast the Google Trends series.
    del df['Trends+1']
    
    # The shifts create NaNs, so we drop the first n_lags rows and the last row.
    df = df[n_lags:].reset_index(drop = True)
    df = df[:-1]
    
    return df
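
To make the effect of `shift` concrete, here is a minimal sketch on a made-up five-row series. The values and the `toy` and `complete` names are assumptions for illustration only, and `dropna` is used here in place of the explicit slicing in create_lag.

```python
import pandas as pd

# Toy series standing in for one of the real columns (values are made up).
toy = pd.DataFrame({"HVB": [10, 20, 30, 40, 50]})

toy["HVB-1"] = toy["HVB"].shift(1)   # value from the previous row (one lag)
toy["HVB+1"] = toy["HVB"].shift(-1)  # value from the next row (forecast target)

# NaNs appear at both ends: the first row has no lag, the last row has no target.
complete = toy.dropna().reset_index(drop=True)
print(complete)
```

With one lag, one row is lost at the start and one at the end, which is why create_lag drops the first n_lags rows and the last row.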

In [5]:
data2 = data.copy()
data2 = data2[18:]
data2 = create_lag(data2, 4)

In [6]:
data2['Date'] = pd.to_datetime(data2['Date'], format = '%Y-%m')

In [7]:
data2.tail(n = 5)

Unnamed: 0,HVB,Date,WB,BZO,Trends,HVB-1,HVB-2,HVB-3,HVB-4,HVB+1,...,WB+1,BZO-1,BZO-2,BZO-3,BZO-4,BZO+1,Trends-1,Trends-2,Trends-3,Trends-4
140,59,2017-07-01,17,18,17,80.0,39.0,53.0,61.0,44.0,...,24.0,13.0,21.0,12.0,4.0,12.0,19.0,19.0,23.0,15.0
141,44,2017-08-01,24,12,8,59.0,80.0,39.0,53.0,58.0,...,36.0,18.0,13.0,21.0,12.0,11.0,17.0,19.0,19.0,23.0
142,58,2017-09-01,36,11,19,44.0,59.0,80.0,39.0,27.0,...,22.0,12.0,18.0,13.0,21.0,10.0,8.0,17.0,19.0,19.0
143,27,2017-10-01,22,10,21,58.0,44.0,59.0,80.0,23.0,...,23.0,11.0,12.0,18.0,13.0,10.0,19.0,8.0,17.0,19.0
144,23,2017-11-01,23,10,19,27.0,58.0,44.0,59.0,21.0,...,16.0,10.0,11.0,12.0,18.0,13.0,21.0,19.0,8.0,17.0


The DataFrame is ready. <br/>
Now we define the features X and the target y. We then use train_test_split to separate the data into a training part and a test part. <br/>
We apply train_test_split a second time to the training part to create a validation set.
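
As a minimal sketch of this two-stage split, the snippet below uses a made-up frame of 100 ordered rows. The frame and the `toy_*` names are assumptions, and `shuffle = False` is chosen so that the time order is preserved.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up, time-ordered data: the row index plays the role of the month.
toy_X = pd.DataFrame({"feature": np.arange(100)})
toy_y = pd.Series(np.arange(100) * 2.0, name="target")

# Stage 1: hold out the most recent 33% of rows as the test set.
toy_X_T, toy_X_test, toy_y_T, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.33, shuffle=False)

# Stage 2: hold out the most recent 15% of the remainder as the validation set.
toy_X_train, toy_X_val, toy_y_train, toy_y_val = train_test_split(
    toy_X_T, toy_y_T, test_size=0.15, shuffle=False)

print(len(toy_X_train), len(toy_X_val), len(toy_X_test))
```

Without shuffling, the three parts are contiguous in time: training rows come first, validation rows next, and test rows last, so no future observation leaks into training.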

### Multi-Layer Perceptron Architectures

#### Using two time lags.

We first test our network using two time lags. We will decide later if we should add more features. <br/>
Before anything else, we initialize the NumPy seed.
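
A minimal sketch of seeding NumPy's global random number generator follows. The seed value 0 and the `SEED`, `first` and `second` names are assumptions, not the values used in this notebook.

```python
import numpy as np

SEED = 0  # hypothetical value; any fixed integer gives reproducible draws

np.random.seed(SEED)
first = np.random.rand(3)

# Re-seeding with the same value reproduces the same sequence of draws.
np.random.seed(SEED)
second = np.random.rand(3)

print(np.array_equal(first, second))
```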

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error
from keras.optimizers import Adam
from keras.losses import mean_squared_error


Using TensorFlow backend.


In [9]:
X = data2[['HVB', 'HVB-1', 'HVB-2', 'HVB-3',  
          'WB', 'WB-1', 'WB-2', 'WB-3', 
          'BZO', 'BZO-1', 'BZO-2', 'BZO-3', 
          'Trends', 'Trends-1']]

y = data2[['HVB+1', 'WB+1', 'BZO+1']]

In [12]:
X_T, X_test, y_T, y_test = train_test_split(X, y, test_size = 0.33, shuffle = False)
X_train, X_val, y_train, y_val = train_test_split(X_T, y_T, test_size = 0.15, shuffle = False)

In [13]:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, WhiteKernel, RationalQuadratic, ExpSineSquared
)
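
To illustrate how these kernel classes might be combined, here is a minimal sketch that fits a GaussianProcessRegressor on synthetic data. The noisy sine data, the kernel composition and its starting hyperparameters, and the `*_demo` names are all assumptions for illustration, not the model used in this notebook.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic 1-D data: a sine wave plus a little Gaussian noise.
rng = np.random.RandomState(0)
X_demo = np.linspace(0, 10, 40).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + 0.1 * rng.randn(40)

# A smooth RBF component plus a WhiteKernel that models observation noise.
kernel_demo = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

gpr_demo = GaussianProcessRegressor(kernel=kernel_demo, normalize_y=True,
                                    random_state=0)
gpr_demo.fit(X_demo, y_demo)  # kernel hyperparameters are optimized during fit

# Predictive mean and standard deviation at the training inputs.
mean_demo, std_demo = gpr_demo.predict(X_demo, return_std=True)
print(gpr_demo.kernel_)  # the kernel with its fitted hyperparameters
```

The returned standard deviation is what distinguishes a Gaussian process from a point forecaster: each prediction comes with an uncertainty estimate.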