# Multi-Task Gaussian Process

A Gaussian process (GP) is a non-parametric Bayesian model that can be used for both regression and classification. A multi-task GP additionally models the correlation between several targets (tasks).   
https://scikit-learn.org/stable/modules/gaussian_process.html

gaussian_process.GaussianProcessRegressor()   

**_no hyperparameter tuning_**

https://towardsdatascience.com/gaussian-process-kernels-96bafb4dd63e    
https://www.youtube.com/watch?v=QvcHrwXS4_U&ab_channel=JFL

# ---- TODO: Check for hyperparam tuning
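
On the TODO above: scikit-learn's `GaussianProcessRegressor.fit` already tunes the kernel hyperparameters by maximising the log marginal likelihood, unless `optimizer=None` is passed. The sketch below compares a fixed kernel with a tuned one. It uses synthetic data and an `RBF + WhiteKernel` kernel purely for illustration; neither is this notebook's data set or kernel.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic 1-D regression data (assumption: stand-in for the real features)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 5, size=(40, 1))
y_demo = np.sin(X_demo).ravel() + 0.1 * rng.standard_normal(40)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)

# optimizer=None keeps the kernel hyperparameters at their initial values
gp_fixed = GaussianProcessRegressor(kernel=kernel, optimizer=None)
gp_fixed.fit(X_demo, y_demo)

# Default optimizer: hyperparameters are fitted, here with 5 random restarts
gp_tuned = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5, random_state=0)
gp_tuned.fit(X_demo, y_demo)

lml_fixed = gp_fixed.log_marginal_likelihood_value_
lml_tuned = gp_tuned.log_marginal_likelihood_value_
print("fixed kernel:", gp_fixed.kernel_, "LML:", lml_fixed)
print("tuned kernel:", gp_tuned.kernel_, "LML:", lml_tuned)
```

What this built-in optimisation does not cover is the choice of kernel family itself or settings such as `alpha`; those would still need an outer search, for example with cross-validation.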

In [1]:
# config 'all', 'vif_5' or 'vif_10'
vif = 'all'

In [2]:
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pathlib
import platform
import seaborn as sns
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
#https://scikit-learn.org/stable/modules/classes.html#module-sklearn.gaussian_process
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler
from skopt import gp_minimize, space
import sys

from validation import cross_validation
from validation import performance_test_fixed
from validation import performance_test_shifted

date_format = "%Y-%m-%d"

pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.max_colwidth = None

In [3]:
my_os = platform.system()
print("OS in my system: ",my_os)

if my_os == "Windows":
    path = str(pathlib.Path().absolute()) + '\\'
    slash = '\\'
else:
    path = str(pathlib.Path().absolute()) + '/'
    slash = '/'

path_3 = path.replace('4_modelling', '3_data_pre-processing')

OS in my system:  Windows


## Load Data

In [4]:
data = pd.read_csv(path_3 + 'data_artifacts' + slash + 'data_set_e_spx_3-' + vif + '.csv', index_col=0)

# ----------------------------------------------------
# FIRST TRY WITH MODEL

In [5]:
multi_target = False

In [6]:
X_head_drop = ['tau_target', 'symbol', 'ric', 'year', 'fam_target_clayton', 'fam_target_frank', 'fam_target_gaussian',
            'fam_target_gumbel', 'fam_target_indep', 'fam_target_joe', 'fam_target_student']
y_head_multi_target = ['tau_target', 'fam_target_clayton', 'fam_target_frank', 'fam_target_gaussian', 'fam_target_gumbel',
                    'fam_target_indep', 'fam_target_joe', 'fam_target_student']
y_head_single_target = ['tau_target']

In [7]:
# train/test split by year: 2000-2018 for training, 2019-2020 for testing
train = data[data['year'].between(2000, 2018)]
test = data[data['year'].between(2019, 2020)]

X_train = train.drop(columns=X_head_drop)
X_test = test.drop(columns=X_head_drop)

if not multi_target:
    y_train = train[y_head_single_target]
    y_test = test[y_head_single_target]
else:
    y_train = train[y_head_multi_target]
    y_test = test[y_head_multi_target]

In [None]:
kernel = DotProduct()
gp = GaussianProcessRegressor(kernel=kernel, random_state=0, n_restarts_optimizer=5)
gp.fit(X_train, y_train)

# Return the coefficient of determination R^2 of the prediction.
gp.score(X_train, y_train)

In [None]:
gp.get_params()

In [None]:
# predict() requires the feature matrix to predict for
gp.predict(X_test)
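
A sketch of how the prediction and evaluation step might look end to end. The data are synthetic, and `X_tr`, `y_tr`, `X_te`, `y_te` are hypothetical stand-ins rather than the notebook's variables. The features are standardised, a `WhiteKernel` noise term is added to the `DotProduct` kernel, and `predict` is asked for a standard deviation alongside the mean.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler

# Synthetic linear data (assumption: not the notebook's data set)
rng = np.random.default_rng(0)
X_all = rng.normal(size=(120, 3))
y_all = X_all @ np.array([0.5, -0.2, 0.1]) + 0.05 * rng.standard_normal(120)
X_tr, X_te = X_all[:100], X_all[100:]
y_tr, y_te = y_all[:100], y_all[100:]

# Fit the scaler on the training features only, then reuse it for the test set
scaler = StandardScaler().fit(X_tr)

# DotProduct models a linear trend; WhiteKernel absorbs observation noise
gp_demo = GaussianProcessRegressor(kernel=DotProduct() + WhiteKernel(), random_state=0)
gp_demo.fit(scaler.transform(X_tr), y_tr)

# return_std=True yields the predictive standard deviation per test point
y_mean, y_std = gp_demo.predict(scaler.transform(X_te), return_std=True)

mse = mean_squared_error(y_te, y_mean)
mae = mean_absolute_error(y_te, y_mean)
print(f"MSE={mse:.4f}  MAE={mae:.4f}")
```

The predictive standard deviation is the main thing a GP offers over a plain linear model here, so it is worth inspecting next to the point errors.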

A multi-task Gaussian process (MTGP) can be implemented in Python with the GPy library. An outline of the steps:
- Install GPy: `pip install GPy`
- Load data: Load the inputs and targets of each task into NumPy arrays.
- Define task covariance structure: Decide on the covariance structure between tasks (e.g. full, diagonal, or low-rank).
- Define GP model: Build a coregionalized kernel (e.g. with `GPy.util.multioutput.ICM`) and pass it, together with the per-task data lists, to `GPy.models.GPCoregionalizedRegression`.
- Fit the model: Optimize the hyperparameters with the `.optimize()` method (GPy models have no `.fit()`).
- Predict: Use the `.predict()` method on new inputs that carry the task index as an extra column.
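
To make the "task covariance structure" step more concrete, here is a library-free sketch of the intrinsic coregionalization model (ICM), one common construction in which the joint covariance over all tasks is the Kronecker product of a task matrix `B` and an input kernel matrix. The helpers `rbf_kernel` and `task_covariance` are hypothetical names introduced for illustration, and the random parameter values stand in for quantities that would normally be learned.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel matrix between two sets of inputs."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def task_covariance(n_tasks, structure, rank=1, seed=0):
    """Build an illustrative positive semi-definite task matrix B."""
    rng = np.random.default_rng(seed)
    kappa = rng.uniform(0.5, 1.5, size=n_tasks)  # per-task variances
    if structure == "diagonal":
        # Tasks are independent: no information is shared
        return np.diag(kappa)
    if structure == "low-rank":
        # B = W W^T + diag(kappa): correlation through `rank` shared factors
        W = rng.normal(size=(n_tasks, rank))
        return W @ W.T + np.diag(kappa)
    if structure == "full":
        # Unrestricted PSD matrix
        A = rng.normal(size=(n_tasks, n_tasks))
        return A @ A.T + 1e-6 * np.eye(n_tasks)
    raise ValueError(f"unknown structure: {structure}")

X_icm = np.linspace(0, 1, 5)[:, None]
K_x = rbf_kernel(X_icm, X_icm)          # covariance between inputs
B = task_covariance(2, "low-rank")      # covariance between tasks
K_joint = np.kron(B, K_x)               # joint covariance, tasks stacked block-wise
print(K_joint.shape)
```

With a diagonal `B` the off-diagonal blocks of `K_joint` vanish, so the model reduces to independent GPs per task; the low-rank and full variants are what let one task inform another.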

In [None]:
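import GPy
import numpy as np

# Load data (toy example: the same noisy sine curve is used for both tasks)
X = np.random.rand(100, 1)
Y = np.sin(X) + np.random.randn(100, 1) * 0.05

# Define task covariance structure: ICM kernel, i.e. an RBF kernel over the
# inputs combined with a rank-1 covariance between the two tasks
icm = GPy.util.multioutput.ICM(input_dim=1, num_outputs=2,
                               kernel=GPy.kern.RBF(input_dim=1), W_rank=1)

# Define GP model
m = GPy.models.GPCoregionalizedRegression([X] * 2, [Y] * 2, kernel=icm)

# Fit the model
m.optimize()

# Predict for task 0: the task index is appended as an extra input column
x_new = np.linspace(0, 1, 10)[:, np.newaxis]
x_new_task = np.hstack([x_new, np.zeros_like(x_new)])
noise_dict = {'output_index': x_new_task[:, 1:].astype(int)}
y_pred, y_var = m.predict(x_new_task, Y_metadata=noise_dict)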


Alternatively, GPyTorch provides a multitask GP regression example:

https://docs.gpytorch.ai/en/stable/examples/03_Multitask_Exact_GPs/Multitask_GP_Regression.html