# Supervised learning

Supervised ML is the task of learning a function that maps an input to an output based on example input-output pairs. 
Formally, we are given a set $\mathcal{D}$ consisting of (data, label) pairs:
$$
\mathcal{D} = \{ (x_i , y_i) \}_{i=1}^{m}
$$
where $x_i \in \mathcal{X}$ are the data points and $y_i \in \mathcal{Y}$ are the labels. For simplicity, we assume here that the label space $\mathcal{Y}$ is a finite set, so that the labels $y_i$ are discrete, univariate variables, i.e., a classification setting.

The goal in supervised learning is to **fit** a function $f : \mathcal{X} \to \mathcal{Y}$ such that $f(x_i) = y_i$ for all $i=1,\dots,m$.
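As a minimal sketch of this setup, the snippet below builds a small synthetic dataset and fits a classifier to it. Everything in it is an assumption made for illustration: the sizes `m` and `p`, the random data, the representation of each $x_i$ as a vector of `p` numbers, and the choice of a 1-nearest-neighbor classifier from scikit-learn. That classifier was picked only because, for distinct training points, it reproduces every training label, which matches the condition $f(x_i) = y_i$ literally.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
m, p = 20, 3  # hypothetical sizes: m samples, p numbers per sample

# Synthetic data points x_i and binary labels y_i (illustration only)
X = rng.normal(size=(m, p))
y = rng.integers(0, 2, size=m)

# The dataset D as an explicit list of (x_i, y_i) pairs
D = list(zip(X, y))

# Fit f; with n_neighbors=1 each training point is its own nearest neighbor
f = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Check that f(x_i) == y_i for all i = 1, ..., m
train_match = bool(np.all(f.predict(X) == y))
print(train_match)
```

Reproducing the training labels says nothing about how `f` behaves on unseen data, which is why cross-validation tools such as `StratifiedKFold` are typically used alongside the fit.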


Traditionally, the data points $x_i$ are elements of some *vector space*, meaning that each point can be expressed as a $p$-tuple (vector) of numbers
$$
x_i = (x_{i1}, x_{i2}, \dots, x_{ip}) ~,
$$
hence we have $f : \mathbb{R}^p \to \mathcal{Y}$. 
However, when working with multi-way data, and time-series data in particular, a data point is no longer a single vector.

## The case of longitudinal sampling

To stay consistent with the empirical results and demonstrations shown in <cite data-footcite="mor2021">Mor et al.</cite>, we restrict the discussion to the case where data points are gathered across multiple timepoints. That is, each sample $x_i$ is in fact an $n \times p$ matrix, where each of the $p$ columns represents a feature and each of the $n$ rows corresponds to a timepoint at which the sample was measured.
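The layout described above can be sketched with NumPy as follows. The sizes, the random values, and the axis order `(m, n, p)` (samples, timepoints, features) are assumptions chosen for illustration; a given library may expect a different axis order.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 6, 4, 3  # hypothetical: m samples, n timepoints, p features

# One (n, p) matrix per sample: rows are timepoints, columns are features
samples = [rng.normal(size=(n, p)) for _ in range(m)]

# Stacking the m matrices gives a 3-way array of shape (m, n, p)
X = np.stack(samples, axis=0)
y = rng.integers(0, 2, size=m)  # one label per sample

sample_0 = X[0]                 # the full (n, p) matrix of sample 0
features_at_t1 = X[0, 1]        # the p features of sample 0 at timepoint 1
trajectory_f2 = X[0, :, 2]      # feature 2 of sample 0 across all n timepoints

# Estimators that expect one vector per sample need a 2-D input,
# for example by flattening each (n, p) matrix into n * p numbers
X_flat = X.reshape(m, n * p)
print(X.shape, X_flat.shape)
```

Flattening is shown only to contrast with the vector-space setting: it treats every (timepoint, feature) entry as an unrelated coordinate and no longer distinguishes the time axis from the feature axis.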




In [None]:
import pandas as pd
import dateutil
import datetime

import numpy as np
import scipy
from scipy.fftpack import dct, idct

from itertools import combinations, product

import seaborn as sns; sns.set_style("ticks")
import matplotlib.pyplot as plt

import sys 
sys.path.append('/home/labs/elinav/uria/mprod/')


from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.model_selection import StratifiedKFold
from sklearn.inspection import permutation_importance
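# Note: plot_roc_curve was removed in scikit-learn 1.2; newer versions
# provide RocCurveDisplay.from_estimator instead.
from sklearn.metrics import plot_roc_curve, auc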
from sklearn.preprocessing import RobustScaler, StandardScaler, MinMaxScaler, FunctionTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, BaggingClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier