## MULTIVARIATE FORECASTING

**n - univariate forecasters trained on individual columns**


```
mf = ReducedMultivariateForecaster([
    ("uf1", ARIMA(), [0, 1]), # (<name>, <forecaster>, <column>)
    ("uf2", ExponentialSmoothing(), [3]),
    ("uf3", Theta(), [4])
])
```
**(i)** The data can be split into a number of single columns, each with its own forecaster, e.g. ARIMA, KNN, Theta, Decision Trees, Exponential Smoothing, etc.

**(ii)** The remaining columns can be segmented into groups of columns, each handled by a forecaster that properly supports multi-column (multivariate) input, e.g. VAR.
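
The column-wise reduction described above can be sketched without any forecasting library. This is a minimal illustration under stated assumptions, not the proposed `ReducedMultivariateForecaster` itself: `ColumnwiseForecaster`, `LastValueForecaster` and `MeanForecaster` are hypothetical names, and the sketch takes forecaster classes (used as factories) rather than instances so that each column gets its own fitted object.

```python
import numpy as np


class LastValueForecaster:
    """Hypothetical stand-in for a univariate forecaster: repeats the last value."""

    def fit(self, y):
        self.last_ = np.asarray(y, dtype=float)[-1]
        return self

    def predict(self, steps):
        return np.full(steps, self.last_)


class MeanForecaster:
    """Hypothetical stand-in: forecasts the training mean."""

    def fit(self, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, steps):
        return np.full(steps, self.mean_)


class ColumnwiseForecaster:
    """Fit one univariate forecaster per listed column (illustrative only)."""

    def __init__(self, forecasters):
        # forecasters: list of (name, forecaster_class, column_indices)
        self.forecasters = forecasters

    def fit(self, Y):
        Y = np.asarray(Y, dtype=float)
        self.fitted_ = {}
        for name, make_forecaster, columns in self.forecasters:
            for col in columns:
                # A fresh forecaster is created and fitted for every column
                self.fitted_[col] = make_forecaster().fit(Y[:, col])
        return self

    def predict(self, steps):
        # Maps each assigned column index to its forecast array
        return {col: f.predict(steps) for col, f in sorted(self.fitted_.items())}


Y = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0]])

mf = ColumnwiseForecaster([
    ("uf1", LastValueForecaster, [0, 1]),
    ("uf2", MeanForecaster, [2]),
])
forecasts = mf.fit(Y).predict(steps=2)
print(forecasts)
```

Columns that are not listed simply receive no forecast in this sketch; a real implementation would have to decide whether that is an error.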

Since deep learning methods are excluded from this package and live in "sktime-dl", I am not covering the ability of RNNs and LSTMs to handle time sequences.

What remains are ML models such as Decision Trees, KNN, Linear Regression, etc., which can be trained on the columns directly: each row is an input and the row that follows it is the target.



```
Example:
A | B | C     A_for | B_for | C_for
1 | 2 | 3       4   |   5   |   6
4 | 5 | 6       7   |   8   |   9
7 | 8 | 9       (values to be forecast)
```
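
The layout above can be reproduced in pandas by pairing every row with the row that follows it. A minimal sketch on the same toy values; the `_for` suffix mirrors the table, and the names `pairs` and `last_row` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8], "C": [3, 6, 9]})

# Inputs: every row except the last. Targets: the same frame shifted up by one row.
X = df.iloc[:-1].reset_index(drop=True)
Y = df.iloc[1:].reset_index(drop=True).add_suffix("_for")

pairs = pd.concat([X, Y], axis=1)
print(pairs)

# The last observed row has no target yet; it is the input for the next forecast.
last_row = df.iloc[[-1]]
```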



In [None]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from numpy import absolute
from numpy import mean
from numpy import std
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold

### Machine learning algorithms implemented in scikit-learn that support **multiple outputs** directly:

LinearRegression (and related linear models, e.g. Ridge)

KNeighborsRegressor

DecisionTreeRegressor

RandomForestRegressor (and related)
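
As a quick check of the list above, each of these estimators accepts a two-dimensional target and returns one prediction column per target. A minimal sketch on synthetic data (the data and hyperparameters are arbitrary assumptions, chosen only to keep the example fast):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
# Two targets, each a different linear combination of the inputs
Y = np.column_stack([X @ [1.0, 2.0, 3.0], X @ [-1.0, 0.5, 0.0]])

estimators = [
    LinearRegression(),
    KNeighborsRegressor(n_neighbors=3),
    DecisionTreeRegressor(random_state=0),
    RandomForestRegressor(n_estimators=10, random_state=0),
]

shapes = {}
for est in estimators:
    est.fit(X, Y)  # Y has shape (n_samples, n_targets)
    shapes[type(est).__name__] = est.predict(X[:5]).shape

print(shapes)
```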



In [None]:
data = pd.read_csv("/content/drive/MyDrive/Data/for_multi.csv")
print(data.shape)
data.head(4)

(2970, 11)


Unnamed: 0,Time,Temperature,Rel_Humidity,S1,S2,S3,S4,S5,S6,S7,S8
0,3.0,20.94,35.98,312,280,504,568,528,647,578,664
1,3.1,20.94,35.98,310,280,504,568,527,647,578,664
2,3.2,20.94,35.98,311,280,503,567,528,648,577,664
3,3.3,20.94,35.98,310,280,503,567,527,648,578,665


In [None]:
# Target = the same frame shifted one step ahead (row t+1 is the target for row t)
target = data.loc[1:,:]
print(target.shape)
target.head(4)

(2969, 11)


Unnamed: 0,Time,Temperature,Rel_Humidity,S1,S2,S3,S4,S5,S6,S7,S8
1,3.1,20.94,35.98,310,280,504,568,527,647,578,664
2,3.2,20.94,35.98,311,280,503,567,528,648,577,664
3,3.3,20.94,35.98,310,280,503,567,527,648,578,665
4,3.4,20.94,35.98,311,280,503,568,528,647,577,664


In [None]:
# Drop the time stamp and realign the target index with the inputs.
# Non-inplace calls avoid pandas' SettingWithCopyWarning on the sliced frame.
data = data.drop(columns="Time")
target = target.drop(columns="Time").reset_index(drop=True)


In [None]:
target.tail(4)

Unnamed: 0,Temperature,Rel_Humidity,S1,S2,S3,S4,S5,S6,S7,S8
2965,20.93,35.74,330,290,535,576,598,746,650,772
2966,20.93,35.74,330,290,535,575,598,746,650,772
2967,20.93,35.74,331,290,535,575,598,746,650,772
2968,20.93,35.74,329,290,535,575,598,745,649,772


In [None]:
data.tail(4)

Unnamed: 0,Temperature,Rel_Humidity,S1,S2,S3,S4,S5,S6,S7,S8
2966,20.93,35.74,330,290,535,576,598,746,650,772
2967,20.93,35.74,330,290,535,575,598,746,650,772
2968,20.93,35.74,331,290,535,575,598,746,650,772
2969,20.93,35.74,329,290,535,575,598,745,649,772


In [None]:
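# .loc slices are label-based and inclusive, so rows 0..2232 form the training set
train_x = data.loc[:2232,:]
train_y = target.loc[:2232,:]
# The last row of data (2969) has no target, so the test inputs stop at 2968
test_x = data.loc[2233:2968,:]
test_y = target.loc[2233:,:]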
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

(2233, 10)
(2233, 10)
(736, 10)
(736, 10)


In [None]:
model = LinearRegression()
model.fit(train_x, train_y)
model.predict(test_x)

array([[  20.93003163,   35.76987339,  391.11606391, ...,  916.86981484,
         879.9190154 , 1070.74648586],
       [  20.93002039,   35.80903199,  391.43090865, ...,  915.87869502,
         880.19864366, 1070.91858569],
       [  20.92991744,   35.80877032,  391.36591999, ...,  916.85121145,
         879.04712517, 1070.87584695],
       ...,
       [  20.92983256,   35.73927635,  329.47698949, ...,  745.59491152,
         649.84021533,  771.56228304],
       [  20.92998649,   35.73968148,  329.38672449, ...,  745.61729449,
         649.90537452,  771.57322842],
       [  20.9299877 ,   35.7393875 ,  329.77448198, ...,  745.62321329,
         650.03900879,  771.67045325]])

In [None]:
pred_lin = model.predict(test_x)
pred_lin[0,0]

20.930031629822015

In [None]:
pred_lin[0].shape

(10,)

In [None]:
model.score(test_x,test_y)



0.999843828359222
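
For a multi-output regressor, `score` returns R² averaged uniformly over the output columns, so a single number can hide a poorly predicted column. A minimal sketch, on synthetic data rather than the sensor file, of inspecting each column separately with `r2_score(..., multioutput="raw_values")`; the names `demo_model`, `X_demo` and `Y_demo` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
# First target is an exact linear function of the inputs, second is pure noise
Y_demo = np.column_stack([X_demo @ [2.0, -1.0], rng.normal(size=200)])

demo_model = LinearRegression().fit(X_demo[:150], Y_demo[:150])
pred_demo = demo_model.predict(X_demo[150:])

# One R^2 value per output column
per_column = r2_score(Y_demo[150:], pred_demo, multioutput="raw_values")
# Single score as returned by the estimator (uniform average over columns)
overall = demo_model.score(X_demo[150:], Y_demo[150:])
print(per_column, overall)
```

On the sensor data the same call would be `r2_score(test_y, pred_lin, multioutput="raw_values")`.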

# Prediction Strategies in sktime:


1.   **Direct Method** - A separate model is developed for each forecast time step.
For example:



  ```
  prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))

  prediction(t+2) = model2(obs(t-1), obs(t-2), ..., obs(t-n))
  ```


2.   **Recursive Method** - It involves using a one-step model multiple times where the prediction for the prior time step is used as an input for making a prediction on the following time step.


```
  prediction(t+1) = model(obs(t-1), obs(t-2), ..., obs(t-n))

  prediction(t+2) = model(prediction(t+1), obs(t-1), ..., obs(t-n))
```


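3.  **Direct-Recursive Hybrid Strategies** - The direct and recursive strategies are combined to offer the benefits of both: a separate model is built for each time step, but each model may use the predictions made for prior time steps as inputs.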

```
  prediction(t+1) = model1(obs(t-1), obs(t-2), ..., obs(t-n))

  prediction(t+2) = model2(prediction(t+1), obs(t-1), ..., obs(t-n))
```
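
The direct and recursive strategies listed above can be compared on a toy series. This is a minimal sketch using plain scikit-learn, not sktime's own reduction classes; `make_windows`, the lag count and the series are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_windows(series, n_lags, horizon):
    """Build (X, y): each X row holds n_lags consecutive values,
    y is the value `horizon` steps after the end of that window."""
    X, y = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        X.append(series[start:start + n_lags])
        y.append(series[start + n_lags + horizon - 1])
    return np.array(X), np.array(y)


series = np.arange(30, dtype=float)  # toy series: 0, 1, 2, ..., 29
n_lags, max_horizon = 3, 2
last_window = series[-n_lags:]       # the most recent observations

# Direct: one model per horizon, all fed the same observed window
direct = []
for h in range(1, max_horizon + 1):
    X_h, y_h = make_windows(series, n_lags, h)
    model_h = LinearRegression().fit(X_h, y_h)
    direct.append(model_h.predict([last_window])[0])

# Recursive: a single one-step model whose prediction is fed back as input
X_1, y_1 = make_windows(series, n_lags, 1)
one_step = LinearRegression().fit(X_1, y_1)
window = last_window.copy()
recursive = []
for _ in range(max_horizon):
    next_value = one_step.predict([window])[0]
    recursive.append(next_value)
    window = np.append(window[1:], next_value)  # slide the window forward

print(direct, recursive)
```

A hybrid variant would keep one model per horizon, as in the direct loop, but append the earlier predictions to each later model's input window.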

In [None]:
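def recur_multioutput_linreg():
  # Recursive strategy: start from the last observed row, then feed each
  # prediction back into the model as the input for the next step.
  fh_list = []
  pred = model.predict([data.loc[2969,:].values])  # first step past the data
  fh_list.append(pred)

  for i in range(5):  # five more steps, six forecasts in total
    pred = model.predict(pred)
    fh_list.append(pred)

  return fh_list

recur_multioutput_linreg()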

[array([[ 20.93002037,  35.73998474, 328.96733925, 290.42110072,
         534.87471414, 575.50633171, 597.4680537 , 744.60603609,
         648.77555516, 771.45760689]]),
 array([[ 20.92996354,  35.73998728, 328.96176516, 290.67842382,
         534.76621953, 575.88296308, 597.00284001, 744.20259253,
         648.52615316, 770.90947335]]),
 array([[ 20.92985155,  35.73997825, 328.9630901 , 290.82482379,
         534.67138333, 576.16604832, 596.58679025, 743.79302036,
         648.26261008, 770.36289736]]),
 array([[ 20.92970058,  35.73994431, 328.96502891, 290.896587  ,
         534.58610265, 576.38100407, 596.2073335 , 743.37967179,
         647.99093714, 769.82159442]]),
 array([[ 20.92952272,  35.73988024, 328.96545806, 290.9185774 ,
         534.50708974, 576.54587084, 595.8554473 , 742.96422317,
         647.71489896, 769.28755485]]),
 array([[ 20.92932699,  35.73978532, 328.96362814, 290.90778373,
         534.43201479, 576.67354464, 595.52462339, 742.54788085,
         647.4369979

In [None]:
model_k = KNeighborsRegressor()
model_k.fit(train_x, train_y)
model_k.predict(test_x)

array([[  20.93,   35.77,  392.  , ...,  917.2 ,  881.2 , 1071.  ],
       [  20.93,   35.77,  392.  , ...,  917.2 ,  881.2 , 1071.  ],
       [  20.93,   35.77,  392.  , ...,  917.2 ,  881.2 , 1071.  ],
       ...,
       [  20.92,   35.74,  330.6 , ...,  678.2 ,  675.8 ,  791.4 ],
       [  20.92,   35.74,  330.6 , ...,  678.2 ,  675.8 ,  791.4 ],
       [  20.92,   35.74,  330.6 , ...,  678.2 ,  675.8 ,  791.4 ]])

In [None]:
model_k.score(test_x,test_y)



0.8245281486198375

In [None]:
def recur_multioutput_knn():
  # Same recursive loop as for LinearRegression, using the KNN model
  fh_list = []
  pred = model_k.predict([data.loc[2969,:].values])
  fh_list.append(pred)

  for i in range(5):
    pred = model_k.predict(pred)
    fh_list.append(pred)

  return fh_list

recur_multioutput_knn()

[array([[ 20.92,  35.74, 330.6 , 285.  , 533.2 , 571.2 , 557.4 , 678.2 ,
         675.8 , 791.4 ]]),
 array([[ 20.92,  35.74, 330.8 , 284.8 , 533.2 , 571.2 , 558.  , 678.6 ,
         676.8 , 792.6 ]]),
 array([[ 20.92,  35.74, 331.  , 285.  , 533.6 , 571.4 , 558.4 , 679.  ,
         678.2 , 794.4 ]]),
 array([[ 20.92,  35.74, 331.  , 285.  , 534.  , 571.2 , 559.  , 679.6 ,
         679.8 , 796.2 ]]),
 array([[ 20.92,  35.74, 331.4 , 285.  , 534.4 , 571.2 , 559.4 , 680.4 ,
         681.2 , 798.  ]]),
 array([[ 20.92,  35.74, 331.6 , 285.2 , 535.  , 571.2 , 559.8 , 681.  ,
         682.8 , 800.  ]])]

In [None]:
model_d = DecisionTreeRegressor()
model_d.fit(train_x, train_y)
model_d.predict(test_x)

array([[  20.93,   35.77,  392.  , ...,  917.  ,  880.  , 1071.  ],
       [  20.93,   35.77,  391.  , ...,  917.  ,  880.  , 1071.  ],
       [  20.93,   35.77,  392.  , ...,  917.  ,  880.  , 1071.  ],
       ...,
       [  20.92,   35.74,  330.  , ...,  676.  ,  671.  ,  784.  ],
       [  20.92,   35.74,  330.  , ...,  676.  ,  671.  ,  784.  ],
       [  20.92,   35.74,  330.  , ...,  676.  ,  671.  ,  784.  ]])

In [None]:
model_d.score(test_x,test_y)



0.7873301241121027

In [None]:
def recur_multioutput_tree():
  # Same recursive loop as for LinearRegression, using the decision tree model
  fh_list = []
  pred = model_d.predict([data.loc[2969,:].values])
  fh_list.append(pred)

  for i in range(5):
    pred = model_d.predict(pred)
    fh_list.append(pred)

  return fh_list

recur_multioutput_tree()

[array([[ 20.92,  35.74, 330.  , 285.  , 531.  , 571.  , 555.  , 676.  ,
         671.  , 784.  ]]),
 array([[ 20.92,  35.74, 329.  , 284.  , 531.  , 571.  , 556.  , 677.  ,
         671.  , 786.  ]]),
 array([[ 20.92,  35.74, 330.  , 285.  , 533.  , 571.  , 556.  , 677.  ,
         673.  , 788.  ]]),
 array([[ 20.92,  35.74, 330.  , 285.  , 533.  , 571.  , 557.  , 678.  ,
         674.  , 789.  ]]),
 array([[ 20.92,  35.74, 331.  , 285.  , 533.  , 572.  , 557.  , 678.  ,
         675.  , 791.  ]]),
 array([[ 20.92,  35.74, 330.  , 285.  , 533.  , 571.  , 558.  , 678.  ,
         677.  , 793.  ]])]

In [None]:
model_cross = DecisionTreeRegressor()
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(model_cross, train_x, train_y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
n_scores = absolute(n_scores)
# Mean Absolute error
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

MAE: 0.420 (0.016)
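
`RepeatedKFold` assigns rows to folds at random, so a validation fold can contain rows that come before some of the training rows. For time-ordered data, one alternative is scikit-learn's `TimeSeriesSplit`, which always validates on a block that follows the training segment. A minimal sketch on a synthetic random walk; the data and the names `X_demo`, `Y_demo` and `mae_per_fold` are assumptions for illustration, not results for the sensor file.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
# Toy multivariate random walk; rows are time-ordered
walk = np.cumsum(rng.normal(size=(300, 3)), axis=0)
X_demo, Y_demo = walk[:-1], walk[1:]  # each row predicts the next row

tscv = TimeSeriesSplit(n_splits=5)
# Every split trains on an initial segment and validates on the block after it
for train_idx, val_idx in tscv.split(X_demo):
    assert train_idx.max() < val_idx.min()

scores = cross_val_score(DecisionTreeRegressor(random_state=0), X_demo, Y_demo,
                         scoring="neg_mean_absolute_error", cv=tscv)
mae_per_fold = np.abs(scores)
print(mae_per_fold)
```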


In [None]:
model.predict([data.loc[2969,:].values])

array([[ 20.93002037,  35.73998474, 328.96733925, 290.42110072,
        534.87471414, 575.50633171, 597.4680537 , 744.60603609,
        648.77555516, 771.45760689]])

In [None]:
# Recursive forecasting with the LinearRegression model; the same loop can be extended to any other model we use
fh_list = []
pred = model.predict([data.loc[2969,:].values])
fh_list.append(pred)

for i in range(5):
  pred = model.predict(pred)
  fh_list.append(pred)

fh_list

[array([[ 20.93002037,  35.73998474, 328.96733925, 290.42110072,
         534.87471414, 575.50633171, 597.4680537 , 744.60603609,
         648.77555516, 771.45760689]]),
 array([[ 20.92996354,  35.73998728, 328.96176516, 290.67842382,
         534.76621953, 575.88296308, 597.00284001, 744.20259253,
         648.52615316, 770.90947335]]),
 array([[ 20.92985155,  35.73997825, 328.9630901 , 290.82482379,
         534.67138333, 576.16604832, 596.58679025, 743.79302036,
         648.26261008, 770.36289736]]),
 array([[ 20.92970058,  35.73994431, 328.96502891, 290.896587  ,
         534.58610265, 576.38100407, 596.2073335 , 743.37967179,
         647.99093714, 769.82159442]]),
 array([[ 20.92952272,  35.73988024, 328.96545806, 290.9185774 ,
         534.50708974, 576.54587084, 595.8554473 , 742.96422317,
         647.71489896, 769.28755485]]),
 array([[ 20.92932699,  35.73978532, 328.96362814, 290.90778373,
         534.43201479, 576.67354464, 595.52462339, 742.54788085,
         647.4369979