# Multioutput regression 
These are regression problems that involve predicting two or more numerical values given an input example.

An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting, which involves predicting multiple future time steps of a given variable.

Working through this problem will help us learn:   
1) The problem of multioutput regression in machine learning.   
2) How to develop machine learning models that inherently support multiple-output regression.   
3) How to develop wrapper models that allow algorithms that do not inherently support multiple outputs to be used for multiple-output regression.

References: https://machinelearningmastery.com/multi-output-regression-models-with-python/

**Problem of Multioutput Regression:**   
Multioutput Regression Test Problem    

**Inherently Multioutput Regression Algorithms:**    
Linear Regression for Multioutput Regression   
k-Nearest Neighbors for Multioutput Regression   
Random Forest for Multioutput Regression   
Evaluate Multioutput Regression With Cross-Validation    

**Wrapper Multioutput Regression Algorithms**    
Separate Model for Each Output (MultiOutputRegressor)    
Chained Models for Each Output (RegressorChain)   

## Problem of Multioutput Regression
In multioutput regression, typically the outputs are dependent upon the input and upon each other. This means that often the outputs are not independent of each other and may require a model that predicts both outputs together or each output contingent upon the other outputs.

Multi-step time series forecasting may be considered a type of multiple-output regression where a sequence of future values is predicted and each predicted value depends on the prior values in the sequence.

#### Multioutput Regression Test Dataset
We can define a test problem that we can use to demonstrate the different modeling strategies.    
We will use the make_regression() function to create a test dataset for multiple-output regression. We will generate 1,000 examples with 10 input features, five of which are informative (they are used to generate the targets) and five of which carry no information about the targets.

The problem will require the prediction of two numeric values.   
Problem Input: 10 numeric variables.   
Problem Output: 2 numeric variables.   

In [4]:
# example of multioutput regression test problem
from sklearn.datasets import make_regression
# create datasets
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1)
# summarize dataset
print(X.shape, y.shape)

(1000, 10) (1000, 2)
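
The split between informative and uninformative features can be inspected directly. The sketch below assumes the `coef=True` option of make_regression(), which additionally returns the coefficients of the underlying linear model; the variable names are illustrative. Rows of the coefficient matrix that are all zero correspond to features that play no part in generating the targets.

```python
# Sketch: inspect which input features actually drive the two targets.
import numpy as np
from sklearn.datasets import make_regression

# same call as above, plus coef=True to also get the generating coefficients
X, y, coef = make_regression(n_samples=1000, n_features=10, n_informative=5,
                             n_targets=2, random_state=1, coef=True)

# one row per input feature, one column per target
print(coef.shape)

# a feature is informative if it has a non-zero coefficient for any target
informative = np.any(coef != 0, axis=1)
print('informative feature indices:', np.flatnonzero(informative))
```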


## Inherently Multioutput Regression Algorithms
Some regression machine learning algorithms support multiple outputs directly, such as:

LinearRegression (and related)  
KNeighborsRegressor  
DecisionTreeRegressor   
RandomForestRegressor (and related)

#### LinearRegression 

In [7]:
# linear regression for multioutput regression

from sklearn.linear_model import LinearRegression
# define model
lr = LinearRegression()
# fit model
lr.fit(X, y)
# make a prediction
test_data = [[-2.02220122, 0.31563495, 0.82797464, -0.30620401, 0.16003707, -1.44411381, 0.87616892, -0.50446586, 0.23009474, 0.76201118]]
lr.predict(test_data)



array([[-93.147146  ,  23.26985013]])

#### Decision Tree Regressor

In [8]:
from sklearn.tree import DecisionTreeRegressor
dtregressor=DecisionTreeRegressor()
dtregressor.fit(X,y)

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=None, splitter='best')

In [9]:
dtregressor.predict(test_data)

array([[-93.14714589,  23.26984981]])

#### k-Nearest Neighbors

In [12]:
from sklearn.neighbors import KNeighborsRegressor
# define model
knn = KNeighborsRegressor()
# fit model
knn.fit(X, y)

KNeighborsRegressor(algorithm='auto', leaf_size=30, metric='minkowski',
                    metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                    weights='uniform')

In [13]:
knn.predict(test_data)

array([[-109.74862659,    0.38754079]])

#### Random Forest Regressor

In [14]:
from sklearn.ensemble import RandomForestRegressor
# define model
rf=RandomForestRegressor()
# fit model
rf.fit(X, y)

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)

In [15]:
rf.predict(test_data)

array([[-79.66153647,  25.26445816]])

#### Evaluate Multioutput Regression With K-fold Cross-Validation

In [22]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold

In [23]:
#define model
dt = DecisionTreeRegressor()
# evaluate model
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(dt, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1, error_score='raise')
# summarize performance
n_scores = np.abs(n_scores)
print('Result: %.3f (%.3f)' % (np.mean(n_scores), np.std(n_scores)))

Result: 52.077 (2.592)


The mean and standard deviation of the MAE are reported, calculated across all folds and all repeats.
Importantly, error is reported across both output variables, rather than separate error scores for each output variable.
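
If a separate error score per output is wanted, one option is to collect out-of-fold predictions and pass `multioutput='raw_values'` to mean_absolute_error(). This is a minimal sketch rather than the evaluation used above: it assumes a single shuffled KFold instead of RepeatedKFold, because cross_val_predict() needs every row to appear in exactly one test fold, and the fixed `random_state` values are illustrative.

```python
# Sketch: report MAE separately for each output column.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)

# out-of-fold predictions: each row is predicted by a model that never saw it
cv = KFold(n_splits=10, shuffle=True, random_state=1)
y_pred = cross_val_predict(DecisionTreeRegressor(random_state=1), X, y, cv=cv)

# multioutput='raw_values' returns one MAE per output column
per_output_mae = mean_absolute_error(y, y_pred, multioutput='raw_values')
# the default setting averages the per-output errors with equal weight
overall_mae = mean_absolute_error(y, y_pred)

print('MAE per output:', per_output_mae)
print('MAE averaged over outputs: %.3f' % overall_mae)
```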

## Wrapper Multioutput Regression Algorithms

Not all regression algorithms support multioutput regression.   
One example is the support vector machine, which for regression is referred to as support vector regression, or SVR.   
scikit-learn's SVR implementations, including the LinearSVR class used here, do not support multiple outputs and raise an error when fit on a two-column target.

In [25]:
from sklearn.svm import LinearSVR
svregressor=LinearSVR()
svregressor.fit(X,y)

ValueError: bad input shape (1000, 2)

#### Separate Model for Each Output (MultiOutputRegressor)
One workaround for the above problem is to fit a separate model for each output, which is what MultiOutputRegressor does.   
This assumes that the outputs are independent of each other, which might not be a correct assumption. Nevertheless, this approach can provide surprisingly effective predictions on a range of problems and is worth trying, at least as a performance baseline. The outputs for your problem may in fact be mostly, if not completely, independent, and this strategy can help you find out.   

This approach is supported by the MultiOutputRegressor class that takes a regression model as an argument. It will then create one instance of the provided model for each output in the problem.

In [26]:

from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

#define model
svr=LinearSVR()
#define wrapper
wrapper=MultiOutputRegressor(svr)

#fit model
wrapper.fit(X,y)

MultiOutputRegressor(estimator=LinearSVR(C=1.0, dual=True, epsilon=0.0,
                                         fit_intercept=True,
                                         intercept_scaling=1.0,
                                         loss='epsilon_insensitive',
                                         max_iter=1000, random_state=None,
                                         tol=0.0001, verbose=0),
                     n_jobs=None)

In [27]:
wrapper.predict(test_data)

array([[-93.147146  ,  23.26985013]])

The fitted wrapper can be used directly to make a prediction on new data, which confirms that multiple outputs are now supported.
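
To see that the wrapper really holds one fitted model per output, the fitted copies can be inspected through its `estimators_` attribute. The sketch below is illustrative: `max_iter=10000` and `random_state=1` are choices made here to keep the LinearSVR fit stable and repeatable, not settings from the cells above.

```python
# Sketch: look inside a fitted MultiOutputRegressor.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)

# max_iter and random_state are illustrative choices
wrapper = MultiOutputRegressor(LinearSVR(max_iter=10000, random_state=1))
wrapper.fit(X, y)

# one fitted clone of the base model per output column
print(len(wrapper.estimators_))

# stacking the single-output predictions column by column
# gives the same result as calling wrapper.predict
stacked = np.column_stack([est.predict(X[:5]) for est in wrapper.estimators_])
print(np.allclose(stacked, wrapper.predict(X[:5])))
```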

#### Chained Models for Each Output (RegressorChain)

Another approach to using single-output regression models for multioutput regression is to create a linear sequence of models.
The first model in the sequence uses the input and predicts one output; the second model uses the input and the output from the first model to make a prediction; the third model uses the input and output from the first two models to make a prediction, and so on.
This can be achieved using the **RegressorChain class** in the scikit-learn library.
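
The chaining idea can be sketched by hand for two outputs and compared with RegressorChain. This is a simplified illustration under a few assumptions: LinearRegression is swapped in as the base model because it is deterministic, `noise=1.0` is added to the dataset so the first target is not an exact linear function of the inputs, and the chain is left at its default settings, where the later model is trained on the true value of the earlier output and only sees the predicted value at prediction time.

```python
# Sketch: a two-link regressor chain built by hand, checked against RegressorChain.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# noise=1.0 is an assumption added for this sketch
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, noise=1.0, random_state=1)
X_new = X[:3]

# link 1: inputs -> output 0
m0 = LinearRegression().fit(X, y[:, 0])
# link 2: inputs plus the true output 0 -> output 1
m1 = LinearRegression().fit(np.column_stack([X, y[:, 0]]), y[:, 1])

# at prediction time the true output 0 is unknown,
# so link 1's prediction is passed on to link 2
p0 = m0.predict(X_new)
p1 = m1.predict(np.column_stack([X_new, p0]))
manual = np.column_stack([p0, p1])

# the library class with default settings should give matching predictions
chain = RegressorChain(LinearRegression()).fit(X, y)
print(np.allclose(manual, chain.predict(X_new)))
```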

In [29]:
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

# define model
model = LinearSVR()
wrapper = RegressorChain(model)

# fit model
wrapper.fit(X, y)

#predict
wrapper.predict(test_data)

array([[-93.147146  ,  23.26344224]])
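
By default the chain follows the column order of y. The `order` parameter of RegressorChain changes which output is predicted first, which may matter when one output is easier to predict or is a natural input for the other. The sketch below reverses the order; `max_iter` and `random_state` are illustrative choices, not settings from the cell above.

```python
# Sketch: reverse the order in which the chain predicts the outputs.
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1)

# order=[1, 0]: predict output 1 first, then feed it in when predicting output 0
chain = RegressorChain(LinearSVR(max_iter=10000, random_state=1), order=[1, 0])
chain.fit(X, y)

# the order actually used by the fitted chain
print(chain.order_)

# predictions keep the original column layout of y: one column per output
pred = chain.predict(X[:1])
print(pred.shape)
```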