- Multioutput regression refers to regression problems that involve predicting two or more numerical values for each input example
- An example is predicting a coordinate (an x and a y value) given an input
- Some algorithms support multioutput regression inherently, such as linear regression, k-nearest neighbors, and decision trees
- In multioutput regression, the outputs typically depend on the input and on each other
- Because the outputs are often not independent, the problem may call for a model that predicts all outputs together, or one that predicts each output contingent upon the other outputs
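To make the shape of the problem concrete, the short sketch below builds a synthetic two-target dataset and inspects the array shapes. It uses make_regression with the same arguments as the cells in this notebook and is only an illustration.

```python
from sklearn.datasets import make_regression

# Synthetic problem: 10 input features and 2 numeric targets per example
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# X has one row per example; y has one column per target to predict
print(X.shape, y.shape)
```

Each row of y holds the two values to be predicted for the corresponding row of X, so y is a 2D array of shape (1000, 2) rather than a 1D vector.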

### Inherently Multioutput Regression Algorithms

#### Linear Regression for Multioutput Regression

In [1]:
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = LinearRegression()
model.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
print(yhat[0])

[50.06781717 64.564973  ]


#### KNN for Multioutput Regression

In [2]:
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = KNeighborsRegressor()
model.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
print(yhat[0])

[-11.73511093  52.78406297]


#### Decision Tree for Multioutput Regression

In [3]:
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = DecisionTreeRegressor()
model.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = model.predict([row])
print(yhat[0])

[49.93137149 64.08484989]


### Wrapper Multioutput Regression Algorithms

- Not all regression algorithms support multioutput regression
- A workaround for models designed to predict a single value is to divide the multioutput regression problem into multiple single-output sub-problems
- For example, if a multioutput regression problem required the prediction of three values y1, y2 and y3 given an input X, then this could be partitioned into three single-output regression problems:
    - Problem 1: Given X, predict y1.
    - Problem 2: Given X, predict y2.
    - Problem 3: Given X, predict y3.
- There are two main approaches to implementing this technique

#### Direct Multioutput Regression

- The direct approach to multioutput regression involves dividing the regression problem into a separate problem for each target variable to be predicted
- This assumes that the outputs are independent of each other, which might not be a correct assumption
- This approach is supported by the MultiOutputRegressor class that takes a regression model as an argument
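As a rough illustration of what the direct approach amounts to, the sketch below fits one independent model per target column by hand and compares the result with MultiOutputRegressor. It swaps in LinearRegression as the base model (an assumption made here so the fit is deterministic); the names manual_models, manual_pred and wrapper_pred are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)

# Wrapper: fits one copy of the base model per target column
wrapper = MultiOutputRegressor(LinearRegression())
wrapper.fit(X, y)
wrapper_pred = wrapper.predict(X[:5])

# Manual equivalent: one independent single-output model per column of y
manual_models = [LinearRegression().fit(X, y[:, j]) for j in range(y.shape[1])]
manual_pred = np.column_stack([m.predict(X[:5]) for m in manual_models])

# One fitted estimator per target, and both routes give the same predictions
print(len(wrapper.estimators_))
print(np.allclose(wrapper_pred, manual_pred))
```

Because each model sees only X and its own target, no information is shared between outputs, which is the independence assumption noted above.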

In [4]:
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
n_scores = absolute(n_scores)
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

MAE: 0.418 (0.023)


In [5]:
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = LinearSVR()
wrapper = MultiOutputRegressor(model)
wrapper.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = wrapper.predict([row])
print('Predicted: %s' % yhat[0])

Predicted: [50.01310289 64.52108199]


#### Chained Multioutput Regression

- Another approach to using single-output regression models for multioutput regression is to create a linear sequence of models
- The first model in the sequence uses the input and predicts one output
- The second model uses the input and the output from the first model to make a prediction
- The third model uses the input and the outputs from the first two models to make a prediction, and so on
    - Problem 1: Given X, predict y1.
    - Problem 2: Given X and yhat1, predict y2.
    - Problem 3: Given X, yhat1, and yhat2, predict y3.
- This can be achieved using the RegressorChain class in the scikit-learn library
- The order of the models may follow the order of the outputs in the dataset (the default) or be specified via the “order” argument. For example, order=[0,1] would first predict the 0th output and then the 1st output, whereas order=[1,0] would first predict the last output variable and then the first output variable in our test problem
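The chain described above can be sketched by hand for the two-target test problem and compared with RegressorChain. This is a minimal sketch under two assumptions: it uses LinearRegression as the base model for determinism, and it relies on scikit-learn's default behavior of training later models on the true earlier targets while feeding them predictions at prediction time. The names m1, m2 and manual_pred are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=2, random_state=1, noise=0.5)
X_new = X[:5]

# Problem 1: given X, predict y1
m1 = LinearRegression().fit(X, y[:, 0])
# Problem 2: given X and y1, predict y2 (trained on the true y1 values)
m2 = LinearRegression().fit(np.column_stack([X, y[:, 0]]), y[:, 1])

# At prediction time the true y1 is unknown, so model 2 receives yhat1
yhat1 = m1.predict(X_new)
yhat2 = m2.predict(np.column_stack([X_new, yhat1]))
manual_pred = np.column_stack([yhat1, yhat2])

# Library equivalent using the default order [0, 1]
chain = RegressorChain(LinearRegression()).fit(X, y)
chain_pred = chain.predict(X_new)
print(np.allclose(chain_pred, manual_pred))

# Reversed chain: predict output 1 first, then output 0
reversed_chain = RegressorChain(LinearRegression(), order=[1, 0]).fit(X, y)
reversed_pred = reversed_chain.predict(X_new)
print(reversed_pred.shape)
```

Changing the order only changes which output feeds into which model; the predicted columns are still returned in the original output order.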

In [6]:
from numpy import mean
from numpy import std
from numpy import absolute
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = LinearSVR()
wrapper = RegressorChain(model)
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
n_scores = cross_val_score(wrapper, X, y, scoring='neg_mean_absolute_error', cv=cv, n_jobs=-1)
n_scores = absolute(n_scores)
print('MAE: %.3f (%.3f)' % (mean(n_scores), std(n_scores)))

MAE: 0.583 (0.255)


In [7]:
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5, n_targets=2, random_state=1, noise=0.5)
model = LinearSVR()
wrapper = RegressorChain(model)
wrapper.fit(X, y)
row = [0.21947749, 0.32948997, 0.81560036, 0.440956, -0.0606303, -0.29257894, -0.2820059, -0.00290545, 0.96402263, 0.04992249]
yhat = wrapper.predict([row])
print('Predicted: %s' % yhat[0])

Predicted: [50.01965485 64.5193285 ]


