# How to Develop Multi-Output Regression Models with Python

Multi-output regression refers to regression problems that involve predicting two or more numerical values for each input example.

An example might be to predict a coordinate given an input, e.g. predicting x and y values. Another example would be multi-step time series forecasting, which involves predicting multiple future time steps of a given variable.
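To make the forecasting case concrete, here is a minimal sketch of how a single series could be framed as a multi-output problem. The helper name `make_windows` and the window sizes are illustrative assumptions and are not part of the original tutorial.

```python
import numpy as np

def make_windows(series, n_lags, n_steps):
    """Hypothetical helper: split a 1-D series into lagged inputs and multi-step targets."""
    series = np.asarray(series)
    n_samples = len(series) - n_lags - n_steps + 1
    # Each row of X holds n_lags consecutive past values.
    X = np.array([series[i:i + n_lags] for i in range(n_samples)])
    # Each row of y holds the n_steps values that follow that window.
    y = np.array([series[i + n_lags:i + n_lags + n_steps] for i in range(n_samples)])
    return X, y

X_demo, y_demo = make_windows(np.arange(10), n_lags=3, n_steps=2)
print(X_demo.shape, y_demo.shape)  # (6, 3) (6, 2)
print(X_demo[0], y_demo[0])        # [0 1 2] [3 4]
```

Every row of `y_demo` has two columns, so any multi-output regressor can be trained on it in the same way as on the synthetic dataset used in this tutorial.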

Refer to this blog post by Jason Brownlee for more detail: https://machinelearningmastery.com/multi-output-regression-models-with-python/

Also, here's a YouTube video that covers the same subject: https://youtu.be/26J3bcqhfLE

### MultiOutput Regression Models In Python

In [5]:
import pandas as pd
import numpy as np

First, let's generate a dataset with two targets to use in this tutorial:

In [1]:
from sklearn.datasets import make_regression

In [2]:
## Create the dataset.
# Note: 'X' and 'y' are both NumPy nd-arrays
X, y = make_regression(n_samples=1500,n_informative=5,n_features=10,n_targets=2)
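Because `make_regression` draws random data, the numbers shown in this notebook will differ from run to run. The sketch below assumes a fixed `random_state` (the value 1 is arbitrary and is not used in the original cells) and confirms the shapes of the two arrays.

```python
from sklearn.datasets import make_regression

# Assumption: any fixed integer seed makes the dataset reproducible.
X_seeded, y_seeded = make_regression(
    n_samples=1500, n_informative=5, n_features=10, n_targets=2,
    random_state=1,
)

print(X_seeded.shape)  # (1500, 10): one row per sample, one column per feature
print(y_seeded.shape)  # (1500, 2): one column per target
```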

In [6]:
# preview 'X' dataset
pd.DataFrame(X)

row,0,1,2,3,4,5,6,7,8,9
0,0.389931,-0.838284,1.968936,0.924795,-0.953027,-1.524406,-0.153558,-1.764456,-1.438949,0.295061
1,-0.180634,0.827593,0.017068,-0.800341,0.450367,-0.285883,-1.007966,0.620922,-0.252213,0.975275
2,1.327942,2.257078,0.278028,1.196362,1.488471,1.585731,1.106778,3.002443,-0.449102,-0.592198
3,-0.038221,-1.543440,-0.389085,-1.057201,0.127640,1.775334,-1.490630,-1.218147,1.346898,1.203405
4,-1.213114,-2.303337,-0.873773,1.538329,-0.330545,-1.246458,-0.320235,-0.261251,-0.617855,-2.374674
...,...,...,...,...,...,...,...,...,...,...
1495,0.078254,0.461972,1.123928,-0.560024,0.056163,-1.556716,-1.121268,1.009823,0.491985,0.089019
1496,0.270709,1.011420,-0.371784,-1.229511,0.907450,-0.102422,-0.393842,-0.545959,-0.085532,0.501635
1497,-1.720846,-1.227855,-0.021201,-0.869074,0.898413,-1.699503,-0.013160,-0.450193,-0.324492,1.111755
1498,-0.921633,0.479956,-1.162853,-0.279212,-2.346354,-1.221802,-0.657585,1.544478,-1.123184,0.122732


In [25]:
# show summary statistics of 'X'
pd.DataFrame(X).describe().T

column,count,mean,std,min,25%,50%,75%,max
0,1500.0,0.006138,1.004022,-3.154282,-0.659677,0.001471,0.698747,3.386113
1,1500.0,0.015777,0.993568,-3.559955,-0.687674,0.008349,0.696319,3.280244
2,1500.0,-0.002676,0.993163,-3.384969,-0.64601,-0.005259,0.641921,2.979984
3,1500.0,0.023872,1.016042,-3.97607,-0.663683,0.034525,0.674896,3.596219
4,1500.0,-0.02326,0.979603,-3.068737,-0.687028,-0.047396,0.638866,3.085333
5,1500.0,0.004309,0.996382,-3.462813,-0.674827,-0.00347,0.707636,3.764744
6,1500.0,-0.012742,0.975011,-3.696708,-0.64222,-0.015128,0.665179,3.789854
7,1500.0,0.006775,1.031621,-3.228711,-0.701051,-0.002143,0.700683,3.048714
8,1500.0,0.02598,0.98104,-3.276782,-0.679221,0.014352,0.678847,3.400097
9,1500.0,0.03692,0.97149,-3.058188,-0.603427,-0.008014,0.691798,3.213886


In [7]:
# preview 'y' dataset
pd.DataFrame(y)

row,0,1
0,-251.341754,-263.733400
1,128.281976,110.702275
2,539.816839,527.742881
3,-140.680751,-112.725884
4,-333.707629,-336.408857
...,...,...
1495,50.481536,21.294943
1496,162.898299,121.751412
1497,-159.258409,-213.364252
1498,-159.717566,-88.255655


In [26]:
# show summary statistics of 'y'
pd.DataFrame(y).describe().T

column,count,mean,std,min,25%,50%,75%,max
0,1500.0,0.103792,142.166895,-485.235538,-92.90437,-0.670546,95.823508,539.816839
1,1500.0,0.960873,129.303016,-461.504088,-82.854381,5.591691,90.257862,527.742881


# Some algorithms naturally support Multioutput Regression  

For instance: 
- LinearRegression (and related)  
- KNeighborsRegressor  
- DecisionTreeRegressor  
- RandomForestRegressor (and related)
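`KNeighborsRegressor` appears in the list above but is not demonstrated in this notebook. As a minimal sketch (the seed and variable names are assumptions), it can be fitted on a two-column target with no wrapper:

```python
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X_knn, y_knn = make_regression(
    n_samples=1500, n_informative=5, n_features=10, n_targets=2,
    random_state=1,  # assumption: fixed seed for repeatability
)

knn = KNeighborsRegressor()
knn.fit(X_knn, y_knn)  # y_knn has two columns; no wrapper is needed

pred_knn = knn.predict(X_knn[:1])
print(pred_knn.shape)  # (1, 2): one prediction per target
```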

### Apply Linear Regression

In [8]:
from sklearn.linear_model import LinearRegression

In [9]:
lrregression=LinearRegression()
lrregression.fit(X,y)

LinearRegression()

In [10]:
test_data=[[-0.35383149,  0.39382202, -2.03033197,  0.08873402, -0.38576581,
        0.0032707 , -0.56476034, -0.67236167,  0.31317233,  1.5208706 ]]

In [11]:
lrregression.predict(test_data)

array([[-31.00535229, -20.42165322]])

### Decision Tree Regression

In [12]:
from sklearn.tree import DecisionTreeRegressor
dtregressor=DecisionTreeRegressor()
dtregressor.fit(X,y)

DecisionTreeRegressor()

In [13]:
dtregressor.predict(test_data)

array([[-14.66075855, -16.04802789]])

### Random Forest Regressor

In [14]:
from sklearn.ensemble import RandomForestRegressor
rdregressor=RandomForestRegressor()
rdregressor.fit(X,y)

RandomForestRegressor()

In [15]:
rdregressor.predict(test_data)

array([[-30.17798373, -18.76161587]])

## Evaluate Multioutput Regression With Cross-Validation

In [16]:
from sklearn.model_selection import cross_val_score

In [17]:
scores=cross_val_score(rdregressor,X,y,scoring='neg_mean_squared_error',cv=5)

In [18]:
print(scores)

[-1751.07809473 -1406.95891831 -1363.33957065 -1287.08619231
 -1269.75448764]
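The scores are negative because scikit-learn reports errors so that larger is always better, and with the default settings the error is averaged over both targets. A minimal sketch of turning the fold scores into a summary follows. It assumes a fixed seed and a smaller forest (`n_estimators=20`) purely to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X_cv, y_cv = make_regression(
    n_samples=1500, n_informative=5, n_features=10, n_targets=2,
    random_state=1,  # assumption: fixed seed
)
model_cv = RandomForestRegressor(n_estimators=20, random_state=1)

neg_mse = cross_val_score(model_cv, X_cv, y_cv,
                          scoring='neg_mean_squared_error', cv=5)

mse = -neg_mse        # flip the sign to recover the mean squared error per fold
rmse = np.sqrt(mse)   # same units as the targets

print('MSE : mean=%.2f std=%.2f' % (mse.mean(), mse.std()))
print('RMSE: mean=%.2f std=%.2f' % (rmse.mean(), rmse.std()))
```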


# Not all algorithms can support Multioutput Regression

For example, support vector regressors such as `LinearSVR` expect a one-dimensional target.

Fitting one directly on the two-column `y` raises an error:

In [19]:
from sklearn.svm import LinearSVR
svregressor=LinearSVR()
svregressor.fit(X,y)

ValueError: y should be a 1d array, got an array of shape (1500, 2) instead.
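The failure can be reproduced and contrasted with a single-target fit, as in the sketch below. The smaller dataset, the seed, and the larger `max_iter` are assumptions chosen for a quick run.

```python
from sklearn.datasets import make_regression
from sklearn.svm import LinearSVR

X_err, y_err = make_regression(
    n_samples=200, n_informative=5, n_features=10, n_targets=2,
    random_state=1,  # assumption: fixed seed
)

# Fitting on the full two-column target is rejected.
try:
    LinearSVR().fit(X_err, y_err)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Fitting on one target column at a time is accepted.
single = LinearSVR(max_iter=10000).fit(X_err, y_err[:, 0])
pred_single = single.predict(X_err[:1])
print(pred_single.shape)  # (1,)
```

Fitting one model per column by hand like this is what the wrapper classes in the next section automate.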

# Wrapper Multioutput Regression Algorithms

- Direct Multioutput Regression  
- Chained Multioutput Regression

### Direct Multioutput Regression

The `MultiOutputRegressor` wrapper from `sklearn` adds multi-output support to any regressor by fitting one independent copy of it per target.

In [20]:
from sklearn.multioutput import MultiOutputRegressor

In [21]:
mulregressor=MultiOutputRegressor(svregressor)
mulregressor.fit(X,y)

MultiOutputRegressor(estimator=LinearSVR())

In [22]:
mulregressor.predict(test_data)

array([[-31.00535229, -20.42165322]])
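To see what the wrapper does internally, the sketch below inspects its fitted `estimators_` attribute and rebuilds the prediction from the per-target models. The dataset size, seed, and `max_iter` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import LinearSVR

X_d, y_d = make_regression(
    n_samples=500, n_informative=5, n_features=10, n_targets=2,
    random_state=1,  # assumption: fixed seed
)

wrapper = MultiOutputRegressor(LinearSVR(random_state=1, max_iter=10000))
wrapper.fit(X_d, y_d)

# One fitted copy of the base estimator per target column.
print(len(wrapper.estimators_))  # 2

# Stacking the per-target predictions reproduces the wrapper's output.
x_new = X_d[:1]
stacked = np.column_stack([est.predict(x_new) for est in wrapper.estimators_])
print(np.allclose(stacked, wrapper.predict(x_new)))  # True
```

Since each model is fitted separately, this approach cannot exploit any relationship between the targets.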

### Chained Multioutput Regression

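The `RegressorChain` wrapper from `sklearn` fits the regressors in sequence: each model after the first also receives the earlier targets in the chain as additional input features (their predicted values at prediction time).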

In [28]:
from sklearn.multioutput import RegressorChain

In [29]:
mulregressor2 = RegressorChain(svregressor)
mulregressor2.fit(X,y)

RegressorChain(base_estimator=LinearSVR())

In [30]:
mulregressor2.predict(test_data)

array([[-31.00535229, -20.42072872]])
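The order in which targets are chained can be set with the `order` parameter. The sketch below predicts the second target first and feeds it to the model for the first target. The dataset size, seed, and `max_iter` are assumptions, and the base estimator is passed positionally so the example does not depend on the keyword name.

```python
from sklearn.datasets import make_regression
from sklearn.multioutput import RegressorChain
from sklearn.svm import LinearSVR

X_c, y_c = make_regression(
    n_samples=500, n_informative=5, n_features=10, n_targets=2,
    random_state=1,  # assumption: fixed seed
)

# order=[1, 0]: fit target 1 first, then target 0 using target 1 as an extra feature.
chain = RegressorChain(LinearSVR(random_state=1, max_iter=10000), order=[1, 0])
chain.fit(X_c, y_c)

print(list(chain.order_))      # [1, 0]
print(len(chain.estimators_))  # 2

# Predictions are returned in the original column order of y.
pred_chain = chain.predict(X_c[:1])
print(pred_chain.shape)        # (1, 2)
```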

The End! :)