# Issue 5824
Continuing to work on this issue with the very awesome Hugo Bowne-Anderson.
The issue: Meta-estimators for multi-output learning https://github.com/scikit-learn/scikit-learn/issues/5824
I think it would be useful to have meta-estimators for turning a classifier or a regressor into a multi-output classifier or regressor. It's a recurrent pattern and I find myself reimplementing it every once in a while.
This is of course useful for estimators that don't have native multi-output support, but even for those that do, such as random forests (RF), I find that fitting a model independently for each output sometimes works better.
Class names: MultiOutputClassifier and MultiOutputRegressor.

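The pattern described in the issue (one independent copy of a base estimator per output column) might be sketched roughly as follows. This is only an illustration, not the code under development: the class name `PerOutputClassifier` is hypothetical, the toy second target is made up for the example, and it assumes `Y` is a 2-D array with one column per target.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


class PerOutputClassifier:
    """Hypothetical sketch: fit one clone of `estimator` per column of Y."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, Y):
        Y = np.asarray(Y)
        # clone() copies hyperparameters only, so each output gets a fresh model
        self.estimators_ = [
            clone(self.estimator).fit(X, Y[:, i]) for i in range(Y.shape[1])
        ]
        return self

    def predict(self, X):
        # Stack per-output predictions into an (n_samples, n_outputs) array
        return np.column_stack([est.predict(X) for est in self.estimators_])


X, y = load_iris(return_X_y=True)
# Two toy targets: the iris species and a binary "is setosa" flag
Y = np.column_stack([y, (y == 0).astype(int)])

model = PerOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
predictions = model.predict(X)
```

Returning one column per output is a choice made for this sketch; a wrapper could just as well return a list with one prediction array per output.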
Recently: Hugo, Micheal and I wrote some of the base code. Hugo has worked on documentation. I am now implementing test cases. 

In [1]:
import sys
sys.path.append('dev_multi/')
import numpy as np

from MultiOneVsRest import MultiOneVsRestClassifier 
from sklearn.datasets import load_digits
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Import libraries for validation
import sklearn.utils.estimator_checks as ec
import sklearn.utils.validation as val
from sklearn import datasets

# Import the shuffle utility and the label binarizer
from sklearn.utils import shuffle
from sklearn.preprocessing import LabelBinarizer

# These are functions for testing
from sklearn.utils.testing import assert_array_equal
from sklearn.utils.testing import assert_equal
from sklearn.utils.testing import assert_almost_equal

### Develop a nice test case
Below I create a function that generates a test case for multi-target classification. I use the iris data set for this.


In [2]:
def _create_data_set():
    """Create a multi-target data set from the iris data set.

    Returns
    -------
    X : numpy array of shape (150, 4)
        The iris predictor data.
    Y : numpy array of shape (150, 3)
        Multi-target array whose columns are the original iris target and
        two shuffled copies of it.
    """
    
    # Import the data 
    iris = datasets.load_iris()
    X = iris.data

    # Create multiple targets by shuffling y with different random states.
    y1 = iris.target
    y2 = shuffle(y1, random_state=1)
    y3 = shuffle(y1, random_state=2)

    # Stack the targets as rows, then transpose to get one column per target
    Y = np.vstack((y1, y2, y3)).T

    return X, Y

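As a quick sanity check on this construction, the sketch below rebuilds `Y` outside the helper and inspects it. It uses only the calls already shown above; the variable name `class_counts` is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils import shuffle

y1 = load_iris().target
y2 = shuffle(y1, random_state=1)
y3 = shuffle(y1, random_state=2)
Y = np.vstack((y1, y2, y3)).T

# One row per sample, one column per target
print(Y.shape)

# Shuffling only permutes the labels, so every column keeps the class counts
class_counts = [np.bincount(Y[:, i]).tolist() for i in range(Y.shape[1])]
print(class_counts)
```

Because the second and third columns are shuffled independently of `X`, they carry no real signal. That is fine for exercising the fit and predict mechanics, but scores on those columns should not be read as a measure of model quality.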
### Functions for testing the multi-target classifier with a random forest

Below are several functions that test the individual steps of the multi-target classifier.

In [3]:
def _set_up_multi_target_random_forest():
    ''' Set up the forest and the multi-target forest. '''

    forest = RandomForestClassifier(n_estimators=100, random_state=1)
    multi_target_forest = MultiOneVsRestClassifier(forest, n_jobs=-1)
    
    return forest, multi_target_forest


def test_multi_target_init_with_random_forest():
    ''' Test that multi_target initializes as desired for a random forest. '''

    forest, multi_target_forest = _set_up_multi_target_random_forest()

    # check that the wrapped estimator is the forest we passed in
    assert_equal(forest, multi_target_forest.estimator)
    # check that the number of jobs is correct
    assert_equal(-1, multi_target_forest.n_jobs)
    
def test_multi_target_fit_and_predict_with_random_forest():
    ''' test the fit procedure with random forest and assert that predictions work as expected. '''
    
    X,Y = _create_data_set()
    forest, multi_target_forest = _set_up_multi_target_random_forest()
    
    # train the multi_target_forest and also get the predictions. 
    multi_target_forest.fit(X,Y)
    predictions = multi_target_forest.predict(X)
    assert_equal(3,len(predictions))
    
    # train the forest with each column and then assert that the predictions are equal
    for i in range(3):     
        forest.fit(X,Y[:,i])
        assert_equal(list(forest.predict(X)), list(predictions[i]))


def test_multi_target_fit_and_predict_probs_with_random_forest(): 
    ''' Test that the fitted probabilities are as expected to one decimal place. '''
    
    # create the data set using the helper function 
    X,Y = _create_data_set()
    forest, multi_target_forest = _set_up_multi_target_random_forest()
    
    # train the multi_target_forest
    multi_target_forest.fit(X,Y)
    # fit a fresh forest on each column and assert that the probabilities agree
    for i in range(3):
        forest_ = clone(forest)  # unfitted copy with the same hyperparameters
        forest_.fit(X, Y[:, i])
        assert_almost_equal(list(forest_.predict_proba(X)),
                            list(multi_target_forest.predict_proba(X)[i]),
                            decimal=1)

        
def test_multi_target_score():
    ''' test the scoring function '''
    
    # create the data set using the helper function 
    X,Y = _create_data_set()
    forest, multi_target_forest = _set_up_multi_target_random_forest()
    
    # train the multi_target_forest
    multi_target_forest.fit(X,Y)
    
    # score the multi_target_forest; expect an array of floats
    multi_score = multi_target_forest.score(X,Y)
    
    # train the forest with each column and then assert that scores are similar. 
    for i in range(3):
        score = forest.fit(X,Y[:,i]).score(X,Y[:,i])
        assert_almost_equal(score, multi_score[i])

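The probability test relies on `clone`, which copies an estimator's hyperparameters but not anything learned during `fit`. The sketch below illustrates that behaviour on a small forest; the variable names and the choice of `n_estimators=10` are made up for this example.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)

forest_copy = clone(forest)

# The clone has the same hyperparameters as the original ...
same_params = forest_copy.get_params() == forest.get_params()
# ... but none of the fitted attributes, such as the list of trees
copy_is_fitted = hasattr(forest_copy, "estimators_")
original_is_fitted = hasattr(forest, "estimators_")

# Refitting the clone with the same random_state on the same data
# should reproduce the original model's predictions
forest_copy.fit(X, y)
same_predictions = bool((forest_copy.predict(X) == forest.predict(X)).all())
```

This is why a fixed `random_state` matters in the tests above: without it, a separately fitted forest would not be expected to match the wrapped one exactly.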
In [4]:
test_multi_target_fit_and_predict_with_random_forest()

In [5]:
test_multi_target_fit_and_predict_probs_with_random_forest()

In [6]:
test_multi_target_score()

### Ending remarks. 

The above functions test that the following functionality works with random forest for our class: 
    1. Initialization. 
    2. Fitting with data. 
    3. Prediction. 
    4. Returning Probabilities. 
    5. Scoring. 