# Stacking Regressor
## Diabetes Dataset
### Andrea Cano


***

## Introduction

Stacking is an ensemble method that combines the predictions of several "weak" learners through a second-stage model, called the blender, to produce a stronger model. This notebook implements a simple stacking regressor that blends different regression models and exposes fit(), predict(), and score() methods. The Diabetes dataset from sklearn is used to demonstrate the class.

In [1]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error
# accuracy_score is a classification metric, so it is not used for regression

## Stacking Class

In [2]:
class StackingRegressor:
    def __init__(self, regs):
        self.regs = regs
        # Creating holders
        self.fits = []
        self.preds = []
        # Defining Blender
        self.blend = RandomForestRegressor()
       
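    def fit(self, X, y):
        # Reset state so that calling fit() again starts from scratch
        self.fits = []
        self.preds = []
        # Split in half: base regressors train on one half, the blender on the other
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5)
        for r in self.regs:
            fit_var = r.fit(X_train, y=y_train)  # fit() returns the fitted regressor
            self.fits.append(fit_var)  # keep the fitted regressor for predict()
            self.preds.append(fit_var.predict(X_test))  # held-out predictions for the blender

        # Transpose to shape (n_samples, n_regressors): one column per base regressor
        self.preds = np.transpose(self.preds)
        self.blend = self.blend.fit(self.preds, y_test)  # fit blender on held-out predictions
        return self.fits  # return the fitted base regressors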
    
    def predict(self, X):
        # Reduce the ten input features to three: one prediction per base regressor
        x_predictions = []
        for r in self.fits:
            x_predictions.append(r.predict(X))  # predict with each fitted regressor

        x_predictions = np.transpose(x_predictions)  # shape (n_samples, n_regressors)
        y_pred = self.blend.predict(x_predictions)  # blender predicts from the base predictions
        return y_pred  # return the final predictions

    def score(self, X_test, y_test):
        # Returns the MSE (lower is better), unlike scikit-learn's score(), which returns R^2
        return mean_squared_error(y_test, self.predict(X_test))  # (y_true, y_pred)
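
For comparison, scikit-learn ships its own stacking regressor. The sketch below is not part of the original notebook: the alias SkStackingRegressor, the estimator names "rf", "lr", and "ridge", and the random_state=0 seeds are assumptions chosen for illustration. One difference worth noting is that scikit-learn trains its final estimator on cross-validated predictions of the base models, rather than on a single 50/50 holdout split as the class above does.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
# Aliased to avoid clashing with the hand-written StackingRegressor class
from sklearn.ensemble import StackingRegressor as SkStackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# random_state is an assumption added here for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

sk_stack = SkStackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(random_state=0)),
        ("lr", LinearRegression()),
        ("ridge", Ridge()),
    ],
    # Same blender type as the hand-written class
    final_estimator=RandomForestRegressor(random_state=0),
)
sk_stack.fit(X_train, y_train)

sk_pred = sk_stack.predict(X_test)
sk_mse = mean_squared_error(y_test, sk_pred)
print(f"scikit-learn StackingRegressor test MSE: {sk_mse:.1f}")
```

Because the splits and the seeds differ, the MSE printed here is not expected to match the value reported for the hand-written class.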
        


## Loading Data

In [3]:
diabetes = datasets.load_diabetes()

X = diabetes["data"]
y = diabetes["target"]

## Splitting Data

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

## Sample Testing


In [5]:
regr1 = RandomForestRegressor()
regr2 = LinearRegression()
regr3 = Ridge()        
regr = StackingRegressor([regr1, regr2, regr3])

regr.fit(X_train, y_train)

y_predict = regr.predict(X_test)

regr.score(X_test, y_test)

Out[5]:
3624.36045112782
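
One way to judge whether the blend helps is to score each base regressor on its own using the same train/test split. The following is a minimal sketch, not part of the original notebook: the dictionary names and random_state=0 are assumptions, and because the notebook's own split is unseeded, the numbers will not match the value above exactly.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# Fixed seed (an assumption) so that every model sees the same split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

base_models = {
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
}

# Fit each base regressor alone and record its test MSE
baseline_mse = {}
for name, model in base_models.items():
    model.fit(X_train, y_train)
    baseline_mse[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in baseline_mse.items():
    print(f"{name}: MSE = {mse:.1f}")
```

If the stacked model's MSE on the same split is not lower than the best of these baselines, the blender is not adding value on this dataset.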

## Conclusion
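The test run gave a score of roughly 3600. This is the mean squared error (MSE), so lower values are better; taking the square root gives a root mean squared error of about 60, in the same units as the target. MSE is often preferred over mean absolute error (MAE) for regression because it penalizes large errors more heavily, but in this test both metrics gave a broadly similar picture.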
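
The relationship between these metrics can be checked on a small worked example. The values in y_true and y_pred below are made up for illustration and do not come from the Diabetes dataset; only the last line uses the score reported in this notebook.

```python
import numpy as np

# Hypothetical targets and predictions, chosen so that one error is larger
y_true = np.array([100.0, 150.0, 200.0])
y_pred = np.array([110.0, 140.0, 230.0])

errors = y_pred - y_true          # [10, -10, 30]
mae = np.mean(np.abs(errors))     # (10 + 10 + 30) / 3
mse = np.mean(errors ** 2)        # (100 + 100 + 900) / 3
rmse = np.sqrt(mse)               # back in the units of the target

print(f"MAE  = {mae:.2f}")
print(f"MSE  = {mse:.2f}")
print(f"RMSE = {rmse:.2f}")

# Square root of the MSE reported in this notebook
reported_rmse = np.sqrt(3624.36045112782)
print(f"Reported RMSE = {reported_rmse:.1f}")
```

The single error of 30 contributes 900 of the 1100 total squared error, which is why MSE reacts more strongly to outlying predictions than MAE does.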