## This notebook focuses on the Super Learner and stacking in ensemble learning

> So you might be wondering what a `super learner` is. Stacking was already covered in a previous notebook, which also hinted at the super learner, so let's start with the Super Learner itself.

### [`Super Learner`](https://machinelearningmastery.com/super-learner-ensemble-in-python/)

*Consider that you have already fit many different algorithms on your dataset, and some algorithms have been evaluated many times with different configurations. You may have many tens or hundreds of different models of your problem. Why not use all those models instead of the best model from the group?*

This is the intuition behind the so-called `super learner` ensemble algorithm.

>The super learner algorithm was proposed by Mark van der Laan, Eric Polley, and Alan Hubbard from Berkeley in their 2007 paper titled “Super Learner.” It was published in a biological journal, which may be sheltered from the broader machine learning community.

**The super learner algorithm involves first pre-defining the k-fold split of your data, then evaluating all different algorithms and algorithm configurations on the same split of the data. All out-of-fold predictions are then kept and used to train an algorithm that learns how to best combine the predictions.**

### The procedure can be summarized as follows:

1. Select a k-fold split of the training dataset.
2. Select m base-models or model configurations.
3. For each base model:
    * Evaluate using k-fold cross-validation.
    * Store all out-of-fold predictions.
    * Fit the model on the full training dataset and store.
4. Fit a meta-model on the out-of-fold predictions.
5. Evaluate the model on a holdout dataset or use model to make predictions.
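
The five steps above can be sketched compactly with scikit-learn's `cross_val_predict`. This is a minimal illustration under our own assumptions (synthetic data from `make_regression`, three base models, 5 folds, a fixed `random_state`), not the manual implementation built later in this notebook. Fixing `random_state` matters: it makes every base model see exactly the same k-fold split, as step 1 requires.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.2, random_state=0)

# 1. Select one k-fold split; the fixed random_state makes every model reuse it
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# 2. Select m base models (here m = 3)
base_models = [LinearRegression(), DecisionTreeRegressor(random_state=0), KNeighborsRegressor()]

# 3. For each base model: keep its out-of-fold predictions, then refit it on all training data
oof_columns = []
for model in base_models:
    oof_columns.append(cross_val_predict(model, X_train, y_train, cv=kfold))
    model.fit(X_train, y_train)
meta_X_train = np.column_stack(oof_columns)  # shape: (n_train_samples, m)

# 4. Fit a meta-model on the out-of-fold predictions
meta_model = LinearRegression().fit(meta_X_train, y_train)

# 5. Predict on the holdout set: base-model predictions become the meta-model's features
meta_X_holdout = np.column_stack([model.predict(X_holdout) for model in base_models])
holdout_predictions = meta_model.predict(meta_X_holdout)
print(meta_X_train.shape, holdout_predictions.shape)
```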

![superlearner](Super-learner.png)

#### The Super Learner is an application of stacked generalization (covered in previous notebooks)
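
Because the Super Learner is stacking with out-of-fold predictions, scikit-learn's `StackingRegressor` (added in version 0.22) can express the same idea: the fitted base models in `estimators_` are trained on the full data, while the `final_estimator` is trained on cross-validated predictions produced with the `cv` splitter. The sketch below assumes synthetic data and an arbitrary choice of base models; it is an alternative to, not a copy of, the manual code in this notebook.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption for illustration)
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=1)

stack = StackingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("tree", DecisionTreeRegressor(random_state=0)),
        ("knn", KNeighborsRegressor()),
    ],
    final_estimator=LinearRegression(),                   # the meta-model
    cv=KFold(n_splits=10, shuffle=True, random_state=0),  # one shared k-fold split
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```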

*First, we build one by hand.*

## Super Learner for Regression

```python
# Manually develop a Super Learner with scikit-learn

# Import all modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.linear_model import LinearRegression, ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
```

```python
# Get the data
housing = fetch_california_housing()
housing_df = pd.DataFrame(housing.data, columns=housing.feature_names)
housing_df['target'] = housing.target
housing_df.head()
```

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


```python
housing_df.shape
```

```
(20640, 9)
```

```python
# Get features and labels
X = housing_df.drop("target", axis=1)
y = housing_df.target
X.shape, y.shape
```

```
((20640, 8), (20640,))
```

```python
# Create the list of base models
def get_models():
    models = []
    models.append(LinearRegression())
    models.append(ElasticNet())
    models.append(SVR(gamma='scale'))
    models.append(DecisionTreeRegressor())
    models.append(KNeighborsRegressor())
    models.append(AdaBoostRegressor())
    models.append(BaggingRegressor(n_estimators=10))
    models.append(RandomForestRegressor(n_estimators=10))
    models.append(ExtraTreesRegressor(n_estimators=10))
    return models
```
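
Step 2 of the procedure allows several *configurations* of the same algorithm, not only different algorithms. A hypothetical variant of `get_models()` (the name `get_model_configs` and the hyperparameter values are illustrative assumptions) might look like this:

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def get_model_configs():
    """Hypothetical variant of get_models(): several configurations per algorithm."""
    models = []
    for k in (3, 5, 9):  # different neighbourhood sizes
        models.append(KNeighborsRegressor(n_neighbors=k))
    for depth in (4, 8, None):  # from shallow to fully grown trees
        models.append(DecisionTreeRegressor(max_depth=depth))
    return models

configs = get_model_configs()
print(len(configs))
```

Each configuration becomes one more column of the meta data, so the meta-model decides how much each setting contributes.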

```python
# Create a shuffled 10-fold split of the dataset and print the indices of each fold
kfold = KFold(n_splits=10, shuffle=True)
for i, (train_index, test_index) in enumerate(kfold.split(X)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
```

```
Fold 0:
  Train: index=[    0     1     2 ... 20637 20638 20639]
  Test:  index=[   14    27    32 ... 20605 20611 20634]
Fold 1:
  Train: index=[    0     1     3 ... 20637 20638 20639]
  Test:  index=[    2     8    21 ... 20575 20584 20588]
Fold 2:
  Train: index=[    0     1     2 ... 20635 20636 20639]
  Test:  index=[    3    15    22 ... 20618 20637 20638]
Fold 3:
  Train: index=[    0     1     2 ... 20637 20638 20639]
  Test:  index=[   13    43    51 ... 20590 20613 20629]
Fold 4:
  Train: index=[    0     2     3 ... 20637 20638 20639]
  Test:  index=[    1     7    34 ... 20622 20625 20627]
Fold 5:
  Train: index=[    0     1     2 ... 20637 20638 20639]
  Test:  index=[    9    10    12 ... 20612 20615 20620]
Fold 6:
  Train: index=[    1     2     3 ... 20637 20638 20639]
  Test:  index=[    0     4     5 ... 20621 20623 20632]
Fold 7:
  Train: index=[    0     1     2 ... 20636 20637 20638]
  Test:  index=[   19    24    28 ... 20624 20630 20639]
Fold 8:
  Train: index=[
```
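
The printed folds illustrate why every training row ends up with exactly one out-of-fold prediction: the test folds of `KFold` partition the data. The check below assumes `random_state=0`, which the cell above does not set, so its splits will differ from the ones printed there (and from the unseeded split created inside `get_out_of_fold_predictions`).

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 20640  # same number of rows as the California housing data
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

# Concatenate the test indices of all 10 folds (only the number of rows matters to split())
all_test_indices = np.concatenate(
    [test_index for _, test_index in kfold.split(np.zeros((n_samples, 1)))]
)

# Every row index should appear exactly once across the test folds
print(len(all_test_indices), np.array_equal(np.sort(all_test_indices), np.arange(n_samples)))
```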

```python
# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
```

```
((16512, 8), (4128, 8), (16512,), (4128,))
```

```python
# Break down the reshape/hstack/vstack steps used by the out-of-fold prediction function
a = np.array([32, 232, 23, 23, 21, 12, 123, 234])
b = np.array([323, 32, 32, 23, 3223, 32, 32, 12])
a_reshaped = a.reshape(len(a), 1)
b_reshaped = b.reshape(len(b), 1)
a_hstack = []
a_hstack.append(np.hstack(a_reshaped))
a_vhstack = []
a_vhstack.append(np.vstack(a_hstack))
```

```python
a, a_reshaped
```

```
(array([ 32, 232,  23,  23,  21,  12, 123, 234]),
 array([[ 32],
        [232],
        [ 23],
        [ 23],
        [ 21],
        [ 12],
        [123],
        [234]]))
```

```python
a_hstack, a_hstack[0].shape
```

```
([array([ 32, 232,  23,  23,  21,  12, 123, 234])], (8,))
```

```python
a_vhstack
```

```
[array([[ 32, 232,  23,  23,  21,  12, 123, 234]])]
```

```python
a_hstack.append(np.hstack(b_reshaped))
a_vhstack.append(np.vstack(a_hstack))
```

```python
a_hstack, a_vhstack
```

```
([array([ 32, 232,  23,  23,  21,  12, 123, 234]),
  array([ 323,   32,   32,   23, 3223,   32,   32,   12])],
 [array([[ 32, 232,  23,  23,  21,  12, 123, 234]]),
  array([[  32,  232,   23,   23,   21,   12,  123,  234],
         [ 323,   32,   32,   23, 3223,   32,   32,   12]])])
```

```python
# Collect the predictions of each base model on every held-out fold
def get_out_of_fold_predictions(X, y, models):
    meta_X, meta_y = list(), list()  # these act as the input for the meta-model
    # Define the split of the data
    kfold = KFold(n_splits=10, shuffle=True)
    # Enumerate the splits
    for train_index, test_index in kfold.split(X):
        fold_predictions = list()
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        meta_y.extend(y_test)

        # Fit and make predictions with each base model
        for model in models:
            model.fit(X_train, y_train)
            base_model_prediction = model.predict(X_test)
            fold_predictions.append(base_model_prediction.reshape(len(base_model_prediction), 1))
        # One row per held-out sample, one column per base model
        meta_X.append(np.hstack(fold_predictions))
    return np.vstack(meta_X), np.asarray(meta_y)
```
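
Two properties of the function above are worth knowing: the rows of `meta_X` follow the shuffled fold order (which is fine, because `meta_y` is collected in the same order), and the passed-in models are left fitted on the last fold's training data. A hypothetical variant (`get_oof_predictions_aligned` is our own name, shown on synthetic data) that keeps rows in the original order and fits a `clone` of each model per fold could look like this:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def get_oof_predictions_aligned(X, y, models, n_splits=10, random_state=0):
    """Hypothetical variant: out-of-fold predictions stored in the original row order."""
    X, y = np.asarray(X), np.asarray(y)  # also accepts DataFrame / Series inputs
    meta_X = np.zeros((len(X), len(models)))
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    for train_index, test_index in kfold.split(X):
        for j, model in enumerate(models):
            fold_model = clone(model)  # fresh unfitted copy; the passed-in model is untouched
            fold_model.fit(X[train_index], y[train_index])
            meta_X[test_index, j] = fold_model.predict(X[test_index])
    return meta_X, y  # row i of meta_X lines up with y[i]

X, y = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
meta_X, meta_y = get_oof_predictions_aligned(X, y, [LinearRegression(), DecisionTreeRegressor(random_state=0)])
print(meta_X.shape, np.array_equal(meta_y, y))
```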

```python
# Let's see what the meta-model's input data looks like
meta_X, meta_y = get_out_of_fold_predictions(X_train, y_train, models=get_models())
meta_X, meta_y
```

```
(array([[1.7660686 , 1.82273782, 1.81112003, ..., 2.0053    , 2.1713    ,
         1.6605    ],
        [1.7839361 , 1.86237395, 1.80123518, ..., 1.7426    , 1.8851    ,
         1.7762    ],
        [2.18013872, 1.73304683, 1.82377491, ..., 2.337     , 2.3666    ,
         2.3146    ],
        ...,
        [1.79198967, 1.78242079, 1.79856318, ..., 1.6777    , 1.7894    ,
         1.4824    ],
        [2.33888768, 2.41281997, 1.94047055, ..., 2.6328    , 2.728101  ,
         2.625     ],
        [2.01936282, 1.80689836, 1.88501851, ..., 3.769904  , 2.940702  ,
         4.197306  ]]),
 array([1.577, 1.535, 2.281, ..., 1.271, 3.203, 4.25 ]))
```

```python
# The shapes show that the meta data can now be used as input for any model
meta_X.shape, meta_y.shape
```

```
((16512, 9), (16512,))
```

```python
# DataFrame representation of meta_X for better understanding
pd.DataFrame(meta_X)
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.766069 | 1.822738 | 1.811120 | 2.750 | 1.750600 | 2.300209 | 2.005300 | 2.171300 | 1.660500 |
| 1 | 1.783936 | 1.862374 | 1.801235 | 1.500 | 2.114200 | 1.935461 | 1.742600 | 1.885100 | 1.776200 |
| 2 | 2.180139 | 1.733047 | 1.823775 | 2.308 | 2.682400 | 1.884963 | 2.337000 | 2.366600 | 2.314600 |
| 3 | 2.063799 | 2.169576 | 1.775397 | 1.591 | 1.868200 | 2.779841 | 1.639100 | 1.584800 | 1.599800 |
| 4 | 1.651970 | 1.725596 | 1.855109 | 3.603 | 2.221402 | 2.782794 | 2.205600 | 2.783100 | 2.497700 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16507 | 2.001309 | 2.002026 | 1.788406 | 1.449 | 1.499400 | 2.010944 | 1.244600 | 1.188400 | 1.324300 |
| 16508 | 1.132998 | 1.773183 | 1.850860 | 0.982 | 1.421800 | 1.706269 | 1.174200 | 1.041200 | 0.816100 |
| 16509 | 1.791990 | 1.782421 | 1.798563 | 1.703 | 1.268600 | 2.271499 | 1.677700 | 1.789400 | 1.482400 |
| 16510 | 2.338888 | 2.412820 | 1.940471 | 3.215 | 3.183202 | 3.019157 | 2.632800 | 2.728101 | 2.625000 |
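
Since each column of `meta_X` holds one base model's out-of-fold predictions, the cross-validated RMSE of every base model can be read off the meta data before the test set is touched. A minimal sketch (the helper name `oof_rmse_per_model` is hypothetical), shown on a tiny hand-made array; with the notebook's data it would be called as `oof_rmse_per_model(meta_X, meta_y)`:

```python
import numpy as np

def oof_rmse_per_model(meta_X, meta_y):
    """Cross-validated RMSE of each base model, computed from the out-of-fold matrix."""
    errors = meta_X - meta_y.reshape(-1, 1)  # broadcast the targets across the model columns
    return np.sqrt(np.mean(errors ** 2, axis=0))

# Tiny hand-made example: 3 samples, 2 base models
meta_X = np.array([[1.0, 2.0], [2.0, 2.0], [3.0, 5.0]])
meta_y = np.array([1.0, 2.0, 3.0])
print(oof_rmse_per_model(meta_X, meta_y))
```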


```python
# Create a function to fit the base models
def fit_base_models(X, y, models):
    for model in models:
        model.fit(X, y)
```

```python
# Create a function to fit the meta-model
def fit_meta_model(X, y):
    model = LinearRegression()  # a simple model, since meta-models are usually kept simple
    model.fit(X, y)
    return model
```
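
A plain `LinearRegression` meta-model may assign negative weights to some base models. Super Learner implementations commonly constrain the weights to be non-negative; to our knowledge the R `SuperLearner` package uses non-negative least squares by default. A sketch of that alternative, assuming scikit-learn 0.24 or newer (where `LinearRegression` gained `positive=True`); `fit_meta_model_nonnegative` is a hypothetical name and the meta data are synthetic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_meta_model_nonnegative(X, y):
    """Hypothetical alternative to fit_meta_model: all meta-model weights are >= 0."""
    model = LinearRegression(positive=True)  # non-negative least squares fit
    model.fit(X, y)
    return model

rng = np.random.default_rng(0)
y = rng.normal(size=200)
# Two informative "base model" columns and one anti-correlated, noisy column
meta_X = np.column_stack([
    y + rng.normal(scale=0.3, size=200),
    y + rng.normal(scale=0.5, size=200),
    -y + rng.normal(scale=1.0, size=200),
])
meta_model = fit_meta_model_nonnegative(meta_X, y)
print(meta_model.coef_)
```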

### Before fitting the models, let's create evaluation functions for the base models and the meta-model

```python
def evaluate_models(X, y, models):
    for model in models:
        y_preds = model.predict(X)
        mse = mean_squared_error(y, y_preds)
        print("%s: RMSE %.3f" % (model.__class__.__name__, np.sqrt(mse)))
```

```python
def super_learner_predictions(X, models, meta_model):
    meta_X = list()
    for model in models:
        y_preds = model.predict(X)
        meta_X.append(y_preds.reshape(len(y_preds), 1))
    meta_X = np.hstack(meta_X)

    # Predict with the meta-model on the base models' predictions
    return meta_model.predict(meta_X)
```

```python
def meta_model_evaluate(y_test, y_preds):
    print("Super Learner :RMSE %.3f" % (np.sqrt(mean_squared_error(y_test, y_preds))))
```

## Let's fit the base models and our `Super Learner`

```python
models = get_models()
print("fitting base models...")
fit_base_models(X_train, y_train, models)
print("fitting Meta model...")
meta_model = fit_meta_model(meta_X, meta_y)
```

```
fitting base models...
fitting Meta model...
```


## Let's evaluate the models and see the results

```python
evaluate_models(X_test, y_test, models)
```

```
LinearRegression: RMSE 0.722
ElasticNet: RMSE 0.881
SVR: RMSE 1.162
DecisionTreeRegressor: RMSE 0.685
KNeighborsRegressor: RMSE 1.044
AdaBoostRegressor: RMSE 0.873
BaggingRegressor: RMSE 0.519
RandomForestRegressor: RMSE 0.515
ExtraTreesRegressor: RMSE 0.523
```


```python
y_preds_super_learner = super_learner_predictions(X_test, models, meta_model)
y_preds_super_learner.shape
```

```
(4128,)
```

```python
y_test.shape
```

```
(4128,)
```

```python
meta_model_evaluate(y_test, y_preds_super_learner)
```

```
Super Learner :RMSE 0.488
```
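
Because the meta-model is a `LinearRegression`, its `coef_` shows how much weight it gives each base model. With the notebook's objects this would be `zip([m.__class__.__name__ for m in models], meta_model.coef_)`. The self-contained sketch below uses synthetic meta data in which the first column is the least noisy, and the model names are placeholder labels (assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder labels standing in for the notebook's base models
model_names = ["model_A", "model_B", "model_C"]

# Synthetic meta data: column j is the target plus noise of increasing scale
rng = np.random.default_rng(1)
meta_y = rng.normal(size=300)
meta_X = np.column_stack([meta_y + rng.normal(scale=s, size=300) for s in (0.2, 0.6, 1.0)])

meta_model = LinearRegression().fit(meta_X, meta_y)
for name, weight in zip(model_names, meta_model.coef_):
    print(f"{name:>10}: {weight:.3f}")
print(f"{'intercept':>10}: {meta_model.intercept_:.3f}")
```

The least noisy column should receive the largest weight, which is the behaviour we hope for when the meta-model combines real base models.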


# As we can see, the Super Learner has the lowest RMSE (0.488), so it performs best of all the models on this test split

**This is what we call a `super learner`: a model that learns from all the base models how best to combine their predictions.**
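
Beating every single base model is encouraging, but the result comes from one unseeded train/test split, and a fairer reference point is an equal-weight average of the same base models. A minimal sketch of that baseline (`simple_average_predictions` is a hypothetical helper; the data and base models are synthetic stand-ins). In the notebook, the RMSE of `simple_average_predictions(X_test, models)` against `y_test` could be compared with the Super Learner's 0.488.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def simple_average_predictions(X, models):
    """Equal-weight baseline: the mean of all fitted base-model predictions."""
    return np.column_stack([model.predict(X) for model in models]).mean(axis=1)

# Synthetic data and base models (assumptions for illustration)
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
models = [LinearRegression(), DecisionTreeRegressor(random_state=0), KNeighborsRegressor()]
for model in models:
    model.fit(X_train, y_train)

average_preds = simple_average_predictions(X_test, models)
print("Simple average: RMSE %.3f" % np.sqrt(mean_squared_error(y_test, average_preds)))
```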

### You can also implement this with the `ML-Ensemble` library

*For this we need the `mlens` library:*

> pip install mlens