Stacking means using the predictions of several different models to train another model.
The models whose predictions are used are called level0 models or base models.
The model that is then trained on those predictions is known as the level1 model or the meta model.

Two methods:
    1. Using scikit-learn's StackingClassifier class
    2. Building your own

Whichever method you use, the idea is the same

It's common practice to use strong models such as GBMs, tree models, or neural networks as your level0 models and a simple linear model such as LogisticRegression or LinearRegression, as the case may be, as your level1 model, because linear models are believed to be a good way to combine these predictions and give a good result
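To make the "linear model combines the predictions" idea concrete, here is a minimal sketch. It assumes two hypothetical base-model score columns simulated with NumPy (`pred_a` follows the label closely, `pred_b` is much noisier); none of these names or numbers come from the DSN data. The point is only that the meta model learns one weight per base model and tends to lean on the more reliable one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y_true = rng.integers(0, 2, size=n)  # synthetic binary labels

# Hypothetical base-model scores in [0, 1]:
# model A tracks the label closely, model B is much noisier.
pred_a = np.clip(y_true + rng.normal(0, 0.3, size=n), 0, 1)
pred_b = np.clip(y_true + rng.normal(0, 1.0, size=n), 0, 1)

# The meta model sees only the base-model scores, one column per model.
meta_X = np.column_stack([pred_a, pred_b])
meta = LogisticRegression().fit(meta_X, y_true)

# One learned weight per base model
weight_a, weight_b = meta.coef_[0]
print('weight for model A:', weight_a)
print('weight for model B:', weight_b)
```

In this simulated setup the weight on model A should come out larger than the weight on model B, which is the sense in which the linear model "decides" how much to trust each base model.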

For this demonstration, our level0 models will be CatBoost and LightGBM, and our level1 model will be logistic regression

### Read in our data (I'll be using the DSN qualification data set)
I need to quickly preprocess the data first. You can skip that part; I'll do all of it in just one cell

In [1]:
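import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def to_df(data, cols):
    # Wrap the transformed array back into a DataFrame with the given column names
    return pd.DataFrame(data, columns=cols)

df = pd.read_csv('Train.csv')
# Pass axis by keyword; recent pandas versions reject the positional form
df.drop('Applicant_ID', axis=1, inplace=True)

# Encode the target as 0/1, then remove it from the features
y = df.default_status.map({'yes': 1, 'no': 0})
df.drop(['default_status'], axis=1, inplace=True)
# One-hot encode the categorical columns
df = pd.get_dummies(df, drop_first=True)

cols = df.columns
# Fill missing values with the column mean, then standardize every column
pipe = Pipeline([('imputer', SimpleImputer(strategy='mean')), ('scaler', StandardScaler())])
X = to_df(pipe.fit_transform(df), cols)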

### Import the models we'll be using

In [2]:
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

### Method 1 (Using stacking classifier)

It takes three main parameters:


`estimators`: These are the base models. You pass them in as a list of tuples containing the name and the model itself, e.g.  
[('cat', CatBoostClassifier()), ('lgb', LGBMClassifier())]


`final_estimator`: This is the meta model. It is passed in directly as a model instance, e.g. LogisticRegression()


`passthrough`: It takes True or False. True means the original features of the data set are also passed to the meta model, alongside the base-model predictions, when it is trained
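As a rough illustration of what `passthrough` changes, the sketch below counts how many input columns the meta model ends up with. It assumes synthetic 5-feature data from `make_classification` and two scikit-learn base models as stand-ins for CatBoost and LightGBM; `meta_input_width` is a hypothetical helper written for this example, not part of scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 5 features (not the DSN data set)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

# Stand-in base models; any classifiers with predict_proba would do
base = [
    ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ('rf', RandomForestClassifier(n_estimators=20, random_state=0)),
]

def meta_input_width(passthrough):
    """Hypothetical helper: fit a stack and report how many columns the meta model saw."""
    demo_stack = StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(),
        passthrough=passthrough,
    )
    demo_stack.fit(X_demo, y_demo)
    return demo_stack.final_estimator_.n_features_in_

width_without = meta_input_width(False)
width_with = meta_input_width(True)
print('passthrough=False:', width_without)
print('passthrough=True: ', width_with)
```

For a binary problem each base model contributes one probability column, so the meta model should see 2 columns without passthrough and 2 + 5 = 7 with it.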

In [3]:
# split data
X_train, X_test, y_train, y_test= train_test_split(X, y, test_size= 0.2,random_state=101)

In [4]:
from sklearn.ensemble import StackingClassifier
level0 = [('cat', CatBoostClassifier(n_estimators= 100, silent= True)), ('lgb',LGBMClassifier(n_estimators= 100))]
level1= LogisticRegression()

stack= StackingClassifier(estimators= level0, final_estimator=level1, passthrough= False, verbose= 0)
# fit() trains the base models, then trains the meta model on their cross-validated predictions

stack.fit(X_train, y_train)
stack_pred= stack.predict(X_test)
accuracy_score(y_test, stack_pred)

0.8083928571428571

### Method 2 (Building our stacking classifier)

Four steps:
1. Train your base models
2. Get the predictions from the models
3. Turn them into a DataFrame
4. Train the meta model on that DataFrame

In [5]:
lgb= LGBMClassifier(n_estimators= 100)
lgb.fit(X_train, y_train)
lgb_pred= lgb.predict(X_test)
accuracy_score(y_test, lgb_pred)

0.8084821428571428

In [7]:
cat= CatBoostClassifier(n_estimators= 100, silent=True)
cat.fit(X_train, y_train)
cat_pred= cat.predict(X_test)
accuracy_score(y_test, cat_pred)

0.8014285714285714

In [8]:
# Form a dataframe with the predictions
stacked= pd.DataFrame({'lgb': lgb_pred, 'cat': cat_pred})
stacked.head()

   lgb  cat
0    0    0
1    1    1
2    0    0
3    0    0
4    0    0


In [11]:
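# Now train Logistic regression using this new data
# Note: the meta model is fit and scored on the same test-set predictions here,
# so this accuracy is not a true held-out estimate and is not directly
# comparable to the StackingClassifier score above
lr= LogisticRegression()
lr.fit(stacked, y_test)
lr_pred= lr.predict(stacked)
accuracy_score(y_test, lr_pred)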

0.8041964285714286

For some reason, stacking didn't beat the individual base models here, but that is the idea
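In the manual version above, the meta model is fit on test-set predictions and scored on those same rows, which differs from StackingClassifier: it trains the meta model on cross-validated predictions of the training data. A minimal sketch of a closer manual equivalent follows. It assumes synthetic data from `make_classification` and uses scikit-learn's GradientBoostingClassifier and RandomForestClassifier as stand-ins for CatBoost and LightGBM. All variable names are illustrative, and no score is claimed for the DSN data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (not the DSN data set)
X_syn, y_syn = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=101)

# Stand-in base models
base_models = {
    'gb': GradientBoostingClassifier(n_estimators=50, random_state=0),
    'rf': RandomForestClassifier(n_estimators=50, random_state=0),
}

train_cols, test_cols = [], []
for name, model in base_models.items():
    # Out-of-fold probabilities: each training row is scored by a model that never saw it
    oof = cross_val_predict(model, X_tr, y_tr, cv=5, method='predict_proba')[:, 1]
    train_cols.append(oof)
    # Refit on the full training split to score the held-out test rows
    model.fit(X_tr, y_tr)
    test_cols.append(model.predict_proba(X_te)[:, 1])

train_meta = np.column_stack(train_cols)  # meta-model training features
test_meta = np.column_stack(test_cols)    # meta-model test features

# The meta model only ever sees training rows during fit
meta_model = LogisticRegression()
meta_model.fit(train_meta, y_tr)
manual_acc = accuracy_score(y_te, meta_model.predict(test_meta))
print(manual_acc)
```

With this layout the test labels are used only for scoring, so the resulting accuracy can be compared with the StackingClassifier result on equal terms.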