# Blending

## How does blending work?

1) Split the data into a test set and a remaining set, then split the remaining set into a training set and a validation set.

2) Train the first-level models (MODEL1) on the training set. The models can be of different types, e.g. a decision tree, logistic regression, XGBoost.

3) Use MODEL1 to predict on the validation set and the test set, giving val_predict and test_predict.

4) Train a second-level model (MODEL2) using val_predict from step 3 as its features and the validation labels as its target.

5) Use MODEL2 to predict on test_predict from step 3; this is the final blended prediction.
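The five steps above can be sketched as one small function for the binary case. This is a minimal sketch, not the notebook's own code: the name `blend_fit_predict`, the choice of base models, and the use of `LogisticRegression` as MODEL2 are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def blend_fit_predict(base_models, meta_model, X_train, y_train, X_val, y_val, X_test):
    """Hypothetical helper: binary blending following steps 2 to 5."""
    val_predict = np.zeros((X_val.shape[0], len(base_models)))
    test_predict = np.zeros((X_test.shape[0], len(base_models)))
    for j, model in enumerate(base_models):
        # Step 2: train each first-level model on the training set only.
        model.fit(X_train, y_train)
        # Step 3: column j holds model j's predicted probability of class 1.
        val_predict[:, j] = model.predict_proba(X_val)[:, 1]
        test_predict[:, j] = model.predict_proba(X_test)[:, 1]
    # Step 4: train the second-level model on the validation predictions.
    meta_model.fit(val_predict, y_val)
    # Step 5: predict on the test predictions.
    return meta_model.predict(test_predict)


# Step 1: test set + (validation set + training set).
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.3, random_state=1)

blend_pred = blend_fit_predict(
    [DecisionTreeClassifier(random_state=0), LogisticRegression()],
    LogisticRegression(),
    X_train, y_train, X_val, y_val, X_test,
)
blend_accuracy = float(np.mean(blend_pred == y_test))
print("Blended test accuracy:", blend_accuracy)
```

MODEL2 never sees the training set here. It learns only from predictions on data the first-level models were not fitted on, which is the point of holding out a validation set.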

## Application of Blending

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline
import seaborn as sns

### Step 1: Test + Validation + Train

In [2]:
from sklearn import datasets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
data, target = make_blobs(n_samples=10000, centers=2, random_state=1, cluster_std=1.0 )

#Hold out 20% of the data as the test set
X_train1,X_test,y_train1,y_test = train_test_split(data, target, test_size=0.2,
random_state=1)

#Split the remaining data into training and validation sets
X_train,X_val,y_train,y_val = train_test_split(X_train1, y_train1, test_size=0.3,
random_state=1)

print("The shape of training X:",X_train.shape)
print("The shape of training y:",y_train.shape)
print("The shape of test X:",X_test.shape)
print("The shape of test y:",y_test.shape)
print("The shape of validation X:",X_val.shape)
print("The shape of validation y:",y_val.shape)

The shape of training X: (5600, 2)
The shape of training y: (5600,)
The shape of test X: (2000, 2)
The shape of test y: (2000,)
The shape of validation X: (2400, 2)
The shape of validation y: (2400,)
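The shapes above follow from applying the two split fractions in sequence. A minimal sketch of that arithmetic, assuming the held-out count is rounded up and the remainder goes to the other side of the split; `blend_split_sizes` is a hypothetical helper, not part of scikit-learn:

```python
import math


def blend_split_sizes(n_samples, test_size=0.2, val_size=0.3):
    """Hypothetical helper: sizes produced by two successive splits."""
    n_test = math.ceil(n_samples * test_size)   # first split: test set
    n_rest = n_samples - n_test
    n_val = math.ceil(n_rest * val_size)        # second split: validation set
    n_train = n_rest - n_val
    return n_train, n_val, n_test


sizes_blobs = blend_split_sizes(10000)
print("train, validation, test:", sizes_blobs)
```

With these fractions, 56% of the data trains MODEL1, 24% trains MODEL2, and 20% is kept for testing.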


### Step 2: Have models (MODEL1), train these models using the training set (SVM, KNN, Random Forest)

In [3]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

#MODEL1
#SVM + Random Forest + KNN
clfs = [SVC(probability = True),RandomForestClassifier(n_estimators=5, n_jobs=-1,
criterion='gini'),KNeighborsClassifier()]


### Step 3: Use MODEL1 on the test set and the validation set to get test_predict and val_predict

In [4]:
#One column per model: column i holds model i's predicted probability of class 1
val_features = np.zeros((X_val.shape[0],len(clfs)))
test_features = np.zeros((X_test.shape[0],len(clfs)))
for i,clf in enumerate(clfs):
    #Fit on the training set only
    clf.fit(X_train,y_train)
    val_feature = clf.predict_proba(X_val)[:, 1]
    test_feature = clf.predict_proba(X_test)[:,1]
    val_features[:,i] = val_feature
    test_features[:,i] = test_feature

### Step 4: Set up MODEL2, use validation prediction as a training set (use the predicted result from step 3, as training set)

In [5]:
#Set up MODEL2
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [6]:
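#val_features comes from Step 3, y_val from Step 1
lr.fit(val_features,y_val)

#Output the prediction result
#Note: cross_val_score clones lr and refits it on folds of test_features,
#so the scores below are R^2 values of those refits, not of the lr fitted above
from sklearn.model_selection import cross_val_score
cross_val_score(lr,test_features,y_test,cv=5)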

array([1., 1., 1., 1., 1.])
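`cross_val_score` refits a copy of the estimator on each fold of the data it is given, so the call above does not use the model fitted on the validation predictions. One way to score that fitted MODEL2 on the test predictions is to threshold its output and compute accuracy. The sketch below is an illustration only: `score_fitted_blender`, the 0.5 threshold, and the toy arrays standing in for `val_features` and `test_features` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score


def score_fitted_blender(meta_model, test_features, y_test, threshold=0.5):
    """Hypothetical helper: accuracy of an already-fitted regression blender."""
    scores = meta_model.predict(test_features)
    # Assumption: outputs at or above the threshold count as class 1.
    labels = (scores >= threshold).astype(int)
    return accuracy_score(y_test, labels)


# Toy stand-ins: each column imitates one base model's P(class 1),
# built as the true label plus small noise, clipped to [0, 1].
rng = np.random.default_rng(0)
y_val_demo = rng.integers(0, 2, size=200)
y_test_demo = rng.integers(0, 2, size=100)
val_demo = np.clip(y_val_demo[:, None] + rng.normal(0, 0.1, (200, 3)), 0, 1)
test_demo = np.clip(y_test_demo[:, None] + rng.normal(0, 0.1, (100, 3)), 0, 1)

meta = LinearRegression().fit(val_demo, y_val_demo)  # Step 4
demo_accuracy = score_fitted_blender(meta, test_demo, y_test_demo)  # Step 5
print("Accuracy of the fitted blender:", demo_accuracy)
```

In the notebook, the same helper could be called as `score_fitted_blender(lr, test_features, y_test)` once `lr` has been fitted on `val_features`.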

## Application of Blending - IRIS

In [7]:
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

In [16]:
#Hold out 20% of the data as the test set
X_train1,X_test,y_train1,y_test = train_test_split(X, y, test_size=0.2,
random_state=42)

#Split the remaining data into training and validation sets
X_train,X_val,y_train,y_val = train_test_split(X_train1, y_train1, test_size=0.3,
random_state=42)

print("The shape of training X:",X_train.shape)
print("The shape of training y:",y_train.shape)
print("The shape of test X:",X_test.shape)
print("The shape of test y:",y_test.shape)
print("The shape of validation X:",X_val.shape)
print("The shape of validation y:",y_val.shape)

The shape of training X: (84, 4)
The shape of training y: (84,)
The shape of test X: (30, 4)
The shape of test y: (30,)
The shape of validation X: (36, 4)
The shape of validation y: (36,)


In [29]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

#MODEL1
#SVM + Random Forest + KNN
clfs = [SVC(probability = True),RandomForestClassifier(n_estimators=2,
criterion='gini'),KNeighborsClassifier()]


In [30]:
#Note: iris has 3 classes, so [:, 1] keeps only the probability of class 1
#and discards the columns for classes 0 and 2
val_features = np.zeros((X_val.shape[0],len(clfs)))
test_features = np.zeros((X_test.shape[0],len(clfs)))
for i,clf in enumerate(clfs):
    clf.fit(X_train,y_train)
    val_feature = clf.predict_proba(X_val)[:, 1]
    test_feature = clf.predict_proba(X_test)[:,1]
    val_features[:,i] = val_feature
    test_features[:,i] = test_feature

In [27]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()

In [28]:
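#val_features comes from Step 3, y_val from Step 1
#Note: LinearRegression treats the class labels 0, 1, 2 as numeric targets
lr.fit(val_features,y_val)

#Output the prediction result
#Note: cross_val_score clones lr and refits it on folds of test_features;
#the scores are R^2 values, and a negative R^2 is worse than predicting the mean
from sklearn.model_selection import cross_val_score
cross_val_score(lr,test_features,y_test,cv=3)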

array([-6.21817084, -0.08757105, -0.14423259])
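The negative scores are consistent with two properties of this setup: the features keep only the class-1 probability of a three-class problem, and a linear regression is fitted to the labels 0, 1, 2 as if they were numbers. A possible multi-class variant is sketched below. It is a sketch under stated assumptions, not the notebook's method: it keeps every probability column from each first-level model, swaps in `LogisticRegression` as MODEL2, and uses 50 trees with fixed seeds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.3, random_state=42)

# MODEL1: same three model families as above (hyperparameters are assumptions).
base_models = [
    SVC(probability=True, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(),
]

val_blocks, test_blocks = [], []
for model in base_models:
    model.fit(X_train, y_train)
    # Keep all class-probability columns, not just class 1.
    val_blocks.append(model.predict_proba(X_val))
    test_blocks.append(model.predict_proba(X_test))

# Shape: (n_samples, n_models * n_classes)
val_meta = np.hstack(val_blocks)
test_meta = np.hstack(test_blocks)

# MODEL2: a classifier, fitted on validation predictions, scored on the test set.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(val_meta, y_val)
iris_accuracy = meta_model.score(test_meta, y_test)
print("Blended test accuracy on iris:", iris_accuracy)
```

With only 36 validation rows to train MODEL2 and 30 test rows to score it, the accuracy will vary noticeably with the random split.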