# 1. About this Sprint

## The purpose of this Sprint
<li> Understand ensemble learning </li>

## How to learn

We will implement several ensemble learning methods from scratch.

# 2. Ensemble learning

We will implement three types of ensemble learning from scratch, then check the effect of each on a small dataset.

<li> Blending </li>
<li> Bagging </li>
<li> Stacking </li>


## Preparing a small dataset

Prepare the regression dataset that you used before.


[House Prices: Advanced Regression Techniques](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)


Download train.csv and use SalePrice as the objective variable and GrLivArea and YearBuilt as the explanatory variables.


Divide train.csv into 80% for learning (train) and 20% for verification (val).

## scikit-learn

For the individual (single) models, we recommend using a library such as scikit-learn rather than a scratch implementation.


[sklearn.linear_model.LinearRegression — scikit-learn 0.21.3 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)


[sklearn.svm.SVR — scikit-learn 0.21.3 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html)


[sklearn.tree.DecisionTreeRegressor — scikit-learn 0.21.3 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html)

# 3. Blending

## Problem 1: Scratch implementation of blending

Show **at least three examples of a scratch implementation of blending** that are more accurate than a single model. Here, higher accuracy means a lower mean squared error (MSE) on the validation data.

## What is blending?

Blending is a method of independently training N diverse models, weighting the estimation results, and then adding them together. The simplest is to take the average. Various models are created by changing the following conditions.

<li>Techniques (e.g. linear regression, SVM, decision tree, neural network)</li>
<li>Hyperparameters (e.g. SVM kernel type, initial weights)</li>
<li>How the input data is preprocessed (e.g. standardization, logarithmic transformation, PCA)</li>


The important thing is that each model is very different.


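Blending for regression problems is simple enough to implement by hand. scikit-learn also offers `VotingRegressor` (added in version 0.21), which averages the predictions of several regressors.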
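The weighted sum described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the assignment's reference solution: the function name `blend` and the prediction values are made up for the example.

```python
import numpy as np

def blend(predictions, weights=None):
    """Weighted average of per-model predictions.

    predictions: array-like of shape (n_models, n_samples)
    weights: optional array-like of shape (n_models,); equal weights if omitted
    """
    predictions = np.asarray(predictions, dtype=float)
    if weights is None:
        weights = np.ones(predictions.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ predictions

# Hypothetical predictions from three models for four houses
preds = [
    [200000.0, 150000.0, 300000.0, 120000.0],
    [210000.0, 140000.0, 310000.0, 130000.0],
    [190000.0, 160000.0, 290000.0, 110000.0],
]
simple_average = blend(preds)               # every model weighted equally
weighted = blend(preds, weights=[1, 2, 1])  # the second model counts double
```

In practice the rows of `preds` would come from `model.predict(X_val)` for each trained model, and the weights could be chosen by comparing validation MSE.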


<< **Supplement** >>


In the case of a classification problem, a majority vote is taken. Because this is more involved than the regression case, scikit-learn provides `VotingClassifier`.


[sklearn.ensemble.VotingClassifier — scikit-learn 0.21.3 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html)
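As a rough sketch of majority voting with `VotingClassifier`, the example below combines three classifiers with `voting="hard"`. The synthetic dataset and the choice of base classifiers are assumptions made for illustration only; they are not part of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (stand-in, not the housing data)
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# voting="hard" predicts the class chosen by the majority of the estimators
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
val_predictions = voter.predict(X_val)
val_accuracy = voter.score(X_val, y_val)
```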

## Libraries

In [42]:
from numpy import hstack
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import VotingRegressor
import pandas as pd

## Data

In [2]:
data = pd.read_csv("train.csv")

In [3]:
data.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


In [4]:
data.shape

(1460, 81)

In [10]:
X = data[["GrLivArea","YearBuilt"]]
X.head()


Unnamed: 0,GrLivArea,YearBuilt
0,1710,2003
1,1262,1976
2,1786,2001
3,1717,1915
4,2198,2000


In [12]:
Y = data["SalePrice"]
Y.head()

0    208500
1    181500
2    223500
3    140000
4    250000
Name: SalePrice, dtype: int64

In [28]:
X_train, X_test, y_train, y_test = train_test_split(X.to_numpy(), Y.to_numpy(), test_size=0.25, random_state=42)

In [30]:
print("Y",Y.shape)
print("X",X.shape)
print("X_train",X_train.shape)
print("X_test",X_test.shape)
print("y_train",y_train.shape)
print("y_test",y_test.shape)

Y (1460,)
X (1460, 2)
X_train (1095, 2)
X_test (365, 2)
y_train (1095,)
y_test (365,)


In [33]:
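from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# NOTE: fitting on all of X lets validation-set statistics leak into the scaling;
# fitting on X_train alone is the usual practice.
scaler.fit(X)

X_train_trans = scaler.transform(X_train)
X_test_trans = scaler.transform(X_test)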

In [34]:
print("X_train_trans",X_train_trans.shape)
print("X_test_trans",X_test_trans.shape)
print("y_train",y_train.shape)
print("y_test",y_test.shape)

X_train_trans (1095, 2)
X_test_trans (365, 2)
y_train (1095,)
y_test (365,)


### Blending

### Single Models

In [47]:
lr = LinearRegression()
dt = DecisionTreeRegressor()
svr = SVR()
lr.fit(X_train_trans, y_train)
dt.fit(X_train_trans, y_train)
svr.fit(X_train_trans, y_train)
y_pred_lr = lr.predict(X_test_trans)
y_pred_dt = dt.predict(X_test_trans)
y_pred_svr = svr.predict(X_test_trans)
print("Mean square error Linear Regression",mean_squared_error(y_test,y_pred_lr))
print("Mean square error Decision Tree",mean_squared_error(y_test,y_pred_dt))
print("Mean square error SVR",mean_squared_error(y_test,y_pred_svr))

Mean square error Linear Regression 2314465092.7320137
Mean square error Decision Tree 2477516526.54551
Mean square error SVR 7169177845.536933


### Ensemble Models

In [95]:
class Blendding():

    def __init__(self):
        self.models = list()
        self.blender = LinearRegression()
        self.models.append(LinearRegression())
        self.models.append(DecisionTreeRegressor())
        self.models.append(SVR())
    def fit(self, X_train, X_val, y_train, y_val):
        meta_X = list()
        for model in self.models:
            # fit in training set
            model.fit(X_train, y_train)
            # predict on hold out set
            yhat = model.predict(X_val)
            # reshape predictions into a matrix with one column
            yhat = yhat.reshape(len(yhat), 1)
            # store predictions as input for blending
            meta_X.append(yhat)
        # create 2d array from predictions, each set is an input feature
        meta_X = hstack(meta_X)
        self.blender.fit(meta_X, y_val)
    def predict(self, X_test):
        meta_X = list()
        for model in self.models:
            # predict with base model
            yhat = model.predict(X_test)
            # reshape predictions into a matrix with one column
            yhat = yhat.reshape(len(yhat), 1)
            # store prediction
            meta_X.append(yhat)
        # create 2d array from predictions, each set is an input feature
        meta_X = hstack(meta_X)
        # predict
        return self.blender.predict(meta_X)

In [96]:
blending_model = Blendding()
blending_model.fit( X_train_trans, X_test_trans, y_train, y_test)
y_pred_ensemble = blending_model.predict(X_test_trans)
print("Blending Score",mean_squared_error(y_test, y_pred_ensemble))

Blending Score 1869607264.7200263


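The result shows that the ensemble has a smaller mean squared error than any of the single models. Note, however, that the blender here was fit on the same data (`X_test_trans`, `y_test`) that is used for scoring, so this figure is optimistic. Fitting the blender on a separate hold-out set would give a fairer comparison.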
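One way to keep the evaluation data out of the blender's training is a three-way split: one part for the base models, one hold-out part for the blender, and one part for scoring only. The sketch below illustrates that idea under stated assumptions: it uses synthetic data from `make_regression` instead of the housing data, and the helper name `meta_features` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption)
X, y = make_regression(n_samples=600, n_features=2, noise=20.0, random_state=0)

# Three-way split: base-model training / blender hold-out / final evaluation
X_rest, X_eval, y_rest, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

base_models = [LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)]
for model in base_models:
    model.fit(X_fit, y_fit)

def meta_features(models, X):
    # One column of predictions per base model
    return np.column_stack([m.predict(X) for m in models])

# The blender only ever sees the hold-out part
blender = LinearRegression().fit(meta_features(base_models, X_hold), y_hold)

# Scoring uses data that neither the base models nor the blender were fit on
y_pred = blender.predict(meta_features(base_models, X_eval))
mse_blend = mean_squared_error(y_eval, y_pred)
```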

## Problem 2: Scratch implementation of bagging
Please show at least one example where you implement bagging from scratch and it is more accurate than a single model.

## What is bagging?


Bagging creates diversity through the way the input data is selected. N subsets (bootstrap samples) are created by randomly sampling from the training data with replacement, so duplicates are allowed. N models are trained on these subsets and their estimation results are averaged. Unlike blending, every model is given the same weight.


[sklearn.model_selection.train_test_split — scikit-learn 0.21.3 documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)


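Data can be split randomly by using scikit-learn's `train_test_split` with the `shuffle` parameter set to True. Strictly speaking, this draws a random subset without replacement rather than a true bootstrap sample (which allows duplicates), but it can be used to create the varied subsets that bagging needs.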


The part that averages the estimation results is implemented in the same way as in blending.
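For reference, sampling with replacement can be sketched directly with NumPy. This is a minimal illustration with toy arrays; the function name `bootstrap_sample` is made up for the example.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement, so some rows may repeat."""
    n = len(X)
    idx = rng.integers(0, n, size=n)  # indices in [0, n), duplicates allowed
    return X[idx], y[idx], idx

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)  # toy feature matrix with 10 rows
y = np.arange(10)                 # toy targets
X_boot, y_boot, idx = bootstrap_sample(X, y, rng)

# A bootstrap sample usually contains fewer distinct rows than the original
n_unique = len(np.unique(idx))
```

Each of the N models would be fit on its own `X_boot`, `y_boot` pair, and the N predictions averaged with equal weights.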

In [122]:
import random
import numpy as np
class BaggingRegressor:
    def __init__(self,number_model = 200,sample_size = 0.5):
        self.models = [LinearRegression() for i in range(number_model)]
        self.sample_size = sample_size
    def fit(self,X, y):
        X = np.array(X)
        y = np.array(y)
        for model in self.models:
            sample_X , sample_y = self.subsample(X,y)
            model.fit(sample_X,sample_y)
    
    def predict(self,X):
        result = 0
        for model in self.models:
            result += model.predict(X)
        return result / len(self.models)
    # Create a random subsample from the dataset with replacement
    def subsample(self,X,y):
        sample_X = list()
        sample_y = list()
        n_sample = round(len(X) * self.sample_size)
        while len(sample_X) < n_sample:
            index = random.randrange(len(X))
            sample_X.append(X[index])
            sample_y.append(y[index])
        return np.array(sample_X) , np.array(sample_y)

In [136]:
ensamble_model = BaggingRegressor()

In [137]:
ensamble_model.fit(X_train_trans,y_train)

In [138]:
single_model = LinearRegression()

In [139]:
single_model.fit(X_train_trans,y_train)

LinearRegression()

In [159]:
single_model_result = single_model.predict(X_test_trans)

In [160]:
bagging_result = ensamble_model.predict(X_test_trans)

In [167]:
print('Mean Squared Error Linear Regression',mean_squared_error(y_test, single_model_result))

Mean Squared Error Linear Regression 2314465092.7320137


In [168]:
print('Mean Squared Error Bagging:',mean_squared_error(y_test, bagging_result))

Mean Squared Error Bagging: 2304942911.9577136


# Stacking

## Problem 3: Scratch implementation of stacking

Please show at least one example where stacking is implemented from scratch and is more accurate than a single model.

## What is stacking?

The stacking procedure is as follows. Stacking works as long as there are at least a stage $0$ and a stage $1$, so implement those two. Start with values of about $K_0=3$, $M_0=2$.

《**At training time**》

(Stage $0$)

<li> Divide the training data into $K_0$ pieces. </li>
<li> Use $(K_0-1)$ of the pieces as training data and the remaining $1$ piece as estimation data. There are $K_0$ such combinations. </li>
<li> Prepare $K_0$ instances of a model and train each one on a different set of training data. </li>
<li> Feed each trained model the $1$ piece of estimation data it did not see during training and obtain its estimates. (These are called the blend data.) </li>
<li> Prepare $K_0$ instances of a different model and do the same. With $M_0$ models, you obtain $M_0$ sets of blend data. </li>

(Stage $n$)

<li> Treat the blend data of stage $n-1$ as training data with $M_{n-1}$-dimensional features and divide it into $K_n$ pieces. Then proceed as in stage $0$. </li>

(Stage $N$) *Last stage

<li> Train a single model using the $M_{N-1}$ sets of blend data from stage $N-1$ as $M_{N-1}$-dimensional input features. This is the model that makes the final estimate. </li>

《**At estimation time**》
    
(Stage $0$)
    
<li> Input the test data into the $K_0 × M_0$ trained models and obtain $K_0 × M_0$ sets of estimates. Average them along the $K_0$ axis to obtain data with $M_0$-dimensional features. (This is called the blend test.) </li>

(Stage $n$)
    
<li> Input the blend test obtained at stage $n-1$ into the $K_n×M_n$ trained models and obtain $K_n×M_n$ sets of estimates. Average them along the $K_n$ axis to obtain data with $M_n$-dimensional features. (This is called the blend test.) </li>

(Stage $N$) *Last stage
    
<li> Input the blend test obtained at stage $N-1$ into the trained model to obtain the final estimate. </li>
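The procedure above, restricted to stage $0$ plus a final stage, could be sketched as follows. This is one possible scratch implementation under stated assumptions, not a reference solution: the class name `ScratchStacking` is hypothetical, the data is synthetic, and $K_0=3$, $M_0=2$ follow the suggested starting values.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split
from sklearn.tree import DecisionTreeRegressor

class ScratchStacking:
    """Two-stage stacking: a K-fold stage 0 followed by one final model."""

    def __init__(self, base_models, final_model, n_splits=3):
        self.base_models = base_models  # M_0 model prototypes
        self.final_model = final_model  # model of the last stage
        self.n_splits = n_splits        # K_0

    def fit(self, X, y):
        kf = KFold(n_splits=self.n_splits, shuffle=True, random_state=0)
        blend = np.zeros((len(X), len(self.base_models)))
        self.fitted_ = []  # M_0 lists, each holding K_0 fitted models
        for m, proto in enumerate(self.base_models):
            fold_models = []
            for train_idx, est_idx in kf.split(X):
                model = clone(proto).fit(X[train_idx], y[train_idx])
                # Predictions on the unused piece become the blend data
                blend[est_idx, m] = model.predict(X[est_idx])
                fold_models.append(model)
            self.fitted_.append(fold_models)
        self.final_model.fit(blend, y)
        return self

    def predict(self, X):
        # Average over the K_0 axis to get M_0-dimensional blend-test features
        blend_test = np.column_stack([
            np.mean([model.predict(X) for model in fold_models], axis=0)
            for fold_models in self.fitted_
        ])
        return self.final_model.predict(blend_test)

# Synthetic stand-in for the housing data (assumption)
X, y = make_regression(n_samples=300, n_features=2, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

stack = ScratchStacking(
    base_models=[LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)],
    final_model=LinearRegression(),
    n_splits=3,
)
y_pred = stack.fit(X_train, y_train).predict(X_val)
```

Adding intermediate stages would mean repeating the K-fold step on `blend` instead of fitting the final model directly.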

In [170]:
from numpy import mean
from numpy import std
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import StackingRegressor
from matplotlib import pyplot

In [194]:
def get_models():
    models = dict()
    models['lr'] = LinearRegression()
    models['dt'] = DecisionTreeRegressor()
    models['svm'] = SVR()
    models['stacking'] = get_stacking()
    return models

def get_stacking():
    # define the base models
    level0 = list()
    level0.append(('lr', LinearRegression()))
    level0.append(('dt', DecisionTreeRegressor()))
    level0.append(('svm', SVR()))
    # define meta learner model
    level1 = LinearRegression()
    # define the stacking ensemble
    model = StackingRegressor(estimators=level0, final_estimator=level1, cv=5)
    return model

def evaluate_model(model, X, y):
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=cv, n_jobs=-1, error_score='raise')
    return scores

In [195]:
models = get_models()

In [196]:
results, names = list(), list()

In [198]:
for name, model in models.items():
    scores = evaluate_model(model, X_test_trans, y_test)
    results.append(scores)
    names.append(name)
    print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))

>lr -2362134345.783 (1062051104.700)
>dt -3833171660.947 (1784754247.536)
>svm -7563001526.929 (3888030235.055)
>stacking -2329858661.110 (1039956256.723)
