# Advanced Ensemble Learning
Ensemble learning techniques are used to improve classification/regressions in several ways. For a comprehensive guide, please see [https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/](here).   

In this notebook, I am going to practice some of the advanced techniques like 
1. Stacking
2. Blending
3. Bagging
4. Boosting

### Importing libraries

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
    
import xgboost as xgb

### Loading and manipulating data

In [2]:
# Load dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
data = pd.read_csv('data/iris.csv', names=names)
X = data[['sepal-length', 'sepal-width', 'petal-length', 'petal-width']]
Y = data[['class']]
x_train, x_test, y_train, y_test = train_test_split(X,Y, test_size=0.3, random_state=20)

# Normalizing the data
#scaler = StandardScaler()
#scaler.fit(x_train)
#x_train = scaler.transform(x_train)
#x_test = scaler.transform(x_test)

### Advanced Ensembling Techniques
Here we implement some advanced ensembling techniques:
1. **Stacking** - [more here](http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/). Using base models, does predictions on sections of training data (stratified), and finally the whole data. The base models are combined by using Logistic regression on the predictions of all base models.
2. **Blending** - The data is split into train, validation, and testing. Base models are fit onto the trainig set, and predictions are made on the validation and test sets. Final makes fits on the validation set, and predicts on the training set.
Bagging-type
3. **Bagging** or Bootstrap Aggregating - Different samples are taken with replacement, and several models are applied on it, and a base model combines all the answers together.
4. **Random Forest** - Like bagging, it bootstraps data points, but also selects some features for every model and finally combines all the predictions with a decision tree.
Boosting-type
5. **AdaBoost** - Models are built on a subset of data and erros are calculated. In the next model, data points that are incorrectly predicted are given more weight. This process is repeated until the error function becomes constant.
6. 

### Stacking

In [3]:
def Stacking(model,train,y,test,n_fold):
    """Performs stacking on given data"""
    folds = StratifiedKFold(n_splits=n_fold,random_state=1)
    test_pred = []
    train_pred = []
    for train_indices, val_indices in folds.split(train,y.values):
        x_train, x_val = train.iloc[train_indices], train.iloc[val_indices]
        y_train, y_val = y.iloc[train_indices], y.iloc[val_indices]

        model.fit(X=x_train,y=y_train)
        train_pred = np.append(train_pred,model.predict(x_val))
    model.fit(train, y.values); model.predict(test);
    test_pred = np.append(test_pred,model.predict(test))
    return test_pred, train_pred


In [4]:
model1 = DecisionTreeClassifier(random_state=1)

test_pred1, train_pred1 =Stacking(model=model1,n_fold=5, train=x_train,test=x_test,y=y_train)

train_pred1 = pd.DataFrame(train_pred1)
test_pred1 = pd.DataFrame(test_pred1)

In [5]:
model2 = KNeighborsClassifier(2)

test_pred2 ,train_pred2=Stacking(model=model2,n_fold=5,train=x_train,test=x_test,y=y_train)

train_pred2 = pd.DataFrame(train_pred2)
test_pred2 = pd.DataFrame(test_pred2)

  # Remove the CWD from sys.path while we load stuff.
  if sys.path[0] == '':


In [6]:
df = pd.concat([train_pred1, train_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)

# Since this a categorical variable have to one hot encode
enc = OneHotEncoder()
enc.fit(df_test)
enc_df_test = enc.transform(df_test)
enc_df = enc.transform(df)


model = LogisticRegression(random_state=1)
model.fit(enc_df,y_train)
print "Score of stacked model = %8.5f" %(model.score(enc_df_test, y_test))

Score of stacked model =  0.88889


  y = column_or_1d(y, warn=True)


### Blending

In [7]:
# Test-train split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
# Train-validation split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=1)

# Fittin first model
model1 = DecisionTreeClassifier(random_state=1)
model1.fit(x_train, y_train)
val_pred1 = model1.predict(x_val)
test_pred1 = model1.predict(x_test)
val_pred1 = pd.DataFrame(val_pred1)
test_pred1 = pd.DataFrame(test_pred1)

# Fitting second model
model2 = KNeighborsClassifier(2)
model2.fit(x_train,y_train)
val_pred2 = model2.predict(x_val)
test_pred2 = model2.predict(x_test)
val_pred2 = pd.DataFrame(val_pred2)
test_pred2 = pd.DataFrame(test_pred2)

  app.launch_new_instance()


In [8]:
# Combining models
df = pd.concat([val_pred1, val_pred2], axis=1)
df_test = pd.concat([test_pred1, test_pred2], axis=1)

# Since this a categorical variable have to one hot encode
enc = OneHotEncoder()
enc.fit(df_test)
enc_df_test = enc.transform(df_test)
enc_df = enc.transform(df)


model = LogisticRegression(random_state=1)
model.fit(enc_df,y_val)
print "Score of blended model = %8.5f" %(model.score(enc_df_test, y_test))

Score of blended model =  0.96667


### Bagging

In [9]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
# Choosing with replacement, and fitting decision tree on each dataset
# Final answer by voting
model = BaggingClassifier(DecisionTreeClassifier(random_state=1))
model.fit(x_train, y_train)
print "Score of bagging model with decision trees = %8.5f" %(model.score(x_test,y_test))

Score of bagging model with decision trees =  0.96667


  y = column_or_1d(y, warn=True)


In [10]:
model = BaggingClassifier(KNeighborsClassifier(2))
model.fit(x_train, y_train)
print "Score of bagging model with kNN = %8.5f" %(model.score(x_test,y_test))

Score of bagging model with kNN =  1.00000


### Random Forest

In [11]:
# Choosing with replacement, and fitting decision tree on each dataset
# Final answer by voting
model = RandomForestClassifier()
model.fit(x_train, y_train)
print "Score of random forest = %8.5f" %(model.score(x_test,y_test))

  after removing the cwd from sys.path.


Score of random forest =  0.96667


### AdaBoost

In [12]:
# Decision trees as base estmators
model = AdaBoostClassifier(random_state=1)
model.fit(x_train, y_train)
print "Score of AdaBoost = %8.5f" %(model.score(x_test,y_test))

Score of AdaBoost =  0.96667


### Gradient Boosting 

In [13]:
model= GradientBoostingClassifier(learning_rate=0.01,random_state=1)
model.fit(x_train, y_train)
print "Score of Gradient Boosting = %8.5f" %(model.score(x_test,y_test))

Score of Gradient Boosting =  0.96667


## XGBoost
Boosting with regularization

In [14]:
model=xgb.XGBClassifier(random_state=1,learning_rate=0.01)
model.fit(x_train, y_train)
print "Score of XGBoosting = %8.5f" %(model.score(x_test,y_test))

Score of XGBoosting =  0.96667


  y = column_or_1d(y, warn=True)
  y = column_or_1d(y, warn=True)


**Conclusion** - For the toy Iris dataset, we used different, advanced ensemble techniques. Most prediction scores were 0.96 or greater, so all models performed really well. Notably, stacked models performed poorly (0.88) and Bagging classifer with NearestNeighborsClassifier worked the best.
