# Ensemble Learning Exercise

## 1. Implementing Random Forest From Scratch (30 points)
In this exercise you will implement a simple version of a Random Forest regressor from scratch. Your model will handle **continuous input and output**.

* Complete the skeleton class below; `TreeEnsemble` should use scikit-learn's `DecisionTreeRegressor` as its base model. The constructor arguments are:
  - `X` is a matrix of data values (rows are samples, columns are attributes)
  - `y` is a vector of corresponding target values
  - `n_trees` is the number of trees to create
  - `sample_sz` is the number of samples used to train each tree in the forest (choose the samples randomly, with or without replacement)
  - `n_features` is the number of features to sample. This can be a natural number > 0, or a ratio of the features given as a number in the range (0, 1]
  - `min_leaf` is the minimal number of samples in each leaf node of each tree in the forest
  

* The `predict` function returns the mean of the individual trees' predictions. The result is a vector of predictions with one entry per row of `X`.

* The `oob_mse` function computes the mean squared error over all **out-of-bag (OOB)** samples. That is, for each sample, calculate the squared error using predictions only from the trees whose bootstrap sample does not contain that sample, then average this score over all samples. See: [OOB Errors for Random Forests](https://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html).

* To check your random forest implementation, use the Boston dataset (`from sklearn.datasets import load_boston`)

  - Use the following to estimate the best hyperparameters for your model
```
for n in [1,5,10,20,50,100]:
  for sz in [50,100,300,500]:
    for min_leaf in [1,5]:
      forest = TreeEnsemble(X, y, n, sz, n_features=1.0, min_leaf=min_leaf)  # n_features=1.0 (use all features) is an assumed value
      mse = forest.oob_mse()
      print("n_trees:{0}, sz:{1}, min_leaf:{2} --- oob mse: {3}".format(n, sz, min_leaf, mse))
```
  
  - Using your chosen hyperparameters as a final model, plot the predictions vs. true values of all the samples in the training set. Use something like:
  ```
  y_hat = forest.predict(X)  # forest is the chosen model
  plt.scatter(y_hat, y)
  ```
 


In [1]:
class TreeEnsemble():
    def __init__(self, X, y, n_trees, sample_sz, n_features, min_leaf):
        pass

    def predict(self, X):
        pass

    def oob_mse(self):
        pass
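
One possible shape for such a class is sketched below. This is a minimal illustration rather than a reference solution: the class name `TreeEnsembleSketch`, the `seed` argument, and the synthetic data are assumptions, and `n_features` is simply forwarded to `DecisionTreeRegressor`'s `max_features` (features are sub-sampled at each split), which is only one way to read the specification above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class TreeEnsembleSketch:
    """Illustrative bagged ensemble of regression trees (hypothetical helper)."""

    def __init__(self, X, y, n_trees, sample_sz, n_features, min_leaf, seed=0):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(seed)
        self.trees, self.rows = [], []
        for _ in range(n_trees):
            # bootstrap: draw sample_sz row indices with replacement
            rows = rng.choice(len(self.y), size=sample_sz, replace=True)
            tree = DecisionTreeRegressor(
                min_samples_leaf=min_leaf,
                max_features=n_features,  # int = count, float in (0, 1] = ratio
                random_state=int(rng.integers(1 << 31)),
            )
            tree.fit(self.X[rows], self.y[rows])
            self.trees.append(tree)
            self.rows.append(rows)

    def predict(self, X):
        # average the predictions of all trees
        return np.mean([tree.predict(X) for tree in self.trees], axis=0)

    def oob_mse(self):
        n = len(self.y)
        pred_sum, pred_cnt = np.zeros(n), np.zeros(n)
        for tree, rows in zip(self.trees, self.rows):
            oob = np.ones(n, dtype=bool)
            oob[rows] = False  # rows this tree never saw during training
            if oob.any():
                pred_sum[oob] += tree.predict(self.X[oob])
                pred_cnt[oob] += 1
        seen = pred_cnt > 0  # samples that are out of bag for at least one tree
        if not seen.any():
            return float("nan")
        oob_pred = pred_sum[seen] / pred_cnt[seen]
        return float(np.mean((oob_pred - self.y[seen]) ** 2))


# Synthetic data, used only to exercise the sketch
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(200, 3))
y_demo = X_demo[:, 0] ** 2 + 0.1 * rng.normal(size=200)
forest = TreeEnsembleSketch(X_demo, y_demo, n_trees=20, sample_sz=150,
                            n_features=1.0, min_leaf=5)
print("oob mse:", forest.oob_mse())
```

Samples that land in every tree's bootstrap sample have no out-of-bag prediction; the sketch skips them, which is a design choice rather than a requirement of the exercise.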



## 2. Implementing AdaBoost From Scratch (30 points)


*   Implement the AdaBoost algorithm for a classification task. Your `AdaBoost` class should receive a method for creating a weak learner that has `fit` and `predict` methods (**hint**: you can simulate re-weighting of the samples by an appropriate re-sampling of the train set).
*   Use your model to find a strong classifier on the sample set given below, using $n$ weak learners:
    - For the base weak learners, use a ***linear*** SVM classifier (use `LinearSVC` with the default parameters).
    - Split the sample set into train and test sets.
    - Plot the final decision boundary of your classifier for $n\in \{1, 2, 3, 5, 10, 50\}$, and visualize the final-iteration weights of the samples in those plots.
    - How does the overall train set accuracy change with $n$?
    - Does your model start to overfit at some point?
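
The re-sampling hint can be sketched as follows. This is a rough outline under stated assumptions, not the expected solution: the class name `AdaBoostSketch`, its `seed` argument, and the `sample_weights_` attribute are hypothetical, labels are assumed to be 0/1 and are mapped to -1/+1 internally, and the learner weight uses the common `0.5 * log((1 - err) / err)` form.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


class AdaBoostSketch:
    """Illustrative discrete AdaBoost for labels in {0, 1} (hypothetical helper)."""

    def __init__(self, make_learner, n_learners, seed=0):
        self.make_learner = make_learner  # zero-argument factory; result needs fit/predict
        self.n_learners = n_learners
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        X = np.asarray(X)
        y_pm = 2 * np.asarray(y) - 1      # map {0, 1} to {-1, +1}
        n = len(y_pm)
        w = np.full(n, 1.0 / n)           # start from uniform sample weights
        self.learners, self.alphas = [], []
        for _ in range(self.n_learners):
            # simulate re-weighting by drawing a weighted bootstrap sample
            idx = self.rng.choice(n, size=n, replace=True, p=w)
            if np.unique(y_pm[idx]).size < 2:
                continue                  # degenerate resample with a single class
            learner = self.make_learner()
            learner.fit(X[idx], y_pm[idx])
            pred = learner.predict(X)
            err = np.clip(w[pred != y_pm].sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)
            w = w * np.exp(-alpha * y_pm * pred)  # mistakes gain weight, hits lose it
            w /= w.sum()
            self.learners.append(learner)
            self.alphas.append(alpha)
        self.sample_weights_ = w          # final-iteration weights, e.g. for plotting
        return self

    def predict(self, X):
        X = np.asarray(X)
        score = np.zeros(len(X))
        for alpha, learner in zip(self.alphas, self.learners):
            score += alpha * learner.predict(X)
        return (score > 0).astype(int)    # back to {0, 1}


X_demo, y_demo = make_circles(n_samples=1500, noise=0.2, random_state=101, factor=0.5)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
for n in (1, 10, 50):
    model = AdaBoostSketch(LinearSVC, n).fit(X_tr, y_tr)
    print(n, (model.predict(X_tr) == y_tr).mean(), (model.predict(X_te) == y_te).mean())
```

Passing the class `LinearSVC` itself works as the factory because calling it with no arguments builds a learner with default parameters.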



In [2]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.5, contour=True):
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if contour:
        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", alpha=alpha)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", alpha=alpha)
    plt.axis(axes)
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.ylabel(r"$x_2$", fontsize=18, rotation=0)

In [3]:
# Uses X and y from the make_circles cell (In [4]); run that cell first.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
m = len(X_train)

fig, axes = plt.subplots(ncols=2, figsize=(10,4), sharey=True)
for subplot, learning_rate in ((0, 1), (1, 0.5)):
    sample_weights = np.ones(m) / m
    plt.sca(axes[subplot])
    for i in range(5):
        svm_clf = SVC(kernel="rbf", C=0.2, gamma=0.6, random_state=42)
        svm_clf.fit(X_train, y_train, sample_weight=sample_weights * m)
        y_pred = svm_clf.predict(X_train)

        r = sample_weights[y_pred != y_train].sum() / sample_weights.sum() # weighted error rate
        alpha = learning_rate * np.log((1 - r) / r) # weight of this learner
        sample_weights[y_pred != y_train] *= np.exp(alpha) # boost the misclassified samples
        sample_weights /= sample_weights.sum() # normalization step

        plot_decision_boundary(svm_clf, X, y, alpha=0.2)
        plt.title("learning_rate = {}".format(learning_rate), fontsize=16)
    if subplot == 0:
        plt.text(-0.75, -0.95, "1", fontsize=14)
        plt.text(-1.05, -0.95, "2", fontsize=14)
        plt.text(1.0, -0.95, "3", fontsize=14)
        plt.text(-1.45, -0.5, "4", fontsize=14)
        plt.text(1.36,  -0.95, "5", fontsize=14)
    else:
        plt.ylabel("")

# save_fig("boosting_plot")  # save_fig is not defined in this notebook
plt.show()

In [4]:
from sklearn.datasets import make_circles
from matplotlib import pyplot
from pandas import DataFrame

# generate 2d classification dataset
X, y = make_circles(n_samples=1500, noise=0.2, random_state=101, factor=0.5)

# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

<Figure size 640x480 with 1 Axes>

In [5]:
#importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from random import sample
import random
from sklearn.metrics import confusion_matrix
from sklearn import tree
from math import log,exp
from sklearn.svm import SVC
import seaborn as sns
sns.set_style('white')

In [6]:
learning_rate = 0.5
weight_list = []


In [7]:
circles = pd.DataFrame({'Column1': X[:, 0], 'Column2': X[:, 1]})
circles['Label'] = y

In [8]:
circles['prob1'] = 1/(circles.shape[0])

In [9]:
circles

Unnamed: 0,Column1,Column2,Label,prob1
0,-0.495799,0.323412,1,0.000667
1,-0.419618,-0.318647,1,0.000667
2,-0.068761,-0.124953,1,0.000667
3,0.517004,0.383915,1,0.000667
4,-0.076649,-0.376912,1,0.000667
...,...,...,...,...
1495,0.447559,-0.222795,1,0.000667
1496,0.854887,0.887338,0,0.000667
1497,0.487560,-0.980803,0,0.000667
1498,0.064621,-0.889950,1,0.000667


In [11]:
circles1.head()

Unnamed: 0,Column1,Column2,Label,prob1
43,0.188413,-0.610976,1,0.000667
996,-0.110184,-0.463169,1,0.000667
83,-1.030785,-0.329508,0,0.000667
1148,-0.449383,0.28648,1,0.000667
916,0.14109,-0.988688,0,0.000667


In [10]:
# weighted sample with replacement (uniform in this round, since prob1 is constant)
# note: random.seed() does not seed DataFrame.sample; pass random_state to sample() for reproducibility
random.seed(10)
circles1 = circles.sample(len(circles), replace = True, weights = circles['prob1'])
# features and labels of the resampled set
X_train = circles1.iloc[0:len(circles),0:2]
y_train = circles1.iloc[0:len(circles),2]
# RBF-kernel SVC; as written it is fit on the full (X, y), so X_train/y_train are not used
classifier = SVC(kernel = 'rbf', random_state = 0)
clf = classifier.fit(X, y)
# predictions on the full data set
y_pred = classifier.predict(circles.iloc[0:len(circles),0:2])
circles['pred1'] = y_pred
# misclassified = 0 if the label and prediction agree, 1 otherwise
circles.loc[circles['Label'] == circles.pred1, 'misclassified'] = 0
circles.loc[circles['Label'] != circles.pred1, 'misclassified'] = 1
# weighted error of this round
error = sum(circles['misclassified'] * circles['prob1'])
# learner weight (alpha) for this round
weight = learning_rate*log((1-error)/error)
weight_list.append(weight)
# weight update; the exp(-alpha * y * h) form assumes labels in {-1, +1}, while Label here is in {0, 1}
new_weight = circles['prob1']*np.exp(-1*weight*circles['Label']*circles['pred1'])
# normalize so that the weights sum to 1
normalized_weight = new_weight/sum(new_weight)
circles['prob2'] = round(normalized_weight,4)
circles.head(10)



Unnamed: 0,Column1,Column2,Label,prob1,pred1,misclassified,prob2
0,-0.495799,0.323412,1,0.000667,1,0.0,0.0003
1,-0.419618,-0.318647,1,0.000667,1,0.0,0.0003
2,-0.068761,-0.124953,1,0.000667,1,0.0,0.0003
3,0.517004,0.383915,1,0.000667,1,0.0,0.0003
4,-0.076649,-0.376912,1,0.000667,1,0.0,0.0003
5,-0.161664,-0.439531,1,0.000667,1,0.0,0.0003
6,-0.252313,0.526283,1,0.000667,1,0.0,0.0003
7,-0.550185,0.263274,1,0.000667,1,0.0,0.0003
8,0.329312,-0.043181,1,0.000667,1,0.0,0.0003
9,-0.571568,1.022087,0,0.000667,0,0.0,0.0009


In [12]:
l = str(2)   # index of the current boosting round, kept as a string for column names
M = 2        # number of additional rounds
for m in range(M):
    num_pred = "pred" + l
    num_misclass = "misclassified" + l
    num_prob = "prob" + l
    random.seed(20)  # note: does not seed DataFrame.sample
    # weighted resample using the current round's weights
    circles_temp = circles.sample(len(circles), replace = True, weights = circles[num_prob])
    circles_temp = circles_temp.iloc[:,0:3]
    X_train = circles_temp.iloc[0:len(circles),0:2]
    y_train = circles_temp.iloc[0:len(circles),2]

    # fit on the full (X, y); the resampled X_train/y_train are not used
    classifier = SVC(kernel = 'rbf', random_state = 0)
    clf = classifier.fit(X, y)

    y_pred = classifier.predict(circles.iloc[0:len(circles),0:2])
    # store this round's predictions in a new column (pred2, pred3, ...)
    circles[num_pred] = y_pred

    # misclassified flag: 0 if label and prediction agree, 1 otherwise
    circles.loc[circles['Label'] == circles[num_pred], num_misclass] = 0
    circles.loc[circles['Label'] != circles[num_pred], num_misclass] = 1

    # weighted error of this round
    error = sum(circles[num_misclass] * circles[num_prob])

    # learner weight (alpha) for this round
    weight = learning_rate*log((1-error)/error)
    weight_list.append(weight)
    print(weight)

    # update and normalize the sample weights for the next round
    new_weight = circles[num_prob]*np.exp(-1*weight*circles['Label']*circles[num_pred])
    normalized_weight = new_weight/sum(new_weight)
    l = str(int(l)+1)
    num_prob_p = "prob" + l
    circles[num_prob_p] = round(normalized_weight,4)
    


0.9172629161118349
0.7988618488389417




In [14]:
circles_temp

Unnamed: 0,Column1,Column2,Label
1306,0.046081,-0.531297,1
981,1.052127,0.563368,0
311,-1.037654,0.447629,0
509,-0.038964,-1.005537,0
78,0.329079,0.140998,1
...,...,...,...
911,-0.874770,0.549524,0
25,0.379233,-1.243899,0
766,1.032952,0.604765,0
512,0.146792,-1.001921,0


In [None]:
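# final round: after the loop above, l already holds the next round index ("4"),
# whereas m + 1 would be 2 and overwrite the pred2/misclassified2 columns
num_pred = "pred" + str(l)
num_misclass = "misclassified" + str(l)
num_prob = "prob" + str(l)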
random.seed(20)
circles_final = circles.sample(len(circles), replace = True, weights = circles[num_prob])
circles_final = circles_final.iloc[:,0:3]
X_train = circles_final.iloc[0:len(circles),0:2]
y_train = circles_final.iloc[0:len(circles),2]

classifier = SVC(kernel = 'rbf', random_state = 0)
clf = classifier.fit(X, y)

#adding a column pred4 after the fourth round of boosting
y_pred = classifier.predict(circles.iloc[0:len(circles),0:2])
circles[num_pred] = y_pred

# #plotting tree for round 4 boosting
# tree.plot_tree(clf)

#adding a field misclassified4
circles.loc[circles['Label'] == circles[num_pred], num_misclass] = 0
circles.loc[circles['Label'] != circles[num_pred], num_misclass] = 1

#error calculation
error = sum(circles[num_misclass] * circles[num_prob])# /len(circles)
error

# calculation of performance (weight)
weight = learning_rate*log((1-error)/error)
weight_list.append(weight)

In [None]:
# weighted vote over the four rounds; map the 0/1 predictions to -1/+1 so that the sign is meaningful
y_pred_final = sum(w * (2 * circles['pred' + str(i + 1)] - 1) for i, w in enumerate(weight_list[:4]))

In [None]:
np.sign(list(y_pred_final))


In [None]:
# map the sign of the vote back to the 0/1 labels
circles['final_pred'] = (y_pred_final > 0).astype(int)

In [None]:
#Confusion matrix
c=confusion_matrix(circles['Label'], circles['final_pred'])
c

In [None]:
from matplotlib.colors import ListedColormap
X_set, y_set = X, y
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 0, stop = X_set[:, 0].max() + 0, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 0, stop = X_set[:, 1].max() + 0, step = 0.25))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Training set)')
plt.xlabel('Column1')
plt.ylabel('Column2')
plt.legend()
plt.show()

## 3. Boosting Trees from Scratch (40 points)
* Use scikit-learn's `DecisionTreeRegressor` (again :) with `max_depth = 1` (stumps) to write an L2Boost model that minimizes the L2 (squared) loss iteration by iteration.
Reminder: in each step, build a decision tree that minimizes the error between the true label and the accumulated sum of the previous steps' predictions.
![alt text](https://explained.ai/gradient-boosting/images/latex-321A7951E78381FB73D2A6874916134D.svg)
* Use the Boston dataset to plot the MSE as a function of the number of trees for a logspace of `n_trees` up to 1,000. What is the optimal value of `n_trees`? Of the learning rate?
* Compare the performance with a deep `DecisionTreeRegressor` (find the optimal `max_depth`). Which wins?
* Add an early-stopping mechanism to the GBTL2 model that uses a validation set to detect overfitting.
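
The residual-fitting loop described in the reminder, together with validation-based early stopping, might look roughly like this. It is a sketch under assumptions, not the expected solution: the class name `L2BoostSketch`, the `patience` argument, and the synthetic sine data are illustrative choices that do not come from the exercise.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


class L2BoostSketch:
    """Illustrative L2 boosting with depth-1 trees (hypothetical helper)."""

    def __init__(self, n_trees=100, learning_rate=0.1, patience=None):
        self.n_trees = n_trees
        self.learning_rate = learning_rate
        self.patience = patience  # stop after this many rounds without improvement

    def fit(self, X, y, X_val=None, y_val=None):
        y = np.asarray(y, dtype=float)
        self.f0 = float(np.mean(y))           # initial constant model
        self.trees, self.val_mse = [], []
        pred = np.full(len(y), self.f0)
        if X_val is not None:
            val_pred = np.full(len(y_val), self.f0)
        best, best_n, bad = np.inf, 0, 0
        for _ in range(self.n_trees):
            # for squared loss, the negative gradient is the residual
            stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)
            self.trees.append(stump)
            pred += self.learning_rate * stump.predict(X)
            if X_val is None:
                continue
            val_pred += self.learning_rate * stump.predict(X_val)
            mse = float(np.mean((y_val - val_pred) ** 2))
            self.val_mse.append(mse)
            if mse < best:
                best, best_n, bad = mse, len(self.trees), 0
            else:
                bad += 1
                if self.patience is not None and bad >= self.patience:
                    break                     # early stopping
        if X_val is not None:
            self.trees = self.trees[:best_n]  # keep the best prefix of stumps
        return self

    def predict(self, X):
        out = np.full(len(X), self.f0)
        for tree in self.trees:
            out += self.learning_rate * tree.predict(X)
        return out


# Synthetic 1-D data, used only to exercise the sketch
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + 0.2 * rng.normal(size=400)
X_tr, y_tr, X_va, y_va = X_demo[:300], y_demo[:300], X_demo[300:], y_demo[300:]

model = L2BoostSketch(n_trees=500, learning_rate=0.1, patience=20)
model.fit(X_tr, y_tr, X_va, y_va)
print(len(model.trees), np.mean((model.predict(X_va) - y_va) ** 2))
```

Truncating to the best prefix means the returned model is the one with the lowest validation MSE seen so far, not the one at the point where training stopped.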

In [None]:
def plot_predictions(regressors, X, y, axes, label=None, style="r-", data_style="b.", data_label=None):
    x1 = np.linspace(axes[0], axes[1], 500)
    y_pred = sum(regressor.predict(x1.reshape(-1, 1)) for regressor in regressors)
    plt.plot(X[:, 0], y, data_style, label=data_label)
    plt.plot(x1, y_pred, style, linewidth=2, label=label)
    if label or data_label:
        plt.legend(loc="upper center", fontsize=16)
    plt.axis(axes)

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor


# load_boston was removed in scikit-learn 1.2; this cell needs an older release
from sklearn.datasets import load_boston
X,y = load_boston(return_X_y=True)

In [None]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=49)

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120, random_state=42)
gbrt.fit(X_train, y_train)

errors = [mean_squared_error(y_val, y_pred)
          for y_pred in gbrt.staged_predict(X_val)]
bst_n_estimators = np.argmin(errors) + 1

gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators, random_state=42)
gbrt_best.fit(X_train, y_train)

In [None]:
min_error = np.min(errors)


In [None]:
plt.figure(figsize=(10, 4))

plt.subplot(121)
plt.plot(np.arange(1, len(errors) + 1), errors, "b.-")  # x-axis is the number of trees (1-based)
plt.plot([bst_n_estimators, bst_n_estimators], [0, min_error], "k--")
plt.plot([0, 120], [min_error, min_error], "k--")
plt.plot(bst_n_estimators, min_error, "ko")
plt.text(bst_n_estimators, min_error*1.2, "Minimum", ha="center", fontsize=14)
plt.axis([0, 120, 0, max(errors)])  # scale the y-axis to the observed MSE
plt.xlabel("Number of trees")
plt.ylabel("Error", fontsize=16)
plt.title("Validation error", fontsize=14)

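plt.subplot(122)
# plot_predictions assumes a single input feature, so for the 13-feature Boston data
# show predicted vs. true values on the validation set instead
plt.scatter(y_val, gbrt_best.predict(X_val), alpha=0.5)
plt.title("Best model (%d trees)" % bst_n_estimators, fontsize=14)
plt.ylabel("Predicted $y$", fontsize=16)
plt.xlabel("True $y$", fontsize=16)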

# save_fig("early_stopping_gbrt_plot")
plt.show()

In [None]:
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True, random_state=42)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(X_train, y_train)
    y_pred = gbrt.predict(X_val)
    val_error = mean_squared_error(y_val, y_pred)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # early stopping
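
scikit-learn's `GradientBoostingRegressor` also has built-in early stopping through its `n_iter_no_change`, `validation_fraction`, and `tol` parameters. The sketch below is an alternative to the manual `warm_start` loop, shown on synthetic data that is assumed purely for illustration; the variable name `gbrt_es` is likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data, used only to exercise the sketch
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(500, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.1 * rng.normal(size=500)

gbrt_es = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=500,         # upper bound on the number of trees
    validation_fraction=0.2,  # share of the training data held out internally
    n_iter_no_change=5,       # stop after 5 rounds without improvement
    tol=1e-4,                 # minimum improvement that counts
    random_state=42,
)
gbrt_es.fit(X_demo, y_demo)

# number of trees actually fitted before stopping
print(gbrt_es.n_estimators_)
```

Unlike the loop above, this variant splits off its own validation data from the set passed to `fit`, so a separate `X_val` is not involved in the stopping decision.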

In [None]:
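plt.figure(figsize=(6, 6))

# plot_predictions assumes a single input feature, so for the 13-feature Boston data
# compare the early-stopped model's predictions with the true validation targets
plt.scatter(y_val, gbrt.predict(X_val), alpha=0.5)
plt.xlabel("True $y$", fontsize=16)
plt.ylabel("Predicted $y$", fontsize=16)
plt.title("Early-stopped model (%d trees)" % gbrt.n_estimators, fontsize=16)

# save_fig("gradient_boosting_plot")  # save_fig is not defined in this notebook
plt.show()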