# Ensemble Learning and Random Forests

A group of predictors is called an ensemble; thus, this technique is called **Ensemble Learning**, and an Ensemble Learning algorithm is called an **Ensemble Method**.

## Voting Classifiers

Suppose you have trained a few classifiers, each one achieving about 80% accuracy. You may have a Logistic Regression classifier, an SVM classifier, a Random Forest classifier, a K-Nearest Neighbors classifier, and perhaps a few more.

In [1]:
import numpy as np
heads_proba = 0.51  # probability of heads for a slightly biased coin
coin_tosses = (np.random.rand(10000, 10) < heads_proba).astype(np.int32)  # 10,000 tosses (rows) x 10 independent series (columns); 1 = heads
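The coin-toss cell is the law-of-large-numbers argument behind voting: a predictor that is only slightly better than chance becomes reliable when many independent votes are pooled. Below is a minimal sketch of that argument, assuming the same 51% heads probability. It tracks the running ratio of heads in each series, which should drift toward 0.51 as tosses accumulate. The seed and the name `cumulative_heads_ratio` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
heads_proba = 0.51
n_tosses, n_series = 10_000, 10

# 1 = heads, 0 = tails; one column per independent series of tosses
coin_tosses = (rng.random((n_tosses, n_series)) < heads_proba).astype(np.int32)

# Running fraction of heads after 1, 2, ..., n_tosses tosses, computed per series
cumulative_heads_ratio = np.cumsum(coin_tosses, axis=0) / np.arange(1, n_tosses + 1).reshape(-1, 1)

# Ratios after 10, 100 and 10,000 tosses: the spread shrinks as tosses accumulate
print(cumulative_heads_ratio[[9, 99, 9_999]].round(3))
```

The same reasoning only holds for ensembles when the predictors make reasonably independent errors. That is why the voting examples in this notebook mix different algorithms.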

In [2]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# train_test_split divides the data into a training set and a test set.

X, y = make_moons(n_samples=500, noise=.30,random_state=42)


X_train, X_test,y_train,y_test = train_test_split(X ,y, random_state=42)

# random_state=42 makes the train/test split reproducible.


In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators = [('lr',log_clf), ('rf',rnd_clf),('svc',svm_clf)],
    voting = 'hard'
)

voting_clf.fit(X_train,y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()), ('svc', SVC())])
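With `voting='hard'`, the ensemble predicts the class that receives the most votes from its members. The minimal sketch below recomputes that majority vote by hand. It assumes the same moons split and adds a fixed `random_state` on the forest. It uses the fitted copies stored in `voting_clf.estimators_` and checks that the result matches `voting_clf.predict()`. The name `manual_pred` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC())],
    voting="hard",
)
voting_clf.fit(X_train, y_train)

# Class predictions of each fitted member, shape (n_members, n_test_samples)
member_preds = np.array([est.predict(X_test) for est in voting_clf.estimators_])

# Majority vote: with 3 voters and labels {0, 1}, class 1 wins when at least 2 members pick it
manual_pred = (member_preds.sum(axis=0) >= 2).astype(int)

print((manual_pred == voting_clf.predict(X_test)).all())
```

With an odd number of voters and two classes, a tie cannot occur, so the simple `>= 2` rule is enough here.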

Let us look at each classifier's accuracy on the test set:

In [4]:
from sklearn.metrics import accuracy_score
for clf in (log_clf,rnd_clf,svm_clf,voting_clf):
    clf.fit(X_train,y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__,accuracy_score(y_test,y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.888
SVC 0.896
VotingClassifier 0.912


In [5]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(probability=True)

voting_clf = VotingClassifier(
    estimators = [('lr',log_clf), ('rf',rnd_clf),('svc',svm_clf)],
    voting = 'soft'
)

voting_clf.fit(X_train,y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('rf', RandomForestClassifier()),
                             ('svc', SVC(probability=True))],
                 voting='soft')
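With `voting='soft'`, the ensemble averages the class probabilities of its members and predicts the class with the highest average. Confident votes therefore weigh more, which is why `SVC` needs `probability=True` to expose `predict_proba()`. The sketch below checks this by hand, assuming equal member weights and fixed `random_state` values for reproducibility. The names `avg_proba` and `manual_pred` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],  # SVC needs probability=True
    voting="soft",
)
voting_clf.fit(X_train, y_train)

# Average the class probabilities of the fitted members (equal weights)
avg_proba = np.mean([est.predict_proba(X_test) for est in voting_clf.estimators_], axis=0)

# Soft voting picks the class with the highest average probability
manual_pred = voting_clf.classes_[avg_proba.argmax(axis=1)]
```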

# Bagging and Pasting in Scikit-Learn

One way to get a diverse set of classifiers is to use very different training algorithms, as just discussed. Another approach is to use the same training algorithm for every predictor but to train each one on a different random subset of the training set. When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting.
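The only difference between the two methods is how each predictor's training subset is drawn. The sketch below draws index sets both ways with NumPy, assuming the 375-instance training set produced by the split above. A bootstrap sample of size m contains repeats, while a pasting sample never does. It then fits a pasting ensemble, since `bootstrap=False` is the `BaggingClassifier` switch for pasting. The sample sizes, seed and names such as `paste_clf` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
m = 375  # training-set size: 75% of the 500 moons instances

# Bagging: m indices drawn WITH replacement, so some instances repeat and others are left out
bagging_idx = rng.choice(m, size=m, replace=True)
# Pasting: indices drawn WITHOUT replacement, so every selected instance is distinct
pasting_idx = rng.choice(m, size=100, replace=False)

print("distinct fraction, bootstrap sample:", len(np.unique(bagging_idx)) / m)
print("distinct fraction, pasting sample:", len(np.unique(pasting_idx)) / 100)

# bootstrap=False turns BaggingClassifier into a pasting ensemble
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
paste_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                              max_samples=100, bootstrap=False, random_state=42)
paste_clf.fit(X_train, y_train)
```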

In [6]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf=BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, max_samples=100,bootstrap=True,n_jobs=-1
)

bag_clf.fit(X_train,y_train)
y_pred=bag_clf.predict(X_test)


In [7]:
y_pred[0]

0

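## Out-of-Bag Evaluation

With bagging, each predictor is trained on a bootstrap sample, so some training instances are never seen by that predictor. These out-of-bag (oob) instances can serve as a validation set. Setting `oob_score=True` makes the `BaggingClassifier` evaluate each training instance using only the predictors that did not train on it, and store the resulting accuracy in `oob_score_`.
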
In [8]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),n_estimators=500,
    bootstrap=True,n_jobs=-1,oob_score=True
)

bag_clf.fit(X_train,y_train)
bag_clf.oob_score_

0.904

According to this metric, this Bagging Classifier is likely to achieve about 90.4% accuracy on the test set.
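Why is there always an out-of-bag set? A given instance is missed by all m draws of a bootstrap sample with probability (1 - 1/m)^m, which is close to e^-1, or about 37% for large m. The sketch below checks that figure numerically and by simulation, assuming m = 375 as in this notebook's training set. It also prints a few rows of `oob_decision_function_`, which holds each instance's out-of-bag class probabilities. The simulation size and seed are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

m = 375  # training-set size
# Probability that a given instance is never drawn in m draws with replacement
expected_oob = (1 - 1 / m) ** m
print(round(expected_oob, 3), round(np.exp(-1), 3))  # both roughly 0.37

# Empirical check: average out-of-bag fraction over many simulated bootstrap samples
rng = np.random.default_rng(42)
oob_fractions = [1 - len(np.unique(rng.choice(m, size=m))) / m for _ in range(1_000)]
print(round(np.mean(oob_fractions), 3))

# With oob_score=True, the fitted ensemble also exposes per-instance oob class probabilities
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X_train, y_train)
print(bag_clf.oob_decision_function_[:3])
```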

In [9]:
from sklearn.metrics import accuracy_score
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test,y_pred)

0.904

# Random Forests

In [10]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf=RandomForestClassifier(n_estimators=500,max_leaf_nodes=16,n_jobs=-1)
rnd_clf.fit(X_train,y_train)

y_pred_rf=rnd_clf.predict(X_test)

In [11]:
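y_pred  # note: this still holds the bagging classifier's predictions from In [9]; the forest's are in y_pred_rf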

array([1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
       0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0,
       0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])

## A Bagging Classifier Equivalent to the Previous Random Forest

In [12]:
# Bagging classifier whose trees consider a random subset of features at each split, as a Random Forest does

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1
)
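A quick way to see that the two ensembles behave alike is to fit both and count how often they predict the same class. This sketch assumes that `max_features="sqrt"` on the base tree reproduces the per-split feature sampling of `RandomForestClassifier`. The fixed seeds and the `agreement` metric are illustrative choices, and the exact value will vary with the seeds.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42)
# Assumption: max_features="sqrt" mimics the random feature subset a Random Forest draws at each split
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, random_state=42,
)
rnd_clf.fit(X_train, y_train)
bag_clf.fit(X_train, y_train)

# Fraction of test instances on which the two ensembles predict the same class
agreement = np.mean(rnd_clf.predict(X_test) == bag_clf.predict(X_test))
print(agreement)
```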

## Using Random Forests For Feature Importances

In [13]:
from sklearn.datasets import load_iris
iris=load_iris()
rnd_clf=RandomForestClassifier(n_estimators=500,n_jobs=-1)
rnd_clf.fit(iris["data"],iris["target"])

for name,score in zip(iris["feature_names"],rnd_clf.feature_importances_):
    print(name,score)


sepal length (cm) 0.09182141447733241
sepal width (cm) 0.024958852212837314
petal length (cm) 0.4513550547973911
petal width (cm) 0.4318646785124392
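Each tree scores a feature by how much the nodes that split on it reduce impurity, weighted by the number of training instances reaching those nodes. The forest then combines the per-tree scores. This sketch assumes the combination is a plain mean over the trees, rescaled to sum to 1, which is how recent scikit-learn versions compute `feature_importances_`. It rebuilds the scores by hand on iris. The seed and the names `per_tree` and `manual` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])

# Impurity-based importances of every individual tree, shape (n_trees, n_features)
per_tree = np.array([tree.feature_importances_ for tree in rnd_clf.estimators_])

# Average over the trees, then rescale so the scores sum to 1
manual = per_tree.mean(axis=0)
manual /= manual.sum()

# Print features from most to least important
for name, score in sorted(zip(iris["feature_names"], manual), key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```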


# AdaBoost

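AdaBoost trains predictors sequentially, and every training instance carries a weight. After each predictor is trained, the weights of the instances it misclassified are increased, so the next predictor concentrates on the cases its predecessors got wrong.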
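A minimal from-scratch sketch of that reweighting loop for two classes is shown below. It assumes a common SAMME-style update. Each predictor gets a weight α = η·log((1 - r)/r), where r is its weighted error rate and η is the learning rate. Misclassified instances have their weight multiplied by exp(α) before all weights are renormalized. It is an illustration, not scikit-learn's `AdaBoostClassifier` implementation. The 50-stump budget and the helper `ada_predict` are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
m = len(X)
weights = np.full(m, 1 / m)  # every instance starts with the same weight
learning_rate = 0.5          # scales each predictor's weight alpha
stumps, alphas = [], []

for _ in range(50):
    stump = DecisionTreeClassifier(max_depth=1, random_state=42)
    stump.fit(X, y, sample_weight=weights)
    missed = stump.predict(X) != y
    # Weighted error rate of this predictor, clipped to keep the log finite
    r = np.clip(weights[missed].sum() / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = learning_rate * np.log((1 - r) / r)  # more accurate predictor -> larger weight
    weights[missed] *= np.exp(alpha)             # boost the misclassified instances
    weights /= weights.sum()                     # renormalize so the weights sum to 1
    stumps.append(stump)
    alphas.append(alpha)


def ada_predict(X_new):
    """Weighted vote: each stump adds its alpha to the class it predicts (classes are 0 and 1)."""
    votes = np.zeros((len(X_new), 2))
    for alpha, stump in zip(alphas, stumps):
        votes[np.arange(len(X_new)), stump.predict(X_new)] += alpha
    return votes.argmax(axis=1)


print("training accuracy:", np.mean(ada_predict(X) == y))
```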

In [14]:
from sklearn.ensemble import AdaBoostClassifier

ada_clf=AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1),
        n_estimators=200,
        algorithm="SAMME.R",  # default in older scikit-learn; newer releases deprecate it in favor of "SAMME"
        learning_rate=.5
        )
ada_clf.fit(X_train,y_train)

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                   learning_rate=0.5, n_estimators=200)

# Gradient Boosting


Gradient Boosting also trains predictors sequentially, each one correcting its predecessor. However, instead of adjusting the instance weights at every iteration as AdaBoost does, it fits each new predictor to the residual errors made by the previous predictor.

First, fit a `DecisionTreeRegressor` to the training set (here the full moons dataset, with the class labels treated as regression targets):

In [15]:
from sklearn.tree import DecisionTreeRegressor

tree_reg1=DecisionTreeRegressor(max_depth=2)
tree_reg1.fit(X,y)

DecisionTreeRegressor(max_depth=2)

Next, train a second `DecisionTreeRegressor` on the residual errors made by the first predictor:

In [17]:
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2)
tree_reg2.fit(X,y2)

DecisionTreeRegressor(max_depth=2)

We then train a third regressor on the residual errors made by the second predictor:

In [19]:
y3= y2 - tree_reg2.predict(X)
tree_reg3=DecisionTreeRegressor(max_depth=2)
tree_reg3.fit(X,y3)

DecisionTreeRegressor(max_depth=2)

In [29]:
print(X)

[[ 8.31039149e-01 -2.58748754e-01]
 [ 1.18506381e+00  9.20387143e-01]
 [ 1.16402213e+00 -4.55525583e-01]
 ...

In [30]:
X_new = X[:1]  # predict() expects a 2D array, so keep the first instance as a 1-row slice
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))

The ensemble makes a prediction on a new instance by simply adding up the predictions of all three trees.
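The three manual steps generalize to a loop over any number of trees. This sketch also adds a learning rate (shrinkage) that scales each tree's contribution; with `learning_rate = 1.0` it reduces to the cells above. It assumes the same moons data and `max_depth=2` trees. The tree count, `learning_rate` value and helper `gb_predict` are illustrative, and this is not `GradientBoostingRegressor`'s exact implementation (which, for instance, starts from the mean of the targets).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeRegressor

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

learning_rate = 0.5          # shrinkage: scales each tree's contribution
n_trees = 10
trees = []
residuals = y.astype(float)  # the first tree fits the targets themselves
train_mse = []

for _ in range(n_trees):
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X, residuals)                                   # fit what is still unexplained
    trees.append(tree)
    residuals = residuals - learning_rate * tree.predict(X)  # update the residual errors
    train_mse.append(np.mean(residuals ** 2))


def gb_predict(X_new):
    # The ensemble prediction is the shrunken sum of every tree's prediction
    return learning_rate * sum(tree.predict(X_new) for tree in trees)


print([round(e, 4) for e in train_mse])
```

A smaller learning rate needs more trees to fit the training set but usually generalizes better.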

# Gradient Boosted Trees in Scikit-Learn

In [34]:
from sklearn.ensemble import GradientBoostingRegressor

gbrt= GradientBoostingRegressor(max_depth=2,n_estimators=3,learning_rate=1.0)
gbrt.fit(X,y)

GradientBoostingRegressor(learning_rate=1.0, max_depth=2, n_estimators=3)

Training while searching for the optimal number of trees, using `staged_predict()` to measure the validation error after each stage:

In [35]:
import numpy as np 
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_val, y_train, y_val = train_test_split(X,y)

In [37]:
grbt = GradientBoostingRegressor(max_depth=2,n_estimators=120)
grbt.fit(X_train,y_train)

GradientBoostingRegressor(max_depth=2, n_estimators=120)

In [38]:
errors = [mean_squared_error(y_val, y_pred) for y_pred in grbt.staged_predict(X_val)]

# errors[i] is the validation MSE after i + 1 trees, so shift the argmin by one
bst_n_estimators = int(np.argmin(errors)) + 1

In [39]:
grbt_best = GradientBoostingRegressor(max_depth=2,n_estimators=bst_n_estimators)
grbt_best.fit(X_train,y_train)

GradientBoostingRegressor(max_depth=2, n_estimators=56)
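`staged_predict()` yields one prediction array per boosting stage: after 1 tree, after 2 trees, and so on. That is why the index of the lowest error must be shifted by one to give a tree count. The sketch below checks this, assuming a fixed `random_state` for reproducibility. It confirms that there are 120 stages and that the last stage equals `predict()`. The names `staged` and `best_n_estimators` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120, random_state=42)
gbrt.fit(X_train, y_train)

# One prediction array per stage: after 1 tree, 2 trees, ..., 120 trees
staged = list(gbrt.staged_predict(X_val))
errors = [mean_squared_error(y_val, y_pred) for y_pred in staged]

# errors[i] belongs to an ensemble of i + 1 trees, hence the + 1
best_n_estimators = int(np.argmin(errors)) + 1
print(len(staged), best_n_estimators)
```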

# Early Stopping When No Improvement Occurs

In [None]:
grbt = GradientBoostingRegressor(max_depth=2,warm_start=True)

min_val_error = float("inf")
error_going_up=0
for n_estimators in range(1,120):
    grbt.n_estimators=n_estimators
    grbt.fit(X_train,y_train)
    y_pred=grbt.predict(X_val)
    val_error=mean_squared_error(y_val,y_pred)
    if val_error < min_val_error:  # validation error improved: remember it and reset the counter
        min_val_error=val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break # Early Stopping
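Scikit-Learn can also perform this kind of early stopping itself. When `n_iter_no_change` is set, `GradientBoostingRegressor` holds out `validation_fraction` of the training data. It stops adding trees once the validation loss has failed to improve by at least `tol` for that many consecutive iterations. The settings below are illustrative assumptions, not tuned values. The number of trees actually kept is available in `n_estimators_`.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

gbrt = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=500,         # upper bound on the number of trees
    validation_fraction=0.1,  # share of the training data held out to monitor the loss
    n_iter_no_change=5,       # stop after 5 iterations without sufficient improvement
    tol=1e-4,                 # minimum improvement that counts
    random_state=42,
)
gbrt.fit(X, y)
print(gbrt.n_estimators_)  # number of trees actually kept
```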

In [1]:
import xgboost

# xgb_reg = xgboost.XGBRegressor()
# xgb_reg.fit(X_train,y_train)
# y_pred = xgb_reg.predict(X_val)

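# xgb_reg = xgboost.XGBRegressor(early_stopping_rounds=2)  # recent XGBoost versions take early_stopping_rounds here, not in fit()
# xgb_reg.fit(X_train,y_train,
#             eval_set = [(X_val,y_val)])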
# y_pred = xgb_reg.predict(X_val)


ModuleNotFoundError: No module named 'xgboost'

### NOTE: XGBoost is not part of Scikit-Learn and may not be installed in a default Anaconda environment; install it separately, e.g. with `pip install xgboost` or `conda install -c conda-forge xgboost`.

## Stacking