# **Chapter 7 - Ensemble Learning and Random Forests**
- If you ask a large group of people a question and aggregate their answers, the aggregated answer is often better than a single expert's answer. This is known as the **Wisdom of the Crowd**.
- **Machine Learning** behaves in a similar way. If you train one *model* that is an *expert* at a certain task, and someone else trains *1000* relatively **weak models** and aggregates their predictions, there is a good chance that the aggregated model will outperform yours. A group of predictors is called an *ensemble*, and this technique is known as **Ensemble Learning**.

## What We are Going to Learn
- Voting Classifier
- Bagging and Pasting
    - Bagging and Pasting in Scikit-Learn
    - Out-of-Bag Evaluation
- Random Patches and Random Subspaces
- Random Forests
    - Extra-Trees
    - Feature Importance
- Boosting
    - AdaBoost
    - Gradient Boosting
- Stacking Ensemble

# Voting Classifiers
- In this ensemble we train multiple classifiers, and the class that gets the most votes is the final prediction.
- There are two types of voting classifiers:
    - Hard voting classifier - takes each classifier's **predicted class** as input and predicts the majority class.
    - Soft voting classifier - takes each classifier's **predicted class probabilities** as input and predicts the class with the highest average probability.
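
The difference between the two voting schemes can be sketched by hand. The probabilities below are made-up numbers for a single instance, not outputs of the classifiers trained in this notebook; they are chosen so that the two schemes disagree.

```python
import numpy as np

# Hypothetical class probabilities from three classifiers for one instance
# (rows: classifiers, columns: class 0 and class 1).
probas = np.array([
    [0.45, 0.55],
    [0.40, 0.60],
    [0.95, 0.05],
])

# Hard voting: each classifier votes for its most likely class,
# and the class with the most votes wins.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
mean_proba = probas.mean(axis=0)
soft_prediction = mean_proba.argmax()

print("hard:", hard_prediction)  # two of three classifiers vote for class 1
print("soft:", soft_prediction)  # the very confident third classifier tips the average to class 0
```

Soft voting gives more weight to confident classifiers, which is why it needs base classifiers that can estimate probabilities.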

In [1]:
import pandas as pd
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=10000, noise=0.25)
X_train = X[:2000]
y_train = y[:2000]
X_test = X[2000:]
y_test = y[2000:]

In [2]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(probability=True)

voting_clf = VotingClassifier(
    estimators=[("lr", log_clf), ("rnd", rnd_clf), ("svc", svm_clf)],
    voting="hard"
)
voting_clf.fit(X_train, y_train)

In [3]:
from sklearn.metrics import accuracy_score
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.863375
RandomForestClassifier 0.935625
SVC 0.944375
VotingClassifier 0.941


### **Soft Voting Classifier** - Often, but Not Always, Better than Hard Voting

In [4]:
voting_clf_soft = VotingClassifier(
    estimators=[("lr", log_clf), ("rnd", rnd_clf), ("svc", svm_clf)],
    voting="soft"
)
voting_clf_soft.fit(X_train, y_train)

In [5]:
from sklearn.metrics import accuracy_score
for clf in (log_clf, rnd_clf, svm_clf, voting_clf_soft):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.863375
RandomForestClassifier 0.93625
SVC 0.944375
VotingClassifier 0.938875


# Bagging and Pasting
- There are two ways to get a diverse set of *classifiers* so that they don't share the same weak points:
    - Training different types of classifiers.
    - Training the same type of classifier on different random subsets of the training set.
        - When sampling is *performed with replacement*, it is known as **Bagging**.
        - When sampling is *performed without replacement*, it is known as **Pasting**.
- Once all the classifiers are trained, the ensemble predicts by aggregating the predictors' votes. `BaggingClassifier` uses soft voting automatically when the base classifier can estimate class probabilities.
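
The two sampling schemes can be illustrated with plain NumPy. This is only a sketch of the index sampling, with hypothetical sizes; it is not how `BaggingClassifier` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(42)
n_instances, n_samples = 10, 6   # hypothetical sizes
indices = np.arange(n_instances)

# Bagging: sampling with replacement, so an index may appear more than once.
bagging_idx = rng.choice(indices, size=n_samples, replace=True)

# Pasting: sampling without replacement, so every index is unique.
pasting_idx = rng.choice(indices, size=n_samples, replace=False)

print("bagging:", bagging_idx)
print("pasting:", pasting_idx)
```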

In [6]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1
)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.939875

To try out pasting instead, just set **bootstrap=False**.

## Out-of-Bag Evaluation
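With bagging, some instances are sampled several times for a given predictor while others are not sampled at all. When each bootstrap sample is as large as the training set, only about **63%** of the instances are **sampled** on average for each predictor, and the remaining **37%** are **not sampled** at all; these are the *out-of-bag* (OOB) instances. With `max_samples=100`, as in the cell below, each predictor leaves out an even larger share. Since a predictor never sees its OOB instances during training, they can be used to evaluate it without a separate validation set, and the resulting estimate is usually close to the test-set accuracy. This is known as **Out-of-Bag Evaluation**.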
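
The 63% / 37% figures come from a short calculation: when drawing `m` times with replacement from `m` instances, the chance that a given instance is never drawn is `(1 - 1/m) ** m`, which approaches `exp(-1)` as `m` grows. A quick check, assuming a bootstrap sample as large as the training set (the value of `m` is only illustrative):

```python
import numpy as np

m = 2000  # illustrative training-set size

# Probability that a given instance is never drawn in m draws with replacement
p_never_sampled = (1 - 1 / m) ** m
print(round(p_never_sampled, 3))  # close to exp(-1), about 0.368

# Empirical check with one bootstrap sample
rng = np.random.default_rng(0)
sample = rng.integers(0, m, size=m)
oob_fraction = 1 - len(np.unique(sample)) / m
print(round(oob_fraction, 3))
```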

In [7]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1,
    oob_score=True
)
bag_clf.fit(X_train, y_train)
bag_clf.oob_score_

0.9405

In [8]:
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.939125

Now let's look at the out-of-bag class probabilities for every training instance using `oob_decision_function_`.

In [9]:
bag_clf.oob_decision_function_

array([[0.93958333, 0.06041667],
       [0.99371069, 0.00628931],
       [0.03138075, 0.96861925],
       ...,
       [0.05496829, 0.94503171],
       [0.97468354, 0.02531646],
       [0.99580713, 0.00419287]])

As you can see, the OOB score (0.9405) is almost equal to the accuracy measured on the test set (0.939125).

# Random Patches and Random Subspaces
`BaggingClassifier` can sample the features as well as the instances. Feature sampling is controlled by two hyperparameters:
- **max_features** - the number (or fraction) of features drawn to train each base classifier.
- **bootstrap_features** - whether features are sampled with replacement when fitting each base classifier.

Sampling both the training instances and the features is known as the **Random Patches** method. Keeping all training instances and sampling only the features is known as the **Random Subspaces** method.
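
A minimal sketch of both configurations follows. Sampling instances and features together is usually called Random Patches, and sampling only the features is called Random Subspaces. The dataset (`make_classification` with 10 features) and the 0.5 fractions are assumptions chosen for illustration, since the moons data used in this notebook has only two features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset with 10 features, so that feature sampling is meaningful
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)

# Random Patches: sample both training instances and features
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50,
    max_samples=0.5, bootstrap=True,
    max_features=0.5, bootstrap_features=True,
    random_state=42,
)

# Random Subspaces: keep all training instances, sample only the features
subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=50,
    max_samples=1.0, bootstrap=False,
    max_features=0.5, bootstrap_features=True,
    random_state=42,
)

patches_clf.fit(X_demo, y_demo)
subspaces_clf.fit(X_demo, y_demo)

# Feature indices drawn for the first base classifier
print(patches_clf.estimators_features_[0])
```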

# Random Forests
- A Random Forest is an ensemble of Decision Trees, generally trained with the bagging method.

In [10]:
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=1)
rnd_clf.fit(X_train, y_train)
y_pred = rnd_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.942125

With a few exceptions, `RandomForestClassifier` has all the hyperparameters of `DecisionTreeClassifier` (to control how the trees are grown) and of `BaggingClassifier` (to control the ensemble itself).
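
To make the relationship concrete, here is a rough sketch that builds a forest-like ensemble from `BaggingClassifier` and compares it with `RandomForestClassifier`. The two are not guaranteed to be identical; the dataset, seeds and the number of trees are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_moons(n_samples=1000, noise=0.25, random_state=42)

# Bagging of trees that each consider a random subset of features at every split
bag_of_trees = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=100, random_state=42,
)
forest = RandomForestClassifier(
    n_estimators=100, max_leaf_nodes=16, random_state=42,
)
bag_of_trees.fit(X_demo, y_demo)
forest.fit(X_demo, y_demo)

# Fraction of instances on which the two ensembles make the same prediction
agreement = (bag_of_trees.predict(X_demo) == forest.predict(X_demo)).mean()
print(agreement)
```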

## Extra-Trees
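Same as a Random Forest, except that instead of searching for the best threshold for each candidate feature, it uses random thresholds. This makes the trees even more random, hence the name *Extremely Randomized Trees* (Extra-Trees).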

In [11]:
from sklearn.ensemble import ExtraTreesClassifier
ext_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=1)
ext_clf.fit(X_train, y_train)
y_pred = ext_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.927125

## Feature Importance

In [12]:
ext_clf.feature_importances_

array([0.3986664, 0.6013336])

In [13]:
rnd_clf.feature_importances_

array([0.41037262, 0.58962738])
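
The scores in `feature_importances_` are scaled so that they sum to 1, and they are easier to read next to feature names. A small sketch, assuming the iris dataset (which is not used elsewhere in this notebook) simply because it ships with named features:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris.data, iris.target)

# Pair each feature name with its importance score
for name, score in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```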

# Boosting
An ensemble method that combines several weak learners into a strong learner is known as **Boosting**.
The general idea of boosting is to train predictors sequentially, each one trying to correct its predecessor. The two most popular boosting methods are:
- AdaBoost (Adaptive Boosting)
- Gradient Boosting

## AdaBoost (Adaptive Boosting)
One way for a new predictor to correct its predecessor is to pay more attention to the training instances that the predecessor underfitted (misclassified).
To implement this, **AdaBoost** increases the relative weights of the misclassified instances before training the next predictor, so each new predictor focuses more and more on the hard cases.
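
One round of this reweighting can be sketched with NumPy. The labels and predictions are made up, and the update rule shown is one common textbook formulation of AdaBoost, not necessarily the exact code path of `AdaBoostClassifier`.

```python
import numpy as np

# Hypothetical labels and predictions of one weak learner on 5 instances
y_true = np.array([1, 0, 1, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0])   # the third instance is misclassified
learning_rate = 0.5

weights = np.full(len(y_true), 1 / len(y_true))   # start with equal weights
missed = y_hat != y_true

# Weighted error rate of this predictor
error = weights[missed].sum() / weights.sum()

# Predictor weight: larger when the predictor is more accurate
alpha = learning_rate * np.log((1 - error) / error)

# Boost the weights of the misclassified instances, then normalize
weights[missed] *= np.exp(alpha)
weights /= weights.sum()

print(weights)  # the misclassified instance now weighs twice as much as each of the others
```

The next predictor would then be trained with these weights, so a mistake on the third instance costs it more.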

In [14]:
from sklearn.ensemble import AdaBoostClassifier

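ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2), n_estimators=200,
    # "SAMME.R" is deprecated in recent scikit-learn releases; drop the
    # `algorithm` argument there if this call raises an error or a warning.
    algorithm="SAMME.R", learning_rate=0.5
)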
ada_clf.fit(X_train, y_train)
y_pred = ada_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.92525

## Gradient Boosting
It works like AdaBoost in that predictors are added sequentially, but instead of increasing the weights of the misclassified instances at every iteration, it fits each new predictor to the **residual errors** made by the previous predictor. This is easiest to understand through an implementation in Python.

In [15]:
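from sklearn.tree import DecisionTreeRegressor

# First tree: a regressor fitted directly to the 0/1 labels
tree_reg_1 = DecisionTreeRegressor(max_depth=2)
tree_reg_1.fit(X_train, y_train)

In [16]:
# Second tree: fitted to the residual errors of the first tree
y_train2 = y_train - tree_reg_1.predict(X_train)
tree_reg_2 = DecisionTreeRegressor(max_depth=2)
tree_reg_2.fit(X_train, y_train2)

In [17]:
# Third tree: fitted to the residual errors of the second tree
y_train3 = y_train2 - tree_reg_2.predict(X_train)
tree_reg_3 = DecisionTreeRegressor(max_depth=2)
tree_reg_3.fit(X_train, y_train3)

In [18]:
# The ensemble prediction is the sum of the predictions of all three trees.
# The labels are 0/1, so the summed score is thresholded at 0.5.
y_score = sum(tree.predict(X_test) for tree in (tree_reg_1, tree_reg_2, tree_reg_3))
y_pred = (y_score >= 0.5).astype(int)
accuracy_score(y_test, y_pred)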

Apart from this, **Scikit-Learn** provides a ready-made `GradientBoostingClassifier` class.

In [19]:
from sklearn.ensemble import GradientBoostingClassifier
gbclf = GradientBoostingClassifier(max_depth=2, n_estimators=3, learning_rate=1.0)
gbclf.fit(X_train, y_train)

In [20]:
y_pred = gbclf.predict(X_test)
accuracy_score(y_test, y_pred)

0.926

The **learning_rate** *hyperparameter* scales the contribution of each tree. If you set it to a low value such as *0.1*, you will need more trees to fit the training set, but the predictions will usually generalize better. This regularization technique is known as *shrinkage*.
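
The effect of shrinkage can be sketched with a hand-rolled boosting loop in which every tree contributes only `learning_rate` times its prediction. The quadratic toy data, the starting prediction of zero and the number of trees are assumptions for illustration; this is a regression sketch, not the internals of `GradientBoostingClassifier`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X_demo = rng.uniform(-0.5, 0.5, size=(200, 1))
y_demo = 3 * X_demo[:, 0] ** 2 + 0.05 * rng.normal(size=200)

learning_rate = 0.1
prediction = np.zeros_like(y_demo)   # start from a prediction of zero
mse_history = []

for _ in range(50):
    residuals = y_demo - prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_demo, residuals)
    # Shrinkage: each tree contributes only a fraction of its prediction
    prediction += learning_rate * tree.predict(X_demo)
    mse_history.append(np.mean((y_demo - prediction) ** 2))

# Training error after the first and after the last tree
print(mse_history[0], mse_history[-1])
```

With a small learning rate the training error drops slowly, which is why more trees are needed.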

# Stacking Ensemble
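In this type of ensemble, a new meta learner (also called a blender) is trained to aggregate the predictions of the base learners, instead of using a fixed rule such as voting. Below, the base learners are trained on the first 5,000 instances, and the meta learner is trained on their predictions for the remaining 5,000 held-out instances.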

In [21]:
X_train = X[:5000]
y_train = y[:5000]
X_test = X[5000:]
y_test = y[5000:]

In [22]:
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(probability=True)

log_clf.fit(X_train, y_train)
rnd_clf.fit(X_train, y_train)
svm_clf.fit(X_train, y_train)

In [23]:
log_y_pred = log_clf.predict(X_test)
rnd_y_pred = rnd_clf.predict(X_test)
svm_y_pred = svm_clf.predict(X_test)

data = {
    "Logistic Regression": log_y_pred,
    "Random Forest Classifier": rnd_y_pred,
    "SVM Classifier": svm_y_pred,
}
data = pd.DataFrame(data)
data.head()

|   | Logistic Regression | Random Forest Classifier | SVM Classifier |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 |
| 4 | 0 | 1 | 1 |


In [24]:
y_test

array([1, 0, 0, ..., 0, 1, 0])

In [25]:
meta_learner = DecisionTreeClassifier(max_depth=5)
meta_learner.fit(data, y_test)

In [26]:
# Use a separate name for the meta features so that the original X
# is not overwritten (it is sliced again in the next section).
X_meta = pd.DataFrame({
    "Logistic Regression": log_clf.predict(X),
    "Random Forest Classifier": rnd_clf.predict(X),
    "SVM Classifier": svm_clf.predict(X),
})
# X contains the instances that the base learners and the meta learner
# were trained on, so this score is an optimistic estimate.
y_pred = meta_learner.predict(X_meta)
accuracy_score(y, y_pred)

0.9466

## Stacking with Scikit-Learn

In [27]:
X_train = X[:2000]
y_train = y[:2000]
X_test = X[2000:]
y_test = y[2000:]

In [28]:
from sklearn.ensemble import StackingClassifier

base_classifiers = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(kernel='linear', probability=True, random_state=42))
]
meta_classifier = LogisticRegression(random_state=42)
stacking_classifier = StackingClassifier(estimators=base_classifiers, final_estimator=meta_classifier)
stacking_classifier.fit(X_train, y_train)

In [29]:
y_pred = stacking_classifier.predict(X_test)
accuracy_score(y_test, y_pred)

# The End