# Ensemble Learning

Ensemble learning is a technique that aggregates the predictions of a group of predictors. For classification, the simplest way to aggregate them is to predict the class that gets the most votes.
A group of predictors is called an ensemble.


## Voting Classifier

In majority voting, we train several different classifiers on the training set and then predict the class that receives the most votes.

This majority-vote classifier is known as a hard voting classifier.
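To make the idea concrete, here is a minimal NumPy sketch of hard voting on made-up predictions. The `predictions` array and the `hard_vote` helper are hypothetical and not part of scikit-learn. Each row holds one classifier's predicted labels, and the vote picks the most frequent label for each instance.

```python
import numpy as np

# Hypothetical predictions from three classifiers on five instances
# (one row per classifier, one column per instance).
predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
])

def hard_vote(preds):
    """Return the most frequent class label in each column (ties go to the smaller label)."""
    n_classes = preds.max() + 1
    # counts[c, j] = number of classifiers that predicted class c for instance j
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

majority = hard_vote(predictions)
print(majority)  # [0 1 0 0 1]
```

Even though no single classifier agrees with the majority on every instance, the vote settles each column by counting labels.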


In [7]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

make_moons

<function sklearn.datasets._samples_generator.make_moons(n_samples=100, *, shuffle=True, noise=None, random_state=None)>

In [8]:
X, y = make_moons(n_samples=1000, shuffle=True, noise=0.10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [9]:
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Ensemble three classifiers: logistic regression, random forest, and SVC
log_reg = LogisticRegression()
forest_clf = RandomForestClassifier()
svm_clf = SVC()

voting_clf = VotingClassifier(
    estimators=[('lr', log_reg), ('rd', forest_clf), ('svc', svm_clf)],
    voting='hard'
)
voting_clf.fit(X_train, y_train)

In [10]:
# Look at each classifier's accuracy
from sklearn.metrics import accuracy_score

for clf in (log_reg, svm_clf, forest_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.8878787878787879
SVC 1.0
RandomForestClassifier 0.996969696969697
VotingClassifier 1.0


If all the classifiers can estimate class probabilities (i.e., they have a `predict_proba()` method), you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers. This is called soft voting.
It often achieves higher performance than hard voting because it gives more weight to highly confident votes. To use it, replace `voting="hard"` with `voting="soft"`.
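A rough sketch of what soft voting computes: average the models' `predict_proba()` outputs and take the class with the highest mean probability. It assumes the same moons data as above. A fixed `random_state` for the forest is an added assumption, so that the by-hand result and `VotingClassifier` (which fits clones of the estimators) can be compared directly.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, shuffle=True, noise=0.10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# random_state is fixed so the cloned forest inside VotingClassifier matches this one
log_reg = LogisticRegression().fit(X_train, y_train)
forest_clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Soft voting by hand: mean of the class probabilities, then the most probable class
avg_proba = np.mean([log_reg.predict_proba(X_test), forest_clf.predict_proba(X_test)], axis=0)
manual_pred = forest_clf.classes_[avg_proba.argmax(axis=1)]

soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression()), ('rf', RandomForestClassifier(random_state=42))],
    voting='soft',
)
soft_clf.fit(X_train, y_train)
print(np.array_equal(manual_pred, soft_clf.predict(X_test)))
```

A confident model (probabilities near 0 or 1) moves the average more than a hesitant one, which is why soft voting weights confident votes more heavily.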

In [11]:
# SVC is left out: by default it has no predict_proba() (SVC(probability=True) would enable it)
voting_clf_soft = VotingClassifier(
    estimators=[('lr', log_reg), ('rd', forest_clf)],
    voting='soft'
)
voting_clf_soft.fit(X_train, y_train)

In [12]:
for clf in (log_reg, svm_clf, forest_clf, voting_clf, voting_clf_soft):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.8878787878787879
SVC 1.0
RandomForestClassifier 1.0
VotingClassifier 1.0
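VotingClassifier 0.9787878787878788

The second `VotingClassifier` line is the soft-voting ensemble (`voting_clf_soft`, which has no SVC). On this split it scores slightly lower than the hard-voting ensemble, so soft voting does not always come out ahead.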


## Bagging and Pasting

Another approach is to use the same training algorithm for every predictor but to train each one on a different random subset of the training set. When sampling is performed with replacement, this method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting.
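The difference between the two sampling schemes can be seen with NumPy alone. In this sketch the training-set size `m` is hypothetical. A bootstrap sample of size `m` contains duplicates and covers only about 1 - 1/e ≈ 63% of the distinct instances, whereas a sample drawn without replacement never repeats an instance. The 63% figure assumes each sample is as large as the training set, which is not the case in the notebook cell that follows (it uses `max_samples=100`).

```python
import numpy as np

rng = np.random.default_rng(42)
m = 10_000  # size of a hypothetical training set

# Bagging: draw m indices WITH replacement (a bootstrap sample)
bootstrap_idx = rng.choice(m, size=m, replace=True)
# Pasting: draw indices WITHOUT replacement (here, half of the set)
pasting_idx = rng.choice(m, size=m // 2, replace=False)

frac_unique_bootstrap = np.unique(bootstrap_idx).size / m
print(round(frac_unique_bootstrap, 2))  # close to 1 - 1/e, i.e. about 0.63
print(np.unique(pasting_idx).size == pasting_idx.size)  # no duplicates when pasting
```

The roughly 37% of instances that a bootstrap sample misses are exactly the "out-of-bag" instances used for the OOB evaluation later on.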

The following code trains an ensemble of 500 Decision Tree classifiers, each trained on 100 training instances randomly sampled from the training set with replacement (this is an example of bagging; to use pasting instead, set `bootstrap=False`).

The `n_jobs` parameter tells Scikit-Learn the number of CPU cores to use for training and predictions (`-1` tells Scikit-Learn to use all available cores). Setting `oob_score=True` also requests an out-of-bag evaluation after training:
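For comparison, a pasting version of the same ensemble changes only `bootstrap`. This sketch rebuilds the moons split so it runs on its own and adds `random_state=42` (an assumption, for reproducibility). It also checks that out-of-bag evaluation is not available for pasting: scikit-learn raises a `ValueError` when `oob_score=True` is combined with `bootstrap=False`.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, shuffle=True, noise=0.10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Pasting: 500 trees, each on 100 instances sampled WITHOUT replacement
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, max_samples=100,
    bootstrap=False, n_jobs=-1, random_state=42,
)
paste_clf.fit(X_train, y_train)
paste_acc = accuracy_score(y_test, paste_clf.predict(X_test))
print(paste_acc)

# Out-of-bag evaluation requires bootstrap=True, so this combination is rejected
try:
    BaggingClassifier(DecisionTreeClassifier(), bootstrap=False, oob_score=True).fit(X_train, y_train)
    oob_rejected = False
except ValueError:
    oob_rejected = True
print(oob_rejected)
```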

In [13]:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

tree_clf = DecisionTreeClassifier()
bag_clf = BaggingClassifier(
    tree_clf, n_estimators=500, max_samples=100, bootstrap=True, n_jobs=-1, oob_score=True
)
bag_clf.fit(X_train, y_train)

In [14]:
y_pred = bag_clf.predict(X_test)
accuracy_score(y_test, y_pred)

0.9818181818181818

In [15]:
bag_clf.oob_score_  # out-of-bag (OOB) accuracy estimate of the whole ensemble

0.9805970149253731

In [16]:
bag_clf.oob_decision_function_  # OOB class probabilities for each training instance

array([[0.05924171, 0.94075829],
       [0.06526807, 0.93473193],
       [0.07311321, 0.92688679],
       ...,
       [0.99543379, 0.00456621],
       [0.71724138, 0.28275862],
       [0.05104408, 0.94895592]])
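Each row of `oob_decision_function_` corresponds to one training instance and gives its class probabilities, estimated only by the predictors that did not see that instance during training. As a sketch (rebuilding the same split and adding `random_state=42` as an assumption), taking the most probable class in each row and scoring it against `y_train` should reproduce `oob_score_`:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, shuffle=True, noise=0.10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500, max_samples=100,
    bootstrap=True, oob_score=True, random_state=42,
)
bag_clf.fit(X_train, y_train)

# One row per training instance, one column per class
oob_proba = bag_clf.oob_decision_function_
print(oob_proba.shape)  # (670, 2)

# OOB prediction = most probable class per row; OOB score = accuracy of those predictions
oob_pred = bag_clf.classes_[oob_proba.argmax(axis=1)]
print(np.isclose(accuracy_score(y_train, oob_pred), bag_clf.oob_score_))
```

Because every instance is evaluated only by predictors that never trained on it, the OOB score behaves like a validation score without setting aside a separate validation set.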

## Random Forests

In [17]:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)

In [18]:
accuracy_score(y_test, y_pred_rf)

0.996969696969697
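A Random Forest is an ensemble of Decision Trees trained with bagging, where each split considers only a random subset of the features. As a rough sketch (not scikit-learn's actual implementation), a `BaggingClassifier` of trees with `max_features="sqrt"` behaves much like the `RandomForestClassifier` above. The `random_state` values are added assumptions, and the two models will not grow identical trees.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, shuffle=True, noise=0.10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Bagged trees that each look at a random subset of features at every split
bag_rf_like = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, n_jobs=-1, random_state=42,
)
bag_rf_like.fit(X_train, y_train)

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)

acc_bag = accuracy_score(y_test, bag_rf_like.predict(X_test))
acc_rf = accuracy_score(y_test, rnd_clf.predict(X_test))
print(acc_bag, acc_rf)
```

`RandomForestClassifier` is the more convenient and optimized choice for trees; the bagging version only shows what it is doing conceptually.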

## Extra Trees

When you are growing a tree in a Random Forest, at each node only a random subset of the features is considered for splitting. It is possible to make trees even more random by also using random thresholds for each feature rather than searching for the best possible thresholds. A forest of such extremely random trees is called an Extremely Randomized Trees ensemble (or Extra-Trees for short).

In [20]:
from sklearn.ensemble import ExtraTreesClassifier

ext_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
ext_clf.fit(X_train, y_train)
ext_pred = ext_clf.predict(X_test)
accuracy_score(y_test, ext_pred)

0.996969696969697
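The difference between a regular split and an Extra-Trees split can be sketched on a single feature. The helpers `gini` and `split_impurity` and the synthetic data are hypothetical, and this is a simplified illustration of one split, not scikit-learn's implementation. A regular tree tries every candidate threshold and keeps the one with the lowest weighted Gini impurity, while the Extra-Trees-style split draws one threshold at random between the feature's minimum and maximum. The random split can only match or lose to the exhaustive search on that feature; in exchange it is much cheaper to compute and makes the trees more diverse.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Size-weighted Gini impurity of the split x <= threshold vs x > threshold."""
    left, right = y[x <= threshold], y[x > threshold]
    return (left.size * gini(left) + right.size * gini(right)) / y.size

rng = np.random.default_rng(0)
# Hypothetical 1-D feature: class 0 mostly below 0.5, class 1 mostly above
x = rng.uniform(0, 1, size=200)
y = (x + rng.normal(0, 0.1, size=200) > 0.5).astype(int)

# Regular tree: try every midpoint between consecutive sorted values, keep the best
xs = np.unique(x)
candidates = (xs[:-1] + xs[1:]) / 2
best_threshold = min(candidates, key=lambda t: split_impurity(x, y, t))
best_impurity = split_impurity(x, y, best_threshold)

# Extra-Trees style: one threshold drawn uniformly between the feature's min and max
random_threshold = rng.uniform(x.min(), x.max())
random_impurity = split_impurity(x, y, random_threshold)

print(best_threshold, best_impurity)
print(random_threshold, random_impurity)
```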

## Feature Importance

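Yet another great quality of Random Forests is that they make it easy to measure the relative importance of each feature. Scikit-Learn measures a feature’s importance by looking at how much the tree nodes that use that feature reduce impurity on average across all trees in the forest (a weighted average, where each node's weight is the number of training samples associated with it). It then scales the results so that all importances sum to 1.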



In [25]:
from sklearn.datasets import load_iris

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
rnd_clf.fit(iris["data"], iris["target"])

for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)

sepal length (cm) 0.09525330806560865
sepal width (cm) 0.02524556690084
petal length (cm) 0.4329718910598638
petal width (cm) 0.44652923397368754


It seems that the most important features are the petal width (about 45%) and the petal length (about 43%), while sepal length and width are rather unimportant in comparison (about 10% and 2.5%, respectively).
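To see where these numbers come from, this sketch recomputes them from the fitted trees' internal arrays (`tree_.children_left`, `tree_.impurity`, `tree_.weighted_n_node_samples`, and so on). The helper `tree_mdi` is hypothetical: for every split node it adds the weighted impurity decrease to that node's feature, normalizes per tree, then the forest-level values are the per-tree average, renormalized. `random_state=42` is an added assumption.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def tree_mdi(tree):
    """Impurity-based importances of one fitted decision tree, computed from tree.tree_."""
    t = tree.tree_
    importances = np.zeros(t.n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, so no impurity decrease
            continue
        # Weighted impurity decrease produced by this node's split
        importances[t.feature[node]] += (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        )
    importances /= t.weighted_n_node_samples[0]
    return importances / importances.sum()

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])

# Average the per-tree importances, then rescale so they sum to 1
per_tree = np.array([tree_mdi(est) for est in rnd_clf.estimators_])
forest_mdi = per_tree.mean(axis=0)
forest_mdi /= forest_mdi.sum()

for name, score in zip(iris["feature_names"], forest_mdi):
    print(name, round(score, 3))
print(np.allclose(forest_mdi, rnd_clf.feature_importances_))
```

Since the forest in the notebook cell has no fixed `random_state`, its importances shift slightly from run to run, but the ranking (petal features far ahead of sepal features) is the stable part.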