
A group of predictors is called an **ensemble**, and using an ensemble to get better predictions than any single predictor is called **Ensemble Learning**. An Ensemble Learning algorithm is called an **Ensemble method**. Here we discuss different ensemble methods for regressors and classifiers.

# Voting Classifiers

Suppose you have trained several different classifiers on a dataset (e.g., SVM, Random Forest, K-Nearest Neighbors). A very simple way to build a better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote classifier often achieves higher accuracy than the best individual classifier in the ensemble. Even if each classifier is a weak learner (only slightly better than random guessing), the ensemble can still be a strong learner, provided there are enough weak learners and they are sufficiently diverse.
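To make the weak-learner claim concrete, here is a minimal sketch that computes the probability that a strict majority of voters is correct. It assumes a binary problem and fully independent voters that are each correct with the same probability, which is an idealization that real classifiers trained on the same data do not satisfy. The function name `majority_vote_accuracy` is hypothetical and chosen only for this illustration.

```python
import math


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes a binary task and independent voters, each correct with
    probability p_correct (an idealized, hypothetical setting).
    """
    total = 0.0
    # Sum the binomial probabilities of k correct votes for every k > n / 2.
    for k in range(n_voters // 2 + 1, n_voters + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (
            math.lgamma(n_voters + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_voters - k + 1)
            + k * math.log(p_correct)
            + (n_voters - k) * math.log(1 - p_correct)
        )
        total += math.exp(log_term)
    return total


# Odd ensemble sizes avoid ties in the vote.
for n in (1, 11, 101, 1001):
    print(n, round(majority_vote_accuracy(n, 0.51), 3))
```

Under these assumptions the majority-vote accuracy grows with the number of voters, even though each voter is only 51% accurate. In practice the gain is smaller, because the classifiers' errors are correlated.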

Note: Ensemble methods work best when the predictors are as independent from one another as possible. One way to get diverse classifiers is to train them using different algorithms. This increases the chance that they will make very different types of errors, which improves the ensemble's accuracy.

The following is an example of a hard voting classifier:

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define base classifiers
clf1 = LogisticRegression(max_iter=1000)
clf2 = DecisionTreeClassifier()
clf3 = SVC(probability=False)  # no need for probabilities in hard voting
clf4 = RandomForestClassifier()

# Create hard voting ensemble
voting_clf = VotingClassifier(
    estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3), ('rf', clf4)],
    voting='hard'  # 'hard' = majority voting
)

# Train each base classifier and the ensemble, then compare test accuracy.
# The loop fits voting_clf as well, so no separate fit call is needed.
for clf in (clf1, clf2, clf3, clf4, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("Accuracy: ", accuracy_score(y_test, y_pred))

Accuracy:  1.0
Accuracy:  1.0
Accuracy:  1.0
Accuracy:  1.0
Accuracy:  1.0


If all the classifiers in the ensemble have a `predict_proba()` method, meaning they are able to estimate class probabilities, we can use a soft voting classifier, which predicts the class with the highest class probability averaged over all the individual classifiers. It often achieves higher performance than a hard voting classifier because it gives more weight to highly confident votes. In Scikit-Learn you only need to replace `voting="hard"` with `voting="soft"`.
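The difference between the two voting schemes can be seen on a small hand-made example. The sketch below uses made-up probabilities for three hypothetical classifiers on a single sample; the numbers are illustrative assumptions, not results from any trained model.

```python
# Class probabilities predicted by three hypothetical classifiers for one
# sample (rows: classifiers, columns: class 0 and class 1). Made-up values.
probas = [
    [0.45, 0.55],  # weakly prefers class 1
    [0.45, 0.55],  # weakly prefers class 1
    [0.95, 0.05],  # strongly prefers class 0
]
n_classes = len(probas[0])

# Hard voting: each classifier casts one vote for its most probable class.
hard_votes = [max(range(n_classes), key=lambda c: p[c]) for p in probas]
hard_winner = max(range(n_classes), key=hard_votes.count)

# Soft voting: average the probabilities, then pick the most probable class.
avg_proba = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
soft_winner = max(range(n_classes), key=lambda c: avg_proba[c])

print("Hard voting predicts class", hard_winner)
print("Soft voting predicts class", soft_winner)
```

Here the two weak votes for class 1 win under hard voting, while the single highly confident vote for class 0 dominates the average under soft voting.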

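Note: Scikit-Learn's `SVC` does not estimate class probabilities by default, so you need to set its `probability` hyperparameter to `True`. This makes `SVC` use cross-validation to estimate class probabilities, which slows down training, and adds a `predict_proba()` method.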
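As a sketch, a soft voting version of the ensemble might look like the following. It reuses the Iris split from the hard voting example; the `random_state` values and the variable name `soft_voting_clf` are assumptions added here for reproducibility and clarity, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same dataset and split as the hard voting example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Every base classifier must expose predict_proba() for soft voting
soft_voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # enable probabilities
        ("rf", RandomForestClassifier(random_state=42)),
    ],
    voting="soft",  # average the predicted class probabilities
)
soft_voting_clf.fit(X_train, y_train)

# With soft voting the ensemble itself also provides predict_proba()
proba = soft_voting_clf.predict_proba(X_test)
y_pred = soft_voting_clf.predict(X_test)
print("Accuracy: ", accuracy_score(y_test, y_pred))
```

On a dataset as small and easy as Iris, hard and soft voting are unlikely to differ noticeably; the benefit of soft voting tends to show up on harder problems.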