<a href="https://colab.research.google.com/github/Kaustubh-20/ML_Practice/blob/main/Ensemble_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**IMPORTING REQUIRED LIBRARIES**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

**Loading the iris dataset from the sklearn library**

In [None]:
data = load_iris()
x = data.data
y = data.target

**Splitting the data into training and test sets**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=300)

In [None]:
dt_model = DecisionTreeClassifier(criterion='entropy', splitter='best', random_state=300)
dt_model.fit(X_train, y_train)

**Accuracy Obtained from using Decision Tree as a Classifier**

In [None]:
dt_model.score(X_test,y_test)

0.9736842105263158

In [None]:
rf_model = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=300)
rf_model.fit(X_train, y_train)

**Accuracy Obtained from using Random Forest as a Classifier**

In [None]:
rf_model.score(X_test,y_test)

0.9473684210526315

**Printing Confusion Matrix**

In [None]:
y_pred = rf_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = rf_model.score(X_test, y_test)

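**Here we inspect the random forest because random forests generally reduce the overfitting that a single decision tree is prone to. On this particular split, however, the decision tree scored slightly higher (0.974 vs. 0.947)**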

In [None]:
print("Confusion Matrix:")
print(cm)
print("Model Accuracy:", accuracy)

Confusion Matrix:
[[15  0  0]
 [ 0 11  0]
 [ 0  2 10]]
Model Accuracy: 0.9473684210526315


**The model correctly predicted 15 setosa (class 0), 11 versicolor (class 1), and 10 virginica (class 2) flowers**

**2 virginica flowers (class 2) were wrongly predicted as versicolor (class 1)**
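
To make the row and column reading of the matrix explicit, it can be labelled with the species names. This is a minimal sketch, assuming the same split and model settings as the cells above; `cm_df` is a name introduced here purely for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=300
)

rf_model = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=300)
rf_model.fit(X_train, y_train)

# Rows are the true species, columns are the predicted species.
# labels=[0, 1, 2] keeps the matrix 3x3 even if a class were absent from the test set.
cm = confusion_matrix(y_test, rf_model.predict(X_test), labels=[0, 1, 2])
cm_df = pd.DataFrame(
    cm,
    index=[f"true {name}" for name in data.target_names],
    columns=[f"pred {name}" for name in data.target_names],
)
print(cm_df)
```

The diagonal holds the correct predictions; any off-diagonal entry counts flowers of the row's species that were assigned to the column's species.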

# Random Forest reduces the variance of a model through several key mechanisms:


* Bootstrapping: Each tree is trained on a bootstrap sample, that is, rows drawn at random from the training data with replacement. Every sample therefore contains a slightly different mix of data points, so each decision tree in the ensemble sees a different dataset. This diversity makes it less likely that the ensemble as a whole overfits to patterns specific to one sample.

* Random Feature Selection: Instead of considering all features when making a split, each tree considers only a random subset of features at each split. This further diversifies the trees and prevents them from relying too heavily on a single feature, reducing the potential for overfitting.

* Ensemble Averaging or Voting: After training, Random Forest combines the trees' predictions through averaging (for regression tasks) or voting (for classification tasks). Because the trees make partly different errors, combining them cancels out some of those errors, which smooths the predictions and reduces variance. For regression, the final prediction is the average of the individual tree predictions, which reduces the impact of outliers or noise.

By combining these techniques, Random Forest effectively reduces the variance of the model, making it more stable and better at generalizing to unseen data.
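
The mechanisms above map onto parameters of scikit-learn's `RandomForestClassifier`. The sketch below is one hedged way to look at stability: it compares the spread of 5-fold cross-validation accuracies for a single tree and a forest. The choice of 5 folds is an assumption made for illustration, and on a dataset as small and easy as iris the difference may be negligible.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

single_tree = DecisionTreeClassifier(criterion='entropy', random_state=300)
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees whose predictions are combined
    criterion='entropy',
    bootstrap=True,        # each tree is fit on a bootstrap sample of the rows
    max_features='sqrt',   # each split considers a random subset of the features
    random_state=300,
)

# 5-fold cross-validation yields five accuracy values per model;
# their standard deviation is a rough indication of how stable each model is.
tree_scores = cross_val_score(single_tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("tree:   mean=%.3f std=%.3f" % (tree_scores.mean(), tree_scores.std()))
print("forest: mean=%.3f std=%.3f" % (forest_scores.mean(), forest_scores.std()))
```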




# Random Forest can be used for both classification and regression tasks

The main difference between these two tasks when using Random Forest lies in the type of output or prediction they produce.

1. Classification with Random Forest:

In classification tasks, Random Forest is used to predict a categorical or discrete target variable. For example, it can predict whether an email is spam or not (binary classification) or classify images into different categories (multiclass classification).

The output of a Random Forest classification model is a class label. In the classic formulation, each tree votes for a class and the majority class wins. scikit-learn's `RandomForestClassifier` instead averages the class-probability estimates of the individual trees and predicts the class with the highest mean probability.

2. Regression with Random Forest:

In regression tasks, Random Forest is used to predict a continuous or numerical target variable. For instance, it can predict the price of a house based on its features, such as square footage, number of bedrooms, and location.

The output of a Random Forest regression model is a numerical value, obtained by averaging the predictions of the individual decision trees. The result is a continuous estimate of the target variable's value.
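
The two kinds of output can be checked by hand against the fitted trees, which scikit-learn exposes as `estimators_`. This is a small sketch under assumptions: the synthetic regression data from `make_regression`, the 20 trees, and the variable names are illustrative choices, not part of the notebook above.

```python
import numpy as np
from sklearn.datasets import load_iris, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: average the per-tree class probabilities, then take the argmax.
X_cls, y_cls = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_cls, y_cls)
tree_probas = np.stack([tree.predict_proba(X_cls) for tree in clf.estimators_])
mean_proba = tree_probas.mean(axis=0)                    # shape (n_samples, n_classes)
manual_labels = clf.classes_[mean_proba.argmax(axis=1)]  # class with highest mean probability

# Regression: average the per-tree numeric predictions.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_reg, y_reg)
tree_preds = np.stack([tree.predict(X_reg) for tree in reg.estimators_])
manual_values = tree_preds.mean(axis=0)                  # shape (n_samples,)

# Compare the hand-computed results with the forest's own output.
print("probabilities match:", np.allclose(mean_proba, clf.predict_proba(X_cls)))
print("labels agree on fraction:", (manual_labels == clf.predict(X_cls)).mean())
print("regression values match:", np.allclose(manual_values, reg.predict(X_reg)))
```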

# Bagging
Bagging, which stands for Bootstrap Aggregating, is an ensemble learning technique that decreases the variance of a model by combining predictions from multiple models trained independently of one another. It generates many subsets of the training data through random sampling with replacement and trains a distinct model on each subset. The predictions from these individual models are then combined to produce a final prediction.

The term "Bagging" aptly describes this process and the concept of forming multiple "bags" of data:

1) Bootstrap Sampling: Bagging initiates the process by crafting multiple subsets of the original training data. Each subset is created by randomly selecting data from the original dataset with replacement. Consequently, some data points may be included multiple times within a subset, while others may not be included at all. These subsets are commonly referred to as "bags."

2) Model Training: Once the bags are generated, a separate model is trained on each bag. For instance, when working with decision trees, each model is an individual decision tree trained on a different bag of data.

3) Predictions and Aggregation: Upon completing the training phase, each individual model generates predictions for new, unseen data. The ultimate prediction is obtained by aggregating the predictions from all individual models. In classification tasks, the majority voting method is frequently employed, whereby the predicted class that appears most frequently across all models is selected. In regression tasks, it is customary to compute the average of the predicted values from all models.
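
The three steps can be sketched by hand with decision trees as the base model. This is an illustrative implementation, not how scikit-learn's own bagging classes are written; the number of bags (`n_bags = 25`) and the random seeds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=300)

rng = np.random.default_rng(300)
n_bags = 25  # illustrative choice
models = []

for _ in range(n_bags):
    # Step 1: bootstrap sample of row indices, drawn with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Step 2: train one model on this bag.
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Step 3: collect every model's predictions and take a majority vote per test row.
all_preds = np.stack([m.predict(X_test) for m in models])   # shape (n_bags, n_test)
n_classes = len(np.unique(y))
votes = np.stack([(all_preds == c).sum(axis=0) for c in range(n_classes)])  # (n_classes, n_test)
bagged_pred = votes.argmax(axis=0)

print("bagged accuracy:", (bagged_pred == y_test).mean())
```

For a regression task, step 3 would instead average the predicted values, for example with `all_preds.mean(axis=0)`.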

The term "Bootstrap" in "Bagging" alludes to the random sampling procedure with replacement utilized to create the bags of data. Bootstrap sampling entails producing new datasets that closely resemble the original data but exhibit some variability due to the random sampling process.
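
A single bootstrap sample shows this variability directly: some rows appear more than once and others are left out. The sketch below uses row indices only; the size `n = 150` (the number of rows in iris) and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150                      # e.g. the number of rows in the iris dataset
original = np.arange(n)      # row indices of the original data

# Draw n indices with replacement: this is one "bag".
bag = rng.choice(original, size=n, replace=True)

n_unique = len(np.unique(bag))              # distinct rows that made it into the bag
out_of_bag = np.setdiff1d(original, bag)    # rows that were never drawn

print("rows in bag:", len(bag))
print("distinct rows in bag:", n_unique)
print("rows left out:", len(out_of_bag))
# For large n, the expected fraction of distinct rows approaches 1 - 1/e.
print("limit of distinct fraction:", 1 - np.exp(-1))
```

The bag has the same size as the original data, yet on average only about 63% of the original rows appear in it, which is why each model trained on a bag sees a slightly different dataset.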