
# Bagging Exercise

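In this exercise, you will explore the concept of Bagging (Bootstrap Aggregating) and implement several bagging-based models: a random forest, a bagging meta-estimator, pasting, and a hand-built roughly balanced bagging ensemble. Bagging is an ensemble technique used mainly to reduce the variance of a predictive model and thereby limit overfitting. The main idea is to train multiple learners on different random samples of the training data and combine their predictions (by majority vote for classification), so that the ensemble performs better than any individual model.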
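The variance-reduction claim can be checked numerically. The sketch below is a simplified illustration that assumes the B individual predictions are independent and equally noisy (real bagged models are correlated, so the reduction in practice is smaller): averaging B predictions that each have variance σ² gives a variance of roughly σ²/B.

```python
import numpy as np

# Hypothetical setup: B independent "models" each predict a true value of 0
# with Gaussian noise of standard deviation sigma.
rng = np.random.default_rng(0)
sigma, B, n_trials = 1.0, 25, 20_000

# One model's prediction per trial
single = rng.normal(0.0, sigma, size=n_trials)
# Average of B models' predictions per trial (the "bagged" prediction)
ensemble = rng.normal(0.0, sigma, size=(n_trials, B)).mean(axis=1)

var_single = single.var()
var_ensemble = ensemble.var()
print(f"single model variance: {var_single:.3f}")
print(f"averaged ({B} models):  {var_ensemble:.3f}  (theory: {sigma**2 / B:.3f})")
```

With pairwise correlation ρ between models, the variance of the average becomes $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$, so only the second term shrinks as B grows; this is why random forests also try to decorrelate their trees.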

## Dataset
We will use the Iris dataset for this exercise. The Iris dataset is a classic machine-learning dataset of 150 iris flowers from three species (50 per species), each described by four measurements: sepal length, sepal width, petal length, and petal width. **Feel free to use another dataset!!**
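Before modeling, a quick look at the data confirms its size and class balance. A minimal sketch using the same `load_iris` loader as the notebook:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load Iris and inspect its shape, feature names and class balance
iris = load_iris()
X, y = iris.data, iris.target

print("X shape:", X.shape)  # (n_samples, n_features)
print("features:", iris.feature_names)
print("classes:", list(iris.target_names))
print("samples per class:", np.bincount(y))
```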

## Task
Your task is to:
1. Load the dataset.
2. Preprocess the data (if necessary).
3. Implement Bagging models.
4. Evaluate the models' performance.

Please fill in the following code blocks to complete the exercise.


In [15]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import BaggingClassifier

from sklearn.tree import DecisionTreeClassifier

import numpy as np



# Load the dataset


In [3]:


# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target


# Preprocess the data (if necessary)

In [4]:


# Preprocess the data (if necessary)
# No preprocessing is needed for the Iris dataset: it has no missing values, all four
# features are numeric and measured in centimetres, and the tree-based models used
# below are insensitive to feature scaling.


# Split the Dataset

In [5]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

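The split above is purely random, so the class proportions in the 30-sample test set can drift slightly from those of the full dataset. A common variant (not used in the original cells) passes `stratify=y` so that both sets keep the original class balance. The sketch below compares the test-set class counts of the two approaches:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Random split vs. stratified split with the same test size and seed
_, _, _, y_test_random = train_test_split(X, y, test_size=0.2, random_state=42)
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("test counts (random):    ", np.bincount(y_test_random))
print("test counts (stratified):", np.bincount(y_test_strat))
```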

# Initialize and Train the Classifiers

## Random Forest
Initialize and train a Random Forest classifier.

In [6]:
# Initialize and train a Random Forest classifier.

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)


### Evaluate the model performance

In [7]:

# Evaluate the model performance
y_pred = rf_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 1.0

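A perfect score on a test set of only 30 samples says little about how well the model generalizes, and it cannot separate models that all score 1.0. A sketch of a more robust estimate uses 5-fold stratified cross-validation via `cross_val_score` on the full dataset (the fold count and seed are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Shuffled, stratified 5-fold CV: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=cv)

print("fold accuracies:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```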

## Bagging Meta-estimator
Initialize a K-Nearest Neighbors classifier and use it as the base estimator for the Bagging classifier.

In [9]:
# Initialize a K-Nearest Neighbors classifier
knn_classifier = KNeighborsClassifier()

# Initialize a Bagging classifier with KNN as the base estimator
# (scikit-learn >= 1.2 names this parameter `estimator`; `base_estimator` was removed in 1.4)
bagging_classifier = BaggingClassifier(estimator=knn_classifier, n_estimators=10, random_state=42)

# Fit the Bagging classifier to the training data
bagging_classifier.fit(X_train, y_train)




### Evaluate the model performance

In [10]:
# Evaluate the model performance
y_pred_bagging = bagging_classifier.predict(X_test)
accuracy_bagging = accuracy_score(y_test, y_pred_bagging)
print("Bagging Accuracy:", accuracy_bagging)


Bagging Accuracy: 1.0


## Pasting
Initialize a Decision Tree classifier and use it as the base estimator for a Bagging classifier with Pasting, i.e. drawing each training subset without replacement (`bootstrap=False`).
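The only difference between bagging and pasting is how each training subset is drawn. A small NumPy sketch contrasts the two (the training-set size and the 80% subset size are assumed example values): sampling n rows with replacement repeats some rows and leaves out about 1/e ≈ 36.8% of them, while sampling without replacement yields only distinct rows, so the subset must be smaller than the training set for different subsets to differ.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # hypothetical training-set size

# Bagging: n draws WITH replacement
boot_idx = rng.choice(n, size=n, replace=True)
# Pasting: 80% of the rows drawn WITHOUT replacement
paste_idx = rng.choice(n, size=int(0.8 * n), replace=False)

# Fraction of all rows that appear at least once in the bootstrap sample
boot_unique = len(np.unique(boot_idx)) / n
# Fraction of drawn rows that are distinct in the pasting sample
paste_unique = len(np.unique(paste_idx)) / len(paste_idx)

print(f"bootstrap sample covers {boot_unique:.3f} of the rows (theory 1 - 1/e = {1 - np.exp(-1):.3f})")
print(f"pasting sample: {paste_unique:.3f} of drawn rows are distinct")
```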

In [11]:
# Initialize a Decision Tree classifier
tree_classifier = DecisionTreeClassifier()

# Initialize a Bagging classifier with Decision Tree as the base estimator and Pasting.
# bootstrap=False draws each subset without replacement. Note that with the default
# max_samples=1.0 every tree receives the full training set; pass max_samples < 1.0
# (e.g. 0.8) so that each tree is trained on a different subset.
pasting_classifier = BaggingClassifier(estimator=tree_classifier, n_estimators=10, bootstrap=False, random_state=42)

# Fit the Pasting classifier to the training data
pasting_classifier.fit(X_train, y_train)

# Evaluate the model performance
y_pred_pasting = pasting_classifier.predict(X_test)
accuracy_pasting = accuracy_score(y_test, y_pred_pasting)
print("Pasting Accuracy:", accuracy_pasting)


Pasting Accuracy: 1.0





## Roughly Balanced Bagging (RBB)
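Implement a simplified version of Roughly Balanced Bagging (RBB) by manually drawing class-wise bootstrap samples and aggregating the predictions of multiple Decision Tree classifiers by majority vote. In this version each class is resampled with replacement at its own size, so every bootstrap sample keeps the class proportions of the training set; because the three Iris classes are nearly equal in size, the samples are roughly balanced.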

In [16]:
# Create class-wise bootstrap samples
classes = np.unique(y_train)
n_estimators = 10

predictions = []
for _ in range(n_estimators):
  # Resample each class with replacement at its own size, so the bootstrap
  # sample keeps the class proportions of the training set
  bootstrap_indices = []
  for class_label in classes:
    class_indices = np.where(y_train == class_label)[0]
    sample_size = len(class_indices)
    bootstrap_class_indices = np.random.choice(class_indices, size=sample_size, replace=True)
    bootstrap_indices.extend(bootstrap_class_indices)

  # Train a Decision Tree classifier on the bootstrap sample
  tree_classifier = DecisionTreeClassifier()
  tree_classifier.fit(X_train[bootstrap_indices], y_train[bootstrap_indices])

  # Make predictions on the test set
  y_pred = tree_classifier.predict(X_test)
  predictions.append(y_pred)

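# Aggregate predictions using majority voting: `predictions` has shape
# (n_estimators, n_test_samples), so each column holds the votes for one test sample
predictions = np.array(predictions)
y_pred_rbb = np.apply_along_axis(lambda x: np.argmax(np.bincount(x)), axis=0, arr=predictions)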

# Evaluate the model performance
accuracy_rbb = accuracy_score(y_test, y_pred_rbb)
print("Roughly Balanced Bagging Accuracy:", accuracy_rbb)


Roughly Balanced Bagging Accuracy: 1.0

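In the original Roughly Balanced Bagging method, designed for imbalanced binary problems, each bag takes the full minority-class count n_min (drawn with replacement) and draws the majority-class count from a negative binomial distribution with parameters n_min and p = 0.5, so bags are balanced on average but not exactly. A sketch of that sampling step on a hypothetical imbalanced label vector (the 20/180 split, the `rbb_bag_indices` helper name, and the seed are illustrative assumptions):

```python
import numpy as np

def rbb_bag_indices(y, rng, minority_label, majority_label):
    """Hypothetical helper: draw one Roughly Balanced Bagging sample (binary case)."""
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y == majority_label)
    n_min = len(minority_idx)
    # Majority-class size ~ NegativeBinomial(n_min, 0.5), whose mean is n_min
    n_maj = rng.negative_binomial(n_min, 0.5)
    # Both classes are sampled with replacement
    bag = np.concatenate([
        rng.choice(minority_idx, size=n_min, replace=True),
        rng.choice(majority_idx, size=n_maj, replace=True),
    ])
    return bag, n_maj

rng = np.random.default_rng(0)
y_imbalanced = np.array([1] * 20 + [0] * 180)  # hypothetical 20 minority / 180 majority labels

majority_sizes = []
for _ in range(1000):
    bag, n_maj = rbb_bag_indices(y_imbalanced, rng, minority_label=1, majority_label=0)
    majority_sizes.append(n_maj)

print("minority count per bag: 20")
print("majority count per bag: mean %.1f, min %d, max %d"
      % (np.mean(majority_sizes), min(majority_sizes), max(majority_sizes)))
```

Each such bag would then train one tree, with the trees' predictions combined by majority vote as in the cell above.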

