
# Bagging Exercise

In this exercise, you will explore bagging (bootstrap aggregating) and implement it with several scikit-learn models, starting with a random forest. Bagging is an ensemble technique used mainly to reduce the variance of a predictive model and so limit overfitting. The main idea is to train multiple learners on different random samples of the training data and combine their predictions, so that the ensemble typically generalizes better than any individual model.
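To make the "bootstrap, then aggregate" idea concrete before using the library classes, here is a minimal hand-rolled sketch. The helper name `bagged_predict`, its parameters, and the variables `iris_X` and `iris_y` are illustrative assumptions and are not part of the exercise; scikit-learn's own implementation differs in its details.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def bagged_predict(X_train, y_train, X_new, n_estimators=25, seed=0):
    """Hypothetical helper: fit trees on bootstrap samples, combine by majority vote."""
    rng = np.random.default_rng(seed)
    n_samples = len(X_train)
    n_classes = int(y_train.max()) + 1
    votes = np.zeros((n_estimators, len(X_new)), dtype=int)
    for i in range(n_estimators):
        # Bootstrap: draw n_samples row indices with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=i)
        tree.fit(X_train[idx], y_train[idx])
        votes[i] = tree.predict(X_new)
    # Aggregate: for each new sample, the most frequent predicted label wins
    return np.array([np.bincount(col, minlength=n_classes).argmax() for col in votes.T])


iris_X, iris_y = load_iris(return_X_y=True)
demo_preds = bagged_predict(iris_X, iris_y, iris_X[:5])
print(demo_preds)
```

Each tree sees a slightly different training set, so their individual errors are partly independent; voting averages those errors out, which is where the variance reduction comes from.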

## Dataset
We will use the Iris dataset for this exercise. Iris is a classic machine learning dataset containing measurements of iris flowers from three different species. **Feel free to use another dataset!**

## Task
Your task is to:
1. Load the dataset.
2. Preprocess the data (if necessary).
3. Implement Bagging models.
4. Evaluate the models' performance.

Please fill in the following code blocks to complete the exercise.


In [1]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier



# Load the dataset


In [2]:
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target']= iris.target
df.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [3]:
df['target'].unique()

array([0, 1, 2])

# Preprocess the data (if necessary)

In [4]:
df.isnull().sum()

sepal length (cm),0
sepal width (cm),0
petal length (cm),0
petal width (cm),0
target,0


# Split the Dataset

In [5]:
x = df.drop('target', axis=1)
y = df['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Initialize and Train the Classifiers

## Random Forest
Initialize and train a Random Forest classifier.

In [6]:
random_forest_classifier = RandomForestClassifier(n_estimators=50, random_state=42)
random_forest_classifier.fit(x_train, y_train)
y_pred = random_forest_classifier.predict(x_test)


### Evaluate the model performance

In [7]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Random Forest Model Accuracy: {accuracy * 100:.2f}%')

Random Forest Model Accuracy: 100.00%
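The test split holds only 30 flowers (20% of 150), so a single accuracy figure is a coarse estimate. As a hedged complement, the sketch below uses 5-fold cross-validation on the full dataset. The variable names `iris_X`, `iris_y`, `cv_forest`, and `cv_scores` are assumptions introduced for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris_X, iris_y = load_iris(return_X_y=True)

# Same hyperparameters as the random forest above
cv_forest = RandomForestClassifier(n_estimators=50, random_state=42)

# Each of the 5 folds is held out once while the model trains on the other 4
cv_scores = cross_val_score(cv_forest, iris_X, iris_y, cv=5)
print(f"CV accuracy: {cv_scores.mean() * 100:.2f}% (std {cv_scores.std() * 100:.2f})")
```

The spread across folds gives a rough sense of how much the score depends on which rows happen to land in the test set.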


## Bagging Meta-estimator
Initialize a K-Nearest Neighbors classifier and use it as the base estimator for the Bagging classifier.

In [8]:
base_estimator = KNeighborsClassifier()
bagging_classifier = BaggingClassifier(base_estimator, n_estimators=50, random_state=42)

bagging_classifier.fit(x_train, y_train)
y_pred = bagging_classifier.predict(x_test)



### Evaluate the model performance

In [9]:
print(f'Bagging Model Accuracy: {accuracy_score(y_test, y_pred) * 100:.2f}%')

Bagging Model Accuracy: 100.00%
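Because each bootstrap sample leaves out some training rows, a bagging ensemble can also be scored on those left-out ("out-of-bag") rows without touching the test set. The following is a minimal sketch using the `oob_score` option of `BaggingClassifier`; the names `X_tr`, `X_te`, `y_tr`, `y_te`, and `oob_bagging` are assumptions chosen so they do not overwrite the notebook's variables.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_X, iris_y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(iris_X, iris_y, test_size=0.2, random_state=42)

# oob_score=True scores each training row using only the estimators
# whose bootstrap sample did not contain that row (requires bootstrap=True, the default)
oob_bagging = BaggingClassifier(
    KNeighborsClassifier(), n_estimators=50, oob_score=True, random_state=42
)
oob_bagging.fit(X_tr, y_tr)
print(f"Out-of-bag accuracy: {oob_bagging.oob_score_ * 100:.2f}%")
```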


## Pasting
Initialize a Decision Tree classifier and use it as the base estimator for a Bagging classifier configured for pasting, that is, sampling the training rows without replacement (`bootstrap=False`).

In [10]:
base_estimator = DecisionTreeClassifier()
pasting_classifier = BaggingClassifier(base_estimator, n_estimators=50, max_samples=0.7, bootstrap=False, random_state=42)

pasting_classifier.fit(x_train, y_train)

# Predict on the test set so the evaluation below scores the pasting model
y_pred = pasting_classifier.predict(x_test)


### Evaluate the model performance

In [11]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Pasting Classifier Model Accuracy: {accuracy * 100:.2f}%')

Pasting Classifier Model Accuracy: 100.00%
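The only difference between bagging and pasting is how the rows for each estimator are drawn. The sketch below illustrates this with plain NumPy index sampling. The sizes mirror the 120-row training split and `max_samples=0.7` used above, and the names `demo_rng`, `bagging_idx`, and `pasting_idx` are assumptions for illustration.

```python
import numpy as np

demo_rng = np.random.default_rng(42)
n_train_rows = 120  # 80% of the 150 Iris rows
n_draw = 84         # 70% of 120, mirroring max_samples=0.7

# Bagging: sample with replacement, so some rows repeat and others are left out
bagging_idx = demo_rng.choice(n_train_rows, size=n_draw, replace=True)
# Pasting: sample without replacement, so every drawn row is distinct
pasting_idx = demo_rng.choice(n_train_rows, size=n_draw, replace=False)

print("bagging unique rows:", len(np.unique(bagging_idx)), "of", n_draw)
print("pasting unique rows:", len(np.unique(pasting_idx)), "of", n_draw)
```

With pasting, `max_samples` must be below 1.0 for the estimators to differ in their training rows at all; at 1.0 every estimator would receive the full training set.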


## Roughly Balanced Bagging (RBB)
Implement Roughly Balanced Bagging by manually creating balanced bootstrap samples and aggregating predictions from multiple Decision Tree classifiers.

In [16]:
import numpy as np

n_estimators = 100
rng = np.random.default_rng(42)  # seeded so the resampling is reproducible

ensemble_preds = np.zeros((n_estimators, len(x_test)), dtype=int)
ensemble_models = []

# Positional indices of each class in the training set (computed once, outside the loop)
b_indices = np.where(y_train == 0)[0]
a_indices = np.where(y_train == 1)[0]
d_indices = np.where(y_train == 2)[0]

for i in range(n_estimators):
    # Draw the same number of rows from every class, with replacement
    chosen_b_indices = rng.choice(b_indices, size=len(b_indices), replace=True)
    chosen_a_indices = rng.choice(a_indices, size=len(b_indices), replace=True)
    chosen_d_indices = rng.choice(d_indices, size=len(b_indices), replace=True)

    balanced_sample_indices = np.concatenate([chosen_b_indices, chosen_a_indices, chosen_d_indices])
    rng.shuffle(balanced_sample_indices)

    x_train_balanced = x_train.iloc[balanced_sample_indices]
    y_train_balanced = y_train.iloc[balanced_sample_indices]

    tree_clf = DecisionTreeClassifier(random_state=i)
    tree_clf.fit(x_train_balanced, y_train_balanced)
    ensemble_models.append(tree_clf)

    ensemble_preds[i] = tree_clf.predict(x_test)

# Aggregate by majority vote; averaging and rounding class labels is not valid for three classes
final_preds = np.array([np.bincount(votes, minlength=3).argmax() for votes in ensemble_preds.T])

### Evaluate the model performance

In [18]:
accuracy = accuracy_score(y_test, final_preds)
print(f'Roughly Balanced Bagging Model Accuracy: {accuracy * 100:.2f}%')

Roughly Balanced Bagging Model Accuracy: 100.00%
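When label predictions from several classifiers are combined in a multi-class problem such as Iris, the aggregation rule matters. Averaging the integer labels and rounding treats the classes as ordered numbers, whereas a majority vote only counts labels. The sketch below uses hypothetical votes (not taken from the models above) to show the two rules disagreeing.

```python
import numpy as np

# Hypothetical votes from three classifiers (rows) for two samples (columns)
votes = np.array([
    [0, 2],
    [0, 2],
    [2, 0],
])

# Rounded mean: treats labels as numbers, so a 0/2 disagreement averages toward class 1
rounded_mean = np.round(votes.mean(axis=0)).astype(int)

# Majority vote: counts how often each label appears per sample and keeps the most frequent
majority = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

print("rounded mean :", rounded_mean)   # [1 1]
print("majority vote:", majority)       # [0 2]
```

In this example the rounded mean returns class 1 for both samples even though no classifier voted for it, while the majority vote returns the label most classifiers agreed on.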
