# Bagging Exercise

In this exercise, you will explore the concept of Bagging (Bootstrap Aggregating) and implement it using a random forest model. Bagging is an ensemble technique mainly used for reducing the variance of a predictive model and preventing overfitting. The main idea behind bagging is to combine multiple learners in a way that the ensemble model performs better than an individual model.

## Dataset
We will use the Iris dataset for this exercise. The Iris dataset is a classic dataset from the field of machine learning, containing measurements for iris flowers of three different species. **Feel free to use another dataset!!**

## Task
Your task is to:
1. Load the dataset.
2. Preprocess the data (if necessary).
3. Implement Bagging models.
4. Evaluate the models performance.

Please fill in the following code blocks to complete the exercise.


In [70]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split



# Load the dataset


In [122]:
iris = load_iris()

df = pd.DataFrame(iris.data,columns=iris.feature_names)


In [123]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB


# Preprocess the data (if necessary)

In [124]:
df.isnull().sum()
df.duplicated().sum()
df = df.drop_duplicates()
df.duplicated().sum()

np.int64(0)

# Split the Dataset

In [125]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier,BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

In [126]:
x = df
y = df['petal width (cm)']


# Initialize and Train the Classifiers

In [127]:
x_train, x_test,y_train,y_test = train_test_split(x , y , test_size=0.2,random_state=42)


## Random Forest
Initialize and train a Random Forest classifier.

In [128]:
base_estimator = RandomForestRegressor()
Bagging = BaggingRegressor(base_estimator,n_estimators=50,random_state=42)

Bagging.fit(x_train,y_train)

In [129]:
predict = Bagging.predict(x_test)
predict

array([1.24342, 0.34298, 2.25522, 1.49852, 1.40276, 0.39998, 1.26418,
       2.39128, 1.4974 , 1.22224, 2.42482, 0.13178, 0.20264, 0.1322 ,
       0.33202, 1.5753 , 2.4263 , 1.0917 , 1.30608, 2.02826, 0.2017 ,
       1.80698, 0.4131 , 1.79828, 2.09684, 2.2926 , 2.01874, 1.88456,
       0.3102 , 0.20168])

### Evaluate the model performance

In [137]:
from sklearn.metrics import mean_squared_error
#mse = mean_squared_error(predict,y_train)
#print("Mean Squared Error:", mse)

## Bagging Meta-estimator
Initialize a K-Nearest Neighbors classifier and use it as the base estimator for the Bagging classifier.

### Evaluate the model performance

In [96]:
from sklearn.neighbors import KNeighborsRegressor
base_estimator = KNeighborsRegressor()

Bagging = BaggingRegressor(base_estimator, n_estimators=50, random_state=42)
Bagging.fit(x_train,y_train)

In [99]:
predict = Bagging.predict(x_test)

## Pasting
Initialize a Decision Tree classifier and use it as the base estimator for a Bagging classifier with Pasting (without replacement).

### Evaluate the model performance

## Roughly Balanced Bagging (RBB)
Implement Roughly Balanced Bagging by manually creating balanced bootstrap samples and aggregating predictions from multiple Decision Tree classifiers.

### Evaluate the model performance