<font color="red" size="6"><b>Ensemble Methods</b></font>
<p><font color="Yellow" size="5"><b>2_EasyEnsembleClassifier</b></font></p>

The EasyEnsembleClassifier is an ensemble method based on under-sampling the majority class. It keeps all minority-class samples and builds several balanced versions of the training set by pairing them with a different random under-sample of the majority class each time. A separate classifier (in imbalanced-learn, an AdaBoost learner by default) is trained on each balanced subset, and their predictions are combined to make the final decision.

<font color="pink" size=4>Steps in EasyEnsembleClassifier:</font>
<ol>
    <li><font color="orange">Under-sampling of Majority Class:</font> In each iteration, a random under-sample of the majority class is selected.</li>
    <li><font color="orange">Ensemble Creation:</font> Multiple subsets are created by combining each under-sampled majority class set with the original minority class.</li>
    <li><font color="orange">Model Training:</font> For each subset, a classifier is trained on the balanced dataset.</li>
    <li><font color="orange">Voting:</font> The final prediction combines all the individual classifiers in the ensemble, either by majority vote or, when the base classifiers output probabilities, by averaging those probabilities and picking the most likely class.</li></ol>
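The four steps above can be sketched by hand. This is a minimal illustration, not the library's internal implementation: the helper names `fit_balanced_ensemble` and `predict_balanced_ensemble` are hypothetical, the demo dataset is made up for the example, and combining by averaged probabilities is an assumption (a hard majority vote would be an alternative).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier


def fit_balanced_ensemble(X, y, n_subsets=10, seed=0):
    """Fit one AdaBoost learner per balanced subset (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    minority_label = np.argmin(np.bincount(y))
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    learners = []
    for i in range(n_subsets):
        # Steps 1-2: keep every minority sample and draw an equally sized
        # random sample of the majority class (without replacement).
        drawn = rng.choice(majority_idx, size=len(minority_idx), replace=False)
        subset = np.concatenate([minority_idx, drawn])
        # Step 3: train one learner on this balanced subset.
        learner = AdaBoostClassifier(n_estimators=10, random_state=i)
        learners.append(learner.fit(X[subset], y[subset]))
    return learners


def predict_balanced_ensemble(learners, X):
    """Step 4: average class probabilities, then pick the most likely class."""
    mean_proba = np.mean([m.predict_proba(X) for m in learners], axis=0)
    return learners[0].classes_[np.argmax(mean_proba, axis=1)]


# Small imbalanced demo dataset (about 90% class 0, 10% class 1).
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     weights=[0.9, 0.1], random_state=0)
demo_learners = fit_balanced_ensemble(X_demo, y_demo, n_subsets=5)
demo_pred = predict_balanced_ensemble(demo_learners, X_demo)
print("Learners trained:", len(demo_learners))
print("Predicted label counts:", np.bincount(demo_pred))
```

Every learner sees all minority samples but a different slice of the majority class, so across the ensemble most of the majority data is still used even though each subset is small.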

In [13]:
from collections import Counter

from imblearn.ensemble import EasyEnsembleClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, 
                            n_redundant=10, n_classes=2, weights=[0.9, 0.1], 
                            random_state=42)

# Step 2: Check the class distribution before applying EasyEnsembleClassifier
print("Class distribution before EasyEnsembleClassifier:", Counter(y))

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train the EasyEnsembleClassifier; sampling_strategy=0.9 under-samples the
# majority class in each subset until the minority/majority ratio is 0.9
clf = EasyEnsembleClassifier(n_estimators=10, random_state=42, sampling_strategy=0.9)
clf.fit(X_train, y_train)

# Step 5: Make predictions
y_pred = clf.predict(X_test)

# Step 6: Evaluate the classifier
print("Classification Report:\n", classification_report(y_test, y_pred))

# Step 7: Count the predicted labels on the test set (predictions, not resampled data)
print("Class distribution after EasyEnsembleClassifier:", Counter(y_pred))


Class distribution before EasyEnsembleClassifier: Counter({0: 898, 1: 102})
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.91      0.95       275
           1       0.46      0.84      0.59        25

    accuracy                           0.90       300
   macro avg       0.72      0.87      0.77       300
weighted avg       0.94      0.90      0.92       300

Class distribution after EasyEnsembleClassifier: Counter({0: 254, 1: 46})


<b><font color="sky blue">The counts printed after prediction are not a resampled dataset: they are the labels the classifier predicted for the 300 untouched test samples. EasyEnsembleClassifier resamples only during training, inside each base learner, and never changes the test set. Because each learner sees a nearly balanced subset, the ensemble predicts the minority class more often than it actually occurs (46 predictions against 25 true minority samples), which raises minority recall to 0.84 at the cost of precision (0.46).
Why isn't the predicted distribution exactly even?</font></b>
<ol>
    <li>Resampling balances the training subset of each base learner, not the predictions. A good classifier should predict roughly the true test proportions (275 to 25 here), so an even split of predictions would indicate many false positives rather than success.</li>
    <li>Even the training subsets are not exactly 50-50 in this example: <code>sampling_strategy=0.9</code> asks for a minority-to-majority ratio of 0.9 after under-sampling, and only <code>sampling_strategy=1.0</code> (or the default <code>'auto'</code>) gives equal class sizes. The number of estimators (<code>n_estimators</code>) controls how many such subsets are drawn, not how balanced each one is.</li></ol>
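The figures in the classification report are enough to reconstruct the confusion matrix and see where the 46 minority predictions come from. The snippet below is plain arithmetic on numbers copied from the output above; the variable names are chosen for illustration.

```python
# Figures copied from the classification report and the prediction counts.
support = {0: 275, 1: 25}      # true class sizes in the test set
predicted = {0: 254, 1: 46}    # Counter(y_pred)
recall_minority = 0.84         # reported recall for class 1

true_pos = round(recall_minority * support[1])  # minority samples found
false_neg = support[1] - true_pos               # minority samples missed
false_pos = predicted[1] - true_pos             # majority samples flagged as minority
true_neg = support[0] - false_pos               # majority samples correctly kept

precision_minority = true_pos / predicted[1]

print("TP, FN, FP, TN:", true_pos, false_neg, false_pos, true_neg)
print("Minority precision:", round(precision_minority, 2))
```

More than half of the minority predictions are false positives, which is why precision for class 1 is low even though recall is high.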