<font color="red" size="6"><b>Ensemble Methods</b></font>
<p><font color="Yellow" size="5"><b>1_BalancedRandomForestClassifier</b></font></p>

The BalancedRandomForestClassifier is a classifier from the imbalanced-learn library designed for imbalanced datasets. It extends the Random Forest classifier by balancing the class distribution of the sample that each decision tree is trained on: before a tree is built, the majority class is randomly under-sampled.

<font color="pink" size=4>How It Works:</font>
<ol>
    <li><font color="orange">Random Forest Algorithm:</font> Like a standard Random Forest, the BalancedRandomForestClassifier builds multiple decision trees using a bagging approach. Each tree is trained on a random subset of the data.</li>
    <li><font color="orange">Class Balancing:</font> During the training of each individual tree, the majority class is under-sampled to match the size of the minority class, ensuring that each tree is trained on a balanced dataset. This prevents the model from being biased towards the majority class.</li>
    <li><font color="orange">Ensemble Learning:</font> Once all the trees are trained, their predictions are aggregated (by majority vote or by averaging the predicted class probabilities) to make the final prediction. Because every tree has seen a balanced sample, the ensemble is less biased towards the majority class, and averaging over many trees keeps its variance low.</li></ol>
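
The class-balancing and aggregation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names `balanced_bootstrap_indices` and `majority_vote` are hypothetical, the sample is drawn without replacement, and the actual sampling behaviour of imbalanced-learn depends on its parameters and version.

```python
import random
from collections import Counter

def balanced_bootstrap_indices(y, rng):
    """Return indices of one balanced sample: every class is drawn
    (without replacement) down to the size of the rarest class."""
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    sample = []
    for idxs in by_class.values():
        sample.extend(rng.sample(idxs, n_min))
    rng.shuffle(sample)
    return sample

def majority_vote(votes):
    """Aggregate the per-tree predictions for one sample by majority vote."""
    return Counter(votes).most_common(1)[0][0]

# Toy labels: 90 majority (0) and 10 minority (1) samples.
y_toy = [0] * 90 + [1] * 10
rng = random.Random(42)

# Each "tree" would be trained on its own balanced sample of the indices.
per_tree_counts = []
for _ in range(3):
    idx = balanced_bootstrap_indices(y_toy, rng)
    per_tree_counts.append(Counter(y_toy[i] for i in idx))

print(per_tree_counts)           # every tree sees 10 samples of each class
print(majority_vote([1, 0, 1]))  # three hypothetical tree votes -> 1
```

Each tree gets a different balanced sample, so the majority samples left out by one tree can still be picked up by another.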

In [2]:
import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.datasets import make_classification
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, 
                            n_redundant=10, n_classes=2, weights=[0.9, 0.1], 
                            random_state=42)

# Step 2: Check the class distribution before applying BalancedRandomForestClassifier
print("Class distribution before BalancedRandomForestClassifier:", Counter(y))

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Train the BalancedRandomForestClassifier
clf = BalancedRandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

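# fit() does not modify y, so this prints the same counts as before;
# the balancing happens internally, separately for each tree.
print("Class distribution AFTER BalancedRandomForestClassifier:", Counter(y))

# Step 5: Make predictions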
y_pred = clf.predict(X_test)

# Step 6: Evaluate the classifier
print("Classification Report:\n", classification_report(y_test, y_pred))


Class distribution before BalancedRandomForestClassifier: Counter({0: 898, 1: 102})


Class distribution AFTER BalancedRandomForestClassifier: Counter({0: 898, 1: 102})
Classification Report:
               precision    recall  f1-score   support

           0       0.99      0.91      0.95       275
           1       0.47      0.88      0.61        25

    accuracy                           0.91       300
   macro avg       0.73      0.89      0.78       300
weighted avg       0.94      0.91      0.92       300
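
As a quick sanity check on the report above, the F1 scores can be recomputed from the printed precision and recall values. This is a minimal sketch: the helper name `f1_from` is made up for illustration, and because the inputs are the rounded figures from the report, the results only agree to two decimals.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded precision/recall values copied from the classification report.
f1_majority = f1_from(0.99, 0.91)   # class 0
f1_minority = f1_from(0.47, 0.88)   # class 1

# The macro average is the unweighted mean of the per-class F1 scores.
macro_f1 = (f1_majority + f1_minority) / 2

print(round(f1_majority, 2), round(f1_minority, 2), round(macro_f1, 2))
# prints: 0.95 0.61 0.78
```

The gap between minority recall (0.88) and minority precision (0.47) shows the usual trade-off of under-sampling: most minority samples are found, but a fair number of majority samples are flagged as minority as well.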



<font color="sky blue" size=4>Note:</font><b>
Fitting the BalancedRandomForestClassifier does not resample the dataset itself, which is why the class counts printed before and after training are identical. The balancing happens internally: for each decision tree, the majority class is under-sampled to the size of the minority class before that tree is fit, which is how the classifier addresses the class imbalance.</b>

<font color="pink" size=4>Advantages of BalancedRandomForestClassifier:</font>
<ol>
    <li><font color="orange">Reduces Bias Towards the Majority Class:</font> Because the majority class is under-sampled for each tree, it cannot dominate the splits of the individual decision trees.</li>
    <li><font color="orange">Better Performance on Imbalanced Datasets:</font> It improves classification of the minority class in particular. In the example above, minority-class recall reaches 0.88, although at the cost of lower precision (0.47).</li>
    <li><font color="orange">Ensemble Method:</font> Aggregating many trees reduces variance, so the forest is generally more robust than a single decision tree.</li></ol>

<font color="pink" size=4>Limitations:</font>
<ol>
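    <li><font color="orange">Loss of Majority Class Information:</font> Each tree sees only a small part of the majority class, so an individual tree may miss important majority-class structure, especially when the majority class is much larger than the minority class. Drawing a different sample for every tree mitigates this across the ensemble but does not remove it.</li>
    <li><font color="orange">Computational Complexity:</font> Like any random forest, the model trains many trees, which is more expensive than fitting a single classifier. The resampling step adds some overhead per tree, although each tree is then fit on a smaller, under-sampled set.</li></ol>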
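
To get a rough feel for the first limitation, the share of majority samples that a tree sees can be estimated with simple arithmetic. The sketch below rests on simplifying assumptions that are not taken from the library: each tree independently draws as many majority samples as there are minority samples, without replacement, and the forest has 100 trees. The helper name `fraction_never_sampled` is hypothetical.

```python
def fraction_never_sampled(n_majority, n_minority, n_trees):
    """Expected share of majority samples that no tree ever sees, assuming
    each tree independently draws n_minority majority samples without
    replacement (a simplification, not the library's exact procedure)."""
    per_tree = n_minority / n_majority   # chance one sample is drawn for one tree
    return (1 - per_tree) ** n_trees

# Training-split counts implied by the outputs above:
# 898 - 275 = 623 majority and 102 - 25 = 77 minority samples.
n_majority, n_minority = 623, 77

per_tree_share = n_minority / n_majority
never_seen = fraction_never_sampled(n_majority, n_minority, n_trees=100)  # 100 trees assumed

print(f"share of majority samples seen by one tree: {per_tree_share:.1%}")
print(f"share never seen by any of 100 trees: {never_seen:.2e}")
```

Under these assumptions a single tree sees only about one in eight majority samples, yet almost every majority sample is used by at least one tree. The information loss is therefore mainly a per-tree effect.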