<font color="red" size="6"><b>Ensemble Methods</b></font>
<p><font color="Yellow" size="5"><b>3_RUSBoostClassifier</b></font></p>

The RUSBoostClassifier is an ensemble method available in the imbalanced-learn library. It combines Random Under-Sampling (RUS) with boosting to handle imbalanced datasets. It follows the principle of adaptive boosting (AdaBoost): weak learners are trained sequentially, each one concentrating on the samples its predecessors misclassified, and the training data is rebalanced at every iteration by randomly under-sampling the majority class.

<font color="pink" size=4>Key Characteristics of RUSBoostClassifier:</font>
<ol>
    <li><font color="orange">Random Under-Sampling:</font> At each boosting iteration, the majority class is randomly under-sampled; with the default settings it is reduced to the size of the minority class.</li>
    <li><font color="orange">Boosting:</font> Boosting increases the weights of misclassified samples so that subsequent weak learners focus on hard-to-classify instances.</li>
    <li><font color="orange">Base Estimator:</font> You can specify the weak learner (default is a decision tree).</li>
    <li><font color="orange">Handles Class Imbalance:</font> Works well for highly imbalanced datasets by combining under-sampling with boosting.</li></ol>
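To make the interplay of these characteristics concrete, here is a minimal sketch of a RUSBoost-style training loop for binary labels 0/1. It is a simplified illustration built on the discrete AdaBoost weight update, not the code imbalanced-learn actually runs; the function names rusboost_sketch and predict_sketch are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def rusboost_sketch(X, y, n_rounds=10, seed=0):
    """Simplified RUSBoost-style loop for binary labels 0/1 (illustration only)."""
    rng = np.random.default_rng(seed)
    n_samples = len(y)
    weights = np.full(n_samples, 1.0 / n_samples)  # start with uniform sample weights

    classes, counts = np.unique(y, return_counts=True)
    minority_label = classes[np.argmin(counts)]
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    learners, alphas = [], []
    for _ in range(n_rounds):
        # 1) Random under-sampling: draw as many majority samples as there are minority samples
        sampled_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
        subset = np.concatenate([minority_idx, sampled_majority])

        # 2) Fit a weak learner (decision stump) on the balanced subset, using the current weights
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X[subset], y[subset], sample_weight=weights[subset])

        # 3) Weighted error measured on the full training set
        missed = stump.predict(X) != y
        error = np.clip(weights[missed].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - error) / error)

        # 4) Boosting step: raise the weights of misclassified samples, lower the rest
        weights = weights * np.exp(np.where(missed, alpha, -alpha))
        weights = weights / weights.sum()

        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)


def predict_sketch(learners, alphas, X):
    """Weighted vote of the weak learners; labels 0/1 are mapped to -1/+1."""
    votes = sum(a * (2 * m.predict(X) - 1) for a, m in zip(alphas, learners))
    return (votes > 0).astype(int)
```

Each weak learner only ever trains on a small balanced subset, yet the weight update is computed on all training samples, so hard examples from the discarded part of the majority class still influence later rounds.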

<font color="pink" size=4>Parameters of RUSBoostClassifier</font>
<ol>
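    <li><font color="orange">base_estimator:</font> The weak learner to boost (default: DecisionTreeClassifier(max_depth=1)). Recent imbalanced-learn releases call this parameter estimator; base_estimator is the older, deprecated name.</li>
    <li><font color="orange">n_estimators:</font> Maximum number of boosting iterations (default: 50); training can stop earlier if the fit becomes perfect.</li>
    <li><font color="orange">sampling_strategy:</font> How to under-sample. A float is the target ratio of minority to majority samples after under-sampling (binary problems only). The default, 'auto', reduces every class except the minority class to the minority class size.</li>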
    <li><font color="orange">random_state:</font> Seed for reproducibility.</li></ol>

In [4]:
from imblearn.ensemble import RUSBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from collections import Counter

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                            n_redundant=10, weights=[0.9, 0.1], random_state=42)

# Step 2: Check class distribution
print("Class distribution before resampling:", Counter(y))

# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 4: Initialize the RUSBoostClassifier
rusboost = RUSBoostClassifier(n_estimators=50, random_state=42)

# Step 5: Train the RUSBoostClassifier
rusboost.fit(X_train, y_train)

# Step 6: Make predictions
y_pred = rusboost.predict(X_test)

# Step 7: Evaluate the classifier
print("Classification Report:\n", classification_report(y_test, y_pred))

# Step 8: Check the distribution of predicted classes on the test set
# (the under-sampling itself happens internally during fit and is not visible here)
print("Predicted class distribution on the test set:", Counter(y_pred))


Class distribution before resampling: Counter({0: 898, 1: 102})
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.95      0.96       275
           1       0.59      0.80      0.68        25

    accuracy                           0.94       300
   macro avg       0.78      0.87      0.82       300
weighted avg       0.95      0.94      0.94       300

Predicted class distribution on the test set: Counter({0: 266, 1: 34})
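The minority-class row of the report can be cross-checked against the final Counter line with a little arithmetic. The sketch below uses only numbers printed above (support 25, recall 0.80, 34 samples predicted as class 1); the variable names are illustrative.

```python
# Values read from the output above
support_minority = 25      # true class-1 samples in the test set
recall_minority = 0.80     # reported recall for class 1
predicted_minority = 34    # samples the model labelled as class 1

# Recall = TP / support, so TP = recall * support
true_positives = round(recall_minority * support_minority)
false_negatives = support_minority - true_positives
false_positives = predicted_minority - true_positives

# Precision = TP / (TP + FP); F1 is the harmonic mean of precision and recall
precision_minority = true_positives / predicted_minority
f1_minority = 2 * true_positives / (2 * true_positives + false_positives + false_negatives)

print(f"TP={true_positives}, FP={false_positives}, FN={false_negatives}")
print(f"precision={precision_minority:.2f}, f1={f1_minority:.2f}")
```

The model recovers 20 of the 25 minority samples at the cost of 14 false alarms, which is why recall (0.80) is well above precision (0.59) for class 1.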


<b><font color="sky blue">When to Use RUSBoostClassifier?</font></b>
<ol>
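   <li>When you have a highly imbalanced dataset and a single round of under-sampling would discard too much majority-class data. RUSBoost draws a fresh random subset of the majority class at every iteration, so the ensemble as a whole sees far more of that data than any one balanced subset contains.</li>
    <li>When you want to combine the power of boosting (focus on hard examples) with balancing techniques.</li> </ol>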
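The first point can be illustrated with a small simulation of the sampling step alone. The class sizes (900 majority, 100 minority) and the 50 rounds are assumed values chosen for illustration; no classifier is trained here, so this says nothing about accuracy, only about how much of the majority class gets drawn at least once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_majority, n_minority, n_rounds = 900, 100, 50  # assumed sizes for illustration

seen = np.zeros(n_majority, dtype=bool)
for _ in range(n_rounds):
    # One boosting round: draw a balanced subset of the majority class without replacement
    drawn = rng.choice(n_majority, size=n_minority, replace=False)
    seen[drawn] = True

single_round_coverage = n_minority / n_majority  # share of majority data used by one under-sampling
ensemble_coverage = seen.mean()                  # share drawn at least once across all rounds

print(f"One round uses {single_round_coverage:.1%} of the majority class")
print(f"{n_rounds} rounds together touch {ensemble_coverage:.1%} of it")
```

A single under-sampling pass here keeps about one ninth of the majority class, while repeated independent draws cover nearly all of it, which is the sense in which re-sampling inside the boosting loop limits information loss.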