<font color="red" size="6">Undersampling Methods</font>
<p><font color="yellow" size="5"><b>1_RandomUnderSampler </b></font></p>

<font color="pink" size=4><b>RandomUnderSampler</b></font>

RandomUnderSampler is an undersampling technique used to handle class imbalance in datasets. Unlike oversampling techniques such as SMOTE, which generates synthetic minority class samples, or RandomOverSampler, which duplicates existing ones, RandomUnderSampler addresses the class imbalance by randomly removing samples from the majority class.

The idea behind RandomUnderSampler is to reduce the number of majority class samples so that the class distribution becomes more balanced. While this approach is simple and effective, it may result in the loss of potentially valuable data, especially if the majority class contains important samples that could improve the model's performance.

<font color="pink" size=4>How RandomUnderSampler Works:</font>
<ol>
    <li><font color="orange">Identify Majority Class:</font> The first step is to identify the class that is overrepresented (the majority class).</li>
    <li><font color="orange">Random Sampling:</font> A random subset of majority class samples is selected, and these samples are removed from the dataset.</li>
    <li><font color="orange">Resampling:</font> By default, enough samples are removed that the majority class ends up the same size as the minority class, so the resulting dataset is balanced.</li>
</ol>
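The three steps above can be sketched in plain Python to make the mechanics visible. This is an illustrative sketch only, not the imblearn implementation: `random_undersample` is a hypothetical helper, and the toy `samples` and `labels` lists are made up for the demonstration. It assumes the goal is full balance, so every class is cut down to the size of the smallest one.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: shrink every class to the size of the smallest class."""
    rng = random.Random(seed)

    # Step 1: count the classes; the smallest count is the target size.
    counts = Counter(labels)
    n_min = min(counts.values())

    # Step 2: for each class, randomly pick n_min indices without replacement.
    # The minority class keeps all of its samples; larger classes lose the rest.
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(labels) if lab == label]
        keep.extend(rng.sample(idx, n_min))

    # Step 3: build the resampled dataset from the kept indices.
    keep.sort()
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
samples = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

X_small, y_small = random_undersample(samples, labels)
print("Before:", Counter(labels))
print("After: ", Counter(y_small))
```

Both minority samples survive, while six of the eight majority samples are discarded at random. Changing `seed` changes which majority samples are kept.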

In [None]:
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from collections import Counter
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2, 
                            n_redundant=10, n_classes=2, weights=[0.9, 0.1], 
                            random_state=42)

# Step 2: Split first, so the test set keeps the original (imbalanced) class distribution
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=42)
print("Training class distribution before RandomUnderSampler:", Counter(y_train))

# Step 3: Apply RandomUnderSampler to the training set only
random_under_sampler = RandomUnderSampler(random_state=42)
X_train_resampled, y_train_resampled = random_under_sampler.fit_resample(X_train, y_train)

# Step 4: Check the training class distribution after applying RandomUnderSampler
print("Training class distribution after RandomUnderSampler:", Counter(y_train_resampled))

# Step 5: Train a classifier (RandomForest) on the resampled training data
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train_resampled, y_train_resampled)

# Step 6: Evaluate the classifier on the untouched, still imbalanced test set
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))


<font color="pink" size=4>Advantages of RandomUnderSampler:</font>
<ol>
    <li><font color="orange">Simple and Easy to Implement:</font> It is easy to apply and does not require complex computations.</li>
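    <li><font color="orange">Reduces Overfitting to the Majority Class:</font> With fewer majority class samples, the classifier is less likely to fit the majority class at the expense of the minority class, although this benefit depends on the dataset and the model.</li>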
    <li><font color="orange">Improved Computational Efficiency:</font> Reducing the number of majority class samples can speed up the training process.</li></ol>

<font color="pink" size=4>Drawbacks of RandomUnderSampler:</font>
<ol>
    <li><font color="orange">Loss of Valuable Data:</font> By removing samples from the majority class, there is a risk of losing valuable information, especially if the majority class is large and diverse.</li>
    <li><font color="orange">Risk of Underfitting:</font> If too many majority class samples are removed, the model may underfit, as it will not have enough information to learn from.</li>
    <li><font color="orange">Bias Toward Minority Class:</font> Because the model is trained on an artificially balanced dataset, it may over-predict the minority class and generalize poorly to data that follows the original class distribution.</li></ol>
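To put a number on the data-loss drawback, the short sketch below computes how much of a dataset is thrown away when every class is reduced to the size of the smallest one. `undersampling_loss` is a hypothetical helper, and the 900/100 class counts are assumed for illustration rather than taken from a real run.

```python
def undersampling_loss(class_counts):
    """Hypothetical helper: samples kept and fraction discarded under full balancing."""
    n_min = min(class_counts.values())   # size of the smallest class
    total = sum(class_counts.values())   # size of the original dataset
    kept = n_min * len(class_counts)     # every class is cut down to n_min
    return kept, 1 - kept / total


# Assumed counts: 900 majority samples, 100 minority samples
kept, discarded_fraction = undersampling_loss({0: 900, 1: 100})
print(f"Samples kept: {kept}")
print(f"Fraction of the dataset discarded: {discarded_fraction:.0%}")
```

With a 9:1 imbalance, full balancing keeps 200 of 1000 samples and discards 80% of the data, which is why the drawbacks above grow with the severity of the imbalance.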