### 📊 **What is Imbalanced Data?**

**Imbalanced data** refers to a dataset where the **distribution of classes is not approximately equal**. One class has **much more data** than the other(s).

### Example:

Consider a binary classification dataset:

| Class | Count |
|-------|-------|
| 0     | 950   |
| 1     | 50    |

Here, class `0` dominates the data and class `1` makes up only 5% of the samples, so the dataset is **imbalanced**.

---

### Why is it a problem?

1. **Misleading Accuracy**:
   - A model that **predicts all 0s** would still get **95% accuracy**, but it is **useless** for detecting class `1`.
   
2. **Poor Recall or Precision**:
   - The model might **ignore the minority class**, leading to low **recall** and a low **F1 score** for that class.

3. **Bias toward majority class**:
   - Most training examples come from the more frequent class, so the model learns to favor predicting it.
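
The accuracy problem can be checked by hand. The sketch below uses only the class counts from the example table (950 and 50) and a hypothetical "model" that always predicts class `0`; it is an illustration, not a real classifier.

```python
# Labels from the example table: 950 samples of class 0, 50 of class 1.
y_true = [0] * 950 + [1] * 50

# A trivial "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall for class 1: true positives / all actual positives.
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_pos = sum(t == 1 for t in y_true)
recall = true_pos / actual_pos

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

Accuracy looks high while recall for class `1` is zero, which is why metrics such as recall and F1 are more informative on imbalanced data.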

# Ways to Handle Imbalanced Data

## Undersampling the majority class
This involves removing samples from the majority class until it matches the size of the minority class. The main drawback is that a lot of potentially useful data is thrown away, so it is usually a poor choice unless the dataset is large.

In [None]:
from imblearn.under_sampling import RandomUnderSampler

# Initialize undersampler
undersample = RandomUnderSampler(random_state=42)

# Fit and resample
X_resampled, y_resampled = undersample.fit_resample(X_train, y_train)

# Now use X_resampled and y_resampled to train your model    
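
To make the mechanism concrete, here is a minimal NumPy sketch of what random undersampling does, assuming toy data shaped like the example table (950 vs. 50). The arrays `X` and `y` are made up for illustration, and this is not how `RandomUnderSampler` is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data matching the example table: 950 zeros, 50 ones.
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)

majority_idx = np.flatnonzero(y == 0)
minority_idx = np.flatnonzero(y == 1)

# Keep a random subset of the majority class, the same size as the minority class.
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
keep = np.concatenate([kept_majority, minority_idx])

X_under, y_under = X[keep], y[keep]
print(np.bincount(y_under))  # [50 50]
```

Of the 950 majority rows, 900 are discarded, which shows how much data this method gives up.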

## Oversampling the minority class by duplication

Examples from the minority class are randomly duplicated until its count equals that of the majority class.
This lets the model see the minority class more often during training.

In [None]:
from imblearn.over_sampling import RandomOverSampler
from collections import Counter

# Before oversampling
print("Before:", Counter(y_train))

# Initialize oversampler
oversample = RandomOverSampler(random_state=42)

# Resample
X_resampled, y_resampled = oversample.fit_resample(X_train, y_train)

# After oversampling
print("After :", Counter(y_resampled))
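
As a rough illustration of what duplication means, the sketch below balances the classes by hand with NumPy. The toy arrays `X` and `y` are assumptions that mirror the 950/50 example, and the code is only a sketch of the idea, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data matching the example table: 950 zeros, 50 ones.
X = rng.normal(size=(1000, 2))
y = np.array([0] * 950 + [1] * 50)

majority_idx = np.flatnonzero(y == 0)
minority_idx = np.flatnonzero(y == 1)

# Draw extra minority rows with replacement until both classes are the same size.
n_extra = len(majority_idx) - len(minority_idx)
extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)

X_over = np.concatenate([X, X[extra_idx]])
y_over = np.concatenate([y, y[extra_idx]])
print(np.bincount(y_over))  # [950 950]
```

Every added row is an exact copy of an existing minority row, so the class counts are balanced but no new information is introduced.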


# SMOTE

SMOTE stands for Synthetic Minority Over-sampling Technique.
Instead of duplicating existing data, SMOTE creates new synthetic examples of the minority class by interpolating between existing ones.

For each minority class sample, SMOTE:

1. Finds its k nearest neighbors within the minority class (usually k=5).
2. Randomly picks one of those neighbors.
3. Generates a new sample at a random point on the line segment between the original sample and the chosen neighbor.

This gives more diverse samples than simply copying existing ones.
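
The interpolation step can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the function name `smote_sketch`, its parameters, and the toy `X_min` array are hypothetical, distances are plain Euclidean, and the base sample for each synthetic point is picked at random. It is not the imbalanced-learn implementation.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority samples X_min (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)

    # Pairwise Euclidean distances between minority samples.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # which sample to start from
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbors to use
    chosen = neighbors[base, pick]
    gap = rng.random((n_new, 1))            # position along the segment, in [0, 1)

    # New point = original + gap * (neighbor - original).
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(42)
X_min = rng.normal(size=(50, 2))  # hypothetical minority-class features
X_new = smote_sketch(X_min, n_new=900)
print(X_new.shape)  # (900, 2)
```

Because each synthetic point lies between two existing minority samples, the new rows stay inside the region the minority class already occupies instead of repeating the same points.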



In [None]:
from imblearn.over_sampling import SMOTE
from collections import Counter

# Check class distribution before
print("Before:", Counter(y_train))

# Apply SMOTE
smote = SMOTE(random_state=42)
X_smote, y_smote = smote.fit_resample(X_train, y_train)

# Check class distribution after
print("After :", Counter(y_smote))


# Ensemble Method

In this method the majority data is 