# 📌 SMOTE (Synthetic Minority Over-sampling Technique) Analysis  

When dealing with **imbalanced datasets** (e.g., fraud detection, churn prediction, rare disease classification), machine learning models often become biased towards the majority class.  

👉 Example: If 95% of samples are "Not Fraud" and 5% are "Fraud," a model can predict "Not Fraud" every time and still reach 95% accuracy while catching no fraud at all.  
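To make that example concrete, here is a minimal sketch in plain Python. The labels are made up to match the 95/5 split (they are not real data), and the "model" is simply a constant prediction of the majority class.

```python
# Hypothetical labels: 95 "Not Fraud" (0) and 5 "Fraud" (1), matching the 95/5 split.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the fraud class: fraction of actual fraud cases that were caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fraud_recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy={accuracy:.2f}, fraud recall={fraud_recall:.2f}")
# accuracy=0.95, fraud recall=0.00
```

Accuracy looks strong, yet recall on the class we care about is zero, which is why accuracy alone is a poor guide on imbalanced data.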

This is where **SMOTE** comes in.  

---

## 🔎 What is SMOTE?  

**SMOTE (Synthetic Minority Over-sampling Technique)** generates new synthetic samples for the minority class instead of duplicating existing ones.  

- ✅ Balances the class distribution  
- ✅ Avoids the exact duplicates that make simple oversampling prone to overfitting  
- ✅ Often improves recall on the minority class  

---

## ⚙️ How SMOTE Works  

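1. For each minority class sample, SMOTE:  
   - Finds its *k* nearest neighbors within the minority class (default k=5).  
   - Selects one of those neighbors at random.  
   - Generates a synthetic sample at a random point on the line segment between the two points.  

2. Because the new points fill the space between existing minority samples rather than repeating them, a classifier tends to learn a broader, less sample-specific decision region for the minority class.  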
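The steps above can be sketched in a few lines of NumPy. This is an illustrative simplification, not the `imblearn` implementation: the function name `smote_sketch` and its parameters are hypothetical, base samples are drawn at random rather than visited one by one, and neighbors are found with a brute-force distance matrix.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority samples X_min (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a point cannot have more neighbors than there are other points

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbors

    base = rng.integers(0, n, size=n_new)                      # pick a minority sample
    picked = neighbors[base, rng.integers(0, k, size=n_new)]   # pick one of its neighbors
    gap = rng.random((n_new, 1))                               # random position in [0, 1)

    # Interpolate: base + gap * (neighbor - base) lies on the segment between them.
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Every synthetic point is a convex combination of two existing minority points, so it always falls between them and never outside the region the minority class already spans.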

---

## 🧑‍💻 Hands-on Code Example  

### Step 1: Import Libraries
```python
import pandas as pd
from sklearn.datasets import make_classification
from collections import Counter
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt
```

### Step 2: Create an Imbalanced Dataset

```python
X, y = make_classification(n_classes=2, class_sep=2,
                           weights=[0.9, 0.1],  # 90% majority, 10% minority
                           n_informative=3, n_redundant=1, n_features=5,
                           n_clusters_per_class=1, n_samples=1000, random_state=42)

print("Original Dataset Shape:", Counter(y))
```

```
Original Dataset Shape: Counter({np.int64(0): 897, np.int64(1): 103})
```


### Step 3: Apply SMOTE

```python
smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X, y)

print("Resampled Dataset Shape:", Counter(y_res))
```

```
Resampled Dataset Shape: Counter({np.int64(0): 897, np.int64(1): 897})
```
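
As a quick sanity check on those numbers, the short sketch below works out how many synthetic samples were added and how the minority share changed. The counts are copied by hand from the two outputs in this walkthrough; the variable names are only illustrative.

```python
from collections import Counter

# Class counts copied from the outputs of Step 2 and Step 3.
before = Counter({0: 897, 1: 103})
after = Counter({0: 897, 1: 897})

# Only the minority class grew, so the difference in totals is the synthetic count.
n_synthetic = sum(after.values()) - sum(before.values())
minority_share_before = before[1] / sum(before.values())
minority_share_after = after[1] / sum(after.values())

print(f"synthetic samples added: {n_synthetic}")
print(f"minority share: {minority_share_before:.1%} -> {minority_share_after:.1%}")
# synthetic samples added: 794
# minority share: 10.3% -> 50.0%
```

The majority class is left untouched at 897 samples, and the minority class is topped up with 794 synthetic samples until both classes are the same size.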
