# Handling Imbalanced Datasets: SMOTE (Synthetic Minority Over-sampling Technique)

### Why use SMOTE?

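In imbalanced datasets (e.g., fraud detection, rare disease classification), a model can reach high accuracy simply by favoring the majority class. If only 1% of transactions are fraudulent, a model that always predicts "non-fraud" is 99% accurate and catches no fraud at all. SMOTE creates synthetic examples of the minority class, rather than duplicating existing rows, to balance the dataset.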
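
To make "synthetic examples" concrete: SMOTE picks a minority sample, picks one of its k nearest minority-class neighbors, and places a new point at a random position on the line segment between the two. The following is a minimal NumPy sketch of that idea, not the imbalanced-learn implementation; the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority samples X_min (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other points

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # index of a minority sample
    picks = rng.integers(0, k, size=n_new)  # which of its k neighbors to use
    nbr = neighbors[base, picks]
    gap = rng.random((n_new, 1))            # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


# Hypothetical toy minority class with two features.
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_sketch(X_minority, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because each new point is a convex combination of two existing minority points, it always falls inside the region the minority class already occupies, which is why SMOTE tends to fill in that region rather than extend it.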

- Dataset: Fraud Detection
- Model: Random Forest Classifier

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load the dataset (assumes fraud.csv has a 'Class' column: 0 = non-fraud, 1 = fraud)
data = pd.read_csv('fraud.csv')
X = data.drop(columns=['Class'])
y = data['Class']

# Apply SMOTE: oversample the minority class until both classes have the same count
# Caveat: resampling happens before the split, so the test set is class-balanced and
# contains synthetic rows; the scores below are likely more optimistic than scores
# on an untouched, imbalanced test set would be.
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)

# Train a random forest on 80% of the resampled data
X_train, X_test, y_train, y_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2, random_state=42
)
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Evaluate on the remaining 20%
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))


              precision    recall  f1-score   support

           0       1.00      0.80      0.89        20
           1       0.81      1.00      0.89        17

    accuracy                           0.89        37
   macro avg       0.90      0.90      0.89        37
weighted avg       0.91      0.89      0.89        37

