1) **For a class-imbalanced dataset, do we use stratified CV or handle the imbalance with class_weight='balanced'?**

2) **I don't understand: "If imbalanced, report PR-AUC alongside ROC-AUC/F1 and give the prevalence baseline for PR-AUC; calibrate probabilities if you threshold."**

3) **Why handle high-cardinality fields with target/frequency encoding inside CV rather than in the preprocessing pipeline?**

# Excellent Questions! Let me clarify these important concepts.

---

## 1. Stratified CV vs class_weight='balanced'

### They're NOT Alternatives - Use BOTH!

**They solve different problems:**

```text
┌──────────────────────────────────────────────────────────────┐
│ STRATIFIED CV           → Ensures fair data splitting        │
│ class_weight='balanced' → Adjusts model's learning process   │
│                                                              │
│ Use BOTH for imbalanced data! ✅                             │
└──────────────────────────────────────────────────────────────┘
```

### Stratified CV - What It Does

**Problem without stratification:**

```python
# Dataset: 90% class 0, 10% class 1 (X and y are assumed to be NumPy arrays)
import numpy as np
from sklearn.model_selection import KFold

# Regular KFold (risky for imbalanced data: it ignores the labels)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

for fold_idx, (train_idx, val_idx) in enumerate(kf.split(X, y)):
    print(f"Fold {fold_idx}:")
    print(f"  Train: {np.bincount(y[train_idx])}")
    print(f"  Val:   {np.bincount(y[val_idx])}")

# Illustrative output (drift exaggerated for clarity; it is largest when
# the dataset or the minority class is small):
# Fold 0:
#   Train: [7200, 800]  # 90% vs 10% ✅
#   Val:   [1800, 200]  # 90% vs 10% ✅
# 
# Fold 1:
#   Train: [7350, 650]  # 91.9% vs 8.1% ⚠️  Different!
#   Val:   [1650, 350]  # 82.5% vs 17.5% ⚠️  Much different!
# 
# Fold 2:
#   Train: [7100, 900]  # 88.75% vs 11.25% ⚠️
#   Val:   [1900, 100]  # 95% vs 5% ⚠️  Very different!

# Each fold has a different class distribution → Inconsistent evaluation!
```

**Solution: Stratified CV**

```python
from sklearn.model_selection import StratifiedKFold

# Stratified CV maintains class distribution
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold_idx, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    print(f"Fold {fold_idx}:")
    print(f"  Train: {np.bincount(y[train_idx])}")  
    print(f"  Val:   {np.bincount(y[val_idx])}")

# Output:
# Fold 0:
#   Train: [7200, 800]  # 90% vs 10% ✅
#   Val:   [1800, 200]  # 90% vs 10% ✅
# 
# Fold 1:
#   Train: [7200, 800]  # 90% vs 10% ✅
#   Val:   [1800, 200]  # 90% vs 10% ✅
# 
# All folds have same distribution → Fair, consistent evaluation!
```

**Stratified CV ensures:**
- ✅ Each fold has same class distribution
- ✅ Fair model comparison across folds
- ✅ Prevents one fold from having too few minority samples
- ✅ More stable cross-validation scores
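To see the guarantee in numbers, the sketch below counts minority samples per validation fold for both splitters. The toy labels (200 rows, 20 positives) and the helper `minority_counts` are assumptions made for illustration; the feature matrix is a placeholder of zeros because splitting only looks at the labels.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 180 negatives, 20 positives (10% minority)
y = np.array([0] * 180 + [1] * 20)
X = np.zeros((len(y), 1))  # placeholder features; only y matters for splitting

def minority_counts(splitter):
    """Number of positive samples in each validation fold."""
    return [int(y[val_idx].sum()) for _, val_idx in splitter.split(X, y)]

plain = minority_counts(KFold(n_splits=5, shuffle=True, random_state=42))
stratified = minority_counts(StratifiedKFold(n_splits=5, shuffle=True, random_state=42))

print("KFold positives per fold:          ", plain)
print("StratifiedKFold positives per fold:", stratified)
```

With 20 positives and 5 folds, stratification puts exactly 4 positives in every validation fold, while the plain `KFold` counts are free to vary from fold to fold.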

### class_weight='balanced' - What It Does

**Problem: Model ignores minority class**

```python
# Without class weights, every training error counts the same,
# so the majority class dominates what the model optimizes.
from sklearn.ensemble import RandomForestClassifier

# Dataset: 9000 class 0, 1000 class 1
model = RandomForestClassifier(random_state=42)  # No class_weight
model.fit(X_train, y_train)

# Every misclassification carries the same cost:
# Wrong prediction on class 0: cost = 1
# Wrong prediction on class 1: cost = 1
#
# Extreme case: always predicting class 0 is still 90% accurate
# (9000 correct, 1000 wrong), so accuracy alone can hide a useless model.
```

**Solution: class_weight='balanced'**

```python
model = RandomForestClassifier(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)

# What 'balanced' computes (y_train has 9000 zeros and 1000 ones):
import numpy as np

n_samples = len(y_train)  # 10,000
n_classes = 2
n_samples_per_class = np.bincount(y_train)  # [9000, 1000]

# weight_c = n_samples / (n_classes * count_c)
weight_0 = n_samples / (n_classes * n_samples_per_class[0])  # 10000 / (2 * 9000) ≈ 0.556
weight_1 = n_samples / (n_classes * n_samples_per_class[1])  # 10000 / (2 * 1000) = 5.0

# Each sample's contribution to training is scaled by its class weight:
# Error on class 0 counts 0.556
# Error on class 1 counts 5.0
#
# Minority class errors weigh 9x more (5.0 / 0.556)
# → Model is pushed to stop ignoring the minority class
```

**class_weight='balanced' makes:**
- ✅ Model care about minority class errors
- ✅ Adjusts decision boundary
- ✅ Improves minority class recall

### Use BOTH Together!

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# ✅ CORRECT: Use both!
model = RandomForestClassifier(
    class_weight='balanced',  # Adjust learning
    random_state=42
)

cv = StratifiedKFold(  # Fair splitting
    n_splits=5,
    shuffle=True,
    random_state=42
)

scores = cross_val_score(model, X_train, y_train, cv=cv, scoring='roc_auc')
print(f"CV AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```
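For a version that runs end to end, here is a sketch on synthetic data. The dataset from `make_classification` is an assumption, and `LogisticRegression` stands in for the random forest only to keep the run fast. It also reports average precision (PR-AUC) next to ROC-AUC via `cross_validate`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical synthetic data with roughly 10% positives
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)

model = LogisticRegression(class_weight="balanced", max_iter=1000)  # adjust learning
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)     # fair splitting

results = cross_validate(model, X, y, cv=cv, scoring=["roc_auc", "average_precision"])

for name in ("roc_auc", "average_precision"):
    scores = results[f"test_{name}"]
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```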

### Complete Strategy for Imbalanced Data

```python
# Progression (try in order):

# 1. Baseline: class_weight='balanced' + StratifiedKFold
model = RandomForestClassifier(class_weight='balanced')
cv = StratifiedKFold(5, shuffle=True, random_state=42)

# 2. If not enough: Add SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import SMOTE

pipeline = ImbPipeline([
    ('smote', SMOTE(random_state=42)),
    ('model', RandomForestClassifier())  # No class_weight with SMOTE!
])
cv = StratifiedKFold(5, shuffle=True, random_state=42)

# 3. Don't stack SMOTE + class_weight='balanced' (redundant)
# ❌ Adds nothing
pipeline = ImbPipeline([
    ('smote', SMOTE()),  # Already balances the training fold
    ('model', RandomForestClassifier(class_weight='balanced'))  # Sees balanced data, so the weights come out ≈ 1
])
# Pick one: SMOTE OR class_weight, not both
```

---

## 2. PR-AUC, Calibration, and Thresholding

Let me break down this complex statement piece by piece.

### Part 1: "Report PR-AUC alongside ROC-AUC"

**Why ROC-AUC can be misleading for imbalanced data:**

```python
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve, precision_recall_curve
import matplotlib.pyplot as plt

import numpy as np

rng = np.random.default_rng(42)  # Fixed seed so the numbers are reproducible

# Extreme imbalance: 99% class 0, 1% class 1
# Model A: Good model (scores separate the classes well)
# Model B: Bad model (scores barely separate the classes)

# Synthetic labels and probabilities
y_test = np.array([0]*990 + [1]*10)  # 99% class 0, 1% class 1

# Model A: Good predictions
y_proba_good = np.concatenate([
    rng.beta(2, 5, 990),   # Low probs for class 0
    rng.beta(5, 2, 10)     # High probs for class 1
])

# Model B: Bad predictions (barely better than random)
y_proba_bad = np.concatenate([
    rng.beta(3, 3, 990),   # Medium probs for class 0
    rng.beta(4, 3, 10)     # Slightly higher for class 1
])

# Evaluate both (exact values depend on the random draw, especially with only 10 positives)
print("Model A (Good):")
print(f"  ROC-AUC: {roc_auc_score(y_test, y_proba_good):.4f}")  # High
print(f"  PR-AUC:  {average_precision_score(y_test, y_proba_good):.4f}")  # Noticeably lower than its ROC-AUC

print("\nModel B (Bad):")
print(f"  ROC-AUC: {roc_auc_score(y_test, y_proba_bad):.4f}")  # Above 0.5, so it can look passable
print(f"  PR-AUC:  {average_precision_score(y_test, y_proba_bad):.4f}")  # Typically near the 0.01 prevalence baseline ⚠️

# ROC-AUC uses the false positive rate, whose denominator is the 990 negatives,
# so even many false positives barely move it
# PR-AUC uses precision and recall (true negatives never enter) → more honest
```

**Visual difference:**

```python
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROC Curve
fpr_good, tpr_good, _ = roc_curve(y_test, y_proba_good)
fpr_bad, tpr_bad, _ = roc_curve(y_test, y_proba_bad)

axes[0].plot(fpr_good, tpr_good, label=f'Good (AUC={roc_auc_score(y_test, y_proba_good):.3f})')
axes[0].plot(fpr_bad, tpr_bad, label=f'Bad (AUC={roc_auc_score(y_test, y_proba_bad):.3f})')
axes[0].plot([0, 1], [0, 1], 'k--', label='Random')
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].set_title('ROC Curve (Optimistic for Imbalanced Data)')
axes[0].legend()
axes[0].grid(alpha=0.3)

# PR Curve
prec_good, rec_good, _ = precision_recall_curve(y_test, y_proba_good)
prec_bad, rec_bad, _ = precision_recall_curve(y_test, y_proba_bad)

axes[1].plot(rec_good, prec_good, label=f'Good (AP={average_precision_score(y_test, y_proba_good):.3f})')
axes[1].plot(rec_bad, prec_bad, label=f'Bad (AP={average_precision_score(y_test, y_proba_bad):.3f})')
axes[1].axhline(y=0.01, color='k', linestyle='--', label='Baseline (prevalence)')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('PR Curve (Honest for Imbalanced Data)')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
```

**Key insight:**
- **ROC-AUC**: Looks at FPR (false positives / all negatives)
  - With 990 negatives, even 100 false positives = 10% FPR (looks good!)
- **PR-AUC**: Looks at precision (true positives / all predicted positives)
  - If you predict 110 positives (10 TP + 100 FP), precision = 10/110 = 9% (terrible!)

### Part 2: "Give the prevalence baseline for PR-AUC"

**Prevalence baseline** = the PR-AUC to expect from a classifier with no skill (random scores), which is roughly the fraction of positives in the data.

```python
# Calculate prevalence (base rate of minority class)
prevalence = (y_test == 1).sum() / len(y_test)
print(f"Prevalence (minority class rate): {prevalence:.2%}")  # 1%

# Prevalence baseline for PR-AUC
# A no-skill classifier has precision ≈ prevalence at every recall level
# Example: If you flag rows at random, about 1% of the flagged rows are true positives

# Your model's PR-AUC should clearly beat this baseline
# (y_proba = your model's predicted probabilities for the positive class)
pr_auc = average_precision_score(y_test, y_proba)
print(f"Model PR-AUC: {pr_auc:.4f}")
print(f"Baseline:     {prevalence:.4f}")

if pr_auc > prevalence:
    print(f"✅ Model is {pr_auc/prevalence:.1f}x better than random!")
else:
    print("❌ Model is useless (no better than random)")
```
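The baseline can also be checked empirically by scoring labels with random numbers. This is a sketch under assumed data (20,000 rows with 5% positives, chosen so the estimate is stable); with only a handful of positives, the PR-AUC of random scores is much noisier.

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Hypothetical labels: 20,000 rows, 1,000 of them positive (5%)
y_true = np.zeros(20_000, dtype=int)
y_true[:1_000] = 1
prevalence = y_true.mean()

# A no-skill "model": scores drawn at random, unrelated to the labels
random_scores = rng.random(len(y_true))
baseline_ap = average_precision_score(y_true, random_scores)

print(f"Prevalence:            {prevalence:.4f}")
print(f"PR-AUC, random scores: {baseline_ap:.4f}")
```

The two printed numbers should land close to each other, which is why prevalence works as the reference line on a PR plot.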

**Always report both:**

```python
# Complete evaluation for imbalanced data
print("="*60)
print("MODEL EVALUATION (Imbalanced Data: 99% vs 1%)")
print("="*60)

prevalence = (y_test == 1).mean()

print(f"\nClass distribution:")
print(f"  Class 0: {(y_test == 0).sum()} ({(y_test == 0).mean():.1%})")
print(f"  Class 1: {(y_test == 1).sum()} ({prevalence:.1%})")

print(f"\nMetrics:")
print(f"  ROC-AUC: {roc_auc_score(y_test, y_proba):.4f}")
print(f"  PR-AUC:  {average_precision_score(y_test, y_proba):.4f}")
print(f"  Baseline (prevalence): {prevalence:.4f}")
print(f"  PR-AUC improvement: {average_precision_score(y_test, y_proba)/prevalence:.1f}x over random")
```

### Part 3: "Calibrate probabilities if you threshold"

**What is thresholding?**

Default prediction uses 0.5 threshold:
```python
# Default behavior
y_pred = model.predict(X_test)
# Roughly equivalent for a binary classifier: y_pred = (model.predict_proba(X_test)[:, 1] > 0.5).astype(int)
```

**Custom thresholding** = Adjust the cutoff:
```python
# Custom threshold
threshold = 0.3  # Predict positive if probability >= 30%
y_proba = model.predict_proba(X_test)[:, 1]
y_pred_custom = (y_proba >= threshold).astype(int)

# Why? To optimize for your use case:
# - Fraud detection: Low threshold (catch more fraud, accept false alarms)
# - Spam filter: High threshold (avoid false positives)
```
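A tiny worked example makes the trade-off concrete. The labels, scores, and the helper `precision_recall_at` are all hypothetical; the helper assumes at least one row is flagged and at least one positive exists, so it does not guard against division by zero.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and t == 1 for p, t in zip(preds, y_true))
    fp = sum(p and t == 0 for p, t in zip(preds, y_true))
    fn = sum((not p) and t == 1 for p, t in zip(preds, y_true))
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical labels and predicted probabilities for ten rows
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.25, 0.45, 0.70, 0.90]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
# threshold=0.5: precision=0.67, recall=0.50
# threshold=0.3: precision=0.50, recall=0.75
```

Lowering the threshold catches one more positive (recall up) at the price of two more false alarms (precision down).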

**Why calibration matters for thresholding:**

```python
from sklearn.calibration import CalibratedClassifierCV

# Example: Uncalibrated model (e.g., SVM or Naive Bayes)
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
y_proba_uncal = model.predict_proba(X_test)[:, 1]

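# Problem: Naive Bayes probabilities are often pushed toward 0 or 1
# Model says 0.7 but the observed positive rate for such cases might be only 0.4
# A threshold chosen as "act above 30% risk" then won't mean what you intended!

# Solution: Calibrate (CalibratedClassifierCV refits clones of the model internally)
calibrated = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated.fit(X_train, y_train)
y_proba_cal = calibrated.predict_proba(X_test)[:, 1]

# Now probabilities track observed frequencies more closely
# 0.7 should mean roughly a 70% chance
# Probability-based thresholds behave as intended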
```

**When to calibrate:**

```python
# You need calibration if:
# 1. You choose thresholds by their probability meaning (e.g. "act above 30% risk"), not just the default 0.5
# 2. Probabilities matter for decision-making
# 3. Using models known to be poorly calibrated:
#    - Naive Bayes (very overconfident)
#    - SVMs (not designed for probabilities)
#    - Boosted trees (often miscalibrated)

# Example: Cost-sensitive decision
def make_decision(proba, threshold=0.3):
    """
    threshold=0.3: Only act if >30% confident
    Requires calibrated probabilities!
    """
    return proba >= threshold

# Uncalibrated model says 0.4 → You act → But true probability was 0.2 → Bad decision!
# Calibrated model says 0.2 → You don't act → Correct!
```
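One common way to derive such a threshold is from misclassification costs, which only makes sense when the probabilities are calibrated. The sketch below assumes two hypothetical costs and a hypothetical helper `cost_threshold`. Acting on a case with positive probability p has expected cost (1 - p) × cost of a false alarm, not acting has expected cost p × cost of a miss, so acting is cheaper once p ≥ cost_fp / (cost_fp + cost_fn).

```python
def cost_threshold(cost_false_positive, cost_false_negative):
    """Probability above which acting has lower expected cost than not acting.

    Expected cost of acting:     (1 - p) * cost_false_positive
    Expected cost of not acting: p * cost_false_negative
    Acting wins when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

# Hypothetical costs: a false alarm costs 1 unit, a missed positive costs 9 units
threshold = cost_threshold(cost_false_positive=1, cost_false_negative=9)
print(f"Act when calibrated probability >= {threshold:.2f}")  # 0.10
```

If the model's 0.10 does not correspond to a real 10% risk, this threshold no longer minimizes expected cost, which is the practical reason to calibrate first.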

**Complete example:**

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import brier_score_loss, recall_score
from sklearn.model_selection import StratifiedKFold

# Train uncalibrated model
model = GaussianNB()
model.fit(X_train, y_train)

# Calibrate it
calibrated_model = CalibratedClassifierCV(
    model, 
    method='sigmoid',
    cv=StratifiedKFold(5, shuffle=True, random_state=42)
)
calibrated_model.fit(X_train, y_train)

# Get probabilities
y_proba_uncal = model.predict_proba(X_test)[:, 1]
y_proba_cal = calibrated_model.predict_proba(X_test)[:, 1]

# Evaluate calibration
brier_uncal = brier_score_loss(y_test, y_proba_uncal)
brier_cal = brier_score_loss(y_test, y_proba_cal)

print(f"Brier score (uncalibrated): {brier_uncal:.4f}")
print(f"Brier score (calibrated):   {brier_cal:.4f}")
print(f"Improvement: {brier_uncal - brier_cal:.4f}")

# Use custom threshold with calibrated probabilities
threshold = 0.2  # Lower threshold for imbalanced data

y_pred_uncal = (y_proba_uncal >= threshold).astype(int)
y_pred_cal = (y_proba_cal >= threshold).astype(int)

print(f"\nWith threshold={threshold}:")
print(f"Uncalibrated recall: {recall_score(y_test, y_pred_uncal):.3f}")
print(f"Calibrated recall:   {recall_score(y_test, y_pred_cal):.3f}")
```
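Besides the Brier score, a reliability table shows where the probabilities are off. This sketch uses `calibration_curve` on synthetic data; the dataset parameters are assumptions chosen only to give Naive Bayes correlated features to work with.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical imbalanced data with redundant (correlated) features
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5, n_redundant=10,
    weights=[0.9, 0.1], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=42
)

y_proba = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]

# Per probability bin: observed positive rate vs. mean predicted probability
observed, predicted = calibration_curve(y_test, y_proba, n_bins=10)

for obs, pred in zip(observed, predicted):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

For a well-calibrated model the two columns match; large gaps mark the probability ranges where a fixed threshold is least trustworthy.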

---

## 3. Target/Frequency Encoding Inside CV vs Preprocessing

### The Data Leakage Problem

**Target encoding** uses the target variable to encode features:

```python
# Example: Customer ID with purchase history
# customer_id | purchased (target)
# ------------|-------------------
# A           | 1
# A           | 1
# A           | 0
# B           | 0
# B           | 0
# C           | 1

# Target encoding: Replace customer_id with mean purchase rate
# customer_id → purchase_rate
# A → 0.67 (2/3 purchases)
# B → 0.00 (0/2 purchases)
# C → 1.00 (1/1 purchases)
```
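The mapping above is just a per-category mean, as this plain-Python sketch shows. The helper `target_means` is hypothetical and applies no smoothing, which library encoders usually add so that rare categories are pulled toward the global mean.

```python
from collections import defaultdict

def target_means(categories, targets):
    """Map each category to the mean target over its rows (no smoothing)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += target
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}

customer_id = ["A", "A", "A", "B", "B", "C"]
purchased = [1, 1, 0, 0, 0, 1]

encoding = target_means(customer_id, purchased)
encoded_column = [encoding[c] for c in customer_id]

print({k: round(v, 2) for k, v in encoding.items()})  # {'A': 0.67, 'B': 0.0, 'C': 1.0}
```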

**❌ WRONG: Encoding before CV**

```python
from category_encoders import TargetEncoder

# ❌ DATA LEAKAGE!
encoder = TargetEncoder()
X_encoded = encoder.fit_transform(X_train, y_train)  # Uses ALL training data!

# Now do CV
cv_scores = cross_val_score(model, X_encoded, y_train, cv=5)

# PROBLEM: Each fold's validation set contains information from its own target values!
```

**What went wrong:**

```python
# Let's trace what happens:

# Original data (before split):
# customer_id | purchased
# A           | 1
# A           | 1  ← Fold 1 validation
# A           | 0
# B           | 0
# B           | 0  ← Fold 1 validation
# C           | 1

# You encoded BEFORE CV split:
encoder.fit(X_train, y_train)  # Uses ALL data including fold 1 validation!
# A → 0.67 (calculated using the validation row too!)
# B → 0.00 (calculated using the validation row too!)

# Now in Fold 1:
# Validation set:
#   A (encoded as 0.67) | actual=1  ← This 0.67 was calculated USING this row!
#   B (encoded as 0.00) | actual=0  ← This 0.00 was calculated USING this row!
#
# The model is cheating! It saw the validation labels during encoding!
```

**✅ CORRECT: Encoding inside CV (via Pipeline)**

```python
from category_encoders import TargetEncoder
from sklearn.pipeline import Pipeline

# ✅ NO LEAKAGE
pipeline = Pipeline([
    ('encoder', TargetEncoder()),  # Encoding happens INSIDE each fold
    ('model', RandomForestClassifier())
])

cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)

# What happens in each fold:
# Fold 1:
#   1. Split data → train_fold, val_fold
#   2. Fit encoder on train_fold only → A=0.50, B=0.00, C=1.00 (raw means; the library also smooths toward the global mean)
#   3. Transform train_fold with these encodings
#   4. Transform val_fold with these encodings (never saw val_fold targets!)
#   5. Train model on encoded train_fold
#   6. Evaluate on encoded val_fold (clean!)
#
# Fold 2:
#   1. Different split...
#   2. Refit encoder (different encodings this time!)
#   ...
```
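The size of the leak is easiest to see when the labels are pure noise, so that any score above chance must come from leakage. The sketch below makes several assumptions: random IDs, random labels, `LogisticRegression` as the model, and `MeanTargetEncoder`, a hypothetical minimal encoder written here to avoid depending on `category_encoders`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

class MeanTargetEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical minimal target encoder: category -> mean of y, no smoothing."""

    def fit(self, X, y):
        ids = np.asarray(X).ravel()
        y = np.asarray(y, dtype=float)
        self.global_mean_ = y.mean()
        self.means_ = {c: y[ids == c].mean() for c in np.unique(ids)}
        return self

    def transform(self, X):
        ids = np.asarray(X).ravel()
        # Categories unseen during fit fall back to the global mean
        return np.array([[self.means_.get(c, self.global_mean_)] for c in ids])

rng = np.random.default_rng(0)
n = 1000
X = rng.integers(0, 1000, size=(n, 1))  # high-cardinality IDs, few rows each
y = rng.integers(0, 2, size=n)          # labels are pure noise

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: encode once on all rows, then cross-validate
X_leaky = MeanTargetEncoder().fit(X, y).transform(X)
leaky_auc = cross_val_score(LogisticRegression(), X_leaky, y, cv=cv, scoring="roc_auc").mean()

# Clean: the encoder is refit inside every fold
pipeline = Pipeline([("encoder", MeanTargetEncoder()), ("model", LogisticRegression())])
clean_auc = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc").mean()

print(f"Encoded before CV: AUC = {leaky_auc:.3f}")
print(f"Encoded inside CV: AUC = {clean_auc:.3f}")
```

Since there is nothing to learn, the honest score should sit near 0.5; the pre-encoded run scores far higher purely because each validation row's encoding contains its own label.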

### Visualizing the Difference

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from category_encoders import TargetEncoder

# Sample data
df = pd.DataFrame({
    'customer_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'purchased': [1, 1, 0, 0, 0, 1, 1, 1]
})

X = df[['customer_id']]
y = df['purchased']

cv = StratifiedKFold(n_splits=2, shuffle=False)  # For illustration

# ❌ WRONG: Fit on all data first
encoder_wrong = TargetEncoder()
encoder_wrong.fit(X, y)

print("="*60)
print("WRONG WAY (Data Leakage)")
print("="*60)
print("\nEncoder fitted on ALL data:")
print(encoder_wrong.transform(X))
# All A's share one encoding built from every A row, validation rows included
# (raw mean 0.67; TargetEncoder smooths toward the global mean, so the printed value differs)

for fold_idx, (train_idx, val_idx) in enumerate(cv.split(X, y)):
    print(f"\nFold {fold_idx}:")
    print(f"  Validation indices: {val_idx}")
    print(f"  Validation data:")
    print(df.iloc[val_idx])
    print(f"  Encoded values used: {encoder_wrong.transform(X.iloc[val_idx]).values.flatten()}")
    print("  ⚠️  These encodings were calculated USING validation targets!")

# ✅ CORRECT: Fit only on training fold
print("\n" + "="*60)
print("CORRECT WAY (No Leakage)")
print("="*60)

for fold_idx, (train_idx, val_idx) in enumerate(cv.split(X, y)):
    # Fit encoder ONLY on training fold
    encoder_correct = TargetEncoder()
    encoder_correct.fit(X.iloc[train_idx], y.iloc[train_idx])
    
    print(f"\nFold {fold_idx}:")
    print(f"  Training indices: {train_idx}")
    print(f"  Training data:")
    print(df.iloc[train_idx])
    
    print(f"\n  Encoder fitted on training only:")
    print(encoder_correct.transform(X.iloc[train_idx]))
    
    print(f"\n  Validation indices: {val_idx}")
    print(f"  Validation data:")
    print(df.iloc[val_idx])
    print(f"  Encoded values: {encoder_correct.transform(X.iloc[val_idx]).values.flatten()}")
    print("  ✅ These encodings were calculated WITHOUT seeing validation targets!")
```

### Why Pipeline Prevents Leakage

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from category_encoders import TargetEncoder

# ✅ Pipeline automatically does the right thing
pipeline = Pipeline([
    ('target_encoder', TargetEncoder(cols=['customer_id'])),
    ('model', RandomForestClassifier())
])

cv = StratifiedKFold(5)

# During cross_val_score, the pipeline is refit for each fold:
cv_scores = cross_val_score(pipeline, X, y, cv=cv)

# Roughly what sklearn does internally (X is a DataFrame, y a Series):
for train_idx, val_idx in cv.split(X, y):
    X_train_fold, X_val_fold = X.iloc[train_idx], X.iloc[val_idx]
    y_train_fold, y_val_fold = y.iloc[train_idx], y.iloc[val_idx]
    
    # Clone the pipeline for this fold
    fold_pipeline = clone(pipeline)
    
    # Fit ONLY on training fold
    fold_pipeline.fit(X_train_fold, y_train_fold)
    # ↑ This fits encoder on training fold only!
    
    # Evaluate on validation fold
    score = fold_pipeline.score(X_val_fold, y_val_fold)
```

### Other Encodings That Need Pipeline

```python
# Any encoding that uses target variable MUST be in pipeline:

# ❌ WRONG (leakage)
from category_encoders import TargetEncoder, CatBoostEncoder
encoder = TargetEncoder()
X_encoded = encoder.fit_transform(X_train, y_train)
cross_val_score(model, X_encoded, y_train, cv=5)  # Leakage!

# ✅ CORRECT
pipeline = Pipeline([
    ('encoder', TargetEncoder()),
    ('model', model)
])
cross_val_score(pipeline, X_train, y_train, cv=5)  # Safe!

# Same for:
# - TargetEncoder
# - CatBoostEncoder  
# - WOEEncoder (Weight of Evidence)
# - JamesSteinEncoder
# - MEstimateEncoder
```

### Encodings Safe in Preprocessing (No Leakage)

```python
# These DON'T use the target, so there is no target leakage before CV:

from sklearn.preprocessing import OneHotEncoder

# ✅ Safe: OneHotEncoding (doesn't use y)
ohe = OneHotEncoder(handle_unknown='ignore')
X_encoded = ohe.fit_transform(X_train)  # No y needed!
cross_val_score(model, X_encoded, y_train, cv=5)  # No target leakage

# ⚠️ Mostly safe: Frequency encoding (doesn't use y, but the counts include validation rows)
def frequency_encode(X):
    freq_map = X['customer_id'].value_counts().to_dict()
    return X['customer_id'].map(freq_map).to_frame()  # 2D, as sklearn expects

X_encoded = frequency_encode(X_train)
cross_val_score(model, X_encoded, y_train, cv=5)  # No target leakage

# Fitting inside a Pipeline is STILL better: each fold then learns its
# counts from its training rows only, the same way it would in production.
```
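If frequency encoding should be fitted per fold like the other steps, it can be wrapped as a transformer and dropped into a `Pipeline`. `FrequencyEncoder` below is a hypothetical minimal class for a single categorical column, not an existing library API; mapping unseen categories to 0.0 is one possible choice among several.

```python
from collections import Counter

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class FrequencyEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical encoder: category -> relative frequency seen during fit."""

    def fit(self, X, y=None):
        values = np.asarray(X).ravel().tolist()
        n = len(values)
        self.freq_ = {value: count / n for value, count in Counter(values).items()}
        return self

    def transform(self, X):
        values = np.asarray(X).ravel().tolist()
        # Categories unseen during fit get frequency 0.0
        return np.array([[self.freq_.get(value, 0.0)] for value in values])

train = [["A"], ["A"], ["B"], ["C"]]
new = [["A"], ["D"]]

encoder = FrequencyEncoder().fit(train)
print(encoder.transform(new))  # A -> 0.5, unseen D -> 0.0
```

Used as a pipeline step, the counts are recomputed from each fold's training rows, so validation rows never influence their own encoding.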

---

## Summary

### 1. Stratified CV vs class_weight='balanced'

```python
# ✅ USE BOTH!
model = RandomForestClassifier(class_weight='balanced')  # Adjusts learning
cv = StratifiedKFold(5, shuffle=True, random_state=42)   # Fair splitting

# They solve different problems:
# - Stratified CV: Ensures fair data splitting
# - class_weight: Adjusts model's loss function
```

### 2. Imbalanced Data Evaluation

```python
# ✅ ALWAYS REPORT BOTH ROC-AUC AND PR-AUC
prevalence = (y_test == 1).mean()

print(f"ROC-AUC: {roc_auc_score(y_test, y_proba):.4f}")
print(f"PR-AUC:  {average_precision_score(y_test, y_proba):.4f}")
print(f"Baseline (prevalence): {prevalence:.4f}")
print(f"Improvement over random: {average_precision_score(y_test, y_proba)/prevalence:.1f}x")

# ✅ CALIBRATE if using custom thresholds
calibrated_model = CalibratedClassifierCV(model, method='sigmoid', cv=5)
calibrated_model.fit(X_train, y_train)  # Must be fitted before predicting
y_proba = calibrated_model.predict_proba(X_test)[:, 1]
y_pred = (y_proba >= 0.3).astype(int)  # Custom threshold
```

### 3. Target Encoding

```python
# ❌ WRONG - Data leakage
encoder = TargetEncoder()
X_encoded = encoder.fit_transform(X_train, y_train)
cross_val_score(model, X_encoded, y_train, cv=5)  # LEAKAGE!

# ✅ CORRECT - Use Pipeline
pipeline = Pipeline([
    ('encoder', TargetEncoder()),
    ('model', model)
])
cross_val_score(pipeline, X_train, y_train, cv=5)  # Safe!
```

Want me to show:
1. **Complete example combining all three concepts**?
2. **How to find optimal threshold** for imbalanced data?
3. **More advanced target encoding techniques** (smoothing, leave-one-out)?