# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 10 Session 2: Tree-Based Models for Classification
**Instructor:** Amir Charkhi | **Goal:** Master Tree-Based Classification

### Learning Objectives
- Understand decision trees for classification
- Learn ensemble methods: Random Forest and Gradient Boosting
- Compare tree-based with linear classification models
- Master feature importance for classification
- Handle imbalanced classification with tree models
- Apply advanced hyperparameter tuning

---

## 1. Import Libraries

**What you need to do:**  
Import all necessary libraries for tree-based classification.

**💡 Hint:** We'll need `DecisionTreeClassifier`, `RandomForestClassifier`, `GradientBoostingClassifier`, and classification metrics.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn imports
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score, roc_curve,
    precision_recall_curve, average_precision_score
)

# Advanced gradient boosting (optional)
try:
    from xgboost import XGBClassifier
    print("✅ XGBoost available")
except ImportError:
    print("⚠️ XGBoost not installed")

try:
    from lightgbm import LGBMClassifier
    print("✅ LightGBM available")
except ImportError:
    print("⚠️ LightGBM not installed")

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("\n✅ All core libraries imported successfully!")

---
## 2. Load and Prepare Dataset

**What you need to do:**  
Load the same Online Shoppers dataset we used for linear models.

**Our Goal:** Predict **Revenue** (purchase or not) using tree-based models.

In [None]:
# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00468/online_shoppers_intention.csv'

print("📥 Loading Online Shoppers dataset...")
df_raw = pd.read_csv(url)

print("✅ Dataset loaded!")
print(f"📊 Shape: {df_raw.shape[0]:,} rows × {df_raw.shape[1]} columns")

---
## 3. Data Preprocessing

**What you need to do:**  
Apply the same preprocessing steps as linear models.

**Note:** Tree-based models have different characteristics:
- ✅ Don't require feature scaling
- ✅ Handle non-linear relationships naturally
- ✅ Can split on encoded categorical variables (scikit-learn trees need numeric input, so we encode them)
- ✅ Robust to outliers in the features

In [None]:
# Preprocessing
print("🧹 Preprocessing data...\n")

df = df_raw.copy()

# Convert target to binary
df['Revenue'] = df['Revenue'].astype(int)

# Encode categorical variables
month_map = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'June': 6,
             'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
df['Month'] = df['Month'].map(month_map)

# VisitorType: one-hot encoding
visitor_dummies = pd.get_dummies(df['VisitorType'], prefix='Visitor', drop_first=True)
visitor_dummies = visitor_dummies.astype(int)
df = pd.concat([df, visitor_dummies], axis=1)

# Weekend to int
df['Weekend'] = df['Weekend'].astype(int)

# Drop original categorical column
df = df.drop(columns=['VisitorType'])

# Prepare features
feature_cols = [col for col in df.columns if col != 'Revenue']
X = df[feature_cols].copy()
y = df['Revenue'].copy()

print("✅ Preprocessing complete!")
print(f"📊 Features: {len(feature_cols)} columns")
print(f"📊 Class distribution: {y.value_counts().to_dict()}")

---
## 4. Train-Validation-Test Split

**What you need to do:**  
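Split data with stratification: 60% train, 20% validation, 20% test. The second split uses `test_size=0.25` because 25% of the remaining 80% equals 20% of the full dataset.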

In [None]:
# Split with stratification
print("✂️ Splitting data with stratification...\n")

X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

print(f"📊 Training:   {X_train.shape[0]:>6,} samples ({X_train.shape[0]/len(X)*100:.1f}%)")
print(f"📊 Validation: {X_val.shape[0]:>6,} samples ({X_val.shape[0]/len(X)*100:.1f}%)")
print(f"📊 Test:       {X_test.shape[0]:>6,} samples ({X_test.shape[0]/len(X)*100:.1f}%)")

print("\n✅ Class distribution maintained:")
print(f"   Train:      {y_train.value_counts(normalize=True)[1]:.3f} positive class")
print(f"   Validation: {y_val.value_counts(normalize=True)[1]:.3f} positive class")
print(f"   Test:       {y_test.value_counts(normalize=True)[1]:.3f} positive class")

print("\n💡 Note: Tree models DON'T require feature scaling!")

---
## 5. Quick EDA Summary

In [None]:
# Quick summary
print("📊 Training Features Summary:")
print(X_train.describe())

---
## 6. Helper Function: Classification Evaluation

In [None]:
def evaluate_classifier(model, X_train, y_train, X_val, y_val, model_name="Model"):
    """
    Comprehensive evaluation of classification model.
    """
    # Predictions
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_val)
    
    # Probabilities
    if hasattr(model, 'predict_proba'):
        y_val_prob = model.predict_proba(X_val)[:, 1]
        val_auc = roc_auc_score(y_val, y_val_prob)
    else:
        y_val_prob = None
        val_auc = None
    
    # Metrics
    train_acc = accuracy_score(y_train, y_train_pred)
    val_acc = accuracy_score(y_val, y_val_pred)
    val_precision = precision_score(y_val, y_val_pred)
    val_recall = recall_score(y_val, y_val_pred)
    val_f1 = f1_score(y_val, y_val_pred)
    
    # Print results
    print(f"📊 {model_name} Performance:")
    print("="*70)
    print(f"{'Metric':<30} {'Training':>15} {'Validation':>15}")
    print("="*70)
    print(f"{'Accuracy':<30} {train_acc:>15.4f} {val_acc:>15.4f}")
    print(f"{'Precision':<30} {'':>15} {val_precision:>15.4f}")
    print(f"{'Recall':<30} {'':>15} {val_recall:>15.4f}")
    print(f"{'F1-Score':<30} {'':>15} {val_f1:>15.4f}")
    if val_auc is not None:
        print(f"{'ROC-AUC':<30} {'':>15} {val_auc:>15.4f}")
    print("="*70)
    
    # Check overfitting
    if train_acc - val_acc > 0.1:
        print("\n⚠️ Overfitting detected! Training accuracy much higher than validation.")
    
    return {
        'train_acc': train_acc,
        'val_acc': val_acc,
        'val_precision': val_precision,
        'val_recall': val_recall,
        'val_f1': val_f1,
        'val_auc': val_auc,
        'y_val_pred': y_val_pred,
        'y_val_prob': y_val_prob
    }

print("✅ Helper function defined!")

---
## 7. Model 1: Decision Tree Classifier

**📚 Theory:**  
Decision Trees for classification work similarly to regression trees but predict class labels.

**How It Works:**
1. Start with all data at root
2. Find best split that maximizes **information gain** or minimizes **Gini impurity**
3. Create child nodes recursively
4. Predict: most common class in leaf node
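
The split search in step 2 and the leaf prediction in step 4 can be sketched for a single numeric feature. This is a minimal illustration, not scikit-learn's implementation: the toy data (`page_values`, `purchased`) and the helper names are invented for the example, and a real tree repeats this search over every feature and every node.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        # Child impurities are weighted by the share of samples in each child
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical toy data: one numeric feature and a binary purchase label
page_values = [0.0, 0.0, 1.5, 2.0, 8.0, 12.0, 20.0, 35.0]
purchased = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, score = best_split(page_values, purchased)

# Leaf prediction: most common class among samples on the right of the split
right_labels = [l for v, l in zip(page_values, purchased) if v > threshold]
leaf_prediction = Counter(right_labels).most_common(1)[0][0]
print(threshold, score, leaf_prediction)
```

Here the classes are perfectly separable, so the best split leaves two pure children with a weighted Gini of zero.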

**Splitting Criteria:**

**Gini Impurity (default):**
$$Gini = 1 - \sum_{i=1}^{c} p_i^2$$
Where $p_i$ is probability of class $i$. Lower Gini = purer node.

**Entropy (Information Gain):**
$$Entropy = -\sum_{i=1}^{c} p_i \log_2(p_i)$$
Measures uncertainty. Lower entropy = more certain.
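
Both criteria can be computed directly from a node's class probabilities. The sketch below is only meant to make the formulas concrete; the function names are chosen for this example. Entropy is written with $\log_2(1/p_i)$, which equals $-\log_2(p_i)$.

```python
import math

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy in bits; terms with p = 0 contribute 0 by convention."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A 50/50 node is maximally impure, a single-class node is pure
for probs in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(probs, round(gini(probs), 4), round(entropy(probs), 4))
```

For two classes, Gini peaks at 0.5 and entropy at 1.0, both at a 50/50 split, and both fall to 0 for a pure node.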

**Key Hyperparameters:**
- **criterion:** 'gini' or 'entropy'
- **max_depth:** Maximum tree depth
- **min_samples_split:** Min samples to split
- **min_samples_leaf:** Min samples in leaf
- **max_features:** Features per split
- **class_weight:** Handle imbalanced classes ('balanced' or dict)
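
To see what `class_weight='balanced'` does, the weights can be reproduced by hand. scikit-learn documents the rule as `n_samples / (n_classes * count_of_class)`; the helper below is a plain-Python sketch of that rule on made-up labels, not the library code.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weights proportional to the inverse class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced labels: 8 negatives, 2 positives
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

The minority class gets the larger weight, so errors on it cost more during training.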

**Pros:**
- Easy to visualize and interpret
- Handles non-linear patterns
- No scaling needed
- Can output feature importance

**Cons:**
- **Very prone to overfitting**
- Unstable (small changes in the training data → a very different tree)
- Usually less accurate than tree ensembles

**📖 References:**
- [Scikit-learn: Decision Tree Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html)

---

In [None]:
# Train Decision Tree (unrestricted)
print("🌳 Training Decision Tree Classifier (Unrestricted)...\n")

dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

print("✅ Decision Tree trained!")
print("\n📊 Tree Structure:")
print(f"   Max depth: {dt_model.get_depth()}")
print(f"   Number of leaves: {dt_model.get_n_leaves()}")
print(f"\n")

# Evaluate
dt_results = evaluate_classifier(dt_model, X_train, y_train, X_val, y_val, "Decision Tree")

### Feature Importance

In [None]:
# Feature importance
dt_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': dt_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("🎯 Decision Tree Feature Importance (Top 10):")
print("="*60)
print(dt_importance.head(10).to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
top_features = dt_importance.head(10)
plt.barh(top_features['Feature'], top_features['Importance'])
plt.xlabel('Feature Importance', fontsize=11)
plt.title('Decision Tree: Top 10 Features', fontsize=12, pad=15)
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

### Confusion Matrix

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_val, dt_results['y_val_pred'])

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['No Purchase', 'Purchase'],
            yticklabels=['No Purchase', 'Purchase'])
plt.xlabel('Predicted', fontsize=11)
plt.ylabel('Actual', fontsize=11)
plt.title('Decision Tree: Confusion Matrix', fontsize=12, pad=15)
plt.tight_layout()
plt.show()

### Hyperparameter Tuning: Decision Tree

In [None]:
# Hyperparameter tuning
print("🎯 Tuning Decision Tree hyperparameters...\n")

param_grid_dt = {
    'max_depth': [3, 5, 7, 10, 15, None],
    'min_samples_split': [2, 5, 10, 20],
    'min_samples_leaf': [1, 2, 4, 8],
    'criterion': ['gini', 'entropy'],
    'class_weight': [None, 'balanced']
}

dt_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid_dt,
    cv=5,
    scoring='f1',
    n_jobs=-1,
    verbose=1
)

dt_grid.fit(X_train, y_train)

print(f"\n✅ Best parameters: {dt_grid.best_params_}")
print(f"📊 Best CV F1-Score: {dt_grid.best_score_:.4f}")

In [None]:
# Evaluate tuned model
best_dt = dt_grid.best_estimator_
dt_tuned_results = evaluate_classifier(best_dt, X_train, y_train, X_val, y_val,
                                        "Decision Tree (Tuned)")

print("\n📊 Tree Structure (Tuned):")
print(f"   Max depth: {best_dt.get_depth()}")
print(f"   Number of leaves: {best_dt.get_n_leaves()}")

---
## 8. Model 2: Random Forest Classifier

**📚 Theory:**  
Random Forest for classification: ensemble of decision tree classifiers.

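**How It Works:**
1. Create N decision tree classifiers
2. Each tree: trained on a bootstrap sample, with a random feature subset considered at each split
3. Each tree outputs class probabilities for a sample
4. Final probabilities = average of the trees' probabilities
5. Final prediction = class with the highest average probability (scikit-learn's approach; the classic formulation uses a **majority vote** of the trees)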
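
Majority voting and probability averaging usually agree, but not always. The sketch below uses made-up per-tree probabilities to show one case where they differ. scikit-learn's `RandomForestClassifier` predicts from the averaged probabilities; the function names here are illustrative only.

```python
def majority_vote(tree_probs, threshold=0.5):
    """Each tree casts a hard vote; the class with more than half the votes wins."""
    votes = [int(p >= threshold) for p in tree_probs]
    return int(sum(votes) > len(votes) / 2)

def averaged_probability_vote(tree_probs, threshold=0.5):
    """Average the trees' probabilities first, then threshold once."""
    mean_prob = sum(tree_probs) / len(tree_probs)
    return int(mean_prob >= threshold)

# Hypothetical P(purchase) from three trees for a single session
tree_probs = [0.9, 0.4, 0.4]

print("majority vote:        ", majority_vote(tree_probs))
print("averaged probability: ", averaged_probability_vote(tree_probs))
```

One confident tree outweighs two mildly negative ones under averaging, while the hard vote ignores how confident each tree was.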

**Key Advantages for Classification:**
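- Lower variance than a single tree, which reduces overfitting
- Provides probability estimates
- Can compensate for imbalanced classes via `class_weight`
- Out-of-bag (OOB) score gives a validation estimate without a separate hold-out set (`oob_score=True`)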
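
The OOB score works because each bootstrap sample leaves out part of the training set, so every tree has rows it never saw. A quick simulation, assuming 1,000 rows and a fixed seed chosen for this example, shows how large that left-out share is:

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(42)
n_samples = 1000

in_bag = set(bootstrap_indices(n_samples, rng))
out_of_bag = [i for i in range(n_samples) if i not in in_bag]

oob_fraction = len(out_of_bag) / n_samples
print(f"Out-of-bag fraction for one tree: {oob_fraction:.3f}")
```

For large samples the expected out-of-bag share approaches $(1 - 1/n)^n \approx e^{-1} \approx 0.368$. In scikit-learn, passing `oob_score=True` to `RandomForestClassifier` scores each row using only the trees that did not train on it and stores the result in `oob_score_`.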

**Key Hyperparameters:**
- **n_estimators:** Number of trees
- **max_depth:** Tree depth
- **min_samples_split / leaf:** Node constraints
- **max_features:** Features per split ('sqrt' is common for classification)
- **class_weight:** Handle imbalanced classes
- **bootstrap:** Use bootstrap sampling (default True)

**Pros:**
- Strong performance out-of-the-box
- Robust to overfitting
- Handles imbalanced data when `class_weight` is set
- Provides feature importance
- Probability estimates available

**Cons:**
- Less interpretable than single tree
- Slower than single tree
- Larger model size

**📖 References:**
- [Scikit-learn: Random Forest Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

---

In [None]:
# Train Random Forest
print("🌲🌲🌲 Training Random Forest Classifier...\n")

rf_model = RandomForestClassifier(
    n_estimators=100,
    random_state=42,
    n_jobs=-1,
    verbose=1
)
rf_model.fit(X_train, y_train)

print("\n✅ Random Forest trained with 100 trees!\n")

# Evaluate
rf_results = evaluate_classifier(rf_model, X_train, y_train, X_val, y_val, "Random Forest")

### Feature Importance

In [None]:
# Feature importance
rf_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("🎯 Random Forest Feature Importance (Top 10):")
print("="*60)
print(rf_importance.head(10).to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
top_features_rf = rf_importance.head(10)
plt.barh(top_features_rf['Feature'], top_features_rf['Importance'], color='forestgreen')
plt.xlabel('Feature Importance', fontsize=11)
plt.title('Random Forest: Top 10 Features', fontsize=12, pad=15)
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

### ROC Curve

In [None]:
# ROC Curve
fpr, tpr, _ = roc_curve(y_val, rf_results['y_val_prob'])

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, linewidth=2, label=f"Random Forest (AUC = {rf_results['val_auc']:.3f})")
plt.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
plt.xlabel('False Positive Rate', fontsize=11)
plt.ylabel('True Positive Rate', fontsize=11)
plt.title('Random Forest: ROC Curve', fontsize=12, pad=15)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Hyperparameter Tuning: Random Forest

In [None]:
# Hyperparameter tuning
print("🎯 Tuning Random Forest hyperparameters...\n")
print("⏳ This may take a few minutes...\n")

param_dist_rf = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 15, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2'],
    'class_weight': [None, 'balanced', 'balanced_subsample']
}

rf_random = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_dist_rf,
    n_iter=20,
    cv=5,
    scoring='f1',
    n_jobs=-1,
    verbose=1,
    random_state=42
)

rf_random.fit(X_train, y_train)

print(f"\n✅ Best parameters: {rf_random.best_params_}")
print(f"📊 Best CV F1-Score: {rf_random.best_score_:.4f}")

In [None]:
# Evaluate tuned Random Forest
best_rf = rf_random.best_estimator_
rf_tuned_results = evaluate_classifier(best_rf, X_train, y_train, X_val, y_val,
                                        "Random Forest (Tuned)")

---
## 9. Model 3: Gradient Boosting Classifier

**📚 Theory:**  
Gradient Boosting for classification: sequential ensemble that builds trees to correct errors.

**How It Works:**
1. Start with simple prediction (log-odds of base rate)
2. Calculate pseudo-residuals (gradient of loss)
3. Fit new tree to residuals
4. Update model by adding scaled tree prediction
5. Repeat for N iterations
6. Final prediction via sigmoid transformation
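
The six steps above can be traced in a small pure-Python sketch. It makes several simplifying assumptions: one numeric feature, depth-1 trees (stumps), made-up toy data, and leaf values set to the mean residual (scikit-learn uses a more refined leaf update). Treat it as an illustration of the loop, not as the library's algorithm.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, residuals):
    """Depth-1 regression tree on one feature: (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, l_mean, r_mean)
    return best[1:]

def boost(x, y, n_rounds=50, learning_rate=0.1):
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1 - base_rate))        # step 1: log-odds of base rate
    scores = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):                          # step 5: repeat
        # step 2: pseudo-residuals of log loss w.r.t. the raw score are y - p
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        thr, l_val, r_val = fit_stump(x, residuals)    # step 3: fit tree to residuals
        stumps.append((thr, l_val, r_val))
        # step 4: add the scaled tree prediction to the running score
        scores = [s + learning_rate * (l_val if xi <= thr else r_val)
                  for s, xi in zip(scores, x)]
    return f0, learning_rate, stumps

def predict_proba(model, xi):
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (l if xi <= t else r) for t, l, r in stumps)
    return sigmoid(score)                              # step 6: sigmoid of the raw score

# Hypothetical toy data: higher x makes the positive class more likely
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

model = boost(x, y)
print(round(predict_proba(model, 1), 3), round(predict_proba(model, 8), 3))
```

The small `learning_rate` means each stump nudges the scores only slightly, which is why many rounds are needed and why `n_estimators` and `learning_rate` are tuned together.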

**Loss Function (Log Loss):**
$$L = -\frac{1}{m} \sum_{i=1}^{m} [y_i \log(p_i) + (1-y_i) \log(1-p_i)]$$
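
A direct translation of the formula into Python helps show how the loss rewards confident, correct probabilities. This is a sketch with made-up predictions; the `eps` clipping is an added safeguard against `log(0)` and is not part of the formula above.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss over m samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

y_true = [1, 0]
print(log_loss(y_true, [0.5, 0.5]))  # uninformative predictions
print(log_loss(y_true, [0.9, 0.1]))  # confident and correct
print(log_loss(y_true, [0.1, 0.9]))  # confident and wrong
```

Predicting 0.5 for everything gives a loss of $\ln 2$; confident correct predictions score lower, and confident wrong ones are penalized heavily.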

**Key Hyperparameters:**
- **n_estimators:** Number of boosting iterations
- **learning_rate:** Shrinkage (0.01-0.3)
- **max_depth:** Tree depth (3-8 works well)
- **subsample:** Fraction of samples per tree
- **min_samples_split / leaf:** Node constraints

**Pros:**
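- Often the strongest performer on tabular data once tuned
- Can handle imbalanced classes via `sample_weight` in `fit` (scikit-learn's `GradientBoostingClassifier` has no `class_weight` parameter)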
- Provides probability estimates
- Feature importance available

**Cons:**
- Sequential training (slower)
- More hyperparameters to tune
- Can overfit if not careful
- Less interpretable

**📖 References:**
- [Scikit-learn: Gradient Boosting Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)

---

In [None]:
# Train Gradient Boosting
print("🚀 Training Gradient Boosting Classifier...\n")

gb_model = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    random_state=42,
    verbose=1
)
gb_model.fit(X_train, y_train)

print("\n✅ Gradient Boosting trained!\n")

# Evaluate
gb_results = evaluate_classifier(gb_model, X_train, y_train, X_val, y_val,
                                  "Gradient Boosting")

### Feature Importance

In [None]:
# Feature importance
gb_importance = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': gb_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("🎯 Gradient Boosting Feature Importance (Top 10):")
print("="*60)
print(gb_importance.head(10).to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
top_features_gb = gb_importance.head(10)
plt.barh(top_features_gb['Feature'], top_features_gb['Importance'], color='darkorange')
plt.xlabel('Feature Importance', fontsize=11)
plt.title('Gradient Boosting: Top 10 Features', fontsize=12, pad=15)
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

### Hyperparameter Tuning: Gradient Boosting

In [None]:
# Hyperparameter tuning
print("🎯 Tuning Gradient Boosting hyperparameters...\n")
print("⏳ This will take several minutes...\n")

param_grid_gb = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.05, 0.1],
    'max_depth': [3, 5, 7],
    'min_samples_split': [2, 5, 10],
    'subsample': [0.8, 1.0]
}

gb_random = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid_gb,
    n_iter=20,
    cv=5,
    scoring='f1',
    n_jobs=-1,
    verbose=1,
    random_state=42
)

gb_random.fit(X_train, y_train)

print(f"\n✅ Best parameters: {gb_random.best_params_}")
print(f"📊 Best CV F1-Score: {gb_random.best_score_:.4f}")

In [None]:
# Evaluate tuned Gradient Boosting
best_gb = gb_random.best_estimator_
gb_tuned_results = evaluate_classifier(best_gb, X_train, y_train, X_val, y_val,
                                        "Gradient Boosting (Tuned)")

---
## 10. Model Comparison

**What you need to do:**  
Compare all tree-based classification models.

In [None]:
# Create comparison table
tree_comparison = pd.DataFrame({
    'Model': [
        'Decision Tree',
        'Decision Tree (Tuned)',
        'Random Forest',
        'Random Forest (Tuned)',
        'Gradient Boosting',
        'Gradient Boosting (Tuned)'
    ],
    'Accuracy': [
        dt_results['val_acc'],
        dt_tuned_results['val_acc'],
        rf_results['val_acc'],
        rf_tuned_results['val_acc'],
        gb_results['val_acc'],
        gb_tuned_results['val_acc']
    ],
    'Precision': [
        dt_results['val_precision'],
        dt_tuned_results['val_precision'],
        rf_results['val_precision'],
        rf_tuned_results['val_precision'],
        gb_results['val_precision'],
        gb_tuned_results['val_precision']
    ],
    'Recall': [
        dt_results['val_recall'],
        dt_tuned_results['val_recall'],
        rf_results['val_recall'],
        rf_tuned_results['val_recall'],
        gb_results['val_recall'],
        gb_tuned_results['val_recall']
    ],
    'F1-Score': [
        dt_results['val_f1'],
        dt_tuned_results['val_f1'],
        rf_results['val_f1'],
        rf_tuned_results['val_f1'],
        gb_results['val_f1'],
        gb_tuned_results['val_f1']
    ],
    'ROC-AUC': [
        dt_results['val_auc'],
        dt_tuned_results['val_auc'],
        rf_results['val_auc'],
        rf_tuned_results['val_auc'],
        gb_results['val_auc'],
        gb_tuned_results['val_auc']
    ]
})

# Sort by F1-Score
tree_comparison = tree_comparison.sort_values('F1-Score', ascending=False)

print("\n" + "="*90)
print("📊 TREE-BASED CLASSIFICATION MODELS - VALIDATION SET")
print("="*90)
print(tree_comparison.to_string(index=False))
print("="*90)

best_tree_model = tree_comparison.iloc[0]['Model']
print(f"\n🏆 BEST TREE MODEL: {best_tree_model}")
print(f"   F1-Score: {tree_comparison.iloc[0]['F1-Score']:.4f}")
print(f"   ROC-AUC:  {tree_comparison.iloc[0]['ROC-AUC']:.4f}")

In [None]:
# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy
axes[0,0].barh(tree_comparison['Model'], tree_comparison['Accuracy'], color='steelblue')
axes[0,0].set_xlabel('Accuracy')
axes[0,0].set_title('Tree Models: Accuracy')
axes[0,0].invert_yaxis()

# F1-Score
axes[0,1].barh(tree_comparison['Model'], tree_comparison['F1-Score'], color='purple')
axes[0,1].set_xlabel('F1-Score')
axes[0,1].set_title('Tree Models: F1-Score (Higher is Better)')
axes[0,1].invert_yaxis()

# Precision
axes[1,0].barh(tree_comparison['Model'], tree_comparison['Precision'], color='coral')
axes[1,0].set_xlabel('Precision')
axes[1,0].set_title('Tree Models: Precision')
axes[1,0].invert_yaxis()

# ROC-AUC
axes[1,1].barh(tree_comparison['Model'], tree_comparison['ROC-AUC'], color='seagreen')
axes[1,1].set_xlabel('ROC-AUC')
axes[1,1].set_title('Tree Models: ROC-AUC')
axes[1,1].invert_yaxis()

plt.tight_layout()
plt.show()

---
## 11. Final Evaluation on Test Set

**⚠️ CRITICAL: Evaluate on the test set only once, after model selection on the validation set is finished.**

In [None]:
# Select best model: map every row of the comparison table to its fitted estimator,
# so an untuned winner is not silently replaced by a different model
fitted_models = {
    'Decision Tree': dt_model,
    'Decision Tree (Tuned)': best_dt,
    'Random Forest': rf_model,
    'Random Forest (Tuned)': best_rf,
    'Gradient Boosting': gb_model,
    'Gradient Boosting (Tuned)': best_gb
}
final_model = fitted_models[best_tree_model]

print(f"🏆 Selected Model: {best_tree_model}")
print("\n🔓 Unlocking test set...\n")

In [None]:
# Final test evaluation
y_test_pred = final_model.predict(X_test)
y_test_prob = final_model.predict_proba(X_test)[:, 1]

test_acc = accuracy_score(y_test, y_test_pred)
test_precision = precision_score(y_test, y_test_pred)
test_recall = recall_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)
test_auc = roc_auc_score(y_test, y_test_prob)

print("\n" + "="*80)
print(f"📊 FINAL TEST SET PERFORMANCE: {best_tree_model}")
print("="*80)
print(f"Accuracy:  {test_acc:.4f}")
print(f"Precision: {test_precision:.4f}")
print(f"Recall:    {test_recall:.4f}")
print(f"F1-Score:  {test_f1:.4f}")
print(f"ROC-AUC:   {test_auc:.4f}")
print("="*80)

# Classification report
print("\n📊 Detailed Classification Report:")
print("="*80)
print(classification_report(y_test, y_test_pred, target_names=['No Purchase', 'Purchase']))

In [None]:
# Final confusion matrix
cm_test = confusion_matrix(y_test, y_test_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm_test, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['No Purchase', 'Purchase'],
            yticklabels=['No Purchase', 'Purchase'])
plt.xlabel('Predicted', fontsize=11)
plt.ylabel('Actual', fontsize=11)
plt.title(f'{best_tree_model}: Confusion Matrix (Test Set)', fontsize=12, pad=15)
plt.tight_layout()
plt.show()

---
## 12. Key Takeaways

**What you should have learned:**

### 1️⃣ Tree-Based Classification Models

✅ **Decision Tree Classifier**
- Interpretable but prone to overfitting
- Uses Gini or Entropy for splits
- Good for understanding feature interactions

✅ **Random Forest Classifier**
- Robust ensemble method
- Works well on imbalanced classification when combined with `class_weight`
- Provides probability estimates
- Often best out-of-the-box performance

✅ **Gradient Boosting Classifier**
- Sequential ensemble (boosting)
- Often achieves best performance when tuned
- More sensitive to hyperparameters
- Slower training than Random Forest

### 2️⃣ Trees vs Linear Models for Classification

**Tree-Based Advantages:**
- Handle non-linear decision boundaries
- No feature scaling needed
- Automatic feature interaction detection
- Robust to outliers
- Work well with mixed data types

**Linear Model Advantages:**
- More interpretable coefficients
- Faster training and prediction
- Better with high-dimensional sparse data
- Often better-calibrated probabilities (Logistic Regression)

### 3️⃣ Handling Imbalanced Classification

**Strategies Used:**
- `class_weight='balanced'`: Adjusts for class imbalance
- `stratify=y`: Maintains class distribution in splits
- F1-Score: Better metric than accuracy
- ROC-AUC: Threshold-independent metric
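
A small worked example shows why accuracy misleads on imbalanced data. It assumes a made-up set of 100 sessions with 16 purchases and a classifier that always predicts "no purchase". The helper functions are written out by hand for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 16 purchases out of 100 sessions
y_true = [1] * 16 + [0] * 84
y_pred = [0] * 100  # a "model" that never predicts a purchase

print(f"Accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"F1-Score: {f1(y_true, y_pred):.2f}")
```

The always-negative model looks strong on accuracy yet finds no buyers at all, which the F1-score exposes.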

**Other Options (not covered):**
- SMOTE: Synthetic minority oversampling
- Undersampling majority class
- Threshold tuning based on business costs
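
Threshold tuning based on business costs can be sketched in a few lines. Everything in this example is hypothetical: the labels, the predicted probabilities, and the costs assigned to a false positive and a false negative. In practice the probabilities would come from `predict_proba` on the validation set.

```python
def total_cost(y_true, probs, threshold, cost_fp, cost_fn):
    """Sum the cost of every false positive and false negative at a threshold."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = int(p >= threshold)
        if pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost

def best_threshold(y_true, probs, cost_fp, cost_fn, candidates):
    """Pick the candidate threshold with the lowest total cost (first one on ties)."""
    return min(candidates, key=lambda t: total_cost(y_true, probs, t, cost_fp, cost_fn))

# Hypothetical validation labels and predicted purchase probabilities
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
candidates = [i / 10 for i in range(1, 10)]

# Missing a buyer is assumed to cost 5x more than a wasted offer, then the reverse
print(best_threshold(y_true, probs, cost_fp=1, cost_fn=5, candidates=candidates))
print(best_threshold(y_true, probs, cost_fp=5, cost_fn=1, candidates=candidates))
```

When false negatives are expensive the chosen threshold drops below 0.5, and when false positives are expensive it rises, so the default 0.5 cut-off is rarely the cost-optimal choice.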

### 4️⃣ Feature Importance Insights

**Key Findings (likely):**
- PageValues: Most important (higher values → purchase)
- ProductRelated_Duration: Time on product pages matters
- ExitRates/BounceRates: Negative indicators
- Month: Seasonal patterns exist

**Important Notes:**
- Feature importance ≠ causation
- Different models may rank features differently
- Always validate with domain knowledge
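
It helps to know where the `feature_importances_` values come from. In scikit-learn they are impurity-based: each split contributes its weighted impurity decrease to the feature it uses, and the totals are normalized to sum to 1. The sketch below reproduces that calculation by hand for a hypothetical two-split tree; the feature names and class counts are invented.

```python
def gini_from_counts(counts):
    """Gini impurity from per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of one split, scaled by the node's share of samples."""
    n_parent, n_left, n_right = sum(parent), sum(left), sum(right)
    return (n_parent / n_total) * (
        gini_from_counts(parent)
        - (n_left / n_parent) * gini_from_counts(left)
        - (n_right / n_parent) * gini_from_counts(right)
    )

# Hypothetical tree on 100 samples; counts are [negatives, positives]
splits = [
    ("feature_a", [84, 16], [80, 4], [4, 12]),  # root split
    ("feature_b", [4, 12], [4, 2], [0, 10]),    # split of the root's right child
]

raw = {}
for feature, parent, left, right in splits:
    raw[feature] = raw.get(feature, 0.0) + impurity_decrease(parent, left, right, n_total=100)

total = sum(raw.values())
importances = {feature: value / total for feature, value in raw.items()}
print(importances)
```

Because the root split affects every sample, it dominates the importance score. This is one reason different models, which split in different orders, can rank the same features differently.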

### 5️⃣ Model Selection Guidelines

**For Production Systems:**
1. **Start with Random Forest**: Robust, minimal tuning needed
2. **Try Gradient Boosting**: If you need max performance
3. **Consider Logistic Regression**: If interpretability crucial
4. **Avoid single Decision Tree**: Unless interpretability is paramount

**For This E-Commerce Dataset:**
- Tree models likely outperformed linear models
- Random Forest: Great balance of performance and speed
- Gradient Boosting: Possibly best F1-score
- Class imbalance (~16% purchase): addressed with stratified splits, `class_weight` where available, and F1-based tuning

---

### 📝 Reflection Questions
1. Did tree models outperform linear models? Why?
2. How did Random Forest reduce overfitting vs single tree?
3. What's the precision-recall tradeoff for this business problem?
4. Which model would you deploy and why?
5. How would you handle false positives vs false negatives?

---

### 🚀 Next Steps: Week 11
**Advanced Topics:**
- Neural Networks for classification
- XGBoost and LightGBM deep dive
- Model deployment with Flask/FastAPI
- Advanced feature engineering
- Calibration and threshold optimization

---

**AI Tech Institute** | *Building Tomorrow's AI Engineers Today*