# **AI TECH INSTITUTE** · *Intermediate AI & Data Science*
### Week 10 Session 1: Linear Models for Classification
**Instructor:** Amir Charkhi | **Goal:** Master Linear Classification Models

### Learning Objectives
- Understand logistic regression and linear classification
- Learn classification metrics: accuracy, precision, recall, F1, ROC-AUC
- Master confusion matrix interpretation
- Apply regularization for classification: Ridge and Lasso equivalents
- Handle imbalanced classes
- Compare classification models systematically

---

## 1. Import Libraries

**What you need to do:**  
Import all necessary libraries for classification tasks.

**Required imports:**
- NumPy and Pandas for data handling
- Matplotlib and Seaborn for visualization
- Scikit-learn for classification models and metrics

**💡 Hint:** We'll need `LogisticRegression`, `RidgeClassifier`, `SGDClassifier`, and classification metrics like `classification_report`, `confusion_matrix`, `roc_auc_score`.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn imports
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression, RidgeClassifier, SGDClassifier
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score, roc_curve,
    precision_recall_curve, average_precision_score
)

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("✅ All libraries imported successfully!")

---
## 2. Load the Dataset

**What you need to do:**  
Load the Online Shoppers Purchasing Intention dataset from UCI.

**Theory:**  
This dataset contains **12,330 sessions** from an online shopping website. Each session belongs to a different user over a 1-year period. The goal is to predict whether a visitor will make a purchase (Revenue: True/False).

**Features:**
- **Administrative, Administrative_Duration:** Pages and time on admin pages
- **Informational, Informational_Duration:** Pages and time on info pages
- **ProductRelated, ProductRelated_Duration:** Pages and time on product pages
- **BounceRates, ExitRates, PageValues:** Website metrics
- **SpecialDay:** Closeness to special days (Valentine's, Mother's Day, etc.)
- **Month:** Month of the year
- **OperatingSystems, Browser, Region, TrafficType:** Technical attributes
- **VisitorType:** New, Returning, or Other
- **Weekend:** Whether the session was on a weekend
- **Revenue:** Target variable (True/False) - Did the user make a purchase?

**Our Goal:** Predict **Revenue** (purchase or not) - **Binary Classification**

**💡 Hint:** We'll load the CSV directly from the UCI repository; if the URL is unreachable, point `pd.read_csv` at a local copy of the file.

In [None]:
# Load the Online Shoppers Purchasing Intention dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00468/online_shoppers_intention.csv'

print("📥 Loading Online Shoppers dataset from UCI...")
print("This dataset predicts whether a visitor will make a purchase.\n")

# Load the data
df_raw = pd.read_csv(url)

print("✅ Dataset loaded successfully!")
print(f"📊 Shape: {df_raw.shape[0]:,} rows × {df_raw.shape[1]} columns")

---
## 3. Initial Data Inspection & Preprocessing

**What you need to do:**  
Inspect the raw data and prepare it for classification.

**Tasks:**
1. Display first few rows
2. Check data types and missing values
3. Examine target variable distribution (class imbalance?)
4. Encode categorical variables
5. Handle any data quality issues

In [None]:
# Inspect raw data
print("📋 First few rows:")
print(df_raw.head())
print("\n" + "="*80 + "\n")

print("🔍 Data Info:")
df_raw.info()  # info() prints its report directly and returns None
print("\n" + "="*80 + "\n")

print("❓ Missing Values:")
print(df_raw.isnull().sum())

In [None]:
# Examine target variable
print("🎯 Target Variable Distribution:")
print("="*60)
print(df_raw['Revenue'].value_counts())
print("\n📊 Percentage:")
print(df_raw['Revenue'].value_counts(normalize=True) * 100)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

df_raw['Revenue'].value_counts().plot(kind='bar', ax=axes[0], color=['steelblue', 'coral'])
axes[0].set_title('Target Variable Distribution', fontsize=12, pad=15)
axes[0].set_xlabel('Revenue (Purchase Made)')
axes[0].set_ylabel('Count')
axes[0].set_xticklabels(['No Purchase', 'Purchase'], rotation=0)

df_raw['Revenue'].value_counts(normalize=True).plot(kind='pie', ax=axes[1], 
                                                      autopct='%1.1f%%', colors=['steelblue', 'coral'])
axes[1].set_title('Class Distribution (%)', fontsize=12, pad=15)
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()

# Check for imbalance
positive_pct = df_raw['Revenue'].value_counts(normalize=True)[True] * 100
if positive_pct < 30:
    print(f"\n⚠️ Class Imbalance Detected: Only {positive_pct:.1f}% positive class (Revenue=True)")
    print("We'll need to consider this when evaluating models.")
else:
    print(f"\n✅ Reasonably balanced: {positive_pct:.1f}% positive class")

In [None]:
# Data preprocessing
print("🧹 Preprocessing data...\n")

df = df_raw.copy()

# Convert target to binary (0/1)
df['Revenue'] = df['Revenue'].astype(int)

# Encode categorical variables
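print("📝 Encoding categorical variables...")

# Month encoding: map names to calendar numbers (the dataset spells 'June' in full).
# A linear model will then treat month as a single ordered numeric feature.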
month_map = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'June': 6,
             'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
df['Month'] = df['Month'].map(month_map)

# VisitorType: one-hot encoding
visitor_dummies = pd.get_dummies(df['VisitorType'], prefix='Visitor', drop_first=True)
visitor_dummies = visitor_dummies.astype(int)
df = pd.concat([df, visitor_dummies], axis=1)

# Weekend: already boolean, convert to int
df['Weekend'] = df['Weekend'].astype(int)

# Drop original categorical column
df = df.drop(columns=['VisitorType'])

# Verify all features are numeric
feature_cols = [col for col in df.columns if col != 'Revenue']
non_numeric = df[feature_cols].select_dtypes(exclude=[np.number]).columns.tolist()

print("✅ Preprocessing complete!")
print(f"📊 Final shape: {df.shape}")
print(f"✅ All features numeric: {len(non_numeric) == 0}")
print(f"\n📋 Feature columns ({len(feature_cols)}): {feature_cols}")

---
## 4. Train-Validation-Test Split

**⚠️ CRITICAL: Split BEFORE detailed EDA to prevent data leakage!**

**What you need to do:**  
Split data: 60% train, 20% validation, 20% test

**💡 Important for Classification:** Use the `stratify` parameter to maintain class balance across splits!

In [None]:
# Prepare X and y
X = df[feature_cols].copy()
y = df['Revenue'].copy()

print(f"🎯 Features: {len(feature_cols)} columns")
print(f"📊 X shape: {X.shape}")
print(f"📊 y shape: {y.shape}")
print("\n📊 Class distribution in full dataset:")
print(y.value_counts(normalize=True))

In [None]:
# Split data with stratification
print("✂️ Splitting data with stratification...\n")

# First split: 80% train+val, 20% test
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: 75% of temp = 60% train, 25% of temp = 20% val
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)

print("✅ Data split complete with stratification!")
print(f"\n📊 Training set:   {X_train.shape[0]:>6,} samples ({X_train.shape[0]/len(X)*100:.1f}%)")
print(f"📊 Validation set: {X_val.shape[0]:>6,} samples ({X_val.shape[0]/len(X)*100:.1f}%)")
print(f"📊 Test set:       {X_test.shape[0]:>6,} samples ({X_test.shape[0]/len(X)*100:.1f}%)")

# Verify stratification worked
print("\n✅ Class distribution maintained:")
print(f"   Train:      {y_train.value_counts(normalize=True)[1]:.3f} positive class")
print(f"   Validation: {y_val.value_counts(normalize=True)[1]:.3f} positive class")
print(f"   Test:       {y_test.value_counts(normalize=True)[1]:.3f} positive class")

print("\n🔒 Test set is now LOCKED until final evaluation!")

---
## 5. Exploratory Data Analysis (EDA)

**⚠️ IMPORTANT: Perform EDA ONLY on the training set!**

**What you need to do:**  
Understand patterns in the training data.

In [None]:
# Summary statistics by class
print("📊 Training Set Summary Statistics by Class:")
print("="*80)

train_data = X_train.copy()
train_data['Revenue'] = y_train.values

# Select a few key features to compare
key_features = ['ProductRelated', 'ProductRelated_Duration', 'BounceRates', 
                'ExitRates', 'PageValues', 'Month']

for feature in key_features:
    print(f"\n{feature}:")
    print(train_data.groupby('Revenue')[feature].describe()[['mean', '50%', 'std']])

In [None]:
# Visualize key features by class
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.ravel()

for idx, feature in enumerate(key_features):
    train_data[train_data['Revenue']==0][feature].hist(bins=30, alpha=0.6, 
                                                         label='No Purchase', ax=axes[idx], color='steelblue')
    train_data[train_data['Revenue']==1][feature].hist(bins=30, alpha=0.6, 
                                                         label='Purchase', ax=axes[idx], color='coral')
    axes[idx].set_xlabel(feature)
    axes[idx].set_ylabel('Frequency')
    axes[idx].legend()
    axes[idx].set_title(f'{feature} Distribution by Class')

plt.tight_layout()
plt.show()

In [None]:
# Correlation with target
# Use only numeric columns
numeric_train = train_data.select_dtypes(include=[np.number])
correlations = numeric_train.corr()['Revenue'].drop('Revenue').sort_values(ascending=False)

print("🔗 Top 10 Features by Correlation with Revenue (most positive first):")
print("="*60)
for feature, corr in correlations.head(10).items():
    print(f"{feature:.<45} {corr:>8.4f}")

print("\n🔗 Bottom 5 Features (most negative correlations):")
print("="*60)
for feature, corr in correlations.tail(5).items():
    print(f"{feature:.<45} {corr:>8.4f}")

In [None]:
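# Correlation heatmap
# Rank by absolute correlation so strong negative relationships are included too
top_10_features = correlations.abs().sort_values(ascending=False).head(10).index.tolist() + ['Revenue']
correlation_matrix = numeric_train[top_10_features].corr()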

plt.figure(figsize=(12, 10))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm',
            center=0, square=True, linewidths=1)
plt.title('Correlation Heatmap: Top Features vs Revenue', fontsize=14, pad=20)
plt.tight_layout()
plt.show()

---
## 6. Feature Scaling

**Theory:**  
Linear classification models benefit from feature scaling, especially when using regularization.

**What you need to do:**  
Standardize features using StandardScaler.

**⚠️ CRITICAL:** Fit the scaler ONLY on training data!
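
To make the rule concrete, here is a minimal pure-Python sketch of what standardization does. The helper names `fit_standardizer` and `transform` and the sample durations are made up for illustration and are not part of scikit-learn. The mean and standard deviation are learned from the training values only and then reused, unchanged, on validation data.

```python
import statistics

def fit_standardizer(train_values):
    """Learn mean and population std (ddof=0) from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # avoid division by zero for constant features
    return mean, std

def transform(values, mean, std):
    """Apply previously learned statistics; never re-estimate them here."""
    return [(v - mean) / std for v in values]

# Hypothetical page-duration values, for illustration only
train_durations = [10.0, 20.0, 30.0, 40.0]
val_durations = [25.0, 100.0]

mean, std = fit_standardizer(train_durations)
train_scaled = transform(train_durations, mean, std)
val_scaled = transform(val_durations, mean, std)

print(f"train mean={mean:.2f}, train std={std:.4f}")
print("scaled validation values:", [round(v, 3) for v in val_scaled])
```

The scaled validation values need not have mean 0 or standard deviation 1. That is expected: they are expressed in units learned from the training set.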

In [None]:
# Feature scaling
print("⚖️ Scaling features...\n")

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print("✅ Features scaled successfully!")
print(f"\n📊 Scaled training features - Mean: {X_train_scaled.mean():.6f}, Std: {X_train_scaled.std():.6f}")

---
## 7. Helper Function: Classification Metrics

**What you need to do:**  
Create a reusable function to evaluate classification models.

**Classification Metrics:**
- **Accuracy:** Overall correctness
- **Precision:** Of predicted positives, how many are actually positive?
- **Recall (Sensitivity):** Of actual positives, how many did we find?
- **F1-Score:** Harmonic mean of precision and recall
- **ROC-AUC:** Area under ROC curve (probability ranking metric)
- **Confusion Matrix:** Breakdown of predictions
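
As a worked example of the definitions above, the sketch below computes each metric directly from the four confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical, chosen only to mimic an imbalanced validation set.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 850 actual negatives, 150 actual positives
metrics = classification_metrics(tn=800, fp=50, fn=90, tp=60)

# A "model" that always predicts No Purchase on the same data
always_negative = classification_metrics(tn=850, fp=0, fn=150, tp=0)

for name, value in metrics.items():
    print(f"{name:<10} {value:.4f}")
print(f"always-negative accuracy: {always_negative['accuracy']:.4f}, "
      f"recall: {always_negative['recall']:.4f}")
```

The always-negative baseline reaches nearly the same accuracy while finding no purchasers at all, which is why precision, recall and F1 are reported alongside accuracy.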

In [None]:
def evaluate_classifier(model, X_train, y_train, X_val, y_val, model_name="Model"):
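    """
    Evaluate a fitted binary classifier on the training and validation sets.

    Prints accuracy, precision, recall, F1, ROC-AUC and the validation confusion
    matrix, and returns the same values in a dict for later comparison.
    """
    # Predictions
    y_train_pred = model.predict(X_train)
    y_val_pred = model.predict(X_val)
    
    # Ranking scores for ROC-AUC: probabilities if available, else decision-function margins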
    if hasattr(model, 'predict_proba'):
        y_val_prob = model.predict_proba(X_val)[:, 1]
    elif hasattr(model, 'decision_function'):
        y_val_prob = model.decision_function(X_val)
    else:
        y_val_prob = None
    
    # Calculate metrics
    train_acc = accuracy_score(y_train, y_train_pred)
    val_acc = accuracy_score(y_val, y_val_pred)
    val_precision = precision_score(y_val, y_val_pred)
    val_recall = recall_score(y_val, y_val_pred)
    val_f1 = f1_score(y_val, y_val_pred)
    
    if y_val_prob is not None:
        val_auc = roc_auc_score(y_val, y_val_prob)
    else:
        val_auc = None
    
    # Print results
    print(f"📊 {model_name} Performance:")
    print("="*70)
    print(f"{'Metric':<30} {'Training':>15} {'Validation':>15}")
    print("="*70)
    print(f"{'Accuracy':<30} {train_acc:>15.4f} {val_acc:>15.4f}")
    print(f"{'Precision':<30} {'':>15} {val_precision:>15.4f}")
    print(f"{'Recall':<30} {'':>15} {val_recall:>15.4f}")
    print(f"{'F1-Score':<30} {'':>15} {val_f1:>15.4f}")
    if val_auc is not None:
        print(f"{'ROC-AUC':<30} {'':>15} {val_auc:>15.4f}")
    print("="*70)
    
    # Confusion Matrix
    cm = confusion_matrix(y_val, y_val_pred)
    print("\n📊 Confusion Matrix (Validation):")
    print("="*40)
    print(f"                 Predicted No  Predicted Yes")
    print(f"Actual No        {cm[0,0]:>12}  {cm[0,1]:>13}")
    print(f"Actual Yes       {cm[1,0]:>12}  {cm[1,1]:>13}")
    print("="*40)
    
    return {
        'train_acc': train_acc,
        'val_acc': val_acc,
        'val_precision': val_precision,
        'val_recall': val_recall,
        'val_f1': val_f1,
        'val_auc': val_auc,
        'confusion_matrix': cm,
        'y_val_pred': y_val_pred,
        'y_val_prob': y_val_prob
    }

print("✅ Helper function defined!")

---
## 8. Model 1: Logistic Regression

**📚 Theory:**  
Logistic Regression is the fundamental linear classification algorithm. Despite its name, it's a **classification** algorithm, not regression.

**How It Works:**
1. Computes linear combination: $z = \beta_0 + \beta_1 x_1 + ... + \beta_n x_n$
2. Applies sigmoid function: $P(y=1|x) = \frac{1}{1 + e^{-z}}$
3. Outputs probability between 0 and 1
4. Predicts class based on threshold (usually 0.5)

**Mathematical Form:**
$$P(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_n x_n)}}$$

**Loss Function (Log Loss / Cross-Entropy):**
$$L = -\frac{1}{m} \sum_{i=1}^{m} [y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i)]$$
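
The formulas above can be traced by hand with a few lines of Python. This is a sketch with made-up coefficients and data points (`beta0`, `betas`, `X` and `y` are illustrative assumptions, not fitted values). It covers only the forward computation, not how scikit-learn finds the coefficients.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, betas):
    """P(y=1 | x) for one sample: sigmoid of the linear combination."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

def log_loss(y_true, y_prob, eps=1e-15):
    """Average cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical coefficients and three samples with two features each
beta0, betas = -1.0, [2.0, -0.5]
X = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.0]]
y = [1, 0, 1]

probs = [predict_proba(x, beta0, betas) for x in X]
labels = [int(p >= 0.5) for p in probs]  # default 0.5 threshold
loss = log_loss(y, probs)

print("probabilities:", [round(p, 3) for p in probs])
print("predicted labels:", labels)
print(f"log loss: {loss:.4f}")
```

The third sample has $z = 0$, so its probability is exactly 0.5 and it sits on the decision boundary.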

**Key Parameters:**
- **C:** Inverse of regularization strength (higher C = less regularization)
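- **penalty:** Type of regularization ('l1', 'l2', 'elasticnet', or None; older scikit-learn releases used the string 'none')
- **solver:** Algorithm ('lbfgs', 'liblinear', 'saga'); 'l1' needs 'liblinear' or 'saga', and 'elasticnet' needs 'saga'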
- **class_weight:** Handle imbalanced classes ('balanced' or dict)
- **max_iter:** Maximum iterations for convergence

**Pros:**
- Fast training and prediction
- Outputs probabilities that are usually reasonably well calibrated
- Interpretable coefficients
- Works well with high-dimensional data
- Supports regularization

**Cons:**
- Assumes linear decision boundary
- Can't capture complex patterns
- Sensitive to outliers
- Requires feature scaling for regularization

**When to Use:**
- As a strong baseline
- When you need probability estimates
- When interpretability is important
- For real-time prediction systems

**📖 References:**
- [Scikit-learn: Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
- [ISL Book - Chapter 4: Logistic Regression](https://www.statlearning.com/)

---

In [None]:
# Train Logistic Regression
print("🚀 Training Logistic Regression...\n")

lr_model = LogisticRegression(random_state=42, max_iter=1000)
lr_model.fit(X_train_scaled, y_train)

print("✅ Logistic Regression trained!\n")

# Evaluate
lr_results = evaluate_classifier(lr_model, X_train_scaled, y_train, 
                                  X_val_scaled, y_val, "Logistic Regression")

### Feature Importance (Coefficients)

In [None]:
# Feature importance
lr_coef = pd.DataFrame({
    'Feature': feature_cols,
    'Coefficient': lr_model.coef_[0]
}).sort_values('Coefficient', key=abs, ascending=False)

print("🎯 Logistic Regression Coefficients (Top 10):")
print("="*60)
print(lr_coef.head(10).to_string(index=False))

# Visualize
plt.figure(figsize=(10, 6))
top_coef = lr_coef.head(15).sort_values('Coefficient')
plt.barh(top_coef['Feature'], top_coef['Coefficient'])
plt.xlabel('Coefficient Value', fontsize=11)
plt.title('Logistic Regression: Top 15 Coefficients', fontsize=12, pad=15)
plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
plt.tight_layout()
plt.show()

print("\n💡 Positive coefficients increase purchase probability.")
print("   Negative coefficients decrease purchase probability.")

### ROC Curve and Precision-Recall Curve

In [None]:
# ROC and Precision-Recall curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROC Curve
fpr, tpr, _ = roc_curve(y_val, lr_results['y_val_prob'])
axes[0].plot(fpr, tpr, linewidth=2, label=f"ROC (AUC = {lr_results['val_auc']:.3f})")
axes[0].plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random Classifier')
axes[0].set_xlabel('False Positive Rate', fontsize=11)
axes[0].set_ylabel('True Positive Rate', fontsize=11)
axes[0].set_title('ROC Curve', fontsize=12, pad=15)
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Precision-Recall Curve
precision, recall, _ = precision_recall_curve(y_val, lr_results['y_val_prob'])
avg_precision = average_precision_score(y_val, lr_results['y_val_prob'])
axes[1].plot(recall, precision, linewidth=2, label=f"PR (AP = {avg_precision:.3f})")
axes[1].axhline(y=y_val.mean(), color='k', linestyle='--', linewidth=1, label='Baseline')
axes[1].set_xlabel('Recall', fontsize=11)
axes[1].set_ylabel('Precision', fontsize=11)
axes[1].set_title('Precision-Recall Curve', fontsize=12, pad=15)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("💡 ROC-AUC: Good for balanced classes. Measures ranking ability.")
print("💡 PR Curve: Better for imbalanced classes. Shows precision-recall tradeoff.")

### Cross-Validation

In [None]:
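# Cross-validation
# Note: X_train_scaled was standardized with statistics from the full training set,
# so each CV fold has seen a little information from its held-out part. Wrapping the
# scaler and model in a sklearn Pipeline would refit the scaler inside every fold.
print("🔄 Performing 5-Fold Cross-Validation...\n")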

cv_scores_acc = cross_val_score(lr_model, X_train_scaled, y_train, cv=5, scoring='accuracy', n_jobs=-1)
cv_scores_f1 = cross_val_score(lr_model, X_train_scaled, y_train, cv=5, scoring='f1', n_jobs=-1)
cv_scores_auc = cross_val_score(lr_model, X_train_scaled, y_train, cv=5, scoring='roc_auc', n_jobs=-1)

print("📊 Cross-Validation Results:")
print("="*60)
print(f"Accuracy: {cv_scores_acc.mean():.4f} (± {cv_scores_acc.std():.4f})")
print(f"F1-Score: {cv_scores_f1.mean():.4f} (± {cv_scores_f1.std():.4f})")
print(f"ROC-AUC:  {cv_scores_auc.mean():.4f} (± {cv_scores_auc.std():.4f})")

### Hyperparameter Tuning: Logistic Regression

In [None]:
# Hyperparameter tuning
print("🎯 Tuning Logistic Regression hyperparameters...\n")

param_grid_lr = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

lr_grid = GridSearchCV(
    LogisticRegression(random_state=42, max_iter=1000),
    param_grid_lr,
    cv=5,
    scoring='roc_auc',
    n_jobs=-1,
    verbose=1
)

lr_grid.fit(X_train_scaled, y_train)

print(f"\n✅ Best parameters: {lr_grid.best_params_}")
print(f"📊 Best CV ROC-AUC: {lr_grid.best_score_:.4f}")

In [None]:
# Evaluate tuned model
best_lr = lr_grid.best_estimator_
lr_tuned_results = evaluate_classifier(best_lr, X_train_scaled, y_train,
                                        X_val_scaled, y_val, "Logistic Regression (Tuned)")

---
## 9. Model 2: Ridge Classifier (L2 Regularization)

**📚 Theory:**  
Ridge Classifier converts classification to regression by encoding the two labels as -1 and +1, then fits Ridge regression to those targets. It's fast and works well with many features.

**Key Characteristics:**
- Uses squared-error (least-squares) loss on the encoded labels, not hinge or logistic loss
- L2 regularization like Ridge regression
- Has no `predict_proba`; `decision_function` returns signed scores rather than probabilities
- Very fast training
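
The idea of "classification by regression" can be sketched for a single feature, where ridge regression has a simple closed form. The helper names and the toy data below are assumptions for illustration; this is not scikit-learn's implementation, only the same recipe in miniature: encode labels as -1/+1, fit a penalized least-squares line, and predict from the sign of the score.

```python
def fit_ridge_classifier_1d(xs, labels, alpha=1.0):
    """Fit w, b by ridge regression on labels encoded as -1/+1 (one feature)."""
    ys = [1.0 if label == 1 else -1.0 for label in labels]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Centering the data leaves the intercept unpenalized
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    b = y_mean - w * x_mean
    return w, b

def decision_function(x, w, b):
    """Signed score: positive means class 1, negative means class 0."""
    return w * x + b

def predict(x, w, b):
    return 1 if decision_function(x, w, b) > 0 else 0

# Toy data: small x values are class 0, large x values are class 1
xs = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]

w, b = fit_ridge_classifier_1d(xs, labels, alpha=1.0)
w_strong, _ = fit_ridge_classifier_1d(xs, labels, alpha=100.0)

print(f"w={w:.4f}, b={b:.4f}")
print("predictions:", [predict(x, w, b) for x in xs])
print(f"slope with alpha=100: {w_strong:.4f}")
```

A larger `alpha` produces a smaller slope, which is the L2 shrinkage the alpha grid controls in the tuning step.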

**Pros:**
- Extremely fast
- Good with high-dimensional data
- Handles multicollinearity well

**Cons:**
- No probability estimates
- Less common than Logistic Regression
- Squared-error loss also penalizes points far on the correct side of the boundary, which may not suit all problems

**📖 References:**
- [Scikit-learn: Ridge Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html)

---

In [None]:
# Train Ridge Classifier
print("🚀 Training Ridge Classifier...\n")

ridge_clf = RidgeClassifier(random_state=42)
ridge_clf.fit(X_train_scaled, y_train)

print("✅ Ridge Classifier trained!\n")

# Evaluate (note: no predict_proba, so ROC-AUC is skipped for this model)
y_train_pred_ridge = ridge_clf.predict(X_train_scaled)
y_val_pred_ridge = ridge_clf.predict(X_val_scaled)

train_acc_ridge = accuracy_score(y_train, y_train_pred_ridge)
val_acc_ridge = accuracy_score(y_val, y_val_pred_ridge)
val_precision_ridge = precision_score(y_val, y_val_pred_ridge)
val_recall_ridge = recall_score(y_val, y_val_pred_ridge)
val_f1_ridge = f1_score(y_val, y_val_pred_ridge)

print("📊 Ridge Classifier Performance:")
print("="*70)
print(f"{'Metric':<30} {'Training':>15} {'Validation':>15}")
print("="*70)
print(f"{'Accuracy':<30} {train_acc_ridge:>15.4f} {val_acc_ridge:>15.4f}")
print(f"{'Precision':<30} {'':>15} {val_precision_ridge:>15.4f}")
print(f"{'Recall':<30} {'':>15} {val_recall_ridge:>15.4f}")
print(f"{'F1-Score':<30} {'':>15} {val_f1_ridge:>15.4f}")
print("="*70)
print("\n⚠️ Note: Ridge Classifier doesn't provide probability estimates.")

### Hyperparameter Tuning: Ridge Classifier

In [None]:
# Hyperparameter tuning
print("🎯 Tuning Ridge Classifier...\n")

param_grid_ridge = {
    'alpha': [0.001, 0.01, 0.1, 1.0, 10, 100]
}

ridge_grid = GridSearchCV(
    RidgeClassifier(random_state=42),
    param_grid_ridge,
    cv=5,
    scoring='f1',
    n_jobs=-1,
    verbose=1
)

ridge_grid.fit(X_train_scaled, y_train)

print(f"\n✅ Best parameters: {ridge_grid.best_params_}")
print(f"📊 Best CV F1-Score: {ridge_grid.best_score_:.4f}")

In [None]:
# Evaluate tuned Ridge
best_ridge = ridge_grid.best_estimator_
y_val_pred_ridge_tuned = best_ridge.predict(X_val_scaled)

val_acc_ridge_tuned = accuracy_score(y_val, y_val_pred_ridge_tuned)
val_precision_ridge_tuned = precision_score(y_val, y_val_pred_ridge_tuned)
val_recall_ridge_tuned = recall_score(y_val, y_val_pred_ridge_tuned)
val_f1_ridge_tuned = f1_score(y_val, y_val_pred_ridge_tuned)

print("📊 Ridge Classifier (Tuned) Performance:")
print("="*70)
print(f"Accuracy:  {val_acc_ridge_tuned:.4f}")
print(f"Precision: {val_precision_ridge_tuned:.4f}")
print(f"Recall:    {val_recall_ridge_tuned:.4f}")
print(f"F1-Score:  {val_f1_ridge_tuned:.4f}")
print("="*70)

---
## 10. Model 3: Linear SVC (Support Vector Classifier)

**📚 Theory:**  
Linear SVC finds the hyperplane that maximizes the margin between classes. It's based on Support Vector Machine theory but optimized for linear decision boundaries.

**How It Works:**
- Finds optimal separating hyperplane
- Maximizes margin (distance to nearest points)
- Only support vectors (points near boundary) matter
- Uses hinge loss

**Mathematical Objective:**
$$\min_{w,b} \frac{1}{2}||w||^2 + C \sum_{i=1}^{m} \max(0, 1 - y_i(w^T x_i + b))$$
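
To see how the objective is assembled, the sketch below evaluates it for a fixed, hand-picked `w` and `b` on three toy points. It assumes labels are encoded as -1/+1, as the formula requires. The function name `svm_objective` and the data are illustrative only, and no optimization is performed.

```python
def svm_objective(w, b, X, y, C=1.0, squared=False):
    """Return (objective, per-sample losses) for labels y in {-1, +1}."""
    losses = []
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss = max(0.0, 1.0 - margin)  # zero once the margin reaches 1
        losses.append(loss ** 2 if squared else loss)
    regularizer = 0.5 * sum(wj * wj for wj in w)
    return regularizer + C * sum(losses), losses

# Toy points: the second one is correctly classified but inside the margin
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

objective, losses = svm_objective(w, b, X, y, C=1.0)
objective_sq, _ = svm_objective(w, b, X, y, C=1.0, squared=True)

print("hinge losses:", losses)
print("objective (hinge):", objective)
print("objective (squared hinge):", objective_sq)
```

Only the point inside the margin contributes a loss; points with margin of at least 1 contribute nothing, which is why points near the boundary dominate the solution.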

**Key Parameters:**
- **C:** Regularization parameter (inverse strength)
- **loss:** 'hinge' or 'squared_hinge'
- **penalty:** 'l1' or 'l2'
- **dual:** Solve dual or primal problem

**Pros:**
- Effective in high dimensions
- Memory efficient (the fitted model stores only a weight vector, not the support vectors)
- Works well when classes are separable

**Cons:**
- Sensitive to feature scaling
- No probability estimates by default
- Sensitive to C parameter

**📖 References:**
- [Scikit-learn: Linear SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html)
- [ISL Book - Chapter 9: Support Vector Machines](https://www.statlearning.com/)

---

In [None]:
# Train Linear SVC
print("🚀 Training Linear SVC...\n")

svc_model = LinearSVC(random_state=42, max_iter=2000, dual=False)
svc_model.fit(X_train_scaled, y_train)

print("✅ Linear SVC trained!\n")

# Evaluate
y_train_pred_svc = svc_model.predict(X_train_scaled)
y_val_pred_svc = svc_model.predict(X_val_scaled)

train_acc_svc = accuracy_score(y_train, y_train_pred_svc)
val_acc_svc = accuracy_score(y_val, y_val_pred_svc)
val_precision_svc = precision_score(y_val, y_val_pred_svc)
val_recall_svc = recall_score(y_val, y_val_pred_svc)
val_f1_svc = f1_score(y_val, y_val_pred_svc)

# Get decision function for AUC
y_val_decision = svc_model.decision_function(X_val_scaled)
val_auc_svc = roc_auc_score(y_val, y_val_decision)

print("📊 Linear SVC Performance:")
print("="*70)
print(f"{'Metric':<30} {'Training':>15} {'Validation':>15}")
print("="*70)
print(f"{'Accuracy':<30} {train_acc_svc:>15.4f} {val_acc_svc:>15.4f}")
print(f"{'Precision':<30} {'':>15} {val_precision_svc:>15.4f}")
print(f"{'Recall':<30} {'':>15} {val_recall_svc:>15.4f}")
print(f"{'F1-Score':<30} {'':>15} {val_f1_svc:>15.4f}")
print(f"{'ROC-AUC':<30} {'':>15} {val_auc_svc:>15.4f}")
print("="*70)

### Hyperparameter Tuning: Linear SVC

In [None]:
# Hyperparameter tuning
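print("🎯 Tuning Linear SVC...\n")

param_grid_svc = [
    # loss='hinge' with an L2 penalty is only supported by the dual solver
    {'C': [0.001, 0.01, 0.1, 1, 10], 'loss': ['hinge'], 'dual': [True]},
    # squared_hinge works with the primal solver (dual=False)
    {'C': [0.001, 0.01, 0.1, 1, 10], 'loss': ['squared_hinge'], 'dual': [False]},
]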

svc_grid = GridSearchCV(
    LinearSVC(random_state=42, max_iter=2000, dual=False),
    param_grid_svc,
    cv=5,
    scoring='f1',
    n_jobs=-1,
    verbose=1
)

svc_grid.fit(X_train_scaled, y_train)

print(f"\n✅ Best parameters: {svc_grid.best_params_}")
print(f"📊 Best CV F1-Score: {svc_grid.best_score_:.4f}")

In [None]:
# Evaluate tuned SVC
best_svc = svc_grid.best_estimator_
y_val_pred_svc_tuned = best_svc.predict(X_val_scaled)
y_val_decision_tuned = best_svc.decision_function(X_val_scaled)

val_acc_svc_tuned = accuracy_score(y_val, y_val_pred_svc_tuned)
val_precision_svc_tuned = precision_score(y_val, y_val_pred_svc_tuned)
val_recall_svc_tuned = recall_score(y_val, y_val_pred_svc_tuned)
val_f1_svc_tuned = f1_score(y_val, y_val_pred_svc_tuned)
val_auc_svc_tuned = roc_auc_score(y_val, y_val_decision_tuned)

print("📊 Linear SVC (Tuned) Performance:")
print("="*70)
print(f"Accuracy:  {val_acc_svc_tuned:.4f}")
print(f"Precision: {val_precision_svc_tuned:.4f}")
print(f"Recall:    {val_recall_svc_tuned:.4f}")
print(f"F1-Score:  {val_f1_svc_tuned:.4f}")
print(f"ROC-AUC:   {val_auc_svc_tuned:.4f}")
print("="*70)

---
## 11. Model Comparison

**What you need to do:**  
Compare all linear classification models.

In [None]:
# Create comparison table
comparison_df = pd.DataFrame({
    'Model': [
        'Logistic Regression',
        'Logistic Regression (Tuned)',
        'Ridge Classifier',
        'Ridge Classifier (Tuned)',
        'Linear SVC',
        'Linear SVC (Tuned)'
    ],
    'Accuracy': [
        lr_results['val_acc'],
        lr_tuned_results['val_acc'],
        val_acc_ridge,
        val_acc_ridge_tuned,
        val_acc_svc,
        val_acc_svc_tuned
    ],
    'Precision': [
        lr_results['val_precision'],
        lr_tuned_results['val_precision'],
        val_precision_ridge,
        val_precision_ridge_tuned,
        val_precision_svc,
        val_precision_svc_tuned
    ],
    'Recall': [
        lr_results['val_recall'],
        lr_tuned_results['val_recall'],
        val_recall_ridge,
        val_recall_ridge_tuned,
        val_recall_svc,
        val_recall_svc_tuned
    ],
    'F1-Score': [
        lr_results['val_f1'],
        lr_tuned_results['val_f1'],
        val_f1_ridge,
        val_f1_ridge_tuned,
        val_f1_svc,
        val_f1_svc_tuned
    ],
    'ROC-AUC': [
        lr_results['val_auc'],
        lr_tuned_results['val_auc'],
        np.nan,
        np.nan,
        val_auc_svc,
        val_auc_svc_tuned
    ]
})

# Sort by F1-Score
comparison_df = comparison_df.sort_values('F1-Score', ascending=False)

print("\n" + "="*90)
print("📊 LINEAR CLASSIFICATION MODELS COMPARISON - VALIDATION SET")
print("="*90)
print(comparison_df.to_string(index=False))
print("="*90)

best_model_name = comparison_df.iloc[0]['Model']
print(f"\n🏆 BEST MODEL: {best_model_name}")
print(f"   F1-Score: {comparison_df.iloc[0]['F1-Score']:.4f}")
print(f"   Accuracy: {comparison_df.iloc[0]['Accuracy']:.4f}")

In [None]:
# Visualize comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy
axes[0,0].barh(comparison_df['Model'], comparison_df['Accuracy'], color='steelblue')
axes[0,0].set_xlabel('Accuracy')
axes[0,0].set_title('Model Comparison: Accuracy')
axes[0,0].invert_yaxis()

# Precision
axes[0,1].barh(comparison_df['Model'], comparison_df['Precision'], color='coral')
axes[0,1].set_xlabel('Precision')
axes[0,1].set_title('Model Comparison: Precision')
axes[0,1].invert_yaxis()

# Recall
axes[1,0].barh(comparison_df['Model'], comparison_df['Recall'], color='seagreen')
axes[1,0].set_xlabel('Recall')
axes[1,0].set_title('Model Comparison: Recall')
axes[1,0].invert_yaxis()

# F1-Score
axes[1,1].barh(comparison_df['Model'], comparison_df['F1-Score'], color='purple')
axes[1,1].set_xlabel('F1-Score')
axes[1,1].set_title('Model Comparison: F1-Score (Higher is Better)')
axes[1,1].invert_yaxis()

plt.tight_layout()
plt.show()

---
## 12. Final Evaluation on Test Set

**⚠️ CRITICAL: Test set evaluation for best model**

**What you need to do:**  
Evaluate the best model on held-out test data.

In [None]:
# Select best model: map every name in the comparison table to its fitted estimator,
# so the evaluated model always matches the name that is reported
fitted_models = {
    'Logistic Regression': lr_model,
    'Logistic Regression (Tuned)': best_lr,
    'Ridge Classifier': ridge_clf,
    'Ridge Classifier (Tuned)': best_ridge,
    'Linear SVC': svc_model,
    'Linear SVC (Tuned)': best_svc,
}
final_model = fitted_models[best_model_name]

print(f"🏆 Selected Model: {best_model_name}")
print("\n🔓 Unlocking test set for final evaluation...\n")

In [None]:
# Final test set evaluation
y_test_pred = final_model.predict(X_test_scaled)

test_acc = accuracy_score(y_test, y_test_pred)
test_precision = precision_score(y_test, y_test_pred)
test_recall = recall_score(y_test, y_test_pred)
test_f1 = f1_score(y_test, y_test_pred)

# Get probabilities/decision function if available
if hasattr(final_model, 'predict_proba'):
    y_test_prob = final_model.predict_proba(X_test_scaled)[:, 1]
    test_auc = roc_auc_score(y_test, y_test_prob)
elif hasattr(final_model, 'decision_function'):
    y_test_decision = final_model.decision_function(X_test_scaled)
    test_auc = roc_auc_score(y_test, y_test_decision)
else:
    test_auc = None

print("\n" + "="*80)
print(f"📊 FINAL TEST SET PERFORMANCE: {best_model_name}")
print("="*80)
print(f"Accuracy:  {test_acc:.4f}")
print(f"Precision: {test_precision:.4f}")
print(f"Recall:    {test_recall:.4f}")
print(f"F1-Score:  {test_f1:.4f}")
if test_auc is not None:
    print(f"ROC-AUC:   {test_auc:.4f}")
print("="*80)

# Detailed classification report
print("\n📊 Detailed Classification Report (Test Set):")
print("="*80)
print(classification_report(y_test, y_test_pred, target_names=['No Purchase', 'Purchase']))

In [None]:
# Confusion Matrix Visualization
cm_test = confusion_matrix(y_test, y_test_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm_test, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['No Purchase', 'Purchase'],
            yticklabels=['No Purchase', 'Purchase'])
plt.xlabel('Predicted Label', fontsize=11)
plt.ylabel('True Label', fontsize=11)
plt.title(f'Confusion Matrix: {best_model_name} (Test Set)', fontsize=12, pad=15)
plt.tight_layout()
plt.show()

print("\n📊 Confusion Matrix Breakdown:")
print(f"   True Negatives:  {cm_test[0,0]:>6,} (Correct No Purchase)")
print(f"   False Positives: {cm_test[0,1]:>6,} (Predicted Purchase, Actually No)")
print(f"   False Negatives: {cm_test[1,0]:>6,} (Predicted No, Actually Purchase)")
print(f"   True Positives:  {cm_test[1,1]:>6,} (Correct Purchase)")

---
## 13. Key Takeaways & Classification Insights

**What you should have learned:**

### 1️⃣ Linear Classification Models

✅ **Logistic Regression**
- Standard baseline for binary classification
- Outputs probabilities that are usually reasonably well calibrated
- Supports L1, L2, and ElasticNet regularization
- Use when: You need probabilities or interpretability

✅ **Ridge Classifier**
- Fast alternative to Logistic Regression
- No probability estimates
- Good with high-dimensional data
- Use when: Speed matters, probabilities not needed

✅ **Linear SVC**
- Based on Support Vector Machine theory
- Maximizes margin between classes
- Effective in high dimensions
- Use when: Classes are well-separated

### 2️⃣ Classification Metrics Deep Dive

**Accuracy:** Overall correctness
- Good for: Balanced classes
- Bad for: Imbalanced classes

**Precision:** How many predicted positives are correct?
- Use when: False positives are costly
- Example: Spam detection (don't want to mark real emails as spam)

**Recall (Sensitivity):** How many actual positives did we find?
- Use when: False negatives are costly
- Example: Disease detection (don't want to miss sick patients)

**F1-Score:** Balance of precision and recall
- Use when: You want balance between precision/recall
- Good for: Imbalanced classes

**ROC-AUC:** Probability ranking metric
- Use when: You care about ranking quality
- Threshold-independent
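
One way to see why ROC-AUC is a ranking metric: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. The sketch below computes it by brute force over all positive/negative pairs. The function name `roc_auc_pairwise` and the scores are illustrative, and the quadratic loop is only suitable for small examples.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = roc_auc_pairwise(y_true, scores)
# Rescaling the scores keeps their order, so the AUC does not change
auc_rescaled = roc_auc_pairwise(y_true, [10 * s for s in scores])

print(f"AUC: {auc:.4f}")
print(f"AUC after rescaling scores: {auc_rescaled:.4f}")
```

Because only the ordering of scores matters, decision-function margins can be used for ROC-AUC in place of probabilities.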

### 3️⃣ Important Classification Concepts

**Class Imbalance:**
- Our dataset: ~84% no purchase, ~16% purchase
- Solutions: class_weight='balanced', SMOTE, threshold tuning
- Metrics: Prefer F1, precision/recall, PR-AUC over accuracy
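
For reference, scikit-learn documents the `class_weight='balanced'` heuristic as `n_samples / (n_classes * np.bincount(y))`. The pure-Python sketch below applies that formula to a made-up label vector with roughly this dataset's imbalance. The helper name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical labels: 845 no-purchase sessions, 155 purchase sessions
y = [0] * 845 + [1] * 155
weights = balanced_class_weights(y)

for label in sorted(weights):
    print(f"class {label}: weight {weights[label]:.4f}")

# After weighting, both classes contribute the same total weight to the loss
print("total weight per class:", 845 * weights[0], 155 * weights[1])
```

Errors on the minority class are weighted about five times more heavily here, which usually raises recall at the cost of some precision.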

**Threshold Tuning:**
- Default: 0.5 probability threshold
- Can adjust based on business needs
- Higher threshold → More precision, less recall
- Lower threshold → More recall, less precision
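
The trade-off can be checked on a small example. The sketch below applies three thresholds to a made-up set of predicted probabilities. `precision_recall_at`, the labels and the probabilities are all illustrative assumptions. On real data the trend is typical rather than guaranteed at every step.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when predicting positive for prob >= threshold."""
    preds = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical validation labels and predicted purchase probabilities
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.65, 0.70, 0.90]

results = {t: precision_recall_at(y_true, y_prob, t) for t in (0.3, 0.5, 0.8)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold:.1f}  precision={precision:.3f}  recall={recall:.3f}")
```

In practice the threshold would be chosen on the validation set, not the test set, to match the relative cost of false positives and false negatives.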

**Stratification:**
- Always use `stratify=y` in train_test_split
- Maintains class distribution across splits
- Critical for imbalanced datasets
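
Conceptually, stratification means splitting each class separately and then combining the pieces. The sketch below shows that idea with the standard library only. `stratified_split` is a hypothetical helper, not how `train_test_split` is implemented, and the labels are made up.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels with 15.5% positives
labels = [1] * 155 + [0] * 845
train_idx, test_idx = stratified_split(labels, test_size=0.2)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(f"train positives: {train_rate:.3f}, test positives: {test_rate:.3f}")
```

Both parts end up with the same positive rate, whereas a purely random split of a small, imbalanced dataset can drift noticeably.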

### 4️⃣ Model Selection for Classification

**Start with Logistic Regression:**
- Excellent baseline
- Fast, interpretable, probabilities
- Hard to beat on many problems

**Try Ridge Classifier if:**
- You need speed
- You don't need probabilities
- You have many features

**Try Linear SVC if:**
- Classes are well-separated
- You want margin maximization
- High-dimensional space

### 5️⃣ Real-World Insights

- **PageValues** was likely the most important feature (higher page values → more likely to purchase)
- **ProductRelated_Duration** matters (time spent on product pages)
- **BounceRates and ExitRates** are negative indicators
- Linear models can be competitive for binary classification even with imbalanced data, as long as they are judged with imbalance-aware metrics

---

### 📝 Reflection Questions
1. Why is accuracy not the best metric for imbalanced classification?
2. When would you prefer precision over recall?
3. How does regularization affect classification boundaries?
4. What's the difference between ROC-AUC and PR-AUC?
5. Why might Logistic Regression outperform more complex models?

---

### 🚀 Next Steps: Week 10 Session 2
**Tree-Based Classification:**
- Decision Trees for classification
- Random Forests
- Gradient Boosting
- Feature importance
- Handling non-linear decision boundaries

---

**AI Tech Institute** | *Building Tomorrow's AI Engineers Today*