# Logistic Regression with Regularization

This notebook implements logistic regression with L2 (Ridge), L1 (Lasso), and Elastic Net (L1 + L2) regularization from scratch using NumPy.

## Mathematical Foundation

### Standard Logistic Regression Cost Function
$$J(w,b) = -\frac{1}{m} \sum_{i=1}^m [y^{(i)} \log(h_w(x^{(i)})) + (1-y^{(i)}) \log(1-h_w(x^{(i)}))]$$

where $h_w(x) = \sigma(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$ is the predicted probability of the positive class and $m$ is the number of training examples.
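
As a quick sanity check of the formula, the sketch below evaluates $J(w,b)$ on a tiny hand-made dataset. The names `sigmoid`, `logistic_cost`, `X_toy`, and `y_toy` are illustrative assumptions and are not part of the notebook's classes. With $w = 0$ and $b = 0$ every prediction is 0.5, so the cost should equal $\log 2$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(w, b, X, y, eps=1e-15):
    """Mean cross-entropy J(w, b) over the m rows of X."""
    h = np.clip(sigmoid(X @ w + b), eps, 1 - eps)  # keep log() finite
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy data: hypothetical values chosen only for illustration
X_toy = np.array([[0.5, 1.0], [-1.0, 0.3], [2.0, -0.7]])
y_toy = np.array([1, 0, 1])

# All-zero parameters give h = 0.5 for every row, hence J = log(2)
cost_at_zero = logistic_cost(np.zeros(2), 0.0, X_toy, y_toy)
print(cost_at_zero, np.log(2))
```

This is also the starting cost of any model initialized with zero weights and zero bias, before any penalty is added.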

### Ridge Logistic Regression (L2 Regularization)
$$J_{ridge}(w,b) = J(w,b) + \lambda \sum_{j=1}^n w_j^2$$

### Lasso Logistic Regression (L1 Regularization)
$$J_{lasso}(w,b) = J(w,b) + \lambda \sum_{j=1}^n |w_j|$$

### Elastic Net Logistic Regression (L1 + L2)
$$J_{elastic}(w,b) = J(w,b) + \lambda_1 \sum_{j=1}^n |w_j| + \lambda_2 \sum_{j=1}^n w_j^2$$
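
To make the three penalty terms concrete, here is a minimal sketch that evaluates each one for a single weight vector. The vector `w` and the values of `lam`, `lam1`, and `lam2` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical weights
lam = 0.1

# Ridge: lam * sum(w_j^2) = 0.1 * (0.25 + 4.0 + 0.0 + 2.25)
l2_penalty = lam * np.sum(w ** 2)

# Lasso: lam * sum(|w_j|) = 0.1 * (0.5 + 2.0 + 0.0 + 1.5)
l1_penalty = lam * np.sum(np.abs(w))

# Elastic Net with separate strengths for the two terms
lam1, lam2 = 0.05, 0.05
elastic_penalty = lam1 * np.sum(np.abs(w)) + lam2 * np.sum(w ** 2)

print(l2_penalty, l1_penalty, elastic_penalty)
```

The weight of magnitude 2.0 contributes 4.0 of the 6.5 in the squared sum but only 2.0 of the 4.0 in the absolute sum, which is why the L2 penalty punishes large weights more heavily than the L1 penalty does.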

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("📚 Logistic Regression with Regularization")
print("Implementing Ridge, Lasso, and Elastic Net from scratch")

## Data Loading

We'll use the same tumor dataset from the original logistic regression notebook for binary classification.

In [None]:
# Load the tumor dataset (same as original logistic regression notebook)
data = pd.read_csv("../../../Data/tumor_data.csv")

print(f"Dataset shape: {data.shape}")
print(f"\nFirst 5 rows:")
print(data.head())

print(f"\nTarget distribution:")
print(data['malignant'].value_counts())
print(f"\nTarget proportions:")
print(data['malignant'].value_counts(normalize=True))

# Check for missing values
print(f"\nMissing values: {data.isnull().sum().sum()}")

# Basic statistics
print(f"\nDataset statistics:")
print(data.describe())

## Data Preprocessing

We'll split the data and standardize the features (same approach as the original notebook).

In [None]:
# Split the data into features and labels
X = data.drop('malignant', axis=1).values
y = data['malignant'].values

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")

# Split the dataset into training, validation, and test sets
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp)  # 0.25 * 0.8 = 0.2

print(f"\nData splits:")
print(f"Training set: {X_train.shape} ({X_train.shape[0]/len(y)*100:.0f}%)")
print(f"Validation set: {X_val.shape} ({X_val.shape[0]/len(y)*100:.0f}%)")
print(f"Test set: {X_test.shape} ({X_test.shape[0]/len(y)*100:.0f}%)")

# Standardize the features (same as original notebook)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)

print(f"\nFeature scaling completed")
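# Round the arrays before printing: a float format spec such as ':.3f' cannot be applied to a NumPy array
print(f"Training features mean: {np.round(X_train_scaled.mean(axis=0)[:3], 3)}...")  # Show first 3
print(f"Training features std: {np.round(X_train_scaled.std(axis=0)[:3], 3)}...")   # Show first 3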

## Regularized Logistic Regression Implementation

### Base Class with Common Functionality

In [None]:
class RegularizedLogisticRegression:
    """
    Base class for regularized logistic regression
    """
    def __init__(self, learning_rate=0.01, max_iterations=1000, tolerance=1e-6):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations
        self.tolerance = tolerance
        self.weights = None
        self.bias = None
        self.cost_history = []
        self.train_cost_history = []
        self.val_cost_history = []
    
    def sigmoid(self, z):
        """Sigmoid activation function with numerical stability"""
        # Clip z to prevent overflow
        z = np.clip(z, -500, 500)
        return 1 / (1 + np.exp(-z))
    
    def initialize_parameters(self, n_features):
        """Initialize weights and bias"""
        self.weights = np.zeros(n_features)
        self.bias = 0.0
    
    def predict_proba(self, X):
        """Predict class probabilities"""
        z = np.dot(X, self.weights) + self.bias
        return self.sigmoid(z)
    
    def predict(self, X):
        """Make binary predictions"""
        probabilities = self.predict_proba(X)
        return (probabilities > 0.5).astype(int)
    
    def compute_logistic_cost(self, y_true, y_pred_proba):
        """Compute logistic regression cost (cross-entropy)"""
        epsilon = 1e-15  # Prevent log(0)
        y_pred_clipped = np.clip(y_pred_proba, epsilon, 1 - epsilon)
        return -np.mean(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
    
    def compute_cost(self, X, y):
        """Compute total cost (logistic + regularization) - to be overridden"""
        y_pred_proba = self.predict_proba(X)
        return self.compute_logistic_cost(y, y_pred_proba)
    
    def compute_gradients(self, X, y):
        """Compute gradients - to be overridden"""
        m = len(y)
        y_pred_proba = self.predict_proba(X)
        
        dw = (1/m) * np.dot(X.T, (y_pred_proba - y))
        db = (1/m) * np.sum(y_pred_proba - y)
        
        return dw, db
    
    def fit(self, X_train, y_train, X_val=None, y_val=None):
        """Train the model"""
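        # Initialize parameters and clear histories so a refit starts clean
        self.initialize_parameters(X_train.shape[1])
        self.train_cost_history = []
        self.val_cost_history = []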
        
        # Training loop
        for i in range(self.max_iterations):
            # Compute gradients and update parameters
            dw, db = self.compute_gradients(X_train, y_train)
            
            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db
            
            # Compute costs
            train_cost = self.compute_cost(X_train, y_train)
            self.train_cost_history.append(train_cost)
            
            if X_val is not None and y_val is not None:
                val_cost = self.compute_cost(X_val, y_val)
                self.val_cost_history.append(val_cost)
            
            # Progress reporting
            if i % 100 == 0:
                if X_val is not None and y_val is not None:
                    print(f"Iteration {i}: Train Cost = {train_cost:.6f}, Val Cost = {val_cost:.6f}")
                else:
                    print(f"Iteration {i}: Cost = {train_cost:.6f}")
            
            # Early stopping based on cost change
            if i > 0 and abs(self.train_cost_history[-2] - train_cost) < self.tolerance:
                print(f"Converged at iteration {i}")
                break
    
    def evaluate(self, X, y):
        """Evaluate model performance"""
        y_pred_proba = self.predict_proba(X)
        y_pred = self.predict(X)
        
        # Accuracy
        accuracy = np.mean(y_pred == y)
        
        # Log loss
        log_loss = self.compute_logistic_cost(y, y_pred_proba)
        
        return {'accuracy': accuracy, 'log_loss': log_loss}

print("✅ Base RegularizedLogisticRegression class implemented")

### Ridge Logistic Regression (L2 Regularization)

Ridge logistic regression adds a penalty term proportional to the sum of squares of the weights:

$$\text{Penalty} = \lambda \sum_{j=1}^n w_j^2$$

**Gradient for Ridge:**
$$\frac{\partial J_{ridge}}{\partial w_j} = \frac{\partial J}{\partial w_j} + 2\lambda w_j$$
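
One way to gain confidence in this gradient is a finite-difference check. The sketch below is a standalone illustration on synthetic data, not the notebook's class: `ridge_cost`, `ridge_grad_w`, and the random inputs are assumptions made for the check. It compares the analytic expression (data term plus $2\lambda w_j$) against central differences of the penalized cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_cost(w, b, X, y, lam):
    """Cross-entropy plus lam * sum(w_j^2); the bias is not penalized."""
    h = sigmoid(X @ w + b)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cross_entropy + lam * np.sum(w ** 2)

def ridge_grad_w(w, b, X, y, lam):
    """Analytic gradient: data term plus 2 * lam * w."""
    return X.T @ (sigmoid(X @ w + b) - y) / len(y) + 2 * lam * w

# Synthetic data and parameters (hypothetical, for the check only)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 3))
y_toy = (rng.random(20) < 0.5).astype(float)
w0, b0, lam = rng.normal(size=3), 0.1, 0.5

analytic = ridge_grad_w(w0, b0, X_toy, y_toy, lam)

# Central finite differences, one coordinate at a time
step = 1e-6
numeric = np.zeros_like(w0)
for j in range(w0.size):
    e = np.zeros_like(w0)
    e[j] = step
    numeric[j] = (ridge_cost(w0 + e, b0, X_toy, y_toy, lam)
                  - ridge_cost(w0 - e, b0, X_toy, y_toy, lam)) / (2 * step)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(f"max |analytic - numeric| = {max_abs_diff:.2e}")
```

A small difference indicates that the factor of 2 in the penalty gradient matches the unhalved penalty $\lambda \sum_j w_j^2$ used in this notebook.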

In [None]:
class RidgeLogisticRegression(RegularizedLogisticRegression):
    """
    Ridge Logistic Regression (L2 Regularization)
    """
    def __init__(self, alpha=1.0, learning_rate=0.01, max_iterations=1000, tolerance=1e-6):
        super().__init__(learning_rate, max_iterations, tolerance)
        self.alpha = alpha  # Regularization strength
    
    def compute_cost(self, X, y):
        """Compute cost with L2 regularization"""
        y_pred_proba = self.predict_proba(X)
        logistic_cost = self.compute_logistic_cost(y, y_pred_proba)
        
        # L2 regularization term (don't regularize bias)
        l2_penalty = self.alpha * np.sum(self.weights ** 2)
        
        return logistic_cost + l2_penalty
    
    def compute_gradients(self, X, y):
        """Compute gradients with L2 regularization"""
        m = len(y)
        y_pred_proba = self.predict_proba(X)
        
        # Standard gradients
        dw = (1/m) * np.dot(X.T, (y_pred_proba - y))
        db = (1/m) * np.sum(y_pred_proba - y)
        
        # Add L2 regularization to weight gradients
        dw += 2 * self.alpha * self.weights
        
        return dw, db

print("✅ RidgeLogisticRegression class implemented")

### Lasso Logistic Regression (L1 Regularization)

Lasso logistic regression adds a penalty term proportional to the sum of absolute values of weights:

$$\text{Penalty} = \lambda \sum_{j=1}^n |w_j|$$

**Subgradient for Lasso** ($|w_j|$ is not differentiable at $w_j = 0$, where the code below uses $\text{sign}(0) = 0$):
$$\frac{\partial J_{lasso}}{\partial w_j} = \frac{\partial J}{\partial w_j} + \lambda \cdot \text{sign}(w_j)$$
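
A sign-based update rarely lands a weight exactly on zero, because each step moves the weight by a fixed amount past it. A common alternative, which this notebook does not use, is the proximal (soft-thresholding) update. The sketch below contrasts the two on a toy quadratic loss $\frac{1}{2}\lVert w - c \rVert^2$ that stands in for the logistic loss. The function `soft_threshold` and the values of `c`, `lam`, and `lr` are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Toy smooth loss 0.5 * ||w - c||^2 stands in for the logistic loss (assumption)
c = np.array([0.05, 1.0])
lam, lr, steps = 0.1, 0.1, 500

w_sub = np.array([1.0, 1.0])   # plain subgradient descent (sign-based update)
w_prox = np.array([1.0, 1.0])  # proximal gradient descent (soft-thresholding)
for _ in range(steps):
    # Subgradient step: gradient of the smooth part plus lam * sign(w)
    w_sub = w_sub - lr * ((w_sub - c) + lam * np.sign(w_sub))
    # Proximal step: gradient step on the smooth part, then shrink by lr * lam
    w_prox = soft_threshold(w_prox - lr * (w_prox - c), lr * lam)

print("subgradient:", w_sub)
print("proximal:   ", w_prox)
```

In this toy run the first weight should be zero at the optimum because $|c_1| < \lambda$. The proximal iterate reaches exactly 0.0 and stays there, while the subgradient iterate cannot settle at zero and keeps moving within a small band around it. Both agree on the second weight, which converges to 0.9.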

In [None]:
class LassoLogisticRegression(RegularizedLogisticRegression):
    """
    Lasso Logistic Regression (L1 Regularization)
    """
    def __init__(self, alpha=1.0, learning_rate=0.01, max_iterations=1000, tolerance=1e-6):
        super().__init__(learning_rate, max_iterations, tolerance)
        self.alpha = alpha  # Regularization strength
    
    def compute_cost(self, X, y):
        """Compute cost with L1 regularization"""
        y_pred_proba = self.predict_proba(X)
        logistic_cost = self.compute_logistic_cost(y, y_pred_proba)
        
        # L1 regularization term (don't regularize bias)
        l1_penalty = self.alpha * np.sum(np.abs(self.weights))
        
        return logistic_cost + l1_penalty
    
    def compute_gradients(self, X, y):
        """Compute gradients with L1 regularization"""
        m = len(y)
        y_pred_proba = self.predict_proba(X)
        
        # Standard gradients
        dw = (1/m) * np.dot(X.T, (y_pred_proba - y))
        db = (1/m) * np.sum(y_pred_proba - y)
        
        # Add L1 regularization to weight gradients
        # Use sign function, but handle zero weights carefully
        l1_gradient = np.where(self.weights > 0, 1, 
                              np.where(self.weights < 0, -1, 0))
        dw += self.alpha * l1_gradient
        
        return dw, db

print("✅ LassoLogisticRegression class implemented")

### Elastic Net Logistic Regression (L1 + L2)

Elastic Net combines both L1 and L2 regularization:

$$\text{Penalty} = \lambda_1 \sum_{j=1}^n |w_j| + \lambda_2 \sum_{j=1}^n w_j^2$$
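
The class in the next cell takes a single strength `alpha` and a mixing parameter `l1_ratio`, which correspond to $\lambda_1 = \alpha \cdot \text{l1\_ratio}$ and $\lambda_2 = \alpha \cdot (1 - \text{l1\_ratio})$. The sketch below uses a hypothetical helper, `elastic_net_penalty_grad`, on an illustrative weight vector to show that the penalty's (sub)gradient reduces to the Ridge term at `l1_ratio=0` and to the Lasso term at `l1_ratio=1`.

```python
import numpy as np

def elastic_net_penalty_grad(w, alpha, l1_ratio):
    """(Sub)gradient of lam1 * sum|w_j| + lam2 * sum w_j^2 with respect to w."""
    lam1 = alpha * l1_ratio          # L1 share of the total strength
    lam2 = alpha * (1.0 - l1_ratio)  # L2 share of the total strength
    return lam1 * np.sign(w) + 2.0 * lam2 * w

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical weights
alpha = 0.1

grad_ridge_only = elastic_net_penalty_grad(w, alpha, l1_ratio=0.0)  # 2 * alpha * w
grad_lasso_only = elastic_net_penalty_grad(w, alpha, l1_ratio=1.0)  # alpha * sign(w)
grad_mixed = elastic_net_penalty_grad(w, alpha, l1_ratio=0.5)

print(grad_ridge_only)
print(grad_lasso_only)
print(grad_mixed)
```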

In [None]:
class ElasticNetLogisticRegression(RegularizedLogisticRegression):
    """
    Elastic Net Logistic Regression (L1 + L2 Regularization)
    """
    def __init__(self, alpha=1.0, l1_ratio=0.5, learning_rate=0.01, max_iterations=1000, tolerance=1e-6):
        super().__init__(learning_rate, max_iterations, tolerance)
        self.alpha = alpha  # Total regularization strength
        self.l1_ratio = l1_ratio  # Ratio of L1 to total regularization (0=Ridge, 1=Lasso)
        
        # Split alpha between L1 and L2
        self.alpha_l1 = alpha * l1_ratio
        self.alpha_l2 = alpha * (1 - l1_ratio)
    
    def compute_cost(self, X, y):
        """Compute cost with L1 + L2 regularization"""
        y_pred_proba = self.predict_proba(X)
        logistic_cost = self.compute_logistic_cost(y, y_pred_proba)
        
        # L1 and L2 regularization terms
        l1_penalty = self.alpha_l1 * np.sum(np.abs(self.weights))
        l2_penalty = self.alpha_l2 * np.sum(self.weights ** 2)
        
        return logistic_cost + l1_penalty + l2_penalty
    
    def compute_gradients(self, X, y):
        """Compute gradients with L1 + L2 regularization"""
        m = len(y)
        y_pred_proba = self.predict_proba(X)
        
        # Standard gradients
        dw = (1/m) * np.dot(X.T, (y_pred_proba - y))
        db = (1/m) * np.sum(y_pred_proba - y)
        
        # Add L1 regularization
        l1_gradient = np.where(self.weights > 0, 1, 
                              np.where(self.weights < 0, -1, 0))
        dw += self.alpha_l1 * l1_gradient
        
        # Add L2 regularization
        dw += 2 * self.alpha_l2 * self.weights
        
        return dw, db

print("✅ ElasticNetLogisticRegression class implemented")

## Training and Comparison

Let's train all models and compare their performance:

In [None]:
# Define models to compare
models = {
    'Logistic (No Regularization)': RegularizedLogisticRegression(learning_rate=0.01, max_iterations=2000),
    'Ridge (α=0.01)': RidgeLogisticRegression(alpha=0.01, learning_rate=0.01, max_iterations=2000),
    'Ridge (α=0.1)': RidgeLogisticRegression(alpha=0.1, learning_rate=0.01, max_iterations=2000),
    'Ridge (α=1.0)': RidgeLogisticRegression(alpha=1.0, learning_rate=0.01, max_iterations=2000),
    'Lasso (α=0.01)': LassoLogisticRegression(alpha=0.01, learning_rate=0.01, max_iterations=2000),
    'Lasso (α=0.1)': LassoLogisticRegression(alpha=0.1, learning_rate=0.01, max_iterations=2000),
    'Elastic Net (α=0.1)': ElasticNetLogisticRegression(alpha=0.1, l1_ratio=0.5, learning_rate=0.01, max_iterations=2000)
}

# Train all models
trained_models = {}
results = {}

print("🚀 Training all models...\n")

for name, model in models.items():
    print(f"Training {name}...")
    print("-" * 50)
    
    # Train model
    model.fit(X_train_scaled, y_train, X_val_scaled, y_val)
    
    # Evaluate on training, validation, and test sets
    train_metrics = model.evaluate(X_train_scaled, y_train)
    val_metrics = model.evaluate(X_val_scaled, y_val)
    test_metrics = model.evaluate(X_test_scaled, y_test)
    
    # Store results
    trained_models[name] = model
    results[name] = {
        'train': train_metrics,
        'val': val_metrics,
        'test': test_metrics,
        'weights_norm': np.linalg.norm(model.weights),
        'non_zero_weights': np.sum(np.abs(model.weights) > 1e-6)
    }
    
    print(f"Train Acc: {train_metrics['accuracy']:.4f}, Val Acc: {val_metrics['accuracy']:.4f}, Test Acc: {test_metrics['accuracy']:.4f}")
    print(f"Weights norm: {results[name]['weights_norm']:.4f}")
    print(f"Non-zero weights: {results[name]['non_zero_weights']}/{len(model.weights)}\n")

print("✅ All models trained!")

## Results Analysis and Visualization

In [None]:
# Create comprehensive results table
results_df = pd.DataFrame({
    'Model': list(results.keys()),
    'Train Accuracy': [results[name]['train']['accuracy'] for name in results.keys()],
    'Val Accuracy': [results[name]['val']['accuracy'] for name in results.keys()],
    'Test Accuracy': [results[name]['test']['accuracy'] for name in results.keys()],
    'Train Log Loss': [results[name]['train']['log_loss'] for name in results.keys()],
    'Val Log Loss': [results[name]['val']['log_loss'] for name in results.keys()],
    'Test Log Loss': [results[name]['test']['log_loss'] for name in results.keys()],
    'Weights Norm': [results[name]['weights_norm'] for name in results.keys()],
    'Non-zero Weights': [results[name]['non_zero_weights'] for name in results.keys()]
})

# Calculate overfitting (difference between train and validation performance)
results_df['Overfitting (Train-Val Acc)'] = results_df['Train Accuracy'] - results_df['Val Accuracy']

print("📊 Model Comparison Results:")
print("=" * 100)
print(results_df.round(4).to_string(index=False))

# Find best model based on validation performance
best_model_name = results_df.loc[results_df['Val Accuracy'].idxmax(), 'Model']
print(f"\n🏆 Best model based on validation accuracy: {best_model_name}")

In [None]:
# Comprehensive visualization
plt.figure(figsize=(18, 12))

# Plot 1: Training and Validation Loss Curves
plt.subplot(2, 4, 1)
for name, model in trained_models.items():
    if len(model.val_cost_history) > 0:
        plt.plot(model.train_cost_history, label=f'{name} (Train)', alpha=0.7)
        plt.plot(model.val_cost_history, label=f'{name} (Val)', linestyle='--', alpha=0.7)
plt.xlabel('Iteration')
plt.ylabel('Cost')
plt.title('Learning Curves')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.yscale('log')

# Plot 2: Accuracy Comparison
plt.subplot(2, 4, 2)
x_pos = np.arange(len(results_df))
width = 0.25

plt.bar(x_pos - width, results_df['Train Accuracy'], width, label='Train', alpha=0.8)
plt.bar(x_pos, results_df['Val Accuracy'], width, label='Validation', alpha=0.8)
plt.bar(x_pos + width, results_df['Test Accuracy'], width, label='Test', alpha=0.8)

plt.xlabel('Models')
plt.ylabel('Accuracy')
plt.title('Accuracy Comparison')
plt.xticks(x_pos, [name.split('(')[0].strip() for name in results_df['Model']], rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 3: Overfitting Analysis
plt.subplot(2, 4, 3)
plt.bar(range(len(results_df)), results_df['Overfitting (Train-Val Acc)'], alpha=0.8)
plt.xlabel('Models')
plt.ylabel('Train Acc - Val Acc')
plt.title('Overfitting Analysis')
plt.xticks(range(len(results_df)), [name.split('(')[0].strip() for name in results_df['Model']], rotation=45)
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5)
plt.grid(True, alpha=0.3)

# Plot 4: Weights Magnitude
plt.subplot(2, 4, 4)
plt.bar(range(len(results_df)), results_df['Weights Norm'], alpha=0.8)
plt.xlabel('Models')
plt.ylabel('L2 Norm of Weights')
plt.title('Weights Magnitude')
plt.xticks(range(len(results_df)), [name.split('(')[0].strip() for name in results_df['Model']], rotation=45)
plt.yscale('log')
plt.grid(True, alpha=0.3)

# Plot 5: Feature Selection (Non-zero weights)
plt.subplot(2, 4, 5)
plt.bar(range(len(results_df)), results_df['Non-zero Weights'], alpha=0.8)
plt.xlabel('Models')
plt.ylabel('Number of Non-zero Weights')
plt.title('Feature Selection Effect')
plt.xticks(range(len(results_df)), [name.split('(')[0].strip() for name in results_df['Model']], rotation=45)
plt.axhline(y=X_train_scaled.shape[1], color='red', linestyle='--', alpha=0.5, label='Total Features')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 6: Log Loss Comparison
plt.subplot(2, 4, 6)
plt.bar(x_pos - width, results_df['Train Log Loss'], width, label='Train', alpha=0.8)
plt.bar(x_pos, results_df['Val Log Loss'], width, label='Validation', alpha=0.8)
plt.bar(x_pos + width, results_df['Test Log Loss'], width, label='Test', alpha=0.8)

plt.xlabel('Models')
plt.ylabel('Log Loss')
plt.title('Log Loss Comparison')
plt.xticks(x_pos, [name.split('(')[0].strip() for name in results_df['Model']], rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)

# Plot 7: Confusion Matrix for Best Model
plt.subplot(2, 4, 7)
best_model = trained_models[best_model_name]
y_pred_test = best_model.predict(X_test_scaled)
cm = confusion_matrix(y_test, y_pred_test)

plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
plt.title(f'Confusion Matrix\n({best_model_name})')
plt.colorbar()
tick_marks = np.arange(2)
plt.xticks(tick_marks, ['Benign', 'Malignant'])
plt.yticks(tick_marks, ['Benign', 'Malignant'])
plt.ylabel('True Label')
plt.xlabel('Predicted Label')

# Add text annotations
thresh = cm.max() / 2.
for i, j in np.ndindex(cm.shape):
    plt.text(j, i, format(cm[i, j], 'd'),
             horizontalalignment="center",
             color="white" if cm[i, j] > thresh else "black")

# Plot 8: Weight Distribution for Best Model
plt.subplot(2, 4, 8)
weights = best_model.weights
plt.hist(weights, bins=20, alpha=0.7, edgecolor='black')
plt.xlabel('Weight Value')
plt.ylabel('Frequency')
plt.title(f'Weight Distribution\n({best_model_name})')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Regularization Path Analysis

Let's analyze how different regularization strengths affect the model:

In [None]:
def analyze_regularization_path(RegularizationClass, alphas, model_name):
    """
    Analyze how regularization strength affects model performance
    """
    results = []
    
    for alpha in alphas:
        # Train model
        model = RegularizationClass(alpha=alpha, learning_rate=0.01, max_iterations=1000)
        model.fit(X_train_scaled, y_train, X_val_scaled, y_val)
        
        # Evaluate
        train_metrics = model.evaluate(X_train_scaled, y_train)
        val_metrics = model.evaluate(X_val_scaled, y_val)
        
        results.append({
            'alpha': alpha,
            'train_accuracy': train_metrics['accuracy'],
            'val_accuracy': val_metrics['accuracy'],
            'weights_norm': np.linalg.norm(model.weights),
            'non_zero_weights': np.sum(np.abs(model.weights) > 1e-6)
        })
    
    return pd.DataFrame(results)

# Define alpha ranges
alphas_ridge = np.logspace(-4, 1, 15)  # 0.0001 to 10
alphas_lasso = np.logspace(-4, 0, 15)  # 0.0001 to 1

print("🔍 Analyzing regularization paths...")

# Analyze Ridge regularization path
ridge_path = analyze_regularization_path(RidgeLogisticRegression, alphas_ridge, 'Ridge')

# Analyze Lasso regularization path
lasso_path = analyze_regularization_path(LassoLogisticRegression, alphas_lasso, 'Lasso')

print("✅ Regularization path analysis completed!")

In [None]:
# Visualize regularization paths
plt.figure(figsize=(15, 10))

# Ridge Regularization Path
plt.subplot(2, 3, 1)
plt.semilogx(ridge_path['alpha'], ridge_path['train_accuracy'], 'b-', label='Train Accuracy', marker='o')
plt.semilogx(ridge_path['alpha'], ridge_path['val_accuracy'], 'r-', label='Val Accuracy', marker='s')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Accuracy')
plt.title('Ridge: Accuracy vs Regularization Strength')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 3, 2)
plt.loglog(ridge_path['alpha'], ridge_path['weights_norm'], 'g-', marker='o')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Weights L2 Norm')
plt.title('Ridge: Weights Magnitude vs α')
plt.grid(True, alpha=0.3)

plt.subplot(2, 3, 3)
plt.semilogx(ridge_path['alpha'], ridge_path['non_zero_weights'], 'purple', marker='o')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Non-zero Weights')
plt.title('Ridge: Feature Selection vs α')
plt.grid(True, alpha=0.3)

# Lasso Regularization Path
plt.subplot(2, 3, 4)
plt.semilogx(lasso_path['alpha'], lasso_path['train_accuracy'], 'b-', label='Train Accuracy', marker='o')
plt.semilogx(lasso_path['alpha'], lasso_path['val_accuracy'], 'r-', label='Val Accuracy', marker='s')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Accuracy')
plt.title('Lasso: Accuracy vs Regularization Strength')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(2, 3, 5)
plt.loglog(lasso_path['alpha'], lasso_path['weights_norm'], 'g-', marker='o')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Weights L2 Norm')
plt.title('Lasso: Weights Magnitude vs α')
plt.grid(True, alpha=0.3)

plt.subplot(2, 3, 6)
plt.semilogx(lasso_path['alpha'], lasso_path['non_zero_weights'], 'purple', marker='o')
plt.xlabel('Regularization Strength (α)')
plt.ylabel('Non-zero Weights')
plt.title('Lasso: Feature Selection vs α')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Find optimal alpha values
optimal_ridge_alpha = ridge_path.loc[ridge_path['val_accuracy'].idxmax(), 'alpha']
optimal_lasso_alpha = lasso_path.loc[lasso_path['val_accuracy'].idxmax(), 'alpha']

print(f"🎯 Optimal Ridge α: {optimal_ridge_alpha:.6f}")
print(f"🎯 Optimal Lasso α: {optimal_lasso_alpha:.6f}")

## Key Insights and Conclusions

### Regularization Effects in Logistic Regression:

1. **Ridge Logistic Regression (L2)**:
   - Shrinks weights towards zero but doesn't eliminate them
   - Helps prevent overfitting by penalizing large weights
   - Maintains all features but reduces their impact

2. **Lasso Logistic Regression (L1)**:
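   - Can set weights exactly to zero (automatic feature selection); with the plain subgradient updates used in this notebook, such weights tend to hover near zero rather than land exactly on it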
   - Produces sparse models by eliminating irrelevant features
   - Useful when you suspect many features are irrelevant

3. **Elastic Net Logistic Regression**:
   - Combines benefits of both Ridge and Lasso
   - Good balance between feature selection and weight shrinkage
   - Handles correlated features better than pure Lasso
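
The contrast between items 1 and 2 above can be seen in a one-dimensional toy problem with known closed-form minimizers. This is a sketch under simplifying assumptions: the quadratic loss $\frac{1}{2}(w - c)^2$ replaces the logistic loss, and the values in `c` play the role of hypothetical unregularized weights.

```python
import numpy as np

c = np.array([0.05, 0.5, 2.0])  # hypothetical unregularized weights
lam = 0.1

# Ridge: argmin_w 0.5 * (w - c)^2 + lam * w^2  =  c / (1 + 2 * lam)
w_ridge = c / (1.0 + 2.0 * lam)

# Lasso: argmin_w 0.5 * (w - c)^2 + lam * |w|  =  sign(c) * max(|c| - lam, 0)
w_lasso = np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

print("ridge:", w_ridge)
print("lasso:", w_lasso)
```

Ridge rescales every weight by the same factor, so none of them becomes zero. Lasso subtracts a fixed amount from each magnitude, so the smallest weight is removed entirely while the larger ones are only shifted.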

### When to Use Each:
- **Ridge**: When you believe most features are relevant but want to prevent overfitting
- **Lasso**: When you want automatic feature selection and suspect many features are irrelevant
- **Elastic Net**: When you want both feature selection and handling of correlated features
- **No Regularization**: When you have few features relative to samples and overfitting isn't a concern

In [None]:
# Final summary
print("📋 LOGISTIC REGRESSION REGULARIZATION SUMMARY")
print("=" * 60)
print(f"Dataset: Tumor classification with {X_train_scaled.shape[0]} training samples, {X_train_scaled.shape[1]} features")
print(f"Best performing model: {best_model_name}")
print(f"Best validation accuracy: {results_df['Val Accuracy'].max():.4f}")
print(f"Test accuracy of best model: {results_df.loc[results_df['Val Accuracy'].idxmax(), 'Test Accuracy']:.4f}")
print(f"Optimal Ridge α: {optimal_ridge_alpha:.6f}")
print(f"Optimal Lasso α: {optimal_lasso_alpha:.6f}")

print("\n🎓 Key Learnings:")
print("• Regularization helps prevent overfitting in logistic regression")
print("• L1 (Lasso) provides automatic feature selection for classification")
print("• L2 (Ridge) shrinks weights but keeps all features")
print("• Elastic Net combines benefits of both L1 and L2")
print("• A held-out validation set (or cross-validation) is crucial for selecting the regularization strength")
print("• Medical diagnosis benefits from regularization to avoid overfitting to training data")