# Module 08: Loss Functions and Metrics

**Difficulty**: ⭐⭐ (Intermediate)

**Estimated Time**: 45-60 minutes

**Prerequisites**: 
- [Module 05: Feed-Forward Neural Networks with Keras](05_feedforward_neural_networks_keras.ipynb)
- [Module 04: Introduction to TensorFlow and Keras](04_introduction_to_tensorflow_keras.ipynb)
- [Module 02: Backpropagation and Gradient Descent](02_backpropagation_and_gradient_descent.ipynb)

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand and apply regression loss functions (MSE, MAE, Huber)
2. Understand and apply classification loss functions (Binary/Categorical Cross-Entropy, Focal Loss)
3. Implement custom loss functions for specific problems
4. Choose appropriate evaluation metrics for different tasks
5. Handle class imbalance in classification problems
6. Distinguish between multi-class and multi-label classification
7. Create custom metrics for monitoring training

## 1. Setup and Imports

In [None]:
# Core libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, losses, metrics
from tensorflow.keras.datasets import fashion_mnist

# Sklearn for metrics and data generation
from sklearn.metrics import (
    confusion_matrix, classification_report, roc_curve, auc,
    precision_recall_curve, f1_score, mean_squared_error, mean_absolute_error
)
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression, make_classification

# Reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Visualization settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")

## 2. Regression Loss Functions

### 2.1 Mean Squared Error (MSE) - L2 Loss

**Formula**:
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

**Properties**:
- **Sensitive to outliers**: Squares the errors, heavily penalizes large errors
- **Differentiable everywhere**: Smooth gradients
- **Units**: Squared units of target variable

**When to use**:
- Default choice for regression
- When you want to heavily penalize large errors
- When the data contain few outliers, since squaring lets a single large error dominate the loss

### 2.2 Mean Absolute Error (MAE) - L1 Loss

**Formula**:
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|
$$

**Properties**:
- **Robust to outliers**: Linear penalty
- **Same units** as target variable
- **Not differentiable** at zero (but subgradient works)

**When to use**:
- When outliers are present in data
- When you want interpretable error in original units
- When all errors should be weighted equally
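
To make the difference in outlier sensitivity concrete, here is a minimal sketch in plain Python. The residual values are made up for illustration; the two helper functions simply apply the MSE and MAE formulas above to a list of errors $y_i - \hat{y}_i$.

```python
def mse(errors):
    """Mean squared error of a list of residuals (y - y_hat)."""
    return sum(e ** 2 for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error of a list of residuals (y - y_hat)."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical residuals: four small errors, then the same set plus one outlier
clean = [1.0, -1.0, 2.0, -2.0]
with_outlier = clean + [20.0]

print(f"MSE: {mse(clean):.2f} -> {mse(with_outlier):.2f}")
print(f"MAE: {mae(clean):.2f} -> {mae(with_outlier):.2f}")
```

In this toy case the single outlier multiplies MSE by roughly 33 (2.5 to 82.0) but MAE by only about 3.5 (1.5 to 5.2), which is the practical meaning of "robust to outliers".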

### 2.3 Huber Loss

**Formula** (combines MSE and MAE):
$$
L_\delta(y, \hat{y}) = \begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \delta \\
\delta \cdot (|y - \hat{y}| - \frac{1}{2}\delta) & \text{otherwise}
\end{cases}
$$

**Properties**:
- **Best of both worlds**: MSE for small errors, MAE for large errors
- **Robust to outliers** while maintaining MSE's smoothness
- **Hyperparameter $\delta$**: Threshold between quadratic and linear

**When to use**:
- When you have some outliers but want smooth gradients
- Default choice for robust regression
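
The piecewise formula above can be sketched for a single residual as follows. This is an illustrative plain-Python version, not the Keras implementation; the function name `huber` and the sample errors are assumptions chosen for the example.

```python
def huber(error, delta=1.0):
    """Huber loss for one residual, following the piecewise formula above."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error ** 2                  # quadratic (MSE-like) region
    return delta * (abs_error - 0.5 * delta)     # linear (MAE-like) region


# Compare Huber with the pure quadratic penalty 0.5 * error^2
for e in [0.5, 1.0, 3.0, 10.0]:
    print(f"error={e:5.1f}  huber={huber(e):6.3f}  half_squared={0.5 * e ** 2:6.2f}")
```

Both branches give $\frac{1}{2}\delta^2$ at $|y - \hat{y}| = \delta$, so the loss is continuous at the threshold. For an error of 10 the Huber loss is 9.5, while the quadratic penalty is 50.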

In [None]:
# Visualize regression loss functions
def visualize_regression_losses():
    """
    Visualize how different regression losses penalize errors.
    """
    # Generate error range
    errors = np.linspace(-5, 5, 200)
    
    # Compute losses
    mse_loss = errors ** 2
    mae_loss = np.abs(errors)
    
    # Huber loss with delta=1
    delta = 1.0
    huber_loss = np.where(
        np.abs(errors) <= delta,
        0.5 * errors ** 2,
        delta * (np.abs(errors) - 0.5 * delta)
    )
    
    # Plot
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Loss functions
    ax1.plot(errors, mse_loss, linewidth=3, label='MSE (L2)', color='red')
    ax1.plot(errors, mae_loss, linewidth=3, label='MAE (L1)', color='blue')
    ax1.plot(errors, huber_loss, linewidth=3, label='Huber (δ=1)', color='green')
    ax1.axvline(x=0, color='black', linestyle='--', alpha=0.3)
    ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
    ax1.set_xlabel('Error (y - ŷ)', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Regression Loss Functions', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=11)
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(0, 10)
    
    # Zoom in on small errors
    ax2.plot(errors, mse_loss, linewidth=3, label='MSE (L2)', color='red')
    ax2.plot(errors, mae_loss, linewidth=3, label='MAE (L1)', color='blue')
    ax2.plot(errors, huber_loss, linewidth=3, label='Huber (δ=1)', color='green')
    ax2.axvline(x=0, color='black', linestyle='--', alpha=0.3)
    ax2.axhline(y=0, color='black', linestyle='--', alpha=0.3)
    ax2.set_xlabel('Error (y - ŷ)', fontsize=12)
    ax2.set_ylabel('Loss', fontsize=12)
    ax2.set_title('Zoomed: Small Errors Region', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=11)
    ax2.grid(True, alpha=0.3)
    ax2.set_xlim(-2, 2)
    ax2.set_ylim(0, 3)
    
    plt.tight_layout()
    plt.show()
    
    print("Observations:")
    print("- MSE: Grows quadratically, heavily penalizes outliers")
    print("- MAE: Grows linearly, robust to outliers")
    print("- Huber: Quadratic near zero (smooth), linear for large errors (robust)")

visualize_regression_losses()

In [None]:
# Demonstrate on synthetic regression data
print("Generating synthetic regression data with outliers...\n")

# Generate clean data
X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=42)

# Add outliers (10% of data)
n_outliers = 100
outlier_indices = np.random.choice(len(y), n_outliers, replace=False)
y[outlier_indices] += np.random.randn(n_outliers) * 100  # Large noise

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Outliers added: {n_outliers}")
print(f"Target mean: {y_train.mean():.2f}")
print(f"Target std: {y_train.std():.2f}")

In [None]:
# Train models with different loss functions
def create_regression_model():
    """Create simple regression model."""
    model = models.Sequential([
        layers.Input(shape=(10,)),
        layers.Dense(64, activation='relu'),
        layers.Dense(32, activation='relu'),
        layers.Dense(1)  # Single output for regression
    ])
    return model

# Train with different losses
loss_functions = {
    'MSE': losses.MeanSquaredError(),
    'MAE': losses.MeanAbsoluteError(),
    'Huber': losses.Huber(delta=1.0)
}

results = {}

for name, loss_fn in loss_functions.items():
    print(f"\nTraining with {name} loss...")
    
    model = create_regression_model()
    model.compile(optimizer='adam', loss=loss_fn, metrics=['mae'])
    
    history = model.fit(
        X_train, y_train,
        epochs=50,
        batch_size=32,
        validation_split=0.2,
        verbose=0
    )
    
    # Evaluate
    y_pred = model.predict(X_test, verbose=0)
    test_mse = mean_squared_error(y_test, y_pred)
    test_mae = mean_absolute_error(y_test, y_pred)
    
    results[name] = {
        'history': history,
        'predictions': y_pred,
        'test_mse': test_mse,
        'test_mae': test_mae
    }
    
    print(f"  Test MSE: {test_mse:.2f}")
    print(f"  Test MAE: {test_mae:.2f}")

print("\nAll models trained!")

In [None]:
# Compare regression loss functions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Training curves
for name, result in results.items():
    ax1.plot(result['history'].history['loss'], linewidth=2, label=name)

ax1.set_xlabel('Epoch', fontsize=12)
ax1.set_ylabel('Training Loss', fontsize=12)
ax1.set_title('Training Loss Comparison', fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Test metrics comparison
metrics_names = ['Test MSE', 'Test MAE']
x = np.arange(len(loss_functions))
width = 0.35

mse_values = [results[name]['test_mse'] for name in loss_functions.keys()]
mae_values = [results[name]['test_mae'] for name in loss_functions.keys()]

ax2.bar(x - width/2, mse_values, width, label='MSE', alpha=0.8)
ax2.bar(x + width/2, mae_values, width, label='MAE', alpha=0.8)

ax2.set_xlabel('Loss Function Used', fontsize=12)
ax2.set_ylabel('Error Value', fontsize=12)
ax2.set_title('Test Set Performance', fontsize=14, fontweight='bold')
ax2.set_xticks(x)
ax2.set_xticklabels(loss_functions.keys())
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\nConclusion:")
print("Huber loss often performs well with outliers - robust yet smooth! Check the test MAE above for this run.")

## 3. Classification Loss Functions

### 3.1 Binary Cross-Entropy (Log Loss)

**For binary classification** (2 classes: 0 or 1)

**Formula**:
$$
\text{BCE} = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right]
$$

Where:
- $y_i \in \{0, 1\}$ is the true label
- $\hat{y}_i \in (0, 1)$ is the predicted probability

**When to use**:
- Binary classification tasks
- Output layer: Single neuron with sigmoid activation
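
As a worked example of the formula, the sketch below computes BCE by hand for four samples. The labels and probabilities are invented for illustration, and the clipping constant `eps` is an assumption added only to keep `log` away from zero.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average BCE over paired labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
mostly_right = [0.9, 0.1, 0.8, 0.2]   # hypothetical confident, correct predictions
mostly_wrong = [0.1, 0.9, 0.2, 0.8]   # same confidence, wrong direction

print(f"BCE (correct): {binary_cross_entropy(labels, mostly_right):.4f}")
print(f"BCE (wrong):   {binary_cross_entropy(labels, mostly_wrong):.4f}")
```

The correct predictions give a loss of about 0.164, while the equally confident wrong ones give about 1.956: the logarithm makes confident mistakes very expensive.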

### 3.2 Categorical Cross-Entropy

**For multi-class classification** (K > 2 classes, mutually exclusive)

**Formula**:
$$
\text{CCE} = -\frac{1}{n} \sum_{i=1}^n \sum_{k=1}^K y_{i,k} \log(\hat{y}_{i,k})
$$

Where:
- $y_{i,k}$ is 1 if sample $i$ belongs to class $k$, else 0 (one-hot encoded)
- $\hat{y}_{i,k}$ is the predicted probability for class $k$

**Variants**:
- **`categorical_crossentropy`**: Expects one-hot encoded labels
- **`sparse_categorical_crossentropy`**: Expects integer labels (more convenient)

**When to use**:
- Multi-class classification (single label per sample)
- Output layer: K neurons with softmax activation
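
The two variants compute the same quantity and differ only in how the label is encoded. The sketch below shows this for one sample in plain Python; the function names and the probability vector are assumptions for illustration, not Keras code.

```python
import math


def categorical_ce(one_hot, probs):
    """Cross-entropy for one sample given a one-hot label vector."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))


def sparse_categorical_ce(class_index, probs):
    """Cross-entropy for one sample given an integer class label."""
    return -math.log(probs[class_index])


probs = [0.7, 0.2, 0.1]   # hypothetical softmax output for K = 3 classes

# True class is 1: one-hot [0, 1, 0] versus integer label 1
print(f"one-hot label: {categorical_ce([0, 1, 0], probs):.4f}")
print(f"integer label: {sparse_categorical_ce(1, probs):.4f}")
```

Because the one-hot vector zeroes out every term except the true class, both calls reduce to $-\log(\hat{y}_{i,k})$ for the true class $k$.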

### 3.3 Focal Loss

**For imbalanced classification**

**Formula**:
$$
\text{FL}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)
$$

Where:
- $p_t$ is the model's estimated probability for the true class
- $\gamma$ is the focusing parameter (typically 2)
- $\alpha_t$ is the class-balancing weight for the true class (for example, $\alpha$ for positives and $1 - \alpha$ for negatives)

**Key idea**: Down-weight easy examples, focus on hard examples

**When to use**:
- Severe class imbalance
- Object detection
- When easy examples dominate training
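
To see how much the modulating factor $(1 - p_t)^\gamma$ shrinks the loss, here is a small sketch for a single example. The name `focal_loss_single` is hypothetical, and `alpha_t` defaults to 1.0 here (an assumption) so that only the effect of $\gamma$ is visible.

```python
import math


def focal_loss_single(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss for one example; p_t is the predicted probability of the true class."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


for p_t in [0.9, 0.5, 0.1]:
    ce = -math.log(p_t)
    fl = focal_loss_single(p_t)
    print(f"p_t={p_t:.1f}  CE={ce:.4f}  focal={fl:.4f}  factor={(1 - p_t) ** 2:.2f}")
```

With $\gamma = 2$, an easy example at $p_t = 0.9$ keeps only 1% of its cross-entropy loss, while a hard example at $p_t = 0.1$ keeps 81%. Setting `gamma=0` recovers plain cross-entropy.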

In [None]:
# Visualize cross-entropy loss
def visualize_cross_entropy():
    """
    Visualize how cross-entropy penalizes predictions.
    """
    # Predicted probabilities
    probs = np.linspace(0.001, 0.999, 200)
    
    # Loss when true label is 1
    loss_positive = -np.log(probs)
    
    # Loss when true label is 0
    loss_negative = -np.log(1 - probs)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Binary cross-entropy
    ax1.plot(probs, loss_positive, linewidth=3, label='True Class = 1', color='blue')
    ax1.plot(probs, loss_negative, linewidth=3, label='True Class = 0', color='red')
    ax1.axhline(y=0, color='black', linestyle='--', alpha=0.3)
    ax1.set_xlabel('Predicted Probability', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Binary Cross-Entropy Loss', fontsize=14, fontweight='bold')
    ax1.legend(fontsize=11)
    ax1.grid(True, alpha=0.3)
    ax1.set_ylim(0, 8)
    
    # Focal loss comparison (for true class = 1)
    focal_gamma_0 = loss_positive  # Focal loss with γ=0 is just CE
    focal_gamma_1 = -(1 - probs) * np.log(probs)
    focal_gamma_2 = -((1 - probs) ** 2) * np.log(probs)
    
    ax2.plot(probs, focal_gamma_0, linewidth=3, label='γ=0 (Standard CE)', linestyle='--')
    ax2.plot(probs, focal_gamma_1, linewidth=3, label='γ=1')
    ax2.plot(probs, focal_gamma_2, linewidth=3, label='γ=2 (Focal Loss)')
    ax2.axhline(y=0, color='black', linestyle='--', alpha=0.3)
    ax2.set_xlabel('Predicted Probability (for True Class)', fontsize=12)
    ax2.set_ylabel('Loss', fontsize=12)
    ax2.set_title('Focal Loss: Down-weighting Easy Examples', fontsize=14, fontweight='bold')
    ax2.legend(fontsize=11)
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim(0, 4)
    
    plt.tight_layout()
    plt.show()
    
    print("Observations:")
    print("- Cross-Entropy: Heavily penalizes confident wrong predictions")
    print("- Focal Loss: Reduces loss for well-classified examples (high prob)")
    print("- Higher γ: More focus on hard examples")

visualize_cross_entropy()

In [None]:
# Load Fashion-MNIST for multi-class classification
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

# Preprocess
X_train_full = X_train_full.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

X_train_full_flat = X_train_full.reshape(-1, 784)
X_test_flat = X_test.reshape(-1, 784)

# Split validation
X_train = X_train_full_flat[:50000]
X_valid = X_train_full_flat[50000:]
y_train = y_train_full[:50000]
y_valid = y_train_full[50000:]

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_valid.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

In [None]:
# Create binary classification task: T-shirts (Fashion-MNIST label 0) vs Everything Else
# After this mapping, class 1 = T-shirt (positive) and class 0 = Other (negative)
y_train_binary = (y_train == 0).astype(int)
y_valid_binary = (y_valid == 0).astype(int)
y_test_binary = (y_test == 0).astype(int)

print("Binary Classification Setup:")
print(f"  Class 1 (T-shirts): {np.sum(y_train_binary == 1)} samples")
print(f"  Class 0 (Others): {np.sum(y_train_binary == 0)} samples")
print(f"  Imbalance ratio (Others / T-shirts): {np.sum(y_train_binary == 0) / np.sum(y_train_binary == 1):.2f}")

In [None]:
# Model for binary classification
def create_binary_classifier():
    """Binary classification model."""
    model = models.Sequential([
        layers.Input(shape=(784,)),
        layers.Dense(128, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(1, activation='sigmoid')  # Single output with sigmoid
    ])
    return model

# Train binary classifier
print("Training binary classifier...\n")
model_binary = create_binary_classifier()
model_binary.compile(
    optimizer='adam',
    loss='binary_crossentropy',  # Binary cross-entropy
    metrics=['accuracy', metrics.Precision(), metrics.Recall()]
)

history_binary = model_binary.fit(
    X_train, y_train_binary,
    epochs=10,
    batch_size=128,
    validation_data=(X_valid, y_valid_binary),
    verbose=1
)

print("\nBinary classifier trained!")

In [None]:
# Model for multi-class classification
def create_multiclass_classifier():
    """Multi-class classification model."""
    model = models.Sequential([
        layers.Input(shape=(784,)),
        layers.Dense(128, activation='relu'),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')  # 10 classes with softmax
    ])
    return model

# Train multi-class classifier
print("Training multi-class classifier...\n")
model_multiclass = create_multiclass_classifier()
model_multiclass.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',  # Sparse categorical CE
    metrics=['accuracy']
)

history_multiclass = model_multiclass.fit(
    X_train, y_train,
    epochs=10,
    batch_size=128,
    validation_data=(X_valid, y_valid),
    verbose=1
)

print("\nMulti-class classifier trained!")

## 4. Evaluation Metrics

### 4.1 Classification Metrics

#### Confusion Matrix:
```
                Predicted
              Negative  Positive
Actual Negative   TN       FP
       Positive   FN       TP
```

#### Derived Metrics:

**Accuracy**: Overall correctness
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

**Precision**: Of predicted positives, how many are correct?
$$\text{Precision} = \frac{TP}{TP + FP}$$

**Recall (Sensitivity)**: Of actual positives, how many did we find?
$$\text{Recall} = \frac{TP}{TP + FN}$$

**F1 Score**: Harmonic mean of precision and recall
$$\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

**ROC-AUC**: Area under ROC curve (TPR vs FPR)
- 1.0 = Perfect classifier
- 0.5 = Random guess
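
The formulas above can be checked by hand from the four confusion-matrix counts. The sketch below is a plain-Python illustration; the function name and the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 120 actual positives, 880 actual negatives
m = classification_metrics(tp=80, fp=20, fn=40, tn=860)
for name, value in m.items():
    print(f"{name:>9}: {value:.3f}")
```

Here accuracy is 0.94 even though a third of the positives are missed (recall about 0.667), which is why precision, recall and F1 are reported alongside it.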

### When to Use Which Metric:

| Scenario | Best Metric |
|----------|-------------|
| **Balanced classes** | Accuracy |
| **Imbalanced classes** | F1, ROC-AUC, Precision-Recall AUC |
| **False positives costly** | Precision |
| **False negatives costly** | Recall |
| **Need single metric** | F1 Score |
| **Ranking quality across all thresholds** | ROC-AUC |
| **Probability calibration matters** | Log loss (cross-entropy), Brier score |
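
ROC-AUC can be read as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, so it depends only on how the scores are ordered. The sketch below computes it that way in plain Python; the labels, scores and the function name `roc_auc` are made up for illustration.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
squashed = [s ** 3 for s in scores]   # monotonic transform: same order, different values

print(f"AUC (original scores): {roc_auc(labels, scores):.4f}")
print(f"AUC (squashed scores): {roc_auc(labels, squashed):.4f}")
```

Both calls give 8/9, even though the squashed scores are very different as probabilities. A metric that should react to how well the predicted probabilities match observed frequencies therefore has to look at the values themselves, not only their order.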

In [None]:
# Evaluate binary classifier
y_pred_proba = model_binary.predict(X_test_flat, verbose=0)
y_pred_binary = (y_pred_proba > 0.5).astype(int)

# Confusion matrix
cm = confusion_matrix(y_test_binary, y_pred_binary)

# Visualize confusion matrix
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax1,
            xticklabels=['Other', 'T-shirt'],  # class 0 = Other, class 1 = T-shirt
            yticklabels=['Other', 'T-shirt'])
ax1.set_xlabel('Predicted', fontsize=12)
ax1.set_ylabel('Actual', fontsize=12)
ax1.set_title('Confusion Matrix', fontsize=14, fontweight='bold')

# ROC curve
fpr, tpr, thresholds = roc_curve(y_test_binary, y_pred_proba)
roc_auc = auc(fpr, tpr)

ax2.plot(fpr, tpr, linewidth=3, label=f'ROC Curve (AUC = {roc_auc:.3f})')
ax2.plot([0, 1], [0, 1], 'k--', linewidth=2, label='Random Guess')
ax2.set_xlabel('False Positive Rate', fontsize=12)
ax2.set_ylabel('True Positive Rate', fontsize=12)
ax2.set_title('ROC Curve', fontsize=14, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Classification report
print("\nClassification Report:")
print("="*60)
print(classification_report(y_test_binary, y_pred_binary, 
                           target_names=['Other', 'T-shirt']))  # class 0 = Other, class 1 = T-shirt

In [None]:
# Evaluate multi-class classifier
y_pred_multiclass = model_multiclass.predict(X_test_flat, verbose=0)
y_pred_classes = np.argmax(y_pred_multiclass, axis=1)

# Multi-class confusion matrix
cm_multi = confusion_matrix(y_test, y_pred_classes)

class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(12, 10))
sns.heatmap(cm_multi, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted', fontsize=12)
plt.ylabel('Actual', fontsize=12)
plt.title('Multi-Class Confusion Matrix', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

# Per-class metrics
print("\nMulti-Class Classification Report:")
print("="*80)
print(classification_report(y_test, y_pred_classes, target_names=class_names))

## 5. Handling Class Imbalance

### Problem:
When one class significantly outnumbers the others, the model may:
- Ignore minority class
- Achieve high accuracy by always predicting majority class
- Have poor performance on minority class
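
A tiny example shows how misleading accuracy can be here. The sketch below uses a hypothetical dataset with 1% positives and a "model" that always predicts the majority class; all numbers are invented for illustration.

```python
# Hypothetical labels: 10 positives (minority) and 990 negatives (majority)
y_true = [1] * 10 + [0] * 990
y_pred = [0] * len(y_true)          # always predict the majority class

accuracy = sum(p == y for p, y in zip(y_pred, y_true)) / len(y_true)
true_positives = sum(p == 1 and y == 1 for p, y in zip(y_pred, y_true))
recall = true_positives / sum(y_true)

print(f"Accuracy: {accuracy:.2%}")
print(f"Recall on minority class: {recall:.2%}")
```

The classifier reaches 99% accuracy while finding none of the minority samples.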

### Solutions:

#### 1. Class Weights
Penalize misclassification of minority class more heavily:
$$w_k = \frac{n_{\text{samples}}}{n_{\text{classes}} \times n_k}$$

where $n_k$ is the number of samples in class $k$.

#### 2. Oversampling Minority Class
Duplicate minority class samples

#### 3. Undersampling Majority Class
Remove majority class samples

#### 4. Focal Loss
Down-weight well-classified examples

#### 5. Different Metrics
Use F1, ROC-AUC instead of accuracy
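
The class-weight formula from solution 1 can be sketched in a few lines of plain Python. The helper name `balanced_class_weights` and the label counts are assumptions for illustration; scikit-learn's `compute_class_weight(class_weight='balanced', ...)` applies the same formula.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return w_k = n_samples / (n_classes * n_k) for every class k."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {k: n_samples / (n_classes * n_k) for k, n_k in counts.items()}


# Hypothetical 9:1 imbalance
y = [0] * 900 + [1] * 100
weights = balanced_class_weights(y)

for k in sorted(weights):
    print(f"class {k}: weight {weights[k]:.3f}")
```

The minority class gets weight 5.0 and the majority class about 0.556, so each class contributes the same total weight (500) to the loss.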

In [None]:
# Create severely imbalanced dataset
# Keep only 10% of the T-shirt (positive class) samples
print("Creating imbalanced dataset...\n")

# Indices for each class (y_train_binary: 1 = T-shirt, 0 = Other)
tshirt_indices = np.where(y_train_binary == 1)[0]
other_indices = np.where(y_train_binary == 0)[0]

# Keep all "Other" samples, only 10% of T-shirts
n_tshirts_keep = len(tshirt_indices) // 10
tshirt_indices_sampled = np.random.choice(tshirt_indices, n_tshirts_keep, replace=False)

# Combine
imbalanced_indices = np.concatenate([other_indices, tshirt_indices_sampled])
np.random.shuffle(imbalanced_indices)

X_train_imb = X_train[imbalanced_indices]
y_train_imb = y_train_binary[imbalanced_indices]

print("Original dataset:")
print(f"  T-shirts: {len(tshirt_indices)}")
print(f"  Others: {len(other_indices)}")
print(f"  Ratio (Others / T-shirts): {len(other_indices) / len(tshirt_indices):.2f}")

print("\nImbalanced dataset:")
print(f"  T-shirts: {np.sum(y_train_imb == 1)}")
print(f"  Others: {np.sum(y_train_imb == 0)}")
print(f"  Ratio (Others / T-shirts): {np.sum(y_train_imb == 0) / np.sum(y_train_imb == 1):.2f}")

In [None]:
# Train without class weights (baseline)
print("\nTraining WITHOUT class weights...")
model_no_weights = create_binary_classifier()
model_no_weights.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', metrics.Precision(), metrics.Recall()]
)

history_no_weights = model_no_weights.fit(
    X_train_imb, y_train_imb,
    epochs=10,
    batch_size=128,
    validation_data=(X_valid, y_valid_binary),
    verbose=0
)

# Evaluate
y_pred_no_weights = (model_no_weights.predict(X_test_flat, verbose=0) > 0.5).astype(int)
print("\nResults WITHOUT class weights:")
print(classification_report(y_test_binary, y_pred_no_weights, 
                           target_names=['Other', 'T-shirt']))  # class 0 = Other, class 1 = T-shirt

In [None]:
# Calculate class weights
from sklearn.utils.class_weight import compute_class_weight

class_weights_array = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(y_train_imb),
    y=y_train_imb
)
class_weights = dict(enumerate(class_weights_array))

print(f"\nComputed class weights:")
print(f"  Class 0 (Other): {class_weights[0]:.3f}")
print(f"  Class 1 (T-shirt): {class_weights[1]:.3f}")

# Train WITH class weights
print("\nTraining WITH class weights...")
model_with_weights = create_binary_classifier()
model_with_weights.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy', metrics.Precision(), metrics.Recall()]
)

history_with_weights = model_with_weights.fit(
    X_train_imb, y_train_imb,
    epochs=10,
    batch_size=128,
    validation_data=(X_valid, y_valid_binary),
    class_weight=class_weights,  # Apply class weights
    verbose=0
)

# Evaluate
y_pred_with_weights = (model_with_weights.predict(X_test_flat, verbose=0) > 0.5).astype(int)
print("\nResults WITH class weights:")
print(classification_report(y_test_binary, y_pred_with_weights,
                           target_names=['Other', 'T-shirt']))  # class 0 = Other, class 1 = T-shirt

In [None]:
# Compare class weight impact
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Confusion matrices
cm_no_weights = confusion_matrix(y_test_binary, y_pred_no_weights)
cm_with_weights = confusion_matrix(y_test_binary, y_pred_with_weights)

sns.heatmap(cm_no_weights, annot=True, fmt='d', cmap='Reds', ax=axes[0],
            xticklabels=['Other', 'T-shirt'],  # class 0 = Other, class 1 = T-shirt
            yticklabels=['Other', 'T-shirt'])
axes[0].set_xlabel('Predicted', fontsize=12)
axes[0].set_ylabel('Actual', fontsize=12)
axes[0].set_title('WITHOUT Class Weights', fontsize=14, fontweight='bold')

sns.heatmap(cm_with_weights, annot=True, fmt='d', cmap='Greens', ax=axes[1],
            xticklabels=['Other', 'T-shirt'],  # class 0 = Other, class 1 = T-shirt
            yticklabels=['Other', 'T-shirt'])
axes[1].set_xlabel('Predicted', fontsize=12)
axes[1].set_ylabel('Actual', fontsize=12)
axes[1].set_title('WITH Class Weights', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("\nObservation:")
print("Class weights usually raise recall on the minority class (T-shirt), often at some cost in precision!")

## 6. Custom Loss Functions and Metrics

Sometimes you need to implement custom loss functions or metrics for specific problems.

In [None]:
# Example 1: Custom weighted MSE loss
@tf.function
def weighted_mse(y_true, y_pred, weight_positive=2.0):
    """
    Custom MSE that weights positive values more heavily.
    Useful when predicting rare events.
    """
    error = y_true - y_pred
    squared_error = tf.square(error)
    
    # Apply higher weight to positive true values
    weights = tf.where(y_true > 0, weight_positive, 1.0)
    weighted_error = squared_error * weights
    
    return tf.reduce_mean(weighted_error)

# Example 2: Custom F1 score metric
class F1Score(keras.metrics.Metric):
    """
    Custom F1 score metric for binary classification.
    """
    def __init__(self, name='f1_score', **kwargs):
        super(F1Score, self).__init__(name=name, **kwargs)
        self.precision = metrics.Precision()
        self.recall = metrics.Recall()
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)
    
    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        # F1 = 2 * (precision * recall) / (precision + recall)
        return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))
    
    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

# Example 3: Focal loss implementation
@tf.function
def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """
    Focal loss for binary classification.
    
    Parameters:
    -----------
    gamma : float
        Focusing parameter (typically 2.0)
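    alpha : float
        Weight applied to the positive class (typically 0.25);
        the negative class is weighted by 1 - alpha
    """
    # Cast labels to the prediction dtype so the arithmetic below is well defined
    y_true = tf.cast(y_true, y_pred.dtype)
    # Clip predictions to prevent log(0)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)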
    
    # Compute focal loss
    cross_entropy = -y_true * tf.math.log(y_pred) - (1 - y_true) * tf.math.log(1 - y_pred)
    
    # Focal weight
    focal_weight = tf.where(y_true == 1,
                           alpha * tf.pow(1 - y_pred, gamma),
                           (1 - alpha) * tf.pow(y_pred, gamma))
    
    focal_loss_value = focal_weight * cross_entropy
    
    return tf.reduce_mean(focal_loss_value)

print("Custom loss functions and metrics defined!")
print("\nExamples:")
print("1. weighted_mse: Penalizes errors on positive samples more")
print("2. F1Score: Custom metric for F1 score")
print("3. focal_loss: Down-weights easy examples")

In [None]:
# Train with custom focal loss
print("Training with Focal Loss...\n")

model_focal = create_binary_classifier()
model_focal.compile(
    optimizer='adam',
    loss=focal_loss,  # Custom focal loss
    metrics=['accuracy', F1Score()]  # Custom F1 metric
)

history_focal = model_focal.fit(
    X_train_imb, y_train_imb,
    epochs=10,
    batch_size=128,
    validation_data=(X_valid, y_valid_binary),
    verbose=1
)

print("\nFocal Loss model trained!")

## 7. Summary

### Key Concepts:

#### Regression Losses:
1. **MSE**: Default, penalizes outliers heavily
2. **MAE**: Robust to outliers, same units as target
3. **Huber**: Best of both - smooth and robust

#### Classification Losses:
1. **Binary Cross-Entropy**: Binary classification (2 classes)
2. **Categorical Cross-Entropy**: Multi-class (mutually exclusive)
3. **Focal Loss**: Handles class imbalance

#### Metrics:
1. **Accuracy**: Good for balanced datasets
2. **Precision**: Minimize false positives
3. **Recall**: Minimize false negatives
4. **F1 Score**: Balance precision and recall
5. **ROC-AUC**: Overall discriminative ability

#### Class Imbalance Solutions:
1. **Class weights**: Weight minority class more
2. **Focal loss**: Down-weight easy examples
3. **Sampling**: Over/undersample classes
4. **Better metrics**: F1, ROC-AUC instead of accuracy

### Decision Guide:

**For Regression:**
```
Are there outliers?
    |-- NO  → Use MSE (standard)
    |-- YES → Use Huber or MAE
```

**For Classification:**
```
How many classes?
    |-- 2 → Binary Cross-Entropy
    |-- >2 → Categorical Cross-Entropy
    
Is dataset balanced?
    |-- NO → Use class weights or Focal Loss
    |-- YES → Standard cross-entropy is fine
```

### What's Next?

- **Module 09**: Hyperparameter Tuning for Deep Learning

### Additional Resources:

- [Keras Loss Functions](https://keras.io/api/losses/)
- [Keras Metrics](https://keras.io/api/metrics/)
- [Focal Loss Paper](https://arxiv.org/abs/1708.02002)
- [Imbalanced Learning](https://imbalanced-learn.org/)

## 8. Exercises

### Exercise 1: Implement Custom Dice Loss

**Task**: Implement the Dice loss, commonly used in image segmentation.

**Dice Coefficient**:
$$\text{Dice} = \frac{2 |A \cap B|}{|A| + |B|}$$

**Dice Loss**:
$$\text{Dice Loss} = 1 - \text{Dice}$$

**Requirements**:
- Implement as TensorFlow function
- Test on binary classification
- Compare with Binary Cross-Entropy

```python
# Your code here
```

In [None]:
# Exercise 1 Solution
# Uncomment to reveal

# @tf.function
# def dice_loss(y_true, y_pred, smooth=1e-6):
#     """
#     Dice loss for binary classification.
#     
#     Parameters:
#     -----------
#     smooth : float
#         Smoothing constant to avoid division by zero
#     """
#     # Flatten predictions and labels
#     y_true_f = tf.reshape(y_true, [-1])
#     y_pred_f = tf.reshape(y_pred, [-1])
#     
#     # Calculate intersection and union
#     intersection = tf.reduce_sum(y_true_f * y_pred_f)
#     union = tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f)
#     
#     # Dice coefficient
#     dice = (2.0 * intersection + smooth) / (union + smooth)
#     
#     # Dice loss
#     return 1.0 - dice
# 
# # Test on binary classification
# # (Add testing code here)

### Exercise 2: Multi-Label Classification

**Task**: Implement multi-label classification (multiple labels per sample).

**Scenario**: Classify Fashion-MNIST into multiple attributes:
- Is clothing? (T-shirt, Dress, Coat, etc.)
- Is footwear? (Sandal, Sneaker, Ankle boot)
- Is accessory? (Bag)

**Requirements**:
- Create multi-label targets
- Use Binary Cross-Entropy (not Categorical!)
- Use sigmoid activation (not softmax!)
- Evaluate with per-label metrics

```python
# Your code here
```

In [None]:
# Exercise 2 Solution
# Uncomment to reveal

# # Define multi-label mapping
# # Fashion-MNIST: 0=T-shirt, 1=Trouser, 2=Pullover, 3=Dress, 4=Coat,
# #                5=Sandal, 6=Shirt, 7=Sneaker, 8=Bag, 9=Ankle boot
# 
# def create_multilabel_targets(y):
#     """Convert to multi-label: [is_clothing, is_footwear, is_accessory]"""
#     n = len(y)
#     multilabel = np.zeros((n, 3))
#     
#     # is_clothing (0,1,2,3,4,6): includes Trouser (1)
#     multilabel[:, 0] = np.isin(y, [0, 1, 2, 3, 4, 6]).astype(int)
#     # is_footwear (5,7,9)
#     multilabel[:, 1] = np.isin(y, [5, 7, 9]).astype(int)
#     # is_accessory (8)
#     multilabel[:, 2] = (y == 8).astype(int)
#     
#     return multilabel
# 
# # Create multi-label targets
# y_train_multilabel = create_multilabel_targets(y_train)
# y_valid_multilabel = create_multilabel_targets(y_valid)
# 
# # Model for multi-label
# model_multilabel = models.Sequential([
#     layers.Input(shape=(784,)),
#     layers.Dense(128, activation='relu'),
#     layers.Dense(64, activation='relu'),
#     layers.Dense(3, activation='sigmoid')  # Sigmoid for multi-label
# ])
# 
# model_multilabel.compile(
#     optimizer='adam',
#     loss='binary_crossentropy',  # Binary CE for multi-label
#     metrics=['accuracy']
# )
# 
# # Train
# # (Add training code here)

### Exercise 3: ROC Curve Analysis

**Task**: Analyze different probability thresholds for binary classification.

**Requirements**:
- Train binary classifier
- Plot ROC curve
- Plot Precision-Recall curve
- Find optimal threshold for:
  - Maximum F1 score
  - 95% recall (minimize false negatives)
  - 95% precision (minimize false positives)
- Compare trade-offs

```python
# Your code here
```

In [None]:
# Exercise 3 Solution
# Uncomment to reveal

# # Get predicted probabilities
# y_proba = model_binary.predict(X_test_flat, verbose=0)
# 
# # Compute precision-recall curve
# # precision and recall have one more entry than thresholds_pr: the final
# # point (precision=1, recall=0) has no threshold, so drop it to keep indices aligned
# precision, recall, thresholds_pr = precision_recall_curve(y_test_binary, y_proba)
# precision, recall = precision[:-1], recall[:-1]
# 
# # Compute F1 for each threshold
# f1_scores = 2 * (precision * recall) / (precision + recall + 1e-10)
# 
# # Find optimal thresholds
# idx_max_f1 = np.argmax(f1_scores)
# optimal_threshold_f1 = thresholds_pr[idx_max_f1]
# 
# # Threshold whose recall is closest to 95%
# idx_95_recall = np.argmin(np.abs(recall - 0.95))
# threshold_95_recall = thresholds_pr[idx_95_recall]
# 
# # Threshold whose precision is closest to 95%
# idx_95_precision = np.argmin(np.abs(precision - 0.95))
# threshold_95_precision = thresholds_pr[idx_95_precision]
# 
# print(f"Optimal thresholds:")
# print(f"  Max F1: {optimal_threshold_f1:.3f} (F1={f1_scores[idx_max_f1]:.3f})")
# print(f"  95% Recall: {threshold_95_recall:.3f}")
# print(f"  95% Precision: {threshold_95_precision:.3f}")

---

**Congratulations!** You've completed Module 08. You now understand:
- Different loss functions for regression and classification
- When to use each loss function
- Important evaluation metrics and their trade-offs
- How to handle class imbalance
- How to implement custom loss functions and metrics

Continue to **Module 09: Hyperparameter Tuning for Deep Learning** to learn how to optimize your models!