# **Lab 1a: Deep Neural Network Training & Robust Models**

**Course:** Introduction to Data Security Pr. (Master's Level)  
**Module 1:** Foundations  
**Estimated Time:** 90-120 minutes

---

## **Learning Objectives**

By the end of this lab, you will be able to:

1. **Train** deep neural networks on standard image classification datasets (MNIST in the main lab, CIFAR-10 in Exercise 2)
2. **Evaluate** model performance using standard metrics (accuracy, loss, confusion matrix)
3. **Understand** the difference between standard models and robust models
4. **Save and load** model checkpoints, in preparation for comparing standard and robust models in later labs
5. **Explain** why standard models are vulnerable to adversarial perturbations
6. **Establish** baseline models for subsequent security labs

## **Table of Contents**

1. [Setup & Imports](#setup)
2. [Part 1: Dataset Loading & Preprocessing](#part1)
3. [Part 2: Training a Standard CNN](#part2)
4. [Part 3: Evaluating Model Performance](#part3)
5. [Part 4: Understanding Robust Models](#part4)
6. [Part 5: Saving & Loading Models](#part5)
7. [Part 6: Comparing Standard vs. Robust Models](#part6)
8. [Exercises](#exercises)
9. [Conclusion & Next Steps](#conclusion)

## **Setup & Imports** <a name="setup"></a>

First, we'll install necessary libraries and import required modules.

In [None]:
# Install required packages (uncomment if needed)
# !pip install torch torchvision matplotlib seaborn numpy scikit-learn tqdm

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
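The cell above seeds PyTorch and NumPy. If you also want run-to-run repeatability on a GPU, cuDNN's algorithm autotuning matters too. The helper below is a minimal sketch; `set_seed` is a hypothetical name (not a PyTorch function), and some CUDA operations may remain nondeterministic even with these settings.

```python
import random

import numpy as np
import torch


def set_seed(seed=42):
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU generator and all CUDA devices
    # Prefer deterministic cuDNN kernels and disable autotuning, which can
    # select different algorithms from one run to the next
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_seed(42)
```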

## **Part 1: Dataset Loading & Preprocessing** <a name="part1"></a>

We'll work with **MNIST** (handwritten digits) as our primary dataset. MNIST is a standard benchmark for:
- Image classification
- Neural network training
- Adversarial robustness research

**Dataset Details:**
- **Training samples:** 60,000
- **Test samples:** 10,000
- **Image size:** 28×28 grayscale
- **Classes:** 10 (digits 0-9)

In [None]:
# Data preprocessing transformations
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor and scale to [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize with MNIST mean and std
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    transform=transform,
    download=True
)

test_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    transform=transform,
    download=True
)

# Create data loaders
batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Batch size: {batch_size}")
print(f"Number of batches (train): {len(train_loader)}")
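The `Normalize` constants are the mean and standard deviation of all MNIST training pixels after `ToTensor` scales them to [0, 1]. Normalizing maps each pixel $x$ to $(x - 0.1307)/0.3081$, and `show_images` undoes this with `image * 0.3081 + 0.1307`. The sketch below shows both directions; `pixel_mean_std`, `normalize`, and `denormalize` are hypothetical helpers, not torchvision functions.

```python
import numpy as np


def pixel_mean_std(images_uint8):
    """Mean and std of all pixels after scaling uint8 values to [0, 1]."""
    x = np.asarray(images_uint8, dtype=np.float32) / 255.0
    # Accumulate in float64 for precision on large arrays
    return float(x.mean(dtype=np.float64)), float(x.std(dtype=np.float64))


def normalize(x, mean, std):
    # Same arithmetic as transforms.Normalize: (x - mean) / std
    return (x - mean) / std


def denormalize(z, mean, std):
    # Inverse transform, as used for plotting: z * std + mean
    return z * std + mean
```

Applied to the raw training images, `pixel_mean_std(train_dataset.data.numpy())` should return values close to 0.1307 and 0.3081 (`train_dataset.data` holds the uint8 images before any transform).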

### **Visualize Sample Images**

In [None]:
# Visualize some training samples
def show_images(dataset, num_samples=10):
    fig, axes = plt.subplots(2, (num_samples + 1) // 2, figsize=(12, 5))  # 2 rows, enough columns for num_samples
    axes = axes.ravel()
    
    for i in range(num_samples):
        image, label = dataset[i]
        # Denormalize for visualization
        image = image * 0.3081 + 0.1307
        axes[i].imshow(image.squeeze(), cmap='gray')
        axes[i].set_title(f'Label: {label}')
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

show_images(train_dataset)

## **Part 2: Training a Standard CNN** <a name="part2"></a>

We'll implement a **Convolutional Neural Network (CNN)** architecture commonly used for MNIST classification.

**Architecture:**
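- Conv Layer 1: 1 → 32 channels, 3×3 kernel, padding 1, followed by ReLU and 2×2 max pooling (28×28 → 14×14)
- Conv Layer 2: 32 → 64 channels, 3×3 kernel, padding 1, followed by ReLU and 2×2 max pooling (14×14 → 7×7)
- Fully connected layers: 64 × 7 × 7 = 3136 → 128 → 10 (one logit per digit)
- Dropout (p = 0.25) after the first fully connected layer for regularization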

In [None]:
class StandardCNN(nn.Module):
    """Standard CNN architecture for MNIST classification."""
    
    def __init__(self):
        super(StandardCNN, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.25)
        
        # Activation
        self.relu = nn.ReLU()
    
    def forward(self, x):
        # Conv block 1
        x = self.relu(self.conv1(x))
        x = self.pool(x)
        
        # Conv block 2
        x = self.relu(self.conv2(x))
        x = self.pool(x)
        
        # Flatten
        x = x.view(x.size(0), -1)
        
        # Fully connected layers
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

# Initialize model
model = StandardCNN().to(device)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")
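The `64 * 7 * 7` input size of `fc1` and the parameter count printed above both follow from simple arithmetic. The sketch below reproduces them with hypothetical helpers (`conv2d_out`, `trace_standard_cnn`, `count_params`) using the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$; its total should agree with the printed one (421,642).

```python
def conv2d_out(size, kernel, padding=0, stride=1):
    """Output side length of a square conv/pool layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_standard_cnn(side=28):
    """Follow the spatial size through StandardCNN's layers."""
    side = conv2d_out(side, kernel=3, padding=1)   # conv1 keeps the size
    side = conv2d_out(side, kernel=2, stride=2)    # max pool halves it
    side = conv2d_out(side, kernel=3, padding=1)   # conv2 keeps the size
    side = conv2d_out(side, kernel=2, stride=2)    # max pool halves it
    return side, 64 * side * side                  # (side, fc1 input features)


def count_params(fc1_in=64 * 7 * 7):
    """Weights + biases of StandardCNN (pooling, ReLU, dropout have none)."""
    conv1 = 32 * 1 * 3 * 3 + 32
    conv2 = 64 * 32 * 3 * 3 + 64
    fc1 = fc1_in * 128 + 128
    fc2 = 128 * 10 + 10
    return conv1 + conv2 + fc1 + fc2


print(trace_standard_cnn())   # spatial size and fc1 input size for 28x28 MNIST
print(count_params())
```

The same helper answers the sizing question in Exercise 2: `trace_standard_cnn(32)` returns `(8, 4096)`, the `fc1` input size for 32×32 CIFAR-10 images.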

### **Training Function**

In [None]:
def train_model(model, train_loader, criterion, optimizer, device, epochs=5):
    """Train the model and return training history."""
    
    history = {'train_loss': [], 'train_acc': []}
    
    model.train()
    
    for epoch in range(epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        
        # Progress bar
        pbar = tqdm(train_loader, desc=f'Epoch {epoch+1}/{epochs}')
        
        for images, labels in pbar:
            images, labels = images.to(device), labels.to(device)
            
            # Reset gradients, then forward pass
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            # Backward pass and optimization
            loss.backward()
            optimizer.step()
            
            # Statistics
            running_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            # Update progress bar
            pbar.set_postfix({'loss': loss.item(), 'acc': 100 * correct / total})
        
        # Epoch statistics
        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100 * correct / total
        history['train_loss'].append(epoch_loss)
        history['train_acc'].append(epoch_acc)
        
        print(f'Epoch [{epoch+1}/{epochs}] - Loss: {epoch_loss:.4f}, Accuracy: {epoch_acc:.2f}%')
    
    return history
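`train_model` reports `running_loss / len(train_loader)`, the mean of the per-batch mean losses. Because 60,000 is not a multiple of 128, the loader yields 468 full batches plus a final batch of 96 images, so this differs very slightly from the true per-sample mean. The sketch below, using a hypothetical `epoch_mean_loss` helper and made-up numbers, shows the size-weighted alternative; for MNIST the difference is negligible, but it grows when the last batch is much smaller than the rest.

```python
def epoch_mean_loss(batch_losses, batch_sizes):
    """Per-sample mean loss for an epoch, given each batch's mean loss.

    Weighting by batch size undoes the per-batch averaging performed by
    nn.CrossEntropyLoss (reduction='mean' by default).
    """
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


batch_losses = [1.0, 3.0]   # made-up per-batch mean losses
batch_sizes = [128, 96]
unweighted = sum(batch_losses) / len(batch_losses)      # what train_model computes
weighted = epoch_mean_loss(batch_losses, batch_sizes)   # true per-sample mean
print(f"unweighted={unweighted:.4f}, weighted={weighted:.4f}")
```

Inside `train_model`, the equivalent change would be to accumulate `loss.item() * labels.size(0)` and divide by `total` at the end of the epoch.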

### **Train the Model**

In [None]:
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
print("Starting training...\n")
history = train_model(model, train_loader, criterion, optimizer, device, epochs=5)
print("\nTraining completed!")

### **Visualize Training Progress**

In [None]:
# Plot training curves
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Loss curve
ax1.plot(range(1, len(history['train_loss']) + 1), history['train_loss'], marker='o', color='red', linewidth=2)
ax1.set_title('Training Loss', fontsize=14, fontweight='bold')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.grid(True, alpha=0.3)

# Accuracy curve
ax2.plot(range(1, len(history['train_acc']) + 1), history['train_acc'], marker='o', color='blue', linewidth=2)
ax2.set_title('Training Accuracy', fontsize=14, fontweight='bold')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy (%)')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## **Part 3: Evaluating Model Performance** <a name="part3"></a>

Now we'll evaluate our trained model on the test set using various metrics.

In [None]:
def evaluate_model(model, test_loader, device):
    """Evaluate model on test set and return predictions."""
    
    model.eval()
    all_predictions = []
    all_labels = []
    
    with torch.no_grad():
        for images, labels in tqdm(test_loader, desc='Evaluating'):
            images = images.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            
            all_predictions.extend(predicted.cpu().numpy())
            all_labels.extend(labels.numpy())
    
    return np.array(all_predictions), np.array(all_labels)

# Evaluate
predictions, labels = evaluate_model(model, test_loader, device)

# Calculate metrics
accuracy = accuracy_score(labels, predictions)
print(f"\n{'='*50}")
print(f"Test Accuracy: {accuracy * 100:.2f}%")
print(f"{'='*50}\n")

# Classification report
print("Classification Report:")
print(classification_report(labels, predictions, target_names=[str(i) for i in range(10)]))

### **Confusion Matrix**

In [None]:
# Compute confusion matrix
cm = confusion_matrix(labels, predictions)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=True)
plt.title('Confusion Matrix - Standard CNN on MNIST', fontsize=16, fontweight='bold')
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.show()

# Per-class accuracy
class_accuracy = cm.diagonal() / cm.sum(axis=1)
for i, acc in enumerate(class_accuracy):
    print(f"Class {i} accuracy: {acc * 100:.2f}%")
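In `sklearn.metrics.confusion_matrix`, rows are true labels and columns are predicted labels. So `cm.diagonal() / cm.sum(axis=1)` is each class's recall (the fraction of that digit's test images classified correctly), and the trace divided by the total is overall accuracy. The toy example below reproduces that arithmetic by hand; the `confusion` helper and the labels are made up for illustration.

```python
import numpy as np


def confusion(y_true, y_pred, num_classes):
    """Build a confusion matrix: rows = true label, columns = predicted label."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm_toy = confusion(y_true, y_pred, num_classes=3)

overall = np.trace(cm_toy) / cm_toy.sum()            # overall accuracy
per_class = cm_toy.diagonal() / cm_toy.sum(axis=1)   # per-class recall
print(cm_toy)
print(f"overall={overall:.3f}, per-class={per_class}")
```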

### **Visualize Predictions**

In [None]:
# Visualize correct and incorrect predictions
def visualize_predictions(model, test_dataset, device, num_samples=10):
    model.eval()
    
    # Get random samples
    indices = np.random.choice(len(test_dataset), num_samples, replace=False)
    
    fig, axes = plt.subplots(2, (num_samples + 1) // 2, figsize=(14, 6))  # 2 rows, enough columns for num_samples
    axes = axes.ravel()
    
    for idx, i in enumerate(indices):
        image, true_label = test_dataset[i]
        
        # Predict
        with torch.no_grad():
            output = model(image.unsqueeze(0).to(device))
            _, predicted = torch.max(output, 1)
            predicted_label = predicted.item()
        
        # Denormalize for visualization
        img_display = image * 0.3081 + 0.1307
        
        # Plot
        axes[idx].imshow(img_display.squeeze(), cmap='gray')
        color = 'green' if predicted_label == true_label else 'red'
        axes[idx].set_title(f'True: {true_label}, Pred: {predicted_label}', color=color)
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.show()

visualize_predictions(model, test_dataset, device)

## **Part 4: Understanding Robust Models** <a name="part4"></a>

### **What Makes a Model "Robust"?**

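A **robust model** is one that keeps its accuracy when inputs are perturbed within a specified threat model, most commonly small perturbations bounded in the L∞ norm (every pixel may change by at most ε). In adversarial machine learning: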

**Standard Models:**
- Trained on clean data
- Optimized for accuracy on unperturbed inputs
- Vulnerable to adversarial perturbations
- High clean accuracy but low adversarial accuracy

**Robust Models:**
- Trained with adversarial examples (adversarial training)
- Optimized to resist perturbations
- More resilient to attacks
- Slightly lower clean accuracy but much higher adversarial accuracy
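To make "keeps its accuracy under perturbation" precise, fix a budget ε and an L∞ threat model. For a classifier $f$ and a test set of $n$ pairs $(x_i, y_i)$, clean and robust accuracy are usually written as follows (the standard formulation in the literature, not something defined elsewhere in this lab; the perturbed image $x_i + \delta$ is also kept in the valid pixel range):

$$
\text{Acc}_{\text{clean}}(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[\, f(x_i) = y_i \,\right],
\qquad
\text{Acc}_{\text{robust}}(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[\, f(x_i + \delta) = y_i \ \text{for all } \delta \text{ with } \|\delta\|_\infty \le \epsilon \,\right]
$$

In practice the "for all δ" is estimated by running a concrete attack, so a reported adversarial accuracy is an upper bound on the true robust accuracy.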

### **Adversarial Training**

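Adversarial training, which generates adversarial examples during training and fits the model to them, is widely regarded as one of the most effective empirical defenses against adversarial attacks (Madry et al., 2018):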

```python
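import torch

# Simplified adversarial training loop (one epoch). FGSM stands in for the
# attack; eps is in the notebook's normalized input space, and clamping to
# the valid pixel range is omitted for brevity.
def fgsm(model, images, labels, criterion, eps):
    images = images.clone().detach().requires_grad_(True)
    loss = criterion(model(images), labels)
    grad, = torch.autograd.grad(loss, images)
    return (images + eps * grad.sign()).detach()

def adversarial_training_epoch(model, loader, criterion, optimizer, device, eps=0.1):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        # 1. Generate adversarial examples against the current model
        adv_images = fgsm(model, images, labels, criterion, eps)
        # 2. Train on both clean and adversarial examples
        optimizer.zero_grad()
        loss_clean = criterion(model(images), labels)
        loss_adv = criterion(model(adv_images), labels)
        total_loss = loss_clean + loss_adv
        total_loss.backward()
        optimizer.step()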
```
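The loop above is a practical approximation of the min-max objective of Madry et al. (2018), where $\theta$ are the model parameters, $\mathcal{L}$ is the cross-entropy loss, and $\mathcal{D}$ is the data distribution:

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta),\, y\big) \right]
$$

The inner maximization is approximated by the attack used in step 1 of the loop (for example a single FGSM step, or the multi-step projected gradient descent, PGD, used by Madry et al.), and the outer minimization is ordinary gradient descent on the resulting loss. Adding the clean loss, as the loop does, is a common variant that trades some robustness for clean accuracy.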

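### **Trade-offs**

The figures below are rough, illustrative ranges for MNIST; actual values depend on the attack, its number of steps, and the perturbation budget ε. Against strong multi-step attacks, a standard model's adversarial accuracy can fall close to 0%.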

| Aspect | Standard Model | Robust Model |
|--------|---------------|---------------|
| Clean Accuracy | ~99% | ~95-97% |
| Adversarial Accuracy | ~10-30% | ~70-85% |
| Training Time | Fast | 5-10× slower |
| Inference Time | Fast | Similar |
| Model Complexity | Standard | Similar |
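The training-time row is consistent with a back-of-the-envelope count: each attack step needs one forward and one backward pass through the network, on top of the passes used for the weight update. The sketch below, with a hypothetical `relative_training_cost` helper, counts passes per batch under the simplifying assumption that every forward+backward pass costs the same.

```python
def relative_training_cost(attack_steps, include_clean_loss=True):
    """Forward+backward passes per batch, relative to standard training.

    Simplifying assumption: every forward+backward pass has the same cost.
    Standard training uses exactly one pass per batch.
    """
    update_passes = 2 if include_clean_loss else 1  # clean and/or adversarial loss
    return attack_steps + update_passes


for steps in (1, 3, 7):
    print(f"{steps}-step attack: ~{relative_training_cost(steps)}x the cost per batch")
```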

## **Part 5: Saving & Loading Models** <a name="part5"></a>

We'll save our **standard** model and demonstrate how to load a saved model checkpoint.
In later labs, you'll load **robust** checkpoints for side-by-side comparisons.

In [None]:
# Save our trained standard model
model_save_path = 'standard_mnist_cnn.pth'
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'accuracy': float(accuracy),  # plain Python float, loadable with weights_only=True
    'history': history
}, model_save_path)

print(f"Model saved to {model_save_path}")
print(f"Model accuracy: {accuracy * 100:.2f}%")

In [None]:
# Load the model
def load_model(model_path, model_class, device):
    """Load a saved model."""
    model = model_class().to(device)
    # weights_only=True restricts unpickling to tensors and primitive types
    checkpoint = torch.load(model_path, map_location=device, weights_only=True)
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    return model, checkpoint

# Load our saved model
loaded_model, checkpoint = load_model(model_save_path, StandardCNN, device)
print("Model loaded successfully!")
print(f"Saved model accuracy: {checkpoint['accuracy'] * 100:.2f}%")
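`torch.load` unpickles its input, and unpickling an untrusted file can execute arbitrary code, which matters in a security course. Passing `weights_only=True` (available since PyTorch 1.13, and the default from PyTorch 2.6) restricts loading to tensors and plain Python containers and scalars. The self-contained sketch below round-trips a checkpoint through an in-memory buffer; `nn.Linear` stands in for `StandardCNN`, and `checkpoint_roundtrip` is a hypothetical helper. Storing metrics as plain Python `float`s avoids depending on which NumPy types the restricted loader accepts.

```python
import io

import torch
import torch.nn as nn


def checkpoint_roundtrip(model, accuracy):
    """Save a checkpoint to memory, then reload it with weights_only=True."""
    buffer = io.BytesIO()
    torch.save({
        'model_state_dict': model.state_dict(),
        'accuracy': float(accuracy),  # plain float, not a NumPy scalar
    }, buffer)
    buffer.seek(0)
    # Only tensors and primitive containers/scalars may be unpickled here
    return torch.load(buffer, map_location='cpu', weights_only=True)


source = nn.Linear(4, 2)              # hypothetical stand-in for StandardCNN
ckpt = checkpoint_roundtrip(source, 0.99)

restored = nn.Linear(4, 2)
restored.load_state_dict(ckpt['model_state_dict'])
print(torch.equal(restored.weight, source.weight), ckpt['accuracy'])
```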

### **Model Comparison Framework**

In [None]:
# Create a comparison framework
class ModelComparison:
    def __init__(self):
        self.models = {}
        self.results = {}
    
    def add_model(self, name, model):
        """Add a model to compare."""
        self.models[name] = model
    
    def evaluate_all(self, test_loader, device):
        """Evaluate all models."""
        for name, model in self.models.items():
            print(f"\nEvaluating {name}...")
            predictions, labels = evaluate_model(model, test_loader, device)
            accuracy = accuracy_score(labels, predictions)
            self.results[name] = {
                'accuracy': accuracy,
                'predictions': predictions,
                'labels': labels
            }
            print(f"{name} Accuracy: {accuracy * 100:.2f}%")
    
    def plot_comparison(self):
        """Plot comparison of models."""
        names = list(self.results.keys())
        accuracies = [self.results[name]['accuracy'] * 100 for name in names]
        
        plt.figure(figsize=(10, 6))
        bars = plt.bar(names, accuracies, color=['blue', 'green', 'red'][:len(names)])
        plt.ylabel('Accuracy (%)', fontsize=12)
        plt.title('Model Comparison on Clean MNIST Test Set', fontsize=14, fontweight='bold')
        plt.ylim([0, 100])
        plt.grid(axis='y', alpha=0.3)
        
        # Add value labels on bars
        for bar in bars:
            height = bar.get_height()
            plt.text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.2f}%', ha='center', va='bottom', fontweight='bold')
        
        plt.tight_layout()
        plt.show()

# Initialize comparison
comparison = ModelComparison()
comparison.add_model('Standard CNN', model)
comparison.evaluate_all(test_loader, device)
comparison.plot_comparison()

## **Part 6: Comparing Standard vs. Robust Models** <a name="part6"></a>

In the next labs, we'll:
1. Generate adversarial examples
2. Test both standard and robust models
3. Measure adversarial accuracy
4. Understand the robustness-accuracy tradeoff

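**Preview: What We Expect to See** (illustrative figures for an L∞ attack with ε = 0.3 on the [0, 1] pixel scale; exact numbers depend on the attack and training setup)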

```
Model Performance:
                    Clean Accuracy | Adversarial Accuracy (ε=0.3)
Standard CNN              98.5%    |        15.2%
Robust CNN (AT)           96.8%    |        78.4%
```

This demonstrates the **security vs. accuracy tradeoff** fundamental to robust ML.
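For MNIST, ε = 0.3 is conventionally measured on the [0, 1] pixel scale, but this notebook feeds the model normalized inputs $z = (x - 0.1307)/0.3081$. An attack implemented in normalized space therefore has to rescale both the budget and the valid pixel range. The sketch below assumes the single-channel MNIST statistics from Part 1; the helper names are hypothetical.

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081


def pixel_eps_to_normalized(eps, std=MNIST_STD):
    # A change of eps in pixel space becomes eps / std after Normalize
    return eps / std


def normalized_pixel_range(mean=MNIST_MEAN, std=MNIST_STD):
    # Pixels in [0, 1] map to [(0 - mean) / std, (1 - mean) / std]
    return (0.0 - mean) / std, (1.0 - mean) / std


eps_norm = pixel_eps_to_normalized(0.3)
low, high = normalized_pixel_range()
print(f"eps in normalized units: {eps_norm:.4f}")
print(f"valid normalized range: [{low:.4f}, {high:.4f}]")
```

An attack working on normalized tensors would then keep each adversarial image within `eps_norm` of the clean one and clamp it to `[low, high]`, for example with `torch.clamp`.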

## **Exercises** <a name="exercises"></a>

Complete these exercises to reinforce your understanding:

### **Exercise 1: Model Architecture (Easy)**
Modify the `StandardCNN` architecture to:
- Add one more convolutional layer
- Increase the number of filters to 128 in the last conv layer
- Train and compare performance

**Question:** Does a deeper architecture always improve accuracy?

### **Exercise 2: CIFAR-10 Training (Medium)**
Adapt this notebook to train on CIFAR-10 instead of MNIST:
- CIFAR-10 has 3-channel RGB images (32×32)
- Update the architecture accordingly
- Compare training time and accuracy

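**Hint:** Change the first conv layer to accept 3 input channels. With 32×32 inputs, the two pooling layers leave 8×8 feature maps, so `fc1` must take `64 * 8 * 8` inputs. Also replace the MNIST normalization constants with per-channel CIFAR-10 statistics.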

### **Exercise 3: Data Augmentation (Medium)**
Add data augmentation to the training pipeline (use `transform_train` for the training set only; keep the original `transform` for the test set):
```python
transform_train = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
```

**Question:** Does augmentation improve generalization?

### **Exercise 4: Hyperparameter Tuning (Hard)**
Experiment with different hyperparameters:
- Learning rates: [0.0001, 0.001, 0.01]
- Batch sizes: [32, 64, 128, 256]
- Optimizers: [Adam, SGD, RMSprop]

Create a comparison table showing the effect of each hyperparameter.
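The grid above has 3 × 4 × 3 = 36 combinations, each requiring a full training run. One way to organize the experiment is to enumerate every configuration up front and record one result row per configuration. The sketch below only builds the list; the `run_experiment` mentioned in the comment is a hypothetical function you would write around `train_model` and `evaluate_model`. If 36 runs are too expensive, varying one hyperparameter at a time around this lab's defaults is a cheaper alternative.

```python
from itertools import product

learning_rates = [0.0001, 0.001, 0.01]
batch_sizes = [32, 64, 128, 256]
optimizer_names = ['Adam', 'SGD', 'RMSprop']

# One dict per combination, in a fixed order for the results table
configs = [
    {'lr': lr, 'batch_size': bs, 'optimizer': opt}
    for lr, bs, opt in product(learning_rates, batch_sizes, optimizer_names)
]
print(len(configs), configs[0])
# For each config, a hypothetical run_experiment(config) would build the
# DataLoader and optimizer (e.g. getattr(torch.optim, config['optimizer']))
# and return the test accuracy for the comparison table.
```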

### **Exercise 5: Model Interpretability (Hard)**
Implement feature visualization:
- Extract and visualize learned filters from conv layers
- Create activation maps for specific test images
- Analyze which features the model learns

**Hint:** Use hooks to extract intermediate layer outputs.
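A forward hook registered with `register_forward_hook` is called as `hook(module, input, output)` every time the module runs, which is one way to capture activation maps. The sketch below uses a small stand-in network so it runs on its own; with `StandardCNN` you would pass names such as `'conv1'` and `'conv2'`. Note that `StandardCNN` reuses a single `self.pool` (and `self.relu`) for both blocks, so a hook on `'pool'` fires twice per forward pass and this hypothetical `capture_activations` helper would keep only the second output. Learned filters need no hook: they are stored directly in `model.conv1.weight` (shape `32 × 1 × 3 × 3`).

```python
import torch
import torch.nn as nn


def capture_activations(model, layer_names, inputs):
    """Run one forward pass and return the outputs of the named submodules."""
    activations, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        # Default argument binds the current name to each hook
        def hook(module, inp, out, name=name):
            activations[name] = out.detach()
        handles.append(modules[name].register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(inputs)
    finally:
        for handle in handles:
            handle.remove()  # hooks stay attached until removed
    return activations


# Small stand-in network; Sequential children are named '0', '1', '2'
net = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
acts = capture_activations(net, ['0', '2'], torch.randn(1, 1, 28, 28))
print({name: tuple(a.shape) for name, a in acts.items()})
```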

## **Conclusion & Next Steps** <a name="conclusion"></a>
---

### **What You Learned**

- **Neural Network Training:** Built and trained a CNN from scratch  
- **Model Evaluation:** Used accuracy, confusion matrix, and classification reports  
- **Robust Models:** Understood the concept of adversarial robustness  
- **Model Persistence:** Saved and loaded trained models  
- **Baseline Establishment:** Created standard models for future attack labs  

### **Key Takeaways**

1. **Standard models** achieve high clean accuracy but are vulnerable to adversarial attacks
2. **Robust models** trade some clean accuracy for adversarial resilience
3. **Adversarial training** is among the most effective empirical defenses, but it is computationally expensive
4. **Model architecture** affects both performance and robustness

### **Next Lab Preview**

**Lab 1b: Threat Modeling & Attack Taxonomy**
- Understand the adversarial threat landscape
- Learn about different attack categories
- Formalize security objectives and threat models
- Prepare for implementing actual attacks

### **Additional Resources**

- **Paper:** [Explaining and Harnessing Adversarial Examples (Goodfellow et al., 2015)](https://arxiv.org/abs/1412.6572)
- **Paper:** [Towards Deep Learning Models Resistant to Adversarial Attacks (Madry et al., 2018)](https://arxiv.org/abs/1706.06083)
- **Library:** [Torchvision Models](https://pytorch.org/vision/stable/models.html)
- **Tutorial:** [PyTorch Training Tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)