# 🔥 FILE 2-C: CNN & Model Checkpoint

**PART 2 - INTERMEDIATE (CORE DEEP LEARNING) - FINAL**

---

## 📋 Contents

✅ What a CNN (Convolutional Neural Network) is

✅ Conv2d - Convolutional layers

✅ Pooling - MaxPool, AvgPool

✅ CNN architecture patterns

✅ Project: Image classification with a CNN

✅ Saving & loading model checkpoints

✅ Best practices for model saving

---

## ⏱️ Study Time: 3-4 hours

---

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")

---

# 1️⃣ CNN Basics

## Why Do We Need CNNs?

**The problem with fully connected layers for images:**

```
224×224×3 image (RGB)
→ Flatten: 224 × 224 × 3 = 150,528 features
→ FC(150528, 1000): 150M+ parameters!
```

❌ Too many parameters

❌ Spatial information is lost

❌ Not translation invariant

**How CNNs solve this:**

✅ Local connectivity (receptive field)

✅ Parameter sharing (the same filter is reused at every position)

✅ Translation equivariance (a shifted input gives a shifted feature map; pooling adds some invariance)

✅ Hierarchical features
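
To make the parameter argument concrete, here is a small sketch that counts learnable parameters by hand. The helper names `linear_params` and `conv2d_params` are illustrative and not part of PyTorch; the convolution settings (3 input channels, 16 filters, 3×3 kernel) are assumed to match the `nn.Conv2d` example used in the next section.

```python
def linear_params(in_features, out_features, bias=True):
    """Fully connected layer: one weight per (input, output) pair, plus biases."""
    return in_features * out_features + (out_features if bias else 0)


def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """2D convolution: one k x k filter per (input, output) channel pair, plus biases."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


fc_params = linear_params(224 * 224 * 3, 1000)
conv_params = conv2d_params(3, 16, kernel_size=3)

print(f"FC(150528, 1000): {fc_params:,} parameters")
print(f"Conv2d(3, 16, 3): {conv_params:,} parameters")
```

The convolution count does not depend on the image size at all, which is the practical payoff of parameter sharing.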

## Conv2d Layer

In [None]:
print("=" * 70)
print("nn.Conv2d")
print("=" * 70)

# Conv2d parameters
conv = nn.Conv2d(
    in_channels=3,      # Input channels (RGB = 3)
    out_channels=16,    # Output channels (number of filters)
    kernel_size=3,      # Filter size: 3×3
    stride=1,           # Stride
    padding=1           # Padding
)

print(f"Conv layer: {conv}")
print(f"\nParameters:")
print(f"  Weight shape: {conv.weight.shape}")
print(f"  → (out_channels, in_channels, kernel_h, kernel_w)")
print(f"  Bias shape: {conv.bias.shape}")
print(f"  Total params: {conv.weight.numel() + conv.bias.numel()}")

# Test forward
x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
out = conv(x)
print(f"\nInput: {x.shape}")
print(f"Output: {out.shape}")

print("""
OUTPUT SIZE FORMULA:

H_out = floor((H_in + 2*padding - kernel_size) / stride) + 1
W_out = floor((W_in + 2*padding - kernel_size) / stride) + 1

Example:
  Input: 32×32
  kernel=3, padding=1, stride=1
  Output: (32 + 2*1 - 3)/1 + 1 = 32×32

💡 SAME PADDING: padding = (kernel_size - 1) / 2 (odd kernel_size)
   → Output size = Input size (when stride=1)
""")

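The output size formula above can be sketched as a small helper, which is handy for checking layer shapes before building a model. The function name `conv_output_size` is hypothetical, and the sketch assumes `dilation=1`, which is the `nn.Conv2d` default.

```python
def conv_output_size(size_in, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation assumed to be 1)."""
    return (size_in + 2 * padding - kernel_size) // stride + 1


same = conv_output_size(32, kernel_size=3, stride=1, padding=1)
no_pad = conv_output_size(32, kernel_size=3, stride=1, padding=0)
strided = conv_output_size(32, kernel_size=3, stride=2, padding=1)
pooled = conv_output_size(32, kernel_size=2, stride=2)

print(f"3x3 conv, padding=1, stride=1: 32 -> {same}")
print(f"3x3 conv, padding=0, stride=1: 32 -> {no_pad}")
print(f"3x3 conv, padding=1, stride=2: 32 -> {strided}")
print(f"2x2 pool, stride=2:            32 -> {pooled}")
```

The integer division `//` plays the role of the floor in the formula; it only matters when the stride does not divide the padded size evenly.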
## Pooling Layers

In [None]:
print("=" * 70)
print("POOLING LAYERS")
print("=" * 70)

# MaxPool
maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

# AvgPool
avgpool = nn.AvgPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 16, 32, 32)
max_out = maxpool(x)
avg_out = avgpool(x)

print(f"Input: {x.shape}")
print(f"MaxPool output: {max_out.shape}")
print(f"AvgPool output: {avg_out.shape}")

print("""
POOLING:

MaxPool2d(2, 2):
  - Splits the input into 2×2 windows
  - Takes the MAX value in each window
  - Reduces size: 32×32 → 16×16

AvgPool2d(2, 2):
  - Takes the MEAN instead of the max

BENEFITS:
  ✅ Downsampling (smaller spatial size)
  ✅ Fewer activations, so fewer parameters in later FC layers
  ✅ Some invariance to small translations
  ✅ Larger receptive field

RECOMMENDATIONS:
  - MaxPool: the more common choice
  - kernel=2, stride=2: the standard setting
""")

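To see the windowing described above on actual numbers, the following sketch pools a hand-written 4×4 tensor. The input values are made up for illustration; they are chosen so that each 2×2 window is easy to read off.

```python
import torch
import torch.nn as nn

# One image, one channel, 4x4 pixels: shape (batch, channels, height, width)
x = torch.tensor([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

max_out = nn.MaxPool2d(kernel_size=2, stride=2)(x)
avg_out = nn.AvgPool2d(kernel_size=2, stride=2)(x)

print(max_out.squeeze())  # largest value of each 2x2 window
print(avg_out.squeeze())  # mean of each 2x2 window
```

The top-left window holds 1, 2, 3, 4, so max pooling keeps 4 and average pooling keeps 2.5; the other three windows follow the same pattern.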
---

# 2️⃣ CNN Architecture

In [None]:
print("=" * 70)
print("SIMPLE CNN ARCHITECTURE")
print("=" * 70)

class SimpleCNN(nn.Module):
    """Simple CNN for image classification"""
    
    def __init__(self, num_classes=10):
        super().__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        
        # Pooling
        self.pool = nn.MaxPool2d(2, 2)
        
        # Fully connected
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.fc2 = nn.Linear(512, num_classes)
        
        # Dropout
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        # Conv block 1: 32×32 → 16×16
        x = self.pool(torch.relu(self.conv1(x)))
        
        # Conv block 2: 16×16 → 8×8
        x = self.pool(torch.relu(self.conv2(x)))
        
        # Conv block 3: 8×8 → 4×4
        x = self.pool(torch.relu(self.conv3(x)))
        
        # Flatten
        x = x.view(x.size(0), -1)
        
        # FC layers
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.fc2(x)
        
        return x

model = SimpleCNN(num_classes=10)
print(model)

# Test
x = torch.randn(1, 3, 32, 32)
out = model(x)
print(f"\nInput: {x.shape}")
print(f"Output: {out.shape}")
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters()):,}")
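
`SimpleCNN` hard-codes `128 * 4 * 4` as the input size of `fc1`, which is only correct for 32×32 inputs. One common way to avoid the manual arithmetic is to push a dummy tensor through the convolutional part and read off the flattened size. The sketch below assumes a stand-alone `features` stack that mirrors the three conv blocks of `SimpleCNN`; the helper name `flattened_size` is hypothetical.

```python
import torch
import torch.nn as nn

# Stand-in for the conv/pool part of SimpleCNN (same layer settings)
features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
)


def flattened_size(feature_extractor, input_shape):
    """Run one dummy sample through the conv stack and count the output features."""
    with torch.no_grad():
        dummy = torch.zeros(1, *input_shape)
        return feature_extractor(dummy).flatten(1).shape[1]


size_32 = flattened_size(features, (3, 32, 32))
size_64 = flattened_size(features, (3, 64, 64))

print(f"32x32 input -> {size_32} features")  # 128 * 4 * 4
print(f"64x64 input -> {size_64} features")  # 128 * 8 * 8
```

The result can then be passed to `nn.Linear` as `in_features`, so that changing the input resolution does not silently break the classifier.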

## Modern CNN Pattern

In [None]:
print("=" * 70)
print("MODERN CNN WITH BATCHNORM")
print("=" * 70)

class ModernCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        
        self.features = nn.Sequential(
            # Block 1
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            # Block 2
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            # Block 3
            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(512, num_classes)
        )
    
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

print("""
MODERN CNN PATTERN:

Conv → BatchNorm → ReLU → Conv → BatchNorm → ReLU → Pool

KEY POINTS:
  ✅ BatchNorm after Conv
  ✅ Double Conv before pooling
  ✅ Increasing channels: 64 → 128 → 256
  ✅ Decreasing spatial: 32 → 16 → 8 → 4
  ✅ Dropout in FC layers
""")

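The repeated Conv → BatchNorm → ReLU unit can be factored into a small helper, which keeps larger models readable. This is a sketch, not the `ModernCNN` definition above: the helper name `conv_bn_relu` is hypothetical, and it sets `bias=False` on the convolution, a common variation that `ModernCNN` itself does not use.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_channels, out_channels):
    """One Conv -> BatchNorm -> ReLU unit with a 3x3 kernel and 'same' padding."""
    return nn.Sequential(
        # bias=False: BatchNorm's learnable shift makes the conv bias redundant
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


# Equivalent in structure to "Block 1": double conv, then pooling
block = nn.Sequential(
    conv_bn_relu(3, 64),
    conv_bn_relu(64, 64),
    nn.MaxPool2d(2, 2),
)

block.eval()  # use BatchNorm running statistics for this shape check
with torch.no_grad():
    out = block(torch.randn(2, 3, 32, 32))

print(out.shape)  # channels 3 -> 64, spatial 32 -> 16
```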
---

# 3️⃣ Project: Image Classification

## Load CIFAR-10 Dataset

In [None]:
print("=" * 70)
print("LOADING CIFAR-10")
print("=" * 70)

# Data transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Download dataset
train_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform
)

test_dataset = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True, transform=transform
)

# DataLoaders
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck')

print(f"Train samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Classes: {classes}")

# Visualize samples
dataiter = iter(train_loader)
images, labels = next(dataiter)

fig, axes = plt.subplots(2, 8, figsize=(16, 4))
for i, ax in enumerate(axes.flatten()):
    img = images[i].numpy().transpose(1, 2, 0)
    img = img * 0.5 + 0.5  # Denormalize
    ax.imshow(img)
    ax.set_title(classes[labels[i]])
    ax.axis('off')
plt.tight_layout()
plt.show()
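
The `img * 0.5 + 0.5` step in the plotting loop undoes the `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform. The sketch below reproduces that arithmetic with plain tensors, assuming the per-channel rule `(x - mean) / std`; a random tensor stands in for a real `ToTensor()` image.

```python
import torch

mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

img = torch.rand(3, 32, 32)          # stand-in for ToTensor() output, values in [0, 1)
normalized = (img - mean) / std      # same arithmetic as the Normalize transform
restored = normalized * std + mean   # the denormalize step used before imshow

print(f"normalized range: [{normalized.min():.2f}, {normalized.max():.2f}]")
print(f"round trip ok: {torch.allclose(restored, img, atol=1e-6)}")
```

With mean and std both 0.5, pixel values move from [0, 1] to [-1, 1], which is why the images must be mapped back before plotting.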

## Training CNN

In [None]:
print("=" * 70)
print("TRAINING CNN")
print("=" * 70)

# Model
model = SimpleCNN(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
num_epochs = 10
history = {'train_loss': [], 'train_acc': [], 'test_acc': []}

for epoch in range(num_epochs):
    # Train
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    train_loss = running_loss / len(train_loader)
    train_acc = 100 * correct / total
    
    # Test
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    test_acc = 100 * correct / total
    
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['test_acc'].append(test_acc)
    
    print(f"Epoch [{epoch+1}/{num_epochs}] Loss: {train_loss:.3f} "
          f"Train Acc: {train_acc:.1f}% Test Acc: {test_acc:.1f}%")

print("\n✅ Training completed!")

---

# 4️⃣ Save & Load Model

## How to Save a Model

In [None]:
print("=" * 70)
print("SAVE & LOAD MODEL")
print("=" * 70)

# METHOD 1: Save the entire model (NOT recommended)
# This pickles the whole object, so loading needs the SimpleCNN class to be importable.
# PyTorch >= 2.6 defaults to weights_only=True; pass False only for files you trust.
torch.save(model, 'entire_model.pth')
loaded_model = torch.load('entire_model.pth', weights_only=False)
print("✓ Saved entire model")

# METHOD 2: Save the state_dict (RECOMMENDED)
torch.save(model.state_dict(), 'model_weights.pth')
print("✓ Saved state_dict")

# Load state_dict
new_model = SimpleCNN(num_classes=10)
new_model.load_state_dict(torch.load('model_weights.pth', map_location=device))
new_model.to(device)
print("✓ Loaded state_dict")

# METHOD 3: Save a checkpoint (BEST PRACTICE)
checkpoint = {
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': history['train_loss'][-1],
    'accuracy': history['test_acc'][-1],
}

torch.save(checkpoint, 'checkpoint.pth')
print("✓ Saved checkpoint")

# Load checkpoint
checkpoint = torch.load('checkpoint.pth', map_location=device)
model = SimpleCNN(num_classes=10).to(device)  # move to device before building the optimizer
model.load_state_dict(checkpoint['model_state_dict'])
optimizer = optim.Adam(model.parameters())
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

print(f"✓ Loaded checkpoint from epoch {epoch}")

print("""

BEST PRACTICES:

1. Save the state_dict, NOT the entire model
2. Save the optimizer state so training can be resumed
3. Save epoch, loss, accuracy
4. Save hyperparameters and config
5. Use clear file names: model_epoch10_acc85.pth

CHECKPOINT STRUCTURE:
{
    'epoch': int,
    'model_state_dict': OrderedDict,
    'optimizer_state_dict': dict,
    'loss': float,
    'accuracy': float,
    'config': dict  # Optional
}
""")

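Saving the optimizer state and the epoch number is what makes it possible to resume training later. The sketch below walks through a save and resume cycle under a few assumptions: a tiny `nn.Linear` stands in for `SimpleCNN`, the helper `build` is hypothetical, and `'epoch'` stores the 0-based index of the last finished epoch (the cell above stores the epoch count instead, so adjust the `+ 1` to your own convention).

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def build():
    """Hypothetical stand-in for creating SimpleCNN and its optimizer."""
    model = nn.Linear(4, 2).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    return model, optimizer


# Pretend epochs 0..2 have finished, then save a checkpoint
model, optimizer = build()
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')
torch.save({
    'epoch': 2,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, path)

# Later (possibly in a new process): rebuild, restore, continue
resumed_model, resumed_optimizer = build()
checkpoint = torch.load(path, map_location=device)
resumed_model.load_state_dict(checkpoint['model_state_dict'])
resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1

total_epochs = 5
for epoch in range(start_epoch, total_epochs):
    resumed_model.train()
    # ... one epoch of training would go here ...
    print(f"Resuming: epoch {epoch + 1}/{total_epochs}")
```

Calling `model.train()` after loading matters for layers such as Dropout and BatchNorm; for inference, call `model.eval()` instead.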
## Early Stopping with Checkpoint

In [None]:
print("=" * 70)
print("TRAINING WITH CHECKPOINTING")
print("=" * 70)

class ModelCheckpoint:
    """Save best model during training"""
    
    def __init__(self, filepath, monitor='val_loss', mode='min', verbose=True):
        self.filepath = filepath
        self.monitor = monitor
        self.mode = mode
        self.verbose = verbose
        self.best = float('inf') if mode == 'min' else float('-inf')
    
    def __call__(self, model, optimizer, epoch, metrics):
        current = metrics[self.monitor]
        
        if self.mode == 'min':
            is_better = current < self.best
        else:
            is_better = current > self.best
        
        if is_better:
            self.best = current
            
            checkpoint = {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                **metrics
            }
            
            torch.save(checkpoint, self.filepath)
            
            if self.verbose:
                print(f"\n💾 Saved checkpoint: {self.monitor}={current:.4f}")

print("""
USAGE:

checkpoint = ModelCheckpoint(
    filepath='best_model.pth',
    monitor='val_acc',
    mode='max'
)

for epoch in range(epochs):
    # Training...
    metrics = {
        'train_loss': train_loss,
        'val_loss': val_loss,
        'val_acc': val_acc
    }
    checkpoint(model, optimizer, epoch, metrics)
""")

---

# ✅ Summary: FILE 2-C & INTERMEDIATE LEVEL

## FILE 2-C: What You Learned

✅ **CNN basics**: Conv2d, Pooling

✅ **CNN architecture**: Design patterns

✅ **Image classification**: CIFAR-10 project

✅ **Model checkpoint**: Save/load best practices

---

## 🎉 INTERMEDIATE LEVEL COMPLETE!

### PART 2 Summary:

📗 **FILE 2-A**: Dataset & DataLoader
- Custom Dataset
- DataLoader
- Training/Validation loop
- Overfitting/Underfitting
- Early Stopping

📗 **FILE 2-B**: Optimizer & Regularization
- Activation functions
- Optimizers (SGD, Adam, AdamW)
- Learning rate scheduling
- Regularization (Dropout, Weight Decay, BatchNorm)

📗 **FILE 2-C**: CNN & Checkpoint
- CNN architecture
- Image classification
- Model saving/loading

---

## You Can Now:

✅ Process data with Dataset/DataLoader

✅ Build a complete training pipeline

✅ Detect and fix overfitting

✅ Tune hyperparameters

✅ Build CNNs for image tasks

✅ Save/load models correctly

---

## Template: Complete CNN Training Pipeline

```python
# 1. Data
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False)

# 2. Model
model = ModernCNN(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# 3. Training
checkpoint = ModelCheckpoint('best_model.pth', monitor='val_acc', mode='max')
early_stop = EarlyStopping(patience=10)

for epoch in range(epochs):
    # Train
    train_loss, train_acc = train_epoch(model, train_loader, ...)
    
    # Validate
    val_loss, val_acc = validate_epoch(model, val_loader, ...)
    
    # Save best
    checkpoint(model, optimizer, epoch, {'val_acc': val_acc, ...})
    
    # Early stopping
    early_stop(val_loss)
    if early_stop.early_stop:
        break
    
    # Update LR
    scheduler.step()
```
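
The template leaves `train_epoch`, `validate_epoch`, and `EarlyStopping` undefined. The sketch below shows one possible shape for them, assuming the signatures `(model, loader, criterion, optimizer, device)` and `(model, loader, criterion, device)` and an `EarlyStopping` object that exposes an `early_stop` flag, as the template uses it. These are illustrative versions, not the definitions from the earlier files; the `min_delta` parameter and the synthetic data are additions for the example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(model, loader, criterion, optimizer, device):
    """One training pass; returns (mean loss per sample, accuracy in %)."""
    model.train()
    loss_sum, correct, total = 0.0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        loss_sum += loss.item() * labels.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return loss_sum / total, 100 * correct / total


@torch.no_grad()
def validate_epoch(model, loader, criterion, device):
    """One evaluation pass without gradients; same return values as train_epoch."""
    model.eval()
    loss_sum, correct, total = 0.0, 0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss_sum += criterion(outputs, labels).item() * labels.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return loss_sum / total, 100 * correct / total


class EarlyStopping:
    """Sets early_stop once val_loss has not improved for `patience` calls in a row."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.counter = 0
        self.early_stop = False

    def __call__(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True


# Smoke test on synthetic data (8 features, 3 classes)
torch.manual_seed(0)
device = torch.device('cpu')
data = TensorDataset(torch.randn(64, 8), torch.randint(0, 3, (64,)))
loader = DataLoader(data, batch_size=16)
model = nn.Linear(8, 3).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)

train_loss, train_acc = train_epoch(model, loader, criterion, optimizer, device)
val_loss, val_acc = validate_epoch(model, loader, criterion, device)

stopper = EarlyStopping(patience=2)
for fake_val_loss in [1.0, 0.9, 0.95, 0.97]:
    stopper(fake_val_loss)
print(f"stop after two non-improving epochs: {stopper.early_stop}")
```

Weighting each batch loss by its size gives a per-sample mean that stays correct when the last batch is smaller than the rest.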

---

## 🚀 Next Steps

You are now ready for:
- Transfer Learning
- Advanced architectures (ResNet, VGG, etc.)
- Object Detection
- Semantic Segmentation
- GANs, Transformers

**Congratulations on completing Intermediate level! 🎉**

---