
# 📚 Day 2: PyTorch & Deep Learning for Computer Vision (Interview Prep)
**Suggested duration:** 5 hours of study + 1 hour of hands-on coding.

**Goals:**

- Understand PyTorch basics: `torch.tensor`, `requires_grad`, `nn.Module`, `forward`, optimizer, `loss.backward()`.
- Work with `DataLoader` (MNIST), build a small CNN (2 conv + 2 fc), write a training loop, compute accuracy, save/load the model.
- Print a confusion matrix (using `sklearn`) and compare the effect of batch size (64 vs 128).
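
To make the first goal concrete, here is a minimal autograd sketch. It is not part of the original notebook; the tensors `w` and `x` and their values are made up purely for illustration.

```python
import torch

# A leaf tensor that autograd should track.
w = torch.tensor([2.0, -1.0], requires_grad=True)
# Plain input data: no gradient is needed for it.
x = torch.tensor([3.0, 4.0])

# y = (w . x)^2, a scalar, so backward() can be called without arguments.
y = (w * x).sum() ** 2

# Backprop fills w.grad with dy/dw = 2 * (w . x) * x.
y.backward()

print("y      =", y.item())
print("w.grad =", w.grad)
print("x.grad =", x.grad)  # None, because x does not require grad
```

Only tensors with `requires_grad=True` receive a `.grad`; this is the mechanism `loss.backward()` relies on in the training loop later.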


**Note:** This notebook was originally written in Vietnamese and is heavily commented. If you run it on Colab, remember to select Runtime > GPU.



## 0) Installation (if needed)

Run the lines below only **if your machine does not yet have** PyTorch, torchvision, scikit-learn, and matplotlib installed. Colab usually has them preinstalled (otherwise use the `pip` command that matches your CUDA version).

```bash
# Run only if needed. You may need to change the install command to match the CUDA version on your machine.
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
# !pip install scikit-learn matplotlib tqdm
```

If you use Colab: Runtime > Change runtime type > GPU, then re-run the import cell.


In [None]:
# 1) Imports & Device
import os
import time
import random
from collections import OrderedDict

import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Optional: tqdm for progress bars
try:
    from tqdm import tqdm
except ImportError:
    # Fallback: a no-op wrapper that just returns the iterable
    def tqdm(iterable, **kwargs):
        return iterable

# sklearn for confusion matrix and classification report
from sklearn.metrics import confusion_matrix, classification_report

# Device (GPU if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Device:', device)

# Reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

# Create output folder
os.makedirs('/mnt/data/pytorch_day2', exist_ok=True)
print('/mnt/data/pytorch_day2 is ready')



## 2) Datasets & DataLoader (MNIST)
- We use MNIST so we can focus on the pipeline: download, transform, DataLoader.
- To try CIFAR10, swap in the corresponding dataset and transforms (the model also needs 3 input channels, see the exercises).
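
As a rough illustration of what the transforms in the next cell do to pixel values, the sketch below reproduces the arithmetic by hand on a made-up 2x2 image: `ToTensor` scales `uint8` pixels from [0, 255] to [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` per channel. The image values are hypothetical; 0.1307 and 0.3081 are the constants used in this notebook.

```python
import torch

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

# Hypothetical 1-channel 2x2 "image" with raw uint8 pixel values, shape (C=1, H=2, W=2).
raw = torch.tensor([[[0, 255], [128, 64]]], dtype=torch.uint8)

# Value scaling performed by ToTensor: [0, 255] -> [0.0, 1.0].
scaled = raw.float() / 255.0

# Per-channel standardization performed by Normalize.
normalized = (scaled - MNIST_MEAN) / MNIST_STD

print(normalized)
```

A black pixel (0) ends up slightly negative and a white pixel (255) ends up well above 1, which is why normalized images should not be interpreted as values in [0, 1].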


In [None]:
# 2.1 Transforms: normalize with the MNIST mean/std (grayscale)
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),            # returns a tensor of shape (C,H,W) with values scaled to [0,1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean/std
])

# 2.2 Download dataset
root = '/mnt/data/pytorch_day2/data'
train_dataset = datasets.MNIST(root=root, train=True, download=True, transform=transform)
test_dataset  = datasets.MNIST(root=root, train=False, download=True, transform=transform)

print('Train size =', len(train_dataset))
print('Test size  =', len(test_dataset))

# 2.3 Create DataLoader: default batch_size 64 (we will experiment with 128 later)
BATCH_SIZE = 64
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, pin_memory=True)
test_loader  = DataLoader(test_dataset,  batch_size=BATCH_SIZE, shuffle=False, num_workers=2, pin_memory=True)

# Visual check: show some samples
import matplotlib.pyplot as plt

def show_images(dataset, n=6):
    fig, axes = plt.subplots(1, n, figsize=(12,2))
    for i in range(n):
        img, label = dataset[i]
        img = img.squeeze().numpy()
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(str(label))
        axes[i].axis('off')
    plt.show()

show_images(train_dataset, n=6)



## 3) Build a small CNN (2 conv layers + 2 fully connected)
- Architecture: Conv -> BatchNorm -> ReLU -> MaxPool -> Conv -> BatchNorm -> ReLU -> MaxPool -> Flatten -> FC -> ReLU -> Dropout -> FC (logits)
- The code comments give the tensor shape at each step.
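
The shape bookkeeping can be checked before writing the model. The sketch below uses the standard output-size formula `floor((size + 2*padding - kernel) / stride) + 1` (dilation 1) and cross-checks it against real layers on a dummy batch. The helper `conv_out_size` is a hypothetical name introduced here, not something from the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# 28 -> conv(k=3, p=1) -> pool(2) -> conv(k=3, p=1) -> pool(2)
s = conv_out_size(28, kernel=3, padding=1)   # conv1 keeps the size
s = conv_out_size(s, kernel=2, stride=2)     # first 2x2 max pool halves it
s = conv_out_size(s, kernel=3, padding=1)    # conv2 keeps the size
s = conv_out_size(s, kernel=2, stride=2)     # second 2x2 max pool halves it
print("final spatial size:", s)

# Cross-check with real layers on a dummy batch of 4 MNIST-sized images.
x = torch.zeros(4, 1, 28, 28)
x = F.max_pool2d(nn.Conv2d(1, 16, kernel_size=3, padding=1)(x), 2)
x = F.max_pool2d(nn.Conv2d(16, 32, kernel_size=3, padding=1)(x), 2)
print("feature map shape:", tuple(x.shape))
```

The final feature map has 32 channels at the computed spatial size, which is where the input dimension of the first fully connected layer comes from.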


In [None]:
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Conv1: input channels=1 (MNIST), output channels=16, kernel=3
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # output size -> (16,28,28)
        self.bn1 = nn.BatchNorm2d(16)
        # Conv2: input 16, output 32, kernel=3
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1) # input is (16,14,14) after the first pool; output -> (32,14,14)
        self.bn2 = nn.BatchNorm2d(32)
        # After two pools: input 28x28 -> 14x14 -> 7x7
        # FC layers: flatten 32*7*7 -> hidden 128 -> out num_classes
        self.fc1 = nn.Linear(32 * 7 * 7, 128)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        # x: (batch, 1, 28, 28)
        x = self.conv1(x)            # -> (batch,16,28,28)
        x = self.bn1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)      # -> (batch,16,14,14)

        x = self.conv2(x)           # -> (batch,32,14,14)
        x = self.bn2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)      # -> (batch,32,7,7)

        x = x.view(x.size(0), -1)   # flatten -> (batch, 32*7*7)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)             # logits
        return x

# Instantiate and print parameter count
model = SimpleCNN().to(device)
print(model)

# Count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('Trainable params:', count_parameters(model))



## 4) Loss, Optimizer & Utility functions
- We use `CrossEntropyLoss` (which combines log-softmax + NLL loss, so the model should output raw logits) and `optim.SGD` with momentum.
- We write the functions `train_one_epoch`, `evaluate`, and `compute_accuracy`.
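
The claim about `CrossEntropyLoss` can be verified numerically. This is a small sketch on random logits; the batch size, class count, and targets are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # hypothetical batch: 5 samples, 10 classes
targets = torch.tensor([1, 0, 4, 9, 3])   # hypothetical class indices

# Built-in loss applied directly to raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Same quantity computed in two explicit steps.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), manual.item())
```

Because the log-softmax is already inside the loss, adding a `softmax` layer at the end of the model would apply it twice.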


In [None]:
# 4.1 Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Optional LR scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

# 4.2 Utility: accuracy
def compute_accuracy(logits, labels):
    preds = torch.argmax(logits, dim=1)
    correct = (preds == labels).sum().item()
    return correct / labels.size(0)

# 4.3 Train & Eval loops

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    running_loss = 0.0
    running_corrects = 0
    total = 0
    for images, labels in tqdm(loader):
        images = images.to(device)
        labels = labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward
        loss.backward()
        optimizer.step()

        # Stats
        batch_size = labels.size(0)
        running_loss += loss.item() * batch_size
        running_corrects += (torch.argmax(outputs, dim=1) == labels).sum().item()
        total += batch_size

    epoch_loss = running_loss / total
    epoch_acc  = running_corrects / total
    return epoch_loss, epoch_acc


def evaluate(model, loader, criterion, device):
    model.eval()
    running_loss = 0.0
    running_corrects = 0
    total = 0
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            batch_size = labels.size(0)
            running_loss += loss.item() * batch_size
            running_corrects += (torch.argmax(outputs, dim=1) == labels).sum().item()
            total += batch_size

            all_preds.append(torch.argmax(outputs, dim=1).cpu().numpy())
            all_labels.append(labels.cpu().numpy())

    epoch_loss = running_loss / total
    epoch_acc  = running_corrects / total
    all_preds = np.concatenate(all_preds)
    all_labels = np.concatenate(all_labels)
    return epoch_loss, epoch_acc, all_preds, all_labels



## 5) Full training script (run experiments)
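- The `run_training` function trains the model for N epochs, records the loss/accuracy history, and saves the best model so far (by validation accuracy).
- After training, we evaluate and print the confusion matrix.
- Note: the cell below passes `test_loader` as the validation loader, so the reported "val" metrics are really test metrics. For genuine model selection, hold out part of the training set instead.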
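
One way to hold out a validation set is `random_split`, which the notebook imports but never uses. The sketch below runs on a synthetic stand-in dataset so it works without downloading MNIST; the 90/10 ratio and the names `full_train`, `train_subset`, and `val_subset` are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for the MNIST training set: 100 fake images with label 0.
full_train = TensorDataset(
    torch.zeros(100, 1, 28, 28),
    torch.zeros(100, dtype=torch.long),
)

# Hold out 10% for validation (an arbitrary but common choice).
n_val = len(full_train) // 10
n_train = len(full_train) - n_val

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(full_train, [n_train, n_val], generator=generator)

train_loader_demo = DataLoader(train_subset, batch_size=64, shuffle=True)
val_loader_demo = DataLoader(val_subset, batch_size=64, shuffle=False)

print(len(train_subset), len(val_subset))
```

With the real data, `full_train` would be `train_dataset`, and the test set would then be evaluated only once, after training has finished.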


In [None]:
def plot_history(history, title_prefix=''):
    epochs = np.arange(1, len(history['train_loss'])+1)
    plt.figure(figsize=(10,4))
    plt.subplot(1,2,1)
    plt.plot(epochs, history['train_loss'], label='train_loss')
    plt.plot(epochs, history['val_loss'], label='val_loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.title(title_prefix + ' Loss')

    plt.subplot(1,2,2)
    plt.plot(epochs, history['train_acc'], label='train_acc')
    plt.plot(epochs, history['val_acc'], label='val_acc')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.title(title_prefix + ' Accuracy')
    plt.show()


def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix'):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.figure(figsize=(6,6))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, format(cm[i, j], fmt),
                     horizontalalignment='center',
                     color='white' if cm[i, j] > thresh else 'black')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()
    plt.show()


def run_training(model, train_loader, val_loader, optimizer, criterion, device, epochs=5, scheduler=None, model_name='model.pth'):
    history = {'train_loss':[], 'train_acc':[], 'val_loss':[], 'val_acc':[]}
    best_val_acc = 0.0
    best_state = None
    for epoch in range(1, epochs+1):
        start = time.time()
        train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, criterion, device)
        val_loss, val_acc, val_preds, val_labels = evaluate(model, val_loader, criterion, device)
        if scheduler is not None:
            scheduler.step()
        elapsed = time.time() - start

        history['train_loss'].append(train_loss)
        history['train_acc'].append(train_acc)
        history['val_loss'].append(val_loss)
        history['val_acc'].append(val_acc)

        print(f"Epoch {epoch}/{epochs} - time: {elapsed:.1f}s - train_loss: {train_loss:.4f}, train_acc: {train_acc:.4f} - val_loss: {val_loss:.4f}, val_acc: {val_acc:.4f}")

        # Save best
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            best_state = model.state_dict()
            torch.save(best_state, os.path.join('/mnt/data/pytorch_day2', model_name))
            print('Saved best model with val_acc =', best_val_acc)

    # At the end, return history and last evaluation
    return history, (val_loss, val_acc, val_preds, val_labels)

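# Run a short experiment: 5 epochs with BATCH_SIZE already set to 64
model = SimpleCNN().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Rebuild the scheduler so it is bound to this new optimizer, not the one created in section 4
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

history, eval_res = run_training(model, train_loader, test_loader, optimizer, criterion, device, epochs=5, scheduler=scheduler, model_name='mnist_cnn_bs64.pth')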
plot_history(history, title_prefix='MNIST_CNN_bs64')

# Confusion matrix and classification report
val_loss, val_acc, val_preds, val_labels = eval_res
cm = confusion_matrix(val_labels, val_preds)
print('Confusion matrix (counts):')
plot_confusion_matrix(cm, classes=[str(i) for i in range(10)], normalize=False)
print('Classification report:')
print(classification_report(val_labels, val_preds))



## 6) Comparing Batch Size (64 vs 128)
- We define `experiment_batch_size` to run a short training (for example 5 epochs) with different batch sizes and compare loss/accuracy.
- Note: when the batch size increases, the learning rate may also need to increase (rule of thumb: LR ∝ batch_size), but here we keep the LR fixed so the difference is observable.
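
The rule of thumb above can be written as a one-line helper. This is only a heuristic sketch: `scaled_lr` is a hypothetical function, and the experiment in this notebook deliberately does not apply it.

```python
def scaled_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: the LR grows in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size


# Baseline from this notebook: lr=0.01 at batch size 64.
for bs in (64, 128, 256):
    print(f"batch_size={bs:4d} -> suggested lr={scaled_lr(0.01, 64, bs):.4f}")
```

In practice the scaled value is a starting point for tuning, often combined with a warmup period.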


In [None]:
def experiment_batch_size(batch_size, epochs=5):
    print('Running experiment with batch_size =', batch_size)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2, pin_memory=True)
    test_loader  = DataLoader(test_dataset,  batch_size=batch_size, shuffle=False, num_workers=2, pin_memory=True)
    model = SimpleCNN().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    history, eval_res = run_training(model, train_loader, test_loader, optimizer, criterion, device, epochs=epochs,
                                     scheduler=optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1),
                                     model_name=f'mnist_cnn_bs{batch_size}.pth')
    return history, eval_res

# Run experiments
hist64, res64 = experiment_batch_size(64, epochs=5)
hist128, res128 = experiment_batch_size(128, epochs=5)

print('\n--- Summary ---')
print('BS=64 val_acc:', res64[1])
print('BS=128 val_acc:', res128[1])

# Plot comparison (accuracy)
plt.figure(figsize=(6,4))
plt.plot(hist64['val_acc'], label='val_acc_bs64')
plt.plot(hist128['val_acc'], label='val_acc_bs128')
plt.xlabel('Epoch')
plt.ylabel('Validation Accuracy')
plt.legend()
plt.title('Compare validation accuracy')
plt.show()



## 7) Saving & Loading the Model
An example of saving the `state_dict` and loading it back for inference.
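
If the goal is to resume training rather than only run inference, a common pattern is to save the optimizer state and the epoch number alongside the model weights. The sketch below uses a tiny `nn.Linear` stand-in instead of `SimpleCNN`, and the dictionary keys and file name are hypothetical choices, not a fixed PyTorch convention.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in model; in the notebook this would be SimpleCNN.
net = nn.Linear(4, 2)
opt = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Save everything needed to resume into a single file.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")  # hypothetical file name
torch.save(
    {"epoch": 3, "model_state": net.state_dict(), "optimizer_state": opt.state_dict()},
    ckpt_path,
)

# Restore into freshly constructed objects of the same classes.
net2 = nn.Linear(4, 2)
opt2 = optim.SGD(net2.parameters(), lr=0.01, momentum=0.9)
ckpt = torch.load(ckpt_path, map_location="cpu")
net2.load_state_dict(ckpt["model_state"])
opt2.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1

print("resume from epoch", start_epoch)
```

Restoring the optimizer state matters for optimizers with internal buffers, such as SGD with momentum or Adam.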


In [None]:
# Load best model example
model_path = '/mnt/data/pytorch_day2/mnist_cnn_bs64.pth'
if os.path.exists(model_path):
    model = SimpleCNN().to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()
    print('Loaded model from', model_path)
else:
    print('Model file not found:', model_path)

# Inference on few test images
model.eval()
imgs, labels = next(iter(test_loader))
imgs = imgs.to(device)
labels = labels.to(device)
with torch.no_grad():
    logits = model(imgs[:16])
    preds = torch.argmax(logits, dim=1)

# show first 8 predictions
plt.figure(figsize=(12,3))
for i in range(8):
    plt.subplot(1,8,i+1)
    img = imgs[i].cpu().squeeze().numpy()
    plt.imshow(img, cmap='gray')
    plt.title(f'p={preds[i].item()}, t={labels[i].item()}')
    plt.axis('off')
plt.show()



## 8) Interview tips: common questions & short answers
- **Q:** Why use `CrossEntropyLoss` for classification?
  - **A:** Because `CrossEntropyLoss` combines log-softmax + NLL loss, which suits multi-class models that output raw logits.

- **Q:** What does `requires_grad` mean?
  - **A:** If a tensor has `requires_grad=True`, PyTorch records the operations applied to it so it can compute gradients during backprop.

- **Q:** When do you use `model.train()` vs `model.eval()`?
  - **A:** `model.train()` puts dropout/batchnorm in training mode; `model.eval()` disables dropout and makes batchnorm use its running statistics.

- **Q:** How do you avoid overfitting?
  - **A:** Data augmentation, dropout, weight decay (L2), early stopping, collecting more data, reducing model capacity.

- **Q:** Should you increase the LR or the batch size to train faster?
  - **A:** It depends; a common heuristic is LR ∝ batch_size, but it still needs tuning. Techniques such as LR warmup and cyclical LR can help.

- **Q:** Briefly describe the training loop.
  - **A:** `optimizer.zero_grad()` -> forward -> compute loss -> `loss.backward()` -> `optimizer.step()`, repeated for every batch. Gradients accumulate by default, so they must be cleared before each backward pass.

- **Q:** What is the difference between saving the `state_dict` and saving the whole model?
  - **A:** The `state_dict` holds only the parameters and buffers, so loading it requires the model class definition; saving the whole model (pickle) can be easier to load but is less flexible and prone to compatibility problems when the code changes.
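
The `model.train()` vs `model.eval()` answer is easy to demonstrate with a standalone dropout layer. This is a minimal sketch; the input of all ones is chosen only to make the effect visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: random entries are zeroed and survivors are scaled by 1/(1-p).
drop.train()
y_train = drop(x)

# Eval mode: dropout is the identity function.
drop.eval()
y_eval = drop(x)

print("train mode values:", y_train.unique().tolist())
print("eval mode values: ", y_eval.unique().tolist())
print("fraction dropped in train mode:", (y_train == 0).float().mean().item())
```

Forgetting `model.eval()` before evaluation therefore makes predictions noisy, and with batchnorm it also makes them depend on the composition of each batch.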



## 9) Exercises (1h): write your code in the notebook
1. Train the CNN on MNIST for 5-10 epochs (time permitting) and print the loss/accuracy every epoch (already implemented in the notebook).
2. Increase the batch size to 128 and compare loss/accuracy (the experiment function already exists).
3. (Bonus) Switch the optimizer to `Adam(lr=1e-3)` and observe the difference.
4. (CV task) Switch the dataset to **CIFAR10**: change the transform (normalize 3 channels) and adjust the network (input channels=3, plus a few small changes).
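
As a possible starting point for exercise 4, the sketch below shows the two structural changes a CIFAR10 variant needs: 3 input channels, and a larger flatten size because the images are 32x32 instead of 28x28. `SimpleCNNCifar` is a hypothetical name, and BatchNorm and Dropout are omitted here for brevity; treat it as a hint rather than a reference solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCNNCifar(nn.Module):
    """Hypothetical CIFAR10 variant of SimpleCNN: 3 input channels, 32x32 images."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # RGB input
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        # Two 2x2 pools: 32x32 -> 16x16 -> 8x8
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> (batch, 16, 16, 16)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> (batch, 32, 8, 8)
        x = torch.flatten(x, 1)                      # -> (batch, 32*8*8)
        return self.fc2(F.relu(self.fc1(x)))         # logits


# Smoke test on a dummy batch of 2 CIFAR-sized images.
logits = SimpleCNNCifar()(torch.zeros(2, 3, 32, 32))
print(tuple(logits.shape))
```

The transform also needs per-channel statistics, that is, `Normalize` with a 3-tuple for the mean and a 3-tuple for the std.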

When you are done, send me the log files or a screenshot of the training logs for review.
