# Machine Learning Development Tools Tutorial

**Author:** DuhyeonKim + Perplexity (Claude 4)

This notebook demonstrates essential tools and techniques for machine learning development including progress tracking, device management, argument parsing, loss tracking, evaluation, and more.

## Installation Requirements

First, let's install the required packages:

In [None]:
# !pip install torch torchvision tqdm tensorboard matplotlib pillow numpy scikit-learn

## 1. tqdm - Progress Tracking

tqdm provides fast, extensible progress bars for loops and iterables.

There are two ways to use tqdm:

- Create a progress bar (`pbar`) and advance it manually with `pbar.update()` inside the loop.
- Wrap the iterable with `tqdm(...)` so that tqdm tracks the progress automatically.

### Using pbar.update(1)

Here the progress bar is advanced manually, one step per iteration.

In [2]:
import time
from tqdm import tqdm

# Simplified tqdm example using pbar.update()
pbar = tqdm(total=100, desc='Processing')

for _ in range(100):
    time.sleep(0.02)  # Simulate delay
    pbar.update(1)

pbar.close()

Processing: 100%|██████████| 100/100 [00:02<00:00, 41.25it/s]


In [3]:
import time
from tqdm import tqdm

# Simple example applying tqdm with pbar.update() for epochs
epochs = 3
num_batches = 10

for epoch in range(1, epochs + 1):
    pbar = tqdm(total=num_batches, desc=f'Epoch {epoch}')
    for batch_idx in range(num_batches):
        time.sleep(0.5)  # Simulate delay (stands in for the training computation)
        pbar.update(1)
    pbar.close()
    
    print(f"Epoch {epoch} completed. average loss: {0.01 * epoch:.4f}")


Epoch 1: 100%|██████████| 10/10 [00:05<00:00,  1.98it/s]


Epoch 1 completed. average loss: 0.0100


Epoch 2: 100%|██████████| 10/10 [00:05<00:00,  1.98it/s]


Epoch 2 completed. average loss: 0.0200


Epoch 3: 100%|██████████| 10/10 [00:05<00:00,  1.98it/s]

Epoch 3 completed. average loss: 0.0300





### Wrapping an iterable (list, tuple, DataFrame, etc.) with tqdm

In [7]:
import random
import time
from tqdm import tqdm

long_array = [random.randint(1, 1000) for _ in range(1000)]

for item in tqdm(long_array, desc='Processing items'):
    time.sleep(0.01)  # Simulate work on each item

Processing items: 100%|██████████| 1000/1000 [00:12<00:00, 82.41it/s]


## 2. Device Management

Proper device management for GPU/CPU operations.

> ⚠️ **Warning**: The model and every input tensor must be on the same device. If even one tensor is left on a different device, the forward pass raises an error.
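To make the warning concrete, the check can be written out explicitly. This is a minimal sketch, not part of the original notebook: `assert_same_device` is a hypothetical helper name, and everything stays on the CPU so the snippet runs on any machine.

```python
import torch
import torch.nn as nn


def assert_same_device(model: nn.Module, *tensors: torch.Tensor) -> torch.device:
    """Raise a readable error if any tensor is on a different device than the model."""
    model_device = next(model.parameters()).device
    for t in tensors:
        if t.device != model_device:
            raise RuntimeError(
                f"Tensor on {t.device} but model on {model_device}; "
                "call tensor.to(model_device) first."
            )
    return model_device


model = nn.Linear(10, 1)   # stays on the CPU in this sketch
x = torch.randn(5, 10)     # also on the CPU
device = assert_same_device(model, x)
output = model(x)
print(f"Model and input are both on: {device}")
```

Calling such a check right before the forward pass turns a device mismatch into an error message that names both devices.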

In [10]:
import torch
import torch.nn as nn

# Device detection via torch.accelerator (available only in recent PyTorch releases)
# Fallback for older versions:
# device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using device: {device}")

# Create a simple model and move to device
model = nn.Linear(10, 1).to(device)

# Create sample data and move to device
x = torch.randn(5, 10).to(device)
# x = torch.randn(5, 10)  # left on the CPU: uncomment to see the device-mismatch error on a GPU

# Forward pass
output = model(x)

# Print meaningful information
print(f"Input device: {x.device}")
print(f"Model device: {next(model.parameters()).device}")
print(f"Output device: {output.device}")
print(f"Output shape: {output.shape}")

# Print all available devices if cuda or mps
if device == 'cuda':
    device_count = torch.cuda.device_count()
    print("All available cuda devices:")
    for i in range(device_count):
        print(f"  - cuda:{i}")
elif device == 'mps':
    print("All available mps devices:")
    print("  - mps:0")  # plain string; no f-string needed


Using device: mps
Input device: mps:0
Model device: mps:0
Output device: mps:0
Output shape: torch.Size([5, 1])
All available mps devices:
  - mps:0


## 3. argparse - Command Line Arguments

argparse is used for parsing command-line arguments in terminal scripts.

In [11]:
import argparse

def main():
    parser = argparse.ArgumentParser(description='Simple training script')
    parser.add_argument('--model', type=str, required=True, help='Model name')
    parser.add_argument('--mode', type=str, choices=['train', 'test'], required=True, help='Mode: train or test')
    parser.add_argument('--dataset', type=str, required=True, help='Dataset name')
    parser.add_argument('--epoch', type=int, default=10, help='Number of epochs')

    args = parser.parse_args()

    print(f"Model: {args.model}")
    print(f"Mode: {args.mode}")
    print(f"Dataset: {args.dataset}")
    print(f"Epochs: {args.epoch}")

if __name__ == '__main__':
    main()


usage: ipykernel_launcher.py [-h] --model MODEL --mode {train,test}
                             --dataset DATASET [--epoch EPOCH]
ipykernel_launcher.py: error: the following arguments are required: --model, --mode, --dataset


SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
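The `SystemExit: 2` above appears because `parse_args()` reads `sys.argv`, and inside Jupyter that list holds the kernel's launch arguments rather than `--model`, `--mode`, and `--dataset`. One way to try the parser in a notebook is to pass the arguments explicitly as a list. A minimal sketch follows; `build_parser` and the argument values are illustrative and not from the original.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Same arguments as the cell above, wrapped in a reusable function."""
    parser = argparse.ArgumentParser(description='Simple training script')
    parser.add_argument('--model', type=str, required=True, help='Model name')
    parser.add_argument('--mode', type=str, choices=['train', 'test'], required=True, help='Mode: train or test')
    parser.add_argument('--dataset', type=str, required=True, help='Dataset name')
    parser.add_argument('--epoch', type=int, default=10, help='Number of epochs')
    return parser


# Passing a list bypasses sys.argv. The values are placeholders for illustration.
args = build_parser().parse_args(['--model', 'resnet18', '--mode', 'train', '--dataset', 'cifar10'])
print(args.model, args.mode, args.dataset, args.epoch)
```

From a terminal, the original script form works unchanged, since `sys.argv` then contains the arguments typed after the script name.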


## 4. Loss Tracking

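> Training Loss

- `train_loss_step`: the loss of an individual batch, recorded at that step
- `train_loss_epoch`: the weighted average loss over the entire epoch
- `avg_train_loss`: the mean of all step losses

> Validation Loss

- `val_loss`: the loss of an individual validation batch
- `val_loss_epoch`: the loss over the entire validation epoch
- `avg_val_loss`: the mean of the validation losses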
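The difference between the epoch-level quantities is easiest to see with numbers. The sketch below assumes the weighted average uses batch sizes as weights, which the list above does not state explicitly, and the loss values are made up for illustration.

```python
# (batch_loss, batch_size) pairs; the last batch is smaller, as often happens in practice.
step_records = [(0.90, 128), (0.70, 128), (0.50, 64)]

# Per-step losses, one value per batch.
train_loss_steps = [loss for loss, _ in step_records]

# Weighted by batch size, so the smaller last batch counts proportionally less.
total_samples = sum(size for _, size in step_records)
train_loss_epoch = sum(loss * size for loss, size in step_records) / total_samples

# Plain mean over steps, ignoring batch sizes.
avg_train_loss = sum(train_loss_steps) / len(train_loss_steps)

print(f"train_loss_epoch: {train_loss_epoch:.4f}")
print(f"avg_train_loss:   {avg_train_loss:.4f}")
```

The two values coincide only when every batch has the same size; the validation quantities can be computed the same way.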


## 5. Evaluation Stage

**Why Evaluation is Necessary**

1. **Overfitting Detection**
   Through evaluation metrics, we can detect overfitting where models perform well only on training data but poorly on new data.

2. **Model Comparison and Selection**
   We can select the most suitable model for a given problem among various algorithms or hyperparameter configurations.

3. **Accuracy Assurance**
   Evaluation checks whether the model can make reliable predictions in real-world environments. High accuracy on data the model has not seen indicates strong predictive power.

4. **Deployment Decision Support**
   Evaluation results provide crucial information for deciding whether to deploy the model to production environments.

**Evaluation Methodologies**

**Hold-out Validation**
The simplest method: the dataset is split into a training set and a test set. The model is fit on the training set and evaluated on the test set.
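A hold-out split can be sketched in a few lines of plain Python. The helper name `holdout_split`, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration.

```python
import random


def holdout_split(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train_set, test_set = holdout_split(data, test_ratio=0.2)
print(len(train_set), len(test_set))
```

Shuffling before the split matters when the data is ordered, for example grouped by class.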

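**Cross-validation**
A more robust method: the data is split into several folds, and each fold serves once as the validation set while the model is trained on the remaining folds. Averaging the results over all folds reduces the variation in measured performance caused by any single split.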
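The fold bookkeeping behind cross-validation can be sketched as an index generator. `kfold_indices` is a hypothetical helper written for this example; it uses contiguous folds without shuffling, so in practice the data would usually be shuffled first (scikit-learn's `KFold` offers this).

```python
def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(kfold_indices(10, k=3))
for fold, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {fold}: train size={len(train_idx)} val={val_idx}")
```

Each sample appears in exactly one validation fold, so every data point is used for both training and validation across the k runs.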


## 6. Computational Efficiency

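Tools for measuring and optimizing computational efficiency. In the training loop below, compute efficiency is the fraction of each step's time spent on computation (forward pass, backward pass, optimizer step) rather than on preparing the batch.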
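The measurement itself does not depend on PyTorch. The sketch below isolates it: the time to obtain a batch is separated from the time to process it, and the efficiency is the processing share of the total. `timed_step` and the sleep-based stand-ins are assumptions for illustration.

```python
import time


def timed_step(load_fn, compute_fn):
    """Return (prepare_time, process_time, efficiency) for one training step."""
    t0 = time.perf_counter()
    batch = load_fn()        # data loading and transfer
    t1 = time.perf_counter()
    compute_fn(batch)        # forward, backward, optimizer step
    t2 = time.perf_counter()
    prepare_time = t1 - t0
    process_time = t2 - t1
    return prepare_time, process_time, process_time / (prepare_time + process_time)


# Stand-ins for a real DataLoader and model; the sleep durations are arbitrary.
prepare, process, efficiency = timed_step(
    load_fn=lambda: time.sleep(0.01),
    compute_fn=lambda batch: time.sleep(0.03),
)
print(f"prepare={prepare:.3f}s process={process:.3f}s efficiency={efficiency:.2f}")
```

A value close to 1 suggests the accelerator is kept busy, while a low value points at data loading as the bottleneck.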

In [None]:
import os
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"        # device="mps"
print("using device:", device)

class CustomDataset(Dataset):
    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        
        self.samples = []
        self.class_to_idx = {}

        # Filter out hidden files/folders starting with ._ or .
        classes = sorted([cls for cls in os.listdir(root) 
                         if not cls.startswith('.') and os.path.isdir(os.path.join(root, cls))])
        self.class_to_idx = {cls_name: idx for idx, cls_name in enumerate(classes)}

        for cls_name in classes:
            cls_folder = os.path.join(root, cls_name)
            for fname in os.listdir(cls_folder):
                # Skip files starting with ._
                if fname.startswith('.'):
                    continue
                    
                if fname.lower().endswith(('.png', '.jpg', '.jpeg')):
                    path = os.path.join(cls_folder, fname)
                    self.samples.append((path, self.class_to_idx[cls_name]))
    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # print(self.class_to_idx)
        path, label = self.samples[idx]
        img = Image.open(path).convert('RGB')  # Ensure 3 channels even for grayscale or RGBA files
        if self.transform:
            img = self.transform(img)
        return img, label

class CustomModel(nn.Module):
    def __init__(self, num_classes=10):  # MODIFIED: Made number of classes as parameter
        super(CustomModel, self).__init__()
        
        # MODIFIED: Improved with deeper network architecture
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)  # MODIFIED: Added Batch Normalization
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)  # MODIFIED: Added Batch Normalization
        
        # MODIFIED: Additional convolutional layer
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        
        self.pool = nn.MaxPool2d(2, 2)
        
        # MODIFIED: Adjusted dropout rate and applied to multiple locations
        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)
        
        # MODIFIED: Increased fully connected layer size
        self.fc1 = nn.Linear(128 * 4 * 4, 512)  # 32x32 -> 16x16 -> 8x8 -> 4x4
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        # MODIFIED: Improved forward pass with Batch Normalization
        x = self.pool(F.relu(self.bn1(self.conv1(x))))  # 32x32 -> 16x16
        x = self.pool(F.relu(self.bn2(self.conv2(x))))  # 16x16 -> 8x8
        x = self.pool(F.relu(self.bn3(self.conv3(x))))  # 8x8 -> 4x4
        
        x = self.dropout1(x)
        x = x.view(-1, 128 * 4 * 4)
        
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

class CustomModel_simple(nn.Module):
    def __init__(self, num_classes=10):
        super(CustomModel_simple, self).__init__()
        self.fc = nn.Linear(3 * 32 * 32, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

# ----------------- dataset and dataloader -----------------
train_root = '../day2/cifar10_images/train'
test_root = '../day2/cifar10_images/test'

# external disk
# train_root = '/Volumes/T7 Shield/cifar10_images/train'
# test_root = '/Volumes/T7 Shield/cifar10_images/test'

# MODIFIED: Enhanced data augmentation techniques and added normalization
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),  # MODIFIED: Added random crop
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # Commonly used CIFAR-10 per-channel mean/std (not the ImageNet statistics)
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # MODIFIED: Added normalization
])

train_dataset = CustomDataset(train_root, transform=train_transform)
test_dataset = CustomDataset(test_root, transform=test_transform)

train_loader = DataLoader(train_dataset, batch_size=512, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1024, shuffle=False)  # No shuffling needed for evaluation

# ----------------- model and training setup -----------------
# MODIFIED: Calculate number of classes
num_classes = len(train_dataset.class_to_idx)
model = CustomModel(num_classes=num_classes).to(device)
# model = CustomModel_simple(num_classes=num_classes).to(device)

# ----------------- training loop -----------------
total_epochs = 2
criterion = nn.CrossEntropyLoss().to(device)

# Optimizer changed from SGD to Adam for better convergence
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

train_losses = []  # MODIFIED: List to store training losses
compute_efficiencies = []

try:
    for epoch in range(total_epochs):
        model.train()
        # MODIFIED: added running loss for each epoch
        running_loss = 0.0
        pbar = tqdm(total=len(train_loader))
        start_time = time.time()

        for i, (images, labels) in enumerate(train_loader):             # enumerate returns both index(i) and value( (images, labels) )
            optimizer.zero_grad()
            
            images = images.to(device)
            labels = labels.to(device)

            prepare_time = time.time() - start_time
            compute_start = time.time()

            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # loss.item() copies the value to the host, which waits for queued device work,
            # so read it once here, before the compute timer stops
            loss_value = loss.item()

            process_time = time.time() - compute_start
            total_time = prepare_time + process_time
            compute_efficiency = process_time / total_time
            compute_efficiencies.append(compute_efficiency)

            running_loss += loss_value

            pbar.set_description(
                f"Epoch {epoch+1}/{total_epochs} | "
                f"Compute Eff: {compute_efficiency:.3f} | "
                f"Loss: {loss_value:.4f} | "
                f"Prepare Time: {prepare_time:.3f}s | "
                f"Process Time: {process_time:.3f}s"
            )
            pbar.update(1)  # Manual step update
            start_time = time.time()


        pbar.close()  # Close progress bar after epoch

        avg_compute_efficiency = sum(compute_efficiencies[-len(train_loader):]) / len(train_loader)
        avg_loss = running_loss / len(train_loader)
        train_losses.append(avg_loss)

        print(f"Epoch [{epoch + 1}/{total_epochs}]  Train Loss: {avg_loss:.4f} | "
            f"Avg Compute Efficiency: {avg_compute_efficiency:.3f}")
except KeyboardInterrupt:
    print("Training interrupted by user.")
    pbar.close()


## 7. TensorBoard Integration

Logging and visualizing training progress with TensorBoard.

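> The logging helper `Logger` lives in an external file, `util.py`, and is imported from there.

The parameter and gradient histograms logged at the end of the training loop can help diagnose vanishing gradients.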
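The contents of `util.py` are not shown in this notebook. The sketch below is a guess at what such a `Logger` might look like, assuming it is a thin wrapper around `torch.utils.tensorboard.SummaryWriter`. The method names match the calls in the next cell, but the bodies and the `close` method are assumptions.

```python
import torch
from torch.utils.tensorboard import SummaryWriter


class Logger:
    """Thin wrapper around SummaryWriter (an assumed stand-in for util.Logger)."""

    def __init__(self, log_dir: str = './logs'):
        self.writer = SummaryWriter(log_dir=log_dir)

    def scalar_summary(self, tag: str, value: float, step: int) -> None:
        self.writer.add_scalar(tag, value, step)

    def images_summary(self, tag: str, images: torch.Tensor, step: int) -> None:
        # Expects a float tensor of shape (N, C, H, W) with values in [0, 1].
        self.writer.add_images(tag, images.detach().cpu(), step)

    def histo_summary(self, tag: str, values: torch.Tensor, step: int) -> None:
        self.writer.add_histogram(tag, values.detach().cpu(), step)

    def close(self) -> None:
        self.writer.close()
```

With a logger like this, the results can be viewed by running `tensorboard --logdir ./logs` in a terminal and opening the printed URL.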

In [16]:
import torch
import torchvision.transforms as transforms
from torch.utils.data import Dataset, DataLoader
import os
from PIL import Image
import torch.nn as nn
import torch.nn.functional as F
from tqdm import tqdm

from util import Logger

class CustomDataset(Dataset):
    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        
        self.samples = []
        self.class_to_idx = {}

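        # Skip hidden entries (e.g. .DS_Store) and keep only class directories
        classes = sorted([cls for cls in os.listdir(root)
                          if not cls.startswith('.') and os.path.isdir(os.path.join(root, cls))])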
        self.class_to_idx = {cls_name: idx for idx, cls_name in enumerate(classes)}

        for cls_name in classes:
            cls_folder = os.path.join(root, cls_name)
            for fname in os.listdir(cls_folder):
                if fname.lower().endswith(('.png', '.jpg', '.jpeg')):
                    path = os.path.join(cls_folder, fname)
                    self.samples.append((path, self.class_to_idx[cls_name]))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        # print(self.class_to_idx)
        path, label = self.samples[idx]
        img = Image.open(path).convert('RGB')  # Ensure 3 channels even for grayscale or RGBA files
        if self.transform:
            img = self.transform(img)
        return img, label

class CustomModel(nn.Module):
    def __init__(self, num_classes=10):  # MODIFIED: Made number of classes as parameter
        super(CustomModel, self).__init__()
        
        # MODIFIED: Improved with deeper network architecture
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)  # MODIFIED: Added Batch Normalization
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)  # MODIFIED: Added Batch Normalization
        
        # MODIFIED: Additional convolutional layer
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        
        self.pool = nn.MaxPool2d(2, 2)
        
        # MODIFIED: Adjusted dropout rate and applied to multiple locations
        self.dropout1 = nn.Dropout(0.3)
        self.dropout2 = nn.Dropout(0.3)
        
        # MODIFIED: Increased fully connected layer size
        self.fc1 = nn.Linear(128 * 4 * 4, 512)  # 32x32 -> 16x16 -> 8x8 -> 4x4
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)

    def forward(self, x):
        # MODIFIED: Improved forward pass with Batch Normalization
        x = self.pool(F.relu(self.bn1(self.conv1(x))))  # 32x32 -> 16x16
        x = self.pool(F.relu(self.bn2(self.conv2(x))))  # 16x16 -> 8x8
        x = self.pool(F.relu(self.bn3(self.conv3(x))))  # 8x8 -> 4x4
        
        x = self.dropout1(x)
        x = x.view(-1, 128 * 4 * 4)
        
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# ----------------- dataset and dataloader -----------------
train_root = '../day2/cifar10_images/train'
test_root = '../day2/cifar10_images/test'

# MODIFIED: Enhanced data augmentation techniques and added normalization
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(32, padding=4),  # MODIFIED: Added random crop
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # Commonly used CIFAR-10 per-channel mean/std (not the ImageNet statistics)
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))  # MODIFIED: Added normalization
])

train_dataset = CustomDataset(train_root, transform=train_transform)
test_dataset = CustomDataset(test_root, transform=test_transform)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1024, shuffle=False)  # No shuffling needed for evaluation

# ----------------- model and training setup -----------------
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"        # device="mps"

# MODIFIED: Calculate number of classes
num_classes = len(train_dataset.class_to_idx)
model = CustomModel(num_classes=num_classes).to(device)

# ----------------- training loop -----------------
total_epochs = 50
print("using device:", device)
criterion = nn.CrossEntropyLoss().to(device)

# Optimizer changed from SGD to Adam for better convergence
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

train_losses = []  # MODIFIED: List to store training losses

# logger
logger = Logger(log_dir='./logs')
global_step = 0

for epoch in range(total_epochs):
    model.train()
    # MODIFIED: added running loss for each epoch
    running_loss = 0.0
    
    pbar = tqdm(enumerate(train_loader), total=len(train_loader))

    for i, (images, labels) in pbar:             # enumerate returns both index(i) and value( (images, labels) )
        optimizer.zero_grad()
        
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # logger logging
        logger.scalar_summary('Train/Loss_Step', loss.item(), global_step)

        # Log one batch of de-normalized sample images, once, at the very first step
        if epoch == 0 and i == 0:
            mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1).to(device)
            std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1).to(device)
            sample_images = images[:8].clone()
            sample_images = sample_images * std + mean
            sample_images = torch.clamp(sample_images, 0, 1)
            logger.images_summary('Train/Sample_Images', sample_images, global_step)

        pbar.set_description(f"Epoch {epoch+1}/{total_epochs} | Loss: {loss.item():.4f}")

        global_step += 1

    avg_loss = running_loss / len(train_loader)
    train_losses.append(avg_loss)

    # logger logging
    logger.scalar_summary('Train/Loss_Epoch', avg_loss, epoch)
    logger.scalar_summary('Train/Learning_Rate', optimizer.param_groups[0]['lr'], epoch)

    # Log parameter and gradient histograms every 5 epochs
    if epoch % 5 == 0:
        for name, param in model.named_parameters():
            if param.grad is not None:
                logger.histo_summary(f'Parameters/{name}', param.data, epoch)
                logger.histo_summary(f'Gradients/{name}', param.grad.data, epoch)

    print(f"Epoch [{epoch + 1}/{total_epochs}]  Train Loss: {avg_loss:.4f}")


using device: mps


Epoch 1/50 | Loss: 1.1249: 100%|██████████| 391/391 [00:23<00:00, 16.81it/s]


Epoch [1/50]  Train Loss: 1.5181


Epoch 2/50 | Loss: 1.1498: 100%|██████████| 391/391 [00:22<00:00, 17.46it/s]


Epoch [2/50]  Train Loss: 1.1771


Epoch 3/50 | Loss: 0.8370: 100%|██████████| 391/391 [00:23<00:00, 16.76it/s]


Epoch [3/50]  Train Loss: 1.0427


Epoch 4/50 | Loss: 0.7313: 100%|██████████| 391/391 [00:22<00:00, 17.73it/s]


Epoch [4/50]  Train Loss: 0.9593


Epoch 5/50 | Loss: 1.0640: 100%|██████████| 391/391 [00:22<00:00, 17.40it/s]


Epoch [5/50]  Train Loss: 0.9107


Epoch 6/50 | Loss: 1.1872: 100%|██████████| 391/391 [00:23<00:00, 16.31it/s]


Epoch [6/50]  Train Loss: 0.8704


Epoch 7/50 | Loss: 1.0390: 100%|██████████| 391/391 [00:24<00:00, 16.02it/s]


Epoch [7/50]  Train Loss: 0.8302


Epoch 8/50 | Loss: 0.9002: 100%|██████████| 391/391 [00:25<00:00, 15.12it/s]


Epoch [8/50]  Train Loss: 0.7984


Epoch 9/50 | Loss: 0.7237: 100%|██████████| 391/391 [00:20<00:00, 19.36it/s]


Epoch [9/50]  Train Loss: 0.7806


Epoch 10/50 | Loss: 0.6132: 100%|██████████| 391/391 [00:17<00:00, 22.95it/s]


Epoch [10/50]  Train Loss: 0.7492


Epoch 11/50 | Loss: 0.8217: 100%|██████████| 391/391 [00:17<00:00, 21.88it/s]


Epoch [11/50]  Train Loss: 0.7341


Epoch 12/50 | Loss: 0.6123: 100%|██████████| 391/391 [00:18<00:00, 20.62it/s]


Epoch [12/50]  Train Loss: 0.7113


Epoch 13/50 | Loss: 0.6913: 100%|██████████| 391/391 [00:19<00:00, 20.28it/s]


Epoch [13/50]  Train Loss: 0.7012


Epoch 14/50 | Loss: 0.6913: 100%|██████████| 391/391 [00:19<00:00, 19.56it/s]


Epoch [14/50]  Train Loss: 0.6800


Epoch 15/50 | Loss: 0.5594: 100%|██████████| 391/391 [00:21<00:00, 18.54it/s]


Epoch [15/50]  Train Loss: 0.6718


Epoch 16/50 | Loss: 0.6538: 100%|██████████| 391/391 [00:21<00:00, 18.55it/s]


Epoch [16/50]  Train Loss: 0.6574


Epoch 17/50 | Loss: 0.6174: 100%|██████████| 391/391 [00:21<00:00, 17.98it/s]


Epoch [17/50]  Train Loss: 0.6420


Epoch 18/50 | Loss: 0.5780: 100%|██████████| 391/391 [00:22<00:00, 17.40it/s]


Epoch [18/50]  Train Loss: 0.6305


Epoch 19/50 | Loss: 0.6982: 100%|██████████| 391/391 [00:22<00:00, 17.52it/s]


Epoch [19/50]  Train Loss: 0.6207


Epoch 20/50 | Loss: 0.6058: 100%|██████████| 391/391 [00:22<00:00, 17.31it/s]


Epoch [20/50]  Train Loss: 0.6143


Epoch 21/50 | Loss: 0.8273: 100%|██████████| 391/391 [00:20<00:00, 19.45it/s]


Epoch [21/50]  Train Loss: 0.5987


Epoch 22/50 | Loss: 0.6465: 100%|██████████| 391/391 [00:20<00:00, 18.70it/s]


Epoch [22/50]  Train Loss: 0.5917


Epoch 23/50 | Loss: 0.5781: 100%|██████████| 391/391 [00:26<00:00, 14.99it/s]


Epoch [23/50]  Train Loss: 0.5840


Epoch 24/50 | Loss: 0.7967: 100%|██████████| 391/391 [00:22<00:00, 17.54it/s]


Epoch [24/50]  Train Loss: 0.5781


Epoch 25/50 | Loss: 0.4022: 100%|██████████| 391/391 [00:23<00:00, 17.00it/s]


Epoch [25/50]  Train Loss: 0.5706


Epoch 26/50 | Loss: 1.0024: 100%|██████████| 391/391 [00:22<00:00, 17.13it/s]


Epoch [26/50]  Train Loss: 0.5643


Epoch 27/50 | Loss: 0.6077: 100%|██████████| 391/391 [00:23<00:00, 16.83it/s]


Epoch [27/50]  Train Loss: 0.5586


Epoch 28/50 | Loss: 0.5017: 100%|██████████| 391/391 [00:23<00:00, 16.82it/s]


Epoch [28/50]  Train Loss: 0.5486


Epoch 29/50 | Loss: 0.5423: 100%|██████████| 391/391 [00:25<00:00, 15.38it/s]


Epoch [29/50]  Train Loss: 0.5488


Epoch 30/50 | Loss: 0.7128: 100%|██████████| 391/391 [00:27<00:00, 14.33it/s]


Epoch [30/50]  Train Loss: 0.5415


Epoch 31/50 | Loss: 0.4670:  48%|████▊     | 187/391 [00:13<00:14, 13.92it/s]


KeyboardInterrupt: 

## 8. Additional Features

Network initialization, schedulers, checkpointing, and GIF creation.

### Network Initialization

Xavier (Glorot) uniform initialization samples each weight from $U(-a, a)$, where the bound $a$ is

$$
a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}
$$
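As a worked instance of the formula, the bound can be computed by hand for a small weight matrix. The helper name `xavier_uniform_bound` is an assumption for this example. For a 2-D weight of shape (3, 5), PyTorch treats the 5 columns as `fan_in` and the 3 rows as `fan_out`.

```python
import math


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Bound a of the U(-a, a) distribution used by Xavier uniform initialization."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# Weight of shape (3, 5): fan_in = 5, fan_out = 3.
a = xavier_uniform_bound(fan_in=5, fan_out=3, gain=1.0)
print(f"a = {a:.4f}")  # a = 0.8660
```

Every value produced by `nn.init.xavier_uniform_` with `gain=1.0` on a tensor of that shape should therefore fall within roughly ±0.866.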


In [None]:
import torch
import torch.nn as nn

w = torch.empty(3, 5)

print("Xavier(Glorot) Normal Initialization:")
nn.init.xavier_normal_(w, gain=0.2)
print(w)

print("Xavier(Glorot) Uniform Initialization:")
nn.init.xavier_uniform_(w, gain=1.0)
print(w)

print("\nKaiming(He) Normal Initialization (fan_in, relu):")
nn.init.kaiming_normal_(w, a=0, mode='fan_in', nonlinearity='relu')
print(w)

print("\nKaiming(He) Uniform Initialization (fan_in, relu):")
nn.init.kaiming_uniform_(w, a=0, mode='fan_in', nonlinearity='relu')
print(w)

print("\nOrthogonal Initialization:")
nn.init.orthogonal_(w, gain=0.2)
print(w)

In [None]:
import math

import torch
import torch.nn as nn
import torch.nn.functional as F  # used by ExampleNet.forward
import torch.nn.init as init

class NetworkInitializer:
    """Utility class for different network initialization strategies"""
    
    @staticmethod
    def xavier_normal(module):
        """Xavier/Glorot normal initialization"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.xavier_normal_(module.weight)
            if module.bias is not None:
                init.constant_(module.bias, 0)
    
    @staticmethod
    def xavier_uniform(module):
        """Xavier/Glorot uniform initialization"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.xavier_uniform_(module.weight)
            if module.bias is not None:
                init.constant_(module.bias, 0)
    
    @staticmethod
    def kaiming_normal(module):
        """Kaiming/He normal initialization"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
            if module.bias is not None:
                init.constant_(module.bias, 0)
    
    @staticmethod
    def kaiming_uniform(module):
        """Kaiming/He uniform initialization"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.kaiming_uniform_(module.weight, mode='fan_out', nonlinearity='relu')
            if module.bias is not None:
                init.constant_(module.bias, 0)
    
    @staticmethod
    def normal_init(module, mean=0.0, std=0.01):
        """Normal initialization with specified mean and std"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.normal_(module.weight, mean, std)
            if module.bias is not None:
                init.constant_(module.bias, 0)
    
    @staticmethod
    def orthogonal_init(module):
        """Orthogonal initialization"""
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            init.orthogonal_(module.weight)
            if module.bias is not None:
                init.constant_(module.bias, 0)

def initialize_model(model, init_type='kaiming_normal'):
    """Initialize model with specified initialization type"""
    init_functions = {
        'xavier_normal': NetworkInitializer.xavier_normal,
        'xavier_uniform': NetworkInitializer.xavier_uniform,
        'kaiming_normal': NetworkInitializer.kaiming_normal,
        'kaiming_uniform': NetworkInitializer.kaiming_uniform,
        'normal': NetworkInitializer.normal_init,
        'orthogonal': NetworkInitializer.orthogonal_init
    }
    
    if init_type not in init_functions:
        raise ValueError(f"Unknown initialization type: {init_type}")
    
    model.apply(init_functions[init_type])
    print(f"Model initialized with {init_type}")
    return model

# Example usage
class ExampleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Test different initializations
for init_type in ['xavier_normal', 'kaiming_normal', 'orthogonal']:
    model = ExampleNet()
    model = initialize_model(model, init_type)
    
    # Check weight statistics
    conv1_weights = model.conv1.weight.data
    fc1_weights = model.fc1.weight.data
    
    print(f"  Conv1 weights - Mean: {conv1_weights.mean():.6f}, Std: {conv1_weights.std():.6f}")
    print(f"  FC1 weights - Mean: {fc1_weights.mean():.6f}, Std: {fc1_weights.std():.6f}\n")

### Learning Rate Schedulers

In [None]:
import math  # used by WarmupCosineScheduler

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

def demonstrate_schedulers():
    """Demonstrate different learning rate schedulers"""
    
    # Create a dummy model and optimizer
    model = nn.Linear(10, 1)
    initial_lr = 0.1
    epochs = 50
    
    # Each entry maps a scheduler name to (scheduler class, constructor kwargs)
    schedulers_config = {
        'StepLR': (lr_scheduler.StepLR, {'step_size': 10, 'gamma': 0.5}),
        'ExponentialLR': (lr_scheduler.ExponentialLR, {'gamma': 0.95}),
        'CosineAnnealingLR': (lr_scheduler.CosineAnnealingLR, {'T_max': epochs}),
        'ReduceLROnPlateau': (lr_scheduler.ReduceLROnPlateau,
                              {'mode': 'min', 'factor': 0.5, 'patience': 5}),
    }
    
    results = {}
    
    for name, (scheduler_class, scheduler_kwargs) in schedulers_config.items():
        # Create a fresh optimizer for each scheduler so they do not share state
        optimizer = optim.SGD(model.parameters(), lr=initial_lr)
        scheduler = scheduler_class(optimizer, **scheduler_kwargs)
        
        learning_rates = []
        
        for epoch in range(epochs):
            # Record current learning rate
            current_lr = optimizer.param_groups[0]['lr']
            learning_rates.append(current_lr)
            
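            # Simulate a training step. No gradients exist here, so optimizer.step()
            # changes nothing; it is called so the scheduler steps after the optimizer.
            optimizer.zero_grad()
            optimizer.step()
            # Dummy loss that decreases with a small oscillation
            loss = torch.tensor(1.0 - epoch * 0.01 + 0.1 * np.sin(epoch * 0.3))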
            
            # Step scheduler
            if name == 'ReduceLROnPlateau':
                scheduler.step(loss)
            else:
                scheduler.step()
        
        results[name] = learning_rates
    
    # Plot results
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    axes = axes.flatten()
    
    for i, (name, lrs) in enumerate(results.items()):
        axes[i].plot(range(epochs), lrs, linewidth=2, marker='o', markersize=3)
        axes[i].set_title(f'{name}')
        axes[i].set_xlabel('Epoch')
        axes[i].set_ylabel('Learning Rate')
        axes[i].grid(True, alpha=0.3)
        axes[i].set_yscale('log')
    
    plt.tight_layout()
    plt.show()
    
    return results

# Custom scheduler example
class WarmupCosineScheduler:
    """Custom scheduler with warmup and cosine annealing"""
    
    def __init__(self, optimizer, warmup_epochs, total_epochs, min_lr=1e-6):
        self.optimizer = optimizer
        self.warmup_epochs = warmup_epochs
        self.total_epochs = total_epochs
        self.min_lr = min_lr
        self.base_lr = optimizer.param_groups[0]['lr']
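        self.current_epoch = 0
        # Apply the schedule immediately so epoch 0 starts at the first warmup value
        # instead of the full base learning rate
        self._apply_lr()
    
    def _compute_lr(self):
        if self.current_epoch < self.warmup_epochs:
            # Warmup phase: linear ramp up to base_lr
            return self.base_lr * (self.current_epoch + 1) / self.warmup_epochs
        # Cosine annealing phase
        progress = (self.current_epoch - self.warmup_epochs) / (self.total_epochs - self.warmup_epochs)
        return self.min_lr + (self.base_lr - self.min_lr) * 0.5 * (1 + math.cos(math.pi * progress))
    
    def _apply_lr(self):
        lr = self._compute_lr()
        for param_group in self.optimizer.param_groups:
            param_group['lr'] = lr
    
    def step(self):
        # Advance to the next epoch and set its learning rate
        self.current_epoch += 1
        self._apply_lr()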
    
    def get_lr(self):
        return [param_group['lr'] for param_group in self.optimizer.param_groups]

# Test custom scheduler
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
custom_scheduler = WarmupCosineScheduler(optimizer, warmup_epochs=10, total_epochs=50)

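custom_lrs = []
for epoch in range(50):
    # Apply the schedule first so epoch 0 is logged with its warmup rate
    # instead of the optimizer's untouched base rate
    custom_scheduler.step()
    custom_lrs.append(optimizer.param_groups[0]['lr'])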

# Plot custom scheduler
plt.figure(figsize=(10, 6))
plt.plot(range(50), custom_lrs, linewidth=2, marker='o', markersize=3)
plt.title('Custom Warmup + Cosine Annealing Scheduler')
plt.xlabel('Epoch')
plt.ylabel('Learning Rate')
plt.grid(True, alpha=0.3)
plt.axvline(x=10, color='red', linestyle='--', alpha=0.7, label='End of Warmup')
plt.legend()
plt.show()

# Demonstrate all schedulers
print("Learning Rate Scheduler Comparison:")
scheduler_results = demonstrate_schedulers()
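
The warmup-plus-cosine rule used by `WarmupCosineScheduler` can also be checked by hand, without PyTorch. The sketch below restates the same formula as a plain function; `warmup_cosine_lr` is a hypothetical helper name, and its defaults are taken from the test values used above (base rate 0.1, 10 warmup epochs, 50 total epochs).

```python
import math

def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=10, total_epochs=50, min_lr=1e-6):
    """Learning rate for a zero-based epoch index (hypothetical helper)."""
    if epoch < warmup_epochs:
        # Linear warmup: rises from base_lr / warmup_epochs to base_lr
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing: falls from base_lr toward min_lr
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (base_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

# Print the rate at a few landmark epochs
for e in (0, 9, 10, 30, 49):
    print(f"epoch {e:2d}: lr = {warmup_cosine_lr(e):.6f}")
```

With these settings the rate reaches `base_lr` at epoch 9, stays there at epoch 10 (cosine progress 0), and is roughly halved at epoch 30 (progress 0.5). Because the last epoch index is 49, progress stops at 39/40 and the rate ends slightly above `min_lr`.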

### Model Checkpointing

In [None]:
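# Bundle everything needed to resume training into one dictionary
checkpoint = {
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch,
    # other information
}
torch.save(checkpoint, 'checkpoint.pth')

# Read the file back from disk, then restore both states
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])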

In [None]:
import glob
import os
from datetime import datetime

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

class ModelCheckpoint:
    """Comprehensive model checkpointing utility"""
    
    def __init__(self, save_dir='checkpoints', save_best=True, save_last=True, 
                 monitor='val_loss', mode='min', save_top_k=3):
        self.save_dir = save_dir
        self.save_best = save_best
        self.save_last = save_last
        self.monitor = monitor
        self.mode = mode
        self.save_top_k = save_top_k
        
        # Create save directory
        os.makedirs(save_dir, exist_ok=True)
        
        # Track best metrics
        self.best_metric = float('inf') if mode == 'min' else float('-inf')
        self.saved_checkpoints = []
    
    def save_checkpoint(self, model, optimizer, scheduler, epoch, metrics, 
                       additional_state=None, filename=None):
        """Save model checkpoint"""
        
        if filename is None:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"checkpoint_epoch_{epoch:03d}_{timestamp}.pth"
        
        filepath = os.path.join(self.save_dir, filename)
        
        # Prepare checkpoint data
        checkpoint = {
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict() if scheduler else None,
            'metrics': metrics,
            'timestamp': datetime.now().isoformat()
        }
        
        if additional_state:
            checkpoint['additional_state'] = additional_state
        
        # Save checkpoint
        torch.save(checkpoint, filepath)
        
        # Check if this is the best model
        current_metric = metrics.get(self.monitor)
        is_best = False
        
        if current_metric is not None:
            if ((self.mode == 'min' and current_metric < self.best_metric) or
                (self.mode == 'max' and current_metric > self.best_metric)):
                self.best_metric = current_metric
                is_best = True
                
                if self.save_best:
                    best_filepath = os.path.join(self.save_dir, 'best_model.pth')
                    torch.save(checkpoint, best_filepath)
                    print(f"New best model saved: {self.monitor} = {current_metric:.6f}")
        
        # Save last model
        if self.save_last:
            last_filepath = os.path.join(self.save_dir, 'last_model.pth')
            torch.save(checkpoint, last_filepath)
        
        # Manage top-k checkpoints
        if current_metric is not None:
            self.saved_checkpoints.append({
                'filepath': filepath,
                'metric': current_metric,
                'epoch': epoch
            })
            
            # Sort by metric and keep only top-k
            reverse = (self.mode == 'max')
            self.saved_checkpoints.sort(key=lambda x: x['metric'], reverse=reverse)
            
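            if len(self.saved_checkpoints) > self.save_top_k:
                # Remove worst checkpoint file. This can be the file written
                # just now; best_model.pth and last_model.pth are separate
                # copies and are never removed here.
                worst_checkpoint = self.saved_checkpoints.pop()
                if os.path.exists(worst_checkpoint['filepath']):
                    os.remove(worst_checkpoint['filepath'])
                    print(f"Removed checkpoint: {worst_checkpoint['filepath']}")
        
        # Only report the file if it survived the top-k pruning above
        if os.path.exists(filepath):
            print(f"Checkpoint saved: {filepath}")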
        return filepath, is_best
    
    def load_checkpoint(self, filepath, model, optimizer=None, scheduler=None):
        """Load model checkpoint"""
        if not os.path.exists(filepath):
            raise FileNotFoundError(f"Checkpoint file not found: {filepath}")
        
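        # map_location='cpu' lets a checkpoint saved on a GPU load on any machine;
        # load_state_dict then copies the values onto the model's own device
        checkpoint = torch.load(filepath, map_location='cpu')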
        
        # Load model state
        model.load_state_dict(checkpoint['model_state_dict'])
        
        # Load optimizer state
        if optimizer is not None and 'optimizer_state_dict' in checkpoint:
            optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        
        # Load scheduler state (stored as None when no scheduler was saved)
        if scheduler is not None and checkpoint.get('scheduler_state_dict') is not None:
            scheduler.load_state_dict(checkpoint['scheduler_state_dict'])
        
        print(f"Checkpoint loaded: {filepath}")
        print(f"Epoch: {checkpoint['epoch']}, Metrics: {checkpoint['metrics']}")
        
        return checkpoint
    
    def get_best_checkpoint_path(self):
        """Get path to best checkpoint"""
        best_path = os.path.join(self.save_dir, 'best_model.pth')
        return best_path if os.path.exists(best_path) else None
    
    def get_last_checkpoint_path(self):
        """Get path to last checkpoint"""
        last_path = os.path.join(self.save_dir, 'last_model.pth')
        return last_path if os.path.exists(last_path) else None
    
    def list_checkpoints(self):
        """List all checkpoints in save directory"""
        pattern = os.path.join(self.save_dir, '*.pth')
        checkpoints = glob.glob(pattern)
        return sorted(checkpoints)

# Example usage
def demo_checkpointing():
    """Demonstrate checkpointing functionality"""
    
    # Create a simple model
    model = nn.Sequential(
        nn.Linear(10, 50),
        nn.ReLU(),
        nn.Linear(50, 1)
    )
    
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
    
    # Initialize checkpoint manager
    checkpoint_manager = ModelCheckpoint(
        save_dir='demo_checkpoints',
        monitor='val_loss',
        mode='min',
        save_top_k=3
    )
    
    # Simulate training with checkpointing
    for epoch in range(10):
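        # Simulate training. Cast to a plain float so the saved metrics are not
        # NumPy scalars, which torch.load can reject when weights_only=True
        # (the default in recent PyTorch releases).
        train_loss = float(np.exp(-epoch / 5) + 0.1 * np.random.random())
        val_loss = train_loss + 0.1 + 0.1 * np.random.random()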
        
        metrics = {
            'train_loss': train_loss,
            'val_loss': val_loss,
            'epoch': epoch
        }
        
        # Save checkpoint
        additional_state = {'random_seed': 42, 'epoch_data': f'data_for_epoch_{epoch}'}
        checkpoint_manager.save_checkpoint(
            model, optimizer, scheduler, epoch, metrics, additional_state
        )
        
        scheduler.step()
    
    # List all checkpoints
    print("\nAll checkpoints:")
    for cp in checkpoint_manager.list_checkpoints():
        print(f"  {cp}")
    
    # Test loading best checkpoint
    best_path = checkpoint_manager.get_best_checkpoint_path()
    if best_path:
        print(f"\nLoading best checkpoint: {best_path}")
        
        # Create new model instance
        new_model = nn.Sequential(
            nn.Linear(10, 50),
            nn.ReLU(),
            nn.Linear(50, 1)
        )
        new_optimizer = optim.Adam(new_model.parameters(), lr=0.01)
        new_scheduler = optim.lr_scheduler.StepLR(new_optimizer, step_size=5, gamma=0.5)
        
        loaded_checkpoint = checkpoint_manager.load_checkpoint(
            best_path, new_model, new_optimizer, new_scheduler
        )
        
        print(f"Additional state: {loaded_checkpoint.get('additional_state')}")

# Run the demo
demo_checkpointing()
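
The top-k bookkeeping inside `save_checkpoint` is easier to follow when separated from file I/O. The sketch below isolates that ranking step as a pure function. `keep_top_k` and the file names are hypothetical, and the metric values are made-up illustration data, not results from a run.

```python
def keep_top_k(entries, k, mode="min"):
    """Split (path, metric) pairs into the k best and the rest (hypothetical helper)."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    # Best first: ascending for 'min', descending for 'max'
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=(mode == "max"))
    return ranked[:k], ranked[k:]

# Illustrative (path, val_loss) pairs
history = [("epoch_0.pth", 0.90), ("epoch_1.pth", 0.55),
           ("epoch_2.pth", 0.61), ("epoch_3.pth", 0.42)]

kept, dropped = keep_top_k(history, k=3, mode="min")
print("keep:", [path for path, _ in kept])
print("drop:", [path for path, _ in dropped])
```

With `mode="min"` the entry with the highest loss (`epoch_0.pth`) is the one to delete. `ModelCheckpoint` applies the same ordering but removes at most one file per call, because it prunes after every save.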

### GIF Creation for Visualization
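
One common way to turn a sequence of images into an animated GIF is Pillow's `Image.save` with `save_all=True`. The sketch below is an illustration under stated assumptions: `make_frames`, `save_gif`, and the output file name are hypothetical, and the frames are synthetic drawings standing in for per-epoch visualizations such as loss curves or predictions.

```python
import math
import os
import tempfile

from PIL import Image, ImageDraw

def make_frames(num_frames=20, size=(200, 100)):
    """Draw a dot moving along a sine curve, one image per frame."""
    width, height = size
    frames = []
    for i in range(num_frames):
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        x = int((i + 0.5) / num_frames * width)
        y = int(height / 2 + 0.4 * height * math.sin(2 * math.pi * i / num_frames))
        draw.ellipse([x - 5, y - 5, x + 5, y + 5], fill="blue")
        frames.append(img)
    return frames

def save_gif(frames, path, duration_ms=100):
    """Write the frames as a looping GIF (loop=0 means repeat forever)."""
    frames[0].save(path, save_all=True, append_images=frames[1:],
                   duration=duration_ms, loop=0)
    return path

gif_path = save_gif(make_frames(),
                    os.path.join(tempfile.gettempdir(), "training_demo.gif"))
print(f"GIF saved: {gif_path}")
```

In a training loop, the same `save_gif` call would apply to frames collected once per epoch, for example by rendering a matplotlib figure to an image and appending it to the list.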

## Summary

This notebook has demonstrated essential tools and techniques for machine learning development:

1. **tqdm** - Progress tracking for training loops
2. **Device Management** - Proper GPU/CPU handling
3. **argparse** - Command line argument parsing
4. **Loss Tracking** - Comprehensive metrics monitoring
5. **Evaluation** - Structured model evaluation
6. **Computational Efficiency** - Performance profiling
7. **TensorBoard** - Training visualization and logging
8. **Additional Features** - Network initialization, schedulers, checkpointing, and GIF creation

Each section provides reusable code that you can integrate into your own machine learning projects. The examples are kept small so that the core idea behind each tool stays visible and is easy to extend for your specific needs.

**Next Steps:**
- Integrate these tools into your existing ML pipeline
- Customize the configurations based on your project requirements
- Experiment with different initialization strategies and schedulers
- Use TensorBoard to monitor and debug your training process
- Create visualizations to better understand your model's behavior
