PyTorch materials on GitHub:

https://github.com/DataTalksClub/machine-learning-zoomcamp/tree/master/08-deep-learning/pytorch

## **Dataset**

In this homework, we'll build a model for classifying various hair types. For this, we will use the Hair Type dataset that was obtained from Kaggle and slightly rebuilt.

You can download the target dataset for this homework from here:

https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip

```
!wget 'https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip'
!unzip data.zip
```


In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch.

We will use PyTorch for that.

You can use Google Colab or your own computer for that.

## **Data Preparation**

The dataset contains around 1000 images of hair, split into separate folders for the training and test sets.

**Reproducibility**

Reproducibility in deep learning is a multifaceted challenge that requires attention to both software and hardware details. In some cases, we can't guarantee exactly the same results across runs of the same experiment. Therefore, in this homework we suggest that you:

* set the random seeds:

```
import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

Also, use PyTorch version 2.8.0 (that's the one in Colab).

## **Model**

For this homework we will use a Convolutional Neural Network (CNN), implemented in PyTorch.

You need to develop a model with the following structure:

- The shape for input should be `(3, 200, 200)` (channels-first format in PyTorch)
- Next, create a convolutional layer (`nn.Conv2d`):
  - Use 32 filters (output channels)
  - Kernel size should be `(3, 3)` (that's the size of the filter)
  - Use `'relu'` as activation
- Reduce the size of the feature map with max pooling (`nn.MaxPool2d`)
  - Set the pooling size to `(2, 2)`
- Turn the multi-dimensional result into vectors using `flatten` or `view`
- Next, add a `nn.Linear` layer with 64 neurons and `'relu'` activation
- Finally, create the `nn.Linear` layer with 1 neuron - this will be the output
  - The output layer should have an activation - use the appropriate activation for the binary classification case

As optimizer use `torch.optim.SGD` with the following parameters:

- `torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)`




In [None]:
# extracting the dataset
!wget 'https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip'


In [None]:
!unzip 'data.zip'

Archive:  data.zip
replace data/test/curly/03312ac556a7d003f7570657f80392c34.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

In [None]:
import numpy as np
import torch

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

## Model

For this homework we will use a **Convolutional Neural Network (CNN)** implemented in **PyTorch**.

You need to develop a model with the following structure:

1. **Input**
   - Shape: `(3, 200, 200)` (channels-first format in PyTorch)

2. **Convolutional Layer (`nn.Conv2d`)**
   - Filters (output channels): **32**
   - Kernel size: **(3, 3)**
   - Activation: **ReLU**

3. **Max Pooling Layer (`nn.MaxPool2d`)**
   - Pool size: **(2, 2)**

4. **Flatten**
   - Convert multi-dimensional output to vectors using `.view()` or `nn.Flatten`

5. **Fully Connected Layer (`nn.Linear`)**
   - Units: **64**
   - Activation: **ReLU**

6. **Output Layer (`nn.Linear`)**
   - Units: **1**
   - Activation: appropriate for binary classification (e.g., **Sigmoid**)
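
The flatten step needs to know how many values come out of the pooling layer. Below is a minimal sketch of that arithmetic, assuming the `nn.Conv2d` defaults (stride 1, no padding) and a pooling stride equal to the pool size. The helper names `conv_out` and `pool_out` are illustrative and are not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


side = 200                                   # input is (3, 200, 200)
after_conv = conv_out(side, kernel=3)        # side length after the 3x3 convolution
after_pool = pool_out(after_conv, pool=2)    # side length after 2x2 max pooling
flatten_size = 32 * after_pool * after_pool  # 32 channels times the spatial area

print(after_conv, after_pool, flatten_size)
```

The resulting `flatten_size` is the value to pass as `in_features` of the first `nn.Linear` layer.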

---

### Optimizer

Use Stochastic Gradient Descent:

```python
torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)
```


**Question 1**

Which loss function will you use?

* `nn.MSELoss()`
* `nn.BCEWithLogitsLoss()`
* `nn.CrossEntropyLoss()`
* `nn.CosineEmbeddingLoss()`

(Multiple answers can be correct, so pick any)

| **Loss Function**              | **Description**                                                      | **Target Format**              | **Type of Problem**                                         | **Examples of Application**                                                                                                                                                             |
| ------------------------------ | -------------------------------------------------------------------- | ------------------------------ | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`nn.MSELoss()`**             | Measures squared difference between predicted and actual values      | Continuous values (float)      | **Regression**                                              | - Predicting age from an image<br>- Predicting house price<br>- Predicting coordinates<br>- Autoencoder reconstruction                                                                  |
| **`nn.BCEWithLogitsLoss()`**   | Sigmoid + binary cross entropy (stable)                              | 0/1 or multi-label vector      | **Binary classification** or **Multi-label classification** | - “Does the image contain hair?” (yes/no)<br>- Detecting multiple diseases in X-rays<br>- Multi-label emotion detection (happy/sad/angry simultaneously)<br>- Tagging objects in images |
| **`nn.CrossEntropyLoss()`**    | Softmax + cross-entropy for exclusive classes                        | Integer class index            | **Multi-class classification**                              | - Hair type classification (straight/wavy/curly/coily)<br>- Dog breed classification<br>- Sentiment analysis (positive/neutral/negative)<br>- Handwritten digit recognition             |
| **`nn.CosineEmbeddingLoss()`** | Measures similarity/dissimilarity of vectors using cosine similarity | 1 (similar) or −1 (dissimilar) | **Similarity / Metric learning / Siamese networks**         | - Face verification (same person?)<br>- Signature verification<br>- Image similarity ranking<br>- Sentence embedding similarity                                                         |


The correct answer is **`nn.BCEWithLogitsLoss()`**.

The model has a single output neuron and the task is binary classification, so binary cross-entropy is the standard choice. `nn.BCEWithLogitsLoss()` applies the sigmoid internally, so the model's `forward` should return the raw score (logit) and not apply a sigmoid itself.

We would use `nn.CrossEntropyLoss()` for multi-class classification with mutually exclusive classes (e.g., 3, 10, or 100 classes). Examples:

- MNIST: 10 digits
- CIFAR-10: 10 image categories
- Sentiment classification: 3 labels (positive/neutral/negative)
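
To make the "sigmoid + binary cross-entropy" description concrete, here is a small pure-Python sketch (no PyTorch required) that compares the loss computed from the sigmoid probability with a commonly used numerically stable form computed directly from the logit. The function names are illustrative, and PyTorch's internal implementation may differ in detail.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_naive(z, y):
    """Binary cross-entropy computed from the sigmoid probability."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z, y):
    """Same loss computed directly from the logit in a stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# (logit, target) pairs; both columns should agree
for z, y in [(0.0, 1.0), (2.0, 1.0), (-3.0, 0.0), (1.5, 0.0)]:
    print(z, y, round(bce_naive(z, y), 6), round(bce_with_logits(z, y), 6))

# The stable form still works for a very large logit,
# where the naive version would need log(0)
print(bce_with_logits(1000.0, 0.0))
```

This is why the model can return the raw logit during training and only apply `torch.sigmoid` when a probability or a thresholded prediction is needed.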

**Question 2**

What's the total number of parameters of the model? You can use torchsummary or count manually.

In PyTorch, you can find the total number of parameters using:
```
# Option 1: Using torchsummary (install with: pip install torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 200, 200))

# Option 2: Manual counting
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")
```
* 896
* 11214912
* 15896912
* 20072512

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        self.conv1 = nn.Conv2d(
            in_channels=3,
            out_channels=32,
            kernel_size=3
        )

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # After conv+pool:
        # Input: (3, 200, 200)
        # Conv -> (32, 198, 198)
        # Pool -> (32, 99, 99)
        self.flatten_size = 32 * 99 * 99

        self.fc1 = nn.Linear(self.flatten_size, 64)
        self.fc_out = nn.Linear(64, 1)   # single output for binary classification

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))

        # Output: raw score (logit)
        return self.fc_out(x)

# Example: recommended loss
loss_fn = nn.BCEWithLogitsLoss()

# Optimizer
model = CNNModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.8)


In [None]:
# Option 1: Using torchsummary (install with: pip install torchsummary)
from torchsummary import summary
summary(model, input_size=(3, 200, 200))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 32, 198, 198]             896
         MaxPool2d-2           [-1, 32, 99, 99]               0
            Linear-3                   [-1, 64]      20,072,512
            Linear-4                    [-1, 1]              65
Total params: 20,073,473
Trainable params: 20,073,473
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/backward pass size (MB): 11.96
Params size (MB): 76.57
Estimated Total Size (MB): 89.00
----------------------------------------------------------------


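The total number of parameters of the model is **20,073,473**. The value 896 is only the parameter count of the `Conv2d` layer. Among the listed options, the closest one is **20072512**.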
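
The per-layer counts in the summary above can be reproduced by hand. The sketch below assumes the architecture of `CNNModel` (one 3x3 convolution with 32 filters, a flattened size of 32 * 99 * 99, then linear layers with 64 and 1 units); the variable names are illustrative.

```python
# Conv2d: one 3x3 kernel per (input channel, output channel) pair, plus one bias per filter
conv_params = (3 * 3 * 3 + 1) * 32

# First linear layer: one weight per (flattened input, unit) pair, plus one bias per unit
flatten_size = 32 * 99 * 99
fc1_params = flatten_size * 64 + 64

# Output layer: 64 weights plus 1 bias
out_params = 64 * 1 + 1

total_params = conv_params + fc1_params + out_params
print(conv_params, fc1_params, out_params, total_params)
```

Almost all of the parameters sit in the first linear layer, because it connects every value of the flattened feature map to each of the 64 units.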

## Generators and Training

For the next two questions, use the following transformation for both train and test sets:
```
train_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])
```
* We don't need to do any additional pre-processing for the images.
* Use batch_size=20
* Use `shuffle=True` for training and `shuffle=False` for test.
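
As an illustration of what the `Normalize` step does, the sketch below applies `(x - mean) / std` per channel to a single RGB pixel in plain Python. It assumes the pixel has already been scaled to the `[0, 1]` range, as `ToTensor` does for 8-bit images; `normalize_pixel` is a hypothetical helper and not a torchvision function.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]


white = normalize_pixel([1.0, 1.0, 1.0])
black = normalize_pixel([0.0, 0.0, 0.0])
mean_pixel = normalize_pixel(IMAGENET_MEAN)  # a pixel equal to the mean maps to zero

print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
print(mean_pixel)
```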

In [None]:
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ) # ImageNet normalization
])

Now fit the model.

You can use this code:
```
num_epochs = 10
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1) # Ensure labels are float and have shape (batch_size, 1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        # For binary classification with BCEWithLogitsLoss, apply sigmoid to outputs before thresholding for accuracy
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0
    with torch.no_grad():
        for images, labels in validation_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(validation_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    print(f"Epoch {epoch+1}/{num_epochs}, "
          f"Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}, "
          f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}")
```

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# -------------------- Device --------------------
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# -------------------- Data Transforms --------------------
train_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# -------------------- Datasets --------------------
# Make sure your folders are structured like this:
# data/train/straight/, data/train/curly/
# data/test/straight/, data/test/curly/
train_dataset = datasets.ImageFolder(root='data/train', transform=train_transforms)
test_dataset = datasets.ImageFolder(root='data/test', transform=test_transforms)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# -------------------- Model --------------------
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128*25*25, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 1)  # Binary classification
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

model = SimpleCNN().to(device)

# -------------------- Loss & Optimizer --------------------
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# -------------------- Training Loop --------------------
num_epochs = 10
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    # -------------------- Validation (using test set) --------------------
    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(test_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    # -------------------- Print --------------------
    print(
        f"Epoch {epoch+1}/{num_epochs}, "
        f"Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}, "
        f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}"
    )


Using device: cpu
Epoch 1/10, Loss: 0.7869, Acc: 0.5787, Val Loss: 0.7391, Val Acc: 0.5622
Epoch 2/10, Loss: 0.6343, Acc: 0.6587, Val Loss: 0.6152, Val Acc: 0.6368
Epoch 3/10, Loss: 0.5602, Acc: 0.6975, Val Loss: 0.6154, Val Acc: 0.6468
Epoch 4/10, Loss: 0.5149, Acc: 0.7512, Val Loss: 0.5642, Val Acc: 0.6667
Epoch 5/10, Loss: 0.4643, Acc: 0.7825, Val Loss: 0.5220, Val Acc: 0.7413
Epoch 6/10, Loss: 0.4132, Acc: 0.8113, Val Loss: 0.5758, Val Acc: 0.6965
Epoch 7/10, Loss: 0.3508, Acc: 0.8538, Val Loss: 0.6578, Val Acc: 0.7114
Epoch 8/10, Loss: 0.3363, Acc: 0.8538, Val Loss: 0.5475, Val Acc: 0.7662
Epoch 9/10, Loss: 0.1974, Acc: 0.9263, Val Loss: 0.6078, Val Acc: 0.7910
Epoch 10/10, Loss: 0.1171, Acc: 0.9637, Val Loss: 0.6666, Val Acc: 0.7910


**Question 3**

What is the median of training accuracy for all the epochs for this model?

* 0.05
* 0.12
* 0.40
* 0.84

In [None]:
# accuracy stored in variable history
history['acc']

[0.57875,
 0.65875,
 0.6975,
 0.75125,
 0.7825,
 0.81125,
 0.85375,
 0.85375,
 0.92625,
 0.96375]

In [None]:
median_train_acc = np.median(history['acc'])
print("Median Training Accuracy:", round(median_train_acc, 2))

Median Training Accuracy: 0.8
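
The same computation can be checked without NumPy. The sketch below copies the accuracies printed above, takes their median with the standard library, and picks the closest listed option, following the homework's "select the closest one" rule. The variable names are illustrative.

```python
import statistics

# Training accuracy per epoch, copied from history['acc'] above
train_acc = [0.57875, 0.65875, 0.6975, 0.75125, 0.7825,
             0.81125, 0.85375, 0.85375, 0.92625, 0.96375]
options = [0.05, 0.12, 0.40, 0.84]

# With an even number of values, the median is the mean of the two middle ones
median_acc = statistics.median(train_acc)
closest = min(options, key=lambda o: abs(o - median_acc))

print(round(median_acc, 4), closest)
```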


**Question 4**

What is the standard deviation of training loss for all the epochs for this model?

* 0.007
* 0.078
* 0.171
* 1.710


In [None]:
std_train_loss = np.std(history['loss'])
std_train_loss

np.float64(0.19005064521943707)

In [None]:
print("Standard deviation of training loss:", round(std_train_loss, 2))

Standard deviation of training loss: 0.19
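
`np.std` computes the population standard deviation by default (`ddof=0`). The sketch below repeats the calculation with the standard library, using the losses as rounded in the training log above, so the last digits can differ slightly from the NumPy result. The sample standard deviation is shown only for comparison.

```python
import statistics

# Training loss per epoch, as printed (rounded to 4 decimals) in the log above
train_loss = [0.7869, 0.6343, 0.5602, 0.5149, 0.4643,
              0.4132, 0.3508, 0.3363, 0.1974, 0.1171]
options = [0.007, 0.078, 0.171, 1.710]

population_std = statistics.pstdev(train_loss)  # divides by n, like np.std
sample_std = statistics.stdev(train_loss)       # divides by n - 1
closest = min(options, key=lambda o: abs(o - population_std))

print(round(population_std, 3), round(sample_std, 3), closest)
```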


## **Data Augmentation**
For the next two questions, we'll generate more data using data augmentations.

Add the following augmentations to your training data generator:
```
transforms.RandomRotation(50),
transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
transforms.RandomHorizontalFlip(),
```

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# -------------------- Device --------------------
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# -------------------- Data Transforms --------------------
train_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomRotation(50),
    transforms.RandomResizedCrop(200, scale=(0.9, 1.0), ratio=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(),
])

test_transforms = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

# -------------------- Datasets --------------------
# Make sure your folders are structured like this:
# data/train/straight/, data/train/curly/
# data/test/straight/, data/test/curly/
train_dataset = datasets.ImageFolder(root='data/train', transform=train_transforms)
test_dataset = datasets.ImageFolder(root='data/test', transform=test_transforms)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# -------------------- Model --------------------
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128*25*25, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 1)  # Binary classification
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

model = SimpleCNN().to(device)

# -------------------- Loss & Optimizer --------------------
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# -------------------- Training Loop --------------------
num_epochs = 20
history = {'acc': [], 'loss': [], 'val_acc': [], 'val_loss': []}

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    correct_train = 0
    total_train = 0

    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        labels = labels.float().unsqueeze(1)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        predicted = (torch.sigmoid(outputs) > 0.5).float()
        total_train += labels.size(0)
        correct_train += (predicted == labels).sum().item()

    epoch_loss = running_loss / len(train_dataset)
    epoch_acc = correct_train / total_train
    history['loss'].append(epoch_loss)
    history['acc'].append(epoch_acc)

    # -------------------- Validation (using test set) --------------------
    model.eval()
    val_running_loss = 0.0
    correct_val = 0
    total_val = 0

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            labels = labels.float().unsqueeze(1)

            outputs = model(images)
            loss = criterion(outputs, labels)

            val_running_loss += loss.item() * images.size(0)
            predicted = (torch.sigmoid(outputs) > 0.5).float()
            total_val += labels.size(0)
            correct_val += (predicted == labels).sum().item()

    val_epoch_loss = val_running_loss / len(test_dataset)
    val_epoch_acc = correct_val / total_val
    history['val_loss'].append(val_epoch_loss)
    history['val_acc'].append(val_epoch_acc)

    # -------------------- Print --------------------
    print(
        f"Epoch {epoch+1}/{num_epochs}, "
        f"Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}, "
        f"Val Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_acc:.4f}"
    )


Using device: cpu
Epoch 1/20, Loss: 0.8436, Acc: 0.5563, Val Loss: 0.6427, Val Acc: 0.6418
Epoch 2/20, Loss: 0.6397, Acc: 0.6388, Val Loss: 0.6168, Val Acc: 0.6517
Epoch 3/20, Loss: 0.6396, Acc: 0.6288, Val Loss: 0.6068, Val Acc: 0.6716
Epoch 4/20, Loss: 0.6125, Acc: 0.6625, Val Loss: 0.6191, Val Acc: 0.6418
Epoch 5/20, Loss: 0.6082, Acc: 0.6687, Val Loss: 0.5772, Val Acc: 0.7065
Epoch 6/20, Loss: 0.5623, Acc: 0.7250, Val Loss: 0.5363, Val Acc: 0.7413
Epoch 7/20, Loss: 0.5557, Acc: 0.7200, Val Loss: 0.5256, Val Acc: 0.7065
Epoch 8/20, Loss: 0.5242, Acc: 0.7400, Val Loss: 0.5310, Val Acc: 0.7214
Epoch 9/20, Loss: 0.4927, Acc: 0.7700, Val Loss: 0.4877, Val Acc: 0.7662
Epoch 10/20, Loss: 0.4761, Acc: 0.7825, Val Loss: 0.4982, Val Acc: 0.7512
Epoch 11/20, Loss: 0.4443, Acc: 0.7950, Val Loss: 0.4401, Val Acc: 0.7861
Epoch 12/20, Loss: 0.4429, Acc: 0.7975, Val Loss: 0.4294, Val Acc: 0.7910
Epoch 13/20, Loss: 0.4405, Acc: 0.8087, Val Loss: 0.4076, Val Acc: 0.8060
Epoch 14/20, Loss: 0.4219, Ac

**Question 5**

Let's train our model for 10 more epochs using the same code as previously.

`Note: make sure you don't re-create the model. we want to continue training the model we already started training.`

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.008
* 0.08
* 0.88
* 8.88

In [None]:
# Question 5 asks for the mean test loss, which the training loop stores in history['val_loss']
mean_test_loss = np.mean(history['val_loss'])
print("Mean Test Loss:", round(mean_test_loss, 3))


**Question 6**

What's the average of test accuracy for the last 5 epochs (from 6 to 10) for the model trained with augmentations?

* 0.08
* 0.28
* 0.68
* 0.98

In [None]:
# test accuracy for epochs 6 to 10 (list indices 5 to 9)
last_5epochs = history['val_acc'][5:10]
last_5epochs

In [None]:
mean_test_acc_last5 = np.mean(last_5epochs)
print("Mean Test Accuracy:", round(mean_test_acc_last5, 2))
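
Question 6 is about test accuracy, which the training loop stores under `history['val_acc']`. The sketch below shows how the slice `[5:10]` selects epochs 6 to 10. It uses the `Val Acc` values as rounded in the augmented training log above, and it assumes that run stands in for the one the question describes, so treat the number as indicative only.

```python
import statistics

# Val Acc for epochs 1 to 10, copied from the augmented training log above
val_acc = [0.6418, 0.6517, 0.6716, 0.6418, 0.7065,
           0.7413, 0.7065, 0.7214, 0.7662, 0.7512]
options = [0.08, 0.28, 0.68, 0.98]

# Python slices are zero-based and exclude the end index,
# so [5:10] covers epochs 6, 7, 8, 9, and 10
last_five = val_acc[5:10]
mean_last_five = statistics.mean(last_five)
closest = min(options, key=lambda o: abs(o - mean_last_five))

print(last_five, round(mean_last_five, 3), closest)
```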


## Submit the results

- Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2025/homework/hw08
- If your answer doesn't match options exactly, select the closest one. If the answer is exactly in between two options, select the higher value.