# Assignment 3

### Deep Learning

_Submission Deadline: **18.01.2026**_

---

#### Submission Information

Upload your solution via the VC course. Please upload **one Zip archive** per group. This must contain:

- Your solution as a **Notebook** (an `.ipynb` file)
- An **images** folder with all your images (keep the size of the images relatively small)

Your Zip file should be named according to the following schema:

```
assignment_<assignment number>_solution_<group number>.zip
```

In this assignment, you can achieve a total of **82** points. Up to **2.5 bonus points** for the exam are awarded based on your score as follows:

| **Points in Assignment** | **Bonus Points for Exam** |
| :-: | :-: |
| $78$ | $2.5$ |
| $66$ | $2.0$ |
| $54$ | $1.5$ |
| $41$ | $1.0$ |
| $29$ | $0.5$ |

<div class='alert alert-block alert-danger'>

##### **Important Notes**

1. **This assignment is graded. You can receive bonus points for the exam.**
2. **If it is obvious to us that a task was copied from another source and no original work was performed, we will not award bonus points. Formulate all answers in your own words!**
3. **If LLMs (such as ChatGPT or Copilot) were used to create your submission, please indicate this in the respective places. Also note the [AI Policy](https://cogsys.uni-bamberg.de/teaching/ki-richtlinie.html).**

</div>

---

#### Install Requirements

Run the next cell to install the requirements.

In [2]:
# Installs the required packages with the currently selected Python interpreter
%pip install -U -r requirements.txt

Collecting numpy==2.3.5 (from -r requirements.txt (line 1))
  Downloading numpy-2.3.5-cp313-cp313-win_amd64.whl.metadata (60 kB)
Collecting pillow==12.0.0 (from -r requirements.txt (line 2))
  Downloading pillow-12.0.0-cp313-cp313-win_amd64.whl.metadata (9.0 kB)
Collecting torch==2.9.1 (from -r requirements.txt (line 3))
  Downloading torch-2.9.1-cp313-cp313-win_amd64.whl.metadata (30 kB)
Collecting torchvision==0.24.1 (from -r requirements.txt (line 4))
  Downloading torchvision-0.24.1-cp313-cp313-win_amd64.whl.metadata (5.9 kB)
Collecting scikit-learn==1.8.0 (from -r requirements.txt (line 6))
  Downloading scikit_learn-1.8.0-cp313-cp313-win_amd64.whl.metadata (11 kB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 2.3.5 which is incompatible.
torchaudio 2.9.0 requires torch==2.9.0, but you have torch 2.9.1 which is incompatible.


---
## 1 | Data Preprocessing

_For a total of **25** points_

In this assignment, we work with the [Fruits-360 3-Body Problem](https://github.com/fruits-360/fruits-360-3-body-problem) dataset. This dataset consists of 3 classes of fruits.

### **(03.1.0)** Download Dataset

_For **5** Extra Points_

<div class='alert alert-warning'>

Please do **NOT** include the dataset in your VC submission. It only increases the size of your submission unnecessarily.

</div>

Download the dataset from GitHub at `https://github.com/fruits-360/fruits-360-3-body-problem/archive/eed2e925766e61034e910da64f9119f19c057845.zip` into the `./data` folder.
The `data` folder should then contain exactly the contents of the downloaded repository. For example, the folders `data/Test` and `data/Train` should exist directly inside `data`, without an intermediate folder.

Use exactly the linked dataset!

<div class='alert alert-info'>

Working on this task is completely optional. Instead of implementing the dataset download with code, you can also simply download the ZIP archive manually and unpack it correctly.

However, if you solve this task **reproducibly** using Python code, you can receive up to _5 extra points_. These will be added to your total score in this assignment and can thus be used to compensate for other mistakes.

**Reaching the total score is also possible without these _extra points_!**

</div>

<details>

<summary>Do you want to work on the task?</summary>

Download the dataset from GitHub in the following code cell.
To do this, you must download the ZIP archive, unpack it, and additionally adjust the folder structure.

Make sure your code works on Windows and macOS!

</details>
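
One possible approach, sketched under assumptions: GitHub archive ZIPs typically wrap all files in a single top-level folder, so its contents have to be moved up one level after extraction. The helper names `download_zip` and `unpack_flat` are hypothetical and not part of the assignment. Only standard-library modules and `pathlib` paths are used, which should keep the sketch independent of the operating system.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url: str, zip_path: Path) -> Path:
    """Download `url` to `zip_path` unless the file already exists."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    if not zip_path.exists():
        urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def unpack_flat(zip_path: Path, target_dir: Path) -> None:
    """Extract the archive and strip a single wrapping top-level folder, if any."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # Assumption: the archive wraps everything in exactly one folder.
        source = entries[0] if len(entries) == 1 and entries[0].is_dir() else Path(tmp)
        for item in source.iterdir():
            destination = target_dir / item.name
            if not destination.exists():  # keeps repeated runs idempotent
                shutil.move(str(item), str(destination))
```

A call could look like `unpack_flat(download_zip(url, Path('fruits360.zip')), Path('data'))`, with `url` set to the archive link given above. Storing the ZIP file outside of `data` keeps that folder limited to the repository contents.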

In [3]:
# TODO Download the Fruits-360 dataset

### **(03.1.1)** Define Class Labels

_For **1** Point_

**Define a list with all labels of the available classes in the following code cell**

For example, if the available classes were `Banana`, `Pear`, and `Lemon`, the list could be `['Banana', 'Pear', 'Lemon']`.

In [13]:
# TODO Define the class labels

### **(03.1.2)** Create `Dataset` Class

_For **15** Points_

**Create the `Dataset` class for the Fruits-360 dataset. The header of the class and the constructor are already given, the usage of the parameters is described in the docstring.**

**(9 Points) Constructor.**
In the constructor:
1. The root directory should be stored as an attribute.
2. A list of all image paths and their corresponding labels should be created and stored as attributes.
3. A validation split containing $10\%$ of the training examples should be split off from the training set. This split must be reproducible every time the dataset is created (either via `np.random.seed()` before splitting or via `random_state` if `scikit-learn` is used).
4. Depending on the `split` parameter, only one split should be contained in the dataset.
5. The `transform` and `label_transform` callables should be stored as attributes.

**(5 Points) `__getitem__`.**
In the `__getitem__` method, a single sample from the dataset should be returned depending on `key : int`, as a `tuple` consisting of the image and label.
1. Load the image using the stored path (`PIL.Image.open()`). Call `.convert('RGB')` after loading, because some images might be grayscale.
2. Get the corresponding label from the stored list.
3. Apply the `transform` and `label_transform` callables. The stored attributes can be called like functions. Because their default value is `None`, check with a short condition whether each callable is set before calling it.

**(1 Point) `__len__`.**
Implement the `__len__` method, which returns the number of examples in the `Dataset` as `int`.

**Note: Validation Dataset.** For datasets that are used as a _benchmark_, the test dataset is often provided either without labels or not at all. This prevents models from being trained on the test data to cheat on the leaderboards. The test dataset of the Fruits-360 dataset is used here as the final evaluation dataset.
We therefore split off our own _validation dataset_ from the training dataset to monitor model performance during training.
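
The idea of a reproducible split can be illustrated with a minimal sketch. It uses Python's `random.Random` as a dependency-free stand-in for the `np.random.seed()` and `random_state` options named above. The function name `split_indices` and the seed value are assumptions made for illustration, and the split is not stratified by class.

```python
import random


def split_indices(n_samples: int, val_fraction: float = 0.1, seed: int = 42):
    """Return (train_idx, val_idx); the same seed always yields the same split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_val = round(n_samples * val_fraction)
    return sorted(indices[n_val:]), sorted(indices[:n_val])


train_idx, val_idx = split_indices(50)
print(len(train_idx), len(val_idx))  # 45 5
```

The returned index lists could then be used to select the matching entries from the stored lists of image paths and labels.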

In [14]:
from typing import Callable

import torch.utils.data as tdata


class Fruits360(tdata.Dataset):
    '''
    torch dataset for the Fruits-360 dataset
    '''

    def __init__(self, root : str, split : str = 'train', transform : Callable = None, label_transform : Callable = None):
        ''' 9 Points
        Parameters
        ----------

        - root (str) : path to the root directory of the dataset
        - split (str) : one of 'train', 'val', or 'test'. Should decide which split of the dataset to return.
        - transform (Callable) : transform callable to apply to the images
        - label_transform (Callable) : transform callable to apply to the labels
        '''

        # TODO: Implement constructor
        pass

    
    def __getitem__(self, key : int) -> tuple:
        ''' 5 Points
        '''

        # TODO: Implement __getitem__
        pass

    def __len__(self) -> int:
        ''' 1 Point
        '''
        # TODO: Implement __len__
        pass

### **(03.1.3)** Create `Transform` Classes and `Compose`

_For **6** Points_

**Implement the `Transform` callables that are passed to the dataset as the `transform` and `label_transform` parameters.**

1. **(2 Points)** For `label_transform`, create a class that converts the `str` labels from the dataframe to integer tensors. Depending on the implementation of the dataset, you can use the `codes` or `values` list containing the class names. The class must implement the `__call__` method, which defines the behavior of an object when it is called like a function.
2. **(4 Points)** For `transform`, the loaded `Image` should be converted to a tensor in PyTorch format `(c, h, w)`, and then scaled to the resolution `128x128`. Use the `transforms.Compose` container for this.
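
To illustrate how `__call__` makes an object usable like a function, here is a generic sketch that is independent of PyTorch. The classes `AddOffset` and `Chain` are hypothetical. `Chain` only mimics the idea of applying several steps in order and is not the implementation of `transforms.Compose`.

```python
class AddOffset:
    """Callable object: instances can be used like a function."""

    def __init__(self, offset: int):
        self.offset = offset

    def __call__(self, value: int) -> int:
        return value + self.offset


class Chain:
    """Hypothetical stand-in for the idea of chaining transform steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, value):
        for step in self.steps:  # apply the steps in the given order
            value = step(value)
        return value


pipeline = Chain([AddOffset(1), AddOffset(10)])
print(pipeline(0))  # 11

transform = None  # same default as in the dataset constructor
sample = 5
if transform is not None:  # guard before calling an optional callable
    sample = transform(sample)
print(sample)  # 5
```

The last lines show the kind of `None` check that an optional transform attribute needs before it is called.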

In [6]:
import torch



class ConvertLabel(object):
    ''' 2 Points
    callable object converting a label to an integer
    '''

    def __init__(self, labels):
        # TODO: Implement constructor
        pass

    def __call__(self, sample : str) -> torch.Tensor:
        # TODO: Implement __call__
        pass
    

# TODO: Implement fruits360_transforms using transforms.Compose
fruits360_transforms = None

### **(03.1.4)** Create `DataLoader`

_For **3** Points_

**Create a `DataLoader` for each of the training, validation, and test splits. For the validation loader in particular, make sure that the samples are drawn in the same order on every iteration through the `DataLoader`.**

In [7]:
# TODO: Create train_loader

# TODO: Create val_loader

# TODO: Create test_loader

---
## 2 | Convolutional Neural Networks

_For a total of **35** Points_

### **(03.2.1)** Create Network

_For **10** Points_

**Implement a neural network consisting of blocks from the `torch.nn` module, the `BasicBlock` from `torchvision.models.resnet`, and possibly self-implemented or adapted blocks _(5 Points)_.**

**Further points for this task are awarded for complying with the following requirements:**
- **(3 Points):** The model should have fewer than $100{,}000$ parameters. Points will be deducted depending on the excess
(fewer than $100{,}000$: no deduction; $100{,}000$ to $120{,}000$: $-1$ Point; $120{,}000$ to $140{,}000$: $-2$ Points; more than $140{,}000$: $-3$ Points).
- **(2 Points):** The model should achieve an accuracy of at least $.55$ on the test split of the dataset (under the constraints of the next task). Points will be deducted depending on the shortfall
(at least $.55$: no deduction; from $.5$ to below $.55$: $-1$ Point; below $.5$: $-2$ Points).

In [8]:
from torch import nn
from torch.nn import functional as F
from torchvision.models.resnet import BasicBlock

# Hey! You should really read the code of "BasicBlock" to get a
# better understanding of the architecture!

In [9]:
class Model(nn.Module):

    def __init__(self):
        super().__init__()

        # TODO: Implement your model architecture
        pass

    def forward(self, x):
        # TODO: Implement forward pass
        pass

### **(03.2.2)** Train Network

_For **23** Points_

**Implement a training loop for the network designed above. Points for this task are awarded for the implementation of the following components:**

1. **(11 Points) General Training.** In general, a training loop for a `PyTorch` network requires an `Optimizer` and a loss function. During training...
   - the model and the data should be on the same (fastest available) device
   - the model should be set to training mode
   - the training `DataLoader` should be iterated over
   - for each batch, the gradients in the optimizer should be set to 0
   - the outputs for the batch should be computed
   - the loss should be calculated from the outputs and the labels
   - the loss should be backpropagated
   - an optimization step should be performed
2. **(5 Points) Validation.** At the end of each training epoch, the current model should be evaluated on the validation split created in task **03.1.2**. For this...
   - the model should be set to evaluation mode
   - the calculation of gradients should be turned off
   - the accuracy of the model for the validation `DataLoader` should be calculated
3. **(3 Points) Learning-Rate Scheduling.** Using the `torch.optim.lr_scheduler.ReduceLROnPlateau` learning rate scheduler, the learning rate of the optimizer should be reduced when a metric no longer improves. Choose a meaningful metric and patience and build the learning rate scheduler into your training loop.
4. **(4 Points) Early-Stopping and Checkpointing.** Training should be interrupted if a metric no longer improves. Choose a meaningful metric and patience and build a condition for early stopping into your training loop. Ensure that if training is terminated by early stopping, the best, _not the most recent_ model is used. The models you are working with are small enough that potentially a checkpoint could be created every epoch.
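
The early-stopping logic from item 4 can be sketched independently of PyTorch. The class `EarlyStopper`, the patience value, and the accuracy values below are made up for illustration. In a real training loop, `state` would presumably be the model's `state_dict()`, which is why the sketch stores a deep copy rather than a reference.

```python
import copy


class EarlyStopper:
    """Track the best validation metric and signal when patience is exhausted."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_metric = float("-inf")  # assumes higher is better (e.g. accuracy)
        self.best_state = None
        self.bad_epochs = 0

    def step(self, metric: float, state: dict) -> bool:
        """Record one epoch; return True if training should stop."""
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_state = copy.deepcopy(state)  # checkpoint of the best epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopper(patience=2)
val_accuracies = [0.40, 0.55, 0.54, 0.53, 0.60]  # made-up values
for epoch, acc in enumerate(val_accuracies):
    if stopper.step(acc, {"epoch": epoch}):
        break
print(epoch, stopper.best_metric, stopper.best_state)  # 3 0.55 {'epoch': 1}
```

Training stops in epoch 3, but the stored checkpoint belongs to epoch 1, the best epoch. With a model, the stored state could be restored via `model.load_state_dict(stopper.best_state)` after the loop.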

**Experiment with parameters and other techniques to train your model as efficiently as possible. It can also be helpful and illustrative to print important metrics to the console during training to track training progress.**

<div class='alert alert-block alert-warning'>

**Attention: To achieve the accuracy requirement of the previous task, the model may be trained for _a maximum of 25 epochs_.**

</div>

<div class='alert alert-block alert-info'>

##### **Performance Note: [Google Colab](https://colab.research.google.com)**

***Training networks of this size is computationally intensive. On a CPU, training takes so long that experimenting with different architectures becomes impractical. You can shorten the training time by running your training on Google Colab. Even the free GPU access should be sufficient to reduce the training time to under 10 minutes. It is also recommended to use fewer training epochs while testing architectures.***

</div>

In [10]:
torch.manual_seed(42)

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

# TODO: Implement training loop with:
# - Optimizer and loss function
# - Training on the device
# - Validation after each epoch
# - Learning rate scheduling
# - Early stopping and checkpointing

### **(03.2.3)** Evaluate Network

_For **2** Points_

**Calculate the accuracy of the trained network on the predefined evaluation split (here: the test split) of the Fruits-360 dataset.**

<div class='alert alert-block alert-info'>

##### **Submission Note: Model Checkpoint**

**It is permissible to submit a model checkpoint in the Zip file to ensure the reproducibility of your results. In this case, the loading of the checkpoint must be fully implemented and functional in this notebook.**

</div>

In [11]:
# TODO: Implement test evaluation to calculate accuracy on test_loader

---
## 3 | Network Analysis

_For a total of **22** Points_

Consider the following neural network for the classification of RGB images with a resolution of $96\times96$ and $8$ classes:

1. Convolution Layer with $3$ input channels and $24$ output channels, kernel size $5$, stride $1$, padding $2$, groups $1$ and no bias.
2. ReLU
3. MaxPooling Layer with kernel size $2$
4. Convolution Layer with $24$ input channels and $48$ output channels, kernel size $3$, stride $1$, padding $1$, groups $3$ and no bias.
5. ReLU
6. Convolution Layer with $48$ input channels and $48$ output channels, kernel size $3$, stride $2$, padding $1$, groups $1$ and no bias.
7. ReLU
8. Convolution Layer with $48$ input channels and $64$ output channels, kernel size $3$, stride $1$, padding $1$, groups $4$ and no bias.
9. ReLU
10. MaxPooling Layer with kernel size $2$
11. Convolution Layer with $64$ input channels and $64$ output channels, kernel size $3$, stride $1$, padding $1$, groups $8$ and no bias.
12. ReLU
13. Flattening Layer
14. Linear Layer with $\fbox{???}$ input features and $512$ output features and no bias.
15. ReLU
16. Linear Layer with $512$ input features and $128$ output features and no bias.
17. ReLU
18. Linear Layer with $128$ input features and $32$ output features and no bias.
19. ReLU
20. Linear Layer with $32$ input features and $8$ output features and no bias.
21. Softmax Layer

### **(03.3.1)** How many input features must the first Linear Layer (Layer 14) of the network described above have? Also explain how the number is calculated.

_For **4** Points_
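
As a reminder of the arithmetic involved: for dilation 1, the spatial output size of a convolution or pooling layer is $\lfloor (n + 2p - k) / s \rfloor + 1$ for input size $n$, kernel size $k$, stride $s$, and padding $p$. The sketch below applies this to a made-up example that is deliberately different from the network above. The function name `conv_output_size` is hypothetical. Note that in PyTorch the stride of `MaxPool2d` defaults to its kernel size.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# Made-up example, deliberately different from the network above:
size = 32
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # stays at 32
size = conv_output_size(size, kernel=2, stride=2)             # max pooling: 16
channels = 10
print(channels * size * size)  # 2560
```

The number of features after flattening is the number of channels times the height times the width of the last feature map.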


> ...

### **(03.3.2)** Draw a network diagram of the network described above. Label the diagram with the correct dimensions at the input and after each layer where the dimensions change.

_For **8** Points_


> ...

### **(03.3.3)** Calculate the number of parameters of the network described above. Also ensure that the calculation path is comprehensible.

_For **10** Points_
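
The counting rules can be sketched as follows, again with made-up layers that are deliberately different from the network above. A grouped convolution connects each output channel only to the input channels of its own group, so its weight count is $c_{out} \cdot (c_{in} / g) \cdot k^2$ for $g$ groups. ReLU, pooling, flattening, and softmax layers have no learnable parameters. The function names are hypothetical.

```python
def conv_params(c_in: int, c_out: int, kernel: int, groups: int = 1, bias: bool = False) -> int:
    """Parameter count of a 2D convolution with a square kernel."""
    assert c_in % groups == 0 and c_out % groups == 0
    weights = c_out * (c_in // groups) * kernel * kernel
    return weights + (c_out if bias else 0)


def linear_params(f_in: int, f_out: int, bias: bool = False) -> int:
    """Parameter count of a fully connected layer."""
    return f_in * f_out + (f_out if bias else 0)


# Made-up layers, deliberately different from the network above:
print(conv_params(16, 32, kernel=3, groups=4))  # 1152
print(linear_params(100, 10, bias=True))        # 1010
```

Summing such per-layer counts gives the total for a network and makes the calculation path easy to follow.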


> ...