# Exercise 1: Classifying penguin species with PyTorch

<img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png" width="750" />


Artwork by @allison_horst

In this exercise, we will use the Python package [``palmerpenguins``](https://github.com/mcnakhaee/palmerpenguins) to supply a toy dataset containing various features and measurements of penguins.

We have already created a PyTorch dataset which yields data for each of the penguins, but first we should examine the dataset and see what it contains.

### Task 1: look at the data
In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.

- Call this function, which returns a single object, and print it.
- Recognise the function has returned a ``pandas.DataFrame``.
- Consider which features it might make sense to use in order to classify the species of the penguins.
  - Let's now discuss the features.

In [1]:
from palmerpenguins import load_penguins

# Insert code here ...

The features we will use to classify the species are:
- ...
- ...
- ...

### Task 2: creating a ``torch.utils.data.Dataset``

All PyTorch dataset objects are subclasses of the ``torch.utils.data.Dataset`` class. To make a custom dataset, create a class which inherits from the ``Dataset`` class, implement some methods (the Python magic methods ``__len__`` and ``__getitem__``) and supply some data.

Spoiler alert: we've done this for you already.

- Open the file ``src/ml_workshop/_penguins.py``.
- Examine, and discuss, each of the methods.
  - ``__len__``
  - ``__getitem__``
- Review and discuss the class arguments.
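
The pattern described above can be sketched with a toy example. This is not the workshop's ``PenguinDataset``: the class ``ToyDataset`` and its in-memory data are hypothetical and exist only to show where ``__len__`` and ``__getitem__`` fit.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset holding (features, target) pairs in memory."""

    def __init__(self, features, targets):
        # Store the raw data; a real dataset might read from disk instead.
        self.features = torch.tensor(features, dtype=torch.float32)
        self.targets = torch.tensor(targets, dtype=torch.float32)

    def __len__(self):
        # Number of items in the dataset.
        return len(self.features)

    def __getitem__(self, idx):
        # Return the idx-th (features, target) pair.
        return self.features[idx], self.targets[idx]


toy = ToyDataset([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 1.0, 0.0])
print(len(toy))
features, target = toy[1]
print(features, target)
```

Calling ``len(toy)`` uses ``__len__`` and indexing with ``toy[1]`` uses ``__getitem__``, which are the two things a ``DataLoader`` needs from a map-style dataset.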

### Task 3: Obtaining training and validation datasets

- Instantiate the penguin dataset.
  - Make sure you supply the correct column titles for the features and the targets.
- Iterate over the dataset.
    - Hint:
        ```python
        for features, targets in dataset:
            # print the features and targets here
        ```
- Can we give these items to a neural network, or do they need to be transformed first?

In [2]:
from ml_workshop import PenguinDataset

### Task 4: Applying transforms to the data

A common way of transforming inputs to neural networks is to apply a series of transforms to them using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.

These transforms can be very useful for mapping between file paths and tensors of images, etc.
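
To make the idea concrete, here is a plain-Python sketch of the chaining behaviour. ``SimpleCompose`` is a hypothetical stand-in written for illustration, not the torchvision class, and the two transforms are invented examples.

```python
from typing import Any, Callable, List


class SimpleCompose:
    """Plain-Python stand-in illustrating what ``Compose`` does."""

    def __init__(self, transforms: List[Callable[[Any], Any]]):
        self.transforms = transforms

    def __call__(self, data: Any) -> Any:
        # Apply each transform in order, feeding each output to the next.
        for transform in self.transforms:
            data = transform(data)
        return data


def to_float_list(values):
    """Hypothetical transform: convert every entry to a float."""
    return [float(v) for v in values]


def scale_by_ten(values):
    """Hypothetical transform: divide every entry by ten."""
    return [v / 10.0 for v in values]


pipeline = SimpleCompose([to_float_list, scale_by_ten])
result = pipeline([10, 20, 30])
print(result)
```

The order of the list matters: each callable receives the output of the one before it. Passing the same list of callables to ``torchvision.transforms.Compose`` should chain them in the same way.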

In [None]:
from torchvision.transforms import Compose

# Let's apply the transforms we need to the PenguinDataset to get our inputs
# and targets as Tensors.

### Task 5: Creating ``DataLoaders``—and why?

- In PyTorch, once we have created a ``Dataset`` object, we then wrap it in a ``DataLoader``. Why?
  - The ``DataLoader`` object allows us to put our inputs and targets in mini-batches, which makes for more efficient training.
  - The ``DataLoader`` can also randomly shuffle the data each epoch (with ``shuffle=True``, typically used when training).
  - It allows us to load different mini-batches in parallel, which can be very useful for larger datasets and images that can't all fit in memory at once.
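
The points above can be sketched with a small stand-in dataset. The tensors here are made up for illustration, and ``batch_size=4`` is an arbitrary choice, not a value prescribed by the workshop.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 items with 4 features each, one target per item.
features = torch.arange(40, dtype=torch.float32).reshape(10, 4)
targets = torch.arange(10, dtype=torch.float32)
dataset = TensorDataset(features, targets)

# shuffle=True reshuffles the items at the start of every epoch.
# num_workers=0 loads in the main process; larger values load batches in parallel.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for batch_features, batch_targets in loader:
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_targets.shape)))

print(batch_shapes)
```

With 10 items and a batch size of 4, the last batch holds only the 2 remaining items; passing ``drop_last=True`` would discard it instead.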

In [None]:
# Create training and validation DataLoaders.

### Task 6: Creating a neural network in PyTorch

Here we will create our neural network in PyTorch, and have a general discussion on clean and messy ways of going about it.
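
As a starting point for that discussion, one reasonably tidy layout is sketched below. ``TinyFCNet``, its layer sizes and its choice of activations are all assumptions made for illustration; they are not the workshop's solution.

```python
import torch
from torch.nn import Linear, Module, ReLU, Sequential, Sigmoid


class TinyFCNet(Module):
    """Hypothetical fully-connected network: in_feats -> hidden -> out_feats."""

    def __init__(self, in_feats: int, hidden: int, out_feats: int):
        super().__init__()
        # Grouping the layers in a Sequential keeps forward() short.
        self.layers = Sequential(
            Linear(in_feats, hidden),
            ReLU(),
            Linear(hidden, out_feats),
            Sigmoid(),  # squashes outputs into the range 0 to 1
        )

    def forward(self, batch: torch.Tensor) -> torch.Tensor:
        return self.layers(batch)


net = TinyFCNet(in_feats=4, hidden=8, out_feats=3)
preds = net(torch.rand(5, 4))
print(preds.shape)
```

Taking the sizes as constructor arguments, rather than hard-coding them, is one way to keep the class reusable when the number of features or classes changes.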

In [None]:
from torch.nn import Module


class FCNet(Module):
    """Fully-connected neural network."""

### Task 7: Selecting a loss function

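- Cross-entropy losses are among the most common choices for classification. Here we use binary cross-entropy (``torch.nn.BCELoss``), which expects predictions between 0 and 1 and targets of the same shape.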
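
To see what this loss measures, here is a plain-Python sketch of mean binary cross-entropy. The function ``binary_cross_entropy`` is written for illustration only and assumes every prediction lies strictly between 0 and 1.

```python
import math
from typing import Sequence


def binary_cross_entropy(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy between predictions in (0, 1) and 0/1 targets."""
    total = 0.0
    for p, y in zip(preds, targets):
        # Penalise log(p) when the target is 1 and log(1 - p) when it is 0.
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(preds)


confident_right = binary_cross_entropy([0.9, 0.1], [1.0, 0.0])
confident_wrong = binary_cross_entropy([0.1, 0.9], [1.0, 0.0])
print(confident_right, confident_wrong)
```

Confident, correct predictions give a small loss, while confident, wrong ones are penalised heavily. ``torch.nn.BCELoss`` with its default mean reduction computes the same quantity on tensors.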

### Task 8: Selecting an optimiser

While we talked about stochastic gradient descent in the slides, in practice many people use the Adam optimiser.

[https://pytorch.org/docs/stable/generated/torch.optim.Adam.html](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html)

You can think of it as a variant of SGD that adapts the step size for each parameter using running averages of the gradient and its square.
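
A minimal sketch of creating the optimiser and taking one update step follows. The single ``Linear`` layer is a stand-in for a real model and ``lr=1e-3`` is an assumed value, not a recommendation from the workshop.

```python
import torch
from torch.nn import Linear
from torch.optim import Adam

# Hypothetical stand-in model: a single linear layer.
model = Linear(4, 3)

# Hand the model's parameters to the optimiser; lr=1e-3 is an assumed value.
optimiser = Adam(model.parameters(), lr=1e-3)

before = model.weight.detach().clone()

# One update: compute a loss, backpropagate, then step.
loss = model(torch.ones(2, 4)).sum()
optimiser.zero_grad()
loss.backward()
optimiser.step()

changed = not torch.equal(before, model.weight.detach())
print(changed)
```

The optimiser only updates the parameters it was given, so it should be created after the model and handed ``model.parameters()``.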

In [None]:
# Create an optimiser and give it the model's parameters.

### Task 9: Writing basic training and validation loops
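
One way an epoch of training might look is sketched here, following the signature of the ``train_one_epoch`` stub in this task. The function name ``train_one_epoch_sketch``, the toy data, the stand-in model and the ``"loss"`` metric key are all assumptions made for illustration.

```python
from typing import Dict

import torch
from torch.nn import BCELoss, Linear, Module, Sequential, Sigmoid
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch_sketch(
    model: Module,
    train_loader: DataLoader,
    optimiser: Adam,
    loss_func: BCELoss,
) -> Dict[str, float]:
    """Sketch of one training epoch; returns the mean batch loss."""
    model.train()  # enable training-mode behaviour (e.g. dropout)
    total_loss = 0.0
    for inputs, targets in train_loader:
        optimiser.zero_grad()  # clear gradients from the previous batch
        preds = model(inputs)  # forward pass
        loss = loss_func(preds, targets)
        loss.backward()  # backpropagate
        optimiser.step()  # update the weights
        total_loss += loss.item()
    return {"loss": total_loss / len(train_loader)}


# Hypothetical toy problem: 8 random items, 4 features, 3 one-hot classes.
torch.manual_seed(0)
inputs = torch.rand(8, 4)
targets = torch.eye(3)[torch.randint(0, 3, (8,))]
loader = DataLoader(TensorDataset(inputs, targets), batch_size=4, shuffle=True)

model = Sequential(Linear(4, 3), Sigmoid())
optimiser = Adam(model.parameters(), lr=1e-2)
metrics = train_one_epoch_sketch(model, loader, optimiser, BCELoss())
print(metrics)
```

A validation loop would usually mirror this, but call ``model.eval()``, wrap the loop in ``torch.no_grad()`` and skip the optimiser steps.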



In [None]:
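from typing import Dict

from torch.nn import BCELoss, Module
from torch.optim import Adam
from torch.utils.data import DataLoader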


def train_one_epoch(
    model: Module,
    train_loader: DataLoader,
    optimiser: Adam,
    loss_func: BCELoss,
) -> Dict[str, float]:
    """Train ``model`` for one epoch.

    Parameters
    ----------
    model : Module
        The neural network.
    train_loader : DataLoader
        Training dataloader.
    optimiser : Adam
        The optimiser.
    loss_func : BCELoss
        Binary cross-entropy loss function.

    Returns
    -------
    Dict[str, float]
        A dictionary of metrics.

    """

### Task 10: Extracting and plotting metrics