# Exercise 2: Predicting penguin mass with PyTorch

<img src="https://allisonhorst.github.io/palmerpenguins/reference/figures/lter_penguins.png" width="750" />


Artwork by @allison_horst

In this exercise, we will again use the [``palmerpenguins``](https://github.com/mcnakhaee/palmerpenguins) data to continue our exploration of PyTorch.

We will use the same dataset object as before, but this time we'll take a look at a regression problem: predicting the mass of a penguin given other physical features.

### Task 1: look at the data
In the following code block, we import the ``load_penguins`` function from the ``palmerpenguins`` package.

- Load the penguin data as you did before.
- This time, consider which features we might like to use to predict a penguin's mass.

In [1]:
from palmerpenguins import load_penguins

# Insert code here ...

The features we will use to predict the mass are:
- ...
- ...
- ...

### Task 2: creating a ``torch.utils.data.Dataset``

As before, we need to create PyTorch ``Dataset`` objects to supply data to our neural network.

### Task 3: Obtaining training and validation datasets.

- Instantiate the penguin dataset.
  - Make sure you supply the correct column titles for the features and the targets.
  - Remember, the target is now the mass, and not the species!
- Iterate over the dataset
    - Hint:
        ```python
        for features, targets in dataset:
            # print the features and targets here
        ```
- Can we give these items to a neural network, or do they need to be transformed first?

In [2]:
from ml_workshop import PenguinDataset

### Task 4: Applying transforms to the data

A common way of transforming inputs to neural networks is to apply a series of transforms to them using ``torchvision.transforms.Compose``. The ``Compose`` object takes a list of callable objects and applies them to the incoming data.

These transforms can be very useful for mapping between file paths and tensors of images, etc.

In [None]:
from torchvision.transforms import Compose

# Let's apply the transforms we need to the PenguinDataset to get our inputs
# and targets as Tensors.

### Task 5: Creating ``DataLoaders``—Again!

As before, we wrap our ``Dataset``s in ``DataLoader``s before we proceed.
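
As a reminder of the pattern, here is a minimal sketch that uses a ``TensorDataset`` of random numbers as a stand-in for the penguin data. The sizes, the 16/4 split and the batch size are arbitrary assumptions for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in data: 20 items with 3 features and 1 target each (random, not penguins).
generator = torch.Generator().manual_seed(0)
stand_in_features = torch.rand(20, 3, generator=generator)
stand_in_targets = torch.rand(20, 1, generator=generator)
dataset = TensorDataset(stand_in_features, stand_in_targets)

# Split into training and validation subsets.
train_set, valid_set = random_split(dataset, [16, 4], generator=generator)

# Shuffle the training data each epoch; there is no need to shuffle for validation.
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=8, shuffle=False)

for batch_features, batch_targets in train_loader:
    print(batch_features.shape, batch_targets.shape)
```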

In [None]:
# Create training and validation DataLoaders.

### Task 6: Creating a neural network in PyTorch

Previously we created our neural network from scratch, but doing this every time we want to solve a new problem is cumbersome. Many groups working with the ICCS have examples of code where they hard-code the numbers of layers, layer sizes and other parts of their models.

The result is very ugly, non-general and heavily duplicated code. Here, I am going to shamelessly plug my own Python repo, [``TorchTools``](https://github.com/jdenholm/TorchTools), which contains a bunch of commonly used PyTorch tools that are relatively general and will save us some time.

Here, we can use the ``FCNet`` model, whose documentation lives [here](https://jdenholm.github.io/TorchTools/models.html). This model is simply a fully-connected neural network with various options for dropout, batch normalisation and easily-modifiable layers.
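
To give a feel for what such a model amounts to, here is a hand-rolled stand-in built from plain ``torch.nn`` layers. This is not ``FCNet`` or its implementation, and the layer sizes, dropout probability and the choice of three input features are assumptions made purely for illustration; check the linked documentation for the real arguments.

```python
import torch
from torch.nn import BatchNorm1d, Dropout, Linear, ReLU, Sequential

# A small fully connected network: 3 input features -> 16 hidden units -> 1 output.
stand_in_model = Sequential(
    BatchNorm1d(3),   # batch normalisation on the inputs
    Linear(3, 16),    # hidden layer
    ReLU(),
    Dropout(p=0.1),   # dropout on the hidden layer
    Linear(16, 1),    # single output, e.g. a predicted mass
)

# In eval mode dropout is disabled and batch norm uses its running statistics.
stand_in_model.eval()
with torch.no_grad():
    prediction = stand_in_model(torch.rand(4, 3))

print(prediction.shape)
```

Hard-coding layers like this is exactly the duplication described above, which is why a configurable model is the more convenient option.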

#### A brief sidebar
Note: the repo is easily pip-installable with
```bash
pip install git+https://github.com/jdenholm/TorchTools.git
```
but it's included in the requirements of the package for this workshop.

While my peers may laugh at me for including this, it is useful to know you can install packages from GitHub using pip. To install specific versions you can use:
```bash
pip install git+https://github.com/jdenholm/TorchTools.git@v0.1.0
```
(The famous [segment anything model](https://github.com/facebookresearch/segment-anything) (SAM) published by Facebook Research was released in such a way.)

##### Back to work: let's instantiate the model.

In [1]:
from torch_tools import FCNet

# model =

### Task 7: Selecting a loss function

The previous loss function we chose was appropriate for classification, and _not_ regression. Here we'll use the mean-squared-error loss.
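
As a quick sanity check of what this loss computes, the sketch below compares ``MSELoss`` with the mean of the squared differences worked out by hand. The prediction and target values are made up for illustration.

```python
import torch
from torch.nn import MSELoss

# Made-up predictions and targets, one value per item.
example_preds = torch.tensor([[1.0], [2.0], [4.0]])
example_targets = torch.tensor([[1.0], [3.0], [2.0]])

# With the default reduction, MSELoss averages the squared errors.
example_loss = MSELoss()(example_preds, example_targets)

# The same quantity by hand: errors are 0, -1 and 2, so squares are 0, 1 and 4.
manual_loss = ((example_preds - example_targets) ** 2).mean()

print(example_loss.item(), manual_loss.item())
```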

In [2]:
from torch.nn import MSELoss

# loss_func = ...

### Task 8: Selecting an optimiser

``Adam`` is king: let's use it again.

[https://pytorch.org/docs/stable/generated/torch.optim.Adam.html](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html).
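
As a reminder of how the optimiser is used, here is a minimal sketch of a single update step. The tiny ``Linear`` model, the random data and the learning rate are stand-ins chosen for illustration, not the settings for this exercise.

```python
import torch
from torch.nn import Linear, MSELoss
from torch.optim import Adam

torch.manual_seed(0)

# A tiny stand-in model; in the exercise you would pass your own model's parameters.
tiny_model = Linear(3, 1)
tiny_optimiser = Adam(tiny_model.parameters(), lr=1e-3)

weights_before = tiny_model.weight.detach().clone()

# One update step: clear old gradients, compute the loss, backpropagate, step.
tiny_optimiser.zero_grad()
step_loss = MSELoss()(tiny_model(torch.rand(8, 3)), torch.rand(8, 1))
step_loss.backward()
tiny_optimiser.step()

weights_after = tiny_model.weight.detach().clone()
print(weights_before, weights_after)
```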

In [None]:
# Create an optimiser and give it the model's parameters.

### Task 9: Writing basic training and validation loops



In [None]:
from typing import Dict

from torch.nn import Module, MSELoss
from torch.optim import Adam
from torch.utils.data import DataLoader


def train_one_epoch(
    model: Module,
    train_loader: DataLoader,
    optimiser: Adam,
    loss_func: MSELoss,
) -> Dict[str, float]:
    """Train ``model`` for one epoch.

    Parameters
    ----------
    model : Module
        The neural network.
    train_loader : DataLoader
        Training dataloader.
    optimiser : Adam
        The optimiser.
    loss_func : MSELoss
        Mean-squared-error loss function.

    Returns
    -------
    Dict[str, float]
        A dictionary of metrics.

    """


def validate_one_epoch(
    model: Module,
    valid_loader: DataLoader,
    loss_func: MSELoss,
) -> Dict[str, float]:
    """Validate ``model`` for a single epoch.

    Parameters
    ----------
    model : Module
        The neural network.
    valid_loader : DataLoader
        Validation dataloader.
    loss_func : MSELoss
        Mean-squared-error loss function.

    Returns
    -------
    Dict[str, float]
        Metrics of interest.

    """

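One possible way to fill in these two functions is sketched below. It is only a sketch under stated assumptions: the metric dictionary holds just the mean loss under the key ``"loss"``, and the ``Linear`` model, random data, learning rate and epoch count are stand-ins rather than the penguin setup.

```python
from typing import Dict

import torch
from torch.nn import Linear, Module, MSELoss
from torch.optim import Adam
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(
    model: Module, train_loader: DataLoader, optimiser: Adam, loss_func: MSELoss
) -> Dict[str, float]:
    """Run one pass over ``train_loader``, updating the weights."""
    model.train()  # training mode: dropout active, batch norm statistics updated
    total_loss, num_items = 0.0, 0
    for features, targets in train_loader:
        optimiser.zero_grad()
        loss = loss_func(model(features), targets)
        loss.backward()
        optimiser.step()
        # Weight each batch's mean loss by its size to get a per-item average.
        total_loss += loss.item() * len(features)
        num_items += len(features)
    return {"loss": total_loss / num_items}


def validate_one_epoch(
    model: Module, valid_loader: DataLoader, loss_func: MSELoss
) -> Dict[str, float]:
    """Run one pass over ``valid_loader`` without updating the weights."""
    model.eval()  # evaluation mode: dropout off, batch norm uses running statistics
    total_loss, num_items = 0.0, 0
    with torch.no_grad():  # no gradients are needed for validation
        for features, targets in valid_loader:
            loss = loss_func(model(features), targets)
            total_loss += loss.item() * len(features)
            num_items += len(features)
    return {"loss": total_loss / num_items}


# Stand-in regression problem: the target is the sum of three random features.
torch.manual_seed(0)
inputs = torch.rand(32, 3)
outputs = inputs.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(inputs[:24], outputs[:24]), batch_size=8, shuffle=True)
valid_loader = DataLoader(TensorDataset(inputs[24:], outputs[24:]), batch_size=8)

model = Linear(3, 1)
optimiser = Adam(model.parameters(), lr=0.05)
loss_func = MSELoss()

train_metrics, valid_metrics = [], []
for _ in range(5):
    train_metrics.append(train_one_epoch(model, train_loader, optimiser, loss_func))
    valid_metrics.append(validate_one_epoch(model, valid_loader, loss_func))

print(train_metrics[-1], valid_metrics[-1])
```

Keeping the per-epoch dictionaries in lists makes it straightforward to pull out and plot the metrics afterwards.
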
### Task 10: Extracting and plotting metrics