[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bitfount/tutorials/main?labpath=06_running_an_image_data_pod.ipynb)

# Federated Learning - Part 6: Running an image data Pod

Welcome to the Bitfount federated learning tutorials! In this sequence of tutorials, you will learn how federated learning works on the Bitfount platform. This is the sixth notebook in the series.

By the end of this notebook, you will know how to run a Pod that uses image data.

Let's import the relevant pieces from our API reference for constructing a Pod. While several of these are optional, it is best practice to import them all for flexibility.

In [None]:
import logging

import nest_asyncio

from bitfount import CSVSource, Pod
from bitfount.runners.config_schemas import (
    DataSplitConfig,
    PodDataConfig,
    PodDetailsConfig,
)
from bitfount.runners.utils import setup_loggers

nest_asyncio.apply()  # Needed because Jupyter also has an asyncio loop

Let's set up the loggers. The loggers are necessary to ensure you can receive real-time feedback on your task's progress or error messages if something goes wrong:

In [None]:
loggers = setup_loggers([logging.getLogger("bitfount")])

We now specify the config for the Pod to run. You'll need to download some data to run the image Pod. For this tutorial we will be using a subset of the MNIST dataset:

In [None]:
# Download and extract MNIST images and labels
!curl https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/mnist_images.zip -o mnist_images.zip
!curl https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/mnist_labels.csv -o mnist_labels.csv
!unzip -o mnist_images.zip
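
If `curl` or `unzip` is not available in your environment, the same step can be sketched with the Python standard library. This is a minimal sketch, not part of the Bitfount API: the helper names `download` and `extract_zip` are hypothetical, and `BASE_URL` is simply the common prefix of the two URLs used above.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# Common prefix of the two tutorial download URLs used above
BASE_URL = (
    "https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials"
)


def download(url: str, dest: str) -> Path:
    """Download `url` to `dest`, overwriting any existing file (hypothetical helper)."""
    path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(path)


def extract_zip(archive: str, target_dir: str = ".") -> List[str]:
    """Extract every member of `archive` into `target_dir` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

To mirror the cell above, you would call `download(f"{BASE_URL}/mnist_images.zip", "mnist_images.zip")`, then `download(f"{BASE_URL}/mnist_labels.csv", "mnist_labels.csv")`, and finally `extract_zip("mnist_images.zip")`.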

Image datasets are slightly different from the tabular datasets we have been using up until this point. For image datasets, the DataSource file you point to when configuring the Pod needs to contain references to the images on which you want to train. Suppose the column in your DataSource that holds these references is called `file`. We must then inform the Pod that the contents of this column are references to images. We achieve this by specifying the column as `"image"` through the `force_stypes` parameter in the `PodDataConfig`. The outer key, `"mnist-demo"`, is the same name we give the Pod below.

In [None]:
image_data_config = PodDataConfig(
    force_stypes={"mnist-demo": {"categorical": ["target"], "image": ["file"]}}
)
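
Before relying on `force_stypes`, it can be worth confirming that the columns it names actually appear in the CSV header. The following is a small standard-library sketch, not a Bitfount feature; `missing_columns` is a hypothetical helper, and it only looks at the first row of the file.

```python
import csv
from typing import Dict, List


def missing_columns(csv_path: str, stypes: Dict[str, List[str]]) -> List[str]:
    """Return the columns named in `stypes` that are absent from the CSV header."""
    with open(csv_path, newline="") as f:
        # The first row is assumed to be the header; an empty file yields []
        header = next(csv.reader(f), [])
    wanted = [col for cols in stypes.values() for col in cols]
    return [col for col in wanted if col not in header]
```

For this tutorial you could call `missing_columns("mnist_labels.csv", {"categorical": ["target"], "image": ["file"]})`; an empty list means both columns were found in the header.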

If you take a look at `mnist_labels.csv`, the data source for this tutorial, you will see that the `file` column contains the image file names. If you followed the commands above, you will have extracted the images to a directory called `mnist_images/`. The paths in the column that references the image files (in this case `file`) must be either absolute or relative to the current directory. Here we can use the `modifiers` parameter to add the `mnist_images/` prefix to the `file` column to achieve the correct relative path.

Otherwise the setup is very similar to the Pods we have run previously:

In [None]:
pod = Pod(
    name="mnist-demo",
    datasource=CSVSource(
        "mnist_labels.csv", modifiers={"file": {"prefix": "mnist_images/"}}
    ),
    pod_details_config=PodDetailsConfig(
        display_name="MNIST demo pod",
        description="This pod contains a subset of the MNIST data.",
    ),
    data_config=image_data_config,
)
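
As an optional sanity check before starting the Pod, you can verify that every prefixed path in the `file` column points to a file on disk. This is a sketch using only the standard library; `missing_images` is a hypothetical helper, and it assumes the `prefix` modifier amounts to plain string concatenation in front of each value.

```python
import csv
from pathlib import Path
from typing import List


def missing_images(
    csv_path: str, column: str = "file", prefix: str = "mnist_images/"
) -> List[str]:
    """Return prefixed paths from `column` that do not exist as files on disk."""
    with open(csv_path, newline="") as f:
        rows = csv.DictReader(f)
        # Assumption: the prefix is simply prepended to each value in the column
        return [
            prefix + row[column]
            for row in rows
            if not Path(prefix + row[column]).is_file()
        ]
```

Calling `missing_images("mnist_labels.csv")` from the notebook's working directory should return an empty list if the archive was extracted as described above; any paths it returns are relative to the current directory.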

That's the setup done. Let's run the Pod. You'll notice that the notebook cell doesn't complete. That's because the Pod is set to run until it is interrupted!

In [None]:
pod.start()

You should now be able to see your Pod as registered in your Pods page on Bitfount Hub (https://hub.bitfount.com/{username}/pods).

Open the next tutorial, Part 7, where we will show how to train a model on this image Pod.

If you are following the tutorials in Binder, make sure the sidebar is displayed by clicking the folder icon on the left of the screen. Here you will be able to navigate to the next tutorial.