[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bitfount/tutorials/main?labpath=01_running_a_pod.ipynb)

# Federated Learning - Part 1: Running a pod

Welcome to the Bitfount federated learning tutorials! In this sequence of tutorials, you will learn how federated learning works on the Bitfount platform. This is the first notebook in the series.

This first tutorial introduces the concept of Pods (Processor of Data). A Pod is the component of the Bitfount network that allows models or queries to run on remote data. Pods are co-located with the data: they check that users are authorised to perform a given operation and then execute any approved computation.
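The following toy sketch in plain Python makes the authorise-then-execute idea concrete. `ToyPod`, its permission set, and the user names `alice` and `bob` are all hypothetical. This is not how Bitfount implements Pods. It only models the behaviour described above: the data stays inside the object, and only approved (user, operation) pairs are run against it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass
class ToyPod:
    """Hypothetical stand-in for a Pod: holds data locally and runs only approved operations."""

    data: list
    # (username, operation name) pairs that the data owner has approved
    permissions: Set[Tuple[str, str]] = field(default_factory=set)
    # Named computations the toy pod knows how to run
    operations: Dict[str, Callable[[list], object]] = field(default_factory=dict)

    def run(self, user: str, operation: str) -> object:
        # Authorisation check happens before anything touches the data
        if (user, operation) not in self.permissions:
            raise PermissionError(f"{user} is not authorised to run {operation!r}")
        # The computation runs next to the data; only its result is returned
        return self.operations[operation](self.data)


toy_pod = ToyPod(
    data=[25, 38, 52, 41],
    permissions={("alice", "mean")},
    operations={"mean": lambda rows: sum(rows) / len(rows)},
)
print(toy_pod.run("alice", "mean"))  # 39.0
```

Calling `toy_pod.run("bob", "mean")` raises `PermissionError`, because that pair was never approved.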

By the end of this Jupyter notebook, you should know how to run a Pod by interacting with the Bitfount Python API.

### 1.1 Setting up

If you haven't already, create your Bitfount account at https://hub.bitfount.com. To run these tutorials locally, activate your virtual environment, download the tutorial files into a directory, and open Jupyter by running `jupyter notebook` in your preferred terminal.

To run a Pod, we must import the relevant classes for constructing a Pod, which are documented in our [API reference](https://docs.bitfount.com/api/bitfount/federated/pod). Several of these are optional, but importing them all gives you the flexibility to use them later.

```python
import logging

import nest_asyncio

from bitfount import CSVSource, Pod
from bitfount.runners.config_schemas import (
    DataSplitConfig,
    PodConfig,
    PodDataConfig,
    PodDetailsConfig,
)
from bitfount.runners.utils import setup_loggers

nest_asyncio.apply()  # Needed because Jupyter also has an asyncio loop
```
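The `nest_asyncio.apply()` call matters because a Jupyter kernel already runs an asyncio event loop, and `asyncio.run()` refuses to start a new loop while one is already running. The sketch below uses only the standard `asyncio` module, with no Bitfount code. It reproduces that error by nesting one `asyncio.run()` call inside another. The function names `inner` and `outer` are illustrative.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # Inside a coroutine an event loop is already running, just as in a Jupyter cell
    coro = inner()
    try:
        return await asyncio.sleep(0, result=asyncio.run(coro))
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


print(asyncio.run(outer()))
```

Run as a standalone script, this prints `RuntimeError: asyncio.run() cannot be called from a running event loop`. In this notebook, after `nest_asyncio.apply()`, the nested call should succeed and print `done` instead. That nesting is exactly what `nest_asyncio` permits.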

Next, set up the loggers. They give you real-time feedback on your task's progress and report error messages if something goes wrong:

```python
loggers = setup_loggers([logging.getLogger("bitfount")])
```
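`setup_loggers` is given the `bitfount` logger. Loggers in Python's standard `logging` module form a dot-separated hierarchy, so a handler attached to a parent logger also receives records from its child loggers. The sketch below shows this with the standard library only. The logger names `toy_library` and `toy_library.federated.pod` and the `ListHandler` class are made up so that the real `bitfount` logger is left untouched. The sketch does not reproduce what `setup_loggers` does internally.

```python
import logging


class ListHandler(logging.Handler):
    """Illustrative handler that collects formatted log messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


# Configure only the parent logger (made-up name standing in for "bitfount")
parent = logging.getLogger("toy_library")
parent.setLevel(logging.INFO)
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
parent.addHandler(handler)

# A child logger has no handler of its own; its records propagate to the parent
child = logging.getLogger("toy_library.federated.pod")
child.info("pod started")
print(handler.messages)  # ['toy_library.federated.pod: pod started']
```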

To set up a Pod, we must specify its configuration: the data source it connects to, details describing the Pod, and how its data should be read. For example:

```python
# Configure a pod using the census income data.
pod = Pod(
    name="census-income-demo",
    # The Pod reads its data from a CSV file hosted at this URL
    datasource=CSVSource(
        "https://bitfount-hosted-downloads.s3.eu-west-2.amazonaws.com/bitfount-tutorials/census_income.csv"
    ),
    # Human-readable details describing the Pod
    pod_details_config=PodDetailsConfig(
        display_name="Census Income Demo Pod",
        description="This pod contains data from the census income demo set",
    ),
    data_config=PodDataConfig(
        # Exclude the "fnlwgt" column from the Pod's data
        ignore_cols=["fnlwgt"],
        # Force these columns to be treated as categorical, keyed by Pod name
        force_stypes={
            "census-income-demo": {
                "categorical": [
                    "TARGET",
                    "workclass",
                    "marital-status",
                    "occupation",
                    "relationship",
                    "race",
                    "native-country",
                    "gender",
                    "education",
                ],
            },
        },
        modifiers=None,
        # Extra arguments for the datasource (here, a random seed)
        datasource_args={"seed": 100},
        # Split the data by percentage, using the splitter's default arguments
        data_split=DataSplitConfig(data_splitter="percentage", args={}),
    ),
    # This is an optional attribute, but we will use it later in Tutorial 5
    approved_pods=["census-income-yaml-demo"],
)
```

Notice how we specified which dataset to connect to using `CSVSource`, and how to read that dataset using `PodDataConfig`. [PodDataConfig](https://docs.bitfount.com/api/bitfount/runners/config_schemas#poddataconfig) has several parameters, many of which are optional, so check which ones suit your dataset. Here, `ignore_cols` excludes the `fnlwgt` column, and `force_stypes` forces the listed columns to be treated as categorical.
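The sketch below gives a rough mental model of `ignore_cols` and `force_stypes`. It reads a tiny made-up CSV with Python's `csv` module, drops the ignored column, and labels the forced columns as categorical. The helper `load_with_config`, the toy rows, and the `"inferred"` label for the other columns are assumptions for illustration. Bitfount's own data loading and type inference are not shown here.

```python
import csv
import io

# Made-up rows for illustration only (not taken from the census dataset)
TOY_CSV = """age,workclass,fnlwgt,education,TARGET
30,Private,1000,Bachelors,0
45,Self-emp,2000,HS-grad,1
"""


def load_with_config(text: str, ignore_cols: list, categorical: set):
    """Hypothetical helper: drop ignored columns and tag semantic types."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [col for col in reader.fieldnames if col not in ignore_cols]
    # Forced columns are categorical; the rest would be left to type inference
    stypes = {col: "categorical" if col in categorical else "inferred" for col in kept}
    rows = [{col: row[col] for col in kept} for row in reader]
    return rows, stypes


rows, stypes = load_with_config(
    TOY_CSV,
    ignore_cols=["fnlwgt"],
    categorical={"TARGET", "workclass", "education"},
)
print(list(rows[0]))  # ['age', 'workclass', 'education', 'TARGET']
print(stypes["workclass"], stypes["age"])  # categorical inferred
```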

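Note also that `datasource_args` and `data_split` are optional parameters, typically used when the dataset will be used for machine learning. Here, `datasource_args` supplies extra arguments to the datasource (a `seed` of 100), and `data_split` specifies how the dataset is divided into training, validation (dev), and test subsets. The `"percentage"` splitter divides the data by percentage, and the empty `args` leave its default percentages in place.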
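The sketch below illustrates what a seeded percentage split does in general. It shuffles the rows with a fixed seed, then slices off validation and test subsets by percentage. The function `percentage_split` and its 10%/10% defaults are hypothetical choices for illustration. They are not Bitfount's `"percentage"` splitter or its defaults. The seed of 100 simply mirrors the `datasource_args` above.

```python
import random


def percentage_split(rows, validation_pct=10, test_pct=10, seed=100):
    """Hypothetical seeded percentage split (not Bitfount's implementation)."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and therefore the split, reproducible
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_val = n * validation_pct // 100
    n_test = n * test_pct // 100
    validation = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, validation, test


train_rows, val_rows, test_rows = percentage_split(range(100))
print(len(train_rows), len(val_rows), len(test_rows))  # 80 10 10
```

Because the seed is fixed, calling `percentage_split` again on the same rows yields the same three subsets.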

That completes the setup, so let's run the Pod. The notebook cell will not complete, because the Pod runs until it is interrupted. This is intentional: the Pod must keep running for it to be accessed. If you plan to continue to the next tutorial, keep the kernel running!
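Why does the cell never finish? A Pod behaves like a server that waits for incoming work. The toy `asyncio` loop below is entirely hypothetical and is not Bitfount's implementation. It handles queued requests until a stop event is set, and `stop.set()` plays the role that interrupting the kernel plays for the real Pod. The request names are illustrative. Unlike `pod.start()`, this demo stops itself after a fraction of a second.

```python
import asyncio


async def toy_serve(requests: asyncio.Queue, stop: asyncio.Event) -> list:
    """Hypothetical long-running loop: handles requests until `stop` is set."""
    handled = []
    while not stop.is_set():
        try:
            request = await asyncio.wait_for(requests.get(), timeout=0.05)
        except asyncio.TimeoutError:
            continue  # nothing to do yet; keep waiting, like an idle Pod
        handled.append(f"handled {request}")
    return handled


async def demo() -> list:
    requests: asyncio.Queue = asyncio.Queue()
    stop = asyncio.Event()
    server = asyncio.create_task(toy_serve(requests, stop))
    for name in ("query-1", "training-task-1"):
        await requests.put(name)
    await asyncio.sleep(0.2)  # give the loop time to process the queue
    stop.set()  # stands in for interrupting the kernel
    return await server


print(asyncio.run(demo()))  # ['handled query-1', 'handled training-task-1']
```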

```python
pod.start()
```

You should now see your Pod registered on the Pods page of Bitfount Hub (https://hub.bitfount.com/{username}/pods, where `{username}` is your Bitfount username). To learn an alternative way to run a Pod by pointing to a YAML configuration file, go to [Part 2](https://github.com/bitfount/tutorials/blob/main/02_running_a_pod_using_yaml.ipynb). To skip ahead to training a model or running a SQL query on a Pod, open [Part 3](https://github.com/bitfount/tutorials/blob/main/03_querying_and_training_a_model.ipynb).