# Welcome to the CIDR DL Framework!

This framework contains sample implementations of federated learning (FL) using *Flower* and *PyTorch*.

To get started with federated learning under the framework, we'll walk through a demonstration on CIFAR-10. First, install the dependencies, clone the repository, and fetch the dataset:

In [None]:
# Ray versions after 1.11.1 have a memory leak when used with Flower
!pip install ray==1.11.1
!pip install "flwr[simulation]"
!git clone https://github.com/Lawrence-lugs/cidr-ufl


!./get_dataset.sh

Next, we import the framework and create a configuration object.

In [None]:
import dl_framework

my_config = dl_framework.fw_config()

First, we set the location of the simulation's output data (we'll view it later with TensorBoard).

In [None]:
my_config.tensorboard_runs_dir = 'tb_data/sample'
my_config.run_name = 'demo'

We also need to set the general settings for the distributed learning simulation. All of these have default values, so you can skip this step entirely; see `dl_framework/__init__.py` for the defaults.

In [None]:
my_config.num_nodes = 2
my_config.clients_per_gpu = 1 # careful when setting this above 1; you'll probably need more than a GTX 1080 Ti
my_config.num_rounds = 10
my_config.local_epochs = 10
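
Since every setting has a default, it can help to print what the configuration currently holds before changing anything. The helper below is a minimal sketch that assumes the config object stores its settings as plain attributes; `describe_config` and `DemoConfig` are hypothetical names used for illustration and are not part of `dl_framework`.

```python
def describe_config(cfg) -> dict:
    """Return the public, non-callable attributes of cfg as a name -> value dict."""
    return {
        name: getattr(cfg, name)
        for name in dir(cfg)
        if not name.startswith("_") and not callable(getattr(cfg, name))
    }


class DemoConfig:
    """Hypothetical stand-in for a config object with default values."""
    num_nodes = 2      # class-level defaults
    num_rounds = 3

    def __init__(self):
        self.run_name = "demo"


cfg = DemoConfig()
cfg.num_rounds = 10    # override one default, as in the cell above
settings = describe_config(cfg)
for name, value in settings.items():
    print(f"{name} = {value}")
```

In the notebook, the same call could be tried as `describe_config(my_config)`, provided the real config object exposes its settings as attributes.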

Importantly, we define the node class that the framework will use. In the framework, `dl_framework.node.dl_node` and `dl_framework.dl_model` objects specify the behavior of a node, including its training algorithm and its model.

This works with any AI framework, as long as you can inherit from these classes and define what they require.

For now, let's use a premade node and model.

In [4]:
import dl_framework.node
my_config.node_class = dl_framework.node.dl_node

The last things we need to configure are the test set and the training set(s). These are not optional: there are no default values for them.

We'll need to define the training data as a list of two trainsets, one per node, since we have two nodes.
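
Because these fields have no defaults, a small check before launching can catch a missing test set or a mismatch between the number of trainsets and the number of nodes. This is a hedged sketch: `check_datasets` is a hypothetical helper, not a framework function, and it only assumes the attribute names `num_nodes`, `trainsets`, and `testset` used in this notebook.

```python
from types import SimpleNamespace


def check_datasets(cfg) -> None:
    """Raise ValueError if the dataset fields are missing or inconsistent."""
    testset = getattr(cfg, "testset", None)
    trainsets = getattr(cfg, "trainsets", None)
    if testset is None:
        raise ValueError("testset is not set")
    if trainsets is None:
        raise ValueError("trainsets is not set")
    if len(trainsets) != cfg.num_nodes:
        raise ValueError(
            f"expected {cfg.num_nodes} trainsets, got {len(trainsets)}"
        )


# Stand-in for the real config object, with toy lists in place of datasets
demo_cfg = SimpleNamespace(num_nodes=2, trainsets=[[0, 1], [2, 3]], testset=[4, 5])
check_datasets(demo_cfg)  # passes silently: two nodes, two trainsets
```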

Below is code that can create any number of IID subsets of the CIFAR-10 dataset. The framework should also be able to handle non-IID subsets, since it is based on **Flower** and **FedAvg**; experiments with these are still to follow, but they are easy to implement in the framework by editing the code below.

In [5]:
import torch
print(torch.__version__)  # passing fractions to random_split needs PyTorch 1.13 or later

def split_dataset(trainset: torch.utils.data.Dataset, num_nodes: int):
    """Split trainset into num_nodes equally sized, randomly drawn (IID) subsets."""
    # Fixed seed so that the split is reproducible across runs
    generator = torch.Generator().manual_seed(42)
    # Every node receives the same fraction of the data
    dataset_shares = [1 / num_nodes] * num_nodes
    return torch.utils.data.random_split(trainset, dataset_shares, generator)

import torchvision
import image_classification.utils
trainset = torchvision.datasets.CIFAR10(
    root='data',
    train=True,
    download=False,  # expects the dataset to already be present under ./data
    transform=image_classification.utils.t_cropflip_augment,
)
trainsets = split_dataset(trainset, my_config.num_nodes)
trainset = None  # drop our reference to the full set; only the per-node subsets are used below
testset = torchvision.datasets.CIFAR10(
    root='data',
    train=False,
    download=False,
    transform=image_classification.utils.t_normalize,
)

my_config.testset = testset
my_config.trainsets = trainsets
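
For the non-IID case mentioned above, one common approach is label-sorted sharding: sort the sample indices by label, cut them into shards, and hand each node a few shards so that it only sees a few classes. The function below is a minimal sketch of that idea in plain Python; `label_shard_split` and its parameters are hypothetical and have not been tested with this framework.

```python
import random
from typing import List, Sequence


def label_shard_split(
    labels: Sequence[int],
    num_nodes: int,
    shards_per_node: int = 2,
    seed: int = 42,
) -> List[List[int]]:
    """Partition sample indices so that each node sees only a few classes."""
    # Sort indices by label so each contiguous shard is dominated by one class
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_nodes * shards_per_node
    shard_size = len(order) // num_shards  # leftover samples are dropped
    shards = [
        order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)
    ]
    # Shuffle the shards (not the samples) so nodes get random class mixes
    random.Random(seed).shuffle(shards)
    return [
        [
            idx
            for shard in shards[n * shards_per_node:(n + 1) * shards_per_node]
            for idx in shard
        ]
        for n in range(num_nodes)
    ]


# Toy example: 40 samples, 4 classes, 2 nodes
toy_labels = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
parts = label_shard_split(toy_labels, num_nodes=2)
for node_id, indices in enumerate(parts):
    print(node_id, sorted({toy_labels[i] for i in indices}))
```

With the CIFAR-10 trainset, the labels would come from `trainset.targets`, and each index list could be wrapped as `torch.utils.data.Subset(trainset, indices)` to build the `trainsets` list. Note that this requires keeping `trainset` around until the subsets are built.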

To view the outputs, we need to start the TensorBoard server. For a better experience, I'd recommend opening `localhost:6006` in a browser instead of viewing it inline here.

When using the framework without this notebook, start TensorBoard with the same command as below, but from a terminal: `tensorboard --logdir tb_data/sample`

In [6]:
%load_ext tensorboard

%tensorboard --logdir tb_data/sample

Finally, let's start the simulation.

In [7]:
import dl_framework.framework as framework
framework.run(my_config)
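
To make `num_rounds` and `local_epochs` more concrete, here is a toy sketch of the FedAvg scheme the framework is based on: in every round, each node trains locally for a number of epochs starting from the global weights, and the server then averages the results weighted by local dataset size. This is an illustration in plain Python under simplifying assumptions (a single scalar parameter and a quadratic loss per node); it is not the framework's actual implementation, and all function names are hypothetical.

```python
from typing import List


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average per-client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def local_train(weights: List[float], target: float, local_epochs: int,
                lr: float = 0.1) -> List[float]:
    """Toy stand-in for local training: gradient steps on (w - target)^2."""
    w = list(weights)
    for _ in range(local_epochs):
        w = [wi - lr * 2 * (wi - target) for wi in w]
    return w


def simulate(num_rounds: int, local_epochs: int,
             targets: List[float], sizes: List[int]) -> List[float]:
    """Run num_rounds of: local training on every node, then FedAvg."""
    global_w = [0.0]
    for _ in range(num_rounds):
        updates = [local_train(global_w, t, local_epochs) for t in targets]
        global_w = fedavg(updates, sizes)
    return global_w


# Two nodes with equal data whose local optima are 1.0 and 3.0
final_w = simulate(num_rounds=10, local_epochs=10, targets=[1.0, 3.0], sizes=[1, 1])
print(final_w)
```

With equally sized nodes, the global weight settles near the midpoint of the two local optima, which is the compromise FedAvg aims for.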