# Simulation of Federated Learning

This notebook gives a step-by-step description of how to simulate federated learning on the MNIST dataset.

## Installation

We first make sure that the required dependencies are installed.

In [None]:
!pip install "appfl[analytics,examples]"

Alternatively, you can install the package from the GitHub repository.

In [None]:
!git clone git@github.com:APPFL/APPFL.git
# Use %cd so the directory change persists; !cd only affects a temporary subshell.
%cd APPFL
!pip install -e ".[analytics,examples]"

## Import dependencies

We collect all the imports here.
The `appfl` framework is built on `torch` and its neural network module `torch.nn`. We also import `torchvision` to download the `MNIST` dataset.
`omegaconf` is used to read hierarchical configurations from `.yaml` files.

In [None]:
import numpy as np
import math
import torch
import torch.nn as nn
import torchvision
from torchvision.transforms import ToTensor

import appfl.run as ppfl
from appfl.misc.data import Dataset

from omegaconf import DictConfig, OmegaConf

## Train datasets

Because this is a simulation of federated learning, we split the training dataset manually. In a real deployment this step is unnecessary, because each client already holds its own data.
In this example, we consider only two clients. Set `num_clients` to a larger value to simulate more clients.

In [None]:
num_clients = 2

Each client needs to create a `Dataset` object with its training data. Here, we create the objects for all the clients.

In [3]:
train_data_raw = torchvision.datasets.MNIST(
    "./_data", train=True, download=True, transform=ToTensor()
)
# Split the sample indices into `num_clients` nearly equal, contiguous chunks.
split_train_data_raw = np.array_split(range(len(train_data_raw)), num_clients)
train_datasets = []
for i in range(num_clients):
    # Collect the images and labels assigned to client i.
    train_data_input = []
    train_data_label = []
    for idx in split_train_data_raw[i]:
        train_data_input.append(train_data_raw[idx][0].tolist())
        train_data_label.append(train_data_raw[idx][1])

    train_datasets.append(
        Dataset(
            torch.FloatTensor(train_data_input),
            torch.tensor(train_data_label),
        )
    )
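
To see how the split above distributes samples among clients, the following minimal sketch reproduces the chunk sizes of `np.array_split` in plain Python. The helper `split_sizes` is hypothetical and not part of `appfl` or NumPy. It assumes only the documented NumPy behavior that the first `n % k` chunks receive one extra element.

```python
def split_sizes(num_samples, num_clients):
    """Chunk sizes that np.array_split(range(num_samples), num_clients) yields."""
    base, extra = divmod(num_samples, num_clients)
    # The first `extra` chunks get one additional sample.
    return [base + 1 if i < extra else base for i in range(num_clients)]


# The MNIST training set has 60,000 images.
print(split_sizes(60000, 2))  # [30000, 30000]
print(split_sizes(60000, 7))  # [8572, 8572, 8572, 8571, 8571, 8571, 8571]
```

With `num_clients = 2`, each simulated client therefore receives 30,000 consecutive training samples.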

## Test dataset

The test data also needs to be wrapped in a `Dataset` object.

In [None]:
test_data_raw = torchvision.datasets.MNIST(
    "./_data", train=False, download=False, transform=ToTensor()
)
test_data_input = []
test_data_label = []
for idx in range(len(test_data_raw)):
    test_data_input.append(test_data_raw[idx][0].tolist())
    test_data_label.append(test_data_raw[idx][1])

test_dataset = Dataset(
    torch.FloatTensor(test_data_input), torch.tensor(test_data_label)
)

## User-defined model

Users can define their own models by subclassing `torch.nn.Module`. In this simulation, we define the following convolutional neural network.

In [None]:
class CNN(nn.Module):
    def __init__(self, num_channel=1, num_classes=10, num_pixel=28):
        super().__init__()
        self.conv1 = nn.Conv2d(
            num_channel, 32, kernel_size=5, padding=0, stride=1, bias=True
        )
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, padding=0, stride=1, bias=True)
        self.maxpool = nn.MaxPool2d(kernel_size=(2, 2))
        self.act = nn.ReLU(inplace=True)

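        # Side length of the feature map after two (conv, 2x2 max-pool) stages;
        # it determines the input size of fc1.
        X = num_pixel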
        X = math.floor(1 + (X + 2 * 0 - 1 * (5 - 1) - 1) / 1)
        X = X / 2
        X = math.floor(1 + (X + 2 * 0 - 1 * (5 - 1) - 1) / 1)
        X = X / 2
        X = int(X)

        self.fc1 = nn.Linear(64 * X * X, 512)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.act(self.conv1(x))
        x = self.maxpool(x)
        x = self.act(self.conv2(x))
        x = self.maxpool(x)
        x = torch.flatten(x, 1)
        x = self.act(self.fc1(x))
        x = self.fc2(x)
        return x

model = CNN()
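
The value `X` computed in `__init__` is the side length of the feature map that reaches `fc1`. As a sanity check, the sketch below repeats that arithmetic outside the model, using the standard output-size formula for a convolution followed by a 2x2 max-pool. The helper names `conv_out` and `flattened_features` are hypothetical and introduced only for illustration.

```python
import math


def conv_out(size, kernel=5, padding=0, stride=1, dilation=1):
    """Output side length of a square convolution (PyTorch Conv2d formula)."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


def flattened_features(num_pixel=28, channels=64):
    """Side length and flattened size after two (5x5 conv, 2x2 max-pool) stages."""
    side = num_pixel
    for _ in range(2):
        side = conv_out(side)  # 5x5 convolution, no padding, stride 1
        side = side // 2       # 2x2 max-pool halves the side (rounding down)
    return side, channels * side * side


print(flattened_features())  # (4, 1024)
```

For the default 28x28 MNIST input, the side length goes 28, 24, 12, 8, 4, so `fc1` takes 64 * 4 * 4 = 1024 inputs.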

## Runs with configuration

We run the `appfl` training with the data and model defined above.
Many parameters can be set by changing the values in the configuration files.
Sample configuration files are available in the `./config` directory.

We present two ways to run `appfl`: one from a notebook and the other from the command line.

### Run in Notebook

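We read the configuration files manually with `OmegaConf.load`. The main file `config.yaml` is loaded first. Then, for each entry of its `defaults` list after the first, the file `./config/<key>/<value>.yaml` is loaded and stored in `cfg` under `<key>`.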

In [None]:
cfg = OmegaConf.load("./config/config.yaml")
for d in cfg.defaults[1:]:
    for k, v in d.items():
        cfg[k] = OmegaConf.load(f"./config/{k}/{v}.yaml")

print(OmegaConf.to_yaml(cfg))
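
The loop above resolves each entry of the `defaults` list to a file under `./config`. The sketch below shows that mapping with plain Python data and no file access. The `defaults` values used here (`fed: fedavg`, `dataset: mnist`) and the helper `resolve_defaults` are hypothetical placeholders; the actual entries are whatever your `config.yaml` contains.

```python
def resolve_defaults(defaults, config_dir="./config"):
    """Map each {group: option} entry, after the first, to its YAML path."""
    paths = {}
    for entry in defaults[1:]:  # the first entry is skipped, as in the loop above
        for group, option in entry.items():
            paths[group] = f"{config_dir}/{group}/{option}.yaml"
    return paths


# Hypothetical defaults list; real values come from config.yaml.
example_defaults = ["_self_", {"fed": "fedavg"}, {"dataset": "mnist"}]
for group, path in resolve_defaults(example_defaults).items():
    print(f"{group} -> {path}")
```

Each resolved file is what the notebook loop passes to `OmegaConf.load` before assigning the result to `cfg[k]`.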

We can now start training with the configuration `cfg`.

In [None]:
ppfl.run_serial(cfg, model, train_datasets, test_dataset, "MNIST")

### Run in command line

Running training jobs from the command line is easier with `hydra`, which reads configurations from files as well as from command-line arguments.
To do so, we write the following code to a file and run that file from the command line. The file must also contain the imports, datasets, and model defined above.

In [None]:
import hydra

@hydra.main(config_path="./config", config_name="config")
def main(cfg: DictConfig):
    ppfl.run_serial(cfg, model, train_datasets, test_dataset, "MNIST")

if __name__ == "__main__":
    main()