# **USE CASE 1.1.** Image classification in FATE

## Initialize FATE

Before running the FATE code, FATE may need to be initialized as described in the [standalone deployment guide](https://fate.readthedocs.io/en/latest/deploy/standalone-deploy/#3-install-fate-in-the-host-using-the-compiled-installer). Most of the commands below are commented out; uncomment them if FATE has not yet been initialized in this session. We assume that FATE has already been installed following the same guide.

In [None]:
# Move to the folder where FATE standalone has been downloaded. Modify the path as required.
# Note: %cd is used instead of !cd, because !cd runs in a subshell and does not persist.
#%cd /path/to/standalone_fate_install_1.10.0_release

# Start FATE service
#!bash bin/init.sh start

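# Load environment variables
# Note: a `!` command runs in a subshell, so these variables do not reach the notebook kernel;
# if they are needed, source the script in the terminal before launching Jupyter.
#!source bin/init_env.sh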

# Initialize FATE's pipeline
!pipeline init --ip 127.0.0.1 --port 9380

## Required libraries and configuration

Import the required libraries.

In [None]:
import os
import torch as t
from torch import nn
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader, Evaluation, DataTransform
from pipeline.interface import Data, Model

from pipeline.component.nn import DatasetParam
from pipeline.component.homo_nn import TrainerParam

t = fate_torch_hook(t)

Define some parameters for the simulation, such as the guest and host IDs, the number of clients in the federated scenario, the number of federated rounds, the number of local epochs each client runs before communicating, the batch size for the training phase, and the seed for random numbers.

FATE's clients must be assigned guest and host roles. By default, the guest is the party that initiates the learning, so in this case one guest and four hosts are defined. Note that FATE was not able to execute the experiment when more than four hosts were included.

In [None]:
# Root path of the FATE project, used later to build the data paths
fate_project_path = os.path.abspath('/workspace/FATE/')

# Roles
guest = 10000
hosts = [10001, 10002, 10003, 10004]
arbiter = 9999

# Some parameters
NUM_CLIENTS = len(hosts) + 1 # The number of clients is given by the number of hosts + the guest
NUM_ROUNDS = 10 # Number of learning rounds in the federated computation
NUM_EPOCHS = 5 # Number of local epochs each client runs before communicating
BATCH_SIZE = 20 # Batch size for training phase

# Seed for random numbers
seed = 90
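The two parameters `NUM_ROUNDS` and `NUM_EPOCHS` together determine how many local epochs each client runs in total and when the results are combined. The following is a minimal sketch of that relationship, assuming that aggregation takes place after every `NUM_EPOCHS`-th local epoch (counted from 1). The helper `aggregation_epochs` is hypothetical and is not part of FATE.

```python
# Values mirror the parameters defined above.
NUM_ROUNDS = 10  # federated rounds
NUM_EPOCHS = 5   # local epochs per round


def aggregation_epochs(num_rounds, epochs_per_round):
    """Return the 1-indexed local epochs after which aggregation is assumed to happen."""
    total_epochs = num_rounds * epochs_per_round
    return [e for e in range(1, total_epochs + 1) if e % epochs_per_round == 0]


schedule = aggregation_epochs(NUM_ROUNDS, NUM_EPOCHS)
print(len(schedule), schedule[0], schedule[-1])  # 10 5 50
```

With the values used here, each client runs 50 local epochs in total and the models are combined 10 times.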

FATE uses a pipeline whose components are connected to one another as they are added.
First, we initialize the pipeline and set the role of each client.

In [None]:
# Initialize the pipeline
pipeline = PipeLine()

# Set job initiator; the guest initiates the process
pipeline.set_initiator(role='guest', party_id=guest)

# Set participants information
pipeline.set_roles(guest=guest, host=hosts, arbiter=arbiter)

## Loading and preparing the input data

The data used here has been downloaded from [this GitHub repository](https://github.com/teavanist/MNIST-JPG) and then organized in folders: each client has two folders, one for training data and one for testing data, and each of these is further divided into one subfolder per digit. Note that FATE also provides a smaller subset of MNIST that can be downloaded from [this link](https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/examples/data/mnist.zip).
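The folder organization described above can be produced in many ways. The following is a minimal sketch, assuming that the source images are stored as `src_dir/<digit>/<file>` and that client folders are named `c<i>_train` and `c<i>_test`, as in the paths used later in this notebook. The function `split_among_clients` is a hypothetical helper, not part of FATE, and the demo runs on empty placeholder files instead of real images.

```python
import os
import random
import shutil
import tempfile


def split_among_clients(src_dir, dst_root, num_clients, split_name, seed=90):
    """Copy images from src_dir/<digit>/ into dst_root/c<i>_<split_name>/<digit>/.

    The files of each digit are shuffled with a fixed seed and dealt
    round-robin, so the clients receive disjoint, similarly sized shares.
    """
    rng = random.Random(seed)
    counts = [0] * num_clients
    for digit in sorted(os.listdir(src_dir)):
        digit_dir = os.path.join(src_dir, digit)
        if not os.path.isdir(digit_dir):
            continue
        files = sorted(os.listdir(digit_dir))
        rng.shuffle(files)
        for idx, fname in enumerate(files):
            client = idx % num_clients
            target = os.path.join(dst_root, f"c{client}_{split_name}", digit)
            os.makedirs(target, exist_ok=True)
            shutil.copy2(os.path.join(digit_dir, fname), os.path.join(target, fname))
            counts[client] += 1
    return counts


# Demo on a tiny fake dataset: 2 digit folders with 10 empty files each.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source")
    for digit in ("0", "1"):
        os.makedirs(os.path.join(src, digit))
        for k in range(10):
            open(os.path.join(src, digit, f"{k}.jpg"), "w").close()
    demo_counts = split_among_clients(src, os.path.join(tmp, "out"), 5, "train")
    demo_folders = sorted(os.listdir(os.path.join(tmp, "out")))

print(demo_counts)   # number of files copied to each client
print(demo_folders)  # one folder per client
```

Dealing the files round-robin per digit keeps the class balance similar across clients; a different partitioning scheme would be needed to simulate non-IID data.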

Define the data tables, which will be used later in the FATE job configuration. We use several partitions of the MNIST dataset, one for each of the clients, with different images. We define different tables for training and testing data.

In [None]:
# Create tables for each client's training data
train_datas = [None]*NUM_CLIENTS
for i in range(NUM_CLIENTS):
    train_datas[i] = {"name": "mnist_train" + str(i), "namespace": "experiment"}

# Different data paths for each client
# The path may differ depending on where the user has downloaded the MNIST images
train_data_paths = [None]*NUM_CLIENTS
for i in range(NUM_CLIENTS): 
    train_data_paths[i] = fate_project_path + '/c' + str(i) + '_train'

for i in range(NUM_CLIENTS): 
    pipeline.bind_table(name=train_datas[i]['name'], 
                        namespace=train_datas[i]['namespace'], 
                        path=train_data_paths[i])

In [None]:
# Create tables for each client's testing data
test_datas = [None]*NUM_CLIENTS
for i in range(NUM_CLIENTS):
    test_datas[i] = {"name": "mnist_test" + str(i), "namespace": "experiment"}

test_data_paths = [None]*NUM_CLIENTS
for i in range(NUM_CLIENTS): 
    test_data_paths[i] = fate_project_path + '/c' + str(i) + '_test'

for i in range(NUM_CLIENTS): 
    pipeline.bind_table(name=test_datas[i]['name'], 
                        namespace=test_datas[i]['namespace'], 
                        path=test_data_paths[i])
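Since the tables are bound to folder paths, it may help to verify that every expected folder exists before the job is submitted. The following is a minimal sketch under the same naming assumption as above (`c<i>_train` and `c<i>_test` inside the project path). The function `missing_client_dirs` is a hypothetical helper, not part of FATE, and the demo uses a temporary directory.

```python
import os
import tempfile


def missing_client_dirs(project_path, num_clients, splits=("train", "test")):
    """Return the expected client folders that do not exist on disk."""
    expected = [
        os.path.join(project_path, f"c{i}_{split}")
        for i in range(num_clients)
        for split in splits
    ]
    return [path for path in expected if not os.path.isdir(path)]


# Demo: two clients expected, but the test folder of client 1 is absent.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("c0_train", "c0_test", "c1_train"):
        os.makedirs(os.path.join(tmp, name))
    missing = [os.path.basename(p) for p in missing_client_dirs(tmp, 2)]

print(missing)  # ['c1_test']
```

In the notebook, the call would take `fate_project_path` and `NUM_CLIENTS` as arguments; an empty list means that all folders are in place.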

The ``Reader`` component, which will later be added to the pipeline, is in charge of reading the data. We define separate readers for the training and testing datasets, since they are used at different stages of the pipeline.

In [None]:
# Configure Reader for training data
reader_train = Reader(name="reader_train")

# Set the data of each client. The guest will be defined as the client 0, and the rest will be the hosts.
reader_train.get_party_instance(role='guest', party_id=guest).component_param(table=train_datas[0])
for i in range(1, NUM_CLIENTS):
    reader_train.get_party_instance(role='host', party_id=hosts[i-1]).component_param(table=train_datas[i])

# Configure Reader for testing data
reader_test = Reader(name="reader_test")

reader_test.get_party_instance(role='guest', party_id=guest).component_param(table=test_datas[0])
for i in range(1,NUM_CLIENTS):
    reader_test.get_party_instance(role='host', party_id=hosts[i-1]).component_param(table=test_datas[i])


# Dataset configuration: use the MNIST dataset and flatten each image into a feature vector
dataset_param = DatasetParam(dataset_name='mnist_dataset', flatten_feature=True)
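The option `flatten_feature=True` is assumed here to turn each 28x28 MNIST image into a vector of 784 values, which is the form that fully connected layers expect. The following plain-Python sketch only illustrates the idea; `flatten_image` is a hypothetical helper and does not reproduce FATE's internal implementation.

```python
def flatten_image(image):
    """Flatten a 2-D image (a list of rows) into a 1-D list, row by row."""
    return [pixel for row in image for pixel in row]


# Placeholder 28x28 grayscale image filled with zeros.
image = [[0.0] * 28 for _ in range(28)]
features = flatten_image(image)
print(len(features))  # 784
```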

## Create a Deep Learning model

It should be highlighted that, at the time of developing this work, FATE only supports Dense/Linear layers when developing a Neural Network (NN) for Horizontal/Vertical Federated Learning. The supported layers are listed in the [FATE documentation](https://fate.readthedocs.io/en/latest/federatedml_component/homo_nn/#supported-layers_1). Besides, it only supports PyTorch as the framework for developing NNs for Horizontal Federated Learning (HFL).

Therefore, in the next cell we define a PyTorch model with only linear layers. The ``HomoNN`` class then integrates this deep learning model into the FATE environment. It receives parameters such as the loss, the optimizer, and the federated training process to be followed. Specifically, the trainer uses the federated averaging aggregator. Note that the ``epochs`` parameter stands for the total number of epochs (including those of purely local training), while the results are aggregated every ``aggregate_every_n_epoch`` epochs.

In [None]:
# Define model with linear layers
model = t.nn.Sequential(
    t.nn.Linear(784, 32),
    t.nn.ReLU(),
    t.nn.Linear(32, 10),
    t.nn.Softmax(dim=1)
)

nn_0 = HomoNN(name='nn_0',
              model=model,
              loss=t.nn.CrossEntropyLoss(),
              optimizer=t.optim.Adam(
                  model.parameters(), 
                  lr=0.001
              ),
              dataset=dataset_param, 
              trainer=TrainerParam(
                  trainer_name='fedavg_trainer', 
                  epochs=NUM_EPOCHS * NUM_ROUNDS,
                  aggregate_every_n_epoch=NUM_EPOCHS,
                  batch_size=BATCH_SIZE, 
                  validation_freqs=NUM_EPOCHS
              ),
              torch_seed=seed # random seed
             )
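To make the aggregation step more concrete, the following is a minimal sketch of classic federated averaging on plain Python lists, where each client's parameters are weighted by its number of training samples. It illustrates the general technique only: the function `federated_average` and the example values are hypothetical, and whether and how FATE's `fedavg_trainer` weights the clients depends on its own settings, which are not shown here.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its sample count."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(num_params)
    ]


# Three hypothetical clients, each holding a model with two parameters.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 10, 20]
averaged = federated_average(weights, sizes)
print(averaged)  # [3.5, 4.5]
```

The third client holds half of the samples, so the averaged parameters are pulled towards its values.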

In addition to the previous component, we define a second ``HomoNN`` component, which will later receive the model trained by the first one. This is needed so that one component is fed with the training data and the other with the testing data.

In [None]:
nn_1 = HomoNN(name="nn_1")

## Configure the Pipeline and its components

FATE creates a Pipeline and then adds the components that we want to use to train and predict in the federated scenario. They are added in the order in which the tasks are executed.

We add the ``Reader`` and ``HomoNN`` components to the pipeline. Besides, we define a new component to perform the evaluation and add it to the pipeline.

In [None]:
# Add data readers
pipeline.add_component(reader_train)
pipeline.add_component(reader_test)

# Add nn_0; its input is the training data
pipeline.add_component(nn_0, data=Data(train_data=reader_train.output.data))

# Add nn_1; its input is the testing data, and the model is the nn_0 previously trained
pipeline.add_component(nn_1, data=Data(test_data=reader_test.output.data),
                       model=Model(model=nn_0.output.model))

In [None]:
# Define the evaluation strategy
evaluation_0 = Evaluation(name='eval_0', eval_type='multi')

# Add to the pipeline
pipeline.add_component(evaluation_0, data=Data(data=[nn_0.output.data, nn_1.output.data]))
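The components added above form a dependency graph: each component can only run once the components that produce its inputs have finished. The following sketch models those dependencies with Python's standard `graphlib` module (Python 3.9 or later) and derives a valid execution order. It is a conceptual illustration only and does not use or reproduce FATE's own scheduling.

```python
from graphlib import TopologicalSorter

# Each component is mapped to the set of components it depends on.
dependencies = {
    "reader_train": set(),
    "reader_test": set(),
    "nn_0": {"reader_train"},
    "nn_1": {"reader_test", "nn_0"},
    "eval_0": {"nn_0", "nn_1"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Any order printed by this sketch places both readers before the components that consume their data, `nn_0` before `nn_1`, and `eval_0` last.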

## Training and testing in the federated scenario

Once we have the pipeline with all the components ready, we just compile the pipeline and use the fit method.

In [None]:
# Compile the pipeline once all components have been added; this step generates the conf and DSL files for running the job
pipeline.compile()

In [None]:
# fit model
pipeline.fit()

We can check the results of the model in different ways. The first one is to query the components defined in the pipeline, including the one that holds the evaluation strategy.

In [None]:
# query component summary
pipeline.get_component('nn_0').get_summary()

In [None]:
pipeline.get_component('nn_0').get_output_data()

In [None]:
pipeline.get_component('nn_1').get_output_data()

FATE does not provide the loss or accuracy metrics in testing, but we may calculate the accuracy from the prediction information in the previous table.

In [None]:
# Retrieve the prediction table once, then compare the labels against the predictions
output = pipeline.get_component('nn_1').get_output_data()
labels = list(output['label'])
predicted = list(output['predict_result'])
accuracy = sum(1 for x, y in zip(labels, predicted) if x == y) / float(len(labels))
print(accuracy)
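The same two lists can also be used to compute the accuracy for each digit separately, which shows whether some classes are recognized worse than others. The following is a minimal sketch: `per_class_accuracy` is a hypothetical helper, and the example labels and predictions are made up for illustration rather than taken from the experiment.

```python
from collections import Counter


def per_class_accuracy(labels, predicted):
    """Return {class: fraction of samples of that class predicted correctly}."""
    totals = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predicted) if y == p)
    return {cls: correct[cls] / totals[cls] for cls in sorted(totals)}


# Hypothetical labels and predictions, only to illustrate the computation.
example_labels = [0, 0, 1, 1, 1, 2]
example_predicted = [0, 1, 1, 1, 0, 2]
class_accuracy = per_class_accuracy(example_labels, example_predicted)
print(class_accuracy)
```

In the notebook, the function would be called with the `labels` and `predicted` lists, provided that both columns hold values of the same type.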

## FATE board

Using the board provided by the FATE framework (by default, available at [127.0.0.1:8080](http://127.0.0.1:8080)), we can view the results graphically instead of relying only on the output of the summary.

![nn_0](nn_0.png)

![evaluation_0](evaluation.png)