# Using the Library

In this document, we will look at using the library for a few standard federated learning environments.

In [1]:
#%pip install -U git+https://github.com/codymlewis/ymir.git git+https://github.com/codymlewis/tenjin.git tqdm

import tensorflow as tf
import tenjin
from tqdm.notebook import trange

import ymir

Let's first look at standard federated learning. We will write a function to create a Keras model as normal.

In [2]:
def create_model(input_shape, output_shape, lr=0.1):
    inputs = tf.keras.layers.Input(shape=input_shape)
    x = tf.keras.layers.Flatten()(inputs)
    x = tf.keras.layers.Dense(300, activation="relu")(x)
    x = tf.keras.layers.Dense(100, activation="relu")(x)
    outputs = tf.keras.layers.Dense(output_shape, activation="softmax")(x)
    model = tf.keras.models.Model(inputs=inputs, outputs=outputs)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    model.compile(loss=loss_fn, optimizer=opt, metrics=['accuracy'])
    return model

Next, we will load the MNIST dataset, define the per-client batch sizes, and partition the dataset across the clients using latent Dirichlet allocation (LDA).

Finally, we will create iterators over the training and test splits, which are used to evaluate the global model.

In [3]:
num_clients = 10
dataset = ymir.mp.datasets.Dataset(*tenjin.load('mnist'))
batch_sizes = [32 for _ in range(num_clients)]
data = dataset.fed_split(batch_sizes, ymir.mp.distributions.lda)
train_eval = dataset.get_iter("train", 10_000)
test_eval = dataset.get_iter("test", 10_000)
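
To give a feel for what an LDA split does, the following is a minimal NumPy sketch of Dirichlet-based label partitioning as it is commonly done in federated learning experiments. It is not the library's implementation: the function name `dirichlet_partition`, the `alpha` concentration parameter, and the toy labels are all assumptions made for illustration.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Hypothetical helper: split sample indices across clients, class by class,
    using Dirichlet-distributed shares (smaller alpha gives a more skewed split)."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Shuffle the indices of all samples belonging to class c
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Fraction of class c that each client receives
        shares = rng.dirichlet(np.full(num_clients, alpha))
        # Turn the fractions into split points over the shuffled indices
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in zip(client_indices, np.split(idx, cuts)):
            client.extend(part.tolist())
    return [np.array(sorted(ci), dtype=int) for ci in client_indices]


# Toy labels standing in for the MNIST targets (assumption, not real data)
toy_labels = np.random.default_rng(1).integers(0, 10, size=1000)
parts = dirichlet_partition(toy_labels, num_clients=10)
```

Each entry of `parts` holds the sample indices assigned to one client; every sample is assigned to exactly one client, but the class mix differs from client to client.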

Next, we create the network and clients, adding each client to the network.

In [4]:
network = ymir.mp.network.Network()
for d in data:
    network.add_client(ymir.regiment.Scout(create_model(dataset.input_shape, dataset.classes), d, 1, test_data=test_eval))

Finally, we create the federated learning global model and controller.

In [5]:
learner = ymir.garrison.fedavg.Captain(create_model(dataset.input_shape, dataset.classes, lr=1), network)

We perform federated learning by repeatedly calling the `step` method on the controller. There will likely be retracing warnings;
these arise because the training step is called on each client independently, which causes a tracing step for each one. This does
not impact performance.

In the following, we also periodically evaluate the global model on the test dataset.

In [7]:
for r in (pbar := trange(500)):
    loss = learner.step()
    if r % 10 == 0:
        metrics = learner.model.test_on_batch(*next(test_eval), return_dict=True)
        pbar.set_postfix(metrics)
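
Conceptually, each round of federated averaging combines the clients' models into a new global model. The sketch below shows the textbook weighted-average rule in plain NumPy. It is an illustration only and not the internals of `step`; the function name `fedavg`, the weighting by `client_sizes`, and the toy weights are assumptions.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Hypothetical helper: average per-layer weights, weighting each client
    by the number of samples it holds."""
    total = float(sum(client_sizes))
    num_layers = len(client_weights[0])
    return [
        sum((n / total) * w[layer] for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]


# Two toy clients, each with a two-layer "model" (illustrative values)
clients = [
    [np.array([1.0, 2.0]), np.array([[0.0]])],
    [np.array([3.0, 4.0]), np.array([[2.0]])],
]
new_global = fedavg(clients, client_sizes=[1, 3])
```

Here the second client holds three times as much data as the first, so its weights contribute three quarters of the average.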

# Alternative Learning Methods

In this library, we include a number of alternative methods for federated learning. In the following, we will cover the most notable.

## Different Aggregators

Using a different aggregator is as simple as using a different `Captain` object, either one from the `garrison` module or an
instance of a class that inherits from `Captain`.

In [8]:
learner = ymir.garrison.median.Captain(create_model(dataset.input_shape, dataset.classes, lr=1), network)

Then we can do the learning loop as normal.

In [9]:
for r in (pbar := trange(500)):
    loss = learner.step()
    if r % 10 == 0:
        metrics = learner.model.test_on_batch(*next(test_eval), return_dict=True)
        pbar.set_postfix(metrics)
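
To illustrate why one might swap the aggregator, the following sketch shows a coordinate-wise median over client weights in NumPy. It assumes that a median aggregator takes the element-wise median of each layer, which may differ from what `ymir.garrison.median` does internally; the name `median_aggregate` and the toy values are hypothetical.

```python
import numpy as np


def median_aggregate(client_weights):
    """Hypothetical helper: element-wise median of each layer across clients."""
    num_layers = len(client_weights[0])
    return [
        np.median(np.stack([w[layer] for w in client_weights]), axis=0)
        for layer in range(num_layers)
    ]


# Two similar clients and one outlier (illustrative values)
clients = [
    [np.array([1.0, 2.0])],
    [np.array([1.2, 1.8])],
    [np.array([100.0, -100.0])],
]
agg = median_aggregate(clients)
```

Unlike a mean, the median ignores the outlying third client here, which is the usual motivation for this kind of aggregator.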

## Personalized Learning

Personalized learning methods require the construction of a different client within the network, one that does not overwrite
the local model weights with the global model weights.

In the following example, we will construct a network of Ditto personalized learners and apply federated averaging for aggregation.

In [10]:
network = ymir.mp.network.Network()
for d in data:
    network.add_client(ymir.regiment.ditto.Scout(create_model(dataset.input_shape, dataset.classes), d, 1, test_data=test_eval))
learner = ymir.garrison.fedavg.Captain(create_model(dataset.input_shape, dataset.classes, lr=1), network)
for r in (pbar := trange(500)):
    loss = learner.step()
    if r % 10 == 0:
        metrics = learner.model.test_on_batch(*next(test_eval), return_dict=True)
        pbar.set_postfix(metrics)
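
The idea behind Ditto can be sketched in a few lines: each client keeps a personalized model that is trained on its local loss plus a penalty pulling it toward the global model. The NumPy example below follows the update rule from the Ditto paper on a toy quadratic loss. It is not the code of `ymir.regiment.ditto.Scout`; the names `ditto_personal_step`, `grad_fn`, `lr`, and `lam` are assumptions.

```python
import numpy as np


def ditto_personal_step(v, w_global, grad_fn, lr=0.1, lam=1.0):
    """Hypothetical helper: one gradient step on F(v) + (lam / 2) * ||v - w_global||^2."""
    return v - lr * (grad_fn(v) + lam * (v - w_global))


# Toy local loss F(v) = 0.5 * ||v - target||^2, whose gradient is v - target
target = np.array([2.0, 2.0])
w_global = np.zeros(2)
v = np.zeros(2)
for _ in range(200):
    v = ditto_personal_step(v, w_global, lambda x: x - target)
```

With `lam=1.0` the personalized model settles halfway between the local optimum `target` and the global model; a larger `lam` keeps it closer to the global model, while `lam=0` reduces to purely local training.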

## Proximal Terms/FL Regularization