Skip to content

Latest commit

 

History

History
174 lines (112 loc) · 7.57 KB

hello_scatter_and_gather.rst

File metadata and controls

174 lines (112 loc) · 7.57 KB

Hello Scatter and Gather

Before You Start

Before jumping into this guide, make sure you have an environment with NVIDIA FLARE installed.

You can follow :ref:`getting_started` on the general concept of setting up a Python virtual environment (the recommended environment) and how to install NVIDIA FLARE.

Introduction

This tutorial is meant solely to demonstrate how the NVIDIA FLARE system works, without introducing any actual deep learning concepts.

Through this exercise, you will learn how to use NVIDIA FLARE with numpy to perform basic computations across two clients with the included Scatter and Gather workflow, which broadcasts the training tasks then aggregates the results that come back.

Due to the simplified weights, you will be able to clearly see and understand the results of the FL aggregation and the model persistor process.

The setup of this exercise consists of one server and two clients. The server side model starting with weights [[1, 2, 3], [4, 5, 6], [7, 8, 9]].

The following steps compose one cycle of weight updates, called a round:

  1. Clients are responsible for adding a delta to the weights to calculate new weights for the model.
  2. These updates are then sent to the server which will aggregate them to produce a model with new weights.
  3. Finally, the server sends this updated version of the model back to each client, so the clients can continue to calculate the next model weights in future rounds.

For this exercise, we will be working with the hello-numpy-sag application in the examples folder. Custom FL applications can contain the folders:

  1. custom: contains any custom components (custom Python code)
  2. config: contains client and server configurations (config_fed_client.json, config_fed_server.json)
  3. resources: contains the logger config (log.config)

Let's get started. First clone the repo, if you haven't already:

$ git clone https://github.com/NVIDIA/NVFlare.git

Remember to activate your NVIDIA FLARE Python virtual environment from the installation guide. Ensure numpy is installed.

(nvflare-env) $ python3 -m pip install numpy

Now that you have all your dependencies installed, let's implement the federated learning system.

NVIDIA FLARE Client

You will first notice that the hello-numpy-sag application does not contain a custom folder.

The code for the client and server components has been implemented in the nvflare/app-common/np folder of the NVFlare code tree.

These files, for example the trainer in np_trainer.py can be copied into a custom folder in the hello-numpy-sag application as custom_trainer.py and modified to perform additional tasks.

The config_fed_client.json configuration discussed below would then be modified to point to this custom code by providing the custom path.

For example, replacing nvflare.app_common.np.np_trainer.NPTrainer with custom_trainer.NPTrainer.

In the np_trainer.py trainer, we first import nvflare and numpy. We then implement the execute function to enable the clients to perform a simple addition of a diff to represent one calculation of training a round.

The server sends either the initial weights or any stored weights to each of the clients through the Shareable object passed into execute(). Each client adds the diff to the model data after retrieving it from the DXO (see :ref:`data_exchange_object`) obtained from the Shareable, and creates a new Shareable to include the new weights also contained within a DXO.

In a real federated learning training scenario, each client does its training independently on its own dataset. As such, the weights returned to the server will likely be different for each of the clients.

The FL server can aggregate (in this case average) the clients' results to produce the aggregated model.

You can learn more about Shareable and FLContext in the :ref:`programming guide <programming_guide>`.

NVIDIA FLARE Server & Application

Model Persistor

.. literalinclude:: ../../nvflare/app_common/np/np_model_persistor.py
   :language: python
   :linenos:
   :caption: np_model_persistor.py

The model persistor is used to load and save models on the server. Here, the model refer to weights packaged into a ModelLearnable object.

Internally, DXO is used to manage data after :class:`FullModelShareableGenerator<nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator>` converts Learnable to Shareable on the FL server.

The DXO helps all of the FL components agree on the format.

In this exercise, we can simply save the model as a binary ".npy" file. Depending on the frameworks and tools, the methods of saving the model may vary.

Application Configuration

Inside the config folder there are two files, config_fed_client.json and config_fed_server.json. For now, the default configurations are sufficient.

.. literalinclude:: ../../examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_server.json
   :language: json
   :linenos:
   :caption: config_fed_server.json

.. literalinclude:: ../../examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_client.json
   :language: json
   :linenos:
   :caption: config_fed_client.json


Here, in executors, the Trainer implementation NPTrainer is configured for the task "train".

If you had implemented your own custom NPTrainer training routine, for example in hello-numpy-sag/custom/custom_trainer.py, this config_fed_client.json configuration would be modified to point to this custom code by providing the custom path.

For example, replacing nvflare.app_common.np.np_trainer.NPTrainer with custom_trainer.NPTrainer.

Federated Numpy with Scatter and Gather Workflow!

After starting the server and clients, you should begin to see some outputs in each terminal tracking the progress of the FL run. If everything went as planned, you should see that through 10 rounds, the FL system has aggregated new models on the server with the results produced by the clients.

Congratulations!

You've successfully built and run your first numpy federated learning system.

You now have a decent grasp of the main FL concepts, and are ready to start exploring how NVIDIA FLARE can be applied to many other tasks.

The full application for this exercise can be found in examples/hello-numpy-sag, with the client and server components implemented in the nvflare/app-common/np folder of the NVFlare code tree.

Previous Versions of Hello Scatter and Gather