HELLO-PT-TB COMPONENTS #1271

FabioNotaro2001 · 2023-01-28T14:41:25Z

Hi, I have many questions about the hello-pt-tb components and terms, so I'd like a general overview of the terms I find difficult to understand. I tried to find specific information in the documentation, but in many cases these terms aren't defined or described, so I'd like a general explanation here.

You don't need to describe every term in depth; I only need a general description of what each term refers to, what its main features are, and why we need it.

In particular I'd like to receive an explanation for the following terms:

Overseer (I'd define this as an observer, a sort of node role between Admin and Server, but I cannot find the reason for its presence)
POC (Proof Of Concept)
LearnerExecutor class
Persistor
Shareable Generator
Aggregator
Model Locator
JSON Generator
TB Analytic Receiver
provision

I know there are many terms, so if you prefer you can share links to the documentation where these terms are explained, so that I can try to understand these components on my own.

In any case, thank you for your support. I look forward to your answers!

Answered by chesterxgchen

chesterxgchen · 2023-01-29T02:20:58Z

Maintainer

Thanks for your interest in NVFLARE and for your questions. Seeing that you have so many questions, I am starting to wonder whether our documentation is failing to serve your needs somehow. Please let us know what we can improve in the documentation.

Let me try to explain some of them.

Overseer : https://nvflare.readthedocs.io/en/2.2.1/programming_guide/system_architecture.html

Overseer -- The Overseer is a subsystem that monitors the FL servers. In an HA deployment there are at least two FL servers, and the Overseer tells the FLARE Console (admin client) which FL server to connect to. If one FL server dies, the Overseer automatically points the client to the second FL server. In a way, the Overseer is the broker that performs leader selection. The Overseer only applies to High-Availability (HA) deployments; for non-HA, you don't need an Overseer, or you can simply use DummyOverseer.
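To make the failover idea concrete, here is a minimal sketch in plain Python. It is not the NVFLARE Overseer: the class name `ToyOverseer`, the heartbeat mechanism, and the timeout value are all assumptions made for illustration. It only shows the general pattern of "track which servers are alive and tell callers which one to use".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyOverseer:
    """Hypothetical stand-in for an overseer; not the NVFLARE class."""

    heartbeat_timeout: float = 10.0                 # assumed value, in seconds
    last_seen: dict = field(default_factory=dict)   # server name -> last heartbeat time
    primary: Optional[str] = None

    def heartbeat(self, server: str, now: float) -> None:
        # Each FL server reports in periodically; the first one seen becomes primary.
        self.last_seen[server] = now
        if self.primary is None:
            self.primary = server

    def current_server(self, now: float) -> Optional[str]:
        # Servers whose last heartbeat is recent enough are considered alive.
        alive = [s for s, t in self.last_seen.items() if now - t <= self.heartbeat_timeout]
        # If the primary stopped reporting, fail over to another live server.
        if self.primary not in alive:
            self.primary = alive[0] if alive else None
        return self.primary


overseer = ToyOverseer(heartbeat_timeout=10.0)
overseer.heartbeat("server1", now=0.0)
overseer.heartbeat("server2", now=1.0)
first = overseer.current_server(now=5.0)

# server1 goes silent; only server2 keeps sending heartbeats.
overseer.heartbeat("server2", now=12.0)
second = overseer.current_server(now=15.0)
```

In this toy run, `first` is `"server1"` and `second` is `"server2"`, which mirrors the "point the client to the second FL server" behaviour described above.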

POC mode -- This mode simulates a real deployment. In this case there is no Overseer (non-HA) and security is not enabled. In most cases everything runs on a single machine: you can have one FL server, multiple clients, and the FLARE Console to simulate the real deployment.

Provision -- Provisioning refers to the process of generating a software package, known as a startup kit in NVFLARE. In a federated setting, the clients and the server may be located in different places, and they need to trust each other in order to communicate. The provisioning process generates the SSL certificates used to authenticate the clients; these certificates and the other required software packages are part of the startup kit, which can be distributed to the different organizations before the FL process starts.

Executor and LearnerExecutor

This assumes you already understand the controller and executor concepts. The controller is the workflow coordinator on the FL server side, whereas the executor is the component that executes the tasks received from the controller.

The LearnerExecutor is a special type of Executor that delegates the actual training work to a Learner.

The benefit of this delegation is that the constructs specific to NVFLARE communication, error-code handling, etc. are hidden in the LearnerExecutor, while the end user focuses only on the training and validation parts of FL.

Before LearnerExecutor, the communication is like this

Server ==> Client + User Facing
Controller ==> Executor

After LearnerExecutor

Server ==> Client ==> User Facing
Controller ==> LearnerExecutor ==> Learner
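The delegation in the second diagram can be sketched as follows. This is a simplified illustration, not NVFLARE code: `ToyLearner`, `ToyLearnerExecutor`, the task name `"train"`, and the status strings are hypothetical names chosen for the example.

```python
class ToyLearner:
    """User-facing code: only the training logic (hypothetical interface)."""

    def train(self, weights: dict) -> dict:
        # Pretend one local training step nudges every weight by 0.1.
        return {name: value + 0.1 for name, value in weights.items()}


class ToyLearnerExecutor:
    """Framework-facing code: unpacks the task, handles errors, delegates to the learner."""

    def __init__(self, learner):
        self.learner = learner

    def execute(self, task_name: str, payload: dict) -> dict:
        # Task routing and error reporting live here, not in the learner.
        if task_name != "train":
            return {"status": "TASK_UNKNOWN"}  # hypothetical status string
        try:
            new_weights = self.learner.train(payload["weights"])
        except Exception as exc:
            return {"status": "EXECUTION_EXCEPTION", "error": str(exc)}
        return {"status": "OK", "weights": new_weights}


executor = ToyLearnerExecutor(ToyLearner())
reply = executor.execute("train", {"weights": {"w": 1.0}})
unknown = executor.execute("dance", {})
```

The point of the split is that `ToyLearner` contains no communication or error-handling code at all, so a user could swap in a different learner without touching the executor.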

TB Analytic Receiver

This stands for "TensorBoard Analytics Receiver". It is part of ML experiment tracking. In a federated setting, unlike traditional ML experiment tracking, there are two ways of tracking ML metrics: "client-side" and "server-side". NVFLARE implements server-side ML experiment tracking, with TensorBoard as the tracking tool. With this approach, the client side merely collects the logs, while the FL server holds the TensorBoard SummaryWriter that writes to TensorBoard. The TB Analytics Receiver is the component that receives the logs from the different clients and then writes them to TensorBoard. If you would like to learn more, check out the unreleased PR for expanding ML tracking support, where we add an MLflow receiver and a Weights & Biases receiver (in an example).
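As a rough picture of server-side tracking, the sketch below keeps metrics per client in memory instead of writing to TensorBoard. Everything here is an assumption for illustration: `ToyAnalyticsReceiver` and `client_log` are hypothetical names, and a real receiver would hand the values to a TensorBoard writer rather than store them in a dictionary.

```python
from collections import defaultdict


class ToyAnalyticsReceiver:
    """Hypothetical server-side receiver; a real one would write to TensorBoard."""

    def __init__(self):
        # site name -> tag -> list of (step, value)
        self.records = defaultdict(lambda: defaultdict(list))

    def receive(self, site: str, tag: str, value: float, step: int) -> None:
        # Called on the server for every metric a client forwards.
        self.records[site][tag].append((step, value))

    def series(self, site: str, tag: str) -> list:
        # Return one client's metric curve ordered by step.
        return sorted(self.records[site][tag])


def client_log(receiver, site, tag, value, step):
    """Client side only forwards the metric; it never touches the tracking tool."""
    receiver.receive(site, tag, value, step)


receiver = ToyAnalyticsReceiver()
client_log(receiver, "site-1", "train_loss", 0.9, step=0)
client_log(receiver, "site-2", "train_loss", 1.1, step=0)
client_log(receiver, "site-1", "train_loss", 0.7, step=1)

site1_curve = receiver.series("site-1", "train_loss")
```

Keeping one series per site is what lets a single dashboard on the server show each client's curve side by side.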

Persistor -- I'm not sure which Persistor you are referring to, as there are many persistors. In general, the Persistor concept refers to an object that helps save things (model, state, etc.). You need a Persistor to abstract the location and the method used to perform the save. For example, a LocalFilePersistor could save something to a local file, while an S3Persistor means you would implement the method (via AWS S3 APIs or equivalent APIs) to save things to S3 buckets. You can implement your own custom persistor depending on your needs.
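The abstraction can be sketched like this. The interface below (`save`/`load` on a `ToyPersistor` base class) is an assumption for illustration and is not the NVFLARE persistor API; the JSON file format is likewise just a convenient choice for the example.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ToyPersistor(ABC):
    """Hypothetical interface: callers save/load without knowing where data lives."""

    @abstractmethod
    def save(self, model: dict) -> None: ...

    @abstractmethod
    def load(self) -> dict: ...


class ToyLocalFilePersistor(ToyPersistor):
    """Stores the model as JSON on the local file system."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, model: dict) -> None:
        self.path.write_text(json.dumps(model))

    def load(self) -> dict:
        return json.loads(self.path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    persistor = ToyLocalFilePersistor(Path(tmp) / "model.json")
    persistor.save({"w": [0.1, 0.2], "round": 3})
    restored = persistor.load()
```

An S3-backed variant would subclass `ToyPersistor` in the same way and replace the two file operations with calls to an object-storage client, leaving the calling code unchanged.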

Aggregator -- https://nvflare.readthedocs.io/en/2.2.1/programming_guide/controllers/scatter_and_gather_workflow.html#aggregator
An Aggregator, by definition, is an FL server-side component used to "accept" the clients' contributions and perform the aggregation.

The server side has a "controller". Some controller workflows can be reused, so you don't need to start from scratch. One such controller is the "scatter and gather" controller. With it, you don't need to worry about writing the controller, but you do need to know how to use the weights collected from the training process on each client. The Aggregator plays that role: all you need to do is implement the aggregator instead of the controller, accepting each client's contribution and performing the aggregation.
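A minimal sketch of the accept-then-aggregate pattern follows. The sample-weighted average used here is an assumed aggregation rule picked for illustration (the thread does not say which rule is used), and `ToyWeightedAggregator` with its method signatures is hypothetical rather than the NVFLARE class.

```python
class ToyWeightedAggregator:
    """Hypothetical aggregator: collects client weights, then averages them."""

    def __init__(self):
        self.contributions = []  # list of (weights dict, number of training samples)

    def accept(self, weights: dict, num_samples: int) -> bool:
        # Reject contributions that carry no training samples.
        if num_samples <= 0:
            return False
        self.contributions.append((weights, num_samples))
        return True

    def aggregate(self) -> dict:
        if not self.contributions:
            raise ValueError("no contributions accepted")
        total = sum(n for _, n in self.contributions)
        keys = self.contributions[0][0].keys()
        # Average each parameter, weighting every client by its sample count.
        result = {k: sum(w[k] * n for w, n in self.contributions) / total for k in keys}
        self.contributions.clear()  # start fresh for the next round
        return result


aggregator = ToyWeightedAggregator()
accepted_a = aggregator.accept({"w": 1.0}, num_samples=100)
accepted_b = aggregator.accept({"w": 3.0}, num_samples=300)
rejected = aggregator.accept({"w": 9.0}, num_samples=0)
global_weights = aggregator.aggregate()
```

Here the second client has three times as much data, so the result `{"w": 2.5}` sits closer to its value than a plain mean would.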

Shareable Generator -- A component that converts between Shareable objects and model objects. I may be able to find a better explanation later.
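One way to picture this conversion is a pair of functions that wrap a model into a message for the clients and unwrap a reply back into a model. This is only a sketch under assumptions: the function names and the `header`/`data` layout are hypothetical and do not reflect the real Shareable format.

```python
def model_to_shareable(model: dict, round_number: int) -> dict:
    """Wrap model weights in a message that can be sent to clients (assumed layout)."""
    return {"header": {"round": round_number}, "data": dict(model)}


def shareable_to_model(shareable: dict, base_model: dict) -> dict:
    """Apply the weights carried by a shareable on top of the current model."""
    updated = dict(base_model)
    updated.update(shareable["data"])
    return updated


global_model = {"w": 0.5, "b": 0.0}
outgoing = model_to_shareable(global_model, round_number=1)

# Pretend the aggregated reply only changed "w".
incoming = {"header": {"round": 1}, "data": {"w": 0.7}}
new_global_model = shareable_to_model(incoming, global_model)
```

The workflow code then only ever handles messages, while the persistor and the training code only ever handle models.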

Model Locator -- The ModelLocator's job is to find the models, located on the server, that are to be included in cross-site evaluation. https://nvflare.readthedocs.io/en/2.2.1/apidocs/nvflare.app_common.np.np_model_locator.html#nvflare.app_common.np.np_model_locator.NPModelLocator.locate_model

6 replies

chesterxgchen Jan 29, 2023
Maintainer

@FabioNotaro2001 thank you for the feedback. Your impression is likely shared by many others, so we need to hear it. Please do me a favor and do a few things for me, if you don't mind:

"... I find it excessively verbose in some sections." -- Can you create a GitHub issue and point out the sections, so we can change them to benefit everyone?
"... I personally found is that it is sometimes difficult to follow," -- Can you elaborate on this point, so we can improve?
"... , perhaps even think of the presence of things that were in old versions and that have been removed but that remain in the guide (such as folder sharing zipped by password, this is the first example that comes to my mind). " -- Please raise a GitHub issue or, even better, provide a pull request.

Since this is an open source project, we love community involvement: asking questions like you did, raising GitHub issues, or opening pull requests.

Thank you for your feedback. If you see any other issues in either the software or the documentation, please let us know. Your current question/discussion has already made me think about how we need to change the documentation.

FabioNotaro2001 Jan 29, 2023
Author

Thank you for your support, I'll do what you asked me without any problem! Always happy to help!

In particular, regarding the second point you asked about, I meant that the guide is a bit tortuous, in the sense that before reaching a goal it often gives you a lot of information that is useful but, in my opinion, of secondary importance.
Furthermore, I have often found slight discrepancies between what is said in the complete documentation and what is present in the guides on GitHub.

So I simply meant that the guide can be difficult to follow in some sections, not because of its form but because of the amount of content, which in my opinion could be covered separately as optional material instead of being mixed in with running a job or explaining the deployment of an example application.

I repeat, I want to emphasize that this is my opinion and that these small defects do not compromise the commendable work you have done and are doing with this guide, which in fact achieves very well the purpose for which it was designed: to provide a reference point for learning how to use this framework.

I'd also like to take this opportunity to ask how you would define JSON Generator, since it was left out of the previous answer.

Thank you very much, as always, for your support!

chesterxgchen Jan 29, 2023
Maintainer

That is good feedback; we will review the documents and make adjustments accordingly.

Regarding the definition of JSON Generator, I actually don't know myself. I'm not sure where you encountered this concept. I will ask other contributors to help answer this one.

FabioNotaro2001 Jan 30, 2023
Author

I encountered this term in the file config_fed_server.json. I hope this is useful.

YuanTingHsieh Jan 30, 2023
Maintainer

@FabioNotaro2001 please check my reply below, thanks.

chesterxgchen · 2023-01-29T16:43:50Z

Maintainer

@nvkevlu @YuanTingHsieh please follow up on these threads over the next weeks while I am offline. We can discuss the needed changes when I come back.

0 replies

YuanTingHsieh · 2023-01-30T21:17:01Z

Maintainer

@FabioNotaro2001 thanks for your interest and feedback!
@chesterxgchen thanks for tagging me.

As mentioned, the only drawback that I personally found is that it is sometimes difficult to follow, perhaps even think of the presence of things that were in old versions and that have been removed but that remain in the guide (such as folder sharing zipped by password, this is the first example that comes to my mind).

I already raised an issue for this point here: #1129
The docs will be updated in the next release.

I see Chester has already answered most of your questions.
As for the "json generator", I think you are referring to https://github.com/NVIDIA/NVFlare/blob/dev/examples/hello-pt-tb/app/config/config_fed_server.json#L38-L42

Its purpose, as written in the docstring, is: "Catches VALIDATION_RESULT_RECEIVED event and generates a results.json containing accuracy of each validated model." See https://github.com/NVIDIA/NVFlare/blob/dev/nvflare/app_common/widgets/validation_json_generator.py#L28-L29

This means that the server side stores a file called "results.json" that contains the results of cross-site validation.
The file content may look like this:

{"server": {"client_1": 0.95, "client_2": 0.93}, "client_1": {"client_2": 0.91}, .... etc }

This means that the "server" site's model, evaluated on client_1's data, gets a metric of 0.95.
The "server" site's model, evaluated on client_2's data, gets a metric of 0.93.
The "client_1" site's model, evaluated on client_2's data, gets a metric of 0.91.
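To read such a file programmatically, a short sketch like the one below could be used. It assumes the nesting described above (outer key is the site that owns the model, inner key is the site whose data was used) and uses only the three explicit entries from the example; the variable names are illustrative.

```python
import json

# Same structure as the example above, limited to the entries that were spelled out.
results_text = '{"server": {"client_1": 0.95, "client_2": 0.93}, "client_1": {"client_2": 0.91}}'
results = json.loads(results_text)

# Flatten into (model owner, data site, metric) rows.
rows = []
for model_owner, per_site in results.items():
    for data_site, metric in per_site.items():
        rows.append((model_owner, data_site, metric))

# Pick the model/data pair with the highest metric.
best = max(rows, key=lambda row: row[2])

for model_owner, data_site, metric in rows:
    print(f"{model_owner} model on {data_site} data: {metric}")
```

With a real results.json, `results_text` would be replaced by the contents of the file written on the server.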

I hope this explains it.

We will enhance the cross-site validation related code and scripts in the future.
I will make sure to update the docstrings as well.

0 replies
