HELLO-PT-TB COMPONENTS #1271
-
Hi, I have many questions about the hello-pt-tb components and terms, so I'd like a general overview of the many terms I find difficult to understand. I tried to find specific information in the documentation, but in many cases these terms aren't specified or described, so I'd like a general explanation here. You don't need to describe every single term in depth; I only need a general description of what each term refers to, what its main features are, and why we need it. In particular, I'd like an explanation of the following terms:
I know there are many terms, so if you prefer, you can share links to the documentation where these terms are explained, so that I can try to understand these components on my own. In any case, thank you for your support; I'm looking forward to your answers!
-
Thanks for your interest in NVFLARE and for your questions. Seeing so many questions, I am starting to wonder whether our documentation is serving your needs. Please let us know what we can improve in the documentation.
Let me try to explain some of them:
Overseer: https://nvflare.readthedocs.io/en/2.2.1/programming_guide/system_architecture.html
Overseer -- The Overseer is a subsystem that monitors the FL servers. In a High-Availability (HA) deployment there are at least two FL servers, and the Overseer tells the FLARE Console (admin client) which FL server to connect to. If one FL server dies, the Overseer automatically points the clients to the second FL server. In a sense, the Overseer is the broker that performs leader selection. The Overseer only applies to HA; for non-HA deployments you don't need an Overseer, or you can simply use the DummyOverseer.
POC mode -- This mode simulates a real deployment. In POC mode there is no Overseer (non-HA) and security is disabled. In most cases everything runs on a single machine: you run an FL server, multiple clients, and the FLARE Console to simulate the real deployment.
Provision -- Provisioning is the process of generating a software package, known as a "startup kit" in NVFLARE. In a federated setting, the clients and the server may be located at different sites, and they need to trust each other in order to communicate. The provisioning process generates the SSL certificates used to authenticate the clients; these certificates and the other required software packages become part of the startup kits, which are distributed to the different organizations before the FL process can start.
Executor and LearnerExecutor -- I assume you already understand the controller and executor concepts. The controller is the workflow coordinator on the FL server side, while the executor is the component on the client side that executes the tasks received from the controller.
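The controller/executor split described above can be sketched as follows. This is a minimal, self-contained illustration assuming a single "train" task and model weights reduced to a single number; Task, ToyExecutor, ToyController and local_train are hypothetical names, not NVFLARE classes, and the real NVFLARE Controller and Executor interfaces have different, richer signatures (see the API docs).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-in, NOT the NVFLARE API: a task is a name plus a payload.
@dataclass
class Task:
    name: str
    data: dict


class ToyExecutor:
    """Client side: executes whichever task the controller sends."""

    def __init__(self, site_name: str, handlers: Dict[str, Callable[[dict], dict]]):
        self.site_name = site_name
        self.handlers = handlers  # task name -> function that does the actual work

    def execute(self, task: Task) -> dict:
        handler = self.handlers.get(task.name)
        if handler is None:
            # Report unknown tasks instead of crashing the client.
            return {"error": f"unknown task {task.name}"}
        return handler(task.data)


class ToyController:
    """Server side: coordinates the workflow by sending tasks and collecting results."""

    def __init__(self, executors: List[ToyExecutor]):
        self.executors = executors

    def broadcast(self, task: Task) -> Dict[str, dict]:
        # Send the same task to every client and gather the results keyed by site name.
        return {ex.site_name: ex.execute(task) for ex in self.executors}


def local_train(data: dict) -> dict:
    # Stand-in for local training: just increments the received weight.
    return {"w": data["w"] + 1.0}


clients = [ToyExecutor(f"site-{i}", {"train": local_train}) for i in (1, 2)]
controller = ToyController(clients)
results = controller.broadcast(Task("train", {"w": 0.0}))
print(results)  # {'site-1': {'w': 1.0}, 'site-2': {'w': 1.0}}
```

The controller never needs to know how a client trains; it only sends a task name and a payload and collects whatever each executor returns.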
The LearnerExecutor is a special type of Executor that delegates the actual training work to a Learner. The benefit of this delegation is that the LearnerExecutor hides the constructs specific to NVFLARE communication (error-code handling, etc.), letting end users focus only on the training and validation parts of FL.
Before LearnerExecutor, the flow looked like this:
Server ==> Client + User Facing
After LearnerExecutor:
Server ==> Client ==> User Facing
TB Analytics Receiver -- This stands for "TensorBoard Analytics Receiver" and is part of ML experiment tracking. In federated settings, unlike traditional ML experiment tracking, there are two ways to track ML metrics: "client-side" and "server-side". NVFLARE implements server-side ML experiment tracking, with TensorBoard as the tracking tool. The clients merely collect the logs, while the FL server holds the TensorBoard SummaryWriter that writes them to TensorBoard. The TB Analytics Receiver is the component that receives the logs from the different clients and writes them to TensorBoard. If you'd like to learn more, check out the not-yet-released PR for expanding ML tracking support, where we add an MLflow Receiver and a Weights & Biases Receiver (in an example).
Persistor -- I'm not sure which Persistor you are referring to, as there are many persistors. In general, a Persistor is an object that helps save things (model, state, etc.). The Persistor abstracts the location and the method used to perform the save. For example, a LocalFilePersistor could save something to a local file, while an S3Persistor would implement saving to S3 buckets via the AWS S3 APIs (or equivalent APIs). You can implement your own custom persistor depending on your needs.
Aggregator -- https://nvflare.readthedocs.io/en/2.2.1/programming_guide/controllers/scatter_and_gather_workflow.html#aggregator
The server side has a "controller".
Some controller workflows can be reused, so you don't need to start from scratch. One such controller is the "Scatter and Gather" controller. With it, you don't need to worry about writing the controller, but you do need to decide how to use the weights collected from the training process on each client. That is the Aggregator's role: instead of implementing a controller, you only implement the aggregator, which accepts each client's contribution and performs the aggregation.
Shareable Generator -- A component that converts between Shareable objects and model objects. I can look for a better explanation later.
Model Locator -- The ModelLocator's job is to find the models located on the server that should be included in cross-site evaluation. https://nvflare.readthedocs.io/en/2.2.1/apidocs/nvflare.app_common.np.np_model_locator.html#nvflare.app_common.np.np_model_locator.NPModelLocator.locate_model
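To make the Aggregator description above concrete, here is a minimal sketch assuming a FedAvg-style rule (each client's weights averaged in proportion to its number of local training samples) and assuming the model weights are flat lists of floats. ToyWeightedAggregator and its accept()/aggregate() methods are hypothetical: they only loosely mirror the "accept each contribution, then aggregate" split described above and are not NVFLARE's Aggregator API.

```python
from typing import Dict, List, Tuple


class ToyWeightedAggregator:
    """Hypothetical aggregator: collects client weights, then averages them by sample count."""

    def __init__(self) -> None:
        # site name -> (weights, number of local training samples)
        self._contributions: Dict[str, Tuple[List[float], int]] = {}

    def accept(self, site_name: str, weights: List[float], num_samples: int) -> bool:
        """Store one client's contribution; reject duplicates and empty updates."""
        if site_name in self._contributions or num_samples <= 0:
            return False
        self._contributions[site_name] = (weights, num_samples)
        return True

    def aggregate(self) -> List[float]:
        """Return the sample-weighted average of all accepted weights, then reset."""
        if not self._contributions:
            raise ValueError("no contributions accepted")
        total = sum(n for _, n in self._contributions.values())
        size = len(next(iter(self._contributions.values()))[0])
        result = [0.0] * size
        for weights, n in self._contributions.values():
            if len(weights) != size:
                raise ValueError("all clients must send weights of the same length")
            for i, w in enumerate(weights):
                result[i] += w * n / total
        self._contributions.clear()  # start the next round with a clean slate
        return result


agg = ToyWeightedAggregator()
agg.accept("site-1", [1.0, 2.0], 100)
agg.accept("site-2", [3.0, 4.0], 300)
agg.accept("site-1", [9.0, 9.0], 50)  # rejected: site-1 already contributed this round
print(agg.aggregate())  # [2.5, 3.5]
```

Because each client's update is scaled by its share of the total samples, site-2 (300 of 400 samples) dominates the result here; a plain unweighted mean would give [2.0, 3.0] instead.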
-
@nvkevlu @YuanTingHsieh please follow up on this thread over the next few weeks while I am offline; we can discuss the needed changes when I come back.
-
@FabioNotaro2001 thanks for your interest and feedback!
I have already raised an issue for this point: #1129. I see Chester has already answered most of your questions. Its purpose is to "Catches VALIDATION_RESULT_RECEIVED event and generates a results.json containing accuracy of each validated model.", as written in the docstring here: https://github.com/NVIDIA/NVFlare/blob/dev/nvflare/app_common/widgets/validation_json_generator.py#L28-L29 This means that the server side stores a file called "results.json" that contains the results of cross-site validation.
It also means that the "server" site's model, when evaluated on client_1's data, gets a metric of 0.95. I hope this explains it. We will improve the cross-site validation code and scripts in the future.
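To illustrate how such a results file can be read as a "model owner × data site" table, here is a minimal sketch. It assumes a nested {data_site: {model_owner: {"metric": value}}} layout chosen purely for illustration; the actual structure NVFLARE writes to results.json may differ, so check the file produced by your own run. build_results() and describe() are hypothetical helpers, and the only value used is the 0.95 example from this thread.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical record: (owner of the evaluated model, site whose data was used, metric).
Record = Tuple[str, str, float]


def build_results(records: List[Record]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Nest metrics as {data_site: {model_owner: {"metric": value}}} (assumed layout)."""
    results: Dict[str, Dict[str, Dict[str, float]]] = {}
    for model_owner, data_site, metric in records:
        results.setdefault(data_site, {})[model_owner] = {"metric": metric}
    return results


def describe(results: Dict[str, Dict[str, Dict[str, float]]]) -> List[str]:
    """Turn the nested results into one readable sentence per (model, data site) pair."""
    lines = []
    for data_site in sorted(results):
        for model_owner in sorted(results[data_site]):
            value = results[data_site][model_owner]["metric"]
            lines.append(f"{model_owner}'s model on {data_site}'s data: {value}")
    return lines


# Round-trip through JSON text, as if the server had written results.json to disk.
text = json.dumps(build_results([("server", "client_1", 0.95)]), indent=2)
for line in describe(json.loads(text)):
    print(line)  # server's model on client_1's data: 0.95
```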