## Intro to the FL Simulator

The [FL Simulator](https://nvflare.readthedocs.io/en/main/user_guide/fl_simulator.html) runs a local simulation of an NVFlare FL deployment. This allows researchers to test and debug an application without provisioning a real, distributed FL project. The FL Simulator runs a server and multiple clients in the same local process, with communication that mimics a real deployment. As a result, researchers can build new components and jobs more quickly, and those components and jobs can be used directly in a production deployment.

### Setup
The NVFlare [Getting Started Guide](https://nvflare.readthedocs.io/en/main/getting_started.html) provides instructions for setting up FLARE on a local system or in a Docker container. For this DLI, we're using the MONAI Toolkit NGC container, which includes NVFlare, MONAI, and all their dependencies. We've also cloned the NVFlare GitHub repository into our top-level working directory; it contains the examples and integrations used throughout the DLI.

### Structure of a FLARE Application - hello-numpy-cross-val

To introduce the FL Simulator, we'll run the `hello-numpy-cross-val` example located in the `NVFlare/examples/tutorial/hello-numpy-cross-val` directory.

Here we can see the basic layout of a FLARE application in the `hello-numpy-cross-val/app` directory. It includes a `config/` directory with the client and server configurations in [`config_fed_client.json`](examples/hello-numpy-cross-val/app/config/config_fed_client.json) and [`config_fed_server.json`](examples/hello-numpy-cross-val/app/config/config_fed_server.json).

Looking first at `config_fed_server.json` below, we can see the components that define the main FLARE workflow. These include built-in modules that manage the model, create the shareable data exchanged between the server and clients, and aggregate training results, as well as the training workflow itself.

The `workflows` section of the config defines the two main server controller workflows and the training task assigned to the clients.

- the `scatter_and_gather` workflow, implemented in [`nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather`](../NVFlare/nvflare/app_common/workflows/scatter_and_gather.py)
- the `cross_site_model_eval` workflow, implemented in [`nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval`](../NVFlare/nvflare/app_common/workflows/cross_site_model_eval.py)
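
The scatter-and-gather pattern can be sketched in a few lines of NumPy. This is a minimal, single-process illustration under simplifying assumptions, not the `ScatterAndGather` implementation: the hypothetical `local_train` function stands in for a client executor, and aggregation is a weighted average in which each client's weight is an illustrative number (for example, a count of local steps), similar in spirit to a weighted aggregator.

```python
import numpy as np


def local_train(global_model: np.ndarray, delta: float) -> np.ndarray:
    """Hypothetical client update: add a constant to every element."""
    return global_model + delta


def weighted_average(results: list[tuple[np.ndarray, float]]) -> np.ndarray:
    """Average client models, weighting each by its (illustrative) weight."""
    total_weight = sum(weight for _, weight in results)
    return sum(model * weight for model, weight in results) / total_weight


def scatter_and_gather(initial_model, client_deltas, client_weights, num_rounds):
    model = initial_model.astype(float)
    for _ in range(num_rounds):
        # Scatter: every client starts from the same global model.
        # Gather: collect each client's updated model together with its weight.
        results = [
            (local_train(model, delta), weight)
            for delta, weight in zip(client_deltas, client_weights)
        ]
        # Aggregate the gathered results into the next global model.
        model = weighted_average(results)
    return model


initial = np.arange(1, 10).reshape(3, 3)
final = scatter_and_gather(initial, client_deltas=[1.0, 3.0], client_weights=[1.0, 1.0], num_rounds=3)
print(final)
```

With equal weights, the two clients' deltas average to 2 per round, so after 3 rounds the global model is `initial + 6`.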


##### [`config_fed_server.json`](examples/hello-numpy-cross-val/app/config/config_fed_server.json):

```json
{
  "format_version": 2,
  "server": {
    "heart_beat_timeout": 600
  },
  "task_data_filters": [],
  "task_result_filters": [],
  "components": [
    {
      "id": "persistor",
      "path": "nvflare.app_common.np.np_model_persistor.NPModelPersistor",
      "args": {}
    },
    {
      "id": "shareable_generator",
      "path": "nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator",
      "args": {}
    },
    {
      "id": "aggregator",
      "path": "nvflare.app_common.aggregators.intime_accumulate_model_aggregator.InTimeAccumulateWeightedAggregator",
      "args": {
        "expected_data_kind": "WEIGHTS"
      }
    },
    {
      "id": "model_locator",
      "path": "nvflare.app_common.np.np_model_locator.NPModelLocator",
      "args": {}
    },
    {
      "id": "formatter",
      "path": "nvflare.app_common.np.np_formatter.NPFormatter",
      "args": {}
    },
    {
      "id": "json_generator",
      "path": "nvflare.app_common.widgets.validation_json_generator.ValidationJsonGenerator",
      "args": {}
    }
  ],
  "workflows": [
    {
      "id": "scatter_and_gather",
      "path": "nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather",
      "args": {
        "min_clients": 2,
        "num_rounds": 3,
        "start_round": 0,
        "wait_time_after_min_received": 10,
        "aggregator_id": "aggregator",
        "persistor_id": "persistor",
        "shareable_generator_id": "shareable_generator",
        "train_task_name": "train",
        "train_timeout": 6000
      }
    },
    {
      "id": "cross_site_model_eval",
      "path": "nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval",
      "args": {
        "model_locator_id": "model_locator",
        "submit_model_timeout": 600,
        "validation_timeout": 6000,
        "cleanup_models": false
      }
    }
  ]
}
```
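The workflow `args` refer to components by their `id` (`aggregator_id`, `persistor_id`, and so on). The standard-library sketch below checks that every `*_id` argument in `workflows` names a component defined in `components`. `SERVER_CONFIG` is a trimmed excerpt of the config above, and `find_dangling_ids` is a hypothetical helper intended only as a reading aid; it is not part of FLARE.

```python
import json

# Trimmed excerpt of config_fed_server.json (only the fields this check needs).
SERVER_CONFIG = json.loads("""
{
  "components": [
    {"id": "persistor"},
    {"id": "shareable_generator"},
    {"id": "aggregator"},
    {"id": "model_locator"},
    {"id": "formatter"},
    {"id": "json_generator"}
  ],
  "workflows": [
    {"id": "scatter_and_gather",
     "args": {"aggregator_id": "aggregator",
              "persistor_id": "persistor",
              "shareable_generator_id": "shareable_generator",
              "train_task_name": "train"}},
    {"id": "cross_site_model_eval",
     "args": {"model_locator_id": "model_locator"}}
  ]
}
""")


def find_dangling_ids(config: dict) -> list[tuple[str, str, str]]:
    """Return (workflow id, arg name, referenced id) for each unresolved reference."""
    component_ids = {component["id"] for component in config.get("components", [])}
    dangling = []
    for workflow in config.get("workflows", []):
        for arg_name, value in workflow.get("args", {}).items():
            # Convention assumed by this sketch: args ending in "_id" name a component.
            if arg_name.endswith("_id") and value not in component_ids:
                dangling.append((workflow["id"], arg_name, value))
    return dangling


print(find_dangling_ids(SERVER_CONFIG))  # [] when every reference resolves
```

Renaming the `aggregator` component without updating `aggregator_id` would make the check report `("scatter_and_gather", "aggregator_id", "aggregator")`.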


In the client configuration, `config_fed_client.json`, we define the task executors. In this case, the `train` and `submit_model` tasks are implemented by the built-in [`nvflare.app_common.np.np_trainer.NPTrainer`](../NVFlare/nvflare/app_common/np/np_trainer.py), and the `validate` task is implemented by [`nvflare.app_common.np.np_validator.NPValidator`](../NVFlare/nvflare/app_common/np/np_validator.py).

##### [`config_fed_client.json`](examples/hello-numpy-cross-val/app/config/config_fed_client.json):
```json
{
  "format_version": 2,
  "executors": [
    {
      "tasks": [
        "train",
        "submit_model"
      ],
      "executor": {
        "path": "nvflare.app_common.np.np_trainer.NPTrainer",
        "args": {}
      }
    },
    {
      "tasks": [
        "validate"
      ],
      "executor": {
        "path": "nvflare.app_common.np.np_validator.NPValidator"
      }
    }
  ],
  "task_result_filters": [],
  "task_data_filters": [],
  "components": []
}
```
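On the client side, each incoming task name is handled by the executor listed for it. The standard-library sketch below shows that lookup using an excerpt of the `executors` section above. `build_task_routes` is a hypothetical helper for illustration, not FLARE's dispatch code, and it treats a task listed under two executors as an error.

```python
# Excerpt of the "executors" section of config_fed_client.json.
EXECUTORS = [
    {"tasks": ["train", "submit_model"],
     "executor": {"path": "nvflare.app_common.np.np_trainer.NPTrainer", "args": {}}},
    {"tasks": ["validate"],
     "executor": {"path": "nvflare.app_common.np.np_validator.NPValidator"}},
]


def build_task_routes(executors: list[dict]) -> dict[str, str]:
    """Map each task name to the class path of the executor that handles it."""
    routes = {}
    for entry in executors:
        for task in entry["tasks"]:
            # This sketch rejects a task that appears under more than one executor.
            if task in routes:
                raise ValueError(f"task {task!r} is assigned to more than one executor")
            routes[task] = entry["executor"]["path"]
    return routes


routes = build_task_routes(EXECUTORS)
print(routes["validate"])  # nvflare.app_common.np.np_validator.NPValidator
```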

The `NPTrainer` and `NPValidator` executor classes implement a very simple NumPy workflow in which the server assigns an initial matrix:
```
    [ 1, 2, 3 ]
    [ 4, 5, 6 ]
    [ 7, 8, 9 ]
```
that is modified by the clients in each round by incrementing matrix elements by 1.
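
As a worked example, assume both clients start each round from the same global matrix, each adds 1 to every element, and the server's new global model is a weighted average of the returned matrices. Averaging identical matrices returns that same matrix, so the global model grows by exactly 1 per round regardless of the weights. A NumPy sketch of the 3 rounds set by `num_rounds` in the server config:

```python
import numpy as np

initial = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
num_rounds = 3        # "num_rounds" in config_fed_server.json
num_clients = 2       # matches "-n 2" used when the simulator is launched below
delta = 1.0           # per-round increment applied by each client (assumed)
weights = [1.0, 1.0]  # illustrative aggregation weights; any positive values give the same result here

model = initial.copy()
for _ in range(num_rounds):
    # Each client returns the global model plus delta.
    client_results = [model + delta for _ in range(num_clients)]
    # Weighted average of identical matrices is that matrix.
    model = sum(w * r for w, r in zip(weights, client_results)) / sum(weights)

print(model)
```

Under these assumptions the printed matrix equals `initial + 3`, with elements running from 4 to 12.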

This can be seen in the `train` function of [`nvflare.app_common.np.np_trainer.NPTrainer`](../NVFlare/nvflare/app_common/np/np_trainer.py), starting at line 80. The client takes the data exchange object (`incoming_dxo`) from the incoming shareable, verifies that its data kind is `WEIGHTS`, and increments the array stored under `NPConstants.NUMPY_KEY` in `np_data` by `self._delta`:
```python
    # Ensure that data is of type weights. Extract model data.
    if incoming_dxo.data_kind != DataKind.WEIGHTS:
        self.system_panic("Model DXO should be of kind DataKind.WEIGHTS.", fl_ctx)
        return make_reply(ReturnCode.BAD_TASK_DATA)
    np_data = incoming_dxo.data

    # Display properties.
    self.log_info(fl_ctx, f"Incoming data kind: {incoming_dxo.data_kind}")
    self.log_info(fl_ctx, f"Model: \n{np_data}")
    self.log_info(fl_ctx, f"Current Round: {current_round}")
    self.log_info(fl_ctx, f"Total Rounds: {total_rounds}")
    self.log_info(fl_ctx, f"Client identity: {fl_ctx.get_identity_name()}")

    # Check abort signal
    if abort_signal.triggered:
        return make_reply(ReturnCode.TASK_ABORTED)

    # Doing some dummy training.
    if np_data:
        if NPConstants.NUMPY_KEY in np_data:
            np_data[NPConstants.NUMPY_KEY] += self._delta
        else:
            self.log_error(fl_ctx, "numpy_key not found in model.")
            return make_reply(ReturnCode.BAD_TASK_DATA)
    else:
        self.log_error(fl_ctx, "No model weights found in shareable.")
        return make_reply(ReturnCode.BAD_TASK_DATA)
```
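The checks in `train` can be reproduced outside FLARE to see which inputs are accepted. In this minimal sketch, plain strings stand in for FLARE's `DataKind` and `ReturnCode` values and a dict stands in for the DXO payload; the key value `"numpy_key"` and the function name `dummy_train_step` are assumptions for illustration. Unlike the excerpt, the sketch returns a new dict instead of updating the array in place, and it does not call `system_panic`.

```python
import numpy as np

NUMPY_KEY = "numpy_key"  # assumed stand-in for NPConstants.NUMPY_KEY


def dummy_train_step(data_kind: str, data: dict, delta: float = 1.0) -> tuple[str, dict]:
    """Mirror the trainer's checks; return (status, data) with strings standing in for ReturnCode."""
    # Only weight payloads are accepted.
    if data_kind != "WEIGHTS":
        return "BAD_TASK_DATA", data
    # An empty payload means no model weights were sent.
    if not data:
        return "BAD_TASK_DATA", data
    # The model must be stored under the expected key.
    if NUMPY_KEY not in data:
        return "BAD_TASK_DATA", data
    # "Training": add delta to every element, leaving the input dict untouched.
    updated = dict(data)
    updated[NUMPY_KEY] = data[NUMPY_KEY] + delta
    return "OK", updated


status, result = dummy_train_step("WEIGHTS", {NUMPY_KEY: np.arange(1, 10).reshape(3, 3)})
print(status)
print(result[NUMPY_KEY])
```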

The [`nvflare.app_common.np.np_validator.NPValidator`](../NVFlare/nvflare/app_common/np/np_validator.py) completes the workflow with a simple example "validation": it extracts the `np_data` matrix from the incoming shareable and DXO, computes `np.sum(np_data / np.max(np_data))`, and adds a random epsilon.


```python
    # Do some dummy validation.
    random_epsilon = np.random.random()
    self.log_info(fl_ctx, f"Adding random epsilon {random_epsilon} in validation.")
    val_results = {}
    np_data = model[NPConstants.NUMPY_KEY]
    np_data = np.sum(np_data / np.max(np_data))
    val_results["accuracy"] = np_data + random_epsilon
```
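The deterministic part of this metric can be worked out by hand. Dividing by the largest element and summing gives 45 / 9 = 5 for the initial matrix, and 72 / 12 = 6 for the matrix after three +1 rounds (assuming the round-by-round behavior described earlier). The random epsilon, drawn from [0, 1), only shifts the reported "accuracy" within one unit above that value. A NumPy sketch of the same formula; `dummy_accuracy` is a hypothetical name:

```python
import numpy as np


def dummy_accuracy(np_data: np.ndarray) -> float:
    """Same formula as the validator excerpt: normalized sum plus a random epsilon."""
    random_epsilon = np.random.random()  # uniform in [0, 1)
    return float(np.sum(np_data / np.max(np_data)) + random_epsilon)


initial = np.arange(1, 10, dtype=float).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
after_three_rounds = initial + 3                        # assumes +1 per round for 3 rounds

# Deterministic part of the metric, before the epsilon is added.
print(np.sum(initial / np.max(initial)))                        # 45 / 9, about 5.0
print(np.sum(after_three_rounds / np.max(after_three_rounds)))  # 72 / 12, about 6.0

print(dummy_accuracy(after_three_rounds))  # roughly in [6.0, 7.0)
```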
While this isn't a particularly useful workflow, it serves to illustrate the relationship between the server controllers and client executors that define the overall federated workflow.

### Running the FL Simulator

FL Simulator usage can be displayed with the NVFlare CLI: `nvflare simulator -h`

```
!nvflare simulator -h
```

The two key arguments here are `-w WORKSPACE` and the positional `job_folder` argument. For this example, we'll create a test workspace for the `hello-numpy-cross-val` app and use the `examples/hello-numpy-cross-val/jobs/hello-numpy-cross-val` job folder.

We also specify the number of clients with the `-n N_CLIENTS` argument, the number of threads `-t THREADS` over which to run the clients, and a GPU device to use.  Setting `-n 2 -t 2 -gpu 0`, we will run two clients in parallel, both using GPU device 0.
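
As a rough mental model of `-n` versus `-t`, the standard-library sketch below runs N simulated clients on a pool of T worker threads. It is only an analogy under our own assumptions, not how the FL Simulator schedules clients internally; `run_client` is a hypothetical stand-in for one client's work in a round, and the `site-1`, `site-2`, ... names are assumed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_client(client_name: str) -> tuple[str, str]:
    """Hypothetical per-client work; reports which worker thread ran it."""
    return client_name, threading.current_thread().name


def simulate_round(n_clients: int, n_threads: int) -> dict[str, str]:
    # Assumed client naming pattern: site-1, site-2, ...
    client_names = [f"site-{i}" for i in range(1, n_clients + 1)]
    # With n_threads < n_clients, some threads run several clients one after another.
    with ThreadPoolExecutor(max_workers=n_threads, thread_name_prefix="worker") as pool:
        return dict(pool.map(run_client, client_names))


print(simulate_round(n_clients=2, n_threads=2))
```

The printed thread names vary from run to run; the point is that `n_threads` caps how many clients can be in progress at the same time.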

```
!mkdir hello-numpy-cross-val-workspace
!nvflare simulator -w hello-numpy-cross-val-workspace -n 2 -t 2 -gpu 0 examples/hello-numpy-cross-val/jobs/hello-numpy-cross-val
```

Watch the output above for the server messages signaling that the run has completed:
```
    SimulatorServer - INFO - shutting down server
    SimulatorServer - INFO - canceling sync locks
    SimulatorServer - INFO - server off
```


We can then check the contents of the `hello-numpy-cross-val-workspace` directory to see the job output.

```
!tree hello-numpy-cross-val-workspace
```

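```python
import json

# Load and pretty-print the cross-site validation results written by the simulator.
# The context manager closes the file once it has been read.
with open("hello-numpy-cross-val-workspace/simulate_job/cross_site_val/cross_val_results.json") as cross_val_file:
    cross_val_json = json.load(cross_val_file)
print(json.dumps(cross_val_json, indent=2))
```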
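
The results can also be flattened into a small table. This sketch assumes `cross_val_results.json` maps each validating client to the models it evaluated, and each of those to a dict of metrics (here the `accuracy` key produced by `NPValidator`); compare that assumption against the printed JSON before relying on it. `to_rows` is a hypothetical helper, and `SAMPLE` holds made-up placeholder values in the assumed shape, not results from the run.

```python
def to_rows(results: dict, metric: str = "accuracy") -> list[tuple[str, str, float]]:
    """Flatten {validator: {model_owner: {metric: value}}} into sorted rows."""
    rows = []
    for validator, evaluations in results.items():
        for model_owner, metrics in evaluations.items():
            # Skip entries that are not metric dicts or lack the requested metric.
            if isinstance(metrics, dict) and metric in metrics:
                rows.append((validator, model_owner, metrics[metric]))
    return sorted(rows)


# Placeholder data in the assumed shape (values are made up for illustration).
SAMPLE = {
    "site-1": {"site-1": {"accuracy": 6.1}, "site-2": {"accuracy": 6.4}},
    "site-2": {"site-1": {"accuracy": 6.7}, "site-2": {"accuracy": 6.2}},
}

for validator, model_owner, value in to_rows(SAMPLE):
    print(f"{validator:>8} validated {model_owner:<8} accuracy={value:.3f}")
```

To apply it to the actual run, pass the dictionary loaded in the previous cell: `to_rows(cross_val_json)`.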