# NVFLARE Job Recipe

This tutorial covers how to use Job Recipes in NVFlare to simplify federated learning job creation and execution. Job Recipes provide a simplified abstraction that hides the complexity of low-level job configurations while exposing only the key arguments users should care about. 

<div class="alert alert-block alert-info">
<b>Note:</b> This is a technical preview; not all algorithms are currently implemented as recipes.
</div>
    
## Setup

The NVFlare [Quickstart Guide](https://nvflare.readthedocs.io/en/main/quickstart.html#installation) provides instructions for setting up FLARE on a local system or in a Docker image. This tutorial also assumes that the NVFlare GitHub repository has been cloned into the top-level working directory.

## Motivation for Using JobRecipe

The **Job API** provides a powerful and flexible way to define FLARE FL workflows and configurations in Python without manually editing configuration files. Although the API is simpler than the previous process, it is still not simple enough. New users and data scientists working with standard pipelines should not have to climb the learning curve of detailed concepts such as controllers, executors, and workflows, or learn how to wire them together.

To address this, NVFlare introduces the concept of **Job Recipes**. A `JobRecipe` is a simplified abstraction designed to provide a high-level API with:

* **Only the key arguments** a data scientist should care about, such as number of clients, number of rounds, training scripts, and model definition.
* **Consistent entry points** for common federated learning patterns such as **FedAvg** and **Cyclic Training**.
* **Execution environments** that take the same job from simulation to production.

This makes `JobRecipe` particularly useful as a **first touchpoint** for new users and data scientists working with standard pipelines:

* Instead of learning the entire Job API, users can start with a recipe and focus only on high-level parameters (e.g., `min_clients` and `num_rounds`).
* Recipes encapsulate the necessary job structure and execution logic, ensuring correctness while reducing the chance of misconfiguration.
* If necessary, users can later progress to customizing the full Job API once they are comfortable with the basics.

## Basic Example

Let's start with a simple example using the `FedAvgRecipe` for PyTorch. This recipe automatically handles all the complexity of setting up a federated averaging workflow.

We use the existing network definition in `../hello-world/hello-pt/model.py` and the training script `client.py` to create the recipe:

In [None]:
import os
import sys
sys.path.append("../hello-world/hello-pt")

from nvflare.app_opt.pt.recipes.fedavg import FedAvgRecipe
from model import SimpleNetwork

# Create a FedAvg recipe
recipe = FedAvgRecipe(
    name="hello-pt",
    min_clients=2,
    num_rounds=3,
    initial_model=SimpleNetwork(),
    train_script="client.py",
    train_args="--batch_size 32",
)

print("Recipe created successfully!")
print(f"Recipe name: {recipe.name}")
print(f"Min clients: {recipe.min_clients}")
print(f"Number of rounds: {recipe.num_rounds}")
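
The `train_args` string is handed to the training script as command-line arguments. As a minimal sketch of how a script might read `--batch_size`, assuming standard `argparse` handling (the function name `parse_client_args` and the default value of 16 are hypothetical and not taken from the actual `client.py`):

```python
import argparse
import shlex


def parse_client_args(argv=None):
    """Parse the options a training script could receive through train_args."""
    parser = argparse.ArgumentParser(description="Hypothetical client training script")
    # --batch_size mirrors the flag passed via train_args in the recipe above;
    # the default of 16 is an assumption made for illustration only.
    parser.add_argument("--batch_size", type=int, default=16)
    return parser.parse_args(argv)


# Tokenize the same string used in the recipe: train_args="--batch_size 32"
args = parse_client_args(shlex.split("--batch_size 32"))
print(args.batch_size)
```

Any additional flags added to `train_args` would need a matching `add_argument` call in the script.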

## Execution Environments

A **Job Recipe** defines *what* to run in a federated learning setting, but it also needs to know *where* to run. NVFlare provides several **execution environments** that allow the same recipe to be executed in different contexts:

* **Simulation (`SimEnv`)** – for local testing and experimentation on a single machine
* **Proof-of-Concept (`PocEnv`)** – for small-scale, multi-process setups that mimic real-world deployment on a single machine
* **Production (`ProdEnv`)** – for full-scale distributed deployments across multiple organizations and sites

This separation enables users to **prototype once and deploy anywhere** without modifying the core job definition.

### SimEnv – Simulation Environment

Runs all clients and the server as **threads** within a single process. This is lightweight and easy to set up; no networking required. Best suited for:

* Quick experiments
* Debugging scripts and models
* Educational use cases

**Arguments:**
* `num_clients` (int): number of simulated clients
* `clients`: a list of client names (its length needs to match `num_clients` if both are provided)
* `num_threads`: number of threads used to run the simulated clients
* `gpu_config` (str): comma-separated list of GPU device IDs
* `log_config` (str): log config mode (`concise`, `full`, or `verbose`), a file path, or a log level
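
The rule that `clients` and `num_clients` must agree can be checked before building the environment. The helper below is a hypothetical illustration of that rule in plain Python; it is not part of NVFlare, and the library may perform its own validation differently:

```python
from typing import List, Optional


def effective_num_clients(num_clients: Optional[int] = None,
                          clients: Optional[List[str]] = None) -> int:
    """Return the client count implied by the two arguments.

    Hypothetical helper, not NVFlare code: it only mirrors the documented
    rule that len(clients) must equal num_clients when both are given.
    """
    if clients is not None and num_clients is not None and len(clients) != num_clients:
        raise ValueError(
            f"num_clients={num_clients} does not match {len(clients)} client names"
        )
    if clients is not None:
        return len(clients)
    if num_clients is None:
        raise ValueError("provide num_clients, clients, or both")
    return num_clients


print(effective_num_clients(num_clients=2, clients=["site-1", "site-2"]))
```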

Now let's run the prepared recipe with `SimEnv`:

In [None]:
from nvflare.recipe.sim_env import SimEnv
# Create a simulation environment
env = SimEnv(
    num_clients=2, 
    num_threads=2,
)
# Execute the recipe
run = recipe.execute(env=env)
run.get_status()
run.get_result()

The results are stored under `/tmp/nvflare/simulation/hello-pt`.
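
To take a quick look at what a run produced, the result directory can be listed with the standard library. This is a small sketch; the helper `list_result_files` is hypothetical, and it makes no assumption about which files NVFlare actually writes there:

```python
from pathlib import Path


def list_result_files(root: str, max_depth: int = 2) -> list:
    """Return relative paths of files under root, at most max_depth levels deep."""
    base = Path(root)
    if not base.is_dir():
        # Nothing to list, e.g. the job has not been run yet.
        return []
    files = []
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        if path.is_file() and len(rel.parts) <= max_depth:
            files.append(rel.as_posix())
    return files


for name in list_result_files("/tmp/nvflare/simulation/hello-pt"):
    print(name)
```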

### PocEnv – Proof-of-Concept Environment

Runs the server and clients as **separate processes** on the same machine. This mimics a real-world deployment while staying on a single node: it is more realistic than `SimEnv`, but still lightweight.

Best suited for:
* Demonstrations
* Small-scale validation before production deployment
* Debugging orchestration logic

**Arguments:**
* `num_clients` (int, optional): Number of clients to use in POC mode. Defaults to 2.
* `clients` (List[str], optional): List of client names. If None, will generate site-1, site-2, etc.
* `gpu_ids` (List[int], optional): List of GPU IDs to assign to clients. If None, uses CPU only.
* `auto_stop` (bool, optional): Whether to automatically stop POC services after job completion.
* `use_he` (bool, optional): Whether to use homomorphic encryption (HE). Defaults to False.
* `docker_image` (str, optional): Docker image to use for POC.
* `project_conf_path` (str, optional): Path to the project configuration file.
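
When `clients` is left as None, the names follow the `site-1`, `site-2`, ... pattern described above. A plain-Python sketch of that naming scheme (the helper `default_client_names` is hypothetical and only reproduces the documented pattern, not NVFlare's internal code):

```python
from typing import List


def default_client_names(num_clients: int = 2) -> List[str]:
    """Generate site-1 ... site-N, matching the documented default naming."""
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    return [f"site-{i}" for i in range(1, num_clients + 1)]


print(default_client_names())
```

Knowing the generated names in advance can help when locating each client's files inside the POC workspace.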

Let's first set the path to the POC workspace:

In [None]:
%env NVFLARE_POC_WORKSPACE=/tmp/nvflare/poc

In [None]:
from nvflare.recipe.poc_env import PocEnv

# Create a POC environment
env = PocEnv(
    num_clients=2
)
# Execute the recipe
run = recipe.execute(env=env)
run.get_status()
run.get_result()

The results are stored under the directory `/tmp/nvflare/poc`.

### ProdEnv – Production Environment

`ProdEnv` assumes that an NVFlare system with a server and clients is already up and running across **multiple machines and sites**, using secure communication channels and real-world NVFlare deployment infrastructure. `ProdEnv` uses the admin's startup package to communicate with this existing system, execute the job, and monitor its execution.

Best suited for:
* Enterprise federated learning deployments
* Multi-institution collaborations
* Production-scale workloads

**Arguments:**
* `startup_kit_location` (str): the directory that contains the startup kit of the admin (generated by nvflare provisioning)
* `login_timeout` (float): timeout value for the admin to log in to the system
* `monitor_job_duration` (int): duration to monitor the job execution, None means no monitoring at all
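
A mistyped `startup_kit_location` is an easy mistake to make, so it can be worth checking the path before creating the environment. The sketch below uses a hypothetical helper, `check_startup_kit`, that only confirms the directory exists; it does not validate the kit's contents:

```python
from pathlib import Path


def check_startup_kit(location: str) -> Path:
    """Return the startup kit path, or raise if the directory is missing.

    Hypothetical helper for illustration; it checks existence only.
    """
    path = Path(location).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"startup kit directory not found: {path}")
    return path


try:
    kit = check_startup_kit(
        "/tmp/nvflare/prod_workspaces/example_project/prod_00/admin@nvidia.com"
    )
    print(f"Using startup kit at {kit}")
except FileNotFoundError as err:
    print(f"Provision the project first: {err}")
```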

Let's first provision a startup kit:

In [None]:
!nvflare provision -p project.yml -w /tmp/nvflare/prod_workspaces

Next, start all parties from a terminal rather than running the script below directly within the notebook:

`bash /tmp/nvflare/prod_workspaces/example_project/prod_00/start_all.sh`

Now let's go ahead with environment creation and recipe execution.

In [None]:
from nvflare.recipe.prod_env import ProdEnv
import os
import sys
sys.path.append("../hello-world/hello-pt")

from nvflare.app_opt.pt.recipes.fedavg import FedAvgRecipe
from model import SimpleNetwork

# Create a FedAvg recipe
recipe = FedAvgRecipe(
    name="hello-pt",
    min_clients=2,
    num_rounds=3,
    initial_model=SimpleNetwork(),
    train_script="client.py",
    train_args="--batch_size 32",
)
# Create a Prod environment
env = ProdEnv(
    startup_kit_location="/tmp/nvflare/prod_workspaces/example_project/prod_00/admin@nvidia.com"
)
# Execute the recipe
run = recipe.execute(env=env)
run.get_status()
run.get_result()

## Benefits of Environment Abstraction

* **Consistency** – A recipe defined once can be reused across all environments without modification.
* **Progressive workflow** – Start in `SimEnv` for prototyping, move to `PocEnv` for validation, and finally deploy with `ProdEnv`.
* **Scalability** – The same training logic scales from a laptop experiment to a global production deployment.

## Special Case: Edge Applications

Edge applications that run on the new hierarchical system are not supported by the simulator; in the current version, they need to run with `ProdEnv`. More detailed examples are available [here](https://github.com/NVIDIA/NVFlare/tree/main/examples/advanced/edge). In particular, [this example](https://github.com/NVIDIA/NVFlare/blob/main/examples/advanced/edge/jobs/pt_job_adv.py) shows the edge recipe preparation and an experimental run.


## Best Practices

1. **Develop in `SimEnv`** to iterate quickly.
2. **Validate in `PocEnv`** to test multi-process orchestration.
3. **Deploy in `ProdEnv`** for real-world federated learning.
4. **Start simple** with basic recipes before customizing.
5. **Use consistent naming** for your recipes and experiments.
6. **Monitor execution** to understand the federated learning process.
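
One way to keep the progression in steps 1 to 3 explicit is to hold the environment settings for each stage in a single place. The sketch below is an assumption about how a project might organize this: the stage names and the `env_kwargs` helper are hypothetical, and the keyword arguments are the ones used in the cells of this tutorial.

```python
# Keyword arguments per stage, copied from the environment cells in this tutorial.
ENV_KWARGS = {
    "sim": {"num_clients": 2, "num_threads": 2},
    "poc": {"num_clients": 2},
    "prod": {
        "startup_kit_location": (
            "/tmp/nvflare/prod_workspaces/example_project/prod_00/admin@nvidia.com"
        )
    },
}


def env_kwargs(stage: str) -> dict:
    """Return a copy of the settings for a stage ('sim', 'poc', or 'prod')."""
    if stage not in ENV_KWARGS:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(ENV_KWARGS)}")
    # Copy so callers can tweak values without changing the shared defaults.
    return dict(ENV_KWARGS[stage])


print(env_kwargs("sim"))
```

The returned dictionary can then be unpacked into the matching environment class, for example `SimEnv(**env_kwargs("sim"))`, while the recipe itself stays unchanged.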

## Summary

Job Recipes, combined with execution environments, provide a **unified abstraction** for defining and running federated learning jobs:

* **Recipes define how training should proceed** (e.g., FedAvg, FedOpt, Swarm Learning)
* **Environments define where and how the job runs** (simulation, proof-of-concept, production)

This separation ensures that the same recipe can seamlessly transition from **local testing** to **enterprise-scale production** without requiring code changes.

The goal of Job Recipes is to create a simple entry point into NVFlare that is intuitive for new users and data scientists running standard FL pipelines, while still allowing for growth into more complex and customizable workflows.