# Interacting with the NVIDIA FLARE System


## Introduction

Once you have a federated learning system up and running, you need ways to interact with it: to submit jobs, monitor progress, retrieve results, and manage participants. NVIDIA FLARE provides three primary methods for system interaction, each suited to different use cases and preferences:

1. **FLARE Admin Console**: An interactive command-line interface for system administrators
2. **FLARE Python API**: Programmatic access for integration with applications and workflows
3. **FLARE Job CLI**: Command-line tools for job management

In this section, we'll explore each of these interaction methods, demonstrating how to perform common tasks with each approach. By understanding these options, you'll be able to choose the most appropriate method for your specific needs and integrate NVIDIA FLARE into your existing workflows.

### Learning Objectives
By the end of this section, you will be able to:
- Identify the three main methods for interacting with NVIDIA FLARE
- Use the FLARE Admin Console to manage federated learning jobs
- Implement the FLARE Python API to programmatically control the system
- Utilize the FLARE Job CLI for command-line job management
- Monitor and manage jobs through different interfaces

## Preparing a Sample Job

Before we dive into the different interaction methods, let's prepare a sample job that we can use throughout this section. We'll use a CIFAR-10 image classification task similar to what we've seen in previous chapters.

### Step 1: Install Requirements

In [None]:
! echo "Installing required packages..."
! pip install -r code/requirements.txt

### Step 2: Download the CIFAR-10 Dataset

We'll download the CIFAR-10 dataset, which will be used by our federated learning job:

In [None]:
! echo "Downloading CIFAR-10 dataset..."
! python code/data/download.py

### Step 3: Generate the Job Configuration

Now we'll generate the configuration for our federated learning job:

In [None]:
! echo "Generating job configuration..."

%cd code
! python fl_job.py
%cd ..

! echo "Job directory structure:"
! tree /tmp/nvflare/jobs/workdir/fedavg

### Step 4: Start the FLARE System

Before we can interact with the FLARE system, we need to have it running. In a production environment, the system would already be deployed across multiple sites. For our demonstration, we'll use the POC mode we learned about in the previous section.

To start the FLARE system in POC mode, open a terminal and run:

```bash
# Start all components except the admin console
nvflare poc start -ex admin@nvidia.com
```

Then, in a separate terminal, start the admin console:

```bash
# Start only the admin console
nvflare poc start -p admin@nvidia.com
```

> **Note**: For this tutorial, we assume you have already started the FLARE system using the commands above. If you're following along, please execute these commands in separate terminals before proceeding.

## Method 1: FLARE Admin Console

The FLARE Admin Console is an interactive command-line interface that provides system administrators with direct control over the federated learning system. It's particularly useful for interactive management and troubleshooting.

### Accessing the Admin Console

In a production environment, you would access the admin console by running the `fl_admin.sh` script from your admin startup kit:

```bash
cd /path/to/admin/startup/kit
./startup/fl_admin.sh
```

In our POC environment, the admin console is started with the `nvflare poc start -p admin@nvidia.com` command we ran earlier.

### Admin Console Interface

The admin console provides a command prompt where you can enter commands to manage the system:

![FLARE Admin Console](admin_console.png)

You can find the full list of commands [here](https://nvflare.readthedocs.io/en/main/real_world_fl/operation.html).

### Example Workflow

A typical workflow using the admin console might look like this:

1. Check system status: `check_status`
2. Submit a job: `submit_job /tmp/nvflare/jobs/workdir/fedavg`
3. Download results when complete: `download_job <job_id>` (files are saved to the download directory configured for the admin client)

## Method 2: FLARE Python API

The FLARE Python API provides programmatic access to the federated learning system, allowing you to integrate NVIDIA FLARE with your applications, workflows, and scripts. This is particularly useful for automation and integration with other systems.

### Key Components of the FLARE API

The main components of the FLARE Python API are:

- **Session**: Represents a connection to the FLARE system
- **Job Management**: Functions for submitting, monitoring, and managing jobs
- **System Information**: Functions for querying system status and configuration

Let's explore how to use these components to interact with the FLARE system.

### Creating a Session

The first step in using the FLARE Python API is to create a session that connects to the FLARE system. This session will be used for all subsequent interactions.

In [None]:
import os
from nvflare.fuel.flare_api.flare_api import new_secure_session
 
# Define the user and workspace location
username = "admin@nvidia.com"
workspace = "/tmp/nvflare/poc/example_project/prod_00"
admin_user_dir = os.path.join(workspace, username)

# Create a new session
sess = new_secure_session(username=username, startup_kit_location=admin_user_dir)

# Get and display system information
print(sess.get_system_info())
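
A wrong workspace path is a common reason for session creation to fail, so it can help to check the startup kit directory first. The sketch below is one way to do that. The helper names `admin_startup_kit` and `connect` are hypothetical (they are not part of NVIDIA FLARE), and `nvflare` is imported lazily so that the path check itself runs without a FLARE installation.

```python
import os
import tempfile


def admin_startup_kit(workspace: str, username: str) -> str:
    """Return the admin startup kit directory, or raise if it does not exist."""
    kit_dir = os.path.join(workspace, username)
    if not os.path.isdir(kit_dir):
        raise FileNotFoundError(f"No startup kit found at {kit_dir}")
    return kit_dir


def connect(workspace: str, username: str):
    """Create a FLARE session after verifying the startup kit path (hypothetical helper)."""
    # Imported here so the path check above works even where nvflare is not installed
    from nvflare.fuel.flare_api.flare_api import new_secure_session

    kit_dir = admin_startup_kit(workspace, username)
    return new_secure_session(username=username, startup_kit_location=kit_dir)


# Demo of the path check against a throwaway directory (no FLARE system needed)
with tempfile.TemporaryDirectory() as ws:
    os.makedirs(os.path.join(ws, "admin@nvidia.com"))
    found = admin_startup_kit(ws, "admin@nvidia.com")
print(os.path.basename(found))
```

With a running POC system, `sess = connect("/tmp/nvflare/poc/example_project/prod_00", "admin@nvidia.com")` would take the place of the direct `new_secure_session` call.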

### Submitting a Job

Once you have a session, you can submit a job to the FLARE system. This is equivalent to the `submit_job` command in the admin console.

In [None]:
# Define the job directory
job_dir = "/tmp/nvflare/jobs/workdir/fedavg"

# Submit the job
job_id = sess.submit_job(job_dir)
print(f"Job submitted successfully with ID: {job_id}")
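
Before submitting, a quick local sanity check can catch an incomplete job folder early. The following is a minimal sketch, assuming the job folder carries a top-level `meta.json` file. The helper `check_job_folder` is hypothetical, and it does not replace the validation that the FLARE server performs on submission.

```python
import json
import os
import tempfile


def check_job_folder(job_dir: str) -> list:
    """Return a list of problems found in job_dir; an empty list means the checks passed."""
    if not os.path.isdir(job_dir):
        return [f"{job_dir} is not a directory"]
    problems = []
    meta_path = os.path.join(job_dir, "meta.json")
    if not os.path.isfile(meta_path):
        problems.append("meta.json is missing")
    else:
        try:
            with open(meta_path) as f:
                json.load(f)
        except json.JSONDecodeError as e:
            problems.append(f"meta.json is not valid JSON: {e}")
    return problems


# Demo on a temporary folder: first without, then with a meta.json file
with tempfile.TemporaryDirectory() as tmp:
    empty_job_problems = check_job_folder(tmp)
    with open(os.path.join(tmp, "meta.json"), "w") as f:
        json.dump({"name": "fedavg"}, f)
    good_job_problems = check_job_folder(tmp)

print(empty_job_problems)  # ['meta.json is missing']
print(good_job_problems)   # []
```

In the notebook, you could call `check_job_folder(job_dir)` and only run `sess.submit_job(job_dir)` when the returned list is empty.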

### Monitoring Job Progress

After submitting a job, you'll often want to monitor its progress. The FLARE API provides the `monitor_job()` function for this purpose.

By default, `monitor_job()` blocks until the job is complete and then returns the return code `JOB_FINISHED`. You can customize this behavior by providing a callback function that receives the job metadata and can perform custom actions.

Here's an example that demonstrates a custom callback function:

In [None]:
from nvflare.fuel.flare_api.flare_api import Session

def sample_cb(
        session: Session, job_id: str, job_meta, *cb_args, **cb_kwargs
    ) -> bool:
    """
    Custom callback function for job monitoring.
    
    Args:
        session: The FLARE session
        job_id: The ID of the job being monitored
        job_meta: Metadata about the job's current state
        cb_args: Additional positional arguments
        cb_kwargs: Additional keyword arguments
        
    Returns:
        bool: True to continue monitoring, False to stop
    """
    if job_meta["status"] == "RUNNING":
        if cb_kwargs["cb_run_counter"]["count"] < 3:
            print(job_meta)
            print(cb_kwargs["cb_run_counter"])
        else:
            print(".", end="")
    else:
        print("\n" + str(job_meta))
    
    cb_kwargs["cb_run_counter"]["count"] += 1
    return True

# Monitor the job with our custom callback
print(f"Monitoring job {job_id}...")
result = sess.monitor_job(job_id, cb=sample_cb, cb_run_counter={"count":0})
print(f"\nMonitoring completed with result: {result}")
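
Because the callback's return value decides whether monitoring continues, a callback can also be used to stop watching a job early. The sketch below builds such a callback with a closure instead of passing a counter through `cb_kwargs`. The factory name `make_limited_cb` is hypothetical, and the demo calls the callback directly with made-up metadata rather than through a live session.

```python
def make_limited_cb(max_polls: int):
    """Build a monitor_job-style callback that stops monitoring after max_polls calls."""
    state = {"polls": 0}

    def cb(session, job_id, job_meta, *cb_args, **cb_kwargs) -> bool:
        state["polls"] += 1
        print(f"poll {state['polls']}: {job_meta.get('status')}")
        # True keeps monitoring going; False asks the monitor loop to stop
        return state["polls"] < max_polls

    return cb


# Demo: call the callback directly, as the monitor loop would
cb = make_limited_cb(max_polls=3)
answers = [cb(None, "job-1", {"status": "RUNNING"}) for _ in range(3)]
print(answers)  # [True, True, False]
```

With a live session, this would be passed as `sess.monitor_job(job_id, cb=make_limited_cb(10))`. Stopping the monitor this way only ends the watching; it is not the same as aborting the job.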

### Getting Job Metadata

You can retrieve detailed information about a job using the `get_job_meta()` function. This is useful for checking job status, progress, and results.

In [None]:
# Get and display job metadata
job_meta = sess.get_job_meta(job_id)
print(f"Metadata for job {job_id}:")
print(job_meta)
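
The full metadata dictionary can be verbose, so a one-line summary is sometimes easier to read. This is a small sketch that picks a few fields with `dict.get()`. The `status` key appears in the callback example above, while `job_id` and `name` are assumed key names that may differ in your FLARE version.

```python
def summarize_job_meta(job_meta: dict, keys=("job_id", "name", "status")) -> str:
    """Format selected metadata fields on one line; absent keys are marked, not raised."""
    return ", ".join(f"{key}={job_meta.get(key, '<missing>')}" for key in keys)


# Demo with a made-up metadata dictionary
sample_meta = {"job_id": "abc123", "name": "fedavg", "status": "RUNNING"}
summary = summarize_job_meta(sample_meta)
print(summary)  # job_id=abc123, name=fedavg, status=RUNNING
```

In the notebook, `print(summarize_job_meta(sess.get_job_meta(job_id)))` would give the short form.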

### List Jobs

The `list_jobs()` method retrieves information about the jobs that have been submitted to the FLARE server.

```python
jobs = sess.list_jobs(detailed=False, reverse=False, limit=None, id_prefix=None, name_prefix=None)
```

Key parameters:

- `detailed`: When set to `True`, returns comprehensive metadata for each job including status, 
  timestamps, and configuration details. Default is `False`.

- `reverse`: When set to `True`, returns jobs in reverse chronological order (newest first).
  Default is `False` (oldest first).

- `limit`: Specifies the maximum number of jobs to return. Set to `None` or `0` to return all jobs.

- `id_prefix`: Filters jobs to only those with IDs starting with the specified prefix.

- `name_prefix`: Filters jobs to only those with names starting with the specified prefix.

In [None]:
import json

def format_json(data) -> str:
    # Return the formatted string so that the caller decides whether to print it
    return json.dumps(data, sort_keys=True, indent=4, separators=(",", ": "))

list_jobs_output = sess.list_jobs()
print(format_json(list_jobs_output))


In [None]:
list_jobs_output_detailed = sess.list_jobs(detailed=True)
print(format_json(list_jobs_output_detailed))
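
Once you have the list, ordinary Python is enough to slice it further. The sketch below assumes that `list_jobs()` returns a list of dictionaries and that each one carries a `status` key. The helper names and the sample records are made up for illustration.

```python
from collections import Counter


def count_by_status(jobs: list) -> Counter:
    """Count jobs per status value; records without a status are counted as UNKNOWN."""
    return Counter(job.get("status", "UNKNOWN") for job in jobs)


def jobs_with_status(jobs: list, status: str) -> list:
    """Return only the job records whose status equals the given value."""
    return [job for job in jobs if job.get("status") == status]


# Made-up records standing in for the output of sess.list_jobs()
sample_jobs = [
    {"job_id": "a1", "status": "FINISHED:COMPLETED"},
    {"job_id": "b2", "status": "RUNNING"},
    {"job_id": "c3", "status": "FINISHED:COMPLETED"},
]
status_counts = count_by_status(sample_jobs)
running = jobs_with_status(sample_jobs, "RUNNING")
print(dict(status_counts))
print([job["job_id"] for job in running])  # ['b2']
```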

### Download Job Results

The `download_job_result()` method downloads the results of a completed job to your local machine.

- `job_id`: (Required) The ID of the job whose results you want to download.

The destination is not passed as an argument. Results are saved to the `download_dir`, which is usually specified in the `fed_admin.json` configuration file located in the admin user directory used when creating the FLARE API Session.

In [None]:
sess.download_job_result(job_id)
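
After a download, you will usually want to see what arrived. The following sketch lists every file under a result folder using only the standard library. It assumes you know the local folder path (for example, from the return value of `download_job_result()`, if your version returns one). The helper name and the demo folder layout are hypothetical.

```python
import os
import tempfile


def list_downloaded_files(result_dir: str) -> list:
    """Return the sorted relative paths of all files under result_dir."""
    files = []
    for root, _dirs, names in os.walk(result_dir):
        for name in names:
            files.append(os.path.relpath(os.path.join(root, name), result_dir))
    return sorted(files)


# Demo on a temporary folder with a made-up layout
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "workspace"))
    with open(os.path.join(tmp, "workspace", "log.txt"), "w") as f:
        f.write("demo")
    listed = list_downloaded_files(tmp)
print(listed)
```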

### Aborting a Job
The `abort_job()` method allows you to terminate a running job that may be stuck, experiencing issues, or no longer needed. This operation cannot be undone, so use it with caution.


In [None]:
print(job_id)
sess.abort_job(job_id)
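
Since an abort cannot be undone, one cautious pattern is to check the job status first and only abort when the job is actually running. The sketch below uses the `get_job_meta()` and `abort_job()` methods shown in this section. The helper `abort_if_running` and the `FakeSession` stand-in are hypothetical, and treating only `RUNNING` as abortable is a simplifying assumption.

```python
def abort_if_running(sess, job_id: str) -> bool:
    """Abort job_id only if its metadata reports RUNNING; return True if abort was requested."""
    status = sess.get_job_meta(job_id).get("status")
    if status != "RUNNING":
        print(f"Job {job_id} has status {status}; nothing to abort")
        return False
    sess.abort_job(job_id)
    return True


class FakeSession:
    """Stand-in for a FLARE session, for illustration only."""

    def __init__(self, status: str):
        self.status = status
        self.aborted = []

    def get_job_meta(self, job_id: str) -> dict:
        return {"status": self.status}

    def abort_job(self, job_id: str) -> None:
        self.aborted.append(job_id)


# Demo: one running job and one that is already finished
running_sess = FakeSession("RUNNING")
finished_sess = FakeSession("FINISHED:COMPLETED")
print(abort_if_running(running_sess, "job-1"))   # True
print(abort_if_running(finished_sess, "job-2"))  # False
```

With a real session, the call would be `abort_if_running(sess, job_id)`.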

## Method 3: FLARE Job CLI

The FLARE Job Command Line Interface (CLI) provides a set of commands for managing jobs from the command line. It's particularly useful for quick operations and integration with shell scripts. More details can be found [here](https://nvflare.readthedocs.io/en/2.5/user_guide/nvflare_cli/job_cli.html).

### Key Job CLI Commands

The FLARE Job CLI includes commands for:

- **Creating jobs**: Generate job configurations from templates
- **Listing templates**: View available job templates
- **Submitting jobs**: Send jobs to the FLARE system
- **Showing variables**: List the configuration variables of a job folder

### Example Usage

Here are some examples of how to use the FLARE Job CLI:


In [None]:
# Show variables
! nvflare job show_variables -j /tmp/nvflare/jobs/workdir/fedavg

# Submit a job
! nvflare job submit -j /tmp/nvflare/jobs/workdir/fedavg
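
Outside a notebook, the same commands can be driven from a Python script. This is a minimal sketch that only builds the argument list for the two subcommands used above. The helper `job_cli_command` is hypothetical, and the demo prints the command instead of running it, so it works without a FLARE installation.

```python
import shlex


def job_cli_command(subcommand: str, job_dir: str) -> list:
    """Build the argument list for `nvflare job <subcommand> -j <job_dir>`."""
    allowed = ("submit", "show_variables")  # only the subcommands used in this section
    if subcommand not in allowed:
        raise ValueError(f"Unsupported subcommand: {subcommand}")
    return ["nvflare", "job", subcommand, "-j", job_dir]


cmd = job_cli_command("submit", "/tmp/nvflare/jobs/workdir/fedavg")
print(shlex.join(cmd))  # nvflare job submit -j /tmp/nvflare/jobs/workdir/fedavg
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with unusual paths.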

## Summary

In this section, we've explored the three main methods for interacting with the NVIDIA FLARE system:

1. **FLARE Admin Console**: An interactive command-line interface for system administrators
2. **FLARE Python API**: Programmatic access for integration with applications and workflows
3. **FLARE Job CLI**: Command-line tools for job management

Each method has its strengths and is suited to different scenarios. By understanding these options, you can choose the most appropriate method for your specific needs and integrate NVIDIA FLARE into your existing workflows.

In the next section, we'll explore [system monitoring](../03.4_system_monitoring/system_monitorinig.ipynb) capabilities in NVIDIA FLARE.