Clean up CI integration tests (#516)
* Clean up CI integration tests

* Remove job root dir in single app yaml

* Fix issue

* Add README to test

* Fix typo
YuanTingHsieh committed May 12, 2022
1 parent 06d20e1 commit 192d180
Showing 10 changed files with 225 additions and 71 deletions.
39 changes: 39 additions & 0 deletions tests/README.md
@@ -0,0 +1,39 @@
# NVIDIA FLARE Tests


This file describes how the tests in NVIDIA FLARE are organized.

We divide the tests into unit tests and integration tests.

```commandline
tests:
- unit_test
- integration_test
```

## Unit Tests

### Structure

The unit tests are organized in directories that parallel the production code.

Each directory in `tests/unit_test` maps to its counterpart in `nvflare`.

For example, we have `tests/unit_test/app_common/job_schedulers/job_scheduler.py`
that tests `nvflare/app_common/job_schedulers/job_scheduler.py`.

### Run

To run the unit tests: `./runtest.sh`.

### Develop a test case

We use pytest to run our unit tests,
so please follow the pytest test case style.
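
As a minimal sketch of that style: pytest collects functions whose names start with `test_` and treats a failing `assert` as a test failure. The `clamp` function below is a hypothetical stand-in for production code and is not part of NVIDIA FLARE.

```python
# Hypothetical example: `clamp` stands in for the production code under test.
def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


# pytest discovers functions named test_* and reports failing asserts.
def test_clamp_keeps_value_in_range():
    assert clamp(5, 0, 10) == 5


def test_clamp_limits_out_of_range_values():
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10
```

In a real test module the function under test would be imported from `nvflare` rather than defined next to the tests.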

## Integration Tests

Please refer to [integration tests README](./integration_test/README.md).



93 changes: 93 additions & 0 deletions tests/integration_test/README.md
@@ -0,0 +1,93 @@
# NVIDIA FLARE Integration Tests

The entry file for the integration tests is `tests/integration_test/system_test.py`.

## Run

First switch to this folder, then run:

`PYTHONPATH=[path/to/your/NVFlare] ./run_integration_tests.sh`


## Test structure

The main test will read test configurations from `./test_cases.yml`.

### Test config

Each test configuration yaml should contain the following attributes:

| Attributes | Description |
|---------------------|-------------------------------------------------------------|
| `system_setup` | Path to the system setup yaml. (num of servers, clients) |
| `single_app_as_job` | Whether the test cases are single app jobs. (Defaults to False.) |
| `cleanup` | Whether to clean up test folders. (Defaults to True.) |
| `tests` | The test cases to run. |

If `single_app_as_job` is True, `apps_root_dir` is required.
Otherwise, `jobs_root_dir` is required.
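
The rule above can be sketched as a small check. This is only an illustration: `required_root_dir` and `check_root_dir` are hypothetical helper names, and the config is a plain dict as it would look after the yaml is loaded.

```python
def required_root_dir(test_config: dict) -> str:
    """Return the name of the root-dir attribute this config must define."""
    # single_app_as_job is treated as False when it is absent.
    if test_config.get("single_app_as_job", False):
        return "apps_root_dir"
    return "jobs_root_dir"


def check_root_dir(test_config: dict) -> None:
    """Raise if the required root-dir attribute is missing."""
    key = required_root_dir(test_config)
    if key not in test_config:
        raise RuntimeError(f"Test config missing {key}.")


# Hypothetical config for single app jobs.
config = {
    "system_setup": "./data/system/1_server_2_clients.yml",
    "single_app_as_job": True,
    "apps_root_dir": "./data/apps",
    "tests": [],
}
check_root_dir(config)  # passes: apps_root_dir is present
```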

### System setup config

Each system setup configuration yaml should contain the following attributes:

| Attributes | Description |
|----------------|----------------------------------------------------------------------|
| poc | Which poc folder to use (to get fed_server.json and fed_client.json) |
| n_servers | number of servers |
| n_clients | number of client sites |
| snapshot_path | Where to store server snapshot. (needs to match what's inside poc) |
| job_store_path | Where to store job information. (needs to match what's inside poc) |


### Test cases

Each test case has the following attributes:

| Attributes | Description |
|----------------------|----------------------------------------------------------------------------------------------|
| app_name or job_name | Name of the app or job folder to test. App folders need to be inside `apps_root_dir`; job folders need to be inside `jobs_root_dir`. |
| validators | Which validator to use to validate the running result once the job is finished. |
| event_sequence_yaml | What event sequence to run during each job. (HA test cases) |

### Event sequence

Each event sequence has the following attributes:

| Attributes | Description |
|-------------|----------------------------------------------|
| description | Description of this specific event sequence. |
| events | A list of events |

Each event has the following attributes:

| Attributes | Description |
|--------------|--------------------------------------------------------|
| trigger | When to perform the specified action. |
| action | What actions to take when the trigger fires. |
| result_state | What state to expect after these actions are finished. |

How a trigger fires depends on its type:
- str: matches a string in the log output from the server
- dict: matches the state of predefined tracked state variables
  (workflow, task, round_number, run_finished, etc.)
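
A minimal sketch of how the two trigger types could be evaluated. The function `trigger_fired` and its arguments are hypothetical and are not the test harness's actual implementation. It assumes a str trigger matches as a substring of a server log line, and a dict trigger matches when every listed state variable has the expected value.

```python
from typing import Union


def trigger_fired(trigger: Union[str, dict], log_line: str, state: dict) -> bool:
    """Return True if the trigger matches the current log line or tracked state."""
    if isinstance(trigger, str):
        # str trigger: look for the text in the server log output.
        return trigger in log_line
    if isinstance(trigger, dict):
        # dict trigger: every listed state variable must have the expected value.
        return all(state.get(key) == value for key, value in trigger.items())
    raise TypeError(f"Unsupported trigger type: {type(trigger).__name__}")


# Hypothetical log line and tracked state.
state = {"workflow": 0, "task": "train", "round_number": 1, "run_finished": False}
print(trigger_fired("sent task assignment", "server: sent task assignment to site-1", state))
print(trigger_fired({"round_number": 1, "task": "train"}, "", state))
print(trigger_fired({"run_finished": True}, "", state))
```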

## Folder Structure

- data:
  - apps: applications for testing
  - ha: HA test event sequence yaml files
  - single_app_as_job: test configurations for single app as job test cases
  - system: system setup config yaml files
- tf2: TensorFlow 2 code for the applications used in
  integration tests.
- validators: code that implements the logic to validate the running result
  once the job is finished.

### apps

The applications inside the `apps` folder are treated as single app jobs.

Each application in the `apps` folder should contain a `config` folder,
with `config_fed_server.json` and `config_fed_client.json` inside it.
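
The layout requirement can be checked with a short script. This is a sketch, not part of the test suite: `missing_config_files` is a hypothetical helper, and the app folder is built in a temporary directory purely for demonstration.

```python
import tempfile
from pathlib import Path

REQUIRED_CONFIG_FILES = ("config_fed_server.json", "config_fed_client.json")


def missing_config_files(app_dir: Path) -> list:
    """Return the required config files that are missing from app_dir/config."""
    config_dir = app_dir / "config"
    return [name for name in REQUIRED_CONFIG_FILES if not (config_dir / name).is_file()]


# Build a throwaway app folder to demonstrate the check.
with tempfile.TemporaryDirectory() as tmp:
    app_dir = Path(tmp) / "my_app"  # hypothetical app name
    (app_dir / "config").mkdir(parents=True)
    (app_dir / "config" / "config_fed_server.json").write_text("{}")
    print(missing_config_files(app_dir))  # the client config is still missing
```
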
@@ -22,6 +22,17 @@
 from nvflare.app_common.app_constant import AppConstants


+def _prepare_training_ctx(client_task: ClientTask, fl_ctx: FLContext):
+    task = client_task.task
+    fl_ctx.set_prop("current_round", task.props["round"], private=False)
+    fl_ctx.set_prop("total_rounds", task.props["total"], private=False)
+
+
+def _process_training_result(client_task: ClientTask, fl_ctx: FLContext):
+    task = client_task.task
+    task.data = client_task.result
+
+
 class CustomController(Controller):
     def __init__(
         self,
@@ -59,15 +70,6 @@ def start_controller(self, fl_ctx: FLContext):
         self._global_model = self.persistor.load(fl_ctx)
         fl_ctx.set_prop(AppConstants.GLOBAL_MODEL, self._global_model, private=True, sticky=True)

-    def _prepare_training_ctx(self, client_task: ClientTask, fl_ctx: FLContext):
-        task = client_task.task
-        fl_ctx.set_prop("current_round", task.props["round"], private=False)
-        fl_ctx.set_prop("total_rounds", task.props["total"], private=False)
-
-    def _process_training_result(self, client_task: ClientTask, fl_ctx: FLContext):
-        task = client_task.task
-        task.data = client_task.result
-
     def process_result_of_unknown_task(
         self,
         client: Client,
@@ -91,8 +93,8 @@ def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):
                 data=self.shareable_gen.learnable_to_shareable(self._global_model, fl_ctx),
                 props={"round": r, "total": self._num_rounds},
                 timeout=0,
-                before_task_sent_cb=self._prepare_training_ctx,
-                result_received_cb=self._process_training_result,
+                before_task_sent_cb=_prepare_training_ctx,
+                result_received_cb=_process_training_result,
             )

             client_list = engine.get_clients()
@@ -1,8 +1,6 @@
 system_setup: ./data/system/1_server_2_clients.yml
 jobs_root_dir: ./data/jobs
 apps_root_dir: ../../examples
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
 single_app_as_job: True
 cleanup: True

@@ -1,8 +1,6 @@
 system_setup: ./data/system/ha_1_server_2_clients.yml
 jobs_root_dir: ./data/jobs
 apps_root_dir: ./data/apps
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
 single_app_as_job: True
 cleanup: True

@@ -1,8 +1,6 @@
 system_setup: ./data/system/1_server_2_clients.yml
 jobs_root_dir: ./data/jobs
 apps_root_dir: ./data/apps
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
 single_app_as_job: True
 cleanup: True

2 changes: 2 additions & 0 deletions tests/integration_test/data/system/1_server_2_clients.yml
@@ -1,3 +1,5 @@
 poc: ../../nvflare/poc
 n_servers: 1
 n_clients: 2
+snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
+job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
2 changes: 2 additions & 0 deletions tests/integration_test/data/system/ha_1_server_2_clients.yml
@@ -1,4 +1,6 @@
 poc: ../../nvflare/poc
 n_servers: 1
 n_clients: 2
+snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
+job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
 ha: True
120 changes: 69 additions & 51 deletions tests/integration_test/system_test.py
@@ -18,6 +18,7 @@
 import shutil
 import subprocess
 import sys
+import tempfile
 import time

 import pytest
@@ -51,76 +52,93 @@ def cleanup_path(path: str):
         shutil.rmtree(path)


-params = [
-    "./data/test_examples.yml",
-    "./data/test_internal.yml",
-    "./data/test_ha.yml",
-]
+def get_system_setup(system_setup_yaml: str):
+    """Gets system setup config from yaml."""
+    system_setup = read_yaml(system_setup_yaml)
+    print(f"System setup from {system_setup_yaml}:")
+    system_setup["ha"] = system_setup.get("ha", False)
+    for x in ["n_clients", "n_servers", "poc", "snapshot_path", "job_store_path", "ha"]:
+        if x not in system_setup:
+            raise RuntimeError(f"System setup: {system_setup_yaml} missing required attributes {x}.")
+        print(f"\t{x}: {system_setup[x]}")
+    return system_setup


+def get_test_config(test_config_yaml: str):
+    print(f"Test config from: {test_config_yaml}")
+    test_config = read_yaml(test_config_yaml)
+    test_config["single_app_as_job"] = test_config.get("single_app_as_job", False)
+    test_config["cleanup"] = test_config.get("cleanup", True)
+    for x in ["system_setup", "cleanup", "single_app_as_job"]:
+        if x not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing required attributes {x}.")
+        print(f"\t{x}: {test_config[x]}")
+
+    if test_config["single_app_as_job"]:
+        if "apps_root_dir" not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing apps_root_dir.")
+        print(f"\tapps_root_dir: {test_config['apps_root_dir']}")
+    else:
+        if "jobs_root_dir" not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing jobs_root_dir.")
+        print(f"\tjobs_root_dir: {test_config['jobs_root_dir']}")
+
+    return test_config
+
+
+test_configs = read_yaml("./test_cases.yml")
+
+
 @pytest.fixture(
     scope="class",
-    params=params,
+    params=test_configs["test_configs"],
 )
 def setup_and_teardown(request):
     yaml_path = os.path.join(os.path.dirname(__file__), request.param)
-    print("Loading params from ", yaml_path)
-    test_config = read_yaml(yaml_path)
-    for x in ["system_setup", "cleanup", "jobs_root_dir", "snapshot_path", "job_store_path"]:
-        if x not in test_config:
-            raise RuntimeError(f"YAML {yaml_path} missing required attributes {x}.")
-    cleanup_path(test_config["snapshot_path"])
-    cleanup_path(test_config["job_store_path"])
-    system_setup = read_yaml(test_config["system_setup"])
-    for x in ["n_clients", "n_servers", "poc"]:
-        if x not in system_setup:
-            raise RuntimeError(f"system setup {test_config['system_setup']} missing required attributes {x}.")
+    test_config = get_test_config(yaml_path)
+    system_setup = get_system_setup(test_config["system_setup"])

-    jobs_root_dir = test_config["jobs_root_dir"]
-    snapshot_path = test_config["snapshot_path"]
-    job_store_path = test_config["job_store_path"]
     cleanup = test_config["cleanup"]

     poc = system_setup["poc"]
-    ha = system_setup.get("ha", False)
+    snapshot_path = system_setup["snapshot_path"]
+    job_store_path = system_setup["job_store_path"]
+    cleanup_path(snapshot_path)
+    cleanup_path(job_store_path)

+    ha = system_setup["ha"]
     poc = POCDirectory(poc_dir=poc, ha=ha)
     site_launcher = SiteLauncher(poc_directory=poc)
     if ha:
         site_launcher.start_overseer()
     site_launcher.start_servers(n=system_setup["n_servers"])
     site_launcher.start_clients(n=system_setup["n_clients"])

-    print(f"cleanup = {cleanup}")
-    print(f"poc = {poc}")
-    print(f"jobs_root_dir = {jobs_root_dir}")
-    print(f"snapshot_path = {snapshot_path}")
-    print(f"job_store_path = {job_store_path}")
-
     # testing jobs
     test_jobs = []
-    generated_jobs = []
-    for x in test_config["tests"]:
-        if "job_name" in x:
-            test_jobs.append((x["job_name"], x["validators"], x.get("setup", []), x.get("teardown", [])))
-            continue
-        job_dir = generate_job_dir_for_single_app_job(
-            app_name=x["app_name"],
-            app_root_folder=test_config["apps_root_dir"],
-            clients=[x["name"] for x in site_launcher.client_properties.values()],
-            destination=jobs_root_dir,
-            app_as_job=True,
-        )
-        test_jobs.append(
-            (
-                x["app_name"],
-                x["validators"],
-                x.get("setup", []),
-                x.get("teardown", []),
-                x.get("event_sequence_yaml", ""),
+    if test_config["single_app_as_job"]:
+        jobs_root_dir = tempfile.mkdtemp()
+        for x in test_config["tests"]:
+            _ = generate_job_dir_for_single_app_job(
+                app_name=x["app_name"],
+                app_root_folder=test_config["apps_root_dir"],
+                clients=[x["name"] for x in site_launcher.client_properties.values()],
+                destination=jobs_root_dir,
+                app_as_job=True,
+            )
+            test_jobs.append(
+                (
+                    x["app_name"],
+                    x["validators"],
+                    x.get("setup", []),
+                    x.get("teardown", []),
+                    x.get("event_sequence_yaml", ""),
+                )
             )
-        )
-        generated_jobs.append(job_dir)
+    else:
+        jobs_root_dir = test_config["jobs_root_dir"]
+        for x in test_config["tests"]:
+            test_jobs.append((x["job_name"], x["validators"], x.get("setup", []), x.get("teardown", [])))

     admin_controller = AdminController(jobs_root_dir=jobs_root_dir, ha=ha)
     if not admin_controller.initialize():
@@ -138,9 +156,9 @@ def setup_and_teardown(request):
     site_launcher.cleanup()

     if cleanup:
-        for job_dir in generated_jobs:
-            print(f"Cleaning up job {job_dir}")
-            shutil.rmtree(job_dir)
+        if test_config["single_app_as_job"]:
+            print(f"Cleaning up generated job dir {jobs_root_dir}")
+            shutil.rmtree(jobs_root_dir)
         cleanup_path(snapshot_path)
         cleanup_path(job_store_path)

4 changes: 4 additions & 0 deletions tests/integration_test/test_cases.yml
@@ -0,0 +1,4 @@
test_configs:
- ./data/single_app_as_job/test_example_apps.yml
- ./data/single_app_as_job/test_internal_apps.yml
- ./data/single_app_as_job/test_ha.yml
