Clean up CI integration tests (#516)

* Clean up CI integration tests
* Remove job root dir in single app yaml
* Fix issue
* Add README to test
* Fix typo
NVIDIA · May 12, 2022 · 192d180
1 parent 06d20e1
commit 192d180
Showing 10 changed files with 225 additions and 71 deletions.
diff --git a/tests/README.md b/tests/README.md
@@ -0,0 +1,39 @@
+# NVIDIA FLARE Tests
+
+
+This file introduces how the tests in NVIDIA FLARE are organized.
+
+We divide the tests into unit tests and integration tests.
+
+```commandline
+tests:
+  - unit_test
+  - integration_test
+```
+
+## Unit Tests
+
+### Structure
+
+The unit tests are organized in directories that parallel the production code.
+
+Each directory in `tests/unit_test` maps to its counterpart in `nvflare`.
+
+For example, we have `tests/unit_test/app_common/job_schedulers/job_scheduler.py`
+that tests `nvflare/app_common/job_schedulers/job_scheduler.py`.
+
+### Run
+
+To run the unit tests: `./runtest.sh`.
+
+### Develop a test case
+
+We use pytest to run our unit tests,
+so please follow the pytest test case style.
+
+## Integration Tests
+
+Please refer to [integration tests README](./integration_test/README.md).
+
+
+
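A minimal sketch of the layout rule and the pytest style that this README describes. The helper `unit_test_path_for` is hypothetical and not part of NVIDIA FLARE; it assumes the test root is `tests/unit_test`, matching the repository paths shown in this diff.

```python
from pathlib import PurePosixPath


def unit_test_path_for(source_path: str, test_root: str = "tests/unit_test") -> str:
    """Map a production file under nvflare/ to its mirrored unit test location.

    Hypothetical helper for illustration only.
    """
    parts = PurePosixPath(source_path).parts
    if not parts or parts[0] != "nvflare":
        raise ValueError(f"{source_path} is not under nvflare/")
    # Drop the leading "nvflare" component and re-root the rest under the test root.
    return str(PurePosixPath(test_root, *parts[1:]))


# pytest-style test case: a plain function whose name starts with "test_"
# and that uses bare assert statements.
def test_unit_test_path_mirrors_source_layout():
    source = "nvflare/app_common/job_schedulers/job_scheduler.py"
    expected = "tests/unit_test/app_common/job_schedulers/job_scheduler.py"
    assert unit_test_path_for(source) == expected
```

pytest collects functions like `test_unit_test_path_mirrors_source_layout` by name, so no test class or runner boilerplate is needed.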
diff --git a/tests/integration_test/README.md b/tests/integration_test/README.md
@@ -0,0 +1,93 @@
+# NVIDIA FLARE Integration Tests
+
+The entry file for the integration tests is `tests/integration_test/system_test.py`.
+
+## Run
+
+First switch to this folder, then run:
+
+`PYTHONPATH=[path/to/your/NVFlare] ./run_integration_tests.sh`
+
+
+## Test structure
+
+The main test will read test configurations from `./test_cases.yml`.
+
+### Test config
+
+Each test configuration yaml should contain the following attributes:
+
+| Attribute           | Description                                                      |
+|---------------------|------------------------------------------------------------------|
+| `system_setup`      | Path to the system setup yaml (number of servers and clients).   |
+| `single_app_as_job` | Whether the test cases are single-app jobs. (Defaults to False.) |
+| `cleanup`           | Whether to clean up test folders. (Defaults to True.)            |
+| `tests`             | The test cases to run.                                           |
+
+If `single_app_as_job` is True, `apps_root_dir` is required.
+Otherwise, `jobs_root_dir` is required.
+
+### System setup config
+
+Each system setup configuration yaml should contain the following attributes:
+
+| Attributes     | Description                                                          |
+|----------------|----------------------------------------------------------------------|
+| poc            | Which poc folder to use (to get fed_server.json and fed_client.json) |
+| n_servers      | Number of servers                                                    |
+| n_clients      | Number of client sites                                               |
+| snapshot_path  | Where to store server snapshot. (needs to match what's inside poc)   |
+| job_store_path | Where to store job information. (needs to match what's inside poc)   |
+
+
+### Test cases
+
+Each test case has the following attributes:
+
+| Attributes           | Description                                                                                  |
+|----------------------|----------------------------------------------------------------------------------------------|
+| app_name or job_name | Name of the app or job folder to test, located in `apps_root_dir` or `jobs_root_dir`.        |
+| validators           | Which validator to use to validate the running result once the job is finished.              |
+| event_sequence_yaml  | Which event sequence to run during each job. (HA test cases)                                 |
+
+### Event sequence
+
+Each event sequence has the following attributes:
+
+| Attributes  | Description                                  |
+|-------------|----------------------------------------------|
+| description | Description of this specific event sequence. |
+| events      | A list of events                             |
+
+Each event has the following attributes:
+
+| Attributes   | Description                                            |
+|--------------|--------------------------------------------------------|
+| trigger      | When to perform the specified action.                  |
+| action       | What actions to take if the trigger is triggered.      |
+| result_state | What state to expect after these actions are finished. |
+
+A trigger fires according to its type:
+  - str: matches a string in the log output from the server
+  - dict: matches a state defined by predefined tracked state variables
+    (workflow, task, round_number, run_finished etc.)
+
+## Folder Structure
+
+- data:
+  - apps: applications for testing
+  - ha: ha test event sequence yaml files
+  - single_app_as_job: test configuration for single app as job test cases
+  - system: system setup config yaml files
+- tf2: TensorFlow 2 code for the applications used in
+  integration tests.
+- validators: Code that implements the logic to validate the running result
+  once the job is finished.
+
+### apps
+
+The applications inside the `apps` folder are treated as single-app jobs.
+
+Each application in the `apps` folder should therefore contain a `config` folder.
+
+The `config` folder should contain `config_fed_server.json` and `config_fed_client.json`.
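The two trigger types described in the "Event sequence" section of this README can be illustrated with a short sketch. The matching code itself is not part of this diff, so the function below is an assumption: `trigger_fired` is a hypothetical name, the string case is modeled as a substring match on one server log line, and the dict case as equality on every listed state variable.

```python
from typing import Any, Dict, Union

Trigger = Union[str, Dict[str, Any]]


def trigger_fired(trigger: Trigger, log_line: str, tracked_state: Dict[str, Any]) -> bool:
    """Decide whether a trigger fires (hypothetical sketch, not the NVFlare code)."""
    if isinstance(trigger, str):
        # Assumption: a str trigger fires when it appears in the server log output.
        return trigger in log_line
    if isinstance(trigger, dict):
        # Assumption: a dict trigger fires when every listed state variable
        # is present in the tracked state with the expected value.
        return all(
            key in tracked_state and tracked_state[key] == value
            for key, value in trigger.items()
        )
    raise TypeError(f"Unsupported trigger type: {type(trigger).__name__}")


state = {"workflow": 0, "task": "train", "round_number": 1, "run_finished": False}
print(trigger_fired("Round 1 started", "INFO - Round 1 started.", state))
print(trigger_fired({"round_number": 1, "run_finished": False}, "", state))
```

Dispatching on the Python type keeps the yaml compact: a plain string is enough for log-based triggers, while a mapping names only the state variables the test cares about.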
diff --git a/tests/integration_test/data/apps/tb_streaming/custom/custom_controller.py b/tests/integration_test/data/apps/tb_streaming/custom/custom_controller.py
@@ -22,6 +22,17 @@
 from nvflare.app_common.app_constant import AppConstants
 
 
+def _prepare_training_ctx(client_task: ClientTask, fl_ctx: FLContext):
+    task = client_task.task
+    fl_ctx.set_prop("current_round", task.props["round"], private=False)
+    fl_ctx.set_prop("total_rounds", task.props["total"], private=False)
+
+
+def _process_training_result(client_task: ClientTask, fl_ctx: FLContext):
+    task = client_task.task
+    task.data = client_task.result
+
+
 class CustomController(Controller):
     def __init__(
         self,
@@ -59,15 +70,6 @@ def start_controller(self, fl_ctx: FLContext):
         self._global_model = self.persistor.load(fl_ctx)
         fl_ctx.set_prop(AppConstants.GLOBAL_MODEL, self._global_model, private=True, sticky=True)
 
-    def _prepare_training_ctx(self, client_task: ClientTask, fl_ctx: FLContext):
-        task = client_task.task
-        fl_ctx.set_prop("current_round", task.props["round"], private=False)
-        fl_ctx.set_prop("total_rounds", task.props["total"], private=False)
-
-    def _process_training_result(self, client_task: ClientTask, fl_ctx: FLContext):
-        task = client_task.task
-        task.data = client_task.result
-
     def process_result_of_unknown_task(
         self,
         client: Client,
@@ -91,8 +93,8 @@ def control_flow(self, abort_signal: Signal, fl_ctx: FLContext):
                 data=self.shareable_gen.learnable_to_shareable(self._global_model, fl_ctx),
                 props={"round": r, "total": self._num_rounds},
                 timeout=0,
-                before_task_sent_cb=self._prepare_training_ctx,
-                result_received_cb=self._process_training_result,
+                before_task_sent_cb=_prepare_training_ctx,
+                result_received_cb=_process_training_result,
             )
 
             client_list = engine.get_clients()
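The hunks above move the two task callbacks from methods to module-level functions. The commit message does not state the reason; a plausible one is that neither callback uses `self`. The sketch below uses stand-in classes (`FakeTask`, `FakeClientTask`, `FakeContext`, and `dispatch` are all hypothetical, not NVFlare types) to show that a plain function works wherever a callable taking `(client_task, fl_ctx)` is expected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeTask:
    """Stand-in for a task object; only the fields the callbacks touch."""
    props: Dict[str, Any]
    data: Any = None


@dataclass
class FakeClientTask:
    """Stand-in for a client task wrapping a task and its result."""
    task: FakeTask
    result: Any = None


class FakeContext:
    """Stand-in for a context that stores properties by key."""

    def __init__(self) -> None:
        self.props: Dict[str, Any] = {}

    def set_prop(self, key: str, value: Any, private: bool = True) -> None:
        self.props[key] = value


def prepare_training_ctx(client_task: FakeClientTask, fl_ctx: FakeContext) -> None:
    # Copy round information from the task into the context before sending.
    task = client_task.task
    fl_ctx.set_prop("current_round", task.props["round"], private=False)
    fl_ctx.set_prop("total_rounds", task.props["total"], private=False)


def process_training_result(client_task: FakeClientTask, fl_ctx: FakeContext) -> None:
    # Store the client's result on the task once it has been received.
    client_task.task.data = client_task.result


Callback = Callable[[FakeClientTask, FakeContext], None]


def dispatch(client_task: FakeClientTask, fl_ctx: FakeContext,
             before_task_sent_cb: Callback, result_received_cb: Callback,
             result: Any) -> None:
    """Simulate the send/receive cycle around the two callbacks."""
    before_task_sent_cb(client_task, fl_ctx)
    client_task.result = result
    result_received_cb(client_task, fl_ctx)


ctx = FakeContext()
ct = FakeClientTask(task=FakeTask(props={"round": 0, "total": 3}))
dispatch(ct, ctx, prepare_training_ctx, process_training_result, result={"weights": [1.0]})
```

Because the callbacks carry no instance state, they can be tested in isolation without constructing a controller.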

diff --git a/...s/integration_test/data/test_examples.yml → ...a/single_app_as_job/test_example_apps.yml b/...s/integration_test/data/test_examples.yml → ...a/single_app_as_job/test_example_apps.yml
@@ -1,8 +1,6 @@
 system_setup: ./data/system/1_server_2_clients.yml
-jobs_root_dir: ./data/jobs
 apps_root_dir: ../../examples
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
+single_app_as_job: True
 cleanup: True
 
 

diff --git a/tests/integration_test/data/test_ha.yml → ...n_test/data/single_app_as_job/test_ha.yml b/tests/integration_test/data/test_ha.yml → ...n_test/data/single_app_as_job/test_ha.yml
@@ -1,8 +1,6 @@
 system_setup: ./data/system/ha_1_server_2_clients.yml
-jobs_root_dir: ./data/jobs
 apps_root_dir: ./data/apps
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
+single_app_as_job: True
 cleanup: True
 
 

diff --git a/...s/integration_test/data/test_internal.yml → .../single_app_as_job/test_internal_apps.yml b/...s/integration_test/data/test_internal.yml → .../single_app_as_job/test_internal_apps.yml
@@ -1,8 +1,6 @@
 system_setup: ./data/system/1_server_2_clients.yml
-jobs_root_dir: ./data/jobs
 apps_root_dir: ./data/apps
-snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
-job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
+single_app_as_job: True
 cleanup: True
 
 

diff --git a/tests/integration_test/data/system/1_server_2_clients.yml b/tests/integration_test/data/system/1_server_2_clients.yml
@@ -1,3 +1,5 @@
 poc: ../../nvflare/poc
 n_servers: 1
 n_clients: 2
+snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
+job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
diff --git a/tests/integration_test/data/system/ha_1_server_2_clients.yml b/tests/integration_test/data/system/ha_1_server_2_clients.yml
@@ -1,4 +1,6 @@
 poc: ../../nvflare/poc
 n_servers: 1
 n_clients: 2
+snapshot_path: /tmp/snapshot-storage # need to match what's inside poc, TODO: auto generate
+job_store_path: /tmp/jobs-storage # need to match what's inside poc, TODO: auto generate
 ha: True
diff --git a/tests/integration_test/system_test.py b/tests/integration_test/system_test.py
@@ -18,6 +18,7 @@
 import shutil
 import subprocess
 import sys
+import tempfile
 import time
 
 import pytest
@@ -51,76 +52,93 @@ def cleanup_path(path: str):
         shutil.rmtree(path)
 
 
-params = [
-    "./data/test_examples.yml",
-    "./data/test_internal.yml",
-    "./data/test_ha.yml",
-]
+def get_system_setup(system_setup_yaml: str):
+    """Gets system setup config from yaml."""
+    system_setup = read_yaml(system_setup_yaml)
+    print(f"System setup from {system_setup_yaml}:")
+    system_setup["ha"] = system_setup.get("ha", False)
+    for x in ["n_clients", "n_servers", "poc", "snapshot_path", "job_store_path", "ha"]:
+        if x not in system_setup:
+            raise RuntimeError(f"System setup: {system_setup_yaml} missing required attributes {x}.")
+        print(f"\t{x}: {system_setup[x]}")
+    return system_setup
+
+
+def get_test_config(test_config_yaml: str):
+    print(f"Test config from:  {test_config_yaml}")
+    test_config = read_yaml(test_config_yaml)
+    test_config["single_app_as_job"] = test_config.get("single_app_as_job", False)
+    test_config["cleanup"] = test_config.get("cleanup", True)
+    for x in ["system_setup", "cleanup", "single_app_as_job"]:
+        if x not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing required attributes {x}.")
+        print(f"\t{x}: {test_config[x]}")
+
+    if test_config["single_app_as_job"]:
+        if "apps_root_dir" not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing apps_root_dir.")
+        print(f"\tapps_root_dir: {test_config['apps_root_dir']}")
+    else:
+        if "jobs_root_dir" not in test_config:
+            raise RuntimeError(f"Test config: {test_config_yaml} missing jobs_root_dir.")
+        print(f"\tjobs_root_dir: {test_config['jobs_root_dir']}")
+
+    return test_config
+
+
+test_configs = read_yaml("./test_cases.yml")
 
 
 @pytest.fixture(
     scope="class",
-    params=params,
+    params=test_configs["test_configs"],
 )
 def setup_and_teardown(request):
     yaml_path = os.path.join(os.path.dirname(__file__), request.param)
-    print("Loading params from ", yaml_path)
-    test_config = read_yaml(yaml_path)
-    for x in ["system_setup", "cleanup", "jobs_root_dir", "snapshot_path", "job_store_path"]:
-        if x not in test_config:
-            raise RuntimeError(f"YAML {yaml_path} missing required attributes {x}.")
-    cleanup_path(test_config["snapshot_path"])
-    cleanup_path(test_config["job_store_path"])
-    system_setup = read_yaml(test_config["system_setup"])
-    for x in ["n_clients", "n_servers", "poc"]:
-        if x not in system_setup:
-            raise RuntimeError(f"system setup {test_config['system_setup']} missing required attributes {x}.")
+    test_config = get_test_config(yaml_path)
+    system_setup = get_system_setup(test_config["system_setup"])
 
-    jobs_root_dir = test_config["jobs_root_dir"]
-    snapshot_path = test_config["snapshot_path"]
-    job_store_path = test_config["job_store_path"]
     cleanup = test_config["cleanup"]
 
     poc = system_setup["poc"]
-    ha = system_setup.get("ha", False)
+    snapshot_path = system_setup["snapshot_path"]
+    job_store_path = system_setup["job_store_path"]
+    cleanup_path(snapshot_path)
+    cleanup_path(job_store_path)
 
+    ha = system_setup["ha"]
     poc = POCDirectory(poc_dir=poc, ha=ha)
     site_launcher = SiteLauncher(poc_directory=poc)
     if ha:
         site_launcher.start_overseer()
     site_launcher.start_servers(n=system_setup["n_servers"])
     site_launcher.start_clients(n=system_setup["n_clients"])
 
-    print(f"cleanup = {cleanup}")
-    print(f"poc = {poc}")
-    print(f"jobs_root_dir = {jobs_root_dir}")
-    print(f"snapshot_path = {snapshot_path}")
-    print(f"job_store_path = {job_store_path}")
-
     # testing jobs
     test_jobs = []
-    generated_jobs = []
-    for x in test_config["tests"]:
-        if "job_name" in x:
-            test_jobs.append((x["job_name"], x["validators"], x.get("setup", []), x.get("teardown", [])))
-            continue
-        job_dir = generate_job_dir_for_single_app_job(
-            app_name=x["app_name"],
-            app_root_folder=test_config["apps_root_dir"],
-            clients=[x["name"] for x in site_launcher.client_properties.values()],
-            destination=jobs_root_dir,
-            app_as_job=True,
-        )
-        test_jobs.append(
-            (
-                x["app_name"],
-                x["validators"],
-                x.get("setup", []),
-                x.get("teardown", []),
-                x.get("event_sequence_yaml", ""),
+    if test_config["single_app_as_job"]:
+        jobs_root_dir = tempfile.mkdtemp()
+        for x in test_config["tests"]:
+            _ = generate_job_dir_for_single_app_job(
+                app_name=x["app_name"],
+                app_root_folder=test_config["apps_root_dir"],
+                clients=[x["name"] for x in site_launcher.client_properties.values()],
+                destination=jobs_root_dir,
+                app_as_job=True,
+            )
+            test_jobs.append(
+                (
+                    x["app_name"],
+                    x["validators"],
+                    x.get("setup", []),
+                    x.get("teardown", []),
+                    x.get("event_sequence_yaml", ""),
+                )
             )
-        )
-        generated_jobs.append(job_dir)
+    else:
+        jobs_root_dir = test_config["jobs_root_dir"]
+        for x in test_config["tests"]:
+            test_jobs.append((x["job_name"], x["validators"], x.get("setup", []), x.get("teardown", [])))
 
     admin_controller = AdminController(jobs_root_dir=jobs_root_dir, ha=ha)
     if not admin_controller.initialize():
@@ -138,9 +156,9 @@ def setup_and_teardown(request):
             site_launcher.cleanup()
 
     if cleanup:
-        for job_dir in generated_jobs:
-            print(f"Cleaning up job {job_dir}")
-            shutil.rmtree(job_dir)
+        if test_config["single_app_as_job"]:
+            print(f"Cleaning up generated job dir {jobs_root_dir}")
+            shutil.rmtree(jobs_root_dir)
         cleanup_path(snapshot_path)
         cleanup_path(job_store_path)
 

diff --git a/tests/integration_test/test_cases.yml b/tests/integration_test/test_cases.yml
@@ -0,0 +1,4 @@
+test_configs:
+  - ./data/single_app_as_job/test_example_apps.yml
+  - ./data/single_app_as_job/test_internal_apps.yml
+  - ./data/single_app_as_job/test_ha.yml
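The entries in `test_cases.yml` are relative paths, and the fixture in `system_test.py` joins each one with the directory of the test file. A small sketch of that resolution step follows; `resolve_test_configs` is a hypothetical helper, the base directory in the usage example is illustrative, and the `normpath` call is an addition of this sketch (the fixture itself only joins the paths).

```python
import posixpath
from typing import Dict, List


def resolve_test_configs(test_cases: Dict[str, List[str]], base_dir: str) -> List[str]:
    """Resolve each test config path against base_dir (hypothetical helper)."""
    return [
        # normpath collapses the leading "./" so the result is easier to read in logs.
        posixpath.normpath(posixpath.join(base_dir, relative_path))
        for relative_path in test_cases["test_configs"]
    ]


# Same structure as test_cases.yml after parsing.
test_cases = {
    "test_configs": [
        "./data/single_app_as_job/test_example_apps.yml",
        "./data/single_app_as_job/test_internal_apps.yml",
        "./data/single_app_as_job/test_ha.yml",
    ]
}
for path in resolve_test_configs(test_cases, "/repo/tests/integration_test"):
    print(path)
```

Note that `system_test.py` loads `./test_cases.yml` itself with a path relative to the working directory, which is consistent with the README instruction to switch to the integration test folder before running.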