Skip to content

[2.8] Fix tracking recipe integration test#4583

Merged
pcnudde merged 2 commits into
NVIDIA:2.8from
YuanTingHsieh:codex/fix-tracking-integration-test
May 13, 2026
Merged

[2.8] Fix tracking recipe integration test#4583
pcnudde merged 2 commits into
NVIDIA:2.8from
YuanTingHsieh:codex/fix-tracking-integration-test

Conversation

@YuanTingHsieh
Copy link
Copy Markdown
Collaborator

@YuanTingHsieh YuanTingHsieh commented May 12, 2026

Summary

  • Replace the dynamic _exp_tracking_model import in the experiment tracking recipe smoke tests with the documented dict model config.
  • Bundle the example model.py into the server app so model.SimpleNetwork resolves when the job runs.
  • Add a dedicated tracking integration backend and route it through the PyTorch CI setup.
  • Refresh the integration test README/backend list and stale manual pytest command.

Why

The tests loaded the example model under a synthetic module name and then passed that live nn.Module instance into FedAvgRecipe. During job export, Python source inspection could not resolve the synthetic module and raised:

TypeError: <class '_exp_tracking_model.SimpleNetwork'> is a built-in class

Using model={"class_path": "model.SimpleNetwork", "args": {}} matches the recipe API and avoids relying on importlib state.

@YuanTingHsieh YuanTingHsieh force-pushed the codex/fix-tracking-integration-test branch from 5770d05 to b2c2754 Compare May 12, 2026 20:53
@YuanTingHsieh YuanTingHsieh marked this pull request as ready for review May 12, 2026 21:04
Copilot AI review requested due to automatic review settings May 12, 2026 21:05
@YuanTingHsieh YuanTingHsieh added the cicd continuous integration/continuous development label May 12, 2026
Comment thread tests/integration_test/test_configs.yml Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the experiment tracking recipe integration smoke tests by aligning them with the documented Recipe API model configuration and ensuring required example code is packaged into the exported job. It also introduces a dedicated tracking integration backend and hooks it into the existing integration test runner and PyTorch CI path.

Changes:

  • Update experiment tracking recipe tests to pass a dict-based model config (class_path/args) instead of a dynamically imported nn.Module instance, avoiding export-time source inspection failures.
  • Bundle the example model.py into the server app so model.SimpleNetwork can be resolved at job runtime.
  • Add a new tracking integration backend (pytest-file driven) and route it through ci/run_integration.sh’s PyTorch integration setup; refresh backend list docs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/integration_test/test_configs.yml Adds tracking to pytest_files so it can run dedicated pytest modules.
tests/integration_test/run_integration_tests.sh Registers tracking as a selectable backend in the integration runner.
tests/integration_test/README.md Updates documented backend list to include tracking.
tests/integration_test/experiment_tracking_recipes_test.py Switches to dict model config + bundles model.py into the job’s server app.
ci/run_integration.sh Routes tracking to the PyTorch integration runner path in CI.
Comments suppressed due to low confidence (1)

tests/integration_test/README.md:29

  • README states that only standalone runs explicit pytest files from pytest_files, but this PR also introduces a tracking backend that runs pytest files via the same mechanism. Please update this section to mention tracking as well to avoid misleading instructions.
The backend options are:
`numpy`, `tensorflow`, `pytorch`, `auth`, `preflight`, `cifar`, `stats`, `xgboost`,
`client_api`, `client_api_qa`, `model_controller_api`, `tracking`, and `standalone`.

`preflight` has its own entry file. Most backend options run through
`tests/integration_test/system_test.py`, and `standalone` runs explicit pytest files listed in
`pytest_files` in `tests/integration_test/test_configs.yml`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/integration_test/experiment_tracking_recipes_test.py
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR fixes a TypeError in the experiment tracking recipe smoke tests by replacing dynamic importlib-based model instantiation with a dict-based model config ({"class_path": "model.SimpleNetwork", "args": {}}), and bundling model.py into both server and client apps via _add_model_to_apps.

  • Model loading fix: Replaces importlib.util.spec_from_file_location(\"_exp_tracking_model\", ...) with {\"class_path\": \"model.SimpleNetwork\", \"args\": {}} to match the FedAvgRecipe API and avoid synthetic module name resolution failures during job export.
  • File distribution fix: Introduces _add_model_to_apps which calls both add_file_to_server and add_file_to_clients, ensuring model.py is available in every site's custom directory at runtime.

Confidence Score: 5/5

Safe to merge — the fix correctly distributes model.py to both server and all client sites, resolving the runtime import failure.

The change is narrow and targeted: it swaps an unreliable importlib trick for a documented dict config, and ensures model.py is copied into every site's custom directory (server via add_file_to_server, clients via add_file_to_clients using ALL_SITES). Both the server-side model instantiation path and the client-side from model import SimpleNetwork import are now satisfied. No logic changes outside the test file.

No files require special attention.

Important Files Changed

Filename Overview
tests/integration_test/experiment_tracking_recipes_test.py Fixes synthetic module import error by switching to dict model config and distributes model.py to both server and all client sites.

Reviews (3): Last reviewed commit: "Merge branch '2.8' into codex/fix-tracki..." | Re-trigger Greptile

Comment thread tests/integration_test/experiment_tracking_recipes_test.py Outdated
Signed-off-by: YuanTingHsieh <yuantingh@nvidia.com>
@YuanTingHsieh YuanTingHsieh force-pushed the codex/fix-tracking-integration-test branch from 0014fdb to 4169833 Compare May 12, 2026 22:10
Copy link
Copy Markdown
Collaborator Author

Scope update after review: this PR has been narrowed to tests/integration_test/experiment_tracking_recipes_test.py only. It no longer adds a tracking integration backend or changes CI runner wiring.

Current fix:

  • replace the synthetic _exp_tracking_model import with the recipe dict model config model.SimpleNetwork
  • bundle model.py into both server and client apps, since the server persistor and client script both need to import it
  • refresh the stale manual pytest filename in the test docstring

Validated locally:

  • python3 -m py_compile tests/integration_test/experiment_tracking_recipes_test.py
  • git diff upstream/2.8...HEAD --check

@pcnudde pcnudde merged commit 5bade21 into NVIDIA:2.8 May 13, 2026
24 checks passed
YuanTingHsieh added a commit to YuanTingHsieh/NVFlare that referenced this pull request May 13, 2026
## Summary
- Replace the dynamic `_exp_tracking_model` import in the experiment
tracking recipe smoke tests with the documented dict model config.
- Bundle the example `model.py` into the server app so
`model.SimpleNetwork` resolves when the job runs.
- Add a dedicated `tracking` integration backend and route it through
the PyTorch CI setup.
- Refresh the integration test README/backend list and stale manual
pytest command.

## Why
The tests loaded the example model under a synthetic module name and
then passed that live `nn.Module` instance into `FedAvgRecipe`. During
job export, Python source inspection could not resolve the synthetic
module and raised:

```text
TypeError: <class '_exp_tracking_model.SimpleNetwork'> is a built-in class
```

Using `model={"class_path": "model.SimpleNetwork", "args": {}}` matches
the recipe API and avoids relying on importlib state.

Signed-off-by: YuanTingHsieh <yuantingh@nvidia.com>
(cherry picked from commit 5bade21)
pcnudde pushed a commit that referenced this pull request May 13, 2026
## Summary

Port the selected 2.8 fixes back to `main` in 2.8 merge order:

- #4528 Add warnings for missing study data mappings
- #4538 Update deploy prepare launcher docs
- #4550 Align `Run.get_result()` with the `clean_up` parameter spelling
- #4561 Clarify `remove_client` token cleanup semantics
- #4563 Respect `CUDA_VISIBLE_DEVICES` in the GPU resource manager
- #4574 Fix Docker SJ workspace tmpfs permissions
- #4576 Narrow client failure reporting for generic launcher execution
errors
- #4583 Fix tracking recipe integration test

---------

Signed-off-by: YuanTingHsieh <yuantingh@nvidia.com>
@YuanTingHsieh YuanTingHsieh deleted the codex/fix-tracking-integration-test branch May 13, 2026 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cicd continuous integration/continuous development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants