Add cross-site evaluation utility and examples #3923

nvkevlu · 2025-12-30T16:24:04Z

Add cross-site evaluation utility and examples.

Description

This supersedes #3895 and builds on #3695 to add a cross-site evaluation utility and examples.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

greptile-apps · 2025-12-30T16:30:36Z

Greptile Summary

This PR adds cross-site evaluation (CSE) utility and examples to NVFlare, replacing the previous approach from PR #3895 and integrating with the Recipe API from PR #3695.

Key changes:

Introduced NumpyCrossSiteEvalRecipe in nvflare/app_common/np/recipes/cross_site_eval.py for standalone cross-site evaluation with pre-trained NumPy models
Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py to augment existing recipes with CSE capabilities, following the same pattern as add_experiment_tracking()
Consolidated two separate job files (job_cse.py and job_train_and_cse.py) into a unified job.py with mode flags for cleaner examples
Updated hello-pt example to demonstrate CSE with PyTorch using the --cross_site_eval flag
Added comprehensive documentation explaining both standalone CSE and training+CSE workflows

Architecture:
The implementation uses a registry pattern for model locators (PyTorch and NumPy) and properly separates server-side components (ModelLocator, CrossSiteModelEval controller) from client-side components (validators). The utility function includes detailed documentation about the requirement to manually add NPValidator to clients when using NumPy recipes.
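The registry pattern mentioned above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the code in nvflare/recipe/utils.py: the stub classes and the create_model_locator() helper are hypothetical, the "numpy" key is assumed, and only the registry name MODEL_LOCATOR_REGISTRY and the "pytorch" default come from this PR.

```python
from typing import Any, Callable, Dict, Optional


# Hypothetical stand-ins for the real PyTorch and NumPy model locators; the
# actual NVFlare classes and their constructor arguments may differ.
class PTModelLocatorStub:
    def __init__(self, **config: Any) -> None:
        self.config = config


class NPModelLocatorStub:
    def __init__(self, **config: Any) -> None:
        self.config = config


# Maps the user-facing model_locator_type string to a locator factory.
MODEL_LOCATOR_REGISTRY: Dict[str, Callable[..., Any]] = {
    "pytorch": PTModelLocatorStub,
    "numpy": NPModelLocatorStub,
}


def create_model_locator(model_locator_type: str, model_locator_config: Optional[dict] = None) -> Any:
    """Look up a locator factory by name and build it from an optional config dict."""
    factory = MODEL_LOCATOR_REGISTRY.get(model_locator_type)
    if factory is None:
        supported = ", ".join(sorted(MODEL_LOCATOR_REGISTRY))
        raise ValueError(f"Unsupported model_locator_type {model_locator_type!r}; expected one of: {supported}")
    return factory(**(model_locator_config or {}))
```

With this shape, supporting another framework only means adding one registry entry, while callers keep passing a plain string such as "numpy".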

Minor issue:
One copyright year inconsistency in client.py (2026 instead of 2025).

Confidence Score: 5/5

This PR is safe to merge with only a minor style issue
The implementation is well-structured with clean separation of concerns, comprehensive documentation, and consistent patterns. The code follows existing NVFlare conventions, includes detailed docstrings, and provides clear examples. The only issue is a minor copyright year inconsistency that doesn't affect functionality.
No files require special attention - the copyright year fix in client.py is a trivial style correction

Important Files Changed

Filename	Overview
examples/hello-world/hello-numpy-cross-val/client.py	New training script for cross-site eval example - implements simple NumPy training and evaluation, minor copyright year issue
nvflare/app_common/np/recipes/cross_site_eval.py	New recipe for standalone cross-site evaluation with pre-trained NumPy models - clean implementation with good documentation
nvflare/recipe/utils.py	Added `add_cross_site_evaluation` utility function with comprehensive docs explaining NumPy validator requirements - well-structured with registries
examples/hello-world/hello-numpy-cross-val/job.py	Unified job script supporting both standalone CSE and training+CSE modes using Recipe API - clean implementation demonstrating two workflow patterns
examples/hello-world/hello-pt/client.py	Added CSE support with `flare.is_evaluate()` check to handle evaluation-only tasks without training
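The `flare.is_evaluate()` check in hello-pt/client.py separates evaluation-only tasks from training. The sketch below shows that control flow in plain Python without the NVFlare Client API; handle_task() and its parameters are hypothetical names chosen for illustration, not the example's actual code.

```python
from typing import Callable, Dict, Optional, Tuple

Params = Dict[str, float]
Metrics = Dict[str, float]


def handle_task(
    is_evaluate: bool,
    params: Params,
    train_fn: Callable[[Params], Params],
    eval_fn: Callable[[Params], Metrics],
) -> Tuple[Optional[Params], Metrics]:
    """Return (updated_params, metrics) for one received task.

    Evaluation-only tasks, such as cross-site validation, score the received
    model unchanged and return no new weights; training tasks train first.
    """
    if is_evaluate:
        return None, eval_fn(params)
    new_params = train_fn(params)
    return new_params, eval_fn(new_params)
```

The key property is that an evaluation task never calls the training function, so a client can score other sites' models without modifying them.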

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe
    participant Server
    participant ModelLocator
    participant CSEController as CrossSiteModelEval
    participant Client1
    participant Client2
    participant Validator

    User->>Recipe: Create recipe (FedAvg or CSE-only)
    User->>Recipe: add_cross_site_evaluation()
    
    Note over Recipe: Adds ModelLocator, CrossSiteModelEval,<br/>ValidationJsonGenerator to server
    
    User->>Recipe: execute(env)
    Recipe->>Server: Initialize job
    
    Note over Server,Client2: Training Phase (if FedAvg mode)
    Server->>Client1: Training tasks
    Server->>Client2: Training tasks
    Client1-->>Server: Updated models
    Client2-->>Server: Updated models
    
    Note over Server,Client2: Cross-Site Evaluation Phase
    Server->>CSEController: Start CSE workflow
    CSEController->>ModelLocator: Get models to evaluate
    ModelLocator-->>CSEController: Return model references
    
    CSEController->>Client1: Submit models for validation
    CSEController->>Client2: Submit models for validation
    
    Client1->>Validator: Validate all models on local data
    Client2->>Validator: Validate all models on local data
    
    Validator-->>Client1: Evaluation metrics
    Validator-->>Client2: Evaluation metrics
    
    Client1-->>CSEController: Return validation results
    Client2-->>CSEController: Return validation results
    
    CSEController->>Server: Aggregate cross-site results
    Server->>Server: ValidationJsonGenerator saves results
    
    Server-->>User: Job complete with CSE matrix
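The cross-site evaluation phase in the diagram amounts to every client validating every model on its own data, which yields a client-by-model matrix. The following is a toy sketch of that idea: models are single scalars, and the nested client → model → metrics layout is an assumption for illustration; the actual file written by ValidationJsonGenerator may be structured differently.

```python
import json
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def run_cross_site_eval(
    models: Dict[str, float],
    client_data: Dict[str, List[float]],
    validate: Callable[[float, List[float]], Metrics],
) -> Dict[str, Dict[str, Metrics]]:
    """Have every client validate every model on its own local data.

    Returns results[client_name][model_name] -> metrics dict.
    """
    return {
        client: {name: validate(weight, data) for name, weight in models.items()}
        for client, data in client_data.items()
    }


def mean_abs_error(weight: float, data: List[float]) -> Metrics:
    # Toy "model": a single scalar prediction scored against local samples.
    return {"mae": sum(abs(x - weight) for x in data) / len(data)}


models = {"global_model": 2.0, "site-1": 1.0, "site-2": 3.0}
client_data = {"site-1": [1.0, 1.0], "site-2": [3.0, 5.0]}
matrix = run_cross_site_eval(models, client_data, mean_abs_error)
print(json.dumps(matrix, indent=2))
```

Reading a row shows how one site's data scores every model. In this toy data, site-1's own model fits its data best (MAE 0.0), which is the kind of generalization gap CSE is meant to expose.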

greptile-apps

Additional Comments (3)

examples/hello-world/hello-numpy-cross-val/job.py, line 90 (link)

logic: ValidationJsonGenerator is NOT added automatically when using plain FedJob. It's only added automatically by BaseFedJob. You need to explicitly add it here.
examples/hello-world/hello-pt/README.md, line 32 (link)

syntax: Typo: "traiing" should be "training"
examples/hello-world/hello-pt/README.md, line 48 (link)

syntax: Misplaced # character after comma - should be a space

9 files reviewed, 3 comments

Copilot

Pull request overview

This PR adds a cross-site evaluation utility function and examples to NVFlare's Recipe API, enabling users to easily evaluate models across different client sites without sharing data.

Key Changes:

Added add_cross_site_evaluation() utility function to nvflare/recipe/utils.py for programmatically enabling cross-site model evaluation
Added cross-site evaluation support to the PyTorch hello-world example with a --cross_site_eval command-line flag
Unified the NumPy cross-validation example into a single job.py with two modes: standalone CSE and training+CSE

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.

File	Description
nvflare/recipe/utils.py	Adds `add_cross_site_evaluation()` utility and MODEL_LOCATOR_REGISTRY for PyTorch and NumPy
examples/hello-world/hello-pt/job.py	Adds cross-site evaluation support via `--cross_site_eval` flag and `--train_script` parameter
examples/hello-world/hello-pt/README.md	Updates documentation with cross-site evaluation instructions and usage examples
examples/hello-world/hello-numpy-cross-val/job.py	Replaces multiple job scripts with unified implementation supporting pretrained and training modes
examples/hello-world/hello-numpy-cross-val/client.py	Adds NumPy training client script with training and evaluation functions
examples/hello-world/hello-numpy-cross-val/generate_pretrain_models.py	Updates comment to clarify cross-site evaluation terminology
examples/hello-world/hello-numpy-cross-val/README.md	Comprehensive rewrite documenting both CSE modes and Recipe API usage patterns
examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py	Deleted - functionality consolidated into unified job.py
examples/hello-world/hello-numpy-cross-val/job_cse.py	Deleted - functionality consolidated into unified job.py

examples/hello-world/hello-pt/README.md

examples/hello-world/hello-numpy-cross-val/job.py

nvflare/recipe/utils.py

examples/hello-world/hello-pt/README.md

greptile-apps

Additional Comments (1)

nvflare/recipe/utils.py, line 107-108 (link)

logic: comp_ids exists only on BaseFedJob, not on plain FedJob. Because NumpyFedAvgRecipe uses a plain FedJob, calling add_cross_site_evaluation with persistor_id=None will raise an AttributeError at runtime.
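One common way to avoid the AttributeError described above is to read the attribute with a default instead of accessing it directly. This is a minimal sketch with hypothetical stub classes; the "persistor_id" key inside comp_ids is an assumption, and this is not necessarily how the PR resolved the issue.

```python
from typing import Optional


class FedJobStub:
    """Hypothetical stand-in for a plain job object that has no comp_ids attribute."""


class BaseFedJobStub(FedJobStub):
    """Hypothetical stand-in for a job type that tracks component IDs."""

    def __init__(self, comp_ids: dict) -> None:
        self.comp_ids = comp_ids


def resolve_persistor_id(job: object, persistor_id: Optional[str] = None) -> Optional[str]:
    if persistor_id is not None:
        return persistor_id  # an explicitly passed ID always wins
    # getattr() with a default avoids AttributeError on jobs lacking comp_ids.
    comp_ids = getattr(job, "comp_ids", None) or {}
    return comp_ids.get("persistor_id")  # assumed key name
```

Returning None (rather than raising) leaves the caller free to decide whether a missing persistor is an error for the chosen model locator type.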

9 files reviewed, 1 comment

nvkevlu · 2025-12-30T16:57:34Z

/build

greptile-apps

Additional Comments (2)

nvflare/recipe/utils.py, line 39-50 (link)

style: CLIENT_MODEL_DIR constant is defined but never used in the PR. The run_pretrained_cse() function in job.py references CLIENT_MODEL_DIR directly without using this registry value.
examples/hello-world/hello-numpy-cross-val/job.py, line 38-39 (link)

style: CLIENT_MODEL_DIR is defined but never used anywhere in the code.

9 files reviewed, 2 comments

examples/hello-world/hello-numpy-cross-val/README.md

examples/hello-world/hello-pt/job.py

examples/hello-world/hello-numpy-cross-val/job.py

examples/hello-world/hello-numpy-cross-val/README.md

greptile-apps

Additional Comments (2)

nvflare/recipe/utils.py, line 79 (link)

style: this parameter should have a type annotation for consistency with the other parameters

examples/hello-world/hello-numpy-cross-val/README.md, line 140 (link)

syntax: Missing import statement for SimEnv

9 files reviewed, 2 comments

greptile-apps

Additional Comments (1)

examples/hello-world/hello-numpy-cross-val/job.py, line 39 (link)

style: CLIENT_MODEL_DIR is defined but never used in the code

9 files reviewed, 1 comment

nvkevlu · 2026-01-06T21:42:09Z

/build

greptile-apps

Additional Comments (5)

nvflare/app_common/np/recipes/__init__.py, line 15 (link)

syntax: NumpyCrossSiteEvalRecipe is imported, but the implementation file cross_site_eval.py doesn't exist in this PR, which will cause an ImportError at runtime.
nvflare/app_common/np/recipes/__init__.py, line 18 (link)

syntax: NumpyCrossSiteEvalRecipe is exported but doesn't exist, causing runtime errors.
examples/hello-world/hello-numpy-cross-val/job.py, line 29 (link)

syntax: NumpyCrossSiteEvalRecipe is imported but doesn't exist in nvflare/app_common/np/recipes/, causing an ImportError.

Either add the missing cross_site_eval.py implementation file or use manual job construction as shown in the deleted job_cse.py file.
examples/hello-world/hello-numpy-cross-val/README.md, line 107-109 (link)

style: Documentation states to "add cross-site validation with one line" but the code example requires 3 additional lines (importing NPValidator, creating it, and adding it to clients) before calling add_cross_site_evaluation(). Update to clarify the NumPy-specific setup required.
nvflare/recipe/utils.py, line 166 (link)

style: the formatter_id parameter is missing from CrossSiteModelEval. The old implementation (deleted job_cse.py) included NPFormatter. Verify whether a formatter is needed for NumPy models.

10 files reviewed, 5 comments

examples/hello-world/hello-pt/README.md

examples/hello-world/hello-pt/job.py

examples/hello-world/hello-numpy-cross-val/client.py

greptile-apps

Additional Comments (3)

nvflare/app_common/np/recipes/__init__.py, line 15 (link)

syntax: the file cross_site_eval.py doesn't exist in the nvflare/app_common/np/recipes/ directory, causing an import failure
nvflare/app_common/np/recipes/__init__.py, line 18 (link)

syntax: remove NumpyCrossSiteEvalRecipe since the class doesn't exist
examples/hello-world/hello-numpy-cross-val/job.py, line 29 (link)

logic: NumpyCrossSiteEvalRecipe doesn't exist and will cause a runtime error when using --mode pretrained. Is this recipe defined in PR #3695, or should it be added in this PR?

10 files reviewed, 3 comments

nvkevlu · 2026-01-07T15:53:20Z

/build

greptile-apps

Additional Comments (1)

examples/hello-world/hello-numpy-cross-val/client.py, line 1 (link)

syntax: copyright year should be 2025, not 2026

13 files reviewed, 1 comment

YuanTingHsieh · 2026-01-07T20:56:40Z

Errors in /home/runner/work/NVFlare/NVFlare/examples/hello-world/hello-numpy-cross-val/README.md

[404] https://nvflare.readthedocs.io/en/main/programming_guide/recipe_api.html | Rejected status code (this depends on your "accept" configuration): Not Found

nvkevlu · 2026-01-07T21:15:25Z

thanks for catching that, should be fixed now

greptile-apps

14 files reviewed, 1 comment

examples/hello-world/hello-numpy-cross-val/client.py

nvkevlu · 2026-01-07T21:54:51Z

/build

holgerroth

Looks good!

greptile-apps

Greptile Overview

Greptile Summary

This PR adds cross-site evaluation (CSE) utilities and examples to NVFlare, building on previous work in PR #3695. The implementation introduces a new NumpyCrossSiteEvalRecipe for standalone CSE workflows and an add_cross_site_evaluation() utility function to add CSE to existing training recipes.

Key Changes:

Added NumpyCrossSiteEvalRecipe class for standalone cross-site evaluation with pre-trained NumPy models
Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py to augment any recipe with CSE capabilities
Created comprehensive examples for both NumPy and PyTorch showing standalone CSE and training+CSE workflows
Updated hello-pt example to support CSE via --cross_site_eval flag
PyTorch client now properly handles evaluation-only tasks with a flare.is_evaluate() branch
Updated documentation and Jupyter notebooks with CSE usage examples

Minor Issues Found:

Copyright year inconsistency in client.py (2026 vs 2025 in other files)
API consistency: add_cross_site_evaluation is not exported from nvflare/recipe/__init__.py, while add_experiment_tracking is

The implementation follows good architectural patterns with proper separation of concerns, clear documentation, and safe model loading (allow_pickle=False). The utility function design is extensible with a registry pattern for different model locator types.
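The `allow_pickle=False` point can be illustrated with plain NumPy: plain numeric arrays round-trip normally, while object arrays (which require pickle) are refused at load time. The save_model()/load_model() helpers and the file names are hypothetical; this is not the example's actual loading code.

```python
import os
import tempfile

import numpy as np


def save_model(path: str, weights: np.ndarray) -> None:
    np.save(path, weights)


def load_model(path: str) -> np.ndarray:
    # allow_pickle=False makes np.load raise ValueError on object arrays,
    # so loading an untrusted file cannot execute code via pickle.
    return np.load(path, allow_pickle=False)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "site-1.npy")
    save_model(path, np.arange(6, dtype=np.float32).reshape(2, 3))
    weights = load_model(path)
    print(weights.shape, weights.dtype)  # (2, 3) float32
```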

Confidence Score: 4/5

This PR is safe to merge with only minor style improvements recommended
The code is well-architected with proper error handling, security practices (no pickle), and comprehensive documentation. The only issues found are minor style inconsistencies (copyright year typo and import export pattern). Core functionality is sound with proper separation between training and validation workflows, safe fallback logic for missing attributes, and clear API patterns.
examples/hello-world/hello-numpy-cross-val/client.py needs copyright year correction from 2026 to 2025. Consider exporting add_cross_site_evaluation from nvflare/recipe/__init__.py for consistency.

Important Files Changed

File Analysis

Filename	Score	Overview
nvflare/app_common/np/recipes/cross_site_eval.py	5/5	New recipe class for standalone cross-site evaluation with NumPy models. Well-structured with clear docstrings and proper component initialization.
nvflare/recipe/utils.py	4/5	Added `add_cross_site_evaluation` utility function with model locator registry. Good abstraction, but API consistency could be improved by exporting it from `__init__.py`.
examples/hello-world/hello-numpy-cross-val/client.py	4/5	Training script for NumPy cross-site eval example. Has copyright year typo (2026 should be 2025).
examples/hello-world/hello-numpy-cross-val/job.py	5/5	Unified script supporting both standalone CSE and training+CSE modes. Clean implementation following recipe API patterns.
examples/hello-world/hello-pt/job.py	5/5	Added cross-site evaluation support via command-line flag. Clean integration with existing recipe.
examples/hello-world/hello-pt/client.py	5/5	Added `is_evaluate()` branch for cross-site evaluation tasks. Proper separation of training and evaluation logic.

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe
    participant Server
    participant ModelLocator
    participant Client1
    participant Client2
    
    Note over User,Client2: Cross-Site Evaluation Workflow
    
    User->>Recipe: add_cross_site_evaluation(recipe, model_locator_type)
    Recipe->>Server: Add ModelLocator component
    Recipe->>Server: Add CrossSiteModelEval controller
    Recipe->>Server: Add ValidationJsonGenerator
    
    Note over User,Client2: Job Execution Starts
    
    Server->>ModelLocator: get_model_names()
    ModelLocator-->>Server: List of model names
    
    loop For each model
        Server->>ModelLocator: locate_model(model_name)
        ModelLocator-->>Server: Model DXO
        Server->>Server: Save model to cross_val_models_dir
    end
    
    Note over Server,Client2: Distribute models for validation
    
    Server->>Client1: Send TASK_VALIDATION (model_1)
    Server->>Client2: Send TASK_VALIDATION (model_1)
    
    Client1->>Client1: NPValidator.execute() or is_evaluate()
    Client1->>Client1: Validate model on local data
    Client1-->>Server: Return validation metrics
    
    Client2->>Client2: NPValidator.execute() or is_evaluate()
    Client2->>Client2: Validate model on local data
    Client2-->>Server: Return validation metrics
    
    Server->>Server: Collect results in validation matrix
    
    Note over Server,Client2: Repeat for all models
    
    Server->>Server: ValidationJsonGenerator saves cross_val_results.json
    Server-->>User: Job complete with validation matrix
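The server-side loop at the top of this diagram (get_model_names(), then locate_model() per name) can be sketched with an in-memory locator. InMemoryModelLocator and collect_server_models() are hypothetical; the real ModelLocator methods return DXO objects and their signatures may differ (for example, they may take a context argument), which this sketch omits.

```python
from typing import Dict, List, Optional


class InMemoryModelLocator:
    """Hypothetical locator mirroring the diagram: list names, then fetch each one."""

    def __init__(self, models: Dict[str, List[float]]) -> None:
        self._models = models

    def get_model_names(self) -> List[str]:
        return sorted(self._models)

    def locate_model(self, model_name: str) -> Optional[List[float]]:
        return self._models.get(model_name)


def collect_server_models(locator: InMemoryModelLocator) -> Dict[str, List[float]]:
    """Gather every model the locator can resolve before distributing them for validation."""
    collected: Dict[str, List[float]] = {}
    for name in locator.get_model_names():
        model = locator.locate_model(name)
        if model is None:
            continue  # skip names the locator cannot resolve
        collected[name] = model
    return collected
```

Separating "which models exist" from "load this model" lets the controller stay framework-agnostic while each locator handles its own storage format.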

greptile-apps · 2026-01-08T16:15:18Z

examples/hello-world/hello-numpy-cross-val/client.py

@@ -0,0 +1,93 @@
+# Copyright (c) 2026, NVIDIA CORPORATION.  All rights reserved.


copyright year should be 2025, not 2026 (inconsistent with all other files in this PR which use 2025)

greptile-apps · 2026-01-08T16:15:19Z

nvflare/recipe/utils.py

+def add_cross_site_evaluation(
+    recipe: Recipe,
+    model_locator_type: str = "pytorch",
+    model_locator_config: Optional[dict] = None,
+    persistor_id: Optional[str] = None,
+    submit_model_timeout: int = 600,
+    validation_timeout: int = 6000,
+):


consider exporting add_cross_site_evaluation from nvflare/recipe/__init__.py for consistency with add_experiment_tracking (which is exported). This would make the API more consistent and allow users to import both utilities from the same location: from nvflare.recipe import add_experiment_tracking, add_cross_site_evaluation

nvkevlu · 2026-01-08T23:22:46Z

/build

Add cross-site evaluation utility and examples

511a00f

Copilot AI review requested due to automatic review settings December 30, 2025 16:24

Copilot started reviewing on behalf of nvkevlu December 30, 2025 16:24

greptile-apps bot reviewed Dec 30, 2025

fix consistency

535b2b6

Copilot AI reviewed Dec 30, 2025

greptile-apps bot reviewed Dec 30, 2025

nvkevlu and others added 2 commits December 30, 2025 11:56

fix PR issues

3b51953

Merge branch 'main' into combined_cross_site_eval_recipe

b03c6d3

nvkevlu mentioned this pull request Dec 30, 2025

Add recipe for hello cross site eval #3895

Closed

6 tasks

greptile-apps bot reviewed Dec 30, 2025

YuanTingHsieh requested a review from holgerroth January 2, 2026 22:51

YuanTingHsieh reviewed Jan 2, 2026

nvkevlu and others added 2 commits January 6, 2026 10:43

Merge branch 'main' into combined_cross_site_eval_recipe

f565a16

update

f7a9c52

greptile-apps bot reviewed Jan 6, 2026

nvkevlu added 2 commits January 6, 2026 16:18

fix issues

bd6a64a

update

9ab66d5

greptile-apps bot reviewed Jan 6, 2026

nvkevlu added 2 commits January 6, 2026 16:36

polish

70e5593

polish

d8dd961

greptile-apps bot reviewed Jan 6, 2026