@nvkevlu
Collaborator

@nvkevlu nvkevlu commented Dec 30, 2025

Add cross-site evaluation utility and examples.

Description

This PR replaces #3895 and takes #3695 into account to add a cross-site evaluation utility and examples.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings December 30, 2025 16:24
@greptile-apps
Contributor

greptile-apps bot commented Dec 30, 2025

Greptile Summary

This PR adds cross-site evaluation (CSE) utility and examples to NVFlare, replacing the previous approach from PR #3895 and integrating with the Recipe API from PR #3695.

Key changes:

  • Introduced NumpyCrossSiteEvalRecipe in nvflare/app_common/np/recipes/cross_site_eval.py for standalone cross-site evaluation with pre-trained NumPy models
  • Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py to augment existing recipes with CSE capabilities, following the same pattern as add_experiment_tracking()
  • Consolidated two separate job files (job_cse.py and job_train_and_cse.py) into a unified job.py with mode flags for cleaner examples
  • Updated hello-pt example to demonstrate CSE with PyTorch using the --cross_site_eval flag
  • Added comprehensive documentation explaining both standalone CSE and training+CSE workflows

Architecture:
The implementation uses a registry pattern for model locators (PyTorch and NumPy) and properly separates server-side components (ModelLocator, CrossSiteModelEval controller) from client-side components (validators). The utility function includes detailed documentation about the requirement to manually add NPValidator to clients when using NumPy recipes.
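
For readers unfamiliar with the registry pattern mentioned above, here is a minimal, self-contained sketch. It is not the code from `nvflare/recipe/utils.py`: `LocatorSpec`, `resolve_locator`, the dotted class paths, and the default config values are all hypothetical placeholders. Only the name `MODEL_LOCATOR_REGISTRY` and the two keys (`pytorch`, `numpy`) come from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocatorSpec:
    """Hypothetical description of how to build a model locator."""

    locator_class: str  # dotted path, illustrative only
    default_config: dict = field(default_factory=dict)


# Registry keyed by framework name. The entries are placeholders,
# not the real NVFlare locator classes.
MODEL_LOCATOR_REGISTRY: Dict[str, LocatorSpec] = {
    "pytorch": LocatorSpec("example.locators.PTLocator"),
    "numpy": LocatorSpec("example.locators.NPLocator", {"model_dir": "models"}),
}


def resolve_locator(model_locator_type: str, overrides: Optional[dict] = None) -> LocatorSpec:
    """Look up a locator by type and merge caller overrides into its defaults."""
    try:
        spec = MODEL_LOCATOR_REGISTRY[model_locator_type]
    except KeyError:
        supported = ", ".join(sorted(MODEL_LOCATOR_REGISTRY))
        raise ValueError(
            f"Unknown model_locator_type {model_locator_type!r}; supported: {supported}"
        ) from None
    # Copy the defaults so the registry entry itself is never mutated.
    merged = {**spec.default_config, **(overrides or {})}
    return LocatorSpec(spec.locator_class, merged)
```

The benefit of this shape is that supporting another framework means adding one registry entry rather than another branch in the utility function.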

Minor issue:
One copyright year inconsistency in client.py (2026 instead of 2025).

Confidence Score: 5/5

  • This PR is safe to merge with only a minor style issue
  • The implementation is well-structured with clean separation of concerns, comprehensive documentation, and consistent patterns. The code follows existing NVFlare conventions, includes detailed docstrings, and provides clear examples. The only issue is a minor copyright year inconsistency that doesn't affect functionality.
  • No files require special attention - the copyright year fix in client.py is a trivial style correction

Important Files Changed

| Filename | Overview |
|---|---|
| examples/hello-world/hello-numpy-cross-val/client.py | New training script for cross-site eval example - implements simple NumPy training and evaluation, minor copyright year issue |
| nvflare/app_common/np/recipes/cross_site_eval.py | New recipe for standalone cross-site evaluation with pre-trained NumPy models - clean implementation with good documentation |
| nvflare/recipe/utils.py | Added add_cross_site_evaluation utility function with comprehensive docs explaining NumPy validator requirements - well-structured with registries |
| examples/hello-world/hello-numpy-cross-val/job.py | Unified job script supporting both standalone CSE and training+CSE modes using Recipe API - clean implementation demonstrating two workflow patterns |
| examples/hello-world/hello-pt/client.py | Added CSE support with flare.is_evaluate() check to handle evaluation-only tasks without training |

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe
    participant Server
    participant ModelLocator
    participant CSEController as CrossSiteModelEval
    participant Client1
    participant Client2
    participant Validator

    User->>Recipe: Create recipe (FedAvg or CSE-only)
    User->>Recipe: add_cross_site_evaluation()
    
    Note over Recipe: Adds ModelLocator, CrossSiteModelEval,<br/>ValidationJsonGenerator to server
    
    User->>Recipe: execute(env)
    Recipe->>Server: Initialize job
    
    Note over Server,Client2: Training Phase (if FedAvg mode)
    Server->>Client1: Training tasks
    Server->>Client2: Training tasks
    Client1-->>Server: Updated models
    Client2-->>Server: Updated models
    
    Note over Server,Client2: Cross-Site Evaluation Phase
    Server->>CSEController: Start CSE workflow
    CSEController->>ModelLocator: Get models to evaluate
    ModelLocator-->>CSEController: Return model references
    
    CSEController->>Client1: Submit models for validation
    CSEController->>Client2: Submit models for validation
    
    Client1->>Validator: Validate all models on local data
    Client2->>Validator: Validate all models on local data
    
    Validator-->>Client1: Evaluation metrics
    Validator-->>Client2: Evaluation metrics
    
    Client1-->>CSEController: Return validation results
    Client2-->>CSEController: Return validation results
    
    CSEController->>Server: Aggregate cross-site results
    Server->>Server: ValidationJsonGenerator saves results
    
    Server-->>User: Job complete with CSE matrix

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. examples/hello-world/hello-numpy-cross-val/job.py, line 90 (link)

    logic: ValidationJsonGenerator is NOT added automatically when using plain FedJob. It's only added automatically by BaseFedJob. You need to explicitly add it here.

  2. examples/hello-world/hello-pt/README.md, line 32 (link)

    syntax: Typo: "traiing" should be "training"

  3. examples/hello-world/hello-pt/README.md, line 48 (link)

    syntax: Misplaced # character after comma - should be a space

9 files reviewed, 3 comments

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a cross-site evaluation utility function and examples to NVFlare's Recipe API, enabling users to easily evaluate models across different client sites without sharing data.

Key Changes:

  • Added add_cross_site_evaluation() utility function to nvflare/recipe/utils.py for programmatically enabling cross-site model evaluation
  • Added cross-site evaluation support to the PyTorch hello-world example with a --cross_site_eval command-line flag
  • Unified the NumPy cross-validation example into a single job.py with two modes: standalone CSE and training+CSE
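
As an illustration of the "single job.py with two modes" idea, the following is a minimal sketch of dispatching on a mode flag with `argparse`. It does not reproduce the actual `job.py`: the thread only confirms `--mode pretrained` and a function named `run_pretrained_cse()`. The `training` mode name, `run_training_and_cse()`, `--n_clients`, and the returned strings are assumptions made for the example.

```python
import argparse


def run_pretrained_cse(n_clients: int) -> str:
    # Placeholder: a real script would build and execute a CSE-only recipe here.
    return f"standalone CSE on {n_clients} clients"


def run_training_and_cse(n_clients: int) -> str:
    # Placeholder: a real script would build a training recipe and add CSE to it.
    return f"training + CSE on {n_clients} clients"


# Map each mode name to the function that handles it.
MODES = {"pretrained": run_pretrained_cse, "training": run_training_and_cse}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Cross-site evaluation example (sketch)")
    parser.add_argument("--mode", choices=sorted(MODES), default="pretrained")
    parser.add_argument("--n_clients", type=int, default=2)
    return parser.parse_args(argv)


def main(argv=None) -> str:
    args = parse_args(argv)
    return MODES[args.mode](args.n_clients)
```

Calling `main(["--mode", "training", "--n_clients", "3"])` routes to the training handler. Because `choices` restricts the flag, an unsupported mode is rejected by `argparse` before any job is built.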

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 24 comments.

| File | Description |
|---|---|
| nvflare/recipe/utils.py | Adds add_cross_site_evaluation() utility and MODEL_LOCATOR_REGISTRY for PyTorch and NumPy |
| examples/hello-world/hello-pt/job.py | Adds cross-site evaluation support via --cross_site_eval flag and --train_script parameter |
| examples/hello-world/hello-pt/README.md | Updates documentation with cross-site evaluation instructions and usage examples |
| examples/hello-world/hello-numpy-cross-val/job.py | Replaces multiple job scripts with unified implementation supporting pretrained and training modes |
| examples/hello-world/hello-numpy-cross-val/client.py | Adds NumPy training client script with training and evaluation functions |
| examples/hello-world/hello-numpy-cross-val/generate_pretrain_models.py | Updates comment to clarify cross-site evaluation terminology |
| examples/hello-world/hello-numpy-cross-val/README.md | Comprehensive rewrite documenting both CSE modes and Recipe API usage patterns |
| examples/hello-world/hello-numpy-cross-val/job_train_and_cse.py | Deleted - functionality consolidated into unified job.py |
| examples/hello-world/hello-numpy-cross-val/job_cse.py | Deleted - functionality consolidated into unified job.py |

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. nvflare/recipe/utils.py, line 107-108 (link)

    logic: comp_ids doesn't exist on plain FedJob objects - only on BaseFedJob. NumpyFedAvgRecipe uses plain FedJob which will cause AttributeError at runtime when calling add_cross_site_evaluation with persistor_id=None.
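
One way to avoid the `AttributeError` described above is to treat `comp_ids` as optional. The sketch below uses stand-in classes rather than the real `FedJob` and `BaseFedJob`. The helper name `resolve_persistor_id`, the `"persistor_id"` key, and the empty-string fallback are assumptions for illustration, not the fix that was merged.

```python
from typing import Optional


class PlainJob:
    """Stand-in for a job object that does not track component IDs."""


class TrackingJob:
    """Stand-in for a job object that exposes a comp_ids mapping."""

    def __init__(self) -> None:
        self.comp_ids = {"persistor_id": "persistor"}


def resolve_persistor_id(job: object, persistor_id: Optional[str] = None) -> str:
    """Prefer an explicit ID, then the job's comp_ids, then an empty default."""
    if persistor_id is not None:
        return persistor_id
    # getattr with a default avoids AttributeError on jobs without comp_ids.
    comp_ids = getattr(job, "comp_ids", None) or {}
    return comp_ids.get("persistor_id", "")
```

With this shape, `resolve_persistor_id(PlainJob())` falls through to the default instead of raising, while a job that does expose `comp_ids` still supplies its own value.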

9 files reviewed, 1 comment

@nvkevlu
Collaborator Author

nvkevlu commented Dec 30, 2025

/build

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. nvflare/recipe/utils.py, line 39-50 (link)

    style: CLIENT_MODEL_DIR constant is defined but never used in the PR. The run_pretrained_cse() function in job.py references CLIENT_MODEL_DIR directly without using this registry value.

  2. examples/hello-world/hello-numpy-cross-val/job.py, line 38-39 (link)

    style: CLIENT_MODEL_DIR is defined but never used anywhere in the code.

9 files reviewed, 2 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (2)

  1. nvflare/recipe/utils.py, line 79 (link)

    style: parameter should have type annotation for consistency with other parameters

  2. examples/hello-world/hello-numpy-cross-val/README.md, line 140 (link)

    syntax: Missing import statement for SimEnv

9 files reviewed, 2 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. examples/hello-world/hello-numpy-cross-val/job.py, line 39 (link)

    style: CLIENT_MODEL_DIR is defined but never used in the code

9 files reviewed, 1 comment

@nvkevlu
Collaborator Author

nvkevlu commented Jan 6, 2026

/build

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (5)

  1. nvflare/app_common/np/recipes/__init__.py, line 15 (link)

    syntax: NumpyCrossSiteEvalRecipe is imported but the implementation file cross_site_eval.py doesn't exist in this PR, which will cause ImportError at runtime.

  2. nvflare/app_common/np/recipes/__init__.py, line 18 (link)

    syntax: NumpyCrossSiteEvalRecipe is exported but doesn't exist, causing runtime errors.

  3. examples/hello-world/hello-numpy-cross-val/job.py, line 29 (link)

    syntax: NumpyCrossSiteEvalRecipe is imported but doesn't exist in nvflare/app_common/np/recipes/, causing ImportError.

    Either add the missing cross_site_eval.py implementation file or use manual job construction as shown in the deleted job_cse.py file.

  4. examples/hello-world/hello-numpy-cross-val/README.md, line 107-109 (link)

    style: Documentation states to "add cross-site validation with one line" but the code example requires 3 additional lines (importing NPValidator, creating it, and adding it to clients) before calling add_cross_site_evaluation(). Update to clarify the NumPy-specific setup required.

  5. nvflare/recipe/utils.py, line 166 (link)

    style: formatter_id parameter missing from CrossSiteModelEval. The old implementation (deleted job_cse.py) included NPFormatter. Verify if formatter is needed for numpy models.

10 files reviewed, 5 comments

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (3)

  1. nvflare/app_common/np/recipes/__init__.py, line 15 (link)

    syntax: the file cross_site_eval.py doesn't exist in the nvflare/app_common/np/recipes/ directory, causing import failure

  2. nvflare/app_common/np/recipes/__init__.py, line 18 (link)

    syntax: remove NumpyCrossSiteEvalRecipe since the class doesn't exist

  3. examples/hello-world/hello-numpy-cross-val/job.py, line 29 (link)

    logic: NumpyCrossSiteEvalRecipe doesn't exist, will cause runtime error when using --mode pretrained. Is this recipe defined in PR #3695 or should it be added in this PR?

10 files reviewed, 3 comments

@nvkevlu
Collaborator Author

nvkevlu commented Jan 7, 2026

/build

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. examples/hello-world/hello-numpy-cross-val/client.py, line 1 (link)

    syntax: copyright year should be 2025, not 2026

13 files reviewed, 1 comment

@YuanTingHsieh
Collaborator

Errors in /home/runner/work/NVFlare/NVFlare/examples/hello-world/hello-numpy-cross-val/README.md

@nvkevlu
Collaborator Author

nvkevlu commented Jan 7, 2026

thanks for catching that, should be fixed now

Contributor

@greptile-apps greptile-apps bot left a comment

14 files reviewed, 1 comment

@nvkevlu
Collaborator Author

nvkevlu commented Jan 7, 2026

/build

Collaborator

@holgerroth holgerroth left a comment

Looks good!

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR adds cross-site evaluation (CSE) utilities and examples to NVFlare, building on previous work in PR #3695. The implementation introduces a new NumpyCrossSiteEvalRecipe for standalone CSE workflows and an add_cross_site_evaluation() utility function to add CSE to existing training recipes.

Key Changes:

  • Added NumpyCrossSiteEvalRecipe class for standalone cross-site evaluation with pre-trained NumPy models
  • Added add_cross_site_evaluation() utility function in nvflare/recipe/utils.py to augment any recipe with CSE capabilities
  • Created comprehensive examples for both NumPy and PyTorch showing standalone CSE and training+CSE workflows
  • Updated hello-pt example to support CSE via --cross_site_eval flag
  • PyTorch client now properly handles evaluation-only tasks with flare.is_evaluate() branch
  • Updated documentation and Jupyter notebooks with CSE usage examples
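
The evaluation-only branch mentioned above can be sketched as a control-flow example. To keep it runnable without NVFlare installed, it uses a hypothetical `FakeFlare` stand-in whose method names mirror the calls named in this review (`is_evaluate()`) plus the usual receive/send loop. `Task`, `evaluate`, `train`, and the result dictionaries are placeholders, not the contents of `hello-pt/client.py`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str  # "train" or "validate" in this sketch
    params: dict = field(default_factory=dict)


class FakeFlare:
    """Hypothetical stand-in for the client API, used only to make this runnable."""

    def __init__(self, tasks):
        self._queue = deque(tasks)
        self._current = None
        self.sent = []

    def is_running(self) -> bool:
        return bool(self._queue)

    def receive(self) -> Task:
        self._current = self._queue.popleft()
        return self._current

    def is_evaluate(self) -> bool:
        return self._current is not None and self._current.name == "validate"

    def send(self, result: dict) -> None:
        self.sent.append(result)


def evaluate(params: dict) -> float:
    return float(sum(params.values()))  # placeholder metric


def train(params: dict) -> dict:
    return {k: v + 1 for k, v in params.items()}  # placeholder update


def client_loop(flare: FakeFlare) -> None:
    while flare.is_running():
        task = flare.receive()
        if flare.is_evaluate():
            # Evaluation-only task: report metrics and skip training entirely.
            flare.send({"metrics": {"score": evaluate(task.params)}})
            continue
        flare.send({"params": train(task.params)})
```

The early `continue` is the important part: a model received for cross-site evaluation is scored on local data and never passed through the training step.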

Minor Issues Found:

  • Copyright year inconsistency in client.py (2026 vs 2025 in other files)
  • API consistency: add_cross_site_evaluation not exported from nvflare.recipe.__init__.py while add_experiment_tracking is

The implementation follows good architectural patterns with proper separation of concerns, clear documentation, and safe model loading (allow_pickle=False). The utility function design is extensible with a registry pattern for different model locator types.

Confidence Score: 4/5

  • This PR is safe to merge with only minor style improvements recommended
  • The code is well-architected with proper error handling, security practices (no pickle), and comprehensive documentation. The only issues found are minor style inconsistencies (copyright year typo and import export pattern). Core functionality is sound with proper separation between training and validation workflows, safe fallback logic for missing attributes, and clear API patterns.
  • examples/hello-world/hello-numpy-cross-val/client.py needs copyright year correction from 2026 to 2025. Consider exporting add_cross_site_evaluation from nvflare/recipe/__init__.py for consistency.

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| nvflare/app_common/np/recipes/cross_site_eval.py | 5/5 | New recipe class for standalone cross-site evaluation with NumPy models. Well-structured with clear docstrings and proper component initialization. |
| nvflare/recipe/utils.py | 4/5 | Added add_cross_site_evaluation utility function with model locator registry. Good abstraction, but could improve API consistency by exporting from init.py. |
| examples/hello-world/hello-numpy-cross-val/client.py | 4/5 | Training script for NumPy cross-site eval example. Has copyright year typo (2026 should be 2025). |
| examples/hello-world/hello-numpy-cross-val/job.py | 5/5 | Unified script supporting both standalone CSE and training+CSE modes. Clean implementation following recipe API patterns. |
| examples/hello-world/hello-pt/job.py | 5/5 | Added cross-site evaluation support via command-line flag. Clean integration with existing recipe. |
| examples/hello-world/hello-pt/client.py | 5/5 | Added is_evaluate() branch for cross-site evaluation tasks. Proper separation of training and evaluation logic. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Recipe
    participant Server
    participant ModelLocator
    participant Client1
    participant Client2
    
    Note over User,Client2: Cross-Site Evaluation Workflow
    
    User->>Recipe: add_cross_site_evaluation(recipe, model_locator_type)
    Recipe->>Server: Add ModelLocator component
    Recipe->>Server: Add CrossSiteModelEval controller
    Recipe->>Server: Add ValidationJsonGenerator
    
    Note over User,Client2: Job Execution Starts
    
    Server->>ModelLocator: get_model_names()
    ModelLocator-->>Server: List of model names
    
    loop For each model
        Server->>ModelLocator: locate_model(model_name)
        ModelLocator-->>Server: Model DXO
        Server->>Server: Save model to cross_val_models_dir
    end
    
    Note over Server,Client2: Distribute models for validation
    
    Server->>Client1: Send TASK_VALIDATION (model_1)
    Server->>Client2: Send TASK_VALIDATION (model_1)
    
    Client1->>Client1: NPValidator.execute() or is_evaluate()
    Client1->>Client1: Validate model on local data
    Client1-->>Server: Return validation metrics
    
    Client2->>Client2: NPValidator.execute() or is_evaluate()
    Client2->>Client2: Validate model on local data
    Client2-->>Server: Return validation metrics
    
    Server->>Server: Collect results in validation matrix
    
    Note over Server,Client2: Repeat for all models
    
    Server->>Server: ValidationJsonGenerator saves cross_val_results.json
    Server-->>User: Job complete with validation matrix
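
The "collect results in validation matrix" step in the diagram above can be illustrated with a small sketch. The nesting (evaluating site, then model name, then metrics), the function name `build_validation_matrix`, and the sample site names, model names, and numbers are made up for illustration. They are not taken from an actual `cross_val_results.json`.

```python
import json
from collections import defaultdict


def build_validation_matrix(results):
    """Group (data_client, model_name, metrics) tuples into a nested dict."""
    matrix = defaultdict(dict)
    for data_client, model_name, metrics in results:
        matrix[data_client][model_name] = metrics
    # Convert to plain dicts so the result serializes and compares cleanly.
    return {client: dict(models) for client, models in matrix.items()}


# Made-up sample results: every site evaluates every model on its local data.
sample_results = [
    ("site-1", "model_a", {"accuracy": 0.9}),
    ("site-1", "model_b", {"accuracy": 0.8}),
    ("site-2", "model_a", {"accuracy": 0.7}),
    ("site-2", "model_b", {"accuracy": 0.6}),
]

matrix = build_validation_matrix(sample_results)
report = json.dumps(matrix, indent=2, sort_keys=True)
```

Each row of the matrix answers "how do all models perform on this site's data", which is the comparison cross-site evaluation is meant to provide without moving data between sites.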

@@ -0,0 +1,93 @@
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
Contributor

copyright year should be 2025, not 2026 (inconsistent with all other files in this PR which use 2025)

Comment on lines +79 to +86
def add_cross_site_evaluation(
    recipe: Recipe,
    model_locator_type: str = "pytorch",
    model_locator_config: Optional[dict] = None,
    persistor_id: Optional[str] = None,
    submit_model_timeout: int = 600,
    validation_timeout: int = 6000,
):
Contributor

consider exporting add_cross_site_evaluation from nvflare.recipe.__init__.py for consistency with add_experiment_tracking (which is exported). This would make the API more consistent and allow users to import both utilities from the same location: from nvflare.recipe import add_experiment_tracking, add_cross_site_evaluation

@nvkevlu nvkevlu enabled auto-merge (squash) January 8, 2026 23:22
@nvkevlu
Collaborator Author

nvkevlu commented Jan 8, 2026

/build


3 participants