Skip to content

Cherry pick 4181 to main#4201

Merged
nvkevlu merged 24 commits intoNVIDIA:mainfrom
nvkevlu:cherry_pick_4011_4168_4173_to_main
Mar 9, 2026
Merged

Cherry pick 4181 to main#4201
nvkevlu merged 24 commits intoNVIDIA:mainfrom
nvkevlu:cherry_pick_4011_4168_4173_to_main

Conversation

@nvkevlu
Copy link
Copy Markdown
Collaborator

@nvkevlu nvkevlu commented Feb 18, 2026

Cherry pick 4181 to main.

Description

Cherry pick 4181 to main.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings February 18, 2026 17:26
@nvkevlu nvkevlu changed the title Cherry pick 4168 to main Cherry pick 4181 to main Feb 18, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a cherry-pick from PR #4168 to main, enhancing the hello-numpy-cross-val example with improved client-side error handling and fail-fast behavior for cross-site evaluation workflows.

Changes:

  • Added comprehensive unit tests for the hello-numpy-cross-val client script with mock-based testing approach
  • Enhanced base_model_controller.py to conditionally set fl_context properties only when not already present
  • Refactored hello-numpy-cross-val client with explicit task type handling (train/evaluate/submit_model) and fail-fast error checking

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

File Description
tests/unit_test/recipe/hello_numpy_cross_val_client_test.py New comprehensive unit tests for client helper functions and fail-fast behavior using pytest and mock objects
nvflare/app_common/workflows/base_model_controller.py Modified set_fl_context to only set CURRENT_ROUND and NUM_ROUNDS when not already present, preventing overwrite of workflow-set values
examples/hello-world/hello-numpy-cross-val/client.py Refactored to explicitly handle train/evaluate/submit_model tasks with proper error handling and fail-fast checks for missing parameters

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Feb 18, 2026

Greptile Summary

This cherry-pick of #4181 to main makes two targeted improvements:

  • base_model_controller.pyset_fl_context refactor: Replaces the if data and data.X is not None guard with an early return when data is None, then uses a plain if data.X is not None check for each field. This correctly binds the else debug branches so they only fire when the field is absent on a valid FLModel, not when data itself is None. The underlying set_prop(... private=True, sticky=True) calls are functionally unchanged from the base branch; set_prop already handles re-setting an existing prop with the same mask by updating both the local dict and the sticker store.
  • fl_context.py – lock-ordering comment: Adds inline documentation above _update_lock noting that the module-level lock must always be acquired before FLContextManager._update_lock. This matches the existing implementation in set_prop and prevents future contributors from accidentally introducing a deadlock.

Key points:

  • The simpler set_prop-direct approach correctly updates CURRENT_ROUND and NUM_ROUNDS each round because set_prop falls through to update self.props[key] whenever the existing mask matches the new mask.
  • No update_prop_value helper is used; the prior complex-guard approach discussed in earlier review threads has been dropped in favour of this simpler path.
  • The only actionable gap is that the bool return value of set_prop is not checked, so a mask-mismatch failure would only be visible in set_prop's own internal warning log.

Confidence Score: 4/5

  • Safe to merge with one minor style concern about unchecked set_prop return values.
  • The functional set_prop calls are unchanged from the base branch and are correct — set_prop updates both the local props dict and the sticker store when the mask matches. The refactoring improves null-guard correctness and adds useful lock-ordering documentation. The only gap is that set_prop return values are not checked, which could mask a mask-mismatch failure silently; this is a style concern rather than a blocking defect.
  • No files require special attention; base_model_controller.py warrants a quick look at the set_prop return-value handling.

Important Files Changed

Filename Overview
nvflare/apis/fl_context.py Only adds a lock-ordering comment above _update_lock; no functional code changes. The documented ordering (module-level _update_lock before FLContextManager._update_lock) matches the existing set_prop implementation.
nvflare/app_common/workflows/base_model_controller.py Refactors set_fl_context to use an early-return guard for None data, properly binds the else debug branches, and adds a clarifying docstring. The set_prop calls themselves are unchanged and will correctly update an existing prop when the same (private, sticky) mask is reused. Minor gap: set_prop's bool return value is not checked.

Sequence Diagram

sequenceDiagram
    participant WF as FedAvg / Workflow
    participant BMC as BaseModelController
    participant FLC as FLContext (self.fl_ctx)
    participant FCM as FLContextManager (sticker store)

    WF->>BMC: broadcast_model(data, round=N)
    BMC->>BMC: set_fl_context(data)
    alt data is None
        BMC-->>BMC: early return
    else data.current_round is not None
        BMC->>FLC: set_prop(CURRENT_ROUND, N, private=True, sticky=True)
        FLC->>FCM: update_sticker(CURRENT_ROUND, N, mask)
        FCM-->>FLC: sticker updated
        FLC-->>BMC: True (updated local props + sticker)
    else data.current_round is None
        BMC->>BMC: debug("no current_round in FLModel")
    end
    alt data.total_rounds is not None
        BMC->>FLC: set_prop(NUM_ROUNDS, T, private=True, sticky=True)
        FLC->>FCM: update_sticker(NUM_ROUNDS, T, mask)
        FCM-->>FLC: sticker updated
    end
    BMC->>BMC: broadcast_and_wait(fl_ctx=self.fl_ctx)
    Note over BMC,FLC: _process_result callback (per client response)
    BMC->>FLC: self.fl_ctx = result_fl_ctx
    BMC->>FLC: set_prop(CURRENT_ROUND, N, sticky=True)  [from task header]
Loading

Comments Outside Diff (1)

  1. nvflare/app_common/workflows/base_model_controller.py, line 396-401 (link)

    set_prop return value silently ignored

    set_prop returns False and logs an internal warning when the key already exists with different (private, sticky) attributes (mask mismatch). If that ever happens here — e.g. because another component previously registered CURRENT_ROUND as private=False or sticky=False — the round number will silently stay stale for the entire training run with no error surfaced at the call site.

    Since the same issue also applies to the NUM_ROUNDS block below, consider checking both return values and emitting an explicit error if either fails:

            if data.current_round is not None:
                if not self.fl_ctx.set_prop(
                    AppConstants.CURRENT_ROUND,
                    data.current_round,
                    private=True,
                    sticky=True,
                ):
                    self.error(
                        f"Failed to update CURRENT_ROUND in fl_ctx to {data.current_round}; "
                        "prop may exist with incompatible attributes."
                    )
            else:
                self.debug("The FLModel data does not contain the current_round information.")

Last reviewed commit: da2805a

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread nvflare/app_common/workflows/base_model_controller.py Outdated
@nvkevlu nvkevlu force-pushed the cherry_pick_4011_4168_4173_to_main branch from b18a8bb to 2cc7f1a Compare February 18, 2026 22:18
@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Feb 18, 2026

/build

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@nvkevlu nvkevlu mentioned this pull request Feb 18, 2026
6 tasks
pcnudde
pcnudde previously approved these changes Feb 19, 2026
Copy link
Copy Markdown
Collaborator

@pcnudde pcnudde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused with the other PR into 2.7

@pcnudde pcnudde self-requested a review February 19, 2026 18:36
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Feb 19, 2026

The other PR into 2.7 was created to match this one, however, there were further comments from copilot that resulted in additional updates. This one actually needs to be further updated to match those changes to have the right behavior: get_prop_detail() + reuse existing (private, sticky) when updating, so the value is updated every round and set_prop() never fails or warns.

@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Feb 19, 2026

/build

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

chesterxgchen added a commit that referenced this pull request Feb 20, 2026
Update to match fix in main in #4201 

### Description

Updates nvflare/app_common/workflows/base_model_controller.py to match
#4201 after further changes from greptile comments.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Mar 5, 2026

/build

Comment thread nvflare/apis/fl_context.py Outdated
Comment thread nvflare/app_common/workflows/base_model_controller.py Outdated
@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Mar 5, 2026

/build

@nvkevlu nvkevlu requested a review from YuanTingHsieh March 5, 2026 18:20
Comment thread nvflare/app_common/workflows/base_model_controller.py Outdated
Comment thread nvflare/app_common/workflows/base_model_controller.py Outdated
Comment thread nvflare/apis/fl_context.py Outdated
Comment thread nvflare/app_common/workflows/base_model_controller.py Outdated
Comment thread nvflare/app_common/workflows/base_model_controller.py
Comment thread nvflare/apis/fl_context.py Outdated
@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Mar 6, 2026

/build

Comment thread nvflare/app_common/workflows/base_model_controller.py
@nvkevlu nvkevlu requested a review from YuanTingHsieh March 6, 2026 19:20
@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Mar 6, 2026

/build

@nvkevlu
Copy link
Copy Markdown
Collaborator Author

nvkevlu commented Mar 6, 2026

/build

@nvkevlu nvkevlu enabled auto-merge (squash) March 6, 2026 21:36
Comment thread nvflare/app_common/workflows/base_model_controller.py
@nvkevlu nvkevlu merged commit d200f9a into NVIDIA:main Mar 9, 2026
19 checks passed
YuanTingHsieh pushed a commit to YuanTingHsieh/NVFlare that referenced this pull request Mar 20, 2026
Update to match fix in main in NVIDIA#4201

Updates nvflare/app_common/workflows/base_model_controller.py to match

<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
YuanTingHsieh pushed a commit to YuanTingHsieh/NVFlare that referenced this pull request Mar 20, 2026
Update to match fix in main in NVIDIA#4201

Updates nvflare/app_common/workflows/base_model_controller.py to match

<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
YuanTingHsieh pushed a commit to YuanTingHsieh/NVFlare that referenced this pull request Mar 23, 2026
Update to match fix in main in NVIDIA#4201

Updates nvflare/app_common/workflows/base_model_controller.py to match

<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
YuanTingHsieh pushed a commit to YuanTingHsieh/NVFlare that referenced this pull request Mar 23, 2026
Update to match fix in main in NVIDIA#4201

Updates nvflare/app_common/workflows/base_model_controller.py to match

<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
chesterxgchen added a commit to chesterxgchen/NVFlare that referenced this pull request Mar 29, 2026
Update to match fix in main in NVIDIA#4201 

### Description

Updates nvflare/app_common/workflows/base_model_controller.py to match
NVIDIA#4201 after further changes from greptile comments.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
chesterxgchen added a commit to chesterxgchen/NVFlare that referenced this pull request Mar 29, 2026
Update to match fix in main in NVIDIA#4201 

### Description

Updates nvflare/app_common/workflows/base_model_controller.py to match
NVIDIA#4201 after further changes from greptile comments.

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants