Skip to content

Add hello-numpy-robust example with optional median aggregation and poisoned-client simulation#4373

Closed
rwilliamspbg-ops wants to merge 2 commits intoNVIDIA:mainfrom
rwilliamspbg-ops:feat/hello-numpy-robust-aggregation
Closed

Add hello-numpy-robust example with optional median aggregation and poisoned-client simulation#4373
rwilliamspbg-ops wants to merge 2 commits intoNVIDIA:mainfrom
rwilliamspbg-ops:feat/hello-numpy-robust-aggregation

Conversation

@rwilliamspbg-ops
Copy link
Copy Markdown

Summary

  • Add hello-world NumPy robust aggregation example
  • Compare default FedAvg vs median aggregation under poisoned-client simulation
  • Add focused custom-aggregator tests
  • Add docs and register example in hello-world README

Scope

  • Example-level contribution only
  • No core NVFlare API changes

Validation

  • Unit tests pass for custom median aggregator
  • Simulator runs complete for both default and median modes
  • README includes output snapshot showing robustness behavior

Copilot AI review requested due to automatic review settings March 27, 2026 15:22
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 27, 2026

Greptile Summary

This PR adds a hello-numpy-robust example that extends the existing NumPy hello-world recipe with a pluggable MedianAggregator and a simple Byzantine-poisoning simulation, demonstrating the robustness advantage of element-wise median aggregation over vanilla FedAvg. The change is entirely example-level with no modifications to NVFlare core APIs.

Key observations:

  • The MedianAggregator implementation and its tests are clean and correctly demonstrate the expected robustness pattern.
  • Three issues flagged in prior review threads remain unaddressed in the current code: (1) aggregate_model() silently returns an empty FLModel instead of raising when no client models are present; (2) aggregate_model() does not reset internal state per the base-class docstring contract; (3) passing --poison_client_name \"\" to job.py drops the empty-string token from train_args, causing argparse in client.py to mis-parse --poison_scale as the value for --poison_client_name — which breaks the documented "disable poisoning" use case from the README.
  • requirements.txt lists tensorboard as a hard dependency while the README describes TensorBoard tracking as optional (see inline comment).

Confidence Score: 4/5

Merging is blocked by the previously-flagged but still-unresolved P1 defect where --poison_client_name "" breaks argument parsing in the documented disable poisoning example.

Three P1 issues from prior review rounds remain unaddressed in the current code (silent empty return in aggregate_model, missing state reset in aggregate_model, and broken empty-string argument passing in job.py). These are real defects in the example paths documented by the README. Because they are unresolved, a score of 4/5 is appropriate rather than the default 5/5 for P2-only findings.

examples/hello-world/hello-numpy-robust/job.py (empty poison_client_name argument handling) and examples/hello-world/hello-numpy-robust/custom_aggregators.py (aggregate_model silent return and missing state reset).

Important Files Changed

Filename Overview
examples/hello-world/hello-numpy-robust/custom_aggregators.py Implements MedianAggregator correctly for the happy path, but aggregate_model does not reset internal state per the base-class docstring contract, and the empty-input path silently returns an empty FLModel rather than raising — both previously flagged in review threads and still unaddressed.
examples/hello-world/hello-numpy-robust/job.py Orchestrates the simulation cleanly; however, the f-string construction of train_args loses the empty-string token when --poison_client_name "" is passed, breaking the documented "disable poisoning" use case (previously flagged, still unaddressed in current code).
examples/hello-world/hello-numpy-robust/client.py Clean FL client with mock training, optional poisoning injection, and SummaryWriter metrics; logic is straightforward and self-consistent.
examples/hello-world/hello-numpy-robust/tests/test_custom_aggregators.py Covers the three key scenarios (outlier reduction, params_type mismatch, reset); test_reset_stats_clears_state validates the silent-empty-return behaviour that was previously flagged as problematic.
examples/hello-world/hello-numpy-robust/requirements.txt tensorboard listed as a hard dependency while the README describes it as optional; minor inconsistency that forces unnecessary installation.
examples/hello-world/hello-numpy-robust/README.md Good demo documentation with expected output snapshot; the Disable poisoning code sample relies on the empty-string argument-passing behaviour that is currently broken.
examples/hello-world/README.md Adds a concise entry for the new hello-numpy-robust example in the correct section; no issues.

Sequence Diagram

sequenceDiagram
    participant U as User (job.py)
    participant R as NumpyFedAvgRecipe
    participant S as Server (ScatterAndGather)
    participant A as Aggregator (Default or MedianAggregator)
    participant C as Clients (client.py x N)

    U->>R: NumpyFedAvgRecipe(aggregator=...)
    U->>R: recipe.execute(SimEnv)
    loop Each round
        S->>C: Broadcast global model
        C->>C: train(params) → maybe_poison_update()
        C->>S: Send FLModel (FULL or DIFF)
        S->>A: accept_model(client_model) x N
        S->>A: aggregate_model()
        A-->>S: FLModel (median or weighted-avg)
        S->>S: reset() → reset_stats()
    end
    S-->>U: run.get_result() / run.get_status()
Loading

Reviews (2): Last reviewed commit: "Merge branch 'main' into feat/hello-nump..." | Re-trigger Greptile

Comment on lines +42 to +45
def aggregate_model(self) -> FLModel:
if not self._client_models:
self.error("No client models to aggregate.")
return FLModel(params={})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Silent empty return masks aggregation data preparation failures

When _client_models is empty, the method logs an error and returns FLModel(params={}). This silent empty return can propagate downstream — the server persistor or controller receives an empty model dict and may overwrite a previously-good checkpoint with empty params, or silently skip aggregation without surfacing the root cause.

Per the project's error-handling convention, an empty aggregation input is an exceptional state that should fail loudly with a descriptive ValueError:

Suggested change
def aggregate_model(self) -> FLModel:
if not self._client_models:
self.error("No client models to aggregate.")
return FLModel(params={})
def aggregate_model(self) -> FLModel:
if not self._client_models:
raise ValueError(
"MedianAggregator.aggregate_model called with no accepted client models. "
"Ensure at least one client submitted a model before aggregation."
)

This surfaces the problem immediately at the aggregation step rather than letting an empty model silently propagate.

Rule Used: When data_loader might be empty during loss comput... (source)

Learnt From
NVIDIA/NVFlare#4001

Comment on lines +42 to +52
def aggregate_model(self) -> FLModel:
if not self._client_models:
self.error("No client models to aggregate.")
return FLModel(params={})

aggregated_params = {}
for key in self._client_models[0].keys():
stacked = np.stack([m[key] for m in self._client_models], axis=0)
aggregated_params[key] = np.median(stacked, axis=0)

return FLModel(params=aggregated_params, params_type=self._params_type)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 aggregate_model() does not reset internal state

The ModelAggregator base class docstring for aggregate_model states it "needs to implement aggregation logic and reset any internal stats". The current implementation aggregates but leaves _client_models populated, so a second call to aggregate_model() (without an intervening reset_stats()) would re-aggregate the same client updates.

While the framework's ScatterAndGather controller calls reset()reset_stats() between rounds, custom orchestration or direct unit-test usage could hit a double-aggregate bug. Consider clearing state at the end of aggregate_model():

def aggregate_model(self) -> FLModel:
    ...
    result = FLModel(params=aggregated_params, params_type=self._params_type)
    self.reset_stats()   # clear accumulated state per base-class contract
    return result

Comment on lines +63 to +67
train_args = (
f"--update_type {args.update_type} "
f"--poison_client_name {args.poison_client_name} "
f"--poison_scale {args.poison_scale}"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Empty poison_client_name breaks client argument parsing

When --poison_client_name "" is passed to job.py (the documented way to disable poisoning), args.poison_client_name is an empty string. The f-string produces a train_args string where --poison_client_name is followed by nothing before --poison_scale. When the script runner tokenizes that string, --poison_scale becomes the next positional token and argparse in client.py will raise an error because it expects a value for --poison_client_name, not another flag.

The documented "disable poisoning" invocation from the README therefore fails at runtime.

To properly pass an empty string as an argument token, wrap the value with shlex.quote:

import shlex

train_args = (
    f"--update_type {args.update_type} "
    f"--poison_client_name {shlex.quote(args.poison_client_name)} "
    f"--poison_scale {args.poison_scale}"
)

shlex.quote("") produces '', which the shell/tokenizer passes correctly as an empty-string argument to the client.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new hello-world NumPy example demonstrating robustness to poisoned/outlier clients by optionally switching from default FedAvg aggregation to an element-wise median aggregator, with accompanying docs and a small unit test.

Changes:

  • Introduce hello-numpy-robust example (job launcher, client script, custom median aggregator) with optional poisoned-client simulation.
  • Add example README and register the example in examples/hello-world/README.md.
  • Add targeted unit tests for the custom median aggregator.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
examples/hello-world/hello-numpy-robust/custom_aggregators.py Adds MedianAggregator implementation for robust aggregation.
examples/hello-world/hello-numpy-robust/job.py Adds simulator job entrypoint with --aggregator and poisoning controls.
examples/hello-world/hello-numpy-robust/client.py Adds client training loop with optional poisoned-update scaling and metrics logging.
examples/hello-world/hello-numpy-robust/requirements.txt Declares dependencies for running the new example.
examples/hello-world/hello-numpy-robust/README.md Documents running baseline vs median aggregation and poisoning toggles.
examples/hello-world/hello-numpy-robust/tests/test_custom_aggregators.py Adds unit tests for median aggregation behavior and error handling.
examples/hello-world/README.md Registers and documents the new example in the hello-world index.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +18 to +21
from custom_aggregators import MedianAggregator

from nvflare.app_common.abstract.fl_model import FLModel, ParamsType
from nvflare.app_common.np.constants import NPConstants
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test imports MedianAggregator via from custom_aggregators import MedianAggregator, but when running pytest from the repo root (default), the example directory isn’t on sys.path, so this will raise ModuleNotFoundError. Add a small sys.path adjustment in the test (e.g., insert the example root directory), or package the example so the module can be imported reliably under the repo’s test runner.

Copilot uses AI. Check for mistakes.
Disable poisoning:

```bash
python job.py --aggregator median --poison_client_name ""
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Disable poisoning” command uses --poison_client_name "", but job.py forwards args to the client by building a whitespace-split string; an empty value will be dropped, causing --poison_client_name to have no argument and the client script to fail argparse parsing. Update the instructions (and corresponding code) to use a safe disable mechanism (e.g., --poison_client_name none or --no-poison).

Suggested change
python job.py --aggregator median --poison_client_name ""
python job.py --aggregator median --poison_client_name none

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +67
train_args = (
f"--update_type {args.update_type} "
f"--poison_client_name {args.poison_client_name} "
f"--poison_scale {args.poison_scale}"
)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--poison_client_name is documented as “use empty string to disable”, but train_args is built via string concatenation. When poison_client_name is empty, TaskScriptRunner later splits args on whitespace, producing --poison_client_name --poison_scale ... (missing value) and the client argparse will fail. Use a boolean flag (e.g., --poison/--no-poison) or a sentinel value (e.g., none) and only include --poison_client_name <value> in train_args when a non-empty value is intended; update client defaults accordingly.

Suggested change
train_args = (
f"--update_type {args.update_type} "
f"--poison_client_name {args.poison_client_name} "
f"--poison_scale {args.poison_scale}"
)
train_args_parts = [
f"--update_type {args.update_type}",
f"--poison_scale {args.poison_scale}",
]
# Only include poison_client_name if non-empty; empty string disables poisoning.
if args.poison_client_name != "":
train_args_parts.insert(1, f"--poison_client_name {args.poison_client_name}")
train_args = " ".join(train_args_parts)

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,67 @@
# Hello NumPy Robust Aggregation
Copy link
Copy Markdown
Collaborator

@chesterxgchen chesterxgchen Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the readme.md from hello-pt for readme structure (https://github.com/NVIDIA/NVFlare/blob/main/examples/hello-world/hello-pt/README.md)

installation
data
code structure
client code
server side
Job Recipe
Run Job
Result
etc.

For the code structure,

please follow
client.py
model.py ( if exists)
job.py

format when possible

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fixes

@chesterxgchen
Copy link
Copy Markdown
Collaborator

@rwilliamspbg-ops Numpy Recipe was a misstep we made, We did not have time to fix in 2.7.2 as it was discovered late close to the release date. As the FedAvg recipe itself should cover Numpy, there should no need for a separate numpyfedavg recipe. We were plan to remove it and fold into FedAvg or simply remove NumPy Example all togather.

If you think your example is important to expand the FedAvg algorithms, please consider add it to the PyTorch FedAvg example such as hello-pt or hello-lightning for pytorch lighnining, or in the examples/advanced examples

@rwilliamspbg-ops
Copy link
Copy Markdown
Author

sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants