BioNeMo Conversion to Recipes by holgerroth · Pull Request #4001 · NVIDIA/NVFlare

holgerroth · 2026-01-21T20:01:50Z

Fixes # .

Description

Cherry-pick #3943 and #3982

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

holgerroth · 2026-01-21T20:02:05Z

/build

greptile-apps · 2026-01-21T20:06:51Z

Greptile Summary

This PR cherry-picks #3982 to add BioNeMo Task Fitting with PyTorch. The changes replace the previous BioNeMo implementation with a cleaner PyTorch-based approach for federated protein embeddings and MLP training.

Major Changes:

- Replaced custom BioNeMo launchers and learners with PyTorch-based implementations using NVFlare's FedAvgRecipe
- Added new MLP training client (task_fitting/job_fedavg/client.py) with proper federated learning workflow
- Added inference job (task_fitting/job_inference/) for ESM2 embedding extraction
- Refactored downstream examples (SAbDab, TAP, SCL) to use new recipe-based approach
- Removed legacy implementation files (bionemo_mlp_learner.py, bionemo_mlp_model_persistor.py, etc.)
- Updated notebooks and documentation to reflect PyTorch workflow

Key Improvements:

- Cleaner separation between inference (embedding extraction) and training (MLP classification)
- Better integration with NVFlare's client API using flare.init(), flare.receive(), flare.send()
- Support for both federated and local training modes via SIM_LOCAL environment variable
- Added TensorBoard integration for experiment tracking
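
The receive/train/send cycle named in the client API bullet can be sketched as follows. This is a minimal illustration, not the code in `client.py`: the `flare` object is passed in as a parameter so the loop can be exercised without a running server, and `Update`, `run_client`, and `FakeFlare` are hypothetical names introduced here. With NVFlare installed, the assumption is that the `nvflare.client` module would take the place of `FakeFlare` and NVFlare's own model container would take the place of `Update`.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    """Hypothetical stand-in for the model object exchanged with the server."""
    params: dict
    metrics: dict = field(default_factory=dict)


def run_client(flare, train_fn, evaluate_fn):
    """Receive-train-send loop; returns the number of rounds handled."""
    flare.init()
    rounds = 0
    while flare.is_running():
        global_model = flare.receive()
        # Score the incoming global model before any local training.
        metrics = evaluate_fn(global_model.params)
        new_params = train_fn(global_model.params)
        flare.send(Update(params=new_params, metrics=metrics))
        rounds += 1
    return rounds


class FakeFlare:
    """In-memory stand-in (not part of NVFlare) so the loop runs offline."""

    def __init__(self, num_rounds):
        self.remaining = num_rounds
        self.current = Update(params={"w": 0.0})
        self.sent = []

    def init(self):
        pass

    def is_running(self):
        return self.remaining > 0

    def receive(self):
        return self.current

    def send(self, update):
        self.sent.append(update)
        # Single client: the update simply becomes the next global model.
        self.current = update
        self.remaining -= 1


fake = FakeFlare(num_rounds=3)
rounds = run_client(
    fake,
    train_fn=lambda p: {"w": p["w"] + 1.0},
    evaluate_fn=lambda p: {"accuracy": p["w"] / 10},
)
```

Passing the API object in as an argument is only a convenience for this sketch; a real client script would more likely import the module at the top and call it directly.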

Issues Found:

- Potential division by zero in evaluate_model function when dataloader is empty (critical fix needed)
- Interactive input() in job configuration blocks automation (previously noted)
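
One possible way to address the second issue is to take the mode from a command-line flag instead of prompting. The sketch below is an assumption about how that could look, not the change made in this PR: the `--mode` flag, its choices, and the `parse_args` helper are hypothetical names.

```python
import argparse


def parse_args(argv=None):
    """Parse job options without prompting, so the script can run unattended."""
    parser = argparse.ArgumentParser(description="Task-fitting job (sketch)")
    parser.add_argument(
        "--mode",
        choices=["federated", "local"],  # hypothetical mode names
        default="federated",
        help="Training mode; replaces an interactive input() prompt.",
    )
    return parser.parse_args(argv)


# Explicit argument lists keep the example independent of sys.argv.
default_args = parse_args([])
local_args = parse_args(["--mode", "local"])
```

With a default in place, CI pipelines and batch schedulers can launch the job with no terminal attached, and an invalid mode fails fast with a usage message.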

Confidence Score: 4/5

- This PR is generally safe to merge with one critical division-by-zero bug that needs fixing
- The PR successfully refactors BioNeMo examples to use PyTorch-based federated learning with cleaner architecture. Code quality is good with proper error handling, comprehensive documentation, and well-structured examples. However, there's a division-by-zero bug in the evaluation function that could cause runtime errors if an empty dataloader is passed. The interactive input() was already noted in previous reviews. The hardcoded /tmp paths are acceptable for examples. Overall the refactoring improves maintainability and follows NVFlare best practices.
- Pay close attention to examples/advanced/bionemo/task_fitting/job_fedavg/client.py, which has a division-by-zero bug that needs fixing before merge
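
The division-by-zero concern can be illustrated with a framework-free sketch. The real `evaluate_model` in `client.py` is not shown in this thread, so the signature below (a prediction callable plus an iterable of `(inputs, labels)` batches) is an assumption made for illustration. The point is only the guard on the sample count before dividing.

```python
import math


def evaluate_model(predict_fn, batches):
    """Compute accuracy over batches, tolerating an empty iterable."""
    correct = 0
    seen = 0
    for inputs, labels in batches:
        preds = [predict_fn(x) for x in inputs]
        correct += sum(int(p == y) for p, y in zip(preds, labels))
        seen += len(labels)
    if seen == 0:
        # No samples: report NaN instead of raising ZeroDivisionError.
        return {"accuracy": math.nan, "num_samples": 0}
    return {"accuracy": correct / seen, "num_samples": seen}


empty_result = evaluate_model(lambda x: x > 0, [])
result = evaluate_model(lambda x: x > 0, [([1, -1, 2], [True, False, False])])
```

Returning NaN is one choice among several; raising a descriptive error or returning zero would also avoid the crash, and which is appropriate depends on how the metrics are consumed downstream.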

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/advanced/bionemo/task_fitting/job_fedavg/client.py | New PyTorch-based federated MLP training client with proper error handling and metrics tracking |
| examples/advanced/bionemo/task_fitting/job_fedavg/job.py | FedAvg recipe configuration with interactive mode selection, uses hardcoded paths |
| examples/advanced/bionemo/task_fitting/job_inference/client.py | ESM2 inference client using subprocess to run external BioNeMo command |
| examples/advanced/bionemo/downstream/client.py | BioNeMo ESM2 fine-tuning client with complex training logic, uses os._exit(0) for cleanup |
| examples/advanced/bionemo/downstream/sabdab/job.py | SAbDab dataset job configuration with custom filters for BioNeMo model parameters |

Sequence Diagram

sequenceDiagram
    participant User
    participant InferenceJob as Inference Job (job.py)
    participant InferenceClient as Inference Client
    participant ESM2Model as ESM2 Model
    participant TrainingJob as Training Job (job.py)
    participant TrainingClient as MLP Training Client
    participant Server as FedAvg Server
    
    Note over User,Server: Phase 1: Embedding Extraction
    User->>InferenceJob: Run inference job
    InferenceJob->>Server: Initialize FedAvgRecipe (1 round)
    Server->>InferenceClient: Send inference task
    InferenceClient->>ESM2Model: Run infer_esm2 subprocess
    ESM2Model-->>InferenceClient: Return protein embeddings
    InferenceClient->>InferenceClient: Save embeddings to /tmp/data/.../results
    InferenceClient->>Server: Send metadata (num_sequences, shapes)
    
    Note over User,Server: Phase 2: MLP Training
    User->>TrainingJob: Run training job (select mode)
    TrainingJob->>Server: Initialize FedAvgRecipe (50 rounds)
    
    loop For each round (1-50)
        Server->>TrainingClient: Send global model weights
        TrainingClient->>TrainingClient: Load embeddings and labels
        TrainingClient->>TrainingClient: Evaluate global model
        TrainingClient->>TrainingClient: Train locally for N epochs
        TrainingClient->>TrainingClient: Evaluate trained model
        TrainingClient->>Server: Send updated weights + metrics
        Server->>Server: Aggregate weights from all clients
    end
    
    Server-->>User: Final global model
    User->>User: View results in TensorBoard
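
The "Aggregate weights from all clients" step in the diagram is, in the standard FedAvg formulation, a sample-weighted average of client parameters. The sketch below shows that arithmetic on plain Python lists. It is not NVFlare's aggregator, and the `fedavg` function and its input layout are hypothetical.

```python
def fedavg(client_updates):
    """Sample-weighted average of client weights.

    client_updates: list of (weights, num_samples) pairs, where weights
    maps a parameter name to a flat list of floats.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("cannot aggregate: no client reported any samples")
    first_weights = client_updates[0][0]
    aggregated = {}
    for name, values in first_weights.items():
        aggregated[name] = [
            sum(w[name][i] * n for w, n in client_updates) / total
            for i in range(len(values))
        ]
    return aggregated


updates = [
    ({"w": [1.0, 2.0]}, 1),  # client with 1 sample
    ({"w": [3.0, 4.0]}, 3),  # client with 3 samples
]
global_weights = fedavg(updates)  # {"w": [2.5, 3.5]}
```

The client with three samples pulls the average three times as hard as the client with one, which is why the result sits closer to `[3.0, 4.0]`.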

greptile-apps

11 files reviewed, 1 comment

Fixes # .

### Description

Convert bionemo examples to use FedAvgRecipe

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [x] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Signed-off-by: Holger Roth <hroth@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

Fixes # .

### Description

Switch to PyTorch for task fitting experiments.

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.

---------

Signed-off-by: Holger Roth <hroth@nvidia.com>
Co-authored-by: root <root@r1u14.cm.cluster>

greptile-apps

27 files reviewed, 1 comment

holgerroth · 2026-01-21T20:18:54Z

/build

holgerroth requested a review from ZiyueXu77 January 21, 2026 20:01

ZiyueXu77 previously approved these changes Jan 21, 2026

greptile-apps Bot reviewed Jan 21, 2026

Comment thread examples/advanced/bionemo/task_fitting/job_fedavg/job.py

holgerroth and others added 2 commits January 21, 2026 15:07

holgerroth dismissed ZiyueXu77’s stale review via ebbbeb8 January 21, 2026 20:08

holgerroth force-pushed the bionemo_recipe_update branch from a1d008d to ebbbeb8 Compare January 21, 2026 20:08

holgerroth changed the title ~~BioNeMo Task Fitting with PyTorch~~ BioNeMo Conversion to Recipes Jan 21, 2026

holgerroth requested a review from ZiyueXu77 January 21, 2026 20:09

greptile-apps Bot reviewed Jan 21, 2026

Comment thread examples/advanced/bionemo/task_fitting/job_fedavg/client.py

ZiyueXu77 approved these changes Jan 21, 2026

holgerroth enabled auto-merge (squash) January 21, 2026 20:18

holgerroth merged commit 120e8e3 into NVIDIA:main Jan 21, 2026
21 checks passed

holgerroth deleted the bionemo_recipe_update branch January 21, 2026 21:53

This was referenced Mar 9, 2026

G-SHARP v0.2 nvidia-holoscan/holohub#1460

Merged

Adds GLOBE 3D + Dual Tree Traversal Accelerations 🌎🌲 NVIDIA/physicsnemo#1481

Closed

underfill NVIDIA/physicsnemo#1496

Closed

Qwen-VL LoRA Support #4277

Merged

This was referenced Apr 2, 2026

[Research] Add fsi-fraud-detection code #4395

Merged

Security/audit fixes 2026 q2 #4422

Closed

greptile-apps Bot mentioned this pull request Apr 10, 2026

Add heterogeneous-rank HLoRA enhancement to MedGemma example #4424

Merged

holgerroth merged 2 commits into NVIDIA:main from holgerroth:bionemo_recipe_update
