Code for general single cell benchmarking #969

Merged
polinabinder1 merged 32 commits into main from pbinder/scbenchmark
Aug 2, 2025
Collaborator

@polinabinder1 polinabinder1 commented Jul 2, 2025

Description

This is a simple, flexible framework for benchmarking any dataloader without requiring inheritance or modifications to your existing code. It works with any iterable dataloader. It measures dataset and dataloader instantiation time and memory, along with iteration memory and speed. It can be run on multiple dataloaders to enable comparisons. The results are printed and exported to a CSV file.
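
To illustrate the idea of benchmarking through a factory rather than through inheritance, here is a minimal standard-library sketch. It is not the code in this PR: `benchmark_iterable` and its result keys are hypothetical names, and `tracemalloc` only sees Python-level allocations, so the numbers are a rough stand-in for whatever memory tracking the framework itself uses.

```python
import time
import tracemalloc
from typing import Callable, Iterable


def benchmark_iterable(factory: Callable[[], Iterable], max_time_seconds: float = 10.0) -> dict:
    """Hypothetical helper: time construction and iteration of any iterable."""
    tracemalloc.start()

    # Instantiation: wall-clock time and peak Python memory while building the loader.
    start = time.perf_counter()
    loader = factory()
    instantiation_seconds = time.perf_counter() - start
    _, instantiation_peak = tracemalloc.get_traced_memory()

    # Iteration: count batches until the loader is exhausted or the time budget is spent.
    tracemalloc.reset_peak()  # requires Python 3.9+
    batches = 0
    start = time.perf_counter()
    for _ in loader:
        batches += 1
        if time.perf_counter() - start >= max_time_seconds:
            break
    iteration_seconds = time.perf_counter() - start
    _, iteration_peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "instantiation_seconds": instantiation_seconds,
        "instantiation_peak_bytes": instantiation_peak,
        "iteration_seconds": iteration_seconds,
        "iteration_peak_bytes": iteration_peak,
        "batches": batches,
        "batches_per_second": batches / iteration_seconds if iteration_seconds > 0 else 0.0,
    }


# Any zero-argument callable that returns an iterable can be benchmarked.
stats = benchmark_iterable(lambda: [list(range(64)) for _ in range(100)])
print(stats["batches"], stats["batches_per_second"])
```

Because the loader is built inside the factory, the construction cost is measured too, and the object under test needs no base class or wrapper.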

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

Usage

import anndata as ad
import numpy as np
from anndata.experimental import AnnLoader
from bionemo.scspeedtest.benchmark import benchmark_single_dataloader, print_results

filepath = "cellxgene_example_25k.h5ad"

# Create a dataloader factory. This returns AnnData in a dense format.
def anndata_factory(input_path, batch_size=64):
    def factory():
        dataset = ad.read_h5ad(input_path)
        return AnnLoader(
            dataset,
            batch_size=batch_size,  # previously accepted but never passed on
            num_workers=0,
            collate_fn=lambda batch: np.vstack([x.X for x in batch]),
        )
    return factory

# Benchmark the dataloader for at most 10 seconds.
result = benchmark_single_dataloader(
    dataloader_factory=anndata_factory(filepath),
    data_path=filepath,
    name="AnnLoader",
    max_time_seconds=10,
)

print_results(result)
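
The description also mentions comparing several dataloaders and exporting the results to CSV. The sketch below shows that pattern with the standard library only. It does not use the `bionemo.scspeedtest` API: `time_iteration`, `results_to_csv`, and the column names are hypothetical, and the "dataloaders" are plain generators standing in for real ones.

```python
import csv
import io
import time
from typing import Callable, Dict, Iterable, List


def time_iteration(name: str, factory: Callable[[], Iterable]) -> dict:
    """Hypothetical helper: time setup and one full pass over an iterable."""
    start = time.perf_counter()
    loader = factory()
    setup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    batches = sum(1 for _ in loader)
    iteration_seconds = time.perf_counter() - start

    return {
        "name": name,
        "setup_seconds": setup_seconds,
        "iteration_seconds": iteration_seconds,
        "batches": batches,
    }


def results_to_csv(rows: List[dict]) -> str:
    """Serialize one row per dataloader so runs can be compared side by side."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in factories: the same 3200 items served in different batch sizes.
factories: Dict[str, Callable[[], Iterable]] = {
    "batch_size_64": lambda: (list(range(64)) for _ in range(50)),
    "batch_size_8": lambda: (list(range(8)) for _ in range(400)),
}

rows = [time_iteration(name, factory) for name, factory in factories.items()]
csv_text = results_to_csv(rows)
print(csv_text)
```

Keeping one row per named factory makes it straightforward to append further dataloaders and diff the resulting CSV files between runs.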

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

Signed-off-by: Polina Binder <pbinder@nvidia.com>

copy-pr-bot Bot commented Jul 2, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread sub-packages/bionemo-scbenchmark/src/bionemo/scbenchmark/common.py Outdated
Comment thread sub-packages/bionemo-scbenchmark/README.md Outdated
Comment thread sub-packages/bionemo-scbenchmark/README.md Outdated
Comment thread sub-packages/bionemo-scbenchmark/README.md Outdated
Signed-off-by: Polina Binder <pbinder@nvidia.com>
Comment thread sub-packages/bionemo-scbenchmark/README.md Outdated
Comment thread sub-packages/bionemo-scbenchmark/README.md Outdated
Comment thread sub-packages/bionemo-scbenchmark/README.md
Comment thread sub-packages/bionemo-scbenchmark/README.md
Comment thread sub-packages/bionemo-scbenchmark/examples/comprehensive_benchmarking.py Outdated
Comment thread sub-packages/bionemo-scbenchmark/examples/comprehensive_benchmarking.py Outdated
Comment thread sub-packages/bionemo-scbenchmark/examples/comprehensive_benchmarking.py Outdated
Comment thread sub-packages/bionemo-scbenchmark/src/bionemo/scbenchmark/common.py
Comment thread sub-packages/bionemo-scbenchmark/tests/bionemo/scbenchmark/test_benchmark.py Outdated
Signed-off-by: Polina Binder <pbinder@nvidia.com>
Collaborator

@skothenhill-nv skothenhill-nv left a comment

thank you!

Signed-off-by: Polina Binder <pbinder@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/ok to test 878f347

@polinabinder1 polinabinder1 enabled auto-merge July 31, 2025 23:29
@polinabinder1
Collaborator Author

/ok to test 878f347

@polinabinder1
Collaborator Author

/ok to test 594dea5

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/ok to test ce04962

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/ok to test 69bf73d


codecov-commenter commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 50.54945% with 180 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.16%. Comparing base (58edf11) to head (69bf73d).
⚠️ Report is 342 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...nemo-scspeedtest/src/bionemo/scspeedtest/common.py | 43.25% | 101 Missing ⚠️ |
| ...o-scspeedtest/src/bionemo/scspeedtest/benchmark.py | 56.83% | 79 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #969      +/-   ##
==========================================
- Coverage   83.34%   82.16%   -1.18%     
==========================================
  Files         148      151       +3     
  Lines        9766    10130     +364     
==========================================
+ Hits         8139     8323     +184     
- Misses       1627     1807     +180     

| Files with missing lines | Coverage Δ |
|---|---|
| ...mo-scspeedtest/src/bionemo/scspeedtest/__init__.py | 100.00% <100.00%> (ø) |
| ...o-scspeedtest/src/bionemo/scspeedtest/benchmark.py | 56.83% <56.83%> (ø) |
| ...nemo-scspeedtest/src/bionemo/scspeedtest/common.py | 43.25% <43.25%> (ø) |
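
The percentages in this report follow from the hit and line counts in the coverage diff. A short arithmetic check, with the numbers copied from the report and illustrative variable names; it assumes the patch figures equal the head-minus-base deltas, which holds here because the deltas reproduce the reported 180 missing lines:

```python
# Counts taken from the coverage diff (base = main, head = this PR).
base_hits, base_lines = 8139, 9766
head_hits, head_lines = 8323, 10130

base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines

# Lines and hits added by the patch, and the resulting patch coverage.
patch_lines = head_lines - base_lines      # +364
patch_hits = head_hits - base_hits         # +184
patch_missing = patch_lines - patch_hits   # 180, i.e. 101 + 79 from the per-file table
patch_cov = 100 * patch_hits / patch_lines

print(f"base {base_cov:.2f}%  head {head_cov:.2f}%  delta {head_cov - base_cov:+.2f}%")
print(f"patch {patch_cov:.5f}% with {patch_missing} lines missing")
```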

@polinabinder1 polinabinder1 added this pull request to the merge queue Aug 1, 2025
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Aug 1, 2025
@polinabinder1 polinabinder1 added this pull request to the merge queue Aug 1, 2025
@polinabinder1 polinabinder1 removed this pull request from the merge queue due to a manual request Aug 1, 2025
@polinabinder1 polinabinder1 added this pull request to the merge queue Aug 1, 2025
Merged via the queue into main with commit 267eb51 Aug 2, 2025
14 checks passed
@polinabinder1 polinabinder1 deleted the pbinder/scbenchmark branch August 2, 2025 00:29
5 participants