Bug Fix for SingleCellMemap Concat Function #339

savitha-eng · 2024-10-22T06:47:48Z

Summary

This PR fixes a bug in how the concat function of SingleCellMemMapDataset builds its row index. The bug occurs when a user collates multiple h5ad files into a single SingleCellMemMapDataset via SingleCellCollection and its flatten function.

Details

Previously, when more than one dataset was concatenated, the concat function initialized the cumulative element count for the row index incorrectly: it incremented the count by the number of rows processed so far rather than by the number of non-zero elements stored so far. The resulting row index was inaccurate and no longer monotonically increasing, so gene data retrieved through these row pointers (for example, via scmemmap.get_row) was incorrect.
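A minimal sketch of the difference, assuming the row index follows the standard CSR layout (a row-pointer array of length n_rows + 1 that starts at 0 and indexes a flat array of non-zero values). `concat_row_pointers` and `get_row` here are hypothetical toy helpers, not the SCDL implementation; the `offset_by_nnz` flag switches between the fixed offset (non-zero elements so far) and the buggy one (rows so far).

```python
import numpy as np


def concat_row_pointers(row_ptrs, offset_by_nnz=True):
    """Concatenate CSR row-pointer arrays (each of length n_rows + 1, starting at 0).

    offset_by_nnz=True  -> shift each later array by the non-zeros seen so far (fixed).
    offset_by_nnz=False -> shift by the rows seen so far (the bug described above).
    """
    first = np.asarray(row_ptrs[0], dtype=np.int64)
    pieces = [first]
    nnz_so_far = int(first[-1])
    rows_so_far = len(first) - 1
    for ptr in row_ptrs[1:]:
        ptr = np.asarray(ptr, dtype=np.int64)
        offset = nnz_so_far if offset_by_nnz else rows_so_far
        # Skip the leading 0: it duplicates the previous array's end pointer.
        pieces.append(ptr[1:] + offset)
        nnz_so_far += int(ptr[-1])
        rows_so_far += len(ptr) - 1
    return np.concatenate(pieces)


def get_row(data, row_ptr, i):
    """Return the stored non-zero values of row i (toy stand-in, hypothetical)."""
    return data[row_ptr[i]:row_ptr[i + 1]]


# Two toy datasets: A has 2 rows / 7 non-zeros, B has 2 rows / 4 non-zeros.
ptr_a, data_a = [0, 3, 7], np.arange(1, 8)
ptr_b, data_b = [0, 1, 4], np.arange(10, 14)
data = np.concatenate([data_a, data_b])

fixed = concat_row_pointers([ptr_a, ptr_b])
buggy = concat_row_pointers([ptr_a, ptr_b], offset_by_nnz=False)
print(fixed.tolist())                   # [0, 3, 7, 8, 11]
print(buggy.tolist())                   # [0, 3, 7, 3, 6] -> not monotonic
print(get_row(data, fixed, 3).tolist())  # [11, 12, 13], row 1 of B
print(get_row(data, buggy, 3).tolist())  # [4, 5, 6], values that belong to A
```

With the row-based offset, row 3 points back into the first dataset's values, which matches the symptom described above: the row lookup returns data belonging to a different cell.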

Usage

There are no changes to how the user interacts with the changed code.

Testing

We added a regression unit test covering the specific case on the val data that previously failed.
Tests for these changes can be run via:

pytest -v /workspace/bionemo2/sub-packages/bionemo-scdl/tests/bionemo/scdl/io/test_single_cell_collection.py::test_sc_concat_in_flatten_cellxval
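For illustration, a hypothetical invariant check that a regression test for this bug could assert on a concatenated index. This is not the contents of test_sc_concat_in_flatten_cellxval; `check_row_index` is an assumed helper, again assuming a standard CSR-style row-pointer layout.

```python
import numpy as np


def check_row_index(row_ptr, n_rows, nnz):
    """Return True if row_ptr is a well-formed CSR-style row index.

    Hypothetical helper: assumes row_ptr has n_rows + 1 entries, starts at 0,
    never decreases, and ends at the total number of non-zero elements.
    """
    row_ptr = np.asarray(row_ptr, dtype=np.int64)
    return bool(
        len(row_ptr) == n_rows + 1
        and row_ptr[0] == 0
        and np.all(np.diff(row_ptr) >= 0)
        and row_ptr[-1] == nnz
    )


# Toy index from two datasets (2 rows / 7 non-zeros, then 2 rows / 4 non-zeros).
print(check_row_index([0, 3, 7, 8, 11], n_rows=4, nnz=11))  # True  (offset by nnz)
print(check_row_index([0, 3, 7, 3, 6], n_rows=4, nnz=11))   # False (offset by rows)
```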



malcolmgreaves

Great PR commit message, nice fix, and great tests! 👏 LGTM

savitha-eng · 2024-10-24T22:55:17Z

/build-ci

polinabinder1 · 2024-10-25T22:59:27Z

/build-ci

savitha-eng · 2024-10-28T17:42:29Z

/build-ci

Bug fix for row index creation in concat function when collating multiple datasets; added unit test that checks for regression

056dfca

savitha-eng marked this pull request as ready for review October 22, 2024 06:48

savitha-eng requested review from jomitchellnv, jstjohn, malcolmgreaves, polinabinder1 and skothenhill-nv as code owners October 22, 2024 06:48

savitha-eng and others added 3 commits October 21, 2024 23:48

Merge branch 'main' into savitha/scdl-concat-fix

95b1bd8

Installed precommit and ran on files to fix formatting issues

0237a4f

Merge branch 'savitha/scdl-concat-fix' of github.com:NVIDIA/bionemo-fw-ea into savitha/scdl-concat-fix

cb2317c

malcolmgreaves assigned malcolmgreaves and savitha-eng and unassigned malcolmgreaves Oct 22, 2024

malcolmgreaves approved these changes Oct 22, 2024


polinabinder1 approved these changes Oct 25, 2024


Merge branch 'main' into savitha/scdl-concat-fix

1c7afd3

polinabinder1 enabled auto-merge (squash) October 25, 2024 23:01

Merge branch 'main' into savitha/scdl-concat-fix

e6117f7

polinabinder1 merged commit 4774510 into main Oct 28, 2024

polinabinder1 deleted the savitha/scdl-concat-fix branch October 28, 2024 18:02


3 participants