Skip to content

Conversation

@savitha-eng
Copy link
Collaborator

Summary

This PR contains a bug fix for the row index creation in the concat function of the SingleCellMemap Dataset. This bug occurs in the use case when a user is collating multiple h5ad files into a SingleCellMemmap dataset via a SingleCellCollection/the flatten function.

Details

Previously, the concat function was initializing the number of elements in the cumulative row index incorrectly for more than one dataset since it was incrementing the cumulative elements using the number of rows rather than the number of non zero elements thus far. This resulted in cases where the row index was no longer accurate (and no longer monotonically increasing), and thus the gene data retrieved by using these row pointers via functions like scmemmap.get_row was incorrect.

Usage

There are no changes to how the user interacts with the changed code.

Testing

We add a regression unit test for the specific case on the val data that was failing previously.
Tests for these changes can be run via:

pytest -v /workspace/bionemo2/sub-packages/bionemo-scdl/tests/bionemo/scdl/io/test_single_cell_collection.py::test_sc_concat_in_flatten_cellxval

…iple datasets; added unit test that checks for regression
Copy link
Contributor

@malcolmgreaves malcolmgreaves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR commit message, nice fix, and great tests! 👏 LGTM

@savitha-eng
Copy link
Collaborator Author

/build-ci

@polinabinder1
Copy link
Collaborator

/build-ci

@polinabinder1 polinabinder1 enabled auto-merge (squash) October 25, 2024 23:01
@savitha-eng
Copy link
Collaborator Author

/build-ci

@polinabinder1 polinabinder1 merged commit 4774510 into main Oct 28, 2024
@polinabinder1 polinabinder1 deleted the savitha/scdl-concat-fix branch October 28, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants