Polinabinder/file extend#477

Merged: polinabinder1 merged 22 commits into main from polinabinder/file_extend on May 12, 2025

@polinabinder1
Collaborator

No description provided.

@polinabinder1
Collaborator Author

This PR enables concatenating multiple single-cell memory-mapped datasets without copying each file in full, which would double the disk usage. Instead, the row, column, and data arrays are copied in blocks of bytes, and each source file is deleted once it has been copied, which bounds the amount of data stored in duplicate on disk.
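
A minimal sketch of the block-wise copy described above, for illustration only. It is not the code in this PR: the function name `append_file_in_blocks`, its parameters, and the default block size are all hypothetical, and it only shows raw byte appending, not any index adjustments a sparse format may need when arrays are concatenated.

```python
import os
import tempfile


def append_file_in_blocks(dest_path, src_path, block_size=1024 * 1024, delete_src=True):
    """Append the bytes of src_path to dest_path one block at a time.

    Hypothetical helper: only `block_size` bytes are held in memory at once,
    and the source file is removed after it has been fully copied, so at most
    one source file is duplicated on disk at any time.
    """
    with open(dest_path, "ab") as dest, open(src_path, "rb") as src:
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means end of file
                break
            dest.write(block)
    if delete_src:
        os.remove(src_path)


def demo():
    """Append a small file to another using a tiny block size."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.bin")
        second = os.path.join(tmp, "second.bin")
        with open(first, "wb") as f:
            f.write(b"abc")
        with open(second, "wb") as f:
            f.write(b"defgh")
        append_file_in_blocks(first, second, block_size=2)
        with open(first, "rb") as f:
            return f.read(), os.path.exists(second)
```

Calling `demo()` appends the second file to the first in 2-byte blocks and then removes the second file. Applying the same step to the row, column, and data files of each dataset in turn keeps the extra disk footprint to roughly one source file at a time, under the assumptions above.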

@polinabinder1
Collaborator Author

/build-ci

Collaborator

@skothenhill-nv skothenhill-nv left a comment

looks pretty good, left a few stylistic comments. Docstrings could use examples I think.

Comment thread sub-packages/bionemo-scdl/src/bionemo/scdl/io/single_cell_collection.py Outdated
Comment thread sub-packages/bionemo-scdl/src/bionemo/scdl/util/filecopyutil.py
polinabinder1 and others added 3 commits November 26, 2024 12:49
…ll_memmap_dataset.py

Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
…ll_memmap_dataset.py

Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/build-ci

Comment thread sub-packages/bionemo-scdl/src/bionemo/scdl/io/single_cell_collection.py Outdated
@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

@pstjohn pstjohn removed their request for review November 27, 2024 19:59
Contributor

@edawson edawson left a comment

Approved! Thank you for adding tests.

@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@polinabinder1
Collaborator Author

/ok to test 97eeea3

@codecov-commenter

codecov-commenter commented May 10, 2025

Codecov Report

Attention: Patch coverage is 96.49123% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.35%. Comparing base (4781597) to head (e7d5fda).
Report is 1 commits behind head on main.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...scdl/src/bionemo/scdl/io/single_cell_collection.py | 71.42% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
- Coverage   84.35%   84.35%   -0.01%     
==========================================
  Files         138      139       +1     
  Lines        8706     8716      +10     
==========================================
+ Hits         7344     7352       +8     
- Misses       1362     1364       +2     

| Files with missing lines | Coverage Δ |
|---|---|
| .../src/bionemo/scdl/io/single_cell_memmap_dataset.py | 92.05% <100.00%> (-0.04%) ⬇️ |
| ...bionemo-scdl/src/bionemo/scdl/util/filecopyutil.py | 100.00% <100.00%> (ø) |
| ...scdl/src/bionemo/scdl/io/single_cell_collection.py | 95.34% <71.42%> (-2.22%) ⬇️ |

... and 1 file with indirect coverage changes

Contributor

@edawson edawson left a comment

One minor nit, otherwise looks good!

Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/ok to test ad7ddd3

@polinabinder1
Collaborator Author

/ok to test e7d5fda

@polinabinder1 polinabinder1 added this pull request to the merge queue May 12, 2025
Merged via the queue into main with commit 8d51eda May 12, 2025
10 checks passed
@polinabinder1 polinabinder1 deleted the polinabinder/file_extend branch May 12, 2025 23:43
trvachov pushed a commit that referenced this pull request May 16, 2025
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
Co-authored-by: Steven Kothen-Hill <148821680+skothenhill-nv@users.noreply.github.com>
camirr-nv pushed a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Camir Ricketts <camirr@nvidia.com>
5 participants