Ingest raw count cells for AnnData files (SCP-5103) #359
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##        development    #359    +/-    ##
===============================================
+ Coverage     75.29%   75.33%   +0.03%
===============================================
  Files            29       29
  Lines          4279     4297      +18
===============================================
+ Hits           3222     3237      +15
- Misses         1057     1060       +3
jlchang
left a comment
Code looks good and manual tests behave as described.
eweitz
left a comment
Code looks good! Nice follow-on for broadinstitute/single_cell_portal_core#2113.
I suggest some trivial readability refinements, no blockers.
Co-authored-by: Eric Weitz <eweitz@broadinstitute.org>
BACKGROUND & CHANGES
This update adds a new raw_counts extraction phase for AnnData files, in which a list of cell names is inserted into MongoDB as the "raw" cells for the file. Usually we extract flat files and then use existing ingest classes to process the data. However, since raw counts data requires only the cell names, extracting a full MTX bundle would be pointless, so this update reads the cell names directly from the AnnData file. This assumes that there is data in the adata.raw slot, and that the cells represented there match those in adata.obs_names. Future work may be required if users are not using that slot, and we may wish to let them specify which slot holds the raw data, or even which index holds the cell names. For now, the slots are hard-coded. This is part of work to enable downstream portal actions, such as automated differential expression calculation for AnnData files.
MANUAL TESTING
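As a quick local check of the two assumptions in the description (data is present in adata.raw, and its cells match adata.obs_names), something like the following could be run against a test file. This is only a sketch and not the code in this PR: the function names extract_raw_cell_names and read_raw_cell_names are hypothetical, and the explicit mismatch check is an added assumption about how a violation might be surfaced.

```python
from types import SimpleNamespace


def extract_raw_cell_names(adata):
    """Return the cell names to treat as "raw" cells.

    Hypothetical helper: works on any object exposing `.raw` and
    `.obs_names`, which is the shape an AnnData object has.
    """
    # Assumption from the description: raw data lives in adata.raw
    if adata.raw is None:
        raise ValueError("No data found in adata.raw")
    cells = [str(name) for name in adata.obs_names]
    raw_cells = [str(name) for name in adata.raw.obs_names]
    # Assumption from the description: raw cells match adata.obs_names
    if cells != raw_cells:
        raise ValueError("Cells in adata.raw do not match adata.obs_names")
    return cells


def read_raw_cell_names(path):
    """Load an .h5ad file and return its raw cell names (hypothetical)."""
    # Imported lazily so the helper above can be tried without anndata
    import anndata

    # backed="r" avoids loading the full expression matrix into memory
    adata = anndata.read_h5ad(path, backed="r")
    return extract_raw_cell_names(adata)


# Stand-in object for illustration only; a real run would call
# read_raw_cell_names() on an actual AnnData file.
demo = SimpleNamespace(
    obs_names=["cell_A", "cell_B"],
    raw=SimpleNamespace(obs_names=["cell_A", "cell_B"]),
)
print(extract_raw_cell_names(demo))
```

Only the cell names are read here, which mirrors the reasoning in the description that a full MTX extraction is unnecessary for this phase.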
log.txt, look for the following message: