Yes, so this does not handle alternative experiments. The function we are using only converts the main experiment. Looking back on AlexsLemonade/scpcaTools#115, we had some initial thoughts on how to handle it, one of which was outputting a separate file for each altExp. I think we may want to address this in a separate issue/PR, because we will have to think about what we want the output to look like there.
Looking briefly at the Scanpy documentation, it seems they store everything in one matrix, with the ADT data as additional rows in the gene-by-cell counts matrix. Maybe we could do something similar before converting to AnnData to keep everything in one file? https://scanpy-tutorials.readthedocs.io/en/multiomics/cite-seq/pbmc5k.html
Doing that seems kind of hacky, and I don't love it. I think the right approach going forward may be to export MuData objects (https://mudata.readthedocs.io/en/latest/). This allows wrapping multiple AnnData objects in a way much more similar to SCE. Accessing the underlying AnnData objects is then done with calls like mudata['rna'].
Some questions remain, though. For example, do we want all output files to be MuData, even if there is only RNA data?
I think all of this falls into future discussion, but we should probably resolve it soon to avoid too much rewriting later.
One other thing we will want to consider is how to make sure that the output is compliant with CZI cellxgene, which requires HDF5 files. We may still be able to use MuData and make the individual objects CZI compliant.
From our discussion in DSTM today, we have decided to keep separate AnnData objects for the CITE-seq and RNA data. This means we will have two files for every library that contains ADT or hashing data: one for the ADT/hashing object and one for the RNA.
We may still want to use muData to store the two AnnData objects for easier processing through Nextflow, but the final output should be two separate files.
The comment above beginning "Doing that seems kind of hacky" was originally posted by @jashapiro in #350.