Handle altExps with AnnData output #351

Closed
allyhawkins opened this issue Jun 20, 2023 · 2 comments

@allyhawkins
Member

Yes, so this does not handle alternative experiments. The function we are using only converts the main experiment. Looking back on AlexsLemonade/scpcaTools#115, we had some initial thoughts on how to handle them, one of which was outputting a separate file for each altExp. I think we may want to address this in a separate issue/PR here because we will have to think about what we want that output to look like.

From a brief look at the Scanpy documentation, it looks like they store everything in one matrix, with the ADT data as additional rows in the gene-by-cell counts matrix. Maybe we could do something similar prior to converting to AnnData to keep everything in one file? https://scanpy-tutorials.readthedocs.io/en/multiomics/cite-seq/pbmc5k.html

Doing that seems kind of hacky, and I don't love it. I think maybe the right approach going forward is to export MuData objects (https://mudata.readthedocs.io/en/latest/)? This allows wrapping multiple AnnData objects in a way much more similar to SCE. The underlying AnnData objects are accessed with calls like mudata['rna'].

There are some remaining questions, though. For example, do we want all output files to be MuData, even if there is only RNA data?

All of this I think falls into future discussion, but we should probably resolve it pretty soon to prevent too much rewriting later.

Originally posted by @jashapiro in #350 (comment)

One other thing we will want to consider is how to make sure the output is compliant with CZI cellxgene, which requires HDF5 files. We may still be able to use MuData and make the individual objects CZI-compliant.

@allyhawkins
Member Author

From our discussion in DSTM today, we have decided to keep separate AnnData objects for CITE-seq and RNA data. This means we are going to have two files for every library that contains ADT or hashing data: one for the ADT/hashing object and the other for RNA.

We may still want to use MuData to store the two AnnData objects for easier processing through Nextflow, but the final output should be two separate files.
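
As a rough sketch of that layout, the helper below lists the files one library would produce. The function name, the `_rna`/`_adt` suffixes, and the `.h5ad` extension are hypothetical, not naming that has been settled on here.

```python
from typing import List, Optional


def planned_output_files(library_id: str, alt_exp: Optional[str] = None) -> List[str]:
    """List the AnnData files expected for one library.

    Every library gets an RNA file; a library with ADT or hashing data
    gets one additional file for that object.
    """
    files = [f"{library_id}_rna.h5ad"]  # hypothetical naming scheme
    if alt_exp is not None:
        files.append(f"{library_id}_{alt_exp}.h5ad")
    return files


rna_only_files = planned_output_files("library01")
cite_seq_files = planned_output_files("library01", alt_exp="adt")
```

Here `rna_only_files` holds a single RNA file, while `cite_seq_files` holds the RNA file plus one file for the ADT object.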

@allyhawkins
Member Author

Closed by #355
