Handle altExps with AnnData output #351

allyhawkins · 2023-06-20T16:01:41Z

Yes, this does not handle alternative experiments: the function we are using only converts the main experiment. Looking back at AlexsLemonade/scpcaTools#115, we had some initial thoughts on how to handle them, one of which was writing a separate file for each altExp. I think we may want to address this in a separate issue/PR here, because we will have to think about what we want the output to look like.

Looking briefly at the Scanpy documentation, it looks like they store everything in one matrix, with the ADT data as additional rows in the gene-by-cell counts matrix. Maybe we could do something similar prior to converting to AnnData to keep everything in one file? https://scanpy-tutorials.readthedocs.io/en/multiomics/cite-seq/pbmc5k.html
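As a rough illustration of the single-matrix idea, here is a minimal pure-Python sketch that appends ADT rows beneath the RNA rows of a feature-by-cell matrix and records each row's modality. The function name `stack_features` and the `"rna"`/`"adt"` labels are hypothetical, chosen only for this example; this is not how Scanpy or our conversion function is implemented.

```python
def stack_features(rna_counts, rna_names, adt_counts, adt_names):
    """Append ADT rows beneath RNA rows of a feature-by-cell matrix.

    Each matrix is a list of rows (one row per feature, one column per cell).
    Returns the combined matrix, the combined feature names, and a parallel
    list labelling each row's modality (labels are hypothetical).
    """
    # Both matrices must describe the same set of cells (same column count).
    n_cells = {len(row) for row in rna_counts + adt_counts}
    if len(n_cells) > 1:
        raise ValueError("RNA and ADT matrices must cover the same cells")

    combined = [list(row) for row in rna_counts] + [list(row) for row in adt_counts]
    feature_names = list(rna_names) + list(adt_names)
    feature_types = ["rna"] * len(rna_names) + ["adt"] * len(adt_names)
    return combined, feature_names, feature_types
```

The modality labels are what would let downstream code split the matrix back apart, which is part of why this layout needs extra bookkeeping.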

Doing that seems kind of hacky, and I don't love it. I think maybe the right approach going forward is to export MuData objects (https://mudata.readthedocs.io/en/latest/)? This allows wrapping multiple AnnData objects in a way much more similar to SCE. The underlying AnnData objects are accessed with calls like mudata['rna'].

There are some remaining questions, though. For example, do we want all output files to be MuData, even if there is only RNA data?

All of this I think falls into future discussion, but we should probably resolve it pretty soon to prevent too much rewriting later.

Originally posted by @jashapiro in #350 (comment)

One other thing we will want to consider is how to make sure that the output is compliant with CZI cellxgene, which requires HDF5 files. We may still be able to use MuData and make the individual objects CZI compliant.


allyhawkins · 2023-06-21T17:13:13Z

From our discussion in DSTM today, we have decided to keep separate AnnData objects for CITE-seq and RNA data. This means we are going to have two files for every library that contains ADT or hashing data: one for the ADT/hashing object and the other for RNA.

We may still want to use MuData to store the two AnnData objects for easier processing through Nextflow, but the final output should be two separate files.
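To make the two-file decision concrete, here is a minimal sketch of how the expected outputs for one library could be listed. The function name `planned_outputs` and the `<library>_<experiment>.h5ad` naming pattern are assumptions made for illustration only; the actual file names have not been decided in this thread.

```python
from pathlib import Path
from typing import Dict, Optional


def planned_outputs(library_id: str, alt_exp: Optional[str], out_dir: str = ".") -> Dict[str, Path]:
    """List the AnnData files expected for one library.

    Every library gets an RNA file; a library with an altExp (for example
    ADT or hashing data) gets a second, separate file for that experiment.
    The naming pattern here is hypothetical.
    """
    base = Path(out_dir)
    outputs = {"rna": base / f"{library_id}_rna.h5ad"}
    if alt_exp is not None:
        outputs[alt_exp] = base / f"{library_id}_{alt_exp}.h5ad"
    return outputs
```

A library without an altExp would yield a single RNA file, while `planned_outputs("lib01", "adt")` would yield two paths, one per experiment.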

allyhawkins · 2023-07-14T21:15:40Z

Closed by #355

allyhawkins self-assigned this Jun 21, 2023

allyhawkins mentioned this issue Jun 26, 2023

Account for altExp in converting SCE to AnnData #355

Merged

allyhawkins closed this as completed Jul 14, 2023
