Add process to convert SCE to AnnData #226

Closed
allyhawkins opened this issue Oct 6, 2022 · 2 comments
@allyhawkins
Member

Is your feature request related to a problem? Please describe.

We want to be able to output the gene-by-cell counts matrices as AnnData objects stored as H5 files, in addition to the SCE objects we already produce. Currently, multiple SCE objects are output as RDS files: the unfiltered, filtered, and processed objects. We probably want all of them to also be available as AnnData objects so that they are directly compatible with Python.

Describe the solution you'd like

We can create a process that converts an RDS file containing an SCE object to an H5 file containing an AnnData object, and then run that process three times, once for each type of SCE file. Alternatively, we could have a single process that takes the same input the QC report uses and runs a script that converts all three in one go. I think I might favor the second option, but I'm curious whether others have other ideas.

Either way, we will definitely need a script that takes RDS files as input and outputs H5 files using the scpcaTools::sce_to_anndata function.

Additional context

We may need to break this up into additional issues as we start working on it, but I wanted to file this to get us started on adding AnnData conversion. We want to do this before our next big release and before we re-process projects.

@jashapiro
Member

jashapiro commented Oct 6, 2022

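// Split each [meta, unfiltered, filtered, processed] item into one item per SCE file
sce_ch = in_ch.flatMap{[[it[0], it[1]],
                        [it[0], it[2]],
                        [it[0], it[3]]
                       ]}

// Convert each SCE file to AnnData independently
anndata_ch = sce_to_anndata(sce_ch)

// Regroup the three outputs by key: [meta, [h5, h5, h5]] -> [meta, h5, h5, h5]
out_ch = anndata_ch.groupTuple(size:3)
  .map{[it[0]] + it[1]}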

Thinking about this for a few minutes more, this isn't quite good enough: groupTuple won't necessarily preserve the order of the three files, and I have run into trouble before with map items not grouping as expected when used as keys (hence the many places where we pull out the library IDs). So it might actually be a bit more work than I thought.
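To make the ordering concern concrete, here is a small Python model of the channel logic. It is a hypothetical sketch, not Nextflow code, and it assumes two changes to the snippet above: each input item starts with a plain library ID string (not a metadata map), and every flattened item carries a type tag (`unfiltered`, `filtered`, or `processed`) through the conversion step. The converted items are shuffled to mimic processes finishing in arbitrary order; the tags, not the arrival order, are then used to rebuild each `[library_id, unfiltered, filtered, processed]` tuple.

```python
import random
from collections import defaultdict

SCE_TYPES = ("unfiltered", "filtered", "processed")


def flat_map(in_items):
    """Like the flatMap step, but tag each SCE file with its type."""
    for library_id, *rds_files in in_items:
        for sce_type, rds_file in zip(SCE_TYPES, rds_files):
            yield (library_id, sce_type, rds_file)


def convert(item):
    """Stand-in for the sce_to_anndata process: only renames .rds to .h5."""
    library_id, sce_type, rds_file = item
    return (library_id, sce_type, rds_file.removesuffix(".rds") + ".h5")


def group_tuple(items, size=3):
    """Like groupTuple(size: 3): group by library ID in arrival order, keep complete groups."""
    groups = defaultdict(list)
    for library_id, sce_type, h5_file in items:
        groups[library_id].append((sce_type, h5_file))
    return {lib: files for lib, files in groups.items() if len(files) == size}


def restore_order(groups):
    """Rebuild [library_id, unfiltered, filtered, processed] from the type tags."""
    out = []
    for library_id, tagged in groups.items():
        by_type = dict(tagged)
        out.append([library_id] + [by_type[t] for t in SCE_TYPES])
    return out


in_items = [
    ("lib1", "lib1_unfiltered.rds", "lib1_filtered.rds", "lib1_processed.rds"),
    ("lib2", "lib2_unfiltered.rds", "lib2_filtered.rds", "lib2_processed.rds"),
]
converted = [convert(item) for item in flat_map(in_items)]
random.Random(0).shuffle(converted)  # processes can finish in any order
out = restore_order(group_tuple(converted))
print(out)
```

In Nextflow terms, the analogous change would be to pass the type label through the process as an extra `val` input and output and to group on the library ID rather than a map; the Python above only models that idea.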

My next idea is to make the sce_to_anndata process take an arbitrary number of files, but I can't figure out an easy way to do that, so I'm not exactly sure how to proceed.

I'd definitely make the script convert one SCE at a time, but for the process it will probably be easiest to do what you suggest and just take all three at once.
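A rough sketch of that split, written in Python for illustration only: the per-library process receives all three RDS files and emits one call per file to a single-file converter script. The script name `sce_to_anndata.R` and its `--input_sce_file`/`--output_h5_file` flags are hypothetical placeholders, not an existing interface.

```python
from pathlib import Path

# Hypothetical single-file converter script; name and flags are placeholders.
CONVERTER = "sce_to_anndata.R"


def conversion_commands(unfiltered, filtered, processed):
    """Return one converter call per SCE file, in a fixed order."""
    commands = []
    for rds_file in (unfiltered, filtered, processed):
        # Output goes in the current directory and keeps the RDS base name
        h5_file = Path(rds_file).with_suffix(".h5").name
        commands.append(
            f"{CONVERTER} --input_sce_file {rds_file} --output_h5_file {h5_file}"
        )
    return commands


script_block = "\n".join(
    conversion_commands("lib1_unfiltered.rds", "lib1_filtered.rds", "lib1_processed.rds")
)
print(script_block)
```

Because the process handles a fixed set of three files, its outputs come back in the same order as its inputs and no regrouping step is needed.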

@allyhawkins
Member Author

Closed by #350
