Add cellhash demux results #140

Merged: jashapiro merged 25 commits into development from jashapiro/33-cellhash_demux_results on Apr 4, 2022


jashapiro (Member):

This PR adds the basic demux results from cellhash tables to the script integrating multiplexed data.

It does this by adding flags to the add_demux_sce.R script that indicate whether demultiplexing should be performed, along with an option to supply a file containing a table of the cellhash sample IDs. The path to that file is given by the new cellhash_pool_file param, which is set in the ccdl profile.

This file format is not currently documented, so we should probably document it either before merging or as a next step, possibly along with more documentation about running multiplexed samples.

One other future thought: while we don't currently have any multiplexed samples without cellhash data, such samples are possible. It is also possible (maybe more likely?) that we could get cellhash samples without matched bulk samples for genetic demultiplexing. Right now the workflow doesn't really support that case at all. Changing that is probably also related to #100.

Leaving this as a draft for now.

jashapiro mentioned this pull request Mar 30, 2022
Base automatically changed from jashapiro/96-integrate-vireo to development March 31, 2022 13:08
jashapiro (Member, Author):

Updating with some comments on the way this has changed since the initial draft:

  • I am using the same single R script as before, but I have split adding the cellhash demux results and the vireo integration into two separate processes. This lets the cellhash demux run first, right after feature mapping. The main reason is that it may (in the future) allow more flexible handling of multiplexed samples that have no bulk RNA-seq data. For now, it has the benefit of working with multiplexed samples that don't have cellhash data (which we also don't have yet, but it should work!).
  • I split feature_techs into two separate lists, again for potential flexibility later. I originally planned to use the split to pick out the feature samples with cellhash data, but I ended up using the feature_type flag that the generate_merged_sce process sets in the metadata instead. So the split may not bring any benefit, but I left it in anyway.
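A minimal Python model of the two changes above, not the workflow's actual Nextflow code: the technology strings, the step names, and demux_steps() are all hypothetical. It shows the feature technologies kept as two separate lists, and the cellhash demux running as its own step ahead of, and independently of, the vireo integration, so a multiplexed library without cellhash data simply skips that step. The has_bulk switch models the possible future flexibility; as noted above, the current workflow still expects matched bulk data for genetic demultiplexing.

```python
# Hypothetical technology names: the real lists are defined in the workflow.
cellhash_techs = ["cellhash_10Xv3", "cellhash_10Xv3.1"]
citeseq_techs = ["CITEseq_10Xv3", "CITEseq_10Xv3.1"]
# The combined list can still be rebuilt wherever all feature techs are needed.
feature_techs = cellhash_techs + citeseq_techs


def demux_steps(feature_type, has_bulk):
    """Return the ordered demux steps for one multiplexed library.

    Hypothetical model: the cellhash step runs right after feature mapping and
    only when the merged SCE carries cellhash data; the vireo step is a
    separate, later process that needs matched bulk RNA-seq.
    """
    steps = []
    if feature_type == "cellhash":
        steps.append("add_cellhash_demux")
    if has_bulk:
        steps.append("add_vireo_demux")
    return steps


print(demux_steps("cellhash", has_bulk=True))
print(demux_steps(None, has_bulk=True))
```

Keeping the two steps separate means neither depends on the other's input being present, which is the flexibility the comment is aiming for.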

jashapiro marked this pull request as ready for review March 31, 2022 20:50

allyhawkins (Member) left a comment:

This looks good to me. I just had one minor suggestion about staying consistent with the use of cellhash_techs, but I'm not 100% sure it will work the way I think it will. I'm also not crazy about putting the processes that add the demux results to the SCEs in the same script that generates and filters the SCEs, since things are now definitely less linear than they used to be, but I can't think of a better option since they are all related to SCE processing.

Review comment on main.nf (outdated):

```diff
- merged_sce_ch = generate_merged_sce(feature_rna_quant_ch)
+ feature_sce_ch = generate_merged_sce(feature_rna_quant_ch)
+   .branch{ // branch cellhash libs
+     cellhash: it[0]["feature_type"] == "cellhash"
```
allyhawkins (Member):

Suggested change:

```diff
- cellhash: it[0]["feature_type"] == "cellhash"
+ cellhash: it[0]["technology"] in cellhash_techs
```

Could you do this instead? I found it confusing to trace where this value was coming from, whereas using the cellhash_techs list you defined earlier seems like it would be cleaner.

jashapiro (Member, Author):

Unfortunately, this won't work (this was what I tried to use first!). Once the feature data are added, meta.technology holds only the "primary" technology, so it is just 10Xv3 or similar. generate_merged_sce adds meta.feature_type, which records the kind of feature stored in the altExp of the SCE, so we have to use that.

I could maybe modify generate_merged_sce to add a feature_technology field that keeps the technology string unchanged... that might be a better solution, come to think of it. I will try that.

jashapiro (Member, Author):

Alternatively, I could use it[0]["feature_meta"]["technology"] as that is already set by the generate_merged_sce workflow (all of meta from feature mappings ends up in meta.feature_meta when the objects are combined). Do you have any thoughts on which is less confusing?

allyhawkins (Member):

I think that option makes sense (at least to me). That way you would be able to use it[0]["feature_meta"]["technology"] in cellhash_techs, correct? What originally confused me was where the value "cellhash" was coming from.

jashapiro (Member, Author):

Yes, that is what it would be. I will test to make sure it works as expected.
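To make the three options in this thread concrete, here is a small Python model of the predicate passed to .branch. The meta dictionary and the technology strings are hypothetical stand-ins for what generate_merged_sce might produce; only the key names technology, feature_type, and feature_meta come from the discussion above.

```python
# Hypothetical list; the real cellhash_techs is defined in main.nf.
cellhash_techs = ["cellhash_10Xv3", "cellhash_10Xv3.1"]

# Hypothetical meta map for a merged RNA + feature library.
meta = {
    "technology": "10Xv3",         # primary (RNA) technology only
    "feature_type": "cellhash",    # kind of feature stored in the altExp
    "feature_meta": {              # meta carried over from the feature mapping
        "technology": "cellhash_10Xv3",
    },
}


def is_cellhash_by_type(meta):
    """Original predicate: it[0]["feature_type"] == "cellhash"."""
    return meta.get("feature_type") == "cellhash"


def is_cellhash_by_primary_tech(meta):
    """First suggestion: misses, because technology is the RNA technology."""
    return meta.get("technology") in cellhash_techs


def is_cellhash_by_feature_tech(meta):
    """Agreed option: it[0]["feature_meta"]["technology"] in cellhash_techs."""
    return meta.get("feature_meta", {}).get("technology") in cellhash_techs


for check in (is_cellhash_by_type, is_cellhash_by_primary_tech, is_cellhash_by_feature_tech):
    print(check.__name__, check(meta))
```

The agreed option reads the cellhash decision straight from the list defined in the workflow, which addresses the reviewer's concern about tracing where the "cellhash" string originates.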

jashapiro merged commit 1bb59c6 into development Apr 4, 2022
jashapiro deleted the jashapiro/33-cellhash_demux_results branch April 4, 2022 15:03