Add cellhash demux results #140

jashapiro · 2022-03-30T20:33:00Z

This PR adds the basic demux results from cellhash tables to the script integrating multiplexed data.

It does this by adding flags to the add_demux_sce.R script that indicate whether demultiplexing should be performed, along with an option to pass a file containing a table of the cellhash sample ids. The path to that file comes from the new cellhash_pool_file param, which is set in the ccdl profile.

This file is not currently documented, so we should probably add that either before merging or as a next step, which might include more documentation about running multiplexed samples.

One other thought for the future: we don't currently have any samples without cellhash data, but that is possible. It is also possible (maybe more likely?) that we could get cellhash samples without matched bulk samples for genetic demultiplexing. Right now the workflow doesn't really support that at all. Changing that is probably also related to #100.

Leaving this as a draft for now.

jashapiro · 2022-03-31T20:50:52Z

Updating with some comments on the way this has changed since the initial draft:

I am using the same single R script as before, but I have split adding the cellhash demux and the vireo integration into two separate processes. This allows me to add the cellhash demux results first, right after the feature mapping happens. The main reason for this is that it may (in the future) allow more flexible handling of the case where we don't have bulk RNA-seq data for a multiplexed sample. For now, it has the benefit of working with multiplexed samples that don't have cellhash data (which we also don't have, but it should work!).
I split feature_techs into two separate lists, again for more potential flexibility later. I actually thought I was going to use that to sort out the feature samples with cellhash data, but I ended up using the feature_type flag that gets set in the metadata as part of the generate_merged_sce process. There may not be any benefit to the split, but I left it in anyway.
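
For readers following along, the two-step ordering described above can be sketched in plain Groovy (not Nextflow itself). This is only an illustration under assumptions: the library records, the `multiplexed` and `demux` fields, and both closure names are hypothetical and do not come from the PR. The point is that a multiplexed library without cellhash data still passes through the first step untouched and picks up genetic demux results in the second.

```groovy
// Hypothetical library records; field names are illustrative only
def libraries = [
  [library_id: "lib01", multiplexed: true,  feature_type: "cellhash"],
  [library_id: "lib02", multiplexed: true,  feature_type: null],
  [library_id: "lib03", multiplexed: false, feature_type: null]
]

// Step 1 stand-in: record cellhash demux results only where cellhash data exists
def add_cellhash_demux = { lib ->
  lib.feature_type == "cellhash" ? lib + [demux: ["cellhash"]] : lib + [demux: []]
}

// Step 2 stand-in: record genetic (vireo) demux results for every multiplexed library
def add_genetic_demux = { lib ->
  lib.multiplexed ? lib + [demux: lib.demux + ["vireo"]] : lib
}

// Cellhash results are added first, genetic demux results afterwards
def results = libraries.collect(add_cellhash_demux).collect(add_genetic_demux)
results.each { println "${it.library_id}: ${it.demux}" }
```

Because each step decides independently whether it applies, neither one has to know whether the other ran.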

allyhawkins

This looks good to me. I just had one minor suggestion about staying consistent with the use of cellhash_techs, but I'm not 100% sure it will work the way I think it will. I'm also not crazy about the organization: the processes that add demux results to the SCEs now live in the same script that generates and filters the SCEs, so things are definitely not as linear as they used to be. But I can't think of a better option, since they are all related to SCE processing.

allyhawkins · 2022-04-01T15:16:28Z

main.nf

-  merged_sce_ch = generate_merged_sce(feature_rna_quant_ch)
+  feature_sce_ch = generate_merged_sce(feature_rna_quant_ch)
+    .branch{ // branch cellhash libs
+      cellhash: it[0]["feature_type"] == "cellhash"


Suggested change

-      cellhash: it[0]["feature_type"] == "cellhash"
+      cellhash: it[0]["technology"] in cellhash_techs

Would you maybe be able to do this instead? I found it confusing to trace back where this value was coming from, and using the cellhash_techs list that you defined earlier seems like it would be cleaner.

Unfortunately, this won't work (this was what I tried to use first!). meta.technology is set to the "primary" technology after we add feature data, so it is just 10Xv3 or whatever. generate_merged_sce adds meta.feature_type, which retains the kind of feature that is in the altexp of the SCE, so we have to use that.

I could maybe modify generate_merged_sce to add feature_technology without changing the string... that might be a better solution, come to think of it... I will try that.

Alternatively, I could use it[0]["feature_meta"]["technology"] as that is already set by the generate_merged_sce workflow (all of meta from feature mappings ends up in meta.feature_meta when the objects are combined). Do you have any thoughts on which is less confusing?

I think that option makes sense (at least to me). That way you would be able to do it[0]["feature_meta"]["technology"] in cellhash_techs, correct? What was originally confusing to me was where the value of cellhash was coming from.

Yes, that is what it would be. I will test to make sure it works as expected.
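
To make the difference between the three candidate conditions concrete, here is a minimal sketch in plain Groovy that mimics the branch predicate on simplified `[meta, file]` tuples. Several things here are assumptions for illustration only: the contents of `cellhash_techs`, the technology strings `cellhash_10Xv3` and `CITEseq_10Xv3`, the `CITEseq` feature type, the library ids, and the file names. Only the field names `technology`, `feature_type`, and `feature_meta` come from the discussion above.

```groovy
// Hypothetical contents; the real cellhash_techs list is defined in main.nf
def cellhash_techs = ["cellhash_10Xv3"]

// Simplified stand-ins for the [meta, sce_file] tuples from generate_merged_sce.
// meta.technology holds the primary (RNA) technology after merging, while
// meta.feature_meta keeps the full meta from the feature mapping.
def merged = [
  [[library_id: "lib01", technology: "10Xv3", feature_type: "cellhash",
    feature_meta: [technology: "cellhash_10Xv3"]], "lib01_sce.rds"],
  [[library_id: "lib02", technology: "10Xv3", feature_type: "CITEseq",
    feature_meta: [technology: "CITEseq_10Xv3"]], "lib02_sce.rds"]
]

// Condition from the PR as first submitted
def by_feature_type = merged.findAll { it[0]["feature_type"] == "cellhash" }

// Suggested condition: matches nothing, because technology is the primary one
def by_technology = merged.findAll { it[0]["technology"] in cellhash_techs }

// Alternative using the nested feature meta
def by_feature_meta = merged.findAll { it[0]["feature_meta"]["technology"] in cellhash_techs }

println "feature_type: ${by_feature_type.collect { it[0].library_id }}"
println "technology:   ${by_technology.collect { it[0].library_id }}"
println "feature_meta: ${by_feature_meta.collect { it[0].library_id }}"
```

Under these assumptions, the nested lookup selects the same libraries as the `feature_type` check while still reusing the `cellhash_techs` list.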

jashapiro added 13 commits March 24, 2022 16:25

fdf3ecd  Add cellhash demux options
578a2c8  rename sce process file
512f47a  Add param for pool file
2a78e74  fix args
031d5c8  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results
3a3046c  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results
58ef88d  remove unexpected hto labels
ea848e1  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results
4bcdb82  add pool file to demux
d6f8c8f  not a val, a path
83800f4  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results
0e71071  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results
af98c48  Merge branch 'jashapiro/96-integrate-vireo' into jashapiro/33-cellhas… …h_demux_results

jashapiro mentioned this pull request Mar 30, 2022

Prepare for scpca-nf release v0.2.5 #141

Closed

12 tasks

Base automatically changed from jashapiro/96-integrate-vireo to development March 31, 2022 13:08

jashapiro added 10 commits March 31, 2022 09:16

d187869  Merge branch 'development' into jashapiro/33-cellhash_demux_results
9fd15e2  split cellhash and citeseq
a33dff1  Revert "split cellhash and citeseq" (This reverts commit 9fd15e2.)
cdd226b  separate citeseq and cellhash
08a4c50  separate genetic and cellhash demux processes
43f0084  fix cellhash branching
89dc49d  set cellhash_df null by default
4306f7b  Add default values for flags
4be1897  rename demux workflow with vireo
b887868  Merge remote-tracking branch 'origin/development' into jashapiro/33-c… …ellhash_demux_results

jashapiro marked this pull request as ready for review March 31, 2022 20:50

jashapiro requested a review from allyhawkins March 31, 2022 20:51

allyhawkins approved these changes Apr 1, 2022

561c4a5  add feature_technology field
9d963ef  use nested feature_meta object

jashapiro merged commit 1bb59c6 into development Apr 4, 2022

jashapiro deleted the jashapiro/33-cellhash_demux_results branch April 4, 2022 15:03

jashapiro mentioned this pull request Apr 8, 2022

Support for cellhashed data #33

Closed


Review thread comments, in order (all Apr 1, 2022): allyhawkins, jashapiro, jashapiro, allyhawkins, jashapiro
