-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option to skip spaceranger quantification #134
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! I actually kind of like that we have moved the rearranging and reformatting to the second process, as that makes the initial process cleaner and allows us to keep the "spaceranger format" for a bit longer, which may be beneficial down the line.
My only suggestions are quite minor about the naming of things (a difficult matter), so I don't think I need to review again.
// add sample names and spatial output directory to metadata | ||
.map{it.cr_samples = getCRsamples(it.files); | ||
it.spaceranger_publish_dir = "${params.outdir}/internal/spaceranger/${it.library_id}"; | ||
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}-spatial"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super minor, but if we are using _
elsewhere, we might keep it the same here?
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}-spatial"; | |
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}_spatial"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to match the format of the other folders that are present in the internal
folder that have -rna
appended to the folder. Whereas in the publish
directory all of our filenames use underscores instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, makes sense!
script: | ||
spatial_out = "${meta.library_id}" | ||
out_id = "${meta.run_id}-spatial" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to match my previous comment about -
vs _
consistency.
out_id = "${meta.run_id}-spatial" | |
out_id = "${meta.run_id}_spatial" |
spatial_channel = spatial_channel | ||
// add sample names and spatial output directory to metadata | ||
.map{it.cr_samples = getCRsamples(it.files); | ||
it.spaceranger_publish_dir = "${params.outdir}/internal/spaceranger/${it.library_id}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to get myself confused here, so I'm wondering if we want to change this to it. spaceranger_internal_dir
? I may be overthinking this, and whatever we do should be consistent with the other "skip" settings, so probably not worth changing.
modules/spaceranger.nf
Outdated
// generate metadata.json | ||
| spaceranger_metadata | ||
spaceranger_metadata(grouped_spaceranger_ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggesting here, though the main change is higher, that we should rename this process, as it is now not just creating the metadata. Perhaps spaceranger_publish
?
Co-authored-by: Joshua Shapiro <josh.shapiro@ccdatalab.org>
Here, I added in the option to skip the spaceranger quantification, if the spaceranger output folder already exists. This was pretty similar to the previous steps that we skipped where as long as the
--repeat_mapping
flag is set to false and the output directory exists, then it will be included in a separate branch on the spatial channel that will not go through thespaceranger
process.As part of this PR, I also addressed the reorganization of the output files, removing the
spaceranger_versions.json
andspaceranger_metrics_summary.csv
files from the final output. However, we need those files to create themetadata.json
file that we create in thespaceranger_metadata
process.So here, what I did was break out the file reorganization that was originally in the
spaceranger
process and put it in thespaceranger_metadata
process. This means that we now are going to store the outs folder from cellranger as is (with removal of the large bam files) in theinternal/spaceranger
folder, and then actually include the files we would like to include on the portal in thepublish
directory. I'm torn on this because we don't really need to have all of these smaller files, but it seemed like the clean way to do it. This way we have the original folder that remains untouched and if it exists we skip spaceranger. Then we output the final files in a separate output folder in the file structure we would like without disturbing the original files.Lastly, to finish addressing the reorganization, I appended the
_spatial
flag to the library folder that gets published as described in #82.I tested these changes and the spaceranger process is successfully skipped and the output files look as expected.
Closes #81.
Closes #82.