Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to skip spaceranger quantification #134

Merged
merged 7 commits into from
Mar 23, 2022

Conversation

allyhawkins
Copy link
Member

Here, I added in the option to skip the spaceranger quantification, if the spaceranger output folder already exists. This was pretty similar to the previous steps that we skipped where as long as the --repeat_mapping flag is set to false and the output directory exists, then it will be included in a separate branch on the spatial channel that will not go through the spaceranger process.

As part of this PR, I also addressed the reorganization of the output files, removing the spaceranger_versions.json and spaceranger_metrics_summary.csv files from the final output. However, we need those files to create the metadata.json file that we create in the spaceranger_metadata process.

So here, what I did was break out the file reorganization that was originally in the spaceranger process and put it in the spaceranger_metadata process. This means that we now are going to store the outs folder from cellranger as is (with removal of the large bam files) in the internal/spaceranger folder, and then actually include the files we would like to include on the portal in the publish directory. I'm torn on this because we don't really need to have all of these smaller files, but it seemed like the clean way to do it. This way we have the original folder that remains untouched and if it exists we skip spaceranger. Then we output the final files in a separate output folder in the file structure we would like without disturbing the original files.

Lastly, to finish addressing the reorganization, I appended the _spatial flag to the library folder that gets published as described in #82.

I tested these changes and the spaceranger process is successfully skipped and the output files look as expected.

Closes #81.
Closes #82.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I actually kind of like that we have moved the rearranging and reformatting to the second process, as that makes the initial process cleaner and allows us to keep the "spaceranger format" for a bit longer, which may be beneficial down the line.

My only suggestions are quite minor about the naming of things (a difficult matter), so I don't think I need to review again.

modules/spaceranger.nf Outdated Show resolved Hide resolved
// add sample names and spatial output directory to metadata
.map{it.cr_samples = getCRsamples(it.files);
it.spaceranger_publish_dir = "${params.outdir}/internal/spaceranger/${it.library_id}";
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}-spatial";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, but if we are using _ elsewhere, we might keep it the same here?

Suggested change
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}-spatial";
it.spaceranger_results_dir = "${it.spaceranger_publish_dir}/${it.run_id}_spatial";

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to match the format of the other folders that are present in the internal folder that have -rna appended to the folder. Whereas in the publish directory all of our filenames use underscores instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense!

script:
spatial_out = "${meta.library_id}"
out_id = "${meta.run_id}-spatial"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to match my previous comment about - vs _ consistency.

Suggested change
out_id = "${meta.run_id}-spatial"
out_id = "${meta.run_id}_spatial"

spatial_channel = spatial_channel
// add sample names and spatial output directory to metadata
.map{it.cr_samples = getCRsamples(it.files);
it.spaceranger_publish_dir = "${params.outdir}/internal/spaceranger/${it.library_id}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to get myself confused here, so I'm wondering if we want to change this to it. spaceranger_internal_dir? I may be overthinking this, and whatever we do should be consistent with the other "skip" settings, so probably not worth changing.

// generate metadata.json
| spaceranger_metadata
spaceranger_metadata(grouped_spaceranger_ch)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting here, though the main change is higher, that we should rename this process, as it is now not just creating the metadata. Perhaps spaceranger_publish?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants