Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

Closed
wants to merge 2 commits into from

Conversation

cimendes
Copy link
Member

As explained in the issue #195, the compilation of the momps and mlst results into a single file causes for previous results to be overwritten when a pipeline is resumed. To avoid this situation, now the results are saved in a file per sample. After pipeline completion it's easy enough to obtain a single file with all the results for these components by concatenating all files together.

@codecov-io
Copy link

codecov-io commented Feb 18, 2019

Codecov Report

Merging #196 into dev will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##              dev     #196   +/-   ##
=======================================
  Coverage   43.94%   43.94%           
=======================================
  Files          67       67           
  Lines        6039     6039           
=======================================
  Hits         2654     2654           
  Misses       3385     3385

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e89b0f...82eef75. Read the comment docs.

Copy link
Collaborator

@bfrgoncalves bfrgoncalves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR changes the output files structure of mlst and momps reports by dividing the results into several independent files which is not ideal when running several samples at a time. However, this is required to solve the current issue of --resume overwriting previous results of the same pipeline by the creation of new symlinks.
I approve but a different approach should be made in the future to maintain the output in only one file.

@cimendes
Copy link
Member Author

I don't have an issue with having one result per file, yes it can get a bit cluttered but so far we have everything organized into separate directories, and I think it's a good trade-off to guarantee that the previous results aren't overwritten when a pipeline is resumed. The structure of the files is maintained throughout so it all can be concatenated very easily by the user, if need be. Otherwise, the reports provide a table that can be downloaded (this PR has no interference with how reports are generated).

Copy link
Collaborator

@ODiogoSilva ODiogoSilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure of the issue that led to this PR. When you resume a nextflow pipeline with the same input data, the compilation processes should receive both the old data and the new data from the resume, which means that the end result will be the same. This should only be an issue when we resume AND change the input data somehow?

In any case, I don't think it makes sense to modify the compilation processes. Either these are removed, and the publishing of outptut files is moved to the main processes, or they are kept and we also output the individual result files to a related publishedDirectory. In the last option, you would get the best of both worlds, I think.

@@ -47,17 +47,17 @@ process mlst_{{ pid }} {

process compile_mlst_{{ pid }} {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the results are being produced for each sample, then the whole compile_mlst process is not needed anymore. You could just remove this process and add the publishDir of the appropriate output in the mlst process correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If for some reason, this process in kept, then the name does not make sense anymore, as it is not compiling stuff.

@@ -75,17 +76,18 @@ process momps_report_{{ pid }} {
publishDir "results/typing/momps_{{ pid }}/", pattern: "*.tsv"

input:
file(st_file) from momps_st_{{ pid }}.collect()
file(profile_file) from momps_profile_{{ pid }}.collect()
val sample_id from momps_sample_id_{{ pid }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same login applies here. If output is not being compile, then it makes more sense to just create these files on the main process and move the publishDir there.

@cimendes
Copy link
Member Author

Closing as it's solving an non-issue, rightfully pointed out by @ODiogoSilva

@cimendes cimendes closed this Sep 15, 2019
@cimendes cimendes deleted the remove_component_compile_reports branch September 16, 2019 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants