momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

cimendes · 2019-02-18T17:38:47Z

As explained in the issue #195, the compilation of the momps and mlst results into a single file causes for previous results to be overwritten when a pipeline is resumed. To avoid this situation, now the results are saved in a file per sample. After pipeline completion it's easy enough to obtain a single file with all the results for these components by concatenating all files together.

overwriting the results from previous runs

codecov-io · 2019-02-18T17:42:16Z

Codecov Report

Merging #196 into dev will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev     #196   +/-   ##
=======================================
  Coverage   43.94%   43.94%           
=======================================
  Files          67       67           
  Lines        6039     6039           
=======================================
  Hits         2654     2654           
  Misses       3385     3385

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e89b0f...82eef75. Read the comment docs.

bfrgoncalves

This PR changes the output files structure of mlst and momps reports by dividing the results into several independent files which is not ideal when running several samples at a time. However, this is required to solve the current issue of --resume overwriting previous results of the same pipeline by the creation of new symlinks.
I approve but a different approach should be made in the future to maintain the output in only one file.

cimendes · 2019-02-19T13:42:42Z

I don't have an issue with having one result per file, yes it can get a bit cluttered but so far we have everything organized into separate directories, and I think it's a good trade-off to guarantee that the previous results aren't overwritten when a pipeline is resumed. The structure of the files is maintained throughout so it all can be concatenated very easily by the user, if need be. Otherwise, the reports provide a table that can be downloaded (this PR has no interference with how reports are generated).

ODiogoSilva

I'm not 100% sure of the issue that led to this PR. When you resume a nextflow pipeline with the same input data, the compilation processes should receive both the old data and the new data from the resume, which means that the end result will be the same. This should only be an issue when we resume AND change the input data somehow?

In any case, I don't think it makes sense to modify the compilation processes. Either these are removed, and the publishing of outptut files is moved to the main processes, or they are kept and we also output the individual result files to a related publishedDirectory. In the last option, you would get the best of both worlds, I think.

ODiogoSilva · 2019-02-24T15:36:55Z

flowcraft/generator/templates/mlst.nf

@@ -47,17 +47,17 @@ process mlst_{{ pid }} {

 process compile_mlst_{{ pid }} {


If the results are being produced for each sample, then the whole compile_mlst process is not needed anymore. You could just remove this process and add the publishDir of the appropriate output in the mlst process correct?

If for some reason, this process in kept, then the name does not make sense anymore, as it is not compiling stuff.

ODiogoSilva · 2019-02-24T15:37:48Z

flowcraft/generator/templates/momps.nf

@@ -75,17 +76,18 @@ process momps_report_{{ pid }} {
    publishDir "results/typing/momps_{{ pid }}/", pattern: "*.tsv"

    input:
-    file(st_file) from momps_st_{{ pid }}.collect()
-    file(profile_file) from momps_profile_{{ pid }}.collect()
+    val sample_id from momps_sample_id_{{ pid }}


The same login applies here. If output is not being compile, then it makes more sense to just create these files on the main process and move the publishDir there.

cimendes · 2019-09-15T09:17:22Z

Closing as it's solving an non-issue, rightfully pointed out by @ODiogoSilva

cimendes added 2 commits February 18, 2019 17:33

momps and mlst processes now compile results per sample to avoid

50be919

overwriting the results from previous runs

update changelog

82eef75

cimendes requested a review from bfrgoncalves February 19, 2019 13:33

bfrgoncalves approved these changes Feb 19, 2019

View reviewed changes

ODiogoSilva requested changes Feb 24, 2019

View reviewed changes

cimendes closed this Sep 15, 2019

cimendes deleted the remove_component_compile_reports branch September 16, 2019 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

cimendes commented Feb 18, 2019

codecov-io commented Feb 18, 2019 •

edited

Loading

bfrgoncalves left a comment

cimendes commented Feb 19, 2019

ODiogoSilva left a comment

ODiogoSilva Feb 24, 2019

ODiogoSilva Feb 24, 2019

ODiogoSilva Feb 24, 2019

cimendes commented Sep 15, 2019

		@@ -47,17 +47,17 @@ process mlst_{{ pid }} {

		process compile_mlst_{{ pid }} {

momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

momps and mlst: remove collect from report to avoid overwriting previous results when a pipeline is resumed #196

Conversation

cimendes commented Feb 18, 2019

codecov-io commented Feb 18, 2019 • edited Loading

Codecov Report

bfrgoncalves left a comment

Choose a reason for hiding this comment

cimendes commented Feb 19, 2019

ODiogoSilva left a comment

Choose a reason for hiding this comment

ODiogoSilva Feb 24, 2019

Choose a reason for hiding this comment

ODiogoSilva Feb 24, 2019

Choose a reason for hiding this comment

ODiogoSilva Feb 24, 2019

Choose a reason for hiding this comment

cimendes commented Sep 15, 2019

codecov-io commented Feb 18, 2019 •

edited

Loading