Add quast #147

Merged
merged 1 commit into from
Oct 13, 2018
35 changes: 35 additions & 0 deletions flowcraft/generator/components/assembly_processing.py
@@ -229,3 +229,38 @@ def __init__(self, **kwargs):
                "version": "1.22.0-1"
            }
        }

class Quast(Process):
    """Assess assembly quality using QUAST

    This process is set with:

        - ``input_type``: assembly
        - ``output_type``: tsv
        - ``ptype``: post_assembly

    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        self.input_type = "fasta"
        self.output_type = "tsv"

        self.params = {
            "reference": {
                "default": "null",
                "description": "Compare the assembly to this reference genome"
            },
            "genomeSizeBp": {
                "default": "null",
                "description": "Expected genome size (bp)"
            },
        }

        self.directives = {
            "quast": {
                "container": "quay.io/biocontainers/quast",
                "version": "5.0.0--py27pl526ha92aebf_1"
            }
        }
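The `directives` entry pairs a container repository with a version tag. As a hedged sketch, assuming the generator joins the two with a colon the way Docker image references are written (`image_reference` is a hypothetical helper, not FlowCraft's API), the image this component would pull looks like this:

```python
def image_reference(directives, process):
    """Join a process's container and version into a Docker-style image reference.

    Hypothetical helper: assumes repository and tag are joined with ":",
    the standard Docker image-reference syntax.
    """
    entry = directives[process]
    return "{}:{}".format(entry["container"], entry["version"])


# Directives copied from the Quast component above.
QUAST_DIRECTIVES = {
    "quast": {
        "container": "quay.io/biocontainers/quast",
        "version": "5.0.0--py27pl526ha92aebf_1",
    }
}

print(image_reference(QUAST_DIRECTIVES, "quast"))
# quay.io/biocontainers/quast:5.0.0--py27pl526ha92aebf_1
```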
1 change: 1 addition & 0 deletions flowcraft/generator/engine.py
@@ -90,6 +90,7 @@
"process_spades": ap.ProcessSpades,
"progressive_mauve":alignment.ProgressiveMauve,
#"prokka": annotation.Prokka,
"quast": ap.Quast,
"raxml": phylogeny.Raxml,
"reads_download": downloads.DownloadReads,
"remove_host": meta.RemoveHost,
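The engine hunk registers the new component under the name `quast` in a name-to-class map. A minimal sketch of that registry pattern follows, using hypothetical stand-in classes rather than FlowCraft's real `Process` and `ap.Quast`, and a hypothetical `build_process` helper; how FlowCraft actually consumes the map is not shown in this diff.

```python
class Process:
    """Hypothetical stand-in for FlowCraft's Process base class."""

    def __init__(self, **kwargs):
        self.options = kwargs


class Quast(Process):
    """Hypothetical stand-in for ap.Quast."""


# Hypothetical excerpt: component name -> component class, like the "quast" entry above.
PROCESS_MAP = {
    "quast": Quast,
}


def build_process(name, **kwargs):
    """Instantiate the component registered under `name`."""
    try:
        process_class = PROCESS_MAP[name]
    except KeyError:
        raise ValueError("Unknown process: {}".format(name)) from None
    return process_class(**kwargs)


print(type(build_process("quast")).__name__)  # Quast
```

In this sketch, a component class without a map entry still exists but cannot be looked up by name, which is why the one-line registration accompanies the new class.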
48 changes: 48 additions & 0 deletions flowcraft/generator/templates/quast.nf
@@ -0,0 +1,48 @@
if (params.reference{{param_id}} == null && params.genomeSizeBp{{param_id}} == null)
    exit 1, "Specify at least one of reference or genomeSizeBp"
if (params.reference{{param_id}} != null && params.genomeSizeBp{{param_id}} != null)
    exit 1, "Specify only one of reference or genomeSizeBp"

if (params.reference{{param_id}} != null) {
    process quast_{{pid}} {
        {% include "post.txt" ignore missing %}

        tag { sample_id }
        publishDir "results/assembly/quast_{{pid}}/$sample_id", pattern: "*.tsv"
        publishDir "reports/assembly/quast_{{pid}}/$sample_id"

        input:
        set sample_id, file(assembly) from {{input_channel}}
        file reference from Channel.value(file(params.reference{{param_id}}))

        output:
        file "*"
        {% with task_name="quast" %}
        {%- include "compiler_channels.txt" ignore missing -%}
        {% endwith %}

        script:
        "/usr/bin/time -v quast -o . -r $reference -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1"
Collaborator

There's no trouble in using time to check the run time of the software, but that information (and more) will already be available in the pipeline_stats.txt file generated during the execution of the pipeline 😄

Contributor Author

That's excellent!

This is odd though. /usr/bin/time -v reports

Maximum resident set size (kbytes): 408240

whereas pipeline_stats.txt reports

rss = 58.9 MB
vmem = 476.4 MB

So I'd like to leave /usr/bin/time -v in until I sort out this discrepancy.
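Because the script appends the `/usr/bin/time -v` report to `.command.log`, the peak figure can be pulled back out after the task finishes. A minimal sketch, where `max_rss_kib` is a hypothetical helper that treats time's "kbytes" as 1024-byte units (how Linux reports `ru_maxrss`). The conversion also suggests units are not the cause of the gap: whether "kbytes" means 1000 or 1024 bytes, 408240 is roughly 400 MB, about seven times the 58.9 MB in pipeline_stats.txt.

```python
import re

# GNU time -v prints the peak resident set size in kilobytes.
MAXRSS_RE = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def max_rss_kib(time_log):
    """Return the peak RSS in KiB parsed from `/usr/bin/time -v` output."""
    match = MAXRSS_RE.search(time_log)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1))


log = "\tMaximum resident set size (kbytes): 408240\n"
kib = max_rss_kib(log)
print("{} KiB = {:.1f} MiB".format(kib, kib / 1024))
# 408240 KiB = 398.7 MiB
```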

Collaborator

No problem. Though that's interesting. It could be that the rss reported by Nextflow takes into account any possible overhead of a Docker/Singularity execution, while the time one is purely from QUAST. Or they simply measure rss differently. Either way, it would be interesting to find out why!

Contributor Author

@sjackman Oct 13, 2018

Ah yeah, the Docker thing makes sense for the difference between 408M and 476M. I'm more confused though by /usr/bin/time reporting maxrss=408M whereas pipeline_stats.txt reports rss=58.9M. That's a very big difference!

Collaborator

Oh, I misread. I thought it was 40 MB! That is very weird indeed. It almost seems that time's rss is reporting the virtual memory measure. Does time also provide something like vmem?

Contributor Author

That may be. It doesn't, oddly, which is why I usually use zsh -c 'time foobar' rather than /usr/bin/time for this purpose.

Contributor Author

I don't think zsh is in that quast Docker image though.
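One possible explanation for the 408 MB vs 58.9 MB gap, offered as an assumption and not confirmed in this thread: GNU time reports the child's peak resident set size (the `ru_maxrss` field of its resource usage), whereas Nextflow's trace records `rss` separately from `peak_rss`. If pipeline_stats.txt is that trace and shows `rss`, a short-lived peak may not appear in it. The same peak figure is also available from Python's standard library on the host, without zsh. A minimal sketch, where `peak_child_rss_kib` is a hypothetical helper:

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run `cmd` and return the peak RSS, in KiB, of the largest terminated child.

    resource.getrusage(RUSAGE_CHILDREN) covers every child this process has
    waited for, so in a long-lived process the value may come from an earlier child.
    """
    subprocess.run(cmd, check=True)
    maxrss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        maxrss //= 1024  # macOS reports ru_maxrss in bytes; Linux uses KiB
    return maxrss


if __name__ == "__main__":
    # Example child: build and touch a ~50 MiB bytes object, then exit.
    demo = [sys.executable, "-c", "x = b'x' * (50 * 1024 * 1024)"]
    print("peak child RSS: {} KiB".format(peak_child_rss_kib(demo)))
```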

    }
} else if (params.genomeSizeBp{{param_id}} != null) {
    process quast_{{pid}} {
        {% include "post.txt" ignore missing %}

        tag { sample_id }
        publishDir "results/assembly/quast_{{pid}}/$sample_id", pattern: "*.tsv"
        publishDir "reports/assembly/quast_{{pid}}/$sample_id"

        input:
        set sample_id, file(assembly) from {{input_channel}}
        val genomeSizeBp from Channel.value(params.genomeSizeBp{{param_id}})

        output:
        file "*"
        {% with task_name="quast" %}
        {%- include "compiler_channels.txt" ignore missing -%}
        {% endwith %}

        script:
        "/usr/bin/time -v quast -o . --est-ref-size=$genomeSizeBp -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1"
    }
}
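The template requires exactly one of `reference` and `genomeSizeBp`, then chooses between QUAST's `-r` and `--est-ref-size`. A minimal Python sketch of the same decision, useful for reasoning about the two branches. `quast_args` is a hypothetical helper; the flags are the ones the template uses, with the `/usr/bin/time -v` wrapper and log redirection left out.

```python
def quast_args(assembly, sample_id, cpus, reference=None, genome_size_bp=None):
    """Build the QUAST argument list chosen by the template's two branches.

    Mirrors the template's guards: exactly one of `reference` or
    `genome_size_bp` must be given.
    """
    if reference is None and genome_size_bp is None:
        raise ValueError("Specify at least one of reference or genomeSizeBp")
    if reference is not None and genome_size_bp is not None:
        raise ValueError("Specify only one of reference or genomeSizeBp")

    args = ["quast", "-o", "."]
    if reference is not None:
        args += ["-r", str(reference)]
    else:
        args.append("--est-ref-size={}".format(genome_size_bp))
    # -s (split scaffolds) takes no value; the assembly is a positional argument.
    # -l labels the report with the sample id; -t sets the thread count.
    args += ["-s", str(assembly), "-l", sample_id, "-t", str(cpus)]
    return args


print(quast_args("sample1.fasta", "sample1", 4, genome_size_bp=5000000))
# ['quast', '-o', '.', '--est-ref-size=5000000', '-s', 'sample1.fasta', '-l', 'sample1', '-t', '4']
```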