-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add quast #147
Add quast #147
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #147 +/- ##
==========================================
+ Coverage 38.97% 39.05% +0.07%
==========================================
Files 59 59
Lines 5647 5654 +7
==========================================
+ Hits 2201 2208 +7
Misses 3446 3446
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We finally have quast! 😃 Thanks @sjackman for the great addition. I've just left a couple of comments, the first being the more relevant. Other than that, it's all good.
"default": "null", | ||
"description": "Compare the assembly to this reference genome" | ||
}, | ||
"genomeSize": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the genomeSize
to be in "BP"? This parameter already occurs in several other components in "MB" units, so for consistency it would be great for this one to be to. Moreover, when pipelines are built with the --merge-params
option, parameters with the same name for different components are merged into a single one and this could cause problems. If you really need the units to be in BP, I suggest changing the name of the parameter (to genomeSizeBp
or any other you can think of) to avoid these problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use a base SI unit without an implied prefix for the units of parameters. The reason for this is that when I see --genomeSize=3e9
it's clear what's going on. When I see --genomeSize=3000
I don't know if the units are in bp, kbp, Mbp, or Gbp without reading the documentation. Additionally, when specifying the size of small genomes, such as the human mitochondrial genome, I found --genomeSize=0.016569
to be rather awkward and confusing compared to --genomeSize=16569
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency though and to prevent trouble with --merge-params
I'll change the parameter name to genomeSizeBp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, though those are valid point you make. For this PR it's easier and more contained if we change the parameter name, but in the near future it'll be a good idea to standardize the genomeSize
parameter like you proposed.
{% endwith %} | ||
|
||
script: | ||
"/usr/bin/time -v quast -o . -r $reference -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no trouble in using time
to check the run time of the software, but that information (and more) will already be available on the pipeline_stats.txt
file generated during the execution of the pipeline 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's excellent!
This is odd though. /usr/bin/time -v
reports
Maximum resident set size (kbytes): 408240
whereas pipeline_stats.txt
reports
rss = 58.9 MB
vmem = 476.4 MB
So I'd like to leave /usr/bin/time -v
in until I sort out this discrepancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Though that's interesting. It could be that the rss
reported in by nextflow takes into account any possible overhead of a docker/singularity execution, while the time
one is purely from quast. Or they simply measure rss
differently. Either way I would be interesting to find out why!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, the Docker thing makes sense for the difference between 408M and 476M. I'm more confused though by /usr/bin/time
reporting maxrss=408M
whereas pipeline_stats.txt
reports rss=58.9M
. That's a very big difference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I misread. I though it was 40Mb
! That is very weird indeed. It almost seems that time
's rss
is reporting the virtual memory measure. Does time
also provide something like vmem
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be. I doesn't oddly, which is why I usually use zsh -c 'time foobar'
rather than /usr/bin/time
for this purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think zsh
is in that quast
Docker image though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good to me and welcome to the team! 😄 (also thumbs on the commit squash)
Thanks! Flowcraft is pretty cool! 😁 |
No description provided.