Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add quast #147

Merged
merged 1 commit into from
Oct 13, 2018
Merged

Add quast #147

merged 1 commit into from
Oct 13, 2018

Conversation

sjackman
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Oct 13, 2018

Codecov Report

Merging #147 into dev will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #147      +/-   ##
==========================================
+ Coverage   38.97%   39.05%   +0.07%     
==========================================
  Files          59       59              
  Lines        5647     5654       +7     
==========================================
+ Hits         2201     2208       +7     
  Misses       3446     3446
Impacted Files Coverage Δ
flowcraft/generator/engine.py 91.33% <ø> (ø) ⬆️
...wcraft/generator/components/assembly_processing.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6034a2e...6286b5d. Read the comment docs.

Copy link
Collaborator

@ODiogoSilva ODiogoSilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We finally have quast! 😃 Thanks @sjackman for the great addition. I've just left a couple of comments, the first being the more relevant. Other than that, it's all good.

"default": "null",
"description": "Compare the assembly to this reference genome"
},
"genomeSize": {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the genomeSize to be in "BP"? This parameter already occurs in several other components in "MB" units, so for consistency it would be great for this one to be to. Moreover, when pipelines are built with the --merge-params option, parameters with the same name for different components are merged into a single one and this could cause problems. If you really need the units to be in BP, I suggest changing the name of the parameter (to genomeSizeBp or any other you can think of) to avoid these problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use a base SI unit without an implied prefix for the units of parameters. The reason for this is that when I see --genomeSize=3e9 it's clear what's going on. When I see --genomeSize=3000 I don't know if the units are in bp, kbp, Mbp, or Gbp without reading the documentation. Additionally, when specifying the size of small genomes, such as the human mitochondrial genome, I found --genomeSize=0.016569 to be rather awkward and confusing compared to --genomeSize=16569.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency though and to prevent trouble with --merge-params I'll change the parameter name to genomeSizeBp.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, though those are valid point you make. For this PR it's easier and more contained if we change the parameter name, but in the near future it'll be a good idea to standardize the genomeSize parameter like you proposed.

{% endwith %}

script:
"/usr/bin/time -v quast -o . -r $reference -s $assembly -l $sample_id -t $task.cpus >> .command.log 2>&1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no trouble in using time to check the run time of the software, but that information (and more) will already be available on the pipeline_stats.txt file generated during the execution of the pipeline 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's excellent!

This is odd though. /usr/bin/time -v reports

Maximum resident set size (kbytes): 408240

whereas pipeline_stats.txt reports

rss = 58.9 MB
vmem = 476.4 MB

So I'd like to leave /usr/bin/time -v in until I sort out this discrepancy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Though that's interesting. It could be that the rss reported in by nextflow takes into account any possible overhead of a docker/singularity execution, while the time one is purely from quast. Or they simply measure rss differently. Either way I would be interesting to find out why!

Copy link
Contributor Author

@sjackman sjackman Oct 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, the Docker thing makes sense for the difference between 408M and 476M. I'm more confused though by /usr/bin/time reporting maxrss=408M whereas pipeline_stats.txt reports rss=58.9M. That's a very big difference!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread. I though it was 40Mb! That is very weird indeed. It almost seems that time's rss is reporting the virtual memory measure. Does time also provide something like vmem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That may be. I doesn't oddly, which is why I usually use zsh -c 'time foobar' rather than /usr/bin/time for this purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think zsh is in that quast Docker image though.

Copy link
Collaborator

@ODiogoSilva ODiogoSilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good to me and welcome to the team! 😄 (also thumbs on the commit squash)

@sjackman
Copy link
Contributor Author

Thanks! Flowcraft is pretty cool! 😁

@ODiogoSilva ODiogoSilva merged commit 30f5a45 into assemblerflow:dev Oct 13, 2018
@sjackman sjackman deleted the quast branch October 14, 2018 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants