Add nextclade tool (nextstrain/nextclade) #101
Hi Peter @pvanheus, very cool, thanks! I'd be glad to help resolve the remaining questions.
Could you please clarify the "File type must already exist" point? Is this something lacking in our tool or in CWL?
Could we provide (hardcode?) all 3 output flags all the time? Even if users don't need some of the files, there is almost no overhead.
Unless there's a bug, Nextclade should accept all kinds of paths: filenames, relative paths, absolute paths. I think even /dev/stdout might work. If not, and if it's useful, I'd be happy to adjust. If there's something else that would help integration, please let me know, and don't hesitate to @ me any time. P.S. We just released 0.4.3 with the new
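If /dev/stdout does work, the wrapper could capture the report through CWL's stdout mechanism instead of globbing for a file. A minimal sketch, assuming CWL v1.0 and assuming the nextclade CLI flags are --input-fasta and --output-json (check nextclade --help for the exact names in 0.4.x):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: nextclade

inputs:
  input_fasta:
    type: File
    inputBinding:
      prefix: --input-fasta   # assumed flag name

arguments:
  # assumes nextclade accepts /dev/stdout as an output path (untested)
  - prefix: --output-json     # assumed flag name
    valueFrom: /dev/stdout

# the runner redirects the tool's standard output into this file
stdout: nextclade.json

outputs:
  output_json:
    type: stdout
```

This only works if nextclade writes nothing else (progress messages, warnings) to standard output; otherwise the captured JSON would be corrupted, so writing to a named file is the safer default.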
@ivan-aksamentov My queries with regard to outputs are, I think, simple misunderstandings of CWL and cwltool (the "reference" CWL executor) on my side. In CWL, outputs are taken from the working directory of the running job, thus the … I have updated my TODO comment accordingly.
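Because outputs are collected from the job's working directory, the output path can be passed in as a plain string rather than a File (which would have to exist beforehand), and then picked up again with a glob. A minimal sketch, assuming CWL v1.0; the flag names --input-fasta and --output-json and the input id output_json_filename are assumptions for illustration:

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: nextclade

inputs:
  input_fasta:
    type: File
    inputBinding:
      prefix: --input-fasta      # assumed flag name
  output_json_filename:
    type: string                 # a name, not a File: nothing has to exist yet
    default: nextclade.json
    inputBinding:
      prefix: --output-json      # assumed flag name

outputs:
  output_json:
    type: File
    outputBinding:
      # collected from the job's working directory after the run
      glob: $(inputs.output_json_filename)
```

A relative filename is enough here: nextclade writes into the working directory and the runner collects whatever matches the glob.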
@pvanheus Okay. I have very little understanding of how CWL works, but if anything needs to be done on the Nextclade side for simpler integration, just let me know. Could you please also add the new … Regarding …
Could we add a link to https://clades.nextstrain.org as credit? This way we could add the actual credits and references right on the app's main page when the time comes.
Hi @ivan-aksamentov, the latest version of the PR has the …
This pull request has been mentioned on Common Workflow Language Discourse. There might be relevant details there:
I switched to using a record for the group of output options, with each output option itself optional; this forces at least one of the output options to be chosen.
nextclade/nextclade.cwl (outdated)
type: record
name: output_options_record
fields:
  output_json_filename:
Do all of these filenames need to be specified by the user? Why not take the name root of the FASTA and add the relevant file extension?
The underlying tool has options for different output names for each output file. While I am not sure if that will be used in practice, I have followed the underlying tool in the CWL here.
Nextclade dev here. Let us know, folks, what would be the most convenient set of flags. We are flexible here.
P.S. ping me with @ if I don't reply
For "user" I meant the user of this CWL description. For nextclade itself @ivan-aksamentov you did the correct thing to ask for the filename/path.
Is there a cost to output all these types?
@pvanheus If not, then I recommend always requesting all output types via the arguments section, building off of the nameroot of the FASTA file.
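Following this suggestion, the output flags can be hardcoded in arguments and the filenames derived from the FASTA nameroot, so the user supplies only the input. A sketch assuming CWL v1.0 and the flag names --input-fasta, --output-json, --output-csv and --output-tsv (assumed, not checked against nextclade 0.4.2):

```yaml
cwlVersion: v1.0
class: CommandLineTool
baseCommand: nextclade

inputs:
  input_fasta:
    type: File
    inputBinding:
      prefix: --input-fasta          # assumed flag name

arguments:
  # always request every output; names derive from the FASTA nameroot,
  # e.g. sequences.fasta -> sequences.json / .csv / .tsv
  - prefix: --output-json            # assumed flag name
    valueFrom: $(inputs.input_fasta.nameroot).json
  - prefix: --output-csv             # assumed flag name
    valueFrom: $(inputs.input_fasta.nameroot).csv
  - prefix: --output-tsv             # assumed flag name
    valueFrom: $(inputs.input_fasta.nameroot).tsv

outputs:
  output_json:
    type: File
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).json
  output_csv:
    type: File
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).csv
  output_tsv:
    type: File
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).tsv
```

Parameter references such as $(inputs.input_fasta.nameroot) are interpolated by the runner without InlineJavascriptRequirement. Since the outputs are all computed anyway and are small, requesting all of them should cost little.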
Good point: from a workflow point of view, naming of files is not relevant. @ivan-aksamentov If I understand correctly, the outputs are typically small and there is no significant cost to asking for all of them?
They do not. The Dockerfiles for the tool are in: https://github.com/nextstrain/nextclade/tree/master/packages/cli/docker. If I understand the BioContainers procedure (https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html), they accept contributions via pull request, but that is up to upstream folks like @ivan-aksamentov.
nextclade/nextclade.cwl (outdated)
class: CommandLineTool

doc: Assign Nextstrain clades to SARS-CoV-2 sequences and provide QC information
id: nextclade
Suggested change (remove this line):
id: nextclade
No need for id here.
nextclade/tests/nextclade_t1.yml (outdated)
@@ -0,0 +1,13 @@
input_fasta:
Suggested change (replace input_fasta with sequences):
sequences:
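With the suggested rename, the key in the test job file has to match the input id in nextclade.cwl. A sketch of the job file, assuming the tool input is renamed to sequences; the FASTA filename is hypothetical because the original test contents are not shown here:

```yaml
# nextclade/tests/nextclade_t1.yml (sketch): the key must match the
# input id declared in nextclade.cwl, e.g.
#   inputs:
#     sequences:
#       type: File
sequences:
  class: File
  path: sequences.fasta   # hypothetical test file name
```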
Never heard of biocontainers. Is that a rebranded Docker or something? I see they require some
No.
Correct. Outputs are all computed regardless, and are all small CSV or JSON files. The difference between JSON, CSV and TSV is only the format; they contain the same data. Tree JSON might be on the larger side, but that depends on what tree the user has provided as input. Future: people have asked us to also output aligned sequences, proteins, a Newick tree, etc. These might be added later and might be big. More future: we are rewriting Nextclade in C++.
BioContainers is a community-maintained Docker/Singularity container registry and build system built on quay.io, GitHub, (Bio)Conda and Debian. The quickest route is to get nextclade packaged in the Bioconda Conda channel.
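Once nextclade is in Bioconda, the wrapper could declare the package alongside the pinned container, so runners that support dependency resolution can use either. A fragment for nextclade.cwl, assuming CWL v1.0, a Bioconda package named nextclade, and a nextstrain/nextclade:0.4.2 image tag (all three are assumptions):

```yaml
hints:
  # package name and version as they might appear in Bioconda (assumed)
  SoftwareRequirement:
    packages:
      - package: nextclade
        version: [ "0.4.2" ]
  # pinned container image (tag assumed)
  DockerRequirement:
    dockerPull: nextstrain/nextclade:0.4.2
```

Using hints rather than requirements lets a runner that cannot satisfy one of them carry on instead of failing outright.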
nextclade is a command line version of the Nextstrain clade identification and QC service (https://clades.nextstrain.org/).
The tool takes FASTA input and generates a report in JSON / CSV / TSV format. As noted in the TODO, I don't know how to pass an output path in (since a File input must already exist as a file on disk), and I also don't know how to specify that at least one of the output options is required.
This tool pulls from nextclade 0.4.2. I thought it was safer to put a version number in than to leave it out.
I added myself in the dct:creator field because I am the creator of the wrapper (following the Dockstore specification for doing such things). If there is a way to credit the upstream project, I would like to do so.
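One way to credit the upstream project alongside the wrapper author is to add schema.org properties next to dct:creator. A sketch of the top-level metadata, where the ORCID, name and e-mail are placeholders and the use of s:codeRepository and s:url for upstream credit is an assumption rather than a Dockstore requirement:

```yaml
$namespaces:
  dct: http://purl.org/dc/terms/
  foaf: http://xmlns.com/foaf/0.1/
  s: https://schema.org/

# wrapper author (placeholder values)
dct:creator:
  "@id": "https://orcid.org/0000-0000-0000-0000"
  foaf:name: Wrapper Author
  foaf:mbox: "mailto:author@example.org"

# upstream project credit (assumed use of schema.org properties)
s:codeRepository: https://github.com/nextstrain/nextclade
s:url: https://clades.nextstrain.org
```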