Add nextclade tool (nextstrain/nextclade) #101

Merged
merged 21 commits into from Dec 28, 2020
129 changes: 129 additions & 0 deletions nextclade/nextclade.cwl
@@ -0,0 +1,129 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool

doc: Assign Nextstrain clades to SARS-CoV-2 sequences and provide QC information
id: nextclade

Review comment (Member), suggested change:
- id: nextclade

No need for id here.

label: Nextclade

dct:creator:
  "@id": "https://orcid.org/0000-0001-6553-5274"
  foaf:name: Peter van Heusden
  foaf:mbox: "mailto:pvh@sanbi.ac.za"

requirements:
  DockerRequirement:
    dockerPull: neherlab/nextclade:0.8.1-alpine

hints:
  ResourceRequirement:
    coresMin: 1
    ramMin: 512  # 512 MiB (CWL ramMin is given in mebibytes)

inputs:
  input_fasta:
    type: File
    doc: .fasta or .txt file with input sequences
    format:
      - edam:format_1929  # FASTA
      - edam:format_1964  # plain text format
    inputBinding:
      prefix: --input-fasta
  input_qc_config:
    type: File?
    doc: QC config JSON file containing custom QC configuration
    format: edam:format_3464  # JSON
    inputBinding:
      prefix: --input-qc-config
  input_root_seq:
    type: File?
    doc: plain text file containing custom root sequence
    format: edam:format_1964
    inputBinding:
      prefix: --input-root-seq
  input-tree:
    type: File?
    doc: Auspice JSON v2 file containing custom reference tree
    format: edam:format_3464
    inputBinding:
      prefix: --input-tree
  input-gene-map:
    type: File?
    doc: 'JSON file containing custom gene map. Gene map (sometimes also called "gene annotations") is used to resolve amino acid changes in genes.'
    format: edam:format_3464
    inputBinding:
      prefix: --input-gene-map
  input-pcr-primers:
    type: File?
    doc: CSV file containing a list of custom PCR primer sites. These are used to report mutations in these sites.
    format: edam:format_3572
    inputBinding:
      prefix: --input-pcr-primers
  output_options:
    type:
      type: record
      name: output_options_record
      fields:
        output_json_filename:

Review thread:

mr-c (Member), Dec 21, 2020:
Do all of these filenames need to be specified by the user? Why not take the name root of the FASTA file and add the relevant file extension?

Contributor Author:
The underlying tool has options for different output names for each output file. While I am not sure whether that will be used in practice, I have followed the underlying tool in the CWL here.

Comment:
Nextclade dev here. Let us know, folks, what would be the most convenient set of flags. We are flexible here.
P.S. Ping me with @ if I don't reply.

Member:
By "user" I meant the user of this CWL description. For nextclade itself, @ivan-aksamentov, you did the correct thing by asking for the filename/path.

Is there a cost to outputting all these types?

@pvanheus If not, then I recommend always requesting all output types via the arguments section, building the names off of the nameroot of the FASTA file.

Contributor Author:
Good point: from a workflow point of view, the naming of files is not relevant. @ivan-aksamentov, if I understand correctly, the outputs are typically small and there is no significant cost to asking for all of them?

          type: string?
          doc: Filename of output JSON results file
          inputBinding:
            prefix: --output-json
        output_csv_filename:
          type: string?
          doc: Filename of output CSV results file
          inputBinding:
            prefix: --output-csv
        output_tsv_filename:
          type: string?
          doc: Filename of output TSV results file
          inputBinding:
            prefix: --output-tsv
        output_tsv_clades_only_filename:
          type: string?
          doc: Filename of output TSV clades-only file
          inputBinding:
            prefix: --output-tsv-clades-only
        output_tree_filename:
          type: string?
          doc: Filename of output Auspice v2 tree file
          inputBinding:
            prefix: --output-tree

outputs:
  output_json:
    type: File?
    format: edam:format_3464
    outputBinding:
      glob: $(inputs.output_options.output_json_filename)
  output_csv:
    type: File?
    format: edam:format_3572  # Comma-separated values
    outputBinding:
      glob: $(inputs.output_options.output_csv_filename)
  output_tsv_clades_only:
    type: File?
    format: edam:format_3475  # Tab-separated values
    outputBinding:
      glob: $(inputs.output_options.output_tsv_clades_only_filename)
  output_tsv:
    type: File?
    format: edam:format_3475
    outputBinding:
      glob: $(inputs.output_options.output_tsv_filename)
  output_tree:
    type: File?
    format: edam:format_3464
    outputBinding:
      glob: $(inputs.output_options.output_tree_filename)

baseCommand: [ nextclade.js ]


$namespaces:
  edam: http://edamontology.org/
  dct: http://purl.org/dc/terms/
  foaf: http://xmlns.com/foaf/0.1/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl
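
The suggestion in the review thread above (always request every output and derive each filename from the nameroot of the input FASTA) could be sketched roughly as below. This is an illustration under stated assumptions, not the merged tool description: it assumes the output_options record input is removed entirely, and the .clades.tsv and .tree.json suffixes are hypothetical choices. It relies only on CWL v1.0 parameter references (File nameroot and string interpolation), so no InlineJavascriptRequirement is needed.

```yaml
# Hypothetical sketch: replaces the output_options record input.
# Every output is always requested; names are derived from the FASTA nameroot.
arguments:
  - prefix: --output-json
    valueFrom: $(inputs.input_fasta.nameroot).json
  - prefix: --output-csv
    valueFrom: $(inputs.input_fasta.nameroot).csv
  - prefix: --output-tsv
    valueFrom: $(inputs.input_fasta.nameroot).tsv
  - prefix: --output-tsv-clades-only
    valueFrom: $(inputs.input_fasta.nameroot).clades.tsv  # suffix is an assumption
  - prefix: --output-tree
    valueFrom: $(inputs.input_fasta.nameroot).tree.json   # suffix is an assumption

outputs:
  # Declared as File (not File?) because each output is always requested.
  output_json:
    type: File
    format: edam:format_3464  # JSON
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).json
  output_csv:
    type: File
    format: edam:format_3572  # Comma-separated values
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).csv
  output_tsv:
    type: File
    format: edam:format_3475  # Tab-separated values
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).tsv
  output_tsv_clades_only:
    type: File
    format: edam:format_3475
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).clades.tsv
  output_tree:
    type: File
    format: edam:format_3464
    outputBinding:
      glob: $(inputs.input_fasta.nameroot).tree.json
```

One consequence of this design: because the outputs are non-optional, a run in which nextclade does not write one of the files is reported as an error rather than silently returning null. Whether that strictness is wanted depends on how nextclade behaves for edge cases such as empty input, which this sketch does not check.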

13 changes: 13 additions & 0 deletions nextclade/tests/nextclade_t1.yml
@@ -0,0 +1,13 @@
input_fasta:
Review comment (Member), suggested change:
- input_fasta:
+ sequences:

  class: File
  format: edam:format_1929
  location: https://bigd.big.ac.cn/ncov/genome/sequence/download/single/MT981459
output_options:
  output_json_filename: MT981459.json
  output_csv_filename: MT981459.csv

$namespaces:
  edam: http://edamontology.org/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl
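
With the tool description as written, this test exercises only two of the five optional outputs. A job file requesting all of them might look like the following sketch. It reuses the input from nextclade_t1.yml; the three additional filenames are hypothetical and only need to be distinct.

```yaml
# Illustrative job file: same input as nextclade_t1.yml, all five outputs requested.
input_fasta:
  class: File
  format: edam:format_1929  # FASTA
  location: https://bigd.big.ac.cn/ncov/genome/sequence/download/single/MT981459
output_options:
  output_json_filename: MT981459.json
  output_csv_filename: MT981459.csv
  output_tsv_filename: MT981459.tsv                     # hypothetical name
  output_tsv_clades_only_filename: MT981459.clades.tsv  # hypothetical name
  output_tree_filename: MT981459.tree.json              # hypothetical name

$namespaces:
  edam: http://edamontology.org/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl
```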