
Quick Start

Jessica Rowell edited this page Sep 24, 2024 · 29 revisions

Validate Environment

1. Check that you are in the directory where the TOSTADAS repository was installed by running pwd

Expected Output:

/path/to/working/directory/tostadas
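Beyond reading the pwd output, a quick check like the following can confirm that the directory looks like the repository root. This is a minimal sketch: the helper name is_tostadas_root is hypothetical, and it assumes that main.nf and nextflow.config both sit at the top level of the repository, as the commands on this page suggest.

```shell
# Hypothetical helper: succeeds when the given directory contains the two
# files this page refers to (assumed to live in the repository root).
is_tostadas_root() {
    dir="${1:-.}"
    [ -f "$dir/main.nf" ] && [ -f "$dir/nextflow.config" ]
}

if is_tostadas_root "$(pwd)"; then
    echo "OK: main.nf and nextflow.config found in $(pwd)"
else
    echo "Not in the repository root: main.nf or nextflow.config is missing" >&2
fi
```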

2. Change the submission_config parameter within nextflow.config to the location of your personal submission config file.

Bacterial and viral submission configurations are provided in the repo for testing purposes, but you will not be able to perform a submission without providing a personal submission configuration.

More information on running the submission sub-workflow can be found here: More Information on Submission

Pipeline Execution Examples

We describe a few use-cases of the pipeline below. For more information on input parameters, refer to the documentation found in the following pages:

❗ The paths to the required files must be specified in the nextflow.config file or the params.yaml file.

Basic Usage:

Before we dive into the more complex use-cases of the pipeline, let's look at the most basic way the pipeline can be run:

nextflow run main.nf -profile <test(optional)>,<singularity|docker|conda> --species <virus|bacteria>
  • -profile <test(optional)>,<singularity|docker|conda>
    • Specify the run-time environment (singularity, docker or conda). The conda implementation is less stable so using singularity or docker is recommended if available on your system.
    • You may specify the optional -profile argument test to force the pipeline to ignore the custom configuration found in your nextflow.config file and instead run using a pre-configured test data set and configuration.
  • --species <virus|bacteria>
    • The pathogen type must be specified for the pipeline to run.

❗ Important note on arguments: Arguments with a single "-", such as -profile, are arguments to nextflow. Arguments with a double "--", such as --species or --submission, are arguments to the TOSTADAS pipeline.

Example:

nextflow run main.nf -profile test,singularity --species virus

Breakdown:

  • -profile test,singularity
    • Set compute environment to singularity
    • Run with test configuration
  • --species virus
    • Viral sample
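To make the single-dash versus double-dash rule concrete, the sketch below labels each token of the example command. The function classify_arg is a hypothetical illustration and is not part of Nextflow or TOSTADAS; it only inspects the prefix of each argument.

```shell
# Illustrative only: classify_arg is a hypothetical helper.
classify_arg() {
    case "$1" in
        --*) echo "pipeline parameter" ;;  # double dash: passed to the TOSTADAS pipeline
        -*)  echo "nextflow option" ;;     # single dash: consumed by nextflow itself
        *)   echo "value" ;;               # anything else is a value for the preceding flag
    esac
}

# Label each token of the example command
for arg in -profile test,singularity --species virus; do
    printf '%-18s %s\n' "$arg" "$(classify_arg "$arg")"
done
```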

Overriding parameters through the command line:

Any parameter defined in nextflow.config can be overridden at runtime by providing the parameter as a command line argument with the "--" prefix.

Example: Modifying the output directory

By default, the pipeline will create and store pipeline outputs in the test_output directory. You can modify the location output files are stored by adding the --output_dir flag to the command line and providing the new path as a string.

nextflow run main.nf -profile test,singularity --species virus --output_dir </path/to/output/dir>
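When several parameters need overriding, they can also be collected in a YAML file and passed with Nextflow's -params-file option instead of being typed on the command line. The sketch below assumes the YAML keys match the command-line parameter names without the "--" prefix, which is how Nextflow maps parameters. The file name my_params.yaml is hypothetical, and the command is printed rather than executed.

```shell
# Write a small parameter file (the name my_params.yaml is hypothetical).
params_file="my_params.yaml"
cat > "$params_file" <<'EOF'
species: virus
output_dir: /path/to/output/dir
EOF

# Print the command for review instead of launching the pipeline
echo "nextflow run main.nf -profile test,singularity -params-file $params_file"
```

Values given directly on the command line with "--" still take precedence over values read from the parameter file.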

Running Annotation and Submission

1. Annotate viral assemblies and submit them to GenBank and SRA

Database targets are specified at run time. You can specify more than one target by adding additional arguments to the command line.

nextflow run main.nf -profile singularity --species virus --annotation --submission --genbank --sra --biosample --meta_path </path/to/meta_data/file> --submission_config </path/to/submission/config/file/>

Breakdown:

  • -profile singularity
    • Set compute environment to singularity
  • --species virus
    • Viral sample
  • --sra
    • Prepare an SRA submission
  • --genbank
    • Prepare a GenBank submission
  • --biosample
    • Prepare a BioSample submission
  • --annotation
    • Run annotation
  • --submission
    • Run submission
  • --meta_path
    • Provide path to your meta-data file
    • Paths to fastq files are stored here
  • --submission_config
    • Provide path to submission config file
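The flags in the breakdown above can be assembled by a small wrapper that rejects an invalid species before anything is launched. This is a sketch only: build_submission_cmd is a hypothetical helper, it hard-codes the singularity profile used in the example, and it prints the command for review rather than running it.

```shell
# Hypothetical helper: prints the annotation + submission command.
# Arguments: species, path to metadata file, path to submission config.
build_submission_cmd() {
    species="$1"
    meta_path="$2"
    submission_config="$3"
    case "$species" in
        virus|bacteria) ;;  # the only two values this page documents
        *) echo "species must be virus or bacteria" >&2; return 1 ;;
    esac
    printf '%s\n' "nextflow run main.nf -profile singularity --species $species --annotation --submission --genbank --sra --biosample --meta_path $meta_path --submission_config $submission_config"
}

# Example with placeholder paths
build_submission_cmd virus /path/to/meta_data/file /path/to/submission/config/file
```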

2. Annotate bacterial assemblies and submit to GenBank and SRA

❗ Note: If you don't have the Bakta database, run with the --download_bakta_db true flag or download it from this Link. If you do have the database, skip ahead to (B).

(A) Download Bakta database

nextflow run main.nf -profile singularity --species bacteria --genbank --sra --biosample --download_bakta_db true --bakta_db_type <light|full> --submission --annotation --meta_path </path/to/meta_data/file> --submission_config </path/to/submission/config/file/>

(B) Provide path to existing Bakta database

nextflow run main.nf -profile singularity --species bacteria --genbank --sra --biosample --bakta_db_path </path/to/bakta/db> --submission --annotation --meta_path </path/to/meta_data/file> --submission_config </path/to/submission/config/file/>

Breakdown:

  • --species bacteria
    • Bacterial sample
  • --download_bakta_db true
    • Download or refresh the Bakta database
  • --bakta_db_type <light|full>
    • Choose between the faster, lighter-weight light Bakta database or the larger and slower, but more accurate, full Bakta database
  • --bakta_db_path </path/to/bakta/db>
    • The path to the Bakta database
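The choice between options (A) and (B) can be scripted by checking whether a database directory already exists. The helper bakta_db_args below is hypothetical, and defaulting to the light database when no type is given is an assumption made for this sketch, not documented pipeline behavior.

```shell
# Hypothetical helper: prints the Bakta-related flags to append to the command.
# Arguments: path to an existing database (may be empty), database type.
bakta_db_args() {
    db_path="$1"
    db_type="${2:-light}"  # assumed default for this sketch only
    if [ -n "$db_path" ] && [ -d "$db_path" ]; then
        # (B) reuse the existing database
        echo "--bakta_db_path $db_path"
    else
        # (A) ask the pipeline to download the database
        echo "--download_bakta_db true --bakta_db_type $db_type"
    fi
}

# With no path given, the download flags are printed
bakta_db_args "" full
```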

Use Case 2: Running Submission only without Annotation

❗ Note: you can only submit raw files to SRA, not to GenBank.

1. Submit fastqs to SRA

nextflow run main.nf -profile singularity --species virus --annotation false --sra --biosample --submission --meta_path </path/to/meta_data/file>
  • --annotation false
    • Disable annotation
  • --meta_path
    • Provide path to your meta-data file
    • Paths to fastq files are stored here

2. Submit user-provided annotations and fasta assembly to GenBank

nextflow run main.nf -profile singularity --species virus --annotation false --genbank --biosample --submission --meta_path </path/to/meta_data/file>
  • --annotation false
    • Disable annotation
  • --meta_path
    • Provide path to your meta-data file
    • Paths to fasta and gff files are stored here
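The two submission-only commands differ only in their database targets, which the sketch below captures in one place. The helper submission_only_flags and the labels fastq and assembly are hypothetical names chosen for illustration; the flags themselves are the ones used in the two commands above.

```shell
# Hypothetical helper: prints the flags for a submission-only run.
# Argument: "fastq" (raw reads to SRA) or "assembly" (fasta + gff to GenBank).
submission_only_flags() {
    case "$1" in
        fastq)    echo "--annotation false --submission --sra --biosample" ;;
        assembly) echo "--annotation false --submission --genbank --biosample" ;;
        *)        echo "expected fastq or assembly" >&2; return 1 ;;
    esac
}

# Print a full command for a fastq submission, using a placeholder metadata path
echo "nextflow run main.nf -profile singularity --species virus $(submission_only_flags fastq) --meta_path /path/to/meta_data/file"
```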