AMR/VF LGT focused bacterial genomics analysis workflow

alexmanuele/arete

Introduction

ARETE is a bioinformatics best-practice analysis pipeline for bacterial genomics, focused on antimicrobial resistance (AMR), virulence factors (VF), and lateral gene transfer (LGT).

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It uses Docker / Singularity containers, making installation trivial and results highly reproducible. Like other workflow languages, it provides useful features such as -resume, which reruns only the tasks that haven't already been completed (e.g., allowing editing of inputs/tasks and recovery from crashes without a full re-run). The nf-core project provided the overall project template, pre-written software modules when available, and general best-practice recommendations.

Pipeline summary

Read processing:

  1. Raw Read QC (FastQC)
  2. Read Trimming (fastp)
  3. Trimmed Read QC (FastQC)
  4. Taxonomic Profiling (kraken2)

Assembly:

  1. Unicycler (unicycler)
  2. QUAST QC (quast)
  3. CheckM QC (checkm)

Annotation:

  1. Prokka (prokka)
  2. AMR (RGI)
  3. Plasmids (mob_suite)
  4. CAZy, VFDB, and BacMet queries using DIAMOND (diamond)

Phylogeny:

  1. Roary (roary)
  2. SNPSites (snpsites)
  3. IQTree (iqtree)

Future Development Targets

A list in no particular order of outstanding development features, both in-progress and planned:

  • CI/CD testing of local modules and pipeline logic

  • Sensible default QC parameters to allow automated end-to-end execution with little-to-no required user intervention

  • Consider updating to a newer SPAdes, as Unicycler depends on an older version (and newer SPAdes can integrate plasmidSPAdes runs on the same assembly graph).

  • Download tool to download external resources and containers to allow smooth operation in HPC environments where compute nodes have no internet access

  • Bifurcated logic: "Single Species" mode and "Multi Species" mode

  • Integration of additional tools and scripts:

  1. Prophage identification (e.g., PHASTER)
  2. Genomic Island Detection (e.g., IslandCompare)
  3. ICE identification (e.g., ICEFinder)
  4. Ortholog detection in multi-species datasets (e.g., OrthoFinder)
  5. Inference of recombination events (e.g., Gubbins, CFML)
  6. Integration of partner-developed tools and algorithms such as Community Co-Evolution model
  7. Improved result reporting, such as auto-generated figures and more concise aggregated tables

Quick Start

  1. Install Nextflow

  2. Install Docker, Singularity, or, as a last resort, Conda. Also ensure you have a working curl installed (should be present on almost all systems).

Note: this workflow should also support Podman, Shifter, or Charliecloud execution for full pipeline reproducibility. We have minimized reliance on Conda and suggest using it only as a last resort (see docs). Configure mail on your system to send an email on workflow success or failure. Without this you may see a small error at the end of the run, `Failed to invoke workflow.onComplete event handler`, but this does not mean the workflow failed to finish successfully.
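
Before moving on, the prerequisites listed above can be probed from the shell. The following is a minimal sketch and not part of ARETE: the function names `check_tool` and `check_prereqs` are hypothetical, and the check only confirms that each command is on your PATH, not that it is correctly configured.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with ARETE) that report whether the
# tools named in the Quick Start are available on PATH.

# Print "found: <tool>" or "missing: <tool>"; return 1 when missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Require nextflow and curl, plus at least one of docker/singularity/conda.
check_prereqs() {
    local rc engine_found tool
    rc=0
    engine_found=0

    check_tool nextflow || rc=1
    check_tool curl || rc=1

    for tool in docker singularity conda; do
        if check_tool "$tool"; then
            engine_found=1
        fi
    done

    if [ "$engine_found" -eq 0 ]; then
        echo "no container engine or conda found"
        rc=1
    fi
    return "$rc"
}
```

After sourcing the snippet, running `check_prereqs` prints one line per tool and returns a non-zero status if nextflow, curl, or every one of docker/singularity/conda is missing.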

  3. Download the pipeline and test it with a stub-run. The stub-run checks that the pipeline can download and use containers and that it executes with the proper logic.

    nextflow run arete/ --input_sample_table samplesheet.csv -profile <docker/singularity/conda> -stub-run
    • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either Docker or Singularity and set the appropriate execution settings for your local compute environment.
    • If you are using Singularity, the pipeline will auto-detect this and attempt to download the Singularity images directly rather than converting them from Docker images. If you persistently see timeouts or network issues when downloading Singularity images directly, use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead.
  4. Start running your own analysis (ideally using -profile docker or -profile singularity for stability)!

    nextflow run arete -profile <docker/singularity> --input_sample_table samplesheet.csv

samplesheet.csv must be formatted with the columns sample,fastq_1,fastq_2.
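
As an illustration, a samplesheet in this format can be generated from a directory of paired-end reads. This is a sketch under stated assumptions, not an ARETE utility: the function name `make_samplesheet` is hypothetical, the reads are assumed to be named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and the first row is assumed to be a header naming the three columns (as in nf-core style samplesheets). Adjust the suffixes to match your own files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ARETE): build a samplesheet from
# paired FASTQ files.
# ASSUMPTIONS: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# and the samplesheet starts with a header row.
make_samplesheet() {
    local reads_dir out_csv r1 r2 sample
    reads_dir="$1"
    out_csv="$2"

    # Header row with the columns in the order given above.
    echo "sample,fastq_1,fastq_2" > "$out_csv"

    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        # Skip the unexpanded pattern when no files match.
        [ -e "$r1" ] || continue

        # Derive the mate's path by swapping the R1 suffix for R2.
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no mate found for $r1, skipping" >&2
            continue
        fi

        # Sample name is the file name without the R1 suffix.
        sample="$(basename "$r1" _R1.fastq.gz)"
        echo "${sample},${r1},${r2}" >> "$out_csv"
    done
}
```

For example, `make_samplesheet reads samplesheet.csv` writes one row per complete read pair found in `reads/` and warns on standard error about any R1 file without a matching R2.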

Note: If you see the error Failed to invoke `workflow.onComplete` event handler at the end of a run, it isn't a problem. It just means that sendmail is not configured, so the pipeline can't send an email report saying it finished correctly; it does not mean the workflow failed.

See usage docs for all of the available options when running the pipeline.

Documentation

The ARETE pipeline comes with documentation about the pipeline: usage and output.

Credits

ARETE was written by Finlay Maguire and is currently developed by Alex Manuele.

Contributions and Support

Thank you for your interest in contributing to ARETE. We are currently in the process of formalizing contribution guidelines. In the meantime, please feel free to open an issue describing your suggested changes.

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references for the tools and data used in this pipeline can be found in the CITATIONS.md file.
