ARETE is a bioinformatics best-practice analysis pipeline for bacterial genomics workflows focused on the lateral gene transfer (LGT) of antimicrobial resistance (AMR) genes and virulence factors (VF).
The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker / Singularity containers, making installation trivial and results highly reproducible.
Like other workflow languages, it provides useful features such as `-resume`, which reruns only the tasks that have not already completed (e.g., allowing inputs or tasks to be edited and crashes to be recovered from without a full re-run).
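As a minimal sketch of how `-resume` might be used in practice, the snippet below wraps the run command from this README in a small shell function so that the first launch and a resumed launch share the same arguments. The `run_arete` helper is hypothetical (it is not part of ARETE), and the `docker` profile and `samplesheet.csv` name are placeholders to adjust to your setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of ARETE): wraps the run command from this
# README so that every launch uses the same profile and samplesheet.
run_arete() {
    # "$@" forwards any extra Nextflow options, such as -resume.
    nextflow run arete -profile docker --input_sample_table samplesheet.csv "$@"
}

# First launch:
#   run_arete
# After a crash or an edited input, reuse results of tasks that already completed:
#   run_arete -resume
```

Resuming generally needs to happen from the same launch directory as the original run, since that is where Nextflow keeps its cache and work folder.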
The nf-core project provided the overall project template, pre-written software modules where available, and general best-practice recommendations.
Read processing:
Assembly:
Annotation:
Phylogeny:
A list in no particular order of outstanding development features, both in-progress and planned:
- CI/CD testing of local modules and pipeline logic
- Sensible default QC parameters to allow automated end-to-end execution with little-to-no required user intervention
- Consider updating to a newer SPAdes, as Unicycler depends on an older version (and newer SPAdes can integrate plasmidSPAdes runs on the same assembly graph)
- Download tool to fetch external resources and containers, allowing smooth operation in HPC environments where compute nodes have no internet access
- Bifurcated logic: "Single Species" mode and "Multi Species" mode
- Integration of additional tools and scripts:
  - Prophage identification (e.g., PHASTER)
  - Genomic island detection (e.g., IslandCompare)
  - ICE identification (e.g., ICEFinder)
  - Ortholog detection in multi-species datasets (e.g., OrthoFinder)
  - Inference of recombination events (e.g., Gubbins, CFML)
  - Integration of partner-developed tools and algorithms, such as the Community Co-Evolution model
  - Improved result reporting, such as auto-generated figures and more concise aggregated tables
- Install `nextflow`
- Install Docker, Singularity, or, as a last resort, Conda. Also ensure you have a working `curl` installed (should be present on almost all systems).

  Note: this workflow should also support Podman, Shifter or Charliecloud execution for full pipeline reproducibility. We have minimized reliance on `conda` and suggest using it only as a last resort (see docs). Configure `mail` on your system to send an email on workflow success/failure (without this you may get a small error at the end, Failed to invoke `workflow.onComplete` event handler, but this doesn't mean the workflow didn't finish successfully).
- Download the pipeline and test it with a `stub-run`. The `stub-run` will ensure that the pipeline is able to download and use containers as well as execute with the proper logic.

  `nextflow run arete/ --input_sample_table samplesheet.csv -profile <docker/singularity/conda> -stub-run`

  - Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
  - If you are using `singularity`, the pipeline will auto-detect this and attempt to download the Singularity images directly, as opposed to performing a conversion from Docker images. If you persistently observe issues downloading Singularity images directly due to timeouts or network problems, please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead.
- Start running your own analysis (ideally using `-profile docker` or `-profile singularity` for stability)!

  `nextflow run arete -profile <docker/singularity> --input_sample_table samplesheet.csv`

  `samplesheet.csv` must be formatted `sample,fastq_1,fastq_2`.

  Note: If you get the error Failed to invoke `workflow.onComplete` event handler at the end of a run, it isn't a problem. It just means you don't have sendmail configured, so the pipeline can't send an email report saying it finished correctly; it does not mean the workflow failed.
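To make the `sample,fastq_1,fastq_2` layout concrete, here is a minimal sketch that writes a two-sample samplesheet and runs a basic shape check on it. The sample names and FASTQ paths are placeholders, `check_samplesheet` is a hypothetical helper rather than something shipped with ARETE, and the sketch assumes the three column names appear as a header row (the usual nf-core convention).

```shell
#!/bin/sh
# Write an example samplesheet; sample names and FASTQ paths are placeholders.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
sample_A,reads/sample_A_R1.fastq.gz,reads/sample_A_R2.fastq.gz
sample_B,reads/sample_B_R1.fastq.gz,reads/sample_B_R2.fastq.gz
EOF

# Hypothetical helper: checks the header (assumed to be required) and that
# every line has exactly three comma-separated fields.
check_samplesheet() {
    awk -F',' '
        NR == 1 && $0 != "sample,fastq_1,fastq_2" { print "bad header: " $0; bad = 1 }
        NF != 3 { print "line " NR ": expected 3 fields, found " NF; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

check_samplesheet samplesheet.csv && echo "samplesheet.csv looks well-formed"
```

This only checks the shape of the file; it does not confirm that the FASTQ files exist or are readable.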
See usage docs for all of the available options when running the pipeline.
The ARETE pipeline comes with documentation about the pipeline: usage and output.
ARETE was written by Finlay Maguire and is currently developed by Alex Manuele.
Thank you for your interest in contributing to ARETE. We are currently in the process of formalizing contribution guidelines. In the meantime, please feel free to open an issue describing your suggested changes.
This pipeline uses code and infrastructure developed and maintained by the nf-core initiative, and reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references for the tools and data used in this pipeline can be found in the CITATIONS.md file.