refactor: Remove roary (#25)
* refactor: Remove roary from phylo subwf

* docs: Remove roary from documentation
jvfe committed Feb 3, 2023
1 parent 069400d commit 786c69c
Showing 14 changed files with 108 additions and 328 deletions.
1 change: 1 addition & 0 deletions .nf-core.yml
@@ -8,3 +8,4 @@ lint:
nextflow_config:
- params.input
- manifest.name
repository_type: pipeline
4 changes: 2 additions & 2 deletions CITATIONS.md
@@ -46,8 +46,8 @@
* [RGI](https://github.com/arpcard/rgi)
> Alcock et al. 2020. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Research, Volume 48, Issue D1, Pages D517-525 [PMID 31665441]
* [Panaroo](https://github.com/gtonkinhill/panaroo)
> Tonkin-Hill, G., MacAlasdair, N., Ruis, C. et al. Producing polished prokaryotic pangenomes with the Panaroo pipeline. Genome Biol 21, 180 (2020). https://doi.org/10.1186/s13059-020-02090-4
* [SNP-sites](https://pubmed.ncbi.nlm.nih.gov/28348851/)
> Page AJ, Taylor B, Delaney AJ, Soares J, Seemann T, Keane JA, Harris SR. SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments. Microb Genom. 2016 Apr 29;2(4):e000056. doi: 10.1099/mgen.0.000056. PMID: 28348851; PMCID: PMC5320690.
43 changes: 23 additions & 20 deletions README.md
@@ -7,49 +7,54 @@
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)


![aretelogo](docs/images/arete_logo.png)

## Introduction

<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->

**ARETE** is a best-practice bioinformatics analysis pipeline for AMR/VF- and LGT-focused bacterial genomics.

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker / Singularity containers making installation trivial and results highly reproducible.
Like other workflow languages, it provides [useful features](https://www.nextflow.io/docs/latest/getstarted.html#modify-and-resume) such as `-resume`, which reruns only the tasks that haven't already completed (e.g., allowing edits to inputs/tasks and recovery from crashes without a full re-run).
The [nf-core](https://nf-co.re) project provided the overall project template, pre-written software modules where available, and general best-practice recommendations.
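For example, a run that crashed (or whose inputs were edited) can be relaunched from where it left off. The command shape below follows the quick-start examples later in this README; the samplesheet name is illustrative:

```shell
# Initial launch (suppose it crashes partway through, or you edit an input afterwards)
nextflow run arete -profile docker --input_sample_table samplesheet.csv

# Relaunch with -resume: tasks already completed are restored from the work cache,
# and only new or affected tasks are re-executed
nextflow run arete -profile docker --input_sample_table samplesheet.csv -resume
```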

<!-- TODO nf-core: Add full-sized test dataset and amend the paragraph below if applicable
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. -->

## Pipeline summary

<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

Read processing:

1. Raw Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Read Trimming ([`fastp`](https://github.com/OpenGene/fastp))
3. Trimmed Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
4. Taxonomic Profiling ([`kraken2`](http://ccb.jhu.edu/software/kraken2/))

Assembly:

1. Unicycler ([`unicycler`](https://github.com/rrwick/Unicycler))
2. QUAST QC ([`quast`](http://quast.sourceforge.net/))
3. CheckM QC ([`checkm`](https://github.com/Ecogenomics/CheckM))

Annotation:

1. Prokka ([`prokka`](https://github.com/tseemann/prokka))
2. AMR ([`RGI`](https://github.com/arpcard/rgi))
3. Plasmids ([`mob_suite`](https://github.com/phac-nml/mob-suite))
4. CAZY, VFDB, and BacMet query using DIAMOND ([`diamond`](https://github.com/bbuchfink/diamond))

Phylogeny:

1. Panaroo ([`panaroo`](https://github.com/gtonkinhill/panaroo))
2. SNP-sites ([`snp-sites`](https://github.com/sanger-pathogens/snp-sites))
3. IQTree ([`iqtree`](http://www.iqtree.org/))

### Future Development Targets

A list in no particular order of outstanding development features, both in-progress and planned:

- CI/CD testing of local modules and pipeline logic

@@ -71,25 +76,24 @@

Note: this workflow should also support [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) execution for full pipeline reproducibility. We have minimized reliance on `conda` and suggest using it only as a last resort (see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles)). Configure `mail` on your system to send an email on workflow success/failure (without this you may get a small error at the end `Failed to invoke workflow.onComplete event handler` but this doesn't mean the workflow didn't finish successfully).

3. Download the pipeline and test it with a `-stub-run`. The stub run ensures that the pipeline can download and use containers, and that the workflow logic executes correctly.

```bash
nextflow run arete/ --input_sample_table samplesheet.csv -profile <docker/singularity/conda> -stub-run
```

- Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
- If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead.

4. Start running your own analysis (ideally using `-profile docker` or `-profile singularity` for stability)!

```bash
nextflow run arete -profile <docker/singularity> --input_sample_table samplesheet.csv
```

`samplesheet.csv` must be formatted `sample,fastq_1,fastq_2`
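Before launching, it can help to sanity-check the samplesheet against that expected layout. A minimal sketch of such a pre-flight check (not part of the pipeline; the function name and error messages are illustrative):

```python
import csv

# Columns required by the ARETE samplesheet, per the README.
EXPECTED = ["sample", "fastq_1", "fastq_2"]


def check_samplesheet(path):
    """Return a list of error messages; an empty list means the sheet looks valid."""
    errors = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED:
            errors.append(f"header must be exactly: {','.join(EXPECTED)}")
            return errors
        # Data rows start on line 2 of the file.
        for lineno, row in enumerate(reader, start=2):
            if len(row) != 3 or not all(field.strip() for field in row):
                errors.append(f"line {lineno}: expected 3 non-empty fields")
    return errors
```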

**Note**: If you get the error `` Failed to invoke `workflow.onComplete` event handler `` at the end, it isn't a problem; it just means you don't have sendmail configured, so the pipeline can't send an email report saying it finished correctly, i.e., it's not that the workflow failed.

See [usage docs](docs/usage.md) for all of the available options when running the pipeline.

@@ -98,10 +102,10 @@
To test the workflow on a minimal dataset you can use the test configuration (with either docker, conda, or singularity - replace `docker` below as appropriate):

```bash
nextflow run arete -profile test,docker
```

Due to the download speed of the Kraken2 and CAZY databases, this will take ~25 minutes.
To accelerate it, you can download/cache the database files to a folder (e.g., `test/db_cache`) and provide a database cache parameter.
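A sketch of such an invocation. The `--db_cache` flag name is inferred from the `db_cache` parameter in `conf/test.config` and should be checked against the usage docs:

```shell
# Hypothetical: point the pipeline at pre-downloaded Kraken2/CAZY databases
nextflow run arete -profile test,docker --db_cache test/db_cache
```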

@@ -119,8 +123,8 @@ ARETE was written by [Finlay Maguire](https://github.com/fmaguire) and is curren
## Contributions and Support

<!--If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).-->
Thank you for your interest in contributing to ARETE. We are currently formalizing contribution guidelines; in the meantime, please feel free to open an issue describing your suggested changes.

## Citations

@@ -134,6 +138,5 @@ This pipeline uses code and infrastructure developed and maintained by the [nf-c
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references for the tools and data used in this pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
1 change: 0 additions & 1 deletion conf/test.config
@@ -23,7 +23,6 @@ params {
input_sample_table = "test/test_dataset.csv"
use_bakta = false
db_cache = false
use_full_alignment = true
use_fasttree = true
}
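These test defaults can be overridden without editing the shipped profile, e.g. via a small user config. This is a sketch; the parameter names mirror the block above, and the value semantics are an assumption to be checked against the usage docs:

```groovy
// custom.config -- apply with: nextflow run arete -profile test,docker -c custom.config
params {
    db_cache     = "test/db_cache"  // reuse locally cached databases
    use_fasttree = false            // hypothetical effect: build the tree with IQ-TREE instead
}
```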
