feat: Replace VIBRANT with PhiSpy (#116)
* feat: Add PhiSpy to annotation subworkflow

* refactor: Remove VIBRANT from the workflow

* chore: Patch phispy image

* refactor: Add param to skip phispy

* feat: Add some optional outputs to phispy

* feat: Concatenate PhiSpy output

* feat: Add PhiSpy info to report

* fix: Change to correct phispy columns

* fix: Change merge order for phispy

* Revert "fix: Change merge order for phispy"

This reverts commit 15993c7.

* Revert "fix: Change to correct phispy columns"

This reverts commit 336f5dd.

* Revert "feat: Add PhiSpy info to report"

This reverts commit 8c88c73.

* docs: Fix integron finder name in yml
jvfe committed Jun 3, 2023
1 parent 15b93f9 commit 080d88c
Showing 18 changed files with 263 additions and 205 deletions.
4 changes: 2 additions & 2 deletions CITATIONS.md
@@ -81,8 +81,8 @@

> Claire Bertelli, Fiona S L Brinkman, Improved genomic island predictions with IslandPath-DIMOB, Bioinformatics, Volume 34, Issue 13, 01 July 2018, Pages 2161–2167, https://doi.org/10.1093/bioinformatics/bty095
-- [VIBRANT](https://doi.org/10.1186/s40168-020-00867-0)
-> Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020). https://doi.org/10.1186/s40168-020-00867-0
+- [PhiSpy](https://doi.org/10.1093/nar/gks406)
+> Sajia Akhter, Ramy K. Aziz, Robert A. Edwards; PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies. Nucl Acids Res 2012; 40 (16): e126. doi: 10.1093/nar/gks406
## Software packaging/containerisation tools

9 changes: 4 additions & 5 deletions README.md
@@ -38,14 +38,14 @@ Annotation:
- AMR ([`RGI`](https://github.com/arpcard/rgi))
- Plasmids ([`mob_suite`](https://github.com/phac-nml/mob-suite))
- Genomic Islands ([`IslandPath`](https://github.com/brinkmanlab/islandpath))
-- Virus identification ([`VIBRANT`](https://github.com/AnantharamanLab/VIBRANT))
+- Phage identification ([`PhiSpy`](https://github.com/linsalrob/PhiSpy))
- CAZY, VFDB, and BacMet query using DIAMOND ([`diamond`](https://github.com/bbuchfink/diamond))

Phylogeny:

- Panaroo ([`panaroo`](https://github.com/gtonkinhill/panaroo))
- FastTree ([`fasttree`](http://www.microbesonline.org/fasttree/))
-- (_optionally_) SNP-sites([`SNPsites`](https://github.com/sanger-pathogens/snp-sites))
+- (_optionally_) SNP-sites ([`SNPsites`](https://github.com/sanger-pathogens/snp-sites))
- (_optionally_) IQTree ([`iqtree`](http://www.iqtree.org/))

Other:
@@ -92,11 +92,10 @@ To test the worklow on a minimal dataset you can use the test configuration (wit
```

Due to download speed of the Kraken2, Bakta and CAZY databases this will take ~35 minutes.
-However to accelerate it you can download/cache the database files to a folder (e.g., `test/db_cache`) and provide a database cache parameter. As well as set `--bakta_db` to the directory containing the Bakta database and `--vibrant_db`
-to the directory containing the VIBRANT database.
+However to accelerate it you can download/cache the database files to a folder (e.g., `test/db_cache`) and provide a database cache parameter. As well as set `--bakta_db` to the directory containing the Bakta database.

```bash
-nextflow run beiko-lab/ARETE -profile test,docker --db_cache $PWD/test/db_cache --bakta_db $PWD/baktadb/db-light --vibrant_db $PWD/vibrant/
+nextflow run beiko-lab/ARETE -profile test,docker --db_cache $PWD/test/db_cache --bakta_db $PWD/baktadb/db-light
```

## Documentation
15 changes: 12 additions & 3 deletions conf/modules.config
@@ -255,15 +255,24 @@ process {
]
}

-withName: VIBRANT_VIBRANTRUN {
-ext.args = '-no_plot'
+withName: PHISPY {
+ext.args = '--output_choice 27'
+ext.prefix = { "${meta.id}_phispy" }
publishDir = [
-path: { "${params.outdir}/annotation/vibrant/" },
+path: { "${params.outdir}/annotation/phispy/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

+withName: CONCAT_PHISPY {
+publishDir = [
+path: { "${params.outdir}/annotation/phispy/" },
+mode: params.publish_dir_mode,
+saveAs: { filename -> filename.equals('PHISPY.txt') ? filename : null }
+]
+}

withName: INTEGRON_FINDER {
ext.args = '--gbk'
publishDir = [
2 changes: 1 addition & 1 deletion conf/test.config
@@ -24,7 +24,7 @@ params {
use_prokka = true
skip_kraken = true
skip_poppunk = true
-skip_vibrant = true
+skip_phispy = true
light = true
}

12 changes: 6 additions & 6 deletions docs/output.md
@@ -18,7 +18,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [MobRecon](#mobrecon) - Reconstruction and typing of plasmids
- [RGI](#rgi) - Detection and annotation of AMR determinants
- [IslandPath](#islandpath) - Predicts genomic islands in bacterial and archaeal genomes.
-- [VIBRANT](#vibrant) - Automated recovery and annotation of bacterial and archaeal viruses
+- [PhiSpy](#phispy) - Prediction of prophages from bacterial genomes
- [IntegronFinder](#integronfinder) - Finds integrons in DNA sequences
- [Diamond](#diamond) - Detection and annotation of genes using external databases.
- [CAZy](#cazy): Carbohydrate metabolism
@@ -220,19 +220,19 @@ Disabled by default. Enable by adding `--run_integronfinder` to your command.

[Integron Finder](https://github.com/gem-pasteur/Integron_Finder) is a bioinformatics tool to find integrons in bacterial genomes.

-### VIBRANT
+### PhiSpy

<details markdown="1">
<summary>Output files</summary>

-- `annotation/vibrant/`
-- `${sample_id}/` : VIBRANT results will be in one directory per genome.
+- `annotation/phispy/`
+- `${sample_id}/` : PhiSpy results will be in one directory per genome.

-See the [VIBRANT documentation](https://github.com/AnantharamanLab/VIBRANT/blob/master/output_explanations.pdf) for an extensive description of the output.
+See the [PhiSpy documentation](https://github.com/linsalrob/PhiSpy#output-files) for an extensive description of the output.

</details>

-[VIBRANT](https://github.com/AnantharamanLab/VIBRANT) is a tool for automated recovery and annotation of bacterial and archaeal viruses, determination of genome completeness, and characterization of viral community function from metagenomic assemblies.
+[PhiSpy](https://github.com/linsalrob/PhiSpy) is a tool for identification of prophages in Bacterial (and probably Archaeal) genomes. Given an annotated genome it will use several approaches to identify the most likely prophage regions.

### Panaroo

6 changes: 6 additions & 0 deletions modules.json
@@ -105,6 +105,12 @@
"installed_by": ["modules"],
"patch": "modules/nf-core/panaroo/run/panaroo-run.diff"
},
+"phispy": {
+"branch": "master",
+"git_sha": "a60792caf1782dd570ad7a091b61806c592734d7",
+"installed_by": ["modules"],
+"patch": "modules/nf-core/phispy/phispy.diff"
+},
"prokka": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
2 changes: 1 addition & 1 deletion modules/local/integronfinder/meta.yml
@@ -5,7 +5,7 @@ keywords:
- integron
- annotation
tools:
-- vibrant:
+- integron_finder:
description: Bioinformatics tool to find integrons in bacterial genomes
homepage: https://github.com/gem-pasteur/Integron_Finder
documentation: https://integronfinder.readthedocs.io/en/latest/
34 changes: 0 additions & 34 deletions modules/local/vibrant/downloadb/main.nf

This file was deleted.

32 changes: 0 additions & 32 deletions modules/local/vibrant/downloadb/meta.yml

This file was deleted.

50 changes: 0 additions & 50 deletions modules/local/vibrant/vibrantrun/main.nf

This file was deleted.

45 changes: 0 additions & 45 deletions modules/local/vibrant/vibrantrun/meta.yml

This file was deleted.

80 changes: 80 additions & 0 deletions modules/nf-core/phispy/main.nf
@@ -0,0 +1,80 @@
process PHISPY {
    tag "$meta.id"
    label 'process_medium'

    conda "bioconda::phispy=4.2.21"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/phispy:4.2.21--py310h30d9df9_1':
        'quay.io/biocontainers/phispy:4.2.21--py310h30d9df9_1' }"

    input:
    tuple val(meta), path(gbk)

    output:
    tuple val(meta), path("${prefix}.tsv")                     , emit: coordinates
    tuple val(meta), path("${prefix}.gb*")                     , emit: gbk
    tuple val(meta), path("${prefix}.log")                     , emit: log
    tuple val(meta), path("${prefix}_prophage_information.tsv"), optional:true, emit: information
    tuple val(meta), path("${prefix}_bacteria.fasta")          , optional:true, emit: bacteria_fasta
    tuple val(meta), path("${prefix}_bacteria.gbk")            , optional:true, emit: bacteria_gbk
    tuple val(meta), path("${prefix}_phage.fasta")             , optional:true, emit: phage_fasta
    tuple val(meta), path("${prefix}_phage.gbk")               , optional:true, emit: phage_gbk
    tuple val(meta), path("${prefix}_prophage.gff3")           , optional:true, emit: prophage_gff
    tuple val(meta), path("${prefix}_prophage.tbl")            , optional:true, emit: prophage_tbl
    tuple val(meta), path("${prefix}_prophage.tsv")            , optional:true, emit: prophage_tsv
    path "versions.yml"                                        , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    // Extract GBK file extension, i.e. .gbff, .gbk.gz
    gbk_extension = gbk.getName() - gbk.getSimpleName()

    if ("$gbk" == "${prefix}${gbk_extension}") error "Input and output names are the same, set prefix in module configuration to disambiguate!"

    """
    PhiSpy.py \\
        $args \\
        --threads $task.cpus \\
        -p $prefix \\
        -o . \\
        $gbk

    mv ${prefix}_prophage_coordinates.tsv ${prefix}.tsv
    mv ${prefix}_${gbk} ${prefix}${gbk_extension}
    mv ${prefix}_phispy.log ${prefix}.log

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        PhiSpy: \$(echo \$(PhiSpy.py --version 2>&1))
    END_VERSIONS
    """

    stub:
    prefix = task.ext.prefix ?: "${meta.id}"
    gbk_extension = gbk.getName() - gbk.getSimpleName()

    if ("$gbk" == "${prefix}${gbk_extension}") error "Input and output names are the same, set prefix in module configuration to disambiguate!"

    """
    touch ${prefix}.tsv
    touch ${prefix}${gbk_extension}
    touch ${prefix}.log
    touch ${prefix}_prophage_information.tsv
    touch ${prefix}_bacteria.fasta
    touch ${prefix}_bacteria.gbk
    touch ${prefix}_phage.fasta
    touch ${prefix}_phage.gbk
    touch ${prefix}_prophage.gff3
    touch ${prefix}_prophage.tbl
    touch ${prefix}_prophage.tsv

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        PhiSpy: \$(echo \$(PhiSpy.py --version 2>&1))
    END_VERSIONS
    """
}
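
Note on the renaming step in `modules/nf-core/phispy/main.nf`: the `script:` block derives `gbk_extension` as `gbk.getName() - gbk.getSimpleName()` and then moves three PhiSpy outputs to shorter names. The sketch below reproduces that logic in plain bash on dummy files so it can be tried without Nextflow or PhiSpy. It is an illustration under stated assumptions, not part of the commit: the helper names `gbk_extension` and `rename_outputs` and the sample name `sampleA` are hypothetical, and the pre-rename file names are taken from the `mv` lines in the module rather than from a real PhiSpy run.

```shell
#!/usr/bin/env bash
# Sketch only: mimics the rename step of the PHISPY process with dummy files.
set -euo pipefail

# Bash analogue of `gbk.getName() - gbk.getSimpleName()`:
# keep everything from the first dot onwards (".gbk", ".gbff", ".gbk.gz").
gbk_extension() {
    local name
    name="$(basename "$1")"
    case "$name" in
        *.*) printf '.%s\n' "${name#*.}" ;;
        *)   printf '\n' ;;
    esac
}

# Hypothetical helper: applies the same three renames as the module script.
rename_outputs() {
    local prefix="$1" gbk="$2" ext
    ext="$(gbk_extension "$gbk")"
    # Same guard as the module: refuse to overwrite the input file.
    if [ "$gbk" = "${prefix}${ext}" ]; then
        echo "Input and output names are the same, set a different prefix" >&2
        return 1
    fi
    mv "${prefix}_prophage_coordinates.tsv" "${prefix}.tsv"
    mv "${prefix}_${gbk}" "${prefix}${ext}"
    mv "${prefix}_phispy.log" "${prefix}.log"
}

# Dummy stand-ins for PhiSpy output, named as the module's `mv` lines expect.
workdir="$(mktemp -d)"
cd "$workdir"
prefix="sampleA_phispy"   # matches ext.prefix = "${meta.id}_phispy" for a sample "sampleA"
gbk="sampleA.gbk.gz"
touch "${prefix}_prophage_coordinates.tsv" "${prefix}_${gbk}" "${prefix}_phispy.log"

rename_outputs "$prefix" "$gbk"
ls
```

With the `ext.prefix` set in `conf/modules.config`, the prefix always differs from the input's simple name, so the guard should not trigger; it matters mainly when the module is used with its default prefix of `${meta.id}` and an input that is already named after the sample.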