docs: Add round of general documentation improvements (#154)
* docs: Group output doc tools by subworkflow

* docs: Add the other entries to usage doc

* docs: Add README examples

* docs: Add slurm HPC faq question

* docs: Change header
jvfe committed Jul 27, 2023
1 parent 9cb7e38 commit a10f7bd
Showing 4 changed files with 266 additions and 105 deletions.
71 changes: 65 additions & 6 deletions README.md
@@ -52,11 +52,13 @@ Annotation:

- Genome annotation with Bakta ([`bakta`](https://github.com/oschwengers/bakta)) or Prokka ([`prokka`](https://github.com/tseemann/prokka))
- Feature prediction:
  - AMR genes with the Resistance Gene Identifier ([`RGI`](https://github.com/arpcard/rgi))
  - Plasmids with MOB-Suite ([`mob_suite`](https://github.com/phac-nml/mob-suite))
  - Genomic Islands with IslandPath ([`IslandPath`](https://github.com/brinkmanlab/islandpath))
  - Phages with PhiSpy ([`PhiSpy`](https://github.com/linsalrob/PhiSpy))
  - (_optionally_) Integrons with [`IntegronFinder`](https://github.com/gem-pasteur/Integron_Finder)
  - Specialized databases: CAZY, VFDB, BacMet and ICEberg2 using DIAMOND homology search ([`diamond`](https://github.com/bbuchfink/diamond))

Phylogenomics:

@@ -137,16 +139,73 @@

## Examples <a name="examples"></a>

The fine details of how to run ARETE are described in the command reference and documentation, but here are a few illustrative examples:

### Assembly, annotation, and pan-genome inference from a modestly sized dataset (roughly 50 genomes) of paired-end reads

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--annotation_tools 'mobsuite,rgi,vfdb,bacmet,islandpath,phispy,report' \
--poppunk_model bgmm \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input)); a minimal sketch is shown below.
- `--annotation_tools` - Select the annotation tools and modules to be executed (See the [parameter documentation](https://beiko-lab.github.io/arete/params/#annotation) for defaults)
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/)
- `-profile docker` - Run tools in Docker containers.
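
For reference, a minimal `samplesheet.csv` for paired-end reads might look like the sketch below. The `sample,fastq_1,fastq_2` layout and the paths are assumptions for illustration; the linked usage documentation is the authoritative source for the expected columns.

```csv
sample,fastq_1,fastq_2
isolate_01,/data/reads/isolate_01_R1.fastq.gz,/data/reads/isolate_01_R2.fastq.gz
isolate_02,/data/reads/isolate_02_R1.fastq.gz,/data/reads/isolate_02_R2.fastq.gz
```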

### Annotation to evolutionary dynamics on 300-ish genomes

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--poppunk_model dbscan \
--run_recombination \
--run_gubbins \
--use_ppanggolin \
-entry annotation \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input))
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/).
- `--run_recombination` - Run the recombination subworkflow.
- `--run_gubbins` - Run [Gubbins](https://github.com/nickjcroucher/gubbins) as part of the recombination subworkflow.
- `--use_ppanggolin` - Use [PPanGGOLiN](https://github.com/labgem/PPanGGOLiN) for calculating the pangenome. Tends to perform better on larger input sets.
- `-entry annotation` - Run the annotation subworkflow and further steps (See [usage](https://beiko-lab.github.io/arete/usage/)); the samplesheet then lists assemblies, as sketched below.
- `-profile docker` - Run tools in Docker containers.
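
When starting from `-entry annotation`, the samplesheet points at assembled genomes rather than reads. A hedged sketch, assuming a `sample,fna_file_path` column layout (check the usage documentation for the exact column names):

```csv
sample,fna_file_path
genome_001,/data/assemblies/genome_001.fna
genome_002,/data/assemblies/genome_002.fna
```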

### Annotation to evolutionary dynamics on 10,000 genomes

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--poppunk_model dbscan \
--use_ppanggolin \
--run_recombination \
--enable_subsetting \
-entry annotation \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input))
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/).
- `--run_recombination` - Run the recombination subworkflow.
- `--use_ppanggolin` - Use [PPanGGOLiN](https://github.com/labgem/PPanGGOLiN) for calculating the pangenome. Tends to perform better on larger input sets.
- `--enable_subsetting` - Enable the subsetting workflow based on genome similarity (See the [subsetting documentation](https://beiko-lab.github.io/arete/subsampling/))
- `-entry annotation` - Run the annotation subworkflow and further steps (See [usage](https://beiko-lab.github.io/arete/usage/)).
- `-profile docker` - Run tools in Docker containers.
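
At this scale you may also want to raise resource limits for the heaviest steps. One option is a custom Nextflow config passed with `-c`; the sketch below overrides an nf-core-style `process_high` label, where the label name and the values are assumptions to adjust for your cluster and for the labels ARETE actually uses:

```groovy
// big_run.config: example resource overrides for very large runs
process {
    // 'process_high' is an assumed nf-core-style label; adjust as needed
    withLabel: 'process_high' {
        cpus   = 32
        memory = 128.GB
        time   = 48.h
    }
}
```

It would then be added to the command line as `-c big_run.config`.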

## Credits <a name="credits"></a>

The ARETE software was originally written and developed by [Finlay Maguire](https://github.com/fmaguire) and [Alex Manuele](https://github.com/alexmanuele), and is currently developed by [João Cavalcante](https://github.com/jvfe).
37 changes: 36 additions & 1 deletion docs/faq.md
@@ -1,3 +1,38 @@
# Frequently Asked Questions

## How do I run ARETE in a Slurm HPC environment?

- Create a config file at `~/.nextflow/config` that tells Nextflow to use the Slurm executor:

```groovy
process {
    executor = 'slurm'
    // If an account is necessary:
    clusterOptions = '--account=<my-account>'
}

// Queue and polling settings belong in the executor scope:
executor {
    pollInterval = '60 sec'
    submitRateLimit = '60/1min'
    queueSize = 100
}
```

See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#scope-executor) for a description of these options.
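
The `nextflow run` command itself is only a lightweight head job, but if your site discourages long-running processes on login nodes, one option is to submit it as its own Slurm job. A sketch, where the resources, time limit and `module load` line are placeholders for whatever your site provides:

```bash
#!/bin/bash
#SBATCH --job-name=arete-head
#SBATCH --time=72:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G

# Make Nextflow available however your site provides it (placeholder):
module load nextflow

nextflow run beiko-lab/ARETE \
    --input_sample_table samplesheet.csv \
    -profile singularity
```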

- Now, when running ARETE, you'll need to set a few additional options if your compute nodes don't have network access, as is common on Slurm clusters. The example below uses the default test data (the `test` profile) for demonstration purposes only.

```bash
nextflow run beiko-lab/ARETE \
--db_cache path/to/db_cache \
--bakta_db path/to/baktadb \
-profile test,singularity
```

Apart from `-profile singularity`, which just makes ARETE use Singularity/Apptainer containers for running the tools, there are two additional parameters:

- `--db_cache` should be the location of the pre-downloaded databases used in the DIAMOND alignments (i.e. the BacMet, VFDB, ICEberg2 and CAZy FASTA files) and in the Kraken2 taxonomic read classification.

  - Although these tools run by default, you can change the selection of annotation tools via `--annotation_tools` and skip Kraken2 by adding `--skip_kraken`. See the [parameter documentation](https://beiko-lab.github.io/arete/params/) for a full list of parameters and their defaults.

- `--bakta_db` should be the location of the pre-downloaded [Bakta database](https://github.com/oschwengers/bakta#database-download); a download sketch is shown below.

  - Alternatively, you can use Prokka (`--use_prokka`) to annotate your assemblies, since it doesn't require a downloaded database.
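
To pre-download the Bakta database on a machine with network access, the `bakta_db` helper that ships with Bakta can be used. A sketch, with a placeholder output path (exact flags may vary between Bakta versions):

```bash
# Download the smaller "light" Bakta database to a shared location (placeholder path)
bakta_db download --output /project/shared/baktadb --type light
```

`--bakta_db` would then point at the resulting database directory (e.g. `/project/shared/baktadb/db-light`).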