docs: Add round of general documentation improvements (#154)
* docs: Group output doc tools by subworkflow

* docs: Add the other entries to usage doc

* docs: Add README examples

* docs: Add slurm HPC faq question

* docs: Change header
jvfe committed Jul 27, 2023
1 parent 9cb7e38 commit a10f7bd
Showing 4 changed files with 266 additions and 105 deletions.
71 changes: 65 additions & 6 deletions README.md
@@ -52,11 +52,13 @@ Annotation:

- Genome annotation with Bakta ([`bakta`](https://github.com/oschwengers/bakta)) or Prokka ([`prokka`](https://github.com/tseemann/prokka))
- Feature prediction:
  - AMR genes with the Resistance Gene Identifier ([`RGI`](https://github.com/arpcard/rgi))
  - Plasmids with MOB-Suite ([`mob_suite`](https://github.com/phac-nml/mob-suite))
  - Genomic Islands with IslandPath ([`IslandPath`](https://github.com/brinkmanlab/islandpath))
  - Phages with PhiSpy ([`PhiSpy`](https://github.com/linsalrob/PhiSpy))
  - (_optionally_) Integrons with [`IntegronFinder`](https://github.com/gem-pasteur/Integron_Finder)
  - Specialized databases: CAZY, VFDB, BacMet and ICEberg2 using DIAMOND homology search ([`diamond`](https://github.com/bbuchfink/diamond))

Phylogenomics:

@@ -137,16 +139,73 @@

## Examples <a name="examples"></a>

The fine details of how to run ARETE are described in the command reference and documentation, but here are a few illustrative examples:

### Assembly, annotation, and pan-genome inference from a modestly sized dataset (roughly 50 genomes) of paired-end reads

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--annotation_tools 'mobsuite,rgi,vfdb,bacmet,islandpath,phispy,report' \
--poppunk_model bgmm \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input)); a minimal sketch is shown below.
- `--annotation_tools` - Select the annotation tools and modules to be executed (See the [parameter documentation](https://beiko-lab.github.io/arete/params/#annotation) for defaults)
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/)
- `-profile docker` - Run tools in Docker containers.
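
For reference, a minimal `samplesheet.csv` for paired-end reads might look like the sketch below. The `sample,fastq_1,fastq_2` layout and the paths are assumptions for illustration; the linked usage documentation is the authoritative source for the expected columns.

```csv
sample,fastq_1,fastq_2
isolate_01,/data/reads/isolate_01_R1.fastq.gz,/data/reads/isolate_01_R2.fastq.gz
isolate_02,/data/reads/isolate_02_R1.fastq.gz,/data/reads/isolate_02_R2.fastq.gz
```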

### Annotation to evolutionary dynamics on 300-ish genomes

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--poppunk_model dbscan \
--run_recombination \
--run_gubbins \
--use_ppanggolin \
-entry annotation \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input))
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/).
- `--run_recombination` - Run the recombination subworkflow.
- `--run_gubbins` - Run [Gubbins](https://github.com/nickjcroucher/gubbins) as part of the recombination subworkflow.
- `--use_ppanggolin` - Use [PPanGGOLiN](https://github.com/labgem/PPanGGOLiN) for calculating the pangenome. Tends to perform better on larger input sets.
- `-entry annotation` - Run the annotation subworkflow and further steps (See [usage](https://beiko-lab.github.io/arete/usage/)); the samplesheet then lists assemblies, as sketched below.
- `-profile docker` - Run tools in Docker containers.
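
When starting from `-entry annotation`, the samplesheet points at assembled genomes rather than reads. A hedged sketch, assuming a `sample,fna_file_path` column layout (check the usage documentation for the exact column names):

```csv
sample,fna_file_path
genome_001,/data/assemblies/genome_001.fna
genome_002,/data/assemblies/genome_002.fna
```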

### Annotation to evolutionary dynamics on 10,000 genomes

```bash
nextflow run beiko-lab/ARETE \
--input_sample_table samplesheet.csv \
--poppunk_model dbscan \
--use_ppanggolin \
--run_recombination \
--enable_subsetting \
-entry annotation \
-profile docker
```

Parameters used:

- `--input_sample_table` - Input dataset in samplesheet format (See [usage](https://beiko-lab.github.io/arete/usage/#samplesheet-input))
- `--poppunk_model` - Model to be used by [PopPUNK](https://poppunk.readthedocs.io/).
- `--run_recombination` - Run the recombination subworkflow.
- `--use_ppanggolin` - Use [PPanGGOLiN](https://github.com/labgem/PPanGGOLiN) for calculating the pangenome. Tends to perform better on larger input sets.
- `--enable_subsetting` - Enable the subsetting workflow based on genome similarity (See the [subsetting documentation](https://beiko-lab.github.io/arete/subsampling/))
- `-entry annotation` - Run the annotation subworkflow and further steps (See [usage](https://beiko-lab.github.io/arete/usage/)).
- `-profile docker` - Run tools in Docker containers.
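
At this scale you may also want to raise resource limits for the heaviest steps. One option is a custom Nextflow config passed with `-c`; the sketch below overrides an nf-core-style `process_high` label, where the label name and the values are assumptions to adjust for your cluster and for the labels ARETE actually uses:

```groovy
// big_run.config: example resource overrides for very large runs
process {
    // 'process_high' is an assumed nf-core-style label; adjust as needed
    withLabel: 'process_high' {
        cpus   = 32
        memory = 128.GB
        time   = 48.h
    }
}
```

It would then be added to the command line as `-c big_run.config`.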

## Credits <a name="credits"></a>

The ARETE software was originally written and developed by [Finlay Maguire](https://github.com/fmaguire) and [Alex Manuele](https://github.com/alexmanuele), and is currently developed by [João Cavalcante](https://github.com/jvfe).
37 changes: 36 additions & 1 deletion docs/faq.md
@@ -1,3 +1,38 @@
# Frequently Asked Questions

## How do I run ARETE in a Slurm HPC environment?

- Create a config file at `~/.nextflow/config` that tells Nextflow to use the Slurm executor:

```groovy
process {
    executor = 'slurm'
    // If an account is necessary:
    clusterOptions = '--account=<my-account>'
}

// Queue and polling settings belong in the executor scope:
executor {
    pollInterval = '60 sec'
    submitRateLimit = '60/1min'
    queueSize = 100
}
```

See the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#scope-executor) for a description of these options.
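
The `nextflow run` command itself is only a lightweight head job, but if your site discourages long-running processes on login nodes, one option is to submit it as its own Slurm job. A sketch, where the resources, time limit and `module load` line are placeholders for whatever your site provides:

```bash
#!/bin/bash
#SBATCH --job-name=arete-head
#SBATCH --time=72:00:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G

# Make Nextflow available however your site provides it (placeholder):
module load nextflow

nextflow run beiko-lab/ARETE \
    --input_sample_table samplesheet.csv \
    -profile singularity
```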

- Now, when running ARETE, you'll need to set a few additional options if your compute nodes don't have network access, as is common on Slurm clusters. The example below uses the default test data (the `test` profile) for demonstration purposes only.

```bash
nextflow run beiko-lab/ARETE \
--db_cache path/to/db_cache \
--bakta_db path/to/baktadb \
-profile test,singularity
```

Apart from `-profile singularity`, which just makes ARETE use Singularity/Apptainer containers for running the tools, there are two additional parameters:

- `--db_cache` should be the location of the pre-downloaded databases used in the DIAMOND alignments (i.e. the BacMet, VFDB, ICEberg2 and CAZy FASTA files) and in the Kraken2 taxonomic read classification.

  - Although these tools run by default, you can change the selection of annotation tools via `--annotation_tools` and skip Kraken2 by adding `--skip_kraken`. See the [parameter documentation](https://beiko-lab.github.io/arete/params/) for a full list of parameters and their defaults.

- `--bakta_db` should be the location of the pre-downloaded [Bakta database](https://github.com/oschwengers/bakta#database-download); a download sketch is shown below.

  - Alternatively, you can use Prokka (`--use_prokka`) to annotate your assemblies, since it doesn't require a downloaded database.
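
To pre-download the Bakta database on a machine with network access, the `bakta_db` helper that ships with Bakta can be used. A sketch, with a placeholder output path (exact flags may vary between Bakta versions):

```bash
# Download the smaller "light" Bakta database to a shared location (placeholder path)
bakta_db download --output /project/shared/baktadb --type light
```

`--bakta_db` would then point at the resulting database directory (e.g. `/project/shared/baktadb/db-light`).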