02 Running and Monitoring Jobs


Executing the main workflow

BGCFlow can be executed as a regular Snakemake workflow or with the run command from bgcflow_wrapper.

$ bgcflow run --help
Usage: bgcflow run [OPTIONS]

  A snakemake CLI wrapper to run BGCFlow. Automatically run panoptes.

Options:
  --bgcflow_dir TEXT   Location of BGCFlow directory. (DEFAULT: Current
                       working directory.)
  --workflow TEXT      Select which snakefile to run. Available subworkflows:
                       {BGC|Database|Report|Metabase}. (DEFAULT:
                       workflow/Snakefile)
  --wms-monitor TEXT   Panoptes address. (DEFAULT: http://127.0.0.1:5000)
  -c, --cores INTEGER  Use at most N CPU cores/jobs in parallel. (DEFAULT: 8)
  -n, --dryrun         Test run.
  --unlock             Remove a lock on the snakemake working directory.
  --until TEXT         Runs the pipeline until it reaches the specified rules
                       or files.
  -t, --touch          Touch output files (mark them up to date without really
                       changing them).
  -h, --help           Show this message and exit.

Running bgcflow run executes the main Snakemake workflow defined in workflow/Snakefile. It is common to do a dry run (-n) before submitting the real jobs, to make sure the configuration does not contain errors.
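The dry-run-then-run habit described above can be wrapped in a small shell function. This is only a sketch: the function name bgcflow_checked_run and the BGCFLOW variable are hypothetical, and it assumes the wrapper exits with a non-zero status when the dry run fails. Only the -n and -c options listed in the help text are used.

```shell
# Hypothetical helper: do a dry run first, and start the real run only if it succeeds.
# BGCFLOW may be overridden (for example, for testing); it defaults to the bgcflow wrapper.
BGCFLOW="${BGCFLOW:-bgcflow}"

bgcflow_checked_run() {
    cores="${1:-8}"   # 8 is the documented default of -c/--cores

    # -n/--dryrun: test run. Assumption: a failed dry run returns a non-zero status.
    "$BGCFLOW" run -n -c "$cores" || {
        echo "Dry run failed; fix the configuration before running." >&2
        return 1
    }

    # The dry run passed, so launch the real workflow with the same core count.
    "$BGCFLOW" run -c "$cores"
}
```

Calling bgcflow_checked_run 4 from the BGCFlow directory would then perform a dry run followed by a real run on at most 4 cores.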

A dry run prints output similar to the following:

$ bgcflow run -n

DEBUG    15/08 15:32:36   Starting new HTTP connection (1): 127.0.0.1:5000
Running Panoptes to monitor BGCFlow jobs at http://127.0.0.1:5000
Panoptes job id: 20679
Connecting to Panoptes...
DEBUG    15/08 15:32:36   Starting new HTTP connection (1): 127.0.0.1:5000
Retrying to connect: 1x
 * Serving Flask app 'panoptes.app'
 * Debug mode: off
DEBUG    15/08 15:32:37   Starting new HTTP connection (1): 127.0.0.1:5000
DEBUG    15/08 15:32:37   http://127.0.0.1:5000 "GET /api/service-info HTTP/1.1" 200 21
Panoptes status: running
cd . && snakemake --snakefile workflow/Snakefile --use-conda --keep-going --rerun-incomplete --rerun-triggers mtime -c 8 --dryrun    --wms-monitor http://127.0.0.1:5000

This is BGCflow version 0.7.1.

Checking dependencies...
Found configuration setting to use antiSMASH 7
antismash from: workflow/envs/antismash.yaml
 - antismash will be installed from git+https://github.com/antismash/antismash.git
 - antismash==7.0.0
bigslice from: workflow/envs/bigslice.yaml
 - bigslice will be installed from git+https://github.com/medema-group/bigslice.git
 - bigslice==103d8f2
cblaster from: workflow/envs/cblaster.yaml
 - cblaster will be installed using pip
 - cblaster==1.3.12
prokka from: workflow/envs/prokka.yaml
 - prokka==1.14.6
eggnog-mapper from: workflow/envs/eggnog.yaml
 - eggnog-mapper==2.1.6
roary from: workflow/envs/roary.yaml
 - roary==3.13.0
seqfu from: workflow/envs/seqfu.yaml
 - seqfu==1.15.3
checkm from: workflow/envs/checkm.yaml
 - checkm==1.1.3
gtdbtk from: workflow/envs/gtdbtk.yaml
 - gtdbtk==2.3.0

Step 1. Extracting project information from config...

Step 2.1 Getting sample information from: config/lactobacillus_delbruecki/project_config.yaml
 - Processing project [Lactobacillus_delbrueckii]
 - Custom input directory: False
 - Getting input files from: /data/a/matinnu/bgcflow/data/raw/fasta
 - Custom input format: False
 - Default input file type: fna
Step 3 Merging genome_ids across projects...

Step 4. Checking for user-defined local resources...
   All resources set.

Step 5. Preparing list of final outputs...
 - Getting outputs for project: Lactobacillus_delbrueckii
 - WARNING: ignoring errors in rule_dictionary
 - Ready to generate all outputs.

GTDB API | Grabbing metadata using GTDB release version: r214
Building DAG of jobs...
...

See the Snakemake documentation for further details of the Snakemake CLI.

Monitoring the workflow

By default, each time a workflow is run, BGCFlow starts monitoring the jobs with Panoptes, which can be accessed at http://localhost:5000/.
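To check from the command line whether the Panoptes server is reachable, one option is to query the /api/service-info endpoint that appears in the wrapper's log output. The sketch below assumes curl is installed; the function name panoptes_is_up is hypothetical, and the default address is the one given for --wms-monitor in the help text.

```shell
# Hypothetical helper: succeed only if a Panoptes server answers at the given address.
panoptes_is_up() {
    url="${1:-http://127.0.0.1:5000}"   # default --wms-monitor address

    # --fail makes curl return a non-zero status on HTTP errors;
    # --max-time keeps the check from hanging if nothing is listening.
    curl --silent --fail --max-time 5 --output /dev/null "$url/api/service-info"
}
```

For example, panoptes_is_up && echo "Panoptes is running" prints the message only while the server responds.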

Once the job is finished, the monitoring server is shut down as well. To keep it available, start the Panoptes server independently of the workflow:

bgcflow serve --panoptes

The bgcflow serve command is a utility for starting the various servers (static report, Metabase, Panoptes) that we will explore in the next section.

$ bgcflow serve --help
Usage: bgcflow serve [OPTIONS]

  Serve static HTML report or other utilities (Metabase, etc.).

Options:
  --port_markdown INTEGER  Port to use. (DEFAULT: 8001)
  --port_panoptes INTEGER  Port to use. (DEFAULT: 8001)
  --file_server TEXT       Port to use for fileserver. (DEFAULT:
                           http://localhost:8002)
  --bgcflow_dir TEXT       Location of BGCFlow directory. (DEFAULT: Current
                           working directory)
  --metabase               Run Metabase server at http://localhost:3000.
                           Requires Java to be installed. See:
                           https://www.metabase.com/docs/latest/installation-
                           and-operation/java-versions
  --panoptes               Run Panoptes server to monitor workflow at
                           http://localhost:5000
  --project TEXT           Name of the project. (DEFAULT: all)
  -h, --help               Show this message and exit.
