02 Running and Monitoring Jobs

Matin Nuhamunada edited this page Aug 15, 2023 · 3 revisions

Executing the main workflow

BGCFlow can be executed either as a regular Snakemake workflow or through the run command of bgcflow_wrapper.

$ bgcflow run --help
Usage: bgcflow run [OPTIONS]

  A snakemake CLI wrapper to run BGCFlow. Automatically run panoptes.

Options:
  --bgcflow_dir TEXT   Location of BGCFlow directory. (DEFAULT: Current
                       working directory.)
  --workflow TEXT      Select which snakefile to run. Available subworkflows:
                       {BGC|Database|Report|Metabase}. (DEFAULT:
                       workflow/Snakefile)
  --wms-monitor TEXT   Panoptes address. (DEFAULT: http://127.0.0.1:5000)
  -c, --cores INTEGER  Use at most N CPU cores/jobs in parallel. (DEFAULT: 8)
  -n, --dryrun         Test run.
  --unlock             Remove a lock on the snakemake working directory.
  --until TEXT         Runs the pipeline until it reaches the specified rules
                       or files.
  -t, --touch          Touch output files (mark them up to date without really
                       changing them).
  -h, --help           Show this message and exit.

Running bgcflow run executes the main Snakemake workflow defined in workflow/Snakefile. It is common to do a dry run (-n) before submitting the real jobs, to make sure the configuration contains no errors.
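The dry-run-then-run habit described above can be wrapped in a small shell function. This is only a sketch: the helper name run_checked and the BGCFLOW variable are hypothetical, and the only flags used are --dryrun and --cores from the help text above.

```shell
# BGCFLOW names the executable to call; override it (e.g. BGCFLOW=echo)
# to preview the commands without running anything.
BGCFLOW="${BGCFLOW:-bgcflow}"

# Hypothetical helper: dry-run first, start the real run only if that succeeds.
run_checked() {
  cores="${1:-8}"   # 8 is the wrapper's documented default
  "$BGCFLOW" run --dryrun --cores "$cores" && "$BGCFLOW" run --cores "$cores"
}
```

Calling run_checked 4 would then stop after the dry run if it reports a problem, and otherwise continue with the real run on at most 4 cores.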

A dry run prints output similar to this:

$ bgcflow run -n

DEBUG    15/08 15:32:36   Starting new HTTP connection (1): 127.0.0.1:5000
Running Panoptes to monitor BGCFlow jobs at http://127.0.0.1:5000
Panoptes job id: 20679
Connecting to Panoptes...
DEBUG    15/08 15:32:36   Starting new HTTP connection (1): 127.0.0.1:5000
Retrying to connect: 1x
 * Serving Flask app 'panoptes.app'
 * Debug mode: off
DEBUG    15/08 15:32:37   Starting new HTTP connection (1): 127.0.0.1:5000
DEBUG    15/08 15:32:37   http://127.0.0.1:5000 "GET /api/service-info HTTP/1.1" 200 21
Panoptes status: running
cd . && snakemake --snakefile workflow/Snakefile --use-conda --keep-going --rerun-incomplete --rerun-triggers mtime -c 8 --dryrun    --wms-monitor http://127.0.0.1:5000

This is BGCflow version 0.7.1.

Checking dependencies...
Found configuration setting to use antiSMASH 7
antismash from: workflow/envs/antismash.yaml
 - antismash will be installed from git+https://github.com/antismash/antismash.git
 - antismash==7.0.0
bigslice from: workflow/envs/bigslice.yaml
 - bigslice will be installed from git+https://github.com/medema-group/bigslice.git
 - bigslice==103d8f2
cblaster from: workflow/envs/cblaster.yaml
 - cblaster will be installed using pip
 - cblaster==1.3.12
prokka from: workflow/envs/prokka.yaml
 - prokka==1.14.6
eggnog-mapper from: workflow/envs/eggnog.yaml
 - eggnog-mapper==2.1.6
roary from: workflow/envs/roary.yaml
 - roary==3.13.0
seqfu from: workflow/envs/seqfu.yaml
 - seqfu==1.15.3
checkm from: workflow/envs/checkm.yaml
 - checkm==1.1.3
gtdbtk from: workflow/envs/gtdbtk.yaml
 - gtdbtk==2.3.0

Step 1. Extracting project information from config...

Step 2.1 Getting sample information from: config/lactobacillus_delbruecki/project_config.yaml
 - Processing project [Lactobacillus_delbrueckii]
 - Custom input directory: False
 - Getting input files from: /data/a/matinnu/bgcflow/data/raw/fasta
 - Custom input format: False
 - Default input file type: fna
Step 3 Merging genome_ids across projects...

Step 4. Checking for user-defined local resources...
   All resources set.

Step 5. Preparing list of final outputs...
 - Getting outputs for project: Lactobacillus_delbrueckii
 - WARNING: ignoring errors in rule_dictionary
 - Ready to generate all outputs.

GTDB API | Grabbing metadata using GTDB release version: r214
Building DAG of jobs...
...

See the Snakemake documentation for further details of the Snakemake CLI.
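Since BGCFlow is a regular Snakemake workflow, the command that the wrapper prints in the log above can also be assembled by hand. The sketch below rebuilds that command line as a string for inspection. The function name bgcflow_snakemake_cmd is hypothetical, and the flags are copied from the log of version 0.7.1, so other BGCFlow versions may use different ones.

```shell
# Hypothetical helper: print the Snakemake command that corresponds to the
# dry-run log shown above (flags copied verbatim from that log).
bgcflow_snakemake_cmd() {
  cores="${1:-8}"                         # wrapper default: 8 cores
  monitor="${2:-http://127.0.0.1:5000}"   # default Panoptes address
  echo "snakemake --snakefile workflow/Snakefile --use-conda --keep-going" \
       "--rerun-incomplete --rerun-triggers mtime -c $cores --dryrun" \
       "--wms-monitor $monitor"
}
```

Running bgcflow_snakemake_cmd 4 prints the command for a 4-core dry run. The --wms-monitor option presumably expects a Panoptes server to be listening at the given address already, which the wrapper otherwise takes care of.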

Monitoring the workflow

By default, each time a workflow is run, BGCFlow starts monitoring the jobs with Panoptes, which can be accessed at http://localhost:5000/.

Once the job is finished, the monitoring server is shut down as well. To keep it available, we can run the Panoptes server independently:

bgcflow serve --panoptes
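To confirm that the monitoring server is reachable, one option is to poll the /api/service-info endpoint that the wrapper itself queries in the log shown earlier. This is a minimal sketch assuming curl is installed; the function names and the PANOPTES_URL variable are hypothetical.

```shell
# Address of the Panoptes server (default taken from the help text).
PANOPTES_URL="${PANOPTES_URL:-http://127.0.0.1:5000}"

# Succeeds if the service-info endpoint answers with a 2xx status.
panoptes_is_up() {
  curl --silent --fail --max-time 2 "$PANOPTES_URL/api/service-info" > /dev/null
}

# Hypothetical helper: retry up to N times (default 5), one second apart.
wait_for_panoptes() {
  tries="${1:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if panoptes_is_up; then
      echo "Panoptes is running at $PANOPTES_URL"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Panoptes not reachable at $PANOPTES_URL" >&2
  return 1
}
```

With the server started in another terminal, wait_for_panoptes 10 would wait up to roughly ten seconds before giving up.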

The bgcflow serve command is a utility for launching several servers, which we will explore in the next section.

$ bgcflow serve --help
Usage: bgcflow serve [OPTIONS]

  Serve static HTML report or other utilities (Metabase, etc.).

Options:
  --port_markdown INTEGER  Port to use. (DEFAULT: 8001)
  --port_panoptes INTEGER  Port to use. (DEFAULT: 8001)
  --file_server TEXT       Port to use for fileserver. (DEFAULT:
                           http://localhost:8002)
  --bgcflow_dir TEXT       Location of BGCFlow directory. (DEFAULT: Current
                           working directory)
  --metabase               Run Metabase server at http://localhost:3000.
                           Requires Java to be installed. See:
                           https://www.metabase.com/docs/latest/installation-
                           and-operation/java-versions
  --panoptes               Run Panoptes server to monitor workflow at
                           http://localhost:5000
  --project TEXT           Name of the project. (DEFAULT: all)
  -h, --help               Show this message and exit.