PanBGC-DB is a publicly available database and analysis platform designed to explore the genetic diversity and evolutionary dynamics of biosynthetic gene cluster (BGC) families using a pangenome-inspired framework. The platform integrates large-scale BGC datasets from antiSMASH-DB and MIBiG, clusters them into gene cluster families (GCFs), and applies orthologous grouping and openness metrics to characterize core, accessory, and unique genes within each family.
- Interactive visualization of BGC families and gene presence/absence matrices
- Gene classification into core, accessory, and unique types
- Openness metric calculation using Heaps’ Law
- Comparative views of gene architecture, domain organization, and phylogenetic relationships
- Query tool for uploading and comparing custom BGCs
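As a rough illustration of the gene classification and openness features listed above, the following sketch classifies ortholog groups from a presence/absence mapping and fits Heaps' Law to a gene accumulation curve. It rests on several assumptions that are not taken from PanBGC-DB itself: strict thresholds (core = present in every BGC, unique = present in exactly one, accessory = everything else), a single fixed BGC order rather than an average over permutations, and the power-law form G(n) = kappa * n^gamma, where a larger gamma suggests a more open family. All function names and the toy data are hypothetical.

```python
import math


def classify_genes(matrix):
    """Classify ortholog groups; `matrix` maps each group to the set of BGCs containing it.

    Assumed thresholds: core = in all BGCs, unique = in exactly one, else accessory.
    """
    all_bgcs = set().union(*matrix.values())
    classes = {}
    for gene, bgcs in matrix.items():
        if bgcs == all_bgcs:
            classes[gene] = "core"
        elif len(bgcs) == 1:
            classes[gene] = "unique"
        else:
            classes[gene] = "accessory"
    return classes


def accumulation_curve(matrix, order):
    """Count the distinct ortholog groups seen after adding each BGC in `order`."""
    seen = set()
    curve = []
    for bgc in order:
        seen.update(gene for gene, bgcs in matrix.items() if bgc in bgcs)
        curve.append(len(seen))
    return curve


def fit_heaps(curve):
    """Least-squares fit of log(G) = log(kappa) + gamma * log(n).

    Needs at least two points, and every value in `curve` must be positive.
    """
    xs = [math.log(n) for n in range(1, len(curve) + 1)]
    ys = [math.log(g) for g in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Hypothetical toy family: three BGCs (A, B, C) and four ortholog groups.
toy_matrix = {
    "og1": {"A", "B", "C"},
    "og2": {"A", "B"},
    "og3": {"C"},
    "og4": {"B"},
}
classes = classify_genes(toy_matrix)
curve = accumulation_curve(toy_matrix, ["A", "B", "C"])
kappa, gamma = fit_heaps(curve)
print(classes)
print(curve, round(gamma, 3))
```

Because the curve depends on the order in which BGCs are added, a more careful estimate would average it over many random orders before fitting.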
The web platform is available at:
👉 https://panbgc-db.cs.uni-tuebingen.de
This repository contains the source code for:
- The PanBGC-DB web interface
- The backend clustering, orthologous grouping, and openness analysis scripts
- Data processing and visualization modules
These instructions work for Linux and macOS. If you are on Windows, the Windows Subsystem for Linux (WSL) is recommended.
- Install Miniconda
Download and install Miniconda for your operating system.
- Download Required Tools
  - Download and extract the folder Astral-Pro3 from GitHub.
  - Download the pipeline scripts from this repository or directly from the website under the Visualisation tab.
For Debian/Ubuntu and macOS with Intel chip:
conda create --name PanBGC_vis -c conda-forge -c bioconda zol
For macOS with Apple Silicon chip:
CONDA_SUBDIR=osx-64 conda create -n PanBGC_vis -c conda-forge -c bioconda zol
For all systems (after creating the environment):
conda activate PanBGC_vis
pip install openpyxl tqdm
Note: If you want to use all annotation libraries, you can remove the -m flag from the following command (increases download and run time):
setup_annotation_dbs.py -m
- Extract the Scripts
Unzip the downloaded Scripts.zip to a location of your choice.
- Navigate to the Scripts Directory
cd /path/to/extracted/Scripts
- Run the Pipeline
If visualizing a single Gene Cluster Family (GCF), the input folder can contain GenBank files directly.
For multiple GCFs, place subfolders (each containing GenBank files) inside the input folder. No further changes are required.
python PanBGC.py -i /path/to/input_folder \
-o /path/to/result_folder \
-log /path/to/result_folder/log \
-c number_of_threads \
-al /path/to/astral-pro3
Note: The astral-pro3 executable is found in the extracted Astral-Pro3 folder under:
ASTER-Linux/bin/astral-pro3
Parameters Explained:
- -i: Path to the input folder with GenBank files
- -o: Path to the output directory
- -log: Log file location
- -c: Number of CPU threads to use
- -al: Path to the astral-pro3 executable
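The two input layouts accepted by the pipeline (GenBank files directly in the input folder for a single GCF, or one subfolder per GCF) can be checked before a run. The sketch below is a hypothetical helper, not part of PanBGC.py; in particular, the list of GenBank file extensions is an assumption, since the extensions the pipeline accepts are not stated here.

```python
from pathlib import Path

# Assumption: typical GenBank extensions; the pipeline may accept a different set.
GENBANK_SUFFIXES = {".gbk", ".gb", ".genbank"}


def genbank_files(folder):
    """Return the GenBank files that sit directly inside `folder`."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in GENBANK_SUFFIXES
    )


def group_by_family(input_dir):
    """Map each family name to its GenBank files.

    Single-GCF layout: files sit directly in `input_dir`, so the result has one entry.
    Multi-GCF layout: each subfolder holding GenBank files becomes one entry.
    """
    root = Path(input_dir)
    direct = genbank_files(root)
    if direct:
        return {root.name: direct}
    families = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        files = genbank_files(sub)
        if files:
            families[sub.name] = files
    return families
```

Calling `group_by_family("/path/to/input_folder")` and inspecting the number of entries shows which layout the folder follows; an empty result means no GenBank files were found with the assumed extensions.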
After successful execution, your output folder will contain:
/path/to/result_folder/single_family/Final_Results/Visualisation.json
This file can be uploaded directly using the form on the left side of the PanBGC-DB interface.
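Before uploading, it can be useful to confirm that the output file exists and parses as JSON. The following sketch searches a result folder for Visualisation.json files located in a Final_Results directory. It assumes that runs with multiple GCFs write one such file per family in a similar layout, which is not confirmed above (only the single_family path is documented), and it makes no assumption about the JSON schema. The function names are hypothetical.

```python
import json
from pathlib import Path


def find_visualisation_files(result_dir):
    """Return every Visualisation.json that sits in a Final_Results folder."""
    return sorted(
        p for p in Path(result_dir).rglob("Visualisation.json")
        if p.parent.name == "Final_Results"
    )


def load_json_or_none(path):
    """Parse `path` as JSON; return None if it is unreadable or malformed."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError):
        return None
```

For example, looping over `find_visualisation_files("/path/to/result_folder")` and skipping any file for which `load_json_or_none` returns None leaves only files that are at least well-formed JSON.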
Example command:
python PanBGC.py -i ~/data/genomes \
-o ~/results/analysis \
-log ~/results/analysis/pipeline.log \
-c 4 \
-al ~/tools/ASTER-Linux/bin/astral-pro3
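When the pipeline is launched from another Python script rather than a shell, `~` is not expanded automatically, so paths need to be resolved first. The sketch below assembles the same command as the example above using only the flags documented in this README. The wrapper functions are hypothetical, and it assumes the script is started with the interpreter of the activated PanBGC_vis environment and run from the extracted Scripts directory.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_dir, output_dir, log_path, threads, astral_path):
    """Assemble the PanBGC.py argument list with user paths expanded."""
    def full(path):
        # subprocess does not go through a shell, so expand "~" here.
        return str(Path(path).expanduser())

    return [
        sys.executable, "PanBGC.py",
        "-i", full(input_dir),
        "-o", full(output_dir),
        "-log", full(log_path),
        "-c", str(threads),
        "-al", full(astral_path),
    ]


def run_pipeline(scripts_dir, **options):
    """Run the pipeline from `scripts_dir`; raises CalledProcessError on failure."""
    command = build_command(**options)
    subprocess.run(command, cwd=Path(scripts_dir).expanduser(), check=True)


# Build (but do not run) the command from the example above.
command = build_command(
    input_dir="~/data/genomes",
    output_dir="~/results/analysis",
    log_path="~/results/analysis/pipeline.log",
    threads=4,
    astral_path="~/tools/ASTER-Linux/bin/astral-pro3",
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.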