PocketHotspot

PocketHotspot is a tool for structure-based drug design. It extends MolSnapper with advanced methods for selecting the initial atom placement and for pharmacophore-guided molecule generation.

Overview

PocketHotspot provides enhanced sampling capabilities for generating drug-like molecules in protein pockets. It includes:

  • Multiple initial atom selection methods: score-based, pharmacophore locator, H-bond predictor, and random selection
  • Cavity detection: Multiple modes including pyKVFinder, ligand proximity, and ligand coordinates

Prerequisites

PocketHotspot requires MolSnapper (https://github.com/oxpig/MolSnapper) as a dependency. Clone both repositories as sibling directories.

Installation

1. Clone MolSnapper

First, clone the upstream MolSnapper repository:

cd /path/to/your/workspace
git clone https://github.com/oxpig/MolSnapper.git MolSnapper
cd MolSnapper

2. Clone PocketHotspot

Clone PocketHotspot as a sibling directory to MolSnapper:

cd /path/to/your/workspace
git clone https://github.com/doesm/PocketHotspot.git PocketHotspot

3. Verify Directory Structure

Your directory structure should look like this:

your_workspace/
├── MolSnapper/          # External dependency
│   ├── models/
│   ├── configs/
│   ├── utils/
│   ├── ckpt/
│   │   ├── MolDiff.pt
│   │   └── bond_predictor.pt
│   └── ...
└── PocketHotspot/       # This repository
    ├── utils/
    ├── methods/
    ├── data/
    ├── sample_pocket.py
    ├── sample_dataset.py
    └── ...
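
The layout above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the list of expected paths is copied from the tree shown above (entries hidden behind `...` are not checked).

```python
from pathlib import Path

# Paths copied from the directory tree above; anything behind "..." is not checked.
EXPECTED = [
    "MolSnapper/configs",
    "MolSnapper/ckpt/MolDiff.pt",
    "MolSnapper/ckpt/bond_predictor.pt",
    "PocketHotspot/sample_pocket.py",
    "PocketHotspot/sample_dataset.py",
]


def missing_paths(workspace):
    """Return the expected paths that do not exist under `workspace`."""
    root = Path(workspace)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_paths("/path/to/your/workspace")` returns an empty list when every listed file and directory is in place, and otherwise names what is missing.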

4. Install Dependencies

The environment uses Python 3.9, PyTorch 2.0 and CUDA 11.7.

# Create and activate the conda environment (recommended)
conda env create -f PocketHotspot_env.yml
conda activate PocketHotspot

Usage

Sampling for a Single Pocket

Use sample_pocket.py to generate molecules for a single protein pocket:

cd PocketHotspot
python sample_pocket.py \
    --receptor path/to/receptor.pdb \
    --ligand path/to/ligand.sdf \
    --device cuda:0 \
    --ref_atoms_method pharmacophore_locator \
    --config ../MolSnapper/configs/sample/sample_MolDiff.yml

Key Arguments

  • --receptor: Path to receptor PDB/PDBQT file (required)
  • --ligand: Path to ligand SDF/PDBQT/MOL2 file. Required when --ref_atoms_method is score_based or when --pocket_detection is ligand_coords, ligand_proximity, or kvfinder_with_ligand (including when auto selects one of these modes); optional otherwise.
  • --config: Path to MolSnapper config file (default: ../MolSnapper/configs/sample/sample_MolDiff.yml)
  • --device: Device (cuda:0 or cpu; default: cuda:0)
  • --batch_size: Batch size for generation (default: 8)
  • --mol_size: Target molecule size (default: 20)
  • --clash_rate: Clash rate for pipeline (default: 0.1)
  • --ref_atoms_method: Method to select reference atoms (default: hbond_predictor)
    • pharmacophore_locator: Pharmacophore features
    • score_based: Affinity scores (requires --ligand)
    • hbond_predictor: H-bond prediction model
    • random: Random atom placement
  • --pocket_detection: Cavity detection mode (default: ligand_coords). Modes that require --ligand: ligand_coords, ligand_proximity, kvfinder_with_ligand.
    • kvfinder: pyKVFinder blind detection (no ligand)
    • kvfinder_interactive: pyKVFinder interactive (no ligand)
    • kvfinder_with_ligand: pyKVFinder with ligand
    • ligand_coords: Use ligand coordinates directly
    • ligand_proximity: Grid around ligand
    • auto: Try modes in order until one works
  • --ligand_proximity_radius: Radius (Å) for ligand_proximity mode (default: 4.0)
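
To make the ligand_proximity description ("Grid around ligand") concrete, here is a rough sketch of one plausible reading: keep every point of a regular grid that lies within the given radius of at least one ligand atom. This is an illustration only and is not taken from the repository; the function name `proximity_grid` and the `spacing` parameter are assumptions.

```python
import itertools
import math


def proximity_grid(ligand_coords, radius=4.0, spacing=1.0):
    """Return grid points within `radius` (Å) of any ligand atom.

    `spacing` is a hypothetical parameter; the repository may build its grid differently.
    """
    r2 = radius * radius
    # Bounding box of the ligand, padded by the radius on every side.
    mins = [min(c[i] for c in ligand_coords) - radius for i in range(3)]
    maxs = [max(c[i] for c in ligand_coords) + radius for i in range(3)]
    axes = [
        [lo + k * spacing for k in range(int(math.floor((hi - lo) / spacing)) + 1)]
        for lo, hi in zip(mins, maxs)
    ]
    points = []
    for p in itertools.product(*axes):
        # Keep the point if it is close enough to at least one ligand atom.
        if any(sum((a - b) ** 2 for a, b in zip(p, atom)) <= r2 for atom in ligand_coords):
            points.append(p)
    return points
```

Under this reading, a larger --ligand_proximity_radius widens the shell of candidate points around the ligand.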

Method-specific options (--atom_fraction, --cutoff, --top_k_per_type, --hbond_model_path, --random_num_atoms, etc.) are listed in Methods.

Example

python sample_pocket.py \
    --receptor data/5NGZ/5ngz_A_rec.pdb \
    --ligand data/5NGZ/5ngz_A_rec_5ngz_2bg_lig_tt_min_0.sdf \
    --device cuda:0 \
    --ref_atoms_method hbond_predictor \
    --pocket_detection ligand_proximity \
    --ligand_proximity_radius 5.0
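
When scripting many runs, the same invocation can be assembled from Python. The sketch below is a hypothetical wrapper (`build_command` is not part of the repository). It uses only flags documented above and encodes the documented rule that --ligand is needed for score_based and for the ligand_coords, ligand_proximity, and kvfinder_with_ligand modes. It does not try to predict which mode auto will pick.

```python
# Cavity modes documented above as requiring --ligand.
LIGAND_MODES = {"ligand_coords", "ligand_proximity", "kvfinder_with_ligand"}


def build_command(receptor, ligand=None, ref_atoms_method="hbond_predictor",
                  pocket_detection="ligand_coords", device="cuda:0", extra=()):
    """Assemble an argv list for sample_pocket.py (hypothetical helper)."""
    needs_ligand = ref_atoms_method == "score_based" or pocket_detection in LIGAND_MODES
    if needs_ligand and ligand is None:
        raise ValueError("--ligand is required for this method/mode combination")
    cmd = ["python", "sample_pocket.py",
           "--receptor", str(receptor),
           "--device", device,
           "--ref_atoms_method", ref_atoms_method,
           "--pocket_detection", pocket_detection]
    if ligand is not None:
        cmd += ["--ligand", str(ligand)]
    cmd += list(extra)
    return cmd


cmd = build_command(
    "data/5NGZ/5ngz_A_rec.pdb",
    ligand="data/5NGZ/5ngz_A_rec_5ngz_2bg_lig_tt_min_0.sdf",
    pocket_detection="ligand_proximity",
    extra=["--ligand_proximity_radius", "5.0"],
)
print(" ".join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
# (run from inside the PocketHotspot directory)
```

Failing early on a missing ligand avoids starting a GPU job that would stop at argument validation.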

Sampling with Datasets

Use sample_dataset.py to generate molecules for multiple pockets from CrossDocked datasets:

cd PocketHotspot
python sample_dataset.py \
    --config ../MolSnapper/configs/sample/sample_MolDiff.yml \
    --dataset_dir ./data/crossdocked \
    --device cuda:0 \
    --ref_atoms_method pharmacophore_locator \
    --batch_size 8 \
    --clash_rate 0.1

Key Arguments

  • --config: Path to MolSnapper config file (default: ../MolSnapper/configs/sample/sample_MolDiff.yml or ./configs/sample/sample_MolDiff.yml)
  • --dataset_dir: Directory with CrossDocked dataset files (default: ./data/crossdocked)
  • --outdir: Output directory (default: ./outputs)
  • --device: Device (cuda:0 or cpu; default: cuda:0)
  • --batch_size: Batch size for generation (0 = use config default; default: 0)
  • --clash_rate: Clash rate for pipeline (default: 0.1)
  • --ref_atoms_method: Method to select reference atoms (default: pharmacophore_locator). Same choices as sample_pocket.py
  • --pocket_detection: Cavity detection mode (default: ligand_proximity). Same choices as sample_pocket.py
  • --ligand_proximity_radius: Radius (Å) for ligand_proximity mode (default: 4.0)

Method-specific options (--atom_fraction, --cutoff, --top_k_per_type, --hbond_model_path, --cavity_max_dist, --random_*) are listed in Methods.

Methods

Initial Atom Selection Methods

  1. pharmacophore_locator: Selects atoms based on pharmacophore features
    • --cutoff: Maximum distance within which contributions are summed (default: 6.0)
    • --top_k_per_type: Maximum number of pharmacophores per type (default: 3)

  2. score_based: Selects the atoms with the best affinity scores
    • Requires the --ligand argument
    • --atom_fraction: Fraction of best-scoring atoms to select (default: 0.2)

  3. hbond_predictor: Uses a trained EGNN model to predict H-bond sites
    • --hbond_model_path: Path to trained EGNN model (default: trained_hbond_predictor/best_model.pt)
    • --cavity_max_dist: Maximum distance to cavity in Å (default: 4.0; sample_dataset.py only)

  4. random: Random atom placement
    • --random_num_atoms: Number of atoms (default: 5)
    • --random_min_distance: Minimum distance between atoms in Å (default: 1.5)
    • --random_element_type: Element type (O, N, C, or random)
    • --random_seed: Seed for reproducibility (optional)
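
To show how the options of the random method interact, here is a minimal rejection-sampling sketch. It is an assumption about the general technique, not the repository's implementation: the function name, the `max_tries` cap, and the idea of drawing from a supplied list of candidate points (for example, cavity points) are all hypothetical.

```python
import math
import random


def place_random_atoms(points, num_atoms=5, min_distance=1.5,
                       element_type="random", seed=None, max_tries=10000):
    """Pick up to `num_atoms` points that are pairwise at least `min_distance` Å apart."""
    rng = random.Random(seed)  # a fixed seed makes the placement reproducible
    elements = ["O", "N", "C"]
    chosen = []
    for _ in range(max_tries):
        if len(chosen) == num_atoms:
            break
        candidate = rng.choice(points)
        # Reject candidates that sit too close to an already placed atom.
        if all(math.dist(candidate, pos) >= min_distance for _, pos in chosen):
            element = rng.choice(elements) if element_type == "random" else element_type
            chosen.append((element, candidate))
    return chosen
```

The function returns a list of (element, coordinates) pairs, and may return fewer than `num_atoms` when the candidate points are too crowded to satisfy the distance constraint.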

Output

Each run writes the following to the output directory:

  • SMILES.txt: SMILES strings of generated molecules
  • samples_all.pt: PyTorch file with all generated molecules
  • *_SDF/: Directory with SDF files of generated molecules
  • reference_atoms_initial.sdf: Initial reference atoms used
  • pocket.pdb: Detected/generated pocket structure
  • cavity_points.pdb: Cavity points (if detected)
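
For a quick look at a finished run, the SMILES file can be read with plain Python. This sketch assumes SMILES.txt holds one SMILES string per line, which the list above does not state explicitly; the helper names are hypothetical.

```python
from pathlib import Path


def read_smiles(outdir):
    """Return the non-empty lines of SMILES.txt (assumed: one SMILES per line)."""
    text = (Path(outdir) / "SMILES.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize(smiles):
    """Count total and distinct strings (plain string comparison, not canonicalized)."""
    return {"total": len(smiles), "unique": len(set(smiles))}
```

Because the comparison is on raw strings, two different SMILES spellings of the same molecule are counted as distinct; canonicalizing them first (for example with RDKit) would give a stricter uniqueness count.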

About

Structural Hotspot Conditioning for Diffusion-Based Molecular Design
