MCPB-GPU4PySCF

A GPU-accelerated QM/MM parameterization workflow for metalloenzyme active sites. Combines the MCPB (Metal Center Parameter Builder) methodology with GPU4PySCF to replace the traditional Gaussian QM backend — delivering the same rigorous parameterization on modern NVIDIA hardware.

Overview

Accurate force-field parameterization of metalloenzyme active sites requires high-quality QM data — optimised geometry, vibrational force constants, and electrostatic potential (ESP) charges — that is prohibitively slow for large models (~150 atoms) using conventional CPU-based codes such as Gaussian.

This project replaces the Gaussian QM backend with PySCF / GPU4PySCF, enabling:

DF-DFT (density-fitting DFT) with analytical gradients and Hessians on GPU
Full two-stage RESP fitting within PySCF (no external resp binary required)
Seminario bond/angle force constants directly from the GPU Hessian matrix
An end-to-end pipeline from raw PDB to tleap-ready topology

Step descriptions

Step	Engine	Inputs	Key Outputs
1 — Model building	pyMSMT	Raw PDB	Small/large `.com`, `.in`, mol2 templates
2a — Geometry opt	GPU4PySCF	Small model	Optimised `.xyz`, `.json`
2b — Hessian	GPU4PySCF	Optimised geometry	Force constants `.npy`, `.fchk`
2c — RESP	GPU4PySCF	Large model	Charges `.npy`, `.txt`
3 — Frcmod	Seminario	Hessian `.npy`	`_mcpbpy.frcmod`
4 — Leap prep	pyMSMT	RESP mol2 + PDB	`_mcpbpy.pdb`, `_tleap.in`

Features

Full GPU acceleration — DF-DFT SCF, analytical gradients, and Hessians via GPU4PySCF
Multi-GPU Hessian — built-in patches (lib/patch.py) fix upstream single-GPU bottlenecks and OOM issues that occur with ≥3 GPUs (see Multi-GPU Hessian Patch)
Two-stage RESP fitting — stage-1 free fit; stage-2 with pyMSMT-style backbone charge constraints (chgmod) and CH₂/CH₃ equivalence constraints, entirely within gpu4pyscf.pop.esp
Seminario frcmod — metal-site bond/angle force constants from the GPU4PySCF Hessian (no Gaussian .fchk required)
Open-shell support — automatically selects UKS for radicals / high-spin metal centres
Dispersion correction — D3BJ / D3 / D4 / D3Zero
JSON parameter interface — all settings via a single JSON file; no code edits needed
tleap-ready output — set head/tail connect-atom definitions auto-inserted; redundant backbone bond commands removed
Optional fchk output — Gaussian-compatible .fchk files for Multiwfn, NBO, etc.

Requirements

Package	Version	Notes
Python	≥ 3.11
`pyscf`	2.12.1
`gpu4pyscf`	1.7.0	CUDA toolkit required
`geometric`	1.1	Geometry optimisation driver
`numpy`	≥ 2.0
`openmm`	8.4.0	PDB preparation
`pdbfixer`	1.12.0	PDB repair & protonation
`pyMSMT`	22.0	MCPB model building (steps 1, 3, 4)
AmberTools (`tleap`)	≥ 22	Final topology generation

Installation

conda create -n gpu4pyscf python=3.11
conda activate gpu4pyscf

# Core stack
conda install -c conda-forge numpy openmm pdbfixer pymsmt ambertools

# PySCF
pip install pyscf geometric

# GPU4PySCF  (choose your CUDA version)
pip install gpu4pyscf-cuda12x   # CUDA 12.x
# pip install gpu4pyscf-cuda11x # CUDA 11.x

Clone this repository into any working directory — no installation step required:

git clone <repo-url> mcpb_run
cd mcpb_run

Quick Start

Full pipeline (steps 1 → 4)

python main.py input.pdb output_H.pdb \
    --metal-charge ZN:2 \
    --step1 --step2 --step3 --step4 \
    --json scf_params.json

Add --gpu-json gpu.json to any command that runs a Hessian (step 2) to override the GPU memory-tuning defaults — see GPU Memory Tuning.

Standalone steps

Run each step independently from the directory containing the .in file:

# Step 2 — QM calculations (opt + Hessian + RESP)
python main.py --step2 --json scf_params.json

# Step 3 — Seminario frcmod
python main.py --step3 --json scf_params.json

# Step 4 — mol2 files, _mcpbpy.pdb, _tleap.in
python main.py --step4 --json scf_params.json

Running tleap

tleap -s -f {gname}_tleap.in

SCF Parameters (JSON)

All settings are controlled by a single JSON file passed with --json. Copy and edit scf_params.json:

{
    "basis":                "def2-svp",
    "auxbasis":             "def2-svp-ri",
    "xc":                   "b3lyp",

    "device":               "auto",
    "cuda_visible_devices": "0,1,2",

    "max_cycle":            100,
    "conv_tol":             1e-7,
    "grid_level":           3,
    "damp":                 0.2,
    "level_shift":          0.2,

    "maxsteps":             30,
    "hessian":              true,
    "hessian_device":       "gpu",
    "hessian_patch":        true,
    "cphf_grid_level":      null,

    "output_dir":           "outputs",
    "disp":                 "",
    "verbose":              4,
    "write_fchk":           true,

    "resp_a1":              5e-4,
    "resp_a2":              1e-3,
    "resp_b":               0.1,
    "hfree":                true,
    "chgmod":               1,

    "scalef":               1.0,
    "bond_avg":             true,
    "angle_avg":            true,

    "paraset":              "12_6"
}

Parameter Reference

SCF / Geometry optimisation

Key	Type	Default	Description
`basis`	string	`"def2-svp"`	Orbital basis set
`auxbasis`	string	`"def2-svp-ri"`	Auxiliary basis for density fitting
`xc`	string	`"b3lyp"`	XC functional
`device`	string	`"auto"`	`"auto"` (GPU → CPU fallback), `"gpu"`, `"cpu"`
`cuda_visible_devices`	string	`"0"`	GPU index(es) for `CUDA_VISIBLE_DEVICES`
`max_cycle`	int	`100`	Maximum SCF iterations
`conv_tol`	float	`1e-7`	SCF energy convergence threshold (Hartree)
`grid_level`	int	`3`	DFT integration grid level (1–9)
`damp`	float	`0.2`	SCF damping factor
`level_shift`	float	`0.2`	Level-shift value (Hartree)
`maxsteps`	int	`30`	Maximum geometry optimisation steps
`disp`	string\|null	`""`	Dispersion: `"d3bj"`, `"d3"`, `"d4"`, `"d3zero"`, or `""`
`verbose`	int	`4`	PySCF verbosity (0=silent … 5=debug)

Hessian

Key	Type	Default	Description
`hessian`	bool	`true`	Compute Hessian after optimisation
`hessian_device`	string	`"same"`	Device for Hessian SCF: `"same"` as opt, or `"cpu"`
`hessian_patch`	bool	`true`	Enable multi-GPU Hessian patches (see below)
`cphf_grid_level`	int\|null	`null`	Separate DFT grid level for CPSCF XC response (`null` = same as `grid_level`; `2` ≈ SG-1 for ~20–40 % speedup)

Output

Key	Type	Default	Description
`output_dir`	string	`"outputs"`	Directory for all QM output files
`write_fchk`	bool	`false`	Write Gaussian-compatible `.fchk` files

RESP charge fitting

Key	Type	Default	Description
`resp_a1`	float	`5e-4`	Stage-1 restraint strength
`resp_a2`	float	`1e-3`	Stage-2 restraint strength
`resp_b`	float	`0.1`	Hyperbolic restraint width
`hfree`	bool	`true`	Exclude H atoms from RESP restraints
`chgmod`	int	`1`	Backbone charge constraint level: `0`=free; `1`=fix CA/N/C/O; `2`=+H/HA; `3`=+CB

Frcmod generation (Seminario, step 3)

Key	Type	Default	Description
`scalef`	float	`1.0`	Frequency scale factor (force constants scaled by `scalef²`)
`bond_avg`	bool	`true`	Average both off-diagonal Hessian blocks per bond
`angle_avg`	bool	`true`	4-way block average for angle force constants

Leap preparation (step 4)

Key	Type	Default	Description
`paraset`	string	`"12_6"`	Ion non-bonded parameter set: `"12_6"`, `"12_6_4"`, `"HFE"`, `"CM"`, `"IOD"`

GPU Memory Tuning (`gpu.json`)

The Hessian pipeline sizes every large GPU buffer at run time from a small set of memory-tuning knobs. These default to values that work well on the test hardware, but you can override any of them without editing the source by passing a JSON file with --gpu-json:

python main.py --step2 --json scf_params.json --gpu-json gpu.json

The file is optional — omit --gpu-json to use the built-in defaults. Only the keys you want to change need to appear; any key starting with _ is treated as a comment and ignored, and an unknown key raises an error. All values are read when the Hessian kernel runs, so a single load in main() applies to the whole job.

Accuracy is unaffected. These parameters only change how tensors are blocked and distributed across GPUs — never the arithmetic. Lower a *_frac / *_factor (or raise cphf_est_cycles / make_h1_avail_div) to use less GPU memory at the cost of speed; raise them to go faster when VRAM is ample.

Example `gpu.json`

{
    "_comment": "GPU memory-tuning parameters. Pass with --gpu-json gpu.json.",

    "cphf_subspace_frac":        0.50,
    "cphf_est_cycles":           20,

    "make_h1_pinned_frac":       0.40,
    "make_h1_avail_div":         4,
    "make_h1_stream_tile_frac":  0.08,

    "split_min_frac":            0.55,
    "hk_ao_aux_budget_frac":     0.30,
    "hk_ip1_ip1_budget_frac":    0.35,
    "aux_aux_budget_frac":       0.20,

    "df_blksize_factor":         0.05
}

Parameter Reference

CPHF / Krylov subspace (`solve_mo1`)

Key	Type	Default	Description
`cphf_subspace_frac`	float	`0.50`	Fraction of device-0 free VRAM the Krylov subspace (`xs`/`ax`) may occupy. Lower if CPHF OOMs on GPU 0.
`cphf_est_cycles`	int	`20`	Assumed Krylov iteration count used to size the per-atom subspace cost. Raise to shrink the atom block (less memory, more batches).

`make_h1` / `_get_jk_ip`

Key	Type	Default	Description
`make_h1_pinned_frac`	float	`0.40`	If `wk_Pl_` ≥ this fraction of the smallest GPU's free memory, build it in pinned host memory instead of on-GPU. Lower to spill to host sooner.
`make_h1_avail_div`	int	`4`	Divisor applied to `get_avail_mem` inside native `make_h1` (smaller blocks). Raise for a smaller footprint.
`make_h1_stream_tile_frac`	float	`0.08`	Fraction of free VRAM used to size the streamed `wk0_10_P__` aux tile. Lower if `make_h1` OOMs.

Multi-GPU `partial_hess_elec`

Key	Type	Default	Description
`split_min_frac`	float	`0.55`	Minimum share (relative to an even split) each GPU is guaranteed, so no device is starved to zero rows.
`hk_ao_aux_budget_frac`	float	`0.30`	`BlockPlan` budget fraction for the ao-aux contraction phase.
`hk_ip1_ip1_budget_frac`	float	`0.35`	`BlockPlan` budget fraction for the fused ip1/ip1 phase.
`aux_aux_budget_frac`	float	`0.20`	`BlockPlan` budget fraction for the aux-aux (`int2c_ip`) phase.

DF JK loop

Key	Type	Default	Description
`df_blksize_factor`	float	`0.05`	Fraction of free VRAM used to size the DF JK-loop block. Lower for a smaller footprint.

If a phase still OOMs at runtime, solve_mo1 additionally auto-halves its atom block and retries, so these knobs mainly trade peak memory for fewer/larger (faster) batches rather than being strictly required for correctness.

Multi-GPU Hessian Patch

lib/patch.py is loaded automatically when "hessian_patch": true (the default). It fixes three upstream issues in GPU4PySCF 1.7 that cause OOM crashes or sub-optimal utilisation when running Hessian calculations with multiple GPUs.

Set "hessian_patch": false to bypass all patches and use the unmodified GPU4PySCF path — recommended only when VRAM per GPU is large enough to hold the full intermediate tensors (typically A100/H100 80 GB cards).

Fix 1 — `partial_hess_elec` OOM (`_hk_ip1_ip1`)

Root cause: Inside _partial_hess_ejk, the function _hk_ip1_ip1 computes a tensor contraction 'pikx,pkiy->ikxy' of shape [blksize × nao × nao × 3]. CuPy's einsum implementation reshapes this into a batch matrix multiply of size [nao², blksize, 3], creating a temporary array of equal size. The original blksize formula uses a 0.4 memory factor that does not account for this temporary — the actual peak usage is 2 × blksize × nao² × 3 × 8 bytes, all on GPU 0.

Example: nao=1200, 20 GB card → 23 GB required on GPU 0 alone → OOM.

Fix: _hk_ip1_ip1_multigpu replaces the function for the duration of kernel(). It splits the aux-shell (nnz) axis across all visible GPUs via ThreadPoolExecutor and uses a conservative 0.1 blksize factor. The results are reduced onto GPU 0. Peak memory per GPU drops to ~2.2 GB; wall-time is ~2–3× faster than the original single-GPU path.

Fix 2 — `make_h1` single-GPU OOM (`get_int3c2e_wjk`)

Root cause: make_h1 calls _get_jk_ip → get_int3c2e_wjk, which allocates wk_Pl_ of shape [naux × nao × nocc] on GPU 0 alone when it fits within 40% of GPU 0's free VRAM, then solve_j2c allocates a second equally-sized tensor on the same device. The other GPUs are not considered.

Fix: When wk_Pl_ exceeds 40% of the minimum free VRAM across all GPUs, the patch forces the built-in pinned-CPU-memory path. The downstream get_int3c2e_ip1_vjk and get_int3c2e_ip2_vjk functions already distribute work across GPUs via ThreadPoolExecutor and transparently accept CPU/pinned input.

Fix 3 — Redundant XC-kernel recomputation in CPSCF (`solve_mo1`)

Root cause: Inside the CPSCF Krylov loop, get_veff_resp_mo calls ni.cache_xc_kernel on every iteration to compute the ground-state XC kernel (ρ₀, vxc, fxc). This kernel depends only on the converged SCF density, which does not change during CPSCF — yet the original code recomputes it every step. For a large system (300 000 DFT grid points) this wastes 5–15 s per Krylov iteration, or 250–750 s over 50 iterations.

Fix: ni.cache_xc_kernel is shadowed with a memoised wrapper for the lifetime of kernel(). The first call computes and stores (ρ₀, vxc, fxc); all subsequent calls return the cached result instantly. The wrapper is removed unconditionally in the finally block, leaving the NumInt object unmodified after the Hessian.

Numerical accuracy: all three fixes are exact — they perform identical floating-point operations and produce Hessian values within floating-point rounding error (~10⁻¹⁵) of the original code.

Output Files

All QM outputs are written to output_dir (default: outputs/). Step 3/4 outputs are written to the working directory.

Step 2 outputs

File	Description
`{gname}_small_opt.xyz`	Optimised small-model geometry
`{gname}_small_opt.json`	Optimisation metadata (energy, convergence, timings)
`{gname}_small_fc.hessian.npy`	Mass-unweighted Hessian (3N × 3N), Hartree/Bohr²
`{gname}_small_fc.hessian.txt`	Hessian (plain text)
`{gname}_small_fc.json`	Hessian metadata
`{gname}_small_fc.fchk`	Gaussian fchk for small model (if `write_fchk: true`)
`{gname}_large_mk.resp_charges.npy`	Final RESP charges (natm,), electrons
`{gname}_large_mk.resp_charges.txt`	RESP charges plain text (stage-1 and final)
`{gname}_large_mk.json`	RESP metadata
`{gname}_large_mk.fchk`	Gaussian fchk for large model (if `write_fchk: true`)

Step 3 outputs

File	Description
`{gname}_mcpbpy_pre.frcmod`	Pre-frcmod with mass/VDW/standard parameters
`{gname}_mcpbpy.frcmod`	Final frcmod with Seminario bond/angle force constants
`{gname}_step3.json`	Step-3 metadata

Step 4 outputs

File	Description
`{mcresname}.mol2`	One mol2 per metal-site residue (with RESP charges)
`{gname}_mcpbpy.pdb`	PDB with MCPB-style residue names for tleap
`{gname}_tleap.in`	tleap input (patched `set head/tail`, cleaned backbone bonds)
`{gname}_step4.json`	Step-4 metadata

MCPB Input File (`.in`)

The .in file is generated automatically in step 1, or can be written manually. Key fields:

group_name      1okl
ion_ids         260
ion_mol2files   ZN.mol2
original_pdb    1okl_H.pdb
force_field     ff19SB
water_model     opc
cut_off         2.8

The standalone --step2, --step3, and --step4 modes auto-detect this file in the current working directory — no extra arguments required.

Plugin: Gaussian External Interface

plugin/external_pyscf.py is a standalone Gaussian External interface that lets you call PySCF / GPU4PySCF as a drop-in QM engine from within a Gaussian job (geometry optimisations, IRC, frequency, etc.) via the External= keyword.

# Gaussian input example
#External="python /path/to/external_pyscf.py --xc b3lyp --basis def2-svp --gpu 0 --disp d3bj --ri"

This plugin is independent of the MCPB workflow and can be used separately.

License

Released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 200 Commits
.vscode		.vscode
claude		claude
example		example
lib		lib
mcpb		mcpb
pic		pic
plugin		plugin
process		process
tests		tests
utils		utils
.gitignore		.gitignore
CHANGELOG		CHANGELOG
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
gpu.json		gpu.json
main.py		main.py
mem.py		mem.py
requirements.txt		requirements.txt
scf_params.json		scf_params.json

Folders and files

Latest commit

History

Repository files navigation

MCPB-GPU4PySCF

Table of Contents

Overview

Step descriptions

Features

Requirements

Installation

Quick Start

Full pipeline (steps 1 → 4)

Standalone steps

Running tleap

SCF Parameters (JSON)

Parameter Reference

SCF / Geometry optimisation

Hessian

Output

RESP charge fitting

Frcmod generation (Seminario, step 3)

Leap preparation (step 4)

GPU Memory Tuning (gpu.json)

Example gpu.json

Parameter Reference

CPHF / Krylov subspace (solve_mo1)

make_h1 / _get_jk_ip

Multi-GPU partial_hess_elec

DF JK loop

Multi-GPU Hessian Patch

Fix 1 — partial_hess_elec OOM (_hk_ip1_ip1)

Fix 2 — make_h1 single-GPU OOM (get_int3c2e_wjk)

Fix 3 — Redundant XC-kernel recomputation in CPSCF (solve_mo1)

Output Files

Step 2 outputs

Step 3 outputs

Step 4 outputs

MCPB Input File (.in)

Plugin: Gaussian External Interface

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

GPU Memory Tuning (`gpu.json`)

Example `gpu.json`

CPHF / Krylov subspace (`solve_mo1`)

`make_h1` / `_get_jk_ip`

Multi-GPU `partial_hess_elec`

Fix 1 — `partial_hess_elec` OOM (`_hk_ip1_ip1`)

Fix 2 — `make_h1` single-GPU OOM (`get_int3c2e_wjk`)

Fix 3 — Redundant XC-kernel recomputation in CPSCF (`solve_mo1`)

MCPB Input File (`.in`)

Packages