EricBoittier/mmml_tutorial


MMML Tutorial: Cyanobenzene (CYBZ)

Step-by-step tutorial for building and simulating cyanobenzene using MMML's hybrid MM/ML workflow.

Prerequisites: MMML installed (uv sync --extra all or make micromamba-create-full), CHARMM set up for make_res/make_box.


01 – Generating a molecule

Create the residue structure and pack it into a simulation box.

make_res

Generates PDB, PSF, and topology for a single residue using PyCHARMM/CGENFF:

# CLI (like pyscf-dft)
mmml make-res --res CYBZ
mmml make-res --res CYBZ --skip-energy-show   # skip energy.show() on clusters

# Or run the example scripts (from project root):
bash examples/mmml_tutorial/cli/01_make_res_cli.sh
uv run python examples/mmml_tutorial/programmatic/01_make_res_programmatic.py

Output: pdb/initial.pdb, psf/initial.psf, xyz/initial.xyz, CHARMM topology files (in cli/ or programmatic/ respectively).

make_box

Packs molecules into a periodic box (vacuum or solvated).

Here we pack two CYBZ molecules (a dimer) into the box.

# CLI
mmml make-box --res CYBZ --n 2 --side_length 25.0
mmml make-box --res CYBZ --n 2 --side_length 25.0 --solvent TIP3 --density 1.0

# Or run the example scripts (from project root):
bash examples/mmml_tutorial/cli/02_make_box_cli.sh
uv run python examples/mmml_tutorial/programmatic/02_make_box_programmatic.py

Output: pdb/init-packmol.pdb (or pdb/init-TIP3box.pdb if solvated) in cli/ or programmatic/ respectively.
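
To get a feel for what --side_length and --density imply together, the number of solvent molecules needed for a target mass density can be estimated by hand. The sketch below is not taken from MMML: it assumes the side length is in Å and the density in g/cm³, and it ignores the volume taken up by the solute. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_for_density(side_length_A, density_g_cm3, molar_mass_g_mol):
    """Estimate how many molecules fill a cubic box at a given mass density.

    Assumes side length in Angstrom and density in g/cm^3 (an assumption,
    not something the MMML CLI documents here).
    """
    volume_cm3 = (side_length_A * 1e-8) ** 3  # 1 Angstrom = 1e-8 cm
    moles = density_g_cm3 * volume_cm3 / molar_mass_g_mol
    return moles * AVOGADRO


# Water (18.015 g/mol) in a 25 Angstrom cube at 1.0 g/cm^3
n_water = molecules_for_density(25.0, 1.0, 18.015)
print(f"about {n_water:.0f} water molecules")
```

For the 25 Å box used in this tutorial, this gives a little over 500 water molecules, which is a useful sanity check on the size of the solvated PDB.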

run-pycharmm (pure CHARMM)

Run CHARMM heating and equilibration only (no ML). Uses the box from step 02:

# CLI (from cli/ directory after 01–02)
mmml run-pycharmm --pdbfile pdb/init-packmol.pdb --cell 25.0

# Or run the example script (from project root):
bash examples/mmml_tutorial/cli/11_run_pycharmm_cli.sh

Output: heat.pdb, equi.pdb, restart files (heat.res, equi.res), and trajectories (heat.dcd, equi.dcd).


02 – Calculating energy, forces, ESPs

QM/DFT (GPU)

# CLI example
mmml pyscf-dft --mol pdb/initial.pdb --energy
mmml pyscf-mp2 --mol "O 0 0 0; H 0.96 0 0; H -0.24 0.93 0" --energy --gradient

# Or run the example scripts (from project root):
# 03: DFT energy
bash examples/mmml_tutorial/cli/03_pyscf_dft_cli.sh
uv run python examples/mmml_tutorial/programmatic/03_pyscf_dft_programmatic.py

# 04: DFT full (energy, gradient, hessian, harmonic, thermo)
bash examples/mmml_tutorial/cli/04_pyscf_dft_cli_full.sh
uv run python examples/mmml_tutorial/programmatic/04_pyscf_dft_programmatic.py

# 05: MP2 (post-HF)
bash examples/mmml_tutorial/cli/05_pyscf_mp2_cli.sh
uv run python examples/mmml_tutorial/programmatic/05_pyscf_mp2_programmatic.py

# 06: Normal mode sampling (from step 04 harmonic output)
bash examples/mmml_tutorial/cli/06_normal_mode_sample_cli.sh
uv run python examples/mmml_tutorial/programmatic/06_normal_mode_sample_programmatic.py

# 07: Evaluate sampled geometries (E, F, D, ESP)
bash examples/mmml_tutorial/cli/07_pyscf_evaluate_cli.sh
uv run python examples/mmml_tutorial/programmatic/07_pyscf_evaluate_programmatic.py

# 08: Fix units and create train/valid/test splits
bash examples/mmml_tutorial/cli/08_fix_and_split_cli.sh

# 09: Train PhysNet (energies, forces, dipoles)
bash examples/mmml_tutorial/cli/09_physnet_train_cli.sh

# 10: Train PhysNet+DCMNet (joint, with ESP)
bash examples/mmml_tutorial/cli/10_physnet_dcmnet_train_cli.sh

See examples/pyscf4gpu/README.md for full GPU-accelerated DFT/MP2 docs.

Full training workflow (1000 structures)

For a larger dataset suitable for training, run:

# Generate 1000 structures (uses combination modes when needed)
mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 1000

# Evaluate with pyscf-dft (E, F, D, ESP)
mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp

# Or run the full pipeline script:
bash examples/mmml_tutorial/cli/run_full_training.sh

Normal mode sampling

After running step 04 (DFT full with harmonic), sample geometries along vibrational modes for downstream QM/ML:

mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 10
mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --include-equilibrium

Output: NPZ with R (n_samples, n_atoms, 3), Z, N.
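
The idea behind sampling along vibrational modes can be sketched with NumPy. This is a minimal illustration, not MMML's algorithm: it displaces an equilibrium geometry by random linear combinations of mode vectors, with coefficients bounded by the amplitude. The function name, the toy mode vectors, and the shapes chosen for Z and N are assumptions; only the shape of R comes from the text above.

```python
import numpy as np


def sample_along_modes(R0, modes, amplitude, n_samples, seed=0):
    """Displace R0 (n_atoms, 3) along modes (n_modes, n_atoms, 3).

    Each sample uses coefficients drawn uniformly from
    [-amplitude, amplitude], one per mode (a hypothetical scheme).
    """
    rng = np.random.default_rng(seed)
    coeffs = rng.uniform(-amplitude, amplitude, size=(n_samples, modes.shape[0]))
    # (n_samples, n_modes) x (n_modes, n_atoms, 3) -> (n_samples, n_atoms, 3)
    displacements = np.einsum("sm,mac->sac", coeffs, modes)
    return R0[None, :, :] + displacements


# Toy water geometry with two made-up displacement patterns (not real normal modes)
Z = np.array([8, 1, 1])
R0 = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
modes = np.zeros((2, 3, 3))
modes[0, 1, 0] = 1.0  # move the first H along x
modes[1, 2, 1] = 1.0  # move the second H along y

R = sample_along_modes(R0, modes, amplitude=0.1, n_samples=10)
dataset = {"R": R, "Z": np.tile(Z, (10, 1)), "N": np.full(10, len(Z))}
print(dataset["R"].shape)
```

In a real run the mode vectors would come from the harmonic analysis in step 04 rather than being written by hand.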

Evaluate sampled geometries

Run DFT on each sampled geometry (energy, forces, dipoles, ESP) in one GPU process:

mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp

Output: NPZ with R, Z, N, E, F, Dxyz, esp, esp_grid. Ready for PhysNet/DCMNet training.
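
Before training, it can be worth checking that the evaluated file has the arrays listed above and that they agree on the number of samples. The sketch below uses only NumPy and a synthetic file; the array names come from the text, while the shapes of E, Dxyz, esp, and esp_grid are assumptions, and check_dataset is a hypothetical helper, not part of MMML.

```python
import os
import tempfile

import numpy as np

REQUIRED = ("R", "Z", "N", "E", "F", "Dxyz")
ESP_KEYS = ("esp", "esp_grid")


def check_dataset(path, require_esp=True):
    """Return the number of samples after basic consistency checks."""
    with np.load(path) as data:
        keys = REQUIRED + (ESP_KEYS if require_esp else ())
        missing = [k for k in keys if k not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        n_samples = data["R"].shape[0]
        if data["F"].shape != data["R"].shape:
            raise ValueError("F must have the same shape as R")
        for k in keys:
            if data[k].shape[0] != n_samples:
                raise ValueError(f"{k} has a different number of samples")
    return n_samples


# Synthetic stand-in for out/07_evaluated.npz (shapes are assumed)
rng = np.random.default_rng(0)
n, n_atoms, n_grid = 4, 3, 5
arrays = {
    "R": rng.normal(size=(n, n_atoms, 3)),
    "Z": np.tile([8, 1, 1], (n, 1)),
    "N": np.full(n, n_atoms),
    "E": rng.normal(size=n),
    "F": rng.normal(size=(n, n_atoms, 3)),
    "Dxyz": rng.normal(size=(n, 3)),
    "esp": rng.normal(size=(n, n_grid)),
    "esp_grid": rng.normal(size=(n, n_grid, 3)),
}
path = os.path.join(tempfile.mkdtemp(), "07_evaluated.npz")
np.savez(path, **arrays)
n_checked = check_dataset(path)
print(n_checked)
```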


03 – PhysNet

Train a PhysNetJAX model on energies, forces, and dipoles.

Prepare data (fix-and-split)

Converts units (Hartree → eV, etc.) and creates train/valid/test splits. If 07_evaluated.npz contains esp/esp_grid, no separate grid file is needed:

mmml fix-and-split --efd out/07_evaluated.npz --output-dir out/splits

Output: energies_forces_dipoles_{train,valid,test}.npz, grids_esp_{train,valid,test}.npz.
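
The two operations performed here, unit conversion and splitting, can be sketched as follows. This is an illustration only: it assumes energies in Hartree and forces in Hartree/Bohr on input, and an 80/10/10 random split; the fractions and conventions that fix-and-split actually uses are not stated in this tutorial. The function names are hypothetical.

```python
import numpy as np

HARTREE_TO_EV = 27.211386245988      # CODATA 2018
BOHR_TO_ANGSTROM = 0.529177210903    # CODATA 2018


def convert_units(E_hartree, F_hartree_per_bohr):
    """Convert energies to eV and forces to eV/Angstrom."""
    E_ev = E_hartree * HARTREE_TO_EV
    F_ev_per_A = F_hartree_per_bohr * HARTREE_TO_EV / BOHR_TO_ANGSTROM
    return E_ev, F_ev_per_A


def split_indices(n_samples, frac_train=0.8, frac_valid=0.1, seed=0):
    """Shuffle sample indices and cut them into train/valid/test."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(frac_train * n_samples)
    n_valid = int(frac_valid * n_samples)
    return perm[:n_train], perm[n_train:n_train + n_valid], perm[n_train + n_valid:]


E_ev, F_ev = convert_units(np.array([1.0]), np.ones((1, 3, 3)))
train, valid, test = split_indices(1000)
print(len(train), len(valid), len(test))
```

Splitting by index rather than by copying arrays makes it easy to apply the same partition to the energy/force/dipole file and to the ESP grids, so that both sets of files stay aligned.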

Train PhysNet

bash examples/mmml_tutorial/cli/09_physnet_train_cli.sh

Or manually:

uv run python examples/other/co2/physnet_train/trainer.py \
  --train out/splits/energies_forces_dipoles_train.npz \
  --valid out/splits/energies_forces_dipoles_valid.npz \
  --natoms 16 --epochs 50 --batch-size 1 --name cybz_physnet --ckpt-dir out/ckpts

Checkpoints: out/ckpts/cybz_physnet/.

Optional: SO(3) rotational augmentation

Two augmentation controls are available in the batch builders and in EF CLI training:

  • --rot-augment: enable random rotation augmentation
  • --rot-perturbation: strength in [0, 1] (1.0 = full random rotations, 0.0 = identity)

Example (EF CLI):

mmml ef-train \
  --train-npz out/splits_ef_sim/energies_forces_dipoles_train.npz \
  --valid-npz out/splits_ef_sim/energies_forces_dipoles_valid.npz \
  --rot-augment \
  --rot-perturbation 1.0 \
  --output-dir ~/ckpts/ef_run_rotaug

When enabled, positions and vector targets are rotated consistently; scalar targets stay unchanged.
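
What "rotated consistently" means can be sketched with NumPy. The code below is not MMML's implementation: it draws a uniformly random rotation from a random unit quaternion, scales its rotation angle by a strength in [0, 1] (one plausible reading of --rot-perturbation), and applies the same matrix to positions, forces, and the dipole while leaving the energy alone. All function names are hypothetical.

```python
import numpy as np


def random_rotation(rng, strength=1.0):
    """Random 3x3 rotation; strength=0 gives identity, 1 a full random rotation."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)           # uniformly random unit quaternion
    w, v = q[0], q[1:]
    if w < 0:                         # q and -q are the same rotation
        w, v = -w, -v
    s = np.linalg.norm(v)
    axis = v / s if s > 1e-12 else np.array([1.0, 0.0, 0.0])
    angle = strength * 2.0 * np.arctan2(s, w)   # scaled angle in [0, pi]
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' rotation formula
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def augment(R, F, D, E, rng, strength=1.0):
    """Rotate positions and vector targets together; scalars are unchanged."""
    rot = random_rotation(rng, strength)
    return R @ rot.T, F @ rot.T, D @ rot.T, E


rng = np.random.default_rng(0)
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
F = rng.normal(size=(3, 3))
D = np.array([0.3, 0.4, 0.0])
R_rot, F_rot, D_rot, E_rot = augment(R, F, D, -76.4, rng)
print(np.linalg.norm(D), np.linalg.norm(D_rot))
```

Interatomic distances and vector norms are unchanged by the rotation, which is why the energy label can be reused as is.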


04 – DCMNet

Train a DCMNet model for distributed charges and ESPs (electrostatic potential).

PhysNet+DCMNet joint training

Train PhysNet and DCMNet together for end-to-end energy, forces, dipoles, and ESP:

bash examples/mmml_tutorial/cli/10_physnet_dcmnet_train_cli.sh

Or manually:

uv run python -m mmml.cli.misc.train_joint \
  --train-efd out/splits/energies_forces_dipoles_train.npz \
  --train-esp out/splits/grids_esp_train.npz \
  --valid-efd out/splits/energies_forces_dipoles_valid.npz \
  --valid-esp out/splits/grids_esp_valid.npz \
  --use-repo-physnet-params \
  --epochs 50 --batch-size 1 --name cybz_joint --ckpt-dir out/ckpts

Checkpoints: out/ckpts/cybz_joint/.

DCMNet only

cd examples/dcm-net
python train.py model=base training=bootstrap

With PhysNet (joint training)

See examples/other/co2/dcmnet_physnet_train/ for joint PhysNet–DCMNet training.

DCMNet predicts monopoles and dipoles per atom for improved ESP and charge fitting.
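
To illustrate how a set of charges relates to the ESP and dipole targets, the sketch below evaluates the Coulomb potential V(r) = Σ q_i / |r − r_i| and the dipole μ = Σ q_i r_i for point charges in atomic units. It is a simplified stand-in that uses point charges only and is not DCMNet code; the function names and the toy charge positions are assumptions.

```python
import numpy as np


def esp_from_charges(q, r_q, grid):
    """ESP at grid points (n_grid, 3) from charges q (n_q,) at r_q (n_q, 3).

    Atomic units: charges in e, distances in Bohr, potential in Hartree/e.
    """
    dist = np.linalg.norm(grid[:, None, :] - r_q[None, :, :], axis=-1)
    return (q[None, :] / dist).sum(axis=1)


def dipole_from_charges(q, r_q):
    """Dipole vector (e * Bohr) of a set of point charges."""
    return (q[:, None] * r_q).sum(axis=0)


# Two opposite charges on the z axis and one grid point further out
q = np.array([0.5, -0.5])
r_q = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
grid = np.array([[0.0, 0.0, 3.0]])

V = esp_from_charges(q, r_q, grid)   # 0.5/2 - 0.5/4
mu = dipole_from_charges(q, r_q)
print(V, mu)
```

A charge model is fitted by adjusting charge values and positions so that V matches the reference ESP on the grid stored in esp_grid.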

Plot and print checkpoint training metrics

For Orbax checkpoint runs, training metrics can be extracted and plotted directly from the CLI:

mmml extract-checkpoint-metrics /path/to/checkpoint_run_dir -o training_metrics.png --log-loss

This prints a text summary and writes a figure with loss/MAE trends.


05 – DCMNet to fMDCM, kMDCM

Convert trained DCMNet models to MDCM (minimal distributed charge model) formats for use in classical force fields.

  • fMDCM: Flexible MDCM (conformationally dependent charge positions)
  • kMDCM: Kernel-based MDCM variant

(Detailed workflow and scripts to be added.)

06 – MMML

Run simulation

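Run MD or evaluate energy/forces with a trained ML model. The values below are illustrative: set n_monomers and n_atoms_monomer to match the box you built (cyanobenzene, C7H5N, has 13 atoms per molecule). From Python (see examples/general/dimers/sim.py):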

import argparse
from pathlib import Path
from mmml.cli.run.run_sim import run
from mmml.cli.base import resolve_desdimers_checkpoint

args = argparse.Namespace(
    pdbfile=Path("pdb/init-packmol.pdb"),
    checkpoint=resolve_desdimers_checkpoint(),
    n_monomers=50,
    n_atoms_monomer=15,
    cell=25.0,
    nsteps_jaxmd=1000,
    nsteps_ase=100,
    temperature=298.0,
    ensemble="npt",
    # ... see run_sim.parse_args() for full options
)
run(args)
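
A quick check that the monomer counts passed above agree with the PDB file can catch mismatched inputs before a run starts. The sketch below is a hypothetical helper, not part of MMML: it counts ATOM and HETATM records in PDB text and compares the total with n_monomers × n_atoms_monomer, using a toy two-water PDB.

```python
def count_pdb_atoms(pdb_text):
    """Count ATOM and HETATM records in PDB-formatted text."""
    return sum(line.startswith(("ATOM", "HETATM")) for line in pdb_text.splitlines())


def check_system_size(pdb_text, n_monomers, n_atoms_monomer):
    """Raise if the PDB atom count differs from n_monomers * n_atoms_monomer."""
    found = count_pdb_atoms(pdb_text)
    expected = n_monomers * n_atoms_monomer
    if found != expected:
        raise ValueError(f"PDB has {found} atoms, expected {expected}")
    return found


# Toy PDB with two three-atom molecules (coordinates are placeholders)
toy_pdb = """\
ATOM      1  OH2 TIP3    1       0.000   0.000   0.000
ATOM      2  H1  TIP3    1       0.960   0.000   0.000
ATOM      3  H2  TIP3    1      -0.240   0.930   0.000
ATOM      4  OH2 TIP3    2       5.000   0.000   0.000
ATOM      5  H1  TIP3    2       5.960   0.000   0.000
ATOM      6  H2  TIP3    2       4.760   0.930   0.000
END
"""
n_atoms = check_system_size(toy_pdb, n_monomers=2, n_atoms_monomer=3)
print(n_atoms)
```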

md_10mer CLI setups (free-space + periodic)

The generalized MD workflows are exposed in two ways:

  • MMML library CLI wrapper: mmml md-system ...
  • Direct script argparse entrypoints in scripts/

Examples using the MMML CLI wrapper:

# Free-space MD
mmml md-system --setup free_nve --output-dir out/md/free_nve
mmml md-system --setup free_nvt --temperature 300 --output-dir out/md/free_nvt

# Periodic MD
mmml md-system --setup pbc_nve --output-dir out/md/pbc_nve
mmml md-system --setup pbc_nvt --backend jaxmd --temperature 300 --output-dir out/md/pbc_nvt
mmml md-system --setup pbc_npt --temperature 300 --pressure 1.0 --output-dir out/md/pbc_npt
mmml md-system --setup pbc_npt --n-molecules 100 --box-size 60.0 --seed 7 --output-dir out/md/pbc_npt_60A

# Mixed composition (methanol:water = 1:1, TIP3 water)
mmml md-system --setup pbc_nvt --backend jaxmd --composition MEOH:5,TIP3:5 --temperature 300 --output-dir out/md/meoh_tip3_1to1

# Long runs: split trajectory into multiple files (e.g., 5000 frames per file)
mmml md-system --setup pbc_nvt --composition MEOH:5,TIP3:5 --temperature 300 --traj-chunk-frames 5000

For long simulations, use --traj-chunk-frames to avoid a single very large trajectory file.
When enabled, output is split as *.part0000.traj, *.part0001.traj, etc.
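
The chunk naming scheme can be sketched as follows, which helps when writing post-processing scripts that need to find every part of a run. The .partNNNN.traj pattern is taken from the text above; the function name, the file stem, and the assumption that the last chunk may be shorter are illustrative only.

```python
def chunk_paths(stem, n_frames, chunk_frames):
    """List the chunk file names for a trajectory of n_frames frames."""
    n_chunks = -(-n_frames // chunk_frames)  # ceiling division
    return [f"{stem}.part{i:04d}.traj" for i in range(n_chunks)]


# 12000 frames at 5000 frames per file -> three files, the last one partial
paths = chunk_paths("md", 12000, 5000)
print(paths)
```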

The periodic tutorial shell scripts default to the JAX-MD backend via MDSYS_BACKEND=jaxmd; set MDSYS_BACKEND=ase to compare against the ASE route. During JAX-MD production, a detected inter-monomer overlap produces a warning and the run continues by default. Pass --extra-args --dynamics-overlap-action error to abort on overlap, or --extra-args --dynamics-overlap-action off to disable the diagnostic.

Equivalent direct argparse script usage:

# ASE-based md_10mer suite (free-space + periodic NVE/NVT)
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only vac_nve
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only vac_nvt_nhc --nvt-temp-K 300
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only pbc_nve
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only pbc_nvt_nhc --nvt-temp-K 300

# JAX-MD periodic setup (NHC thermostat; NPT adds a barostat)
python ~/mmml/scripts/md_10mer_mmml_pbc_suite_jaxmd.py --ensemble nvt --temperature 300
python ~/mmml/scripts/md_10mer_mmml_pbc_suite_jaxmd.py --ensemble npt --temperature 300 --pressure 1.0

Tutorial helper scripts (from mmml_tutorial/cli):

bash 16_md_10mer_free_nve.sh
bash 17_md_10mer_free_nvt.sh
bash 18_md_10mer_pbc_nve.sh
bash 19_md_10mer_pbc_nvt.sh
bash 20_md_10mer_pbc_npt.sh
bash 21_md_system_meoh_tip3_1to1.sh

Test calculator (energy, forces, charges, dipole)

python -m mmml.cli.calculator --checkpoint <path-to-checkpoint> --test-molecule CO2

Reference

| # | Step | CLI | Scripts |
|---|------|-----|---------|
| 01 | make_res | mmml make-res --res CYBZ | cli/01_make_res_cli.sh, programmatic/01_make_res_programmatic.py |
| 02 | make_box | mmml make-box --res CYBZ --n 50 --side_length 25 | cli/02_make_box_cli.sh, programmatic/02_make_box_programmatic.py |
| 11 | run-pycharmm | mmml run-pycharmm --pdbfile pdb/init-packmol.pdb --cell 25.0 | cli/11_run_pycharmm_cli.sh |
| 03 | pyscf-dft | mmml pyscf-dft --mol "..." --energy | cli/03_pyscf_dft_cli.sh, programmatic/03_pyscf_dft_programmatic.py |
| 04 | pyscf-dft full | mmml pyscf-dft --mol xyz/initial.xyz --energy --gradient --hessian --harmonic --thermo | cli/04_pyscf_dft_cli_full.sh, programmatic/04_pyscf_dft_programmatic.py |
| 05 | pyscf-mp2 | mmml pyscf-mp2 --mol "..." --energy --gradient | cli/05_pyscf_mp2_cli.sh, programmatic/05_pyscf_mp2_programmatic.py |
| 06 | normal-mode-sample | mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 10 | cli/06_normal_mode_sample_cli.sh, programmatic/06_normal_mode_sample_programmatic.py |
| 07 | pyscf-evaluate | mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp | cli/07_pyscf_evaluate_cli.sh, programmatic/07_pyscf_evaluate_programmatic.py |
| 08 | fix-and-split | mmml fix-and-split --efd out/07_evaluated.npz --output-dir out/splits | cli/08_fix_and_split_cli.sh |
| 09 | PhysNet train | python examples/other/co2/physnet_train/trainer.py --train ... --valid ... | cli/09_physnet_train_cli.sh |
| 10 | PhysNet+DCMNet | python -m mmml.cli.misc.train_joint --train-efd ... --train-esp ... | cli/10_physnet_dcmnet_train_cli.sh |
| 16 | md-system free NVE | mmml md-system --setup free_nve | cli/16_md_10mer_free_nve.sh |
| 17 | md-system free NVT | mmml md-system --setup free_nvt --temperature 300 | cli/17_md_10mer_free_nvt.sh |
| 18 | md-system pbc NVE | mmml md-system --setup pbc_nve | cli/18_md_10mer_pbc_nve.sh |
| 19 | md-system pbc NVT | mmml md-system --setup pbc_nvt --temperature 300 | cli/19_md_10mer_pbc_nvt.sh |
| 20 | md-system pbc NPT | mmml md-system --setup pbc_npt --temperature 300 --pressure 1.0 | cli/20_md_10mer_pbc_npt.sh |
| 21 | md-system mixed 1:1 | mmml md-system --setup pbc_nvt --composition MEOH:5,TIP3:5 --temperature 300 | cli/21_md_system_meoh_tip3_1to1.sh |
| | Full workflow | bash examples/mmml_tutorial/cli/run_full_training.sh | 1000 structures, split, PhysNet, PhysNet+DCMNet |
| | DCMNet only | | examples/dcm-net/train.py |
