Step-by-step tutorial for building and simulating cyanobenzene using MMML's hybrid MM/ML workflow.
Prerequisites: MMML installed (uv sync --extra all or make micromamba-create-full), CHARMM set up for make_res/make_box.
Create the residue structure and pack it into a simulation box.
Generates PDB, PSF, and topology for a single residue using PyCHARMM/CGENFF:
# CLI (like pyscf-dft)
mmml make-res --res CYBZ
mmml make-res --res CYBZ --skip-energy-show # skip energy.show() on clusters
# Or run the example scripts (from project root):
bash examples/mmml_tutorial/cli/01_make_res_cli.sh
uv run python examples/mmml_tutorial/programmatic/01_make_res_programmatic.py
Output: pdb/initial.pdb, psf/initial.psf, xyz/initial.xyz, CHARMM topology files (in cli/ or programmatic/ respectively).
Packs molecules into a periodic box (vacuum or solvated).
Here we prepare a dimer (two CYBZ molecules, --n 2).
# CLI
mmml make-box --res CYBZ --n 2 --side_length 25.0
mmml make-box --res CYBZ --n 2 --side_length 25.0 --solvent TIP3 --density 1.0
# Or run the example scripts (from project root):
bash examples/mmml_tutorial/cli/02_make_box_cli.sh
uv run python examples/mmml_tutorial/programmatic/02_make_box_programmatic.py
Output: pdb/init-packmol.pdb (or pdb/init-TIP3box.pdb if solvated) in cli/ or programmatic/ respectively.
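To get a feel for how large a solvated box will be, the number of solvent molecules implied by a target density can be estimated from the box volume. The sketch below is not part of MMML. It assumes that --density is given in g/cm³ and --side_length in Å (neither unit is stated above), and it uses the molar mass of water for TIP3. The helper name estimate_molecule_count is hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
ANGSTROM3_TO_CM3 = 1e-24   # 1 Å^3 = 1e-24 cm^3


def estimate_molecule_count(side_length_A, density_g_cm3, molar_mass_g_mol):
    """Estimate how many molecules fill a cubic box at a given mass density.

    Assumption: side length in Å and density in g/cm^3 (units not confirmed
    by the MMML documentation).
    """
    volume_cm3 = side_length_A ** 3 * ANGSTROM3_TO_CM3
    moles = density_g_cm3 * volume_cm3 / molar_mass_g_mol
    return round(moles * AVOGADRO)


if __name__ == "__main__":
    # Water (18.015 g/mol) in the 25 Å box used in this tutorial.
    print(estimate_molecule_count(25.0, 1.0, 18.015))
```

The estimate ignores the volume taken up by the solute, so the count actually packed into the box may be lower.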
Run CHARMM heating and equilibration only (no ML). Uses the box from step 02:
# CLI (from cli/ directory after 01–02)
mmml run-pycharmm --pdbfile pdb/init-packmol.pdb --cell 25.0
# Or run the example script (from project root):
bash examples/mmml_tutorial/cli/11_run_pycharmm_cli.sh
Output: heat.pdb, equi.pdb, restart files (heat.res, equi.res), and trajectories (heat.dcd, equi.dcd).
# CLI example
mmml pyscf-dft --mol pdb/initial.pdb --energy
mmml pyscf-mp2 --mol "O 0 0 0; H 0.96 0 0; H -0.24 0.93 0" --energy --gradient
# Or run the example scripts (from project root):
# 03: DFT energy
bash examples/mmml_tutorial/cli/03_pyscf_dft_cli.sh
uv run python examples/mmml_tutorial/programmatic/03_pyscf_dft_programmatic.py
# 04: DFT full (energy, gradient, hessian, harmonic, thermo)
bash examples/mmml_tutorial/cli/04_pyscf_dft_cli_full.sh
uv run python examples/mmml_tutorial/programmatic/04_pyscf_dft_programmatic.py
# 05: MP2 (post-HF)
bash examples/mmml_tutorial/cli/05_pyscf_mp2_cli.sh
uv run python examples/mmml_tutorial/programmatic/05_pyscf_mp2_programmatic.py
# 06: Normal mode sampling (from step 04 harmonic output)
bash examples/mmml_tutorial/cli/06_normal_mode_sample_cli.sh
uv run python examples/mmml_tutorial/programmatic/06_normal_mode_sample_programmatic.py
# 07: Evaluate sampled geometries (E, F, D, ESP)
bash examples/mmml_tutorial/cli/07_pyscf_evaluate_cli.sh
uv run python examples/mmml_tutorial/programmatic/07_pyscf_evaluate_programmatic.py
# 08: Fix units and create train/valid/test splits
bash examples/mmml_tutorial/cli/08_fix_and_split_cli.sh
# 09: Train PhysNet (energies, forces, dipoles)
bash examples/mmml_tutorial/cli/09_physnet_train_cli.sh
# 10: Train PhysNet+DCMNet (joint, with ESP)
bash examples/mmml_tutorial/cli/10_physnet_dcmnet_train_cli.sh
See examples/pyscf4gpu/README.md for full GPU-accelerated DFT/MP2 docs.
For a larger dataset suitable for training, run:
# Generate 1000 structures (uses combination modes when needed)
mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 1000
# Evaluate with pyscf-dft (E, F, D, ESP)
mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp
# Or run the full pipeline script:
bash examples/mmml_tutorial/cli/run_full_training.sh
After running step 04 (DFT full with harmonic), sample geometries along vibrational modes for downstream QM/ML:
mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 10
mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --include-equilibrium
Output: NPZ with R (n_samples, n_atoms, 3), Z, N.
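The idea behind sampling along vibrational modes can be illustrated with a few lines of NumPy. This is a minimal sketch, not the algorithm used by mmml normal-mode-sample: the function name, the uniform random coefficients, and the way single-mode and combined displacements are ordered are all assumptions made for illustration. It only reproduces the documented output shape (n_samples, n_atoms, 3).

```python
import numpy as np


def sample_along_modes(r0, modes, amplitude, max_samples,
                       include_equilibrium=False, seed=0):
    """Displace an equilibrium geometry along vibrational modes.

    r0:    (n_atoms, 3) equilibrium positions
    modes: (n_modes, n_atoms, 3) displacement vectors
    Returns an array of shape (n_samples, n_atoms, 3), n_samples <= max_samples.
    """
    rng = np.random.default_rng(seed)
    samples = [r0.copy()] if include_equilibrium else []
    # First, single-mode displacements in both directions.
    for mode in modes:
        for sign in (1.0, -1.0):
            samples.append(r0 + sign * amplitude * mode)
    # Then, random linear combinations of modes until max_samples is reached.
    while len(samples) < max_samples:
        coeffs = rng.uniform(-1.0, 1.0, size=len(modes))
        samples.append(r0 + amplitude * np.tensordot(coeffs, modes, axes=1))
    return np.stack(samples[:max_samples])
```

Combining modes is one simple way to generate more structures than there are single-mode displacements, which is why a larger --max-samples needs some form of combination.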
Run DFT on each sampled geometry (energy, forces, dipoles, ESP) in one GPU process:
mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp
Output: NPZ with R, Z, N, E, F, Dxyz, esp, esp_grid. Ready for PhysNet/DCMNet training.
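Before training, it can be worth checking that the evaluated file contains the arrays listed above. The following sketch uses only numpy.load; the key names come from the output description, while the assumption that every array's leading axis indexes samples (and that F has the same shape as R) is not confirmed by the text. check_dataset is a hypothetical helper, not an MMML function.

```python
import numpy as np

REQUIRED_KEYS = ("R", "Z", "N", "E", "F", "Dxyz")
OPTIONAL_KEYS = ("esp", "esp_grid")


def check_dataset(path):
    """Load an NPZ file and return array shapes after basic consistency checks."""
    with np.load(path) as data:
        missing = [k for k in REQUIRED_KEYS if k not in data.files]
        if missing:
            raise KeyError(f"missing arrays: {missing}")
        shapes = {k: data[k].shape for k in data.files}
    n_samples = shapes["R"][0]
    # Assumption: the first axis of every array is the sample index.
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        if key in shapes and shapes[key][0] != n_samples:
            raise ValueError(
                f"{key} has {shapes[key][0]} samples, expected {n_samples}"
            )
    if shapes["F"] != shapes["R"]:
        raise ValueError("F must have the same shape as R")
    return shapes
```

Calling check_dataset("out/07_evaluated.npz") returns a dictionary of shapes that can be printed or logged.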
Train a PhysNetJAX model on energies, forces, and dipoles.
Converts units (Hartree → eV, etc.) and creates train/valid/test splits. If 07_evaluated.npz contains esp/esp_grid, no separate grid file is needed:
mmml fix-and-split --efd out/07_evaluated.npz --output-dir out/splits
Output: energies_forces_dipoles_{train,valid,test}.npz, grids_esp_{train,valid,test}.npz.
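The two operations performed in this step, unit conversion and splitting, can be sketched as follows. This is an illustration rather than the fix-and-split implementation: it assumes forces are stored in Hartree/Bohr and converted to eV/Å, and the 80/10/10 split fractions and function names are hypothetical. The conversion factors are the CODATA 2018 values.

```python
import numpy as np

HARTREE_TO_EV = 27.211386245988     # CODATA 2018
BOHR_TO_ANGSTROM = 0.529177210903   # CODATA 2018


def convert_units(E_hartree, F_hartree_bohr):
    """Convert energies to eV and forces to eV/Å."""
    E_ev = np.asarray(E_hartree) * HARTREE_TO_EV
    F_ev_A = np.asarray(F_hartree_bohr) * HARTREE_TO_EV / BOHR_TO_ANGSTROM
    return E_ev, F_ev_A


def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/valid/test subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(fractions[0] * n_samples)
    n_valid = int(fractions[1] * n_samples)
    # Whatever remains after train and valid goes to the test set.
    return (perm[:n_train],
            perm[n_train:n_train + n_valid],
            perm[n_train + n_valid:])
```

Splitting by index keeps energies, forces, dipoles, and ESP grids aligned, since the same index arrays can be applied to every per-sample array.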
bash examples/mmml_tutorial/cli/09_physnet_train_cli.sh
Or manually:
uv run python examples/other/co2/physnet_train/trainer.py \
--train out/splits/energies_forces_dipoles_train.npz \
--valid out/splits/energies_forces_dipoles_valid.npz \
--natoms 16 --epochs 50 --batch-size 1 --name cybz_physnet --ckpt-dir out/ckpts
Checkpoints: out/ckpts/cybz_physnet/.
New augmentation controls are available in batch builders and EF CLI training:
- --rot-augment: enable random rotation augmentation
- --rot-perturbation: strength in [0, 1] (1.0 = full random rotations, 0.0 = identity)
Example (EF CLI):
mmml ef-train \
--train-npz out/splits_ef_sim/energies_forces_dipoles_train.npz \
--valid-npz out/splits_ef_sim/energies_forces_dipoles_valid.npz \
--rot-augment \
--rot-perturbation 1.0 \
--output-dir ~/ckpts/ef_run_rotaug
When enabled, positions and vector targets are rotated consistently; scalar targets stay unchanged.
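The consistency requirement can be made concrete with a small NumPy sketch: one rotation matrix is drawn per sample and applied to positions, forces, and the dipole, while the energy is passed through untouched. How MMML actually maps --rot-perturbation onto a rotation is not described here; scaling the rotation angle by the strength, as done below, is an assumption, and the function names are hypothetical.

```python
import numpy as np


def random_rotation(rng, strength=1.0):
    """Return a 3x3 rotation matrix; strength in [0, 1] scales the angle."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)            # uniform random unit quaternion (w, x, y, z)
    if q[0] < 0:
        q = -q                        # same rotation, angle now in [0, pi]
    w, v = q[0], q[1:]
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return np.eye(3)
    axis = v / norm_v
    angle = 2.0 * np.arctan2(norm_v, w) * strength
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def augment(positions, forces, dipole, energy, rng, strength=1.0):
    """Rotate vector quantities with one shared matrix; leave scalars as-is."""
    rot = random_rotation(rng, strength)
    # Row vectors: v' = v @ rot.T applies the same rotation to every vector.
    return positions @ rot.T, forces @ rot.T, dipole @ rot.T, energy
```

With strength 0.0 the matrix is the identity, and with 1.0 the unscaled random quaternion is used, which matches the two limits described for --rot-perturbation.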
Train a DCMNet model for distributed charges and ESPs (electrostatic potential).
Train PhysNet and DCMNet together for end-to-end energy, forces, dipoles, and ESP:
bash examples/mmml_tutorial/cli/10_physnet_dcmnet_train_cli.sh
Or manually:
uv run python -m mmml.cli.misc.train_joint \
--train-efd out/splits/energies_forces_dipoles_train.npz \
--train-esp out/splits/grids_esp_train.npz \
--valid-efd out/splits/energies_forces_dipoles_valid.npz \
--valid-esp out/splits/grids_esp_valid.npz \
--use-repo-physnet-params \
--epochs 50 --batch-size 1 --name cybz_joint --ckpt-dir out/ckpts
Checkpoints: out/ckpts/cybz_joint/.
cd examples/dcm-net
python train.py model=base training=bootstrap
See examples/other/co2/dcmnet_physnet_train/ for joint PhysNet–DCMNet training.
DCMNet predicts monopoles and dipoles per atom for improved ESP and charge fitting.
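To show what fitting an ESP with distributed charges involves, the sketch below evaluates the Coulomb potential of a set of point charges on a grid. It covers only the point-charge (monopole) term, works in atomic units, and is not DCMNet code; the function name and array layout are assumptions chosen for illustration.

```python
import numpy as np


def esp_from_charges(charge_positions, charges, grid_points):
    """Coulomb potential of point charges at each grid point (atomic units).

    charge_positions: (n_charges, 3) in Bohr
    charges:          (n_charges,) in units of e
    grid_points:      (n_grid, 3) in Bohr
    Returns an array of shape (n_grid,) in Hartree/e.
    """
    diff = grid_points[:, None, :] - charge_positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # (n_grid, n_charges)
    return (charges[None, :] / dist).sum(axis=1)
```

A charge model is then judged by how closely this potential matches the reference ESP (the esp array) on the same grid points (esp_grid).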
For Orbax checkpoint runs, you can now extract and plot training metrics directly from the CLI:
mmml extract-checkpoint-metrics /path/to/checkpoint_run_dir -o training_metrics.png --log-loss
This prints a text summary and writes a figure with loss/MAE trends.
Convert trained DCMNet models to MDCM (minimal distributed charge model) formats for use in classical force fields.
- fMDCM: Flexible MDCM
- kMDCM: Kernel-based MDCM variant
(Detailed workflow and scripts to be added.)
Run MD or evaluate energy/forces with a trained ML model.
From Python (see examples/general/dimers/sim.py):
import argparse
from pathlib import Path
from mmml.cli.run.run_sim import run
from mmml.cli.base import resolve_desdimers_checkpoint
args = argparse.Namespace(
    pdbfile=Path("pdb/init-packmol.pdb"),
    checkpoint=resolve_desdimers_checkpoint(),
    n_monomers=50,
    n_atoms_monomer=15,
    cell=25.0,
    nsteps_jaxmd=1000,
    nsteps_ase=100,
    temperature=298.0,
    ensemble="npt",
    # ... see run_sim.parse_args() for full options
)
run(args)
The generalized MD workflows are exposed in two ways:
- MMML library CLI wrapper: mmml md-system ...
- Direct script argparse entrypoints in scripts/
Examples using the MMML CLI wrapper:
# Free-space MD
mmml md-system --setup free_nve --output-dir out/md/free_nve
mmml md-system --setup free_nvt --temperature 300 --output-dir out/md/free_nvt
# Periodic MD
mmml md-system --setup pbc_nve --output-dir out/md/pbc_nve
mmml md-system --setup pbc_nvt --backend jaxmd --temperature 300 --output-dir out/md/pbc_nvt
mmml md-system --setup pbc_npt --temperature 300 --pressure 1.0 --output-dir out/md/pbc_npt
mmml md-system --setup pbc_npt --n-molecules 100 --box-size 60.0 --seed 7 --output-dir out/md/pbc_npt_60A
# Mixed composition (methanol:water = 1:1, TIP3 water)
mmml md-system --setup pbc_nvt --backend jaxmd --composition MEOH:5,TIP3:5 --temperature 300 --output-dir out/md/meoh_tip3_1to1
# Long runs: split trajectory into multiple files (e.g., 5000 frames per file)
mmml md-system --setup pbc_nvt --composition MEOH:5,TIP3:5 --temperature 300 --traj-chunk-frames 5000
For long simulations, use --traj-chunk-frames to avoid a single very large trajectory file.
When enabled, output is split as *.part0000.traj, *.part0001.traj, etc.
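When post-processing a chunked run, it helps to know which part file holds a given frame. The helper below is a hypothetical sketch based only on the naming pattern above; it assumes frames are written in order, that each part holds exactly --traj-chunk-frames frames, and that part numbering starts at 0000.

```python
def chunk_filename(stem, frame_index, chunk_frames):
    """Name of the trajectory part file that holds a given frame.

    Assumes zero-based frame indices and fixed-size chunks.
    """
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    part = frame_index // chunk_frames
    return f"{stem}.part{part:04d}.traj"
```

For example, with 5000 frames per file, frame 4999 falls in the first part and frame 5000 in the second.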
The periodic tutorial shell scripts default to the JAX-MD backend via MDSYS_BACKEND=jaxmd; set MDSYS_BACKEND=ase to compare against the ASE route. During JAX-MD production, a detected inter-monomer overlap logs a warning and the run continues by default. Pass --extra-args --dynamics-overlap-action error to abort on overlap, or use off in place of error to disable the check.
Equivalent direct argparse script usage:
# ASE-based md_10mer suite (free-space + periodic NVE/NVT)
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only vac_nve
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only vac_nvt_nhc --nvt-temp-K 300
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only pbc_nve
python ~/mmml/scripts/md_10mer_mmml_pbc_suite.py --only pbc_nvt_nhc --nvt-temp-K 300
# JAX-MD periodic setup (NHC thermostat; NPT adds a barostat)
python ~/mmml/scripts/md_10mer_mmml_pbc_suite_jaxmd.py --ensemble nvt --temperature 300
python ~/mmml/scripts/md_10mer_mmml_pbc_suite_jaxmd.py --ensemble npt --temperature 300 --pressure 1.0
Tutorial helper scripts (from mmml_tutorial/cli):
bash 16_md_10mer_free_nve.sh
bash 17_md_10mer_free_nvt.sh
bash 18_md_10mer_pbc_nve.sh
bash 19_md_10mer_pbc_nvt.sh
bash 20_md_10mer_pbc_npt.sh
bash 21_md_system_meoh_tip3_1to1.sh
python -m mmml.cli.calculator --checkpoint <path-to-checkpoint> --test-molecule CO2
| # | Step | CLI | Scripts |
|---|---|---|---|
| 01 | make_res | mmml make-res --res CYBZ | cli/01_make_res_cli.sh, programmatic/01_make_res_programmatic.py |
| 02 | make_box | mmml make-box --res CYBZ --n 50 --side_length 25 | cli/02_make_box_cli.sh, programmatic/02_make_box_programmatic.py |
| 11 | run-pycharmm | mmml run-pycharmm --pdbfile pdb/init-packmol.pdb --cell 25.0 | cli/11_run_pycharmm_cli.sh |
| 03 | pyscf-dft | mmml pyscf-dft --mol "..." --energy | cli/03_pyscf_dft_cli.sh, programmatic/03_pyscf_dft_programmatic.py |
| 04 | pyscf-dft full | mmml pyscf-dft --mol xyz/initial.xyz --energy --gradient --hessian --harmonic --thermo | cli/04_pyscf_dft_cli_full.sh, programmatic/04_pyscf_dft_programmatic.py |
| 05 | pyscf-mp2 | mmml pyscf-mp2 --mol "..." --energy --gradient | cli/05_pyscf_mp2_cli.sh, programmatic/05_pyscf_mp2_programmatic.py |
| 06 | normal-mode-sample | mmml normal-mode-sample -i out/04_results.h5 -o out/06_sampled.npz --amplitude 0.1 --max-samples 10 | cli/06_normal_mode_sample_cli.sh, programmatic/06_normal_mode_sample_programmatic.py |
| 07 | pyscf-evaluate | mmml pyscf-evaluate -i out/06_sampled.npz -o out/07_evaluated.npz --esp | cli/07_pyscf_evaluate_cli.sh, programmatic/07_pyscf_evaluate_programmatic.py |
| 08 | fix-and-split | mmml fix-and-split --efd out/07_evaluated.npz --output-dir out/splits | cli/08_fix_and_split_cli.sh |
| 09 | PhysNet train | python examples/other/co2/physnet_train/trainer.py --train ... --valid ... | cli/09_physnet_train_cli.sh |
| 10 | PhysNet+DCMNet | python -m mmml.cli.misc.train_joint --train-efd ... --train-esp ... | cli/10_physnet_dcmnet_train_cli.sh |
| 16 | md-system free NVE | mmml md-system --setup free_nve | cli/16_md_10mer_free_nve.sh |
| 17 | md-system free NVT | mmml md-system --setup free_nvt --temperature 300 | cli/17_md_10mer_free_nvt.sh |
| 18 | md-system pbc NVE | mmml md-system --setup pbc_nve | cli/18_md_10mer_pbc_nve.sh |
| 19 | md-system pbc NVT | mmml md-system --setup pbc_nvt --temperature 300 | cli/19_md_10mer_pbc_nvt.sh |
| 20 | md-system pbc NPT | mmml md-system --setup pbc_npt --temperature 300 --pressure 1.0 | cli/20_md_10mer_pbc_npt.sh |
| 21 | md-system mixed 1:1 | mmml md-system --setup pbc_nvt --composition MEOH:5,TIP3:5 --temperature 300 | cli/21_md_system_meoh_tip3_1to1.sh |
| — | Full workflow | bash examples/mmml_tutorial/cli/run_full_training.sh | 1000 structures, split, PhysNet, PhysNet+DCMNet |
| — | DCMNet only | — | examples/dcm-net/train.py |