# mbe-tools walkthrough

A quick tour of fragmenting clusters, building inputs, templating jobs, and saving outputs.

# Overview

- This notebook demonstrates reading an XYZ file, sampling fragments, generating MBE subsets, rendering Q-Chem/ORCA inputs, generating PBS/Slurm templates, and writing outputs to `notebooks/result`.

In [1]:
# Section 1: Load toolkit source context
from pathlib import Path
import mbe_tools

root = Path(mbe_tools.__file__).resolve().parent
print("mbe_tools package path:", root)
print("Available modules:", [p.name for p in root.glob('*.py')])

mbe_tools package path: /Users/jiarui/Downloads/mbe-tools/src/mbe_tools
Available modules: ['analysis.py', 'mbe.py', '__init__.py', 'input_builder.py', 'cli.py', 'mbe_math.py', 'utils.py', 'cluster.py', 'hpc_templates.py']


## Sections 2 & 3: README (EN) and README_CN snapshots
This cell prints the first 15 lines of each README already present in the workspace instead of regenerating them here.

In [2]:
# Show first lines of README.md and README_CN.md
from pathlib import Path

root = Path.cwd().parent if (Path.cwd().name == 'notebooks') else Path.cwd()
readme_en = root / "README.md"
readme_cn = root / "README_CN.md"

print("README.md ->", readme_en)
print("".join(readme_en.read_text(encoding="utf-8").splitlines(True)[:15]))

print("\nREADME_CN.md ->", readme_cn)
print("".join(readme_cn.read_text(encoding="utf-8").splitlines(True)[:15]))

README.md -> /Users/jiarui/Downloads/mbe-tools/README.md
# mbe-tools

`mbe-tools` is a Python package that covers the full **Many-Body Expansion (MBE)** loop:

- **Cluster design**: read `.xyz`, extract fragments, and sample subsets (with optional ion retention).
- **MBE job prep**: generate subset geometries, build Q-Chem/ORCA inputs, and emit PBS/Slurm job scripts (with optional chunked submission).
- **Parsing**: read ORCA / Q-Chem outputs, infer method/basis/grid metadata from paths or companion inputs, and write JSONL.
- **Analysis**: inclusion–exclusion MBE(k), summaries, CSV/Excel export, and basic plots.

Status: **0.1.0 (MVP)**. Backend syntax (e.g., ghost atoms) may need local tweaks.

---

## Install (editable for development)



README_CN.md -> /Users/jiarui/Downloads/mbe-tools/README_CN.md
# mbe-tools 简介

`mbe-tools` 覆盖 Many-Body Expansion (MBE) 工作流的常见环节：

- **簇与片段处理**：读取 `.xyz`，拆分片段并随机抽样（可保证包含离子）。
- **作业准备**：生成子集几何，渲染 Q-Chem / ORCA 输入文件，产出 PBS/Slurm 作业脚本（支持按批次切分提交）。
- **结

## Section 4: Sample usage (inputs, templates, MBE math)

In [3]:
# Geometry block for water dimer (toy)
geom_block = """O  0.000000  0.000000  0.000000
H  0.757000  0.586000  0.000000
H -0.757000  0.586000  0.000000
O  2.900000  0.000000  0.000000
H  3.657000  0.586000  0.000000
H  2.143000  0.586000  0.000000"""

from mbe_tools.input_builder import render_qchem_input, render_orca_input
from mbe_tools.hpc_templates import render_pbs_qchem, render_slurm_orca
from mbe_tools.mbe_math import assemble_mbe_energy

print("Q-Chem input (first lines):")
print("\n".join(render_qchem_input(geom_block, method="wb97m-v", basis="def2-ma-qzvpp").splitlines()[:8]))

print("\nORCA input (first lines):")
print("\n".join(render_orca_input(geom_block, method="wb97m-v", basis="def2-ma-qzvpp").splitlines()[:6]))

print("\nPBS template snippet:")
print("\n".join(render_pbs_qchem(job_name="water", chunk_size=3).splitlines()[:15]))

print("\nSlurm template snippet:")
print("\n".join(render_slurm_orca(job_name="orca_job", chunk_size=2).splitlines()[:15]))

# Synthetic MBE energies for fragments (0), (1), and dimer (0,1)
records = [
    {"subset_indices": [0], "energy_hartree": -75.0},
    {"subset_indices": [1], "energy_hartree": -75.1},
    {"subset_indices": [0, 1], "energy_hartree": -150.5},
]
result = assemble_mbe_energy(records)
print("\nMBE(k) order totals:", result["order_totals"])
print("Contributions:", result["contributions"])
print("Missing subsets:", result["missing_subsets"])


Q-Chem input (first lines):
$molecule
0 1
O  0.000000  0.000000  0.000000
H  0.757000  0.586000  0.000000
H -0.757000  0.586000  0.000000
O  2.900000  0.000000  0.000000
H  3.657000  0.586000  0.000000
H  2.143000  0.586000  0.000000

ORCA input (first lines):
! wb97m-v def2-ma-qzvpp
* xyz 0 1
O  0.000000  0.000000  0.000000
H  0.757000  0.586000  0.000000
H -0.757000  0.586000  0.000000
O  2.900000  0.000000  0.000000

PBS template snippet:
#!/bin/bash
#PBS -N water
#PBS -j oe
#PBS -l walltime=24:00:00,mem=32000Mb,ncpus=16
#PBS -o water.log

set -euo pipefail
shopt -s nullglob

FILES_PER_JOB=3
MEM=32000Mb
NCPUS=16
QC_MOD=qchem/5.2.2
BASE_JOBNAME=water


Slurm template snippet:
#!/bin/bash
#SBATCH --job-name=orca_job
#SBATCH --time=24:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=32GB
#SBATCH --output=orca_job.%j.out
#SBATCH --error=orca_job.%j.err

set -euo pipefail
shopt -s nullglob

module load orca/5.0.3
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-16}


MBE(k)
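
The inclusion–exclusion step behind the synthetic records above can be reproduced by hand. The sketch below is not the `assemble_mbe_energy` implementation; `mbe_contributions` and `order_totals` are hypothetical helpers written only for illustration. They assume every lower-order subset of each record is present and that subsets are keyed by sorted tuples of fragment indices.

```python
from itertools import combinations


def mbe_contributions(energies):
    """Return the many-body contribution of each subset.

    energies maps a sorted tuple of fragment indices to a total energy.
    The contribution of a subset is its energy minus the contributions
    of all of its proper, non-empty subsets (inclusion-exclusion).
    """
    contrib = {}
    # Process monomers first, then dimers, and so on.
    for subset in sorted(energies, key=len):
        delta = energies[subset]
        for r in range(1, len(subset)):
            for sub in combinations(subset, r):
                delta -= contrib[sub]  # KeyError if a lower-order subset is missing
        contrib[subset] = delta
    return contrib


def order_totals(contrib):
    """Cumulative MBE(k) totals: sum of all contributions up to order k."""
    totals, running = {}, 0.0
    for k in range(1, max(len(s) for s in contrib) + 1):
        running += sum(v for s, v in contrib.items() if len(s) == k)
        totals[k] = running
    return totals


# Same synthetic energies (hartree) as in the cell above.
energies = {(0,): -75.0, (1,): -75.1, (0, 1): -150.5}
contrib = mbe_contributions(energies)
totals = order_totals(contrib)

print(round(contrib[(0, 1)], 6))                    # -0.4
print({k: round(v, 6) for k, v in totals.items()})  # {1: -150.1, 2: -150.5}
```

For the dimer, the two-body term is -150.5 - (-75.0) - (-75.1) = -0.4 hartree, so MBE(1) is -150.1 and MBE(2) recovers the full dimer energy of -150.5.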

## Section 4b: Run-control example
Run-control is embedded in the PBS/Slurm templates. A control file (`<input>.mbe.control.toml` or `mbe.control.toml`) can confirm success via regexes, retry with cleanup/sleep, and write state to `.mbe_state.json`. The snippet below writes a minimal control file into `notebooks/result` for reuse in the demo.

In [None]:
from pathlib import Path

root = Path.cwd().parent if (Path.cwd().name == "notebooks") else Path.cwd()
result_dir = root / "notebooks" / "result"
result_dir.mkdir(parents=True, exist_ok=True)

control_text = """version = 1

[confirm]
# Single-quoted TOML literal string: the regex backslash is kept verbatim
regex_any = ["TOTAL ENERGY", 'Energy\\s+=']
regex_none = ["SCF failed", "Error"]

[retry]
enabled = true
max_attempts = 2
sleep_seconds = 5
cleanup_globs = ["temp*", "Scratch/*"]
write_failed_last = true

[state]
skip_if_done = true
state_file = ".mbe_state.json"

[delete]
enabled = false
allow_delete_outputs = false
"""

ctrl_path = result_dir / "demo.mbe.control.toml"
ctrl_path.write_text(control_text, encoding="utf-8")
print("Wrote control file:", ctrl_path)
print(control_text)
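
How the `[confirm]` table might be applied to an output file can be sketched in a few lines. This is an assumed reading of the key names, not the logic shipped in the templates: a run counts as confirmed when at least one `regex_any` pattern matches and no `regex_none` pattern matches. The function `output_confirmed` and both log strings are hypothetical.

```python
import re


def output_confirmed(text, regex_any, regex_none):
    """True if any required pattern matches and no forbidden pattern matches."""
    has_required = any(re.search(p, text) for p in regex_any)
    has_forbidden = any(re.search(p, text) for p in regex_none)
    return has_required and not has_forbidden


# Mirrors the [confirm] table of the control file written above.
confirm = {
    "regex_any": ["TOTAL ENERGY", r"Energy\s+="],
    "regex_none": ["SCF failed", "Error"],
}

# Hypothetical log excerpts, made up for this illustration only.
ok_log = "SCF converged\nTOTAL ENERGY   -1.0\n"
bad_log = "TOTAL ENERGY   -1.0\nError: disk quota exceeded\n"

print(output_confirmed(ok_log, **confirm))   # True
print(output_confirmed(bad_log, **confirm))  # False
```

Under this reading, an energy line alone is not enough: a forbidden pattern anywhere in the output marks the run as unconfirmed, which is what would trigger the retry settings.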

## Section 5: Validate files in workspace

In [4]:
# List docs and notebook presence
from pathlib import Path
root = Path.cwd().parent if (Path.cwd().name == 'notebooks') else Path.cwd()
for path in [root / "README.md", root / "README_CN.md", root / "notebooks" / "sample_walkthrough.ipynb"]:
    print(path, "exists:", path.exists())


/Users/jiarui/Downloads/mbe-tools/README.md exists: True
/Users/jiarui/Downloads/mbe-tools/README_CN.md exists: True
/Users/jiarui/Downloads/mbe-tools/notebooks/sample_walkthrough.ipynb exists: True


## Section 6: Full workflow using W20_3.xyz
We load the provided 20-water cluster, fragment it, sample 10 fragments, generate subsets (k≤2, as reported by the cell's own print statement for the default `MBEParams()`), render a Q-Chem input for every subset, and emit a chunked PBS template.
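
To make the fragmentation step concrete, here is a minimal sketch of one way a water heuristic with an O–H cutoff could work: each hydrogen is attached to its nearest oxygen if that oxygen lies within the cutoff. This is an assumption about what `oh_cutoff` means, not the actual `fragment_by_water_heuristic` code; `parse_geometry` and `split_waters` are hypothetical names. The geometry is the toy water dimer from Section 4.

```python
import math


def parse_geometry(block):
    """Parse 'El x y z' lines into (element, x, y, z) tuples."""
    atoms = []
    for line in block.strip().splitlines():
        el, x, y, z = line.split()
        atoms.append((el, float(x), float(y), float(z)))
    return atoms


def split_waters(atoms, oh_cutoff=1.25):
    """Group each O with the H atoms within oh_cutoff (angstrom) of it.

    Hydrogens farther than oh_cutoff from every oxygen are left unassigned.
    """
    oxygens = [i for i, atom in enumerate(atoms) if atom[0] == "O"]
    fragments = {o: [o] for o in oxygens}
    for i, (el, x, y, z) in enumerate(atoms):
        if el != "H" or not oxygens:
            continue
        # Nearest oxygen to this hydrogen.
        nearest = min(oxygens, key=lambda o: math.dist(atoms[o][1:], (x, y, z)))
        if math.dist(atoms[nearest][1:], (x, y, z)) <= oh_cutoff:
            fragments[nearest].append(i)
    return list(fragments.values())


dimer = parse_geometry("""O  0.000000  0.000000  0.000000
H  0.757000  0.586000  0.000000
H -0.757000  0.586000  0.000000
O  2.900000  0.000000  0.000000
H  3.657000  0.586000  0.000000
H  2.143000  0.586000  0.000000""")

print(split_waters(dimer))  # [[0, 1, 2], [3, 4, 5]]
```

Every O–H bond in the toy dimer is about 0.96 Å, well inside the 1.25 Å cutoff, so the six atoms split cleanly into two three-atom fragments.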

In [None]:
# Full workflow demo
from pathlib import Path
from mbe_tools.cluster import read_xyz, fragment_by_water_heuristic, sample_fragments
from mbe_tools.mbe import MBEParams, generate_subsets_xyz
from mbe_tools.input_builder import render_qchem_input
from mbe_tools.hpc_templates import render_pbs_qchem

root = Path.cwd().parent if (Path.cwd().name == "notebooks") else Path.cwd()
xyz_path = root / "notebooks" / "data" / "W20_3.xyz"
result_dir = root / "notebooks" / "result"
result_dir.mkdir(parents=True, exist_ok=True)

print("XYZ path:", xyz_path)

xyz = read_xyz(str(xyz_path))
frags = fragment_by_water_heuristic(xyz, oh_cutoff=1.25)
print(f"Total fragments detected: {len(frags)}")

# Sample a manageable subset of fragments (10) to keep this demo light
sampled = sample_fragments(frags, n=min(10, len(frags)), seed=7, require_ion=False)
print(f"Sampled fragments: {len(sampled)}")

params = MBEParams()
subset_jobs = list(generate_subsets_xyz(sampled, params))
print(f"Generated subset geometries: {len(subset_jobs)} (k<=2)")

# Write one Q-Chem input per subset
written = 0
out_path, inp = None, ""  # stay defined even if no subsets were generated
for job_id, subset, geom_text in subset_jobs:
    inp = render_qchem_input(geom_text, method="wb97m-v", basis="def2-ma-qzvpp")
    out_path = result_dir / f"{job_id}.inp"
    out_path.write_text(inp, encoding="utf-8")
    written += 1

print(f"Wrote {written} Q-Chem inputs to {result_dir}")
print("Example:", out_path)
# Concatenate instead of passing two arguments to avoid a stray leading space
print("\nPreview of last written input:\n" + "\n".join(inp.splitlines()[:12]))

# Emit a PBS template with chunked submission
pbs_script = render_pbs_qchem(job_name="w20_demo", chunk_size=10, ncpus=16, mem_gb=32)
pbs_path = result_dir / "w20_demo.pbs"
pbs_path.write_text(pbs_script, encoding="utf-8")
print("\nPBS script preview:\n" + "\n".join(pbs_script.splitlines()[:18]))
print("Saved:", pbs_path)
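
The size of this demo can be estimated before running it. With 10 sampled fragments and subsets up to order 2, the number of inputs is C(10,1) + C(10,2). The sketch below also splits the inputs into batches of `chunk_size`, on the assumption that `chunk_size` corresponds to the `FILES_PER_JOB` value seen in the PBS snippet of Section 4 and that files are grouped in order. `count_subsets`, `chunk`, and the file names are hypothetical.

```python
import math


def count_subsets(n_fragments, max_order):
    """Number of non-empty subsets with at most max_order fragments."""
    return sum(math.comb(n_fragments, k) for k in range(1, max_order + 1))


def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


n_inputs = count_subsets(10, 2)  # 10 monomers + 45 dimers
names = [f"subset_{i:03d}.inp" for i in range(n_inputs)]  # placeholder names
batches = chunk(names, 10)

print(n_inputs, len(batches), len(batches[-1]))  # 55 6 5
```

Under these assumptions the 55 inputs form six batches, the last holding five files. The count grows quickly with cluster size and order: the full 20-fragment cluster at k≤2 would already give `count_subsets(20, 2)` = 210 inputs.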

## Notes
- Run the cells in order. Outputs are saved under `notebooks/result`.