Release v0.1.0 · NVIDIA/nvalchemi-toolkit

Initial public-beta release of NVIDIA ALCHEMI Toolkit, a GPU-first Python
framework for AI-driven atomic simulation workflows.

AtomicData — Pydantic-backed graph representation of atomic systems
(positions, atomic numbers, masses, node/edge properties) with factory
constructors from_atoms() (ASE) and from_structure() (pymatgen).
Batch — GPU-resident graph batch with MultiLevelStorage backend
supporting node-, edge-, and system-level tensors. Lazy batch_idx/batch_ptr,
index_select, append, and from_data_list for efficient batching.
Zarr I/O — AtomicDataZarrWriter and AtomicDataZarrReader with
configurable Zstd compression, chunking, and sharding for high-throughput
trajectory storage.
Dataset & DataLoader — CUDA-stream prefetching, async I/O, and
drop-in DataLoader replacement yielding Batch objects.

All wrappers implement BaseModelMixin with a unified ModelConfig for
capability declaration and runtime control.

DemoModelWrapper — Lightweight test/demo model (point-cloud energy +
autograd forces).
MACEWrapper — MACE equivariant neural network; supports foundation
checkpoints; COO neighbor format; conservative forces via autograd.
AIMNet2Wrapper — AIMNet2 atom-in-molecule network; energy, forces,
charges, stress; MATRIX neighbor format; NSE auto-detection.
LennardJonesModelWrapper — Warp-accelerated single-species LJ with
analytical forces and optional virial stress.
EwaldModelWrapper — Real + reciprocal space Ewald summation for
periodic charged systems; k-vector caching; hybrid analytical forces.
PMEModelWrapper — Particle Mesh Ewald (FFT-based, O(N log N)) for
large periodic systems.
DFTD3ModelWrapper — DFT-D3(BJ) dispersion correction with
auto-downloaded reference parameters and cutoff smoothing.
PipelineModelWrapper — Compose multiple models into groups with
independent derivative strategies (autograd vs. analytical).

BaseDynamics — Abstract base orchestrating model evaluation, integrator
updates, hook dispatch, convergence detection, and inflight batching.
9 hook insertion points per step (DynamicsStage enum): BEFORE_STEP,
BEFORE_PRE_UPDATE, AFTER_PRE_UPDATE, BEFORE_COMPUTE, AFTER_COMPUTE,
BEFORE_POST_UPDATE, AFTER_POST_UPDATE, AFTER_STEP, ON_CONVERGE.
ConvergenceHook — Flexible convergence criteria with from_fmax()
convenience constructor and per-system masking.

NVE — Velocity Verlet; symplectic, time-reversible, energy-conserving.
NVTLangevin — BAOAB Langevin dynamics with Ornstein-Uhlenbeck
thermostat for canonical sampling.
NVTNoseHoover — Nosé-Hoover chain thermostat with Yoshida-Suzuki
factorization; deterministic and ergodic.
NPT — Martyna-Tobias-Klein isothermal-isobaric with dual Nosé-Hoover
chains (particle + cell DOFs).
NPH — MTK isenthalpic-isobaric without thermostat.

FIRE — Fast Inertial Relaxation Engine with adaptive timestep.
FIREVariableCell — FIRE with NPH-like variable-cell propagation.
FIRE2 — Improved FIRE (Shuang et al. 2020) with better restart
conditions and modified velocity mixing.
FIRE2VariableCell — FIRE2 with variable-cell structural relaxation.

Dynamics hooks (nvalchemi.dynamics.hooks):

LoggingHook — Per-graph scalar statistics with thread-pooled I/O and
optional CUDA stream prefetch.
NaNDetectorHook — Immediate NaN/Inf detection in forces and energy.
MaxForceClampHook — Clamps force magnitudes to prevent numerical
explosions.
EnergyDriftMonitorHook — Cumulative energy drift tracking with
configurable thresholds (absolute and per-atom-per-step).
FreezeAtomsHook — Freezes selected atoms by category during MD.
SnapshotHook — Periodic full-state snapshots to a DataSink.
ConvergedSnapshotHook — Snapshot on convergence.
ProfilerHook — Per-stage wall-clock profiling with NVTX annotations
and CSV output.
AlignCellHook — Upper-triangular cell alignment for variable-cell
optimization.

General hooks (nvalchemi.hooks):

NeighborListHook — On-the-fly neighbor list construction/refresh with
Verlet skin buffer; MATRIX and COO formats.
WrapPeriodicHook — GPU-accelerated PBC wrapping via Warp kernel.
BiasedPotentialHook — External bias potentials for enhanced sampling
(umbrella sampling, metadynamics, etc.).

FusedStage (+ operator) — Compose dynamics stages on a single GPU
with shared forward pass and masked updates per sub-stage.
DistributedPipeline (| operator) — Distribute stages across GPU
ranks with blocking inter-rank communication.
SizeAwareSampler — Bin-packing inflight batching that respects
max_atoms, max_edges, and max_batch_size constraints.
Data sinks — HostMemory (CPU), GPUBuffer (device), ZarrData
(persistent disk) for capturing pipeline outputs.

All low-level kernels built on
nvalchemi-toolkit-ops
via NVIDIA Warp:

20 worked examples across four tiers (basic, intermediate, advanced,
distributed) covering data structures, optimization, MD ensembles,
Zarr I/O, inflight batching, custom hooks, model composition, Ewald
electrostatics, and multi-GPU pipelines.
7 Claude Code agent skills (.claude/skills/) for guided workflows:
model wrapping, data structures, data storage, dynamics API, dynamics
hooks, dynamics implementation, and engineering scoping.
OptionalDependency guards for graceful degradation when MACE, AIMNet2,
ASE, or pymatgen are not installed.

Provide feedback