MolAlign

Open-source 3D molecular alignment and scoring engine for shape, electrostatics, and pharmacophore similarity, with a Rust core and Python API.

Current Status

MolAlign is currently strongest in:

rigid alignment of drug-like molecules
batch CPU throughput for rigid scoring/alignment
integration work toward REINVENT4 and de novo design workflows

Current limitations:

flexible alignment is still experimental for de novo screening and scaffold hopping
rigid batch alignment scales well across CPU cores and flexible batches now parallelize across molecules, but a single flexible alignment is still effectively single-core
LOBSTER oracle benchmarking is in place, but choosing an objective needs more rigorous analysis than RMSD alone
DUDE-Z virtual-screening performance currently trails published RoShAMBo results on the public CXCR4 and CSF1R targets

Features

Gaussian shape overlap scoring (ROCS-like)
Electrostatic potential similarity (3-Gaussian 1/r fit)
Pharmacophore feature matching (RDKit features as "color" Gaussians)
Pharmacophore features now follow flexible atom motion through contributing-atom tracking
Rigid alignment
Multi-conformer ranking
Flexible alignment with alternating rigid/torsion local search
Batch CPU parallelism via Rayon for rigid and flexible batch processing
REINVENT4 scoring component plugin
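To make the "Gaussian shape overlap" feature concrete, here is a minimal pure-Python sketch of first-order Gaussian overlap with Tanimoto and ROCS-style Tversky similarity. It is an illustration of the general technique, not MolAlign's Rust implementation: the amplitude `P`, the single shared `RADIUS`, the Tversky weights, and all function names are assumptions made for this example.

```python
import math

# Assumed Grant-Pickup style constants; MolAlign's actual parameters may differ.
P = 2.7        # Gaussian amplitude (assumption)
RADIUS = 1.7   # one carbon-like radius for every atom, in angstroms (assumption)
# Width chosen so that a single Gaussian has the same volume as a hard sphere.
ALPHA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / RADIUS ** 2


def overlap_volume(coords_a, coords_b):
    """First-order overlap: sum of pairwise Gaussian overlap integrals."""
    prefactor = P * P * (math.pi / (2.0 * ALPHA)) ** 1.5
    total = 0.0
    for ax, ay, az in coords_a:
        for bx, by, bz in coords_b:
            d2 = (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
            total += prefactor * math.exp(-0.5 * ALPHA * d2)
    return total


def shape_tanimoto(coords_a, coords_b):
    """Tanimoto similarity on overlap volumes."""
    v_ab = overlap_volume(coords_a, coords_b)
    v_aa = overlap_volume(coords_a, coords_a)
    v_bb = overlap_volume(coords_b, coords_b)
    return v_ab / (v_aa + v_bb - v_ab)


def shape_tversky(coords_a, coords_b, alpha=0.95, beta=0.05):
    """ROCS-style Tversky similarity; the default weights are assumptions."""
    v_ab = overlap_volume(coords_a, coords_b)
    v_aa = overlap_volume(coords_a, coords_a)
    v_bb = overlap_volume(coords_b, coords_b)
    return v_ab / (alpha * v_aa + beta * v_bb)


# Toy example: a three-atom chain compared with a copy shifted by 1 angstrom.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(shape_tanimoto(reference, reference))
print(shape_tanimoto(reference, shifted))
```

An identical pose scores 1, and the score decreases as the poses drift apart, which is the signal a rigid alignment search maximizes.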

Parallelization

MolAlign uses Rayon for molecule-level batch parallelism in both rigid and flexible batch workflows.

Level	Description	Scaling status
Rigid batch	Parallel over molecules	Measured, scales well
Flexible single molecule	Within-molecule flexible search	Effectively single-core (~1.15x at 8 threads)
Flexible batch	Parallel over molecules in Rust	Best fit for REINVENT-style batches

Thread Configuration

# Option 1 (shell): set the thread count via an environment variable before Python starts
RAYON_NUM_THREADS=8 python your_script.py

# Option 2 (Python): set the variable in-process; it must be set before importing molalign
import os
os.environ['RAYON_NUM_THREADS'] = '8'
from molalign import MolAligner
from molalign.prepared import PreparedScreeningDataset
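Because the variable has to be set before the import, ordering mistakes are easy to make in larger scripts. The helper below is a small hypothetical guard (the name `set_rayon_threads` is not part of MolAlign) that refuses to set the variable once the module is already loaded.

```python
import os
import sys


def set_rayon_threads(n_threads, module_name="molalign"):
    """Set RAYON_NUM_THREADS, failing loudly if the module is already imported."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    if module_name in sys.modules:
        raise RuntimeError(
            f"{module_name} is already imported; set RAYON_NUM_THREADS earlier"
        )
    os.environ["RAYON_NUM_THREADS"] = str(n_threads)


set_rayon_threads(8)
# from molalign import MolAligner  # import only after the variable is set
```

Calling the helper at the very top of the entry-point script keeps the setting and the import in a predictable order.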

Measured Scaling

Threads	Time	Throughput	Speedup	Efficiency
1	17215 ms	58 mol/s	1.00x	100%
2	10025 ms	100 mol/s	1.72x	86%
4	4898 ms	204 mol/s	3.51x	88%
8	3076 ms	325 mol/s	5.60x	70%

These numbers come from benchmarks/benchmark_scaling.py on a synthetic rigid batch benchmark. The script currently measures CPU scaling on randomly generated molecules, not DUDE-Z data.
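The derived columns in the table follow directly from the measured times. The sketch below recomputes them; the batch size of 1000 molecules is an inference from the throughput column, not a figure stated by the benchmark script.

```python
# Measured wall-clock times from the table above, keyed by thread count.
timings_ms = {1: 17215, 2: 10025, 4: 4898, 8: 3076}
N_MOLECULES = 1000  # assumption: inferred from the reported throughput values


def scaling_row(threads, time_ms, baseline_ms, n_molecules):
    """Return (throughput in mol/s, speedup, parallel efficiency)."""
    throughput = n_molecules / (time_ms / 1000.0)
    speedup = baseline_ms / time_ms
    efficiency = speedup / threads
    return throughput, speedup, efficiency


baseline = timings_ms[1]
for threads, time_ms in timings_ms.items():
    thr, spd, eff = scaling_row(threads, time_ms, baseline, N_MOLECULES)
    print(f"{threads}\t{time_ms} ms\t{thr:.0f} mol/s\t{spd:.2f}x\t{eff:.0%}")
```

Efficiency is speedup divided by thread count, so the drop from 88% at 4 threads to 70% at 8 threads marks where extra threads stop paying off linearly on this benchmark.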

For comparison, a single flexible alignment still shows only ~1.15x speedup from 1 to 8 Rayon threads, i.e. within-molecule flexible search remains effectively single-core today even though flexible batches now scale across molecules.
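One rough way to read these speedups is through Amdahl's law, which models a workload as a serial part plus a perfectly parallel part. This is a simplifying assumption rather than an analysis performed by the benchmark script, and the function names are illustrative.

```python
def parallel_fraction(speedup, threads):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to estimate p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


def amdahl_speedup(p, threads):
    """Predicted speedup for parallel fraction p on the given thread count."""
    return 1.0 / ((1.0 - p) + p / threads)


# Speedups at 8 threads quoted in this section.
p_flexible = parallel_fraction(1.15, 8)   # single flexible alignment
p_rigid = parallel_fraction(5.60, 8)      # rigid batch benchmark

print(f"flexible single: p = {p_flexible:.3f}")
print(f"rigid batch:     p = {p_rigid:.3f}")
# Upper bound on flexible single-molecule speedup with unlimited threads.
print(f"flexible ceiling: {1.0 / (1.0 - p_flexible):.2f}x")
```

Under this model only about 15% of a single flexible alignment runs in parallel, so adding threads cannot help much until the within-molecule search itself is restructured.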

Installation

pip install maturin
maturin develop --release

Usage

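from molalign import MolAligner
from molalign.prepared import PreparedScreeningDataset

# ref_3d_mol, smiles_list, and dataset_mols below are user-supplied inputs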

aligner = MolAligner(
    ref_mol=ref_3d_mol,
    preset="balanced",
    mode="flexible",
    num_conformers=50,
    weights={"shape": 0.5, "esp": 0.25, "pharm": 0.25},
    top_k_starts=3,
)

results = aligner.align(smiles_list)

# Large-scale rigid screening can use prepared packed arrays
prepared = PreparedScreeningDataset.from_rdkit_mols(dataset_mols)
screened = aligner.align_prepared(prepared)

Presets:

screening: faster staged rigid pipeline with shape prescreen and ESP-light reranking
balanced: default compromise for general discovery work
design: larger conformer pools and deeper reranking for accuracy-first workflows
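The screening preset describes a staged pipeline: a cheap prescreen followed by a more expensive rerank of the survivors. The sketch below shows that general pattern on toy data. The function `staged_rank`, the dictionary keys, and the 50/50 rerank weighting are hypothetical and do not reflect MolAlign's internal scoring.

```python
def staged_rank(candidates, cheap_score, expensive_score, shortlist_size):
    """Keep the top candidates by a cheap score, then rerank them by a costlier one."""
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:shortlist_size]
    return sorted(shortlist, key=expensive_score, reverse=True)


# Toy conformer pool with precomputed component scores (made-up values).
pool = [
    {"id": "c0", "shape": 0.90, "esp": 0.40},
    {"id": "c1", "shape": 0.85, "esp": 0.80},
    {"id": "c2", "shape": 0.60, "esp": 0.95},
    {"id": "c3", "shape": 0.88, "esp": 0.70},
]


def shape_only(conf):
    """Cheap prescreen: shape score alone."""
    return conf["shape"]


def shape_plus_esp(conf):
    """Costlier rerank: equal-weight blend of shape and ESP (assumed weights)."""
    return 0.5 * conf["shape"] + 0.5 * conf["esp"]


ranked = staged_rank(pool, shape_only, shape_plus_esp, shortlist_size=3)
print([conf["id"] for conf in ranked])
```

Note the trade-off: c2 never reaches the rerank stage because its shape score is low, even though its blended score would beat c0. Larger shortlists reduce that risk at the cost of more expensive evaluations.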

Rigid-mode results now include metadata such as generated conformer count, shortlist size, generation time, effective weights, and whether staged reranking was applied.
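As an illustration of what "effective weights" could mean, the sketch below renormalizes the requested weights over the score components that are actually available and then forms a weighted sum. This is one plausible interpretation only: the functions and the renormalization rule are assumptions, not MolAlign's documented behavior.

```python
def effective_weights(weights, available):
    """Drop unavailable or zero-weight components and renormalize to sum to 1."""
    kept = {name: w for name, w in weights.items() if name in available and w > 0}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no usable score components")
    return {name: w / total for name, w in kept.items()}


def combined_score(scores, weights):
    """Weighted sum of component scores using the renormalized weights."""
    eff = effective_weights(weights, scores.keys())
    return sum(eff[name] * scores[name] for name in eff)


requested = {"shape": 0.5, "esp": 0.25, "pharm": 0.25}
# Hypothetical molecule with no pharmacophore score available.
component_scores = {"shape": 0.9, "esp": 0.6}

print(effective_weights(requested, component_scores.keys()))
print(combined_score(component_scores, requested))
```

Reporting the renormalized weights alongside each result makes it clear when a score was computed from fewer components than requested.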

Benchmarks

# Basic benchmarks
python benchmarks/benchmark_rigid.py
python benchmarks/benchmark_batch.py

# LOBSTER oracle + de novo benchmark
python benchmarks/benchmark_lobster.py

# Aggregate LOBSTER + DUDE-Z scorecard
python benchmarks/benchmark_scorecard.py

# Targeted DUDE-Z calibration smoke run
python benchmarks/benchmark_roshambo_compare.py --targets CXCR4 --modes shape,shape_esp,shape_pharm,balanced --max-queries 1

# RoShAMBo-style DUDE-Z comparison (published metrics only)
python benchmarks/benchmark_roshambo_compare.py

# Parallelization scaling benchmark
python benchmarks/benchmark_scaling.py

Current Benchmark Snapshot

LOBSTER oracle mode:

rigid validation now uses the actual returned aligned_coords to compute Shape Tversky/Tanimoto and pose deltas
native LOBSTER poses and RDKit rigid alignment are both reported as references for rigid single-conformer validation
score self-consistency is checked by rescoring the returned rigid pose with score_pre_aligned

Validated run on subset_90/80/70 with 30 pairs each, 12 workers, and 20 random baseline samples:

Scheme	Returned Tv	Native Tv	RDKit Tv
shape	0.896	0.892	0.894
shape+esp	0.897	0.892	0.894
shape+pharm	0.899	0.892	0.890

LOBSTER de novo mode:

generated conformer pools are now used to validate rigid multi-conformer selection directly
the harness reports oracle-best conformer in the generated pool, selected conformer, regret, and top-1/top-3 selection accuracy
recommended smoke run: python benchmarks/benchmark_lobster.py --mode all --subsets subset_90 --max-pairs 5 --n-confs 10 --seeds 42,43

Validated de novo run on subset_90/80/70 with 30 pairs each, 10 conformers, and seeds 42,43,44:

Scheme	Selected Tv	Best Tv	Regret	Top-1	Top-3
shape	0.829	0.839	0.010	49.6%	77.4%
shape+esp	0.835	0.844	0.008	55.6%	85.6%
shape+pharm	0.818	0.831	0.013	45.2%	76.3%
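The de novo columns can be read as follows: the model ranks a generated conformer pool, and the oracle Tversky value of its top pick is compared with the best value available in the pool. The sketch below computes those quantities for one pool. The definitions are assumptions inferred from the column names (Regret here is Best Tv minus Selected Tv), and `selection_metrics` is not a MolAlign function.

```python
def selection_metrics(model_scores, oracle_tv, k=3):
    """Compare the model's conformer ranking with the oracle-best conformer."""
    n = len(model_scores)
    order = sorted(range(n), key=lambda i: model_scores[i], reverse=True)
    selected = order[0]
    best = max(range(n), key=lambda i: oracle_tv[i])
    return {
        "selected_tv": oracle_tv[selected],
        "best_tv": oracle_tv[best],
        "regret": oracle_tv[best] - oracle_tv[selected],
        "top1": selected == best,
        "topk": best in order[:k],
    }


# Toy pool of four conformers (made-up scores).
model_scores = [0.71, 0.80, 0.78, 0.65]   # what the aligner would rank by
oracle_tv = [0.80, 0.83, 0.86, 0.70]      # Tversky against the oracle pose

metrics = selection_metrics(model_scores, oracle_tv, k=3)
print(metrics)
```

Averaging `regret` and the two hit flags over all pairs and seeds would yield aggregate columns of the kind shown in the table.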

Current roadmap progress:

completed: benchmark scorecard generation, staged rigid shortlist/rerank pipeline, adaptive conformer pruning controls, result metadata, molecule-aware rigid calibration
completed: major Rust rigid speed pass with in-place transforms, top-start refinement, conformer-level parallelism, and SIMD-backed shape overlap
next: recover the small quality regression introduced by the fast path while keeping most of the new throughput, then continue rigid ranking/pruning work before flexible search redesign

RoShAMBo-style DUDE-Z comparison:

implemented on the public CXCR4 and CSF1R 3D pose sets
MolAlign currently underperforms published RoShAMBo enrichment and AUC metrics on these targets
this is a real gap to close before making any publication claims about virtual-screening (VS) competitiveness
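For readers unfamiliar with the metrics being compared, the sketch below computes ROC AUC and an enrichment factor from a scored list of actives and decoys. These are the standard textbook definitions; the exact cutoffs and conventions used in the published RoShAMBo numbers or in benchmark_roshambo_compare.py may differ, and the function names here are illustrative.

```python
def roc_auc(scores, labels):
    """Probability that a random active outscores a random decoy (ties count half)."""
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Active rate in the top-scoring fraction divided by the overall active rate."""
    n = len(scores)
    total_actives = sum(1 for y in labels if y)
    if total_actives == 0:
        raise ValueError("need at least one active")
    n_top = max(1, int(round(fraction * n)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(1 for _, y in ranked[:n_top] if y)
    return (hits_top / n_top) / (total_actives / n)


# Toy screen: 3 actives (label 1) among 10 molecules, made-up scores.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

print(f"AUC    = {roc_auc(scores, labels):.3f}")
print(f"EF 20% = {enrichment_factor(scores, labels, 0.20):.2f}")
```

An AUC of 0.5 and an enrichment factor of 1.0 correspond to random ranking, so both metrics should be read against those baselines.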
