Releases: Novel-Therapeutics/boltz-community
v2.10.8 — first PyPI release
Highlights
- First PyPI release. `boltz-community` is now installable from PyPI: `pip install boltz-community` and `uv add boltz-community` both work without a git URL (#12).
- Releases are now tag-driven via GitHub Actions + PyPI Trusted Publisher (OIDC). No long-lived API tokens are stored anywhere; each publish is gated by a required-reviewer rule on the `pypi` GitHub environment.
Install
```shell
# from PyPI (new, recommended)
pip install boltz-community
pip install "boltz-community[cuda]"  # with NVIDIA CUDA kernels
# uv works the same way
uv add boltz-community
# bleeding-edge from main (still supported)
pip install "boltz-community @ git+https://github.com/Novel-Therapeutics/boltz-community.git"
```
Details
What changed
- `pyproject.toml` metadata (6de6485): added `description`, an explicit `license = MIT`, `authors` (Wohlwend / Corso / Passaro, the upstream authors), `maintainers` (this fork), `keywords`, 10 standard `classifiers`, and a `[project.urls]` block with `Homepage` / `Source` / `Issues` / `Changelog` / `Upstream`. The fork relationship is now discoverable from the PyPI page itself.
- Tag-driven release workflow (fe42db2): the new `.github/workflows/release.yml` fires on `v*.*.*` tag pushes (and on manual `workflow_dispatch`). It has two jobs:
  - `build`: builds the sdist and wheel via `python -m build`, validates them with `twine check`, and uploads them as a GitHub artifact retained for 7 days.
  - `publish`: downloads the artifacts and publishes them via `pypa/gh-action-pypi-publish` using PyPI Trusted Publisher. It requires `id-token: write` (OIDC) and runs inside the `pypi` GitHub environment so PyPI can verify the workflow identity.
- Releasing section in README (fe42db2): documents the one-time PyPI Trusted Publisher binding, the GitHub environment setup, the per-release flow as a 7-step checklist, and recovery guidance for failed publishes.
Fork etiquette
This release also makes the standard fork-etiquette points explicit on the published artifact:
- `LICENSE` is preserved verbatim: the upstream MIT copyright (Wohlwend, Corso, Passaro 2024) is unchanged.
- The PyPI Author field lists the upstream authors; the Maintainer field lists this fork. Bug reports go to our issue tracker, not theirs.
- The `## Cite` section in the README continues to point at the original Boltz-1, Boltz-2, and ColabFold papers.
- `[project.urls].Upstream` points at `jwohlwend/boltz`, so anyone landing on the PyPI page sees the relationship at a glance.
Commits since v2.10.7
- `0cbe41c` Release 2.10.8 — first PyPI publish
- `fe42db2` Add tag-driven PyPI release workflow + Releasing docs
- `6de6485` Fill out PyPI metadata in pyproject.toml
v2.10.7
Highlights
- Fixed affinity prediction running ~5× slower than necessary at default settings. Upstream hardcoded `max_parallel_samples=1` for the affinity diffusion path, which was silently a no-op under upstream's buggy chunking math. Our earlier diffusion fix made that hardcode actually take effect, forcing N sequential single-sample forward passes per affinity record. The affinity path now honors the user's `--max_parallel_samples`, capped at `--diffusion_samples_affinity`.
Details
Background — how this regression slipped in
Upstream's chunking math in diffusionv2.py was:
```python
sample_ids_chunks = sample_ids.chunk(multiplicity % max_parallel_samples + 1)
```
`tensor.chunk(n)` splits a tensor into `n` chunks, not into chunks of size `n`. For affinity (`multiplicity=5`, `max_parallel_samples=1`) the argument is `5 % 1 + 1 = 1`, so this collapsed to `chunk(1)`: one chunk of 5 samples, all batched into a single forward pass. The `max_parallel_samples=1` that upstream hardcoded on the affinity path therefore never actually took effect.
Our v2.10.x diffusion fix replaced that with the obviously correct:
```python
sample_ids_chunks = sample_ids.split(max_parallel_samples)
```
That fix was independently correct (it made `--max_parallel_samples` behave as documented in divisible cases), but it changed the semantics under the affinity caller, which had been unknowingly tuned against the broken behavior. With the fix in place, the hardcoded `1` started doing what it said: 5 sequential single-sample passes per affinity record.
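To make the difference between the two lines concrete, here is a small pure-Python sketch that only counts chunk sizes. It assumes `torch.chunk`'s documented behavior (each chunk has size `ceil(total / n)`, so fewer than `n` chunks may come back) and `torch.split`'s fixed-size behavior with an int argument; the helper names are illustrative and do not exist in the codebase.

```python
import math


def chunk_sizes(total: int, n_chunks: int) -> list[int]:
    """Mimic torch.chunk: at most n_chunks pieces of size ceil(total / n_chunks)."""
    size = math.ceil(total / n_chunks)
    return [min(size, total - start) for start in range(0, total, size)]


def split_sizes(total: int, split_size: int) -> list[int]:
    """Mimic torch.split with an int: pieces of split_size, last one may be smaller."""
    return [min(split_size, total - start) for start in range(0, total, split_size)]


def old_upstream(multiplicity: int, max_parallel_samples: int) -> list[int]:
    # Upstream passed a chunk *count* where a chunk *size* was intended.
    return chunk_sizes(multiplicity, multiplicity % max_parallel_samples + 1)


def fixed(multiplicity: int, max_parallel_samples: int) -> list[int]:
    # Each forward pass sees at most max_parallel_samples samples.
    return split_sizes(multiplicity, max_parallel_samples)


for mps in (1, 2, 5):
    print(mps, old_upstream(5, mps), fixed(5, mps))
# 1 [5] [1, 1, 1, 1, 1]
# 2 [3, 2] [2, 2, 1]
# 5 [5] [5]
```

The first row is the regression in miniature: under the old math a hardcoded `1` still produced one batched pass of 5, while under the fix it produces 5 single-sample passes.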
What changed
- New helper `_resolve_affinity_max_parallel_samples(max_parallel_samples, diffusion_samples_affinity)` in `src/boltz/main.py`:
  - Honors the CLI `--max_parallel_samples` for affinity, matching the behavior already used for structure prediction.
  - Caps at `diffusion_samples_affinity` so we never claim to batch more affinity samples than diffusion will actually run.
  - Treats `max_parallel_samples=None` the same way the diffusion code does ("use multiplicity"), which matters for programmatic callers of `predict.callback(...)` that pass `None`.
- `predict_affinity_args` calls the helper instead of hardcoding `1`.
Behavior matrix

| `--max_parallel_samples` | `--diffusion_samples_affinity` | Resolved affinity max | Affinity behavior |
|---|---|---|---|
| 5 (default) | 5 (default) | 5 | 1 batched forward pass (matches upstream's effective pre-fix behavior) |
| 2 | 5 | 2 | User override respected |
| 50 | 5 | 5 | Capped, no inflated work |
| 1 | 5 | 1 | Explicit single-sample preserved |
| `None` | 5 | 5 | Falls back to diffusion sample count |
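The behavior matrix boils down to a two-branch rule. The sketch below is a hypothetical re-implementation written only from the matrix; it mirrors the signature of `_resolve_affinity_max_parallel_samples` but is not the fork's code.

```python
from typing import Optional


def resolve_affinity_max_parallel_samples_sketch(
    max_parallel_samples: Optional[int], diffusion_samples_affinity: int
) -> int:
    """Hypothetical sketch of the affinity parallelism rule in the table above."""
    # None means "use multiplicity": batch every affinity sample at once.
    if max_parallel_samples is None:
        return diffusion_samples_affinity
    # Never report more parallelism than diffusion will actually run.
    return min(max_parallel_samples, diffusion_samples_affinity)
```

Handling `None` before the comparison is the point of the regression test: in this sketch, passing `None` straight into `min()` would raise a `TypeError`.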
Tests
Six new unit tests in tests/test_cli.py::TestAffinityMaxParallelSamples:
- Default CLI setting batches all samples.
- User overrides below the cap are honored.
- User overrides above the cap are capped.
- Explicit `1` is preserved.
- `None` falls back to `diffusion_samples_affinity` (regression guard against `TypeError: '<' not supported between 'NoneType' and 'int'`).
- Source regression guard: scans `main.py` for the `predict_affinity_args` block and fails if anyone re-hardcodes `"max_parallel_samples": 1` there or removes the helper.
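A source-scanning guard like the last test can be sketched with a couple of regular expressions. This is a hypothetical illustration, not the fork's test: it assumes `predict_affinity_args` is assigned as a flat dict literal (the non-greedy match stops at the first `}`), and the toy sources and the `recycling_steps` key are made up for the example.

```python
import re


def check_affinity_block(source: str) -> None:
    """Hypothetical guard: fail if the affinity args re-hardcode parallelism."""
    # Assumes a flat dict literal: predict_affinity_args = { ... }
    match = re.search(r"predict_affinity_args\s*=\s*\{(.*?)\}", source, re.DOTALL)
    if match is None:
        raise AssertionError("predict_affinity_args block not found")
    block = match.group(1)
    # Matches "max_parallel_samples": 1 but not e.g. ": 10".
    if re.search(r"[\"']max_parallel_samples[\"']\s*:\s*1\b", block):
        raise AssertionError("max_parallel_samples is hardcoded to 1 again")
    if "_resolve_affinity_max_parallel_samples" not in source:
        raise AssertionError("affinity helper was removed")


# Toy sources for illustration only.
GOOD_SRC = '''predict_affinity_args = {
    "recycling_steps": 5,
    "max_parallel_samples": _resolve_affinity_max_parallel_samples(mps, n),
}'''
BAD_SRC = '''predict_affinity_args = {
    "recycling_steps": 5,
    "max_parallel_samples": 1,
}'''
```

A text scan is brittle against refactors, which is arguably the point here: it forces whoever touches that block to look at why the helper exists.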
Total fork test count is now 283.
What did not change
- The chunking primitive itself is correct; this only fixes the call site that was implicitly relying on the old broken behavior.
- Structure prediction was already routing `--max_parallel_samples` correctly.
- Numerical outputs are unchanged; this is purely a parallelism / wall-clock fix.
Commits since v2.10.6
- `5103619` Release 2.10.7
v2.10.6
Highlights
- Fixed training-data preprocessing so records produced by `scripts/process/rcsb.py` actually carry the `cluster_id`, `msa_id`, and `entity_id` that downstream training expects.
This is a training-only fix and does not affect inference users: if you have run `boltz predict` against the released model weights, nothing here changes for you. If you used `scripts/process/rcsb.py` to build a custom training set, the fields below were silently wrong, and models trained on those records should be retrained on re-preprocessed records.
Details
Upstream issue jwohlwend/boltz#686 reported two concrete problems in the training preprocessing pipeline; both reproduced unchanged in this fork.
- `cluster_id` was always `-1`. `scripts/process/cluster.py` keys its `clustering.json` output by `hash_sequence(seq)` for polymers and by CCD code for ligands. `scripts/process/rcsb.py` was looking up `f"{pdb_id}_{entity_id}"`, which never hits, so every chain fell through to `cluster_id=-1` and `ClusterSampler` weighted everything uniformly instead of by cluster size.
- `msa_id` was always `""`. The `# FIX` comment in upstream `rcsb.py` was literal: training records pointed at no MSA file, so training silently ran on single-sequence inputs with no MSA features whenever those records were loaded.
- `entity_id` was dropped. Records ended up with `entity_id: null`.
What changed
- `scripts/process/rcsb.py` now computes the per-chain cluster key the same way `cluster.py` emits it: `hash_sequence(seq)` for polymers, lowercased CCD code for ligands. The two scripts now share `hash_sequence` instead of carrying parallel definitions.
- `msa_id` is populated with `hash_sequence(seq)` for protein chains (and stays empty for DNA/RNA/ligands, which `src/boltz/data/module/training.py` treats as "no MSA").
- `entity_id` is propagated from the parsed chain through to `ChainInfo`.
- `scripts/process/mmcif.py` exposes a new `chain_to_seq: dict[str, str]` field on `ParsedStructure` so `rcsb.py` can hash polymer sequences without re-parsing the mmCIF.
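The key-space mismatch is easy to reproduce in miniature. In the sketch below, `hash_sequence` is a hypothetical SHA-256 stand-in (the fork's real function may hash differently), the clustering dict is toy data, and `cluster_key` / `msa_id_for` are illustrative helpers following the rules described above.

```python
import hashlib


def hash_sequence(seq: str) -> str:
    # Hypothetical stand-in; the fork's shared hash_sequence may differ.
    return hashlib.sha256(seq.encode()).hexdigest()


def cluster_key(mol_type: str, seq_or_ccd: str) -> str:
    """Key in the same space as clustering.json: sequence hash or lowercased CCD."""
    if mol_type == "ligand":
        return seq_or_ccd.lower()
    return hash_sequence(seq_or_ccd)


def msa_id_for(mol_type: str, seq: str) -> str:
    # Only protein chains get an MSA id; empty string means "no MSA".
    return hash_sequence(seq) if mol_type == "protein" else ""


def lookup_cluster(clusters: dict[str, int], key: str) -> int:
    # A miss falls back to -1, which is what every chain got before the fix.
    return clusters.get(key, -1)


clusters = {hash_sequence("MKTAYIAK"): 7, "atp": 3}  # toy clustering.json
old = lookup_cluster(clusters, "1abc_1")  # old f"{pdb_id}_{entity_id}" style key
new = lookup_cluster(clusters, cluster_key("protein", "MKTAYIAK"))
lig = lookup_cluster(clusters, cluster_key("ligand", "ATP"))
```

Here `old` is `-1`, `new` is `7`, and `lig` is `3`: only keys built in the same space as the clustering output ever hit.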
What did not change
- `cluster.py` output format is preserved. Keying clusters by sequence content is the right design: a sequence shared across PDB entries should cluster together regardless of which `pdb_id_entity_id` it appears under.
- Branched-ligand handling uses the first residue's CCD code as the cluster key, matching the upstream simplification.
- The published Boltz-1 / Boltz-2 weights are unaffected. Upstream presumably trained with a private pipeline (the `# FIX` comment in the open-sourced `rcsb.py` suggests the released script was never the one they actually used).
Commits since v2.10.5
- `1b6f361` Release 2.10.6
v2.10.5
Highlights
- Added optional FlashAttention-2 / PyTorch SDPA acceleration for triangle attention and pair-biased attention behind a new `--flash_attn` CLI flag (off by default). It reduces the attention memory footprint and speeds up inference on Ampere+ GPUs while remaining numerically equivalent to the manual einsum path within float-precision tolerance.
- Fixed cuequivariance triangle attention kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported. This extends the v2.10.3 fix, which covered only triangle multiplication, to the analogous triangle attention path. The crash was discovered during FlashAttention-2 validation on a GCP L4 with `cuequivariance_torch` not installed and `use_kernels=True` (the default on Ampere+ GPUs).
Details
FlashAttention-2 / SDPA acceleration
- The new `--flash_attn` flag for `boltz predict` plumbs through to `Attention` (pair-biased) and `TriangleAttention` (starting/ending node) via a new `use_sdpa` / `use_flash_attn` parameter.
- Implemented via `torch.nn.functional.scaled_dot_product_attention`, with broadcast-shaped biases reshaped to a single fully expanded `(B_eff, H, N_q, N_k)` mask. Q is pre-scaled before the kernel call, and the kernel is called with `scale=1.0` to avoid double-scaling.
- SDPA handles its own memory paging, so chunking is skipped when the SDPA path is enabled.
- Parity tests verify numerical equivalence with the einsum path for `AttentionPairBias`, `TriangleAttentionStartingNode`, and `TriangleAttentionEndingNode`, including partial-mask and multiplicity edge cases.
- Smoke tests guard the end-to-end inference flow when the flag is set.
Triangle attention kernel fallback
- The `kernel_triangular_attn` call site now routes imports and dispatch through a `_kernel_or_none` helper.
- Recoverable failures (`ImportError`, e.g. `cuequivariance_torch` not installed, and the known `triangle_attention` "Failed to import Triton-based component" / "Not Supported" runtime errors) are treated as fallback signals: a one-time `RuntimeWarning` is emitted, a module-level flag is latched, and the call drops through to the PyTorch path. Subsequent calls skip the kernel attempt entirely.
- Unexpected kernel errors still surface (no silent swallowing).
- New unit tests mirror the v2.10.3 `triangular_mult` fallback tests, covering the recoverable error path, a bare `ModuleNotFoundError`, unexpected error propagation, and a numerical match between the fallback output and `use_kernels=False`.
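The warn-once, latch-and-fall-through pattern can be sketched without any GPU dependencies. Everything here is hypothetical: `attention_with_fallback`, `_KERNEL_DISABLED`, and the stand-in kernel are illustrative names, not the fork's `_kernel_or_none` helper, and the message list simply reuses the two error strings quoted above.

```python
import warnings

_KERNEL_DISABLED = False  # latched after the first recoverable failure
_RECOVERABLE_MESSAGES = ("Failed to import Triton-based component", "Not Supported")


def _is_recoverable(exc: Exception) -> bool:
    if isinstance(exc, ImportError):  # also covers ModuleNotFoundError
        return True
    return isinstance(exc, RuntimeError) and any(m in str(exc) for m in _RECOVERABLE_MESSAGES)


def attention_with_fallback(kernel_fn, torch_fn, *args):
    """Try the fused kernel; on a known failure, warn once and use torch_fn from then on."""
    global _KERNEL_DISABLED
    if not _KERNEL_DISABLED:
        try:
            return kernel_fn(*args)
        except Exception as exc:
            if not _is_recoverable(exc):
                raise  # unexpected kernel errors still surface
            _KERNEL_DISABLED = True
            warnings.warn(f"Triangle attention kernel unavailable ({exc}); using PyTorch path.", RuntimeWarning)
    return torch_fn(*args)


calls = {"kernel": 0}


def broken_kernel(x):
    calls["kernel"] += 1
    raise ModuleNotFoundError("No module named 'cuequivariance_torch'")


def reference(x):
    return x * 2


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    first = attention_with_fallback(broken_kernel, reference, 3)
    second = attention_with_fallback(broken_kernel, reference, 4)
```

After the two calls, the kernel has been attempted exactly once and a single `RuntimeWarning` was recorded; the second call goes straight to the reference path.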
Misc
- Clarified in the CLI help that `--step_scale` has different defaults for Boltz-1 and Boltz-2.
- Fixed a mask-shape bug in the `AttentionPairBias` FlashAttention-2 benchmark script (development-only).
- Added `.claude/` to `.gitignore`.
Commits since v2.10.4
- `6f93ea6` Release 2.10.5
- `e48849c` Fix mask shape in AttentionPairBias benchmark
- `f9b775e` Ignore .claude/ worktree metadata
- `21f8400` Add SDPA/FlashAttention-2 parity and smoke tests
- `52daeb8` fix: correct bias reshape in `_attention_sdpa` for broadcast-shaped tensors
- `92b9ff8` feat: add FlashAttention-2 / SDPA acceleration via `--flash_attn` flag
- `275da19` Clarify `--step_scale` default differs between Boltz-1 and Boltz-2
v2.10.4
Patch release for Boltz-2 validation/fine-tuning.
- Fixed Boltz-2 validation/fine-tuning crashes caused by `validate_structure` not being stored as `self.validate_structure`.
- Added focused regression coverage for the Boltz-2 constructor behavior.
- Updated README release notes and test-count comparison: 266 tests in this fork vs. 5 in current upstream.
v2.10.3
Highlights
- Fixed cuequivariance triangle multiplication kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported.
- Fixed forced contact restraints losing all signal when union weighting underflowed for large distance violations.
- Fixed an incorrect contact-union gradient sign so soft-OR contact restraints apply gradient pressure with the correct magnitude across union members.
Details
- Triangle multiplication now treats known `triangle_multiplicative_update` import/support failures as recoverable, warns once, and disables both the incoming and outgoing kernel paths for the rest of the process.
- Unexpected kernel errors still surface instead of being silently swallowed.
- Contact-union weighting now uses numerically stable normalization to avoid float32 underflow on large violations.
- Contact-union gradients now match finite-difference checks, and negative union indices are rejected clearly.
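The underflow fix follows the usual shift-before-exponentiate trick. The sketch below assumes a softmin-style weighting over union members (an assumption; the notes do not spell out the fork's exact formula) and uses Python floats, which are float64, so the violations are larger than the ones that would already underflow in float32.

```python
import math


def naive_union_weights(violations, temperature=1.0):
    exps = [math.exp(-v / temperature) for v in violations]
    total = sum(exps)
    # None signals that every term underflowed to 0.0 and the weights are undefined.
    return [e / total for e in exps] if total > 0 else None


def stable_union_weights(violations, temperature=1.0):
    # Shift by the smallest violation so the largest term is exp(0) = 1.
    v_min = min(violations)
    exps = [math.exp(-(v - v_min) / temperature) for v in violations]
    total = sum(exps)
    return [e / total for e in exps]


large = [800.0, 810.0]  # both exp(-v) terms underflow to 0.0 in float64
small = [1.0, 2.0]      # both versions agree when nothing underflows
```

The shift cancels in the ratio, so the stable weights are mathematically identical to the naive ones; they just never collapse to `0 / 0`, which is what "losing all signal" looks like for large violations.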
Acknowledgment
Thanks to upstream PR jwohlwend#682 for identifying the kernel fallback and contact restraint areas that motivated these community fixes.
Commits since v2.10.2
- `7032160` Fix contact union gradients and kernel fallback
- `94b0d86` Bump version to 2.10.3
v2.10.2
Highlights
- Fixed direct use of a Boltz-2 model built from checkpoint hyperparameters when `steering_args` is missing.
- Sparse steering dictionaries now have known defaults filled in, while unknown keys raise a clear error.
Details
- `AtomDiffusion.sample()` now treats missing `steering_args` as disabled steering, matching the default CLI inference path.
- Known missing steering keys are filled with safe defaults, so sparse checkpoint hyperparameters no longer crash sampling.
- Unknown steering argument keys now raise `ValueError` to avoid silently ignoring misspellings such as `fk_steeering`.
- Added regression coverage for missing `steering_args` and unknown steering keys.
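The fill-defaults-but-reject-unknowns rule is a small dictionary merge. In this sketch the default table is hypothetical: `fk_steering` is only inferred from the `fk_steeering` misspelling above, `num_particles` is illustrative, and the real key set and values live in the fork's code.

```python
from typing import Any, Optional

# Hypothetical defaults for illustration; the real key set and values differ.
STEERING_DEFAULTS: dict[str, Any] = {
    "fk_steering": False,
    "num_particles": 3,
}


def resolve_steering_args(steering_args: Optional[dict[str, Any]]) -> dict[str, Any]:
    if steering_args is None:
        # Missing steering_args means steering is disabled: all defaults.
        return dict(STEERING_DEFAULTS)
    unknown = set(steering_args) - set(STEERING_DEFAULTS)
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise ValueError(f"Unknown steering argument(s): {sorted(unknown)}")
    # Sparse dicts keep their explicit values and inherit the rest.
    return {**STEERING_DEFAULTS, **steering_args}
```

Checking for unknown keys before merging is what turns a misspelling into an immediate error rather than a run that quietly used the default.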
Commits since v2.10.1
- `3475daa` Fix Boltz-2 missing steering args
v2.10.1
Highlights
- Fixed YAML `bond` constraints for custom cross-residue covalent bonds, including ACE-CY3 cyclization-style inputs.
- Explicit cross-residue covalent bonds now receive atom-level bond-length bounds for physical guidance.
- mmCIF output now preserves those cross-residue covalent links as `_struct_conn` records.
Details
- User-provided YAML `bond` constraints now add explicit RDKit-style bond bounds between the referenced atoms, including cross-residue modified-residue bonds that were previously only recorded as graph connections.
- Unknown or placeholder element radii fall back to a loose generic bond-length range and log a warning with the affected atom names.
- Bond constraints that reference unknown chains, residues, or atoms now raise a clearer `ValueError` instead of a bare `KeyError`.
- The mmCIF writer now emits `_struct_conn` rows for Boltz-1 `Structure.connections` and Boltz-2 covalent cross-residue bonds, including auth atom IDs and measured endpoint distances.
- Existing `_struct_conn` loops are preserved and extended without clobbering prior rows, including quoted or colliding `covaleN` IDs.
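Extending an existing loop without clobbering rows mostly comes down to picking fresh `covaleN` IDs. This is a hypothetical sketch of one way to do it (continue numbering after the largest existing ID, after stripping mmCIF quoting); `next_covale_ids` is not a function from the fork, and the fork may allocate IDs differently.

```python
import re


def next_covale_ids(existing_ids, count):
    """Allocate `count` covaleN IDs that cannot collide with existing rows."""
    used = set()
    for raw in existing_ids:
        ident = raw.strip().strip("'\"")  # mmCIF values may be quoted
        m = re.fullmatch(r"covale(\d+)", ident)
        if m:
            used.add(int(m.group(1)))
    # Continue after the largest existing index so prior rows are never reused.
    start = max(used, default=0) + 1
    return [f"covale{start + i}" for i in range(count)]
```

Non-`covaleN` IDs such as `disulf1` are ignored for numbering, and a quoted `'covale2'` is treated as the same ID as `covale2`.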
Commits since v2.10.0
- `72fab31` Expand README bug fix notes
- `57d9655` Fix explicit cross-residue bond constraints
- `4c6c3e9` Bump version to 2.10.1
v2.10.0
Highlights
- Added `--batch_size` for Boltz-2 structure inference so multiple inputs can be processed per prediction batch.
- Added repeated-binder affinity support: requesting affinity for one copy of a repeated ligand entity now works, with one output per binder copy.
- Added summarized PAE fields to `confidence_*.json`: `complex_pae`, `complex_ipae`, `chains_pae`, and `pair_chains_pae`.
Details
- Boltz-2 structure prediction now supports batched inference. Current limits: affinity prediction remains single-record, and guided inference (`--use_potentials` / contact guidance) stays restricted to `--batch_size 1`.
- Repeated-binder affinity requests now fan out over ligand copies during the affinity stage, while the structure prediction pass stays unchanged. Outputs are written per binder copy as `affinity_<record>_<chain>.json`.
- PAE summaries are now written into `confidence_*.json`, remain available with `--no_write_full_pae`, use contact-weighted aggregation like PDE, and serialize undefined values as `null`.
- Fixed a nested confidence aggregation bug in sequential mode that was breaking affinity GPU smoke tests after the PAE summary change.
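Contact-weighted aggregation with a `null` for undefined summaries can be sketched as a weighted mean that returns `None` when no pair carries weight. The weighting here (a per-pair contact weight multiplying PAE) is an assumption modeled on the "like PDE" description, the matrices are toy data, and `summarize` only fills two of the four fields.

```python
import json
from typing import Optional


def weighted_mean(pae, contact, pairs) -> Optional[float]:
    """Contact-weighted mean PAE over the given token pairs; None if no weight."""
    num = sum(contact[i][j] * pae[i][j] for i, j in pairs)
    den = sum(contact[i][j] for i, j in pairs)
    return num / den if den > 0 else None


def summarize(pae, contact, chains):
    n = len(chains)
    all_pairs = [(i, j) for i in range(n) for j in range(n)]
    inter = [(i, j) for i, j in all_pairs if chains[i] != chains[j]]
    return {
        "complex_pae": weighted_mean(pae, contact, all_pairs),
        # Interface PAE is undefined for a single-chain input.
        "complex_ipae": weighted_mean(pae, contact, inter),
    }


pae = [[0.5, 4.0], [6.0, 0.5]]      # toy 2-token PAE matrix
contact = [[1.0, 0.5], [0.5, 1.0]]  # toy contact weights
print(json.dumps(summarize(pae, contact, ["A", "A"])))
# {"complex_pae": 2.0, "complex_ipae": null}
```

Returning `None` rather than `0.0` or `NaN` keeps the JSON valid and makes "no interface" distinguishable from "perfect interface".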
Commits since v2.9.5
- `ecf5067` Add contact-weighted PAE summaries to confidence JSON
- `1396534` Document PAE summary fix in README
- `992034e` Add batched Boltz-2 structure inference
- `0895889` Support affinity prediction for repeated binders
- `b025b6b` Fix nested confidence aggregation in sequential mode
- `d3832aa` Bump version to 2.10.0
v2.9.5
What's changed
- Corrective release after `v2.9.4`: removes the manual MPS GitHub Actions workflow from `main` via revert commit `5965429`.
- Keeps the template PDB parsing fix from `v2.9.4`, including fallback parsing for atom-only or metadata-light PDB template files.
- Bumps the package version to `2.9.5` so the latest release points at the corrected branch state.
Included commits since v2.9.4
- `5965429` Revert "Add manual MPS workflow for self-hosted runner"
- `7ebf1be` Bump version to 2.9.5
Validation
`pytest tests/data/parse/test_mmcif_parser.py tests/data/parse/test_schema.py -q` passed: 16 tests.