Releases: Novel-Therapeutics/boltz-community

v2.10.8 — first PyPI release

29 May 08:10

Highlights

  • First PyPI release. boltz-community is now installable from PyPI: pip install boltz-community and uv add boltz-community both work without a git URL (#12).
  • Releases are now tag-driven via GitHub Actions + PyPI Trusted Publisher (OIDC). No long-lived API tokens are stored anywhere; each publish is gated by a required-reviewer rule on the pypi GitHub environment.

Install

# from PyPI (new, recommended)
pip install boltz-community
pip install "boltz-community[cuda]"     # with NVIDIA CUDA kernels

# uv works the same way
uv add boltz-community

# bleeding-edge from main (still supported)
pip install "boltz-community @ git+https://github.com/Novel-Therapeutics/boltz-community.git"

Details

What changed

  • pyproject.toml metadata (6de6485): added description, explicit license = MIT, authors (Wohlwend / Corso / Passaro — upstream authors), maintainers (this fork), keywords, 10 standard classifiers, and a [project.urls] block with Homepage / Source / Issues / Changelog / Upstream. The fork relationship is now discoverable from the PyPI page itself.
  • Tag-driven release workflow (fe42db2): new .github/workflows/release.yml fires on v*.*.* tag pushes (and manual workflow_dispatch). Two jobs:
    • build — sdist + wheel via python -m build, validated with twine check, uploaded as a 7-day GitHub artifact.
    • publish — downloads the artifacts and publishes via pypa/gh-action-pypi-publish using PyPI Trusted Publisher. Requires id-token: write (OIDC) and runs inside the pypi GitHub environment so PyPI can verify the workflow identity.
  • Releasing section in README (fe42db2): documents the one-time PyPI Trusted Publisher binding, the GitHub environment setup, the per-release flow as a 7-step checklist, and recovery guidance for failed publishes.

Fork etiquette

Standard fork-etiquette points that this release makes explicit on the published artifact:

  • LICENSE is preserved verbatim — upstream MIT copyright (Wohlwend, Corso, Passaro 2024) is unchanged.
  • PyPI Author field lists the upstream authors; Maintainer field lists this fork. Bug reports go to our issue tracker, not theirs.
  • The Cite section in the README continues to point at the original Boltz-1, Boltz-2, and ColabFold papers.
  • [project.urls].Upstream points at jwohlwend/boltz so anyone landing on the PyPI page sees the relationship at a glance.

Commits since v2.10.7

  • 0cbe41c Release 2.10.8 — first PyPI publish
  • fe42db2 Add tag-driven PyPI release workflow + Releasing docs
  • 6de6485 Fill out PyPI metadata in pyproject.toml

v2.10.7

28 May 20:00

Highlights

  • Fixed affinity prediction running ~5× slower than necessary at default settings. Upstream hardcoded max_parallel_samples=1 for the affinity diffusion path, which was silently a no-op under upstream's buggy chunking math. Our earlier diffusion fix made that hardcode actually take effect, forcing N sequential single-sample forward passes per affinity record. The affinity path now honors the user's --max_parallel_samples, capped at --diffusion_samples_affinity.

Details

Background — how this regression slipped in

Upstream's chunking math in diffusionv2.py was:

sample_ids_chunks = sample_ids.chunk(multiplicity % max_parallel_samples + 1)

tensor.chunk(n) splits a tensor into n chunks, not into chunks of size n. For affinity (multiplicity=5, max_parallel_samples=1) this collapsed to chunk(1) — one chunk of 5 samples, all batched into a single forward pass. The hardcoded =1 upstream put on the affinity path never actually took effect.

Our v2.10.x diffusion fix replaced that with the obviously correct:

sample_ids_chunks = sample_ids.split(max_parallel_samples)

That fix was independently correct — it made --max_parallel_samples behave as documented in divisible cases — but it changed the semantics under the affinity caller, which was unknowingly tuned against the broken behavior. With the fix in place, the hardcoded 1 started doing what it said: 5 sequential single-sample passes per affinity record.
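
The difference between the two calls can be reproduced without a GPU. The sketch below uses pure-Python stand-ins for Tensor.chunk and Tensor.split, so it models chunk sizes only. The helper names chunk_like, split_like, upstream_chunks, and fixed_chunks are illustrative and do not exist in the fork.

```python
import math


def chunk_like(items, n_chunks):
    """Stand-in for Tensor.chunk(n): split into at most n_chunks pieces."""
    size = math.ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_like(items, size):
    """Stand-in for Tensor.split(size): pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def upstream_chunks(multiplicity, max_parallel_samples):
    # Upstream expression: the argument is a chunk COUNT, not a chunk size.
    sample_ids = list(range(multiplicity))
    return chunk_like(sample_ids, multiplicity % max_parallel_samples + 1)


def fixed_chunks(multiplicity, max_parallel_samples):
    # Fixed expression: the argument is the maximum chunk size.
    sample_ids = list(range(multiplicity))
    return split_like(sample_ids, max_parallel_samples)


upstream_sizes = [len(c) for c in upstream_chunks(5, 1)]
fixed_sizes = [len(c) for c in fixed_chunks(5, 1)]
print(upstream_sizes)  # [5]
print(fixed_sizes)     # [1, 1, 1, 1, 1]
```

With max_parallel_samples=2 the same stand-ins give chunk sizes 3 and 2 for the upstream form, which exceeds the requested limit, and 2, 2, 1 for the split form.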

What changed

  • New helper _resolve_affinity_max_parallel_samples(max_parallel_samples, diffusion_samples_affinity) in src/boltz/main.py:
    • Honors the CLI --max_parallel_samples for affinity, matching the behavior already used for structure prediction.
    • Caps at diffusion_samples_affinity so we never claim to batch more affinity samples than diffusion will actually run.
    • Treats max_parallel_samples=None the same way the diffusion code does ("use multiplicity") — relevant for programmatic callers of predict.callback(...) that pass None.
  • predict_affinity_args calls the helper instead of hardcoding 1.

Behavior matrix

| --max_parallel_samples | --diffusion_samples_affinity | Resolved affinity max | Affinity behavior |
|---|---|---|---|
| 5 (default) | 5 (default) | 5 | 1 batched forward pass (matches upstream's effective pre-fix behavior) |
| 2 | 5 | 2 | User override respected |
| 50 | 5 | 5 | Capped, no inflated work |
| 1 | 5 | 1 | Explicit single-sample preserved |
| None | 5 | 5 | Falls back to diffusion sample count |
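
The matrix above is consistent with a simple "cap, with None meaning the full sample count" rule. The following is a minimal sketch of that rule, not the fork's actual helper; the function name without the leading underscore and the exact body are assumptions.

```python
from typing import Optional


def resolve_affinity_max_parallel_samples(
    max_parallel_samples: Optional[int],
    diffusion_samples_affinity: int,
) -> int:
    # None means "use multiplicity", which for affinity is the sample count.
    if max_parallel_samples is None:
        return diffusion_samples_affinity
    # Never report more parallel samples than diffusion will actually run.
    return min(max_parallel_samples, diffusion_samples_affinity)


for requested in (5, 2, 50, 1, None):
    print(requested, "->", resolve_affinity_max_parallel_samples(requested, 5))
```

Handling None before the comparison is what avoids the TypeError that a bare min or < on None would raise.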

Tests

Six new unit tests in tests/test_cli.py::TestAffinityMaxParallelSamples:

  • Default CLI setting batches all samples.
  • User overrides below the cap are honored.
  • User overrides above the cap are capped.
  • Explicit 1 is preserved.
  • None falls back to diffusion_samples_affinity (regression guard against TypeError: '<' not supported between 'NoneType' and 'int').
  • Source regression guard: scans main.py for the predict_affinity_args block and fails if anyone re-hardcodes "max_parallel_samples": 1 there or removes the helper.

Total fork test count is now 283.

What did not change

  • The chunking primitive itself is correct; this only fixes the call site that was implicitly relying on the old broken behavior.
  • Structure prediction was already routing --max_parallel_samples correctly.
  • Numerical outputs are unchanged — this is purely a parallelism / wall-clock fix.

Commits since v2.10.6

  • 5103619 Release 2.10.7

v2.10.6

28 May 10:00

Highlights

  • Fixed training-data preprocessing so records produced by scripts/process/rcsb.py actually carry the cluster_id, msa_id, and entity_id that downstream training expects.

This is a training-only fix and does not affect inference users. If you have run boltz predict against the released model weights, nothing here changes for you. If you used scripts/process/rcsb.py to build a custom training set, the fields below were silently wrong and your trained models should be re-trained against re-preprocessed records.

Details

Upstream issue jwohlwend/boltz#686 reported concrete problems in the training preprocessing pipeline; the three listed below reproduced unchanged in this fork.

  • cluster_id was always -1. scripts/process/cluster.py keys its clustering.json output by hash_sequence(seq) for polymers and by CCD code for ligands. scripts/process/rcsb.py was looking up f"{pdb_id}_{entity_id}", which never hits, so every chain fell through to cluster_id=-1 and ClusterSampler weighted everything uniformly instead of by cluster size.
  • msa_id was always "". The # FIX comment in upstream rcsb.py was literal — training records pointed at no MSA file, so training silently ran on single-sequence inputs with no MSA features whenever those records were loaded.
  • entity_id was dropped. Records ended up with entity_id: null.

What changed

  • scripts/process/rcsb.py now computes the per-chain cluster key the same way cluster.py emits it: hash_sequence(seq) for polymers, lowercased CCD code for ligands. The two scripts now share hash_sequence instead of carrying parallel definitions.
  • msa_id is populated with hash_sequence(seq) for protein chains (and stays empty for DNA/RNA/ligands, which src/boltz/data/module/training.py treats as "no MSA").
  • entity_id is propagated from the parsed chain through to ChainInfo.
  • scripts/process/mmcif.py exposes a new chain_to_seq: dict[str, str] field on ParsedStructure so rcsb.py can hash polymer sequences without re-parsing the mmCIF.
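
The cluster_id mismatch comes down to two scripts disagreeing on a dictionary key. Here is a small sketch of the before and after lookups, assuming a SHA-256 content hash for hash_sequence (the real hash function is not shown in these notes) and made-up cluster data; old_cluster_id and new_cluster_id are illustrative names.

```python
import hashlib


def hash_sequence(seq: str) -> str:
    # Assumption: any stable content hash works for this illustration.
    return hashlib.sha256(seq.encode()).hexdigest()


# Clustering output keyed by sequence hash (polymers) or CCD code (ligands).
# The sequence, CCD code, and cluster numbers are invented for the example.
clustering = {hash_sequence("MKTAYIAKQR"): 7, "atp": 12}


def old_cluster_id(pdb_id: str, entity_id: int) -> int:
    # Pre-fix lookup: this key shape never appears in `clustering`.
    return clustering.get(f"{pdb_id}_{entity_id}", -1)


def new_cluster_id(seq_or_ccd: str, is_polymer: bool) -> int:
    # Post-fix lookup: build the key the same way the clustering step does.
    key = hash_sequence(seq_or_ccd) if is_polymer else seq_or_ccd.lower()
    return clustering.get(key, -1)


print(old_cluster_id("1abc", 1))           # -1
print(new_cluster_id("MKTAYIAKQR", True))  # 7
print(new_cluster_id("ATP", False))        # 12
```

Sharing one hash_sequence definition between producer and consumer removes the chance of the two key formats drifting apart again.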

What did not change

  • cluster.py output format is preserved. Keying clusters by sequence content is the right design — a sequence shared across PDB entries should cluster together regardless of which pdb_id_entity_id it appears under.
  • Branched-ligand handling uses the first residue's CCD as the cluster key, matching the upstream simplification.
  • The published Boltz-1 / Boltz-2 weights are unaffected — upstream presumably trained with a private pipeline (the # FIX comment in the open-sourced rcsb.py suggests the released script was never the one they actually used).

Commits since v2.10.5

  • 1b6f361 Release 2.10.6

v2.10.5

27 May 15:52

Highlights

  • Added optional FlashAttention-2 / PyTorch SDPA acceleration for triangle attention and pair-biased attention behind a new --flash_attn CLI flag (off by default). Reduces attention memory footprint and speeds up inference on Ampere+ GPUs while remaining numerically equivalent to the manual einsum path within float-precision tolerance.
  • Fixed cuequivariance triangle attention kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported. This extends the v2.10.3 fix, which covered only triangle multiplication, to the analogous triangle attention path. The crash was discovered during FlashAttention-2 validation on a GCP L4 with cuequivariance_torch not installed and use_kernels=True (the default on Ampere+ GPUs).

Details

FlashAttention-2 / SDPA acceleration

  • New --flash_attn flag for boltz predict plumbs through to Attention (pair-biased) and TriangleAttention (starting/ending node) via a new use_sdpa / use_flash_attn parameter.
  • Implemented via torch.nn.functional.scaled_dot_product_attention with broadcast-shaped biases reshaped to a single fully-expanded (B_eff, H, N_q, N_k) mask. Q is pre-scaled before the kernel call (scale=1.0) to avoid double-scaling.
  • SDPA manages attention memory internally, so the manual chunking is skipped when the SDPA path is enabled.
  • Parity tests verify numerical equivalence with the einsum path for AttentionPairBias, TriangleAttentionStartingNode, and TriangleAttentionEndingNode, including partial-mask and multiplicity edge cases.
  • Smoke tests guard the end-to-end inference flow when the flag is set.
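
The pre-scaling point can be checked with a tiny reference attention in plain Python. This is a sketch of the arithmetic only, not the fork's _attention_sdpa code; attention_weights and the sample vectors are made up for illustration. The scale argument plays the role of the scale parameter passed to the kernel.

```python
import math


def attention_weights(q, keys, bias, scale):
    """Softmax over scale * (q . k) + bias for a single query vector."""
    logits = [scale * sum(a * b for a, b in zip(q, k)) + b_i
              for k, b_i in zip(keys, bias)]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


q = [1.0, 2.0, 3.0, 4.0]
keys = [[0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3], [1.0, 0.0, 1.0, 0.0]]
bias = [0.0, -1.0, 0.5]
inv_sqrt_d = 1.0 / math.sqrt(len(q))

# Manual path: unscaled Q, scaling applied to the logits.
reference = attention_weights(q, keys, bias, scale=inv_sqrt_d)

# Pre-scaled Q with scale=1.0 reproduces the manual path.
q_scaled = [x * inv_sqrt_d for x in q]
prescaled = attention_weights(q_scaled, keys, bias, scale=1.0)

# Pre-scaled Q plus the default scale applies the factor twice.
double_scaled = attention_weights(q_scaled, keys, bias, scale=inv_sqrt_d)

print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(reference, prescaled)))
print(any(abs(a - b) > 1e-3 for a, b in zip(reference, double_scaled)))
```

The bias is added after scaling in all three cases, which is why the factor has to be applied to Q (or the logits) exactly once and never to the bias.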

Triangle attention kernel fallback

  • The kernel_triangular_attn call site now wraps imports/dispatch through a _kernel_or_none helper.
  • Recoverable failures — ImportError (e.g. cuequivariance_torch not installed) and known triangle_attention "Failed to import Triton-based component" / "Not Supported" runtime errors — are treated as fallback signals: a one-time RuntimeWarning is emitted, a module-level flag is latched, and the call drops through to the PyTorch path. Subsequent calls skip the kernel attempt entirely.
  • Unexpected kernel errors still surface (no silent swallowing).
  • New unit tests mirror the v2.10.3 triangular_mult fallback tests, including: recoverable error path, bare ModuleNotFoundError, unexpected error propagation, and numerical match between fallback output and use_kernels=False.
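
The fallback behavior described above follows a "try once, latch, then skip" pattern. Below is a minimal sketch of that pattern with plain callables standing in for the kernel and the PyTorch path; run_with_fallback, _kernel_disabled, and RECOVERABLE_MESSAGES are hypothetical names, and the fork's _kernel_or_none helper may be structured differently.

```python
import warnings

# Module-level latch, set after the first recoverable failure.
_kernel_disabled = False

# Substrings of runtime errors that are treated as "kernel not usable here".
RECOVERABLE_MESSAGES = (
    "Failed to import Triton-based component",
    "Not Supported",
)


def run_with_fallback(kernel_fn, torch_fn, *args):
    """Try the kernel once; on a recoverable failure, warn and use torch_fn."""
    global _kernel_disabled
    if not _kernel_disabled:
        try:
            return kernel_fn(*args)
        except ImportError as err:  # also covers ModuleNotFoundError
            reason = err
        except RuntimeError as err:
            if not any(msg in str(err) for msg in RECOVERABLE_MESSAGES):
                raise  # unexpected kernel errors must surface
            reason = err
        _kernel_disabled = True
        warnings.warn(
            f"kernel unavailable, using PyTorch path: {reason}",
            RuntimeWarning,
        )
    return torch_fn(*args)
```

Because the latch is checked before the try block, later calls go straight to the fallback without paying for another failed import or dispatch.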

Misc

  • Clarified that --step_scale has different defaults for Boltz-1 vs Boltz-2 in CLI help.
  • Fixed a mask-shape bug in the AttentionPairBias FlashAttn-2 benchmark script (development-only).
  • Added .claude/ to .gitignore.

Commits since v2.10.4

  • 6f93ea6 Release 2.10.5
  • e48849c Fix mask shape in AttentionPairBias benchmark
  • f9b775e Ignore .claude/ worktree metadata
  • 21f8400 Add SDPA/FlashAttention-2 parity and smoke tests
  • 52daeb8 fix: correct bias reshape in _attention_sdpa for broadcast-shaped tensors
  • 92b9ff8 feat: add FlashAttention-2 / SDPA acceleration via --flash_attn flag
  • 275da19 Clarify --step_scale default differs between Boltz-1 and Boltz-2

v2.10.4

22 May 13:16

Patch release for Boltz-2 validation/fine-tuning.

  • Fixed Boltz-2 validation/fine-tuning crashes caused by validate_structure not being stored as self.validate_structure.
  • Added focused regression coverage for the Boltz-2 constructor behavior.
  • Updated README release notes and test-count comparison: 266 tests in this fork vs. 5 in current upstream.
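
The class of bug fixed here is a constructor argument that is accepted but never stored. A toy illustration follows, with hypothetical BuggyModel and FixedModel classes standing in for the real Boltz-2 module; the assumption is that the crash surfaced as an AttributeError when validation first read the attribute.

```python
class BuggyModel:
    def __init__(self, validate_structure: bool = True):
        pass  # argument accepted but never stored on the instance

    def validation_step(self):
        return self.validate_structure  # fails only when validation runs


class FixedModel:
    def __init__(self, validate_structure: bool = True):
        self.validate_structure = validate_structure

    def validation_step(self):
        return self.validate_structure


try:
    BuggyModel().validation_step()
    buggy_error = None
except AttributeError as err:
    buggy_error = err

print(type(buggy_error).__name__)    # AttributeError
print(FixedModel().validation_step())  # True
```

A constructor-level regression test catches this earlier than an end-to-end run, since inference never touches the validation path.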

v2.10.3

21 May 06:40

Highlights

  • Fixed cuequivariance triangle multiplication kernel crashes by falling back to PyTorch when kernels are unavailable or unsupported.
  • Fixed forced contact restraints losing all signal when union weighting underflowed for large distance violations.
  • Fixed an incorrect contact-union gradient sign so soft-OR contact restraints apply gradient pressure with the correct magnitude across union members.

Details

  • Triangle multiplication now treats known triangle_multiplicative_update import/support failures as recoverable, warns once, and disables both incoming/outgoing kernel paths for the rest of the process.
  • Unexpected kernel errors still surface instead of being silently swallowed.
  • Contact-union weighting now uses numerically stable normalization to avoid float32 underflow on large violations.
  • Contact-union gradients now match finite-difference checks, and negative union indices are rejected clearly.
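
The underflow problem can be illustrated with a generic softmin-style weighting. The exact weighting formula used for contact unions is not given in these notes, so the functions below are an assumed stand-in, and the names naive_union_weights and stable_union_weights are hypothetical. Python floats are float64, so the example uses larger violations than would be needed to underflow float32.

```python
import math


def naive_union_weights(violations, temperature=1.0):
    exps = [math.exp(-v / temperature) for v in violations]
    total = sum(exps)
    # When every exponential underflows to 0.0 the weights carry no signal.
    return [e / total if total > 0.0 else 0.0 for e in exps]


def stable_union_weights(violations, temperature=1.0):
    smallest = min(violations)
    # Shifting by the smallest violation keeps the largest term at exp(0) = 1.
    exps = [math.exp(-(v - smallest) / temperature) for v in violations]
    total = sum(exps)
    return [e / total for e in exps]


large = [900.0, 905.0, 910.0]
naive = naive_union_weights(large)
stable = stable_union_weights(large)
print(naive)        # [0.0, 0.0, 0.0]
print(sum(stable))  # close to 1.0
```

The shift cancels in the ratio, so the stable form returns the same weights as the naive form whenever the naive form does not underflow.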

Acknowledgment

Thanks to upstream PR jwohlwend#682 for identifying the kernel fallback and contact restraint areas that motivated these community fixes.

Commits since v2.10.2

  • 7032160 Fix contact union gradients and kernel fallback
  • 94b0d86 Bump version to 2.10.3

v2.10.2

15 May 00:50

Highlights

  • Fixed direct Boltz-2 model use from checkpoint hyperparameters when steering_args is missing.
  • Sparse steering dictionaries now fill known defaults while unknown keys raise a clear error.

Details

  • AtomDiffusion.sample() now treats missing steering_args as disabled steering, matching the default CLI inference path.
  • Known missing steering keys are filled with safe defaults so sparse checkpoint hyperparameters do not crash sampling.
  • Unknown steering argument keys now raise ValueError to avoid silently ignoring misspellings such as fk_steeering.
  • Added regression coverage for missing steering_args and unknown steering keys.
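
The "fill known defaults, reject unknown keys" rule can be sketched as follows. The key names and default values in DEFAULT_STEERING_ARGS are illustrative assumptions, not the fork's actual default table, and resolve_steering_args is a hypothetical helper name.

```python
# Illustrative defaults only; the real key set lives in the fork's source.
DEFAULT_STEERING_ARGS = {
    "fk_steering": False,
    "physical_guidance_update": False,
    "contact_guidance_update": False,
}


def resolve_steering_args(steering_args=None):
    """Return a complete steering dict, rejecting keys that are not known."""
    provided = dict(steering_args or {})
    unknown = sorted(set(provided) - set(DEFAULT_STEERING_ARGS))
    if unknown:
        raise ValueError(f"Unknown steering argument(s): {', '.join(unknown)}")
    # Provided values win; anything missing falls back to the default.
    return {**DEFAULT_STEERING_ARGS, **provided}


print(resolve_steering_args(None))
print(resolve_steering_args({"fk_steering": True}))
```

Passing {"fk_steeering": True} to this sketch raises ValueError instead of silently running with steering disabled.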

Commits since v2.10.1

  • 3475daa Fix Boltz-2 missing steering args

v2.10.1

02 May 00:21

Highlights

  • Fixed YAML bond constraints for custom cross-residue covalent bonds, including ACE-CY3 cyclization-style inputs.
  • Explicit cross-residue covalent bonds now receive atom-level bond-length bounds for physical guidance.
  • mmCIF output now preserves those cross-residue covalent links as _struct_conn records.

Details

  • User-provided YAML bond constraints now add explicit RDKit-style bond bounds between the referenced atoms, including cross-residue modified-residue bonds that were previously only recorded as graph connections.
  • Unknown or placeholder element radii fall back to a loose generic bond-length range and log a warning with the affected atom names.
  • Bond constraints that reference unknown chains, residues, or atoms now raise a clearer ValueError instead of a bare KeyError.
  • The mmCIF writer now emits _struct_conn rows for Boltz-1 Structure.connections and Boltz-2 covalent cross-residue bonds, including auth atom IDs and measured endpoint distances.
  • Existing _struct_conn loops are preserved and extended without clobbering prior rows, including quoted/colliding covaleN IDs.

Commits since v2.10.0

  • 72fab31 Expand README bug fix notes
  • 57d9655 Fix explicit cross-residue bond constraints
  • 4c6c3e9 Bump version to 2.10.1

v2.10.0

29 Apr 13:35

Highlights

  • Added --batch_size for Boltz-2 structure inference so multiple inputs can be processed per prediction batch.
  • Added repeated-binder affinity support: requesting affinity for one copy of a repeated ligand entity now works, with one output per binder copy.
  • Added summarized PAE fields to confidence_*.json: complex_pae, complex_ipae, chains_pae, and pair_chains_pae.

Details

  • Batched inference is now supported for Boltz-2 structure prediction.
    Current limits: affinity prediction remains single-record, and guided inference (--use_potentials / contact guidance) stays restricted to --batch_size 1.
  • Repeated-binder affinity requests now fan out over ligand copies during the affinity stage while keeping the structure prediction pass unchanged.
    Outputs are written per binder copy as affinity_<record>_<chain>.json.
  • PAE summaries are now written into confidence_*.json, remain available with --no_write_full_pae, use contact-weighted aggregation like PDE, and serialize undefined values as null.
  • Fixed a nested confidence aggregation bug in sequential mode that was breaking affinity GPU smoke tests after the PAE summary change.
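
The PAE summary behavior (weighted aggregation, with undefined values written as null) can be sketched in a few lines. The weights, values, and helper names contact_weighted_mean and to_json_value below are assumptions for illustration; only the field names come from the notes above.

```python
import json
import math


def contact_weighted_mean(values, weights):
    total = sum(weights)
    if total == 0:
        return float("nan")  # undefined: no contacts to weight by
    return sum(v * w for v, w in zip(values, weights)) / total


def to_json_value(x):
    # JSON has no NaN literal, so undefined summaries become null.
    return None if math.isnan(x) else x


summary = {
    "complex_pae": to_json_value(contact_weighted_mean([2.0, 4.0], [1.0, 3.0])),
    "complex_ipae": to_json_value(contact_weighted_mean([5.0], [0.0])),
}
encoded = json.dumps(summary)
print(encoded)  # {"complex_pae": 3.5, "complex_ipae": null}
```

Mapping NaN to None before serialization matters because json.dumps would otherwise emit a bare NaN token, which strict JSON parsers reject.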

Commits since v2.9.5

  • ecf5067 Add contact-weighted PAE summaries to confidence JSON
  • 1396534 Document PAE summary fix in README
  • 992034e Add batched Boltz-2 structure inference
  • 0895889 Support affinity prediction for repeated binders
  • b025b6b Fix nested confidence aggregation in sequential mode
  • d3832aa Bump version to 2.10.0

v2.9.5

16 Apr 18:49

What's changed

  • Corrective release after v2.9.4: removes the manual MPS GitHub Actions workflow from main via revert commit 5965429.
  • Keeps the template PDB parsing fix from v2.9.4, including fallback parsing for atom-only or metadata-light PDB template files.
  • Bumps the package version to 2.9.5 so the latest release points at the corrected branch state.

Included commits since v2.9.4

  • 5965429 Revert "Add manual MPS workflow for self-hosted runner"
  • 7ebf1be Bump version to 2.9.5

Validation

  • pytest tests/data/parse/test_mmcif_parser.py tests/data/parse/test_schema.py -q passed: 16 tests.