davinan/oadmet_structure


OpenADMET PXR Blind Challenge — Structure-Track Method Report

Submission: v10.1_failureproofed.zip (184 PDB files, ligand residue LIG). Each structure is the consensus-central ("medoid") pose, selected uniformly across five models (RF3, AlphaFold3, Boltz2, Protenix, ESMFold2) and then failure-proofed against the OST scorer's ligand-bond NaN mode. Baseline: openadmet_structure_v3.zip, the unmodified 3-model consensus output and the first file we submitted to the leaderboard.

TL;DR

For each of the 184 blinded fragments we predict the PXR–ligand complex with independent co-folding models across many seeds: the base trio (RF3, AlphaFold3, Boltz2) plus two architecturally independent models added in v10 (Protenix and ESMFold2). We then pick the structure most central to the set of models that agree (the consensus medoid). When the base trio does not agree, we fall back to the best cross-model pair. 180 / 184 (97.8 %) of submitted structures have ≥ 2 base-trio models agreeing within 1 Å, and v10's medoid additionally incorporates the two independent models on the 133 / 184 ligands where at least one of them joins the base agreement.

Pipeline

| Step | What |
| --- | --- |
| Inputs | PXR-LBD protein sequence (293 aa) + ligand SMILES from pxr-challenge_structure_TEST_BLINDED.csv (184 ligands). |
| Predictions | Base trio: RoseTTAFold3 (RF3), AlphaFold3 (AF3) and Boltz2, all given the same protein + ligand inputs, with ~150 seeds per model accumulated across rounds. v10 adds two architecturally independent co-folding models, Protenix and ESMFold2, run on all 184 ligands. |
| Iteration | Each round adds many fresh seeds × 3 models for any ligand that does not yet have a 3-way agreement. For some ligands it took more than 200 seeds to find a pose on which all 3 models agreed, but the success distribution started to converge after 200 seeds, so we usually stopped the pipeline at that point. |
| Agreement check | For every cross-model pair of structures of the same ligand: Kabsch-align on protein Cα, then compute the symmetry-aware heavy-atom ligand RMSD (RDKit graph-automorphism enumeration). A 3-way agreement is a (RF3, AF3, Boltz2) triplet where all three pairwise RMSDs are ≤ 1.0 Å. |
| Selection (medoid) | For 3-way-agreed ligands we pick the medoid: the structure minimising the mean cross-model RMSD to the whole agreed model set (i.e. the most consensus-central pose), rather than the single tightest triplet used by the v3.5 baseline. In v7 the representative is always AF3, because most benchmarks place it as slightly more accurate than RF3 and Boltz2; from v8 onward every model is treated equally. v10 extends the medoid set to five models (adding Protenix + ESMFold2 when they join the base agreement). For fallback ligands we keep the lowest-RMSD cross-model pair, preferring the AF3-side member (then RF3, then Boltz2). |
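
As a rough illustration of the agreement check, the sketch below computes a symmetry-aware ligand RMSD and tests for a 3-way agreement. It is not the repository's code. It assumes the poses have already been superposed on protein Cα atoms and that the ligand automorphisms are supplied as index permutations (in the real pipeline these would come from RDKit). The function names `symmetry_rmsd` and `is_three_way_agreement` are hypothetical.

```python
import math
from itertools import combinations


def symmetry_rmsd(ref, probe, automorphisms):
    """Smallest heavy-atom RMSD over the supplied atom permutations.

    ref, probe: lists of (x, y, z) tuples in the same frame
    (assumed to be superposed on protein C-alpha atoms upstream).
    automorphisms: iterable of tuples; perm[i] is the probe atom index
    matched to reference atom i (assumed to come from a graph-automorphism
    enumeration done elsewhere).
    """
    best = math.inf
    for perm in automorphisms:
        squared = sum(
            (a - b) ** 2
            for i, j in enumerate(perm)
            for a, b in zip(ref[i], probe[j])
        )
        best = min(best, math.sqrt(squared / len(ref)))
    return best


def is_three_way_agreement(poses, automorphisms, threshold=1.0):
    """True when every pair of models is within `threshold` angstroms."""
    return all(
        symmetry_rmsd(poses[a], poses[b], automorphisms) <= threshold
        for a, b in combinations(sorted(poses), 2)
    )
```

For a ligand whose two end atoms are interchangeable, passing both the identity and the swapped permutation lets a flipped but equivalent pose score 0 Å instead of being counted as a disagreement.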

Tiering of the 184 submitted structures (1.0 Å threshold)

| Tier | Definition | n | % |
| --- | --- | --- | --- |
| 3-model agreement | RF3 + AF3 + Boltz2 within 1.0 Å on a common triplet | 105 | 57.1 |
| 2-model agreement | No 3-way triplet, best cross-model pair ≤ 1.0 Å | 75 | 40.8 |
| Fallback (no agreement) | Best cross-model pair > 1.0 Å; lowest-RMSD pair taken anyway | 4 | 2.2 |
| Total | | 184 | 100 |

180 / 184 (97.8 %) of submitted structures have at least two of the three models agreeing within 1 Å (105 three-model + 75 two-model). Of the 79 ligands that did not reach 3-way agreement, 59 have a best cross-model pair ≤ 0.5 Å; these are tight pairs that only just missed the third model. The four true-fallback structures have best pairs in the range 1.0 – 2.17 Å.
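
The tier definitions above reduce to two comparisons per ligand. The sketch below is only an illustration. It assumes each ligand has already been summarised by the best triplet's largest pairwise RMSD (or `None` when no triplet exists) and by its best cross-model pair RMSD. The function name `assign_tier` is hypothetical.

```python
def assign_tier(best_triplet_max_rmsd, best_pair_rmsd, threshold=1.0):
    """Map a ligand's agreement RMSDs (in angstroms) onto the three tiers.

    best_triplet_max_rmsd: largest pairwise RMSD within the tightest
        (RF3, AF3, Boltz2) triplet, or None if no triplet was formed.
    best_pair_rmsd: RMSD of the tightest cross-model pair.
    """
    if best_triplet_max_rmsd is not None and best_triplet_max_rmsd <= threshold:
        return "3-model agreement"
    if best_pair_rmsd <= threshold:
        return "2-model agreement"
    return "Fallback (no agreement)"
```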

The agreement tiers above are a property of the base-trio consensus and are identical for every selection variant (v3.5, v7, v8, v10). The variants differ only in which structure is taken within each tier and, from v8, from which model. v10 additionally lets a Protenix or ESMFold2 pose be the representative when it joins the base agreement; at least one of them joins on 133 / 184 ligands.

Chosen-model distribution per variant:

| Variant | AF3 | RF3 | Boltz2 | Protenix | ESMFold2 | Note |
| --- | --- | --- | --- | --- | --- | --- |
| v10.1 (submitted) | 80 | 51 | 27 | 8 | 18 | v10 + OST failure-proofing: 4 base picks swapped to the nearest bond-clean pose (3× RF3→AF3, 1× AF3→AF3) |
| v10 | 77 | 54 | 27 | 8 | 18 | uniform 5-model-agnostic medoid across all 184; adds the two independent models |
| v8 (prior best) | 75 | 64 | 45 | | | model-agnostic medoid: the most-central structure of any base-trio model wins |
| v7 | 159 | 25 | 0 | | | AF3-only medoid; same mix as v3.5; beat v3.5 on the leaderboard |

Why set the threshold at 1 Å? The ligands in this dataset are small and rigid, so we judged the common 2 Å threshold too permissive; viewed in PyMOL, a 2 Å RMSD pose also looks much worse than a 1 Å one. Unfortunately, not every ligand has a pose on which all 3 models agree at this threshold. This suggests that the models have learned different distributions over protein–ligand complexes, which is why agreement among all 3 may be a good signal that the pose is correct.

v3 → v3.5 → v7 → v8 → v10 → v10.1

v3.5 (two-ligand Rosetta rescue). Two structures — x02859-1 and x03063-1 — failed initial OST scoring on v3, despite passing every structural sanity check (correct molecular graphs perceived from coordinates, no overlapping atoms, no ligand–protein clash). We re-prepared both with Rosetta cartesian cst-FastRelax (coordinate_constraint = 0.1) using MMFF94-charged Generic-FF ligand params and substituted the two relaxed PDBs into v3. The ligand and pocket moved ~0.4 Å on average (steric relief, no significant pose change).

v4/5/6 (ipTM-based selection). These were like v3, except that we chose structures by ipTM with a preference for AF3. They scored worse than v3.5.

v7 (medoid selection). Within each 3-way-agreed ligand, v7 replaces v3.5's single-tightest-triplet AF3 with the medoid AF3 (lowest mean cross-model RMSD to the agreed RF3+Boltz2 set). This is RMSD-driven and more consensus-central, and it improved on v3.5 on the leaderboard. v7 differs from v3.5 on 62 / 184 ligands (all AF3→AF3); the 2-model / fallback picks and the two Rosetta-relaxed structures carry over unchanged.

v8 (medoid + model-agnostic) — the prior leaderboard best. Like v7, but with no AF3-first preference: the medoid is taken over all models (and extended to the 2-model tier), so the most consensus-central structure of any model — AF3, RF3, or Boltz2 — wins. This swings the chosen-model mix to AF3 75 / RF3 64 / Boltz2 45 and differs from v7 on 127 / 184 ligands. v8's submitted file was v8_rescued.zip: the model-agnostic picks, except for three ligands whose selected pose OST could not score — x02872-1 and x03325-1 fall back to their v7 poses, and x03004-1 is Rosetta cst-relaxed (each flagged via source_submission in the manifest). OST mis-perceives a bond on tight strained-ring contacts in those raw poses, which breaks ligand-graph matching; the fallback poses score cleanly. (v10.1 below generalises exactly this rescue.)
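
The difference between the v7 and v8 selection rules can be sketched as a single medoid function with an optional model filter. This is a simplified illustration, not the repository's builder. It assumes the agreed poses are given as `(model, pose_id)` pairs and that `rmsd` is a caller-supplied function returning the symmetry-aware ligand RMSD between two poses. All names here are hypothetical.

```python
from statistics import mean


def medoid(poses, rmsd, allowed_models=None):
    """Pick the pose with the lowest mean RMSD to poses from other models.

    poses: list of (model, pose_id) pairs in the agreed set.
    rmsd: callable (pose_id, pose_id) -> float, assumed symmetric.
    allowed_models: restrict the winner to these models (v7-style,
        e.g. {"AF3"}); None means any model may win (v8-style).
    Returns (score, model, pose_id), or None if nothing is eligible.
    """
    best = None
    for model, pose_id in poses:
        if allowed_models is not None and model not in allowed_models:
            continue
        cross = [rmsd(pose_id, other) for m, other in poses if m != model]
        if not cross:
            continue  # no other model to compare against
        score = mean(cross)
        if best is None or score < best[0]:
            best = (score, model, pose_id)
    return best
```

With `allowed_models={"AF3"}` the winner is always an AF3 pose, as in v7; with the default the most central pose of any model wins, as in v8.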

v10 (uniform 5-model-agnostic medoid) — adds two independent models. The base trio (RF3, AF3, Boltz2) are correlated — all AF3-style, MSA-based co-folding diffusion models — so their mutual agreement partly reflects shared bias. v10 adds two architecturally-independent co-folding models, Protenix and ESMFold2, run on all 184 ligands, and takes the medoid over all five. Per ligand it anchors on the v8 pick, forms one representative per agreeing base model, then admits a Protenix and/or ESMFold2 pose only if it lands within 1.0 Å (symmetry-aware ligand RMSD) of every base representative; the 5-model medoid is the member with the lowest mean cross-model RMSD. If neither independent model joins, v10 keeps the v8 pick verbatim, so the diff is bounded. The independent models join on 133 / 184 ligands; v10 differs from v8 on 67 / 184, but every change is sub-Ångström (max 0.81 Å; 0 moves > 2 Å) — the medoid just selects a more consensus-central representative of the same tightly-agreeing cluster. Submitted-model mix: AF3 77 / RF3 54 / Boltz2 27 / ESMFold2 18 / Protenix 8.
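
The admission rule for the two independent models can be sketched as follows. This is a minimal illustration under stated assumptions: one representative pose per base model has already been chosen, `rmsd` is a caller-supplied symmetry-aware ligand RMSD, and the v8 pick is passed in so it can be returned unchanged when nothing joins. The names `admit_independent` and `v10_pick` are hypothetical.

```python
from statistics import mean


def admit_independent(base_reps, independent, rmsd, threshold=1.0):
    """Keep independent-model poses within `threshold` of every base rep."""
    return {
        model: pose
        for model, pose in independent.items()
        if all(rmsd(pose, rep) <= threshold for rep in base_reps.values())
    }


def v10_pick(base_reps, independent, rmsd, v8_pick, threshold=1.0):
    """Return (model, pose) for the extended medoid, or the v8 pick.

    base_reps / independent: dicts mapping model name -> pose.
    v8_pick: (model, pose) kept verbatim when no independent model joins.
    """
    admitted = admit_independent(base_reps, independent, rmsd, threshold)
    if not admitted:
        return v8_pick
    members = {**base_reps, **admitted}

    def score(model):
        # Mean RMSD from this member to every other member of the set.
        return mean(
            rmsd(members[model], members[other])
            for other in members
            if other != model
        )

    winner = min(members, key=score)
    return winner, members[winner]
```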

v10.1 (OST failure-proofing) — the submission. The official OST scorer returns NaN (a 20 Å BiSyRMSD / 0 LDDT-PLI penalty) whenever its geometry-based bond perception reads an extra ligand bond — the strained-ring false-bond mode. A deposited crystal has clean geometry, so OST perceives its true graph; a pose OST reads as over-bonded can therefore never graph-match the crystal. This is detectable with no ground truth: count the OST-perceived heavy-atom bonds of the LIG residue and compare to the true count from the ligand SMILES (RDKit). Four v10 picks carry exactly one such false bond — x02872-1, x03004-1, x03063-1, x03325-1 (x03063-1 was even a failure v3.5 had hand-fixed and the automated v10 silently reintroduced). For each, v10.1 ranks every candidate pose by ligand-RMSD to v10's pick and swaps in the nearest pose that passes the bond check (each < 0.9 Å away); the other 180 picks are byte-identical to v10. Because a NaN-penalised pose can only improve or tie when replaced by a scoreable one, v10.1 ≥ v10 on both leaderboard halves by construction. The submitted file is v10.1_failureproofed.zip; the four swaps are flagged via source_submission in the manifest.
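
The swap logic can be sketched without OpenStructure or RDKit by treating the bond counts as inputs. The example below assumes each candidate pose already carries its OST-perceived bond count and its ligand RMSD to the v10 pick, and that a pose passes when its perceived count equals the expected count. The dictionary keys and the function name `failure_proof` are hypothetical.

```python
def failure_proof(pick, candidates, expected_bonds):
    """Return the pick if it is bond-clean, else the nearest clean pose.

    pick, candidates: dicts with keys
        "id"              pose identifier
        "perceived_bonds" heavy-atom bond count perceived from coordinates
        "rmsd_to_pick"    ligand RMSD (angstroms) to the original pick
    expected_bonds: true heavy-atom bond count from the ligand SMILES.
    """
    if pick["perceived_bonds"] == expected_bonds:
        return pick  # already scoreable, leave untouched
    clean = [c for c in candidates if c["perceived_bonds"] == expected_bonds]
    if not clean:
        return pick  # no bond-clean alternative available
    return min(clean, key=lambda c: c["rmsd_to_pick"])
```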

Per-ligand provenance

The manifests/ directory holds the full per-ligand records (184 rows each).

Each row records, per ligand: the selection (source/tier/pick_method, chosen_model), the medoid score (medoid_mean_rmsd, v7/v8/v10), the chosen model's confidence (iptm, ptm, plddt, ranking_score), and the cross-model agreement RMSDs (best_max_rmsd_triplet, rmsd_rf3_af3, rmsd_rf3_boltz2, rmsd_af3_boltz2). The v10 / v10.1 manifests add the built-ligand bond counts (lig_covalent_bonds vs expected_bonds) that the OST failure-proofing checks. Missing fields use the literal string none. Absolute cluster paths (chosen_path) and the selecting seed/sample (chosen_seed / chosen_sample, plus the fallback pair_seed_* / pair_sample_*) are withheld while the challenge is live — the retained columns (chosen_model, the confidence metrics, and the cross-model RMSDs) document the method without revealing the exact draws. Full provenance + prediction files are available on request.
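
A manifest could be consumed along these lines. The sketch assumes the manifests are CSV files and that each row has a ligand identifier column, here called `ligand`; both are assumptions, since the file format and ID column are not stated above. The columns `lig_covalent_bonds` and `expected_bonds` and the literal `none` for missing values are taken from the description.

```python
import csv


def load_manifest(handle):
    """Read manifest rows, mapping the literal string "none" to None."""
    return [
        {key: (None if value == "none" else value) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def bond_mismatches(rows, id_column="ligand"):
    """List ligand IDs whose built bond count differs from the expected one.

    Rows with a missing count are skipped rather than flagged.
    """
    flagged = []
    for row in rows:
        built, expected = row["lig_covalent_bonds"], row["expected_bonds"]
        if built is None or expected is None:
            continue
        if int(built) != int(expected):
            flagged.append(row[id_column])
    return flagged
```

Usage would be `with open(path, newline="") as fh: rows = load_manifest(fh)`, where `path` points at one of the manifest files.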

Running the code

The pipeline lives in scripts/: prediction round submission, cross-model agreement scoring, the v7/v8 selection + submission builders, and the v10 / v10.1 (5-model medoid → OST failure-proofing) builders. None of it is hard-coded to a particular machine:

  • Inputs (model predictions + consensus tables) are read from a single base directory given by the OADMET_PREDICTIONS_DIR environment variable (or the --predictions-dir flag). Outputs go to --out-dir / OADMET_OUT_DIR (default .); the blinded ligand CSV is passed via --ligand-csv / OADMET_LIGAND_CSV.
  • v10 adds the two independent models via --protenix-dir / --esmfold2-dir (OADMET_PROTENIX_DIR / OADMET_ESMFOLD2_DIR). v10.1 runs the OST bond check inside your OpenStructure container — pass --ost-sif to build_v10_1_failureproof.py.
  • Running the three models is installation-specific and left as a stub. submit_round.py prepares every per-model input (RF3 .json, AlphaFold3 .json, Boltz2 .yaml + a manifest) and the round/seed bookkeeping, then calls launch_models() — which is a clearly-marked placeholder. Because everyone's GPU setup differs (local, SLURM, cloud, different containers), we do not ship our cluster-specific launcher; add your own RF3/AF3/Boltz2 invocation in launch_models() (the reference hyperparameters are documented in its docstring).
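
One way such flag-or-environment configuration is commonly wired up is shown below. This is a sketch, not the scripts' actual argument parser. The flag and variable names are the ones listed above, but the precedence (an explicit flag overrides the environment variable) and the helper name `build_parser` are assumptions.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Build a parser whose defaults come from OADMET_* variables.

    env: mapping used for defaults; injectable so the behaviour can be
    checked without touching the real process environment.
    """
    parser = argparse.ArgumentParser(description="Illustrative config only")
    parser.add_argument(
        "--predictions-dir", default=env.get("OADMET_PREDICTIONS_DIR")
    )
    parser.add_argument("--out-dir", default=env.get("OADMET_OUT_DIR", "."))
    parser.add_argument("--ligand-csv", default=env.get("OADMET_LIGAND_CSV"))
    return parser
```

Because the environment only supplies defaults, a value given on the command line wins, and `--out-dir` falls back to the current directory when neither source sets it.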

Pipeline order and per-flag details are in scripts/README.md.

Tools

  • RF3 — Institute for Protein Design (UW)
  • AlphaFold3 — DeepMind
  • Boltz2 — MIT
  • Protenix — ByteDance (v10: 4th, independent model)
  • ESMFold2 — Meta AI (v10: 5th, independent model)
  • biotite, atomworks — for loading and processing structures
  • RDKit, OpenBabel, OpenStructure — open source (OpenStructure = the OST scorer used for the v10.1 bond check)
  • Rosetta — cartesian FastRelax for the v3 → v3.5 rescue

AI use acknowledgement

This project made extensive use of Claude Code (Anthropic) as an autonomous coding-and-analysis agent — it wrote and refactored the pipeline scripts, analysed the cross-model agreement and selection results, and managed the multi-round prediction → agreement → selection pipeline largely autonomously (including root-causing the OST scoring failures and the medoid selection study).
