KERMT v2.0.0 — Contrastive KERMT

@evasnow1992 released this 10 Jun 16:33 (commit e402473)

This release introduces Contrastive KERMT, a graph-transformer foundation model for ADMET (absorption, distribution, metabolism, excretion, toxicity) property prediction. It extends the v1 KERMT architecture with new pretraining objectives that produce stronger downstream representations on multi-task ADMET benchmarks.

What's new in v2

  • Contrastive KERMT pretraining objective. Keeps the v1 graph-transformer encoder and its chemistry-specific vocabulary heads, and adds two new pretraining-only heads:
    • Transformer-based SMILES reconstruction decoder.
    • In-batch contrastive auxiliary classifier (cMIM).
    All four objectives are jointly optimized under a single unified log-probability factorization. The SMILES decoder and the contrastive head are used only during pretraining and are discarded before downstream fine-tuning, so the inference-time footprint matches v1.
  • Agent skill suite. Eight SKILL.md-format skills under agent/skills/ for driving the full ADMET research lifecycle with LLM agents (Claude Code, Codex, Nemotron): environment setup, pretrain-from-scratch, continue-pretrain, add-cMIM-pretrain, fine-tune, embed, infer, and monitor. See agent/README.md for installation and use.
  • Training infrastructure. Mid-epoch resume, atomic checkpoint saves, configurable WandB integration, task-specific multi-task FFN heads with per-task dropout, multi-worker data loaders.
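
For readers unfamiliar with atomic checkpoint saves, the usual pattern is to write to a temporary file in the target directory and then rename it over the final path, so an interrupted job never leaves a half-written checkpoint behind. The sketch below illustrates that general pattern using only the Python standard library. It is not taken from the KERMT codebase, and the function name `atomic_write_bytes` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(payload: bytes, target) -> None:
    """Write payload to target so readers never see a partially written file.

    Hypothetical helper illustrating the write-then-rename pattern;
    not the KERMT implementation.
    """
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # flush bytes to disk before renaming
        # os.replace overwrites the destination; on POSIX the rename is atomic.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the leftover temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

In a PyTorch training loop, the same idea would typically be applied by serializing the checkpoint to the temporary path first and renaming afterwards; how this release actually does it should be checked against the repository.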

Pretrained weights

Both contain the same bundle: a .pt checkpoint (~282 MB) plus the three pretraining vocabulary files (pretrain_atom_vocab.json, pretrain_bond_vocab.json, pretrain_smiles_vocab.pkl). Load via the codebase in this repository.
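
Before pointing the repository's loading code at a downloaded bundle, it can help to confirm that all the pieces are present. The following is a minimal sketch that assumes the files sit together in one flat directory, which is not stated above. The checkpoint filename is not given here, so the sketch simply looks for any `.pt` file, and the helper name `inspect_bundle` is hypothetical. It reads only the two JSON vocabularies and deliberately leaves the pickle file and the checkpoint to the KERMT codebase.

```python
import json
from pathlib import Path

# Vocabulary filenames as listed in the release notes.
VOCAB_FILES = (
    "pretrain_atom_vocab.json",
    "pretrain_bond_vocab.json",
    "pretrain_smiles_vocab.pkl",
)


def inspect_bundle(bundle_dir):
    """Check that a bundle directory is complete and load its JSON vocabularies.

    Hypothetical helper; assumes a flat directory layout.
    Returns (path to the first .pt file found, dict of parsed JSON vocabularies).
    """
    bundle_dir = Path(bundle_dir)
    missing = [name for name in VOCAB_FILES if not (bundle_dir / name).is_file()]
    checkpoints = sorted(bundle_dir.glob("*.pt"))
    if not checkpoints:
        missing.append("*.pt checkpoint")
    if missing:
        raise FileNotFoundError(f"bundle at {bundle_dir} is missing: {missing}")
    # Only the JSON files are parsed here. The .pkl vocabulary and the
    # checkpoint are left untouched for the repository's own loaders.
    vocabs = {
        name: json.loads((bundle_dir / name).read_text(encoding="utf-8"))
        for name in VOCAB_FILES
        if name.endswith(".json")
    }
    return checkpoints[0], vocabs
```

The pickle file is not opened on purpose: unpickling executes code from the file and may require classes defined in the repository, so it is safer to let the codebase handle it.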

License

Companion materials

  • Manuscript / preprint: Xue et al., Probabilistic Contrastive Pretraining for Multi-task ADME Property Prediction. arXiv:2606.11508