MingYangi/MoMST


Multi-Objective Protein Design via Memory-Aware Test-Time Scaling in Diffusion Models

✨ Official implementation of MoMST from ICML 2026.

📄 paper | 🔗 code

🧠 Overview

This repository implements MoMST, a framework for multi-objective protein sequence design that alternates between noising and memory-guided denoising in a diffusion model. It combines self-contrastive learning, which extracts residue-level preferences from historical trajectories, with inference-time Pareto alignment. Together these balance conflicting functional rewards while preserving the sequence naturalness of the pre-trained model.
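
To make the idea of balancing conflicting rewards concrete, here is a minimal sketch of Pareto filtering over candidate scores. It is a generic illustration of Pareto dominance, not the alignment procedure used in this repository; the function names and the example scores are hypothetical, and higher scores are assumed to be better.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (globularity, pLDDT) rewards for four candidate sequences.
candidates = [(0.9, 0.4), (0.6, 0.8), (0.5, 0.5), (0.9, 0.3)]
front = pareto_front(candidates)
print(front)
```

Candidates on the front represent different trade-offs between the objectives; none of them can be improved on one reward without losing on another.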

🧬 Generated Proteins

Results

We show results from optimizing several basic structural objectives:

  • Single-objective protein design, exemplified by optimizing the cRMSD metric (e.g., from run EHEE_rd1_0101).
  • Multi-objective design, utilizing a dual-objective combination of globularity and pLDDT.
  • Multi-objective design, incorporating a triple-objective combination of hydrophobicity, globularity, and pLDDT.
                                         

Figure panels, left to right: cRMSD (EHEE_rd1_0101) | Globularity + pLDDT | Hydrophobicity + Globularity + pLDDT

🚀 Quick Start

⚙️ Installation

Install PyTorch and PyRosetta, then run the following:

conda create -n MoMST python=3.9 
conda activate MoMST
pip install torch torchvision torchaudio
pip install -r requirements.txt

To optimize match_ss or crmsd, go to the ./datasets folder and download the example proteins with the command below. You can also use your own PDB files.

python download_model_data.py

This script places several PDB files in ./datasets/AlphaFoldPDB/.
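
To check which structures were downloaded, a small helper like the one below can list them. This is a sketch only: `list_pdb_names` is a hypothetical helper, not part of the repository, and it assumes that files use the `.pdb` extension and that the value passed to `--proteinname` corresponds to the file name without that extension.

```python
from pathlib import Path
from typing import List


def list_pdb_names(folder: str = "./datasets/AlphaFoldPDB") -> List[str]:
    """Return the sorted base names (without .pdb) of PDB files in `folder`.

    Returns an empty list if the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.stem for p in root.glob("*.pdb"))


# Print one candidate target name per line.
for name in list_pdb_names():
    print(name)
```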

🧑‍💻 Example of Running the Code

Below is an explanation of the available options.

Argument          Description
--decoding        decoding method (momst, SVDD_edit, SVDD)
--repeatnum       batch size
--duplicate       number of candidates
--metrics_name    reward functions (comma-separated)
--metrics_list    weights for the rewards (comma-separated, one per reward)
--proteinname     target PDB name
--iteration       number of iterations
--seq_length      protein length
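
The sketch below shows one way these options could be parsed and how the comma-separated `--metrics_name` and `--metrics_list` values pair up into per-reward weights. It is an illustration, not the parser in `refinement.py`: the defaults, the types, and the `reward_weights` helper are assumptions.

```python
import argparse
from typing import Dict


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented options (defaults are assumed)."""
    p = argparse.ArgumentParser(description="Illustrative option parser")
    p.add_argument("--decoding", choices=["momst", "SVDD_edit", "SVDD"], default="momst")
    p.add_argument("--repeatnum", type=int, default=10, help="batch size")
    p.add_argument("--duplicate", type=int, default=20, help="number of candidates")
    p.add_argument("--metrics_name", required=True, help="comma-separated reward names")
    p.add_argument("--metrics_list", required=True, help="comma-separated reward weights")
    p.add_argument("--proteinname", default=None, help="target PDB name")
    p.add_argument("--iteration", type=int, default=20, help="number of iterations")
    p.add_argument("--seq_length", type=int, default=None, help="protein length")
    return p


def reward_weights(metrics_name: str, metrics_list: str) -> Dict[str, float]:
    """Pair each reward name with its weight; both strings are comma-separated."""
    names = [n.strip() for n in metrics_name.split(",")]
    weights = [float(w) for w in metrics_list.split(",")]
    if len(names) != len(weights):
        raise ValueError("metrics_name and metrics_list must have the same length")
    return dict(zip(names, weights))


# Parse the arguments used in the Globularity + pLDDT example.
args = build_parser().parse_args(
    ["--metrics_name", "globularity,plddt", "--metrics_list", "1,1", "--seq_length", "150"]
)
weights = reward_weights(args.metrics_name, args.metrics_list)
print(weights)
```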

🧬 Single-Objective Protein Design

1. Secondary Structure Match

Design a sequence that folds into a target secondary structure.

CUDA_VISIBLE_DEVICES=0 python refinement.py --decoding momst  --repeatnum 10 --duplicate 20  --metrics_name match_ss  --metrics_list 1 --proteinname XX_run1_0254_0003 --iteration 30
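
For intuition about what a secondary structure reward can measure, the sketch below scores the per-residue agreement between a predicted and a target secondary structure string. This is an assumed, simplified definition for illustration; the `match_ss` reward in this repository may be computed differently, and `ss_match_fraction` is a hypothetical helper.

```python
def ss_match_fraction(predicted: str, target: str) -> float:
    """Fraction of positions where the predicted secondary structure
    label (e.g. H, E, C) equals the target label."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)


# Hypothetical 10-residue example: one position differs.
score = ss_match_fraction("HHHHCCEEEE", "HHHHHCEEEE")
print(score)
```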

2. cRMSD

Design a sequence that folds into a target structure based on cRMSD.

CUDA_VISIBLE_DEVICES=0 python refinement.py --decoding momst  --repeatnum 20 --duplicate 20  --metrics_name crmsd  --metrics_list 1 --proteinname 5KPH --iteration 40

🧬 Multi-Objective Protein Design

1. Globularity + pLDDT

Combining globularity with pLDDT favors compact, roughly spherical folds that are predicted with high structural confidence, which suits stable scaffold design.

CUDA_VISIBLE_DEVICES=0 python refinement.py --decoding momst  --repeatnum 10 --duplicate 20  --metrics_name globularity,plddt  --metrics_list 1,1 --iteration 20 --seq_length 150

2. Hydrophobicity + Surface Exposure + pLDDT

The hydrophobicity-surface exposure-pLDDT combination suits therapeutic protein design, ensuring high structural stability, solubility, and reduced aggregation-mediated immunogenic risks.

CUDA_VISIBLE_DEVICES=0 python refinement.py --decoding momst  --repeatnum 10 --duplicate 20  --metrics_name hydrophobic,surface_expose,plddt  --metrics_list 1,1,1 --iteration 20 --seq_length 150

🎓 Acknowledgements

Our codebase builds heavily on RERD, EvoDiff, OpenFold, and ESMFold.

