A comprehensive benchmark on the performances of multiple protein backbone generative models.

Immortals-33/Scaffold-Lab

Scaffold-Lab: A Unified Framework for Evaluating Protein Backbone Generation Methods


Official implementation for Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework.

Description

Scaffold-Lab is the first unified framework for evaluating different protein backbone generation methods.

We present the benchmark for both unconditional generation and conditional generation in terms of designability, diversity, novelty, efficiency and structural properties. Currently evaluated methods are listed below:

Unconditional Generation

Conditional Generation


Updates

  • July 19th, 2024: Motif positions can now be partially redesigned with ProteinMPNN. Check out here to see how to specify them.
  • June 19th, 2024: Scaffold-Lab now supports AlphaFold2 for evaluation! The AF2 implementation is built upon LocalColabFold. We refer interested users here for more details.

Note

This is a beta version that has not been thoroughly tested. Bug reports and pull requests are especially welcome.



Installation

To quickly set up an environment, run:

conda env create -f scaffold-lab.yml
conda activate scaffold-lab
# You may also need to install some dependencies manually in certain cases
pip install hydra-core --upgrade
pip install hydra-joblib-launcher --upgrade
pip install ml-collections GPUtil hjson h5py
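
If something still fails to import after these steps, a quick check like the one below can show which packages are missing. This is a minimal sketch and not part of Scaffold-Lab: the helper `missing_modules` is hypothetical, and the import names (`hydra`, `ml_collections`, `GPUtil`, `hjson`, `h5py`) are assumed to correspond to the pip packages listed above.

```python
import importlib.util

# Import names assumed to match the pip packages installed above.
REQUIRED_MODULES = ["hydra", "ml_collections", "GPUtil", "hjson", "h5py"]


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it inside the activated `scaffold-lab` environment; any name it reports can then be installed manually with pip.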

Outline

Here is a guide to navigating this repository. We aim to provide an easy-to-use evaluation pipeline while keeping the individual scripts as useful as possible on their own. Let's start with the structure of the repository:

  • scaffold_lab: This is the main directory for running the different evaluations described in our paper.

  • analysis: Scripts for calculating several metrics, including diversity, novelty and structural properties.

  • baselines: To generate protein backbones directly inside this repository, find the code of the different baseline methods for unconditional and conditional generation, then clone their repositories under this directory. We highly recommend running inference for each baseline inside its own virtual environment to avoid conflicts between environment dependencies.

    • Inside the experiment folder we provide scripts for performing motif-scaffolding experiments with Chroma using its SubstructureConditioner. Refer to the script for detailed information.
  • config: We place the Hydra configuration settings for the evaluations here. Hydra is a hierarchical configuration framework that helps users systematize different experimental settings. Though it might be confusing at first, it is a powerful tool for running experiments efficiently with different combinations of parameters, for example, the number of sequences to generate. We refer readers to the Docs for advanced usage.
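
To make the idea of parameter overrides concrete, the sketch below mimics how a dotted override such as `inference.af2.executive_colabfold_path=...` lands in a nested configuration. It is a plain-Python illustration only, not Hydra's implementation: the helper `apply_override` is hypothetical, the key names are borrowed from the config snippets in this README, and the default values and the `/opt/localcolabfold` path are placeholders.

```python
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `a.b.c=value` override applied.

    Values are kept as plain strings here; Hydra itself also parses types.
    """
    dotted_key, sep, value = override.partition("=")
    if not sep or not dotted_key:
        raise ValueError(f"Expected 'key=value', got {override!r}")
    updated = copy.deepcopy(config)  # leave the base config untouched
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})  # walk down, creating levels if needed
    node[leaf] = value
    return updated


# Placeholder defaults, assumed for illustration only.
base = {
    "inference": {
        "predict_method": ["ESMFold"],
        "af2": {"executive_colabfold_path": None},
    }
}

cfg = apply_override(
    base,
    "inference.af2.executive_colabfold_path=/opt/localcolabfold/colabfold-conda/bin",
)
print(cfg["inference"]["af2"]["executive_colabfold_path"])
```

The point of the sketch is that each dot in an override walks one level down the YAML hierarchy, which is why the same setting can be given either in the config file or on the command line.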


Usage

Unconditional Generation

Let's start by running a simple evaluation here:

python scaffold_lab/unconditional/refolding.py 

This performs a simple refolding analysis for the proteins we put inside demo/unconditional/.


Conditional Generation (Motif-scaffolding)

To run a minimal version of the motif-scaffolding task, run:

python scaffold_lab/motif_scaffolding/motif_refolding.py

This performs an evaluation on demo/motif_scaffolding/2KL8/, with the outputs saved under outputs/2KL8/.


Customize Methods for Structure Prediction

We support both AlphaFold2 (single-sequence version) and ESMFold for structure prediction during refolding.

ESMFold

Scaffold-Lab performs evaluation using ESMFold by default. Once you set up the environment this should work.

AlphaFold2 (single-chain version)

The implementation of AlphaFold2 is based on LocalColabFold, which is a local version of ColabFold. Here is a brief guide to enabling AlphaFold2 during evaluation:

  • Install LocalColabFold. Please follow the installation guide on its official page based on your specific OS. Note that it might take a few tries for a complete installation.

  • Add the ColabFold executables to your PATH. This lets the refolding pipeline run ColabFold. Suppose the root directory of your LocalColabFold is {LocalColabFold}; you can then set the path in two ways:

    • Set it inside the config (Recommended). There are two ways to do so:

      • Inside config/unconditional.yaml and config/motif_scaffolding.yaml (Recommended):

        inference:
          af2:
            executive_colabfold_path: {LocalColabFold}/colabfold-conda/bin # Replace {LocalColabFold} by your actual path of LocalColabFold
      • Alternatively, set it on the command line:

        python scaffold_lab/unconditional/refolding.py inference.af2.executive_colabfold_path='{LocalColabFold}/colabfold-conda/bin'
    • Directly set the PATH variable before running the evaluation script, similar to what is done in #5 of this guide.

  • Set AlphaFold2 as your forward folding method when running evaluation. Inside the config:

    inference:
      # ...
      predict_method: [AlphaFold2] # Only run AF2 for evaluation
      # predict_method: [AlphaFold2, ESMFold] # Alternative: run both AF2 and ESMFold for evaluation
      # ...

And voilà!
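
The direct-PATH route and the method selection can also be scripted. The following is a minimal sketch, assuming the LocalColabFold executable is named `colabfold_batch` and lives in `colabfold-conda/bin`. The helper names (`env_with_colabfold`, `colabfold_available`, `build_command`) are hypothetical, `/path/to/localcolabfold` is a placeholder, and the `inference.predict_method` key is the one shown in the config above.

```python
import os
import shutil
import subprocess
import sys


def env_with_colabfold(colabfold_root, base_env=None):
    """Copy the environment and prepend {LocalColabFold}/colabfold-conda/bin to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(colabfold_root, "colabfold-conda", "bin")
    current = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + current if current else bin_dir
    return env


def colabfold_available(env):
    """Check whether `colabfold_batch` (assumed executable name) is on PATH."""
    return shutil.which("colabfold_batch", path=env["PATH"]) is not None


def build_command(script, methods):
    """Build the evaluation command with a Hydra list override for predict_method."""
    override = "inference.predict_method=[" + ",".join(methods) + "]"
    return [sys.executable, script, override]


if __name__ == "__main__":
    env = env_with_colabfold("/path/to/localcolabfold")  # placeholder path
    cmd = build_command(
        "scaffold_lab/unconditional/refolding.py", ["AlphaFold2", "ESMFold"]
    )
    if colabfold_available(env):
        subprocess.run(cmd, env=env, check=True)
    else:
        print("colabfold_batch not found; check your LocalColabFold path.")
```

Passing the command as a list avoids shell quoting issues with the square brackets in the override.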


Contact


Citation

If you use Scaffold-Lab in your research or find it helpful, please cite:

@article{zheng2024scaffoldlab,
title = {Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework},
author = {Zheng, Zhuoqi and Zhang, Bo and Zhong, Bozitao and Liu, Kexin and Li, Zhengxin and Zhu, Junjie and Yu, Jinyu and Wei, Ting and Chen, Haifeng},
year = {2024},
journal = {bioRxiv},
url = {https://www.biorxiv.org/content/10.1101/2024.02.10.579743v3}
}

Acknowledgments

This codebase benefits a lot from FrameDiff, OpenFold and some other amazing open-source projects. Take a look at their work if you find Scaffold-Lab helpful!
