prtm

Protein Models (prtm) is an inference-only library for deep learning protein models.

Background

This library started out as a learning project to catch up on the deep learning models being used in protein science. After cloning a few repos it became clear that a nascent ecosystem was forming and that there was a need for a common interface to accelerate the creation of new workflows. The goal of prtm is to provide a (hopefully) enjoyable and interactive API for running, comparing, and chaining together protein DL models. Currently covered use cases include:

Folding
Inverse folding
Structure design
Sequence language modeling
Ligand docking

With many more to come!

Motivating Example

A very common workflow is to design a protein structure, apply inverse folding to generate plausible sequences, and then fold those sequences to see if they match the designed structure.

In prtm, we accomplish this with a few lines of code:

from prtm import models
from prtm import visual

# Define models for structure design, inverse folding and folding
designer = models.RFDiffusionForStructureDesign(model_name="auto")
inverse_folder = models.ProteinMPNNForInverseFolding(model_name="ca_only_model-20")
folder = models.OmegaFoldForFolding()

# Tell RFDiffusion to create a structure with exactly 128 residues
designed_structure, _ = designer(
    models.rfdiffusion_config.UnconditionalSamplerConfig(
        contigmap_params=models.rfdiffusion_config.ContigMap(contigs=["128-128"]),
    )
)

# Design a sequence and fold it!
designed_sequence, _ = inverse_folder(designed_structure)
predicted_designed_structure, _ = folder(designed_sequence)

# Visualize the designed structure and the predicted structure overlaid in a notebook
visual.view_superimposed_structures(designed_structure, predicted_designed_structure)

# Convert to PDB
pdb_str = predicted_designed_structure.to_pdb()

# Try docking a ligand (methotrexate) to the designed structure
ligand = "CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O"
docker = models.DiffDockForLigandDocking()
poses, aux_output = docker(predicted_designed_structure, ligand)

# Visualize the predicted ligand poses
visual.view_structure_with_ligand(predicted_designed_structure, poses)
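
The "see if they match" step of this workflow can also be made quantitative. The sketch below is not part of prtm: it parses C-alpha coordinates from standard PDB text and computes the RMSD after optimal superposition with the Kabsch algorithm, using only NumPy. The helper names `ca_coords_from_pdb` and `kabsch_rmsd` are hypothetical and chosen for illustration, and the code assumes a single-model PDB string in which both structures have the same number of residues.

```python
import numpy as np


def ca_coords_from_pdb(pdb_str: str) -> np.ndarray:
    """Extract C-alpha coordinates from PDB-format text as an (N, 3) array.

    Relies on the fixed-column PDB layout: atom name in columns 13-16 and
    x, y, z in columns 31-38, 39-46 and 47-54 (1-indexed).
    """
    coords = []
    for line in pdb_str.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return np.array(coords, dtype=float).reshape(-1, 3)


def kabsch_rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    if coords_a.shape != coords_b.shape or coords_a.ndim != 2 or coords_a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove translation by centering both point sets on their centroids.
    a = coords_a - coords_a.mean(axis=0)
    b = coords_b - coords_b.mean(axis=0)

    # Kabsch: the SVD of the covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt

    diff = a @ rotation - b
    return float(np.sqrt((diff ** 2).sum() / len(a)))
```

Assuming both structure objects expose `to_pdb()` (the example above only shows it on the predicted structure), the comparison would be `kabsch_rmsd(ca_coords_from_pdb(designed_structure.to_pdb()), ca_coords_from_pdb(predicted_designed_structure.to_pdb()))`. A lower value means the folded sequence is closer to the designed backbone.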

Installation

At this early stage, prtm has only been tested on a Linux system with a CUDA-enabled GPU. There are no guarantees that it will work on other systems.
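
As a rough pre-flight check before installing, something like the following can hint at whether a machine resembles the tested setup. This is a standard-library sketch, not a prtm utility; `preflight_report` is a hypothetical name, and finding `nvidia-smi` on the PATH is only a heuristic for an installed NVIDIA driver, not proof that CUDA will work.

```python
import platform
import shutil


def preflight_report() -> dict:
    """Collect rough hints about whether this machine resembles the tested setup."""
    system = platform.system()
    return {
        "os": system,
        "is_linux": system == "Linux",
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "has_nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in preflight_report().items():
        print(f"{key}: {value}")
```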

Before getting started, make sure you have conda or mamba (preferred) installed. Then clone this repo and create a prtm environment:

git clone https://github.com/conradry/prtm.git
cd prtm
mamba env create -f environment.yaml
mamba activate prtm
pip install -e .

To make prtm more accessible, custom CUDA kernels were removed from all models that previously used them, so for most cases that's all there is to install!

PyRosetta is an optional (soft) dependency of prtm and is only required for the protein_seq_des model. A license is required to use PyRosetta and can be obtained for free for academic use. For installation instructions, see the PyRosetta installation instructions.
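
To check whether the optional dependency is present in the active environment, a minimal sketch using only the standard library is shown below. It is not how prtm itself detects the package; `is_installed` is a hypothetical helper, and the only assumption is that PyRosetta is imported under the module name `pyrosetta`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "pyrosetta" is the import name of the PyRosetta package.
if is_installed("pyrosetta"):
    print("PyRosetta found: protein_seq_des should be usable.")
else:
    print("PyRosetta not found: skip protein_seq_des or install PyRosetta first.")
```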

What's implemented

Note: Most, but not all, models allow commercial use. Please check the license of each model.

AlphaFold is written in JAX, but all other models are written in PyTorch, so we chose not to integrate the AlphaFold inference code directly into this repo. Both OpenFold and Uni-Fold allow for the conversion of the AlphaFold JAX weights into PyTorch. The Uni-Fold implementation is designed to work with MMSeqs2 and supports multimers, which is why we adopted it. Eventually, we may decide to subsume the OpenFold models under Uni-Fold.

Model Name	Function	Notebook	Source Code	License
AlphaFold/Uni-Fold	Folding	Notebook	https://github.com/dptech-corp/Uni-Fold	Apache 2.0
AlphaFold/UniFold-Multimer	Folding	Notebook	https://github.com/dptech-corp/Uni-Fold	Apache 2.0
OpenFold	Folding	Notebook	https://github.com/aqlaboratory/openfold	Apache 2.0
ESMFold	Folding	Notebook	https://github.com/facebookresearch/esm	MIT License
RoseTTAFold	Folding	Notebook	https://github.com/RosettaCommons/RoseTTAFold	MIT License
OmegaFold	Folding	Notebook	https://github.com/HeliXonProtein/OmegaFold	Apache 2.0
DMPfold2	Folding	Notebook	https://github.com/psipred/DMPfold2	GPL v3.0
Uni-Fold Symmetry	Folding	Notebook	https://github.com/dptech-corp/Uni-Fold	GPL v3.0
IgFold	Antibody Folding	Notebook	https://github.com/Graylab/IgFold	JHU License
ESM-IF	Inverse Folding	Notebook	https://github.com/facebookresearch/esm	MIT License
ProteinMPNN	Inverse Folding	Notebook	https://github.com/dauparas/ProteinMPNN	MIT License
PiFold	Inverse Folding	Notebook	https://github.com/A4Bio/PiFold	MIT License
ProteinSeqDes	Inverse Folding	Notebook	https://github.com/nanand2/protein_seq_des	BSD-3
ProteinSolver	Inverse Folding	Notebook	https://github.com/ostrokach/proteinsolver	MIT License
RFDiffusion	Design	Notebook	https://github.com/RosettaCommons/RFdiffusion	BSD
ProteinGenerator	Design	Notebook	https://github.com/RosettaCommons/protein_generator	MIT License
Genie	Design	Notebook	https://github.com/aqlaboratory/genie	Apache 2.0
FoldingDiff	Design	Notebook	https://github.com/microsoft/foldingdiff	MIT License
SE3-Diffusion	Design	Notebook	https://github.com/jasonkyuyim/se3_diffusion	MIT License
EigenFold	Fold sampling	Notebook	https://github.com/bjing2016/EigenFold	MIT License
AntiBERTy	Antibody language modeling	Notebook	https://github.com/jeffreyruffolo/AntiBERTy	MIT License
DiffDock	Ligand docking	Notebook	https://github.com/gcorso/DiffDock	MIT License

Links to papers can be found in the GitHub repos for each model.

Documentation

A real docs page is a work in progress, but to get started the provided notebooks should be enough. In addition to minimal usage notebooks for each implemented model, there are also more general notebooks that cover common use cases and some features of the prtm API. A good order to try is:

For more complex design algorithms like RFDiffusion and ProteinGenerator, there are detailed example notebooks to look at:

Roadmap and Contributing

The currently implemented models only scratch the surface of what's available. There's a sketchy model tracking Google sheet for papers and code repos that are being considered for implementation. If you'd like to contribute or suggest priorities, please open an issue or PR and we can discuss!

There's, of course, also a lot of technical debt to pay off that accumulated from duct-taping together code from many different sources. Docstrings, API improvements, bug fixes, and better tests are very welcome!

Acknowledgments

This project is an achievement of copy-paste engineering 😉. It would not have been possible without the hard work of the authors of the models that are implemented here. Please cite their work if you use their model!
