Graph neural network for generating novel amino acid sequences that fold into proteins with predetermined topologies.

ostrokach/proteinsolver

ProteinSolver

Description

ProteinSolver is a deep neural network that learns to solve (ill-defined) constraint satisfaction problems (CSPs) from training data. It has shown promising results both on a toy problem, learning to solve Sudoku puzzles, and on a real-world problem, designing protein sequences that fold into a predetermined geometric shape.

Demo notebooks

The following notebooks can be used to explore the basic functionality of proteinsolver.

| Notebook name | MyBinder | Description |
| --- | --- | --- |
| 20_sudoku_demo.ipynb | binder | Use a pre-trained network to solve a single Sudoku puzzle. |
| 06_sudoku_analysis.ipynb | binder | Evaluate a network trained to solve Sudoku puzzles using the validation and test datasets. (This notebook is resource-intensive and is best run on a machine with a GPU.) |
| 20_protein_demo.ipynb | binder | Use a pre-trained network to design sequences for a single protein geometry. |
| 06_protein_analysis.ipynb | binder | Evaluate a network trained to reconstruct protein sequences using the validation and test datasets. (This notebook is resource-intensive and is best run on a machine with a GPU.) |

Other notebooks in the notebooks/ directory show how to perform more extensive validations of the networks and how to train new networks.

Docker images

Docker images with all required dependencies are provided at: https://gitlab.com/ostrokach/proteinsolver/container_registry.

To evaluate a proteinsolver network from a Jupyter notebook, we can run the following:

docker run -it --rm -p 8000:8000 registry.gitlab.com/ostrokach/proteinsolver:v0.1.25 jupyter notebook --ip 0.0.0.0 --port 8000

Installation

We recommend installing proteinsolver into a clean conda environment using the following command:

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver

Development

First, use conda to install proteinsolver into a new conda environment. This will also install all dependencies.

conda create -n proteinsolver -c pytorch -c conda-forge -c kimlab -c ostrokach-forge proteinsolver
conda activate proteinsolver

Second, run pip install --editable . inside the root directory of this package. This will force Python to use the development version of our code.

cd path/to/proteinsolver
pip install --editable .
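
To check that the editable install took effect, one option is to ask Python where it imports the package from. The sketch below assumes the package is importable as `proteinsolver` and that `python` on your `PATH` belongs to the activated conda environment; the `status` variable is only illustrative.

```shell
# Report where Python imports proteinsolver from. After an editable install,
# the printed path should point inside your checkout, not site-packages.
if python -c "import proteinsolver" 2>/dev/null; then
    status="installed"
    python -c "import proteinsolver; print(proteinsolver.__file__)"
else
    status="missing"
    echo "proteinsolver is not importable in this environment" >&2
fi
```

If the printed path still points into `site-packages`, the conda-installed copy is probably shadowing the development version.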

Pre-trained models

Pre-trained models can be downloaded using wget by running the following command in the root folder of the proteinsolver repository:

wget -r -nH --cut-dirs 1 --reject "index.html*" "http://models.proteinsolver.org/v0.1/"
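
The `-nH` and `--cut-dirs 1` options determine where the downloaded files land: `-nH` drops the host name directory, and `--cut-dirs 1` drops the leading `v0.1/` path component, so files are written relative to the current directory. The sketch below mimics that mapping with plain string manipulation; the file name is hypothetical and is not taken from the model server.

```shell
# Hypothetical remote path below http://models.proteinsolver.org/
remote_path="v0.1/example_dir/model.state"

# --cut-dirs 1 strips the first directory component ("v0.1/");
# -nH has already removed the "models.proteinsolver.org/" prefix.
local_path="${remote_path#*/}"

echo "$local_path"   # example_dir/model.state
```

This is why the command is meant to be run from the root folder of the repository: the server's directory layout is recreated directly underneath it.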

For an example of how to use pre-trained ProteinSolver models in downstream applications (such as mutation ΔΔG prediction), see the elaspic/elaspic2 repository, in particular the src/elaspic2/plugins/proteinsolver module.

Training and validation datasets

Data used to train and validate the "proteinsolver" network to solve Sudoku puzzles and reconstruct protein sequences can be downloaded from http://deep-protein-gen.data.proteinsolver.org/:

wget -r -nH --reject "index.html*" "http://deep-protein-gen.data.proteinsolver.org/"

The generation of the training and validation datasets was carried out in our predecessor project: ostrokach/protein-adjacency-net.

Environment variables

  • DATAPKG_DATA_DIR - Location of training and validation data.
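
A minimal sketch of setting this variable before starting Python or Jupyter is shown below. The default of `$PWD/data` is an assumption made for illustration; point the variable at wherever you downloaded the datasets.

```shell
# Keep an existing value if one is set; otherwise fall back to a
# hypothetical ./data directory under the current working directory.
export DATAPKG_DATA_DIR="${DATAPKG_DATA_DIR:-$PWD/data}"

# Warn early if the directory is missing, rather than failing mid-notebook.
if [ -d "$DATAPKG_DATA_DIR" ]; then
    echo "Using data from $DATAPKG_DATA_DIR"
else
    echo "warning: $DATAPKG_DATA_DIR does not exist yet" >&2
fi
```

Because the variable is exported, any `jupyter notebook` or `python` process started from the same shell will inherit it.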

Acknowledgements

References

  • Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and flexible protein design using deep graph neural networks. Cell Systems (2020); 11: 1–10. doi: 10.1016/j.cels.2020.08.016