
Protein Structure Accuracy Estimation using Geometry-Complete Perceptron Networks (Protein Science 2024)






PyTorch Lightning | Config: Hydra | Server: Flask



Source code for the paper "Protein Structure Accuracy Estimation using Geometry-Complete Perceptron Networks".

NOTE: A web server implementation is freely available at



Install Mamba

wget "$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh  # accept all terms and install to the default location
rm Mambaforge-$(uname)-$(uname -m).sh  # (optionally) remove installer after using it
source ~/.bashrc  # alternatively, one can restart their shell session to achieve the same result

Install dependencies

# clone project
git clone

# create conda environment
mamba env create -f environment.yaml # NOTE: use `cpu_environment.yaml` instead for web server deployment
conda activate GCPNet-EMA  # NOTE: one still needs to use `conda` to (de)activate environments

# install local project as package
pip3 install -e .

# install ProDy separately to avoid a Biopython version conflict with Ankh
pip3 install prody==2.4.1

# uninstall protobuf (if installed) due to (potential) local GLIBCXX conflicts
pip3 uninstall protobuf

Note: TM-score is required to score predicted protein structures. It can be installed as follows:

# download and compile TM-score
mkdir -p ~/Programs && cd ~/Programs
g++ -static -O3 -ffast-math -lm -o TMscore TMscore.cpp
rm TMscore.cpp

Make sure to update the tmscore_exec_path value in, e.g., configs/paths/default.yaml to reflect where you placed the TM-score executable on your machine. Also, make sure that lddt_exec_path points to the bin/lddt path within your GCPNet-EMA Conda environment, since lddt is installed automatically via environment.yaml.
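If you prefer to script this edit, the sketch below rewrites a single `key: value` line in a YAML file with sed. It is a hypothetical helper (`set_yaml_key` is not part of the repository) and assumes that tmscore_exec_path sits on its own line in configs/paths/default.yaml and that the new path contains no `|`, `&`, or `\` characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GCPNet-EMA): replace the value of a one-line
# `key: value` entry in a YAML file, preserving the key's indentation.
# Assumes the new value contains no `|`, `&`, or `\` characters.
set_yaml_key() {
  local file="$1" key="$2" value="$3" tmp status=0
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first, then copy the result back so the
  # original file keeps its permissions.
  sed "s|^\([[:space:]]*${key}:\).*|\1 ${value}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file" || status=1
  rm -f "$tmp"
  return "$status"
}

# Example: point TM-score at the binary compiled into ~/Programs above.
# set_yaml_key configs/paths/default.yaml tmscore_exec_path "$HOME/Programs/TMscore"
```

Editing the file in place keeps the rest of the Hydra config untouched; you can equally open the file in an editor and change the value by hand.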

GCPNet for protein structure EMA (GCPNet-EMA)

How to prepare data and checkpoints for GCPNet-EMA

Download training and evaluation data as well as GCPNet-EMA model checkpoints

cd data/EMA/
tar -xzf ema_decoy_model.tar.gz
tar -xzf ema_true_model.tar.gz
cd ../../  # head back to the root project directory

wget -P checkpoints/
wget -P checkpoints/
wget -P checkpoints/
wget -P checkpoints/
wget -P checkpoints/

How to train GCPNet-EMA

Train a model for the estimation of protein structure model accuracy (EMA) task

# NOTE: adjust feature ablation arguments as desired
python3 src/ experiment=gcpnet_ema.yaml model.ablate_af2_plddt=true model.ablate_gtn=true data.ablate_ankh_embeddings=true data.ablate_esm_embeddings=true

How to evaluate GCPNet-EMA

Reproduce our results for the (tertiary structure) EMA task


# NOTE: ensure feature ablation arguments match checkpoint type
python3 src/ data=ema model=gcpnet_ema logger=csv trainer.accelerator=gpu trainer.devices=1 ckpt_path="$default_ema_model_ckpt_path" model.ablate_af2_plddt=true model.ablate_gtn=true data.ablate_ankh_embeddings=true data.ablate_esm_embeddings=true
python3 src/ data=ema model=gcpnet_ema logger=csv trainer.accelerator=gpu trainer.devices=1 ckpt_path="$af2_ema_model_ckpt_path" model.ablate_af2_plddt=false model.ablate_gtn=true data.ablate_ankh_embeddings=true data.ablate_esm_embeddings=true
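The two commands above differ only in model.ablate_af2_plddt. To keep the ablation overrides in sync with the checkpoint you load, you could generate them with a small helper such as the hypothetical `ema_ablation_args` below. It only encodes the two checkpoint types shown above (the default model and the AlphaFold model); other ablation combinations are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the feature ablation overrides matching each of the
# two checkpoint types used in the evaluation commands above, one per line.
ema_ablation_args() {
  local af2_plddt_ablated
  case "$1" in
    default) af2_plddt_ablated=true ;;   # no AlphaFold plDDT or ESM embeddings
    af2)     af2_plddt_ablated=false ;;  # uses AlphaFold plDDT, still no ESM embeddings
    *) echo "usage: ema_ablation_args default|af2" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "model.ablate_af2_plddt=${af2_plddt_ablated}" \
    "model.ablate_gtn=true" \
    "data.ablate_ankh_embeddings=true" \
    "data.ablate_esm_embeddings=true"
}
```

Append `$(ema_ablation_args af2)` unquoted to an evaluation or prediction command so that each override becomes its own argument.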
Default EMA Model - No AlphaFold plDDT or ESM Embeddings as Inputs

Test metric                       DataLoader 0
test/PerModelMAE                  0.04611478000879288
test/PerModelMSE                  0.004228705074638128
test/PerModelPearsonCorrCoef      0.8075723052024841
test/PerResidueMAE                0.07066802680492401
test/PerResidueMSE                0.010494622401893139
test/PerResiduePearsonCorrCoef    0.7123321890830994
test/loss                         0.005345446057617664

AlphaFold EMA Model - No ESM Embeddings as Inputs

Test metric                       DataLoader 0
test/PerModelMAE                  0.042016904801130295
test/PerModelMSE                  0.003771992400288582
test/PerModelPearsonCorrCoef      0.8381679654121399
test/PerResidueMAE                0.06481857597827911
test/PerResidueMSE                0.009247069247066975
test/PerResiduePearsonCorrCoef    0.7482331991195679
test/loss                         0.004621841479092836
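To compare the two tables, one can compute the relative reduction in error, 100 * (baseline - new) / baseline. The snippet below is plain arithmetic on the MAE values reported in these two tables, not an additional result; `rel_reduction` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Relative reduction (in percent) of an error metric from a baseline to a new value.
rel_reduction() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", 100 * (base - new) / base }'
}

# Default model (baseline) vs. AlphaFold model, using the MAE values reported above.
rel_reduction 0.04611478000879288 0.042016904801130295  # PerModelMAE: prints 8.9
rel_reduction 0.07066802680492401 0.06481857597827911   # PerResidueMAE: prints 8.3
```

In other words, adding AlphaFold plDDT as an input lowers the per-model MAE by roughly 8.9% and the per-residue MAE by roughly 8.3% on this test set.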

Note: Please contact us if you are interested in reproducing our results for the multimer structure EMA task described in the manuscript. We would be happy to provide you with a copy of the corresponding dataset.

How to predict lDDT scores for protein structures using GCPNet-EMA

Predict per-residue and per-model lDDT scores for 3D protein structures

predict_batch_size=1  # adjust as desired according to available GPU memory
num_workers=0  # note: required when initially processing new PDB file inputs, due to ESM's GPU usage

# NOTE: ensure feature ablation arguments match checkpoint type
python3 src/ model=gcpnet_ema data=ema data.predict_input_dir=$MY_INPUT_PDB_DIR data.predict_true_dir=$MY_OPTIONAL_TRUE_PDB_DIR data.predict_output_dir=$MY_OUTPUTS_DIR data.predict_batch_size=$predict_batch_size data.num_workers=$num_workers logger=csv trainer.accelerator=gpu trainer.devices=1 ckpt_path="$default_ema_model_ckpt_path" model.ablate_af2_plddt=true model.ablate_gtn=true data.ablate_ankh_embeddings=true data.ablate_esm_embeddings=true
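When you pass data.predict_true_dir, the ground-truth files are expected to correspond one-to-one with the inputs. The hypothetical `check_true_dir` sketch below checks this before a run, under the assumption (not confirmed by this README) that corresponding files share the same file name. It only verifies that every input has a counterpart; it does not flag extra files in the ground-truth directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: assumes each input PDB file has a ground-truth
# file with the same file name in the ground-truth directory.
check_true_dir() {
  local input_dir="$1" true_dir="$2" missing=0 f
  for f in "$input_dir"/*.pdb; do
    [ -e "$f" ] || continue  # no .pdb inputs: the glob stays unexpanded
    if [ ! -f "$true_dir/$(basename "$f")" ]; then
      echo "missing ground truth for $(basename "$f")" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]  # succeed only if every input has a counterpart
}
```

For example, run `check_true_dir "$MY_INPUT_PDB_DIR" "$MY_OPTIONAL_TRUE_PDB_DIR"` before the prediction command and only proceed if it succeeds.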

For example, one can predict per-residue and per-model lDDT scores for a batch of two tertiary protein structure inputs, 6W6VE.pdb and 6W77K.pdb, located in data/EMA/examples/decoy_model, as follows:

python3 src/ model=gcpnet_ema data=ema data.predict_input_dir=data/EMA/examples/decoy_model data.predict_output_dir=data/EMA/examples/outputs data.predict_batch_size=1 data.num_workers=0 data.python_exec_path="$HOME"/mambaforge/envs/GCPNet-EMA/bin/python data.lddt_exec_path="$HOME"/mambaforge/envs/GCPNet-EMA/bin/lddt data.pdbtools_dir="$HOME"/mambaforge/envs/GCPNet-EMA/lib/python3.10/site-packages/pdbtools/ logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path=checkpoints/default_structure_ema_finetuned_gcpnet_without_plddt_or_esm_emb_p0p8c6pz_best_epoch_099.ckpt model.ablate_af2_plddt=true model.ablate_gtn=true data.ablate_ankh_embeddings=true data.ablate_esm_embeddings=true
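The three environment-specific paths in this example all live under the Conda environment's prefix. The hypothetical `ema_env_overrides` helper below derives them from a prefix and a Python `major.minor` version, assuming the same `bin/` and `lib/pythonX.Y/site-packages/pdbtools/` layout that the example uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the environment-specific overrides used in the
# prediction example, given a Conda environment prefix and a Python version.
ema_env_overrides() {
  local prefix="${1%/}" pyver="$2"  # strip a trailing slash from the prefix
  printf '%s\n' \
    "data.python_exec_path=${prefix}/bin/python" \
    "data.lddt_exec_path=${prefix}/bin/lddt" \
    "data.pdbtools_dir=${prefix}/lib/python${pyver}/site-packages/pdbtools/"
}
```

With the environment activated, `$(ema_env_overrides "$CONDA_PREFIX" "$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')")` expands to the three overrides. This relies on word splitting, so it breaks if the prefix contains spaces.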

Note: After running the above command, an output CSV containing metadata for the predictions will be located at logs/predict/runs/YYYY-MM-DD_HH-MM-SS/predict_YYYYMMDD_HHMMSS_rank_0_predictions.csv, with the date and time placeholders replaced by the time at which the command completed. This CSV contains a column called predicted_annotated_pdb_filepath that gives the temporary location of each input PDB file after it has been annotated with GCPNet-EMA's predicted per-residue lDDT scores. If a directory of ground-truth PDB files corresponding one-to-one with the inputs in data.predict_input_dir is provided as data.predict_true_dir, the output CSV will also report metrics and annotated PDB filepaths that quantitatively and qualitatively describe how well GCPNet-EMA improves upon AlphaFold's initial per-residue plDDT values.
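To locate the annotated PDB files after a run, one could read the newest predictions CSV directly. The hypothetical `latest_predictions_column` sketch below assumes the run-directory layout described above and a simple CSV whose fields contain no quoted commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one column (selected by header name) from the most
# recent predictions CSV. Assumes a simple CSV without quoted commas.
latest_predictions_column() {
  local runs_dir="${1:-logs/predict/runs}" column="${2:-predicted_annotated_pdb_filepath}" csv
  # Run directories are named YYYY-MM-DD_HH-MM-SS, so a lexical sort is chronological.
  csv="$(ls -1d "$runs_dir"/*/*_predictions.csv 2>/dev/null | sort | tail -n 1)"
  [ -n "$csv" ] || { echo "no predictions CSV under $runs_dir" >&2; return 1; }
  awk -F, -v col="$column" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; if (!idx) exit 1; next }
    { print $idx }
  ' "$csv"
}
```

Called with no arguments from the project root, it prints the predicted_annotated_pdb_filepath of every input in the latest run; it exits with a non-zero status if the requested column is absent.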

For developers

Set up pre-commit (one time only) for automatic code linting and formatting upon each git commit

pre-commit install

Manually reformat all files in the project, as desired

pre-commit run -a

Update dependencies in environment.yaml

mamba env export > env.yaml # e.g., run this after installing new dependencies locally
diff environment.yaml env.yaml # note the differences and copy accepted changes back into `environment.yaml`
rm env.yaml # clean up temporary environment file

Use Gunicorn to parallelize responses to web server requests across 4 workers using port 5000

SERVER_USE_CONFIG_0=true gunicorn -w 4 -b --timeout 300 src.wsgi:app

Test server locally using curl

curl -X POST -F "title=6KHVA" -F "structure_upload=@data/EMA/test_examples/decoy_model/6KHVA.pdb" -F ""

Create a user cronjob (via crontab -e) that checks every five minutes whether the Gunicorn web server is running and, if it is not, starts it with the Gunicorn command above.

# NOTE: add this to your user cronjobs using `crontab -e`
*/5 * * * * pgrep -f "gunicorn -w 4 -b --timeout 300 src.wsgi:app" || (cd /bml/$USER/Repositories/Lab_Repositories/GCPNet-EMA && ~/mambaforge/condabin/mamba run -n GCPNet-EMA SERVER_USE_CONFIG_0=true gunicorn -w 4 -b --timeout 300 --chdir /bml/$USER/Repositories/Lab_Repositories/GCPNet-EMA src.wsgi:app >> /bml/$USER/Repositories/Lab_Repositories/GCPNet-EMA/server_crontab_logfile.log 2>&1)
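When chaining a process check with start-up commands in a crontab entry, keep in mind that `||` and `&&` have equal precedence in sh and are evaluated left to right, so `check || a && b` runs `b` even when `check` succeeds. The toy functions below (with `true` standing in for a pgrep call that finds the server) show the difference; wrapping the start-up commands in `( ... )` or `{ ...; }` makes them run only when the check fails.

```shell
#!/usr/bin/env bash
# `true` stands in for a pgrep call that finds the server already running.
ungrouped() { true || echo "starting" && echo "launched"; }
grouped()   { true || { echo "starting" && echo "launched"; }; }

ungrouped  # prints "launched": the start-up step runs even though the check passed
grouped    # prints nothing: the start-up steps run only if the check fails
```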

The server should now be publicly available at when running it on port 5000 and at when running it on port 5001, and so on e.g., up to port 5003 (as configured locally via one's Apache server proxy).

NOTE: You should substitute the /bml/$USER/Repositories/Lab_Repositories/GCPNet-EMA references above with the absolute path to your personal copy of the repository.

NOTE: Make sure to create a .env file in the project's root directory (i.e., ./) that defines four environment variables required for server support: (1) SERVER_EMAIL_ADDRESS (e.g., ""); (2) SERVER_EMAIL_SMTP_SERVER (e.g., ""); (3) SERVER_EMAIL_PORT (e.g., "587" by default); and (4) SERVER_SECRET_KEY (initially generated by the Python secrets package).
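A minimal sketch of creating this file is shown below. `write_server_env` is a hypothetical helper, the two `CHANGE_ME` values are placeholders for your own email settings, and using `secrets.token_hex(32)` is an assumption about how the secret key may be generated; the README only says the Python secrets package was used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a .env skeleton for the web server.
# The two CHANGE_ME values are placeholders you must replace with your own settings.
write_server_env() {
  local env_file="${1:-.env}" secret
  if [ -e "$env_file" ]; then
    echo "$env_file already exists; refusing to overwrite it" >&2
    return 1
  fi
  # secrets.token_hex(32) returns 32 random bytes encoded as 64 hex characters.
  secret="$(python3 -c 'import secrets; print(secrets.token_hex(32))')" || return 1
  (
    umask 077  # make the file readable and writable by the current user only
    printf '%s\n' \
      'SERVER_EMAIL_ADDRESS="CHANGE_ME"' \
      'SERVER_EMAIL_SMTP_SERVER="CHANGE_ME"' \
      'SERVER_EMAIL_PORT="587"' \
      "SERVER_SECRET_KEY=\"${secret}\"" > "$env_file"
  )
}
```

Run it once from the project root (`write_server_env`), then edit the placeholder values; it refuses to overwrite an existing .env so an established secret key is not replaced by accident.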

NOTE: You can configure deployment of different server versions using the environment variables SERVER_USE_CONFIG_{0,1,2,3}=true. Base config 0 ablates AF2 plDDT and ESM embeddings; 1 ablates just ESM embeddings; 2 ablates just AF2 plDDT; and 3 uses both AF2 plDDT and ESM embeddings.
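The mapping above can be summarized in a small lookup, sketched below with hypothetical helper names. `server_config_env` prints the variable assignment for a chosen config, and `describe_server_config` restates which optional inputs that config uses, exactly as listed in the note above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers restating the server config mapping described above.
server_config_env() {
  case "$1" in
    0|1|2|3) echo "SERVER_USE_CONFIG_$1=true" ;;
    *) echo "config must be 0, 1, 2, or 3" >&2; return 1 ;;
  esac
}

describe_server_config() {
  case "$1" in
    0) echo "AF2 plDDT: off, ESM embeddings: off" ;;  # ablates both
    1) echo "AF2 plDDT: on, ESM embeddings: off" ;;   # ablates just ESM embeddings
    2) echo "AF2 plDDT: off, ESM embeddings: on" ;;   # ablates just AF2 plDDT
    3) echo "AF2 plDDT: on, ESM embeddings: on" ;;    # uses both
    *) echo "unknown config: $1" >&2; return 1 ;;
  esac
}
```

For instance, prefixing the Gunicorn command with `env "$(server_config_env 2)"` would set SERVER_USE_CONFIG_2=true for that server process only.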


GCPNet-EMA builds upon the source code and data from the following project(s):

We thank all their contributors and maintainers!

Citing this work

If you use the code or data associated with this project, or otherwise find this work useful, please cite:

Morehead, Alex, Jian Liu, and Jianlin Cheng. "Protein Structure Accuracy Estimation using Geometry-Complete Perceptron Networks." Protein Science (2024).

