inter-protein contact prediction from sequences of interacting proteins:
-
Please note: To use the protein language models (ESM-1b and ESM-MSA-1b in this study) through esm, first download their model weights from the links in the "Available Models and Datasets" table of the esm GitHub repository. The paths to these model weights must then be set in predict.py. In addition, the contact regression parameter files for ESM-1b (esm1b_t33_650M_UR50S-contact-regression.pt) and ESM-MSA-1b (esm_msa1b_t12_100M_UR50S-contact-regression.pt) must be stored in the same directory as the corresponding model weight files.
1. git clone https://github.com/ChengfeiYan/DRN-1D2D_Inter.git
2. Modify the path of each tool (CCMpred, alnstats ...) and the paths of the model weights of the protein language models (ESM-1b and ESM-MSA-1b) in predict.py.
3. Copy esm1b_t33_650M_UR50S-contact-regression.pt from /data/regression to the directory containing ESM-1b's model weights, and copy esm_msa1b_t12_100M_UR50S-contact-regression.pt from /data/regression to the directory containing ESM-MSA-1b's model weights.
Download the trained models from the "trained models" link, then unzip them into the folder named "model".
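Before running predict.py, it can help to confirm that each contact regression file really sits next to its model weight file. The following is a minimal sketch, not part of the repository: the helper names and the /path/to/ locations are hypothetical, and it assumes the regression file shares the weight file's name with "-contact-regression.pt" appended, as the file names above suggest.

```python
from pathlib import Path


def regression_file_for(weights_file):
    """Return the path where the contact regression file is expected to sit.

    Assumption: the regression file shares the weight file's directory and
    stem, with "-contact-regression.pt" appended.
    """
    weights_file = Path(weights_file)
    return weights_file.with_name(weights_file.stem + "-contact-regression.pt")


def find_missing_files(weight_files):
    """List every weight or regression file that does not exist on disk."""
    missing = []
    for weights_file in map(Path, weight_files):
        for path in (weights_file, regression_file_for(weights_file)):
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical locations; replace with the paths set in predict.py.
    weights = [
        "/path/to/esm1b_t33_650M_UR50S.pt",
        "/path/to/esm_msa1b_t12_100M_UR50S.pt",
    ]
    for path in find_missing_files(weights):
        print(f"missing: {path}")
```

An empty result from find_missing_files means both the weights and their regression files were found.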
python predict.py sequenceA msaA sequenceB msaB result_path device
1. sequenceA: fasta file corresponding to target A.
2. msaA: a3m file corresponding to target A (multiple sequence alignment).
3. sequenceB: fasta file corresponding to target B.
4. msaB: a3m file corresponding to target B (multiple sequence alignment).
5. result_path: directory for the output.
6. device: cpu, cuda:0, cuda:1, ...
The MSAs should be derived from the UniRef90 or UniRef100 database.
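A quick consistency check between each fasta file and its a3m file can catch mismatched inputs early. The sketch below is an illustration only and is not taken from predict.py: the function names are hypothetical, and it assumes the usual a3m convention that the first record is the query and that lowercase letters mark insertions relative to it. It also assumes the fasta sequence is written in uppercase.

```python
import string

# Lowercase letters mark insertions relative to the query in a3m files.
_DROP_INSERTIONS = str.maketrans("", "", string.ascii_lowercase)


def parse_records(text):
    """Split FASTA/a3m text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa(fasta_text, a3m_text):
    """Return the number of a3m rows after checking them against the query."""
    query = parse_records(fasta_text)[0][1]
    aligned = [seq.translate(_DROP_INSERTIONS) for _, seq in parse_records(a3m_text)]
    if aligned[0] != query:
        raise ValueError("first a3m record does not match the FASTA sequence")
    if any(len(seq) != len(query) for seq in aligned):
        raise ValueError("aligned rows differ in length from the query")
    return len(aligned)
```

For the bundled example, the two texts could be read with pathlib, for instance `Path("./example/1GL1_A.fasta").read_text()` and `Path("./example/1GL1_A_uniref100.a3m").read_text()`.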
python predict.py ./example/1GL1_A.fasta ./example/1GL1_A_uniref100.a3m ./example/1GL1_I.fasta ./example/1GL1_I_uniref100.a3m ./example/result cpu
Note that we downsampled the MSAs of the example target because of GitHub's file size limit. With the full MSAs, DRN-1D2D_Inter should perform better on this example than the provided files suggest.
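When predictions are run for many protein pairs, the predict.py invocation described above can be wrapped in a small helper. This is a hedged sketch rather than part of the repository: build_predict_command and run_prediction are hypothetical names, the device check only mirrors the values listed above (cpu, cuda:0, cuda:1, ...), and creating the output directory beforehand is an assumption about what predict.py expects.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(sequence_a, msa_a, sequence_b, msa_b, result_path, device="cpu"):
    """Assemble the predict.py argument list in the documented order."""
    if device != "cpu" and not (device.startswith("cuda:") and device[5:].isdigit()):
        raise ValueError(f"unsupported device: {device!r}")
    inputs = [sequence_a, msa_a, sequence_b, msa_b]
    # sys.executable runs predict.py with the current Python interpreter.
    return [sys.executable, "predict.py", *map(str, inputs), str(result_path), device]


def run_prediction(*args, **kwargs):
    """Check that the inputs exist, create the output directory, run predict.py."""
    command = build_predict_command(*args, **kwargs)
    for path in command[2:6]:
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    # Assumption: it is harmless to create the output directory in advance.
    Path(command[6]).mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)
```

Calling run_prediction from the repository root with the six example arguments shown above is intended to be equivalent to running the example command by hand.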
The script used to train DRN-1D2D_Inter is train.py, which contains all the details of training DRN-1D2D_Inter, including how to choose the best model, how to calculate the loss, etc.
Please cite: Yunda Si, Chengfei Yan, Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models, Briefings in Bioinformatics, 2023, bbad039, https://doi.org/10.1093/bib/bbad039
If you encounter any problems installing or running the program, please contact chengfeiyan@hust.edu.cn.