korpm

A fast method for predicting the stability change upon mutation from a 3D structure. Predicting protein stability changes upon mutation using a simple orientational potential. I. Martín-Hernández, Y. Dehouck, U. Bastolla, J.R. López-Blanco and P. Chacón. 2023. Bioinformatics, 39, 1, 2023

Git users: Be sure that LFS is installed to support long files, otherwise download the repo as a zip file.
The size of korp6Dv1.bin should be 331.8MB

Usage

The usage is very simple:

sbg/bin/korpm input.txt --dir Ssym --score_file pot/korp6Dv1.bin -o out.txt

it only requires: 1) a two-column input file specifying both the PDB file and mutation, i.e. 1BNI IA76A, and 2) the path where the PDB files are located (--dir) and the KORP potential file (--score_file). The results are also stored in the out.txt (-o option) file.

more out.txt
1BNI IA76A       -1.396
1EY0 TA44V        0.108
1IHB FA82Q       -0.727

The mutation columns stand for: the 1st letter is the wild-type amino acid, 2nd is the chain ID, the digits correspond to the PDB residue position, and the last letter is the mutated amino acid. We follow the standard convention ΔΔG >= 0 (positives) is stabilizing and ΔΔG < 0 (negatives) is destabilizing.

Source Code

The complete source C++ code of KORPM is under sbg directory. Just execute compile_korpm.sh script.

ΔΔG Curated Databases

We extracted from Thermomut and ProThermDB unique mutations trying to avoid entries that potentially interact with ligands or belong to protein-protein interfaces, and removing entries measured at extreme temperature or pH conditions. The initial curated database data comprise 3824 mutations from 139 protein families (seq. identity <25%) with an average of ΔΔG -1.0 Kcal/mol and a standard deviation of 1.6 Kcal/mol. In total, 73% are destabilizing (ΔΔG>0) and 27% are stabilizing (ΔΔG<0). By removing mainly alanines' destabilizing mutations, we obtain a more balanced subset that includes 2371 mutations from 129 protein families, 58% destabilizing and 42% stabilizing with an average of ΔΔG -0.7 Kcal/mol and a standard deviation of 1.6 Kcal/mol. This subset, named Id25c03_1merNCLB.txt, was used for extract training and validation datasets for k-fold cross-validation experiments. Note that this subset is far from being perfectly balanced, e.g., the most frequent amino acid involved in the mutation still is alanine, and cysteines, tryptophans, and, prolines still are underpopulated. This dataset does not include any of the mutations of the Ssym test set. There are two additional subsets excluding all the proteins presenting at least 25% of the sequence of identity with either Ssym dataset Id25c03_1merNCLB_Ssym.txt or S641 dataset Id25c03_1merNCLB_S461.txt. In our case, these different datasets were employed to test the absence of significant overfitting in our simple potential.


Id25c03_1merNCL.txt	Id25c03_1merNCLB.txt

In the directory Db you can find all the corresponding PDB files.

Results with Ssym

Ssym is a data set with an equal number of stabilizing and destabilizing mutations compiled by Pucci et al. (https://doi.org/10.1093/bioinformatics/bty348) for which the structure of both the wild-type and mutant protein are available.

sbg/bin/korpm Ssym.txt --dexp --dir Ssym --score_file pot/korp6Dv1.bin -o Ssym_all.txt

Where Ssym.txt is the mutations input file and the Ssym directory is where the input PDB files are stored. Since this input contains the experimental ΔΔG (see Appendix for small corrections) you can cross-check the predictions by:

scripts/Mstat.pl Ssym_all.txt 10 11 2

# Current Positives|Negatives threshold (thr) is 0 (ddG >= 0 are not-destabilizing [positives] and ddG < 0 are destabilizing [negatives]).

aa     S     D     T   TP  avg  err   FP   TN  avg  err   FN     P     N   SEN   SPE   PPV   NPV   ACC  accn  RMSE  MAE   PCC    Sc    Ob1   Ob2  MCC
 X   342   342   684  264  1.5  0.9   71  271 -1.5  0.8   78   335   349 0.772 0.792 0.788 0.777 0.782 0.782 1.331 0.969 0.695  64.6  34.6   0.7  0.56
 A    97    97   194   80  1.9  1.0   15   82 -1.9  0.9   17    95    99 0.825 0.845 0.842 0.828 0.835 0.835 1.525 1.089 0.744  67.0  32.0   1.0  0.67
 V   106   106   212   82  1.2  0.8   24   82 -1.2  0.8   24   106   106 0.774 0.774 0.774 0.774 0.774 0.774 1.187 0.871 0.691  64.6  35.4   0.0  0.55
 I    68    68   136   57  1.6  0.8   11   57 -1.6  0.7   11    68    68 0.838 0.838 0.838 0.838 0.838 0.838 1.128 0.890 0.810  66.9  31.6   1.5  0.68
 L    41    41    82   30  1.7  1.0   12   29 -1.7  1.0   11    42    40 0.732 0.707 0.714 0.725 0.720 0.720 1.507 1.123 0.638  62.2  37.8   0.0  0.44
 .....etc

Check ΔΔG Anti-symmetry in Ssym

sbg/bin/korpm SsymD.txt --dexp --dir Ssym --score_file pot/korp6Dv1.bin -o Ssym_dir.txt
sbg/bin/korpm SsymR.txt --dexp --dir Ssym --score_file pot/korp6Dv1.bin -o Ssym_rev.txt
paste Ssym_dir.txt  Ssym_rev.txt  > temp
awk 'function abs(x){return (x < 0) ? -x : x;} {printf "%s %s %s %s %s %s %s %f  %f %s %s\n",$1,$19, $2, $20, $10, $11,$29, ($11+$29), abs(($11+$29)), $3, $4  }' temp > KORPM_Ssym.txt

you can see the results in your favourite plot, for example in gnuplot:

plot  "KORPM_Ssym.txt" u 6:7
stat "KORPM_Ssym.txt" u 6:7
...
  Linear Model:       y = -0.8207 x + 0.03672
  Slope:              -0.8207 +- 0.02468
  Intercept:          0.03672 +- 0.03807
  Correlation:        r = -0.8745
...

Comparative results Ssym

Here you can find some comparative results with state-of-the-art stability prediction programs:

METHOD	RMSE	MAE	PCC	Sc	Ob1	Ob2	TPR	TNR	PPV	NPV	ACC	MCC	AROC	APRC
KORPM	1.33	0.97	0.69	64.6	34.6	0.7	0.77	0.79	0.79	0.78	0.78	0.56	0.86	0.86
ACDCNN	1.38	1.01	0.69	61.5	38.1	0.0	0.70	0.70	0.70	0.70	0.70	0.40	0.80	0.80
FoldX	1.86	1.29	0.54	60.1	34.5	5.4	0.55	0.78	0.71	0.63	0.66	0.33	0.74	0.75
EvoFF	1.56	1.12	0.54	61.7	34.9	3.4	0.61	0.70	0.67	0.64	0.66	0.31	0.74	0.75
PopMusic-S	1.58	1.15	0.52	56.6	42.4	1.0	0.67	0.71	0.70	0.68	0.69	0.38	0.76	0.74
Dynamut	1.88	1.37	0.38	54.4	38.2	7.5	0.21	0.88	0.64	0.53	0.55	0.13	0.62	0.62
DDGun3D	1.43	1.04	0.63	61.8	37.4	0.7	0.68	0.69	0.69	0.69	0.69	0.37	0.75	0.76
ThermoNet	1.53	1.09	0.55	58.2	40.9	0.9	0.65	0.70	0.69	0.67	0.68	0.35	0.75	0.74
Cartddg	3.44	2.63	0.63	52.3	41.1	6.6	0.58	0.87	0.82	0.67	0.73	0.47	0.81	0.82

you can find complete results in Ssym_results

Results with S461

you can find complete results in S461_results

Appendix.

Corrections of original Ssym dataset based on ThermoMutDB data.

PDB	Mutation	Original	Corrected	Medline References from ThermoMutDB
1BNI	IA96V	-3.1	-0.9	2669964 (-0.90); 1569557 (0.95); 9551101 (-0.80)
1BNI	SA91A	-2.4	-1.8	14516751
1L63	SA44T	0.0	0.01	8289284
1L63	SA38N	0.0	-0.01	1911773
1L63	LA91A	-3.9	-2.6	10545167
1L63	AA130S	1.0	-1.0	8218201
1LZ1	VA2G	-1.3	-2.29	11087397
1LZ1	VA2L	0.3	-0.05	11087397
1LZ1	IA56T	-4.3	-3.6	9010773; 10556244
1LZ1	VA74I	-1.9	0.45	11087397
1LZ1	VA74L	-0.4	0.19	11087397
1LZ1	VA74M	-0.4	0.65	11927576; 11087397
1LZ1	VA110G	-2.2	0.48	11927576; 11087397
1LZ1	VA110I	-0.8	0.86	11927576; 11087397
1LZ1	VA110F	-1.9	-0.05	11927576; 11087397
2LZM	IA3C	0.0	1.2	3405287
2LZM	RA119E	0.0	-0.04	1942034
4LYZ	GA49A	-0.7	-1.9	11112507; 8771183
4LYZ	GA71A	-2.1	-0.38	11112507
4LYZ	GA102A	-1.2	0.02	11112507
4LYZ	GA117A	-0.8	-1.46	11112507
1RN1	QC25K	1.4	0.93	2663837
2LZM	RA96K	0.0	-0.001	Avoiding zero for the binary classification
1L63	SA44E	0.0	0.001	Avoiding zero for the binary classification
2LZM	KA60P	0.0	-0.001	Avoiding zero for the binary classification

Funding

This work was supported by Spanish grants PID2019-109041GB-C21/AEI/10.13039/501100011033

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

korpm

Usage

Source Code

ΔΔG Curated Databases

Results with Ssym

Check ΔΔG Anti-symmetry in Ssym

Comparative results Ssym

Results with S461

Appendix.

Funding

About

Releases

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
Db		Db
S461		S461
S461_results		S461_results
Ssym		Ssym
Ssym_results		Ssym_results
images		images
opt		opt
pot		pot
removed		removed
sbg		sbg
scripts		scripts
.gitattributes		.gitattributes
.gitignore		.gitignore
Id25c03_1merNCL.txt		Id25c03_1merNCL.txt
Id25c03_1merNCLB.txt		Id25c03_1merNCLB.txt
Id25c03_1merNCLB_S461.txt		Id25c03_1merNCLB_S461.txt
Id25c03_1merNCLB_Ssym.txt		Id25c03_1merNCLB_Ssym.txt
Id25c03_1merNCL_Ssym.txt		Id25c03_1merNCL_Ssym.txt
KORPM_Ssym.txt		KORPM_Ssym.txt
LICENSE		LICENSE
README.md		README.md
S461.txt		S461.txt
Ssym.txt		Ssym.txt
SsymD.txt		SsymD.txt
SsymR.txt		SsymR.txt
input.txt		input.txt

License

chaconlab/korpm

Folders and files

Latest commit

History

Repository files navigation

korpm

Usage

Source Code

ΔΔG Curated Databases

Results with Ssym

Check ΔΔG Anti-symmetry in Ssym

Comparative results Ssym

Results with S461

Appendix.

Funding

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages