# The Kinase Library

In [1]:
import kinase_library as kl

import pandas as pd

## PhosphoProteomics object

### Creating PhosphoProteomics object

Read an example file containing a list of phosphosites:

In [2]:
phosphosites_data = pd.read_csv('./test_files/pps_data.tsv', sep='\t')
phosphosites_data.head()

Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA


Create a PhosphoProteomics object from this data.

Please note the following requirements for a sequence to be valid (row references in brackets count from 1, so row 1 is the line with index 0 in the table above):
* Must be of odd length (in order to have a central amino acid)
* Central amino acid must be a valid phosphoacceptor (S/s, T/t, or Y/y, case insensitive)
* All peripheral amino acids must be capitalized, except S/T/Y, where lowercase indicates phosphopriming (only when the _pp_ flag is _True_; the default is _False_)
* Truncation or padding can be indicated using '_' [_row 4_]
* Missing amino acids can be indicated by 'X' (treated as a random amino acid) [_row 5_]
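
The requirements above can be sketched as a small validity check. This is an illustration only, not the library's own code: the function name `invalid_reason`, the allowed-character set, and the order of the checks are assumptions made for the example.

```python
# Illustrative validity check (hypothetical helper, not part of kinase_library).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
PHOSPHO_ACCEPTORS = set("STY")
# Assumed allowed characters: the 20 standard residues, lowercase s/t/y,
# '_' for truncation/padding, and 'X' for a missing residue.
ALLOWED = AMINO_ACIDS | {"s", "t", "y", "_", "X"}


def invalid_reason(seq):
    """Return None if `seq` looks valid, otherwise a short reason string."""
    if not seq:
        return "empty"
    if any(ch not in ALLOWED for ch in seq):
        return "invalid character"
    if len(seq) % 2 == 0:
        return "even length"
    if seq[len(seq) // 2].upper() not in PHOSPHO_ACCEPTORS:
        return "central residue is not S/T/Y"
    return None


print(invalid_reason("MVMPARRtPHVQAVQ"))  # valid window
print(invalid_reason("CTPSGDLsPLSREP"))   # 14 residues, no central position
print(invalid_reason("KHAVSEGtKJVTKYT"))  # 'J' is not a standard residue
```

Under these assumptions, a window is rejected for exactly one reason, which is convenient when tallying why entries were dropped.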

Each valid sequence will be processed through the following steps:
1. Capitalize peripheral s/t/y if phosphopriming is False (default) [_rows 2 & 3_]
2. Lowercase central phosphoacceptor [_row 3_]
3. Generate 15-mer by trimming long sequences [_row 5_] or padding short sequences [_row 2_]
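
The three steps above can be sketched in a few lines of plain Python. The helper `process_window` and its `length` parameter are hypothetical names used only for this illustration; the library's actual implementation may differ, and the sketch assumes the input has already passed the validity requirements.

```python
def process_window(seq, pp=False, length=15):
    """Normalize a valid, odd-length sequence window to a fixed-length k-mer.

    Hypothetical helper for illustration; not part of kinase_library.
    """
    center = len(seq) // 2
    flank = length // 2
    left, site, right = seq[:center], seq[center], seq[center + 1:]
    if not pp:
        # Step 1: without phosphopriming, peripheral s/t/y are capitalized.
        left, right = left.upper(), right.upper()
    # Step 3: trim long flanks, then pad short flanks with '_'.
    left = left[-flank:].rjust(flank, "_")
    right = right[:flank].ljust(flank, "_")
    # Step 2: the central phosphoacceptor is always lowercase.
    return left + site.lower() + right


print(process_window("EDAEKysFMATVT"))                # _EDAEKYsFMATVT_
print(process_window("EDAEKysFMATVT", pp=True))       # _EDAEKysFMATVT_
print(process_window("PDRRLGKLKLQAFsAXXESCHCGGPSA"))  # KLKLQAFsAXXESCH
```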

New columns for processed sequences (_Sequence_) and phospho-residue (_phos_res_) will be added.

In [3]:
pps = kl.PhosphoProteomics(phosphosites_data, seq_col='sequence window')
pps.data.head()

4 entries were omitted due to empty value in the substrates column.
5 entries were omitted due to invalid amino acids or characters.
2 entries were omitted due to even length (no central position).
Use the 'omited_entries' attribute to view dropped enteries due to invalid sequences.


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH


If _pp_ is set to _True_, peripheral s/t/y are kept lowercase and treated as phosphorylated [_rows 2 & 3_]:

In [4]:
pps_pp = kl.PhosphoProteomics(phosphosites_data, seq_col='sequence window', pp=True)
pps_pp.data.head()

4 entries were omitted due to empty value in the substrates column.
5 entries were omitted due to invalid amino acids or characters.
2 entries were omitted due to even length (no central position).
Use the 'omited_entries' attribute to view dropped enteries due to invalid sequences.


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKysFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLsAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH


Omitted entries (invalid or missing sequences) can be accessed using the _omited_entries_ attribute:

In [5]:
pps.omited_entries

Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window
26,Q9H2F5,EPC1,EPC1,Enhancer of polycomb homolog 1,492,S,0.982631,MLSSPQHkPVNQFAN
67,Q70EL1,UBP54,USP54,Inactive ubiquitin carboxyl-terminal hydrolase 54,669,S,0.812547,YESSORNsSSPVSLD
338,Q13009,TIAM1,TIAM1,T-lymphoma invasion and metastasis-inducing pr...,224,S,0.994002,
627,Q08380,LG3BP,LGALS3BP,Galectin-3-binding protein,444,S,0.999801,GPLVKYSsDYfQAPS
976,O95639,CPSF4,CPSF4,Cleavage and polyadenylation specificity facto...,211,S,0.998086,iqltsqnsspnqqrt
1101,Q99490,AGAP2,AGAP2,"Arf-GAP with GTPase, ANK repeat and PH domain-...",818,S,0.970636,CTPSGDLsPLSREP
1170,Q99952,PTN18,PTPN18,Tyrosine-protein phosphatase non-receptor type 18,341,S,0.985081,GVLRSISVPGSP
1256,Q9Y483,MTF2,MTF2,Metal-response element-binding transcription f...,401,S,1.0,
1397,Q08499,PDE4D,PDE4D,"cAMP-specific 3',5'-cyclic phosphodiesterase 4D",190,S,1.0,
1412,Q96A08,H2B1A,HIST1H2BA,Histone H2B type 1-A,117,T,0.999963,KHAVSEGtKJVTKYT


Once the PhosphoProteomics object has been created, Ser/Thr and Tyr substrates can be accessed separately:

In [6]:
print('Ser/Thr data:')
pps.ser_thr_data

Ser/Thr data:


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH
...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE


In [7]:
print('Tyrosine data:')
pps.tyrosine_data

Tyrosine data:


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
7,P42768,WASP,WAS,Wiskott-Aldrich syndrome protein,291,Y,1.000000,AETSKLIyDFIEDQG,y,AETSKLIyDFIEDQG
18,Q96PK6,RBM14,RBM14,RNA-binding protein 14,226,Y,0.787879,RSPPRASyVAPLTAQ,y,RSPPRASyVAPLTAQ
22,Q96A65,EXOC4,EXOC4,Exocyst complex component 4,247,Y,0.999846,KFLDTSHySTAGSSS,y,KFLDTSHySTAGSSS
45,Q9Y2H5,PKHA6,PLEKHA6,Pleckstrin homology domain-containing family A...,890,Y,0.998937,EEPGGHAyETPREEI,y,EEPGGHAyETPREEI
58,Q16625,OCLN,OCLN,Occludin,368,Y,0.999580,DDFRQPRySSGGNFE,y,DDFRQPRySSGGNFE
...,...,...,...,...,...,...,...,...,...,...
5210,Q9UKX2,MYH2,MYH2,Myosin-2,362,Y,1.000000,LTGAVMHyGNLKFKQ,y,LTGAVMHyGNLKFKQ
5230,Q7L9L4,MOB1B,MOB1B,MOB kinase activator 1B,26,Y,0.999974,IPEGSHQyELLKHAE,y,IPEGSHQyELLKHAE
5246,Q8WZ42,TITIN,TTN,Titin,17224,Y,1.000000,RIMAQNKyGIGEPLD,y,RIMAQNKyGIGEPLD
5248,O95295,SNAPN,SNAPIN,SNARE-associated protein Snapin,129,Y,0.809142,AMLDSGIyPPGSPGK,y,AMLDSGIyPPGSPGK


### Scoring functions

The list of sites can be scored against the kinases in the library:

In [8]:
pps.score(kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,-1.2831,-1.6019,-0.2843,-1.4558,-1.3695,-0.1377,1.3807,-2.0277,-3.0038,-1.2099
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,-1.8997,-2.6232,-2.1530,-1.2788,-2.2559,0.0987,0.8019,-3.1612,-1.4313,-1.4806
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,-2.6336,-1.1080,-3.2351,-1.8514,-2.4487,-0.3170,-0.4161,-6.7561,-2.5663,-4.8984
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,-2.6730,-0.5082,0.6517,-0.7630,-0.3754,-4.1580,-1.6566,-3.8283,-3.2997,-2.8392
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,-2.8351,0.1341,-0.6600,-1.3661,-1.7031,-2.6070,-1.5326,-4.1673,-2.2650,-3.0195
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,-3.3449,-0.6231,-0.9168,-2.0343,-1.8260,-2.8039,-1.7089,-6.2347,-3.2095,-4.1923
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,-3.1351,-2.8528,-3.2625,-1.6392,-3.6180,-3.8918,-2.4439,-4.6395,-2.8506,-3.0886
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,-3.1629,-2.8077,-2.8231,-3.1159,-3.8809,-4.5063,-2.6299,-6.4121,-4.1877,-4.0953
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,-0.0654,4.9850,5.3882,3.9081,4.9151,2.5150,3.2086,2.1280,-2.0021,1.0018


The list can also be scored against specific kinases only:

In [9]:
pps.score(kinases=['akt1', 'srpk3', 'braf', 'erk2'])

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,AKT1,SRPK3,BRAF,ERK2
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,0.9335,0.6739,-1.0577,3.8516
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,-1.6461,-4.3370,-1.2551,-1.6412
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,-5.1650,-3.5288,-3.3546,-4.2736
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,-3.0101,1.3132,-3.8515,3.2396
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,-2.8495,-1.3791,-3.1009,-0.9275
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,-3.1739,-3.2098,-2.8178,-4.1144
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,-3.4372,-0.0613,-3.7152,4.8716
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,-5.5413,-1.2489,-2.5342,1.7482
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,7.9483,2.6782,0.5657,-0.8746


If the PhosphoProteomics object was defined to contain phosphopriming, it will be scored as such:

In [10]:
pps_pp.score(kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,-1.2831,-1.6019,-0.2843,-1.4558,-1.3695,-0.1377,1.3807,-2.0277,-3.0038,-1.2099
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKysFMATVT_,...,-1.7990,-2.5212,-1.8896,-1.5944,-2.7923,0.6638,1.4408,-3.7448,-1.4731,-1.7211
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLsAFLsQEEINKS,...,-2.3471,-1.2164,-3.0918,-1.7550,-2.7371,1.2439,0.4747,-6.7709,-2.4648,-4.7334
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,-2.6730,-0.5082,0.6517,-0.7630,-0.3754,-4.1580,-1.6566,-3.8283,-3.2997,-2.8392
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,-2.8351,0.1341,-0.6600,-1.3661,-1.7031,-2.6070,-1.5326,-4.1673,-2.2650,-3.0195
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,-3.3449,-0.6231,-0.9168,-2.0343,-1.8260,-2.8039,-1.7089,-6.2347,-3.2095,-4.1923
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,-3.1351,-2.8528,-3.2625,-1.6392,-3.6180,-3.8918,-2.4439,-4.6395,-2.8506,-3.0886
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,-3.1629,-2.8077,-2.8231,-3.1159,-3.8809,-4.5063,-2.6299,-6.4121,-4.1877,-4.0953
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,-0.0654,4.9850,5.3882,3.9081,4.9151,2.5150,3.2086,2.1280,-2.0021,1.0018


To calculate percentile scores:

In [11]:
pps.percentile('ser_thr')

Scoring 4944 ser_thr substrates
Calculating percentile for 4944 ser_thr substrates


100%|██████████| 309/309 [00:02<00:00, 120.87it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,82.72,53.77,75.96,60.89,68.25,86.29,93.34,80.44,32.16,80.90
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,67.85,35.73,39.60,64.10,53.20,89.10,88.53,63.84,71.21,76.87
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,39.75,62.52,19.30,53.33,49.78,83.88,69.06,11.73,44.08,6.80
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,38.02,72.36,87.63,73.07,82.37,12.22,35.63,52.68,24.36,47.80
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,31.09,81.20,69.73,62.54,62.74,38.66,39.17,46.81,52.24,43.29
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,12.66,70.62,65.12,49.75,60.64,34.54,34.14,16.68,26.65,16.56
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,19.35,31.93,18.81,57.48,29.78,15.46,15.92,39.20,36.18,41.50
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,18.37,32.68,26.46,29.57,25.80,8.70,12.36,14.89,7.34,18.30
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,96.87,99.96,99.95,99.90,99.98,99.66,99.31,99.56,58.84,97.76
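
As a rough intuition for what a percentile conveys, the sketch below places a single score within a reference distribution of scores for the same kinase. The reference distribution the library uses is not described here, so the `reference_scores` values are made up and the helper `percentile_of` is a hypothetical name; treat this as a conceptual illustration rather than the library's method.

```python
from bisect import bisect_right


def percentile_of(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`.

    Hypothetical helper for illustration; not part of kinase_library.
    """
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Made-up reference scores for one kinase (illustrative values only).
reference_scores = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(percentile_of(0.5, reference_scores))  # 62.5
```

Unlike raw scores, percentiles are on a common 0 to 100 scale, which makes values for different kinases easier to compare.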


Ranks can also be calculated from either score or percentile:

In [12]:
pps.rank(metric='score', kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,165,189,112,178,173,105,55,206,249,159
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,157,196,174,113,179,39,21,219,125,129
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,134,75,158,105,121,49,52,288,128,228
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,171,71,54,74,67,258,108,243,213,182
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,223,12,36,99,130,208,113,288,186,240
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,215,85,96,146,137,177,130,294,203,241
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,175,158,181,98,209,226,140,266,157,171
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,151,139,140,148,181,214,132,292,205,196
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,199,60,53,81,64,123,101,128,263,170
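
The ranks in the output above are consistent with ordering each site's kinases from the highest value to the lowest, so that rank 1 is the best-scoring kinase for that site. A minimal sketch of that idea, using the AKT1, SRPK3, BRAF and ERK2 scores shown earlier for the first site (ranking among these four kinases only; `rank_kinases` is a hypothetical helper, and tie handling here is an assumption):

```python
def rank_kinases(values):
    """Map each kinase to its rank for one site; rank 1 is the highest value.

    Hypothetical helper for illustration. Ties keep their input order,
    which may differ from how the library resolves them.
    """
    ordered = sorted(values, key=values.get, reverse=True)
    return {kinase: rank for rank, kinase in enumerate(ordered, start=1)}


# Scores of the first site (MVMPARRtPHVQAVQ) for four kinases.
site_scores = {"AKT1": 0.9335, "SRPK3": 0.6739, "BRAF": -1.0577, "ERK2": 3.8516}
ranks = rank_kinases(site_scores)
print(ranks)  # {'ERK2': 1, 'AKT1': 2, 'SRPK3': 3, 'BRAF': 4}
```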


The promiscuity index (the number of kinases predicted for each site) can be calculated from either score or percentile, given a threshold.

(_Note: recommended thresholds are:<br>
Ser/Thr percentile: 90<br>
Tyrosine percentile: 80<br>
Ser/Thr and Tyrosine score: 1 [percentile-based promiscuity is preferred]_)

In [16]:
pps.promiscuity_index(kin_type='ser_thr', metric='percentile', threshold=90)

Sequence
MVMPARRtPHVQAVQ     61
_EDAEKYsFMATVT_     24
PGLSAFLsQEEINKS     22
AGESAGRsPG_____     25
KLKLQAFsAXXESCH      2
                  ... 
RPLSHAKsEAELQGL     34
RLFDQQLsPGLRPRP     11
PTHALSQsPAEADGS      4
PPKKRRKtVSFSAIE    189
TKDDEGAtPIKRRRV     30
Name: Percentile Promiscuity Index, Length: 4944, dtype: int64
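
Conceptually, the promiscuity index of a site is a count of the kinases whose value passes the threshold. The sketch below applies that idea to the ten kinase percentiles visible in the truncated percentile table above, so the counts are much smaller than the full-library values. The helper name is hypothetical, and whether the library treats the threshold as inclusive is an assumption.

```python
def promiscuity(values, threshold):
    """Count kinases whose value is at or above `threshold` (assumed inclusive).

    Hypothetical helper for illustration; not part of kinase_library.
    """
    return sum(1 for value in values.values() if value >= threshold)


kinases = ["VRK2", "WNK1", "WNK2", "WNK3", "WNK4",
           "YANK2", "YANK3", "YSK1", "YSK4", "ZAK"]
# Percentiles of two sites for the ten kinases shown in the table.
plec_t113 = dict(zip(kinases, [82.72, 53.77, 75.96, 60.89, 68.25,
                               86.29, 93.34, 80.44, 32.16, 80.90]))
set1a_t1167 = dict(zip(kinases, [96.87, 99.96, 99.95, 99.90, 99.98,
                                 99.66, 99.31, 99.56, 58.84, 97.76]))

print(promiscuity(plec_t113, threshold=90))    # 1
print(promiscuity(set1a_t1167, threshold=90))  # 9
```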

The _predict_ function calculates all of these values together. Each kinase will have four columns:
1. _KIN_score_: score
2. _KIN_score_rank_: rank based on score
3. _KIN_percentile_: percentile
4. _KIN_percentile_rank_: rank based on percentile

Two additional columns will be _Score Promiscuity Index_ and _Percentile Promiscuity Index_ (see explanation above).

In [17]:
pps.predict(kin_type='ser_thr')

Scoring 4944 ser_thr substrates
Calculating percentile for 4944 ser_thr substrates


100%|██████████| 309/309 [00:02<00:00, 118.89it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,YSK1_percentile,YSK1_percentile_rank,YSK4_score,YSK4_score_rank,YSK4_percentile,YSK4_percentile_rank,ZAK_score,ZAK_score_rank,ZAK_percentile,ZAK_percentile_rank
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,80.44,130,-3.0038,249,32.16,244,-1.2099,159,80.90,128
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,63.84,150,-1.4313,125,71.21,108,-1.4806,129,76.87,82
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,11.73,250,-2.5663,128,44.08,118,-4.8984,228,6.80,290
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,52.68,134,-3.2997,213,24.36,284,-2.8392,182,47.80,163
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,46.81,216,-2.2650,186,52.24,178,-3.0195,240,43.29,233
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,16.68,266,-3.2095,203,26.65,229,-4.1923,241,16.56,267
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,39.20,174,-2.8506,157,36.18,194,-3.0886,171,41.50,170
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,14.89,238,-4.1877,205,7.34,297,-4.0953,196,18.30,220
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,99.56,73,-2.0021,263,58.84,243,1.0018,170,97.76,128
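
Because the output is wide, it can help to build the four column names for a kinase of interest and select only those. The helper below is a hypothetical convenience, not a library function; it assumes the `KIN_suffix` naming seen in the header above and upper-cases the kinase name, mirroring how lowercase names passed to `score` appear as uppercase columns.

```python
SUFFIXES = ("score", "score_rank", "percentile", "percentile_rank")


def kinase_columns(kinase):
    """Return the four assumed column names for one kinase.

    Hypothetical helper for illustration; not part of kinase_library.
    """
    return [f"{kinase.upper()}_{suffix}" for suffix in SUFFIXES]


print(kinase_columns("zak"))
# ['ZAK_score', 'ZAK_score_rank', 'ZAK_percentile', 'ZAK_percentile_rank']
```

With a result stored as `predictions = pps.predict(kin_type='ser_thr')`, indexing with `predictions[kinase_columns('ZAK')]` would then show only the ZAK columns, assuming the result is a pandas DataFrame as the displayed output suggests.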


For tyrosine kinases, non-canonical kinases can optionally be included (annotated as KIN_TYR, see Yaron-Barir et al. 2024):

In [18]:
pps.percentile(kin_type='tyrosine', non_canonical=True)

Calculating percentile for 331 tyrosine substrates


100%|██████████| 93/93 [00:00<00:00, 2091.53it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,TRKC,TXK,TYK2,TYRO3,VEGFR1,VEGFR2,VEGFR3,WEE1_TYR,YES,ZAP70
7,P42768,WASP,WAS,Wiskott-Aldrich syndrome protein,291,Y,1.000000,AETSKLIyDFIEDQG,y,AETSKLIyDFIEDQG,...,44.74,67.52,7.86,43.64,31.95,21.97,64.74,16.33,58.18,19.66
18,Q96PK6,RBM14,RBM14,RNA-binding protein 14,226,Y,0.787879,RSPPRASyVAPLTAQ,y,RSPPRASyVAPLTAQ,...,56.10,47.41,61.65,54.13,34.74,70.98,26.75,29.29,13.37,86.63
22,Q96A65,EXOC4,EXOC4,Exocyst complex component 4,247,Y,0.999846,KFLDTSHySTAGSSS,y,KFLDTSHySTAGSSS,...,78.66,68.33,46.53,81.18,38.92,40.93,33.82,51.37,54.43,69.53
45,Q9Y2H5,PKHA6,PLEKHA6,Pleckstrin homology domain-containing family A...,890,Y,0.998937,EEPGGHAyETPREEI,y,EEPGGHAyETPREEI,...,50.36,74.04,45.37,73.54,45.04,42.79,20.59,37.76,66.99,94.72
58,Q16625,OCLN,OCLN,Occludin,368,Y,0.999580,DDFRQPRySSGGNFE,y,DDFRQPRySSGGNFE,...,4.00,47.73,23.97,36.09,23.16,29.26,28.65,51.89,36.81,12.47
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5210,Q9UKX2,MYH2,MYH2,Myosin-2,362,Y,1.000000,LTGAVMHyGNLKFKQ,y,LTGAVMHyGNLKFKQ,...,76.01,45.83,88.40,54.56,60.39,82.23,70.61,82.86,60.65,23.68
5230,Q7L9L4,MOB1B,MOB1B,MOB kinase activator 1B,26,Y,0.999974,IPEGSHQyELLKHAE,y,IPEGSHQyELLKHAE,...,83.56,64.61,47.98,92.32,76.60,92.80,91.71,49.42,79.41,81.84
5246,Q8WZ42,TITIN,TTN,Titin,17224,Y,1.000000,RIMAQNKyGIGEPLD,y,RIMAQNKyGIGEPLD,...,15.34,25.21,20.79,49.09,4.22,12.93,15.37,39.40,70.30,49.13
5248,O95295,SNAPN,SNAPIN,SNARE-associated protein Snapin,129,Y,0.809142,AMLDSGIyPPGSPGK,y,AMLDSGIyPPGSPGK,...,40.77,67.13,27.64,45.35,20.31,22.91,22.98,45.81,59.99,61.11
