# The Kinase Library

In [1]:
import kinase_library as kl

import pandas as pd

## PhosphoProteomics object

### Creating PhosphoProteomics object

Read an example file containing a list of phosphosites:

In [2]:
phosphosites_data = pd.read_csv('./test_files/pps_data.tsv', sep='\t')
phosphosites_data.head()

Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA


Creating PhosphoProteomics object using this data.

Please note the following requirements for a sequence to be valid:
* Must be of odd length (in order to have a central amino acid)
* Central amino acid must be a valid phosphoacceptor (S/s, T/t, or Y/y; case-insensitive)
* All peripheral amino acids must be capitalized except for S/T/Y, for which lowercase indicates phosphopriming (only when the _pp_ flag is set to _True_; default is _False_)
* Truncation or padding can be indicated using '_' [_row 4_]
* Missing amino acids can be indicated by 'X' (treated as a random amino acid) [_row 5_]

Each valid sequence will be processed through the following steps:
1. Capitalize peripheral s/t/y if phosphopriming is False (default) [_rows 2 & 3_]
2. Lowercase the central phosphoacceptor [_row 3_]
3. Generate a 15-mer by trimming long sequences [_row 5_] or padding short sequences [_row 2_]
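
The validity rules and processing steps above can be sketched in plain Python. This is only an illustration of the described behavior, not the library's implementation: `process_window`, `VALID_AA`, and the error messages are hypothetical names, and the set of accepted letters (the 20 standard amino acids plus 'X' and '_') is an assumption.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet: 20 standard amino acids


def process_window(seq, pp=False, width=15):
    """Validate a sequence window and return the processed 15-mer.

    Hypothetical helper that mirrors the rules listed above;
    it is not part of kinase_library.
    """
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) % 2 == 0:
        raise ValueError("even length (no central position)")

    mid = len(seq) // 2
    left, center, right = seq[:mid], seq[mid], seq[mid + 1:]

    if center.upper() not in "STY":
        raise ValueError("central residue is not a phosphoacceptor")
    # Peripheral positions: uppercase amino acids, 'X', '_', or lowercase s/t/y.
    allowed = VALID_AA | set("X_sty")
    if any(ch not in allowed for ch in left + right):
        raise ValueError("invalid amino acids or characters")

    # Step 1: without phosphopriming, peripheral s/t/y are capitalized.
    if not pp:
        left, right = left.upper(), right.upper()
    # Step 2: the central phosphoacceptor is lowercased.
    center = center.lower()
    # Step 3: trim long flanks and pad short flanks with '_'.
    half = width // 2
    left = left[-half:].rjust(half, "_")
    right = right[:half].ljust(half, "_")
    return left + center + right


print(process_window("EDAEKysFMATVT"))                # padded 13-mer
print(process_window("PGLsAFLSQEEINKS", pp=True))     # phosphopriming kept
print(process_window("PDRRLGKLKLQAFsAXXESCHCGGPSA"))  # trimmed 27-mer
```

Applied to the example rows, this sketch gives the same strings as the _Sequence_ column shown in the outputs of this section.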

New columns for processed sequences (_Sequence_) and phospho-residue (_phos_res_) will be added.

In [3]:
pps = kl.PhosphoProteomics(phosphosites_data, seq_col='sequence window')
pps.data.head()

4 entries were omitted due to empty value in the substrates column.
5 entries were omitted due to invalid amino acids or characters.
2 entries were omitted due to even length (no central position).
Use the 'omited_entries' attribute to view dropped enteries due to invalid sequences.


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH


If phosphopriming is set to True, peripheral s/t/y are kept lowercase and treated as phosphorylated [_rows 2 & 3_]:

In [4]:
pps_pp = kl.PhosphoProteomics(phosphosites_data, seq_col='sequence window', pp=True)
pps_pp.data.head()

4 entries were omitted due to empty value in the substrates column.
5 entries were omitted due to invalid amino acids or characters.
2 entries were omitted due to even length (no central position).
Use the 'omited_entries' attribute to view dropped enteries due to invalid sequences.


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.0,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKysFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLsAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.0,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH


Omitted entries (invalid or missing sequences) can be accessed using the _omited_entries_ attribute:

In [5]:
pps.omited_entries

Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window
26,Q9H2F5,EPC1,EPC1,Enhancer of polycomb homolog 1,492,S,0.982631,MLSSPQHkPVNQFAN
67,Q70EL1,UBP54,USP54,Inactive ubiquitin carboxyl-terminal hydrolase 54,669,S,0.812547,YESSORNsSSPVSLD
338,Q13009,TIAM1,TIAM1,T-lymphoma invasion and metastasis-inducing pr...,224,S,0.994002,
627,Q08380,LG3BP,LGALS3BP,Galectin-3-binding protein,444,S,0.999801,GPLVKYSsDYfQAPS
976,O95639,CPSF4,CPSF4,Cleavage and polyadenylation specificity facto...,211,S,0.998086,iqltsqnsspnqqrt
1101,Q99490,AGAP2,AGAP2,"Arf-GAP with GTPase, ANK repeat and PH domain-...",818,S,0.970636,CTPSGDLsPLSREP
1170,Q99952,PTN18,PTPN18,Tyrosine-protein phosphatase non-receptor type 18,341,S,0.985081,GVLRSISVPGSP
1256,Q9Y483,MTF2,MTF2,Metal-response element-binding transcription f...,401,S,1.0,
1397,Q08499,PDE4D,PDE4D,"cAMP-specific 3',5'-cyclic phosphodiesterase 4D",190,S,1.0,
1412,Q96A08,H2B1A,HIST1H2BA,Histone H2B type 1-A,117,T,0.999963,KHAVSEGtKJVTKYT


Once the PhosphoProteomics object has been created, Ser/Thr and Tyr substrates can be accessed separately:

In [6]:
print('Ser/Thr data:')
pps.ser_thr_data

Ser/Thr data:


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH
...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE


In [7]:
print('Tyrosine data:')
pps.tyrosine_data

Tyrosine data:


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence
7,P42768,WASP,WAS,Wiskott-Aldrich syndrome protein,291,Y,1.000000,AETSKLIyDFIEDQG,y,AETSKLIyDFIEDQG
18,Q96PK6,RBM14,RBM14,RNA-binding protein 14,226,Y,0.787879,RSPPRASyVAPLTAQ,y,RSPPRASyVAPLTAQ
22,Q96A65,EXOC4,EXOC4,Exocyst complex component 4,247,Y,0.999846,KFLDTSHySTAGSSS,y,KFLDTSHySTAGSSS
45,Q9Y2H5,PKHA6,PLEKHA6,Pleckstrin homology domain-containing family A...,890,Y,0.998937,EEPGGHAyETPREEI,y,EEPGGHAyETPREEI
58,Q16625,OCLN,OCLN,Occludin,368,Y,0.999580,DDFRQPRySSGGNFE,y,DDFRQPRySSGGNFE
...,...,...,...,...,...,...,...,...,...,...
5210,Q9UKX2,MYH2,MYH2,Myosin-2,362,Y,1.000000,LTGAVMHyGNLKFKQ,y,LTGAVMHyGNLKFKQ
5230,Q7L9L4,MOB1B,MOB1B,MOB kinase activator 1B,26,Y,0.999974,IPEGSHQyELLKHAE,y,IPEGSHQyELLKHAE
5246,Q8WZ42,TITIN,TTN,Titin,17224,Y,1.000000,RIMAQNKyGIGEPLD,y,RIMAQNKyGIGEPLD
5248,O95295,SNAPN,SNAPIN,SNARE-associated protein Snapin,129,Y,0.809142,AMLDSGIyPPGSPGK,y,AMLDSGIyPPGSPGK
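
The split between Ser/Thr and Tyr substrates follows from the central residue of each processed sequence (the _phos_res_ column). A minimal sketch of that idea in plain Python, using sequences from the tables above; `split_by_phosphoacceptor` is a hypothetical helper, not a library function:

```python
def split_by_phosphoacceptor(sequences):
    """Group processed windows by their central residue.

    Hypothetical helper; assumes every window has odd length and a
    central s, t, or y, as guaranteed by the processing steps.
    """
    ser_thr, tyrosine = [], []
    for seq in sequences:
        center = seq[len(seq) // 2].lower()
        if center in ("s", "t"):
            ser_thr.append(seq)
        elif center == "y":
            tyrosine.append(seq)
        else:
            raise ValueError(f"unexpected central residue in {seq!r}")
    return ser_thr, tyrosine


windows = [
    "MVMPARRtPHVQAVQ",
    "AETSKLIyDFIEDQG",
    "PGLSAFLsQEEINKS",
    "RSPPRASyVAPLTAQ",
]
ser_thr, tyrosine = split_by_phosphoacceptor(windows)
print(len(ser_thr), len(tyrosine))
```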


### Scoring functions

The list of sites can be scored by kinases from the library

In [8]:
pps.score(kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,-1.283,-1.602,-0.285,-1.456,-1.370,-0.138,1.381,-2.028,-3.004,-1.210
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,-1.900,-2.623,-2.153,-1.279,-2.256,0.099,0.802,-3.161,-1.431,-1.481
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,-2.634,-1.108,-3.235,-1.851,-2.449,-0.317,-0.416,-6.756,-2.567,-4.899
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,-2.673,-0.508,0.652,-0.763,-0.376,-4.158,-1.657,-3.828,-3.300,-2.839
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,-2.835,0.134,-0.660,-1.366,-1.704,-2.607,-1.533,-4.167,-2.265,-3.020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,-3.345,-0.623,-0.917,-2.034,-1.826,-2.804,-1.709,-6.235,-3.210,-4.192
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,-3.135,-2.853,-3.262,-1.639,-3.618,-3.892,-2.444,-4.640,-2.851,-3.089
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,-3.163,-2.808,-2.823,-3.116,-3.881,-4.506,-2.630,-6.412,-4.188,-4.095
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,-0.065,4.985,5.388,3.908,4.915,2.515,3.209,2.128,-2.002,1.002


The list can also be scored by only specific kinases

In [9]:
pps.score(kinases=['akt1', 'srpk3', 'braf', 'erk2'])

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,AKT1,SRPK3,BRAF,ERK2
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,0.933,0.674,-1.058,3.852
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,-1.646,-4.337,-1.255,-1.641
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,-5.165,-3.529,-3.355,-4.274
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,-3.010,1.313,-3.852,3.240
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,-2.850,-1.379,-3.101,-0.928
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,-3.174,-3.210,-2.818,-4.114
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,-3.437,-0.061,-3.715,4.872
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,-5.541,-1.249,-2.534,1.748
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,7.948,2.678,0.566,-0.874


If the PhosphoProteomics object was defined to contain phosphopriming, it will be scored as such:

In [10]:
pps_pp.score(kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,-1.283,-1.602,-0.285,-1.456,-1.370,-0.138,1.381,-2.028,-3.004,-1.210
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKysFMATVT_,...,-1.799,-2.521,-1.890,-1.594,-2.793,0.664,1.441,-3.745,-1.473,-1.721
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLsAFLsQEEINKS,...,-2.347,-1.216,-3.092,-1.755,-2.737,1.244,0.475,-6.771,-2.465,-4.733
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,-2.673,-0.508,0.652,-0.763,-0.376,-4.158,-1.657,-3.828,-3.300,-2.839
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,-2.835,0.134,-0.660,-1.366,-1.704,-2.607,-1.533,-4.167,-2.265,-3.020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,-3.345,-0.623,-0.917,-2.034,-1.826,-2.804,-1.709,-6.235,-3.210,-4.192
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,-3.135,-2.853,-3.262,-1.639,-3.618,-3.892,-2.444,-4.640,-2.851,-3.089
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,-3.163,-2.808,-2.823,-3.116,-3.881,-4.506,-2.630,-6.412,-4.188,-4.095
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,-0.065,4.985,5.388,3.908,4.915,2.515,3.209,2.128,-2.002,1.002


To calculate percentile-score:

In [11]:
pps.percentile('ser_thr')

Scoring 4944 ser_thr substrates
Calculating percentile for 4944 ser_thr substrates


100%|██████████| 309/309 [00:02<00:00, 107.39it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,82.73,53.78,75.96,60.89,68.25,86.30,93.35,80.44,32.17,80.90
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,67.85,35.74,39.60,64.10,53.22,89.11,88.54,63.85,71.22,76.87
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,39.75,62.52,19.31,53.35,49.79,83.90,69.07,11.73,44.07,6.80
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,38.04,72.36,87.64,73.08,82.37,12.23,35.63,52.69,24.37,47.81
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,31.12,81.20,69.74,62.55,62.74,38.67,39.17,46.82,52.25,43.29
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,12.67,70.63,65.12,49.76,60.65,34.55,34.14,16.68,26.65,16.57
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,19.38,31.93,18.83,57.49,29.80,15.47,15.92,39.20,36.19,41.50
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,18.39,32.68,26.47,29.58,25.82,8.71,12.37,14.90,7.35,18.32
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,96.87,99.96,99.95,99.90,99.98,99.66,99.31,99.56,58.85,97.76
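
To make the percentile idea concrete, here is a small sketch. It assumes that a percentile is the share of a reference score distribution that falls at or below the site's score. This section does not describe the reference distribution the library uses, so the `background` values below are made up purely for illustration, and `percentile_of` is a hypothetical helper.

```python
from bisect import bisect_right


def percentile_of(score, sorted_background):
    """Percent of background scores that are <= score.

    Hypothetical helper; `sorted_background` must be sorted ascending.
    """
    if not sorted_background:
        raise ValueError("background must not be empty")
    count = bisect_right(sorted_background, score)
    return 100.0 * count / len(sorted_background)


# Toy background for one kinase (illustrative values, not library data).
background = sorted([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
print(percentile_of(0.5, background))
```

Because the background is sorted once, each lookup is a binary search, which keeps the calculation cheap when many sites are compared against the same distribution.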


Rank can also be calculated either based on score or percentile:

In [12]:
pps.rank(metric='score', kin_type='ser_thr')

Scoring 4944 ser_thr substrates


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,VRK2,WNK1,WNK2,WNK3,WNK4,YANK2,YANK3,YSK1,YSK4,ZAK
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,165,189,112,178,173,105,55,206,249,159
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,157,196,174,113,179,39,21,219,125,129
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,134,75,158,105,121,49,52,288,128,228
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,171,71,54,74,67,258,108,243,213,182
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,223,12,36,99,130,208,113,288,186,240
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,215,85,96,146,137,177,130,294,203,241
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,175,158,181,98,209,226,140,266,157,171
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,151,139,140,148,181,214,131,292,205,196
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,199,60,53,81,64,123,101,128,263,170
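
In the output above, a higher score corresponds to a smaller rank number (for the last row, WNK2 at 5.388 ranks ahead of WNK1 at 4.985). A per-site ranking of that kind can be sketched as follows. `rank_kinases` is a hypothetical helper, the tie handling (ties keep input order) is an assumption, and the example ranks only the four kinases scored earlier for the first site rather than the full library.

```python
def rank_kinases(scores):
    """Return {kinase: rank}, where rank 1 is the highest score.

    Hypothetical helper; ties keep their input order (an assumption).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {kinase: position for position, kinase in enumerate(ordered, start=1)}


# Scores of the first site (PLEC T113) for the four selected kinases.
plec_scores = {"AKT1": 0.933, "SRPK3": 0.674, "BRAF": -1.058, "ERK2": 3.852}
plec_ranks = rank_kinases(plec_scores)
print(plec_ranks)
```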


Promiscuity index (how many kinases are predicted for each site) can be calculated based on score or percentile, using a specified threshold.

(_Note: recommended thresholds are:<br>
Ser/Thr percentile: 90<br>
Tyrosine percentile: 80<br>
Ser/Thr and Tyrosine score: 1 [although percentile-based promiscuity is preferred]_)

In [13]:
pps.promiscuity_index(kin_type='ser_thr', metric='percentile', threshold=90)

Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,Percentile Promiscuity Index
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,61
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,24
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,22
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,25
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,2
...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,34
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,11
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,4
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,189
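
The promiscuity index described above amounts to counting, for each site, the kinases whose value passes the threshold. A minimal sketch, assuming the comparison is "greater than or equal to" (this section does not state whether the boundary is inclusive); `promiscuity_index` here is a hypothetical standalone function, not the library method. The example uses only the ten percentile columns visible in the earlier output, so the counts are much smaller than the values computed over all kinases.

```python
def promiscuity_index(values, threshold=90):
    """Count kinases whose value is at or above the threshold.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    return sum(1 for value in values.values() if value >= threshold)


# Visible percentile columns for the first and last Ser/Thr sites.
plec_t113 = {"VRK2": 82.73, "WNK1": 53.78, "WNK2": 75.96, "WNK3": 60.89,
             "WNK4": 68.25, "YANK2": 86.30, "YANK3": 93.35, "YSK1": 80.44,
             "YSK4": 32.17, "ZAK": 80.90}
set1a_t1167 = {"VRK2": 96.87, "WNK1": 99.96, "WNK2": 99.95, "WNK3": 99.90,
               "WNK4": 99.98, "YANK2": 99.66, "YANK3": 99.31, "YSK1": 99.56,
               "YSK4": 58.85, "ZAK": 97.76}

print(promiscuity_index(plec_t113), promiscuity_index(set1a_t1167))
```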


The function _predict_ can be used to calculate all data together. Each kinase will have four columns:
1. _KIN_score_: score
2. _KIN_score_rank_: rank based on score
3. _KIN_percentile_: percentile
4. _KIN_percentile_rank_: rank based on percentile

Two additional columns will be _Score Promiscuity Index_ and _Percentile Promiscuity Index_ (see explanation above)

In [14]:
pps.predict(kin_type='ser_thr')

Scoring 4944 ser_thr substrates
Calculating percentile for 4944 ser_thr substrates


100%|██████████| 309/309 [00:02<00:00, 104.13it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,YSK1_percentile,YSK1_percentile_rank,YSK4_score,YSK4_score_rank,YSK4_percentile,YSK4_percentile_rank,ZAK_score,ZAK_score_rank,ZAK_percentile,ZAK_percentile_rank
0,Q15149,PLEC,PLEC,Plectin,113,T,1.000000,MVMPARRtPHVQAVQ,t,MVMPARRtPHVQAVQ,...,80.44,130,-3.004,249,32.17,244,-1.210,159,80.90,128
1,O43865,SAHH2,AHCYL1,S-adenosylhomocysteine hydrolase-like protein 1,29,S,0.911752,EDAEKysFMATVT,s,_EDAEKYsFMATVT_,...,63.85,150,-1.431,125,71.22,108,-1.481,129,76.87,82
2,Q8WX93,PALLD,PALLD,Palladin,35,S,0.999997,PGLsAFLSQEEINKS,s,PGLSAFLsQEEINKS,...,11.73,250,-2.567,128,44.07,119,-4.899,228,6.80,291
3,Q96NY7,CLIC6,CLIC6,Chloride intracellular channel protein 6,322,S,1.000000,AGESAGRsPG_____,s,AGESAGRsPG_____,...,52.69,134,-3.300,213,24.37,284,-2.839,182,47.81,163
4,Q02790,FKBP4,FKBP4,Peptidyl-prolyl cis-trans isomerase FKBP4,336,S,0.999938,PDRRLGKLKLQAFsAXXESCHCGGPSA,s,KLKLQAFsAXXESCH,...,46.82,216,-2.265,186,52.25,178,-3.020,240,43.29,233
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5281,Q6ZV89,SH2D5,SH2D5,SH2 domain-containing protein 5,415,S,0.999887,RPLSHAKsEAELQGL,s,RPLSHAKsEAELQGL,...,16.68,266,-3.210,203,26.65,229,-4.192,241,16.57,267
5282,Q92835,SHIP1,INPP5D,"Phosphatidylinositol 3,4,5-trisphosphate 5-pho...",243,S,1.000000,RLFDQQLsPGLRPRP,s,RLFDQQLsPGLRPRP,...,39.20,175,-2.851,157,36.19,194,-3.089,171,41.50,170
5283,Q5SXM2,SNPC4,SNAPC4,snRNA-activating protein complex subunit 4,1163,S,0.971390,PTHALSQsPAEADGS,s,PTHALSQsPAEADGS,...,14.90,238,-4.188,205,7.35,297,-4.095,196,18.32,220
5284,O15047,SET1A,SETD1A,Histone-lysine N-methyltransferase SETD1A,1167,T,0.927260,PPKKRRKtVSFSAIE,t,PPKKRRKtVSFSAIE,...,99.56,73,-2.002,263,58.85,243,1.002,170,97.76,128
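
Because every kinase contributes four columns with a fixed naming pattern, the columns for a single kinase can be built programmatically. The helper below is a hypothetical convenience, not part of the library; uppercasing the kinase name is an assumption based on the column headers shown in the outputs.

```python
SUFFIXES = ("score", "score_rank", "percentile", "percentile_rank")


def kinase_columns(kinase):
    """Build the four column names described above for one kinase.

    Hypothetical helper; assumes column names use the uppercase kinase name.
    """
    return [f"{kinase.upper()}_{suffix}" for suffix in SUFFIXES]


print(kinase_columns("zak"))
```

Assuming the result of `predict` is a pandas DataFrame with the columns shown above, a list like this could be used to select one kinase's results, for example `result[kinase_columns('ZAK')]`, where `result` holds the returned table.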


For tyrosine kinases, an option to include non-canonical kinases is available (annotated as _KIN_TYR_, for example WEE1_TYR; see Yaron-Barir et al. 2024):

In [15]:
pps.percentile(kin_type='tyrosine', non_canonical=True)

Scoring 331 tyrosine substrates
Calculating percentile for 331 tyrosine substrates


100%|██████████| 93/93 [00:00<00:00, 1908.75it/s]


Unnamed: 0,uniprot,protein,gene,description,position,residue,best_localization_prob,sequence window,phos_res,Sequence,...,TRKC,TXK,TYK2,TYRO3,VEGFR1,VEGFR2,VEGFR3,WEE1_TYR,YES,ZAP70
7,P42768,WASP,WAS,Wiskott-Aldrich syndrome protein,291,Y,1.000000,AETSKLIyDFIEDQG,y,AETSKLIyDFIEDQG,...,44.78,67.54,7.88,43.64,31.96,21.97,64.76,16.33,58.18,19.68
18,Q96PK6,RBM14,RBM14,RNA-binding protein 14,226,Y,0.787879,RSPPRASyVAPLTAQ,y,RSPPRASyVAPLTAQ,...,56.10,47.41,61.66,54.13,34.74,70.98,26.75,29.31,13.37,86.63
22,Q96A65,EXOC4,EXOC4,Exocyst complex component 4,247,Y,0.999846,KFLDTSHySTAGSSS,y,KFLDTSHySTAGSSS,...,78.66,68.33,46.55,81.18,38.92,40.93,33.84,51.39,54.43,69.53
45,Q9Y2H5,PKHA6,PLEKHA6,Pleckstrin homology domain-containing family A...,890,Y,0.998937,EEPGGHAyETPREEI,y,EEPGGHAyETPREEI,...,50.36,74.04,45.39,73.56,45.04,42.83,20.59,37.76,66.99,94.72
58,Q16625,OCLN,OCLN,Occludin,368,Y,0.999580,DDFRQPRySSGGNFE,y,DDFRQPRySSGGNFE,...,4.00,47.74,24.01,36.11,23.16,29.29,28.65,51.92,36.81,12.47
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5210,Q9UKX2,MYH2,MYH2,Myosin-2,362,Y,1.000000,LTGAVMHyGNLKFKQ,y,LTGAVMHyGNLKFKQ,...,76.01,45.83,88.42,54.59,60.39,82.23,70.61,82.86,60.65,23.68
5230,Q7L9L4,MOB1B,MOB1B,MOB kinase activator 1B,26,Y,0.999974,IPEGSHQyELLKHAE,y,IPEGSHQyELLKHAE,...,83.56,64.61,47.98,92.34,76.60,92.80,91.73,49.42,79.41,81.84
5246,Q8WZ42,TITIN,TTN,Titin,17224,Y,1.000000,RIMAQNKyGIGEPLD,y,RIMAQNKyGIGEPLD,...,15.37,25.21,20.79,49.09,4.22,12.93,15.39,39.40,70.30,49.16
5248,O95295,SNAPN,SNAPIN,SNARE-associated protein Snapin,129,Y,0.809142,AMLDSGIyPPGSPGK,y,AMLDSGIyPPGSPGK,...,40.77,67.13,27.67,45.37,20.33,22.91,22.98,45.81,60.01,61.11
