# The Kinase Library

In [1]:
import kinase_library as kl

## Substrate object

> _An example of how to utilize individual scores to decipher a phosphorylation cascade can be found in ["Host protein kinases required for SARS-CoV-2 nucleocapsid phosphorylation and viral replication" (Yaron et al., Science Signaling, 2023)](https://www.science.org/doi/full/10.1126/scisignal.abm0808)_

For example:

In [2]:
s = kl.Substrate('PSVEPPLsQETFSDL') #p53 S33

Get the scores, percentiles, and ranks across all kinases (use the `kinases` parameter to restrict the output to specific kinases):

In [3]:
s.score()

ATM       5.0385
SMG1      4.2377
DNAPK     3.8172
ATR       3.5045
FAM20C    3.1716
           ...  
P70S6K   -7.2917
AKT3     -7.3741
PKCI     -7.7337
PBK      -7.9945
NEK3     -8.2455
Length: 309, dtype: float64

In [4]:
s.percentile()

ATM        99.83
SMG1       99.77
ATR        99.69
DNAPK      99.21
FAM20C     95.23
           ...  
BRAF        7.86
AKT2        6.79
P70S6KB     6.64
NEK3        4.85
P70S6K      4.19
Length: 309, dtype: float64

In [5]:
s.rank(method='percentile')

ATM          1
SMG1         2
ATR          3
DNAPK        4
FAM20C       5
          ... 
BRAF       305
AKT2       306
P70S6KB    307
NEK3       308
P70S6K     309
Length: 309, dtype: int64
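
Ranking by score and ranking by percentile need not agree: in the outputs above, ATR has a lower score than DNAPK (3.5045 vs. 3.8172) but a higher percentile (99.69 vs. 99.21). The following is a minimal plain-Python sketch of how descending ranks can be derived from either measure, using the top five values copied from those outputs. The helper `descending_ranks` is hypothetical and only illustrates the idea; it is not necessarily how `kinase_library` computes ranks, and it does not handle ties.

```python
def descending_ranks(values):
    """Return {name: rank}, where rank 1 is assigned to the highest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Top five values copied from the s.score() and s.percentile() outputs above.
scores = {'ATM': 5.0385, 'SMG1': 4.2377, 'DNAPK': 3.8172,
          'ATR': 3.5045, 'FAM20C': 3.1716}
percentiles = {'ATM': 99.83, 'SMG1': 99.77, 'ATR': 99.69,
               'DNAPK': 99.21, 'FAM20C': 95.23}

score_rank = descending_ranks(scores)
percentile_rank = descending_ranks(percentiles)

# Print both ranks side by side to show where they disagree.
for name in scores:
    print(f"{name:8s} score rank {score_rank[name]}  "
          f"percentile rank {percentile_rank[name]}")
```

ATR and DNAPK swap places between the two rankings, which is why `rank` takes a `method` argument.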

The function _predict_ calculates all of these values together. Each kinase has four columns:
1. _Score_: score
2. _Score Rank_: rank based on score
3. _Percentile_: percentile
4. _Percentile Rank_: rank based on percentile

In [6]:
s.predict()

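| Kinase | Score | Score Rank | Percentile | Percentile Rank |
|---|---|---|---|---|
| ATM | 5.0385 | 1 | 99.83 | 1 |
| SMG1 | 4.2377 | 2 | 99.77 | 2 |
| ATR | 3.5045 | 4 | 99.69 | 3 |
| DNAPK | 3.8172 | 3 | 99.21 | 4 |
| FAM20C | 3.1716 | 5 | 95.23 | 5 |
| ... | ... | ... | ... | ... |
| BRAF | -4.4003 | 241 | 7.86 | 305 |
| AKT2 | -5.6530 | 283 | 6.79 | 306 |
| P70S6KB | -3.9915 | 221 | 6.64 | 307 |
| NEK3 | -8.2455 | 309 | 4.85 | 308 |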


In [7]:
s.predict(kinases=['ATM','atr','SmG1','DNApk','CHK1','rsk4','mTOR','PDK1','BRAF','p70S6K'])

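| Kinase | Score | Score Rank | Percentile | Percentile Rank |
|---|---|---|---|---|
| ATM | 5.0385 | 1 | 99.83 | 1 |
| SMG1 | 4.2377 | 2 | 99.77 | 2 |
| ATR | 3.5045 | 4 | 99.69 | 3 |
| DNAPK | 3.8172 | 3 | 99.21 | 4 |
| CHK1 | -0.4753 | 54 | 73.41 | 44 |
| RSK4 | -2.1822 | 125 | 57.23 | 109 |
| MTOR | -1.0547 | 85 | 41.04 | 171 |
| PDK1 | -3.4896 | 204 | 23.56 | 248 |
| BRAF | -4.4003 | 241 | 7.86 | 305 |
| P70S6K | -7.2917 | 305 | 4.19 | 309 |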
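
The call above passes kinase names in mixed case (`'atr'`, `'SmG1'`, `'p70S6K'`), and the output lists them in upper case. Below is a minimal sketch of how such case-insensitive matching could be done in plain Python. The helper `normalize_kinase_names` and the short `known` list are hypothetical and serve only to illustrate the idea; they are not taken from the package.

```python
def normalize_kinase_names(requested, known):
    """Map user-supplied names onto canonical names, ignoring case."""
    canonical = {name.upper(): name for name in known}
    normalized, unknown = [], []
    for name in requested:
        key = name.strip().upper()
        if key in canonical:
            normalized.append(canonical[key])
        else:
            unknown.append(name)
    if unknown:
        # Fail loudly rather than silently dropping a misspelled kinase.
        raise ValueError(f"Unrecognized kinase name(s): {unknown}")
    return normalized

# Hypothetical list of canonical names, taken from the output above.
known = ['ATM', 'ATR', 'SMG1', 'DNAPK', 'CHK1',
         'RSK4', 'MTOR', 'PDK1', 'BRAF', 'P70S6K']

names = normalize_kinase_names(['ATM', 'atr', 'SmG1', 'DNApk', 'p70S6K'], known)
print(names)  # ['ATM', 'ATR', 'SMG1', 'DNAPK', 'P70S6K']
```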


To incorporate phosphopriming (phosphorylated residues around the phosphoacceptor), write the phosphorylated residues in lowercase and pass `pp=True`:

In [8]:
s_pp = kl.Substrate('QQQSYLDsGIHsGAT', pp=True) #beta-catenin (CTNNB1) S33

In [9]:
s_pp.predict()

| Kinase | Score | Score Rank | Percentile | Percentile Rank |
|---|---|---|---|---|
| GSK3B | 5.3277 | 1 | 100.00 | 1 |
| GSK3A | 3.9962 | 2 | 98.09 | 2 |
| CK1G2 | 1.7640 | 4 | 95.75 | 3 |
| CAMKK2 | -0.3313 | 37 | 95.56 | 4 |
| MASTL | 0.3855 | 19 | 94.63 | 5 |
| ... | ... | ... | ... | ... |
| HASPIN | -8.1115 | 309 | 6.11 | 305 |
| MAP3K15 | -6.4820 | 301 | 5.59 | 306 |
| SRPK2 | -6.4160 | 299 | 3.16 | 307 |
| SRPK3 | -5.1276 | 272 | 2.78 | 308 |
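
To make the lowercase convention concrete, the sketch below locates the phosphoprimed residues in a substrate string relative to the phosphoacceptor. It assumes, based on the two example sequences in this notebook, that the phosphoacceptor is the central residue of an odd-length sequence and that any other lowercase `s`, `t`, or `y` marks a phosphorylated (primed) residue. The helper `phospho_positions` is hypothetical and independent of `kinase_library`.

```python
def phospho_positions(sequence):
    """Return (phosphoacceptor, {relative_position: residue}) for a substrate.

    Assumption: the phosphoacceptor sits at the center of an odd-length
    sequence (position 0); other lowercase s/t/y residues are phosphoprimed.
    """
    if len(sequence) % 2 == 0:
        raise ValueError("Expected an odd-length sequence with a central phosphoacceptor")
    center = len(sequence) // 2
    acceptor = sequence[center]
    if acceptor.lower() not in ('s', 't', 'y'):
        raise ValueError(f"Central residue {acceptor!r} is not S, T, or Y")
    primed = {
        index - center: residue
        for index, residue in enumerate(sequence)
        if index != center and residue in ('s', 't', 'y')
    }
    return acceptor, primed

# beta-catenin (CTNNB1) S33: one primed serine downstream of the phosphoacceptor.
acceptor, primed = phospho_positions('QQQSYLDsGIHsGAT')
print(acceptor, primed)  # s {4: 's'}

# p53 S33: no phosphoprimed residues.
print(phospho_positions('PSVEPPLsQETFSDL'))  # ('s', {})
```

In the beta-catenin example the primed serine sits at position +4 relative to the phosphoacceptor.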
