# Kinase fold structural alignment

This notebook showcases alignment of kinases by the structure of their ATP-binding folds.

Select any two human kinases, and the notebook displays their ATP-binding domains with the structures aligned and superimposed. The alignment uses precomputed output from the TM-align algorithm.

For this visualization, we use high-confidence predicted structures from AlphaFold2 (https://alphafold.ebi.ac.uk/).

## Instructions

**Choose any two human kinases below by entering their UniProt entry names.**

You may also choose the colors used to display the protein structures.

In [32]:
### choose proteins to align
### Note: for kinases with multiple ATP-binding domains, append "_1", "_2", etc. to align the specified domain
prot1 = "MTOR_HUMAN"
prot2 = "ATR_HUMAN"
### choose colors for proteins
col1 = "cyan"
col2 = "yellow"

### Other examples

In [33]:
#prot1 = "PK3CA_HUMAN"
#prot2 = "PK3CB_HUMAN"
#prot1 = "STK11_HUMAN"
#prot2 = "GRK7_HUMAN"
#prot1 = "AKT1_HUMAN"
#prot2 = "AKT2_HUMAN"
#prot1 = "KS6A6_HUMAN_1" ### only works with full_proteins == False
#prot2 = "KS6A6_HUMAN_2" ### only works with full_proteins == False

## Main code

### Install necessary Python packages

In [34]:
!pip install biopython
!pip install pyprojroot
!pip install py3Dmol



### Load packages

In [35]:
from Bio.PDB import PDBParser, PDBIO  # only these two classes are used below
from pyprojroot import here
import os
#import nglview as nv
import py3Dmol
import numpy as np
import pickle
import pandas as pd
import seaborn as sns

### Download data

In [36]:
pf = "AF2_pocketome_tm_score.pkl"
check = not os.path.exists(pf)
call = "wget " + "https://github.com/NicholasClark/collab_kinase_proj/raw/master/" + pf
print(call)
if check:
  os.system(call)

wget https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2_pocketome_tm_score.pkl
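The cell above shells out to `wget`, which may not be installed on every system. A minimal standard-library sketch of the same skip-if-present download is shown here; `download_if_missing` is a hypothetical helper name and is not part of the notebook.

```python
import os
import urllib.request


def download_if_missing(url, dest=None):
    """Download url to dest unless dest already exists; return the local path.

    Hypothetical helper: dest defaults to the last path component of the URL.
    """
    if dest is None:
        dest = url.rsplit("/", 1)[-1]
    if not os.path.exists(dest):
        # Fetch the resource and write it to dest
        urllib.request.urlretrieve(url, dest)
    return dest


# Possible use with the variables defined in the cell above:
# download_if_missing("https://github.com/NicholasClark/collab_kinase_proj/raw/master/" + pf)
```

Unlike a repeated `wget` call, this does not create numbered copies (`file.1`, `file.2`) when a cell is re-run.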


### Load necessary data files -- a pkl file of TM-align output and a CSV with kinase metadata

In [37]:
### load pkl file
pkl_file = "AF2_pocketome_tm_score.pkl"
with open(pkl_file, "rb") as f:
    tmp = pickle.load(f)

### load csv file with metadata
csv_file = "https://github.com/NicholasClark/collab_kinase_proj/raw/master/kinome_plus_pocket_meta_2022_04_06.csv"
df = pd.read_csv(csv_file)

### Kinase metadata

In [38]:
### the column "uniprot_name_mod" contains uniprot entry names for all human kinases
### Note: kinases with multiple ATP-binding domains are listed multiple times here, with "_1", "_2", etc. appended.
df[['uniprot_name_mod', 'uniprot_accession', 'gene_names', 'protein_names', 'hgnc_symbol','is_idg_dark_kinase']]

| | uniprot_name_mod | uniprot_accession | gene_names | protein_names | hgnc_symbol | is_idg_dark_kinase |
|---|---|---|---|---|---|---|
| 0 | AAK1_HUMAN | Q2M2I8 | AAK1 KIAA1048 | AP2-associated protein kinase 1 (EC 2.7.11.1) ... | AAK1 | 0 |
| 1 | AAPK1_HUMAN | Q13131 | PRKAA1 AMPK1 | 5'-AMP-activated protein kinase catalytic subu... | PRKAA1 | 0 |
| 2 | AAPK2_HUMAN | P54646 | PRKAA2 AMPK AMPK2 | 5'-AMP-activated protein kinase catalytic subu... | PRKAA2 | 0 |
| 3 | ABL1_HUMAN | P00519 | ABL1 ABL JTK7 | Tyrosine-protein kinase ABL1 (EC 2.7.10.2) (Ab... | ABL1 | 0 |
| 4 | ABL2_HUMAN | P42684 | ABL2 ABLL ARG | Tyrosine-protein kinase ABL2 (EC 2.7.10.2) (Ab... | ABL2 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 705 | WNK4_HUMAN | Q96J92 | WNK4 PRKWNK4 | Serine/threonine-protein kinase WNK4 (EC 2.7.1... | WNK4 | 0 |
| 706 | XYLB_HUMAN | O75191 | XYLB | Xylulose kinase (Xylulokinase) (EC 2.7.1.17) | XYLB | 0 |
| 707 | XYLK_HUMAN | O75063 | FAM20B KIAA0475 | Glycosaminoglycan xylosylkinase (EC 2.7.1.-) (... | FAM20B | 0 |
| 708 | YES_HUMAN | P07947 | YES1 YES | Tyrosine-protein kinase Yes (EC 2.7.10.2) (Pro... | YES1 | 0 |


### Helper functions

In [39]:
## function to take a UniProt entry name and translate it to the accession number
## @input uni_name - a UniProt entry name (e.g. "STK11_HUMAN") -- kinases with multiple ATP-binding domains will have "_1" or "_2" appended
## @return the corresponding UniProt accession number -- again some will be appended with "_1" or "_2"
def uni_name_to_acc(uni_name):
    # select the accession(s) whose entry name matches exactly
    matches = df.loc[df.uniprot_name_mod == uni_name, "uniprot_accession_mod"]
    if matches.empty:
        raise ValueError("No kinase found with UniProt entry name: " + uni_name)
    return matches.iloc[0]

## function to take two proteins and get the TM-align rotation matrix + translation vector
## @input a,b - two UniProt accession numbers
## @return u, t, tm_1, tm_2, order - a rotation matrix u, a translation vector t, the two TM-scores,
##         and order ("same" if the pair is stored as (a,b), "reverse" if it is stored as (b,a))
def get_rot_mat(a,b):
    l1 = tmp['uni1']
    l2 = tmp['uni2']
    u_list = tmp['u']
    t_list = tmp['t']
    tm1 = tmp['tm1']
    tm2 = tmp['tm2']
    for i in range(len(l1)):
        if l1[i] == a and l2[i] == b:
            return(u_list[i], t_list[i], tm1[i], tm2[i], "same")
        if l1[i] == b and l2[i] == a:
            return(u_list[i], t_list[i], tm1[i], tm2[i], "reverse")
    ## if not found, return empty lists
    return([], [], [], [], [])
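The lookup above scans every stored pair on each call. As a sketch of an alternative, the pairs can be indexed once in a dictionary so that each lookup is a constant-time operation. `build_pair_index` and `lookup_pair` are hypothetical names, and the toy table below uses made-up values with the same keys the notebook reads from the pickle.

```python
def build_pair_index(table):
    """Map each (uni1, uni2) pair to the index of its first row in the table."""
    index = {}
    for i, pair in enumerate(zip(table["uni1"], table["uni2"])):
        index.setdefault(pair, i)  # keep the first occurrence, like the linear scan
    return index


def lookup_pair(table, index, a, b):
    """Return (u, t, tm1, tm2, order) for accessions a and b, or None if absent."""
    for key, order in (((a, b), "same"), ((b, a), "reverse")):
        i = index.get(key)
        if i is not None:
            return (table["u"][i], table["t"][i],
                    table["tm1"][i], table["tm2"][i], order)
    return None


# Toy table: placeholder accessions "A", "B", "C" and made-up scores
toy = {
    "uni1": ["A", "A", "B"],
    "uni2": ["B", "C", "C"],
    "u": ["u_AB", "u_AC", "u_BC"],
    "t": ["t_AB", "t_AC", "t_BC"],
    "tm1": [0.9, 0.5, 0.6],
    "tm2": [0.8, 0.4, 0.7],
}
toy_index = build_pair_index(toy)
result_same = lookup_pair(toy, toy_index, "A", "B")
result_reverse = lookup_pair(toy, toy_index, "C", "B")
result_missing = lookup_pair(toy, toy_index, "A", "Z")
```

One difference from the loop above: if a table stored both (a,b) and (b,a), this sketch always prefers the "same" orientation, whereas the linear scan returns whichever row comes first.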

In [40]:
acc1 = uni_name_to_acc(prot1)
acc2 = uni_name_to_acc(prot2)
file1 = "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2_kinase_pockets/" + acc1 + "_pocket_only.pdb"
file2 = "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2_kinase_pockets/" + acc2 + "_pocket_only.pdb"
os.system("wget " + file1)
os.system("wget " + file2)

0

### Load and align kinase structures

In [41]:
acc1 = uni_name_to_acc(prot1)
acc2 = uni_name_to_acc(prot2)

file1_full = "AF-" + acc1 + "-F1-model_v2.pdb"
file2_full = "AF-" + acc2 + "-F1-model_v2.pdb"
os.system("wget " + "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2/" + file1_full)
os.system("wget " + "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2/" + file2_full)

file1 = acc1 + "_pocket_only.pdb"
file2 = acc2 + "_pocket_only.pdb"
os.system("wget " + "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2_kinase_pockets/" + file1)
os.system("wget " + "https://github.com/NicholasClark/collab_kinase_proj/raw/master/AF2_kinase_pockets/" + file2)

0

In [42]:
parser = PDBParser()
str1 = parser.get_structure(prot1, file1)
str2 = parser.get_structure(prot2, file2)

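u,t,tm1,tm2,order = get_rot_mat(acc1, acc2)

### Biopython's transform() right-multiplies coordinates, so pass the transpose of u
if order == "same":
    str1.transform(u.T, t)
elif order == "reverse":
    str2.transform(u.T, t)
else:
    raise ValueError("No TM-align result found for " + acc1 + " and " + acc2)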

tmp_f1 = "str1.pdb"
tmp_f2 = "str2.pdb"
io=PDBIO()
io.set_structure(str1)
io.save(tmp_f1)
io=PDBIO()
io.set_structure(str2)
io.save(tmp_f2)
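The cell above passes `u.T` rather than `u` to `transform()`. Assuming the stored `u` and `t` follow the TM-align convention, where a coordinate column vector x is mapped to u·x + t, the transpose is needed because Biopython's `transform(rot, tran)` right-multiplies row-vector coordinates by `rot`. The sketch below checks that equivalence with NumPy on illustrative values (a 90-degree rotation about the z-axis, not data from the notebook).

```python
import numpy as np

# Illustrative rotation (90 degrees about z) and translation
u = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([1.0, 2.0, 3.0])

# Three atoms, one per row (N x 3)
coords = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Column-vector form, applied atom by atom: x' = u @ x + t
per_atom = np.array([u @ x + t for x in coords])

# Row-vector form, applied to all atoms at once: X' = X @ u.T + t
right_multiplied = coords @ u.T + t

# Both forms give the same transformed coordinates
assert np.allclose(per_atom, right_multiplied)
```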

In [43]:
str1_full = parser.get_structure(prot1, file1_full)
str2_full = parser.get_structure(prot2, file2_full)

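### Note: get_rot_mat() reads the same table as above, so the full structures are
### superimposed with the same rotation/translation (and report the same TM-scores)
u_full,t_full,tm1_full,tm2_full,order_full = get_rot_mat(acc1, acc2)

if order_full == "same":
    str1_full.transform(u_full.T, t_full)
else:
    str2_full.transform(u_full.T, t_full)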

tmp_f1_full = "str1_full.pdb"
tmp_f2_full = "str2_full.pdb"
io=PDBIO()
io.set_structure(str1_full)
io.save(tmp_f1_full)
io=PDBIO()
io.set_structure(str2_full)
io.save(tmp_f2_full)

## Output

### Kinase alignment visualization

The aligned kinases are displayed along with their TM-scores.
Because the TM-score is normalized by the length of the reference protein, each pair has two TM-scores, one for each choice of reference. Both are displayed.
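To illustrate why the choice of reference matters, here is a minimal sketch of the TM-score formula for a fixed superposition: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the reference length, with d0 = 1.24 · (L_ref − 15)^(1/3) − 1.8. The function names and the example distances are illustrative assumptions. TM-align itself also searches for the superposition that maximizes this value and treats very short chains specially, both of which this sketch omits.

```python
def d0(l_ref):
    """Length-dependent distance scale used in the TM-score formula."""
    if l_ref <= 21:
        # Short-chain special cases are intentionally left out of this sketch
        raise ValueError("this sketch only handles reference lengths above 21")
    return 1.24 * (l_ref - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, l_ref):
    """TM-score of a fixed superposition, normalized by the reference length.

    distances: distance (in angstroms) between each aligned residue pair.
    """
    scale = d0(l_ref)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / l_ref


# Illustrative case: 100 aligned pairs, each 2 angstroms apart,
# between chains of 120 and 200 residues
distances = [2.0] * 100
tm_by_shorter = tm_score(distances, 120)  # normalized by the 120-residue chain
tm_by_longer = tm_score(distances, 200)   # normalized by the 200-residue chain
```

With the same aligned residues, normalizing by the shorter chain gives the higher score, which is why the two reported values can differ substantially for kinases of unequal domain length.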

#### ATP-binding domains only

In [None]:
### print TM-score
print(prot1 + ": " + acc1)
print(prot2 + ": " + acc2)
print("TM-score 1: " + str( round(tm1, 2)) )
print("TM-score 2: " + str( round(tm2, 2)) )
### View protein overlays
view = py3Dmol.view()
view.addModel(open(tmp_f1, 'r').read(), 'pdb')
view.setStyle({'model': 0},{'cartoon':{'color':col1}})
view.addModel(open(tmp_f2, 'r').read(), 'pdb')
view.setStyle({'model': 1},{'cartoon':{'color':col2}})
view.zoomTo()
view.show()

#### Full kinase structures

In [None]:
### print TM-score
print(prot1 + ": " + acc1)
print(prot2 + ": " + acc2)
print("TM-score 1: " + str( round(tm1_full, 2)) )
print("TM-score 2: " + str( round(tm2_full, 2)) )
### View protein overlays
view = py3Dmol.view()
view.addModel(open(tmp_f1_full, 'r').read(), 'pdb')
view.setStyle({'model': 0},{'cartoon':{'color':col1}})
view.addModel(open(tmp_f2_full, 'r').read(), 'pdb')
view.setStyle({'model': 1},{'cartoon':{'color':col2}})
view.zoomTo()
view.show()

#### Heatmap of TM-scores

This heatmap shows pairwise TM-scores of kinases (ATP-binding domain only).

Each TM-score is between 0 (no alignment) and 1 (perfect alignment).

In [None]:
mat = tmp['tm_max_mat']
sns.clustermap(mat, cmap="YlOrRd", vmin=0, vmax=1)
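The notebook does not show how `tm_max_mat` was built. The key name suggests that each cell holds the larger of the two TM-scores for a pair, which would make the matrix symmetric; that reading is an assumption. Under it, a matrix of this kind could be assembled from per-pair scores roughly as follows (`max_tm_matrix` is a hypothetical helper and the records are made up).

```python
def max_tm_matrix(names, pairs):
    """Build a symmetric matrix of max(tm1, tm2) from (a, b, tm1, tm2) records."""
    pos = {name: i for i, name in enumerate(names)}
    n = len(names)
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mat[i][i] = 1.0  # a structure aligned with itself scores 1
    for a, b, tm1, tm2 in pairs:
        i, j = pos[a], pos[b]
        # store the larger score in both triangles so the matrix is symmetric
        mat[i][j] = mat[j][i] = max(tm1, tm2)
    return mat


# Toy records with placeholder names and made-up scores
names = ["A", "B", "C"]
pairs = [("A", "B", 0.9, 0.8), ("A", "C", 0.4, 0.5), ("B", "C", 0.6, 0.7)]
toy_mat = max_tm_matrix(names, pairs)
```

A nested list like `toy_mat` can be passed to `sns.clustermap` in the same way as `mat` above.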