# Database module

The database module lets users query external database APIs to extract clinically relevant mutation data and protein annotations. Supported sources include:
+ **cBioPortal:** multi-institutional repository of cancer genomics sequencing data
+ **UniProt:** canonical protein sequence information
+ **Pfam:** protein domain annotations
+ **HGNC:** standardized gene nomenclature
+ **KLIFS:** kinase-ligand interaction annotations
+ **KinHub:** curated list of human kinases

## KinHub

In this vignette, we extract the list of human kinases from [KinHub](http://www.kinhub.org/kinases.html). We will then obtain additional protein annotations for these kinases and extract their corresponding mutations from a cBioPortal cohort.

In [5]:
from missense_kinase_toolkit.databases import scrapers

In [6]:
df_kinhub = scrapers.kinhub()

In [7]:
df_kinhub.head()

| | HGNC Name | xName | Manning Name | Kinase Name | Group | Family | SubFamily | UniprotID |
|---|---|---|---|---|---|---|---|---|
| 0 | ABL1 | ABL1 | ABL | Tyrosine-protein kinase ABL1 | TK | Abl | | P00519 |
| 1 | TNK2 | ACK | ACK | Activated CDC42 kinase 1 | TK | Ack | | Q07912 |
| 2 | ACVR2A | ACTR2 | ACTR2 | Activin receptor type-2A | TKL | STKR | STKR2 | P27037 |
| 3 | ACVR2B | ACTR2B | ACTR2B | Activin receptor type-2B | TKL | STKR | STKR2 | Q13705 |
| 4 | ADCK4 | ADCK4 | ADCK4 | Uncharacterized aarF domain-containing protein... | Atypical | ABC1 | ABC1-A | Q96D53 |


In [KinHub](http://www.kinhub.org/kinases.html), 13 kinases possess more than one kinase domain and are therefore listed as separate entries, even though each has a single UniProt ID. To remedy this, we aggregate entries by HGNC gene name so that each row represents a unique protein rather than a unique kinase domain.
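The aggregation described above can be sketched with a pandas `groupby`. This is a minimal illustration on a toy table, not the implementation used by `scrapers.kinhub()`; the `df_raw` frame and the `join_unique` helper are hypothetical names introduced here, and the rows are a hand-made stand-in for domain-level KinHub entries.

```python
import pandas as pd

# Toy stand-in for a domain-level table (hypothetical data layout):
# JAK1 appears twice, once per kinase domain, with the same UniProt ID.
df_raw = pd.DataFrame(
    {
        "HGNC Name": ["JAK1", "JAK1", "ABL1"],
        "xName": ["JAK1", "JAK1_b", "ABL1"],
        "Manning Name": ["JAK1", "Domain2_JAK1", "ABL"],
        "UniprotID": ["P23458", "P23458", "P00519"],
    }
)


def join_unique(values: pd.Series) -> str:
    """Join the distinct values of a column with ', ', keeping first-seen order."""
    return ", ".join(dict.fromkeys(values.astype(str)))


# Collapse domain-level rows into one row per HGNC gene name.
df_agg = df_raw.groupby("HGNC Name", sort=False).agg(join_unique).reset_index()
print(df_agg)
```

Values that differ between domains (such as `xName`) end up as comma-separated strings, while values shared by both domains (such as `UniprotID`) collapse to a single entry.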

In [13]:
df_kinhub.loc[df_kinhub["Manning Name"].apply(lambda x: "DOMAIN2" in x.upper()), ]

| | HGNC Name | xName | Manning Name | Kinase Name | Group | Family | SubFamily | UniprotID |
|---|---|---|---|---|---|---|---|---|
| 141 | JAK1 | JAK1, JAK1_b | JAK1, Domain2_JAK1 | Tyrosine-protein kinase JAK1 | TK | JakB, Jak | | P23458 |
| 142 | JAK2 | JAK2, JAK2_b | JAK2, Domain2_JAK2 | Tyrosine-protein kinase JAK2 | TK | JakB, Jak | | O60674 |
| 143 | JAK3 | JAK3, JAK3_b | JAK3, Domain2_JAK3 | Tyrosine-protein kinase JAK3 | TK | JakB, Jak | | P52333 |
| 191 | RPS6KA5 | MSK1_b, MSK1 | Domain2_MSK1, MSK1 | Ribosomal protein S6 kinase alpha-5 | CAMK, AGC | RSK, RSKb | MSK, MSKb | O75582 |
| 192 | RPS6KA4 | MSK2_b, MSK2 | Domain2_MSK2, MSK2 | Ribosomal protein S6 kinase alpha-4 | CAMK, AGC | RSK, RSKb | MSK, MSKb | O75676 |
| 268 | RPS6KA2 | RSK3, RSK3_b | RSK1, Domain2_RSK1 | Ribosomal protein S6 kinase alpha-2 | CAMK, AGC | RSK, RSKb | RSKb, RSKp90 | Q15349 |
| 269 | RPS6KA3 | RSK2_b, RSK2 | Domain2_RSK2, RSK2 | Ribosomal protein S6 kinase alpha-3 | CAMK, AGC | RSK, RSKb | RSKb, RSKp90 | P51812 |
| 270 | RPS6KA1 | RSK1_b, RSK1 | RSK3, Domain2_RSK3 | Ribosomal protein S6 kinase alpha-1 | CAMK, AGC | RSK, RSKb | RSKb, RSKp90 | Q15418 |
| 305 | TYK2 | TYK2, TYK2_b | Domain2_TYK2, TYK2 | Non-receptor tyrosine-protein kinase TYK2 | TK | JakB, Jak | | P29597 |
| 395 | EIF2AK4 | GCN2, GCN2_b | Domain2_GCN2, GCN2 | Eukaryotic translation initiation factor 2-alp... | STE, Other | STE-Unique, PEK | nan, GCN2 | Q9P2K8 |


In [11]:
df_kinhub.loc[df_kinhub["HGNC Name"].apply(lambda x: "JAK" in x.upper()), ]

| | HGNC Name | xName | Manning Name | Kinase Name | Group | Family | SubFamily | UniprotID |
|---|---|---|---|---|---|---|---|---|
| 141 | JAK1 | JAK1, JAK1_b | JAK1, Domain2_JAK1 | Tyrosine-protein kinase JAK1 | TK | JakB, Jak | | P23458 |
| 142 | JAK2 | JAK2, JAK2_b | JAK2, Domain2_JAK2 | Tyrosine-protein kinase JAK2 | TK | JakB, Jak | | O60674 |
| 143 | JAK3 | JAK3, JAK3_b | JAK3, Domain2_JAK3 | Tyrosine-protein kinase JAK3 | TK | JakB, Jak | | P52333 |
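The row filters above use `apply` with a lambda to do a case-insensitive substring match. An equivalent vectorized form uses pandas' `Series.str.contains`; the sketch below runs on a small hypothetical frame (`df_demo`) rather than the real KinHub table.

```python
import pandas as pd

# Hypothetical three-row stand-in for the aggregated KinHub table.
df_demo = pd.DataFrame(
    {
        "HGNC Name": ["JAK1", "ABL1", "TYK2"],
        "Manning Name": ["JAK1, Domain2_JAK1", "ABL", "Domain2_TYK2, TYK2"],
    }
)

# case=False mirrors the x.upper() comparison in the lambda version;
# regex=False treats the pattern as a literal substring;
# na=False makes missing values evaluate to False instead of raising or propagating NaN.
mask = df_demo["Manning Name"].str.contains(
    "domain2", case=False, regex=False, na=False
)
df_multi = df_demo.loc[mask]
print(df_multi["HGNC Name"].tolist())  # → ['JAK1', 'TYK2']
```

The `na=False` argument is the main practical difference: the lambda version would fail on a missing `Manning Name`, whereas this form simply excludes such rows.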


## Pfam

In [9]:
from missense_kinase_toolkit.databases import pfam

In [8]:
?pfam.Pfam()

Object `pfam.Pfam()` not found.


In [14]:
?pfam

Type:        module
String form: <module 'missense_kinase_toolkit.databases.pfam' from '/Users/jessicawhite/Library/CloudStorage/OneDrive-Personal/PhD/Chodera/missense_kinase_toolkit/src/missense_kinase_toolkit/databases/pfam.py'>
File:        ~/Library/CloudStorage/OneDrive-Personal/PhD/Chodera/missense_kinase_toolkit/src/missense_kinase_toolkit/databases/pfam.py
Docstring:   <no docstring>
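IPython's `?` help syntax looks up a name rather than evaluating a call, which is likely why `?pfam.Pfam()` reports that the object was not found. When a module has no docstring, one way to see what it exposes is to list its public classes and functions with the standard-library `inspect` module. The sketch below is an illustration only: `public_api` is a hypothetical helper, and `json` is used as a stand-in module because the contents of `pfam` are not shown here.

```python
import inspect
import json  # stand-in module for the demo; pass `pfam` instead in the notebook


def public_api(module):
    """Return {name: first docstring line} for public classes and functions in a module."""
    summary = {}
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # skip private and dunder names
        if inspect.isclass(obj) or inspect.isfunction(obj):
            doc = inspect.getdoc(obj) or "<no docstring>"
            summary[name] = doc.splitlines()[0]
    return summary


for name, first_line in public_api(json).items():
    print(f"{name}: {first_line}")
```

Calling `public_api(pfam)` in the notebook would then show which names are available before asking for help on a specific one, for example `?pfam.some_name` without trailing parentheses.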

## UniProt

In [1]:
from missense_kinase_toolkit.databases import uniprot