## DrugBank Small Molecule Transporters Drug-Set Library
### Drug-set labels: Genes
#### ALL DATABASES ACCESSED 01/01/20
##### Author : Eryk Kropiwnicki | eryk.kropiwnicki@icahn.mssm.edu

In [1]:
import json
import pandas as pd
from collections import defaultdict
import csv
import os

In [2]:
os.chdir('../../../scripts')
from export_script import *
from gene_resolver import *
os.chdir('../notebooks/Drugbank/Small molecules')

### Matching Small Molecule Names to Entrez Gene Symbols
#### Input File : drugbank_small_molecule_transporter_polypeptide_ids.csv (https://www.drugbank.ca/releases/latest#protein-identifiers)
#### Downloaded on 01/01/2020

In [3]:
# Import transporter gene names, species, and the DrugBank IDs associated with each transporter #
df = pd.read_csv('input/drugbank_small_molecule_transporter_polypeptide_ids.csv',
                 usecols = ['Gene Name', 'Species', 'Drug IDs'])

In [4]:
df.head()

Unnamed: 0,Gene Name,Species,Drug IDs
0,CACNA1I,Humans,DB09498
1,SCNN1B,Humans,DB14509
2,SLC7A11,Humans,DB01098; DB04348; DB05540; DB08833; DB08834
3,SLC12A5,Humans,DB00761
4,SEC14L4,Humans,DB11251; DB11635; DB14003


In [5]:
# Dropping all non-human gene names #
df = df[df['Species'].str.contains('Humans', na = False)]

In [6]:
len(df)

214

### Validating genes using lookup table
#### Lookup table generated from ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia

In [7]:
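# gene_resolver (from scripts/gene_resolver.py) adds an 'Approved Symbol' column to df, as shown in the output below #
gene_resolver(df, columnName = 'Gene Name')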

In [8]:
len(df)

214

In [9]:
df.head()

Unnamed: 0,Gene Name,Species,Drug IDs,Approved Symbol
0,CACNA1I,Humans,DB09498,CACNA1I
1,SCNN1B,Humans,DB14509,SCNN1B
2,SLC7A11,Humans,DB01098; DB04348; DB05540; DB08833; DB08834,SLC7A11
3,SLC12A5,Humans,DB00761,SLC12A5
4,SEC14L4,Humans,DB11251; DB11635; DB14003,SEC14L4
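The internals of `gene_resolver` are not shown in this notebook. As a rough sketch of what a lookup-table validation step can look like, the snippet below maps gene names to approved symbols with a plain dictionary. The `lookup` dictionary, the `toy` frame, and the upper-casing step are all assumptions made for illustration and are not taken from `gene_resolver` itself.

```python
import pandas as pd

# Illustrative lookup only (alias -> approved symbol). The notebook's real
# table is generated from NCBI gene_info files and is not reproduced here.
lookup = {'SLC7A11': 'SLC7A11', 'XCT': 'SLC7A11', 'SCNN1B': 'SCNN1B'}

# Toy input; 'NOT_A_GENE' is a made-up name that should fail to resolve.
toy = pd.DataFrame({'Gene Name': ['SLC7A11', 'xCT', 'SCNN1B', 'NOT_A_GENE']})

# Normalize case (an assumption), then map each name through the lookup.
toy['Approved Symbol'] = toy['Gene Name'].str.upper().map(lookup)

# Names that did not resolve end up as NaN and can be listed for review.
unresolved = toy.loc[toy['Approved Symbol'].isna(), 'Gene Name'].tolist()
print(unresolved)
```

Listing unresolved names before any filtering makes it easier to see whether the row count is expected to change after validation.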


In [10]:
# Splitting "; " separated drug IDs into separate rows #
df_transporter = pd.DataFrame(df['Drug IDs'].str.split('; ').tolist(), index = df['Approved Symbol']).stack()
df_transporter = df_transporter.reset_index()[[0, 'Approved Symbol']]
df_transporter.columns = ['DrugBank ID','Gene']

In [11]:
df_transporter.head()

Unnamed: 0,DrugBank ID,Gene
0,DB09498,CACNA1I
1,DB14509,SCNN1B
2,DB01098,SLC7A11
3,DB04348,SLC7A11
4,DB05540,SLC7A11
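The same reshaping can also be written with `DataFrame.explode` (available in pandas 0.25 and later). This is an alternative sketch on a small toy frame, not the code the notebook ran; the `toy` and `exploded` names are introduced here for illustration.

```python
import pandas as pd

# Toy frame with the same two columns used in the split step above.
toy = pd.DataFrame({
    'Approved Symbol': ['CACNA1I', 'SLC7A11'],
    'Drug IDs': ['DB09498', 'DB01098; DB04348; DB05540'],
})

exploded = (
    toy.assign(**{'DrugBank ID': toy['Drug IDs'].str.split('; ')})  # list per row
       .explode('DrugBank ID')                                      # one row per ID
       .rename(columns={'Approved Symbol': 'Gene'})
       [['DrugBank ID', 'Gene']]
       .reset_index(drop=True)
)
print(exploded)
```

Both approaches yield one row per (DrugBank ID, gene) pair; `explode` avoids building an intermediate wide frame.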


### Associate each DrugBank ID with InChI Key

In [12]:
# Import DrugBank ID to InChI Key mapping file #
drugbank_mapping = pd.read_csv('../../../metadata/drugmonizome_metadata.tsv', sep = '\t',
                               usecols = ['DrugBank ID', 'Standard InChI Key'])

In [13]:
drugbank_mapping.head()

Unnamed: 0,DrugBank ID,Standard InChI Key
0,DB00006,OIRCOABEOLEUMC-GEJPAHFPSA-N
1,DB00007,GFIJNRVAKGFPGQ-LIJARHBVSA-N
2,DB00014,BLCLNMBMMGCOAS-URPVMXJPSA-N
3,DB00027,NDAYQJDHGXTBJL-MWWSRJDJSA-N
4,DB00035,NFLWUMRGJYTJIN-PNIOQBSNSA-N


In [14]:
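# Inner join on the shared 'DrugBank ID' column; drugs without an InChI Key in the mapping file are dropped #
df_transporter = df_transporter.merge(drugbank_mapping, on = 'DrugBank ID')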

In [15]:
df_transporter.head(3)

Unnamed: 0,DrugBank ID,Gene,Standard InChI Key
0,DB09498,CACNA1I,AHBGXTDRMVNFER-FCHARDOESA-L
1,DB09498,SLC8A1,AHBGXTDRMVNFER-FCHARDOESA-L
2,DB14509,SCNN1B,XGZVUEUWXADBQD-UHFFFAOYSA-L
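Because an inner merge silently discards DrugBank IDs that are missing from the mapping file, it can be useful to check how many were lost. The sketch below uses a left merge with `indicator=True` on toy data; `DB99999` is a made-up ID used only to show an unmatched row, and the `pairs`, `mapping`, `checked`, `unmatched`, and `matched` names are illustrative.

```python
import pandas as pd

# Toy (DrugBank ID, gene) pairs; DB99999 is hypothetical and has no mapping.
pairs = pd.DataFrame({
    'DrugBank ID': ['DB09498', 'DB14509', 'DB99999'],
    'Gene': ['CACNA1I', 'SCNN1B', 'SLC7A11'],
})
mapping = pd.DataFrame({
    'DrugBank ID': ['DB09498', 'DB14509'],
    'Standard InChI Key': ['AHBGXTDRMVNFER-FCHARDOESA-L',
                           'XGZVUEUWXADBQD-UHFFFAOYSA-L'],
})

# A left merge keeps every pair; the _merge column records where each row matched.
checked = pairs.merge(mapping, on='DrugBank ID', how='left', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only', 'DrugBank ID'].tolist()
matched = checked[checked['_merge'] == 'both'].drop(columns='_merge')
print(f'{len(unmatched)} DrugBank IDs without an InChI Key: {unmatched}')
```

The `matched` frame is equivalent to what the inner merge returns, so the check can be added without changing downstream results.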


In [16]:
# Creating parallel lists of gene symbols and drug InChI Keys #
genes = df_transporter['Gene'].tolist()
drugs = df_transporter['Standard InChI Key'].tolist()

### Creating drugsetlibrary and exporting

In [17]:
# The same gene can appear in many rows, each paired with a different drug #
# Pair each gene symbol with its drug InChI Key, then group all InChI Keys under their gene as the dictionary key #

gene_drug_pairs = tuple(zip(genes, drugs))

drugsetlibrary = defaultdict(list)
for gene, drug in gene_drug_pairs:
    drugsetlibrary[gene].append(drug)

In [18]:
# Removing all terms paired with fewer than 5 unique drugs #
drugsetlibrary = {k:list(set(v)) for k,v in drugsetlibrary.items() if len(set(v))>=5}
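To make the grouping and size filter concrete, here is a small self-contained sketch on placeholder data. The gene and key names are invented, and the threshold is lowered to 2 (the notebook uses 5) so the effect is visible with only a few pairs.

```python
from collections import defaultdict

MIN_DRUGS = 2  # the notebook uses 5; lowered here for the toy example

# Placeholder (gene, InChI Key) pairs; note the duplicated first pair.
pairs = [
    ('GENE_A', 'KEY1'),
    ('GENE_A', 'KEY1'),
    ('GENE_A', 'KEY2'),
    ('GENE_B', 'KEY3'),
]

# Collect drugs per gene in a set so duplicates are dropped on insertion.
library = defaultdict(set)
for gene, key in pairs:
    library[gene].add(key)

# Keep only genes with at least MIN_DRUGS unique drugs.
filtered = {gene: sorted(keys) for gene, keys in library.items()
            if len(keys) >= MIN_DRUGS}
print(filtered)
```

Counting unique drugs rather than raw rows matters here: `GENE_A` has three rows but only two distinct keys.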

In [19]:
os.chdir('../../../data/Drugbank')

In [20]:
gmt_formatter(drugsetlibrary, 'Drugbank_smallmolecule_transporter_drugsetlibrary.gmt')
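`gmt_formatter` comes from `export_script` and its implementation is not shown here. For orientation, a GMT file conventionally holds one set per line: the term, a description field, then the members, all tab-separated. The function below is a hypothetical stand-in that follows that convention with an empty description field; the exact layout written by `gmt_formatter` may differ.

```python
def format_gmt_lines(library):
    """Return one tab-separated GMT line per term (hypothetical helper).

    Layout assumed here: term, empty description, then sorted unique members.
    """
    lines = []
    for term in sorted(library):
        members = sorted(set(library[term]))
        lines.append('\t'.join([term, ''] + members))
    return lines


# Placeholder library with invented term and member names.
example = {'GENE_B': ['KEY2', 'KEY1'], 'GENE_A': ['KEY3']}
for line in format_gmt_lines(example):
    print(repr(line))
```

Sorting terms and members is a design choice that keeps the output stable between runs, which makes file diffs meaningful.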

### Library counts

In [21]:
library_counts(drugsetlibrary)

832 unique drugs
51 unique association terms
2387 unique associations
46.80392156862745 average drugs per term
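The reported average is consistent with dividing the number of associations by the number of terms (2387 / 51 ≈ 46.80). The sketch below shows one way such summary statistics could be computed; `summarize_library` is a hypothetical helper and is not the `library_counts` function from `export_script`.

```python
def summarize_library(library):
    """Compute summary counts for a {term: [drugs]} dictionary (hypothetical helper)."""
    unique_drugs = set().union(*library.values()) if library else set()
    n_terms = len(library)
    # Each (term, drug) pair counts once, even if a drug is listed twice.
    n_associations = sum(len(set(drugs)) for drugs in library.values())
    average = n_associations / n_terms if n_terms else 0.0
    return {
        'unique_drugs': len(unique_drugs),
        'unique_terms': n_terms,
        'unique_associations': n_associations,
        'average_drugs_per_term': average,
    }


# Placeholder library with invented names.
example = {'GENE_A': ['KEY1', 'KEY2'], 'GENE_B': ['KEY2', 'KEY3', 'KEY4']}
print(summarize_library(example))
```

A drug shared by several terms is counted once among unique drugs but once per term among associations, which is why the notebook reports 832 unique drugs against 2387 associations.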
