## Parser for DrugCentral Drug-Target Interaction Data
* Source table obtained from: https://drugcentral.org/download
* Last downloaded: 07/15/2025
* Content current as of: 11/01/2023
* Additional information:
    * Drug-target interaction data extracted from the literature, drug labels, and external data sources, provided in TSV format:
    * drug.target.interaction.tsv

In [1]:
## To do list:
## Add

In [2]:
## Load necessary packages
import os
import pandas as pd
import glob
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

## Load the TCT-related packages
from TCT import node_normalizer
from TCT import name_resolver
from TCT import translator_metakg
from TCT import translator_kpinfo
from TCT import translator_query
from TCT import TCT

## Define the version number
version_number = "07_23_2025"
deployment_date = "2025-07-23"
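To keep the two date strings from drifting apart, one option is to derive `version_number` from `deployment_date` with the standard library. This is a minimal sketch, assuming the `MM_DD_YYYY` format shown above is the one the output file names should keep using:

```python
from datetime import datetime

# Single source of truth for the release date (ISO format, as above)
deployment_date = "2025-07-23"

# Derive the MM_DD_YYYY string used in the output file names
version_number = datetime.strptime(deployment_date, "%Y-%m-%d").strftime("%m_%d_%Y")
print(version_number)  # 07_23_2025
```

With this arrangement only `deployment_date` needs to be edited for a new release.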

In [3]:
## Load the Biolink category and predicate dictionary for mapping subject, object, and predicate types
%run ./Biolink_category_and_predication_dictionary.ipynb

Date of last update:  2025-07-08
Order is to always process Node/category map first, since the Edge/predicate map depends on biolink-compliant node values
-----------------------------------------------------------------------------------------------------------------------------
Dictionary: category_map, Key template: Subject_category or Object_category
------------------------------------------------------------------------------------------
Dictionary: predicate_map, Key template: (Subject_category, Object_category, Predicate)
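The printed note says node categories must be mapped before predicates, because `predicate_map` is keyed on the already-mapped (Biolink-compliant) categories. A minimal sketch of that ordering follows; every dictionary entry and the helper name `map_edge` are hypothetical placeholders, not the contents of the real dictionaries loaded above.

```python
# Hypothetical entries for illustration only; the real category_map and
# predicate_map are defined in Biolink_category_and_predication_dictionary.ipynb
category_map = {
    "Drug": "biolink:SmallMolecule",
    "Ion channel": "biolink:Protein",
}
predicate_map = {
    ("biolink:SmallMolecule", "biolink:Protein", "BLOCKER"): "biolink:example_predicate",
}

def map_edge(subject_category, object_category, predicate):
    """Map raw categories first, then use the mapped values to look up the predicate."""
    subj = category_map.get(subject_category)
    obj = category_map.get(object_category)
    # The predicate key is built from Biolink-compliant categories,
    # so this lookup only succeeds after the category mapping step
    return subj, obj, predicate_map.get((subj, obj, predicate))

print(map_edge("Drug", "Ion channel", "BLOCKER"))
```

An unmapped category yields `None` for that node and, consequently, `None` for the predicate, which makes gaps in either dictionary easy to spot.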


In [4]:
## Load all helper functions
%run /Users/Weiqi0/ISB_working/Hadlock_lab/QI_ISB_Git_repo/TranslatorPharcogenomicsKG/Parser_helper_functions.ipynb

In [5]:
## Note: change the file paths below to match your own environment
raw_files_path = '/Users/Weiqi0/ISB_working/Ilya_lab/Translator/Pharmagenomics_KG/files/drug_central/'

## Define the output path for node & edge files after formatting
download_path_node_file = f'/Users/Weiqi0/ISB_working/Ilya_lab/Translator/Pharmagenomics_KG/files/parsed/drug_central_parsed_node_{version_number}.tsv'
download_path_edge_file = f'/Users/Weiqi0/ISB_working/Ilya_lab/Translator/Pharmagenomics_KG/files/parsed/drug_central_parsed_edge_{version_number}.tsv'
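Instead of editing absolute paths by hand, the paths can be built from a single base directory. A sketch using `pathlib`; the `PGX_KG_DIR` environment variable and the fallback to the current directory are hypothetical choices, not part of the original setup:

```python
import os
from pathlib import Path

version_number = "07_23_2025"

# Hypothetical: read the project root from an environment variable,
# falling back to the current working directory
base_dir = Path(os.environ.get("PGX_KG_DIR", "."))

raw_files_path = base_dir / "files" / "drug_central"
parsed_dir = base_dir / "files" / "parsed"

download_path_node_file = parsed_dir / f"drug_central_parsed_node_{version_number}.tsv"
download_path_edge_file = parsed_dir / f"drug_central_parsed_edge_{version_number}.tsv"

print(download_path_node_file.name)  # drug_central_parsed_node_07_23_2025.tsv
```

With `Path` objects, later string concatenation such as `raw_files_path + 'drug.target.interaction.tsv'` would need to become `raw_files_path / 'drug.target.interaction.tsv'`.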

In [6]:
## List all TSV files in the raw-files directory to confirm which files will be read

for f in os.listdir(raw_files_path):
    if f.endswith('.tsv'):
        print(f)

drug.target.interaction.tsv
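`glob` is already imported in the first code cell, so the same check can be written as a pattern match. A minimal sketch that returns the matching file names in sorted order; `list_tsv_files` is a hypothetical helper name, and it is demonstrated on a temporary directory rather than the real `raw_files_path`:

```python
import glob
import os
import tempfile

def list_tsv_files(directory):
    """Return the sorted base names of all .tsv files in `directory`."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(directory, "*.tsv")))

# Demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ["drug.target.interaction.tsv", "notes.txt"]:
        open(os.path.join(tmp, name), "w").close()
    found = list_tsv_files(tmp)

print(found)  # ['drug.target.interaction.tsv']
```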


In [7]:
## Read the drug-target interaction TSV file
source_df = pd.read_csv(raw_files_path + 'drug.target.interaction.tsv', sep = '\t')
source_df.head(10)

| | DRUG_NAME | STRUCT_ID | TARGET_NAME | TARGET_CLASS | ACCESSION | GENE | SWISSPROT | ACT_VALUE | ACT_UNIT | ACT_TYPE | ACT_COMMENT | ACT_SOURCE | RELATION | MOA | MOA_SOURCE | ACT_SOURCE_URL | MOA_SOURCE_URL | ACTION_TYPE | TDL | ORGANISM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | levobupivacaine | 4 | Potassium voltage-gated channel subfamily H me... | Ion channel | Q12809 | KCNH2 | KCNH2_HUMAN | 4.89 | | IC50 | Inhibition of wild-type human ERG channel expr... | CHEMBL | = | | | | | | Tclin | Homo sapiens |
| 1 | levobupivacaine | 4 | Sodium channel protein type 1 subunit alpha | Ion channel | P35498 | SCN1A | SCN1A_HUMAN | 5.79 | | IC50 | | WOMBAT-PK | = | | | | | | Tclin | Homo sapiens |
| 2 | levobupivacaine | 4 | Sodium channel protein type 4 subunit alpha | Ion channel | P35499 | SCN4A | SCN4A_HUMAN | | | | | WOMBAT-PK | | 1.0 | CHEMBL | | https://www.ebi.ac.uk/chembl/compound/inspect/... | BLOCKER | Tclin | Homo sapiens |
| 3 | levobupivacaine | 4 | Prostaglandin E2 receptor EP1 subtype | GPCR | P34995 | PTGER1 | PE2R1_HUMAN | | | | | WOMBAT-PK | | | | | | | Tclin | Homo sapiens |
| 4 | levobupivacaine | 4 | Cytochrome P450 2D6 | Enzyme | P10635 | CYP2D6 | CP2D6_HUMAN | 6.707 | | IC50 | DRUGMATRIX: CYP450, 2D6 enzyme inhibition (sub... | DRUG MATRIX | = | | | | | | Tclin | Homo sapiens |
| 5 | levobupivacaine | 4 | 5-hydroxytryptamine receptor 3A | Ion channel | P46098 | HTR3A | 5HT3A_HUMAN | | | | | WOMBAT-PK | | | | | | | Tclin | Homo sapiens |
| 6 | levobupivacaine | 4 | Potassium voltage-gated channel subfamily D me... | Ion channel | Q9UK17 | KCND3 | KCND3_HUMAN | 4.5 | | IC50 | | WOMBAT-PK | = | | | | | | Tclin | Homo sapiens |
| 7 | levobupivacaine | 4 | Potassium voltage-gated channel subfamily A me... | Ion channel | P22460 | KCNA5 | KCNA5_HUMAN | | | | | WOMBAT-PK | | | | | | | Tclin | Homo sapiens |
| 8 | (S)-nicardipine | 5 | Voltage-gated L-type calcium channel | Ion channel | Q01668\|Q13936 | CACNA1D\|CACNA1C | CAC1D_HUMAN\|CAC1C_HUMAN | | | | Mechanism of Action | DRUG LABEL | | 1.0 | DRUG LABEL | http://www.accessdata.fda.gov/drugsatfda_docs/... | http://www.accessdata.fda.gov/drugsatfda_docs/... | BLOCKER | Tclin\|Tclin\|Tclin\|Tclin | Homo sapiens |
| 9 | (S)-nitrendipine | 6 | Intermediate conductance calcium-activated pot... | Ion channel | O15554 | KCNN4 | KCNN4_HUMAN | 7.6 | | IC50 | | IUPHAR | = | | | | | BLOCKER | Tchem | Homo sapiens |
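Row 8 shows that `ACCESSION`, `GENE`, and `SWISSPROT` can hold several pipe-delimited values for a single target (here `CACNA1D|CACNA1C`). If each gene should become its own node, one option is to split those columns and explode them together. A sketch, assuming the three columns always list the same number of values in the same order; `TDL` is left out because in that row it carries four values against two genes:

```python
import pandas as pd

# Toy frame mirroring row 8 of the head() output above
df = pd.DataFrame({
    "DRUG_NAME": ["(S)-nicardipine"],
    "ACCESSION": ["Q01668|Q13936"],
    "GENE": ["CACNA1D|CACNA1C"],
    "SWISSPROT": ["CAC1D_HUMAN|CAC1C_HUMAN"],
})

multi_cols = ["ACCESSION", "GENE", "SWISSPROT"]
for col in multi_cols:
    # A single-character pattern is split on literally, not as a regex
    df[col] = df[col].str.split("|")

# Multi-column explode needs pandas >= 1.3 and equal list lengths per row
exploded = df.explode(multi_cols, ignore_index=True)
print(exploded[["GENE", "ACCESSION"]].values.tolist())
# [['CACNA1D', 'Q01668'], ['CACNA1C', 'Q13936']]
```

If the list lengths ever disagree within a row, `explode` raises a `ValueError`, which is a useful early warning rather than a silent mis-pairing.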


In [None]:
## Run the name resolver to find the corresponding Translator identifiers (CURIEs)
* use `name_resolver.lookup()` for a single name
* use `name_resolver.batch_lookup()` for batch mapping
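Because the same drug name appears on many rows (eight of the ten rows shown above are `levobupivacaine`), it pays to resolve each unique name once and map the result back onto every row. The sketch below shows only that dedupe, batch, map-back pattern; `fake_batch_lookup` is a hypothetical stand-in for `name_resolver.batch_lookup()`, whose exact return type is assumed here to be a `{name: curie or None}` dict:

```python
import pandas as pd

def fake_batch_lookup(names):
    """Hypothetical stand-in for a batch resolver: returns {name: curie or None}."""
    known = {"(S)-nitrendipine": "CHEBI:135522"}
    return {name: known.get(name) for name in names}

def resolve_names(series, batch_lookup, batch_size=25):
    """Resolve each unique non-null name once, in batches, then map back onto the rows."""
    unique_names = series.dropna().unique().tolist()
    resolved = {}
    for start in range(0, len(unique_names), batch_size):
        resolved.update(batch_lookup(unique_names[start:start + batch_size]))
    return series.map(resolved)

drugs = pd.Series(["levobupivacaine", "levobupivacaine", "(S)-nitrendipine"])
print(resolve_names(drugs, fake_batch_lookup, batch_size=1).tolist())
```

Names the resolver does not recognise come back as missing values, so they can be filtered or reported afterwards.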

In [8]:
name = '(S)-nitrendipine'
input_node_info = name_resolver.lookup(name)
print(input_node_info)

TranslatorNode(curie='CHEBI:135522', label='(S)-nitrendipine', types=['biolink:SmallMolecule', 'biolink:MolecularEntity', 'biolink:ChemicalEntity', 'biolink:PhysicalEssence', 'biolink:ChemicalOrDrugOrTreatment', 'biolink:ChemicalEntityOrGeneOrGeneProduct', 'biolink:ChemicalEntityOrProteinOrPolypeptide', 'biolink:NamedThing', 'biolink:Entity', 'biolink:PhysicalEssenceOrOccurrent'], synonyms=None, curie_synonyms=None)


In [9]:
print(input_node_info.curie)

CHEBI:135522
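When building node rows, the category is useful alongside the CURIE. In the printed `TranslatorNode`, `types` appears to be ordered from most to least specific (`biolink:SmallMolecule` first, `biolink:Entity` near the end). The sketch below assumes that ordering holds in general, which is not confirmed, and uses a hypothetical `Node` namedtuple in place of the real `TranslatorNode` class:

```python
from collections import namedtuple

# Hypothetical stand-in exposing the same fields printed above
Node = namedtuple("Node", ["curie", "label", "types"])

def curie_and_category(node):
    """Return (curie, first listed type), or (None, None) when lookup found nothing."""
    if node is None:
        return None, None
    category = node.types[0] if node.types else None
    return node.curie, category

example = Node("CHEBI:135522", "(S)-nitrendipine",
               ["biolink:SmallMolecule", "biolink:MolecularEntity", "biolink:NamedThing"])
print(curie_and_category(example))  # ('CHEBI:135522', 'biolink:SmallMolecule')
print(curie_and_category(None))     # (None, None)
```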


In [None]:
## Resolve each drug name (subject) to a Translator CURIE
## One-at-a-time alternative (slower: one lookup() call per row):
# source_df['subject'] = source_df['DRUG_NAME'].apply(lambda name: getattr(name_resolver.lookup(name), 'curie', None))

## Batch version using name_resolver.batch_lookup()
# Look up each unique drug name only once (many rows share the same drug)
names = source_df['DRUG_NAME'].dropna().unique().tolist()

# Break into batches of 25
batch_size = 25
batches = [names[i:i + batch_size] for i in range(0, len(names), batch_size)]

# Run batch lookups and collect results
results = {}
for batch in batches:
    lookup_results = name_resolver.batch_lookup(batch)  # Expected to return a dict: {name: result or None}
    for name, result in lookup_results.items():
        results[name] = result.curie if result else None

# Map the resolved CURIEs back to the DataFrame
source_df['subject'] = source_df['DRUG_NAME'].map(results)
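After mapping, it is worth checking which drug names did not resolve, since those rows would otherwise produce edges without a subject CURIE. A small sketch on a toy frame; the column names `DRUG_NAME` and `subject` are assumed, and the values are illustrative, not actual lookup results:

```python
import pandas as pd

# Toy frame (illustrative values only): one drug resolved, one not
df = pd.DataFrame({
    "DRUG_NAME": ["(S)-nitrendipine", "levobupivacaine", "levobupivacaine"],
    "subject": ["CHEBI:135522", None, None],
})

unresolved = df.loc[df["subject"].isna(), "DRUG_NAME"].unique().tolist()
print(f"{len(unresolved)} unresolved name(s): {unresolved}")

# Keep only rows with a resolved subject for downstream edge building
resolved_df = df.dropna(subset=["subject"])
```

Printing the unique unresolved names, rather than every affected row, keeps the report short when one drug spans many targets.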