__Aim:__
- [x] Exploring PrimeKG to find potential drugs for targets listed through multi-omics data integration.
- [x] List of potential drugs for combination with DAC to target genes listed through multi-omics data integration.
- [x] Evaluation of _Clinical Phase_ for listed drugs.
- [x] Use CanDI and TDC to extract info about `Drug-cell_line` and `Drug-Drug` interactions in a cancer context.
- [ ] Rank drugs for potential experimental validations.

__Contributions:__
- Expanding TDC data loader for PrimeKG
  - https://github.com/mims-harvard/PrimeKG#dataloader-therapeutics-data-commons
  - https://github.com/mims-harvard/PrimeKG/pull/12#issuecomment-1741878955


__Related works/links:__

- https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_103.1_Datasets_Small_Molecules.ipynb

- https://github.com/AstraZeneca/skywalkR-graph-features

> KR4SL: knowledge graph reasoning for explainable prediction of synthetic lethality 
> - https://doi.org/10.1093/bioinformatics/btad261


<!-- - https://tdcommons.ai/multi_pred_tasks/ppi/ -->

___

### Load ...

In [4]:
import numpy as np 
import pandas as pd
import anndata as ad
import screenpro as scp

from screenpro.load import loadScreenProcessingData, read_screen_pkl

In [5]:
import matplotlib.pyplot as plt

from matplotlib import font_manager as fm
from matplotlib import rcParams

font_files = fm.findSystemFonts(fontpaths=None, fontext='ttf')

for font_file in font_files:
    fm.fontManager.addfont(font_file)

# {f.name for f in matplotlib.font_manager.fontManager.ttflist}

rcParams['font.family'] = ['Arial']

In [44]:
import sys

sys.path.append('/data_gilbert2/backups/aarab/CanDI')

from CanDI import candi as can

In [154]:
import cancer_data as candata

___

In [6]:
import igraph as ig

### Drug KG

In [7]:
!mkdir -p datasets

In [8]:
import pandas as pd

from tdc.multi_pred import DrugRes
from tdc.resource import PrimeKG

In [10]:
from tdc.utils.knowledge_graph import KnowledgeGraph

In [11]:
# Drug Response Prediction Task Overview
# Y is the log-normalized IC50. GDSC2 is version 2 of GDSC, which uses improved experimental procedures; GDSC1 is the earlier release.

# https://tdcommons.ai/multi_pred_tasks/drugres/

In [13]:
GDSC1 = DrugRes(name = 'GDSC1', path = './datasets/GDSC1')
GDSC2 = DrugRes(name = 'GDSC2', path = './datasets/GDSC2')

primekg = PrimeKG(path = './datasets/PrimeKG')

Found local copy...
Loading...
Done!
Found local copy...
Loading...
Done!
Found local copy...
Loading...


### DAC + X Drug

In [17]:
primekg.KG.copy()

<tdc.utils.knowledge_graph.KnowledgeGraph at 0x7fa94d954d60>

In [227]:
primekg_drug_target = primekg.KG.copy()

primekg_drug_target.run_query(query='relation == "drug_protein" & display_relation == "target"')

In [237]:
drugs = primekg.KG.get_nodes_by_source('DrugBank')

In [238]:
drugs

Unnamed: 0,id,type,name,source
0,DB09130,drug,Copper,DrugBank
1,DB09140,drug,Oxygen,DrugBank
2,DB00180,drug,Flunisolide,DrugBank
3,DB00240,drug,Alclometasone,DrugBank
4,DB00253,drug,Medrysone,DrugBank
...,...,...,...,...
7952,DB01486,drug,Cathine,DrugBank
7953,DB11104,drug,Sulfur hexafluoride,DrugBank
7954,DB00639,drug,Butoconazole,DrugBank
7955,DB00538,drug,Gadoversetamide,DrugBank


In [230]:
primekg_dac_synergy = primekg.KG.copy()

primekg_dac_synergy.run_query('(x_name == "Decitabine" | y_name == "Decitabine")&(display_relation == "synergistic interaction")')

In [231]:
primekg_dac_synergy_drug_names = primekg_dac_synergy.get_nodes_by_source(source='DrugBank').name.to_list()
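The query above keeps edges where either endpoint is Decitabine and the displayed relation is a synergistic interaction. The following is a minimal sketch of the same filter on a toy edge table. It assumes `run_query` behaves like `pandas.DataFrame.query` on the edge table; the rows and the names `DrugA`, `DrugB`, `DrugC`, and `GeneX` are invented placeholders, not PrimeKG content.

```python
import pandas as pd

# Toy edge table with the column names used in the PrimeKG queries above.
# All rows are hypothetical placeholders for illustration only.
edges = pd.DataFrame({
    "x_name": ["Decitabine", "DrugA", "DrugB", "Decitabine"],
    "y_name": ["DrugA", "Decitabine", "DrugC", "GeneX"],
    "display_relation": [
        "synergistic interaction",
        "synergistic interaction",
        "synergistic interaction",
        "target",
    ],
})

query = (
    '(x_name == "Decitabine" | y_name == "Decitabine")'
    ' & (display_relation == "synergistic interaction")'
)
dac_edges = edges.query(query)

# Collect partner names from both endpoints, leaving out Decitabine itself
partners = sorted((set(dac_edges.x_name) | set(dac_edges.y_name)) - {"Decitabine"})
print(partners)
```

Subtracting Decitabine from the set is a choice made in this sketch only; a node list taken from the filtered subgraph would typically still contain Decitabine itself.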

___

see Figure 4B – https://biorxiv.org/content/10.1101/2022.12.14.518457v2

In [232]:
target_genes = [
    "PMPCA","RNF126","SLC7A6","DHODH","ZNF777","SQLE","MYBBP1A",
    "RBM14-RBM4","INTS5","INO80D",
    'BCL2'
] 
# + ['DNMT1']

In [233]:
target_genes

['PMPCA',
 'RNF126',
 'SLC7A6',
 'DHODH',
 'ZNF777',
 'SQLE',
 'MYBBP1A',
 'RBM14-RBM4',
 'INTS5',
 'INO80D',
 'BCL2']

In [28]:
primekg_drugs_for_combo = primekg_drug_target.copy()
primekg_drugs_for_combo.run_query(f'x_name in {target_genes} | y_name in {target_genes}')
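The f-string above interpolates the Python list into the query text, so the query receives a list literal. A small sketch on a toy table shows the string that is built and compares it with pandas' `@name` reference syntax. The rows are placeholders, and the sketch assumes `run_query` forwards its string to `DataFrame.query`.

```python
import pandas as pd

# Hypothetical edges; only the column names follow the notebook
edges = pd.DataFrame({
    "x_name": ["DrugA", "DrugB", "DrugC"],
    "y_name": ["DHODH", "SQLE", "GeneX"],
})
genes = ["DHODH", "SQLE", "BCL2"]

# The list repr becomes part of the query string itself
query = f"x_name in {genes} | y_name in {genes}"
print(query)

by_fstring = edges.query(query)

# Equivalent filter that references the local variable instead of embedding it
by_reference = edges.query("x_name in @genes | y_name in @genes")

print(by_fstring.equals(by_reference))
```

The `@` form resolves names in the frame that calls `DataFrame.query`, so it may not work through a wrapper such as `run_query`; the f-string form avoids that at the cost of embedding the whole list in the query text.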

In [29]:
primekg_drugs_for_combo.get_nodes_by_source('NCBI')

Unnamed: 0,id,type,name,source
0,596,gene/protein,BCL2,NCBI
1,1723,gene/protein,DHODH,NCBI
2,6713,gene/protein,SQLE,NCBI


In [30]:
primekg_dac_synergy_drugs_for_combo = primekg_drugs_for_combo.copy()
primekg_dac_synergy_drugs_for_combo.run_query(f'x_name in {primekg_dac_synergy_drug_names} | y_name in {primekg_dac_synergy_drug_names}')

In [170]:
primekg_dac_synergy_drugs_for_combo_list = primekg_dac_synergy_drugs_for_combo.get_nodes_by_source(
    source='DrugBank'
).name.to_list()

### Prep a table for the paper

In [32]:
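# .copy() avoids pandas' SettingWithCopyWarning when a column is added to the filtered frame
table_0 = primekg_drugs_for_combo.df.query('x_type=="drug"').copy()
# alternative layout: .set_index(['y_name','x_id'])[['x_name']]

# Flag candidate drugs that also have a synergistic interaction with DAC in PrimeKG
table_0['dac_synergy'] = table_0.x_name.isin(primekg_dac_synergy_drugs_for_combo_list)
table_0.sort_values(['y_name','dac_synergy'],ascending=False,inplace=True)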

In [33]:
table_1 = table_0[['y_name','x_id','x_name','dac_synergy']].rename(columns={'y_name':'target','x_id':'DrugBank','x_name':'Drug full name'}).set_index(['target','DrugBank'])

In [34]:
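drug_targets = {}

# Look up every protein target of each candidate drug in the drug-target subgraph
for drug in table_1.index.get_level_values('DrugBank').unique():
    drug_kg = primekg_drug_target.copy()
    drug_kg.run_query(f'x_id == "{drug}" | y_id == "{drug}"')
    
    drug_targets[drug] = ','.join(drug_kg.get_nodes_by_source('NCBI').name.to_list())

# Map by DrugBank ID so rows stay aligned even if a drug appears under more than one target
table_1['drug_targets'] = table_1.index.get_level_values('DrugBank').map(drug_targets)

del drug_targets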

In [35]:
table_1

target,DrugBank,Drug full name,dac_synergy,drug_targets
SQLE,DB00735,Naftifine,False,SQLE
SQLE,DB00857,Terbinafine,False,SQLE
SQLE,DB01091,Butenafine,False,SQLE
SQLE,DB08846,Ellagic acid,False,"CA1,CA2,CA4,SQLE,PRKCB,PRKCA,CSNK2A1,SYK,CA12,..."
DHODH,DB01097,Leflunomide,True,"DHODH,PTK2B,AHR"
DHODH,DB03523,Brequinar,True,DHODH
DHODH,DB08880,Teriflunomide,True,DHODH
DHODH,DB01117,Atovaquone,False,DHODH
DHODH,DB02262,Orotic acid,False,DHODH
DHODH,DB02613,Capric dimethyl amine oxide,False,"DHODH,PNLIPRP2"


## Drug -> AML
Finding links between drugs and AML phenotypes ...

In [443]:
dac_id = drugs[drugs.name.eq('Decitabine')].id.to_list()

dhodh_ids = table_1.reset_index().query('target == "DHODH"').DrugBank.to_list() # & drug_targets == "DHODH"
dhodh_names = drugs.set_index('id').loc[dhodh_ids]['name'].to_list()

bcl2_ids = table_1.reset_index().query('target == "BCL2" & drug_targets == "BCL2"').DrugBank.to_list()

In [444]:
table_1.reset_index().query('target == "DHODH" & drug_targets == "DHODH"')

Unnamed: 0,target,DrugBank,Drug full name,dac_synergy,drug_targets
5,DHODH,DB03523,Brequinar,True,DHODH
6,DHODH,DB08880,Teriflunomide,True,DHODH
7,DHODH,DB01117,Atovaquone,False,DHODH
8,DHODH,DB02262,Orotic acid,False,DHODH
11,DHODH,DB03480,Brequinar Analog,False,DHODH
13,DHODH,DB04281,"2-[4-(4-Chlorophenyl)Cyclohexylidene]-3,4-Dihy...",False,DHODH
14,DHODH,DB04583,"5-Carbamoyl-1,1':4',1''-terphenyl-3-carboxylic...",False,DHODH
15,DHODH,DB05125,SC12267,False,DHODH
16,DHODH,DB06481,Manitimus,False,DHODH
17,DHODH,DB07443,(2Z)-N-biphenyl-4-yl-2-cyano-3-hydroxybut-2-en...,False,DHODH


### CanDI
Cell line query

In [445]:
lu = can.Cancer("Leukemia", subtype='AML')

# Number of leukemia (AML subtype) cell lines
print(len(lu.depmap_ids))

54


In [446]:
# '","'.join(table_1['Drug full name'])

In [447]:
# cell_lines = ['HL-60','MOLM-13']

### GDSC ...
Drug-cell line

In [485]:
def create_dict_from_tuples(tuples):
    result_dict = {}
    for key, value in tuples:
        result_dict.setdefault(key, []).append(value)
    return result_dict

def search_gdsc_for_given_drug(q_drugs, cell_lines):
    """List the given cell lines that have GDSC1/GDSC2 response data for the query drugs.

    One drug: a comma-separated string. Otherwise: a list of (drug, cell lines) tuples.
    """
    GDSC = pd.concat([GDSC1.get_data(),GDSC2.get_data()])
    
    # Keep only the query cell lines that are present in GDSC
    cell_lines_ol = list(set(cell_lines) & set(GDSC['Cell Line_ID']) )

    drug_cell = GDSC.query(
        f"Drug_ID in {q_drugs} & `Cell Line_ID` in {cell_lines_ol}"
    ).sort_values(['Drug_ID','Cell Line_ID'],ascending=False).set_index(['Drug_ID','Cell Line_ID'])[['Y']]

    if len(q_drugs) == 1:
        out = ', '.join([val for _,val in drug_cell.index.to_list()])
    else:
        # Also covers an empty drug list, so `out` is always defined
        out = [(k,', '.join(val)) for k,val in create_dict_from_tuples(drug_cell.index.to_list()).items()]
    
    return out
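The multi-drug branch relies on `create_dict_from_tuples` to group `(drug, cell line)` index tuples by drug. A quick check of that grouping on placeholder tuples (the names `DrugA`, `DrugB`, and `Line1` to `Line3` are hypothetical):

```python
def create_dict_from_tuples(tuples):
    # Same helper as in the cell above, repeated so the snippet runs on its own
    result_dict = {}
    for key, value in tuples:
        result_dict.setdefault(key, []).append(value)
    return result_dict

# Placeholder (drug, cell line) pairs, shaped like drug_cell.index.to_list()
pairs = [("DrugA", "Line2"), ("DrugA", "Line1"), ("DrugB", "Line3")]

grouped = create_dict_from_tuples(pairs)

# Same formatting as the multi-drug branch: one (drug, "line, line, ...") tuple per drug
summary = [(drug, ", ".join(lines)) for drug, lines in grouped.items()]
print(summary)
```

Values keep the order in which they appear in the input, so the descending sort applied before indexing carries over to the joined strings.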

___

In [490]:
table_2 = table_1.copy()

table_2['AML cell lines in GDSC'] = [search_gdsc_for_given_drug([drug],lu.names) for drug in table_2['Drug full name'].to_list()]

In [491]:
table_2

target,DrugBank,Drug full name,dac_synergy,drug_targets,AML cell lines in GDSC
SQLE,DB00735,Naftifine,False,SQLE,
SQLE,DB00857,Terbinafine,False,SQLE,
SQLE,DB01091,Butenafine,False,SQLE,
SQLE,DB08846,Ellagic acid,False,"CA1,CA2,CA4,SQLE,PRKCB,PRKCA,CSNK2A1,SYK,CA12,...",
DHODH,DB01097,Leflunomide,True,"DHODH,PTK2B,AHR","TUR, SKM-1, OCI-M1, OCI-AML5, OCI-AML3, OCI-AM..."
DHODH,DB03523,Brequinar,True,DHODH,
DHODH,DB08880,Teriflunomide,True,DHODH,
DHODH,DB01117,Atovaquone,False,DHODH,
DHODH,DB02262,Orotic acid,False,DHODH,
DHODH,DB02613,Capric dimethyl amine oxide,False,"DHODH,PNLIPRP2",


In [492]:
table_2.to_excel('DAC_combo_candidates.xlsx')

___
#### DHODH
In GDSC1 / GDSC2, the only DHODH-targeting candidate with data is **Leflunomide**, but this drug is not specific to _DHODH_:

In [496]:
table_2.reset_index().query('`Drug full name` == "Leflunomide"').T

Unnamed: 0,4
target,DHODH
DrugBank,DB01097
Drug full name,Leflunomide
dac_synergy,True
drug_targets,"DHODH,PTK2B,AHR"
AML cell lines in GDSC,"TUR, SKM-1, OCI-M1, OCI-AML5, OCI-AML3, OCI-AM..."
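Target specificity can be read off the `drug_targets` column by counting the comma-separated entries. The sketch below does this for three rows copied from the table above; the `n_targets` column and the single-target cutoff are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

# Three rows taken from the candidate table above
candidates = pd.DataFrame({
    "Drug full name": ["Leflunomide", "Brequinar", "Teriflunomide"],
    "drug_targets": ["DHODH,PTK2B,AHR", "DHODH", "DHODH"],
})

# Count the targets listed for each drug (hypothetical helper column)
candidates["n_targets"] = candidates["drug_targets"].str.split(",").str.len()

# Drugs whose only listed target is a single protein
single_target = candidates.loc[candidates["n_targets"] == 1, "Drug full name"].tolist()
print(single_target)
```

A count of one only means PrimeKG lists a single target for the drug; it does not by itself establish pharmacological selectivity.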


In [222]:
# from tdc.multi_pred import DrugSyn
# OncoPolyPharmacology = DrugSyn(name = 'OncoPolyPharmacology',path='datasets')

### Efficacy and Safety

> After a compound is found to have high affinity to the target disease, it needs to have numerous drug-likeness properties for it to be delivered safely and efficaciously to the human body, that is, good ADME (Absorption, Distribution, Metabolism, and Excretion) properties. ADME datasets are scattered around the internet, and there are several great resources on ADME prediction web services, but there is a limited set of organized data for machine learning scientists to build models upon and improve model performance. In the first TDC release, we collect 21 ADME datasets from various public sources such as eDrug3D, AqSolDB, MoleculeNet, and the supplementary materials of various papers. You can find all the datasets by typing:

In [286]:
from tdc import utils
utils.retrieve_dataset_names('ADME')

['lipophilicity_astrazeneca',
 'solubility_aqsoldb',
 'hydrationfreeenergy_freesolv',
 'caco2_wang',
 'pampa_ncats',
 'approved_pampa_ncats',
 'hia_hou',
 'pgp_broccatelli',
 'bioavailability_ma',
 'vdss_lombardo',
 'cyp2c19_veith',
 'cyp2d6_veith',
 'cyp3a4_veith',
 'cyp1a2_veith',
 'cyp2c9_veith',
 'cyp2c9_substrate_carbonmangels',
 'cyp2d6_substrate_carbonmangels',
 'cyp3a4_substrate_carbonmangels',
 'bbb_martins',
 'ppbr_az',
 'half_life_obach',
 'clearance_hepatocyte_az',
 'clearance_microsome_az']

In [203]:
# from tdc.single_pred import ADME

> In addition to individual efficacy and safety, drugs can clash with each other and cause adverse effects, i.e. drug-drug interactions (DDIs). This becomes more and more important as more people take combinations of drugs for various diseases, and it is impossible to screen all combinations in the wet lab, especially higher-order ones. In TDC, we include the DrugBank and TWOSIDES datasets for DDI. For DrugBank, instead of the standard binary dataset, we use the full multi-typed DrugBank, where there are more than 80 DDI types:



In [497]:
from tdc.utils import get_label_map
from tdc.multi_pred import DDI

DrugBank = DDI(name = 'DrugBank',path='datasets')
label_map = get_label_map(name = 'DrugBank', task = 'DDI', path='datasets')

Found local copy...
Loading...
Done!


In [525]:
# tmp = DrugBank.get_data()[
#     (DrugBank.get_data().Drug1_ID.isin(table_2.reset_index().DrugBank.to_list())) &
#     (DrugBank.get_data().Drug2_ID.isin(table_2.reset_index().DrugBank.to_list()))
# ]

# tmp['label'] = [label_map[i] for i in tmp['Y']]
# tmp.set_index(['Drug1_ID','Drug2_ID'])['label']

In [526]:
DrugBank.get_data()[
    (DrugBank.get_data().Drug1_ID.isin(dac_id)) &
    (DrugBank.get_data().Drug2_ID.isin(table_2.reset_index().DrugBank.to_list()))
]

Unnamed: 0,Drug1_ID,Drug1,Drug2_ID,Drug2,Y


DAC has no drug-drug interactions (DDIs) reported with our query drugs in this dataset (checked with DAC as `Drug1_ID`).
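The lookup above only matches rows where DAC is `Drug1_ID`. Whether the TDC DrugBank table lists each pair in one or both orders is not verified here, so a check in both orientations is a cheap safeguard. This is a minimal sketch on a toy table: the rows, the `Y` labels, and the IDs `DB_DAC`, `DB_X`, `DB_Y`, and `DB_Z` are hypothetical placeholders, and only the column names follow the output above.

```python
import pandas as pd

# Toy DDI table with the column names shown in the output above (rows are invented)
ddi = pd.DataFrame({
    "Drug1_ID": ["DB_X", "DB01097", "DB_Y"],
    "Drug2_ID": ["DB01097", "DB_DAC", "DB_Z"],
    "Y": [1, 2, 3],
})

dac_ids = ["DB_DAC"]  # placeholder for the Decitabine ID list (dac_id in the notebook)
candidate_ids = ["DB01097", "DB03523", "DB08880"]

# DAC listed first, candidate second (the orientation used in the cell above)
forward = ddi.Drug1_ID.isin(dac_ids) & ddi.Drug2_ID.isin(candidate_ids)
# Candidate listed first, DAC second
reverse = ddi.Drug2_ID.isin(dac_ids) & ddi.Drug1_ID.isin(candidate_ids)

hits = ddi[forward | reverse]
print(hits)
```

In this toy table the forward mask alone finds nothing, while the combined mask returns the one row where the pair is stored in the opposite order.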


In [528]:
from watermark import watermark
print(
    watermark()
)
print('_'*80)
print(
    watermark(iversions=True, globals_=globals())
)

Last updated: 2023-12-10T03:40:18.309613-08:00

Python implementation: CPython
Python version       : 3.9.16
IPython version      : 8.14.0

Compiler    : GCC 11.3.0
OS          : Linux
Release     : 3.10.0-957.27.2.el7.x86_64
Machine     : x86_64
Processor   : x86_64
CPU cores   : 64
Architecture: 64bit

________________________________________________________________________________
sys        : 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:39:03) 
[GCC 11.3.0]
screenpro  : 0.2.5
seaborn    : 0.12.2
cancer_data: 0.3.5
igraph     : 0.10.4
pandas     : 1.5.3
anndata    : 0.9.1
matplotlib : 3.7.2
numpy      : 1.24.4

