In [9]:
import time
import pickle
import torch
import torch.nn          as nn
import numpy             as np
import pandas            as pd
import matplotlib.pyplot as plt 
import seaborn           as sns

from torch.utils.data import Dataset, DataLoader

from config import (
    PATH_TO_FEATURES,
    PATH_TO_SAVED_DRUG_FEATURES
)

torch.manual_seed(42)
sns.set_theme(style="white")

---

# Experiments on the `GraphTab` approach

In this notebook we experiment with an approach that
- replaces the cell-line branch with a GNN and
- keeps tabular input for the drug branch.

In [2]:
# Reading the cell-line gene graphs.
with open(f'{PATH_TO_FEATURES}cl_graphs_as_dict.pkl', 'rb') as f:
    cl_graphs = pickle.load(f)

# Reading the drug response matrix.
with open(f'{PATH_TO_FEATURES}drugs_sparse.pkl', 'rb') as f: 
    drug_cl = pickle.load(f)
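The f-string paths above only resolve correctly if `PATH_TO_FEATURES` ends with a path separator. A minimal sketch of a more forgiving loader is given here. The helper name `load_pickle` and the toy file are hypothetical and not part of the original notebook, and a temporary directory stands in for `PATH_TO_FEATURES`.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(directory, filename):
    """Load a pickled object from `directory / filename`.

    Hypothetical helper: joining with pathlib works whether or not
    `directory` ends with a path separator.
    """
    path = Path(directory) / filename
    with path.open("rb") as f:
        return pickle.load(f)


# Round trip with a temporary directory standing in for PATH_TO_FEATURES.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_graphs = {"22RV1": "placeholder-graph"}  # toy stand-in, not real data
    with (Path(tmp_dir) / "toy_graphs.pkl").open("wb") as f:
        pickle.dump(toy_graphs, f)

    # No trailing slash on tmp_dir is needed here.
    loaded = load_pickle(tmp_dir, "toy_graphs.pkl")

print(loaded)
```

As with any use of `pickle`, this should only be applied to files from a trusted source.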

In [8]:
print(f"Number of cell-lines/graphs: {len(cl_graphs)}")
print(cl_graphs['22RV1'])

Number of cell-lines/graphs: 983
Data(x=[4, 858], edge_index=[2, 83126])


In [4]:
print(f"Shape: {drug_cl.shape}")
drug_cl.head(10)

Shape: (310904, 4)


,CELL_LINE_NAME,DRUG_ID,DATASET,LN_IC50
190089,201T,133,GDSC1,-3.770673
198783,201T,134,GDSC1,-0.81418
207405,201T,135,GDSC1,-0.29805
216171,201T,136,GDSC1,-4.472378
224883,201T,140,GDSC1,-5.332884
233550,201T,147,GDSC1,4.680281
242370,201T,150,GDSC1,2.754322
251217,201T,151,GDSC1,1.99259
261873,201T,152,GDSC1,2.29966
271251,201T,153,GDSC1,-1.83753


The cell-line gene dataset is essentially ready to use: it "only" needs to be transformed to a PyTorch `Data` class, and the graph of each cell line will then serve as input to the GNN cell-line branch of the bi-modal model. The drug dataset, however, is only the drug response matrix and does not contain any drug features. In the following subsection we obtain the [SMILES fingerprints](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00445-4) for each unique `DRUG_ID`. These will later be used as the input to the drug branch of the bi-modal model.
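To make the pairing concrete, here is a minimal sketch of how each row of the response matrix could be matched with its two model inputs: the cell-line graph looked up by `CELL_LINE_NAME` and the drug fingerprint looked up by `DRUG_ID`. The names `ResponseSample` and `build_samples` are hypothetical, the graphs are plain placeholder strings, and the fingerprints are assumed to be already keyed by `DRUG_ID`. Only the first two toy rows reuse values from the table above; everything else is made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ResponseSample:
    """One training example: cell-line graph, drug fingerprint, target."""
    cell_graph: Any          # stand-in for the per-cell-line graph object
    drug_fp: Sequence[int]   # binary fingerprint of the drug
    ln_ic50: float           # regression target


def build_samples(rows, graphs, fingerprints):
    """Pair response rows with both inputs; skip rows missing either one."""
    samples, skipped = [], 0
    for cell_line, drug_id, ln_ic50 in rows:
        if cell_line not in graphs or drug_id not in fingerprints:
            skipped += 1
            continue
        samples.append(
            ResponseSample(graphs[cell_line], fingerprints[drug_id], ln_ic50)
        )
    return samples, skipped


# Toy data (assumption): 4-bit fingerprints and string placeholders for graphs.
toy_rows = [
    ("201T", 133, -3.770673),   # value taken from the response matrix above
    ("201T", 134, -0.81418),    # value taken from the response matrix above
    ("22RV1", 133, 1.25),       # made-up value
    ("UNKNOWN", 133, 0.0),      # made-up row with no matching graph
]
toy_graphs = {"201T": "graph-201T", "22RV1": "graph-22RV1"}
toy_fps = {133: [1, 0, 0, 1], 134: [0, 1, 0, 0]}

samples, skipped = build_samples(toy_rows, toy_graphs, toy_fps)
print(len(samples), skipped)
```

Counting skipped rows rather than silently dropping them makes it easier to notice when a cell line or drug in the response matrix has no corresponding features.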

### Transform drugs to SMILES fingerprints

In [11]:
with open(f'{PATH_TO_SAVED_DRUG_FEATURES}drug_name_fingerprints_dataframe.pkl', 'rb') as f:
    drug_name_fps = pickle.load(f)
drug_name_fps.set_index(['drug_name'], inplace=True)
print(drug_name_fps.shape)
drug_name_fps.head(5)

(367, 256)


drug_name,0,1,2,3,4,5,6,7,8,9,...,246,247,248,249,250,251,252,253,254,255
(5Z)-7-Oxozeaenol,1,0,0,1,1,0,0,0,0,0,...,0,0,0,0,0,1,1,0,1,0
5-Fluorouracil,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
A-443654,0,1,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,1
A-770041,1,1,0,0,0,1,0,0,0,0,...,0,0,0,1,0,1,0,0,0,0
A-83-01,0,0,0,1,1,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
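Before feeding the fingerprints to the drug branch, it can be worth checking that every drug has the expected number of bits and only binary entries. The sketch below uses plain Python dictionaries and made-up drug names. The helper `find_invalid_fingerprints` is hypothetical, and the default of 256 bits is taken from the shape printed above.

```python
def find_invalid_fingerprints(fingerprints, n_bits=256):
    """Return names whose fingerprint has the wrong length or non-binary entries."""
    invalid = []
    for name, bits in fingerprints.items():
        if len(bits) != n_bits or any(b not in (0, 1) for b in bits):
            invalid.append(name)
    return invalid


# Toy fingerprints (assumption): only drug_a is well formed.
toy_fps = {
    "drug_a": [0, 1] * 128,      # 256 valid bits
    "drug_b": [0, 1] * 127,      # too short (254 bits)
    "drug_c": [2] + [0] * 255,   # right length, but one non-binary entry
}

invalid = find_invalid_fingerprints(toy_fps)
print(invalid)
```

With the real table, the same check could be run on a mapping built from the fingerprint rows, for example by iterating over the DataFrame row by row.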
