## Drug Discovery -- Mechanism of Action

### Gene Expression

Gene expression is the process by which the information in a gene is used to synthesize a functional gene product, most often a protein. These proteins ultimately shape a person's phenotype, the set of observable traits that arise from their genotype. A suitably designed drug can inhibit or activate specific transcription pathways, and by manipulating these pathways we can alter the chemistry of the body to, for example, fight cancer or treat hypertension.

Recording and cataloging gene expression data is especially important in pharmaceutical development because many medications act by modulating a transcriptional pathway. Across repeated trials, trends in expression can indicate whether a compound is safe, first in vitro and then in humans.

### Cell Viability 

Cell viability is a measure of the number of live, healthy cells in a given sample. Viability assays quantify factors such as metabolic activity, ATP content, and cell proliferation, as well as toxicity and markers of cell death. When an investigational compound is introduced in vitro, the ability to quantify how it enhances or inhibits cellular processes is extremely important, because these metrics are used to estimate how effective or harmful the compound will be in the human body. How well a compound is absorbed can be of particular concern to clinical researchers: a compound that cannot be metabolized may cause blood toxicity downstream, whereas proper absorption may favor the proliferation of healthy cells over harmful ones.

For example, PD-L1 checkpoint inhibitors are a class of drugs meant to interrupt the binding of PD-L1 to the PD-1 receptor. Cancer cells express the PD-L1 protein and use it to bind the PD-1 receptor on immune cells, which helps them avoid being detected as a threat. PD-L1 inhibitors block this binding, leaving the cancer cells open to eradication by the immune system.

In [8]:
import pandas as pd 

In [9]:
test_features = pd.read_csv('test_features.csv')
train_features = pd.read_csv('train_features.csv')
tt_nonscored = pd.read_csv('train_targets_nonscored.csv')
tt_scored = pd.read_csv('train_targets_scored.csv')

In [10]:
train_features.head()

Unnamed: 0,sig_id,cp_type,cp_time,cp_dose,g-0,g-1,g-2,g-3,g-4,g-5,...,c-90,c-91,c-92,c-93,c-94,c-95,c-96,c-97,c-98,c-99
0,id_000644bb2,trt_cp,24,D1,1.062,0.5577,-0.2479,-0.6208,-0.1944,-1.012,...,0.2862,0.2584,0.8076,0.5523,-0.1912,0.6584,-0.3981,0.2139,0.3801,0.4176
1,id_000779bfc,trt_cp,72,D1,0.0743,0.4087,0.2991,0.0604,1.019,0.5207,...,-0.4265,0.7543,0.4708,0.023,0.2957,0.4899,0.1522,0.1241,0.6077,0.7371
2,id_000a6266a,trt_cp,48,D1,0.628,0.5817,1.554,-0.0764,-0.0323,1.239,...,-0.725,-0.6297,0.6103,0.0223,-1.324,-0.3174,-0.6417,-0.2187,-1.408,0.6931
3,id_0015fd391,trt_cp,48,D1,-0.5138,-0.2491,-0.2656,0.5288,4.062,-0.8095,...,-2.099,-0.6441,-5.63,-1.378,-0.8632,-1.288,-1.621,-0.8784,-0.3876,-0.8154
4,id_001626bd3,trt_cp,72,D2,-0.3254,-0.4009,0.97,0.6919,1.418,-0.8244,...,0.0042,0.0048,0.667,1.069,0.5523,-0.3031,0.1094,0.2885,-0.3786,0.7125
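
The preview shows two families of numeric columns, `g-*` and `c-*`. Assuming these hold the gene expression and cell viability measurements described above (the notebook does not state this explicitly), one way to separate them is by column-name prefix. The sketch below runs on a small made-up frame named `toy` rather than the real files; the values and the `summary` frame are illustrative only.

```python
import pandas as pd

# Toy frame that mimics the column naming in the preview; all values are made up.
toy = pd.DataFrame({
    "sig_id": ["id_a", "id_b"],
    "cp_type": ["trt_cp", "ctl_vehicle"],
    "cp_time": [24, 72],
    "cp_dose": ["D1", "D2"],
    "g-0": [1.0, -0.5],
    "g-1": [0.5, 0.25],
    "c-0": [0.25, -1.0],
    "c-1": [0.75, 0.5],
})

# Assumption: "g-" columns are gene expression features and "c-" columns are
# cell viability features. Matching on the full "c-" prefix keeps cp_type,
# cp_time and cp_dose out of the cell viability group.
gene_cols = [col for col in toy.columns if col.startswith("g-")]
cell_cols = [col for col in toy.columns if col.startswith("c-")]

# Per-sample mean of each feature family, as a quick sanity summary.
summary = pd.DataFrame({
    "gene_mean": toy[gene_cols].mean(axis=1),
    "cell_mean": toy[cell_cols].mean(axis=1),
})
print(summary)
```

On the real data the same two list comprehensions could be applied to `train_features.columns`.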


In [11]:
train_features.dtypes.nunique()


3
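
`dtypes.nunique()` reports only how many distinct dtypes the frame contains, not which columns carry them. A minimal sketch of how the non-numeric columns might be located, again using a made-up `toy` frame (the names `dtype_counts` and `text_cols` are illustrative, not from the notebook):

```python
import pandas as pd

# Toy frame with one text, one integer and one float column; values are made up.
toy = pd.DataFrame({
    "cp_type": ["trt_cp", "ctl_vehicle"],
    "cp_time": [24, 72],
    "g-0": [1.062, 0.0743],
})

# Count how many columns share each dtype (dtype names rendered as strings).
dtype_counts = toy.dtypes.map(str).value_counts()

# List the columns that are not numeric; these are the ones that need encoding.
text_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(dtype_counts)
print(text_cols)
```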

In [12]:
dfs = [test_features, train_features]

In [13]:
# Drop the 'sig_id' column from each features dataframe, then encode the
# categorical columns (cp_type, cp_time, cp_dose) as integer codes.


def col_drop(df):
    # With inplace=True, drop() modifies df and returns None, so return df itself.
    df.drop(columns=['sig_id'], inplace=True)
    return df

def cleaner(df):
    # Values missing from a mapping become NaN, so run this only once per dataframe.
    df['cp_type'] = df['cp_type'].map({'ctl_vehicle': 0, 'trt_cp': 1})
    df['cp_time'] = df['cp_time'].map({24: 1, 48: 2, 72: 3})
    df['cp_dose'] = df['cp_dose'].map({'D1': 0, 'D2': 1})
    return df


for df in dfs:
    col_drop(df)
    cleaner(df)
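
`Series.map` with a dictionary returns `NaN` for any value that has no entry in the dictionary, so an unexpected category (or a second run of the cell) would silently introduce missing values. The sketch below shows one possible guard. It is not the notebook's method: `ENCODINGS` and `encode_checked` are hypothetical names, the function works on a copy instead of in place, and the `toy` frame is made up.

```python
import pandas as pd

# Hypothetical lookup table; the codes match the key used in this notebook.
ENCODINGS = {
    "cp_type": {"ctl_vehicle": 0, "trt_cp": 1},
    "cp_time": {24: 1, 48: 2, 72: 3},
    "cp_dose": {"D1": 0, "D2": 1},
}


def encode_checked(df):
    """Return an encoded copy of df; raise if any value has no mapping."""
    out = df.copy()
    for col, mapping in ENCODINGS.items():
        encoded = out[col].map(mapping)
        missing = encoded.isna()
        if missing.any():
            # Report the offending raw values instead of letting NaN through.
            unmapped = sorted(out.loc[missing, col].astype(str).unique())
            raise ValueError(f"unmapped values in {col!r}: {unmapped}")
        out[col] = encoded.astype("int64")
    return out


toy = pd.DataFrame({
    "cp_type": ["trt_cp", "ctl_vehicle"],
    "cp_time": [24, 72],
    "cp_dose": ["D1", "D2"],
})
encoded = encode_checked(toy)
print(encoded)
```

Because the function returns a copy, calling it twice on the original frame gives the same result, whereas calling it on already encoded data raises an error instead of producing `NaN`.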

In [14]:
train_features.dtypes.nunique()

2

In [15]:
train_features.head()

# Key:
# ['cp_type'] = whether data is from control or active groups
#     0 = control
#     1 = active
# ['cp_time'] = treatment duration time
#     1 = 24hrs
#     2 = 48hrs
#     3 = 72hrs
# ['cp_dose'] = treatment strength
#     0 = low
#     1 = high 

Unnamed: 0,cp_type,cp_time,cp_dose,g-0,g-1,g-2,g-3,g-4,g-5,g-6,...,c-90,c-91,c-92,c-93,c-94,c-95,c-96,c-97,c-98,c-99
0,1,1,0,1.062,0.5577,-0.2479,-0.6208,-0.1944,-1.012,-1.022,...,0.2862,0.2584,0.8076,0.5523,-0.1912,0.6584,-0.3981,0.2139,0.3801,0.4176
1,1,3,0,0.0743,0.4087,0.2991,0.0604,1.019,0.5207,0.2341,...,-0.4265,0.7543,0.4708,0.023,0.2957,0.4899,0.1522,0.1241,0.6077,0.7371
2,1,2,0,0.628,0.5817,1.554,-0.0764,-0.0323,1.239,0.1715,...,-0.725,-0.6297,0.6103,0.0223,-1.324,-0.3174,-0.6417,-0.2187,-1.408,0.6931
3,1,2,0,-0.5138,-0.2491,-0.2656,0.5288,4.062,-0.8095,-1.959,...,-2.099,-0.6441,-5.63,-1.378,-0.8632,-1.288,-1.621,-0.8784,-0.3876,-0.8154
4,1,3,1,-0.3254,-0.4009,0.97,0.6919,1.418,-0.8244,-0.28,...,0.0042,0.0048,0.667,1.069,0.5523,-0.3031,0.1094,0.2885,-0.3786,0.7125


In [16]:
test_features.head()

# Key:
# ['cp_type'] = whether data is from control or active groups
#     0 = control
#     1 = active
# ['cp_time'] = treatment duration time
#     1 = 24hrs
#     2 = 48hrs
#     3 = 72hrs
# ['cp_dose'] = treatment strength
#     0 = low
#     1 = high

Unnamed: 0,cp_type,cp_time,cp_dose,g-0,g-1,g-2,g-3,g-4,g-5,g-6,...,c-90,c-91,c-92,c-93,c-94,c-95,c-96,c-97,c-98,c-99
0,1,1,0,-0.5458,0.1306,-0.5135,0.4408,1.55,-0.1644,-0.214,...,0.0981,0.7978,-0.143,-0.2067,-0.2303,-0.1193,0.021,-0.0502,0.151,-0.775
1,1,3,0,-0.1829,0.232,1.208,-0.4522,-0.3652,-0.3319,-1.882,...,-0.119,-0.1852,-1.031,-1.367,-0.369,-0.5382,0.0359,-0.4764,-1.381,-0.73
2,0,1,0,0.1852,-0.1404,-0.3911,0.131,-1.438,0.2455,-0.339,...,-0.2261,0.337,-1.384,0.8604,-1.953,-1.014,0.8662,1.016,0.4924,-0.1942
3,1,1,1,0.4828,0.1955,0.3825,0.4244,-0.5855,-1.202,0.5998,...,0.126,0.157,-0.1784,-1.12,-0.4325,-0.9005,0.8131,-0.1305,0.5645,-0.5809
4,1,2,0,-0.3979,-1.268,1.913,0.2057,-0.5864,-0.0166,0.5128,...,0.4965,0.7578,-0.158,1.051,0.5742,1.09,-0.2962,-0.5313,0.9931,1.838
