### [100920] **Updated Parameter Values**

# Mechanisms of Action Predictions 

According to the [National Cancer Institute](https://www.cancer.gov/publications/dictionaries/cancer-terms/def/mechanism-of-action), **Mechanism of Action (MoA)** is a term used to describe how a drug or other substance produces an effect in the body. For example, a drug’s mechanism of action could be how it affects a specific target in a cell, such as an enzyme, or a cell function, such as cell growth. Knowing the mechanism of action of a drug may help provide information about the safety of the drug and how it affects the body. It may also help identify the right dose of a drug and which patients are most likely to respond to treatment. It is also called MOA.


In this notebook, we'll predict multiple **Mechanism of Action (MoA)** targets for each sample. Samples are drugs profiled at different time points and doses. The dataset consists of several groups of features, and there are more than two hundred targets covering enzymes and receptors.

We'll use an XGBoost regressor (`XGBRegressor`) with specific parameter values, fitting one model per target column to make our predictions.

## Libraries

In [None]:
'''Libraries'''

import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, median_absolute_error, mean_squared_error
from xgboost import XGBRegressor

import warnings
warnings.filterwarnings('ignore')

## Data

In [None]:
'''Data'''

#Sample
sample = pd.read_csv("../input/lish-moa/sample_submission.csv")

#Test
test_features = pd.read_csv("../input/lish-moa/test_features.csv",index_col='sig_id')

#Train
train_features = pd.read_csv("../input/lish-moa/train_features.csv",index_col='sig_id')
train_nonscore = pd.read_csv("../input/lish-moa/train_targets_nonscored.csv",index_col='sig_id')
train_score = pd.read_csv("../input/lish-moa/train_targets_scored.csv",index_col='sig_id')

## Features

* sig_id is the unique sample id
* Features with g- prefix are gene expression features and there are 772 of them (from g-0 to g-771)
* Features with c- prefix are cell viability features and there are 100 of them (from c-0 to c-99)
* cp_type is a binary categorical feature which indicates whether the sample was treated with a compound (trt_cp) or with a control perturbation (ctl_vehicle)
* cp_time is a categorical feature which indicates the treatment duration (24, 48 or 72 hours)
* cp_dose is a binary categorical feature which indicates whether the dose is low or high (D1 or D2)

In [None]:
g_features = [feature for feature in train_features.columns if feature.startswith('g-')]
c_features = [feature for feature in train_features.columns if feature.startswith('c-')]
other_features = [feature for feature in train_features.columns if feature not in g_features and feature not in c_features]
                                                            

print(f'Number of g- Features: {len(g_features)}')
print(f'Number of c- Features: {len(c_features)}')
print(f'Number of Other Features: {len(other_features)} ({other_features})')
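The categorical features listed above have to be encoded before a tree model can use them. As a minimal sketch on a made-up three-row frame (the values and the single `g-0` column are illustrative, not taken from the dataset), note that `pd.get_dummies` by default expands only object, string, and category columns. Assuming `cp_time` is read as an integer column, it would pass through unchanged as a plain number:

```python
import pandas as pd

# Toy frame with hypothetical values; the real data has 772 g- and 100 c- columns.
toy = pd.DataFrame(
    {
        "cp_type": ["trt_cp", "ctl_vehicle", "trt_cp"],
        "cp_time": [24, 48, 72],  # assumed to be integer-typed
        "cp_dose": ["D1", "D2", "D1"],
        "g-0": [0.5, -1.2, 0.3],
    }
)

# String columns become indicator columns; numeric columns are left alone.
encoded = pd.get_dummies(toy)
encoded_columns = sorted(encoded.columns)
print(encoded_columns)
```

If `cp_time` should be treated as a category rather than a number, one option is to cast it with `astype(str)` before encoding; whether that helps here is untested.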

## Model Building & Training

In [None]:
cols = train_score.columns
submission = pd.DataFrame({'sig_id': test_features.index})
total_loss = 0

SEED = 42

In [None]:
'''Build Model & Training'''

for c, column in enumerate(cols,1):
    
    y = train_score[column]
    
    # Split
    X_train_full, X_valid_full, y_train, y_valid = train_test_split(train_features, y, train_size=0.9, test_size=0.1, random_state=SEED)
    X_train = X_train_full.copy()
    X_valid = X_valid_full.copy()
    X_test = test_features.copy()

    # One-hot encode the data (to shorten the code, we use pandas)
    X_train = pd.get_dummies(X_train)
    X_test = pd.get_dummies(X_test)
    X_valid = pd.get_dummies(X_valid)
    
    X_train, X_test = X_train.align(X_test, join='left', axis=1)
    X_train, X_valid = X_train.align(X_valid, join='left', axis=1)
    
    
    # Define Regressor Model
    model = XGBRegressor(
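                         # Assumes an XGBoost version that accepts 'gpu_hist';
                         # XGBoost 2.0+ prefers tree_method='hist' with device='cuda'.
                         tree_method = 'gpu_hist',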
                         min_child_weight = 31.580,
                         learning_rate = 0.055,
                         colsample_bytree = 0.655,
                         gamma = 3.705,
                         max_delta_step = 2.080,
                         max_depth = 25,
                         n_estimators = 170,
                         #subsample =  0.864, 
                         subsample =  0.910,
                         booster='dart',
                         validate_parameters = True,
                         grow_policy = 'depthwise',
                         predictor = 'gpu_predictor'
                              
                        )
                        
    # Train Model
    model.fit(X_train, y_train)
    pred = model.predict(X_valid)
    
    # Loss
    mae = mean_absolute_error(y_valid,pred)
    mdae = median_absolute_error(y_valid,pred)
    mse = mean_squared_error(y_valid,pred)
    
    total_loss += mae
    
    # Prediction
    predictions = model.predict(X_test)
    submission[column] = predictions
    
    print(f"Regressing through col-{c}, Mean Abs Error: {mae}, Median Abs Error: {mdae}, Mean Sqrd Error: {mse}")
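The two `align` calls in the loop above keep the validation and test frames on the training frame's columns. A minimal sketch on made-up frames (the column names here are hypothetical) shows what `join='left'` does: it keeps the left frame's columns, fills any that are missing on the right with NaN, and drops columns that exist only on the right:

```python
import pandas as pd

# Hypothetical encoded frames: `other` lacks "b_y" and has an unseen "extra".
train = pd.DataFrame({"a": [1, 2], "b_x": [1, 0], "b_y": [0, 1]})
other = pd.DataFrame({"a": [3], "b_x": [1], "extra": [9]})

# Align on columns (axis=1), using the left frame's columns as the reference.
train_aligned, other_aligned = train.align(other, join="left", axis=1)

print(list(other_aligned.columns))
print(other_aligned["b_y"].isna().all())
```

The NaN fill is usually acceptable for XGBoost, which handles missing values natively, but it is worth checking how many columns end up filled this way.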




In [None]:
print("Loss: ", total_loss / len(cols))
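The loss printed above is the average of the per-target mean absolute errors. As a rough sketch of what the three validation metrics measure, here they are computed by hand with NumPy on made-up labels and predictions (all numbers are illustrative, and the second per-target value is a placeholder):

```python
import numpy as np

# Made-up labels and predictions for one hypothetical target column.
y_true = np.array([0.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.0, 0.7, 0.2])

abs_err = np.abs(y_true - y_pred)
mae = abs_err.mean()                    # mean absolute error
mdae = np.median(abs_err)               # median absolute error
mse = ((y_true - y_pred) ** 2).mean()   # mean squared error

# Average the per-target MAE values, as the running total in the loop does.
per_target_mae = [mae, 0.05]  # 0.05 stands in for a second target's MAE
mean_loss = sum(per_target_mae) / len(per_target_mae)
print(mae, mdae, mse, mean_loss)
```

Because most targets are zero for most samples, the median absolute error can look very small even when the rare positive labels are predicted poorly, so the mean-based metrics are likely the more informative ones here.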

In [None]:
# Saving the submission
submission.to_csv('submission.csv', index=False)