### SVM

### Advantages
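* Handles both classification (SVC) and regression (SVR)
* Works for linear and nonlinear problems (nonlinear through kernels)
* Effective on complex, high-dimensional data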

### Disadvantages
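* Not well suited to large datasets
* Long training time, which grows quickly with the number of samples
* Results depend on choosing an appropriate kernel and its parameters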

### How does it work?

![alt text](images/svm.png) 
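A linear SVM classifies a point by the sign of the decision function w·x + b, and training chooses the hyperplane w·x + b = 0 that maximizes the margin 2/‖w‖ between the classes. The sketch below checks this on a tiny made-up 2-D dataset (not the notebook's data); `C=1000` is an assumption chosen so the soft-margin SVC behaves almost like a hard-margin one.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable groups (illustrative only)
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel exposes the hyperplane's weight vector as coef_
clf = SVC(kernel="linear", C=1000.0)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# decision_function(x) = w . x + b; a positive value means class 1
scores = X @ w + b
assert np.allclose(scores, clf.decision_function(X))

# Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1
margin_width = 2.0 / np.linalg.norm(w)
print("support vectors:\n", clf.support_vectors_)
print("margin width:", round(margin_width, 3))
```

Only the support vectors (the points lying on the margin) determine w and b; removing any other training point leaves the hyperplane unchanged.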


#### Objective: 
##### Build a model to predict "Drug Like" properties of a single compound.

#### Data: 
##### ADME descriptors for three compound libraries: AFRODB, Biofacquim and FDA

#### Endpoint:
##### Drug Like (Binary)
* 1 -> Drug Like
* 0 -> Not Drug Like

#### Descriptors:
##### ADME descriptors:
* Aromatic heavy atoms
* H-bond acceptors
* H-bond donors
* Heavy atoms
* Rotatable bonds
* Ali Log S
* Ali Solubility (mg/ml)

#### Method: 
##### Support Vector Machine

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
Data = pd.read_csv("data/Data_SVM.csv", sep = ",")

In [None]:
Data.head()

In [None]:
#Select numerical variables
numerical_data = Data.select_dtypes(include=np.number)

In [None]:
# Highly correlated (redundant) descriptors to drop.
# 'Unnamed: 54' is a header-less column from the CSV, not a descriptor.
corr_var = ['XLOGP3', 'iLOGP', 'log Kp (cm/s)', 'Silicos-IT LogSw',
            'Ali Solubility (mol/l)', 'Ali Solubility (mg/ml)',
            'Consensus Log P', 'ESOL Solubility (mg/ml)', 'Unnamed: 54']
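The notebook does not show how the correlated columns above were chosen. One common approach, sketched here under assumptions, is to scan the upper triangle of the absolute Pearson correlation matrix and flag every column that correlates above a cutoff with an earlier column. The helper name `highly_correlated_columns` and the 0.9 threshold are hypothetical, and the toy frame is made up.

```python
import numpy as np
import pandas as pd

def highly_correlated_columns(df, threshold=0.9):
    """Hypothetical helper: return columns whose absolute Pearson
    correlation with an earlier column exceeds `threshold`."""
    corr = df.corr().abs()
    # Mask everything except the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# Made-up descriptors: 'b' is an exact linear transform of 'a'
rng = np.random.default_rng(0)
a = rng.normal(size=100)
toy = pd.DataFrame({"a": a, "b": 2 * a + 1, "c": rng.normal(size=100)})
print(highly_correlated_columns(toy))
```

Applied to `numerical_data`, such a scan would only propose candidates; which member of each correlated pair to keep is still a domain decision.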

In [None]:
# Drop correlated variables
numerical_data = numerical_data.drop(columns=corr_var)
# Drop the target variable
numerical_data = numerical_data.drop(columns="Drug Like")

In [None]:
numerical_data.head()

In [None]:
#Save target column in a new DF
df_target = pd.DataFrame(Data['Drug Like'],columns=['Drug Like'])
df_target

## Machine Learning Model

#### SVM

In [None]:
#Train Test Split
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    numerical_data, np.ravel(df_target), test_size=0.30, random_state=101)
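If the two "Drug Like" classes are imbalanced, a plain random split can leave the test set with a different class ratio than the full data. The sketch below, on made-up labels rather than the notebook's target, shows the `stratify` argument of `train_test_split`, which keeps the class proportions equal in both splits. Whether the real target needs this is an assumption worth checking with `df_target.value_counts()`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 % class 1, 20 % class 0
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 80 + [0] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101, stratify=y_demo)

# With stratify=y_demo both splits keep the original 80/20 class ratio
print("train share of class 1:", y_tr.mean())
print("test share of class 1:", y_te.mean())
```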

In [None]:
#Import Support Vector Classifier
from sklearn.svm import SVC

In [None]:
# Create the classifier (scikit-learn defaults: RBF kernel, C=1.0)
model = SVC()

In [None]:
#Train model
model.fit(X_train,y_train)
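SVMs are sensitive to feature scale, and the ADME descriptors here are trained unscaled even though they span very different ranges (atom counts vs. solubilities). A minimal sketch, assuming standardization is wanted, wraps `StandardScaler` and `SVC` in a pipeline so the scaler is fitted on the training split only. It runs on synthetic data from `make_classification` and uses new variable names so it does not overwrite `model`, `X_train` or `y_train`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the descriptor table; one feature is inflated
# to mimic descriptors measured on very different scales
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=101)

# The scaler learns means/stds from the training split inside the pipeline
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_svm.fit(X_tr, y_tr)
print("test accuracy:", scaled_svm.score(X_te, y_te))
```

With the real data, the same pipeline would be fitted on `X_train, y_train` in place of `model`.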

### Predictions

##### Now let's predict whether a molecule has "Drug Like" properties using the trained model.

In [None]:
# Select the descriptor columns of a single compound from a given library
def test_compound(library, name):
    library_data = Data[Data["Library"] == library]
    compound = library_data[library_data["Name"] == name]
    return compound[numerical_data.columns]
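`test_compound` returns an empty DataFrame when the name or library is misspelled, and `model.predict` then fails with a less obvious error. A hypothetical variant, `lookup_compound`, filters on both columns at once and raises a clear `ValueError` instead. It takes the table and feature columns as arguments so it can be tried on a tiny made-up frame; the `feature` column and its values are placeholders.

```python
import pandas as pd

def lookup_compound(data, feature_columns, library, name):
    """Hypothetical variant of test_compound that fails loudly
    when the compound is not found."""
    rows = data[(data["Library"] == library) & (data["Name"] == name)]
    if rows.empty:
        raise ValueError(f"{name!r} not found in library {library!r}")
    return rows[list(feature_columns)]

# Tiny made-up table with the same 'Library'/'Name' layout
toy = pd.DataFrame({
    "Library": ["FDA", "FDA", "Biofacquim"],
    "Name": ["Acetaminophen", "Ambroxol", "Purgic_acid_A"],
    "feature": [0.1, 0.2, 0.3],  # placeholder values, not real descriptors
})
print(lookup_compound(toy, ["feature"], "FDA", "Ambroxol"))
```

In the notebook it would be called as `lookup_compound(Data, numerical_data.columns, "FDA", "Ambroxol")`.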

In [None]:
test1 = test_compound("FDA", "Acetaminophen")
test1

In [None]:
# Predict the class of the test compound (test1)
model.predict(test1)

In [None]:
test2 = test_compound("FDA", "Ambroxol")
model.predict(test2)

In [None]:
test3 = test_compound("Biofacquim", "Purgic_acid_A")
model.predict(test3)

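#### The kernel implicitly maps the input data into a (possibly higher-dimensional) feature space where the classes may become linearly separable. SVC uses the RBF kernel by default; here we try a linear kernel.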
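A kernel is a similarity function K(x, z) that the SVM evaluates instead of explicit feature-space coordinates. The sketch below computes the linear kernel K(x, z) = x·z and the RBF kernel K(x, z) = exp(−γ‖x − z‖²) by hand on three made-up points and checks them against `sklearn.metrics.pairwise`. The value `gamma = 0.5` is an arbitrary assumption; `SVC` defaults to `gamma="scale"`.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
gamma = 0.5  # hypothetical RBF width parameter

# Linear kernel: plain dot products, K(x, z) = x . z
K_lin = X @ X.T

# RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_rbf = np.exp(-gamma * sq_dists)

# Both agree with scikit-learn's implementations
assert np.allclose(K_lin, linear_kernel(X))
assert np.allclose(K_rbf, rbf_kernel(X, gamma=gamma))
print(K_rbf.round(3))
```

The RBF kernel equals 1 on the diagonal and decays toward 0 as points move apart, so larger `gamma` makes the decision boundary more local.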

In [None]:
#Try with a different kernel
model_2 = SVC(kernel="linear")
model_2.fit(X_train,y_train)

In [None]:
test1 = test_compound("FDA", "Acetaminophen")
model_2.predict(test1)
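Predicting one compound does not show which kernel generalizes better. A sketch of a fairer comparison uses 5-fold cross-validation for each kernel. It runs on synthetic `make_classification` data as an assumption; with the real data, `numerical_data` and `np.ravel(df_target)` would take the place of `X_demo` and `y_demo`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (not the ADME descriptors)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

results = {}
for kernel in ["linear", "rbf", "poly"]:
    # Mean accuracy over 5 train/validation folds for each kernel
    scores = cross_val_score(SVC(kernel=kernel), X_demo, y_demo, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:>6}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```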

#### Evaluate the model

In [None]:
predictions = model.predict(X_test)

In [None]:
#import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve, auc, roc_auc_score

In [None]:
# Accuracy
accuracy_score(y_test,predictions)

![alt text](images/confusion_matrix.png) 

In [None]:
#Compute confusion matrix
confusion_matrix(y_test,predictions)
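For 0/1 labels scikit-learn lays out the confusion matrix as [[TN, FP], [FN, TP]] (rows are true classes, columns are predictions). The sketch below unpacks the four counts from made-up labels and rebuilds accuracy, precision and recall from them, checking each against the library functions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Made-up labels for illustration only
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])

# For labels [0, 1] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted "Drug Like" that are correct
recall = tp / (tp + fn)     # share of true "Drug Like" that were found

assert np.isclose(accuracy, accuracy_score(y_true, y_pred))
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
```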

In [None]:
# Precision (no trailing comma, which would wrap the result in a tuple)
precision_score(y_test, predictions)

In [None]:
#f1
f1_score(y_test, predictions)
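The F1 score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN). The sketch verifies this on made-up labels with TP = 4, FP = 2, FN = 1.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels: TP = 4, FP = 2, FN = 1, TN = 1
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 1, 0, 1, 1])

p = precision_score(y_true, y_pred)  # 4 / 6
r = recall_score(y_true, y_pred)     # 4 / 5
f1_manual = 2 * p * r / (p + r)      # harmonic mean of p and r

assert np.isclose(f1_manual, f1_score(y_true, y_pred))
```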

In [None]:
# Hard 0/1 predictions give a single threshold, so this ROC has only three points
roc_curve(y_test, predictions)
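An ROC curve sweeps a threshold over continuous scores, so feeding it hard class labels collapses the curve to a single corner point between (0, 0) and (1, 1). The sketch compares the two inputs on made-up scores that stand in for `decision_function` output; `hard` mimics what `predict` returns by thresholding at 0.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and continuous scores (stand-ins for decision_function)
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([-1.2, -0.3, 0.4, 1.5, 0.2, -0.1, 0.9, -0.8])
hard = (scores > 0).astype(int)  # what predict() would return

fpr_hard, tpr_hard, _ = roc_curve(y_true, hard)
fpr_soft, tpr_soft, _ = roc_curve(y_true, scores)
print(len(fpr_hard), "ROC points from labels vs", len(fpr_soft), "from scores")
```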

In [None]:
# Signed decision scores (positive -> class 1), used as continuous ROC scores
y_score = model.decision_function(X_test)

In [None]:
#ROC curve
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
plt.figure()
lw = 2
plt.plot(fpr, tpr, color='purple', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='darkcyan', lw=lw, linestyle='--')
plt.xlim([-0.01, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC curve')
plt.legend(loc="lower right")
plt.show()

In [None]:
# AUC from the continuous scores, consistent with the area in the ROC plot
print(roc_auc_score(y_test, y_score))
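The ROC AUC equals the probability that a randomly chosen positive compound receives a higher score than a randomly chosen negative one (ties counted as half). The sketch checks this pairwise definition against `roc_auc_score` on made-up scores; here 15 of the 16 positive/negative pairs are ranked correctly.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and continuous scores
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([-1.2, -0.3, 0.4, 1.5, 0.2, -0.1, 0.9, -0.8])

pos = scores[y_true == 1]
neg = scores[y_true == 0]
# Compare every (positive, negative) pair; ties count as half a win
diff = pos[:, None] - neg[None, :]
auc_pairs = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

assert np.isclose(auc_pairs, roc_auc_score(y_true, scores))
```

This ranking view explains why AUC should be computed from `y_score` rather than from hard predictions: with 0/1 labels most pairs tie.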