# Drug Classifier using Decision Trees

In this notebook, we use a machine learning algorithm, the Decision Tree, to build a classification model from historical data about patients and their responses to different medications. We then use the trained decision tree to predict the class of an unknown patient, that is, to find a suitable drug for a new patient.

In [1]:
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

In [2]:
df = pd.read_csv('drug200.csv')
df.head()

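|   | Age | Sex | BP | Cholesterol | Na_to_K | Drug |
|---|---|---|---|---|---|---|
| 0 | 23 | F | HIGH | HIGH | 25.355 | drugY |
| 1 | 47 | M | LOW | HIGH | 13.093 | drugC |
| 2 | 47 | M | LOW | HIGH | 10.114 | drugC |
| 3 | 28 | F | NORMAL | HIGH | 7.798 | drugX |
| 4 | 61 | F | LOW | HIGH | 18.043 | drugY |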
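Before encoding anything, it can help to check which columns hold strings rather than numbers. This is a minimal sketch. It rebuilds the five rows shown above by hand so that it runs without `drug200.csv`. On the full data the same calls would be made on `df`. The name `sample` is our own.

```python
import pandas as pd

# The five rows shown by df.head() above, rebuilt by hand so this
# sketch runs without drug200.csv.
sample = pd.DataFrame({
    "Age": [23, 47, 47, 28, 61],
    "Sex": ["F", "M", "M", "F", "F"],
    "BP": ["HIGH", "LOW", "LOW", "NORMAL", "LOW"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "HIGH"],
    "Na_to_K": [25.355, 13.093, 10.114, 7.798, 18.043],
    "Drug": ["drugY", "drugC", "drugC", "drugX", "drugY"],
})

# Non-numeric (string-valued) columns are the categorical ones that
# must be converted before fitting a scikit-learn tree.
categorical = [c for c in sample.columns
               if not pd.api.types.is_numeric_dtype(sample[c])]
print(categorical)  # ['Sex', 'BP', 'Cholesterol', 'Drug']

# Class counts of the target; on the full file use df['Drug'].value_counts().
counts = sample["Drug"].value_counts()
print(counts)
```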


## Pre-Processing

### Feature Set

In [3]:
x = df[['Age','Sex','BP','Cholesterol','Na_to_K']].values
x[0:5]

array([[23, 'F', 'HIGH', 'HIGH', 25.355],
       [47, 'M', 'LOW', 'HIGH', 13.093],
       [47, 'M', 'LOW', 'HIGH', 10.113999999999999],
       [28, 'F', 'NORMAL', 'HIGH', 7.797999999999999],
       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)

In [4]:
y = df['Drug'].values
y[0:5]

array(['drugY', 'drugC', 'drugC', 'drugX', 'drugY'], dtype=object)

Some features in this dataset, such as Sex and BP, are categorical. scikit-learn's decision trees expect numerical input and cannot use string-valued categories directly. We therefore convert these features to integer codes with `LabelEncoder`.
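`LabelEncoder` sorts the labels it is fitted on. As a result, the integer codes follow alphabetical order, not the order in which the labels are passed to `fit`. The short sketch below shows this for the BP values used in the next cells. The name `bp_demo` is our own. The resulting order (HIGH < LOW < NORMAL) has no clinical meaning. A tree can still isolate each category with successive threshold splits, but the ordering itself is arbitrary.

```python
from sklearn.preprocessing import LabelEncoder

bp_demo = LabelEncoder()
bp_demo.fit(["LOW", "HIGH", "NORMAL"])

# classes_ is stored sorted, so the codes are alphabetical:
# HIGH -> 0, LOW -> 1, NORMAL -> 2.
print(bp_demo.classes_.tolist())  # ['HIGH', 'LOW', 'NORMAL']

codes = bp_demo.transform(["HIGH", "LOW", "NORMAL", "LOW"])
print(codes.tolist())  # [0, 1, 2, 1]

# inverse_transform maps integer codes back to the original strings.
decoded = bp_demo.inverse_transform([2, 0])
print(decoded.tolist())  # ['NORMAL', 'HIGH']
```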

In [5]:
from sklearn.preprocessing import LabelEncoder

In [6]:
sex = LabelEncoder()
sex.fit(['F','M'])
x[:,1] = sex.transform(x[:,1])

In [7]:
bp = LabelEncoder()
bp.fit(['LOW','HIGH','NORMAL'])
x[:,2] = bp.transform(x[:,2])

In [8]:
chol = LabelEncoder()
chol.fit(['NORMAL','HIGH'])
x[:,3] = chol.transform(x[:,3])
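An alternative to the three separate `LabelEncoder` objects is to encode all three categorical feature columns in one step with `OrdinalEncoder`. This is a swapped-in technique, not the notebook's own. The scikit-learn documentation describes `LabelEncoder` as intended for target labels `y`, which is one reason to prefer it here. The sketch passes the categories explicitly in alphabetical order, which reproduces the same codes as the cells above. It uses only the five rows from `x[0:5]`, and `x_sample` and `enc` are hypothetical names.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# First five rows of the feature matrix, copied from x[0:5] above.
x_sample = np.array([
    [23, "F", "HIGH", "HIGH", 25.355],
    [47, "M", "LOW", "HIGH", 13.093],
    [47, "M", "LOW", "HIGH", 10.114],
    [28, "F", "NORMAL", "HIGH", 7.798],
    [61, "F", "LOW", "HIGH", 18.043],
], dtype=object)

# Explicit, alphabetical category lists give the same codes as LabelEncoder:
# Sex: F=0, M=1; BP: HIGH=0, LOW=1, NORMAL=2; Cholesterol: HIGH=0, NORMAL=1.
enc = OrdinalEncoder(categories=[["F", "M"],
                                 ["HIGH", "LOW", "NORMAL"],
                                 ["HIGH", "NORMAL"]])
x_sample[:, 1:4] = enc.fit_transform(x_sample[:, 1:4])

# Every column is now numeric, so the object array can be cast to float.
x_numeric = x_sample.astype(float)
print(x_numeric)
```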

## Train/Test Split

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=3)
print('Train set:', x_train.shape, y_train.shape)
print('Test set:', x_test.shape, y_test.shape)

Train set: (140, 5) (140,)
Test set: (60, 5) (60,)
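These shapes follow from `test_size=0.3`: 30% of the 200 rows (60) go to the test set and the remaining 140 to training. The sketch below checks this on a synthetic array of the same shape. `X_demo` and `y_demo` are made-up stand-ins, not the drug data. It also shows the `stratify` parameter, which keeps class proportions the same in both splits. Whether stratification matters here depends on how balanced the drug classes are.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as x (200 rows, 5 features);
# y_demo holds 5 hypothetical classes with 40 samples each.
X_demo = np.arange(200 * 5).reshape(200, 5)
y_demo = np.repeat(np.arange(5), 40)

# test_size=0.3 of 200 rows -> 60 test rows and 140 training rows.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3)
print(X_tr.shape, X_te.shape)  # (140, 5) (60, 5)

# With stratify, each class keeps its 20% share in the test set: 12 rows per class.
_, _, _, y_te_strat = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3, stratify=y_demo)
print(np.bincount(y_te_strat))  # [12 12 12 12 12]
```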


## Modeling

In [11]:
drugTree = DecisionTreeClassifier(criterion='entropy',max_depth=4)
drugTree

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=4,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
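With `criterion='entropy'`, the tree picks, at each node, the split with the largest information gain. The gain is the entropy H = -Σ p_k log2(p_k) of the parent node minus the size-weighted entropy of its children. The sketch below computes this by hand for the five labels in `y[0:5]`, split by the Sex column of the same rows. It is a textbook illustration with helper names of our own (`entropy`, `information_gain`), not scikit-learn's internal code.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# The five labels from y[0:5], split by the Sex column of the same rows.
parent = ["drugY", "drugC", "drugC", "drugX", "drugY"]
female = ["drugY", "drugX", "drugY"]  # rows 0, 3, 4
male = ["drugC", "drugC"]             # rows 1, 2 (a pure node)

print(round(entropy(parent), 3))                           # 1.522
print(round(information_gain(parent, [female, male]), 3))  # 0.971
```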

In [12]:
drugTree.fit(x_train,y_train)

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=4,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
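After fitting, the tree can be inspected with `get_depth()`, `get_n_leaves()`, `feature_importances_` and `sklearn.tree.export_text`. All of these are available in scikit-learn 0.21 and later. To keep the sketch self-contained, it fits a hypothetical stand-in tree on only the five encoded rows from `x[0:5]`. On the notebook's data, the same calls would be made on `drugTree`.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tiny training set: the five encoded rows of x[0:5]
# (Sex: F=0, M=1; BP: HIGH=0, LOW=1, NORMAL=2; Cholesterol: HIGH=0) and y[0:5].
X_small = np.array([
    [23, 0, 0, 0, 25.355],
    [47, 1, 1, 0, 13.093],
    [47, 1, 1, 0, 10.114],
    [28, 0, 2, 0, 7.798],
    [61, 0, 1, 0, 18.043],
])
y_small = np.array(["drugY", "drugC", "drugC", "drugX", "drugY"])
feature_names = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X_small, y_small)

print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
# Impurity-based importance of each feature; the values sum to 1.
print(dict(zip(feature_names, tree.feature_importances_.round(3).tolist())))
# Text rendering of the learned if/else rules.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```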

## Prediction

In [13]:
yhat = drugTree.predict(x_test)
yhat[0:5]

array(['drugY', 'drugX', 'drugX', 'drugX', 'drugX'], dtype=object)
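To find a drug for a new patient, the raw record must go through the same encoders as the training data before `predict` is called. This sketch rests on several assumptions:

- `encode_patient` is a hypothetical helper.
- The patient's values are made up.
- `model` is a stand-in fitted on the five rows from `x[0:5]`, not the notebook's `drugTree`.

`predict_proba` returns one probability per class, in the order of `model.classes_`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Encoders fitted the same way as in the pre-processing cells above.
sex = LabelEncoder().fit(["F", "M"])
bp = LabelEncoder().fit(["LOW", "HIGH", "NORMAL"])
chol = LabelEncoder().fit(["NORMAL", "HIGH"])

# Hypothetical stand-in model, fitted on the five encoded rows of x[0:5].
X_small = np.array([
    [23, 0, 0, 0, 25.355],
    [47, 1, 1, 0, 13.093],
    [47, 1, 1, 0, 10.114],
    [28, 0, 2, 0, 7.798],
    [61, 0, 1, 0, 18.043],
])
y_small = np.array(["drugY", "drugC", "drugC", "drugX", "drugY"])
model = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                               random_state=0).fit(X_small, y_small)

def encode_patient(age, sex_label, bp_label, chol_label, na_to_k):
    """Encode one raw patient record with the encoders used for training."""
    return np.array([[age,
                      sex.transform([sex_label])[0],
                      bp.transform([bp_label])[0],
                      chol.transform([chol_label])[0],
                      na_to_k]], dtype=float)

# Hypothetical new patient (values made up for illustration).
new_patient = encode_patient(35, "M", "NORMAL", "NORMAL", 12.5)
print(model.predict(new_patient))
print(model.classes_, model.predict_proba(new_patient))
```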

## Evaluation

In [14]:
from sklearn.metrics import accuracy_score

In [15]:
print("Decision Tree accuracy:", round(accuracy_score(y_test, yhat), 3))

Decision Tree accuracy: 0.983
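Accuracy is the fraction of test samples whose predicted label matches the true one. The sketch below checks that `accuracy_score` agrees with a manual `np.mean` on two hypothetical label arrays. It also shows the arithmetic behind the score above: with 60 test samples, 0.983 corresponds to 59 correct predictions (59/60 ≈ 0.9833).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels, for illustration only.
y_true = np.array(["drugY", "drugC", "drugX", "drugC", "drugY"])
y_pred = np.array(["drugY", "drugC", "drugX", "drugX", "drugY"])

# Accuracy = (number of matching positions) / (number of samples).
manual = np.mean(y_true == y_pred)
print(manual, accuracy_score(y_true, y_pred))  # 0.8 0.8

# 59 correct out of 60 test rows rounds to the score reported above.
print(round(59 / 60, 3))  # 0.983
```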
