# **Dataset**

**Dataset:** Medical Researcher Compilation data

**Description:** Data on 200 patients, all of whom suffered from the same illness. 
During treatment, each patient responded to one of five medications: Drug A, Drug B, Drug C, Drug X, or Drug Y.

**To find:** Which drug might be appropriate for a future patient with the same illness. 

**Features:** Age, Sex, Blood Pressure (BP), Cholesterol, and sodium-to-potassium ratio (Na_to_K) of patients

**Target:** The drug that each patient responded to. 

In [None]:
!wget -O drug200.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv

--2021-06-06 12:43:53--  https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/drug200.csv
Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196
Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6027 (5.9K) [text/csv]
Saving to: ‘drug200.csv’


2021-06-06 12:43:54 (666 MB/s) - ‘drug200.csv’ saved [6027/6027]



In [None]:
import pandas as pd

my_data = pd.read_csv("drug200.csv", delimiter=",")
my_data.head()

   Age Sex      BP Cholesterol  Na_to_K   Drug
0   23   F    HIGH        HIGH   25.355  drugY
1   47   M     LOW        HIGH   13.093  drugC
2   47   M     LOW        HIGH   10.114  drugC
3   28   F  NORMAL        HIGH    7.798  drugX
4   61   F     LOW        HIGH   18.043  drugY

### **Data pre-processing**

In [None]:
X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values
X[0:5]


array([[23, 'F', 'HIGH', 'HIGH', 25.355],
       [47, 'M', 'LOW', 'HIGH', 13.093],
       [47, 'M', 'LOW', 'HIGH', 10.113999999999999],
       [28, 'F', 'NORMAL', 'HIGH', 7.797999999999999],
       [61, 'F', 'LOW', 'HIGH', 18.043]], dtype=object)

In [None]:
from sklearn import preprocessing

# Encode Sex as integers: F -> 0, M -> 1
le_sex = preprocessing.LabelEncoder()
le_sex.fit(['F','M'])
X[:,1] = le_sex.transform(X[:,1])

# Encode BP: LabelEncoder sorts classes, so HIGH -> 0, LOW -> 1, NORMAL -> 2
le_BP = preprocessing.LabelEncoder()
le_BP.fit(['LOW', 'NORMAL', 'HIGH'])
X[:,2] = le_BP.transform(X[:,2])

# Encode Cholesterol: HIGH -> 0, NORMAL -> 1
le_Chol = preprocessing.LabelEncoder()
le_Chol.fit(['NORMAL', 'HIGH'])
X[:,3] = le_Chol.transform(X[:,3])

X[0:5]

array([[23, 0, 0, 0, 25.355],
       [47, 1, 1, 0, 13.093],
       [47, 1, 1, 0, 10.113999999999999],
       [28, 0, 2, 0, 7.797999999999999],
       [61, 0, 1, 0, 18.043]], dtype=object)
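
`LabelEncoder` assigns integer codes by sorting the class labels, not by the order passed to `fit`, which is why `HIGH` maps to 0 and `NORMAL` to 2 in the output above. A minimal pure-Python sketch of that mapping follows; `label_codes` is a hypothetical helper written for illustration and is not part of scikit-learn.

```python
def label_codes(labels):
    """Map each distinct label to an integer by sorted order (hypothetical helper)."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

bp_codes = label_codes(['LOW', 'NORMAL', 'HIGH'])
print(bp_codes)  # {'HIGH': 0, 'LOW': 1, 'NORMAL': 2}
```

Because the codes follow alphabetical order, they do not reflect the natural ordering LOW < NORMAL < HIGH, and any model that treats them as numeric magnitudes will see HIGH as the smallest value.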

In [None]:
y = my_data["Drug"]
y[0:5]

0    drugY
1    drugC
2    drugC
3    drugX
4    drugY
Name: Drug, dtype: object

###  **Algorithm 1: KNN**

**Train/test split: 60-40**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.4)
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(X_train,Y_train)
predknn = knn.predict(X_test)

In [None]:
from sklearn import metrics


print("KNN's Accuracy: ", metrics.accuracy_score(Y_test, predknn))

KNN's Accuracy:  0.5


**Train/test split: 70-30**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.3)
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(X_train,Y_train)
predknn = knn.predict(X_test)

In [None]:
from sklearn import metrics
print("KNN's Accuracy: ", metrics.accuracy_score(Y_test, predknn))

KNN's Accuracy:  0.6333333333333333


**Train/test split: 80-20**

In [None]:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.2)
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(X_train,Y_train)
predknn = knn.predict(X_test)

In [None]:
from sklearn import metrics
print("KNN's Accuracy: ", metrics.accuracy_score(Y_test, predknn))

KNN's Accuracy:  0.6


**Train/test split: 90-10**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.1)
knn = KNeighborsClassifier(n_neighbors = 5)
knn.fit(X_train,Y_train)
predknn = knn.predict(X_test)

In [None]:
from sklearn import metrics
print("KNN's Accuracy: ", metrics.accuracy_score(Y_test, predknn))

KNN's Accuracy:  0.5
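
`KNeighborsClassifier` uses Euclidean distance by default, so features with larger numeric ranges contribute more to the neighbor search. The sketch below takes the first two encoded rows from the `X[0:5]` output and shows how much of the squared distance each feature accounts for. This may help explain the modest KNN scores, though that is an assumption the notebook does not test.

```python
# First two encoded rows from the X[0:5] output above.
names = ["Age", "Sex", "BP", "Cholesterol", "Na_to_K"]
row_a = [23, 0, 0, 0, 25.355]
row_b = [47, 1, 1, 0, 13.093]

# Per-feature squared differences; their sum is the squared Euclidean distance.
squared_diffs = [(a - b) ** 2 for a, b in zip(row_a, row_b)]
total = sum(squared_diffs)
shares = {name: d / total for name, d in zip(names, squared_diffs)}

for name, share in shares.items():
    print(f"{name:12s} {share:.1%} of squared distance")
```

For this pair, Age and Na_to_K account for nearly all of the distance, while the encoded categorical columns barely register. Standardizing the features before fitting is a common way to even this out.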


###  **Algorithm 2: Decision Tree**

**Train/test split: 60-40**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.4)
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
drugTree.fit(X_train,Y_train)
predTree = drugTree.predict(X_test)

In [None]:
from sklearn import metrics
print("Decision Tree's Accuracy: ", metrics.accuracy_score(Y_test, predTree))

Decision Tree's Accuracy:  0.9625


**Train/test split: 70-30**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.3)
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
drugTree.fit(X_train,Y_train)
predTree = drugTree.predict(X_test)

In [None]:
from sklearn import metrics
print("Decision Tree's Accuracy: ", metrics.accuracy_score(Y_test, predTree))

Decision Tree's Accuracy:  0.9833333333333333


**Train/test split: 80-20**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.2)
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
drugTree.fit(X_train,Y_train)
predTree = drugTree.predict(X_test)

In [None]:
from sklearn import metrics
print("Decision Tree's Accuracy: ", metrics.accuracy_score(Y_test, predTree))

Decision Tree's Accuracy:  1.0


**Train/test split: 90-10**

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.1)
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth = 4)
drugTree.fit(X_train,Y_train)
predTree = drugTree.predict(X_test)

In [None]:
from sklearn import metrics
print("Decision Tree's Accuracy: ", metrics.accuracy_score(Y_test, predTree))

Decision Tree's Accuracy:  1.0
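
The trees above use `criterion="entropy"`, which scores a candidate split by how much it reduces the Shannon entropy of the class labels. Below is a minimal sketch of the entropy calculation, applied to the five target values shown in the `y[0:5]` output. `entropy` is a hypothetical helper for illustration, not scikit-learn's internal implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# First five targets from the y[0:5] output above.
sample = ["drugY", "drugC", "drugC", "drugX", "drugY"]
print(round(entropy(sample), 3))  # 1.522
```

A node holding a single class has entropy 0, so the tree prefers splits whose child nodes are closer to pure than their parent.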


###  **Algorithm 3: Logistic Regression**

**Train/test split: 60-40**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.4)
LR = LogisticRegression(C=0.01, solver='liblinear')
LR.fit(X_train,Y_train)
predLR = LR.predict(X_test)

In [None]:
from sklearn import metrics
print("Logistic Regression's Accuracy: ", metrics.accuracy_score(Y_test, predLR))

Logistic Regression's Accuracy:  0.5625


**Train/test split: 70-30**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.3)
LR = LogisticRegression(C=0.01, solver='liblinear')
LR.fit(X_train,Y_train)
predLR = LR.predict(X_test)

In [None]:
from sklearn import metrics
print("Logistic Regression's Accuracy: ", metrics.accuracy_score(Y_test, predLR))

Logistic Regression's Accuracy:  0.5333333333333333


**Train/test split: 80-20**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.2)
LR = LogisticRegression(C=0.01, solver='liblinear')
LR.fit(X_train,Y_train)
predLR = LR.predict(X_test)

In [None]:
from sklearn import metrics
print("Logistic Regression's Accuracy: ", metrics.accuracy_score(Y_test, predLR))

Logistic Regression's Accuracy:  0.45


**Train/test split: 90-10**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size= 0.1)
LR = LogisticRegression(C=0.01, solver='liblinear')
LR.fit(X_train,Y_train)
predLR = LR.predict(X_test)

In [None]:
from sklearn import metrics
print("Logistic Regression's Accuracy: ", metrics.accuracy_score(Y_test, predLR))

Logistic Regression's Accuracy:  0.5
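
Each score above comes from a single random train/test split with no fixed seed, so it can shift from run to run, especially with only 20 to 80 test rows. k-fold cross-validation averages over several splits and gives a steadier estimate. The sketch below uses scikit-learn's `cross_val_score` on synthetic data from `make_classification`, which is only a stand-in (an assumption) for the encoded drug data; `X_demo` and `y_demo` are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded drug data (assumption: 200 rows, 5 features).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)

# Same tree settings as above; random_state fixed so reruns give the same scores.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
scores = cross_val_score(tree, X_demo, y_demo, cv=5)  # one accuracy per fold

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

To apply this to the notebook's data, pass the encoded `X` and `y` in place of `X_demo` and `y_demo`, and swap in the KNN or logistic regression estimator to compare the three models on the same folds.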

