
# Machine Learning Classification Using Decision Trees
This notebook uses the decision tree classification algorithm to build a model from historical data on patients and their responses to various medications. The trained model is then used to predict the category of an unknown patient, that is, to find a suitable drug for a new patient.

## About the dataset
The data set covers a group of patients who all have the same disease. During the course of treatment, each patient responded to one of five drugs: drug A, drug B, drug C, drug X, or drug Y.

The goal is to train a model that suggests which drug may be suitable for a future patient with the same disease. The features are each patient's age, sex, blood pressure, cholesterol, and sodium-to-potassium ratio (`Na_to_K`); the target is the drug to which the patient responded.

## Decision tree coding
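
The classifier in this section is created with `criterion="entropy"`, which means candidate splits are scored by how much they reduce the entropy of the class labels. The sketch below shows that calculation in plain Python as a rough illustration of the idea, not scikit-learn's actual implementation. The helper names `entropy` and `information_gain` and the four labels are invented for this example and are not taken from `drug200.csv`.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels, invented for illustration only
parent = ["drugY", "drugY", "drugX", "drugX"]
left, right = ["drugY", "drugY"], ["drugX", "drugX"]

print(entropy(parent))                          # 1.0 (two classes, evenly mixed)
print(information_gain(parent, [left, right]))  # 1.0 (the split separates them fully)
```

A split that leaves both child groups pure removes all of the parent's entropy, so its information gain equals the parent's entropy. Splits that leave the groups mixed score lower.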

In [1]:
# Importing necessary libraries
import warnings
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
import sklearn.tree as tree
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn import metrics
import matplotlib.pyplot as plt

# Suppressing warnings for cleaner output
warnings.filterwarnings("ignore")

# Reading data from CSV file
my_data = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-ML0101EN-SkillsNetwork/labs/Module%203/data/drug200.csv', delimiter=",")
print('the size of data:', my_data.shape)

# Extracting features and labels
X = my_data[['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']].values

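# Encoding categorical features as integers
# Note: LabelEncoder sorts its classes alphabetically, so the codes do not
# follow the order of the list passed to fit()
le_sex = preprocessing.LabelEncoder()
le_sex.fit(['F', 'M'])                 # F -> 0, M -> 1
X[:, 1] = le_sex.transform(X[:, 1])

le_BP = preprocessing.LabelEncoder()
le_BP.fit(['LOW', 'NORMAL', 'HIGH'])   # HIGH -> 0, LOW -> 1, NORMAL -> 2
X[:, 2] = le_BP.transform(X[:, 2])

le_Chol = preprocessing.LabelEncoder()
le_Chol.fit(['NORMAL', 'HIGH'])        # HIGH -> 0, NORMAL -> 1
X[:, 3] = le_Chol.transform(X[:, 3])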

y = my_data["Drug"]

# Splitting data into training and testing sets
X_trainset, X_testset, y_trainset, y_testset = train_test_split(X, y, test_size=0.3, random_state=3)
print('Shape of X training set {}'.format(X_trainset.shape),'&',' Size of Y training set {}'.format(y_trainset.shape))
print('Shape of X testing set {}'.format(X_testset.shape),'&',' Size of Y testing set {}'.format(y_testset.shape))

# Building and training the decision tree model
drugTree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
drugTree.fit(X_trainset, y_trainset)

# Making predictions on the test set and comparing the first five
# predictions with the true labels
predTree = drugTree.predict(X_testset)
print(predTree[0:5])
print(y_testset[0:5])

# Calculating model accuracy
print("DecisionTrees's Accuracy: ", metrics.accuracy_score(y_testset, predTree))

the size of data: (200, 6)
Shape of X training set (140, 5) &  Size of Y training set (140,)
Shape of X testing set (60, 5) &  Size of Y testing set (60,)
['drugY' 'drugX' 'drugX' 'drugX' 'drugX']
40     drugY
51     drugX
139    drugX
197    drugX
170    drugX
Name: Drug, dtype: object
DecisionTrees's Accuracy:  0.9833333333333333
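
The encoding step in the cell above passes the blood pressure levels to `fit` in the order `LOW`, `NORMAL`, `HIGH`, but `LabelEncoder` stores its classes in sorted order, so the integer codes do not follow that order. The short sketch below checks the mapping on its own. The variable name `le_bp_demo` is chosen for this example only.

```python
from sklearn.preprocessing import LabelEncoder

# Fit on the same three blood pressure levels used in the notebook
le_bp_demo = LabelEncoder()
le_bp_demo.fit(["LOW", "NORMAL", "HIGH"])

# Classes are stored alphabetically, regardless of the order given to fit()
print(le_bp_demo.classes_.tolist())                     # ['HIGH', 'LOW', 'NORMAL']

# Each label is replaced by its index in classes_
print(le_bp_demo.transform(["LOW", "NORMAL", "HIGH"]))  # [1 2 0]

# inverse_transform maps the integer codes back to the original labels
print(le_bp_demo.inverse_transform([0, 1, 2]).tolist())
```

The codes are arbitrary identifiers and do not express that `LOW` < `NORMAL` < `HIGH`. A decision tree can still separate the categories with threshold splits, but the numeric values should not be read as a ranking.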


In [2]:
# Creating new data for predictions
new_data = pd.DataFrame({'Age': [25, 35, 45],
                         'Sex': ['F', 'M', 'F'],
                         'BP': ['NORMAL', 'HIGH', 'LOW'],
                         'Cholesterol': ['NORMAL', 'HIGH', 'NORMAL'],
                         'Na_to_K': [15.2, 10.1, 20.5]})

# Encoding categorical features in new data
new_data['Sex'] = le_sex.transform(new_data['Sex'])
new_data['BP'] = le_BP.transform(new_data['BP'])
new_data['Cholesterol'] = le_Chol.transform(new_data['Cholesterol'])

# Making predictions on new data
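# The model was fitted on a NumPy array, so pass the values rather than the
# DataFrame to avoid a feature-name mismatch warning
predictions = drugTree.predict(new_data.values)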

print("Predictions for new data:")
for i, pred in enumerate(predictions):
    print(f"Prediction {i+1}: {pred}")

Predictions for new data:
Prediction 1: drugY
Prediction 2: drugA
Prediction 3: drugY
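
To see why a tree returns a particular prediction, its learned rules can be printed as text with scikit-learn's `export_text`. The sketch below is self-contained and trains a small tree on synthetic stand-in data invented for this example (not `drug200.csv`). The names `X_demo`, `y_demo` and `demo_tree` are hypothetical, and `random_state=0` is added only to make the demo repeatable.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data, invented for illustration (not drug200.csv):
# two numeric features, with the label determined entirely by the first one
X_demo = np.array([[20, 8.0], [30, 12.0], [40, 18.0],
                   [50, 25.0], [60, 9.0], [70, 30.0]])
y_demo = np.array(["drugX", "drugX", "drugX", "drugY", "drugY", "drugY"])

# Same criterion and depth limit as the notebook's model
demo_tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
demo_tree.fit(X_demo, y_demo)

# Print the learned decision rules, one line per node
rules = export_text(demo_tree, feature_names=["Age", "Na_to_K"])
print(rules)
```

With the fitted model from this notebook, the equivalent call would be `export_text(drugTree, feature_names=['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K'])`, which lists the thresholds that lead to each predicted drug.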
