 **AI Assignment 3**

*Description: Build an ANN model for Drug classification.*

This project aims to analyze the relationship between various medical parameters and drug effectiveness.
The dataset consists of patient information, including age, sex, blood pressure levels (BP), cholesterol
levels, sodium-to-potassium ratio (Na_to_K), drug type, and corresponding labels. The goal is to develop
a model that can accurately predict the class or category of a given drug based on its features.


*1: Read the dataset and do data pre-processing*

Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [2]:
# Read the dataset
data = pd.read_csv("drug200.csv")
print(data)

     Age Sex      BP Cholesterol  Na_to_K   Drug
0     23   F    HIGH        HIGH   25.355  DrugY
1     47   M     LOW        HIGH   13.093  drugC
2     47   M     LOW        HIGH   10.114  drugC
3     28   F  NORMAL        HIGH    7.798  drugX
4     61   F     LOW        HIGH   18.043  DrugY
..   ...  ..     ...         ...      ...    ...
195   56   F     LOW        HIGH   11.567  drugC
196   16   M     LOW        HIGH   12.006  drugC
197   52   M  NORMAL        HIGH    9.894  drugX
198   23   M  NORMAL      NORMAL   14.020  drugX
199   40   F     LOW      NORMAL   11.349  drugX

[200 rows x 6 columns]


In [3]:
column_names = data.columns
print("Column Names:")
print(column_names)

Column Names:
Index(['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K', 'Drug'], dtype='object')


*Check for Missing Values:*

In [4]:
# Check for Missing Values
print(data.isnull().sum())

Age            0
Sex            0
BP             0
Cholesterol    0
Na_to_K        0
Drug           0
dtype: int64


*Data encoding*

In [5]:
# Encode categorical variables
label_encoder = LabelEncoder()
data['Sex'] = label_encoder.fit_transform(data['Sex'])
data['BP'] = label_encoder.fit_transform(data['BP'])
data['Cholesterol'] = label_encoder.fit_transform(data['Cholesterol'])
# One-hot encode the 'Drug' column
data = pd.get_dummies(data, columns=['Drug'])

print(data)

     Age  Sex  BP  Cholesterol  Na_to_K  Drug_DrugY  Drug_drugA  Drug_drugB  \
0     23    0   0            0   25.355        True       False       False   
1     47    1   1            0   13.093       False       False       False   
2     47    1   1            0   10.114       False       False       False   
3     28    0   2            0    7.798       False       False       False   
4     61    0   1            0   18.043        True       False       False   
..   ...  ...  ..          ...      ...         ...         ...         ...   
195   56    0   1            0   11.567       False       False       False   
196   16    1   1            0   12.006       False       False       False   
197   52    1   2            0    9.894       False       False       False   
198   23    1   2            1   14.020       False       False       False   
199   40    0   1            1   11.349       False       False       False   

     Drug_drugC  Drug_drugX  
0         False      

*Split dataset into features and labels*

In [6]:
# Split dataset into features and labels
X = data.drop(columns=['Drug_drugX', 'Drug_DrugY', 'Drug_drugA', 'Drug_drugB', 'Drug_drugC'])
y = data[['Drug_drugX', 'Drug_DrugY', 'Drug_drugA', 'Drug_drugB', 'Drug_drugC']]

*Split dataset into training and testing sets*

In [7]:
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

*Standardize features*

In [8]:
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

*2: Build the ANN model with (input layer, min 3 hidden layers & output layer)*

In [10]:
from tensorflow.keras.layers import Input

# Define input shape
input_shape = X_train.shape[1]

# Build the ANN model
model = Sequential([
    Input(shape=(input_shape,)),  # Input layer
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(5, activation='softmax')  # Output layer with 5 neurons for 5 drug types
])


In [11]:
# Compile the model
optimizer = Adam()  # Using recommended optimizer module
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

In [12]:
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 36ms/step - accuracy: 0.1675 - loss: 1.6592 - val_accuracy: 0.3750 - val_loss: 1.4985
Epoch 2/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.6690 - loss: 1.4106 - val_accuracy: 0.5750 - val_loss: 1.3320
Epoch 3/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.7274 - loss: 1.2244 - val_accuracy: 0.6000 - val_loss: 1.1945
Epoch 4/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.6975 - loss: 1.0998 - val_accuracy: 0.6000 - val_loss: 1.0655
Epoch 5/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.7132 - loss: 0.9527 - val_accuracy: 0.6250 - val_loss: 0.9427
Epoch 6/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step - accuracy: 0.6717 - loss: 0.8637 - val_accuracy: 0.6750 - val_loss: 0.8282
Epoch 7/50
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0

<keras.src.callbacks.history.History at 0x1d44cf57c70>

*3: Test the model with random data*

In [13]:
# Define feature names
feature_names = ['Age', 'Sex', 'BP', 'Cholesterol', 'Na_to_K']

# Fit the scaler with feature names
scaler.fit(X_train, feature_names)

# Test the model with random data
random_data = np.array([[45, 1, 2, 0, 3.5]])  # Example of random data
random_data_scaled = scaler.transform(random_data)
prediction = model.predict(random_data_scaled)
predicted_class = np.argmax(prediction)
print("Predicted class:", predicted_class)


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 44ms/step
Predicted class: 1
