# Drug Response Classification using Machine Learning

## 1. Business Context

Pharmaceutical companies invest millions of dollars in clinical trials to test drug efficacy. However, drug effectiveness varies across patients due to differences in biology, metabolism, and pre-existing medical conditions.

Traditional clinical trials:

- Take several years
- Are expensive
- Carry high failure risk

Machine Learning models can help predict drug response early, saving both time and cost, while supporting personalized medicine.

## 2. Problem Statement

The goal of this project is to build a machine learning classification model that predicts whether a patient will respond positively to a drug based on clinical and biological features.

This is a binary classification problem.

## 3. Target Variable: Drug Response
| Label | Meaning           |
| ----- | ----------------- |
| 0     | No Response       |
| 1     | Positive Response |

#### Interpretation:

- 0 (No Response)
  - No improvement in patient condition
  - Biomarkers remain unchanged
  - Drug may be ineffective or dosage inappropriate
- 1 (Positive Response)
  - Patient shows improvement
  - Biomarkers improve significantly
  - Drug is effective for the patient


## 4. Importance of Drug Response Classification

- Pharmaceutical Industry: Helps evaluate drug effectiveness before approval
- Personalized Medicine: Enables patient-specific treatment
- Healthcare ML: Supports predictive analytics for better outcomes

## 5. Dataset Description

The dataset contains patient-level clinical and biological data used to predict drug response.

- Input Features: five numeric patient attributes (drug dosage, systolic blood pressure, heart rate, liver toxicity index, and blood glucose level)
- Target Column: `Drug Response` (0 or 1)

## 6. Import Required Libraries

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report


## 7. Load the Dataset

In [16]:
df = pd.read_csv('Pharma_Industry.csv')
df.head()


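|   | Drug Dosage (mg) | Systolic Blood Pressure (mmHg) | Heart Rate (BPM) | Liver Toxicity Index (U/L) | Blood Glucose Level (mg/dL) | Drug Response |
| - | ---------------- | ------------------------------ | ---------------- | -------------------------- | --------------------------- | ------------- |
| 0 | -0.128538 | 0.30328 | -1.881849 | 0.258286 | -0.792011 | 1 |
| 1 | -1.846188 | 2.865142 | -0.929511 | 2.866786 | -0.719447 | 1 |
| 2 | -1.252393 | -1.541613 | 0.363632 | -0.32537 | 0.191314 | 0 |
| 3 | 1.992515 | -1.142779 | -0.766657 | 0.975286 | -0.823355 | 1 |
| 4 | 0.3771 | 0.53841 | -0.029263 | 1.896015 | -0.96013 | 1 |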


## 8. Exploratory Data Analysis (EDA)

In [19]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 6 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Drug Dosage (mg)                500 non-null    float64
 1   Systolic Blood Pressure (mmHg)  500 non-null    float64
 2   Heart Rate (BPM)                500 non-null    float64
 3   Liver Toxicity Index (U/L)      500 non-null    float64
 4   Blood Glucose Level (mg/dL)     500 non-null    float64
 5   Drug Response                   500 non-null    int64  
dtypes: float64(5), int64(1)
memory usage: 23.6 KB


In [21]:
df.describe()


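|       | Drug Dosage (mg) | Systolic Blood Pressure (mmHg) | Heart Rate (BPM) | Liver Toxicity Index (U/L) | Blood Glucose Level (mg/dL) | Drug Response |
| ----- | ---------------- | ------------------------------ | ---------------- | -------------------------- | --------------------------- | ------------- |
| count | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean  | -0.037761 | 0.214957 | 0.062871 | 0.054398 | -0.171863 | 0.52 |
| std   | 0.979891 | 1.247567 | 0.971978 | 0.986001 | 0.983765 | 0.5001 |
| min   | -3.019512 | -3.773897 | -2.940389 | -3.401277 | -3.110431 | 0.0 |
| 25%   | -0.642003 | -0.565168 | -0.648157 | -0.586085 | -0.797715 | 0.0 |
| 50%   | -0.01934 | 0.201532 | 0.027732 | -0.065661 | -0.108106 | 1.0 |
| 75%   | 0.641151 | 0.951375 | 0.710774 | 0.633914 | 0.513555 | 1.0 |
| max   | 2.949094 | 4.111751 | 3.193108 | 3.373269 | 2.518023 | 1.0 |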


In [25]:
df['Drug Response'].value_counts()


Drug Response
1    260
0    240
Name: count, dtype: int64
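
The class counts above also give a reference point for later results: a trivial classifier that always predicts the majority class would already be right 52% of the time. A minimal sketch of that arithmetic, using the counts printed above (the variable names are illustrative and not part of the notebook):

```python
# Class counts copied from the value_counts() output above.
class_counts = {1: 260, 0: 240}

total = sum(class_counts.values())

# Share of each class in the dataset.
class_share = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the most frequent class.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(class_share)        # {1: 0.52, 0: 0.48}
print(baseline_accuracy)  # 0.52
```

Any trained model should be judged against this 0.52 baseline rather than against 0.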

## 9. Data Preprocessing

In [30]:
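# Separate the five input features from the target column
X = df.drop('Drug Response', axis=1)
y = df['Drug Response']

# Standardize each feature to zero mean and unit variance
# (note: the scaler is fitted on all rows, before the train-test split)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)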


## 10. Train-Test Split

In [33]:
# Hold out 20% of the rows (100 patients) for testing; the seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
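
In the cells above, the scaler is fitted on all 500 rows before the split, so statistics from the test rows influence the scaling. One common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that order on synthetic stand-in data from `make_classification` rather than on `Pharma_Industry.csv`; the `stratify=y` argument and the `*_scaled` variable names are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset: 500 rows, 5 numeric features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

# Split first; stratify=y keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)  # (400, 5) (100, 5)
```

With this ordering the test set plays no part in choosing the scaling parameters, which makes the reported test metrics a cleaner estimate of performance on unseen patients.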


## 11. Model Building (Logistic Regression)

In [36]:
# Fit a logistic regression classifier with scikit-learn's default settings
model = LogisticRegression()
model.fit(X_train, y_train)
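
For intuition, logistic regression scores a patient with a weighted sum of the features plus a bias, passes that score through the sigmoid function to obtain a probability, and assigns label 1 when the probability exceeds 0.5. The sketch below uses made-up weights and the first row printed by `df.head()` purely for illustration; the weights are not the coefficients learned by `model`, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of class 1 under a logistic regression model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical coefficients, NOT the values learned by the model above.
weights = [0.5, 1.0, -0.3, 0.8, -0.2]
bias = 0.1

# First row of df.head(): dosage, blood pressure, heart rate, liver index, glucose.
patient = [-0.128538, 0.30328, -1.881849, 0.258286, -0.792011]

probability = predict_proba(patient, weights, bias)
label = 1 if probability > 0.5 else 0
print(round(probability, 3), label)
```

The learned counterparts of `weights` and `bias` are available on a fitted scikit-learn model as `model.coef_` and `model.intercept_`.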


## 12. Model Evaluation

In [39]:
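# Predict labels for the held-out test set
y_pred = model.predict(X_test)

# Report overall accuracy, the confusion matrix, and per-class metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))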


Accuracy: 0.79
[[34 10]
 [11 45]]
              precision    recall  f1-score   support

           0       0.76      0.77      0.76        44
           1       0.82      0.80      0.81        56

    accuracy                           0.79       100
   macro avg       0.79      0.79      0.79       100
weighted avg       0.79      0.79      0.79       100
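
Every number in the report can be traced back to the confusion matrix, whose rows are the true classes and whose columns are the predicted classes. A short sketch of that arithmetic, with the four counts copied from the output above (the variable names are illustrative):

```python
# Counts from the confusion matrix above: rows = true class, columns = predicted.
tn, fp = 34, 10  # true class 0: predicted 0, predicted 1
fn, tp = 11, 45  # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class 1 (positive response).
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 (no response): the same formulas with the roles swapped.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"accuracy={accuracy:.2f}")
# accuracy=0.79
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f} f1={f1_0:.2f}")
# class 0: precision=0.76 recall=0.77 f1=0.76
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f} f1={f1_1:.2f}")
# class 1: precision=0.82 recall=0.80 f1=0.81
```

In this setting the 11 false negatives are responders the model would have missed, and the 10 false positives are non-responders it would have flagged as likely to benefit.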



## 13. Conclusion

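- The logistic regression model reaches 79% accuracy on the 100-patient test set, with F1-scores of 0.76 for non-responders and 0.81 for responders.
- Models of this kind can help reduce clinical trial costs by identifying likely responders early.
- This approach supports early-stage drug evaluation and personalized medicine.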

## 14. Future Scope

- Try advanced models (Random Forest, XGBoost)
- Perform feature importance analysis
- Include genetic and lifestyle data
- Improve accuracy using hyperparameter tuning
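
As a starting point for the first, second, and last items, the sketch below tunes a Random Forest with cross-validated grid search and then reads its feature importances. It runs on synthetic stand-in data from `make_classification`, not on `Pharma_Industry.csv`, and the grid values are arbitrary assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real dataset: 500 rows, 5 numeric features.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

# Small illustrative grid; the values are arbitrary starting points.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# 5-fold cross-validated search over every combination in the grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))

# Impurity-based importances of the refitted best model, one value per feature.
importances = search.best_estimator_.feature_importances_
print(importances)
```

On the real data, `X` and `y` would be the training split only, so that the test set stays untouched until the final evaluation.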