# COPD Hospital Admission Prediction Project

This project's goal is to predict the likelihood of hospital admission for patients with Chronic Obstructive Pulmonary Disease (COPD).
Using machine learning, the model estimates admission risk from each patient's symptom severity levels and medical history, producing a prediction that can support healthcare professionals in decision-making.

## Goals:
1. Preprocess the dataset to make it ready for machine learning.
2. Train a logistic regression model to predict hospital admissions.
3. Evaluate the model using metrics such as precision, recall, F1-score, and ROC-AUC.
4. Save the trained model for deployment.



## PART 1: Import Libraries
Load all libraries the notebook needs.

```python
# Importing necessary libraries
import os
import pandas as pd
import numpy as np
from IPython.display import display  # built into Jupyter/Colab; imported explicitly for clarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
import joblib
```


## PART 2: Load the Dataset
This part mounts Google Drive, loads the preprocessed dataset, and displays the first five rows to show the data structure.


```python
# Mount Google Drive so the notebook can read files stored there (Colab only)
from google.colab import drive
drive.mount('/content/drive')

# In Colab the working directory is /content, so base_dir resolves to "/".
# base_dir is reused in Part 6 to decide where the model is saved.
notebook_dir = os.path.abspath(os.getcwd())
base_dir = os.path.abspath(os.path.join(notebook_dir, '..'))  # Go up one directory

# Hard-coded path to the preprocessed dataset on Google Drive
data_file = "/content/drive/MyDrive/Data/copd_data_preprocessed.csv"

# Load the data and preview the first five rows
try:
    df = pd.read_csv(data_file)
    print("Data loaded successfully!")
    display(df.head())
except FileNotFoundError as e:
    print(f"Error: {e}")
    print(f"Ensure the file exists at: {data_file}")
```


```
Mounted at /content/drive
Data loaded successfully!
```

| | Shortness_of_Breath | Cough_Intensity | Chest_Tightness | Wheezing | Fatigue | Age | Smoking_History | Comorbidities | Mucus_Amount | Mucus_Color | Fever_Last_2_Weeks | Respiratory_Rate | Hospital_Admission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.210553 | 0.134556 | 1.259235 | 1.519242 | -1.205659 | -1.460718 | 2 | 0 | 0.33153 | 2 | 0 | -1.378438 | 1 |
| 1 | 0.170568 | 1.161704 | -0.798899 | 1.519242 | 1.234946 | 1.097208 | 2 | 2 | -1.525783 | 0 | 0 | 1.627848 | 1 |
| 2 | 0.515849 | 0.476939 | -0.112854 | -1.252465 | 1.583604 | 1.523529 | 0 | 1 | -0.597126 | 0 | 1 | 1.405161 | 1 |
| 3 | -1.555833 | 1.504086 | 1.602257 | -0.906002 | 0.53763 | 0.244566 | 2 | 0 | -1.21623 | 0 | 0 | -0.59903 | 1 |
| 4 | -0.865272 | -0.207826 | -0.798899 | -1.598928 | 0.886288 | -1.318611 | 1 | 1 | 1.569739 | 0 | 1 | 0.069033 | 1 |
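The five preview rows all have `Hospital_Admission = 1`, so the preview says little about class balance or missing values. A quick summary helps before modeling. The sketch below is illustrative only: `summarize_frame` is a hypothetical helper, and the demo frame is synthetic, reusing just a few of the column names shown above.

```python
import numpy as np
import pandas as pd


def summarize_frame(df: pd.DataFrame, target: str) -> dict:
    """Return row count, feature count, missing values and target class shares."""
    return {
        "n_rows": len(df),
        "n_features": df.shape[1] - 1,  # every column except the target
        "missing_values": int(df.isna().sum().sum()),
        # Share of each target class, keyed by class label
        "class_share": df[target].value_counts(normalize=True).sort_index().to_dict(),
    }


# Synthetic stand-in for copd_data_preprocessed.csv (random values, hypothetical labels)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Shortness_of_Breath": rng.normal(size=10),
    "Age": rng.normal(size=10),
    "Hospital_Admission": [1, 1, 1, 0, 1, 0, 1, 1, 0, 1],
})
print(summarize_frame(demo, "Hospital_Admission"))
```

On the real data this would be called as `summarize_frame(df, "Hospital_Admission")`.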


## PART 3: Preprocess the Data
This part separates the features from the target, splits the rows into training (80%) and test (20%) sets, and standardizes the features before they are fed to the model.


```python
# Split features and target variable
X = df.drop(columns=["Hospital_Admission"])
y = df["Hospital_Admission"]

# Hold out 20% of the rows for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features: fit on the training set only, then apply to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Data preprocessing completed.")
```

```
Data preprocessing completed.
```
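`train_test_split` is called here without `stratify`, so the class ratio of the test set is left to chance. Below is a minimal sketch of the stratified variant on synthetic labels; the 30/70 class split is an assumption chosen only to roughly mirror the 60/140 test support reported in Part 5. Switching the notebook to `stratify=y` would produce a different split and therefore different metrics from the ones recorded in Part 5.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 1,000 rows, 300 negatives and 700 positives.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([0] * 300 + [1] * 700)

# stratify=y_demo keeps the 30/70 class ratio in both the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train positive share:", y_tr.mean())  # 0.7
print("test positive share:", y_te.mean())   # 0.7
```

The notebook already fits `StandardScaler` on the training rows only and reuses those statistics for the test rows, which keeps test information out of preprocessing; that step would stay the same with a stratified split.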


## PART 4: Train the Model
This part trains a logistic regression model on the scaled training set, using balanced class weights to offset the class imbalance.


```python
# Initializing and training the logistic regression model
model = LogisticRegression(
    class_weight="balanced",  # weight classes inversely to their frequency
    max_iter=5000,
    solver="liblinear"
)
model.fit(X_train_scaled, y_train)

print("Model training completed.")
```

```
Model training completed.
```
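With `class_weight="balanced"`, scikit-learn weights each class by `n_samples / (n_classes * count_of_class)`, so errors on the minority class cost more in the loss. The sketch below computes those weights with `sklearn.utils.class_weight.compute_class_weight` and checks them against the formula; the 300/700 labels are synthetic, not the notebook's training labels.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels: 300 "no admission" (0) and 700 "admission" (1).
y_demo = np.array([0] * 300 + [1] * 700)
classes = np.unique(y_demo)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# Same numbers from the formula n_samples / (n_classes * bincount(y)).
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))

print(dict(zip(classes.tolist(), weights.round(4).tolist())))  # {0: 1.6667, 1: 0.7143}
```

The ratio of the two weights is the inverse of the class-count ratio (700/300 here), which raises the penalty for misclassifying the minority class.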


## PART 5: Evaluate the Model
This part evaluates the model on the test set with a confusion matrix, a classification report (per-class precision, recall, and F1-score, plus overall accuracy), and the ROC-AUC score. It also prints summary statistics for the full dataset with the pandas `DataFrame.describe()` method.


```python
# Making predictions on the test set
y_pred = model.predict(X_test_scaled)
y_proba = model.predict_proba(X_test_scaled)[:, 1]  # predicted probability of admission (class 1)

# Confusion matrix (rows: true class, columns: predicted class)
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Classification report
class_report = classification_report(y_test, y_pred)
print("\n\nClassification Report:")
print(class_report)

# ROC-AUC score, computed from probabilities rather than hard predictions
roc_auc = roc_auc_score(y_test, y_proba)
print(f"\nROC-AUC Score: {roc_auc}\n\n")

# Summary statistics for the full (already preprocessed) dataset
print("Pandas DataFrame description:\n")
print(df.describe())
```

```
Confusion Matrix:
[[41 19]
 [43 97]]


Classification Report:
              precision    recall  f1-score   support

           0       0.49      0.68      0.57        60
           1       0.84      0.69      0.76       140

    accuracy                           0.69       200
   macro avg       0.66      0.69      0.66       200
weighted avg       0.73      0.69      0.70       200


ROC-AUC Score: 0.7485714285714286
```
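Every number in the classification report follows from the confusion matrix. In scikit-learn's layout (rows are true labels, columns are predictions), the counts 41, 19, 43 and 97 reproduce the reported precision, recall, F1-score and accuracy; the short check below recomputes them by hand.

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class.
cm = np.array([[41, 19],
               [43, 97]])
tn, fp, fn, tp = cm.ravel()


def prf(tp, fp, fn):
    """Precision, recall and F1 for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Class 1 (admitted) uses tp/fp/fn directly; class 0 swaps the roles.
p1, r1, f1_1 = prf(tp, fp, fn)
p0, r0, f1_0 = prf(tn, fn, fp)
accuracy = (tp + tn) / cm.sum()

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")  # 0.49 0.68 0.57
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")  # 0.84 0.69 0.76
print(f"accuracy={accuracy:.2f}")                                    # 0.69
```

The "macro avg" row is the unweighted mean of the two per-class values, while "weighted avg" weights each class by its support (60 and 140).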


```
Pandas DataFrame description:

       Shortness_of_Breath  Cough_Intensity  Chest_Tightness      Wheezing  \
count         1.000000e+03     1.000000e+03     1.000000e+03  1.000000e+03
mean         -7.194245e-17    -6.039613e-17     1.136868e-16 -6.394885e-17
std           1.000500e+00     1.000500e+00     1.000500e+00  1.000500e+00
min          -1.555833e+00    -1.577356e+00    -1.484943e+00 -1.598928e+00
25%          -8.652723e-01    -8.925910e-01    -7.988987e-01 -9.060017e-01
50%           1.705684e-01     1.345563e-01    -1.128543e-01  1.333884e-01
```
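ROC-AUC can be read as the probability that a randomly chosen admitted patient receives a higher predicted probability than a randomly chosen non-admitted one, with ties counting as half. The sketch below checks that reading against `roc_auc_score` on synthetic scores; `pairwise_auc` is an illustrative helper, not part of the notebook.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


# Synthetic labels and scores (not the notebook's y_test / y_proba).
rng = np.random.default_rng(7)
y_demo = rng.integers(0, 2, size=200)
scores_demo = rng.normal(loc=y_demo * 0.8, scale=1.0)

print(pairwise_auc(y_demo, scores_demo), roc_auc_score(y_demo, scores_demo))
```

Called as `pairwise_auc(y_test, y_proba)` in the notebook, it should agree with the reported score of about 0.7486. The pairwise version is O(positives × negatives) in memory, so it is only practical as a check on small test sets like this one.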

## PART 6: Save the Model
Save the trained model to disk so it can be reused for deployment without retraining.

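```python
# Saving the trained model for deployment.
# In Colab base_dir is "/", so the file is written to /model on the temporary
# runtime (not to Google Drive) and is lost when the runtime resets.
# Only the classifier is saved; the fitted StandardScaler is not.
model_file = os.path.join(base_dir, "model", "logistic_model.pkl")

try:
    os.makedirs(os.path.dirname(model_file), exist_ok=True)  # ensure the directory exists
    joblib.dump(model, model_file)
    print(f"Model saved at: {model_file}")
except Exception as e:
    print(f"Error saving model: {e}")
```

```
Model saved at: /model/logistic_model.pkl
```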
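The cell above saves only the `LogisticRegression` object, but the model was trained on `StandardScaler` output, so any caller loading `logistic_model.pkl` must reproduce the same scaling first. One way to ship a single artifact, sketched here as an assumption about how deployment could work (not how the Heroku app is known to load the model), is to wrap both steps in a scikit-learn `Pipeline`. The data and the `copd_pipeline.pkl` filename are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, unscaled training data standing in for X_train / y_train.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(300, 4))
y_demo = (X_demo[:, 0] + rng.normal(size=300) > 5.0).astype(int)

# Scaler and classifier are fitted together; the scaler only sees training data.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=5000, solver="liblinear"),
)
pipeline.fit(X_demo, y_demo)

# One file now holds both the scaling statistics and the model coefficients.
path = os.path.join(tempfile.mkdtemp(), "copd_pipeline.pkl")  # hypothetical filename
joblib.dump(pipeline, path)

restored = joblib.load(path)
# Raw, unscaled rows can be passed straight to the restored pipeline.
print(restored.predict_proba(X_demo[:3])[:, 1])
```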


## PART 7: Load Model
Reload the saved model and confirm that it produces predictions without retraining.


```python
# Loading the saved model and scoring the test set without retraining
# (X_test_scaled comes from Part 3 of the current session)
try:
    loaded_model = joblib.load(model_file)
    test_proba = loaded_model.predict_proba(X_test_scaled)[:, 1]
    print("Model loaded successfully.")
    print(f"Sample Prediction Probability: {test_proba[:5]}")
except Exception as e:
    print(f"Error loading model: {e}")
```

```
Model loaded successfully.
Sample Prediction Probability: [0.59382701 0.48894685 0.46230041 0.72851429 0.72599083]
```
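Printing a few probabilities shows that the reloaded model runs, but a stricter check is that it reproduces the in-memory model's output exactly. A minimal sketch on synthetic data, assuming the same `joblib` round trip; in the notebook the equivalent comparison would be between `model.predict_proba(X_test_scaled)[:, 1]` and `test_proba`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data (assumption), not the COPD dataset.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] - X_demo[:, 1] > 0).astype(int)

model_demo = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

# Save and reload through a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "logistic_model.pkl")
joblib.dump(model_demo, path)
loaded_demo = joblib.load(path)

# The pickle stores the fitted coefficients, so predictions should match exactly.
same = np.array_equal(model_demo.predict_proba(X_demo), loaded_demo.predict_proba(X_demo))
print("identical predictions:", same)
```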


## PART 8: Visit the UI on Heroku
The project's web interface is hosted on Heroku: https://copd-hospital-assessment-2902c73cfbc3.herokuapp.com/