# Predicting Hospital Admission Likelihood for COPD Patients
This notebook develops, trains, and evaluates a logistic regression model that predicts the likelihood of hospital admission for individuals with COPD from the reported severity of selected COPD symptoms and related health categories.


## PART 1: Import Libraries
This part imports the libraries I use for data preprocessing, model training, evaluation, and visualization.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from train_model import train_model
from scripts.preprocessing import preprocess_data
from scripts.evaluation import evaluate_model




## PART 2: Load the Dataset
This part loads the preprocessed dataset and displays the first few rows to verify the data structure.


In [2]:
# Load dataset
data_file = "../copd_data_preprocessed.csv"  # Ensure the path matches your file location
df = pd.read_csv(data_file)

# Display the first 5 rows of the dataset
df.head()


FileNotFoundError: [Errno 2] No such file or directory: '../copd_data_preprocessed.csv'
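
A `FileNotFoundError` like the one above usually means the relative path does not resolve from the notebook's current working directory. The following is a minimal sketch of a guard that could run before `pd.read_csv`. The helper name `first_existing` and the second candidate location are assumptions made for illustration and are not part of this project.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that exists as a file, or None if none do."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


# Candidate locations are assumptions; adjust them to your project layout.
candidates = ["../copd_data_preprocessed.csv", "copd_data_preprocessed.csv"]
data_path = first_existing(candidates)

if data_path is None:
    # Printing the working directory shows where relative paths are resolved from.
    print("Dataset not found. Working directory:", Path.cwd())
else:
    print("Using dataset at:", data_path.resolve())
```

If a path is found, it can be passed to `pd.read_csv` in place of `data_file`.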

## PART 3: Preprocess the Data
This part preprocesses the dataset: it splits the data into training and test sets and scales the features.


In [None]:
# Preprocess the data
X_train, X_test, y_train, y_test = preprocess_data(df)

# Check the shape of the data
print("Training data shape:", X_train.shape)
print("Test data shape:", X_test.shape)
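
The implementation of `preprocess_data` is not shown in this notebook. As a rough illustration of the two steps it is described as performing, here is a standard-library sketch that splits toy rows into training and test sets and then standardizes the features with statistics fitted on the training rows only. The function names, the 80/20 split, the seed, and the toy values are all assumptions. The real function may well use scikit-learn instead, and it also handles the target column, which this sketch omits.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Compute per-column mean and standard deviation from the training rows."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Standardize each value as (x - mean) / std using the fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy feature rows (made-up severity scores, not the COPD dataset).
rows = [[1, 3], [2, 5], [3, 4], [4, 2], [5, 1],
        [2, 2], [4, 4], [3, 3], [1, 5], [5, 3]]

train_rows, test_rows = split_rows(rows)
means, stds = fit_scaler(train_rows)                 # fitted on training data only
train_scaled = apply_scaler(train_rows, means, stds)
test_scaled = apply_scaler(test_rows, means, stds)   # reuses the training statistics

print("Training rows:", len(train_scaled))
print("Test rows:", len(test_scaled))
```

Fitting the scaler on the training rows alone keeps information from the test set out of the model's inputs.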


## PART 4: Train the Model
This part trains the logistic regression model on the training set so that it can be evaluated in the next part.


In [None]:
# Train the logistic regression model
model, scaler = train_model(X_train, y_train)

print("Model training complete.")
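
For intuition about what the trained model computes, the sketch below shows how logistic regression turns a weighted sum of features into a probability with the sigmoid function. The weights, bias, and feature values are illustrative assumptions, not coefficients learned by `train_model`, and the 0.5 threshold is only the conventional default.

```python
import math


def sigmoid(z):
    """Map a real number to a value between 0 and 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Logistic regression: sigmoid of the weighted feature sum plus a bias."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Illustrative values only; these are not learned from the COPD data.
weights = [0.8, 0.5]
bias = -1.0
patient = [1.2, 0.4]  # two scaled feature values for a hypothetical patient

probability = predict_probability(patient, weights, bias)
predicted_class = 1 if probability >= 0.5 else 0  # 1 = Admission, 0 = No Admission
print(f"Admission probability: {probability:.3f}, predicted class: {predicted_class}")
```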


## PART 5: Evaluate the Model
This part evaluates the model on the test set using accuracy, precision, recall, and F1-score, and prints the confusion matrix and the classification report.


In [None]:
# Evaluate the model
evaluation_results = evaluate_model(model, X_test, y_test)

# Display results
print("Confusion Matrix:")
print(evaluation_results["confusion_matrix"])

print("\nClassification Report:")
print(evaluation_results["classification_report"])



## PART 6: Visualization of the Confusion Matrix
This heatmap visualizes the model's confusion matrix, showing the counts of true positives, true negatives, false positives, and false negatives.


In [3]:
# Retrieve the confusion matrix from the evaluation results
cm = evaluation_results["confusion_matrix"]

# Create a heatmap for the confusion matrix
class_labels = ['No Admission', 'Admission']
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_labels, yticklabels=class_labels)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()


NameError: name 'evaluation_results' is not defined
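
To make the heatmap's layout concrete, here is a small sketch of how a 2x2 confusion matrix is tallied, assuming rows hold the actual class and columns the predicted class, with class 0 ("No Admission") first. The labels are made up, and `confusion_matrix_2x2` is a hypothetical helper, not the function `evaluate_model` uses.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Tally a 2x2 matrix with rows = actual class and columns = predicted class."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Made-up labels: 0 = No Admission, 1 = Admission.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

toy_cm = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = toy_cm  # top row: actual 0, bottom row: actual 1
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

With this layout, the top-left cell counts true negatives and the bottom-right cell counts true positives.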

## PART 7: Analysis of the Results
The evaluation results summarize the model's performance on the test set:
- Accuracy: The proportion of all predictions that are correct.
- F1-Score: The harmonic mean of precision and recall, which balances the two.
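
As a worked example of these definitions, the sketch below derives accuracy, precision, recall, and F1-score from the four confusion-matrix counts. The counts are invented for illustration, and `classification_metrics` is a hypothetical helper rather than part of `scripts.evaluation`.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts for illustration only.
metrics = classification_metrics(tp=30, tn=50, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

Accuracy can look strong when one class dominates the data, which is one reason to report the F1-score for the admission class alongside it.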


In [None]:
print(f"Model Accuracy: {evaluation_results['accuracy'] * 100:.2f}%")
print(f"F1-Score: {evaluation_results['classification_report']['1']['f1-score'] * 100:.2f}%")
