### Author: Andrés Felipe Sánchez Arias
Date: Jun-03-2024
Last updated: Jun-04-2024

#### Detecting Heart Disease with Logistic Regression

This Jupyter notebook presents a machine learning approach to detecting heart disease using logistic regression.

The notebook applies `StandardScaler` to standardize the features before fitting the model. Standardization gives every feature zero mean and unit variance, so the regularization penalty treats features with large numeric ranges (such as cholesterol) on the same footing as features with small ranges (such as oldpeak), and the solver tends to converge in fewer iterations. The logistic regression model is then created and trained on the training data.
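As a minimal sketch of what standardization does, the snippet below scales a small, hypothetical two-feature array (the values are illustrative and are not taken from the notebook's results). It assumes only NumPy and scikit-learn; the scaler is fitted on the training rows and then reused unchanged on the held-out row.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
# (for example, age in years and cholesterol in mg/dl).
train = np.array([[40.0, 289.0], [49.0, 180.0], [37.0, 283.0], [48.0, 214.0]])
test = np.array([[54.0, 195.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learn mean and std from the training rows only
test_scaled = scaler.transform(test)        # reuse the training statistics on new data

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
```

Fitting the scaler on the training split alone keeps information from the test rows out of the statistics used for scaling.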

After training, the notebook prompts the user for a patient's data through a series of `input()` calls, then encodes and scales that data the same way as the training data. Finally, it prints the prediction: either heart disease is detected or the patient's condition is deemed normal.

In [18]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import sys
import os
sys.path.append(os.path.abspath('../'))

In [19]:
# Path to the input CSV file
input_csv = '../csv/heart.csv'

In [20]:
# Read the CSV file into a DataFrame
data = pd.read_csv(input_csv)

In [21]:
# Display the first few rows of the DataFrame
data.head()

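|   | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|-----|-----|---------------|-----------|-------------|-----------|------------|-------|----------------|---------|----------|--------------|
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |
| 3 | 48 | F | ASY | 138 | 214 | 0 | Normal | 108 | Y | 1.5 | Flat | 1 |
| 4 | 54 | M | NAP | 150 | 195 | 0 | Normal | 122 | N | 0.0 | Up | 0 |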


In [23]:
# Extract features (x) and target variable (y)
x = data.drop("HeartDisease", axis=1)
y = data["HeartDisease"]

# Define categories for encoding categorical variables
categories = {
    "ChestPainType": ['ATA', 'NAP', 'ASY', 'TA'],
    "Sex": ['M', 'F'],
    "RestingECG": ['Normal', 'ST', 'LVH'],
    "ExerciseAngina": ['N', 'Y'],
    "ST_Slope": ['Up', 'Flat', 'Down']
}

# Convert categorical variables into dummy variables
x = pd.get_dummies(x, columns=list(categories))

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Save the column names before scaling
x_train_columns = x_train.columns

# Scale the data using StandardScaler
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

# Create and train the logistic regression model with an increased number of iterations
logistic_classifier = LogisticRegression(max_iter=500)
logistic_classifier.fit(x_train, y_train)

# Function to get user input for patient data
def get_user_input():
    age = int(input("Enter age: "))
    sex = input("Enter sex (M/F): ").upper()
    chest_pain_type = input("Enter chest pain type (ATA/NAP/ASY/TA): ").upper()
    resting_bp = int(input("Enter resting blood pressure: "))
    cholesterol = int(input("Enter cholesterol: "))
    fasting_bs = int(input("Enter fasting blood sugar: "))
    resting_ecg = input("Enter resting ECG (Normal/ST/LVH): ")
    max_hr = int(input("Enter max heart rate: "))
    exercise_angina = input("Enter exercise angina (N/Y): ").upper()
    old_peak = float(input("Enter oldpeak: "))
    st_slope = input("Enter ST Slope (Up/Flat/Down): ").capitalize()
    
    return {
        "Age": age,
        "Sex": sex,
        "ChestPainType": chest_pain_type,
        "RestingBP": resting_bp,
        "Cholesterol": cholesterol,
        "FastingBS": fasting_bs,
        "RestingECG": resting_ecg,
        "MaxHR": max_hr,
        "ExerciseAngina": exercise_angina,
        "Oldpeak": old_peak,
        "ST_Slope": st_slope
    }

# Get patient data from user input
patient_data = get_user_input()

# Convert patient data into a DataFrame and encode categorical variables
patient_df = pd.DataFrame([patient_data])
patient_df = pd.get_dummies(patient_df, columns=list(categories))

# Handle missing features in the patient data
missing_features = set(x_train_columns) - set(patient_df.columns)
for feature in missing_features:
    patient_df[feature] = 0
patient_df = patient_df[x_train_columns]

# Scale the patient data using the same scaler used for training data
patient_df = scaler.transform(patient_df)

# Make predictions using the logistic regression model
predictions = logistic_classifier.predict(patient_df)

# Output the prediction result
if predictions[0] == 1:
    print("Heart disease detected!")
else:
    print("Normal")

Heart disease detected!
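The cell above reports only the predicted class. As a hedged sketch, `predict_proba` can report the model's estimated probability alongside the label. The data below is a hypothetical, already-scaled toy set with a single feature, not the heart dataset, and the names `patient`, `probability`, and `message` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, already-scaled toy data: one feature, two well-separated classes.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(max_iter=500)
model.fit(X, y)

patient = np.array([[1.2]])
label = model.predict(patient)[0]

# Columns of predict_proba follow model.classes_, so look up the column for class 1.
positive_index = list(model.classes_).index(1)
probability = model.predict_proba(patient)[0, positive_index]

message = "Heart disease detected!" if label == 1 else "Normal"
print(f"{message} (estimated probability of class 1: {probability:.2f})")
```

A probability gives the reader a sense of how close the case is to the 0.5 decision threshold, which a bare label hides.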


#### The inputs used to test the model were as follows:

Enter age: 30

Enter sex (M/F): M

Enter chest pain type (ATA/NAP/ASY/TA): NAP

Enter resting blood pressure: 180

Enter cholesterol: 130

Enter fasting blood sugar: 1

Enter resting ECG (Normal/ST/LVH): Normal

Enter max heart rate: 75

Enter exercise angina (N/Y): Y

Enter oldpeak: 1.3

Enter ST Slope (Up/Flat/Down): Down
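A single patient row yields only one dummy column per categorical feature, which is why the notebook fills in the missing columns before scaling. The same alignment can be sketched with `DataFrame.reindex`, shown here on a small hypothetical frame that uses two of the notebook's categorical columns (the rows are illustrative, not real patients).

```python
import pandas as pd

# Hypothetical training frame with two of the notebook's categorical columns.
train = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ST_Slope": ["Up", "Flat", "Down"],
})
categorical = ["Sex", "ST_Slope"]
train_encoded = pd.get_dummies(train, columns=categorical)

# A single patient only produces one dummy column per categorical feature.
patient = pd.DataFrame([{"Age": 30, "Sex": "M", "ST_Slope": "Down"}])
patient_encoded = pd.get_dummies(patient, columns=categorical)

# reindex adds the dummy columns the patient row lacks (filled with 0),
# drops any column the training data never saw, and restores the training order.
patient_aligned = patient_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(patient_aligned.columns.tolist())
```

Because `reindex` silently drops unseen columns, a misspelled category (for example a lowercase `normal`) would end up encoded as all zeros, so validating the input before encoding may be worthwhile.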