# Real-World Use Case: Heart Disease Risk Prediction

## 1. The Problem
A hospital wants a triage tool to flag patients at high risk of heart disease based on initial screening.
*   **Goal**: Estimate the probability of heart disease.

## 2. Why Logistic Regression?
*   **Binary Outcome**: Disease vs No Disease.
*   **Probability**: Doctors want more than a "Yes/No" answer. A 51% risk and a 99% risk call for different responses.
*   **Risk Factors**: The sign and size of each coefficient show whether a factor (e.g., Cholesterol) raises or lowers the log-odds of disease, and by how much.
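
To make the log-odds point concrete, here is a minimal sketch of how a single coefficient moves a probability. The baseline log-odds and the coefficient are hypothetical numbers chosen for illustration, not fitted values from this use case.

```python
import math


def sigmoid(log_odds: float) -> float:
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical numbers, chosen only for illustration (not fitted values).
baseline_log_odds = -1.0  # log-odds for some reference patient
coef_cholesterol = 0.4    # assumed coefficient per one-unit increase in the feature

p_before = sigmoid(baseline_log_odds)
p_after = sigmoid(baseline_log_odds + coef_cholesterol)

# A one-unit increase adds the coefficient to the log-odds,
# which multiplies the odds by exp(coefficient).
odds_ratio = math.exp(coef_cholesterol)

print(f"Probability before: {p_before:.3f}, after: {p_after:.3f}")
print(f"Odds ratio: {odds_ratio:.3f}")
```

The odds change by the same factor regardless of the baseline, while the change in probability depends on where the patient starts on the sigmoid curve.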

## 3. Data Simulation (Cleveland Heart Disease Proxy)
Columns:
*   **Age**: Age in years.
*   **Cholesterol**: Cholesterol in mg/dL (column `Chol`).
*   **MaxHR**: Maximum heart rate in beats per minute.
*   **ChestPain**: Asymptomatic, Non-Anginal, Atypical, or Typical (categorical).
*   **Target**: 1 (disease), 0 (no disease).

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, roc_auc_score

# 1. Generate Data
np.random.seed(0)
n = 300
age = np.random.randint(30, 80, n)
chol = np.random.normal(250, 50, n)
max_hr = np.random.normal(150, 20, n)
cp = np.random.choice(['Asymptomatic', 'Non-Anginal', 'Atypical', 'Typical'], n)

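# Risk formula: a linear log-odds score passed through the sigmoid.
# ChestPain is not part of the score, so it acts as a noise feature.
# Note: the mean log-odds is about -4.3, so positives are rare (a few percent).
risk_score = (age * 0.05) + (chol * 0.01) - (max_hr * 0.05) - 2
prob = 1 / (1 + np.exp(-risk_score))
# Draw each label as a Bernoulli trial with that patient's probability.
target = (np.random.rand(n) < prob).astype(int)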

df = pd.DataFrame({'Age': age, 'Chol': chol, 'MaxHR': max_hr, 'ChestPain': cp, 'Target': target})

# 2. Pipeline
numeric_features = ['Age', 'Chol', 'MaxHR']
categorical_features = ['ChestPain']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
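        # Ignore categories unseen during fitting instead of raising at inference time.
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)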
    ])

model = Pipeline(steps=[('preprocessor', preprocessor),
                        ('classifier', LogisticRegression())])

X = df.drop('Target', axis=1)
y = df['Target']
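# Stratify so both classes appear in the train and test splits
# (roc_auc_score fails if the test set contains only one class).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)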

model.fit(X_train, y_train)

# 3. Evaluate
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC-AUC Score: {roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]):.3f}")

# 4. Inference (Doctor's View)
patient = pd.DataFrame({'Age': [65], 'Chol': [280], 'MaxHR': [130], 'ChestPain': ['Typical']})
risk = model.predict_proba(patient)[0][1]
print(f"\nPatient Risk Probability: {risk:.2%}")