# CardioGuard Risk Model Training

This notebook trains the logistic regression model from which the risk coefficients for the CardioGuard AI engine are derived.

**Dataset:** UCI Heart Disease schema (simulated here with random values; load `heart.csv` for real data)
**Target:** Heart Disease Presence (0/1)
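
For reference, a binary logistic regression over features $x_1, \dots, x_n$ models the probability of the positive class as shown here. The symbols $\beta_0$ (intercept) and $\beta_i$ (per-feature coefficients) are notation introduced for this note, not names from the notebook.

$$
P(y = 1 \mid x) = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
$$

Equivalently, $\log\frac{P}{1 - P} = z$, so a one-unit increase in $x_i$ multiplies the odds of disease by $e^{\beta_i}$ when the other features are held fixed.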

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import pickle

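# 1. Load Dataset (simulating the standard UCI column names with random values)
# In a real scenario: df = pd.read_csv('heart.csv')
# Note: the simulated target is independent of the features, so accuracy will
# hover around 0.5 and the fitted coefficients carry no clinical meaning.
np.random.seed(42)  # make the simulated data reproducible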
data = {
    'age': np.random.randint(29, 80, 500),  # Age in years
    'sex': np.random.randint(0, 2, 500),  # 1=Male, 0=Female
    'cp': np.random.randint(0, 4, 500),  # Chest pain type (0-3)
    'trestbps': np.random.randint(94, 200, 500),  # Resting blood pressure (mm Hg)
    'chol': np.random.randint(126, 564, 500),  # Serum cholesterol (mg/dl)
    'fbs': np.random.randint(0, 2, 500),  # Fasting blood sugar > 120 mg/dl (1=True, 0=False)
    'thalach': np.random.randint(71, 202, 500),  # Maximum heart rate achieved
    'target': np.random.randint(0, 2, 500)  # 1=Disease, 0=No Disease
}

df = pd.DataFrame(data)

# 2. Preprocessing
X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Model Training
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Evaluation
y_pred = model.predict(X_test)
print("Model Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

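# 5. Feature Importance (used to tune `mlService.js`)
# Each coefficient is the change in log-odds per one-unit change in a raw,
# unscaled feature, so magnitudes are not directly comparable across features.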
importance = pd.DataFrame({'feature': X.columns, 'coefficient': model.coef_[0]})
print("\nDerived Coefficients (Used in Node.js backend):\n", importance.sort_values(by='coefficient', ascending=False))

# 6. Export
with open('cardioguard_model.pkl', 'wb') as f:
    pickle.dump(model, f)
print("Model saved to cardioguard_model.pkl")
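
The notebook says the derived coefficients are used in the Node.js backend, which suggests the backend recomputes the risk itself instead of loading the pickle. The notebook does not show `mlService.js`, so the following is only a minimal sketch, in Python, of what such a recomputation could look like, assuming the standard logistic formula. The names `manual_risk`, `X_demo`, `y_demo`, and `demo_model` are hypothetical, and the demo data is random. Note that the intercept is needed alongside the coefficients to reproduce the model's probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the notebook's training data (random, no clinical meaning).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=(200, 3)).astype(float)
y_demo = rng.integers(0, 2, size=200)

demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)


def manual_risk(x, coef, intercept):
    """Return sigmoid(intercept + coef . x), the positive-class probability."""
    z = intercept + np.dot(coef, x)
    return 1.0 / (1.0 + np.exp(-z))


# Compare the hand-computed probability with scikit-learn's own output.
sample = X_demo[0]
manual = manual_risk(sample, demo_model.coef_[0], demo_model.intercept_[0])
sklearn_prob = demo_model.predict_proba(sample.reshape(1, -1))[0, 1]
print("manual:", manual, "sklearn:", sklearn_prob)
```

If the two printed values agree, exporting `coef_[0]` together with `intercept_[0]` is enough for another runtime to reproduce the same score, provided the features are passed in the same column order and units.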