# Predicting Chronic Disease Outcomes Using Electronic Health Records (EHRs)

## Objective
Develop a model to predict the outcomes of chronic diseases based on patient electronic health records.

## Dataset
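[Heart Disease dataset from the UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Heart+Disease) - This notebook uses the processed Cleveland subset, which contains 13 clinical attributes per patient (for example age, resting blood pressure, and serum cholesterol) and a diagnosis label coded from 0 (no disease) to 4.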

## Problem Statement
Build a model to predict outcomes for patients with chronic diseases like heart disease or hypertension, helping clinicians make informed decisions about patient management. In this notebook, the heart disease diagnosis recorded in the Cleveland data serves as the example outcome.

## Evaluation Metrics
- Precision
- Recall
- F1 Score
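
For a binary outcome, precision is the share of patients flagged by the model who actually have the disease, recall is the share of patients with the disease that the model catches, and F1 is the harmonic mean of the two. The following is a minimal sketch of those definitions in plain Python; the helper name `binary_metrics` and the labels are made up for illustration and are not taken from the dataset.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Illustrative labels only: 4 patients with disease (1), 6 without (0)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"Precision: {precision:.4f}")  # 2 of the 3 positive predictions are correct
print(f"Recall: {recall:.4f}")        # 2 of the 4 true positives are found
print(f"F1 Score: {f1:.4f}")
```

In a screening context a missed case (low recall) is often costlier than a false alarm (low precision), which is one reason to report both rather than accuracy alone.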

In [1]:
# Data Collection
import pandas as pd

# Load the dataset
data_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data'
column_names = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]
df = pd.read_csv(data_url, names=column_names, na_values="?")

# Display the first few rows of the dataset
df.head()
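
The `na_values="?"` argument matters because the raw file marks missing entries with a question mark. The sketch below shows the effect on a tiny in-memory sample; the rows and the reduced column list are made up for illustration and are not copied from the dataset.

```python
import io

import pandas as pd

# Made-up rows in the same comma-separated style, with "?" marking missing entries
raw = (
    "63.0,0.0,6.0,0\n"
    "53.0,?,3.0,2\n"
    "52.0,1.0,?,0\n"
    "41.0,0.0,3.0,1\n"
)
sample_columns = ["age", "ca", "thal", "target"]  # hypothetical subset of the columns

sample = pd.read_csv(io.StringIO(raw), names=sample_columns, na_values="?")

# "?" is parsed as NaN, so the affected columns stay numeric
missing_per_column = sample.isna().sum()
print(missing_per_column)

# dropna() then removes every row that has at least one missing value
complete = sample.dropna()
print(f"{len(sample)} rows before, {len(complete)} rows after dropna()")
```

Checking `df.isna().sum()` on the real data before dropping rows shows how many records are lost and in which columns.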

In [2]:
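# Data Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Drop rows with missing values (read in as NaN from "?")
df = df.dropna()

# Separate features and target variable.
# The raw label is coded 0 (no disease) to 4; collapse it to binary
# (0 = no disease, 1 = disease) so binary precision/recall/F1 apply.
X = df.drop('target', axis=1)
y = (df['target'] > 0).astype(int)

# Split first (stratified on the label) so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling: fit on the training set only, then apply to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)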

In [3]:
# Exploratory Data Analysis (EDA)
import matplotlib.pyplot as plt
import seaborn as sns

# Visualizing target distribution
sns.countplot(x='target', data=df)
plt.title('Target Variable Distribution')
plt.show()

# Correlation heatmap to spot strongly related features
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Feature Correlation Matrix')
plt.show()

In [4]:
# Model Selection
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score

# Instantiate the models
rf_model = RandomForestClassifier(random_state=42)
svc_model = SVC(random_state=42, probability=True)

In [5]:
# Model Training
# Train the Random Forest model
rf_model.fit(X_train, y_train)

# Train the SVM model
svc_model.fit(X_train, y_train)

In [6]:
# Model Evaluation
def evaluate_model(model, X_test, y_test):
    """Print and return precision, recall, and F1 for the positive class.

    Uses scikit-learn's default average='binary', so y_test must be binary.
    """
    predictions = model.predict(X_test)
    precision = precision_score(y_test, predictions)
    recall = recall_score(y_test, predictions)
    f1 = f1_score(y_test, predictions)

    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    return precision, recall, f1

# Evaluate Random Forest model
print("Random Forest Model Performance:")
rf_metrics = evaluate_model(rf_model, X_test, y_test)

# Evaluate SVM model
print("\nSVM Model Performance:")
svc_metrics = evaluate_model(svc_model, X_test, y_test)
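
A single 80/20 split on a dataset of a few hundred rows leaves only a few dozen test patients, so the scores above can shift noticeably with the random seed. One common way to get a steadier estimate is stratified k-fold cross-validation with the scaler wrapped in a pipeline, so that it is refit on each training fold. The sketch below is one possible setup, not the notebook's method: it runs on synthetic data from `make_classification` as a stand-in, and the names `X_demo`, `y_demo`, `candidates`, and `cv_results` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: about 300 rows, 13 features, binary label)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

# Each pipeline refits the scaler inside every training fold,
# so no statistics from the validation fold leak into preprocessing
candidates = {
    "Random Forest": make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42)),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=42)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
metrics = ("precision", "recall", "f1")

cv_results = {}
for name, pipeline in candidates.items():
    scores = cross_validate(pipeline, X_demo, y_demo, cv=cv, scoring=list(metrics))
    # Average each metric over the five validation folds
    cv_results[name] = {m: scores[f"test_{m}"].mean() for m in metrics}
    print(name, {m: round(v, 4) for m, v in cv_results[name].items()})
```

To apply this to the heart disease data, pass the unscaled feature table and a binary label in place of `X_demo` and `y_demo`.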

## Conclusion
In this notebook, we trained and evaluated machine learning models on the Cleveland subset of the UCI Heart Disease dataset, using the heart disease diagnosis as the example chronic disease outcome. The models were trained on 80% of the complete patient records and evaluated on the remaining 20%.

### Summary of Findings
- Random Forest and SVM classifiers were trained with default hyperparameters and evaluated on a held-out test set.
- Precision, recall, and F1 score were computed for each model; the values are printed by the evaluation cell.

### Future Work
- Collect more diverse and extensive datasets.
- Experiment with more advanced machine learning models.
- Integrate real-time data analysis for continuous monitoring.