# 🧬 MedGuardian: Adherence Forecasting Research
## Final Academic Audit - BTech AIML

This notebook evaluates the predictive models used in the MedGuardian platform to forecast medication non-adherence. We analyze temporal behaviors, medication priorities, and user patterns to mitigate the risk of missed doses.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix, roc_curve

# Load Research Dataset
df = pd.read_csv('../datasets/adherence_dataset.csv')
df.head()
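
Before going further, it can help to confirm that the loaded frame has the columns the later cells rely on. The sketch below is a minimal check, not part of the original pipeline: the helper name `missing_columns` and the sample rows are hypothetical, while the column names are the ones used elsewhere in this notebook.

```python
import pandas as pd

# Columns referenced by the later cells of this notebook
EXPECTED_COLUMNS = ["hour", "day_of_week", "is_weekend", "priority", "adherence_target"]


def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected columns that are absent from the frame (hypothetical helper)."""
    return [col for col in expected if col not in frame.columns]


# Illustrative stand-in rows; the real data comes from adherence_dataset.csv
sample = pd.DataFrame({
    "hour": [8, 21],
    "day_of_week": [0, 5],
    "is_weekend": [0, 1],
    "priority": ["high", "low"],
    "adherence_target": [1, 0],
})

print(missing_columns(sample))                             # complete frame
print(missing_columns(sample.drop(columns=["priority"])))  # one column removed
```

In the notebook itself, the same helper could be called as `missing_columns(df)` right after loading.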

### 📊 Exploratory Data Analysis (EDA)
We examine how adherence rates vary with time of day and day of week.

In [None]:
plt.figure(figsize=(12, 5))
sns.barplot(x='hour', y='adherence_target', data=df, palette='viridis')
plt.title('Adherence Probability by Hour of Day')
plt.ylabel('Adherence Rate')
plt.show()
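
The cell above covers hour of day. For day of week, one possible approach is a grouped mean, sketched here on hypothetical rows. It assumes `adherence_target` is a 0/1 label, so that the mean per group equals the adherence rate for that group.

```python
import pandas as pd

# Hypothetical rows using the same column names as the research dataset
toy = pd.DataFrame({
    "day_of_week": [0, 0, 1, 1, 5, 5, 5, 6],
    "adherence_target": [1, 1, 1, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 target within each group is that group's adherence rate
rate_by_day = toy.groupby("day_of_week")["adherence_target"].mean()
print(rate_by_day)
```

On the real data, `df.groupby('day_of_week')['adherence_target'].mean().plot(kind='bar')` would give the matching bar chart.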

### 🛠️ Feature Engineering
Features used:
- `hour` (Temporal context)
- `day_of_week` (Weekly patterns)
- `is_weekend` (Lifestyle shifts)
- `priority_encoded` (Ordinal encoding of `priority`: 2=High, 1=Normal, 0=Low)
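
The dataset already contains the temporal features listed above. If they had to be rebuilt from raw event times, the derivation might look like the sketch below. The `scheduled_time` column is hypothetical, and the Monday=0 convention and the Saturday/Sunday weekend rule are assumptions that may not match how the dataset was built.

```python
import pandas as pd

# Hypothetical raw events; `scheduled_time` is not a column of the original dataset
events = pd.DataFrame({
    "scheduled_time": pd.to_datetime([
        "2024-01-05 08:00",  # a Friday
        "2024-01-06 21:30",  # a Saturday
        "2024-01-08 13:15",  # a Monday
    ])
})

events["hour"] = events["scheduled_time"].dt.hour
events["day_of_week"] = events["scheduled_time"].dt.dayofweek  # pandas: Monday=0 ... Sunday=6
events["is_weekend"] = (events["day_of_week"] >= 5).astype(int)  # assumed: Sat/Sun = weekend

print(events[["hour", "day_of_week", "is_weekend"]])
```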

In [None]:
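# Ordinal encoding: a larger number means a higher medication priority.
# Labels missing from priority_map become NaN after .map().
priority_map = {'high': 2, 'normal': 1, 'low': 0}
df['priority_encoded'] = df['priority'].map(priority_map)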

X = df[['hour', 'day_of_week', 'is_weekend', 'priority_encoded']]
y = df['adherence_target']

# Stratify on the target so both splits keep the same class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")
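
Because `.map()` silently turns unknown labels into NaN, a quick check for unmapped priorities can catch casing or spelling differences before training. This is a sketch on hypothetical labels; whether the real `priority` column needs normalizing is an assumption.

```python
import pandas as pd

priority_map = {"high": 2, "normal": 1, "low": 0}

# Hypothetical raw labels with inconsistent casing, an unknown value, and a missing value
raw = pd.Series(["high", "Normal", "low", "urgent", None])

# Normalize whitespace and case before mapping
normalized = raw.str.strip().str.lower()
encoded = normalized.map(priority_map)

# Labels that were present in the data but not found in the mapping
unmapped = sorted(normalized[encoded.isna() & normalized.notna()].unique())
print(f"Unmapped labels: {unmapped}")
print(f"Rows without an encoding: {encoded.isna().sum()}")
```

Rows without an encoding would then need an explicit decision (drop, impute, or extend the map) before they reach the model.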

### 🤖 Model Training & Evaluation (Random Forest)
We use a Random Forest classifier because it can capture non-linear relationships and feature interactions in behavioral data without requiring feature scaling.

In [None]:
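# 100 trees, depth capped at 10 to limit overfitting; fixed seed for reproducibility
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)

# Hard class predictions for the classification report
y_pred = rf.predict(X_test)
# Probability of the second class in rf.classes_ (the positive class for 0/1 labels)
y_prob = rf.predict_proba(X_test)[:, 1]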

print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"ROC-AUC Score: {roc_auc_score(y_test, y_prob):.4f}")
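
`confusion_matrix` is imported at the top of the notebook but not used in the cell above. As a sketch of how it could complement the report, the example below runs it on hypothetical 0/1 labels and derives sensitivity and specificity from the four counts. The demo arrays are made up and are not results from the model.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions, not model output
y_true_demo = [1, 0, 1, 1, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1, 1]

# For binary 0/1 labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

sensitivity = tp / (tp + fn)  # share of class-1 rows predicted as 1
specificity = tn / (tn + fp)  # share of class-0 rows predicted as 0

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```

In the notebook, the same call would be `confusion_matrix(y_test, y_pred)`. If `adherence_target == 1` denotes an adherent dose, then missed doses are class 0 and specificity is the figure that describes how many of them the model catches.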

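### 📈 ROC Curve
The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at every decision threshold, showing the trade-off between the two.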

In [None]:
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label='Random Forest (AUC = {:.2f})'.format(roc_auc_score(y_test, y_prob)))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
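
The `thresholds` array returned by `roc_curve` can also be used to pick an operating point. One common heuristic, which the original notebook does not use, is Youden's J statistic (TPR minus FPR). The sketch below applies it to hypothetical scores; the demo arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and predicted probabilities, not model output
y_true_demo = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0])
y_score_demo = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])

fpr_demo, tpr_demo, thr_demo = roc_curve(y_true_demo, y_score_demo)

# Youden's J: vertical distance between the ROC curve and the chance diagonal
j_scores = tpr_demo - fpr_demo
best_idx = int(np.argmax(j_scores))
best_threshold = thr_demo[best_idx]

print(f"Threshold maximizing J: {best_threshold:.2f} (J = {j_scores[best_idx]:.2f})")
```

Youden's J weights false positives and false negatives equally. For missed-dose alerts, where the two errors likely carry different costs, a cost-weighted criterion may be a better fit.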