# 🩺 Diabetes Prediction: End-to-End Demo

This notebook demonstrates:
- Loading the trained model
- Running a prediction
- Viewing model metrics
- Running a RAG-enhanced explanation
- Visualizing ROC curve
- Visualizing feature importance

---
⚠ **Note:** Replace API keys in `.env` before running LLM features.


In [None]:
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from joblib import load
import os

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

MODEL_PATH = "src/models/model.pkl"
model = load(MODEL_PATH)

print("Model loaded successfully.")

## 🔍 Example Input
Modify these values to test different predictions.

In [None]:
sample = {
    "Pregnancies": 2,
    "Glucose": 130,
    "BloodPressure": 70,
    "SkinThickness": 20,
    "Insulin": 80,
    "BMI": 30.5,
    "DiabetesPedigreeFunction": 0.45,
    "Age": 34
}

sample_df = pd.DataFrame([sample])
sample_df
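
A typo in a feature name or a non-numeric value usually only surfaces as an error at prediction time. The following is a minimal sketch of a pre-check; `EXPECTED_FEATURES` and `validate_sample` are hypothetical names introduced here, and the feature list is copied from the example input above rather than read from the trained model.

```python
# Assumption: the expected features are exactly the keys of the example input.
EXPECTED_FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def validate_sample(sample, expected=EXPECTED_FEATURES):
    """Return a list of problems; an empty list means the sample looks usable."""
    problems = []
    missing = [name for name in expected if name not in sample]
    extra = [name for name in sample if name not in expected]
    if missing:
        problems.append(f"missing features: {missing}")
    if extra:
        problems.append(f"unexpected features: {extra}")
    for name in expected:
        if name not in sample:
            continue
        value = sample[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} is not numeric: {value!r}")
        elif value < 0:
            problems.append(f"{name} is negative: {value!r}")
    return problems
```

Calling `validate_sample(sample)` before building `sample_df` returns an empty list for the values shown above.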

## 🤖 Run Prediction

In [None]:
prediction = model.predict(sample_df)[0]
proba = model.predict_proba(sample_df)[0][1]

print(f"Prediction: {prediction}")
print(f"Probability of Diabetes: {proba:.4f}")
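
For many scikit-learn binary classifiers, the label returned by `predict` is the class with the higher predicted probability, which amounts to a 0.5 cutoff on the positive-class probability. A screening use case may prefer a different cutoff. The sketch below applies an explicit threshold; `label_from_probability` is a hypothetical helper, not part of the project, and the probability used is illustrative rather than a model output.

```python
def label_from_probability(proba, threshold=0.5):
    """Map a positive-class probability to a 0/1 label using a chosen cutoff."""
    if not 0.0 <= proba <= 1.0:
        raise ValueError(f"probability out of range: {proba}")
    return 1 if proba >= threshold else 0


# Illustrative probability, not an output of the trained model.
example_proba = 0.42
for cutoff in (0.3, 0.5):
    label = label_from_probability(example_proba, cutoff)
    print(f"threshold={cutoff}: label={label}")
```

Lowering the cutoff flags more cases as positive, which raises sensitivity at the cost of more false positives.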

## 📊 ROC Curve Visualization

In [None]:
from sklearn.model_selection import train_test_split

# Load processed dataset (saved during training)
data = pd.read_csv("data/processed/processed_diabetes.csv")
X = data.drop("Outcome", axis=1)
y = data["Outcome"]

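# Recreate a held-out split. This matches the training split only if training
# used the same test_size and random_state; otherwise the AUC may be optimistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
proba_test = model.predict_proba(X_test)[:, 1]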

fpr, tpr, thresholds = roc_curve(y_test, proba_test)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(7, 7))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], "k--")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid(True)
plt.show()
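
To make the plot less of a black box, the sketch below computes ROC points and the area under them by hand on a tiny made-up example. It sweeps the decision threshold from high to low and integrates with the trapezoidal rule, which is the rule `sklearn.metrics.auc` applies. The helper names `roc_points` and `trapezoid_auc` and the toy data are assumptions introduced for illustration.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold over the scores."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    # Sort by score, highest first; lowering the threshold admits one score group at a time.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Handle tied scores together so a tie yields a single point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
        for k in range(1, len(fpr))
    )


# Toy labels and scores, made up for illustration.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_fpr, toy_tpr = roc_points(toy_labels, toy_scores)
print(trapezoid_auc(toy_fpr, toy_tpr))  # 0.75
```

An AUC of 0.75 here means that a randomly chosen positive example outranks a randomly chosen negative one in three of the four possible pairs.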

## 🔥 Feature Importance (for tree models)

In [None]:
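try:
    importance = model.feature_importances_
except AttributeError:
    # Estimators without this attribute (e.g. linear models) end up here.
    print("Feature importance not available for this model.")
else:
    # Plot outside the try block so plotting errors are not silently swallowed.
    plt.figure(figsize=(8, 5))
    plt.barh(X.columns, importance)
    plt.xlabel("Importance")
    plt.title("Feature Importance")
    plt.show()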
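
When the model is not tree-based, one possible fallback is to rank features by the absolute value of linear coefficients. The sketch below is a hypothetical helper (`ranked_importances` is not part of the project) that tries `feature_importances_` first and `coef_` second. It assumes a binary linear model whose `coef_` holds a single row of weights, and coefficient magnitudes are only comparable when the features share a scale.

```python
def ranked_importances(model, feature_names):
    """Return (name, score) pairs sorted from most to least important, or None."""
    scores = getattr(model, "feature_importances_", None)
    if scores is None:
        coef = getattr(model, "coef_", None)
        if coef is None:
            return None
        # Assumption: a 2-D coef_ has one row for the positive class.
        row = coef[0] if hasattr(coef[0], "__len__") else coef
        scores = [abs(weight) for weight in row]
    pairs = [(name, float(score)) for name, score in zip(feature_names, scores)]
    # Highest score first, so the top of the list is the most influential feature.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Passing `model` and `list(X.columns)` would give an ordered list that can be fed to `plt.barh`, or `None` when neither attribute exists.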

## 🧠 RAG Explanation (LLM-Enhanced)

This cell retrieves related context from a FAISS index and passes it to an LLM, which generates a medical explanation for the prediction.

⚠ You must configure your `.env` with your API key.


In [None]:
from src.rag import explain_prediction

response = explain_prediction(sample_df.iloc[0].to_dict())
print(response)
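
Because the RAG cell depends on an API key from `.env`, a quick check beforehand can give a clearer message than a failed LLM call. This is a minimal sketch: `missing_env_vars` is a hypothetical helper, and `LLM_API_KEY` is a placeholder, since the variable name the project actually reads is not shown in this notebook.

```python
import os


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or blank in the given environment mapping."""
    return [name for name in names if not environ.get(name, "").strip()]


# Placeholder name: substitute the variable your `.env` actually defines.
REQUIRED_VARS = ["LLM_API_KEY"]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print(f"Skipping RAG explanation; set these in .env first: {missing}")
else:
    print("API key found; the RAG cell can be run.")
```

The check runs after `load_dotenv()` has been called, so values defined in `.env` are already present in `os.environ`.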