# **📦 Stroke Prediction Model – Deployment Notebook**

### **🧠 Purpose:**
This notebook loads the trained Logistic Regression model and preprocessing pipeline from the project and defines a prediction function. This is the minimal setup for:

- Running **batch predictions**
- Serving predictions via a **web app (Streamlit)**
- Ensuring that the pipeline and model behave as expected when exposed to new inputs

### **🎯 Goals:**
1. Load the saved model and preprocessing pipeline
2. Create functions to handle new patient data
3. Build a user-friendly web interface with Streamlit

### **📊 Deployment Flow:**
Raw Patient Data → Feature Engineering → Preprocessing → Model → Risk Prediction

### **🔁 What This Notebook Includes**
- Load saved model, pipeline, and feature list
- Define feature engineering class to recreate engineered features
- Define `predict_stroke_risk()` to generate predictions from raw patient input
- Run example predictions
- Instructions for Streamlit deployment


## **1️⃣ Setup: Import Required Libraries**

In [18]:
import pandas as pd
import numpy as np
import joblib
import json
from sklearn.base import BaseEstimator, TransformerMixin
import warnings
warnings.filterwarnings('ignore')

## **2️⃣ Load Model Artifacts and Saved Model Components**

This section loads the trained model, preprocessing pipeline, and list of final feature names. These artifacts were saved from the development notebook and are required to ensure consistent inference during deployment.

We saved three critical files during training:
1. **final_model.pkl**: The trained Logistic Regression model
2. **preprocessing_pipeline.pkl**: Handles scaling and encoding
3. **feature_names.json**: List of features the model expects
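
Because `joblib.load` fails on the first missing file, it can be convenient to check all three paths up front. The following is a minimal sketch, assuming the `final_model_artifacts/` directory used in the loading cell; the helper name `missing_artifacts` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is illustrative.
ARTIFACT_FILES = ("final_model.pkl", "preprocessing_pipeline.pkl", "feature_names.json")

def missing_artifacts(artifact_dir="final_model_artifacts"):
    """Return the names of expected artifact files not found in artifact_dir."""
    base = Path(artifact_dir)
    return [name for name in ARTIFACT_FILES if not (base / name).is_file()]

print(missing_artifacts())
```

An empty list means all three files are present and loading can proceed.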

In [19]:
# Define FeatureSelector before loading: the pickled pipeline refers to this class by name,
# so it must already be defined in this session when joblib.load runs
class FeatureSelector:
    def __init__(self, indices):
        self.indices = indices
    
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        return X[:, self.indices]

print("✅ FeatureSelector class defined")

✅ FeatureSelector class defined
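
`transform` uses NumPy-style positional indexing (`X[:, indices]`), so it expects an array, such as the output of an upstream scaling or encoding step, and would generally not work on a pandas DataFrame. A toy illustration with made-up data follows; the indices are arbitrary and are not the ones stored in the saved pipeline.

```python
import numpy as np

class FeatureSelector:
    """Same definition as in the cell above, repeated so this snippet runs alone."""
    def __init__(self, indices):
        self.indices = indices

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[:, self.indices]

X = np.arange(12).reshape(3, 4)                  # 3 rows, 4 columns
selected = FeatureSelector([0, 2]).transform(X)  # keep columns 0 and 2
print(selected.shape)                            # (3, 2)
```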


In [20]:
model = joblib.load("final_model_artifacts/final_model.pkl")
pipeline = joblib.load("final_model_artifacts/preprocessing_pipeline.pkl")
with open("final_model_artifacts/feature_names.json") as f:
    feature_names = json.load(f) 
print("✅ Corrected artifacts loaded")

✅ Corrected artifacts loaded


### **Info:**

* The preprocessing object is loaded as `pipeline` rather than `preprocessor`; the new name reflects that it is now a 2-stage pipeline.
* The artifact files also differ in content from the earlier version: a 7-feature pipeline now replaces the 30-feature preprocessor.

## **3️⃣ Feature Engineering for New Patients**

### ❓ Why do we need this?
During training, we created engineered features like:
- `age_group` (categorizing age into buckets)
- `risk_factor_count` (counting total risk factors)
- `is_senior` (flagging elderly patients)

New patient data won't have these features, so we need to recreate them!

### 🔧 This class:
1. Takes raw patient data (age, BMI, etc.)
2. Creates all the engineered features we used in training
3. Ensures consistency between training and deployment

In [21]:
class StrokeFeatureEngineering:
    """Recreate the engineered features from training on raw patient data."""

    def transform(self, df):
        df = df.copy()

        # Bins are right-closed: (0, 40], (40, 65], (65, 120]
        df['age_group'] = pd.cut(df['age'],
                                 bins=[0, 40, 65, 120],
                                 labels=['young', 'middle', 'senior'])

        # Missing BMI is filled with 28.0 before binning
        df['bmi_category'] = pd.cut(df['bmi'].fillna(28.0),
                                    bins=[0, 18.5, 25, 30, 100],
                                    labels=['underweight', 'normal', 'overweight', 'obese'])

        df['glucose_category'] = pd.cut(df['avg_glucose_level'],
                                        bins=[0, 100, 126, 300],
                                        labels=['normal', 'prediabetic', 'very_high'])

        # Count of risk factors present (0 to 5)
        df['risk_factor_count'] = (
            (df['hypertension'] == 1).astype(int) +
            (df['heart_disease'] == 1).astype(int) +
            (df['smoking_status'].isin(['smokes', 'formerly smoked'])).astype(int) +
            (df['age'] > 65).astype(int) +
            (df['avg_glucose_level'] > 126).astype(int)
        )
        df['bmi_glucose_ratio'] = df['bmi'].fillna(28.0) / df['avg_glucose_level']
        # Flag rows where BMI was not provided
        df['bmi_missing'] = df['bmi'].isna().astype(int)
        # Note: is_senior uses >= 65, while the other age checks use > 65
        df['is_senior'] = (df['age'] >= 65).astype(int)
        # Older than 65 with hypertension, heart disease, or glucose above 126
        df['high_risk_group'] = ((df['age'] > 65) &
                                ((df['hypertension'] == 1) |
                                 (df['heart_disease'] == 1) |
                                 (df['avg_glucose_level'] > 126))).astype(int)
        return df
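
`pd.cut` uses right-closed intervals by default, so with these bins an age of exactly 65 is labelled `middle` even though `is_senior` (which uses `>= 65`) is 1, and values outside the outermost bin edges become NaN. The following small check uses the same bins as the class, with made-up input values.

```python
import pandas as pd

ages = pd.Series([40, 65, 66])
age_group = pd.cut(ages, bins=[0, 40, 65, 120],
                   labels=['young', 'middle', 'senior'])
print(age_group.tolist())                 # ['young', 'middle', 'senior']

glucose = pd.Series([99.0, 126.0, 310.0])
glucose_category = pd.cut(glucose, bins=[0, 100, 126, 300],
                          labels=['normal', 'prediabetic', 'very_high'])
print(glucose_category.isna().tolist())   # [False, False, True]
```

A glucose reading above 300 therefore reaches the preprocessing pipeline with a missing `glucose_category`; whether the pipeline tolerates that depends on how it was fitted.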

## **4️⃣ Create Prediction Function:**

This function orchestrates the entire prediction process:
1. Accepts raw patient data
2. Applies feature engineering
3. Runs preprocessing pipeline
4. Gets model prediction
5. Returns user-friendly results

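The function accepts a single patient's raw input as a dictionary and returns a dictionary with:
- `prediction`: whether stroke is predicted (0/1)
- `probability`: predicted probability of stroke (0 to 1), rounded to three decimals
- `risk_level`: `Low`, `Moderate`, `High`, or `Very High`, based on probability cutoffs of 0.1, 0.3, and 0.5
- `label`: a display string, either `Low Stroke Risk` or `Stroke Risk Detected`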
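
A missing key in the raw input would surface as a `KeyError` deep inside feature engineering, so a quick check before predicting can give a clearer message. This is a minimal sketch: the helper `missing_fields` is hypothetical, and treating every key from the example patients as required is an assumption, since only some of them are read directly by `StrokeFeatureEngineering`.

```python
# Keys used in the example patient records; treating all of them as
# required is an assumption, not something the saved pipeline confirms.
EXPECTED_FIELDS = (
    'gender', 'age', 'hypertension', 'heart_disease', 'ever_married',
    'work_type', 'Residence_type', 'avg_glucose_level', 'bmi', 'smoking_status',
)

def missing_fields(patient_input):
    """Return the expected keys that are absent from a raw patient dict."""
    return [field for field in EXPECTED_FIELDS if field not in patient_input]

print(missing_fields({'age': 50, 'bmi': 27.0}))
```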

In [26]:
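def predict_stroke_risk(patient_input, model, pipeline):
    """
    Predict stroke risk for a single patient.

    patient_input: dict of raw patient fields (see the test cases below).
    model, pipeline: the fitted classifier and the 7-feature preprocessing
    pipeline loaded above.
    Returns a dict with prediction, probability, risk_level, and label.
    """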
    df = pd.DataFrame([patient_input])
    
    feature_engineer = StrokeFeatureEngineering()
    df_engineered = feature_engineer.transform(df)
    
    X_processed = pipeline.transform(df_engineered)
    
    probability = model.predict_proba(X_processed)[0][1]
    prediction = model.predict(X_processed)[0]
    
    if probability < 0.1:
        risk_level = "Low"
        label = "Low Stroke Risk"
    elif probability < 0.3:
        risk_level = "Moderate"
        label = "Low Stroke Risk"
    elif probability < 0.5:
        risk_level = "High"
        label = "Stroke Risk Detected" 
    else:
        risk_level = "Very High"
        label = "Stroke Risk Detected"
    
    return {
        "prediction": int(prediction),
        "probability": round(probability, 3),
        "risk_level": risk_level,
        "label": label 
    }
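
The probability cutoffs can also be factored into a standalone helper, which makes the banding easy to unit test without the model. The sketch below mirrors the thresholds in `predict_stroke_risk`; the name `risk_band` is hypothetical. Note that the `High` band (0.3 to 0.5) is labelled `Stroke Risk Detected` even though, assuming the model uses the default 0.5 decision threshold, `prediction` would be 0 in that range.

```python
def risk_band(probability):
    """Map a stroke probability to (risk_level, label) using the cutoffs above."""
    if probability < 0.1:
        return "Low", "Low Stroke Risk"
    if probability < 0.3:
        return "Moderate", "Low Stroke Risk"
    if probability < 0.5:
        return "High", "Stroke Risk Detected"
    return "Very High", "Stroke Risk Detected"

print(risk_band(0.05))   # ('Low', 'Low Stroke Risk')
print(risk_band(0.42))   # ('High', 'Stroke Risk Detected')
```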

## **5️⃣ Test with Example Patient 🧪**

The following code tests the deployment-ready prediction function on two simulated patient records. This serves as a sanity check that all artifacts were loaded correctly and that the prediction process works end-to-end:

- **Case 1: High-risk elderly patient**  
- **Case 2: Low-risk young patient**

In [28]:
# Test case 1: High-risk elderly patient
test_patient_1 = {
    'gender': 'Female',
    'age': 75,
    'hypertension': 1,
    'heart_disease': 1,
    'ever_married': 'Yes',
    'work_type': 'Private',
    'Residence_type': 'Urban',
    'avg_glucose_level': 228.69,
    'bmi': 36.6,
    'smoking_status': 'formerly smoked'
}

result_1 = predict_stroke_risk(test_patient_1, model, pipeline)
print("🧓 TEST CASE 1: High-risk elderly patient")
print(json.dumps(result_1, indent=2))

# Test case 2: Low-risk young patient  
test_patient_2 = {
    'gender': 'Male',
    'age': 32,
    'hypertension': 0,
    'heart_disease': 0,
    'ever_married': 'No',
    'work_type': 'Private',
    'Residence_type': 'Urban',
    'avg_glucose_level': 85.5,
    'bmi': 23.1,
    'smoking_status': 'never smoked'
}

result_2 = predict_stroke_risk(test_patient_2, model, pipeline)
print("\n🧑 TEST CASE 2: Low-risk young patient")
print(json.dumps(result_2, indent=2))

🧓 TEST CASE 1: High-risk elderly patient
{
  "prediction": 1,
  "probability": 0.767,
  "risk_level": "Very High",
  "label": "Stroke Risk Detected"
}

🧑 TEST CASE 2: Low-risk young patient
{
  "prediction": 0,
  "probability": 0.112,
  "risk_level": "Moderate",
  "label": "Low Stroke Risk"
}
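
For batch predictions, the same single-patient function can be applied to a list of records. The following is a minimal sketch: `predict_batch` is a hypothetical helper, and a stand-in predictor is used so the snippet runs without the saved artifacts.

```python
def predict_batch(records, predict_fn):
    """Apply predict_fn to each raw patient dict and collect the results."""
    results = []
    for index, record in enumerate(records):
        result = dict(predict_fn(record))
        result["row"] = index  # remember which input produced this result
        results.append(result)
    return results

# Stand-in predictor for illustration only; in the notebook one would pass
#   lambda record: predict_stroke_risk(record, model, pipeline)
def stub_predict(record):
    return {"prediction": int(record["age"] > 65)}

batch_results = predict_batch([{"age": 75}, {"age": 32}], stub_predict)
print(batch_results)
```

Passing the predictor as an argument keeps the loop independent of how the model and pipeline are loaded.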


## **6️⃣ Deploy with Streamlit:**

Streamlit converts Python scripts into interactive web apps. No web development knowledge needed!

### 🚀 Deployment Steps:

1. **Save the Streamlit app code** as `streamlit_app.py`
2. **Install Streamlit**: `pip install streamlit`
3. **Run the app**: `streamlit run streamlit_app.py`
4. **Your browser will open** automatically at http://localhost:8501
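
The app code itself is not shown in this notebook. One piece such an app typically needs is a conversion from form values to the raw record format used in the test cases. The sketch below assumes the form collects Yes/No answers for the two binary conditions and may leave BMI blank; the function name and those conventions are hypothetical.

```python
def form_to_patient_record(form):
    """Convert form values to the raw dict format used by predict_stroke_risk."""
    record = dict(form)
    # The raw inputs encode these conditions as 0/1 integers.
    for field in ('hypertension', 'heart_disease'):
        record[field] = 1 if form[field] == 'Yes' else 0
    record['age'] = float(form['age'])
    record['avg_glucose_level'] = float(form['avg_glucose_level'])
    # A blank BMI becomes NaN so the feature engineering step sets bmi_missing.
    blank_bmi = form['bmi'] in (None, '')
    record['bmi'] = float('nan') if blank_bmi else float(form['bmi'])
    return record

example_form = {
    'gender': 'Female', 'age': '75', 'hypertension': 'Yes', 'heart_disease': 'No',
    'ever_married': 'Yes', 'work_type': 'Private', 'Residence_type': 'Urban',
    'avg_glucose_level': '228.69', 'bmi': '', 'smoking_status': 'formerly smoked',
}
print(form_to_patient_record(example_form))
```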