# 🧪 Notebook 6 — Model Deployment & Inference

**Objective:**  

Prepare the final, tuned model for deployment. This notebook will:

- Load the best model(s) from Notebook 5.
- Build a preprocessing + prediction pipeline.
- Create functions for inference on new data.
- Test predictions with sample inputs.
- Visualize feature importance for interpretability.
- Save the pipeline for deployment.
 
 
This ensures that new data can be fed into the model consistently and predictions are reproducible.

---


## 6.1 Load Tuned Model & Scaler

The saved pipelines already bundle scaling with the model, so there is no separate scaler to load: loading a pipeline restores both steps at once.


In [15]:
import joblib
import pandas as pd

# Load best pipelines
pipeline_log_reg = joblib.load("models/tuned/best_log_reg_pipeline.pkl")
pipeline_rf      = joblib.load("models/tuned/best_rf_pipeline.pkl")

print("✅ Best model pipelines loaded successfully")

✅ Best model pipelines loaded successfully
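
As a rough illustration of why no separate scaler is needed, the sketch below fits a toy scaler + model pipeline, saves it with `joblib.dump`, reloads it, and checks that both steps come back together. The toy data, the temporary file name, and the variable names are assumptions for illustration; only the step names `scaler` and `log_reg` mirror the pipeline printed later in this notebook.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real training set
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(40, 3))
y_toy = (X_toy[:, 0] > 0).astype(int)

toy_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("log_reg", LogisticRegression(max_iter=1000)),
])
toy_pipeline.fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy_pipeline.pkl"  # hypothetical file name
    joblib.dump(toy_pipeline, path)
    reloaded = joblib.load(path)

# The scaler travels inside the pipeline, so one load restores both steps
step_names = list(reloaded.named_steps)
same_predictions = bool(
    (reloaded.predict(X_toy) == toy_pipeline.predict(X_toy)).all()
)
print(step_names, same_predictions)
```

The same check (`list(pipeline.named_steps)`) can be run on a loaded pipeline to confirm which preprocessing steps it carries.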


---

## 6.2 Build Inference Pipeline

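The function takes a DataFrame containing the same feature columns the pipeline was fit on and applies the pipeline (scaling + model) end-to-end. Categorical encoding is not part of the saved pipeline, so it must already be applied to the input.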

In [16]:
def predict_pipeline(model_pipeline, new_data):
    """
    Takes a pipeline (preprocessing + model) and a DataFrame of new data,
    returns predicted class and probability for heart disease.
    """
    pred_class = model_pipeline.predict(new_data)
    pred_proba = model_pipeline.predict_proba(new_data)[:, 1]
    return pred_class, pred_proba
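
Note that `predict_proba(...)[:, 1]` picks the second probability column, which is the positive class only if that class is the second entry in the fitted classifier's `classes_`. The sketch below checks this on a toy pipeline. It assumes the positive (heart disease) label is `1`; the toy data and the helper name `positive_class_proba` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def positive_class_proba(model_pipeline, new_data, positive_label=1):
    """Return P(positive_label), locating its column via classes_
    instead of hard-coding index 1."""
    classes = list(model_pipeline.classes_)
    col = classes.index(positive_label)
    return model_pipeline.predict_proba(new_data)[:, col]


# Hypothetical toy data with labels 0 and 1
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

toy_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("log_reg", LogisticRegression(max_iter=1000)),
]).fit(X_toy, y_toy)

proba = positive_class_proba(toy_pipeline, X_toy[:5])

# With labels [0, 1], the looked-up column coincides with index 1
matches_index_1 = bool(
    np.allclose(proba, toy_pipeline.predict_proba(X_toy[:5])[:, 1])
)
print(toy_pipeline.classes_, matches_index_1)
```

With integer labels `0` and `1` the two approaches agree, so `[:, 1]` is safe here; the lookup only matters if the labels are encoded differently.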

---

In [19]:
# Look at the steps in your pipeline
print(pipeline_log_reg)


Pipeline(steps=[('scaler', StandardScaler()),
                ('log_reg',
                 LogisticRegression(C=0.01, max_iter=1000, random_state=42,
                                    solver='liblinear'))])


---

## 6.3 Test Predictions with Sample Data

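We can test both Logistic Regression and Random Forest here. Users can input their own data, but it must use exactly the columns the pipelines were fit on, including the one-hot encoded categorical columns (for example `cp_typical angina` rather than a raw `cp` column).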


In [18]:
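# Example sample input (replace with realistic values).
# Caveat: cp, restecg, and sex are given as raw columns here, but the pipelines were
# fit on one-hot encoded columns, so predicting on this frame raises the ValueError
# shown in the output.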
sample_data = pd.DataFrame({
    "age": [55],
    "sex": [1],
    "cp": [3],
    "trestbps": [140],
    "chol": [220],
    "fbs": [0],
    "restecg": [1],
    "thalch": [150],
    "exang": [0],
    "oldpeak": [1.5],
})

# Logistic Regression prediction
pred_class_lr, pred_proba_lr = predict_pipeline(pipeline_log_reg, sample_data)
print("Logistic Regression Prediction:", pred_class_lr[0])
print("Logistic Regression Probability:", round(pred_proba_lr[0], 3))

# Random Forest prediction
pred_class_rf, pred_proba_rf = predict_pipeline(pipeline_rf, sample_data)
print("Random Forest Prediction:", pred_class_rf[0])
print("Random Forest Probability:", round(pred_proba_rf[0], 3))

ValueError: The feature names should match those that were passed during fit.
Feature names unseen at fit time:
- cp
- restecg
- sex
Feature names seen at fit time, yet now missing:
- cp_atypical angina
- cp_non-anginal
- cp_typical angina
- restecg_normal
- restecg_st-t abnormality
- ...
