# Heart Disease Prediction - Model Deployment Guide
## Using the Hierarchical Classifier in a Web Application

This notebook demonstrates:
1. How to load the trained hierarchical model
2. Understanding input data format and requirements
3. Making predictions with sample patient data
4. Interpreting prediction outputs
5. Integrating the model into a Flask or FastAPI web application

## 1. Setup and Load Required Artifacts

In [89]:
# Import required libraries
import pandas as pd
import numpy as np
import pickle
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully")

Libraries imported successfully


In [90]:
# Define HierarchicalClassifier class (must match the training definition)
class HierarchicalClassifier:
    """
    Two-stage hierarchical classifier:
    Stage 1: Binary (Disease vs No Disease)
    Stage 2: Multi-class (Severity Level for Disease cases)
    """
    def __init__(self, binary_model, multiclass_model):
        self.binary_model = binary_model
        self.multiclass_model = multiclass_model
    
    def predict(self, X):
        # Stage 1: Binary prediction
        binary_pred = self.binary_model.predict(X)
        
        # Initialize final predictions
        final_pred = np.zeros(len(X), dtype=int)
        
        # Stage 2: For disease cases, predict severity
        disease_mask = binary_pred == 1
        if disease_mask.sum() > 0:
            X_disease = X[disease_mask]
            severity_pred = self.multiclass_model.predict(X_disease)
            final_pred[disease_mask] = severity_pred
        
        return final_pred
    
    def predict_proba(self, X):
        # Get binary probabilities
        binary_proba = self.binary_model.predict_proba(X)
        
        # Initialize final probabilities (5 classes: 0-4)
        final_proba = np.zeros((len(X), 5))
        
        # No disease probability (class 0)
        final_proba[:, 0] = binary_proba[:, 0]
        
        # Disease cases
        disease_prob = binary_proba[:, 1]
        
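        # Get multi-class probabilities.
        # NOTE: column i is read as severity level i below. Probability that the
        # severity model puts in its column 0 is not carried over, so a row of
        # final_proba can sum to less than 1 (see the sample outputs).
        multi_proba = self.multiclass_model.predict_proba(X)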
        
        # Distribute disease probability across severity levels
        for i in range(1, 5):
            final_proba[:, i] = disease_prob * multi_proba[:, i]
        
        return final_proba

print("HierarchicalClassifier class defined")

HierarchicalClassifier class defined
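
`predict_proba` above indexes the severity model's output by position, so the five final probabilities only sum to 1 when no probability sits in that model's column 0 (the Sample 2 output in Section 4 sums to about 82%). The following is a minimal sketch of a label-aligned combination for a single sample, in plain Python. `combine_probabilities` and its arguments are hypothetical names, and which labels the trained severity model actually contains is an assumption to check against its `classes_` attribute.

```python
def combine_probabilities(binary_row, multi_row, classes, n_levels=5):
    """Combine stage-1 and stage-2 probabilities for one sample.

    binary_row : [P(no disease), P(disease)] from the binary model
    multi_row  : severity-model probabilities, one per entry in `classes`
    classes    : severity labels in column order (e.g. model.classes_)
    """
    p_healthy, p_disease = binary_row
    final = [0.0] * n_levels
    final[0] = p_healthy
    for label, p in zip(classes, multi_row):
        # Route each column by its label instead of its position
        final[int(label)] += p_disease * p
    return final


# Severity model trained on labels 1-4 only (four columns)
probs = combine_probabilities([0.6, 0.4], [0.5, 0.25, 0.125, 0.125], [1, 2, 3, 4])
print(probs, sum(probs))
```

With this routing, a severity model that also emits label 0 contributes that probability to level 0 rather than dropping it, so each row still sums to 1.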


In [91]:
# Load preprocessing artifacts
with open('../data/processed/preprocessing_artifacts.pkl', 'rb') as f:
    preprocessing_artifacts = pickle.load(f)

print("Loaded Preprocessing Artifacts:")
print(f"  - Scaler: {type(preprocessing_artifacts['scaler']).__name__}")
print(f"  - KNN Imputer: {type(preprocessing_artifacts['knn_imputer']).__name__}")
print(f"  - Label Encoders: {len(preprocessing_artifacts['label_encoders'])} encoders")
print(f"  - Feature Names: {len(preprocessing_artifacts['feature_names'])} features")
print(f"  - Numeric Features: {preprocessing_artifacts['numeric_features']}")
print(f"  - Categorical Features: {preprocessing_artifacts['categorical_features']}")

Loaded Preprocessing Artifacts:
  - Scaler: StandardScaler
  - KNN Imputer: KNNImputer
  - Label Encoders: 8 encoders
  - Feature Names: 18 features
  - Numeric Features: ['age', 'trestbps', 'chol', 'thalch', 'oldpeak']
  - Categorical Features: ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal']


In [92]:
# Load the hierarchical model
with open('../models/hierarchical_classifier.pkl', 'rb') as f:
    hierarchical_model = pickle.load(f)

print("Hierarchical model loaded successfully")
print(f"  - Binary Model: {type(hierarchical_model.binary_model).__name__}")
print(f"  - Multi-class Model: {type(hierarchical_model.multiclass_model).__name__}")

Hierarchical model loaded successfully
  - Binary Model: XGBClassifier
  - Multi-class Model: GradientBoostingClassifier


In [93]:
# Load model metadata
with open('../models/model_metadata.pkl', 'rb') as f:
    metadata = pickle.load(f)

print("Model Metadata:")
print(f"  - Best Binary Model: {metadata['best_binary_model_name']}")
print(f"  - Binary F1-Score: {metadata['best_binary_f1']:.4f}")
print(f"  - Best Multi-class Model: {metadata['best_multiclass_model_name']}")
print(f"  - Multi-class F1-Score: {metadata['best_multiclass_f1']:.4f}")
print(f"  - Hierarchical F1-Score: {metadata['hierarchical_f1']:.4f}")
print(f"  - Total Features: {len(metadata['feature_names'])}")

Model Metadata:
  - Best Binary Model: XGBoost (Tuned)
  - Binary F1-Score: 0.8692
  - Best Multi-class Model: Gradient Boosting
  - Multi-class F1-Score: 0.5634
  - Hierarchical F1-Score: 0.6048
  - Total Features: 18


## 2. Understanding Input Data Format

### Raw Input Features (What the User Provides)

The model expects **13 clinical features** from the user:

#### Numeric Features (5):
1. **age**: Age in years (e.g., 63)
2. **trestbps**: Resting blood pressure in mm Hg (e.g., 145)
3. **chol**: Serum cholesterol in mg/dl (e.g., 233)
4. **thalch**: Maximum heart rate achieved (e.g., 150)
5. **oldpeak**: ST depression induced by exercise relative to rest (e.g., 2.3)

#### Categorical Features (8):
6. **sex**: Gender
   - Values: 'Male' or 'Female'
7. **cp**: Chest pain type
   - Values: 'typical angina', 'atypical angina', 'non-anginal', 'asymptomatic'
8. **fbs**: Fasting blood sugar > 120 mg/dl
   - Values: 'TRUE' or 'FALSE' (or True/False)
9. **restecg**: Resting electrocardiographic results
   - Values: 'normal', 'lv hypertrophy', 'st-t abnormality'
10. **exang**: Exercise-induced angina
    - Values: 'TRUE' or 'FALSE' (or True/False)
11. **slope**: Slope of peak exercise ST segment
    - Values: 'upsloping', 'flat', 'downsloping'
12. **ca**: Number of major vessels colored by fluoroscopy
    - Values: '0', '1', '2', '3', '4' (or integers 0-4)
13. **thal**: Thalassemia test result
    - Values: 'normal', 'fixed defect', 'reversable defect'

### Important Notes:
- **Missing Values**: Can be provided as `None`, `np.nan`, or empty string - `preprocess_input` fills them with defaults (fixed values for numeric features, the first encoder class for categorical ones)
- **Case Sensitivity**: Categorical values are case-insensitive (will be standardized)
- **Type Flexibility**: Numeric values can be provided as strings (will be converted)

## 3. Create Prediction Function

In [94]:
def preprocess_input(raw_input_dict):
    """
    Preprocess raw user input to model-ready format.
    
    Parameters:
    -----------
    raw_input_dict : dict
        Dictionary with 13 clinical features
    
    Returns:
    --------
    X_processed : numpy array
        Processed feature array ready for model prediction
    """
    # Define identifier features to drop (not used in prediction)
    identifier_features = ['id', 'dataset']
    
    # Create DataFrame from input
    df = pd.DataFrame([raw_input_dict])
    
    # Drop identifier columns if present
    for col in identifier_features:
        if col in df.columns:
            df = df.drop(columns=[col])
    
    # Convert numeric features
    numeric_features = preprocessing_artifacts['numeric_features']
    for col in numeric_features:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')
    
    # Standardize categorical values (handle case sensitivity)
    categorical_features = preprocessing_artifacts['categorical_features']
    for col in categorical_features:
        if col in df.columns:
            df[col] = df[col].astype(str).str.lower()
    
    # Separate features
    X = df.copy()
    
    # Create missing indicators for high-missing features (BEFORE encoding)
    # After astype(str).str.lower(), None/NaN/'' arrive as 'none', 'nan', ''
    high_missing_features = ['ca', 'thal', 'slope']
    missing_markers = ['', 'none', 'nan']
    for col in high_missing_features:
        if col in X.columns:
            X[f'{col}_missing'] = (X[col].isnull() | X[col].isin(missing_markers)).astype(int)
    
    # Encode categorical variables
    label_encoders = preprocessing_artifacts['label_encoders']
    for col in categorical_features:
        if col in X.columns:
            le = label_encoders.get(col)
            if le is not None:
                # Get valid classes from encoder (lowercase for comparison)
                valid_classes = [str(c).lower() for c in le.classes_]
                
                # Map values to valid classes, falling back to le.classes_[0]
                def map_to_valid_class(x):
                    if pd.isna(x) or str(x).lower() == 'nan' or str(x).lower() == 'none':
                        # Missing value: fall back to the first encoder class.
                        # LabelEncoder sorts classes_, so this is the alphabetically
                        # first label, not necessarily the most frequent one.
                        return le.classes_[0]
                    x_lower = str(x).lower()
                    if x_lower in valid_classes:
                        # Find the original case version
                        idx = valid_classes.index(x_lower)
                        return le.classes_[idx]
                    else:
                        # Unknown category: same fallback (first encoder class)
                        return le.classes_[0]
                
                X[col] = X[col].apply(map_to_valid_class)
                X[col] = le.transform(X[col])
            else:
                # No encoder available, fill with 0
                X[col] = 0
    
    # Convert all to numeric
    for col in X.columns:
        if X[col].dtype == 'object':
            X[col] = pd.to_numeric(X[col], errors='coerce')
    
    # Simple imputation for any remaining missing values
    # For single-sample prediction, hard-coded defaults are used instead of the
    # saved KNN imputer; the values below are meant to approximate training medians
    imputation_values = {
        'age': 55,  # default age (years)
        'trestbps': 130,  # default resting blood pressure (mm Hg)
        'chol': 240,  # default cholesterol (mg/dl)
        'thalch': 150,  # default maximum heart rate
        'oldpeak': 0.8,  # default ST depression
    }
    
    for col in X.columns:
        if X[col].isnull().any():
            if col in imputation_values:
                X[col] = X[col].fillna(imputation_values[col])
            else:
                X[col] = X[col].fillna(0)
    
    # NOW do feature engineering
    # 1. Age groups
    if 'age' in X.columns:
        X['age_group'] = pd.cut(X['age'], bins=[0, 45, 60, 75, 100], labels=[0, 1, 2, 3])
        X['age_group'] = X['age_group'].astype(float)
        X['age_group'] = X['age_group'].fillna(1)  # Default to middle group
    
    # 2. Blood pressure categories
    if 'trestbps' in X.columns:
        X['bp_category'] = pd.cut(X['trestbps'], bins=[0, 120, 130, 140, 200], labels=[0, 1, 2, 3])
        X['bp_category'] = X['bp_category'].astype(float)
        X['bp_category'] = X['bp_category'].fillna(1)  # Default to normal-high
    
    # 3. Cholesterol categories
    if 'chol' in X.columns:
        X['chol_category'] = pd.cut(X['chol'], bins=[0, 200, 240, 500], labels=[0, 1, 2])
        X['chol_category'] = X['chol_category'].astype(float)
        X['chol_category'] = X['chol_category'].fillna(1)  # Default to borderline
    
    # 4. Heart rate reserve
    if 'age' in X.columns and 'thalch' in X.columns:
        predicted_max_hr = 220 - X['age']
        X['hr_reserve'] = predicted_max_hr - X['thalch']
    
    # 5. Cardiovascular risk score
    risk_score = 0
    if 'trestbps' in X.columns:
        risk_score += (X['trestbps'] > 140).astype(int) * 2
    if 'chol' in X.columns:
        risk_score += (X['chol'] > 240).astype(int) * 2
    if 'fbs' in X.columns:
        risk_score += X['fbs']
    X['cv_risk_score'] = risk_score
    
    # Fill any remaining NaN
    X = X.fillna(0)
    
    # Ensure all expected features are present
    expected_features = preprocessing_artifacts['feature_names']
    for feature in expected_features:
        if feature not in X.columns:
            X[feature] = 0
    
    # Reorder columns to match training
    X = X[expected_features]
    
    # Scale features
    scaler = preprocessing_artifacts['scaler']
    X_scaled = scaler.transform(X)
    
    return X_scaled

print("Preprocessing function defined")

Preprocessing function defined
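
To make the feature-engineering step concrete, here is a worked example in plain Python that mirrors the bin edges and formulas from `preprocess_input`. It is a sketch: `bin_index` and `engineer_features` are hypothetical helpers that imitate `pd.cut` with right-closed intervals, and `fbs=1` is assumed to be the encoded value for TRUE.

```python
from bisect import bisect_left


def bin_index(value, edges, default=1):
    """Index of the right-closed interval (edges[i], edges[i+1]] holding value.

    Mirrors pd.cut(..., right=True); values outside the edges get `default`,
    like the fillna(1) calls in preprocess_input.
    """
    i = bisect_left(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else default


def engineer_features(age, trestbps, chol, thalch, fbs):
    """Compute the five engineered features for one patient."""
    return {
        "age_group": bin_index(age, [0, 45, 60, 75, 100]),
        "bp_category": bin_index(trestbps, [0, 120, 130, 140, 200]),
        "chol_category": bin_index(chol, [0, 200, 240, 500]),
        "hr_reserve": (220 - age) - thalch,
        "cv_risk_score": 2 * (trestbps > 140) + 2 * (chol > 240) + fbs,
    }


# Numeric inputs of Sample 2 from Section 4
print(engineer_features(age=58, trestbps=140, chol=240, thalch=130, fbs=1))
```

For these inputs the result is `age_group=1`, `bp_category=2`, `chol_category=1`, `hr_reserve=32`, and `cv_risk_score=1`. A blood pressure of exactly 140 and a cholesterol of exactly 240 do not add to the risk score because the thresholds use a strict `>`.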


In [95]:
def predict_heart_disease(patient_data):
    """
    Predict heart disease severity for a patient.
    
    Parameters:
    -----------
    patient_data : dict
        Dictionary containing 13 clinical features
    
    Returns:
    --------
    result : dict
        Dictionary containing:
        - severity_level: int (0-4)
        - severity_label: str
        - risk_category: str
        - confidence: float (probability)
        - recommendation: str
        - all_probabilities: dict (probability per severity level)
    """
    # Preprocess input
    X_processed = preprocess_input(patient_data)
    
    # Get prediction
    severity_level = hierarchical_model.predict(X_processed)[0]
    
    # Get prediction probabilities
    probabilities = hierarchical_model.predict_proba(X_processed)[0]
    confidence = probabilities[severity_level]
    
    # Map severity to labels
    severity_labels = {
        0: 'No Heart Disease',
        1: 'Mild Heart Disease',
        2: 'Moderate Heart Disease',
        3: 'Severe Heart Disease',
        4: 'Very Severe Heart Disease'
    }
    
    # Risk categories
    risk_categories = {
        0: 'Low Risk',
        1: 'Moderate Risk',
        2: 'High Risk',
        3: 'Very High Risk',
        4: 'Critical Risk'
    }
    
    # Recommendations
    recommendations = {
        0: 'Maintain healthy lifestyle. Regular check-ups recommended.',
        1: 'Consult with your doctor. Lifestyle modifications and monitoring recommended.',
        2: 'Medical attention needed. Schedule appointment with cardiologist for evaluation.',
        3: 'Urgent medical attention required. Immediate cardiology consultation recommended.',
        4: 'CRITICAL: Immediate emergency medical attention required. Go to ER or call emergency services.'
    }
    
    result = {
        'severity_level': int(severity_level),
        'severity_label': severity_labels[severity_level],
        'risk_category': risk_categories[severity_level],
        'confidence': float(confidence),
        'recommendation': recommendations[severity_level],
        'all_probabilities': {
            'Level 0 (No Disease)': float(probabilities[0]),
            'Level 1 (Mild)': float(probabilities[1]),
            'Level 2 (Moderate)': float(probabilities[2]),
            'Level 3 (Severe)': float(probabilities[3]),
            'Level 4 (Very Severe)': float(probabilities[4])
        }
    }
    
    return result

print("Prediction function defined")

Prediction function defined
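
When interpreting a result dictionary, it can help to flag predictions that deserve a second look. The sketch below is one possible check, assuming the result format returned by `predict_heart_disease`. `review_result`, the 0.6 confidence threshold, and the 0.01 tolerance are illustrative choices, not values taken from the model.

```python
def review_result(result, min_confidence=0.6, tolerance=0.01):
    """Return a list of human-readable caveats for one prediction result."""
    caveats = []
    total = sum(result["all_probabilities"].values())
    if abs(total - 1.0) > tolerance:
        caveats.append(f"probabilities sum to {total:.2%}, not 100%")
    if result["confidence"] < min_confidence:
        caveats.append(
            f"confidence {result['confidence']:.2%} is below {min_confidence:.0%}"
        )
    return caveats


# Probabilities copied from the Sample 2 output in Section 4
sample_2 = {
    "confidence": 0.5215,
    "all_probabilities": {
        "Level 0 (No Disease)": 0.5215,
        "Level 1 (Mild)": 0.1039,
        "Level 2 (Moderate)": 0.0127,
        "Level 3 (Severe)": 0.0168,
        "Level 4 (Very Severe)": 0.1623,
    },
}
for caveat in review_result(sample_2):
    print("-", caveat)
```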


## 4. Sample Predictions

### Sample 1: Low Risk Patient

In [96]:
# Sample 1: Healthy patient profile
patient_1 = {
    'age': 45,
    'sex': 'Female',
    'cp': 'non-anginal',
    'trestbps': 110,
    'chol': 180,
    'fbs': 'FALSE',
    'restecg': 'normal',
    'thalch': 165,
    'exang': 'FALSE',
    'oldpeak': 0.0,
    'slope': 'upsloping',
    'ca': '0',
    'thal': 'normal',
}

print("PATIENT 1 - INPUT DATA:")
print("=" * 70)
for key, value in patient_1.items():
    print(f"  {key:15s}: {value}")

# Make prediction
result_1 = predict_heart_disease(patient_1)

print("\n" + "=" * 70)
print("PREDICTION RESULT:")
print("=" * 70)
print(f"Severity Level: {result_1['severity_level']}")
print(f"Diagnosis: {result_1['severity_label']}")
print(f"Risk Category: {result_1['risk_category']}")
print(f"Confidence: {result_1['confidence']:.2%}")
print(f"\nRecommendation: {result_1['recommendation']}")
print(f"\nProbability Distribution:")
for level, prob in result_1['all_probabilities'].items():
    print(f"  {level:25s}: {prob:.2%}")
print("=" * 70)

PATIENT 1 - INPUT DATA:
  age            : 45
  sex            : Female
  cp             : non-anginal
  trestbps       : 110
  chol           : 180
  fbs            : FALSE
  restecg        : normal
  thalch         : 165
  exang          : FALSE
  oldpeak        : 0.0
  slope          : upsloping
  ca             : 0
  thal           : normal

PREDICTION RESULT:
Severity Level: 0
Diagnosis: No Heart Disease
Risk Category: Low Risk
Confidence: 96.99%

Recommendation: Maintain healthy lifestyle. Regular check-ups recommended.

Probability Distribution:
  Level 0 (No Disease)     : 96.99%
  Level 1 (Mild)           : 0.00%
  Level 2 (Moderate)       : 0.00%
  Level 3 (Severe)         : 0.00%
  Level 4 (Very Severe)    : 0.00%


### Sample 2: Moderate Risk Patient

In [97]:
# Sample 2: Patient with moderate heart disease indicators
patient_2 = {
    'age': 58,
    'sex': 'Male',
    'cp': 'atypical angina',
    'trestbps': 140,
    'chol': 240,
    'fbs': 'TRUE',
    'restecg': 'lv hypertrophy',
    'thalch': 130,
    'exang': 'TRUE',
    'oldpeak': 1.5,
    'slope': 'flat',
    'ca': '1',
    'thal': 'reversable defect',
}

print("PATIENT 2 - INPUT DATA:")
print("=" * 70)
for key, value in patient_2.items():
    print(f"  {key:15s}: {value}")

# Make prediction
result_2 = predict_heart_disease(patient_2)

print("\n" + "=" * 70)
print("PREDICTION RESULT:")
print("=" * 70)
print(f"Severity Level: {result_2['severity_level']}")
print(f"Diagnosis: {result_2['severity_label']}")
print(f"Risk Category: {result_2['risk_category']}")
print(f"Confidence: {result_2['confidence']:.2%}")
print(f"\nRecommendation: {result_2['recommendation']}")
print(f"\nProbability Distribution:")
for level, prob in result_2['all_probabilities'].items():
    print(f"  {level:25s}: {prob:.2%}")
print("=" * 70)

PATIENT 2 - INPUT DATA:
  age            : 58
  sex            : Male
  cp             : atypical angina
  trestbps       : 140
  chol           : 240
  fbs            : TRUE
  restecg        : lv hypertrophy
  thalch         : 130
  exang          : TRUE
  oldpeak        : 1.5
  slope          : flat
  ca             : 1
  thal           : reversable defect

PREDICTION RESULT:
Severity Level: 0
Diagnosis: No Heart Disease
Risk Category: Low Risk
Confidence: 52.15%

Recommendation: Maintain healthy lifestyle. Regular check-ups recommended.

Probability Distribution:
  Level 0 (No Disease)     : 52.15%
  Level 1 (Mild)           : 10.39%
  Level 2 (Moderate)       : 1.27%
  Level 3 (Severe)         : 1.68%
  Level 4 (Very Severe)    : 16.23%


### Sample 3: High Risk Patient

In [98]:
# Sample 3: Patient with severe heart disease indicators
patient_3 = {
    'age': 67,
    'sex': 'Male',
    'cp': 'asymptomatic',
    'trestbps': 160,
    'chol': 286,
    'fbs': 'FALSE',
    'restecg': 'lv hypertrophy',
    'thalch': 108,
    'exang': 'TRUE',
    'oldpeak': 2.6,
    'slope': 'downsloping',
    'ca': '3',
    'thal': 'reversable defect',
}

print("PATIENT 3 - INPUT DATA:")
print("=" * 70)
for key, value in patient_3.items():
    print(f"  {key:15s}: {value}")

# Make prediction
result_3 = predict_heart_disease(patient_3)

print("\n" + "=" * 70)
print("PREDICTION RESULT:")
print("=" * 70)
print(f"Severity Level: {result_3['severity_level']}")
print(f"Diagnosis: {result_3['severity_label']}")
print(f"Risk Category: {result_3['risk_category']}")
print(f"Confidence: {result_3['confidence']:.2%}")
print(f"\nRecommendation: {result_3['recommendation']}")
print(f"\nProbability Distribution:")
for level, prob in result_3['all_probabilities'].items():
    print(f"  {level:25s}: {prob:.2%}")
print("=" * 70)

PATIENT 3 - INPUT DATA:
  age            : 67
  sex            : Male
  cp             : asymptomatic
  trestbps       : 160
  chol           : 286
  fbs            : FALSE
  restecg        : lv hypertrophy
  thalch         : 108
  exang          : TRUE
  oldpeak        : 2.6
  slope          : downsloping
  ca             : 3
  thal           : reversable defect

PREDICTION RESULT:
Severity Level: 3
Diagnosis: Severe Heart Disease
Risk Category: Very High Risk
Confidence: 49.68%

Recommendation: Urgent medical attention required. Immediate cardiology consultation recommended.

Probability Distribution:
  Level 0 (No Disease)     : 4.65%
  Level 1 (Mild)           : 0.49%
  Level 2 (Moderate)       : 44.67%
  Level 3 (Severe)         : 49.68%
  Level 4 (Very Severe)    : 0.45%


## 5. Handling Missing Values

`preprocess_input` fills missing values before prediction, using fixed defaults rather than the saved KNN imputer. Here's an example:

In [99]:
# Sample with missing values
patient_missing = {
    'age': 55,
    'sex': 'Female',
    'cp': 'typical angina',
    'trestbps': 135,
    'chol': 220,
    'fbs': 'FALSE',
    'restecg': 'normal',
    'thalch': 145,
    'exang': 'FALSE',
    'oldpeak': 1.0,
    'slope': None,  # Missing value
    'ca': None,      # Missing value
    'thal': None,    # Missing value
}

print("PATIENT WITH MISSING VALUES - INPUT DATA:")
print("=" * 70)
for key, value in patient_missing.items():
    print(f"  {key:15s}: {value if value is not None else 'MISSING'}")

# Make prediction
result_missing = predict_heart_disease(patient_missing)

print("\n" + "=" * 70)
print("PREDICTION RESULT (with imputed values):")
print("=" * 70)
print(f"Severity Level: {result_missing['severity_level']}")
print(f"Diagnosis: {result_missing['severity_label']}")
print(f"Risk Category: {result_missing['risk_category']}")
print(f"Confidence: {result_missing['confidence']:.2%}")
print("\nNote: Missing values were filled with defaults by preprocess_input (no KNN imputation)")
print(f"Recommendation: {result_missing['recommendation']}")
print("=" * 70)

PATIENT WITH MISSING VALUES - INPUT DATA:
  age            : 55
  sex            : Female
  cp             : typical angina
  trestbps       : 135
  chol           : 220
  fbs            : FALSE
  restecg        : normal
  thalch         : 145
  exang          : FALSE
  oldpeak        : 1.0
  slope          : MISSING
  ca             : MISSING
  thal           : MISSING

PREDICTION RESULT (with imputed values):
Severity Level: 0
Diagnosis: No Heart Disease
Risk Category: Low Risk
Confidence: 91.51%

Note: Missing values were filled with defaults by preprocess_input (no KNN imputation)
Recommendation: Maintain healthy lifestyle. Regular check-ups recommended.
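
A missing categorical value falls back to `le.classes_[0]` in `map_to_valid_class`. Assuming the saved encoders are scikit-learn `LabelEncoder` objects, `classes_` holds the sorted unique labels, so the fallback is the alphabetically first class, which may differ from the most frequent one. The sketch below imitates that ordering in plain Python; the `thal_values` list is made up for illustration.

```python
from collections import Counter

# Hypothetical training column for `thal`
thal_values = ["reversable defect", "normal", "normal", "fixed defect", "normal"]

# LabelEncoder keeps classes_ as the sorted unique values
classes = sorted(set(thal_values))
first_class = classes[0]
mode = Counter(thal_values).most_common(1)[0][0]

print("fallback used by classes_[0]:", first_class)
print("most frequent class:", mode)
```

If the mode is the intended fallback, it would need to be stored alongside the encoders at training time.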


## 6. Web Application Integration

### Flask Example

In [100]:
# Flask web application example
flask_example = '''
from flask import Flask, request, jsonify
import pickle
import pandas as pd
import numpy as np

app = Flask(__name__)

# Define HierarchicalClassifier class
class HierarchicalClassifier:
    """Two-stage hierarchical classifier for heart disease prediction."""
    def __init__(self, binary_model, multiclass_model):
        self.binary_model = binary_model
        self.multiclass_model = multiclass_model
    
    def predict(self, X):
        binary_pred = self.binary_model.predict(X)
        final_pred = np.zeros(len(X), dtype=int)
        disease_mask = binary_pred == 1
        if disease_mask.sum() > 0:
            X_disease = X[disease_mask]
            severity_pred = self.multiclass_model.predict(X_disease)
            final_pred[disease_mask] = severity_pred
        return final_pred
    
    def predict_proba(self, X):
        binary_proba = self.binary_model.predict_proba(X)
        final_proba = np.zeros((len(X), 5))
        final_proba[:, 0] = binary_proba[:, 0]
        disease_prob = binary_proba[:, 1]
        multi_proba = self.multiclass_model.predict_proba(X)
        for i in range(1, 5):
            final_proba[:, i] = disease_prob * multi_proba[:, i]
        return final_proba

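# Load models and artifacts at startup.
# pickle resolves HierarchicalClassifier by module path, so the class must be
# importable under the same module name it had when the model was saved.
with open('models/hierarchical_classifier.pkl', 'rb') as f:
    hierarchical_model = pickle.load(f)  # name used by predict_heart_disease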

with open('data/processed/preprocessing_artifacts.pkl', 'rb') as f:
    preprocessing_artifacts = pickle.load(f)

# [Include the preprocess_input and predict_heart_disease functions here]

@app.route('/predict', methods=['POST'])
def predict():
    """
    API endpoint for heart disease prediction.
    
    Expected JSON input:
    {
        "age": 63,
        "sex": "Male",
        "cp": "typical angina",
        "trestbps": 145,
        "chol": 233,
        "fbs": "TRUE",
        "restecg": "lv hypertrophy",
        "thalch": 150,
        "exang": "FALSE",
        "oldpeak": 2.3,
        "slope": "downsloping",
        "ca": "0",
        "thal": "fixed defect"
    }
    
    Returns JSON with prediction results.
    """
    try:
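        # Get JSON data from request (None if the body is not valid JSON)
        patient_data = request.get_json(silent=True)
        if not isinstance(patient_data, dict):
            return jsonify({'error': 'Request body must be a JSON object'}), 400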
        
        # Validate required fields
        required_fields = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 
                          'restecg', 'thalch', 'exang', 'oldpeak', 
                          'slope', 'ca', 'thal']
        
        missing_fields = [field for field in required_fields 
                         if field not in patient_data]
        
        if missing_fields:
            return jsonify({
                'error': f'Missing required fields: {missing_fields}'
            }), 400
        
        # Make prediction
        result = predict_heart_disease(patient_data)
        
        return jsonify(result), 200
        
    except Exception as e:
        return jsonify({
            'error': f'Prediction failed: {str(e)}'
        }), 500

@app.route('/health', methods=['GET'])
def health():
    """Health check endpoint."""
    return jsonify({'status': 'healthy'}), 200

if __name__ == '__main__':
    # debug=True enables the interactive debugger; keep it off when binding to 0.0.0.0
    app.run(debug=False, host='0.0.0.0', port=8000)
'''

print("FLASK APPLICATION EXAMPLE:")
print("=" * 70)
print(flask_example)
print("=" * 70)
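
To exercise the `/predict` endpoint from a client, a request can be built with the standard library. This is a sketch under assumptions: the server is taken to run locally on port 8000 (the port used in the example above), and `build_predict_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_predict_request(patient, base_url="http://localhost:8000"):
    """Build a JSON POST request for the /predict endpoint (does not send it)."""
    body = json.dumps(patient).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


patient = {
    "age": 63, "sex": "Male", "cp": "typical angina", "trestbps": 145,
    "chol": 233, "fbs": "TRUE", "restecg": "lv hypertrophy", "thalch": 150,
    "exang": "FALSE", "oldpeak": 2.3, "slope": "downsloping", "ca": "0",
    "thal": "fixed defect",
}
req = build_predict_request(patient)
print(req.get_method(), req.full_url)

# With the server running, send it like this:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))
```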

### FastAPI Example

In [101]:
# FastAPI web application example
fastapi_example = '''
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
import pickle
import pandas as pd
import numpy as np

app = FastAPI(title="Heart Disease Prediction API")

# Define HierarchicalClassifier class
class HierarchicalClassifier:
    """Two-stage hierarchical classifier for heart disease prediction."""
    def __init__(self, binary_model, multiclass_model):
        self.binary_model = binary_model
        self.multiclass_model = multiclass_model
    
    def predict(self, X):
        binary_pred = self.binary_model.predict(X)
        final_pred = np.zeros(len(X), dtype=int)
        disease_mask = binary_pred == 1
        if disease_mask.sum() > 0:
            X_disease = X[disease_mask]
            severity_pred = self.multiclass_model.predict(X_disease)
            final_pred[disease_mask] = severity_pred
        return final_pred
    
    def predict_proba(self, X):
        binary_proba = self.binary_model.predict_proba(X)
        final_proba = np.zeros((len(X), 5))
        final_proba[:, 0] = binary_proba[:, 0]
        disease_prob = binary_proba[:, 1]
        multi_proba = self.multiclass_model.predict_proba(X)
        for i in range(1, 5):
            final_proba[:, i] = disease_prob * multi_proba[:, i]
        return final_proba

# Load models at startup
@app.on_event("startup")
async def load_models():
    global hierarchical_model, preprocessing_artifacts
    
    with open('models/hierarchical_classifier.pkl', 'rb') as f:
        hierarchical_model = pickle.load(f)  # name used by predict_heart_disease
    
    with open('data/processed/preprocessing_artifacts.pkl', 'rb') as f:
        preprocessing_artifacts = pickle.load(f)

# Input data model
class PatientData(BaseModel):
    age: int
    sex: str
    cp: str
    trestbps: float
    chol: float
    fbs: str
    restecg: str
    thalch: float
    exang: str
    oldpeak: float
    slope: Optional[str] = None
    ca: Optional[str] = None
    thal: Optional[str] = None
    dataset: Optional[str] = "Unknown"
    
    class Config:
        schema_extra = {
            "example": {
                "age": 63,
                "sex": "Male",
                "cp": "typical angina",
                "trestbps": 145,
                "chol": 233,
                "fbs": "TRUE",
                "restecg": "lv hypertrophy",
                "thalch": 150,
                "exang": "FALSE",
                "oldpeak": 2.3,
                "slope": "downsloping",
                "ca": "0",
                "thal": "fixed defect",
            }
        }

# [Include the preprocess_input and predict_heart_disease functions here]

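# Plain `def`: FastAPI runs it in a thread pool, so the blocking model call
# does not stall the event loop
@app.post("/predict")
def predict(patient: PatientData):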
    """
    Predict heart disease severity for a patient.
    
    Returns:
    - severity_level: Disease severity (0-4)
    - severity_label: Human-readable diagnosis
    - risk_category: Risk level
    - confidence: Prediction confidence
    - recommendation: Clinical recommendation
    - all_probabilities: Probability distribution across all levels
    """
    try:
        # Convert to dictionary
        patient_dict = patient.dict()
        
        # Make prediction
        result = predict_heart_disease(patient_dict)
        
        return result
        
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Prediction failed: {str(e)}")

@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "healthy"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
'''

print("FASTAPI APPLICATION EXAMPLE:")
print("=" * 70)
print(fastapi_example)
print("=" * 70)

## 7. Input Validation and Error Handling

In [102]:
def validate_input(patient_data):
    """
    Validate patient input data.
    
    Returns:
    --------
    is_valid : bool
    errors : list of str
    """
    errors = []
    
    # Check required fields
    required_fields = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 
                      'restecg', 'thalch', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
    # These keys must be present but may be None (preprocess_input fills them)
    nullable_fields = {'slope', 'ca', 'thal'}
    
    for field in required_fields:
        if field not in patient_data:
            errors.append(f"Missing required field: {field}")
        elif patient_data[field] is None and field not in nullable_fields:
            errors.append(f"Missing required field: {field}")
    
    # Validate numeric ranges (numeric strings are accepted)
    numeric_ranges = {
        'age': (0, 120, "Age must be between 0 and 120"),
        'trestbps': (80, 250, "Blood pressure must be between 80 and 250 mm Hg"),
        'chol': (100, 600, "Cholesterol must be between 100 and 600 mg/dl"),
        'thalch': (60, 220, "Heart rate must be between 60 and 220 bpm"),
    }
    for field, (low, high, message) in numeric_ranges.items():
        value = patient_data.get(field)
        if value is None:
            continue  # absent or None: already reported above
        try:
            number = float(value)
        except (TypeError, ValueError):
            errors.append(f"{field} must be numeric, got {value!r}")
            continue
        if not (low < number < high):
            errors.append(f"{message}, got {value}")
    
    # Validate categorical values (case-insensitive, like preprocess_input)
    valid_values = {
        'sex': ['Male', 'Female', 'male', 'female', 'M', 'F'],
        'cp': ['typical angina', 'atypical angina', 'non-anginal', 'asymptomatic'],
        'fbs': ['TRUE', 'FALSE', 'True', 'False', 'true', 'false'],
        'restecg': ['normal', 'lv hypertrophy', 'st-t abnormality'],
        'exang': ['TRUE', 'FALSE', 'True', 'False', 'true', 'false'],
        'slope': ['upsloping', 'flat', 'downsloping', None],
        'ca': ['0', '1', '2', '3', '4', None],
        'thal': ['normal', 'fixed defect', 'reversable defect', None]
    }
    
    for field, valid_list in valid_values.items():
        value = patient_data.get(field)
        if value is None:
            continue  # absent or None: handled by the required-field check
        # str() lets booleans (True/False) and integers (0-4) pass as well
        allowed = {str(v).lower() for v in valid_list if v is not None}
        if str(value).lower() not in allowed:
            errors.append(f"Invalid value for {field}: {value}. "
                         f"Expected one of: {valid_list}")
    
    return len(errors) == 0, errors

print("Validation function defined")

# Test validation with invalid data
invalid_patient = {
    'age': 200,  # Invalid
    'sex': 'Unknown',  # Invalid
    'trestbps': 50  # Invalid
}

is_valid, errors = validate_input(invalid_patient)
print(f"\nValidation test with invalid data:")
print(f"Is valid: {is_valid}")
print(f"Errors found: {len(errors)}")
for error in errors:
    print(f"  - {error}")

Validation function defined

Validation test with invalid data:
Is valid: False
Errors found: 13
  - Missing required field: cp
  - Missing required field: chol
  - Missing required field: fbs
  - Missing required field: restecg
  - Missing required field: thalch
  - Missing required field: exang
  - Missing required field: oldpeak
  - Missing required field: slope
  - Missing required field: ca
  - Missing required field: thal
  - Age must be between 0 and 120, got 200
  - Blood pressure must be between 80 and 250 mm Hg, got 50
  - Invalid value for sex: Unknown. Expected one of: ['Male', 'Female', 'male', 'female', 'M', 'F']
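
Validation is most useful when it runs before every prediction. The following is a framework-agnostic sketch of that flow, which either the Flask or the FastAPI endpoint could delegate to. `handle_predict` is a hypothetical helper, and the `fake_validate` and `fake_predict` stand-ins exist only so the example runs without the trained model.

```python
def handle_predict(payload, validate, predict):
    """Validate first, then predict; return (status_code, body)."""
    if not isinstance(payload, dict):
        return 400, {"error": "Request body must be a JSON object"}
    is_valid, errors = validate(payload)
    if not is_valid:
        return 400, {"errors": errors}
    try:
        return 200, predict(payload)
    except Exception:
        # Keep internal details out of the response; log them server-side
        return 500, {"error": "Prediction failed"}


# Stand-ins for validate_input and predict_heart_disease
def fake_validate(patient):
    missing = [] if "age" in patient else ["Missing required field: age"]
    return len(missing) == 0, missing


def fake_predict(patient):
    return {"severity_level": 0}


print(handle_predict({"age": 45}, fake_validate, fake_predict))
print(handle_predict({}, fake_validate, fake_predict))
```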


## 8. Summary and Best Practices

### Key Takeaways:

1. **Input Format**:
   - 13 clinical features required
   - 5 numeric, 8 categorical
   - Missing values are filled with defaults during preprocessing

2. **Output Format**:
   - `severity_level`: Integer 0-4
   - `severity_label`: Human-readable diagnosis
   - `risk_category`: Clinical risk level
   - `confidence`: Prediction confidence (0-1)
   - `recommendation`: Clinical recommendation
   - `all_probabilities`: Full probability distribution

3. **Model Pipeline**:
   - Stage 1: Binary classifier (Disease vs No Disease)
   - Stage 2: Multi-class classifier (Severity 1-4)
   - Automatic preprocessing with saved artifacts

4. **Best Practices**:
   - Always validate input before prediction
   - Handle errors gracefully
   - Provide confidence scores to users
   - Include clinical recommendations
   - Log predictions for monitoring
   - Never use as sole diagnostic tool

5. **Deployment Considerations**:
   - Load models once at startup
   - Keep blocking model calls off the event loop (plain `def` endpoints in FastAPI, or a worker pool)
   - Implement rate limiting
   - Add authentication for production
   - Monitor prediction confidence
   - Store predictions for audit trail
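
As one way to approach the logging and audit-trail points above, each prediction can be written as a JSON line through the standard `logging` module. This is a sketch, not part of the notebook: the logger name and record fields are assumptions, and the record deliberately lists which fields were provided instead of storing raw patient values.

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("prediction_audit")  # hypothetical logger name


def audit_record(patient, result):
    """Build one JSON line describing a prediction, without raw patient values."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "fields_provided": sorted(k for k, v in patient.items() if v is not None),
        "severity_level": result["severity_level"],
        "confidence": round(result["confidence"], 4),
    })


def log_prediction(patient, result):
    """Emit the audit record at INFO level."""
    audit_logger.info(audit_record(patient, result))


print(audit_record({"age": 55, "slope": None}, {"severity_level": 0, "confidence": 0.91512}))
```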

### Clinical Disclaimer:

**IMPORTANT**: This model is a **decision support tool** and should NOT replace:
- Professional medical judgment
- Standard diagnostic procedures
- Clinical examination
- Laboratory tests

Always consult qualified healthcare professionals for medical decisions.