# Step 8: Inference Pipeline & Threshold Configuration

This step prepares the trained fraud detection model for real-world predictions.

It includes:
- Loading the trained model
- Aligning feature encoding with the training columns
- Applying the tuned decision threshold
- Building reusable prediction functions


In [1]:
# Load libraries
import pandas as pd
import numpy as np
import joblib


In [3]:
# Load Model

model = joblib.load(
    "../models/balanced_random_forest.pkl"
)

print("Balanced RF model loaded.")


Balanced RF model loaded.


In [4]:
# Tuned decision threshold, saved to disk alongside the model
THRESHOLD = 0.40

threshold_config = {
    "model_name": "Balanced Random Forest",
    "threshold": THRESHOLD
}

joblib.dump(
    threshold_config,
    "../models/threshold_config.pkl"
)

print("Threshold config saved.")


Threshold config saved.


In [8]:
# Load the training feature list for use at inference
model_features = joblib.load(
    "../models/model_features.pkl"
)


In [10]:
# Inference Encoding Function
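def preprocess_input(input_df):
    """One-hot encode raw input and align it to the training feature columns."""

    # One-hot encode every category present in the input. drop_first is not
    # used here: on a single row it would drop the only dummy of each column,
    # so reference categories are removed by the reindex below instead.
    input_encoded = pd.get_dummies(input_df)

    # Align columns with training features: categories absent from the
    # input are filled with 0, and columns unseen in training are dropped
    input_aligned = input_encoded.reindex(
        columns=model_features,
        fill_value=0
    )

    return input_aligned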


In [9]:
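# Prediction Function

def predict_fraud(input_df, threshold=THRESHOLD):
    """Return fraud probabilities and thresholded labels for each input row."""

    # Preprocess
    processed_data = preprocess_input(input_df)

    # Fraud probability (column 1 is the positive class)
    fraud_prob = model.predict_proba(
        processed_data
    )[:, 1]

    # Apply threshold
    predictions = (
        fraud_prob >= threshold
    ).astype(int)

    # Keep the input index so results can be joined back to the source rows
    result = pd.DataFrame({
        "Fraud_Probability": fraud_prob,
        "Fraud_Prediction": predictions
    }, index=input_df.index)

    return result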


In [12]:
# Test Inference Pipeline

sample = pd.read_csv(
    "../data/processed/feature_engineered_raw.csv"
).drop("FraudFound_P", axis=1).iloc[[0]]

predict_fraud(sample)


|   | Fraud_Probability | Fraud_Prediction |
|---|---|---|
| 0 | 0.623333 | 1 |


## Inference Pipeline Design

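An inference pipeline was developed so that prediction-time inputs pass through the same feature encoding the model saw during training.

Key components include:

- Feature encoding alignment: incoming records are one-hot encoded by `preprocess_input`
- Model feature mapping: encoded columns are reindexed to the saved `model_features` list
- Probability-based predictions from `predict_proba`
- Tuned threshold application

This keeps the columns passed to the model at prediction time identical, in name and order, to those used in training.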
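
The alignment step can be illustrated on a toy record. This is a minimal sketch, not the project's data: the feature list `train_features` and the column names below are hypothetical stand-ins for the contents of `model_features.pkl`. It also shows why `drop_first=True` is risky on a single row: each categorical column has only one level, so every dummy column is dropped.

```python
import pandas as pd

# Hypothetical training-time feature list (stands in for model_features.pkl).
train_features = ["Age", "Sex_Male", "VehicleCategory_Sport", "VehicleCategory_Utility"]

# A single incoming record with raw categorical values (illustrative only).
record = pd.DataFrame({"Age": [34], "Sex": ["Male"], "VehicleCategory": ["Sedan"]})

# Encode every category present, then let reindex keep only the columns the
# model was trained on and zero-fill the ones missing from this record.
encoded = pd.get_dummies(record)
aligned = encoded.reindex(columns=train_features, fill_value=0).astype(int)
print(aligned.to_dict("records")[0])

# With drop_first=True, a one-row frame loses all of its dummy columns.
dropped = pd.get_dummies(record, drop_first=True)
print(list(dropped.columns))
```

Here `Sex_Male` is set to 1 after alignment, whereas the `drop_first=True` variant would leave only `Age` and the reindex would then zero-fill `Sex_Male`.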


## Threshold Deployment Configuration

The classification threshold was externalized as a configuration parameter to enable flexible operational tuning.

A threshold of 0.40 was selected from the precision-recall trade-off: it favors catching more fraud (higher recall) while keeping the number of flagged cases manageable for investigators.
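
To make the trade-off concrete, the sketch below computes precision, recall, and the number of flagged cases at two thresholds. The labels and scores are synthetic values chosen for illustration, not results from this project, and `precision_recall_at` is a hypothetical helper.

```python
import numpy as np

# Synthetic labels and scores, for illustration only (not project data).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.05, 0.30, 0.45, 0.42, 0.80, 0.38, 0.10, 0.55, 0.62, 0.20])

def precision_recall_at(threshold):
    """Return (precision, recall, number flagged) at the given threshold."""
    flagged = y_prob >= threshold
    n_flagged = int(flagged.sum())
    tp = int(np.sum(flagged & (y_true == 1)))
    precision = tp / n_flagged if n_flagged else 0.0
    recall = tp / int((y_true == 1).sum())
    return precision, recall, n_flagged

for t in (0.40, 0.50):
    p, r, n = precision_recall_at(t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f} flagged={n}")
```

On this toy data, lowering the threshold from 0.50 to 0.40 raises recall from 0.50 to 0.75 and increases the flagged cases from 3 to 5, which is the kind of volume-versus-coverage balance the threshold choice has to weigh.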

Separating threshold logic from the model allows future recalibration without retraining.
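
One way this could look at inference time is to read the threshold from the saved config on each call instead of relying on a notebook variable. The sketch below is an assumption about how such a loader might be written: `apply_threshold` is a hypothetical helper, and a temporary file stands in for `../models/threshold_config.pkl`.

```python
import os
import tempfile

import joblib
import numpy as np

# Write a config like the one saved earlier to a temporary path
# (the real file lives at ../models/threshold_config.pkl).
config_path = os.path.join(tempfile.mkdtemp(), "threshold_config.pkl")
joblib.dump({"model_name": "Balanced Random Forest", "threshold": 0.40}, config_path)

def apply_threshold(fraud_prob, path):
    """Read the threshold from disk at call time and label the scores."""
    threshold = joblib.load(path)["threshold"]
    return (np.asarray(fraud_prob) >= threshold).astype(int)

probs = [0.35, 0.45, 0.62]
before = apply_threshold(probs, config_path)

# Recalibration: overwrite the config only; the model file is untouched.
joblib.dump({"model_name": "Balanced Random Forest", "threshold": 0.50}, config_path)
after = apply_threshold(probs, config_path)

print(before.tolist(), after.tolist())
```

The same probabilities yield different labels after the config is rewritten, with no change to the model artifact.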
