# Phase 5: Threshold Optimization
**Objective:**
Optimize the decision threshold for the **_final classification model (XGBoost)_** to:
* Maximize recall or precision, depending on risk tolerance
* Reduce **_false positives_** without missing true risks
* Serve **_as a tuning guide for compliance officers_** using the API or Power BI
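
As a rough, dependency-free sketch of what "depending on risk tolerance" can mean in practice, the snippet below picks the threshold with the best precision among those that still meet a minimum recall. The helper names `precision_recall_at` and `pick_threshold`, the toy labels and scores, and the recall floors are all illustrative assumptions, not part of the project's code.

```python
# Illustrative sketch only: helper names and toy data are hypothetical.

def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when scores >= threshold are flagged as matches."""
    tp = fp = fn = 0
    for label, score in zip(y_true, y_scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    # Convention: precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(y_true, y_scores, min_recall):
    """Return (threshold, precision, recall) with the best precision among
    thresholds whose recall is at least min_recall, or None if none qualifies."""
    best = None
    for thr in sorted(set(y_scores)):
        precision, recall = precision_recall_at(y_true, y_scores, thr)
        if recall < min_recall:
            continue
        # Prefer higher precision; break ties with the higher threshold.
        if best is None or (precision, thr) > (best[1], best[0]):
            best = (thr, precision, recall)
    return best


# Toy labels and model scores (made up for illustration).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

strict = pick_threshold(y_true, y_scores, min_recall=1.0)    # threshold 0.35 (precision 2/3, recall 1.0)
relaxed = pick_threshold(y_true, y_scores, min_recall=0.75)  # threshold 0.60 (precision 0.75, recall 0.75)
print(strict, relaxed)
```

With a low risk tolerance (no missed matches allowed) the toy data forces a low threshold and more false positives. Relaxing the recall floor to 0.75 allows a higher threshold with better precision.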

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import joblib
from sklearn.metrics import precision_recall_curve, roc_curve

# Load saved model
model = joblib.load("../models/xgb_sanction_model.pkl")

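# Load the prepared feature data and keep rows that have a reference fuzz ratio
df = pd.read_csv("../data/sanctions_features.csv")
df = df[df["fuzz_ratio_reference"].notna()].copy()
# Derive the binary label: a fuzz ratio above 50 or at least one shared token
df["is_match"] = ((df["fuzz_ratio"] > 50) | (df["common_token_count"] > 0)).astype(int)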

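# Drop the label and non-feature columns (columns= is required; the default axis drops rows)
X = df.drop(columns=["is_match", "cleaned_name", "fuzz_ratio_reference"])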
y = df["is_match"]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)

# Predicted probability of the positive class (is_match = 1)
y_scores = model.predict_proba(X_test)[:, 1]

### Plot Precision, Recall, and F1 vs. Threshold