## Model-2: Trust Model Training

This notebook trains a secondary model (Model-2, the Trust Model) that predicts
whether the primary churn model's (Model-1's) prediction is correct.

**Objective**
- Input: Model-1 feature vectors; Model-1 predictions are compared with the true labels to build the training target
- Output: Trust probability and trust decision


In [0]:
# Load Model-1 predictions with features
trust_df = spark.table(
    "ai_trust_catalog.churn_trust.model_1_predictions"
)

display(trust_df)

## Create Trust Label

`trust_label = 1` → Model-1 prediction is correct  
`trust_label = 0` → Model-1 prediction is incorrect


In [0]:
from pyspark.sql.functions import col

trust_df = trust_df.withColumn(
    "trust_label",
    (col("prediction") == col("label")).cast("int")
)

trust_df.select("label", "prediction", "trust_label").show(5)
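
Because `trust_label` is 1 exactly when Model-1 is right, its mean equals Model-1's accuracy on these rows, so an accurate Model-1 yields a target skewed toward 1. A minimal plain-Python sketch on made-up rows (the values are illustrative and not taken from the table):

```python
# Toy (label, prediction) pairs; illustrative values only, not real data.
rows = [(1, 1), (0, 0), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0), (1, 1)]

# Same rule as the Spark expression: 1 when prediction == label, else 0.
trust_labels = [int(prediction == label) for label, prediction in rows]

# The mean of trust_label is Model-1's accuracy on these rows.
model_1_accuracy = sum(trust_labels) / len(trust_labels)

# Class counts show how imbalanced the Trust Model's target is.
class_counts = {0: trust_labels.count(0), 1: trust_labels.count(1)}

print(trust_labels)
print(model_1_accuracy, class_counts)
```

On the real table, the analogous check would be `trust_df.groupBy("trust_label").count()`.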

## Prepare Training Data

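We reuse the existing `features` column as the Trust Model's input.
No feature engineering or VectorAssembler is applied here.
The rows are split 80/20 into training and validation sets with a fixed seed.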


In [0]:
# Keep only required columns
trust_vec_df = trust_df.select(
    "features",
    "trust_label"
)

train_vec, val_vec = trust_vec_df.randomSplit([0.8, 0.2], seed=42)
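
`randomSplit` assigns rows to splits by random sampling, so the resulting sizes are close to, but generally not exactly, 80/20, while the seed makes the draw repeatable. The plain-Python sketch below imitates that behaviour on row indices; `bernoulli_split` is a hypothetical helper for illustration and is not Spark's implementation:

```python
import random


def bernoulli_split(n_rows, train_weight=0.8, seed=42):
    """Assign each row index to train or validation with an independent draw."""
    rng = random.Random(seed)
    train, val = [], []
    for i in range(n_rows):
        # Each row lands in train with probability train_weight.
        if rng.random() < train_weight:
            train.append(i)
        else:
            val.append(i)
    return train, val


n_rows = 10_000
train_idx, val_idx = bernoulli_split(n_rows)
train_fraction = len(train_idx) / n_rows

# The fraction is near 0.8 but is not forced to be exactly 0.8.
print(len(train_idx), len(val_idx), round(train_fraction, 3))
```

In Spark, the seed reproduces the same split only if the input rows and their partitioning are unchanged between runs, which is assumed here.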

## Train Trust Model

A logistic regression is fit on the training split to predict `trust_label`, that is, whether Model-1's prediction can be trusted.

In [0]:
from pyspark.ml.classification import LogisticRegression

# Trust Model predicts correctness of Model-1
trust_model = LogisticRegression(
    featuresCol="features",          # reuse the existing feature vector column
    labelCol="trust_label",
    probabilityCol="trust_probability",
    predictionCol="trust_prediction"
)

model_2 = trust_model.fit(train_vec)
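
For a binary target, logistic regression turns a linear score into a probability with the sigmoid function, and the prediction is 1 when that probability exceeds a threshold (0.5 by default in Spark ML). A small sketch with made-up numbers, where `weights`, `intercept`, and `x` are illustrative values rather than the fitted `model_2` coefficients, and `trust_score` is a hypothetical helper:

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def trust_score(x, weights, intercept, threshold=0.5):
    """Return ([P(trust=0), P(trust=1)], decision) for one feature vector."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    p1 = sigmoid(z)
    return [1.0 - p1, p1], int(p1 > threshold)


# Assumed values for illustration only.
weights = [0.8, -1.2, 0.3]
intercept = 0.1
x = [1.0, 0.5, 2.0]

probability, decision = trust_score(x, weights, intercept)
print(probability, decision)

# A stricter threshold can flip the decision without changing the probability.
_, strict_decision = trust_score(x, weights, intercept, threshold=0.8)
print(strict_decision)
```

The two-element layout mirrors the `trust_probability` vector column, whose second entry is the probability that `trust_label = 1`.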

## Evaluate Trust Model

We measure how well the Trust Model predicts correctness on the validation split, using ROC-AUC.

In [0]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

val_predictions = model_2.transform(val_vec)

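# ROC-AUC depends only on how the scores rank the rows, so the probability
# vector can be passed as rawPredictionCol in place of the raw margin.
evaluator = BinaryClassificationEvaluator(
    labelCol="trust_label",
    rawPredictionCol="trust_probability",
    metricName="areaUnderROC"
)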

trust_auc = evaluator.evaluate(val_predictions)
trust_auc
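
ROC-AUC can be read as the probability that a randomly chosen trusted row (`trust_label = 1`) receives a higher score than a randomly chosen untrusted row, with ties counted as one half. Since only the ordering of scores matters, a strictly increasing transform of the score, such as raw margin to probability, leaves it unchanged. A brute-force sketch on made-up scores (`pairwise_auc` is a hypothetical helper, not the evaluator's implementation):

```python
import math


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative labels and raw margins, not real model output.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
margins = [2.1, 0.4, 0.9, 1.5, -0.7, -0.2, -1.3, 0.1]

# Sigmoid is strictly increasing, so the ranking of rows is preserved.
probs = [1.0 / (1.0 + math.exp(-m)) for m in margins]

auc_margin = pairwise_auc(labels, margins)
auc_prob = pairwise_auc(labels, probs)
print(auc_margin, auc_prob)
```

A value of 0.5 corresponds to random ranking, so `trust_auc` is best judged by how far it sits above that baseline.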

## Log Trust Model

The trained Trust Model is logged for governance and reuse.

In [0]:
import mlflow
import mlflow.spark

# Sanity check: make sure a ROC-AUC value is available before logging it
assert trust_auc is not None, "trust_auc is None. Run evaluation cell before logging."

# Use absolute experiment path (Databricks requirement)
mlflow.set_experiment("/AI Trust & Risk Intelligence Platform.04_model_1_base_ml.01_train_model_1")

with mlflow.start_run(run_name="trust_model_logistic_regression"):
    
    # Log parameters
    mlflow.log_param("model_type", "logistic_regression")
    
    # Log the validation ROC-AUC computed in the evaluation cell
    mlflow.log_metric("trust_auc", float(trust_auc))
    
    # Log Spark ML model
    mlflow.spark.log_model(
        model_2,
        artifact_path="trust_model"
    )

## Save Trust Predictions

The validation-set predictions are saved to a Delta table that is used for inference and dashboards.


In [0]:
trust_final_df = val_predictions.select(
    "features",
    "trust_label",
    "trust_prediction",
    "trust_probability"
)

trust_final_df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(
        "ai_trust_catalog.churn_trust.gold_ai_trust_scores"
    )

## Summary

- Built correctness-based trust labels
- Reused existing feature vectors safely
- Trained Trust Model without schema conflicts
- Evaluated the Trust Model on the validation split using ROC-AUC
- Logged model to MLflow
- Saved validation-set trust predictions for downstream use