# Financial Fraud Detection - Kubeflow Edition

This notebook demonstrates the complete fraud detection workflow using Kubeflow Pipelines and Triton Inference Server.

**What's Different from the Local Notebook:**
- Uses KFP pipelines instead of running preprocessing/training locally
- Connects to Triton via Kubernetes DNS (deployed via ArgoCD)
- Submits pipeline runs programmatically
- Data lives in S3; the bucket and region settings come from a ConfigMap

**Prerequisites:**
- Running in a Kubeflow Notebook Server
- `fraud-detection-config` ConfigMap deployed
- Triton Inference Server deployed via ArgoCD
- Pipelines uploaded to Kubeflow

---
## Environment Setup

The Kubeflow Notebook Server should have most dependencies pre-installed.
The cell below pins the KFP SDK and Triton client versions and installs the plotting and data libraries in case they are missing.

In [None]:
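# Install required packages (the extras spec is quoted so the shell does not expand the brackets;
# boto3 is needed for the S3 download in Step 9)
!pip install -q kfp==2.10.0 "tritonclient[http]==2.52.0" matplotlib seaborn pandas numpy scikit-learn pyarrow boto3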

In [None]:
import os
import json
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# KFP imports
import kfp
from kfp import compiler
from kfp.client import Client

# Triton client
import tritonclient.http as httpclient
from tritonclient.http import InferInput, InferRequestedOutput

print(f"KFP SDK version: {kfp.__version__}")

---
## Configuration

These values should match your infrastructure deployment.

In [None]:
# Kubeflow Pipelines configuration
KFP_HOST = "http://ml-pipeline-ui.kubeflow.svc.cluster.local:80"  # In-cluster KFP UI

# Triton configuration (from infra/manifests/helm/triton)
TRITON_SERVICE = "triton-inference-server"
TRITON_NAMESPACE = "triton"
TRITON_HTTP_PORT = 8005
TRITON_HOST = f"{TRITON_SERVICE}.{TRITON_NAMESPACE}.svc.cluster.local:{TRITON_HTTP_PORT}"

# Model name in Triton
MODEL_NAME = "prediction_and_shapley"

# S3 configuration (should match ConfigMap)
S3_BUCKET = os.environ.get("S3_BUCKET", "ml-on-containers")
S3_REGION = os.environ.get("S3_REGION", "us-east-1")

print(f"KFP Host: {KFP_HOST}")
print(f"Triton Host: {TRITON_HOST}")
print(f"S3 Bucket: {S3_BUCKET}")
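The Triton address above follows the Kubernetes service DNS pattern `<service>.<namespace>.svc.cluster.local`. As a minimal sketch, the helper below builds that string and rejects out-of-range ports, which can catch a mistyped value before any client call is made. The name `service_address` is hypothetical and not part of this project, and the default cluster domain `cluster.local` is an assumption about the cluster.

```python
def service_address(service: str, namespace: str, port: int,
                    cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster host:port string for a Kubernetes Service.

    Hypothetical helper for illustration; assumes the default cluster
    domain unless another one is passed in.
    """
    if not service or not namespace:
        raise ValueError("service and namespace must be non-empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # No URL scheme is added, matching how TRITON_HOST is defined above
    return f"{service}.{namespace}.svc.{cluster_domain}:{port}"


# Same values as the configuration cell above
print(service_address("triton-inference-server", "triton", 8005))
```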

---
## Step 1: Connect to Kubeflow Pipelines

Initialize the KFP client to interact with the pipeline service.

In [None]:
# Connect to KFP
try:
    kfp_client = Client(host=KFP_HOST)
    print("Connected to Kubeflow Pipelines")
    
    # List existing pipelines
    pipelines = kfp_client.list_pipelines(page_size=10)
    if pipelines.pipelines:
        print("\nExisting pipelines:")
        for p in pipelines.pipelines:
            print(f"  - {p.display_name} (ID: {p.pipeline_id})")
    else:
        print("\nNo pipelines uploaded yet.")
except Exception as e:
    print(f"Could not connect to KFP: {e}")
    print("Make sure you're running in a Kubeflow Notebook Server")
    kfp_client = None

---
## Step 2: Upload Pipelines (if needed)

Upload the compiled pipeline YAML files to Kubeflow.

In [None]:
# Pipeline YAML paths (relative to notebooks/ directory)
PREPROCESSING_PIPELINE = "../workflows/cc_data_preprocessing_pipeline.yaml"
TRAINING_PIPELINE = "../workflows/fraud_detection_training_pipeline.yaml"
SMOKE_TEST_PIPELINE = "../workflows/fraud_model_smoke_test_pipeline.yaml"

def upload_pipeline_if_needed(client, yaml_path, pipeline_name):
    """Upload a pipeline if it doesn't already exist."""
    if not os.path.exists(yaml_path):
        print(f"Pipeline YAML not found: {yaml_path}")
        print("Run 'cd ../workflows && uv run python -m workflows.pipeline' to compile")
        return None
    
    # Check if pipeline exists
    try:
        pipelines = client.list_pipelines(page_size=100)
        for p in (pipelines.pipelines or []):
            if p.display_name == pipeline_name:
                print(f"Pipeline '{pipeline_name}' already exists (ID: {p.pipeline_id})")
                return p.pipeline_id
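    except Exception as e:
        # Listing failed; report it and still attempt the upload below
        print(f"Could not list existing pipelines: {e}")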
    
    # Upload new pipeline
    try:
        result = client.upload_pipeline(yaml_path, pipeline_name=pipeline_name)
        print(f"Uploaded pipeline '{pipeline_name}' (ID: {result.pipeline_id})")
        return result.pipeline_id
    except Exception as e:
        print(f"Failed to upload {pipeline_name}: {e}")
        return None

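# Upload pipelines; IDs default to None so later cells degrade gracefully without KFP
preprocessing_id = training_id = smoke_test_id = None
if kfp_client: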
    preprocessing_id = upload_pipeline_if_needed(
        kfp_client, PREPROCESSING_PIPELINE, "tabformer-preprocessing"
    )
    training_id = upload_pipeline_if_needed(
        kfp_client, TRAINING_PIPELINE, "fraud-detection-training"
    )
    smoke_test_id = upload_pipeline_if_needed(
        kfp_client, SMOKE_TEST_PIPELINE, "fraud-model-smoke-test"
    )

---
## Step 3: Run Preprocessing Pipeline

Submit the preprocessing pipeline to process raw TabFormer data.

In [None]:
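def run_pipeline(client, pipeline_id, pipeline_name, params=None, experiment_name="fraud-detection"):
    """Submit a pipeline run and return it immediately, without waiting for completion."""
    if not client or not pipeline_id:
        print("KFP client or pipeline not available")
        return None

    # Get or create experiment
    try:
        experiment = client.create_experiment(name=experiment_name)
    except Exception as e:
        # Creation failed, so fall back to looking the experiment up by name
        print(f"Could not create experiment ({e}); looking it up instead")
        experiments = client.list_experiments(page_size=100)
        experiment = next(
            (x for x in (experiments.experiments or []) if x.display_name == experiment_name),
            None
        )
    if experiment is None:
        print(f"Experiment '{experiment_name}' is not available")
        return None
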
    # Create run
    run_name = f"{pipeline_name}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
    run = client.run_pipeline(
        experiment_id=experiment.experiment_id,
        job_name=run_name,
        pipeline_id=pipeline_id,
        params=params or {}
    )
    
    print(f"Started run: {run_name}")
    print(f"Run ID: {run.run_id}")
    print(f"View in UI: {KFP_HOST}/#/runs/details/{run.run_id}")
    
    return run
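`run_pipeline` returns as soon as the run has been submitted. If a later step needs to block until the run finishes, a polling loop along these lines is one option. This is a sketch under assumptions: `get_state` is a hypothetical callable standing in for something like `lambda: client.get_run(run_id).state`, and the set of terminal state names is assumed and should be checked against your KFP version. The KFP SDK also provides `Client.wait_for_run_completion`, which may be preferable in practice.

```python
import time

# Assumed terminal state names; verify them against your KFP version
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "SKIPPED", "CANCELED"}


def wait_for_state(get_state, timeout=3600, poll_interval=30, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout expires.

    get_state: zero-argument callable returning the current state string.
    sleep: injectable so the loop can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = str(get_state()).upper()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still in state {state} after {timeout}s")
        sleep(poll_interval)


# Usage with a fake state source that finishes on the third poll
fake_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_for_state(lambda: next(fake_states), sleep=lambda _: None))
```

Using a monotonic clock for the deadline keeps the timeout correct even if the wall clock is adjusted while the loop is running.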

In [None]:
# Preprocessing parameters
preprocessing_params = {
    "s3_region": S3_REGION,
    "under_sample": True,
    "fraud_ratio": 0.1,
    "train_year_cutoff": 2018,
    "validation_year": 2018,
    "one_hot_threshold": 8,
}

# Submit preprocessing run (uncomment to execute)
# preprocessing_run = run_pipeline(
#     kfp_client, 
#     preprocessing_id, 
#     "preprocessing",
#     preprocessing_params
# )

---
## Step 4: Run Training Pipeline

After preprocessing completes, run the training pipeline.

In [None]:
# Training parameters
training_params = {
    "gnn_data_s3_uri": f"s3://{S3_BUCKET}/preprocessing/gnn",
    "gnn_hidden_channels": 32,
    "gnn_n_hops": 2,
    "gnn_layer": "SAGEConv",
    "gnn_dropout_prob": 0.1,
    "gnn_batch_size": 4096,
    "gnn_fan_out": 10,
    "gnn_num_epochs": 8,
    "xgb_max_depth": 6,
    "xgb_learning_rate": 0.2,
    "xgb_num_parallel_tree": 3,
    "xgb_num_boost_round": 512,
    "s3_model_prefix": "model-repository",
    "s3_region": S3_REGION,
    "run_smoke_test": True,
    "triton_service_name": TRITON_SERVICE,
    "triton_namespace": TRITON_NAMESPACE,
    "triton_port": TRITON_HTTP_PORT,
}

# Submit training run (uncomment to execute)
# training_run = run_pipeline(
#     kfp_client,
#     training_id,
#     "training",
#     training_params
# )

---
## Step 5: Connect to Triton Inference Server

Once the model is deployed, connect to Triton for inference.

In [None]:
def wait_for_triton(host, model_name, timeout=300):
    """Wait for Triton server and model to be ready."""
    client = httpclient.InferenceServerClient(url=host)
    start = time.time()
    
    while time.time() - start < timeout:
        try:
            if client.is_server_ready():
                print("Triton server is ready")
                if client.is_model_ready(model_name):
                    print(f"Model '{model_name}' is ready")
                    return client
                else:
                    print(f"Waiting for model '{model_name}'...")
            else:
                print("Waiting for Triton server...")
        except Exception as e:
            print(f"Connection error: {e}")
        time.sleep(10)
    
    raise TimeoutError(f"Triton not ready after {timeout}s")

# Connect to Triton
try:
    triton_client = wait_for_triton(TRITON_HOST, MODEL_NAME, timeout=60)
except Exception as e:
    print(f"Could not connect to Triton: {e}")
    triton_client = None

In [None]:
# Get model metadata
if triton_client:
    metadata = triton_client.get_model_metadata(MODEL_NAME)
    print(f"Model: {metadata['name']}")
    print(f"Versions: {metadata.get('versions', ['1'])}")
    print("\nInputs:")
    for inp in metadata['inputs']:
        print(f"  {inp['name']}: {inp['datatype']} {inp['shape']}")
    print("\nOutputs:")
    for out in metadata['outputs']:
        print(f"  {out['name']}: {out['datatype']} {out['shape']}")

---
## Step 6: Run Inference

Prepare sample data and send inference requests to Triton.

In [None]:
def make_sample_request(num_users=5, num_merchants=3, num_edges=2):
    """Create sample inference request with random data.
    
    In production, you'd load actual preprocessed transaction data.
    """
    # Feature dimensions from model training
    user_feature_dim = 13
    merchant_feature_dim = 24
    edge_feature_dim = 38
    
    return {
        "x_user": np.random.randn(num_users, user_feature_dim).astype(np.float32),
        "x_merchant": np.random.randn(num_merchants, merchant_feature_dim).astype(np.float32),
        "edge_index_user_to_merchant": np.vstack([
            np.random.randint(0, num_users, num_edges),
            np.random.randint(0, num_merchants, num_edges),
        ]).astype(np.int64),
        "edge_attr_user_to_merchant": np.random.randn(num_edges, edge_feature_dim).astype(np.float32),
        "feature_mask_user": np.zeros(user_feature_dim, dtype=np.int32),
        "feature_mask_merchant": np.zeros(merchant_feature_dim, dtype=np.int32),
        "COMPUTE_SHAP": np.array([False], dtype=np.bool_),
    }

sample_data = make_sample_request(num_users=10, num_merchants=5, num_edges=20)
print("Sample request shapes:")
for k, v in sample_data.items():
    print(f"  {k}: {v.shape} ({v.dtype})")
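In the request above, `edge_index_user_to_merchant` has shape `(2, num_edges)`: row 0 holds the user index of each edge and row 1 the merchant index. An out-of-range index would point at a node that has no feature row. A small check like the following can catch that before a request is sent. `validate_edge_index` is a hypothetical helper written for illustration; it accepts nested lists as well as NumPy arrays.

```python
def validate_edge_index(edge_index, num_users, num_merchants):
    """Return a list of problems found in a 2 x num_edges edge index.

    Row 0 must contain user indices in [0, num_users) and row 1 merchant
    indices in [0, num_merchants). An empty list means the index is valid.
    """
    if len(edge_index) != 2:
        return [f"expected 2 rows, got {len(edge_index)}"]
    users, merchants = edge_index
    if len(users) != len(merchants):
        return [f"row lengths differ: {len(users)} vs {len(merchants)}"]
    problems = []
    for pos, (u, m) in enumerate(zip(users, merchants)):
        if not 0 <= u < num_users:
            problems.append(f"edge {pos}: user index {u} out of range")
        if not 0 <= m < num_merchants:
            problems.append(f"edge {pos}: merchant index {m} out of range")
    return problems


# Three edges over 3 users and 2 merchants; the last edge points at merchant 2
edges = [[0, 1, 2], [1, 0, 2]]
print(validate_edge_index(edges, num_users=3, num_merchants=2))
```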

In [None]:
def run_inference(client, model_name, data, compute_shap=False):
    """Send inference request to Triton."""
    inputs = []
    
    dtype_map = {
        "x_": "FP32",
        "feature_mask_": "INT32",
        "edge_index_": "INT64",
        "edge_attr_": "FP32",
        "COMPUTE_SHAP": "BOOL",
    }
    
    def get_dtype(key):
        for prefix, dtype in dtype_map.items():
            if key.startswith(prefix) or key == prefix.rstrip("_"):
                return dtype
        return "FP32"
    
    for key, value in data.items():
        if key == "COMPUTE_SHAP":
            value = np.array([compute_shap], dtype=np.bool_)
        inp = InferInput(key, list(value.shape), datatype=get_dtype(key))
        inp.set_data_from_numpy(value)
        inputs.append(inp)
    
    outputs = [InferRequestedOutput("PREDICTION")]
    if compute_shap:
        outputs.extend([
            InferRequestedOutput("shap_values_user"),
            InferRequestedOutput("shap_values_merchant"),
        ])
    
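    # perf_counter is monotonic, so the measurement is unaffected by wall-clock adjustments
    t0 = time.perf_counter()
    result = client.infer(model_name, inputs, outputs=outputs)
    latency = (time.perf_counter() - t0) * 1000  # milliseconds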
    
    return result, latency

# Run inference
if triton_client:
    result, latency = run_inference(triton_client, MODEL_NAME, sample_data)
    predictions = result.as_numpy("PREDICTION")
    
    print(f"Inference latency: {latency:.2f}ms")
    print(f"Predictions shape: {predictions.shape}")
    print(f"Predictions: {predictions.flatten()}")
    print(f"Fraud probability range: [{predictions.min():.4f}, {predictions.max():.4f}]")
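`run_inference` picks the Triton datatype from the input name prefix. An alternative is to derive it from the array's own dtype, so the declared datatype cannot drift from the data actually sent. The sketch below uses a hand-written lookup keyed by NumPy dtype name (what `str(array.dtype)` returns) and covers only the four dtypes this notebook sends; the function name `triton_dtype_for` is hypothetical. `tritonclient.utils.np_to_triton_dtype` offers a more complete mapping.

```python
# NumPy dtype name -> Triton datatype string, for the dtypes used in this notebook
NUMPY_TO_TRITON = {
    "float32": "FP32",
    "int32": "INT32",
    "int64": "INT64",
    "bool": "BOOL",
}


def triton_dtype_for(dtype_name: str) -> str:
    """Map a NumPy dtype name such as 'float32' to a Triton datatype string."""
    try:
        return NUMPY_TO_TRITON[dtype_name]
    except KeyError:
        raise ValueError(
            f"No Triton datatype mapped for NumPy dtype '{dtype_name}'"
        ) from None


# With a real array this would be called as triton_dtype_for(str(value.dtype))
for name in ("float32", "int64", "bool"):
    print(name, "->", triton_dtype_for(name))
```

Raising on an unmapped dtype, rather than silently defaulting to `FP32`, surfaces mismatches such as an accidental `float64` array before the request reaches the server.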

---
## Step 7: Visualize Results

Plot prediction distributions and performance metrics.

In [None]:
def visualize_predictions(predictions, title="Fraud Predictions"):
    """Visualize prediction distribution."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    # Histogram
    ax1 = axes[0]
    ax1.hist(predictions.flatten(), bins=30, alpha=0.7, color='#3498db', edgecolor='white')
    ax1.axvline(0.5, color='red', linestyle='--', label='Threshold (0.5)')
    ax1.set_xlabel('Fraud Probability')
    ax1.set_ylabel('Count')
    ax1.set_title('Prediction Distribution')
    ax1.legend()
    
    # Classification
    ax2 = axes[1]
    fraud_count = (predictions > 0.5).sum()
    non_fraud_count = (predictions <= 0.5).sum()
    ax2.bar(['Non-Fraud', 'Fraud'], [non_fraud_count, fraud_count], 
            color=['#2ecc71', '#e74c3c'], alpha=0.8)
    ax2.set_ylabel('Count')
    ax2.set_title('Classification Results')
    
    for i, v in enumerate([non_fraud_count, fraud_count]):
        ax2.text(i, v + 0.5, str(v), ha='center', fontweight='bold')
    
    plt.suptitle(title, fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

if triton_client:
    visualize_predictions(predictions, "Sample Inference Results")

---
## Step 8: Compute SHAP Values (Explainability)

SHAP values help explain which features contributed to each prediction.

**Note:** SHAP computation is expensive - use sparingly.

In [None]:
def run_inference_with_shap(client, model_name, data):
    """Run inference with SHAP value computation."""
    result, latency = run_inference(client, model_name, data, compute_shap=True)
    
    predictions = result.as_numpy("PREDICTION")
    
    shap_values = {}
    for name in ["shap_values_user", "shap_values_merchant"]:
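        try:
            values = result.as_numpy(name)
        except Exception as e:
            print(f"Could not read output '{name}': {e}")
            continue
        if values is not None:  # as_numpy returns None when the output is absent
            shap_values[name] = values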
    
    return predictions, shap_values, latency

# Run with SHAP (on smaller sample due to compute cost)
if triton_client:
    small_sample = make_sample_request(num_users=3, num_merchants=2, num_edges=5)
    predictions, shap_values, latency = run_inference_with_shap(
        triton_client, MODEL_NAME, small_sample
    )
    
    print(f"Inference with SHAP latency: {latency:.2f}ms")
    print(f"Predictions: {predictions.flatten()}")
    print("\nSHAP values computed:")
    for name, values in shap_values.items():
        print(f"  {name}: {values.shape}")

In [None]:
def visualize_shap_values(shap_values, feature_names=None, title="Feature Importance"):
    """Visualize SHAP values as feature importance."""
    if not shap_values:
        print("No SHAP values to visualize")
        return
    
    fig, axes = plt.subplots(1, len(shap_values), figsize=(6*len(shap_values), 5))
    if len(shap_values) == 1:
        axes = [axes]
    
    for ax, (name, values) in zip(axes, shap_values.items()):
        # Average absolute SHAP values across samples
        importance = np.abs(values).mean(axis=0)
        if len(importance.shape) > 1:
            importance = importance.mean(axis=0)
        
        n_features = len(importance)
        features = feature_names or [f"F{i}" for i in range(n_features)]
        
        # Sort by importance
        idx = np.argsort(importance)[-15:]  # Top 15
        
        ax.barh(range(len(idx)), importance[idx], color='#3498db', alpha=0.8)
        ax.set_yticks(range(len(idx)))
        ax.set_yticklabels([features[i] for i in idx])
        ax.set_xlabel('Mean |SHAP Value|')
        ax.set_title(name.replace('shap_values_', '').title() + ' Features')
    
    plt.suptitle(title, fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

if triton_client and shap_values:
    visualize_shap_values(shap_values, title="Feature Importance (SHAP)")
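The ranking in `visualize_shap_values` is based on the mean absolute SHAP value per feature, so features that push predictions strongly in either direction rank high. The following dependency-free sketch shows that calculation on a tiny hand-made matrix (rows are samples, columns are features). The numbers are made up for illustration, and both function names are hypothetical.

```python
def mean_abs_importance(shap_rows):
    """Mean of |SHAP| per feature column for a list of per-sample rows."""
    n_samples = len(shap_rows)
    n_features = len(shap_rows[0])
    return [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(n_features)
    ]


def top_features(shap_rows, k):
    """Indices of the k most important features, most important first."""
    importance = mean_abs_importance(shap_rows)
    order = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    return order[:k]


# Two samples, three features (made-up values)
rows = [
    [0.5, -2.0, 0.0],
    [-0.5, 1.0, 0.25],
]
print(mean_abs_importance(rows))
print(top_features(rows, k=2))
```

Feature 0 averages to zero if the signs are kept, yet its mean absolute value is 0.5, which is why the absolute value is taken before averaging.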

---
## Step 9: Batch Inference on Test Data

Load actual test data from S3 and run batch inference.

In [None]:
def load_test_data_from_s3(bucket, prefix, region):
    """Load the preprocessed test split from S3 into pandas DataFrames.

    Objects that cannot be read are reported and skipped, so the returned
    dict may be missing some keys.
    """
    import io
    import boto3

    s3 = boto3.client('s3', region_name=region)

    # Expected paths from preprocessing pipeline
    test_paths = {
        "edges": f"{prefix}/gnn_test_edges/data.parquet",
        "user_features": f"{prefix}/gnn_test_user_features/data.parquet",
        "merchant_features": f"{prefix}/gnn_test_merchant_features/data.parquet",
        "edge_features": f"{prefix}/gnn_test_edge_features/data.parquet",
        "edge_labels": f"{prefix}/gnn_test_edge_labels/data.parquet",
    }

    data = {}
    for name, path in test_paths.items():
        try:
            obj = s3.get_object(Bucket=bucket, Key=path)
            # Parquet readers need a seekable file; the S3 streaming body is not, so buffer it
            data[name] = pd.read_parquet(io.BytesIO(obj['Body'].read()))
            print(f"Loaded {name}: {len(data[name])} rows")
        except Exception as e:
            print(f"Could not load {name}: {e}")
    
    return data

# Uncomment to load test data from S3
# test_data = load_test_data_from_s3(S3_BUCKET, "preprocessing", S3_REGION)
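The training parameters address data with a full `s3://` URI, while `load_test_data_from_s3` takes the bucket and key prefix separately. Converting between the two forms by hand invites stray or missing slashes. A minimal sketch using only the standard library follows; `split_s3_uri` is a hypothetical helper, not part of boto3.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {uri}")
    # Strip slashes so the prefix joins cleanly with object names
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_s3_uri("s3://ml-on-containers/preprocessing/gnn")
print(bucket, prefix)
```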

---
## Step 10: Performance Evaluation

Evaluate model performance on test data with ground truth labels.

In [None]:
def evaluate_model(predictions, true_labels):
    """Calculate and display model performance metrics."""
    from sklearn.metrics import (
        accuracy_score, precision_score, recall_score, 
        f1_score, roc_auc_score, confusion_matrix
    )
    
    pred_labels = (predictions > 0.5).astype(int)
    
    metrics = {
        "Accuracy": accuracy_score(true_labels, pred_labels),
        "Precision": precision_score(true_labels, pred_labels, zero_division=0),
        "Recall": recall_score(true_labels, pred_labels, zero_division=0),
        "F1 Score": f1_score(true_labels, pred_labels, zero_division=0),
    }
    
    try:
        metrics["ROC-AUC"] = roc_auc_score(true_labels, predictions)
    except ValueError:
        # ROC-AUC is undefined when true_labels contains only one class
        metrics["ROC-AUC"] = float("nan")
    
    # Display metrics
    print("=" * 40)
    print("MODEL PERFORMANCE")
    print("=" * 40)
    for name, value in metrics.items():
        print(f"{name:12}: {value:.4f}")
    print("=" * 40)
    
    # Confusion matrix visualization
    # Fix the label order so the matrix stays 2x2 even if one class is absent
    cm = confusion_matrix(true_labels, pred_labels, labels=[0, 1])
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
                xticklabels=['Non-Fraud', 'Fraud'],
                yticklabels=['Non-Fraud', 'Fraud'])
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title('Confusion Matrix')
    plt.tight_layout()
    plt.show()
    
    return metrics

# Demo with synthetic labels (in production, use actual test labels)
if triton_client:
    # Simulate ground truth for demo; .size matches predictions.flatten() even for (N, 1) outputs
    synthetic_labels = (np.random.rand(predictions.size) > 0.9).astype(int)
    metrics = evaluate_model(predictions.flatten(), synthetic_labels)
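Because fraud is rare, accuracy alone says little: a model that never predicts fraud still scores high. The sketch below computes the same headline metrics by hand from confusion-matrix counts to make that concrete. The counts are made up for illustration, and `metrics_from_counts` is a hypothetical helper, not part of scikit-learn.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0, mirroring
    zero_division=0 in the scikit-learn calls above.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# 1000 transactions, 10 of them fraudulent, and a model that flags nothing
print(metrics_from_counts(tp=0, fp=0, fn=10, tn=990))
```

In this example accuracy is 99% while recall is zero, which is why precision, recall, and F1 are reported alongside it.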

---
## Monitoring & Troubleshooting

### Check Pipeline Status

In [None]:
def check_run_status(client, run_id):
    """Check the status of a pipeline run."""
    run = client.get_run(run_id)
    print(f"Run: {run.run_id}")
    print(f"Status: {run.state}")
    print(f"Created: {run.created_at}")
    if run.finished_at:
        print(f"Finished: {run.finished_at}")
    return run

# Check status of a run (replace with actual run ID)
# run_status = check_run_status(kfp_client, "your-run-id-here")

### Triton Health Check

In [None]:
def triton_health_check(host):
    """Comprehensive Triton health check."""
    client = httpclient.InferenceServerClient(url=host)
    
    print(f"Triton Server: {host}")
    print("-" * 40)
    
    try:
        # Server health
        server_ready = client.is_server_ready()
        server_live = client.is_server_live()
        print(f"Server Ready: {server_ready}")
        print(f"Server Live: {server_live}")
        
        # Model repository
        if server_ready:
            models = client.get_model_repository_index()
            print(f"\nModels in repository: {len(models)}")
            for model in models:
                name = model.get('name', 'unknown')
                state = model.get('state', 'unknown')
                print(f"  - {name}: {state}")
        
        # Report healthy only if both probes passed
        return server_ready and server_live
    except Exception as e:
        print(f"Health check failed: {e}")
        return False

triton_health_check(TRITON_HOST)
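When the Python client is not available, for example from a debug shell in another pod, the same checks can be made against Triton's HTTP endpoints directly. The sketch below only builds the URLs. The paths follow the KServe v2 inference protocol that Triton implements, and `triton_health_urls` is a hypothetical helper written for illustration.

```python
def triton_health_urls(host: str, model_name: str) -> dict:
    """Build the HTTP health-check URLs for a Triton server.

    host is 'hostname:port' without a scheme, as in TRITON_HOST above.
    """
    base = f"http://{host}/v2"
    return {
        "live": f"{base}/health/live",
        "ready": f"{base}/health/ready",
        "model_ready": f"{base}/models/{model_name}/ready",
    }


host = "triton-inference-server.triton.svc.cluster.local:8005"
for name, url in triton_health_urls(host, "prediction_and_shapley").items():
    print(f"{name}: {url}")
```

Each endpoint answers a plain GET with HTTP 200 when the check passes, so `curl -i <url>` is enough to probe it.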

---
## Summary

This notebook demonstrated the complete fraud detection workflow on Kubeflow:

1. **Pipeline Management** - Upload and manage KFP pipelines
2. **Preprocessing** - Run data preprocessing via KFP
3. **Training** - Train GNN+XGBoost model on GPU nodes
4. **Deployment** - Model auto-deployed to Triton via S3 polling
5. **Inference** - Run predictions via Triton HTTP API
6. **Explainability** - Compute SHAP values for interpretability
7. **Evaluation** - Measure model performance metrics

### Next Steps
- Set up recurring pipeline runs for model retraining
- Configure alerting on model performance degradation
- Add A/B testing for model comparison