# MLflow Model Serving Test

This notebook:
1. Registers the champion model in MLflow Model Registry
2. Transitions it to Production stage
3. Tests the MLflow Model Serving API
4. Benchmarks performance

## 1. Setup and Imports

In [1]:
import warnings
# Note: this also silences MLflow deprecation warnings (e.g. for the registry stages used below)
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import requests
import json
import time

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

print("All imports successful!")
print(f"MLflow version: {mlflow.__version__}")

All imports successful!
MLflow version: 3.7.0


## 2. Configure MLflow

In [2]:
# Set MLflow tracking URI
mlflow.set_tracking_uri("http://localhost:5000")
client = MlflowClient()

experiment_name = "home_credit_default_risk"
mlflow.set_experiment(experiment_name)

print(f"MLflow Tracking URI: {mlflow.get_tracking_uri()}")
print(f"Experiment: {experiment_name}")

MLflow Tracking URI: http://localhost:5000
Experiment: home_credit_default_risk


## 3. Find Champion Model Run

In [3]:
# Load comparison results
comparison_df = pd.read_csv('../reports/model_comparison.csv')
best_model_idx = comparison_df['Business Cost'].idxmin()
best_model_name = comparison_df.loc[best_model_idx, 'Model']

print(f"Champion Model: {best_model_name}")
print(f"Business Cost: {comparison_df.loc[best_model_idx, 'Business Cost']:.2f}")
print(f"AUC: {comparison_df.loc[best_model_idx, 'AUC']:.4f}")

# Map model name to run name
run_name_map = {
    'Logistic Regression': 'logistic_regression_baseline',
    'Random Forest': 'random_forest_tuned',
    'XGBoost': 'xgboost_tuned',
    'LightGBM': 'lightgbm_tuned'
}

champion_run_name = run_name_map[best_model_name]

Champion Model: LightGBM
Business Cost: 4959.00
AUC: 0.7793
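
`idxmin()` returns the first matching row when several models share the lowest business cost, so the winner in a tie depends only on row order in the CSV (several runs below tie at 4965.00). The sketch below breaks ties by higher AUC instead. The helper name `pick_champion` and the AUC tie-break rule are assumptions, not part of the original pipeline; it only relies on the `Model`, `Business Cost` and `AUC` columns used above.

```python
import pandas as pd


def pick_champion(comparison: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: lowest 'Business Cost' wins, ties broken by higher 'AUC'."""
    ranked = comparison.sort_values(["Business Cost", "AUC"], ascending=[True, False])
    return ranked.iloc[0]


# Usage (assumption): best_model_name = pick_champion(comparison_df)["Model"]
```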


## 4. Get All Runs from Experiment

In [4]:
# Get experiment
experiment = client.get_experiment_by_name(experiment_name)
experiment_id = experiment.experiment_id

# Search for all runs
runs = client.search_runs(
    experiment_ids=[experiment_id],
    order_by=["metrics.business_cost ASC"]
)

print(f"\nFound {len(runs)} runs in experiment:")
print("="*80)
for run in runs:
    run_name = run.data.tags.get('mlflow.runName', 'Unknown')
    auc = run.data.metrics.get('test_auc', 0)
    business_cost = run.data.metrics.get('business_cost', 0)
    print(f"{run_name:30s} | AUC: {auc:.4f} | Business Cost: {business_cost:.2f} | Run ID: {run.info.run_id[:8]}...")
print("="*80)


Found 5 runs in experiment:
lightgbm_tuned                 | AUC: 0.7793 | Business Cost: 4959.00 | Run ID: 05452ecb...
random_forest_tuned            | AUC: 0.7553 | Business Cost: 4964.00 | Run ID: 3419456e...
xgboost_tuned                  | AUC: 0.7695 | Business Cost: 4965.00 | Run ID: c768a4de...
logistic_regression_baseline   | AUC: 0.7684 | Business Cost: 4965.00 | Run ID: 8e84e916...
logistic_regression_baseline   | AUC: 0.7684 | Business Cost: 4965.00 | Run ID: 2f886076...


## 5. Register Champion Model

In [5]:
# Find the champion run
champion_run = None
for run in runs:
    run_name = run.data.tags.get('mlflow.runName', '')
    if champion_run_name in run_name:
        champion_run = run
        break

if champion_run is None:
    raise ValueError(f"Could not find run with name: {champion_run_name}")

champion_run_id = champion_run.info.run_id
print(f"Champion Run ID: {champion_run_id}")

# Register model
model_name = "home_credit_scoring"
model_uri = f"runs:/{champion_run_id}/model"

try:
    # Note: register_model() does not fail when the model name already exists;
    # it creates a new version, so re-running this cell adds another version
    model_version = mlflow.register_model(model_uri, model_name)
    print(f"\nModel registered successfully!")
    print(f"Model Name: {model_name}")
    print(f"Version: {model_version.version}")
except Exception as e:
    print(f"Registration failed: {e}")
    # Fall back to the newest existing version (search results are not guaranteed to be ordered)
    model_versions = client.search_model_versions(f"name='{model_name}'")
    if not model_versions:
        raise
    model_version = max(model_versions, key=lambda mv: int(mv.version))
    print(f"Using existing model version: {model_version.version}")

Successfully registered model 'home_credit_scoring'.
2025/12/13 19:34:32 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: home_credit_scoring, version 1


Champion Run ID: 05452ecb1daa4fd09593e3119bd07591

Model registered successfully!
Model Name: home_credit_scoring
Version: 1


Created version '1' of model 'home_credit_scoring'.


## 6. Transition Model to Production

In [6]:
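# Transition to Production stage
# Note: registry stages are deprecated since MLflow 2.9; aliases
# (client.set_registered_model_alias) are the recommended replacement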
try:
    client.transition_model_version_stage(
        name=model_name,
        version=model_version.version,
        stage="Production",
        archive_existing_versions=True
    )
    print(f"Model transitioned to Production stage!")
except Exception as e:
    print(f"Error transitioning model: {e}")

# Verify
prod_model = client.get_latest_versions(model_name, stages=["Production"])
if prod_model:
    print(f"\nProduction model: {model_name} v{prod_model[0].version}")
    print(f"Status: {prod_model[0].status}")
    print(f"Run ID: {prod_model[0].run_id}")

Model transitioned to Production stage!

Production model: home_credit_scoring v1
Status: READY
Run ID: 05452ecb1daa4fd09593e3119bd07591


## 7. Model Metadata

In [7]:
# Get model details
model_details = client.get_model_version(model_name, model_version.version)

print("\nModel Metadata:")
print("="*60)
print(f"Name: {model_details.name}")
print(f"Version: {model_details.version}")
print(f"Stage: {model_details.current_stage}")
print(f"Run ID: {model_details.run_id}")
print(f"Source: {model_details.source}")
print("="*60)


Model Metadata:
Name: home_credit_scoring
Version: 1
Stage: Production
Run ID: 05452ecb1daa4fd09593e3119bd07591
Source: models:/m-7c7155d84f3a42d5aac70ead4378bbc1


## 8. Start MLflow Model Server (Instructions)

### To start the MLflow Model Serving server:

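Open a **new terminal** and run:

```bash
cd "Projet Final"
# Point the CLI at the same tracking server as the notebook so that
# the models:/ URI below resolves against its Model Registry
export MLFLOW_TRACKING_URI=http://localhost:5000
mlflow models serve -m "models:/home_credit_scoring/Production" -p 5001 --env-manager local
```

On Windows `cmd`, use `set MLFLOW_TRACKING_URI=http://localhost:5000` instead of `export`.

Wait until the server log reports that it is listening on port 5001 (the exact wording depends on the web server MLflow uses, e.g. **"Listening at: http://127.0.0.1:5001"**).

Then continue with the cells below to test the API.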

## 9. Check Server Health

In [8]:
# Wait for server to be ready
server_url = "http://127.0.0.1:5001"
health_url = f"{server_url}/health"
invocations_url = f"{server_url}/invocations"

print("Checking if server is ready...")
max_retries = 30
retry_delay = 2

for i in range(max_retries):
    try:
        response = requests.get(health_url, timeout=5)
        if response.status_code == 200:
            print("✓ Server is ready!")
            print(f"Health check response: {response.text}")
            break
    except requests.exceptions.RequestException:
        pass  # not up yet (connection refused, timeout, ...); retry below
    # Reached on a connection error or a non-200 status, so both cases wait before retrying
    if i < max_retries - 1:
        print(f"Waiting for server... ({i+1}/{max_retries})")
        time.sleep(retry_delay)
else:
    # for/else: runs only if the loop never hit `break`
    print("✗ Server not responding. Make sure you started the server in a terminal.")
    print("Run: mlflow models serve -m 'models:/home_credit_scoring/Production' -p 5001 --env-manager local")

Checking if server is ready...
Waiting for server... (1/30)
Waiting for server... (2/30)
Waiting for server... (3/30)
Waiting for server... (4/30)
Waiting for server... (5/30)
Waiting for server... (6/30)
Waiting for server... (7/30)
Waiting for server... (8/30)
Waiting for server... (9/30)
Waiting for server... (10/30)
Waiting for server... (11/30)
Waiting for server... (12/30)
Waiting for server... (13/30)
✓ Server is ready!
Health check response: 



## 10. Load Test Data

In [9]:
# Load prepared data
df = pd.read_csv('../data/application_train_prepared.csv')

# Separate features
if 'TARGET' in df.columns:
    target_col = 'TARGET'
else:
    target_col = [col for col in df.columns if 'target' in col.lower()][0]

X = df.drop(columns=[target_col])
y = df[target_col]

# Draw a few rows for a smoke test; they come from the training file,
# so the results below are not a held-out evaluation
test_samples = X.sample(n=5, random_state=42)
test_targets = y.loc[test_samples.index]

print(f"Selected {len(test_samples)} test samples")
print(f"Actual labels: {test_targets.values}")

Selected 5 test samples
Actual labels: [0 0 0 0 0]
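
All five randomly drawn rows have label 0, so the accuracy check in section 13 only exercises one class. A minimal sketch of drawing a fixed number of rows per label instead, assuming `X` and `y` share the same index as they do above; `sample_per_class` is a hypothetical helper, and `groupby(...).sample` requires pandas 1.1 or later.

```python
import pandas as pd


def sample_per_class(X: pd.DataFrame, y: pd.Series, n_per_class: int, random_state: int = 42):
    """Hypothetical helper: draw n_per_class rows for every label in y.

    Raises ValueError if a class has fewer than n_per_class rows (sampling without replacement).
    """
    idx = y.groupby(y).sample(n=n_per_class, random_state=random_state).index
    return X.loc[idx], y.loc[idx]


# Usage (assumption): test_samples, test_targets = sample_per_class(X, y, n_per_class=3)
```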


## 11. Prepare Request Payload

In [10]:
# Format data for MLflow (dataframe_split format)
payload = {
    "dataframe_split": {
        "columns": test_samples.columns.tolist(),
        "data": test_samples.values.tolist()
    }
}

print("Payload prepared:")
print(f"Number of columns: {len(payload['dataframe_split']['columns'])}")
print(f"Number of samples: {len(payload['dataframe_split']['data'])}")
print(f"\nFirst few columns: {payload['dataframe_split']['columns'][:5]}")

Payload prepared:
Number of columns: 335
Number of samples: 5

First few columns: ['SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY']
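
`values.tolist()` keeps missing values as float `NaN`. That is not valid JSON: recent versions of `requests` may refuse to serialize it when passed via `json=`, and `json.dump` writes a non-standard `NaN` token. The sketch below builds the same `dataframe_split` payload but maps missing values to JSON `null`. The helper name is hypothetical, and whether the server turns `null` back into a missing value for every column type should be checked against your model's signature.

```python
import json

import pandas as pd


def to_dataframe_split(df: pd.DataFrame) -> dict:
    """Hypothetical helper: 'dataframe_split' payload with missing values as None (JSON null)."""
    # astype(object) yields plain Python scalars (float, int, str), which json can serialize
    rows = [
        [None if pd.isna(v) else v for v in row]
        for row in df.astype(object).values.tolist()
    ]
    return {"dataframe_split": {"columns": [str(c) for c in df.columns], "data": rows}}


# Usage (assumption):
# body = json.dumps(to_dataframe_split(test_samples), allow_nan=False)
# requests.post(invocations_url, data=body, headers=headers, timeout=30)
```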


## 12. Send Prediction Request

In [11]:
# Send POST request
headers = {"Content-Type": "application/json"}

print("Sending prediction request...")
start_time = time.perf_counter()  # monotonic, high-resolution timer

response = requests.post(
    invocations_url,
    json=payload,
    headers=headers,
    timeout=30,
)

elapsed_time = time.perf_counter() - start_time

print(f"Response Status: {response.status_code}")
print(f"Response Time: {elapsed_time:.3f} seconds")

if response.status_code == 200:
    predictions = response.json()['predictions']
    print(f"\n✓ Predictions received successfully!")
    print(f"Number of predictions: {len(predictions)}")
else:
    print(f"✗ Error: {response.text}")

Sending prediction request...
Response Status: 200
Response Time: 0.011 seconds

✓ Predictions received successfully!
Number of predictions: 5


## 13. Parse and Display Predictions

In [12]:
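# Display predictions vs actual. If the model's pyfunc calls predict(), the served values are
# class labels (0/1) rather than probabilities, despite the 'Predicted_Proba' column name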
if response.status_code == 200:
    results_df = pd.DataFrame({
        'Index': test_samples.index,
        'Actual': test_targets.values,
        'Predicted_Proba': [pred[1] if isinstance(pred, list) else pred for pred in predictions],
        'Predicted_Class': [(pred[1] if isinstance(pred, list) else pred) >= 0.5 for pred in predictions]
    })
    
    print("\n" + "="*80)
    print("PREDICTION RESULTS")
    print("="*80)
    print(results_df.to_string(index=False))
    print("="*80)
    
    # Sanity check only: 5 rows say little about model quality
    correct = (results_df['Actual'] == results_df['Predicted_Class']).sum()
    accuracy = correct / len(results_df)
    print(f"\nAccuracy on test samples: {accuracy:.2%} ({correct}/{len(results_df)})")


PREDICTION RESULTS
 Index  Actual  Predicted_Proba  Predicted_Class
245895       0                1             True
 98194       0                0            False
 36463       0                1             True
249923       0                0            False
158389       0                0            False

Accuracy on test samples: 60.00% (3/5)
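
The `Predicted_Proba` values above are exactly 0 and 1, which suggests the server returns class labels: the pyfunc wrapper of a scikit-learn-style classifier calls `predict()` by default, so thresholding at 0.5 simply reproduces the label. If the model was logged with `mlflow.sklearn.log_model`, recent MLflow versions accept `pyfunc_predict_fn="predict_proba"` to serve probabilities instead (check the documentation for your version). The hypothetical helper below handles both shapes explicitly rather than guessing.

```python
from typing import List, Optional, Sequence, Tuple


def split_predictions(
    predictions: Sequence, threshold: float = 0.5
) -> Tuple[List[int], List[Optional[float]]]:
    """Hypothetical helper: return (labels, positive-class probabilities).

    - scalar entries are treated as class labels from predict(); probability is None
    - [p0, p1] entries are treated as predict_proba output; label = p1 >= threshold
    """
    labels: List[int] = []
    probas: List[Optional[float]] = []
    for pred in predictions:
        if isinstance(pred, (list, tuple)):
            p1 = float(pred[1])
            probas.append(p1)
            labels.append(int(p1 >= threshold))
        else:
            probas.append(None)
            labels.append(int(pred))
    return labels, probas
```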


## 14. Test Batch Predictions

In [13]:
# Test with larger batch
batch_size = 100
batch_samples = X.sample(n=batch_size, random_state=123)

batch_payload = {
    "dataframe_split": {
        "columns": batch_samples.columns.tolist(),
        "data": batch_samples.values.tolist()
    }
}

print(f"Sending batch request ({batch_size} samples)...")
start_time = time.perf_counter()

batch_response = requests.post(
    invocations_url,
    json=batch_payload,
    headers=headers,
    timeout=30,
)

elapsed_time = time.perf_counter() - start_time

print(f"Response Status: {batch_response.status_code}")
print(f"Response Time: {elapsed_time:.3f} seconds")
# Based on a single request, so treat this figure as a rough indication only
print(f"Throughput: {batch_size / elapsed_time:.2f} predictions/second")

Sending batch request (100 samples)...
Response Status: 200
Response Time: 0.015 seconds
Throughput: 6892.29 predictions/second


## 15. Edge Cases Testing

In [14]:
# Test with edge cases
print("Testing edge cases...\n")

# Test 1: Single sample
single_sample = X.iloc[[0]]
single_payload = {
    "dataframe_split": {
        "columns": single_sample.columns.tolist(),
        "data": single_sample.values.tolist()
    }
}

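response1 = requests.post(invocations_url, json=single_payload, headers=headers, timeout=30)
print(f"{'✓' if response1.status_code == 200 else '✗'} Single sample test: Status {response1.status_code}")

# Test 2: a small multi-row batch
# Note: these rows are drawn at random, so they are not guaranteed to contain extreme values
outlier_samples = X.sample(n=3, random_state=999)
outlier_payload = {
    "dataframe_split": {
        "columns": outlier_samples.columns.tolist(),
        "data": outlier_samples.values.tolist()
    }
}

response2 = requests.post(invocations_url, json=outlier_payload, headers=headers, timeout=30)
print(f"{'✓' if response2.status_code == 200 else '✗'} Outlier samples test: Status {response2.status_code}")

# Only report success if every request actually returned 200
if response1.status_code == 200 and response2.status_code == 200:
    print("\nAll edge case tests passed!")
else:
    print("\nSome edge case tests failed.")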

Testing edge cases...

✓ Single sample test: Status 200
✓ Outlier samples test: Status 200

All edge case tests passed!
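
To exercise rows that really are extreme, one option is to rank rows by their largest absolute z-score across numeric columns. This is a sketch under that assumption (z-scores are only one definition of "outlier"); `most_extreme_rows` is a hypothetical helper, and identifier columns such as `SK_ID_CURR` may be worth dropping before scoring.

```python
import numpy as np
import pandas as pd


def most_extreme_rows(X: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: the n rows with the largest |z-score| in any numeric column."""
    numeric = X.select_dtypes(include="number")
    std = numeric.std(ddof=0).replace(0, np.nan)  # constant columns give NaN z-scores
    z = (numeric - numeric.mean()) / std
    score = z.abs().max(axis=1)  # NaN z-scores are skipped by max
    return X.loc[score.nlargest(n).index]


# Usage (assumption): outlier_samples = most_extreme_rows(X.drop(columns=["SK_ID_CURR"]), n=3)
```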


## 16. Performance Benchmarking

In [15]:
# Benchmark with multiple requests
print("Running performance benchmark...")
n_requests = 50
request_times = []

benchmark_sample = X.sample(n=10, random_state=42)
benchmark_payload = {
    "dataframe_split": {
        "columns": benchmark_sample.columns.tolist(),
        "data": benchmark_sample.values.tolist()
    }
}

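for i in range(n_requests):
    start = time.perf_counter()  # monotonic, high-resolution timer
    response = requests.post(invocations_url, json=benchmark_payload, headers=headers, timeout=30)
    elapsed = time.perf_counter() - start
    
    if response.status_code == 200:
        request_times.append(elapsed)
    
    if (i + 1) % 10 == 0:
        print(f"Progress: {i+1}/{n_requests}")

if not request_times:
    raise RuntimeError("No successful requests; check that the model server is running.")

# Calculate statistics (successful requests only)
mean_time = np.mean(request_times)
median_time = np.median(request_times)
p95_time = np.percentile(request_times, 95)
p99_time = np.percentile(request_times, 99)
throughput = 1 / mean_time  # sequential rate: one request in flight at a time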

print("\n" + "="*60)
print("PERFORMANCE BENCHMARK RESULTS")
print("="*60)
print(f"Number of requests: {n_requests}")
print(f"Samples per request: {len(benchmark_sample)}")
print(f"\nLatency:")
print(f"  Mean: {mean_time*1000:.2f} ms")
print(f"  Median: {median_time*1000:.2f} ms")
print(f"  P95: {p95_time*1000:.2f} ms")
print(f"  P99: {p99_time*1000:.2f} ms")
print(f"\nThroughput: {throughput:.2f} requests/second")
print("="*60)

Running performance benchmark...
Progress: 10/50
Progress: 20/50
Progress: 30/50
Progress: 40/50
Progress: 50/50

PERFORMANCE BENCHMARK RESULTS
Number of requests: 50
Samples per request: 10

Latency:
  Mean: 4.65 ms
  Median: 4.57 ms
  P95: 5.33 ms
  P99: 5.63 ms

Throughput: 215.13 requests/second
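
The loop above calls `requests.post` directly, which sets up a fresh session for every call, and it times the very first request too. A reusable sketch that separates timing from transport: it takes any `send()` callable, runs a few untimed warm-up calls, and counts failures. The function name, its warm-up default and the `requests.Session` usage in the trailing comment are assumptions, not part of the original notebook. With only 50 samples, P99 is essentially an interpolation near the maximum.

```python
import time
from typing import Callable, Dict, List

import numpy as np


def benchmark(send: Callable[[], bool], n_requests: int = 50, warmup: int = 3) -> Dict[str, float]:
    """Hypothetical helper: time n_requests sequential calls; send() returns True on success."""
    for _ in range(warmup):
        send()  # warm-up calls are not timed
    latencies: List[float] = []
    failures = 0
    for _ in range(n_requests):
        start = time.perf_counter()
        ok = send()
        elapsed = time.perf_counter() - start
        if ok:
            latencies.append(elapsed)
        else:
            failures += 1
    if not latencies:
        raise RuntimeError("all requests failed")
    arr = np.array(latencies)
    return {
        "mean_ms": float(arr.mean()) * 1000,
        "median_ms": float(np.median(arr)) * 1000,
        "p95_ms": float(np.percentile(arr, 95)) * 1000,
        "p99_ms": float(np.percentile(arr, 99)) * 1000,
        "failures": failures,
        "sequential_rps": len(latencies) / float(arr.sum()),
    }


# Usage against the server (assumption): a Session reuses HTTP connections between calls
# session = requests.Session()
# stats = benchmark(lambda: session.post(invocations_url, json=benchmark_payload,
#                                        timeout=30).status_code == 200)
```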


## 17. API Documentation

### API Endpoint

**URL**: `http://127.0.0.1:5001/invocations`

**Method**: `POST`

**Headers**:
```
Content-Type: application/json
```

### Request Format

```json
{
  "dataframe_split": {
    "columns": ["feature1", "feature2", ...],
    "data": [
      [value1, value2, ...],
      [value1, value2, ...]
    ]
  }
}
```

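### Response Format

The content of `predictions` depends on the prediction function the model's pyfunc wrapper calls. For the model tested above, each entry is a class label (0 or 1), one per input row:

```json
{
  "predictions": [1, 0]
}
```

If the model is served with a `predict_proba`-style function instead, each entry is a pair `[prob_class_0, prob_class_1]`.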

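### cURL Example

Run this from the directory that contains `sample_request.json` (section 18 writes it one level above the notebook's working directory):
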
```bash
curl -X POST http://127.0.0.1:5001/invocations \
  -H 'Content-Type: application/json' \
  -d @sample_request.json
```

### Python Example

```python
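import requests

url = "http://127.0.0.1:5001/invocations"
headers = {"Content-Type": "application/json"}
payload = {
    "dataframe_split": {
        "columns": [...],  # feature names expected by the model
        "data": [[...]]    # one list of values per row
    }
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface HTTP errors instead of failing on a missing key
predictions = response.json()['predictions']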
```

## 18. Create Sample Request File

In [16]:
# Create sample request JSON file
sample_request = {
    "dataframe_split": {
        "columns": test_samples.columns.tolist(),
        "data": test_samples.iloc[:1].values.tolist()
    }
}

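with open('../sample_request.json', 'w') as f:
    # allow_nan=False fails fast instead of writing non-standard NaN tokens into the file
    json.dump(sample_request, f, indent=2, allow_nan=False)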

print("Sample request file created: ../sample_request.json")
print("\nYou can test with:")
print("curl -X POST http://127.0.0.1:5001/invocations -H 'Content-Type: application/json' -d @sample_request.json")

Sample request file created: ../sample_request.json

You can test with:
curl -X POST http://127.0.0.1:5001/invocations -H 'Content-Type: application/json' -d @sample_request.json


## Summary

This notebook has:
1. ✅ Registered the champion model in MLflow Model Registry
2. ✅ Transitioned model to Production stage
3. ✅ Tested the MLflow serving API with various payloads
4. ✅ Benchmarked API performance
5. ✅ Created API documentation and sample requests

### Model Registry Status:
- **Model Name**: home_credit_scoring
- **Stage**: Production
- **Model Type**: LightGBM

### API Performance:
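- Local benchmark (50 sequential requests, 10 rows each): mean latency 4.65 ms, P99 5.63 ms
- Consistent response times: P95 of 5.33 ms vs a median of 4.57 ms
- A 100-row batch returned in 0.015 s (single measurement)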

### Next Steps:
1. Dockerize the application (Phase 3)
2. Deploy to production environment
3. Set up monitoring and alerting