# Seldon Core 2: Advanced MLOps Platform Showcase 🚀
**Experience production-ready MLOps with Seldon Core 2's complete capabilities**

## 🌟 **Why Seldon Core 2?**
Seldon Core 2 is the **next-generation MLOps platform** that transforms how organizations deploy, manage, and scale machine learning models in production. Built for enterprise-grade workloads, it provides everything needed for a complete ML infrastructure.

**🏆 Industry Leading:**
- Trusted by Fortune 500 companies for mission-critical ML workloads
- Open-source with enterprise support and cloud-native architecture
- Kubernetes-native design built on custom resources, with an active community and contributor base
- Compatible with all major cloud providers and on-premises deployments

## 🎯 What You'll Experience
This showcase demonstrates Seldon Core 2's **four key value propositions** through a complete product classification system:

### 🔧 **Flexibility** 
Deploy diverse models (transformers, classifiers) using Server and Model CRDs with efficient multi-model serving

### 📋 **Standardization**
Create ML pipelines with consistent CRDs and Open Inference Protocol V2 for unified model/pipeline interactions

### 👁️ **Observability** 
Real-time monitoring with Prometheus metrics and Grafana dashboards for comprehensive insights

### ⚡ **Optimization**
Safe A/B testing with traffic splitting, multi-model serving efficiency, and production deployment strategies

## 🏗️ Architecture Overview
**Complete MLOps Infrastructure in Action:**
- **🔧 Multi-Model Serving**: MLServer (5 replicas) + Triton (2 replicas) for diverse workloads
- **🤖 ML Models**: Feature transformer + V1/V2 classifiers with shared resource optimization
- **🔗 Pipeline Orchestration**: End-to-end ML workflows with Kafka data flow and tensor mapping
- **🧪 A/B Testing**: Safe model updates with 90/10 traffic splitting and real-time analysis
- **📊 Monitoring**: Real-time metrics and comprehensive observability
- **🌐 Production Access**: Direct browser access to all services with external IP routing
- **⚖️ Load Balancing**: Intelligent request distribution with health checking and auto-scaling
- **🔒 Security**: mTLS encryption, RBAC integration, and audit trail compliance

**Prerequisites**: Kubernetes cluster with Seldon Core 2 and monitoring stack installed

**Note**: For advanced data science monitoring features (drift detection, explainability), see the separate `advanced_data_science_monitoring.ipynb` notebook.
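
Before running the cells, it can help to confirm that the Seldon Core 2 custom resources are actually installed. The sketch below parses the output of `kubectl get crd -o name` and reports which of the four CRDs used in this notebook are missing. The helper names `missing_crds` and `check_prerequisites` are hypothetical and not part of Seldon or kubectl.

```python
import subprocess
from typing import List

# The four Seldon Core 2 CRDs this notebook creates resources for.
REQUIRED_CRDS = (
    "servers.mlops.seldon.io",
    "models.mlops.seldon.io",
    "pipelines.mlops.seldon.io",
    "experiments.mlops.seldon.io",
)


def missing_crds(kubectl_output: str) -> List[str]:
    """Return the required CRDs absent from `kubectl get crd -o name` output."""
    # Each output line looks like
    # "customresourcedefinition.apiextensions.k8s.io/models.mlops.seldon.io".
    installed = {
        line.strip().split("/")[-1]
        for line in kubectl_output.splitlines()
        if line.strip()
    }
    return [crd for crd in REQUIRED_CRDS if crd not in installed]


def check_prerequisites() -> List[str]:
    """Hypothetical helper: query the current cluster and list missing CRDs."""
    result = subprocess.run(
        ["kubectl", "get", "crd", "-o", "name"],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        raise RuntimeError(f"kubectl failed: {result.stderr.strip()}")
    return missing_crds(result.stdout)
```

An empty list from `check_prerequisites()` suggests the CRDs are present; it does not verify that the monitoring stack is installed.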

In [ ]:
import json
import subprocess
import time
import requests
import numpy as np
from IPython.display import display, Markdown, Code
from typing import Optional, List, Dict
import warnings
warnings.filterwarnings('ignore')

# Configuration
config = {
    "namespace": "seldon-mesh",
    "gateway_ip": None,
    "gateway_port": "80"
}

# Track deployed resources
deployed = {"servers": [], "models": [], "pipelines": [], "experiments": []}

def run(cmd, timeout=30): 
    """Execute command"""
    try:
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=timeout)
        return result
    except Exception as e:
        return subprocess.CompletedProcess(cmd, 1, "", str(e))

def log(msg, level="INFO"): 
    """Simple logging"""
    display(Markdown(f"**{msg}**"))

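# Get gateway configuration (assumes an Istio ingress gateway fronts the Seldon mesh)
result = run("kubectl get svc istio-ingressgateway -n istio-system -o json")
if result.returncode == 0 and result.stdout:
    try:
        svc_data = json.loads(result.stdout)
        ingress = svc_data.get("status", {}).get("loadBalancer", {}).get("ingress", [])
        if ingress:
            # Some load balancers report a hostname instead of an IP
            config["gateway_ip"] = ingress[0].get("ip") or ingress[0].get("hostname")
    except json.JSONDecodeError:
        pass

# Fall back to localhost (e.g. when port-forwarding) if no external address was found
if not config["gateway_ip"]:
    config["gateway_ip"] = "localhost"
log(f"Gateway IP: {config['gateway_ip']}")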
        
log("🚀 Starting Seldon Core 2 MLOps Platform Showcase")

# 🔧 Act 1: Flexibility - Multi-Model Deployment

**Seldon Core 2's flexibility allows you to deploy diverse model types and serve multiple models efficiently on shared infrastructure.**

## Key Features:
- **Multi-Model Serving**: Deploy multiple models on shared servers
- **Multiple Runtimes**: MLServer (Python/SKLearn), Triton (GPU/TensorRT)
- **Smart Scheduling**: Intelligent model placement across server replicas
- **Hot Swapping**: Update models without downtime

Let's deploy servers and models to demonstrate this flexibility.
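
Servers and models are both plain Kubernetes manifests. As a minimal sketch, a Model manifest can be assembled as a dictionary and serialized to JSON, which `kubectl apply -f` also accepts. The function `build_model_manifest` is a hypothetical helper for illustration. The `requirements` value `sklearn` assumes the field is matched against server capabilities rather than pip packages.

```python
import json
from typing import Dict, List


def build_model_manifest(name: str, uri: str, server: str,
                         requirements: List[str], replicas: int = 1,
                         namespace: str = "seldon-mesh") -> Dict:
    """Build a Seldon Core 2 Model manifest as a plain dictionary."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageUri": uri,
            # Assumption: capabilities the target server must advertise.
            "requirements": list(requirements),
            "replicas": replicas,
            "server": server,
        },
    }


manifest = build_model_manifest(
    name="product-classifier-v1",
    uri="gs://seldon-models/scv2/samples/mlserver_1.5.0/iris-sklearn",
    server="mlserver",
    requirements=["sklearn"],
    replicas=3,
)
manifest_text = json.dumps(manifest, indent=2)  # ready to write to a file
```

Building the manifest as data rather than as a formatted string avoids indentation mistakes when lists such as `requirements` vary in length.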

In [ ]:
# Deploy servers for multi-model serving
servers_config = {
    "mlserver": 5,
    "triton": 2
}

for server_name, replica_count in servers_config.items():
    server_yaml = f"""apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: {server_name}
  namespace: {config['namespace']}
spec:
  replicas: {replica_count}
  serverConfig: {server_name}"""
    
    with open(f"{server_name}.yaml", "w") as f: 
        f.write(server_yaml)
    
    result = run(f"kubectl apply -f {server_name}.yaml")
    if result.returncode == 0:
        deployed["servers"].append(server_name)
        log(f"✅ Deployed {server_name} server with {replica_count} replicas")
    else:
        log(f"❌ Failed to deploy {server_name}: {result.stderr.strip()}")

log(f"Servers deployed: {len(deployed['servers'])}")

In [ ]:
# Model definitions.
# `requirements` lists capabilities the target server must advertise (e.g. "sklearn"),
# not pip package pins; a model with an unmatched requirement is never scheduled.
models_config = [
    {
        "name": "feature-transformer",
        "uri": "gs://seldon-models/scv2/samples/mlserver_1.5.0/iris-sklearn",
        "server": "mlserver",
        "requirements": ["sklearn"],
        "replicas": 2
    },
    {
        "name": "product-classifier-v1",
        "uri": "gs://seldon-models/scv2/samples/mlserver_1.5.0/iris-sklearn",
        "server": "mlserver",
        "requirements": ["sklearn"],
        "replicas": 3
    },
    {
        "name": "product-classifier-v2",
        "uri": "gs://seldon-models/scv2/samples/mlserver_1.5.0/iris-sklearn",
        "server": "mlserver",
        "requirements": ["sklearn"],
        "replicas": 3
    }
]

# Deploy models
for model in models_config:
    model_yaml = f"""apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: {model['name']}
  namespace: {config['namespace']}
spec:
  storageUri: "{model['uri']}"
  requirements:
{chr(10).join(f'  - {req}' for req in model['requirements'])}
  replicas: {model['replicas']}
  server: {model['server']}"""
    
    with open(f"{model['name']}.yaml", "w") as f:
        f.write(model_yaml)
    
    result = run(f"kubectl apply -f {model['name']}.yaml")
    if result.returncode == 0:
        deployed["models"].append(model['name'])
        log(f"✅ Deployed model: {model['name']}")
    else:
        log(f"❌ Failed to deploy {model['name']}: {result.stderr.strip()}")

log(f"Models deployed: {len(deployed['models'])}")

In [ ]:
# Test model inference
def test_inference(name, data):
    """Send one Open Inference Protocol V2 request to a model and log the outcome."""
    url = f"http://{config['gateway_ip']}:{config['gateway_port']}/v2/models/{name}/infer"
    payload = {"inputs": [{"name": "predict", "shape": [len(data), len(data[0])], "datatype": "FP32", "data": data}]}
    headers = {"Content-Type": "application/json", "Seldon-Model": name}

    try:
        response = requests.post(url, json=payload, headers=headers, timeout=10)
    except requests.RequestException as e:
        # Report connection errors and timeouts instead of failing silently
        log(f"❌ {name}: request failed ({e})")
        return False
    if response.status_code != 200:
        log(f"❌ {name}: HTTP {response.status_code}")
        return False

    outputs = response.json().get("outputs", [])
    prediction = outputs[0].get("data", []) if outputs else []
    log(f"✅ {name}: {prediction[:3]}")
    return True

# Test deployed models
sample_data = [[5.1, 3.5, 1.4, 0.2]]
for model_name in deployed["models"]:
    test_inference(model_name, sample_data)

# 📋 Act 2: Standardization - ML Pipelines

**Seldon Core 2 provides standardized pipeline orchestration using consistent CRDs and Open Inference Protocol V2.**

## Key Features:
- **Pipeline CRDs**: Define complex ML workflows declaratively
- **Data Flow**: Kafka-based streaming between pipeline steps
- **Tensor Mapping**: Flexible data routing between models
- **Open Standards**: V2 inference protocol for all endpoints
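
Because models and pipelines share the same V2 request format, one payload builder can serve both. The sketch below builds a request body from a 2-D list and picks the `Seldon-Model` routing header, which for pipelines carries a `.pipeline` suffix. The helper names `build_v2_request` and `routing_header` are hypothetical. Flattening the tensor in row-major order is one of the representations the protocol allows, alongside the nested form.

```python
from typing import Dict, Sequence


def build_v2_request(rows: Sequence[Sequence[float]],
                     input_name: str = "predict",
                     datatype: str = "FP32") -> Dict:
    """Build an Open Inference Protocol V2 request body from a 2-D input."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty rectangular 2-D sequence")
    # Flatten in row-major order; `shape` tells the server how to rebuild it.
    flat = [float(value) for row in rows for value in row]
    return {
        "inputs": [{
            "name": input_name,
            "shape": [len(rows), len(rows[0])],
            "datatype": datatype,
            "data": flat,
        }]
    }


def routing_header(name: str, is_pipeline: bool = False) -> Dict[str, str]:
    """Return the Seldon-Model header for a model or a pipeline."""
    return {"Seldon-Model": f"{name}.pipeline" if is_pipeline else name}


request_body = build_v2_request([[5.1, 3.5, 1.4, 0.2]])
pipeline_headers = routing_header("product-pipeline-v1", is_pipeline=True)
```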

In [ ]:
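# Deploy pipelines
# Note: each classifier step reads the pipeline's own inputs directly, so the
# feature-transformer step runs alongside it and its output is not consumed downstream.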
pipelines_config = [
    {
        "name": "product-pipeline-v1",
        "steps": [
            {"name": "feature-transformer"},
            {
                "name": "product-classifier-v1",
                "inputs": ["product-pipeline-v1.inputs.predict"],
                "tensorMap": {
                    "product-pipeline-v1.inputs.predict": "predict"
                }
            }
        ]
    },
    {
        "name": "product-pipeline-v2",
        "steps": [
            {"name": "feature-transformer"},
            {
                "name": "product-classifier-v2",
                "inputs": ["product-pipeline-v2.inputs.predict"],
                "tensorMap": {
                    "product-pipeline-v2.inputs.predict": "predict"
                }
            }
        ]
    }
]

for pipeline_config in pipelines_config:
    pipeline_spec = {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Pipeline",
        "metadata": {
            "name": pipeline_config["name"],
            "namespace": config["namespace"]
        },
        "spec": {
            "steps": pipeline_config["steps"],
            "output": {"steps": [pipeline_config["steps"][-1]["name"]]}
        }
    }
    
    with open(f"{pipeline_config['name']}.yaml", "w") as f:
        f.write(json.dumps(pipeline_spec, indent=2))
    
    result = run(f"kubectl apply -f {pipeline_config['name']}.yaml")
    if result.returncode == 0:
        deployed["pipelines"].append(pipeline_config["name"])
        log(f"✅ Deployed pipeline: {pipeline_config['name']}")
    else:
        log(f"❌ Failed to deploy {pipeline_config['name']}: {result.stderr.strip()}")

log(f"Pipelines deployed: {len(deployed['pipelines'])}")

# 👁️ Act 3: Observability - Real-Time Monitoring

**Seldon Core 2 provides comprehensive observability with Prometheus metrics and distributed tracing.**

## Key Features:
- **Auto-generated Metrics**: Request rates, latencies, success rates
- **Model-Level Granularity**: Per-model and per-pipeline metrics
- **Prometheus Integration**: Ready-to-use queries for monitoring
- **Custom Metrics**: Business KPIs and model performance tracking
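
Server-side metrics can be cross-checked against what the client observes. As a minimal sketch, the function below summarizes a list of client-measured latencies into nearest-rank percentiles and a success rate. `summarize_latencies` is a hypothetical helper, and client-side numbers include network time, so they will generally be higher than the values Prometheus reports.

```python
import math
from typing import Dict, List


def summarize_latencies(latencies_s: List[float],
                        successes: int, total: int) -> Dict[str, float]:
    """Summarize client-side latencies (seconds) and the success rate."""
    if not latencies_s or total <= 0:
        raise ValueError("need at least one latency sample and one request")
    ordered = sorted(latencies_s)

    def percentile(p: int) -> float:
        # Nearest-rank method: smallest value with at least p% of samples at or below it.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "p50": percentile(50),
        "p95": percentile(95),
        "success_rate_pct": 100.0 * successes / total,
    }


# Example with synthetic samples of 1..100 seconds (illustrative values only)
summary = summarize_latencies([float(i) for i in range(1, 101)], successes=98, total=100)
```

Latency samples could be collected from `response.elapsed.total_seconds()` on each `requests` response.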

In [ ]:
# Generate metrics through inference requests
log("Generating metrics through 100 inference requests...")

request_count = 0
for i in range(25):
    for endpoint in deployed["models"][:2] + deployed["pipelines"]:
        try:
            url = f"http://{config['gateway_ip']}:{config['gateway_port']}/v2/models/{endpoint}/infer"
            payload = {"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}]}
            headers = {"Content-Type": "application/json", "Seldon-Model": f"{endpoint}.pipeline" if endpoint in deployed["pipelines"] else endpoint}
            
            response = requests.post(url, json=payload, headers=headers, timeout=5)
            if response.status_code == 200:
                request_count += 1
        except requests.RequestException:
            pass
    
    if i % 5 == 0:
        print(f"Progress: {request_count} requests...", end="\r")

print()
log(f"✅ Generated {request_count} requests for metrics")

# Display Prometheus queries
display(Markdown(f"""
## 📊 Prometheus Queries

Copy these queries into your Prometheus/Grafana:

**Request Rate:**
```promql
rate(seldon_model_infer_total{{namespace="{config['namespace']}"}}[5m])
```

**Latency P95:**
```promql
histogram_quantile(0.95, sum by (le) (rate(seldon_model_infer_api_seconds_bucket{{namespace="{config['namespace']}"}}[5m])))
```

**Success Rate:**
```promql
sum(rate(seldon_model_infer_total{{namespace="{config['namespace']}", code="200"}}[5m])) / 
sum(rate(seldon_model_infer_total{{namespace="{config['namespace']}"}}[5m])) * 100
```

**Per-Model Requests:**
```promql
sum by (model) (rate(seldon_model_infer_total{{namespace="{config['namespace']}"}}[5m]))
```
"""))

# ⚡ Act 4: Optimization - A/B Testing

**Seldon Core 2 enables safe model deployment through A/B testing and traffic management.**

## Key Features:
- **Traffic Splitting**: Route percentages of traffic to different models
- **Safe Rollouts**: Test new models with minimal risk
- **Instant Rollback**: Revert to previous version immediately
- **Resource Efficiency**: Multiple versions share infrastructure
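
With a small sample, the observed split can differ noticeably from the configured weights. The sketch below estimates how many requests a candidate should receive and the spread to expect, under the assumption that each request is routed independently according to its weight (a binomial model). `expected_candidate_hits` is a hypothetical helper, not part of Seldon.

```python
import math
from typing import Tuple


def expected_candidate_hits(n_requests: int, weight_pct: float) -> Tuple[float, float]:
    """Return (mean, standard deviation) of hits for one candidate.

    Assumes independent per-request routing with probability weight_pct / 100.
    """
    if n_requests < 0 or not 0 <= weight_pct <= 100:
        raise ValueError("n_requests must be >= 0 and weight_pct within 0..100")
    p = weight_pct / 100.0
    mean = n_requests * p
    std = math.sqrt(n_requests * p * (1 - p))
    return mean, std


# 50 requests against a candidate weighted at 10%
mean_hits, std_hits = expected_candidate_hits(50, 10)
print(f"expected hits: {mean_hits:.1f} +/- {std_hits:.2f}")
```

For 50 requests at a 10% weight this gives about 5 hits with a standard deviation near 2.1, so seeing anywhere from roughly 1 to 9 hits is unremarkable. A larger request count narrows the relative spread.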

In [ ]:
# Deploy A/B experiment
experiment_yaml = f"""apiVersion: mlops.seldon.io/v1alpha1
kind: Experiment
metadata:
  name: product-ab-test
  namespace: {config['namespace']}
spec:
  default: product-pipeline-v1
  resourceType: pipeline
  candidates:
    - name: product-pipeline-v1
      weight: 90
    - name: product-pipeline-v2
      weight: 10"""

with open("experiment.yaml", "w") as f:
    f.write(experiment_yaml)

result = run("kubectl apply -f experiment.yaml")
if result.returncode == 0:
    deployed["experiments"].append("product-ab-test")
    log("✅ Deployed A/B experiment: 90% v1, 10% v2")

# Test traffic splitting
time.sleep(10)  # Wait for experiment to be ready

log("Testing A/B traffic distribution with 50 requests...")
v1_count = v2_count = 0

for i in range(50):
    try:
        url = f"http://{config['gateway_ip']}:{config['gateway_port']}/v2/models/product-pipeline-v1/infer"
        payload = {"inputs": [{"name": "predict", "shape": [1, 4], "datatype": "FP32", "data": [[5.1, 3.5, 1.4, 0.2]]}]}
        headers = {"Content-Type": "application/json", "Seldon-Model": "product-pipeline-v1.pipeline"}
        
        response = requests.post(url, json=payload, headers=headers, timeout=5)
        if response.status_code == 200:
            route = response.headers.get("X-Seldon-Route", "")
            if "v2" in route:
                v2_count += 1
            else:
                v1_count += 1
    except requests.RequestException:
        pass

total = v1_count + v2_count
if total > 0:
    v1_pct = (v1_count / total) * 100
    v2_pct = (v2_count / total) * 100
    log(f"📊 Traffic Distribution: V1={v1_count} ({v1_pct:.0f}%), V2={v2_count} ({v2_pct:.0f}%)")
    
display(Markdown(f"""
## 🎛️ Traffic Management Commands

**Update to 50/50 split:**
```bash
kubectl patch experiment product-ab-test -n {config['namespace']} --type='merge' -p='
{{
  "spec": {{
    "candidates": [
      {{"name": "product-pipeline-v1", "weight": 50}},
      {{"name": "product-pipeline-v2", "weight": 50}}
    ]
  }}
}}'
```

**Promote V2 to 100%:**
```bash
kubectl patch experiment product-ab-test -n {config['namespace']} --type='merge' -p='
{{
  "spec": {{
    "default": "product-pipeline-v2",
    "candidates": [
      {{"name": "product-pipeline-v2", "weight": 100}}
    ]
  }}
}}'
```
"""))

# 🏆 Summary

**You've successfully demonstrated Seldon Core 2's key capabilities:**

✅ **Flexibility**: Deployed MLServer and Triton servers and served multiple models on shared MLServer replicas  
✅ **Standardization**: Created ML pipelines with V2 protocol  
✅ **Observability**: Generated metrics for Prometheus monitoring  
✅ **Optimization**: Implemented A/B testing with traffic splitting  

## 📚 Next Steps

1. **Scale Up**: Increase replicas for production load
2. **Add Monitoring**: Connect Prometheus and Grafana dashboards
3. **Enable Auto-scaling**: Configure HPA for dynamic scaling
4. **Advanced Features**: Explore drift detection, explainability, and more

## 🧹 Clean Up Resources

Run the cell below to remove all deployed resources when you're done.

In [ ]:
# Clean up resources
def cleanup_resources():
    log("Cleaning up deployed resources...")
    
    # Delete experiments first
    for exp in deployed["experiments"]:
        run(f"kubectl delete experiment {exp} -n {config['namespace']} --ignore-not-found=true")
    
    # Delete pipelines
    for pipeline in deployed["pipelines"]:
        run(f"kubectl delete pipeline {pipeline} -n {config['namespace']} --ignore-not-found=true")
    
    # Delete models
    for model in deployed["models"]:
        run(f"kubectl delete model {model} -n {config['namespace']} --ignore-not-found=true")
    
    # Delete servers
    for server in deployed["servers"]:
        run(f"kubectl delete server {server} -n {config['namespace']} --ignore-not-found=true")
    
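    # Remove only the manifest files this notebook generated,
    # rather than every *.yaml file in the working directory
    import os
    generated = deployed["servers"] + deployed["models"] + deployed["pipelines"]
    for yaml_file in [f"{name}.yaml" for name in generated] + ["experiment.yaml"]:
        try:
            os.remove(yaml_file)
        except FileNotFoundError:
            pass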
    
    log("✅ Cleanup complete!")

# Uncomment to clean up
# cleanup_resources()