# Getting Started with Kubernetes on AMD GPUs - Interactive Tutorial

**Target Audience**: Infrastructure administrators and DevOps teams exploring AMD GPUs for production Kubernetes workloads

This notebook provides hands-on experience with deploying and managing AI inference workloads on Kubernetes clusters with AMD GPUs.

## Prerequisites
- Ubuntu/Debian server with AMD GPUs
- Root/sudo access for system-level operations
- At least 2GB RAM and 20GB free disk space
- Internet connectivity for package downloads
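
The RAM and disk minimums listed above can also be checked without shelling out. The following is a minimal sketch, assuming a Linux host; `check_resources` and the two constants are illustrative names, and the thresholds are simply the values from the list above.

```python
import os
import shutil

MIN_RAM_GIB = 2    # threshold taken from the prerequisites list
MIN_DISK_GIB = 20  # threshold taken from the prerequisites list

def check_resources(path="/"):
    """Return (ram_gib, free_disk_gib, ok) for the local machine."""
    # Total physical memory = page size * number of physical pages (Linux).
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Free space on the filesystem that contains `path`.
    free_bytes = shutil.disk_usage(path).free
    ram_gib = ram_bytes / 1024**3
    disk_gib = free_bytes / 1024**3
    ok = ram_gib >= MIN_RAM_GIB and disk_gib >= MIN_DISK_GIB
    return ram_gib, disk_gib, ok

ram, disk, ok = check_resources()
print(f"RAM: {ram:.1f} GiB, free disk: {disk:.1f} GiB, meets minimums: {ok}")
```

Working in bytes and converting once avoids parsing the unit suffixes that human-readable tools print.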

## 🎯 Tutorial Approach
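This notebook is **self-contained**: every installation and deployment step runs directly from Jupyter cells, so there is no need to switch to a terminal. The cells call three helper scripts (`install-kubernetes.sh`, `install-amd-gpu-operator.sh`, and `deploy-vllm-inference.sh`), which must be present in the notebook's working directory.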

---

## 🔧 Section 0: System Prerequisites Check

Let's start by checking system requirements and current status.

In [None]:
import subprocess
import json
import time
import requests
import os
from IPython.display import display, HTML, Markdown
import pandas as pd

def run_command(command, check=False, show_output=True):
    """Helper function to run shell commands and return output"""
    try:
        result = subprocess.run(
            command, 
            shell=True, 
            capture_output=True, 
            text=True, 
            check=check
        )
        if show_output and result.stdout:
            print(result.stdout)
        if show_output and result.stderr and result.returncode != 0:
            print(f"Error: {result.stderr}")
        return result.returncode, result.stdout.strip(), result.stderr.strip()
    except subprocess.CalledProcessError as e:
        if show_output:
            print(f"Command failed: {e}")
        return e.returncode, e.stdout.strip() if e.stdout else "", e.stderr.strip() if e.stderr else ""

def run_kubectl(command):
    """Helper function to run kubectl commands and return output"""
    return run_command(f"kubectl {command}", show_output=False)

def check_command_exists(command):
    """Check if a command exists in the system"""
    returncode, _, _ = run_command(f"which {command}", show_output=False)
    return returncode == 0

print("✅ Helper functions loaded")
print("🎯 Ready to execute installation scripts directly in the notebook!")

In [None]:
# System prerequisites check
print("🔍 System Prerequisites Check")
print("=" * 40)

# Check OS
returncode, os_info, _ = run_command("cat /etc/os-release | grep PRETTY_NAME", show_output=False)
if returncode == 0:
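    # Extract the value first: backslashes inside f-string expressions are a
    # SyntaxError before Python 3.12
    os_name = os_info.split('=', 1)[1].strip('"')
    print(f"📟 Operating System: {os_name}")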
else:
    print("❌ Could not detect OS")

# Check memory
returncode, memory_info, _ = run_command("free -m | grep Mem", show_output=False)
if returncode == 0:
    # `free -m` prints MiB as a plain integer, so no unit suffix (Mi, Gi, Ti)
    # has to be parsed; large GPU servers often report memory in Ti with -h
    memory_gb = int(memory_info.split()[1]) / 1024
    print(f"💾 Total Memory: {memory_gb:.1f} GiB")
    
    if memory_gb < 2:
        print("⚠️ Warning: Less than 2GB RAM detected. Kubernetes may not perform well.")
    else:
        print("✅ Memory check passed")

# Check disk space
returncode, disk_info, _ = run_command("df -h / | tail -1", show_output=False)
if returncode == 0:
    disk_available = disk_info.split()[3]
    print(f"💿 Available Disk Space: {disk_available}")

# Check for AMD GPUs
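# Match only GPU-class PCI devices; a bare "grep -i amd" also matches AMD CPUs and chipsets
returncode, gpu_info, _ = run_command("lspci | grep -iE 'vga|display|3d controller|processing accelerators' | grep -iE 'amd|advanced micro devices'", show_output=False)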
if returncode == 0 and gpu_info:
    print("✅ AMD GPUs detected:")
    for line in gpu_info.split('\n'):
        if line.strip():
            print(f"   🎮 {line.strip()}")
else:
    print("⚠️ No AMD GPUs detected")

# Check sudo access
returncode, _, _ = run_command("sudo -n true", show_output=False)
if returncode == 0:
    print("✅ Root/sudo access available")
else:
    print("⚠️ Root/sudo access may be required for some operations")

print("\n✅ System prerequisites check completed")

In [None]:
# Check Kubernetes installation status
print("🔍 Kubernetes Installation Status")
print("=" * 40)

# Check if Kubernetes components are installed
kubectl_installed = check_command_exists("kubectl")
kubeadm_installed = check_command_exists("kubeadm")
kubelet_installed = check_command_exists("kubelet")

print(f"kubectl installed: {'✅' if kubectl_installed else '❌'}")
print(f"kubeadm installed: {'✅' if kubeadm_installed else '❌'}")
print(f"kubelet installed: {'✅' if kubelet_installed else '❌'}")

# If kubectl is installed, check cluster connectivity
cluster_accessible = False
if kubectl_installed:
    returncode, version_output, error = run_kubectl("version --client")
    if returncode == 0:
        print(f"\n📋 kubectl version: {version_output.split()[2] if len(version_output.split()) > 2 else 'Unknown'}")
        
        # Check cluster access
        returncode, cluster_info, error = run_kubectl("cluster-info")
        if returncode == 0:
            print("✅ Kubernetes cluster is accessible")
            cluster_accessible = True
            
            # Show basic cluster info
            print("\n📊 Cluster Information:")
            for line in cluster_info.split('\n')[:3]:  # First 3 lines
                if line.strip():
                    print(f"   {line.strip()}")
        else:
            print("❌ Kubernetes cluster not accessible")
            print(f"   Error: {error}")

# Store status for next cells
kubernetes_needs_installation = not kubectl_installed or not kubeadm_installed or not kubelet_installed
kubernetes_needs_cluster_setup = kubectl_installed and not cluster_accessible

# Determine what needs to be done
print("\n" + "=" * 50)
if kubernetes_needs_installation:
    print("🚨 KUBERNETES INSTALLATION REQUIRED")
    print("\n💡 Execute the next cell to install Kubernetes automatically!")
elif kubernetes_needs_cluster_setup:
    print("🚨 KUBERNETES CLUSTER SETUP REQUIRED")
    print("\n💡 Execute the next cell to initialize the Kubernetes cluster!")
else:
    print("✅ KUBERNETES IS READY")
    print("\n🎯 You can skip to the AMD GPU Operator installation section.")

## 🚀 Section 1: One-Click Kubernetes Installation

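**Run this cell only if the check above reported that Kubernetes installation or cluster setup is required.**

This cell runs the Kubernetes installation script (`install-kubernetes.sh`) from the notebook. Because the helper captures the script's output and prints it only after the script exits, the cell can appear idle for most of the run.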
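
A long-running installer gives no feedback when its output is shown only at exit. One way to see progress is to read the process output line by line. This is a minimal sketch using `subprocess.Popen`; `stream_command` is a hypothetical helper, not part of the notebook's existing code, and the `echo` command stands in for a real script.

```python
import subprocess

def stream_command(command):
    """Run a shell command, echoing each output line as soon as it arrives."""
    process = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,                 # line-buffered in text mode
    )
    lines = []
    for line in process.stdout:
        print(line, end="")        # show progress immediately
        lines.append(line.rstrip("\n"))
    returncode = process.wait()
    return returncode, lines

rc, lines = stream_command("echo step 1; echo step 2")
```

The return code is still available after the loop, so the success and failure branches of an installation cell could stay unchanged.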

In [None]:
# One-click Kubernetes installation
# Only run this if Kubernetes is not installed

print("🚀 Kubernetes Installation")
print("=" * 30)

if kubernetes_needs_installation or kubernetes_needs_cluster_setup:
    print("📦 Starting Kubernetes installation...")
    print("⏱️ This will take 10-15 minutes. Please be patient.")
    print("\n" + "="*60)
    
    # Execute the Kubernetes installation script
    returncode, output, error = run_command("sudo ./install-kubernetes.sh", show_output=True)
    
    print("\n" + "="*60)
    if returncode == 0:
        print("✅ Kubernetes installation completed successfully!")
        print("\n🔄 Re-checking Kubernetes status...")
        
        # Re-check status
        time.sleep(5)
        returncode, nodes, _ = run_kubectl("get nodes")
        if returncode == 0:
            print("\n📊 Cluster Nodes:")
            print(nodes)
        else:
            print("⚠️ Cluster may still be initializing. Wait a moment and check manually.")
    else:
        print(f"❌ Kubernetes installation failed with return code: {returncode}")
        print("\n💡 Check the error messages above and retry if needed.")
else:
    print("✅ Kubernetes is already installed and accessible.")
    print("\n🎯 Proceeding to next section...")

## 🎮 Section 2: One-Click AMD GPU Operator Installation

Now let's install the AMD GPU Operator to enable GPU support in our Kubernetes cluster.
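
GPU resources can take a while to appear on the nodes after the operator starts, so polling with a deadline is usually more reliable than a single fixed sleep. The sketch below shows a generic polling helper; `wait_until` and its default timings are assumptions for illustration, and the counter-based predicate only stands in for a real cluster check.

```python
import time

def wait_until(predicate, timeout=300, interval=10):
    """Call predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)

# Stand-in predicate that succeeds on the third call.
attempts = {"n": 0}

def ready_after_three():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until(ready_after_three, timeout=5, interval=0.01))
```

In the notebook, the predicate could wrap a `run_kubectl` call that returns the `amd.com/gpu` capacity string once it is non-empty.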

In [None]:
# Check if AMD GPU Operator is already installed
print("🎯 AMD GPU Operator Installation Check")
print("=" * 45)

gpu_operator_installed = False
if check_command_exists("kubectl"):
    returncode, _, _ = run_kubectl("get namespace kube-amd-gpu")
    if returncode == 0:
        print("✅ AMD GPU Operator namespace found")
        
        # Check GPU operator pods
        returncode, gpu_pods, _ = run_kubectl("get pods -n kube-amd-gpu")
        if returncode == 0:
            print("\n🔧 GPU Operator Pods:")
            print(gpu_pods)
            gpu_operator_installed = True
        else:
            print("⚠️ GPU Operator namespace exists but pods not found")
    else:
        print("❌ AMD GPU Operator not installed")
else:
    print("❌ kubectl not available - install Kubernetes first")

print("\n" + "="*50)
if gpu_operator_installed:
    print("✅ AMD GPU OPERATOR IS ALREADY INSTALLED")
    print("\n🎯 You can skip to the vLLM deployment section.")
else:
    print("🚨 AMD GPU OPERATOR INSTALLATION REQUIRED")
    print("\n💡 Execute the next cell to install AMD GPU Operator!")

In [None]:
# One-click AMD GPU Operator installation
print("🎮 AMD GPU Operator Installation")
print("=" * 40)

if not gpu_operator_installed:
    print("📦 Starting AMD GPU Operator installation...")
    print("⏱️ This will take 5-10 minutes.")
    print("\n" + "="*60)
    
    # Execute the AMD GPU Operator installation script
    returncode, output, error = run_command("./install-amd-gpu-operator.sh", show_output=True)
    
    print("\n" + "="*60)
    if returncode == 0:
        print("✅ AMD GPU Operator installation completed successfully!")
        
        # Check GPU resources
        print("\n🔄 Checking GPU resources...")
        time.sleep(10)
        # Raw string: "\." is an invalid escape sequence in a normal Python string literal
        returncode, gpu_resources, _ = run_kubectl(r'get nodes -o custom-columns=NAME:.metadata.name,"Total GPUs:.status.capacity.amd\.com/gpu","Allocatable GPUs:.status.allocatable.amd\.com/gpu"')
        if returncode == 0:
            print("\n💾 GPU Resources:")
            print(gpu_resources)
        else:
            print("⚠️ GPU resources not yet visible. They may take a few minutes to appear.")
    else:
        print(f"❌ AMD GPU Operator installation failed with return code: {returncode}")
        print("\n💡 Check the error messages above and retry if needed.")
else:
    print("✅ AMD GPU Operator is already installed.")
    print("\n🎯 Proceeding to next section...")

## 🤖 Section 3: One-Click vLLM AI Inference Deployment

Now let's deploy a production-ready AI inference service using vLLM.
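
A fixed sleep after deployment can list pods before they are ready. `kubectl rollout status` blocks until the rollout finishes or a timeout expires. The sketch below assumes the Deployment is named `vllm-inference`, as in the cells of this section; the helper names and the 600-second default are illustrative.

```python
import subprocess

def rollout_status_args(deployment, timeout_seconds=600):
    """Build the kubectl argument list that waits for a Deployment rollout."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        f"--timeout={timeout_seconds}s",
    ]

def wait_for_rollout(deployment, timeout_seconds=600):
    """Return True once the rollout completes, False on failure or timeout."""
    result = subprocess.run(
        rollout_status_args(deployment, timeout_seconds),
        capture_output=True,
        text=True,
    )
    print(result.stdout or result.stderr)
    return result.returncode == 0

# Show the command that would be executed.
print(" ".join(rollout_status_args("vllm-inference")))
```

Calling `wait_for_rollout("vllm-inference")` on a machine with cluster access would then replace the sleep-and-check pattern.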

In [None]:
# Check if vLLM is already deployed
print("🤖 vLLM Deployment Status Check")
print("=" * 35)

vllm_deployed = False
if check_command_exists("kubectl"):
    returncode, _, _ = run_kubectl("get deployment vllm-inference")
    if returncode == 0:
        print("✅ vLLM deployment found")
        
        # Check deployment status
        returncode, deployment_status, _ = run_kubectl("get deployment vllm-inference")
        if returncode == 0:
            print("\n📊 Deployment Status:")
            print(deployment_status)
            vllm_deployed = True
        
        # Check service
        returncode, service_status, _ = run_kubectl("get service vllm-service")
        if returncode == 0:
            print("\n🌐 Service Status:")
            print(service_status)
    else:
        print("❌ vLLM deployment not found")
else:
    print("❌ kubectl not available")

print("\n" + "="*50)
if vllm_deployed:
    print("✅ vLLM IS ALREADY DEPLOYED")
    print("\n🎯 You can proceed to testing the API.")
else:
    print("🚨 vLLM DEPLOYMENT REQUIRED")
    print("\n💡 Execute the next cell to deploy vLLM!")

In [None]:
# One-click vLLM deployment
print("🤖 vLLM AI Inference Deployment")
print("=" * 35)

if not vllm_deployed:
    print("📦 Starting vLLM deployment...")
    print("⏱️ This will take 5-10 minutes (includes downloading the model).")
    print("\n" + "="*60)
    
    # Execute the vLLM deployment script
    returncode, output, error = run_command("./deploy-vllm-inference.sh", show_output=True)
    
    print("\n" + "="*60)
    if returncode == 0:
        print("✅ vLLM deployment completed successfully!")
        
        # Check deployment status
        print("\n🔄 Checking deployment status...")
        time.sleep(5)
        
        returncode, pods, _ = run_kubectl("get pods -l app=vllm-inference")
        if returncode == 0:
            print("\n📦 vLLM Pods:")
            print(pods)
        
        returncode, service, _ = run_kubectl("get service vllm-service")
        if returncode == 0:
            print("\n🌐 vLLM Service:")
            print(service)
    else:
        print(f"❌ vLLM deployment failed with return code: {returncode}")
        print("\n💡 Check the error messages above and retry if needed.")
else:
    print("✅ vLLM is already deployed.")
    print("\n🎯 Proceeding to API testing...")

## 🧪 Section 4: Test AI Inference API

Let's test our deployed AI inference service!
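
The `model` field of a completion request has to match a name the server is actually serving. vLLM's OpenAI-compatible server lists served models at `/v1/models`, so the name can be discovered rather than hard-coded. This is a minimal sketch using only the standard library; the function names and the sample payload are illustrative assumptions.

```python
import json
import urllib.request

def first_model_id(models_response):
    """Return the id of the first entry in an OpenAI-style /v1/models payload."""
    data = models_response.get("data", [])
    return data[0]["id"] if data else None

def fetch_served_model(endpoint_url, timeout=10):
    """Query <endpoint_url>/v1/models and return the first served model id."""
    with urllib.request.urlopen(f"{endpoint_url}/v1/models", timeout=timeout) as resp:
        return first_model_id(json.load(resp))

# Illustrative payload in the shape the endpoint returns.
sample = {"object": "list", "data": [{"id": "example-org/example-model", "object": "model"}]}
print(first_model_id(sample))
```

With a reachable service, `fetch_served_model(endpoint)` would supply the value for the request's `model` field.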

In [None]:
# Get service endpoint for API testing
def get_vllm_endpoint():
    """Get the vLLM service endpoint"""
    if not check_command_exists("kubectl"):
        return "kubectl-not-available"
        
    try:
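        # Try to get the LoadBalancer external IP, plus the service port so the
        # URL does not silently default to port 80
        returncode, external_ip, _ = run_kubectl("get service vllm-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'")
        if returncode == 0 and external_ip and external_ip != "null":
            _, service_port, _ = run_kubectl("get service vllm-service -o jsonpath='{.spec.ports[0].port}'")
            port_suffix = f":{service_port}" if service_port else ""
            return f"http://{external_ip}{port_suffix}"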
        
        # Fallback to NodePort
        returncode, node_ip, _ = run_kubectl("get nodes -o jsonpath='{.items[0].status.addresses[?(@.type==\"InternalIP\")].address}'")
        returncode2, node_port, _ = run_kubectl("get service vllm-service -o jsonpath='{.spec.ports[0].nodePort}'")
        
        if returncode == 0 and returncode2 == 0 and node_ip and node_port:
            return f"http://{node_ip.strip()}:{node_port.strip()}"
        
        # Last resort: port-forward indication
        return "port-forward"
    except Exception:
        return "port-forward"

print("🌍 vLLM Service Endpoint Detection")
print("=" * 40)

endpoint = get_vllm_endpoint()

if endpoint == "kubectl-not-available":
    print("❌ kubectl not available - cannot detect vLLM endpoint")
elif endpoint == "port-forward":
    print("⚠️ External access not available. Setting up port-forward...")
    print("\n🔧 Creating port-forward tunnel...")
    
    # Start port-forward in background
    import threading
    import subprocess
    
    def port_forward():
        subprocess.run(["kubectl", "port-forward", "service/vllm-service", "8000:8000"])
    
    # Start port-forward in a separate thread
    pf_thread = threading.Thread(target=port_forward, daemon=True)
    pf_thread.start()
    time.sleep(5)  # Give port-forward time to establish
    
    endpoint = "http://localhost:8000"
    print(f"✅ Port-forward started (not yet verified). Using: {endpoint}")
else:
    print(f"✅ vLLM Service accessible at: {endpoint}")
    print(f"   API endpoint: {endpoint}/v1/completions")
    print(f"   Health check: {endpoint}/health")

In [None]:
# Test vLLM API health endpoint
def test_vllm_health(endpoint_url):
    """Test vLLM health endpoint"""
    try:
        response = requests.get(f"{endpoint_url}/health", timeout=10)
        if response.status_code == 200:
            return "✅ Healthy", response.text
        else:
            return f"❌ Status: {response.status_code}", response.text
    except requests.exceptions.RequestException as e:
        return "❌ Connection Failed", str(e)

print("🏥 Testing vLLM Health Endpoint")
print("=" * 35)

if endpoint != "kubectl-not-available":
    status, response = test_vllm_health(endpoint)
    print(f"Health Status: {status}")
    print(f"Response: {response}")
    
    if "Healthy" in status:
        print("\n🎉 vLLM service is healthy and ready for AI inference!")
    else:
        print("\n⚠️ Service may still be starting up. Wait a few minutes and retry.")
else:
    print("❌ Cannot test health endpoint - service not accessible")

In [None]:
# Test vLLM API with a simple completion request
def test_vllm_completion(endpoint_url, prompt, max_tokens=100):
    """Test vLLM completion endpoint"""
    try:
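        payload = {
            # Must match the model name served by the deployment; Llama 3.2 is
            # published under the "meta-llama" organization on Hugging Face
            "model": "meta-llama/Llama-3.2-1B-Instruct",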
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": 0.7
        }
        
        response = requests.post(
            f"{endpoint_url}/v1/completions",
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=30
        )
        
        if response.status_code == 200:
            return "✅ Success", response.json()
        else:
            return f"❌ Status: {response.status_code}", response.text
            
    except requests.exceptions.RequestException as e:
        return "❌ Request Failed", str(e)

print("🧠 Testing vLLM AI Completion")
print("=" * 35)

test_prompt = "The benefits of using Kubernetes for AI workloads include:"

if endpoint != "kubectl-not-available":
    print(f"📝 Prompt: {test_prompt}")
    print("\n🔄 Generating AI response...")
    
    status, response = test_vllm_completion(endpoint, test_prompt, max_tokens=150)
    print(f"\nStatus: {status}")
    
    if "Success" in status:
        completion = response['choices'][0]['text']
        print(f"\n🤖 AI Response: {completion}")
        print(f"\n📊 Usage Stats: {response.get('usage', 'N/A')}")
        print("\n🎉 Congratulations! Your AI inference service is working perfectly!")
    else:
        print(f"Error: {response}")
        print("\n💡 The service may still be initializing. Wait a few minutes and retry.")
else:
    print("❌ Cannot test AI completion - service not accessible")

## 📈 Section 5: Interactive Scaling and Monitoring

Let's explore Kubernetes' scaling capabilities with GPU workloads.
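
GPUs are allocated to pods as whole devices, and a pod cannot span nodes, so the number of replicas that can be scheduled is bounded by the free GPUs on each node. The sketch below works through that arithmetic; the node names, GPU counts, and the `gpus_per_replica` value are made-up inputs, since the deployment manifest is not shown here.

```python
def max_schedulable_replicas(allocatable_per_node, gpus_per_replica=1, gpus_in_use_per_node=None):
    """Upper bound on additional replicas, given allocatable GPUs per node."""
    in_use = gpus_in_use_per_node or {}
    total = 0
    for node, allocatable in allocatable_per_node.items():
        free = max(allocatable - in_use.get(node, 0), 0)
        # Integer division: leftover GPUs on a node cannot be combined
        # with GPUs on another node to host one more replica.
        total += free // gpus_per_replica
    return total

# node-a: 8 allocatable, 3 in use -> 5 free -> 2 replicas of 2 GPUs
# node-b: 4 allocatable, 0 in use -> 4 free -> 2 replicas of 2 GPUs
print(max_schedulable_replicas(
    {"node-a": 8, "node-b": 4},
    gpus_per_replica=2,
    gpus_in_use_per_node={"node-a": 3},
))
```

Scaling beyond this bound leaves the extra pods in `Pending` until GPUs are freed.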

In [None]:
# Demonstrate scaling the deployment
print("🚀 Interactive Scaling Demonstration")
print("=" * 40)

if check_command_exists("kubectl"):
    # Check current scale
    returncode, current_replicas, _ = run_kubectl("get deployment vllm-inference -o jsonpath='{.spec.replicas}'")
    returncode2, ready_replicas, _ = run_kubectl("get deployment vllm-inference -o jsonpath='{.status.readyReplicas}'")
    
    if returncode == 0:
        print(f"📊 Current Scale:")
        print(f"   Desired Replicas: {current_replicas}")
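        # readyReplicas is omitted from the status when no replica is ready, so an empty value means 0
        print(f"   Ready Replicas: {(ready_replicas or '0') if returncode2 == 0 else 'Unknown'}")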
        
        # Scale to 2 replicas if currently 1, or to 1 if currently 2+
        current_count = int(current_replicas) if current_replicas.isdigit() else 1
        target_count = 2 if current_count == 1 else 1
        
        print(f"\n📈 Scaling to {target_count} replica(s)...")
        returncode, scale_result, error = run_kubectl(f"scale deployment vllm-inference --replicas={target_count}")
        
        if returncode == 0:
            print(f"✅ Scale command executed successfully")
            
            # Wait and check new status
            print("\n⏳ Waiting for scaling to take effect...")
            time.sleep(15)
            
            returncode, new_status, _ = run_kubectl("get deployment vllm-inference")
            if returncode == 0:
                print(f"\n📊 Updated Deployment Status:")
                print(new_status)
            
            # Show pods
            returncode, pod_status, _ = run_kubectl("get pods -l app=vllm-inference")
            if returncode == 0:
                print("\n📦 Pod Status:")
                print(pod_status)
        else:
            print(f"❌ Scale command failed: {error}")
    else:
        print("❌ vLLM deployment not found")
else:
    print("❌ kubectl not available")

In [None]:
# Monitor GPU resource usage
print("💾 GPU Resource Monitoring")
print("=" * 30)

if check_command_exists("kubectl"):
    # Check GPU allocation across nodes
    print("🖥️ GPU Resources per Node:")
    # Raw string: "\." is an invalid escape sequence in a normal Python string literal
    returncode, gpu_allocation, _ = run_kubectl(r'get nodes -o custom-columns=NAME:.metadata.name,"TOTAL_GPU:.status.capacity.amd\.com/gpu","ALLOCATABLE_GPU:.status.allocatable.amd\.com/gpu"')
    if returncode == 0:
        print(gpu_allocation)
    else:
        print("❌ Could not get GPU allocation info")
    
    # Check which pods are using GPUs
    print("\n🎯 Current GPU Workloads:")
    # Raw string: "\." is an invalid escape sequence in a normal Python string literal
    returncode, gpu_pods, _ = run_kubectl(r'get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName,"GPU_REQUEST:.spec.containers[*].resources.requests.amd\.com/gpu"')
    if returncode == 0:
        # Filter only pods that actually request GPUs
        lines = gpu_pods.split('\n')
        header = lines[0]
        gpu_requesting_pods = [
            line for line in lines[1:]
            if len(line.split()) >= 4 and not line.split()[-1].endswith('<none>')
        ]
        
        if gpu_requesting_pods:
            print(header)
            for pod in gpu_requesting_pods:
                print(pod)
        else:
            print("   No pods currently requesting GPUs")
    else:
        print("❌ Could not check GPU usage by pods")
    
    # Show cluster events (last 5)
    print("\n📋 Recent Cluster Events:")
    returncode, events, _ = run_kubectl("get events --sort-by=.metadata.creationTimestamp | tail -5")
    if returncode == 0:
        print(events)
    else:
        print("❌ Could not get cluster events")
else:
    print("❌ kubectl not available")

## 🎯 Section 6: Interactive Troubleshooting Tools

Essential commands and tools for managing GPU workloads in production.
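
Tabular `kubectl get pods` output is convenient to read but awkward to filter. The JSON form (`kubectl get pods -o json`) can be scanned for pods that are not running or that restart repeatedly. This is a minimal sketch over a hand-written sample; `unhealthy_pods`, the restart threshold, and the pod names are illustrative assumptions.

```python
import json

def unhealthy_pods(pod_list_json, max_restarts=3):
    """Return (name, phase, restarts) for pods that look unhealthy."""
    problems = []
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        # Sum restarts across all containers in the pod.
        restarts = sum(
            cs.get("restartCount", 0)
            for cs in status.get("containerStatuses", [])
        )
        if phase not in ("Running", "Succeeded") or restarts > max_restarts:
            problems.append((name, phase, restarts))
    return problems

# Hand-written sample in the shape of `kubectl get pods -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "vllm-a"},
     "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}},
    {"metadata": {"name": "vllm-b"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "vllm-c"},
     "status": {"phase": "Running", "containerStatuses": [{"restartCount": 7}]}},
]})
print(unhealthy_pods(sample))
```

The check is phase-based, so a `Running` pod whose containers are not yet ready would still pass; `kubectl describe pod` remains the tool for that case.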

In [None]:
# Interactive troubleshooting toolkit
print("🔍 Interactive Troubleshooting Toolkit")
print("=" * 45)

if check_command_exists("kubectl"):
    print("\n1️⃣ Cluster Health Check:")
    returncode, nodes, _ = run_kubectl("get nodes")
    if returncode == 0:
        print(nodes)
    
    print("\n2️⃣ System Pods Status:")
    returncode, system_pods, _ = run_kubectl("get pods -n kube-system | head -10")
    if returncode == 0:
        print(system_pods)
    
    print("\n3️⃣ GPU Operator Status:")
    returncode, gpu_pods, _ = run_kubectl("get pods -n kube-amd-gpu")
    if returncode == 0:
        print(gpu_pods)
    else:
        print("   GPU Operator not installed or pods not found")
    
    print("\n4️⃣ vLLM Application Status:")
    returncode, vllm_pods, _ = run_kubectl("get pods -l app=vllm-inference")
    if returncode == 0:
        print(vllm_pods)
    else:
        print("   vLLM not deployed or pods not found")
    
    print("\n5️⃣ Service Status:")
    returncode, services, _ = run_kubectl("get services")
    if returncode == 0:
        print(services)
else:
    print("❌ kubectl not available - cannot run troubleshooting commands")

# Quick troubleshooting reference
print("\n" + "="*50)
print("🛠️ Quick Troubleshooting Reference:")
print("\nTo check specific component logs:")
print("• kubectl logs -l app=vllm-inference")
print("• kubectl logs -n kube-amd-gpu -l app.kubernetes.io/name=gpu-operator-charts")
print("• kubectl logs -n kube-system -l k8s-app=calico-node")
print("\nTo describe resources:")
print("• kubectl describe node <node-name>")
print("• kubectl describe pod <pod-name>")
print("• kubectl describe deployment vllm-inference")
print("\nTo restart failed pods:")
print("• kubectl delete pod <pod-name>  # Pod will be recreated")
print("• kubectl rollout restart deployment vllm-inference")

In [None]:
# Generate comprehensive cluster summary report
print("📋 Comprehensive Cluster Summary Report")
print("=" * 45)

summary_data = {}

if check_command_exists("kubectl"):
    # Cluster nodes
    returncode, nodes, _ = run_kubectl("get nodes --no-headers")
    if returncode == 0:
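        node_lines = [line for line in nodes.split('\n') if line.strip()]
        node_count = len(node_lines)
        # Compare the STATUS column exactly: a substring test would also count "NotReady"
        ready_nodes = len([
            line for line in node_lines
            if len(line.split()) > 1 and line.split()[1].split(',')[0] == 'Ready'
        ])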
        summary_data["Cluster Status"] = f"{ready_nodes}/{node_count} nodes ready"
    else:
        summary_data["Cluster Status"] = "Not accessible"
    
    # AMD GPU nodes
    returncode, gpu_nodes, _ = run_kubectl("get nodes -l feature.node.kubernetes.io/amd-gpu=true --no-headers")
    if returncode == 0:
        gpu_node_count = len([line for line in gpu_nodes.split('\n') if line.strip()])
        summary_data["AMD GPU Nodes"] = str(gpu_node_count)
    else:
        summary_data["AMD GPU Nodes"] = "0"
    
    # GPU Operator status
    returncode, _, _ = run_kubectl("get namespace kube-amd-gpu")
    summary_data["GPU Operator"] = "✅ Installed" if returncode == 0 else "❌ Not Installed"
    
    # vLLM deployment
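    returncode, vllm_status, _ = run_kubectl("get deployment vllm-inference -o jsonpath='{.status.readyReplicas}/{.spec.replicas}'")
    # readyReplicas is omitted when nothing is ready, which yields output such as "/1"
    ready_count = vllm_status.split('/')[0] if returncode == 0 else ""
    if ready_count.isdigit() and int(ready_count) > 0:
        summary_data["vLLM Deployment"] = f"✅ Running ({vllm_status} replicas)"
    elif returncode == 0:
        summary_data["vLLM Deployment"] = "⚠️ Deployed (no ready replicas)"
    else:
        summary_data["vLLM Deployment"] = "❌ Not Deployed"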
    
    # Total GPU resources
    # Raw string: "\." is an invalid escape sequence in a normal Python string literal
    returncode, gpu_capacity, _ = run_kubectl(r'get nodes -o jsonpath="{.items[*].status.capacity.amd\.com/gpu}"')
    if returncode == 0 and gpu_capacity.strip():
        try:
            gpus = [int(x) for x in gpu_capacity.split() if x.isdigit()]
            total_gpus = sum(gpus) if gpus else 0
            summary_data["Total GPU Resources"] = str(total_gpus)
        except ValueError:
            summary_data["Total GPU Resources"] = "0"
    else:
        summary_data["Total GPU Resources"] = "0"
    
    # Service accessibility
    returncode, service_info, _ = run_kubectl("get service vllm-service")
    if returncode == 0:
        if "LoadBalancer" in service_info:
            summary_data["AI Service Access"] = "✅ LoadBalancer"
        else:
            summary_data["AI Service Access"] = "✅ Available (NodePort)"
    else:
        summary_data["AI Service Access"] = "❌ Not Available"
else:
    summary_data = {
        "Kubernetes": "❌ Not Installed",
        "Recommendation": "Execute Kubernetes installation cell above"
    }

# Display summary
for key, value in summary_data.items():
    print(f"• {key}: {value}")

print("\n" + "="*50)

# Determine overall status
if check_command_exists("kubectl") and "✅ Installed" in summary_data.get("GPU Operator", "") and "✅ Running" in summary_data.get("vLLM Deployment", ""):
    print("🎉 CONGRATULATIONS! Your GPU-Accelerated AI Platform is Ready!")
    print("\n🚀 What you've accomplished:")
    print("   ✅ Kubernetes cluster with AMD GPU support")
    print("   ✅ Production-ready AI inference service")
    print("   ✅ Scalable, cloud-native architecture")
    print("   ✅ Complete monitoring and troubleshooting toolkit")
    
    print("\n🎯 Next Steps for Production:")
    print("   • Deploy your own AI models")
    print("   • Set up Prometheus/Grafana monitoring")
    print("   • Implement autoscaling policies")
    print("   • Configure resource quotas for multi-tenancy")
    print("   • Explore multi-GPU model parallelism")
else:
    print("🔧 Setup Status: Some components need attention")
    print("\n💡 Next Steps:")
    if not check_command_exists("kubectl"):
        print("   • Execute the Kubernetes installation cell")
    if "❌ Not Installed" in summary_data.get("GPU Operator", ""):
        print("   • Execute the AMD GPU Operator installation cell")
    if "❌ Not Deployed" in summary_data.get("vLLM Deployment", ""):
        print("   • Execute the vLLM deployment cell")
    print("   • Re-run this summary cell to check progress")

## 🎓 Tutorial Complete: Key Takeaways and Next Steps

### 🎉 What You've Accomplished

Congratulations! You've successfully built a complete GPU-accelerated AI infrastructure stack:

1. **✅ Complete Infrastructure Setup**: From bare Ubuntu server to production-ready GPU cluster
2. **✅ AMD GPU Integration**: Exposed AMD Instinct GPUs (such as the MI300X) to Kubernetes as schedulable `amd.com/gpu` resources
3. **✅ AI Inference Deployment**: Deployed production-ready vLLM service with load balancing
4. **✅ Scaling & Monitoring**: Demonstrated horizontal scaling and resource monitoring
5. **✅ Self-Service Experience**: Everything executed directly in Jupyter - no terminal needed!

### 🏗️ Architecture You've Built

```
┌─────────────────────────────────────────────┐
│              Jupyter Notebook               │
│         (Interactive Management)            │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│            vLLM AI Service                  │
│  ┌────────────────┐   ┌────────────────┐    │
│  │ Load Balancer  │   │ Manual Scaling │    │
│  └────────────────┘   └────────────────┘    │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│           Kubernetes Orchestration          │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐  │
│  │   Pods    │ │ Services  │ │Deployments│  │
│  └───────────┘ └───────────┘ └───────────┘  │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│           AMD GPU Operator                  │
│           (Resource Management)             │
└─────────────────────────────────────────────┘
┌─────────────────────────────────────────────┐
│         AMD Instinct MI300X GPUs            │
│              (192GB HBM3)                   │
└─────────────────────────────────────────────┘
```

### 🎯 Production Considerations

- **Security**: Implement RBAC, network policies, and pod security standards
- **High Availability**: Deploy across multiple nodes with anti-affinity rules
- **Monitoring**: Add Prometheus/Grafana for comprehensive observability
- **Backup**: Implement backup strategies for persistent data and configurations
- **Cost Optimization**: Use resource quotas, limits, and spot instances
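
As one concrete instance of the quota idea, Kubernetes can cap GPU requests per namespace with a `ResourceQuota`; for extended resources such as `amd.com/gpu`, the quota key uses the `requests.` prefix. The sketch below only builds the manifest as JSON, which `kubectl apply -f -` accepts on standard input. The namespace `team-a`, the quota name `gpu-quota`, and the limit of 4 are hypothetical values.

```python
import json

def gpu_quota_manifest(namespace, max_gpus, name="gpu-quota"):
    """Build a ResourceQuota that caps total amd.com/gpu requests in a namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Extended resources are limited through the "requests." prefix.
            "hard": {"requests.amd.com/gpu": str(max_gpus)},
        },
    }

manifest = gpu_quota_manifest("team-a", 4)
print(json.dumps(manifest, indent=2))
```

Pods in that namespace whose combined GPU requests would exceed the cap are rejected at admission time.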

### 📚 Learn More

- **[AMD GPU Operator Documentation](https://rocm.github.io/gpu-operator/)**
- **[vLLM Documentation](https://docs.vllm.ai/)**
- **[Kubernetes GPU Scheduling](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/)**
- **[ROCm Blog Series](https://rocm.blogs.amd.com/artificial-intelligence/k8s-orchestration-part1/README.html)**

### 🚀 You're Ready for Production!

Once the production considerations above are addressed, this stack can serve as the foundation for enterprise AI workloads. You've worked through the complete stack, from bare metal to a running AI inference service, in an interactive Jupyter experience!

**Happy AI inferencing! 🤖✨**