# Getting Started with Kubernetes on AMD GPUs - Interactive Tutorial

**Target Audience**: Infrastructure administrators and DevOps teams exploring AMD GPUs for production Kubernetes workloads

This notebook provides hands-on experience with deploying and managing AI inference workloads on Kubernetes clusters with AMD GPUs.

## Prerequisites
- Ubuntu/Debian server with AMD GPUs
- Root/sudo access for system-level operations
- At least 2GB RAM and 20GB free disk space
- Internet connectivity for package downloads

---

## 🔧 Section 0: Kubernetes Prerequisites and Installation

Before we can work with AMD GPUs, we need a functioning Kubernetes cluster. This section will check if Kubernetes is installed and guide you through installation if needed.

In [None]:
import subprocess
import json
import time
import requests
import os
from IPython.display import display, HTML, Markdown
import pandas as pd

def run_command(command, check=False):
    """Helper function to run shell commands and return output"""
    try:
        result = subprocess.run(
            command, 
            shell=True, 
            capture_output=True, 
            text=True, 
            check=check
        )
        return result.returncode, result.stdout.strip(), result.stderr.strip()
    except subprocess.CalledProcessError as e:
        return e.returncode, e.stdout.strip() if e.stdout else "", e.stderr.strip() if e.stderr else ""

def run_kubectl(command):
    """Helper function to run kubectl commands and return output"""
    return run_command(f"kubectl {command}")

def check_command_exists(command):
    """Check if a command exists in the system"""
    returncode, _, _ = run_command(f"which {command}")
    return returncode == 0

print("✅ Helper functions loaded")

In [None]:
# System prerequisites check
print("🔍 System Prerequisites Check")
print("=" * 40)

# Check OS
returncode, os_info, _ = run_command("cat /etc/os-release | grep PRETTY_NAME")
if returncode == 0:
    # Strip the quotes outside the f-string: a backslash inside an f-string
    # expression is a SyntaxError before Python 3.12
    os_name = os_info.split('=', 1)[1].strip('"')
    print(f"📟 Operating System: {os_name}")
else:
    print("❌ Could not detect OS")

# Check memory
returncode, memory_info, _ = run_command("free -h | grep Mem")
if returncode == 0:
    memory_total = memory_info.split()[1]
    print(f"💾 Total Memory: {memory_total}")

    # Convert the human-readable value (e.g. "512Mi", "15Gi", "1.0Ti") to GiB
    memory_gb = float(memory_total.rstrip('KMGTiB'))
    if 'K' in memory_total:
        memory_gb /= 1024 ** 2
    elif 'M' in memory_total:
        memory_gb /= 1024
    elif 'T' in memory_total:
        memory_gb *= 1024

    if memory_gb < 2:
        print("⚠️ Warning: Less than 2GB RAM detected. Kubernetes may not perform well.")
    else:
        print("✅ Memory check passed")

# Check disk space
returncode, disk_info, _ = run_command("df -h / | tail -1")
if returncode == 0:
    disk_available = disk_info.split()[3]
    print(f"💿 Available Disk Space: {disk_available}")

# Check for AMD GPUs (filter on display/accelerator device classes so that
# AMD CPUs, chipsets, and bridges are not reported as GPUs)
returncode, gpu_info, _ = run_command(
    "lspci | grep -Ei 'vga|display|3d controller|processing accelerators' | grep -i amd"
)
if returncode == 0 and gpu_info:
    print("✅ AMD GPUs detected:")
    for line in gpu_info.split('\n'):
        if line.strip():
            print(f"   🎮 {line.strip()}")
else:
    print("⚠️ No AMD GPUs detected")

# Check sudo access
returncode, _, _ = run_command("sudo -n true")
if returncode == 0:
    print("✅ Root/sudo access available")
else:
    print("⚠️ Root/sudo access may be required for some operations")

print("\n✅ System prerequisites check completed")
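The memory check above parses the human-readable output of `free -h`, which depends on unit suffixes. A minimal sketch of an alternative, assuming a Linux host where `/proc/meminfo` reports `MemTotal` in kB. The helper name `parse_mem_total_gib` is hypothetical and not part of this notebook's helpers, and the sample text stands in for the real file so the example runs anywhere:

```python
def parse_mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text in GiB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Expected format: "MemTotal:       16315208 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    return None


# Sample input (illustrative values); on a real host, pass open("/proc/meminfo").read()
sample_meminfo = "MemTotal:        4194304 kB\nMemFree:         1048576 kB\n"
total_gib = parse_mem_total_gib(sample_meminfo)
print(f"Total memory: {total_gib:.1f} GiB")
print("Memory check passed" if total_gib >= 2 else "Less than 2 GiB detected")
```

Working from the kB value avoids guessing which suffix `free -h` chose for a given machine.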

In [None]:
# Check Kubernetes installation status
print("🔍 Kubernetes Installation Check")
print("=" * 40)

# Check if kubectl is installed
kubectl_installed = check_command_exists("kubectl")
kubeadm_installed = check_command_exists("kubeadm")
kubelet_installed = check_command_exists("kubelet")

print(f"kubectl installed: {'✅' if kubectl_installed else '❌'}")
print(f"kubeadm installed: {'✅' if kubeadm_installed else '❌'}")
print(f"kubelet installed: {'✅' if kubelet_installed else '❌'}")

# If kubectl is installed, check cluster connectivity
cluster_accessible = False
if kubectl_installed:
    returncode, version_output, error = run_kubectl("version --client")
    if returncode == 0:
        print(f"\n📋 kubectl version: {version_output.split()[2] if len(version_output.split()) > 2 else 'Unknown'}")

        # Check cluster access
        returncode, cluster_info, error = run_kubectl("cluster-info")
        if returncode == 0:
            print("✅ Kubernetes cluster is accessible")
            cluster_accessible = True

            # Show basic cluster info
            print("\n📊 Cluster Information:")
            for line in cluster_info.split('\n')[:3]:  # First 3 lines
                if line.strip():
                    print(f"   {line.strip()}")
        else:
            print("❌ Kubernetes cluster not accessible")
            print(f"   Error: {error}")

# Determine what needs to be done
print("\n" + "=" * 50)
if not kubectl_installed or not kubeadm_installed or not kubelet_installed:
    print("🚨 KUBERNETES INSTALLATION REQUIRED")
    print("\nYou need to install Kubernetes. Options:")
    print("1. Run the installation script: sudo ./install-kubernetes.sh")
    print("2. Or execute the installation cells below")
elif not cluster_accessible:
    print("🚨 KUBERNETES CLUSTER SETUP REQUIRED")
    print("\nKubernetes is installed but cluster is not accessible.")
    print("You may need to initialize the cluster or fix configuration.")
else:
    print("✅ KUBERNETES IS READY")
    print("\nYou can proceed to the AMD GPU Operator installation section.")

### 🛠️ Kubernetes Installation (Run only if needed)

**⚠️ IMPORTANT**: Run these cells only if Kubernetes is not installed or not accessible. This requires root/sudo privileges.

In [None]:
# Option 1: Run the installation script (Recommended)
print("🚀 Option 1: Automated Kubernetes Installation")
print("=" * 45)
print("\nTo install Kubernetes automatically, run this command in a terminal:")
print("\n" + "="*60)
print("sudo ./install-kubernetes.sh")
print("="*60)
print("\nThis script will:")
print("• Install containerd container runtime")
print("• Install Kubernetes components (kubelet, kubeadm, kubectl)")
print("• Initialize the cluster")
print("• Install Calico CNI networking")
print("• Configure single-node cluster")
print("• Verify installation")
print("\n⏱️ Estimated time: 10-15 minutes")
print("\n💡 After running the script, restart this notebook and re-run the checks above.")

In [None]:
# Option 2: Step-by-step installation (Advanced users)
# WARNING: This cell demonstrates the installation steps but requires careful execution

print("🔧 Option 2: Manual Step-by-Step Installation")
print("=" * 45)
print("\n⚠️ WARNING: This is for demonstration purposes.")
print("For actual installation, use Option 1 (install-kubernetes.sh script)")
print("\nKey installation steps that the script performs:")

installation_steps = [
    "1. Disable swap: swapoff -a",
    "2. Load kernel modules: modprobe overlay && modprobe br_netfilter",
    "3. Configure sysctl: net.bridge.bridge-nf-call-iptables = 1",
    "4. Install containerd container runtime",
    "5. Add Kubernetes repositories",
    "6. Install kubelet, kubeadm, kubectl",
    "7. Initialize cluster: kubeadm init",
    "8. Configure kubectl access",
    "9. Install Calico CNI",
    "10. Remove control-plane taints for single-node setup"
]

for step in installation_steps:
    print(f"   {step}")

print("\n🔗 For detailed commands, see the install-kubernetes.sh script")

In [None]:
# Verify Kubernetes installation after manual setup
print("✅ Post-Installation Verification")
print("=" * 35)
print("\nRun this cell after Kubernetes installation to verify:")

# Re-check Kubernetes
if check_command_exists("kubectl"):
    print("✅ kubectl is now available")

    # Check cluster access
    returncode, output, error = run_kubectl("get nodes")
    if returncode == 0:
        print("✅ Cluster is accessible")
        print("\n📊 Node Status:")
        print(output)

        # Check system pods
        returncode, pods_output, _ = run_kubectl("get pods -n kube-system")
        if returncode == 0:
            print("\n🔧 System Pods Status:")
            print(pods_output)
    else:
        print(f"❌ Cluster access failed: {error}")
else:
    print("❌ kubectl still not available")

print("\n🎯 Ready to proceed to AMD GPU Operator installation!")

## 🚀 Section 1: Environment Setup and Verification

Now that we have Kubernetes running, let's verify our cluster and check for AMD GPU setup.

In [None]:
# Check Kubernetes cluster information (updated for better error handling)
print("🔍 Kubernetes Cluster Information")
print("=" * 50)

if not check_command_exists("kubectl"):
    print("❌ kubectl not found. Please install Kubernetes first.")
    print("   Run: sudo ./install-kubernetes.sh")
else:
    returncode, cluster_info, error = run_kubectl("cluster-info")
    if returncode == 0:
        print(cluster_info)

        print("\n📊 Node Status:")
        returncode, nodes, _ = run_kubectl("get nodes -o wide")
        if returncode == 0:
            print(nodes)
        else:
            print("❌ Could not get node information")
    else:
        print(f"❌ Cluster not accessible: {error}")
        print("\n💡 Troubleshooting tips:")
        print("   • Check if kubelet is running: systemctl status kubelet")
        print("   • Verify KUBECONFIG: echo $KUBECONFIG")
        print("   • Try: export KUBECONFIG=/etc/kubernetes/admin.conf")

In [None]:
# Check AMD GPU Operator installation
print("🎯 AMD GPU Operator Status")
print("=" * 40)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot check GPU Operator")
else:
    # Check if AMD GPU Operator namespace exists
    returncode, gpu_ns, _ = run_kubectl("get namespace kube-amd-gpu")
    if returncode == 0:
        print("✅ AMD GPU Operator namespace found")

        # Check GPU operator pods
        print("\n🔧 GPU Operator Pods:")
        returncode, gpu_pods, _ = run_kubectl("get pods -n kube-amd-gpu")
        if returncode == 0:
            print(gpu_pods)
        else:
            print("❌ Could not get GPU operator pods")

        # Check node labels for AMD GPUs
        print("\n🏷️ Node GPU Labels:")
        returncode, node_labels, _ = run_kubectl("get nodes -L feature.node.kubernetes.io/amd-gpu")
        if returncode == 0:
            print(node_labels)
        else:
            print("❌ Could not get node GPU labels")
    else:
        print("❌ AMD GPU Operator not installed")
        print("\n💡 To install AMD GPU Operator:")
        print("   Run: ./install-amd-gpu-operator.sh")
        print("   Or proceed to the GPU Operator installation section below")

In [None]:
# Check GPU resources availability
print("💾 GPU Resources on Nodes")
print("=" * 35)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot check GPU resources")
else:
    # Raw strings keep the backslash that escapes the dot in amd.com for kubectl
    # without triggering Python's invalid-escape-sequence warning
    returncode, gpu_resources, _ = run_kubectl(r'get nodes -o custom-columns=NAME:.metadata.name,"Total GPUs:.status.capacity.amd\.com/gpu","Allocatable GPUs:.status.allocatable.amd\.com/gpu"')
    if returncode == 0:
        print(gpu_resources)

        # Check for any running GPU workloads
        print("\n🏃 Current GPU Workloads:")
        returncode, gpu_workloads, _ = run_kubectl(r'get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,"GPU_REQUESTS:.spec.containers[*].resources.requests.amd\.com/gpu"')
        if returncode == 0:
            # Filter only pods that actually request GPUs
            lines = gpu_workloads.split('\n')
            header = lines[0]
            # Check the column count first so blank lines cannot raise IndexError
            gpu_pods = [line for line in lines[1:] if len(line.split()) > 2 and line.split()[-1] not in ['<none>', '']]

            if gpu_pods:
                print(header)
                for pod in gpu_pods:
                    print(pod)
            else:
                print("   No GPU workloads currently running")
        else:
            print("❌ Could not check GPU workloads")
    else:
        print("❌ Could not check GPU resources")
        print("   This is normal if AMD GPU Operator is not installed yet")
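The cell above reads GPU counts from formatted `custom-columns` text. A sketch of the same lookup on structured output, assuming the input is what `kubectl get nodes -o json` returns. The function name `gpu_capacity_by_node` and the sample node are hypothetical; only the `status.capacity` and `status.allocatable` fields and the `amd.com/gpu` resource name come from the notebook:

```python
import json


def gpu_capacity_by_node(nodes_json, resource="amd.com/gpu"):
    """Map node name -> (capacity, allocatable) for an extended resource."""
    result = {}
    for node in json.loads(nodes_json).get("items", []):
        status = node.get("status", {})
        # Extended resource quantities are serialized as strings such as "8"
        capacity = int(status.get("capacity", {}).get(resource, 0))
        allocatable = int(status.get("allocatable", {}).get(resource, 0))
        result[node["metadata"]["name"]] = (capacity, allocatable)
    return result


# Illustrative sample; in the notebook this could come from run_kubectl("get nodes -o json")
sample_nodes = json.dumps({
    "items": [
        {
            "metadata": {"name": "gpu-node-1"},
            "status": {
                "capacity": {"cpu": "64", "amd.com/gpu": "8"},
                "allocatable": {"cpu": "64", "amd.com/gpu": "8"},
            },
        },
        {"metadata": {"name": "cpu-node-1"}, "status": {"capacity": {"cpu": "16"}}},
    ]
})

per_node = gpu_capacity_by_node(sample_nodes)
print(per_node)
print("Total GPUs:", sum(cap for cap, _ in per_node.values()))
```

Parsing JSON sidesteps both the escaped dot in the column expressions and the whitespace-splitting of table rows.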

## 🤖 Section 2: Deploy and Test vLLM AI Inference

Now let's work with AI inference workloads using vLLM on our AMD GPU-enabled Kubernetes cluster.

In [None]:
# Check if vLLM deployment exists
print("🔍 vLLM Deployment Status")
print("=" * 30)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot check vLLM deployment")
else:
    returncode, vllm_deployment, _ = run_kubectl("get deployment vllm-inference")
    if returncode == 0:
        print("✅ vLLM Deployment found:")
        print(vllm_deployment)

        # Check vLLM pods
        print("\n📦 vLLM Pods:")
        returncode, vllm_pods, _ = run_kubectl("get pods -l app=vllm-inference")
        if returncode == 0:
            print(vllm_pods)
        else:
            print("❌ Could not get vLLM pods")

        # Check vLLM service
        print("\n🌐 vLLM Service:")
        returncode, vllm_service, _ = run_kubectl("get service vllm-service")
        if returncode == 0:
            print(vllm_service)
        else:
            print("❌ Could not get vLLM service")
    else:
        print("❌ vLLM deployment not found")
        print("\n💡 To deploy vLLM inference:")
        print("   1. First ensure AMD GPU Operator is installed")
        print("   2. Run: ./deploy-vllm-inference.sh")
        print("   3. Or continue with the deployment sections below")

In [None]:
# Get service endpoint for API testing
def get_vllm_endpoint():
    """Get the vLLM service endpoint"""
    if not check_command_exists("kubectl"):
        return "kubectl-not-available"
        
    try:
        # Try to get LoadBalancer external IP
        returncode, external_ip, _ = run_kubectl("get service vllm-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'")
        if returncode == 0 and external_ip and external_ip != "null" and external_ip.strip():
            return f"http://{external_ip.strip()}"
        
        # Fallback to NodePort
        returncode, node_ip, _ = run_kubectl("get nodes -o jsonpath='{.items[0].status.addresses[?(@.type==\"InternalIP\")].address}'")
        returncode2, node_port, _ = run_kubectl("get service vllm-service -o jsonpath='{.spec.ports[0].nodePort}'")
        
        if returncode == 0 and returncode2 == 0 and node_ip and node_port:
            return f"http://{node_ip.strip()}:{node_port.strip()}"
        
        # Fallback to port-forward indication
        return "port-forward"
    except Exception:
        return "port-forward"

endpoint = get_vllm_endpoint()
print("🌍 vLLM Service Endpoint Detection")
print("=" * 40)

if endpoint == "kubectl-not-available":
    print("❌ kubectl not available - cannot detect vLLM endpoint")
elif endpoint == "port-forward":
    print("⚠️ No external access detected. Use port-forward for testing:")
    print("\n💡 To access vLLM service:")
    print("   kubectl port-forward service/vllm-service 8000:8000")
    print("   Then use: http://localhost:8000")
    endpoint = "http://localhost:8000"
else:
    print(f"✅ vLLM Service accessible at: {endpoint}")
    print(f"   API endpoint: {endpoint}/v1/completions")
    print(f"   Health check: {endpoint}/health")
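The detection logic above can also be expressed as a pure function over the Service object, which makes the fallback order easy to test. This is a sketch under stated assumptions: the input is the dictionary form of `kubectl get service vllm-service -o json`, the helper name `pick_endpoint` is hypothetical, and, unlike the cell above, it appends the Service port to the LoadBalancer address and accepts a `hostname` ingress entry (some load balancers publish a hostname rather than an IP).

```python
def pick_endpoint(service, node_ip=None):
    """Choose a base URL for a Service: LoadBalancer first, then NodePort, else None."""
    ports = service.get("spec", {}).get("ports") or [{}]
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []

    if ingress:
        # An ingress entry carries either an IP or a hostname
        host = ingress[0].get("ip") or ingress[0].get("hostname")
        if host:
            return f"http://{host}:{ports[0].get('port', 80)}"

    node_port = ports[0].get("nodePort")
    if node_ip and node_port:
        return f"http://{node_ip}:{node_port}"

    # Caller should fall back to kubectl port-forward
    return None


# Illustrative Service objects (values are made up for the example)
load_balancer_svc = {
    "spec": {"ports": [{"port": 8000, "nodePort": 30080}]},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
}
node_port_svc = {
    "spec": {"ports": [{"port": 8000, "nodePort": 30080}]},
    "status": {"loadBalancer": {}},
}

print(pick_endpoint(load_balancer_svc))
print(pick_endpoint(node_port_svc, node_ip="10.0.0.5"))
print(pick_endpoint({"spec": {"ports": [{"port": 8000}]}}))
```

Returning `None` instead of a sentinel string keeps the "no external access" case distinct from any real URL.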

In [None]:
# Test vLLM API health endpoint
def test_vllm_health(endpoint_url):
    """Test vLLM health endpoint"""
    try:
        response = requests.get(f"{endpoint_url}/health", timeout=10)
        if response.status_code == 200:
            return "✅ Healthy", response.text
        else:
            return f"❌ Status: {response.status_code}", response.text
    except requests.exceptions.RequestException as e:
        return "❌ Connection Failed", str(e)

print("🏥 Testing vLLM Health Endpoint")
print("=" * 35)

if endpoint in ["kubectl-not-available", "port-forward"]:
    print("⚠️ Cannot test health endpoint without proper service access.")
    if endpoint == "port-forward":
        print("\n💡 To test the health endpoint:")
        print("   1. Run: kubectl port-forward service/vllm-service 8000:8000")
        print("   2. In another terminal: curl http://localhost:8000/health")
        print("   3. Or re-run this cell after setting up port-forward")
else:
    status, response = test_vllm_health(endpoint)
    print(f"Health Status: {status}")
    print(f"Response: {response}")

In [None]:
# Test vLLM API with a simple completion request

# Llama models are published under the meta-llama organization on Hugging Face.
# This value must match the model name the vLLM deployment actually serves.
MODEL_NAME = "meta-llama/Llama-3.2-1B-Instruct"

def test_vllm_completion(endpoint_url, prompt, max_tokens=50):
    """Test vLLM completion endpoint"""
    try:
        payload = {
            "model": MODEL_NAME,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": 0.7
        }

        response = requests.post(
            f"{endpoint_url}/v1/completions",
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=30
        )

        if response.status_code == 200:
            return "✅ Success", response.json()
        else:
            return f"❌ Status: {response.status_code}", response.text

    except requests.exceptions.RequestException as e:
        return "❌ Request Failed", str(e)

print("🧠 Testing vLLM AI Completion")
print("=" * 35)

test_prompt = "The benefits of using Kubernetes for AI workloads include:"

if endpoint in ["kubectl-not-available", "port-forward"]:
    print("⚠️ Cannot test completion endpoint without proper service access.")
    if endpoint == "port-forward":
        print(f"\n📝 Test prompt: {test_prompt}")
        print("\n💡 To test AI completion:")
        print("   1. Run: kubectl port-forward service/vllm-service 8000:8000")
        print("   2. Use curl or re-run this cell after port-forward setup")
        print("\n🔧 Example curl command:")
        print('   curl -X POST http://localhost:8000/v1/completions \\')
        print('     -H "Content-Type: application/json" \\')
        print('     -d \'{"model": "' + MODEL_NAME + '", "prompt": "' + test_prompt + '", "max_tokens": 100}\'')
else:
    print(f"📝 Prompt: {test_prompt}")
    print("\n🔄 Generating response...")

    status, response = test_vllm_completion(endpoint, test_prompt, max_tokens=100)
    print(f"\nStatus: {status}")

    if "Success" in status:
        completion = response['choices'][0]['text']
        print(f"\n🤖 AI Response: {completion}")
        print(f"\n📊 Usage: {response.get('usage', 'N/A')}")
    else:
        print(f"Error: {response}")
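A completion request fails if the `model` field does not match a model the server has loaded. vLLM's OpenAI-compatible server lists served models at `/v1/models`, so the name can be discovered rather than hard-coded. The sketch below only parses such a response; the helper name `served_model_ids` and the sample payload are illustrative assumptions, and the live call is shown as a comment because it needs a reachable endpoint:

```python
def served_model_ids(models_response):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response.get("data", []) if "id" in m]


# Against a live server (assumes `endpoint` from the cells above):
#   models_response = requests.get(f"{endpoint}/v1/models", timeout=10).json()
# Illustrative response shape for a server with one model loaded:
models_response = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.2-1B-Instruct", "object": "model"}],
}

ids = served_model_ids(models_response)
model_name = ids[0] if ids else None
print(f"Served models: {ids}")
print(f"Using model: {model_name}")
```

Using the first returned id as the `model` value keeps the test cell working when the deployment is switched to a different model.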

## 📈 Section 3: Scaling and Monitoring GPU Workloads

Explore Kubernetes' scaling capabilities with GPU workloads and monitor resource usage.

In [None]:
# Check current deployment scale
print("📊 Current Deployment Scale")
print("=" * 35)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot check deployment scale")
else:
    returncode, current_replicas, _ = run_kubectl("get deployment vllm-inference -o jsonpath='{.spec.replicas}'")
    returncode2, ready_replicas, _ = run_kubectl("get deployment vllm-inference -o jsonpath='{.status.readyReplicas}'")

    if returncode == 0:
        print(f"Desired Replicas: {current_replicas}")
        print(f"Ready Replicas: {ready_replicas if returncode2 == 0 else 'Unknown'}")

        # Show detailed deployment status
        print("\n📋 Deployment Details:")
        returncode, deployment_status, _ = run_kubectl("describe deployment vllm-inference")
        if returncode == 0:
            # Show only the relevant parts
            lines = deployment_status.split('\n')
            for line in lines[:15]:  # First 15 lines usually contain the key info
                if line.strip():
                    print(line)
        else:
            print("❌ Could not get deployment details")
    else:
        print("❌ vLLM deployment not found")
        print("   Deploy vLLM first using: ./deploy-vllm-inference.sh")

In [None]:
# Demonstrate scaling the deployment
print("🚀 Scaling vLLM Deployment")
print("=" * 30)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot scale deployment")
else:
    # Check if deployment exists first
    returncode, _, _ = run_kubectl("get deployment vllm-inference")
    if returncode != 0:
        print("❌ vLLM deployment not found - cannot scale")
        print("   Deploy vLLM first using: ./deploy-vllm-inference.sh")
    else:
        # Scale to 2 replicas (if we have enough GPUs)
        print("📈 Scaling to 2 replicas...")
        returncode, scale_result, error = run_kubectl("scale deployment vllm-inference --replicas=2")
        if returncode == 0:
            print(f"✅ Scale command executed: {scale_result}")

            # Wait a moment and check status
            print("\n⏳ Waiting for scaling to take effect...")
            time.sleep(10)

            # Check new status
            returncode, new_status, _ = run_kubectl("get deployment vllm-inference")
            if returncode == 0:
                print("\n📊 Updated Deployment Status:")
                print(new_status)

            # Show pods
            print("\n📦 Pod Status:")
            returncode, pod_status, _ = run_kubectl("get pods -l app=vllm-inference")
            if returncode == 0:
                print(pod_status)
            else:
                print("❌ Could not get pod status")
        else:
            print(f"❌ Scale command failed: {error}")
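The scaling cell waits a fixed 10 seconds, which may be too short for a GPU pod that still has to pull an image and load a model. One option is to poll until the ready replica count matches the target, with a timeout. The following is a generic sketch: `wait_until` is a hypothetical helper, and the readiness check is simulated here so the example runs without a cluster. In the notebook, the predicate could compare the `.status.readyReplicas` value that the earlier cell already reads.

```python
import time


def wait_until(predicate, timeout=120.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True on success and False on timeout. The sleep and clock
    parameters are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated readiness: ready replicas climb from 0 to 2 over successive polls
observed = iter([0, 1, 2])
target_replicas = 2


def replicas_ready():
    return next(observed, target_replicas) >= target_replicas


ok = wait_until(replicas_ready, timeout=5, interval=0.01)
print("Scaled successfully" if ok else "Timed out waiting for replicas")
```

`kubectl rollout status deployment/vllm-inference --timeout=120s` offers a similar blocking wait from the command line.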

In [None]:
# Monitor GPU resource usage
print("💾 GPU Resource Monitoring")
print("=" * 30)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot monitor GPU resources")
else:
    # Check GPU allocation across nodes
    # (raw strings keep the backslash that escapes the dot in amd.com for kubectl)
    print("🖥️ GPU Resources per Node:")
    returncode, gpu_allocation, _ = run_kubectl(r'get nodes -o custom-columns=NAME:.metadata.name,"TOTAL_GPU:.status.capacity.amd\.com/gpu","ALLOCATABLE_GPU:.status.allocatable.amd\.com/gpu"')
    if returncode == 0:
        print(gpu_allocation)
    else:
        print("❌ Could not get GPU allocation info")
        print("   This is normal if AMD GPU Operator is not installed")

    # Check which pods are using GPUs
    print("\n🎯 GPU Usage by Pods:")
    returncode, gpu_pods, _ = run_kubectl(r'get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName,"GPU_REQUEST:.spec.containers[*].resources.requests.amd\.com/gpu","GPU_LIMIT:.spec.containers[*].resources.limits.amd\.com/gpu"')
    if returncode == 0:
        # Filter only pods that actually request GPUs
        lines = gpu_pods.split('\n')
        header = lines[0]
        gpu_requesting_pods = [line for line in lines[1:] if line and not line.endswith('<none>') and len(line.split()) >= 4 and line.split()[-2] not in ['<none>', '']]

        if gpu_requesting_pods:
            print(header)
            for pod in gpu_requesting_pods:
                print(pod)
        else:
            print("   No pods currently requesting GPUs")
    else:
        print("❌ Could not check GPU usage by pods")

    # Show resource usage summary
    print("\n📈 Node Resource Summary:")
    returncode, node_info, _ = run_kubectl('describe nodes | grep -A 5 "Allocated resources" | head -10')
    if returncode == 0 and node_info.strip():
        print(node_info[:500])  # Limit output
    else:
        print("   Detailed resource info not available")

In [None]:
# Scale back to 1 replica for resource efficiency
print("📉 Scaling Back to 1 Replica")
print("=" * 35)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot scale deployment")
else:
    # Check if deployment exists
    returncode, _, _ = run_kubectl("get deployment vllm-inference")
    if returncode != 0:
        print("❌ vLLM deployment not found")
    else:
        returncode, scale_down, error = run_kubectl("scale deployment vllm-inference --replicas=1")
        if returncode == 0:
            print(f"✅ Scale down executed: {scale_down}")

            # Wait and verify
            time.sleep(5)
            returncode, final_status, _ = run_kubectl("get deployment vllm-inference")
            if returncode == 0:
                print("\n📊 Final Deployment Status:")
                print(final_status)

            print("\n✅ Scaling demonstration completed!")
        else:
            print(f"❌ Scale down failed: {error}")

## 🔧 Section 4: Advanced Operations and Troubleshooting

Learn essential commands for managing GPU workloads in production environments.

In [None]:
# Troubleshooting commands and information gathering
print("🔍 Essential Troubleshooting Commands")
print("=" * 45)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot run troubleshooting commands")
    print("\n💡 Install Kubernetes first: sudo ./install-kubernetes.sh")
else:
    # 1. Check events for any issues
    print("1️⃣ Recent Cluster Events:")
    returncode, events, _ = run_kubectl("get events --sort-by=.metadata.creationTimestamp | tail -10")
    if returncode == 0:
        print(events)
    else:
        print("❌ Could not get cluster events")

    print("\n" + "="*50)

    # 2. Check logs from vLLM pods
    print("2️⃣ vLLM Pod Logs (last 10 lines):")
    returncode, vllm_logs, _ = run_kubectl("logs -l app=vllm-inference --tail=10")
    if returncode == 0:
        if vllm_logs.strip():
            print(vllm_logs)
        else:
            print("   No vLLM pods found or no logs available")
    else:
        print("❌ Could not get vLLM logs (deployment may not exist)")

    print("\n" + "="*50)

    # 3. Check GPU operator logs
    print("3️⃣ GPU Operator Logs (last 5 lines):")
    returncode, gpu_operator_logs, _ = run_kubectl("logs -n kube-amd-gpu -l app.kubernetes.io/name=gpu-operator-charts --tail=5")
    if returncode == 0:
        if gpu_operator_logs.strip():
            print(gpu_operator_logs)
        else:
            print("   No GPU operator pods found")
    else:
        print("❌ Could not get GPU operator logs (may not be installed)")
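When a GPU pod stays Pending, the relevant signal is usually a `Warning` event such as a scheduling failure. The sketch below filters warnings out of event data, assuming the input is the JSON produced by `kubectl get events -o json`. The helper name `warning_events` and the sample events are hypothetical:

```python
import json


def warning_events(events_json, limit=10):
    """Return (object, reason, message) tuples for the most recent Warning events."""
    items = json.loads(events_json).get("items", [])
    warnings = [e for e in items if e.get("type") == "Warning"]
    # lastTimestamp can be null, so fall back to the creation time;
    # ISO 8601 timestamps sort correctly as plain strings
    warnings.sort(
        key=lambda e: e.get("lastTimestamp")
        or e.get("metadata", {}).get("creationTimestamp", "")
    )
    return [
        (
            e.get("involvedObject", {}).get("name", "?"),
            e.get("reason", ""),
            e.get("message", ""),
        )
        for e in warnings[-limit:]
    ]


# Illustrative sample; the pod names and messages are made up
sample_events = json.dumps({
    "items": [
        {
            "type": "Normal",
            "reason": "Scheduled",
            "message": "Successfully assigned default/vllm-inference-a to node-1",
            "lastTimestamp": "2024-01-01T10:00:00Z",
            "involvedObject": {"name": "vllm-inference-a"},
        },
        {
            "type": "Warning",
            "reason": "FailedScheduling",
            "message": "0/1 nodes are available: 1 Insufficient amd.com/gpu.",
            "lastTimestamp": "2024-01-01T10:05:00Z",
            "involvedObject": {"name": "vllm-inference-b"},
        },
    ]
})

for name, reason, message in warning_events(sample_events):
    print(f"{name}: {reason}: {message}")
```

An "Insufficient amd.com/gpu" warning like the sample one typically means more replicas were requested than there are allocatable GPUs.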

In [None]:
# Performance and resource monitoring
print("📊 Performance Monitoring Commands")
print("=" * 40)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot run monitoring commands")
else:
    # Check if metrics are available
    print("1️⃣ Checking GPU Metrics Availability:")
    returncode, metrics_service, _ = run_kubectl("get service -n kube-amd-gpu")
    if returncode == 0:
        # Look for metrics service
        if "metrics" in metrics_service.lower():
            print("✅ GPU metrics service found:")
            for line in metrics_service.split('\n'):
                if 'metrics' in line.lower():
                    print(f"   {line}")
        else:
            print("⚠️ No metrics service found in kube-amd-gpu namespace")
    else:
        print("❌ Could not check services (GPU operator may not be installed)")

    # Try to access metrics if available
    if returncode == 0 and "metrics" in metrics_service.lower():
        print("\n2️⃣ GPU Metrics Endpoint:")
        returncode, node_ip, _ = run_kubectl("get nodes -o jsonpath='{.items[0].status.addresses[?(@.type==\"InternalIP\")].address}'")
        if returncode == 0 and node_ip:
            print(f"📈 Metrics available at: http://{node_ip.strip()}:32500/metrics")
            print("   Use this endpoint with Prometheus for monitoring")
        else:
            print("❌ Could not determine node IP for metrics access")
    else:
        print("\n⚠️ GPU metrics exporter not found or not configured")

    print("\n3️⃣ Essential Monitoring Commands:")
    monitoring_commands = [
        "kubectl top nodes",
        "kubectl top pods",
        "kubectl get pods -o wide",
        "kubectl describe node <node-name>",
        "kubectl get events --sort-by=.metadata.creationTimestamp"
    ]

    for cmd in monitoring_commands:
        print(f"   • {cmd}")

    # Try to run kubectl top nodes if metrics-server is available
    print("\n4️⃣ Current Resource Usage (if metrics-server available):")
    returncode, top_output, _ = run_kubectl("top nodes")
    if returncode == 0:
        print(top_output)
    else:
        print("   Metrics-server not available (normal for basic installations)")

In [None]:
# Generate a summary report
print("📋 Cluster Summary Report")
print("=" * 30)

if not check_command_exists("kubectl"):
    print("❌ kubectl not available - cannot generate cluster summary")
    summary_data = {
        "Kubernetes Status": "Not Installed",
        "kubectl Available": "No",
        "Recommendation": "Run: sudo ./install-kubernetes.sh"
    }
else:
    # Collect all key information
    summary_data = {}

    # Cluster nodes
    returncode, nodes, _ = run_kubectl("get nodes --no-headers")
    if returncode == 0:
        node_count = len([line for line in nodes.split('\n') if line.strip()])
        summary_data["Cluster Status"] = f"{node_count} node(s)"
    else:
        summary_data["Cluster Status"] = "Not accessible"

    # AMD GPU nodes
    returncode, gpu_nodes, _ = run_kubectl("get nodes -l feature.node.kubernetes.io/amd-gpu=true --no-headers")
    if returncode == 0:
        gpu_node_count = len([line for line in gpu_nodes.split('\n') if line.strip()])
        summary_data["AMD GPU Nodes"] = str(gpu_node_count)
    else:
        summary_data["AMD GPU Nodes"] = "Unknown"

    # GPU Operator status
    returncode, _, _ = run_kubectl("get namespace kube-amd-gpu")
    summary_data["GPU Operator Status"] = "Installed" if returncode == 0 else "Not Installed"

    # vLLM deployment (this only confirms the Deployment object exists,
    # not that its pods are ready)
    returncode, _, _ = run_kubectl("get deployment vllm-inference")
    summary_data["vLLM Deployment"] = "Found" if returncode == 0 else "Not Found"

    # Total GPU resources (raw string keeps the backslash that escapes the dot)
    returncode, gpu_capacity, _ = run_kubectl(r'get nodes -o jsonpath="{.items[*].status.capacity.amd\.com/gpu}"')
    if returncode == 0 and gpu_capacity.strip():
        # Sum up GPU resources; non-numeric tokens are skipped
        gpus = [int(x) for x in gpu_capacity.split() if x.isdigit()]
        summary_data["Total GPU Resources"] = str(sum(gpus))
    else:
        summary_data["Total GPU Resources"] = "0"

    # LoadBalancer service
    returncode, service_info, _ = run_kubectl("get service vllm-service")
    if returncode == 0:
        summary_data["LoadBalancer Service"] = "Available" if "LoadBalancer" in service_info else "Not LoadBalancer"
    else:
        summary_data["LoadBalancer Service"] = "Not Available"

# Display summary
for key, value in summary_data.items():
    print(f"✓ {key}: {value}")

print("\n" + "="*50)
if check_command_exists("kubectl"):
    print("🎉 Tutorial Complete!")
    print("\nNext Steps:")
    print("• Explore different AI models with vLLM")
    print("• Set up monitoring with Prometheus/Grafana")
    print("• Implement autoscaling policies")
    print("• Configure resource quotas for multi-tenancy")
    print("• Explore multi-GPU model parallelism")
else:
    print("🚨 Setup Required!")
    print("\nNext Steps:")
    print("• Install Kubernetes: sudo ./install-kubernetes.sh")
    print("• Install AMD GPU Operator: ./install-amd-gpu-operator.sh")
    print("• Deploy vLLM: ./deploy-vllm-inference.sh")
    print("• Re-run this notebook")

## 🎓 Key Takeaways and Next Steps

### What You've Learned

1. **Complete Infrastructure Setup**: From bare Ubuntu server to production GPU-accelerated Kubernetes cluster

2. **AMD GPU + Kubernetes Integration**: Successfully deployed the AMD GPU Operator to expose GPU resources as schedulable Kubernetes resources

3. **AI Inference Deployment**: Deployed vLLM inference server with proper GPU allocation and external access via LoadBalancer

4. **Scaling Operations**: Demonstrated horizontal scaling of GPU workloads and monitoring resource usage

### Production Considerations

- **Resource Management**: Use resource quotas and limits to prevent GPU resource contention
- **Monitoring**: Implement comprehensive monitoring with Prometheus and Grafana
- **High Availability**: Deploy across multiple nodes with anti-affinity rules
- **Security**: Use network policies and pod security standards
- **Backup & Recovery**: Implement proper backup strategies for persistent data
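As one concrete instance of the resource management point, a namespace-level `ResourceQuota` can cap how many GPUs a team may request. Kubernetes accepts quota entries for extended resources using the `requests.` prefix. The sketch below builds such a manifest in Python; the namespace `ml-team`, the quota name `gpu-quota`, and the limit of 4 are hypothetical values chosen for illustration:

```python
import json


def gpu_quota_manifest(namespace, max_gpus, name="gpu-quota"):
    """Build a ResourceQuota manifest limiting total amd.com/gpu requests."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        # Extended resources are limited through the "requests." prefix
        "spec": {"hard": {"requests.amd.com/gpu": str(max_gpus)}},
    }


manifest = gpu_quota_manifest("ml-team", 4)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML.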

### Installation Summary

**Complete Stack Installation Commands:**
```bash
# Step 0: Install Kubernetes (if needed)
sudo ./install-kubernetes.sh

# Step 1: Install AMD GPU Operator
./install-amd-gpu-operator.sh

# Step 2: Deploy AI Inference Workload
./deploy-vllm-inference.sh
```

### Useful Resources

- [AMD GPU Operator Documentation](https://rocm.github.io/gpu-operator/)
- [vLLM Documentation](https://docs.vllm.ai/)
- [Kubernetes GPU Scheduling](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/)
- [ROCm Blog Series](https://rocm.blogs.amd.com/artificial-intelligence/k8s-orchestration-part1/README.html)

---

**Congratulations!** 🎉 You now have hands-on experience with the complete stack: from bare metal Ubuntu to production AMD GPU-accelerated Kubernetes clusters running AI inference workloads.