# Make Inference Requests to a Deployed Model

This notebook demonstrates how to make authenticated inference requests to a model deployed with OpenVINO Model Server using the KServe V2 API.

You will:
1. Retrieve the model server route and authentication token
2. Load and preprocess a sample image
3. Send an inference request using the KServe V2 protocol
4. Process and display the prediction results

## 1. Retrieve Model Server Route

Get the external route for the deployed model server using the OpenShift CLI.

In [None]:
import subprocess

model_route = subprocess.check_output([
    "oc", "get", "route", "image-classifier-server",
    "-n", "ai0017l-wb",
    "-o", "jsonpath={.spec.host}"
]).decode().strip()

model_url = f"https://{model_route}/v2/models/image-classifier/infer"
print(f"Model URL: {model_url}")
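
Besides `infer`, the KServe V2 REST protocol defines readiness and metadata endpoints under the same `/v2` prefix. The sketch below shows how those URLs relate to the inference URL built above. `build_v2_urls` and the example host are hypothetical names used only for illustration, and whether every endpoint is reachable through your route depends on how the server is exposed.

```python
def build_v2_urls(host: str, model_name: str) -> dict:
    """Return KServe V2 REST URLs for one model (hypothetical helper)."""
    base = f"https://{host}/v2"
    model_base = f"{base}/models/{model_name}"
    return {
        "server_ready": f"{base}/health/ready",  # is the server ready?
        "model_ready": f"{model_base}/ready",    # is this model ready?
        "metadata": model_base,                  # input/output names, datatypes, shapes
        "infer": f"{model_base}/infer",          # inference endpoint used in this notebook
    }

# Placeholder host; in the notebook you would pass model_route instead
urls = build_v2_urls("example-host.apps.example.com", "image-classifier")
for purpose, url in urls.items():
    print(f"{purpose}: {url}")
```

Sending a `GET` to the `model_ready` URL before the first inference request is one way to separate deployment problems from payload problems.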

## 2. Retrieve Authentication Token

Get the token of the user currently logged in to the `oc` CLI (`oc whoami -t`). It is sent as a bearer token to authenticate requests to the model server.

In [None]:
token = subprocess.check_output(["oc", "whoami", "-t"]).decode().strip()
# Avoid echoing any part of the token; notebook output is often saved and shared
print(f"Token retrieved ({len(token)} characters)")

## 3. Import Required Libraries

Import libraries for image processing, HTTP requests, and numerical operations.

In [None]:
import requests
import json
import numpy as np
from PIL import Image
from pathlib import Path

# Suppress the warning raised when certificate verification is disabled (lab environments with self-signed certificates only)
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

## 4. Load ImageNet Class Labels

Load the ImageNet class labels for mapping prediction indices to human-readable class names.

In [None]:
# Download ImageNet class labels
import urllib.request

labels_url = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
labels_path = Path("data/imagenet_labels.json")

# Create data directory if it doesn't exist
labels_path.parent.mkdir(parents=True, exist_ok=True)

# Download labels if not already present
if not labels_path.exists():
    print(f"Downloading ImageNet labels from {labels_url}...")
    urllib.request.urlretrieve(labels_url, labels_path)
    print("✓ Labels downloaded successfully")
else:
    print("✓ ImageNet labels already exist")

# Load labels
with open(labels_path, 'r') as f:
    imagenet_labels = json.load(f)

print(f"Loaded {len(imagenet_labels)} ImageNet class labels")
print(f"Example labels: {imagenet_labels[:5]}")

## 5. Load and Preprocess Sample Image

Load a sample image and preprocess it to match the model's expected input format.

**Preprocessing Steps:**
1. Resize image to 224x224 pixels
2. Convert to RGB format
3. Scale pixel values to [0, 1], then normalize with the ImageNet per-channel mean and standard deviation
4. Transpose to channel-first format (C, H, W)
5. Add batch dimension

In [None]:
# Load image
image_path = Path("data/sample_image.jpg")
if not image_path.exists():
    raise FileNotFoundError(f"Sample image not found: {image_path}")

image = Image.open(image_path)
print(f"Original image size: {image.size}")

# Resize to 224x224 and convert to RGB
image = image.resize((224, 224))
image = image.convert('RGB')

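# Convert to a float32 array and scale pixel values to [0, 1]
image_array = np.array(image).astype(np.float32) / 255.0

# Apply ImageNet normalization
# (float32 statistics keep the result float32, matching the FP32 payload datatype)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
image_array = (image_array - mean) / std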

# Transpose to channel-first format (C, H, W)
image_array = np.transpose(image_array, (2, 0, 1))

# Add batch dimension (N, C, H, W)
image_array = np.expand_dims(image_array, axis=0)

print(f"Preprocessed image shape: {image_array.shape}")
print(f"Image data type: {image_array.dtype}")
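
The same steps can be collected into one function, which makes the preprocessing easy to check without a real photo. This is a minimal sketch: `to_model_input` is a hypothetical helper, it expects an image that is already resized and converted to an RGB array, and the all-white test image is synthetic.

```python
import numpy as np

# ImageNet channel statistics, stored as float32 so the output stays float32
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_model_input(rgb_pixels: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB array into a (1, 3, H, W) float32 tensor."""
    if rgb_pixels.ndim != 3 or rgb_pixels.shape[2] != 3:
        raise ValueError(f"Expected shape (H, W, 3), got {rgb_pixels.shape}")
    scaled = rgb_pixels.astype(np.float32) / 255.0        # scale to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization
    chw = np.transpose(normalized, (2, 0, 1))             # HWC -> CHW
    return np.expand_dims(chw, axis=0)                    # add batch dimension

# Synthetic all-white image standing in for a real photo
dummy = np.full((224, 224, 3), 255, dtype=np.uint8)
tensor = to_model_input(dummy)
print(tensor.shape, tensor.dtype)
```

With a PIL image, `np.array(image)` after `resize` and `convert('RGB')` produces an array in the layout this function expects.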

## 6. Construct KServe V2 Inference Request

Build the inference request payload following the KServe V2 protocol specification.

**KServe V2 Request Format:**
- `inputs`: Array of input tensors with name, shape, datatype, and data
- Input tensor name must match the model's expected input name ("input")
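
If you are unsure of the input name, the V2 model metadata endpoint (`GET /v2/models/<model name>`) reports the name, datatype, and shape of each input and output. The sketch below reads those fields from a response body. `describe_inputs` is a hypothetical helper, and `sample_metadata` is a hand-written example in the shape of a V2 metadata response, not output captured from a real server.

```python
def describe_inputs(metadata: dict) -> list:
    """Return (name, datatype, shape) for each input in a V2 metadata response."""
    return [
        (tensor["name"], tensor["datatype"], tensor["shape"])
        for tensor in metadata.get("inputs", [])
    ]

# Illustrative values only; a real response comes from the metadata endpoint
sample_metadata = {
    "name": "image-classifier",
    "platform": "OpenVINO",
    "inputs": [{"name": "input", "datatype": "FP32", "shape": [1, 3, 224, 224]}],
    "outputs": [{"name": "output", "datatype": "FP32", "shape": [1, 1000]}],
}

for name, datatype, shape in describe_inputs(sample_metadata):
    print(f"{name}: {datatype} {shape}")
```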

In [None]:
# Construct KServe V2 inference request payload
inference_request = {
    "inputs": [
        {
            "name": "input",
            "shape": list(image_array.shape),
            "datatype": "FP32",
            "data": image_array.flatten().tolist()
        }
    ]
}

print("Inference request payload constructed")
print(f"Input shape: {inference_request['inputs'][0]['shape']}")
print(f"Input datatype: {inference_request['inputs'][0]['datatype']}")
print(f"Number of values: {len(inference_request['inputs'][0]['data'])}")
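
The payload can also be derived from the array itself, so that `shape`, `datatype`, and `data` cannot drift apart. A minimal sketch, assuming a hypothetical helper `build_v2_input`; the dtype table covers only a subset of the datatypes the V2 protocol defines.

```python
import numpy as np

# NumPy dtype name -> KServe V2 datatype string (subset of the protocol's types)
V2_DATATYPES = {
    "float16": "FP16", "float32": "FP32", "float64": "FP64",
    "int32": "INT32", "int64": "INT64", "uint8": "UINT8", "bool": "BOOL",
}

def build_v2_input(name: str, array: np.ndarray) -> dict:
    """Describe one input tensor in KServe V2 JSON form."""
    datatype = V2_DATATYPES.get(array.dtype.name)
    if datatype is None:
        raise TypeError(f"No V2 datatype mapped for dtype {array.dtype}")
    return {
        "name": name,
        "shape": list(array.shape),
        "datatype": datatype,
        "data": array.flatten().tolist(),  # row-major flattened values
    }

# Small placeholder tensor; the notebook would pass image_array instead
example = np.zeros((1, 3, 2, 2), dtype=np.float32)
payload = {"inputs": [build_v2_input("input", example)]}
print(payload["inputs"][0]["datatype"], len(payload["inputs"][0]["data"]))
```

Deriving the datatype from the array exposes a silent promotion to `float64` during preprocessing instead of mislabeling the data as `FP32`.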

## 7. Send Inference Request

Send the inference request to the model server with authentication headers.

In [None]:
# Set up request headers with authentication
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}

# Send POST request to model server
print(f"Sending inference request to {model_url}...")
response = requests.post(
    model_url,
    headers=headers,
    json=inference_request,
    verify=False,  # Skip certificate verification (self-signed certificates in lab environments only)
    timeout=60  # Fail instead of hanging if the server does not respond
)

# Check response status
if response.status_code == 200:
    print("✓ Inference request successful!")
else:
    print(f"✗ Inference request failed with status code: {response.status_code}")
    print(f"Response: {response.text}")
    raise RuntimeError(f"Inference request failed with status code {response.status_code}")
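
When a request fails, the V2 protocol returns a JSON body with an `error` field, while an authentication proxy in front of the server may return plain text or HTML instead. The sketch below handles both cases. `describe_failure` is a hypothetical helper, and the two sample bodies are made up for illustration.

```python
import json

def describe_failure(status_code: int, body_text: str) -> str:
    """Build a readable message from a failed inference response."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = None  # not JSON, e.g. an HTML or plain-text error page
    if isinstance(body, dict) and "error" in body:
        detail = body["error"]
    else:
        detail = body_text.strip() or "<empty body>"
    return f"HTTP {status_code}: {detail}"

# Made-up responses: one in V2 error form, one plain text
print(describe_failure(400, '{"error": "Invalid number of inputs"}'))
print(describe_failure(403, "Forbidden"))
```

In the notebook, this would be called as `describe_failure(response.status_code, response.text)` in the failure branch.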

## 8. Process Inference Results

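Parse the response and extract the model's output score for each ImageNet class. The scores are probabilities only if the model's final layer applies softmax; otherwise they are raw logits.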

In [None]:
# Parse JSON response
result = response.json()

# Extract prediction data from KServe V2 response
predictions = np.array(result['outputs'][0]['data'])

# Reshape to match output shape if needed
output_shape = result['outputs'][0]['shape']
predictions = predictions.reshape(output_shape)

print(f"Output shape: {predictions.shape}")
print(f"Predictions (first 10): {predictions[0][:10]}")
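
Indexing `outputs[0]` works for a single-output model. A slightly more defensive version looks the tensor up by name and checks that the number of values matches the declared shape. This is a sketch under assumptions: `extract_output` is a hypothetical helper, the output is assumed to be `FP32`, and `sample_result` is a hand-written stand-in for a server response.

```python
import numpy as np

def extract_output(result: dict, name=None) -> np.ndarray:
    """Return one output tensor from a V2 response, reshaped to its declared shape."""
    outputs = result.get("outputs", [])
    if not outputs:
        raise ValueError("Response contains no outputs")
    if name is None:
        tensor = outputs[0]  # single-output models
    else:
        matches = [o for o in outputs if o.get("name") == name]
        if not matches:
            raise KeyError(f"No output named {name!r}")
        tensor = matches[0]
    data = np.asarray(tensor["data"], dtype=np.float32)  # assumes FP32 output
    expected = int(np.prod(tensor["shape"]))
    if data.size != expected:
        raise ValueError(f"Got {data.size} values, shape implies {expected}")
    return data.reshape(tensor["shape"])

# Hand-written stand-in for response.json()
sample_result = {
    "outputs": [
        {"name": "output", "shape": [1, 4], "datatype": "FP32",
         "data": [0.1, 0.2, 0.3, 0.4]}
    ]
}
scores = extract_output(sample_result)
print(scores.shape)
```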

## 9. Display Top Predictions

Find the top predicted classes and display their confidence scores with human-readable labels.

In [None]:
# Get top 5 predictions
top_k = 5
top_indices = np.argsort(predictions[0])[-top_k:][::-1]
top_probabilities = predictions[0][top_indices]

# Get human-readable labels for top predictions
top_labels = [imagenet_labels[idx] for idx in top_indices]

print("\n" + "="*50)
print("Inference request successful!")
print(f"Predictions: [{', '.join([f'{p:.3f}' for p in predictions[0][:10]])}, ...]")
print(f"Top prediction: Class {top_indices[0]} (confidence: {top_probabilities[0]*100:.1f}%)")
print("="*50)

print(f"\nTop {top_k} predictions:")
for i, (idx, label, prob) in enumerate(zip(top_indices, top_labels, top_probabilities), 1):
    print(f"{i}. {label} (class {idx}): {prob*100:.1f}% confidence")
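
The percentages above are meaningful only if the model outputs probabilities. If the values do not sum to roughly 1, the model is probably returning raw logits, and a softmax converts them to probabilities before ranking. The sketch below uses made-up logits; `softmax` and `top_k` are illustrative helpers, not part of any library used in this notebook.

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Convert raw scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def top_k(scores: np.ndarray, k: int = 5) -> list:
    """Return (index, score) pairs for the k highest scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]

# Made-up logits for a four-class example
logits = np.array([2.0, 1.0, 0.1, -1.0], dtype=np.float32)
probabilities = softmax(logits)

for index, probability in top_k(probabilities, k=2):
    print(f"class {index}: {probability * 100:.1f}%")
```

Softmax preserves the ranking, so the top classes are the same either way; only the reported confidence values change.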

## Summary

You have successfully:
1. Retrieved the model server route using OpenShift CLI
2. Obtained an authentication token for secure access
3. Downloaded ImageNet class labels for human-readable predictions
4. Loaded and preprocessed an image using ImageNet normalization
5. Constructed a KServe V2 inference request payload
6. Sent an authenticated inference request to the deployed model
7. Processed and displayed the prediction results with class names

This demonstrates the complete workflow for making inference requests to models deployed with OpenVINO Model Server in Red Hat OpenShift AI.

Each prediction is reported with its class index (0-999) and the corresponding ImageNet label, which makes the model's output easy to interpret.