# Inference Pipeline for Retail GenAI System

This notebook demonstrates an end-to-end inference pipeline for the multi-modal retail GenAI system. We'll show how to:

1. Load pre-trained models
2. Process incoming data (images and text)
3. Run inference with GPU acceleration
4. Create a simple API for integration
5. Optimize for real-time performance

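The pipeline combines a ResNet-50 image encoder, a MiniLM text encoder, and a multi-modal fusion classifier. The shelf-analysis and question-answering steps use placeholder logic that stands in for an object detector and a product database.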

## Environment Setup

First, let's set up our environment and import necessary libraries.

In [None]:
# Import necessary libraries
import os
import sys
import time
import json
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt
from io import BytesIO
import base64

# Add parent directory to path for importing project modules
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Import project-specific modules
from src.models.multimodal_fusion import RetailProductFusionModel

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name()}")
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"Memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
    print(f"Memory reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")

## 1. Load Pretrained Models

We'll load the models we trained in the previous notebook, along with the necessary preprocessing components.

In [None]:
# Install required packages if not already installed
!pip install -q transformers pillow opencv-python torch torchvision fastapi uvicorn pydantic

In [None]:
from transformers import AutoTokenizer, AutoModel
import torchvision.models as vision_models
import torchvision.transforms as transforms

# Define paths
REPO_ROOT = Path("..")
MODELS_DIR = REPO_ROOT / "models"
DATA_DIR = REPO_ROOT / "examples" / "product_data"

# Load model metadata
metadata_path = MODELS_DIR / "multimodal_model_metadata.json"

# Check if the model and metadata exist, otherwise create default values
if metadata_path.exists():
    with open(metadata_path, 'r') as f:
        model_metadata = json.load(f)
    print("Loaded model metadata")
else:
    # Default metadata for demo purposes
    model_metadata = {
        "model_type": "MultiModalClassifier",
        "fusion_type": "attention",
        "img_feature_dim": 2048,  # For ResNet50
        "text_feature_dim": 384,   # For MiniLM-L6
        "hidden_dim": 512,
        "output_dim": 256,
        "num_classes": 5,  # Default to 5 common retail categories
        "category_mapping": {"0": "Electronics", "1": "Clothing", "2": "Groceries", "3": "Home", "4": "Beauty"},
        "test_accuracy": 0.85,
        "trained_on": "demo_dataset",
        "date_trained": "2023-01-01"
    }
    print("Using default model metadata for demonstration")
    
    # Save default metadata
    os.makedirs(MODELS_DIR, exist_ok=True)
    with open(metadata_path, 'w') as f:
        json.dump(model_metadata, f, indent=4)

# Fix category mapping keys (JSON converts all keys to strings)
category_mapping = {int(k): v for k, v in model_metadata["category_mapping"].items()}
idx_to_category = category_mapping
category_to_idx = {v: k for k, v in idx_to_category.items()}

print(f"Categories: {idx_to_category}")

# Define MultiModalClassifier (same as in previous notebook)
class MultiModalClassifier(nn.Module):
    def __init__(self, fusion_model, num_classes):
        super(MultiModalClassifier, self).__init__()
        self.fusion_model = fusion_model
        self.classifier = nn.Linear(fusion_model.output_dim, num_classes)
    
    def forward(self, img_features, text_features):
        outputs = self.fusion_model(img_features=img_features, text_features=text_features)
        embeddings = outputs["embeddings"]
        logits = self.classifier(embeddings)
        return logits

# Load base models
def load_vision_model():
    print("Loading vision model...")
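    # `pretrained=True` is deprecated in torchvision >= 0.13; request the ImageNet weights explicitly
    model = vision_models.resnet50(weights=vision_models.ResNet50_Weights.IMAGENET1K_V1)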
    # Remove the classification layer
    features = nn.Sequential(*list(model.children())[:-1])
    features.to(device)
    features.eval()
    return features

def load_language_model():
    print("Loading language model...")
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.to(device)
    model.eval()
    return tokenizer, model

# Load models
vision_model = load_vision_model()
tokenizer, language_model = load_language_model()

# Create fusion model
fusion_model = RetailProductFusionModel(
    vision_encoder=None,  # We'll use pre-extracted features
    text_encoder=None,    # We'll use pre-extracted features
    fusion_type=model_metadata["fusion_type"],
    img_feature_dim=model_metadata["img_feature_dim"],
    text_feature_dim=model_metadata["text_feature_dim"],
    hidden_dim=model_metadata["hidden_dim"],
    output_dim=model_metadata["output_dim"]
)

# Create classifier
classifier = MultiModalClassifier(fusion_model, model_metadata["num_classes"])

# Load trained model if exists
model_path = MODELS_DIR / "best_multimodal_classifier.pth"
if model_path.exists():
    classifier.load_state_dict(torch.load(model_path, map_location=device))
    print(f"Loaded trained model from {model_path}")
else:
    print("No trained model found. Using initialized model for demo.")

# Move model to device and set to evaluation mode
classifier = classifier.to(device)
classifier.eval()

# Define image transformations
image_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
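
The key conversion in the cell above matters because JSON object keys are always strings, so a mapping saved with integer keys comes back with `"0"`, `"1"`, and so on. The standalone sketch below shows the round trip using only the standard library; the two-entry mapping is a shortened, hypothetical version of the notebook's defaults.

```python
import json

# Hypothetical two-entry mapping, shortened from the notebook's defaults
idx_to_category = {0: "Electronics", 1: "Clothing"}

# JSON object keys are always strings, so integer keys come back as "0", "1"
restored = json.loads(json.dumps({"category_mapping": idx_to_category}))
raw_mapping = restored["category_mapping"]

# An integer class index is not a key of the raw mapping
assert 0 not in raw_mapping and "0" in raw_mapping

# Converting the keys back to int makes model output indices usable again
fixed_mapping = {int(k): v for k, v in raw_mapping.items()}
print(fixed_mapping[0])
```

Without the conversion, `category_mapping[predicted_idx]` in the pipeline would raise a `KeyError`, because `predicted_idx` is an integer.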

## 2. Build Inference Pipeline

Now we'll create a comprehensive inference pipeline that handles all aspects from data preprocessing to prediction.

In [None]:
class RetailGenAIPipeline:
    """End-to-end inference pipeline for retail GenAI system."""
    
    def __init__(self, vision_model, language_model, tokenizer, classifier, 
                 transforms, category_mapping, device="cuda"):
        self.vision_model = vision_model
        self.language_model = language_model
        self.tokenizer = tokenizer
        self.classifier = classifier
        self.transforms = transforms
        self.category_mapping = category_mapping
        self.device = device
        
        # Set all models to evaluation mode
        self.vision_model.eval()
        self.language_model.eval()
        self.classifier.eval()
    
    def preprocess_image(self, image):
        """Preprocess image for model input."""
        if isinstance(image, str):
            # Load from path
            image = Image.open(image).convert('RGB')
        elif isinstance(image, bytes):
            # Load from bytes
            image = Image.open(BytesIO(image)).convert('RGB')
        elif not isinstance(image, Image.Image):
            raise ValueError("Image must be a PIL Image, a path string, or bytes")
            
        # Apply transformations
        return self.transforms(image).unsqueeze(0)  # Add batch dimension
    
    def preprocess_text(self, text, max_length=128):
        """Preprocess text for model input."""
        encoding = self.tokenizer(
            text,
            padding="max_length",
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        )
        return encoding
    
    def extract_features(self, image, text):
        """Extract features from vision and language models."""
        with torch.no_grad():
            # Process image
            img_tensor = self.preprocess_image(image)
            img_tensor = img_tensor.to(self.device)
            img_features = self.vision_model(img_tensor).squeeze(-1).squeeze(-1)
            
            # Process text
            text_encoding = self.preprocess_text(text)
            input_ids = text_encoding["input_ids"].to(self.device)
            attention_mask = text_encoding["attention_mask"].to(self.device)
            text_outputs = self.language_model(input_ids=input_ids, attention_mask=attention_mask)
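            # Use the first ([CLS]) token embedding as the sentence feature; this must match
            # the text feature extraction used when the classifier was trained
            text_features = text_outputs.last_hidden_state[:, 0, :]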
        
        return img_features, text_features
    
    def predict(self, image, text):
        """Run end-to-end prediction pipeline."""
        start_time = time.time()
        
        # Extract features
        img_features, text_features = self.extract_features(image, text)
        
        # Run classifier
        with torch.no_grad():
            logits = self.classifier(img_features, text_features)
            probs = torch.nn.functional.softmax(logits, dim=1)[0]
            predicted_idx = torch.argmax(probs).item()
            predicted_category = self.category_mapping[predicted_idx]
            confidence = probs[predicted_idx].item()
        
        # Collect all probabilities
        category_probs = {}
        for idx, prob in enumerate(probs):
            category = self.category_mapping[idx]
            category_probs[category] = prob.item()
        
        inference_time = time.time() - start_time
        
        return {
            "predicted_category": predicted_category,
            "confidence": confidence,
            "all_probabilities": category_probs,
            "inference_time": inference_time
        }
    
    def process_shelf_image(self, image, text=""):
        """Process a shelf image to identify products (using placeholder logic)."""
        # In a real implementation, this would use object detection to identify products
        # For demo purposes, we'll simulate some detections
        import random
        
        # If image is a path or bytes, convert to PIL Image
        if isinstance(image, str):
            img = Image.open(image).convert('RGB')
        elif isinstance(image, bytes):
            img = Image.open(BytesIO(image)).convert('RGB')
        elif isinstance(image, Image.Image):
            img = image
        else:
            raise ValueError("Image must be a PIL Image, a path string, or bytes")
        
        width, height = img.size
        
        # Simulated product detection
        num_products = random.randint(3, 8)
        product_detections = []
        
        for i in range(num_products):
            # Generate random box (ensuring they don't go outside the image)
            box_width = random.randint(width // 6, width // 3)
            box_height = random.randint(height // 6, height // 3)
            x = random.randint(0, width - box_width)
            y = random.randint(0, height - box_height)
            
            # Random category and confidence (sampling from the values avoids
            # assuming the mapping keys are contiguous integers starting at 0)
            category = random.choice(list(self.category_mapping.values()))
            confidence = random.uniform(0.7, 0.99)
            
            product_detections.append({
                "box": [x, y, x + box_width, y + box_height],
                "category": category,
                "confidence": confidence,
                "product_id": f"P{random.randint(1000, 9999)}"
            })
        
        return {
            "num_products": len(product_detections),
            "detections": product_detections,
            "shelf_image_size": [width, height]
        }
    
    def answer_product_question(self, image, question):
        """Answer a natural language question about a product."""
        # First, predict the product category
        prediction = self.predict(image, question)
        category = prediction["predicted_category"]
        confidence = prediction["confidence"]
        
        # Mock product catalog (in real implementation, this would query a database)
        product_details = {
            "Electronics": {
                "price_range": "$50-$1200",
                "top_brands": ["TechCorp", "Electra", "DigiLife"],
                "features": ["wireless connectivity", "long battery life", "high resolution display"],
                "warranty": "1-3 years"
            },
            "Clothing": {
                "price_range": "$15-$250",
                "top_brands": ["StyleX", "UrbanFit", "ClassicWear"],
                "features": ["sustainable materials", "comfortable fit", "machine washable"],
                "warranty": "30-day returns"
            },
            "Groceries": {
                "price_range": "$2-$35",
                "top_brands": ["FreshFarms", "OrganicLife", "NatureHarvest"],
                "features": ["organic options", "locally sourced", "no preservatives"],
                "warranty": "satisfaction guarantee"
            },
            "Home": {
                "price_range": "$10-$500",
                "top_brands": ["HomeLux", "ComfortLiving", "ModernSpace"],
                "features": ["durable construction", "stylish design", "easy assembly"],
                "warranty": "1-5 years"
            },
            "Beauty": {
                "price_range": "$8-$150",
                "top_brands": ["GlowUp", "NaturalBeauty", "LuxeSkin"],
                "features": ["cruelty-free", "fragrance-free options", "dermatologist tested"],
                "warranty": "30-day returns"
            }
        }
        
        # Simple rule-based QA logic
        response = ""
        
        # Get product info for the predicted category
        if category in product_details:
            info = product_details[category]
            
            # Very basic keyword matching for demo purposes
            question_lower = question.lower()
            
            if "price" in question_lower or "cost" in question_lower or "how much" in question_lower:
                response = f"This {category} product typically costs in the range of {info['price_range']}."
            
            elif "brand" in question_lower or "who makes" in question_lower or "manufacturer" in question_lower:
                top_brands = ", ".join(info["top_brands"])
                response = f"The top brands in this {category} category include {top_brands}."
            
            elif "feature" in question_lower or "specification" in question_lower or "what can" in question_lower:
                features = ", ".join(info["features"])
                response = f"This {category} product typically offers these features: {features}."
            
            elif "warranty" in question_lower or "guarantee" in question_lower or "return" in question_lower:
                response = f"This {category} product typically comes with a {info['warranty']}."
            
            elif "recommend" in question_lower or "alternative" in question_lower or "similar" in question_lower:
                response = f"Based on this {category} product, I would recommend checking out items from {', '.join(info['top_brands'][:2])}."
            
            else:
                # Generic response for other questions
                response = f"This appears to be a {category} product. It typically costs {info['price_range']} and features {', '.join(info['features'][:2])}."
        else:
            response = "I couldn't identify the product category clearly. Could you provide more information?"
        
        return {
            "question": question,
            "answer": response,
            "predicted_category": category,
            "confidence": confidence
        }

# Create the pipeline
retail_pipeline = RetailGenAIPipeline(
    vision_model=vision_model,
    language_model=language_model,
    tokenizer=tokenizer,
    classifier=classifier,
    transforms=image_transforms,
    category_mapping=idx_to_category,
    device=device
)

print("Inference pipeline initialized and ready for use!")
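
`predict` turns the classifier's logits into probabilities with a softmax and reports the highest-scoring class. As a rough illustration of that step without PyTorch, the sketch below reimplements it in plain Python. The logit values and the helper name `softmax` are made up for the example; the category names are the notebook's defaults.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

categories = {0: "Electronics", 1: "Clothing", 2: "Groceries", 3: "Home", 4: "Beauty"}
logits = [2.0, 0.5, -1.0, 0.0, 0.5]  # made-up values for illustration

probs = softmax(logits)
predicted_idx = max(range(len(probs)), key=probs.__getitem__)

# Same result shape as RetailGenAIPipeline.predict, minus the timing field
result = {
    "predicted_category": categories[predicted_idx],
    "confidence": probs[predicted_idx],
    "all_probabilities": {categories[i]: p for i, p in enumerate(probs)},
}
print(result["predicted_category"], f"{result['confidence']:.2%}")
```

The reported confidence is simply the largest softmax probability, so it reflects how peaked the logits are rather than a calibrated likelihood of being correct.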

## 3. Load Test Data

Let's load some test data to demonstrate the pipeline.

In [None]:
import cv2

# Function to generate placeholder images (similar to the previous notebook)
def generate_placeholder_image(product_id, category, size=(500, 500)):
    """Generate a placeholder image based on product info."""
    # Create a colored background based on category
    category_colors = {
        "Electronics": (200, 200, 255),  # Light blue
        "Clothing": (255, 200, 200),     # Light red
        "Groceries": (200, 255, 200),    # Light green
        "Home": (255, 255, 200),         # Light yellow
        "Beauty": (255, 200, 255),       # Light purple
        "default": (240, 240, 240)       # Light gray
    }
    
    color = category_colors.get(category, category_colors["default"])
    img = np.ones((size[1], size[0], 3), dtype=np.uint8) * np.array(color, dtype=np.uint8)
    
    # Add a product ID text
    cv2.putText(img, f"{category}", (size[0]//4, size[1]//3), 
                cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 0), 2)
    
    cv2.putText(img, f"Product {product_id}", (size[0]//4, 2*size[1]//3), 
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
    
    # Convert to PIL Image. The array was filled with RGB tuples, so no BGR->RGB
    # swap is needed (swapping would turn the "light blue" background light red)
    return Image.fromarray(img)

# Generate test images for each category
test_images = {}
for cat_id, category in idx_to_category.items():
    test_images[category] = generate_placeholder_image(1000 + cat_id, category)

# Display the test images
fig, axes = plt.subplots(1, len(test_images), figsize=(5*len(test_images), 5))

for i, (category, img) in enumerate(test_images.items()):
    axes[i].imshow(img)
    axes[i].set_title(category)
    axes[i].axis('off')

plt.tight_layout()
plt.show()

## 4. Single Product Classification

Let's start by testing our pipeline with single product classification.

In [None]:
# Test product classification with different categories
for category, image in test_images.items():
    print(f"\nTesting with {category} image:")
    # Sample product description
    product_text = f"A high-quality {category.lower()} product with excellent features."
    
    # Run prediction
    result = retail_pipeline.predict(image, product_text)
    
    # Print results
    print(f"  Prediction: {result['predicted_category']} (Confidence: {result['confidence']:.2%})")
    print(f"  Inference time: {result['inference_time']*1000:.2f} ms")
    
    # Print all class probabilities
    print(f"  All probabilities:")
    for cat, prob in sorted(result['all_probabilities'].items(), key=lambda x: x[1], reverse=True):
        print(f"    {cat}: {prob:.2%}")

## 5. Question Answering about Products

Now let's test the pipeline's ability to answer questions about products.

In [None]:
# Sample questions
questions = [
    "How much does this product cost?",
    "What features does this product have?",
    "Is there a warranty for this product?",
    "What brands make similar products?",
    "Can you recommend alternatives to this product?"
]

# Test with Electronics product
category = "Electronics"
image = test_images[category]

# Display the image
plt.figure(figsize=(8, 8))
plt.imshow(image)
plt.title(f"{category} Product")
plt.axis('off')
plt.show()

# Answer questions about the product
print(f"\nProduct Q&A for {category} product:\n")

for question in questions:
    result = retail_pipeline.answer_product_question(image, question)
    
    print(f"Q: {result['question']}")
    print(f"A: {result['answer']}")
    print()

## 6. Shelf Analysis Demo

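Let's create a simulated shelf image and run it through `process_shelf_image`. The detections it returns are randomly generated placeholders that stand in for a real object detector.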

In [None]:
def create_shelf_image(size=(800, 600)):
    """Create a simulated shelf image with multiple products."""
    # Create a background for the shelf
    shelf_img = np.ones((size[1], size[0], 3), dtype=np.uint8) * np.array([240, 230, 220], dtype=np.uint8)
    
    # Draw shelf lines
    for y in range(150, size[1], 150):
        cv2.line(shelf_img, (0, y), (size[0], y), (180, 170, 160), 5)
    
    # Convert to PIL Image. The colors above are treated as RGB, so the array
    # is passed through without a BGR->RGB channel swap
    shelf_pil = Image.fromarray(shelf_img)
    
    return shelf_pil

def visualize_shelf_detection(shelf_image, results):
    """Visualize product detections on a shelf image."""
    # Convert to PIL Image if needed
    if not isinstance(shelf_image, Image.Image):
        if isinstance(shelf_image, str):
            shelf_image = Image.open(shelf_image).convert('RGB')
        elif isinstance(shelf_image, bytes):
            shelf_image = Image.open(BytesIO(shelf_image)).convert('RGB')
        elif isinstance(shelf_image, np.ndarray):
            shelf_image = Image.fromarray(shelf_image)
    
    # Create a copy of the image to draw on
    img_draw = shelf_image.copy()
    draw = ImageDraw.Draw(img_draw)
    
    # Try to load a font (use default if not available)
    try:
        font = ImageFont.truetype("arial.ttf", 14)
    except IOError:
        font = ImageFont.load_default()
    
    # Color mapping for categories
    category_colors = {
        "Electronics": "blue",
        "Clothing": "red",
        "Groceries": "green",
        "Home": "orange",
        "Beauty": "purple"
    }
    
    # Draw bounding boxes and labels
    for det in results["detections"]:
        box = det["box"]
        category = det["category"]
        confidence = det["confidence"]
        product_id = det["product_id"]
        
        # Get color for category
        color = category_colors.get(category, "gray")
        
        # Draw rectangle
        draw.rectangle(box, outline=color, width=3)
        
        # Draw label background
        label = f"{category} ({confidence:.1%})"
        label_size = draw.textbbox((0, 0), label, font=font)
        label_width = label_size[2] - label_size[0]
        label_height = label_size[3] - label_size[1]
        label_bg = [box[0], box[1] - label_height - 4, box[0] + label_width + 4, box[1]]
        draw.rectangle(label_bg, fill=color)
        
        # Draw label text
        draw.text((box[0] + 2, box[1] - label_height - 2), label, fill="white", font=font)
    
    return img_draw

# Create and process a shelf image
shelf_image = create_shelf_image(size=(800, 600))
shelf_results = retail_pipeline.process_shelf_image(shelf_image)

print(f"Detected {shelf_results['num_products']} products on the shelf")

# Visualize results
result_image = visualize_shelf_detection(shelf_image, shelf_results)

plt.figure(figsize=(12, 9))
plt.imshow(result_image)
plt.title("Shelf Analysis Results")
plt.axis('off')
plt.show()

## 7. Performance Benchmarking

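Let's benchmark the pipeline's per-request latency and throughput on the current device. To quantify the benefit of GPU acceleration, run the same benchmark with a CPU-only pipeline and compare the two results.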

In [None]:
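def benchmark_inference_pipeline(pipeline, images, texts, device, num_runs=10):
    """Benchmark the inference pipeline on a given device."""
    use_cuda = torch.device(device).type == "cuda"

    # Warmup pass (not timed): absorbs lazy CUDA initialization and kernel selection
    for image, text in zip(images, texts):
        pipeline.predict(image, text)

    # Timed runs: num_runs passes over all inputs
    results = []
    for _ in range(num_runs):
        for image, text in zip(images, texts):
            if use_cuda:
                torch.cuda.synchronize()  # Finish pending GPU work before starting the clock
            start = time.perf_counter()
            pipeline.predict(image, text)
            if use_cuda:
                torch.cuda.synchronize()  # Wait for the GPU before stopping the clock
            results.append(time.perf_counter() - start)

    # Compute average time
    avg_time = sum(results) / len(results)
    return {
        "device": device,
        "avg_time": avg_time,
        "img_per_sec": 1 / avg_time,
        "all_times": results
    }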

# Prepare benchmark data
benchmark_images = list(test_images.values())
benchmark_texts = [f"A high-quality {category.lower()} product" for category in test_images.keys()]

# Run benchmark on current device
benchmark_results = benchmark_inference_pipeline(
    pipeline=retail_pipeline,
    images=benchmark_images,
    texts=benchmark_texts,
    device=device
)

# Print benchmark results
print(f"Benchmark results for {benchmark_results['device']} device:")
print(f"Average inference time: {benchmark_results['avg_time']*1000:.2f} ms")
print(f"Throughput: {benchmark_results['img_per_sec']:.2f} images/second")

# If you have a multi-GPU system, you could create additional benchmarks on specific GPUs
# Or compare with CPU performance by creating a CPU-only pipeline
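
An average alone can hide slow outliers, which matter for real-time use. As a minimal sketch (the helper name `benchmark_callable` and its defaults are assumptions, not part of the notebook), the function below times any zero-argument callable after a warmup and reports the median and 95th-percentile latency alongside the mean. It uses only the standard library, so the workload here is a stand-in computation.

```python
import statistics
import time

def benchmark_callable(fn, num_warmup=3, num_runs=20):
    """Time fn() after a warmup and return summary statistics in milliseconds."""
    for _ in range(num_warmup):
        fn()  # warmup calls are not timed

    times_ms = []
    for _ in range(num_runs):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000)

    times_ms.sort()
    # Nearest-rank style index for the 95th percentile
    p95_index = min(len(times_ms) - 1, int(round(0.95 * (len(times_ms) - 1))))
    mean_ms = statistics.mean(times_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(times_ms),
        "p95_ms": times_ms[p95_index],
        "calls_per_sec": 1000 / mean_ms if mean_ms > 0 else float("inf"),
        "num_runs": num_runs,
    }

# Stand-in workload; in the notebook this could be
# lambda: retail_pipeline.predict(image, text)
stats = benchmark_callable(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

When timing GPU code this way, the callable should end with an operation that waits for the GPU. The notebook's `predict` already does, because it calls `.item()` on the output tensor.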

## 8. API for Integration

Now let's create a simple API that can integrate our pipeline into other applications.

In [None]:
from fastapi import FastAPI, File, UploadFile, Form
from pydantic import BaseModel
from io import BytesIO
import uvicorn
import base64

# Define API models
class ProductQuery(BaseModel):
    image_base64: str
    text: str

class ProductQuestion(BaseModel):
    image_base64: str
    question: str

# Create FastAPI app
app = FastAPI(title="Retail GenAI API", description="Multi-modal API for retail applications")

# Create endpoint for product classification
@app.post("/predict")
async def predict_product(query: ProductQuery):
    # Decode base64 image
    image_data = base64.b64decode(query.image_base64)
    image = Image.open(BytesIO(image_data)).convert('RGB')
    
    # Run prediction
    result = retail_pipeline.predict(image, query.text)
    
    return result

# Create endpoint for product Q&A
@app.post("/answer")
async def answer_question(query: ProductQuestion):
    # Decode base64 image
    image_data = base64.b64decode(query.image_base64)
    image = Image.open(BytesIO(image_data)).convert('RGB')
    
    # Answer question
    result = retail_pipeline.answer_product_question(image, query.question)
    
    return result

# Create endpoint for shelf analysis
@app.post("/analyze_shelf")
async def analyze_shelf(image: UploadFile = File(...)):
    # Read image
    image_data = await image.read()
    pil_image = Image.open(BytesIO(image_data)).convert('RGB')
    
    # Process shelf image
    result = retail_pipeline.process_shelf_image(pil_image)
    
    # Create visualization
    viz_image = visualize_shelf_detection(pil_image, result)
    
    # Convert to base64 for response
    buffered = BytesIO()
    viz_image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    
    # Add visualization to result
    result["visualization_base64"] = img_str
    
    return result

# Print instructions for running the API server
print("To run the API server, execute the following command in a terminal:")
print("uvicorn api:app --reload")
print("\nThe API will be available at: http://127.0.0.1:8000")
print("Interactive documentation will be available at: http://127.0.0.1:8000/docs")

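To run the API server outside the notebook, save the API code above to a file named `api.py`, together with the code it depends on (model loading, the `RetailGenAIPipeline` setup that creates `retail_pipeline`, and `visualize_shelf_detection`), and run it with: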

```
uvicorn api:app --reload
```

Here's a sample Python client that demonstrates how to use this API:

In [None]:
import requests
import json
import base64
from io import BytesIO
from PIL import Image

def image_to_base64(image):
    """Convert PIL Image to base64 string."""
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    return img_str

def base64_to_image(base64_str):
    """Convert base64 string to PIL Image."""
    img_data = base64.b64decode(base64_str)
    return Image.open(BytesIO(img_data))

# API endpoint URL (this would be the actual URL when the server is running)
API_URL = "http://127.0.0.1:8000"

# Example functions to call the API
def predict_product_example(image, text):
    # Convert image to base64
    img_base64 = image_to_base64(image)
    
    # Prepare request data
    data = {
        "image_base64": img_base64,
        "text": text
    }
    
    # Send request
    response = requests.post(f"{API_URL}/predict", json=data)
    
    # Parse response
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"Request failed with status code {response.status_code}"}

def answer_question_example(image, question):
    # Convert image to base64
    img_base64 = image_to_base64(image)
    
    # Prepare request data
    data = {
        "image_base64": img_base64,
        "question": question
    }
    
    # Send request
    response = requests.post(f"{API_URL}/answer", json=data)
    
    # Parse response
    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"Request failed with status code {response.status_code}"}

def analyze_shelf_example(image):
    # Convert image to bytes
    buffered = BytesIO()
    image.save(buffered, format="JPEG")
    img_bytes = buffered.getvalue()
    
    # Prepare request files
    files = {"image": ("shelf.jpg", img_bytes, "image/jpeg")}
    
    # Send request
    response = requests.post(f"{API_URL}/analyze_shelf", files=files)
    
    # Parse response
    if response.status_code == 200:
        result = response.json()
        
        # Convert visualization back to image if present
        if "visualization_base64" in result:
            result["visualization"] = base64_to_image(result["visualization_base64"])
        
        return result
    else:
        return {"error": f"Request failed with status code {response.status_code}"}

print("API client examples prepared.")
print("When the API server is running, you can use these functions to interact with it.")
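
The `/predict` and `/answer` endpoints pass client-supplied strings straight to `base64.b64decode`. As a hedged sketch of more defensive decoding, the helper below validates the payload strictly and rejects empty or oversized inputs before they reach the image loader. The name `decode_image_payload` and the 10 MB limit are assumptions for illustration, not part of the notebook's API.

```python
import base64
import binascii

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed limit, tune for your deployment

def decode_image_payload(image_base64, max_bytes=MAX_IMAGE_BYTES):
    """Decode a base64 string into raw bytes, raising ValueError on bad input."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        data = base64.b64decode(image_base64, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("image_base64 is not valid base64") from exc
    if not data:
        raise ValueError("image_base64 decoded to an empty payload")
    if len(data) > max_bytes:
        raise ValueError("image payload exceeds the size limit")
    return data

# Round trip with placeholder bytes (not a real image)
payload = base64.b64encode(b"\xff\xd8\xff placeholder bytes").decode()
decoded = decode_image_payload(payload)

# Malformed input is rejected with a clear error
try:
    decode_image_payload("not base64!!")
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"Rejected: {exc}")
```

Inside a FastAPI endpoint, one option is to catch the `ValueError` and raise `fastapi.HTTPException(status_code=400, ...)` so that clients receive a 400 response instead of a 500.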

## 9. Model Optimization

In a real-world deployment, we would apply additional optimizations to improve performance. The cell below demonstrates FP16 conversion and outlines three further techniques for NVIDIA GPUs: batch processing, TensorRT export, and CUDA Graphs.

In [None]:
# This section outlines optimization techniques that would be applied in a production environment

# 1. Model Quantization
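def optimize_model_with_quantization(model):
    """Return an FP16 copy of a model, leaving the original FP32 model unchanged."""
    import copy

    print("Applying quantization to model...")

    # In a real implementation, we would use PyTorch's quantization API (for INT8)
    # or NVIDIA TensorRT for more advanced optimizations

    # Simple example using PyTorch's native FP16. nn.Module.half() converts the
    # parameters in place, so copy first; otherwise the classifier used by the
    # pipeline would also become FP16 and fail on FP32 inputs.
    # Inputs to the returned model must be cast to FP16 as well.
    model_fp16 = copy.deepcopy(model).half()

    return model_fp16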

# 2. Batch Processing
def batch_inference_example(pipeline, images, texts, batch_size=8):
    """Process multiple inputs in batches for higher throughput."""
    # This is a demonstration of how batch processing would work
    # In a real implementation, the pipeline would be modified to handle batches natively
    
    print(f"Processing {len(images)} inputs in batches of {batch_size}...")
    
    # Simple batching logic
    results = []
    for i in range(0, len(images), batch_size):
        batch_images = images[i:i+batch_size]
        batch_texts = texts[i:i+batch_size]
        
        # In a real batch implementation, we would process all images at once
        # For this demo, we'll just process them sequentially
        batch_results = [pipeline.predict(img, txt) for img, txt in zip(batch_images, batch_texts)]
        results.extend(batch_results)
    
    return results

# 3. TensorRT Export (illustration only)
def export_to_tensorrt():
    """Demonstrate the concept of TensorRT export."""
    print("In a production environment, we would export the model to TensorRT for maximum GPU performance.")
    print("\nThe TensorRT export process would involve:")
    print("1. Converting the PyTorch model to ONNX format")
    print("2. Using TensorRT to optimize the ONNX model")
    print("3. Implementing a TensorRT engine for inference")
    print("\nThis can yield roughly 2-4x speed improvements compared to PyTorch on the same GPU, depending on the model, precision, and hardware.")

# 4. CUDA Graph Optimization (illustration only)
def cuda_graph_optimization():
    """Demonstrate the concept of CUDA Graph optimization."""
    print("For repeated inference on inputs of the same shape, CUDA Graphs can significantly improve performance.")
    print("\nCUDA Graphs work by:")
    print("1. Recording a sequence of CUDA operations once")
    print("2. Replaying the entire sequence without CPU overhead")
    print("3. Eliminating launch latencies between operations")
    print("\nThis is ideal for production scenarios with steady workloads.")

# Show optimization techniques
print("Model Optimization Techniques for NVIDIA GPUs:\n")

# Quantization demo
if torch.cuda.is_available():
    print("1. Model Quantization")
    print("====================")
    print("Original model precision: FP32")
    print(f"Original model memory usage: {sum(p.nelement() * p.element_size() for p in classifier.parameters()) / 1e6:.2f} MB\n")
    
    # Apply quantization
    classifier_fp16 = optimize_model_with_quantization(classifier)
    
    print(f"Quantized model precision: FP16")
    print(f"Quantized model memory usage: {sum(p.nelement() * p.element_size() for p in classifier_fp16.parameters()) / 1e6:.2f} MB")
    print(f"Memory reduction: ~50%\n")

# Other optimizations
print("2. Batch Processing for Higher Throughput")
print("=========================================")
batch_inference_example(retail_pipeline, benchmark_images * 4, benchmark_texts * 4, batch_size=8)
print("\nBatch processing can significantly improve throughput by better utilizing GPU parallelism.\n")

print("3. TensorRT Optimization")
print("=======================")
export_to_tensorrt()
print()

print("4. CUDA Graph Optimization")
print("==========================")
cuda_graph_optimization()
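
The roughly 50% memory reduction follows from element sizes: FP32 stores each parameter in 4 bytes and FP16 in 2. As a worked example using the default metadata from earlier in this notebook (`output_dim` of 256, `num_classes` of 5), the classification head `nn.Linear(256, 5)` has 256 × 5 weights plus 5 biases. The sketch covers only this head, since the fusion model's parameter count is not stated in this notebook; the helper names are illustrative.

```python
# Bytes needed to store one parameter in each precision
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}

def linear_param_count(in_features, out_features, bias=True):
    """Number of parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

def param_memory_bytes(num_params, dtype):
    """Memory needed to store num_params parameters in the given precision."""
    return num_params * BYTES_PER_ELEMENT[dtype]

head_params = linear_param_count(256, 5)  # 256 * 5 weights + 5 biases = 1285
fp32_bytes = param_memory_bytes(head_params, "fp32")
fp16_bytes = param_memory_bytes(head_params, "fp16")
reduction = 1 - fp16_bytes / fp32_bytes

print(f"{head_params} parameters: {fp32_bytes} B (FP32) -> {fp16_bytes} B (FP16), {reduction:.0%} smaller")
# prints: 1285 parameters: 5140 B (FP32) -> 2570 B (FP16), 50% smaller
```

This counts parameter storage only. Activations, CUDA context, and framework overhead also consume GPU memory, so the end-to-end saving is usually smaller than 50%.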

## 10. Summary and Next Steps

In this notebook, we've built a complete inference pipeline for our Multi-Modal Retail GenAI system. Key accomplishments:

1. **Complete Inference Pipeline**: We've created a modular, end-to-end pipeline that handles image, text, and multi-modal inputs.

2. **Product Classification**: The pipeline can classify products into appropriate retail categories, combining visual and textual information.

3. **Question Answering**: We've added a keyword-based question-answering step that responds from a mock product catalog for the predicted category.

4. **Shelf Analysis**: The pipeline exposes a shelf-analysis interface for multiple products per image. Detections are currently simulated and would be replaced by an object detection model.

5. **API for Integration**: We've created a simple FastAPI server for easy integration into other applications.

6. **Performance Optimization**: We've demonstrated FP16 conversion and outlined batch processing, TensorRT export, and CUDA Graphs for NVIDIA GPUs.

### Next Steps

To take this further in a real-world deployment:

1. **Containerization**: Package the pipeline using Docker for easy deployment (covered in notebook 04).

2. **Model Monitoring**: Implement monitoring to track model performance and drift in production.

3. **Scaling**: Deploy using Kubernetes with NVIDIA device plugin for multi-GPU and multi-node scaling.

4. **Data Collection**: Set up a feedback loop to collect user interactions for continued model improvement.

5. **Advanced Features**: Add more retail-specific features like price matching, inventory integration, and personalized recommendations.

The code from this notebook can serve as a starting point for a production retail AI system running on NVIDIA GPUs.