# Containerization and Deployment of Retail GenAI System

This notebook demonstrates how to containerize and deploy the multi-modal retail GenAI system for production environments. We'll cover:

1. Packaging the model and code into Docker containers
2. Setting up GPU-accelerated containers with NVIDIA Container Toolkit
3. Deploying with Docker Compose for development
4. Kubernetes deployment for production
5. Performance tuning and monitoring

Containerizing the system makes deployments reproducible across development, staging, and production hosts, while the NVIDIA container runtime gives the containers direct access to the host's GPUs.

## 1. Prerequisites

Before containerizing our application, we need to ensure we have the necessary tools installed.

In [None]:
# Check for Docker installation
!docker --version

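# Check for NVIDIA Docker support (grep prints nothing when no NVIDIA runtime is registered)
!docker info | grep -i nvidia || echo "NVIDIA runtime not registered with Docker"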

# Check for NVIDIA Container Toolkit
!nvidia-container-cli --version 2>/dev/null || echo "NVIDIA Container Toolkit not found"

# Check available GPUs
!nvidia-smi 2>/dev/null || echo "No NVIDIA GPUs detected"

## 2. Understanding Docker with NVIDIA GPU Support

The NVIDIA Container Toolkit enables using NVIDIA GPUs in Docker containers. This is essential for deep learning workloads that require GPU acceleration.

### Key Components:

1. **NVIDIA Container Toolkit** (formerly nvidia-docker): Enables GPU access in containers
2. **NVIDIA Container Runtime**: Handles GPU allocation to containers
3. **NVIDIA Base Images**: CUDA-enabled base images for building ML containers

### How it Works:

When a container is launched with `--gpus all`, the NVIDIA Container Toolkit:
1. Identifies the GPUs available on the host
2. Mounts the host's user-space driver libraries and GPU device nodes into the container
3. Sets the required environment variables
4. Makes the driver's CUDA libraries visible inside the container
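
From inside a container, the effect of these steps can be checked without any GPU libraries. The sketch below is only an illustration: `gpu_env_report` is a hypothetical helper, and it assumes the usual Linux layout in which exposed GPUs appear as `/dev/nvidia0`, `/dev/nvidia1`, and so on. It reads the `NVIDIA_VISIBLE_DEVICES` and `NVIDIA_DRIVER_CAPABILITIES` variables and lists the device nodes it finds.

```python
import glob
import os


def gpu_env_report(env=None, device_glob="/dev/nvidia[0-9]*"):
    """Summarize what the container runtime exposed (hypothetical helper)."""
    env = os.environ if env is None else env
    visible = env.get("NVIDIA_VISIBLE_DEVICES", "")
    caps = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    return {
        # An empty value, "void", or "none" means no GPUs were requested
        "gpus_requested": visible not in ("", "void", "none"),
        "visible_devices": visible,
        "capabilities": [c for c in caps.split(",") if c],
        # Device nodes such as /dev/nvidia0 are present when a GPU is attached
        "device_nodes": sorted(glob.glob(device_glob)),
    }
```

Calling `gpu_env_report()` inside the running container returns a small dictionary that can be logged at startup; an empty `device_nodes` list is a hint that the container was started without GPU access.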

## 3. Reviewing Our Dockerfile

Let's review the Dockerfile we created for our retail GenAI system.

In [None]:
import os
from pathlib import Path

# Define repository root
REPO_ROOT = Path("..")
DOCKER_DIR = REPO_ROOT / "docker"

# Read the Dockerfile
with open(DOCKER_DIR / "Dockerfile", "r") as f:
    dockerfile_content = f.read()

print(dockerfile_content)

### Key Components of Our Dockerfile

Our Dockerfile builds on NVIDIA's PyTorch container and has four main parts:

1. **Base Image**: NVIDIA's optimized PyTorch container with CUDA support
2. **Dependencies**: System libraries for OpenCV and other requirements
3. **Python Libraries**: Libraries from requirements.txt
4. **Application Code**: Our retail GenAI application code

The use of NVIDIA's container as a base image ensures that we have all the necessary CUDA libraries and optimizations for PyTorch.

## 4. Docker Compose Configuration

Let's look at our Docker Compose configuration, which orchestrates multiple containers.

In [None]:
# Read the Docker Compose file
with open(DOCKER_DIR / "docker-compose.yml", "r") as f:
    docker_compose_content = f.read()

print(docker_compose_content)

### Key Features of Our Docker Compose Setup

Our `docker-compose.yml` file includes:

1. **Main Application Container**: Our retail GenAI service with GPU support
2. **Vector Database Container**: For storing and searching embeddings
3. **Volume Mounts**: For persistent data storage
4. **GPU Configuration**: NVIDIA GPU resource allocation
5. **Port Mapping**: For accessing the API and interfaces

The `deploy` section with `resources.reservations.devices` is critical for GPU access: it is how Compose asks Docker to attach NVIDIA GPUs to the service.
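
To make the shape of that section concrete, here is a minimal sketch that builds the GPU reservation as a Python dictionary and prints it as JSON. The helper `gpu_reservation` and the service name `retail-genai` are hypothetical and may not match the project's actual `docker-compose.yml`; the keys `driver`, `count`, and `capabilities` follow the Compose specification for device reservations.

```python
import json


def gpu_reservation(count=1):
    """Return a Compose `deploy` fragment that reserves NVIDIA GPUs (illustrative)."""
    return {
        "deploy": {
            "resources": {
                "reservations": {
                    "devices": [
                        {
                            "driver": "nvidia",
                            "count": count,  # an integer, or the string "all"
                            "capabilities": ["gpu"],  # Compose requires this field
                        }
                    ]
                }
            }
        }
    }


# Hypothetical service entry; the image tag is the one used elsewhere in this notebook
service = {"image": "retail-genai-accelerator:latest", **gpu_reservation(1)}
print(json.dumps({"services": {"retail-genai": service}}, indent=2))
```

The printed structure mirrors the nesting you should expect to find under the application service in the Compose file.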

## 5. Building the Docker Image

Now let's build the Docker image for our application. This would typically be done in a terminal, but we'll show the commands here.

In [None]:
# This cell outputs the commands you would run in a terminal
# We don't execute them directly as they require Docker privileges

print("# Navigate to the repository root")
print(f"cd {REPO_ROOT.absolute()}")
print("\n# Build the Docker image")
print("docker build -t retail-genai-accelerator:latest -f docker/Dockerfile .")

### Docker Build Optimization for ML Models

Building efficient Docker images for ML applications requires some special considerations:

1. **Layer Caching**: Order dependencies to maximize cache usage
2. **Multi-stage Builds**: Separate building from runtime
3. **Model Files**: Handle large model files carefully
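
For the third point, it can help to know which files in the build context are large enough to deserve special handling, for example listing them in `.dockerignore` and mounting them as a volume instead of copying them into the image. The following is a rough sketch under that assumption; `find_large_files` is a hypothetical helper and the 100 MB default is an arbitrary threshold.

```python
from pathlib import Path


def find_large_files(context_dir, threshold_mb=100):
    """Return (relative_path, size_mb) pairs for files above the threshold, largest first."""
    root = Path(context_dir)
    limit = threshold_mb * 1024 * 1024
    hits = [
        (str(p.relative_to(root)), p.stat().st_size / (1024 * 1024))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > limit
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Running `find_large_files("..")` from the notebook directory would list oversized files in the repository; model checkpoints that show up here are candidates for a volume mount such as the `/app/models` mount used in the Kubernetes manifest later in this notebook.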

Here's an example of an optimized Dockerfile for ML applications:

In [None]:
# Example of an optimized Dockerfile (not for execution)
# Raw string so the backslash line continuations are printed verbatim
optimized_dockerfile = r"""
# Build stage
FROM nvcr.io/nvidia/pytorch:23.12-py3 AS builder

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt /tmp/
RUN pip install --user --no-cache-dir -r /tmp/requirements.txt

# Runtime stage
FROM nvcr.io/nvidia/pytorch:23.12-py3

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1-mesa-glx \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Copy Python packages from builder
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH

# Set working directory
WORKDIR /app

# Copy application code
COPY src /app/src
COPY examples /app/examples
COPY models /app/models
COPY README.md /app/

# Create necessary directories
RUN mkdir -p /app/data /app/logs

# Set environment variables
ENV PYTHONPATH=/app:$PYTHONPATH
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Entry point
CMD ["python", "-m", "src.api.server"]
"""

print(optimized_dockerfile)

## 6. Running the Application with Docker Compose

With our Docker image built, we can use Docker Compose to run the application.

In [None]:
# Commands to run the application (not executed)
print("# Navigate to the repository root")
print(f"cd {REPO_ROOT.absolute()}")
print("\n# Start the application with Docker Compose")
print("docker-compose -f docker/docker-compose.yml up -d")
print("\n# Check the running containers")
print("docker-compose -f docker/docker-compose.yml ps")
print("\n# View logs")
print("docker-compose -f docker/docker-compose.yml logs -f")

## 7. NVIDIA Container Toolkit Details

Using GPUs in Docker containers requires the NVIDIA Container Toolkit. Here's how to ensure it's properly configured.

In [None]:
# Installation steps for NVIDIA Container Toolkit (not executed)
nvidia_container_toolkit_install = """
# Legacy repository setup (nvidia-docker); check NVIDIA's current install guide
# for your distribution. Run these commands as root or prefix them with sudo.
# Add NVIDIA package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the nvidia-container-toolkit package
apt-get update
apt-get install -y nvidia-container-toolkit

# Restart Docker service
systemctl restart docker
"""

print("Installation steps for NVIDIA Container Toolkit:")
print(nvidia_container_toolkit_install)

# Test NVIDIA GPU access in Docker (not executed)
print("\nTest GPU access with:")
print("docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi")

### Verifying GPU Access in Containers

After starting our containers, we can verify that our application has access to GPUs.

In [None]:
# Verify GPU access in the running container (not executed)
print("# Execute nvidia-smi in the container")
print("docker exec retail-genai-accelerator nvidia-smi")

# Check if PyTorch can see the GPUs (not executed)
print("\n# Check if PyTorch can access GPUs")
print("docker exec retail-genai-accelerator python -c \"import torch; print('CUDA available:', torch.cuda.is_available()); print('GPU count:', torch.cuda.device_count()); print('GPU name:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A')\"")

## 8. Creating an Inference API Module

For production deployment, let's create a dedicated API module. We'll define this in a new file in our repository: `src/api/server.py`.

In [None]:
# Example API server code (not to be executed directly)
api_server_code = """
#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''
API server for the Retail GenAI system.

This module provides a FastAPI server that exposes the functionality
of the Retail GenAI system through RESTful endpoints.
'''

import os
import sys
import json
import time
import torch
import logging
from pathlib import Path
from typing import Dict, List, Optional, Union
from fastapi import FastAPI, File, UploadFile, Form, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import uvicorn
from PIL import Image
from io import BytesIO
import base64

# Ensure the repository root is in the Python path
repo_root = Path(__file__).parent.parent.parent
if str(repo_root) not in sys.path:
    sys.path.append(str(repo_root))

# Import project modules
from src.models.multimodal_fusion import RetailProductFusionModel, create_nvidia_optimized_fusion_model
from src.inference.pipeline import RetailGenAIPipeline

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler("retail_genai_api.log"),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Load environment variables
DEBUG = os.environ.get("DEBUG", "False").lower() in ("true", "1", "t")
HOST = os.environ.get("HOST", "0.0.0.0")
PORT = int(os.environ.get("PORT", "8000"))
MODEL_DIR = os.environ.get("MODEL_DIR", str(repo_root / "models"))
USE_GPU = os.environ.get("USE_GPU", "True").lower() in ("true", "1", "t")

# Define API models
class ProductQuery(BaseModel):
    image_base64: str
    text: str

class ProductQuestion(BaseModel):
    image_base64: str
    question: str

class HealthResponse(BaseModel):
    status: str
    version: str
    gpu_available: bool
    models_loaded: bool

# Initialize the API
app = FastAPI(
    title="Retail GenAI API",
    description="Multi-modal API for retail applications powered by NVIDIA GPUs",
    version="1.0.0"
)

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # For production, specify your domains
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global variables for models and pipeline
pipeline = None

# Load models on startup
@app.on_event("startup")
async def startup_event():
    global pipeline
    try:
        logger.info("Initializing models...")
        device = torch.device("cuda" if torch.cuda.is_available() and USE_GPU else "cpu")
        
        # Initialize the pipeline with models
        pipeline = RetailGenAIPipeline.from_pretrained(
            model_dir=MODEL_DIR,
            device=device
        )
        
        logger.info(f"Models initialized successfully. Using device: {device}")
    except Exception as e:
        logger.error(f"Error initializing models: {e}")
        raise

# Health check endpoint
@app.get("/health", response_model=HealthResponse)
async def health_check():
    gpu_available = torch.cuda.is_available() and USE_GPU
    models_loaded = pipeline is not None
    
    if not models_loaded:
        status = "warning"
    else:
        status = "ok"
    
    return {
        "status": status,
        "version": "1.0.0",
        "gpu_available": gpu_available,
        "models_loaded": models_loaded
    }

# Product classification endpoint
@app.post("/predict")
async def predict_product(query: ProductQuery):
    if pipeline is None:
        raise HTTPException(status_code=503, detail="Models not initialized")
    
    try:
        # Decode base64 image
        image_data = base64.b64decode(query.image_base64)
        image = Image.open(BytesIO(image_data)).convert('RGB')
        
        # Run prediction
        start_time = time.time()
        result = pipeline.predict(image, query.text)
        processing_time = time.time() - start_time
        
        # Add processing time
        result["processing_time"] = processing_time
        
        return result
    except Exception as e:
        logger.error(f"Error in prediction: {e}")
        raise HTTPException(status_code=500, detail=str(e))

# Product Q&A endpoint
@app.post("/answer")
async def answer_question(query: ProductQuestion):
    if pipeline is None:
        raise HTTPException(status_code=503, detail="Models not initialized")
    
    try:
        # Decode base64 image
        image_data = base64.b64decode(query.image_base64)
        image = Image.open(BytesIO(image_data)).convert('RGB')
        
        # Answer question
        start_time = time.time()
        result = pipeline.answer_product_question(image, query.question)
        processing_time = time.time() - start_time
        
        # Add processing time
        result["processing_time"] = processing_time
        
        return result
    except Exception as e:
        logger.error(f"Error in answering question: {e}")
        raise HTTPException(status_code=500, detail=str(e))

# Shelf analysis endpoint
@app.post("/analyze_shelf")
async def analyze_shelf(image: UploadFile = File(...)):
    if pipeline is None:
        raise HTTPException(status_code=503, detail="Models not initialized")
    
    try:
        # Read image
        image_data = await image.read()
        pil_image = Image.open(BytesIO(image_data)).convert('RGB')
        
        # Process shelf image
        start_time = time.time()
        result = pipeline.process_shelf_image(pil_image)
        processing_time = time.time() - start_time
        
        # Add processing time
        result["processing_time"] = processing_time
        
        # Create visualization
        viz_image = pipeline.visualize_shelf_detection(pil_image, result)
        
        # Convert to base64 for response
        buffered = BytesIO()
        viz_image.save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode()
        
        # Add visualization to result
        result["visualization_base64"] = img_str
        
        return result
    except Exception as e:
        logger.error(f"Error in shelf analysis: {e}")
        raise HTTPException(status_code=500, detail=str(e))

# Main function to run the API server
def main():
    logger.info(f"Starting Retail GenAI API server on {HOST}:{PORT}")
    uvicorn.run("src.api.server:app", host=HOST, port=PORT, reload=DEBUG)

if __name__ == "__main__":
    main()
"""

# Create the API directory if it doesn't exist
api_dir = REPO_ROOT / "src" / "api"
os.makedirs(api_dir, exist_ok=True)

# Write the server code to a file (commenting this out to avoid overwriting existing files)
# with open(api_dir / "server.py", "w") as f:
#     f.write(api_server_code)

# Display the code instead
print(api_server_code)
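
Once the server is running, a client needs to send the image as a base64 string, matching the `ProductQuery` and `ProductQuestion` models above. The sketch below shows one way to build those request bodies with the standard library only. The helper names and the `http://localhost:8000` address are assumptions for illustration (the port matches the server's default `PORT`), and `post_json` is defined but not called here because it needs a live server.

```python
import base64
import json
import urllib.request


def build_predict_payload(image_bytes, text):
    """Build the JSON body for POST /predict (fields of ProductQuery)."""
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "text": text,
    }


def build_answer_payload(image_bytes, question):
    """Build the JSON body for POST /answer (fields of ProductQuestion)."""
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "question": question,
    }


def post_json(url, payload, timeout=30):
    """Send a JSON POST request and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, a call such as `post_json("http://localhost:8000/predict", build_predict_payload(open("product.jpg", "rb").read(), "wireless headphones"))` would return the prediction dictionary; the file name and query text are placeholders.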

The API server imports `RetailGenAIPipeline` from `src/inference/pipeline.py`, so we also need an inference pipeline module to support it.

In [None]:
# Example inference pipeline module (not to be executed directly)
inference_pipeline_code = """
#!/usr/bin/env python
# -*- coding: utf-8 -*-

'''
Inference pipeline for the Retail GenAI system.

This module implements an end-to-end inference pipeline that combines
vision and language models for retail applications.
'''

import os
import json
import time
import torch
import torch.nn as nn
import numpy as np
from pathlib import Path
from typing import Dict, List, Optional, Union, Tuple
from PIL import Image, ImageDraw, ImageFont
from io import BytesIO
import torchvision.transforms as transforms
from transformers import AutoTokenizer, AutoModel
import torchvision.models as vision_models

# Import project modules
from src.models.multimodal_fusion import RetailProductFusionModel

class MultiModalClassifier(nn.Module):
    '''Multi-modal classifier combining vision and language features.'''
    
    def __init__(self, fusion_model, num_classes):
        super(MultiModalClassifier, self).__init__()
        self.fusion_model = fusion_model
        self.classifier = nn.Linear(fusion_model.output_dim, num_classes)
    
    def forward(self, img_features, text_features):
        outputs = self.fusion_model(img_features=img_features, text_features=text_features)
        embeddings = outputs["embeddings"]
        logits = self.classifier(embeddings)
        return logits

class RetailGenAIPipeline:
    '''End-to-end inference pipeline for the retail GenAI system.'''
    
    def __init__(self, vision_model, language_model, tokenizer, classifier, 
                 transforms, category_mapping, device="cuda"):
        self.vision_model = vision_model
        self.language_model = language_model
        self.tokenizer = tokenizer
        self.classifier = classifier
        self.transforms = transforms
        self.category_mapping = category_mapping
        self.device = device
        
        # Set all models to evaluation mode
        self.vision_model.eval()
        self.language_model.eval()
        self.classifier.eval()
    
    @classmethod
    def from_pretrained(cls, model_dir, device="cuda"):
        '''Load a pipeline from pretrained models.'''
        model_dir = Path(model_dir)
        
        # Load model metadata
        metadata_path = model_dir / "multimodal_model_metadata.json"
        if metadata_path.exists():
            with open(metadata_path, 'r') as f:
                model_metadata = json.load(f)
        else:
            # Default metadata for demo purposes
            model_metadata = {
                "model_type": "MultiModalClassifier",
                "fusion_type": "attention",
                "img_feature_dim": 2048,  # For ResNet50
                "text_feature_dim": 384,   # For MiniLM-L6
                "hidden_dim": 512,
                "output_dim": 256,
                "num_classes": 5,  # Default to 5 common retail categories
                "category_mapping": {"0": "Electronics", "1": "Clothing", "2": "Groceries", "3": "Home", "4": "Beauty"},
                "test_accuracy": 0.85,
                "trained_on": "demo_dataset",
                "date_trained": "2023-01-01"
            }
        
        # Fix category mapping keys (JSON converts all keys to strings)
        category_mapping = {int(k): v for k, v in model_metadata["category_mapping"].items()}
        
        # Load models
        # 1. Vision model
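        # `pretrained=True` is deprecated in torchvision; request the same ImageNet weights explicitly
        vision_model = vision_models.resnet50(weights=vision_models.ResNet50_Weights.IMAGENET1K_V1)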
        # Remove the classification layer
        vision_model = nn.Sequential(*list(vision_model.children())[:-1])
        vision_model = vision_model.to(device)
        vision_model.eval()
        
        # 2. Language model
        model_name = "sentence-transformers/all-MiniLM-L6-v2"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        language_model = AutoModel.from_pretrained(model_name)
        language_model = language_model.to(device)
        language_model.eval()
        
        # 3. Fusion model
        fusion_model = RetailProductFusionModel(
            vision_encoder=None,  # We'll use pre-extracted features
            text_encoder=None,    # We'll use pre-extracted features
            fusion_type=model_metadata["fusion_type"],
            img_feature_dim=model_metadata["img_feature_dim"],
            text_feature_dim=model_metadata["text_feature_dim"],
            hidden_dim=model_metadata["hidden_dim"],
            output_dim=model_metadata["output_dim"]
        )
        
        # 4. Classifier
        classifier = MultiModalClassifier(fusion_model, model_metadata["num_classes"])
        
        # Load trained model if exists
        model_path = model_dir / "best_multimodal_classifier.pth"
        if model_path.exists():
            classifier.load_state_dict(torch.load(model_path, map_location=device))
        
        # Move model to device and set to evaluation mode
        classifier = classifier.to(device)
        classifier.eval()
        
        # Image transformations
        image_transforms = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        
        return cls(
            vision_model=vision_model,
            language_model=language_model,
            tokenizer=tokenizer,
            classifier=classifier,
            transforms=image_transforms,
            category_mapping=category_mapping,
            device=device
        )
    
    def preprocess_image(self, image):
        '''Preprocess image for model input.'''
        if isinstance(image, str):
            # Load from path
            image = Image.open(image).convert('RGB')
        elif isinstance(image, bytes):
            # Load from bytes
            image = Image.open(BytesIO(image)).convert('RGB')
        elif not isinstance(image, Image.Image):
            raise ValueError("Image must be a PIL Image, a path string, or bytes")
            
        # Apply transformations
        return self.transforms(image).unsqueeze(0)  # Add batch dimension
    
    def preprocess_text(self, text, max_length=128):
        '''Preprocess text for model input.'''
        encoding = self.tokenizer(
            text,
            padding="max_length",
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        )
        return encoding
    
    def extract_features(self, image, text):
        '''Extract features from vision and language models.'''
        with torch.no_grad():
            # Process image
            img_tensor = self.preprocess_image(image)
            img_tensor = img_tensor.to(self.device)
            img_features = self.vision_model(img_tensor).squeeze(-1).squeeze(-1)
            
            # Process text
            text_encoding = self.preprocess_text(text)
            input_ids = text_encoding["input_ids"].to(self.device)
            attention_mask = text_encoding["attention_mask"].to(self.device)
            text_outputs = self.language_model(input_ids=input_ids, attention_mask=attention_mask)
            text_features = text_outputs.last_hidden_state[:, 0, :]
        
        return img_features, text_features
    
    def predict(self, image, text):
        '''Run end-to-end prediction pipeline.'''
        start_time = time.time()
        
        # Extract features
        img_features, text_features = self.extract_features(image, text)
        
        # Run classifier
        with torch.no_grad():
            logits = self.classifier(img_features, text_features)
            probs = torch.nn.functional.softmax(logits, dim=1)[0]
            predicted_idx = torch.argmax(probs).item()
            predicted_category = self.category_mapping[predicted_idx]
            confidence = probs[predicted_idx].item()
        
        # Collect all probabilities
        category_probs = {}
        for idx, prob in enumerate(probs):
            category = self.category_mapping[idx]
            category_probs[category] = prob.item()
        
        inference_time = time.time() - start_time
        
        return {
            "predicted_category": predicted_category,
            "confidence": confidence,
            "all_probabilities": category_probs,
            "inference_time": inference_time
        }
    
    def process_shelf_image(self, image, text=""):
        '''Simulate product detection on a shelf image (random boxes, demo only).'''
        # In a production implementation, this would use a proper object detection model
        # For this demo, we'll use a simplified approach
        import random
        
        # If image is a path or bytes, convert to PIL Image
        if isinstance(image, str):
            img = Image.open(image).convert('RGB')
        elif isinstance(image, bytes):
            img = Image.open(BytesIO(image)).convert('RGB')
        elif isinstance(image, Image.Image):
            img = image
        else:
            raise ValueError("Image must be a PIL Image, a path string, or bytes")
        
        width, height = img.size
        
        # Simulated product detection for demo purposes
        # In a real implementation, this would use a detection model like YOLO or Faster R-CNN
        num_products = random.randint(3, 8)
        product_detections = []
        
        for i in range(num_products):
            # Generate random box (ensuring they don't go outside the image)
            box_width = random.randint(width // 6, width // 3)
            box_height = random.randint(height // 6, height // 3)
            x = random.randint(0, width - box_width)
            y = random.randint(0, height - box_height)
            
            # Random category and confidence
            cat_idx = random.randint(0, len(self.category_mapping) - 1)
            category = self.category_mapping[cat_idx]
            confidence = random.uniform(0.7, 0.99)
            
            product_detections.append({
                "box": [x, y, x + box_width, y + box_height],
                "category": category,
                "confidence": confidence,
                "product_id": f"P{random.randint(1000, 9999)}"
            })
        
        return {
            "num_products": len(product_detections),
            "detections": product_detections,
            "shelf_image_size": [width, height]
        }
    
    def visualize_shelf_detection(self, image, results):
        '''Visualize product detections on a shelf image.'''
        # Convert to PIL Image if needed
        if not isinstance(image, Image.Image):
            if isinstance(image, str):
                image = Image.open(image).convert('RGB')
            elif isinstance(image, bytes):
                image = Image.open(BytesIO(image)).convert('RGB')
            elif isinstance(image, np.ndarray):
                image = Image.fromarray(image)
        
        # Create a copy of the image to draw on
        img_draw = image.copy()
        draw = ImageDraw.Draw(img_draw)
        
        # Try to load a font (use default if not available)
        try:
            font = ImageFont.truetype("arial.ttf", 14)
        except IOError:
            font = ImageFont.load_default()
        
        # Color mapping for categories
        category_colors = {
            "Electronics": "blue",
            "Clothing": "red",
            "Groceries": "green",
            "Home": "orange",
            "Beauty": "purple"
        }
        
        # Draw bounding boxes and labels
        for det in results["detections"]:
            box = det["box"]
            category = det["category"]
            confidence = det["confidence"]
            product_id = det["product_id"]
            
            # Get color for category
            color = category_colors.get(category, "gray")
            
            # Draw rectangle
            draw.rectangle(box, outline=color, width=3)
            
            # Draw label background
            label = f"{category} ({confidence:.1%})"
            label_size = draw.textbbox((0, 0), label, font=font)
            label_width = label_size[2] - label_size[0]
            label_height = label_size[3] - label_size[1]
            label_bg = [box[0], box[1] - label_height - 4, box[0] + label_width + 4, box[1]]
            draw.rectangle(label_bg, fill=color)
            
            # Draw label text
            draw.text((box[0] + 2, box[1] - label_height - 2), label, fill="white", font=font)
        
        return img_draw
    
    def answer_product_question(self, image, question):
        '''Answer a natural language question about a product.'''
        # First, predict the product category
        prediction = self.predict(image, question)
        category = prediction["predicted_category"]
        confidence = prediction["confidence"]
        
        # Mock product catalog (in real implementation, this would query a database)
        product_details = {
            "Electronics": {
                "price_range": "$50-$1200",
                "top_brands": ["TechCorp", "Electra", "DigiLife"],
                "features": ["wireless connectivity", "long battery life", "high resolution display"],
                "warranty": "1-3 years"
            },
            "Clothing": {
                "price_range": "$15-$250",
                "top_brands": ["StyleX", "UrbanFit", "ClassicWear"],
                "features": ["sustainable materials", "comfortable fit", "machine washable"],
                "warranty": "30-day returns"
            },
            "Groceries": {
                "price_range": "$2-$35",
                "top_brands": ["FreshFarms", "OrganicLife", "NatureHarvest"],
                "features": ["organic options", "locally sourced", "no preservatives"],
                "warranty": "satisfaction guarantee"
            },
            "Home": {
                "price_range": "$10-$500",
                "top_brands": ["HomeLux", "ComfortLiving", "ModernSpace"],
                "features": ["durable construction", "stylish design", "easy assembly"],
                "warranty": "1-5 years"
            },
            "Beauty": {
                "price_range": "$8-$150",
                "top_brands": ["GlowUp", "NaturalBeauty", "LuxeSkin"],
                "features": ["cruelty-free", "fragrance-free options", "dermatologist tested"],
                "warranty": "30-day returns"
            }
        }
        
        # Simple rule-based QA logic
        response = ""
        
        # Get product info for the predicted category
        if category in product_details:
            info = product_details[category]
            
            # Very basic keyword matching for demo purposes
            question_lower = question.lower()
            
            if "price" in question_lower or "cost" in question_lower or "how much" in question_lower:
                response = f"This {category} product typically costs in the range of {info['price_range']}."
            
            elif "brand" in question_lower or "who makes" in question_lower or "manufacturer" in question_lower:
                top_brands = ", ".join(info["top_brands"])
                response = f"The top brands in this {category} category include {top_brands}."
            
            elif "feature" in question_lower or "specification" in question_lower or "what can" in question_lower:
                features = ", ".join(info["features"])
                response = f"This {category} product typically offers these features: {features}."
            
            elif "warranty" in question_lower or "guarantee" in question_lower or "return" in question_lower:
                response = f"This {category} product typically comes with a {info['warranty']}."
            
            elif "recommend" in question_lower or "alternative" in question_lower or "similar" in question_lower:
                response = f"Based on this {category} product, I would recommend checking out items from {', '.join(info['top_brands'][:2])}."
            
            else:
                # Generic response for other questions
                response = f"This appears to be a {category} product. It typically costs {info['price_range']} and features {', '.join(info['features'][:2])}."
        else:
            response = "I couldn't identify the product category clearly. Could you provide more information?"
        
        return {
            "question": question,
            "answer": response,
            "predicted_category": category,
            "confidence": confidence
        }
"""

# Create the inference directory if it doesn't exist
inference_dir = REPO_ROOT / "src" / "inference"
os.makedirs(inference_dir, exist_ok=True)

# Write the pipeline code to a file (commenting this out to avoid overwriting existing files)
# with open(inference_dir / "pipeline.py", "w") as f:
#     f.write(inference_pipeline_code)

# Display the code instead
print(inference_pipeline_code)
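
Two small steps in the pipeline are easy to get wrong: restoring integer class indices after loading the metadata from JSON, and turning classifier logits into a category with a confidence score. The sketch below reproduces both in plain Python so they can be checked without PyTorch. The function names and the logits are made up for illustration; the real pipeline does this work with tensors inside `from_pretrained` and `predict`.

```python
import json
import math


def load_category_mapping(metadata_json):
    """Parse metadata and restore integer class indices (JSON keys are always strings)."""
    metadata = json.loads(metadata_json)
    return {int(k): v for k, v in metadata["category_mapping"].items()}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, category_mapping):
    """Map raw logits to the predicted category, its confidence, and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "predicted_category": category_mapping[best],
        "confidence": probs[best],
        "all_probabilities": {category_mapping[i]: p for i, p in enumerate(probs)},
    }


# Serializing integer keys turns them into strings, as happens with the saved metadata file
raw = json.dumps({"category_mapping": {0: "Electronics", 1: "Clothing", 2: "Groceries"}})
mapping = load_category_mapping(raw)

# Made-up logits for illustration only; a real run would take them from the classifier
result = postprocess([2.0, 0.5, -1.0], mapping)
print(result["predicted_category"])  # prints "Electronics"
```

Skipping the `int(k)` conversion would make `category_mapping[best]` raise a `KeyError`, since `best` is an integer and the loaded keys are strings.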

## 9. Kubernetes Deployment for Production

For production deployment, Kubernetes provides better scalability and management. Here are example Kubernetes manifests for our application.

In [None]:
# Example Kubernetes deployment manifest
kubernetes_deployment = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: retail-genai
  labels:
    app: retail-genai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: retail-genai
  template:
    metadata:
      labels:
        app: retail-genai
    spec:
      containers:
      - name: retail-genai
        image: retail-genai-accelerator:latest
        imagePullPolicy: IfNotPresent
        command: ["python", "-m", "src.api.server"]
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: "4Gi"
            cpu: "1"
        env:
        - name: MODEL_DIR
          value: "/app/models"
        - name: USE_GPU
          value: "true"
        - name: HOST
          value: "0.0.0.0"
        - name: PORT
          value: "8000"
        - name: DEBUG
          value: "false"
        volumeMounts:
        - name: models-volume
          mountPath: /app/models
        - name: data-volume
          mountPath: /app/data
      volumes:
      - name: models-volume
        persistentVolumeClaim:
          claimName: models-pvc
      - name: data-volume
        persistentVolumeClaim:
          claimName: data-pvc
"""

# Example Kubernetes service manifest
kubernetes_service = """
apiVersion: v1
kind: Service
metadata:
  name: retail-genai
spec:
  selector:
    app: retail-genai
  ports:
  - port: 8000
    targetPort: 8000
  type: ClusterIP
"""

# Example Kubernetes ingress manifest
kubernetes_ingress = """
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: retail-genai-ingress
  # The rewrite-target annotation is omitted: on ingress-nginx 0.22+ it would
  # rewrite every request path to "/" when combined with the "/" prefix rule below.
spec:
  rules:
  - host: retail-genai.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: retail-genai
            port:
              number: 8000
"""

# Example Kubernetes persistent volume claims
kubernetes_pvc = """
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
"""

# Create k8s directory
k8s_dir = REPO_ROOT / "k8s"
os.makedirs(k8s_dir, exist_ok=True)

# Write Kubernetes manifests (commenting this out to avoid overwriting existing files)
# with open(k8s_dir / "deployment.yaml", "w") as f:
#     f.write(kubernetes_deployment)
# with open(k8s_dir / "service.yaml", "w") as f:
#     f.write(kubernetes_service)
# with open(k8s_dir / "ingress.yaml", "w") as f:
#     f.write(kubernetes_ingress)
# with open(k8s_dir / "pvc.yaml", "w") as f:
#     f.write(kubernetes_pvc)

# Display the Kubernetes manifests
print("Kubernetes Deployment Manifest:\n")
print(kubernetes_deployment)
print("\nKubernetes Service Manifest:\n")
print(kubernetes_service)
print("\nKubernetes Ingress Manifest:\n")
print(kubernetes_ingress)
print("\nKubernetes PVC Manifest:\n")
print(kubernetes_pvc)
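
Hard-coded YAML strings are awkward to parameterize (image tag, replica count, GPU count). One possible alternative, sketched below, is to build the manifest as a Python dictionary and serialize it to JSON, which `kubectl apply -f` also accepts. The helper `build_deployment` is hypothetical and reproduces only a trimmed version of the Deployment above (no environment variables or volumes).

```python
import json


def build_deployment(name: str, image: str, gpus: int = 1, replicas: int = 1) -> dict:
    """Return a trimmed Deployment manifest shaped like the YAML example."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8000}],
                            "resources": {
                                # GPUs are requested through limits, in whole units.
                                "limits": {"nvidia.com/gpu": gpus},
                                "requests": {"memory": "4Gi", "cpu": "1"},
                            },
                        }
                    ]
                },
            },
        },
    }


manifest = build_deployment("retail-genai", "retail-genai-accelerator:latest")
print(json.dumps(manifest, indent=2))
```

Reusing one `labels` dictionary for the metadata, selector, and pod template keeps the three in sync, which is a common source of errors in hand-edited manifests.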

## 10. Deploying to Kubernetes with NVIDIA GPU Support

Here are the steps to deploy our application to a Kubernetes cluster with NVIDIA GPU support.

In [None]:
# Commands to deploy to Kubernetes (not executed)
kubernetes_deploy_commands = """
# 1. Build and push Docker image (if using a registry, also set `image:` in
#    k8s/deployment.yaml to the pushed tag so the cluster can pull it)
docker build -t retail-genai-accelerator:latest -f docker/Dockerfile .
docker tag retail-genai-accelerator:latest your-registry/retail-genai-accelerator:latest
docker push your-registry/retail-genai-accelerator:latest

# 2. Apply Kubernetes manifests
kubectl apply -f k8s/pvc.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl apply -f k8s/ingress.yaml

# 3. Verify deployment
kubectl get pods
kubectl get services
kubectl get ingress

# 4. Check logs
kubectl logs -f deployment/retail-genai

# 5. Check GPU utilization within the pod
kubectl exec -it $(kubectl get pods -l app=retail-genai -o jsonpath='{.items[0].metadata.name}') -- nvidia-smi
"""

print("Kubernetes Deployment Commands:")
print(kubernetes_deploy_commands)
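
After applying the manifests, it can be useful to block until the rollout finishes rather than polling `kubectl get pods` by hand. The sketch below wraps `kubectl rollout status` in Python; the helper names, the default namespace, and the timeout value are assumptions for illustration, and `wait_for_rollout` requires `kubectl` on the `PATH` and a reachable cluster.

```python
import subprocess


def rollout_status_cmd(deployment: str, namespace: str = "default", timeout_s: int = 300) -> list[str]:
    """Build the kubectl command that waits for a Deployment rollout to complete."""
    return [
        "kubectl", "rollout", "status", f"deployment/{deployment}",
        "--namespace", namespace, f"--timeout={timeout_s}s",
    ]


def wait_for_rollout(deployment: str, **kwargs) -> bool:
    """Run the command; return True if the rollout completed within the timeout."""
    result = subprocess.run(rollout_status_cmd(deployment, **kwargs), capture_output=True, text=True)
    return result.returncode == 0


# Only the command is printed here; nothing is executed against a cluster.
print(" ".join(rollout_status_cmd("retail-genai")))
```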

### NVIDIA GPU Operator for Kubernetes

The NVIDIA GPU Operator simplifies the management of GPUs in Kubernetes clusters. It automates installation of the NVIDIA driver, the NVIDIA Container Toolkit, the Kubernetes device plugin, and DCGM-based monitoring components on GPU nodes.

In [None]:
# NVIDIA GPU Operator installation (not executed)
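# Raw string so the trailing backslashes are printed instead of being
# treated as Python line continuations
gpu_operator_install = r"""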
# Add the NVIDIA Helm repository
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm repo update

# Install the NVIDIA GPU Operator
helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
     nvidia/gpu-operator

# Verify installation
kubectl get pods -n gpu-operator
"""

print("NVIDIA GPU Operator Installation:")
print(gpu_operator_install)
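
Once the operator's device plugin is running, GPU nodes advertise an `nvidia.com/gpu` resource under `status.allocatable`. The sketch below shows one way to read those counts from the output of `kubectl get nodes -o json`; the function name and the sample node names are made up for illustration.

```python
import json


def gpus_per_node(nodes_json: str) -> dict[str, int]:
    """Map node name to allocatable nvidia.com/gpu count (0 if the resource is absent)."""
    nodes = json.loads(nodes_json)["items"]
    return {
        node["metadata"]["name"]: int(node["status"].get("allocatable", {}).get("nvidia.com/gpu", "0"))
        for node in nodes
    }


# Hand-written sample shaped like the kubectl output, not real cluster data.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"}, "status": {"allocatable": {"cpu": "32", "nvidia.com/gpu": "4"}}},
        {"metadata": {"name": "cpu-node-1"}, "status": {"allocatable": {"cpu": "16"}}},
    ]
})
print(gpus_per_node(sample))  # {'gpu-node-1': 4, 'cpu-node-1': 0}
```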

## 11. Performance Monitoring

For monitoring GPU utilization and application performance in production, we can use tools like Prometheus, Grafana, and NVIDIA DCGM (Data Center GPU Manager).

In [None]:
# Prometheus configuration for GPU monitoring (not executed)
# Raw string so regex escapes such as \d are not treated as Python escape sequences
prometheus_config = r"""
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      target_label: __address__

  - job_name: 'dcgm-exporter'
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_label_app]
      action: keep
      regex: dcgm-exporter
    - source_labels: [__address__]
      action: replace
      regex: ([^:]+)(?::\d+)?
      replacement: $1:9400
      target_label: __address__
"""

print("Prometheus Configuration for GPU Monitoring:")
print(prometheus_config)
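
The `__address__` relabel rule in the first job is the least obvious part of this configuration. Prometheus joins the values of `source_labels` with `;`, matches the fully anchored regex against the joined string, and writes the replacement to `target_label`; if the regex does not match, the label is left unchanged. The sketch below reproduces that behavior in Python for illustration only (the function name is hypothetical, and IPv6 addresses are not handled).

```python
import re

# Same pattern as in the scrape config: host, optional port, ";", annotated port.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def relabel_address(address: str, annotated_port: str) -> str:
    """Emulate the 'replace' action that swaps in the prometheus.io/port annotation."""
    joined = f"{address};{annotated_port}"  # Prometheus joins source labels with ';'
    match = ADDRESS_RULE.fullmatch(joined)   # Prometheus regexes are fully anchored
    if match is None:
        return address                       # no match: label stays as it was
    return match.expand(r"\1:\2")            # equivalent of replacement "$1:$2"


print(relabel_address("10.0.0.5:8080", "9102"))  # 10.0.0.5:9102
print(relabel_address("10.0.0.5", "9102"))       # 10.0.0.5:9102
print(relabel_address("10.0.0.5:8080", ""))      # 10.0.0.5:8080
```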

### Grafana Dashboard for GPU Monitoring

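Grafana dashboards for GPU monitoring chart the metrics that NVIDIA DCGM Exporter publishes and Prometheus scrapes. The cell below installs DCGM Exporter with Helm and shows how to inspect its raw metrics; once Prometheus collects them, they can be added to a Grafana dashboard.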

In [None]:
# NVIDIA DCGM installation with Helm (not executed)
dcgm_install = """
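# Add the DCGM Exporter Helm repository
# (a separate repo name avoids clashing with the "nvidia" repo added for the GPU Operator)
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update

# Install DCGM Exporter (Helm 3 syntax: the release name is a positional argument)
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter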

# View the metrics
kubectl port-forward svc/dcgm-exporter 9400:9400
# Then visit http://localhost:9400/metrics in your browser
"""

print("NVIDIA DCGM Exporter Installation:")
print(dcgm_install)
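
The `/metrics` endpoint returns plain text in the Prometheus exposition format. As a rough illustration of what a dashboard query works with, the sketch below extracts per-GPU utilization from such text. The sample lines are hand-written rather than captured from a real exporter, the parser is deliberately simple (it does not handle escaped quotes in label values), and `gpu_utilization` is a hypothetical helper.

```python
import re

METRIC_LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_PAIR = re.compile(r'(\w+)="([^"]*)"')


def gpu_utilization(metrics_text: str, metric: str = "DCGM_FI_DEV_GPU_UTIL") -> dict[str, float]:
    """Return {gpu index: value} for one metric from Prometheus text output."""
    result = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = METRIC_LINE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        result[labels.get("gpu", "unknown")] = float(match.group("value"))
    return result


# Hand-written sample in exposition format; values and UUIDs are made up.
sample = '''# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa"} 37
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb"} 92
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-aaaa"} 10240
'''
print(gpu_utilization(sample))  # {'0': 37.0, '1': 92.0}
```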

## 12. Scaling Strategies

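Here are some strategies for scaling our application based on load. The Horizontal Pod Autoscaler below scales the Deployment between 1 and 10 replicas based on CPU and memory utilization. Each replica requests one GPU, so the cluster needs a free GPU for every replica. The `ReadWriteOnce` claims defined earlier can only be mounted read-write by a single node, so replicas scheduled on other nodes need volumes with an access mode such as `ReadOnlyMany` or `ReadWriteMany`.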

In [None]:
# Horizontal Pod Autoscaler example (not executed)
hpa_manifest = """
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: retail-genai-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: retail-genai
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
"""

print("Horizontal Pod Autoscaler Manifest:")
print(hpa_manifest)
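
To see how these targets translate into replica counts, recall the scaling rule documented for the Horizontal Pod Autoscaler: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipped when the ratio is within a tolerance of 1.0 (0.1 by default), and with the largest result used when several metrics are configured. The sketch below is a simplified worked example of that rule; it ignores stabilization windows, pod readiness, and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization=80,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the basic HPA formula for a single utilization metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def desired_for_all(current_replicas, utilizations, **kwargs):
    """With several metrics, the highest proposed replica count wins."""
    return max(desired_replicas(current_replicas, u, **kwargs) for u in utilizations)


print(desired_replicas(2, 120))       # 3  (ceil(2 * 120 / 80))
print(desired_replicas(4, 84))        # 4  (ratio 1.05 is within tolerance)
print(desired_for_all(2, [120, 40]))  # 3  (CPU proposes 3, memory proposes 1)
```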

### Multi-Node GPU Cluster

For handling larger workloads, we can deploy our application across multiple nodes with GPUs.

In [None]:
# Multi-node deployment strategies (not executed)
multi_node_strategies = """
# 1. Node Selection
# Use nodeSelector or node affinity to target specific GPU nodes
nodeSelector:
  gpu-type: nvidia-a100

# 2. Anti-Affinity
# Spread pods across nodes for high availability
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - retail-genai
        topologyKey: kubernetes.io/hostname

# 3. Using Topology Spread Constraints
# More advanced scheduling control
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: retail-genai
"""

print("Multi-Node Deployment Strategies:")
print(multi_node_strategies)
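
The effect of `maxSkew: 1` with `whenUnsatisfiable: DoNotSchedule` can be illustrated with a small calculation. For a candidate node, the skew is the number of matching pods it would hold after placement minus the smallest number of matching pods on any eligible node; nodes whose skew would exceed `maxSkew` are filtered out. The sketch below is a simplified model (one topology domain per node, all nodes eligible) with made-up node names, not the scheduler's actual implementation.

```python
def allowed_nodes(pods_per_node: dict[str, int], max_skew: int = 1) -> list[str]:
    """Return nodes where one more matching pod keeps the skew within max_skew."""
    global_min = min(pods_per_node.values())
    return [
        node for node, count in pods_per_node.items()
        if (count + 1) - global_min <= max_skew
    ]


print(allowed_nodes({"node-a": 2, "node-b": 1, "node-c": 1}))  # ['node-b', 'node-c']
print(allowed_nodes({"node-a": 1, "node-b": 1}))               # ['node-a', 'node-b']
```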

## 13. Security Considerations

Here are some security considerations for deploying containerized ML applications.

In [None]:
# Security best practices (not executed)
security_best_practices = """
# 1. Use Non-Root User in Containers
# Add to Dockerfile:
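# Fixed UID/GID 1000 so the user matches runAsUser/runAsGroup in the
# Kubernetes security context below
RUN groupadd --gid 1000 appuser && useradd --uid 1000 --gid appuser --no-create-home appuser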
USER appuser

# 2. Set Security Context in Kubernetes
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  runAsNonRoot: true
  readOnlyRootFilesystem: true

# 3. Resource Limits
resources:
  limits:
    nvidia.com/gpu: 1
    memory: "8Gi"
    cpu: "2"
  requests:
    memory: "4Gi"
    cpu: "1"

# 4. Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: retail-genai-network-policy
spec:
  podSelector:
    matchLabels:
      app: retail-genai
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8000

# 5. Secrets Management
# Use Kubernetes Secrets for credentials
apiVersion: v1
kind: Secret
metadata:
  name: retail-genai-secrets
type: Opaque
data:
  api-key: base64EncodedApiKey
"""

print("Security Best Practices:")
print(security_best_practices)
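
Values under a Secret's `data` field must be base64-encoded, which is what the `base64EncodedApiKey` placeholder stands for. The sketch below shows one way to produce such values in Python; the helper name and the example key are placeholders. Note that base64 is an encoding, not encryption, so the manifest should still be kept out of version control.

```python
import base64


def encode_secret_data(values: dict[str, str]) -> dict[str, str]:
    """Base64-encode each value so it can be placed under a Secret's `data` field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


# Placeholder value for illustration only; never hard-code real credentials.
encoded = encode_secret_data({"api-key": "example-not-a-real-key"})
print(encoded)

# Decoding recovers the original string, which shows that no secrecy is added.
print(base64.b64decode(encoded["api-key"]).decode("utf-8"))
```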

## 14. CI/CD Pipeline for ML Models

Setting up a CI/CD pipeline for ML models ensures reliable, reproducible deployments.

In [None]:
# GitHub Actions CI/CD example (not executed)
github_actions_workflow = """
name: Retail GenAI CI/CD

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.10'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest pytest-cov
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Test with pytest
      run: |
        pytest --cov=src tests/

  build-and-push:
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Login to DockerHub
      uses: docker/login-action@v3
      with:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}
    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .
        file: ./docker/Dockerfile
        push: true
        tags: yourusername/retail-genai-accelerator:latest

  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - name: Set up kubectl
      uses: azure/setup-kubectl@v3
    - name: Set up kubeconfig
      run: |
        mkdir -p $HOME/.kube
        echo "${{ secrets.KUBE_CONFIG }}" > $HOME/.kube/config
    - name: Deploy to Kubernetes
      run: |
        kubectl apply -f k8s/pvc.yaml
        kubectl apply -f k8s/deployment.yaml
        kubectl apply -f k8s/service.yaml
        kubectl apply -f k8s/ingress.yaml
        kubectl rollout restart deployment/retail-genai
"""

print("GitHub Actions CI/CD Workflow:")
print(github_actions_workflow)
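
The workflow pushes a single mutable `latest` tag, which makes it hard to tell which commit a running pod was built from. One common refinement, shown here only as a sketch, is to derive an additional tag from the commit SHA. The helper `image_reference` and the 12-character tag length are assumptions; in GitHub Actions the full SHA is available in the `GITHUB_SHA` environment variable.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def image_reference(repository: str, commit_sha: str, short_len: int = 12) -> str:
    """Build `repository:<short sha>` from a full 40-character Git commit SHA."""
    sha = commit_sha.strip().lower()
    if not FULL_SHA.fullmatch(sha):
        raise ValueError(f"expected a 40-character hex commit SHA, got {commit_sha!r}")
    return f"{repository}:{sha[:short_len]}"


# Made-up SHA for illustration.
print(image_reference("yourusername/retail-genai-accelerator",
                      "0123456789abcdef0123456789abcdef01234567"))
# yourusername/retail-genai-accelerator:0123456789ab
```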

## 15. Summary and Next Steps

In this notebook, we've covered the containerization and deployment of our retail GenAI system. Key points include:

1. **Docker Containerization**: We've packaged our application in a Docker container with NVIDIA GPU support.

2. **Kubernetes Deployment**: We've prepared Kubernetes manifests for production deployment with GPU acceleration.

3. **API Server**: We've created a dedicated API server for serving model predictions.

4. **Infrastructure as Code**: We've provided all the necessary configuration files for reproducible deployments.

5. **Performance Monitoring**: We've outlined how to use Prometheus, Grafana, and NVIDIA DCGM to track GPU utilization and application performance.

### Next Steps

To further enhance your retail GenAI system, consider the following next steps:

1. **Implement Active Learning**: Add feedback loops to continuously improve model performance.

2. **Model Versioning**: Set up model versioning and A/B testing infrastructure.

3. **Advanced GPU Optimizations**: Explore NVIDIA Triton Inference Server for even better performance.

4. **Edge Deployment**: Adapt the system for deployment on edge devices in retail stores.

5. **Integration with Retail Systems**: Connect with inventory management, CRM, and POS systems.

Together, these pieces provide a foundation for deploying the GPU-accelerated retail GenAI system in real-world environments.