# SPARC Containerization and Deployment

## 1.0 Introduction
This notebook covers the final phase: packaging the SPARC backend into portable containers and deploying them to HiPerGator with a robust networking bridge.

**⚠️ Important Note on Conda vs Containers:**
Per UF RC best practices, **conda environments are preferred** for HiPerGator and PubApps deployments. Containers (Apptainer/Podman) should only be used when:
- You have complex system-level dependencies
- You need guaranteed reproducibility across different environments
- You're deploying to systems without conda support

For most SPARC-P use cases, follow the conda-based workflow in Notebooks 1 and 3.

### 1.1 Objectives
1. **Containerize**: Create Dockerfiles for the Multi-Agent System (MAS) (optional/reference)
2. **Bridge**: Configure the WebSocket-to-gRPC bridge for Unity connectivity
3. **Deploy**: Generate production SLURM scripts for HiPerGator

### 1.2 Introduction Diagram
![Introduction](./images/notebook_2_-_section_1.png)

Introduction: This section sets the objectives for packaging and deploying the backend. Containers are covered for completeness, but the recommended approach for HiPerGator/PubApps is conda environments (see `environment_backend.yml`), with Podman containers used only for Riva speech services.

## 2.0 Containerization (Docker -> Apptainer)
We develop with Docker/Podman and deploy with Apptainer on HPC.


![notebook 2 - section 2.png](images/notebook_2_-_section_2.png)


Container Build Strategy: This flow shows the Multi-Stage Build strategy used to create secure and small containers. A "Builder" stage installs dependencies from `requirements.txt` using `pip`, and then only the necessary artifacts are copied over to a slim "Runtime" stage. This excludes compiler tools and cache files from the final production image.

### 2.1 Dockerfile Definition

This script creates a `Dockerfile.mas` for the Multi-Agent System. We use a multi-stage build strategy:
1. **Builder Stage**: Installs dependencies from `requirements.txt` using `pip`.
2. **Runtime Stage**: Copies only the installed packages to a lightweight `python:3.11-slim` image. This minimizes the container size and attack surface.

Two files are created on disk — `requirements.txt` and `Dockerfile.mas` — the building blocks for packaging the SPARC-P backend into a portable container.

- `requirements.txt` lists every Python library the backend needs (FastAPI for the web server, bitsandbytes for quantized AI models, Presidio for PII scrubbing, Riva client for speech, etc.) so they can all be installed at once inside the container.
- `Dockerfile.mas` is a recipe that tells the container engine exactly how to build the backend image. It uses a **two-stage build**: the first stage (builder) installs all the heavy build tools and packages; the second stage (runtime) copies only the final installed packages into a much smaller, clean image — keeping the deployed container lean and secure.
- When you run `create_requirements_file()` and `create_dockerfile()` at the bottom, both files are written to the current directory and a confirmation message is printed.
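
To make the two-stage structure concrete, the sketch below scans Dockerfile text for `FROM` instructions and lists the stages in order. The helper `list_stages` is hypothetical (it is not part of the notebook) and deliberately simple: it ignores `FROM` flags such as `--platform`.

```python
import re

# Hypothetical helper: matches "FROM <image>" with an optional "AS <name>".
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)(?:\s+AS\s+(\S+))?\s*$", re.IGNORECASE)

def list_stages(dockerfile_text):
    """Return (base_image, stage_name) pairs in order; stage_name is None if unnamed."""
    stages = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match:
            stages.append((match.group(1), match.group(2)))
    return stages

# A trimmed-down example with the same shape as Dockerfile.mas.
example = """
FROM python:3.11-slim AS builder
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.11-slim
COPY --from=builder /app /app
"""

for base_image, stage_name in list_stages(example):
    print(base_image, stage_name)
```

The last stage listed is the one that becomes the final image, so anything not copied into it with `COPY --from=builder` is left behind.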

> **Note:** For HiPerGator and PubApps deployments, the preferred approach is conda environments (see `environment_backend.yml`). This Dockerfile is primarily for local development or situations where containers are explicitly required.

In [None]:
# 2.1 Dockerfile for Multi-Agent System (MAS)
# NOTE: For HiPerGator training, use conda environments instead (see environment_backend.yml)
# This is primarily for local development or when containers are explicitly required

import os

def create_requirements_file():
    """Writes canonical pip dependency artifact used by Dockerfile.mas."""
    requirements = """
fastapi
uvicorn[standard]
pydantic>=2.5.0
numpy>=1.24.0
aiofiles
websockets
python-multipart
transformers>=4.36.0
accelerate>=0.25.0
tokenizers>=0.15.0
bitsandbytes>=0.41.0
peft>=0.7.0
langchain>=0.1.0
langchain-community>=0.0.13
langchain-openai>=0.0.5
langchain-chroma>=0.1.0
langgraph>=0.0.26
nvidia-riva-client>=2.14.0
nemoguardrails>=0.5.0
chromadb>=0.4.22
presidio-analyzer>=2.2.33
presidio-anonymizer>=2.2.33
firebase-admin>=6.2.0
python-jose[cryptography]
python-dotenv
grpcio
grpcio-tools
""".strip()
    with open("requirements.txt", "w", encoding="utf-8") as f:
        f.write(requirements + "\n")
    print("Created requirements.txt")

def create_dockerfile():
    if not os.path.exists("requirements.txt"):
        raise FileNotFoundError("requirements.txt not found. Run create_requirements_file() first.")

    dockerfile_content = """
# --- Build Stage ---
FROM python:3.11-slim AS builder
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \\
    build-essential \\
    curl \\
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# --- Runtime Stage ---
FROM python:3.11-slim
WORKDIR /app

# Install runtime dependencies only
RUN apt-get update && apt-get install -y --no-install-recommends \\
    curl \\
    && rm -rf /var/lib/apt/lists/*

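COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
# Console scripts such as uvicorn are installed under /usr/local/bin
COPY --from=builder /usr/local/bin /usr/local/bin
COPY --from=builder /app /app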

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
    """
    with open("Dockerfile.mas", "w", encoding="utf-8") as f:
        f.write(dockerfile_content.strip())
    print("Created Dockerfile.mas")
    print("\nFor HiPerGator/PubApps deployment, conda environments are preferred.")
    print("See environment_backend.yml and setup_conda_env.sh")

create_requirements_file()
create_dockerfile()

## 3.0 Local Development with Podman
Podman allows creating a 'pod' to simulate the production network namespace.


![notebook 2 - section 3.png](images/notebook_2_-_section_3.png)

Local Development Pod (Podman): This illustrates the local development environment using Podman pods. Standalone containers each get their own network namespace by default, whereas containers in a pod share one (localhost). The Riva Server, WebSocket Bridge, and MAS (Multi-Agent System) can therefore reach each other over localhost, closely mirroring the production environment on a developer's machine.


### 3.1 Podman Local Workflow

For local development, **Podman** is preferred over Docker because it allows us to create a **Pod**. A Pod shares a network namespace (localhost), allowing the separate containers (Riva, Bridge, MAS) to communicate with each other as if they were running on the same machine, mimicking the production environment.

A ready-to-use sequence of shell commands for spinning up all three SPARC-P services locally using Podman is printed below. Nothing is executed automatically — copy and paste these commands into your local terminal.

Step by step:
1. `podman pod create`: creates a shared network sandbox named `sparc-backend`, with port 8080 forwarded so you can reach it from your browser.
2. `podman run ... riva-server`: starts the NVIDIA Riva speech AI engine (ASR + TTS) inside the pod.
3. `podman run ... ws-bridge`: starts the WebSocket bridge, which relays audio between the browser and Riva, using `localhost:50051` as the Riva address (works because everything is in the same pod).
4. `podman run ... mas-server`: starts the Multi-Agent System (the AI orchestration layer) on port 8000. Only port 8080 is published by the pod, so port 8000 is reachable from the other containers in the pod but not from the host unless you also pass `-p 8000:8000` to `podman pod create`.
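
The same four steps can be sketched as argument lists, which are easier to inspect or pass to `subprocess.run` than a pasted string. The helper `build_pod_commands` is hypothetical; the image names and flags are copied from the reference commands in this section.

```python
import shlex

def build_pod_commands(
    pod="sparc-backend",
    riva_image="nvcr.io/nvidia/riva/riva-speech:2.16.0-server",
    bridge_image="riva-websocket-bridge:latest",
    mas_image="your-repo/mas-server:latest",
):
    """Return the pod-creation and container-start commands as argv lists."""
    run = ["podman", "run", "-d", "--pod", pod]
    return [
        # 1. Pod with port 8080 published to the host
        ["podman", "pod", "create", "--name", pod, "-p", "8080:8080"],
        # 2. Riva speech server
        run + ["--name", "riva-server", riva_image],
        # 3. WebSocket bridge, pointed at Riva over the pod's shared localhost
        run + ["--name", "ws-bridge", "-e", "RIVA_API_URL=localhost:50051", bridge_image],
        # 4. Multi-Agent System server
        run + ["--name", "mas-server", mas_image],
    ]

# Print shell-safe command lines; nothing is executed here.
for argv in build_pod_commands():
    print(shlex.join(argv))
```

To actually run a command from Python, each list could be passed to `subprocess.run(argv, check=True)` on a machine with Podman installed.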

> **Tip:** After running these commands, open your browser to `http://localhost:8080` to interact with the system locally before deploying to HiPerGator.

In [None]:
# 3.1 Podman Workflow (Reference Commands)
# Run these in your local terminal to test interaction between Riva, Bridge, and MAS.

podman_commands = """
# 1. Create Pod
podman pod create --name sparc-backend -p 8080:8080

# 2. Run Riva Server
podman run -d --pod sparc-backend --name riva-server nvcr.io/nvidia/riva/riva-speech:2.16.0-server

# 3. Run WebSocket Bridge
podman run -d --pod sparc-backend --name ws-bridge \\
    -e RIVA_API_URL=localhost:50051 \\
    riva-websocket-bridge:latest

# 4. Run MAS Server
podman run -d --pod sparc-backend --name mas-server your-repo/mas-server:latest
"""
print(podman_commands)

## 4.0 Production Deployment on HiPerGator
Deploying persistent services using SLURM and Apptainer.


![notebook 2 - section 4.png](images/notebook_2_-_section_4.png)

Production Deployment (SLURM): This diagram shows the execution flow of the sparc_production.slurm script on HiPerGator. It details how the SLURM scheduler allocates resources (GPUs) and then launches three concurrent Apptainer containers in the background, keeping them alive with a wait command.

### 4.1 Building SIF Images

HiPerGator uses Apptainer, which requires Singularity Image Format (`.sif`) files. The commands below (commented out) show how to convert your local Docker images into SIF files using `apptainer build`. These files should be stored in the `/blue` directory.

The cell below is a placeholder reminder: it prints an instruction message but does not build anything. The commented-out lines (starting with `#`) show the Apptainer commands you would run in a terminal to convert your Docker images into `.sif` files.

Why this step is needed: HiPerGator's production compute nodes use **Apptainer** (formerly Singularity) instead of Docker or Podman, and Apptainer runs images in Singularity Image Format (`.sif`). With a `docker-daemon://` source, `apptainer build` reads the image from a Docker daemon running on the same machine and writes a portable `.sif` file that can be stored in your `/blue` project directory and run on any HiPerGator node.

> **To actually use this:** Uncomment the two `apptainer build` lines, load the Apptainer module (`module load apptainer`), and run them in a terminal rather than in this notebook. A `docker-daemon://` source only works on a machine whose Docker daemon holds the images; when building on a HiPerGator login node, push the images to a registry and use a `docker://` source instead.
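
As a sketch of how the build commands are assembled, the snippet below maps each output `.sif` name to its image source and produces the `apptainer build <target> <source>` argument lists. The helper name `apptainer_build_commands` is hypothetical, the sources are the ones in the commented-out commands, and the output directory is the default container path that the production script in section 4.2 expects.

```python
import posixpath
import shlex

# Output SIF file -> image source URI (taken from the commented-out commands).
SIF_SOURCES = {
    "mas_server.sif": "docker-daemon://your-repo/mas-server:latest",
    "websocket_bridge.sif": "docker-daemon://riva-websocket-bridge:latest",
}

def apptainer_build_commands(sources, out_dir="."):
    """Return one 'apptainer build <target.sif> <source>' argv list per image."""
    return [
        ["apptainer", "build", posixpath.join(out_dir, sif_name), uri]
        for sif_name, uri in sources.items()
    ]

# Print the commands only; running them requires Apptainer and access to the source.
for argv in apptainer_build_commands(SIF_SOURCES, "/blue/jasondeanarnold/SPARCP/containers"):
    print(shlex.join(argv))
```

Swapping a source to a `docker://` URI only requires changing the dictionary value; the command shape stays the same.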

In [None]:
# 4.1 Build SIF Images
# !module load apptainer
# !apptainer build mas_server.sif docker-daemon://your-repo/mas-server:latest
# !apptainer build websocket_bridge.sif docker-daemon://riva-websocket-bridge:latest
print("Build SIF images from Docker/Daemon sources before deployment.")

### 4.2 Production Service Launch

This function generates the `sparc_production.slurm` script. This is the critical deployment artifact that runs the system on HiPerGator. Key features:
- **Persistent GPUs**: Requests 3 GPUs (one per task) on the `gpu` partition.
- **Background Processes**: Launches Riva, the Bridge, and the MAS server as background tasks (`&`).
- **Wait Command**: The `wait` instruction keeps the SLURM job alive while the background services run, up to the job's time limit.
- **Policy-Compliant Runtime**: Uses a finite default (`7-00:00:00`) to avoid scheduler rejection; only use `UNLIMITED` if your partition/QoS explicitly allows it.

`sparc_production.slurm` is a SLURM job script that, when submitted to HiPerGator, starts all three SPARC-P services as a persistent long-running job.

Key details of the generated script:
- **Resource request:** 3 tasks × 1 GPU each (so the 3 services can each use a dedicated GPU), 4 CPU cores per task (12 in total) and 96 GB RAM for the node, on the `gpu` partition under the project QoS.
- **7-day time limit:** The job runs for up to 7 days before the scheduler terminates it. Re-submit weekly to keep the service alive.
- **Background launches (`&`):** Each service (Riva, WebSocket bridge, MAS) is started as a background process so they all run concurrently. A 20-second sleep between Riva and the bridge gives Riva time to initialize before downstream services connect.
- **`wait` command:** This keeps the SLURM job alive until all background processes finish (or the time limit is hit). Without it, the job would exit immediately after launching the services.
- **Environment variables:** Paths to the `.sif` files are read from environment variables with safe defaults pointing to `/blue/jasondeanarnold/SPARCP/containers/`.
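
The fixed 20-second sleep is a simple heuristic. One alternative, shown here only as a sketch and not something the generated script does, is to poll Riva's gRPC port until it accepts TCP connections. The helper `wait_for_port` is hypothetical, and an open port only shows that something is listening, not that Riva has finished loading its models.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError if nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# Demo against a throwaway local listener so the sketch runs anywhere.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
demo_port = server.getsockname()[1]
print(wait_for_port("127.0.0.1", demo_port, timeout=5.0))
server.close()
```

Against a real deployment the call would be `wait_for_port("localhost", 50051)`, using the Riva address that the bridge is configured with.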

> **To deploy:** Transfer `sparc_production.slurm` to HiPerGator and submit with `sbatch sparc_production.slurm`.
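
Before submitting, it can help to see which `.sif` paths the script will resolve. The sketch below mirrors the script's `${VAR:-default}` expansions in Python, where an unset or empty variable falls back to the default. The helper `resolve_sif_paths` and the override values in the demo are hypothetical.

```python
import os

DEFAULT_BASE = "/blue/jasondeanarnold/SPARCP"

# Environment variable -> default file name under $SPARC_BASE_PATH/containers
SIF_VARS = {
    "SPARC_RIVA_SIF": "riva_server.sif",
    "SPARC_BRIDGE_SIF": "websocket_bridge.sif",
    "SPARC_MAS_SIF": "mas_server.sif",
}

def resolve_sif_paths(env=None):
    """Mimic ${VAR:-default}: an unset or empty variable falls back to the default."""
    env = os.environ if env is None else env
    base = env.get("SPARC_BASE_PATH") or DEFAULT_BASE
    return {
        var: env.get(var) or f"{base}/containers/{name}"
        for var, name in SIF_VARS.items()
    }

# Hypothetical overrides, for illustration only.
demo_env = {"SPARC_BASE_PATH": "/blue/mygroup/sparc", "SPARC_MAS_SIF": "/tmp/mas.sif"}
for var, path in resolve_sif_paths(demo_env).items():
    print(f"{var} -> {path} (exists: {os.path.exists(path)})")
```

Calling `resolve_sif_paths()` with no argument reads the current environment, which makes it usable as a quick preflight check on HiPerGator before running `sbatch`.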

In [None]:
# 4.2 Production Deployment Script (Persistent Service)
# Note: Containers are optional; conda-based deployment is preferred for HiPerGator/PubApps

def generate_production_script():
    script_content = """#!/bin/bash
#SBATCH --job-name=sparcp-service
#SBATCH --partition=gpu
#SBATCH --qos=jasondeanarnold-b
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=96gb
#SBATCH --time=7-00:00:00
#SBATCH --output=sparc_service_%j.log

module purge
module load apptainer

SPARC_BASE_PATH=${SPARC_BASE_PATH:-/blue/jasondeanarnold/SPARCP}
SPARC_BIND_ROOT=${SPARC_BIND_ROOT:-/blue}
RIVA_SIF=${SPARC_RIVA_SIF:-$SPARC_BASE_PATH/containers/riva_server.sif}
BRIDGE_SIF=${SPARC_BRIDGE_SIF:-$SPARC_BASE_PATH/containers/websocket_bridge.sif}
MAS_SIF=${SPARC_MAS_SIF:-$SPARC_BASE_PATH/containers/mas_server.sif}

# Launch Services in Background
echo "Starting Riva..."
apptainer exec --nv ${RIVA_SIF} riva_start.sh &
sleep 20

echo "Starting Bridge..."
apptainer exec ${BRIDGE_SIF} riva-websocket-gateway --riva-uri=localhost:50051 --port=8080 &

echo "Starting MAS..."
apptainer exec --nv -B $SPARC_BIND_ROOT ${MAS_SIF} uvicorn main:app --host 0.0.0.0 --port 8000 &

wait
"""
    with open("sparc_production.slurm", "w", encoding="utf-8") as f:
        f.write(script_content.strip() + "\n")
    print("Generated sparc_production.slurm")
    print("\nNote: For most use cases, prefer conda-based deployment from Notebook 3")
    print("Policy note: Use UNLIMITED only when explicitly allowed by your partition/QoS")

generate_production_script()