# 🐳 Week 11-12 · Notebook 03 · Docker Containerization for the Manufacturing Copilot

This notebook details how to package our FastAPI application, agent dependencies, and model weights into a secure, reproducible, and optimized Docker container, ready for deployment anywhere.


## 🎯 Learning Objectives

- **Author a Production-Grade Dockerfile:** Write a multi-stage `Dockerfile` that creates a small, secure, and efficient container image.
- **Implement Security Best Practices:** Learn to run containers as a non-root user, minimize the attack surface, and scan for vulnerabilities.
- **Manage Configuration Flexibly:** Use build arguments and environment variables to create a single container image that can be configured for different plants or environments at runtime.
- **Ensure Supply Chain Security:** Generate a Software Bill of Materials (SBOM) to create a transparent inventory of all dependencies in our container.


## 🧩 Scenario: A "Build Once, Deploy Anywhere" Container

The Manufacturing Copilot needs to be deployed across a hybrid environment: some plants will use on-premises edge servers (with limited resources), while others will deploy to GCP Cloud Run. Our container strategy must support this.

**Requirements:**
1.  **Small Image Size:** The final container image must be as small as possible (e.g., < 1.5 GB) to reduce storage costs and deployment times.
2.  **Security:** The container must run as a non-root user and pass regular security scans for known vulnerabilities (CVEs).
3.  **Configurability:** Plant-specific settings (like language translation flags or database endpoints) must be configurable at runtime via environment variables, not by rebuilding the image.


### 1. Authoring a Production-Grade Dockerfile

A `Dockerfile` is a script that contains instructions for building a Docker image. For production, we use a **multi-stage build** to create a final image that is small, secure, and contains only what's necessary to run the application.

**Key Principles:**
- **Stage 1 (Builder):** This stage installs dependencies, including build-time tools. It's where we compile code or download packages.
- **Stage 2 (Final):** This stage starts from a clean, minimal base image and copies *only the necessary artifacts* (like the compiled code and the Python virtual environment) from the builder stage. This technique dramatically reduces the final image size and attack surface.

Let's create our `Dockerfile`. We'll use `%%writefile` to create the file in the `app` directory we made in the previous notebook.

In [None]:
%%writefile app/Dockerfile

# --- STAGE 1: Builder ---
# This stage installs all Python dependencies, including compile-time-only ones.
# We use a specific version of Python for reproducibility.
FROM python:3.10-slim AS builder

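# Environment settings for the build stage:
# - PYTHONUNBUFFERED / PYTHONDONTWRITEBYTECODE: unbuffered output, no .pyc files.
# - PIP_NO_CACHE_DIR=1: disable pip's cache (with current pip versions,
#   the value `off` would leave the cache enabled).
# - POETRY_VERSION: pin the Poetry release for reproducible builds.
ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=on \
    PIP_DEFAULT_TIMEOUT=100 \
    POETRY_VERSION=1.5.1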

# Install poetry for dependency management
RUN pip install "poetry==$POETRY_VERSION"

# Set working directory
WORKDIR /app

# Copy only the files needed to install dependencies.
# This leverages Docker's layer caching. The layer will only be rebuilt if
# these files change.
COPY poetry.lock pyproject.toml ./

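# Install dependencies into a virtual environment.
# virtualenvs.in-project: Create the .venv folder in the project directory.
# --no-root: Install only the dependencies, not the project package itself
#            (the application source has not been copied into this stage).
# --no-dev: Don't install development dependencies (like pytest). Poetry 1.2+
#           deprecates this flag in favor of `--only main`; it still works in 1.5.1.
RUN poetry config virtualenvs.in-project true && \
    poetry install --no-interaction --no-ansi --no-dev --no-root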

# --- STAGE 2: Final Image ---
# This stage creates the final, lightweight, and secure image.
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Create a non-root user to run the application for security reasons.
RUN useradd --create-home appuser
USER appuser

# Copy the virtual environment from the builder stage.
COPY --from=builder /app/.venv ./.venv
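# Copy the application source code. This copies the whole build context, so use
# a .dockerignore file to keep local .venv folders, .git, and secrets out of the image.
COPY . .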

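# Make the virtual environment's binaries accessible in the PATH.
# ENV values set in the builder stage do not carry over to this stage,
# so the Python runtime flags are set again here.
ENV PATH="/app/.venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1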

# --- Configuration via Environment Variables ---
# These can be overridden at runtime (e.g., in docker-compose or Cloud Run).
ENV APP_PORT=8000
ENV LOG_LEVEL="INFO"

# Expose the port the app will run on.
EXPOSE 8000

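# --- Entrypoint ---
# The command to run when the container starts.
# Gunicorn acts as the process manager and runs four Uvicorn (ASGI) workers.
# The `sh -c` wrapper is needed so that ${APP_PORT} is expanded at runtime;
# `exec` replaces the shell so Gunicorn receives stop signals directly.
CMD ["sh", "-c", "exec gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:${APP_PORT}"]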

### 🔍 Analysis of the Multi-Stage Dockerfile

This `Dockerfile` is designed for both efficiency and security.

-   **Stage 1 (`builder`):**
    -   It starts from a slim Python base image and sets environment variables that tune Python and pip for container builds (for example, unbuffered output and no pip version check) and pin the Poetry version.
    -   It installs `poetry`, a modern dependency management tool.
    -   It copies *only* `pyproject.toml` and `poetry.lock`. This is a key optimization. Docker caches layers, and this step will only be re-run if the dependencies change, speeding up subsequent builds.
    -   It installs dependencies into a virtual environment inside the project directory (`.venv`). We exclude development dependencies (`--no-dev`) to keep the image lean.

-   **Stage 2 (Final Image):**
    -   It also starts from a slim Python image, ensuring no unnecessary OS packages are included.
    -   **Security:** It creates a dedicated, non-root user named `appuser` and switches to it. This is a critical security practice to limit the container's privileges and reduce the "blast radius" if the application is compromised.
    -   **Efficiency:** It copies *only* the virtual environment from the `builder` stage and the application source code. It does *not* copy the build tools (like `poetry` itself), source packages, or cache, resulting in a much smaller final image.
    -   **Configurability:** It defines `ENV` variables for runtime configuration (like `APP_PORT`). These act as defaults but can be easily overridden when the container is run.
    -   **Production-Ready Entrypoint:** It uses `gunicorn` with `uvicorn` workers. `gunicorn` is a mature, robust process manager that can handle multiple concurrent requests by managing several `uvicorn` worker processes, making it ideal for production workloads.
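
The configurability point above can also be illustrated from the application side. The following is a minimal sketch, assuming the app reads `APP_PORT` and `LOG_LEVEL` (the defaults declared in the Dockerfile) and an optional `DATABASE_URL`; the `Settings` class and `load_settings` helper are hypothetical names for illustration and are not part of the app code from the previous notebook.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Runtime settings resolved from environment variables (hypothetical helper)."""

    app_port: int
    log_level: str
    database_url: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), falling back to the image defaults."""
    env = os.environ if env is None else env
    port = int(env.get("APP_PORT", "8000"))
    if not 1 <= port <= 65535:
        raise ValueError(f"APP_PORT out of range: {port}")
    return Settings(
        app_port=port,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        database_url=env.get("DATABASE_URL"),
    )


# Same code, different configuration: only the environment changes.
defaults = load_settings({})
local_dev = load_settings({"LOG_LEVEL": "debug", "APP_PORT": "9000"})
print(defaults)
print(local_dev)
```

Passing the environment in as a mapping keeps the helper easy to test; in the running container it would simply be called as `load_settings()`.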

### 2. Local Development with Docker Compose

Docker Compose (`docker-compose`, or `docker compose` with Compose V2) is a tool for defining and running multi-container Docker applications. For our project, it simplifies local development by automating the process of building the image, running the container, and mapping ports.

Let's create a `docker-compose.yml` file.

In [None]:
%%writefile app/docker-compose.yml

version: '3.8'

services:
  copilot_api:
    build:
      context: .
      dockerfile: Dockerfile
    image: manufacturing-copilot-api:latest
    container_name: manufacturing_copilot_api
    ports:
      - "8000:8000"
    environment:
      # Override environment variables for local testing
      - LOG_LEVEL=DEBUG
      # Placeholder credentials for local testing only. Real secrets should be
      # injected at runtime (e.g., from a secret manager), not hardcoded here.
      - DATABASE_URL=postgresql://user:password@host.docker.internal:5432/mydb
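    volumes:
      # Mount local source code so edits are visible inside the container
      # after a restart. This is for development only and should NOT be used
      # in production.
      - .:/app
      # Keep the image's virtual environment. Without this anonymous volume, the
      # bind mount above hides /app/.venv and gunicorn cannot be found.
      - /app/.venv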


## 🛡️ Security, Compliance, and Software Supply Chain

Containerizing an application is not just about packaging; it's also about securing the **software supply chain**. We need to know exactly what's inside our container and ensure it's free from known vulnerabilities.

| Control                       | Implementation                                                                                             | Evidence & Artifacts                                                              |
| ----------------------------- | ---------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| **Vulnerability Scanning**    | Integrate a scanner like **Trivy** into the CI/CD pipeline to check the OS packages and Python libraries for known vulnerabilities (CVEs). | A passing Trivy scan report (JSON or table format). The pipeline should fail if high-severity CVEs are found. |
| **Software Bill of Materials (SBOM)** | Generate an SBOM during the build process. This is a detailed inventory of all software components, their versions, and their licenses. | An SBOM file in a standard format like CycloneDX or SPDX, archived with the release artifacts. |
| **Static Code Analysis**      | Use tools like `bandit` to scan Python code for common security issues (e.g., hardcoded passwords, SQL injection vulnerabilities). | A passing `bandit` report. The CI pipeline can be configured to fail if issues are found. |
| **Base Image Management**     | Use minimal, trusted base images (like `python:3.10-slim`) from official sources and regularly update them to patch underlying vulnerabilities. | Dockerfile history, base image version pinning (e.g., `python:3.10.12-slim`). |
| **Secrets Management**        | Never hardcode secrets (API keys, passwords) in the Docker image. Load them at runtime from a secure vault (like GCP Secret Manager or HashiCorp Vault). | Code review checklists confirming no secrets are in the repository. Configuration files that point to secret manager paths. |
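
To make the "pipeline should fail" evidence in the vulnerability scanning row concrete, here is a minimal sketch of a gate over a Trivy JSON report. It assumes the report was produced with `trivy image --format json` and has a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field; verify that layout against the Trivy version you use. The function names and the sample report are illustrative, not real scan output.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_severity(report: Dict[str, Any]) -> Counter:
    """Count findings per severity across all scan targets in the report."""
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        # Targets without findings may omit the "Vulnerabilities" key entirely.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_fail(counts: Counter, blocking: Iterable[str] = ("HIGH", "CRITICAL")) -> bool:
    """Return True if any blocking severity has at least one finding."""
    return any(counts[severity] > 0 for severity in blocking)


# Illustrative report fragment with placeholder CVE IDs.
sample_report = json.loads("""
{
  "Results": [
    {"Target": "os-packages", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
    ]},
    {"Target": "python-packages"}
  ]
}
""")

counts = count_by_severity(sample_report)
print(dict(counts))
print("fail pipeline:", should_fail(counts))
```

In practice the `--exit-code 1` flag already covers the simple case; a script like this is only useful when the pipeline needs custom reporting or thresholds.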

In [None]:
# --- Example CI Commands for Security Scans ---

# These commands would typically be run in a CI/CD pipeline (e.g., GitHub Actions)
# after the application has been built.

# 1. Build the Docker image first
# The `-t` (`--tag`) flag gives our image a human-readable name.
build_command = "docker build -t manufacturing-copilot-api:latest -f app/Dockerfile app/"
print(f"Build Command:\n{build_command}\n")


# 2. Run Trivy for vulnerability scanning
# This command scans the image for HIGH and CRITICAL severity vulnerabilities.
# `--exit-code 1` ensures that the command will fail (exit with a non-zero code)
# if any such vulnerabilities are found, which will stop the CI pipeline.
trivy_command = "trivy image --severity HIGH,CRITICAL --exit-code 1 manufacturing-copilot-api:latest"
print(f"Vulnerability Scan Command:\n{trivy_command}\n")


# 3. Generate a Software Bill of Materials (SBOM)
# This command generates a detailed list of all OS and Python packages in the container.
# The output is saved in the CycloneDX JSON format, a common standard for SBOMs.
# This artifact is crucial for compliance and for quickly identifying affected systems
# if a new vulnerability is discovered in a dependency.
sbom_command = "trivy image --format cyclonedx --output sbom.json manufacturing-copilot-api:latest"
print(f"SBOM Generation Command:\n{sbom_command}\n")

# 4. Run Bandit for static code analysis
# This command scans the `app` directory for common security issues in Python code.
# `-r` means recursive, and `-ll` reports only issues of medium severity or higher.
bandit_command = "bandit -r app/ -ll"
print(f"Static Analysis Command:\n{bandit_command}\n")
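
The cell above only prints the commands. As a rough sketch of how a CI job behaves, the helper below runs commands in order and stops at the first non-zero exit code. The `run_pipeline` name is hypothetical, and the demo uses harmless Python one-liners instead of Docker or Trivy so that it runs anywhere.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_pipeline(commands: Sequence[str]) -> List[Tuple[str, int]]:
    """Run each command in order; stop at the first failure, like a CI job."""
    results: List[Tuple[str, int]] = []
    for command in commands:
        try:
            returncode = subprocess.run(shlex.split(command), check=False).returncode
        except FileNotFoundError:
            # The tool is not installed; 127 mirrors the shell's "command not found".
            returncode = 127
        results.append((command, returncode))
        if returncode != 0:
            break
    return results


# Demo with stand-in commands: the second one fails, so the third never runs.
py = shlex.quote(sys.executable)
demo_commands = [
    f"{py} -c 'raise SystemExit(0)'",
    f"{py} -c 'raise SystemExit(3)'",
    f"{py} -c 'raise SystemExit(0)'",
]
demo_results = run_pipeline(demo_commands)
print([code for _, code in demo_results])
```

With Docker, Trivy, and Bandit installed, the same helper could be called as `run_pipeline([build_command, trivy_command, sbom_command, bandit_command])` using the variables defined in the cell above.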

## 🧪 Lab Assignment: Build and Secure Your Container

1.  **Install Dependencies:**
    -   Make sure you have **Docker Desktop** installed and running on your machine.
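    -   Install the security scanning tools. Bandit is a Python package: `pip install bandit`. Trivy is a standalone binary rather than a pip package; install it with your OS package manager (for example, `brew install trivy` on macOS) or by following the Trivy documentation.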

2.  **Build and Run the Container:**
    -   Navigate to the `app` directory in your terminal.
    -   Run the command: `docker-compose up --build`. This will:
        -   Build the Docker image using the `Dockerfile`.
        -   Start a container based on that image.
        -   Map port 8000 on your local machine to port 8000 in the container.
    -   Open your browser to [http://localhost:8000/docs](http://localhost:8000/docs). You should see the live FastAPI Swagger UI being served from within your container.

3.  **Run a Smoke Test:**
    -   While the container is running, open another terminal.
    -   Use `curl` to test the `/health` endpoint: `curl http://localhost:8000/health`. You should see `{"status":"ok"}`.
    -   Use `curl` to test the `/v1/diagnose` endpoint. Remember to include the required `X-Auth-Token` header and a valid JSON payload.
        ```bash
        curl -X POST "http://localhost:8000/v1/diagnose" \
        -H "Content-Type: application/json" \
        -H "X-Auth-Token: Bearer technician-123" \
        -d '{
          "plant_id": "MEX-GTO",
          "equipment_id": "ROBOT-ARM-007",
          "problem_description": "The arm is failing to grip parts consistently."
        }'
        ```

4.  **Scan Your Image for Vulnerabilities:**
    -   In your terminal (with the container stopped), run the Trivy scan command from the cell above on the `manufacturing-copilot-api:latest` image you just built.
        ```bash
        trivy image --severity HIGH,CRITICAL --exit-code 0 manufacturing-copilot-api:latest
        ```
        *(Note: We use `--exit-code 0` for the lab so it doesn't stop even if vulnerabilities are found. In a real CI pipeline, you would use `--exit-code 1`.)*
    -   Analyze the output. Does Trivy find any vulnerabilities in the base OS packages or Python libraries?

5.  **Generate an SBOM:**
    -   Run the SBOM generation command from the cell above.
    -   Inspect the `sbom.json` file that is created. Can you find `fastapi`, `gunicorn`, and `pydantic` in the list of components? This file is your container's "list of ingredients."
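
For step 5, the SBOM can also be inspected programmatically. This is a minimal sketch, assuming the CycloneDX JSON layout with a top-level `components` list whose entries have `name` and `version` fields; `find_components` is a hypothetical helper, and the sample SBOM uses placeholder versions rather than real scan output.

```python
import json
from typing import Any, Dict, Iterable


def find_components(sbom: Dict[str, Any], names: Iterable[str]) -> Dict[str, str]:
    """Return {name: version} for each requested package found in the SBOM."""
    wanted = {name.lower() for name in names}
    found: Dict[str, str] = {}
    for component in sbom.get("components") or []:
        name = str(component.get("name", "")).lower()
        if name in wanted:
            found[name] = component.get("version", "unknown")
    return found


# Placeholder SBOM fragment. For the real file, load it with:
#   with open("sbom.json") as f:
#       sbom = json.load(f)
sample_sbom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"type": "library", "name": "fastapi", "version": "0.0.0"},
        {"type": "library", "name": "gunicorn", "version": "0.0.0"},
    ],
}

expected = ["fastapi", "gunicorn", "pydantic"]
found = find_components(sample_sbom, expected)
missing = sorted(set(expected) - set(found))
print("found:", found)
print("missing:", missing)
```

A non-empty `missing` list against your real `sbom.json` would suggest that a dependency was not installed into the image's virtual environment.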

## ✅ Checklist for this Notebook

- [X] A multi-stage `Dockerfile` is created to produce a small and efficient final image.
- [X] The container is configured to run as a non-root user for enhanced security.
- [X] A `docker-compose.yml` file is set up for easy local development and testing.
- [X] Commands for vulnerability scanning (`Trivy`) and SBOM generation are defined for CI/CD integration.
- [ ] **TODO:** Complete the Lab Assignment to build, test, and scan your container image.


## 📚 References and Further Reading

-   [Docker Docs: Multi-stage builds](https://docs.docker.com/build/building/multi-stage/) - The official guide to creating optimized, multi-stage builds.
-   [FastAPI in Containers - Official Tutorial](https://fastapi.tiangolo.com/deployment/docker/) - The FastAPI guide to building and running the app in Docker.
-   [Aqua Trivy Documentation](https://aquasecurity.github.io/trivy/) - Comprehensive guide for the vulnerability scanner.
-   [CycloneDX SBOM Standard](https://cyclonedx.org/) - Learn more about the Software Bill of Materials format.
-   [OWASP Docker Security Cheat Sheet](https://cheatsheetseries.owasp.org/Docker_Security_Cheat_Sheet.html) - Excellent tips for securing your containers.
