# Chapter 10: Docker Security Best Practices

Security in containerized environments requires a defense-in-depth approach. While Docker provides isolation mechanisms, containers share the host kernel, making security misconfigurations potentially catastrophic. This chapter covers industry-standard practices endorsed by the Center for Internet Security (CIS), NIST, and the Open Web Application Security Project (OWASP) to secure your containers throughout their lifecycle.

## 10.1 Running as Non-Root User

By default, Docker containers run as the `root` user (UID 0). This is dangerous because if an attacker compromises the application, they gain root privileges inside the container, and potentially on the host if container escape vulnerabilities exist.

### Understanding the Risk

When a container runs as root:
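- **Container Escape**: A kernel or runtime exploit that breaks out of the container lands on the host as root (unless user namespace remapping is enabled)
- **Filesystem Access**: Unrestricted access to the container filesystem and to any writable bind-mounted host paths
- **Process Visibility**: Ability to see and signal every process in the container, and host processes as well when the host PID namespace is shared (`--pid=host`)
- **Resource Exhaustion**: No limits on resource consumption unless explicitly set (this applies to any user; see 10.7)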

### Implementation Strategy

#### Step 1: Create a Dedicated User in Dockerfile

```dockerfile
# Bad Practice - Running as root (default)
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]

# Good Practice - Running as non-root
FROM node:18

# Create a non-root user and group
RUN groupadd -r appgroup && useradd -r -g appgroup appuser

# Set working directory (WORKDIR creates it owned by root)
WORKDIR /app

# Copy files owned by the non-root user
COPY --chown=appuser:appgroup . .

# Install dependencies while still root; node_modules stays root-owned,
# so the application can read it but not modify it
RUN npm install

# Switch to non-root user
USER appuser

# Expose port (doesn't require privileges)
EXPOSE 3000

CMD ["node", "server.js"]
```

#### Step 2: Using Numeric UID/GID

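Prefer numeric IDs over usernames in the final `USER` instruction. A numeric UID works even when the image has no `/etc/passwd` entry (as in `scratch`), and orchestrators such as Kubernetes can only verify a `runAsNonRoot` policy when the image user is numeric: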

```dockerfile
# Better approach - Using numeric IDs
FROM alpine:latest

# Create user with specific UID/GID
RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup

# Verify the user exists
RUN id appuser

USER 1000:1000

CMD ["sh"]
```

**Industry Standard**: Use UID/GID 1000 or higher to avoid conflicts with system users (0-999).

#### Step 3: Handling Permission Requirements

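Some applications assume root only because they bind to a privileged port (below 1024) or write to root-owned paths. Instead of running as root, move the application to an unprivileged port and give the non-root user ownership of the paths it writes to: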

```dockerfile
FROM nginx:alpine

# Create non-root user
RUN adduser -D -u 1000 nginxuser

# Change nginx to listen on unprivileged port (8080 instead of 80)
RUN sed -i 's/listen\s*80;/listen 8080;/g' /etc/nginx/conf.d/default.conf && \
    sed -i 's/user\s*nginx;/user nginxuser;/g' /etc/nginx/nginx.conf

# Adjust permissions for non-root user, including the PID file nginx writes at startup
RUN touch /var/run/nginx.pid && \
    chown -R nginxuser:nginxuser /var/cache/nginx /var/log/nginx /etc/nginx/conf.d /var/run/nginx.pid

USER nginxuser

EXPOSE 8080

CMD ["nginx", "-g", "daemon off;"]
```

### Runtime Enforcement

Even if the Dockerfile specifies a user, enforce it at runtime:

```bash
# Explicitly specify user at runtime
docker run --user 1000:1000 -d myimage

# Prevent privilege escalation
docker run --security-opt=no-new-privileges:true -d myimage
```

**Best Practice**: Combine `USER` instruction in Dockerfile with `--user` flag at runtime for defense in depth.
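
A lightweight CI check can catch images that never switch away from root. The sketch below is a simplified, hypothetical helper (`final_user` and `runs_as_root` are illustrative names, not part of Docker). It reads Dockerfile text and reports the last `USER` set in the final build stage. It assumes the base image itself runs as root, which is not true of every image, so treat it as a first-pass lint rather than proof.

```python
def final_user(dockerfile_text):
    """Return the USER in effect at the end of the last build stage.

    Assumption: every base image starts as root, so each FROM resets the user.
    """
    user = "root"
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = "root"  # a new stage does not inherit USER from the previous one
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text):
    """True if the final stage ends as root (by name or UID 0)."""
    name = final_user(dockerfile_text).split(":", 1)[0]
    return name in ("root", "0")


if __name__ == "__main__":
    sample = "FROM node:18\nWORKDIR /app\nUSER 1000:1000\nCMD [\"node\", \"server.js\"]\n"
    print("runs as root:", runs_as_root(sample))
```

A check like this complements, but does not replace, the runtime `--user` flag, because the flag can still override whatever the image declares.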

## 10.2 Minimal Base Images

The attack surface of a container grows with the amount of code it contains. Traditional base images like Ubuntu or Debian include many packages you don't need, each potentially containing vulnerabilities.

### Image Strategy Hierarchy

1. **Scratch**: Empty image (ideal for static binaries)
2. **Distroless**: Contains only runtime dependencies (Google's approach)
3. **Alpine**: Minimal Linux distribution (~5MB)
4. **Slim variants**: Debian/Ubuntu minimal versions
5. **Full distributions**: Only when necessary

### Using Distroless Images

Google's distroless images contain only your application and its runtime dependencies:

```dockerfile
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# Production stage - Distroless
FROM gcr.io/distroless/nodejs18-debian11
WORKDIR /app

# Copy from builder with specific user permissions
COPY --from=builder --chown=nonroot:nonroot /app/node_modules ./node_modules
COPY --from=builder --chown=nonroot:nonroot /app/package*.json ./
COPY --chown=nonroot:nonroot . .

USER nonroot

EXPOSE 3000

CMD ["server.js"]
```

**Advantages of Distroless:**
- No shell (`sh`, `bash`) for attackers to access
- No package manager (`apt`, `apk`)
- No unnecessary utilities (`curl`, `wget`, `ssh`)
- Minimal attack surface

### Using Alpine Linux

When you need a package manager but want minimal size:

```dockerfile
FROM python:3.11-alpine

# Install security updates and required build packages; --no-cache avoids
# storing the package index, so no manual cache cleanup is needed
RUN apk upgrade --no-cache && \
    apk add --no-cache gcc musl-dev

# Create non-root user
RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

USER appuser

CMD ["python", "app.py"]
```

**Important**: Alpine uses `musl libc` instead of `glibc`, which can cause compatibility issues with some applications. Test thoroughly.

### Multi-Stage Builds for Security

Separate build tools from runtime:

```dockerfile
# Build stage - Full toolchain
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o app

# Runtime stage - Scratch
FROM scratch
WORKDIR /app

# Copy CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

# Copy binary
COPY --from=builder /src/app .

# Use non-root user (must be numeric in scratch)
USER 1000:1000

EXPOSE 8080

ENTRYPOINT ["./app"]
```

### Image Verification

Always verify minimal images contain only what you need:

```bash
# Check image layers
docker history myimage

# Inspect image contents
docker run --rm -it myimage sh
# (For distroless, use debug tags or dive tool instead)

# Use dive tool for analysis
dive myimage:latest
```

## 10.3 Secrets Management in Containers

Hardcoding secrets (API keys, passwords, tokens) in images is a critical security violation. Secrets in image layers persist even if removed in later layers due to Docker's layered filesystem.

### The Layer Problem

```dockerfile
# WRONG - Secret remains in image metadata and layer history
FROM node:18
WORKDIR /app
ENV API_KEY="supersecret123"
RUN echo $API_KEY > /app/config.txt
RUN rm /app/config.txt  # File removed but still in previous layer
```

Anyone with image access can retrieve this:
```bash
docker history --no-trunc myimage
# Or extract layers
docker save myimage -o image.tar
tar -xf image.tar
```

### Build-Time Secrets (BuildKit)

Use Docker BuildKit to mount secrets that don't persist in layers:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm ci

# Mount secret during build, not persisted in image
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm install private-package

COPY . .
RUN npm run build
```

Build with secret:
```bash
docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .
```

### Runtime Secrets Management

#### Option 1: Environment Variables (Basic)

```dockerfile
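FROM python:3.11-alpine
# Create the non-root user referenced by USER below
RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup
WORKDIR /app
COPY . .
USER appuser
CMD ["python", "app.py"]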
```

Inject at runtime:
```bash
docker run -e DATABASE_PASSWORD="secret" -e API_KEY="key123" myapp
```

**Warning**: Environment variables are visible in `docker inspect` and process lists (`ps e`), are inherited by every child process, and tend to leak into logs and crash reports.

#### Option 2: Docker Secrets (Swarm Mode)

```yaml
version: '3.8'
services:
  web:
    image: myapp
    secrets:
      - db_password
      - api_key
secrets:
  db_password:
    external: true
  api_key:
    external: true
```

Access in container:
```bash
# Mounted at /run/secrets/db_password
cat /run/secrets/db_password
```

#### Option 3: Read-Only Secret Files

Mount secrets as read-only files:

```bash
docker run \
  --mount type=bind,source=/host/secrets/db-password,target=/app/secrets/db-password,readonly \
  --mount type=bind,source=/host/secrets/api-key,target=/app/secrets/api-key,readonly \
  myapp
```

### Application-Level Handling

```python
# Python example - Secure secret handling
import os
from pathlib import Path

def get_secret(secret_name):
    """Read secret from file or env var"""
    # Try Docker secrets location first
    secret_path = f"/run/secrets/{secret_name}"
    if Path(secret_path).exists():
        return Path(secret_path).read_text().strip()
    
    # Fallback to environment (development only)
    return os.environ.get(secret_name)

# Usage
db_password = get_secret("db_password")
```

## 10.4 Scanning for Vulnerabilities

Vulnerability scanning should be integrated into every stage of the CI/CD pipeline (Shift Left Security).

### Image Scanning Tools

#### Trivy (Recommended)

Fast, comprehensive scanner:

```bash
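# Assumes Trivy is already installed (see the Trivy documentation for your platform)
# Scan image
trivy image myapp:latest

# Scan with severity filter
trivy image --severity HIGH,CRITICAL myapp:latest

# Scan project files for vulnerable dependencies
trivy filesystem .

# Scan Dockerfiles and other config files for misconfigurations
trivy config .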
```

Integration in CI/CD (GitHub Actions):
```yaml
name: Security Scan
on: [push]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed to upload SARIF results
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy scanner
        # Pin to a release tag or commit SHA instead of a branch in real pipelines
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'

      - name: Upload results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'
```

#### Clair

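Static analysis engine for container images. The example below uses the third-party `clair-scanner` client, which targets older Clair releases; check which client matches your Clair version: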

```bash
# Using Clair scanner
clair-scanner -c http://clair:6060 --ip scanner-ip myimage:latest
```

#### Snyk

Developer-friendly with fix recommendations:

```bash
# Scan the image; --file lets Snyk relate findings to the Dockerfile and suggest base image upgrades
snyk container test myapp:latest --file=Dockerfile

# Monitor for new vulnerabilities
snyk container monitor myapp:latest
```

### Scanning Strategy

1. **Base Image Scanning**: Scan before building
2. **Build Scanning**: Scan intermediate layers
3. **Registry Scanning**: Continuous monitoring of pushed images
4. **Runtime Scanning**: Scan running containers

### Handling Vulnerabilities

**Severity Classification**:
- **CRITICAL**: Fix immediately, block deployment
- **HIGH**: Fix within 24-48 hours
- **MEDIUM**: Fix within sprint cycle
- **LOW**: Track and assess

```bash
# Fail build on critical vulnerabilities
trivy image --exit-code 1 --severity CRITICAL myapp:latest
```
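
The severity policy above can also be enforced from a saved JSON report, which is useful when a pipeline wants different thresholds per environment. This is a sketch, not Trivy's own API: `count_by_severity` and `should_block` are hypothetical helpers, and the `Results`, `Vulnerabilities`, and `Severity` field names are assumed from the layout that `trivy image --format json` produces, so verify them against the Trivy version you run.

```python
import json


def count_by_severity(report):
    """Count findings per severity across all scan targets in a report dict."""
    counts = {}
    for result in report.get("Results", []):
        # Targets without findings may omit the key or set it to null
        for vuln in result.get("Vulnerabilities") or []:
            severity = vuln.get("Severity", "UNKNOWN")
            counts[severity] = counts.get(severity, 0) + 1
    return counts


def should_block(report, blocking=("CRITICAL",)):
    """True if the report contains at least one finding at a blocking severity."""
    counts = count_by_severity(report)
    return any(counts.get(severity, 0) > 0 for severity in blocking)


if __name__ == "__main__":
    # Hand-written sample in the assumed report layout (IDs are placeholders)
    sample = json.loads("""
    {"Results": [
        {"Target": "myapp:latest", "Vulnerabilities": [
            {"VulnerabilityID": "EXAMPLE-1", "Severity": "HIGH"},
            {"VulnerabilityID": "EXAMPLE-2", "Severity": "CRITICAL"}
        ]},
        {"Target": "package-lock.json", "Vulnerabilities": null}
    ]}
    """)
    print(count_by_severity(sample))
    print("block deployment:", should_block(sample))
```

A stricter environment could call `should_block(report, blocking=("CRITICAL", "HIGH"))` to match the 24-48 hour policy for HIGH findings.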

## 10.5 Docker Content Trust

Docker Content Trust (DCT) uses The Update Framework (TUF) to provide cryptographic verification of image tags, ensuring you deploy exactly what was signed.

### Enabling DCT

```bash
# Enable in current shell
export DOCKER_CONTENT_TRUST=1

# Or permanently
echo "export DOCKER_CONTENT_TRUST=1" >> ~/.bashrc
```

### Signing Images

```bash
# Sign and push a tag (initializes the repository's trust data on first use)
docker trust sign myregistry/myimage:tag

# Alternatively, with DOCKER_CONTENT_TRUST=1 set, a plain push also signs the tag
docker push myregistry/myimage:tag

# Inspect signatures
docker trust inspect --pretty myregistry/myimage:tag
```

### Notary (Standalone)

For advanced use cases:

```bash
# Add signer
notary -s https://notary-server:4443 \
  -d ~/.docker/trust \
  delegation add myregistry/myimage targets/releases \
  --all-paths \
  cert.pem

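# Publish the staged delegation change to the Notary server
# (image tags themselves are usually signed through `docker trust sign`)
notary -s https://notary-server:4443 \
  -d ~/.docker/trust \
  publish myregistry/myimage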
```

### Verification in CI/CD

```bash
#!/bin/bash
# deployment-script.sh

export DOCKER_CONTENT_TRUST=1

# docker pull fails if the image is unsigned or the signature is invalid
if ! docker pull myregistry/myimage:production; then
    echo "Security Error: Image signature verification failed" >&2
    exit 1
fi

# Proceed with deployment
kubectl apply -f deployment.yaml
```

## 10.6 Network Security

Default Docker networking may expose containers unnecessarily. Implement defense in depth through network segmentation.

### Network Modes Security

**Avoid `--network host`**:
- Removes network isolation
- Container shares host network stack
- Can access host services

**Avoid `--privileged` with network**:
- Allows modification of network interfaces
- Can change iptables rules

### Custom Bridge Networks

Create isolated networks:

```bash
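# Create the networks used below
docker network create --driver bridge frontend-network
docker network create --driver bridge --subnet 172.28.0.0/16 backend-network

# Run database on the backend network only; the password file is mounted read-only
docker run -d \
  --name postgres \
  --network backend-network \
  --mount type=bind,source=/host/secrets/db-password,target=/run/secrets/db-password,readonly \
  --env POSTGRES_PASSWORD_FILE=/run/secrets/db-password \
  postgres:15

# Run application on the frontend network, then attach it to the backend
docker run -d \
  --name webapp \
  --network frontend-network \
  -p 8080:8080 \
  myapp
docker network connect backend-network webapp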

### Inter-Container Communication

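Disable inter-container communication when not needed. Note that `--icc` is a daemon-level option, not a `docker run` flag:

```bash
# Start the daemon with inter-container communication disabled on the default bridge
dockerd --icc=false

# Or disable it on a user-defined bridge network
docker network create \
  -o com.docker.network.bridge.enable_icc=false \
  isolated-network
```

**Note**: The persistent equivalent of the daemon flag is `"icc": false` in daemon.json, which affects only the default bridge network.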

### Port Exposure Best Practices

```dockerfile
# Declare only necessary ports (EXPOSE documents a port; it does not publish it)
EXPOSE 8080/tcp

# Don't publish database ports externally
# Instead, reach them over user-defined networks (legacy --link is deprecated)
```

Runtime port mapping:
```bash
# Good - Bind to specific interface
docker run -p 127.0.0.1:8080:8080 myapp

# Bad - Exposes on all interfaces (0.0.0.0)
docker run -p 8080:8080 myapp
```
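
The difference between the two mappings comes down to whether a host IP is present in the publish value. The sketch below illustrates that parsing rule with two hypothetical helpers (`host_ip` and `exposed_on_all_interfaces`), for example to lint deployment scripts. It assumes the daemon's default bind address has not been changed and covers only the common `[ip:]hostPort:containerPort[/proto]` forms.

```python
def host_ip(publish_spec):
    """Return the host IP a -p/--publish value binds to.

    Assumption: when no IP is given, Docker binds to all interfaces (0.0.0.0).
    """
    spec = publish_spec.split("/", 1)[0]  # drop an optional /tcp or /udp suffix
    if spec.startswith("["):
        # Bracketed IPv6 address, e.g. [::1]:8080:80
        return spec[1:spec.index("]")]
    parts = spec.split(":")
    if len(parts) == 3:
        return parts[0]  # ip:hostPort:containerPort
    return "0.0.0.0"     # hostPort:containerPort or containerPort alone


def exposed_on_all_interfaces(publish_spec):
    """True if the mapping is reachable from every host interface."""
    return host_ip(publish_spec) in ("0.0.0.0", "::")


if __name__ == "__main__":
    for spec in ("127.0.0.1:8080:8080", "8080:8080"):
        print(spec, "-> all interfaces:", exposed_on_all_interfaces(spec))
```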

### TLS/SSL Encryption

For sensitive data in transit:

```dockerfile
FROM nginx:alpine

# Copy the public certificate only; it is not secret
COPY certs/server.crt /etc/nginx/ssl/

# Do not COPY server.key: it would persist in an image layer (see 10.3).
# Mount it read-only at runtime at /etc/nginx/ssl/server.key instead.

# Configure HTTPS
COPY nginx-ssl.conf /etc/nginx/conf.d/default.conf

EXPOSE 443
```

## 10.7 Resource Constraints

Unconstrained containers can cause Denial of Service (DoS) through resource exhaustion.

### Memory Limits

Set a hard memory limit so that a runaway container is OOM-killed inside its own cgroup instead of starving the host:

```bash
# Hard limit
docker run -m 512m --memory-swap 512m myapp

# Soft limit (reservation)
docker run -m 1g --memory-reservation 512m myapp

# Disable OOM killer (use with caution)
docker run -m 512m --oom-kill-disable myapp
```
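
The interaction between `-m` and `--memory-swap` is easy to misread, because `--memory-swap` is the total of memory plus swap rather than the swap amount alone. The arithmetic can be sketched as follows; `parse_size` and `swap_allowance` are hypothetical helper names, and the rules encoded are the documented ones as understood here (equal values mean no swap, an unset `--memory-swap` allows swap equal to the memory limit when the host has swap, and `-1` means unlimited).

```python
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a Docker-style size such as '512m' or '1g' to bytes."""
    value = value.strip().lower()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)  # plain number of bytes


def swap_allowance(memory, memory_swap=None):
    """Return the swap a container may use in bytes, or None if unlimited."""
    mem = parse_size(memory)
    if memory_swap is None:
        return mem  # default: as much swap as memory, if the host has swap
    if memory_swap.strip() == "-1":
        return None  # unlimited swap
    # --memory-swap is memory + swap, so subtract the memory part
    return parse_size(memory_swap) - mem


if __name__ == "__main__":
    print("hard limit, no swap:", swap_allowance("512m", "512m"))
    print("default swap bytes:", swap_allowance("512m"))
```

This is why the hard-limit example sets both flags to `512m`: the swap allowance becomes zero, so the container cannot escape its limit by swapping.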

In Docker Compose:
```yaml
services:
  web:
    image: myapp
    deploy:
      resources:
        limits:
          memory: 512M
        reservations:
          memory: 256M
```

### CPU Constraints

```bash
# Limit to 1.5 CPUs
docker run --cpus="1.5" myapp

# Limit to specific cores
docker run --cpuset-cpus="0,1" myapp

# Shares (relative weight)
docker run --cpu-shares=512 myapp  # Default is 1024
```
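
Two pieces of arithmetic sit behind these flags. `--cpus` is shorthand for a CFS quota over a period (with the default period of 100000 microseconds, `--cpus="1.5"` corresponds to a quota of 150000), and `--cpu-shares` only matters under contention, where each container gets CPU time in proportion to its weight. A minimal sketch with hypothetical helper names:

```python
def cpu_quota(cpus, period_us=100_000):
    """Return the CFS quota in microseconds equivalent to a --cpus value."""
    return int(round(cpus * period_us))


def share_fractions(shares):
    """Map container name to its CPU fraction when all containers are busy.

    `shares` maps a name to its --cpu-shares weight. The result applies only
    under contention; an idle host lets any container use spare cycles.
    """
    total = sum(shares.values())
    return {name: weight / total for name, weight in shares.items()}


if __name__ == "__main__":
    print("quota for 1.5 CPUs:", cpu_quota(1.5))
    print(share_fractions({"default": 1024, "low-priority": 512}))
```

With the default weight of 1024 competing against 512, the lower-weight container receives about one third of the contested CPU time, not half.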

### Process Limits

Prevent fork bombs:

```bash
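# Limit the number of processes in this container (enforced by the pids cgroup)
docker run --pids-limit 512 myapp

# nproc is counted per UID across the host, not per container, so it is a weaker control
docker run --ulimit nproc=512 myapp

# Limit file descriptors (soft:hard)
docker run --ulimit nofile=1024:2048 myapp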
```

### Disk I/O Constraints

```bash
# Limit read/write bandwidth
docker run --device-read-bps /dev/sda:1mb --device-write-bps /dev/sda:1mb myapp

# Limit IOPS
docker run --device-read-iops /dev/sda:1000 myapp
```

### Storage Driver Limits

```bash
# Limit container size
docker run --storage-opt size=10G myapp
```

**Note**: Only supported on certain storage drivers (e.g., `devicemapper`, `btrfs`, `zfs`, and `overlay2` when the backing filesystem is `xfs` mounted with `pquota`).

## 10.8 Security Profiles (AppArmor, Seccomp)

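Linux Security Modules (LSMs) such as AppArmor provide mandatory access control beyond standard Unix permissions. Seccomp is not an LSM; it is a separate kernel feature that filters system calls. The two mechanisms complement each other.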

### AppArmor Profiles

AppArmor restricts programs' capabilities with per-program profiles.

**Default Docker Profile**: Docker applies a default AppArmor profile (`docker-default`), but custom profiles provide granular control.

Creating a custom profile:

```bash
# /etc/apparmor.d/docker-custom
#include <tunables/global>

profile docker-custom flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>
  
  # Deny network raw (prevent packet sniffing)
  deny network raw,
  
  # Deny mount operations
  deny mount,
  
  # Allow specific paths only (a working profile also needs rules for the
  # runtime's own binaries and libraries)
  /app/** r,
  /app/data/** rw,
  
  # Deny access to sensitive files
  deny /etc/shadow r,
  deny /proc/** w,
  deny /sys/** w,
  
  # Capability restrictions
  deny capability net_admin,
  deny capability sys_admin,
}
```

Load and use:
```bash
# Load profile
apparmor_parser -r -W /etc/apparmor.d/docker-custom

# Run with profile
docker run --security-opt apparmor=docker-custom myapp
```

### Seccomp (Secure Computing Mode)

Seccomp filters system calls to reduce kernel attack surface.

**Default Profile**: Docker uses a default seccomp profile that disables ~44 syscalls.

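Custom seccomp profile (an illustrative allowlist, not a complete one; most workloads need additional calls such as `execve` and `brk` just to start, so derive real profiles from Docker's default profile and test them):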

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_AARCH64"
  ],
  "syscalls": [
    {
      "names": [
        "accept",
        "accept4",
        "bind",
        "clone",
        "close",
        "connect",
        "epoll_create",
        "epoll_create1",
        "epoll_ctl",
        "epoll_pwait",
        "epoll_wait",
        "exit",
        "exit_group",
        "fcntl",
        "fstat",
        "fsync",
        "futex",
        "getpid",
        "getrandom",
        "ioctl",
        "listen",
        "mmap",
        "mprotect",
        "munmap",
        "nanosleep",
        "open",
        "openat",
        "poll",
        "read",
        "recvfrom",
        "recvmsg",
        "rt_sigaction",
        "rt_sigprocmask",
        "rt_sigreturn",
        "select",
        "sendmsg",
        "sendto",
        "setitimer",
        "setsockopt",
        "sigaltstack",
        "socket",
        "socketpair",
        "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Usage:
```bash
# Use custom seccomp
docker run --security-opt seccomp=custom-profile.json myapp

# Disable seccomp (not recommended)
docker run --security-opt seccomp=unconfined myapp
```
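
An allowlist profile that omits a call the process needs fails at container start or, worse, at some rare code path in production. Before deploying a custom profile, it can be worth checking it against the calls you know the workload makes. The sketch below assumes the profile layout shown above (`defaultAction`, `syscalls`, `names`, `action`); the helper names and the `required` list are hypothetical, and the real list for an application should come from tracing it, not from this example.

```python
import json


def allowed_syscalls(profile):
    """Collect every syscall name covered by an SCMP_ACT_ALLOW rule."""
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    return allowed


def missing_syscalls(profile, required):
    """Return the required syscalls the profile does not allow, sorted."""
    return sorted(set(required) - allowed_syscalls(profile))


def denies_by_default(profile):
    """True if calls not listed in the profile are rejected."""
    return profile.get("defaultAction") != "SCMP_ACT_ALLOW"


if __name__ == "__main__":
    profile = json.loads("""
    {"defaultAction": "SCMP_ACT_ERRNO",
     "syscalls": [{"names": ["read", "write", "exit_group"],
                   "action": "SCMP_ACT_ALLOW"}]}
    """)
    # Hypothetical minimum set for illustration only
    required = ["execve", "brk", "read", "write", "exit_group"]
    print("deny by default:", denies_by_default(profile))
    print("missing:", missing_syscalls(profile, required))
```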

### Dropping Capabilities

Linux capabilities divide root privileges into distinct units. Drop all, then add only what's needed:

```bash
# Drop all capabilities
docker run --cap-drop=ALL myapp

# Add only specific capabilities
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp

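# Common capability sets (starting points; verify against your image)
# Web server binding to port 80. The official nginx image starts as root and
# switches user, so it typically also needs CHOWN, SETUID, and SETGID:
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE --cap-add=CHOWN \
  --cap-add=SETUID --cap-add=SETGID nginx

# Database that adjusts process priorities. Official database images usually
# need further capabilities for their entrypoint (for example CHOWN, SETUID, SETGID):
docker run --cap-drop=ALL --cap-add=SYS_NICE postgres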

# No new privileges
docker run --security-opt=no-new-privileges:true myapp
```

### Combining Security Options

Defense in depth example:

```bash
docker run \
  --user 1000:1000 \
  --security-opt no-new-privileges:true \
  --security-opt apparmor=docker-custom \
  --security-opt seccomp=custom-profile.json \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --read-only \
  --tmpfs /tmp:noexec,nosuid,size=100m \
  --memory 512m \
  --cpus 1.0 \
  --pids-limit 100 \
  -d myapp
```
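
Whether a running container actually received these options can be checked from its `docker inspect` output. The sketch below is a hypothetical audit helper, not a Docker feature. It assumes the `Config.User` and `HostConfig` field names (`Privileged`, `ReadonlyRootfs`, `CapDrop`, `SecurityOpt`, `Memory`, `PidsLimit`) as they appear in inspect JSON, and it takes one already-parsed entry from that output.

```python
def audit_container(inspect_entry):
    """Return a list of hardening gaps for one parsed `docker inspect` entry."""
    host = inspect_entry.get("HostConfig", {})
    user = inspect_entry.get("Config", {}).get("User", "")
    findings = []

    # An empty user string means the container runs as root
    if user.split(":", 1)[0] in ("", "0", "root"):
        findings.append("runs as root")
    if host.get("Privileged"):
        findings.append("privileged mode")
    if not host.get("ReadonlyRootfs"):
        findings.append("writable root filesystem")
    if "ALL" not in [cap.upper() for cap in (host.get("CapDrop") or [])]:
        findings.append("capabilities not dropped")
    security_opts = host.get("SecurityOpt") or []
    if not any(opt.startswith("no-new-privileges") for opt in security_opts):
        findings.append("no-new-privileges not set")
    if not host.get("Memory"):
        findings.append("no memory limit")
    # Unset limits may appear as null, 0, or -1
    if (host.get("PidsLimit") or 0) <= 0:
        findings.append("no pids limit")
    return findings


if __name__ == "__main__":
    default_like = {"Config": {"User": ""}, "HostConfig": {}}
    for finding in audit_container(default_like):
        print("-", finding)
```

In practice the input would come from something like `json.loads(...)` applied to the output of `docker inspect <container>`, which returns a list with one entry per container.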

### Read-Only Root Filesystem

Prevent runtime modifications:

```bash
docker run --read-only myapp
```

For applications needing temporary writes:

```bash
docker run \
  --read-only \
  --tmpfs /tmp:noexec,nosuid,size=100m \
  --tmpfs /var/cache:noexec,nosuid,size=50m \
  myapp
```

Or use volumes for specific writable areas:

```bash
docker run \
  --read-only \
  -v app-data:/app/data \
  myapp
```

---

## Chapter Summary and Preview

In this chapter, we established comprehensive security practices for Docker containers, covering the principle of least privilege through non-root execution, attack surface reduction via minimal base images, secrets management avoiding layer persistence, vulnerability scanning integration, cryptographic verification with Docker Content Trust, network segmentation strategies, resource-based DoS prevention, and kernel-level hardening using AppArmor and Seccomp.

**Key Takeaways:**
- Never run containers as root; always specify USER directives with numeric IDs
- Prefer distroless or Alpine images to minimize attack surface
- Use BuildKit secrets for build-time credentials and runtime secret injection for operational secrets
- Scan images at every pipeline stage with tools like Trivy or Clair
- Enable Docker Content Trust to ensure image authenticity
- Implement defense in depth combining capabilities dropping, security profiles, and read-only filesystems

**Next Chapter Preview:**
Chapter 11: Docker Registries will build upon these security foundations by teaching you how to securely store and distribute your hardened images. You'll learn about private registry options (AWS ECR, Google GCR, Azure ACR, Harbor), authentication mechanisms, image tagging strategies for traceability, and how to implement vulnerability scanning at the registry level to prevent compromised images from ever reaching your Kubernetes clusters.

