# Chapter 55: CI/CD Best Practices

Best practices emerge from accumulated experience—the patterns that consistently produce reliable, secure, and maintainable software delivery, and the anti-patterns that predictably lead to outages, security breaches, and technical debt. This chapter synthesizes the principles explored throughout this handbook into actionable guidelines. These practices are not arbitrary rules but derived constraints: they optimize for flow, feedback, and resilience. We examine the **Twelve-Factor App** methodology as applied to CI/CD systems, **immutable infrastructure** that eliminates configuration drift, **infrastructure as code** for reproducible environments, **small frequent changes** that reduce risk and improve debuggability, **automation of toil** that eliminates manual steps and their associated errors, **fail-fast** mechanisms that surface errors immediately rather than propagating them downstream, **repeatability** through hermetic builds that produce identical artifacts from identical inputs, and **simplicity** as the ultimate sophistication—resisting the urge to over-engineer solutions. We contrast these with anti-patterns: the **monolithic pipeline** that couples unrelated services, **manual interventions** that break automation, **hardcoded configuration** that prevents environment portability, **skipped tests** in the name of speed, **bloated container images** that increase attack surface and startup time, **over-engineered abstractions** that obscure rather than clarify, **siloed team structures** that create handoffs and delays, and **security as an afterthought** that results in bolted-on rather than built-in protections.

## 55.1 The Twelve-Factor App

The Twelve-Factor App methodology, originally designed for SaaS applications, applies directly to CI/CD systems themselves.

### 1. Codebase

**One codebase tracked in revision control, many deploys**

**Practice**: Each pipeline definition lives in the same repository as the application it builds.

```text
# Good: Pipeline in app repo
my-service/
├── src/
├── tests/
├── Dockerfile
├── .github/
│   └── workflows/
│       └── deploy.yml  # Pipeline with code
└── README.md

# Bad: Centralized pipeline repo
ci-cd-repo/
├── pipelines/
│   └── my-service.yml  # Separated from code
```

**Rationale**: Version coupling ensures that pipeline changes are reviewed with code changes, and historical builds can be reproduced by checking out the corresponding commit.

### 2. Dependencies

**Explicitly declare and isolate dependencies**

**Practice**: Pin all tool versions in pipelines; use lock files.

```yaml
# Good: Explicit versions
jobs:
  build:
    runs-on: ubuntu-22.04  # Specific OS version
    steps:
      - uses: actions/checkout@v4.1.1  # Pinned release tag (a full commit SHA is stronger; tags can be moved)
      
      - name: Setup Node.js
        uses: actions/setup-node@v4.0.1
        with:
          node-version: '20.10.0'  # Exact version
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci  # Uses package-lock.json
      
      - name: Build
        run: npm run build

# Bad: Floating versions
      - uses: actions/checkout@v4  # May break on v4.2.0
      - uses: actions/setup-node@v4
        with:
          node-version: '20'  # Resolves to the newest 20.x at run time, so builds drift
      - run: npm install  # Can rewrite package-lock.json instead of enforcing it
```
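The pipeline can also enforce pinning on itself. The sketch below assumes workflow files are scanned line by line with a regular expression rather than parsed as YAML, and the function names are hypothetical. It flags `uses:` references that are neither a full 40-character commit SHA nor an exact `X.Y.Z` tag. Because tags can be moved, a stricter policy might accept only SHAs.

```python
import re
from typing import List

# "uses: owner/repo@ref" or "uses: owner/repo/path@ref", optionally as a list item
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([\w.-]+/[\w./-]+)@(\S+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")
EXACT_VERSION_RE = re.compile(r"^v?\d+\.\d+\.\d+$")


def is_pinned(ref: str) -> bool:
    """True for a full commit SHA or an exact X.Y.Z tag such as v4.1.1."""
    return bool(FULL_SHA_RE.match(ref) or EXACT_VERSION_RE.match(ref))


def floating_refs(workflow_text: str) -> List[str]:
    """Return each 'action@ref' whose ref can float, such as @v4 or @main."""
    found = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match and not is_pinned(match.group(2)):
            found.append(f"{match.group(1)}@{match.group(2)}")
    return found
```

For example, running `floating_refs` over the "Bad" snippet above would report `actions/checkout@v4` and `actions/setup-node@v4`, and a CI step could fail whenever the list is non-empty.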

### 3. Config

**Store config in environment**

**Practice**: No configuration in Docker images; inject at runtime.

```yaml
# Good: External configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_URL: "postgres://db:5432/app"
  CACHE_TTL: "300"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
stringData:
  API_KEY: "${API_KEY}"  # Not expanded by Kubernetes; substituted at deploy time from the vault
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: myapp:v2.5.0  # Immutable tag; no config baked in
        envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: app-secrets

# Bad: Config baked into image
# Dockerfile
FROM node:20
COPY config/production.yml /app/config.yml  # Environment-specific config in image
ENV NODE_ENV=production
CMD ["node", "server.js"]
```
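On the application side, this factor comes down to reading settings from the process environment. The following is a minimal Python sketch that assumes the variable names from the ConfigMap and Secret above (`AppConfig` and `load_config` are hypothetical names). Non-sensitive values get defaults, missing required values stop the process at startup, and nothing is read from files baked into the image.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    cache_ttl: int
    api_key: str


class MissingConfigError(RuntimeError):
    """Raised at startup when a required setting is absent."""


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build config from environment variables only."""
    missing = [name for name in ("DATABASE_URL", "API_KEY") if not env.get(name)]
    if missing:
        # Fail at startup, not on the first request that needs the value
        raise MissingConfigError(f"missing required settings: {', '.join(missing)}")
    return AppConfig(
        database_url=env["DATABASE_URL"],
        cache_ttl=int(env.get("CACHE_TTL", "300")),  # Non-secret default
        api_key=env["API_KEY"],
    )
```

Because the environment is a parameter, tests can pass a plain dictionary instead of mutating the real process environment.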

### 4. Backing Services

**Treat backing services as attached resources**

**Practice**: Databases, caches, and message queues are attached resources that can be swapped through configuration alone.

```yaml
# Database configuration abstracted
spring:
  datasource:
    url: ${DATABASE_URL:jdbc:h2:mem:testdb}  # Default to H2 for local dev
    username: ${DATABASE_USER:sa}
    password: ${DATABASE_PASSWORD:}
  jpa:
    hibernate:
      ddl-auto: validate  # Never auto-create in production

# Pipeline handles different environments
jobs:
  test:
    services:
      postgres:
        image: postgres:15-alpine
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
```

### 5. Build, Release, Run

**Strictly separate build and run stages**

**Practice**: Build produces artifact; release combines artifact with config; run executes.

```yaml
# Strict separation (GitLab CI syntax)
stages:
  - build
  - release

build:
  stage: build
  script:
    # Only build and publish the artifact; no config injection
    - docker build -t $IMAGE:$CI_COMMIT_SHA .
    - docker push $IMAGE:$CI_COMMIT_SHA

release:
  stage: release
  script:
    # Combine the artifact with environment config
    - >-
      helm upgrade --install app ./chart
      --set image.tag=$CI_COMMIT_SHA
      --set environment=staging
      --values values-staging.yaml

# Run: Kubernetes executes the released pods.
# No build or release logic belongs in this stage.
```
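One way to make the release boundary concrete is to model a release as an immutable pair: an artifact reference plus a config snapshot, identified by a hash of both. This Python sketch assumes a digest-addressed image reference and a flat config dictionary. `Release` and `make_release` are hypothetical names. Changing either input yields a new release ID, and the artifact itself is never rebuilt.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Release:
    image: str                 # Built once, e.g. "registry/app@sha256:..."
    config: Mapping[str, str]  # Environment-specific values
    release_id: str


def make_release(image: str, config: Mapping[str, str]) -> Release:
    """Combine one build artifact with one config snapshot."""
    payload = json.dumps({"image": image, "config": dict(config)}, sort_keys=True)
    release_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
    # Freeze the config so a release cannot be edited after creation
    return Release(image, MappingProxyType(dict(config)), release_id)
```

Promoting from staging to production then means creating a second release from the same image string with different config. This mirrors the separation the pipeline stages enforce.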

### 6. Processes

**Execute the app as one or more stateless processes**

**Practice**: Build agents are stateless; any state lives in external systems such as artifact registries and object storage.

```yaml
# Stateless build pod
apiVersion: v1
kind: Pod
metadata:
  name: build-agent
spec:
  containers:
  - name: builder
    image: build-agent:latest
    env:
    - name: BUILD_ID
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    # No local state; all artifacts pushed to registry/S3
  volumes:
  - name: tmp
    emptyDir: {}  # Ephemeral, lost on pod termination
```

### 7. Port Binding

**Export services via port binding**

**Practice**: CI/CD services export their interfaces over network ports rather than through shared files or Unix domain sockets.

```yaml
# Services expose HTTP/gRPC, not OS-specific mechanisms
services:
  jenkins:
    ports:
      - "8080:8080"  # Web UI
      - "50000:50000"  # Agent protocol
      
  argocd:
    ports:
      - "80:8080"  # HTTP
      - "443:8080"  # HTTPS
      
  prometheus:
    ports:
      - "9090:9090"  # Query API
```

### 8. Concurrency

**Scale out via the process model**

**Practice**: Scale build agents horizontally, not vertically.

```yaml
# Horizontal scaling of stateless builders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner
spec:
  replicas: 10  # Scale horizontally
  selector:
    matchLabels:
      app: github-runner
  template:
    metadata:
      labels:
        app: github-runner
    spec:
      containers:
      - name: runner
        image: runner:latest
        resources:
          requests:
            cpu: "1000m"
            memory: "2Gi"
          limits:
            cpu: "2000m"
            memory: "4Gi"
        # No shared state between replicas
```

### 9. Disposability

**Maximize robustness with fast startup and graceful shutdown**

**Practice**: Build agents start quickly and finish jobs before termination.

```yaml
# Fast startup, graceful shutdown
spec:
  # Time allowed for the preStop hook and the in-flight job before SIGKILL (default 30s)
  terminationGracePeriodSeconds: 3600
  containers:
  - name: runner
    image: runner:latest
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
            # preStop runs before the kubelet sends SIGTERM to the container
            echo "Pod terminating: deregistering runner so it takes no new jobs..."
            /actions-runner/config.sh remove --token "${TOKEN}"
            sleep 10  # Allow time for cleanup
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2000m"
        memory: "2Gi"
```
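The same contract applies inside the agent process: stop taking work on SIGTERM, but finish the job in hand. Here is a Python sketch that assumes a hypothetical worker pulling job names from an in-process queue. A real runner would poll its CI server instead. Kubernetes sends SIGKILL only after `terminationGracePeriodSeconds`, so each job must fit within that window.

```python
import queue
import signal
import threading
from typing import List


class Worker:
    """Processes jobs until asked to stop; never abandons a job mid-run."""

    def __init__(self, jobs: "queue.Queue[str]") -> None:
        self.jobs = jobs
        self.stopping = threading.Event()
        self.completed: List[str] = []

    def handle_sigterm(self, signum, frame) -> None:
        # Only set a flag; the loop checks it between jobs, never during one
        self.stopping.set()

    def run_job(self, job: str) -> None:
        self.completed.append(job)  # Placeholder for the real build work

    def run(self) -> None:
        while not self.stopping.is_set():
            try:
                job = self.jobs.get(timeout=0.1)
            except queue.Empty:
                continue  # Re-check the stop flag regularly
            self.run_job(job)  # Runs to completion even if SIGTERM arrives meanwhile
            self.jobs.task_done()


def install(worker: Worker) -> None:
    """Route SIGTERM (sent by Kubernetes at pod shutdown) to the worker."""
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

In the agent's entry point, call `install(worker)` from the main thread and then `worker.run()`. The flag-only handler keeps signal handling trivial and leaves the shutdown decision to the loop.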

### 10. Dev/Prod Parity

**Keep development, staging, and production as similar as possible**

**Practice**: Use the same container images and Helm charts in every environment; only the values differ.

```text
# Same chart, different values
helm/
├── Chart.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ingress.yaml
└── values/
    ├── values.yaml            # Base defaults
    ├── values-staging.yaml    # Staging overrides
    └── values-production.yaml # Production overrides

# Deployment uses same image, different replica count
# Staging: 1 replica, Production: 10 replicas
# Same probes, same security contexts
```

### 11. Logs

**Treat logs as event streams**

**Practice**: Structured logging, centralized aggregation, no local files.

```yaml
# Structured JSON logging
logging:
  format: json
  level: INFO
  fields:
    - timestamp
    - level
    - service
    - trace_id
    - message
    - context

# Pipeline logs streamed to ELK/Loki
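- name: Stream Logs
  shell: bash  # Adds -o pipefail, so a failing build.sh fails the step
  run: |
    # One compact JSON record per line; plain-text lines are wrapped as {"message": ...}
    ./build.sh 2>&1 | \
    jq -R -c '. as $line | try (fromjson) catch ({message: $line})' | \
    fluent-bit -i stdin -o es://logs.company.com:9200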
```
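Application code can emit the structured format above with Python's standard `logging` module. The formatter below is a minimal sketch. The `service` name and the `trace_id` and `context` attributes are assumptions matching the field list above, and they are supplied per call through `extra=`.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single-line JSON object (one event per line)."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "trace_id": getattr(record, "trace_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(entry, default=str)


def make_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(service)
    if not logger.handlers:  # Avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()  # stderr; no local log files
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For example, `make_logger("payment-service").info("charge created", extra={"trace_id": "abc123", "context": {"amount": 42}})` writes one JSON line to stderr, where the platform's log collector can pick it up.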

### 12. Admin Processes

**Run admin/management tasks as one-off processes**

**Practice**: Run database migrations, cache clears, and similar one-off tasks as Kubernetes Jobs, not inside long-running containers.

```yaml
# One-off admin job
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-v2-5
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:v2.5.0
        command: ["python", "manage.py", "migrate"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
      restartPolicy: Never
  backoffLimit: 2
```

## 55.2 Immutable Infrastructure

Immutable infrastructure replaces mutation with replacement. Instead of patching running servers, we provision new instances and decommission old ones.

### Principles

**No SSH to Production**: If you cannot log in, you cannot manually break things.

**Replace, Don't Modify**: New deployments create new resources; old resources are terminated.

**Version Everything**: Every infrastructure change is versioned and reversible.

### Implementation

**Immutable Servers** (Packer JSON template for a golden AMI):
```json
{
  "variables": {
    "version": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-0c55b159cbfafe1f0",
    "instance_type": "t3.medium",
    "ssh_username": "ubuntu",
    "ami_name": "ci-runner-{{timestamp}}",
    "tags": {
      "Name": "ci-runner",
      "Version": "{{user `version`}}",
      "BaseAMI": "{{ .SourceAMI }}"
    }
  }],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo apt-get update",
        "sudo apt-get install -y docker.io",
        "sudo usermod -aG docker ubuntu"
      ]
    },
    {
      "type": "file",
      "source": "./scripts/health-check.sh",
      "destination": "/home/ubuntu/health-check.sh"
    }
  ]
}
```

**Immutable Containers**:
```dockerfile
# Multi-stage build for immutable, minimal image
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM gcr.io/distroless/nodejs20-debian12
COPY --from=builder /app/node_modules /app/node_modules
COPY . /app
WORKDIR /app
USER nonroot:nonroot
EXPOSE 8080
# No shell, no package manager, immutable filesystem
CMD ["server.js"]
```

**Immutable Infrastructure as Code**:
```hcl
# Terraform with immutable replacement strategy
resource "aws_launch_template" "ci_runner" {
  name_prefix   = "ci-runner-"
  image_id      = data.aws_ami.golden_ami.id
  instance_type = "m6i.xlarge"

  # Immutable: a new AMI produces a new launch template version
  lifecycle {
    create_before_destroy = true
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "ci-runner"
      # Derived from the AMI; timestamp() would change on every plan
      Version = data.aws_ami.golden_ami.id
    }
  }
}

# Blue/green replacement of the runner fleet: each launch template version
# gets a new ASG name, which forces Terraform to build a new group
resource "aws_autoscaling_group" "ci_runners" {
  name = "ci-runners-v${aws_launch_template.ci_runner.latest_version}"
  launch_template {
    id      = aws_launch_template.ci_runner.id
    version = aws_launch_template.ci_runner.latest_version
  }
  min_size         = 3
  max_size         = 50
  desired_capacity = 10

  # Wait until the new group reaches desired capacity before Terraform
  # destroys the previous one (create_before_destroy below)
  wait_for_capacity_timeout = "10m"
  health_check_type         = "EC2"
  health_check_grace_period = 300

  lifecycle {
    create_before_destroy = true
  }

  tag {
    key                 = "Name"
    value               = "ci-runner"
    propagate_at_launch = true
  }
}
```

## 55.3 Infrastructure as Code (IaC)

IaC is the practice of managing infrastructure through machine-readable definition files rather than manual configuration.

### Principles

**Declarative over Imperative**: Define the desired state and let the system converge on it.

**Version Control**: All infrastructure changes tracked in Git.

**Idempotency**: Applying the same configuration multiple times produces the same result.

**Modularity**: Reusable components (modules) for common patterns.
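Declarative definitions and idempotency fit together in a reconcile loop. The toy Python reconciler below is not how Terraform is implemented. It assumes resources are plain name-to-attribute dictionaries and computes the create, update, and delete actions that bring actual state in line with desired state. Planning again after an apply returns an empty list, which is idempotency in practice.

```python
from typing import Dict, List, Tuple

State = Dict[str, Dict[str, str]]  # resource name -> attributes
Action = Tuple[str, str]           # (verb, resource name)


def plan(desired: State, actual: State) -> List[Action]:
    """Diff desired against actual; sorted iteration keeps output deterministic."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: State, actual: State) -> State:
    """Converge: the result equals the desired state, whatever the start."""
    new_state = {name: dict(attrs) for name, attrs in actual.items()}
    for verb, name in plan(desired, actual):
        if verb == "delete":
            del new_state[name]
        else:
            new_state[name] = dict(desired[name])
    return new_state
```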

### Implementation

**Terraform Best Practices**:
```hcl
# modules/ci-cd-platform/main.tf
# Reusable, documented module

variable "environment" {
  description = "Deployment environment (dev, staging, prod)"
  type        = string
  
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

variable "github_org" {
  description = "GitHub organization name"
  type        = string
  default     = "myorg"
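}

variable "private_subnet_ids" {
  description = "Private subnet IDs for the EKS cluster"
  type        = list(string)
}
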
locals {
  # Common tags applied to all resources
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Team        = "platform"
    CostCenter  = "engineering"
  }
}

# EKS Cluster for CI/CD
resource "aws_eks_cluster" "ci_cluster" {
  name     = "ci-${var.environment}"
  role_arn = aws_iam_role.cluster.arn
  version  = "1.28"
  
  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = var.environment == "dev"
    public_access_cidrs     = var.environment == "dev" ? ["0.0.0.0/0"] : []
  }
  
  enabled_cluster_log_types = ["api", "audit", "authenticator"]
  
  tags = local.common_tags
  
  # Ensure proper ordering
  depends_on = [
    aws_iam_role_policy_attachment.cluster_policies,
  ]
}

# Outputs for consumers
output "cluster_endpoint" {
  description = "EKS cluster endpoint for kubectl configuration"
  value       = aws_eks_cluster.ci_cluster.endpoint
  sensitive   = false
}

output "cluster_ca_certificate" {
  description = "Base64 encoded CA certificate for cluster authentication"
  value       = aws_eks_cluster.ci_cluster.certificate_authority[0].data
  sensitive   = true
}
```

**Documentation Generation from Terraform**:
```bash
# terraform-docs generates markdown from code
terraform-docs markdown . --output-file TERRAFORM.md
```

Generated output (excerpt):

```markdown
## Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0 |
| aws | ~> 5.0 |

## Providers

| Name | Version |
|------|---------|
| aws | ~> 5.0 |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| environment | Deployment environment (dev, staging, prod) | `string` | n/a | yes |
| github_org | GitHub organization name | `string` | `"myorg"` | no |
```

## 55.4 Small, Frequent Changes

Large changes are risky; small changes are predictable. The goal is to reduce batch size to the smallest deployable unit.

### Practices

**Feature Flags over Long-Lived Branches**:
```python
# Bad: Long-lived feature branch
# git checkout -b huge-feature
# 3 months of work
# git merge main (conflict hell)
# deploy (high risk)

# Good: Trunk-based with feature flags
# if feature_enabled("new-payment-flow"):
#     new_code()
# else:
#     old_code()

# Deploy small changes continuously
```
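Flags used this way usually support gradual rollout. The following is a minimal sketch that assumes an in-memory flag table and a user ID as the bucketing key. Hosted flag services expose their own APIs. Hashing the flag name together with the user ID gives each user a stable bucket from 0 to 99, so a user who sees the new flow at 10% still sees it at 50%. The extra `user_id` argument is an addition to the pseudocode's `feature_enabled` call.

```python
import hashlib
from typing import Dict

# Hypothetical flag table: flag name -> percentage of users enabled
ROLLOUT: Dict[str, int] = {"new-payment-flow": 10}


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def feature_enabled(flag: str, user_id: str, rollout: Dict[str, int] = ROLLOUT) -> bool:
    # Unknown flags default to off, so unreleased code paths stay dark
    return bucket(flag, user_id) < rollout.get(flag, 0)
```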

**Micro-Commits**:
```bash
# Good: Logical, small commits
git commit -m "Add validation for negative amounts"
git commit -m "Update error message for clarity"
git commit -m "Add test for edge case"

# Bad: Giant commit
git commit -m "WIP - all changes from last week"
```

**Automated Canary Analysis**:
```yaml
# Small percentage, automatic promotion
canary:
  steps:
    - setWeight: 5
    - pause: {duration: 10m}
    - analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: payment-service
    - setWeight: 25
    - pause: {duration: 10m}
    - analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: payment-service
    - setWeight: 100
```
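The `success-rate` template referenced above is defined separately as an Argo Rollouts AnalysisTemplate. The go/no-go decision it encodes can be sketched in plain Python. The thresholds, the minimum sample size, and the comparison against the stable version are illustrative assumptions, not Argo Rollouts defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed during one analysis interval."""
    requests: int
    errors: int

    @property
    def success_rate(self) -> float:
        return 1.0 if self.requests == 0 else 1 - self.errors / self.requests


def canary_verdict(canary: Window, stable: Window, min_success: float = 0.99,
                   max_regression: float = 0.005, min_requests: int = 100) -> str:
    """Return 'wait', 'abort', or 'promote' for the next canary step."""
    if canary.requests < min_requests:
        return "wait"  # At 5% weight, traffic may be too thin to judge yet
    if canary.success_rate < min_success:
        return "abort"  # Absolute floor
    if stable.success_rate - canary.success_rate > max_regression:
        return "abort"  # Noticeably worse than the version it replaces
    return "promote"
```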

## 55.5 Automate Everything

Manual steps introduce variance and delay. If a human does it twice, automate it.

### Eliminating Toil

**Definition of Toil**: Work that is manual, repetitive, automatable, and tactical, has no enduring value, and grows linearly as the service grows.

**Automation Examples**:

**Dependency Updates**:
```yaml
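# .github/dependabot.yml (Dependabot configuration)
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
    labels:
      - "dependencies"
      - "automated"
    # dependabot.yml has no auto-merge setting. Auto-merging minor updates
    # once checks pass is configured separately, e.g. with a workflow that
    # enables GitHub auto-merge on Dependabot pull requests.
    # (Renovate, by contrast, offers automerge options in its own config.)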
```
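Whichever tool opens the pull requests, the auto-merge policy itself is small enough to test in isolation. This Python sketch assumes plain `MAJOR.MINOR.PATCH` version strings (pre-release suffixes are not handled) and a CI status already reduced to a boolean. It merges automatically only when checks pass and the update stays within the same major version.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH', allowing a leading 'v'."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def update_type(old: str, new: str) -> str:
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    return "patch"


def should_auto_merge(old: str, new: str, checks_passed: bool) -> bool:
    # Major bumps may contain breaking changes and go to a human reviewer
    return checks_passed and update_type(old, new) in ("minor", "patch")
```

Under semantic versioning, minor bumps of `0.x` packages may be breaking, so a stricter policy could send those to review as well.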

**Certificate Rotation**:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
spec:
  secretName: api-tls-secret
  issuerRef:
    name: letsencrypt-prod
  dnsNames:
  - api.company.com
  # Automatic renewal 30 days before expiry
  renewBefore: 720h
```

**Cleanup**:
```bash
#!/bin/bash
# cleanup-stale-resources.sh
set -euo pipefail

# Delete preview environments older than 7 days (604800 seconds)
kubectl get namespaces -l purpose=preview -o json | \
  jq -r '.items[] | select(.metadata.creationTimestamp | fromdateiso8601 < now - 604800) | .metadata.name' | \
  xargs -r kubectl delete namespace

# Clean old container images (keep the 10 most recently pushed)
aws ecr describe-images --repository-name myapp \
  --query 'sort_by(imageDetails, &imagePushedAt)[*].imageDigest' --output json | \
  jq -r '.[0:-10][]' | \
  while read -r digest; do
    aws ecr batch-delete-image --repository-name myapp --image-ids "imageDigest=${digest}"
  done
```
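Because these commands delete resources, the selection logic deserves tests before it runs against a real cluster or registry. Below is a Python sketch of the same two rules. It assumes the namespace creation times and `(imageDigest, imagePushedAt)` pairs have already been fetched, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def is_stale(created: datetime, now: datetime,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """Mirror of the namespace filter: older than max_age (604800 s by default)."""
    return now - created > max_age


def digests_to_delete(images: Iterable[Tuple[str, datetime]], keep: int = 10) -> List[str]:
    """Return digests of all but the `keep` most recently pushed images."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(images, key=lambda item: item[1], reverse=True)
    return [digest for digest, _ in newest_first[keep:]]
```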

## 55.6 Fail Fast

Errors should surface immediately, not propagate downstream where they cause cascading failures or require expensive debugging.

### Implementation

**Fast Feedback in Pipelines**:
```yaml
# Order by speed and importance
jobs:
  # 1. Fastest: Linting (30 seconds)
  lint:
    steps:
      - run: npm run lint
  
  # 2. Fast: Unit tests (2 minutes)
  unit-test:
    needs: lint
    steps:
      - run: npm test
  
  # 3. Slower: Security scan (5 minutes)
  security:
    needs: lint  # Parallel with unit tests
    steps:
      - run: trivy fs .
  
  # 4. Slowest: Integration tests (10 minutes)
  integration:
    needs: [unit-test, security]
    steps:
      - run: npm run test:integration
```

**Health Checks**:
```yaml
# Fail fast on unhealthy dependencies
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3  # Fail after 15 seconds, not 5 minutes

livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3
```

**Circuit Breakers**:
```yaml
# Don't keep trying if service is down
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5  # Replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s  # Minimum ejection after 5 consecutive 5xx errors; grows with repeat ejections
```
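Istio applies this fail-fast behavior in the proxy. The equivalent pattern in application code is a circuit breaker. The Python sketch below is a simplified, single-threaded version with closed, open, and half-open states and an injectable clock for testing. The default threshold and timeout mirror the DestinationRule above but are otherwise assumptions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised without calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            # Open: fail immediately instead of waiting on a dependency that is down
            raise CircuitOpenError("circuit open; not calling dependency")
        # Closed, or half-open (timeout elapsed): let the call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (Re)open and restart the timeout
            raise
        self.failures = 0
        self.opened_at = None  # A success closes the circuit
        return result
```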

## 55.7 Make It Repeatable

Hermetic builds produce identical outputs from identical inputs, enabling reproducibility and debugging.

### Hermetic Builds

**Deterministic Dependencies**:
```dockerfile
# Bad: Non-deterministic
FROM node:latest
# Resolves the newest versions matching package.json ranges at build time
RUN npm install

# Good: Deterministic
# Pinned base image (tag plus immutable digest)
FROM node:20.10.0@sha256:abc123...
WORKDIR /app
COPY package.json package-lock.json ./
# Installs exactly the versions recorded in the lock file
RUN npm ci
```

**Locked Toolchains**:
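```python
# Bazel (Starlark) for hermetic builds
# WORKSPACE
workspace(name = "myapp")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Pin all dependencies with SHA256
http_archive(
    name = "io_bazel_rules_docker",
    sha256 = "b1e80761a8a8243d033eb002b1d586e5b6e07281f4e2a8f0b83c5e5e8e6e7f8a",
    urls = ["https://github.com/bazelbuild/rules_docker/releases/download/v0.25.0/rules_docker-v0.25.0.tar.gz"],
)

# BUILD file
# (rules_docker is archived; rules_oci is its successor for new projects.
#  @java_base is assumed to be declared with container_pull in WORKSPACE.)
load("@io_bazel_rules_docker//container:container.bzl", "container_image")

container_image(
    name = "app_image",
    base = "@java_base//image",
    files = [":app_deploy.jar"],
    cmd = ["java", "-jar", "/app_deploy.jar"],
)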
```

**Reproducible Builds**:
```bash
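# Verify reproducibility: build twice without the layer cache,
# otherwise the second build simply reuses the first one's layers
docker build --no-cache -t app:build1 .
docker build --no-cache -t app:build2 .

# A hermetic build should yield the same image ID
docker inspect app:build1 --format='{{.Id}}'
docker inspect app:build2 --format='{{.Id}}'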

# If different, investigate non-determinism
# (timestamps, random IDs, unordered maps)
```
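The same check works for any build output, not only images: build twice from one commit, hash every file, and diff. This Python sketch assumes each build wrote its artifacts to a separate directory, and the function names are hypothetical. Any files it reports are the places to look for embedded timestamps or unordered output.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def nondeterministic_files(build_a: Path, build_b: Path) -> List[str]:
    """List files that are missing from one build or differ between builds."""
    a, b = tree_digests(build_a), tree_digests(build_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```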

## 55.8 Keep It Simple

Complexity is the enemy of reliability. Prefer simple solutions that are understood over clever solutions that are opaque.

### Simplicity Principles

**YAGNI (You Aren't Gonna Need It)**:
```yaml
# Bad: Over-engineered for future needs
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu, windows, macos]  # Only need Linux now
        arch: [amd64, arm64]  # Only need amd64 now
        compiler: [gcc, clang]  # Only need gcc now
    runs-on: ${{ matrix.os }}
    
# Good: Simple, add complexity when needed
jobs:
  build:
    runs-on: ubuntu-latest
```

**Explicit over Implicit**:
```yaml
# Bad: Magic behavior
- name: Deploy
  run: deploy.sh  # What does this do? Where does it deploy?

# Good: Explicit parameters
- name: Deploy to Staging
  run: |
    helm upgrade --install payment-service ./helm/payment-service \
      --namespace staging \
      --set image.tag=${{ github.sha }} \
      --wait \
      --timeout 5m
```

**Single Responsibility**:
```yaml
# Bad: One pipeline does everything
jobs:
  build-test-deploy-notify-report:
    steps:
      - build
      - test
      - deploy
      - notify-slack
      - generate-report
      - update-dashboard

# Good: Separate concerns
# build.yml - builds artifact
# test.yml - runs tests
# deploy.yml - handles deployment
# notify.yml - sends notifications
```

### Anti-Patterns to Avoid

**The Monolithic Pipeline**:
```yaml
# Anti-pattern: One giant workflow for all services
# .github/workflows/monolith.yml
on:
  push:
    paths:
      - '**/*'  # Triggers on any change

jobs:
  build-all:
    runs-on: ubuntu-latest
    steps:
      - run: build service-a  # Builds even if only service-z changed
      - run: build service-b
      - run: build service-c
      # ... 50 more services
      - run: test all
      - run: deploy all  # All or nothing deployment

# Better: Separate workflows per service with path filters
# .github/workflows/service-a.yml
on:
  push:
    paths:
      - 'services/a/**'
      - '.github/workflows/service-a.yml'
```
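GitHub Actions evaluates `paths` filters itself, but the same mapping is useful when a single orchestrating job must decide which services to build. The Python sketch below assumes the `services/<name>/` layout above and a list of changed files such as `git diff --name-only` produces. The `SHARED_PREFIXES` rule, under which a change to shared code rebuilds everything, is also an assumption.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set

# Hypothetical paths whose change affects every service
SHARED_PREFIXES = ("libs/common/",)


def affected_services(changed_paths: Iterable[str], all_services: Set[str]) -> Set[str]:
    """Return the services whose directories (or shared code) changed."""
    affected: Set[str] = set()
    for path in changed_paths:
        if path.startswith(SHARED_PREFIXES):
            return set(all_services)  # Shared code: rebuild everything
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == "services" and parts[1] in all_services:
            affected.add(parts[1])
    return affected
```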

**Manual Interventions**:
```yaml
# Anti-pattern: Manual approval gates that block automation
jobs:
  deploy:
    steps:
      - name: Deploy to Staging
        run: deploy staging
      
      - name: Wait for manual approval
        uses: manual-approval-action  # Blocks pipeline for hours
      
      - name: Deploy to Production
        run: deploy prod

# Better: Automated gates based on metrics
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - run: deploy staging
      - run: smoke-tests
      - run: security-scan

  deploy-production:
    needs: deploy-staging
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - run: deploy canary 5%
      - run: automated-verification  # Metrics-based go/no-go
      - run: deploy 100%  # Automatic if metrics pass
```

**Hardcoded Configuration**:
```yaml
# Anti-pattern: Environment-specific values in code
# config/production.yml
database:
  host: prod-db-01.company.internal  # Hardcoded
  port: 5432
  password: SuperSecret123!  # Hardcoded secret

# Better: Templated with environment variables
database:
  host: ${DATABASE_HOST}
  port: ${DATABASE_PORT}
  password: ${DATABASE_PASSWORD}  # Injected from vault

# CI/CD injects per environment
env:
  DATABASE_HOST: ${{ secrets.PROD_DB_HOST }}
```
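A lightweight guard against this anti-pattern is a CI check that rejects literal values for sensitive keys in committed config. The Python sketch below assumes YAML-style `key: value` lines and uses a deliberately simple regular expression. It treats `${...}` and `${{ ... }}` placeholders as safe and ignores non-secret values such as hostnames. Dedicated scanners such as gitleaks or TruffleHog are far more thorough.

```python
import re
from typing import List, Tuple

# key: value where the key name suggests a credential
SENSITIVE_KEY = re.compile(
    r"^\s*(\w*(?:password|secret|token|api_key)\w*)\s*:\s*(.+?)\s*$", re.IGNORECASE
)
# ${VAR} or ${{ expr }}, optionally quoted, optionally followed by a comment
PLACEHOLDER = re.compile(r"^['\"]?\$\{\{?[^}]+\}\}?['\"]?(\s+#.*)?$")


def hardcoded_secrets(config_text: str) -> List[Tuple[int, str]]:
    """Return (line number, key) for sensitive keys assigned literal values."""
    findings = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        match = SENSITIVE_KEY.match(line)
        if match and not PLACEHOLDER.match(match.group(2)):
            findings.append((lineno, match.group(1)))
    return findings
```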

**Skipping Tests**:
```yaml
# Anti-pattern: Bypassing tests in "urgent" situations
jobs:
  deploy-hotfix:
    steps:
      - checkout
      # - run: tests  # Commented out "temporarily"
      - deploy production

# Better: Fast feedback with targeted tests
jobs:
  deploy-hotfix:
    steps:
      - checkout
      - run: affected-tests --since=main  # Only test changed code
      - run: critical-path-tests  # Always run smoke tests
      - deploy production
```

**Bloated Container Images**:
```dockerfile
# Anti-pattern: Including unnecessary tools
# Large, floating base image
FROM ubuntu:latest
# Editor, debugging tools, and a compiler that the app never needs at runtime
# (trailing comments after a line-continuation backslash would break the build)
RUN apt-get update && apt-get install -y \
    python3 \
    vim \
    curl \
    net-tools \
    gcc \
    && rm -rf /var/lib/apt/lists/*

COPY . /app
CMD ["python3", "/app/app.py"]

# Better: Minimal, purpose-built image
FROM python:3.11-slim-bookworm AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM gcr.io/distroless/python3-debian12
COPY --from=builder /root/.local /home/nonroot/.local
COPY --chown=nonroot:nonroot . /app
WORKDIR /app
USER nonroot
ENV PATH=/home/nonroot/.local/bin:$PATH
# The distroless python3 image's entrypoint is the interpreter, so pass only its arguments
CMD ["-m", "app"]
```

**Over-Engineering**:
```yaml
# Anti-pattern: Abstraction for hypothetical future needs
jobs:
  deploy:
    uses: ./.github/workflows/reusable-workflow.yml  # 500 lines of abstraction
    with:
      environment: production
      strategy: canary
      # 50 other parameters for "flexibility"

# Better: Explicit and simple
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Production
        run: |
          helm upgrade --install myapp ./chart \
            --namespace production \
            --set image.tag=${{ github.sha }} \
            --wait
```

**Siloed Teams**:
```markdown
# Anti-pattern: "Throw it over the wall"
Dev Team -> QA Team (2 week queue) -> Security Team (1 week scan) 
  -> Ops Team (manual deployment)

# Better: Shared pipeline with automated gates
Dev Team owns pipeline including:
- Automated unit tests (fast feedback)
- Automated security scan (no queue)
- Automated deployment (self-service)
- Shared on-call rotation
```

**Security as Afterthought**:
```yaml
# Anti-pattern: Security review at end
stages:
  - build
  - test
  - deploy
  - security_review  # Too late!

# Better: Shift left
stages:
  - lint_and_unit_test
  - security_scan:  # Fail fast
      - sast
      - dependency_check
      - secret_detection
  - build_hardened_image
  - integration_test
  - deploy
```

---

## Chapter Summary and Preview

This chapter synthesized the patterns and anti-patterns that distinguish high-performing CI/CD implementations from struggling ones. The **Twelve-Factor App** methodology provides the foundational philosophy: codebase discipline, explicit dependency management, externalized configuration, stateless processes, and disposability. These principles, originally formulated for applications, apply with equal force to CI/CD infrastructure itself.

**Immutable infrastructure** eliminates an entire class of configuration drift bugs by ensuring that changes are made through replacement rather than mutation. When servers are cattle, not pets, and containers are ephemeral, not maintained, systems become predictable and recoverable. **Infrastructure as Code** extends immutability to the entire stack, enabling version-controlled, tested, and peer-reviewed infrastructure changes that can be rolled back as easily as code changes.

**Small, frequent changes** reduce the mean time to detect defects and limit the blast radius of failures. By deploying changes in hours rather than months, teams maintain context and can isolate root causes quickly. **Automation of toil**—manual, repetitive work—frees engineering time for higher-value activities and eliminates the errors inherent in human execution.

**Fail-fast** mechanisms surface errors immediately at their source rather than propagating them downstream where they compound and obscure root causes. **Repeatability** through hermetic builds ensures that the same inputs always produce the same outputs, enabling reproducible debugging and supply chain verification. Finally, **simplicity**—resisting the urge to over-engineer for hypothetical future requirements—produces systems that can be understood, maintained, and debugged by the teams operating them.

The anti-patterns—**monolithic pipelines**, **manual interventions**, **hardcoded configuration**, **skipped tests**, **bloated images**, **over-engineering**, **siloed teams**, and **security as afterthought**—represent the common traps that organizations fall into when under pressure or lacking experience. Recognizing these patterns enables teams to avoid them proactively.

**Key Takeaways:**
- Apply Twelve-Factor principles to CI/CD infrastructure itself, not just applications.
- Implement immutable infrastructure; never mutate running servers, always replace.
- Practice infrastructure as code with the same review and testing rigor as application code.
- Deploy small changes frequently (hours, not months) to reduce risk and improve debuggability.
- Automate all toil; if a human does it twice, write a script.
- Design fail-fast systems that surface errors at their source.
- Ensure hermetic, repeatable builds for supply chain integrity and debugging.
- Prioritize simplicity; complex systems fail in complex ways.
- Avoid the eight deadly anti-patterns: monoliths, manual steps, hardcoding, skipped tests, bloat, over-engineering, silos, and security afterthoughts.

**Next Chapter Preview:** Chapter 56: Anti-Patterns to Avoid provides a deeper dive into the specific anti-patterns that undermine CI/CD initiatives. We will examine in detail the **Giant Monolith Pipeline** that couples unrelated services and creates cascading failures, **Manual Intervention Gates** that introduce human error and delay, **Hardcoded Secrets and Configuration** that create security vulnerabilities and prevent environment portability, **Skipping Tests in Production** that defeats the purpose of continuous integration, **Bloated Container Images** that increase attack surface and startup time, **Over-Engineered Abstractions** that obscure understanding and hinder debugging, **Siloed Team Structures** that create handoffs and blame cultures, and **Security as a Final Gate** that results in expensive rework and delayed releases. For each anti-pattern, we will provide specific detection criteria, refactoring strategies, and prevention measures to help teams recognize and eliminate these practices from their workflows.