# Chapter 42: Infrastructure as Code in CI/CD

Infrastructure as Code (IaC) extends the principles of version control, automated testing, and continuous delivery from application code to infrastructure provisioning. By defining cloud resources (virtual networks, compute instances, databases, Kubernetes clusters) as declarative configuration files, organizations eliminate manual console configuration, ensure reproducible environments, and enable peer review of infrastructure changes. Integrating IaC into CI/CD pipelines requires addressing unique challenges: state management for concurrent modifications, plan/apply workflows that prevent destructive changes, and drift detection when manual interventions bypass automated processes.

This chapter establishes patterns for managing infrastructure through code, from local development environments to production cloud deployments. We examine Terraform workflows within CI/CD contexts, strategies for secure state management, Kubernetes-native infrastructure definitions, and policy enforcement that ensures infrastructure compliance before resource creation.

## 42.1 Terraform Fundamentals

Terraform uses HashiCorp Configuration Language (HCL) to declare infrastructure resources, building dependency graphs and executing API calls to provision cloud resources.

### Project Structure

Organize infrastructure code for maintainability across environments:

```
infrastructure/
├── modules/                    # Reusable infrastructure components
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── README.md
│   ├── eks-cluster/
│   └── rds-postgres/
├── environments/               # Environment-specific configurations
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
├── policies/                   # Policy as Code definitions
│   └── sentinel/
└── .github/workflows/          # CI/CD automation
    └── terraform.yml
```

**Module Structure Explanation:**
- **modules/**: Contains reusable, composable infrastructure components (VPC, EKS, RDS). Each module encapsulates a logical unit of infrastructure with clear inputs (variables) and outputs.
- **environments/**: Instantiation of modules for specific environments. These directories contain the root Terraform configurations that call modules with environment-specific parameters.
- **policies/**: Sentinel or OPA policies for governance (Terraform Enterprise/Cloud or custom validation).

### Basic Configuration

**Provider Configuration:**
```hcl
# environments/production/providers.tf
terraform {
  required_version = ">= 1.5.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
  }
  
  # Remote state configuration (critical for teams)
  backend "s3" {
    bucket         = "company-terraform-state-production"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks-production"
  }
}

provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
      Repository  = "company/infrastructure"
    }
  }
  
  # Assume role for production access
  assume_role {
    role_arn     = var.terraform_role_arn
    session_name = "terraform-ci"
  }
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}
```

**Explanation:**
- **required_providers**: Pins provider versions to prevent breaking changes from automatic updates. The `~> 5.0` constraint allows any 5.x release (5.0.1, 5.1.0) but not 6.0.0. To allow only patch updates, write `~> 5.0.0`.
- **backend "s3"**: Stores state remotely in S3 with DynamoDB for locking. This prevents concurrent modifications and enables team collaboration. Terraform 1.10 and later can also lock natively in S3 through the `use_lockfile` argument.
- **assume_role**: CI/CD pipelines assume an IAM role rather than using static credentials, providing audit trails and temporary credentials.
- **default_tags**: Automatically applies tags to all AWS resources for cost allocation and compliance.
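The `~>` operator is easy to misread, so here is a minimal Python sketch of its rule: the rightmost component of the constraint may increase, everything to its left is fixed. The function names are hypothetical, and the sketch handles only plain numeric versions (no pre-release suffixes).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a plain numeric version such as "5.1.0" into integers."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against a `~> constraint` expression."""
    base = parse_version(constraint)
    candidate = parse_version(version)
    if len(base) < 2:
        raise ValueError("this sketch expects at least MAJOR.MINOR")
    # Upper bound: drop the rightmost component and bump the one before it.
    upper = base[:-2] + (base[-2] + 1,)
    # Pad with zeros so tuples of different lengths compare correctly.
    width = max(len(base), len(candidate))

    def pad(parts: tuple[int, ...]) -> tuple[int, ...]:
        return parts + (0,) * (width - len(parts))

    return pad(base) <= pad(candidate) < pad(upper)


if __name__ == "__main__":
    for candidate in ("5.0.1", "5.1.0", "6.0.0"):
        print(candidate, satisfies_pessimistic(candidate, "5.0"))
```

With the constraint `5.0` the upper bound is 6.0, so every 5.x release qualifies. With `5.0.0` the upper bound becomes 5.1, which restricts updates to patch releases.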

### Resource Definitions

**VPC Module (modules/vpc/main.tf):**
```hcl
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.environment}-private-${count.index + 1}"
    Type = "private"
    "kubernetes.io/role/internal-elb" = "1"  # For EKS internal load balancers
  }
}

resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 100)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.environment}-public-${count.index + 1}"
    Type = "public"
    "kubernetes.io/role/elb" = "1"  # For EKS external load balancers
  }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  
  tags = {
    Name = "${var.environment}-igw"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  
  tags = {
    Name = "${var.environment}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
```

**Explanation:**
- **cidrsubnet**: Calculates subnet CIDRs automatically from the VPC CIDR. `cidrsubnet(var.vpc_cidr, 8, count.index)` adds 8 bits to the prefix, creating /24 subnets from a /16 VPC.
- **count**: Creates one subnet per availability zone. `length(var.availability_zones)` determines the count dynamically.
- **Tags**: The `kubernetes.io/role/elb` and `kubernetes.io/role/internal-elb` tags are essential for EKS: they tell the AWS Load Balancer Controller which subnets to use for external and internal load balancers, respectively.
- **Implicit Dependencies**: Terraform builds the dependency graph automatically: subnets reference the VPC ID, so Terraform creates the VPC first.
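To make the `cidrsubnet` arithmetic concrete, the following Python sketch reproduces the calculation with the standard `ipaddress` module. It is an illustration of the address math under the same three parameters, not Terraform's implementation.

```python
import ipaddress


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Return subnet number `netnum` after extending `prefix` by `newbits` bits."""
    network = ipaddress.ip_network(prefix)
    new_prefix = network.prefixlen + newbits
    if new_prefix > network.max_prefixlen:
        raise ValueError("not enough address bits left for newbits")
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError("netnum does not fit in newbits")
    # Number of addresses in each resulting subnet.
    size = 2 ** (network.max_prefixlen - new_prefix)
    start = network.network_address + netnum * size
    return f"{start}/{new_prefix}"


if __name__ == "__main__":
    vpc_cidr = "10.0.0.0/16"
    # Private subnets use indexes 0, 1, ...; public subnets are offset by 100.
    print([cidrsubnet(vpc_cidr, 8, i) for i in range(2)])
    print([cidrsubnet(vpc_cidr, 8, i + 100) for i in range(2)])
```

The offset of 100 keeps the public ranges (10.0.100.0/24 onward) clear of the private ones, which holds as long as there are fewer than 100 availability zones.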

## 42.2 State Management

Terraform state maps configuration to real-world resources. Remote state with locking is mandatory for team environments.

### Remote State Configuration

**S3 Backend with Encryption:**
```hcl
terraform {
  backend "s3" {
    bucket = "company-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
    
    # Enable state locking with DynamoDB
    dynamodb_table = "terraform-locks"
    
    # Encryption at rest
    encrypt = true
    
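    # Versioning and access logging are settings on the S3 bucket, not backend
    # arguments. Enable them on the bucket itself (for example with
    # aws_s3_bucket_versioning and aws_s3_bucket_logging in a bootstrap
    # configuration), sending logs to company-terraform-logs under state-access/.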
  }
}
```

**State Locking Mechanism:**
When Terraform runs, it acquires a lock in DynamoDB:
```json
{
  "LockID": {
    "S": "company-terraform-state/production/terraform.tfstate"
  },
  "Info": {
    "S": "{\"ID\":\"12345678-1234-1234-1234-123456789012\",\"Operation\":\"OperationTypeApply\",\"Who\":\"ci-runner@github-actions\",\"Version\":\"1.6.0\",\"Created\":\"2024-01-15T10:30:00Z\",\"Path\":\"\",\"Info\":\"\"}"
  }
}
```

**Explanation:**
If another Terraform process tries to run simultaneously, it finds the lock in DynamoDB and fails immediately, or keeps retrying for the duration given by `-lock-timeout` if that option is set. This prevents concurrent modifications that could corrupt state. The lock is released automatically when Terraform completes. If a process crashes and leaves a stale lock behind, remove it with `terraform force-unlock <LOCK_ID>`.
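The locking behavior can be sketched with an in-memory stand-in for the lock table. This is a simplified model, assuming the backend's "create the item only if it does not already exist" semantics; `LockTable` and `LockHeld` are hypothetical names, and no AWS calls are made.

```python
class LockHeld(Exception):
    """Raised when another process already holds the state lock."""


class LockTable:
    """In-memory stand-in for the DynamoDB lock table (illustration only)."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, str]] = {}

    def acquire(self, lock_id: str, who: str, operation: str) -> None:
        # Conditional write: succeeds only if no item with this LockID exists.
        if lock_id in self._items:
            holder = self._items[lock_id]
            raise LockHeld(f"{lock_id} is locked by {holder['Who']} ({holder['Operation']})")
        self._items[lock_id] = {"Who": who, "Operation": operation}

    def release(self, lock_id: str) -> None:
        # Removing the item frees the lock; a stale lock needs the same removal.
        self._items.pop(lock_id, None)


if __name__ == "__main__":
    table = LockTable()
    lock_id = "company-terraform-state/production/terraform.tfstate"
    table.acquire(lock_id, "ci-runner@github-actions", "OperationTypeApply")
    try:
        table.acquire(lock_id, "alice@laptop", "OperationTypePlan")
    except LockHeld as error:
        print("second run rejected:", error)
    table.release(lock_id)
    table.acquire(lock_id, "alice@laptop", "OperationTypePlan")
```

Because the check and the write are a single conditional operation in the real table, two processes cannot both observe "no lock" and proceed.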

### State Security

**State Contains Secrets:**
```gitignore
# NEVER commit state files to Git!
# .gitignore
*.tfstate
*.tfstate.*
.terraform/
crash.log
override.tf
# Do commit .terraform.lock.hcl: it records the selected provider versions and checksums
```

**Remote State Data Sources:**
Reference outputs from other Terraform configurations:
```hcl
# environments/production/data.tf
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use networking outputs
resource "aws_security_group" "app" {
  vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
  
  ingress {
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = data.terraform_remote_state.networking.outputs.private_subnet_cidrs
  }
}
```

**Explanation:**
The `terraform_remote_state` data source reads outputs from the networking state file. This enables the micro-stacks pattern (separate state files for networking, databases, and applications), which avoids one monolithic state file while preserving dependencies between stacks. Note that the consumer needs read access to the entire networking state, not only to its outputs.

## 42.3 Plan/Apply Workflow

The plan/apply workflow is Terraform's equivalent of build/test/deploy, providing visibility into changes before execution.

### CI/CD Integration

**GitHub Actions Workflow:**
```yaml
# .github/workflows/terraform.yml
name: Infrastructure CI/CD

on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
  pull_request:
    paths:
      - 'infrastructure/**'

env:
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"
          terraform_wrapper: false
      
      - name: Terraform Format Check
        working-directory: infrastructure
        run: terraform fmt -check -recursive
      
      - name: Terraform Init
        working-directory: infrastructure/environments/production
        run: terraform init -backend=false
      
      - name: Terraform Validate
        working-directory: infrastructure/environments/production
        run: terraform validate

  plan:
    needs: validate
    runs-on: ubuntu-latest
    permissions:
      id-token: write        # Required for OIDC role assumption
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/TerraformCIRole
          aws-region: us-east-1
      
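      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"  # Pin the version; a saved plan must be applied by the version that created it
          terraform_wrapper: false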
      
      - name: Terraform Init
        working-directory: infrastructure/environments/production
        run: terraform init
      
      - name: Terraform Plan
        id: plan
        working-directory: infrastructure/environments/production
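        run: |
          terraform plan -no-color -out=tfplan
          # Render the saved plan as text for the PR comment step
          terraform show -no-color tfplan > tfplan.stdout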
      
      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: terraform-plan
          path: infrastructure/environments/production/tfplan
      
      - name: Comment Plan on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/environments/production/tfplan.stdout', 'utf8');
            // Truncate to stay under GitHub's comment size limit
            const truncated = plan.substring(0, 65000);
            
            const output = `#### Terraform Plan 📖
            <details><summary>Show Plan</summary>
            
            \`\`\`hcl
            ${truncated}
            \`\`\`
            
            </details>`;
            
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production  # Requires manual approval
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    permissions:
      id-token: write        # Required for OIDC role assumption
      contents: read
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/TerraformCIRole
          aws-region: us-east-1
      
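      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.6.0"
      
      - name: Download Plan
        uses: actions/download-artifact@v4
        with:
          name: terraform-plan
          path: infrastructure/environments/production  # Place tfplan where apply runs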
      - name: Terraform Init
        working-directory: infrastructure/environments/production
        run: terraform init
      
      - name: Terraform Apply
        working-directory: infrastructure/environments/production
        run: terraform apply -auto-approve tfplan
      
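      - name: Store Errored State
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: terraform-state-backup
          # With a remote backend there is no local terraform.tfstate. If Terraform
          # cannot write state to the backend, it saves errored.tfstate instead.
          # The file can contain secrets, so restrict artifact access and retention.
          path: infrastructure/environments/production/errored.tfstate
          if-no-files-found: ignore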
```

**Explanation:**
- **validate**: Checks formatting (`fmt -check`) and syntax (`validate`). Runs on every PR.
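- **plan**: Generates execution plans and posts them as PR comments for review. Uses OIDC for AWS authentication (no static credentials), which requires the `id-token: write` permission on the job.
- **apply**: Only runs on pushes to `main`, requires manual approval via GitHub Environments, and applies the saved plan artifact from the same workflow run, so the plan that approvers inspect is exactly what gets applied.
- **TF_IN_AUTOMATION**: Tells Terraform it is running non-interactively, so it omits hints about which command to run next. `TF_INPUT=false` is what disables interactive prompts, and `-no-color` removes color codes from the output.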

### Plan Output Analysis

**Interpreting Plans:**
```hcl
# Plan output example
Terraform will perform the following actions:

  # aws_instance.web will be created
  + resource "aws_instance" "web" {
      + ami                          = "ami-0c55b159cbfafe1f0"
      + instance_type                = "t3.micro"
      + subnet_id                    = "subnet-12345678"
      + vpc_security_group_ids       = [
          + "sg-12345678",
        ]
      + tags                         = {
          + "Name" = "web-server"
        }
    }

  # aws_security_group.web will be updated in-place
  ~ resource "aws_security_group" "web" {
        id                     = "sg-12345678"
        name                   = "web-sg"
      ~ ingress                = [
          + {
              + cidr_blocks      = [
                  + "10.0.0.0/8",
                ]
              + description      = "Allow internal traffic"
              + from_port        = 443
              + protocol         = "tcp"
              + to_port          = 443
            },
        ]
    }

  # aws_instance.old will be destroyed
  - resource "aws_instance" "old" {
      - ami = "ami-0987654321" -> null
    }

Plan: 1 to add, 1 to change, 1 to destroy.
```

**Explanation:**
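- **+**: Create (green in colored output)
- **~**: Update in-place (yellow)
- **-**: Destroy (red)
- **-/+**: Replace (destroy, then create); the plan marks the responsible attribute with `# forces replacement`

The plan shows what Terraform intends to do. Reviewers should watch for:
- Destructive changes (destroy or replace) that might lose data
- Security group modifications that open unintended access
- Changes that force replacement (for example, a new AMI on an `aws_instance`) or a restart (an instance type change stops and starts the instance), both of which cause downtime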
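Reviewers can be helped by a script that flags destructive actions automatically. The sketch below, assuming the plan has been exported with `terraform show -json tfplan`, scans the `resource_changes` list for deletions and replacements. The function name and the sample data are illustrative.

```python
import json


def risky_changes(plan: dict) -> list[str]:
    """List resources that the plan would destroy or replace."""
    findings = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if "delete" in actions and "create" in actions:
            # Replacement appears as both actions, in either order.
            findings.append(f"REPLACE {change['address']}")
        elif "delete" in actions:
            findings.append(f"DESTROY {change['address']}")
    return findings


if __name__ == "__main__":
    # Hand-written sample in the shape of the plan JSON; not real output.
    sample = json.loads("""
    {"resource_changes": [
      {"address": "aws_instance.web", "change": {"actions": ["create"]}},
      {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
      {"address": "aws_instance.old", "change": {"actions": ["delete"]}},
      {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}}
    ]}
    """)
    for finding in risky_changes(sample):
        print(finding)
```

A pipeline could fail the job, or require an extra approval, whenever this list is non-empty.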

## 42.4 Kubernetes Manifests in Git

While Terraform provisions infrastructure, Kubernetes manifests define the desired state of cluster resources. Keeping manifests in Git enables GitOps workflows.

### Terraform Kubernetes Provider

Manage K8s resources via Terraform for infrastructure-level components:

```hcl
# modules/eks-addons/main.tf
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
    
    labels = {
      "pod-security.kubernetes.io/enforce" = "restricted"
      "pod-security.kubernetes.io/audit"   = "restricted"
    }
  }
}

resource "helm_release" "prometheus" {
  name       = "prometheus"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "kube-prometheus-stack"
  version    = "55.0.0"
  namespace  = kubernetes_namespace.monitoring.metadata[0].name
  
  values = [
    templatefile("${path.module}/values/prometheus-values.yaml", {
      retention     = var.prometheus_retention
      storage_class = var.storage_class
    })
  ]
  
  set {
    name  = "grafana.enabled"
    value = "true"
  }
  
  depends_on = [kubernetes_namespace.monitoring]
}

resource "kubernetes_storage_class" "ebs_ssd" {
  metadata {
    name = "ebs-ssd"
  }
  
  storage_provisioner = "ebs.csi.aws.com"
  reclaim_policy      = "Retain"
  volume_binding_mode = "WaitForFirstConsumer"
  
  parameters = {
    type      = "gp3"
    encrypted = "true"
    kmsKeyId  = var.ebs_kms_key_id
  }
}
```

**Explanation:**
- **kubernetes_namespace**: Creates namespaces with Pod Security Standards labels (enforces restricted security profile).
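- **helm_release**: Deploys Helm charts via Terraform. Referencing `kubernetes_namespace.monitoring.metadata[0].name` already creates an implicit dependency on the namespace, so the explicit `depends_on` is redundant but harmless.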
- **templatefile**: Renders Helm values from templates, allowing environment-specific configuration injection.
- **storage_class**: Infrastructure-level K8s resource (storage provisioning) appropriately managed by Terraform.

### Separating Application and Infrastructure

**Infrastructure (Terraform):**
- VPCs, subnets, NAT gateways
- EKS clusters, node groups
- RDS instances, ElastiCache
- IAM roles, security groups
- Storage classes, ingress controllers

**Applications (GitOps/ArgoCD/Flux):**
- Microservice Deployments
- Service configurations
- ConfigMaps and Secrets
- HorizontalPodAutoscalers
- Application-specific ingress rules

**Boundary:**
```hcl
# Terraform outputs infrastructure info for GitOps
output "cluster_name" {
  value = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "irsa_role_arns" {
  value = {
    payment_service = module.payment_service_irsa.iam_role_arn
    order_service   = module.order_service_irsa.iam_role_arn
  }
}
```

**GitOps Application:**
```yaml
# apps/base/payment-service/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
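spec:
  selector:
    matchLabels:
      app: payment-service  # Label key and value are illustrative
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      serviceAccountName: payment-service
      containers:
      - name: payment
        image: payment-service:v2.1.0
        env:
        - name: AWS_ROLE_ARN
          # Placeholder, not Kubernetes syntax: the delivery pipeline must
          # substitute the Terraform output before the manifest is applied.
          value: "${terraform_outputs.irsa_role_arns.payment_service}"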
```
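Kubernetes does not resolve the `${terraform_outputs...}` placeholder, so something in the pipeline has to. One possible approach, sketched below, reads the JSON printed by `terraform output -json` and substitutes the placeholders before the manifest is committed to the GitOps repository. The placeholder syntax follows the manifest above; the function names and the sample ARN are hypothetical.

```python
import json
import re

# Matches ${terraform_outputs.<output_name>[.<nested_key>...]}
PLACEHOLDER = re.compile(r"\$\{terraform_outputs\.([A-Za-z0-9_.]+)\}")


def load_outputs(raw_json: str) -> dict:
    """Reduce `terraform output -json` to a mapping of output name to value."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}


def render(manifest: str, outputs: dict) -> str:
    """Replace every placeholder; an unknown name raises KeyError."""
    def lookup(match: re.Match) -> str:
        node = outputs
        for key in match.group(1).split("."):
            node = node[key]
        return str(node)

    return PLACEHOLDER.sub(lookup, manifest)


if __name__ == "__main__":
    # Hand-written sample in the shape of `terraform output -json`.
    raw = json.dumps({
        "irsa_role_arns": {
            "sensitive": False,
            "value": {"payment_service": "arn:aws:iam::123456789012:role/payment"},
        }
    })
    line = 'value: "${terraform_outputs.irsa_role_arns.payment_service}"'
    print(render(line, load_outputs(raw)))
```

Failing loudly on an unknown output name is deliberate: a silently empty role ARN would only surface later as an authorization error inside the pod.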

## 42.5 Kustomize Integration

Kustomize transforms base Kubernetes manifests for different environments without templating.

### Structure

```
k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── kustomization.yaml
└── overlays/
    ├── development/
    │   ├── kustomization.yaml
    │   └── replica-patch.yaml
    ├── staging/
    └── production/
```

**Base Configuration (k8s/base/kustomization.yaml):**
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - deployment.yaml
  - service.yaml
  - ingress.yaml

commonLabels:
  app.kubernetes.io/managed-by: kustomize
  app.kubernetes.io/part-of: payment-platform

images:
  - name: payment-service
    newTag: v2.1.0  # Overridden in overlays
```

**Production Overlay (k8s/overlays/production/kustomization.yaml):**
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: production

resources:
  - ../../base
  - pdb.yaml
  - hpa.yaml

namePrefix: prod-

commonLabels:
  environment: production
  tier: critical

patches:
  - path: deployment-patch.yaml
  - path: resource-patch.yaml

configMapGenerator:
  - name: payment-config
    behavior: merge
    literals:
      - LOG_LEVEL=warn
      - DB_POOL_SIZE=50

replicas:
  - name: payment-service
    count: 5

images:
  - name: payment-service
    newTag: v2.1.0-abc123  # Immutable tag from CI; tags cannot contain ':', so pin digests with the separate digest field
```

**Deployment Patch (deployment-patch.yaml):**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  template:
    spec:
      containers:
        - name: payment
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          securityContext:
            runAsNonRoot: true
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
```

**Explanation:**
- **namespace**: All resources deployed to production namespace
- **namePrefix**: Resources become prod-payment-service
- **patches**: Strategic merge patches update specific fields (resources, security contexts) without duplicating entire YAML
- **replicas**: Overrides the replica count from base
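- **configMapGenerator**: With `behavior: merge`, overrides specific keys of a ConfigMap generator named `payment-config` that the base must define; the build fails if the base has no generator with that name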
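The effect of a strategic merge patch on the Deployment can be approximated in a few lines of Python. This is a deliberately simplified model: it merges maps recursively and matches list items by their `name` field, whereas the real implementation takes merge keys from the resource schema and supports additional directives.

```python
def strategic_merge(base, patch):
    """Merge `patch` into `base` without mutating either argument."""
    if isinstance(base, dict) and isinstance(patch, dict):
        merged = dict(base)
        for key, value in patch.items():
            merged[key] = strategic_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(patch, list):
        items = base + patch
        if items and all(isinstance(item, dict) and "name" in item for item in items):
            # Lists of named objects (such as containers) merge item by item.
            patches = {item["name"]: item for item in patch}
            known = {item["name"] for item in base}
            merged = [
                strategic_merge(item, patches[item["name"]]) if item["name"] in patches else item
                for item in base
            ]
            return merged + [item for item in patch if item["name"] not in known]
    # Scalars and other lists are replaced by the patch value.
    return patch


if __name__ == "__main__":
    base = {"spec": {"replicas": 1, "template": {"spec": {"containers": [
        {"name": "payment", "image": "payment-service:v2.1.0"}]}}}}
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": "payment", "resources": {"limits": {"memory": "2Gi"}}}]}}}}
    print(strategic_merge(base, patch))
```

The patched container keeps its `image` from the base and gains `resources` from the patch, which is why the overlay does not need to repeat the whole Deployment.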

### Terraform Kustomize Provider

Apply Kustomize via Terraform for infrastructure bootstrapping:

```hcl
data "kustomization_build" "production" {
  path = "${path.module}/../../k8s/overlays/production"
}

resource "kustomization_resource" "production" {
  for_each = data.kustomization_build.production.ids
  
  manifest = data.kustomization_build.production.manifests[each.value]
  
  depends_on = [aws_eks_cluster.main]
}
```

**Explanation:**
The `kustomization_build` data source, provided by the community `kbst/kustomization` provider, runs a Kustomize build during Terraform plan, rendering the final manifests. Terraform then manages these as individual resources. This is useful for initial cluster bootstrapping, though GitOps tools (ArgoCD/Flux) are preferred for ongoing application management.

## 42.6 Drift Detection

Drift occurs when manual changes (via AWS Console or kubectl) diverge from Terraform state.

### Automated Drift Detection

**Terraform Plan in CI:**
```yaml
# .github/workflows/drift-detection.yml
name: Infrastructure Drift Detection
on:
  schedule:
    - cron: '0 */6 * * *'  # Every 6 hours

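jobs:
  detect:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # Required for OIDC role assumption
      contents: read
    defaults:
      run:
        working-directory: infrastructure/environments/production
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/TerraformCIRole
          aws-region: us-east-1
      
      # The default wrapper exposes the exit code as steps.<id>.outputs.exitcode
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      
      - name: Terraform Init
        run: terraform init
      
      - name: Check Drift
        id: plan
        run: terraform plan -detailed-exitcode -input=false
        continue-on-error: true
      
      - name: Fail on Plan Error
        if: steps.plan.outputs.exitcode == '1'
        run: exit 1
      
      - name: Alert on Drift
        if: steps.plan.outputs.exitcode == '2'
        uses: slackapi/slack-github-action@v1
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}  # Secret name is an assumption
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
        with:
          payload: |
            {
              "text": "🚨 Infrastructure drift detected in production!",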
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Manual changes detected. Run `terraform plan` locally to review."
                  }
                }
              ]
            }
```

**Explanation:**
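With `-detailed-exitcode`, `terraform plan` exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. On a scheduled run against code that has already been applied, exit code 2 means the real infrastructure no longer matches the configuration, which usually indicates a manual modification. The workflow runs every 6 hours and alerts Slack when that happens.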
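Outside GitHub Actions, the same check can be scripted directly. The sketch below wraps `terraform plan -detailed-exitcode` and maps the three documented exit codes to a status; the function names and status strings are illustrative, and `check_drift` assumes `terraform` is on the `PATH` and the directory has been initialized.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    if code == 0:
        return "in-sync"
    if code == 2:
        return "changes-pending"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a plan in `workdir` and report whether changes are pending."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, classify_plan_exit(code))
```

Treating every code other than 0 and 2 as an error keeps a failed plan (expired credentials, unreachable backend) from being mistaken for "no drift".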

### Remediation Strategies

**Option 1: Terraform Refresh (Adopt Changes)**
```bash
terraform apply -refresh-only
# Updates state to match reality without changing resources.
# Also update the configuration to match, or the next plan will propose reverting the change.
```

**Option 2: Manual Reversion (Restore to Code)**
```bash
terraform apply
# Reverts manual changes back to code definition
```

**Option 3: Policy Enforcement (Prevent Manual Changes)**
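Use AWS Service Control Policies or Azure Policy to deny changes made by any principal other than the Terraform role. The policy below denies the listed EC2 actions unless the calling principal carries the tag `ManagedBy=terraform`: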

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyManualEC2Changes",
      "Effect": "Deny",
      "Action": [
        "ec2:TerminateInstances",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalTag/ManagedBy": "terraform"
        }
      }
    }
  ]
}
```

## 42.7 Testing Infrastructure

Infrastructure testing validates configurations before deployment.

### Terratest (Go)

```go
// test/vpc_test.go
package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
    t.Parallel()
    
    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "vpc_cidr":           "10.0.0.0/16",
            "availability_zones": []string{"us-east-1a", "us-east-1b"},
            "environment":        "test",
        },
    }
    
    // Clean up resources after test
    defer terraform.Destroy(t, terraformOptions)
    
    // Deploy
    terraform.InitAndApply(t, terraformOptions)
    
    // Validate outputs
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcId)
    
    privateSubnets := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
    assert.Equal(t, 2, len(privateSubnets))
    
    // AWS API validation
    awsRegion := "us-east-1"
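    vpc := aws.GetVpcById(t, vpcId, awsRegion)
    // CidrBlock is a *string in recent Terratest releases; guard before dereferencing
    if assert.NotNil(t, vpc.CidrBlock) {
        assert.Equal(t, "10.0.0.0/16", *vpc.CidrBlock)
    }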
    
    subnets := aws.GetSubnetsForVpc(t, vpcId, awsRegion)
    assert.Equal(t, 4, len(subnets)) // 2 private + 2 public
}
```

**Explanation:**
Terratest deploys real infrastructure in a test environment, validates the outputs, queries AWS APIs to verify resources were created correctly, then destroys everything. This catches configuration errors that `terraform validate` misses (e.g., invalid AMI IDs, insufficient IAM permissions).

### Terraform Compliance (tfsec/Checkov)

**tfsec Security Scanning:**
```yaml
- name: Security Scan
  uses: aquasecurity/tfsec-action@v1.1.0
  with:
    soft_fail: true  # Report findings without failing the job; set to false to block merges
    additional_args: --config-file tfsec.yml
```

**tfsec Configuration (tfsec.yml):**
```yaml
---
exclude:
  - aws-iam-no-policy-wildcards  # We use wildcards intentionally for specific roles
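severity_overrides:
  # Keys are check IDs, values are the severity to assign
  aws-s3-enable-bucket-encryption: CRITICAL
  aws-s3-enable-versioning: HIGH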
```

**Checkov Policy as Code:**
```yaml
# checkov.yml
skip-check:
  - CKV_AWS_23  # Skip "Ensure every security groups rule has a description"
  
compact: true
quiet: true
```

## 42.8 Policy as Code

Enforce organizational standards before infrastructure creation.

### Sentinel (Terraform Cloud/Enterprise)

```hcl
# policies/restrict-instance-type.sentinel
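import "tfplan/v2" as tfplan

# Allowed instance types
allowed_types = ["t3.micro", "t3.small", "t3.medium", "t3.large"]

# Get all AWS instances being created or updated in the plan
# (deleted resources have no "after" values to inspect)
aws_instances = filter tfplan.resource_changes as _, rc {
    rc.type is "aws_instance" and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or rc.change.actions contains "update")
}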

# Validate each instance
violations = 0
for aws_instances as _, instance {
    if instance.change.after.instance_type not in allowed_types {
        violations += 1
        print("Instance", instance.address, "uses disallowed type:", 
              instance.change.after.instance_type)
    }
}

# Enforce policy
main = rule {
    violations == 0
}
```

**Policy Enforcement:**
```hcl
# sentinel.hcl
policy "restrict-instance-type" {
    source            = "./restrict-instance-type.sentinel"
    enforcement_level = "hard-mandatory"  # Blocks apply
    # Alternatives: "soft-mandatory" (blocks unless an authorized user overrides), "advisory" (logs only)
}
```

### OPA (Open Policy Agent) for Terraform

```rego
# policies/terraform.rego
package terraform.aws.security

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Deny public S3 buckets
deny contains msg if {
    # Iterate over the values of resource_changes, not its indexes
    some resource in input.resource_changes
    resource.type == "aws_s3_bucket"
    resource.change.after.acl == "public-read"

    msg := sprintf("S3 bucket %s must not be public", [resource.address])
}

# Deny unencrypted RDS
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_db_instance"
    not resource.change.after.storage_encrypted

    msg := sprintf("RDS instance %s must be encrypted", [resource.address])
}

# Deny overly permissive security groups
deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_security_group_rule"
    "0.0.0.0/0" in resource.change.after.cidr_blocks
    resource.change.after.from_port == 22

    msg := sprintf("Security group %s allows SSH from 0.0.0.0/0", [resource.address])
}
```

**Explanation:**
OPA evaluates the Terraform plan JSON (produced by `terraform show -json`) against Rego policies; the `input` document is that JSON. Each `deny` rule adds a message for a resource that violates a security standard (public S3 buckets, unencrypted databases, SSH open to the internet).
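For readers less familiar with Rego, the same three checks can be written out in Python to show which fields of the plan JSON they read. This is a rough equivalent for illustration rather than a replacement for OPA; the function name is hypothetical, and unlike the Rego rules it skips resources that are being deleted (those have no `after` values).

```python
def deny(plan: dict) -> list[str]:
    """Return one message per policy violation found in the plan JSON."""
    messages = []
    for resource in plan.get("resource_changes", []):
        after = resource.get("change", {}).get("after")
        if not after:
            continue  # Deleted resources have nothing to check
        kind, address = resource["type"], resource["address"]
        if kind == "aws_s3_bucket" and after.get("acl") == "public-read":
            messages.append(f"S3 bucket {address} must not be public")
        if kind == "aws_db_instance" and not after.get("storage_encrypted"):
            messages.append(f"RDS instance {address} must be encrypted")
        if (
            kind == "aws_security_group_rule"
            and "0.0.0.0/0" in (after.get("cidr_blocks") or [])
            and after.get("from_port") == 22
        ):
            messages.append(f"Security group {address} allows SSH from 0.0.0.0/0")
    return messages


if __name__ == "__main__":
    # Hand-written sample in the shape of the plan JSON; not real output.
    sample = {"resource_changes": [
        {"type": "aws_db_instance", "address": "aws_db_instance.main",
         "change": {"actions": ["create"], "after": {"storage_encrypted": False}}},
        {"type": "aws_s3_bucket", "address": "aws_s3_bucket.logs",
         "change": {"actions": ["create"], "after": {"acl": "private"}}},
    ]}
    for message in deny(sample):
        print(message)
```

A small fixture like the sample above also works as a regression test for the Rego rules: the policy engine and the script should agree on which resources are rejected.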

### Conftest Integration

```bash
# Test Terraform plan against OPA policies
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

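# The policies live in package terraform.aws.security rather than conftest's default "main" namespace
conftest test plan.json -p policies/ --all-namespaces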
```

**CI Integration:**
```yaml
- name: Policy Check
  run: |
    terraform plan -out=tfplan
    terraform show -json tfplan > plan.json
    conftest test plan.json -p policies/ --all-namespaces
```

---

## Chapter Summary and Preview

This chapter established Infrastructure as Code as a critical component of continuous delivery, extending version control and automated testing principles to cloud infrastructure provisioning. We examined Terraform's declarative configuration language and the essential plan/apply workflow that provides visibility into infrastructure changes before execution. Remote state management using S3 with DynamoDB locking enables team collaboration while preventing concurrent modification conflicts, and state security practices ensure sensitive outputs remain protected.

The separation of concerns between infrastructure provisioning (Terraform) and application deployment (GitOps/ArgoCD/Flux) creates clear boundaries, with Terraform handling VPCs, clusters, and managed databases while Kubernetes manifests manage microservice deployments. Kustomize provides environment-specific configuration management without templating complexity, enabling overlay patterns that maintain DRY principles while supporting production-specific security hardening and resource allocation.

Drift detection mechanisms identify manual console modifications that bypass automated pipelines, with scheduled reconciliation jobs ensuring infrastructure reality matches code definitions. Testing strategies using Terratest validate real infrastructure deployment, while static analysis tools (tfsec, Checkov) and Policy as Code frameworks (Sentinel, OPA) enforce security and compliance standards before resource creation, preventing misconfigurations from reaching production.

**Key Takeaways:**
- Always use remote state with locking (S3 + DynamoDB) for team environments to prevent state corruption, and enable versioning on the state bucket for disaster recovery.
- Implement the plan/apply workflow with mandatory PR reviews for plan output; never run `terraform apply` locally in production environments.
- Separate infrastructure (Terraform) from application configuration (GitOps), using Terraform outputs to provide infrastructure endpoints (database URLs, IAM roles) to applications.
- Use the expand-contract pattern for infrastructure changes: add new resources alongside old ones, migrate dependencies, then remove old resources to prevent downtime.
- Implement automated drift detection via scheduled CI jobs that alert when manual console changes diverge from Terraform state, with policies preventing manual modifications where possible.
- Enforce Policy as Code using OPA or Sentinel to automatically deny non-compliant resources (public S3 buckets, unencrypted databases, overly permissive security groups) during the plan phase.

**Next Chapter Preview:**
Chapter 43: Logging in CI/CD explores comprehensive strategies for capturing, aggregating, and analyzing logs across distributed systems. We will examine structured logging formats (JSON) that enable machine parsing, the EFK/PLG stack (Elasticsearch-Fluentd-Kibana / Prometheus-Loki-Grafana) for log aggregation, centralized logging patterns that collect container logs from Kubernetes clusters, and correlation techniques using trace IDs that connect logs across microservices. The chapter covers log retention policies, sensitive data redaction, and integration with CI/CD pipelines for build and deployment audit trails, establishing observability foundations that support debugging, security forensics, and compliance requirements across the entire software delivery lifecycle.