# **Chapter 13: Cloud-Native Security**

## Introduction: The Shared Responsibility Boundary

Cloud-native security represents a paradigm shift from traditional infrastructure protection. In bare-metal and virtualized environments, you controlled the entire stack—from the physical hardware to the application layer. In cloud-native architectures, boundaries blur: your code runs in containers orchestrated by Kubernetes, deployed via Infrastructure as Code (IaC), and distributed across availability zones you may never physically visit.

This shift introduces the **Shared Responsibility Model**, where cloud providers secure the *platform* (hardware, hypervisors, network fabric), while you secure everything *in* the cloud: the operating system (if using IaaS), container images, orchestration configurations, application secrets, and data. A single misconfigured S3 bucket or overly privileged Kubernetes ServiceAccount can expose your entire infrastructure, regardless of how secure your application code is.

Cloud-native security follows the **Defense in Depth** principle across four distinct layers: the **Image** (what you build), the **Registry** (where you store it), the **Orchestrator** (how you run it), and the **Host** (where it executes). This chapter guides you through securing each layer, following the CIS Benchmarks, NIST SP 800-190 (Container Security Guide), and the Cloud Native Computing Foundation (CNCF) security whitepapers.

By the end, you will understand how to harden container images, implement zero-trust networking in Kubernetes, secure your Infrastructure as Code pipelines, and maintain visibility in ephemeral, distributed systems.

---

## 13.1 Cloud Security Fundamentals: The Shared Responsibility Model

Before deploying containers, you must understand where your security obligations begin and end. The three primary cloud service models—IaaS, PaaS, and SaaS—each shift responsibility differently.

### Responsibility Matrix

| Layer | On-Premises | IaaS (e.g., EC2, GCE) | PaaS (e.g., Elastic Beanstalk, App Engine) | SaaS (e.g., Google Workspace, Salesforce) |
|-------|-------------|----------------|----------------|---------------|
| **Data** | You | You | You | You |
| **Application** | You | You | You | Provider |
| **Runtime/Middleware** | You | You | Provider | Provider |
| **Operating System** | You | You | Provider | Provider |
| **Virtualization** | You | Provider | Provider | Provider |
| **Physical Hardware** | You | Provider | Provider | Provider |

**Critical Insight:** When using managed Kubernetes (EKS, AKS, GKE), the control plane (API server, etcd, scheduler) is the provider's responsibility, but the worker nodes, pod security, and network policies remain yours.

### Cloud-Native Threat Model

Cloud-native environments face unique threats:

1. **Container Escape**: Breaking out of the container namespace to access the host
2. **Image Poisoning**: Using base images with embedded malware or vulnerable packages
3. **Credential Harvesting**: Stealing cloud IAM tokens from instance metadata services
4. **Lateral Movement**: Moving between pods through flat network architectures
5. **Supply Chain Attacks**: Compromising CI/CD pipelines to inject malicious code

### Secure Cloud Architecture Principles

**Zero Trust Networking**: Never trust, always verify. In cloud-native environments:
- **mTLS** between all services (not just edge ingress)
- **Identity-aware proxies** for service-to-service communication
- **Network segmentation** at the pod level, not just subnet level

**Immutable Infrastructure**: Servers are never modified after deployment. If a change is needed, replace the instance/container rather than patching live. This ensures:
- Known good states from version control
- Rapid rollback capabilities
- Elimination of configuration drift

**Least Privilege IAM**: Cloud IAM roles should follow the principle of least privilege with **temporary credentials** (not long-term access keys).

**AWS IAM Secure Configuration:**
```json
// Trust policy: allows the EC2 service to assume this role for instances that carry it.
// Which principals may attach the role to an instance is controlled separately via iam:PassRole.
// Do not add aws:MultiFactorAuthPresent here: a service principal never presents MFA,
// so the role could never be assumed.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "ec2.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

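// Permissions policy: narrowly scoped allows plus an explicit deny of dangerous actions.
// ${aws:username} resolves only for IAM users; for role sessions use a fixed prefix or ${aws:userid}.
// aws:VpcSourceIp matches private addresses for requests sent through a VPC endpoint;
// aws:SourceIp only ever sees public addresses.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::company-data-bucket/${aws:username}/*",
            "Condition": {
                "IpAddress": {
                    "aws:VpcSourceIp": ["10.0.0.0/8"]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::company-data-bucket/${aws:username}/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                },
                "IpAddress": {
                    "aws:VpcSourceIp": ["10.0.0.0/8"]
                }
            }
        },
        {
            "Effect": "Deny",
            "Action": [
                "s3:DeleteBucket",
                "s3:PutBucketAcl"
            ],
            "Resource": "*"
        }
    ]
}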
```
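Least privilege is easier to keep when policies are checked mechanically. The following is a minimal sketch of such a check, not an official AWS tool: the function name `find_overbroad_statements` and the sample policy are illustrative assumptions, and the check only looks for wildcard actions and resources in `Allow` statements.

```python
import json


def find_overbroad_statements(policy: dict) -> list[str]:
    """Return findings for Allow statements that use wildcard actions or resources."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue  # explicit denies can safely be broad
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        for action in actions:
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
    return findings


# Hypothetical policy used only to exercise the check.
sample = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Effect": "Deny", "Action": "s3:DeleteBucket", "Resource": "*"}
  ]
}
""")

for finding in find_overbroad_statements(sample):
    print(finding)
```

A check like this fits naturally in a pull-request pipeline, where a non-empty result blocks the merge until someone justifies the wildcard.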

**GCP Service Account Least Privilege:**
```yaml
# Minimal permissions for GKE node service account
title: GKE Node Minimal
description: Minimal permissions for GKE nodes to function
stage: GA
includedPermissions:
  - monitoring.metricDescriptors.list
  - monitoring.timeSeries.create
  - logging.logEntries.create
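  - storage.objects.get   # For pulling images from GCR (backed by Cloud Storage)
  - storage.objects.list  # devstorage.read_only is an OAuth scope, not an IAM permission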
# Explicitly NOT included: compute.instances.create, iam.serviceAccounts.actAs, etc.
```

---

## 13.2 Container and Docker Security

Containers are isolated processes, not virtual machines. They share the host kernel, making container escape particularly dangerous. Security must be embedded in the image build process and runtime configuration.

### Secure Image Building

**Multi-Stage Builds**: Minimize attack surface by separating build tools from runtime.

```dockerfile
# Dockerfile with security hardening
# Stage 1: Build environment (full toolchain)
FROM golang:1.21-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git ca-certificates tzdata

WORKDIR /app

# Copy and download dependencies (cached layer)
COPY go.mod go.sum ./
RUN go mod download

# Copy source and build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo \
    -ldflags='-w -s -extldflags "-static"' \
    -o /app/server ./cmd/server

# Stage 2: Runtime environment (minimal)
FROM gcr.io/distroless/static-debian11:nonroot

# Non-root user (UID 65532 in distroless)
USER nonroot:nonroot

# Copy only necessary artifacts from builder
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /app/server /app/server

# No shell, no package manager, minimal attack surface
ENTRYPOINT ["/app/server"]

# Expose port (documentation only)
EXPOSE 8080
```

**Security Best Practices in Dockerfile:**
```dockerfile
# 1. Use specific versions, not 'latest'
FROM node:18.19.0-alpine3.18

# 2. Create non-root user (UID/GID 1000 is already taken by the image's built-in "node" user)
RUN addgroup -g 10001 appgroup && \
    adduser -u 10001 -G appgroup -s /bin/sh -D appuser

# 3. Set working directory with proper ownership
WORKDIR /app
RUN chown appuser:appgroup /app

# 4. Install dependencies as root, then switch user
COPY --chown=appuser:appgroup package*.json ./
RUN npm ci --omit=dev && \
    npm cache clean --force

# 5. Copy application code with correct ownership
COPY --chown=appuser:appgroup . .

# 6. Switch to non-root user for runtime
USER appuser

# 7. Health check (doesn't run as root)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js || exit 1

# 8. Read-only filesystem where possible
# (Set in Kubernetes or docker run, not Dockerfile)
```
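Two of the practices above, pinned base images and a non-root `USER`, are simple enough to check automatically. The sketch below is a deliberately small heuristic, not a replacement for a real linter: `lint_dockerfile` is a hypothetical name, it ignores line continuations, and it cannot see the default user baked into a base image.

```python
def lint_dockerfile(text: str) -> list[str]:
    """Flag unpinned base images and a final stage without a non-root USER."""
    findings: list[str] = []
    stage_names: set[str] = set()
    final_user = None
    for raw in text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        instruction = parts[0].upper()
        # Drop flags such as --platform=... so args[0] is the image or user.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if instruction == "FROM" and args:
            image = args[0]
            final_user = None  # a new stage starts with the base image's user
            exempt = image in stage_names or image == "scratch" or "@sha256:" in image
            tag = image.rsplit("/", 1)[-1].partition(":")[2]
            if not exempt and tag in ("", "latest"):
                findings.append(f"unpinned base image: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stage_names.add(args[2])
        elif instruction == "USER" and args:
            final_user = args[0]
    if final_user is None or final_user.split(":")[0] in ("root", "0"):
        findings.append("final stage has no non-root USER instruction")
    return findings


print(lint_dockerfile("FROM node:latest\nRUN echo hi\n"))
print(lint_dockerfile("FROM node:18.19.0-alpine3.18\nUSER appuser\n"))
```

Purpose-built tools cover far more rules; the point here is only that these two rules are cheap to enforce before an image is ever built.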

### Image Scanning and Supply Chain

Images must be scanned for vulnerabilities before deployment.

**Trivy Scanning (CI/CD Integration):**
```bash
# Scan for OS and application vulnerabilities
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest

# Scan filesystem during build
trivy filesystem --scanners vuln,secret,misconfig .  # older Trivy releases call the last scanner "config"

# Generate SBOM (Software Bill of Materials)
trivy image --format cyclonedx --output sbom.json myapp:latest
```
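When the pipeline needs more nuance than `--exit-code`, the JSON report can be post-processed. This sketch assumes the layout Trivy emits with `--format json` (a top-level `Results` list whose entries may carry a `Vulnerabilities` list); verify that against your Trivy version. The function name and the sample findings are made up for illustration.

```python
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report: dict, blocking=frozenset(BLOCKING_SEVERITIES)) -> list[tuple[str, str, str]]:
    """Return (target, vulnerability id, severity) for findings that should fail the build."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is absent or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (result.get("Target", "?"), vuln.get("VulnerabilityID", "?"), vuln["Severity"])
                )
    return findings


# Hand-written sample report; the IDs are placeholders, not real advisories.
sample_report = {
    "Results": [
        {
            "Target": "myapp:latest (alpine)",
            "Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-0001", "Severity": "LOW"},
                {"VulnerabilityID": "EXAMPLE-0002", "Severity": "CRITICAL"},
            ],
        },
        {"Target": "app/package-lock.json"},
    ]
}

for target, vuln_id, severity in blocking_findings(sample_report):
    print(f"{severity}: {vuln_id} in {target}")
```

In CI, the script would load the file written by `trivy image --format json --output report.json ...` and exit non-zero when the returned list is not empty.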

**Container Signing with Cosign:**
```bash
# Sign image using Sigstore/Cosign (keyless signing)
cosign sign --yes myregistry/myapp@sha256:abc123...

# Verify signature in deployment pipeline
cosign verify \
  --certificate-identity-regexp="^https://github.com/myorg/" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  myregistry/myapp@sha256:abc123...
```
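Signatures are bound to an image digest, so a pipeline should refuse references that use only a mutable tag. A minimal sketch of that guard follows; `is_digest_pinned` is a hypothetical helper and the regular expression only covers `sha256` digests.

```python
import re

# name@sha256:<64 lowercase hex characters>
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the reference names an immutable sha256 digest."""
    return DIGEST_REF.fullmatch(image_ref) is not None


print(is_digest_pinned("myregistry/myapp:latest"))
print(is_digest_pinned("myregistry/myapp@sha256:" + "a" * 64))
```

Running the check before `cosign verify` keeps a tag from being repointed between verification and deployment.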

### Runtime Security

**Docker Security Options:**
```bash
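# Run with security constraints. A comment after a trailing backslash breaks the
# line continuation, so the options are explained here:
#   --read-only                  read-only root filesystem
#   --tmpfs                      writable tmpfs with restrictions
#   --user                       non-root user
#   --cap-drop=ALL               drop all capabilities
#   --cap-add=NET_BIND_SERVICE   add back only the necessary capability
#   --security-opt=no-new-privileges:true   prevent privilege escalation
#   --security-opt=seccomp=...   custom seccomp profile
#   --network=internal           custom bridge network, not the default (create it first)
#   --memory                     memory limit
#   --cpus                       CPU limit
#   --pids-limit                 fork bomb protection
docker run -d \
  --read-only \
  --tmpfs /tmp:noexec,nosuid,size=100m \
  --user 1000:1000 \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt=no-new-privileges:true \
  --security-opt=seccomp=restricted.json \
  --network=internal \
  --memory=512m \
  --cpus=1.0 \
  --pids-limit=100 \
  myapp:latest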
```
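Long option lists are easier to review and reuse when they are assembled in code, with each option stored next to the reason for it. The sketch below builds the argument vector for a subset of the options above; the function name and the `myapp:1.2.3` tag are illustrative assumptions, and the command is printed rather than executed.

```python
import shlex

# Each entry pairs the CLI tokens with the reason for the option.
HARDENING_OPTIONS = [
    (["--read-only"], "read-only root filesystem"),
    (["--tmpfs", "/tmp:noexec,nosuid,size=100m"], "writable tmpfs with restrictions"),
    (["--user", "1000:1000"], "non-root user"),
    (["--cap-drop=ALL"], "drop all capabilities"),
    (["--cap-add=NET_BIND_SERVICE"], "add back only the necessary capability"),
    (["--security-opt=no-new-privileges:true"], "prevent privilege escalation"),
    (["--memory=512m"], "memory limit"),
    (["--cpus=1.0"], "CPU limit"),
    (["--pids-limit=100"], "fork bomb protection"),
]


def hardened_run_command(image: str) -> list[str]:
    """Build the argv for a hardened `docker run`, one list element per token."""
    argv = ["docker", "run", "-d"]
    for tokens, _reason in HARDENING_OPTIONS:
        argv.extend(tokens)
    argv.append(image)
    return argv


# Print instead of executing so the sketch runs without a Docker daemon.
print(shlex.join(hardened_run_command("myapp:1.2.3")))
```

Passing the list to `subprocess.run` without a shell also avoids quoting problems that a hand-edited multi-line command can introduce.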

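**Seccomp Profile (restricted.json):** an illustrative allowlist. Trace your workload to derive the real list, because most programs need additional calls such as `execve` and `brk` just to start.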
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_AARCH64"],
  "syscalls": [
    {
      "names": [
        "accept", "accept4", "bind", "clone", "close", "connect",
        "epoll_create", "epoll_create1", "epoll_ctl", "epoll_pwait",
        "epoll_wait", "eventfd2", "exit", "exit_group", "fcntl",
        "fstat", "futex", "getpid", "getrandom", "getsockname",
        "getsockopt", "ioctl", "listen", "mmap", "mprotect", "munmap",
        "nanosleep", "openat", "poll", "read", "recvfrom", "rt_sigaction",
        "rt_sigprocmask", "rt_sigreturn", "select", "sendto", "setitimer",
        "setsockopt", "sigaltstack", "socket", "socketpair", "write"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```
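The profile reads as "deny by default, allow what is listed". The sketch below models only that lookup so the behaviour can be reasoned about offline; it ignores argument filters and architecture handling, and `seccomp_action` is a hypothetical helper, not part of Docker or libseccomp.

```python
def seccomp_action(profile: dict, syscall: str) -> str:
    """Return the action a profile assigns to a syscall name (simplified model)."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


# Trimmed-down profile in the same shape as restricted.json.
profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "close", "exit_group"], "action": "SCMP_ACT_ALLOW"}
    ],
}

for name in ("read", "ptrace", "mount"):
    print(name, seccomp_action(profile, name))
```

Feeding the syscall names from a trace of the application through a check like this shows which calls would fail before the profile is rolled out.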

---

## 13.3 Kubernetes Security

Kubernetes is a complex distributed system with multiple attack surfaces: the API server, etcd, kubelet, and the workloads themselves.

### Pod Security Standards

Kubernetes 1.25 removed Pod Security Policies. Their built-in replacement is the Pod Security Admission controller, which enforces the **Pod Security Standards** (Privileged, Baseline, Restricted) per namespace.

**Restricted Pod Specification:**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault  # Or Localhost with custom profile
    sysctls:
      - name: net.ipv4.ip_unprivileged_port_start
        value: "80"  # Allow non-root to bind low ports if needed
  
  containers:
    - name: app
      image: myregistry/app:v1.2.3
      imagePullPolicy: Always
      
      securityContext:
        allowPrivilegeEscalation: false  # Prevent setuid binaries from gaining privs
        readOnlyRootFilesystem: true     # Immutable root
        capabilities:
          drop:
            - ALL  # Drop all Linux capabilities
        privileged: false
        seLinuxOptions:
          level: "s0:c123,c456"  # SELinux context
        
      resources:
        limits:
          memory: "512Mi"
          cpu: "1000m"
          ephemeral-storage: "1Gi"
        requests:
          memory: "256Mi"
          cpu: "500m"
      
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
        - name: config
          mountPath: /app/config
          readOnly: true
      
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
  
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory  # tmpfs, not disk
        sizeLimit: 100Mi
    - name: cache
      emptyDir:
        sizeLimit: 500Mi
    - name: config
      configMap:
        name: app-config
        defaultMode: 0444  # Read-only permissions
  
  automountServiceAccountToken: false  # Prevent token theft
  enableServiceLinks: false  # Prevent env var injection
```
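Several of the settings in the manifest above can be verified before a pod ever reaches the cluster. The sketch below checks a subset of Restricted-style requirements against a manifest that has already been parsed into a dictionary; it is an approximation for illustration, not the Pod Security Admission implementation, and `restricted_violations` is a hypothetical name.

```python
def restricted_violations(pod: dict) -> list[str]:
    """Return the Restricted-style settings that each container is missing."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    problems = []
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        # Container-level values override pod-level ones where both exist.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if ctx.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if ctx.get("privileged") is True:
            problems.append(f"{name}: privileged must not be true")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems


# A pod with no security settings at all, to show what gets reported.
bare_pod = {"spec": {"containers": [{"name": "app", "image": "myregistry/app:v1.2.3"}]}}
for problem in restricted_violations(bare_pod):
    print(problem)
```

The same function can run in CI against rendered manifests, giving developers feedback earlier than an admission rejection would.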

### Network Policies (Zero-Trust Networking)

By default, Kubernetes allows all pod-to-pod communication. Network Policies restrict traffic to explicit allow rules, but only when the cluster's CNI plugin (for example, Calico or Cilium) enforces them.

**Default Deny All (Base Policy):**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}  # All pods
  policyTypes:
    - Ingress
    - Egress
  # No rules = deny all
```

**Application-Specific Policy:**
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
    - Egress
  
  ingress:
    # Allow traffic from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    
    # Allow from frontend pods in same namespace
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
  
  egress:
    # Allow to database only
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    
    # Allow to Redis only
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    
    # Allow DNS resolution
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP  # DNS falls back to TCP for large responses
          port: 53
```
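The interaction between a default-deny policy and application-specific policies is easier to see in a small model. The sketch below simulates ingress evaluation for pods in a single namespace: a pod is unrestricted until some policy selects it, and after that traffic is allowed only if any selecting policy has a matching rule. The data layout and function names are simplifications invented for this example, not the Kubernetes API.

```python
def selects(selector: dict, labels: dict) -> bool:
    """An empty selector matches every pod, mirroring `podSelector: {}`."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies: list, src_labels: dict, dst_labels: dict, port: int) -> bool:
    """Simplified same-namespace ingress decision; policies are additive."""
    selecting = [p for p in policies if selects(p["pod_selector"], dst_labels)]
    if not selecting:
        return True  # no policy selects the pod, so the default allow applies
    for policy in selecting:
        for rule in policy.get("ingress", []):
            if port in rule["ports"] and any(selects(s, src_labels) for s in rule["from"]):
                return True
    return False


POLICIES = [
    {"name": "default-deny-all", "pod_selector": {}, "ingress": []},
    {
        "name": "api-server-policy",
        "pod_selector": {"app": "api-server"},
        "ingress": [{"from": [{"app": "frontend"}], "ports": [8080]}],
    },
]

print(ingress_allowed(POLICIES, {"app": "frontend"}, {"app": "api-server"}, 8080))
print(ingress_allowed(POLICIES, {"app": "batch"}, {"app": "api-server"}, 8080))
```

Because policies only ever add allowances, a deny-all policy cannot override another policy that still selects the same pod.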

### RBAC (Role-Based Access Control)

Principle of Least Privilege for Kubernetes API access.

**Service Account with Minimal Permissions:**
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
  namespace: production
automountServiceAccountToken: false  # Opt-out of automatic token mounting
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]  # "list" is omitted: resourceNames does not narrow an ordinary list request
    resourceNames: ["app-config"]  # Specific resource only
  
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
    resourceNames: ["app-secrets"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: app-reader
    namespace: production
roleRef:
  kind: Role
  name: config-reader
  apiGroup: rbac.authorization.k8s.io
```

**Cluster-Wide Restrictions:**
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: restricted-access
rules:
  # RBAC is allow-only: anything not granted below is denied. This role therefore gives
  # no access to secrets, nodes (prevents node info leakage), or
  # persistentvolumes (storage isolation).
  
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
  
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
```
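RBAC rules are purely additive: a request is allowed when at least one rule matches it and denied otherwise. The sketch below models that matching for the fields used in this section. It ignores wildcards, subresources, and non-resource URLs, and `rbac_allows` is a hypothetical helper rather than anything shipped with Kubernetes.

```python
def rbac_allows(rules: list, verb: str, resource: str, name=None, api_group: str = "") -> bool:
    """Return True when any rule matches the request (simplified model)."""
    for rule in rules:
        if api_group not in rule["apiGroups"]:
            continue
        if resource not in rule["resources"]:
            continue
        if verb not in rule["verbs"]:
            continue
        names = rule.get("resourceNames")
        if names and name not in names:
            continue  # the rule is limited to specific object names
        return True
    return False  # nothing matched, so the request is denied


# Rules in the shape of a namespaced Role that reads one ConfigMap and one Secret.
CONFIG_READER = [
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"], "resourceNames": ["app-config"]},
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["get"], "resourceNames": ["app-secrets"]},
]

print(rbac_allows(CONFIG_READER, "get", "configmaps", "app-config"))
print(rbac_allows(CONFIG_READER, "get", "secrets", "other-secret"))
```

On a live cluster, `kubectl auth can-i` answers the same question against the real authorizer.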

### Secrets Management

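**Do not treat Kubernetes Secrets as secure storage on their own**: values are only base64-encoded and, unless encryption at rest is enabled, sit unencrypted in etcd. Keep the source of truth in an external secret manager and sync it in with an operator. The synced copy is still a native Secret, so etcd encryption at rest and tight RBAC remain necessary.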
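To see why base64 provides no confidentiality, note that decoding needs no key at all. The helper names below are illustrative and the value is a throwaway example.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears in a Secret manifest's `data` field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("hunter2")
print(stored)                       # aHVudGVyMg==
print(decode_secret_value(stored))  # hunter2
```

Anyone who can read the Secret object, or the etcd data behind it, can recover the value in one step.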

**External Secrets Operator (AWS Secrets Manager):**
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
  namespace: production
spec:
  refreshInterval: 1h  # Re-sync from the external store hourly (rotation happens in the store)
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  
  target:
    name: db-credentials
    creationPolicy: Owner
    template:
      type: Opaque
      metadata:
        annotations:
          reloader.stakater.com/auto: "true"  # Trigger rollout on change
  
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/app/db-credentials
        property: connection_string
```

---

## 13.4 Infrastructure as Code (IaC) Security

IaC defines infrastructure through code (Terraform, CloudFormation, Pulumi), enabling version control and automated security scanning.

### Terraform Security

**State File Protection:**
Terraform state files can contain secrets in plaintext (for example, database passwords). Secure them:
```hcl
# Backend configuration with encryption
terraform {
  backend "s3" {
    bucket         = "terraform-state-prod"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true  # Server-side encryption
    kms_key_id     = "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
    dynamodb_table = "terraform-locks"  # State locking
    
    # Access logging is not a backend setting: enable S3 server access logging
    # (or CloudTrail data events) on the state bucket itself.
  }
}
```
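A quick way to appreciate the risk is to list the attributes in a state file whose names suggest secrets. The sketch below assumes the version 4 state layout (a top-level `resources` list whose entries have `instances` with an `attributes` map); the hint list, the function name, and the sample state are assumptions made for illustration.

```python
SENSITIVE_HINTS = ("password", "secret", "token", "private_key")


def sensitive_attributes(state: dict) -> list[str]:
    """Return addresses of non-empty attributes whose names look sensitive."""
    hits = []
    for resource in state.get("resources", []):
        address = f'{resource.get("type")}.{resource.get("name")}'
        for instance in resource.get("instances", []):
            for key, value in (instance.get("attributes") or {}).items():
                if value and any(hint in key.lower() for hint in SENSITIVE_HINTS):
                    hits.append(f"{address}.{key}")
    return hits


# Hand-written sample in the assumed layout; the values are placeholders.
sample_state = {
    "version": 4,
    "resources": [
        {
            "type": "aws_db_instance",
            "name": "main",
            "instances": [{"attributes": {"engine": "postgres", "password": "example-only"}}],
        }
    ],
}

print(sensitive_attributes(sample_state))
```

The output only names where secrets live; it does not print the values, which keeps the report itself safe to share.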

**Secure Resource Configuration:**
```hcl
# Security group with least privilege
resource "aws_security_group" "app_server" {
  name        = "app-server-sg"
  description = "Security group for application servers"
  vpc_id      = aws_vpc.main.id
  
  # Ingress: Only from load balancer
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.lb.id]
    description     = "HTTP from load balancer"
  }
  
  # No SSH access (use AWS Systems Manager Session Manager instead)
  
  # Egress: HTTPS only (narrow the CIDR to VPC endpoints or prefix lists where possible)
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]  # For AWS API access
    description = "HTTPS outbound"
  }
  
  tags = {
    Name        = "app-server-sg"
    Environment = "production"
  }
}

# S3 Bucket with security controls
resource "aws_s3_bucket" "data" {
  bucket = "company-sensitive-data"
  
  tags = {
    Security = "High"
  }
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket = aws_s3_bucket.data.id
  
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.data_key.arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Logging
resource "aws_s3_bucket_logging" "data" {
  bucket = aws_s3_bucket.data.id
  
  target_bucket = aws_s3_bucket.logs.id
  target_prefix = "data-access-logs/"
}
```

**Static Analysis with Checkov/TFLint:**
```bash
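# Scan Terraform for misconfigurations
checkov --file main.tf --framework terraform

# Lint for provider-specific errors and deprecated syntax
tflint

# Custom policy: Ensure encryption in transit
mkdir -p policies
cat <<EOF > policies/alb_https.yaml
metadata:
  name: "Ensure ALB uses HTTPS"
  id: "CKV2_CUSTOM_1"
  category: "NETWORKING"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_lb_listener"
  attribute: "protocol"
  operator: "equals"
  value: "HTTPS"
EOF

# Run the built-in checks together with the custom policy
checkov --directory . --framework terraform --external-checks-dir policies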
```

---

## 13.5 Cloud-Native Logging, Monitoring, and Incident Response

Ephemeral containers require different monitoring strategies than traditional VMs.

### Security Monitoring Stack

**Falco (Runtime Threat Detection):**
```yaml
# Falco rule for detecting sensitive file access
- rule: Sensitive File Open
  desc: Detect attempts to open sensitive files (shadow, passwd)
  condition: >
    open_read and
    (fd.name contains "/etc/shadow" or
     fd.name contains "/etc/passwd") and
    not user.name in (root, admin)
  output: >
    Sensitive file opened
    user=%user.name command=%proc.cmdline file=%fd.name
  priority: WARNING

# Detect unexpected outbound traffic (for example, connections to crypto-mining pools)
- rule: Unexpected outbound connection
  desc: Detect outbound connections from processes that are not on the allowlist
  condition: >
    outbound and
    not (proc.name in (wget, curl, git, apt))
  output: >
    Unexpected connection
    proc=%proc.name connection=%fd.name
  priority: NOTICE
```

**Audit Logging:**
```yaml
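# Kubernetes Audit Policy
# Rules are evaluated in order and the first match wins, so specific rules come first.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Don't log health checks (noise reduction)
  - level: None
    nonResourceURLs: ["/healthz*", "/readyz*", "/livez*"]

  # Don't log routine endpoint reads by kube-system service accounts
  - level: None
    userGroups: ["system:serviceaccounts:kube-system"]
    verbs: ["get", "list"]
    resources:
      - group: ""
        resources: ["endpoints"]

  # Secrets at Metadata level only, so secret values never land in the audit log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]

  # ConfigMaps at RequestResponse level
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["configmaps"]

  # Log all remaining requests at Metadata level
  - level: Metadata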
```
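Audit policy rules are evaluated top to bottom and the first matching rule sets the level, so ordering changes the outcome. The sketch below models that behaviour with a deliberately reduced rule format (only `level` and an optional `resources` list); the function name and rule dictionaries are assumptions for illustration, and treating an unmatched request as `None` is part of the simplification.

```python
def audit_level(rules: list, resource: str) -> str:
    """Return the level of the first rule that matches the resource."""
    for rule in rules:
        wanted = rule.get("resources")
        if wanted is None or resource in wanted:
            return rule["level"]
    return "None"  # assumption of this model: unmatched requests are not logged


# A catch-all rule placed first shadows every rule after it.
catch_all_first = [
    {"level": "Metadata"},
    {"level": "RequestResponse", "resources": ["configmaps"]},
]

# Specific rules first, catch-all last.
specific_first = [
    {"level": "RequestResponse", "resources": ["configmaps"]},
    {"level": "Metadata"},
]

print(audit_level(catch_all_first, "configmaps"))  # Metadata
print(audit_level(specific_first, "configmaps"))   # RequestResponse
```

When reviewing a policy, read it in order and ask for each rule whether an earlier, broader rule already captures the same requests.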

### Incident Response in Kubernetes

**Forensic Collection:**
```bash
# Capture running container state before eviction
kubectl debug pod/compromised-pod -it --image=nicolaka/netshoot --target=app-container -- /bin/bash

# Dump network connections (needs netstat in the image; otherwise run it from the debug container)
kubectl exec compromised-pod -- netstat -tulpn

# Copy filesystem for analysis (kubectl cp needs tar inside the container)
kubectl cp compromised-pod:/tmp ./forensics/evidence

# Check for reverse shells
kubectl logs compromised-pod --previous | grep -E "(nc|netcat|bash -i|/bin/sh)"

# Audit who created the pod: look up its UID
kubectl get pod compromised-pod -o jsonpath='{.metadata.uid}'
# Then check audit logs for that UID
```

**Network Isolation (Emergency Response):**
```yaml
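# Immediately isolate the compromised pod. NetworkPolicies are additive, so first
# relabel the pod so that this is the only policy selecting it:
#   kubectl label pod compromised-pod quarantine=true
#   kubectl label pod compromised-pod app-
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: production
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
    - Ingress
    - Egress
  # No rules = complete isolation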
```

---

## Summary and Transition to Chapter 14

In this chapter, we ascended from application code to the infrastructure layer, recognizing that cloud-native security requires protecting the entire lifecycle from image build to runtime. You learned that the **Shared Responsibility Model** demands you secure everything above the hypervisor: your container images, orchestration configurations, and cloud IAM policies.

We explored **container hardening** through multi-stage builds, minimal base images (Distroless, Alpine), and the principle that containers are not security boundaries on their own, which is why they need seccomp profiles, SELinux contexts, and non-root execution. **Kubernetes security** revealed the complexity of distributed systems: Pod Security Standards enforce runtime constraints, Network Policies implement zero-trust segmentation, and RBAC follows least privilege for API access. We emphasized that **Secrets** should be sourced from external vaults (AWS Secrets Manager, HashiCorp Vault) rather than trusted to base64 encoding in etcd, and that **Infrastructure as Code** must be scanned for misconfigurations before deployment.

The **supply chain** emerged as a critical concern: images must be signed (Cosign), scanned (Trivy), and their SBOMs tracked. Finally, we addressed **observability** in ephemeral environments—Falco for runtime threat detection, audit logging for forensics, and network policies for emergency isolation.

However, we are now witnessing a paradigm shift as significant as the move to cloud-native: the rise of Artificial Intelligence and autonomous agentic systems. While traditional applications execute deterministic logic written by developers, AI systems learn from data, make probabilistic decisions, and increasingly operate as autonomous agents capable of taking actions across systems—reading emails, executing code, making purchases, or modifying infrastructure. These systems introduce novel attack vectors: prompt injection that bypasses safety guardrails, training data poisoning that corrupts model behavior, model inversion that extracts sensitive training data, and agentic hijacking that weaponizes AI capabilities against their owners.

In **Chapter 14: AI and Agentic Application Security**, we will navigate this emerging frontier. You will learn the OWASP Top 10 for LLM Applications and the new Agentic AI security framework, understanding how to secure systems where the "code" is partially learned rather than explicitly written. We will explore prompt injection defenses, retrieval-augmented generation (RAG) security, model supply chain integrity, and the governance frameworks (ISO 42001) necessary for responsible AI deployment. As AI becomes infrastructure, securing it becomes foundational to all other security domains.