# Chapter 8: Container Orchestration with Kubernetes

The evolution of cloud computing has followed a clear trajectory: from physical servers to virtual machines, from VMs to containers, and now from individual containers to orchestrated fleets. While containers solved the "it works on my machine" problem by packaging applications with their dependencies, they introduced new challenges: How do we deploy thousands of containers across hundreds of machines? How do we ensure high availability when containers fail? How do we roll out updates without downtime?

Kubernetes—originally developed by Google and now maintained by the Cloud Native Computing Foundation (CNCF)—has emerged as the de facto standard for container orchestration. It automates the deployment, scaling, and management of containerized applications. This chapter provides a comprehensive foundation in Kubernetes architecture, core concepts, and practical deployment patterns, preparing you to operate containerized workloads at scale in production environments.

## 8.1 Containerization Deep Dive with Docker

Before orchestrating containers, we must master their construction. Docker remains the standard tool for building container images. Because those images follow the Open Container Initiative (OCI) image specification, Kubernetes can run them on any compliant runtime, such as containerd or CRI-O, without Docker installed on the nodes.

### Docker Image Architecture
Docker images are layered filesystems. Each instruction in a Dockerfile creates a new layer, and layers are cached and reused across builds.

**Layer Optimization Principles:**
1.  **Order by Change Frequency:** Put instructions that change rarely (base image, dependency installation) at the top; frequently changing instructions (application code) at the bottom.
2.  **Minimize Layers:** Combine commands where logical (`&&` in RUN instructions).
3.  **Multi-Stage Builds:** Separate build environment from runtime environment.
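
The first principle can be illustrated with a toy model of the build cache. This is a simplified sketch, not Docker's actual cache implementation: it assumes each layer's cache key depends on its parent's key, the instruction text, and (for `COPY`) the content of the copied files. The names `layer_keys` and `rebuilt_layers` and the file hashes are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction; each key chains its parent's key."""
    keys, parent = [], ""
    for instr in instructions:
        # A COPY layer also depends on the content of the files it copies.
        content = "".join(file_hashes[f] for f in instr.get("copies", []))
        parent = hashlib.sha256((parent + instr["cmd"] + content).encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(instructions, before, after):
    """List the instructions whose cache key changes between two builds."""
    old = layer_keys(instructions, before)
    new = layer_keys(instructions, after)
    return [instr["cmd"] for instr, o, n in zip(instructions, old, new) if o != n]


DEPS_FIRST = [
    {"cmd": "FROM node:18-alpine"},
    {"cmd": "COPY package*.json ./", "copies": ["package.json"]},
    {"cmd": "RUN npm ci"},
    {"cmd": "COPY . .", "copies": ["package.json", "src"]},
    {"cmd": "RUN npm run build"},
]
CODE_FIRST = [
    {"cmd": "FROM node:18-alpine"},
    {"cmd": "COPY . .", "copies": ["package.json", "src"]},
    {"cmd": "RUN npm ci"},
    {"cmd": "RUN npm run build"},
]

before = {"package.json": "v1", "src": "v1"}
after = {"package.json": "v1", "src": "v2"}  # only application code changed

print(rebuilt_layers(DEPS_FIRST, before, after))  # ['COPY . .', 'RUN npm run build']
print(rebuilt_layers(CODE_FIRST, before, after))  # ['COPY . .', 'RUN npm ci', 'RUN npm run build']
```

In this model, copying the source before installing dependencies forces `npm ci` to rerun on every code change, which is the cost the ordering principle avoids.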

**Code Snippet: Production-Optimized Dockerfile**
```dockerfile
# Stage 1: Build environment
FROM node:18-alpine AS builder
WORKDIR /build

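# Install all dependencies first (cached layer); the build needs devDependencies
COPY package*.json ./
RUN npm ci

# Copy source, build, then prune devDependencies before node_modules is copied out
COPY . .
RUN npm run build && npm prune --omit=dev && npm cache clean --force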

# Stage 2: Production runtime
FROM node:18-alpine AS production

# Security: Run as non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs

WORKDIR /app

# Copy only necessary artifacts from builder
COPY --from=builder --chown=nodejs:nodejs /build/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /build/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /build/package.json ./

USER nodejs

EXPOSE 3000

# Health check for Docker and Compose; Kubernetes ignores HEALTHCHECK and uses its own probes
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

CMD ["node", "dist/server.js"]
```

### Docker Security Best Practices
1.  **Non-Root User:** Never run as root inside containers (`USER` instruction).
2.  **Minimal Base Images:** Use Google's distroless images or Alpine Linux to minimize the attack surface.
3.  **Scan Images:** Integrate Trivy, Clair, or Snyk into CI pipelines.
4.  **Read-Only Filesystems:** Mount root filesystem as read-only where possible.
5.  **Drop Capabilities:** Drop all Linux capabilities and add back only those the application needs; avoid granting powerful ones such as `NET_ADMIN` or `SYS_ADMIN`.

## 8.2 Kubernetes Architecture

Understanding Kubernetes requires understanding its distributed systems architecture, which separates control plane responsibilities from workload execution.

### 8.2.1 The Control Plane (Master Components)
The control plane manages the cluster's global state and makes decisions about scheduling and responding to cluster events. In managed Kubernetes services (EKS, AKS, GKE), the cloud provider manages these components; in self-managed clusters, you operate them.

**API Server (`kube-apiserver`):**
The front end of the control plane. All communication (internal and external) goes through the API Server. It validates and configures data for API objects (pods, services, deployments).

**etcd:**
A consistent and highly available key-value store used as Kubernetes' backing store for all cluster data. It is the single source of truth for cluster state. Losing etcd without a backup means losing that state; workloads that are already running continue, but the cluster can no longer be managed or changed.

**Scheduler (`kube-scheduler`):**
Watches for newly created Pods with no assigned node, and selects a node for them to run on based on resource requirements, hardware/software/policy constraints, affinity/anti-affinity specifications, and data locality.
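
The scheduler's decision can be pictured as two phases: filter out nodes that cannot run the pod, then score the remaining ones. The sketch below is a deliberately reduced illustration, not the kube-scheduler's implementation. The `Node` type, the `schedule` function, and the scoring rule (prefer the node with the most CPU left over) are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int   # free CPU in millicores
    free_mem_mi: int  # free memory in MiB
    labels: dict = field(default_factory=dict)


def schedule(pod_cpu_m, pod_mem_mi, node_selector, nodes):
    """Pick a node name for a pod, or None if no node is feasible."""
    # Phase 1 (filter): keep nodes with enough resources and matching labels.
    feasible = [
        n for n in nodes
        if n.free_cpu_m >= pod_cpu_m
        and n.free_mem_mi >= pod_mem_mi
        and all(n.labels.get(k) == v for k, v in node_selector.items())
    ]
    if not feasible:
        return None  # the pod would remain Pending
    # Phase 2 (score): hypothetical rule, prefer the most CPU left after placement.
    return max(feasible, key=lambda n: n.free_cpu_m - pod_cpu_m).name


nodes = [
    Node("node-a", free_cpu_m=100, free_mem_mi=4096, labels={"disk": "ssd"}),
    Node("node-b", free_cpu_m=1000, free_mem_mi=2048, labels={"disk": "ssd"}),
    Node("node-c", free_cpu_m=2000, free_mem_mi=2048, labels={"disk": "hdd"}),
]

print(schedule(250, 256, {"disk": "ssd"}, nodes))  # node-b
print(schedule(250, 256, {}, nodes))               # node-c
```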

**Controller Manager (`kube-controller-manager`):**
Runs controller processes that regulate the cluster state:
*   **Node Controller:** Notices and responds when nodes go down.
*   **Replication Controller:** Maintains the correct number of pods for every replication controller object.
*   **Endpoints Controller:** Populates the Endpoints object (joins Services & Pods).
*   **Service Account & Token Controllers:** Create default accounts and API access tokens.

### 8.2.2 Worker Node Components
Nodes are the machines (virtual or physical) that run your applications.

**Kubelet:**
An agent that runs on each node in the cluster. It ensures that containers are running in a Pod. The kubelet takes a set of PodSpecs provided through various mechanisms (primarily the API Server) and ensures the containers described in those PodSpecs are running and healthy.

**Kube-Proxy:**
A network proxy that maintains network rules on nodes, enabling the Kubernetes Service abstraction. It handles network communications inside or outside of your cluster, forwarding requests to appropriate pods.

**Container Runtime:**
The software responsible for running containers. Kubernetes supports any runtime implementing the Container Runtime Interface (CRI):
*   **containerd:** Industry standard, lightweight (used by Docker internally).
*   **CRI-O:** Lightweight alternative designed specifically for Kubernetes.
*   **Docker Engine:** No longer supported directly. The dockershim adapter was deprecated in 1.20 and removed in 1.24; Docker Engine now requires the external `cri-dockerd` adapter.

## 8.3 Core Kubernetes Objects

Kubernetes organizes resources through a declarative API. You describe desired state; Kubernetes controllers work to achieve and maintain that state.
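
The declarative model rests on control loops that repeatedly compare desired state with observed state and act on the difference. The following is a minimal sketch of one such pass for a replica count. The function name `reconcile` and the pod naming scheme are hypothetical, and real controllers work through the API Server rather than on in-memory lists.

```python
def reconcile(desired_replicas, running_pods, next_id):
    """One pass of a control loop: observe, compare with desired state, act."""
    pods = list(running_pods)
    # Too few pods: create replacements until the desired count is reached.
    while len(pods) < desired_replicas:
        pods.append(f"pod-{next_id}")
        next_id += 1
    # Too many pods: remove the surplus.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods, next_id


pods, next_id = reconcile(3, [], 0)
print(pods)  # ['pod-0', 'pod-1', 'pod-2']

# Simulate a failure of pod-1; the next pass restores the desired count.
pods.remove("pod-1")
pods, next_id = reconcile(3, pods, next_id)
print(pods)  # ['pod-0', 'pod-2', 'pod-3']
```

Note that the loop never records "a pod failed"; it only sees that observed state differs from desired state, which is why the same logic covers scaling and self-healing.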

### 8.3.1 Pods: The Atomic Unit
A Pod is the smallest deployable unit in Kubernetes, representing one or more containers that share storage and network resources.

**Pod Characteristics:**
*   **Ephemeral:** Pods are created and destroyed dynamically; they do not persist.
*   **Shared Context:** Containers in a pod share the same IP address, port space, and storage volumes.
*   **Single-Purpose:** Generally, one main container per pod (plus sidecars for logging, monitoring, or proxying).

**Code Snippet: Pod Definition**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: web
    tier: frontend
spec:
  containers:
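    # Placeholder image: the stock nginx image expects to start as root and does not
    # serve /health or /ready, so substitute an image that satisfies the
    # securityContext and probes below.
    - name: nginx
      image: nginx:1.25-alpine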
      ports:
        - containerPort: 80
          protocol: TCP
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
        limits:
          memory: "128Mi"
          cpu: "200m"
      livenessProbe:
        httpGet:
          path: /health
          port: 80
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5
      volumeMounts:
        - name: cache-volume
          mountPath: /var/cache/nginx
  volumes:
    - name: cache-volume
      emptyDir: {}
  restartPolicy: Always
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
```

### 8.3.2 Deployments: Managing Replica Sets
Deployments provide declarative updates for Pods and ReplicaSets. They enable rolling updates, rollbacks, and scaling.

**Key Features:**
*   **Rolling Updates:** Gradually replace old pods with new ones; combined with readiness probes, this avoids downtime.
*   **Rollback:** Revert to a previous deployment revision if issues are detected.
*   **Self-Healing:** If a pod fails, the Deployment controller creates a replacement to maintain desired replica count.

**Code Snippet: Production Deployment**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api
    version: v1.2.3
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Can exceed desired count by 1 during update
      maxUnavailable: 0    # Never drop below desired count
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1.2.3
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api
                topologyKey: kubernetes.io/hostname
      containers:
        - name: api
          image: myregistry/api:v1.2.3
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: api-config
                  key: redis-host
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          startupProbe:
            httpGet:
              path: /health/startup
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
      terminationGracePeriodSeconds: 60  # Time for graceful shutdown
```
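
The `maxSurge` and `maxUnavailable` fields accept either absolute numbers or percentages. Kubernetes rounds a `maxSurge` percentage up and a `maxUnavailable` percentage down. The sketch below works through that arithmetic; the helper names `resolve` and `rollout_bounds` are made up for this illustration.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an int or a percentage string such as '25%' to a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the pod-count limits that hold during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)                 # rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)    # rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(3, 1, 0))            # {'max_total_pods': 4, 'min_available_pods': 3}
print(rollout_bounds(3, "25%", 0))        # {'max_total_pods': 4, 'min_available_pods': 3}
print(rollout_bounds(10, "25%", "25%"))   # {'max_total_pods': 13, 'min_available_pods': 8}
```

With three replicas, `maxSurge: 1` and `maxSurge: 25%` behave identically, because 25% of 3 (0.75) rounds up to one pod.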

### 8.3.3 Services: Networking and Discovery
Services expose applications running on Pods as network services, providing stable endpoints despite Pod churn.

**Service Types:**
*   **ClusterIP:** Internal cluster access only (default).
*   **NodePort:** Exposes service on each Node's IP at a static port.
*   **LoadBalancer:** Creates external load balancer (cloud provider specific).
*   **ExternalName:** Maps service to external DNS name (CNAME).

**Code Snippet: Service Definitions**
```yaml
# Internal service (ClusterIP)
apiVersion: v1
kind: Service
metadata:
  name: api-service
  labels:
    app: api
spec:
  type: ClusterIP
  selector:
    app: api
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
  sessionAffinity: None

---
# External LoadBalancer with annotations for cloud-specific features
apiVersion: v1
kind: Service
metadata:
  name: api-public
  annotations:
    # AWS-specific: Use NLB for TCP traffic
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    # Azure-specific: Internal load balancer
    # service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - port: 443
      targetPort: 8080
  externalTrafficPolicy: Local  # Preserve client source IP

---
# Headless service for StatefulSets (direct pod access)
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None  # Headless
  selector:
    app: database
  ports:
    - port: 5432
```
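
A Service finds its backends through its label selector: a pod is selected when its labels contain every key/value pair in the selector, and by default only ready pods receive traffic. The sketch below models that matching. The function name `select_endpoints` and the pod IPs are illustrative assumptions, not Kubernetes API objects.

```python
def select_endpoints(selector, pods):
    """Return the IPs of ready pods whose labels satisfy the selector."""
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    ]


pods = [
    {"ip": "10.0.0.1", "ready": True, "labels": {"app": "api", "version": "v1.2.3"}},
    {"ip": "10.0.0.2", "ready": True, "labels": {"app": "api", "version": "v1.2.4"}},
    {"ip": "10.0.0.3", "ready": False, "labels": {"app": "api", "version": "v1.2.4"}},
    {"ip": "10.0.0.4", "ready": True, "labels": {"app": "database"}},
]

print(select_endpoints({"app": "api"}, pods))                         # ['10.0.0.1', '10.0.0.2']
print(select_endpoints({"app": "api", "version": "v1.2.4"}, pods))    # ['10.0.0.2']
```

Because the selector in the manifests above uses only `app: api`, pods from both the old and the new version receive traffic during a rolling update.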

### 8.3.4 ConfigMaps and Secrets: Configuration Management
Externalize configuration from container images to enable environment-specific settings without rebuilding.

*   **ConfigMaps:** For non-sensitive configuration (feature flags, URLs, logging levels).
*   **Secrets:** For sensitive data (passwords, tokens, certificates). Values are base64 encoded, which is an encoding rather than encryption; they are stored unencrypted in etcd unless encryption at rest is configured.

**Code Snippet: ConfigMap and Secret Usage**
```yaml
# ConfigMap for application configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  # Key-value pairs
  LOG_LEVEL: "info"
  CACHE_TTL: "300"
  FEATURE_FLAGS: |
    {
      "newCheckout": true,
      "betaSearch": false
    }
  
  # File-like keys
  nginx.conf: |
    server {
      listen 80;
      location / {
        proxy_pass http://localhost:8080;
      }
    }

---
# Secret for sensitive data (values must be base64 encoded)
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
type: Opaque
data:
  # echo -n 'supersecretpassword' | base64
  DB_PASSWORD: c3VwZXJzZWNyZXRwYXNzd29yZA==
  API_KEY: bXlzZWNyZXRhcGlrZXk=
  
  # TLS certificates
  tls.crt: LS0tLS1CRUdJTi...  # base64 encoded cert
  tls.key: LS0tLS1CRUdJTi...  # base64 encoded key

---
# Using ConfigMaps and Secrets in Pods
apiVersion: apps/v1
kind: Deployment
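metadata:
  name: api-deployment
  namespace: production  # Must match the ConfigMap and Secret namespace
spec:
  selector:              # Required for apps/v1 Deployments
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec: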
      containers:
        - name: api
          image: myapp:latest
          env:
            # Direct value from ConfigMap
            - name: LOG_LEVEL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: LOG_LEVEL
            
            # Secret as environment variable
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: DB_PASSWORD
                  optional: false  # Pod fails if secret doesn't exist
            
            # A second single key from the ConfigMap (use envFrom to import every key)
            - name: CACHE_TTL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: CACHE_TTL
          
          # Mount the ConfigMap and Secret as files
          volumeMounts:
            - name: config-volume
              mountPath: /etc/app/config
              readOnly: true
            - name: secret-volume
              mountPath: /etc/app/secrets
              readOnly: true
      
      volumes:
        - name: config-volume
          configMap:
            name: app-config
            items:
              - key: nginx.conf
                path: nginx.conf
              - key: FEATURE_FLAGS
                path: features.json
        
        - name: secret-volume
          secret:
            secretName: app-secrets
            defaultMode: 0400  # Read-only permissions
            items:
              - key: tls.crt
                path: cert.pem
              - key: tls.key
                path: key.pem
```
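
The `data` values in a Secret manifest are base64 encoded, which anyone can reverse. The short sketch below reproduces the encoded values used in the manifest above and decodes one of them again; the helper names are hypothetical.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way a Secret's `data` field expects it."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key or password is required."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("supersecretpassword"))   # c3VwZXJzZWNyZXRwYXNzd29yZA==
print(decode_secret_value("bXlzZWNyZXRhcGlrZXk="))  # mysecretapikey
```

Since decoding needs no key, a Secret manifest committed to version control exposes its contents just as plain text would.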

## 8.4 Managed Kubernetes Services

Running self-managed Kubernetes (installing `kubeadm` on EC2 instances) requires deep operational expertise: managing etcd backups, upgrading control plane versions, and configuring CNI networking. Managed services abstract this complexity.

### Amazon EKS (Elastic Kubernetes Service)
*   **Control Plane:** AWS manages API server, etcd, and scheduler across three AZs.
*   **Data Plane:** You manage worker nodes (EC2 or Fargate serverless).
*   **Add-ons:** VPC CNI, CoreDNS, kube-proxy managed by AWS.
*   **IAM Integration:** IAM Roles for Service Accounts (IRSA) allows fine-grained AWS API access from pods.

**EKS Cluster Creation:**
```bash
# Using eksctl (official CLI)
eksctl create cluster \
    --name production-cluster \
    --region us-east-1 \
    --node-type m6i.large \
    --nodes 3 \
    --nodes-min 2 \
    --nodes-max 10 \
    --managed \
    --asg-access \
    --external-dns-access \
    --full-ecr-access
```

**EKS Cluster Creation with Terraform:**
```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "production-cluster"
  cluster_version = "1.28"

  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # EKS Managed Node Groups
  eks_managed_node_groups = {
    general = {
      desired_size = 2
      min_size     = 1
      max_size     = 10

      instance_types = ["m6i.large"]
      capacity_type  = "ON_DEMAND"

      labels = {
        workload = "general"
      }

      taints = []  # No taints: any pod may be scheduled on this group
    }

    spot = {
      desired_size = 1
      min_size     = 0
      max_size     = 5

      instance_types = ["m6i.large", "m5.large", "m5a.large"]
      capacity_type  = "SPOT"  # Cheaper than On-Demand, but instances can be reclaimed

      labels = {
        workload = "batch"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"  # EKS API value, not the Kubernetes-style "NoSchedule"
      }]
    }
  }

  # AWS Auth ConfigMap for IAM integration
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::123456789012:role/AdminRole"  # Placeholder 12-digit account ID
      username = "admin"
      groups   = ["system:masters"]
    }
  ]

  tags = {
    Environment = "production"
  }
}
```

### Azure AKS (Azure Kubernetes Service)
*   **Control Plane:** Azure manages the API server and etcd. On the Free tier there is no charge for the control plane, so you pay only for worker nodes.
*   **Integration:** Native Azure AD authentication, Azure Monitor for containers, Azure Policy for governance.
*   **Node Autoprovisioning:** Automatically selects VM sizes based on pod resource requirements.

**AKS Creation:**
```bash
# Create resource group
az group create --name myAKSResourceGroup --location eastus

# Create cluster with monitoring
az aks create \
    --resource-group myAKSResourceGroup \
    --name myAKSCluster \
    --node-count 3 \
    --enable-addons monitoring \
    --generate-ssh-keys \
    --node-vm-size Standard_DS2_v2 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 5

# Get credentials
az aks get-credentials --resource-group myAKSResourceGroup --name myAKSCluster
```

### Google GKE (Google Kubernetes Engine)
*   **Autopilot Mode:** Google manages nodes entirely; you pay per pod, not per provisioned VM.
*   **Standard Mode:** You manage node pools, but with advanced features like node auto-repair and auto-upgrade.
*   **Anthos:** Hybrid/multi-cloud Kubernetes platform for consistent operations across on-premises and cloud.

**GKE Creation:**
```bash
# Create cluster with autopilot
gcloud container clusters create-auto production-cluster \
    --region us-central1 \
    --release-channel regular \
    --enable-vertical-pod-autoscaling

# Or standard cluster with specific node configuration
gcloud container clusters create standard-cluster \
    --zone us-central1-a \
    --num-nodes 3 \
    --machine-type e2-standard-4 \
    --disk-size 100GB \
    --enable-autoscaling \
    --min-nodes 1 \
    --max-nodes 10 \
    --enable-autorepair \
    --enable-autoupgrade \
    --cluster-version "1.28"
```

## 8.5 Kubernetes Workload Resources

Beyond basic Pods, Kubernetes provides higher-level controllers for managing application lifecycles.

### 8.5.1 Deployments (Stateless Applications)
Deployments manage ReplicaSets, which ensure a specified number of pod replicas are running at all times.

**Key Capabilities:**
*   **Rolling Updates:** Gradual replacement of pods with new versions.
*   **Rollback:** Revert to a previous revision if issues are detected.
*   **Scaling:** Manual (`kubectl scale`) or automatic (Horizontal Pod Autoscaler).

**Code Snippet: Production Deployment Manifest**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
    version: v1.2.3
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # Can temporarily have 4 pods (3 + 25%)
      maxUnavailable: 0      # Never have fewer than 3 available
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1.2.3
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: api-service-account  # Least privilege IAM
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
      
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api
                topologyKey: kubernetes.io/hostname
      
      containers:
        - name: api
          image: myregistry/api:v1.2.3
          imagePullPolicy: Always
          
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: management
              containerPort: 8081
              protocol: TCP
          
          env:
            - name: NODE_ENV
              value: "production"
            - name: DB_HOST
              valueFrom:
                configMapKeyRef:
                  name: db-config
                  key: host
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
                  optional: false
          
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
              ephemeral-storage: "1Gi"
            limits:
              memory: "512Mi"
              cpu: "500m"
              ephemeral-storage: "2Gi"
          
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache
          
          livenessProbe:
            httpGet:
              path: /health/live
              port: management
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          
          readinessProbe:
            httpGet:
              path: /health/ready
              port: management
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          
          startupProbe:
            httpGet:
              path: /health/startup
              port: management
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 30  # Up to 30 * 10s = 300s of probing after the initial delay
          
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            capabilities:
              drop:
                - ALL
      
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir:
            sizeLimit: 500Mi
      
      terminationGracePeriodSeconds: 60
```

### 8.5.2 StatefulSets (Stateful Applications)
StatefulSets serve applications that require stable network identity and persistent storage (databases, message queues).

**Characteristics:**
*   **Ordered Deployment:** Pods created sequentially (0, 1, 2...).
*   **Stable Network ID:** Each pod gets a predictable name (`db-0`, `db-1`) and hostname.
*   **Persistent Storage:** Each pod maintains its own PersistentVolume even after rescheduling.

**Code Snippet: StatefulSet for PostgreSQL**
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15-alpine
          ports:
            - containerPort: 5432
              name: postgres
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "fast-ssd"  # SSD-backed storage class
        resources:
          requests:
            storage: 10Gi

---
# Headless service for StatefulSet DNS
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
  namespace: database
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
      name: postgres
  clusterIP: None  # Headless
  publishNotReadyAddresses: true  # Allow connection to initializing pods
```
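
Together, the StatefulSet and its headless Service give each pod a predictable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below derives those names for the manifest above. The function name is hypothetical, and `cluster.local` is assumed as the cluster domain, which is the common default but is configurable.

```python
def statefulset_pod_dns(name, service, namespace, replicas, cluster_domain="cluster.local"):
    """Return the stable per-pod DNS names a StatefulSet's pods receive."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


for host in statefulset_pod_dns("postgres", "postgres-headless", "database", 3):
    print(host)
# postgres-0.postgres-headless.database.svc.cluster.local
# postgres-1.postgres-headless.database.svc.cluster.local
# postgres-2.postgres-headless.database.svc.cluster.local
```

A client that must always reach one specific replica (for example a designated primary) can address it by such a name, and the name survives rescheduling of the pod.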

### 8.5.3 DaemonSets and Jobs
**DaemonSet:** Ensures a copy of a pod runs on every (or selected) node. Used for log collectors (Fluentd), monitoring agents (Prometheus Node Exporter), or network proxies.

**Job:** Creates one or more pods that run to completion (batch processing).

**CronJob:** Creates Jobs on a recurring cron schedule.

## 8.6 Helm: The Kubernetes Package Manager

Raw YAML manifests become unwieldy at scale. Helm packages Kubernetes resources into reusable, configurable units called **Charts**.

### Chart Structure
```
mychart/
├── Chart.yaml          # Metadata (name, version, dependencies)
├── values.yaml         # Default configuration values
├── charts/             # Sub-charts (dependencies)
├── templates/          # Kubernetes manifests with templating
│   ├── _helpers.tpl   # Named templates
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── hpa.yaml       # Horizontal Pod Autoscaler
└── README.md
```

**Code Snippet: Helm Values and Templates**
```yaml
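# values.yaml (Configuration)
replicaCount: 3

# Read by the ENVIRONMENT variable and the env loop in templates/deployment.yaml
environment: production  # Example value
env: []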

image:
  repository: myregistry/myapp
  tag: "1.2.3"
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com

resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 60
  targetMemoryUtilizationPercentage: 70

nodeSelector:
  workload-type: general

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - myapp
          topologyKey: kubernetes.io/hostname
```

```yaml
# templates/deployment.yaml (Templated manifest)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mychart.fullname" . }}
  labels:
    {{- include "mychart.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mychart.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        # Force redeployment when config changes
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        prometheus.io/scrape: "true"
        prometheus.io/port: "{{ .Values.service.targetPort }}"
      labels:
        {{- include "mychart.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "mychart.serviceAccountName" . }}
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          
          env:
            # Quote values so that numbers and booleans render as strings
            - name: ENVIRONMENT
              value: {{ .Values.environment | default "" | quote }}
            {{- range .Values.env }}
            - name: {{ .name }}
              value: {{ .value | quote }}
            {{- end }}
          
          envFrom:
            - configMapRef:
                name: {{ include "mychart.fullname" . }}-config
            - secretRef:
                name: {{ include "mychart.fullname" . }}-secrets
                optional: false
          
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache
      
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir:
            sizeLimit: 500Mi
```

## 8.7 Advanced Kubernetes Patterns

### Horizontal Pod Autoscaler (HPA)
Automatically scales pod count based on CPU, memory, or custom metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods  # Custom metric
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Scale down only to the highest recommendation of the last 5 min
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
```
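
The scaling decision behind a manifest like the one above follows the proportional rule documented for the HPA: `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, evaluated per metric, with the largest proposal winning. The following Python sketch illustrates that arithmetic under simplifying assumptions: the function names are hypothetical, the 10% tolerance is the controller's default, and pod readiness, missing metrics, and the stabilization window are ignored. It is a way to reason about the numbers, not the controller's implementation.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Proportional rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    ratio = current_value / target_value
    # Inside the tolerance band (10% by default) no scaling happens.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def recommend(current_replicas, metrics, min_replicas=3, max_replicas=100):
    """metrics is a list of (observed, target) pairs; the largest proposal wins."""
    proposals = [desired_replicas(current_replicas, obs, tgt) for obs, tgt in metrics]
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, max(proposals)))


def scale_up_cap(current_replicas, percent=100, pods=4):
    """Upper bound for one 15 s period when selectPolicy is Max."""
    by_percent = math.ceil(current_replicas * (1 + percent / 100))
    by_pods = current_replicas + pods
    return max(by_percent, by_pods)


# 10 replicas: CPU at 90% (target 60), memory at 50% (target 70),
# 1500 req/s per pod (target 1000).
print(recommend(10, [(90, 60), (50, 70), (1500, 1000)]))  # 15
print(scale_up_cap(3))   # 7  (the +4 pods policy beats +100%)
print(scale_up_cap(10))  # 20 (the +100% policy beats +4 pods)
```

Note how the memory metric alone would suggest scaling down to 8 replicas, yet the result is 15: a single hot metric is enough to keep the replica count high.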

### Pod Disruption Budgets (PDB)
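A PodDisruptionBudget limits how many pods of a workload may be down at once during voluntary disruptions, such as node drains for upgrades or cluster-autoscaler scale-down. It does not protect against involuntary disruptions like node crashes.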

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2  # Alternatively maxUnavailable: 1 (set one, not both)
  selector:
    matchLabels:
      app: api
```
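
To see what a budget like this permits, it helps to compute the number of evictions allowed at a given moment: the count of healthy pods minus the number that must stay healthy. The Python sketch below shows that calculation for integer values only; `allowed_disruptions` is a hypothetical helper, and percentage values and unhealthy-pod eviction policies are left out.

```python
def allowed_disruptions(healthy, expected, min_available=None, max_unavailable=None):
    """Return how many pods may be evicted right now under an integer PDB.

    healthy:  pods currently ready and matched by the PDB selector
    expected: replicas the owning controller wants (e.g. Deployment replicas)
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    # Never negative: a degraded workload simply blocks further evictions.
    return max(0, healthy - desired_healthy)


# Three replicas with minAvailable: 2 allow one eviction at a time.
print(allowed_disruptions(healthy=3, expected=3, min_available=2))  # 1
# If one pod is already down, a node drain has to wait.
print(allowed_disruptions(healthy=2, expected=3, min_available=2))  # 0
```

One consequence worth checking in your own manifests: if `minAvailable` equals the replica count, the result is always zero and node drains will block until someone intervenes.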

### Network Policies
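Network Policies act as firewall rules for pod-to-pod communication and are the basis of zero-trust networking inside a cluster. They are enforced only when the cluster's CNI plugin supports them (for example, Calico or Cilium); on other plugins they are accepted by the API server but have no effect.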

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
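    # Only accept traffic from ingress-nginx controller pods.
    # namespaceSelector and podSelector share ONE list item, so both must match (AND);
    # written as two separate items they would be OR'd and admit far more traffic.
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # Set automatically on every namespace
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
    # Allow Prometheus scraping from the monitoring namespace
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 8080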
  egress:
    # Only allow egress to database on port 5432
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS resolution (UDP, plus TCP for large responses)
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```
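
A frequent source of mistakes in `from` and `to` blocks is how selectors combine: separate list items are OR'd, while a `namespaceSelector` and `podSelector` inside the same item are AND'd. A `podSelector` on its own only matches pods in the policy's own namespace. The Python sketch below models just that peer-matching logic so the difference can be tested; the function names are hypothetical, and ports, `ipBlock`, and `matchExpressions` are deliberately left out.

```python
def labels_match(selector, labels):
    """matchLabels semantics: every selector pair must be present; {} matches all."""
    return all(labels.get(key) == value for key, value in selector.items())


def peer_matches(peer, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    """Selectors inside one peer entry are AND'd together."""
    ns_selector = peer.get("namespaceSelector")
    pod_selector = peer.get("podSelector")
    if ns_selector is None:
        # podSelector alone is scoped to the policy's own namespace.
        ns_ok = src_ns == policy_ns
    else:
        ns_ok = labels_match(ns_selector, src_ns_labels)
    pod_ok = pod_selector is None or labels_match(pod_selector, src_pod_labels)
    return ns_ok and pod_ok


def allowed(peers, policy_ns, src_ns, src_ns_labels, src_pod_labels):
    """Separate peer entries are OR'd: any single match admits the traffic."""
    return any(
        peer_matches(p, policy_ns, src_ns, src_ns_labels, src_pod_labels)
        for p in peers
    )


NS = {"kubernetes.io/metadata.name": "ingress-nginx"}
POD = {"app.kubernetes.io/name": "ingress-nginx"}
or_form = [{"namespaceSelector": NS}, {"podSelector": POD}]   # two list items
and_form = [{"namespaceSelector": NS, "podSelector": POD}]    # one list item

# A debug pod that happens to run in the ingress-nginx namespace.
debug_pod = ("production", "ingress-nginx", NS, {"app": "debug"})
print(allowed(or_form, *debug_pod))   # True
print(allowed(and_form, *debug_pod))  # False
```

The two-item form admits any pod in the `ingress-nginx` namespace, plus any pod in the policy's own namespace that carries the controller label, which is rarely what a rule described as "only from ingress-nginx" intends.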

---

### Summary

In this chapter, we ascended from individual containers to orchestrated fleets. You mastered Docker image optimization through multi-stage builds and security hardening, understanding that containers are immutable artifacts requiring externalized configuration. We dissected Kubernetes architecture, distinguishing between the control plane's orchestration responsibilities and the data plane's execution duties. You learned to deploy production workloads using Deployments for stateless services and StatefulSets for persistent data stores, implementing health checks (liveness, readiness, startup) that enable self-healing systems. We explored the three major managed Kubernetes offerings—EKS, AKS, and GKE—understanding their unique features and IAM integration patterns. You practiced configuration management through ConfigMaps and Secrets, network isolation via Network Policies, and horizontal scaling through the Horizontal Pod Autoscaler. Finally, we introduced Helm as the package manager for Kubernetes, enabling templated, versioned deployments of complex applications.

Kubernetes is often described as the operating system of the cloud-native world. With these skills, you can deploy microservices that scale automatically, recover from failures without human intervention, and roll out updates without downtime. Yet for many workloads, even container orchestration is too heavy. When you need to execute code in response to events without managing servers, clusters, or containers at all, you need serverless computing.

**Next Up: Chapter 9 - Serverless Architectures**
In the next chapter, we will explore Function as a Service (FaaS) and event-driven architectures that hand infrastructure management over to the cloud provider. You will learn to build applications that scale from zero to thousands of executions on demand, paying only for the milliseconds your code is actually running. We will design event-driven pipelines using message queues, stream processors, and function compositions that react to cloud events in real time, completing your journey from infrastructure management to pure application logic.

