# **Chapter 8: Cloud-Native Architecture & Serverless**

Cloud-native architecture is an approach to building and running applications that takes full advantage of the cloud computing delivery model. This chapter explores containerization, orchestration with Kubernetes, serverless computing, and the operational practices that enable teams to deliver software faster and more reliably.

---

## **8.1 Introduction to Cloud-Native Architecture**

**Definition**: Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

**Core Principles**:
1. **Containerized**: Each part (applications, processes, etc.) is packaged in its own container
2. **Dynamically Managed**: Containers are actively scheduled and managed to optimize resource utilization
3. **Microservices-Oriented**: Applications are segmented into microservices
4. **DevOps**: Development and operations teams work together to deliver software faster

**Evolution of Deployment**:
```
Traditional Deployment          Virtualized Deployment          Container Deployment
┌─────────────────┐            ┌─────────────────┐            ┌─────────────────┐
│   Application   │            │   App A         │            │  ┌───────────┐  │
│                 │            │   App B         │            │  │ Container │  │
│   Binaries      │            │   App C         │            │  │   App A   │  │
│                 │            │   ┌───────────┐ │            │  └───────────┘  │
│   OS            │            │   │   Hyper-  │ │            │  ┌───────────┐  │
│                 │            │   │   visor   │ │            │  │ Container │  │
│   Hardware      │            │   └───────────┘ │            │  │   App B   │  │
│                 │            │   Host OS       │            │  └───────────┘  │
└─────────────────┘            │   Hardware      │            │  ┌───────────┐  │
                               └─────────────────┘            │  │ Container │  │
                                                              │  │   App C   │  │
                                                              │  └───────────┘  │
                                                              │                 │
                                                              │  Container      │
                                                              │  Runtime        │
                                                              │  (Docker)       │
                                                              │                 │
                                                              │  Host OS        │
                                                              │  Hardware       │
                                                              └─────────────────┘

```

**Benefits of Containers**:
- Lightweight (share OS kernel)
- Portable (run anywhere a compatible container runtime is available)
- Isolated (process, network, filesystem isolation)
- Efficient (start in seconds, MBs not GBs)

---

## **8.2 Containerization with Docker**

Docker revolutionized application deployment by standardizing how applications are packaged and run. Understanding Docker fundamentals is essential for modern system design.

### **Docker Architecture**

```
┌─────────────────────────────────────────────────────────────┐
│                     Docker Architecture                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Docker Client (CLI)                     │   │
│  │         docker build, docker run, docker push        │   │
│  └─────────────────────┬───────────────────────────────┘   │
│                        │ REST API                          │
│                        ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Docker Daemon (dockerd)                 │   │
│  │  ┌───────────────┐  ┌───────────────┐              │   │
│  │  │   Images      │  │  Containers   │              │   │
│  │  │  (Read-only   │  │  (Running     │              │   │
│  │  │   layers)     │  │   instances)  │              │   │
│  │  └───────────────┘  └───────────────┘              │   │
│  │  ┌───────────────┐  ┌───────────────┐              │   │
│  │  │   Networks    │  │    Volumes    │              │   │
│  │  │  (Bridge,     │  │  (Persistent  │              │   │
│  │  │   Host,       │  │   storage)    │              │   │
│  │  │   Overlay)    │  │               │              │   │
│  │  └───────────────┘  └───────────────┘              │   │
│  └─────────────────────────────────────────────────────┘   │
│                        │                                   │
│                        ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Container Runtime                       │   │
│  │         (containerd, runc)                          │   │
│  └─────────────────────────────────────────────────────┘   │
│                        │                                   │
│                        ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │              Operating System                        │   │
│  │              (Linux Kernel)                          │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### **Dockerfile Best Practices**

A Dockerfile defines how to build a container image. Following best practices ensures security, performance, and maintainability.

**Basic Dockerfile**:
```dockerfile
# Dockerfile for Python application
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements first (for layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port
EXPOSE 8000

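# Health check (python:3.11-slim does not ship curl, so use the interpreter)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1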

# Run application
CMD ["python", "app.py"]
```
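
The Dockerfile above assumes an `app.py` that listens on port 8000 and answers `/health`. The source does not show that file, so the following is only a minimal sketch using the standard library; the names `route`, `Handler`, and `serve` are hypothetical, and a real service would more likely use a web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def route(path: str) -> tuple[int, dict]:
    """Map a request path to (status code, JSON body)."""
    if path == "/health":
        return 200, {"status": "ok"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    """Serve GET requests using the routing table above."""

    def do_GET(self) -> None:
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Block forever, serving requests on the given port."""
    # Bind to 0.0.0.0 so the port is reachable from outside the container.
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` at the bottom of `app.py` would start the server. Keeping `route` as a pure function makes the health logic testable without opening a socket.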

**Multi-Stage Build** (Production Optimization):
```dockerfile
# Stage 1: Build environment
FROM python:3.11 AS builder

WORKDIR /app

# Install build dependencies
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: Production environment (minimal)
FROM python:3.11-slim

# Create the non-root user first so copied files can be owned by it
RUN useradd -m appuser

WORKDIR /app

# Copy only the installed packages from the builder.
# /root is not readable by other users, so place them in appuser's home.
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .

# Make sure scripts in .local are usable
ENV PATH=/home/appuser/.local/bin:$PATH

# Run as non-root user (security best practice)
USER appuser

EXPOSE 8000

CMD ["python", "app.py"]

# Benefits:
# - Smaller image size (no build tools in production)
# - More secure (no compiler tools exposed)
# - Faster deployments (smaller images)
```

**Security Hardening**:
```dockerfile
FROM python:3.11-alpine

# Update packages and install security patches
RUN apk update && apk upgrade && \
    apk add --no-cache curl && \
    rm -rf /var/cache/apk/*

WORKDIR /app

# Never run as root: create the user before any COPY --chown that names it
RUN addgroup -g 1000 appgroup && \
    adduser -u 1000 -G appgroup -s /bin/sh -D appuser

# Copy with specific ownership
COPY --chown=appuser:appgroup requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY --chown=appuser:appgroup . .

USER appuser

# Read-only filesystem (immutable infrastructure)
# Mount tmpfs for temporary files if needed
# docker run --read-only --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp

EXPOSE 8000

# Use exec form for proper signal handling
CMD ["python", "app.py"]
```

### **Container Images and Registries**

**Image Naming and Tagging**:
```bash
# Image naming convention
registry/repository:tag

# Examples
docker.io/library/python:3.11        # Docker Hub official
gcr.io/my-project/my-app:v1.2.3      # Google Container Registry
123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest  # AWS ECR

# Tagging strategy
my-app:1.0.0        # Semantic versioning
my-app:1.0          # Major.minor (floating)
my-app:latest       # Latest (use carefully in production)
my-app:git-sha      # Git commit SHA (immutable)
my-app:2024-01-15   # Date-based
```
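
To make the `registry/repository:tag` convention concrete, here is a small sketch that splits an image reference into its parts and fills in the Docker Hub defaults. The function name `parse_image_ref` is hypothetical, and the rules are a simplification: digest references (`@sha256:...`) are not handled.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'registry/repository:tag', applying Docker Hub defaults."""
    first, sep, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
    # A tag follows the last ':' in the final path component.
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    # Official Docker Hub images live under the 'library' namespace.
    if registry == "docker.io" and "/" not in name:
        name = f"library/{name}"
    return ImageRef(registry, name, tag)
```

For example, `parse_image_ref("python")` resolves to the same image as `docker.io/library/python:latest`, which is why an untagged reference in production silently tracks whatever `latest` points to.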

**Working with Registries**:
```bash
# Build and tag
docker build -t my-app:1.0.0 .

# Tag for registry
docker tag my-app:1.0.0 gcr.io/my-project/my-app:1.0.0

# Push to registry
docker push gcr.io/my-project/my-app:1.0.0

# Pull from registry
docker pull gcr.io/my-project/my-app:1.0.0

# Scan for vulnerabilities
# (`docker scan` has been retired; Docker Scout is its replacement)
docker scout cves my-app:1.0.0
# or with Trivy
trivy image my-app:1.0.0
```

---

## **8.3 Kubernetes: Container Orchestration**

Kubernetes (K8s) has become the de facto standard for container orchestration. It automates deployment, scaling, and management of containerized applications.

### **Kubernetes Architecture**

```
┌─────────────────────────────────────────────────────────────┐
│                  Kubernetes Cluster                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │                 Control Plane (Master)               │   │
│  │                                                     │   │
│  │  ┌───────────────┐  ┌───────────────┐              │   │
│  │  │ API Server    │  │   etcd        │              │   │
│  │  │ (kube-apiserver)│  │ (Key-Value   │              │   │
│  │  │               │  │  Store)       │              │   │
│  │  │ - REST API    │  │ - Cluster     │              │   │
│  │  │ - Validation  │  │   state       │              │   │
│  │  │ - Authentication│  │ - Config    │              │   │
│  │  └───────────────┘  └───────────────┘              │   │
│  │                                                     │   │
│  │  ┌───────────────┐  ┌───────────────┐              │   │
│  │  │ Scheduler     │  │ Controller    │              │   │
│  │  │ (kube-        │  │ Manager       │              │   │
│  │  │  scheduler)   │  │ (kube-        │              │   │
│  │  │               │  │  controller-  │              │   │
│  │  │ - Assigns pods│  │  manager)     │              │   │
│  │  │   to nodes    │  │               │              │   │
│  │  │ - Resource    │  │ - Node        │              │   │
│  │  │   optimization│  │   controller  │              │   │
│  │  │               │  │ - Replication │              │   │
│  │  │               │  │   controller  │              │   │
│  │  └───────────────┘  └───────────────┘              │   │
│  └─────────────────────────────────────────────────────┘   │
│                           │                                 │
│                           │                                 │
│  ┌────────────────────────┼──────────────────────────────┐ │
│  │                 Worker Nodes                          │ │
│  │                        │                              │ │
│  │  ┌─────────────────────┼──────────────────────┐       │ │
│  │  │  Node 1             │                      │       │ │
│  │  │  ┌──────────────────┴──────────────────┐   │       │ │
│  │  │  │         kubelet                      │   │       │ │
│  │  │  │  (Agent, manages pods on node)       │   │       │ │
│  │  │  └──────────────────┬──────────────────┘   │       │ │
│  │  │                     │                      │       │ │
│  │  │  ┌──────────────────┴──────────────────┐   │       │ │
│  │  │  │         kube-proxy                   │   │       │ │
│  │  │  │  (Network proxy, load balancing)     │   │       │ │
│  │  │  └──────────────────┬──────────────────┘   │       │ │
│  │  │                     │                      │       │ │
│  │  │  ┌──────────┐ ┌──────────┐ ┌──────────┐   │       │ │
│  │  │  │   Pod    │ │   Pod    │ │   Pod    │   │       │ │
│  │  │  │ (App A)  │ │ (App B)  │ │ (App C)  │   │       │ │
│  │  │  └──────────┘ └──────────┘ └──────────┘   │       │ │
│  │  └───────────────────────────────────────────┘       │ │
│  │                                                      │ │
│  │  ┌───────────────────────────────────────────┐       │ │
│  │  │  Node 2                                   │       │ │
│  │  │  ┌──────────┐ ┌──────────┐               │       │ │
│  │  │  │   Pod    │ │   Pod    │               │       │ │
│  │  │  │ (App A)  │ │ (App B)  │               │       │ │
│  │  │  └──────────┘ └──────────┘               │       │ │
│  │  └───────────────────────────────────────────┘       │ │
│  └──────────────────────────────────────────────────────┘ │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

### **Core Kubernetes Objects**

**Pod**: The smallest deployable unit, containing one or more containers.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  labels:
    app: my-app
    tier: frontend
spec:
  containers:
  - name: my-app
    image: my-app:1.0.0
    ports:
    - containerPort: 8000
    env:
    - name: DATABASE_URL
      value: "postgresql://db:5432/myapp"
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "200m"
    livenessProbe:
      httpGet:
        path: /health
        port: 8000
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: 8000
      initialDelaySeconds: 5
      periodSeconds: 5
```
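
The `resources` values in the Pod spec use Kubernetes quantity notation: `100m` means 100 millicores (0.1 CPU) and `128Mi` means 128 mebibytes. The sketch below converts these strings to plain numbers. The helper names are hypothetical, and only the common suffixes are covered.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) memory suffixes.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as '100m' or '2' to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(quantity)
```

With the values above, the container requests 0.1 CPU and 134,217,728 bytes, and is capped at 0.2 CPU and 268,435,456 bytes.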

**Deployment**: Manages Pod replicas and rolling updates.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Max pods above desired during update
      maxUnavailable: 0  # Max pods unavailable during update
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0.0
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
```
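
The `maxSurge` and `maxUnavailable` settings bound how many Pods exist and how many stay available while a rolling update is in progress. The sketch below computes those bounds. The function names are hypothetical; the rounding (percentages round up for `maxSurge` and down for `maxUnavailable`) follows the documented Kubernetes behavior.

```python
import math


def _resolve(value, replicas: int, round_up: bool) -> int:
    """Resolve an absolute count or a percentage string like '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas: int, max_surge, max_unavailable) -> tuple[int, int]:
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas - unavailable, replicas + surge
```

For the Deployment above, `rollout_bounds(3, 1, 0)` gives `(3, 4)`: capacity never drops below three Pods, at the cost of briefly running a fourth.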

**Service**: Exposes Pods as network services.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  type: ClusterIP  # Internal cluster access
  # type: LoadBalancer  # Expose externally (cloud)
  # type: NodePort      # Expose on node IP
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80        # Service port
    targetPort: 8000 # Container port
```

**ConfigMap**: Externalize configuration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
data:
  database_host: "postgres"
  database_port: "5432"
  log_level: "info"
  feature_flags: |
    {
      "new_checkout": true,
      "beta_feature": false
    }
---
# Usage in Pod
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: my-app:1.0.0
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: my-app-config
          key: database_host
    volumeMounts:
    - name: config
      mountPath: /app/config
  volumes:
  - name: config
    configMap:
      name: my-app-config
```

**Secret**: Store sensitive data (passwords, tokens).

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
type: Opaque
stringData:
  database_password: "supersecretpassword"
  api_key: "sk-1234567890"
---
# Usage
env:
- name: DATABASE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: my-app-secrets
      key: database_password
```
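
The manifest above uses `stringData`, which accepts plain text; the API server stores those values base64-encoded under `data`. The sketch below shows that conversion in both directions, with hypothetical helper names. It also illustrates that base64 is an encoding, not encryption: anyone who can read the Secret can decode it.

```python
import base64


def to_secret_data(string_data: dict[str, str]) -> dict[str, str]:
    """Base64-encode each value, as the Secret `data` field expects."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


def from_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode a Secret `data` mapping back to plain strings."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in data.items()
    }
```

Because decoding is trivial, Secrets are usually paired with RBAC restrictions and encryption at rest rather than relied on alone.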

### **Scaling in Kubernetes**

**Horizontal Pod Autoscaler (HPA)**: Scale based on CPU/memory or custom metrics.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
```
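
The autoscaler's core calculation is `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the configured minimum and maximum. The sketch below reproduces that formula for a single metric. The function name is hypothetical, and the 10% tolerance is the documented default, which a cluster may override.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Apply the HPA scaling formula for one metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: do not scale.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With the manifest above (target 70% CPU, 3 to 10 replicas), three Pods averaging 90% CPU scale to `ceil(3 * 90 / 70) = 4`. When several metrics are configured, the autoscaler evaluates each and uses the largest result.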

**Vertical Pod Autoscaler (VPA)**: Adjust CPU/memory requests/limits automatically.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"  # Auto, Off, Initial, Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: 50m
        memory: 100Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
      controlledResources: ["cpu", "memory"]
```

### **Storage in Kubernetes**

**PersistentVolume (PV)** and **PersistentVolumeClaim (PVC)**:

```yaml
# Storage Class (dynamic provisioning)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/gce-pd  # Cloud-specific
parameters:
  type: pd-ssd
  replication-type: regional-pd  # "none" (zonal) or "regional-pd"
---
# Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  storageClassName: fast-ssd
  accessModes:
    - ReadWriteOnce  # RWO, ROX, RWX
  resources:
    requests:
      storage: 10Gi
---
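# Usage in a StatefulSet
apiVersion: apps/v1
kind: StatefulSet  # Use StatefulSet for databases
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:            # Required: must match the template labels
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec: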
      containers:
      - name: postgres
        image: postgres:15
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: postgres-pvc
```

---

## **8.4 Serverless Architecture (FaaS)**

Serverless computing lets you build and run applications without provisioning or managing servers. The cloud provider handles the infrastructure, scaling, and maintenance.

### **Function-as-a-Service (FaaS)**

**Concept**: Write code as functions that run in response to events. You pay only for execution time (millisecond billing).

**AWS Lambda Example**:
```python
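import json
import os
from decimal import Decimal

import boto3

# Create clients once per execution environment so warm invocations reuse them
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ.get('TABLE_NAME', 'orders'))
sns = boto3.client('sns')

# AWS Lambda handler
def lambda_handler(event, context):
    """
    Process order creation event
    Triggered by: API Gateway (other sources such as SQS, SNS, or S3
    deliver differently shaped events)
    """
    try:
        # Parse input; DynamoDB rejects Python floats, so read them as Decimal
        body = json.loads(event['body'], parse_float=Decimal)
        user_id = body['user_id']
        items = body['items']
    except (KeyError, TypeError, ValueError) as e:
        return {
            'statusCode': 400,
            'body': json.dumps({'error': f'invalid request: {e}'})
        }

    try:
        # Calculate total
        total = sum(item['price'] * item['quantity'] for item in items)

        # Save to DynamoDB
        order = {
            'order_id': context.aws_request_id,
            'user_id': user_id,
            'items': items,
            'total': Decimal(str(total)),
            'status': 'created'
        }

        table.put_item(Item=order)

        # Publish event to SNS (default=str serializes Decimal values)
        sns.publish(
            TopicArn='arn:aws:sns:us-east-1:123456789012:order-created',
            Message=json.dumps(order, default=str)
        )

        return {
            'statusCode': 200,
            'body': json.dumps({
                'order_id': order['order_id'],
                'total': str(total)
            })
        }

    except Exception as e:
        # Log details server-side; avoid leaking internals to the caller
        print(f'order creation failed: {e}')
        return {
            'statusCode': 500,
            'body': json.dumps({'error': 'internal error'})
        }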

# Deployment (SAM - Serverless Application Model)
# template.yaml
"""
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  CreateOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      MemorySize: 512
      Timeout: 10
      Environment:
        Variables:
          TABLE_NAME: orders
      Events:
        CreateOrderApi:
          Type: Api
          Properties:
            Path: /orders
            Method: post
      Policies:
        - DynamoDBCrudPolicy:
            TableName: orders
        - SNSPublishMessagePolicy:
            TopicName: order-created
"""
```
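
The handler computes an order total from prices that arrive as JSON numbers. Two details matter here: the boto3 DynamoDB resource refuses Python `float` values, and binary floats accumulate rounding error on currency amounts. The sketch below, with a hypothetical `order_total` helper, shows one way to parse and sum the amounts as `Decimal`; the payload shape is assumed to match the handler's `items` list.

```python
import json
from decimal import Decimal


def order_total(body_json: str) -> Decimal:
    """Sum price * quantity over the items, using exact decimal arithmetic."""
    # parse_float=Decimal keeps JSON numbers like 19.99 exact.
    body = json.loads(body_json, parse_float=Decimal)
    return sum(
        (Decimal(item["price"]) * item["quantity"] for item in body["items"]),
        Decimal("0"),
    )


payload = json.dumps({
    "items": [
        {"price": 0.1, "quantity": 3},
        {"price": 19.99, "quantity": 2},
    ]
})
total = order_total(payload)
```

Here `total` is exactly `Decimal("40.28")`, whereas the float expression `0.1 * 3` already differs from `0.3`.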

**Cold Start Problem**:
```python
# Cold start: Time to initialize execution environment
# Factors affecting cold start:
# - Runtime (Python/Node faster than Java/C#)
# - Memory size (more memory = faster CPU = faster start)
# - VPC attachment (added seconds of latency before 2019; much smaller since)
# - Dependencies (large packages slow startup)

# Mitigation Strategies:

# 1. Provisioned Concurrency (keep functions warm)
"""
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier PROD \
  --provisioned-concurrent-executions 100
"""

# 2. Optimization: Minimize dependencies
# requirements.txt (only what you need)
"""
boto3==1.28.0
# Don't include: numpy, pandas (unless needed)
"""

# 3. Lazy loading (defer heavy imports that only some code paths need)
# BAD: Loading at module level (paid on every cold start, even if unused)
# import heavy_library  # placeholder name for a large dependency

# GOOD: Lazy loading
_heavy_lib = None

def get_heavy_lib():
    global _heavy_lib
    if _heavy_lib is None:
        import heavy_library
        _heavy_lib = heavy_library
    return _heavy_lib

def lambda_handler(event, context):
    lib = get_heavy_lib()  # Only loaded when needed
    # ...
```

**Azure Functions Example**:
```python
import azure.functions as func
import logging

# HTTP Trigger
def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    
    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')
    
    if name:
        return func.HttpResponse(f"Hello, {name}. This HTTP triggered function executed successfully.")
    else:
        return func.HttpResponse(
            "This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
            status_code=200
        )

# function.json
"""
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "authLevel": "function",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "methods": ["get", "post"]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ]
}
"""
```

---

## **8.5 Serverless Databases**

Serverless databases scale automatically and charge based on usage, which makes them a good fit for variable workloads.

### **Amazon DynamoDB**

**Concept**: Fully managed NoSQL database with single-digit millisecond performance.

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

# Initialize
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

# Write item
table.put_item(
    Item={
        'user_id': 'user_123',
        'email': 'alice@example.com',
        'name': 'Alice Johnson',
        'created_at': '2024-01-15T10:00:00Z'
    }
)

# Read item (eventually consistent by default)
response = table.get_item(
    Key={'user_id': 'user_123'},
    ConsistentRead=False  # Set True for strongly consistent
)
user = response.get('Item')

# Query (must use partition key)
response = table.query(
    KeyConditionExpression=Key('user_id').eq('user_123')
)

# Scan (expensive, avoid in production)
response = table.scan(
    FilterExpression=Attr('email').eq('alice@example.com')
)

# On-Demand vs Provisioned Capacity
"""
On-Demand:
- Pay per request
- No capacity planning
- Good for unpredictable workloads

Provisioned:
- Pay for capacity units (RCU/WCU)
- Auto Scaling available
- Cheaper for predictable workloads
"""
```
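
Provisioned capacity is sized in read and write capacity units. One RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The sketch below turns a workload estimate into units; the function names are hypothetical, and transactional requests, which cost double, are left out.

```python
import math


def read_capacity_units(
    reads_per_second: int,
    item_size_kb: float,
    strongly_consistent: bool = False,
) -> int:
    """RCUs needed: item size rounds up to 4 KB blocks."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs needed: item size rounds up to 1 KB blocks."""
    return writes_per_second * math.ceil(item_size_kb)
```

For example, 100 reads per second of 6 KB items need 200 RCUs with strong consistency but only 100 with the default eventual consistency, which is one reason to keep `ConsistentRead=False` unless the application requires otherwise.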

**DynamoDB Global Tables** (Multi-region):
```python
# DynamoDB Global Tables provide multi-region replication
# Automatic conflict resolution (last-write-wins)

# Add a replica region to an existing table (AWS Console or CLI).
# Global Tables version 2019.11.21 uses update-table with replica updates.
"""
aws dynamodb update-table \
  --table-name users \
  --region us-east-1 \
  --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'
"""

# Application uses nearest region (low latency)
# Changes replicate to other regions (typically within a second)
```
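
Last-write-wins means that when two regions update the same item concurrently, the version with the later write timestamp survives and the other is discarded. The sketch below illustrates the idea only. It is not DynamoDB's internal implementation, and the `Version` type and the tie-break on region name are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    written_at: float  # Write timestamp in seconds
    region: str


def last_writer_wins(a: Version, b: Version) -> Version:
    """Keep the version with the later timestamp."""
    # The region name breaks exact ties so every replica picks the same winner.
    return max(a, b, key=lambda v: (v.written_at, v.region))


us = Version("alice@old.example.com", written_at=100.0, region="us-east-1")
eu = Version("alice@new.example.com", written_at=100.5, region="eu-west-1")
winner = last_writer_wins(us, eu)
```

The earlier write is lost without any error, so workloads that cannot tolerate that should route writes for a given item to a single region.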

### **Amazon Aurora Serverless**

**Concept**: Auto-scaling relational database (MySQL/PostgreSQL compatible).

```yaml
# CloudFormation template for Aurora Serverless
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  AuroraServerlessCluster:
    Type: AWS::RDS::DBCluster
    Properties:
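      Engine: aurora-mysql
      EngineMode: serverless  # Aurora Serverless v1 syntax; v2 uses ServerlessV2ScalingConfiguration instead
      DatabaseName: myapp
      MasterUsername: admin
      MasterUserPassword: supersecret  # Placeholder only: never commit real credentials; reference a secrets store instead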
      ScalingConfiguration:
        MinCapacity: 2    # ACU (Aurora Capacity Units)
        MaxCapacity: 64
        AutoPause: true
        SecondsUntilAutoPause: 300  # Pause after 5 min idle
      EnableHttpEndpoint: true  # Data API (no persistent connections needed)
```

**Use Cases**:
- **DynamoDB**: High throughput, low latency, simple access patterns, massive scale
- **Aurora Serverless**: Complex queries, transactions, relational data, variable workloads

---

## **8.6 Infrastructure as Code (IaC)**

IaC manages infrastructure through code rather than manual processes, enabling version control, automation, and consistency.

### **Terraform**

**Concept**: Declarative IaC tool supporting multiple cloud providers.

```hcl
# main.tf - Terraform configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "production/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
}

# Variables
variable "aws_region" {
  default = "us-east-1"
}

variable "app_name" {
  default = "my-app"
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.app_name}-vpc"
  }
}

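# Availability zones (referenced by the subnets below)
data "aws_availability_zones" "available" {
  state = "available"
}

# Subnets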
resource "aws_subnet" "public" {
  count                   = 2
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index + 1}.0/24"
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.app_name}-public-${count.index + 1}"
  }
}

# EKS Cluster
# (The IAM roles and policy attachments referenced below are assumed to be defined elsewhere.)
resource "aws_eks_cluster" "main" {
  name     = "${var.app_name}-cluster"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids = aws_subnet.public[*].id
  }

  depends_on = [aws_iam_role_policy_attachment.eks_cluster_policy]
}

# EKS Node Group
resource "aws_eks_node_group" "main" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "${var.app_name}-nodes"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.public[*].id

  scaling_config {
    desired_size = 3
    max_size     = 10
    min_size     = 1
  }

  instance_types = ["t3.medium"]

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry,
  ]
}

# Outputs
output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "cluster_name" {
  value = aws_eks_cluster.main.name
}
```
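
The subnet resource derives each CIDR block from `count.index`, producing `10.0.1.0/24` and `10.0.2.0/24` inside the `10.0.0.0/16` VPC. A quick way to sanity-check such an addressing plan before running `terraform plan` is sketched below with Python's `ipaddress` module. The function name is hypothetical, and the `10.0.N.0/24` pattern is copied from the configuration above.

```python
import ipaddress


def public_subnets(vpc_cidr: str, count: int) -> list[str]:
    """List the subnet CIDRs and verify each one fits inside the VPC."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = []
    for index in range(count):
        # Mirrors "10.0.${count.index + 1}.0/24" from the Terraform resource.
        subnet = ipaddress.ip_network(f"10.0.{index + 1}.0/24")
        if not subnet.subnet_of(vpc):
            raise ValueError(f"{subnet} is outside {vpc}")
        subnets.append(str(subnet))
    return subnets
```

Each `/24` holds 256 addresses, so a `/16` VPC leaves room for many more subnets following the same pattern.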

**Terraform Workflow**:
```bash
# Initialize (download providers)
terraform init

# Plan (preview changes)
terraform plan -out=tfplan

# Apply (create infrastructure)
terraform apply tfplan

# Destroy (clean up)
terraform destroy

# State management
terraform state list          # List resources
terraform state show aws_vpc.main  # Show specific resource
```

### **Pulumi (Alternative to Terraform)**

**Concept**: IaC using familiar programming languages (Python, TypeScript, Go).

```python
# Pulumi Python example
import pulumi
from pulumi_aws import s3, ec2

# Create S3 bucket
bucket = s3.Bucket("my-bucket",
    website=s3.BucketWebsiteArgs(
        index_document="index.html",
    ))

# Create EC2 instance
security_group = ec2.SecurityGroup("web-secgrp",
    ingress=[
        ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=80,
            to_port=80,
            cidr_blocks=["0.0.0.0/0"],
        ),
    ])

server = ec2.Instance("web-server",
    instance_type="t2.micro",
    security_groups=[security_group.name],
    ami="ami-0c55b159cbfafe1f0")

pulumi.export("bucket_name", bucket.id)
pulumi.export("server_ip", server.public_ip)
```

---

## **8.7 GitOps**

GitOps uses Git as the single source of truth for declarative infrastructure and applications. Changes are made via Git commits, and automated agents apply them to the cluster.

### **ArgoCD**

**Concept**: Declarative continuous delivery tool for Kubernetes.

**Architecture**:
```
Git Repository (Source of Truth)
    │
    │ Git Webhook / Polling
    ▼
┌─────────────────────────────────────┐
│           ArgoCD                    │
│  ┌───────────────┐ ┌─────────────┐  │
│  │  Application  │ │  Repo       │  │
│  │  Controller   │ │  Server     │  │
│  └───────────────┘ └─────────────┘  │
└──────────┬──────────────────────────┘
           │ Apply manifests
           ▼
    ┌──────────────┐
    │ Kubernetes   │
    │ Cluster      │
    │ (Target)     │
    └──────────────┘
```

**ArgoCD Application Definition**:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/my-app.git
    targetRevision: HEAD
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true        # Remove resources not in Git
      selfHeal: true     # Correct drift from desired state
    syncOptions:
    - CreateNamespace=true
```

**Benefits of GitOps**:
1. **Version Control**: All changes tracked in Git (audit trail)
2. **Rollback**: Easy rollback to previous Git commits
3. **Drift Detection**: ArgoCD detects manual changes and corrects them
4. **Access Control**: Use Git permissions for deployment access
5. **Reproducibility**: Same manifests apply to dev, staging, prod
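
Drift detection boils down to comparing the desired state stored in Git with the live state in the cluster. The sketch below shows that comparison on plain dictionaries. It is a simplified illustration with a hypothetical `detect_drift` function, not ArgoCD's actual diffing logic, which also handles defaulted fields, lists, and resource ordering.

```python
def detect_drift(desired: dict, live: dict, path: str = "") -> list[str]:
    """Report fields where the live state differs from the desired state."""
    drift = []
    for key, want in desired.items():
        where = f"{path}.{key}" if path else key
        if key not in live:
            drift.append(f"{where}: missing in cluster")
        elif isinstance(want, dict) and isinstance(live[key], dict):
            # Recurse into nested objects such as spec or metadata.
            drift.extend(detect_drift(want, live[key], where))
        elif want != live[key]:
            drift.append(f"{where}: want {want!r}, have {live[key]!r}")
    return drift


desired = {"spec": {"replicas": 3, "image": "my-app:1.0.0"}}
live = {"spec": {"replicas": 5, "image": "my-app:1.0.0"}, "status": {"ready": 5}}
report = detect_drift(desired, live)
```

Fields that exist only in the live object, such as `status`, are ignored here because the cluster adds them itself. With `selfHeal: true`, a non-empty report is what would trigger a re-sync.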

---

## **8.8 CI/CD for Cloud-Native Applications**

Continuous Integration and Continuous Deployment pipelines are essential for cloud-native development.

### **Pipeline Architecture**

```yaml
# .github/workflows/ci-cd.yml (GitHub Actions)
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest pytest-cov
    
    - name: Run tests
      run: pytest --cov=src --cov-report=xml
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml

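  build:
    needs: test
    # Only publish images for pushes to main, not for pull requests
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write  # Needed for GITHUB_TOKEN to push to ghcr.io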
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    
    - name: Login to Container Registry
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Build and push
      uses: docker/build-push-action@v4
      with:
        context: .
        push: true
        tags: |
          ghcr.io/${{ github.repository }}:${{ github.sha }}
          ghcr.io/${{ github.repository }}:latest
        cache-from: type=gha
        cache-to: type=gha,mode=max
    
    - name: Scan image
      # Pin to a release tag or commit SHA instead of master for reproducible runs
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ghcr.io/${{ github.repository }}:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
    - uses: actions/checkout@v3
    
    - name: Configure kubectl
      uses: azure/setup-kubectl@v3
    
    - name: Set context
      uses: azure/k8s-set-context@v3
      with:
        method: kubeconfig
        kubeconfig: ${{ secrets.KUBE_CONFIG_STAGING }}
    
    - name: Deploy to staging
      run: |
        kubectl set image deployment/my-app \
          app=ghcr.io/${{ github.repository }}:${{ github.sha }} \
          -n staging
        kubectl rollout status deployment/my-app -n staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production  # Requires approval
    steps:
    - uses: actions/checkout@v3
    
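    - name: Update GitOps repo
      run: |
        git clone https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/myorg/gitops.git
        cd gitops
        # git commit fails without an identity, and the runner has none by default
        git config user.name "github-actions[bot]"
        git config user.email "github-actions[bot]@users.noreply.github.com"
        yq e -i '.spec.template.spec.containers[0].image = "ghcr.io/${{ github.repository }}:${{ github.sha }}"' \
          apps/my-app/production/deployment.yaml
        git add .
        git commit -m "Update my-app to ${{ github.sha }}"
        git push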
```
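
The `needs:` keys in the workflow above define a dependency graph between jobs. The following sketch, which assumes Python 3.9+ for the standard-library `graphlib` module, shows how that graph determines the execution order and which jobs are skipped when an upstream job fails. The helper names `execution_order` and `skipped_after_failure` are hypothetical, and the model ignores overrides such as `if: always()`.

```python
from graphlib import TopologicalSorter

# Job dependencies as declared by each job's `needs:` key in the workflow.
NEEDS = {
    "test": [],
    "build": ["test"],
    "deploy-staging": ["build"],
    "deploy-production": ["deploy-staging"],
}


def execution_order(needs):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(needs).static_order())


def skipped_after_failure(needs, failed_job):
    """Return the jobs that transitively depend on `failed_job`."""
    blocked = {failed_job}
    skipped = []
    # Walking in execution order guarantees dependencies are seen first.
    for job in execution_order(needs):
        if any(dep in blocked for dep in needs[job]):
            blocked.add(job)
            skipped.append(job)
    return skipped


print(execution_order(NEEDS))
print(skipped_after_failure(NEEDS, "build"))
```

Because the pipeline is a linear chain, a failure in any job stops everything downstream of it, which is what keeps an untested or unscanned image from reaching production.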

### **Progressive Delivery**

**Canary Deployments**: Gradually shift traffic to the new version while monitoring its metrics, and roll back automatically if they degrade.

```yaml
# Flagger (Progressive Delivery Controller)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  service:
    port: 80
    targetPort: 8080
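  analysis:
    interval: 30s      # How often the analysis runs
    threshold: 5       # Failed checks allowed before rollback
    maxWeight: 50      # Maximum traffic percentage routed to the canary
    stepWeight: 10     # Traffic percentage added after each successful check
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99        # Minimum success rate (%)
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500       # Maximum request duration (ms)
      interval: 1m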
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.production:80/"
```
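
To make the interaction between `stepWeight`, `maxWeight`, and `threshold` concrete, here is a small simulation of the analysis loop. It is a simplified model under stated assumptions, not Flagger's implementation: the function `simulate_canary` is hypothetical, each element of `checks` stands for the combined result of all metric checks in one interval, and promotion happens as soon as the canary weight reaches `max_weight`.

```python
def simulate_canary(checks, threshold=5, max_weight=50, step_weight=10):
    """Simulate a canary rollout.

    `checks` is an iterable of booleans, one per analysis interval
    (True means all metrics were within their thresholds).
    Returns the final status and the canary traffic weight after each interval.
    """
    weight = 0
    failed = 0
    history = []
    for passed in checks:
        if passed:
            # Healthy interval: shift more traffic to the canary.
            weight = min(weight + step_weight, max_weight)
        else:
            # Unhealthy interval: hold the weight and count the failure.
            failed += 1
        if failed >= threshold:
            history.append(0)  # all traffic returns to the stable version
            return "rolled back", history
        history.append(weight)
        if weight >= max_weight:
            return "promoted", history
    return "in progress", history


print(simulate_canary([True] * 5))
print(simulate_canary([True, True] + [False] * 5))
```

With the values from the manifest above, five consecutive healthy intervals take the canary from 10% to 50% of traffic, while five failed checks at any point trigger a rollback.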

---

## **8.9 Key Takeaways**

1. **Containers are the standard**: Docker provides consistent, portable, isolated environments. Master multi-stage builds for production optimization.

2. **Kubernetes is the operating system of the cloud**: Understand Pods, Deployments, Services, and ConfigMaps. Use HPA for automatic scaling.

3. **Serverless for variable workloads**: Use Lambda/Functions for event-driven, spiky workloads. Mitigate cold starts with provisioned concurrency.

4. **Choose the right database**: DynamoDB for high-scale NoSQL, Aurora Serverless for relational with variable load.

5. **Infrastructure as Code is mandatory**: Terraform or Pulumi ensure reproducible, versioned infrastructure. Never manually configure production.

6. **GitOps for Kubernetes**: Use ArgoCD or Flux to manage Kubernetes state from Git. Enables drift detection and easy rollbacks.

7. **Security by default**: Run containers as non-root, use read-only filesystems, scan images for vulnerabilities, rotate secrets automatically.

8. **Observability**: Cloud-native requires robust logging (ELK/Loki), metrics (Prometheus), and tracing (Jaeger/Tempo).

---

## **Chapter Summary**

In this chapter, we explored cloud-native architecture—the modern approach to building scalable, resilient systems. We covered containerization with Docker, including best practices for security and multi-stage builds.

We took a deep dive into Kubernetes, the dominant container orchestration platform, covering its architecture, core objects (Pods, Deployments, Services), and scaling mechanisms (HPA/VPA).

We examined serverless computing with AWS Lambda and Azure Functions, addressing the cold start problem and optimization strategies. Serverless databases (DynamoDB, Aurora Serverless) provide scalable data storage with minimal operational overhead.

Infrastructure as Code (Terraform, Pulumi) enables reproducible infrastructure management, while GitOps (ArgoCD) provides declarative continuous delivery for Kubernetes.

Finally, we covered CI/CD pipelines for cloud-native applications, including container scanning, GitOps-based deployments, and progressive delivery strategies like canary deployments.

**Coming up next**: In Chapter 9, we'll explore Data-Intensive Systems—batch processing, stream processing, data pipelines, and the Lambda vs. Kappa architecture.

---

**Exercises**:

1. **Docker Optimization**: You have a Python application with requirements: Flask, NumPy, Pandas, scikit-learn. Write an optimized multi-stage Dockerfile that minimizes production image size.

2. **Kubernetes Deployment**: Design a Kubernetes deployment for a stateful application (PostgreSQL) requiring persistent storage, including backup considerations. What Kubernetes objects do you need?

3. **Serverless Cost Analysis**: Compare costs for:
   - A Lambda function running 10 million times/month (avg 200 ms, 512 MB RAM)
   - An EC2 instance (t3.medium) running 24/7

   At what invocation frequency does EC2 become cheaper?

4. **Terraform Module**: Write a reusable Terraform module for deploying an AWS S3 bucket with versioning, encryption, and lifecycle policies.

5. **GitOps Workflow**: Design a GitOps workflow using ArgoCD that:
   - Automatically deploys to dev on every commit
   - Requires manual approval for staging
   - Requires two approvals and automated canary analysis for production
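
As a starting point for Exercise 3, the cost comparison can be set up as a small model. The default prices below are assumptions (approximate on-demand rates for one region at one point in time) and should be replaced with current figures from the provider's pricing pages. The model also ignores the free tier, storage, and data transfer.

```python
def lambda_monthly_cost(invocations, avg_ms, memory_mb,
                        price_per_million_requests=0.20,    # assumed USD rate
                        price_per_gb_second=0.0000166667):  # assumed USD rate
    """Estimate monthly Lambda cost from request and compute charges."""
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return request_cost + gb_seconds * price_per_gb_second


def ec2_monthly_cost(hourly_price=0.0416, hours=730):  # assumed t3.medium rate
    """Estimate monthly cost of one instance running continuously."""
    return hourly_price * hours


def break_even_invocations(avg_ms, memory_mb):
    """Invocations per month at which Lambda cost equals the EC2 cost."""
    cost_per_invocation = lambda_monthly_cost(1, avg_ms, memory_mb)
    return ec2_monthly_cost() / cost_per_invocation


print(f"Lambda: ${lambda_monthly_cost(10_000_000, 200, 512):.2f}/month")
print(f"EC2:    ${ec2_monthly_cost():.2f}/month")
print(f"Break-even: {break_even_invocations(200, 512):,.0f} invocations/month")
```

A complete answer should also consider whether a single t3.medium could actually serve the load, and what redundancy and operational effort the EC2 option adds.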

---