# Project 3: Multi-Environment Enterprise Application

Enterprise-scale applications operate under stringent requirements: regulatory compliance (SOC 2, PCI-DSS, HIPAA), high availability SLAs, disaster recovery mandates, and strict change management. This project implements a production-grade CI/CD pipeline for a financial services application, demonstrating enterprise patterns including multi-environment GitOps, policy-as-code enforcement, automated compliance validation, and disaster recovery automation. We architect for resilience across multiple AWS regions with automated failover, implement strict security controls with zero-trust networking, and establish observability practices that satisfy audit requirements.

## P3.1 Requirements Analysis

**Application**: "Payment Processing Platform" - High-volume transaction processing
**Compliance Requirements**: PCI-DSS Level 1, SOC 2 Type II, GDPR
**Availability SLA**: 99.99% (about 52.6 minutes of downtime per year)
**RTO/RPO**: 15 minutes / 5 minutes
**Deployment Strategy**: Blue/Green with automated rollback
**Environments**: Development, Staging, Production (multi-region)
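
The downtime figure follows directly from the SLA percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(sla_percent: float) -> float:
    """Return the yearly downtime allowance, in minutes, for an availability SLA."""
    return MINUTES_PER_YEAR * (100.0 - sla_percent) / 100.0


budget = downtime_budget_minutes(99.99)
print(f"99.99% allows about {budget:.1f} minutes of downtime per year")
```

At this SLA, a single failover that uses the full 15-minute RTO consumes more than a quarter of the annual budget, which is why failover has to be automated rather than manual.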

**Architecture Decisions**:
- **Multi-region active-passive**: Primary in us-east-1, DR in us-west-2
- **Data residency**: EU data stays in eu-west-1 (GDPR compliance)
- **Encryption**: At-rest (KMS), in-transit (TLS 1.3), in-use (enclaves where applicable)
- **Network isolation**: Private subnets, VPC endpoints, no public internet for data plane
- **Audit requirements**: Immutable logs, 7-year retention, tamper-evident storage
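
To make "tamper-evident" concrete, here is a minimal sketch of a hash-chained audit log. It is an illustration of the idea only, not the storage mechanism this project uses; the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def append_entry(log: list, event: dict) -> dict:
    """Append an event whose hash covers both its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    entry = {"event": event, "prev_hash": prev_hash, "hash": digest}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; an edited, reordered, or mid-chain deleted entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Truncating the tail of the log is not detected by the chain alone, so the latest hash would also need to be anchored somewhere the log writer cannot modify.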

**Risk Assessment**:
```yaml
risk_matrix:
  critical:
    - risk: "Data breach"
      mitigation: "Encryption, tokenization, network policies, WAF"
    - risk: "Service unavailability"
      mitigation: "Multi-region, auto-failover, circuit breakers"
  
  high:
    - risk: "Compliance violation"
      mitigation: "Automated policy enforcement, audit trails"
    - risk: "Insider threat"
      mitigation: "RBAC, least privilege, audit logging"
```

## P3.2 Infrastructure Setup

Terraform-based infrastructure as code with environment-specific modules.

**Directory Structure**:
```
infrastructure/
├── modules/
│   ├── eks-cluster/
│   ├── networking/
│   ├── security/
│   ├── compliance/
│   └── monitoring/
├── environments/
│   ├── dev/
│   ├── staging/
│   ├── production/
│   └── dr/
└── global/
    ├── iam/
    └── route53/
```

**Production EKS Module**:
```hcl
# modules/eks-cluster/main.tf
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  cluster_name = "${var.environment}-${var.application}"
  common_tags = {
    Environment = var.environment
    Application = var.application
    ManagedBy   = "terraform"
    Compliance  = "pci-dss,soc2"
    CostCenter  = var.cost_center
  }
}

# VPC: workloads run in private subnets; public subnets host load balancers only
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.cluster_name
  cidr = var.vpc_cidr

  azs             = var.availability_zones
  private_subnets = var.private_subnet_cidrs
  public_subnets  = var.public_subnet_cidrs  # For load balancers only

  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true
  
  enable_dns_hostnames = true
  enable_dns_support   = true

  # VPC Flow Logs for compliance
  enable_flow_log                      = true
  create_flow_log_cloudwatch_iam_role  = true
  create_flow_log_cloudwatch_log_group = true
  flow_log_max_aggregation_interval    = 60

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
    "kubernetes.io/cluster/${local.cluster_name}" = "owned"
  }

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
    "kubernetes.io/cluster/${local.cluster_name}" = "owned"
  }

  tags = local.common_tags
}

# EKS Cluster with security hardening
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = local.cluster_name
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Private cluster endpoint only for production
  cluster_endpoint_public_access  = var.environment != "production"
  cluster_endpoint_private_access = true

  # Encryption at rest
  cluster_encryption_config = {
    provider_key_arn = aws_kms_key.eks.arn
    resources        = ["secrets"]
  }

  # Managed node groups on on-demand capacity (dedicated tenancy is not configured here)
  eks_managed_node_groups = {
    general = {
      desired_size = var.node_desired_size
      min_size     = var.node_min_size
      max_size     = var.node_max_size

      instance_types = var.instance_types
      capacity_type  = "ON_DEMAND"  # No spot for production PCI workloads

      ami_type = "AL2_x86_64"

      # Security hardening
      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = 100
            volume_type           = "gp3"
            encrypted             = true
            kms_key_id            = aws_kms_key.ebs.arn
            delete_on_termination = true
          }
        }
      }

      labels = {
        workload = "general"
        compliance = "pci-dss"
      }

      taints = []
    }
  }

  # IRSA for service accounts
  enable_irsa = true

  # Cluster security groups
  cluster_security_group_additional_rules = {
    ingress_nodes_ephemeral_ports_tcp = {
      description                = "Nodes on ephemeral ports"
      protocol                   = "tcp"
      from_port                  = 1025
      to_port                    = 65535
      type                       = "ingress"
      source_node_security_group = true
    }
  }

  tags = local.common_tags
}

# KMS key for EKS secrets
resource "aws_kms_key" "eks" {
  description             = "EKS Secret Encryption Key"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  multi_region            = var.environment == "production"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "Enable IAM User Permissions"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
        }
        Action   = "kms:*"
        Resource = "*"
      },
      {
        Sid    = "Allow EKS Service"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = [
          "kms:Encrypt",
          "kms:Decrypt",
          "kms:GenerateDataKey*",
          "kms:DescribeKey"
        ]
        Resource = "*"
      }
    ]
  })

  tags = local.common_tags
}

# KMS key for EBS volumes
resource "aws_kms_key" "ebs" {
  description             = "EBS Encryption Key"
  deletion_window_in_days = 30
  enable_key_rotation     = true

  tags = local.common_tags
}

# Security group for pods
resource "aws_security_group" "pods" {
  name        = "${local.cluster_name}-pods"
  description = "Security group for EKS pods"
  vpc_id      = module.vpc.vpc_id

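  # NOTE: unrestricted egress conflicts with the "no public internet for data plane"
  # decision in P3.1; scope this to the VPC CIDR and VPC endpoints in PCI environments
  egress {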
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(local.common_tags, {
    Name = "${local.cluster_name}-pods"
  })
}
```
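
The module takes `var.vpc_cidr`, `var.private_subnet_cidrs`, and `var.public_subnet_cidrs` as inputs but does not show how they are chosen. One possible way to derive them, sketched with the standard `ipaddress` module; the `10.0.0.0/16` range and the /20 and /24 sizes are assumptions for illustration:

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_count: int) -> dict:
    """Carve one private /20 and one public /24 per availability zone out of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=20))
    private = blocks[:az_count]
    # Public subnets only host load balancers, so small /24s from the last /20 suffice.
    public = list(blocks[-1].subnets(new_prefix=24))[:az_count]
    return {
        "private_subnet_cidrs": [str(net) for net in private],
        "public_subnet_cidrs": [str(net) for net in public],
    }


plan = plan_subnets("10.0.0.0/16", az_count=3)
print(plan)
```

Keeping the public ranges at the far end of the VPC leaves the blocks in between free for additional private subnets later.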

**Network Isolation**:
```hcl
# modules/networking/main.tf
# VPC endpoints for AWS services (gateway for S3, PrivateLink interfaces for the rest), so traffic stays off the public internet
resource "aws_vpc_endpoint" "s3" {
  vpc_id       = var.vpc_id
  service_name = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids

  tags = var.tags
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = var.tags
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = var.tags
}

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = var.tags
}

# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoints" {
  name        = "${var.name}-vpc-endpoints"
  description = "Security group for VPC endpoints"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "HTTPS from VPC"
  }

  tags = var.tags
}
```

## P3.3 Environment Configuration

Strict separation with GitOps-based promotion.

**Environment Promotion Flow**:
```yaml
# Promotion strategy
promotion:
  dev:
    auto_deploy: true
    requires_approval: false
    tests: [unit, integration]
  
  staging:
    auto_deploy: true
    requires_approval: false
    tests: [integration, e2e, security_scan]
    data: "anonymized_production"
  
  production:
    auto_deploy: false
    requires_approval: true
    approvers: ["sre-team", "security-team"]
    tests: [smoke, canary_analysis]
    data: "production"
    regions: ["us-east-1", "us-west-2"]
```
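
The promotion rules above can be read as a gate function. A minimal sketch, assuming that every listed approver must sign off (the source does not say whether one or all are required) and using illustrative names throughout:

```python
from dataclasses import dataclass, field


@dataclass
class EnvPolicy:
    """Mirrors one entry of the promotion map (field names are illustrative)."""
    required_tests: set
    requires_approval: bool = False
    approvers: set = field(default_factory=set)


POLICIES = {
    "dev": EnvPolicy({"unit", "integration"}),
    "staging": EnvPolicy({"integration", "e2e", "security_scan"}),
    "production": EnvPolicy(
        {"smoke", "canary_analysis"},
        requires_approval=True,
        approvers={"sre-team", "security-team"},
    ),
}


def can_promote(env: str, passed_tests: set, approved_by: set) -> bool:
    """Allow promotion only when all required tests passed and all approvers signed off."""
    policy = POLICIES[env]
    if not policy.required_tests <= passed_tests:
        return False
    if policy.requires_approval and not policy.approvers <= approved_by:
        return False
    return True
```

For example, `can_promote("production", {"smoke", "canary_analysis"}, {"sre-team"})` is rejected because the security team has not approved.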

**Kustomize Structure**:
```
k8s/
├── base/
│   ├── payment-service/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   ├── networkpolicy.yaml
│   │   └── kustomization.yaml
│   └── kustomization.yaml
├── overlays/
│   ├── dev/
│   │   ├── patch-resources.yaml
│   │   ├── configmap.yaml
│   │   └── kustomization.yaml
│   ├── staging/
│   │   ├── patch-resources.yaml
│   │   ├── externalsecret.yaml
│   │   └── kustomization.yaml
│   └── production/
│       ├── us-east-1/
│       │   ├── patch-replicas.yaml
│       │   ├── pdb.yaml
│       │   └── kustomization.yaml
│       ├── us-west-2/
│       │   ├── patch-dr.yaml
│       │   └── kustomization.yaml
│       └── kustomization.yaml
└── helm/
    └── payment-service/
        ├── Chart.yaml
        ├── values.yaml
        └── templates/
```

**Production Patch** (`k8s/overlays/production/us-east-1/patch-replicas.yaml`):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 0  # Zero-downtime
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - payment-service
              topologyKey: kubernetes.io/hostname
          - weight: 50
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - payment-service
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: payment-service
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false
          seccompProfile:
            type: RuntimeDefault
          capabilities:
            drop:
            - ALL
```
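
The rollout settings in this patch determine how many pods exist at peak during an update. A small sketch of that arithmetic (the function name is illustrative); Kubernetes rounds a percentage `maxSurge` up and a percentage `maxUnavailable` down:

```python
def rollout_bounds(replicas: int, max_surge_pct: int, max_unavailable_pct: int) -> tuple:
    """Return (peak pod count, minimum available pods) during a RollingUpdate."""
    surge = (replicas * max_surge_pct + 99) // 100        # integer ceiling
    unavailable = (replicas * max_unavailable_pct) // 100  # integer floor
    return replicas + surge, replicas - unavailable


peak, minimum = rollout_bounds(replicas=10, max_surge_pct=25, max_unavailable_pct=0)
print(f"peak pods: {peak}, minimum available: {minimum}")
```

With 10 replicas the rollout peaks at 13 pods while never dropping below 10, so the node groups need headroom for three extra pods' worth of requests (1.5 vCPU and 1.5 GiB) during every deployment.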

**Pod Disruption Budget**:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service
spec:
  minAvailable: 51%  # Ensure majority remains during node maintenance
  selector:
    matchLabels:
      app: payment-service
```
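
A percentage `minAvailable` is rounded up to a whole pod, which matters for small replica counts. A minimal sketch of the resulting eviction allowance (function name is illustrative):

```python
def allowed_disruptions(healthy_pods: int, expected_pods: int, min_available_pct: int) -> int:
    """Voluntary evictions the budget permits right now."""
    min_available = (expected_pods * min_available_pct + 99) // 100  # integer ceiling
    return max(0, healthy_pods - min_available)


print(allowed_disruptions(healthy_pods=10, expected_pods=10, min_available_pct=51))
```

With 10 healthy replicas, 6 must stay up and 4 may be evicted at once. With only 2 replicas, 51% rounds up to 2 and no voluntary eviction is ever allowed, which would block node drains in a small environment.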

## P3.4 Helm Chart Development

Enterprise-grade Helm chart with security defaults.

**Chart Structure**:
```yaml
# helm/payment-service/Chart.yaml
apiVersion: v2
name: payment-service
description: Payment Processing Service
type: application
version: 1.2.0
appVersion: "2.5.0"

dependencies:
  - name: postgresql
    version: 12.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
  
  - name: redis
    version: 17.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled
```

**Values with Security Defaults** (`helm/payment-service/values.yaml`):
```yaml
# Security hardening by default (container level; fsGroup is pod-level only, see podSecurityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop:
      - ALL

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Network policies
networkPolicy:
  enabled: true
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - podSelector:
            matchLabels:
              app: redis
      ports:
        - protocol: TCP
          port: 6379
    # DNS lookups; without this rule the service names above cannot be resolved
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

# Resource constraints
resources:
  limits:
    cpu: 1000m
    memory: 1Gi
  requests:
    cpu: 500m
    memory: 512Mi

# Service mesh integration
serviceMesh:
  enabled: true
  mtls:
    mode: STRICT
  sidecar:
    resources:
      limits:
        cpu: 500m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi

# Observability
metrics:
  enabled: true
  port: 8080
  path: /metrics
  serviceMonitor:
    enabled: true
    interval: 30s

tracing:
  enabled: true
  sampler:
    type: probabilistic
    param: 0.1  # 10% sampling in production
```
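
The `probabilistic` sampler with `param: 0.1` keeps roughly one trace in ten. A simplified sketch of how such a decision can be made deterministically from the trace ID, so that every service reaches the same verdict for a given trace; real tracers use random 64- or 128-bit IDs and their own threshold logic, so treat the bucket scheme here as an assumption:

```python
SAMPLING_RATE = 0.1  # matches tracing.sampler.param
BUCKETS = 10_000


def should_sample(trace_id: int, rate: float = SAMPLING_RATE) -> bool:
    """Decide from the trace ID alone, with no shared state between services."""
    return trace_id % BUCKETS < round(rate * BUCKETS)


sampled = sum(should_sample(trace_id) for trace_id in range(100_000))
print(f"sampled {sampled} of 100000 traces")
```

Because the decision depends only on the ID, a trace is either recorded end to end or not at all, which avoids partial traces across service hops.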

## P3.5 GitOps Setup with ArgoCD

Multi-cluster, multi-environment GitOps with automated sync and policy enforcement.

**ArgoCD Installation**:
```yaml
# argocd/install.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
---
# Argo CD installed from its Helm chart through a Flux HelmRelease
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: argocd
  namespace: argocd
spec:
  interval: 5m
  chart:
    spec:
      chart: argo-cd
      version: "5.x.x"
      sourceRef:
        kind: HelmRepository
        name: argo
        namespace: flux-system
  values:
    global:
      domain: argocd.company.com
    
    configs:
      cm:
        # Resource customization
        resource.customizations: |
          apps/Deployment:
            health.lua: |
              -- Custom health check logic
          
        # OIDC integration
        oidc.config: |
          name: AzureAD
          issuer: https://login.microsoftonline.com/TENANT_ID/v2.0
          clientID: CLIENT_ID
          clientSecret: $oidc.azure.clientSecret
          requestedScopes: ["openid", "profile", "email", "groups"]
          requestedIDTokenClaims: {"groups": {"essential": true}}
      
      rbac:
        policy.default: role:readonly
        policy.csv: |
          p, role:admin, applications, *, */*, allow
          p, role:admin, clusters, get, *, allow
          p, role:developer, applications, get, production/*, allow
          p, role:developer, applications, sync, staging/*, allow
          g, "platform-team", role:admin
          g, "developers", role:developer
    
    server:
      ingress:
        enabled: true
        annotations:
          cert-manager.io/cluster-issuer: letsencrypt-prod
          nginx.ingress.kubernetes.io/ssl-redirect: "true"
        tls: true
      
      # Prometheus metrics
      metrics:
        enabled: true
    
    repoServer:
      # Install the KSOPS plugin so Kustomize can decrypt SOPS-encrypted secrets
      volumes:
        - name: custom-tools
          emptyDir: {}
      volumeMounts:
        - name: custom-tools
          mountPath: /usr/local/bin/ksops
          subPath: ksops
      initContainers:
        - name: download-tools
          image: alpine:3.19
          command: [sh, -c]
          args:
            - |
              # Download the archive to a temp path, then unpack the binary into the shared volume
              wget -O /tmp/ksops.tar.gz https://github.com/viaduct-ai/kustomize-sops/releases/download/v2.5.8/ksops_2.5.8_Linux_x86_64.tar.gz &&
              tar -xzf /tmp/ksops.tar.gz -C /custom-tools/
          volumeMounts:
            - name: custom-tools
              mountPath: /custom-tools
```

**App of Apps Pattern**:
```yaml
# argocd/apps/root-application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/company/gitops.git
    targetRevision: HEAD
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
```
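
The `retry` block describes an exponential backoff. A short sketch of the resulting schedule, assuming the delay before retry `n` is `duration * factor**n` capped at `maxDuration` (the function name is illustrative):

```python
def retry_delays(limit: int, duration_s: int, factor: int, max_duration_s: int) -> list:
    """Delay in seconds before each retry attempt, capped at the maximum."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


delays = retry_delays(limit=5, duration_s=5, factor=2, max_duration_s=180)
print(delays, "total:", sum(delays), "seconds")
```

Under that assumption the five retries wait 5, 10, 20, 40, and 80 seconds, so a persistently failing sync gives up after roughly two and a half minutes of waiting. The 3-minute cap is never reached with a limit of 5.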

**Environment Applications**:
```yaml
# argocd/apps/production-payment-service.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: production-payment-service
  namespace: argocd
  labels:
    environment: production
    service: payment-service
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: production
  source:
    repoURL: https://github.com/company/gitops.git
    targetRevision: HEAD
    path: k8s/overlays/production/us-east-1
    kustomize:
      images:
        - payment-service=123456789012.dkr.ecr.us-east-1.amazonaws.com/payment-service:v2.5.0
  destination:
    server: https://production.us-east-1.company.com
    namespace: payment-service
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
      - RespectIgnoreDifferences=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
    # Sync runs automatically once the production PR merges; the PR review is the
    # approval gate. Remove the `automated` block above to require a manual sync instead.
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # HPA manages replicas
```
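
`ignoreDifferences` tells the controller to disregard specific fields when comparing desired and live state. The idea can be sketched as follows; this is a simplified model that handles object keys only (not array indices), and the function names are assumptions rather than Argo CD internals:

```python
import copy


def drop_pointer(doc: dict, pointer: str) -> dict:
    """Return a copy of doc without the field addressed by a JSON Pointer."""
    result = copy.deepcopy(doc)
    # RFC 6901 escapes: "~1" stands for "/" and "~0" for "~".
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    node = result
    for key in parts[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # path absent, nothing to drop
    node.pop(parts[-1], None)
    return result


def in_sync(desired: dict, live: dict, ignored_pointers: list) -> bool:
    """Compare manifests after removing every ignored field from both sides."""
    for pointer in ignored_pointers:
        desired = drop_pointer(desired, pointer)
        live = drop_pointer(live, pointer)
    return desired == live
```

With `/spec/replicas` ignored, a Deployment that the HPA scaled from 10 to 14 replicas still counts as in sync, so self-heal does not fight the autoscaler.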

## P3.6 CI Pipeline Design

Enterprise CI with compliance gates and artifact signing.

**Pipeline Architecture**:
```yaml
# .github/workflows/enterprise-ci.yml
name: Enterprise CI/CD

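on:
  push:
    branches: [main, release/*]
  pull_request:
    branches: [main]
  # Manual runs supply the change ticket that deploy-production verifies
  workflow_dispatch:
    inputs:
      change_ticket:
        description: "Approved change request ID for the production deployment"
        required: true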

env:
  AWS_REGION: us-east-1
  ECR_REGISTRY: ${{ secrets.AWS_ACCOUNT_ID }}.dkr.ecr.us-east-1.amazonaws.com

jobs:
  # Stage 1: Fast feedback (< 2 minutes)
  lint-and-unit-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      
      - name: Cache dependencies
        uses: actions/cache@v3
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
      
      - name: Run tests
        run: ./mvnw test
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3

  # Stage 2: Security and compliance (parallel)
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      
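      - name: Initialize CodeQL (SAST)
        uses: github/codeql-action/init@v3
        with:
          languages: java
          queries: security-extended,security-and-quality

      # Java is compiled, so CodeQL has to observe a build before it can analyze
      - name: Build for CodeQL
        uses: github/codeql-action/autobuild@v3

      - name: Run CodeQL analysis
        uses: github/codeql-action/analyze@v3

      - name: Dependency check
        uses: dependency-check/Dependency-Check_Action@main
        with:
          project: 'payment-service'
          path: '.'
          format: 'SARIF'  # Must match the SARIF file uploaded below
          out: 'reports'

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: reports/dependency-check-report.sarif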

  compliance-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # TruffleHog scans the commits between base and head
      
      - name: Check for secrets
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: main
          head: HEAD
      
      - name: License compliance
        uses: fossa-contrib/fossa-action@v2
        with:
          fossa-api-key: ${{ secrets.FOSSA_API_KEY }}
          github-token: ${{ github.token }}
      
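      - name: Install Syft
        id: syft
        uses: anchore/sbom-action/download-syft@v0

      - name: Validate SBOM
        run: |
          # Generate the SBOM in SPDX JSON format
          ${{ steps.syft.outputs.cmd }} . -o spdx-json > sbom.json
          # Fail on prohibited licenses (substring match, so LGPL and AGPL are flagged too)
          prohibited=$(jq -r '.packages[]? | select((.licenseConcluded // "") | contains("GPL")) | .name' sbom.json)
          if [ -n "$prohibited" ]; then
            echo "ERROR: GPL-licensed packages found:"
            echo "$prohibited"
            exit 1
          fi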

  # Stage 3: Integration testing
  integration-test:
    needs: [lint-and-unit-test]
    runs-on: ubuntu-latest
    services:
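      postgres:
        image: postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432  # Publish to the runner so the tests can reach localhost:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      kafka:
        # NOTE: cp-kafka will not start without broker settings (KRaft or ZooKeeper) passed via env
        image: confluentinc/cp-kafka:latest
        ports:
          - 9092:9092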
    steps:
      - uses: actions/checkout@v4
      
      - name: Run integration tests
        run: ./mvnw verify -P integration-tests
        env:
          SPRING_DATASOURCE_URL: jdbc:postgresql://localhost:5432/postgres
          SPRING_KAFKA_BOOTSTRAP_SERVERS: localhost:9092

  # Stage 4: Build and sign
  build-and-sign:
    needs: [security-scan, compliance-check, integration-test]
    if: github.event_name != 'pull_request'  # Never publish or deploy from PR builds
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
          aws-region: ${{ env.AWS_REGION }}
      
      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ env.ECR_REGISTRY }}/payment-service:${{ github.sha }}
            ${{ env.ECR_REGISTRY }}/payment-service:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
          sbom: true
          provenance: true
      
      - name: Install cosign
        uses: sigstore/cosign-installer@v3

      - name: Sign image
        run: |
          cosign sign \
            --yes \
            --key env://COSIGN_PRIVATE_KEY \
            ${{ env.ECR_REGISTRY }}/payment-service:${{ github.sha }}
        env:
          COSIGN_PRIVATE_KEY: ${{ secrets.COSIGN_PRIVATE_KEY }}
      
      - name: Verify signature
        run: |
          cosign verify \
            --key env://COSIGN_PUBLIC_KEY \
            ${{ env.ECR_REGISTRY }}/payment-service:${{ github.sha }}
        env:
          COSIGN_PUBLIC_KEY: ${{ secrets.COSIGN_PUBLIC_KEY }}

  # Stage 5: Deploy to staging
  deploy-staging:
    needs: [build-and-sign]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      
      - name: Update GitOps repo
        run: |
          git clone https://x-access-token:${{ secrets.GITOPS_TOKEN }}@github.com/company/gitops.git
          cd gitops
          
          # Update image tag
          cd k8s/overlays/staging
          kustomize edit set image payment-service=${{ env.ECR_REGISTRY }}/payment-service:${{ github.sha }}
          
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git add .
          git commit -m "Deploy payment-service ${{ github.sha }} to staging"
          git push

  # Stage 6: Production deployment (manual approval)
  deploy-production:
    needs: [deploy-staging]
    runs-on: ubuntu-latest
    environment: production  # Requires manual approval
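    steps:
      # The PR is raised against the GitOps repo, so check that out rather than the app repo
      - name: Check out GitOps repo
        uses: actions/checkout@v4
        with:
          repository: company/gitops
          token: ${{ secrets.GITOPS_TOKEN }}

      - name: Verify change ticket
        run: |
          # Check ServiceNow/ITSM for an approved change; -f fails the step on HTTP errors
          curl -fsS -H "Authorization: Bearer ${{ secrets.SNOW_TOKEN }}" \
            "https://company.service-now.com/api/now/table/change_request/${{ github.event.inputs.change_ticket }}" | \
            jq -e '.result.approval == "approved"'  # Field and value depend on the instance's change model

      - name: Update production image tag
        run: |
          # Give the PR step below an actual change to propose
          for region in us-east-1 us-west-2; do
            (cd "k8s/overlays/production/$region" && \
              kustomize edit set image payment-service=${{ env.ECR_REGISTRY }}/payment-service:${{ github.sha }})
          done
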
      - name: Create production PR
        uses: peter-evans/create-pull-request@v5
        with:
          token: ${{ secrets.GITOPS_TOKEN }}
          commit-message: "Deploy payment-service to production"
          title: "Production: Deploy payment-service ${{ github.sha }}"
          body: |
            ## Deployment Details
            - **Service**: payment-service
            - **Version**: ${{ github.sha }}
            - **Change Ticket**: ${{ github.event.inputs.change_ticket }}
            - **Staging Status**: ✅ Passed
            - **Security Scan**: ✅ Passed
            
            ## Checklist
            - [ ] Database migrations reviewed
            - [ ] Rollback plan documented
            - [ ] Monitoring dashboards checked
            
            /cc @sre-team @security-team
          branch: production/payment-service-${{ github.sha }}
          delete-branch: true
```
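
The `needs` declarations in the workflow form a dependency graph that determines which jobs can run in parallel. A minimal sketch that derives the execution order from the job names above (the function name is illustrative):

```python
NEEDS = {
    "lint-and-unit-test": [],
    "security-scan": [],
    "compliance-check": [],
    "integration-test": ["lint-and-unit-test"],
    "build-and-sign": ["security-scan", "compliance-check", "integration-test"],
    "deploy-staging": ["build-and-sign"],
    "deploy-production": ["deploy-staging"],
}


def execution_waves(needs: dict) -> list:
    """Group jobs into waves; each job's dependencies all sit in earlier waves."""
    remaining = {job: set(deps) for job, deps in needs.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(job for job, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle in needs graph")
        waves.append(ready)
        done.update(ready)
        for job in ready:
            del remaining[job]
    return waves


for number, wave in enumerate(execution_waves(NEEDS), start=1):
    print(number, wave)
```

The first wave runs the unit tests and both scans in parallel, so the slowest of those three sets the floor for time to first feedback. Everything from `build-and-sign` onward is strictly sequential.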

## P3.7 Automated Testing

Comprehensive testing strategy including chaos engineering.

**Test Pyramid**:
```yaml
testing:
  unit:
    coverage_target: 80%
    mutation_testing: true
  
  integration:
    contract_testing: true
    database_testing: true
    message_queue_testing: true
  
  e2e:
    browser_testing: true
    api_testing: true
  
  non_functional:
    performance: true
    security: true
    chaos: true
```

**Contract Testing with Pact**:
```java
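// Consumer-side contract test (Pact JVM, JUnit 5). PaymentClient, PaymentRequest, and
// PaymentResponse are the consumer's own classes; the test class name is illustrative.
import au.com.dius.pact.consumer.MockServer;
import au.com.dius.pact.consumer.dsl.PactDslJsonBody;
import au.com.dius.pact.consumer.dsl.PactDslWithProvider;
import au.com.dius.pact.consumer.junit5.PactConsumerTestExt;
import au.com.dius.pact.consumer.junit5.PactTestFor;
import au.com.dius.pact.core.model.RequestResponsePact;
import au.com.dius.pact.core.model.annotations.Pact;
import java.math.BigDecimal;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;

import static org.junit.jupiter.api.Assertions.assertEquals;

@ExtendWith(PactConsumerTestExt.class)
@PactTestFor(providerName = "payment-service")
class PaymentContractTest {

    // Defines what the consumer expects from the provider
    @Pact(consumer = "order-service", provider = "payment-service")
    public RequestResponsePact createPaymentPact(PactDslWithProvider builder) {
        return builder
            .given("payment can be processed")
            .uponReceiving("a request to process payment")
            .path("/api/v1/payments")
            .method("POST")
            .body(new PactDslJsonBody()
                .stringType("orderId")
                .decimalType("amount")
                .stringType("currency"))
            .willRespondWith()
            .status(201)
            .body(new PactDslJsonBody()
                .stringType("paymentId")
                .stringValue("status", "PROCESSING"))
            .toPact();
    }

    // Runs the real client against a mock server that enforces the contract
    @Test
    @PactTestFor(pactMethod = "createPaymentPact")
    void testProcessPayment(MockServer mockServer) {
        PaymentClient client = new PaymentClient(mockServer.getUrl());
        PaymentResponse response = client.processPayment(
            new PaymentRequest("order-123", new BigDecimal("100.00"), "USD"));

        assertEquals("PROCESSING", response.getStatus());
    }
}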
```

**Chaos Engineering**:
```yaml
# chaos-experiments/pod-failure.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: payment-service-pod-failure
  namespace: chaos-testing
spec:
  action: pod-failure
  mode: one
  duration: 5m
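  selector:
    namespaces:
      - payment-service  # Namespace the production Application deploys into
    labelSelectors:
      app: payment-service
  # `scheduler` is Chaos Mesh 1.x syntax; 2.x moves recurrence into a separate Schedule resource
  scheduler:
    cron: "@every 24h"  # Daily chaos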
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: payment-network-delay
spec:
  action: delay
  mode: all
  selector:
    namespaces:
      - payment-service  # Namespace the production Application deploys into
    labelSelectors:
      app: payment-service
  delay:
    latency: 100ms
    correlation: "50"
    jitter: 50ms
  duration: 10m
  scheduler:
    cron: "0 2 * * 1"  # Weekly on Monday 2am
```

## P3.8 Security Integration

Defense in depth with zero-trust networking.

**OPA Policy**:
```rego
# policies/allow-payment.rego
package kubernetes.admission

import rego.v1

# Deny privileged containers
deny contains msg if {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    container.securityContext.privileged == true
    msg := sprintf("Container %s must not be privileged", [container.name])
}

# Require non-root
deny contains msg if {
    input.request.kind.kind == "Pod"
    not input.request.object.spec.securityContext.runAsNonRoot
    msg := "Pod must run as non-root"
}

# Require resource limits
deny contains msg if {
    input.request.kind.kind == "Deployment"
    container := input.request.object.spec.template.spec.containers[_]
    not container.resources.limits.memory
    msg := sprintf("Container %s must have memory limits", [container.name])
}

# PCI-DSS: No host networking
deny contains msg if {
    input.request.kind.kind == "Pod"
    input.request.object.spec.hostNetwork == true
    msg := "Host networking is not allowed (PCI-DSS requirement)"
}
```
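
To see how the Pod rules accumulate denial messages, the same checks can be mirrored in plain code. This is an illustrative re-expression for reasoning about the policy, not a replacement for OPA; the function takes the `request` object that the Rego rules read from `input.request`:

```python
def deny_reasons(request: dict) -> list:
    """Collect denial messages for a Pod admission request."""
    if request.get("kind", {}).get("kind") != "Pod":
        return []
    spec = request.get("object", {}).get("spec", {})
    reasons = []
    for container in spec.get("containers", []):
        if container.get("securityContext", {}).get("privileged") is True:
            reasons.append(f"Container {container['name']} must not be privileged")
    # Only the pod-level securityContext is inspected, exactly as in the Rego rule.
    if not spec.get("securityContext", {}).get("runAsNonRoot"):
        reasons.append("Pod must run as non-root")
    if spec.get("hostNetwork") is True:
        reasons.append("Host networking is not allowed (PCI-DSS requirement)")
    return reasons
```

One consequence worth noting: a Pod that sets `runAsNonRoot` only on its containers, and not in the pod-level `securityContext`, is still rejected by the non-root rule as written.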

**Falco Rules**:
```yaml
# falco/pci-dss-rules.yaml
- rule: Unauthorized Database Access
  desc: Detect access to payment database from non-payment pods
  condition: >
    spawned_process and
    container.name == "postgres" and
    user.name != "payment-service" and
    proc.name in (psql, pg_dump)
  output: >
    Unauthorized database access detected
    user=%user.name command=%proc.cmdline
    pod=%k8s.pod.name namespace=%k8s.ns.name
  priority: CRITICAL

- rule: Sensitive File Access
  desc: Access to files containing cardholder data
  condition: >
    open_read and
    fd.name contains "/var/lib/payment/cards" and
    not proc.name in (payment-service, backup-agent)
  output: >
    Sensitive data access
    file=%fd.name user=%user.name process=%proc.name
  priority: EMERGENCY
```

## P3.9 Disaster Recovery

Automated backup and cross-region failover.

**Backup Strategy**:
```yaml
# velero/backup-schedule.yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: payment-service-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"  # Daily at 2am
  template:
    includedNamespaces:
      - payment-service
    excludedResources:
      - events
      - pods  # Recreated by deployment
    labelSelector:
      matchLabels:
        backup.velero.io/include: "true"
    storageLocation: aws-primary
    volumeSnapshotLocations:
      - aws-primary
    ttl: 720h0m0s  # 30 days
---
# AWS Backup for persistent volumes
apiVersion: backup.aws.upbound.io/v1beta1
kind: Plan
metadata:
  name: payment-service-volumes
spec:
  forProvider:
    region: us-east-1
    rule:
      - ruleName: daily-backup
        targetVaultName: payment-backup-vault
        schedule: cron(0 2 * * ? *)
        lifecycle:
          deleteAfter: 30
        copyActions:
          - destinationVaultArn: arn:aws:backup:us-west-2:ACCOUNT:backup-vault:dr-vault
```

**Automated Failover**:
```yaml
# failover-operator/failover.yaml
apiVersion: dr.company.com/v1
kind: Failover
metadata:
  name: payment-service-failover
spec:
  primary:
    cluster: production-us-east-1
    namespace: payment-service
  secondary:
    cluster: production-us-west-2
    namespace: payment-service
  
  healthChecks:
    - type: http
      endpoint: https://api.company.com/health
      interval: 10s
      timeout: 5s
      failureThreshold: 3
    
    - type: metric
      query: 'sum(rate(payment_errors_total[5m]))'
      threshold: 0.01
  
  failoverPolicy:
    automatic: true
    delay: 5m  # Wait 5 minutes before failover
    
  dataSync:
    database:
      replicationLagThreshold: 5s
    
  notifications:
    slack: "#incidents"
    pagerduty: "payment-oncall"
```
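
The health-check and delay settings bound how quickly a failover can start. A rough sketch of that timing and of the decision itself; the `Failover` resource is this project's own custom resource, so the semantics assumed here (consecutive failures trigger failover, and replication lag above the threshold blocks it) are an interpretation, and the function names are illustrative:

```python
def seconds_until_failover(interval_s: int, failure_threshold: int, delay_s: int) -> int:
    """Approximate time from outage start until failover begins, ignoring probe timeouts."""
    detection = interval_s * failure_threshold
    return detection + delay_s


def should_fail_over(consecutive_failures: int, failure_threshold: int,
                     replication_lag_s: float, lag_threshold_s: float) -> bool:
    """Fail over only when the primary looks down and the replica is fresh enough."""
    return consecutive_failures >= failure_threshold and replication_lag_s <= lag_threshold_s


elapsed = seconds_until_failover(interval_s=10, failure_threshold=3, delay_s=300)
print(f"failover starts after about {elapsed} seconds")
```

With the values above, failover begins after about 330 seconds, which leaves roughly nine and a half minutes of the 15-minute RTO for traffic to shift and the secondary to reach full capacity.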

## P3.10 Project Summary

This enterprise project demonstrated:

**Compliance Architecture**:
- **PCI-DSS Level 1**: Network isolation, encryption, access controls, audit logging
- **SOC 2 Type II**: Change management, monitoring, incident response
- **GDPR**: Data residency, right to erasure, encryption

**Operational Excellence**:
- **Multi-region**: Active-passive with 15-minute RTO
- **Zero-trust**: mTLS, network policies, OPA admission control
- **Observability**: Distributed tracing, metrics, structured logging
- **Disaster Recovery**: Automated backups, cross-region replication, chaos testing

**Security Posture**:
- **Defense in depth**: WAF, network policies, runtime security (Falco)
- **Supply chain security**: Signed images, SBOM, vulnerability scanning
- **Secrets management**: External Secrets Operator, automatic rotation
- **Policy enforcement**: OPA/Gatekeeper for compliance-as-code

**Key Metrics**:
- Availability: 99.99% (4 nines)
- RTO: 15 minutes (automated failover)
- RPO: 5 minutes (asynchronous cross-region replication, with lag checked against a 5s threshold)
- Deployment frequency: 10/day (with approval gates)
- Change failure rate: <5% (comprehensive testing)
- Security vulnerabilities: 0 critical in production

**Next Project**: Project 4 will address database-intensive applications with stateful workloads, implementing database migration strategies, backup/restore procedures, and data consistency patterns in CI/CD.

