# Chapter 30: Deployment Strategies

While Chapter 29 established the conceptual foundation of Continuous Deployment, this chapter examines the tactical implementation of releasing software to production without service interruption. The deployment strategy selected determines the blast radius of potential failures, the speed of rollback, infrastructure costs, and the complexity of monitoring required.

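Modern container orchestration platforms, particularly Kubernetes, provide the primitives these strategies are built on. Recreate and Rolling Update are built into the Deployment object; Blue/Green, Canary, A/B Testing, and Shadow deployments are assembled from Service selectors, ingress controllers, service meshes, or rollout controllers such as Argo Rollouts and Flagger. Understanding how to implement each strategy enables teams to match technical capability with risk tolerance and business requirements.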

## 30.1 Recreate Strategy

The Recreate strategy—also known as the "big bang" or "all-at-once" deployment—terminates the existing application version before deploying the new version. While conceptually simple, this approach introduces service downtime and significant risk.

### Mechanics

1. Scale current version (V1) to zero replicas
2. Wait for termination completion
3. Scale new version (V2) to desired replica count
4. Verify health

```yaml
# Kubernetes Recreate strategy (the default is RollingUpdate, so Recreate must be set explicitly)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  strategy:
    type: Recreate  # Explicit specification
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: v2.0.0
    spec:
      containers:
      - name: app
        image: myapp:v2.0.0
        ports:
        - containerPort: 8080
```

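**Downtime Calculation:**
- Pod termination: up to 30 seconds (default `terminationGracePeriodSeconds`; shorter if the process exits promptly on SIGTERM)
- New pod startup and readiness: 10-60 seconds (application-dependent), plus image pull time if the image is not cached on the node
- **Total downtime:** roughly 40-90 seconds when the full grace period is used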

### When to Use

**Appropriate Scenarios:**
- Development environments where downtime is acceptable
- Batch processing applications without real-time user traffic
- Maintenance windows with explicit downtime communication
- Applications requiring exclusive database locks during migration
- Legacy monoliths that cannot run multiple versions simultaneously

**Inappropriate for:**
- Customer-facing production services
- Microservices with high availability SLAs
- Real-time transaction processing systems

### Risk Profile

**Advantages:**
- Simplest implementation—no routing logic required
- Guaranteed consistency (only one version running at a time)
- No database compatibility concerns between versions
- Lowest infrastructure cost (no duplicate capacity)

**Disadvantages:**
- Complete service downtime during deployment
- No gradual rollout or validation capability
- All users impacted simultaneously if V2 fails
- Slow rollback (must terminate V2, restart V1)

### Implementation Script

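```bash
#!/usr/bin/env bash
# recreate-deploy.sh - Manual recreate for non-Kubernetes (Docker Compose) environments
# Assumes docker-compose.yml already references the new (V2) image tag.
set -euo pipefail

# Pull the new image first so the download does not add to the downtime window
docker-compose pull app

# Stop and remove the current version (downtime starts here)
docker-compose stop app
docker-compose rm -f app

# Database migrations (run while no app instance holds connections)
docker-compose run --rm app migrate

# Start new version
docker-compose up -d app

# Health verification: poll for up to 60 seconds instead of a single fixed sleep
for attempt in $(seq 1 12); do
  if curl -fsS http://localhost/health > /dev/null; then
    echo "Deployment succeeded"
    exit 0
  fi
  sleep 5
done

echo "Deployment failed, rolling back..."
docker-compose stop app
docker-compose rm -f app
# Restore the previous docker-compose.yml (previous image tag) and restart.
# If the migration above was not backward-compatible, V1 may also fail.
git checkout HEAD~1 -- docker-compose.yml
docker-compose up -d app
exit 1
```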

## 30.2 Rolling Update Strategy

Rolling Updates gradually replace old application instances with new ones, maintaining service availability throughout the deployment. Kubernetes Deployments implement this natively as the default strategy.

### Mechanics

Kubernetes replaces pods incrementally based on configurable parameters:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # Up to 3 extra pods (25% of 10 = 2.5, rounded up)
      maxUnavailable: 25%  # Up to 2 pods unavailable (2.5, rounded down)
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: v2.0.0
    spec:
      containers:
      - name: app
        image: myapp:v2.0.0
        readinessProbe:  # Critical for rolling updates
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
```

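**Update Process:**
1. Create a new ReplicaSet with the V2 pod template
2. Scale up the new ReplicaSet, keeping the total pod count at or below `replicas + maxSurge`
3. Wait for new pods to report Ready (readiness probe passes)
4. Scale down the old ReplicaSet, keeping available pods at or above `replicas - maxUnavailable`
5. Repeat until all pods run V2
6. Retain the old ReplicaSet (scaled to zero) as revision history for rollback, up to `revisionHistoryLimit` (default 10)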

### Configuration Parameters

**maxSurge:**
- Absolute number or percentage of replicas (percentages round up)
- Controls how many extra pods can exist during the update
- Higher = faster deployment, more resource consumption
- Example: 10 replicas with maxSurge 30% = up to 13 pods temporarily

**maxUnavailable:**
- Absolute number or percentage of replicas (percentages round down)
- Controls minimum availability during the update
- Higher = faster deployment, lower availability
- Example: 10 replicas with maxUnavailable 25% = at most 2 pods unavailable, so at least 8 remain available
- maxSurge and maxUnavailable cannot both be 0

**Common Patterns:**
- Conservative: `maxSurge: 1, maxUnavailable: 0` (zero downtime, slow)
- Balanced: `maxSurge: 25%, maxUnavailable: 25%` (default)
- Aggressive: `maxSurge: 100%, maxUnavailable: 50%` (fast, risky)
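The rounding rules and step limits above can be traced with a small simulation. This is a simplified model, not the Deployment controller's code: it assumes every new pod becomes Ready before the next scale-down decision, whereas the real controller reacts as individual pods become Ready. The names `resolve`, `fenceposts`, and `simulate` are illustrative.

```python
"""Sketch: how maxSurge/maxUnavailable bound each step of a rolling update.

Simplifying assumption: every new pod becomes Ready before the next scale-down
decision, so each loop iteration models one "round" of the rollout.
"""
from typing import List, Tuple, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, replicas: int, round_up: bool) -> int:
    """Turn an absolute count or a percentage string like '25%' into a pod count."""
    if isinstance(value, int):
        return value
    numerator = int(value.rstrip("%")) * replicas
    # Integer ceiling/floor division avoids floating-point rounding surprises
    return -(-numerator // 100) if round_up else numerator // 100


def fenceposts(replicas: int, max_surge: IntOrPercent,
               max_unavailable: IntOrPercent) -> Tuple[int, int]:
    """Resolve (surge, unavailable): surge rounds up, unavailable rounds down."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Rounding can zero out both values; allow one unavailable pod so the
        # rollout can still make progress
        unavailable = 1
    return surge, unavailable


def simulate(replicas: int, max_surge: IntOrPercent = "25%",
             max_unavailable: IntOrPercent = "25%") -> List[Tuple[int, int]]:
    """Return (old_pods, new_pods) after each round until only V2 pods remain."""
    surge, unavailable = fenceposts(replicas, max_surge, max_unavailable)
    old, new = replicas, 0
    rounds = []
    while old > 0 or new < replicas:
        # Scale up V2: total pods may not exceed replicas + surge
        new = min(replicas, replicas + surge - old)
        # Scale down V1: available pods may not drop below replicas - unavailable
        removable = (old + new) - (replicas - unavailable)
        old = max(0, old - removable)
        rounds.append((old, new))
    return rounds


if __name__ == "__main__":
    for step, (old, new) in enumerate(simulate(10), start=1):
        print(f"round {step}: {old} x V1, {new} x V2")
```

With the defaults, `simulate(10)` passes through 5 V1 + 3 V2, then 8 V2 pods only, then all 10 V2 pods; the conservative `simulate(10, 1, 0)` instead replaces exactly one pod per round.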

### Monitoring Rolling Updates

```bash
# Watch rollout progress
kubectl rollout status deployment/myapp --timeout=300s

# Check replica distribution
kubectl get pods -l app=myapp --show-labels

# View rollout history
kubectl rollout history deployment/myapp

# Pause rollout if issues detected
kubectl rollout pause deployment/myapp

# Resume after fix
kubectl rollout resume deployment/myapp

# Rollback to previous version
kubectl rollout undo deployment/myapp
# Or to specific revision
kubectl rollout undo deployment/myapp --to-revision=3
```

### Database Considerations

Rolling updates require backward compatibility during the transition period when both V1 and V2 coexist:

**Forward-Compatible Schema Changes:**
```sql
-- Safe for rolling updates: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- V1 ignores column, V2 uses it
-- Safe because nullable doesn't break V1
```

**Dangerous Changes (Require coordination):**
```sql
-- DANGEROUS: Drop column (breaks V1)
ALTER TABLE users DROP COLUMN email;

-- DANGEROUS: Rename column (V1 breaks)
ALTER TABLE users RENAME COLUMN full_name TO name;

-- DANGEROUS: Non-backward compatible type change
ALTER TABLE orders ALTER COLUMN total TYPE INTEGER;
```
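One way to check whether a change is safe is to replay the old version's queries against the migrated schema. The sketch below does this with an in-memory SQLite database standing in for the production database (an assumption: the DDL above uses PostgreSQL-style syntax, and SQLite needs version 3.25 or newer for `RENAME COLUMN`). `V1_QUERY` and `v1_survives` are hypothetical names.

```python
"""Sketch: does the V1 code's query survive a given schema change?

An in-memory SQLite database stands in for the production database.
"""
import sqlite3

# Query issued by the old (V1) application code, which still runs mid-rollout
V1_QUERY = "SELECT id, full_name, email FROM users"


def v1_survives(migration: str) -> bool:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)"
        )
        conn.execute(
            "INSERT INTO users (full_name, email) VALUES ('Ada', 'ada@example.com')"
        )
        conn.execute(migration)
        conn.execute(V1_QUERY).fetchall()
        return True
    except sqlite3.OperationalError:
        # e.g. "no such column: full_name" after a rename
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    # Expand step: additive change, safe while V1 is still running
    print(v1_survives("ALTER TABLE users ADD COLUMN name TEXT"))
    # Breaking step: V1 still selects full_name
    print(v1_survives("ALTER TABLE users RENAME COLUMN full_name TO name"))
```

The additive `name` column passes while the rename fails, which is why a rename is usually split into expand/contract steps: add `name`, have V2 write both columns and backfill, switch reads to `name`, and drop `full_name` only after no V1 pods remain.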

## 30.3 Blue/Green Deployment

Blue/Green deployment maintains two identical production environments (Blue = current, Green = new). Traffic switches instantaneously from Blue to Green after validation, enabling zero-downtime releases and immediate rollback.

### Architecture

```yaml
# Green Deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
  labels:
    app: myapp
    version: green
    release: v2.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:v2.0.0
---
# Blue Deployment (current version - already running)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
  labels:
    app: myapp
    version: blue
    release: v1.9.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:v1.9.0
---
# Service initially points to Blue
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Current production
  ports:
  - port: 80
    targetPort: 8080
```

### Traffic Switch Mechanisms

**Kubernetes Service Selector Switch:**
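```bash
# Verify Green is fully rolled out and Ready before switching
kubectl rollout status deployment/myapp-green --timeout=120s
kubectl get pods -l app=myapp,version=green

# Switch traffic to Green (near-instant, as Service endpoints update)
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# If issues, instant rollback to Blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
```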

**Ingress/Load Balancer Approach:**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    nginx.ingress.kubernetes.io/canary: "false"
spec:
  rules:
  - host: app.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-green  # Switch between blue/green services
            port:
              number: 80
```

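**DNS Switching (Cloud Load Balancers):**
Weighted records split traffic in proportion to their weights, so the Blue record must drop to 0 for a full cutover. Resolvers and clients cache answers for up to the TTL (some clients longer), so a DNS switch is gradual rather than instant.
```bash
# AWS Route53 weighted routing: Green 100, Blue 0
# (blue-lb.amazonaws.com is a placeholder for Blue's load balancer hostname)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123456789 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.company.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": "Green",
        "Weight": 100,
        "ResourceRecords": [{"Value": "green-lb.amazonaws.com"}]
      }
    }, {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.company.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": "Blue",
        "Weight": 0,
        "ResourceRecords": [{"Value": "blue-lb.amazonaws.com"}]
      }
    }]
  }'
```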
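Route 53 divides answers among records with the same name and type in proportion to their weights, so a full cutover needs the Blue record at weight 0, and a staged shift is a sequence of weight pairs. The sketch below builds the change batch and the expected share per record. The Blue hostname and the helper names are placeholders; the actual API call (boto3's `change_resource_record_sets`) is left as a comment so the code runs without AWS credentials.

```python
"""Sketch: generate Route 53 weighted-record changes for a staged Blue->Green shift."""
from typing import Dict

# Placeholder hostnames; substitute the real load balancer DNS names
TARGETS = {"Blue": "blue-lb.amazonaws.com", "Green": "green-lb.amazonaws.com"}


def change_batch(weights: Dict[str, int], name: str = "app.company.com",
                 ttl: int = 60) -> dict:
    """Build one UPSERT per weighted record so all weights change in one batch."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "SetIdentifier": identifier,
                    "Weight": weight,
                    "ResourceRecords": [{"Value": TARGETS[identifier]}],
                },
            }
            for identifier, weight in weights.items()
        ]
    }


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Each record's expected share of DNS answers: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        # Route 53 answers evenly when every record in the group has weight 0
        return {k: 1 / len(weights) for k in weights}
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    for stage in ({"Blue": 90, "Green": 10}, {"Blue": 50, "Green": 50},
                  {"Blue": 0, "Green": 100}):
        print(traffic_share(stage))
        # With boto3 installed and credentials configured, apply each stage via:
        # boto3.client("route53").change_resource_record_sets(
        #     HostedZoneId="Z123456789", ChangeBatch=change_batch(stage))
```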

### Automated Blue/Green with Argo Rollouts

The Argo Rollouts controller automates the Blue/Green logic:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 5
  strategy:
    blueGreen:
      activeService: myapp-active      # Production service
      previewService: myapp-preview    # Staging/validation service
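      autoPromotionEnabled: false      # Manual gate: wait for `promote`
      # autoPromotionSeconds: 300      # Alternative: auto-promote after 5 min (ignored while autoPromotionEnabled is false)
      maxUnavailable: 0
      scaleDownDelaySeconds: 30        # Keep blue alive for 30s after switch
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp                     # Must match spec.selector
    spec: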
      containers:
      - name: app
        image: myapp:v2.0.0
```

**Promotion Workflow:**
```bash
# Deploy (creates green/preview)
kubectl apply -f rollout.yaml

# Test preview service
curl http://myapp-preview/health

# Promote to active (switches service selectors)
kubectl argo rollouts promote myapp

# Watch switch
kubectl argo rollouts get rollout myapp --watch
```

### Cost and Resource Considerations

Blue/Green requires double the production capacity during deployment:

**Resource Calculation:**
- Normal: 10 pods × $10/pod/day = $100/day
- During Blue/Green: 20 pods = $200/day
- 1-hour deployment window: $100/day × 1/24 day ≈ $4.17 extra per deployment
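The same arithmetic as a small function, assuming a flat rate per pod per day (real cloud pricing is usually per node or per vCPU-hour, so treat the result as an estimate). The 60 deployments per month in the usage line is a hypothetical figure.

```python
"""Sketch of the Blue/Green cost arithmetic (assumes a flat per-pod daily rate)."""


def extra_cost_per_deployment(pods: int, cost_per_pod_per_day: float,
                              window_hours: float) -> float:
    """Cost of running a second full set of pods for the deployment window."""
    return pods * cost_per_pod_per_day * window_hours / 24


def monthly_overhead(pods: int, cost_per_pod_per_day: float,
                     window_hours: float, deployments_per_month: int) -> float:
    """Extra cost per deployment multiplied by the deployment frequency."""
    per_deploy = extra_cost_per_deployment(pods, cost_per_pod_per_day, window_hours)
    return per_deploy * deployments_per_month


if __name__ == "__main__":
    print(round(extra_cost_per_deployment(10, 10.0, 1.0), 2))  # 1-hour window
    print(round(monthly_overhead(10, 10.0, 1.0, 60), 2))       # 60 deploys/month
```

Shortening the window (for example with a smaller `scaleDownDelaySeconds` after promotion) reduces the overhead linearly.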

**Mitigation Strategies:**
- Use Horizontal Pod Autoscaler to scale down Blue after switch
- Run Green on spot/preemptible instances during validation
- Use KEDA to scale Blue to zero after promotion delay

## 30.4 Canary Deployment

Canary deployment routes a small percentage of production traffic to the new version, monitoring metrics before progressively increasing traffic. This minimizes blast radius while validating real-world behavior.

### Progressive Traffic Shifting

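**Manual Canary (Kubernetes Service + replica ratio):**
```yaml
# Stable (V1) - ~90% of traffic (9 of 10 pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-stable
spec:
  replicas: 9
  selector:
    matchLabels:
      app: myapp
      track: stable
  template:
    metadata:
      labels:
        app: myapp
        track: stable
    spec:
      containers:
      - name: app
        image: myapp:v1.9.0
---
# Canary (V2) - ~10% of traffic (1 of 10 pods)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
      - name: app
        image: myapp:v2.0.0
---
# Service selects both; the split follows the Ready pod ratio, not an exact weight
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp  # Matches both stable and canary pods
  ports:
  - port: 80
    targetPort: 8080
```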
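Because the Service spreads connections across Ready pods, the canary's share is set by the pod ratio and can only move in steps of `1/total_pods`. The sketch below picks the closest achievable split. It assumes roughly even distribution per connection (as with kube-proxy's default modes), which long-lived keep-alive connections can skew; `canary_split` is a hypothetical helper.

```python
"""Sketch: replica-ratio canaries can only hit multiples of 1/total_pods."""
import math
from typing import Tuple


def canary_split(total_pods: int, target_percent: float) -> Tuple[int, int, float]:
    """Return (canary_pods, stable_pods, achieved_percent) closest to the target.

    Assumes connections are spread roughly evenly across Ready endpoints,
    so the traffic share tracks the pod share.
    """
    # Round half up, but always run at least one canary pod
    canary = max(1, math.floor(total_pods * target_percent / 100 + 0.5))
    canary = min(canary, total_pods)
    return canary, total_pods - canary, 100 * canary / total_pods


if __name__ == "__main__":
    for pods, target in ((10, 10), (10, 5), (20, 5), (100, 1)):
        print(pods, target, canary_split(pods, target))
```

With 10 pods the smallest canary is 10%, so a 5% target needs at least 20 pods and a 1% target needs 100. That granularity limit is the main reason to split traffic at an ingress controller or service mesh instead.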

**Automated Canary with Flagger:**
Flagger automates metric analysis and promotion:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 80
    gateways:
    - istio-gateway
    hosts:
    - app.company.com
  analysis:
    interval: 1m
    threshold: 5           # Max failed checks before rollback
    maxWeight: 50          # Max 50% traffic to canary
    stepWeight: 10         # Increase by 10% each interval
    metrics:
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
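    - name: error-rate
      # Inline `query` is deprecated in favor of templateRef + MetricTemplate.
      # The query returns a percentage, so max: 1 rolls back above 1% errors.
      # NOTE: scope the selectors to the canary workload in a real setup.
      thresholdRange:
        max: 1
      interval: 1m
      query: |
        100 * sum(rate(http_requests_total{status=~"5.."}[1m]))
        /
        sum(rate(http_requests_total[1m]))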
  webhooks:
    - name: load-test
      url: http://flagger-loadtester.test/
      timeout: 5s
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://myapp-canary/"
    - name: conformance-tests
      type: pre-rollout
      url: http://flagger-loadtester.test/
      timeout: 30s
      metadata:
        cmd: "test -v ./tests/..."
```

**Canary Progression:**
1. 0% → 10% (wait 1 minute, check metrics)
2. 10% → 20% (wait 1 minute, check metrics)
3. 20% → 30% (continue until maxWeight or failure)
4. On failure: after `threshold` (5) failed checks, automatically roll back to 0% canary
5. On success: Promote canary to primary (100%)
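The progression can be modeled as a loop. This is a hypothetical sketch in the spirit of the `stepWeight`, `maxWeight`, and `threshold` settings above, not Flagger's implementation: `check(weight)` stands in for one analysis interval and returns True when every metric is in range, and failed checks are counted cumulatively (an assumption).

```python
"""Sketch of a metric-gated progressive canary loop (hypothetical, not Flagger's code)."""
from typing import Callable, List, Tuple


def run_canary(check: Callable[[int], bool], step_weight: int = 10,
               max_weight: int = 50, threshold: int = 5) -> Tuple[List[int], str]:
    """Advance the canary weight while checks pass; roll back at `threshold` failures."""
    passed: List[int] = []  # weights whose analysis interval succeeded
    weight, failed_checks = step_weight, 0
    while True:
        if check(weight):
            passed.append(weight)
            if weight >= max_weight:
                return passed, "promoted"  # canary becomes the new primary (100%)
            weight = min(weight + step_weight, max_weight)
        else:
            # Hold the current weight; roll back once failures reach the threshold
            failed_checks += 1
            if failed_checks >= threshold:
                return passed, "rolled back"  # canary weight returns to 0%


if __name__ == "__main__":
    print(run_canary(lambda w: True))    # healthy release
    print(run_canary(lambda w: w < 30))  # degrades once it takes 30% of traffic
```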

### Header-Based Routing (Selective Canary)

Route specific users to canary based on headers or cookies:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - app.company.com
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: myapp-canary
      weight: 100
  - route:
    - destination:
        host: myapp-stable
      weight: 95
    - destination:
        host: myapp-canary
      weight: 5
```

**Usage:**
```bash
# Internal tester gets canary
curl -H "x-canary: true" https://app.company.com

# Regular user gets stable (95%) or canary (5%)
curl https://app.company.com
```
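The routing decision for a single request can be sketched in a few lines. This only models the VirtualService above (the first matching rule wins; the final rule splits 95/5 by weight) using Python's `random` module; `route` is a hypothetical helper, not part of Istio.

```python
"""Sketch: how the header-based VirtualService picks a destination for one request."""
import random
from typing import Mapping


def route(headers: Mapping[str, str], rng: random.Random, canary_weight: int = 5) -> str:
    # Rule 1: exact header match sends the request to the canary
    if headers.get("x-canary") == "true":
        return "myapp-canary"
    # Rule 2 (no match conditions): weighted split, canary_weight out of 100
    return "myapp-canary" if rng.randrange(100) < canary_weight else "myapp-stable"


if __name__ == "__main__":
    rng = random.Random(7)
    hits = sum(route({}, rng) == "myapp-canary" for _ in range(10_000))
    print(f"untagged requests sent to canary: {hits / 100:.1f}%")
```

Because the weighted rule is evaluated per request, the same regular user can alternate between stable and canary; keeping a user on one version requires user-based assignment rather than random weights.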

## 30.5 A/B Testing Deployments

A/B testing extends canary deployments with user segmentation—routing based on user properties rather than random percentages—to test business metrics (conversion rates, engagement) alongside technical metrics.

### Implementation Strategies

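**Cookie-Based Segmentation (Ingress Nginx):**
Requests with the cookie `experiment_abc=always` are routed to Variant B, and `experiment_abc=never` to Variant A. Requests without the cookie, or with any other value, fall back to `canary-weight` (0 here), so they also reach Variant A.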
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-cookie: "experiment_abc"
    nginx.ingress.kubernetes.io/canary-weight: "0"
spec:
  rules:
  - host: app.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-b  # Variant B
            port:
              number: 80
---
# Main ingress (Variant A)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-stable
spec:
  rules:
  - host: app.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-a
            port:
              number: 80
```

**Istio Weighted Routing with User Attributes:**
```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - app.company.com
  http:
  - match:
    - headers:
        x-user-type:
          exact: "premium"
    route:
    - destination:
        host: myapp
        subset: v2
  - route:
    - destination:
        host: myapp
        subset: v1
      weight: 90
    - destination:
        host: myapp
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
```
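Weighted routes like the 90/10 split above assign each request at random, so a single user can alternate between variants. A common way to keep assignments sticky is to hash a stable user ID into buckets. In this sketch the assignment is assumed to happen once upstream (for example in the application or an auth gateway) and to be persisted in the `experiment_abc` cookie, whose `always`/`never` values ingress-nginx's `canary-by-cookie` interprets; the helper names are hypothetical.

```python
"""Sketch: deterministic, sticky A/B assignment by hashing a user ID."""
import hashlib


def assign_variant(user_id: str, experiment: str, percent_b: int = 10) -> str:
    # Salting with the experiment name gives each experiment an independent split
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in 0..99
    return "B" if bucket < percent_b else "A"


def cookie_value(variant: str) -> str:
    # ingress-nginx canary-by-cookie: "always" -> canary (B), "never" -> main (A)
    return "always" if variant == "B" else "never"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share_b = sum(assign_variant(u, "experiment_abc") == "B" for u in users) / len(users)
    print(f"variant B share: {share_b:.3f}")
```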

### Metrics Collection for A/B Tests

Track business metrics separately for each variant:

```yaml
# Prometheus recording rules
groups:
- name: ab_test_metrics
  interval: 30s
  rules:
  - record: ab_test:conversion_rate:1m
    expr: |
      sum(rate(checkouts_total{version="v2"}[1m])) 
      / 
      sum(rate(page_views_total{version="v2"}[1m]))
    labels:
      variant: "B"
  
  - record: ab_test:conversion_rate:1m
    expr: |
      sum(rate(checkouts_total{version="v1"}[1m])) 
      / 
      sum(rate(page_views_total{version="v1"}[1m]))
    labels:
      variant: "A"
```

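**Statistical Significance:**
A/B tests require sufficient sample size and duration. Common rules of thumb:
- Run for at least 1 week to account for weekly traffic patterns
- Treat a difference as significant only below a pre-chosen threshold (commonly p < 0.05), for example using a chi-square test on conversion counts
- Collect on the order of 1000 conversions per variant; the exact requirement depends on the baseline conversion rate and the smallest effect worth detecting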
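For conversion data the chi-square test reduces to a 2x2 table (converted vs. not converted, per variant). The standard-library sketch below computes Pearson's statistic without continuity correction and converts it to a p-value using the one-degree-of-freedom identity that chi-square equals z squared. In practice `scipy.stats.chi2_contingency` does the same job; it applies Yates' correction to 2x2 tables by default, so its p-values come out slightly larger. The counts in the usage line are made up for illustration.

```python
"""Sketch: Pearson chi-square test on a 2x2 table of conversions (no continuity correction)."""
import math
from typing import Tuple


def chi_square_conversions(conv_a: int, total_a: int,
                           conv_b: int, total_b: int) -> Tuple[float, float]:
    """Return (chi2, p_value) for conversion counts of variants A and B."""
    observed = [[conv_a, total_a - conv_a], [conv_b, total_b - conv_b]]
    row_totals = [total_a, total_b]
    col_totals = [conv_a + conv_b, (total_a - conv_a) + (total_b - conv_b)]
    grand_total = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # One degree of freedom: chi2 equals z**2, so the p-value is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = chi_square_conversions(1000, 20000, 1100, 20000)  # 5.0% vs 5.5%
    print(f"chi2={chi2:.3f} p={p:.4f} significant={p < 0.05}")
```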

## 30.6 Shadow Deployment

Shadow deployment, also called mirrored or dark launch, sends a copy of production traffic to the new version while users continue to receive responses only from the current version. The new version processes real requests but its responses are discarded, enabling safe load testing and behavior validation.

### Implementation with Istio

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
  - app.company.com
  http:
  - route:
    - destination:
        host: myapp
        subset: stable
      weight: 100
    mirror:
      host: myapp
      subset: shadow
    mirrorPercentage:
      value: 10.0  # Mirror 10% of traffic
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
  - name: stable
    labels:
      version: v1
  - name: shadow
    labels:
      version: v2
```

**Shadow Considerations:**
- Shadow pods must not write to databases (read-only mode)
- External API calls should be mocked or suppressed
- Async message publishing should be disabled
- Shadow responses are discarded; Istio mirrors requests fire-and-forget, so shadow latency or timeouts do not affect users

**Shadow-Safe Application Pattern:**
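```python
# Application checks if running in shadow mode
import logging
import os

logger = logging.getLogger(__name__)
# DEPLOYMENT_TYPE=shadow is set only in the shadow Deployment's pod spec
IS_SHADOW = os.environ.get("DEPLOYMENT_TYPE") == "shadow"

# Hypothetical placeholders for the application's real logic
def validate_order(order):
    if "id" not in order:
        raise ValueError("order is missing an id")

def save_to_database(order):
    pass  # Real persistence goes here

def publish_event(order):
    pass  # Real message publishing goes here

def process_order(order):
    if IS_SHADOW:
        # Validate and record metrics, but never persist or publish
        validate_order(order)
        logger.info("shadow processed order %s", order["id"])
        return {"status": "shadow_processed"}

    # Normal production logic
    save_to_database(order)
    publish_event(order)
    return {"status": "processed"}
```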

## 30.7 Choosing the Right Strategy

Strategy selection depends on availability requirements, risk tolerance, infrastructure capacity, and team maturity.

### Decision Matrix

| Strategy | Downtime | Rollback Speed | Risk Level | Infrastructure Cost | Complexity |
|----------|----------|----------------|------------|---------------------|------------|
| **Recreate** | Yes (seconds to minutes) | Slow (restart) | High | Low | Low |
| **Rolling** | None | Slow (re-roll) | Medium | Low | Low |
| **Blue/Green** | None | Instant | Low | High (2x) | Medium |
| **Canary** | None | Fast (traffic shift) | Low | Medium | High |
| **A/B Test** | None | Fast | Low | Medium | High |
| **Shadow** | None | N/A (no user impact) | Minimal | High (up to 2x compute) | High |

### Selection Guidelines

**Choose Recreate when:**
- Development/non-production environments
- Batch jobs with defined windows
- Complete data migration requiring exclusive database locks
- Infrastructure constraints prevent multiple versions

**Choose Rolling Update when:**
- Simple stateless applications
- Backward compatibility guaranteed
- No complex database migrations
- Team is early in DevOps journey

**Choose Blue/Green when:**
- Zero downtime mandatory
- Instant rollback required (financial trading, medical systems)
- Infrastructure cost acceptable
- Database changes are backward-compatible

**Choose Canary when:**
- Risk mitigation is priority
- Gradual rollout acceptable
- Comprehensive monitoring in place (metrics, logs, traces)
- Automated rollback capabilities available

**Choose A/B Testing when:**
- Business metric validation required (conversion, engagement)
- Sufficient traffic for statistical significance
- Product management requires data-driven decisions

**Choose Shadow when:**
- Load testing production traffic patterns required
- Validating performance at scale without user impact
- Database writes can be safely suppressed or mocked

### Hybrid Strategies

**Ring Deployment (Progressive Exposure):**
Combine Blue/Green with Canary concepts:
1. Deploy to Internal Ring (employees only)
2. Deploy to Early Adopter Ring (beta users)
3. Deploy to General Availability (all users)

Each ring is a separate Blue/Green environment with distinct ingress routing.

**Feature Flag + Canary:**
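Use the canary for the infrastructure (binary) rollout and feature flags for the functional release:
1. Roll out V2 to all pods (canary or Rolling Update) with the new features disabled by flags
2. Enable the features for a small percentage of users through the flag management system
3. Widen exposure gradually; turning a flag off rolls a feature back without redeploying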

## 30.8 Strategy Comparison

### Technical Implementation Comparison

| Aspect | Rolling | Blue/Green | Canary |
|--------|---------|------------|--------|
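| **Kubernetes Native** | Yes (Deployment strategy) | Partially (manual Service selector switch; automation requires tooling) | Partially (replica-ratio split; precise weights require Ingress/Service Mesh) |
| **Rollback Time** | Minutes (re-rolls the previous ReplicaSet) | Seconds (selector or route update) | Seconds (canary weight set to 0%) |
| **Traffic Control** | Requests hit V1 or V2 depending on which pods are Ready | All users switched together | Percentage-based random |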
| **Database Requirements** | Backward compatibility | Backward compatibility | Backward compatibility |
| **Monitoring Criticality** | Medium | Medium | High (automated decision) |
| **Typical Tools** | kubectl, Helm | Argo Rollouts, Spinnaker | Flagger, Argo Rollouts, Istio |

### Risk Mitigation Effectiveness

**Blast Radius (Users affected by bad deployment):**
- Recreate: 100% immediately
- Rolling: 100% gradually over duration
- Blue/Green: 0% (if validation passes) or 100% (if validation fails after switch)
- Canary: 1-10% initially, scalable based on confidence
- Shadow: 0% (no user-facing impact)

---

## Chapter Summary and Preview

In this chapter, we examined the spectrum of deployment strategies available for modern containerized applications. The Recreate strategy provides simplicity at the cost of downtime, suitable only for non-production or maintenance-window scenarios. Rolling Updates offer zero-downtime deployment with native Kubernetes support but limited risk mitigation—both versions coexist briefly, but all users potentially experience issues if the new version fails. Blue/Green deployment eliminates these concerns by maintaining parallel environments, enabling instantaneous traffic switching and rollback, though at double the infrastructure cost. Canary deployment represents the modern standard for risk-averse organizations, progressively shifting traffic while monitoring health metrics, automatically rolling back when error rates or latency exceed thresholds. A/B Testing extends canary concepts to validate business hypotheses, routing based on user segments rather than random percentages to measure conversion and engagement differences. Shadow deployment provides the safest validation method by mirroring production traffic to the new version without affecting users, though requiring careful application design to prevent duplicate side effects. The selection framework emphasizes matching strategy to availability requirements, with most production systems benefiting from Canary or Blue/Green patterns while simpler internal tools may suffice with Rolling Updates.

**Key Takeaways:**
- Avoid Recreate for customer-facing production services outside a scheduled maintenance window; the downtime and simultaneous 100% user impact violate modern availability expectations
- Implement comprehensive readiness and liveness probes for Rolling Updates; without readiness probes, Kubernetes sends traffic to pods still initializing, causing errors during deployment
- Blue/Green requires double infrastructure capacity during the deployment window; ensure cluster autoscaling or resource quotas accommodate this, or run the not-yet-live (Green) environment on spot instances during validation
- Canary deployments depend on automated metric analysis; manual promotion is slow and error-prone, so implement Flagger, Argo Rollouts, or similar controllers
- Database schema changes must remain backward-compatible during rolling and canary deployments; never deploy breaking schema changes without first deploying application code that handles both old and new formats

**Next Chapter Preview:**
Chapter 31: Kubernetes Deployments explores the native Kubernetes abstractions that enable these strategies, examining Deployment controllers, ReplicaSets, rolling update configuration, revision history management, and rollback procedures. We will analyze advanced patterns including paused deployments for canary validation, custom rollout strategies with operators, and integration with service meshes for fine-grained traffic management. This chapter provides the practical implementation details for executing the deployment strategies defined here using Kubernetes-native mechanisms and ecosystem tools.