---

# Chapter 18: Integration and Scalability (Strangler Fig, Sidecar, API Gateway, Leader Election)

## Opening Context

Building a distributed system is not a one‑time effort. Systems evolve: legacy monoliths are modernised, new services are added, and existing ones must scale and integrate seamlessly. Moreover, as the system grows, common cross‑cutting concerns—such as logging, authentication, and routing—need to be handled consistently. This chapter presents four patterns that address these challenges:

1. **Strangler Fig Pattern** – A strategy for incrementally migrating a legacy system to a new one with minimal risk.
2. **Sidecar Pattern** – An approach to augment a service with additional capabilities without modifying its code.
3. **API Gateway Pattern** – A single entry point that routes requests, handles cross‑cutting concerns, and aggregates responses.
4. **Leader Election Pattern** – A mechanism to coordinate distributed processes by electing a single leader to perform certain tasks.

These patterns are essential for building evolvable, maintainable, and scalable distributed systems.

---

## 18.1 Strangler Fig Pattern

### Intent
*Gradually replace a legacy system by building a new system around it, piece by piece, until the old system is completely “strangled” and can be decommissioned.*

### The Problem

Replacing a large, critical legacy system in one big bang is risky. It often leads to prolonged downtime, data migration nightmares, and the possibility of complete failure. Teams need a way to modernize incrementally, allowing the new system to coexist with the old, and to switch users over gradually.

### The Solution: Strangler Fig

Named after the strangler fig vine that grows around a tree and eventually replaces it, this pattern advocates building a new system around the edges of the legacy system. Initially, the new system handles a small subset of functionality. Over time, more features are moved, and traffic is routed to the new system. Eventually, the old system is no longer needed and can be decommissioned.

**Key Steps**:
1. Identify a small piece of functionality that can be extracted (e.g., a single API endpoint or a UI component).
2. Build that functionality in the new system.
3. Route requests for that functionality to the new system (using a routing layer, API gateway, or load balancer).
4. Repeat for other pieces until the legacy system is empty.
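
The routing decision behind step 3 can be sketched in a few lines. This is a minimal illustration, not a production router: the `Route` shape, `resolveUpstream`, and the upstream URLs are hypothetical names chosen for this example. Each migration step adds one entry to the list, and anything unmatched still falls through to the monolith.

```typescript
// Hypothetical route table for a strangler-fig routing layer.
// Prefixes and upstream URLs are illustrative assumptions.
interface Route {
  prefix: string;   // path prefix owned by a migrated service
  upstream: string; // base URL of the service that now handles it
}

const LEGACY_UPSTREAM = 'http://legacy-monolith';

// True when `path` equals the prefix or sits beneath it, so that
// '/api/products' does not accidentally match '/api/products-archive'.
function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + '/');
}

// Picks the most specific migrated route; unmatched paths stay on the monolith.
function resolveUpstream(path: string, migrated: Route[]): string {
  const candidates = migrated
    .filter((route) => matchesPrefix(path, route.prefix))
    .sort((a, b) => b.prefix.length - a.prefix.length);
  return candidates.length > 0 ? candidates[0].upstream : LEGACY_UPSTREAM;
}

// After the first migration step, only product traffic has moved.
const migratedRoutes: Route[] = [
  { prefix: '/api/products', upstream: 'http://product-service:3001' },
];

console.log(resolveUpstream('/api/products/42', migratedRoutes)); // http://product-service:3001
console.log(resolveUpstream('/api/orders/7', migratedRoutes));    // http://legacy-monolith
```

Keeping the monolith as the default means a missing or mistyped route degrades to the old behaviour rather than to an error.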

#### Example: Migrating a Monolithic E‑commerce Platform

Assume we have a monolithic Java application that handles orders, products, and customers. We want to migrate to microservices gradually.

##### Step 1: Introduce a Routing Layer

Place a reverse proxy (e.g., NGINX, HAProxy) or an API gateway in front of the monolith. Initially, all requests go to the monolith.

```nginx
# nginx.conf (initial)
server {
    listen 80;
    location / {
        proxy_pass http://legacy-monolith;
    }
}
```

##### Step 2: Extract Product Service

Build a new product microservice (Node.js, for example) that exposes REST endpoints for product data.

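```typescript
// product-service/app.ts
import express from 'express';

// Placeholder data-access layer (hypothetical); replace with a client for the new database.
const db = {
  async findProduct(id: string): Promise<{ id: string; name: string } | undefined> {
    return undefined; // TODO: query the new product database
  },
};

const app = express();

// The routing layer forwards the original path unchanged, so the route keeps the /api prefix.
app.get('/api/products/:id', async (req, res) => {
  const product = await db.findProduct(req.params.id);
  if (!product) {
    res.status(404).json({ error: 'Product not found' });
    return;
  }
  res.json(product);
});

app.listen(3001);
```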

##### Step 3: Route Product Requests to the New Service

Update the routing layer to send requests starting with `/api/products` to the new service.

```nginx
# nginx.conf (updated)
server {
    listen 80;
    
    location /api/products/ {
        proxy_pass http://product-service:3001;
    }
    
    location / {
        proxy_pass http://legacy-monolith;
    }
}
```

##### Step 4: Extract Order Service

Similarly, extract order functionality into another service. The routing layer now sends `/api/orders` to the order service.

```nginx
location /api/orders/ {
    proxy_pass http://order-service:3002;
}
```

##### Step 5: Migrate Data and Dependencies

Over time, the new services need to own their data. This may involve data synchronization or migration strategies (e.g., dual writes, event-based sync). Once all traffic for a domain is on the new service, the legacy data can be migrated and the old tables decommissioned.
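
The dual-write strategy mentioned above could look roughly like the following sketch. It assumes a deliberately tiny, synchronous `ProductStore` interface with in-memory stand-ins for both databases; all names here are hypothetical. Writes go to both stores, while reads come from the legacy store until an explicit cut-over.

```typescript
// Minimal store interface; both implementations below are in-memory stand-ins.
interface ProductStore {
  save(id: string, name: string): void;
  find(id: string): string | undefined;
}

class InMemoryStore implements ProductStore {
  private rows = new Map<string, string>();

  save(id: string, name: string): void {
    this.rows.set(id, name);
  }

  find(id: string): string | undefined {
    return this.rows.get(id);
  }
}

// Writes to both stores; reads from whichever store is the current source of truth.
class DualWriteProductStore implements ProductStore {
  private legacy: ProductStore;
  private modern: ProductStore;
  private readFromModern = false;

  constructor(legacy: ProductStore, modern: ProductStore) {
    this.legacy = legacy;
    this.modern = modern;
  }

  save(id: string, name: string): void {
    this.legacy.save(id, name); // legacy stays authoritative until cut-over
    try {
      this.modern.save(id, name);
    } catch (err) {
      // A failed secondary write has to be repaired later, e.g. by a backfill job.
      console.error(`dual write to new store failed for ${id}`, err);
    }
  }

  find(id: string): string | undefined {
    return this.readFromModern ? this.modern.find(id) : this.legacy.find(id);
  }

  // Called once the new store has been verified against the legacy data.
  cutOver(): void {
    this.readFromModern = true;
  }
}
```

Dual writes are not atomic across two databases, so a reconciliation or backfill step is usually needed before the cut-over can be trusted.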

##### Step 6: Decommission the Monolith

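When all functionality has been moved, the catch‑all route to the monolith can be removed from the routing layer and the monolith shut down. The routing layer itself usually stays in place, since it now fronts the new services.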

### Benefits

- **Reduced risk** – Changes are small and incremental.
- **Continuous delivery** – New features can be deployed independently.
- **Parallel operation** – Old and new systems coexist, allowing fallback.

### Drawbacks

- **Temporary complexity** – The routing layer and data sync add operational overhead.
- **Transaction coordination** – Operations that span old and new systems may require careful handling (e.g., sagas).
- **Data consistency** – Eventual consistency may have to be accepted while data is synchronized between the old and new systems.

### When to Use

- Modernizing a large, critical legacy system.
- Gradually adopting microservices from a monolith.
- Any scenario where a big‑bang rewrite is too risky.

---

## 18.2 Sidecar Pattern

### Intent
*Augment a service with additional capabilities (e.g., logging, monitoring, networking) without modifying its code by deploying a helper process (the sidecar) alongside it.*

### The Problem

In a microservices architecture, services often need common functionality: logging, configuration updates, service discovery, circuit breaking, or TLS termination. Implementing these in every service leads to duplication and ties services to specific infrastructure libraries. Moreover, services written in different languages would need to reimplement these concerns.

### The Solution: Sidecar

Deploy a secondary container (the sidecar) alongside the main service container in the same pod (in Kubernetes) or on the same host. The sidecar shares the same lifecycle and can communicate with the main service via localhost. It can handle infrastructure concerns on behalf of the service.

Common use cases:
- **Service mesh sidecar** (Envoy, Linkerd) – Handles service discovery, load balancing, retries, and observability.
- **Logging sidecar** – Collects logs from the main container and ships them to a central system.
- **Configuration sidecar** – Watches a configuration repository and updates the main service’s config files.
- **TLS termination sidecar** – Terminates HTTPS and forwards plain HTTP to the main service.

#### Example: Logging Sidecar

Imagine a simple Node.js service that writes logs to files under `/app/logs`. We want to ship these logs to Elasticsearch. Instead of modifying the service code, we add a sidecar that tails the log files through a shared volume and forwards them.

**Main service (simplified)**:

```dockerfile
# Dockerfile for main service
FROM node:14
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["node", "app.js"]
```

**Logging sidecar** (using Filebeat):

```yaml
# docker-compose excerpt for local development
services:
  app:
    build: .
    volumes:
      - ./logs:/app/logs   # share log directory
  filebeat:
    image: docker.elastic.co/beats/filebeat:7.10.0
    volumes:
      - ./logs:/logs
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml
    depends_on:
      - app
```

**filebeat.yml**:

```yaml
filebeat.inputs:
- type: log
  paths:
    - /logs/*.log
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```

In Kubernetes, both containers would be in the same pod, sharing a volume for logs.
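
To make the sidecar's job concrete, here is a rough sketch of the core loop a log-shipping sidecar performs: parse each line, buffer it, and ship in batches. It is not how Filebeat is implemented. The log line format, `parseLine`, and `LogBatcher` are assumptions made for this example, and the `ship` callback stands in for the network call to the central system.

```typescript
interface LogRecord {
  timestamp: string;
  level: string;
  message: string;
}

// Parses lines shaped like "2024-01-01T00:00:00Z INFO order created".
// The format is an assumption for this sketch; non-matching lines yield null.
function parseLine(line: string): LogRecord | null {
  const match = /^(\S+)\s+(\S+)\s+(.*)$/.exec(line.trim());
  if (!match) return null;
  return { timestamp: match[1], level: match[2], message: match[3] };
}

// Buffers parsed records and hands them to `ship` in fixed-size batches.
class LogBatcher {
  private buffer: LogRecord[] = [];
  private batchSize: number;
  private ship: (batch: LogRecord[]) => void;

  constructor(batchSize: number, ship: (batch: LogRecord[]) => void) {
    this.batchSize = batchSize;
    this.ship = ship;
  }

  // Called for every new line the sidecar reads from the shared log file.
  accept(line: string): void {
    const record = parseLine(line);
    if (record === null) return; // skip lines we cannot parse
    this.buffer.push(record);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  // Ships whatever is buffered; also useful on a timer or at shutdown.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.ship(batch);
  }
}
```

Because this logic lives in the sidecar, the main service only has to write lines to a file; batching, retries, and the destination can change without touching its code.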

#### Example: Service Mesh Sidecar (Conceptual)

With a service mesh like Istio, an Envoy proxy is injected as a sidecar. The main service addresses other services by their usual names, and the sidecar intercepts inbound and outbound traffic to handle routing, retries, and telemetry.

```yaml
# Kubernetes pod with sidecar (simplified)
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
  - name: main-app
    image: my-app:latest
  - name: envoy-sidecar
    image: envoyproxy/envoy:latest
    ports:
    - containerPort: 15000
```

The main service can call another service via `http://localhost:15000` (if the sidecar is configured as a forward proxy), or the sidecar can intercept outgoing traffic transparently.

### Benefits

- **Separation of concerns** – Main service focuses on business logic; sidecars handle infrastructure.
- **Language independence** – Sidecars can be written in any language, regardless of the main service.
- **Reusability** – The same sidecar can be attached to many services.
- **Dynamic updates** – Sidecars can be updated independently (e.g., new logging configuration).

### Drawbacks

- **Resource overhead** – Each service now runs extra containers.
- **Communication complexity** – Services must communicate with sidecars (e.g., via localhost), and debugging can be trickier.
- **Deployment complexity** – Orchestrator must support sidecar patterns (e.g., pods in Kubernetes).

### When to Use

- When you have many services that need common infrastructure capabilities.
- When you want to decouple service code from infrastructure concerns.
- In a service mesh architecture.

---

## 18.3 API Gateway Pattern

### Intent
*Provide a single, unified entry point for a set of microservices, handling cross‑cutting concerns such as authentication, rate limiting, routing, and response aggregation.*

### The Problem

In a microservices architecture, clients (web, mobile, third‑party) would need to know the addresses of many services and make multiple requests to assemble a UI. This leads to:
- **Chatty communication** – Multiple round trips increase latency.
- **Cross‑cutting concerns duplication** – Each service would need to implement authentication, CORS, logging, etc.
- **Client‑side complexity** – Clients must handle service discovery and error handling.
- **Protocol/API versioning** – Changing service interfaces forces client updates.

### The Solution: API Gateway

The API Gateway sits between clients and services. It acts as a reverse proxy, routing requests to the appropriate service. It can also perform:

- **Authentication/Authorization** – Verify tokens, enforce policies.
- **Rate limiting** – Protect services from overload.
- **Request aggregation** – Combine multiple service responses into one.
- **Protocol and response transformation** – Convert between protocols (e.g., REST to gRPC) or data formats.
- **Caching** – Cache responses to reduce load.
- **Request logging and monitoring**.
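
As one illustration of these responsibilities, rate limiting is often built on a token bucket. The sketch below is a minimal in-memory version under stated assumptions: one bucket per client key, a clock passed in explicitly so the behaviour is easy to test, and hypothetical class names. A gateway running on several instances would need shared state instead.

```typescript
// A bucket holds up to `capacity` tokens and refills continuously over time.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes one token if the request may proceed.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Keeps one bucket per client key (for example, an API key or user ID).
class RateLimiter {
  private buckets = new Map<string, TokenBucket>();
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  allow(clientKey: string, nowMs: number = Date.now()): boolean {
    let bucket = this.buckets.get(clientKey);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capacity, this.refillPerSecond, nowMs);
      this.buckets.set(clientKey, bucket);
    }
    return bucket.tryConsume(nowMs);
  }
}
```

In a gateway, a middleware would call `allow(clientKey)` per request and answer with HTTP 429 when it returns `false`.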

#### Example: Simple API Gateway with Express

We'll build a minimal API gateway that routes to two services: `order-service` and `product-service`, and also provides an aggregated endpoint.

```typescript
// api-gateway.ts
import express from 'express';
import proxy from 'express-http-proxy';
import axios from 'axios';

const app = express();

// Placeholder check for illustration only; a real gateway would verify
// a signed token (e.g., a JWT) or consult an identity provider.
function isValidToken(header: string): boolean {
  return header.startsWith('Bearer ') && header.length > 'Bearer '.length;
}

// Authentication middleware (simplified)
app.use((req, res, next) => {
  const token = req.headers['authorization'];
  if (!token || !isValidToken(token)) {
    res.status(401).json({ error: 'Unauthorized' });
    return;
  }
  next();
});

// Route to order service. Inside a mounted middleware, req.url has the mount
// path stripped, so req.originalUrl is used to map /orders/42 to /api/orders/42.
app.use('/orders', proxy('http://order-service:3002', {
  proxyReqPathResolver: (req) => `/api${req.originalUrl}`
}));

// Route to product service
app.use('/products', proxy('http://product-service:3001', {
  proxyReqPathResolver: (req) => `/api${req.originalUrl}`
}));

// Aggregated endpoint for a customer dashboard
app.get('/dashboard/:customerId', async (req, res) => {
  const customerId = req.params.customerId;
  try {
    // Fetch data from multiple services in parallel; `params` URL-encodes the value
    const [orders, products] = await Promise.all([
      axios.get('http://order-service:3002/api/orders', { params: { customerId } }),
      axios.get('http://product-service:3001/api/products/recommended', { params: { customerId } })
    ]);

    res.json({
      orders: orders.data,
      recommendedProducts: products.data
    });
  } catch (error) {
    // 502 signals that an upstream service failed, not the gateway itself
    res.status(502).json({ error: 'Failed to fetch dashboard' });
  }
});

app.listen(3000, () => console.log('API Gateway running on port 3000'));
```

**Explanation**:
- The gateway uses `express-http-proxy` to forward requests to the appropriate microservice.
- It adds authentication middleware centrally.
- The `/dashboard` endpoint aggregates responses from two services, reducing round trips for the client.
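
One possible refinement of the aggregation step, not part of the gateway above, is to return partial data when a single downstream call fails instead of failing the whole request. The sketch below assumes the service calls are injected as functions (the `Fetcher` type and `buildDashboard` are hypothetical names), which also keeps the logic testable without a network.

```typescript
// A downstream call, injected so the aggregation logic has no HTTP dependency.
type Fetcher<T> = () => Promise<T>;

interface Dashboard {
  orders: unknown[] | null;              // null when the order service failed
  recommendedProducts: unknown[] | null; // null when the product service failed
  degraded: boolean;                     // true if any part is missing
}

// Runs both calls in parallel and keeps whatever succeeded.
async function buildDashboard(
  fetchOrders: Fetcher<unknown[]>,
  fetchRecommendations: Fetcher<unknown[]>,
): Promise<Dashboard> {
  const [orders, recommendations] = await Promise.allSettled([
    fetchOrders(),
    fetchRecommendations(),
  ]);

  return {
    orders: orders.status === 'fulfilled' ? orders.value : null,
    recommendedProducts: recommendations.status === 'fulfilled' ? recommendations.value : null,
    degraded: orders.status === 'rejected' || recommendations.status === 'rejected',
  };
}
```

Whether partial responses are acceptable depends on the client: a dashboard can usually render without recommendations, whereas a checkout page probably cannot proceed without prices.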

#### API Gateway vs. BFF

Recall from Chapter 15 that **Backend for Frontend (BFF)** is a pattern where you have a dedicated gateway per client type. An API Gateway is often a single entry point for all clients, but it can be combined with BFFs. For example, you might have:
- An edge API Gateway that handles global concerns (auth, rate limiting).
- BFFs for web, mobile, and third‑party that sit behind the edge gateway and tailor responses.

#### Advanced Features

- **Service discovery integration** – The gateway can query a service registry (e.g., Consul, Eureka) to locate service instances.
- **Load balancing** – Distribute requests across multiple instances.
- **Canary releases** – Route a percentage of traffic to a new version.
- **Request/response transformation** – e.g., convert XML to JSON, or add/remove headers.
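
Canary routing, for instance, can be sketched as a deterministic percentage split. The example assumes requests carry a stable user ID and uses an FNV-1a-style hash over the ID's UTF-16 code units; `fnv1a` and `chooseVersion` are hypothetical helper names. Hashing rather than random selection keeps each user on the same version across requests.

```typescript
// FNV-1a-style 32-bit hash; deterministic, so a user always lands in the same bucket.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Sends roughly `canaryPercent` percent of users to the canary version.
function chooseVersion(userId: string, canaryPercent: number): 'stable' | 'canary' {
  const bucket = fnv1a(userId) % 100; // bucket in the range 0..99
  return bucket < canaryPercent ? 'canary' : 'stable';
}
```

Raising `canaryPercent` only ever moves users from stable to canary, never back, because each user's bucket is fixed.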

### Benefits

- **Centralized control** – Cross‑cutting concerns managed in one place.
- **Client simplification** – Clients talk to one endpoint.
- **Protocol translation** – Gateway can translate between client‑friendly protocols (HTTP/JSON) and internal protocols (gRPC, Thrift).
- **Resilience** – Can implement retries, circuit breakers, and timeouts centrally.
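
To illustrate the resilience point, a circuit breaker at the gateway might be sketched as the small state machine below. The thresholds and the `CircuitBreaker` class are assumptions for this example, and the half-open state here does not limit the number of trial requests, which a production implementation typically would.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Opens after consecutive failures, then allows trial requests after a cool-down.
class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAtMs = 0;
  private failureThreshold: number;
  private resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // Whether a call to the downstream service may be attempted at `nowMs`.
  canRequest(nowMs: number): boolean {
    if (this.state === 'open' && nowMs - this.openedAtMs >= this.resetTimeoutMs) {
      this.state = 'half-open'; // cool-down elapsed: let a trial request through
    }
    return this.state !== 'open';
  }

  // A success closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  // A failure during a trial, or too many in a row, opens the circuit.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAtMs = nowMs;
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

A gateway would keep one breaker per downstream service and answer immediately with an error or a cached response while that service's circuit is open.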

### Drawbacks

- **Single point of failure** – Must be highly available and scalable.
- **Potential bottleneck** – All traffic passes through it; performance is critical.
- **Increased complexity** – Gateway adds another component to deploy and maintain.
- **Development overhead** – May need to update gateway when services change.

### When to Use

- Any microservices architecture with multiple clients.
- When you need to enforce common policies (auth, rate limiting).
- To simplify client communication.

---

## 18.4 Leader Election Pattern

### Intent
*Coordinate distributed processes by ensuring that only one instance (the leader) performs certain tasks at any given time, with automatic failover if the leader fails.*

### The Problem

In a distributed system, multiple instances of a service may be running for scalability or fault tolerance. However, some tasks should only be performed by one instance at a time, such as:
- Running a background job (e.g., sending nightly emails).
- Managing a shared resource (e.g., coordinating a cache refresh).
- Acting as the primary in a primary‑replica setup.
- Performing housekeeping tasks (e.g., cleaning up old data).

Without leader election, multiple instances could duplicate work, cause conflicts, or waste resources.

### The Solution: Leader Election

Leader election algorithms allow a group of processes to elect one of them as the leader. The leader performs the exclusive tasks, while others stand by as replicas. If the leader fails, the remaining processes elect a new leader.

Common implementations use:
- **Distributed consensus algorithms** (Paxos, Raft) – used in systems like ZooKeeper, etcd, Consul.
- **Lease‑based election** – processes attempt to acquire a lock with a time‑to‑live; the holder is the leader.
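
The lease-based approach can be illustrated with a small in-memory model before bringing in a real coordination service. `LeaseStore` below is a hypothetical stand-in for such a service, and time is passed in explicitly so that expiry is easy to follow; it is a sketch of the rules, not a distributed implementation.

```typescript
// In-memory stand-in for a store that supports "set if absent, with a TTL".
class LeaseStore {
  private holder: string | null = null;
  private expiresAtMs = 0;

  // Grants the lease only if nobody holds an unexpired one.
  tryAcquire(candidate: string, ttlMs: number, nowMs: number): boolean {
    if (this.holder !== null && nowMs < this.expiresAtMs) return false;
    this.holder = candidate;
    this.expiresAtMs = nowMs + ttlMs;
    return true;
  }

  // Extends the lease only if the caller still holds it and it has not expired.
  renew(candidate: string, ttlMs: number, nowMs: number): boolean {
    if (this.holder !== candidate || nowMs >= this.expiresAtMs) return false;
    this.expiresAtMs = nowMs + ttlMs;
    return true;
  }

  // The current leader, or null if the lease is free or has expired.
  leader(nowMs: number): string | null {
    return this.holder !== null && nowMs < this.expiresAtMs ? this.holder : null;
  }
}
```

The leader must renew well before the TTL runs out; if it crashes, it simply stops renewing and the lease frees itself once it expires.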

#### Example: Leader Election using a Distributed Lock (Redis)

We can implement a simple leader election using Redis’s `SET NX` command with an expiry. Each instance tries to acquire a lock with a unique identifier. The one that succeeds becomes leader and must periodically renew the lock.

```typescript
// leader-election.ts
import Redis from 'ioredis';

export class LeaderElector {
  private redis: Redis;
  private lockKey: string = 'service:leader';
  private lockValue: string; // unique identifier for this instance
  private ttl: number = 30000; // lock validity in ms
  private renewInterval: NodeJS.Timeout | null = null;
  public isLeader: boolean = false;

  constructor(redisUrl: string, instanceId: string) {
    this.redis = new Redis(redisUrl);
    this.lockValue = instanceId;
  }

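  async start(): Promise<void> {
    // Attempt to become leader immediately
    await this.tryAcquireLock();

    // Periodically renew if leader, otherwise try to acquire.
    // Keep the timer handle so shutdown() can stop it.
    this.renewInterval = setInterval(() => {
      const attempt = this.isLeader ? this.renewLock() : this.tryAcquireLock();
      attempt.catch((err) => {
        // If Redis is unreachable we cannot prove we still hold the lock, so step down.
        console.error('Leader election error', err);
        if (this.isLeader) {
          this.isLeader = false;
          this.onLostLeadership();
        }
      });
    }, this.ttl / 3);
  }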

  private async tryAcquireLock(): Promise<void> {
    // SET NX: set only if key doesn't exist
    const result = await this.redis.set(this.lockKey, this.lockValue, 'PX', this.ttl, 'NX');
    if (result === 'OK') {
      console.log('Became leader');
      this.isLeader = true;
      this.onBecomeLeader();
    }
  }

  private async renewLock(): Promise<void> {
    // Use Lua script to atomically renew if we still hold the lock
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("pexpire", KEYS[1], ARGV[2])
      else
        return 0
      end
    `;
    const result = await this.redis.eval(script, 1, this.lockKey, this.lockValue, this.ttl);
    if (result === 0) {
      // Lock lost; someone else became leader
      console.log('Lost leadership');
      this.isLeader = false;
      this.onLostLeadership();
    }
  }

  private onBecomeLeader(): void {
    // Start the exclusive task (e.g., scheduler)
    this.startLeaderTask();
  }

  private onLostLeadership(): void {
    // Stop the exclusive task
    this.stopLeaderTask();
  }

  private startLeaderTask(): void {
    // e.g., schedule a cron job
    console.log('Starting leader task');
    // ...
  }

  private stopLeaderTask(): void {
    console.log('Stopping leader task');
    // ...
  }

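  async shutdown(): Promise<void> {
    if (this.renewInterval) clearInterval(this.renewInterval);
    // Release the lock only if we still own it; a plain DEL could remove
    // a lock that another instance acquired after ours expired.
    if (this.isLeader) {
      const release =
        'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end';
      await this.redis.eval(release, 1, this.lockKey, this.lockValue);
      this.isLeader = false;
    }
    await this.redis.quit();
  }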
}
```

**Explanation**:
- Each instance generates a unique ID (e.g., hostname + PID).
- The first instance to set the Redis key with `NX` becomes leader.
- The leader periodically renews the lock before it expires. The Lua script makes the check‑and‑extend atomic, so an instance never extends a lock that now belongs to another instance.
- If the leader fails, the lock expires, and another instance acquires it.
- When an instance becomes leader, it starts the exclusive task; when it loses leadership, it stops.

#### Using a Consensus Service (etcd)

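A more robust approach is to use etcd’s election primitives (the `concurrency` package in the Go client; the `etcd3` package used below offers an equivalent for Node.js), which build on leases stored in etcd’s Raft‑replicated key‑value store.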

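```typescript
// Using the etcd3 client for Node.js
import { Etcd3 } from 'etcd3';

const etcd = new Etcd3({ hosts: 'localhost:2379' });
const election = etcd.election('my-service');

function runCampaign(): void {
  const campaign = election.campaign('my-value'); // value could be instance ID
  campaign.on('elected', () => {
    console.log('I am the leader!');
    // Start leader work
  });
  campaign.on('error', (err) => {
    // The campaign ends on error (for example, a lost lease):
    // stop any leader work here, then campaign again after a delay.
    console.error('Campaign error', err);
    setTimeout(runCampaign, 5000);
  });
}

runCampaign();
```

The client manages the underlying lease and its keep‑alives, and etcd handles failover to the next candidate. If a campaign fails, the application is responsible for starting a new one, as the error handler above does.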

### Leader Election Patterns in Practice

- **Kubernetes leader election** – Uses Lease objects (via client‑go) for controllers; older client‑go releases also supported ConfigMap and Endpoints locks.
- **ZooKeeper** – Uses ephemeral sequential nodes for leader election.
- **Database‑based** – Use a table row with optimistic locking (e.g., `UPDATE ... WHERE leader = true AND version = X`). This is less robust, because failure detection depends on polling and on timestamps that all instances must agree on.
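
The ZooKeeper recipe listed above can be sketched without a ZooKeeper client, because its core is a pure decision over the list of child nodes. Each candidate creates an ephemeral sequential node, the lowest sequence number wins, and every other candidate watches only its immediate predecessor. The node prefix `candidate-` and the function names below are assumptions for this example.

```typescript
// Sequential nodes end in a zero-padded counter, e.g. "candidate-0000000007".
function sequenceOf(nodeName: string): number {
  const match = /(\d+)$/.exec(nodeName);
  if (!match) throw new Error(`not a sequential node: ${nodeName}`);
  return parseInt(match[1], 10);
}

interface ElectionView {
  isLeader: boolean;
  watch: string | null; // predecessor node to watch, or null for the leader
}

// Decides this candidate's role from the current children of the election node.
function evaluateElection(myNode: string, children: string[]): ElectionView {
  const sorted = [...children].sort((a, b) => sequenceOf(a) - sequenceOf(b));
  const index = sorted.indexOf(myNode);
  if (index === -1) throw new Error(`${myNode} is not among the election children`);
  return index === 0
    ? { isLeader: true, watch: null }
    : { isLeader: false, watch: sorted[index - 1] };
}
```

Watching only the predecessor, rather than the leader, avoids waking every candidate at once when the leader's node disappears.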

### Benefits

- **Fault tolerance** – If the leader fails, a new one is elected automatically.
- **Simplifies coordination** – Only one instance performs critical tasks.
- **Scales horizontally** – Replicas can handle other, stateless requests while the leader does the exclusive work.

### Drawbacks

- **Complexity** – Adds dependency on a coordination service (ZooKeeper, etcd, Redis) that must itself be highly available.
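- **Potential split‑brain** – If the coordination service is partitioned, or a leader stalls past its lease expiry (for example, during a long garbage‑collection pause), two nodes may briefly both believe they are the leader. Use a consensus system with strong consistency, and design leader tasks so that a stale leader cannot cause harm.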
- **Performance** – Leader election adds latency; not suitable for high‑frequency decisions.

### When to Use

- When you have tasks that must be executed by only one instance at a time.
- In systems that need automatic failover for a primary instance.
- For background job schedulers, cache warmers, or any singleton service.

---

## Chapter Summary

This chapter covered four essential patterns for integrating and scaling distributed systems:

1. **Strangler Fig Pattern** – Enables gradual migration from a legacy system to a new one by incrementally routing functionality and finally decommissioning the old system. It reduces risk and allows continuous delivery.

2. **Sidecar Pattern** – Augments services with cross‑cutting capabilities (logging, monitoring, networking) by deploying a helper process alongside the main service. This promotes separation of concerns and reusability.

3. **API Gateway Pattern** – Provides a single entry point for clients, handling cross‑cutting concerns (authentication, routing, aggregation). It simplifies clients and centralizes control but introduces a potential bottleneck.

4. **Leader Election Pattern** – Coordinates distributed processes to ensure that only one instance performs exclusive tasks, with automatic failover. It enables fault‑tolerant singleton services.

**Key Insight**: As distributed systems grow, patterns for integration and scalability become critical. They help manage complexity, enable evolution, and ensure reliability.

---

## Next Chapter Preview

**Chapter 19: Security Design Patterns (Authentication, Authorization, Secure Factory, Security Proxy)**

With the rise of distributed systems and cloud‑native applications, security must be woven into the design from the start. Chapter 19 explores patterns for building secure systems: **Authentication and Authorization patterns**, **Secure Factory and Builder** for safe object creation, **Security Proxy and Interceptor** for access control, and **Input Validation and Sanitization** to prevent injection attacks. You’ll learn how to apply these patterns to protect your applications in an increasingly hostile environment.

