✅ A research-grade, concurrent HTTP load balancer written in Go, built for the SIT315: Concurrent and Distributed Systems assessment. It demonstrates high-performance concurrency, resilience patterns, deep observability, and dynamic runtime control — far beyond a basic round‑robin proxy.
This project evolved through 21 iterations, each introducing a new concurrent or distributed systems concept. The final version supports classic round-robin and advanced, latency-aware selection using EWMA with power-of-two choices.
Tested with Go 1.23; compatible with Go 1.22+.
A load balancer distributes incoming requests across multiple backend servers to improve throughput, reduce tail latency, and increase availability. In a concurrent system, effective admission control and scheduling decisions under load are crucial to avoid overload collapse.
This project implements a fully concurrent, self-adaptive, fault-tolerant HTTP load balancer in Go. It combines back-pressure, adaptive concurrency, health management, and observability to maintain stable performance under stress while remaining dynamically configurable at runtime.
- Round-Robin (RR)
- Weighted Round-Robin (WRR)
- Least Connections (LC)
- Power-of-Two Choices with EWMA latency awareness (P2C-EWMA)
- Optional Sticky Sessions via IP-Hash
- AIMD (Additive Increase, Multiplicative Decrease) adaptive concurrency limiter targeting stable latency
- Per-client token-bucket rate limiting
- Global semaphore to bound admitted in-flight requests
- Active HTTP health checks with jitter and success/failure thresholds
- Circuit breaker (closed/half-open/open) per backend
- Passive outlier detection with quarantine and automatic recovery
- Warm-up (slow start) ramp after backend recovery, with per-backend concurrency caps
- Graceful drain/undrain for rolling maintenance
- Prometheus metrics at
/metrics - JSON metrics snapshot at
/admin/metrics/json - Per-backend EWMA latency gauge, histogram of observed latency, and in-flight counters
- Structured JSON access logs via
log/slogincludingreq_id, status, latency, backend, and policy - Periodic metrics dump (lb-metrics-*.log) for offline analysis and graphing adaptive behavior
- Add/Remove/List backends at runtime (no restart)
- Live toggle of strategies: RR, LC, WRR, P2C-EWMA, Sticky sessions
- Canary routing with percent rollout and per-target backend
- Per-backend concurrency cap and warm-reset helpers
- Handy endpoints:
/admin/selftest,/admin/backends,/admin/outliers,/admin/canary,/debug/config
- Predictive scaling advisory: warns when EWMA latency rises >15%
- Rolling metrics dumps enable trend analysis and capacity planning
High-level request flow:
Client ─▶ LB HTTP Server ─▶ Admission (global semaphore + rate limit) ─▶ Picker (RR/LC/WRR/P2C, Canary, Sticky) ─▶ Backend
│ │ │
├─ AIMD Controller (goroutine) ── updates soft cap ├─ Health/Outlier loops ─┘
└─ Structured logging + Metrics ─────────────────────────────────────────────
Components:
main.go: Orchestration, HTTP server, admin/API endpoints, metrics registration, AIMD controller, health loop, outlier monitor, structured logging, predictive advisory, periodic metrics dumping, readiness/health.serverpool.go: Thread-safe backend registry and load balancing algorithms (RR, LC, WRR, P2C-EWMA, sticky, canary). Snapshot-based iteration avoids holding locks during selection.backend.go: Backend health state, EWMA latency tracking, circuit breaker, warm-up window, per-backend concurrency cap.config.json: Static bootstrap configuration of backend URLs and optional weights.
Concurrency at a glance:
- Each request handled in its own goroutine.
- Admission uses a bounded channel (semaphore) and per-client token bucket.
- Atomics for EWMA latency, in-flight counts, breaker and health counters.
- Controllers run as independent goroutines: AIMD limiter, active HTTP health checks with jitter, outlier/quarantine monitor, periodic metrics log writer, predictive scaling advisory.
config.json (default provided):
{
"backends": [
{"url": "http://localhost:8081", "weight": 1},
{"url": "http://localhost:8082", "weight": 1},
{"url": "http://localhost:8083", "weight": 1}
]
}Notes:
- Weight affects WRR when that policy is enabled.
- Health checks default to
GET <backend>/healthzunless configured otherwise.
Environment override for port:
export LB_PORT=9090
go run .Hot reload:
- Sending SIGHUP to the LB process triggers a hot reload of configuration (Unix-like systems):
kill -HUP <pid>On Windows, prefer admin endpoints for runtime changes.
Start three simple HTTP backends (Python’s stdlib works great for a demo):
# Start three test backends
python3 -m http.server 8081 &
python3 -m http.server 8082 &
python3 -m http.server 8083 &Run the load balancer:
go run .Access through the LB:
curl localhost:3030Dynamic operations:
# Add a backend at runtime
curl -X POST "localhost:3030/admin/backend/add?url=http://localhost:8084"
# Gracefully drain a backend (stop receiving new requests)
curl -X POST "localhost:3030/admin/drain?url=http://localhost:8081"Metrics and insights:
# Prometheus endpoint
curl localhost:3030/metrics
# JSON metrics snapshot (pipe to jq for readability)
curl localhost:3030/admin/metrics/json | jqMore helpful endpoints:
/readyz— readiness across currently healthy backends/admin/backends— list backends and state/admin/canary/*— set/clear/status for canary rollout/admin/outliers— view quarantined backends/debug/config— view effective configuration/admin/selftest— quick probe of core subsystems
- Each incoming HTTP request runs in its own goroutine.
- Admission control uses a bounded semaphore channel to hard-cap global concurrency and a per-client token bucket to ensure fairness.
- EWMA and counters are maintained with lock-free atomics to minimize contention.
- Background goroutines:
- AIMD controller periodically adjusts the soft concurrency limit to meet a latency target.
- Active HTTP health checker with jitter and success/failure thresholds.
- Outlier detector that quarantines unhealthy backends, with automatic recovery.
- Periodic metrics dumper and predictive scaling advisory loop.
Illustrative snippet (admission skeleton):
// bounded global concurrency
sema := make(chan struct{}, maxInFlight)
func withAdmission(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
select {
case sema <- struct{}{}:
defer func() { <-sema }()
next.ServeHTTP(w, r)
default:
http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
return
}
})
}Methodology:
- Load tools:
hey,wrk, orabto generate concurrent traffic. - Scenarios: baseline (RR), P2C-EWMA enabled, outlier injection (5xx spikes), backend failure/recovery, canary rollout.
Metrics observed:
- End-to-end request latency histograms and per-backend EWMA.
- Error rates and breaker transitions; quarantine ejections and recovery.
- In-flight gauges and AIMD soft limit over time (stability and responsiveness).
Findings (typical):
- P2C-EWMA reduces tail latency under heterogeneous backend performance by biasing toward lower-latency nodes.
- AIMD stabilizes throughput under heavy load, preventing latency runaway by throttling admitted concurrency.
- Outlier detection quarantines flaky backends quickly, lowering error propagation while allowing automatic rejoin.
- Currently HTTP-only; gRPC proxying is detected but not fully supported.
- No persistent state across restarts; admin changes live in memory.
- Predictive scaling is advisory only (does not auto-scale).
- Could integrate real service discovery (Kubernetes Endpoints API, Consul, or Eureka).
- Add formal integration tests and trace correlation (OpenTelemetry) for richer observability.
Evolution highlights:
| Version | Key Additions |
|---|---|
| v1–v3 | Passive health checks, metrics, concurrency base |
| v4–v7 | New pickers: Least Connections, Weighted RR, EWMA (P2C) |
| v8–v11 | Rate limiting, idempotent retries, request IDs |
| v12–v15 | Admin drain/flip, structured JSON logging |
| v16–v18 | Outlier quarantine, per-backend caps, warm-up recovery |
| v19–v20 | Dynamic add/remove backends, active HTTP health with jitter |
| v21 | Predictive scaling advisory, periodic metrics dump, research-grade observability |
Acknowledgements:
- Thanks to the SIT315 teaching team for the unit’s focus on practical concurrency and distributed systems.
- Kasun Vithanage, “Let’s Build a Simple Load Balancer in Go” (2019)
- Google SRE Book, chapters on Load Balancing and Fault Tolerance
- Rob Pike, “Go Concurrency Patterns” (2012)
- Prometheus Documentation (instrumentation aad exposition formats)
- Deakin University SIT315 Unit Materials
This submission demonstrates a sophisticated, production-adjacent load balancer that unifies concurrency control, adaptive scheduling, health-based fault tolerance, and comprehensive observability. Through 21 iterative versions it showcases principled application of AIMD control, latency-aware selection (P2C‑EWMA), circuit breaking, and dynamic configuration — all implemented with goroutines, channels, and atomics. The result is a robust, self-adaptive system that maintains performance under contention and failure, exemplifying advanced competency in concurrent and distributed systems.