-
-
Notifications
You must be signed in to change notification settings - Fork 0
Horizontal Scaling
openWCS scales horizontally: every service's request path is stateless (all durable state lives in Postgres or Kafka), so you can run multiple replicas behind a load balancer. The two places that previously assumed a single instance — scheduled jobs and the conveyor-loop capacity check — were made replication-safe in the 9920fba release.
Full detail: docs/SCALING.md.
Ready-to-apply Kubernetes manifests: deploy/k8s/.
| Area | Mechanism | Scales? |
|---|---|---|
| REST request path (all services) | stateless; state in Postgres | ✅ replicas / HPA |
| Outbox relays — order-management, txlog |
ShedLock @SchedulerLock — one replica drains per tick |
✅ |
| Off-peak jobs — slotting velocity/replenishment/reslot, counting sweep, host webhook |
ShedLock @SchedulerLock
|
✅ |
| Conveyor loop capacity — flow-orchestrator | pessimistic row lock on the loop row makes count-and-enter atomic | ✅ |
| Stock reservation — inventory / allocation | already serialized by a pessimistic lock on AVAILABLE stock rows |
✅ |
| txlog→stock projection (inventory), velocity learner (slotting) | Kafka consumer group + idempotent apply on event_id
|
✅ up to topic partitions |
| conveyor-sniffer | TCP telegram stream per controller cannot be split across replicas |
order-management, txlog, slotting, counting, and integration-host each carry
ShedLock (JDBC). A @SchedulerLock-annotated
@Scheduled method acquires a cluster-wide lock — a row in that service's own <schema>.shedlock
table (added by a Flyway migration; see each service's ShedLockConfig) — so it runs on only one
replica per fire. Outbox relays use a short lockAtMostFor (1 min) for fast failover if a holder
dies; daily sweep jobs use the service default.
-
Kafka partition count caps consumer scaling. The inventory stock projection
(
group.id = inventory-stock-projection) and the slotting velocity learner (slotting-velocity-learner) scale only up to the partition count oftxlog.stream. Provision that topic with at least as many partitions as the desired max replica count; replicas beyond the partition count idle. -
Do not load-balance the conveyor-sniffer. It accepts a long-lived TCP telegram stream per
controller; splitting that stream across replicas fragments telegrams. It is pinned to one replica
(
replicas: 1,Recreatestrategy indeploy/k8s/adapters.yaml). To scale sniffing, run one instance per controller/site (distinctSNIFFER_LISTEN/ALLOWED_IPS) or use a sticky (session- affinity) L4 load balancer — not round-robin.
deploy/k8s/ contains plain-YAML starter manifests for running the whole estate on Kubernetes:
deploy/k8s/
├── namespace.yaml # the openwcs namespace
├── config.yaml # shared ConfigMap (service DNS, Kafka, security) + DB Secret placeholder
├── services.yaml # Deployment + Service for every Java service
├── adapters.yaml # Deployment + Service for the Go adapters (sniffer pinned to replicas: 1)
└── hpa.yaml # HorizontalPodAutoscalers for the high-traffic services (2→10 on CPU)
These are a starting point — set your image registry, secrets, ingress, and resource sizing for
your cluster before relying on them. See Deployment Guide for cloud-specific recipes (GKE, AKS,
ECS Fargate) and the deploy/k8s/README.md for conventions baked into the manifests.
# After editing config.yaml (image registry, DB/Kafka endpoints, secrets):
kubectl apply -f deploy/k8s/Inter-service URLs use Kubernetes DNS (http://master-data:8081, …) — identical to the Docker
Compose service names, so no application config changes are needed.
-
Scan hot path: the routing fast path (
RoutingGraphCache+ reverse-Dijkstra next-hop tables memoised per target) replaced the per-scan full edge fetch + Dijkstra. Warm scans answer in low single-digit milliseconds with two synchronous DB ops: one indexed route-plan read (deliberately not cached — dynamic state shared across stateless replicas) and one route-position UPDATE (loop occupancy + plan progression are read from it). Counters andSCANNEDtrace writes are async (ScanSideEffects). At very high rates the remaining bottleneck is the route-position UPDATE; adding replicas helps (each carries its own graph snapshot, evicted on topology writes or the 60 s TTL). Monitor the ~10 ms per-scan budget viaGET /api/flow/reports/decision-latency. -
Outbox relays are single-writer by ShedLock. If relay throughput becomes the limit, switch to
a partitioned
FOR UPDATE SKIP LOCKEDdrain keyed per stream (preserves per-stream order while allowing parallel workers).
- Architecture — principles and data model
- Deployment Guide — full deployment recipes (single-VM, AWS, GCP, Azure, Kubernetes)
-
docs/SCALING.md— authoritative in-repo scaling reference
openWCS — open-source Warehouse Control System · summarized from build.md & docs/AS-BUILT.md (the repo docs are authoritative).
Design
Flows
- Inbound and Inventory
- Slotting and Replenishment
- Goods-to-Person Stations
- Outbound Flow
- Equipment Integration
- Transport Overview
- Hardware Visualisation
- Host Integration
Reporting & Dashboards
Operations