feat(sqs): HT-FIFO partition metrics counter (Phase 3.D PR 7a) #737
Conversation
Adds elastickv_sqs_partition_messages_total{queue, partition, action}
so dashboards and alerts can spot uneven MessageGroupId distributions
across partitioned-FIFO queues.
- monitoring/sqs.go: SQSMetrics with cardinality cap (sqsMaxTrackedQueues
= 512, overflow collapses to "_other") mirroring DynamoDBMetrics. Drops
empty queue names and unknown action labels so a future call-site bug
cannot pollute the series space dashboards have to learn about. Nil-
receiver-safe so adapter call sites do not need to nil-guard.
- adapter/sqs.go: SQSPartitionObserver interface + WithSQSPartitionObserver
  option. Re-declared in adapter so it does not import monitoring at the
  package boundary (matches DynamoDB / Redis observer pattern). Action
  constants (send/receive/delete) re-declared on the adapter side and
  validated at runtime by the monitoring side. (A sketch of this seam
  follows after this list.)
- adapter/sqs_fifo.go, adapter/sqs_messages.go: emit the counter on the
partitioned commit branch only (PartitionCount > 1) for send / receive
/ delete. Legacy single-partition queues stay off the metric — the
cardinality cost would buy nothing since partition is always 0.
- monitoring/registry.go, main_sqs.go, main.go: wire the registry
SQSPartitionObserver() into startSQSServer so the SQS server picks
up the production observer on cluster boot. Test fixtures and CLI
tools that build SQSServer without a registry pass nil and the
metric stays at zero.
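As referenced above, a minimal sketch of the adapter-side seam. The interface re-declaration, the `WithSQSPartitionObserver` option, and the `s.partitionObserver == nil` guard are described in this PR (and confirmed in the round-3 review); the `SQSServerOption` type, the `partition uint32` parameter type, and the field layout are assumptions for illustration:

```go
package adapter

// SQSPartitionObserver is re-declared locally so the adapter package does
// not import monitoring at the package boundary.
type SQSPartitionObserver interface {
	ObservePartitionMessage(queue string, partition uint32, action string)
}

// Illustrative server and functional-option types; the real SQSServer has
// many more fields.
type SQSServer struct {
	partitionObserver SQSPartitionObserver
}

type SQSServerOption func(*SQSServer)

// WithSQSPartitionObserver installs the observer; nil is a valid argument
// (test fixtures and CLI tools pass nil and the metric stays at zero).
func WithSQSPartitionObserver(o SQSPartitionObserver) SQSServerOption {
	return func(s *SQSServer) { s.partitionObserver = o }
}

// observePartitionMessage wraps every emit site: a nil observer
// short-circuits, so emit sites never carry their own nil-guard.
func (s *SQSServer) observePartitionMessage(queue string, partition uint32, action string) {
	if s.partitionObserver == nil {
		return
	}
	s.partitionObserver.ObservePartitionMessage(queue, partition, action)
}
```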
Tests (monitoring/sqs_test.go):
- TestSQSMetrics_ObservePartitionMessage_IncrementsByLabelTriple — pin
the (queue, partition, action) label triple counter contract.
- TestSQSMetrics_ObservePartitionMessage_DropsInvalidAction — pin the
typo guard against future drift between adapter and monitoring
constants.
- TestSQSMetrics_ObservePartitionMessage_DropsEmptyQueue — pin that an
empty queue name does not collapse with valid observations onto a
shared series.
- TestSQSMetrics_NilReceiverIsSafe — pin the nil-receiver short-circuit
the adapter relies on.
- TestSQSMetrics_QueueLabelOverflow — pin the cap-and-collapse so a
  misbehaving caller cannot exhaust the Prometheus series budget
  (sketched below, after this list).
- TestSQSMetrics_RegistryWiring — pin that the public Registry exposes
the metric under the documented name.
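For illustration, the overflow case could be pinned roughly like this. This is a hedged sketch, not the PR's actual test body: the `newSQSMetrics` constructor, the `ObservePartitionMessage` signature, direct access to `partitionMessages`, and the `"send"` action literal are all assumptions:

```go
package monitoring

import (
	"fmt"
	"testing"

	"github.com/prometheus/client_golang/prometheus/testutil"
)

func TestSQSMetrics_QueueLabelOverflow(t *testing.T) {
	m := newSQSMetrics() // assumed unexported constructor
	// Fill the budget: the first sqsMaxTrackedQueues (512) names are tracked as-is.
	for i := 0; i < sqsMaxTrackedQueues; i++ {
		m.ObservePartitionMessage(fmt.Sprintf("q-%d", i), 0, "send")
	}
	// One queue past the cap must collapse into the "_other" bucket.
	m.ObservePartitionMessage("one-too-many", 0, "send")
	got := testutil.ToFloat64(m.partitionMessages.WithLabelValues("_other", "0", "send"))
	if got != 1 {
		t.Fatalf("overflow bucket = %v, want 1", got)
	}
}
```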
Self-review (5 lenses):
1. Data loss — N/A; metrics-only, no storage / Raft / FSM touch.
2. Concurrency — counter increments are atomic via Prometheus; the
   trackedQueues map is only consulted from the dispatch-success path
   under the SQS server's existing concurrency model. No new locks.
3. Performance — one map lookup + one CounterVec lookup per partitioned
send/receive/delete on the success branch. Legacy queues skip the
call entirely. Cardinality bounded at 512 queue x 256 partition x 3
action ~= 393k series worst case; in practice a 32-partition queue
yields 96 series, so the budget is plenty for the SLO panels.
4. Data consistency — the metric is observed AFTER OCC dispatch
   succeeds, so the counter reflects committed state. Receive/delete
   branches that return on retryable errors deliberately do not
   increment. (A sketch of this gating follows after this list.)
5. Test coverage — 6 unit tests in monitoring/, plus the adapter-side
nil-observer path is exercised by all existing partitioned-FIFO
tests in adapter/sqs_partitioned_dispatch_test.go (they pass nil
observer through the test fixture).
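Lens 4 in code terms, continuing the illustrative adapter package from the sketch above: `queueMeta`, `dispatchOCC`, and `sendPartitioned` are invented stand-ins, and only the post-commit, `PartitionCount > 1` gating comes from the PR text:

```go
// Illustrative queue metadata; the real adapter type has more fields.
type queueMeta struct {
	Name           string
	PartitionCount uint32
}

// dispatchOCC stands in for the real OCC dispatch: it returns the partition
// the message landed on once the transaction commits, or an error otherwise.
func (s *SQSServer) dispatchOCC(q *queueMeta, body string) (uint32, error) {
	_, _ = q, body
	return 0, nil // pretend the commit landed on partition 0
}

// sendPartitioned observes strictly after a successful commit, and only on
// the partitioned branch, so the counter reflects committed state.
func (s *SQSServer) sendPartitioned(q *queueMeta, body string) error {
	partition, err := s.dispatchOCC(q, body)
	if err != nil {
		return err // failed or retryable dispatch: deliberately no increment
	}
	if q.PartitionCount > 1 { // legacy single-partition queues stay off the metric
		s.observePartitionMessage(q.Name, partition, "send")
	}
	return nil
}
```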
Jepsen HT-FIFO workload (PR 7b) lands as a separate PR so the metrics
and the workload review loops do not cross.
Refs: docs/design/2026_05_01_partial_split_queue_fifo.md §11 PR 7
@claude review
Claude finished @bootjp's task in 4m 14s

Code Review — feat(sqs): HT-FIFO partition metrics counter (Phase 3.D PR 7a)

Overall this is a clean, well-structured metrics-only PR that correctly mirrors the existing DynamoDB observer pattern. No blocking issues. A few observations below, from most to least significant.

Findings

1. Cardinality calculation in PR description is off (doc-only issue). The PR description states: "cardinality bounded at 512 queue × 256 partition × 3 action ≈ 393k series worst case". The code caps partitions at 32 (htfifoMaxPartitions), so the actual worst case is 512 × 32 × 3 ≈ 49k series; the 256 figure appears only in the description, not in the code.
2. Prometheus Help text contains an internal design-doc reference:

   Help: "Total HT-FIFO partition operations by queue, partition, and action (send / receive / delete). Per the §11 PR 7 split-queue-FIFO design contract — gives dashboards a hot-partition signal so operators can spot uneven MessageGroupId distributions.",

3. […]
4. Nil-interface safety for the […] If a user creates an SQS queue literally named […]
5. The test correctly pins the […]
6. No concurrent stress test for […] The mutex around […]
7. Minor: observe call order in […]

Summary

Code is correct, tests are thorough and cover the key invariants (nil safety, cardinality cap, label isolation, registry wiring). The five-lens self-review in the PR description is accurate. Two suggestions: clean up the Help string of internal references, and correct the 256→32 cardinality figure in the description (code is fine). LGTM with those minor nits.
Warning: Gemini encountered an error creating the review. You can try again by commenting `/gemini review`.

/gemini review
Code Review
This pull request introduces observability for SQS HT-FIFO operations by tracking message counts per queue, partition, and action (send, receive, delete). It adds a new SQSPartitionObserver interface, a Prometheus-backed implementation in the monitoring package with cardinality capping, and integrates these metrics into the SQS adapter's send, receive, and delete paths. Feedback focuses on performance optimizations for the metrics hot path, specifically suggesting the use of WithLabelValues to avoid heap allocations from map creation and recommending more efficient synchronization, such as sync.RWMutex, to reduce contention during queue label resolution.
```go
m.partitionMessages.With(prometheus.Labels{
	"queue":     queueLabel,
	"partition": strconv.FormatUint(uint64(partition), 10),
	"action":    action,
}).Inc()
```
Using prometheus.Labels (a map[string]string) on every observation causes a heap allocation for the map. On the hot path of SQS operations, this adds significant GC pressure. Switching to WithLabelValues avoids the map allocation. Additionally, strconv.FormatUint allocates a new string; for common partition numbers (e.g., 0-31), consider using a pre-allocated lookup table to further reduce allocations.
Suggested change:

```diff
-m.partitionMessages.With(prometheus.Labels{
-	"queue":     queueLabel,
-	"partition": strconv.FormatUint(uint64(partition), 10),
-	"action":    action,
-}).Inc()
+m.partitionMessages.WithLabelValues(
+	queueLabel,
+	strconv.FormatUint(uint64(partition), 10),
+	action,
+).Inc()
```
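The allocation difference is easy to demonstrate in isolation. A self-contained micro-benchmark sketch (not from the PR; the metric name and label values here are made up for illustration):

```go
package bench

import (
	"testing"

	"github.com/prometheus/client_golang/prometheus"
)

var vec = prometheus.NewCounterVec(
	prometheus.CounterOpts{Name: "bench_total", Help: "benchmark counter"},
	[]string{"queue", "partition", "action"},
)

// With builds a map[string]string on every call.
func BenchmarkWith(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		vec.With(prometheus.Labels{"queue": "q", "partition": "0", "action": "send"}).Inc()
	}
}

// WithLabelValues passes labels positionally; no map is built.
func BenchmarkWithLabelValues(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		vec.WithLabelValues("q", "0", "send").Inc()
	}
}
```

Running `go test -bench . -benchmem` would typically show extra per-op allocations on the `With` path for the label map.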
```go
func (m *SQSMetrics) queueLabelForCardinalityBudget(queue string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.trackedQueues[queue]; ok {
		return queue
	}
	if len(m.trackedQueues) >= sqsMaxTrackedQueues {
		return sqsQueueOverflow
	}
	m.trackedQueues[queue] = struct{}{}
	return queue
}
```
The queueLabelForCardinalityBudget method is called on every SQS operation for partitioned queues. Using a plain sync.Mutex and defer creates a global bottleneck by serializing all metrics observations. Since trackedQueues is read-mostly once initialized, using a sync.RWMutex with a fast-path RLock or a sync.Map would significantly reduce contention. This follows the general rule of preferring efficient synchronization on hot paths.
References
- For frequently accessed fields that require monotonic updates (like a configuration index), prefer atomic operations (e.g., CAS) over mutexes to improve performance on hot paths.
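For concreteness, Gemini's suggestion would look roughly like the following hypothetical variant, which the PR deliberately did not adopt (see the round-2 rationale below). It assumes `m.mu` is changed to `sync.RWMutex` and uses a read-lock fast path with a re-check under the write lock:

```go
// Hypothetical RWMutex variant of queueLabelForCardinalityBudget (not merged).
func (m *SQSMetrics) queueLabelForCardinalityBudget(queue string) string {
	// Fast path: already-tracked queues only take the read lock.
	m.mu.RLock()
	_, tracked := m.trackedQueues[queue]
	m.mu.RUnlock()
	if tracked {
		return queue
	}

	// Slow path: take the write lock and re-check, since another goroutine
	// may have inserted the queue between the two lock acquisitions.
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.trackedQueues[queue]; ok {
		return queue
	}
	if len(m.trackedQueues) >= sqsMaxTrackedQueues {
		return sqsQueueOverflow
	}
	m.trackedQueues[queue] = struct{}{}
	return queue
}
```

As the round-2 discussion below concludes, the PR kept the plain `sync.Mutex`: the critical section is tiny and the DynamoDB twin uses the same pattern.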
Claude flagged that the Prometheus Help string for
elastickv_sqs_partition_messages_total contained an internal
design-doc pointer ("Per the §11 PR 7 split-queue-FIFO design
contract") which leaks into production /metrics output and the
Grafana help tooltip — neither audience has access to the doc.
Replaced with a description of the observable semantics: which
queues populate the metric (PartitionCount > 1 only) and what
the metric is for (spotting uneven MessageGroupId distribution
across partitions). The cap-and-collapse / dropped-action /
empty-queue behaviours stay implementation details exposed only
through the unit tests.
Refs: PR #737 review thread; Claude review at run 25328847079.
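For reference, the resulting declaration would look roughly like this. The full metric name and the Help string are confirmed by the thread; the Namespace/Subsystem field split and the variable name are assumptions:

```go
package monitoring

import "github.com/prometheus/client_golang/prometheus"

var partitionMessages = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "elastickv",
		Subsystem: "sqs",
		// Full name: elastickv_sqs_partition_messages_total.
		Name: "partition_messages_total",
		Help: "Total HT-FIFO partition operations by queue, partition, and action " +
			"(send / receive / delete). Non-zero only for queues with PartitionCount > 1; " +
			"use to spot uneven MessageGroupId distributions across partitions.",
	},
	[]string{"queue", "partition", "action"},
)
```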
@claude review
Claude finished @bootjp's task in 2m 17s

Code Review (Round 2) — feat(sqs): HT-FIFO partition metrics counter (Phase 3.D PR 7a)

Round 2 verdict: LGTM with one actionable nit

The two-commit state is clean. Round 1 findings are resolved or correctly accepted. Addressing Gemini's two suggestions below.

Help text fix (commit 9491561) ✅

The updated Help string is correct and operator-facing:

   Help: "Total HT-FIFO partition operations by queue, partition, and action (send / receive / delete). Non-zero only for queues with PartitionCount > 1; use to spot uneven MessageGroupId distributions across partitions.",

The internal doc reference is gone. The replacement text describes the observable semantics precisely and is appropriate for Grafana/Alertmanager output.

Gemini finding 1: `WithLabelValues` vs `prometheus.Labels` — agreed; this is the one actionable item (see the table and the note below it).
| Finding | Status |
|---|---|
| Help text internal doc reference (Round 1 #2) | ✅ Fixed in 9491561 |
| Cardinality figure in PR description (Round 1 #1) | Doc-only; no code change needed |
| `WithLabelValues` vs `prometheus.Labels` (Gemini) | Actionable: swap recommended |
| `sync.RWMutex` (Gemini) | ✅ Skip — mirrors DynamoDB, complexity > benefit at this scale |
| Batch path coverage (Round 1 #3) | ✅ Confirmed correct |
| `_other` name collision (Round 1 #4) | ✅ Accepted trade-off (mirrored from DynamoDB) |
| Nil guard safety (Round 1 #5) | ✅ Correct |
One actionable item remaining: swap With(prometheus.Labels{...}) → WithLabelValues(...) in monitoring/sqs.go:89. Everything else is correct.
Claude review round 2 surfaced one actionable nit (Gemini): the
ObservePartitionMessage path used With(prometheus.Labels{...})
which allocates a map[string]string on every observe call.
Switched to WithLabelValues with positional args (queue,
partition, action) matching the NewCounterVec declaration —
avoids the per-call map allocation and lines up with the
DynamoDBMetrics pattern (monitoring/dynamodb.go) which already
uses WithLabelValues across all observe sites.
Behaviour unchanged. Single caller (the adapter observe wrapper),
return type unchanged, label semantics identical, so no caller
audit needed.
Round 2 also flagged sync.RWMutex for queueLabelForCardinalityBudget
but Claude (and I) agreed to skip — the critical section is ~5ns,
RWMutex has higher overhead than Mutex at that scale, and the
DynamoDB twin uses Mutex. Diverging only in SQS would be
inconsistent without performance evidence.
Refs: PR #737 round 2 review at run 25329442300.
@claude review
Claude finished @bootjp's task in 1m 51s

Code Review (Round 3) — feat(sqs): HT-FIFO partition metrics counter (Phase 3.D PR 7a)

Round 3 verdict: LGTM — all previous findings resolved

Both round 1 and round 2 changes are correctly applied. No new issues found.

Verification of round 2 fix:
| Component | Status |
|---|---|
| `monitoring/sqs.go` — `WithLabelValues`, clean Help text, action validation, cardinality cap | ✅ |
| `monitoring/registry.go` — `sqs *SQSMetrics` field, `newSQSMetrics` in constructor, `SQSPartitionObserver()` method | ✅ |
| `adapter/sqs.go` — interface re-declared (no cross-package import), `observePartitionMessage` nil guard (`s.partitionObserver == nil`), `WithSQSPartitionObserver` option | ✅ |
| `adapter/sqs_fifo.go` — observe fires post-commit, `PartitionCount > 1` gate | ✅ |
| `adapter/sqs_messages.go` — receive: same gate; delete: adds `meta != nil` check (appropriate, different code path) | ✅ |
| `main.go` — `sqsPartitionObserver` field on `runtimeServerRunner`, sourced from `in.metricsRegistry.SQSPartitionObserver()` | ✅ |
| `main_sqs.go` — `partitionObserver` param threaded into `WithSQSPartitionObserver` in `startSQSServer` | ✅ |
| `monitoring/sqs_test.go` — 6 tests covering counter increment, invalid action drop, empty queue drop, nil receiver, cardinality overflow, registry wiring | ✅ |
One residual non-blocking note (same as round 2 accepted trade-off)
If a queue is literally named _other and the cap is also reached, the _other series will aggregate both. This is the same accepted behavior as DynamoDBMetrics and is documented in the self-review. No action needed.
No further changes required. PR is ready to merge.
## Summary

Phase 3.D PR 7b — Jepsen HT-FIFO workload that stresses partitioned-FIFO queues against the three contracts AWS HT-FIFO is supposed to honour even under partition and node-loss faults: **within-group ordering**, **no message loss**, **no duplicates**. Pattern follows [aphyr's Jepsen RabbitMQ analysis](https://aphyr.com/posts/315-jepsen-rabbitmq): track every `:send` and `:recv` in the operation history, then a custom checker verifies the contracts against the recorded events at the end of the run.

## What's in this PR

- **`jepsen/project.clj`** — Adds `com.cognitect.aws/sqs` at the same version as the existing dynamodb dep, so the SDK wire protocol (auth, retry classification, error parsing) is exercised end-to-end against elastickv rather than a hand-rolled HTTP layer.
- **`jepsen/src/elastickv/db.clj`** — Extends `start-node!` to accept `:sqs-port` (port spec like `:dynamo-port`) and `:sqs-region`. Both are optional, so existing dynamodb / s3 / redis test specs are byte-identical at the args level when `sqs-port` is absent.
- **`jepsen/src/elastickv/jepsen_test.clj`** — Registers `elastickv-sqs-htfifo-test` alongside the other workloads.
- **`jepsen/src/elastickv/sqs_htfifo_workload.clj`** (new, ~430 lines) — The workload. Uses cognitect/aws-api SQS, creates an HT-FIFO queue with `PartitionCount=4` + `ContentBasedDeduplication`, runs sends and receives across N `MessageGroupId` values, and the custom `ht-fifo-checker` validates the three contracts.
- **`jepsen/test/elastickv/sqs_htfifo_workload_test.clj`** (new) — Pure-function tests for the checker plus integration smoke tests for the test-spec builder. 11 tests / 27 assertions.

## Checker contracts

For each `MessageGroupId` independently (a hedged Go sketch of these checks follows at the end of this description):

1. **Within-group ordering** — the sequence of received `seq` values, sorted by global completion time across all consumers, is monotonically non-decreasing.
2. **No loss** — every `(group, seq)` successfully `:sent` eventually appears in the `:recv` history. Sends with `:info` status are treated as possibly-committed and not counted as lost.
3. **No duplicates** — every `(group, seq)` appears at most once in the `:recv` history. `ContentBasedDeduplication` on the queue + a unique `(group, seq)` body is what enforces this server-side; a duplicate here is a real bug (e.g. a deletion that did not commit).

## Open-endpoint mode

The elastickv server starts without `--sqsCredentialsFile`, so the SQS adapter accepts any signed request (mirroring how the S3 adapter is wired in jepsen today). The SDK client signs with dummy credentials, so the SigV4 path still exercises end-to-end at the protocol level.

## Self-review (5 lenses)

1. **Data loss** — N/A; this is a test-only PR. The workload's whole purpose is to *detect* data loss in the system under test.
2. **Concurrency** — The shared per-group `seq-counter` is an `atom` updated via `swap!` (CAS-based), so concurrent sends from different worker threads always assign distinct seqs. The checker is pure; no shared mutable state.
3. **Performance** — Test-only code, runs at low rate (5 ops/sec/worker). Not on any hot path.
4. **Data consistency** — The checker compares committed sends against the receive history globally, so all the consistency assertions are at end-of-run with a complete picture. Sends with `:info` (uncertain commit) are correctly excluded from the loss set, matching Jepsen's standard approach.
5. **Test coverage** — 11 unit tests for the checker pin the contract surface (clean / loss / info-not-loss / duplicates / within-group ordering / cross-group interleaving / failed-send-not-counted / empty-receive). Integration smoke tests pin the test-spec builder. The workload itself is exercised end-to-end on a real cluster via `lein run -m elastickv.sqs-htfifo-workload`.

## Test plan

- [x] `lein test elastickv.sqs-htfifo-workload-test` — 11 tests / 27 assertions pass
- [x] `lein test` for non-redis suite (dynamodb / dynamodb-types / s3 / cli / sqs-htfifo) — 21 tests / 41 assertions pass
- [ ] End-to-end live cluster run — operator-driven (out of scope for the merge gate; relies on a 3-node cluster setup)

The `elastickv.redis-workload` namespace fails to load due to the empty `redis/src/` tree, which is pre-existing on main and unrelated to this PR.

## Out of scope (next milestones)

- Wiring the workload into `scripts/run-jepsen-local.sh` — the existing script is dynamodb-only; an sqs counterpart lands as a follow-up.
- Multi-shard cluster topology that lands distinct partitions on distinct Raft groups. This PR's `PartitionCount=4` routes to the default group on a single-shard cluster — partitioning logic (different keys per partition, ordering preserved within group) is fully exercised, but the cross-shard scaling story is gated on separate work.
- Design-doc lifecycle rename (`*_proposed_*.md` → `*_partial_*.md`) — that is §11 PR 8 in the design doc and is tracked separately.

## Refs

- `docs/design/2026_04_26_proposed_sqs_split_queue_fifo.md` §11 PR 7.
- Closes the testing half of §11 PR 7. PR 7a (metrics) shipped at #737.
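As referenced in the checker-contracts section above, a hedged Go sketch of the three checks. The real checker is Clojure and operates on Jepsen's op history; the event shape and names here are illustrative:

```go
package checker

import "fmt"

type event struct {
	group string
	seq   int
	kind  string // "send-ok" (committed), "send-info" (uncertain), "recv"
}

type gkey struct {
	group string
	seq   int
}

// checkHTFIFO validates within-group ordering, no loss, and no duplicates.
// history must be sorted by global completion time across all consumers.
func checkHTFIFO(history []event) []string {
	var errs []string
	lastSeq := map[string]int{} // highest seq received so far, per group
	sentOK := map[gkey]bool{}   // definitely-committed sends
	recvd := map[gkey]int{}     // receive count per (group, seq)

	for _, e := range history {
		switch e.kind {
		case "send-ok":
			sentOK[gkey{e.group, e.seq}] = true
		case "recv":
			k := gkey{e.group, e.seq}
			recvd[k]++
			if recvd[k] > 1 { // contract 3: no duplicates
				errs = append(errs, fmt.Sprintf("duplicate delivery: %+v", k))
			}
			if e.seq < lastSeq[e.group] { // contract 1: within-group ordering
				errs = append(errs, fmt.Sprintf("order violation in %q: seq %d after %d",
					e.group, e.seq, lastSeq[e.group]))
			} else {
				lastSeq[e.group] = e.seq
			}
		}
		// "send-info" ops are possibly-committed: excluded from the loss set.
	}
	for k := range sentOK {
		if recvd[k] == 0 { // contract 2: no loss of committed sends
			errs = append(errs, fmt.Sprintf("lost message: %+v", k))
		}
	}
	return errs
}
```

An empty returned slice corresponds to a clean run; each entry names the violated contract and the offending `(group, seq)`.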
## Summary

Phase 3.D PR 7a — adds the `elastickv_sqs_partition_messages_total{queue, partition, action}` Prometheus counter so dashboards and alerts can spot uneven `MessageGroupId` distributions across partitioned-FIFO queues. Metrics-only: the Jepsen HT-FIFO workload (PR 7b) ships separately so the two review loops do not cross.

- `monitoring/sqs.go` (new): `SQSMetrics` with cardinality cap (`sqsMaxTrackedQueues = 512`, overflow collapses to `_other`) mirroring `DynamoDBMetrics`. Drops empty queue names and unknown action labels so a future call-site bug cannot pollute the series space dashboards have to learn about. Nil-receiver-safe so adapter call sites do not need to nil-guard.
- `adapter/sqs.go`: `SQSPartitionObserver` interface + `WithSQSPartitionObserver` option. Re-declared in `adapter` so it doesn't import `monitoring` at the package boundary (matches the DynamoDB/Redis observer pattern). Action constants (send/receive/delete) re-declared on the adapter side and validated at runtime by the monitoring side — drift between the two surfaces as a dropped observation, not a wedge.
- `adapter/sqs_fifo.go`, `adapter/sqs_messages.go`: emit the counter on the partitioned commit branch only (`PartitionCount > 1`) for send / receive / delete. Legacy single-partition queues stay off the metric since partition is always 0 and the cardinality cost would buy nothing.
- `monitoring/registry.go`, `main_sqs.go`, `main.go`: wire the registry's `SQSPartitionObserver()` into `startSQSServer` so the SQS server picks up the production observer on cluster boot. Test fixtures and CLI tools that build `SQSServer` without a registry pass `nil` and the metric stays at zero.

## Tests

`monitoring/sqs_test.go` (new, 6 cases):

- `TestSQSMetrics_ObservePartitionMessage_IncrementsByLabelTriple` — pin the `(queue, partition, action)` counter contract.
- `TestSQSMetrics_ObservePartitionMessage_DropsInvalidAction` — pin the typo guard against future drift between adapter and monitoring constants.
- `TestSQSMetrics_ObservePartitionMessage_DropsEmptyQueue` — pin that an empty queue name does not collapse with valid observations onto a shared series.
- `TestSQSMetrics_NilReceiverIsSafe` — pin the nil-receiver short-circuit the adapter relies on.
- `TestSQSMetrics_QueueLabelOverflow` — pin the cap-and-collapse so a misbehaving caller cannot exhaust the Prometheus series budget.
- `TestSQSMetrics_RegistryWiring` — pin that the public `Registry` exposes the metric under the documented name.

## Self-review (5 lenses)

1. Data loss — N/A; metrics-only, no storage / Raft / FSM touch.
2. Concurrency — counter increments are atomic via Prometheus; the `trackedQueues` map is only consulted from the dispatch-success path under the SQS server's existing concurrency model. No new locks.
3. Performance — one map lookup + one `CounterVec` lookup per partitioned send/receive/delete on the success branch. Legacy queues skip the call entirely. Cardinality bounded at 512 queue × 32 partition (`htfifoMaxPartitions`) × 3 action ≈ 49k series worst case; in practice a 32-partition queue yields 96 series, so the budget is plenty for the SLO panels.
4. Data consistency — the metric is observed AFTER OCC dispatch succeeds, so the counter reflects committed state. Receive/delete branches that return on retryable errors deliberately do not increment.
5. Test coverage — 6 unit tests in `monitoring/`, plus the adapter-side nil-observer path is exercised by all existing partitioned-FIFO tests in `adapter/sqs_partitioned_dispatch_test.go` (they pass `nil` observer through the test fixture).

## Test plan

- `go test -race -count=1 ./monitoring/...`
- `go test -race -count=1 -run 'TestSQS' ./adapter/...`
- `go test -race -count=1 ./...` (full suite)
- `golangci-lint --config=.golangci.yaml run ./...` (full repo)

## Refs

- `docs/design/2026_05_01_partial_split_queue_fifo.md` §11 PR 7