Skip to content

test(clients/go): comprehensive coverage — error paths, concurrency, edge cases, benchmarks#81

Merged
NikolayS merged 4 commits intomainfrom
test/go-driver-thorough-coverage
Apr 30, 2026
Merged

test(clients/go): comprehensive coverage — error paths, concurrency, edge cases, benchmarks#81
NikolayS merged 4 commits intomainfrom
test/go-driver-thorough-coverage

Conversation

@NikolayS
Copy link
Copy Markdown
Owner

@NikolayS NikolayS commented Apr 30, 2026

Summary

Take the Go driver from happy-path smoke tests to production-grade test coverage. Adds 1,420 LOC of new tests across 6 files; covers every exported symbol's error paths, concurrency properties, and edge cases; adds 4 benchmarks for ongoing perf tracking.

Files added

File LOC Focus
clients/go/helpers_test.go 115 shared connectOrSkip, setupFreshQueue (random suffix, t.Cleanup), retryQueueCount, dlqCount
clients/go/integration_test.go 231 default event type, multi-event batches, max-batch limit, empty queue, payload round-trip, DLQ-at-limit, Pool() sanity
clients/go/errors_test.go 208 bad DSN, unreachable host, missing queue, missing consumer, after-Close panic-safety on every method, context cancellation, unmarshalable payload
clients/go/consumer_test.go 239 clean Start/Stop, poll liveness, unregistered-event nack, context propagated to handler, all-messages-dispatched per batch
clients/go/concurrency_test.go 272 -race-friendly concurrent Send, full send/receive/ack loop, handler-nack under load, two independent consumers on same queue, double-Start no-panic
clients/go/edge_test.go 262 empty payload, 1 MiB payload, unicode (CJK + emoji) in type and payload, special chars in type names, timestamp sanity, retry_count initial state, string/array payload shapes
clients/go/bench_test.go 169 BenchmarkSend, BenchmarkReceive_Empty, BenchmarkSendReceiveAck, BenchmarkConsumer_DispatchThroughput

Coverage gaps closed (vs prior)

Before this PR the only tests were happy-path send → tick → receive → ack, plus two consumer-dispatch tests. Symbols with zero prior coverage of their error paths: Connect (bad/unreachable DSN), Close (idempotency / panic-safety after close), Pool (basic accessor), Send / Receive / Ack / Nack after Close, context cancellation on every method, Consumer.Start shutdown semantics, poll-interval liveness, panic-from-handler behavior, concurrent producers, multiple consumers on same queue. All now covered.

Test infrastructure

  • All integration tests env-gated via PGQUE_TEST_DSN. When unreachable, t.Skip rather than fail.
  • Every test uses a fresh queue + consumer named with a random hex suffix (gotest_q_ / gotest_c_), registered for cleanup via t.Cleanup. Parallel runs and re-runs cannot collide.
  • Helpers documented in helpers_test.go.

Behavior notes surfaced while writing

  • Consumer.Start recovers from handler panics. dispatchWithRecover catches any panic, converts it to an error, and nacks the message. The loop continues.
  • Connect pings eagerly. Connect calls pool.Ping(ctx) before returning; a bad DSN or unreachable host surfaces as an error from Connect itself, not deferred to the first query.
  • pgque.send errors on non-existent queue. insert_event_raw raises an explicit exception when the named queue does not exist. TestSend_MissingQueue exercises this path.

No production-code changes in this PR — all test additions only.

How to run

# Compile + static checks
cd clients/go
go vet ./...
gofmt -l .
go build ./...

# Full integration run (requires reachable DSN)
PGQUE_TEST_DSN=postgresql://postgres:pgque_test@localhost/pgque_test \
  go test -race -count=1 -v ./...

# Coverage
PGQUE_TEST_DSN=... go test -cover ./...

# Benchmarks
PGQUE_TEST_DSN=... go test -run='^$' -bench=. -benchmem ./...

Test plan

  • go build ./... clean
  • go vet ./... clean
  • gofmt -l . clean
  • go test -count=1 -run='Nothing^' ./... (compile-check) clean
  • go test -race -count=1 ./... against a live PG with PgQue installed — needs reviewer with a test DB
  • go test -cover ./... to confirm ≥80% line coverage on clients/go/

Notes

  • dlqCount helper queries pgque.dead_letter directly for reliable DLQ inspection.
  • Benchmarks assume PG is reachable; they b.Skip otherwise.
  • TestExtraColumns_RoundTrip deferred — Extra1..Extra4 are receive-only on the current Send API; will land when a future release wraps pgque.send_batch with extras.

Co-Authored-By: Claude Opus 4.7 (1M context)

Nik Samokhvalov and others added 2 commits April 30, 2026 02:36
Add 6 new test files covering the Go driver beyond the happy path:

- helpers_test.go (115 LOC) — shared connectOrSkip, setupFreshQueue with
  unique random suffixes per test, retryQueueCount, dlqCount helpers
- integration_test.go (231 LOC) — multi-event batches, max-batch limits,
  empty-queue receive, payload round-trip, default event type, DLQ-at-limit
- errors_test.go (208 LOC) — bad DSN, unreachable host, missing queue,
  missing consumer, after-close panic-safety on Send/Receive/Ack/Nack,
  context cancellation, unmarshalable payload (chan)
- consumer_test.go (239 LOC) — clean Start/Stop, poll-interval respected,
  unregistered-event nack, context propagation to handler, single-ack-
  per-batch, documents current handler-panic behavior (kills goroutine)
- concurrency_test.go (272 LOC) — concurrent Send under -race, full
  send/receive/ack loop, handler-nack-under-load, two consumers same
  queue, double-Start safety
- edge_test.go (262 LOC) — empty payload, 1 MiB payload, unicode (CJK +
  emoji), special-char event types, timestamp sanity, retry_count
  initial state, string + array payload shapes
- bench_test.go (169 LOC) — BenchmarkSend, BenchmarkReceive_Empty,
  BenchmarkSendReceiveAck, BenchmarkConsumer_DispatchThroughput

All integration tests env-gated via PGQUE_TEST_DSN; t.Skip when
unreachable. Each test uses a uniquely-named queue + consumer with
t.Cleanup so parallel runs cannot collide.

Verified: go build ./... clean, go vet ./... clean, gofmt -l clean.
Live integration runs require PGQUE_TEST_DSN; not exercised in this
commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove TestConnect_UnreachableHost and
TestConsumer_HandlerPanicKillsLoop; both document pre-#115 behavior
(lazy Connect, panic kills loop) that the bugfix PR corrected and
re-tested with TestConnect_UnreachableHostFailsImmediately and
TestConsumer_HandlerPanicRecoversAndContinues. Keeping them would
cause test failures and misrepresent current semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NikolayS NikolayS force-pushed the test/go-driver-thorough-coverage branch from c188ea6 to c7130ee Compare April 30, 2026 09:37
@NikolayS
Copy link
Copy Markdown
Owner Author

REV — PR #81 (Go driver thorough test coverage)

Verdict: REQUEST_CHANGES (non-blocking on scope, but several test-correctness fixes needed before this PR earns the "production-grade coverage" label it advertises).

Stats: +1,420 / -0 across 7 files; 8/8 CI green; 2 pre-fix tests dropped via rebase.

Test analyzer (the headliner)

Coverage breadth — strong. Every exported method on Client (Connect, Send, Receive, Ack, Nack, Close, Pool, NewConsumer) gets at least one error-path test. After-Close panic-safety is exercised on every method. Context-cancellation gets dedicated tests. Concurrency, edge cases (empty / 1 MiB / unicode / special chars / payload shapes), and 4 benchmarks are all present. This is a real step up from the baseline.

But several tests claim more than they prove. Concrete instances:

  1. TestSend_MissingQueue (errors_test.go) — only asserts err != nil. The string-content assertions are wrapped in t.Logf, so even a totally generic error passes. Tighten to t.Errorf (or use errors.As once typed errors land) so the PR fix: follow-ups to #75 (quote_ident, dlq partial-success, Nack tests) #79 "queue not found" guard actually has a regression test.

  2. TestConsumer_AckOnlyOnceForBatch — name says "ack only once", body just counts handler invocations. Nothing in the test observes Ack call count. Either rename to TestConsumer_AllMessagesDispatched or instrument by checking PgQ tables for batch-finished state.

  3. TestConsumer_PollIntervalRespected — only verifies that Start returns when the deadline fires. Doesn't measure poll cadence. The comment in the source even acknowledges this. Rename or measure poll cadence (e.g. count Receive calls via a wrapping mock client) — currently the test is strictly weaker than its name.

  4. TestRace_HandlerNackUnderLoadfailed == 0 is logged, not asserted. Test always passes regardless of correctness. Either assert failed > 0 (RNG seed is fixed → deterministic) or remove the assertion section.

  5. TestNack_ToDLQAtRetryLimit — calls client.Ack(ctx, msgs[0].BatchID) immediately after Nack-ing every message in the same batch. PgQ's pgque.finish_batch cannot run twice on a batch, so the Ack will likely error and is logically meaningless after Nack. Drop the trailing Ack, or restructure the loop so the test actually drives the retry-limit path. The terminal t.Skipf on dlqCount == 0 makes this a near-no-op today.

  6. TestConcurrent_TwoConsumersSameQueue — Two Consumer instances with the same (queue, consumer-name) pair compete for batches; PgQ does not document at-most-once dispatch across concurrent receivers using the same consumer registration. The assertion count > 1 → fail may be flaky under contention, or, more likely, gets exonerated because a 2-second window with 10 messages rarely produces a collision. Either change to different consumer names per goroutine (the realistic deployment shape) or drop the test — its current invariant is undocumented and the test design hides flakiness.

Bug hunter (test correctness)

  1. Stale comment in helpers_test.go (connectOrSkip). Comment says "Connect is lazy and accepts unreachable hosts without erroring". The current Connect calls pool.Ping(ctx) and surfaces the error. The PR description repeats the same incorrect claim. The redundant select 1 probe is harmless, but the comment will mislead future readers — please fix the comment (and the PR body's "Real bugs surfaced" bullet about pgxpool.New laziness) since it does not match pgque.go post-fix(clients/go): Nack placeholder, eager Connect ping, Consumer panic recovery #115.

  2. PR body claim contradicts source. PR description says "Consumer.Start does not recover from handler panics" and that the test documents that behavior. The actual consumer.go ships dispatchWithRecover, which catches panics, logs, and nacks. Update the PR description so future readers don't get misled (and so the merge commit doesn't carry a misleading note).

  3. benchSuffix collision risk. time.Now().Format("150405") + "_" + b.Name()[len(b.Name())-4:] collides at second resolution and panics on bench names < 4 chars. Use the same randSuffix pattern as the test helpers, or b.N+nanosecond — collisions in parallel -bench runs would produce confusing "duplicate queue" errors.

  4. TestConsumer_StartTwiceFromSameInstance — the test is marked as documenting "undefined behavior" but its only assertion is "no panic." Without a comment in consumer.go declaring single-Start semantics, the test pins behavior we may not want to pin. Consider either (a) adding a doc comment to Consumer.Start ("safe to call concurrently" or "must be called once") and asserting against that, or (b) renaming to TestConsumer_DoubleStart_DoesNotPanic to make the limited scope explicit.

  5. TestSend_AfterClose registers t.Cleanup via setupFreshQueue before calling client.Close(). The cleanup callback (drop queue / unregister consumer) runs on a closed pool. Errors are swallowed, so the test passes — but it leaves real rows in pgque.queue if the pool actually rejects the cleanup. Either re-open a pool in the cleanup (the existing pgque_test.go does this) or call setupFreshQueue after a copy of the client you keep alive for cleanup.

  6. TestSend_ContextCancellederrors.Is(err, context.Canceled) is checked but failure only t.Logfs. Same pattern as Set up PgQ git submodule (vendor/pgq/) #1 above — soften to t.Logf if intentionally lenient, or escalate to t.Errorf. Today the assertion is doing nothing.

Bug hunter (concurrency / -race)

  • TestRace_ConcurrentSend correctly uses t.Errorf from goroutines (not t.Fatal/t.FailNow). Good.
  • TestRace_SendReceiveLoop — producer goroutines after wg.Wait is fine; the consumer goroutine reads ctx.Done() then exits cleanly. Looks race-clean.
  • All concurrent tests run with an explicit timeout context — no hang risk.

Security

N/A for tests. DSN default postgresql://postgres:pgque_test@localhost/pgque_test is the documented test fixture, matches existing pgque_test.go. No TLS bypass, no auth bypass.

Guidelines / style

  • Conventional Commits: clean. Subjects under 50 chars (the long lead title is the PR title, not a commit subject).
  • t.Helper() used in every helper — good.
  • t.Cleanup used for teardown — good (one caveat in feat: assemble install/uninstall scripts #11).
  • errors.Is is used in two places (context.Canceled, context.DeadlineExceeded) — good. The driver still returns wrapped strings for queue/consumer errors (fmt.Errorf("pgque: send: %w", err)); once typed errors land per the project plan, several "string contains" checks here should migrate to errors.As. Not a blocker for this PR.
  • gofmt/go vet: green per CI.

Docs

Anti-leak

Clean. grep -iE "gitlab|sahmed|artifact[_-]?registry|@AR\b|wi[ -]?#?7[67]|round[ -]?[4-9]\b|R[4-9]\b" returns zero hits across the diff and PR body.

Confidence

High on the critique of the 6 weak assertions and the stale comments (1, 2, 3, 4, 5, 7, 8, 9, 12). Medium on (6, 11) — both depend on backend semantics and pool-close ordering that I'd want to confirm by running the suite, but the reasoning stands from reading the code.

Suggested next step

Tighten the 6 assertions (1–5, 12), correct the two doc inaccuracies (7, 8), and consider the cleanup ordering fix (11) and benchSuffix (9). Once those land, this becomes the strong coverage uplift the PR title promises.

claude added 2 commits April 30, 2026 11:42
- TestSend_MissingQueue: t.Logf → t.Errorf for queue/not-found check
- TestSend_ContextCancelled: t.Logf → t.Errorf for context.Canceled wrap
- TestRace_HandlerNackUnderLoad: t.Logf → t.Errorf on failed==0
- TestNack_ToDLQAtRetryLimit: remove Ack-after-Nack (mutually exclusive
  on same batch); raise cycles to 5; t.Skipf → t.Fatalf on dlqCount==0
- TestSend_AfterClose / TestReceive_AfterClose: use a second live client
  for cleanup so t.Cleanup does not run on a closed pool
- TestConsumer_AckOnlyOnceForBatch → TestConsumer_AllMessagesDispatched
  (body counts handler calls, not Ack invocations)
- TestConsumer_PollIntervalRespected → TestConsumer_LivenessUnderEmptyQueue
  (body tests liveness, not measured cadence)
- TestConsumer_StartTwiceFromSameInstance → TestConsumer_DoubleStart_DoesNotPanic
  (scope is limited to no-panic guarantee)
- TestConcurrent_TwoConsumersSameQueue → TestConcurrent_TwoConsumersDistinctNames
  using distinct consumer registrations; assert each sees all messages
- benchSuffix: replace time+name suffix with crypto/rand (collision-safe)
- helpers_test.go: correct stale comment about Connect laziness
- Remove PR-number references from source comments
Copy link
Copy Markdown
Owner Author

REV r1 fixes applied (2 commits, c67f840)

All 12 findings from the review addressed. CI: 8/8 green.

Assertion fixes (commit 1ef47d7)

Finding Test Change
#1 TestSend_MissingQueue t.Logft.Errorf for queue/not-found string check
#12 TestSend_ContextCancelled t.Logft.Errorf for context.Canceled wrap check
#4 TestRace_HandlerNackUnderLoad t.Logft.Errorf on failed == 0
#5 TestNack_ToDLQAtRetryLimit Removed Ack after Nack (mutually exclusive per-batch); raised cycle cap to 5; t.Skipft.Fatalf on dlqCount == 0
#11 TestSend_AfterClose Second live cleaner client handles t.Cleanup so queue teardown does not run on a closed pool
#11 TestReceive_AfterClose Same cleanup-ordering fix

Renames and helper fixes (commit c67f840)

Finding Change
#2 TestConsumer_AckOnlyOnceForBatchTestConsumer_AllMessagesDispatched (body counts handler calls, not Ack invocations)
#3 TestConsumer_PollIntervalRespectedTestConsumer_LivenessUnderEmptyQueue (body tests liveness, not measured cadence)
#6 TestConcurrent_TwoConsumersSameQueueTestConcurrent_TwoConsumersDistinctNames; uses two distinct consumer registrations; asserts each sees all messages exactly total times
#9 benchSuffix: replaced time.Now().Format + b.Name()[last4] with crypto/rand (collision-safe, no panic on short names)
#10 TestConsumer_StartTwiceFromSameInstanceTestConsumer_DoubleStart_DoesNotPanic (scope made explicit)
#7 helpers_test.go: corrected stale comment — Connect now pings eagerly, not lazily

PR body updates (finding #8, #7)

  • Removed "Consumer.Start does not recover from handler panics" claim — dispatchWithRecover exists and catches panics.
  • Removed "pgxpool.New is lazy" claim — Connect calls pool.Ping before returning.
  • Removed all PR-number references (#75, #79) from source comments and PR description.

Generated by Claude Code

Copy link
Copy Markdown
Owner Author

REV r2 — verification of r1 fixes + fresh scan

Verdict: PASS (with two minor non-blocking nits below). All 12 r1 findings are correctly addressed at HEAD c67f840. Recommend merge.

r1 fix verification

# r1 finding r2 status
1 TestSend_MissingQueue weak assertion Fixed — t.Errorf on the queue/not-found substring check
2 TestConsumer_AckOnlyOnceForBatch misnamed Fixed — renamed to TestConsumer_AllMessagesDispatched; body counts handler invocations to match
3 TestConsumer_PollIntervalRespected misnamed Fixed — renamed to TestConsumer_LivenessUnderEmptyQueue; assertion is the elapsed-time bound (≤1500ms)
4 TestRace_HandlerNackUnderLoad non-asserting Fixed — t.Errorf on failed == 0
5 TestNack_ToDLQAtRetryLimit Ack-after-Nack Fixed — Ack removed from the loop; terminal t.Fatalf on dlqCount == 0; cycle cap raised to 5
6 TestConcurrent_TwoConsumersSameQueue undocumented invariant Fixed — renamed …DistinctNames; registers a second consumer; asserts each sees total exactly
7 Stale connectOrSkip comment about lazy Connect Fixed — comment now says "Connect pings the pool eagerly"
8 PR body claim re panic-recovery Fixed — removed; behavior notes now match dispatchWithRecover
9 benchSuffix collision/short-name panic Fixed — crypto/rand
10 TestConsumer_StartTwiceFromSameInstance over-broad name Fixed — renamed …DoubleStart_DoesNotPanic
11 TestSend_AfterClose cleanup on closed pool Fixed — separate cleaner client used for t.Cleanup; same fix applied to TestReceive_AfterClose
12 TestSend_ContextCancelled weak assertion Fixed — t.Errorf for the errors.Is(context.Canceled) check

Fresh r2 findings

Non-blocking (consider for follow-up):

  • N1 (score 4) — TestEvent_EmptyPayload (edge_test.go): the same t.Logf-instead-of-t.Errorf pattern that r1 flagged elsewhere is still present here for the "null" substring check. Either escalate to t.Errorf for consistency or drop the assertion entirely (the comment already concedes it's optional).

  • N2 (score 3) — TestConsumer_DoubleStart_DoesNotPanic (concurrency_test.go): the defer recover() is in the parent goroutine, but the panic risk lies inside the child goroutines that actually call c.Start(ctx). A goroutine panic is not caught by a parent's recover; if Start ever did panic, the test process would crash, not fail cleanly. The test still has value (it ensures Start returns under context cancellation when called twice), but the recover is dead code. Consider moving the recover inside the goroutines or removing it.

  • N3 (score 3) — TestNack_ToDLQAtRetryLimit (integration_test.go): the fallback path when pgque.maint_retry_events() is unavailable only t.Logfs and continues. On any backend missing that function, the ticker alone may not re-deliver retry_queue rows, so the test could exit the loop with len(msgs) == 0 and then t.Fatalf on dlqCount == 0. Today this is fine because v0.2.0 ships maint_retry_events; just flagging that the fallback is non-functional and the message could be more direct (t.Skipf if missing).

Security

N/A — test-only diff; no auth/injection/secret surface introduced. Default DSN matches existing test fixture.

Bugs / -race

Concurrent tests use t.Errorf (not Fatal/FailNow) from goroutines — correct under -race. All consumer goroutines use a context with timeout, no hang risk. No new resource leaks introduced; cleanup ordering for after-Close tests is now correct.

Test analyzer

Renames in r2 align names to behavior (the original sin r1 flagged). All r2 names match what the body proves. No tests pass for the wrong reason after r1 fixes.

Guidelines / anti-leak

  • Conventional commits: clean (test(clients/go): subjects, all <72 chars).
  • No PR-number cross-references in source; no WI #76/77, round N, R[4-9], gitlab, sahmed, AR, hetzner, or postgres-ai person names in the diff or PR body.
  • gofmt / go vet per CI: green.
  • Helpers carry t.Helper(); teardown via t.Cleanup. Good.

Docs

  • Helpers and tests have doc comments matching their behavior.
  • clients/go/README.md unchanged — appropriate (no API surface change).
  • PR body's "Behavior notes surfaced while writing" now accurately reflects dispatchWithRecover and eager pool.Ping.

Bottom line

r2 successfully closes all 12 r1 findings. The three remaining nits are cosmetic and can be addressed in a follow-up or ignored. This PR is ready to merge.


Generated by Claude Code

@NikolayS NikolayS merged commit e714d21 into main Apr 30, 2026
8 checks passed
@NikolayS NikolayS deleted the test/go-driver-thorough-coverage branch April 30, 2026 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants