
test(e2e): harden gateway e2e tests#272

Merged
SantiagoDePolonia merged 2 commits into main from tests/e2e-tests
Apr 25, 2026

Conversation

@SantiagoDePolonia
Contributor

@SantiagoDePolonia SantiagoDePolonia commented Apr 25, 2026

Summary

  • Add shared e2e setup helpers for auth/admin servers and a SQLite usage fixture.
  • Turn the admin usage e2e test into a persisted usage check instead of nil-reader zero-value assertions.
  • Add default chat request fixtures plus stricter upstream payload, error body, and SSE parsing assertions.
  • Harden release e2e scenario checks with stricter curl/jq behavior and explicit negative-path validation.
  • Merged latest origin/main at 13a2d07 before pushing.

Tests

  • git diff --check
  • tests/e2e/run-release-e2e.sh --list | tail -n 5
  • go test -v -tags=e2e -timeout=5m ./tests/e2e/...
  • go test -race -tags=e2e -timeout=5m ./tests/e2e/...
  • pre-commit hook: go import/fmt, go mod tidy, make lint

Summary by CodeRabbit

  • Tests
    • Enhanced end-to-end coverage with stricter error validation for chat, responses, audit-log, auth, and admin flows.
    • Tests now verify upstream request payloads and stricter response semantics, including token/request totals for usage.
    • Improved test infrastructure: persistent usage fixture, resilient streaming parsing, standardized request payload helpers, and fail-fast scenario checks.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 25, 2026

📝 Walkthrough

Introduce a centralized E2E test bootstrap module and helpers, switch admin usage tests to an SQLite-backed usage fixture, consolidate request construction and response assertion helpers, strengthen upstream/downstream payload validations across multiple E2E tests, and make release scenario scripts fail-fast on HTTP/JSON errors.

Changes

Cohort / File(s) Summary
Test Setup & Fixtures
tests/e2e/setup_test.go
Adds E2E server bootstrap utilities, configurable e2eServerOptions, provider registry setup, and an in-memory SQLite-backed usage fixture with flush/cleanup semantics.
Admin Usage Tests
tests/e2e/admin_test.go
Reworks admin usage e2e to use SQLite-backed usage fixture: generates persisted usage, flushes fixture, and asserts /admin/api/v1/usage/summary and daily usage entries.
Auth Tests
tests/e2e/auth_test.go
Removes local setupAuthServer helper and related imports; tests now rely on shared setup helpers (moved to setup_test.go).
Helpers & SSE/Parsing
tests/e2e/helpers_test.go
Adds defaultChatReq, requireErrorResponse, requireRecorded* helpers; increases SSE scanner token size; fails fast on JSON/unmarshal and scanner errors.
Auditlog / Chat / Responses Tests
tests/e2e/auditlog_test.go, tests/e2e/chat_test.go, tests/e2e/responses_test.go
Replaces hardcoded core.ChatRequest literals with defaultChatReq; sets streaming via payload after construction; adds assertUpstream callbacks to record and validate upstream payloads (model, temperature, max_tokens, stream/stream_options); tightens error assertions using requireErrorResponse.
Release E2E Scenarios (docs)
tests/e2e/release-e2e-scenarios.md
Converts curl -sS -> -fsS, switches jq invocations to -e/-er, uses temp files for headers/body in negative scenarios, and asserts HTTP statuses and error.type values.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐇 I hopped through tests, tidy and spry,
Brought fixtures and helpers so errors won't fly,
SQLite crunched numbers, logs kept in tune,
Now E2E sleeps soundly beneath the moon. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.92%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'test(e2e): harden gateway e2e tests' accurately describes the main focus of the changeset - hardening end-to-end tests across multiple test files through stricter assertions, helper consolidation, and improved error handling.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


@greptile-apps

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR hardens the gateway e2e test suite by centralising server setup helpers into setup_test.go, replacing nil-reader zero-value usage assertions with real SQLite-backed persisted data checks, adding upstream-payload and error-body assertions across chat and responses tests, and tightening the release scenario scripts with curl -f / jq -e plus explicit negative-path verification.

  • P1 — double logger.Close() in setup_test.go: flush(t) closes the logger to force-flush buffered usage records, but the t.Cleanup registered in setupSQLiteUsageFixture unconditionally calls logger.Close() a second time on test teardown. If usage.Logger.Close() is not idempotent, the cleanup's require.NoError will fail every time flush is used (currently TestAdminAPI_UsageEndpoints_E2E).

Confidence Score: 4/5

Safe to merge after fixing the double-close of the usage logger in setup_test.go.

One P1 finding: the flush method and the t.Cleanup in setupSQLiteUsageFixture both call logger.Close(), which will cause the cleanup to fail if Close is not idempotent. All other changes are well-structured improvements with no logic issues.

tests/e2e/setup_test.go — the flush / t.Cleanup double-close interaction.

Important Files Changed

Filename | Overview
tests/e2e/setup_test.go | New consolidated server/fixture helpers; the flush method and the t.Cleanup for logger.Close() will double-close the logger whenever flush is called explicitly.
tests/e2e/admin_test.go | Replaced nil-reader zero-value assertions with real SQLite-backed persisted usage checks; test structure is correct but relies on the double-close-prone flush helper.
tests/e2e/helpers_test.go | Added defaultChatReq, requireErrorResponse, and requireRecordedRequest/requireRecordedChatRequest/requireRecordedResponsesRequest helpers; SSE scanner errors are now propagated correctly.
tests/e2e/chat_test.go | Migrated to defaultChatReq, added upstream payload assertions for temperature and max_tokens, and tightened error body validation.
tests/e2e/responses_test.go | New TestResponsesParameters with upstream assertion callbacks; requireRecordedResponsesRequest is used consistently with a prior mockServer.ResetRequests() call in each sub-test.
tests/e2e/auth_test.go | Removed local setupAuthServer duplicating logic now centralised in setup_test.go; no other changes.
tests/e2e/auditlog_test.go | Swapped inline core.ChatRequest literals for defaultChatReq; purely mechanical refactor with no logic change.
tests/e2e/release-e2e-scenarios.md | Added -f to all curl calls so HTTP errors propagate as non-zero exit codes; added jq -e for truthy-exit assertion; refactored negative-path scenarios (S26, S41, S45, S48, S52, S61) to capture headers/body separately and grep for expected status codes.

Sequence Diagram

sequenceDiagram
    participant T as Test
    participant F as setupSQLiteUsageFixture
    participant L as usage.Logger
    participant DB as SQLite DB
    participant S as e2e Admin Server

    T->>F: setupSQLiteUsageFixture(t)
    F->>DB: sql.Open(":memory:")
    F->>L: usage.NewLogger(store, cfg)
    F-->>T: e2eUsageFixture{reader, logger}
    Note over F,L: t.Cleanup registered: logger.Close() + db.Close()

    T->>S: setupE2EAdminServer(t, opts{usageLogger: logger})
    T->>S: POST /v1/chat/completions (x2)
    S->>L: Log usage entry (buffered)
    T->>F: flush(t) → logger.Close() ← first Close
    L->>DB: Flush buffered entries
    T->>S: GET /admin/api/v1/usage/summary
    S->>DB: Query persisted rows
    DB-->>S: {TotalRequests:2, ...}
    S-->>T: 200 OK

    Note over T,L: Test ends — t.Cleanup runs
    T->>L: logger.Close() ← second Close ⚠️ potential double-close
    T->>DB: db.Close()

Reviews (1): Last reviewed commit: "test(e2e): harden gateway e2e tests"

Comment thread tests/e2e/setup_test.go Outdated
Comment on lines +131 to +145
	t.Cleanup(func() {
		require.NoError(t, logger.Close())
	})

	return &e2eUsageFixture{
		reader: reader,
		logger: logger,
	}
}

func (f *e2eUsageFixture) flush(t *testing.T) {
	t.Helper()

	require.NoError(t, f.logger.Close())
}

P1 Double logger.Close() — test cleanup will fail

flush closes the logger, but the t.Cleanup registered on line 131 also calls logger.Close(). Cleanup functions run in LIFO order after the test, so the cleanup will invoke Close() a second time on an already-closed logger. If usage.Logger.Close() is not idempotent (e.g. closes a channel a second time or returns an error), the require.NoError in the cleanup will fail every time flush is called.

The simplest fix is to guard against the double-close with a flag, or remove the automatic cleanup and require callers to manage teardown:

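func (f *e2eUsageFixture) flush(t *testing.T) {
	t.Helper()
	// Close at most once; the t.Cleanup registered in setupSQLiteUsageFixture
	// must check the same flag before calling logger.Close().
	if f.closed {
		return
	}
	require.NoError(t, f.logger.Close())
	f.closed = true
}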

Then the t.Cleanup in setupSQLiteUsageFixture should also check f.closed or be removed.
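A minimal sketch of the same guard using sync.Once instead of a bool flag, so that flush and the registered cleanup share a single close path. fakeLogger, usageFixture and closeOnce are hypothetical stand-ins (the real usage.Logger API is not shown in this thread); fakeLogger deliberately returns an error on a second Close to reproduce the failure mode described above.

```go
package main

import (
	"errors"
	"sync"
)

// fakeLogger is a hypothetical stand-in for usage.Logger whose Close is NOT
// idempotent: any call after the first returns an error.
type fakeLogger struct {
	closes int
}

func (l *fakeLogger) Close() error {
	l.closes++
	if l.closes > 1 {
		return errors.New("logger already closed")
	}
	return nil
}

// usageFixture (hypothetical) routes every close request through sync.Once,
// so flush and the test cleanup can both call closeOnce safely.
type usageFixture struct {
	logger   interface{ Close() error }
	once     sync.Once
	closeErr error
}

// closeOnce closes the underlying logger at most once and returns the error
// from that single call on every invocation.
func (f *usageFixture) closeOnce() error {
	f.once.Do(func() {
		f.closeErr = f.logger.Close()
	})
	return f.closeErr
}
```

In the real fixture, both flush(t) and the t.Cleanup callback would call require.NoError(t, f.closeOnce()); whichever runs first performs the actual close, and the other simply reports the same result.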

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/helpers_test.go (1)

149-219: 🧹 Nitpick | 🔵 Trivial

Tightening SSE parsing: good change, plus a heads-up on scanner buffer size.

Failing fast on JSON unmarshal errors and asserting scanner.Err() is the right call — silent corruption of streamed chunks won't go unnoticed anymore.

One thing to be aware of: bufio.Scanner defaults to a 64KiB max token size. If a future SSE chunk (e.g., a large tool-call argument blob or base64 image) exceeds that, scanner.Err() will surface bufio.ErrTooLong and tests will fail with a misleading error. For the current mock payloads this is fine, but worth using scanner.Buffer(...) proactively if upstream payloads grow.

♻️ Optional buffer expansion
 	scanner := bufio.NewScanner(body)
+	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/helpers_test.go` around lines 149 - 219, The SSE parsers
readStreamingResponse and readResponsesStream currently use bufio.NewScanner
with the default 64KiB token limit which can surface bufio.ErrTooLong for large
SSE data; fix by calling scanner.Buffer(...) after creating the scanner (in both
readStreamingResponse and readResponsesStream) to increase the initial and
maximum token size to a safe larger value (e.g., a few megabytes) so large data
chunks (base64 blobs, tool args) won’t cause ErrTooLong while preserving the
existing JSON unmarshal and scanner.Err() checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/admin_test.go`:
- Around line 189-194: The double-close occurs because usageFixture.flush(t)
calls logger.Close() and the t.Cleanup registered in setupSQLiteUsageFixture
also calls logger.Close(); modify setupSQLiteUsageFixture (the function that
registers t.Cleanup) and/or flush(t) so logger.Close() is only invoked once —
either remove the redundant Close from flush(t) or make the cleanup registration
skip closing when flush has already closed, or make logger.Close() idempotent;
ensure the unique symbols involved are setupSQLiteUsageFixture, flush (on
usageFixture), and logger.Close so reviewers can locate and apply the
single-close fix.
- Around line 211-238: In the "daily includes persisted usage" test, avoid the
UTC-midnight race and magic numbers: capture the current date before issuing any
chat/usage requests (before := time.Now().UTC().Format("2006-01-02")) and again
when asserting (after := time.Now().UTC().Format("2006-01-02")), then, when
locating the usage entry in the daily slice, accept an entry dated either before
or after (summing both if the test straddles midnight) when setting todayEntry;
replace literal token counts in the assertions (2, 20,
40, 60) with named constants like expectedRequests, expectedInputTokens,
expectedOutputTokens, expectedTotalTokens and add a short comment that these
derive from the mock provider (10 input + 20 output per request × 2), then
assert against those constants.

In `@tests/e2e/setup_test.go`:
- Around line 95-108: The empty-string branch in setupE2ERegistry causes
inconsistent provider registration; ensure a single default provider type
("test") is always used: update callers (setupE2EServer / setupAdminServer /
setupE2EAdminServer / setupAuthServer) to pass "test" or set providerType =
"test" at the top of setupE2ERegistry, then remove the RegisterProvider(path)
branch and always call registry.RegisterProviderWithType(testProvider,
providerType) so registration behavior is consistent across tests.
- Around line 45-66: The two helpers duplicate behavior; replace the positional
helper by converging on the options-struct API: remove or deprecate
setupAdminServer and update its callers to build an e2eServerOptions (setting
masterKey, adminEndpointsEnabled=endpointsEnabled, adminUIEnabled=uiEnabled,
providerType defaulting to "test") and call setupE2EAdminServer or directly call
setupE2EServer wrapped in httptest.NewServer; alternatively make
setupAdminServer a thin wrapper that constructs e2eServerOptions and returns
setupE2EAdminServer(t, opts). Ensure you reference e2eServerOptions,
setupE2EServer, setupE2EAdminServer and setupAdminServer when making the change.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6298b07-dacb-42cf-a61e-04a4b2bc5bae

📥 Commits

Reviewing files that changed from the base of the PR and between 13a2d07 and 1f2f2e1.

📒 Files selected for processing (8)
  • tests/e2e/admin_test.go
  • tests/e2e/auditlog_test.go
  • tests/e2e/auth_test.go
  • tests/e2e/chat_test.go
  • tests/e2e/helpers_test.go
  • tests/e2e/release-e2e-scenarios.md
  • tests/e2e/responses_test.go
  • tests/e2e/setup_test.go
💤 Files with no reviewable changes (1)
  • tests/e2e/auth_test.go

Comment thread tests/e2e/admin_test.go Outdated
Comment thread tests/e2e/admin_test.go
Comment thread tests/e2e/setup_test.go
Comment on lines +45 to +66
// setupAdminServer creates a new server instance with admin features configured.
func setupAdminServer(t *testing.T, masterKey string, endpointsEnabled, uiEnabled bool) *httptest.Server {
	t.Helper()

	srv := setupE2EServer(t, e2eServerOptions{
		masterKey:             masterKey,
		adminEndpointsEnabled: endpointsEnabled,
		adminUIEnabled:        uiEnabled,
		providerType:          "test",
	})
	return httptest.NewServer(srv)
}

func setupE2EAdminServer(t *testing.T, opts e2eServerOptions) *httptest.Server {
	t.Helper()

	opts.adminEndpointsEnabled = true
	if opts.providerType == "" {
		opts.providerType = "test"
	}
	return httptest.NewServer(setupE2EServer(t, opts))
}

🧹 Nitpick | 🔵 Trivial

setupAdminServer and setupE2EAdminServer overlap — consider unifying.

Both helpers ultimately call setupE2EServer and wrap in httptest.NewServer, but with different positional/options shapes and different defaults (setupE2EAdminServer force-enables adminEndpointsEnabled; setupAdminServer lets the caller pass endpointsEnabled). The duplication is small but invites drift. Since there's only one external caller of each pattern, prefer collapsing on the options-struct API and removing the 4-arg variant in a follow-up.

Comment thread tests/e2e/setup_test.go
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/helpers_test.go`:
- Around line 71-82: The helper requireErrorResponse leaves the response body
open; after decoding the JSON envelope in requireErrorResponse, drain the
remaining body and close it to make the helper self-contained—use
io.Copy(io.Discard, resp.Body) (ignoring non-fatal copy errors) and then
resp.Body.Close() (or defer closing immediately after entering the function) so
callers don't rely on external cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ab0dc196-51ce-416a-b220-b1426eff6687

📥 Commits

Reviewing files that changed from the base of the PR and between 1f2f2e1 and a0bf559.

📒 Files selected for processing (3)
  • tests/e2e/admin_test.go
  • tests/e2e/helpers_test.go
  • tests/e2e/setup_test.go

Comment thread tests/e2e/helpers_test.go
Comment on lines +71 to +82
func requireErrorResponse(t *testing.T, resp *http.Response, status int, errorType core.ErrorType, messageContains string) {
	t.Helper()

	require.Equal(t, status, resp.StatusCode)

	var envelope core.OpenAIErrorEnvelope
	require.NoError(t, json.NewDecoder(resp.Body).Decode(&envelope))
	require.Equal(t, errorType, envelope.Error.Type)
	if messageContains != "" {
		require.Contains(t, envelope.Error.Message, messageContains)
	}
}

🧹 Nitpick | 🔵 Trivial

Optional: consider draining/closing the body inside requireErrorResponse.

The helper decodes the envelope but leaves the body open and partially un-drained. Callers in this codebase typically defer closeBody(resp), so this is fine in practice; just worth flagging in case a future caller forgets — draining via io.Copy(io.Discard, resp.Body) after Decode (or closing here) would make the helper more self-contained.


@SantiagoDePolonia SantiagoDePolonia merged commit 18c982e into main Apr 25, 2026
19 checks passed
2 participants