test(e2e): harden gateway e2e tests by SantiagoDePolonia · Pull Request #272 · ENTERPILOT/GoModel

SantiagoDePolonia · 2026-04-25T13:49:33Z

Summary

Add shared e2e setup helpers for auth/admin servers and a SQLite usage fixture.
Turn the admin usage e2e test into a persisted usage check instead of nil-reader zero-value assertions.
Add default chat request fixtures plus stricter upstream payload, error body, and SSE parsing assertions.
Harden release e2e scenario checks with stricter curl/jq behavior and explicit negative-path validation.
Merged latest origin/main at 13a2d07 before pushing.

Tests

git diff --check
tests/e2e/run-release-e2e.sh --list | tail -n 5
go test -v -tags=e2e -timeout=5m ./tests/e2e/...
go test -race -tags=e2e -timeout=5m ./tests/e2e/...
pre-commit hook: go import/fmt, go mod tidy, make lint

Summary by CodeRabbit

Tests
- Enhanced end-to-end coverage with stricter error validation for chat, responses, audit-log, auth, and admin flows.
- Tests now verify upstream request payloads and stricter response semantics, including token/request totals for usage.
- Improved test infrastructure: persistent usage fixture, resilient streaming parsing, standardized request payload helpers, and fail-fast scenario checks.

coderabbitai · 2026-04-25T13:49:45Z

Walkthrough

Introduce a centralized E2E test bootstrap module and helpers, switch admin usage tests to an SQLite-backed usage fixture, consolidate request construction and response assertion helpers, strengthen upstream/downstream payload validations across multiple E2E tests, and make release scenario scripts fail-fast on HTTP/JSON errors.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Setup & Fixtures `tests/e2e/setup_test.go` | Adds E2E server bootstrap utilities, configurable e2eServerOptions, provider registry setup, and an in-memory SQLite-backed usage fixture with flush/cleanup semantics. |
| Admin Usage Tests `tests/e2e/admin_test.go` | Reworks admin usage e2e to use SQLite-backed usage fixture: generates persisted usage, flushes fixture, and asserts `/admin/api/v1/usage/summary` and daily usage entries. |
| Auth Tests `tests/e2e/auth_test.go` | Removes local `setupAuthServer` helper and related imports; tests now rely on shared setup helpers (moved to setup_test.go). |
| Helpers & SSE/Parsing `tests/e2e/helpers_test.go` | Adds `defaultChatReq`, `requireErrorResponse`, `requireRecorded*` helpers; increases SSE scanner token size; fails fast on JSON/unmarshal and scanner errors. |
| Auditlog / Chat / Responses Tests `tests/e2e/auditlog_test.go`, `tests/e2e/chat_test.go`, `tests/e2e/responses_test.go` | Replaces hardcoded `core.ChatRequest` literals with `defaultChatReq`; sets streaming via payload after construction; adds `assertUpstream` callbacks to record and validate upstream payloads (model, temperature, max_tokens, stream/stream_options); tightens error assertions using `requireErrorResponse`. |
| Release E2E Scenarios (docs) `tests/e2e/release-e2e-scenarios.md` | Converts `curl -sS` -> `-fsS`, switches `jq` invocations to `-e`/`-er`, uses temp files for headers/body in negative scenarios, and asserts HTTP statuses and error.type values. |
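
To make the "record and validate upstream payloads" idea concrete, here is a minimal sketch of a recording mock upstream. The `recordingUpstream` type, its `lastPayload` method, the `send` helper, and the model name `test-model` are all hypothetical and are not the helpers added in this PR; the sketch only shows the general pattern of capturing the request body and asserting on its fields afterwards.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// recordingUpstream is a hypothetical mock provider that stores the body of
// every request it receives so a test can inspect what the gateway forwarded.
type recordingUpstream struct {
	mu     sync.Mutex
	bodies [][]byte
}

func (u *recordingUpstream) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	body, _ := io.ReadAll(r.Body)
	u.mu.Lock()
	u.bodies = append(u.bodies, body)
	u.mu.Unlock()
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write([]byte(`{"ok":true}`))
}

// lastPayload decodes the most recently recorded request body into a map.
func (u *recordingUpstream) lastPayload() (map[string]any, error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if len(u.bodies) == 0 {
		return nil, fmt.Errorf("no upstream request recorded")
	}
	var payload map[string]any
	err := json.Unmarshal(u.bodies[len(u.bodies)-1], &payload)
	return payload, err
}

// send delivers a JSON body to the mock without opening a network socket.
func send(u *recordingUpstream, jsonBody string) {
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", strings.NewReader(jsonBody))
	u.ServeHTTP(httptest.NewRecorder(), req)
}

func main() {
	up := &recordingUpstream{}
	send(up, `{"model":"test-model","temperature":0.2,"max_tokens":64,"stream":true}`)

	payload, err := up.lastPayload()
	fmt.Println(err, payload["model"], payload["temperature"], payload["max_tokens"], payload["stream"])
}
```

Decoding into `map[string]any` means JSON numbers arrive as `float64`, so an assertion on `max_tokens` has to compare against `float64(64)` rather than an `int`.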

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

chore: updated release e2e scenarios and added runner #203: Modifies the same tests/e2e/release-e2e-scenarios.md file; likely related to changes in E2E scenario scripting and assertions.

Poem

🐇 I hopped through tests, tidy and spry,
Brought fixtures and helpers so errors won't fly,
SQLite crunched numbers, logs kept in tune,
Now E2E sleeps soundly beneath the moon. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.92% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(e2e): harden gateway e2e tests' accurately describes the main focus of the changeset - hardening end-to-end tests across multiple test files through stricter assertions, helper consolidation, and improved error handling. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

codecov-commenter · 2026-04-25T13:52:07Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

greptile-apps · 2026-04-25T13:52:41Z

Greptile Summary

This PR hardens the gateway e2e test suite by centralising server setup helpers into setup_test.go, replacing nil-reader zero-value usage assertions with real SQLite-backed persisted data checks, adding upstream-payload and error-body assertions across chat and responses tests, and tightening the release scenario scripts with curl -f / jq -e plus explicit negative-path verification.

P1 — double logger.Close() in setup_test.go: flush(t) closes the logger to force-flush buffered usage records, but the t.Cleanup registered in setupSQLiteUsageFixture unconditionally calls logger.Close() a second time on test teardown. If usage.Logger.Close() is not idempotent, the cleanup's require.NoError will fail every time flush is used (currently TestAdminAPI_UsageEndpoints_E2E).

Confidence Score: 4/5

Safe to merge after fixing the double-close of the usage logger in setup_test.go.

One P1 finding: the flush method and the t.Cleanup in setupSQLiteUsageFixture both call logger.Close(), which will cause the cleanup to fail if Close is not idempotent. All other changes are well-structured improvements with no logic issues.

tests/e2e/setup_test.go — the flush / t.Cleanup double-close interaction.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/e2e/setup_test.go | New consolidated server/fixture helpers; the `flush` method and the `t.Cleanup` for `logger.Close()` will double-close the logger whenever `flush` is called explicitly. |
| tests/e2e/admin_test.go | Replaced nil-reader zero-value assertions with real SQLite-backed persisted usage checks; test structure is correct but relies on the double-close-prone `flush` helper. |
| tests/e2e/helpers_test.go | Added `defaultChatReq`, `requireErrorResponse`, and `requireRecordedRequest`/`requireRecordedChatRequest`/`requireRecordedResponsesRequest` helpers; SSE scanner errors are now propagated correctly. |
| tests/e2e/chat_test.go | Migrated to `defaultChatReq`, added upstream payload assertions for `temperature` and `max_tokens`, and tightened error body validation. |
| tests/e2e/responses_test.go | New `TestResponsesParameters` with upstream assertion callbacks; `requireRecordedResponsesRequest` is used consistently with a prior `mockServer.ResetRequests()` call in each sub-test. |
| tests/e2e/auth_test.go | Removed local `setupAuthServer` duplicating logic now centralised in `setup_test.go`; no other changes. |
| tests/e2e/auditlog_test.go | Swapped inline `core.ChatRequest` literals for `defaultChatReq`; purely mechanical refactor with no logic change. |
| tests/e2e/release-e2e-scenarios.md | Added `-f` to all curl calls so HTTP errors propagate as non-zero exit codes; added `jq -e` for truthy-exit assertion; refactored negative-path scenarios (S26, S41, S45, S48, S52, S61) to capture headers/body separately and grep for expected status codes. |

Sequence Diagram

sequenceDiagram
    participant T as Test
    participant F as setupSQLiteUsageFixture
    participant L as usage.Logger
    participant DB as SQLite DB
    participant S as e2e Admin Server

    T->>F: setupSQLiteUsageFixture(t)
    F->>DB: sql.Open(":memory:")
    F->>L: usage.NewLogger(store, cfg)
    F-->>T: e2eUsageFixture{reader, logger}
    Note over F,L: t.Cleanup registered: logger.Close() + db.Close()

    T->>S: setupE2EAdminServer(t, opts{usageLogger: logger})
    T->>S: POST /v1/chat/completions (x2)
    S->>L: Log usage entry (buffered)
    T->>F: flush(t) → logger.Close() ← first Close
    L->>DB: Flush buffered entries
    T->>S: GET /admin/api/v1/usage/summary
    S->>DB: Query persisted rows
    DB-->>S: {TotalRequests:2, ...}
    S-->>T: 200 OK

    Note over T,L: Test ends — t.Cleanup runs
    T->>L: logger.Close() ← second Close ⚠️ potential double-close
    T->>DB: db.Close()

Reviews (1): Last reviewed commit: "test(e2e): harden gateway e2e tests"

greptile-apps · 2026-04-25T13:52:45Z

+	t.Cleanup(func() {
+		require.NoError(t, logger.Close())
+	})
+
+	return &e2eUsageFixture{
+		reader: reader,
+		logger: logger,
+	}
+}
+
+func (f *e2eUsageFixture) flush(t *testing.T) {
+	t.Helper()
+
+	require.NoError(t, f.logger.Close())
+}


Double logger.Close() — test cleanup will fail

flush closes the logger, but the t.Cleanup registered on line 131 also calls logger.Close(). Cleanup functions run in LIFO order after the test, so the cleanup will invoke Close() a second time on an already-closed logger. If usage.Logger.Close() is not idempotent (e.g. closes a channel a second time or returns an error), the require.NoError in the cleanup will fail every time flush is called.

The simplest fix is to guard against the double-close with a flag, or remove the automatic cleanup and require callers to manage teardown:

func (f *e2eUsageFixture) flush(t *testing.T) {
	t.Helper()
	if f.closed {
		return
	}
	require.NoError(t, f.logger.Close())
	f.closed = true
}

Then the t.Cleanup in setupSQLiteUsageFixture should also check f.closed or be removed.
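
As an alternative to the boolean flag, the same single-close guarantee can be sketched with `sync.Once`, so that both the explicit flush and the teardown path go through one guarded method. This is only an illustration: `bufferedLogger` is a hypothetical stand-in for `usage.Logger` (whose real `Close` behaviour is not shown in this thread), and `usageFixture` and its `close` method are invented names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// bufferedLogger is a hypothetical stand-in for usage.Logger. Its Close
// flushes pending entries and, by assumption, errors on a second call.
type bufferedLogger struct {
	closed  bool
	pending int
	flushed int
}

func (l *bufferedLogger) Close() error {
	if l.closed {
		return errors.New("logger already closed")
	}
	l.flushed += l.pending
	l.pending = 0
	l.closed = true
	return nil
}

// usageFixture lets flush and test cleanup both request a close while the
// underlying Close runs exactly once.
type usageFixture struct {
	logger    *bufferedLogger
	closeOnce sync.Once
	closeErr  error
}

// close is safe to call repeatedly; later calls return the first result.
func (f *usageFixture) close() error {
	f.closeOnce.Do(func() {
		f.closeErr = f.logger.Close()
	})
	return f.closeErr
}

func main() {
	f := &usageFixture{logger: &bufferedLogger{pending: 2}}
	fmt.Println(f.close()) // explicit flush in the test body
	fmt.Println(f.close()) // what t.Cleanup would call at teardown
	fmt.Println(f.logger.flushed)
}
```

With this shape, `flush(t)` and the `t.Cleanup` callback could both call `require.NoError(t, f.close())` without caring which one runs first.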

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/e2e/helpers_test.go (1)
149-219: 🧹 Nitpick | 🔵 Trivial

Tightening SSE parsing: good change, plus a heads-up on scanner buffer size.

Failing fast on JSON unmarshal errors and asserting scanner.Err() is the right call — silent corruption of streamed chunks won't go unnoticed anymore.

One thing to be aware of: bufio.Scanner defaults to a 64KiB max token size. If a future SSE chunk (e.g., a large tool-call argument blob or base64 image) exceeds that, scanner.Err() will surface bufio.ErrTooLong and tests will fail with a misleading error. For the current mock payloads this is fine, but worth using scanner.Buffer(...) proactively if upstream payloads grow.
♻️ Optional buffer expansion
 	scanner := bufio.NewScanner(body)
+	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/helpers_test.go` around lines 149 - 219, The SSE parsers
readStreamingResponse and readResponsesStream currently use bufio.NewScanner
with the default 64KiB token limit which can surface bufio.ErrTooLong for large
SSE data; fix by calling scanner.Buffer(...) after creating the scanner (in both
readStreamingResponse and readResponsesStream) to increase the initial and
maximum token size to a safe larger value (e.g., a few megabytes) so large data
chunks (base64 blobs, tool args) won’t cause ErrTooLong while preserving the
existing JSON unmarshal and scanner.Err() checks.
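
The token-size limit is easy to reproduce in isolation. The sketch below uses a hypothetical `readSSEData` helper (not the PR's `readStreamingResponse`) and a `demoStream` generator to show a single oversized `data:` line failing under the default limit and passing once `scanner.Buffer` raises it.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSSEData collects the payload of every "data: " line and stops at
// [DONE]. maxToken caps the size of one line; the bufio default is 64 KiB.
func readSSEData(body io.Reader, maxToken int) ([]string, error) {
	scanner := bufio.NewScanner(body)
	scanner.Buffer(make([]byte, 0, 64*1024), maxToken)

	var chunks []string
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		data := strings.TrimPrefix(line, "data: ")
		if data == "[DONE]" {
			break
		}
		chunks = append(chunks, data)
	}
	return chunks, scanner.Err()
}

// demoStream builds an SSE stream with one data line of the given size.
func demoStream(size int) io.Reader {
	return strings.NewReader("data: " + strings.Repeat("x", size) + "\n\ndata: [DONE]\n\n")
}

func main() {
	// A 100 KiB line exceeds the default limit and surfaces ErrTooLong.
	_, err := readSSEData(demoStream(100*1024), bufio.MaxScanTokenSize)
	fmt.Println(err == bufio.ErrTooLong)

	// The same stream parses once the maximum is raised to 1 MiB.
	chunks, err := readSSEData(demoStream(100*1024), 1024*1024)
	fmt.Println(len(chunks), err)
}
```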

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/admin_test.go`:
- Around line 189-194: The double-close occurs because usageFixture.flush(t)
calls logger.Close() and the t.Cleanup registered in setupSQLiteUsageFixture
also calls logger.Close(); modify setupSQLiteUsageFixture (the function that
registers t.Cleanup) and/or flush(t) so logger.Close() is only invoked once —
either remove the redundant Close from flush(t) or make the cleanup registration
skip closing when flush has already closed, or make logger.Close() idempotent;
ensure the unique symbols involved are setupSQLiteUsageFixture, flush (on
usageFixture), and logger.Close so reviewers can locate and apply the
single-close fix.
- Around line 211-238: In the "daily includes persisted usage" test, avoid the
UTC-midnight race and magic numbers: capture the current date once (today :=
time.Now().UTC().Format("2006-01-02")) before issuing any chat/usage requests,
then when locating the usage entry in the daily slice accept either today or
yesterday (compute yesterday := time.Now().UTC().Add(-24*time.Hour).Format(...))
when setting todayEntry; replace literal token counts in the assertions (2, 20,
40, 60) with named constants like expectedRequests, expectedInputTokens,
expectedOutputTokens, expectedTotalTokens and add a short comment that these
derive from the mock provider (10 input + 20 output per request × 2), then
assert against those constants.
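
A minimal sketch of both ideas follows. It uses a slight variation on the suggestion above: instead of "today or yesterday", it records the UTC date before and after the requests and accepts either. The `dailyEntry` shape and the helper names are hypothetical, and the per-request token counts (10 input, 20 output) are taken from the comment above rather than from the mock provider's code.

```go
package main

import (
	"fmt"
	"time"
)

// Expected totals, assuming the mock provider reports 10 input and
// 20 output tokens per request and the test sends two requests.
const (
	expectedRequests     = 2
	expectedInputTokens  = 10 * expectedRequests
	expectedOutputTokens = 20 * expectedRequests
	expectedTotalTokens  = expectedInputTokens + expectedOutputTokens
)

const dateLayout = "2006-01-02"

// dailyEntry is a hypothetical shape for one row of the daily usage response.
type dailyEntry struct {
	Date        string
	Requests    int
	TotalTokens int
}

// candidateDates returns the UTC dates at the start and end of the request
// window; they differ only if midnight passed in between.
func candidateDates(start, end time.Time) []string {
	first := start.UTC().Format(dateLayout)
	last := end.UTC().Format(dateLayout)
	if first == last {
		return []string{first}
	}
	return []string{first, last}
}

// findEntry returns the first entry whose date matches any candidate.
func findEntry(entries []dailyEntry, dates []string) (dailyEntry, bool) {
	for _, e := range entries {
		for _, d := range dates {
			if e.Date == d {
				return e, true
			}
		}
	}
	return dailyEntry{}, false
}

// demoWindow simulates a test run that straddles UTC midnight.
func demoWindow() (time.Time, time.Time) {
	start := time.Date(2026, 4, 25, 23, 59, 59, 0, time.UTC)
	return start, start.Add(2 * time.Second)
}

func main() {
	start, end := demoWindow()
	entries := []dailyEntry{{Date: "2026-04-26", Requests: expectedRequests, TotalTokens: expectedTotalTokens}}

	entry, ok := findEntry(entries, candidateDates(start, end))
	fmt.Println(ok, entry.Requests == expectedRequests, entry.TotalTokens == expectedTotalTokens)
}
```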

In `@tests/e2e/setup_test.go`:
- Around line 95-108: The empty-string branch in setupE2ERegistry causes
inconsistent provider registration; ensure a single default provider type
("test") is always used: update callers (setupE2EServer / setupAdminServer /
setupE2EAdminServer / setupAuthServer) to pass "test" or set providerType =
"test" at the top of setupE2ERegistry, then remove the RegisterProvider(path)
branch and always call registry.RegisterProviderWithType(testProvider,
providerType) so registration behavior is consistent across tests.
- Around line 45-66: The two helpers duplicate behavior; replace the positional
helper by converging on the options-struct API: remove or deprecate
setupAdminServer and update its callers to build an e2eServerOptions (setting
masterKey, adminEndpointsEnabled=endpointsEnabled, adminUIEnabled=uiEnabled,
providerType defaulting to "test") and call setupE2EAdminServer or directly call
setupE2EServer wrapped in httptest.NewServer; alternatively make
setupAdminServer a thin wrapper that constructs e2eServerOptions and returns
setupE2EAdminServer(t, opts). Ensure you reference e2eServerOptions,
setupE2EServer, setupE2EAdminServer and setupAdminServer when making the change.

---

Outside diff comments:
In `@tests/e2e/helpers_test.go`:
- Around line 149-219: The SSE parsers readStreamingResponse and
readResponsesStream currently use bufio.NewScanner with the default 64KiB token
limit which can surface bufio.ErrTooLong for large SSE data; fix by calling
scanner.Buffer(...) after creating the scanner (in both readStreamingResponse
and readResponsesStream) to increase the initial and maximum token size to a
safe larger value (e.g., a few megabytes) so large data chunks (base64 blobs,
tool args) won’t cause ErrTooLong while preserving the existing JSON unmarshal
and scanner.Err() checks.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6298b07-dacb-42cf-a61e-04a4b2bc5bae

📥 Commits

Reviewing files that changed from the base of the PR and between 13a2d07 and 1f2f2e1.

📒 Files selected for processing (8)

tests/e2e/admin_test.go
tests/e2e/auditlog_test.go
tests/e2e/auth_test.go
tests/e2e/chat_test.go
tests/e2e/helpers_test.go
tests/e2e/release-e2e-scenarios.md
tests/e2e/responses_test.go
tests/e2e/setup_test.go

💤 Files with no reviewable changes (1)

tests/e2e/auth_test.go

coderabbitai · 2026-04-25T13:57:15Z

+// setupAdminServer creates a new server instance with admin features configured.
+func setupAdminServer(t *testing.T, masterKey string, endpointsEnabled, uiEnabled bool) *httptest.Server {
+	t.Helper()
+
+	srv := setupE2EServer(t, e2eServerOptions{
+		masterKey:             masterKey,
+		adminEndpointsEnabled: endpointsEnabled,
+		adminUIEnabled:        uiEnabled,
+		providerType:          "test",
+	})
+	return httptest.NewServer(srv)
+}
+
+func setupE2EAdminServer(t *testing.T, opts e2eServerOptions) *httptest.Server {
+	t.Helper()
+
+	opts.adminEndpointsEnabled = true
+	if opts.providerType == "" {
+		opts.providerType = "test"
+	}
+	return httptest.NewServer(setupE2EServer(t, opts))
+}


🧹 Nitpick | 🔵 Trivial

setupAdminServer and setupE2EAdminServer overlap — consider unifying.

Both helpers ultimately call setupE2EServer and wrap in httptest.NewServer, but with different positional/options shapes and different defaults (setupE2EAdminServer force-enables adminEndpointsEnabled; setupAdminServer lets the caller pass endpointsEnabled). The duplication is small but invites drift. Since there's only one external caller of each pattern, prefer collapsing on the options-struct API and removing the 4-arg variant in a follow-up.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/setup_test.go` around lines 45 - 66, The two helpers duplicate behavior; replace the positional helper by converging on the options-struct API: remove or deprecate setupAdminServer and update its callers to build an e2eServerOptions (setting masterKey, adminEndpointsEnabled=endpointsEnabled, adminUIEnabled=uiEnabled, providerType defaulting to "test") and call setupE2EAdminServer or directly call setupE2EServer wrapped in httptest.NewServer; alternatively make setupAdminServer a thin wrapper that constructs e2eServerOptions and returns setupE2EAdminServer(t, opts). Ensure you reference e2eServerOptions, setupE2EServer, setupE2EAdminServer and setupAdminServer when making the change.
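
The "thin wrapper" option can be sketched as follows. The types and function names here (`serverOptions`, `newHandler`, `newAdminHandler`, `statusFor`) are hypothetical simplifications of `e2eServerOptions`, `setupE2EServer`, and `setupAdminServer`; the sketch returns an `http.Handler` and probes it with `httptest.NewRecorder` to stay self-contained, whereas the real helpers wrap the handler in `httptest.NewServer`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// serverOptions is a hypothetical, reduced version of e2eServerOptions.
type serverOptions struct {
	masterKey             string
	adminEndpointsEnabled bool
	adminUIEnabled        bool
	providerType          string
}

// withDefaults applies the single default provider type in one place.
func (o serverOptions) withDefaults() serverOptions {
	if o.providerType == "" {
		o.providerType = "test"
	}
	return o
}

// newHandler stands in for setupE2EServer: all construction goes through it.
func newHandler(opts serverOptions) http.Handler {
	opts = opts.withDefaults()
	mux := http.NewServeMux()
	if opts.adminEndpointsEnabled {
		mux.HandleFunc("/admin/api/v1/usage/summary", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintf(w, `{"provider_type":%q}`, opts.providerType)
		})
	}
	return mux
}

// newAdminHandler keeps the positional signature but only translates its
// arguments into the options struct, so defaults cannot drift.
func newAdminHandler(masterKey string, endpointsEnabled, uiEnabled bool) http.Handler {
	return newHandler(serverOptions{
		masterKey:             masterKey,
		adminEndpointsEnabled: endpointsEnabled,
		adminUIEnabled:        uiEnabled,
	})
}

// statusFor issues an in-memory GET and returns the response status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	on := newAdminHandler("key", true, false)
	off := newAdminHandler("key", false, false)
	fmt.Println(statusFor(on, "/admin/api/v1/usage/summary"), statusFor(off, "/admin/api/v1/usage/summary"))
}
```

Because the positional helper carries no defaults of its own, any later change to the default provider type happens only in `withDefaults`.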

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/helpers_test.go`:
- Around line 71-82: The helper requireErrorResponse leaves the response body
open; after decoding the JSON envelope in requireErrorResponse, drain the
remaining body and close it to make the helper self-contained—use
io.Copy(io.Discard, resp.Body) (ignoring non-fatal copy errors) and then
resp.Body.Close() (or defer closing immediately after entering the function) so
callers don't rely on external cleanup.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ab0dc196-51ce-416a-b220-b1426eff6687

📥 Commits

Reviewing files that changed from the base of the PR and between 1f2f2e1 and a0bf559.

📒 Files selected for processing (3)

tests/e2e/admin_test.go
tests/e2e/helpers_test.go
tests/e2e/setup_test.go

coderabbitai · 2026-04-25T14:10:06Z

+func requireErrorResponse(t *testing.T, resp *http.Response, status int, errorType core.ErrorType, messageContains string) {
+	t.Helper()
+
+	require.Equal(t, status, resp.StatusCode)
+
+	var envelope core.OpenAIErrorEnvelope
+	require.NoError(t, json.NewDecoder(resp.Body).Decode(&envelope))
+	require.Equal(t, errorType, envelope.Error.Type)
+	if messageContains != "" {
+		require.Contains(t, envelope.Error.Message, messageContains)
+	}
+}


🧹 Nitpick | 🔵 Trivial

Optional: consider draining/closing the body inside requireErrorResponse.

The helper decodes the envelope but leaves the body open and partially un-drained. Callers in this codebase typically defer closeBody(resp), so this is fine in practice; just worth flagging in case a future caller forgets — draining via io.Copy(io.Discard, resp.Body) after Decode (or closing here) would make the helper more self-contained.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/helpers_test.go` around lines 71 - 82, The helper requireErrorResponse leaves the response body open; after decoding the JSON envelope in requireErrorResponse, drain the remaining body and close it to make the helper self-contained—use io.Copy(io.Discard, resp.Body) (ignoring non-fatal copy errors) and then resp.Body.Close() (or defer closing immediately after entering the function) so callers don't rely on external cleanup.
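
A self-contained version of the drain-and-close pattern might look like the sketch below. `errorEnvelope` is a hypothetical stand-in for `core.OpenAIErrorEnvelope`, `decodeErrorAndClose` returns values instead of calling `require`, and `trackedBody`/`fakeResponse` exist only so the example can observe that the body was closed; the error type string is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// errorEnvelope is a hypothetical stand-in for core.OpenAIErrorEnvelope.
type errorEnvelope struct {
	Error struct {
		Type    string `json:"type"`
		Message string `json:"message"`
	} `json:"error"`
}

// decodeErrorAndClose decodes the envelope, drains whatever the decoder left
// unread, and closes the body so callers need no separate cleanup.
func decodeErrorAndClose(resp *http.Response) (errorEnvelope, error) {
	defer resp.Body.Close()

	var env errorEnvelope
	if err := json.NewDecoder(resp.Body).Decode(&env); err != nil {
		return env, err
	}
	// Copy errors are ignored: the envelope has already been decoded.
	_, _ = io.Copy(io.Discard, resp.Body)
	return env, nil
}

// trackedBody records whether Close was called.
type trackedBody struct {
	io.Reader
	closed bool
}

func (b *trackedBody) Close() error {
	b.closed = true
	return nil
}

// fakeResponse builds an in-memory response with an observable body.
func fakeResponse(status int, body string) (*http.Response, *trackedBody) {
	tb := &trackedBody{Reader: strings.NewReader(body)}
	return &http.Response{StatusCode: status, Body: tb}, tb
}

func main() {
	resp, body := fakeResponse(http.StatusUnauthorized,
		`{"error":{"type":"authentication_error","message":"invalid key"}}`+"\n")

	env, err := decodeErrorAndClose(resp)
	fmt.Println(env.Error.Type, err, body.closed)
}
```

The trade-off is that a helper which closes the body must be the last reader of it; callers that also `defer closeBody(resp)` would then close twice, which is harmless for standard `http.Response` bodies but worth keeping in mind.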

test(e2e): harden gateway e2e tests

1f2f2e1

greptile-apps Bot reviewed Apr 25, 2026

coderabbitai Bot reviewed Apr 25, 2026

test(e2e): address hardening review comments

a0bf559

coderabbitai Bot reviewed Apr 25, 2026

SantiagoDePolonia merged commit 18c982e into main Apr 25, 2026
19 checks passed

coderabbitai Bot mentioned this pull request Apr 27, 2026

test(budget): add budget management coverage #284

Merged
