chore: updated release e2e scenarios and added runner by SantiagoDePolonia · Pull Request #203 · ENTERPILOT/GoModel

SantiagoDePolonia · 2026-04-02T11:27:07Z

Summary by CodeRabbit

Tests
- Improved release E2E testing infrastructure with per-run state isolation and enhanced scenario organization.
- Added new test runner script supporting flexible scenario selection (by ID, range, or pattern), timeout management with automatic process termination, and comprehensive execution logging.
- Enhanced test scenarios with improved artifact management and client-side filtering for audit log verification.

coderabbitai · 2026-04-02T11:27:22Z

📝 Walkthrough

Walkthrough

A new release E2E testing framework is introduced with a Bash runner script that orchestrates scenario execution, manages per-run state and temporary artifacts, implements timeout handling with process monitoring, and generates comprehensive logs with exit codes and performance metrics.

Changes

Cohort / File(s)	Summary
E2E Scenario Orchestration `tests/e2e/run-release-e2e.sh`	New executable script that parses CLI options (scenario selection, ranges, regex matching), extracts scenario/setup code blocks from markdown, manages per-run state variables (`QA_SUFFIX`, `QA_RUN_DIR`, aliases), executes scenarios with per-scenario timeout watchdogs, and produces raw/TSV summary logs with exit codes and duration metrics.
E2E Scenario Definitions `tests/e2e/release-e2e-scenarios.md`	Updated scenarios to use per-run state variables and artifact directories (`QA_RUN_DIR`), introduced conditional auth cleanup based on persistence flag, replaced hardcoded temp file paths with dynamic per-run directories, updated alias names to use per-run values, modified audit-log queries from `request_id` to `search` parameter with client-side filtering, and added run-sequencing guidance.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Parser as Scenario Parser
    participant Executor as Scenario Executor
    participant Watchdog as Timeout Watchdog
    participant Logger as Log Aggregator
    
    CLI->>Parser: Provide scenario selection filters
    Parser->>Parser: Extract scenario & setup code blocks
    Parser->>Executor: Pass scenario config & setup
    Executor->>Executor: Generate per-scenario runner script
    Executor->>Executor: Set per-run state (QA_SUFFIX, QA_RUN_DIR)
    Executor->>Watchdog: Launch scenario with timeout
    par Scenario Execution
        Executor->>Executor: Run scenario code from repo root
    and Timeout Monitoring
        Watchdog->>Watchdog: Monitor elapsed time
        Watchdog->>Executor: Escalate signals on timeout
    end
    Executor->>Logger: Capture stdout/stderr
    Logger->>Logger: Record exit code, duration, byte counts
    Logger->>CLI: Print summary TSV & raw logs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix(release): adapt gpt-5 chat params and batch aliases #199: Both PRs modify tests/e2e/release-e2e-scenarios.md—this PR adds per-run state variables and runner script integration while that PR introduced auth-enabled scenarios that now integrate with the new framework.

Poem

🐰 A framework springs forth with a hop and a bound,
Per-run states dancing 'round artifacts profound,
Timeouts watched by sentries, logs flowing free,
Each scenario tested with precision and glee! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main changes: updates to release E2E scenarios and addition of a new runner script.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/updated-e2e-scenarios

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/release-e2e-scenarios.md`:
- Around line 755-768: The test currently uses a fixed sleep (sleep 6) before
querying the audit log which may be flaky if the audit flush interval differs;
update the scenario to remove the hard-coded 6s wait and instead either
parameterize the wait via an environment variable (e.g., AUDIT_FLUSH_WAIT) or
replace it with a short polling loop that retries the admin API
(/admin/api/v1/audit/log?search=$REQUEST_ID) until the expected entry appears or
a configurable timeout is reached; reference the REQUEST_ID variable, the
existing curl that queries the audit log, and the sleep 6 command as the
locations to change so the test becomes robust across environments.

In `@tests/e2e/run-release-e2e.sh`:
- Around line 335-338: The current cleanup only traps EXIT and can leave the
background scenario process and its watchdog running on SIGINT/SIGTERM; update
the trap logic that currently references KEEP_ARTIFACTS and
cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0 you register the
cleanup_release_auth_artifacts handler for additional signals (at minimum SIGINT
and SIGTERM, optionally SIGHUP) so the same cleanup runs on
interrupts/terminations and ensures background scenario and watchdog processes
are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and
ensure the trap is installed early enough before starting background processes.
- Around line 167-171: The --timeout branch currently sets TIMEOUT_SECONDS from
the argument but later digit-only validation allows "0"; update the validation
to require a positive integer (>0) by checking TIMEOUT_SECONDS after assignment
(e.g., ensure it matches ^[1-9][0-9]*$) and call die with a clear message if it
fails; reference the TIMEOUT_SECONDS variable and the --timeout case so
reviewers can find and replace the existing digit-only check with this stricter
positive-integer check.
- Around line 222-270: The AWK only matches two-digit scenario IDs; make it
accept any digit count and extract ID/title robustly by replacing the /^###
S[0-9][0-9] / test and the current_id/current_title assignment with a
match-based approach: use match($0, /S[0-9]+/) then set current_id = substr($0,
RSTART, RLENGTH) and current_title = substr($0, RSTART + RLENGTH + 1) so
headings like "### S1 ..." or "### S100 ..." are handled; update the awk block
that contains the current_id/current_title logic accordingly (the code paths
around the /^### S[0-9][0-9] / check and where current_title is computed).
- Around line 108-139: The run_with_timeout function can interact with unrelated
processes if the child's PID is recycled before the watchdog runs; fix this by
launching the script in its own process group and making the watchdog target
that group. Concretely: when starting the job in run_with_timeout (the bash
"$script_path" -> & block that sets pid), spawn the child in a new
session/process-group (e.g., use setsid or start with "bash -c 'exec
\"$script_path\"' &" under a new session) and then have the watchdog use kill -0
and kill -TERM/KILL on the negative PID (process group) instead of the single
PID; keep the timeout_marker behavior the same. Optionally, where available,
replace the whole run_with_timeout implementation with the system timeout
command (using TIMEOUT_SECONDS) as a simpler alternative.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53776f6f-4acf-4a38-a7b7-2a47a300551d

📥 Commits

Reviewing files that changed from the base of the PR and between cc42d4a and 89f6ae8.

📒 Files selected for processing (2)

tests/e2e/release-e2e-scenarios.md
tests/e2e/run-release-e2e.sh

coderabbitai · 2026-04-02T11:35:34Z

+### S61 Unsupported translated model is still visible in audit log search

-Checks that a rejected translated request is still visible in audit logs with the requested model and error type.
+Checks that a rejected translated request is still visible in audit-log search by request ID with the requested model and error type.

 ```bash
-REQUEST_ID="qa-invalid-model-$(date +%s)"
+REQUEST_ID="qa-invalid-model-$(date +%s)-$$"
 curl -sS -i "$BASE_URL/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -H "X-Request-ID: $REQUEST_ID" \
  -d '{"model":"does-not-exist-model","messages":[{"role":"user","content":"Reply with exactly QA_INVALID_MODEL"}],"max_tokens":20}' && echo
 sleep 6
-curl -sS "$BASE_URL/admin/api/v1/audit/log?request_id=$REQUEST_ID&limit=5" \
-  | jq '{total,entries:(.entries|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}'
+curl -sS "$BASE_URL/admin/api/v1/audit/log?search=$REQUEST_ID&limit=5" \
+  | jq --arg request_id "$REQUEST_ID" '{total:(.entries|map(select(.request_id==$request_id))|length),entries:(.entries|map(select(.request_id==$request_id))|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}'
 ```


🧹 Nitpick | 🔵 Trivial

Confirm the sleep 6 is sufficient for audit log flush.

The scenario adds sleep 6 before querying the audit log with ?search=. Based on the relevant code snippets, the Search parameter is properly implemented server-side for both SQLite and PostgreSQL. The client-side jq filtering with select(.request_id==$request_id) adds defense-in-depth but should be redundant given the server-side implementation.

One consideration: if the audit flush interval is configured differently in some environments, 6 seconds may not be sufficient. Consider documenting or parameterizing the expected flush interval.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/release-e2e-scenarios.md` around lines 755 - 768, The test currently uses a fixed sleep (sleep 6) before querying the audit log which may be flaky if the audit flush interval differs; update the scenario to remove the hard-coded 6s wait and instead either parameterize the wait via an environment variable (e.g., AUDIT_FLUSH_WAIT) or replace it with a short polling loop that retries the admin API (/admin/api/v1/audit/log?search=$REQUEST_ID) until the expected entry appears or a configurable timeout is reached; reference the REQUEST_ID variable, the existing curl that queries the audit log, and the sleep 6 command as the locations to change so the test becomes robust across environments.

coderabbitai · 2026-04-02T11:35:34Z

+run_with_timeout() {
+  local script_path="$1"
+  local stdout_path="$2"
+  local stderr_path="$3"
+  local timeout_marker="$4"
+
+  rm -f "$timeout_marker"
+
+  bash "$script_path" >"$stdout_path" 2>"$stderr_path" &
+  local pid=$!
+
+  (
+    sleep "$TIMEOUT_SECONDS"
+    if kill -0 "$pid" 2>/dev/null; then
+      : >"$timeout_marker"
+      kill -TERM "$pid" 2>/dev/null || true
+      sleep 1
+      kill -KILL "$pid" 2>/dev/null || true
+    fi
+  ) &
+  local watchdog=$!
+
+  local status=0
+  if ! wait "$pid"; then
+    status=$?
+  fi
+
+  kill "$watchdog" 2>/dev/null || true
+  wait "$watchdog" 2>/dev/null || true
+
+  return "$status"
+}


🧹 Nitpick | 🔵 Trivial

Race condition in watchdog cleanup on fast-completing scenarios.

If the scenario completes very quickly (before the watchdog even starts sleeping), there's a window where:

The scenario finishes

wait "$pid" completes

The watchdog is killed

But wait "$watchdog" may block or produce spurious errors

The current handling with 2>/dev/null || true mitigates this, but there's a subtle race: if the PID is reused by another process before kill -0 "$pid" runs in the watchdog, the watchdog could theoretically interact with an unrelated process.

This is unlikely in practice due to PID recycling behavior, but consider using process groups or a more robust timeout mechanism like timeout command if available.

Alternative using `timeout` command if available

run_with_timeout() { local script_path="$1" local stdout_path="$2" local stderr_path="$3" local timeout_marker="$4" rm -f "$timeout_marker" + # Use GNU timeout if available for more robust handling + if command -v timeout >/dev/null 2>&1; then + local status=0 + if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \ + bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then + status=$? + if [[ "$status" -eq 124 ]]; then + : >"$timeout_marker" + fi + fi + return "$status" + fi + + # Fallback to manual watchdog bash "$script_path" >"$stdout_path" 2>"$stderr_path" &

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

run_with_timeout() {

local script_path="$1"

local stdout_path="$2"

local stderr_path="$3"

local timeout_marker="$4"

rm -f "$timeout_marker"

bash "$script_path" >"$stdout_path" 2>"$stderr_path" &

local pid=$!

(

sleep "$TIMEOUT_SECONDS"

if kill -0 "$pid" 2>/dev/null; then

: >"$timeout_marker"

kill -TERM "$pid" 2>/dev/null || true

sleep 1

kill -KILL "$pid" 2>/dev/null || true

fi

) &

local watchdog=$!

local status=0

if ! wait "$pid"; then

status=$?

fi

kill "$watchdog" 2>/dev/null || true

wait "$watchdog" 2>/dev/null || true

return "$status"

}

run_with_timeout() {

local script_path="$1"

local stdout_path="$2"

local stderr_path="$3"

local timeout_marker="$4"

rm -f "$timeout_marker"

# Use GNU timeout if available for more robust handling

if command -v timeout >/dev/null 2>&1; then

local status=0

if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \

bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then

status=$?

if [[ "$status" -eq 124 ]]; then

: >"$timeout_marker"

fi

fi

return "$status"

fi

# Fallback to manual watchdog

bash "$script_path" >"$stdout_path" 2>"$stderr_path" &

local pid=$!

(

sleep "$TIMEOUT_SECONDS"

if kill -0 "$pid" 2>/dev/null; then

: >"$timeout_marker"

kill -TERM "$pid" 2>/dev/null || true

sleep 1

kill -KILL "$pid" 2>/dev/null || true

fi

) &

local watchdog=$!

local status=0

if ! wait "$pid"; then

status=$?

fi

kill "$watchdog" 2>/dev/null || true

wait "$watchdog" 2>/dev/null || true

return "$status"

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/run-release-e2e.sh` around lines 108 - 139, The run_with_timeout function can interact with unrelated processes if the child's PID is recycled before the watchdog runs; fix this by launching the script in its own process group and making the watchdog target that group. Concretely: when starting the job in run_with_timeout (the bash "$script_path" -> & block that sets pid), spawn the child in a new session/process-group (e.g., use setsid or start with "bash -c 'exec \"$script_path\"' &" under a new session) and then have the watchdog use kill -0 and kill -TERM/KILL on the negative PID (process group) instead of the single PID; keep the timeout_marker behavior the same. Optionally, where available, replace the whole run_with_timeout implementation with the system timeout command (using TIMEOUT_SECONDS) as a simpler alternative.

coderabbitai · 2026-04-02T11:35:34Z

+    --timeout)
+      [[ $# -ge 2 ]] || die "--timeout requires a value"
+      TIMEOUT_SECONDS="$2"
+      shift 2
+      ;;


🧹 Nitpick | 🔵 Trivial

Validate --timeout is a positive integer.

The validation on line 201 checks for digits but allows 0, which would cause the watchdog to timeout immediately. Consider requiring a positive value.

Proposed fix to require positive timeout

-[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be an integer number of seconds" +[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be a positive integer" +(( TIMEOUT_SECONDS > 0 )) || die "--timeout must be greater than zero"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/run-release-e2e.sh` around lines 167 - 171, The --timeout branch currently sets TIMEOUT_SECONDS from the argument but later digit-only validation allows "0"; update the validation to require a positive integer (>0) by checking TIMEOUT_SECONDS after assignment (e.g., ensure it matches ^[1-9][0-9]*$) and call die with a clear message if it fails; reference the TIMEOUT_SECONDS variable and the --timeout case so reviewers can find and replace the existing digit-only check with this stricter positive-integer check.

coderabbitai · 2026-04-02T11:35:34Z

+awk -v parsed_dir="$PARSED_DIR" -v manifest="$MANIFEST" -v setup_manifest="$SETUP_MANIFEST" '
+  function flush_code_block(    path) {
+    if (current_id != "") {
+      path = sprintf("%s/%s.sh", parsed_dir, current_id)
+      print code > path
+      close(path)
+      printf "%s\t%s\t%s\n", current_id, current_title, path >> manifest
+      close(manifest)
+      current_id = ""
+      current_title = ""
+    } else if (current_h2 == "Common environment" || current_h2 == "Auth-enabled runtime environment") {
+      setup_count++
+      path = sprintf("%s/setup-%02d.sh", parsed_dir, setup_count)
+      print code > path
+      close(path)
+      print path >> setup_manifest
+      close(setup_manifest)
+    }
+    code = ""
+  }
+
+  {
+    sub(/\r$/, "", $0)
+    if (!in_code && $0 ~ /^## /) {
+      current_h2 = substr($0, 4)
+    }
+    if (!in_code && $0 ~ /^### S[0-9][0-9] /) {
+      current_id = $2
+      current_title = substr($0, length(current_id) + 5)
+      next
+    }
+
+    if ($0 == "```bash") {
+      in_code = 1
+      code = ""
+      next
+    }
+
+    if (in_code && $0 == "```") {
+      in_code = 0
+      flush_code_block()
+      next
+    }
+
+    if (in_code) {
+      code = code $0 ORS
+    }
+  }
+' "$SCENARIO_DOC"


🧹 Nitpick | 🔵 Trivial

AWK parsing may miss scenarios with non-standard heading formats.

The AWK script expects scenario headings to match ^### S[0-9][0-9] (exactly two digits). This will fail for scenarios S01-S09 if written without leading zero, or for scenarios S100+ if the test suite grows.

Based on the current markdown, all scenarios use two-digit IDs (S01-S79), so this is fine now. Consider making the regex more flexible for future-proofing.

More flexible scenario ID regex

- if (!in_code && $0 ~ /^### S[0-9][0-9] /) { + if (!in_code && $0 ~ /^### S[0-9]+ /) {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/run-release-e2e.sh` around lines 222 - 270, The AWK only matches two-digit scenario IDs; make it accept any digit count and extract ID/title robustly by replacing the /^### S[0-9][0-9] / test and the current_id/current_title assignment with a match-based approach: use match($0, /S[0-9]+/) then set current_id = substr($0, RSTART, RLENGTH) and current_title = substr($0, RSTART + RLENGTH + 1) so headings like "### S1 ..." or "### S100 ..." are handled; update the awk block that contains the current_id/current_title logic accordingly (the code paths around the /^### S[0-9][0-9] / check and where current_title is computed).

coderabbitai · 2026-04-02T11:35:35Z

+cleanup_release_auth_artifacts
+if (( KEEP_ARTIFACTS == 0 )); then
+  trap cleanup_release_auth_artifacts EXIT
+fi


🧹 Nitpick | 🔵 Trivial

Cleanup trap may leave orphan processes on SIGINT/SIGTERM.

The trap only handles EXIT. If the script receives SIGINT (Ctrl+C) during scenario execution, the background scenario process and its watchdog may continue running after the runner exits.

Consider trapping additional signals to ensure clean shutdown.

Proposed trap enhancement

cleanup_release_auth_artifacts if (( KEEP_ARTIFACTS == 0 )); then trap cleanup_release_auth_artifacts EXIT fi + +# Ensure child processes are terminated on interrupt +trap 'trap - EXIT; cleanup_release_auth_artifacts; exit 130' INT TERM

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/e2e/run-release-e2e.sh` around lines 335 - 338, The current cleanup only traps EXIT and can leave the background scenario process and its watchdog running on SIGINT/SIGTERM; update the trap logic that currently references KEEP_ARTIFACTS and cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0 you register the cleanup_release_auth_artifacts handler for additional signals (at minimum SIGINT and SIGTERM, optionally SIGHUP) so the same cleanup runs on interrupts/terminations and ensures background scenario and watchdog processes are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and ensure the trap is installed early enough before starting background processes.

chore: updated release e2e scenarios and added runner

89f6ae8

SantiagoDePolonia self-assigned this Apr 2, 2026

coderabbitai Bot reviewed Apr 2, 2026

View reviewed changes

SantiagoDePolonia merged commit 29d4581 into main Apr 2, 2026
16 checks passed

SantiagoDePolonia deleted the chore/updated-e2e-scenarios branch April 4, 2026 11:52

coderabbitai Bot mentioned this pull request Apr 12, 2026

chore(e2e): fix release validation checks #225

Merged

This was referenced Apr 25, 2026

test(e2e): harden gateway e2e tests #272

Merged

test(budget): add budget management coverage #284

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: updated release e2e scenarios and added runner#203

chore: updated release e2e scenarios and added runner#203
SantiagoDePolonia merged 1 commit intomainfrom
chore/updated-e2e-scenarios

SantiagoDePolonia commented Apr 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 2, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Uh oh!

coderabbitai Bot Apr 2, 2026

Uh oh!

coderabbitai Bot Apr 2, 2026

Uh oh!

coderabbitai Bot Apr 2, 2026

Uh oh!

coderabbitai Bot Apr 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

SantiagoDePolonia commented Apr 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

SantiagoDePolonia commented Apr 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 2, 2026 •

edited

Loading