Skip to content

chore: updated release e2e scenarios and added runner#203

Merged
SantiagoDePolonia merged 1 commit intomainfrom
chore/updated-e2e-scenarios
Apr 2, 2026
Merged

chore: updated release e2e scenarios and added runner#203
SantiagoDePolonia merged 1 commit intomainfrom
chore/updated-e2e-scenarios

Conversation

@SantiagoDePolonia
Copy link
Copy Markdown
Contributor

@SantiagoDePolonia SantiagoDePolonia commented Apr 2, 2026

Summary by CodeRabbit

  • Tests
    • Improved release E2E testing infrastructure with per-run state isolation and enhanced scenario organization.
    • Added new test runner script supporting flexible scenario selection (by ID, range, or pattern), timeout management with automatic process termination, and comprehensive execution logging.
    • Enhanced test scenarios with improved artifact management and client-side filtering for audit log verification.

@SantiagoDePolonia SantiagoDePolonia self-assigned this Apr 2, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 2, 2026

📝 Walkthrough

Walkthrough

A new release E2E testing framework is introduced with a Bash runner script that orchestrates scenario execution, manages per-run state and temporary artifacts, implements timeout handling with process monitoring, and generates comprehensive logs with exit codes and performance metrics.

Changes

Cohort / File(s) Summary
E2E Scenario Orchestration
tests/e2e/run-release-e2e.sh
New executable script that parses CLI options (scenario selection, ranges, regex matching), extracts scenario/setup code blocks from markdown, manages per-run state variables (QA_SUFFIX, QA_RUN_DIR, aliases), executes scenarios with per-scenario timeout watchdogs, and produces raw/TSV summary logs with exit codes and duration metrics.
E2E Scenario Definitions
tests/e2e/release-e2e-scenarios.md
Updated scenarios to use per-run state variables and artifact directories (QA_RUN_DIR), introduced conditional auth cleanup based on persistence flag, replaced hardcoded temp file paths with dynamic per-run directories, updated alias names to use per-run values, modified audit-log queries from request_id to search parameter with client-side filtering, and added run-sequencing guidance.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Parser as Scenario Parser
    participant Executor as Scenario Executor
    participant Watchdog as Timeout Watchdog
    participant Logger as Log Aggregator
    
    CLI->>Parser: Provide scenario selection filters
    Parser->>Parser: Extract scenario & setup code blocks
    Parser->>Executor: Pass scenario config & setup
    Executor->>Executor: Generate per-scenario runner script
    Executor->>Executor: Set per-run state (QA_SUFFIX, QA_RUN_DIR)
    Executor->>Watchdog: Launch scenario with timeout
    par Scenario Execution
        Executor->>Executor: Run scenario code from repo root
    and Timeout Monitoring
        Watchdog->>Watchdog: Monitor elapsed time
        Watchdog->>Executor: Escalate signals on timeout
    end
    Executor->>Logger: Capture stdout/stderr
    Logger->>Logger: Record exit code, duration, byte counts
    Logger->>CLI: Print summary TSV & raw logs
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

🐰 A framework springs forth with a hop and a bound,
Per-run states dancing 'round artifacts profound,
Timeouts watched by sentries, logs flowing free,
Each scenario tested with precision and glee!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: updates to release E2E scenarios and addition of a new runner script.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/updated-e2e-scenarios

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/release-e2e-scenarios.md`:
- Around line 755-768: The test currently uses a fixed sleep (sleep 6) before
querying the audit log which may be flaky if the audit flush interval differs;
update the scenario to remove the hard-coded 6s wait and instead either
parameterize the wait via an environment variable (e.g., AUDIT_FLUSH_WAIT) or
replace it with a short polling loop that retries the admin API
(/admin/api/v1/audit/log?search=$REQUEST_ID) until the expected entry appears or
a configurable timeout is reached; reference the REQUEST_ID variable, the
existing curl that queries the audit log, and the sleep 6 command as the
locations to change so the test becomes robust across environments.

In `@tests/e2e/run-release-e2e.sh`:
- Around line 335-338: The current cleanup only traps EXIT and can leave the
background scenario process and its watchdog running on SIGINT/SIGTERM; update
the trap logic that currently references KEEP_ARTIFACTS and
cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0 you register the
cleanup_release_auth_artifacts handler for additional signals (at minimum SIGINT
and SIGTERM, optionally SIGHUP) so the same cleanup runs on
interrupts/terminations and ensures background scenario and watchdog processes
are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and
ensure the trap is installed early enough before starting background processes.
- Around line 167-171: The --timeout branch currently sets TIMEOUT_SECONDS from
the argument but later digit-only validation allows "0"; update the validation
to require a positive integer (>0) by checking TIMEOUT_SECONDS after assignment
(e.g., ensure it matches ^[1-9][0-9]*$) and call die with a clear message if it
fails; reference the TIMEOUT_SECONDS variable and the --timeout case so
reviewers can find and replace the existing digit-only check with this stricter
positive-integer check.
- Around line 222-270: The AWK only matches two-digit scenario IDs; make it
accept any digit count and extract ID/title robustly by replacing the /^###
S[0-9][0-9] / test and the current_id/current_title assignment with a
match-based approach: use match($0, /S[0-9]+/) then set current_id = substr($0,
RSTART, RLENGTH) and current_title = substr($0, RSTART + RLENGTH + 1) so
headings like "### S1 ..." or "### S100 ..." are handled; update the awk block
that contains the current_id/current_title logic accordingly (the code paths
around the /^### S[0-9][0-9] / check and where current_title is computed).
- Around line 108-139: The run_with_timeout function can interact with unrelated
processes if the child's PID is recycled before the watchdog runs; fix this by
launching the script in its own process group and making the watchdog target
that group. Concretely: when starting the job in run_with_timeout (the bash
"$script_path" -> & block that sets pid), spawn the child in a new
session/process-group (e.g., use setsid or start with "bash -c 'exec
\"$script_path\"' &" under a new session) and then have the watchdog use kill -0
and kill -TERM/KILL on the negative PID (process group) instead of the single
PID; keep the timeout_marker behavior the same. Optionally, where available,
replace the whole run_with_timeout implementation with the system timeout
command (using TIMEOUT_SECONDS) as a simpler alternative.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53776f6f-4acf-4a38-a7b7-2a47a300551d

📥 Commits

Reviewing files that changed from the base of the PR and between cc42d4a and 89f6ae8.

📒 Files selected for processing (2)
  • tests/e2e/release-e2e-scenarios.md
  • tests/e2e/run-release-e2e.sh

Comment on lines +755 to 768
### S61 Unsupported translated model is still visible in audit log search

Checks that a rejected translated request is still visible in audit logs with the requested model and error type.
Checks that a rejected translated request is still visible in audit-log search by request ID with the requested model and error type.

```bash
REQUEST_ID="qa-invalid-model-$(date +%s)"
REQUEST_ID="qa-invalid-model-$(date +%s)-$$"
curl -sS -i "$BASE_URL/v1/chat/completions" \
-H 'Content-Type: application/json' \
-H "X-Request-ID: $REQUEST_ID" \
-d '{"model":"does-not-exist-model","messages":[{"role":"user","content":"Reply with exactly QA_INVALID_MODEL"}],"max_tokens":20}' && echo
sleep 6
curl -sS "$BASE_URL/admin/api/v1/audit/log?request_id=$REQUEST_ID&limit=5" \
| jq '{total,entries:(.entries|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}'
curl -sS "$BASE_URL/admin/api/v1/audit/log?search=$REQUEST_ID&limit=5" \
| jq --arg request_id "$REQUEST_ID" '{total:(.entries|map(select(.request_id==$request_id))|length),entries:(.entries|map(select(.request_id==$request_id))|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}'
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Confirm the sleep 6 is sufficient for audit log flush.

The scenario adds sleep 6 before querying the audit log with ?search=. Based on the relevant code snippets, the Search parameter is properly implemented server-side for both SQLite and PostgreSQL. The client-side jq filtering with select(.request_id==$request_id) adds defense-in-depth but should be redundant given the server-side implementation.

One consideration: if the audit flush interval is configured differently in some environments, 6 seconds may not be sufficient. Consider documenting or parameterizing the expected flush interval.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/release-e2e-scenarios.md` around lines 755 - 768, The test
currently uses a fixed sleep (sleep 6) before querying the audit log which may
be flaky if the audit flush interval differs; update the scenario to remove the
hard-coded 6s wait and instead either parameterize the wait via an environment
variable (e.g., AUDIT_FLUSH_WAIT) or replace it with a short polling loop that
retries the admin API (/admin/api/v1/audit/log?search=$REQUEST_ID) until the
expected entry appears or a configurable timeout is reached; reference the
REQUEST_ID variable, the existing curl that queries the audit log, and the sleep
6 command as the locations to change so the test becomes robust across
environments.

Comment on lines +108 to +139
run_with_timeout() {
local script_path="$1"
local stdout_path="$2"
local stderr_path="$3"
local timeout_marker="$4"

rm -f "$timeout_marker"

bash "$script_path" >"$stdout_path" 2>"$stderr_path" &
local pid=$!

(
sleep "$TIMEOUT_SECONDS"
if kill -0 "$pid" 2>/dev/null; then
: >"$timeout_marker"
kill -TERM "$pid" 2>/dev/null || true
sleep 1
kill -KILL "$pid" 2>/dev/null || true
fi
) &
local watchdog=$!

local status=0
if ! wait "$pid"; then
status=$?
fi

kill "$watchdog" 2>/dev/null || true
wait "$watchdog" 2>/dev/null || true

return "$status"
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Race condition in watchdog cleanup on fast-completing scenarios.

If the scenario completes very quickly (before the watchdog even starts sleeping), there's a window where:

  1. The scenario finishes
  2. wait "$pid" completes
  3. The watchdog is killed
  4. But wait "$watchdog" may block or produce spurious errors

The current handling with 2>/dev/null || true mitigates this, but there's a subtle race: if the PID is reused by another process before kill -0 "$pid" runs in the watchdog, the watchdog could theoretically interact with an unrelated process.

This is unlikely in practice due to PID recycling behavior, but consider using process groups or a more robust timeout mechanism like timeout command if available.

Alternative using `timeout` command if available
 run_with_timeout() {
   local script_path="$1"
   local stdout_path="$2"
   local stderr_path="$3"
   local timeout_marker="$4"

   rm -f "$timeout_marker"

+  # Use GNU timeout if available for more robust handling
+  if command -v timeout >/dev/null 2>&1; then
+    local status=0
+    if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \
+        bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then
+      status=$?
+      if [[ "$status" -eq 124 ]]; then
+        : >"$timeout_marker"
+      fi
+    fi
+    return "$status"
+  fi
+
+  # Fallback to manual watchdog
   bash "$script_path" >"$stdout_path" 2>"$stderr_path" &
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run_with_timeout() {
local script_path="$1"
local stdout_path="$2"
local stderr_path="$3"
local timeout_marker="$4"
rm -f "$timeout_marker"
bash "$script_path" >"$stdout_path" 2>"$stderr_path" &
local pid=$!
(
sleep "$TIMEOUT_SECONDS"
if kill -0 "$pid" 2>/dev/null; then
: >"$timeout_marker"
kill -TERM "$pid" 2>/dev/null || true
sleep 1
kill -KILL "$pid" 2>/dev/null || true
fi
) &
local watchdog=$!
local status=0
if ! wait "$pid"; then
status=$?
fi
kill "$watchdog" 2>/dev/null || true
wait "$watchdog" 2>/dev/null || true
return "$status"
}
run_with_timeout() {
local script_path="$1"
local stdout_path="$2"
local stderr_path="$3"
local timeout_marker="$4"
rm -f "$timeout_marker"
# Use GNU timeout if available for more robust handling
if command -v timeout >/dev/null 2>&1; then
local status=0
if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \
bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then
status=$?
if [[ "$status" -eq 124 ]]; then
: >"$timeout_marker"
fi
fi
return "$status"
fi
# Fallback to manual watchdog
bash "$script_path" >"$stdout_path" 2>"$stderr_path" &
local pid=$!
(
sleep "$TIMEOUT_SECONDS"
if kill -0 "$pid" 2>/dev/null; then
: >"$timeout_marker"
kill -TERM "$pid" 2>/dev/null || true
sleep 1
kill -KILL "$pid" 2>/dev/null || true
fi
) &
local watchdog=$!
local status=0
if ! wait "$pid"; then
status=$?
fi
kill "$watchdog" 2>/dev/null || true
wait "$watchdog" 2>/dev/null || true
return "$status"
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/run-release-e2e.sh` around lines 108 - 139, The run_with_timeout
function can interact with unrelated processes if the child's PID is recycled
before the watchdog runs; fix this by launching the script in its own process
group and making the watchdog target that group. Concretely: when starting the
job in run_with_timeout (the bash "$script_path" -> & block that sets pid),
spawn the child in a new session/process-group (e.g., use setsid or start with
"bash -c 'exec \"$script_path\"' &" under a new session) and then have the
watchdog use kill -0 and kill -TERM/KILL on the negative PID (process group)
instead of the single PID; keep the timeout_marker behavior the same.
Optionally, where available, replace the whole run_with_timeout implementation
with the system timeout command (using TIMEOUT_SECONDS) as a simpler
alternative.

Comment on lines +167 to +171
--timeout)
[[ $# -ge 2 ]] || die "--timeout requires a value"
TIMEOUT_SECONDS="$2"
shift 2
;;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Validate --timeout is a positive integer.

The validation on line 201 checks for digits but allows 0, which would cause the watchdog to timeout immediately. Consider requiring a positive value.

Proposed fix to require positive timeout
-[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be an integer number of seconds"
+[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be a positive integer"
+(( TIMEOUT_SECONDS > 0 )) || die "--timeout must be greater than zero"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/run-release-e2e.sh` around lines 167 - 171, The --timeout branch
currently sets TIMEOUT_SECONDS from the argument but later digit-only validation
allows "0"; update the validation to require a positive integer (>0) by checking
TIMEOUT_SECONDS after assignment (e.g., ensure it matches ^[1-9][0-9]*$) and
call die with a clear message if it fails; reference the TIMEOUT_SECONDS
variable and the --timeout case so reviewers can find and replace the existing
digit-only check with this stricter positive-integer check.

Comment on lines +222 to +270
awk -v parsed_dir="$PARSED_DIR" -v manifest="$MANIFEST" -v setup_manifest="$SETUP_MANIFEST" '
function flush_code_block( path) {
if (current_id != "") {
path = sprintf("%s/%s.sh", parsed_dir, current_id)
print code > path
close(path)
printf "%s\t%s\t%s\n", current_id, current_title, path >> manifest
close(manifest)
current_id = ""
current_title = ""
} else if (current_h2 == "Common environment" || current_h2 == "Auth-enabled runtime environment") {
setup_count++
path = sprintf("%s/setup-%02d.sh", parsed_dir, setup_count)
print code > path
close(path)
print path >> setup_manifest
close(setup_manifest)
}
code = ""
}

{
sub(/\r$/, "", $0)
if (!in_code && $0 ~ /^## /) {
current_h2 = substr($0, 4)
}
if (!in_code && $0 ~ /^### S[0-9][0-9] /) {
current_id = $2
current_title = substr($0, length(current_id) + 5)
next
}

if ($0 == "```bash") {
in_code = 1
code = ""
next
}

if (in_code && $0 == "```") {
in_code = 0
flush_code_block()
next
}

if (in_code) {
code = code $0 ORS
}
}
' "$SCENARIO_DOC"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

AWK parsing may miss scenarios with non-standard heading formats.

The AWK script expects scenario headings to match ^### S[0-9][0-9] (exactly two digits). This will fail for scenarios S01-S09 if written without leading zero, or for scenarios S100+ if the test suite grows.

Based on the current markdown, all scenarios use two-digit IDs (S01-S79), so this is fine now. Consider making the regex more flexible for future-proofing.

More flexible scenario ID regex
-    if (!in_code && $0 ~ /^### S[0-9][0-9] /) {
+    if (!in_code && $0 ~ /^### S[0-9]+ /) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/run-release-e2e.sh` around lines 222 - 270, The AWK only matches
two-digit scenario IDs; make it accept any digit count and extract ID/title
robustly by replacing the /^### S[0-9][0-9] / test and the
current_id/current_title assignment with a match-based approach: use match($0,
/S[0-9]+/) then set current_id = substr($0, RSTART, RLENGTH) and current_title =
substr($0, RSTART + RLENGTH + 1) so headings like "### S1 ..." or "### S100 ..."
are handled; update the awk block that contains the current_id/current_title
logic accordingly (the code paths around the /^### S[0-9][0-9] / check and where
current_title is computed).

Comment on lines +335 to +338
cleanup_release_auth_artifacts
if (( KEEP_ARTIFACTS == 0 )); then
trap cleanup_release_auth_artifacts EXIT
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Cleanup trap may leave orphan processes on SIGINT/SIGTERM.

The trap only handles EXIT. If the script receives SIGINT (Ctrl+C) during scenario execution, the background scenario process and its watchdog may continue running after the runner exits.

Consider trapping additional signals to ensure clean shutdown.

Proposed trap enhancement
 cleanup_release_auth_artifacts
 if (( KEEP_ARTIFACTS == 0 )); then
   trap cleanup_release_auth_artifacts EXIT
 fi
+
+# Ensure child processes are terminated on interrupt
+trap 'trap - EXIT; cleanup_release_auth_artifacts; exit 130' INT TERM
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/run-release-e2e.sh` around lines 335 - 338, The current cleanup
only traps EXIT and can leave the background scenario process and its watchdog
running on SIGINT/SIGTERM; update the trap logic that currently references
KEEP_ARTIFACTS and cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0
you register the cleanup_release_auth_artifacts handler for additional signals
(at minimum SIGINT and SIGTERM, optionally SIGHUP) so the same cleanup runs on
interrupts/terminations and ensures background scenario and watchdog processes
are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and
ensure the trap is installed early enough before starting background processes.

@SantiagoDePolonia SantiagoDePolonia merged commit 29d4581 into main Apr 2, 2026
16 checks passed
@SantiagoDePolonia SantiagoDePolonia deleted the chore/updated-e2e-scenarios branch April 4, 2026 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant