chore: updated release e2e scenarios and added runner#203
chore: updated release e2e scenarios and added runner#203SantiagoDePolonia merged 1 commit intomainfrom
Conversation
📝 WalkthroughWalkthroughA new release E2E testing framework is introduced with a Bash runner script that orchestrates scenario execution, manages per-run state and temporary artifacts, implements timeout handling with process monitoring, and generates comprehensive logs with exit codes and performance metrics. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI
participant Parser as Scenario Parser
participant Executor as Scenario Executor
participant Watchdog as Timeout Watchdog
participant Logger as Log Aggregator
CLI->>Parser: Provide scenario selection filters
Parser->>Parser: Extract scenario & setup code blocks
Parser->>Executor: Pass scenario config & setup
Executor->>Executor: Generate per-scenario runner script
Executor->>Executor: Set per-run state (QA_SUFFIX, QA_RUN_DIR)
Executor->>Watchdog: Launch scenario with timeout
par Scenario Execution
Executor->>Executor: Run scenario code from repo root
and Timeout Monitoring
Watchdog->>Watchdog: Monitor elapsed time
Watchdog->>Executor: Escalate signals on timeout
end
Executor->>Logger: Capture stdout/stderr
Logger->>Logger: Record exit code, duration, byte counts
Logger->>CLI: Print summary TSV & raw logs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e/release-e2e-scenarios.md`:
- Around line 755-768: The test currently uses a fixed sleep (sleep 6) before
querying the audit log which may be flaky if the audit flush interval differs;
update the scenario to remove the hard-coded 6s wait and instead either
parameterize the wait via an environment variable (e.g., AUDIT_FLUSH_WAIT) or
replace it with a short polling loop that retries the admin API
(/admin/api/v1/audit/log?search=$REQUEST_ID) until the expected entry appears or
a configurable timeout is reached; reference the REQUEST_ID variable, the
existing curl that queries the audit log, and the sleep 6 command as the
locations to change so the test becomes robust across environments.
In `@tests/e2e/run-release-e2e.sh`:
- Around line 335-338: The current cleanup only traps EXIT and can leave the
background scenario process and its watchdog running on SIGINT/SIGTERM; update
the trap logic that currently references KEEP_ARTIFACTS and
cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0 you register the
cleanup_release_auth_artifacts handler for additional signals (at minimum SIGINT
and SIGTERM, optionally SIGHUP) so the same cleanup runs on
interrupts/terminations and ensures background scenario and watchdog processes
are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and
ensure the trap is installed early enough before starting background processes.
- Around line 167-171: The --timeout branch currently sets TIMEOUT_SECONDS from
the argument but later digit-only validation allows "0"; update the validation
to require a positive integer (>0) by checking TIMEOUT_SECONDS after assignment
(e.g., ensure it matches ^[1-9][0-9]*$) and call die with a clear message if it
fails; reference the TIMEOUT_SECONDS variable and the --timeout case so
reviewers can find and replace the existing digit-only check with this stricter
positive-integer check.
- Around line 222-270: The AWK only matches two-digit scenario IDs; make it
accept any digit count and extract ID/title robustly by replacing the /^###
S[0-9][0-9] / test and the current_id/current_title assignment with a
match-based approach: use match($0, /S[0-9]+/) then set current_id = substr($0,
RSTART, RLENGTH) and current_title = substr($0, RSTART + RLENGTH + 1) so
headings like "### S1 ..." or "### S100 ..." are handled; update the awk block
that contains the current_id/current_title logic accordingly (the code paths
around the /^### S[0-9][0-9] / check and where current_title is computed).
- Around line 108-139: The run_with_timeout function can interact with unrelated
processes if the child's PID is recycled before the watchdog runs; fix this by
launching the script in its own process group and making the watchdog target
that group. Concretely: when starting the job in run_with_timeout (the bash
"$script_path" -> & block that sets pid), spawn the child in a new
session/process-group (e.g., use setsid or start with "bash -c 'exec
\"$script_path\"' &" under a new session) and then have the watchdog use kill -0
and kill -TERM/KILL on the negative PID (process group) instead of the single
PID; keep the timeout_marker behavior the same. Optionally, where available,
replace the whole run_with_timeout implementation with the system timeout
command (using TIMEOUT_SECONDS) as a simpler alternative.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 53776f6f-4acf-4a38-a7b7-2a47a300551d
📒 Files selected for processing (2)
tests/e2e/release-e2e-scenarios.mdtests/e2e/run-release-e2e.sh
| ### S61 Unsupported translated model is still visible in audit log search | ||
|
|
||
| Checks that a rejected translated request is still visible in audit logs with the requested model and error type. | ||
| Checks that a rejected translated request is still visible in audit-log search by request ID with the requested model and error type. | ||
|
|
||
| ```bash | ||
| REQUEST_ID="qa-invalid-model-$(date +%s)" | ||
| REQUEST_ID="qa-invalid-model-$(date +%s)-$$" | ||
| curl -sS -i "$BASE_URL/v1/chat/completions" \ | ||
| -H 'Content-Type: application/json' \ | ||
| -H "X-Request-ID: $REQUEST_ID" \ | ||
| -d '{"model":"does-not-exist-model","messages":[{"role":"user","content":"Reply with exactly QA_INVALID_MODEL"}],"max_tokens":20}' && echo | ||
| sleep 6 | ||
| curl -sS "$BASE_URL/admin/api/v1/audit/log?request_id=$REQUEST_ID&limit=5" \ | ||
| | jq '{total,entries:(.entries|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}' | ||
| curl -sS "$BASE_URL/admin/api/v1/audit/log?search=$REQUEST_ID&limit=5" \ | ||
| | jq --arg request_id "$REQUEST_ID" '{total:(.entries|map(select(.request_id==$request_id))|length),entries:(.entries|map(select(.request_id==$request_id))|map({request_id,path,model,resolved_model,provider,status_code,error_type}))}' | ||
| ``` |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Confirm the sleep 6 is sufficient for audit log flush.
The scenario adds sleep 6 before querying the audit log with ?search=. Based on the relevant code snippets, the Search parameter is properly implemented server-side for both SQLite and PostgreSQL. The client-side jq filtering with select(.request_id==$request_id) adds defense-in-depth but should be redundant given the server-side implementation.
One consideration: if the audit flush interval is configured differently in some environments, 6 seconds may not be sufficient. Consider documenting or parameterizing the expected flush interval.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/release-e2e-scenarios.md` around lines 755 - 768, The test
currently uses a fixed sleep (sleep 6) before querying the audit log which may
be flaky if the audit flush interval differs; update the scenario to remove the
hard-coded 6s wait and instead either parameterize the wait via an environment
variable (e.g., AUDIT_FLUSH_WAIT) or replace it with a short polling loop that
retries the admin API (/admin/api/v1/audit/log?search=$REQUEST_ID) until the
expected entry appears or a configurable timeout is reached; reference the
REQUEST_ID variable, the existing curl that queries the audit log, and the sleep
6 command as the locations to change so the test becomes robust across
environments.
| run_with_timeout() { | ||
| local script_path="$1" | ||
| local stdout_path="$2" | ||
| local stderr_path="$3" | ||
| local timeout_marker="$4" | ||
|
|
||
| rm -f "$timeout_marker" | ||
|
|
||
| bash "$script_path" >"$stdout_path" 2>"$stderr_path" & | ||
| local pid=$! | ||
|
|
||
| ( | ||
| sleep "$TIMEOUT_SECONDS" | ||
| if kill -0 "$pid" 2>/dev/null; then | ||
| : >"$timeout_marker" | ||
| kill -TERM "$pid" 2>/dev/null || true | ||
| sleep 1 | ||
| kill -KILL "$pid" 2>/dev/null || true | ||
| fi | ||
| ) & | ||
| local watchdog=$! | ||
|
|
||
| local status=0 | ||
| if ! wait "$pid"; then | ||
| status=$? | ||
| fi | ||
|
|
||
| kill "$watchdog" 2>/dev/null || true | ||
| wait "$watchdog" 2>/dev/null || true | ||
|
|
||
| return "$status" | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Race condition in watchdog cleanup on fast-completing scenarios.
If the scenario completes very quickly (before the watchdog even starts sleeping), there's a window where:
- The scenario finishes
wait "$pid"completes- The watchdog is killed
- But
wait "$watchdog"may block or produce spurious errors
The current handling with 2>/dev/null || true mitigates this, but there's a subtle race: if the PID is reused by another process before kill -0 "$pid" runs in the watchdog, the watchdog could theoretically interact with an unrelated process.
This is unlikely in practice due to PID recycling behavior, but consider using process groups or a more robust timeout mechanism like timeout command if available.
Alternative using `timeout` command if available
run_with_timeout() {
local script_path="$1"
local stdout_path="$2"
local stderr_path="$3"
local timeout_marker="$4"
rm -f "$timeout_marker"
+ # Use GNU timeout if available for more robust handling
+ if command -v timeout >/dev/null 2>&1; then
+ local status=0
+ if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \
+ bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then
+ status=$?
+ if [[ "$status" -eq 124 ]]; then
+ : >"$timeout_marker"
+ fi
+ fi
+ return "$status"
+ fi
+
+ # Fallback to manual watchdog
bash "$script_path" >"$stdout_path" 2>"$stderr_path" &📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| run_with_timeout() { | |
| local script_path="$1" | |
| local stdout_path="$2" | |
| local stderr_path="$3" | |
| local timeout_marker="$4" | |
| rm -f "$timeout_marker" | |
| bash "$script_path" >"$stdout_path" 2>"$stderr_path" & | |
| local pid=$! | |
| ( | |
| sleep "$TIMEOUT_SECONDS" | |
| if kill -0 "$pid" 2>/dev/null; then | |
| : >"$timeout_marker" | |
| kill -TERM "$pid" 2>/dev/null || true | |
| sleep 1 | |
| kill -KILL "$pid" 2>/dev/null || true | |
| fi | |
| ) & | |
| local watchdog=$! | |
| local status=0 | |
| if ! wait "$pid"; then | |
| status=$? | |
| fi | |
| kill "$watchdog" 2>/dev/null || true | |
| wait "$watchdog" 2>/dev/null || true | |
| return "$status" | |
| } | |
| run_with_timeout() { | |
| local script_path="$1" | |
| local stdout_path="$2" | |
| local stderr_path="$3" | |
| local timeout_marker="$4" | |
| rm -f "$timeout_marker" | |
| # Use GNU timeout if available for more robust handling | |
| if command -v timeout >/dev/null 2>&1; then | |
| local status=0 | |
| if ! timeout --signal=TERM --kill-after=5 "$TIMEOUT_SECONDS" \ | |
| bash "$script_path" >"$stdout_path" 2>"$stderr_path"; then | |
| status=$? | |
| if [[ "$status" -eq 124 ]]; then | |
| : >"$timeout_marker" | |
| fi | |
| fi | |
| return "$status" | |
| fi | |
| # Fallback to manual watchdog | |
| bash "$script_path" >"$stdout_path" 2>"$stderr_path" & | |
| local pid=$! | |
| ( | |
| sleep "$TIMEOUT_SECONDS" | |
| if kill -0 "$pid" 2>/dev/null; then | |
| : >"$timeout_marker" | |
| kill -TERM "$pid" 2>/dev/null || true | |
| sleep 1 | |
| kill -KILL "$pid" 2>/dev/null || true | |
| fi | |
| ) & | |
| local watchdog=$! | |
| local status=0 | |
| if ! wait "$pid"; then | |
| status=$? | |
| fi | |
| kill "$watchdog" 2>/dev/null || true | |
| wait "$watchdog" 2>/dev/null || true | |
| return "$status" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/run-release-e2e.sh` around lines 108 - 139, The run_with_timeout
function can interact with unrelated processes if the child's PID is recycled
before the watchdog runs; fix this by launching the script in its own process
group and making the watchdog target that group. Concretely: when starting the
job in run_with_timeout (the bash "$script_path" -> & block that sets pid),
spawn the child in a new session/process-group (e.g., use setsid or start with
"bash -c 'exec \"$script_path\"' &" under a new session) and then have the
watchdog use kill -0 and kill -TERM/KILL on the negative PID (process group)
instead of the single PID; keep the timeout_marker behavior the same.
Optionally, where available, replace the whole run_with_timeout implementation
with the system timeout command (using TIMEOUT_SECONDS) as a simpler
alternative.
| --timeout) | ||
| [[ $# -ge 2 ]] || die "--timeout requires a value" | ||
| TIMEOUT_SECONDS="$2" | ||
| shift 2 | ||
| ;; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Validate --timeout is a positive integer.
The validation on line 201 checks for digits but allows 0, which would cause the watchdog to timeout immediately. Consider requiring a positive value.
Proposed fix to require positive timeout
-[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be an integer number of seconds"
+[[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || die "--timeout must be a positive integer"
+(( TIMEOUT_SECONDS > 0 )) || die "--timeout must be greater than zero"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/run-release-e2e.sh` around lines 167 - 171, The --timeout branch
currently sets TIMEOUT_SECONDS from the argument but later digit-only validation
allows "0"; update the validation to require a positive integer (>0) by checking
TIMEOUT_SECONDS after assignment (e.g., ensure it matches ^[1-9][0-9]*$) and
call die with a clear message if it fails; reference the TIMEOUT_SECONDS
variable and the --timeout case so reviewers can find and replace the existing
digit-only check with this stricter positive-integer check.
| awk -v parsed_dir="$PARSED_DIR" -v manifest="$MANIFEST" -v setup_manifest="$SETUP_MANIFEST" ' | ||
| function flush_code_block( path) { | ||
| if (current_id != "") { | ||
| path = sprintf("%s/%s.sh", parsed_dir, current_id) | ||
| print code > path | ||
| close(path) | ||
| printf "%s\t%s\t%s\n", current_id, current_title, path >> manifest | ||
| close(manifest) | ||
| current_id = "" | ||
| current_title = "" | ||
| } else if (current_h2 == "Common environment" || current_h2 == "Auth-enabled runtime environment") { | ||
| setup_count++ | ||
| path = sprintf("%s/setup-%02d.sh", parsed_dir, setup_count) | ||
| print code > path | ||
| close(path) | ||
| print path >> setup_manifest | ||
| close(setup_manifest) | ||
| } | ||
| code = "" | ||
| } | ||
|
|
||
| { | ||
| sub(/\r$/, "", $0) | ||
| if (!in_code && $0 ~ /^## /) { | ||
| current_h2 = substr($0, 4) | ||
| } | ||
| if (!in_code && $0 ~ /^### S[0-9][0-9] /) { | ||
| current_id = $2 | ||
| current_title = substr($0, length(current_id) + 5) | ||
| next | ||
| } | ||
|
|
||
| if ($0 == "```bash") { | ||
| in_code = 1 | ||
| code = "" | ||
| next | ||
| } | ||
|
|
||
| if (in_code && $0 == "```") { | ||
| in_code = 0 | ||
| flush_code_block() | ||
| next | ||
| } | ||
|
|
||
| if (in_code) { | ||
| code = code $0 ORS | ||
| } | ||
| } | ||
| ' "$SCENARIO_DOC" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
AWK parsing may miss scenarios with non-standard heading formats.
The AWK script expects scenario headings to match ^### S[0-9][0-9] (exactly two digits). This will fail for scenarios S01-S09 if written without leading zero, or for scenarios S100+ if the test suite grows.
Based on the current markdown, all scenarios use two-digit IDs (S01-S79), so this is fine now. Consider making the regex more flexible for future-proofing.
More flexible scenario ID regex
- if (!in_code && $0 ~ /^### S[0-9][0-9] /) {
+ if (!in_code && $0 ~ /^### S[0-9]+ /) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/run-release-e2e.sh` around lines 222 - 270, The AWK only matches
two-digit scenario IDs; make it accept any digit count and extract ID/title
robustly by replacing the /^### S[0-9][0-9] / test and the
current_id/current_title assignment with a match-based approach: use match($0,
/S[0-9]+/) then set current_id = substr($0, RSTART, RLENGTH) and current_title =
substr($0, RSTART + RLENGTH + 1) so headings like "### S1 ..." or "### S100 ..."
are handled; update the awk block that contains the current_id/current_title
logic accordingly (the code paths around the /^### S[0-9][0-9] / check and where
current_title is computed).
| cleanup_release_auth_artifacts | ||
| if (( KEEP_ARTIFACTS == 0 )); then | ||
| trap cleanup_release_auth_artifacts EXIT | ||
| fi |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Cleanup trap may leave orphan processes on SIGINT/SIGTERM.
The trap only handles EXIT. If the script receives SIGINT (Ctrl+C) during scenario execution, the background scenario process and its watchdog may continue running after the runner exits.
Consider trapping additional signals to ensure clean shutdown.
Proposed trap enhancement
cleanup_release_auth_artifacts
if (( KEEP_ARTIFACTS == 0 )); then
trap cleanup_release_auth_artifacts EXIT
fi
+
+# Ensure child processes are terminated on interrupt
+trap 'trap - EXIT; cleanup_release_auth_artifacts; exit 130' INT TERM🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/run-release-e2e.sh` around lines 335 - 338, The current cleanup
only traps EXIT and can leave the background scenario process and its watchdog
running on SIGINT/SIGTERM; update the trap logic that currently references
KEEP_ARTIFACTS and cleanup_release_auth_artifacts so that when KEEP_ARTIFACTS==0
you register the cleanup_release_auth_artifacts handler for additional signals
(at minimum SIGINT and SIGTERM, optionally SIGHUP) so the same cleanup runs on
interrupts/terminations and ensures background scenario and watchdog processes
are killed and artifacts removed; keep the conditional on KEEP_ARTIFACTS and
ensure the trap is installed early enough before starting background processes.
Summary by CodeRabbit