docs(evals): add initial integrations e2e spec #127
📝 Walkthrough
This PR adds a comprehensive E2E specification for the initial Relayfile integrations (Linear, Slack, Notion, GitHub): seven live cases validating mounted discovery and file-native writeback, shared polling/evidence helpers, an acceptance rubric, and trajectory records documenting the completed spec.
Changes
Initial Integrations E2E Evaluation Suite
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json:
- Line 19: The projectId field contains a machine-specific absolute path;
replace this value with a portable repo-relative identifier or remove the
projectId entry altogether if unused. Locate the "projectId" key in the JSON
blob (symbol: projectId) and change
"/Users/khaliqgant/Projects/AgentWorkforce/relayfile" to a neutral value such as
"relayfile" or "./relayfile" (or delete the projectId property) so the metadata
no longer leaks local environment details.
In @.trajectories/index.json:
- Line 244: Replace the absolute user-specific path value in the "path" field
inside .trajectories/index.json with a repository-relative path (e.g., change
"/Users/khaliqgant/Projects/AgentWorkforce/relayfile/.trajectories/completed/2026-05/traj_brjdrgcnnwhs.json"
to ".trajectories/completed/2026-05/traj_brjdrgcnnwhs.json"); update the JSON
entry so the "path" key holds the repo-relative string to avoid leaking local
environment details and ensure portability.
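One way to confirm that no machine-specific paths remain is to scan the trajectory metadata for absolute home-directory prefixes. The following is a minimal sketch, not part of the PR: the function name `find_absolute_paths` and the `/Users/` and `/home/` prefixes are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list quoted string values that begin with an absolute
# home-directory path. Output is grep's usual file:line:text format.
# Exit status follows grep: 0 when at least one leak is found, 1 when clean.
find_absolute_paths() {
  local dir="$1"
  grep -rnE '"/(Users|home)/[^"]*"' "$dir"
}

# Example use (assumes the repository root as the working directory):
#   if find_absolute_paths .trajectories; then
#     echo "absolute paths found; rewrite them as repo-relative" >&2
#   fi
```

A check like this could run before committing trajectory files, so a leaked local path is caught before review.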
In `@evals/suites/initial-integrations-e2e/cases.md`:
- Around line 341-343: The jq invocations that build JSON from literals/args
(the lines using jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " +
$run)}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>") must run with no
input, so add the -n flag to jq (e.g., jq -n --arg run "$EVAL_RUN_ID" ...) to
prevent jq from waiting for stdin or failing in automated runs; apply the same
-n addition to the other similar jq invocation around the EVAL_LOCAL_DIR path at
the second occurrence.
- Around line 176-183: After mounting in background with relayfile mount, add a
bounded readiness poll before running the deterministic assertions (relayfile
status, relayfile tree, relayfile writeback status): call the existing polling
helper to wait until the mount is reported ready (e.g., relayfile status shows
the workspace is mounted and provider roots/pending==0, or relayfile tree
returns the expected root listing) with a sensible timeout and interval, then
proceed to tee the outputs; apply the same readiness-wait change to the similar
block referenced at lines 187-194 to avoid race flakes.
- Around line 102-103: The grep in wait_for_file_contains currently treats
$needle as a regex which can mis-match when needle contains metacharacters;
change the check that uses grep -q "$needle" "$target" to use fixed-string mode
grep -Fq "$needle" "$target" so $needle is matched literally (locate the shell
function wait_for_file_contains and the lines referencing target and needle to
update the grep invocation).
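The bounded-poll and fixed-string points above can be sketched together. This is an illustrative version only: the real `wait_for_file_contains` in `cases.md` is not shown in this thread, so the argument order, the defaults, and the return codes below are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bounded poll that waits for a literal string.
# Arguments: target file, literal needle, timeout (seconds), interval (seconds).
# Returns 0 once the needle is present, 1 if the timeout elapses first.
wait_for_file_contains() {
  local target="$1" needle="$2" timeout="${3:-60}" interval="${4:-1}"
  local deadline=$(( $(date +%s) + timeout ))
  while :; do
    # -F matches the needle as a fixed string, so '.', '[' or '*' are literal;
    # -q suppresses output; -- protects needles that begin with '-'.
    if [ -f "$target" ] && grep -Fq -- "$needle" "$target"; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

With `-F`, a needle such as `run[1].*` matches only that exact text, whereas a regex match would also accept `run1`.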
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e0a4326f-4ab3-4725-8385-8f41917129a4
📒 Files selected for processing (5)
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.json
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
- .trajectories/index.json
- evals/suites/initial-integrations-e2e/cases.md
- evals/suites/initial-integrations-e2e/rubric.md
jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
  > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
🟡 Missing jq -n flag causes eval script to hang on stdin
The jq command on line 341 constructs a new JSON object ({description: ...}) but is missing the -n flag, unlike the create commands on lines 327, 408, and 527 which all correctly use jq -n. Without -n, jq reads from stdin and will block indefinitely in an interactive terminal. An agent following this template would copy the jq invocation as-is (only substituting the <canonical-linear-issue-path> placeholder) and produce a hanging command. The fix is to add -n to match the pattern used everywhere else in the file.
-jq --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
-  > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
+jq -n --arg run "$EVAL_RUN_ID" '{description: ("Patched by " + $run)}' \
+  > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
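To make the effect of `-n` concrete, here is a small sketch that wraps the corrected command in a function. The function name `write_description_patch` and its arguments are hypothetical; only the `jq -n --arg` invocation comes from the suggestion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a one-field JSON patch without reading stdin.
# -n makes jq use `null` as its only input instead of waiting for input;
# --arg binds the run ID as a JSON string, so jq handles the quoting.
write_description_patch() {
  local run_id="$1" out_path="$2"
  jq -n --arg run "$run_id" '{description: ("Patched by " + $run)}' > "$out_path"
}

# Example use (the output path here is illustrative):
#   write_description_patch "$EVAL_RUN_ID" /tmp/patch.json
```

Because the filter never references `.`, the null input is irrelevant to the result; `-n` only changes whether jq waits for stdin.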
5. Attempt a read-only mutation against the canonical issue:

```bash
jq '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
```
🟡 Missing jq -n flag causes eval script to hang on stdin
The jq command on line 348 constructs a new JSON object ({id: "not-the-real-id"}) but is missing the -n flag. Same root cause as the patch command above — without -n, jq reads from stdin and will block. Every other jq invocation in this file that creates a new object uses -n (lines 327, 408, 527).
-jq '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
+jq -n '{id: "not-the-real-id"}' > "$EVAL_LOCAL_DIR/<canonical-linear-issue-path>"
Force-pushed from 78c4cb8 to 98cd3f4.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@evals/suites/initial-integrations-e2e/cases.md`:
- Around line 198-203: The readiness/drain helper calls wait_for_provider_roots
and wait_for_writeback_drain can return non-zero but the script keeps running;
modify the block that calls wait_for_provider_roots and wait_for_writeback_drain
so failures immediately abort the run: check each command's exit status and on
non-zero print a clear error message and exit non-zero (or enable strict mode
like set -e at the top of the script), and also treat any "dead letters"
detection from wait_for_writeback_drain as a fatal condition—use the functions'
return codes (wait_for_provider_roots, wait_for_writeback_drain) to gate
continuing to the evidence collection commands (relayfile status/tree/writeback
status).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 9b41f23e-1e4a-48eb-b6af-f1b55c5a6bc1
📒 Files selected for processing (5)
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.json
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
- .trajectories/index.json
- evals/suites/initial-integrations-e2e/cases.md
- evals/suites/initial-integrations-e2e/rubric.md
✅ Files skipped from review due to trivial changes (3)
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.md
- .trajectories/index.json
- .trajectories/completed/2026-05/traj_brjdrgcnnwhs.json
wait_for_provider_roots 180
wait_for_writeback_drain 180
relayfile status "$EVAL_WORKSPACE" | tee "$EVAL_EVIDENCE_DIR/status-after-mount.txt"
relayfile tree "$EVAL_WORKSPACE" / --depth 3 | tee "$EVAL_EVIDENCE_DIR/02-tree-before.txt"
relayfile writeback status "$EVAL_WORKSPACE" --json \
  | tee "$EVAL_EVIDENCE_DIR/04-writeback-status-before.json"
Fail fast when readiness/drain checks time out or detect dead letters.
On Line 198 and Line 199, the wait helpers can return non-zero, but the script continues because the block doesn’t enforce abort semantics. That can produce misleading evidence and false PASS interpretation.
Suggested doc patch
-wait_for_provider_roots 180
-wait_for_writeback_drain 180
+wait_for_provider_roots 180 || {
+  echo "BLOCKED_PROVIDER_ROOTS_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  exit 22
+}
+wait_for_writeback_drain 180 || {
+  rc=$?
+  if [ "$rc" -eq 2 ]; then
+    echo "FAIL_DEAD_LETTERED_WRITEBACK" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  else
+    echo "BLOCKED_WRITEBACK_DRAIN_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
+  fi
+  exit 23
+}
 relayfile status "$EVAL_WORKSPACE" | tee "$EVAL_EVIDENCE_DIR/status-after-mount.txt"
📝 Committable suggestion
wait_for_provider_roots 180 || {
  echo "BLOCKED_PROVIDER_ROOTS_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
  exit 22
}
wait_for_writeback_drain 180 || {
  rc=$?
  if [ "$rc" -eq 2 ]; then
    echo "FAIL_DEAD_LETTERED_WRITEBACK" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
  else
    echo "BLOCKED_WRITEBACK_DRAIN_TIMEOUT" | tee -a "$EVAL_EVIDENCE_DIR/SUMMARY.md"
  fi
  exit 23
}
relayfile status "$EVAL_WORKSPACE" | tee "$EVAL_EVIDENCE_DIR/status-after-mount.txt"
relayfile tree "$EVAL_WORKSPACE" / --depth 3 | tee "$EVAL_EVIDENCE_DIR/02-tree-before.txt"
relayfile writeback status "$EVAL_WORKSPACE" --json \
  | tee "$EVAL_EVIDENCE_DIR/04-writeback-status-before.json"
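The return-code gating in this suggestion can be tried in isolation with stub helpers. The sketch below assumes the convention implied by the patch (0 = drained, 2 = dead letters detected, any other non-zero = timeout); the function names `classify_drain_result` and `gate_on_drain` are hypothetical and do not appear in `cases.md`.

```shell
#!/usr/bin/env bash
# Map a wait helper's return code to a verdict label.
# Assumed convention: 0 = drained, 2 = dead letters, other non-zero = timeout.
classify_drain_result() {
  case "$1" in
    0) echo "OK" ;;
    2) echo "FAIL_DEAD_LETTERED_WRITEBACK" ;;
    *) echo "BLOCKED_WRITEBACK_DRAIN_TIMEOUT" ;;
  esac
}

# Run the given helper, capture its status, print the verdict, and return
# the helper's own status so the caller decides whether to abort.
gate_on_drain() {
  local rc=0
  "$@" || rc=$?   # after ||, $? still holds the failed command's status
  classify_drain_result "$rc"
  return "$rc"
}

# Example use (helper name taken from the review; exit code from the patch):
#   gate_on_drain wait_for_writeback_drain 180 || exit 23
```

Capturing the status with `|| rc=$?` keeps the original code available for classification, which a bare `if ! helper; then` would discard.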
Summary
Testing