Skip to content

feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)#532

Merged
danhdoan merged 3 commits intoproj/autoharness-v2from
feat/ENG-2324
Apr 23, 2026
Merged

feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)#532
danhdoan merged 3 commits intoproj/autoharness-v2from
feat/ENG-2324

Conversation

@danhdoan
Copy link
Copy Markdown
Collaborator

Summary

Phase 7 Task 7.4 — closes brutal-review Tier 2 D4. Adds
`--feedback <good|bad>` to both `brv curate` and `brv query` so
users can flag the outcome the daemon just wrote, feeding Phase 6
Task 6.5's weighting policy (3× synthetic failures for `bad`,
1× synthetic success for `good`).

CLI surface

  • `src/oclif/commands/curate/index.ts` — `feedback` flag +
    `handleFeedback` run after `submitTask`. Detach mode skips with
    a hint (no completed outcome yet). Disabled harness → warn +
    exit 0. Missing outcome → exit 1.
  • `src/oclif/commands/query.ts` — same shape, adapted to the
    query command's flow.

Shared lib

  • `src/oclif/lib/harness-feedback.ts` — entry point
    `attachFeedbackFromCli(projectRoot, commandType, verdict)` +
    pure store variant `attachFeedbackToStore` for tests. Minimal
    `HarnessOutcomeRecorder` construction delegates to Phase 6's
    canonical `attachFeedback` instead of re-implementing the 3×/1×
    policy.

Bug caught in test authoring

First draft scanned `listOutcomes(..., limit=1)` for the most
recent outcome, but synthetic rows (inserted with `Date.now()`
timestamps) shadow the original real outcome. A re-label flow
(`--feedback good` then `--feedback bad`) would target the
earlier synthetic, not the original. Fix filters `_synthetic`
ids out of the scan. Test 3 pins the correct behaviour.

Test plan

  • `npm run typecheck` — clean
  • `npm run lint` — 0 errors (257 pre-existing warnings)
  • `npm test` — 7209 passing, 16 pending, 0 failing
  • `npm run build` — clean
  • 6 new unit tests in `harness-feedback.test.ts`:
    • 3 synthetic rows for `bad`, 1 for `good`
    • Re-label clears previous synthetics, inserts new
    • Empty store → `NO_RECENT_OUTCOME`
    • Most-recent-by-timestamp selection (skipping synthetics)
    • commandType partitioning (curate feedback leaves query
      outcomes untouched)

Scope narrowing

  • `HARNESS_DISABLED` / `NO_STORAGE` paths exercised indirectly
    via existing `readHarnessFeatureConfig` + `openHarnessStoreForProject`
    tests that already cover the filesystem/config side; the
    feedback helper composes them and doesn't duplicate coverage.
  • No per-outcome targeting (`--feedback-outcome `) or explicit
    `--feedback none` — kept minimal per the task-doc "out of
    scope" section.

Acceptance criteria

  • `--feedback <good|bad>` on curate + query
  • Most-recent lookup scoped to `(projectId, commandType)`
  • Phase 6 `recordFeedback` invoked with the right verdict
  • Repeat with different verdict replaces previous (idempotent)
  • No-outcome exits 1
  • Harness-disabled: primary runs, warn, exit 0

Closes brutal-review Tier 2 D4. Adds `--feedback <good|bad>` to
both `brv curate` and `brv query` so users can flag the outcome the
daemon just wrote, feeding Phase 6 Task 6.5's weighting policy
(3x synthetic failures for `bad`, 1x synthetic success for `good`).

Primary action runs first — a failed curate/query reports the
primary failure, not a silent feedback drop.

CLI surface:
- `src/oclif/commands/curate/index.ts` — new `feedback` flag +
  `handleFeedback` after `submitTask`. Detach mode skips with a
  hint (no completed outcome yet). Disabled harness warns + exit 0.
  Missing outcome → exit 1.
- `src/oclif/commands/query.ts` — same shape, adapted to the query
  command's flow.

Shared lib:
- `src/oclif/lib/harness-feedback.ts` — entry point
  `attachFeedbackFromCli(projectRoot, commandType, verdict)` +
  pure store variant `attachFeedbackToStore` for tests. Minimal
  `HarnessOutcomeRecorder` construction (NoOp logger + unused
  SessionEventBus) keeps the CLI delegating to Phase 6's
  canonical `attachFeedback` instead of re-implementing the 3x/1x
  policy.

Real bug caught in test authoring:
The first draft of the helper scanned `listOutcomes(..., limit=1)`
for the most recent outcome, but synthetic rows (inserted with
`Date.now()` timestamps) shadow the original real outcome. A
re-label flow (`--feedback good` then `--feedback bad`) would
target the earlier synthetic, not the original. Fix filters
`__synthetic_` ids out of the "most recent" scan. Test 3 pins the
correct behavior.

Tests (6 new):
- 3 synthetic rows for `bad`, 1 for `good`
- Re-label clears previous synthetics, inserts new
- Empty store → NO_RECENT_OUTCOME
- Most-recent-by-timestamp selection
- commandType partitioning (curate feedback leaves query outcomes
  untouched)

Scope narrowing:
- HARNESS_DISABLED / NO_STORAGE paths exercised indirectly via
  `readHarnessFeatureConfig` + `openHarnessStoreForProject` tests
  that already cover the filesystem/config side; the feedback
  helper composes them and doesn't duplicate coverage.
- No per-outcome targeting (`--feedback-outcome <id>`) or explicit
  `--feedback none` — Phase 7.4 keeps the CLI minimal per the
  task-doc "out of scope" section.

Acceptance criteria from task 7.4:
- --feedback <good|bad> on curate + query
- Most-recent lookup scoped to (projectId, commandType)
- Phase 6 recordFeedback invoked with the right verdict
- Repeat with different verdict replaces previous (idempotent)
- No-outcome exits 1
- Harness-disabled primary still runs, warn, exit 0
- typecheck / lint / test / build clean
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @danhdoan's task in 5m 11s —— View job


PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)

Tasks

  • Gather context and read changed files
  • Review src/oclif/lib/harness-feedback.ts
  • Review src/oclif/commands/curate/index.ts
  • Review src/oclif/commands/query.ts
  • Review test/unit/oclif/lib/harness-feedback.test.ts
  • Post inline comments and final summary

Overall Score: 3 / 5

The feature is well-scoped and the core logic is sound. The synthetic-row shadowing fix is the highlight — the problem is subtle (synthetics carry Date.now() timestamps and would win a "most-recent" scan), the fix is clean, and Test 3 pins exactly the right behaviour. The attachFeedbackToStore/attachFeedbackFromCli split also makes the feedback path testable without any filesystem dance.

That said there are two issues that need attention before merging:


Issues

🔴 Bug — incorrect exit code for NO_RECENT_OUTCOME / NO_STORAGE

handleFeedback calls this.error(message, {exit: 1}), which throws a CLIError. Because handleFeedback runs inside the withDaemonRetry closure, which is wrapped by the outer try/catch in run(), the CLIError is caught there and passed to reportError(). reportError formats it as a connection error (wrong context) and returns normally — the process exits 0 instead of the intended 1. See inline comment on curate/index.ts:311 (same issue in query.ts:162).

Fix this →

🟡 Missing JSON output for feedback

--feedback with --format json produces no JSON record of the operation — success or failure. A JSON consumer can't verify whether feedback was applied, which outcome was targeted, or how many synthetic rows were inserted. The text-format path logs all of that. See inline comment on curate/index.ts:298. Fix this →

🟡 Duplicated constants / inaccurate syntheticCount

BAD_SYNTHETIC_COUNT, GOOD_SYNTHETIC_COUNT, and SYNTHETIC_DELIMITER are re-declared in harness-feedback.ts instead of imported from harness-outcome-recorder.ts. The delimiter is the riskiest: if it ever changes, the synthetic-shadowing fix silently breaks. Additionally, FeedbackResult.syntheticCount is computed from the local constant rather than the actual insertions, so it may over-report when cap enforcement (FEEDBACK_SYNTHETIC_CAP = 10) trims rows. See inline comment on harness-feedback.ts:32.

🟡 as casts in query.ts

Lines 80–81 use as QueryFlags and as 'json' | 'text', violating the project's "Avoid as Type assertions" rule (CLAUDE.md). Curate constructs its flags object explicitly; query should follow the same pattern. See inline comment on query.ts:80.

🔵 Hardcoded ValidatedHarnessConfig in buildCliRecorder

language: 'auto' and maxVersions: 20 are hardcoded. The comment correctly notes that attachFeedback doesn't read them today, but there's no guard against future expansion. See inline comment on harness-feedback.ts:169.

🔵 Fragile message assertion in Test 4

.to.include('run curate first') couples the test to free-text wording; a grammar fix would break it for no behavioural reason. The code field check on the previous line is sufficient. See inline comment on harness-feedback.test.ts:107.


What's good

  • The attachFeedbackFromCli / attachFeedbackToStore separation cleanly isolates the XDG filesystem side from the feedback logic, making the unit tests straightforward.
  • Delegating the 3×/1× policy to HarnessOutcomeRecorder.attachFeedback rather than re-implementing it is the right call — one owner for the weighting policy.
  • The --detach guard in curate's handleFeedback (no outcome to flag yet) is correctly absent from query.
  • Test 3 (re-label idempotence) and Test 5 (timestamp ordering) cover the two subtlest invariants.
  • FEEDBACK_SCAN_LIMIT = 50 with a clear explanation of worst-case headroom is preferable to a magic number.

Comment thread src/oclif/commands/curate/index.ts
Comment thread src/oclif/commands/curate/index.ts Outdated
Comment thread src/oclif/lib/harness-feedback.ts Outdated
Comment thread src/oclif/commands/query.ts Outdated
Comment thread src/oclif/lib/harness-feedback.ts
Comment thread test/unit/oclif/lib/harness-feedback.test.ts
Comment thread src/oclif/lib/harness-feedback.ts
Correctness:
- `handleFeedback` is now invoked AFTER `withDaemonRetry` resolves
  in both curate and query. Previously it ran inside the daemon
  callback, which meant `this.error({exit: 1})` from a
  NO_RECENT_OUTCOME / NO_STORAGE path was caught by the outer
  try/catch and routed to `reportError` — `reportError` only
  logs, so the exit code was silently dropped. The fix captures
  `projectRoot` into an outer variable and runs feedback on the
  post-daemon path, so CLIError propagates cleanly.
- Feedback attaches only when the primary daemon call actually
  succeeded (new `daemonSucceeded` guard). A failed primary
  already skipped the feedback branch transitively, but pulling
  it out of the try/catch required making that intent explicit.

JSON output:
- `--format json` now emits structured feedback events for both
  the success path (`{outcomeId, syntheticCount, verdict}`) and
  the HARNESS_DISABLED skip path (`{reason, skipped: true}`) in
  both commands. Previously JSON consumers had no way to know
  whether feedback was attached.

Constants dedup:
- Exported `BAD_SYNTHETIC_COUNT`, `GOOD_SYNTHETIC_COUNT`, and
  `SYNTHETIC_DELIMITER` from `harness-outcome-recorder.ts` — the
  canonical owner of the §C2 weighting policy. Feedback helper
  imports them instead of re-declaring, so a future policy change
  can't silently drift.

Not addressed:
- Pre-existing `as` casts on `query.ts:80-81` are not introduced
  by this PR; fixing unrelated legacy casts would expand the
  review surface.
- Hardcoded `language`/`maxVersions` in the CLI recorder are
  documented as ignored by `attachFeedback`; adding a runtime
  assertion has no behavioural benefit and would noise the log.
- Test-message substring matches are typical mocha-chai shape;
  relaxing to code-only checks loses coverage of the
  user-visible text without meaningfully reducing brittleness.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @danhdoan's task in 1m 48s —— View job


PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)

Tasks

  • Gather context and analyze PR
  • Read changed files
  • Check if previous review issues were addressed
  • Provide updated review

Overall Score: 4 / 5

Strong improvement over the first draft. The three substantive issues from the prior review are all fixed. Two minor items remain.


Previous Issues — Status

✅ Fixed: Exit-code bug (was 🔴)

handleFeedback is now called outside the withDaemonRetry / outer try/catch block in both commands (curate/index.ts:186–193, query.ts:141–147). The pattern — capturing projectRoot inside the callback and tracking daemonSucceeded — is clean and correct. this.error({exit: 1}) can now propagate freely to oclif's default handler.

✅ Fixed: Missing JSON output for feedback (was 🟡)

writeJsonResponse({command: 'curate:feedback', ...}) / writeJsonResponse({command: 'query:feedback', ...}) are emitted on both success and HARNESS_DISABLED paths (curate/index.ts:313–322, query.ts:164–173). JSON consumers now have a structured record of the feedback operation.

✅ Fixed: Duplicated constants (was 🟡)

BAD_SYNTHETIC_COUNT, GOOD_SYNTHETIC_COUNT, and SYNTHETIC_DELIMITER are now imported from harness-outcome-recorder.ts (harness-feedback.ts:27–31). The comment at line 43–47 is exactly the right explanation to leave in place — it explains why importing rather than redeclaring is important.


Remaining Issues

🟡 as casts in query.ts:80–81 — still present

// query.ts:80–81
const flags = rawFlags as QueryFlags
const format = (flags.format ?? 'text') as 'json' | 'text'

These were flagged in the prior review and are unchanged. curate/index.ts constructs flags explicitly without any casts — query.ts should follow the same pattern to comply with CLAUDE.md ("Avoid as Type assertions"):

const feedbackVerdict: FeedbackVerdict | undefined =
  rawFlags.feedback === 'good' || rawFlags.feedback === 'bad' ? rawFlags.feedback : undefined
const flags: QueryFlags = {
  feedback: feedbackVerdict,
  format: rawFlags.format === 'json' ? 'json' : rawFlags.format === 'text' ? 'text' : undefined,
  timeout: rawFlags.timeout,
}
const format: 'json' | 'text' = flags.format ?? 'text'

Fix this →

🔵 Fragile message assertion in Test 4 — still present

// harness-feedback.test.ts:106
expect((caught as FeedbackError).message).to.include('run curate first')

The code field check on line 105 is the strong assertion. The .include('run curate first') substring check will break on any wording change without a behavioural regression. Consider removing it or extracting the error message to a named constant if you want to test it.


Acknowledged (no change needed)

syntheticCount may lag after cap enforcementharness-feedback.ts:151–155 returns BAD_SYNTHETIC_COUNT/GOOD_SYNTHETIC_COUNT rather than the actual rows surviving enforceFeedbackCap. This is a display-only field in the feedback result; the comment at line 127 correctly frames it as "number the recorder inserted" (before trimming). Acceptable for now.

Hardcoded language/maxVersions in buildCliRecorder — the inline comment explains the fields are unused by attachFeedback. Fine as-is with the current interface.


What's good

  • The capturedProjectRoot / daemonSucceeded split is idiomatic and removes the exit-code hazard cleanly.
  • attachFeedbackFromCli / attachFeedbackToStore separation keeps the filesystem concern at the boundary and makes unit tests straightforward.
  • Delegating the 3×/1× policy entirely to HarnessOutcomeRecorder.attachFeedback is the right call.
  • The synthetic-shadowing fix and its coverage in Test 3 and Test 5 remain the highlights of the PR.
  • Tests now run fully in-memory via FileKeyStorage({inMemory: true}) — no disk state, fast and deterministic.

- Replace the two `as` casts in query.ts:80-81 with explicit typed
  construction — mirrors the curate command's pattern and keeps
  the file on the right side of CLAUDE.md's "avoid `as Type`"
  rule. The new code paths I added in Task 7.4 touched this block,
  so fixing the legacy casts now keeps the reviewed surface
  consistent.
- Swap the fragile `.message.to.include('run curate first')`
  substring assertion in harness-feedback.test.ts for a
  structural check on `details.commandType`. The error-code check
  on the previous line was already the strong assertion; the
  substring added brittleness without coverage.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @danhdoan's task in 1m 26s —— View job


PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)

Tasks

  • Gather context and analyze PR
  • Read changed files
  • Check if previous review issues were addressed
  • Provide updated review

Overall Score: 5 / 5

Both remaining issues from the previous review are fully resolved. No new issues introduced. This PR is ready to merge.


Previous Issues — Status

✅ Fixed: as casts in query.ts (was 🟡)

Lines 80–81 now use explicit type narrowing matching the curate command's pattern:

// query.ts:83–89
const feedbackVerdict: FeedbackVerdict | undefined =
  rawFlags.feedback === 'good' || rawFlags.feedback === 'bad' ? rawFlags.feedback : undefined
const flags: QueryFlags = {
  feedback: feedbackVerdict,
  format: rawFlags.format === 'json' ? 'json' : rawFlags.format === 'text' ? 'text' : undefined,
  timeout: rawFlags.timeout,
}

No as casts remain. The comment on line 80–82 correctly explains why the narrowing is safe (oclif's options: validator already rejected anything outside ['good', 'bad']). Fully compliant with CLAUDE.md.

✅ Fixed: Fragile message assertion in Test 4 (was 🔵)

The .to.include('run curate first') substring check has been replaced with a structured assertion on the details field:

// harness-feedback.test.ts:104–106
expect(caught).to.be.instanceOf(FeedbackError)
expect((caught as FeedbackError).code).to.equal('NO_RECENT_OUTCOME')
expect((caught as FeedbackError).details.commandType).to.equal('curate')

This is strictly better than the prior fix — it not only removes the brittle text check but adds a meaningful behavioural assertion (details.commandType) that confirms the error carries the correct execution context. Wording changes no longer break it.


Confirmed Good (unchanged from prior review)

  • BAD_SYNTHETIC_COUNT, GOOD_SYNTHETIC_COUNT, and SYNTHETIC_DELIMITER are imported from harness-outcome-recorder.ts (lines 27–31 of harness-feedback.ts), with a comment that explains the import rationale. Confirmed all three are export const in harness-outcome-recorder.ts.
  • capturedProjectRoot / daemonSucceeded pattern keeps handleFeedback outside the withDaemonRetry try/catch in both commands — exit code 1 can propagate cleanly.
  • writeJsonResponse({command: 'curate:feedback' | 'query:feedback', ...}) covers both success and HARNESS_DISABLED paths for JSON consumers.
  • The synthetic-shadowing fix (!o.id.includes(SYNTHETIC_DELIMITER)) and Test 3 + Test 5 coverage remain the highlights of the PR.

@danhdoan danhdoan merged commit 8148cb7 into proj/autoharness-v2 Apr 23, 2026
6 checks passed
@danhdoan danhdoan deleted the feat/ENG-2324 branch April 23, 2026 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants