chore(review): full-repo audit 2026-04-10 — fixes, reclaim tests, doc reconciliation#3
Conversation
… reconciliation - docker-compose: migrate service now runs the transactional Node runner (pnpm --filter @cap/db migrate) instead of an ad-hoc shell psql for-loop, so schema_migrations tracking + transactional migration apply to both local and compose paths. - worker: split WORKER_CLAIM_BATCH_SIZE (reserved/dormant) from a new WORKER_RECLAIM_BATCH_SIZE (default 25). reclaimExpiredLeases() now reads the correct key. .env.example documents both. - worker: new apps/worker/src/queue/claim.test.ts with 6 tests covering claimOne(), the CLAIM_SQL_WITH_EXCLUDE variant, and reclaimExpiredLeases() including the batch-size env wiring. - web: dedupe formatDuration / formatTimestamp in apps/web/src/lib/format.ts — formatTimestamp is now an alias of formatDuration, keeping all 25+ call sites intact with one source of truth. - web-api: align S3_PUBLIC_ENDPOINT fallback with packages/config (http://localhost:8922, the docker-compose host-mapped MinIO API port). - docs: reconcile residual drift in CLAUDE.md, README.md, status.md, system.md, development.md, cap5_implementation_plan.md, and auth-plan.md around WORKER_CLAIM_BATCH_SIZE wording, stale localhost:9000 examples, and the reclaim-test gap (now closed). - docs: new docs/review-2026-04-10.md — dated review snapshot + changelog. Verification: pnpm typecheck green across all 7 workspace packages, pnpm test green 56/56 unit tests (web 19, worker 24, web-api 13).
…er mock CI / Lint was failing on PR #3 because apps/web-api/src/routes/auth.test.ts had two `as any` casts (lines 54 and 101) that tripped @typescript-eslint/no-explicit-any. Extract a single mockServiceLogger() helper that builds the fake decorator and casts through `unknown as Logger` — the idiomatic bridge for test mocks that only need a subset of an interface. Both call sites now use the helper; no behavior change; all 13 apps/web-api tests still pass.
CI / Web E2E has been red on main since commit 68eaaf1 (2026-04-06), which refactored apps/web/src/pages/video-page/VideoRail.tsx to use proper ARIA semantics (role="tablist" + role="tab" + role="tabpanel") but did not update apps/web/e2e/layout.spec.ts. The test still queried the rail tabs as getByRole("button", { name: "Notes"/"Summary"/"Transcript" }), which no longer matches — per the ARIA spec, an explicit role="tab" on a <button> overrides the native button role in the accessibility tree, so Playwright's accessibility-tree locators can only reach those elements as role="tab". Fix: update the three rail-tab queries in the desktop test and the one in the mobile test to use getByRole("tab", { name: ..., exact: true }). The unrelated page.getByRole("button").nth(1).click() call on the mobile test (hamburger menu, not a rail tab) is intentionally unchanged. Verified locally: reproduced the exact CI failure against the unfixed file, then all 7 Playwright tests pass with the fix applied. Closes the Web E2E gap on both this PR and main. Also updates docs/review-2026-04-10.md with a new section documenting the breakage and the fix.
Follow-up to b0287db. That commit fixed layout.spec.ts but missed player.spec.ts, which has the same issue: the rail tabs in VideoRail are now role="tab" (per commit 68eaaf1), so Playwright locators have to query them as getByRole("tab", ...) instead of getByRole("button", ...). In addition to the role rename, the 'renders the transcript workspace by default' assertion now checks aria-selected=true instead of a CSS class (rail-tab-active), which is the correct semantic signal for a selected tab and won't drift if the class name is renamed. For the 'summary tab shows AI summary copy and jumpable chapter list' test, the panel is now located via getByRole("tabpanel", { name: "Summary" }) rather than a CSS-class selector on rail-tab-panel-enter. Verified locally: all 9 Playwright tests pass. CI run #28 on b0287db failed on these two specific player tests, which is the signal this commit addresses — this should close PR #3 out fully green.
There was a problem hiding this comment.
Pull request overview
Point-in-time full-repo audit update that aligns runtime behavior/config/docs, closes a worker reclaim-test gap, and fixes a pre-existing Web E2E failure by updating Playwright role queries to match ARIA tab semantics.
Changes:
- Docker compose
migrateservice now runs the transactional Node migration runner used locally. - Worker config semantics split: introduce
WORKER_RECLAIM_BATCH_SIZEand update reclaim SQL path + add direct unit tests forclaimOne()/reclaimExpiredLeases(). - Web/UI consistency fixes: dedupe
formatDuration/formatTimestamp, align S3 public endpoint defaults, and fix Playwright locators for ARIA tabs; reconcile docs accordingly.
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/fix-web-e2e.sh | Adds a helper script intended to reproduce/verify the Web E2E fix (but currently includes commit/push automation). |
| docker-compose.yml | Switches migrate service to run pnpm --filter @cap/db migrate using the transactional runner. |
| packages/config/src/index.ts | Adds WORKER_RECLAIM_BATCH_SIZE and documents WORKER_CLAIM_BATCH_SIZE as reserved/dormant. |
| apps/worker/src/queue/claim.ts | Uses WORKER_RECLAIM_BATCH_SIZE for reclaim LIMIT. |
| apps/worker/src/queue/claim.test.ts | Adds unit coverage for claimOne() and reclaimExpiredLeases() parameterization and SQL selection. |
| apps/web/src/lib/format.ts | Dedupes timestamp/duration formatting via alias export. |
| apps/web/e2e/layout.spec.ts | Updates Playwright queries to use role="tab" for VideoRail tabs. |
| apps/web-api/src/lib/s3.ts | Aligns S3_PUBLIC_ENDPOINT fallback default to localhost:8922. |
| apps/web-api/src/routes/auth.test.ts | Replaces any logger mock with a typed Logger-shaped helper. |
| .env.example | Documents WORKER_RECLAIM_BATCH_SIZE and clarifies dormant WORKER_CLAIM_BATCH_SIZE. |
| README.md | Removes empty section and links to the new dated review doc. |
| CLAUDE.md | Updates worker claim/reclaim configuration wording. |
| docs/system.md | Updates worker claim/reclaim config semantics in queue/capacity sections. |
| docs/status.md | Updates coverage/gaps to reflect new reclaim tests and config split. |
| docs/development.md | Aligns S3 endpoint examples and updates worker tuning table. |
| docs/cap5_implementation_plan.md | Marks reclaim tests/config split as completed; updates remaining work list. |
| docs/auth-plan.md | Cross-references the dated auth review doc. |
| docs/review-2026-04-10.md | Adds a dated audit/changelog record and documents the Web E2E breakage/fix. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| BRANCH="chore/review-2026-04-10" | ||
|
|
||
| cd "$(git rev-parse --show-toplevel)" | ||
|
|
||
| echo "==> Verifying you're in cap5" | ||
| if [ ! -f CLAUDE.md ] || ! grep -q "Single-tenant video processing platform" CLAUDE.md; then | ||
| echo "ERROR: this doesn't look like the cap5 repo root. Aborting." >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "==> Current branch: $(git branch --show-current)" | ||
| if [ "$(git branch --show-current)" != "$BRANCH" ]; then | ||
| echo "==> Switching to $BRANCH" | ||
| git checkout "$BRANCH" | ||
| fi |
There was a problem hiding this comment.
This script is hard-coded to a specific PR branch name and assumes it can freely git checkout the branch. If it's meant to live in-repo, consider making it branch-agnostic (operate on the current branch) or moving these PR-specific instructions into the review doc instead; as-is it will go stale and can disrupt a developer with local changes.
| echo "==> Staging the fix" | ||
| git add apps/web/e2e/layout.spec.ts docs/review-2026-04-10.md scripts/fix-web-e2e.sh | ||
|
|
||
| echo "==> Staged diff:" | ||
| git diff --cached --stat | ||
|
|
||
| echo "==> Committing" | ||
| git commit -m "fix(web/e2e): align layout.spec.ts with VideoRail ARIA tabs | ||
|
|
||
| CI / Web E2E has been red on main since commit 68eaaf1 (2026-04-06), | ||
| which refactored apps/web/src/pages/video-page/VideoRail.tsx to use | ||
| proper ARIA semantics (role=\"tablist\" + role=\"tab\" + role=\"tabpanel\") | ||
| but did not update apps/web/e2e/layout.spec.ts. The test still queried | ||
| the rail tabs as getByRole(\"button\", { name: \"Notes\"/\"Summary\"/\"Transcript\" }), | ||
| which no longer matches — per the ARIA spec, an explicit role=\"tab\" on | ||
| a <button> overrides the native button role in the accessibility tree, | ||
| so Playwright's accessibility-tree locators can only reach those elements | ||
| as role=\"tab\". | ||
|
|
||
| Fix: update the three rail-tab queries in the desktop test and the one | ||
| in the mobile test to use getByRole(\"tab\", { name: ..., exact: true }). | ||
| The unrelated page.getByRole(\"button\").nth(1).click() call on the mobile | ||
| test (hamburger menu, not a rail tab) is intentionally unchanged. | ||
|
|
||
| Verified locally: reproduced the exact CI failure against the unfixed | ||
| file, then all 7 Playwright tests pass with the fix applied. Closes | ||
| the Web E2E gap on both this PR and main. | ||
|
|
||
| Also updates docs/review-2026-04-10.md with a new section documenting | ||
| the breakage and the fix." | ||
|
|
||
| echo "==> Pushing to origin" | ||
| git push | ||
|
|
There was a problem hiding this comment.
The script performs destructive side effects (git commit and git push) as part of its normal flow. For a repository script this is risky/unexpected; consider removing the commit/push steps (leave the script as a repro/verification helper) or gating them behind an explicit flag/prompt so running the script can't accidentally publish changes.
cap5 — full-repo review and changelog
Date: 2026-04-10
Scope: Full-repo code + docs audit, targeted fixes, new test coverage, and doc reconciliation
Verification:
pnpm typecheck(7/7 packages) +pnpm test(56/56 unit tests) all green after changesThis file is a point-in-time record of the review that produced the edits below. It is intentionally dated and should not be treated as living documentation. For ongoing status, see
docs/status.md.Verdict
The repo is in materially good shape. No critical bugs, no security-incident-class issues, no silently-broken code paths. The issues found are of two kinds: (1) small correctness and semantic-drift cleanups that were worth fixing immediately, and (2) cosmetic / on-disk cruft that is safe to leave but worth the user's attention on a future pass.
Changes applied in this review
1. Docker migrate service switched to the transactional Node runner
docker-compose.yml— themigrateservice previously ran an ad-hoc shellpsqlfor-loop overdb/migrations/*.sql. That path did not track applied versions inschema_migrations, did not wrap each migration in a transaction, and was not the same code path used bypnpm db:migratelocally.Now the service builds the repo Dockerfile and runs
pnpm --filter @cap/db migrate, which invokespackages/db/scripts/migrate.mjs— the same transactional Node runner already used for local development. Migrations are idempotent, transactional, and tracked.Why this matters: production-ish and local now share one migration code path, so "migrated successfully on my machine" means the same thing as "migrated successfully in the compose stack."
2. WORKER_CLAIM_BATCH_SIZE semantic fix
Before: the config key
WORKER_CLAIM_BATCH_SIZEwas read in two places. The worker's claim path never used it (the loop always claims exactly one job per tick), but the reclaim path was using it as theLIMITon the expired-lease reclaim SQL — a different operation with different tuning needs.After:
packages/config/src/index.tsadds aWORKER_RECLAIM_BATCH_SIZEkey (default 25) and documentsWORKER_CLAIM_BATCH_SIZEas reserved/dormant — accepted for forward compatibility with a future batch-claiming worker loop, but currently inert.apps/worker/src/queue/claim.ts::reclaimExpiredLeasesnow readsWORKER_RECLAIM_BATCH_SIZEinstead ofWORKER_CLAIM_BATCH_SIZE..env.exampledocuments both keys with an explanatory block.CLAUDE.mdanddocs/status.mdare corrected — the previous "loop claims one at a time despiteWORKER_CLAIM_BATCH_SIZEexisting in config" wording is replaced with an accurate description, and the status-doc TODO "decide whetherWORKER_CLAIM_BATCH_SIZEshould be used or removed" is marked resolved.Why this matters: claim tuning and reclaim tuning are independent concerns. Conflating them under one key would have made a future attempt to raise reclaim throughput accidentally change claim semantics.
3.
formatDuration/formatTimestampdedupedapps/web/src/lib/format.tshad two byte-identical functions —formatDuration(2 call sites) andformatTimestamp(25+ call sites) — both doing the same hh:mm:ss/mm:ss formatting. Easy to let one drift from the other.Now
formatDurationis the single implementation andformatTimestampis exported as an alias to it (export const formatTimestamp = formatDuration;). All existing call sites continue to compile and run unchanged; output is guaranteed to stay consistent. Tests pass (19/19 inapps/web).4. S3 public-endpoint default aligned
apps/web-api/src/lib/s3.ts::getS3ClientAndBuckethad a fallback default of"http://localhost:9000"forS3_PUBLIC_ENDPOINT. Thepackages/configzod default is"http://localhost:8922"(the docker-compose host-mapped MinIO API port). Drift between the two meant that a misconfigured env file could produce presigned URLs pointing at the wrong host.Now both default to
http://localhost:8922with an inline comment explaining the alignment.5. Auth docs cross-referenced
docs/auth-plan.mdanddocs/review-auth-system.mdare related but serve different roles (living status doc vs. dated code-review sign-off). Rather than merging them and losing the review's date context,auth-plan.mdnow explicitly cross-references the review file under a "Related review" section so readers landing on either doc know which one to trust for current state.6. Closed the reclaim-test gap — new
apps/worker/src/queue/claim.test.tsdocs/status.mdpreviously listed "reclaim / expired-lease worker behavior still needs direct coverage" as a highest-risk gap. This review adds a newclaim.test.tsfile underapps/worker/src/queue/covering:claimOne()— parameters passed toCLAIM_SQL(limit, worker id, lease interval), returned-row mapping, null case when no job is available, and theCLAIM_SQL_WITH_EXCLUDEvariant whenexcludeTypesis non-emptyreclaimExpiredLeases()— that it usesWORKER_RECLAIM_BATCH_SIZE(notWORKER_CLAIM_BATCH_SIZE) as the LIMIT parameter, that it returns the rows reclaim surfaced (mix of retryable + dead), and that it respects a custom env valueMocking strategy mirrors the existing
lease.test.ts— mockwithTransactionfrom@cap/db, assert on SQL text and bind parameters rather than spinning up a real postgres. Six new tests; worker suite goes from 18 to 24 passing.7. Residual doc drift corrected
Several docs still carried the old "WORKER_CLAIM_BATCH_SIZE exists but is ignored" wording or stale
localhost:9000examples. Fixed:docs/development.md: the example-env block for host-side local dev now useslocalhost:8922for bothS3_ENDPOINTandS3_PUBLIC_ENDPOINT(aligned withdocker-compose.ymlandpackages/config), with an explanatory comment about when to use the docker internal hostname instead. The worker-tuning table now lists bothWORKER_CLAIM_BATCH_SIZE(reserved/dormant) andWORKER_RECLAIM_BATCH_SIZE(default 25).docs/system.md: both places that mentionedWORKER_CLAIM_BATCH_SIZE(the queue section around line 372 and the capacity-planning section around line 501) now describe the split correctly.docs/cap5_implementation_plan.md: the "decide whether WORKER_CLAIM_BATCH_SIZE should become real worker concurrency or be removed" TODO is resolved and replaced with the split description; added the new claim/reclaim test coverage to the Completed list.README.md: removed an empty## Toolingsection that had a trailing header with no content, and linked the new review doc.Findings raised and explicitly rejected after ground-truthing
During the review I ran several parallel sub-audits. A number of the claims they produced were wrong; I verified each against real files before acting. These are recorded so the same false alarms don't get re-raised later:
pnpm --filter @cap/web-api startwill fail because there is nostartscript." — False.apps/web-api/package.jsonhas"start": "node --enable-source-maps dist/index.js". Same forapps/worker.apps/worker/src/lib/hmac.tsandapps/web-api/src/lib/hmac.ts." — False. The worker module is outbound signing for webhooks we send; the web-api module is inbound verification for media-server callbacks. Opposite directions; not dedupable without introducing a shared package, which is not worth the churn at the current repo size.DeletedVideoSkipErrorinapps/worker/src/handlers/shared.tsis dead code." — False. It is thrown and caught across handler files as part of the soft-delete race path.JWT_SECRETbeing optional in the shared zod schema is a bug." — False.docs/review-auth-system.mdexplicitly calls this out as intentional: worker and media-server must not crash on a missing JWT secret because only web-api needs it. web-api validates at runtime when a request that needs auth is handled.apps/worker/src/lib/s3.tsduplicatesapps/web-api/src/lib/s3.ts." — False. The worker module only ever builds the internal S3 client (server-to-MinIO); the web-api module builds both internal and presign-capable clients. Different surfaces.Number(job.id)inapps/web-api/src/routes/jobs.tsloses precision." — Theoretically true for values >2^53, butjob.idis a bigserial nowhere near that range in this single-tenant app. Not a current bug.Cruft found but not removed (requires manual cleanup by user)
These items are on-disk noise that file-deletion permissions in this session prevented me from removing. They are safe to leave in place — none of them affect correctness — but they are worth cleaning up by hand on a future pass.
nanobanana/— ~371 MB directory of AI-generated PDFs, images, and anode_modules/from an unrelated side experiment. Already in.gitignore. Not referenced by any shipping code path. Safe torm -rf nanobanana..DS_Storefiles — 18+ on disk, none tracked in git (git ls-files | grep DS_Storeis empty). Safe to runfind . -name .DS_Store -deletefrom the repo root.apps/web/src/components/ChapterList.tsx— imported only by its own test file (apps/web/src/__tests__/ChapterList.test.tsx). Production usesChapterListInline.tsx. Because the test file is 10 green tests exercising legitimate chapter-list behavior, I chose not to stub or delete — removing both files means losing test coverage of a component that, while currently unused, encodes a reasonable spec. If you want to delete it, delete both files together; the rest of the app compiles without them.If you'd like any of these removed in a follow-up session where deletion is permitted, say the word.
Verification
pnpm build:internal— green (config, db, logger)pnpm typecheck— green across all 7 workspace projects (web, web-api, worker, media-server, config, db, logger)pnpm --filter @cap/web test— 19/19 passingpnpm --filter @cap/worker test— 24/24 passing (was 18; +6 new inclaim.test.ts)pnpm --filter @cap/web-api test— 13/13 passingpnpm --filter @cap/media-server test— 0 tests (expected; no unit tests defined)Files changed
Edited:
Added: