
feat: review bottleneck dashboard with LLM classification #137

Merged

coji merged 35 commits into main from feat/review-bottleneck-dashboard on Mar 4, 2026

Conversation

coji (Owner) commented Mar 1, 2026

Summary

  • Review bottleneck dashboard (/{orgSlug}/reviews): PR size distribution, WIP cycle time, and review queue trend charts with drill-down to individual PRs
  • LLM-based PR classification: Gemini Flash automatically classifies PRs as XS/S/M/L/XL; integrated into the fetch → upsert pipeline
  • Classifier lab (lab/classify/): a sampling → golden-set creation → evaluate pipeline; DB-independent to guarantee reproducibility
  • Backfill command: lightweight re-fetch of PR metadata (the --files option fetches only the file lists)
  • All-time period option: the reviews page now offers an all-time view

DB migrations

  • pull_requests: added complexity, complexity_reason, risk_areas, classified_at, classifier_model, and closed_at columns
  • github_raw_data: added timeline_items column

Deploy procedure

  1. Deploy (migrations are applied automatically)
  2. Schedule a Refresh from the admin Data Management screen
  3. The next crawl job re-fetches all data → classify runs automatically

Test plan

  • pnpm validate passes
  • Classifier accuracy: 92.3% on a 208-sample golden set
  • After the production deploy, Refresh → classify completes successfully
  • The reviews dashboard displays the all-time distribution

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Reviews dashboard with three analytics charts: queue trend analysis, WIP cycle insights, and PR size distribution metrics.
    • Added PR classification capability using AI to analyze review complexity and identify risk areas.
    • Added new Reviews navigation item.
  • Improvements

    • Enhanced data pipeline with timeline tracking and PR closure tracking.
  • Documentation

    • Added authentication, roadmap, and prompting guides.

coji and others added 4 commits March 1, 2026 12:33
Visualize where PR reviews are getting stuck and why:
- Reviewer queue depth (who has pending reviews piling up)
- WIP count vs review time correlation (more WIP = slower reviews)
- PR size distribution with auto-merge potential (XS/S PRs could skip review)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate Gemini Flash Lite into the batch pipeline to classify PR review
complexity (XS/S/M/L/XL) based on file paths, diff size, and title.
The dashboard falls back to rule-based classification when LLM results
are unavailable (GEMINI_API_KEY not set or PR not yet classified).

- Add complexity/risk_areas/classified_at columns to pull_requests
- Add files(first:100) to GitHub GraphQL PR queries
- Create batch/lib/llm-classify.ts (ported from lab experiments)
- Create batch/usecases/classify-pull-requests.ts
- Update dashboard to prefer LLM classification with rule-based fallback
- Remove client name from placeholder text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
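The fallback described in this commit (prefer a stored LLM label, else bucket by diff size) can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code; the function names are hypothetical, and the size thresholds (10/50/200/500 total changed lines) are assumed from the boundary values discussed in the review comments below.

```typescript
type PRSize = 'XS' | 'S' | 'M' | 'L' | 'XL'

// Hypothetical rule-based fallback: bucket a PR into XS..XL by the total
// number of changed lines. Threshold values are assumptions.
function classifyBySize(additions: number, deletions: number): PRSize {
  const total = additions + deletions
  if (total <= 10) return 'XS'
  if (total <= 50) return 'S'
  if (total <= 200) return 'M'
  if (total <= 500) return 'L'
  return 'XL'
}

// Prefer the persisted LLM classification; fall back to the size heuristic
// when it is missing (e.g. GEMINI_API_KEY unset or PR not yet classified).
function effectiveComplexity(
  llmComplexity: PRSize | null,
  additions: number,
  deletions: number,
): PRSize {
  return llmComplexity ?? classifyBySize(additions, deletions)
}
```

The useful property of this shape is that the dashboard never blocks on the LLM: a null classification degrades to a deterministic heuristic instead of an empty chart.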
Extract aggregation logic (WIP calculation, PR size grouping) into
aggregate.ts and run it in clientLoader instead of on the server.
This removes the heavy correlated subquery for WIP counts from the
server-side SQL and replaces the scatter chart with a simpler
bar chart grouped by WIP count.

- Add aggregate.ts with pure functions for WIP and PR size aggregation
- Add clientLoader in reviews/index.tsx with HydrateFallback
- Simplify chart components to receive pre-aggregated data
- Remove useMemo from chart components (no longer needed)
- Change WIP chart from scatter plot to grouped bar chart

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
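The "pure functions in aggregate.ts" approach from this commit could look something like the sketch below: the server returns raw rows, and the client groups them and computes medians. The row shape and function name here are illustrative assumptions, not the repo's real types.

```typescript
// Assumed raw row shape passed from the server loader to the client.
interface RawPR {
  sizeLabel: string
  reviewTimeHours: number | null
}

// Pure client-side aggregation: per size label, count PRs and compute
// the median review time. No DB access, so it is trivially testable.
function aggregatePRSize(rows: RawPR[]) {
  const groups = new Map<string, number[]>()
  for (const row of rows) {
    if (row.reviewTimeHours === null) continue
    const bucket = groups.get(row.sizeLabel) ?? []
    bucket.push(row.reviewTimeHours)
    groups.set(row.sizeLabel, bucket)
  }
  return [...groups.entries()].map(([label, times]) => {
    const sorted = [...times].sort((a, b) => a - b)
    const mid = Math.floor(sorted.length / 2)
    const median =
      sorted.length % 2 === 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2
    return { label, count: times.length, medianHours: median }
  })
}
```

Keeping the aggregation pure is what allows it to move from the server SQL into a clientLoader: the chart components then receive pre-aggregated data and need no useMemo.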
Cache review dashboard queries for 5 minutes to reduce server load.
Clear cache after batch crawl and manual recalculate to ensure fresh data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
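A 5-minute in-memory TTL cache with an explicit clear, as this commit describes, can be sketched as below. This is an assumed minimal shape; only the clearAllCache name appears in the PR, the rest is illustrative.

```typescript
const TTL_MS = 5 * 60 * 1000 // 5-minute time-to-live

const cache = new Map<string, { value: unknown; expiresAt: number }>()

// Return the cached value if present and not expired; otherwise evict it.
function getCached<T>(key: string, now = Date.now()): T | undefined {
  const entry = cache.get(key)
  if (!entry || entry.expiresAt <= now) {
    cache.delete(key)
    return undefined
  }
  return entry.value as T
}

function setCached(key: string, value: unknown, now = Date.now()): void {
  cache.set(key, { value, expiresAt: now + TTL_MS })
}

// Called after batch crawl / manual recalculate so dashboards re-query.
function clearAllCache(): void {
  cache.clear()
}
```

The explicit clear after crawl/recalculate is what keeps a TTL cache honest: without it, dashboards could serve up to 5 minutes of stale data right after a refresh.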

coderabbitai bot commented Mar 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR introduces a comprehensive PR review analytics dashboard with LLM-based complexity classification. It adds a new Reviews page displaying queue trends, WIP cycle analysis, and PR size distribution; implements server-side data fetching and client-side aggregation; integrates Gemini-based PR classification; extends database schema to store classification metadata and timeline items; enhances GitHub data fetching to collect PR body, closure dates, and timeline events; and provides CLI tools and evaluation frameworks for classification quality assessment.

Changes

• Reviews Page & Charts
  app/routes/$orgSlug/reviews/index.tsx, app/routes/$orgSlug/reviews/+components/queue-trend-chart.tsx, app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx, app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx, app/routes/$orgSlug/reviews/+components/pr-drill-down-sheet.tsx
  New reviews dashboard with three data-driven charts (queue trend, WIP cycle, PR size distribution), team filtering, period selection, and drill-down details for individual PRs.

• Aggregation & Classification Functions
  app/routes/$orgSlug/reviews/+functions/aggregate.ts, app/routes/$orgSlug/reviews/+functions/aggregate.test.ts, app/routes/$orgSlug/reviews/+functions/classify.ts, app/routes/$orgSlug/reviews/+functions/classify.test.ts
  Core analytics aggregation logic for WIP cycles, queue trends, and PR size distribution; PR complexity classification utility with validation; comprehensive test coverage.

• Server Data Queries
  app/routes/$orgSlug/reviews/+functions/queries.server.ts, app/routes/$orgSlug/settings/repositories/$repository/$pull/queries.server.ts
  New server-side data fetching functions for queue history, WIP cycle raw data, and PR size distribution; extended PR detail queries with organization settings and timeline support.

• Batch PR Classification
  batch/lib/llm-classify.ts, batch/usecases/classify-pull-requests.ts, batch/commands/classify.ts
  New LLM-based PR classification pipeline using Gemini API with batch processing, cost estimation, retry logic, and database persistence of complexity labels and risk areas.

• GitHub Data Fetcher Enhancement
  batch/provider/github/fetcher.ts, batch/provider/github/fetcher.test.ts, batch/provider/github/model.ts, batch/provider/github/store.ts, batch/provider/github/store.test.ts, batch/provider/github/types.ts
  Extended GitHub API queries to fetch PR body, closure dates, timeline items, and file changes; refactored reviewer data structure to include requestedAt timestamps; improved error handling and timeout management; added timeline item processing and storage.

• Provider & Data Pipeline
  batch/provider/github/provider.ts, batch/provider/github/pullrequest.ts, batch/provider/index.ts
  Streamlined provider PR fetch/detail flow; added backfill operation for metadata and files; updated reviewers structure throughout pipeline; enhanced logging and incremental filtering.

• Database & ORM
  batch/db/mutations.ts, app/services/tenant-type.ts, app/services/type.ts, db/tenant.sql, db/migrations/tenant/...
  Added columns for PR classification (complexity, complexityReason, riskAreas, classifiedAt, classifierModel), closure date, and timeline items; extended schema migrations; added Prisma migrations type to DB interface.

• Batch Processing
  batch/jobs/crawl.ts, batch/usecases/analyze-and-upsert.ts, app/routes/$orgSlug/settings/data-management/index.tsx, app/services/db.server.ts, app/services/tenant-db.server.ts
  Integrated cache clearing after crawl and export operations; added classification step to analytics pipeline; configured WAL auto-checkpoint pragmas for database optimization.

• CLI & Lab Tools
  batch/cli.ts, lab/classify/judge.ts, lab/classify/judge-batch.ts, lab/classify/judge-common.ts, lab/classify/evaluate.ts, lab/classify/compare.ts, lab/classify/sample.ts
  New CLI commands for PR classification and backfill; comprehensive lab framework for golden-set management, batch judgment via Gemini API, evaluation metrics computation, and model comparison; stratified sampling utilities.

• Navigation & Settings
  app/components/layout/nav-config.ts, app/routes/$orgSlug/settings/_index/+forms/organization-settings.tsx
  Added Reviews link to main navigation; minor placeholder text update in organization settings.

• Documentation
  CLAUDE.md, docs/, lab/README.md, .gitignore, db/seed.ts, ops/commands/pull-db.ts, lefthook.yml
  Comprehensive documentation on batch operations, development workflows, auth design, roadmap, and Gemini prompting; enriched seed data with PR classifications and reviewer assignments; enhanced WAL checkpoint handling for database pulls; updated lint configuration.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant Reviews as Reviews Page
    participant Server as Server Loader
    participant Client as Client Loader
    participant DB as Tenant Database
    participant Agg as Aggregation Functions
    participant Chart as Chart Components

    User->>Reviews: Visit /reviews
    Reviews->>Server: Load page data
    Server->>DB: Query queue history
    Server->>DB: Query WIP cycle raw data
    Server->>DB: Query PR size distribution
    DB-->>Server: Return raw data
    Server-->>Client: Pass raw data to client loader
    Client->>Agg: aggregateWeeklyQueueTrend()
    Client->>Agg: aggregateWipCycle()
    Client->>Agg: computeWipLabels()
    Client->>Agg: aggregatePRSize()
    Agg-->>Client: Return aggregated data
    Client-->>Reviews: Provide processed data
    Reviews->>Chart: Render QueueTrendChart
    Reviews->>Chart: Render WipCycleChart
    Reviews->>Chart: Render PRSizeChart
    Chart-->>Reviews: Display visualizations
    Reviews-->>User: Show dashboard with charts
sequenceDiagram
    participant Batch as Batch Job
    participant Fetcher as GitHub Fetcher
    participant Parser as Data Parser
    participant Store as PR Store
    participant Classifier as LLM Classifier
    participant DB as Database
    participant Cache as Cache Service

    Batch->>Fetcher: Fetch PRs from GitHub API
    Fetcher->>Fetcher: Query pullRequests (body, closedAt)
    Fetcher->>Fetcher: Query timelineItems
    Fetcher->>Fetcher: Fetch file changes via REST
    Fetcher-->>Parser: Return shaped PR data
    Parser->>Store: Store raw data
    Store->>DB: Save to github_raw_data
    DB-->>Batch: Confirm storage
    Batch->>Classifier: Classify unclassified PRs
    Classifier->>Classifier: Batch call Gemini API
    Classifier->>Classifier: Parse complexity responses
    Classifier-->>Batch: Return classifications
    Batch->>DB: Update complexity, riskAreas
    DB-->>Batch: Confirm update
    Batch->>Cache: Clear cache
    Cache-->>Batch: Cache cleared

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #135: Both PRs modify the pull-request detail route and server queries, affecting the same repository settings layout files and loader/action behavior with significant overlaps in data fetching logic.
  • PR #130: Both PRs implement the same batch migration from JSON files to tenant database (github_raw_data table, DB-backed store operations, PR timeline/files integration into database), making them directly related at the code level.
  • PR #106: Both PRs make overlapping changes to batch GitHub provider, data fetcher, and analyze/upsert pipeline with PR shape extensions (timeline items, closure dates, reviewer requestedAt), resulting in significant code-level intersection.

Poem

🐰 Charts bloom bright with data's song,
Gemini whispers what's right and wrong,
Timelines flow and sizes align,
A dashboard review, so fine!
Cache clears swift—now all will shine.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 27.16%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the PR title 'feat: review bottleneck dashboard with LLM classification' directly and clearly describes the main feature being added: a review bottleneck dashboard that incorporates LLM-based classification.


coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
app/routes/$orgSlug/settings/data-management/index.tsx (1)

88-123: ⚠️ Potential issue | 🟠 Major

Run cache invalidation in finally to avoid stale reads after partial failure.

upsertPullRequest can partially persist before an exception. Since clearAllCache() is only in the success path, dashboard cache may remain stale on failure.

🔧 Suggested change
       try {
         const { pulls, reviewResponses } = await provider.analyze(
           orgContext.id,
           organization.organizationSetting,
           organization.repositories,
         )

         for (const pr of pulls) {
           await upsertPullRequest(orgContext.id, pr)
         }

         if (organization.exportSetting) {
           const exporter = createSpreadsheetExporter(organization.exportSetting)
           await exporter.exportPulls(pulls)
           await exporter.exportReviewResponses(reviewResponses)
         }

-        clearAllCache()
         return data({
           intent: 'recalculate' as const,
           ok: true,
           message: `Recalculation completed. ${pulls.length} PRs updated.`,
         })
       } catch (e) {
         console.error(
           'Recalculation failed:',
           e instanceof Error ? e.message : 'Unknown error',
         )
         return data(
           {
             intent: 'recalculate' as const,
             error: 'Recalculation failed. Please try again later.',
           },
           { status: 500 },
         )
+      } finally {
+        clearAllCache()
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/settings/data-management/index.tsx around lines 88 -
123, The cache invalidation is currently only executed on the success path which
can leave the dashboard stale if upsertPullRequest partially persists then an
exception is thrown; move the clearAllCache() call into a finally block that
always runs after the try/catch (and remove the existing clearAllCache() from
the success path) so provider.analyze/upsertPullRequest persistence and
exporter.export* side-effects are followed by guaranteed cache invalidation
regardless of error; keep the existing return values from the try and catch
branches unchanged.
🧹 Nitpick comments (11)
batch/usecases/analyze-and-upsert.ts (1)

90-92: Consider wrapping classification in try-catch to ensure export step runs.

If classifyPullRequests throws an unexpected error (e.g., network failure to Gemini API), the export step (step 7) will be skipped. Since classification is optional and the export may be important for downstream workflows, consider isolating failures:

♻️ Proposed defensive wrapper
   // 6. classify PRs with LLM (optional, requires GEMINI_API_KEY)
-  await classifyPullRequests(orgId)
+  try {
+    await classifyPullRequests(orgId)
+  } catch (e) {
+    logger.error('classification failed, continuing with export...', orgId, e)
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/usecases/analyze-and-upsert.ts` around lines 90 - 92, The call to
classifyPullRequests should be wrapped in a try-catch so an unexpected error
(e.g., Gemini/network failure) doesn't prevent the subsequent export step (step
7) from running; update the code around classifyPullRequests(orgId) to catch any
thrown error, log it with context (including the orgId and error message) via
the existing logger, and continue execution (do not rethrow) so the export step
still executes.
app/routes/$orgSlug/reviews/+functions/classify.test.ts (2)

5-17: Add threshold-edge cases for bucket boundaries.

The table currently misses exact cutovers (10/11, 50/51, 200/201, 500/501), so off-by-one regressions could slip through.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+functions/classify.test.ts around lines 5 - 17,
Tests for classifyPRSize are missing exact boundary cases, which allows
off-by-one bugs at bucket cutoffs; add parameterized cases calling
classifyPRSize for the precise thresholds around each boundary (e.g.,
additions/deletions pairs that sum to or cross the cutovers: 10 and 11, 50 and
51, 200 and 201, 500 and 501, including null cases) and assert the expected size
symbol for each so the function's bucket logic (classifyPRSize) is validated at
the edges.

2-2: Use ~/ alias for in-app import paths.

Line 2 imports from app/ via ./classify; switch to the repo-standard alias path.

As per coding guidelines: Use path alias ~/ prefix for imports from app/ directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+functions/classify.test.ts at line 2, The test
imports classifyPRSize and getPRComplexity from a relative module ("./classify")
instead of using the repo-standard in-app alias; update the import so it still
imports classifyPRSize and getPRComplexity but uses the application path alias
(module path prefixed with '~/') that points to the classify module in this
route, ensuring the test follows the project's ~/ alias convention.
app/routes/$orgSlug/reviews/+functions/aggregate.test.ts (2)

2-2: Use ~/ alias for app imports here too.

Line 2 should use the route alias instead of a relative path.

As per coding guidelines: Use path alias ~/ prefix for imports from app/ directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+functions/aggregate.test.ts at line 2, The test
uses a relative import for the functions; replace the './aggregate' import with
the app alias form (starting with '~/') so the file imports aggregatePRSize and
aggregateWipCycle via the ~/ path alias instead of a relative path; update the
import statement that brings in aggregatePRSize and aggregateWipCycle to use the
'~/...' route alias that maps to the app/ directory.

5-29: Assert exact grouped output, not only > 0 conditions.

The current checks can pass even if grouping/median math regresses. Please assert expected labels/counts/median for this fixture.

app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx (2)

23-24: Switch function imports to ~/ alias paths.

These imports target modules under app/ and should use aliased paths instead of relative traversal.

As per coding guidelines: Use path alias ~/ prefix for imports from app/ directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+components/pr-size-chart.tsx around lines 23 -
24, Update the two type imports in pr-size-chart.tsx to use the project path
alias instead of relative paths: replace the relative imports of
PRSizeAggregation and PRSizeLabel with aliased imports from
~/routes/$orgSlug/reviews/+functions/aggregate and
~/routes/$orgSlug/reviews/+functions/classify (referencing the type names
PRSizeAggregation and PRSizeLabel) so they import from the app/ directory via
the ~/ prefix per the coding guidelines.

45-49: Extract formatHours into a shared chart helper.

This formatter duplicates app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx (Line 32-36). Centralizing it avoids divergence across tooltips/axes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+components/pr-size-chart.tsx around lines 45 -
49, The formatHours function is duplicated across components (formatHours in
pr-size-chart.tsx and the same logic in wip-cycle-chart.tsx); extract it into a
shared chart helper (e.g., a new exported function formatHours in a shared
module like app/utils/chart-formatters or app/components/chartHelpers) and
replace the local implementations in both pr-size-chart.tsx and
wip-cycle-chart.tsx with imports from that helper; ensure the new helper exports
the same function name formatHours and update imports in both files so
tooltips/axes use the centralized formatter.
app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx (1)

23-23: Use ~/ alias for this app import.

Line 23 should follow the same alias rule for imports under app/.

As per coding guidelines: Use path alias ~/ prefix for imports from app/ directory.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+components/wip-cycle-chart.tsx at line 23, The
import in wip-cycle-chart.tsx uses a relative path for WipAggregation; replace
the relative import "import type { WipAggregation } from
'../+functions/aggregate'" with the app-root alias form so it reads from the app
root (e.g. import type { WipAggregation } from
'~/routes/$orgSlug/reviews/+functions/aggregate'), ensuring the WipAggregation
type is resolved via the ~/ alias per the coding guidelines.
app/routes/$orgSlug/reviews/index.tsx (2)

44-45: Consider validating period against allowed values.

If a user manually enters an invalid or excessively large period value (e.g., ?period=abc or ?period=9999), Number() will return NaN or an out-of-range value. This could cause unexpected date calculations and cache key collisions (for NaN).

Suggested fix
  const teamParam = url.searchParams.get('team')
- const periodMonths = Number(url.searchParams.get('period') || '3')
+ const periodParam = url.searchParams.get('period')
+ const validPeriods = [1, 3, 6, 12]
+ const periodMonths = validPeriods.includes(Number(periodParam))
+   ? Number(periodParam)
+   : 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/index.tsx around lines 44 - 45, Validate and
sanitize the period query value returned by url.searchParams.get('period')
before using it as periodMonths: parse it as an integer (e.g., via parseInt or
Number), treat NaN as the default (3), constrain/clamp the parsed value to an
acceptable range (e.g., min 1, max a reasonable upper bound such as 12 or a
defined MAX_PERIOD constant), and use that sanitized value for periodMonths to
avoid NaN and out-of-range values and prevent cache key collisions (update any
code that relies on periodMonths accordingly, referencing periodMonths and
url.searchParams.get('period')).

21-21: Use ~/ path alias for imports from app/ directory.

This relative import violates the coding guideline. Consider using the absolute path alias for consistency.

Suggested fix
-import { listTeams } from '../settings/teams._index/queries.server'
+import { listTeams } from '~/routes/$orgSlug/settings/teams._index/queries.server'

As per coding guidelines: "Use path alias ~/ prefix for imports from app/ directory".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/index.tsx at line 21, The import of listTeams
uses a relative path which violates the guideline; update the import in
app/routes/$orgSlug/reviews/index.tsx to use the `~/` path alias (e.g. replace
"../settings/teams._index/queries.server" with the equivalent starting with
"~/") so the listTeams import uses an aliased path; ensure the import statement
referencing listTeams is updated accordingly.
app/routes/$orgSlug/reviews/+functions/aggregate.ts (1)

46-61: O(n²) complexity may cause client-side lag for large datasets.

For each PR in the filtered set, the inner filter (lines 51-59) iterates through all data to count concurrent WIP. With N PRs, this is O(N²). Over a 12-month period with thousands of PRs, this could cause noticeable UI jank in the client.

Consider pre-sorting by author and using a more efficient approach if performance becomes an issue:

Alternative O(n log n) approach sketch
// Group PRs by author first, then use sorted intervals
const byAuthor = Map.groupBy(data, (pr) => pr.author)
const prsWithWip = data
  .filter((d) => d.reviewTime !== null && d.reviewTime > 0)
  .map((pr) => {
    const authorPRs = byAuthor.get(pr.author) ?? []
    // Count overlapping intervals for this author at pr.pullRequestCreatedAt
    const wipCount = authorPRs.filter(
      (other) =>
        other.pullRequestCreatedAt <= pr.pullRequestCreatedAt &&
        (other.mergedAt === null || other.mergedAt > pr.pullRequestCreatedAt) &&
        (other.number !== pr.number || other.repositoryId !== pr.repositoryId),
    ).length
    return { wipCount, reviewTimeHours: (pr.reviewTime as number) * 24 }
  })

This reduces the search space per PR to only that author's PRs. For interval counting, a sweep-line algorithm could further reduce to O(n log n) if needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+functions/aggregate.ts around lines 46 - 61,
The prsWithWip computation inside aggregateWipCycle is O(n²) because for each PR
it scans all rows; refactor to group rows by author first (e.g., build a Map
keyed by author with arrays of that author’s PRs) and then compute wipCount only
against that author's PR list (optionally sort each list by pullRequestCreatedAt
and use binary search or a sweep-line to count overlapping intervals) so
prsWithWip only iterates within the author’s subset rather than the entire data
set.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/services/type.ts`:
- Around line 12-21: The manually added _PrismaMigrations interface in
app/services/type.ts and its corresponding DB mapping entry should be removed
because there is no _prisma_migrations table in the schema; either delete the
_PrismaMigrations interface declaration and the DB mapping entry (the generated
types for this table) from the file, then regenerate types by running pnpm
db:generate to ensure generated types match the database, or alternatively add
the missing _prisma_migrations table to your database schema/migrations and
re-run pnpm db:generate so the interface is produced legitimately.

In `@batch/jobs/crawl.ts`:
- Around line 79-81: Wrap the main crawl work in a try/finally so
clearAllCache() always runs even if an error is thrown; move the existing
logger.info('crawl completed.') into the success path (end of try) and invoke
clearAllCache() from the finally block (keeping a separate logger call for
cache-clear success/failure if desired). Specifically, locate the crawl job's
main loop/function and ensure clearAllCache() is removed from the normal linear
flow and instead called unconditionally in finally while preserving error
propagation.

In `@batch/lib/llm-classify.ts`:
- Around line 21-29: The PR map key uses only PR.number which can collide across
repositories; update the PRInput interface in llm-classify.ts to include
repositoryId (string) and change all places that store/read from result.results
to use a composite key like `${pr.repositoryId}#${pr.number}` (e.g., when
inserting and when retrieving with result.results.get) so keys are globally
unique; also update any related code paths (including the classify-pull-requests
retrieval logic) that construct or expect the old numeric-only key to
build/consume the new composite key.
- Around line 149-214: Add Vitest unit tests for batchClassifyPRs covering
concurrency, failure accounting, and billing math: mock the GoogleGenAI or stub
classifySinglePR to produce controlled successes and failures so you can assert
successCount/errorCount, that results Map keys are unique per PR.number, and
that totalUsage and estimatedCost reflect accumulated token usage; include tests
for batching behavior (use small concurrency like 2 with a larger PR array and
delays stubbed or zero) and for handling thrown errors to ensure logger path
increments errorCount without stopping the batch. Target the batchClassifyPRs
function, mock or replace classifySinglePR/GoogleGenAI responses to return
predictable usage objects, and assert final return values (results, totalUsage,
estimatedCost, successCount, errorCount) for both all-success and
mixed-success/failure scenarios.

In `@batch/provider/github/fetcher.ts`:
- Around line 41-47: The files field in the GraphQL queries is missing
pagination metadata and handling; update each files(first: 100) block to include
pageInfo { hasNextPage, endCursor } and implement a pagination loop in the
pull-request fetcher (e.g., the function that builds pr.files / fetchPullRequest
or getPullRequestFiles) to repeatedly query using after: endCursor until
hasNextPage is false, aggregating nodes into pr.files; add a boolean
needsMoreFiles on the PR object (set true if any page reports hasNextPage) to
mirror the other needsMore* flags and ensure batch/lib/llm-classify.ts uses
pr.needsMoreFiles when deciding about truncation.

In `@batch/usecases/classify-pull-requests.ts`:
- Around line 63-80: The mapping over unclassifiedPRs currently calls JSON.parse
on pr.rawPullRequest without protection, so a malformed payload will throw and
abort the batch; wrap the parse in a try/catch inside the prInputs mapping (or
extract to a small helper used by the map) and on parse error return a safe
fallback shaped object (e.g., {} or minimal ShapedGitHubPullRequest) and/or set
files to an empty array, and optionally log the error including
pr.number/repositoryId; update the returned object in the map to use the safely
parsed raw (referenced as rawPullRequest / raw / ShapedGitHubPullRequest and the
prInputs array) so a single bad payload cannot fail the whole run.

In `@docs/review-bottleneck-dashboard.md`:
- Around line 26-27: The architecture note claiming "no new batch/schema
changes" is outdated because this PR introduces LLM batch classification and
adds new classification columns to the pull_requests table; update the document
text (lines referencing the claim) to reflect that the implementation includes
an LLM batch classification step and schema additions (new classification
columns on pull_requests) so the doc accurately describes the current design and
remove or modify the "no new batch/schema changes" statement accordingly.
- Around line 53-56: Section B's description and diagram still describe a
scatter plot but the implementation renders grouped bars (WIP vs review time);
update the prose and ASCII diagram in Section B to describe grouped bar charts
showing median review time per WIP bucket (e.g., WIP=1, WIP=3+) and indicate
relative multipliers (Z×), and apply the same change to the repeated text block
referenced (the other occurrence around the doc). Locate and edit the "Section
B" chart description and the ASCII-art diagram so they explicitly say "Grouped
bar chart: WIP vs median review_time" (or similar) and replace scatter-plot
wording and labels accordingly.
- Around line 38-65: Add a language identifier (e.g., text) to the fenced code
blocks in docs/review-bottleneck-dashboard.md so markdownlint MD040 is
satisfied: update the ASCII-art dashboard fence (the block starting with the box
drawing "┌────────────") to start with ```text instead of ``` and likewise
change the second fence containing the path fragment
"app/routes/$orgSlug/reviews/" to ```text; apply the same change to the other
occurrence noted around lines 144-156.

---

Outside diff comments:
In `@app/routes/`$orgSlug/settings/data-management/index.tsx:
- Around line 88-123: The cache invalidation is currently only executed on the
success path which can leave the dashboard stale if upsertPullRequest partially
persists then an exception is thrown; move the clearAllCache() call into a
finally block that always runs after the try/catch (and remove the existing
clearAllCache() from the success path) so provider.analyze/upsertPullRequest
persistence and exporter.export* side-effects are followed by guaranteed cache
invalidation regardless of error; keep the existing return values from the try
and catch branches unchanged.

---

Nitpick comments:
In `@app/routes/`$orgSlug/reviews/+components/pr-size-chart.tsx:
- Around line 23-24: Update the two type imports in pr-size-chart.tsx to use the
project path alias instead of relative paths: replace the relative imports of
PRSizeAggregation and PRSizeLabel with aliased imports from
~/routes/$orgSlug/reviews/+functions/aggregate and
~/routes/$orgSlug/reviews/+functions/classify (referencing the type names
PRSizeAggregation and PRSizeLabel) so they import from the app/ directory via
the ~/ prefix per the coding guidelines.
- Around line 45-49: The formatHours function is duplicated across components
(formatHours in pr-size-chart.tsx and the same logic in wip-cycle-chart.tsx);
extract it into a shared chart helper (e.g., a new exported function formatHours
in a shared module like app/utils/chart-formatters or
app/components/chartHelpers) and replace the local implementations in both
pr-size-chart.tsx and wip-cycle-chart.tsx with imports from that helper; ensure
the new helper exports the same function name formatHours and update imports in
both files so tooltips/axes use the centralized formatter.
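A plausible shared helper, assuming the suggested `app/utils/chart-formatters` location; the thresholds here are illustrative and the duplicated logic actually in the two chart components may differ:

```typescript
// Hypothetical centralized formatter for tooltip/axis labels.
export function formatHours(hours: number): string {
  if (hours >= 24) return `${(hours / 24).toFixed(1)}d` // a day or more, as days
  if (hours >= 1) return `${hours.toFixed(1)}h`
  return `${Math.round(hours * 60)}m` // sub-hour values as minutes
}
```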

In `@app/routes/`$orgSlug/reviews/+components/wip-cycle-chart.tsx:
- Line 23: The import in wip-cycle-chart.tsx uses a relative path for
WipAggregation; replace the relative import "import type { WipAggregation } from
'../+functions/aggregate'" with the app-root alias form so it reads from the app
root (e.g. import type { WipAggregation } from
'~/routes/$orgSlug/reviews/+functions/aggregate'), ensuring the WipAggregation
type is resolved via the ~/ alias per the coding guidelines.

In `@app/routes/`$orgSlug/reviews/+functions/aggregate.test.ts:
- Line 2: The test uses a relative import for the functions; replace the
'./aggregate' import with the app alias form (starting with '~/') so the file
imports aggregatePRSize and aggregateWipCycle via the ~/ path alias instead of a
relative path; update the import statement that brings in aggregatePRSize and
aggregateWipCycle to use the '~/...' route alias that maps to the app/
directory.

In `@app/routes/`$orgSlug/reviews/+functions/aggregate.ts:
- Around line 46-61: The prsWithWip computation inside aggregateWipCycle is
O(n²) because for each PR it scans all rows; refactor to group rows by author
first (e.g., build a Map keyed by author with arrays of that author’s PRs) and
then compute wipCount only against that author's PR list (optionally sort each
list by pullRequestCreatedAt and use binary search or a sweep-line to count
overlapping intervals) so prsWithWip only iterates within the author’s subset
rather than the entire data set.
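The grouping step can be sketched as below; `WipRow` mirrors the field names used in the review text, and the inner scan is still quadratic but only within one author's PRs:

```typescript
interface WipRow {
  author: string
  pullRequestCreatedAt: string // ISO timestamps compare correctly as strings
  mergedAt: string | null
}

// For each PR, count how many of the same author's other PRs were open
// (created earlier, not yet merged) at the moment this PR was created.
function wipCountsByAuthor(rows: WipRow[]): Map<WipRow, number> {
  const byAuthor = new Map<string, WipRow[]>()
  for (const row of rows) {
    const list = byAuthor.get(row.author) ?? []
    list.push(row)
    byAuthor.set(row.author, list)
  }
  const counts = new Map<WipRow, number>()
  for (const list of byAuthor.values()) {
    for (const pr of list) {
      counts.set(
        pr,
        list.filter(
          (other) =>
            other !== pr &&
            other.pullRequestCreatedAt <= pr.pullRequestCreatedAt &&
            (other.mergedAt === null || other.mergedAt > pr.pullRequestCreatedAt),
        ).length,
      )
    }
  }
  return counts
}
```

Sorting each author's list and using a sweep, as the review suggests, would tighten this further, but the grouping alone already removes the whole-data-set scan.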

In `@app/routes/`$orgSlug/reviews/+functions/classify.test.ts:
- Around line 5-17: Tests for classifyPRSize are missing exact boundary cases,
which allows off-by-one bugs at bucket cutoffs; add parameterized cases calling
classifyPRSize for the precise thresholds around each boundary (e.g.,
additions/deletions pairs that sum to or cross the cutovers: 10 and 11, 50 and
51, 200 and 201, 500 and 501, including null cases) and assert the expected size
symbol for each so the function's bucket logic (classifyPRSize) is validated at
the edges.
- Line 2: The test imports classifyPRSize and getPRComplexity from a relative
module ("./classify") instead of using the repo-standard in-app alias; update
the import so it still imports classifyPRSize and getPRComplexity but uses the
application path alias (module path prefixed with '~/') that points to the
classify module in this route, ensuring the test follows the project's ~/ alias
convention.
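The boundary cases in the first item above can be sketched table-driven; the thresholds (10/50/200/500) are inferred from the review's list, and the inlined `classifyPRSize` is a stand-in for the real one in `classify.ts`:

```typescript
type PRSize = 'XS' | 'S' | 'M' | 'L' | 'XL'

// Stand-in implementation using the boundary values the review names.
function classifyPRSize(additions: number | null, deletions: number | null): PRSize {
  const total = (additions ?? 0) + (deletions ?? 0)
  if (total <= 10) return 'XS'
  if (total <= 50) return 'S'
  if (total <= 200) return 'M'
  if (total <= 500) return 'L'
  return 'XL'
}

// Exact-boundary cases: each cutoff, the first value past it, and nulls.
const cases: Array<[number | null, number | null, PRSize]> = [
  [10, 0, 'XS'], [11, 0, 'S'],
  [25, 25, 'S'], [26, 25, 'M'],
  [200, 0, 'M'], [201, 0, 'L'],
  [500, 0, 'L'], [250, 251, 'XL'],
  [null, null, 'XS'],
]
for (const [add, del, expected] of cases) {
  const got = classifyPRSize(add, del)
  if (got !== expected) throw new Error(`(${add},${del}): expected ${expected}, got ${got}`)
}
```

In the real suite the same table would feed `it.each` in Vitest rather than a bare loop.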

In `@app/routes/`$orgSlug/reviews/index.tsx:
- Around line 44-45: Validate and sanitize the period query value returned by
url.searchParams.get('period') before using it as periodMonths: parse it as an
integer (e.g., via parseInt or Number), treat NaN as the default (3),
constrain/clamp the parsed value to an acceptable range (e.g., min 1, max a
reasonable upper bound such as 12 or a defined MAX_PERIOD constant), and use
that sanitized value for periodMonths to avoid NaN and out-of-range values and
prevent cache key collisions (update any code that relies on periodMonths
accordingly, referencing periodMonths and url.searchParams.get('period')).
- Line 21: The import of listTeams uses a relative path which violates the
guideline; update the import in app/routes/$orgSlug/reviews/index.tsx to use the
`~/` path alias (e.g. replace "../settings/teams._index/queries.server" with the
equivalent starting with "~/") so the listTeams import uses an aliased path;
ensure the import statement referencing listTeams is updated accordingly.
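The period sanitization described in the first item above can be sketched as follows; `MAX_PERIOD` and the default of 3 come from the review text, not the repository:

```typescript
// Hypothetical sanitizer for url.searchParams.get('period'):
// NaN falls back to the default, everything else is clamped to [1, MAX_PERIOD].
const MAX_PERIOD = 12
const DEFAULT_PERIOD = 3

function sanitizePeriod(raw: string | null): number {
  const parsed = Number.parseInt(raw ?? '', 10)
  if (Number.isNaN(parsed)) return DEFAULT_PERIOD
  return Math.min(Math.max(parsed, 1), MAX_PERIOD)
}
```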

In `@batch/usecases/analyze-and-upsert.ts`:
- Around line 90-92: The call to classifyPullRequests should be wrapped in a
try-catch so an unexpected error (e.g., Gemini/network failure) doesn't prevent
the subsequent export step (step 7) from running; update the code around
classifyPullRequests(orgId) to catch any thrown error, log it with context
(including the orgId and error message) via the existing logger, and continue
execution (do not rethrow) so the export step still executes.
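A sketch of the catch-and-continue flow, with stand-ins for `classifyPullRequests`, the export step, and the logger:

```typescript
// Classification failure (e.g., Gemini/network error) is logged with context
// and swallowed so the export step always runs.
async function runClassifyThenExport(
  classifyPullRequests: (orgId: string) => Promise<void>,
  exportStep: () => Promise<void>,
  orgId: string,
  log: (msg: string) => void,
): Promise<void> {
  try {
    await classifyPullRequests(orgId)
  } catch (err) {
    log(
      `classifyPullRequests failed for org ${orgId}: ${err instanceof Error ? err.message : String(err)}`,
    )
    // do not rethrow
  }
  await exportStep() // step 7 still executes
}
```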

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e381e50 and 7b9407d.

⛔ Files ignored due to path filters (1)
  • db/migrations/tenant/atlas.sum is excluded by !**/*.sum
📒 Files selected for processing (26)
  • app/components/layout/nav-config.ts
  • app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx
  • app/routes/$orgSlug/reviews/+components/reviewer-queue-chart.tsx
  • app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx
  • app/routes/$orgSlug/reviews/+functions/aggregate.test.ts
  • app/routes/$orgSlug/reviews/+functions/aggregate.ts
  • app/routes/$orgSlug/reviews/+functions/classify.test.ts
  • app/routes/$orgSlug/reviews/+functions/classify.ts
  • app/routes/$orgSlug/reviews/+functions/queries.server.ts
  • app/routes/$orgSlug/reviews/index.tsx
  • app/routes/$orgSlug/settings/_index/+forms/organization-settings.tsx
  • app/routes/$orgSlug/settings/data-management/index.tsx
  • app/services/tenant-type.ts
  • app/services/type.ts
  • batch/jobs/crawl.ts
  • batch/lib/llm-classify.ts
  • batch/provider/github/aggregator.test.ts
  • batch/provider/github/fetcher.ts
  • batch/provider/github/model.ts
  • batch/provider/github/pullrequest.ts
  • batch/provider/github/store.test.ts
  • batch/usecases/analyze-and-upsert.ts
  • batch/usecases/classify-pull-requests.ts
  • db/migrations/tenant/20260301035108.sql
  • db/tenant.sql
  • docs/review-bottleneck-dashboard.md

Comment on lines +12 to +21
export interface _PrismaMigrations {
appliedStepsCount: Generated<string>;
checksum: string;
finishedAt: string | null;
id: string;
logs: string | null;
migrationName: string;
rolledBackAt: string | null;
startedAt: Generated<string>;
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== 1. Check for _prisma_migrations table in migrations ==="
find db -name "*.sql" -type f | head -20

echo
echo "=== 2. Search for _prisma_migrations references in migrations ==="
rg "_prisma_migrations|_PrismaMigrations" db/ --type sql || echo "No matches found"

echo
echo "=== 3. Check app/services/type.ts header and footer for codegen markers ==="
head -n 5 app/services/type.ts
echo "---"
tail -n 10 app/services/type.ts

echo
echo "=== 4. Search entire codebase for _PrismaMigrations references ==="
rg "_PrismaMigrations" . --type ts --type tsx -C 2 || echo "No matches found"

echo
echo "=== 5. Check db/schema.sql for prisma migrations table ==="
if [ -f "db/schema.sql" ]; then
  rg -A 10 "_prisma_migrations|CREATE TABLE.*prisma" db/schema.sql || echo "No prisma migrations table found"
else
  echo "db/schema.sql not found"
fi

echo
echo "=== 6. List migration files ==="
ls -la db/migrations/ 2>/dev/null | head -20 || echo "db/migrations not accessible"

Repository: coji/upflow

Length of output: 1780


Remove manual _PrismaMigrations interface; it has no corresponding table in the database schema.

This file is auto-generated by kysely-codegen and should not be edited manually. The _PrismaMigrations interface (lines 12–21) and its DB mapping entry (line 136) lack a corresponding _prisma_migrations table in the schema or any migrations. This causes schema drift between generated types and the actual database. Either remove these additions or add the table to the schema and regenerate types via pnpm db:generate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/type.ts` around lines 12 - 21, The manually added
_PrismaMigrations interface in app/services/type.ts and its corresponding DB
mapping entry should be removed because there is no _prisma_migrations table in
the schema; either delete the _PrismaMigrations interface declaration and the DB
mapping entry (the generated types for this table) from the file, then
regenerate types by running pnpm db:generate to ensure generated types match the
database, or alternatively add the missing _prisma_migrations table to your
database schema/migrations and re-run pnpm db:generate so the interface is
produced legitimately.

Comment on lines +149 to +214
export async function batchClassifyPRs(
prs: PRInput[],
options?: {
apiKey?: string
model?: string
concurrency?: number
delayMs?: number
},
): Promise<BatchClassifyResult> {
const apiKey = options?.apiKey ?? process.env.GEMINI_API_KEY
if (!apiKey) {
throw new Error(
'GEMINI_API_KEY is required. Get one at https://aistudio.google.com/apikey',
)
}

const ai = new GoogleGenAI({ apiKey })
const model = options?.model ?? DEFAULT_MODEL
const concurrency = options?.concurrency ?? 5
const delayMs = options?.delayMs ?? 200

const results = new Map<string, LLMClassification>()
const totalUsage: TokenUsage = {
promptTokens: 0,
candidatesTokens: 0,
totalTokens: 0,
}
let successCount = 0
let errorCount = 0

for (let i = 0; i < prs.length; i += concurrency) {
const batch = prs.slice(i, i + concurrency)
const promises = batch.map(async (pr) => {
const key = `${pr.number}`
try {
const result = await classifySinglePR(ai, model, pr)
results.set(key, result.classification)
totalUsage.promptTokens += result.usage.promptTokens
totalUsage.candidatesTokens += result.usage.candidatesTokens
totalUsage.totalTokens += result.usage.totalTokens
successCount++
} catch (err) {
logger.warn(
`Failed to classify PR #${pr.number}: ${err instanceof Error ? err.message : String(err)}`,
)
errorCount++
}
})
await Promise.all(promises)

logger.info(
`Classified ${Math.min(i + concurrency, prs.length)}/${prs.length} PRs`,
)

if (i + concurrency < prs.length) {
await new Promise((resolve) => setTimeout(resolve, delayMs))
}
}

const cost = estimateCost(totalUsage, model)
logger.info(
`Classification complete: ${successCount} success, ${errorCount} errors, cost $${cost.toFixed(4)}`,
)

return { results, totalUsage, estimatedCost: cost, successCount, errorCount }
}

⚠️ Potential issue | 🟠 Major

Add Vitest coverage for batch control-flow and accounting.

This module now handles concurrency, failure accounting, and billing math, but no tests were added here. Please add focused unit tests for success/failure paths, usage accumulation, and key uniqueness.

Based on learnings: Added or changed logic must have unit tests and Use Vitest for unit tests and Playwright for E2E tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/lib/llm-classify.ts` around lines 149 - 214, Add Vitest unit tests for
batchClassifyPRs covering concurrency, failure accounting, and billing math:
mock the GoogleGenAI or stub classifySinglePR to produce controlled successes
and failures so you can assert successCount/errorCount, that results Map keys
are unique per PR.number, and that totalUsage and estimatedCost reflect
accumulated token usage; include tests for batching behavior (use small
concurrency like 2 with a larger PR array and delays stubbed or zero) and for
handling thrown errors to ensure logger path increments errorCount without
stopping the batch. Target the batchClassifyPRs function, mock or replace
classifySinglePR/GoogleGenAI responses to return predictable usage objects, and
assert final return values (results, totalUsage, estimatedCost, successCount,
errorCount) for both all-success and mixed-success/failure scenarios.

Comment on lines +41 to +47
files(first: 100) {
nodes {
path
additions
deletions
}
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Inspect files query blocks for pagination metadata:"
rg -n "files\(first:\s*100\)|pageInfo" batch/provider/github/fetcher.ts -C3

echo
echo "2) Check downstream usage of pr.files and any truncation flag handling:"
rg -n "\bpr\.files\b|needsMoreFiles|hasNextPage" batch app -g '!**/*.test.ts' -C2

Repository: coji/upflow

Length of output: 20967


🏁 Script executed:

#!/bin/bash
# Check the specific line ranges mentioned
echo "Lines 473-480:"
sed -n '473,480p' batch/provider/github/fetcher.ts | cat -n

echo
echo "Lines 809-816:"
sed -n '809,816p' batch/provider/github/fetcher.ts | cat -n

echo
echo "Checking if files query appears in any of these ranges:"
sed -n '470,485p' batch/provider/github/fetcher.ts | grep -n "files\|pageInfo"
echo "---"
sed -n '805,820p' batch/provider/github/fetcher.ts | grep -n "files\|pageInfo"

Repository: coji/upflow

Length of output: 941


files(first: 100) lacks pagination state unlike all other paginated fields.

The GraphQL queries request files(first: 100) without exposing pageInfo.hasNextPage, making it impossible for downstream code to detect truncation for PRs over 100 files. All other paginated fields (commits, reviews, comments, reviewThreads) include pagination metadata and expose corresponding needsMore* flags, but files does not. The classification logic in batch/lib/llm-classify.ts uses pr.files directly without truncation checks.

Add pageInfo { hasNextPage } to the files query blocks, implement pagination loops to fetch all file pages, and expose a needsMoreFiles flag to match the pattern used for other paginated data.

Also applies to: 234-240, 473-480, 809-816

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/provider/github/fetcher.ts` around lines 41 - 47, The files field in
the GraphQL queries is missing pagination metadata and handling; update each
files(first: 100) block to include pageInfo { hasNextPage, endCursor } and
implement a pagination loop in the pull-request fetcher (e.g., the function that
builds pr.files / fetchPullRequest or getPullRequestFiles) to repeatedly query
using after: endCursor until hasNextPage is false, aggregating nodes into
pr.files; add a boolean needsMoreFiles on the PR object (set true if any page
reports hasNextPage) to mirror the other needsMore* flags and ensure
batch/lib/llm-classify.ts uses pr.needsMoreFiles when deciding about truncation.
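The cursor loop could look like the following sketch; `fetchPage` stands in for a GraphQL call with `files(first: 100, after: $cursor)`, and `maxPages` is an assumed safety cap rather than something from the repository:

```typescript
interface FilePage {
  nodes: Array<{ path: string; additions: number; deletions: number }>
  pageInfo: { hasNextPage: boolean; endCursor: string | null }
}

// Follow endCursor until hasNextPage is false; if the safety cap is hit,
// report needsMoreFiles so llm-classify.ts can treat pr.files as truncated.
async function fetchAllFiles(
  fetchPage: (after: string | null) => Promise<FilePage>,
  maxPages = 10,
): Promise<{ files: FilePage['nodes']; needsMoreFiles: boolean }> {
  const files: FilePage['nodes'] = []
  let after: string | null = null
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(after)
    files.push(...page.nodes)
    if (!page.pageInfo.hasNextPage) return { files, needsMoreFiles: false }
    after = page.pageInfo.endCursor
  }
  return { files, needsMoreFiles: true }
}
```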

- Use composite key (repositoryId#number) for PR classification results
  to prevent cross-repo collisions
- Wrap crawl job in try/finally to guarantee cache clear on failure
- Guard JSON.parse in classify-pull-requests to skip malformed payloads
- Update docs to reflect grouped bar chart and LLM classification

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
docs/review-bottleneck-dashboard.md (2)

142-155: ⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block (MD040).

The file structure fence should specify text as the language identifier to satisfy markdownlint.

🛠️ Proposed fix
-```
+```text
 app/routes/$orgSlug/reviews/
 ├── index.tsx                    # ページ + loader + clientLoader(集計処理)
 ├── +functions/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/review-bottleneck-dashboard.md` around lines 142 - 155, The fenced code
block containing the file tree (starting with the line
"app/routes/$orgSlug/reviews/") is missing a language identifier; update the
opening fence to include "text" (i.e., change the opening ``` to ```text) so
markdownlint MD040 is satisfied and the file-tree block is marked as plain text.

38-64: ⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block (MD040).

The ASCII diagram fence should specify text as the language identifier to satisfy markdownlint.

🛠️ Proposed fix
-```
+```text
 ┌────────────┬──────────────────────────────────────────────┐
 │ Sidebar    │                                              │
 │            │  Review Bottleneck          [Team▼] [3M▼]   │
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/review-bottleneck-dashboard.md` around lines 38 - 64, The fenced code
block containing the ASCII diagram is missing a language identifier; update the
opening fence (the triple backticks before the ASCII art in the Review
Bottleneck diagram) to include the language tag "text" (i.e., change ``` to
```text) so markdownlint rule MD040 is satisfied and the block is explicitly
marked as plain text.
🧹 Nitpick comments (1)
batch/usecases/classify-pull-requests.ts (1)

96-112: Consider batching DB updates for large PR sets.

Sequential await in the loop issues one update per PR. For organizations with many unclassified PRs, this could be slow.

♻️ Optional: batch updates in a transaction
+  await tenantDb.transaction().execute(async (trx) => {
     for (const pr of prInputs) {
       const classification = result.results.get(`${pr.repositoryId}#${pr.number}`)
       if (!classification) continue

-      await tenantDb
+      await trx
         .updateTable('pullRequests')
         .set({
           complexity: classification.complexity,
           complexityReason: classification.reason,
           riskAreas: JSON.stringify(classification.risk_areas),
           classifiedAt: now,
           classifierModel: 'gemini-2.5-flash-lite',
         })
         .where('number', '=', pr.number)
         .where('repositoryId', '=', pr.repositoryId)
         .execute()
     }
+  })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/usecases/classify-pull-requests.ts` around lines 96 - 112, The loop
currently awaits each DB update serially (prInputs and the
tenantDb.updateTable('pullRequests') block), which is slow for many PRs; instead
collect updates and run them in controlled batches or in a transaction: for each
pr in prInputs build the update payload (use classification from
result.results.get(`${pr.repositoryId}#${pr.number}`)), then either 1) group
payloads into chunks and execute parallel update promises per chunk
(Promise.all) to limit concurrency, or 2) open a tenantDb transaction and
perform the updates inside it (using the same updateTable('pullRequests')
.set(...).where(...).execute() calls) to reduce round-trips; ensure you preserve
fields complexity, complexityReason, riskAreas (JSON.stringify), classifiedAt,
and classifierModel when constructing the batched updates and handle errors
per-batch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/review-bottleneck-dashboard.md`:
- Around line 142-155: The fenced code block containing the file tree (starting
with the line "app/routes/$orgSlug/reviews/") is missing a language identifier;
update the opening fence to include "text" (i.e., change the opening ``` to
```text) so markdownlint MD040 is satisfied and the file-tree block is marked as
plain text.
- Around line 38-64: The fenced code block containing the ASCII diagram is
missing a language identifier; update the opening fence (the triple backticks
before the ASCII art in the Review Bottleneck diagram) to include the language
tag "text" (i.e., change ``` to ```text) so markdownlint rule MD040 is satisfied
and the block is explicitly marked as plain text.

---

Nitpick comments:
In `@batch/usecases/classify-pull-requests.ts`:
- Around line 96-112: The loop currently awaits each DB update serially
(prInputs and the tenantDb.updateTable('pullRequests') block), which is slow for
many PRs; instead collect updates and run them in controlled batches or in a
transaction: for each pr in prInputs build the update payload (use
classification from result.results.get(`${pr.repositoryId}#${pr.number}`)), then
either 1) group payloads into chunks and execute parallel update promises per
chunk (Promise.all) to limit concurrency, or 2) open a tenantDb transaction and
perform the updates inside it (using the same updateTable('pullRequests')
.set(...).where(...).execute() calls) to reduce round-trips; ensure you preserve
fields complexity, complexityReason, riskAreas (JSON.stringify), classifiedAt,
and classifierModel when constructing the batched updates and handle errors
per-batch.


📥 Commits

Reviewing files that changed from the base of the PR and between 7b9407d and a91adc0.

📒 Files selected for processing (4)
  • batch/jobs/crawl.ts
  • batch/lib/llm-classify.ts
  • batch/usecases/classify-pull-requests.ts
  • docs/review-bottleneck-dashboard.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • batch/jobs/crawl.ts

Click any chart bar to open a side panel showing the individual PRs
behind that data point, with direct links to GitHub. All filtering
happens client-side for instant response.

- Add PR detail fields (url, repo, title) to WIP and size queries
- Add getReviewerQueuePRs query for reviewer drill-down
- Create shared PRDrillDownSheet component
- Add onClick handlers to all 3 chart components
- Add computeWipLabels helper for WIP drill-down filtering
- Validate periodMonths against allowed values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
app/routes/$orgSlug/reviews/+functions/aggregate.ts (1)

49-64: Avoid recomputing WIP with duplicated O(n²) scans.

aggregateWipCycle and computeWipLabels each rerun the same nested overlap predicate per PR. This is costly in client-side execution and risks chart/drill-down divergence over time.

♻️ Suggested consolidation
+function buildWipRows(data: WipRawRow[]) {
+  return data
+    .filter((d) => d.reviewTime != null && d.reviewTime > 0)
+    .map((pr) => {
+      const wipCount = data.filter(
+        (other) =>
+          other.author === pr.author &&
+          other.pullRequestCreatedAt <= pr.pullRequestCreatedAt &&
+          (other.mergedAt === null || other.mergedAt > pr.pullRequestCreatedAt) &&
+          (other.number !== pr.number || other.repositoryId !== pr.repositoryId),
+      ).length
+      return { pr, wipCount }
+    })
+}
+
 export function aggregateWipCycle(data: WipRawRow[]): WipAggregation {
-  const prsWithWip = data
-    .filter((d) => d.reviewTime !== null && d.reviewTime > 0)
-    .map((pr) => {
-      const wipCount = data.filter(
-        (other) =>
-          other.author === pr.author &&
-          other.pullRequestCreatedAt <= pr.pullRequestCreatedAt &&
-          (other.mergedAt === null ||
-            other.mergedAt > pr.pullRequestCreatedAt) &&
-          (other.number !== pr.number ||
-            other.repositoryId !== pr.repositoryId),
-      ).length
-      return { wipCount, reviewTimeHours: (pr.reviewTime as number) * 24 }
-    })
+  const prsWithWip = buildWipRows(data).map(({ pr, wipCount }) => ({
+    wipCount,
+    reviewTimeHours: (pr.reviewTime as number) * 24,
+  }))
   ...
 }
 
 export function computeWipLabels(
   data: WipRawRow[],
 ): (WipRawRow & { wipLabel: string })[] {
-  return data
-    .filter((d) => d.reviewTime !== null && d.reviewTime > 0)
-    .map((pr) => {
-      const wipCount = data.filter(
-        (other) =>
-          other.author === pr.author &&
-          other.pullRequestCreatedAt <= pr.pullRequestCreatedAt &&
-          (other.mergedAt === null ||
-            other.mergedAt > pr.pullRequestCreatedAt) &&
-          (other.number !== pr.number ||
-            other.repositoryId !== pr.repositoryId),
-      ).length
-      return { ...pr, wipLabel: getWipLabel(wipCount) }
-    })
+  return buildWipRows(data).map(({ pr, wipCount }) => ({
+    ...pr,
+    wipLabel: getWipLabel(wipCount),
+  }))
 }

Also applies to: 111-127

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/`$orgSlug/reviews/+functions/aggregate.ts around lines 49 - 64,
aggregateWipCycle (and computeWipLabels) currently recompute the same O(n²)
overlap predicate per PR; instead, group rows by author, sort each author's PRs
by pullRequestCreatedAt, then compute WIP counts with a linear sweep/two-pointer
over that sorted list (advance an "active" pointer to exclude PRs merged before
the current PR's createdAt and count remaining open PRs while excluding the PR
itself), producing the same output shape ({wipCount, reviewTimeHours}) for
aggregateWipCycle and using the same per-author precomputed counts in
computeWipLabels to avoid duplicate work; update aggregateWipCycle to build and
reuse the per-author map and replace the nested data.filter(...) scan with the
linear sweep, and change computeWipLabels to read from that map instead of
recomputing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/routes/`$orgSlug/reviews/+components/wip-cycle-chart.tsx:
- Around line 111-117: The SVG bars (Cell components rendered in groups.map)
only open the PR sheet via onClick, leaving keyboard users unable to trigger
drill-down; make each <Cell> focusable and operable by keyboard by adding
tabIndex={0}, role="button" and an appropriate aria-label (e.g., including
entry.label and count), and implement an onKeyDown handler that calls
setSelectedLabel(entry.label) when Enter or Space is pressed; apply the same
changes to the other Cell instances mentioned (the ones around the 127-130
region) so both mouse and keyboard users can open the PR sheet.

In `@app/routes/`$orgSlug/reviews/+functions/queries.server.ts:
- Around line 126-130: The query currently filters out PRs by requiring
pullRequests.mergedAt to be non-null and >= sinceDate, which removes open or
long-lived PRs needed for accurate WIP reconstruction; update the query that
builds the DB call (the chain using .where('pullRequests.mergedAt', ...) and the
$if(teamId != null, ...)) to stop excluding mergedAt-null or
merged-at-outside-window records — e.g., remove the
.where('pullRequests.mergedAt', 'is not', null) and
.where('pullRequests.mergedAt', '>=', sinceDate) conditions (or replace them
with an inclusive condition that allows mergedAt IS NULL) so the result set
includes open and older PRs while keeping the existing pullRequests.reviewTime
and pullRequests.author filters intact.

---

Nitpick comments:
In `@app/routes/`$orgSlug/reviews/+functions/aggregate.ts:
- Around line 49-64: aggregateWipCycle (and computeWipLabels) currently
recompute the same O(n²) overlap predicate per PR; instead, group rows by
author, sort each author's PRs by pullRequestCreatedAt, then compute WIP counts
with a linear sweep/two-pointer over that sorted list (advance an "active"
pointer to exclude PRs merged before the current PR's createdAt and count
remaining open PRs while excluding the PR itself), producing the same output
shape ({wipCount, reviewTimeHours}) for aggregateWipCycle and using the same
per-author precomputed counts in computeWipLabels to avoid duplicate work;
update aggregateWipCycle to build and reuse the per-author map and replace the
nested data.filter(...) scan with the linear sweep, and change computeWipLabels
to read from that map instead of recomputing.


📥 Commits

Reviewing files that changed from the base of the PR and between a91adc0 and d810169.

📒 Files selected for processing (8)
  • app/routes/$orgSlug/reviews/+components/pr-drill-down-sheet.tsx
  • app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx
  • app/routes/$orgSlug/reviews/+components/reviewer-queue-chart.tsx
  • app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx
  • app/routes/$orgSlug/reviews/+functions/aggregate.test.ts
  • app/routes/$orgSlug/reviews/+functions/aggregate.ts
  • app/routes/$orgSlug/reviews/+functions/queries.server.ts
  • app/routes/$orgSlug/reviews/index.tsx
🚧 Files skipped from review as they are similar to previous changes (3)
  • app/routes/$orgSlug/reviews/+functions/aggregate.test.ts
  • app/routes/$orgSlug/reviews/+components/reviewer-queue-chart.tsx
  • app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx

Comment on lines +111 to +117
{groups.map((entry) => (
<Cell
key={entry.label}
fill={getBarColor(entry.label)}
className="cursor-pointer"
onClick={() => setSelectedLabel(entry.label)}
/>

⚠️ Potential issue | 🟠 Major

Make drill-down accessible without a mouse.

Opening the PR sheet is currently tied to clicking SVG bars only. Keyboard users have no equivalent trigger.

♿ Suggested accessible fallback
-        <div className="text-muted-foreground mt-1 flex justify-around text-xs">
+        <div className="text-muted-foreground mt-1 flex flex-wrap justify-around gap-2 text-xs">
           {groups.map((g) => (
-            <span key={g.label}>n={g.count}</span>
+            <button
+              key={g.label}
+              type="button"
+              className="hover:underline"
+              onClick={() => setSelectedLabel(g.label)}
+              aria-label={`Open PR drill-down for ${g.label}`}
+            >
+              {g.label} · n={g.count}
+            </button>
           ))}
         </div>

Also applies to: 127-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx` around lines 111 - 117, The SVG bars (Cell components rendered in groups.map) only open the PR
sheet via onClick, leaving keyboard users unable to trigger drill-down; make
each <Cell> focusable and operable by keyboard by adding tabIndex={0},
role="button" and an appropriate aria-label (e.g., including entry.label and
count), and implement an onKeyDown handler that calls
setSelectedLabel(entry.label) when Enter or Space is pressed; apply the same
changes to the other Cell instances mentioned (the ones around the 127-130
region) so both mouse and keyboard users can open the PR sheet.

coji and others added 20 commits March 1, 2026 16:45
… handling

- Add Refresh (Re-fetch & Save) button that fetches latest data from GitHub,
  saves raw data, rebuilds cycle time metrics, and upserts to DB
- Split action into intent-based dispatch (compare / refresh) using ts-pattern
- Show processed pullRequestReviews and pullRequestReviewers in PR detail page
- Fix NOT NULL constraint on pullRequestReviewers by filtering null logins
- Add adaptive page size reduction for GitHub GraphQL 502/504 timeouts
- Extract handleGraphQLError helper for consistent retry logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store all PR timeline events (ReviewRequested, ReadyForReview, ConvertToDraft,
Merged, Closed, etc.) as raw JSON in github_raw_data.timeline_items column.
This enables future analysis without re-fetching from GitHub API.

- Add timeline_items TEXT NULL column to github_raw_data
- Extend GraphQL query to fetch 10 event types instead of only ReviewRequestedEvent
- Extract buildRequestedAtMap() as a reusable pure function
- Save timeline items in both incremental and full-refresh fetch paths
- Add timeline items to PR detail page (compare + refresh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
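A minimal sketch of what a pure `buildRequestedAtMap()` helper could look like. The timeline-item shape below is an assumption based on GitHub's GraphQL `ReviewRequestedEvent`; the real crawler's types and field names may differ:

```typescript
// Assumed shape of a raw timeline item; ISO-8601 createdAt means lexicographic
// comparison matches chronological order.
interface TimelineItem {
  __typename: string
  createdAt: string
  requestedReviewer?: { login?: string | null } | null
}

/** Map each reviewer login to the earliest time a review was requested from them. */
function buildRequestedAtMap(items: TimelineItem[]): Map<string, string> {
  const map = new Map<string, string>()
  for (const item of items) {
    if (item.__typename !== 'ReviewRequestedEvent') continue
    const login = item.requestedReviewer?.login
    if (!login) continue
    const prev = map.get(login)
    // Keep the earliest request per reviewer.
    if (!prev || item.createdAt < prev) map.set(login, item.createdAt)
  }
  return map
}
```

Being a pure function over stored `timeline_items`, this can run in the analyze phase without touching fetch, which is what keeps the raw-data capture consistent between incremental and full-refresh paths.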
- Add author avatar, PR size, complexity reason to drill-down sheet
- Replace reviewer-queue-chart with queue-trend-chart for time-series view
- Extend classify/aggregate to include complexity and risk area fields
- Change reviewers type from string[] to {login, requestedAt}[] for richer data
- Update queries to fetch reviewer requested_at and PR classification data
- Expand seed data with realistic review/classification scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub GraphQL API sometimes returns 200 OK with null repository
on deep pagination (timeout). Previously this silently broke the
loop and returned partial data. Now both pullrequests() and
pullrequestsWithDetails() reduce page size and retry, matching the
existing behavior for 502/504 errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
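The reduce-and-retry behavior described in this commit could be sketched as follows. `fetchPage`, the response shape, and the size bounds are illustrative, not the project's actual fetcher API:

```typescript
// A 200 OK response whose repository is null signals a deep-pagination timeout.
interface Page<T> {
  repository: {
    nodes: T[]
    pageInfo: { hasNextPage: boolean; endCursor: string | null }
  } | null
}

async function fetchAllPages<T>(
  fetchPage: (pageSize: number, cursor: string | null) => Promise<Page<T>>,
  initialPageSize = 100,
  minPageSize = 10,
): Promise<T[]> {
  const all: T[] = []
  let pageSize = initialPageSize
  let cursor: string | null = null
  for (;;) {
    const res = await fetchPage(pageSize, cursor)
    if (res.repository === null) {
      if (pageSize <= minPageSize) {
        throw new Error(`repository is null even at page size ${pageSize}`)
      }
      // Halve the page size and retry the SAME cursor instead of
      // silently returning partial data.
      pageSize = Math.max(minPageSize, Math.floor(pageSize / 2))
      continue
    }
    all.push(...res.repository.nodes)
    if (!res.repository.pageInfo.hasNextPage) return all
    cursor = res.repository.pageInfo.endCursor
  }
}
```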
Add pageInfo.hasNextPage to nested timelineItems query and fall back
to individual fetcher.timelineItems() when the 50-item limit is exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove `files(first: 100)` from GraphQL PR list/details queries to
avoid 504 timeouts caused by heavy nested data (100 PRs × 100 files).
Files are now fetched individually via REST API `pulls.listFiles`.
Incremental path parallelizes all 5 detail fetches with Promise.all.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fetch was mutating PR reviewers' requestedAt before saving raw data,
making raw data inconsistent between incremental and full-refresh paths.
Now requestedAt enrichment happens in buildPullRequests (analyze phase)
using timeline_items from the store, keeping fetch as pure data capture.

Also adds docs/crawler-pipeline-architecture.md documenting future
plans for crawler/pipeline scaling (dbt-like layers, job distribution,
Turso migration).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document webhook-driven real-time data sync, PR size labeling on
each push, and the transition from PAT-based polling to GitHub App
event-driven architecture. Includes webhook event matrix, size
classification thresholds, and job queue design for webhook processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move GitHub App/Webhook to Phase 1 (before infrastructure changes)
since it provides immediate user value with zero impact on existing
pipeline. Phase 1a (size labels) can start today - it only needs a
webhook endpoint and GitHub API calls, no DB changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub App installation replaces both PAT management and manual
repository registration. Installation access tokens eliminate personal
token dependency, and installation_repositories webhook keeps repo
list in sync automatically. Onboarding simplifies from 5 steps to 2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Filter out undefined/null logins before upsertCompanyGithubUsers
- Add defensive filter(Boolean) in upsertCompanyGithubUsers itself
- Wrap spreadsheet export in try-catch to prevent process crash
- Include repositoryId in LLM classification failure log messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Retry up to 3 times (1s, 2s, 4s delays) when Gemini API returns
transient errors like 503. Failed PRs remain unclassified and will
be retried on the next upsert run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
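A hedged sketch of this retry policy. `withRetry` is a hypothetical name; the real code wraps the Gemini API call, but `fn` here can be any task prone to transient failures:

```typescript
// Retry up to maxAttempts times with exponential backoff:
// with the defaults, delays are 1s, then 2s (a third failure rethrows).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      }
    }
  }
  throw lastError
}
```

Rethrowing after the last attempt is what leaves failed PRs unclassified, so the next upsert run naturally picks them up again.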
- Add closed_at column to pull_requests table (schema + migration)
- Fetch closedAt from GitHub GraphQL API and propagate through
  model → row builder → upsert pipeline
- Exclude closed (non-merged) PRs from review queue trend aggregation
- Replace nested GraphQL query with per-PR detail fetching for --refresh
  (fixes timeout on large repos)
- Add request timeout (30s) to all GitHub GraphQL calls
- Add WAL checkpoint after batch upsert to prevent WAL bloat
- Fix pull-db to checkpoint remote DBs and pull WAL/SHM files
  (prevents orphan index corruption from incomplete WAL state)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Timeout or transient errors on individual PR detail fetches now log a
warning and skip the PR instead of crashing the entire batch job.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After migrations add new columns, `backfill` + `upsert` can populate
them from GitHub without a full `fetch --refresh`. This re-fetches only
PR list metadata (closedAt, etc.) and updates raw data in seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace synchronous 1-at-a-time API calls with batch prediction for
50% cost reduction and rate limit avoidance. Extract buildPrompt(),
collect all PRs into a single batch job, and poll for completion.

Also fix lefthook lint hook failing on biome-ignored files by adding
--no-errors-on-unmatched flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add gemini-3-prompting-guide.md as reference for LLM prompt authoring.
Add CLAUDE.md instruction to consult it when editing prompts.
Exclude lab/classify/data/ from version control.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add body to ShapedGitHubPullRequest model and both GraphQL queries
so it can be used for classifier context (release/merge PR detection).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…text

Rewrite SYSTEM_INSTRUCTION following Gemini prompting guide. Use XML
tags for PR data, add body/branch info for release/merge PR detection,
and add explicit decision procedure for cognitive load assessment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
evaluate.ts compares golden labels against current classifier output.
test-one.ts runs classification on a single PR for quick iteration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 19

🧹 Nitpick comments (13)
docs/crawler-pipeline-architecture.md (2)

185-186: Avoid synchronous webhook business logic in request path.

Line 185 documents synchronous execution in the webhook handler, while Lines 317-318 and Lines 336-337 advocate immediate ACK + queued processing. Even “1 PR only” can exceed webhook timing budgets under retries/latency spikes; prefer queue-first as the default path in all phases.

Also applies to: 317-318, 336-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/crawler-pipeline-architecture.md` around lines 185 - 186, Update the
architecture doc to remove the recommendation to execute business logic
synchronously in the webhook request path (the "Webhook handler 内で同期実行"
statement) and instead standardize on the queue-first pattern: immediately ACK
the webhook and enqueue work for asynchronous processing; replace the
synchronous wording at the current Webhook handler mention and align the text at
the sections referenced (lines 317-318 and 336-337) to emphasize ACK+queued
processing as the default, note retry/timing budget concerns, and adjust any
examples that claim "1 PR only" to state that queue-first avoids latency/retry
spikes.

8-19: Add language identifiers to fenced code blocks.

markdownlint MD040 is firing on these blocks. Please annotate each fence (for example `text` or `sql`) to keep docs lint-clean.

Also applies to: 33-62, 73-93, 166-172, 249-256, 277-288, 310-331, 376-387, 391-397, 404-413

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/crawler-pipeline-architecture.md` around lines 8 - 19, The fenced code
blocks in the document (for example the block showing "job-scheduler",
"crawlJob()", "provider.fetch()", "analyzeAndUpsert()", "provider.analyze()",
etc.) need language identifiers to satisfy markdownlint MD040; update every
fenced block (including the other ranges noted) by appending an appropriate
language tag (e.g., ```text, ```sql, or ```bash) to the opening backticks so the
linter recognizes the language for each block.
batch/db/mutations.ts (1)

133-135: Trim logins before dedup/lowercase (Lines 133-135).

Whitespace-only or padded values can slip through current normalization.

Suggested fix
-  const uniqueLogins = [
-    ...new Set(logins.filter(Boolean).map((l) => l.toLowerCase())),
-  ]
+  const uniqueLogins = [
+    ...new Set(
+      logins
+        .map((l) => l.trim())
+        .filter((l) => l.length > 0)
+        .map((l) => l.toLowerCase()),
+    ),
+  ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/db/mutations.ts` around lines 133 - 135, The current normalization for
computing uniqueLogins uses logins.filter(Boolean).map(l => l.toLowerCase()) and
misses trimming, allowing whitespace-only or padded entries; update the
normalization pipeline that produces uniqueLogins to trim each login first
(e.g., map l => l?.trim()), then filter out empty strings, then lowercase before
creating the Set so symbols referenced are the logins array and the uniqueLogins
constant in batch/db/mutations.ts.
batch/commands/backfill.ts (1)

24-26: Add explicit organization existence guard before integration access (Lines 24-26).

This makes not-found behavior deterministic and easier to debug.

Suggested fix
   const orgId = props.organizationId as OrganizationId
   const organization = await getOrganization(orgId)
+  invariant(organization, `organization not found: ${orgId}`)
   invariant(organization.integration, 'integration should related')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/commands/backfill.ts` around lines 24 - 26, The code currently calls
getOrganization(orgId) and immediately accesses organization.integration via
invariant; add an explicit null/undefined guard for the organization returned by
getOrganization before touching organization.integration to make not-found
deterministic: check that the organization object exists (e.g., if
(!organization) throw or invariant with a clear message referencing orgId) and
only then assert presence of organization.integration; update the error message
to clearly identify missing organization vs missing integration and reference
getOrganization, organization, organization.integration and orgId so maintainers
can locate the check.
db/tenant.sql (1)

75-79: Constrain complexity to valid labels at schema level (Lines 75-79).

This prevents invalid values from creeping into analytics.

Suggested fix
-  `complexity` text NULL,
+  `complexity` text NULL CHECK (`complexity` IN ('XS','S','M','L','XL') OR `complexity` IS NULL),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/tenant.sql` around lines 75 - 79, Replace the freeform complexity column
with a schema-level constraint: modify the `complexity` column definition in
db/tenant.sql to include a CHECK that limits values to the allowed labels (e.g.,
CHECK (complexity IN ('XS','S','M','L','XL'))) while keeping NULL if
you want optional values; ensure the migration/schema change updates any
INSERT/UPDATE paths that write `complexity` so they use only the allowed labels
and add/update a named constraint (e.g., CONSTRAINT chk_tenant_complexity) for
clarity and maintainability.
app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx (3)

158-165: Unnecessary type assertion.

SizeTimeItem.size is already typed as PRSizeLabel in the interface definition. The as PRSizeLabel casts on lines 161 and 163 are redundant.

♻️ Proposed fix
                 {timeData.map((entry) => (
                   <Cell
                     key={entry.size}
-                    fill={SIZE_COLORS[entry.size as PRSizeLabel]}
+                    fill={SIZE_COLORS[entry.size]}
                     className="cursor-pointer"
-                    onClick={() => handleSizeClick(entry.size as PRSizeLabel)}
+                    onClick={() => handleSizeClick(entry.size)}
                   />
                 ))}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx` around lines 158 - 165, The map over timeData is using unnecessary type assertions for
SizeTimeItem.size; remove the redundant "as PRSizeLabel" casts in the JSX (both
in the key/fill access and the onClick call) so Cell's key, fill
(SIZE_COLORS[...]) and onClick (handleSizeClick(...)) use the already-typed size
directly; look for the timeData.map(...) block, the Cell component, SIZE_COLORS,
handleSizeClick, and PRSizeLabel to make this change.

48-52: Consider extracting formatHours to a shared utility.

This helper appears in multiple files (pr-drill-down-sheet.tsx, aggregate.ts). Consolidating it would reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx` around lines 48 - 52, The local helper function formatHours should be extracted to a single shared
utility and imported where needed: create an exported function with the same
signature (formatHours(h: number): string) in a common utilities module, remove
the duplicate implementations from the components (including the current
pr-size-chart component and the other occurrences in pr-drill-down-sheet and
aggregate), and update those files to import the shared formatHours; ensure the
behavior/formatting stays identical and update any imports/exports accordingly.

77-92: Consider memoizing selectedPRs computation.

This filter/map runs on every render. While it's guarded by selectedSize, memoizing with useMemo would avoid recomputation when other state changes.

♻️ Proposed refactor
-  const selectedPRs = selectedSize
-    ? rawData
-        .filter((pr) => getPRComplexity(pr) === selectedSize)
-        .map((pr) => ({
-          number: pr.number,
-          title: pr.title,
-          url: pr.url,
-          repo: pr.repo,
-          author: pr.author,
-          authorDisplayName: pr.authorDisplayName,
-          reviewTime: pr.reviewTime,
-          size: getPRComplexity(pr),
-          complexityReason: pr.complexityReason,
-          riskAreas: pr.riskAreas,
-        }))
-    : []
+  const selectedPRs = useMemo(() => {
+    if (!selectedSize) return []
+    return rawData
+      .filter((pr) => getPRComplexity(pr) === selectedSize)
+      .map((pr) => ({
+        number: pr.number,
+        title: pr.title,
+        url: pr.url,
+        repo: pr.repo,
+        author: pr.author,
+        authorDisplayName: pr.authorDisplayName,
+        reviewTime: pr.reviewTime,
+        size: getPRComplexity(pr),
+        complexityReason: pr.complexityReason,
+        riskAreas: pr.riskAreas,
+      }))
+  }, [rawData, selectedSize])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx` around lines 77 - 92, The selectedPRs array is recomputed on every render; wrap its computation in
React's useMemo to memoize results based on rawData and selectedSize so it only
recalculates when those change. Replace the current const selectedPRs = ... with
a useMemo callback that performs the filter/map using getPRComplexity(pr) and
returns the same shape, and pass [rawData, selectedSize] (and getPRComplexity if
it's not stable) as the dependency array to ensure correctness in the
pr-size-chart component.
app/routes/$orgSlug/settings/repositories/$repository/$pull/index.tsx (1)

124-134: Unsafe type assertion through unknown.

The double cast shapedPr as unknown as ShapedGitHubPullRequest bypasses type checking. Since this data comes from JSON stored in the database, consider adding runtime validation.

♻️ Suggested approach

Consider using a Zod schema or type guard to validate the shape at runtime:

import { z } from 'zod'

const ShapedPRSchema = z.object({
  number: z.number(),
  // ... other required fields
}).passthrough()

// Then in the action:
const pr = ShapedPRSchema.parse(shapedPr)

Alternatively, if the data is trusted, add a comment explaining why the assertion is safe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/settings/repositories/$repository/$pull/index.tsx` around
lines 124 - 134, The code unsafely casts shapedPr to ShapedGitHubPullRequest
using a double cast; replace this with runtime validation instead of bypassing
the type system: define a ShapedPRSchema (or a type guard) that validates the
required fields for ShapedGitHubPullRequest, then call
ShapedPRSchema.parse(shapedPr) (or run the guard and throw a 400/404 on failure)
before assigning to pr; update the code around getShapedPullRequest and variable
pr to use the validated value (or add a clear comment only if you decide to
trust the stored JSON).
app/routes/$orgSlug/reviews/+functions/aggregate.ts (3)

55-104: O(n²) complexity in WIP computation may impact performance.

The nested filter (lines 60-68) runs for each PR, computing WIP by iterating all PRs. For large datasets, this becomes expensive:

  • 1,000 PRs → ~1M comparisons
  • 10,000 PRs → ~100M comparisons

Since this runs in clientLoader, it affects page load time.

♻️ Optimization approach

Pre-compute WIP counts in a single pass by sorting PRs by creation time and maintaining an active set per author:

function computeWipCounts(data: WipRawRow[]): Map<string, number> {
  const wipMap = new Map<string, number>()
  
  // Sort by creation time
  const sorted = [...data].sort((a, b) => 
    a.pullRequestCreatedAt.localeCompare(b.pullRequestCreatedAt)
  )
  
  // Track active PRs per author
  const activeByAuthor = new Map<string, Set<string>>()
  
  for (const pr of sorted) {
    const key = `${pr.repositoryId}:${pr.number}`
    const active = activeByAuthor.get(pr.author) ?? new Set()
    
    // WIP count at creation = active PRs for this author
    wipMap.set(key, active.size)
    
    // Add this PR to active set
    active.add(key)
    activeByAuthor.set(pr.author, active)
  }
  
  return wipMap
}

This reduces complexity to O(n log n) for the sort plus O(n) for the computation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+functions/aggregate.ts` around lines 55 - 104,
The nested filter inside aggregateWipCycle (computing wipCount per PR) is O(n²);
replace it by precomputing WIP counts in one pass: sort WipRawRow[] by
pullRequestCreatedAt, iterate in chronological order and maintain an active set
(Map author -> Set<repoId:number key>) to record the number of active PRs for
each PR at creation, storing results in a wipMap keyed by
`${repositoryId}:${number}`; then build prsWithWip by mapping the original rows
to { wipCount: wipMap.get(key), reviewTimeHours: ... } instead of recalculating
via data.filter for each PR, which reduces complexity to O(n log n) for sort +
O(n) for counting.

117-134: Duplicate WIP computation logic.

computeWipLabels duplicates the O(n²) WIP calculation from aggregateWipCycle. Consider extracting a shared helper that computes WIP counts once and reuses them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+functions/aggregate.ts` around lines 117 - 134,
computeWipLabels duplicates the O(n²) WIP counting logic found in
aggregateWipCycle; extract that logic into a shared helper (e.g.,
computeWipCounts or buildWipCountsMap) that takes the WipRawRow[] and returns a
lookup (Map or object) keyed by a unique PR id (combine number+repositoryId or
use object identity) mapping to the computed wipCount, then change
computeWipLabels to call that helper and map each row to { ...pr, wipLabel:
getWipLabel(wipCount) } and update aggregateWipCycle to reuse the same helper so
the heavy counting is done once.

9-22: formatHours is duplicated across multiple files.

This helper appears in pr-size-chart.tsx, pr-drill-down-sheet.tsx, and here. Consider extracting to a shared utility module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/+functions/aggregate.ts` around lines 9 - 22, The
formatHours helper is duplicated; extract it into a shared utility (e.g., export
function formatHours(h: number): string) and replace the local implementations
with imports from that module in the files that currently duplicate it (the
modules containing formatHours such as pr-size-chart.tsx,
pr-drill-down-sheet.tsx, and the aggregate function file where median and
formatHours live); ensure the new util is exported with the exact name
formatHours and update all call sites to import { formatHours } from the new
util, keeping median local if not reused.
batch/provider/github/store.ts (1)

66-80: Consider batch updating for better performance.

The current implementation issues one UPDATE per PR. For repositories with many PRs, this could be slow. A batch approach using CASE WHEN or multiple INSERT ... ON CONFLICT DO UPDATE could improve performance.

That said, the current implementation is correct and simple for the backfill use case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/provider/github/store.ts` around lines 66 - 80, The updatePrMetadata
function issues one UPDATE per PR; change it to perform a single batch update to
improve performance by building one query against githubRawData that updates
pullRequest based on pullRequestNumber for the given repositoryId—either
construct a single UPDATE with a CASE WHEN mapping pr.number ->
JSON.stringify(pr) or perform a multi-row INSERT ... ON CONFLICT (repositoryId,
pullRequestNumber) DO UPDATE SET pullRequest = EXCLUDED.pullRequest; ensure you
still filter by repositoryId, serialize each pr with JSON.stringify, and return
the count of rows updated as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/routes/$orgSlug/reviews/index.tsx`:
- Line 49: teamParam is user-controlled and is being used directly to build
cache keys which can create high-cardinality keys; validate and normalize it
before using as a cache key dimension by: enforce an allowlist or lookup (map
teamParam to a known teamId or "all" fallback), constrain length and characters
via a strict regex, or bucket/map unknown values to a small fixed set (or use a
stable hash truncated to N chars) and then use that normalized value (e.g.,
validatedTeam) when composing the cache key instead of raw teamParam; apply the
same validation/normalization wherever teamParam is used to build cache keys.
- Line 21: Replace the relative import of listTeams with the app root path-alias
import: locate the import statement that references
'../settings/teams._index/queries.server' in the reviews route and change it to
use the '~/...' alias (importing listTeams from the corresponding queries.server
module under the app routes tree); this keeps the symbol listTeams and the
module queries.server but switches the path to the '~/...' prefix so route moves
follow the repository convention.

In `@batch/cli.ts`:
- Around line 17-19: The CLI handler currently invokes backfillCommand({
organizationId: argv._.organizationId }) without awaiting or returning its
promise; update the handler (the arrow function handling argv) to either be
async and await backfillCommand(...) or to return the promise so the process can
observe completion and failures, ensuring errors propagate and the CLI waits for
backfillCommand to finish.

In `@batch/db/mutations.ts`:
- Around line 98-104: The current dedup logic using seen/uniqueReviewers loses
richer requestedAt information by keeping the first occurrence; replace it with
a fold that builds a Map keyed by r.login (from the reviewers array) and for
each reviewer compare requestedAt (treat null/undefined as older) to keep the
reviewer with the newest/non-null requestedAt, then return
Array.from(map.values()) instead of the seen/filter approach; update the code
around reviewers, seen and uniqueReviewers to use this map-based dedupe so
duplicates are merged preserving the most recent requestedAt.
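The map-based dedupe described in this prompt could look like the following sketch. The `Reviewer` shape follows the `{login, requestedAt}[]` type introduced in this PR; the helper name is illustrative:

```typescript
interface Reviewer {
  login: string
  requestedAt: string | null
}

// One entry per login, keeping the newest non-null requestedAt
// (null/undefined treated as older than any timestamp).
function dedupeReviewers(reviewers: Reviewer[]): Reviewer[] {
  const byLogin = new Map<string, Reviewer>()
  for (const r of reviewers) {
    const prev = byLogin.get(r.login)
    const isNewer =
      !prev ||
      prev.requestedAt === null ||
      (r.requestedAt !== null && r.requestedAt > prev.requestedAt)
    if (isNewer) byLogin.set(r.login, r)
  }
  return Array.from(byLogin.values())
}
```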

In `@batch/lib/llm-classify.ts`:
- Around line 227-240: The loop stride uses the variable concurrency (used in
the for loop that slices prs: for (let i = 0; i < prs.length; i +=
concurrency)), which can be 0 or negative if options.concurrency is set
incorrectly; validate or normalize concurrency before the loop (e.g., set
concurrency = Math.max(1, options?.concurrency ?? 5) or throw a clear error) so
the loop always progresses and cannot infinite-loop; update any places
referencing concurrency accordingly.
- Around line 132-137: The prompt builds fileList from pr.files and currently
includes every file which can blow up tokens for large PRs; limit the number of
files included (e.g., take first N entries from pr.files) and indicate omitted
file count/summary so callers know there were more (for example use
pr.files.slice(0, N) to build fileList and append a line like "(+X more files
omitted)"); update any references to fileList in llm-classify to use the
truncated version and pick a sensible default N (e.g., 10–20) or make it
configurable.
- Around line 148-156: The prompt string built in variable prompt injects raw PR
fields (e.g., pr.title, pr.author, branchesTag, descriptionTag, fileList and any
other interpolated values) which can contain XML-special characters; add an
XML-escaping helper (e.g., escapeXml) and call it for each interpolated value
before building prompt (use it for title, author, branch names, description,
file names and any dynamic tag content) so that <, >, &, ", ' are replaced with
safe entities; update the prompt construction in the function that builds prompt
in llm-classify.ts to use the escaped values everywhere.
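A minimal `escapeXml` helper along the lines suggested here; the name and the usage sketch are illustrative:

```typescript
function escapeXml(value: string): string {
  return value
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
}

// Usage sketch: wrap each dynamic PR field when building the prompt tags.
const title = 'fix: handle <script> & "quotes"'
const titleTag = `<title>${escapeXml(title)}</title>`
```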

In `@batch/provider/github/fetcher.ts`:
- Around line 85-90: The GetPullRequestTimeline GraphQL query and its handler
truncate timelineItems at first:100 with no pagination; update the query
signature to accept $cursor: String, add after: $cursor to the
timelineItems(...) call, and include pageInfo { hasNextPage endCursor } and
nodes (or items) as currently returned; then modify the function that executes
GetPullRequestTimeline (the function that currently performs a single request
for the pull request timeline) to loop using a cursor-based while loop (similar
to GetPullRequestCommitsQuery): call the query, append received items to the
accumulator, set cursor = pageInfo.endCursor, and continue while
pageInfo.hasNextPage, ensuring you pass the cursor param on subsequent requests
to fetch all pages.
- Around line 1146-1155: The code builds requestedAtMap from the initial
node.timelineItems?.nodes (first 50) which misses events when
needsMoreTimelineItems is true; update the logic so that before calling
buildRequestedAtMap you fetch the complete timeline via the timelineItems()
helper when needsMoreTimelineItems is true, then pass the fully populated nodes
(after applying shapeTimelineNodes) into buildRequestedAtMap; specifically
adjust the flow around prTimelineItems, replace/augment use of
node.timelineItems?.nodes with the result of timelineItems(node) when
needsMoreTimelineItems is set, and ensure buildRequestedAtMap(prTimelineItems)
uses that complete list.

In `@batch/usecases/analyze-and-upsert.ts`:
- Around line 94-96: The optional classification call
classifyPullRequests(orgId) should not block or abort the pipeline on error:
remove the top-level await and make the call fire-and-forget while
catching/logging errors (e.g., classifyPullRequests(orgId).catch(err =>
processLogger.error("classifyPullRequests failed", err))). Update the call site
where classifyPullRequests is invoked so failures are logged but do not
propagate, ensuring the rest of the pipeline continues.

In `@db/seed.ts`:
- Around line 1186-1190: The comparison that decides 'CHANGES_REQUESTED' is
inverted because it does ra.resolvedDaysAgo - ra.requestedDaysAgo > 3; change
the subtraction to requestedDaysAgo - resolvedDaysAgo so the condition reads
ra.requestedDaysAgo - ra.resolvedDaysAgo > 3 (keep the surrounding ternary logic
and 'APPROVED' results unchanged) so that cases where a review was resolved
later than requested by more than 3 days produce 'CHANGES_REQUESTED'.

In `@docs/gemini-3-prompting-guide.md`:
- Around line 74-78: The fenced code block containing the prompt snippet ("TASK:
Classify the following text. TEXT: ... CLASSES: positive, negative, neutral")
lacks a language identifier which triggers markdownlint MD040; update that
fenced block by adding a language/token identifier (e.g., "text" or
"textprompt") after the opening backticks so the block becomes fenced with a
language specifier and resolves the lint error.

In `@lab/classify/evaluate.ts`:
- Around line 251-261: The JSON.parse call for raw.pull_request should be
wrapped per-entry in a try/catch so a single malformed row doesn't abort the
entire evaluation; update the block that sets raw, rawPr and files to catch JSON
parse errors (for the raw variable from the DB query) and on parse failure log
the error and skip/continue that entry (or set rawPr = null and files = []), and
apply the same guarded parsing fix to the similar code around the rawPr handling
at the later occurrence (lines referenced near the second block).
- Line 7: The usage comment at the top of evaluate.ts incorrectly references
"lab/scripts/evaluate.ts"; update that string to the correct path
"lab/classify/evaluate.ts" wherever it appears in the file (search for the
usage/help comment in evaluate.ts) so the example command reflects the actual
location of the script.
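The per-entry guarded parsing described in the first item can be sketched as follows (the row shape is an assumption based on the prompt):

```typescript
type RawRow = { pull_request: string }

// Parse each row defensively: a single malformed JSON blob is logged and
// skipped instead of aborting the whole evaluation run.
function parseRows(rows: RawRow[]): Array<Record<string, unknown>> {
  const parsed: Array<Record<string, unknown>> = []
  for (const row of rows) {
    try {
      parsed.push(JSON.parse(row.pull_request))
    } catch (err) {
      console.error('skipping malformed pull_request row', err)
    }
  }
  return parsed
}
```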

In `@lab/classify/judge.ts`:
- Around line 273-279: The polling loop using TERMINAL_STATES and ai.batches.get
can hang indefinitely; add a hard timeout by introducing a MAX_POLL_MS (or
MAX_POLL_RETRIES) and record a start time before the while loop, then inside
each iteration check elapsed time (Date.now() - start) and if it exceeds the
limit, stop polling and fail fast (throw an error or set current.state to a
failed terminal state) so the function returns instead of waiting forever;
update references near POLL_INTERVAL_MS, current, job.name and ensure any
callers handle the thrown error or failure result.
- Around line 214-225: The PR payload embeds unescaped user content (pr.title,
pr.author, rawData.body via descriptionTag, rawData.fileList) into XML-like tags
which can break parsing; update judge.ts to escape XML entities (&, <, >, ", ')
before interpolation by adding/using a small helper (e.g., escapeXml) and apply
it to pr.title, pr.author, the description slice (rawData.body.slice(...)),
rawData.fileList and any branch names used in branchesTag so the returned string
preserves well-formed XML-like structure.
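The bounded polling loop from the first judge.ts item can be sketched like this (the `JOB_STATE_*` names follow the Gemini batch API convention; the injected `getState` stands in for `ai.batches.get`, and the default limits are illustrative):

```typescript
const TERMINAL_STATES = new Set([
  'JOB_STATE_SUCCEEDED',
  'JOB_STATE_FAILED',
  'JOB_STATE_CANCELLED',
])

// Poll until a terminal state or until MAX_POLL_MS elapses, then fail fast
// instead of hanging forever.
async function pollUntilDone(
  getState: () => Promise<string>,
  { pollIntervalMs = 10_000, maxPollMs = 30 * 60_000 } = {},
): Promise<string> {
  const start = Date.now()
  let state = await getState()
  while (!TERMINAL_STATES.has(state)) {
    if (Date.now() - start > maxPollMs) {
      throw new Error(`batch job did not reach a terminal state within ${maxPollMs}ms`)
    }
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs))
    state = await getState()
  }
  return state
}
```

Callers then handle the thrown timeout error the same way as any other batch failure.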

In `@lab/classify/test-one.ts`:
- Line 162: The test harness currently calls main().catch(console.error) which
logs errors but can leave the process exit code as 0; update the call site so
failures propagate a non‑zero exit code by catching the error from main() and
calling process.exit(1) (or rethrowing) after logging; modify the invocation of
main() in test-one.ts (the top-level main() call) to explicitly handle the
promise rejection and exit with code 1 on error.
- Around line 148-151: Guard the JSON.parse call so one malformed model response
doesn't crash the batch: wrap parsing of response.text (variable text) in a
try/catch around the JSON.parse(text) that assigns to result, and on error log
or record the parsing failure and set match (which compares result.complexity to
pr.currentLabel) to a safe fallback (e.g., '✗' or 'parse_error') so the loop
continues; update any downstream logic that expects result to handle the case
where parsing failed.

In `@ops/commands/pull-db.ts`:
- Around line 113-119: The tryPullFile function currently swallows all
exceptions from pullFile, hiding real transfer/auth/network errors; change
tryPullFile to only ignore expected "file not found" errors (e.g., ENOENT/404 or
a specific error type returned by pullFile) and for any other error either
rethrow or log it as an error so operators are alerted. Locate tryPullFile and
the pullFile call and implement conditional handling of the caught error
(inspect error.code/message or error instanceof SpecificError) to suppress only
the non-fatal missing-file cases and otherwise propagate or surface the failure.

---

Nitpick comments:
In `@app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx`:
- Around line 158-165: The map over timeData is using unnecessary type
assertions for SizeTimeItem.size; remove the redundant "as PRSizeLabel" casts in
the JSX (both in the key/fill access and the onClick call) so Cell's key, fill
(SIZE_COLORS[...]) and onClick (handleSizeClick(...)) use the already-typed size
directly; look for the timeData.map(...) block, the Cell component, SIZE_COLORS,
handleSizeClick, and PRSizeLabel to make this change.
- Around line 48-52: The local helper function formatHours should be extracted
to a single shared utility and imported where needed: create an exported
function with the same signature (formatHours(h: number): string) in a common
utilities module, remove the duplicate implementations from the components
(including the current pr-size-chart component and the other occurrences in
pr-drill-down-sheet and aggregate), and update those files to import the shared
formatHours; ensure the behavior/formatting stays identical and update any
imports/exports accordingly.
- Around line 77-92: The selectedPRs array is recomputed on every render; wrap
its computation in React's useMemo to memoize results based on rawData and
selectedSize so it only recalculates when those change. Replace the current
const selectedPRs = ... with a useMemo callback that performs the filter/map
using getPRComplexity(pr) and returns the same shape, and pass [rawData,
selectedSize] (and getPRComplexity if it's not stable) as the dependency array
to ensure correctness in the pr-size-chart component.
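Memoization only pays off if the selector itself is pure; a sketch of the extracted selector (assuming `getPRComplexity` simply reads a stored label — the real accessor may differ):

```typescript
type PR = { complexity: string | null; title: string }
type PRSizeLabel = 'XS' | 'S' | 'M' | 'L' | 'XL'

// Pure selector: the filtering the component currently redoes on every
// render. Inside the component, wrap the call in useMemo:
//   const selectedPRs = useMemo(
//     () => selectPRsBySize(rawData, selectedSize),
//     [rawData, selectedSize],
//   )
function selectPRsBySize(rawData: PR[], selectedSize: PRSizeLabel | null): PR[] {
  if (!selectedSize) return []
  return rawData.filter((pr) => pr.complexity === selectedSize)
}
```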

In `@app/routes/$orgSlug/reviews/+functions/aggregate.ts`:
- Around line 55-104: The nested filter inside aggregateWipCycle (computing
wipCount per PR) is O(n²); replace it by precomputing WIP counts in one pass:
sort WipRawRow[] by pullRequestCreatedAt, iterate in chronological order and
maintain an active set (Map author -> Set<repoId:number key>) to record the
number of active PRs for each PR at creation, storing results in a wipMap keyed
by `${repositoryId}:${number}`; then build prsWithWip by mapping the original
rows to { wipCount: wipMap.get(key), reviewTimeHours: ... } instead of
recalculating via data.filter for each PR, which reduces complexity to O(n log
n) for sort + O(n) for counting.
- Around line 117-134: computeWipLabels duplicates the O(n²) WIP counting logic
found in aggregateWipCycle; extract that logic into a shared helper (e.g.,
computeWipCounts or buildWipCountsMap) that takes the WipRawRow[] and returns a
lookup (Map or object) keyed by a unique PR id (combine number+repositoryId or
use object identity) mapping to the computed wipCount, then change
computeWipLabels to call that helper and map each row to { ...pr, wipLabel:
getWipLabel(wipCount) } and update aggregateWipCycle to reuse the same helper so
the heavy counting is done once.
- Around line 9-22: The formatHours helper is duplicated; extract it into a
shared utility (e.g., export function formatHours(h: number): string) and
replace the local implementations with imports from that module in the files
that currently duplicate it (the modules containing formatHours such as
pr-size-chart.tsx, pr-drill-down-sheet.tsx, and the aggregate function file
where median and formatHours live); ensure the new util is exported with the
exact name formatHours and update all call sites to import { formatHours } from
the new util, keeping median local if not reused.
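The single-pass WIP counting suggested in the first two items might look like this (a sketch: whether the count includes the PR itself is an assumption, and ISO-8601 timestamps are compared lexicographically as strings):

```typescript
type WipRawRow = {
  repositoryId: number
  number: number
  author: string
  pullRequestCreatedAt: string // ISO-8601 timestamp
  closedAt: string | null
}

// One chronological pass: for each PR, record how many of the same author's
// PRs were still open at the moment this PR was created (including itself).
function buildWipCountsMap(rows: WipRawRow[]): Map<string, number> {
  const sorted = [...rows].sort((a, b) =>
    a.pullRequestCreatedAt.localeCompare(b.pullRequestCreatedAt),
  )
  const openByAuthor = new Map<string, WipRawRow[]>()
  const wipMap = new Map<string, number>()
  for (const row of sorted) {
    // Drop PRs that were already closed before this one was created.
    const active = (openByAuthor.get(row.author) ?? []).filter(
      (pr) => pr.closedAt === null || pr.closedAt > row.pullRequestCreatedAt,
    )
    wipMap.set(`${row.repositoryId}:${row.number}`, active.length + 1)
    active.push(row)
    openByAuthor.set(row.author, active)
  }
  return wipMap
}
```

Both `aggregateWipCycle` and `computeWipLabels` can then look counts up in the returned map instead of re-filtering per PR.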

In `@app/routes/$orgSlug/settings/repositories/$repository/$pull/index.tsx`:
- Around line 124-134: The code unsafely casts shapedPr to
ShapedGitHubPullRequest using a double cast; replace this with runtime
validation instead of bypassing the type system: define a ShapedPRSchema (or a
type guard) that validates the required fields for ShapedGitHubPullRequest, then
call ShapedPRSchema.parse(shapedPr) (or run the guard and throw a 400/404 on
failure) before assigning to pr; update the code around getShapedPullRequest and
variable pr to use the validated value (or add a clear comment only if you
decide to trust the stored JSON).
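A plain type guard works if you would rather not pull in a schema library; the field list below is a guess and should mirror the real `ShapedGitHubPullRequest` type:

```typescript
type ShapedGitHubPullRequest = {
  number: number
  title: string
  state: string
}

// Runtime guard instead of `as unknown as ShapedGitHubPullRequest`: callers
// can throw a 400/404 when the stored JSON does not match the expected shape.
function isShapedPullRequest(value: unknown): value is ShapedGitHubPullRequest {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.number === 'number' &&
    typeof v.title === 'string' &&
    typeof v.state === 'string'
  )
}
```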

In `@batch/commands/backfill.ts`:
- Around line 24-26: The code currently calls getOrganization(orgId) and
immediately accesses organization.integration via invariant; add an explicit
null/undefined guard for the organization returned by getOrganization before
touching organization.integration to make not-found deterministic: check that
the organization object exists (e.g., if (!organization) throw or invariant with
a clear message referencing orgId) and only then assert presence of
organization.integration; update the error message to clearly identify missing
organization vs missing integration and reference getOrganization, organization,
organization.integration and orgId so maintainers can locate the check.
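The two-step guard can be sketched as follows (types are illustrative):

```typescript
type Organization = { id: string; integration: { provider: string } | null }

// Distinguish "organization not found" from "organization has no integration"
// so the failure message tells maintainers exactly what is missing.
function requireIntegration(
  organization: Organization | undefined,
  orgId: string,
): { provider: string } {
  if (!organization) {
    throw new Error(`getOrganization: organization not found: ${orgId}`)
  }
  if (!organization.integration) {
    throw new Error(`organization ${orgId} has no integration configured`)
  }
  return organization.integration
}
```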

In `@batch/db/mutations.ts`:
- Around line 133-135: The current normalization for computing uniqueLogins uses
logins.filter(Boolean).map(l => l.toLowerCase()) and misses trimming, allowing
whitespace-only or padded entries; update the normalization pipeline that
produces uniqueLogins to trim each login first (e.g., map l => l?.trim()), then
filter out empty strings, then lowercase before creating the Set so symbols
referenced are the logins array and the uniqueLogins constant in
batch/db/mutations.ts.
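The trim-then-filter-then-lowercase pipeline in isolation:

```typescript
// Normalize logins: trim whitespace, drop empty entries, lowercase, dedupe.
function normalizeLogins(logins: Array<string | null | undefined>): string[] {
  return [...new Set(
    logins
      .map((l) => l?.trim() ?? '')
      .filter((l) => l.length > 0)
      .map((l) => l.toLowerCase()),
  )]
}
```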

In `@batch/provider/github/store.ts`:
- Around line 66-80: The updatePrMetadata function issues one UPDATE per PR;
change it to perform a single batch update to improve performance by building
one query against githubRawData that updates pullRequest based on
pullRequestNumber for the given repositoryId—either construct a single UPDATE
with a CASE WHEN mapping pr.number -> JSON.stringify(pr) or perform a multi-row
INSERT ... ON CONFLICT (repositoryId, pullRequestNumber) DO UPDATE SET
pullRequest = EXCLUDED.pullRequest; ensure you still filter by repositoryId,
serialize each pr with JSON.stringify, and return the count of rows updated as
before.
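One way to batch the writes is to build a single parameterized multi-row upsert; the column and table names below are taken from the review comment and may not match the actual schema helpers (if a query builder such as Drizzle is in use, its API would be preferable to hand-built SQL):

```typescript
type PrLike = { number: number }

// Build one multi-row INSERT ... ON CONFLICT DO UPDATE instead of N
// single-row UPDATEs. Placeholders keep the JSON payloads parameterized.
function buildBatchUpsert(
  repositoryId: number,
  prs: PrLike[],
): { sql: string; params: unknown[] } {
  const values = prs.map(() => '(?, ?, ?)').join(', ')
  const params = prs.flatMap((pr) => [repositoryId, pr.number, JSON.stringify(pr)])
  const sql =
    `INSERT INTO github_raw_data (repository_id, pull_request_number, pull_request) ` +
    `VALUES ${values} ` +
    `ON CONFLICT (repository_id, pull_request_number) DO UPDATE SET pull_request = excluded.pull_request`
  return { sql, params }
}
```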

In `@db/tenant.sql`:
- Around line 75-79: Replace the freeform complexity column with a schema-level
constraint: modify the `complexity` column definition in db/tenant.sql to
include a CHECK that limits values to the allowed size labels (e.g., CHECK
(complexity IN ('XS','S','M','L','XL'))) while keeping NULL if you
want optional values; ensure the migration/schema change updates any
INSERT/UPDATE paths that write `complexity` so they use only the allowed labels
and add/update a named constraint (e.g., CONSTRAINT chk_tenant_complexity) for
clarity and maintainability.

In `@docs/crawler-pipeline-architecture.md`:
- Around line 185-186: Update the architecture doc to remove the recommendation
to execute business logic synchronously in the webhook request path (the
"Webhook handler 内で同期実行" statement, i.e., synchronous execution inside the
webhook handler) and instead standardize on the queue-first
pattern: immediately ACK the webhook and enqueue work for asynchronous
processing; replace the synchronous wording at the current Webhook handler
mention and align the text at the sections referenced (lines 317-318 and
336-337) to emphasize ACK+queued processing as the default, note retry/timing
budget concerns, and adjust any examples that claim "1 PR only" to state that
queue-first avoids latency/retry spikes.
- Around line 8-19: The fenced code blocks in the document (for example the
block showing "job-scheduler", "crawlJob()", "provider.fetch()",
"analyzeAndUpsert()", "provider.analyze()", etc.) need language identifiers to
satisfy markdownlint MD040; update every fenced block (including the other
ranges noted) by appending an appropriate language tag (e.g., ```text, ```sql,
or ```bash) to the opening backticks so the linter recognizes the language for
each block.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d810169 and 2688e58.

⛔ Files ignored due to path filters (1)
  • db/migrations/tenant/atlas.sum is excluded by !**/*.sum
📒 Files selected for processing (44)
  • .gitignore
  • CLAUDE.md
  • app/routes/$orgSlug/reviews/+components/pr-drill-down-sheet.tsx
  • app/routes/$orgSlug/reviews/+components/pr-size-chart.tsx
  • app/routes/$orgSlug/reviews/+components/queue-trend-chart.tsx
  • app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx
  • app/routes/$orgSlug/reviews/+functions/aggregate.test.ts
  • app/routes/$orgSlug/reviews/+functions/aggregate.ts
  • app/routes/$orgSlug/reviews/+functions/classify.test.ts
  • app/routes/$orgSlug/reviews/+functions/classify.ts
  • app/routes/$orgSlug/reviews/+functions/queries.server.ts
  • app/routes/$orgSlug/reviews/index.tsx
  • app/routes/$orgSlug/settings/repositories/$repository/$pull/index.tsx
  • app/routes/$orgSlug/settings/repositories/$repository/$pull/queries.server.ts
  • app/services/db.server.ts
  • app/services/tenant-db.server.ts
  • app/services/tenant-type.ts
  • batch/cli.ts
  • batch/commands/backfill.ts
  • batch/db/mutations.ts
  • batch/lib/llm-classify.ts
  • batch/provider/github/aggregator.test.ts
  • batch/provider/github/fetcher.test.ts
  • batch/provider/github/fetcher.ts
  • batch/provider/github/model.ts
  • batch/provider/github/provider.ts
  • batch/provider/github/pullrequest.ts
  • batch/provider/github/store.test.ts
  • batch/provider/github/store.ts
  • batch/provider/github/types.ts
  • batch/provider/index.ts
  • batch/usecases/analyze-and-upsert.ts
  • batch/usecases/classify-pull-requests.ts
  • db/migrations/tenant/20260301120000_add_timeline_items.sql
  • db/migrations/tenant/20260301140000_add_closed_at.sql
  • db/seed.ts
  • db/tenant.sql
  • docs/crawler-pipeline-architecture.md
  • docs/gemini-3-prompting-guide.md
  • lab/classify/evaluate.ts
  • lab/classify/judge.ts
  • lab/classify/test-one.ts
  • lefthook.yml
  • ops/commands/pull-db.ts
✅ Files skipped from review due to trivial changes (2)
  • db/migrations/tenant/20260301120000_add_timeline_items.sql
  • db/migrations/tenant/20260301140000_add_closed_at.sql
🚧 Files skipped from review as they are similar to previous changes (4)
  • batch/usecases/classify-pull-requests.ts
  • batch/provider/github/store.test.ts
  • app/routes/$orgSlug/reviews/+functions/aggregate.test.ts
  • app/routes/$orgSlug/reviews/+components/wip-cycle-chart.tsx

const { organization } = await requireOrgMember(request, params.orgSlug)

const url = new URL(request.url)
const teamParam = url.searchParams.get('team')

⚠️ Potential issue | 🟡 Minor

Validate team before using it as a cache key dimension.

team is user-controlled query input. Using it directly can create high-cardinality cache keys and avoidable cache misses.

🛠️ Suggested fix
-  const teamParam = url.searchParams.get('team')
+  const rawTeamParam = url.searchParams.get('team')
@@
   const teams = await listTeams(organization.id)
+  const validTeamIds = new Set(teams.map((team) => team.id))
+  const teamParam =
+    rawTeamParam && validTeamIds.has(rawTeamParam) ? rawTeamParam : null

Also applies to: 62-74

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/routes/$orgSlug/reviews/index.tsx` at line 49, teamParam is
user-controlled and is being used directly to build cache keys which can create
high-cardinality keys; validate and normalize it before using as a cache key
dimension by: enforce an allowlist or lookup (map teamParam to a known teamId or
"all" fallback), constrain length and characters via a strict regex, or
bucket/map unknown values to a small fixed set (or use a stable hash truncated
to N chars) and then use that normalized value (e.g., validatedTeam) when
composing the cache key instead of raw teamParam; apply the same
validation/normalization wherever teamParam is used to build cache keys.

Comment on lines +98 to +104
const seen = new Set<string>()
const uniqueReviewers = reviewers.filter((r) => {
if (!r.login) return false
if (seen.has(r.login)) return false
seen.add(r.login)
return true
})

⚠️ Potential issue | 🟡 Minor

Dedup logic may lose richer requestedAt data (Lines 98-104).

Current first-win filtering can discard a later duplicate that has non-null/newer requestedAt.

Suggested fix
-    const seen = new Set<string>()
-    const uniqueReviewers = reviewers.filter((r) => {
-      if (!r.login) return false
-      if (seen.has(r.login)) return false
-      seen.add(r.login)
-      return true
-    })
+    const byLogin = new Map<string, { login: string; requestedAt: string | null }>()
+    for (const r of reviewers) {
+      if (!r.login) continue
+      const prev = byLogin.get(r.login)
+      if (!prev) {
+        byLogin.set(r.login, r)
+        continue
+      }
+      if (!prev.requestedAt || (r.requestedAt && r.requestedAt > prev.requestedAt)) {
+        byLogin.set(r.login, r)
+      }
+    }
+    const uniqueReviewers = [...byLogin.values()]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@batch/db/mutations.ts` around lines 98 - 104, The current dedup logic using
seen/uniqueReviewers loses richer requestedAt information by keeping the first
occurrence; replace it with a fold that builds a Map keyed by r.login (from the
reviewers array) and for each reviewer compare requestedAt (treat null/undefined
as older) to keep the reviewer with the newest/non-null requestedAt, then return
Array.from(map.values()) instead of the seen/filter approach; update the code
around reviewers, seen and uniqueReviewers to use this map-based dedupe so
duplicates are merged preserving the most recent requestedAt.

Comment on lines +214 to +225
const descriptionTag = rawData.body
? `\n <description>${rawData.body.slice(0, 2000)}</description>`
: ''

return `<pr>
<number>${pr.number}</number>
<title>${pr.title}</title>
<author>${pr.author ?? 'unknown'}</author>${branchesTag}
<stats additions="${pr.additions ?? 0}" deletions="${pr.deletions ?? 0}" files="${pr.changed_files ?? 0}" />${descriptionTag}
<files>
${rawData.fileList}
</files>

⚠️ Potential issue | 🟠 Major

Escape PR fields before embedding into <pr> payload.

Raw title, author, description, and file paths are inserted directly into XML-like tags. If any field includes <, >, or &, the structure can break and classification quality degrades.

Proposed fix
+function escapeXml(value: string): string {
+  return value
+    .replaceAll('&', '&amp;')
+    .replaceAll('<', '&lt;')
+    .replaceAll('>', '&gt;')
+}
+
 function buildPrompt(pr: PRRecord, db: Database.Database): string {
   const rawData = getRawPrData(db, pr.repository_id, pr.number)
@@
-  const descriptionTag = rawData.body
-    ? `\n  <description>${rawData.body.slice(0, 2000)}</description>`
+  const descriptionTag = rawData.body
+    ? `\n  <description>${escapeXml(rawData.body.slice(0, 2000))}</description>`
     : ''
@@
-  <title>${pr.title}</title>
-  <author>${pr.author ?? 'unknown'}</author>${branchesTag}
+  <title>${escapeXml(pr.title)}</title>
+  <author>${escapeXml(pr.author ?? 'unknown')}</author>${branchesTag}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/judge.ts` around lines 214 - 225, The PR payload embeds
unescaped user content (pr.title, pr.author, rawData.body via descriptionTag,
rawData.fileList) into XML-like tags which can break parsing; update judge.ts to
escape XML entities (&, <, >, ", ') before interpolation by adding/using a small
helper (e.g., escapeXml) and apply it to pr.title, pr.author, the description
slice (rawData.body.slice(...)), rawData.fileList and any branch names used in
branchesTag so the returned string preserves well-formed XML-like structure.

Comment on lines +148 to +151
const text = response.text
if (text) {
const result = JSON.parse(text)
const match = result.complexity === pr.currentLabel ? '✓' : '✗'

⚠️ Potential issue | 🟡 Minor

Guard JSON parsing to avoid hard-stop on one bad model response (Line 150).

A single malformed response will crash the script mid-batch.

Suggested fix
-    if (text) {
-      const result = JSON.parse(text)
+    if (text) {
+      let result: { complexity: string; reason: string }
+      try {
+        result = JSON.parse(text)
+      } catch (e) {
+        console.error(`Failed to parse response for PR #${pr.number}`, e)
+        continue
+      }
       const match = result.complexity === pr.currentLabel ? '✓' : '✗'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/test-one.ts` around lines 148 - 151, Guard the JSON.parse call
so one malformed model response doesn't crash the batch: wrap parsing of
response.text (variable text) in a try/catch around the JSON.parse(text) that
assigns to result, and on error log or record the parsing failure and set match
(which compares result.complexity to pr.currentLabel) to a safe fallback (e.g.,
'✗' or 'parse_error') so the loop continues; update any downstream logic that
expects result to handle the case where parsing failed.

}
}

main().catch(console.error)

⚠️ Potential issue | 🟡 Minor

Propagate failure via exit code (Line 162).

main().catch(console.error) can still exit with code 0 in error cases.

Suggested fix
-main().catch(console.error)
+main().catch((e) => {
+  console.error(e)
+  process.exitCode = 1
+})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/test-one.ts` at line 162, The test harness currently calls
main().catch(console.error) which logs errors but can leave the process exit
code as 0; update the call site so failures propagate a non‑zero exit code by
catching the error from main() and calling process.exit(1) (or rethrowing) after
logging; modify the invocation of main() in test-one.ts (the top-level main()
call) to explicitly handle the promise rejection and exit with code 1 on error.

Comment on lines +113 to +119
function tryPullFile(app: string, remoteFile: string, localFile: string) {
try {
pullFile(app, remoteFile, localFile)
} catch {
// WAL/SHM files may not exist (e.g. after checkpoint), ignore
}
}

⚠️ Potential issue | 🟠 Major

Do not silently swallow all optional file pull errors.

Line 116 currently ignores every failure type. That hides real transfer/auth/network issues, not just “file does not exist”, and can leave operators with a false sense of success.

Proposed fix
 function tryPullFile(app: string, remoteFile: string, localFile: string) {
   try {
     pullFile(app, remoteFile, localFile)
-  } catch {
-    // WAL/SHM files may not exist (e.g. after checkpoint), ignore
+  } catch (error) {
+    const message =
+      error instanceof Error ? error.message : String(error)
+    // WAL/SHM files may not exist (e.g. after checkpoint), ignore only missing-file cases
+    if (/no such file|not found/i.test(message)) return
+    consola.warn(`Optional file pull failed for ${remoteFile}: ${message}`)
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ops/commands/pull-db.ts` around lines 113 - 119, The tryPullFile function
currently swallows all exceptions from pullFile, hiding real
transfer/auth/network errors; change tryPullFile to only ignore expected "file
not found" errors (e.g., ENOENT/404 or a specific error type returned by
pullFile) and for any other error either rethrow or log it as an error so
operators are alerted. Locate tryPullFile and the pullFile call and implement
conditional handling of the caught error (inspect error.code/message or error
instanceof SpecificError) to suppress only the non-fatal missing-file cases and
otherwise propagate or surface the failure.

coji and others added 6 commits March 2, 2026 18:37
…odule

- Extract shared types, constants, and utilities into judge-common.ts
- Rewrite judge.ts (batch) to overwrite golden.json on success instead of merging
- Add judge-sequential.ts with retry (exponential backoff), --continue, --limit, --think flags
- Add judgedAt/judgedModel fields to GoldenEntry for provenance tracking
- Add self-verification step to classification prompt
- Remove test-one.ts (replaced by --limit 1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or, compare tool

- Move golden.json to data/golden/ with envelope format (meta + entries)
  and automatic archive on save
- Rename judge-sequential.ts → judge.ts (primary), judge.ts → judge-batch.ts (secondary)
- Refactor evaluate.ts: import types/schema from judge-common, reuse
  buildPrompt, add promptHash, retry with backoff, --limit/--continue
  with proper evaluatedKeys tracking
- Add compare.ts for side-by-side eval diff display
- Update lab/README.md with judge → evaluate → compare cycle docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move DB access to sample.ts only. PRRecord now includes body, branches,
and files from github_raw_data. buildPrompt no longer requires a DB
connection. GoldenEntry stores the prompt used for judging, so
evaluate.ts can replay without DB. Also adds batch classify CLI command
and backfill --files option.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensures sampled PRs always have complete file/branch/body data
for accurate golden set judging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows viewing PR size distribution and review metrics across the
full history, useful for validating classification accuracy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coji coji changed the title from "feat: add review bottleneck dashboard" to "feat: review bottleneck dashboard with LLM classification" on Mar 4, 2026
- Guard concurrency parameter with Math.max(1, ...) to prevent infinite loop
- Cap file list at 30 entries in production classifier prompt (consistent with lab)
- Use ~/ path alias for cross-route import

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@batch/cli.ts`:
- Around line 28-34: The CLI handler currently calls classifyCommand without
awaiting its promise; update the argv arrow function to be async (or return the
promise) and await classifyCommand so the process waits for completion and
errors propagate—specifically modify the anonymous handler that receives argv to
either be declared async and use await classifyCommand({ organizationId:
argv._.organizationId, force: argv.flags.force, limit: argv.flags.limit }) or
simply return classifyCommand(...); ensure classifyCommand is awaited/returned
from that handler.
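The difference awaiting makes can be sketched with an injected command (the signature is hypothetical; match the real `classifyCommand` options):

```typescript
// With `await`, a rejection from classifyCommand is observed by the handler
// and can drive a failing exit code; without it, the handler resolves early
// and the error escapes as an unhandled rejection.
async function runClassify(
  classifyCommand: (opts: { organizationId: string; force: boolean }) => Promise<void>,
  opts: { organizationId: string; force: boolean },
): Promise<'ok' | 'failed'> {
  try {
    await classifyCommand(opts)
    return 'ok'
  } catch {
    return 'failed'
  }
}
```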

In `@batch/provider/github/provider.ts`:
- Around line 135-143: The per-PR files backfill loop currently lets any thrown
error from fetcher.files or store.updatePrMetadata abort the whole pass; wrap
the body that fetches files and updates metadata for each PR in a try/catch so a
single failure logs the error and continues to the next PR. Specifically, inside
the for (const pr of prs) loop, call await fetcher.files(pr.number) and await
store.updatePrMetadata([pr]) inside a try block, only increment updated and emit
the logger.info(...) on success, and in the catch block log the error with
context (include pr.number and the error) and continue.
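A sketch of the per-PR isolation (the fetch/update pair is collapsed into one injected function for brevity):

```typescript
type BackfillPR = { number: number }

// Backfill loop: one failing PR is logged and skipped; the rest still run,
// and `updated` only counts successes.
async function backfillFiles(
  prs: BackfillPR[],
  fetchAndStore: (prNumber: number) => Promise<void>,
  logError: (message: string, err: unknown) => void,
): Promise<number> {
  let updated = 0
  for (const pr of prs) {
    try {
      await fetchAndStore(pr.number)
      updated++
    } catch (err) {
      logError(`files backfill failed for PR #${pr.number}`, err)
    }
  }
  return updated
}
```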

In `@lab/classify/evaluate.ts`:
- Around line 281-289: The current continue-mode incorrectly mixes prior
disagreements/evaluatedKeys with only current-run predictions, making
accuracy/avgDrift/totalEvaluated wrong; when previousResult exists (see
variables previousResult, disagreements, evaluatedKeys) merge prior
predictions/evaluation data into the current predictions/evaluation set before
computing metrics (or compute metrics over the union of evaluatedKeys and
combined disagreements), i.e., append or reconcile previousResult.predictions
(or the equivalent stored predictions) into predictions and then run the metric
calculations that produce accuracy, avgDrift, totalEvaluated so all metrics use
the same combined data.
- Around line 254-255: The resume-selection currently only matches on promptHash
(if (data.promptHash === hash) previousResult = data) which lets --continue
reuse outputs from a different model; update the check to require the model to
match as well (e.g., require data.model or data.modelName equals the current
model variable/flag) so previousResult is set only when both promptHash and
model match; also add a clear log/message when a promptHash match is skipped due
to model mismatch to aid debugging.
- Line 406: The async entrypoint call main().catch(console.error) logs errors
but doesn't set a non-zero exit code; change the catch handler for the main()
invocation (the async entrypoint named main) to log the error and then call
process.exit(1) (or otherwise propagate the failure) so the process returns a
failing exit code on unhandled errors.

In `@lab/classify/judge-batch.ts`:
- Around line 332-343: The code unconditionally calls removeBatchJob() after
attempting cancellation, which removes batch-job.json even when
ai.batches.cancel or ai.batches.get throws; change the flow in the try/catch so
removeBatchJob() is only called on successful cancellation/status retrieval
(i.e., after awaiting ai.batches.cancel and ai.batches.get and confirming
job.state shows cancellation), and move the removeBatchJob() call out of the
catch block (or add a conditional) so that when an error occurs the saved job
(saved.name / batch-job.json) is preserved for later resume/debug; ensure
console logs remain but do not delete the file on error.
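The conditional cleanup can be sketched as follows (injected functions stand in for `ai.batches.cancel` and the batch-job.json removal):

```typescript
// Only delete the saved job file after cancellation actually succeeded; on
// error the file is kept so the job can be resumed or inspected later.
async function cancelAndCleanup(
  cancel: () => Promise<void>,
  removeBatchJob: () => void,
): Promise<boolean> {
  try {
    await cancel()
  } catch (err) {
    console.error('cancel failed; keeping batch-job.json for resume', err)
    return false
  }
  removeBatchJob()
  return true
}
```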

In `@lab/classify/judge-common.ts`:
- Around line 206-229: The buildPrompt function is embedding raw PR fields
(pr.title, pr.author, pr.body, pr.source_branch, pr.target_branch and the file
names returned by formatFileList) into XML-like tags which will break if values
contain XML-special characters; add an XML-escape helper (e.g. escapeXml) and
apply it to all interpolated values used in buildPrompt (title, author, body
slice, branch names) and ensure formatFileList either returns already-escaped
file paths or is updated to call escapeXml for each file path it emits so all <,
>, & (and quotes if needed) are converted to safe entities before building the
prompt.

In `@lab/classify/sample.ts`:
- Around line 91-97: The CLI numeric parsing currently accepts invalid values
for --per-label and --seed silently; after parsing args[i + 1] with
Number.parseInt, validate the result (for perLabel ensure
Number.isInteger(value) && value > 0, for seed ensure Number.isInteger(value) &&
value >= 0) and bail with a clear error message if the parsed value is
NaN/invalid; update the parsing block that sets perLabel and seed (the code
using args, Number.parseInt, and variables perLabel/seed) to perform these
checks and exit/throw on invalid input so non-numeric values cannot silently
become zero or unexpected values.
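The validation the prompt asks for can be factored into a small helper. This is an illustrative sketch, not the actual `sample.ts` parsing loop: the default values and the `parseArgs` shape are assumptions, only the `--per-label` and `--seed` flag names come from the review.

```typescript
// Parse a flag value as a non-negative/positive integer, failing loudly
// instead of letting NaN or a negative number slip through.
function parseIntFlag(raw: string | undefined, flag: string, min: number): number {
  const value = Number.parseInt(raw ?? '', 10)
  if (!Number.isInteger(value) || value < min) {
    throw new Error(`${flag} expects an integer >= ${min}, got "${raw}"`)
  }
  return value
}

// Hypothetical arg loop using the helper; defaults are illustrative.
function parseArgs(args: string[]): { perLabel: number; seed: number } {
  let perLabel = 40
  let seed = 0
  for (let i = 0; i < args.length; i++) {
    if (args[i] === '--per-label') {
      perLabel = parseIntFlag(args[++i], '--per-label', 1)
    } else if (args[i] === '--seed') {
      seed = parseIntFlag(args[++i], '--seed', 0)
    }
  }
  return { perLabel, seed }
}
```

With this in place, `--per-label abc` or a missing value throws immediately rather than silently producing `NaN` or an unexpected sample count.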

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3c64df3-8565-4aea-8612-13706d17898e

📥 Commits

Reviewing files that changed from the base of the PR and between 2688e58 and 7c82c7b.

📒 Files selected for processing (24)
  • CLAUDE.md
  • app/routes/$orgSlug/reviews/index.tsx
  • batch/cli.ts
  • batch/commands/backfill.ts
  • batch/commands/classify.ts
  • batch/lib/llm-classify.ts
  • batch/provider/github/provider.ts
  • batch/provider/index.ts
  • batch/usecases/classify-pull-requests.ts
  • docs/README.md
  • docs/auth-and-invitation.md
  • docs/auth.md
  • docs/batch-json-to-db-migration.md
  • docs/future-ideas.md
  • docs/guides/gemini-prompting.md
  • docs/personal-view.md
  • docs/roadmap.md
  • lab/README.md
  • lab/classify/compare.ts
  • lab/classify/evaluate.ts
  • lab/classify/judge-batch.ts
  • lab/classify/judge-common.ts
  • lab/classify/judge.ts
  • lab/classify/sample.ts
💤 Files with no reviewable changes (3)
  • docs/personal-view.md
  • docs/batch-json-to-db-migration.md
  • docs/auth-and-invitation.md
✅ Files skipped from review due to trivial changes (3)
  • docs/future-ideas.md
  • docs/auth.md
  • docs/guides/gemini-prompting.md

Comment on lines +254 to +255
if (data.promptHash === hash) {
previousResult = data

⚠️ Potential issue | 🟠 Major

--continue can mix results from different models.

At Line 254, resume selection checks only promptHash. If --model differs, prior outputs from another model are reused and the evaluation becomes invalid.

💡 Proposed fix
-        if (data.promptHash === hash) {
+        if (data.promptHash === hash && data.model === model) {
           previousResult = data
           // Use evaluatedKeys if available, fall back to disagreements only
           const keys =
             data.evaluatedKeys ?? data.disagreements.map((d) => d.key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/evaluate.ts` around lines 254 - 255, The resume-selection
currently only matches on promptHash (if (data.promptHash === hash)
previousResult = data) which lets --continue reuse outputs from a different
model; update the check to require the model to match as well (e.g., require
data.model or data.modelName equals the current model variable/flag) so
previousResult is set only when both promptHash and model match; also add a
clear log/message when a promptHash match is skipped due to model mismatch to
aid debugging.

Comment on lines +281 to +289
if (previousResult) {
for (const d of previousResult.disagreements) {
disagreements.push(d)
}
if (previousResult.evaluatedKeys) {
evaluatedKeys.push(...previousResult.evaluatedKeys)
}
}


⚠️ Potential issue | 🟠 Major

Continue-mode metrics are computed from partial data only.

At Line 354, metrics are computed from predictions generated in the current run, while disagreements/evaluatedKeys may include previous runs. This makes accuracy, avgDrift, and totalEvaluated inconsistent in --continue mode.

💡 Proposed fix
 interface EvalResult {
@@
   disagreements: {
@@
   }[]
+  predictions?: { key: string; golden: string; predicted: string }[]
   evaluatedKeys: string[]
   totalEvaluated: number
 }

@@
   if (previousResult) {
+    if (previousResult.predictions) {
+      predictions.push(...previousResult.predictions)
+    }
     for (const d of previousResult.disagreements) {
       disagreements.push(d)
     }
@@
   const evalResult: EvalResult = {
@@
     disagreements,
+    predictions,
     evaluatedKeys,
-    totalEvaluated: predictions.length,
+    totalEvaluated: evaluatedKeys.length,
   }

Also applies to: 353-399

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/evaluate.ts` around lines 281 - 289, The current continue-mode
incorrectly mixes prior disagreements/evaluatedKeys with only current-run
predictions, making accuracy/avgDrift/totalEvaluated wrong; when previousResult
exists (see variables previousResult, disagreements, evaluatedKeys) merge prior
predictions/evaluation data into the current predictions/evaluation set before
computing metrics (or compute metrics over the union of evaluatedKeys and
combined disagreements), i.e., append or reconcile previousResult.predictions
(or the equivalent stored predictions) into predictions and then run the metric
calculations that produce accuracy, avgDrift, totalEvaluated so all metrics use
the same combined data.

Comment on lines +206 to +229
export function buildPrompt(pr: PRRecord): string {
const branchesTag =
pr.source_branch && pr.target_branch
? `\n <branches>${pr.source_branch} → ${pr.target_branch}</branches>`
: ''

const descriptionTag = pr.body
? `\n <description>${pr.body.slice(0, 2000)}</description>`
: ''

const fileList = formatFileList(pr.files)

return `<pr>
<number>${pr.number}</number>
<title>${pr.title}</title>
<author>${pr.author ?? 'unknown'}</author>${branchesTag}
<stats additions="${pr.additions ?? 0}" deletions="${pr.deletions ?? 0}" files="${pr.changed_files ?? 0}" />${descriptionTag}
<files>
${fileList}
</files>
</pr>

Classify this PR's review complexity.`
}

⚠️ Potential issue | 🟠 Major

Escape XML-special characters before embedding in prompt.

The buildPrompt function inserts raw PR fields (pr.title, pr.author, pr.body, branch names, file paths) directly into XML-like tags. If any field contains <, >, or &, the structure breaks and classification quality degrades.

💡 Proposed fix
+function escapeXml(value: string): string {
+  return value
+    .replaceAll('&', '&amp;')
+    .replaceAll('<', '&lt;')
+    .replaceAll('>', '&gt;')
+}
+
+function formatFileList(
+  files: { path: string; additions: number; deletions: number }[] | undefined,
+): string {
+  if (!files || files.length === 0) return '(no file data)'
+  const listed = files
+    .slice(0, 30)
-    .map((f) => `  ${f.path} (+${f.additions}/-${f.deletions})`)
+    .map((f) => `  ${escapeXml(f.path)} (+${f.additions}/-${f.deletions})`)
     .join('\n')
   return files.length > 30
     ? `${listed}\n  ... (${files.length} files total)`
     : listed
 }

 export function buildPrompt(pr: PRRecord): string {
   const branchesTag =
     pr.source_branch && pr.target_branch
-      ? `\n  <branches>${pr.source_branch} → ${pr.target_branch}</branches>`
+      ? `\n  <branches>${escapeXml(pr.source_branch)} → ${escapeXml(pr.target_branch)}</branches>`
       : ''

   const descriptionTag = pr.body
-    ? `\n  <description>${pr.body.slice(0, 2000)}</description>`
+    ? `\n  <description>${escapeXml(pr.body.slice(0, 2000))}</description>`
     : ''
 ...
   return `<pr>
   <number>${pr.number}</number>
-  <title>${pr.title}</title>
-  <author>${pr.author ?? 'unknown'}</author>${branchesTag}
+  <title>${escapeXml(pr.title)}</title>
+  <author>${escapeXml(pr.author ?? 'unknown')}</author>${branchesTag}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lab/classify/judge-common.ts` around lines 206 - 229, The buildPrompt
function is embedding raw PR fields (pr.title, pr.author, pr.body,
pr.source_branch, pr.target_branch and the file names returned by
formatFileList) into XML-like tags which will break if values contain
XML-special characters; add an XML-escape helper (e.g. escapeXml) and apply it
to all interpolated values used in buildPrompt (title, author, body slice,
branch names) and ensure formatFileList either returns already-escaped file
paths or is updated to call escapeXml for each file path it emits so all <, >, &
(and quotes if needed) are converted to safe entities before building the
prompt.

coji and others added 2 commits March 4, 2026 11:24
… cancel failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e args

- Wrap per-PR file backfill in try-catch so one failure doesn't abort batch
- Escape XML special chars in PR title/author/branches/body for prompts
- Validate --per-label and --seed CLI args in sample.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coji coji merged commit e818787 into main Mar 4, 2026
6 checks passed
@coji coji deleted the feat/review-bottleneck-dashboard branch March 4, 2026 02:59