feat: make classify a standalone durably job#209
Conversation
- Add classify durably job with force/limit options and progress tracking - Register classify job in durably instance - Update CLI command to use triggerAndWait with progress/log callbacks - Add return value to classifyPullRequests for job output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthrough分類ステップを同期パイプラインから切り離し、Durably ジョブ( Changes
Sequence Diagram(s)sequenceDiagram
participant Crawler as Crawl Job
participant Shared as shared-steps
participant Durably as Durably
participant Classify as classifyJob
participant Usecase as classifyPullRequests
Crawler->>Shared: analyze → upsert → export → finalize
Shared->>Durably: triggerClassifyStep(orgId) (fire-and-forget)
Durably->>Classify: start classifyJob(orgId, force?, limit?)
Classify->>Usecase: classifyPullRequests(orgId, {force, limit})
Usecase-->>Classify: { classifiedCount }
Classify-->>Durably: return { classifiedCount }
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Remove classify step from the shared analyze→upsert→export→finalize
pipeline and trigger it as a separate durably job after crawl/recalculate
completion. This unifies the classify execution path so all classify runs
share concurrencyKey `classify:${orgId}`, preventing duplicate LLM API
calls when crawl and standalone classify run concurrently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/routes/`$orgSlug/settings/data-management/index.tsx:
- Line 70: Add a regression test for the recalculate action that verifies the
classify step is not triggered when steps.classify is false: invoke the
recalculate action (the action handler named "recalculate" in the
data-management route) with payload containing organizationId: org.id and steps:
{ ...steps, classify: false }, then assert that the classify job/side-effect is
not enqueued/invoked and other expected effects of recalculate still occur;
ensure the test specifically checks the hard-coded classify: false behavior so
future changes to the recalculate logic will be caught.
In `@app/services/jobs/classify.server.ts`:
- Around line 6-29: Add unit/integration tests for the new classify job: create
tests that import classifyJob (from classify.server.ts) and exercise its run
flow by mocking classifyPullRequests to return a known classifiedCount and
asserting the job returns that value; include cases that pass force=true/false
and different limit values to ensure inputs are forwarded to
classifyPullRequests, and a test that simulates step.run/step.progress behavior
(mock the step runner or use the job runner helper) to validate the execution
path; ensure tests assert the output schema (classifiedCount is a number) and
that classifyPullRequests was called with the expected args.
- Around line 19-27: After classifyPullRequests completes inside the
step.run('classify') block, call clearAllCache() to invalidate caches so the new
classification is visible; locate the classify flow around step.run('classify')
and add a clearAllCache() invocation (importing clearAllCache from
app/services/cache.server.ts) immediately after the result is returned/after
finalize so that the returned classifiedCount is written and caches are cleared.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8c292acb-9556-4a14-a8dc-8ff5388640d9
📒 Files selected for processing (9)
CLAUDE.mdapp/routes/$orgSlug/settings/data-management/index.tsxapp/services/durably.server.tsapp/services/jobs/classify.server.tsapp/services/jobs/crawl.server.tsapp/services/jobs/recalculate.server.tsapp/services/jobs/shared-steps.server.tsbatch/commands/classify.tsbatch/usecases/classify-pull-requests.ts
| export const classifyJob = defineJob({ | ||
| name: 'classify', | ||
| input: z.object({ | ||
| organizationId: z.string(), | ||
| force: z.boolean().default(false), | ||
| limit: z.number().optional(), | ||
| }), | ||
| output: z.object({ | ||
| classifiedCount: z.number(), | ||
| }), | ||
| run: async (step, input) => { | ||
| const orgId = input.organizationId as OrganizationId | ||
|
|
||
| const result = await step.run('classify', async () => { | ||
| step.progress(0, 0, 'Classifying PRs...') | ||
| return await classifyPullRequests(orgId, { | ||
| force: input.force, | ||
| limit: input.limit, | ||
| }) | ||
| }) | ||
|
|
||
| return { classifiedCount: result.classifiedCount } | ||
| }, | ||
| }) |
There was a problem hiding this comment.
新規 classify ジョブの変更に対するテスト追加が必要です。
force/limit 入力、classifiedCount 出力、実行フローの最低限のユニット/統合テストを追加してください。
As per coding guidelines **/*.{ts,tsx}: Ensure added or modified logic includes corresponding unit tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/services/jobs/classify.server.ts` around lines 6 - 29, Add
unit/integration tests for the new classify job: create tests that import
classifyJob (from classify.server.ts) and exercise its run flow by mocking
classifyPullRequests to return a known classifiedCount and asserting the job
returns that value; include cases that pass force=true/false and different limit
values to ensure inputs are forwarded to classifyPullRequests, and a test that
simulates step.run/step.progress behavior (mock the step runner or use the job
runner helper) to validate the execution path; ensure tests assert the output
schema (classifiedCount is a number) and that classifyPullRequests was called
with the expected args.
There was a problem hiding this comment.
スキップします。classify ジョブは classifyPullRequests を durably step でラップしているだけで、ロジック自体は classifyPullRequests 側にあります。durably の step.run/step.progress のモックは費用対効果が低いと判断しました。
There was a problem hiding this comment.
@coji、了解しました。classifyJob はオーケストレーション用のグルーコードであり、ビジネスロジックは classifyPullRequests 側に集約されているというご判断は理にかなっています。テストのスキップを承認します。
✏️ Learnings added
Learnt from: coji
Repo: coji/upflow PR: 209
File: app/services/jobs/classify.server.ts:6-30
Timestamp: 2026-03-18T00:26:56.831Z
Learning: In coji/upflow, thin durably job wrappers (e.g., classifyJob in app/services/jobs/classify.server.ts) that simply delegate to a usecase function (e.g., classifyPullRequests) via step.run/step.progress do not require dedicated unit tests. The actual business logic is tested at the usecase layer. Mocking durably's step.run/step.progress for such wrappers is considered low cost-benefit by the project owner.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: CR
Repo: coji/upflow PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T23:33:34.917Z
Learning: Applies to **/*.{ts,tsx} : Ensure added or modified logic includes corresponding unit tests
Learnt from: coji
Repo: coji/upflow PR: 141
File: CLAUDE.md:215-215
Timestamp: 2026-03-09T08:25:06.885Z
Learning: In coji/upflow, for any UPDATE or DELETE on org-scoped tables, always include WHERE organizationId = ? with a server-derived value, and obtain per-org tenant data using getTenantDb(organizationId). There is no automatic Kysely plugin for org scoping; apply this pattern project-wide to ensure proper scoping.
Learnt from: coji
Repo: coji/upflow PR: 196
File: app/services/jobs/recalculate.server.ts:127-132
Timestamp: 2026-03-17T06:05:49.427Z
Learning: Use the established project-wide cache invalidation pattern: call clearAllCache() from ~/app/services/cache.server.ts when invalidating cache. This should be used in recalculate and crawl jobs (and any similar cache invalidation points in app/services/jobs). The cache is small and org-scoped entries, so full cache wipes have limited blast radius. Note that org-scoped invalidation is a known future improvement and currently not implemented; prefer clearAllCache() for consistency and traceability now.
Learnt from: coji
Repo: coji/upflow PR: 198
File: app/services/jobs/crawl.server.ts:159-165
Timestamp: 2026-03-17T07:58:29.114Z
Learning: Guideline: Do not apply the global 'organizationId' filter to updates/deletes when using the per-organization tenant database accessed via getTenantDb(). In coji/upflow, getTenantDb(organizationId) uses a separate SQLite database file per organization, and tables there (e.g., organizationSettings, repositories) are implicitly scoped to that organization. The mandatory WHERE organizationId = ? rule only applies to the shared/global database accessed via the top-level db instance (app/services/db.server.ts). During reviews, ensure queries executed through getTenantDb() do not include or require organizationId in WHERE clauses for UPDATE/DELETE, unless referencing the global/db.server.ts path. This should be enforced by code patterns or CI checks that distinguish between tenant DB vs global DB usage.
| const result = await step.run('classify', async () => { | ||
| step.progress(0, 0, 'Classifying PRs...') | ||
| return await classifyPullRequests(orgId, { | ||
| force: input.force, | ||
| limit: input.limit, | ||
| }) | ||
| }) | ||
|
|
||
| return { classifiedCount: result.classifiedCount } |
There was a problem hiding this comment.
分類結果を書き込んだ後のキャッシュ無効化が抜けています。
classify が finalize 後に別ジョブ実行される構成になったため、ここで無効化しないと分類結果がキャッシュに反映されず古い表示が残る可能性があります。
💡 修正案
import { defineJob } from '@coji/durably'
import { z } from 'zod'
+import { clearAllCache } from '~/app/services/cache.server'
import type { OrganizationId } from '~/app/types/organization'
import { classifyPullRequests } from '~/batch/usecases/classify-pull-requests'
@@
const result = await step.run('classify', async () => {
step.progress(0, 0, 'Classifying PRs...')
return await classifyPullRequests(orgId, {
force: input.force,
limit: input.limit,
})
})
+
+ clearAllCache()
return { classifiedCount: result.classifiedCount }
},
})Based on learnings: Use the established project-wide cache invalidation pattern: call clearAllCache() from ~/app/services/cache.server.ts when invalidating cache.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const result = await step.run('classify', async () => { | |
| step.progress(0, 0, 'Classifying PRs...') | |
| return await classifyPullRequests(orgId, { | |
| force: input.force, | |
| limit: input.limit, | |
| }) | |
| }) | |
| return { classifiedCount: result.classifiedCount } | |
| import { defineJob } from '@coji/durably' | |
| import { z } from 'zod' | |
| import { clearAllCache } from '~/app/services/cache.server' | |
| import type { OrganizationId } from '~/app/types/organization' | |
| import { classifyPullRequests } from '~/batch/usecases/classify-pull-requests' | |
| // ... other code ... | |
| const result = await step.run('classify', async () => { | |
| step.progress(0, 0, 'Classifying PRs...') | |
| return await classifyPullRequests(orgId, { | |
| force: input.force, | |
| limit: input.limit, | |
| }) | |
| }) | |
| clearAllCache() | |
| return { classifiedCount: result.classifiedCount } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/services/jobs/classify.server.ts` around lines 19 - 27, After
classifyPullRequests completes inside the step.run('classify') block, call
clearAllCache() to invalidate caches so the new classification is visible;
locate the classify flow around step.run('classify') and add a clearAllCache()
invocation (importing clearAllCache from app/services/cache.server.ts)
immediately after the result is returned/after finalize so that the returned
classifiedCount is written and caches are cleared.
There was a problem hiding this comment.
検討の結果、明示的なキャッシュクリアは不要と判断しました (031ac4d)。classify は crawl 完了後に非同期で実行されるため、ユーザーが即座に結果を確認するケースは少なく、インメモリキャッシュの TTL (5分) で自然失効に任せます。コメントも追加済みです。
There was a problem hiding this comment.
@coji、ご判断ありがとうございます。非同期実行という性質上、TTL による自然失効で十分という考え方は理にかなっています。了解しました。
✏️ Learnings added
Learnt from: coji
Repo: coji/upflow PR: 209
File: app/services/jobs/classify.server.ts:19-28
Timestamp: 2026-03-18T00:26:59.440Z
Learning: In coji/upflow, the classify job (app/services/jobs/classify.server.ts) intentionally does NOT call clearAllCache() after classifyPullRequests completes. Since classify runs asynchronously after crawl finishes, users are unlikely to check results immediately, and the in-memory cache TTL (5 minutes) is sufficient for natural expiration. This is a deliberate exception to the general project-wide pattern of calling clearAllCache() after data-modifying jobs.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: coji
Repo: coji/upflow PR: 196
File: app/services/jobs/recalculate.server.ts:127-132
Timestamp: 2026-03-17T06:05:49.427Z
Learning: Use the established project-wide cache invalidation pattern: call clearAllCache() from ~/app/services/cache.server.ts when invalidating cache. This should be used in recalculate and crawl jobs (and any similar cache invalidation points in app/services/jobs). The cache is small and org-scoped entries, so full cache wipes have limited blast radius. Note that org-scoped invalidation is a known future improvement and currently not implemented; prefer clearAllCache() for consistency and traceability now.
Learnt from: coji
Repo: coji/upflow PR: 141
File: CLAUDE.md:215-215
Timestamp: 2026-03-09T08:25:06.885Z
Learning: In coji/upflow, for any UPDATE or DELETE on org-scoped tables, always include WHERE organizationId = ? with a server-derived value, and obtain per-org tenant data using getTenantDb(organizationId). There is no automatic Kysely plugin for org scoping; apply this pattern project-wide to ensure proper scoping.
Learnt from: coji
Repo: coji/upflow PR: 198
File: app/services/jobs/crawl.server.ts:159-165
Timestamp: 2026-03-17T07:58:29.114Z
Learning: Guideline: Do not apply the global 'organizationId' filter to updates/deletes when using the per-organization tenant database accessed via getTenantDb(). In coji/upflow, getTenantDb(organizationId) uses a separate SQLite database file per organization, and tables there (e.g., organizationSettings, repositories) are implicitly scoped to that organization. The mandatory WHERE organizationId = ? rule only applies to the shared/global database accessed via the top-level db instance (app/services/db.server.ts). During reviews, ensure queries executed through getTenantDb() do not include or require organizationId in WHERE clauses for UPDATE/DELETE, unless referencing the global/db.server.ts path. This should be enforced by code patterns or CI checks that distinguish between tenant DB vs global DB usage.
… handling to trigger - Remove classify field from recalculate job input schema entirely - Remove --classify flag from recalculate CLI command - Add try/catch in triggerClassifyStep so classify trigger failures don't fail the parent crawl job - Update crawl CLI description to reflect new pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, classification results wouldn't be reflected in the UI until the next crawl/recalculate finalize step cleared the cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Classify runs asynchronously after crawl, so users aren't watching for immediate results. The in-memory cache has a 5-minute TTL and will naturally expire before users typically view updated data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
concurrencyKey: classify:${orgId}, preventing duplicate concurrent LLM API calls--classifyflag from recalculate CLI (use standaloneclassifycommand instead)Pipeline change:
Test plan
pnpm validatepassespnpm tsx batch/cli.ts classify <org-id>— standalone classify workspnpm dev→ Data Management → Recalculate — no classify checkbox, upsert/export workpnpm tsx batch/cli.ts recalculate <org-id>— no--classifyflag🤖 Generated with Claude Code
Summary by CodeRabbit
リリースノート
新機能
変更