feat: migrate batch intermediate data from JSON files to tenant DB #130
Conversation
Replace file-based JSON storage (path-builder + fs read/write) with tenant DB tables (github_raw_data, github_raw_tags) for batch PR data. This removes the Fly.io volume dependency for intermediate data, preparing for the Turso migration.

- Add github_raw_data and github_raw_tags tables to the tenant schema
- Rewrite store.ts to use DB UPSERT/SELECT with raw SQL via getTenantDbRaw
- Add getTenantDbRaw (plugin-free Kysely instance) to preserve JSON keys
- Update provider.ts to use the new store API (savePrData, saveTags, preloadAll)
- Delete unused path-builder, path, json, and index helper files
- Add store.test.ts with 7 integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough

Migrates GitHub data storage from filesystem JSON files to DB-backed persistence (two new tables: github_raw_data and github_raw_tags).

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Provider as GitHub Provider
    participant Store as Store Module
    participant DB as Tenant DB
    participant Cache as In-memory Cache
    Note over Provider,DB: DB-backed persistence and preload flow
    Provider->>Store: preloadAll()
    Store->>DB: SELECT * FROM github_raw_data / github_raw_tags
    DB-->>Store: Rows
    Store->>Cache: populate map
    Provider->>Store: savePrData(pr, {commits,reviews,discussions})
    Store->>DB: UPSERT INTO github_raw_data
    DB-->>Store: OK
    Provider->>Store: saveTags(tags)
    Store->>DB: UPSERT INTO github_raw_tags
    DB-->>Store: OK
    Provider->>Store: loader.commits(prNumber)
    alt Cache hit
        Cache-->>Store: return commits
    else Cache miss
        Store->>DB: SELECT WHERE repository_id & pull_request_number
        DB-->>Store: row
        Store->>Cache: cache and return
    end
    Store-->>Provider: PR data
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@batch/provider/github/store.ts`:
- Around line 40-63: savePrData updates the DB but doesn't update the in-memory
preload cache, causing loaders (which read from preloaded after preloadAll) to
see stale data; after the INSERT ... ON CONFLICT executes successfully in
savePrData, also update the preloaded cache entry used by preloadAll/loaders
(the preloaded Map/object) for the key corresponding to repositoryId + pr.number
with the same shape you serialize (pull_request, commits, reviews, discussions)
and set fetched_at to the current time (e.g., new Date().toISOString()); apply
the same change to the other DB-write helper(s) that mutate GitHub data (the
writer covering the region referenced by lines 96-113) so they keep preloaded in
sync.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- db/migrations/tenant/atlas.sum is excluded by !**/*.sum
📒 Files selected for processing (12)

- app/services/tenant-db.server.ts
- app/services/tenant-type.ts
- batch/helper/index.ts
- batch/helper/json.ts
- batch/helper/path-builder.test.ts
- batch/helper/path-builder.ts
- batch/helper/path.ts
- batch/provider/github/provider.ts
- batch/provider/github/store.test.ts
- batch/provider/github/store.ts
- db/migrations/tenant/20260228064258.sql
- db/tenant.sql
💤 Files with no reviewable changes (5)
- batch/helper/path-builder.test.ts
- batch/helper/path.ts
- batch/helper/json.ts
- batch/helper/path-builder.ts
- batch/helper/index.ts
```typescript
const savePrData = async (
  pr: ShapedGitHubPullRequest,
  data: {
    commits: ShapedGitHubCommit[]
    reviews: ShapedGitHubReview[]
    discussions: ShapedGitHubReviewComment[]
  },
) => {
  const prJson = JSON.stringify(pr)
  const commitsJson = JSON.stringify(data.commits)
  const reviewsJson = JSON.stringify(data.reviews)
  const discussionsJson = JSON.stringify(data.discussions)

  await sql`
    INSERT INTO github_raw_data (repository_id, pull_request_number, pull_request, commits, reviews, discussions)
    VALUES (${repositoryId}, ${pr.number}, ${prJson}, ${commitsJson}, ${reviewsJson}, ${discussionsJson})
    ON CONFLICT (repository_id, pull_request_number) DO UPDATE SET
      pull_request = ${prJson},
      commits = ${commitsJson},
      reviews = ${reviewsJson},
      discussions = ${discussionsJson},
      fetched_at = datetime('now')
  `.execute(db)
}
```
Keep preload cache consistent after writes.
After preloadAll(), loaders read from preloaded (Line 111). savePrData() updates the DB but not the cache, so reads can return stale data in the same store instance.
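The stale-read hazard described here can be reproduced with a Map-backed stand-in. All names below are illustrative, not the real store; the point is the difference between a DB-only write and a write-through write.

```typescript
// Stand-in "DB" table and preload cache, keyed by PR number.
const db = new Map<number, string>([[1, 'old']])
let cache: Map<number, string> | null = null

const preloadAll = () => {
  cache = new Map(db) // snapshot the DB into memory
}

// DB-only write: mirrors the reviewed savePrData — the cache is not touched.
const save = (n: number, v: string) => {
  db.set(n, v)
}

// Write-through variant: keeps the cache consistent after a DB write.
const saveAndSync = (n: number, v: string) => {
  db.set(n, v)
  cache?.set(n, v)
}

const load = (n: number) => cache?.get(n) ?? db.get(n) ?? null

preloadAll()
save(1, 'new')
const stale = load(1) // 'old' — cache still holds the preloaded value
saveAndSync(1, 'new')
const fresh = load(1) // 'new' — write-through keeps the cache in sync
```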
🔧 Proposed fix

```diff
 const savePrData = async (
   pr: ShapedGitHubPullRequest,
   data: {
     commits: ShapedGitHubCommit[]
     reviews: ShapedGitHubReview[]
     discussions: ShapedGitHubReviewComment[]
   },
 ) => {
 @@
   await sql`
 @@
   `.execute(db)
+
+  if (preloaded) {
+    preloaded.set(pr.number, {
+      pullRequest: pr,
+      commits: data.commits,
+      reviews: data.reviews,
+      discussions: data.discussions,
+    })
+  }
 }
```

Also applies to: 96-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@batch/provider/github/store.ts` around lines 40 - 63, savePrData updates the
DB but doesn't update the in-memory preload cache, causing loaders (which read
from preloaded after preloadAll) to see stale data; after the INSERT ... ON
CONFLICT executes successfully in savePrData, also update the preloaded cache
entry used by preloadAll/loaders (the preloaded Map/object) for the key
corresponding to repositoryId + pr.number with the same shape you serialize
(pull_request, commits, reviews, discussions) and set fetched_at to the current
time (e.g., new Date().toISOString()); apply the same change to the other
DB-write helper(s) that mutate GitHub data (the writer covering the region
referenced by lines 96-113) so they keep preloaded in sync.
In the actual usage flow, savePrData runs in the fetch phase and preloadAll in the analyze phase, in that order. There is no path where the same instance reads from preloaded after a save, so I concluded no fix is needed.
@coji Thanks for the explanation.

You're right; looking at the actual usage flow:

1. Fetch phase: savePrData() writes the data to the DB
2. Analyze phase: preloadAll() bulk-loads the latest data from the DB
3. Read: reads come from the cache

Given this order, there is no path that reads from a stale cache after a write. Since preloadAll() always fetches the latest data from the DB, cache consistency is preserved.

The changes in provider.ts also confirm this design intent. As you point out, explicit cache invalidation is unnecessary in this case 👍
Unify the snake_case fields of the ShapedGitHub* models to camelCase, resolving the compatibility issue with CamelCasePlugin. Rewrite store.ts to use getTenantDb + the Kysely query builder, and remove getTenantDbRaw.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the batch/scripts/ directory and the related package.json scripts. They are no longer needed now that the intermediate data has been migrated to the tenant DB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Inline into getTenantDb and remove the unnecessary getOrCreateDb indirection
- Use excluded references in onConflict to eliminate duplicate JSON.stringify calls
- Add a comment about automatic parsing via ParseJSONResultsPlugin
- Change the SQL string interpolation in store.test.ts to prepared statements
- Fix remaining snake_case in test names and comments to camelCase

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
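The "excluded reference" upsert can be sketched in raw SQL, held here in a TypeScript constant. This is a hedged sketch: the actual change uses the Kysely query builder, but the generated statement is equivalent. SQLite's `excluded` pseudo-table lets the `DO UPDATE` clause reuse the freshly inserted values, so each JSON payload is bound once instead of twice. Column names follow the table shown earlier in this thread.

```typescript
// Single-bind upsert: each JSON column appears once in VALUES, and the
// conflict branch re-reads it via SQLite's `excluded` pseudo-table.
const upsertSql = `
INSERT INTO github_raw_data
  (repository_id, pull_request_number, pull_request, commits, reviews, discussions)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (repository_id, pull_request_number) DO UPDATE SET
  pull_request = excluded.pull_request,
  commits = excluded.commits,
  reviews = excluded.reviews,
  discussions = excluded.discussions,
  fetched_at = datetime('now')
`
```

Compared with the earlier version that interpolated each `${...Json}` value into both the VALUES list and the UPDATE clause, this binds six parameters total and stringifies each payload exactly once.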
Actionable comments posted: 2
♻️ Duplicate comments (1)
batch/provider/github/store.ts (1)
26-53: ⚠️ Potential issue | 🟡 Minor — Synchronize `preloaded` after `savePrData` writes.

After `preloadAll()` (Line 94), `loadRow()` reads from the cache first (Lines 115-117). `savePrData()` (Lines 26-53) updates the DB but not `preloaded`, so same-instance reads can return stale/missing data.

💡 Suggested fix

```diff
 const savePrData = async (
 @@
 ) => {
 @@
   .execute()
+
+  if (preloaded) {
+    preloaded.set(pr.number, {
+      pullRequest: pr,
+      commits: data.commits,
+      reviews: data.reviews,
+      discussions: data.discussions,
+    })
+  }
 }

 const loadRow = async (number: number) => {
-  if (preloaded) {
-    return preloaded.get(number) ?? null
+  if (preloaded) {
+    const cached = preloaded.get(number)
+    if (cached) return cached
   }
 @@
-  return parseRow(row)
+  const parsed = parseRow(row)
+  if (preloaded) preloaded.set(number, parsed)
+  return parsed
 }
```

Also applies to: 94-117
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@batch/provider/github/store.ts` around lines 26 - 53, savePrData writes PR data to the database but does not update the in-memory cache preloaded, so subsequent calls to loadRow (which checks preloaded first after preloadAll) can return stale/missing data; fix by updating the preloaded map inside savePrData after the DB upsert (use the same keying used by preloadAll/loadRow, e.g. repositoryId and pr.number) to store the JSON-parsed or original object you just wrote, ensuring preloaded stays in sync with the DB for same-instance reads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@batch/provider/github/release-detect.ts`:
- Line 63: Wrap the dynamic RegExp construction around tagCondition in a
try-catch: when creating tagRegexp from tagCondition (the new
RegExp(tagCondition) instance in release-detect.ts), catch SyntaxError or other
exceptions, log or surface a clear error including the invalid pattern, and fall
back to a safe behavior (e.g., treat as no-match / skip processing this config
or exclude the condition) instead of letting the exception propagate; ensure any
downstream use of tagRegexp checks that it was successfully created before
calling test() or exec() so invalid patterns don't crash batch execution.
In `@batch/provider/github/store.test.ts`:
- Around line 20-21: Add an afterAll hook to unstub environment variables so
vi.stubEnv calls don't leak; specifically, after the existing afterEach block
add: afterAll(() => { vi.unstubAllEnvs(); }); This uses the vi.unstubAllEnvs()
helper to restore env state corresponding to the vi.stubEnv(...) calls and
should be placed in the same test file scope (near the existing afterEach) so
all tests are cleaned up.
---
Duplicate comments:
In `@batch/provider/github/store.ts`:
- Around line 26-53: savePrData writes PR data to the database but does not
update the in-memory cache preloaded, so subsequent calls to loadRow (which
checks preloaded first after preloadAll) can return stale/missing data; fix by
updating the preloaded map inside savePrData after the DB upsert (use the same
keying used by preloadAll/loadRow, e.g. repositoryId and pr.number) to store the
JSON-parsed or original object you just wrote, ensuring preloaded stays in sync
with the DB for same-instance reads.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)

- app/services/tenant-db.server.ts
- batch/provider/github/aggregator.test.ts
- batch/provider/github/aggregator.ts
- batch/provider/github/fetcher.ts
- batch/provider/github/model.ts
- batch/provider/github/provider.ts
- batch/provider/github/pullrequest.ts
- batch/provider/github/release-detect.ts
- batch/provider/github/review-response.ts
- batch/provider/github/store.test.ts
- batch/provider/github/store.ts
- batch/scripts/golden-compare.ts
- batch/scripts/golden-snapshot.ts
- package.json
💤 Files with no reviewable changes (3)
- package.json
- batch/scripts/golden-snapshot.ts
- batch/scripts/golden-compare.ts
Set the JSON columns of github_raw_data/github_raw_tags to ColumnType<unknown, string, string> via kysely-codegen overrides. SELECT yields unknown and INSERT takes string, softening the `as unknown as` casts in store.ts to plain `as`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
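Kysely's `ColumnType<SelectType, InsertType, UpdateType>` utility gives one column distinct TypeScript types per operation. The sketch below re-declares the utility locally so it runs without dependencies (the real one lives in the `kysely` package); the member names `__select__`/`__insert__` mirror Kysely's branded properties but are used here purely for illustration.

```typescript
// Local re-declaration of Kysely's ColumnType utility, dependency-free.
type ColumnType<Select, Insert = Select, Update = Insert> = {
  readonly __select__: Select
  readonly __insert__: Insert
  readonly __update__: Update
}

// The override described in the commit: selects yield `unknown` (the driver
// plugin parses the JSON), while inserts/updates take pre-stringified text.
type JsonColumn = ColumnType<unknown, string, string>

// Insert side: callers must pass a JSON string...
const toInsert: JsonColumn['__insert__'] = JSON.stringify({ number: 1 })

// ...select side: results come back as `unknown` and need a narrowing cast
// at the point of use (the `as` mentioned in the commit message).
const fromSelect: JsonColumn['__select__'] = JSON.parse(toInsert)
```

The practical effect is that `as unknown as ShapedGitHubPullRequest` casts collapse to a single `as`, because the select type is already `unknown`.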
- release-detect.ts: add a try-catch so the RegExp constructor doesn't throw on an invalid tagCondition
- store.test.ts: add afterAll(unstubAllEnvs) to prevent vi.stubEnv leaks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
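The release-detect guard can be sketched as below. This is a hedged stand-in, not the real code: `compileTagCondition` and `matchesTag` are hypothetical names, and the fallback behavior (treat an invalid pattern as "no match" rather than crashing the batch run) follows the review suggestion.

```typescript
// Build the RegExp defensively: an invalid user-supplied pattern would
// otherwise throw a SyntaxError and abort the whole batch execution.
function compileTagCondition(tagCondition: string): RegExp | null {
  try {
    return new RegExp(tagCondition)
  } catch (err) {
    // Surface the invalid pattern instead of letting the exception propagate.
    console.error(`invalid tagCondition pattern: ${tagCondition}`, err)
    return null
  }
}

function matchesTag(tag: string, tagCondition: string): boolean {
  const re = compileTagCondition(tagCondition)
  return re ? re.test(tag) : false // invalid pattern → safe no-match
}
```

Downstream callers only ever see a `RegExp | null`, so no `test()`/`exec()` call can run against a pattern that failed to compile.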
Summary

- Migrate batch intermediate data to tenant DB tables (github_raw_data, github_raw_tags)
- Remove getTenantDbRaw

Changes

- Add github_raw_data and github_raw_tags tables to db/tenant.sql
- Rewrite store.ts to be DB-based (UPSERT/SELECT via the Kysely query builder)
- Migrate provider.ts to the new store API (savePrData, saveTags, preloadAll)
- Remove getTenantDbRaw (the plugin-free Kysely instance) and unify on getTenantDb
- Delete path-builder.ts, path.ts, json.ts, index.ts, batch/scripts/golden-*.ts

Test plan

- pnpm validate (lint, format, typecheck, build, 52 tests)
- store.test.ts: save/load round-trip, upsert, empty state, preloadAll, multiple PRs (7 tests)
- batch fetch → verified data is saved to github_raw_data with camelCase keys
- batch upsert → analyze → verified successful writes to pull_requests

🤖 Generated with Claude Code