feat: AI-assisted COA classification (GPT-4o-mini) #87
Conversation
Caution: Review failed — the pull request is closed.
ℹ️ Recent review info — ⚙️ Run configuration: defaults · Review profile: CHILL · Plan: Pro
📒 Files selected for processing (2)
📝 Walkthrough

Adds an AI-assisted COA classification library (`server/lib/classification-ai.ts`) and a batch `POST /api/classification/ai-suggest` route with keyword fallback.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant APIRoute as API Route (ai-suggest)
    participant Storage
    participant Classifier as classifyBatchWithAI
    participant OpenAI
    participant Keyword as Keyword Fallback

    Client->>APIRoute: POST /api/classification/ai-suggest
    APIRoute->>Storage: Fetch unclassified transactions (limit 1–25)
    APIRoute->>Storage: Load tenant COA options
    APIRoute->>Classifier: classifyBatchWithAI(transactions, coa, apiKey, opts)
    alt apiKey present
        Classifier->>OpenAI: Chat completion request (structured JSON)
        alt OpenAI returns valid JSON
            OpenAI-->>Classifier: JSON suggestions
            Classifier->>Classifier: Validate codes, clamp confidence, fill missing via keyword
        else OpenAI fails or returns malformed
            Classifier->>Keyword: keywordFallback for remaining/all transactions
            Keyword-->>Classifier: keyword suggestions
        end
    else no apiKey
        Classifier->>Keyword: keywordFallback for all transactions
        Keyword-->>Classifier: keyword suggestions
    end
    Classifier-->>APIRoute: AiSuggestion[] (source: ai | keyword)
    loop per suggestion
        APIRoute->>Storage: classifyTransaction(..., trustLevel: L1, isSuggestion: true)
    end
    APIRoute->>Storage: Record audit log (classification.ai-suggest)
    APIRoute-->>Client: Summary { processed, suggested, aiCount, keywordCount, aiAvailable }
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
@coderabbitai review Please evaluate:
Deploying with Cloudflare

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | chittyfinance | 33540fc | Apr 10 2026, 09:36 PM |
Code Review — PR #87: AI-assisted COA classification

Overall this is well-structured work. The fallback hierarchy is sound, the trust-path alignment is correct (L1/suggestion-only), and the test coverage for the library is thorough. A few issues worth addressing before merge:

Bugs

1. Unawaited `ledgerLog` call

```ts
// server/routes/classification.ts
ledgerLog(c, {
  entityType: 'audit',
  action: 'classification.ai-suggest',
  ...
}, c.env);
```

In Cloudflare Workers, any pending microtask after the response is returned can be dropped, so this fire-and-forget call may silently lose audit logs.

2. Silent `catch` in the write loop

```ts
for (const s of suggestions) {
  try {
    await storage.classifyTransaction(...);
    suggested++;
  } catch {
    continue; // ← no log, no counter
  }
}
```

If `classifyTransaction` throws, the failure is swallowed with no log and no counter. At minimum:

```ts
} catch (err) {
  console.error('[ai-suggest] classifyTransaction failed for', s.transactionId, err);
  continue;
}
```

Performance

3. Sequential `await` in the write loop

The write loop awaits each DB call in sequence. For a full batch of 25, this is 25 round-trips to Neon. Use `Promise.allSettled`:

```ts
const results = await Promise.allSettled(
  suggestions.map((s) =>
    storage.classifyTransaction(s.transactionId, tenantId, s.coaCode, { ... })
      .then(() => s.source)
  )
);
for (const r of results) {
  if (r.status === 'fulfilled') {
    suggested++;
    r.value === 'ai' ? aiCount++ : keywordCount++;
  } else {
    console.error('[ai-suggest] write failed:', r.reason);
  }
}
```

4. Double-filtering in the route

The route fetches up to the requested limit of unclassified transactions and then filters them a second time in the handler.

Code quality

5. `any` in the COA mapping

```ts
coa.map((a: any) => ({ code: a.code, name: a.name, type: a.type, description: a.description }))
```

6. Documentation inconsistency: "GPT-4o" vs `gpt-4o-mini`

The file header says "Uses GPT-4o with structured JSON output" but the implementation uses `gpt-4o-mini`.

Test coverage gap

7. No route-level tests

The 9 library tests are excellent, but the route itself (`/api/classification/ai-suggest`) is untested. Even a basic happy-path integration test for the route would reduce risk here.

Summary: Items 1 (dropped audit logs in Workers) and 3 (sequential writes) are the highest-priority fixes. The rest are quality improvements. The library design and fallback logic are solid — the trust-path alignment with L1-only writes is exactly right.
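A common Workers-safe fix for the dropped-audit-log issue (item 1) is to hand the pending write to the runtime with `waitUntil` instead of leaving it as a dangling promise. This is a minimal sketch with stubs standing in for the Hono/Workers pieces — `ledgerLog` and the execution context here are simplified stand-ins, not the PR's real implementations:

```typescript
// Sketch: keep a background write alive past the response by registering
// it with the runtime rather than awaiting it or letting it dangle.
type ExecutionCtx = { waitUntil(p: Promise<unknown>): void };

// Stub context: real Workers runtime keeps the isolate alive until every
// promise passed to waitUntil() settles.
const pending: Promise<unknown>[] = [];
const executionCtx: ExecutionCtx = { waitUntil: (p) => { pending.push(p); } };

let auditWritten = false;
async function ledgerLog(): Promise<void> {
  await Promise.resolve(); // simulate async I/O to the audit store
  auditWritten = true;
}

function handleRequest(ctx: ExecutionCtx): { ok: boolean } {
  // Fire the audit write, then hand it to the runtime instead of awaiting:
  ctx.waitUntil(ledgerLog()); // not dropped when the response returns
  return { ok: true };        // respond immediately
}
```

In Hono on Workers, the equivalent would be `c.executionCtx.waitUntil(...)` inside the route handler; the response still returns immediately, but the audit write is no longer cancellable.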
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa5ee65dd5
Adds a new POST /api/classification/ai-suggest endpoint that batches
up to 25 unclassified transactions through GPT-4o-mini with structured
JSON output, falling back to keyword matching on any failure.
Library (server/lib/classification-ai.ts):
- classifyBatchWithAI() — per-request OpenAI client (Workers-safe)
- Validates AI-returned codes against the tenant's COA (rejects hallucinations)
- Clamps confidence to [0, 1]
- Always returns one suggestion per input (gap-fills with keyword fallback)
- Respects CF AI Gateway via AI_GATEWAY_ENDPOINT env
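The per-request, gateway-aware client setup described above could be sketched as follows. Only the config shape is shown — the function name is illustrative, and the resulting object would be passed to the `openai` package's `new OpenAI(...)` constructor, which accepts `apiKey` and an optional `baseURL`:

```typescript
// Sketch: build client options per request (Workers-safe — nothing is
// shared across requests). baseURL is only set when an AI Gateway
// endpoint is configured, mirroring the AI_GATEWAY_ENDPOINT behavior.
interface ClientConfig { apiKey: string; baseURL?: string }

function clientConfig(apiKey: string, gatewayEndpoint?: string): ClientConfig {
  return gatewayEndpoint
    ? { apiKey, baseURL: gatewayEndpoint } // route via CF AI Gateway
    : { apiKey };                          // default OpenAI endpoint
}
```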
Route:
- Only processes transactions without an existing suggested_coa_code
(respects prior human/auto work)
- Writes via L1 suggest (never authoritative coa_code)
- Returns {processed, suggested, aiCount, keywordCount, aiAvailable}
Tests (9 new):
- Empty input, missing key, API error, malformed JSON, missing choices
- Confidence clamping, hallucinated-code rejection, gap-filling
- Batch size guard (MAX_BATCH=25)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
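The validate-and-clamp behavior the commit message describes (reject hallucinated COA codes, clamp confidence to [0, 1]) can be sketched as a small post-processing step. This is illustrative only — `RawSuggestion`, `postProcess`, and the return shape are assumptions, not the library's actual types:

```typescript
// Illustrative post-processing of model output: drop COA codes that are
// not in the tenant's chart, and clamp confidence into [0, 1].
interface RawSuggestion { id: string; code: string; confidence: number; reason?: string }

function postProcess(
  raw: RawSuggestion[],
  validCodes: Set<string>,
): { id: string; code: string; confidence: number }[] {
  return raw
    .filter((s) => validCodes.has(s.code)) // hallucinated codes are rejected
    .map((s) => ({
      id: s.id,
      code: s.code,
      confidence: Math.min(1, Math.max(0, s.confidence)), // clamp to [0, 1]
    }));
}
```

In the real library, rejected or missing entries are then gap-filled by the keyword fallback so every input still gets exactly one suggestion.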
Force-pushed aa5ee65 to 5744890 (Compare)
@coderabbitai review Please evaluate:
Pull request overview
Adds an AI-assisted COA suggestion flow to the existing classification “trust-path” by introducing a new batch endpoint that uses GPT-4o-mini with structured JSON output and defensive keyword fallbacks, plus unit tests validating the parsing/fallback behavior.
Changes:
- Added `POST /api/classification/ai-suggest` to batch-suggest `suggested_coa_code` via GPT-4o-mini with keyword gap-filling.
- Introduced `server/lib/classification-ai.ts` to call OpenAI (Workers-safe), validate returned COA codes, clamp confidence, and always return one suggestion per input.
- Added Vitest coverage for AI parsing and all fallback paths (no network).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| server/routes/classification.ts | Adds /api/classification/ai-suggest endpoint and wires it to storage + audit logging. |
| server/lib/classification-ai.ts | New OpenAI-backed batch classifier with strict validation and keyword fallback. |
| server/tests/classification-ai.test.ts | New unit tests covering success + error/malformed/hallucination/gap-fill scenarios. |
Actionable comments posted: 1
🧹 Nitpick comments (3)
server/__tests__/classification-ai.test.ts (1)

79-107: Consider adding a test for negative confidence clamping. The test verifies that confidence > 1 is clamped to 1, but the implementation also clamps values < 0 to 0. Adding a test case for negative confidence would ensure complete coverage of the clamping logic.

💚 Optional test case

```ts
it('clamps negative confidence to 0', async () => {
  mockCreate.mockResolvedValueOnce({
    choices: [
      {
        message: {
          content: JSON.stringify({
            suggestions: [
              { id: 'tx-1', code: '5070', confidence: -0.5, reason: 'Negative test' },
            ],
          }),
        },
      },
    ],
  });
  const result = await classifyBatchWithAI([TX_REPAIR], COA, 'fake-key');
  expect(result[0].confidence).toBe(0);
});
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/__tests__/classification-ai.test.ts` around lines 79-107: add a unit test that verifies negative confidence values are clamped to 0. Create a new test (e.g., "clamps negative confidence to 0") that uses mockCreate.mockResolvedValueOnce to return a choices[0].message.content JSON with a suggestion having confidence: -0.5, then call classifyBatchWithAI([TX_REPAIR], COA, 'fake-key') and assert the returned classification's confidence is 0; reference mockCreate, classifyBatchWithAI, TX_REPAIR and COA to locate where to add the test.

server/routes/classification.ts (2)
367: Consider adding proper typing instead of an `any` cast. The `any` type cast bypasses type checking. The storage method likely returns a typed COA array.

♻️ Suggested improvement

```diff
- coa.map((a: any) => ({ code: a.code, name: a.name, type: a.type, description: a.description })),
+ coa.map((a) => ({ code: a.code, name: a.name, type: a.type, description: a.description })),
```

If the return type from `storage.getChartOfAccounts` isn't properly typed, consider adding a type assertion or updating the storage layer's return type.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@server/routes/classification.ts` at line 367: the map callback is typed as `any`, which bypasses TS checks. Update the COA typing by using the actual model/interface returned from `storage.getChartOfAccounts` (e.g., `ChartOfAccount` or `CoaItem`) instead of `any` in the `coa.map` callback, and/or adjust `storage.getChartOfAccounts` to return that typed array, so `coa.map((a: any) => ...)` becomes `coa.map((a: ChartOfAccount) => ...)` (or the appropriate interface name) to ensure proper type checking of the code, name, type, and description fields.
336-407: Consider adding rate limiting for AI requests. The AI endpoint makes external OpenAI API calls which incur costs. Without rate limiting, this endpoint could be abused or accidentally called excessively, leading to unexpected API charges.
Based on learnings: "Implement rate limiting for AI requests to prevent excessive OpenAI API costs"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/routes/classification.ts` around lines 336 - 407, The ai-suggest endpoint (classificationRoutes.post('/api/classification/ai-suggest')) lacks rate limiting and can be abused to incur OpenAI costs; add a per-tenant (and fallback global) rate limiter before calling classifyBatchWithAI that enforces a reasonable quota (e.g., X requests/min or Y tokens/min) using an in-memory token-bucket for single-instance dev and Redis/leaky-bucket for production, return a 429 JSON error when the limit is exceeded, and log quota rejections (include tenantId and requester) so storage.classifyTransaction/classifyBatchWithAI calls only proceed when the limiter permits.
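The in-memory token bucket the comment suggests for single-instance dev could look roughly like this. It is a sketch only — the class name and the 10-requests-per-minute quota are illustrative, and production would use a shared store (Redis or Workers KV) as the comment notes:

```typescript
// Illustrative per-tenant token bucket: each tenant gets `capacity`
// tokens, refilled continuously at `refillPerSec` tokens per second.
class TenantRateLimiter {
  private buckets = new Map<string, { tokens: number; last: number }>();

  constructor(private capacity = 10, private refillPerSec = 10 / 60) {}

  allow(tenantId: string, now = Date.now()): boolean {
    const b = this.buckets.get(tenantId) ?? { tokens: this.capacity, last: now };
    // Refill proportionally to elapsed time, capped at capacity.
    b.tokens = Math.min(this.capacity, b.tokens + ((now - b.last) / 1000) * this.refillPerSec);
    b.last = now;
    if (b.tokens < 1) {
      this.buckets.set(tenantId, b);
      return false; // caller should respond 429
    }
    b.tokens -= 1;
    this.buckets.set(tenantId, b);
    return true;
  }
}
```

In the route, a failed `allow(tenantId)` check would return a 429 JSON error before `classifyBatchWithAI` is ever called, bounding OpenAI spend per tenant.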
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/routes/classification.ts`:
- Around line 389-391: The catch block that currently swallows all exceptions
should be changed to capture the error (e.g., catch (err)) and handle it the
same way as the batch-suggest path: if err is a ClassificationError then
continue processing the next item, otherwise re-throw the error (or at minimum
call the logger to record the error) so non-recoverable issues (DB outages,
etc.) are not silently hidden; update the catch around the loop in
classification.ts to reference ClassificationError and/or log unexpected errors
before rethrowing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ad9334fe-0a9e-4cc9-8f7b-342a633d484c
📒 Files selected for processing (3)
- server/__tests__/classification-ai.test.ts
- server/lib/classification-ai.ts
- server/routes/classification.ts
Supplemental Review - PR 87

The existing review covers the critical issues well (unawaited ledgerLog, sequential writes, silent catch). Adding gaps not yet mentioned:

1. Security - transaction descriptions are passed directly into the system prompt without sanitization. A crafted description could bias confidence scores or inject misleading text into audit-trail reason fields even after COA code validation. Recommend stripping newlines and control chars before the slice.
2. Cost controls - POST /api/classification/ai-suggest has no rate limiting. In a multi-tenant system, a misconfigured automation or runaway cron could rack up significant OpenAI spend. The limit param caps a single call at 25 but does not bound frequency. A KV-backed counter keyed on tenantId plus date to cap and return 429 would mitigate this.
3. Minor - txPayload omits the date field. For REI accounting, date context improves accuracy for periodic charges like property tax and HOA fees. Adding date: tx.date is a one-liner.

Net: prior review items 1 and 3 (unawaited ledgerLog, sequential DB writes) are the blockers. Points 1 and 2 above are worth addressing before this endpoint is exposed to automated callers.
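The description-sanitization step recommended in point 1 could be sketched as a small helper. This is hypothetical — the function name and the 200-character cap are assumptions, not the PR's actual code:

```typescript
// Hypothetical pre-prompt sanitizer: strips control characters
// (including newlines and tabs), collapses whitespace, and bounds the
// length before the description is placed in the model prompt.
function sanitizeDescription(raw: string, maxLen = 200): string {
  return raw
    .replace(/[\u0000-\u001f\u007f]/g, ' ') // drop control chars (\n, \r, \t, ...)
    .replace(/\s+/g, ' ')                   // collapse runs of whitespace
    .trim()
    .slice(0, maxLen);                      // bound prompt size per transaction
}
```

This does not make prompt injection impossible, but it removes the easiest vector (multi-line payloads that mimic system-prompt structure) and keeps the stored `reason` field single-line.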
- Docstring now correctly describes gpt-4o-mini default (was: GPT-4o)
- Remove unused `date` field from ClassifiableTransaction interface
- ai-suggest only swallows ClassificationError.reconciled_locked; DB/runtime failures now propagate to the shared error handler instead of being silently skipped
- `processed` field is now consistent: all branches return txns.length (count of transactions fetched and considered), matching the semantics of the early-return paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai review Please evaluate:
Code Review

Overall this is solid, well-designed code. The trust-path alignment (L1 suggest only), defensive fallback chain, and COA-validation-as-hallucination-guard are all good patterns. A few items worth addressing before the next iteration builds on top of this.

Bugs / Type Errors

Test fixtures include a `date` field that the `ClassifiableTransaction` interface doesn't declare. Two options: declare `date` on the interface, or remove it from the fixtures and interface. Option 2 is the better call given the use-case.

Performance / Reliability

No timeout on the OpenAI HTTP call. Recommend adding an `AbortController`-based timeout:

```ts
const ac = new AbortController();
const timer = setTimeout(() => ac.abort(), 15_000); // 15s max
try {
  const response = await client.chat.completions.create({ ... }, { signal: ac.signal });
} finally {
  clearTimeout(timer);
}
```

The fallback already handles errors, so an abort just routes to keyword match — zero data loss.

Consider bumping to

Type Safety

```ts
coa.map((a: any) => ({ code: a.code, name: a.name, type: a.type, description: a.description }))
```

Minor

What's Working Well
🧩 Analysis chain

Let me gather the necessary information from the codebase before conducting the security review.

🏁 Script executed (length of output: 30536)
🏁 Script executed (length of output: 4544)
🔐 Security Review — PR #87

| Area | Finding | Severity |
|---|---|---|
| Prompt injection via transaction description | No sanitization before prompt construction | 🔴 High |
| API key in error logs | Raw `err` object logged by `console.error` | 🔴 High |
| No rate limiting on `/ai-suggest` | Cost amplification attack possible | 🟠 Medium |
| PII sent to OpenAI | Financial data leaves your infra | 🟠 Medium |
| AI-generated `reason` stored unsanitized | Stored XSS if rendered as HTML | 🟠 Medium |
| `aiAvailable` in response | Infra config disclosure | 🟡 Low |
| Unvalidated `baseURL` from env | Latent SSRF if env becomes user-controlled | 🟡 Low |
| `openai` unpinned (`^4.95.0`) | Silent minor version bumps | 🟡 Low |
| `ClassifiableTransaction` missing `date` | Interface/test contract drift | 🟡 Low |
| Breaking changes | None — purely additive | ✅ None |
| Known CVEs in `openai` package | None found | ✅ Clean |
✅ Actions performed
Review triggered.
Frontend for the trust-path classification workflow added in #86/#87.

New: client/src/pages/Classification.tsx
- Stats cards (total, classified, reconciled, unclassified + %)
- Unclassified queue sorted by date/amount/confidence (ascending or descending)
- Per-row actions: accept suggested code (one-click), classify as any COA code (dropdown), reconcile (L3 lock)
- Three bulk actions:
  - Keyword Suggest: runs /api/classification/batch-suggest (L1)
  - AI Suggest (GPT): runs /api/classification/ai-suggest (L1, 25 at a time)
  - Accept High-Confidence: writes authoritative coa_code (L2) for rows passing the bulkAcceptCandidates guardrails

bulkAcceptCandidates rules (defense in depth):
- Confidence >= 0.8 (AI "success" threshold)
- suggestedCoaCode !== '9010' (suspense means "needs a human")
- |amount| <= $500 (large-dollar → manual review regardless of confidence)

New: client/src/hooks/use-classification.ts
- TanStack Query hooks scoped by tenantId for automatic cache invalidation on tenant switch
- Queries: coa, stats, unclassified, audit trail
- Mutations: classify (L2), suggest (L1 manual), reconcile (L3), batch-suggest (L1 keyword), ai-suggest (L1 GPT)

Wiring:
- /classification route added to App.tsx
- Sidebar link added (cfo, accountant, bookkeeper roles)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: AI-assisted COA classification with GPT-4o-mini
Adds a new POST /api/classification/ai-suggest endpoint that batches
up to 25 unclassified transactions through GPT-4o-mini with structured
JSON output, falling back to keyword matching on any failure.
Library (server/lib/classification-ai.ts):
- classifyBatchWithAI() — per-request OpenAI client (Workers-safe)
- Validates AI-returned codes against the tenant's COA (rejects hallucinations)
- Clamps confidence to [0, 1]
- Always returns one suggestion per input (gap-fills with keyword fallback)
- Respects CF AI Gateway via AI_GATEWAY_ENDPOINT env
Route:
- Only processes transactions without an existing suggested_coa_code
(respects prior human/auto work)
- Writes via L1 suggest (never authoritative coa_code)
- Returns {processed, suggested, aiCount, keywordCount, aiAvailable}
Tests (9 new):
- Empty input, missing key, API error, malformed JSON, missing choices
- Confidence clamping, hallucinated-code rejection, gap-filling
- Batch size guard (MAX_BATCH=25)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: PR #87 review feedback — docstring, silent catch, processed field
- Docstring now correctly describes gpt-4o-mini default (was: GPT-4o)
- Remove unused `date` field from ClassifiableTransaction interface
- ai-suggest only swallows ClassificationError.reconciled_locked;
DB/runtime failures now propagate to the shared error handler
instead of being silently skipped
- `processed` field is now consistent: all branches return txns.length
(count of transactions fetched and considered), matching the semantics
of the early-return paths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: classification review page with bulk-accept guardrails
Frontend for the trust-path classification workflow added in #86/#87.
New: client/src/pages/Classification.tsx
- Stats cards (total, classified, reconciled, unclassified + %)
- Unclassified queue sorted by date/amount/confidence (ascending or
descending)
- Per-row actions: accept suggested code (one-click), classify as any
COA code (dropdown), reconcile (L3 lock)
- Three bulk actions:
- Keyword Suggest: runs /api/classification/batch-suggest (L1)
- AI Suggest (GPT): runs /api/classification/ai-suggest (L1, 25 at a time)
- Accept High-Confidence: writes authoritative coa_code (L2) for rows
passing the bulkAcceptCandidates guardrails
bulkAcceptCandidates rules (defense in depth):
- Confidence >= 0.8 (AI "success" threshold)
- suggestedCoaCode !== '9010' (suspense means "needs a human")
- |amount| <= $500 (large-dollar → manual review regardless of confidence)
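The three guardrails above can be expressed as a single predicate. This is a sketch — the row shape (`confidence`, `suggestedCoaCode`, `amount`) is assumed from the commit message, not taken from the actual component:

```typescript
// Sketch of the bulkAcceptCandidates guardrails (defense in depth):
// a row is only eligible for authoritative L2 bulk-accept if all three
// conditions hold.
interface Row { confidence: number; suggestedCoaCode: string; amount: number }

function isBulkAcceptCandidate(r: Row): boolean {
  return (
    r.confidence >= 0.8 &&            // AI "success" threshold
    r.suggestedCoaCode !== '9010' &&  // suspense code means "needs a human"
    Math.abs(r.amount) <= 500         // large-dollar rows go to manual review
  );
}
```

Failing any one rule drops the row back to the manual queue, so a high-confidence suggestion on a $2,000 transaction still requires a human click.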
New: client/src/hooks/use-classification.ts
- TanStack Query hooks scoped by tenantId for automatic cache invalidation
on tenant switch
- Queries: coa, stats, unclassified, audit trail
- Mutations: classify (L2), suggest (L1 manual), reconcile (L3),
batch-suggest (L1 keyword), ai-suggest (L1 GPT)
Wiring:
- /classification route added to App.tsx
- Sidebar link added (cfo, accountant, bookkeeper roles)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `POST /api/classification/ai-suggest` — batches unclassified transactions through GPT-4o-mini with structured JSON output
- `server/lib/classification-ai.ts` — per-request OpenAI client (Workers-safe), validates codes against tenant COA, clamps confidence, gap-fills with keyword fallback

Stacked on PR #86 — merge that first.

Behavior

- Falls back to keyword matching when `OPENAI_API_KEY` is unavailable

Trust-path alignment

- Writes `suggested_coa_code` only (L1 Suggest), never authoritative `coa_code`
- `actorId` distinguishes `auto:openai-gpt4o-mini` vs `auto:keyword-match` in the audit trail

Test plan

- `OPENAI_API_KEY` unset → verify keyword fallback
- `OPENAI_API_KEY` set → verify audit trail records `auto:openai-gpt4o-mini`

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests