fix(ai-gateway): strip NUL bytes before microdollar_usage insert #2670

Merged
marius-kilocode merged 5 commits into main from marius/fix-microdollar-usage-nul-bytes
Apr 24, 2026

@marius-kilocode
Contributor

Summary

Postgres text columns reject NUL bytes with 22021 invalid byte sequence for encoding "UTF8": 0x00, which crashes the microdollar_usage CTE insert in toInsertableDbUsageRecord and silently leaves the request un-billed. Observed rate: ~1.9k silent billing-write failures/day across FIM, chat, and responses paths.

  • Fix: add stripNulBytesInPlace helper, apply it to both core and metadata right before returning from toInsertableDbUsageRecord.
  • Observability: 5%-sampled captureMessage with the sanitized field list so we can identify the upstream source and fix it at the origin.
  • Tests: pure helper, realistic body-sourced NULs (prompt content, model IDs, upstream response fields), defensively-constructed header-sourced NULs, clean no-NUL no-op.
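
A minimal sketch of what such a helper could look like, assuming `core` and `metadata` are flat objects. The signature, the return value, and the sample field values are illustrative assumptions; the real `stripNulBytesInPlace` in `processUsage.ts` may differ.

```typescript
// Hypothetical sketch; the real helper lives in processUsage.ts and may differ.
function stripNulBytesInPlace(
  record: Record<string, unknown>,
  sanitizedFields: string[] = [],
): string[] {
  for (const key of Object.keys(record)) {
    const value = record[key];
    // Early exit: skip non-strings and strings that contain no NUL.
    if (typeof value !== 'string' || value.indexOf('\u0000') === -1) continue;
    // Mutate the record in place and remember which field was touched.
    record[key] = value.split('\u0000').join('');
    sanitizedFields.push(key);
  }
  return sanitizedFields;
}

// Usage: sanitize both flat objects and collect the affected field names.
const core: Record<string, unknown> = { model: 'codestral\u0000-latest', cost: 1200 };
const metadata: Record<string, unknown> = { message_id: 'msg_1', finish_reason: 'st\u0000op' };
const sanitized = stripNulBytesInPlace(metadata, stripNulBytesInPlace(core));
```

Returning the list of touched field names keeps the helper free of logging concerns, so the caller decides how to report them.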

Sentry: KILOCODE-WEB-1G3Z — 1,363 events over ~5 weeks, multiple LLM paths.

Why sanitize at the DB boundary

The realistic NUL source is JSON-body-derived fields — prompt content (system_prompt_prefix, user_prompt_prefix), LLM response fields (model, inference_provider, message_id, finish_reason, upstream_id, requested_model). HTTP header-sourced fields (http_user_agent, machine_id, session_id, etc.) can't realistically carry NULs because Node's Headers constructor rejects them per RFC 7230 — documented in the test.
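
To illustrate why body-derived fields are the plausible source: JSON permits NUL as the escape sequence `\u0000`, so a parser accepts it and returns a string containing a real NUL character. The payload below is made up for illustration and is not taken from a captured request.

```typescript
// The \u0000 escape is valid JSON, so the parser yields a string with a real NUL.
const rawBody = '{"model":"grok-code-fast-1\\u0000","finish_reason":"stop"}';
const parsedBody = JSON.parse(rawBody) as { model: string; finish_reason: string };

// Index of the first NUL in the parsed model ID, or -1 if there is none.
const nulIndex = parsedBody.model.indexOf('\u0000');
```

Nothing in the parse step rejects the value, so it reaches the insert unchanged unless something strips it first.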

Sanitizing at the toInsertableDbUsageRecord boundary closes the bleed for every LLM path at once with a single chokepoint, and defensively covers header-sourced fields in case a future code path bypasses Headers validation.

Once the sampled captureMessage surfaces the actual source field(s), the plan is to sanitize upstream and remove the defensive sanitizer.
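
A sampling gate of the kind described could be sketched as follows. All names here (`NUL_DIAGNOSTIC_SAMPLE_RATE`, `shouldSample`, `maybeCapture`) are hypothetical, and the `capture` callback stands in for whatever reporting call the code actually uses; the PR does not show this code.

```typescript
// Hypothetical names; the PR description does not show the actual sampling code.
const NUL_DIAGNOSTIC_SAMPLE_RATE = 0.05;

// Returns true for roughly `rate` of calls; `random` is injectable for tests.
function shouldSample(rate: number, random: () => number = Math.random): boolean {
  return random() < rate;
}

// Reports the sanitized field list for a sampled subset of affected requests.
function maybeCapture(
  sanitizedFields: string[],
  capture: (message: string, extra: Record<string, unknown>) => void,
  random: () => number = Math.random,
): boolean {
  if (sanitizedFields.length === 0) return false; // clean record: nothing to report
  if (!shouldSample(NUL_DIAGNOSTIC_SAMPLE_RATE, random)) return false;
  capture('NUL bytes stripped before microdollar_usage insert', { sanitizedFields });
  return true;
}
```

Injecting `random` keeps the gate deterministic under test, at the cost of one extra parameter.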

Blast radius

  • Event sample (last 50 from KILOCODE-WEB-1G3Z):
    • /api/fim/completions (Mistral Codestral) — 66%
    • /api/openrouter/responses (grok-code-fast-1:free) — 26%
    • /api/openrouter/chat/completions — 6%
    • /api/gateway/chat/completions — 2%
  • No KiloClaw endpoints affected.
  • Current behavior: the request succeeds but the user is not billed. The $/day leak is low, but the microdollar_usage table also feeds abuse detection and the 'first usage' PostHog event, so downstream analytics under-count as well.

Risk

Low. Sanitizer is a one-shot walk over two flat objects, only modifies strings that actually contain a NUL (early-exits via indexOf), and the behavior is exercised by unit tests. No DB or schema changes.

Test plan

  • pnpm --filter @kilocode/web exec jest src/lib/ai-gateway/processUsage.test.ts — all 27 tests pass (6 new).
  • pnpm --filter @kilocode/web typecheck — clean.
  • pnpm -w exec oxlint --config .oxlintrc.json apps/web/src/lib/ai-gateway/processUsage.ts apps/web/src/lib/ai-gateway/processUsage.test.ts — clean.

Follow-up

  • Watch the sampled captureMessage in Sentry for a few days to identify which field is the dominant NUL source.
  • Once known, sanitize at the source (likely extractPromptInfo or an upstream body parser) and remove this defensive sanitizer.

Postgres `text` columns reject NUL bytes with `22021 invalid byte
sequence for encoding "UTF8": 0x00`, which crashes the
microdollar_usage CTE insert and silently leaves the request un-billed.
The observed rate is ~1.9k silent billing-write failures/day across
FIM, chat, and responses paths.

The realistic source is JSON-body-derived fields (prompt content, model
IDs, upstream response fields like message_id / finish_reason) where
NULs can pass through client/upstream JSON without HTTP-header-level
rejection. Sanitizing at the DB boundary in `toInsertableDbUsageRecord`
closes the bleed for every LLM path at once.

- Add `stripNulBytesInPlace` helper that mutates string fields and
  records sanitized field names for observability.
- Apply to both `core` and `metadata` before returning from
  `toInsertableDbUsageRecord`.
- Emit a 5%-sampled `captureMessage` with the sanitized field list so
  we can identify the upstream source and fix it at the origin.
- Export `stripNulBytesInPlace`, `toInsertableDbUsageRecord`, and
  `extractUsageContextInfo` for unit testing.
- Add Jest tests covering: the pure helper, realistic body-sourced
  NULs, defensively-constructed header-sourced NULs, and the clean
  no-NUL path.

Refs: Sentry KILOCODE-WEB-1G3Z
@kilo-code-bot
Contributor

kilo-code-bot Bot commented Apr 22, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (1 file)
  • apps/web/src/lib/ai-gateway/processUsage.ts

Reviewed by gpt-5.4-20260305 · 290,758 tokens

@marius-kilocode marius-kilocode enabled auto-merge (squash) April 23, 2026 08:26
Comment thread apps/web/src/lib/ai-gateway/processUsage.ts Outdated
…ureMessage for NUL-byte diagnostic

Per chrarnoldus review: this is a one-off source-attribution probe, not an
issue to triage in Sentry. console.warn lands in Axiom with full structured
context, which is queryable and doesn't eat Sentry quota. Drops the 5%
sampling gate since quota is no longer a concern, so the dominant source
field will surface faster.

MicrodollarUsageContext gained a required ttfb_ms field in #2734 after
this branch diverged. Merged main and set ttfb_ms: null in the test
helper so the new NUL-byte sanitization tests satisfy the type.
@marius-kilocode marius-kilocode merged commit bb6b59d into main Apr 24, 2026
16 checks passed
@marius-kilocode marius-kilocode deleted the marius/fix-microdollar-usage-nul-bytes branch April 24, 2026 09:18