fix: decode base64-encoded embeddings in recorder#64
Conversation
commit: |
|
Thanks for the fix! The base64 decoding logic is correct — good use of I've added 3 tests covering the 2×2 matrix of (encoding_format present/absent) × (base64 string / array embedding):
All red-green verified. The tests are on our branch `fix/base64-embedding-recording` — feel free to cherry-pick commit `0617f69` or I can merge them into this PR. One minor note: if `buf.byteLength % 4 !== 0` (corrupted base64), the `Float32Array` constructor will throw an uncaught `RangeError`. Not a blocker since OpenAI won't send corrupted data, but a `try/catch` around the decode with fallthrough to the existing error path would be more defensive. |
Guard the Float32Array decode with try/catch so corrupted base64 data falls through to the error path instead of crashing. Add 4 tests covering the encoding_format × embedding-type matrix: - base64 string + encoding_format:base64 → decodes to float array - base64 string + no encoding_format → proxy_error (original bug path) - array embedding + encoding_format:base64 → array passthrough - truncated base64 (odd byte count) → empty embedding, no crash
## Summary Patch release capturing recorder fixes that landed since v1.6.0. ### Patch Changes - Fix record proxy to preserve upstream URL path prefixes (#57) - Fix record proxy to forward all request headers to upstream (#58) - Fix recorder to decode base64-encoded embeddings with `encoding_format: "base64"` (#64) - Guard base64 decode against corrupted data - Update CHANGELOG, skill docs, competitive matrix script ### Files changed - `package.json` — version 1.6.0 → 1.6.1 - `CHANGELOG.md` — new 1.6.1 section - `skills/write-fixtures/SKILL.md` — recorder docs updated All 1,327 tests pass. Build clean.
## Problem
When recording fixtures, the recorder fails to save OpenAI embedding
responses that use base64 encoding. The OpenAI API returns
base64-encoded embeddings when `encoding_format: "base64"` is set (the
default in many SDKs, including Python's `openai` package).
`buildFixtureResponse` checks `Array.isArray(first.embedding)` but the
embedding is a base64 string, not an array. The fixture is saved with a
`proxy_error`:
```json
{
"response": {
"error": { "message": "Could not detect response format from upstream", "type": "proxy_error" },
"status": 200
}
}
```
## Fix
Extract `encoding_format` from the raw request body and pass it to
`buildFixtureResponse`. When `encoding_format === "base64"` and the
embedding is a string, decode it via `Buffer.from(embedding, "base64")`
→ `Float32Array` → `Array.from(floats)`.
This is forward-safe — only decodes when `encoding_format` explicitly
says `"base64"`, so future encoding formats (e.g. `"int8"`) won't be
misinterpreted.
## Changes
- `recorder.ts`: Extract `encoding_format` from raw request, pass to
`buildFixtureResponse`, decode base64 when detected
## Summary Patch release capturing recorder fixes that landed since v1.6.0. ### Patch Changes - Fix record proxy to preserve upstream URL path prefixes (#57) - Fix record proxy to forward all request headers to upstream (#58) - Fix recorder to decode base64-encoded embeddings with `encoding_format: "base64"` (#64) - Guard base64 decode against corrupted data - Update CHANGELOG, skill docs, competitive matrix script ### Files changed - `package.json` — version 1.6.0 → 1.6.1 - `CHANGELOG.md` — new 1.6.1 section - `skills/write-fixtures/SKILL.md` — recorder docs updated All 1,327 tests pass. Build clean.
Problem
When recording fixtures, the recorder fails to save OpenAI embedding responses that use base64 encoding. The OpenAI API returns base64-encoded embeddings when
encoding_format: "base64"is set (the default in many SDKs, including Python'sopenaipackage).buildFixtureResponsechecksArray.isArray(first.embedding)but the embedding is a base64 string, not an array. The fixture is saved with aproxy_error:{ "response": { "error": { "message": "Could not detect response format from upstream", "type": "proxy_error" }, "status": 200 } }Fix
Extract
encoding_formatfrom the raw request body and pass it tobuildFixtureResponse. Whenencoding_format === "base64"and the embedding is a string, decode it viaBuffer.from(embedding, "base64")→Float32Array→Array.from(floats).This is forward-safe — only decodes when
encoding_formatexplicitly says"base64", so future encoding formats (e.g."int8") won't be misinterpreted.Changes
recorder.ts: Extractencoding_formatfrom raw request, pass tobuildFixtureResponse, decode base64 when detected