Skip to content

fix: decode base64-encoded embeddings in recorder#64

Merged
jpr5 merged 3 commits intoCopilotKit:mainfrom
iskhakovt:fix/base64-embedding-recording
Mar 31, 2026
Merged

fix: decode base64-encoded embeddings in recorder#64
jpr5 merged 3 commits intoCopilotKit:mainfrom
iskhakovt:fix/base64-embedding-recording

Conversation

@iskhakovt
Copy link
Copy Markdown
Contributor

@iskhakovt iskhakovt commented Mar 31, 2026

Problem

When recording fixtures, the recorder fails to save OpenAI embedding responses that use base64 encoding. The OpenAI API returns base64-encoded embeddings when encoding_format: "base64" is set (the default in many SDKs, including Python's openai package).

buildFixtureResponse checks Array.isArray(first.embedding) but the embedding is a base64 string, not an array. The fixture is saved with a proxy_error:

{
  "response": {
    "error": { "message": "Could not detect response format from upstream", "type": "proxy_error" },
    "status": 200
  }
}

Fix

Extract encoding_format from the raw request body and pass it to buildFixtureResponse. When encoding_format === "base64" and the embedding is a string, decode it via Buffer.from(embedding, "base64")Float32ArrayArray.from(floats).

This is forward-safe — only decodes when encoding_format explicitly says "base64", so future encoding formats (e.g. "int8") won't be misinterpreted.

Changes

  • recorder.ts: Extract encoding_format from raw request, pass to buildFixtureResponse, decode base64 when detected

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new bot commented Mar 31, 2026

Open in StackBlitz

npm i https://pkg.pr.new/CopilotKit/llmock/@copilotkit/llmock@64

commit: 5602fb5

@jpr5
Copy link
Copy Markdown
Contributor

jpr5 commented Mar 31, 2026

Thanks for the fix! The base64 decoding logic is correct — good use of Float32Array with byteOffset for buffer pool alignment, and gating on encoding_format === "base64" is the right approach.

I've added 3 tests covering the 2×2 matrix of (encoding_format present/absent) × (base64 string / array embedding):

  1. base64 string + encoding_format: "base64" → decodes to [0.5, 1, -0.25]
  2. base64 string + no encoding_format → falls through to proxy_error (the original bug path)
  3. array embedding + encoding_format: "base64"Array.isArray wins, array passthrough

All red-green verified. The tests are on our branch `fix/base64-embedding-recording` — feel free to cherry-pick commit `0617f69` or I can merge them into this PR.

One minor note: if `buf.byteLength % 4 !== 0` (corrupted base64), the `Float32Array` constructor will throw an uncaught `RangeError`. Not a blocker since OpenAI won't send corrupted data, but a `try/catch` around the decode with fallthrough to the existing error path would be more defensive.

Guard the Float32Array decode with try/catch so corrupted base64 data
falls through to the error path instead of crashing.

Add 4 tests covering the encoding_format × embedding-type matrix:
- base64 string + encoding_format:base64 → decodes to float array
- base64 string + no encoding_format → proxy_error (original bug path)
- array embedding + encoding_format:base64 → array passthrough
- truncated base64 (odd byte count) → empty embedding, no crash
@jpr5 jpr5 merged commit abb6b1d into CopilotKit:main Mar 31, 2026
10 checks passed
jpr5 added a commit that referenced this pull request Mar 31, 2026
Patch release with recorder fixes:
- Preserve upstream URL path prefixes in record proxy (#57)
- Forward all request headers in record proxy (#58)
- Decode base64-encoded embeddings in recorder (#64)
- Guard base64 decode against corrupted data
- Update CHANGELOG, skill docs
@jpr5 jpr5 mentioned this pull request Mar 31, 2026
jpr5 added a commit that referenced this pull request Mar 31, 2026
## Summary

Patch release capturing recorder fixes that landed since v1.6.0.

### Patch Changes

- Fix record proxy to preserve upstream URL path prefixes (#57)
- Fix record proxy to forward all request headers to upstream (#58)
- Fix recorder to decode base64-encoded embeddings with
`encoding_format: "base64"` (#64)
- Guard base64 decode against corrupted data
- Update CHANGELOG, skill docs, competitive matrix script

### Files changed

- `package.json` — version 1.6.0 → 1.6.1
- `CHANGELOG.md` — new 1.6.1 section
- `skills/write-fixtures/SKILL.md` — recorder docs updated

All 1,327 tests pass. Build clean.
jpr5 added a commit that referenced this pull request Apr 3, 2026
## Problem

When recording fixtures, the recorder fails to save OpenAI embedding
responses that use base64 encoding. The OpenAI API returns
base64-encoded embeddings when `encoding_format: "base64"` is set (the
default in many SDKs, including Python's `openai` package).

`buildFixtureResponse` checks `Array.isArray(first.embedding)` but the
embedding is a base64 string, not an array. The fixture is saved with a
`proxy_error`:

```json
{
  "response": {
    "error": { "message": "Could not detect response format from upstream", "type": "proxy_error" },
    "status": 200
  }
}
```

## Fix

Extract `encoding_format` from the raw request body and pass it to
`buildFixtureResponse`. When `encoding_format === "base64"` and the
embedding is a string, decode it via `Buffer.from(embedding, "base64")`
→ `Float32Array` → `Array.from(floats)`.

This is forward-safe — only decodes when `encoding_format` explicitly
says `"base64"`, so future encoding formats (e.g. `"int8"`) won't be
misinterpreted.

## Changes

- `recorder.ts`: Extract `encoding_format` from raw request, pass to
`buildFixtureResponse`, decode base64 when detected
jpr5 added a commit that referenced this pull request Apr 3, 2026
Patch release with recorder fixes:
- Preserve upstream URL path prefixes in record proxy (#57)
- Forward all request headers in record proxy (#58)
- Decode base64-encoded embeddings in recorder (#64)
- Guard base64 decode against corrupted data
- Update CHANGELOG, skill docs
jpr5 added a commit that referenced this pull request Apr 3, 2026
## Summary

Patch release capturing recorder fixes that landed since v1.6.0.

### Patch Changes

- Fix record proxy to preserve upstream URL path prefixes (#57)
- Fix record proxy to forward all request headers to upstream (#58)
- Fix recorder to decode base64-encoded embeddings with
`encoding_format: "base64"` (#64)
- Guard base64 decode against corrupted data
- Update CHANGELOG, skill docs, competitive matrix script

### Files changed

- `package.json` — version 1.6.0 → 1.6.1
- `CHANGELOG.md` — new 1.6.1 section
- `skills/write-fixtures/SKILL.md` — recorder docs updated

All 1,327 tests pass. Build clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants