Record gpt-oss harmony tool calls + stream/recorder hardening #245
Merged
Summary
aimock could not record fixtures from local gpt-oss models (e.g. via Ollama/vLLM/OpenRouter) that emit OpenAI harmony channel tokens: the raw control tokens (<|channel|>commentary to=functions.NAME <|constrain|>json<|message|>{...}<|call|>) leaked into recorded content instead of being parsed into tool calls. This PR adds harmony parsing plus a round of stream/recorder hardening surfaced while fixing it.
Reported in Slack: local gpt-oss (gpt-5.x class) recordings were unusable because hosted OpenAI pre-parses harmony, but local runtimes pass it through raw.
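To make the failure mode concrete, here is a minimal sketch of turning one harmony tool-call segment of the shape quoted above into a structured call, with an all-or-nothing fallback. It is deliberately much simpler than the lexer + state machine in src/harmony.ts: the function name, result shape, and regular expression are assumptions made for illustration, and only the harmonyUnparsed flag name is borrowed from this PR.

```typescript
// Hypothetical result shape for illustration; not the types used in src/harmony.ts.
interface ParsedToolCall {
  name: string;
  arguments: string; // raw JSON text, already validated with JSON.parse
}

interface HarmonyResult {
  content: string;
  toolCalls: ParsedToolCall[];
  harmonyUnparsed: boolean;
}

// Accepts exactly one commentary-channel tool call and nothing else.
const TOOL_CALL =
  /^<\|channel\|>commentary to=functions\.([A-Za-z0-9_.-]+)\s*<\|constrain\|>json<\|message\|>([\s\S]*?)<\|call\|>$/;

function parseSingleHarmonyCall(raw: string): HarmonyResult {
  // Plain text without control tokens is ordinary content, not a parse failure.
  if (!raw.includes("<|")) {
    return { content: raw, toolCalls: [], harmonyUnparsed: false };
  }

  // Fail-safe: hand back the original text untouched and flag it.
  const failSafe: HarmonyResult = { content: raw, toolCalls: [], harmonyUnparsed: true };

  const match = TOOL_CALL.exec(raw.trim());
  if (!match) return failSafe;

  const name = match[1];
  const body = match[2];
  if (name === undefined || body === undefined) return failSafe;

  try {
    JSON.parse(body); // the message body must be valid JSON arguments
  } catch {
    return failSafe;
  }

  return { content: "", toolCalls: [{ name, arguments: body }], harmonyUnparsed: false };
}

const sample =
  '<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"city":"Paris"}<|call|>';
const parsed = parseSingleHarmonyCall(sample);
const truncatedInput = sample.replace("<|call|>", "");
const fallback = parseSingleHarmonyCall(truncatedInput);
```

In this sketch a segment that is missing its closing <|call|> token is returned verbatim with harmonyUnparsed set, rather than being partially parsed, which mirrors the all-or-nothing behavior this PR describes.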
What changed (by concern)
Harmony channel parser (src/harmony.ts): a two-phase lexer + state machine that routes the harmony analysis/commentary/final channels into reasoning/tool calls/content. Uniform all-or-nothing fail-safe: on any structural deviation it returns the original content verbatim and signals harmonyUnparsed/harmonyNote. It never mangles content, leaks a control token, or glues message bodies together. Wired as fallback-only (used only when no structured tool calls were already parsed), so it can never produce phantom/merged calls.
Stream collapser hardening (src/stream-collapse.ts): handling for SSE events with multiple data: lines and for CRLF line endings; missing/uncorrelated tool_call index guards with symmetric dropped-chunk accounting across OpenAI/Anthropic/Bedrock/Cohere (uncorrelated arg deltas now increment droppedChunks and capture a firstDroppedSample instead of vanishing silently); Bedrock EventStream header-bounds validation (malformed frames degrade to truncated instead of throwing).
Recorder (src/recorder.ts, src/types.ts): incremental multibyte UTF-8 decoding (StringDecoder) so characters split across stream chunks are no longer corrupted on the frame-timing path; CRLF-tolerant frame-timing splitter; webSearches propagation into fixtures; audio-branch companion fields (toolCalls/content/reasoning) persisted; firstDroppedSample logged alongside the dropped-chunk warning.
Gemini audio companion replay (src/gemini.ts): the audio replay builders now re-emit companion tool-call/text/thought parts (mirroring the non-audio builder) instead of dropping them, completing the record→replay round-trip for audio turns that interleave tool calls.
Tests
+~3.4k lines of tests: harmony structural acceptance matrix + boundary/fail-safe regressions; collapser robustness (index guards, dropped-chunk accounting, CRLF); multibyte decode (both decode paths, byte-by-byte split); Gemini audio companion record→replay. Full suite: 3308 passed / 0 failed, tsc clean, tsdown build green.
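The byte-by-byte split case can be illustrated with a small sketch that compares per-chunk decoding against incremental decoding using Node's StringDecoder. The helper names and sample string are hypothetical and this is not the code in src/recorder.ts; it only shows why decoding each chunk in isolation corrupts characters whose bytes straddle a chunk boundary.

```typescript
import { StringDecoder } from "node:string_decoder";

// Decodes every chunk in isolation: a multibyte character split across
// chunks turns into U+FFFD replacement characters.
function decodeNaive(chunks: Uint8Array[]): string {
  return chunks.map((chunk) => Buffer.from(chunk).toString("utf8")).join("");
}

// Decodes incrementally: StringDecoder buffers an incomplete trailing
// sequence and emits the character once its remaining bytes arrive.
function decodeIncremental(chunks: Uint8Array[]): string {
  const decoder = new StringDecoder("utf8");
  let out = "";
  for (const chunk of chunks) {
    out += decoder.write(Buffer.from(chunk));
  }
  return out + decoder.end(); // flush anything still buffered
}

const original = "héllo 🌍";
const bytes = Buffer.from(original, "utf8");

// Worst case: every byte arrives in its own chunk.
const oneByteChunks: Uint8Array[] = Array.from(bytes, (b) => Uint8Array.of(b));

const incremental = decodeIncremental(oneByteChunks);
const naive = decodeNaive(oneByteChunks);
```

Here incremental equals the original string, while naive contains replacement characters in place of "é" and the emoji because their bytes were decoded one at a time.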
Documented limitations
audioB64 collapse is currently Gemini-only, so cross-provider audio fixtures would not replay companion modalities. Documented at the type and builder.
Test plan
npx vitest run: 3308 passed, 37 skipped, 0 failed
npx tsc --noEmit: clean
pnpm build (tsdown): clean