fix: add PCM to WAV encoding and TTS MIME support by JacobLinCool · Pull Request #123 · JacobLinCool/mentora

JacobLinCool · 2026-04-07T22:00:51Z

Add support for Gemini TTS returning raw PCM by encoding PCM16LE into a WAV container and propagating MIME types through the stack. Introduces a new encodePcm16AsWav utility and SynthesizedAudio type, updates GeminiTTSExecutor to normalize model responses and return {audioBase64, mimeType}, and exports the WAV encoder. Adjusts frontend to use the provided audio MIME type (removing mp3 defaults), updates API spec and conversation service to handle the new payload shape, and updates tests (including a new unit test for WAV encoding) to expect audio/wav.
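For readers unfamiliar with the container format, wrapping PCM16LE in WAV amounts to prepending a 44-byte RIFF header. The following is a minimal sketch, not the PR's `encodePcm16AsWav`: the name `wrapPcm16InWav`, its byte-array signature, and its defaults are hypothetical, and the 24 kHz mono default is an assumption taken from the comment in the review diff.

```typescript
// Hypothetical helper, not the PR's encodePcm16AsWav: wraps raw PCM16LE bytes
// in a canonical 44-byte RIFF/WAVE header and returns the combined bytes.
function wrapPcm16InWav(
    pcm: Uint8Array,
    sampleRate = 24000, // assumed default (24 kHz), see lead-in
    channelCount = 1, // assumed default (mono)
): Uint8Array {
    const bytesPerSample = 2; // 16-bit samples
    const blockAlign = channelCount * bytesPerSample; // bytes per sample frame
    const byteRate = sampleRate * blockAlign; // bytes per second of audio
    const headerSize = 44;

    const out = new Uint8Array(headerSize + pcm.length);
    const view = new DataView(out.buffer);

    // Writes an ASCII chunk tag such as "RIFF" at the given byte offset.
    const writeAscii = (offset: number, text: string): void => {
        for (let i = 0; i < text.length; i++) {
            view.setUint8(offset + i, text.charCodeAt(i));
        }
    };

    writeAscii(0, "RIFF");
    view.setUint32(4, 36 + pcm.length, true); // total size minus the first 8 bytes
    writeAscii(8, "WAVE");
    writeAscii(12, "fmt ");
    view.setUint32(16, 16, true); // fmt chunk size for linear PCM
    view.setUint16(20, 1, true); // audio format 1 = linear PCM
    view.setUint16(22, channelCount, true);
    view.setUint32(24, sampleRate, true);
    view.setUint32(28, byteRate, true);
    view.setUint16(32, blockAlign, true);
    view.setUint16(34, bytesPerSample * 8, true); // bits per sample
    writeAscii(36, "data");
    view.setUint32(40, pcm.length, true); // size of the sample data that follows
    out.set(pcm, headerSize);
    return out;
}
```

All multi-byte header fields are little-endian, which is why every `setUint16`/`setUint32` call passes `true`. The sample data itself is copied unchanged, so the input must already be little-endian 16-bit PCM.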

changeset-bot · 2026-04-07T22:00:55Z

⚠️ No Changeset found

Latest commit: 0f8283a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

cloudflare-workers-and-pages · 2026-04-07T22:02:50Z

Deploying mentora-app with Cloudflare Pages

Latest commit:	`0f8283a`
Status:	✅ Deploy successful!
Preview URL:	https://0326bdc3.mentora-app.pages.dev
Branch Preview URL:	https://fix-audio-play.mentora-app.pages.dev

Copilot

Pull request overview

This PR updates the Mentora TTS pipeline to support Gemini returning raw PCM by wrapping it in a WAV container and propagating an explicit audio MIME type through backend → SDK → frontend playback.

Changes:

- Add `encodePcm16AsWav` utility and a `SynthesizedAudio { audioBase64, mimeType }` contract for TTS executors.
- Update `GeminiTTSExecutor` to normalize model audio responses (PCM → WAV) and return MIME type alongside base64 audio.
- Adjust API/service/frontend/tests to expect and use `audio/wav` rather than defaulting to `audio/mp3`.
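To illustrate why the MIME type has to travel with the audio, here is a minimal sketch of how a consumer might turn the payload into something playable. The `SynthesizedAudio` field names come from the PR description, but their string types are assumed; `toAudioDataUrl` is a hypothetical helper, and the thread does not show whether the actual frontend uses data URLs at all.

```typescript
// Shape taken from the PR description; the string field types are an assumption.
interface SynthesizedAudio {
    audioBase64: string;
    mimeType: string;
}

// Hypothetical helper: builds a data URL from the payload so the player is
// told the real container type instead of guessing (e.g. defaulting to MP3).
function toAudioDataUrl(audio: SynthesizedAudio): string {
    if (!audio.mimeType) {
        throw new Error("Missing audio MIME type");
    }
    return `data:${audio.mimeType};base64,${audio.audioBase64}`;
}
```

In a browser, the resulting string could be assigned to the `src` of an `<audio>` element. If the MIME type in the URL does not match the bytes (for example, labeling WAV data as MP3), some browsers may refuse to play it.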

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/mentora-api/tests/sdk-full-scenario.integration.test.ts | Updates integration expectations for returned `audioMimeType` to WAV. |
| packages/mentora-api/tests/conversation-service-asr.unit.test.ts | Updates TTS mock + assertions to the new `{ audioBase64, mimeType }` payload and WAV MIME. |
| packages/mentora-api/src/lib/server/application/conversation-service.ts | Plumbs the new TTS payload shape through `addTurn` and returns `audioMimeType` from executor output. |
| packages/mentora-api/src/lib/explorer/api-spec.ts | Updates API explorer example to show `audio/wav`. |
| packages/mentora-ai/tests/tts-audio.test.ts | Adds unit test verifying WAV header/data layout for PCM→WAV encoding. |
| packages/mentora-ai/src/types.ts | Introduces `SynthesizedAudio` and updates `TTSExecutor.synthesize` return type accordingly. |
| packages/mentora-ai/src/executor/wav.ts | Adds WAV container encoding helper for raw PCM. |
| packages/mentora-ai/src/executor/tts.ts | Normalizes Gemini TTS audio responses (PCM → WAV) and enforces supported MIME handling. |
| packages/mentora-ai/src/executor/index.ts | Exports the new WAV encoder from the executor barrel. |
| apps/mentora/src/routes/conversations/[id]/+page.svelte | Removes MP3 defaulting and plays audio using the MIME type returned from the API. |

Copilot · 2026-04-07T22:05:50Z

+    const normalizedMimeType = audioPart.mimeType
+        ?.toLowerCase()
+        .split(";")[0]
+        ?.trim();
+
+    if (
+        normalizedMimeType === "audio/wav" ||
+        normalizedMimeType === "audio/wave"
+    ) {
+        return {
+            audioBase64: data,
+            mimeType: "audio/wav",
+        };
+    }
+
+    // Gemini preview TTS returns raw 24 kHz 16-bit mono PCM by default.
+    if (!normalizedMimeType || PCM_MIME_TYPES.has(normalizedMimeType)) {
+        return encodePcm16AsWav(data);
+    }
+
+    throw new Error(
+        `Unsupported TTS audio MIME type: ${audioPart.mimeType ?? "<missing>"}`,
+    );


normalizeGeminiAudioResponse currently only accepts "audio/wav"/"audio/wave" and raw PCM (missing/"audio/l16"/"audio/pcm"). If Gemini returns other common but browser-playable audio types (e.g. audio/mpeg) or WAV aliases (e.g. audio/x-wav, audio/vnd.wave), this will throw and break TTS despite the PR goal of propagating MIME types. Consider normalizing additional WAV aliases and passing through supported codecs (or explicitly documenting/handling them) instead of throwing.
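One way the suggestion above could look is sketched below. This is not code from the PR: `classifyAudioMimeType`, `AudioHandling`, and the three sets are hypothetical names, and the pass-through allow-list (`audio/mpeg`, `audio/ogg`) is an assumption about which codecs the project would want to accept.

```typescript
// WAV aliases named in the review comment, canonicalized to "audio/wav".
const WAV_ALIASES = new Set(["audio/wav", "audio/wave", "audio/x-wav", "audio/vnd.wave"]);
// Raw PCM types named in the review comment; these need a WAV header added.
const RAW_PCM_TYPES = new Set(["audio/l16", "audio/pcm"]);
// Assumed allow-list of other types to forward unchanged.
const PASSTHROUGH_TYPES = new Set(["audio/mpeg", "audio/ogg"]);

type AudioHandling =
    | { action: "wrap-pcm" }
    | { action: "passthrough"; mimeType: string };

// Hypothetical classifier: decides how a response part should be handled
// based on its declared MIME type, ignoring case and ";"-separated parameters.
function classifyAudioMimeType(raw: string | undefined): AudioHandling {
    const normalized = raw?.toLowerCase().split(";")[0]?.trim();

    // A missing type is treated as raw PCM, mirroring the diff above.
    if (!normalized || RAW_PCM_TYPES.has(normalized)) {
        return { action: "wrap-pcm" };
    }
    if (WAV_ALIASES.has(normalized)) {
        return { action: "passthrough", mimeType: "audio/wav" };
    }
    if (PASSTHROUGH_TYPES.has(normalized)) {
        return { action: "passthrough", mimeType: normalized };
    }
    throw new Error(`Unsupported TTS audio MIME type: ${raw ?? "<missing>"}`);
}
```

Keeping an explicit allow-list rather than forwarding any `audio/*` type still fails loudly on unexpected formats, which preserves the defensive behavior of the original `throw` while covering the aliases the comment mentions.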

Copilot · 2026-04-07T22:05:50Z

+        bytesPerSample?: number;
+    } = {},
+): SynthesizedAudio {


encodePcm16AsWav exposes bytesPerSample/channelCount/sampleRate options, so it can generate WAV headers for non-16-bit PCM as well. The current name implies it only supports PCM16; consider either asserting bytesPerSample===2 (and maybe channelCount/sampleRate expectations) or renaming the helper to reflect the broader capability to avoid misuse.

Suggested change

-        bytesPerSample?: number;
-    } = {},
-): SynthesizedAudio {
+        bytesPerSample?: 2;
+    } = {},
+): SynthesizedAudio {
+    if (bytesPerSample !== 2) {
+        throw new Error(
+            `encodePcm16AsWav only supports 16-bit PCM input (bytesPerSample must be 2, received ${bytesPerSample}).`,
+        );
+    }

JacobLinCool self-assigned this Apr 7, 2026

Copilot AI review requested due to automatic review settings April 7, 2026 22:00

Copilot started reviewing on behalf of JacobLinCool April 7, 2026 22:01

JacobLinCool merged commit 4942fc0 into main Apr 7, 2026
6 checks passed

JacobLinCool deleted the fix-audio-play branch April 7, 2026 22:04

Copilot AI reviewed Apr 7, 2026

JacobLinCool merged 1 commit into main from fix-audio-play

2 participants
