speech synthesize --stream outputs raw SSE/JSON instead of decoded audio bytes #54

@moxi000

Summary

mmx speech synthesize --stream is documented as streaming raw audio to stdout, and --help even ships an example piping it directly into mpv. In practice it writes the upstream Server-Sent Events stream verbatim — JSON envelopes containing hex-encoded audio — so no audio player can decode it.

Reproduce

mmx speech synthesize --text "Stream me" --stream | mpv -

Result:

[file] Reading from stdin...
Failed to recognize file format.
Exiting... (Errors when loading file)

Inspecting the raw bytes mmx writes to stdout:

mmx speech synthesize --text "Stream me" --stream > out.bin
head -c 80 out.bin
# data: {"data":{"audio":"494433040000000000235453534500...

So stdout contains:

  1. SSE framing (data: prefix, blank-line separators)
  2. A JSON object per event
  3. The audio inside .data.audio as a hex string (49 44 33 = "ID3", the tag header that usually prefixes an MP3 file, so it really is MP3, just hex-encoded and JSON-wrapped)
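
As a quick sanity check of point 3, the visible hex prefix from the capture above can be decoded in Node. This is only a sketch: it uses the truncated prefix shown in the head output (the rest is elided there), and the variable names are mine.

```javascript
// Hex prefix copied from the `head -c 80` output above; the remainder is elided there.
const hexPrefix = "494433040000000000235453534500";
const header = Buffer.from(hexPrefix, "hex");

// An ID3v2 header is 10 bytes: "ID3", two version bytes, one flags byte, four size bytes.
const magic = header.subarray(0, 3).toString("latin1");          // "ID3"
const majorVersion = header[3];                                   // 4, i.e. ID3v2.4
// The first frame ID follows immediately after the 10-byte header.
const firstFrameId = header.subarray(10, 14).toString("latin1"); // "TSSE"

console.log({ magic, majorVersion, firstFrameId });
```

So the payload begins with an ID3v2.4 tag whose first frame is TSSE (encoder settings), which is consistent with MP3 data.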

--out works correctly and produces a valid MP3, and its help text even says "uses hex decoding", confirming the decode logic exists — it's just not applied on the --stream code path.

Expected

--stream should write decoded raw audio bytes to stdout (parse SSE → JSON → hex-decode .data.audio → write binary), so the documented example actually works:

mmx speech synthesize --text "Stream" --stream | mpv --no-terminal -
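
For illustration, the SSE → JSON → hex-decode step could look roughly like the sketch below. I have not read the mmx source, so createSseHexDecoder is a hypothetical helper, not an existing function in mmx-cli. It assumes each event carries a single data: line whose JSON has the hex audio at .data.audio, as in the capture above, and that chunks arrive as strings.

```javascript
// Hypothetical helper (not part of mmx-cli): incrementally converts SSE text
// into decoded audio Buffers. Assumes one `data: {...}` line per event with
// hex-encoded audio at .data.audio, matching the captured output.
function createSseHexDecoder() {
  let pending = ""; // partial line carried over between chunks

  function decodeLine(line) {
    // Skip blank separators, comments, and any non-data SSE fields.
    if (!line.startsWith("data:")) return null;
    let event;
    try {
      event = JSON.parse(line.slice(5).trim());
    } catch {
      return null; // payload was not JSON; ignore it
    }
    const hex = event?.data?.audio;
    if (typeof hex !== "string" || hex.length === 0) return null;
    return Buffer.from(hex, "hex");
  }

  return {
    // Feed one text chunk; returns the Buffers for every complete line seen so far.
    push(chunk) {
      pending += chunk;
      const lines = pending.split(/\r?\n/);
      pending = lines.pop(); // last element is an incomplete line (or "")
      return lines.map(decodeLine).filter(Boolean);
    },
    // Call once at end of stream to handle a final line without a trailing newline.
    flush() {
      const rest = pending;
      pending = "";
      const buf = decodeLine(rest);
      return buf ? [buf] : [];
    },
  };
}
```

On the --stream path, each upstream chunk would then go through push() and every returned Buffer would be written with process.stdout.write(buf), followed by flush() when the response ends. Buffering the partial line matters because a network chunk can end in the middle of a JSON envelope.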

Related: unhandled EPIPE crash

While debugging this, I also hit an unhandled EPIPE when the downstream process exits before mmx finishes writing (e.g. mpv is not installed, or the output is piped into something like head):

node:events:486
      throw er; // Unhandled 'error' event
Error: write EPIPE
    at afterWriteDispatched (node:internal/stream_base_commons:159:15)
    ...
    at Object.run [as execute] (file:///opt/homebrew/lib/node_modules/mmx-cli/dist/mmx.mjs:145:3454)

A CLI that writes to stdout should treat EPIPE as a normal pipe-close, not an unhandled exception. Suggested fix at the entry point:

process.stdout.on('error', (e) => {
  if (e.code === 'EPIPE') process.exit(0);
  else throw e;
});

Happy to file that as a separate issue if preferred.

Workaround

Until --stream is fixed, this manually does what --stream is documented to do:

mmx speech synthesize --text "Stream me" --stream \
  | jq -rR 'select(startswith("data: ")) | .[6:] | fromjson | .data.audio // empty' \
  | xxd -r -p \
  | mpv --no-terminal -

Environment

  • mmx-cli: installed via npm i -g mmx-cli (latest)
  • Node.js: v25.8.2
  • OS: macOS 26.5 (Apple Silicon)
  • Model: speech-2.8-hd, region cn
