
Add examples/server: warm transcriber daemon#19

Merged
Frikallo merged 2 commits into Frikallo:main from silverstein:silverstein/server-example on Apr 21, 2026

@silverstein
Contributor

Summary

Thanks to @m13v for surfacing the warm-reuse discussion in #3.

This adds an opt-in examples/server program that keeps a loaded Parakeet model warm inside one process and serves newline-delimited JSON requests over a Unix domain socket.

The goal is to provide a supported persistent-process example for users who want warm model reuse without changing parakeet.cpp core code.

Addresses #3.

What this adds

  • ENABLE_SERVER_EXAMPLE=ON CMake flag
  • make build SERVER=ON convenience wiring
  • examples/server/main.cpp
  • examples/server/README.md
  • examples/README.md entry

The example:

  • loads one model instance at startup
  • listens on a Unix domain socket
  • accepts one JSON request per line
  • returns one JSON response per line
  • logs operational events to stderr
  • ignores SIGPIPE so dropped clients do not terminate the server
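
A minimal sketch of the loop described in the list above, not the code in examples/server/main.cpp. The names LineHandler, write_all, serve_client and run_server are hypothetical, the handler stands in for the warm model call, and serving each client until EOF is an assumption about connection lifetime:

```cpp
#include <csignal>
#include <cstring>
#include <functional>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical handler type: takes one request line, returns one response
// line (without the trailing newline). In the real example this is where
// the warm model would be invoked.
using LineHandler = std::function<std::string(const std::string&)>;

// Write the whole buffer, retrying on short writes. Returns false on error,
// for example EPIPE when the client has gone away.
bool write_all(int fd, const std::string& data) {
    size_t off = 0;
    while (off < data.size()) {
        ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n <= 0) return false;
        off += static_cast<size_t>(n);
    }
    return true;
}

// Serve one connected client: one response line per request line, until EOF.
void serve_client(int fd, const LineHandler& handle) {
    std::string pending;
    char buf[4096];
    for (;;) {
        ssize_t n = ::read(fd, buf, sizeof(buf));
        if (n <= 0) return;  // EOF or error: drop this client
        pending.append(buf, static_cast<size_t>(n));
        size_t nl;
        while ((nl = pending.find('\n')) != std::string::npos) {
            std::string line = pending.substr(0, nl);
            pending.erase(0, nl + 1);
            if (!write_all(fd, handle(line) + "\n")) return;
        }
    }
}

// Bind a Unix domain socket at `path` and serve clients one at a time.
// Only returns (with 1) if the socket cannot be set up.
int run_server(const char* path, const LineHandler& handle) {
    std::signal(SIGPIPE, SIG_IGN);  // a dropped client must not kill the process
    int srv = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0) return 1;
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    ::unlink(path);  // remove a stale socket file from a previous run
    if (::bind(srv, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(srv, 8) < 0) {
        ::close(srv);
        return 1;
    }
    for (;;) {
        int client = ::accept(srv, nullptr, nullptr);
        if (client < 0) continue;
        serve_client(client, handle);  // sequential: next accept waits
        ::close(client);
    }
}
```

With SIGPIPE ignored, a write to a disconnected client fails with an error instead of raising a signal, so write_all returns false and the loop moves on to the next accept.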

Protocol

Example request:

{"request_id":"demo","audio_path":"/path/to/audio.wav","decoder":"tdt","timestamps":true}

Example response:

{"ok":true,"request_id":"demo","text":"...","elapsed_ms":812,"word_timestamps":[...]}
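
For clients written in C++, one way to build a request line in the shape shown above. This is a sketch: json_escape and make_request_line are hypothetical helpers, not part of parakeet.cpp, and only the four fields from the example request are covered:

```cpp
#include <cstdio>
#include <string>

// Escape a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build one newline-terminated request with the fields from the example.
std::string make_request_line(const std::string& request_id,
                              const std::string& audio_path,
                              const std::string& decoder,
                              bool timestamps) {
    return "{\"request_id\":\"" + json_escape(request_id) +
           "\",\"audio_path\":\"" + json_escape(audio_path) +
           "\",\"decoder\":\"" + json_escape(decoder) +
           "\",\"timestamps\":" + (timestamps ? "true" : "false") + "}\n";
}
```

Escaping matters because the protocol is line-delimited: a raw newline inside a path would split one request into two lines.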

Local benchmark

I measured cold one-shot CLI runs against warm daemon requests on a 2-second sample (samples/mm1-short.wav), 5 runs each:

tdt-ctc-110m

  • CLI cold: 0.686, 0.693, 0.679, 0.689, 0.688

  • Mean/stddev: 0.687s ± 0.005s

  • Daemon warm: 0.564, 0.555, 0.548, 0.548, 0.552

  • Mean/stddev: 0.554s ± 0.007s

tdt-600m

  • CLI cold: 2.937, 2.913, 2.970, 2.935, 2.921

  • Mean/stddev: 2.935s ± 0.022s

  • Daemon warm: 2.240, 2.242, 2.237, 2.245, 2.239

  • Mean/stddev: 2.240s ± 0.003s

So this example does show a repeatable warm-state benefit on this machine, but I am framing it as an example pattern rather than a claim that it fully resolves the latency discussion in #3.
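
The ± figures above are consistent with the sample standard deviation (n - 1 denominator) of the listed runs. A small sketch for recomputing them, with hypothetical helper names; relative_saving is included because the listed means work out to roughly 19% (tdt-ctc-110m) and 24% (tdt-600m) less wall time per warm request:

```cpp
#include <cmath>
#include <vector>

// Arithmetic mean; expects a non-empty vector.
double mean(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Sample standard deviation (n - 1 denominator); needs at least two values.
double sample_stddev(const std::vector<double>& xs) {
    const double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / static_cast<double>(xs.size() - 1));
}

// Fraction of cold wall time saved by the warm path, e.g. 0.25 for 25%.
double relative_saving(double cold_mean, double warm_mean) {
    return (cold_mean - warm_mean) / cold_mean;
}
```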

Scope

This is intentionally example-grade:

  • one warm model per process
  • Unix domain sockets only
  • no auth or TLS
  • single-threaded: requests are handled sequentially; concurrent workloads need a wrapper
  • meant to be wrapped or adapted downstream

Verification

Built with:

make build
make build SERVER=ON

Verified locally by:

  • running the existing one-shot CLI on a real audio sample
  • starting example-server
  • sending repeated JSON requests over the Unix socket
  • confirming the same warm process returned transcript JSON responses

@Frikallo
Owner

Is this ready for review?

@silverstein
Contributor Author

Yes, ready for review. Nothing pending on my end. The diff has been stable since the initial push and I don't have further changes planned unless review surfaces something.

Happy to scope down or split if any part feels out of bounds for examples/.

Comment thread examples/CMakeLists.txt Outdated
Comment on lines +17 to +23
if(ENABLE_SERVER_EXAMPLE)
  if(UNIX)
    add_subdirectory(server)
  else()
    message(WARNING "ENABLE_SERVER_EXAMPLE is ON, but the server example currently requires Unix domain sockets")
  endif()
endif()
Owner

Every other option uses PARAKEET_BUILD_* (CLI, TESTS, EXAMPLES, BENCHMARKS). ENABLE_SERVER_EXAMPLE is the odd one out. Rename to PARAKEET_BUILD_SERVER_EXAMPLE and update the three call sites. It also makes the make build SERVER=ON -> -DPARAKEET_BUILD_SERVER_EXAMPLE=ON flow match the CLI=OFF pattern above it.

Contributor Author

Done. Renamed to PARAKEET_BUILD_SERVER_EXAMPLE and updated the three call sites (top-level CMakeLists.txt option + status line, examples/CMakeLists.txt guard + warning, Makefile SERVER= passthrough) plus the example README.

Comment thread examples/server/main.cpp
if (options.decoder == parakeet::Decoder::TDT_BEAM &&
    !options.lm_path.empty()) {
    lm.load(options.lm_path);
}
Owner

The LM is reloaded on every request in the warm TDT-600 path. Cache it by path (for example, a std::unordered_map<std::string, ArpaLM> on the transcriber).

Contributor Author

Done. Added std::unordered_map<std::string, ArpaLM> lm_cache and get_or_load_lm(path) on WarmTDT600Transcriber. The LM is now loaded exactly once per unique path per process lifetime, and the two inner decode sites take a const ArpaLM* from the cache instead of a stack-local that was rebuilt on every call. No locking since the server is single-threaded.

Verified the cache miss/hit behavior by running the daemon with --model tdt-600m against a synthetic 15k-bigram ARPA and watching stderr: the "loading LM from" line appears exactly once across multiple tdt-beam requests with the same lm_path.

Left the CTC-110m warm path (WarmTranscriber) alone since its LM handling lives inside parakeet::Transcriber::transcribe in the library. That's a separate library-level change rather than an example fix. Happy to open a follow-up if you'd like.
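
For readers without the diff open, the cache-by-path pattern discussed in this thread looks roughly like the sketch below. It is not the merged code: ArpaLM here is a stand-in struct, LmCache is a hypothetical wrapper (the PR puts the map directly on WarmTDT600Transcriber), and the load counter exists only to make the miss/hit behavior observable:

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the real LM type; only the load-by-path shape matters here.
struct ArpaLM {
    std::string source_path;
    void load(const std::string& path) { source_path = path; }
};

class LmCache {
public:
    // Returns the cached LM for `path`, loading it on first use.
    // Pointers to unordered_map values stay valid across rehashing, so the
    // returned pointer remains usable while the cache itself is alive.
    const ArpaLM* get_or_load_lm(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            ArpaLM lm;
            lm.load(path);  // cache miss: pay the load cost once
            ++loads_;
            it = cache_.emplace(path, std::move(lm)).first;
        }
        return &it->second;
    }

    // Number of actual loads performed (illustration only).
    int loads() const { return loads_; }

private:
    // No locking: matches the single-threaded server described above.
    std::unordered_map<std::string, ArpaLM> cache_;
    int loads_ = 0;
};
```

Two calls with the same path return the same pointer and bump loads() only once, which mirrors the single load line seen on stderr.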

@Frikallo
Owner

Thank you for your contribution! Will merge once pending reviews are addressed.

- rename ENABLE_SERVER_EXAMPLE to PARAKEET_BUILD_SERVER_EXAMPLE to match
  the PARAKEET_BUILD_* convention (CLI, TESTS, EXAMPLES, BENCHMARKS)
- cache ArpaLM by path on WarmTDT600Transcriber so a warm daemon does not
  reload the same LM on every tdt-beam request