Add examples/server: warm transcriber daemon by silverstein · Pull Request #19 · Frikallo/parakeet.cpp

silverstein · 2026-04-14T19:35:44Z

Summary

Thanks to @m13v for surfacing the warm-reuse discussion in #3.

This adds an opt-in examples/server program that keeps a loaded Parakeet model warm inside one process and serves newline-delimited JSON requests over a Unix domain socket.

The goal is to provide a supported persistent-process example for users who want warm model reuse without changing parakeet.cpp core code.

Addresses #3.

What this adds

ENABLE_SERVER_EXAMPLE=ON CMake flag
make build SERVER=ON convenience wiring
examples/server/main.cpp
examples/server/README.md
examples/README.md entry

The example:

loads one model instance at startup
listens on a Unix domain socket
accepts one JSON request per line
returns one JSON response per line
logs operational events to stderr
ignores SIGPIPE so dropped clients do not terminate the server
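
The behaviors listed above could be sketched roughly as follows. This is not the code in examples/server/main.cpp; the names `make_listener`, `read_line`, `write_all`, `serve_client`, `run_server`, and `Handler` are hypothetical, and the handler callback stands in for wherever the warm model would be invoked.

```cpp
#include <cerrno>
#include <csignal>
#include <cstring>
#include <functional>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Create a listening Unix domain socket at `path`. Returns the fd, or -1 on failure.
int make_listener(const std::string& path) {
    sockaddr_un addr{};
    if (path.size() >= sizeof(addr.sun_path)) return -1;  // path does not fit
    int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);
    ::unlink(path.c_str());  // remove a stale socket file left by a previous run
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, 8) < 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Read one line (without the trailing '\n'). Returns false on EOF or error.
// Byte-at-a-time reads keep the sketch short; a real server would buffer.
bool read_line(int fd, std::string& line) {
    line.clear();
    char c;
    while (true) {
        ssize_t n = ::read(fd, &c, 1);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        if (c == '\n') return true;
        line.push_back(c);
    }
}

// Write the whole buffer, retrying on short writes.
bool write_all(int fd, const std::string& data) {
    size_t off = 0;
    while (off < data.size()) {
        ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;  // e.g. EPIPE when the client has gone away
        }
        off += static_cast<size_t>(n);
    }
    return true;
}

// Maps one request line to one response line (assumed to contain no '\n').
using Handler = std::function<std::string(const std::string&)>;

// One request per line in, one response per line out, until the client closes.
void serve_client(int client_fd, const Handler& handle) {
    std::string line;
    while (read_line(client_fd, line)) {
        if (!write_all(client_fd, handle(line) + "\n")) break;
    }
}

// Sequential accept loop: one client at a time, matching the single-threaded scope.
void run_server(const std::string& path, const Handler& handle) {
    std::signal(SIGPIPE, SIG_IGN);  // a dropped client yields EPIPE, not process death
    int listener = make_listener(path);
    if (listener < 0) return;
    while (true) {
        int client = ::accept(listener, nullptr, nullptr);
        if (client < 0) {
            if (errno == EINTR) continue;
            break;
        }
        serve_client(client, handle);
        ::close(client);
    }
    ::close(listener);
}
```

With SIGPIPE ignored, a write to a disconnected client fails with an error that `write_all` reports, so the loop can move on to the next connection instead of the process being killed.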

Protocol

Example request:

{"request_id":"demo","audio_path":"/path/to/audio.wav","decoder":"tdt","timestamps":true}

Example response:

{"ok":true,"request_id":"demo","text":"...","elapsed_ms":812,"word_timestamps":[...]}
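
As a hedged illustration of the response shape, the sketch below assembles a success line by hand. The field names `ok`, `request_id`, `text`, and `elapsed_ms` come from the example above; `json_escape` and `make_ok_response` are hypothetical helpers, not functions from the PR, and `word_timestamps` is left out because its element shape is elided in the example.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can sit inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build a single-line success response (no trailing newline).
std::string make_ok_response(const std::string& request_id,
                             const std::string& text,
                             long elapsed_ms) {
    return "{\"ok\":true,\"request_id\":\"" + json_escape(request_id) +
           "\",\"text\":\"" + json_escape(text) +
           "\",\"elapsed_ms\":" + std::to_string(elapsed_ms) + "}";
}
```

Escaping matters for framing as well as validity: because the protocol is one JSON document per line, a raw newline inside a transcript would otherwise split one response into two lines.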

Local benchmark

I measured cold one-shot CLI runs against warm daemon requests on a 2-second sample (samples/mm1-short.wav), 5 runs each:

tdt-ctc-110m

CLI cold: 0.686, 0.693, 0.679, 0.689, 0.688
Mean/stddev: 0.687s ± 0.005s
Daemon warm: 0.564, 0.555, 0.548, 0.548, 0.552
Mean/stddev: 0.554s ± 0.007s

tdt-600m

CLI cold: 2.937, 2.913, 2.970, 2.935, 2.921
Mean/stddev: 2.935s ± 0.022s
Daemon warm: 2.240, 2.242, 2.237, 2.245, 2.239
Mean/stddev: 2.240s ± 0.003s

So this example does show a repeatable warm-state benefit on this machine, but I am framing it as an example pattern rather than a claim that it fully resolves the latency discussion in #3.
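
For anyone re-checking the summary lines, here is a small sketch that recomputes a mean and standard deviation from a list of per-run timings. The PR does not say which estimator was used; the sample (n-1) form is assumed here because it appears to match the reported spreads. `Stats` and `summarize` are hypothetical names. Since the per-run values above are already rounded to three decimals, a recomputed mean can differ from the reported one in the last digit.

```cpp
#include <cmath>
#include <vector>

struct Stats {
    double mean;
    double stddev;  // sample standard deviation (divides by n - 1)
};

// Requires at least two samples.
Stats summarize(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / static_cast<double>(xs.size());

    double squared_dev = 0.0;
    for (double x : xs) squared_dev += (x - mean) * (x - mean);

    return {mean, std::sqrt(squared_dev / static_cast<double>(xs.size() - 1))};
}
```

Calling `summarize` on the five tdt-ctc-110m CLI timings and on the five daemon timings gives the two means whose ratio is the warm-versus-cold comparison discussed above.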

Scope

This is intentionally example-grade:

one warm model per process
Unix domain sockets only
no auth or TLS
single-threaded: requests are handled sequentially; concurrent workloads need a wrapper
meant to be wrapped or adapted downstream

Verification

Built with:

make build
make build SERVER=ON

Verified locally by:

running the existing one-shot CLI on a real audio sample
starting example-server
sending repeated JSON requests over the Unix socket
confirming the same warm process returned transcript JSON responses

Frikallo · 2026-04-21T00:22:40Z

Is this ready for review?

silverstein · 2026-04-21T15:07:59Z

Yes, ready for review. Nothing pending on my end. The diff has been stable since the initial push and I don't have further changes planned unless review surfaces something.

Happy to scope down or split if any part feels out of bounds for examples/.

Frikallo · 2026-04-21T15:47:46Z

+if(ENABLE_SERVER_EXAMPLE)
+    if(UNIX)
+        add_subdirectory(server)
+    else()
+        message(WARNING "ENABLE_SERVER_EXAMPLE is ON, but the server example currently requires Unix domain sockets")
+    endif()
+endif()


Every other option uses PARAKEET_BUILD_* (CLI, TESTS, EXAMPLES, BENCHMARKS). ENABLE_SERVER_EXAMPLE is the odd one out. Rename to PARAKEET_BUILD_SERVER_EXAMPLE and update the three call sites. It also makes the make build SERVER=ON -> -DPARAKEET_BUILD_SERVER_EXAMPLE=ON flow match the CLI=OFF pattern above it.

Done. Renamed to PARAKEET_BUILD_SERVER_EXAMPLE and updated the three call sites (top-level CMakeLists.txt option + status line, examples/CMakeLists.txt guard + warning, Makefile SERVER= passthrough) plus the example README.

Frikallo · 2026-04-21T15:52:13Z

+        if (options.decoder == parakeet::Decoder::TDT_BEAM &&
+            !options.lm_path.empty()) {
+            lm.load(options.lm_path);
+        }


The LM is reloaded on every request in the warm TDT-600 path. Cache it by path (for example, a std::unordered_map<std::string, ArpaLM> on the transcriber).

Done. Added std::unordered_map<std::string, ArpaLM> lm_cache and get_or_load_lm(path) on WarmTDT600Transcriber. The LM is now loaded exactly once per unique path per process lifetime, and the two inner decode sites take a const ArpaLM* from the cache instead of a stack-local that was rebuilt on every call. No locking since the server is single-threaded.

Verified the cache miss/hit behavior by running the daemon with --model tdt-600m against a synthetic 15k-bigram ARPA and watching stderr: the "loading LM from" line appears exactly once across multiple tdt-beam requests with the same lm_path.

Left the CTC-110m warm path (WarmTranscriber) alone since its LM handling lives inside parakeet::Transcriber::transcribe in the library. That's a separate library-level change rather than an example fix; happy to open a follow-up if you'd like.
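
The cache-by-path pattern discussed in this thread might look roughly like the sketch below. It is not the PR's code: `StubLM` is a hypothetical stand-in for the library's ArpaLM (whose real interface is not shown here), and `LmCache`, `get_or_load`, and `loads` are illustrative names.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the real language model type; load() here only
// records the path, whereas a real model would parse the ARPA file.
struct StubLM {
    std::string source_path;
    void load(const std::string& path) { source_path = path; }
};

class LmCache {
public:
    // Return the cached LM for `path`, loading it on first use only.
    const StubLM* get_or_load(const std::string& path) {
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            StubLM lm;
            lm.load(path);  // the expensive step, paid once per unique path
            ++loads_;
            it = cache_.emplace(path, std::move(lm)).first;
        }
        return &it->second;
    }

    // Number of actual loads so far, useful for checking hit/miss behavior.
    int loads() const { return loads_; }

private:
    // No mutex: this sketch assumes a single-threaded caller.
    std::unordered_map<std::string, StubLM> cache_;
    int loads_ = 0;
};
```

Handing out `const StubLM*` is safe across later insertions because `std::unordered_map` keeps pointers and references to existing elements valid when it rehashes; only iterators are invalidated.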

Frikallo · 2026-04-21T15:58:25Z

Thank you for your contribution! Will merge once pending reviews are addressed.

- rename ENABLE_SERVER_EXAMPLE to PARAKEET_BUILD_SERVER_EXAMPLE to match the PARAKEET_BUILD_* convention (CLI, TESTS, EXAMPLES, BENCHMARKS)
- cache ArpaLM by path on WarmTDT600Transcriber so a warm daemon does not reload the same LM on every tdt-beam request

examples: add warm transcriber server example

f0d5838

Frikallo requested changes Apr 21, 2026

Frikallo approved these changes Apr 21, 2026

Frikallo merged commit 095634c into Frikallo:main Apr 21, 2026

silverstein deleted the silverstein/server-example branch April 21, 2026 17:09

Frikallo mentioned this pull request Apr 25, 2026

Cache ARPA LM #22

Merged
