Add examples/server: warm transcriber daemon #19
Is this ready for review?
Yes, ready for review. Nothing pending on my end. The diff has been stable since the initial push and I don't have further changes planned unless review surfaces something. Happy to scope down or split if any part feels out of bounds for
if(ENABLE_SERVER_EXAMPLE)
  if(UNIX)
    add_subdirectory(server)
  else()
    message(WARNING "ENABLE_SERVER_EXAMPLE is ON, but the server example currently requires Unix domain sockets")
  endif()
endif()
Every other option uses PARAKEET_BUILD_* (CLI, TESTS, EXAMPLES, BENCHMARKS). ENABLE_SERVER_EXAMPLE is the odd one out. Rename to PARAKEET_BUILD_SERVER_EXAMPLE and update the three call sites. It also makes the make build SERVER=ON -> -DPARAKEET_BUILD_SERVER_EXAMPLE=ON flow match the CLI=OFF pattern above it.
Done. Renamed to PARAKEET_BUILD_SERVER_EXAMPLE and updated the three call sites (top-level CMakeLists.txt option + status line, examples/CMakeLists.txt guard + warning, Makefile SERVER= passthrough) plus the example README.
if (options.decoder == parakeet::Decoder::TDT_BEAM &&
    !options.lm_path.empty()) {
    lm.load(options.lm_path);
}
The LM is reloaded on every request in the warm TDT-600 path. Cache it by path (for example, a std::unordered_map<std::string, ArpaLM> on the transcriber).
Done. Added std::unordered_map<std::string, ArpaLM> lm_cache and get_or_load_lm(path) on WarmTDT600Transcriber. The LM is now loaded exactly once per unique path per process lifetime, and the two inner decode sites take a const ArpaLM* from the cache instead of a stack-local that was rebuilt on every call. No locking since the server is single-threaded.
Verified the cache miss/hit behavior by running the daemon with --model tdt-600m against a synthetic 15k-bigram ARPA and watching stderr: the "loading LM from" line appears exactly once across multiple tdt-beam requests with the same lm_path.
Left the CTC-110m warm path (WarmTranscriber) alone, since its LM handling lives inside parakeet::Transcriber::transcribe in the library. That's a separate library-level change rather than an example fix; happy to open a follow-up if you'd like.
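The path-keyed cache described in this reply can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the ArpaLM stub, the LmCache wrapper class, and the load counter are hypothetical stand-ins, and only the get_or_load_lm name and the std::unordered_map<std::string, ArpaLM> shape come from the discussion. Like the server described above, it assumes single-threaded use and does no locking.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's ArpaLM. The real type parses an
// ARPA file; this stub only records that load() was called.
struct ArpaLM {
    int loads = 0;
    void load(const std::string& /*path*/) { ++loads; }
};

// Hypothetical holder for the cache; in the PR the map lives on
// WarmTDT600Transcriber instead of a separate class.
class LmCache {
public:
    // Return the LM for `path`, loading it only on the first request.
    // std::unordered_map keeps element addresses stable across rehashing,
    // so the returned pointer stays valid while the cache is alive.
    const ArpaLM* get_or_load_lm(const std::string& path) {
        assert(!path.empty());
        auto it = cache_.find(path);
        if (it == cache_.end()) {
            it = cache_.emplace(path, ArpaLM{}).first;
            it->second.load(path);  // cache miss: load exactly once
            ++total_loads_;
        }
        return &it->second;         // cache hit: reuse the loaded LM
    }

    int total_loads() const { return total_loads_; }

private:
    std::unordered_map<std::string, ArpaLM> cache_;
    int total_loads_ = 0;  // illustration only, to observe hits vs misses
};
```

Repeated calls with the same path return the same pointer and leave total_loads() unchanged, which mirrors the "appears exactly once" check on stderr.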
Thank you for your contribution! Will merge once pending reviews are addressed
- rename ENABLE_SERVER_EXAMPLE to PARAKEET_BUILD_SERVER_EXAMPLE to match the PARAKEET_BUILD_* convention (CLI, TESTS, EXAMPLES, BENCHMARKS)
- cache ArpaLM by path on WarmTDT600Transcriber so a warm daemon does not reload the same LM on every tdt-beam request
Summary
Thanks to @m13v for surfacing the warm-reuse discussion in #3.
This adds an opt-in examples/server program that keeps a loaded Parakeet model warm inside one process and serves newline-delimited JSON requests over a Unix domain socket.
The goal is to provide a supported persistent-process example for users who want warm model reuse without changing parakeet.cpp core code.
Addresses #3.
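For readers unfamiliar with the transport, the client side of "newline-delimited JSON over a Unix domain socket" could look something like the sketch below. It is not taken from examples/server: the helper names connect_unix, send_line, and recv_line are hypothetical, and the socket path is whatever the daemon was started with. Only standard POSIX calls are used.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

// Connect to a Unix domain stream socket at `path`; returns the fd or -1.
int connect_unix(const std::string& path) {
    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    if (path.size() >= sizeof(addr.sun_path)) return -1;  // path too long
    std::memcpy(addr.sun_path, path.c_str(), path.size() + 1);
    int fd = ::socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (::connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Send one JSON document followed by '\n'. The document must not contain a
// raw newline, because the newline is the message delimiter.
bool send_line(int fd, const std::string& json) {
    assert(json.find('\n') == std::string::npos);
    const std::string framed = json + "\n";
    std::size_t sent = 0;
    while (sent < framed.size()) {
        ssize_t n = ::write(fd, framed.data() + sent, framed.size() - sent);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

// Read up to (not including) the next '\n'. Reads one byte at a time for
// clarity; a real client would buffer. Returns false on error or on EOF
// before a newline arrives.
bool recv_line(int fd, std::string& out) {
    out.clear();
    for (;;) {
        char c = 0;
        ssize_t n = ::read(fd, &c, 1);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) return false;
        if (c == '\n') return true;
        out.push_back(c);
    }
}
```

A client would call connect_unix with the daemon's socket path, pass a request such as the one shown under Protocol to send_line, and then call recv_line to get the single-line JSON response.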
What this adds
- ENABLE_SERVER_EXAMPLE=ON CMake flag
- make build SERVER=ON convenience wiring
- examples/server/main.cpp
- examples/server/README.md
- examples/README.md entry
The example:
- SIGPIPE so dropped clients do not terminate the server
Protocol
Example request:
{"request_id":"demo","audio_path":"/path/to/audio.wav","decoder":"tdt","timestamps":true}
Example response:
{"ok":true,"request_id":"demo","text":"...","elapsed_ms":812,"word_timestamps":[...]}
Local benchmark
I measured cold one-shot CLI runs against warm daemon requests on a 2-second sample (samples/mm1-short.wav), 5 runs each:
tdt-ctc-110m
CLI cold: 0.686, 0.693, 0.679, 0.689, 0.688
Mean/stddev: 0.687s ± 0.005s
Daemon warm: 0.564, 0.555, 0.548, 0.548, 0.552
Mean/stddev: 0.554s ± 0.007s
tdt-600m
CLI cold: 2.937, 2.913, 2.970, 2.935, 2.921
Mean/stddev: 2.935s ± 0.022s
Daemon warm: 2.240, 2.242, 2.237, 2.245, 2.239
Mean/stddev: 2.240s ± 0.003s
So this example does show a repeatable warm-state benefit on this machine, but I am framing it as an example pattern rather than a claim that it fully resolves the latency discussion in #3.
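For anyone who wants to re-derive the summary rows from the raw timings, here is a small sketch. It assumes the sample standard deviation (n - 1 denominator); the description does not say which estimator was used, and because the listed timings are already rounded, the last digit can differ slightly from the reported figures. The function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the run times, in seconds.
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / static_cast<double>(xs.size());
}

// Sample standard deviation (n - 1 denominator); an assumption, see above.
double sample_stddev(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    const double m = mean(xs);
    double ss = 0.0;
    for (double x : xs) ss += (x - m) * (x - m);
    return std::sqrt(ss / static_cast<double>(xs.size() - 1));
}

// Fraction of the cold time that the warm path saves.
double relative_saving(double cold_mean, double warm_mean) {
    return (cold_mean - warm_mean) / cold_mean;
}
```

Applied to the timings above, relative_saving comes out at roughly 19% for tdt-ctc-110m and roughly 24% for tdt-600m on this one machine and sample.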
Scope
This is intentionally example-grade:
Verification
Built with:
Verified locally by:
example-server