v0.9.34 — LLM intent classifier module + empirical bench

@davo20019 released this 11 May 02:54 · 167 commits to master since this release

Added

  • LLM intent classifier module (agent::llm_classifier): fail-open classification on a fast model, with a 5-second timeout and a 20-token output cap so per-call cost stays trivial. 9 unit tests.
  • Empirical bench (intent_classifier_bench_run_corpus, #[ignore]d): runs the classifier against a 27-case hand-curated corpus and reports agreement with the heuristic baseline, plus latency.
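
The fail-open behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in agent::llm_classifier: the `Intent` variants, the label strings, and the `classify_fail_open` and `parse_intent` names are all hypothetical, and the model call is passed in as a closure so that no provider API is assumed. The 20-token output cap would be a parameter on the model request itself and is not modelled here.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical intent labels; the real module's variants are not listed in these notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Intent {
    ShareFact,
    Recall,
    Action,
    Other,
}

/// Parse a short model reply into an intent; unknown replies yield `None`.
pub fn parse_intent(reply: &str) -> Option<Intent> {
    match reply.trim().to_ascii_lowercase().as_str() {
        "share_fact" => Some(Intent::ShareFact),
        "recall" => Some(Intent::Recall),
        "action" => Some(Intent::Action),
        "other" => Some(Intent::Other),
        _ => None,
    }
}

/// Run `call_model` on a worker thread and wait at most `timeout` for it.
/// Fail-open: on timeout, call error, or an unparseable reply, return `fallback`
/// so the caller keeps whatever it would have used without the LLM.
pub fn classify_fail_open<F>(call_model: F, timeout: Duration, fallback: Intent) -> Intent
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the resulting send error.
        let _ = tx.send(call_model());
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(reply)) => parse_intent(&reply).unwrap_or(fallback),
        _ => fallback,
    }
}
```

With a 5-second budget, a call such as `classify_fail_open(call, Duration::from_secs(5), heuristic_intent)` returns the heuristic result whenever the model is slow, unreachable, or replies with something outside the label set.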

Initial bench results

Tested on `google/gemini-2.5-flash` via OpenRouter:

| Metric | Result |
| --- | --- |
| Agreement with heuristic | 24/27 (88.9%) |
| LLM failures | 0 |
| Avg latency | 462 ms |

The three disagreements were a mix: legitimate LLM wins on implicit fact-sharing that the regex heuristic misses, and one heuristic win on a compound recall+action task.
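
For readers who want to reproduce the arithmetic behind these figures, here is a minimal sketch of how per-case results could be reduced to the three reported metrics. The `CaseResult` and `Summary` types and the `summarize` and `agreement_pct` functions are hypothetical and are not taken from the bench itself. The sketch also assumes that failed LLM calls still count toward average latency, which the notes do not specify.

```rust
use std::time::Duration;

/// One bench observation (hypothetical shape; labels are plain strings here).
pub struct CaseResult {
    pub heuristic: &'static str,
    /// `None` means the LLM call failed or returned no usable label.
    pub llm: Option<&'static str>,
    pub latency: Duration,
}

/// Aggregate figures matching the rows of the results table.
pub struct Summary {
    pub total: usize,
    pub agreements: usize,
    pub llm_failures: usize,
    pub avg_latency_ms: f64,
}

/// Reduce per-case results to totals; an empty corpus yields zeros.
pub fn summarize(results: &[CaseResult]) -> Summary {
    let total = results.len();
    let agreements = results
        .iter()
        .filter(|r| r.llm == Some(r.heuristic))
        .count();
    let llm_failures = results.iter().filter(|r| r.llm.is_none()).count();
    let total_ms: f64 = results
        .iter()
        .map(|r| r.latency.as_secs_f64() * 1000.0)
        .sum();
    let avg_latency_ms = if total == 0 { 0.0 } else { total_ms / total as f64 };
    Summary { total, agreements, llm_failures, avg_latency_ms }
}

/// Agreement as a percentage of all cases.
pub fn agreement_pct(s: &Summary) -> f64 {
    if s.total == 0 {
        0.0
    } else {
        100.0 * s.agreements as f64 / s.total as f64
    }
}
```

Applied to the reported counts, 24 agreements out of 27 cases gives 88.9% after rounding. Because the bench is marked `#[ignore]`, a plain `cargo test` skips it; ignored tests run only when `-- --ignored` is passed to the test binary.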

Conclusion

The classifier is a useful shadow signal, but at roughly 462 ms average latency it is too slow for synchronous primary use. A follow-up release will wire it in as a fire-and-forget shadow mode (off by default).
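
One way to picture a fire-and-forget shadow mode is shown below. Since that wiring is not part of this release, this is purely an illustrative sketch: `ShadowRecord`, `classify_with_shadow`, and the `shadow_enabled` flag are hypothetical names, and a plain thread plus channel stands in for whatever runtime the project actually uses.

```rust
use std::sync::mpsc::Sender;
use std::thread;

/// Hypothetical record emitted once the shadow classifier has finished.
#[derive(Debug, PartialEq, Eq)]
pub struct ShadowRecord {
    pub heuristic: String,
    pub llm: Option<String>,
    pub agreed: bool,
}

/// Return the heuristic label immediately. When `shadow_enabled` is true,
/// run `llm_classify` on a detached thread and send the comparison to `sink`;
/// the caller never waits for it, and the result never changes the return value.
pub fn classify_with_shadow<F>(
    heuristic_label: &str,
    shadow_enabled: bool,
    llm_classify: F,
    sink: Sender<ShadowRecord>,
) -> String
where
    F: FnOnce() -> Option<String> + Send + 'static,
{
    if shadow_enabled {
        let heuristic = heuristic_label.to_string();
        thread::spawn(move || {
            let llm = llm_classify();
            let agreed = llm.as_deref() == Some(heuristic.as_str());
            // Fire-and-forget: if nobody is listening, the record is dropped.
            let _ = sink.send(ShadowRecord { heuristic, llm, agreed });
        });
    }
    heuristic_label.to_string()
}
```

The point of the design is that the heuristic stays on the hot path, so the LLM's latency only affects how soon a disagreement is logged, not how fast the user gets a response.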

No behavior change in this release — module and bench only. All 2188 library tests pass.

Full Changelog: v0.9.33...v0.9.34