test(llm): cover llama.cpp backend integration paths #1248
kovtcharov-amd wants to merge 4 commits into
Solid regression-prevention suite that earns its place in CI: 46 tests with no running server required, directly encoding the #1030 contract in executable form. Three nits worth tightening before merge.

Summary

The tests are correctly implemented, well-organized by concern, and mock at the right level.

Issues Found

🟢 Minor —
46 new mocked unit tests covering the llama.cpp-specific paths that caused the #1030 production regression: a solid, focused addition. All 46 tests are structurally correct against the current source; the public behavior contracts (payload construction, ctx-size resolution, retryable logic) are pinned accurately. Two minor issues worth fixing before merge, one nit.

Issues

🟡
Solid follow-up to the #1030 regression: 46 targeted unit tests now pin the exact integration seams (payload construction, ctx-size resolution, error classification) that can silently break doc-Q&A in production. The docstring update in

Summary

The test suite is well-structured and the mocking strategy is consistent throughout:

One minor docstring inaccuracy; no blocking issues.

Issues Found

🟢 Minor — misleading docstring in
itomek
left a comment
High-value regression suite — the mocked tests pin exactly the seams that broke in #1030 (load_model payload, _ensure_model_loaded ctx resolution, undersized-ctx reload, retryable-overflow classification) with no Lemonade Server needed. Verified the autouse list-models stub (real isolation, no implicit localhost calls), the ctx_size=4096 assertion on the missing-recipe-options test, and background=silent on the launch_server tests. Approving. Tiny optional nit: test_context_overflow_by_message_string's docstring says "even without type field" but the payload sets type=unknown_error — it actually tests message-substring fallback when the type doesn't match.
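To make the docstring nit concrete, here is a minimal sketch of the two-step classification it describes: match on the error type first, then fall back to a message substring when the type does not match. The function name, the type string, and the marker text are all hypothetical stand-ins, not values taken from LemonadeClient.

```python
# Hypothetical values for illustration; the real client may use different ones.
OVERFLOW_TYPE = "exceed_context_size_error"
OVERFLOW_MESSAGE_MARKERS = ("exceeds the available context size",)


def is_context_overflow(error: dict) -> bool:
    """Return True if an error dict looks like a context overflow."""
    # Step 1: an exact match on the type field is the strongest signal.
    if error.get("type") == OVERFLOW_TYPE:
        return True
    # Step 2: the type is present but does not match (e.g. "unknown_error"),
    # so fall back to a case-insensitive substring search on the message.
    message = str(error.get("message", "")).lower()
    return any(marker in message for marker in OVERFLOW_MESSAGE_MARKERS)
```

Under these assumptions, a payload with `type` set to `unknown_error` still classifies as an overflow as long as the message carries the marker, which is the case the test exercises; a docstring along the lines of "falls back to the message when the type does not match" would describe it more accurately.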
Generated by Claude Code
LemonadeClient's llama.cpp-specific code paths (llamacpp_args, ctx_size parameter construction, model loading, context-size validation, error classification) had no dedicated test coverage. A regression in any of these paths could silently break model loading or produce confusing errors; the #1030 ctx-truncation incident showed what that looks like.

46 new unit tests covering:
- load_model HTTP payload construction (llamacpp_args, ctx_size, save_options)
- _ensure_model_loaded ctx_size resolution from MODELS registry vs fallback
- Under-sized ctx reload detection (the #1030 follow-up regression path)
- launch_server --ctx-size CLI flag forwarding
- OpenAI-standard vs llama.cpp-native parameter separation (extra_body)
- Context-overflow error classification with dynamic retryable logic
- Nested llama-server error envelopes (details.response.error)
- model_not_loaded and corrupt-download detection
- MODELS registry structural invariants
- LemonadeProvider repetition-penalty defaults and error raising

All tests are fully mocked, with no running Lemonade Server required.

Closes #1153
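As an illustration of the nested-envelope item above, the sketch below unwraps an error found at `details.response.error` and falls back to the outer error when that path is absent. Only the `details.response.error` path comes from the commit message; the helper name and the assumption that the path hangs off the outer error object are hypothetical.

```python
def innermost_error(error: dict) -> dict:
    """Return the nested llama-server error if present, else the outer error.

    Assumes (hypothetically) that the nested error sits at
    error["details"]["response"]["error"].
    """
    node = error
    for key in ("details", "response", "error"):
        # Any missing or non-dict hop means there is no nested envelope.
        if not isinstance(node, dict) or key not in node:
            return error
        node = node[key]
    return node if isinstance(node, dict) else error


outer = {
    "type": "server_error",
    "details": {"response": {"error": {"type": "inner_type", "message": "inner"}}},
}
flat = {"type": "model_not_loaded", "message": "no model"}
```

With these inputs, `innermost_error(outer)` yields the inner dict and `innermost_error(flat)` returns `flat` unchanged, so a classifier can run on whichever error is most specific.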
- Move LemonadeProvider to module-level import (was repeated in 5 methods)
- Add autouse fixture to stub list_models() in TestEnsureModelLoadedCtxResolution so tests don't make implicit network calls to localhost:13305
- Strengthen test_missing_recipe_options_triggers_reload to assert exact ctx_size=4096 argument (was bare assert_called_once)
- Remove redundant patch.object(LemonadeClient, "__init__") from 5 provider tests; __new__ already bypasses __init__
- Switch TestLaunchServerCtxSize to background="silent" to avoid daemon-thread busy-loop noise in pytest output
- Fix stale docstring in LemonadeContextOverflowError: "n_ctx < 32K" → 65536 to match the actual guard in _classify_lemonade_response
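Two of the points above rest on general Python and `unittest.mock` behavior, which the following sketch demonstrates with a stand-in class. `Client`, `ensure_loaded`, and `"demo-model"` are hypothetical names used only for illustration; they are not the project's classes or test data.

```python
from unittest.mock import Mock


class Client:
    """Stand-in for a client whose constructor would touch the network."""

    def __init__(self):
        raise RuntimeError("would contact the server")

    def ensure_loaded(self):
        # Hypothetical method that reloads the model with a fixed ctx_size.
        self.load_model("demo-model", ctx_size=4096)


# __new__ allocates the instance without running __init__, so patching
# __init__ on top of this would be redundant.
client = Client.__new__(Client)
client.load_model = Mock()
client.ensure_loaded()

# The bare check passes for any arguments; the strict one pins them.
client.load_model.assert_called_once()
client.load_model.assert_called_once_with("demo-model", ctx_size=4096)

# A wrong ctx_size slips past assert_called_once() but not the strict form.
try:
    client.load_model.assert_called_once_with("demo-model", ctx_size=2048)
    strict_check_caught_mismatch = False
except AssertionError:
    strict_check_caught_mismatch = True
```

The last block shows why the stronger assertion matters: a regression that changes the forwarded `ctx_size` would still satisfy `assert_called_once()`.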
…ernance Cherry-picked from test/eval-toolchain-1151 to unblock CI.
1bcc882 to 14dc920
The llama.cpp-specific code paths in LemonadeClient (llamacpp_args construction, ctx_size validation, model-load HTTP payloads, error classification) had zero dedicated test coverage. The #1030 ctx-truncation incident showed what happens when these paths regress silently — unit tests pass but doc-Q&A breaks in production. 46 new mocked unit tests now pin the critical integration seams so a future change that breaks load_model payloads, ctx-size resolution, or the retryable-overflow logic will fail fast in CI.
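The ctx-size seam mentioned above can be pictured as a small decision: reload when the loaded context is unknown or smaller than what the model needs. This is a toy sketch under that assumption; `needs_reload` and its parameters are hypothetical and do not reproduce the logic in `_ensure_model_loaded`.

```python
from typing import Optional


def needs_reload(loaded_ctx: Optional[int], required_ctx: int) -> bool:
    """Decide whether a model should be reloaded with a larger context.

    loaded_ctx is None when the server reports no ctx_size for the loaded
    model (for example, when recipe options are missing).
    """
    if loaded_ctx is None:
        # Unknown context size: reload rather than risk silent truncation.
        return True
    # An under-sized context would truncate prompts, so reload in that case.
    return loaded_ctx < required_ctx
```

A mocked test would then assert both the decision and the exact `ctx_size` passed to the reload call, so that either half regressing fails in CI.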
Test plan
- python -m pytest tests/unit/test_llamacpp_backend.py -xvs: 46 tests pass, no Lemonade Server required
- python -m pytest tests/unit/ -x

Closes #1153