Summary
LemonadeClient._is_corrupt_download_error treats the generic error string "llama-server failed to start" as evidence of a corrupt or incomplete model download. Lemonade raises that string for many non-corruption failures (resource limits, ctx_size issues, GPU/backend startup problems, port conflicts). Misclassifying them routes ordinary load failures into the delete-and-redownload repair path, which wastes a full multi-GB re-download (the default model is ~25 GB) and, combined with the interactive-prompt defect (sibling issue #1293), dead-ends first boot.
Impact
Root cause analysis
src/gaia/llm/lemonade_client.py:1238-1248:
    return any(
        phrase in error_message
        for phrase in [
            "download validation failed",
            "files are incomplete",
            "files are missing",
            "incomplete or missing",
            "corrupted download",
            "llama-server failed to start",  # Often indicates corrupt model files
        ]
    )
The first five phrases are specific corruption signals. "llama-server failed to start" is a generic startup failure — the comment ("Often indicates corrupt model files") concedes it is a heuristic, not a reliable signal. The user's case is the counter-example: a force delete+redownload at ctx_size=32768 later succeeded, where the boot preload at ctx_size=0 failed with this string — pointing at a load/ctx/resource cause, not file corruption.
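To make the misclassification concrete, the snippet below copies the phrase list from the excerpt into a standalone function and runs it against two messages. The function name current_check and both sample messages are assumptions invented for illustration, not real Lemonade output; only the phrase list comes from the source.

```python
# Phrase list copied from the excerpt above; everything else is illustrative.
CORRUPTION_PHRASES = [
    "download validation failed",
    "files are incomplete",
    "files are missing",
    "incomplete or missing",
    "corrupted download",
    "llama-server failed to start",  # the generic entry under discussion
]


def current_check(error_message: str) -> bool:
    """Standalone copy of the substring check (hypothetical name)."""
    return any(phrase in error_message for phrase in CORRUPTION_PHRASES)


# Hypothetical messages, made up for this example.
ctx_failure = "llama-server failed to start (ctx_size too large for available memory)"
real_corruption = "download validation failed: files are incomplete"

print(current_check(ctx_failure))      # True, although nothing points at the files
print(current_check(real_corruption))  # True, as intended
```

Both messages land in the same bucket, so a caller that repairs on True cannot tell a resource problem from a damaged download.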
Proposed direction
- Remove "llama-server failed to start" from the corruption phrase list, or treat it as corruption only when corroborated by a specific signal (e.g. a follow-up file-integrity check, a missing/short shard, or an explicit Lemonade corruption code).
- Treat a bare "llama-server failed to start" as a load failure that surfaces an actionable error (per the repo's "fail loudly" rule) instead of silently re-downloading.
- Prefer a structured error (code/type) over substring matching where available.
Acceptance criteria
- _is_corrupt_download_error("...llama-server failed to start...") returns False unless a specific corruption signal is also present.
- The specific corruption phrases still return True (no regression).
- A bare llama-server failed to start load failure raises an actionable LemonadeClientError (what failed / what to do / where to look) and does not enter the delete+redownload path.
Test plan (TDD)
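A minimal sketch of how the unit cases in this plan might start, assuming the generic string is simply dropped from the phrase list. The names is_corrupt_download_error, handle_load_error, and LoadFailure are hypothetical stand-ins (the real code is a LemonadeClient method that raises LemonadeClientError), and the advice text in the error message is illustrative only.

```python
from unittest.mock import Mock

# Hypothetical sketch: these names are assumptions, not the project's code.
SPECIFIC_CORRUPTION_PHRASES = (
    "download validation failed",
    "files are incomplete",
    "files are missing",
    "incomplete or missing",
    "corrupted download",
)
GENERIC_STARTUP_FAILURE = "llama-server failed to start"


class LoadFailure(Exception):
    """Stand-in for the project's LemonadeClientError."""


def is_corrupt_download_error(error_message: str) -> bool:
    """True only when a specific corruption phrase is present."""
    return any(p in error_message for p in SPECIFIC_CORRUPTION_PHRASES)


def handle_load_error(error_message: str, repair) -> None:
    """Run `repair` for corruption; otherwise raise an actionable error."""
    if is_corrupt_download_error(error_message):
        repair()
        return
    raise LoadFailure(
        f"Model load failed: {error_message}. "
        "Check ctx_size, free memory, and the server log."  # illustrative advice
    )


def test_specific_phrases_are_corruption():
    for phrase in SPECIFIC_CORRUPTION_PHRASES:
        assert is_corrupt_download_error(f"load failed: {phrase}")


def test_bare_startup_failure_is_not_corruption():
    assert not is_corrupt_download_error(GENERIC_STARTUP_FAILURE)


def test_startup_failure_plus_corruption_phrase():
    assert is_corrupt_download_error(f"{GENERIC_STARTUP_FAILURE}: files are missing")


def test_bare_startup_failure_skips_repair():
    repair = Mock()
    try:
        handle_load_error(GENERIC_STARTUP_FAILURE, repair)
    except LoadFailure as exc:
        assert GENERIC_STARTUP_FAILURE in str(exc)
    else:
        raise AssertionError("expected LoadFailure")
    repair.assert_not_called()


def test_specific_corruption_enters_repair():
    repair = Mock()
    handle_load_error("corrupted download", repair)
    repair.assert_called_once()
```

In the real test file the Mock would replace delete_model / pull_model_stream on the client rather than a bare callback; the callback form is used here only to keep the example self-contained.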
Unit (tests/unit/test_lemonade_error_classification.py, which already exists):
- Each specific corruption phrase → True; bare "llama-server failed to start" → False; "llama-server failed to start" plus a corruption phrase → True.
- load_model with a mocked bare llama-server failed to start response does not call delete_model / pull_model_stream, and raises an actionable error.
- load_model with a mocked specific corruption error does enter the repair path.
Integration (@pytest.mark.integration, require_lemonade):
- Trigger a non-corruption load failure (e.g. a ctx_size / resource condition) and assert no model deletion/re-download occurs and the error is actionable.
Real-world (manual, AMD Linux hardware; tear down afterward):
Related