NovaMLX v1.0.4

cnshsliu released this 05 May 12:14


f3c17d4

What's Changed

Thinking-Budget Enforcement (closes T5–T8 at temp=0)

New ThinkingBudgetProcessor: production-grade thinking-budget enforcement for reasoning models at temperature=0. When models like Qwen3.6 greedy-decode complex prompts, they can lock into the chain-of-thought phase and exhaust max_tokens without producing any response content. The processor counts generated tokens and forces emission of the close marker once the budget is exceeded, the same primitive as vLLM/SGLang/llama.cpp max_thinking_tokens.
Smart default: min(1024, max(256, maxTokens/2)) at temp=0 for thinking models. Opt-out via thinking_budget=0, custom via thinking_budget=N.
isImplicitThinkingModel rewrite: fixed misclassification of Qwen3.6 as explicit-thinking when it is actually implicit (the chat template injects <think>\n). The 4-step decision tree now correctly handles Qwen3.6, DeepSeek-R1, and non-thinking models.
Extended ComposedLogitProcessor to a 4-slot chain: penalty → grammar → turnStop → thinkingBudget.
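
The budget rule and the chain order above can be illustrated with a minimal Python sketch. This is not the NovaMLX implementation: the function names, the list-of-floats logits, the single-token close marker (close_id), and the use of integer division for maxTokens/2 are all assumptions made for illustration. It also assumes generation starts inside the thinking phase, as with an implicit-thinking model.

```python
import math
from typing import Callable, List, Optional

Logits = List[float]
# A processor takes (generated token ids so far, logits) and returns logits.
Processor = Callable[[List[int], Logits], Logits]


def default_thinking_budget(max_tokens: int) -> int:
    """Smart default from the notes: min(1024, max(256, maxTokens/2)).

    Integer division is an assumption; the notes do not specify rounding.
    """
    return min(1024, max(256, max_tokens // 2))


def resolve_budget(max_tokens: int, thinking_budget: Optional[int]) -> Optional[int]:
    """None -> smart default, 0 -> enforcement disabled, N -> custom budget."""
    if thinking_budget is None:
        return default_thinking_budget(max_tokens)
    if thinking_budget == 0:
        return None
    return thinking_budget


def make_thinking_budget_processor(budget: int, close_id: int) -> Processor:
    """Force the close marker once `budget` tokens were generated without it.

    Hypothetical simplification: the close marker is one token (close_id).
    """
    def process(generated: List[int], logits: Logits) -> Logits:
        if close_id in generated or len(generated) < budget:
            return logits  # thinking already closed, or still within budget
        forced = [-math.inf] * len(logits)
        forced[close_id] = 0.0  # only the close marker remains selectable
        return forced
    return process


def compose(*slots: Optional[Processor]) -> Processor:
    """Run the slots in order; empty (None) slots are skipped."""
    def run(generated: List[int], logits: Logits) -> Logits:
        for slot in slots:
            if slot is not None:
                logits = slot(generated, logits)
        return logits
    return run


# Slot order mirrors the notes: penalty -> grammar -> turnStop -> thinkingBudget.
budget = resolve_budget(max_tokens=4096, thinking_budget=None)
chain = compose(None, None, None, make_thinking_budget_processor(budget, close_id=3))
```

Placing the budget slot last means its forced close marker is not undone by an earlier slot in this sketch; whether the real chain relies on that ordering is not stated in the notes.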

VLM LogitProcessor Chain Fix

Both VLM paths now thread the LogitProcessor chain (penalty + TurnStop + hallucination detection). Previously, VLM requests silently bypassed the repetition penalty and multi-token hallucination detection.

Strict-FSM JSON Logit Processor

Rejects raw control chars (\n, \r, \t) inside JSON strings at both FSM step and mask precompute levels. Escaped sequences allowed. 15 unit tests.

Chat Template Library & TokenMaskBuilder

Chat template format detection library with confidence scoring. TokenMaskBuilder cache for efficient logit masking.

DeepSeek-V4 Lite Test Suite

7 tests covering model registration, family routing, chat template detection, indexer contract, and family guard.

ThinkingParser Regression Suite

13 tests covering Qwen3.6 explicit markers, streaming chunk-boundary fuzz, mixed markers, implicit-detection fixtures, and enable_thinking=false regex behavior.
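
To illustrate what a streaming chunk-boundary fuzz exercises, here is a small Python sketch. It is not the NovaMLX ThinkingParser: the class name, the fixed </think> marker, and the assumption that generation starts inside the thinking phase are illustrative only, and it does not cover explicit or mixed markers or enable_thinking=false.

```python
from typing import List, Tuple


class StreamingThinkSplitter:
    """Splits a token stream into (thinking, response) text.

    Hypothetical helper: assumes the stream starts in the thinking phase
    and that the phase ends at the first "</think>".
    """

    CLOSE = "</think>"

    def __init__(self) -> None:
        self.thinking = True
        self.buf = ""

    def feed(self, chunk: str) -> Tuple[str, str]:
        if not self.thinking:
            return "", chunk
        self.buf += chunk
        idx = self.buf.find(self.CLOSE)
        if idx != -1:
            thought = self.buf[:idx]
            rest = self.buf[idx + len(self.CLOSE):]
            self.buf = ""
            self.thinking = False
            return thought, rest
        # Hold back the longest tail that could still grow into CLOSE.
        hold = 0
        for k in range(min(len(self.CLOSE) - 1, len(self.buf)), 0, -1):
            if self.CLOSE.startswith(self.buf[-k:]):
                hold = k
                break
        cut = len(self.buf) - hold
        emit, self.buf = self.buf[:cut], self.buf[cut:]
        return emit, ""

    def finish(self) -> Tuple[str, str]:
        """Flush any held-back text as thinking when the stream ends."""
        emit, self.buf = self.buf, ""
        return emit, ""


def split_stream(chunks: List[str]) -> Tuple[str, str]:
    splitter = StreamingThinkSplitter()
    parts = [splitter.feed(c) for c in chunks] + [splitter.finish()]
    return "".join(p[0] for p in parts), "".join(p[1] for p in parts)


def fuzz_two_boundaries(text: str) -> bool:
    """Every way of cutting `text` into three chunks must give one result."""
    expected = split_stream([text])
    return all(
        split_stream([text[:i], text[i:j], text[j:]]) == expected
        for i in range(len(text) + 1)
        for j in range(i, len(text) + 1)
    )


sample = "step 1 < step 2</think>Answer: 4"
result = split_stream([sample])
stable = fuzz_two_boundaries(sample)
```

The stray "<" in the sample is deliberate: it forces the hold-back logic to release text that looks like the start of a marker but is not one.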

Build & Infra

build.sh now runs idempotent post-build sync (Mach-O UUID comparison + codesign). Bypass with NOVAMLX_SKIP_DIST_SYNC=1.

Full Changelog

feat: strict-FSM JSON logit processor, chat template library, TokenMaskBuilder cache, thinking detection overhaul
feat: VLM LogitProcessor chain, ThinkingParser regression tests, build.sh sync, GUI models path
test: DeepSeek-V4 lite regression suite (7 tests)
docs: close §2.10 DeepSeek-V4 lite test suite in todo.markdown
feat: ThinkingBudgetProcessor + isImplicitThinkingModel rewrite, close §2.12
