NovaMLX v1.0.4

@cnshsliu cnshsliu released this 05 May 12:14
· 140 commits to main since this release

What's Changed

Thinking-Budget Enforcement (closes T5–T8 at temp=0)

  • New ThinkingBudgetProcessor: production-grade thinking-budget enforcement for reasoning models at temperature=0. When models like Qwen3.6 greedy-decode complex prompts, they can lock into the chain-of-thought phase and exhaust max_tokens without producing any response content. The processor counts generated tokens and forces emission of the close marker once the budget is exceeded, the same primitive as max_thinking_tokens in vLLM/SGLang/llama.cpp.
  • Smart default: min(1024, max(256, maxTokens/2)) at temp=0 for thinking models. Opt out with thinking_budget=0, or set a custom budget with thinking_budget=N.
  • isImplicitThinkingModel rewrite: fixed the misclassification of Qwen3.6 as explicit-thinking when it is actually implicit (the chat template injects <think>\n). A 4-step decision tree now correctly handles Qwen3.6, DeepSeek-R1, and non-thinking models.
  • Extended ComposedLogitProcessor to a 4-slot chain: penalty → grammar → turnStop → thinkingBudget.
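
The budget rule and the forced close described above can be sketched as follows. This is a minimal illustration in Python, not NovaMLX's implementation: the names default_thinking_budget and ThinkingBudgetSketch are hypothetical, the close marker is assumed to be a single token ID (real close markers may span several tokens), and it is assumed that an explicit thinking_budget=N applies at any temperature.

```python
import math
from typing import List, Optional


def default_thinking_budget(
    max_tokens: int,
    temperature: float,
    thinking_budget: Optional[int] = None,
) -> Optional[int]:
    """Return the budget in tokens, or None when enforcement is off.

    Hypothetical helper; the caller is assumed to have already checked
    that the model is a thinking model.
    """
    if thinking_budget is not None:
        # Explicit setting: 0 opts out, N > 0 is a custom budget.
        return thinking_budget if thinking_budget > 0 else None
    if temperature != 0:
        # The smart default is documented only for temp=0.
        return None
    # Documented default: min(1024, max(256, maxTokens/2)).
    return min(1024, max(256, max_tokens // 2))


class ThinkingBudgetSketch:
    """Counts sampled tokens and forces the close marker after `budget`."""

    def __init__(self, budget: int, close_token_id: int) -> None:
        self.budget = budget
        self.close_token_id = close_token_id  # assumed single-token marker
        self.thinking_tokens = 0
        self.closed = False

    def process(self, logits: List[float]) -> List[float]:
        # Pass logits through while under budget or after thinking closed.
        if self.closed or self.thinking_tokens < self.budget:
            return logits
        # Budget reached: leave the close marker as the only finite logit.
        masked = [-math.inf] * len(logits)
        masked[self.close_token_id] = 0.0
        return masked

    def did_sample(self, token_id: int) -> None:
        # Update state after each sampled token.
        if self.closed:
            return
        if token_id == self.close_token_id:
            self.closed = True
        else:
            self.thinking_tokens += 1
```

Under this sketch, maxTokens=4096 yields a budget of 1024 and maxTokens=1000 yields 500. Note that the 256 floor can exceed maxTokens itself when maxTokens is below 256.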

VLM LogitProcessor Chain Fix

  • Both VLM paths now thread the LogitProcessor chain (penalty + TurnStop + hallucination detection). Previously, VLM requests silently bypassed the repetition penalty and multi-token hallucination detection.

Strict-FSM JSON Logit Processor

  • Rejects raw control chars (\n, \r, \t) inside JSON strings at both the FSM step and the mask precompute levels. Escaped sequences are still allowed. Covered by 15 unit tests.
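
The string-level rule above can be illustrated with a small character-level state machine. This is a sketch in Python, not the NovaMLX processor: the function name first_raw_control_char is hypothetical, it works on decoded text rather than token logits, and it rejects every character below U+0020 (the range that JSON requires to be escaped inside strings), which is broader than the three characters listed in the note.

```python
def first_raw_control_char(text: str) -> int:
    """Return the index of the first raw control character found inside a
    JSON string literal, or -1 if there is none.

    Whitespace between JSON tokens (outside strings) is not flagged.
    """
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if not in_string:
            if ch == '"':
                in_string = True
            continue
        if ord(ch) < 0x20:
            # Raw control character inside a string: reject, even right
            # after a backslash.
            return i
        if escaped:
            # Character following a backslash belongs to the escape.
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = False
    return -1
```

A two-character escape such as a backslash followed by n passes, while a literal newline byte between the quotes is reported at its index. A logit processor would apply the same check per candidate token, which is why precomputing the mask per FSM state pays off.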

Chat Template Library & TokenMaskBuilder

  • Chat template format detection library with confidence scoring. TokenMaskBuilder cache for efficient logit masking.

DeepSeek-V4 Lite Test Suite

  • 7 tests covering model registration, family routing, chat template detection, indexer contract, and family guard.

ThinkingParser Regression Suite

  • 13 tests covering Qwen3.6 explicit markers, streaming chunk-boundary fuzz, mixed markers, implicit-detection fixtures, and enable_thinking=false regex behavior.

Build & Infra

  • build.sh now runs an idempotent post-build sync (Mach-O UUID comparison + codesign). Bypass with NOVAMLX_SKIP_DIST_SYNC=1.

Full Changelog

  • feat: strict-FSM JSON logit processor, chat template library, TokenMaskBuilder cache, thinking detection overhaul
  • feat: VLM LogitProcessor chain, ThinkingParser regression tests, build.sh sync, GUI models path
  • test: DeepSeek-V4 lite regression suite (7 tests)
  • docs: close §2.10 DeepSeek-V4 lite test suite in todo.markdown
  • feat: ThinkingBudgetProcessor + isImplicitThinkingModel rewrite, close §2.12