chore: update mlx-swift-lm to fix/gemma4-pad-eos-token #70

Merged
solderzzc merged 5 commits into main from fix/gemma4-tool-latency on Apr 21, 2026

Conversation

@solderzzc
Member

Points to fix(Gemma4): add pad token (ID=0) to eosTokenIds to prevent infinite padding loops when Gemma-4 prompts exceed the 1024-token sliding window attention limit.

Copilot AI review requested due to automatic review settings April 21, 2026 19:32
Contributor

Copilot AI left a comment

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

…uery bug)

Adds option 9 to run_benchmark.sh to reproduce and track the bug where
Gemma-4 fails to call tools for vague natural-language queries.

Test structure (11 total requests):
  [1/3] Vague 'what is the news' + web_search tool — 5 runs, need ≥3 tool_calls
  [2/3] Same query, no tools — 3 runs, need 3 coherent text responses (sanity)
  [3/3] Explicit 'Use web_search...' + tool — 3 runs, need 3 tool_calls

Pass criteria: all three sections meet their thresholds.

Root cause (documented): The chat_template.jinja appends
  <|channel>thought\n<channel|>
to every non-thinking generation prompt. This flattens the first-token
logit distribution for vague queries when tools are present, causing the
model to output garbage tokens or ignore tools entirely.

Baseline (unfixed): 0/5 vague tool_calls, 3/3 explicit tool_calls.
Target (fixed):     ≥3/5 vague tool_calls, 3/3 explicit tool_calls.
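The three-section pass criteria above could be tallied with a small helper along these lines (a sketch only: the thresholds and baseline numbers come from the commit message, while the function and variable names are hypothetical, not taken from run_benchmark.sh):

```shell
#!/usr/bin/env bash
# Hypothetical tally for benchmark option 9. Defaults reproduce the
# unfixed baseline: 0/5 vague tool_calls, 3/3 sanity, 3/3 explicit.
: "${VAGUE_TOOL_CALLS:=0}"
: "${VAGUE_TEXT_OK:=3}"
: "${EXPLICIT_TOOL_CALLS:=3}"

overall=PASS

# Print PASS/FAIL for one section; mark the overall run FAIL on a miss.
check_section() {
  local name=$1 passed=$2 total=$3 need=$4
  if [ "$passed" -ge "$need" ]; then
    echo "[$name] PASS ($passed/$total, need >=$need)"
  else
    echo "[$name] FAIL ($passed/$total, need >=$need)"
    overall=FAIL
  fi
}

check_section "1/3 vague+tools"    "$VAGUE_TOOL_CALLS"    5 3
check_section "2/3 vague-no-tools" "$VAGUE_TEXT_OK"       3 3
check_section "3/3 explicit+tools" "$EXPLICIT_TOOL_CALLS" 3 3
echo "OVERALL: $overall"
```

With the baseline defaults this prints FAIL for section 1/3 and an overall FAIL; a fixed build should clear all three thresholds.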
…nchmark.sh

- Swap Quit/regression: 8=Regression, 9=Quit (conventional placement)
- Move Test 8 handler block to after BIN+FULL_MODEL are assigned
  (was incorrectly placed before model selection, causing empty $FULL_MODEL)
- Restore accidentally removed 'if [ suite_opt == 2 ]' guard
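The corrected flow those three bullets describe might look roughly like this (identifiers mirror the commit message; the menu text, binary path, and model name are illustrative stand-ins, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch of the corrected control flow (illustrative, not run_benchmark.sh).
suite_opt="8"   # stand-in for the menu read; 8=Regression, 9=Quit (Quit last)

if [ "$suite_opt" == "9" ]; then
  echo "bye"
  exit 0
fi

# Restored guard. Note the $ expansion and quotes: a bare
# '[ suite_opt == 2 ]' compares the literal string "suite_opt"
# against "2" and is always false.
if [ "$suite_opt" == "2" ]; then
  echo "suite 2 setup"
fi

# Assign BIN and FULL_MODEL *before* the Test 8 handler runs; with the
# handler placed ahead of model selection it saw an empty $FULL_MODEL.
BIN="./server"
FULL_MODEL="gemma-4"

if [ "$suite_opt" == "8" ]; then
  echo "regression: $BIN $FULL_MODEL"
fi
```

The ordering is the point: menu dispatch first, model selection next, and only then the Test 8 handler that consumes `$BIN` and `$FULL_MODEL`.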
@solderzzc
Member Author

close #69

The fix for #69:

- Implemented Server.swift workaround to force enable_thinking=true for gemma4 with tools
- Extracted and tracked <|channel>thought tags correctly in prompt cache states
- Fixed run_benchmark.sh to properly parse tool testing outcomes with adjusted max_tokens and system prompts
@solderzzc solderzzc merged commit 116ee91 into main Apr 21, 2026
8 checks passed
@solderzzc solderzzc deleted the fix/gemma4-tool-latency branch April 21, 2026 23:00