Skip to content

llama.cpp auto-translate: add Qwen 3 4B Instruct + Qwen 3 8B#11033

Merged
niksedk merged 1 commit into
mainfrom
add-qwen3-translate-models
May 18, 2026
Merged

llama.cpp auto-translate: add Qwen 3 4B Instruct + Qwen 3 8B#11033
niksedk merged 1 commit into
mainfrom
add-qwen3-translate-models

Conversation

@niksedk
Copy link
Copy Markdown
Member

@niksedk niksedk commented May 18, 2026

Summary

Adds Qwen 3 as an alternative model family in the curated llama.cpp auto-translate model list. Until now the list was four quantizations of the same model (TranslateGemma); users hitting Gemma's quirks (occasional refusals on adult-themed dialogue, formatting drift, weaker CJK quality) had no real fallback. Qwen 3 is the strongest open model for Japanese/Chinese/Korean translation in 2026 and is competitive on European pairs.

Two new entries in LlamaCppServerManager.TranslateModels:

Model Size Source
Qwen 3 4B Instruct (Q4_K_M) 2.5 GB bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF
Qwen 3 8B (Q4_K_M) 4.7 GB bartowski/Qwen_Qwen3-8B-GGUF

Why bartowski as the GGUF source

The two most reliable GGUF authors are ggml-org (official llama.cpp org) and bartowski. ggml-org's Qwen3-4B-Instruct-2507-Q8_0-GGUF only ships the 8 GB Q8_0 quant, and they don't ship Qwen3-8B-Instruct as separate GGUFs at all. bartowski ships every standard quant for both, is the highest-trafficked GGUF maintainer on HF, and tracks upstream releases promptly.

Why Qwen3-8B (hybrid) and not Qwen3-8B-Instruct-2507

Qwen never released a separate Qwen3-8B-Instruct-2507 — only the 4B and 30B-A3B sizes got the Instruct/Thinking split. For the 8B size the original hybrid Qwen3-8B is what's available.

The existing wiring already handles this correctly: setting ChatTemplate: "chatml", NoJinja: true translates to --no-jinja --chat-template chatml on the llama-server command line. That bypasses the embedded Jinja template's enable_thinking logic and feeds the model a plain chatml prompt, so output is clean translation rather than <think>...</think> reasoning blocks.

What stays the same

Test plan

  • Build SE locally and open Auto-Translate window — dropdown shows the two new entries below the TranslateGemma group
  • Click download on "Qwen 3 4B Instruct (Q4_K_M)" — file downloads to the llama.cpp models folder
  • Translate a small subtitle file — verify output is clean translation, no <think> blocks, no system-token leakage
  • Repeat with "Qwen 3 8B (Q4_K_M)" — same checks, especially that the hybrid model doesn't fall into thinking mode

🤖 Generated with Claude Code

The curated translate-model list was four quants of the same model
family (TranslateGemma). Add Qwen 3 as an alternative family — strongest
open model for CJK languages, competitive elsewhere — so users hitting
Gemma's quirks (refusals on adult-themed dialogue, formatting drift)
have a real fallback rather than just a different quant of the same
underlying model.

Two new entries:
- Qwen 3 4B Instruct (Q4_K_M, 2.5 GB) — dedicated instruct-only variant
  from the Qwen3-Instruct-2507 series.
- Qwen 3 8B (Q4_K_M, 4.7 GB) — original hybrid Qwen3-8B. Qwen never
  released a separate Qwen3-8B-Instruct-2507; on the hybrid model,
  --no-jinja + --chat-template chatml bypasses the embedded Jinja
  template's enable_thinking logic so output is clean translation
  rather than <think>...</think> reasoning blocks.

GGUF sources are bartowski's repos — most prolific, well-maintained
quantizer. TranslateGemma 4B Q4_K_M stays as the default first-pick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@niksedk niksedk merged commit a75434c into main May 18, 2026
1 of 3 checks passed
@niksedk niksedk deleted the add-qwen3-translate-models branch May 18, 2026 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant