llama.cpp auto-translate: add Qwen 3 4B Instruct + Qwen 3 8B by niksedk · Pull Request #11033 · SubtitleEdit/subtitleedit

niksedk · 2026-05-18T15:14:30Z

Summary

Adds Qwen 3 as an alternative model family in the curated llama.cpp auto-translate model list. Until now the list was four quantizations of the same model (TranslateGemma); users hitting Gemma's quirks (occasional refusals on adult-themed dialogue, formatting drift, weaker CJK quality) had no real fallback. Qwen 3 is the strongest open model for Japanese/Chinese/Korean translation in 2026 and is competitive on European pairs.

Two new entries in LlamaCppServerManager.TranslateModels:

Model	Size	Source
Qwen 3 4B Instruct (Q4_K_M)	2.5 GB	`bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF`
Qwen 3 8B (Q4_K_M)	4.7 GB	`bartowski/Qwen_Qwen3-8B-GGUF`

Why bartowski as the GGUF source

The two most reliable GGUF authors are ggml-org (official llama.cpp org) and bartowski. ggml-org's Qwen3-4B-Instruct-2507-Q8_0-GGUF only ships the 8 GB Q8_0 quant, and they don't ship Qwen3-8B-Instruct as separate GGUFs at all. bartowski ships every standard quant for both, is the highest-trafficked GGUF maintainer on HF, and tracks upstream releases promptly.

Why Qwen3-8B (hybrid) and not Qwen3-8B-Instruct-2507

Qwen never released a separate Qwen3-8B-Instruct-2507 — only the 4B and 30B-A3B sizes got the Instruct/Thinking split. For the 8B size the original hybrid Qwen3-8B is what's available.

The existing wiring already handles this correctly: setting ChatTemplate: "chatml", NoJinja: true translates to --no-jinja --chat-template chatml on the llama-server command line. That bypasses the embedded Jinja template's enable_thinking logic and feeds the model a plain chatml prompt, so output is clean translation rather than <think>...</think> reasoning blocks.

What stays the same

TranslateGemma 4B Q4_K_M remains the default first-pick (no change to existing users)
Custom GGUF discovery in the models folder is unaffected (PR llama.cpp auto-translate: discover custom GGUFs + open-folder button #11028's work intact)
No changes to download logic, server lifecycle, or chat-template wiring — just new entries in the curated list

Test plan

Build SE locally and open Auto-Translate window — dropdown shows the two new entries below the TranslateGemma group
Click download on "Qwen 3 4B Instruct (Q4_K_M)" — file downloads to the llama.cpp models folder
Translate a small subtitle file — verify output is clean translation, no <think> blocks, no system-token leakage
Repeat with "Qwen 3 8B (Q4_K_M)" — same checks, especially that the hybrid model doesn't fall into thinking mode

🤖 Generated with Claude Code

The curated translate-model list was four quants of the same model family (TranslateGemma). Add Qwen 3 as an alternative family — strongest open model for CJK languages, competitive elsewhere — so users hitting Gemma's quirks (refusals on adult-themed dialogue, formatting drift) have a real fallback rather than just a different quant of the same underlying model. Two new entries: - Qwen 3 4B Instruct (Q4_K_M, 2.5 GB) — dedicated instruct-only variant from the Qwen3-Instruct-2507 series. - Qwen 3 8B (Q4_K_M, 4.7 GB) — original hybrid Qwen3-8B. Qwen never released a separate Qwen3-8B-Instruct-2507; on the hybrid model, --no-jinja + --chat-template chatml bypasses the embedded Jinja template's enable_thinking logic so output is clean translation rather than <think>...</think> reasoning blocks. GGUF sources are bartowski's repos — most prolific, well-maintained quantizer. TranslateGemma 4B Q4_K_M stays as the default first-pick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

niksedk merged commit a75434c into main May 18, 2026
1 of 3 checks passed

niksedk deleted the add-qwen3-translate-models branch May 18, 2026 15:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llama.cpp auto-translate: add Qwen 3 4B Instruct + Qwen 3 8B#11033

llama.cpp auto-translate: add Qwen 3 4B Instruct + Qwen 3 8B#11033
niksedk merged 1 commit into
mainfrom
add-qwen3-translate-models

niksedk commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

niksedk commented May 18, 2026

Summary

Why bartowski as the GGUF source

Why Qwen3-8B (hybrid) and not Qwen3-8B-Instruct-2507

What stays the same

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant