[pull] master from ggml-org:master by pull[bot] · Pull Request #96 · CrazyForks/llama.cpp

pull · 2026-05-25T15:42:30Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


* Refactored Compressed Tensors NVFP4 support for new base.py
* Support compressed-tensors NVFP4 conversion
* Moved Qwen MTP remap into filter_tensors
* simplify
* pathlib no longer used

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* CUDA: add fast walsh-hadamard transform
* review: add unrolls + change size_t -> int
* warp size 64

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
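For readers unfamiliar with the transform named in this commit, here is a minimal sequential sketch of a fast Walsh-Hadamard transform in Python. It is not the CUDA kernel from the commit, which is not shown on this page. The function name `fwht` is hypothetical, and the sketch assumes the unnormalized, natural (Hadamard) ordering and a power-of-two input length.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (illustrative sketch only).

    Assumes len(values) is a power of two. Returns a new list and leaves
    the input untouched.
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    out = list(values)
    h = 1
    # log2(n) passes; each pass combines pairs of elements that are h apart.
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    return out
```

Because the transform is left unnormalized, applying `fwht` twice returns the input scaled by its length. The butterfly structure costs O(n log n) additions instead of the O(n²) of a dense Hadamard matrix product.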

ffn_latent_down/up are declared as GGML_OP_MUL in LLM_TENSOR_INFOS, but nemotron-h feeds them through ggml_mul_mat. The loader's buft probe asks the backend about the declared op, so it tested an elementwise MUL on a q8_0 weight. That check used to return true unconditionally, and the weight stayed on the GPU by luck. Once supports_op told the truth, the probe got a no and the loader pushed the weight and its matmul to the CPU, splitting the graph. Tagging the tensors as MUL_MAT asks the real question; the math is unchanged. Verified on Nemotron 3 Super 120B Q5_K_M: throughput recovers from 64.9 t/s to 103.22 t/s.
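The failure mode described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `gpu_supports_op`, `pick_buffer`, the two tables, and the support matrix are all hypothetical and are not llama.cpp's loader or the ggml backend API. The only behavior taken from the commit message is that the probe queries the declared op, and that an elementwise MUL on a q8_0 weight is rejected while a matmul on it is accepted.

```python
# Toy model only: every name below is hypothetical, not llama.cpp's API.

# Declared op per tensor before and after the fix described in the commit.
DECLARED_OP_BEFORE = {"ffn_latent_down": "MUL", "ffn_latent_up": "MUL"}
DECLARED_OP_AFTER = {"ffn_latent_down": "MUL_MAT", "ffn_latent_up": "MUL_MAT"}


def gpu_supports_op(op, weight_type):
    """Invented support matrix: matmul accepts q8_0, elementwise MUL does not."""
    if op == "MUL_MAT":
        return weight_type in {"f32", "f16", "q8_0"}
    if op == "MUL":
        return weight_type in {"f32", "f16"}
    return False


def pick_buffer(tensor_name, weight_type, declared_ops):
    """Place the weight on the GPU only if the backend accepts the declared op."""
    op = declared_ops[tensor_name]
    return "GPU" if gpu_supports_op(op, weight_type) else "CPU"
```

With the old table, `pick_buffer("ffn_latent_down", "q8_0", DECLARED_OP_BEFORE)` falls back to the CPU even though the model never performs an elementwise MUL on that weight. With the corrected table the probe asks about the op that actually runs, and the weight stays on the GPU.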

ggerganov and others added 6 commits May 25, 2026 12:43

45158f4 ggml : bump version to 0.13.0 (ggml/1510)
d161ea7 sync : ggml
5a4126a ui: fix stop/continue during an agentic loop (#23356)
c1f1e28 CUDA: add fast walsh-hadamard transform (#23615)


pull Bot locked and limited conversation to collaborators May 25, 2026

pull Bot added the ⤵️ pull label May 25, 2026

pull Bot merged commit 328874d into CrazyForks:master May 25, 2026
3 of 20 checks passed

github-actions Bot added the Nvidia GPU, testing, examples, python, ggml, script, and server/ui labels May 25, 2026

pull[bot] merged 6 commits into CrazyForks:master from ggml-org:master
