Tags: ggml-org/llama.cpp

b5201

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
arg : fix unused variable (#13142)

b5200

llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensor parsing to match --tensor-split and to appear in the test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
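The last bullet's corner cases (no `-ot` calls, leading and trailing empty spans) amount to tolerating empty entries when splitting the comma-separated `regex=buffer` list. A minimal sketch of such parsing, with a hypothetical `parse_overrides` helper that is not the actual llama-bench code:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Split a comma-separated list of "regex=buffer-type" pairs.
// Empty spans (from leading, trailing, or doubled commas) are skipped,
// which covers the "-ot ''" and "-ot ',x=CPU,'" corner cases.
static std::vector<std::pair<std::string, std::string>> parse_overrides(const std::string & arg) {
    std::vector<std::pair<std::string, std::string>> out;
    size_t start = 0;
    while (start <= arg.size()) {
        size_t end = arg.find(',', start);
        if (end == std::string::npos) end = arg.size();
        std::string span = arg.substr(start, end - start);
        if (!span.empty()) {
            size_t eq = span.find('=');
            if (eq != std::string::npos) {
                out.emplace_back(span.substr(0, eq), span.substr(eq + 1));
            }
        }
        start = end + 1;
    }
    return out;
}
```

With this shape, an empty argument or stray commas simply yield fewer pairs instead of an error or an empty override entry.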

b5199

llama-chat : fix wrong template in GLM4-0414 (#13140)

* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check in the correct place with correct check
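The "higher priority" bullet suggests the detector must try the more specific GLM4 marker before the older GLM check, so a GLM4-0414 template is never caught by the generic match first. A hedged sketch of that ordering, assuming GLM4 templates contain `[gMASK]<sop>` and older GLM templates contain `[gMASK]sop` (hypothetical `detect_glm_template` helper, not the llama-chat code):

```cpp
#include <string>

enum chat_tmpl { TMPL_UNKNOWN, TMPL_GLM4, TMPL_CHATGLM3 };

// Check the more specific GLM4 marker first; if the older-GLM check ran
// first with a looser match, GLM4 templates could be misclassified.
static chat_tmpl detect_glm_template(const std::string & tmpl) {
    if (tmpl.find("[gMASK]<sop>") != std::string::npos) return TMPL_GLM4;    // GLM4-0414 style
    if (tmpl.find("[gMASK]sop")  != std::string::npos) return TMPL_CHATGLM3; // older GLM models
    return TMPL_UNKNOWN;
}
```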

b5198

musa: fix build warning (#13129)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

b5197

Fixes Qwen2.5VL segfault during inference with #12402 as has_qwen2vl_merger migration was incomplete (#13133)

b5196

clip : Add Qwen2.5VL support (#12402)

* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

b5195

common : add common_remote_get_content (#13123)

* common : add common_remote_get_content

* support max size and timeout

* add tests
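The "support max size and timeout" bullet implies the download loop must enforce a byte cap as chunks arrive. A minimal sketch of such a capped accumulator (hypothetical `accumulate_capped` helper, not the actual `common_remote_get_content` internals, which go through the HTTP layer):

```cpp
#include <cstddef>
#include <string>

// Append a received chunk to the response body, refusing once the body
// would exceed max_size (0 = unlimited). Returning false signals the
// transfer layer to abort the download early.
static bool accumulate_capped(std::string & body, const char * chunk, size_t n, size_t max_size) {
    if (max_size != 0 && body.size() + n > max_size) {
        return false;
    }
    body.append(chunk, n);
    return true;
}
```

Checking before appending keeps the buffer from ever growing past the cap, which matters when the server does not send a Content-Length up front.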

b5194

clip : improve projector naming (#13118)

* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused

b5193

ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)

* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion

* move fp converter to ggml-cpu

* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
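When dynamic feature detection finds no SIMD path (F16C, AVX-512, etc.), the conversion falls back to scalar bit manipulation. A portable sketch of scalar FP16 to FP32 (illustrative only, not the ggml-cpu implementation):

```cpp
#include <cstdint>
#include <cstring>

// Scalar IEEE 754 half -> single conversion: widen sign, rebias the
// exponent (15 -> 127), handle subnormals, infinities, and NaNs.
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t) (h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign; // signed zero
        } else {
            // subnormal: shift the mantissa up until the implicit bit appears
            int e = 0;
            do { mant <<= 1; ++e; } while (!(mant & 0x400));
            bits = sign | ((uint32_t) (127 - 15 - e + 1) << 23) | ((mant & 0x3FF) << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000 | (mant << 13); // inf / NaN
    } else {
        bits = sign | ((exp - 15 + 127) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f); // bit-cast without aliasing UB
    return f;
}
```

The `get_rows` switch in the last bullet means dequantizing f16/bf16 rows now goes through one shared, feature-dispatched converter instead of per-op loops.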

b5192

grammar : handle maxItems == 0 in JSON schema (#13117)

Co-authored-by: Richard Lyons <frob@cloudstaff.com>
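Handling `maxItems == 0` means the generated array rule must collapse to the empty array instead of emitting a degenerate repetition. A rough sketch of a GBNF-style rule builder with that special case (hypothetical `array_rule` helper, not llama.cpp's JSON-schema converter):

```cpp
#include <string>

// Build an array rule from minItems/maxItems. maxItems == 0 short-circuits
// to the empty array; optional items are nested so a comma can never
// appear without a preceding item.
static std::string array_rule(const std::string & item, int min_items, int max_items) {
    if (max_items == 0) {
        return "\"[\" \"]\""; // only [] is valid
    }
    std::string body;
    for (int i = 0; i < min_items; ++i) {
        body += (body.empty() ? "" : " \",\" ") + item;
    }
    std::string opt;
    for (int i = max_items; i > min_items; --i) {
        std::string piece = (min_items == 0 && i == 1) ? item : "\",\" " + item;
        opt = "(" + piece + (opt.empty() ? "" : " " + opt) + ")?";
    }
    if (!opt.empty()) body += (body.empty() ? "" : " ") + opt;
    return "\"[\" " + body + " \"]\"";
}
```

Without the short-circuit, a zero bound falls through to the repetition builder and can produce a rule that either never terminates or wrongly accepts one item.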