Pulse · ggml-org/llama.cpp · GitHub

April 15, 2025 – April 22, 2025

Overview

51 Active pull requests

65 Active issues

24 Releases published by 1 person

b5142
published Apr 15, 2025
b5143
published Apr 16, 2025
b5144
published Apr 16, 2025
b5145
published Apr 16, 2025
b5146
published Apr 17, 2025
b5147
published Apr 17, 2025
b5148
published Apr 17, 2025
b5149
published Apr 17, 2025
b5150
published Apr 18, 2025
b5151
published Apr 18, 2025
b5152
published Apr 18, 2025
b5153
published Apr 18, 2025
b5155
published Apr 18, 2025
b5156
published Apr 19, 2025
b5158
published Apr 19, 2025
b5159
published Apr 20, 2025
b5160
published Apr 20, 2025
b5161
published Apr 20, 2025
b5162
published Apr 20, 2025
b5163
published Apr 21, 2025
b5164
published Apr 21, 2025
b5165
published Apr 21, 2025
b5166
published Apr 22, 2025
b5169
published Apr 22, 2025

28 Pull requests merged by 17 people

mtmd : support SmolVLM (version 1 and 2)
#13050 merged Apr 22, 2025
security : add note about RPC and server functionality
#13061 merged Apr 22, 2025
metal : add memory pool for temp allocs
#12850 merged Apr 22, 2025
llava : update documentations
#13055 merged Apr 22, 2025
ggml : add SSE 4.2 and x64 base variant for CPUs without AVX
#12871 merged Apr 21, 2025
SYCL: Add non-contiguous support in ROPE
#12993 merged Apr 21, 2025
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli
#13012 merged Apr 21, 2025
convert : experimental support for --mmproj flag
#13023 merged Apr 20, 2025
llava: fix errors in clip.h on certain pure C compilers
#13030 merged Apr 20, 2025
vulkan: support noncontiguous rms_norm
#13031 merged Apr 20, 2025
metal: add neg operator
#13029 merged Apr 20, 2025
Disable CI cross-compile builds
#13022 merged Apr 19, 2025
gguf-py : fix upload python package workflow
#13020 merged Apr 19, 2025
clip : refactor, add image_manipulation and llava_uhd classes
#13011 merged Apr 19, 2025
main : Fix Ctrl+D/newline handling
#12951 merged Apr 18, 2025
gguf-py : GGUF Editor GUI - Python + Qt
#12930 merged Apr 18, 2025
server : use std::move whenever possible
#12936 merged Apr 18, 2025
SYCL: Refactor and enable FP16 in binary broadcast OPs
#12975 merged Apr 18, 2025
mtmd : add methods to access mtmd_image_tokens
#12906 merged Apr 18, 2025
rpc : add RPC_CMD_HELLO
#12955 merged Apr 18, 2025
graph : make FA compatible with MLA + add initial Metal kernels
#12953 merged Apr 17, 2025
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes
#12970 merged Apr 17, 2025
CANN: Add support for async operator submission
#12864 merged Apr 17, 2025
Recognize IBM Granite 3.3 FIM tokens. Makes llama-server /infill usable.
#12988 merged Apr 17, 2025
opencl: fix incorrect local_size index in profiling log
#12868 merged Apr 16, 2025
vulkan: enable coopmat2 FA gqa and split_k optimizations more often
#12931 merged Apr 16, 2025
[CANN]310P OPT Support
#12962 merged Apr 16, 2025
opencl: split ggml-opencl.cl into multiple files and cleanup
#12886 merged Apr 15, 2025

23 Pull requests opened by 22 people

sycl: use DNN in the first part of ggml_sycl_mul_mat_batched_sycl
#12972 opened Apr 16, 2025
Fix convert script for non-hf GLM4 checkpoints
#12992 opened Apr 17, 2025
[CANN] Add the n_graph_splits performance metric to llama-bench.
#12994 opened Apr 17, 2025
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
#12995 opened Apr 17, 2025
make memset range dynamic
#13002 opened Apr 18, 2025
[SYCL][OPT] Fix reorder optimization for Q4_0
#13003 opened Apr 18, 2025
Nix portability improvements
#13005 opened Apr 18, 2025
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
#13014 opened Apr 18, 2025
vulkan: matmul gcn tuning
#13016 opened Apr 18, 2025
Append mult-eos,half-rope,bos to GLM4-0414 and Z
#13021 opened Apr 19, 2025
Bitnet: directly use scale instead of inverting it twice
#13026 opened Apr 19, 2025
quantize: improve pattern matching for allowed tensors
#13033 opened Apr 20, 2025
gguf-py : avoid requiring PySide6 for packaged scripts
#13036 opened Apr 20, 2025
quantize: Handle user-defined pruning of whole layers (blocks)
#13037 opened Apr 20, 2025
[CANN]Support OP MUL_MAT_ID
#13042 opened Apr 21, 2025
llama-gemma3-cli: Sigint rework in gemma3 vision example
#13043 opened Apr 21, 2025
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
#13053 opened Apr 21, 2025
Update README.md for tts example to use afplay on MacOS
#13056 opened Apr 22, 2025
Fix ChatGLMModel for glm-4-9b cannot find tokenizer merges in model file
#13058 opened Apr 22, 2025
rpc : add command line option for number of threads for the CPU backend
#13060 opened Apr 22, 2025
cmake : do not include ./src as public for libllama
#13062 opened Apr 22, 2025
mtmd : Support Pixtral 12B (help needed - 2D RoPE)
#13065 opened Apr 22, 2025
fix(rpc): Improve input validation and error handling
#13069 opened Apr 22, 2025

30 Issues closed by 12 people

Feature Request:
#13070 closed Apr 22, 2025
Big performance regression of llama-bench with Vulkan backend using forced integer dot product code path (at least on NV 4070 latest driver) (from initial support in b5010)..
#13063 closed Apr 22, 2025
Feature Request: Support for Deepseek Janus-Pro-7B & Janus-1.3B
#11490 closed Apr 22, 2025
Feature Request: Convert deepseek-v3's mtp module to gguf and quantize to q4km
#12242 closed Apr 22, 2025
Misc. bug: Misc. bug: cannot convert GLM-4v-9B- (glm-4v-9b) to GGUF format #11263
#12266 closed Apr 22, 2025
Misc. bug: llama fails to run on older x86 hardware.
#12866 closed Apr 21, 2025
Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg)
#12956 closed Apr 20, 2025
Compile bug: there is a build bug in examples/llama.android and it will brings build failure in CI
#12638 closed Apr 20, 2025
Eval bug: CANNOT LINK EXECUTABLE "./llama-cli": library "libomp.so" not found: needed by main executable
#11979 closed Apr 20, 2025
Feature Request: Proposing User-Customizable RAG Integration in llama.cpp: A Path to Enhanced Contextual Retrieval
#12129 closed Apr 20, 2025
Refactor: rename `n_swa` to `n_aux`
#13019 closed Apr 19, 2025
Misc. bug: HIP compilation together with -DGGML_CPU_ALL_VARIANTS=ON does not load the model or detects the GPU
#12175 closed Apr 19, 2025
Eval bug: Server returns 500 error on /api/generate and /api/chat requests
#12176 closed Apr 19, 2025
Misc. bug: llama-rpc crashes when deciding memory on CPU with CUDA_VISIBLE_DEVICES=""
#12203 closed Apr 19, 2025
Misc. bug: Ctrl+D no longer works properly
#12949 closed Apr 18, 2025
Does V100 support flash-attention?
#13008 closed Apr 18, 2025
Feature Request: Support for Apriel-5B-Instruct
#12926 closed Apr 18, 2025
Feature Request: Allow more than 16 combined devices to participate in RPC
#12967 closed Apr 17, 2025
Misc. bug: HIP when using llama.bench and kv cache quant cpu is doing the work instead of gpu
#12624 closed Apr 17, 2025
Eval bug: The answers have some problems with the example/llama.android
#12158 closed Apr 17, 2025
Compile bug: issue compiling in ubuntu (desktop and server version) using virtualbox
#12164 closed Apr 17, 2025
Misc. bug: I have a 20 core and 30 thread CPU, and anything above 3 or 4 CPU thread pool size doesn't give any improvements in tokens per second.
#12966 closed Apr 16, 2025
Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash
#12597 closed Apr 16, 2025
Misc. bug: RISCV output bug when using rvv with vlen > 256bit
#11041 closed Apr 16, 2025
Eval bug: Error running Phi4-mini gguf: unknown pre-tokenizer type: 'gpt-4o'
#12122 closed Apr 16, 2025
Eval bug: In RISC-V, output tokens are broken
#12124 closed Apr 16, 2025
Feature Request:
#12128 closed Apr 16, 2025
Misc. bug: vulkan on Adreno GPU
#12139 closed Apr 16, 2025
Misc. bug: gguf-dump 'newbyteorder' was removed
#12146 closed Apr 16, 2025
Replacement for deprecated codevct string conversion
#12151 closed Apr 16, 2025

35 Issues opened by 30 people

Compile bug: Vulkan Cross compile for arm64
#13068 opened Apr 22, 2025
Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type`
#13067 opened Apr 22, 2025
Model Repeats Nonsensical Output
#13066 opened Apr 22, 2025
Doc. bug: docs/multimodal/gemma3.md need to be updated
#13064 opened Apr 22, 2025
Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed
#13059 opened Apr 22, 2025
Misc. bug: in version 0.16.2, the gguf-dump CLI tool fails due to a missing PySide6 module, indicating an unintended GUI depende
#13054 opened Apr 21, 2025
Misc. bug: Intel container images keep getting `No space left on device` during CI Build
#13052 opened Apr 21, 2025
Misc. bug: CPU Usage low in rpc-server mode
#13051 opened Apr 21, 2025
Slow token generation speed of Gemma 3 QAT Models
#13048 opened Apr 21, 2025
Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan
#13046 opened Apr 21, 2025
Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk
#13044 opened Apr 21, 2025
Eval bug: Error when load `bge-reranker-v2-gemma` model
#13041 opened Apr 21, 2025
Feature Request: Ability to pack multiple GGUFs into single one
#13028 opened Apr 19, 2025
Feature Proposal: Server Model Switching at Runtime
#13027 opened Apr 19, 2025
Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3
#13025 opened Apr 19, 2025
Eval bug: Vulkan: "Requested buffer size exceeds device memory allocation limit" even with `-ngl 0` when trying to run very large models
#13024 opened Apr 19, 2025
Eval bug: RWKV inference issue with llama-server
#13018 opened Apr 19, 2025
[Build] Some Build Options/Definitions seems Missing in ggml-base
#13017 opened Apr 19, 2025
Perplexity script for non GGUF quantization
#13015 opened Apr 18, 2025
Eval bug: why Gemma 3 model has run into CPU inference
#13004 opened Apr 18, 2025
Eval bug: Segmentation fault when running gemma3-cli on Android
#13000 opened Apr 18, 2025
gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1
#12998 opened Apr 18, 2025
Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf
#12997 opened Apr 17, 2025
Eval bug: HIP: llama.cpp server locks up when running multiple instances on the same gpu
#12991 opened Apr 17, 2025
Eval bug: Quad P40 unable to run 70B models on recent releases
#12990 opened Apr 17, 2025
Feature Request: Add kv-quant fa kernel variants for head sizes other than 128
#12989 opened Apr 17, 2025
Misc. bug: Potential memory leak in backend registry
#12986 opened Apr 16, 2025
Feature Request: llama-tts: read from text files and pipe audio signals to stdout for direct audio conversion using ffmpeg
#12984 opened Apr 16, 2025
Feature Request: Multi model cli tools: Add a possibility to specify a image in conversation mode plus tab auto completion for path
#12983 opened Apr 16, 2025
Feature Request: Make chat sessions possible with multi model cli tools
#12982 opened Apr 16, 2025
Feature Request: multi model cli tools: Convert submitted images to best size and format for model
#12981 opened Apr 16, 2025
Misc. bug: Only using 1 compute core on AMD
#12978 opened Apr 16, 2025
Misc. bug: Vulkan performance depends on thread priority
#12976 opened Apr 16, 2025
Eval bug: Gemma-3 Vision failed with CUDA
#12973 opened Apr 16, 2025
Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple
#12968 opened Apr 16, 2025

89 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Resolved half rope,multi-EOS issues in convert_hf_togguf.py for GLM4Z Model
#12957 commented on Apr 22, 2025 • 17 new comments
Add Qwen2.5VL support
#12402 commented on Apr 22, 2025 • 11 new comments
server : (experimental) vision support via libmtmd
#12898 commented on Apr 22, 2025 • 10 new comments
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs
#12858 commented on Apr 22, 2025 • 8 new comments
llama-bench: enhance benchmark with improved token throughput measurements
#12874 commented on Apr 21, 2025 • 2 new comments
tts : implement sesame CSM + Mimi decoder
#12648 commented on Apr 22, 2025 • 1 new comment
llama-bench : Add `--override-tensors` arg
#12922 commented on Apr 21, 2025 • 1 new comment
imatrix : use GGUF to store importance matrices
#9400 commented on Apr 15, 2025 • 0 new comments
Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
#12367 commented on Apr 22, 2025 • 0 new comments
Eval bug: LLaVa convert_image_encoder_to_gguf.py fails to byteswap v.head.ffn_up.bias tensor on Big-Endian system
#12863 commented on Apr 22, 2025 • 0 new comments
Compile bug: There was a error while compiling support for the backend Vulkan
#12619 commented on Apr 22, 2025 • 0 new comments
Feature Request: Interleaved sliding window attention support for gemma 2 and 3
#12637 commented on Apr 22, 2025 • 0 new comments
Feature Request: Improve model load time when using the RPC backend
#12954 commented on Apr 22, 2025 • 0 new comments
Eval bug: input is too large to process. increase the physical batch size
#12295 commented on Apr 22, 2025 • 0 new comments
Compile bug: Prooted Debian in Droid Termux only
#12452 commented on Apr 22, 2025 • 0 new comments
Eval bug: llama.swiftui Unexpectedly found nil while unwrapping an Optional value
#12510 commented on Apr 22, 2025 • 0 new comments
Misc. bug: test-backend-ops grad crash by GGML_ASSERT error
#12520 commented on Apr 22, 2025 • 0 new comments
Feature request: Graphical GGUF viewer
#6715 commented on Apr 21, 2025 • 0 new comments
Error while converting peft finetuned merged model to gguf
#12494 commented on Apr 21, 2025 • 0 new comments
Misc. bug: Buffer offset is not aligned on macOS / Intel / Vulkan
#10984 commented on Apr 21, 2025 • 0 new comments
Feature Request: YuE (music gen)
#11467 commented on Apr 21, 2025 • 0 new comments
Misc. bug: --no-context-shift OR --context-shift ?
#12038 commented on Apr 21, 2025 • 0 new comments
Eval bug: Gemma-3 vision don't work multilingual
#12351 commented on Apr 21, 2025 • 0 new comments
Feature Request: New sampling method that boosts reasoning performance - looks too good?
#12479 commented on Apr 21, 2025 • 0 new comments
Feature Request: deep/ recurrent processing like "thinking", but script based.
#12486 commented on Apr 21, 2025 • 0 new comments
Compile bug: Error build llama cpp on CUDA
#12491 commented on Apr 21, 2025 • 0 new comments
Llama-3_1-Nemotron-Ultra-253B-v1 support
#12843 commented on Apr 22, 2025 • 0 new comments
`common`: add partial regex support
#12808 commented on Apr 19, 2025 • 0 new comments
ci: fix cross-compile sync issues
#12804 commented on Apr 19, 2025 • 0 new comments
`server`: inject date_string in llama 3.x template + fix date for firefunction v2
#12802 commented on Apr 19, 2025 • 0 new comments
kv-cache : separate recurrent vs non-recurrent impl
#12799 commented on Apr 22, 2025 • 0 new comments
Support for OuteTTS 1.0
#12794 commented on Apr 22, 2025 • 0 new comments
(wip) support ultravox audio input
#12745 commented on Apr 22, 2025 • 0 new comments
imatrix: add option to display importance score statistics for a given imatrix file
#12718 commented on Apr 22, 2025 • 0 new comments
vulkan: Add bfloat16 support
#12554 commented on Apr 22, 2025 • 0 new comments
(draft) tts: Orpheus support
#12487 commented on Apr 22, 2025 • 0 new comments
Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture
#12466 commented on Apr 22, 2025 • 0 new comments
`server`: streaming of tool calls and thoughts when `--jinja` is on
#12379 commented on Apr 16, 2025 • 0 new comments
PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp
#12326 commented on Apr 21, 2025 • 0 new comments
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on Apr 21, 2025 • 0 new comments
Supporting Velvet model
#11716 commented on Apr 16, 2025 • 0 new comments
Allow user to compile with any cuda version using github actions
#10928 commented on Apr 20, 2025 • 0 new comments
llama/ggml: add LLM training support
#10544 commented on Apr 22, 2025 • 0 new comments
add FP8 support to gguf/llama:
#10055 commented on Apr 18, 2025 • 0 new comments
[Draft] Tensor Parallel support to llama.cpp
#9648 commented on Apr 19, 2025 • 0 new comments
Feature Request: Ovis2 Support
#12358 commented on Apr 17, 2025 • 0 new comments
Misc. bug: Loading models result in Fatal signal 4 (SIGILL) on some Android devices
#12393 commented on Apr 17, 2025 • 0 new comments
Misc. bug: Program closes without providing an error.
#12417 commented on Apr 17, 2025 • 0 new comments
Misc. bug: CUDA graph update failed OR Failed to allocate graph error and cuda hang
#12420 commented on Apr 17, 2025 • 0 new comments
Qualcomm Adreno : Compute pipeline creation failed for mul_mat_vec_q4_k_f32_f32_1
#12421 commented on Apr 17, 2025 • 0 new comments
Feature Request: Support inference for OVIS2 models - (GGUF Conversion & quantization done with success!)
#12429 commented on Apr 17, 2025 • 0 new comments
Compile bug:
#12431 commented on Apr 17, 2025 • 0 new comments
Compile bug: fatal: not a git repository (nor any of the parent directories): .git
#12438 commented on Apr 17, 2025 • 0 new comments
Imatrix quantization bug: OLMo-2-0325-32B-Instruct found nan value
#12439 commented on Apr 17, 2025 • 0 new comments
Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj
#12899 commented on Apr 16, 2025 • 0 new comments
Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B
#12641 commented on Apr 16, 2025 • 0 new comments
Eval bug: convert_hf_to_gguf.py AttributeError:
#12847 commented on Apr 16, 2025 • 0 new comments
Regarding llama-bench and llama-parallel commands
#12106 commented on Apr 16, 2025 • 0 new comments
Compile bug: how to enable opencl in termux
#12911 commented on Apr 16, 2025 • 0 new comments
Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf
#11325 commented on Apr 16, 2025 • 0 new comments
Eval bug: Gemma 3 extremely slow prompt processing when using quantized kv cache.
#12352 commented on Apr 16, 2025 • 0 new comments
How do I know which operator the code is computing has an error?
#12389 commented on Apr 16, 2025 • 0 new comments
Eval bug: KV cache changes the inference results, even when context fits and no quantization
#12396 commented on Apr 16, 2025 • 0 new comments
Eval bug: llama-qwen2vl-cli gives too short or cut response
#12408 commented on Apr 16, 2025 • 0 new comments
Eval bug: main: failed to load image /home/data1/protected/hyperscope/9/4/3/1/6/2025-02-21/Raw Crystal of Morganite Gemstone.jpg. Terminating
#12410 commented on Apr 16, 2025 • 0 new comments
Eval bug: GLM-Z1-9B-0414
#12946 commented on Apr 15, 2025 • 0 new comments
Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size'
#12325 commented on Apr 15, 2025 • 0 new comments
tts : add support for SparkTTS
#12495 commented on Apr 21, 2025 • 0 new comments
server : improvements and maintenance
#4216 commented on Apr 20, 2025 • 0 new comments
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
#11970 commented on Apr 20, 2025 • 0 new comments
Feature Request: (webui) Implement a experimental features on webui
#11662 commented on Apr 20, 2025 • 0 new comments
Bug tracker: (webui/experimental) Python interpreter via pyodide
#11762 commented on Apr 20, 2025 • 0 new comments
Eval bug: does llama.cpp support Intel AMX instruction? how to enable it
#12003 commented on Apr 20, 2025 • 0 new comments
Eval bug: getting assertion error when trying to use a gguf quantized model at inference "GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed"
#12080 commented on Apr 20, 2025 • 0 new comments
Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions.
#12816 commented on Apr 19, 2025 • 0 new comments
Eval bug: OpenAI incompatible image handling in server multimodal
#12947 commented on Apr 19, 2025 • 0 new comments
server: Bring back multimodal support
#8010 commented on Apr 19, 2025 • 0 new comments
Research: Performance differences between Metal (macOS) and Vulkan (Linux)
#10982 commented on Apr 19, 2025 • 0 new comments
Misc. bug: Using `-c -1` results in `n_ctx = 4294967295` or `n_ctx = 8`
#12414 commented on Apr 19, 2025 • 0 new comments
Eval bug: RK3588 Unexpected inf values cause garbled output(or core dump) in llama-cli
#12458 commented on Apr 19, 2025 • 0 new comments
Feature Request: Qwen2.5 0.5b OpenCL backend support
#12463 commented on Apr 19, 2025 • 0 new comments
Eval bug: MiniCPM-2B-128k convert_hf_to_gguf Missing the required key: rope_scaling
#12468 commented on Apr 19, 2025 • 0 new comments
Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment
#12655 commented on Apr 18, 2025 • 0 new comments
Eval bug: Excessive stack usage during tool calling
#12234 commented on Apr 18, 2025 • 0 new comments
Do you add LLaDA model support?
#12360 commented on Apr 18, 2025 • 0 new comments
Eval bug: How to isolate chat history
#12440 commented on Apr 18, 2025 • 0 new comments
Feature Request: Cache nix builds on a public cache server?
#12453 commented on Apr 18, 2025 • 0 new comments
Support for Jamba JambaForCausalLM
#6372 commented on Apr 17, 2025 • 0 new comments
[Feature request] convert_hf_to_gguf.py supports for bloomz
#12356 commented on Apr 17, 2025 • 0 new comments