Insights: ggml-org/llama.cpp
Overview
24 Releases published by 1 person
-
b5142
published
Apr 15, 2025 -
b5143
published
Apr 16, 2025 -
b5144
published
Apr 16, 2025 -
b5145
published
Apr 16, 2025 -
b5146
published
Apr 17, 2025 -
b5147
published
Apr 17, 2025 -
b5148
published
Apr 17, 2025 -
b5149
published
Apr 17, 2025 -
b5150
published
Apr 18, 2025 -
b5151
published
Apr 18, 2025 -
b5152
published
Apr 18, 2025 -
b5153
published
Apr 18, 2025 -
b5155
published
Apr 18, 2025 -
b5156
published
Apr 19, 2025 -
b5158
published
Apr 19, 2025 -
b5159
published
Apr 20, 2025 -
b5160
published
Apr 20, 2025 -
b5161
published
Apr 20, 2025 -
b5162
published
Apr 20, 2025 -
b5163
published
Apr 21, 2025 -
b5164
published
Apr 21, 2025 -
b5165
published
Apr 21, 2025 -
b5166
published
Apr 22, 2025 -
b5169
published
Apr 22, 2025
28 Pull requests merged by 17 people
-
mtmd : support SmolVLM (version 1 and 2)
#13050 merged
Apr 22, 2025 -
security : add note about RPC and server functionality
#13061 merged
Apr 22, 2025 -
metal : add memory pool for temp allocs
#12850 merged
Apr 22, 2025 -
llava : update documentations
#13055 merged
Apr 22, 2025 -
ggml : add SSE 4.2 and x64 base variant for CPUs without AVX
#12871 merged
Apr 21, 2025 -
SYCL: Add non-contiguous support in ROPE
#12993 merged
Apr 21, 2025 -
mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli`
#13012 merged
Apr 21, 2025 -
convert : experimental support for `--mmproj` flag
#13023 merged
Apr 20, 2025 -
llava: fix errors in clip.h on certain pure C compilers
#13030 merged
Apr 20, 2025 -
vulkan: support noncontiguous rms_norm
#13031 merged
Apr 20, 2025 -
metal: add neg operator
#13029 merged
Apr 20, 2025 -
Disable CI cross-compile builds
#13022 merged
Apr 19, 2025 -
gguf-py : fix upload python package workflow
#13020 merged
Apr 19, 2025 -
clip : refactor, add `image_manipulation` and `llava_uhd` classes
#13011 merged
Apr 19, 2025 -
main : Fix Ctrl+D/newline handling
#12951 merged
Apr 18, 2025 -
gguf-py : GGUF Editor GUI - Python + Qt
#12930 merged
Apr 18, 2025 -
server : use std::move whenever possible
#12936 merged
Apr 18, 2025 -
SYCL: Refactor and enable FP16 in binary broadcast OPs
#12975 merged
Apr 18, 2025 -
mtmd : add methods to access `mtmd_image_tokens`
#12906 merged
Apr 18, 2025 -
rpc : add RPC_CMD_HELLO
#12955 merged
Apr 18, 2025 -
graph : make FA compatible with MLA + add initial Metal kernels
#12953 merged
Apr 17, 2025 -
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes
#12970 merged
Apr 17, 2025 -
CANN: Add support for async operator submission
#12864 merged
Apr 17, 2025 -
Recognize IBM Granite 3.3 FIM tokens. Makes llama-server /infill usable.
#12988 merged
Apr 17, 2025 -
opencl: fix incorrect local_size index in profiling log
#12868 merged
Apr 16, 2025 -
vulkan: enable coopmat2 FA gqa and split_k optimizations more often
#12931 merged
Apr 16, 2025 -
[CANN]310P OPT Support
#12962 merged
Apr 16, 2025 -
opencl: split `ggml-opencl.cl` into multiple files and cleanup
#12886 merged
Apr 15, 2025
23 Pull requests opened by 22 people
-
sycl: use DNN in the first part of ggml_sycl_mul_mat_batched_sycl
#12972 opened
Apr 16, 2025 -
Fix convert script for non-hf GLM4 checkpoints
#12992 opened
Apr 17, 2025 -
[CANN] Add the n_graph_splits performance metric to llama-bench.
#12994 opened
Apr 17, 2025 -
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
#12995 opened
Apr 17, 2025 -
make memset range dynamic
#13002 opened
Apr 18, 2025 -
[SYCL][OPT] Fix reorder optimization for Q4_0
#13003 opened
Apr 18, 2025 -
Nix portability improvements
#13005 opened
Apr 18, 2025 -
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
#13014 opened
Apr 18, 2025 -
vulkan: matmul gcn tuning
#13016 opened
Apr 18, 2025 -
Append mult-eos,half-rope,bos to GLM4-0414 and Z
#13021 opened
Apr 19, 2025 -
Bitnet: directly use scale instead of inverting it twice
#13026 opened
Apr 19, 2025 -
quantize: improve pattern matching for allowed tensors
#13033 opened
Apr 20, 2025 -
gguf-py : avoid requiring PySide6 for packaged scripts
#13036 opened
Apr 20, 2025 -
quantize: Handle user-defined pruning of whole layers (blocks)
#13037 opened
Apr 20, 2025 -
[CANN]Support OP MUL_MAT_ID
#13042 opened
Apr 21, 2025 -
llama-gemma3-cli: Sigint rework in gemma3 vision example
#13043 opened
Apr 21, 2025 -
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
#13053 opened
Apr 21, 2025 -
Update README.md for tts example to use afplay on MacOS
#13056 opened
Apr 22, 2025 -
Fix ChatGLMModel for glm-4-9b cannot find tokenizer merges in model file
#13058 opened
Apr 22, 2025 -
rpc : add command line option for number of threads for the CPU backend
#13060 opened
Apr 22, 2025 -
cmake : do not include ./src as public for libllama
#13062 opened
Apr 22, 2025 -
mtmd : Support Pixtral 12B (help needed - 2D RoPE)
#13065 opened
Apr 22, 2025 -
fix(rpc): Improve input validation and error handling
#13069 opened
Apr 22, 2025
30 Issues closed by 12 people
-
Feature Request:
#13070 closed
Apr 22, 2025 -
Feature Request: Support for Deepseek Janus-Pro-7B & Janus-1.3B
#11490 closed
Apr 22, 2025 -
Feature Request: Convert deepseek-v3's mtp module to gguf and quantize to q4km
#12242 closed
Apr 22, 2025 -
Misc. bug: Misc. bug: cannot convert GLM-4v-9B- (glm-4v-9b) to GGUF format #11263
#12266 closed
Apr 22, 2025 -
Misc. bug: llama fails to run on older x86 hardware.
#12866 closed
Apr 21, 2025 -
Eval bug: Deepseek V2 Lite no longer working with Vulkan (assert fail during tg)
#12956 closed
Apr 20, 2025 -
Compile bug: there is a build bug in examples/llama.android and it will brings build failure in CI
#12638 closed
Apr 20, 2025 -
Eval bug: CANNOT LINK EXECUTABLE "./llama-cli": library "libomp.so" not found: needed by main executable
#11979 closed
Apr 20, 2025 -
Refactor: rename `n_swa` to `n_aux`
#13019 closed
Apr 19, 2025 -
Eval bug: Server returns 500 error on /api/generate and /api/chat requests
#12176 closed
Apr 19, 2025 -
Misc. bug: llama-rpc crashes when deciding memory on CPU with CUDA_VISIBLE_DEVICES=""
#12203 closed
Apr 19, 2025 -
Misc. bug: Ctrl+D no longer works properly
#12949 closed
Apr 18, 2025 -
Does V100 support flash-attention?
#13008 closed
Apr 18, 2025 -
Feature Request: Support for Apriel-5B-Instruct
#12926 closed
Apr 18, 2025 -
Feature Request: Allow more than 16 combined devices to participate in RPC
#12967 closed
Apr 17, 2025 -
Misc. bug: HIP when using llama.bench and kv cache quant cpu is doing the work instead of gpu
#12624 closed
Apr 17, 2025 -
Eval bug: The answers have some problems with the example/llama.android
#12158 closed
Apr 17, 2025 -
Compile bug: issue compiling in ubuntu (desktop and server version) using virtualbox
#12164 closed
Apr 17, 2025 -
Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash
#12597 closed
Apr 16, 2025 -
Misc. bug: RISCV output bug when using rvv with vlen > 256bit
#11041 closed
Apr 16, 2025 -
Eval bug: Error running Phi4-mini gguf: unknown pre-tokenizer type: 'gpt-4o'
#12122 closed
Apr 16, 2025 -
Eval bug: In RISC-V, output tokens are broken
#12124 closed
Apr 16, 2025 -
Feature Request:
#12128 closed
Apr 16, 2025 -
Misc. bug: vulkan on Adreno GPU
#12139 closed
Apr 16, 2025 -
Misc. bug: gguf-dump 'newbyteorder' was removed
#12146 closed
Apr 16, 2025 -
Replacement for deprecated codevct string conversion
#12151 closed
Apr 16, 2025
35 Issues opened by 30 people
-
Compile bug: Vulkan Cross compile for arm64
#13068 opened
Apr 22, 2025 -
Misc. bug: RPC server crash on `SET_TENSOR` with invalid `ggml_type`
#13067 opened
Apr 22, 2025 -
Model Repeats Nonsensical Output
#13066 opened
Apr 22, 2025 -
Doc. bug: docs/multimodal/gemma3.md need to be updated
#13064 opened
Apr 22, 2025 -
Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed
#13059 opened
Apr 22, 2025 -
Misc. bug: Intel container images keep getting `No space left on device` during CI Build
#13052 opened
Apr 21, 2025 -
Misc. bug: CPU Usage low in rpc-server mode
#13051 opened
Apr 21, 2025 -
Slow token generation speed of Gemma 3 QAT Models
#13048 opened
Apr 21, 2025 -
Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan
#13046 opened
Apr 21, 2025 -
Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk
#13044 opened
Apr 21, 2025 -
Eval bug: Error when load `bge-reranker-v2-gemma` model
#13041 opened
Apr 21, 2025 -
Feature Request: Ability to pack multiple GGUFs into single one
#13028 opened
Apr 19, 2025 -
Feature Proposal: Server Model Switching at Runtime
#13027 opened
Apr 19, 2025 -
Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3
#13025 opened
Apr 19, 2025 -
Eval bug: RWKV inference issue with llama-server
#13018 opened
Apr 19, 2025 -
[Build] Some Build Options/Definitions seems Missing in ggml-base
#13017 opened
Apr 19, 2025 -
Perplexity script for non GGUF quantization
#13015 opened
Apr 18, 2025 -
Eval bug: why Gemma 3 model has run into CPU inference
#13004 opened
Apr 18, 2025 -
Eval bug: Segmentation fault when running gemma3-cli on Android
#13000 opened
Apr 18, 2025 -
gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1
#12998 opened
Apr 18, 2025 -
Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf
#12997 opened
Apr 17, 2025 -
Eval bug: HIP: llama.cpp server locks up when running multiple instances on the same gpu
#12991 opened
Apr 17, 2025 -
Eval bug: Quad P40 unable to run 70B models on recent releases
#12990 opened
Apr 17, 2025 -
Feature Request: Add kv-quant fa kernel variants for head sizes other than 128
#12989 opened
Apr 17, 2025 -
Misc. bug: Potential memory leak in backend registry
#12986 opened
Apr 16, 2025 -
Feature Request: Make chat sessions possible with multi model cli tools
#12982 opened
Apr 16, 2025 -
Feature Request: multi model cli tools: Convert submitted images to best size and format for model
#12981 opened
Apr 16, 2025 -
Misc. bug: Only using 1 compute core on AMD
#12978 opened
Apr 16, 2025 -
Misc. bug: Vulkan performance depends on thread priority
#12976 opened
Apr 16, 2025 -
Eval bug: Gemma-3 Vision failed with CUDA
#12973 opened
Apr 16, 2025 -
Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple
#12968 opened
Apr 16, 2025
89 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Resolved half rope,multi-EOS issues in convert_hf_togguf.py for GLM4Z Model
#12957 commented on
Apr 22, 2025 • 17 new comments -
Add Qwen2.5VL support
#12402 commented on
Apr 22, 2025 • 11 new comments -
server : (experimental) vision support via libmtmd
#12898 commented on
Apr 22, 2025 • 10 new comments -
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs
#12858 commented on
Apr 22, 2025 • 8 new comments -
llama-bench: enhance benchmark with improved token throughput measurements
#12874 commented on
Apr 21, 2025 • 2 new comments -
tts : implement sesame CSM + Mimi decoder
#12648 commented on
Apr 22, 2025 • 1 new comment -
llama-bench : Add `--override-tensors` arg
#12922 commented on
Apr 21, 2025 • 1 new comment -
imatrix : use GGUF to store importance matrices
#9400 commented on
Apr 15, 2025 • 0 new comments -
Eval bug: Loading fail on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
#12367 commented on
Apr 22, 2025 • 0 new comments -
Eval bug: LLaVa convert_image_encoder_to_gguf.py fails to byteswap v.head.ffn_up.bias tensor on Big-Endian system
#12863 commented on
Apr 22, 2025 • 0 new comments -
Compile bug: There was a errror while compiling support for the backend Vulkan
#12619 commented on
Apr 22, 2025 • 0 new comments -
Feature Request: Interleaved sliding window attention support for gemma 2 and 3
#12637 commented on
Apr 22, 2025 • 0 new comments -
Feature Request: Improve model load time when using the RPC backend
#12954 commented on
Apr 22, 2025 • 0 new comments -
Eval bug: input is too large to process. increase the physical batch size
#12295 commented on
Apr 22, 2025 • 0 new comments -
Compile bug: Prooted Debian in Droid Termux only
#12452 commented on
Apr 22, 2025 • 0 new comments -
Eval bug: llama.swiftui Unexpectedly found nil while unwrapping an Optional value
#12510 commented on
Apr 22, 2025 • 0 new comments -
Misc. bug: test-backend-ops grad crash by GGML_ASSERT error
#12520 commented on
Apr 22, 2025 • 0 new comments -
Feature request: Graphical GGUF viewer
#6715 commented on
Apr 21, 2025 • 0 new comments -
Error while converting peft finetuned merged model to gguf
#12494 commented on
Apr 21, 2025 • 0 new comments -
Misc. bug: Buffer offset is not aligned on macOS / Intel / Vulkan
#10984 commented on
Apr 21, 2025 • 0 new comments -
Feature Request: YuE (music gen)
#11467 commented on
Apr 21, 2025 • 0 new comments -
Misc. bug: --no-context-shift OR --context-shift ?
#12038 commented on
Apr 21, 2025 • 0 new comments -
Eval bug: Gemma-3 vision don't work multilingual
#12351 commented on
Apr 21, 2025 • 0 new comments -
Feature Request: New sampling method that boosts reasoning performance - looks too good?
#12479 commented on
Apr 21, 2025 • 0 new comments -
Feature Request: deep/ recurrent processing like "thinking", but script based.
#12486 commented on
Apr 21, 2025 • 0 new comments -
Compile bug: Error build llama cpp on CUDA
#12491 commented on
Apr 21, 2025 • 0 new comments -
Llama-3_1-Nemotron-Ultra-253B-v1 support
#12843 commented on
Apr 22, 2025 • 0 new comments -
`common`: add partial regex support
#12808 commented on
Apr 19, 2025 • 0 new comments -
ci: fix cross-compile sync issues
#12804 commented on
Apr 19, 2025 • 0 new comments -
`server`: inject date_string in llama 3.x template + fix date for firefunction v2
#12802 commented on
Apr 19, 2025 • 0 new comments -
kv-cache : separate recurrent vs non-recurrent impl
#12799 commented on
Apr 22, 2025 • 0 new comments -
Support for OuteTTS 1.0
#12794 commented on
Apr 22, 2025 • 0 new comments -
(wip) support ultravox audio input
#12745 commented on
Apr 22, 2025 • 0 new comments -
imatrix: add option to display importance score statistics for a given imatrix file
#12718 commented on
Apr 22, 2025 • 0 new comments -
vulkan: Add bfloat16 support
#12554 commented on
Apr 22, 2025 • 0 new comments -
(draft) tts: Orpheus support
#12487 commented on
Apr 22, 2025 • 0 new comments -
Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture
#12466 commented on
Apr 22, 2025 • 0 new comments -
`server`: streaming of tool calls and thoughts when `--jinja` is on
#12379 commented on
Apr 16, 2025 • 0 new comments -
PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp
#12326 commented on
Apr 21, 2025 • 0 new comments -
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on
Apr 21, 2025 • 0 new comments -
Supporting Velvet model
#11716 commented on
Apr 16, 2025 • 0 new comments -
Allow user to compile with any cuda version using github actions
#10928 commented on
Apr 20, 2025 • 0 new comments -
llama/ggml: add LLM training support
#10544 commented on
Apr 22, 2025 • 0 new comments -
add FP8 support to gguf/llama:
#10055 commented on
Apr 18, 2025 • 0 new comments -
[Draft] Tensor Parallel support to llama.cpp
#9648 commented on
Apr 19, 2025 • 0 new comments -
Feature Request: Ovis2 Support
#12358 commented on
Apr 17, 2025 • 0 new comments -
Misc. bug: Loading models result in Fatal signal 4 (SIGILL) on some Android devices
#12393 commented on
Apr 17, 2025 • 0 new comments -
Misc. bug: Program closes without providing an error.
#12417 commented on
Apr 17, 2025 • 0 new comments -
Misc. bug: CUDA graph update failed OR Failed to allocate graph error and cuda hang
#12420 commented on
Apr 17, 2025 • 0 new comments -
Qualcomm Adreno : Compute pipeline creation failed for mul_mat_vec_q4_k_f32_f32_1
#12421 commented on
Apr 17, 2025 • 0 new comments -
Feature Request: Support inference for OVIS2 models - (GGUF Conversion & quantization done with success!)
#12429 commented on
Apr 17, 2025 • 0 new comments -
Compile bug:
#12431 commented on
Apr 17, 2025 • 0 new comments -
Compile bug: fatal: not a git repository (nor any of the parent directories): .git
#12438 commented on
Apr 17, 2025 • 0 new comments -
Imatrix quantization bug: OLMo-2-0325-32B-Instruct found nan value
#12439 commented on
Apr 17, 2025 • 0 new comments -
Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj
#12899 commented on
Apr 16, 2025 • 0 new comments -
Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B
#12641 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: convert_hf_to_gguf.py AttributeError:
#12847 commented on
Apr 16, 2025 • 0 new comments -
Regarding llama-bench and llama-parallel commands
#12106 commented on
Apr 16, 2025 • 0 new comments -
Compile bug: how to enable opencl in termux
#12911 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: <think> tag with DeepSeek-R1-Distill-Qwen-1.5B-Q5_K_M.gguf
#11325 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache.
#12352 commented on
Apr 16, 2025 • 0 new comments -
How do I know which operator the code is computing has an error?
#12389 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: KV cache changes the inference results, even when context fits and no quantization
#12396 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: llama-qwen2vl-cli gives too short or cut response
#12408 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: main: failed to load image /home/data1/protected/hyperscope/9/4/3/1/6/2025-02-21/Raw Crystal of Morganite Gemstone.jpg. Terminating
#12410 commented on
Apr 16, 2025 • 0 new comments -
Eval bug: GLM-Z1-9B-0414
#12946 commented on
Apr 15, 2025 • 0 new comments -
Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size'
#12325 commented on
Apr 15, 2025 • 0 new comments -
tts : add support for SparkTTS
#12495 commented on
Apr 21, 2025 • 0 new comments -
server : improvements and maintenance
#4216 commented on
Apr 20, 2025 • 0 new comments -
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
#11970 commented on
Apr 20, 2025 • 0 new comments -
Feature Request: (webui) Implement a experimental features on webui
#11662 commented on
Apr 20, 2025 • 0 new comments -
Bug tracker: (webui/experimental) Python interpreter via pyodide
#11762 commented on
Apr 20, 2025 • 0 new comments -
Eval bug: does llama.cpp support Intel AMX instruction? how to enable it
#12003 commented on
Apr 20, 2025 • 0 new comments -
Eval bug: getting assertion error when trying to use a gguf quantized model at inference "GGML_ASSERT(n_outputs_enc > 0 && "call llama_encode() first") failed"
#12080 commented on
Apr 20, 2025 • 0 new comments -
Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions.
#12816 commented on
Apr 19, 2025 • 0 new comments -
Eval bug: OpenAI incompatible image handling in server multimodal
#12947 commented on
Apr 19, 2025 • 0 new comments -
server: Bring back multimodal support
#8010 commented on
Apr 19, 2025 • 0 new comments -
Research: Performance differences between Metal (macOS) and Vulkan (Linux)
#10982 commented on
Apr 19, 2025 • 0 new comments -
Misc. bug: Using `-c -1` results in `n_ctx = 4294967295` or `n_ctx = 8`
#12414 commented on
Apr 19, 2025 • 0 new comments -
Eval bug: RK3588 Unexpected inf values cause garbled output(or core dump) in llama-cli
#12458 commented on
Apr 19, 2025 • 0 new comments -
Feature Request: Qwen2.5 0.5b OpenCL backend support
#12463 commented on
Apr 19, 2025 • 0 new comments -
Eval bug: MiniCPM-2B-128k convert_hf_to_gguf Missing the required key: rope_scaling
#12468 commented on
Apr 19, 2025 • 0 new comments -
Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment
#12655 commented on
Apr 18, 2025 • 0 new comments -
Eval bug: Excessive stack usage during tool calling
#12234 commented on
Apr 18, 2025 • 0 new comments -
Do you add LLaDA model support?
#12360 commented on
Apr 18, 2025 • 0 new comments -
Eval bug: How to isolate chat history
#12440 commented on
Apr 18, 2025 • 0 new comments -
Feature Request: Cache nix builds on a public cache server?
#12453 commented on
Apr 18, 2025 • 0 new comments -
Suport for Jamba JambaForCausalLM
#6372 commented on
Apr 17, 2025 • 0 new comments -
[Feature request] convert_hf_to_gguf.py supports for bloomz
#12356 commented on
Apr 17, 2025 • 0 new comments