Insights: ggml-org/llama.cpp
Overview
56 Releases published by 1 person
-
b6271
published
Aug 25, 2025 -
b6272
published
Aug 25, 2025 -
b6273
published
Aug 25, 2025 -
b6275
published
Aug 25, 2025 -
b6274
published
Aug 25, 2025 -
b6277
published
Aug 25, 2025 -
b6276
published
Aug 25, 2025 -
b6278
published
Aug 26, 2025 -
b6279
published
Aug 26, 2025 -
b6280
published
Aug 26, 2025 -
b6282
published
Aug 26, 2025 -
b6283
published
Aug 26, 2025 -
b6284
published
Aug 26, 2025 -
b6285
published
Aug 26, 2025 -
b6286
published
Aug 26, 2025 -
b6287
published
Aug 26, 2025 -
b6289
published
Aug 26, 2025 -
b6290
published
Aug 26, 2025 -
b6291
published
Aug 26, 2025 -
b6292
published
Aug 26, 2025 -
b6294
published
Aug 26, 2025 -
b6293
published
Aug 26, 2025 -
b6295
published
Aug 27, 2025 -
b6297
published
Aug 27, 2025 -
b6298
published
Aug 27, 2025 -
b6299
published
Aug 27, 2025 -
b6300
published
Aug 27, 2025 -
b6301
published
Aug 27, 2025 -
b6303
published
Aug 28, 2025 -
b6305
published
Aug 28, 2025 -
b6307
published
Aug 28, 2025 -
b6309
published
Aug 28, 2025 -
b6310
published
Aug 28, 2025 -
b6311
published
Aug 28, 2025 -
b6312
published
Aug 28, 2025 -
b6313
published
Aug 28, 2025 -
b6314
published
Aug 28, 2025 -
b6315
published
Aug 29, 2025 -
b6316
published
Aug 29, 2025 -
b6317
published
Aug 29, 2025 -
b6318
published
Aug 29, 2025 -
b6322
published
Aug 30, 2025 -
b6323
published
Aug 30, 2025 -
b6324
published
Aug 30, 2025 -
b6325
published
Aug 30, 2025 -
b6327
published
Aug 30, 2025 -
b6328
published
Aug 31, 2025 -
b6329
published
Aug 31, 2025 -
b6330
published
Aug 31, 2025 -
b6331
published
Aug 31, 2025 -
b6332
published
Aug 31, 2025 -
b6334
published
Aug 31, 2025 -
b6335
published
Aug 31, 2025 -
b6337
published
Sep 1, 2025 -
b6340
published
Sep 1, 2025 -
b6341
published
Sep 1, 2025
73 Pull requests merged by 35 people
-
docs : add Hunyuan to models section
#15707 merged
Sep 1, 2025 -
CUDA: fix build error from ambiguous __half conversions in conv2d
#15690 merged
Sep 1, 2025 -
CANN: Optimize MUL_MAT_ID
#15658 merged
Sep 1, 2025 -
CANN: fix RoPE cache issue on multi-device
#15629 merged
Sep 1, 2025 -
sampling : optimize samplers by reusing bucket sort
#15665 merged
Aug 31, 2025 -
server : enable /slots by default and make it secure
#15630 merged
Aug 31, 2025 -
metal : fix checks for available FA kernels
#15700 merged
Aug 31, 2025 -
llama : fix fattn reserve call n_seqs parameter
#15699 merged
Aug 31, 2025 -
llama : separate compute buffer reserve from fattn check
#15696 merged
Aug 31, 2025 -
ci : explicitly set fa off or on
#15692 merged
Aug 31, 2025 -
vulkan: handle large sizes for get_rows
#15686 merged
Aug 31, 2025 -
vulkan: mul_mat_id coopmat2 optimizations
#15546 merged
Aug 31, 2025 -
vulkan : remove unused portability_enumeration_ext variable
#15679 merged
Aug 31, 2025 -
vulkan: Allow fallback to sysmem memory when vidmem is full
#15649 merged
Aug 31, 2025 -
vulkan: clamp matmul and FA results to the max finite value
#15652 merged
Aug 31, 2025 -
ggml: update kleidiai to v1.13.0
#15663 merged
Aug 30, 2025 -
docs : update build.md to remove MSVC arm64 notes
#15684 merged
Aug 30, 2025 -
llama: use FA + max. GPU layers by default
#15434 merged
Aug 30, 2025 -
CUDA: use FP32 arithmetic for conv2d
#15683 merged
Aug 30, 2025 -
vulkan: Skip syncing for prealloc_y when it is reused
#15544 merged
Aug 30, 2025 -
[CANN] Optimize compiler warning issues
#15661 merged
Aug 30, 2025 -
removed obsolete doc
#15670 merged
Aug 29, 2025 -
scripts: strip "AMD Instinct" from GPU name
#15668 merged
Aug 29, 2025 -
tools: [SERVER] Added documentation for parallel_tool_calls param
#15647 merged
Aug 29, 2025 -
CUDA: fix bug in rms_norm fusion
#15660 merged
Aug 29, 2025 -
Model: Seed OSS thinking + tool call support
#15552 merged
Aug 29, 2025 -
CUDA: fuse adds, fuse add with rms norm
#15631 merged
Aug 29, 2025 -
nvidia nemotron nano v2 (nemotronh)
#15507 merged
Aug 29, 2025 -
fix: Compute the full sum in llama-eval-callback
#15637 merged
Aug 28, 2025 -
CUDA: add conv2d
#15635 merged
Aug 28, 2025 -
ggml-cpu: fix invalid hsum build in debug s390x
#15634 merged
Aug 28, 2025 -
ggml : fix SSM_SCAN for n_groups > 1
#15625 merged
Aug 28, 2025 -
kv-cache : fix find_slot to not search for continuous slot
#15638 merged
Aug 28, 2025 -
model : jina-embeddings-v3 support
#13693 merged
Aug 28, 2025 -
scripts: add sqlite3 check for compare-commits.sh
#15633 merged
Aug 28, 2025 -
kv-cache : remove LLAMA_SET_ROWS checks
#15505 merged
Aug 28, 2025 -
gguf-py: byteswapping improvements
#12851 merged
Aug 28, 2025 -
Change to info instead of debug, to explain reason for stopping.
#15604 merged
Aug 28, 2025 -
model-conversion : add mmproj conversion target
#15628 merged
Aug 28, 2025 -
cuda: Add cublasLt_static linking when GGML_STATIC is enabled
#15622 merged
Aug 28, 2025 -
server: higher timeout for tests
#15621 merged
Aug 27, 2025 -
presets : add qwen3-30B-a3b FIM
#15616 merged
Aug 27, 2025 -
HIP: Enable support for ggml_backend_cuda_register_host_buffer
#15615 merged
Aug 27, 2025 -
kv-cache : better estimate of n_kv for multi-sequence batches
#15610 merged
Aug 27, 2025 -
CANN: refactor mask handling and improve performance in FA
#15561 merged
Aug 27, 2025 -
ggml-cpu : add basic RVV support for vector f32 ops
#15057 merged
Aug 27, 2025 -
common : add -m to bash completion for --model [no ci]
#15591 merged
Aug 27, 2025 -
OpenCL: add fused group_norm/norm, mul, add
#15314 merged
Aug 27, 2025 -
tests : fix test-opt with GGML_BACKEND_DL
#15599 merged
Aug 26, 2025 -
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size
#15592 merged
Aug 26, 2025 -
fix mtmd ios build
#15579 merged
Aug 26, 2025 -
tests: add test-backend-ops performance test for mul mat id
#15543 merged
Aug 26, 2025 -
PowerPC: Sgemm Optimization
#15558 merged
Aug 26, 2025 -
graph : fix assert in memory-less build_attn
#15590 merged
Aug 26, 2025 -
model-conversion : add qat-q4 quantization targets
#15588 merged
Aug 26, 2025 -
CUDA: return -1 for nonexistent compiled arch
#15587 merged
Aug 26, 2025 -
metal : optimize FA vec for large sequences and BS <= 8
#15566 merged
Aug 26, 2025 -
mtmd : support Kimi VL model
#15458 merged
Aug 26, 2025 -
context : print graph stats for memory-less contexts
#15586 merged
Aug 26, 2025 -
metal : improve MUL_MAT_ID
#15541 merged
Aug 26, 2025 -
support MiniCPM-V 4.5
#15575 merged
Aug 26, 2025 -
gguf-py : remove erroneous FFN_GATE entry
#15583 merged
Aug 26, 2025 -
metal : remove contiguous assertion for src0 in IM2COL
#15577 merged
Aug 26, 2025 -
Add a warning for special devices
#15563 merged
Aug 26, 2025 -
vulkan: Remove splitting for mul_mat_id
#15568 merged
Aug 26, 2025 -
CUDA: Accelerate MXFP4 table lookup using __byte_perm
#15451 merged
Aug 25, 2025 -
opencl: fix support ops condition for rms_norm
#15560 merged
Aug 25, 2025 -
vulkan: fix min subgroup 16 condition for mmid subgroup optimization
#15565 merged
Aug 25, 2025 -
tests: Generate unique input values for count_equal
#15487 merged
Aug 25, 2025 -
metal: fix regression when no metal devices are present
#15531 merged
Aug 25, 2025 -
CUDA: MoE helper in device code, better tile sizes
#15525 merged
Aug 25, 2025 -
model-conversion : set pooling type to none in logits.cpp
#15564 merged
Aug 25, 2025 -
model-conversion : add model card template for embeddings [no ci]
#15557 merged
Aug 25, 2025
30 Pull requests opened by 25 people
-
fix(ggml-sycl): add synchronization before exiting argsort kernel
#15582 opened
Aug 26, 2025 -
Partial code documentation
#15601 opened
Aug 26, 2025 -
musa: fix build warnings
#15611 opened
Aug 27, 2025 -
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed
#15614 opened
Aug 27, 2025 -
model : fix internvl3_5_20b gguf conversion
#15617 opened
Aug 27, 2025 -
Possible fix: use ne0..ne3 (dst dims) instead of ne00..ne03 in ggml_compute_forward_dup_f16
#15626 opened
Aug 28, 2025 -
Refactor server.cpp: Split monolithic file into modular components
#15632 opened
Aug 28, 2025 -
batch : add `pad_equal` [RFC]
#15636 opened
Aug 28, 2025 -
Hermes 2 tool calling : fixed crash when <tool_call> had a newline before it
#15639 opened
Aug 28, 2025 -
granite embedding small support (ModernBert arch)
#15641 opened
Aug 28, 2025 -
Catch up to the upstream
#15642 opened
Aug 28, 2025 -
tools: update llama-bench to include TTFT, E2E, ITL metrics
#15643 opened
Aug 28, 2025 -
gguf-py: reduce peak RAM during convert by streaming dtype casts
#15648 opened
Aug 28, 2025 -
utils : add t_max_predict_ms param to set prediction phase time limit to cli
#15655 opened
Aug 29, 2025 -
model : avoid ggml_cont_3d for fused QKV weights
#15662 opened
Aug 29, 2025 -
vulkan : update ggml_vk_instance_validation_ext_available
#15666 opened
Aug 29, 2025 -
convert : parse safetensors directly
#15667 opened
Aug 29, 2025 -
ggml: add ops for WAN video model (cuda && cpu)
#15669 opened
Aug 29, 2025 -
feat: nemotron thinking & toolcalling support
#15676 opened
Aug 29, 2025 -
chat: Fix streaming parser for granite models
#15682 opened
Aug 30, 2025 -
tests: large sizes for get_rows
#15687 opened
Aug 30, 2025 -
gguf: gguf_writer refactor
#15691 opened
Aug 31, 2025 -
ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops
#15695 opened
Aug 31, 2025 -
vulkan: add missing clamps in new mul_mat_id paths
#15702 opened
Aug 31, 2025 -
sampling : optimize dist sampler
#15704 opened
Aug 31, 2025 -
vulkan: initialize vulkan-hpp to allow using extension function pointers
#15705 opened
Aug 31, 2025 -
OpenCL: add attention sinks support for FA kernels
#15706 opened
Aug 31, 2025 -
convert : remove redundant code
#15708 opened
Sep 1, 2025 -
CANN: support ext_factor in rope
#15710 opened
Sep 1, 2025 -
[CANN] Support eager execution mode under ACL graph compilation
#15712 opened
Sep 1, 2025
49 Issues closed by 22 people
-
Feature Request: s390x CI
#13243 closed
Sep 1, 2025 -
Eval bug: Inconsistent Embedding Similarity between llama-server and LlamaCppEmbeddings for BGE-M3 Model
#14280 closed
Sep 1, 2025 -
main: failed to quantize model from 'gemma-3n-E2B-it.f16.gguf'
#14405 closed
Sep 1, 2025 -
Feature Request: Server stream response for "prompt processing progress"
#14685 closed
Sep 1, 2025 -
Misc. bug: b5921 release zip on github misses llama-embedding binary
#14738 closed
Sep 1, 2025 -
Misc. bug: out of memory error after PR #13746
#14740 closed
Sep 1, 2025 -
Misc. bug: RPC flash attention bug on deepseek models (deepseek/kimi k2)
#14747 closed
Sep 1, 2025 -
Eval bug: Nemotron 49b doesn't load correctly
#14752 closed
Sep 1, 2025 -
Misc. bug: llama-server, speed penalty from d9d398f since b5222
#15672 closed
Aug 31, 2025 -
Significant Performance Drop When Using Tools in llama-server
#15389 closed
Aug 31, 2025 -
Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used)
#14084 closed
Aug 31, 2025 -
Feature Request: Generic CPU in ggml-cpu/arch
#14402 closed
Aug 31, 2025 -
Misc. bug: crash on vulkan with new max mem alloc size calculations since b5703
#14553 closed
Aug 31, 2025 -
Feature Request: ARMv7 / Termux Support on Mobile Devices
#14699 closed
Aug 31, 2025 -
Compile bug: How to build Llama.cpp for ARM64 Windows with MSVC?
#15674 closed
Aug 30, 2025 -
Compile bug: error: more than one conversion function from "half" to a built-in type applies
#15680 closed
Aug 30, 2025 -
Compile bug: [SYCL][ARC A770] Regression: dual A770 support broken in b5422 and later
#14709 closed
Aug 30, 2025 -
Compile bug: llama-llava-clip-quantize-cli not found
#14693 closed
Aug 30, 2025 -
Feature Request: Support multiple tool calls
#15644 closed
Aug 29, 2025 -
Eval bug: Seed-OSS crash after typing user prompt
#15547 closed
Aug 29, 2025 -
Eval bug: Qwen3 30B A3B crashing on load
#15548 closed
Aug 29, 2025 -
Misc. bug: Does llama.cpp support Ascend 910B NPU?
#15656 closed
Aug 29, 2025 -
OpenCL backend with Qualcomm Adreno GPUs load time is too long
#14337 closed
Aug 29, 2025 -
Eval bug: Qwen 2.5 VL gets stuck in a loop
#14663 closed
Aug 29, 2025 -
Feature Request: Support for NVidia Nemotron Nano v2
#15409 closed
Aug 29, 2025 -
Eval bug: llama-server crashes on ROCm 6.4.3-1 (Arch Linux)
#15613 closed
Aug 28, 2025 -
Eval bug: Address boundary error
#15605 closed
Aug 28, 2025 -
Feature Request: Support Jina V3 arch
#9585 closed
Aug 28, 2025 -
Misc. bug: llama-server issue on Windows when compiling from source code
#14826 closed
Aug 28, 2025 -
Feature Request: Explain reason for stopping
#15553 closed
Aug 28, 2025 -
Compile bug: Missing link with cublasLt_static when GGML_STATIC is enabled
#15620 closed
Aug 28, 2025 -
Eval bug: failed to allocate compute pp buffers
#14836 closed
Aug 27, 2025 -
Compile bug: mtmd ios build crashes because packaging not possible
#15578 closed
Aug 27, 2025 -
Bug with gpt-oss-120b sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3
#15597 closed
Aug 27, 2025 -
Misc. bug: string_strip encoding issue (reasoning + utf8/unicode)
#15607 closed
Aug 27, 2025 -
Compile bug: undefined reference to `ggml_backend_is_cpu`
#15598 closed
Aug 26, 2025 -
Compile bug: ggml was not compiled with any CUDA arch <= 750
#15593 closed
Aug 26, 2025 -
Eval bug: Coredump of llama-server\llama-cli\llama-bench on start
#15584 closed
Aug 26, 2025 -
Eval bug: ggml was not compiled with any CUDA arch <= 750
#15589 closed
Aug 26, 2025 -
Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct
#14318 closed
Aug 26, 2025 -
Eval bug: Llama-server crashes with Mistrall-Small when I pass it an image for processing.
#15574 closed
Aug 26, 2025 -
Eval bug: The content returned by the model is very strange
#14641 closed
Aug 26, 2025 -
Compile bug: ggml_graph_compute_with_ctx import error
#15570 closed
Aug 25, 2025 -
Misc. bug: ARGMAX tie-breaking
#15484 closed
Aug 25, 2025
32 Issues opened by 31 people
-
Misc. bug: Integer overflow leads to buffer overflow.
#15711 opened
Sep 1, 2025 -
Eval bug: InternVL3_5 GPT OSS 20b crashes at warmup
#15701 opened
Aug 31, 2025 -
Compile bug: undefined symbol: vkResetQueryPool for android-ndk-r29-beta3
#15698 opened
Aug 31, 2025 -
Misc. bug: Fatal crash when using `--reranking` flag in `llama-server`
#15685 opened
Aug 30, 2025 -
Misc. bug: Granite chat parser doesn't stream content section
#15681 opened
Aug 30, 2025 -
Misc. bug: Build 6278 Vulkan crashes: llama-bench and llama-server both affected
#15678 opened
Aug 30, 2025 -
Eval bug: Nemotron v2 Nano always reprocesses prompt
#15677 opened
Aug 29, 2025 -
Compile bug: Unable to compile llama-server from source.
#15675 opened
Aug 29, 2025 -
Eval bug: NVIDIA Nemotron Nano 9B v2 thinking tokens not properly handled in the llama-server web ui
#15673 opened
Aug 29, 2025 -
Eval bug: response format not respected when --jinja enabled (for llama3.1-3.2)
#15664 opened
Aug 29, 2025 -
Feature Request: Add Linux HIP Release
#15659 opened
Aug 29, 2025 -
Feature Request: Add support for t_max_predict_ms to CLI
#15654 opened
Aug 29, 2025 -
Compile bug: Build Llama.cpp failed for Windows on ARM with KLEIDIAI enabled.
#15653 opened
Aug 29, 2025 -
Misc. bug: CONVERT merged_16bit TO f16_gguf BY MODEL phi-3.5-mini-instruct
#15651 opened
Aug 29, 2025 -
Misc. bug: Calling a tool that uses a specific regular expression causes the server to crash
#15640 opened
Aug 28, 2025 -
Misc. bug: Vulkan FA massively slowdowns Qwen 30B
#15624 opened
Aug 27, 2025 -
Misc. bug: convert_hf_to_gguf.py runs out of memory
#15623 opened
Aug 27, 2025 -
Misc. bug: Performance Downgrade happened from b6188, for llama-bin-win-vulkan-x64 distribution.
#15618 opened
Aug 27, 2025 -
Feature Request: Add support for Ovis2.5
#15612 opened
Aug 27, 2025 -
Eval bug: LLAMA_CPP_PROCESS_ERROR
#15609 opened
Aug 27, 2025 -
Misc. bug: Tool calling CRASH : Unexpected empty grammar stack after accepting piece<tool_call>
#15608 opened
Aug 27, 2025 -
Feature Request: Support Cohere's new Command A Reasoning Model
#15603 opened
Aug 27, 2025 -
Feature Request: Repeated Unnecessary Activation Quantization Ops
#15602 opened
Aug 26, 2025 -
Eval bug: Kimi-VL-A3B-Thinking-2506 not working correctly
#15600 opened
Aug 26, 2025 -
Misc. bug: Windows 11 flags b6287 as containing a virus
#15596 opened
Aug 26, 2025 -
Eval bug: Mistral v7-tekken tokenizer regex causes segfault on certain strings
#15594 opened
Aug 26, 2025 -
Feature Request: Add support for OLMoE2
#15585 opened
Aug 26, 2025 -
Feature Request: Run Lookahead within llama-server
#15581 opened
Aug 26, 2025 -
Eval bug: Asynchronous Kernel Execution on iGPU Causes Runtime Errors with MOE Model
#15580 opened
Aug 26, 2025 -
Eval bug: gptoss-120b multi-turn chat: KV cache growth & unexpected <|start|>assistant... prefix
#15573 opened
Aug 25, 2025 -
Misc. bug: llama-server JS error "can't access property "delta", B.choices[0] is undefined"
#15571 opened
Aug 25, 2025
81 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Deepseek V3.1 thinking mode is the default
#15533 commented on
Sep 1, 2025 • 42 new comments -
common : add GLM-4.5 tool calling support
#15186 commented on
Aug 31, 2025 • 10 new comments -
Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants
#14903 commented on
Sep 1, 2025 • 7 new comments -
vulkan: use memory budget extension to read memory usage
#15545 commented on
Sep 1, 2025 • 4 new comments -
Fix incorrect causal attention mask caused by M-Rope
#15474 commented on
Aug 26, 2025 • 2 new comments -
Apple NPU acceleration integrated into llama.cpp, using MiniCPM-V 4.0 as an example.
#15262 commented on
Aug 27, 2025 • 2 new comments -
Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required
#15248 commented on
Sep 1, 2025 • 1 new comment -
ggml: SVE support for exponential functions
#15145 commented on
Aug 31, 2025 • 1 new comment -
quantize: add option to automatically choose optimal quant types to reach a bpw target at lowest MSE error
#15550 commented on
Aug 30, 2025 • 0 new comments -
CUDA: update build CTK version to 12.8
#13360 commented on
Aug 26, 2025 • 0 new comments -
Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC)
#13206 commented on
Aug 30, 2025 • 0 new comments -
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on
Aug 29, 2025 • 0 new comments -
Allow user to compile with any cuda version using github actions
#10928 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: GGML_ASSERT failure at ggml.c:6370 during llama_opt_epoch with Saiga Nemo 12B
#15279 commented on
Sep 1, 2025 • 0 new comments -
Enhancement: Improve ROCm performance on various quants (benchmarks included)
#11931 commented on
Sep 1, 2025 • 0 new comments -
Feature Request:
#15022 commented on
Sep 1, 2025 • 0 new comments -
Eval bug: Qwen3-Coder-480B-A35B-Instruct-1M-GGUF GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
#15049 commented on
Aug 31, 2025 • 0 new comments -
Eos and bos tokens can be redefined as additional tokens with other ids
#1776 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Gemma3n multimodal support
#14429 commented on
Aug 31, 2025 • 0 new comments -
Compile bug: Error: unknown type name 'THREAD_POWER_THROTTLING_STATE'
#14953 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: OLMoE (https://github.com/allenai/OLMoE) is causing issues with llama-serve
#14988 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Support Step3TextForCausalLM
#14998 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: Getting memory critical errors when using --no-mmap with MoE models
#14999 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Add an example of using mtmd C-api, at least for images.
#15492 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: model infer input "GGGGGGG"
#15556 commented on
Aug 30, 2025 • 0 new comments -
model : add grok-2 support
#15539 commented on
Sep 1, 2025 • 0 new comments -
ggml WebGPU: remove userdata from request adapter callback
#15527 commented on
Aug 27, 2025 • 0 new comments -
llama : add ggml version and commit functions
#15499 commented on
Aug 28, 2025 • 0 new comments -
Thinking model disabled assistant prefill
#15404 commented on
Aug 28, 2025 • 0 new comments -
aLoRA Support
#15327 commented on
Aug 28, 2025 • 0 new comments -
Add OpenVINO backend
#15307 commented on
Aug 27, 2025 • 0 new comments -
64 bit CUDA copy routines via GGML_CUDA_ALLOW_LARGE_TENSORS
#15298 commented on
Aug 31, 2025 • 0 new comments -
server: implement GLM-style MTP
#15225 commented on
Aug 29, 2025 • 0 new comments -
ggml: aarch64: Implement SVE F16 kernels for vector functions
#15115 commented on
Aug 28, 2025 • 0 new comments -
qwen3-coder tool call parser
#15019 commented on
Aug 31, 2025 • 0 new comments -
Add support for CogVLM model
#15002 commented on
Aug 29, 2025 • 0 new comments -
imatrix: calculate activation-based statistics for new format (GGUF) imatrices
#14891 commented on
Sep 1, 2025 • 0 new comments -
SvelteKit-based WebUI
#14839 commented on
Sep 1, 2025 • 0 new comments -
feat: Add optional prompt processing progress streaming
#14731 commented on
Aug 27, 2025 • 0 new comments -
server : (webui) let server send locally-defined default webui settings
#14468 commented on
Aug 26, 2025 • 0 new comments -
llama : support qwen3 rerank and embeddings
#14029 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: Image processing on Metal takes significant amount of time
#15426 commented on
Aug 28, 2025 • 0 new comments -
changelog : `llama-server` REST API
#9291 commented on
Aug 28, 2025 • 0 new comments -
Feature request: Graphical GGUF viewer
#6715 commented on
Aug 28, 2025 • 0 new comments -
Feature Request: Grok-2 support
#15534 commented on
Aug 28, 2025 • 0 new comments -
Misc. bug: llama-server embedding endpoint returns vectors with just null values after a while
#14812 commented on
Aug 28, 2025 • 0 new comments -
Eval bug: Uncaught exception during inference crashes llama-server
#14923 commented on
Aug 28, 2025 • 0 new comments -
Compile bug: "Illegal instruction" on StarFive VisionFive2 (risc-v)
#14926 commented on
Aug 28, 2025 • 0 new comments -
GPT-OSS 20B and Qwen3 30B A3B prefill so slowly
#15163 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: The script convert_hf_to_gguf.py supports conversion of DeepSeek-R1-0528-FP4.
#15415 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: Support for ERNIE-4.5-VL
#15512 commented on
Aug 27, 2025 • 0 new comments -
tutorials : list for llama.cpp
#13523 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: Deepthink with Confidence
#15518 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: JSON schema incorrectly enforces order of JSON object keys
#15216 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates
#13694 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: LLAMA-SERVER is 40% slower than LLAMA-CLI when using identical parameters including -ot option for tensor offloading
#14201 commented on
Aug 26, 2025 • 0 new comments -
Eval bug: Regression: Tool calls still returned in content field as JSON string instead of tool_calls array
#14697 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: llamacpp crashes my PC whenever I close the console for it.
#14713 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: llama-bench json output is too verbose
#15554 commented on
Aug 25, 2025 • 0 new comments -
Eval bug: KV buffer allocation when using --n-cpu-moe on ROCm multi-GPU setup
#15538 commented on
Aug 25, 2025 • 0 new comments -
Feature Request: --n-cpu-moe option for multi GPU?
#15263 commented on
Aug 25, 2025 • 0 new comments -
Compile bug: Broken HIP on 6122
#15196 commented on
Aug 30, 2025 • 0 new comments -
changelog : `libllama` API
#9289 commented on
Aug 30, 2025 • 0 new comments -
kubernetes example
#6546 commented on
Aug 30, 2025 • 0 new comments -
LoRA training example
#13485 commented on
Aug 30, 2025 • 0 new comments -
Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3)
#13545 commented on
Aug 30, 2025 • 0 new comments -
Compile bug: build-xcframework.sh error
#14954 commented on
Aug 30, 2025 • 0 new comments -
Eval bug: I use llama-server to decode audio with an LLM but it says it doesn't have any audio
#14963 commented on
Aug 30, 2025 • 0 new comments -
libblis crashes: Default MC is non-multiple of MR for one or more datatypes.
#14972 commented on
Aug 30, 2025 • 0 new comments -
Misc. bug: Reranker not working with OpenWebUI
#14977 commented on
Aug 30, 2025 • 0 new comments -
Feature Request: Add DeepSeek-V3.1
#15496 commented on
Aug 29, 2025 • 0 new comments -
Misc. bug: prompt processing stall with long context on deepseek models
#15514 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: Crash over tool calls in Qwen3 Coder
#15046 commented on
Aug 29, 2025 • 0 new comments -
Misc. bug: Regression in unified KV cache appears after `llama.cpp` release b5912 in b5913
#14847 commented on
Aug 29, 2025 • 0 new comments -
Eval bug:
#14936 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: Repeated sequences with medgemma3-27b-text-it gguf
#14938 commented on
Aug 29, 2025 • 0 new comments -
Feature Request: T5Gemma support
#14940 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: llama_model_load: unknown model architecture: 'mllama' llama_model_load_from_file_impl: Segmentation fault (core dumped)
#14951 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: No generation with follow up on high token responses on GPT-OSS 120B
#15517 commented on
Aug 28, 2025 • 0 new comments -
Feature Request: Support GLM-4.1V-9B-Thinking
#14495 commented on
Aug 28, 2025 • 0 new comments -
Eval bug: Jinja fails on `gpt-oss-120b` when using Vulkan
#15274 commented on
Aug 28, 2025 • 0 new comments