Insights: ggml-org/llama.cpp
Overview
56 Releases published by 1 person
-
b6271
published
Aug 25, 2025 -
b6272
published
Aug 25, 2025 -
b6273
published
Aug 25, 2025 -
b6275
published
Aug 25, 2025 -
b6274
published
Aug 25, 2025 -
b6277
published
Aug 25, 2025 -
b6276
published
Aug 25, 2025 -
b6278
published
Aug 26, 2025 -
b6279
published
Aug 26, 2025 -
b6280
published
Aug 26, 2025 -
b6282
published
Aug 26, 2025 -
b6283
published
Aug 26, 2025 -
b6284
published
Aug 26, 2025 -
b6285
published
Aug 26, 2025 -
b6286
published
Aug 26, 2025 -
b6287
published
Aug 26, 2025 -
b6289
published
Aug 26, 2025 -
b6290
published
Aug 26, 2025 -
b6291
published
Aug 26, 2025 -
b6292
published
Aug 26, 2025 -
b6294
published
Aug 26, 2025 -
b6293
published
Aug 26, 2025 -
b6295
published
Aug 27, 2025 -
b6297
published
Aug 27, 2025 -
b6298
published
Aug 27, 2025 -
b6299
published
Aug 27, 2025 -
b6300
published
Aug 27, 2025 -
b6301
published
Aug 27, 2025 -
b6303
published
Aug 28, 2025 -
b6305
published
Aug 28, 2025 -
b6307
published
Aug 28, 2025 -
b6309
published
Aug 28, 2025 -
b6310
published
Aug 28, 2025 -
b6311
published
Aug 28, 2025 -
b6312
published
Aug 28, 2025 -
b6313
published
Aug 28, 2025 -
b6314
published
Aug 28, 2025 -
b6315
published
Aug 29, 2025 -
b6316
published
Aug 29, 2025 -
b6317
published
Aug 29, 2025 -
b6318
published
Aug 29, 2025 -
b6322
published
Aug 30, 2025 -
b6323
published
Aug 30, 2025 -
b6324
published
Aug 30, 2025 -
b6325
published
Aug 30, 2025 -
b6327
published
Aug 30, 2025 -
b6328
published
Aug 31, 2025 -
b6329
published
Aug 31, 2025 -
b6330
published
Aug 31, 2025 -
b6331
published
Aug 31, 2025 -
b6332
published
Aug 31, 2025 -
b6334
published
Aug 31, 2025 -
b6335
published
Aug 31, 2025 -
b6337
published
Sep 1, 2025 -
b6340
published
Sep 1, 2025 -
b6341
published
Sep 1, 2025
73 Pull requests merged by 35 people
-
docs : add Hunyuan to models section
#15707 merged
Sep 1, 2025 -
CUDA: fix build error from ambiguous __half conversions in conv2d
#15690 merged
Sep 1, 2025 -
CANN: Optimize MUL_MAT_ID
#15658 merged
Sep 1, 2025 -
CANN: fix RoPE cache issue on multi-device
#15629 merged
Sep 1, 2025 -
sampling : optimize samplers by reusing bucket sort
#15665 merged
Aug 31, 2025 -
server : enable /slots by default and make it secure
#15630 merged
Aug 31, 2025 -
metal : fix checks for available FA kernels
#15700 merged
Aug 31, 2025 -
llama : fix fattn reserve call n_seqs parameter
#15699 merged
Aug 31, 2025 -
llama : separate compute buffer reserve from fattn check
#15696 merged
Aug 31, 2025 -
ci : explicitly set fa off or on
#15692 merged
Aug 31, 2025 -
vulkan: handle large sizes for get_rows
#15686 merged
Aug 31, 2025 -
vulkan: mul_mat_id coopmat2 optimizations
#15546 merged
Aug 31, 2025 -
vulkan : remove unused portability_enumeration_ext variable
#15679 merged
Aug 31, 2025 -
vulkan: Allow fallback to sysmem memory when vidmem is full
#15649 merged
Aug 31, 2025 -
vulkan: clamp matmul and FA results to the max finite value
#15652 merged
Aug 31, 2025 -
ggml: update kleidiai to v1.13.0
#15663 merged
Aug 30, 2025 -
docs : update build.md to remove MSVC arm64 notes
#15684 merged
Aug 30, 2025 -
llama: use FA + max. GPU layers by default
#15434 merged
Aug 30, 2025 -
CUDA: use FP32 arithmetic for conv2d
#15683 merged
Aug 30, 2025 -
vulkan: Skip syncing for prealloc_y when it is reused
#15544 merged
Aug 30, 2025 -
[CANN] Optimize compiler warning issues
#15661 merged
Aug 30, 2025 -
removed obsolete doc
#15670 merged
Aug 29, 2025 -
scripts: strip "AMD Instinct" from GPU name
#15668 merged
Aug 29, 2025 -
tools: [SERVER] Added documentation for parallel_tool_calls param
#15647 merged
Aug 29, 2025 -
CUDA: fix bug in rms_norm fusion
#15660 merged
Aug 29, 2025 -
Model: Seed OSS thinking + tool call support
#15552 merged
Aug 29, 2025 -
CUDA: fuse adds, fuse add with rms norm
#15631 merged
Aug 29, 2025 -
nvidia nemotron nano v2 (nemotronh)
#15507 merged
Aug 29, 2025 -
fix: Compute the full sum in llama-eval-callback
#15637 merged
Aug 28, 2025 -
CUDA: add conv2d
#15635 merged
Aug 28, 2025 -
ggml-cpu: fix invalid hsum build in debug s390x
#15634 merged
Aug 28, 2025 -
ggml : fix SSM_SCAN for n_groups > 1
#15625 merged
Aug 28, 2025 -
kv-cache : fix find_slot to not search for continuous slot
#15638 merged
Aug 28, 2025 -
model : jina-embeddings-v3 support
#13693 merged
Aug 28, 2025 -
scripts: add sqlite3 check for compare-commits.sh
#15633 merged
Aug 28, 2025 -
kv-cache : remove LLAMA_SET_ROWS checks
#15505 merged
Aug 28, 2025 -
gguf-py: byteswapping improvements
#12851 merged
Aug 28, 2025 -
Change to info instead of debug, to explain reason for stopping.
#15604 merged
Aug 28, 2025 -
model-conversion : add mmproj conversion target
#15628 merged
Aug 28, 2025 -
cuda: Add cublasLt_static linking when GGML_STATIC is enabled
#15622 merged
Aug 28, 2025 -
server: higher timeout for tests
#15621 merged
Aug 27, 2025 -
presets : add qwen3-30B-a3b FIM
#15616 merged
Aug 27, 2025 -
HIP: Enable support for ggml_backend_cuda_register_host_buffer
#15615 merged
Aug 27, 2025 -
kv-cache : better estimate of n_kv for multi-sequence batches
#15610 merged
Aug 27, 2025 -
CANN: refactor mask handling and improve performance in FA
#15561 merged
Aug 27, 2025 -
ggml-cpu : add basic RVV support for vector f32 ops
#15057 merged
Aug 27, 2025 -
common : add -m to bash completion for --model [no ci]
#15591 merged
Aug 27, 2025 -
OpenCL: add fused group_norm/norm, mul, add
#15314 merged
Aug 27, 2025 -
tests : fix test-opt with GGML_BACKEND_DL
#15599 merged
Aug 26, 2025 -
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size
#15592 merged
Aug 26, 2025 -
fix mtmd ios build
#15579 merged
Aug 26, 2025 -
tests: add test-backend-ops performance test for mul mat id
#15543 merged
Aug 26, 2025 -
PowerPC: Sgemm Optimization
#15558 merged
Aug 26, 2025 -
graph : fix assert in memory-less build_attn
#15590 merged
Aug 26, 2025 -
model-conversion : add qat-q4 quantization targets
#15588 merged
Aug 26, 2025 -
CUDA: return -1 for nonexistent compiled arch
#15587 merged
Aug 26, 2025 -
metal : optimize FA vec for large sequences and BS <= 8
#15566 merged
Aug 26, 2025 -
mtmd : support Kimi VL model
#15458 merged
Aug 26, 2025 -
context : print graph stats for memory-less contexts
#15586 merged
Aug 26, 2025 -
metal : improve MUL_MAT_ID
#15541 merged
Aug 26, 2025 -
support MiniCPM-V 4.5
#15575 merged
Aug 26, 2025 -
gguf-py : remove erroneous FFN_GATE entry
#15583 merged
Aug 26, 2025 -
metal : remove contiguous assertion for src0 in IM2COL
#15577 merged
Aug 26, 2025 -
Add a warning for special devices
#15563 merged
Aug 26, 2025 -
vulkan: Remove splitting for mul_mat_id
#15568 merged
Aug 26, 2025 -
CUDA: Accelerate MXFP4 table lookup using __byte_perm
#15451 merged
Aug 25, 2025 -
opencl: fix support ops condition for rms_norm
#15560 merged
Aug 25, 2025 -
vulkan: fix min subgroup 16 condition for mmid subgroup optimization
#15565 merged
Aug 25, 2025 -
tests: Generate unique input values for count_equal
#15487 merged
Aug 25, 2025 -
metal: fix regression when no metal devices are present
#15531 merged
Aug 25, 2025 -
CUDA: MoE helper in device code, better tile sizes
#15525 merged
Aug 25, 2025 -
model-conversion : set pooling type to none in logits.cpp
#15564 merged
Aug 25, 2025 -
model-conversion : add model card template for embeddings [no ci]
#15557 merged
Aug 25, 2025
30 Pull requests opened by 25 people
-
fix(ggml-sycl): add synchronization before exiting argsort kernel
#15582 opened
Aug 26, 2025 -
Partial code documentation
#15601 opened
Aug 26, 2025 -
musa: fix build warnings
#15611 opened
Aug 27, 2025 -
kleidiai: fix GGML_ASSERT(*cur_backend_id != -1) failed
#15614 opened
Aug 27, 2025 -
model : fix internvl3_5_20b gguf conversion
#15617 opened
Aug 27, 2025 -
Possible fix: use ne0..ne3 (dst dims) instead of ne00..ne03 in ggml_compute_forward_dup_f16
#15626 opened
Aug 28, 2025 -
Refactor server.cpp: Split monolithic file into modular components
#15632 opened
Aug 28, 2025 -
batch : add `pad_equal` [RFC]
#15636 opened
Aug 28, 2025 -
Hermes 2 tool calling : fixed crash when <tool_call> had a newline before it
#15639 opened
Aug 28, 2025 -
granite embedding small support (ModernBert arch)
#15641 opened
Aug 28, 2025 -
Catch up to the upstream
#15642 opened
Aug 28, 2025 -
tools: update llama-bench to include TTFT, E2E, ITL metrics
#15643 opened
Aug 28, 2025 -
gguf-py: reduce peak RAM during convert by streaming dtype casts
#15648 opened
Aug 28, 2025 -
utils : add t_max_predict_ms param to set prediction phase time limit to cli
#15655 opened
Aug 29, 2025 -
model : avoid ggml_cont_3d for fused QKV weights
#15662 opened
Aug 29, 2025 -
vulkan : update ggml_vk_instance_validation_ext_available
#15666 opened
Aug 29, 2025 -
convert : parse safetensors directly
#15667 opened
Aug 29, 2025 -
ggml: add ops for WAN video model (cuda && cpu)
#15669 opened
Aug 29, 2025 -
feat: nemotron thinking & toolcalling support
#15676 opened
Aug 29, 2025 -
chat: Fix streaming parser for granite models
#15682 opened
Aug 30, 2025 -
tests: large sizes for get_rows
#15687 opened
Aug 30, 2025 -
gguf: gguf_writer refactor
#15691 opened
Aug 31, 2025 -
ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops
#15695 opened
Aug 31, 2025 -
vulkan: add missing clamps in new mul_mat_id paths
#15702 opened
Aug 31, 2025 -
sampling : optimize dist sampler
#15704 opened
Aug 31, 2025 -
vulkan: initialize vulkan-hpp to allow using extension function pointers
#15705 opened
Aug 31, 2025 -
OpenCL: add attention sinks support for FA kernels
#15706 opened
Aug 31, 2025 -
convert : remove redundant code
#15708 opened
Sep 1, 2025 -
CANN: support ext_factor in rope
#15710 opened
Sep 1, 2025 -
[CANN] Support eager execution mode under ACL graph compilation
#15712 opened
Sep 1, 2025
49 Issues closed by 22 people
-
Feature Request: s390x CI
#13243 closed
Sep 1, 2025 -
Eval bug: Inconsistent Embedding Similarity between llama-server and LlamaCppEmbeddings for BGE-M3 Model
#14280 closed
Sep 1, 2025 -
main: failed to quantize model from 'gemma-3n-E2B-it.f16.gguf'
#14405 closed
Sep 1, 2025 -
Feature Request: Server stream response for "prompt processing progress"
#14685 closed
Sep 1, 2025 -
Misc. bug: b5921 release zip on github misses llama-embedding binary
#14738 closed
Sep 1, 2025 -
Misc. bug: out of memory error after PR #13746
#14740 closed
Sep 1, 2025 -
Misc. bug: RPC flash attention bug on deepseek models (deepseek/kimi k2)
#14747 closed
Sep 1, 2025 -
Eval bug: Nemotron 49b doesn't load correctly
#14752 closed
Sep 1, 2025 -
Misc. bug: llama-server, speed penalty from d9d398f since b5222
#15672 closed
Aug 31, 2025 -
Significant Performance Drop When Using Tools in llama-server
#15389 closed
Aug 31, 2025 -
Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used)
#14084 closed
Aug 31, 2025 -
Feature Request: Generic CPU in ggml-cpu/arch
#14402 closed
Aug 31, 2025 -
Misc. bug: crash on vulkan with new max mem alloc size calculations since b5703
#14553 closed
Aug 31, 2025 -
Feature Request: ARMv7 / Termux Support on Mobile Devices
#14699 closed
Aug 31, 2025 -
Compile bug: How to build Llama.cpp for ARM64 Windows with MSVC?
#15674 closed
Aug 30, 2025 -
Compile bug: error: more than one conversion function from "half" to a built-in type applies
#15680 closed
Aug 30, 2025 -
Compile bug: [SYCL][ARC A770] Regression: dual A770 support broken in b5422 and later
#14709 closed
Aug 30, 2025 -
Compile bug: llama-llava-clip-quantize-cli not found
#14693 closed
Aug 30, 2025 -
Feature Request: Support multiple tool calls
#15644 closed
Aug 29, 2025 -
Eval bug: Seed-OSS crash after typing user prompt
#15547 closed
Aug 29, 2025 -
Eval bug: Qwen3 30B A3B crashing on load
#15548 closed
Aug 29, 2025 -
Misc. bug: Does llama.cpp support Ascend 910B NPU?
#15656 closed
Aug 29, 2025 -
OpenCL backend with Qualcomm Adreno GPUs load time is too long
#14337 closed
Aug 29, 2025 -
Eval bug: Qwen 2.5 VL gets stuck in a loop
#14663 closed
Aug 29, 2025 -
Feature Request: Support for NVidia Nemotron Nano v2
#15409 closed
Aug 29, 2025 -
Eval bug: llama-server crashes on ROCm 6.4.3-1 (Arch Linux)
#15613 closed
Aug 28, 2025 -
Eval bug: Address boundary error
#15605 closed
Aug 28, 2025 -
Feature Request: Support Jina V3 arch
#9585 closed
Aug 28, 2025 -
Misc. bug: llama-server issue on Windows when compiling from source code
#14826 closed
Aug 28, 2025 -
Feature Request: Explain reason for stopping
#15553 closed
Aug 28, 2025 -
Compile bug: Missing link with cublasLt_static when GGML_STATIC is enabled
#15620 closed
Aug 28, 2025 -
Eval bug: failed to allocate compute pp buffers
#14836 closed
Aug 27, 2025 -
Compile bug: mtmd ios build crashes because packaging not possible
#15578 closed
Aug 27, 2025 -
Bug with gpt-oss-120b sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3
#15597 closed
Aug 27, 2025 -
Misc. bug: string_strip encoding issue (reasoning + utf8/unicode)
#15607 closed
Aug 27, 2025 -
Compile bug: undefined reference to `ggml_backend_is_cpu`
#15598 closed
Aug 26, 2025 -
Compile bug: ggml was not compiled with any CUDA arch <= 750
#15593 closed
Aug 26, 2025 -
Eval bug: Coredump of llama-server\llama-cli\llama-bench on start
#15584 closed
Aug 26, 2025 -
Eval bug: ggml was not compiled with any CUDA arch <= 750
#15589 closed
Aug 26, 2025 -
Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct
#14318 closed
Aug 26, 2025 -
Eval bug: Llama-server crashes with Mistrall-Small when I pass it an image for processing.
#15574 closed
Aug 26, 2025 -
Eval bug: The content returned by the model is very strange
#14641 closed
Aug 26, 2025 -
Compile bug: ggml_graph_compute_with_ctx import error
#15570 closed
Aug 25, 2025 -
Misc. bug: ARGMAX tie-breaking
#15484 closed
Aug 25, 2025
32 Issues opened by 31 people
-
Misc. bug: Integer overflow leads to buffer overflow.
#15711 opened
Sep 1, 2025 -
Eval bug: InternVL3_5 GPT OSS 20b crashes at warmup
#15701 opened
Aug 31, 2025 -
Compile bug: undefined symbol: vkResetQueryPool for android-ndk-r29-beta3
#15698 opened
Aug 31, 2025 -
Misc. bug: Fatal crash when using `--reranking` flag in `llama-server`
#15685 opened
Aug 30, 2025 -
Misc. bug: Granite chat parser doesn't stream content section
#15681 opened
Aug 30, 2025 -
Misc. bug: Build 6278 Vulkan crashes: llama-bench and llama-server both affected
#15678 opened
Aug 30, 2025 -
Eval bug: Nemotron v2 Nano always reprocesses prompt
#15677 opened
Aug 29, 2025 -
Compile bug: Unable to compile llama-server from source.
#15675 opened
Aug 29, 2025 -
Eval bug: NVIDIA Nemotron Nano 9B v2 thinking tokens not properly handled in the llama-server web ui
#15673 opened
Aug 29, 2025 -
Eval bug: response format not respected when --jinja enabled (for llama3.1-3.2)
#15664 opened
Aug 29, 2025 -
Feature Request: Add Linux HIP Release
#15659 opened
Aug 29, 2025 -
Feature Request: Add support for t_max_predict_ms to CLI
#15654 opened
Aug 29, 2025 -
Compile bug: Build Llama.cpp failed for Windows on ARM with KLEIDIAI enabled.
#15653 opened
Aug 29, 2025 -
Misc. bug: CONVERT merged_16bit TO f16_gguf BY MODEL phi-3.5-mini-instruct
#15651 opened
Aug 29, 2025 -
Misc. bug: Calling a tool that uses a specific regular expression causes the server to crash
#15640 opened
Aug 28, 2025 -
Misc. bug: Vulkan FA massively slowdowns Qwen 30B
#15624 opened
Aug 27, 2025 -
Misc. bug: convert_hf_to_gguf.py runs out of memory
#15623 opened
Aug 27, 2025 -
Misc. bug: Performance Downgrade happened from b6188, for llama-bin-win-vulkan-x64 distribution.
#15618 opened
Aug 27, 2025 -
Feature Request: Add support for Ovis2.5
#15612 opened
Aug 27, 2025 -
Eval bug: LLAMA_CPP_PROCESS_ERROR
#15609 opened
Aug 27, 2025 -
Misc. bug: Tool calling CRASH : Unexpected empty grammar stack after accepting piece<tool_call>
#15608 opened
Aug 27, 2025 -
Feature Request: Support Cohere's new Command A Reasoning Model
#15603 opened
Aug 27, 2025 -
Feature Request: Repeated Unnecessary Activation Quantization Ops
#15602 opened
Aug 26, 2025 -
Eval bug: Kimi-VL-A3B-Thinking-2506 not working correctly
#15600 opened
Aug 26, 2025 -
Misc. bug: Windows 11 flags b6287 as containing a virus
#15596 opened
Aug 26, 2025 -
Eval bug: Mistral v7-tekken tokenizer regex causes segfault on certain strings
#15594 opened
Aug 26, 2025 -
Feature Request: Add support for OLMoE2
#15585 opened
Aug 26, 2025 -
Feature Request: Run Lookahead within llama-server
#15581 opened
Aug 26, 2025 -
Eval bug: Asynchronous Kernel Execution on iGPU Causes Runtime Errors with MOE Model
#15580 opened
Aug 26, 2025 -
Eval bug: gptoss-120b multi-turn chat: KV cache growth & unexpected <|start|>assistant... prefix
#15573 opened
Aug 25, 2025 -
Misc. bug: llama-server JS error "can't access property "delta", B.choices[0] is undefined"
#15571 opened
Aug 25, 2025
81 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Deepseek V3.1 thinking mode is the default
#15533 commented on
Sep 1, 2025 • 42 new comments -
common : add GLM-4.5 tool calling support
#15186 commented on
Aug 31, 2025 • 10 new comments -
Vulkan: Add Integer Dot Product mul_mat_vec shader for legacy quants
#14903 commented on
Sep 1, 2025 • 7 new comments -
vulkan: use memory budget extension to read memory usage
#15545 commented on
Sep 1, 2025 • 4 new comments -
Fix incorrect causal attention mask caused by M-Rope
#15474 commented on
Aug 26, 2025 • 2 new comments -
Apple NPU acceleration integrated into llama.cpp, using MiniCPM-V 4.0 as an example.
#15262 commented on
Aug 27, 2025 • 2 new comments -
Fixes #15247 | Update chat.cpp to support (at least) qwen3 reasoning + tool_choice = required
#15248 commented on
Sep 1, 2025 • 1 new comment -
ggml: SVE support for exponential functions
#15145 commented on
Aug 31, 2025 • 1 new comment -
quantize: add option to automatically choose optimal quant types to reach a bpw target at lowest MSE error
#15550 commented on
Aug 30, 2025 • 0 new comments -
CUDA: update build CTK version to 12.8
#13360 commented on
Aug 26, 2025 • 0 new comments -
Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC)
#13206 commented on
Aug 30, 2025 • 0 new comments -
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on
Aug 29, 2025 • 0 new comments -
Allow user to compile with any cuda version using github actions
#10928 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: GGML_ASSERT failure at ggml.c:6370 during llama_opt_epoch with Saiga Nemo 12B
#15279 commented on
Sep 1, 2025 • 0 new comments -
Enhancement: Improve ROCm performance on various quants (benchmarks included)
#11931 commented on
Sep 1, 2025 • 0 new comments -
Feature Request:
#15022 commented on
Sep 1, 2025 • 0 new comments -
Eval bug: Qwen3-Coder-480B-A35B-Instruct-1M-GGUF GGML_ASSERT(ggml_nbytes(src0) <= INT_MAX) failed
#15049 commented on
Aug 31, 2025 • 0 new comments -
Eos and bos tokens can be redefined as additional tokens with other ids
#1776 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Gemma3n multimodal support
#14429 commented on
Aug 31, 2025 • 0 new comments -
Compile bug: Error: unknown type name 'THREAD_POWER_THROTTLING_STATE'
#14953 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: OLMoE (https://github.com/allenai/OLMoE) is causing issues with llama-serve
#14988 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Support Step3TextForCausalLM
#14998 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: Getting memory critical errors when using --no-mmap with MoE models
#14999 commented on
Aug 31, 2025 • 0 new comments -
Feature Request: Add an example of using mtmd C-api, at least for images.
#15492 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: model infer input "GGGGGGG"
#15556 commented on
Aug 30, 2025 • 0 new comments -
model : add grok-2 support
#15539 commented on
Sep 1, 2025 • 0 new comments -
ggml WebGPU: remove userdata from request adapter callback
#15527 commented on
Aug 27, 2025 • 0 new comments -
llama : add ggml version and commit functions
#15499 commented on
Aug 28, 2025 • 0 new comments -
Thinking model disabled assistant prefill
#15404 commented on
Aug 28, 2025 • 0 new comments -
aLoRA Support
#15327 commented on
Aug 28, 2025 • 0 new comments -
Add OpenVINO backend
#15307 commented on
Aug 27, 2025 • 0 new comments -
64 bit CUDA copy routines via GGML_CUDA_ALLOW_LARGE_TENSORS
#15298 commented on
Aug 31, 2025 • 0 new comments -
server: implement GLM-style MTP
#15225 commented on
Aug 29, 2025 • 0 new comments -
ggml: aarch64: Implement SVE F16 kernels for vector functions
#15115 commented on
Aug 28, 2025 • 0 new comments -
qwen3-coder tool call parser
#15019 commented on
Aug 31, 2025 • 0 new comments -
Add support for CogVLM model
#15002 commented on
Aug 29, 2025 • 0 new comments -
imatrix: calculate activation-based statistics for new format (GGUF) imatrices
#14891 commented on
Sep 1, 2025 • 0 new comments -
SvelteKit-based WebUI
#14839 commented on
Sep 1, 2025 • 0 new comments -
feat: Add optional prompt processing progress streaming
#14731 commented on
Aug 27, 2025 • 0 new comments -
server : (webui) let server send locally-defined default webui settings
#14468 commented on
Aug 26, 2025 • 0 new comments -
llama : support qwen3 rerank and embeddings
#14029 commented on
Aug 31, 2025 • 0 new comments -
Eval bug: Image processing on Metal takes significant amount of time
#15426 commented on
Aug 28, 2025 • 0 new comments -
changelog : `llama-server` REST API
#9291 commented on
Aug 28, 2025 • 0 new comments -
Feature request: Graphical GGUF viewer
#6715 commented on
Aug 28, 2025 • 0 new comments -
Feature Request: Grok-2 support
#15534 commented on
Aug 28, 2025 • 0 new comments -
Misc. bug: llama-server embedding endpoint returns vectors with just null values after a while
#14812 commented on
Aug 28, 2025 • 0 new comments -
Eval bug: Uncaught exception during inference crashes llama-server
#14923 commented on
Aug 28, 2025 • 0 new comments -
Compile bug: "Illegal instruction" on StarFive VisionFive2 (risc-v)
#14926 commented on
Aug 28, 2025 • 0 new comments -
GPT-OSS 20B and Qwen3 30B A3B prefill so slowly
#15163 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: The script convert_hf_to_gguf.py supports conversion of DeepSeek-R1-0528-FP4.
#15415 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: Support for ERNIE-4.5-VL
#15512 commented on
Aug 27, 2025 • 0 new comments -
tutorials : list for llama.cpp
#13523 commented on
Aug 27, 2025 • 0 new comments -
Feature Request: Deepthink with Confidence
#15518 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: JSON schema incorrectly enforces order of JSON object keys
#15216 commented on
Aug 27, 2025 • 0 new comments -
Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates
#13694 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: LLAMA-SERVER is 40% slower than LLAMA-CLI when using identical parameters including -ot option for tensor offloading
#14201 commented on
Aug 26, 2025 • 0 new comments -
Eval bug: Regression: Tool calls still returned in content field as JSON string instead of tool_calls array
#14697 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: llamacpp crashes my PC whenever I close the console for it.
#14713 commented on
Aug 26, 2025 • 0 new comments -
Misc. bug: llama-bench json output is too verbose
#15554 commented on
Aug 25, 2025 • 0 new comments -
Eval bug: KV buffer allocation when using --n-cpu-moe on ROCm multi-GPU setup
#15538 commented on
Aug 25, 2025 • 0 new comments -
Feature Request: --n-cpu-moe option for multi GPU?
#15263 commented on
Aug 25, 2025 • 0 new comments -
Compile bug: Broken HIP on 6122
#15196 commented on
Aug 30, 2025 • 0 new comments -
changelog : `libllama` API
#9289 commented on
Aug 30, 2025 • 0 new comments -
kubernetes example
#6546 commented on
Aug 30, 2025 • 0 new comments -
LoRA training example
#13485 commented on
Aug 30, 2025 • 0 new comments -
Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3)
#13545 commented on
Aug 30, 2025 • 0 new comments -
Compile bug: build-xcframework.sh error
#14954 commented on
Aug 30, 2025 • 0 new comments -
Eval bug: I use llama-server to decode audio with an LLM but it says it doesn't have any audio
#14963 commented on
Aug 30, 2025 • 0 new comments -
libblis crashes: Default MC is non-multiple of MR for one or more datatypes.
#14972 commented on
Aug 30, 2025 • 0 new comments -
Misc. bug: Reranker not working with OpenWebUI
#14977 commented on
Aug 30, 2025 • 0 new comments -
Feature Request: Add DeepSeek-V3.1
#15496 commented on
Aug 29, 2025 • 0 new comments -
Misc. bug: prompt processing stall with long context on deepseek models
#15514 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: Crash over tool calls in Qwen3 Coder
#15046 commented on
Aug 29, 2025 • 0 new comments -
Misc. bug: Regression in unified KV cache appears after `llama.cpp` release b5912 in b5913
#14847 commented on
Aug 29, 2025 • 0 new comments -
Eval bug:
#14936 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: Repeated sequences with medgemma3-27b-text-it gguf
#14938 commented on
Aug 29, 2025 • 0 new comments -
Feature Request: T5Gemma support
#14940 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: llama_model_load: unknown model architecture: 'mllama' llama_model_load_from_file_impl: Segmentation fault (core dumped)
#14951 commented on
Aug 29, 2025 • 0 new comments -
Eval bug: No generation with follow up on high token responses on GPT-OSS 120B
#15517 commented on
Aug 28, 2025 • 0 new comments -
Feature Request: Support GLM-4.1V-9B-Thinking
#14495 commented on
Aug 28, 2025 • 0 new comments -
Eval bug: Jinja fails on `gpt-oss-120b` when using Vulkan
#15274 commented on
Aug 28, 2025 • 0 new comments