Insights: ggml-org/llama.cpp
Overview
40 Releases published by 1 person
- b4953 published Mar 25, 2025
- b4956 published Mar 25, 2025
- b4957 published Mar 25, 2025
- b4958 published Mar 25, 2025
- b4961 published Mar 26, 2025
- b4963 published Mar 26, 2025
- b4964 published Mar 26, 2025
- b4966 published Mar 26, 2025
- b4967 published Mar 27, 2025
- b4969 published Mar 27, 2025
- b4970 published Mar 27, 2025
- b4972 published Mar 27, 2025
- b4974 published Mar 27, 2025
- b4976 published Mar 27, 2025
- b4977 published Mar 27, 2025
- b4978 published Mar 27, 2025
- b4980 published Mar 27, 2025
- b4981 published Mar 28, 2025
- b4982 published Mar 28, 2025
- b4984 published Mar 28, 2025
- b4985 published Mar 28, 2025
- b4986 published Mar 28, 2025
- b4987 published Mar 28, 2025
- b4988 published Mar 28, 2025
- b4990 published Mar 29, 2025
- b4991 published Mar 29, 2025
- b4992 published Mar 29, 2025
- b4997 published Mar 30, 2025
- b4998 published Mar 30, 2025
- b4999 published Mar 30, 2025
- b5001 published Mar 30, 2025
- b5002 published Mar 30, 2025
- b5003 published Mar 31, 2025
- b5004 published Mar 31, 2025
- b5005 published Mar 31, 2025
- b5006 published Mar 31, 2025
- b5009 published Mar 31, 2025
- b5010 published Mar 31, 2025
- b5012 published Mar 31, 2025
- b5013 published Mar 31, 2025
53 Pull requests merged by 30 people
- convert : BailingMoE : avoid setting rope_dim to 0 (#12678, merged Mar 31, 2025)
- vocab : add special infill tokens for CodeLlama (#11850, merged Mar 31, 2025)
- Faster ssm scan (#10558, merged Mar 31, 2025)
- convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667, merged Mar 31, 2025)
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135, merged Mar 31, 2025)
- sync : ggml (#12670, merged Mar 31, 2025)
- llava : proper description fix (#12668, merged Mar 31, 2025)
- SYCL: Remove misleading ggml_sycl_op_flatten function (#12387, merged Mar 31, 2025)
- llava : fix clip loading GGUFs with missing description (#12660, merged Mar 31, 2025)
- llama-tts refactor console output (#12640, merged Mar 31, 2025)
- llama : support BailingMoE (Ling) (#12634, merged Mar 30, 2025)
- metal : use constexpr in FA kernels + fix typedef (#12659, merged Mar 30, 2025)
- Add Trillion 7B model support (#12556, merged Mar 30, 2025)
- Add Yandex instruct model template support (#12621, merged Mar 30, 2025)
- musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611, merged Mar 30, 2025)
- sync : ggml (#12645, merged Mar 30, 2025)
- llama : fix non-causal mask for gemma 3 (#12615, merged Mar 29, 2025)
- change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632, merged Mar 29, 2025)
- cmake: fix ccache conflict (#12522, merged Mar 29, 2025)
- [CANN]: remove clang-format in ggml-cann (#12607, merged Mar 29, 2025)
- llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631, merged Mar 28, 2025)
- metal : improve FA + improve MoE (#12612, merged Mar 28, 2025)
- vulkan: fix coopmat shader generation when cross-compiling (#12272, merged Mar 28, 2025)
- llama: fix error on bad grammar (#12628, merged Mar 28, 2025)
- Include speculative decoding stats when timings_per_token is enabled (#12603, merged Mar 28, 2025)
- rpc : update README for cache usage (#12620, merged Mar 28, 2025)
- llamafile : ppc64le GEMV forwarding for FP32 (#12594, merged Mar 28, 2025)
- rpc : send hash when tensor data is above some fixed threshold (#12496, merged Mar 28, 2025)
- server : Support listening on a unix socket (#12613, merged Mar 27, 2025)
- media : add SVG logo [no ci] (#12616, merged Mar 27, 2025)
- opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600, merged Mar 27, 2025)
- Add PLM GGUF Conversion & Inference Support (#12457, merged Mar 27, 2025)
- Fix T5Encoder model handling (#12590, merged Mar 27, 2025)
- Support Qwen2_5_VLForConditionalGeneration (#12595, merged Mar 27, 2025)
- sync : ggml (#12606, merged Mar 27, 2025)
- sync : ggml (#12604, merged Mar 27, 2025)
- llamafile : ppc64le MMA implementation for Q4_0 (#12489, merged Mar 27, 2025)
- ggml : riscv: add 128-bit RVV support (#12530, merged Mar 27, 2025)
- llama : make loras compatible with repacking (#12593, merged Mar 27, 2025)
- SYCL: implement memset ggml backend buffer interface (#12580, merged Mar 27, 2025)
- Add support for new gfx1200 and gfx1201 targets (#12372, merged Mar 26, 2025)
- metal : refactor mat-vec code (#12569, merged Mar 26, 2025)
- grammars: upgrade to llguidance 0.7.10 (#12576, merged Mar 26, 2025)
- clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566, merged Mar 26, 2025)
- convert : fix squeeze for ssm_conv tensors (#12573, merged Mar 26, 2025)
- ggml : fix MUL_MAT_ID repack with Q8_K (#12544, merged Mar 26, 2025)
- doc: [MUSA] minor changes (#12583, merged Mar 26, 2025)
- convert: fix Mistral3/Gemma3 model hparams init (#12571, merged Mar 25, 2025)
- De-duplicate fmt and format functions and optimize (#11596, merged Mar 25, 2025)
- ggml-cpu : bug fix related to KleidiAI multithreaded LHS packing (#12568, merged Mar 25, 2025)
- SYCL: disable Q4_0 reorder optimization by default (#12560, merged Mar 25, 2025)
- docs : add build instructions for KleidiAI (#12563, merged Mar 25, 2025)
- ci: [MUSA] add CI and update doc (#12562, merged Mar 25, 2025)
21 Pull requests opened by 17 people
- Enable MMA for BF16 data types on Powerpc (#12565, opened Mar 25, 2025)
- opencl: Add support for multiple devices (#12622, opened Mar 28, 2025)
- sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625, opened Mar 28, 2025)
- opencl: remove a self-referential macro (#12626, opened Mar 28, 2025)
- vulkan: Implement split_k for coopmat2 flash attention (#12627, opened Mar 28, 2025)
- vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630, opened Mar 28, 2025)
- llama-server : implement universal assisted decoding (#12635, opened Mar 28, 2025)
- tts : implement sesame CSM + Mimi decoder (#12648, opened Mar 29, 2025)
- opencl : fix memory allocation size (#12649, opened Mar 30, 2025)
- llama : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652, opened Mar 30, 2025)
- contrib: support modelscope community (#12664, opened Mar 31, 2025)
- update `rope_multi` (#12665, opened Mar 31, 2025)
- [CANN] get_rows and dup optimization (#12671, opened Mar 31, 2025)
- use LLM_KV instead of gguf_find_key (#12672, opened Mar 31, 2025)
- SYCL: switch to SYCL namespace (#12674, opened Mar 31, 2025)
- vocab : BailingMoE : change possessive quantifiers to greedy (#12677, opened Mar 31, 2025)
- WIP: Add support for CogAgent (#12679, opened Mar 31, 2025)
- gguf-split now respects dry-run option (#12681, opened Mar 31, 2025)
- vulkan: fix build when glslc doesn't support coopmat (#12683, opened Apr 1, 2025)
- Fix clang warning in gguf_check_reserved_keys (#12686, opened Apr 1, 2025)
- convert : BailingMoE : fix qkv split when head_dim is 0 (#12687, opened Apr 1, 2025)
48 Issues closed by 17 people
- Feature Request: Add support to deepseek vl2 (#11678, closed Apr 1, 2025)
- Eval bug: Qwerky QwQ 32B (rwkv6qwen2) failed to load (#12662, closed Mar 31, 2025)
- core dumped on riscv (#11537, closed Mar 31, 2025)
- Compile bug: Error while compiling llama.cpp (#11691, closed Mar 31, 2025)
- Eval bug: image encode time slow on mobile device (#11856, closed Mar 31, 2025)
- OLMoE Q4_0 quant does not work (#11862, closed Mar 31, 2025)
- Does llama.cpp deployment support multi-node multi-GPU? (#11865, closed Mar 31, 2025)
- Feature Request: APIkey (#11874, closed Mar 31, 2025)
- GGUF Model Missing `general.description` Key Causes Runtime Error in Qwen2-VL Instruct (#12658, closed Mar 31, 2025)
- Feature Request: support DeepSeek-V3's "Scaled ReLU or SwiGLU activation functions" (#12653, closed Mar 30, 2025)
- Eval bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' (#12644, closed Mar 30, 2025)
- Misc. bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' (#12650, closed Mar 30, 2025)
- Misc. bug: examples/gguf/gguf.cpp always fails with data check (#12647, closed Mar 30, 2025)
- Compile bug: parameter packs not expanded with '...' (#11112, closed Mar 30, 2025)
- Misc. bug: llama-server `--ctx-size` is divided by `--parallel` and cannot be increased? (#11681, closed Mar 30, 2025)
- When running deepseek-r1-dynamic-1.58-bit, the KV cache question (#11757, closed Mar 30, 2025)
- Misc. bug: CUDA error: CUDA-capable device(s) is/are busy or unavailable from `cudaSetDevice(device)` (#11841, closed Mar 30, 2025)
- Eval bug: Gemma3 <unused32> spam (#12433, closed Mar 29, 2025)
- Misc. bug: llama-server does not print model loading errors by default (log level misconfigured?) (#11819, closed Mar 29, 2025)
- Misc. bug: llama-cli crash on ubuntu with GGML-VULKAN=ON (#11823, closed Mar 29, 2025)
- Misc. bug: Quantization process 100 times slower on Windows (dockerized) (#11825, closed Mar 29, 2025)
- [BENCHMARKS] DeepScaleR-1.5B-Preview F16 ollama GGUF vs llama.cpp (#11828, closed Mar 29, 2025)
- Feature Request: Direct way to check the status of the abort mechanism (#12525, closed Mar 28, 2025)
- Feature Request: RPC offloading using a local model copy (#10095, closed Mar 28, 2025)
- why assert(!isnan(wp[i])) in softmax_forward function (#12542, closed Mar 28, 2025)
- llama.cpp didn't use GPU to accelerate inference for gguf file (#12614, closed Mar 28, 2025)
- Misc. bug: Virus detected (#10768, closed Mar 28, 2025)
- Eval bug: [CANN] inference does not use NPU (#11799, closed Mar 28, 2025)
- Urgent Help Needed! Problems Encountered in Hybrid Inference Function Verification Based on llama.cpp (#11805, closed Mar 28, 2025)
- Eval bug: Incorrect n_gpu_layer settings for MoE models (#12596, closed Mar 27, 2025)
- Eval bug: T5Encoder support broken (#12588, closed Mar 27, 2025)
- Misc. bug: Server crash with use of lora on CPU (#12587, closed Mar 27, 2025)
- Compile bug: Fails to compile with undefined references in libggml.so (#11562, closed Mar 27, 2025)
- Eval bug: Abnormal memory usage on Metal backend (#12574, closed Mar 26, 2025)
- Eval bug: GPU Hang Error on Metal backend (#12277, closed Mar 26, 2025)
- Misc. bug: Falcon3-Mamba-7B fails on ggml_ssm_conv (#12572, closed Mar 26, 2025)
- Eval bug: Program not working properly due to new features of "repack Q4_K tensor" (#12528, closed Mar 26, 2025)
- Misc. bug: All llama executables exit immediately without console output (#10929, closed Mar 26, 2025)
- Eval bug: error: Double type is not supported on this platform (#11266, closed Mar 26, 2025)
- Feature Request: llama-server support continue_final_message (#11755, closed Mar 26, 2025)
- Misc. bug: embedding example coredump since (#12561, closed Mar 26, 2025)
- Misc. bug: Gemma3 adapter gguf conversion fails (#12551, closed Mar 25, 2025)
- GPT2: llama_model_load: error loading model: missing tensor 'output.weight' (#12567, closed Mar 25, 2025)
32 Issues opened by 29 people
- Compile bug: compilation warnings (clang) introduced in #10558 (#12685, opened Apr 1, 2025)
- Misc. bug: examples/gguf-split merge does not respect dry-run option (#12680, opened Mar 31, 2025)
- Eval bug: with -ub 8192 model llama-server insists running on GPU (#12675, opened Mar 31, 2025)
- Feature Request: Qwen2.5-Omni (#12673, opened Mar 31, 2025)
- Feature Request: Add support for StarVector-8b/1b (#12666, opened Mar 31, 2025)
- kv buffer on CPU? (#12663, opened Mar 31, 2025)
- Compile bug: How to compile only one example? (#12661, opened Mar 30, 2025)
- Misc. bug: Gibberish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit: 3d82dbcbce2c (#12657, opened Mar 30, 2025)
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655, opened Mar 30, 2025)
- Feature Request: Splitting layers according to VRAM usage on multi-GPU setups (#12654, opened Mar 30, 2025)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, opened Mar 29, 2025)
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641, opened Mar 29, 2025)
- Compile bug: there is a build bug in examples/llama.android and it will break the build in CI (#12638, opened Mar 29, 2025)
- Feature Request: Interleaved sliding window attention support for gemma 2 and 3 (#12637, opened Mar 29, 2025)
- Misc. bug: HIP: when using llama-bench with kv cache quant, the cpu is doing the work instead of the gpu (#12624, opened Mar 28, 2025)
- Misc. bug: (#12623, opened Mar 28, 2025)
- Compile bug: There was an error while compiling support for the Vulkan backend (#12619, opened Mar 28, 2025)
- Misc. bug: Data check in examples/gguf (#12617, opened Mar 27, 2025)
- Compile bug: SYCL backend build fails on debug config (#12602, opened Mar 27, 2025)
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, opened Mar 27, 2025)
- Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash (#12597, opened Mar 26, 2025)
- Eval bug: run failed when running a lora adapter (not merged) on android (#12592, opened Mar 26, 2025)
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} (#12591, opened Mar 26, 2025)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, opened Mar 26, 2025)
- Qwen2.5-vl support and conversion? (#12584, opened Mar 26, 2025)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, opened Mar 26, 2025)
- -ngl to load "last n layers" to gpu (#12577, opened Mar 26, 2025)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, opened Mar 25, 2025)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash (#12564, opened Mar 25, 2025)
80 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Apr 1, 2025 • 28 new comments)
- SYCL: Rename oneMKL to oneMath (#12192, commented on Apr 1, 2025 • 12 new comments)
- llama : add llama_batch_ext (#11875, commented on Mar 31, 2025 • 5 new comments)
- perplexity: Add option to ignore context window overflow errors and continue score calculation (#12512, commented on Mar 30, 2025 • 2 new comments)
- Misc. bug: RISCV output bug when using rvv with vlen > 256bit (#11041, commented on Apr 1, 2025 • 0 new comments)
- Eval bug: Error running Phi4-mini gguf: unknown pre-tokenizer type: 'gpt-4o' (#12122, commented on Apr 1, 2025 • 0 new comments)
- Eval bug: In RISC-V, output tokens are broken (#12124, commented on Apr 1, 2025 • 0 new comments)
- Feature Request: (#12128, commented on Apr 1, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: Command A only outputs 88888888 with -fa (#12441, commented on Mar 31, 2025 • 0 new comments)
- ggml : refactor ggml-cpu.c into multiple C++ source files (#10180, commented on Mar 31, 2025 • 0 new comments)
- llama cpp android gpu (#12462, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: llama-cli '--log-disable' parameter omits response (#11983, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: model producing gibberish for Orion14b-chat (#12411, commented on Mar 31, 2025 • 0 new comments)
- Feature Request: Support for Qwen2-VL (#9246, commented on Mar 31, 2025 • 0 new comments)
- Compile bug: fatal error: 'ggml.h' file not found (#12101, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: When using streaming output, if stream_options={"include_usage": True} is not set, the returned result should not include usage stats (#12102, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: While running llama-simple-chat, it throws "context size exceeded" (#12113, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: Server web UI: Complete output is lost due to the "normal" context shift message (#12120, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable rather disables the response, not the log (#12407, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: Emulated Linux ARM64 CPU build fails (#10933, commented on Mar 30, 2025 • 0 new comments)
- Regarding llama-bench and llama-parallel commands (#12106, commented on Mar 30, 2025 • 0 new comments)
- ggml-quants : weighted rounding algorithms with cumulative search (#12557, commented on Mar 30, 2025 • 0 new comments)
- llama-map to support hugepage feature of pagesize 2M or 1G which can … (#12552, commented on Mar 31, 2025 • 0 new comments)
- quantize: Handle user-defined quantization levels for additional tensors (#12511, commented on Mar 31, 2025 • 0 new comments)
- (draft) tts: Orpheus support (#12487, commented on Mar 28, 2025 • 0 new comments)
- Metal TQ2_0 (#12485, commented on Mar 30, 2025 • 0 new comments)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, commented on Mar 28, 2025 • 0 new comments)
- ci: add Linux cross-compile build (#12428, commented on Mar 31, 2025 • 0 new comments)
- [WIP] MUSA: enable fastfp16, correct warp reduce impl and perf tuning (#12383, commented on Mar 30, 2025 • 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Mar 28, 2025 • 0 new comments)
- Supporting Velvet model (#11716, commented on Mar 26, 2025 • 0 new comments)
- Add support for Deepseek-R1 flash attention (#11557, commented on Mar 26, 2025 • 0 new comments)
- Optimized DeepSeek V2/V3 implementation (MLA) (#11446, commented on Mar 31, 2025 • 0 new comments)
- llama : add option to override model tensor buffers (#11397, commented on Mar 27, 2025 • 0 new comments)
- add FP8 support to gguf/llama (#10055, commented on Mar 29, 2025 • 0 new comments)
- Simplify and improve CUDA graphs through use of indirect copy pointers (#9017, commented on Mar 31, 2025 • 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Apr 1, 2025 • 0 new comments)
- llama-gemma3-cli: output degeneration after repeated uses (#12499, commented on Apr 1, 2025 • 0 new comments)
- Misc. bug: CUDA errors with multi-threaded use (#11804, commented on Apr 1, 2025 • 0 new comments)
- Compile bug: Build failure on VirtualBox: ggml-cpu-aarch64.cpp invalid conversion error (#11783, commented on Mar 28, 2025 • 0 new comments)
- Misc. bug: ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY) (#12045, commented on Mar 28, 2025 • 0 new comments)
- Misc. bug: Crashing, forcing BMI2 on non-BMI2 CPUs (#12500, commented on Mar 27, 2025 • 0 new comments)
- ggml : add ANE backend (#10453, commented on Mar 27, 2025 • 0 new comments)
- Bug: Cannot run larger than VRAM models with `GGML_CUDA_ENABLE_UNIFIED_MEMORY` (#10091, commented on Mar 27, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Mar 27, 2025 • 0 new comments)
- Compile bug: (#11930, commented on Mar 27, 2025 • 0 new comments)
- Feature Request: encoding_image_with_clip is very slow when running minicpmv inference (#11941, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: context shift is disabled (#11974, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: Error when converting moonlight from bf16 to q4km (#12040, commented on Mar 27, 2025 • 0 new comments)
- Compile bug: llama.cpp-b4749/ggml/src/ggml-cpu/ggml-cpu-quants.c:5141:26: error: initialization of 'uint32_t *' {aka 'unsigned int *'} from incompatible pointer type 'const uint8_t (*)[12]' {aka 'const unsigned char (*)[12]'} [-Wincompatible-pointer-types] (#12050, commented on Mar 27, 2025 • 0 new comments)
- Misc. bug: cannot scroll to right side when input too long (#12054, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: the swiftui keeps saying the same thing (#12558, commented on Mar 26, 2025 • 0 new comments)
- Possible solution for poor token generation performance in llama.cpp on dual Epyc Genoa/Turin systems (#11744, commented on Mar 26, 2025 • 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Mar 25, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Mar 25, 2025 • 0 new comments)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, commented on Mar 25, 2025 • 0 new comments)
- Study how LM Evaluation Harness works and try to implement it (#231, commented on Mar 25, 2025 • 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Mar 25, 2025 • 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights on Atlas 800T A2 NPU device (#11966, commented on Mar 25, 2025 • 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: iOS version able to build but not able to run (#10922, commented on Mar 30, 2025 • 0 new comments)
- "CPU_AARCH64 model buffer" appears when not using AARCH64 (#11204, commented on Mar 30, 2025 • 0 new comments)
- Feature Request: NUMA-aware MoE Expert Allocation for Improved Performance (#11333, commented on Mar 30, 2025 • 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Mar 30, 2025 • 0 new comments)
- Eval bug: granite-vision-3.1-2b-preview ERROR:hf-to-gguf:Model LlavaNextForConditionalGeneration is not supported (#12053, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: Failed to compile on centos8 system (#12092, commented on Mar 30, 2025 • 0 new comments)
- tts : add support for Orpheus (#12476, commented on Mar 29, 2025 • 0 new comments)
- kubernetes example (#6546, commented on Mar 29, 2025 • 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Mar 29, 2025 • 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: llama.cpp CPU bound while inferencing against DeepSeek-R1 GGUF (#11635, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: does the rpc backend support cpu? (#11807, commented on Mar 29, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: TikTokenTokenizer has no attribute vocab (#12044, commented on Mar 29, 2025 • 0 new comments)
- Misc. bug: llama-cli llama_backend_free may not free all the gpu memory (#12057, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: MUSA error: operation not supported (#12077, commented on Mar 29, 2025 • 0 new comments)
- Misc. bug: Loop range computation question of Vulkan matmul shaders (#12082, commented on Mar 29, 2025 • 0 new comments)
- Compile bug: How to compile llama.cpp with Vulkan for android device (#11695, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Mar 28, 2025 • 0 new comments)