Insights: ggml-org/llama.cpp
Overview
40 Releases published by 1 person
- b4953 published Mar 25, 2025
- b4956 published Mar 25, 2025
- b4957 published Mar 25, 2025
- b4958 published Mar 25, 2025
- b4961 published Mar 26, 2025
- b4963 published Mar 26, 2025
- b4964 published Mar 26, 2025
- b4966 published Mar 26, 2025
- b4967 published Mar 27, 2025
- b4969 published Mar 27, 2025
- b4970 published Mar 27, 2025
- b4972 published Mar 27, 2025
- b4974 published Mar 27, 2025
- b4976 published Mar 27, 2025
- b4977 published Mar 27, 2025
- b4978 published Mar 27, 2025
- b4980 published Mar 27, 2025
- b4981 published Mar 28, 2025
- b4982 published Mar 28, 2025
- b4984 published Mar 28, 2025
- b4985 published Mar 28, 2025
- b4986 published Mar 28, 2025
- b4987 published Mar 28, 2025
- b4988 published Mar 28, 2025
- b4990 published Mar 29, 2025
- b4991 published Mar 29, 2025
- b4992 published Mar 29, 2025
- b4997 published Mar 30, 2025
- b4998 published Mar 30, 2025
- b4999 published Mar 30, 2025
- b5001 published Mar 30, 2025
- b5002 published Mar 30, 2025
- b5003 published Mar 31, 2025
- b5004 published Mar 31, 2025
- b5005 published Mar 31, 2025
- b5006 published Mar 31, 2025
- b5009 published Mar 31, 2025
- b5010 published Mar 31, 2025
- b5012 published Mar 31, 2025
- b5013 published Mar 31, 2025
53 Pull requests merged by 30 people
- convert : BailingMoE : avoid setting rope_dim to 0 (#12678, merged Mar 31, 2025)
- vocab : add special infill tokens for CodeLlama (#11850, merged Mar 31, 2025)
- Faster ssm scan (#10558, merged Mar 31, 2025)
- convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667, merged Mar 31, 2025)
- Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135, merged Mar 31, 2025)
- sync : ggml (#12670, merged Mar 31, 2025)
- llava : proper description fix (#12668, merged Mar 31, 2025)
- SYCL: Remove misleading ggml_sycl_op_flatten function (#12387, merged Mar 31, 2025)
- llava : fix clip loading GGUFs with missing description (#12660, merged Mar 31, 2025)
- llama-tts refactor console output (#12640, merged Mar 31, 2025)
- llama : support BailingMoE (Ling) (#12634, merged Mar 30, 2025)
- metal : use constexpr in FA kernels + fix typedef (#12659, merged Mar 30, 2025)
- Add Trillion 7B model support (#12556, merged Mar 30, 2025)
- Add Yandex instruct model template support (#12621, merged Mar 30, 2025)
- musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611, merged Mar 30, 2025)
- sync : ggml (#12645, merged Mar 30, 2025)
- llama : fix non-causal mask for gemma 3 (#12615, merged Mar 29, 2025)
- change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632, merged Mar 29, 2025)
- cmake: fix ccache conflict (#12522, merged Mar 29, 2025)
- [CANN]: remove clang-format in ggml-cann (#12607, merged Mar 29, 2025)
- llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631, merged Mar 28, 2025)
- metal : improve FA + improve MoE (#12612, merged Mar 28, 2025)
- vulkan: fix coopmat shader generation when cross-compiling (#12272, merged Mar 28, 2025)
- llama: fix error on bad grammar (#12628, merged Mar 28, 2025)
- Include speculative decoding stats when timings_per_token is enabled (#12603, merged Mar 28, 2025)
- rpc : update README for cache usage (#12620, merged Mar 28, 2025)
- llamafile : ppc64le GEMV forwarding for FP32 (#12594, merged Mar 28, 2025)
- rpc : send hash when tensor data is above some fixed threshold (#12496, merged Mar 28, 2025)
- server : Support listening on a unix socket (#12613, merged Mar 27, 2025)
- media : add SVG logo [no ci] (#12616, merged Mar 27, 2025)
- opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600, merged Mar 27, 2025)
- Add PLM GGUF Conversion & Inference Support (#12457, merged Mar 27, 2025)
- Fix T5Encoder model handling (#12590, merged Mar 27, 2025)
- Support Qwen2_5_VLForConditionalGeneration (#12595, merged Mar 27, 2025)
- sync : ggml (#12606, merged Mar 27, 2025)
- sync : ggml (#12604, merged Mar 27, 2025)
- llamafile : ppc64le MMA implementation for Q4_0 (#12489, merged Mar 27, 2025)
- ggml : riscv: add 128-bit RVV support (#12530, merged Mar 27, 2025)
- llama : make loras compatible with repacking (#12593, merged Mar 27, 2025)
- SYCL: implement memset ggml backend buffer interface (#12580, merged Mar 27, 2025)
- Add support for new gfx1200 and gfx1201 targets (#12372, merged Mar 26, 2025)
- metal : refactor mat-vec code (#12569, merged Mar 26, 2025)
- grammars: upgrade to llguidance 0.7.10 (#12576, merged Mar 26, 2025)
- clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566, merged Mar 26, 2025)
- convert : fix squeeze for ssm_conv tensors (#12573, merged Mar 26, 2025)
- ggml : fix MUL_MAT_ID repack with Q8_K (#12544, merged Mar 26, 2025)
- doc: [MUSA] minor changes (#12583, merged Mar 26, 2025)
- convert: fix Mistral3/Gemma3 model hparams init (#12571, merged Mar 25, 2025)
- De-duplicate fmt and format functions and optimize (#11596, merged Mar 25, 2025)
- ggml-cpu : bug fix related to KleidiAI multithreaded LHS packing (#12568, merged Mar 25, 2025)
- SYCL: disable Q4_0 reorder optimization by default (#12560, merged Mar 25, 2025)
- docs : add build instructions for KleidiAI (#12563, merged Mar 25, 2025)
- ci: [MUSA] add CI and update doc (#12562, merged Mar 25, 2025)
21 Pull requests opened by 17 people
- Enable MMA for BF16 data types on Powerpc (#12565, opened Mar 25, 2025)
- opencl: Add support for multiple devices (#12622, opened Mar 28, 2025)
- sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625, opened Mar 28, 2025)
- opencl: remove a self-referential macro (#12626, opened Mar 28, 2025)
- vulkan: Implement split_k for coopmat2 flash attention (#12627, opened Mar 28, 2025)
- vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630, opened Mar 28, 2025)
- llama-server : implement universal assisted decoding (#12635, opened Mar 28, 2025)
- tts : implement sesame CSM + Mimi decoder (#12648, opened Mar 29, 2025)
- opencl : fix memory allocation size (#12649, opened Mar 30, 2025)
- llama : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652, opened Mar 30, 2025)
- contrib: support modelscope community (#12664, opened Mar 31, 2025)
- update `rope_multi` (#12665, opened Mar 31, 2025)
- [CANN] get_rows and dup optimization (#12671, opened Mar 31, 2025)
- use LLM_KV instead of gguf_find_key (#12672, opened Mar 31, 2025)
- SYCL: switch to SYCL namespace (#12674, opened Mar 31, 2025)
- vocab : BailingMoE : change possessive quantifiers to greedy (#12677, opened Mar 31, 2025)
- WIP: Add support for CogAgent (#12679, opened Mar 31, 2025)
- gguf-split now respects dry-run option (#12681, opened Mar 31, 2025)
- vulkan: fix build when glslc doesn't support coopmat (#12683, opened Apr 1, 2025)
- Fix clang warning in gguf_check_reserved_keys (#12686, opened Apr 1, 2025)
- convert : BailingMoE : fix qkv split when head_dim is 0 (#12687, opened Apr 1, 2025)
48 Issues closed by 17 people
- Feature Request: Add support to deepseek vl2 (#11678, closed Apr 1, 2025)
- Eval bug: Qwerky QwQ 32B (rwkv6qwen2) failed to load (#12662, closed Mar 31, 2025)
- core dumped on riscv (#11537, closed Mar 31, 2025)
- Compile bug: Error while compiling llama.cpp (#11691, closed Mar 31, 2025)
- Eval bug: image encode time slow on mobile device (#11856, closed Mar 31, 2025)
- OLMoE Q4_0 quant does not work (#11862, closed Mar 31, 2025)
- Does llama.cpp deployment support multi-node multi-GPU? (#11865, closed Mar 31, 2025)
- Feature Request: APIkey (#11874, closed Mar 31, 2025)
- GGUF Model Missing `general.description` Key Causes Runtime Error in Qwen2-VL Instruct (#12658, closed Mar 31, 2025)
- Feature Request: support DeepSeek-V3's "Scaled ReLU or SwiGLU activation functions" (#12653, closed Mar 30, 2025)
- Eval bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' (#12644, closed Mar 30, 2025)
- Misc. bug: convert_hf_to_gguf.py Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv' (#12650, closed Mar 30, 2025)
- Misc. bug: examples/gguf/gguf.cpp always fails with data check (#12647, closed Mar 30, 2025)
- Compile bug: parameter packs not expanded with '...' (#11112, closed Mar 30, 2025)
- Misc. bug: llama-server `--ctx-size` is divided by `--parallel` and cannot be increased? (#11681, closed Mar 30, 2025)
- When running deepseek-r1-dynamic-1.58-bit, the KV cache question (#11757, closed Mar 30, 2025)
- Misc. bug: CUDA error: CUDA-capable device(s) is/are busy or unavailable from `cudaSetDevice(device)` (#11841, closed Mar 30, 2025)
- Eval bug: Gemma3 <unused32> spam (#12433, closed Mar 29, 2025)
- Misc. bug: llama-server does not print model loading errors by default (log level misconfigured?) (#11819, closed Mar 29, 2025)
- Misc. bug: llama-cli crash on ubuntu with GGML-VULKAN=ON (#11823, closed Mar 29, 2025)
- Misc. bug: Quantization process 100 times slower on Windows (dockerized) (#11825, closed Mar 29, 2025)
- [BENCHMARKS] DeepScaleR-1.5B-Preview F16 ollama GGUF vs llama.cpp (#11828, closed Mar 29, 2025)
- Feature Request: Direct way to check the status of the abort mechanism (#12525, closed Mar 28, 2025)
- Feature Request: RPC offloading using a local model copy (#10095, closed Mar 28, 2025)
- why assert(!isnan(wp[i])) in softmax_forward function (#12542, closed Mar 28, 2025)
- llama.cpp didn't use GPU to accelerate inference for gguf file (#12614, closed Mar 28, 2025)
- Misc. bug: Virus detected (#10768, closed Mar 28, 2025)
- Eval bug: [CANN] inference does not use NPU (#11799, closed Mar 28, 2025)
- Urgent Help Needed! Problems Encountered in Hybrid Inference Function Verification Based on llama.cpp (#11805, closed Mar 28, 2025)
- Eval bug: Incorrect n_gpu_layer settings for MoE models (#12596, closed Mar 27, 2025)
- Eval bug: T5Encoder support broken (#12588, closed Mar 27, 2025)
- Misc. bug: Server crash with use of lora on CPU (#12587, closed Mar 27, 2025)
- Compile bug: Fails to compile with undefined references in libggml.so (#11562, closed Mar 27, 2025)
- Eval bug: Abnormal memory usage on Metal backend (#12574, closed Mar 26, 2025)
- Eval bug: GPU Hang Error on Metal backend (#12277, closed Mar 26, 2025)
- Misc. bug: Falcon3-Mamba-7B fails on ggml_ssm_conv (#12572, closed Mar 26, 2025)
- Eval bug: Program not working properly due to new features of "repack Q4_K tensor" (#12528, closed Mar 26, 2025)
- Misc. bug: All llama executables exit immediately without console output (#10929, closed Mar 26, 2025)
- Eval bug: error: Double type is not supported on this platform (#11266, closed Mar 26, 2025)
- Feature Request: llama-server support continue_final_message (#11755, closed Mar 26, 2025)
- Misc. bug: embedding example coredump since (#12561, closed Mar 26, 2025)
- Misc. bug: Gemma3 adapter gguf conversion fails (#12551, closed Mar 25, 2025)
- GPT2: llama_model_load: error loading model: missing tensor 'output.weight' (#12567, closed Mar 25, 2025)
32 Issues opened by 29 people
- Compile bug: compilation warnings (clang) introduced in #10558 (#12685, opened Apr 1, 2025)
- Misc. bug: examples/gguf-split merge does not respect dry-run option (#12680, opened Mar 31, 2025)
- Eval bug: with -ub 8192 model llama-server insists running on GPU (#12675, opened Mar 31, 2025)
- Feature Request: Qwen2.5-Omni (#12673, opened Mar 31, 2025)
- Feature Request: Add support for StarVector-8b/1b (#12666, opened Mar 31, 2025)
- kv buffer on CPU? (#12663, opened Mar 31, 2025)
- Compile bug: How to compile only one example? (#12661, opened Mar 30, 2025)
- Misc. bug: Gibberish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit: 3d82dbcbce2c (#12657, opened Mar 30, 2025)
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655, opened Mar 30, 2025)
- Feature Request: Splitting layers according to VRAM usage on multi-GPU setups (#12654, opened Mar 30, 2025)
- Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration (#12642, opened Mar 29, 2025)
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641, opened Mar 29, 2025)
- Compile bug: there is a build bug in examples/llama.android and it will break the build in CI (#12638, opened Mar 29, 2025)
- Feature Request: Interleaved sliding window attention support for gemma 2 and 3 (#12637, opened Mar 29, 2025)
- Misc. bug: HIP: when using llama-bench with kv cache quant, the cpu is doing the work instead of the gpu (#12624, opened Mar 28, 2025)
- Misc. bug: (#12623, opened Mar 28, 2025)
- Compile bug: There was an error while compiling support for the Vulkan backend (#12619, opened Mar 28, 2025)
- Misc. bug: Data check in examples/gguf (#12617, opened Mar 27, 2025)
- Compile bug: SYCL backend build fails on debug config (#12602, opened Mar 27, 2025)
- [New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales (#12598, opened Mar 27, 2025)
- Misc. bug: "Unexpected empty grammar stack after accepting piece" tool crash (#12597, opened Mar 26, 2025)
- Eval bug: run failed when running a lora adapter (not merged) on android (#12592, opened Mar 26, 2025)
- Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"} (#12591, opened Mar 26, 2025)
- Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory (#12586, opened Mar 26, 2025)
- Qwen2.5-vl support and conversion? (#12584, opened Mar 26, 2025)
- Compile bug: vulkan-shaders-gen hangs when built with address sanitizers (#12581, opened Mar 26, 2025)
- -ngl to load "last n layers" to gpu (#12577, opened Mar 26, 2025)
- Misc. bug: performance drop with 2x SYCL GPUs (#12575, opened Mar 25, 2025)
- Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash (#12564, opened Mar 25, 2025)
80 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Apr 1, 2025 • 28 new comments)
- SYCL: Rename oneMKL to oneMath (#12192, commented on Apr 1, 2025 • 12 new comments)
- llama : add llama_batch_ext (#11875, commented on Mar 31, 2025 • 5 new comments)
- perplexity: Add option to ignore context window overflow errors and continue score calculation (#12512, commented on Mar 30, 2025 • 2 new comments)
- Misc. bug: RISCV output bug when using rvv with vlen > 256bit (#11041, commented on Apr 1, 2025 • 0 new comments)
- Eval bug: Error running Phi4-mini gguf: unknown pre-tokenizer type: 'gpt-4o' (#12122, commented on Apr 1, 2025 • 0 new comments)
- Eval bug: In RISC-V, output tokens are broken (#12124, commented on Apr 1, 2025 • 0 new comments)
- Feature Request: (#12128, commented on Apr 1, 2025 • 0 new comments)
- Feature request: Graphical GGUF viewer (#6715, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: Command A only outputs 88888888 with -fa (#12441, commented on Mar 31, 2025 • 0 new comments)
- ggml : refactor ggml-cpu.c into multiple C++ source files (#10180, commented on Mar 31, 2025 • 0 new comments)
- llama cpp android gpu (#12462, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: llama-cli '--log-disable' parameter omits response (#11983, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: model producing gibberish for Orion14b-chat (#12411, commented on Mar 31, 2025 • 0 new comments)
- Feature Request: Support for Qwen2-VL (#9246, commented on Mar 31, 2025 • 0 new comments)
- Compile bug: fatal error: 'ggml.h' file not found (#12101, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: When using streaming output, if stream_options={"include_usage": True} is not set, the returned result should not include usage stats (#12102, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: While running llama-simple-chat, it throws "context size exceeded" (#12113, commented on Mar 31, 2025 • 0 new comments)
- Misc. bug: Server web UI: Complete output is lost due to the "normal" context shift message (#12120, commented on Mar 31, 2025 • 0 new comments)
- Eval bug: llama-qwen2vl-cli --log-disable rather disables the response, not the log (#12407, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: Emulated Linux ARM64 CPU build fails (#10933, commented on Mar 30, 2025 • 0 new comments)
- Regarding llama-bench and llama-parallel commands (#12106, commented on Mar 30, 2025 • 0 new comments)
- ggml-quants : weighted rounding algorithms with cumulative search (#12557, commented on Mar 30, 2025 • 0 new comments)
- llama-map to support hugepage feature of pagesize 2M or 1G which can … (#12552, commented on Mar 31, 2025 • 0 new comments)
- quantize: Handle user-defined quantization levels for additional tensors (#12511, commented on Mar 31, 2025 • 0 new comments)
- (draft) tts: Orpheus support (#12487, commented on Mar 28, 2025 • 0 new comments)
- Metal TQ2_0 (#12485, commented on Mar 30, 2025 • 0 new comments)
- Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466, commented on Mar 28, 2025 • 0 new comments)
- ci: add Linux cross-compile build (#12428, commented on Mar 31, 2025 • 0 new comments)
- [WIP] MUSA: enable fastfp16, correct warp reduce impl and perf tuning (#12383, commented on Mar 30, 2025 • 0 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on Mar 28, 2025 • 0 new comments)
- Supporting Velvet model (#11716, commented on Mar 26, 2025 • 0 new comments)
- Add support for Deepseek-R1 flash attention (#11557, commented on Mar 26, 2025 • 0 new comments)
- Optimized DeepSeek V2/V3 implementation (MLA) (#11446, commented on Mar 31, 2025 • 0 new comments)
- llama : add option to override model tensor buffers (#11397, commented on Mar 27, 2025 • 0 new comments)
- add FP8 support to gguf/llama (#10055, commented on Mar 29, 2025 • 0 new comments)
- Simplify and improve CUDA graphs through use of indirect copy pointers (#9017, commented on Mar 31, 2025 • 0 new comments)
- Feature Request: Qwen 2.5 VL (#11483, commented on Apr 1, 2025 • 0 new comments)
- llama-gemma3-cli: output degeneration after repeated uses (#12499, commented on Apr 1, 2025 • 0 new comments)
- Misc. bug: CUDA errors with multi-threaded use (#11804, commented on Apr 1, 2025 • 0 new comments)
- Compile bug: Build failure on VirtualBox: ggml-cpu-aarch64.cpp invalid conversion error (#11783, commented on Mar 28, 2025 • 0 new comments)
- Misc. bug: ggml-backend.cpp:746: pre-allocated tensor (cache_k_l0 (view) (copy of cache_k_l0 (view))) in a buffer (Vulkan0) that cannot run the operation (CPY) (#12045, commented on Mar 28, 2025 • 0 new comments)
- Misc. bug: Crashing, forcing BMI2 on non-BMI2 CPUs (#12500, commented on Mar 27, 2025 • 0 new comments)
- ggml : add ANE backend (#10453, commented on Mar 27, 2025 • 0 new comments)
- Bug: Cannot run larger than VRAM models with `GGML_CUDA_ENABLE_UNIFIED_MEMORY` (#10091, commented on Mar 27, 2025 • 0 new comments)
- Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used (#10860, commented on Mar 27, 2025 • 0 new comments)
- Compile bug: (#11930, commented on Mar 27, 2025 • 0 new comments)
- Feature Request: encoding_image_with_clip is very slow when running minicpmv inference (#11941, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: context shift is disabled (#11974, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: Error when converting moonlight from bf16 to q4km (#12040, commented on Mar 27, 2025 • 0 new comments)
- Compile bug: llama.cpp-b4749/ggml/src/ggml-cpu/ggml-cpu-quants.c:5141:26: error: initialization of 'uint32_t *' {aka 'unsigned int *'} from incompatible pointer type 'const uint8_t (*)[12]' {aka 'const unsigned char (*)[12]'} [-Wincompatible-pointer-types] (#12050, commented on Mar 27, 2025 • 0 new comments)
- Misc. bug: cannot scroll to right side when input too long (#12054, commented on Mar 27, 2025 • 0 new comments)
- Eval bug: the swiftui keeps saying the same thing (#12558, commented on Mar 26, 2025 • 0 new comments)
- Possible solution for poor token generation performance in llama.cpp on dual Epyc Genoa/Turin systems (#11744, commented on Mar 26, 2025 • 0 new comments)
- Misc. bug: auto scroll doesn't work in WebUI (#12362, commented on Mar 25, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Mar 25, 2025 • 0 new comments)
- Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5 (#12553, commented on Mar 25, 2025 • 0 new comments)
- Study how LM Evaluation Harness works and try to implement it (#231, commented on Mar 25, 2025 • 0 new comments)
- Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906)) (#12369, commented on Mar 25, 2025 • 0 new comments)
- Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights on Atlas 800T A2 NPU device (#11966, commented on Mar 25, 2025 • 0 new comments)
- csm : implement Sesame-based conversation example (#12392, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: iOS version able to build but not able to run (#10922, commented on Mar 30, 2025 • 0 new comments)
- "CPU_AARCH64 model buffer" appears when not using AARCH64 (#11204, commented on Mar 30, 2025 • 0 new comments)
- Feature Request: NUMA-aware MoE Expert Allocation for Improved Performance (#11333, commented on Mar 30, 2025 • 0 new comments)
- Feature Request: resize an existing context (#11577, commented on Mar 30, 2025 • 0 new comments)
- Eval bug: granite-vision-3.1-2b-preview ERROR:hf-to-gguf:Model LlavaNextForConditionalGeneration is not supported (#12053, commented on Mar 30, 2025 • 0 new comments)
- Compile bug: Failed to compile on centos8 system (#12092, commented on Mar 30, 2025 • 0 new comments)
- tts : add support for Orpheus (#12476, commented on Mar 29, 2025 • 0 new comments)
- kubernetes example (#6546, commented on Mar 29, 2025 • 0 new comments)
- Move gguf fuzzers to the llama.cpp repository (#11514, commented on Mar 29, 2025 • 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: llama.cpp CPU bound while inferencing against DeepSeek-R1 GGUF (#11635, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: does the rpc backend support cpu? (#11807, commented on Mar 29, 2025 • 0 new comments)
- Enhancement: Improve ROCm performance on various quants (benchmarks included) (#11931, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: TikTokenTokenizer has no attribute vocab (#12044, commented on Mar 29, 2025 • 0 new comments)
- Misc. bug: llama-cli llama_backend_free may not free all the gpu memory (#12057, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: MUSA error: operation not supported (#12077, commented on Mar 29, 2025 • 0 new comments)
- Misc. bug: Loop range computation question of Vulkan matmul shaders (#12082, commented on Mar 29, 2025 • 0 new comments)
- Compile bug: How to compile llama.cpp with Vulkan for android device (#11695, commented on Mar 29, 2025 • 0 new comments)
- Eval bug: Phi-4 mini in iOS with xcframework (#12232, commented on Mar 28, 2025 • 0 new comments)