Changes from all commits (294 commits)
a23b9bd
ggml : fix unaligned access in AMX code (#16315)
ggerganov Oct 6, 2025
3a002af
ci : refactor sdk caching to minimize storage (#16414)
CISC Oct 6, 2025
c08002a
chat : Granite Docling stopping (#16438)
gabe-l-hart Oct 6, 2025
3df2244
llama : add --no-host to disable host buffers (#16310)
Gadflyii Oct 6, 2025
8ae32dc
metal : various optimizations + refactoring (#16446)
ggerganov Oct 7, 2025
1d6092f
tests : add -INF blocks to the KQ mask in the FA tests (#16380)
ggerganov Oct 7, 2025
0a319bb
metal : add support for non-padded FA KV (#16148)
ggerganov Oct 7, 2025
0123ff3
memory : use sequential equal splits for recurrent modules (#16442)
ggerganov Oct 7, 2025
c61ae20
rpc : update documentation (#16441)
rgerganov Oct 7, 2025
ef4c5b8
presets : fix pooling param for embedding models (#16455)
ggerganov Oct 7, 2025
4e0388a
webui : added download action (#13552) (#16282)
srogmann Oct 7, 2025
df1b612
server : add `/v1/health` endpoint (#16461)
ggerganov Oct 7, 2025
aeaf8a3
llama : support LiquidAI LFM2-MoE hybrid model (#16464)
tdakhran Oct 7, 2025
74b8fc1
ggml webgpu: profiling, CI updates, reworking of command submission (…
reeselevine Oct 7, 2025
7fdd16b
server : improve context checkpoint logic (#16440)
ggerganov Oct 8, 2025
b2c08c9
metal : mark FA blocks (#16372)
ggerganov Oct 8, 2025
d2ee056
server : fix cancel pending task (#16467)
issixx Oct 8, 2025
9d08828
Disable CUDA host buffers on integrated GPUs (#16308)
ai-fonsi Oct 8, 2025
12bbc3f
refactor: centralize CoT parsing in backend for streaming mode (#16394)
ServeurpersoCom Oct 8, 2025
e08db42
model: EmbeddingGemma Adding Support for SentenceTransformers Dense M…
sfallah Oct 9, 2025
b260213
[SYCL] refactor soft_max, add soft_max_back (#16472)
NeoZhangJianyu Oct 9, 2025
d80d6d2
kleidiai: kernel interface refactoring (#16460)
chaxu01 Oct 9, 2025
aa4711d
CANN: Improve ACL graph matching (#16166)
noemotiovon Oct 9, 2025
2c0d875
ci: add ARM64 Kleidiai build and test support (#16462)
sudhiarm Oct 9, 2025
56b4795
model-conversion : add support for SentenceTransformers (#16387)
danbev Oct 9, 2025
8328fd4
No markdown in cot (#16483)
ServeurpersoCom Oct 9, 2025
d00cbea
server : host-memory prompt caching (#16391)
ggerganov Oct 9, 2025
1deee0f
cpu : optimize the ggml NORM operation (#15953)
duduta Oct 9, 2025
1faa13a
webui: updated the chat service to only include max_tokens in the req…
ServeurpersoCom Oct 9, 2025
6d69ab3
cmake : Dont define XOPENSOURCE on AIX (#16481)
mehendarkarprajwal Oct 10, 2025
cdb6da4
server : log requests to /v1/completions (#16495)
rgerganov Oct 10, 2025
68ee98a
server : return HTTP 400 if prompt exceeds context length (#16486)
rgerganov Oct 10, 2025
81086cd
vocab : mark EOT token for Granite models (#16499)
ggerganov Oct 10, 2025
e60f01d
server : fix division by zero when reporting stats (#16501)
ggerganov Oct 10, 2025
477a66b
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
amirai21 Oct 11, 2025
97870e6
cuda : avoid initializing unused devices (#16510)
slaren Oct 11, 2025
31d0ff1
server / ranking : add sorting and management of top_n (#16403)
YannFollet Oct 11, 2025
4a8fbe0
feat: render user content as markdown option (#16358)
ServeurpersoCom Oct 11, 2025
a3cb047
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
ggerganov Oct 11, 2025
11f0af5
CUDA: faster tile FA, add oob checks, more HSs (#16492)
JohannesGaessler Oct 11, 2025
20cc625
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
sirus20x6 Oct 12, 2025
a2fba89
hparams : add check for layer index in is_recurrent (#16511)
danbev Oct 12, 2025
41aac5c
ggml : Fix FP16 ELU positive branch (#16519)
sirus20x6 Oct 12, 2025
4b2dae3
common : update presets (#16504)
ggerganov Oct 12, 2025
2c301e9
common : handle unicode during partial json parsing (#16526)
aldehir Oct 12, 2025
8415f61
ci : add Vulkan on Ubuntu with default packages build (#16532)
mbaudier Oct 12, 2025
c7be9fe
[SYCL] fix UT fault cases: count-equal, argsort, pad OPs (#16521)
NeoZhangJianyu Oct 12, 2025
81d54bb
webui: remove client-side context pre-check and rely on backend for l…
ServeurpersoCom Oct 12, 2025
a31cf36
metal : add opt_step_adamw and op_sum (#16529)
cern1710 Oct 12, 2025
f9bc66c
CANN: Update several operators to support FP16 data format (#16251)
hipudding Oct 13, 2025
c515fc5
ggml : fix scalar path for computing norm (#16558)
ggerganov Oct 13, 2025
3f750f8
metal: add support for opt_step_sgd (#16539)
cern1710 Oct 13, 2025
1fb9504
fix: add remark plugin to render raw HTML as literal text (#16505)
ServeurpersoCom Oct 13, 2025
56fc38b
CANN: fix CPU memory leak in CANN backend (#16549)
noemotiovon Oct 13, 2025
01d2bdc
ggml : fix build broken with -march=armv9-a on MacOS (#16520)
DamonFool Oct 13, 2025
7049736
CUDA: fix numerical issues in tile FA kernel (#16540)
JohannesGaessler Oct 13, 2025
5016b72
opencl: fix build targeting CL 2 (#16554)
lhez Oct 13, 2025
e38b7c6
graph : support cacheless embeddings with FA and iSWA (#16528)
ggerganov Oct 13, 2025
e60f241
metal : FA support F32 K and V and head size = 32 (#16531)
ggerganov Oct 13, 2025
bc07349
server : dynamic token limit for prompt cache (#16560)
ggerganov Oct 14, 2025
5b6913c
cuda : remove legacy copy-op pointer indirection code (#16485)
anavp-nvidia Oct 14, 2025
48e2fa9
CUDA: add fp kernel for larger batch size MoE (#16512)
am17an Oct 14, 2025
1ee9d0b
CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557)
am17an Oct 14, 2025
9c7185d
CUDA: enable FA for FP32 KV cache (#16546)
JohannesGaessler Oct 14, 2025
7ea15bb
vulkan: Improve build time for MSVC (#16545)
jeffbolznv Oct 14, 2025
4258e0c
vulkan: Support FA with K/V in F32 (#16543)
jeffbolznv Oct 14, 2025
120bf70
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion …
am17an Oct 14, 2025
ffa0590
vulkan: Add ACC_TYPE_VEC2 implementation (#16203)
SavicStefan Oct 14, 2025
fa882fd
metal : avoid using Metal's gpuAddress property (#16576)
ggerganov Oct 14, 2025
554fd57
server : fix mtmd checkpoints (#16591)
ggerganov Oct 15, 2025
5acd455
CUDA: Changing the CUDA scheduling strategy to spin (#16585)
JTischbein Oct 15, 2025
3e3cb19
llama-quant: add support for mmproj (#16592)
ngxson Oct 15, 2025
17304cb
server : fix img token logs (#16595)
ggerganov Oct 15, 2025
f4ce81c
metal: optimise `GGML_OP_SUM` (#16559)
cern1710 Oct 15, 2025
f9fb33f
Add server-driven parameter defaults and syncing (#16515)
allozaur Oct 15, 2025
d93f843
opencl: fix FA for f32 (#16584)
lhez Oct 15, 2025
0cb7a06
opencl: add q8_0 mm support (#16469)
lhez Oct 15, 2025
466c191
cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083)
safranowith Oct 15, 2025
7adc79c
gguf-py : add support for endian conversion of BF16 data (#16594)
AlekseiNikiforovIBM Oct 15, 2025
ee50ee1
SYCL: Add GGML_OP_MEAN operator support (#16009)
yael-works Oct 16, 2025
adc9b60
ggml-cpu: replace putenv with setenv for const-correctness (#16573)
otegami Oct 16, 2025
6f5d924
common : Update the docs on -t --threads (#16236)
takasurazeem Oct 16, 2025
7a50cf3
CANN: format code using .clang-format (#15863)
noemotiovon Oct 16, 2025
b22572e
sycl : add ARANGE operator (#16362)
GittyBurstein Oct 16, 2025
683fa6b
fix: added a normalization step for MathJax-style \[\] and \(\) delim…
ServeurpersoCom Oct 16, 2025
1bb4f43
mtmd : support home-cooked Mistral Small Omni (#14928)
ngxson Oct 16, 2025
ceff6bb
SYCL SET operator optimized for F32 tensors (#16350)
GittyBurstein Oct 17, 2025
79967ec
grammar : use int64_t to avoid int overflows in int schema to grammar…
ochafik Oct 17, 2025
9ad4f19
metal : add `CONV_TRANSPOSE_2D` (#16542)
iliailmer Oct 17, 2025
b194915
vulkan: fix debug build (add_rms_len/data not found) (#16624)
jeffbolznv Oct 17, 2025
ababae7
webui: reorganize settings layout (#16607)
ServeurpersoCom Oct 17, 2025
342c728
ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)
muggle-stack Oct 17, 2025
3d4e86b
vulkan: Add State Space Model (SSM) Operations Support (#16463)
giuseppe Oct 17, 2025
41386cf
rpc : report actual free memory (#16616)
rgerganov Oct 17, 2025
66b0dbc
llama-model: fix insonsistent ctxs <-> bufs order (#16581)
JohannesGaessler Oct 17, 2025
8138785
opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)
shawngu-quic Oct 18, 2025
38355c6
CUDA: use registers instead of smem in topk-moe (#16647)
am17an Oct 18, 2025
e56abd2
vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)
jeffbolznv Oct 18, 2025
ee09828
HIP: fix GPU_TARGETS (#16642)
JohannesGaessler Oct 18, 2025
55754be
CODEOWNERS: update for ggml-cuda/mmf (#16660)
am17an Oct 19, 2025
fcb235b
ci: include s390x release binaries (#16648)
taronaeo Oct 19, 2025
cec5edb
ci : avoid manual updates of docs/ops.md (#16663)
CISC Oct 19, 2025
4f73d0a
ci : fix binaries release failure for s390x (binaries may not work ye…
taronaeo Oct 19, 2025
0398752
model : add Granite Hybrid types (#16635)
giuseppe Oct 19, 2025
7062dd8
llama-context: only warn on pooling_type when user specified (#16674)
otegami Oct 20, 2025
2330de7
SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16…
safranowith Oct 20, 2025
72d53e6
readme: update bindings (#16651)
deadprogram Oct 20, 2025
06332e2
llama-batch: fix build fails with `-Werror=missing-braces` (#16614)
otegami Oct 20, 2025
13f2cfa
Enable per-conversation loading states to allow having parallel conve…
allozaur Oct 20, 2025
0e4a0cf
Import/Export UX improvements (#16619)
allozaur Oct 20, 2025
7906850
Prevent premature submission on IME input (#16673)
allozaur Oct 20, 2025
b617cfd
ggml-alloc : fix leak when reusing a tensor with a larger size (#16679)
slaren Oct 20, 2025
c9c1972
Handle legacy 'context' attachments (#16687)
allozaur Oct 20, 2025
84bf3c6
model : add BailingMoeV2 support (#16063)
CISC Oct 20, 2025
6de8ed7
sycl : add PAD_REFLECT_D1 operator support (#16145)
ye-NX Oct 20, 2025
fb34984
vulkan: Handle FA with all -inf mask values (#16447)
jeffbolznv Oct 21, 2025
6ea37f5
opencl: fix warnings and clean up profiling (#16688)
lhez Oct 21, 2025
4926419
ggml: add ggml_can_fuse_subgraph (#16662)
am17an Oct 21, 2025
51d1a8c
CUDA: better error for FA kernel with 0 occupancy (#16643)
JohannesGaessler Oct 21, 2025
03792ad
CUDA: topk-moe: add optional parameter for gpt-oss (#16649)
am17an Oct 21, 2025
9285325
CUDA: fix bug in topk-moe softmax (#16711)
am17an Oct 22, 2025
d8eaa26
tests : fix test-thread-safety when compiling with multiple backends …
Acly Oct 22, 2025
19a5a3e
ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…
sirus20x6 Oct 22, 2025
9b9201f
webui: introduce OpenAI-compatible model selector in JSON payload (#1…
ServeurpersoCom Oct 22, 2025
a2e0088
Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectoriz…
slaren Oct 22, 2025
63d2fc4
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
max-krasnyansky Oct 22, 2025
9de9672
sycl: use async memory allocation to fix crashes during graph recordi…
mmichel11 Oct 23, 2025
8cf6b42
server : send partial stop string when <EOG> is reached (#15007)
matteoserva Oct 23, 2025
061f0ef
ggml-cuda: use passed ops instead of hardcoded ops (#16712)
am17an Oct 23, 2025
fe6a988
Manually link -lbsd to resolve flock symbol on AIX (#16610)
mehendarkarprajwal Oct 23, 2025
d0660f2
mtmd-cli : allow using --jinja (#16718)
ngxson Oct 23, 2025
dd62dcf
convert : Make mistral-common dependency optional (#16738)
juliendenize Oct 23, 2025
0bf47a1
server: add memory breakdown print (#16740)
JohannesGaessler Oct 23, 2025
f8f071f
convert : handle pre-quantized models (#14810)
compilade Oct 23, 2025
5a91109
model-conversion : add trust_remote_code for orig model run [no ci] (…
danbev Oct 24, 2025
69e9ff0
webui: support q URL parameter (#16728)
odrling Oct 24, 2025
0bcb40b
CUDA: use CUB for arbitary size argsort (#16754)
am17an Oct 24, 2025
55945d2
ggml: fix CUDA grid launch condition for large block_nums.y in binbca…
leejet Oct 24, 2025
5cca254
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756)
compilade Oct 25, 2025
8423d01
vulkan: Optimize SSM_SCAN (#16645)
jeffbolznv Oct 25, 2025
f90b4a8
vulkan: delete dead code (#16732)
giuseppe Oct 25, 2025
226f295
model : set res->t_embd in PLaMo2 models (#16766)
mitmul Oct 25, 2025
5d195f1
convert : handle mmproj filename/path properly (#16760)
Galunid Oct 25, 2025
3cfa9c3
vulkan: deduplicate Microsoft Direct3D12 devices (#16689)
giladgd Oct 26, 2025
f77c13b
CUDA: General GEMV fusion (#16715)
am17an Oct 26, 2025
8d88628
docs : add Jamba to Text-only models list (#16778)
amirai21 Oct 26, 2025
7cce4f8
model : set res->t_embd in SmallThinker models (#16782)
CISC Oct 26, 2025
f696428
graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16…
CISC Oct 26, 2025
73a48c9
convert : enable expert group selection for all models with it (#16691)
CISC Oct 26, 2025
bbac6a2
ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs…
leejet Oct 26, 2025
bd562fe
cuda : use fast copy when src and dst are of different type and conti…
CISC Oct 26, 2025
3470a5c
ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788)
Acly Oct 26, 2025
75d33b9
CUDA: support for weight clamp in top-k norm (#16702)
am17an Oct 27, 2025
59fc1ec
sycl: add REPEAT_BACK operation support (#16734)
shani-f Oct 27, 2025
2b9bd9b
sycl: add ROLL operation support (#16665)
tamarPal Oct 27, 2025
75cbdd3
test-backend-ops: print failed tests at the end (#16785)
am17an Oct 27, 2025
945501f
llama: fix leaked buffers for mmap + split files (#16765)
JohannesGaessler Oct 27, 2025
c55d53a
model : add LightOnOCR-1B model (#16764)
ngxson Oct 27, 2025
80d28f1
HIP: fix AMDGPU_TARGETS, update documentation (#16803)
JohannesGaessler Oct 27, 2025
10640e3
ggml : fix interpolate with align-corners and ne=1 (#16700)
Acly Oct 27, 2025
5a4ff43
llama : disable pipeline parallelism if compute buffer allocation fai…
slaren Oct 27, 2025
e1ab084
mtmd : fix idefics3 preprocessing (#16806)
ngxson Oct 27, 2025
c053e18
chat: Add LFM2 tool handling (#16763)
ykhrustalev Oct 27, 2025
ad8d36b
sycl: add SSM_CONV operation support (#16800)
tamarPal Oct 28, 2025
463bbf2
CUDA: add unused vars to mmvf and mmvq (#16807)
am17an Oct 28, 2025
3479efd
CANN: Improve device ID handling and aclnnArange checks (#16752)
noemotiovon Oct 28, 2025
280d97b
grammar : support array references in json schema (#16792)
aldehir Oct 28, 2025
7a0e900
llama: consistent ctx <-> buf order for KV cache (#16746)
JohannesGaessler Oct 28, 2025
1c1409e
embedding: add raw option for --embd-output-format (#16541)
SamMalayek Oct 28, 2025
8284efc
initialise buffer.device in ggml_hexagon_session (#16816)
l3utterfly Oct 28, 2025
a8ca18b
llama-bench : clarify benchmarked parts of the computation (#16823)
ggerganov Oct 28, 2025
85a7d86
memory : remove KV cache size padding (#16812)
ggerganov Oct 28, 2025
851553e
cuda: add SET operation support (#16804)
YaelGitAccount Oct 28, 2025
338074c
sycl: add RMS_NORM_BACK operation support (#16808)
YaelLogic Oct 29, 2025
9a3ea68
CUDA: Fix bug in topk-moe for gpt-oss (#16821)
am17an Oct 29, 2025
f549b00
vulkan: Call ggml_vk_buffer_write_2d from ggml_vk_buffer_copy (#16793)
jeffbolznv Oct 29, 2025
144a4ce
vendor : sync minja (#16500)
CISC Oct 29, 2025
e41bcce
CUDA: use fastdiv in set-rows (#16834)
am17an Oct 29, 2025
3eb2be1
Hexagon Op queue & dispatch optimizations (#16820)
max-krasnyansky Oct 29, 2025
bcf5bda
Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536)
0cc4m Oct 29, 2025
10fcc41
vulkan: Update topk_moe fusion to handle gpt's late softmax (#16656)
jeffbolznv Oct 29, 2025
e3af556
llama: store mrope data in KV cell (#16825)
ngxson Oct 29, 2025
3464bda
llama: fix ASAN error with M-RoPE (#16848)
ngxson Oct 29, 2025
b9ce940
vulkan: Fuse rope+set_rows (#16769)
jeffbolznv Oct 29, 2025
8b11dee
Hide latency of bias and gate-loading (#16847)
ORippler Oct 30, 2025
052df28
vulkan: Handle argsort with a large number of rows (#16851)
jeffbolznv Oct 30, 2025
d739511
llama : use std::abs instead of abs (#16853)
kaetemi Oct 30, 2025
229bf68
cuda : fix argsort with 64k+ rows (#16849)
CISC Oct 30, 2025
bacddc0
model: Add support for CogVLM model (#15002)
Tianyue-Zhao Oct 30, 2025
dcca0d3
cpu: introduce chunking for flash attention (#16829)
max-krasnyansky Oct 30, 2025
d261223
model: add support for qwen3vl series (#16780)
JJJYmmm Oct 30, 2025
835e918
common: fix typo in cli help text (#16864)
sbera77 Oct 30, 2025
517b717
cpu: introduce chunking for repack matmuls and enable matmul-id chunk…
max-krasnyansky Oct 30, 2025
b52edd2
server : remove n_past (#16818)
ggerganov Oct 30, 2025
16724b5
server : bump request URI max length to 32768 (#16862)
chansikpark Oct 30, 2025
ce18efe
convert : update transformers requirements (#16866)
RodriMora Oct 30, 2025
9984cbb
opencl: fix boundary handling for mul_mm (#16875)
lhez Oct 30, 2025
6eb208d
ci : enable free-disk-space on cuda docker build (#16877)
CISC Oct 30, 2025
13002a0
ggml-hexagon: respect input size when getting/setting tensor data (#1…
l3utterfly Oct 31, 2025
d2a2673
vulkan: fix shmem overrun in mmq id shader (#16873)
0cc4m Oct 31, 2025
2976b03
vulkan: Fix crash when FP16 mul_mat accumulation is not supported (#1…
rillomas Oct 31, 2025
d2d931f
vulkan: disable spirv-opt for rope shaders (#16872)
jeffbolznv Oct 31, 2025
0f715b4
server : fix typos in server.cpp comments [no ci] (#16883)
danbev Oct 31, 2025
c22473b
server : don't print user inputs to console (#16871)
ggerganov Oct 31, 2025
8da3c0e
batch : fix consistency checks for the input positions (#16890)
ggerganov Oct 31, 2025
4146d6a
CUDA: add expert reduce kernel (#16857)
am17an Oct 31, 2025
6d39015
sync : ggml
ggerganov Oct 31, 2025
31c511a
CUDA: Volta tensor core support for MMF (#16843)
JohannesGaessler Oct 31, 2025
e58d585
model : add Granite Hybrid nano types (#16896)
giuseppe Oct 31, 2025
0de0a01
model : Minimax M2 (#16831)
pwilkin Oct 31, 2025
bea0452
refactor : llama-model.cpp (#16252)
pwilkin Oct 31, 2025
d3dc9dd
CUDA: Remove unneded bias/gate dims in fused mmvq (#16858)
ORippler Nov 1, 2025
2e76e01
vulkan: fuse mul_mat+add and mul_mat_id+add_id (#16868)
jeffbolznv Nov 1, 2025
5d8bb90
vulkan: Fix multi_add invalid descriptor usage (#16899)
jeffbolznv Nov 1, 2025
74fef41
codeowners : update after refactor (#16905)
CISC Nov 1, 2025
961660b
common : allow --system-prompt-file for diffusion-cli (#16903)
CISC Nov 1, 2025
1ae7488
webui: recognize AsciiDoc files as valid text files (#16850)
jhradilek Nov 1, 2025
d8b860a
Add a setting to display message generation statistics (#16901)
allozaur Nov 1, 2025
cf659bb
mtmd: refactor preprocessing + support max/min pixels (#16878)
ngxson Nov 1, 2025
dd5e8ca
vendor : update cpp-httplib to 0.27.0 (#16846)
angt Nov 1, 2025
e4a7159
webui: add HTML/JS preview support to MarkdownContent with sandboxed …
ServeurpersoCom Nov 1, 2025
2f68ce7
webui: auto-refresh /props on inference start to resync model metadat…
ServeurpersoCom Nov 1, 2025
7fd205a
scripts : add script to bench models (#16894)
ggerganov Nov 1, 2025
d38d9f0
ggml: add s390x cpu-feats (#16774)
taronaeo Nov 2, 2025
a864132
devops: fix failing s390x docker build (#16918)
taronaeo Nov 2, 2025
7db35a7
CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (#16917)
mnehete32 Nov 2, 2025
76af40a
docs: remove llama_sampler_accept reference in sampling sample usage …
alundb Nov 2, 2025
87c9efc
common : move gpt-oss reasoning processing to init params (#16937)
aldehir Nov 2, 2025
cd5e3b5
server : support unified cache across slots (#16736)
ggerganov Nov 2, 2025
2f966b8
clip : use FA (#16837)
ggerganov Nov 2, 2025
6b9a524
model: add Janus Pro for image understanding (#16906)
ravenouse Nov 2, 2025
dd52868
ci : disable failing riscv cross build (#16952)
CISC Nov 2, 2025
a2054e3
test-backend-ops : fix segfault in moe-expert-reduce test in support …
sbera77 Nov 2, 2025
bcfa876
feat(webui): improve LaTeX rendering with currency detection (#16508)
srogmann Nov 2, 2025
7e99416
SYCL: optimized repeat_back kernel (3× fewer asm instructions, 2× fas…
shani-f Nov 3, 2025
ee3a5a1
sync: minja (glm 4.6 & minmax m2 templates) (#16949)
ochafik Nov 3, 2025
fcfce04
ggml : LoongArch fixes (#16958)
MQ-mengqing Nov 3, 2025
bf7b0c9
mtmd: pad mask for qwen2.5vl (#16954)
ngxson Nov 3, 2025
070ff4d
mtmd: add --image-min/max-tokens (#16921)
ngxson Nov 3, 2025
622cd01
ggml: CUDA: add head size 72 for flash-attn (#16962)
theo77186 Nov 3, 2025
48bd265
server : add props.model_alias (#16943)
ggerganov Nov 3, 2025
ed8aa63
model-conversion : pass config to from_pretrained (#16963)
danbev Nov 3, 2025
e7da30b
fix: Viewing multiple PDF attachments (#16974)
allozaur Nov 3, 2025
c5023da
opencl: support imrope (#16914)
lhez Nov 3, 2025
2759ccd
CUDA: avoid mul + bias fusion when doing fusion (#16935)
am17an Nov 4, 2025
1f5accb
Fix garbled output with REPACK at high thread counts (#16956)
NoahOksuz Nov 4, 2025
b164259
chore : fix models indent after refactor (#16992)
CISC Nov 4, 2025
d945834
ci : apply model label to models (#16994)
CISC Nov 4, 2025
cc98f8d
ggml-cpu : bicubic interpolation (#16891)
Acly Nov 4, 2025
2fcd1a0
Merge ggml-org/llama.cpp master into amd-integration
cpietsch Nov 4, 2025
6 changes: 3 additions & 3 deletions .devops/intel.Dockerfile
@@ -1,8 +1,8 @@
-ARG ONEAPI_VERSION=2025.1.1-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.2.2-0-devel-ubuntu24.04
 
 ## Build Image
 
-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS build
 
 ARG GGML_SYCL_F16=OFF
 RUN apt-get update && \
@@ -31,7 +31,7 @@ RUN mkdir -p /app/full \
     && cp requirements.txt /app/full \
     && cp .devops/tools.sh /app/full/tools.sh
 
-FROM intel/oneapi-basekit:$ONEAPI_VERSION AS base
+FROM intel/deep-learning-essentials:$ONEAPI_VERSION AS base
 
 RUN apt-get update \
     && apt-get install -y libgomp1 curl\
4 changes: 0 additions & 4 deletions .devops/nix/package.nix
@@ -128,10 +128,6 @@ effectiveStdenv.mkDerivation (finalAttrs: {
   };
 
   postPatch = ''
-    substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
-      --replace '[bundle pathForResource:@"ggml-metal" ofType:@"metal"];' "@\"$out/bin/ggml-metal.metal\";"
-    substituteInPlace ./ggml/src/ggml-metal/ggml-metal.m \
-      --replace '[bundle pathForResource:@"default" ofType:@"metallib"];' "@\"$out/bin/default.metallib\";"
   '';
 
   # With PR#6015 https://github.com/ggml-org/llama.cpp/pull/6015,
10 changes: 4 additions & 6 deletions .devops/rocm.Dockerfile
@@ -1,8 +1,8 @@
 ARG UBUNTU_VERSION=24.04
 
 # This needs to generally match the container host's environment.
-ARG ROCM_VERSION=6.4
-ARG AMDGPU_VERSION=6.4
+ARG ROCM_VERSION=7.0
+ARG AMDGPU_VERSION=7.0
 
 # Target the ROCm build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete
@@ -13,9 +13,8 @@ FROM ${BASE_ROCM_DEV_CONTAINER} AS build
 # Unless otherwise specified, we make a fat build.
 # List from https://github.com/ggml-org/llama.cpp/pull/1087#issuecomment-1682807878
 # This is mostly tied to rocBLAS supported archs.
-# gfx803, gfx900, gfx1032, gfx1101, gfx1102,not officialy supported
-# gfx906 is deprecated
-#check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html
+# gfx803, gfx900, gfx906, gfx1032, gfx1101, gfx1102,not officialy supported
+# check https://rocm.docs.amd.com/projects/install-on-linux/en/docs-6.4.1/reference/system-requirements.html
 
 ARG ROCM_DOCKER_ARCH='gfx803;gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1010;gfx1030;gfx1032;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201;gfx1151'
 #ARG ROCM_DOCKER_ARCH='gfx1151'
@@ -42,7 +41,6 @@ RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
     cmake -S . -B build \
         -DGGML_HIP=ON \
         -DGGML_HIP_ROCWMMA_FATTN=ON \
-        -DCMAKE_HIP_FLAGS="-I$(pwd)/rocwmma/library/include/" \
         -DAMDGPU_TARGETS="$ROCM_DOCKER_ARCH" \
         -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON \
         -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \
5 changes: 4 additions & 1 deletion .devops/s390x.Dockerfile
@@ -24,8 +24,9 @@ RUN --mount=type=cache,target=/root/.ccache \
         -DCMAKE_C_COMPILER_LAUNCHER=ccache \
         -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
         -DLLAMA_BUILD_TESTS=OFF \
-        -DGGML_BACKEND_DL=OFF \
         -DGGML_NATIVE=OFF \
+        -DGGML_BACKEND_DL=ON \
+        -DGGML_CPU_ALL_VARIANTS=ON \
         -DGGML_BLAS=ON \
         -DGGML_BLAS_VENDOR=OpenBLAS && \
     cmake --build build --config Release -j $(nproc) && \
@@ -103,6 +104,7 @@ FROM base AS light
 WORKDIR /llama.cpp/bin
 
 # Copy llama.cpp binaries and libraries
+COPY --from=collector /llama.cpp/bin/*.so /llama.cpp/bin
 COPY --from=collector /llama.cpp/bin/llama-cli /llama.cpp/bin
 
 ENTRYPOINT [ "/llama.cpp/bin/llama-cli" ]
@@ -116,6 +118,7 @@ ENV LLAMA_ARG_HOST=0.0.0.0
 WORKDIR /llama.cpp/bin
 
 # Copy llama.cpp binaries and libraries
+COPY --from=collector /llama.cpp/bin/*.so /llama.cpp/bin
 COPY --from=collector /llama.cpp/bin/llama-server /llama.cpp/bin
 
 EXPOSE 8080
36 changes: 36 additions & 0 deletions .github/actions/install-exe/action.yml
@@ -0,0 +1,36 @@
name: "Install exe"
description: "Download and install exe"
inputs:
  url:
    description: "URL of the exe installer"
    required: true
  args:
    description: "Installer arguments"
    required: true
  timeout:
    description: "Timeout (in ms)"
    required: false
    default: "600000"

runs:
  using: "composite"
  steps:
    - name: Install EXE
      shell: pwsh
      run: |
        $ErrorActionPreference = "Stop"
        write-host "Downloading Installer EXE"
        Invoke-WebRequest -Uri "${{ inputs.url }}" -OutFile "${env:RUNNER_TEMP}\temp-install.exe"
        write-host "Installing"
        $proc = Start-Process "${env:RUNNER_TEMP}\temp-install.exe" -ArgumentList '${{ inputs.args }}' -NoNewWindow -PassThru
        $completed = $proc.WaitForExit(${{ inputs.timeout }})
        if (-not $completed) {
          Write-Error "Installer timed out. Killing the process"
          $proc.Kill()
          exit 1
        }
        if ($proc.ExitCode -ne 0) {
          Write-Error "Installer failed with exit code $($proc.ExitCode)"
          exit 1
        }
        write-host "Completed installation"
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-spacemit/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup SpacemiT Toolchain"
description: "Setup SpacemiT Toolchain for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version:
    description: "SpacemiT toolchain version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup SpacemiT Toolchain
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v${{ inputs.version }}.tar.xz
        path: ${{ inputs.path }}
        strip: 1
20 changes: 20 additions & 0 deletions .github/actions/linux-setup-vulkan/action.yml
@@ -0,0 +1,20 @@
name: "Linux - Setup Vulkan SDK"
description: "Setup Vulkan SDK for Linux"
inputs:
  path:
    description: "Installation path"
    required: true
  version:
    description: "Vulkan SDK version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup Vulkan SDK
      id: setup
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://sdk.lunarg.com/sdk/download/${{ inputs.version }}/linux/vulkan_sdk.tar.xz
        path: ${{ inputs.path }}
        strip: 1
27 changes: 27 additions & 0 deletions .github/actions/unarchive-tar/action.yml
@@ -0,0 +1,27 @@
name: "Unarchive tar"
description: "Download and unarchive tar into directory"
inputs:
  url:
    description: "URL of the tar archive"
    required: true
  path:
    description: "Directory to unarchive into"
    required: true
  type:
    description: "Compression type (tar option)"
    required: false
    default: "J"
  strip:
    description: "Strip components"
    required: false
    default: "0"

runs:
  using: "composite"
  steps:
    - name: Unarchive into directory
      shell: bash
      run: |
        mkdir -p ${{ inputs.path }}
        cd ${{ inputs.path }}
        curl --no-progress-meter ${{ inputs.url }} | tar -${{ inputs.type }}x --strip-components=${{ inputs.strip }}
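
A usage note: the type input is passed directly to tar as a compression flag, so the default "J" expects .tar.xz archives. A hypothetical step fetching a gzip archive (placeholder URL and path) would set it to "z":

    - name: Unpack example toolchain
      uses: ./.github/actions/unarchive-tar
      with:
        url: https://example.com/toolchain.tar.gz
        path: ./toolchain
        type: z # gzip instead of the xz default
        strip: 1

Both linux-setup-* actions above consume this action with the default xz handling.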
15 changes: 15 additions & 0 deletions .github/actions/windows-setup-rocm/action.yml
@@ -0,0 +1,15 @@
name: "Windows - Setup ROCm"
description: "Setup ROCm for Windows"
inputs:
  version:
    description: "ROCm version"
    required: true

runs:
  using: "composite"
  steps:
    - name: Setup ROCm
      uses: ./.github/actions/install-exe
      with:
        url: https://download.amd.com/developer/eula/rocm-hub/AMD-Software-PRO-Edition-${{ inputs.version }}-WinSvr2022-For-HIP.exe
        args: -install
4 changes: 4 additions & 0 deletions .github/labeler.yml
@@ -76,6 +76,10 @@ ggml:
   - changed-files:
       - any-glob-to-any-file:
           - ggml/**
+model:
+  - changed-files:
+      - any-glob-to-any-file:
+          - src/models/**
 nix:
   - changed-files:
       - any-glob-to-any-file:
89 changes: 89 additions & 0 deletions .github/workflows/build-cache.yml
@@ -0,0 +1,89 @@
name: Build Actions Cache

on:
  workflow_dispatch: # allows manual triggering
  schedule:
    - cron: '0 * * * *'

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

jobs:
  ubuntu-24-vulkan-cache:
    runs-on: ubuntu-24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Get latest Vulkan SDK version
        id: vulkan_sdk_version
        run: |
          echo "VULKAN_SDK_VERSION=$(curl https://vulkan.lunarg.com/sdk/latest/linux.txt)" >> "$GITHUB_ENV"

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-sdk
        with:
          path: ./vulkan_sdk
          key: vulkan-sdk-${{ env.VULKAN_SDK_VERSION }}-${{ runner.os }}

      - name: Setup Vulkan SDK
        if: steps.cache-sdk.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-vulkan
        with:
          path: ./vulkan_sdk
          version: ${{ env.VULKAN_SDK_VERSION }}

  ubuntu-24-spacemit-cache:
    runs-on: ubuntu-24.04

    env:
      # Make sure this is in sync with build-linux-cross.yml
      SPACEMIT_IME_TOOLCHAIN_VERSION: "1.1.2"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-toolchain
        with:
          path: ./spacemit_toolchain
          key: spacemit-ime-toolchain-v${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}-${{ runner.os }}

      - name: Setup SpacemiT Toolchain
        if: steps.cache-toolchain.outputs.cache-hit != 'true'
        uses: ./.github/actions/linux-setup-spacemit
        with:
          path: ./spacemit_toolchain
          version: ${{ env.SPACEMIT_IME_TOOLCHAIN_VERSION }}

  windows-2022-rocm-cache:
    runs-on: windows-2022

    env:
      # Make sure this is in sync with build.yml
      HIPSDK_INSTALLER_VERSION: "25.Q3"

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Setup Cache
        uses: actions/cache@v4
        id: cache-rocm
        with:
          path: C:\Program Files\AMD\ROCm
          key: rocm-${{ env.HIPSDK_INSTALLER_VERSION }}-${{ runner.os }}

      - name: Setup ROCm
        if: steps.cache-rocm.outputs.cache-hit != 'true'
        uses: ./.github/actions/windows-setup-rocm
        with:
          version: ${{ env.HIPSDK_INSTALLER_VERSION }}
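
Since actions/cache keys are plain strings, a consuming workflow can restore one of these pre-warmed caches by repeating the exact path/key pair and running the setup action only on a cache miss. A minimal sketch, assuming the consumer derives VULKAN_SDK_VERSION the same way this workflow does:

    - name: Restore Vulkan SDK cache
      uses: actions/cache@v4
      id: cache-sdk
      with:
        path: ./vulkan_sdk
        key: vulkan-sdk-${{ env.VULKAN_SDK_VERSION }}-${{ runner.os }}

    - name: Setup Vulkan SDK
      if: steps.cache-sdk.outputs.cache-hit != 'true'
      uses: ./.github/actions/linux-setup-vulkan
      with:
        path: ./vulkan_sdk
        version: ${{ env.VULKAN_SDK_VERSION }}

The hourly cron above keeps these entries warm, so consumers should almost always hit the cache.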