Merged

Main #33

Commits
292 commits
0da5d86
server : allow using LoRA adapters per-request (#10994)
ngxson Jan 2, 2025
2f0ee84
server: bench: minor fixes (#10765)
phymbert Jan 2, 2025
f66f582
llama : refactor `src/llama.cpp` (#10902)
ggerganov Jan 3, 2025
e7da954
metal : avoid uint (#11019)
ggerganov Jan 3, 2025
4b0c638
common : disable KV cache shifting automatically for unsupported mode…
MollySophia Jan 3, 2025
c31fc8b
fix: Vulkan shader gen binary path (#11037)
giladgd Jan 4, 2025
db68c93
ggml : improve inputs log sched_print_assignments (ggml/1053)
danbev Dec 19, 2024
5e3b08d
ggml : do not install metal source when embed library (ggml/1054)
ggerganov Jan 4, 2025
78c6785
sync : ggml
ggerganov Jan 4, 2025
46be942
llama : add support for the cohere2 model architecture (#10900)
dranger003 Jan 4, 2025
f922a9c
[GGML][RPC] Support for models with non-512-aligned tensors over RPC.…
matt23654 Jan 4, 2025
9394bbd
llama : Add support for DeepSeek V3 (#11049)
fairydreaming Jan 4, 2025
b56f079
Vulkan: Add device-specific blacklist for coopmat for the AMD proprie…
0cc4m Jan 4, 2025
46e3556
CUDA: add BF16 support (#11093)
JohannesGaessler Jan 6, 2025
5047dd3
llama : use _impl suffix instead of _internal (#11060)
ggerganov Jan 6, 2025
727368c
llama : use LLAMA_TOKEN_NULL (#11062)
ggerganov Jan 6, 2025
ae2f606
mmap : fix fileno macro clash (#11076)
ggerganov Jan 6, 2025
3e6e7a6
tokenize : escape the prompt (#11058)
ggerganov Jan 6, 2025
47182dd
llama : update llama_model API names (#11063)
ggerganov Jan 6, 2025
6369f86
llama : rename missed batch params/vars to ubatch (#10059)
danbev Jan 6, 2025
96a1dc2
llama : prevent system info string accumulation across calls (#11101)
a-ghorbani Jan 6, 2025
09186fa
llama : remove check flash_attn with lora (#11104)
ngxson Jan 6, 2025
e6e7c75
server : fix extra BOS in infill endpoint (#11106)
ggerganov Jan 6, 2025
96be8c3
github : add cmd line field to bug report (#11090)
ngxson Jan 6, 2025
ecebbd2
llama : remove unused headers (#11109)
ggerganov Jan 6, 2025
dc7cef9
llama-run : fix context size (#11094)
ericcurtin Jan 6, 2025
c0d6f79
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#1…
qnixsynapse Jan 7, 2025
a4dd490
rpc : code cleanup (#11107)
rgerganov Jan 7, 2025
a3d50bc
ggml-backend : only offload from host buffers (#11120)
slaren Jan 7, 2025
017cc5f
ggml-backend : only offload from host buffers (fix) (#11124)
slaren Jan 7, 2025
53ff6b9
GGUF: C++ refactor, backend support, misc fixes (#11030)
JohannesGaessler Jan 7, 2025
bec2183
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
ag2s20150909 Jan 8, 2025
02f0430
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …
mbaudier Jan 8, 2025
0d52a69
ci : fix cmake option (#11125)
ggerganov Jan 8, 2025
8cef75c
llamafile : ppc64le MMA INT8 implementation (#10912)
amritahs-ibm Jan 8, 2025
a3c1232
arg : option to exclude arguments from specific examples (#11136)
ggerganov Jan 8, 2025
80ccf5d
ci : pin dependency to specific version (#11137)
ngxson Jan 8, 2025
c792dcf
ggml : allow loading backend with env variable (ggml/1059)
rgerganov Jan 5, 2025
99a3755
sync : ggml
ggerganov Jan 8, 2025
c07d437
llama : avoid hardcoded QK_K (#11061)
ggerganov Jan 8, 2025
4d2b3d8
lora : improve compat with `mergekit-extract-lora` (#11131)
ngxson Jan 8, 2025
f7cd133
ci : use actions from ggml-org (#11140)
ngxson Jan 8, 2025
1bf839b
Enhance user input handling for llama-run (#11138)
ericcurtin Jan 8, 2025
8a1d9c2
gguf-py : move scripts directory (#11116)
VJHack Jan 8, 2025
8d59d91
fix: add missing msg in static_assert (#11143)
hydai Jan 8, 2025
d9feae1
llama-chat : add phi 4 template (#11148)
ngxson Jan 9, 2025
be0e950
media : remove old img [no ci]
ggerganov Jan 9, 2025
f8feb4b
model: Add support for PhiMoE arch (#11003)
phymbert Jan 9, 2025
8eceb88
server : add tooltips to settings and themes btn (#11154)
danbev Jan 9, 2025
1204f97
doc: add cuda guide for fedora (#11135)
teihome Jan 9, 2025
c6860cc
SYCL: Refactor ggml_sycl_compute_forward (#11121)
qnixsynapse Jan 10, 2025
ee7136c
llama: add support for QRWKV6 model architecture (#11001)
MollySophia Jan 10, 2025
c3f9d25
Vulkan: Fix float16 use on devices without float16 support + fix subg…
0cc4m Jan 10, 2025
ff3fcab
convert : add --print-supported-models option (#11172)
danbev Jan 10, 2025
ba8a1f9
examples : add README.md to tts example [no ci] (#11155)
danbev Jan 10, 2025
2739a71
convert : sort print supported models [no ci] (#11179)
danbev Jan 11, 2025
c05e8c9
gguf-py: fixed local detection of gguf package (#11180)
VJHack Jan 11, 2025
afa8a9e
llama : add `llama_vocab`, functions -> methods, naming (#11110)
ggerganov Jan 12, 2025
08f10f6
llama : remove notion of CLS token (#11064)
ggerganov Jan 12, 2025
9a48399
llama : fix chat template gguf key (#11201)
ngxson Jan 12, 2025
924518e
Reset color before we exit (#11205)
ericcurtin Jan 12, 2025
1244cdc
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL…
rgerganov Jan 13, 2025
8f70fc3
llama : remove 'd' from bad special token log (#11212)
danbev Jan 13, 2025
7426a26
contrib : add naming guidelines (#11177)
ggerganov Jan 13, 2025
00b4c3d
common : support tag-based --hf-repo like on ollama (#11195)
ngxson Jan 13, 2025
ca001f6
contrib : add naming guidelines (cont) (#11177)
ggerganov Jan 13, 2025
437e05f
server : (UI) Support for RTL text as models input or output (#11208)
ebraminio Jan 13, 2025
a29f087
contrib : add naming guidelines (cont) (#11177)
ggerganov Jan 13, 2025
39509fb
cuda : CUDA Graph Compute Function Refactor (precursor for performanc…
aendk Jan 13, 2025
84a4481
cli : auto activate conversation mode if chat template is available (…
ngxson Jan 13, 2025
504af20
server : (UI) Improve messages bubble shape in RTL (#11220)
ebraminio Jan 13, 2025
d00a80e
scripts : sync opencl
ggerganov Jan 14, 2025
48e1ae0
scripts : sync gguf
ggerganov Jan 14, 2025
a4f3f5d
scripts : sync gguf (cont)
ggerganov Jan 14, 2025
44d1e79
sync : ggml
ggerganov Jan 14, 2025
091592d
Refactor test-chat-template.cpp (#11224)
ochafik Jan 14, 2025
c5bf0d1
server : Improve code snippets direction between RTL text (#11221)
ebraminio Jan 14, 2025
bbf3e55
vocab : add dummy tokens for "no_vocab" type (#11231)
ggerganov Jan 14, 2025
b4d92a5
ci : add -no-cnv for tests (#11238)
ngxson Jan 14, 2025
f446c2c
SYCL: Add gated linear attention kernel (#11175)
qnixsynapse Jan 15, 2025
0ccd7f3
examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)
danbev Jan 15, 2025
432df2d
RoPE: fix back, CUDA support for back + noncont. (#11240)
JohannesGaessler Jan 15, 2025
1d85043
fix: ggml: fix vulkan-shaders-gen build (#10448)
sparkleholic Jan 15, 2025
f11cfdf
ci : use -no-cnv in gguf-split tests (#11254)
ggerganov Jan 15, 2025
adc5dd9
vulkan: scale caching for k quants + misc fixes (#11081)
netrunnereve Jan 15, 2025
c67cc98
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)
fj-y-saito Jan 16, 2025
681149c
llama : add `llama_model_load_from_splits` (#11255)
ngxson Jan 16, 2025
9c8dcef
CUDA: backwards pass for misc. ops, add tests (#11257)
JohannesGaessler Jan 16, 2025
4dbc8b9
llama : add internlm3 support (#11233)
RunningLeon Jan 16, 2025
206bc53
vulkan: optimize coopmat2 q2_k dequant function (#11130)
jeffbolznv Jan 16, 2025
466300f
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)
jeffbolznv Jan 16, 2025
bd38dde
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11…
jeffbolznv Jan 16, 2025
7a689c4
README : added kalavai to infrastructure list (#11216)
musoles Jan 17, 2025
960ec65
llama : fix deprecation message: vocabable -> vocab (#11269)
dwrensha Jan 17, 2025
a133566
vocab : fix double-eos check (#11273)
ggerganov Jan 17, 2025
667d728
rpc : early register backend devices (#11262)
rgerganov Jan 17, 2025
3edfa7d
llama.android: add field formatChat to control whether to parse speci…
codezjx Jan 17, 2025
44e18ef
vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281)
jeffbolznv Jan 18, 2025
6390a99
tts : add guide tokens support (#11186)
LostRuins Jan 18, 2025
f26c874
scripts : restore hf.sh (#11288)
ggerganov Jan 18, 2025
f30f099
server : implement cancellable request (#11285)
ngxson Jan 18, 2025
4dd34ff
cmake : add sanitizer flags for llama.cpp (#11279)
ggerganov Jan 18, 2025
a1649cc
Adding linenoise.cpp to llama-run (#11252)
ericcurtin Jan 18, 2025
99487b5
SYCL: Introducing memory host pool (#11251)
s-Nick Jan 19, 2025
b9daaff
simple-chat : fix BOS being added to each message (#11278)
ggerganov Jan 19, 2025
92bc493
tests : increase timeout when sanitizers are enabled (#11300)
ggerganov Jan 19, 2025
ae3c1db
llama : re-add LLM_ARCH_PHIMOE (#11305)
KyleBruene Jan 20, 2025
ef6dada
cont : fix whitespaces (#11305)
ggerganov Jan 20, 2025
ec7f3ac
llama : add support for Deepseek-R1-Qwen distill model (#11310)
ngxson Jan 20, 2025
a4251ed
cmake: fix shell command quoting in build-info script (#11309)
Xarbirus Jan 20, 2025
90d987b
mmap: add include for cerrno (#11296)
mascguy Jan 20, 2025
9f7add1
examples : fix add_special conditions (#11311)
ggerganov Jan 20, 2025
aea8ddd
vulkan: fix coopmat2 validation failures (#11284)
jeffbolznv Jan 20, 2025
80d0d6b
common : add -hfd option for the draft model (#11318)
ggerganov Jan 20, 2025
2139667
metal : fix out-of-bounds write (#11314)
ggerganov Jan 21, 2025
2e2f8f0
linenoise.cpp refactoring (#11301)
ericcurtin Jan 21, 2025
6da5bec
rpc : better caching of the base buffer pointer (#11331)
rgerganov Jan 21, 2025
e28245f
export-lora : fix tok_embd tensor (#11330)
ngxson Jan 21, 2025
6171c9d
Add Jinja template support (#11016)
ochafik Jan 21, 2025
3e3357f
llava : support Minicpm-omni (#11289)
tc-mb Jan 22, 2025
a94f3b2
`common`: utils to split / join / repeat strings (from json converter…
ochafik Jan 22, 2025
96f4053
Adding logprobs to /v1/completions (#11344)
jpodivin Jan 22, 2025
c64d2be
`minja`: sync at https://github.com/google/minja/commit/0f5f7f2b3770e…
ochafik Jan 22, 2025
12c2bdf
server : fix draft context not being released (#11354)
slaren Jan 22, 2025
16d3df7
readme : add plugin links (#11355)
ggerganov Jan 22, 2025
6152129
main : update README documentation for batch size (#11353)
slaren Jan 22, 2025
5245729
vulkan: fix diag_mask_inf (#11323)
jeffbolznv Jan 23, 2025
1971adf
vulkan: sort shaders for more deterministic binary (#11315)
jeffbolznv Jan 23, 2025
955a6c2
Vulkan-run-test: fix mmq_wg_denoms (#11343)
AMD-dwang Jan 23, 2025
f211d1d
Treat hf.co/ prefix the same as hf:// (#11350)
ericcurtin Jan 23, 2025
5845661
server : add more clean up when cancel_tasks is called (#11340)
ngxson Jan 23, 2025
f7fb43c
Add -ngl (#11372)
ericcurtin Jan 23, 2025
05f63cc
Update documentation (#11373)
ericcurtin Jan 23, 2025
564804b
tests: fix some mul_mat test gaps (#11375)
jeffbolznv Jan 23, 2025
c07e87f
server : (webui) put DeepSeek R1 CoT in a collapsible <details> eleme…
stduhpf Jan 24, 2025
01f37ed
Update llama-run README.md (#11386)
ericcurtin Jan 24, 2025
1af6945
cmake : avoid -march=native when reproducible build is wanted (#11366)
bmwiedemann Jan 24, 2025
8137b4b
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
JohannesGaessler Jan 24, 2025
a07c2c8
docs : Update readme to build targets for local docker build (#11368)
JafarAbdi Jan 24, 2025
9755129
release : pack /lib in the packages (#11392)
ggerganov Jan 24, 2025
9fbadae
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356)
IMbackK Jan 24, 2025
c5d9eff
CUDA: fix FP16 cuBLAS GEMM (#11396)
JohannesGaessler Jan 24, 2025
5f0db95
hip : Add hipGraph and VMM support to ROCM (#11362)
IMbackK Jan 24, 2025
466ea66
CANN: Add Ascend CANN build ci (#10217)
xuedinge233 Jan 24, 2025
00c24ac
ci : fix line breaks on windows builds (#11409)
ggerganov Jan 25, 2025
20a7581
docker : fix CPU ARM build (#11403)
slaren Jan 25, 2025
49b0e3c
server : fix cleaning up stream task (#11418)
ngxson Jan 25, 2025
6e264a9
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to buil…
slaren Jan 25, 2025
ca6baf7
build: add /bigobj to MSVC build (#11407)
jeffbolznv Jan 25, 2025
26771a1
Hip: disable VMM on hip as it seams that it dosent work in some confi…
IMbackK Jan 25, 2025
4a75d19
vulkan: compile shaders on-demand (#11406)
jeffbolznv Jan 25, 2025
f35726c
build: apply MSVC /bigobj option to c/cpp files only (#11423)
jeffbolznv Jan 26, 2025
2cc9b8c
readme : update hot topics
ggerganov Jan 26, 2025
1d8ee06
rpc: fix register position (#11424)
thxCode Jan 26, 2025
19f6518
cmake: add ggml find package (#11369)
bandoti Jan 26, 2025
6f53d8a
docker: add missing vulkan library to base layer and update to 24.04 …
rare-magma Jan 26, 2025
178a7eb
metal : use residency sets (#11427)
ggerganov Jan 26, 2025
caf773f
docker : fix ARM build and Vulkan build (#11434)
ngxson Jan 26, 2025
acd38ef
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
Jan 27, 2025
df984e0
llama: refactor llama_decode_impl (#11381)
JohannesGaessler Jan 27, 2025
a5203b4
llama : minor fixes for up llama load model speed (#11448)
lexasub Jan 27, 2025
d6d24cd
AMD: parse the architecture as supplied by gcnArchName (#11244)
Haus1 Jan 27, 2025
a4417dd
Add new hf protocol for ollama (#11449)
ericcurtin Jan 27, 2025
2b8525d
Handle missing model in CLI parameters for llama-run (#11399)
engelmi Jan 28, 2025
6e84b0a
SYCL : SOFTMAX F16 mask support and other fixes (#11261)
qnixsynapse Jan 28, 2025
f643120
docker: add perplexity and bench commands to full image (#11438)
rare-magma Jan 28, 2025
4bf3119
cmake : don't fail on `GGML_CPU=OFF` (#11457)
someone13574 Jan 28, 2025
d7d1ecc
docker: allow installing pip packages system-wide (#11437)
rare-magma Jan 28, 2025
7fee288
Add github protocol pulling and http:// (#11465)
ericcurtin Jan 28, 2025
cae9fb4
HIP: Only call rocblas_initialize on rocblas versions with the multip…
sARY77 Jan 28, 2025
be5ef79
HIP: Supress transformation warning in softmax.cu
IMbackK Jan 28, 2025
d0c0804
ci : fix build CPU arm64 (#11472)
ngxson Jan 28, 2025
cf8cc85
server : Fixed wrong function name in llamacpp server unit test (#11473)
peidaqi Jan 28, 2025
794fe23
cmake: add hints for locating ggml on Windows using Llama find-packag…
Emreerdog Jan 28, 2025
325afb3
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
MollySophia Jan 29, 2025
b636228
embedding : enable --no-warmup option (#11475)
danbev Jan 29, 2025
d2e518e
ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. …
issixx Jan 17, 2025
1a0e87d
ggml : add option to not print stack on abort (ggml/1081)
WilliamTambellini Jan 23, 2025
8158577
sync : ggml
ggerganov Jan 29, 2025
f0d4b29
Parse https://ollama.com/library/ syntax (#11480)
ericcurtin Jan 29, 2025
2711d02
vulkan: Catch pipeline creation failure and print an error message (#…
jeffbolznv Jan 29, 2025
e51c47b
server : update auto gen files comments [no ci] (#11484)
danbev Jan 29, 2025
66ee4f2
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
remyoudompheng Jan 29, 2025
eb7cf15
server : add /apply-template endpoint for additional use cases of Min…
pnb Jan 29, 2025
e044976
server : update json snippets in README.md [no ci] (#11492)
danbev Jan 30, 2025
7919256
readme : reference examples relative links (#11505)
guspan-tanadi Jan 30, 2025
496e5bf
server : (docs) added response format for /apply-template [no ci] (#1…
isaac-mcfadyen Jan 30, 2025
4314e56
server : use lambda instead of std::bind (#11507)
danbev Jan 30, 2025
ffd0821
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#1…
mgroeber9110 Jan 30, 2025
3d804de
sync: minja (#11499)
ochafik Jan 30, 2025
c300e68
CUDA/HIP: add warp_size to cuda_device_info
IMbackK Jan 29, 2025
6af1ca4
HIP: Prepare reduction operators for wave 64
IMbackK Jan 29, 2025
27d135c
HIP: require at least HIP 5.5
IMbackK Jan 29, 2025
8b576b6
Tool call support (generic + native for Llama, Functionary, Hermes, M…
ochafik Jan 30, 2025
553f1e4
`ci`: ccache for all github worfklows (#11516)
ochafik Jan 30, 2025
a2df278
server : update help metrics processing/deferred (#11512)
danbev Jan 31, 2025
1bd3047
common: Add missing va_end (#11529)
stevegrubb Jan 31, 2025
4a2b196
server : fix --jinja when there's no tools or schema (typo was forcin…
ochafik Jan 31, 2025
5783575
Fix chatml fallback for unsupported builtin templates (when --jinja n…
ochafik Jan 31, 2025
b1bcd30
fix stop regression (#11543)
ochafik Jan 31, 2025
a83f528
`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic…
ochafik Jan 31, 2025
aa6fb13
`ci`: use sccache on windows instead of ccache (#11545)
ochafik Jan 31, 2025
5bbc736
ci: simplify cmake build commands (#11548)
ochafik Feb 1, 2025
ecef206
Implement s3:// protocol (#11511)
ericcurtin Feb 1, 2025
cfd74c8
`sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9b…
ochafik Feb 1, 2025
53debe6
ci: use sccache on windows HIP jobs (#11553)
ochafik Feb 1, 2025
0cec062
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
piDack Feb 2, 2025
ff22770
sampling : support for llguidance grammars (#10224)
mmoskal Feb 2, 2025
6980448
Fix exotic ci env that lacks ostringstream::str (#11581)
ochafik Feb 2, 2025
bfcce4d
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in AP…
ochafik Feb 2, 2025
84ec8a5
Name colors (#11573)
ericcurtin Feb 2, 2025
864a0b6
CUDA: use mma PTX instructions for FlashAttention (#11583)
JohannesGaessler Feb 2, 2025
90f9b88
nit: more informative crash when grammar sampler fails (#11593)
ochafik Feb 2, 2025
4d0598e
HIP: add GGML_CUDA_CC_IS_* for amd familys as increasing cc archtectu…
IMbackK Feb 2, 2025
396856b
CUDA/HIP: add support for selectable warp size to mmv (#11519)
IMbackK Feb 2, 2025
6eecde3
HIP: fix flash_attn_stream_k_fixup warning (#11604)
JohannesGaessler Feb 2, 2025
d92cb67
server : (webui) Fix Shift+Enter handling (#11609)
mashdragon Feb 3, 2025
21c84b5
CUDA: fix Volta FlashAttention logic (#11615)
JohannesGaessler Feb 3, 2025
8ec0583
sync : ggml
ggerganov Feb 3, 2025
5598f47
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
danbev Feb 3, 2025
1d1e6a9
server : (webui) allow typing and submitting during llm response (#11…
woof-dog Feb 3, 2025
b345178
server : (webui) revert hacky solution from #11626 (#11634)
ngxson Feb 3, 2025
cde3833
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to …
ochafik Feb 3, 2025
b34aedd
ci : do not stale-close roadmap issues
ggerganov Feb 4, 2025
8f8290a
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
ckastner Feb 3, 2025
7c9e0ca
sync : ggml
ggerganov Feb 4, 2025
387a159
authors : update
ggerganov Feb 4, 2025
534c46b
metal : use residency set for other platforms (#11648)
jhen0409 Feb 4, 2025
f117d84
swift : fix llama-vocab api usage (#11645)
jhen0409 Feb 4, 2025
106045e
readme : add llm_client Rust crate to readme bindings (#11628)
ShelbyJenkins Feb 4, 2025
db288b6
`tool-call`: command r7b fix for normal responses (#11608)
ochafik Feb 4, 2025
1bef571
arg : list RPC devices first when using --list-devices (#11655)
rgerganov Feb 4, 2025
3962fc1
server : add try..catch to places not covered by set_exception_handle…
ngxson Feb 4, 2025
3ec9fd4
HIP: force max threads per block to be 1024 (#11621)
fxzjshm Feb 4, 2025
fd08255
CUDA: non-contiguous (RMS) norm support (#11659)
JohannesGaessler Feb 4, 2025
9f4cc8f
`sync`: minja (#11641)
ochafik Feb 5, 2025
1ec2080
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
samkoesnadi Feb 5, 2025
fa62da9
CUDA: support for mat. mul. with ne03 != ne13 (#11656)
JohannesGaessler Feb 5, 2025
d774ab3
metal : adjust support conditions for norm operators (#11671)
ggerganov Feb 5, 2025
c3db048
readme : add link to Autopen under UIs (#11684)
blackhole89 Feb 6, 2025
902368a
metal : avoid breaking build when metal API predates TARGET_OS_VISION…
charles-dyfis-net Feb 6, 2025
1b598b3
vulkan: use smaller combined allocations to avoid fragmentation (#11551)
jeffbolznv Feb 6, 2025
8a7e3bf
vulkan: initial support for IQ4_XS quantization (#11501)
remyoudompheng Feb 6, 2025
2c6c8df
vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521)
jeffbolznv Feb 6, 2025
8d4d2be
ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
junchao-loongson Feb 6, 2025
c0d4843
build : fix llama.pc (#11658)
angt Feb 6, 2025
9dd7a03
llama : add log about loading model tensors (#11699)
ggerganov Feb 6, 2025
194b2e6
SYCL: Adjust support condition for norm operators (#11674)
qnixsynapse Feb 6, 2025
9ab42dc
docs: update fedora cuda guide for 12.8 release (#11393)
teihome Feb 6, 2025
271e33f
Merge branch 'master' into main
apicalshark Feb 6, 2025
92 changes: 92 additions & 0 deletions .devops/cpu.Dockerfile
@@ -0,0 +1,92 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION AS build

ARG TARGETARCH

ARG GGML_CPU_ARM_ARCH=armv8-a

RUN apt-get update && \
    apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

RUN if [ "$TARGETARCH" = "amd64" ]; then \
        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON; \
    elif [ "$TARGETARCH" = "arm64" ]; then \
        cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=${GGML_CPU_ARM_ARCH}; \
    else \
        echo "Unsupported architecture"; \
        exit 1; \
    fi && \
    cmake --build build -j $(nproc)

RUN mkdir -p /app/lib && \
    find build -name "*.so" -exec cp {} /app/lib \;

RUN mkdir -p /app/full \
    && cp build/bin/* /app/full \
    && cp *.py /app/full \
    && cp -r gguf-py /app/full \
    && cp -r requirements /app/full \
    && cp requirements.txt /app/full \
    && cp .devops/tools.sh /app/full/tools.sh

## Base image
FROM ubuntu:$UBUNTU_VERSION AS base

RUN apt-get update \
    && apt-get install -y libgomp1 curl\
    && apt autoremove -y \
    && apt clean -y \
    && rm -rf /tmp/* /var/tmp/* \
    && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
    && find /var/cache -type f -delete

COPY --from=build /app/lib/ /app

### Full
FROM base AS full

COPY --from=build /app/full /app

WORKDIR /app

RUN apt-get update \
    && apt-get install -y \
    git \
    python3 \
    python3-pip \
    && pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt \
    && apt autoremove -y \
    && apt clean -y \
    && rm -rf /tmp/* /var/tmp/* \
    && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
    && find /var/cache -type f -delete

ENTRYPOINT ["/app/tools.sh"]

### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
94 changes: 94 additions & 0 deletions .devops/cuda.Dockerfile
@@ -0,0 +1,94 @@
ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=12.6.0
# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} AS build

# CUDA architecture to build for (defaults to all supported archs)
ARG CUDA_DOCKER_ARCH=default

RUN apt-get update && \
    apt-get install -y build-essential cmake python3 python3-pip git libcurl4-openssl-dev libgomp1

WORKDIR /app

COPY . .

RUN if [ "${CUDA_DOCKER_ARCH}" != "default" ]; then \
        export CMAKE_ARGS="-DCMAKE_CUDA_ARCHITECTURES=${CUDA_DOCKER_ARCH}"; \
    fi && \
    cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
    cmake --build build --config Release -j$(nproc)

RUN mkdir -p /app/lib && \
    find build -name "*.so" -exec cp {} /app/lib \;

RUN mkdir -p /app/full \
    && cp build/bin/* /app/full \
    && cp *.py /app/full \
    && cp -r gguf-py /app/full \
    && cp -r requirements /app/full \
    && cp requirements.txt /app/full \
    && cp .devops/tools.sh /app/full/tools.sh

## Base image
FROM ${BASE_CUDA_RUN_CONTAINER} AS base

RUN apt-get update \
    && apt-get install -y libgomp1 curl\
    && apt autoremove -y \
    && apt clean -y \
    && rm -rf /tmp/* /var/tmp/* \
    && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
    && find /var/cache -type f -delete

COPY --from=build /app/lib/ /app

### Full
FROM base AS full

COPY --from=build /app/full /app

WORKDIR /app

RUN apt-get update \
    && apt-get install -y \
    git \
    python3 \
    python3-pip \
    && pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt \
    && apt autoremove -y \
    && apt clean -y \
    && rm -rf /tmp/* /var/tmp/* \
    && find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
    && find /var/cache -type f -delete


ENTRYPOINT ["/app/tools.sh"]

### Light, CLI only
FROM base AS light

COPY --from=build /app/full/llama-cli /app

WORKDIR /app

ENTRYPOINT [ "/app/llama-cli" ]

### Server, Server only
FROM base AS server

ENV LLAMA_ARG_HOST=0.0.0.0

COPY --from=build /app/full/llama-server /app

WORKDIR /app

HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]

ENTRYPOINT [ "/app/llama-server" ]
33 changes: 0 additions & 33 deletions .devops/full-cuda.Dockerfile

This file was deleted.

33 changes: 0 additions & 33 deletions .devops/full-musa.Dockerfile

This file was deleted.

50 changes: 0 additions & 50 deletions .devops/full-rocm.Dockerfile

This file was deleted.

38 changes: 0 additions & 38 deletions .devops/full.Dockerfile

This file was deleted.
