Releases: ggerganov/llama.cpp

b3030

29 May 03:17
504f0c3
ggml : fix typo in ggml.c (#7603)

b3029

28 May 23:45
b864b50
[SYCL] Align GEMM dispatch (#7566)

* align GEMM dispatch

b3028

28 May 21:58
02c1eca
Tokenizer WPM fixes (#7500)

* Update random test: add_bos_token.
* Update random test: add WPM models for testing.
* Build vocab.special_tokens_cache using vocab token types.
* Fix and improve WPM preprocessing (see the sketch after this list).
  - Fix unicode edge-case combinations.
  - Split by whitespace in the same pass.
* Discard all tokens when no match is found.
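A minimal sketch of the single-pass whitespace split, assuming ASCII whitespace only; the actual llama.cpp preprocessing also handles unicode normalization and punctuation/CJK splitting in the same pass:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch, not the actual implementation: collect candidate words in a
// single pass, treating any run of whitespace as a separator.
static std::vector<std::string> wpm_split_words(const std::string & text) {
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            // whitespace closes the current word
            if (!cur.empty()) { words.push_back(cur); cur.clear(); }
        } else {
            cur.push_back((char) c);
        }
    }
    if (!cur.empty()) { words.push_back(cur); }
    return words;
}
```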

b3027

28 May 21:47
6bd12ce
sycl : fix assert (#7563)

b3026

28 May 21:19
5442939
llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

Add an optional MLP bias for ARCH_LLAMA to support Granite models (a minimal sketch follows below).
Partially addresses ggerganov/llama.cpp/issues/7116.
Further changes are still needed to properly support Granite.
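A hedged sketch of the optional-bias idea using the ggml graph API; the function and tensor names are illustrative, and the real llama FFN also has a gate projection that is omitted here:

```cpp
#include "ggml.h"

// Sketch only: an FFN block where the bias tensors may be null.
// The bias is added only when the model actually provides it,
// which is what makes the MLP bias optional per model.
static struct ggml_tensor * build_ffn_with_optional_bias(
        struct ggml_context * ctx,
        struct ggml_tensor * inp,
        struct ggml_tensor * up_w,   struct ggml_tensor * up_b,
        struct ggml_tensor * down_w, struct ggml_tensor * down_b) {
    struct ggml_tensor * cur = ggml_mul_mat(ctx, up_w, inp);
    if (up_b) {
        cur = ggml_add(ctx, cur, up_b);   // optional up-projection bias
    }
    cur = ggml_silu(ctx, cur);            // activation (architecture-dependent)
    cur = ggml_mul_mat(ctx, down_w, cur);
    if (down_b) {
        cur = ggml_add(ctx, cur, down_b); // optional down-projection bias
    }
    return cur;
}
```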

* llama: honor add_space_prefix from the model configuration

Propagate the add_space_prefix setting from the HF model
configuration to the gguf file and honor it in the gpt2 tokenizer (see the sketch below).
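A hedged sketch of what honoring the flag amounts to at tokenization time; the function is illustrative, only the flag name comes from the changelog:

```cpp
#include <string>

// Sketch: if the gguf metadata says the tokenizer expects a leading
// space, prepend one before tokenizing; otherwise leave the text as-is.
static std::string apply_space_prefix(const std::string & text, bool add_space_prefix) {
    if (add_space_prefix && !text.empty() && text.front() != ' ') {
        return " " + text;   // honor the model's configuration
    }
    return text;             // unchanged when the flag is disabled
}
```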

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

* llama: add support for small granite models

This works only for the small 3b and 8b models.

The convert-hf-to-gguf.py script uses the vocabulary size of the
Granite models to detect Granite and set the correct configuration
(illustrated below).
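An illustrative C++ analogue of the converter's heuristic; the real check lives in convert-hf-to-gguf.py, and the vocabulary size below is a hypothetical placeholder, not a value taken from the PR:

```cpp
// Illustrative only: key model-specific configuration off the
// vocabulary size, mirroring how convert-hf-to-gguf.py detects Granite.
constexpr int GRANITE_VOCAB_SIZE = 49152; // placeholder for illustration

struct model_overrides {
    bool is_granite = false;
    bool mlp_bias   = false;
};

static model_overrides detect_overrides(int n_vocab) {
    model_overrides o;
    if (n_vocab == GRANITE_VOCAB_SIZE) {
        o.is_granite = true;
        o.mlp_bias   = true; // Granite ships MLP bias tensors (see above)
    }
    return o;
}
```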

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Co-authored-by: Steffen Roecker <sroecker@redhat.com>

b3025

28 May 19:50
56411a9
vulkan: properly initialize vulkan devices for LLAMA_SPLIT_MODE_NONE (#7552)

b3024

28 May 18:37
2b737ca
rpc : resource management rework (#7562)

* rpc : resource management rework

* address review comments

b3023

28 May 17:41
ee3dff6
Add support for DeepseekV2ForCausalLM (#7519)

* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale the weights of the selected MoE experts) and w_scale (the numerical value of the scaling factor); see the sketch after this list

* llama : add inference support for LLM_ARCH_DEEPSEEK2
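
A minimal self-contained sketch of the scale_w / w_scale idea: the argument names mirror the changelog entry above, while the surrounding function is illustrative, not the actual llm_build_moe_ffn() signature:

```cpp
#include <vector>

// Sketch: when scale_w is set, multiply each selected expert's routing
// weight by the constant factor w_scale (taken from the model hparams).
static void scale_expert_weights(std::vector<float> & weights,
                                 bool scale_w, float w_scale) {
    if (!scale_w) {
        return;              // default path: weights stay as routed
    }
    for (float & w : weights) {
        w *= w_scale;        // DeepSeek-V2 applies a per-model scaling factor
    }
}
```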

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>

b3021

28 May 15:55
8b99e2a
llama : handle unknown utf8 bytes (#7588)

b3019

28 May 12:04
e2b0650
[SYCL] fix ggml_sycl_mul_mat_id() to match the API change (#7436)

* fix mul_mat_id to match the API change

* rm comment

* rm unused or duplicated code, rename per review comments