
Fix: Propagate flash attn to model loader#1424

Merged
abetlen merged 1 commit into abetlen:main from dthuerck:propagate-flash-attn
May 3, 2024

Conversation

@dthuerck
Contributor

@dthuerck dthuerck commented May 3, 2024

I noticed that even after setting flash_attn to true in my model config file, llama.cpp kept reporting llama_new_context_with_model: flash_attn = 0. This super-small PR fixes that: it turns out the setting wasn't passed on to the model loader.
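For illustration only (the function and parameter names below are hypothetical, not the actual llama-cpp-python internals), this is the class of bug the PR fixes: a setting that is parsed from the config but never forwarded into the loader's keyword arguments, so the backend silently falls back to its default of flash_attn = 0.

```python
# Hypothetical sketch of the bug class this PR fixes: a config value that is
# parsed but never forwarded to the model loader. All names are illustrative.

def build_loader_kwargs(settings: dict, propagate_flash_attn: bool) -> dict:
    """Translate a model config dict into loader keyword arguments."""
    kwargs = {
        "model_path": settings["model"],
        "n_ctx": settings.get("n_ctx", 2048),
    }
    if propagate_flash_attn:
        # The fix: forward flash_attn instead of silently dropping it,
        # so the backend no longer falls back to flash_attn = 0.
        kwargs["flash_attn"] = settings.get("flash_attn", False)
    return kwargs

settings = {"model": "model.gguf", "flash_attn": True}
print("flash_attn" in build_loader_kwargs(settings, propagate_flash_attn=False))  # False: setting dropped
print(build_loader_kwargs(settings, propagate_flash_attn=True)["flash_attn"])     # True: setting forwarded
```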

@abetlen
Owner

abetlen commented May 3, 2024

@dthuerck thank you!

@abetlen abetlen merged commit 2138561 into abetlen:main May 3, 2024
@BadisG

BadisG commented May 8, 2024

I installed the latest version of llama_cpp_python (0.2.70) with this command:

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

But when using it through oobabooga's software (the llama_cpp_hf loader), I still have this flash_attn = 0 issue:

llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 8000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      CUDA0 KV buffer size =  1728.00 MiB
llama_kv_cache_init:      CUDA1 KV buffer size =   832.00 MiB
llama_new_context_with_model: KV self size  = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.98 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model:      CUDA0 compute buffer size =   400.01 MiB
llama_new_context_with_model:      CUDA1 compute buffer size =   596.02 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    32.02 MiB
llama_new_context_with_model: graph nodes  = 1208
llama_new_context_with_model: graph splits = 3
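A quick way to confirm whether flash attention actually took effect is to check the flash_attn line that llama.cpp prints when creating the context, as in the log above. A minimal log-checking sketch (the helper name is made up for illustration):

```python
import re

def flash_attn_enabled(log: str) -> bool:
    """Return True if a llama.cpp context log reports flash_attn = 1."""
    m = re.search(r"flash_attn\s*=\s*(\d)", log)
    return m is not None and m.group(1) == "1"

# The log in this comment reports flash_attn = 0, i.e. disabled:
print(flash_attn_enabled("llama_new_context_with_model: flash_attn = 0"))  # False
```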
