
llama3 is not working. #36

Open
rayjang opened this issue Apr 26, 2024 · 1 comment

Comments


rayjang commented Apr 26, 2024

I followed your instructions below to apply SelfExtend to Llama-3:
"""
[04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""

I got this error:
"""

Exception Traceback (most recent call last)
Cell In[12], line 4
2 group_size = 5
3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
107 print("Using triton flash self_extend!!")
108 if (not modifed):
--> 109 raise Exception(f"Failed to modify the attention method of {arch_name}")
110 else:
111 raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""

How can I fix it?

@Mooler0410
Collaborator


This Exception is raised when the targeted model instance contains no attention module of the expected type. It is usually caused by a mismatch between how the model was loaded and the enable_flash_attention flag: for example, the model was loaded without flash attention but enable_flash_attention=True was passed to apply, or the reverse.

If possible, check the attention module's name with a simple print(model) before calling SelfExtend.apply. A rough sketch of a matching setup is shown below.
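For illustration only, a minimal sketch of a setup where the loaded attention implementation matches the flag. The model id, group_size, and window_size are placeholders (the latter two taken from the traceback above), and attn_implementation="flash_attention_2" is the standard transformers argument for loading a model with FlashAttention-2 modules:

# Rough sketch, not the repository's official example; adjust placeholders to your setup.
import torch
from transformers import AutoModelForCausalLM
import SelfExtend

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder model id

# Load the model WITH flash attention so it actually contains
# flash-attention modules for SelfExtend to patch.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Inspect the attention class before patching; for a flash-attention load
# it should be a flash variant rather than the eager/sdpa one.
print(model)

group_size = 5        # values from the traceback above
window_size = 1024
SelfExtend.apply(model, group_size, window_size,
                 enable_flash_attention=True,
                 flash_attention_impl="flash_attn")
model.eval()

# If the model is instead loaded without flash attention, call
# SelfExtend.apply(model, group_size, window_size, enable_flash_attention=False).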
