
llama3 is not working. #36

Open
rayjang opened this issue Apr 26, 2024 · 1 comment

Comments


rayjang commented Apr 26, 2024

I followed your instructions below to apply SelfExtend to Llama-3:
"""
[04/19/2024]:💡 We added the support for LLama-3 with transformers==4.40. To use it with transformers==4.40, you may change the file name of Llama_4_40.py to Llama.py to replace the existing patch file.
"""

I got this error:
"""

Exception Traceback (most recent call last)
Cell In[12], line 4
2 group_size = 5
3 window_size = 1024
----> 4 SelfExtend.apply(model, group_size, window_size, enable_flash_attention=True)#, flash_attention_impl='flash_attn')
5 model.eval()

File /home/ubuntu/reports/SelfExtend.py:109, in apply(loaded_model, group_size, window_size, enable_flash_attention, scale_base, flash_attention_impl)
107 print("Using triton flash self_extend!!")
108 if (not modifed):
--> 109 raise Exception(f"Failed to modify the attention method of {arch_name}")
110 else:
111 raise Exception(f"Need to set the flash_attention_impl to 'flash_attn' or 'triton'.")

Exception: Failed to modify the attention method of LlamaForCausalLM
"""

How can I fix it?

@Mooler0410
Collaborator


This Exception is raised when the targeted model instance contains no attention module of the expected type. It is usually caused by a mismatch between how the model was loaded and the enable_flash_attention flag: for example, the model was loaded without flash attention but enable_flash_attention=True was passed to apply, or the reverse.

If possible, check the attention module's name with a simple print(model) before calling SelfExtend.apply. A rough sketch of a matching setup is shown below.
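For illustration only, a minimal sketch of a setup where the loaded attention implementation matches the flag. The model id, group_size, and window_size are placeholders (the latter two taken from the traceback above), and attn_implementation="flash_attention_2" is the standard transformers argument for loading a model with FlashAttention-2 modules:

# Rough sketch, not the repository's official example; adjust placeholders to your setup.
import torch
from transformers import AutoModelForCausalLM
import SelfExtend

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder model id

# Load the model WITH flash attention so it actually contains
# flash-attention modules for SelfExtend to patch.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Inspect the attention class before patching; for a flash-attention load
# it should be a flash variant rather than the eager/sdpa one.
print(model)

group_size = 5        # values from the traceback above
window_size = 1024
SelfExtend.apply(model, group_size, window_size,
                 enable_flash_attention=True,
                 flash_attention_impl="flash_attn")
model.eval()

# If the model is instead loaded without flash attention, call
# SelfExtend.apply(model, group_size, window_size, enable_flash_attention=False).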
