
Example for phi2? #22

Closed
JoanZhou opened this issue Mar 7, 2024 · 7 comments

Comments

@JoanZhou

JoanZhou commented Mar 7, 2024

Could you please release the example script for Phi-2? Thanks.

@Mooler0410
Collaborator

Mooler0410 commented Mar 7, 2024

It's similar to Llama or Mistral. Just use the function modify_method_of_instance (in modify_utils.py) to replace the forward method of Phi-2's attention with the SelfExtend one.
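The instance-level method replacement that modify_method_of_instance performs can be sketched with a toy class. The class and function names below are illustrative stand-ins, not the repo's real Phi-2 attention classes; only the binding pattern (attaching a new function to one instance via types.MethodType) is what the real helper does.

```python
# Minimal sketch of replacing an instance's forward method, the pattern
# behind modify_method_of_instance in modify_utils.py. ToyAttention and
# self_extend_forward are hypothetical stand-ins for illustration.
import types

class ToyAttention:
    """Stand-in for a transformers attention module."""
    def forward(self, x):
        return x + 1  # "vanilla" attention

def self_extend_forward(self, x):
    # Stand-in for the SelfExtend forward that replaces the original.
    return x * 2

def modify_method_of_instance(instance, method_name, new_fn):
    # Bind new_fn to this single instance, leaving the class untouched.
    setattr(instance, method_name, types.MethodType(new_fn, instance))

attn = ToyAttention()
modify_method_of_instance(attn, "forward", self_extend_forward)
print(attn.forward(3))            # patched instance uses the new forward: 6
print(ToyAttention().forward(3))  # other instances are unaffected: 4
```

Patching per instance (rather than on the class) means only the loaded model's attention modules are modified, which matches how the repo applies the patch after `from_pretrained`.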

@JoanZhou
Author

JoanZhou commented Mar 8, 2024

Thank you for the reply. I am using phi_self_extend_patch_4_37.py with transformers 4.37.1, and I got answers like this:
[screenshot of incorrect passkey-retrieval output]
I tried a few parameter settings:
group_size_1=4, group_size_2=1024
group_size_1=4, group_size_2=512

@Mooler0410
Collaborator

Mooler0410 commented Mar 9, 2024

Thanks for the feedback!

As we noted in the comment on the Phi-2 patch for 4.37:

"transformers version 4.37 (a future version). Should work for 'microsoft/phi-2', the official HF version of microsoft/phi-2; check the details on the Hugging Face Hub. It's different from the previous patch for 'susnato/phi-2', which is the default version in transformers 4.36.2! Haven't done comprehensive tests, but it should work."

It may have some bugs. I'll check once I'm back from spring break.

@Mooler0410
Collaborator


Hi, we found that Phi-2 itself cannot do passkey retrieval well 🤣. Even without SelfExtend, if you ask vanilla Phi-2 to do a 1.5k passkey retrieval, it will fail just like the example you provided.

SelfExtend only elicits an LLM's inherent long-context capability; it does not equip the model with any new capability. This means that if the model cannot do a task well within its own pretraining window, it cannot do it on longer contexts with SelfExtend either.

With the existing patch for Phi-2, SelfExtend works well on other tasks, such as 'Needle in a Haystack'. Have a try!
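For readers unfamiliar with the task: a passkey-retrieval prompt buries a random number inside repeated filler text and then asks the model to recall it. The filler sentence and prompt wording below are illustrative, not taken from this repo's evaluation scripts.

```python
# Rough sketch of building a passkey-retrieval prompt: hide a passkey at a
# random position inside filler text, then ask for it back. Wording is
# illustrative, not the repo's exact benchmark format.
import random

def build_passkey_prompt(passkey: int, n_filler: int = 60) -> str:
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    parts = [filler] * n_filler
    insert_at = random.randint(0, n_filler)
    parts.insert(insert_at, f"The pass key is {passkey}. Remember it. ")
    context = "".join(parts)
    return (
        "There is important info hidden in the following text. Find it.\n"
        + context
        + "\nWhat is the pass key?"
    )

prompt = build_passkey_prompt(68452)
print("The pass key is 68452." in prompt)  # True
```

Since the task only requires copying a short string out of the context, failures within the model's native 2k window (as reported above) point at the model or prompt format rather than at SelfExtend.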

Mooler0410 pinned this issue Mar 24, 2024
@Mooler0410
Collaborator


Hi, I happened to test Phi-2 with transformers 4.36.2 today. It seems that Phi-2 itself can do passkey retrieval with this version, while it cannot with transformers==4.38.2. You may want to take this into account.

@Mooler0410
Collaborator


> Hi, I happened to test Phi-2 with 4.36.2 today. It seems that Phi-2 itself can do passkey with this version, while it cannot with transformers==4.38.2. You may consider this.

This conclusion is misleading: only one of the variants I tested works well with transformers==4.36.2.

I tested more variants of Phi-2 and found that, with transformers==4.38.2, rhysjones/phi-2-orange-v2 works well while microsoft/phi-2 does not... It's really weird, considering that passkey retrieval is very simple.

All the tested models are vanilla (without SelfExtend), and all input sequences have a length of 1.5k, which is within Phi-2's 2k context window.

@Mooler0410
Collaborator


I believe this will be my last response, as I have figured out what happens. Phi-2 is sensitive to the prompt format. Unlike other models, Phi-2 requires the recommended template from its model card: "To encourage the model to write more concise answers, you can also try the following QA format using 'Instruct: <prompt>\nOutput:'" (https://huggingface.co/microsoft/phi-2). With this template, Phi-2 can now successfully do passkey retrieval with transformers==4.38.2.
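Concretely, the model card's QA template wraps the question between "Instruct:" and "Output:". A tiny helper (hypothetical name, for illustration) shows the exact string that should be fed to the model:

```python
# Format a question in Phi-2's recommended "Instruct:/Output:" QA template
# (see the microsoft/phi-2 model card). Helper name is illustrative.
def format_phi2_prompt(question: str) -> str:
    return f"Instruct: {question}\nOutput:"

prompt = format_phi2_prompt("What is the pass key?")
print(prompt)
# Instruct: What is the pass key?
# Output:
```

The model then continues generating after "Output:", which is why omitting the template can make even a simple in-window retrieval task fail.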
