
llama3 instruct and chat system prompts #950

Merged: 14 commits merged into main on Jun 30, 2024
Conversation

@oktie (Member) commented Jun 25, 2024

Issue #945

I removed the formats.llama3_chat_with_system_prompt system prompt, which was not being used anywhere and was also incorrect (e.g., it had \n between parts instead of \n\n).

I've updated the formats.llama3_chat definition so that:

  1. The demos also include the user/assistant prompts, which I believe is how we are supposed to provide examples.
  2. A default DEFAULT_SYSTEM_PROMPT is added; it is the one suggested/used for llama2, e.g. see https://developer.ibm.com/tutorials/awb-prompt-engineering-llama-2/ (a sketch of this prompt follows below).
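
For reference, the llama2-style default system prompt referred to in item 2 is the one that appears in the rendered example output later in this thread; as a constant it would look roughly like this (a sketch only, with wording taken from that output):

# Sketch of the llama2-style default system prompt; wording taken from the
# rendered example output shown later in this conversation.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully "
    "as possible, while being safe. Your answers should not include any harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure "
    "that your responses are socially unbiased and positive in nature.\n\n"
    "If a question does not make any sense, or is not factually coherent, explain why "
    "instead of answering something not correct. If you don't know the answer to a "
    "question, please don't share false information."
)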

I've added a formats.llama3_instruct that I believe more accurately reflects the official guidance at https://huggingface.co/blog/llama3#how-to-prompt-llama-3, which states:

The Instruct versions use the following conversation structure:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_msg_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|>

This format has to be exactly reproduced for effective use.

Note: the instruction is not part of the demos. I believe a demo-specific instruction could improve the prompt, but I don't think it's possible the way unitxt templates work now (the instruction cannot be used in demo_format).
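
For illustration, here is a rough sketch of how this structure could be expressed with unitxt's SystemFormat. It is a hedged approximation, not the exact catalog entry; the placeholders ({system_prompt}, {instruction}, {demos}, {source}, {target_prefix}, {target}) follow unitxt's format conventions, and the layout mirrors the rendered example output later in this thread:

from unitxt.formats import SystemFormat

# Rough sketch only -- approximates the llama3 instruct structure quoted above;
# not the exact definition added in this PR.
llama3_instruct_sketch = SystemFormat(
    # Each demo is rendered as a full user/assistant exchange, ending with the
    # header that opens the next user turn.
    demo_format=(
        "{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        "{target_prefix}{target}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    ),
    # System prompt and instruction go in the system header; demos are spliced
    # in before the actual question.
    model_input_format=(
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "{system_prompt}{instruction}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        "{demos}{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        "{target_prefix}"
    ),
)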

@eladven (Member) left a comment:

Elron, the PR looks fine. Please add the json files so the consistency test can pass.

@yoavkatz (Member) left a comment:

To be consistent, the system prompt needs to be defined outside the format:

    "{instruction}{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "{target_prefix}{target}<|eot_id|>",
    model_input_format="<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{DEFAULT_SYSTEM_PROMPT}<|eot_id|>{demos}<|start_header_id|>user<|end_header_id|>"
A member commented:
system_prompt should be taken from {system_prompt}, so it can be configured from the outside.

@oktie (Member, Author) commented:
@yoavkatz yes, this is how it is for llama3_instruct. For llama3_chat, it was defined without a system prompt in the past, and according to the guidelines that is not a good idea. So I thought it makes sense to add a default system prompt, so that if the user does not define one, we at least have a generic one. Does this make sense?

@oktie (Member, Author) commented:
Should we just put the system prompt there instead of using this DEFAULT_SYSTEM_PROMPT constant? I tested it on my side and it worked, but I am not sure whether it will work if unitxt is installed as a library.

A member commented:
If the recommended approach is to have a system prompt, then llama3_chat should have a system prompt, but it should be configurable via {system_prompt}.

By default, it should be format=formats.llama3_chat, system_prompt=system_prompt.llama3.

Users can always pass system_prompt.empty if they don't want a system prompt.

If there is any need for a format without the |system| special token, there should be a format called llama3_chat_without_system_prompt.
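
For illustration, a rough sketch of what that configuration could look like through unitxt's load_dataset; the card and template names here are placeholders for this example, not values from this PR:

from unitxt import load_dataset

# Hypothetical card/template names, shown only to illustrate that the same
# format can be paired with different system prompts from the catalog.
with_system_prompt = load_dataset(
    card="cards.boolq.classification",   # placeholder card
    template="templates.some_template",  # placeholder template
    format="formats.llama3_chat",
    system_prompt="system_prompts.models.llama2",  # explicit system prompt
)

without_system_prompt = load_dataset(
    card="cards.boolq.classification",
    template="templates.some_template",
    format="formats.llama3_chat",
    system_prompt="system_prompts.empty",  # assumes an empty system prompt entry exists in the catalog
)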

A member commented:
BTW, formats.llama3_chat_with_system_prompt was used in the internal project that uses unitxt.

A member commented:
I've added the instruction and default behavior to the system prompt. I created simple code under examples that we can all run to review the output:

python examples/evaluate_different_formats.py

source (str):
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    
    You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
    
    
    If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
    Answer the following questions in one word.<|eot_id|><|start_header_id|>user<|end_header_id|>
    
    Where is Spain?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    
    Europe<|eot_id|><|start_header_id|>user<|end_header_id|>
    
    When was IBM founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    
    1911<|eot_id|><|start_header_id|>user<|end_header_id|>
    
    What is the capital of Texas?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

A member commented:
from unitxt import load_dataset
from unitxt.templates import InputOutputTemplate

# Template for the QA task (the card variable is defined earlier in the example script).
template = InputOutputTemplate(
    instruction="Answer the following questions in one word.",
    input_format="{question}",
    output_format="{answers}",
    postprocessors=["processors.lower_case"],
)

dataset = load_dataset(
    card=card,
    template=template,
    format="formats.llama3_chat",
    system_prompt="system_prompts.models.llama2",
    num_demos=2,
    demos_pool_size=3,
)
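
Assuming load_dataset returns the usual HuggingFace-style dataset with a rendered source field per instance (and a test split for this card), the prompt shown in the previous comment could then be inspected with something like:

# Print the fully rendered prompt for the first test instance
# (the split name is an assumption; adjust it to the card being used).
print(dataset["test"][0]["source"])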

@oktie (Member, Author) commented:
Thanks @yoavkatz, I was also working on a fix and testing it, but yours looks good to me, and I just tested it on a task and it works well (with llama3-70b-instruct I get a much better score with the llama3 format).

The only thing is that now llama3_chat and llama3_instruct are the same. I suggest removing llama3_chat and updating all the references to it. Do you want me to do it?

A member commented:
Yes. If they are the same, we can keep just one. I just wonder about the alternative format where the instruction and demos are in the same user prompt as the question. Granite's recommended ICL format works this way.

@oktie (Member, Author) commented Jun 28, 2024:
@yoavkatz okay, I just replaced llama3_chat with llama3_instruct everywhere it was used.

I also experimented with two alternatives: an alt1 option where the instruction goes into the user-role prompt, and an alt2 option where the demos all go into one user prompt, as you had suggested. The numbers I get on two example tasks, each with llama-3-70b-instruct and 3-shot prompts, are below:

Task                                  Option                          Score
cards.boolq.classification            formats.llama3_instruct         0.9119
cards.boolq.classification            formats.llama3_instruct_alt1    0.9081
cards.boolq.classification            formats.llama3_instruct_alt2    0.89
cards.summarize_from_human_feedback   no format                       0.64
cards.summarize_from_human_feedback   formats.llama3_instruct         0.68
cards.summarize_from_human_feedback   formats.llama3_instruct_alt1    0.67
cards.summarize_from_human_feedback   formats.llama3_instruct_alt2    0.56

So it seems like alt1 is not making much of a difference, but alt2 is worse. I believe we have the right prompt now. I suggest approving this PR and merging for now. I'll be doing more testing and will let you know if I find ways to improve the format.
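
For context, a rough sketch of how the alt2 idea differs from the main format: instead of one user/assistant turn per demo, all demos and the instruction are packed into a single user turn. This is a hypothetical illustration using unitxt's SystemFormat, not the exact alt2 catalog entry:

from unitxt.formats import SystemFormat

# Hypothetical sketch of "all demos in a single user turn" (the alt2 idea);
# placeholder names follow unitxt format conventions.
llama3_instruct_all_demos_in_one_turn = SystemFormat(
    # Demos rendered as plain text, without per-demo chat turns.
    demo_format="{source}\n{target_prefix}{target}\n\n",
    model_input_format=(
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        "{instruction}{demos}{source}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n{target_prefix}"
    ),
)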

@yoavkatz (Member) left a comment:

I approved. Make sure the old jsons are deleted.
Also, given the results, maybe we don't need alt1 and alt2 in the catalog. If we keep them, let's give them descriptive names (e.g. formats.llama3_instruct_all_demos_in_a_single_turn).

@yoavkatz merged commit c10f4ba into main on Jun 30, 2024 (8 checks passed).
@yoavkatz deleted the llama3_prompt_fix branch on June 30, 2024, 12:49.
gitMichal pushed a commit that referenced this pull request Jul 15, 2024
* llama3 instruct and chat system prompts

* fixing llama3_chat prompt + adding jsons

* fixing llama3 formats + a boolqa system prompt

* Start of example to check different formats.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Added instructions to llama3 model.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* replacing llama3_chat with llama3_instruct + 2 alts

* Added example for multiple formats

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* prepare artifacts (whitespace changes)

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Updated evaluation to use more examples

and show confidence intervals

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Do not run examples that require
large model inference

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Separated examples in example table

Signed-off-by: Yoav Katz <katz@il.ibm.com>

* Fixed test_examples to skip some files.

Signed-off-by: Yoav Katz <katz@il.ibm.com>

---------

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Co-authored-by: Yoav Katz <katz@il.ibm.com>