Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guide update #830

Merged
merged 2 commits into from
May 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 21 additions & 27 deletions docs/source/examples/customize_conversation_template.md
Original file line number Diff line number Diff line change
@@ -1,43 +1,36 @@
# Customize Conversation Template

```{admonition} **Work in Progress**
:class: info

We are rapidly working on this page.
```

> For beginners: Why template?
> Almost all LLMs today do a simple job - predict the next "word". To make the interaction between user and model smoother, developers use tricks: they add special "words" to the input text (at back-end, thus invisible to the user when using services like ChatGPT) to "tell" the model what user had said before, and ask the model to respond like an assistant. These "hidden words" are called "template".

We provide the flexibility to customize the conversation template. You can customize your own conversation template by following the steps below:

## Knowing the conversation template of your model

The conversation template varies according to the model you are using. For example:
### 1. Decompose your conversations
Say you want to make the conversations between user and assistant look like:

The template for Llama-2 looks like:
```
<s>[INST] <<SYS>>\n{{system_message}}\n<</SYS>>\n\n{{user_message_0}} [/INST] {{assistant_reply_0}}</s>
```
<bos>System:
You are a chatbot developed by LMFlow team.

Find more templates [here](./supported_conversation_template.md).
User:
Who are you?

Assistant:
I am a chatbot developed by LMFlow team.<eos>

## Make your own template
User:
How old are you?

`TemplateComponent`s to a conversation template is just like bricks to a LEGO house. You can build your own template by combining different components.
Assistant:
I don't age like humans do. I exist as a piece of software, so I don't have a concept of age in the traditional sense.<eos>
```

The following provides an example of building a conversation template for the ChatML format:
It is easy to abstract the format for each message:
- System message: `System:\n{{content}}\n\n`
- User message: `User:\n{{content}}\n\n`
- Assistant message: `Assistant:\n{{content}}\n\n<eos>`

### 1. Decompose the official template
The official template looks like:
```
<|im_start|>system\n{{system_message}}<|im_end|>\n<|im_start|>user\n{{user_message_0}}<|im_end|>\n<|im_start|>assistant\n{{assistant_reply_0}}<|im_end|>\n<|im_start|>user\n{{user_message_1}}<|im_end|>\n<|im_start|>assistant\n{{assistant_reply_1}}<|im_end|>\n
```
It is easy to recognize the format for each message:
- System message: `<|im_start|>system\n{{system_message}}<|im_end|>\n`
- User message: `<|im_start|>user\n{{user_message}}<|im_end|>\n`
- Assistant message: `<|im_start|>assistant\n{{assistant_reply}}<|im_end|>\n`
Also, we have a bos token at the beginning of the conversation session.

### 2. Choose proper `Formatter`
Recall the requirements for a conversation dataset:
Expand Down Expand Up @@ -87,9 +80,10 @@ YOUR_TEMPLATE = ConversationTemplate(
# v.s.
# llama-3: <|begin_of_text|> only at the beginning of a session
special_starter=TemplateComponent(type='token', content='bos_token'),
# Similar to the special starter...
special_stopper=TemplateComponent(type='token', content='eos_token')

# Similar to the special starter... (just for illustration, commented out
# since it is not necessary for our purposed template above)
# special_stopper=TemplateComponent(type='token', content='eos_token')
)
```

Expand Down
67 changes: 43 additions & 24 deletions docs/source/examples/reward_modeling.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ Reinforcement Learning from Human Feedback (RLHF) requires a reward function to

Prompt:

" Human: What kind of noises did dinosaurs make? Assistant: Humans and dinosaurs didn’t live at the same time, so it’s really hard to say. The best place to find out what noises dinosaurs made would be Human: yes they did Assistant: to guess, and that would probably require lots of reading and a certain amount of imagination, so we’re not really prepared to do that. Human: you cant read Assistant:
"Human: What kind of noises did dinosaurs make? Assistant: Humans and dinosaurs didn’t live at the same time, so it’s really hard to say. The best place to find out what noises dinosaurs made would be Human: yes they did Assistant: to guess, and that would probably require lots of reading and a certain amount of imagination, so we’re not really prepared to do that. Human: you cant read Assistant:

Chosen response: "You can read?"

Expand All @@ -19,37 +19,56 @@ As an example, we prepare 10K sft training samples, 12K reward modeling samples
We prepare the dataset used for supervised finetuning by adding a prefix to the Human and Assistant inputs to prompt model responses and simplify post-processing. Here is an example of a two-sample dataset to illustrate this.


```sh
{"type": "text_only", "instances": [{"text": "###Human: Should you buy a case to protect your cell phone?###Assistant: It depends on your circumstances. If you carry your phone in a pocket or a purse then you probably want a case. But if you only need a phone for quick interactions, a case may actually cause more harm than good. What do you need the phone for? Are you a parent, or do you work from home?###Human: What harm could it do?###Assistant: A phone case can damage the screen, for one thing. It can also get you in trouble if you have your phone turned off for some reason. Then you will turn it back on and it won’t do anything. If you can afford to replace it, then you need a case to protect it. The problem is that most people aren’t able to afford to replace their phones all the time.###Human: Thanks for letting me know.###Assistant: You’re welcome."}, {"text": "###Human: I'm trying to learn about the salam witch trials###Assistant: If you’re looking for books about witchcraft trials, I can recommend some books for you. But you probably mean you’re looking for more historical information about the Salem witch trials in 1692, and specifically about the salam witch trials in 1692?###Human: What are some good books to learn about the salam witch trials###Assistant: What would you like to learn? If you’re interested in history, one of the best books is The Witch Persecutions in Colonial America: A History. If you’re interested in witchcraft as a cultural phenomenon, you might enjoy two excellent books: Religion and the Decline of Magic: Studies in Popular Beliefs in Sixteenth- and Seventeenth-Century England by Keith Thomas and Magic, Witchcraft, and the Otherworld: An Anthropology of Superstition by Jack Goody. If you’re interested in history specifically as it relates to religion, you might enjoy The Popish Plot, or Prelates' Plot: A History of the Popish Plot in England, by K. J. Everett."}]}
```json
{
"type": "conversation",
"instances": [
{"messages": [
{"role": "user", "content": "Instruction: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know."},
{"role": "assistant", "content": "If you have any questions about my rate or find it necessary to increase or decrease this project's scope, please let me know."}
]},
{"messages": [
{"role": "user", "content": "Instruction: Rewrite the given text and correct grammar, spelling, and punctuation errors. If you'd told me year ago that today I would finish a marathon, I would of laughed. Your support had a huge affect on me!"},
{"role": "assistant", "content": "If you'd told me a year ago that today I would finish a marathon, I would have laughed. Your support had a huge effect on me!"}
]}
]
}
```

You can edit /scripts/run_finetune.sh and choose the model you want to use by modify "--model_name_or_path", and also modify the "dataset_path" so that the used dataset is the above dataset. For instance, you can use the model EleutherAI/gpt-neo-1.3B, and set the dataset as:

```sh
dataset_path=${project_dir}/data/hh_rlhf/sft
```

Then, we can run the following command to perform SFT.

```sh
./scripts/run_finetune.sh
```

You can also use lora training by the following command but also need to set the model_name_or_path and dataset appropriately by editing run_finetune_with_lora.sh .

```sh
./scripts/run_finetune_with_lora.sh
```
See [Finetuning (Full)](../../../README.md#finetuning-full), [Finetuning (LISA)](../../../README.md#finetuning-lisa), and [Finetuning (LoRA)](../../../README.md#finetuning-lora) for more details on the finetuning process.

## Step 2 Reward Modeling

**Data Preparation** Reward modeling involves ranking responses based on the given prompt, typically requiring human labeling. Fortunately, we already have two responses for each prompt in this example, with one preferred over the other. To create a new dataset, we modify the original dataset to include two keys, "positive" and "negative", with the former being the preferred response. Here is an example of the dataset, consisting of two samples.

```sh
```json
{"instances": [
{"positive": "###Human: Are space heaters safe?###Assistant: What do you mean? Can you give me more details?###Human: I mean, can they be a fire hazard?###Assistant: I don’t know about that. Can you tell me more about the situation? Are you using the heater on a fireplace? Do you have any other heater in the room? What materials are in the room? Do you feel any heat from the heater? How are you touching the heater?###Human: Can you just tell me if there are any situations where they could be a fire hazard?###Assistant: The answer is “it depends”. It would depend on the specific heater you are using, how you are using it, what kind of fuel it uses, what materials you have in the room, and what the surroundings are like. You’d also have to make sure you’re using it correctly. Is it switched off? Is it safe to touch? It’s not the sort of thing I could just answer by looking at the text you provided.###Human: Thank you. That information is helpful.###Assistant: Glad to be helpful. I’m always happy to assist.", "negative": "###Human: Are space heaters safe?###Assistant: What do you mean? Can you give me more details?###Human: I mean, can they be a fire hazard?###Assistant: I don’t know about that. Can you tell me more about the situation? Are you using the heater on a fireplace? Do you have any other heater in the room? What materials are in the room? Do you feel any heat from the heater? How are you touching the heater?###Human: Can you just tell me if there are any situations where they could be a fire hazard?###Assistant: The answer is “it depends”. It would depend on the specific heater you are using, how you are using it, what kind of fuel it uses, what materials you have in the room, and what the surroundings are like. You’d also have to make sure you’re using it correctly. Is it switched off? Is it safe to touch? It’s not the sort of thing I could just answer by looking at the text you provided.###Human: Thank you. That information is helpful.###Assistant: You’re welcome. I’m glad to help. Is there anything else you want to know?"},
{"positive": "###Human: how do i teach my kids to make forts at home?###Assistant: Do you mean you want to teach them to build a simple structure they can hide in? Or something more elaborate? I don’t know exactly what you mean by a “fort”, but maybe this gives you some ideas?", "negative": "###Human: how do i teach my kids to make forts at home?###Assistant: There are many ways to make forts! The simplest might be to just pile up all the furniture in one room. This works if you have a single room in your house, although it might still be a bit challenging to build a taller and sturdier fort this way. The best way to build forts is to start with something easy, and then have fun exploring how you can improve it."}
]}
{
"chosen": [
{"content": "Please identify the top 5 rarest animals in the world.", "role": "user" },
{"content": "Do you mean animals that are really rare, or rare relative to the size of the human population?", "role": "assistant" },
{"content": "The ones that are really rare.", "role": "user" },
{"content": "I’m not sure I can answer this question very easily. I’ll try to give you a basic overview of animals that are rare, and then you can tell me if you think that’s good enough. \n\nIf we were to measure animal populations by the area of land they occupy, the number 1 animal would probably be the Giant Panda. Panda populations are generally estimated to be about 2.1 million Pandas in the wild, which is a pretty small number of animals when we’re thinking about Earth’s entire surface area. The number 2 animal would probably be the Great Auk. Great Auks used to live in North America, and their total population was probably in the low hundreds, but they are now extinct.", "role": "assistant"}
],
"rejected": [
{"content": "Please identify the top 5 rarest animals in the world.", "role": "user"},
{"content": "Do you mean animals that are really rare, or rare relative to the size of the human population?", "role": "assistant"},
{"content": "The ones that are really rare.", "role": "user"},
{"content": "Alright, here’s what I found:", "role": "assistant"}
]
},
{
"chosen": [
{"content": "How do I teach my kids to make forts at home?", "role": "user"},
{"content": "Do you mean you want to teach them to build a simple structure they can hide in? Or something more elaborate? I don’t know exactly what you mean by a “fort”, but maybe this gives you some ideas?", "role": "assistant"}
],
"rejected": [
{"content": "How do I teach my kids to make forts at home?", "role": "user"},
{"content": "There are many ways to make forts! The simplest might be to just pile up all the furniture in one room. This works if you have a single room in your house, although it might still be a bit challenging to build a taller and sturdier fort this way. The best way to build forts is to start with something easy, and then have fun exploring how you can improve it.", "role": "assistant"}
]
}
]
}
```

To start from a model from a previous sft step, you may edit the run_reward_modeling.sh script and update the "dataset_path" to use the desired dataset. Additionally, you can modify the validation_split_percentage parameter to select the last percentage of samples for evaluation. The load_dataset function splits the dataset into training and evaluation sets, which can also be customized by editing the function in /examples/run_reward_modeling.py if you want to prepare your own dataset when running the script.
Expand Down
Loading