Add chat template #1873

Merged
merged 30 commits into from
Jun 3, 2024
Changes from 13 commits
Commits
30 commits
62df55d initial chat template (KonradSzafer, May 8, 2024)
f4902e0 tokenizer attribute check (KonradSzafer, May 8, 2024)
4b790fa variable rename (KonradSzafer, May 8, 2024)
cd9e454 interface update (KonradSzafer, May 8, 2024)
9dfb58a system instruction (KonradSzafer, May 12, 2024)
3369f88 system inst default update (KonradSzafer, May 14, 2024)
921c4d6 fewshot as multiturn (KonradSzafer, May 14, 2024)
a4bc484 typing update (KonradSzafer, May 14, 2024)
d01032d indent update (KonradSzafer, May 14, 2024)
8a0ce59 added comments (KonradSzafer, May 14, 2024)
9bd948d Merge branch 'main' into chat_template (KonradSzafer, May 22, 2024)
691e0c0 Adding a fewshot in a more readable way (KonradSzafer, May 22, 2024)
1162e34 linting (KonradSzafer, May 22, 2024)
c370665 Moved apply chat template to LM (KonradSzafer, May 29, 2024)
899a544 multiturn alternation fix (KonradSzafer, May 30, 2024)
f8771d2 cache key update (KonradSzafer, May 30, 2024)
52df595 apply chat template method fix (KonradSzafer, May 30, 2024)
615352c add system prompt hash to cache_key (KonradSzafer, May 30, 2024)
d7b8fd9 tokenizer name property for cache_key (KonradSzafer, May 30, 2024)
6f76522 property name fix (KonradSzafer, May 30, 2024)
4b0c49a linting backward compatibility fix (KonradSzafer, May 31, 2024)
dca730a docs and errors update (KonradSzafer, May 31, 2024)
a6d3c05 add documentation on adding chat template compatibility to model_guide (haileyschoelkopf, May 31, 2024)
16715f2 fewshot as multiturn check fix (KonradSzafer, May 31, 2024)
0ee30f1 Merge pull request #9 from EleutherAI/chat_template (KonradSzafer, May 31, 2024)
8ed9d77 saving system inst and chat template in results (KonradSzafer, Jun 3, 2024)
222dae3 eval tracker update (KonradSzafer, Jun 3, 2024)
2db5209 docs update (KonradSzafer, Jun 3, 2024)
54ef077 merge main (KonradSzafer, Jun 3, 2024)
4bcd0ae Apply suggestions from code review (clefourrier, Jun 3, 2024)
6 changes: 6 additions & 0 deletions docs/interface.md
@@ -44,6 +44,12 @@ This mode supports a number of command-line arguments, the details of which can

 - `--include_path` : Accepts a path to a folder. If passed, then all YAML files containing `lm-eval` compatible task configurations will be added to the task registry as available tasks. Used for when one is writing config files for their own task in a folder other than `lm_eval/tasks/`.

+- `--system_instruction`: Specifies a system instruction string to prepend to the prompt.
+
+- `--apply_chat_template` : If this flag is on, a chat template will be applied to the prompt. For Hugging Face models, the chat template is taken from the tokenizer; if the tokenizer does not have a chat template, a default one will be applied. For other models, a generic chat template is used.
+
+- `--fewshot_as_multiturn` : If this flag is on, the few-shot examples are treated as a multi-turn conversation: questions are provided as user content and answers as assistant responses. Requires `--num_fewshot` to be greater than 0 and `--apply_chat_template` to be on.
+
 - `--predict_only`: Generates the model outputs without computing metrics. Use with `--log_samples` to retrieve decoded results.

 * `--seed`: Set seed for python's random, numpy and torch. Accepts a comma-separated list of 3 values for python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. The values are either an integer or 'None' to not set the seed. Default is `0,1234,1234` (for backward compatibility). E.g. `--seed 0,None,8` sets `random.seed(0)` and `torch.manual_seed(8)`. Here numpy's seed is not set since the second value is `None`. E.g, `--seed 42` sets all three seeds to 42.
26 changes: 26 additions & 0 deletions lm_eval/__main__.py
@@ -162,6 +162,24 @@ def setup_parser() -> argparse.ArgumentParser:
         default=False,
         help="If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis. Use with --output_path.",
     )
+    parser.add_argument(
+        "--system_instruction",
+        type=str,
+        default=None,
+        help="System instruction to be used in the prompt",
+    )
+    parser.add_argument(
+        "--apply_chat_template",
+        action="store_true",
+        default=False,
+        help="If True, applies the chat template to the prompt",
+    )
+    parser.add_argument(
+        "--fewshot_as_multiturn",
+        action="store_true",
+        default=False,
+        help="If True, uses the fewshot as a multi-turn conversation",
+    )
     parser.add_argument(
         "--show_config",
         action="store_true",
@@ -270,6 +288,11 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
             "Specify --output_path if providing --log_samples or --predict_only"
         )

+    if args.fewshot_as_multiturn and args.apply_chat_template is False:
+        raise ValueError(
+            "If fewshot_as_multiturn is set, apply_chat_template must be set to True."
+        )
+
     if args.include_path is not None:
         eval_logger.info(f"Including path: {args.include_path}")
     task_manager = TaskManager(args.verbosity, include_path=args.include_path)
@@ -357,6 +380,9 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
         check_integrity=args.check_integrity,
         write_out=args.write_out,
         log_samples=args.log_samples,
+        system_instruction=args.system_instruction,
+        apply_chat_template=args.apply_chat_template,
+        fewshot_as_multiturn=args.fewshot_as_multiturn,
         gen_kwargs=args.gen_kwargs,
         task_manager=task_manager,
         verbosity=args.verbosity,
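A minimal standalone sketch of how the three new flags and the dependency check in this file behave, using only `argparse`. The helper names `build_parser` and `validate` are hypothetical and not part of `lm_eval`; the flag names, defaults, and error message are copied from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors only the three arguments added in this PR.
    parser = argparse.ArgumentParser()
    parser.add_argument("--system_instruction", type=str, default=None)
    parser.add_argument("--apply_chat_template", action="store_true", default=False)
    parser.add_argument("--fewshot_as_multiturn", action="store_true", default=False)
    return parser


def validate(args: argparse.Namespace) -> None:
    # Same condition and message as the check added to cli_evaluate.
    if args.fewshot_as_multiturn and args.apply_chat_template is False:
        raise ValueError(
            "If fewshot_as_multiturn is set, apply_chat_template must be set to True."
        )


# Both flags on: the check passes.
ok = build_parser().parse_args(["--apply_chat_template", "--fewshot_as_multiturn"])
validate(ok)

# Multi-turn few-shot without a chat template: the check raises.
bad = build_parser().parse_args(["--fewshot_as_multiturn"])
try:
    validate(bad)
    raised = False
except ValueError:
    raised = True
```

Because both boolean flags use `action="store_true"`, they take no value on the command line; passing the flag name alone turns them on.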
96 changes: 69 additions & 27 deletions lm_eval/api/samplers.py
@@ -35,37 +35,79 @@ def get_context(self, doc, num_fewshot):
         # TODO: should we just stop people from using fewshot from same split as evaluating?
         selected_docs = [x for x in fewshotex if x != doc][:num_fewshot]

-        labeled_examples = (
-            self.fewshot_delimiter.join(
-                [
-                    # TODO: is separating doc_to_text and doc_to_target by one space always desired?
-                    (
-                        self.doc_to_text(doc)
-                        if (
-                            self.config.doc_to_choice is None
-                            or isinstance(self.doc_to_text(doc), str)
-                        )
-                        else self.doc_to_choice(doc)[self.doc_to_text(doc)]
-                    )
-                    + self.target_delimiter
-                    + (
-                        str(self.doc_to_target(doc)[0])
-                        if isinstance(self.doc_to_target(doc), list)
-                        else self.doc_to_target(doc)
-                        if (
-                            self.config.doc_to_choice is None
-                            or isinstance(self.doc_to_target(doc), str)
-                        )
-                        else str(self.doc_to_choice(doc)[self.doc_to_target(doc)])
-                    )
-                    for doc in selected_docs
-                ]
+        labeled_examples = ""
+        for doc in selected_docs:
+            doc_content = self.doc_to_text(doc)
+            doc_target = self.doc_to_target(doc)
+            labeled_examples += (
+                doc_content
+                if self.config.doc_to_choice is None or isinstance(doc_content, str)
+                else self.doc_to_choice(doc)[doc_content]
             )
-            + self.fewshot_delimiter
-        )
+            labeled_examples += self.target_delimiter
+            labeled_examples += (
+                str(doc_target[0])
+                if isinstance(doc_target, list)
+                else doc_target
+                if self.config.doc_to_choice is None or isinstance(doc_target, str)
+                else str(self.doc_to_choice(doc)[doc_target])
+            )
+            labeled_examples += self.fewshot_delimiter

         return labeled_examples
+
+    def get_chat_context(
+        self,
+        doc,
+        num_fewshot,
+        fewshot_as_multiturn: bool = False,
+    ):
+        chat_history = []
+        # draw an extra fewshot sample if using same split as evaluating on
+        n_samples = (
+            num_fewshot + 1
+            if self.config.fewshot_split == self.config.test_split
+            else num_fewshot
+        )
+        # draw `n_samples` docs from fewshot_docs
+        fewshotex = self.sample(n_samples)
+
+        # get rid of the doc that's the one we're evaluating, if it's in the fewshot
+        # TODO: should we just stop people from using fewshot from same split as evaluating?
+        selected_docs = [x for x in fewshotex if x != doc][:num_fewshot]
+
+        if fewshot_as_multiturn:
+            for doc in selected_docs:
+                doc_content = self.doc_to_text(doc)
+                doc_target = self.doc_to_target(doc)
+                chat_history.append(
+                    {
+                        "role": "user",
+                        "content": doc_content
+                        if self.config.doc_to_choice is None
+                        or isinstance(doc_content, str)
+                        else self.doc_to_choice(doc)[doc_content],
+                    }
+                )
+                chat_history.append(
+                    {
+                        "role": "assistant",
+                        "content": str(doc_target[0])
+                        if isinstance(doc_target, list)
+                        else doc_target
+                        if self.config.doc_to_choice is None
+                        or isinstance(doc_target, str)
+                        else str(self.doc_to_choice(doc)[doc_target]),
+                    }
+                )
+        else:
+            # get fewshot context as one user turn
+            chat_history.append(
+                {"role": "user", "content": self.get_context(doc, num_fewshot)}
+            )
+
+        return chat_history

     def sample(self, n):
         """
         Draw `n` samples from our fewshot docs. This method should be overridden by subclasses.
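To make the two branches of `get_chat_context` concrete, here is a simplified sketch that works on plain `(question, answer)` string pairs instead of task documents. The function name `chat_context_sketch` and the delimiter defaults (`" "` and `"\n\n"`) are assumptions for illustration; the real method samples documents and resolves `doc_to_text`, `doc_to_target`, and `doc_to_choice` first.

```python
from typing import Dict, List, Tuple


def chat_context_sketch(
    fewshots: List[Tuple[str, str]],
    fewshot_as_multiturn: bool,
    target_delimiter: str = " ",      # assumed default, for illustration only
    fewshot_delimiter: str = "\n\n",  # assumed default, for illustration only
) -> List[Dict[str, str]]:
    """Hypothetical stand-in for get_chat_context on (question, answer) pairs."""
    if fewshot_as_multiturn:
        # One user turn per question, one assistant turn per answer.
        history: List[Dict[str, str]] = []
        for question, answer in fewshots:
            history.append({"role": "user", "content": question})
            history.append({"role": "assistant", "content": answer})
        return history
    # Otherwise all examples are packed into one user turn, laid out like the
    # rewritten get_context: question + target_delimiter + answer + fewshot_delimiter.
    text = "".join(q + target_delimiter + a + fewshot_delimiter for q, a in fewshots)
    return [{"role": "user", "content": text}]


shots = [("Q: 2+2?", "4"), ("Q: 3+3?", "6")]
multi = chat_context_sketch(shots, fewshot_as_multiturn=True)
single = chat_context_sketch(shots, fewshot_as_multiturn=False)
```

With two examples, `multi` holds four alternating user/assistant messages, while `single` holds one user message containing both examples.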
114 changes: 107 additions & 7 deletions lm_eval/api/task.py
@@ -373,6 +373,10 @@ def build_all_requests(
         world_size=None,
         cache_requests=False,
         rewrite_requests_cache=False,
+        system_instruction=None,
+        apply_chat_template=False,
+        fewshot_as_multiturn=False,
+        tokenizer=None,
     ) -> None:
         """Build a set of Instances for a task, and store them in task.instances"""

@@ -421,6 +425,10 @@ def build_all_requests(
             fewshot_ctx = self.fewshot_context(
                 doc,
                 0 if self.config.num_fewshot is None else self.config.num_fewshot,
+                system_instruction,
+                apply_chat_template,
+                fewshot_as_multiturn,
+                tokenizer,
             )

             # TODO: we should override self.config.repeats if doing greedy gen so users don't waste time+compute
@@ -957,31 +965,123 @@ def fewshot_docs(self):
                 )
             return super().fewshot_docs()

+    def convert_chat_history_to_string(self, chat_history: list, tokenizer=None) -> str:
+        """Returns chat history tokenized or concatenated as a string.
+
+        :param chat_history: list
+            The chat history to convert to a string.
+        :param tokenizer:
+            Optional tokenizer to use for applying the chat template, if None, the sampler's fewshot_delimiter is used.
+        """
+        if tokenizer:
+            return tokenizer.apply_chat_template(
+                chat_history, tokenize=False, add_generation_prompt=True
+            )
+        else:
+            return self.sampler.fewshot_delimiter + "".join(
+                f"{s['role']}: {s['content']}" + self.sampler.fewshot_delimiter
+                for s in chat_history
+            )
+
     @utils.positional_deprecated
-    def fewshot_context(self, doc: str, num_fewshot: int) -> str:
+    def fewshot_context(
+        self,
+        doc: str,
+        num_fewshot: int,
+        system_instruction: Optional[str] = None,
+        apply_chat_template: bool = False,
+        fewshot_as_multiturn: bool = False,
+        tokenizer=None,
+    ) -> str:
         """Returns a fewshot context string that is made up of a prepended description
         (if provided), the `num_fewshot` number of examples, and an appended prompt example.

         :param doc: str
             The document as returned from training_docs, validation_docs, or test_docs.
         :param num_fewshot: int
             The number of fewshot examples to provide in the returned context string.
+        :param system_instruction: str
+            System instruction to be applied to the prompt.
+        :param apply_chat_template: bool
+            Whether to apply the chat template to the fewshot context.
+        :param fewshot_as_multiturn: bool
+            Whether to provide the fewshot examples as a multiturn conversation or a single user turn.
+        :param tokenizer:
+            The tokenizer to use for applying the chat template.
         :returns: str
             The fewshot context.
         """

+        if apply_chat_template:
+            labeled_examples = []
+        else:
+            labeled_examples = ""
+
+        # get task description
         if description := self.config.description:
             description = utils.apply_template(self.config.description, doc)

-        if num_fewshot == 0:
-            # always prepend the (possibly empty) task description
-            labeled_examples = description
+        # create system prompt based on the provided system instruction and description
+        if system_instruction is not None and description:
+            system_prompt = (
+                f"{system_instruction}{self.sampler.fewshot_delimiter}{description}"
+            )
+        elif system_instruction is not None:
+            system_prompt = system_instruction
+        elif description:
+            system_prompt = description
         else:
-            labeled_examples = description + self.sampler.get_context(doc, num_fewshot)
+            system_prompt = ""
+
+        # add system prompt if specified
+        if system_prompt:
+            if apply_chat_template:
+                labeled_examples.append({"role": "system", "content": system_prompt})
+            else:
+                labeled_examples = system_prompt
+
+        # if few-shot - append examples after the system prompt
+        if num_fewshot > 0:
+            if apply_chat_template:
+                labeled_examples.extend(
+                    self.sampler.get_chat_context(
+                        doc, num_fewshot, fewshot_as_multiturn
+                    )
+                )
+            else:
+                labeled_examples += self.sampler.get_context(doc, num_fewshot)

         example = self.doc_to_text(doc)
-        if self.multiple_input:
-            return labeled_examples
+        if apply_chat_template:
+            if not self.multiple_input:
+                if isinstance(example, str):
+                    labeled_examples.append({"role": "user", "content": example})
+                # for loglikelihood create a list of questions with appended choices
+                elif isinstance(example, list):
+                    labeled_examples_list = []
+                    # copy chat history for each example and append the answer
+                    for ex in example:
+                        chat = deepcopy(labeled_examples)
+                        chat.append({"role": "user", "content": ex})
+                        labeled_examples_list.append(
+                            self.convert_chat_history_to_string(chat, tokenizer)
+                        )
+                    return labeled_examples_list
+                # if example is an integer, append the choice or convert to string
+                elif isinstance(example, int):
+                    if self.config.doc_to_choice is not None:
+                        choices = self.doc_to_choice(doc)
+                        labeled_examples.append(
+                            {"role": "user", "content": choices[example]}
+                        )
+                    else:
+                        labeled_examples.append(
+                            {"role": "user", "content": str(example)}
+                        )
+            return self.convert_chat_history_to_string(labeled_examples, tokenizer)
+        else:
+            if self.multiple_input:
+                return labeled_examples
             if isinstance(example, str):
                 return labeled_examples + example
             elif isinstance(example, list):
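The system-prompt branching in `fewshot_context` and the tokenizer-less branch of `convert_chat_history_to_string` can be tried in isolation. The sketch below copies both pieces of logic into free functions; the names `build_system_prompt` and `render_without_tokenizer` are hypothetical, and the `"\n\n"` delimiter default is an assumption standing in for `self.sampler.fewshot_delimiter`.

```python
from typing import Dict, List, Optional


def build_system_prompt(
    system_instruction: Optional[str],
    description: Optional[str],
    fewshot_delimiter: str = "\n\n",  # assumed stand-in for the sampler's delimiter
) -> str:
    # Same four-way branching as in fewshot_context.
    if system_instruction is not None and description:
        return f"{system_instruction}{fewshot_delimiter}{description}"
    if system_instruction is not None:
        return system_instruction
    if description:
        return description
    return ""


def render_without_tokenizer(
    chat_history: List[Dict[str, str]], fewshot_delimiter: str = "\n\n"
) -> str:
    # Mirrors the branch taken when no tokenizer is available:
    # each message becomes "role: content", wrapped in delimiters.
    return fewshot_delimiter + "".join(
        f"{m['role']}: {m['content']}" + fewshot_delimiter for m in chat_history
    )


history = [
    {"role": "system", "content": build_system_prompt("Be brief.", "Answer the question.")},
    {"role": "user", "content": "Q: 2+2?"},
]
rendered = render_without_tokenizer(history)
```

When a tokenizer is present, the diff instead delegates to `tokenizer.apply_chat_template(chat_history, tokenize=False, add_generation_prompt=True)`, so the exact output depends on the model's template.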
25 changes: 25 additions & 0 deletions lm_eval/evaluator.py
@@ -55,6 +55,9 @@ def simple_evaluate(
     check_integrity: bool = False,
     write_out: bool = False,
     log_samples: bool = True,
+    system_instruction: Optional[str] = None,
+    apply_chat_template: bool = False,
+    fewshot_as_multiturn: bool = False,
     gen_kwargs: Optional[str] = None,
     task_manager: Optional[TaskManager] = None,
     verbosity: str = "INFO",
@@ -99,6 +102,12 @@ def simple_evaluate(
         If True, write out an example document and model input for checking task integrity
     :param log_samples: bool
         If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis
+    :param system_instruction: str
+        System instruction to be applied to the prompt
+    :param apply_chat_template: bool
+        If True, apply chat template to the prompt
+    :param fewshot_as_multiturn: bool
+        Whether to provide the fewshot examples as a multiturn conversation or a single user turn.
     :param gen_kwargs: str
         String arguments for model generation
         Ignored for all tasks with loglikelihood output_type
@@ -262,6 +271,9 @@ def simple_evaluate(
         bootstrap_iters=bootstrap_iters,
         write_out=write_out,
         log_samples=log_samples,
+        system_instruction=system_instruction,
+        apply_chat_template=apply_chat_template,
+        fewshot_as_multiturn=fewshot_as_multiturn,
         verbosity=verbosity,
     )

@@ -317,6 +329,9 @@ def evaluate(
     bootstrap_iters: Optional[int] = 100000,
     write_out: bool = False,
     log_samples: bool = True,
+    system_instruction: Optional[str] = None,
+    apply_chat_template: bool = False,
+    fewshot_as_multiturn: bool = False,
     verbosity: str = "INFO",
 ):
     """Instantiate and evaluate a model on a list of tasks.
Expand All @@ -333,6 +348,12 @@ def evaluate(
If True, write out an example document and model input for checking task integrity
:param log_samples: bool
If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis
:param system_instruction: str
System instruction to be applied to the prompt
:param apply_chat_template: bool
If True, apply chat template to the prompt
:param fewshot_as_multiturn: bool
Whether to provide the fewshot examples as a multiturn conversation or a single user turn.
:return
Dictionary of results
"""
@@ -362,6 +383,10 @@ def evaluate(
             world_size=lm.world_size,
             cache_requests=cache_requests,
             rewrite_requests_cache=rewrite_requests_cache,
+            system_instruction=system_instruction,
+            apply_chat_template=apply_chat_template,
+            fewshot_as_multiturn=fewshot_as_multiturn,
+            tokenizer=lm.tokenizer if hasattr(lm, "tokenizer") else None,
         )
         eval_logger.debug(
             f"Task: {task_output.task_name}; number of requests on this rank: {len(task.instances)}"
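The `tokenizer=lm.tokenizer if hasattr(lm, "tokenizer") else None` argument decides which rendering path a task takes later. A small sketch with two dummy model classes shows the lookup; `WithTokenizer`, `WithoutTokenizer`, and `pick_tokenizer` are hypothetical names used only for this illustration.

```python
class WithTokenizer:
    # Stand-in for a model wrapper that exposes a tokenizer attribute.
    tokenizer = "dummy-tokenizer"


class WithoutTokenizer:
    # Stand-in for a model wrapper with no tokenizer attribute.
    pass


def pick_tokenizer(lm):
    # Same expression as in evaluate(): use the attribute when present, else None.
    return lm.tokenizer if hasattr(lm, "tokenizer") else None


found = pick_tokenizer(WithTokenizer())
missing = pick_tokenizer(WithoutTokenizer())
```

A `None` result is what routes `convert_chat_history_to_string` to its generic `role: content` rendering instead of the tokenizer's chat template.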
4 changes: 3 additions & 1 deletion lm_eval/models/huggingface.py
@@ -583,7 +583,9 @@ def _create_model(
             if self._model.config.vocab_size != len(self.tokenizer):
                 # resize model for LoRAs with added tokens
                 self._model.resize_token_embeddings(len(self.tokenizer))
-                eval_logger.info(f"Model config indicates vocab_size='{self._model.config.vocab_size}', but found tokenizer with vocab size '{len(self.tokenizer)}'. Resizing model embedding layer...")
+                eval_logger.info(
+                    f"Model config indicates vocab_size='{self._model.config.vocab_size}', but found tokenizer with vocab size '{len(self.tokenizer)}'. Resizing model embedding layer..."
+                )
             self._model = PeftModel.from_pretrained(
                 self._model, peft, revision=revision
             )