
add context-based requests processing #1571

Open
wants to merge 11 commits into base: main

Conversation

artemorloff
Contributor

This PR adds support for a new type of task: context-based tasks.

Motivation. Some tasks and CoT strategies may require knowing the model's answer to the previous question in order to form the current request. Until now it has been impossible to implement such tasks without changing evaluator.py (or a models/*.py file), which makes it impossible to use lm-evaluation-harness as an external library (a user cannot pass their own evaluator.py instead of the default one when running a task). This PR changes that.

How it works. All requests are split into two meta-groups: regular tasks and context-based tasks. Each group is processed separately, and the processing of regular tasks is unchanged. For context-based tasks, after the requests are prepared, each request is updated, run through the model, and then the external storage is updated. If no context-based tasks are requested, the loop that processes them is never entered, so the workflow for all existing tasks remains untouched.
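
Schematically, the loop for context-based tasks is as follows (a minimal sketch; storage, update_request, update_storage, and the model call are illustrative names, not the PR's exact code):

storage = {}  # external storage shared across the task's requests
for request in context_based_requests:
    request = update_request(storage, request)  # inject previous answers into the prompt
    request.resps = model.process(request)      # requests are run one at a time
    storage = update_storage(storage, request)  # record the new answer for later requests
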
Also, to encompass the new functionality, a new instance class is added: ContextInstance. It inherits from the regular Instance and adds two new methods, including update_request, which takes the storage and the request and modifies the request right before it is passed to the model, so that the changes are visible with the --log_samples flag. Old tasks that use Instance are not affected; the new class exists to avoid confusion between regular and context-based task instances.
To mark a task as context-based, a new attribute is used. It does not need to be added to existing tasks: when tasks are run, both the presence of this attribute and its value are checked. So no changes are needed to run existing tasks, and there is no way an old task can end up in the new loop.
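
For concreteness, here is a minimal sketch of what ContextInstance might look like (the exact method signatures in the PR may differ; update_storage here is an assumption inferred from the storage-update step described above):

from lm_eval.api.instance import Instance

class ContextInstance(Instance):
    def update_request(self, storage):
        # rewrite self.arguments using previous answers kept in storage,
        # right before the request is passed to the model
        ...

    def update_storage(self, storage):
        # record this instance's response in storage for later requests
        ...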

All tests pass successfully. No changes are needed for the different models. The only issue that may occur is that a new progress bar appears each time the model is called; this can be solved by merging #1569.

Closes: #1432 #1537

@artemorloff
Contributor Author

@uanu2002

@artemorloff
Contributor Author

It is also important to define the behaviour of ConfigurableTask so that users can define a context-based task with a yaml file, isn't it?

@artemorloff
Contributor Author

I have added support for the yaml task format. A context-based task can now be defined via a task.yaml file. ContextInstance is used only for context-based tasks, which are identified by a specific flag inside the task config. Old tasks are not affected and use the original Instance class.

@artemorloff
Contributor Author

@haileyschoelkopf is there anything I can do to improve the PR and speed up the merging process? If there is anything I can add to make the PR look better, make the code work more efficiently, or avoid any confusion, I will do it. I am open to ideas and comments :)

@artemorloff
Contributor Author

@haileyschoelkopf merged recent updates from main so that it is easier to review the changes I'm proposing.

@haileyschoelkopf
Contributor

Left a comment about this PR and the feature in Discord!

@StellaAthena
Member

Some tasks and CoT strategies may require knowing the model's answer to the previous question in order to form the current request.

Is this primarily about multi-step / multi-round problems?

@artemorloff
Contributor Author

artemorloff commented Apr 18, 2024

@StellaAthena hi! Yes, when I first introduced these changes I was thinking about multi-step prompting, as described here: https://arxiv.org/pdf/2305.14279.pdf
But I wanted to make the solution flexible enough to also handle multi-round setups (like https://arxiv.org/pdf/2311.07689.pdf, perhaps).
The main idea is to let users define for themselves the number of steps/rounds and the internals of the requests, passing the result of the previous request to the current one.

@artemorloff
Contributor Author

@haileyschoelkopf can you give me more information so that I can develop this PR further? :)

@artemorloff
Contributor Author

This PR is designed primarily to accommodate different variants of multi-step and multi-round tasks. To that end, it provides flexible functions to update the internal storage (which keeps information about previous requests of the dataset) and to update the current request (which takes information from the storage). With minor changes it could also close issue #1816 (we would need to manage the number of requests so that a user can add more requests based on some condition without affecting other context-based tasks).

I see it working this way. Example of a yaml task:

# download the task
task: dataset_y
dataset_path: dataset
dataset_name: dataset_name
# define that it is multiple-choice to compute log-probs
output_type: multiple_choice
# have only train and test splits
training_split: train
test_split: test
# new flag indicating that this is a context-based task
context_based: true
# methods used to update requests and storage
request_updater: !function utils._update_request
storage_updater: !function utils._update_storage
# for this task process_docs ensures the order of the instances
process_docs: !function utils.process_docs
# doc['instruction'] has place to put context inside
doc_to_text: "{{doc['instruction']}}"
doc_to_target: "{{outputs}}"
doc_to_choice: ["1", "2"]
target_delimiter: " "
should_decontaminate: false
process_results: !function utils.process_results
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
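
With such a config in place, a hypothetical invocation would then look like any other external task (the directory, model, and output path below are placeholders):

lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --include_path ./my_tasks \
    --tasks dataset_y \
    --log_samples \
    --output_path results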

@artemorloff
Contributor Author

Examples of the updater functions:

import numpy as np

# CONTEXT_PLACEHOLDER is assumed to be defined in the task's utils module,
# e.g. a marker string inside doc_to_text that gets replaced with context.
CONTEXT_PLACEHOLDER = "<CONTEXT>"


def _update_request(storage, request):
    if not len(storage) and request.doc["meta"]["q_id"] != 0:
        print("No previous responses logged in storage!")
        return request
    if request.doc["meta"]["q_id"] == 0:
        # no update for the first request of a set
        update_ctx = ""
    else:
        # take the accumulated context from storage
        update_ctx = storage["string"]
    # create new args for the request so that the updated prompt is passed
    # to the lm and logged in the jsonl file
    new_pair = (
        request.arguments[0].replace(CONTEXT_PLACEHOLDER, update_ctx),
        request.arguments[1],
    )
    request.arguments = new_pair
    return request


def _update_storage(storage, request):
    # check whether the set is over, so that storage can be cleared
    # (429 is the id of the last question in this particular dataset)
    if (
        request.doc["meta"]["set_id"] == 0
        and request.doc["meta"]["q_id"] == 429
        and len(storage["candidates"]) == 1
    ):
        dataset_ends = True
    else:
        dataset_ends = False
    # clear storage after the dataset ends and return
    if dataset_ends:
        return {}
    # update storage only after both choices of the same request have run
    storage.setdefault("candidates", []).append(request.resps[0][0])
    if len(storage["candidates"]) == 2:
        # decide on the answer: pick the choice with the higher log-prob
        res = ["1", "2"][np.argmax(storage["candidates"])]
        # get the string that accumulates the context
        storage["string"] = storage.get("string", "")
        # append the current question and the chosen answer to the context
        storage["string"] += "\n{question}\n1. {choice1}\n2. {choice2}\nAnswer: {result}".format(
            question=request.doc["inputs"]["question"],
            choice1=request.doc["inputs"]["choice1"],
            choice2=request.doc["inputs"]["choice2"],
            result=res,
        )
        # reset candidates each time both choices of a request have been scored
        storage["candidates"] = []
    return storage
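
For illustration, here is a hypothetical driver showing how the two functions above interact for one two-choice doc (the request objects and log-prob scores are mocked; the harness constructs the real ones):

from types import SimpleNamespace

def make_request(q_id, question, choice1, choice2, continuation):
    # minimal stand-in for the harness's request object
    return SimpleNamespace(
        doc={
            "meta": {"q_id": q_id, "set_id": 0},
            "inputs": {"question": question, "choice1": choice1, "choice2": choice2},
        },
        arguments=(CONTEXT_PLACEHOLDER + "\n" + question, continuation),
        resps=None,
    )

storage = {}
# doc 0: both choices are scored with pretended loglikelihoods
for continuation, score in ((" 1", -0.4), (" 2", -1.2)):
    req = _update_request(storage, make_request(0, "2+2?", "4", "5", continuation))
    req.resps = [[score]]  # mocked model response
    storage = _update_storage(storage, req)

# doc 1 now sees doc 0's question and the chosen answer in its prompt
req = _update_request(storage, make_request(1, "3+3?", "6", "7", " 1"))
print(req.arguments[0])
# -> "\n2+2?\n1. 4\n2. 5\nAnswer: 1\n3+3?"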

@artemorloff
Contributor Author

@haileyschoelkopf I would gladly incorporate my ideas into existing lm-harness features!

@lintangsutawika
Contributor

lintangsutawika commented May 14, 2024

@artemorloff I like the general idea, but I'm missing some context here. Could you describe a few types of tasks that would benefit from this type of evaluation? (I've seen a recent issue that provided an example, but I'm wondering if there are other types of tasks that would benefit from this pattern.)

@artemorloff
Contributor Author

@lintangsutawika thank you for the reply! A lot of works and tasks are emerging that use a multi-step/multi-round strategy. The goal is to let users start creating these tasks right now and then find ways to enhance the feature. Some examples of tasks and ideas using this strategy:

  • the arithmetic task described here - https://arxiv.org/pdf/2305.14279,
  • CoT for benchmark tasks with two-step questions [the model first generates a rationale, then produces the final answer when prompted with the previously generated rationale] - https://arxiv.org/pdf/2311.12022
  • even multi-round setups like this one (https://arxiv.org/pdf/2311.07689) can be done with a few changes in the code [the idea is to regenerate the answer for the same request without adding new requests to the list - run as many rounds as one may want and update the instance's resps with, say, the last try or the most successful one]
  • even quite hard things like this (https://arxiv.org/pdf/2402.08702v2) may, I guess, be done through lm-harness [this prompt-tuning task seems more multi-round than multi-step, but the idea is that one can even train something with the harness if they manage to cover the backward pass and the optimizer step in the post-processing of each round]
  • the same CoT idea that makes a model reflect on the task - https://arxiv.org/pdf/2403.14312v1 [originally a few LLMs "discuss" the task; in my realisation one model can judge its own previous "thoughts"]

All these works reference further works, so I think multi-step (multi-round) reasoning is quite important and may enable new tasks in lm-harness.

@artemorloff
Contributor Author

artemorloff commented May 17, 2024

My PR introduces a quite straightforward approach to the issue. All such tasks are processed one by one, to avoid possible problems with batches exceeding GPU memory and with repeated batch-size computations.
I believe lm-harness can be a good tool for conducting research that develops CoT reasoning through a multi-step (multi-round) strategy.
