
add context-based requests processing #1571

Open
wants to merge 11 commits into base: main

Conversation

artemorloff
Contributor

This PR adds support for a new type of task: context-based tasks.

Motivation. Some tasks and CoT strategies may require knowing the model's answer to the previous question in order to form the current request. Until now it has been impossible to implement such tasks without changing evaluator.py (or a models/*.py file), which makes it impossible to use lm-evaluation-harness as an external library (a user cannot pass their own evaluator.py instead of the default one when running a task). This PR changes that.

How it works. All requests are split into two meta-groups: regular tasks and context-based tasks. Each group is processed separately, and the processing of regular tasks is unchanged. For context-based tasks, after the requests are prepared, each request is updated, run through the model, and then the external storage is updated. If no context-based tasks are requested, the loop that processes them is never entered, so the workflow for all existing tasks remains untouched.
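
Schematically, the loop for context-based tasks is as follows (a minimal sketch; storage, update_request, update_storage, and the model call are illustrative names, not the PR's exact code):

storage = {}  # external storage shared across the task's requests
for request in context_based_requests:
    request = update_request(storage, request)  # inject previous answers into the prompt
    request.resps = model.process(request)      # requests are run one at a time
    storage = update_storage(storage, request)  # record the new answer for later requests
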
Also, to encompass the new functionality, a new instance class is added: ContextInstance. It inherits from the regular Instance and adds two new methods, including update_request, which takes the storage and the request and modifies the request right before it is passed to the model, so that the changes are visible with the --log_samples flag. Old tasks that use Instance are not affected; the new class exists to avoid confusion between regular and context-based task instances.
To mark a task as context-based, a new attribute is used. It does not need to be added to existing tasks: when tasks are run, both the presence of this attribute and its value are checked. So no changes are needed to run existing tasks, and there is no way an old task can end up in the new loop.
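
For concreteness, here is a minimal sketch of what ContextInstance might look like (the exact method signatures in the PR may differ; update_storage here is an assumption inferred from the storage-update step described above):

from lm_eval.api.instance import Instance

class ContextInstance(Instance):
    def update_request(self, storage):
        # rewrite self.arguments using previous answers kept in storage,
        # right before the request is passed to the model
        ...

    def update_storage(self, storage):
        # record this instance's response in storage for later requests
        ...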

All tests pass successfully. No changes are needed for the different models. The only issue that may occur is that a new progress bar appears each time the model is called; this can be solved by merging #1569.

Closes: #1432 #1537

@artemorloff
Contributor Author

@uanu2002

@artemorloff
Contributor Author

It is also important to define the behaviour of ConfigurableTask so that users can define a context-based task with a yaml file, isn't it?

@artemorloff
Contributor Author

I have added support for the yaml task format. A context-based task can now be defined via a task.yaml file. ContextInstance is used only for context-based tasks, which are identified by a specific flag inside the task config. Old tasks are not affected and use the original Instance class.

@artemorloff
Contributor Author

@haileyschoelkopf is there anything I can do to improve the PR and speed up the merging process? If there is anything I can add to make the PR look better, make the code work more efficiently, or avoid any confusion, I will do it. I am open to ideas and comments :)

@artemorloff
Contributor Author

@haileyschoelkopf merged recent updates from main so that it is easier to review the changes I'm proposing.

@haileyschoelkopf
Contributor

Left a comment about this PR and the feature in Discord!

@StellaAthena
Member

Some tasks and CoT strategies may require knowing the model's answer to the previous question in order to form the current request.

Is this primarily about multi-step / multi-round problems?

@artemorloff
Contributor Author

artemorloff commented Apr 18, 2024

@StellaAthena hi! Yes, when I first introduced these changes I was thinking about multi-step prompting, as described here: https://arxiv.org/pdf/2305.14279.pdf
But I wanted to make the solution flexible enough to also handle multi-round setups (like https://arxiv.org/pdf/2311.07689.pdf, perhaps).
The main idea is to let users define for themselves the number of steps/rounds and the internals of the requests, passing the result of the previous request to the current one.

@artemorloff
Contributor Author

@haileyschoelkopf can you give me more information so that I can develop this PR further? :)

@artemorloff
Contributor Author

This PR is designed primarily to accommodate different variants of multi-step and multi-round tasks. To that end, it provides flexible functions to update the internal storage (which keeps information about previous requests of the dataset) and to update the current request (which takes information from the storage). With minor changes it could also close issue #1816 (we would need to manage the number of requests so that a user can add more requests based on some condition without affecting other context-based tasks).

I see it working this way. Example of a yaml task:

# download the task
task: dataset_y
dataset_path: dataset
dataset_name: dataset_name
# define that it is multiple-choice to compute log-probs
output_type: multiple_choice
# have only train and test splits
training_split: train
test_split: test
# new flag indicating that this is a context-based task
context_based: true
# methods used to update requests and storage
request_updater: !function utils._update_request
storage_updater: !function utils._update_storage
# for this task process_docs ensures the order of the instances
process_docs: !function utils.process_docs
# doc['instruction'] has place to put context inside
doc_to_text: "{{doc['instruction']}}"
doc_to_target: "{{outputs}}"
doc_to_choice: ["1", "2"]
target_delimiter: " "
should_decontaminate: false
process_results: !function utils.process_results
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
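
With such a config in place, a hypothetical invocation would then look like any other external task (the directory, model, and output path below are placeholders):

lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --include_path ./my_tasks \
    --tasks dataset_y \
    --log_samples \
    --output_path results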

@artemorloff
Contributor Author

Examples of the updater functions:

import numpy as np

# CONTEXT_PLACEHOLDER is assumed to be defined in the task's utils module,
# e.g. a marker string inside doc_to_text that gets replaced with context.
CONTEXT_PLACEHOLDER = "<CONTEXT>"


def _update_request(storage, request):
    if not len(storage) and request.doc["meta"]["q_id"] != 0:
        print("No previous responses logged in storage!")
        return request
    if request.doc["meta"]["q_id"] == 0:
        # no update for the first request of a set
        update_ctx = ""
    else:
        # take the accumulated context from storage
        update_ctx = storage["string"]
    # create new args for the request so that the updated prompt is passed
    # to the lm and logged in the jsonl file
    new_pair = (
        request.arguments[0].replace(CONTEXT_PLACEHOLDER, update_ctx),
        request.arguments[1],
    )
    request.arguments = new_pair
    return request


def _update_storage(storage, request):
    # check whether the set is over, so that storage can be cleared
    # (429 is the id of the last question in this particular dataset)
    if (
        request.doc["meta"]["set_id"] == 0
        and request.doc["meta"]["q_id"] == 429
        and len(storage["candidates"]) == 1
    ):
        dataset_ends = True
    else:
        dataset_ends = False
    # clear storage after the dataset ends and return
    if dataset_ends:
        return {}
    # update storage only after both choices of the same request have run
    storage.setdefault("candidates", []).append(request.resps[0][0])
    if len(storage["candidates"]) == 2:
        # decide on the answer: pick the choice with the higher log-prob
        res = ["1", "2"][np.argmax(storage["candidates"])]
        # get the string that accumulates the context
        storage["string"] = storage.get("string", "")
        # append the current question and the chosen answer to the context
        storage["string"] += "\n{question}\n1. {choice1}\n2. {choice2}\nAnswer: {result}".format(
            question=request.doc["inputs"]["question"],
            choice1=request.doc["inputs"]["choice1"],
            choice2=request.doc["inputs"]["choice2"],
            result=res,
        )
        # reset candidates each time both choices of a request have been scored
        storage["candidates"] = []
    return storage
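
For illustration, here is a hypothetical driver showing how the two functions above interact for one two-choice doc (the request objects and log-prob scores are mocked; the harness constructs the real ones):

from types import SimpleNamespace

def make_request(q_id, question, choice1, choice2, continuation):
    # minimal stand-in for the harness's request object
    return SimpleNamespace(
        doc={
            "meta": {"q_id": q_id, "set_id": 0},
            "inputs": {"question": question, "choice1": choice1, "choice2": choice2},
        },
        arguments=(CONTEXT_PLACEHOLDER + "\n" + question, continuation),
        resps=None,
    )

storage = {}
# doc 0: both choices are scored with pretended loglikelihoods
for continuation, score in ((" 1", -0.4), (" 2", -1.2)):
    req = _update_request(storage, make_request(0, "2+2?", "4", "5", continuation))
    req.resps = [[score]]  # mocked model response
    storage = _update_storage(storage, req)

# doc 1 now sees doc 0's question and the chosen answer in its prompt
req = _update_request(storage, make_request(1, "3+3?", "6", "7", " 1"))
print(req.arguments[0])
# -> "\n2+2?\n1. 4\n2. 5\nAnswer: 1\n3+3?"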

@artemorloff
Contributor Author

@haileyschoelkopf I would gladly incorporate my ideas into existing lm-harness features!

@lintangsutawika
Contributor

lintangsutawika commented May 14, 2024

@artemorloff I like the general idea, but I'm missing some context here. Could you describe a few types of tasks that would benefit from this type of evaluation? (I've seen a recent issue that provided an example, but I'm wondering if there are other types of tasks that would benefit from this pattern.)

@artemorloff
Contributor Author

@lintangsutawika thank you for the reply! A lot of works and tasks are emerging that use a multi-step/multi-round strategy. The goal is to let users start creating these tasks right now and then find ways to enhance the feature. Some examples of tasks and ideas using this strategy:

  • the arithmetic task described here - https://arxiv.org/pdf/2305.14279,
  • CoT for benchmark tasks with two-step questions [the model first generates a rationale, then produces the final answer when prompted with the previously generated rationale] - https://arxiv.org/pdf/2311.12022
  • even multi-round setups like this one (https://arxiv.org/pdf/2311.07689) can be done with a few changes in the code [the idea is to regenerate the answer for the same request without adding new requests to the list - run as many rounds as one may want and update the instance's resps with, say, the last try or the most successful one]
  • even quite hard things like this (https://arxiv.org/pdf/2402.08702v2) may, I guess, be done through lm-harness [this prompt-tuning task seems more multi-round than multi-step, but the idea is that one can even train something with the harness if they manage to cover the backward pass and the optimizer step in the post-processing of each round]
  • the same CoT idea that makes a model reflect on the task - https://arxiv.org/pdf/2403.14312v1 [originally a few LLMs "discuss" the task; in my realisation one model can judge its own previous "thoughts"]

All these works reference further works, so I think multi-step (multi-round) reasoning is quite important and may enable new tasks in lm-harness.

@artemorloff
Contributor Author

artemorloff commented May 17, 2024

My PR introduces a quite straightforward approach to the issue. All such tasks are processed one by one, to avoid possible problems with batches exceeding GPU memory and with repeated batch-size computations.
I believe lm-harness can be a good tool for conducting research that develops CoT reasoning through a multi-step (multi-round) strategy.
