How to use a custom dataset for evaluation? #1383

Open
Gooooooogo opened this issue May 4, 2024 · 2 comments
Labels
question Further information is requested

Comments

@Gooooooogo

No description provided.

@lzd-1230

lzd-1230 commented Jul 5, 2024

Have you figured out how to evaluate on your own dataset?

@rasbt
Collaborator

rasbt commented Jul 5, 2024

One way is to evaluate the model's responses on the test set. I have an example here: https://github.com/rasbt/LLM-workshop-2024/blob/main/06_finetuning/06_part-4.ipynb

Note that the code there is not 100% complete, since it is for a workshop where part of it is an exercise, but let me post the missing parts below:
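
For context, the snippets below assume that test_data is a list of instruction/input/output entries loaded from a JSON test split and that format_input builds an Alpaca-style prompt from an entry, as in the workshop notebook. A minimal sketch of those two pieces (the file name is just a placeholder):

import json

# Load the test split; "test_data.json" is a placeholder file name
with open("test_data.json", "r") as f:
    test_data = json.load(f)


def format_input(entry):
    # Alpaca-style prompt formatting, as used in the workshop notebook
    instruction_text = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )
    input_text = f"\n\n### Input:\n{entry['input']}" if entry.get("input") else ""
    return instruction_text + input_text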

IN:

from litgpt import LLM
from tqdm import tqdm

llm = LLM.load("path/to/your/model")

# Generate a response for each test-set entry and store it alongside the entry
for i in tqdm(range(len(test_data))):
    response = llm.generate(format_input(test_data[i]))
    test_data[i]["response"] = response

OUT:

100%|██████████| 165/165 [00:47<00:00,  3.51it/s]

IN:

with open("test_with_response.json", "w") as json_file:
    json.dump(test_data, json_file, indent=4)

del llm

llm = LLM.load("meta-llama/Meta-Llama-3-8B-Instruct", access_token="...")


def generate_model_scores(json_data, json_key):
    scores = []
    for entry in tqdm(json_data, desc="Scoring entries"):
        prompt = (
            f"Given the input `{format_input(entry)}` "
            f"and correct output `{entry['output']}`, "
            f"score the model response `{entry[json_key]}`"
            f" on a scale from 0 to 100, where 100 is the best score. "
            f"Respond with the integer number only."
        )
        score = llm.generate(prompt, max_new_tokens=50)
        try:
            scores.append(int(score))
        except ValueError:
            continue

    return scores


scores = generate_model_scores(json_data, "response")
print(f"\n{model}")
print(f"Number of scores: {len(scores)} of {len(json_data)}")
print(f"Average score: {sum(scores)/len(scores):.2f}\n")

OUT:

Scoring entries: 100%|██████████| 165/165 [00:30<00:00,  5.50it/s]

response_before
Number of scores: 161 of 165
Average score: 84.02
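
A small follow-up that is not part of the original notebook: judge models sometimes wrap the number in extra text, so a more forgiving parser than int(score) can recover a few of the otherwise skipped entries. A minimal sketch:

import re

def parse_score(text):
    # Take the first integer found in the judge's reply, or None if there is none
    match = re.search(r"\d+", text)
    if match is None:
        return None
    value = int(match.group())
    return value if 0 <= value <= 100 else None

Inside generate_model_scores, the try/except around int(score) would then become a check that parse_score(score) is not None before appending it.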
