Add option to dump prompts and completions to a JSON file #492

juletx · 2023-05-10T16:18:20Z

This PR adds a --write_detailed_eval_info flag that dumps a JSON file with prompts and completions. The output path can be set with --detailed_eval_info_path. It is based on commit OpenGPTX@d6f84c4.
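The idea behind such a dump can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes one record per document with numbered `prompt_N`/`logit_N` fields, mirroring the example output in this PR, and `build_record` and `dump_eval_info` are hypothetical names.

```python
import json
from pathlib import Path


def build_record(doc_id, prompts, completions, truth, correct):
    """Build one per-document entry (hypothetical helper; field names mirror the example output)."""
    record = {"doc_id": doc_id}
    # Number each request so tasks with several requests per document keep every pair.
    for i, (prompt, completion) in enumerate(zip(prompts, completions)):
        record[f"prompt_{i}"] = prompt
        record[f"logit_{i}"] = completion
    record["truth"] = truth
    # The example output stores the metric as a string ("True"/"False").
    record["acc"] = str(correct)
    return record


def dump_eval_info(records, path):
    """Write all records for one task as a pretty-printed JSON list (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-English prompts (e.g. multilingual tasks) readable.
        json.dump(records, f, indent=4, ensure_ascii=False)
```

Writing one JSON list per task keeps each file small enough to open in an editor and search by `doc_id`.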

This option is useful for debugging and for exploring model outputs, especially for greedy_until tasks such as GSM8K. I used it to debug MGSM in this PR: #426. Here is an example:

    {
        "doc_id": 1,
        "prompt_0": "Question: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?\nStep-by-Step Answer: There are 3 cars in the beginning, 2 more arrive, so now there should be 3 + 2 = 5 cars. The answer is 5.\n\nQuestion: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\nStep-by-Step Answer: Jason started with 20 lollipops, but now he only has 12, so he gave Denny 20 - 12 = 8 lollipops. The answer is 8.\n\nQuestion: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?\nStep-by-Step Answer: 5 bagels for $3 each should cost 5 * 3 = 15 dollars. Olivia had $23 in the beginning, so now she has 23 - 15 = 8 dollars left. The answer is 8.\n\nQuestion: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?\nStep-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.\n\nQuestion: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\nStep-by-Step Answer: Michael started with 58 golf balls and lost 23, so he has 58 - 23 = 35. After he lost 2 more, he has 35 - 2 = 33 balls now. The answer is 33.\n\nQuestion: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\nStep-by-Step Answer: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.\n\nQuestion: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\nStep-by-Step Answer: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.\n\nQuestion: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?\nStep-by-Step Answer: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.\n\nQuestion: Cars have lined up on the motorway. Some of the cars drive through in the first 15 minutes of the traffic jam, then 20 more cars drive through in the remaining 15 minutes of the jam. 5 cars from the line take an exit so they don't have to drive through the traffic jam. If there were originally 30 cars on the motorway, how many cars drove through the traffic jam in the first 15 minutes?\nStep-by-Step Answer:until",
        "logit_0": "There were 30 cars in the beginning. 5 cars took an exit, so there are 30 - 5 = 25 cars left. 15 minutes are left. 20 cars drive through the traffic jam in the remaining 15 minutes. That means 25 cars drive through the traffic jam in the first 15 minutes. The answer is 25.",
        "truth": " 5",
        "acc": "False"
    }
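A dump like this can also be scanned programmatically to find failures. The sketch below assumes the file holds a JSON list of records shaped like the one above, with `acc` stored as a string; `ANSWER_RE`, `extract_answer`, `failed_records` and `load_failures` are hypothetical, and the regex that pulls the number after "The answer is" is an illustration, not the harness's GSM8K/MGSM scoring code.

```python
import json
import re

# Hypothetical extractor: the number following "The answer is", commas allowed.
ANSWER_RE = re.compile(r"The answer is\s*(-?[\d,]*\.?\d+)")


def extract_answer(completion):
    """Return the final numeric answer in a completion, or None if none is found."""
    match = ANSWER_RE.search(completion)
    return match.group(1).replace(",", "") if match else None


def failed_records(records):
    """Yield (doc_id, predicted, truth) for every record marked incorrect."""
    for rec in records:
        if rec.get("acc") == "False":
            yield rec["doc_id"], extract_answer(rec.get("logit_0", "")), rec["truth"].strip()


def load_failures(path):
    """Read a dumped JSON list and collect its failing documents."""
    with open(path, encoding="utf-8") as f:
        return list(failed_records(json.load(f)))
```

For the record above, this reports the model's `25` next to the expected `5`, which points straight at the faulty reasoning step in the completion.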

StellaAthena · 2023-05-10T16:29:48Z

Hmmm. We wrote something like this for the BigScience fork but I guess never upstreamed it?

It's also worth noting that we currently support saving examples (but not completions) via

    python write_out.py \
        --tasks hellaswag \
        --num_fewshot 5 \
        --num_examples 10 \
        --output_base_path /path/to/output/folder

This has the advantage of not requiring a GPU or running a model, so I think it's likely worth keeping. Maybe we can make the naming conventions of the two set-ups consistent for usability, though.

juletx · 2023-05-10T16:44:09Z

Yes, I have used write_out.py for implementing new multilingual tasks. It is useful for making sure that prompts are correct. However, sometimes you need completions to know what is going wrong. We can keep both. What should I change to make naming consistent?

StellaAthena · 2023-05-10T21:16:48Z

Call the flag write_out and use the same variable name for the output path?
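One way to keep the two entry points consistent, sketched under the assumption that both scripts build their command line with argparse: register the output path through a single shared helper so the name cannot drift. `add_output_base_path`, `build_eval_parser` and `build_write_out_parser` are hypothetical names, the `--output_base_path` name is borrowed from the write_out.py command above, and the defaults shown are placeholders.

```python
import argparse


def add_output_base_path(parser):
    """Register the output-path option shared by both scripts (hypothetical helper)."""
    parser.add_argument(
        "--output_base_path",
        type=str,
        default=None,
        help="Directory for the dumped JSON files.",
    )
    return parser


def build_eval_parser():
    """Evaluation entry point: writing is opt-in because it runs alongside a model."""
    parser = argparse.ArgumentParser(description="evaluation (sketch)")
    parser.add_argument("--tasks", type=str, default=None)
    parser.add_argument(
        "--write_out",
        action="store_true",
        help="Dump prompts and completions to JSON.",
    )
    return add_output_base_path(parser)


def build_write_out_parser():
    """Prompt-only script: always writes, so it needs no --write_out switch."""
    parser = argparse.ArgumentParser(description="write out prompts (sketch)")
    parser.add_argument("--tasks", type=str, default=None)
    parser.add_argument("--num_fewshot", type=int, default=0)
    parser.add_argument("--num_examples", type=int, default=1)
    return add_output_base_path(parser)
```

Only the evaluation parser gets the boolean switch; the prompt-only script writes unconditionally, so both share just the path option.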

juletx · 2023-05-11T10:35:13Z

@StellaAthena I've updated parameter names and added docs.

StellaAthena · 2023-05-12T04:51:31Z

Looks good. I’ll run it in the morning and, if nothing weird happens, merge it.

juletx · 2023-05-21T09:42:26Z

@StellaAthena is this ready to merge?

StellaAthena · 2023-05-21T18:12:09Z

@juletx thanks for the ping! I had forgotten about this.

Add option to dump prompts and completions to a JSON file

add --write_detailed_eval_info to dump JSON with prompts and completions

2e046ce

juletx requested review from jon-tow and StellaAthena as code owners May 10, 2023 16:18

juletx added 2 commits May 11, 2023 12:28

update parameter names and add docs

99b0a42

update write out variable name

af91342

StellaAthena mentioned this pull request May 11, 2023

Support for ggml #417

Closed

StellaAthena merged commit bda6884 into EleutherAI:master May 21, 2023

qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this pull request Aug 17, 2023

Merge pull request EleutherAI#492 from juletx/eval-info

37001f6

Add option to dump prompts and completions to a JSON file

LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this pull request Sep 12, 2023

Merge pull request EleutherAI#492 from juletx/eval-info

3138257

Add option to dump prompts and completions to a JSON file

