Add option to dump prompts and completions to a JSON file #492

Merged (3 commits) on May 21, 2023

Contributor

@juletx juletx commented May 10, 2023

This PR adds the --write_detailed_eval_info parameter, which dumps prompts and completions to a JSON file. The output path is set with --detailed_eval_info_path. The PR is based on this commit: OpenGPTX@d6f84c4.

It is helpful for debugging and for exploring model outputs, especially for greedy_until tasks such as GSM8K. I have used it to debug MGSM in this PR: #426. Here is an example:

    {
        "doc_id": 1,
        "prompt_0": "Question: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?\nStep-by-Step Answer: There are 3 cars in the beginning, 2 more arrive, so now there should be 3 + 2 = 5 cars. The answer is 5.\n\nQuestion: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?\nStep-by-Step Answer: Jason started with 20 lollipops, but now he only has 12, so he gave Denny 20 - 12 = 8 lollipops. The answer is 8.\n\nQuestion: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?\nStep-by-Step Answer: 5 bagels for $3 each should cost 5 * 3 = 15 dollars. Olivia had $23 in the beginning, so now she has 23 - 15 = 8 dollars left. The answer is 8.\n\nQuestion: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?\nStep-by-Step Answer: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.\n\nQuestion: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many golf balls did he have at the end of wednesday?\nStep-by-Step Answer: Michael started with 58 golf balls and lost 23, so he has 58 - 23 = 35. After he lost 2 more, he has 35 - 2 = 33 balls now. The answer is 33.\n\nQuestion: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?\nStep-by-Step Answer: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so in total he has 7 + 2 = 9 toys. The answer is 9.\n\nQuestion: There were nine computers in the server room. Five more computers were installed each day, from monday to thursday. How many computers are now in the server room?\nStep-by-Step Answer: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 = 20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers. The answer is 29.\n\nQuestion: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?\nStep-by-Step Answer: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74 chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.\n\nQuestion: Cars have lined up on the motorway. Some of the cars drive through in the first 15 minutes of the traffic jam, then 20 more cars drive through in the remaining 15 minutes of the jam. 5 cars from the line take an exit so they don't have to drive through the traffic jam. If there were originally 30 cars on the motorway, how many cars drove through the traffic jam in the first 15 minutes?\nStep-by-Step Answer:until",
        "logit_0": "There were 30 cars in the beginning. 5 cars took an exit, so there are 30 - 5 = 25 cars left. 15 minutes are left. 20 cars drive through the traffic jam in the remaining 15 minutes. That means 25 cars drive through the traffic jam in the first 15 minutes. The answer is 25.",
        "truth": " 5",
        "acc": "False"
    }
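A dump in this shape can be filtered for failing examples with a few lines of Python. The following is only a minimal sketch: it assumes the output file holds a JSON list of records with the same keys as the example above (doc_id, prompt_0, logit_0, truth, acc), and the helper names load_failures and summarize are hypothetical, not part of the harness.

```python
import json
from pathlib import Path


def load_failures(path):
    """Return the records whose "acc" field is the string "False".

    Assumption: the file is a JSON list of records shaped like the
    example above. The real layout of the dump may differ.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    # In the example above, "acc" is serialized as a string, not a JSON boolean.
    return [r for r in records if r.get("acc") == "False"]


def summarize(record, width=80):
    """One-line summary: doc id, expected answer, and the tail of the completion."""
    completion = record.get("logit_0", "")
    return (
        f"doc {record['doc_id']}: truth={record['truth'].strip()!r} "
        f"completion_tail={completion[-width:]!r}"
    )


# Usage (the path is a placeholder):
#   for rec in load_failures("/path/to/output/folder/dump.json"):
#       print(summarize(rec))
```

Printing only the tail of each completion keeps the final "The answer is ..." sentence visible, which is usually the part worth comparing against truth.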

@StellaAthena
Member

StellaAthena commented May 10, 2023

Hmmm. We wrote something like this for the BigScience fork but I guess never upstreamed it?

It's also worth noting that we currently support saving examples (but not completions) via

    python write_out.py \
        --tasks hellaswag \
        --num_fewshot 5 \
        --num_examples 10 \
        --output_base_path /path/to/output/folder

This has the advantage of not requiring a GPU / running a model, so I think it's likely worth keeping? Maybe we can make the two set-ups consistent in naming conventions for usability though.

@juletx
Contributor Author

juletx commented May 10, 2023

Yes, I have used write_out.py for implementing new multilingual tasks. It is useful for making sure that prompts are correct. However, sometimes you need completions to know what is going wrong. We can keep both. What should I change to make naming consistent?

@StellaAthena
Member

> Yes, I have used write_out.py for implementing new multilingual tasks. It is useful for making sure that prompts are correct. However, sometimes you need completions to know what is going wrong. We can keep both. What should I change to make naming consistent?

Call the flag write_out and use the same variable name for the output path?
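As a rough illustration of that naming suggestion, the two options could be declared with argparse as below. This is a sketch under stated assumptions, not the code merged in this PR: the flag names follow the suggestion above and the --output_base_path option of write_out.py, while build_parser, the help strings, and the defaults are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical parser for an evaluation entry point; only the two
    # option names come from the discussion above.
    parser = argparse.ArgumentParser(description="Evaluation entry point (sketch)")
    # Boolean switch, off by default, so existing runs are unaffected.
    parser.add_argument(
        "--write_out",
        action="store_true",
        help="Dump prompts and completions to JSON files",
    )
    # Same option name that write_out.py uses for its output folder.
    parser.add_argument(
        "--output_base_path",
        type=str,
        default=None,
        help="Folder to write the dumps to",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(
    ["--write_out", "--output_base_path", "/path/to/output/folder"]
)
print(args.write_out, args.output_base_path)
```

Sharing the option name between the two scripts means a user who already knows write_out.py does not have to learn a second spelling for the output location.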

@juletx
Contributor Author

juletx commented May 11, 2023

@StellaAthena I've updated parameter names and added docs.

@StellaAthena StellaAthena mentioned this pull request May 11, 2023
@StellaAthena
Member

Looks good. I’ll run it in the morning and if nothing weird happens merge it.

@juletx
Contributor Author

juletx commented May 21, 2023

@StellaAthena is this ready to merge?

@StellaAthena StellaAthena merged commit bda6884 into EleutherAI:master May 21, 2023
@StellaAthena
Member

@juletx thanks for the ping! I had forgotten about this.

qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this pull request Aug 17, 2023
Add option to dump prompts and completions to a JSON file
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this pull request Sep 12, 2023
Add option to dump prompts and completions to a JSON file

2 participants