@codelion (Member) commented Nov 8, 2025

Fixes #269

codelion and others added 4 commits November 6, 2025 09:09
Introduces two new scripts: eval_imobench_answer.py for evaluating short-answer mathematical problems from the AnswerBench dataset, and eval_imobench_proof.py for evaluating rigorous proof problems from the ProofBench dataset. Both scripts support model evaluation, result saving, and detailed performance analysis.
Added a step to remove unused SDKs and prune Docker system volumes in both amd64 and arm64 Docker publish workflows. This helps prevent disk space issues during CI builds.
Co-Authored-By: Claude <noreply@anthropic.com>
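
As a rough illustration of what short-answer evaluation of the kind described for eval_imobench_answer.py might involve, here is a minimal sketch. It does not reproduce the script in this PR: the names `normalize`, `grade`, and `accuracy`, the record layout, and the exact-match-after-normalization rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One graded problem (hypothetical record layout)."""
    problem_id: str
    expected: str
    predicted: str
    correct: bool


def normalize(answer: str) -> str:
    """Lowercase, strip one surrounding $...$ pair, and drop all whitespace."""
    text = answer.strip().lower()
    if len(text) >= 2 and text.startswith("$") and text.endswith("$"):
        text = text[1:-1]
    return "".join(text.split())


def grade(expected: dict[str, str], predicted: dict[str, str]) -> list[Result]:
    """Compare predictions to reference answers by normalized exact match.

    A missing prediction is treated as an empty (incorrect) answer.
    """
    results = []
    for problem_id, reference in expected.items():
        guess = predicted.get(problem_id, "")
        results.append(
            Result(
                problem_id=problem_id,
                expected=reference,
                predicted=guess,
                correct=normalize(guess) == normalize(reference),
            )
        )
    return results


def accuracy(results: list[Result]) -> float:
    """Fraction of problems graded correct; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.correct for r in results) / len(results)


if __name__ == "__main__":
    # Made-up toy data, not taken from AnswerBench.
    reference = {"p1": "3/4", "p2": "42"}
    model_output = {"p1": " $ 3/4 $ ", "p2": "41"}
    graded = grade(reference, model_output)
    print(f"accuracy: {accuracy(graded):.2f}")
```

Exact match after normalization is the simplest possible grading rule. A real evaluator for mathematical answers would likely need something more forgiving, such as symbolic equivalence or a model-based judge, and rigorous proof problems of the ProofBench kind cannot be graded this way at all.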
@codelion merged commit 1d65beb into main Nov 8, 2025
3 checks passed
@codelion deleted the feat-add-proofbench-answerbench-evals branch November 8, 2025 01:22

Development

Successfully merging this pull request may close these issues.

Docker latest tag not updating to v0.3.5
