STEM Benchmarks and verifiers #95

Merged
gupta-abhay merged 10 commits into main from abhay/stem_verifiers
Jul 10, 2025

@gupta-abhay (Collaborator) commented Jun 18, 2025

  • We will rely on STEM evals (MMLU-Pro and GPQA-Diamond) as the third subset of the Reasoning Gauntlet.
  • This PR adds a verifier for these datasets (including extensions for different formats) along with the corresponding dataloaders.
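
For readers unfamiliar with what a verifier for these datasets involves, here is a minimal sketch, not the code in this PR. Both MMLU-Pro (up to ten options, A-J) and GPQA-Diamond (four options, A-D) are multiple-choice, so one plausible approach is to extract a letter from the model response and compare it with the gold label. The function names `extract_choice` and `verify`, the two answer formats handled, and the 0/1 reward are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical answer formats, tried in priority order. The formats the
# PR's verifier actually supports are not shown in this conversation.
_PATTERNS = [
    # LaTeX-style boxed answer, e.g. \boxed{C} or \boxed{(C)}
    r"\\boxed\{\s*\(?([A-J])\)?\s*\}",
    # Phrases such as "The answer is (C)" or "Answer: C"; the lookahead
    # rejects words that merely start with a capital letter ("answer is About").
    r"(?i:answer\s*(?:is\s*:?|:)\s*)\(?([A-J])\)?(?![A-Za-z])",
]


def extract_choice(response: str, num_options: int = 10) -> Optional[str]:
    """Return the last valid option letter found in `response`, or None."""
    valid = "ABCDEFGHIJ"[:num_options]
    for pattern in _PATTERNS:
        # Scan matches from last to first so a revised final answer wins.
        for letter in reversed(re.findall(pattern, response)):
            if letter in valid:
                return letter
    return None


def verify(response: str, gold: str, num_options: int = 10) -> float:
    """Return 1.0 if the extracted letter equals the gold label, else 0.0."""
    predicted = extract_choice(response, num_options)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == gold.strip().upper() else 0.0
```

With this sketch, `verify("The answer is (C).", "C")` returns `1.0`, while a response with no recognizable answer, or a letter outside the valid range for the dataset (for example `F` when `num_options=4`), scores `0.0`.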

Sample run name: qwen25-stem-grpo-test-1OAMen
(note: this run uses roughly 1K samples; the actual dataset work is ongoing in the background)

[Image: Screenshot 2025-07-10 at 10 50 21]

@gupta-abhay gupta-abhay changed the title WIP: Adding verifier for STEM datasets Adding verifier for STEM datasets Jul 9, 2025
@gupta-abhay gupta-abhay changed the title Adding verifier for STEM datasets STEM Benchmarks and verifiers Jul 10, 2025
@gupta-abhay gupta-abhay marked this pull request as ready for review July 10, 2025 17:54

@abaheti95 (Collaborator) left a comment

Thank you! Mostly looks good with minor function renames for future re-use.

@gupta-abhay gupta-abhay merged commit 148ba62 into main Jul 10, 2025
4 checks passed
@gupta-abhay gupta-abhay deleted the abhay/stem_verifiers branch July 10, 2025 21:22
2 participants