v0.3.0
What's Changed
- fix: OLMES matching effort (MC Task Suite) by @fsschneider in #182
- feat: Feature to have metric aggregators like Pass@K by @prabhuteja12 in #190
- feat: Adds GSM8k with Olmes parity by @prabhuteja12 in #191
- feat: adding pool to sandbox by @prabhuteja12 in #194
- fix: scipy should be a non-optional dependency by @fsschneider in #196
- feat: add BPB metric to more tasks by @fsschneider in #198
- fix: bug fix in grouping per subject metrics by @prabhuteja12 in #201
- feat: add OLMES variant of BigCodeBench by @tfburns in #184
- feat: Task suites by @prabhuteja12 in #200
- refactor: add task formatters and helper functions for choice-based e… by @martinreinhardt01 in #202
- fix: match OLMES for GenQA tasks by @fsschneider in #195
- chore: Removing assertions that prevent proper logging of error messages by @prabhuteja12 in #203
- feat: add MultiPL HumanEval & MBPP tasks by @tfburns in #189
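For context on the Pass@K aggregator (#190), here is a minimal sketch of the commonly used unbiased pass@k estimator. It assumes each problem has `n` sampled completions, of which `c` pass. The function names `pass_at_k` and `aggregate_pass_at_k` are hypothetical and do not come from this release. The sketch illustrates the metric only and is not necessarily how the library's aggregator is implemented.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct (hypothetical helper)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, every size-k draw contains a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def aggregate_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems; each entry is (n_samples, n_correct) (hypothetical helper)."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Usage: two problems, 10 samples with 3 correct, and 5 samples with none correct.
score = aggregate_pass_at_k([(10, 3), (5, 0)], k=1)
```

The per-problem estimate is averaged across problems. This mirrors how a metric aggregator would sit on top of per-sample correctness results.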
Full Changelog: v0.2.14...v0.2.15