Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.
divelab/Sys2Bench