Sys2Bench is a benchmarking suite designed to evaluate reasoning and planning capabilities of large language models across algorithmic, logical, arithmetic, and common-sense reasoning tasks.

divelab/Sys2Bench

Contributors 4
