The benchbench package simplifies benchmark agreement testing for NLP models. Compare multiple models across various benchmarks and generate comprehensive agreement reports easily.
It also powers BenchBench (https://huggingface.co/spaces/ibm/benchbench), a benchmark for comparing benchmarks.
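To make "benchmark agreement" concrete, here is a minimal sketch of one common way to quantify it: a rank correlation (Kendall's tau) between the model scores reported by two benchmarks. This is an illustration only and does not use the benchbench API; the function name `kendall_tau`, the model names, and the scores are all hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two benchmarks, over the models they share.

    scores_a, scores_b: dicts mapping model name -> score (higher is better).
    """
    shared = sorted(set(scores_a) & set(scores_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")

    concordant = discordant = 0
    for m1, m2 in combinations(shared, 2):
        # Positive product: both benchmarks order this pair the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores, made up for illustration only.
bench_a = {"model_w": 0.81, "model_x": 0.74, "model_y": 0.62, "model_z": 0.55}
bench_b = {"model_w": 70.2, "model_x": 72.5, "model_y": 61.0, "model_z": 48.3}

tau = kendall_tau(bench_a, bench_b)
print(f"Kendall tau: {tau:.2f}")
```

A value near 1 means the two benchmarks rank the shared models almost identically, while a value near 0 means their rankings are largely unrelated. Only models present in both benchmarks enter the comparison.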
To contribute a new benchmark, create a pull request with a new CSV file in src/bat/assets/benchmarks. The filename should reflect the data source and snapshot date (see existing files for examples).
Much of benchbench's functionality is available via the interactive BenchBench app (https://huggingface.co/spaces/ibm/benchbench). For more advanced usage and customization, clone the repository:
git clone git@github.com:IBM/benchbench.git
Install in the environment of your choice:
cd benchbench
conda create -n bat python=3.11
conda activate bat
pip install -e .
Then check out the example in `examples/newbench_example.py` (also available at https://github.com/IBM/benchbench/blob/main/examples/newbench_example.py).
Contributions to the benchbench package are welcome! Please submit your pull requests or issues through our GitHub repository.
This package is released under the MIT License.