From 4e4379229232b606d1c07cdadbb6d3949443636c Mon Sep 17 00:00:00 2001
From: Jeremy D <115047575+bmosaicml@users.noreply.github.com>
Date: Mon, 11 Mar 2024 17:50:30 -0400
Subject: [PATCH] finish (#1022)

Co-authored-by: Max Marion
---
 scripts/eval/local_data/EVAL_GAUNTLET.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/eval/local_data/EVAL_GAUNTLET.md b/scripts/eval/local_data/EVAL_GAUNTLET.md
index b857e1664e..4183138bdb 100644
--- a/scripts/eval/local_data/EVAL_GAUNTLET.md
+++ b/scripts/eval/local_data/EVAL_GAUNTLET.md
@@ -1,4 +1,4 @@
-# Mosaic Eval Gauntlet v0.1.0 - Evaluation Suite
+# Mosaic Eval Gauntlet v0.3.0 - Evaluation Suite
@@ -24,7 +24,7 @@ At evaluation time, we run all the benchmarks, average the subscores within each
 For example, if benchmark A has a random baseline accuracy of 25%, and the model achieved 30%, we would report this as (0.3 - 0.25)/(1-0.25) = 0.0667. This can be thought of as the accuracy above chance rescaled so the max is 1. For benchmarks in which the random guessing baseline accuracy is ~0 we report the accuracy as is. Note that with this rescaling, a model could technically score below 0 on a category as a whole, but we haven’t found this to occur with any of the models we’ve tested.

-This is version v0.1.0 of the Eval Gauntlet.
+This is version v0.3.0 of the Eval Gauntlet.

 ### Reading Comprehension
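
The context lines in the second hunk describe the Gauntlet's above-chance rescaling. A minimal sketch of that calculation, assuming the formula quoted in the doc; the function name and the worked numbers for benchmark A are illustrative, not taken from the repository:

```python
def rescale_above_chance(accuracy: float, random_baseline: float) -> float:
    """Rescale raw accuracy so chance performance maps to 0 and perfect accuracy maps to 1.

    Illustrative helper, not part of llm-foundry; mirrors (acc - baseline) / (1 - baseline)
    as described in EVAL_GAUNTLET.md.
    """
    return (accuracy - random_baseline) / (1 - random_baseline)


# Example from the patched doc: 30% accuracy on a benchmark with a 25% random baseline.
print(rescale_above_chance(0.30, 0.25))  # ~0.0667
```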