Today EvalAlgorithmInterface.evaluate is typed to return List[EvalOutput] ("for dataset(s)", per the docstring), but its dataset_config argument only accepts Optional[DataConfig].
It seems like most concrete eval algorithms (like QAAccuracy) either take the user's data_config for a single dataset, or fall back to all the pre-defined DATASET_CONFIGS relevant to the evaluator's problem type.
So the internal logic of evaluators already supports running multiple datasets and returning multiple results, but users are prevented from calling evaluate() with multiple datasets of their own for no particular reason?
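A minimal sketch of the shape of the request, assuming a simplified interface; the stand-in classes and the exact parameter list here are illustrative, not fmeval's actual definitions:

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Union


# Stand-in types so the sketch is self-contained; fmeval defines the real
# DataConfig, EvalOutput, and ModelRunner classes.
class DataConfig: ...
class EvalOutput: ...
class ModelRunner: ...


class EvalAlgorithmInterface(ABC):
    @abstractmethod
    def evaluate(
        self,
        model: Optional[ModelRunner] = None,
        # Today: Optional[DataConfig] only.
        # Proposed: also accept a list of configs, e.g.
        # Optional[Union[DataConfig, List[DataConfig]]].
        dataset_config: Optional[Union[DataConfig, List[DataConfig]]] = None,
        prompt_template: Optional[str] = None,
    ) -> List[EvalOutput]:
        """Evaluate on the given dataset(s), returning one EvalOutput per dataset."""
```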
Your understanding is correct. Currently, evaluate can either be configured to use a single user-provided dataset (via data_config) or configured to use all of the "built-in" datasets. Your feature request certainly makes sense; there isn't a particularly compelling reason I can think of for why we shouldn't be able to evaluate multiple "custom" (i.e. user-provided) datasets.
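For illustration, a call under the widened signature might look like the sketch below; the import paths, DataConfig fields, and model_runner are assumptions based on typical fmeval usage rather than a verified API:

```python
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy

# Two user-provided datasets; the DataConfig field names here follow common
# fmeval examples but should be checked against the installed version.
dev_config = DataConfig(
    dataset_name="my_qa_dev",
    dataset_uri="s3://my-bucket/qa/dev.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)
test_config = DataConfig(
    dataset_name="my_qa_test",
    dataset_uri="s3://my-bucket/qa/test.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)

eval_algo = QAAccuracy()
# With the widened signature, one call would return one EvalOutput per config,
# matching the existing List[EvalOutput] return type.
eval_outputs = eval_algo.evaluate(
    model=model_runner,  # assumed: an existing ModelRunner for the model under test
    dataset_config=[dev_config, test_config],
)
```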