Today EvalAlgorithmInterface.evaluate is typed to return List[EvalOutput] ("for dataset(s)", per the docstring), but its dataset_config argument only accepts Optional[DataConfig].
It seems like most concrete eval algorithms (like QAAccuracy) either take the user's data_config for a single dataset, or fall back to all the pre-defined DATASET_CONFIGS relevant to the evaluator's problem type.
So the internal logic of evaluators already supports running multiple datasets and returning multiple results, but users are prevented from calling evaluate() with multiple datasets of their own for no particular reason?
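A minimal sketch of the shape of the request, assuming a simplified interface; the stand-in classes and the exact parameter list here are illustrative, not fmeval's actual definitions:

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Union


# Stand-in types so the sketch is self-contained; fmeval defines the real
# DataConfig, EvalOutput, and ModelRunner classes.
class DataConfig: ...
class EvalOutput: ...
class ModelRunner: ...


class EvalAlgorithmInterface(ABC):
    @abstractmethod
    def evaluate(
        self,
        model: Optional[ModelRunner] = None,
        # Today: Optional[DataConfig] only.
        # Proposed: also accept a list of configs, e.g.
        # Optional[Union[DataConfig, List[DataConfig]]].
        dataset_config: Optional[Union[DataConfig, List[DataConfig]]] = None,
        prompt_template: Optional[str] = None,
    ) -> List[EvalOutput]:
        """Evaluate on the given dataset(s), returning one EvalOutput per dataset."""
```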
Your understanding is correct. Currently, evaluate can either be configured to use a single user-provided dataset (via data_config) or configured to use all of the "built-in" datasets. Your feature request certainly makes sense; there isn't a particularly compelling reason I can think of for why we shouldn't be able to evaluate multiple "custom" (i.e. user-provided) datasets.
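For illustration, a call under the widened signature might look like the sketch below; the import paths, DataConfig fields, and model_runner are assumptions based on typical fmeval usage rather than a verified API:

```python
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy

# Two user-provided datasets; the DataConfig field names here follow common
# fmeval examples but should be checked against the installed version.
dev_config = DataConfig(
    dataset_name="my_qa_dev",
    dataset_uri="s3://my-bucket/qa/dev.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)
test_config = DataConfig(
    dataset_name="my_qa_test",
    dataset_uri="s3://my-bucket/qa/test.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)

eval_algo = QAAccuracy()
# With the widened signature, one call would return one EvalOutput per config,
# matching the existing List[EvalOutput] return type.
eval_outputs = eval_algo.evaluate(
    model=model_runner,  # assumed: an existing ModelRunner for the model under test
    dataset_config=[dev_config, test_config],
)
```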