feat: add num_records as a parameter to evaluate method #39

keerthanvasist · 2023-10-17T04:34:20Z

Description of changes:
Add num_records as a parameter to the evaluate method.

However, a concerning finding of this PR is that Ray's random_sample produces non-deterministic results even after setting the seed parameter.

Update:
The sampling is now deterministic. We are using Dataset.from_pandas(Dataset.to_pandas().sample()) so that the sample is drawn by pandas, whose sampling is deterministic for a fixed seed. This is a temporary workaround until Ray's issue is fixed.
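Besides the seed issue, fraction-based sampling has a second property worth knowing: Ray documents that Dataset.random_sample(fraction) returns roughly, not exactly, fraction * total_rows rows, so num_records / count only approximates num_records. pandas' DataFrame.sample(n=..., random_state=...) instead draws exactly n rows. The sketch below is a standard-library stand-in for both contracts (it uses neither Ray nor pandas); `SEED`, `fraction_sample`, and `exact_sample` are hypothetical names.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

SEED = 1234  # hypothetical constant name; same literal seed as in the diff


def fraction_sample(rows: Sequence[T], fraction: float, seed: int = SEED) -> List[T]:
    """Stand-in for a fraction-based sampler: keep each row independently
    with probability `fraction`, so the result size is only approximately
    fraction * len(rows)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def exact_sample(rows: Sequence[T], n: int, seed: int = SEED) -> List[T]:
    """Stand-in for pandas' DataFrame.sample(n=n, random_state=seed): draw
    exactly n distinct rows without replacement. The RNG differs from
    pandas', so the chosen rows differ too; only the contract is the same."""
    rng = random.Random(seed)
    return rng.sample(list(rows), n)


rows = list(range(1000))
approx = fraction_sample(rows, 100 / len(rows))  # around 100 rows; size not guaranteed
exact = exact_sample(rows, 100)                  # always exactly 100 rows
assert exact == exact_sample(rows, 100)          # same seed and input, same sample
```

With an exact-n sampler, callers asking for num_records get that many records, which also makes test expectations simpler to write.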

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

malhotra18 · 2023-10-17T04:37:21Z

src/amazon_fmeval/data_loaders/util.py

@@ -27,6 +27,8 @@ def get_dataset(config: DataConfig) -> ray.data.Dataset:
        data_loader_config = _get_data_loader_config(data_source, config)
        data_loader = _get_data_loader(config.dataset_mime_type)
        data = data_loader.load_dataset(data_loader_config).materialize()
+        if num_records:  # pragma: no branch
+            data = data.random_sample(num_records/data.count(), seed=1234)


please define this seed as a constant

Update the docstring too
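A minimal sketch of both review requests, assuming a plain list of records in place of a Ray dataset: the seed moves into a named module-level constant, and the docstring documents the new num_records argument. `DATASET_SAMPLING_SEED` and `sample_records` are hypothetical names, not part of fmeval.

```python
import random
from typing import Any, Dict, List, Optional

# Hypothetical name: one named constant instead of a magic number inline,
# so every call samples with the same seed and it can be changed in one place.
DATASET_SAMPLING_SEED = 1234


def sample_records(
    records: List[Dict[str, Any]], num_records: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Return a deterministic random sample of the loaded records.

    :param records: All records loaded from the dataset.
    :param num_records: Number of records to sample. If None, all records
        are returned unchanged. The same input always yields the same
        sample, because sampling is seeded with DATASET_SAMPLING_SEED.
    :return: A list of at most num_records records.
    """
    if num_records is None:
        return records
    rng = random.Random(DATASET_SAMPLING_SEED)
    return rng.sample(records, min(num_records, len(records)))
```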

malhotra18 · 2023-10-17T04:38:57Z

src/amazon_fmeval/eval_algorithms/factual_knowledge.py

@@ -107,6 +106,7 @@ def evaluate(
        dataset_config: Optional[DataConfig] = None,
        prompt_template: Optional[str] = None,
        save: bool = False,
+        num_records=100,


we decided to have algorithm-specific default num_records values, right? 100 is too low for a default. Did we check with the science team on this?

I didn't have the defaults then and didn't want to wait, since I need this for a test.
Now that we have them, I will update it.
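One way to express algorithm-specific defaults, sketched under assumptions: each algorithm class carries its own default, and evaluate() accepts num_records: Optional[int] = None, falling back to that class default when the caller passes nothing. The class names and all numbers below are hypothetical placeholders; the real defaults were still to be agreed with the science team.

```python
from typing import Optional


class EvalAlgorithmExample:
    """Hypothetical base class; subclasses override DEFAULT_NUM_RECORDS."""

    DEFAULT_NUM_RECORDS: int = 100  # placeholder, not an agreed value

    def resolve_num_records(self, num_records: Optional[int] = None) -> int:
        # An explicit caller value wins; otherwise use this algorithm's default.
        return self.DEFAULT_NUM_RECORDS if num_records is None else num_records


class FactualKnowledgeExample(EvalAlgorithmExample):
    DEFAULT_NUM_RECORDS = 300  # placeholder, pending input from the science team


class SemanticRobustnessExample(EvalAlgorithmExample):
    DEFAULT_NUM_RECORDS = 200  # placeholder, pending input from the science team
```

Using None as the signature default keeps each algorithm's number in one place instead of repeating a literal such as 100 in every evaluate() signature.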

malhotra18 · 2023-10-17T04:39:36Z

src/amazon_fmeval/eval_algorithms/general_semantic_robustness.py

        self,
-        model: ModelRunner,
+        model: ModelRunner = None,


why is None needed as a default here? That is an invalid use case for this eval algo.

I think this came from an automatic refactor. Will update. Good catch.
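Why keeping model required matters, as a minimal sketch: without a default, Python rejects a call that omits the model before any evaluation work starts. By contrast, `model: ModelRunner = None` hides the mistake until much later and also mis-annotates the type, since None is not a ModelRunner (strict type checkers expect Optional[ModelRunner]). `ModelRunner` below is a hypothetical stand-in protocol and `evaluate` is a toy, not fmeval's implementation.

```python
from typing import Protocol


class ModelRunner(Protocol):
    """Hypothetical stand-in for the real ModelRunner interface."""

    def predict(self, prompt: str) -> str: ...


def evaluate(model: ModelRunner, num_records: int = 100) -> str:
    """Toy evaluate(): model is required because this algorithm cannot run without one."""
    return model.predict("What is the capital of France?")


class EchoModel:
    """Toy model that returns its prompt unchanged."""

    def predict(self, prompt: str) -> str:
        return prompt


print(evaluate(EchoModel()))  # runs normally
try:
    evaluate()  # type: ignore[call-arg]
except TypeError as err:
    print("rejected:", err)
```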

keerthanvasist · 2023-10-17T16:04:34Z

Created an issue for this in the Ray repository

pinaraws · 2023-10-17T16:44:44Z

src/amazon_fmeval/data_loaders/util.py

@@ -27,6 +27,8 @@ def get_dataset(config: DataConfig) -> ray.data.Dataset:
        data_loader_config = _get_data_loader_config(data_source, config)
        data_loader = _get_data_loader(config.dataset_mime_type)
        data = data_loader.load_dataset(data_loader_config).materialize()
+        if num_records:  # pragma: no branch


add a check for -1
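The check is needed because `if num_records:` is truthy for -1, so a -1 value would reach the sampling call in the diff and produce a negative fraction. A minimal sketch, assuming -1 is meant as a sentinel for "use every record" (the comment does not say so explicitly); `USE_ALL_RECORDS` and `should_sample` are hypothetical names, and rejecting other non-positive values is a design choice of this sketch.

```python
from typing import Optional

USE_ALL_RECORDS = -1  # assumed sentinel meaning "do not sample"


def should_sample(num_records: Optional[int], count: int) -> bool:
    """Return True only when num_records asks for a genuine subset.

    A bare `if num_records:` would not skip sampling for -1, because
    -1 is truthy in Python.
    """
    if num_records is None or num_records == USE_ALL_RECORDS:
        return False
    if num_records <= 0:
        raise ValueError(f"num_records must be positive or -1, got {num_records}")
    # Asking for at least as many records as exist means "use everything".
    return num_records < count
```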

pinaraws · 2023-10-17T17:42:58Z

src/amazon_fmeval/data_loaders/util.py

    """
    with timed_block(f"Loading dataset {config.dataset_name}", logger):
        data_source = get_data_source(config.dataset_uri)
        data_loader_config = _get_data_loader_config(data_source, config)
        data_loader = _get_data_loader(config.dataset_mime_type)
        data = data_loader.load_dataset(data_loader_config).materialize()
+        count = data.count()
+        util.require(count > 0, "Data has to have atleast one record")


typo: atleast -> at least
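The count > 0 check also guards the num_records / data.count() division used for the sampling fraction against ZeroDivisionError. A minimal sketch with the corrected message text, where `require` is a hypothetical stand-in for the project's util.require (the real helper may raise a different exception type) and `load_sample` is a hypothetical helper. It also clamps num_records to the dataset size, because sampling more items than exist without replacement fails: random.sample raises ValueError, and so does pandas' DataFrame.sample with its default replace=False.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def require(expression: bool, msg: str) -> None:
    """Hypothetical stand-in for util.require: raise if the condition fails."""
    if not expression:
        raise ValueError(msg)


def load_sample(records: Sequence[T], num_records: int, seed: int = 1234) -> List[T]:
    count = len(records)
    require(count > 0, "Data has to have at least one record")
    # Clamp: drawing more records than exist, without replacement, would raise.
    n = min(num_records, count)
    return random.Random(seed).sample(list(records), n)
```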

malhotra18 · 2023-10-17T19:12:15Z

src/amazon_fmeval/eval_algorithms/general_semantic_robustness.py

        self,
        model: ModelRunner,
        dataset_config: Optional[DataConfig] = None,
        prompt_template: Optional[str] = None,
        save: bool = False,
+        num_records=100,


Note: Summarization accuracy semantic robustness has been merged; could you please include this change there too?

malhotra18 requested changes Oct 17, 2023

keerthanvasist force-pushed the ps branch from a58194a to a784593 October 17, 2023 16:03

pinaraws reviewed Oct 17, 2023

keerthanvasist force-pushed the ps branch 2 times, most recently from 63be5de to dbc20eb October 17, 2023 17:13

pinaraws reviewed Oct 17, 2023

keerthanvasist force-pushed the ps branch from dbc20eb to 6a1a115 October 17, 2023 17:50

pinaraws previously approved these changes Oct 17, 2023

malhotra18 reviewed Oct 17, 2023

keerthanvasist dismissed pinaraws’s stale review via f30e414 October 17, 2023 19:51

keerthanvasist added 2 commits October 17, 2023 12:53

feat: add num_records as a parameter to evaluate method

c220063

feat: deterministic sampling with to_pandas()

a356e78

pinaraws previously approved these changes Oct 17, 2023

malhotra18 previously approved these changes Oct 17, 2023

xiaoyi-cheng previously approved these changes Oct 17, 2023

keerthanvasist dismissed stale reviews from xiaoyi-cheng, malhotra18, and pinaraws via 6dd5099 October 17, 2023 21:09

keerthanvasist force-pushed the ps branch from f30e414 to 6dd5099 October 17, 2023 21:09

pinaraws previously approved these changes Oct 17, 2023

updated evaluate for summarization accuracy semantic robustness

5f2c3d4

keerthanvasist dismissed pinaraws’s stale review via 5f2c3d4 October 17, 2023 21:20

keerthanvasist force-pushed the ps branch from 6dd5099 to 5f2c3d4 October 17, 2023 21:20

keerthanvasist merged commit 38364e6 into aws:main Oct 17, 2023
2 checks passed
