feat: add `BERT_SCORE` to `QAAccuracySemanticRobustness` #315

kirupang-code · 2024-07-24T17:57:17Z

Used the SplitWithDelimiter transform to add BERT_SCORE and DELTA_BERT_SCORE to QAAccuracySemanticRobustness. Created the bertscore model as a shared resource for the evaluate function in qa_accuracy_semantic_robustness, which reduces memory consumption.
Created a smaller dataset, triviaQA_sample_small.jsonl, for the integration tests, because the existing integration tests timed out on the existing 100-record dataset.
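
As a rough illustration of the shared-resource idea described above, the sketch below loads one model instance and reuses it across calls. It is not the fmeval implementation: `FakeBertScoreModel`, `get_shared_model`, `delta_score`, and `evaluate_sample` are hypothetical names, the scoring is a toy token-overlap measure rather than real BERTScore, and the delta is assumed to be the mean absolute difference between the original score and the perturbed scores.

```python
from functools import lru_cache
from statistics import mean
from typing import Dict, List


class FakeBertScoreModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    instances_created = 0  # counts how many times the model was "loaded"

    def __init__(self) -> None:
        FakeBertScoreModel.instances_created += 1

    def get_score(self, target: str, output: str) -> float:
        # Toy Jaccard overlap on lowercase tokens; NOT real BERTScore.
        t, o = set(target.lower().split()), set(output.lower().split())
        union = t | o
        return len(t & o) / len(union) if union else 0.0


@lru_cache(maxsize=1)
def get_shared_model() -> FakeBertScoreModel:
    """Create the model once; later calls return the same instance."""
    return FakeBertScoreModel()


def delta_score(original: float, perturbed: List[float]) -> float:
    """Assumed definition: mean absolute difference from the original score."""
    return mean([abs(original - p) for p in perturbed])


def evaluate_sample(
    target: str, original_output: str, perturbed_outputs: List[str]
) -> Dict[str, float]:
    model = get_shared_model()  # reused, so memory holds a single model
    original = model.get_score(target, original_output)
    perturbed = [model.get_score(target, p) for p in perturbed_outputs]
    return {
        "bertscore": original,
        "delta_bertscore": delta_score(original, perturbed),
    }
```

Calling `evaluate_sample` repeatedly reuses the cached instance, so the cost of loading the model is paid once per process in this sketch.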

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

cr: https://code.amazon.com/reviews/CR-135854933

danielezhu · 2024-08-15T19:00:59Z

test/unit/eval_algorithms/test_qa_accuracy_semantic_robustness.py

@@ -105,11 +107,13 @@ class TestCaseQAAccuracySemanticRobustnessEvaluateSample(NamedTuple):
                    EvalScore(name=QUASI_EXACT_MATCH_SCORE, value=1.0),
                    EvalScore(name=PRECISION_OVER_WORDS, value=1.0),
                    EvalScore(name=RECALL_OVER_WORDS, value=1.0),
+                    EvalScore(name=BERT_SCORE, value=0.48504769802093506),


We should be mocking the bertscore model in the unit tests. We need the unit tests to validate that all of the new logic that was added to the transform pipeline is correct. Currently, none of this logic is getting tested.
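
A minimal sketch of what such a mocked unit test could look like, using only `unittest.mock` from the standard library. The function `best_bert_score`, the `get_score` method, and the `<OR>` delimiter are hypothetical placeholders for illustration, not the actual fmeval transform or its API. The point is that the mock returns fixed scores, so the test checks the surrounding pipeline logic (splitting targets and picking the maximum) rather than the model's numeric output.

```python
from unittest.mock import Mock, call


def best_bert_score(model, target_output: str, model_output: str,
                    delimiter: str = "<OR>") -> float:
    """Hypothetical pipeline step: score each delimited target, keep the max."""
    targets = target_output.split(delimiter)
    return max(model.get_score(t, model_output) for t in targets)


def test_best_bert_score_takes_max_over_targets() -> None:
    # The mock replaces the real model; scores are fixed and deterministic.
    mock_model = Mock()
    mock_model.get_score.side_effect = [0.2, 0.9]

    result = best_bert_score(mock_model, "UK<OR>England", "England")

    # The logic under test: one call per target, and the max is returned.
    assert result == 0.9
    mock_model.get_score.assert_has_calls(
        [call("UK", "England"), call("England", "England")]
    )
    assert mock_model.get_score.call_count == 2
```

With the model mocked, expected values in the test cases can be simple constants instead of hard-coded model outputs such as `0.48504769802093506`.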

Ok, will work on that

xiaoyi-cheng

lgtm

kirupang-code added 30 commits July 3, 2024 13:24

Added metric to factual knowledge + unit/integration tests

6d66b62

cr: https://code.amazon.com/reviews/CR-135854933

fixed changes from PR comments

cc866ed

Deleted metrics.py and restored code in util.py

843d9f6

added factual knowledge metrics to constants.py

c2f9efb

Merge branch 'main' of github.com:aws/fmeval

d7e5fa5

added factual knowledge metrics to be included in binary score

8d9bf4f

updated score descriptions for factual knowledge

1aee116

feat: add configurable param logical_operator (OR/AND) to factual kno…

d8c29da

…weldge

Merge branch 'main' of github.com:aws/fmeval

8715749

Merge branch 'main' of github.com:aws/fmeval

0ca7c47

fixed changes from PR comments

ba43b92

added warning and fixed typo

5e7bb50

Merge branch 'main' into main

411e0fa

modified warnings and fixed invalid config tests for factual_knowledge

f1f9792

Merge branch 'main' of github.com:aws/fmeval

43a9a48

Merge branch 'main' of github.com:kirupang-code/fmeval

5f2d1d4

Merge branch 'main' of github.com:kirupang-code/fmeval

b684515

Merge branch 'main' of github.com:kirupang-code/fmeval

e13bd5c

Merge branch 'main' of github.com:aws/fmeval

d51f3ff

feat: Adding BERTScore to QAAccuracy + QAAccuracySemanticRobustness

2a165cf

fix: documentation and tests for qa accuracy + qa accuracy semantic r…

634e85f

…obustness

fix: lint checks

c019be5

fix: created dataset for qa_accuracy, reverted to js_model_runner

0988ac7

fix: integration tests by adding approx for BertScore

28b449e

fix: moved BertScoreWithDelimiter to qa_accuracy and updated tests

5b99691

fix: restored qa_accuracy_semantic_robustness

e6c6f33

fix: smaller dataset for integ tests to reduce runtime

e3d02d6

fix: moved BertScoreWithDelimiter to qa_accuracy and updated tests

e1ce6ba

fix: smaller dataset for integ tests to reduce runtime

7052ef2

Merge branch 'main' of github.com:kirupang-code/fmeval

4859531

kirupang-code changed the title ~~feat: AddBERT_SCORE to QAAccuracySemanticRobustness + updated tests~~ feat: Add BERT_SCORE to QAAccuracySemanticRobustness + updated tests Jul 25, 2024

kirupang-code added 19 commits July 25, 2024 15:28

Add BertScoreMax transform for qa_accuracy

fdb99b1

fix: lint checks

cf6448a

fix: cleaning up code and checking reporting folder for changes

c9f565d

saving changes while working on another branch

7076976

save changes before moving onto diff branch

e1e1a52

tested new summarization accuracy metrics w qa_accuracy

02182f1

refactor: using BertScore in qa_accuracy

a2a80c2

Merge branch 'main' of github.com:kirupang-code/fmeval into second

d84b9a5

fixed lint checks

318ef4b

Merge branch 'main' of github.com:aws/fmeval into second

db6e0d3

fixed ordering of bertscore parameters

adfd028

Merge branch 'main' into second

6a6c37c

fix: pinned nltk version to address build failure

41f02b2

Merge branch 'main' into second

7b50dc6

Merge branch 'main' of github.com:aws/fmeval into second

e86354c

Update test_factual_knowledge.py

c3cbfc1

Merge branch 'main' of github.com:kirupang-code/fmeval into second

e5ef088

fixed lint checks

f4dceec

Merge branch 'second' of github.com:kirupang-code/fmeval into second

3b8af75

danielezhu requested changes Aug 15, 2024

kirupang-code added 3 commits August 15, 2024 12:05

fix: changed wording in documentation

48fc0e6

fix: mocked unit tests

9bbb536

fix: removed unused variable

9e8dbf2

kirupang-code requested a review from danielezhu August 15, 2024 19:50

oyangz approved these changes Aug 15, 2024

danielezhu changed the title ~~feat: Add BERT_SCORE to QAAccuracySemanticRobustness + updated tests~~ feat: add BERT_SCORE to QAAccuracySemanticRobustness Aug 15, 2024

danielezhu approved these changes Aug 15, 2024

xiaoyi-cheng approved these changes Aug 15, 2024

xiaoyi-cheng merged commit a0f439a into aws:main Aug 15, 2024
3 checks passed
