feat: add BERT_SCORE to QAAccuracySemanticRobustness #315
BERT_SCORE to QAAccuracySemanticRobustness + updated tests
@@ -105,11 +107,13 @@ class TestCaseQAAccuracySemanticRobustnessEvaluateSample(NamedTuple):
EvalScore(name=QUASI_EXACT_MATCH_SCORE, value=1.0),
EvalScore(name=PRECISION_OVER_WORDS, value=1.0),
EvalScore(name=RECALL_OVER_WORDS, value=1.0),
EvalScore(name=BERT_SCORE, value=0.48504769802093506),
We should be mocking the bertscore model in the unit tests. We need the unit tests to validate that all of the new logic that was added to the transform pipeline is correct. Currently, none of this logic is getting tested.
Ok, will work on that
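For reference, the mocking requested above could look roughly like the sketch below. It is not the code from this PR: `best_bert_score`, the `<OR>` delimiter, and the "keep the best score across alternative targets" rule are hypothetical stand-ins for whatever the transform pipeline actually does. The point is only that `unittest.mock.Mock` replaces the model, so the test exercises the surrounding logic with fixed scores instead of a value produced by real weights.

```python
from unittest.mock import Mock

DELIMITER = "<OR>"  # hypothetical delimiter between alternative target answers


def best_bert_score(scorer, target_output, model_output):
    """Hypothetical pipeline logic: score each alternative target, keep the best."""
    targets = target_output.split(DELIMITER)
    return max(scorer(target, model_output) for target in targets)


# The Mock stands in for the real BERTScore model, so no weights are loaded.
# side_effect makes successive calls return 0.25, then 0.75.
mock_scorer = Mock(side_effect=[0.25, 0.75])

result = best_bert_score(mock_scorer, "London<OR>UK", "London")

# The logic under test is verified against the values the mock was told to return.
assert result == 0.75
assert mock_scorer.call_count == 2
mock_scorer.assert_any_call("UK", "London")
```

With the model mocked, an expected value in a test case is something the test author chooses, which makes failures point at the pipeline logic rather than at model behavior.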
lgtm
SplitWithDelimiter transform to add BERT_SCORE and DELTA_BERT_SCORE to QAAccuracySemanticRobustness. Created a shared resource of the bertscore model to be used in the evaluate function in qa_accuracy_semantic_robustness to reduce the amount of memory consumed.
triviaQA_sample_small.jsonl, to be used in integration tests because existing integration tests timed out with the 100 records in triviaQA_sample_small.jsonl.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
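As a rough illustration of the shared-resource idea in the description, the sketch below creates one scorer and passes it into an evaluate loop, so every record reuses the same instance. Everything here is an assumption made for illustration: `FakeBertScorer` is a toy word-overlap scorer rather than the real model, `evaluate` is not the library's function, and the delta is assumed to be the mean absolute difference between the original score and the perturbed scores.

```python
from statistics import mean


class FakeBertScorer:
    """Hypothetical stand-in for an expensive-to-load BERTScore model."""

    instances_created = 0  # counts how many times the "model" is loaded

    def __init__(self):
        FakeBertScorer.instances_created += 1

    def score(self, target, output):
        # Toy similarity: fraction of target words present in the output.
        target_words = target.lower().split()
        output_words = set(output.lower().split())
        return sum(w in output_words for w in target_words) / len(target_words)


def delta_score(original, perturbed):
    # Assumed definition: mean absolute difference from the original score.
    return mean(abs(original - p) for p in perturbed)


def evaluate(records, scorer):
    """Reuse one scorer for all records instead of loading one per record."""
    results = []
    for target, original_output, perturbed_outputs in records:
        original = scorer.score(target, original_output)
        perturbed = [scorer.score(target, out) for out in perturbed_outputs]
        results.append((original, delta_score(original, perturbed)))
    return results


shared_scorer = FakeBertScorer()  # created once, shared by every record
records = [
    ("paris france", "paris france", ["paris", "paris france"]),
    ("blue whale", "blue whale", ["whale", "fish"]),
]
results = evaluate(records, shared_scorer)
```

Each entry in `results` pairs the score on the unperturbed output with its delta, and `FakeBertScorer.instances_created` stays at 1 regardless of how many records are evaluated.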