feat: add classification accuracy semantic robustness eval algo #47
Looks good! Please implement the two quick fixes below.
We can discuss whether to address the point at the bottom later; it is not important for the MVP.
Quick fixes:
- ClassificationAccuracySemanticRobustness is missing from eval_algo_mapping.py, so it cannot be imported like the other algorithms (for example, in the notebooks).
- In the example notebooks, the first cell's
  # from amazon_fmeval.eval_algo_mapping import get_eval_algorithm
  needs to be
  from amazon_fmeval import get_eval_algorithm
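To illustrate the first quick fix, here is a minimal sketch of a name-to-class registry. The layout of eval_algo_mapping.py is not shown in this review, so the EVAL_ALGORITHMS dictionary, the registry keys, the stub classes, and the signature of get_eval_algorithm below are all assumptions made for illustration, not the library's actual code.

```python
from typing import Dict, Type


class EvalAlgorithm:
    """Hypothetical stand-in for the library's eval algorithm interface."""


class ClassificationAccuracy(EvalAlgorithm):
    """Stub for an algorithm that is already registered."""


class ClassificationAccuracySemanticRobustness(EvalAlgorithm):
    """Stub for the algorithm added in this PR."""


# Hypothetical registry; the real eval_algo_mapping.py may be organised differently.
EVAL_ALGORITHMS: Dict[str, Type[EvalAlgorithm]] = {
    "classification_accuracy": ClassificationAccuracy,
    # The kind of entry the review asks for: without it, lookup by name fails.
    "classification_accuracy_semantic_robustness": ClassificationAccuracySemanticRobustness,
}


def get_eval_algorithm(name: str) -> EvalAlgorithm:
    """Instantiate the algorithm registered under `name` (assumed signature)."""
    try:
        return EVAL_ALGORITHMS[name]()
    except KeyError:
        raise ValueError(f"Unknown eval algorithm: {name!r}") from None
```

With the entry in place, the new algorithm resolves by name the same way as the existing ones.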
More involved, for later:
- Only accuracy is reported for robustness (not "balanced_accuracy_score", "precision_score", and "recall_score"). Adding those will probably require some refactoring, because they cannot be computed on a per-sample basis and need the whole dataset at once.
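A small sketch of why this matters, using plain Python rather than the library's own scoring code: averaging per-sample accuracy reproduces dataset accuracy, but averaging per-sample precision does not reproduce dataset precision. The toy labels and the convention of scoring a sample with no predicted positives as 0.0 are assumptions chosen for the example.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (true_label, predicted_label)


def accuracy(pairs: List[Pair]) -> float:
    """Fraction of samples whose prediction matches the true label."""
    return sum(1 for y, p in pairs if y == p) / len(pairs)


def precision(pairs: List[Pair], positive: int = 1) -> float:
    """True positives divided by predicted positives (0.0 if none predicted)."""
    predicted_pos = [(y, p) for y, p in pairs if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(1 for y, _ in predicted_pos if y == positive) / len(predicted_pos)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Toy dataset: one true positive, one false positive, two true negatives.
pairs = [(1, 1), (0, 1), (0, 0), (0, 0)]

# Accuracy decomposes over samples: the mean of per-sample scores equals the dataset score.
per_sample_accuracy = mean([accuracy([pair]) for pair in pairs])
dataset_accuracy = accuracy(pairs)

# Precision does not: the denominator depends on how many positives the whole dataset predicts.
per_sample_precision = mean([precision([pair]) for pair in pairs])
dataset_precision = precision(pairs)
```

Here the two accuracy values agree, while the averaged per-sample precision differs from the dataset-level precision, which is why these metrics need all predictions collected before scoring.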
num_perturbations: int = 5
seed: int = 5
# BUTTER FINGER PERTURBATION
butter_finger_perturbation_prob: Optional[float] = 0.1
These defaults should be the same across all robustness evals. Consider turning them into shared constants.
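One possible shape for this, as a sketch only: the constant names and the two config class names below are hypothetical, and the values are the defaults visible in this diff.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constant names; values copied from the defaults in this diff.
DEFAULT_NUM_PERTURBATIONS = 5
DEFAULT_SEED = 5
DEFAULT_BUTTER_FINGER_PERTURBATION_PROB = 0.1


@dataclass(frozen=True)
class ClassificationRobustnessConfig:
    """Hypothetical config for the classification robustness eval."""

    num_perturbations: int = DEFAULT_NUM_PERTURBATIONS
    seed: int = DEFAULT_SEED
    butter_finger_perturbation_prob: Optional[float] = DEFAULT_BUTTER_FINGER_PERTURBATION_PROB


@dataclass(frozen=True)
class SummarizationRobustnessConfig:
    """Hypothetical config for another robustness eval sharing the same defaults."""

    num_perturbations: int = DEFAULT_NUM_PERTURBATIONS
    seed: int = DEFAULT_SEED
    butter_finger_perturbation_prob: Optional[float] = DEFAULT_BUTTER_FINGER_PERTURBATION_PROB
```

Changing a default then happens in one place, and the configs cannot drift apart silently.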
Later: abstract away the base task (QA, summarization, classification) from the robustness logic to avoid duplication.
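A rough sketch of what such a split could look like, under stated assumptions: the robustness logic takes the task-specific scoring function as a parameter instead of being reimplemented per task. The function name robustness_delta, the callable signatures, and the choice to report the original score minus the mean perturbed score are illustrative and are not taken from this PR.

```python
import random
from typing import Callable

ModelFn = Callable[[str], str]                   # prompt -> model output
ScoreFn = Callable[[str, str], float]            # (model output, target) -> score
PerturbFn = Callable[[str, random.Random], str]  # (prompt, rng) -> perturbed prompt


def robustness_delta(
    model: ModelFn,
    score: ScoreFn,
    perturb: PerturbFn,
    prompt: str,
    target: str,
    num_perturbations: int = 5,
    seed: int = 5,
) -> float:
    """Original score minus the mean score over perturbed prompts (assumed definition)."""
    rng = random.Random(seed)
    original = score(model(prompt), target)
    perturbed = [
        score(model(perturb(prompt, rng)), target) for _ in range(num_perturbations)
    ]
    return original - sum(perturbed) / len(perturbed)


# Task-specific piece for classification: exact-match accuracy on one sample.
def exact_match(output: str, target: str) -> float:
    return 1.0 if output == target else 0.0


# Toy model and perturbation, used only to exercise the sketch.
def toy_model(prompt: str) -> str:
    return "positive" if "good" in prompt else "negative"


def toy_perturb(prompt: str, rng: random.Random) -> str:
    return prompt.replace("good", "g00d")


delta = robustness_delta(toy_model, exact_match, toy_perturb, "a good film", "positive")
```

A QA or summarization eval would pass its own scoring function and reuse the same perturbation loop.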
274ee28 to 6720541
6720541 to 3ccbfd9
Description of changes:
add classification accuracy semantic robustness eval algo
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.