Don't run actions during evaluate #966
Something i noticed, if there's an

Also it prints out a failure for each OR possibility, one would be enough 😄

I've pushed a fix for the naming, would be great if you can try it out. Concerning the number of failures: that is a tricky one and I don't think we can solve it as part of this one.

Haha yeah i thought that might be trickier. but yeah, will try out that fix!

just tested and the naming thing seems to be fixed :)

Actually I take back my comment about only printing out one OR possibility. Sometimes it is genuinely just the one intent that causes the problem and not the others

ready for review ✅
Ghostvv
left a comment
Looks good to me except for inconsistent names (see comments): I think we should consistently use either actual or gold, and either prediction or pred, everywhere, and give lists plural names (...s).
I ran it on paper bot - works fine, but additionally I would like to have the statement: number of correct stories out of
For some reason my logger statements start from :INFO:rasa_nlu.evaluate:..., why is it nlu?
rasa_core/evaluate.py
Outdated
preds_padding = (len(actions_between_utterances) -
                 len(last_prediction))
preds = []
gold = []
I'd name it golds as in above method
rasa_core/evaluate.py
Outdated
preds.extend(last_prediction)
preds_padding = (len(actions_between_utterances) -
                 len(last_prediction))
preds = []
I'd name it predictions here or preds above
rasa_core/evaluate.py
Outdated
actual_padding = (len(last_prediction) -
                  len(actions_between_utterances))
for tracker in tqdm(completed_trackers):
    curr_gold, curr_predictions, predicted_tracker = \
I don't like curr, can we rename it to current?
rasa_core/restore.py
Outdated
        "{} but got {}.".format(p, a))


def align_lists(pred, actual):
let's name it predictions, golds here, or preds, actuals, but then rename gold to actual everywhere
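For readers following the naming discussion, here is a minimal sketch of what a list-alignment helper with the suggested names could look like. It is not the code from this PR: the body and the `"None"` padding value are assumptions, and only the idea of padding by the length difference comes from the snippets quoted above.

```python
def align_lists(predictions, golds, padding="None"):
    """Pad the shorter list so both lists end up with the same length.

    Hypothetical sketch: the padding value "None" is an assumption,
    not taken from the PR.
    """
    predictions_padding = len(golds) - len(predictions)
    golds_padding = len(predictions) - len(golds)
    # Multiplying a list by a negative number yields an empty list,
    # so only the shorter of the two lists actually grows.
    padded_predictions = predictions + [padding] * predictions_padding
    padded_golds = golds + [padding] * golds_padding
    return padded_predictions, padded_golds


aligned_predictions, aligned_golds = align_lists(
    ["utter_greet"], ["utter_greet", "action_listen"])
```

Using the plural names for both arguments keeps the call site readable: whatever comes back can be compared element by element.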
tests/test_evaluation.py
Outdated
completed_trackers = evaluate._generate_trackers(
    DEFAULT_STORIES_FILE, default_agent)

actual, preds, failed_stories = collect_story_predictions(
consistent names here as well
Great push for better names 👍
rasa_core/evaluate.py
Outdated
from rasa_core.interpreter import NaturalLanguageInterpreter
from rasa_core.trackers import DialogueStateTracker
from rasa_core.training.generator import TrainingDataGenerator
from rasa_nlu.evaluate import plot_confusion_matrix, log_evaluation_table
we probably should not import anything from rasa_nlu?
it's actually fine, rasa core depends on nlu (we also use things in other places from nlu) - the only things we shouldn't use are parts that depend on optional dependencies (e.g. spacy).
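This import may also explain the `INFO:rasa_nlu.evaluate:` prefix asked about earlier in the review. In Python's `logging`, a record carries the name of the logger that emitted it, not of the module that called the function. The sketch below illustrates this with stand-in names; `log_evaluation_table_sketch` is hypothetical, and whether the real `log_evaluation_table` logs through a module-level logger is an assumption.

```python
import logging

# Stand-ins for module-level loggers that would normally be created with
# logging.getLogger(__name__); the names mirror the log prefix in question.
nlu_logger = logging.getLogger("rasa_nlu.evaluate")
core_logger = logging.getLogger("rasa_core.evaluate")


def log_evaluation_table_sketch(message):
    # Hypothetical helper "living" in the NLU module: it uses the NLU logger.
    nlu_logger.info(message)


class RecordingHandler(logging.Handler):
    """Collects the logger name of every record it receives."""

    def __init__(self):
        super().__init__()
        self.names = []

    def emit(self, record):
        self.names.append(record.name)


handler = RecordingHandler()
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

# Called from "core" code, yet the record keeps the NLU logger name.
log_evaluation_table_sketch("evaluation table")
core_logger.info("finished evaluation")
```

After running this, `handler.names` holds the NLU logger name for the first message and the core logger name for the second, which matches the behaviour described in the review.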
I guess the last thing is: could you please add
rasa_core/evaluate.py
Outdated
  for tracker in tqdm(completed_trackers):
-     curr_gold, curr_predictions, predicted_tracker = \
+     current_gold, current_predictions, predicted_tracker = \
should be current_golds I guess, because it is a list here
rasa_core/evaluate.py
Outdated
  golds.extend(current_gold)

- if not curr_gold == curr_predictions:
+ if not current_gold == current_predictions:
current_golds
rasa_core/evaluate.py
Outdated
- preds.extend(curr_predictions)
- gold.extend(curr_gold)
+ predictions.extend(current_predictions)
+ golds.extend(current_gold)
current_golds
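To tie the renames together with the requested "correct stories" statement, here is a rough, self-contained sketch of the collection loop. It assumes each story is a `(name, golds, predictions)` tuple, which is a simplification for illustration; `count_correct_stories` and the summary wording are hypothetical and not part of the PR.

```python
def count_correct_stories(stories):
    """Accumulate golds and predictions and note which stories failed.

    `stories` is assumed to be a list of (name, golds, predictions)
    tuples; this input shape is hypothetical.
    """
    golds, predictions, failed_stories = [], [], []
    for name, current_golds, current_predictions in stories:
        predictions.extend(current_predictions)
        golds.extend(current_golds)
        # A story counts as failed if any predicted action differs.
        if current_golds != current_predictions:
            failed_stories.append(name)
    correct = len(stories) - len(failed_stories)
    return golds, predictions, failed_stories, correct


stories = [
    ("greet", ["utter_greet", "action_listen"],
     ["utter_greet", "action_listen"]),
    ("goodbye", ["utter_goodbye"], ["utter_greet"]),
]
golds, predictions, failed_stories, correct = count_correct_stories(stories)
summary = "{} out of {} stories correct".format(correct, len(stories))
```

With the plural names, the per-story lists (`current_golds`, `current_predictions`) and the accumulated lists (`golds`, `predictions`) are easy to tell apart.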
@Ghostvv all remarks have been addressed, this is ready to get merged if you give your ok

not sure what you mean, it is changed here: https://github.com/RasaHQ/rasa_core/pull/966/files#diff-4b73ffae216c7073d031c5bb15f53d11R157 is there any other location?

yes, for some reason, it was displayed incorrectly

so all good?
Code-wise looks good. All works, good to merge
That is printed now as well.