Don't run actions during evaluate #966

tmbo · 2018-09-10T09:15:57Z

Proposed changes:

Don't run actions during evaluate
fixed export format to include the whole stories
use the name of the original story if possible
fixes #720

Status (please check what you already did):

made PR ready for code review
added some tests for the functionality
updated the documentation
updated the changelog

akelad · 2018-09-10T12:27:45Z

Something I noticed: if there's an OR in a story, the story title becomes a repetition of the story title, separated by a > (e.g. ## just newsletter > just newsletter). Any way we can avoid this and just have the actual original title?

akelad · 2018-09-10T12:35:24Z

Also it prints out a failure for each OR possibility, one would be enough 😄

tmbo · 2018-09-10T12:43:59Z

I've pushed a fix for the naming, would be great if you can try it out.

Concerning the number of failures: that is a tricky one and I don't think we can solve it as part of this one.

akelad · 2018-09-10T13:07:14Z

Haha yeah, I thought that might be trickier. But yeah, will try out that fix!

akelad · 2018-09-10T13:27:59Z

just tested and the naming thing seems to be fixed :)

akelad · 2018-09-10T16:24:01Z

Actually I take back my comment about only printing out one OR possibility. Sometimes it is genuinely just the one intent that causes the problem and not the others

tmbo · 2018-09-10T16:49:17Z

ready for review ✅

Ghostvv

Looks good to me except for inconsistent names (see comments): I think we should consistently use either actual or gold, and either prediction or pred, everywhere. And name lists with plural ...s.

I ran it on paper bot - works fine, but additionally I would like to have the statement: number of correct stories out of

For some reason my logger statements start from :INFO:rasa_nlu.evaluate:..., why is it nlu?
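A possible explanation for the logger prefix, offered as a sketch rather than a confirmed diagnosis: `rasa_core/evaluate.py` in this PR imports `plot_confusion_matrix` and `log_evaluation_table` from `rasa_nlu.evaluate`. If those helpers log through a module-level logger (an assumption about `rasa_nlu`, not something shown in this thread), their records carry the NLU module's name even when called from core. The function names below are hypothetical stand-ins.

```python
import logging


class NameCollector(logging.Handler):
    """Records the logger name of every log record it receives."""

    def __init__(self):
        super().__init__()
        self.names = []

    def emit(self, record):
        self.names.append(record.name)


# Stand-ins for module-level loggers, i.e. logging.getLogger(__name__)
# evaluated inside each module (assumed, not taken from the PR).
nlu_logger = logging.getLogger("rasa_nlu.evaluate")
core_logger = logging.getLogger("rasa_core.evaluate")


def log_table_stand_in():
    """Hypothetical helper that lives in the NLU module and logs there."""
    nlu_logger.info("evaluation table")


def run_core_evaluation():
    """Hypothetical core entry point that calls the imported NLU helper."""
    core_logger.info("starting evaluation")
    log_table_stand_in()


def collect_logger_names():
    """Run the evaluation and return the logger names seen, in order."""
    collector = NameCollector()
    root = logging.getLogger()
    old_level = root.level
    root.addHandler(collector)
    root.setLevel(logging.INFO)
    try:
        run_core_evaluation()
    finally:
        # Leave the root logger as we found it.
        root.removeHandler(collector)
        root.setLevel(old_level)
    return collector.names


if __name__ == "__main__":
    print(collect_logger_names())
```

The logger name is fixed where the logger is created, not where the function is called, so output produced by an imported helper keeps the helper's module name.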

Ghostvv · 2018-09-13T14:17:31Z

rasa_core/evaluate.py

-            preds_padding = (len(actions_between_utterances) -
-                             len(last_prediction))
+    preds = []
+    gold = []


I'd name it golds as in above method

Ghostvv · 2018-09-13T14:18:00Z

rasa_core/evaluate.py

-            preds.extend(last_prediction)
-            preds_padding = (len(actions_between_utterances) -
-                             len(last_prediction))
+    preds = []


I'd name it predictions here or preds above

Ghostvv · 2018-09-13T14:19:12Z

rasa_core/evaluate.py

-            actual_padding = (len(last_prediction) -
-                              len(actions_between_utterances))
+    for tracker in tqdm(completed_trackers):
+        curr_gold, curr_predictions, predicted_tracker = \


I don't like curr, can we rename it to current?

Ghostvv · 2018-09-13T14:21:48Z

rasa_core/restore.py

                      "{} but got {}.".format(p, a))


+def align_lists(pred, actual):


let's name it predictions, golds here, or preds, actuals, but then rename gold to actual everywhere
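To make the naming discussion concrete, here is a minimal sketch of what a list-alignment helper with the suggested `predictions`/`golds` names might look like. The body is an assumption (the diff only shows the signature): it uses the standard library's `difflib.SequenceMatcher` to keep matching items at the same index and pads the shorter side with a placeholder string.

```python
import difflib


def align_lists(predictions, golds, padding="None"):
    """Pad two lists so that matching items share the same index.

    Sketch only: the padding value and the use of difflib are assumptions,
    not taken from the PR.
    """
    padded_predictions = []
    padded_golds = []
    matcher = difflib.SequenceMatcher(None, predictions, golds)

    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        pred_chunk = predictions[i1:i2]
        gold_chunk = golds[j1:j2]
        # Pad whichever chunk is shorter; a negative count yields no padding.
        padded_predictions.extend(pred_chunk)
        padded_predictions.extend([padding] * (len(gold_chunk) - len(pred_chunk)))
        padded_golds.extend(gold_chunk)
        padded_golds.extend([padding] * (len(pred_chunk) - len(gold_chunk)))

    return padded_predictions, padded_golds


if __name__ == "__main__":
    preds, golds = align_lists(
        ["utter_greet", "utter_extra", "action_listen"],
        ["utter_greet", "action_listen"],
    )
    print(preds)
    print(golds)
```

Both returned lists always have the same length, which is what a confusion matrix or per-item comparison needs.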

Ghostvv · 2018-09-13T14:23:03Z

tests/test_evaluation.py

+    completed_trackers = evaluate._generate_trackers(
+            DEFAULT_STORIES_FILE, default_agent)

    actual, preds, failed_stories = collect_story_predictions(


consistent names here as well

tmbo · 2018-09-13T14:58:50Z

Great push for better names 👍

Ghostvv · 2018-09-13T15:04:30Z

rasa_core/evaluate.py

+from rasa_core.interpreter import NaturalLanguageInterpreter
 from rasa_core.trackers import DialogueStateTracker
 from rasa_core.training.generator import TrainingDataGenerator
 from rasa_nlu.evaluate import plot_confusion_matrix, log_evaluation_table


we probably should not import anything from rasa_nlu?

it's actually fine, rasa core depends on nlu (we also use things in other places from nlu) - the only things we shouldn't use are parts that depend on optional dependencies (e.g. spacy).

Ghostvv · 2018-09-13T15:41:04Z

I guess the last thing is: could you please add number of correct stories out of and then we are good to merge
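A rough sketch of how such a summary could be derived from the per-story comparison visible in the diff (`current_golds` versus `current_predictions`). The function name `summarize_stories`, the input shape, and the message wording are all hypothetical; the actual output added in the PR may differ.

```python
def summarize_stories(story_results):
    """Aggregate per-story results and build a summary line.

    story_results: list of (current_golds, current_predictions) pairs,
    one pair per story (assumed shape, for illustration only).
    """
    golds = []
    predictions = []
    failed_stories = []

    for index, (current_golds, current_predictions) in enumerate(story_results):
        golds.extend(current_golds)
        predictions.extend(current_predictions)
        # A story counts as failed if any predicted action differs.
        if current_golds != current_predictions:
            failed_stories.append(index)

    num_correct = len(story_results) - len(failed_stories)
    summary = "Correct stories: {} out of {}".format(
        num_correct, len(story_results))
    return golds, predictions, failed_stories, summary


if __name__ == "__main__":
    results = [
        (["utter_greet", "action_listen"], ["utter_greet", "action_listen"]),
        (["utter_goodbye", "action_listen"], ["utter_greet", "action_listen"]),
    ]
    print(summarize_stories(results)[3])
```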

Ghostvv · 2018-09-13T15:42:35Z

rasa_core/evaluate.py


    for tracker in tqdm(completed_trackers):
-        curr_gold, curr_predictions, predicted_tracker = \
+        current_gold, current_predictions, predicted_tracker = \


should be current_golds I guess, because it is a list here

Ghostvv · 2018-09-13T15:42:45Z

rasa_core/evaluate.py

+        golds.extend(current_gold)

-        if not curr_gold == curr_predictions:
+        if not current_gold == current_predictions:


current_golds

Ghostvv · 2018-09-13T15:42:52Z

rasa_core/evaluate.py

-        preds.extend(curr_predictions)
-        gold.extend(curr_gold)
+        predictions.extend(current_predictions)
+        golds.extend(current_gold)


current_golds

tmbo · 2018-09-14T07:40:34Z

@Ghostvv all remarks have been addressed, this is ready to get merged if you give your ok

tmbo · 2018-09-14T07:54:39Z

Not sure what you mean; it is changed here: https://github.com/RasaHQ/rasa_core/pull/966/files#diff-4b73ffae216c7073d031c5bb15f53d11R157. Is there any other location?

Ghostvv · 2018-09-14T07:54:43Z

yes, for some reason, it was displayed incorrectly

tmbo · 2018-09-14T07:56:11Z

so all good?

Ghostvv

Code-wise looks good. All works, good to merge

tmbo · 2018-09-14T08:00:10Z

That is printed now as well.

tmbo added 3 commits September 7, 2018 18:41

do not run actions during evaluation

87011c8

fixed tests

65f8388

added names to trackers for story export

899b7a2

tried to fix naming of ORs

e7448e9

trying to fix tests

e650733

tmbo added 2 commits September 10, 2018 15:53

added some comments

6e52554

added docs

cfabed3

improved evaluation code

b73ca4f

tmbo requested a review from akelad September 10, 2018 16:49

tmbo mentioned this pull request Sep 10, 2018

Cli more flags #970

Merged

4 tasks

Ghostvv self-requested a review September 12, 2018 08:24

tmbo removed the request for review from akelad September 12, 2018 11:46

Ghostvv approved these changes Sep 13, 2018

tmbo added 2 commits September 13, 2018 16:52

merged master

f32de6e

better naming of variables

a903f4a

Ghostvv reviewed Sep 13, 2018

Ghostvv suggested changes Sep 13, 2018

tmbo added 4 commits September 14, 2018 09:30

added total failed stories to output

c60dc4d

fixed gold vs golds

057e784

Merge branch 'master' into proper-evaluate

4734e3b

merged master

b5a20c2

incremented version

8b4b4d6

Ghostvv approved these changes Sep 14, 2018

fixed output title

ec5d13c

tmbo merged commit 4da91e1 into master Sep 14, 2018

tmbo deleted the proper-evaluate branch September 14, 2018 08:07

4 participants
