Include story source filename in core output for failed stories #5496

cheemingli · 2020-03-25T20:43:25Z

Proposed changes:
Include the story source filename in the story name in the failed stories output, to make a failed story easier to find (see #3419).
The source filename is passed from the StoryFileReader down to each StoryStep.
In addition to the story block names, the source filename is included in the tracker events that are used to output the failed stories.

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

CLAassistant · 2020-03-25T20:43:29Z

All committers have signed the CLA.

Include the story source filename in the story name in the failed stories output, to make a failed story easier to find (see RasaHQ#3419). The source filename is passed from the `StoryFileReader` down to each `StoryStep`. In addition to the story block names, the source filename is included in the tracker events that are used to output the failed stories. Because the story files are copied to a temporary folder, it is not possible to include the original full story path. Instead, only the file name is included. If a recursive folder structure contains story files with the same name, it can still be hard to find the problem file.

sara-tagger · 2020-03-26T07:49:33Z

Thanks for submitting a pull request 🚀 @degiz will take a look at it as soon as possible ✨

degiz

Thanks for the PR 👍

I've left a few comments. I also think the PR is missing a unit test that actually checks that a failed story now prints the story name. 🙂

rasa/core/training/dsl.py

rasa/core/training/generator.py

rasa/data.py

degiz · 2020-04-02T16:56:35Z

tests/core/test_data.py

+    assert data.get_source_file_name("") == ""
+    assert data.get_source_file_name("/tmp/stories.md") == "stories.md"
+    assert data.get_source_file_name("/tmp/123_stories.md") == "stories.md"
+    assert data.get_source_file_name("/tmp/123_my_stories.md") == "my_stories.md"


A few questions:

Why do we remove the first _ at all? I thought the idea of the PR was to keep the file name.

What is the reason for removing only the first _? For old_123_my_stories.md the result will be 123_my_stories.md.

The original stories are copied to a temporary directory and each filename gets prefixed with a unique ID. That is why I stripped the file name: so that the original story file name was printed.

I agree to leave the file name intact, so there is still a reference to the failed story file (in the temporary directory), e.g.

## happy path > /tmp/tmp73u056dx/4f8a5df888ec4e96bafbf9dad54c624d_stories2.md

I removed the function that stripped the file name and now pass the unmodified file name instead of the stripped one.
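
For reference, the prefix stripping discussed in this review thread can be sketched as follows. This is a minimal sketch, assuming only the behavior implied by the asserts in tests/core/test_data.py above; the body is a reconstruction for illustration, not the code that was in rasa/data.py, and the function was dropped from the PR in the end.

```python
import os


def get_source_file_name(path: str) -> str:
    """Return the file name of `path` without its temporary ID prefix.

    Assumption: the prefix is everything up to and including the first
    underscore, which is what the asserts in the review thread imply.
    """
    file_name = os.path.basename(path)
    # partition() splits at the first "_" only; `sep` is empty when
    # the file name contains no underscore at all.
    _prefix, sep, rest = file_name.partition("_")
    return rest if sep else file_name
```

With this logic, old_123_my_stories.md becomes 123_my_stories.md, which is the ambiguity raised in the review: a file name that already contains an underscore loses part of its real name.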

cheemingli · 2020-04-02T19:07:42Z

Thanks for the PR 👍

I've left a few comments. I also think the PR is missing a unit test that actually checks that a failed story now prints the story name. 🙂

Give me some time to write a unit test and to be able to run it. I haven't been able to run the tests in my environment yet. Any guidance would be helpful. I'm using a Docker environment.

* Add data types
* Use f-strings
* Story file is a copy in a temporary directory. For now leave file path intact till it is clear what needs to be included in failed_stories

cheemingli · 2020-04-08T05:41:28Z

@degiz I just noticed two failing tests because of my changes:

test_persist_and_read_test_story_graph
test_persist_and_read_test_story

I changed the tracker's sender_id in TrainingDataGenerator, so the asserts now fail: the tests compare trackers built from story files in different locations, and although the tracker contents are otherwise the same, the sender_id now differs.

Do you think my changes are valid and we need to change the test asserts?
Or do you want me to add the story file path only when printing failed stories?

degiz · 2020-04-14T08:23:27Z

Hey @cheemingli

Give me some time to write a unit test and to be able to run it

So the idea of the change as I understand it is the following:

for each story block we keep the source file name (which is not the original story file, but a tmp copy of it)
in case some story block training fails, we can print the message that would contain the file name

I think the test should try to train a story with an incorrect block, and check the stdout/stderr for the messages.

we need to change the test asserts

I think it's fine to change the asserts in the mentioned test cases.
I also see now why originally you wanted to strip _ from the tmp filenames.
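
A rough, self-contained sketch of the kind of output check suggested in this comment is shown here. `report_failed_blocks`, `capture_report`, and the block dictionaries are hypothetical stand-ins, not Rasa APIs; a real test would run the actual training or evaluation code, and under pytest it would more likely capture output with the `capsys` fixture.

```python
import contextlib
import io
from typing import Dict, List


def report_failed_blocks(blocks: List[Dict[str, str]]) -> int:
    """Print one line per failed block, naming its source file.

    Hypothetical helper standing in for the code under test.
    """
    failed = [block for block in blocks if block["status"] == "failed"]
    for block in failed:
        print(f"Failed story block '{block['name']}' from {block['source']}")
    return len(failed)


def capture_report(blocks: List[Dict[str, str]]) -> str:
    """Run the report and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        report_failed_blocks(blocks)
    return buffer.getvalue()
```

A test can then assert that the captured text mentions the failing block and its (temporary) source file, and does not mention blocks that passed.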

…-failed-stories

Previously the file source was always included in the tracker's sender id. As a result, persisted stories ended up with 'different' trackers, because the sender id differed depending on the source. To keep the impact minimal, only failed stories are exported with the source of the story file. For this reason the tracker has been extended with an optional sender_source parameter. (cherry picked from commit 090e55134d9922d13ba626b1311eeecaa8e04915)
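
The design described in this commit message can be illustrated with a small sketch. `SketchTracker` and `export_story` are hypothetical names, not Rasa's tracker API, and the `name > path` header format simply mirrors the example earlier in this thread; the merged code may format it differently. The point is only that the source lives in a separate optional field, so the sender id and the default export stay identical regardless of where the story file was copied to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchTracker:
    """Hypothetical stand-in for a tracker; not Rasa's class."""

    sender_id: str
    events: List[str] = field(default_factory=list)
    # Optional: where the story came from. Kept out of sender_id so
    # trackers loaded from different copies of a file still match.
    sender_source: Optional[str] = None

    def export_story(self, include_source: bool = False) -> str:
        name = self.sender_id
        # Only decorate the story name when explicitly requested,
        # e.g. when writing the failed stories file.
        if include_source and self.sender_source is not None:
            name = f"{name} > {self.sender_source}"
        return "\n".join([f"## {name}", *self.events])
```

Two trackers built from the same story in different temporary directories then export the same text by default, and differ only when `include_source=True` is passed.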

cheemingli · 2020-04-16T15:06:37Z

@degiz I've pushed some new changes:

added a unit test to test that the file source is included in the failed stories output
changed the export so the file source is only included in exported stories when outputting failed stories. To make this possible I had to extend the tracker with an additional parameter. These changes mirror the ones in an earlier pull request that was created for this improvement.

degiz

Thanks a lot for addressing the comments!

I've added two more minor things! Could you please also include a changelog entry to the ./changelog folder?

After that we'll be ready to merge 🚀

tests/core/test_evaluation.py

(cherry picked from commit 20ce257f21ec57ca388025511c2153a40e69694d)

degiz

Awesome job! 🚀 🚀

…-failed-stories # Conflicts: # tests/core/test_evaluation.py

cheemingli force-pushed the include-source-in-failed-stories branch from b8e2b73 to 8537252 Compare March 25, 2020 21:10

sara-tagger requested a review from TyDunn March 26, 2020 07:49

TyDunn requested review from degiz and removed request for TyDunn March 30, 2020 12:14

Merge branch 'master' into include-source-in-failed-stories

82b6b4b

degiz suggested changes Apr 2, 2020

Made changes based on review comments

74d3c5a

cheemingli force-pushed the include-source-in-failed-stories branch from 9189ee9 to 74d3c5a Compare April 2, 2020 19:09

cheemingli added 2 commits April 16, 2020 00:43

Merge remote-tracking branch 'upstream/master' into include-source-in…

e972166

…-failed-stories

Merge branch 'master' into include-source-in-failed-stories

d8877a6

degiz self-requested a review April 17, 2020 09:00

degiz suggested changes Apr 17, 2020

Extract failed stories filename to constant and add changelog

eef0f29

degiz approved these changes Apr 20, 2020

degiz and others added 4 commits April 20, 2020 12:24

Merge branch 'master' into include-source-in-failed-stories

70391f1

Merge remote-tracking branch 'upstream/master' into include-source-in…

26f78e8

…-failed-stories # Conflicts: # tests/core/test_evaluation.py

Fix removed import when fixing merge conflicts

ce112d8

Merge branch 'master' into include-source-in-failed-stories

151723d

degiz merged commit 850344f into RasaHQ:master Apr 27, 2020

degiz added this to the 1.10 Rasa Open Source milestone Apr 27, 2020

wochinge mentioned this pull request Jan 29, 2021

Include story source filename in core output for failed stories #3419

Closed
