Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include story source filename in core output for failed stories #5496

Merged
merged 11 commits into from Apr 27, 2020

Conversation

cheemingli
Copy link
Contributor

@cheemingli cheemingli commented Mar 25, 2020

Proposed changes:
Include story source filename in the story name in the failed stories output to help find the failed story more easily (see #3419).
Passed the source filename starting from the StoryFileReader to a StoryStep.
Besides the story block names the source filename is included in the tracker events which are used for outputting the failed stories.

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@CLAassistant
Copy link

CLAassistant commented Mar 25, 2020

CLA assistant check
All committers have signed the CLA.

Include story source filename in the story name in the failed stories output to help find the failed story more easily (see RasaHQ#3419).
Passed the source filename starting from the `StoryFileReader` to a `StoryStep`.
Besides the story block names the source filename is included in the tracker events which are used for outputting the failed stories.

Because the story files are copied to a temporary folder it is not possible to include the original full story path.Instead only the file name is included.
If a recursive folder structure is used with the same story file names it can still be hard to find the problem file.
@cheemingli cheemingli force-pushed the include-source-in-failed-stories branch from b8e2b73 to 8537252 Compare March 25, 2020 21:10
@sara-tagger sara-tagger requested a review from TyDunn March 26, 2020 07:49
@sara-tagger
Copy link
Collaborator

sara-tagger commented Mar 26, 2020

Thanks for submitting a pull request 🚀 @degiz will take a look at it as soon as possible ✨

@TyDunn TyDunn requested review from degiz and removed request for TyDunn March 30, 2020 12:14
Copy link
Contributor

@degiz degiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR 👍

I've left few comments. Also I think that the PR is missing a unit test that would actually check that for a failed story now prints the story name. 🙂

rasa/core/training/dsl.py Outdated Show resolved Hide resolved
rasa/core/training/dsl.py Outdated Show resolved Hide resolved
rasa/core/training/dsl.py Outdated Show resolved Hide resolved
rasa/core/training/generator.py Outdated Show resolved Hide resolved
rasa/core/training/generator.py Outdated Show resolved Hide resolved
rasa/data.py Outdated Show resolved Hide resolved
assert data.get_source_file_name("") == ""
assert data.get_source_file_name("/tmp/stories.md") == "stories.md"
assert data.get_source_file_name("/tmp/123_stories.md") == "stories.md"
assert data.get_source_file_name("/tmp/123_my_stories.md") == "my_stories.md"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions:

  • Why do we remove first _ at all? I though the idea of the PR is to keep the file name
  • What is the reason behind removing only first _ ? So for old_123_my_stories.md the result will be 123_my_stories.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original stories are copied to a temporary directory and the filename get prefixed with an unique ID. This was the reason to strip file name so the original story file name was printed.

I agree to leave the file name intact so there is still a reference to the story (in the temporary directory) which failed e.g.

## happy path > /tmp/tmp73u056dx/4f8a5df888ec4e96bafbf9dad54c624d_stories2.md

Removed the function which stripped the file name and made changes to pass the file name instead of stripped one.

@cheemingli
Copy link
Contributor Author

Thanks for the PR 👍

I've left few comments. Also I think that the PR is missing a unit test that would actually check that for a failed story now prints the story name. 🙂

Give me some time to write a unit test and to be able to run it. I wasn't able to do it yet on my environment. Any guidance would be helpful. I'm using a Docker environment.

* Add data types
* Use f-strings
* Story file is a copy in a temporary directory. For now leave file path intact till it is clear what needs to be included in failed_stories
@cheemingli cheemingli force-pushed the include-source-in-failed-stories branch from 9189ee9 to 74d3c5a Compare April 2, 2020 19:09
@cheemingli
Copy link
Contributor Author

@degiz I just noticed two failing tests because of my changes:

  • test_persist_and_read_test_story_graph
  • test_persist_and_read_test_story

Because I've made changes in the tracker's sender_id in TrainingDataGenerator the assert fails now because the test is asserting on story files in different locations but the tracker contents should still be the same (except for the sender_id).

Do you think my changes are valid and we need to change the test asserts?
Or do you want me to add the story file path only when printing failed stories?

@degiz
Copy link
Contributor

degiz commented Apr 14, 2020

Hey @cheemingli

Give me some time to write a unit test and to be able to run it

So the idea of the change as I understand it is the following:

  • for each story block we keep the source file name (which is not the original story file, but a tmp copy of it)
  • in case some story block training fails, we can print the message that would contain the file name

I think the test should try to train a story with incorrect block, and check the stdout/stderr for the messages.

we need to change the test asserts

I think it's fine to change the asserts in mentioned tests cases.
I also see now why originally you wanted to strip _ from the tmp filenames.

The file source was always included in the tracker's sender id.
This causes that persisted story to have 'different' trackers because the tracker's sender id will be different (because of the source).

To make sure the impact is minimal only failed stories will be exported with the source of the story file.
For this reason the tracker has been extended with an optional sender_source paramenter.

(cherry picked from commit 090e55134d9922d13ba626b1311eeecaa8e04915)
@cheemingli
Copy link
Contributor Author

@degiz I've pushed some new changes:

  • added a unit test to test that the file source is included in the failed stories output
  • made changes to only include the file source in exported stories when outputting failed stories. To make this possible I had to extend the tracker with an additional parameter. These were also the changes in the previous pull request which was created in the past for this improvement.

@degiz degiz self-requested a review April 17, 2020 09:00
Copy link
Contributor

@degiz degiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for addressing the comments!

I've added two more minor things! Could you please also include a changelog entry to the ./changelog folder?

After that we'll be ready to merge 🚀

tests/core/test_evaluation.py Outdated Show resolved Hide resolved
tests/core/test_evaluation.py Outdated Show resolved Hide resolved
(cherry picked from commit 20ce257f21ec57ca388025511c2153a40e69694d)
Copy link
Contributor

@degiz degiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job! 🚀 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants