Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of Core YAML parser #6001

Merged
merged 5 commits into from
Jun 30, 2020
Merged

Conversation

degiz
Copy link
Contributor

@degiz degiz commented Jun 11, 2020

Part of #5996

Proposed changes:

  • Created new YAML parser
  • Extracted common parts for YAML and MD to the separate classes
  • Added unit tests

Forms and Rules are not covered in this PR.

Things to address in the next PRs:

  • Update the docs
  • Split loaders for Core and NLU files

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@degiz degiz changed the title YAML parser for the Core training data Initial implenemtation of Core YAML parser Jun 19, 2020
@degiz degiz marked this pull request as ready for review June 22, 2020 13:41
@wochinge wochinge changed the title Initial implenemtation of Core YAML parser Initial implementation of Core YAML parser Jun 24, 2020
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the cleanness of the implementation 🚀

  • What about a YAMLStoryWriter ?

data/test_yaml_stories/stories_2.yml Outdated Show resolved Hide resolved
data/test_yaml_stories/stories_3.yml Outdated Show resolved Hide resolved
rasa/core/training/dsl.py Show resolved Hide resolved
rasa/core/training/story_reader/markdown_story_reader.py Outdated Show resolved Hide resolved
tests/core/test_dsl.py Show resolved Hide resolved
tests/core/training/story_reader/test_yaml_story_reader.py Outdated Show resolved Hide resolved
@degiz
Copy link
Contributor Author

degiz commented Jun 24, 2020

Thanks for the review! I'm bringing the checkpoints and or for the yaml parsers, so I'll request yet another one.

What about a YAMLStoryWriter?

As mentioned in the #5970 I plan to implement writers for nlu and stories as a separate reviews.

Copy link
Contributor

@federicotdn federicotdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tobias' review was already super extensive so I didn't add many comments. It's working nicely at the moment and I was able to write stories using the new format. Here's some things that we should consider changing:

  • If I don't prefix the user intent with /, I get three warnings for this. Though I am not sure if this is specific to this format.
  • When adding a new story, if I do not write story correctly (e.g. storie), the block is ignored. I think we should show a warning in this case, if the key does not correspond to any of the ones we recognize.
  • If I don't add steps , or write steps incorrectly, the entire story is ignored. We should show a warning in this case.
  • (In reference to my comment below) If I don't specify action/user/slot/etc. correctly, the step is ignored. We should should a warning for this. Maybe we can do something similar to what I suggested in the NLU PR, take one of the keys of the dict and include them in the warning.
  • If I don't write entities correctly in the user step, they are ignored. Maybe a warning for this would work as well.

So I guess the main improvements we could do are making sure the user knows if they have made a mistake writing stories.

rasa/core/training/loading.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/markdown_story_reader.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
@degiz degiz force-pushed the 5996_new_stories_parser branch 3 times, most recently from 0d16c86 to 358f67b Compare June 29, 2020 11:52
@degiz degiz requested a review from wochinge June 29, 2020 12:57
@degiz degiz requested a review from federicotdn June 29, 2020 12:59
Alexander Khizov added 3 commits June 29, 2020 15:54
- Created new YAML parser
- Extracted common parts for YAML and MD to the separate classes
- Added unit tests
- Added support for `OR` and `Checkpoints` for the YAML parser
- Addressing PR review comments
- Added more tests
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't checked the tests yet, will do that separately.

Great work 🚀 💪 🎸 🥒 🌶️

data/test_mixed_yaml_md_stories/stories_part_1.yml Outdated Show resolved Hide resolved
data/test_yaml_stories/stories.yml Show resolved Hide resolved
data/test_yaml_stories/stories.yml Outdated Show resolved Hide resolved
rasa/core/training/loading.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing: I think the logger.error()s should actually be raise_warnings, no? I'm referring to the ones in yaml_story_reader

@wochinge
Copy link
Contributor

wochinge commented Jun 29, 2020

stories:
- story:
  steps:
  - user: /greet
  - action: utter_greet

This breaks with the rasa init bot for me.

  File "/Users/tobias/Workspace/stack/rasa/importers/importer.py", line 242, in get_domain
    domains = await asyncio.gather(*domains)
  File "/Users/tobias/Workspace/stack/rasa/importers/rasa.py", line 64, in get_domain
    domain = Domain.load(self._domain_path)
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 100, in load
    other = cls.from_path(path)
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 110, in from_path
    domain = cls.from_file(path)
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 123, in from_file
    return cls.from_yaml(rasa.utils.io.read_file(path))
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 133, in from_yaml
    return cls.from_dict(data)
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 149, in from_dict
    slots = cls.collect_slots(data.get("slots", {}))
  File "/Users/tobias/Workspace/stack/rasa/core/domain.py", line 268, in collect_slots
    slot = slot_class(slot_name, **slot_dict[slot_name])
TypeError: __init__() got an unexpected keyword argument 'text'

@wochinge
Copy link
Contributor

Ignore my comment from above. I think that was due to an experimental state my domain file was in :-D

@wochinge
Copy link
Contributor

wochinge commented Jun 29, 2020

with

.
├── __init__.py
├── actions.py
├── config.yml
├── credentials.yml
├── data
│   ├── nlu.md
│   └── stories.yml
├── domain.yml

Screenshot 2020-06-29 at 18 20 13

@degiz
Copy link
Contributor Author

degiz commented Jun 29, 2020

@wochinge yes, I see it too. Bad that I didn't add a test for that.

@degiz degiz force-pushed the 5996_new_stories_parser branch 2 times, most recently from c39e551 to 1c66a07 Compare June 29, 2020 19:20
- Addessed the comments
- Adjusted the tests
@degiz degiz requested a review from wochinge June 30, 2020 07:34
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great work @degiz 💪

Do you wanna add a changelog?

data/test_wrong_yaml_stories/wrong_yaml.yml Outdated Show resolved Hide resolved
data/test_yaml_stories/stories_3.yml Outdated Show resolved Hide resolved
data/test_yaml_stories/stories_3.yml Outdated Show resolved Hide resolved
reader = MarkdownStoryReader(
interpreter, domain, template_variables, use_e2e, filename
)
elif PurePath(filename).suffix in YAML_FILE_EXTENSIONS:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't work? Why do we need PurePath?

Suggested change
elif PurePath(filename).suffix in YAML_FILE_EXTENSIONS:
elif Path(filename).suffix in YAML_FILE_EXTENSIONS:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path is a child of PurePath, and the pathlib docs mention PurePath(filename).suffix as an example, I simply followed it.
But since you've found that confusing I can change that.

rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
f"Issue found in {self.source_name}: \n"
f'User intent "{user_utterance}" should start with '
f'"{INTENT_MESSAGE_PREFIX}"',
docs=DOCS_URL_STORIES,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still should tell them that we skipped

rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
rasa/core/training/story_reader/yaml_story_reader.py Outdated Show resolved Hide resolved
rasa/data.py Outdated Show resolved Hide resolved
@degiz
Copy link
Contributor Author

degiz commented Jun 30, 2020

Do you wanna add a changelog?

I guess yes, thanks. Even though this is only an "Initial implementation". But probably later we can simply modify the same changelog entry?

Copy link
Contributor

@federicotdn federicotdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@rasabot rasabot merged commit a706b30 into master Jun 30, 2020
@rasabot rasabot deleted the 5996_new_stories_parser branch June 30, 2020 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants