Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrainingData.merge() doesn't check for duplicate examples per intent #1446

Closed
MetcalfeTom opened this issue Oct 4, 2018 · 10 comments
Closed
Assignees
Labels
difficulty:medium 🚶‍♀️ help wanted type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@MetcalfeTom
Copy link
Contributor

e.g. by merging

## intent:greet
- hi

with

## intent:greet
- hi

we get

## intent:greet
- hi
- hi
@MetcalfeTom MetcalfeTom added the type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR label Oct 4, 2018
@akelad
Copy link
Contributor

akelad commented Oct 5, 2018

just when merging? do even check for duplicates within an intent?

@MetcalfeTom
Copy link
Contributor Author

You're right, there's nothing implemented. The only data processing done for the intents upon building TrainingData() is in sanitize_examples() which removes whitespace

@akelad
Copy link
Contributor

akelad commented Apr 4, 2019

@MetcalfeTom is this still relevant?

@MetcalfeTom
Copy link
Contributor Author

It is still relevant for features like interactive learning where we dump new NLU data. I think it functions well as a community issue

@hsm207
Copy link
Contributor

hsm207 commented Sep 2, 2019

I want to work on this

@akelad
Copy link
Contributor

akelad commented Sep 3, 2019

awesome! let us know if we can help with anything

@hsm207
Copy link
Contributor

hsm207 commented Sep 3, 2019

@akelad what would the unit test using @MetcalfeTom example look like? Should I create a markdown file containing the given examples or can each intent be represented as an instance of TrainingData?

@akelad
Copy link
Contributor

akelad commented Sep 4, 2019

@hsm207 you can take a look at the tests written here: https://github.com/RasaHQ/rasa/blob/master/tests/nlu/base/test_training_data.py

hsm207 added a commit to hsm207/rasa that referenced this issue Sep 7, 2019
@hsm207
Copy link
Contributor

hsm207 commented Sep 7, 2019

@akelad I've made PR #4414 for your review.

hsm207 added a commit to hsm207/rasa that referenced this issue Oct 1, 2019
Rewrite test_markdown_entity_regex() because the training_examples in
TrainingData no longer maintains the order of the list of messages
passed to its constructor after sanitization.
@erohmensing erohmensing moved this from 💻Code - Open Issues to 🚀Doing in Contribute to Rasa Nov 12, 2019
hsm207 added a commit to hsm207/rasa that referenced this issue Nov 12, 2019
@hsm207
Copy link
Contributor

hsm207 commented Nov 23, 2019

@akelad This issue can be closed now.

@akelad akelad closed this as completed Nov 25, 2019
Contribute to Rasa automation moved this from 🚀Doing to ✅Done Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty:medium 🚶‍♀️ help wanted type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR
Projects
No open projects
Development

No branches or pull requests

5 participants