Bugs with Rasa X UI while uploading new training data #3580

Open
psds01 opened this issue May 24, 2019 · 14 comments

@psds01

commented May 24, 2019

Rasa version: 1.0.1

Python version: 3.6.8/ Anaconda

Operating system (windows, osx, ...): Ubuntu

Issue:
First of all, great work by team Rasa on Rasa X. This is just beautiful.

Rasa X UI bugs/features: I have found 4 things in the Rasa X UI that are not working as expected.

  1. [More of a feature request] Under Training -> NLU Training -> Training data:
    If training data is already loaded there and I upload a new training data file, the UI doesn't ask for confirmation, e.g. "Are you sure? You might not have downloaded this tagged data." It should offer options like "Download current data and upload" and "Ignore current data and upload".
  2. If I upload a new training data file on the same screen, the new file content does not show up immediately. The page keeps showing stale data until I refresh it; only then does the newly uploaded training data appear.
  3. The contents of the new training data file are displayed in reverse order. For example, if the file contains
    { "rasa_nlu_data": { "common_examples": [ { "text": "Hi", "intent": "", "entities": [] }, { "text": "Bye", "intent": "", "entities": [] } ] } }
    the UI will show them like this:
    [Screenshot from 2019-05-24 11-06-48]
  4. It takes more than 2 minutes for my Rasa X UI to load. During this time, I see a file named rasa.db_journal being created and deleted at least a hundred times. (This does not happen with rasa init.)

Content of configuration file (config.yml):

Content of domain file (domain.yml) (if used & relevant):

@akelad

Collaborator

commented May 24, 2019

awesome @psds01 thanks so much for your feedback :D glad you like Rasa X!

  1. I believe this is something we've already incorporated in our designs for future updates -- @abhilasharoy can you confirm?
  2. @gausie can you look into that?
  3. Hmm, is it very important that it's in the order of the file?
  4. @ricwo can you look into that?
@akelad

Collaborator

commented May 24, 2019

actually re 3: I guess in future the plan is to group by intent anyways right @abhilasharoy ?

@psds01

Author

commented May 24, 2019

In the first approach, let's say I have 10 conversations between a bot and a user. Normally, I would combine all user inputs into a single nlu.json file, upload it to Rasa X, tag it, and train the Rasa/custom NLU. Having done that, I would have to go through these 10 conversations again to create "stories" to train Rasa Core. This works if you have a small number of conversations to tag.

But let's say I have 1000 such conversations that I need tagged. To train both NLU and Core, I can upload a single conversation, tag its intents, and generate a Rasa story for that conversation using a custom script. I can do this for all 1000 files. This gives me 1000 intent files and 1000 story files, but I have to go over each file only once (unlike the first approach, where I go through the conversations once for intents and again for stories).

In the first approach, I make 2000 passes in total (1000 files at a time for intent tagging, then 1000 x 1 file at a time for story creation). In the second approach, I tag 1000 x 1 file at a time, and from that alone I can generate 1000 x 1 stories, saving me the manual trouble of writing 1000 Rasa stories.

So for me at least, it makes sense to have my intents in file order and not grouped by intent.
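Until the display order is configurable, one possible workaround for keeping conversation order is to reverse the examples before uploading. This is only a sketch: it assumes the UI lists uploaded examples newest-first exactly as described in point 3, and the helper name `reverse_examples` is made up for illustration.

```python
import copy


def reverse_examples(nlu: dict) -> dict:
    """Return a copy of the NLU data with common_examples in reverse order."""
    out = copy.deepcopy(nlu)  # leave the caller's data untouched
    out["rasa_nlu_data"]["common_examples"].reverse()
    return out


sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "Hi", "intent": "", "entities": []},
            {"text": "Bye", "intent": "", "entities": []},
        ]
    }
}

reversed_sample = reverse_examples(sample)
print([ex["text"] for ex in reversed_sample["rasa_nlu_data"]["common_examples"]])
# prints ['Bye', 'Hi']
```

If the assumption holds, uploading the reversed file should put "Hi" back on top in the UI.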

@abhilasharoy

commented May 24, 2019

@akelad

  1. This actually has not been handled yet (UX-wise). It'll be on my to-do list now.
  3. After thinking about this, I remember that it was a conscious decision to keep the order this way. Uploading from files may be a bit weird (as mentioned here), but this order works better when training data is added manually, because the user is then able to see the most recently added sentence on top.

And in the future, this screen would not be grouped by intent by default (though we do have designs for an "intents" landing page, where this would happen). But discovery can still be super easy because users can filter by intent at will.

@psds01

Author

commented May 24, 2019

We could avoid generating stories and intents separately for a conversation if my nlu.json file also had two additional fields, namely story_id and message_id. To be more specific, instead of

{
    "rasa_nlu_data": {
        "common_examples": [
            {
                "text": "Hi",
                "intent": "",
                "entities": []
            },
            {
                "text": "Bye",
                "intent": "",
                "entities": []
            }
        ]
    }
}

if I had

{
    "rasa_nlu_data": {
        "common_examples": [
            {
                "story_id":"1",
                "message_id":"1",
                "text": "Hi",
                "intent": "",
                "entities": []
            },
            {
                "story_id":"1",
                "message_id":"2",
                "text": "Bye",
                "intent": "",
                "entities": []
            }
        ]
    }
}

then it would save a lot of time while tagging files. I wouldn't need to worry about the order in which the intent text appears in the Rasa X UI. I could always derive multiple Rasa stories from a single nlu.json based on story_id and message_id.

This would make it very easy to tag data, whether intents or stories.
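A minimal sketch of how such a file could be turned into stories, to make the proposal concrete. Note that story_id and message_id are the proposed fields, not part of the current NLU format, and the function name `stories_from_nlu` and the `story_<id>` naming are made up here. The sketch also assumes both IDs are numeric strings and that every example has already been tagged with an intent; it emits the Rasa 1.x Markdown story layout with user intents only.

```python
from collections import defaultdict


def stories_from_nlu(nlu: dict) -> str:
    """Group tagged examples by story_id and emit Markdown stories."""
    by_story = defaultdict(list)
    for example in nlu["rasa_nlu_data"]["common_examples"]:
        by_story[example["story_id"]].append(example)

    blocks = []
    for story_id in sorted(by_story, key=int):
        # message_id restores the order in which the user sent the messages
        messages = sorted(by_story[story_id], key=lambda ex: int(ex["message_id"]))
        lines = [f"## story_{story_id}"]
        for ex in messages:
            # bot actions ("  - utter_...") would still be added by hand
            lines.append(f"* {ex['intent']}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"story_id": "1", "message_id": "1", "text": "Hi",
             "intent": "greet", "entities": []},
            {"story_id": "1", "message_id": "2", "text": "Bye",
             "intent": "goodbye", "entities": []},
        ]
    }
}

print(stories_from_nlu(sample))
```

Because ordering comes from the IDs, the order in which the UI or the exported file lists the examples no longer matters.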

@psds01

Author

commented May 24, 2019

@akelad @ricwo @gausie
Any suggestions?

@ricwo

Collaborator

commented May 24, 2019

Regarding the startup delay - do you experience this with a fresh project (after rasa init), or after importing lots of training data? It sounds like it could be due to a very large database

@psds01

Author

commented May 24, 2019

Thanks @ricwo for the kind comment. There is no startup delay with rasa init.
When I train on custom data, the sizes of databases are: rasa.db = 2.5 MB and tracker.db = 151.6 kB. Is 2.5 MB a large DB in this case?
Thanks,
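To judge whether the database is "large" in terms of rows rather than bytes, something like the following could list per-table row counts. This is a sketch that assumes rasa.db is an ordinary SQLite file (the journal file appearing next to it suggests so, but that is not confirmed in this thread); the function name `table_row_counts` is made up for illustration.

```python
import sqlite3


def table_row_counts(db_path: str) -> dict:
    """Map each user table in a SQLite file to its number of rows."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them is enough
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


# Example call (the path is an assumption about where the file lives):
# print(table_row_counts("rasa.db"))
```

Running it on a copy of the file while Rasa X is stopped avoids contending with the server for the database.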

@tmbo added bug Rasa X UI enhancement and removed bug labels on May 27, 2019

@tmbo

Member

commented May 27, 2019

It makes more sense to open a separate issue for the startup delay as that is separate from the whole training data upload topic. @psds01 do you mind doing that?

@psds01

Author

commented May 27, 2019

Thanks for the suggestion @tmbo . Here's the issue: #3611

@rgstephens

Contributor

commented Jul 31, 2019

I just ran into the import problem described in point 1 of @psds01's original post. I was surprised to find all my data removed when I imported new data.

My request is that the UI offer options that let the user choose to either replace or add to the existing training data when importing.
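Until the UI offers an "add to existing" option, a possible workaround is to download the current training data and merge it with the new file before uploading. A minimal sketch, assuming both files use the rasa_nlu_data / common_examples JSON layout shown earlier in this thread; the helper name `merge_nlu` is made up, and any other keys under rasa_nlu_data are not carried over.

```python
def merge_nlu(existing: dict, new: dict) -> dict:
    """Append new examples to existing ones, skipping (text, intent) duplicates."""
    merged = list(existing["rasa_nlu_data"]["common_examples"])
    seen = {(ex["text"], ex.get("intent", "")) for ex in merged}
    for ex in new["rasa_nlu_data"]["common_examples"]:
        key = (ex["text"], ex.get("intent", ""))
        if key not in seen:
            seen.add(key)
            merged.append(ex)
    # Only common_examples is kept; other sections would need separate handling
    return {"rasa_nlu_data": {"common_examples": merged}}


current = {"rasa_nlu_data": {"common_examples": [
    {"text": "Hi", "intent": "greet", "entities": []},
]}}
incoming = {"rasa_nlu_data": {"common_examples": [
    {"text": "Hi", "intent": "greet", "entities": []},
    {"text": "Bye", "intent": "goodbye", "entities": []},
]}}

combined = merge_nlu(current, incoming)
print([ex["text"] for ex in combined["rasa_nlu_data"]["common_examples"]])
# prints ['Hi', 'Bye']
```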

@rgstephens

Contributor

commented Aug 12, 2019

Is this getting any attention? I'm sorry to see #4067 getting resolved without addressing this related issue.

@akelad

Collaborator

commented Aug 13, 2019

@rgstephens could you clarify what you mean? That PR is related to addressing the NLU training data page, not related to Rasa X

@rgstephens

Contributor

commented Aug 13, 2019

Sounds like I misread that issue. Never mind.
