Data set creation routine for gobot DSTC 2 format #1230
Comments
Hi @oserikov, have you self-assigned this to answer my questions, or do you want to roll out the routine yourself? If you want to roll it out, could you please tell me when you think you could finish? I have a deadline, so I need your answer on this.
Hey! I'll be happy to answer questions and provide the necessary help on that feature -- it is extremely useful. Sorry for the delayed response. I will write a more meaningful comment either tonight or tomorrow morning.
Ok, thanks, I will be waiting then.
Ok so
Yes, definitely
Regardless of the target format, one should properly understand the input data before serializing it. I think it would be useful to have an intermediate data representation as something pythonic (or probably a dataframe). That would preserve the ability to easily implement additional dialogue serialization formats.
What would the CLI look like?
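A minimal sketch of what such an intermediate representation could look like, assuming plain dataclasses and a small registry of serializers. The names `Turn`, `Dialogue`, `register_serializer` and `serialize` are hypothetical and are not part of DeepPavlov, and the two output formats shown are generic JSON layouts for illustration, not the actual DSTC 2 or DSTC 8 schemas.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    """One utterance in a dialogue (hypothetical structure)."""
    speaker: str                                  # e.g. "user" or "system"
    text: str
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class Dialogue:
    """An ordered list of turns with an identifier (hypothetical structure)."""
    dialogue_id: str
    turns: List[Turn] = field(default_factory=list)


# Registry mapping a format name to a function that serializes dialogues.
SERIALIZERS: Dict[str, Callable[[List[Dialogue]], str]] = {}


def register_serializer(name: str):
    """Decorator that adds a serializer to the registry under `name`."""
    def decorator(fn: Callable[[List[Dialogue]], str]):
        SERIALIZERS[name] = fn
        return fn
    return decorator


@register_serializer("json")
def to_json(dialogues: List[Dialogue]) -> str:
    # One JSON array holding all dialogues.
    return json.dumps([asdict(d) for d in dialogues], indent=2)


@register_serializer("jsonl")
def to_jsonl(dialogues: List[Dialogue]) -> str:
    # One JSON object per line, one line per dialogue.
    return "\n".join(json.dumps(asdict(d)) for d in dialogues)


def serialize(dialogues: List[Dialogue], fmt: str) -> str:
    """Serialize dialogues with the serializer registered as `fmt`."""
    return SERIALIZERS[fmt](dialogues)
```

With this layout, supporting another target format would only mean registering one more function; the code that builds `Dialogue` objects from the input data stays unchanged.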
Thanks for your response. I will proceed to the creation of the routine with some prototype for your consideration soon.
Ok, so I came up with something like below. You run it using command line interface.
and the sample output (saved as 'generated_data.json'):
I just used the first dialog in the train file to emulate the generation. There are bugs here and there, which I will fix.
So please guide me on what would be best to do next: how to refactor the code, whether this module should inherit from a deeppavlov component so it can be part of a chain if needed, your view on the GUI version, etc.
Hey! Cool, thanks! I'll take a look this morning.
Ok, what exactly is the purpose of this conversion? It does a dstc2 -> dstc8 conversion, did I get it right?
I think, I'd better paste you the console. Below is the console and the input. By inputting "dstc2-templates.txt", "dstc_slot_vals.json" and "db.sqlite", we can generate any dialog. Above, I just tried to recreate the first dialog in the train dataset of the dstc2 dataset:
I hope you get the idea by now. If there are any questions, let me know. I want to finalize this quickly and turn it into a pull request, so your comments are very much appreciated.
Ok! I got it, thanks 😅 Seems like I didn't get you right at first, my bad. So.. it does the data generation (to dstc8 if I got you right, at least that's what I expected to see). That's cool. I think that the next step is to allow for DP to use the dstc8 format: the library codebase would extremely benefit from this. This could be pretty simple: we have a data reader: either To provide some intuition I'll describe the The purpose of The main method
Turns are (in our format and, iirc, in dstc8 too) dicts, so that's about sharing some fields of simultaneous dicts in the dataset. So what do you think about this? I find this idea easier to implement than the GUI, but extremely relevant.
Oops, the codebase has changed significantly recently, and I was several commits behind... Thanks for your guidance, this is exactly what I wanted right from the beginning. Now that I have your input, I will take it from here.
@oserikov I have created a PR, but I am not sure why it does not pass the Jenkins tests. Can you explain what is wrong and how I can pass them?
Hello @Eugen2525 Can you please provide a detailed DeepPavlov Go_Bot tutorial for a custom dataset?
Yes, I am planning to write a tutorial on creating a bot from scratch, with no initial training data at hand. I will do it soon, so stay tuned.
@Eugen2525 Thanks. How can we do that without training data? Can you please share some thoughts on how you trained the bot and the format in which you prepared the dataset for your custom data?
…eppavlov#1249) resolves deeppavlov#1230 * fix: issue#1230 * create contrib section; move train_set_generation tool to contrib * fix import after moved gobot_generation * added input files * generated data added * added readme and data download * remove hardcoded paths from notebook Co-authored-by: oserikov <srkvoa@gmail.com>
Hi,
I want to create a data set creation routine for the gobot DSTC 2 format. I know that there is an ongoing refactoring of the codebase for the goal-oriented bot (gobot).
Also, there is a new DSTC 8 challenge and Alexa Prize socialbot which is to be open sourced.
So I want to ask if this feature would be needed or is it duplication of work?
Ideally, I want to pull the routine to the deeppavlov repo, so I need some guidance/advice before jumping into the implementation.
Things I want to clarify:
Anything else you think might be appropriate.