Data set creation routine for gobot DSTC 2 format #1230

Closed
Eugen2525 opened this issue May 25, 2020 · 17 comments

@Eugen2525
Contributor

Hi,

I want to create a dataset creation routine for the gobot DSTC 2 format. I know that there is an ongoing refactoring of the codebase for the Goal-oriented bot (gobot).

Also, there is a new DSTC 8 challenge, and the Alexa Prize socialbot is to be open-sourced.

So I want to ask whether this feature is needed or whether it would duplicate existing work.

Ideally, I want to contribute the routine to the DeepPavlov repo, so I need some guidance/advice before jumping into the implementation.

Things I want to clarify:

  1. Is this routine needed to be developed? Or is it already underway, so that this would duplicate work?
  2. What format would be best (DSTC 2 json, DSTC 8, etc)?
  3. I want to create CLI with python, is it good?

Anything else you think might be appropriate.

@oserikov oserikov self-assigned this May 25, 2020
@Eugen2525
Contributor Author

Hi @oserikov

So have you self-assigned this to answer my questions, or do you want to roll out the routine yourself?

If you want to roll it out yourself, could you please let me know when you think you could finish it? I have a deadline, so I need your answer on this.

@oserikov
Contributor

Hey! I'll be happy to answer questions and provide the necessary help with that feature -- it is extremely useful. Sorry for the delayed response. I will write a more meaningful comment either tonight or tomorrow morning.

@Eugen2525
Contributor Author

OK, thanks, I'll be waiting then.

@oserikov
Contributor

oserikov commented May 27, 2020

Ok so

Is this routine needed to be developed?

Yes, definitely


What format would be best (DSTC 2 json, DSTC 8, etc)?

Regardless of the target format, one should properly understand the input data before serializing.

I think it would be useful to have an intermediate data representation as something pythonic (or probably a dataframe).
The intermediate representation could then be serialized in multiple ways: I think both dstc2 and dstc8 serializations are OK; the latter is probably more readable.

I would preserve the ability to easily add further dialogue serialization formats.
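A minimal sketch of the intermediate representation plus pluggable serializers described above, assuming plain Python dataclasses rather than a dataframe. The names `Turn`, `Dialogue`, `register_serializer` and both output layouts are hypothetical: the `"dstc2"` layout assumes DSTC2-like turn dicts with `speaker`, `text`, `slots`, `act` and `db_result` keys, and the `"dstc8"` layout is only loosely modeled on the DSTC 8 Schema-Guided Dialogue format, not its full schema.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

USER, BOT = 1, 2  # assumed speaker ids for user and bot turns


@dataclass
class Turn:
    speaker: int                                          # USER or BOT
    text: str
    slots: List[List[str]] = field(default_factory=list)  # [[slot, value], ...]
    act: Optional[str] = None                             # bot action; None for user turns
    db_result: Optional[dict] = None                      # set on api_call turns only


@dataclass
class Dialogue:
    dialogue_id: str
    turns: List[Turn] = field(default_factory=list)


# Format name -> serializer; a new output format means registering one more function.
SERIALIZERS: Dict[str, Callable[[List[Dialogue]], str]] = {}


def register_serializer(name: str):
    def decorator(fn):
        SERIALIZERS[name] = fn
        return fn
    return decorator


@register_serializer("dstc2")
def to_dstc2_like(dialogues: List[Dialogue]) -> str:
    """One list of DSTC2-like turn dicts per dialogue."""
    result = []
    for dlg in dialogues:
        turns = []
        for t in dlg.turns:
            turn = {"speaker": t.speaker, "text": t.text, "slots": t.slots}
            if t.act is not None:
                turn["act"] = t.act
            if t.db_result is not None:
                turn["db_result"] = json.dumps(t.db_result)  # kept as a JSON string
            turns.append(turn)
        result.append(turns)
    return json.dumps(result)


@register_serializer("dstc8")
def to_dstc8_like(dialogues: List[Dialogue]) -> str:
    """Simplified layout loosely modeled on DSTC 8 (SGD); not the full schema."""
    result = []
    for dlg in dialogues:
        turns = [{"speaker": "USER" if t.speaker == USER else "SYSTEM",
                  "utterance": t.text,
                  "frames": [{"slots": t.slots, "actions": [t.act] if t.act else []}]}
                 for t in dlg.turns]
        result.append({"dialogue_id": dlg.dialogue_id, "turns": turns})
    return json.dumps(result)


def serialize(dialogues: List[Dialogue], fmt: str) -> str:
    return SERIALIZERS[fmt](dialogues)
```

With this split the input data is parsed once into `Dialogue` objects, and supporting, say, another export format only requires one more decorated function.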


I want to create CLI with python, is it good?

What would the CLI look like?
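For illustration only, one possible shape for such a CLI, sketched with the standard-library `argparse` module. The flag names are hypothetical; they assume the generator needs bot response templates, slot values, a SQLite database and an output path.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags: one option per input file the generator needs.
    parser = argparse.ArgumentParser(
        description="Interactively create go-bot training dialogues.")
    parser.add_argument("--template-path", required=True,
                        help="bot response templates, e.g. dstc2-templates.txt")
    parser.add_argument("--slot-path", required=True,
                        help="slot values JSON, e.g. dstc_slot_vals.json")
    parser.add_argument("--db-path", required=True,
                        help="SQLite database used to answer api_call actions")
    parser.add_argument("--save-path", default="generated_data.json",
                        help="output file for the generated dialogue")
    parser.add_argument("--primary-key", nargs="+", default=["name"],
                        help="primary key column(s) of the database table")
    return parser


def main(argv: Optional[List[str]] = None) -> dict:
    # argv=None makes argparse read sys.argv[1:], i.e. the real command line.
    args = build_parser().parse_args(argv)
    # Dashes become underscores: --template-path -> template_path, etc.
    return vars(args)
```

`main(["--template-path", "dstc2-templates.txt", "--slot-path", "dstc_slot_vals.json", "--db-path", "db.sqlite"])` returns the options as a dict whose keys could be passed as keyword arguments to a generator class; a script would call `main()` under an `if __name__ == "__main__":` guard so the options come from the real command line.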

@Eugen2525
Contributor Author

Thanks for your response.

I will proceed with creating the routine and share a prototype for your consideration soon.

@Eugen2525
Contributor Author

OK, so I came up with something like the code below. You run it from the command line.

import itertools
import json
import re
from logging import getLogger

import deeppavlov.models.go_bot.templates as templ
from deeppavlov.core.data.sqlite_database import Sqlite3Database
from deeppavlov.models.go_bot.templates import DefaultTemplate
from deeppavlov.models.go_bot.tracker import DialogueStateTracker
from deeppavlov.models.slotfill.slotfill_raw import SlotFillingComponent

log = getLogger(__name__)

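class TrainSetGeneration:
    """Interactive helper that builds one go-bot dialogue turn by turn
    (annotated user turns, template-based bot turns) and saves it as JSON."""

    def __init__(self, template_path, slot_path, save_path, db_path, primary_key=['name']):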
        self.templates = templ.Templates(DefaultTemplate()).load(template_path)
        self.slotfiller = SlotFillingComponent(load_path=slot_path, save_path=slot_path)
        self.save_path = save_path
        self.database = Sqlite3Database(db_path, primary_key)
        self.ds_tracker = DialogueStateTracker(slot_names=list(self.slotfiller._slot_vals.keys()),
                                               n_actions=len(self.templates.actions),
                                               hidden_size=128,
                                               database=self.database)
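        # all slot names: those referenced as #slot in any template plus those with known values
        template_slot_names = itertools.chain.from_iterable(
            re.findall(r"#(\w+)", t.text) for t in self.templates.templates)
        self.slots = list(set(template_slot_names) | set(self.slotfiller._slot_vals.keys()))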
        self.utters = []
        self.slots_history = {}


    def get_id_input(self, prompt, valid_vals):
        """Keep prompting until the user types an integer contained in valid_vals."""
        while True:
            try:
                idx = int(input(prompt))
            except ValueError:
                print('please enter integer value')
                continue
            if idx in valid_vals:
                return idx
            print('please input a valid integer in: ', valid_vals)

    def save_dialog(self):
        """Print the collected turns and write them to save_path as JSON."""
        with open(self.save_path, 'w', encoding='utf8') as f:
            print(self.utters)
            json.dump(self.utters, f)

    def get_user_input(self):
        """Ask for a user utterance and its slot annotations, then record the turn."""
        text = input('write a user sentence: ')
        has_slot = self.get_id_input(prompt='type 1 if your sentence has a slot, else 0: ',
                                     valid_vals=[0, 1])
        slots = []
        while has_slot:
            for i, key in enumerate(self.slots):
                print(i, key)
            idx = self.get_id_input(prompt='type slot category number from the list: ',
                                    valid_vals=list(range(len(self.slots))))
            slot_category = self.slots[idx]
            if slot_category in self.slotfiller._slot_vals:
                # slot with a known set of values: let the user pick one
                values = list(self.slotfiller._slot_vals[slot_category])
                for i, val in enumerate(values):
                    print(i, val)
                idx = self.get_id_input(prompt='type slot subcategory number from the list: ',
                                        valid_vals=list(range(len(values))))
                sub_category = values[idx]
            else:
                # slot without known values (e.g. a requested field such as addr or phone)
                sub_category = ''

            slots.append([slot_category, sub_category])
            has_slot = self.get_id_input(prompt='type 1 if you want to add more slots, else 0: ',
                                         valid_vals=[0, 1])

        user_input = {'speaker': 1,
                      'text': text,
                      'slots': slots}
        print(user_input)
        self.update_slots_history(slots)
        self.utters.append(user_input)

    def update_slots_history(self, slots):
        for slot, val in slots:
            self.slots_history[slot] = val

    def start_generation(self):
        """Main loop: add user or bot turns until the user chooses to save and exit."""
        while True:
            turn = self.get_id_input(prompt='choose turn (1 for user, 2 for bot, 33 for saving and exit): ',
                                     valid_vals=[1, 2, 33])
            if turn == 1:
                self.get_user_input()
            elif turn == 2:
                self.get_bot_output()
            else:
                self.save_dialog()
                return


    def get_bot_output(self):
        """Let the user pick a bot action, fill its template and record the turn.

        For api_call actions the tracker also queries the database.
        """
        print('current slot vals are: ', self.slots_history)
        for i, act in enumerate(self.templates.actions):
            print(i, act)
        # validated integer input; the name act_id avoids shadowing the builtin id()
        act_id = self.get_id_input(prompt='type template number from the list: ',
                                   valid_vals=list(range(len(self.templates.actions))))
        template = self.templates.templates[act_id]
        # unique #slot names in the order they appear in the template
        template_slots = list(dict.fromkeys(re.findall(r"#(\w+)", template.text)))
        slots = [[slot, self.slots_history[slot]] for slot in template_slots
                 if slot in self.slots_history and slot in self.slotfiller._slot_vals]
        # `slots` holds [name, value] pairs, so look up missing slots by name
        filled = {name for name, _ in slots}
        missing_slots = [st for st in template_slots if st not in filled]
        db_result = self.ds_tracker.db_result
        if missing_slots and db_result:
            # fill the rest from the last database result, skipping keys it lacks
            slots.extend([slot, db_result[slot]] for slot in missing_slots if slot in db_result)
        text = template.generate_text(slots).strip()
        print('generated response is: ', text)

        bot_output = {'speaker': 2,
                      'text': text,
                      'slots': slots,
                      'act': self.templates.actions[act_id]}
        if 'api_call' in template.text:
            self.ds_tracker.update_state(slots)
            self.ds_tracker.make_api_call()
            print('the result of the db call is: ', self.ds_tracker.db_result)
            # the db result is stored as a JSON string inside the turn
            bot_output['db_result'] = json.dumps(self.ds_tracker.db_result)
        self.utters.append(bot_output)


if __name__ == '__main__':
    trainsetgen = TrainSetGeneration(template_path='generation/dstc2-templates.txt',
                                     slot_path='generation/dstc_slot_vals.json',
                                     save_path='generation/generated_data.json',
                                     db_path='my_bot/db.sqlite')

    trainsetgen.start_generation()

and the sample output (saved as 'generated_data.json'):

[
	{
		"speaker": 2,
		"text": "Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?",
		"slots": [],
		"act": "welcomemsg"
	},
	{
		"speaker": 1,
		"text": "cheap restaurant",
		"slots": [
			[
				"pricerange",
				"cheap"
			]
		]
	},
	{
		"speaker": 2,
		"text": "What kind of food would you like?",
		"slots": [],
		"act": "request_food"
	},
	{
		"speaker": 1,
		"text": "any",
		"slots": [
			[
				"this",
				"dontcare"
			]
		]
	},
	{
		"speaker": 2,
		"text": "What part of town do you have in mind?",
		"slots": [],
		"act": "request_area"
	},
	{
		"speaker": 1,
		"text": "south",
		"slots": [
			[
				"area",
				"south"
			]
		]
	},
	{
		"speaker": 2,
		"text": "Api_call area=\"south\" food=\"#food\" pricerange=\"cheap\"\tapi_call area=\"south\" food=\"#food\" pricerange=\"cheap\"",
		"db_result": "{\"food\": \"chinese\", \"pricerange\": \"cheap\", \"area\": \"south\", \"postcode\": \"c.b 1, 7 d.y\", \"phone\": \"01223 244277\", \"addr\": \"cambridge leisure park clifton way cherry hinton\", \"name\": \"the lucky star\"}",
		"slots": [
			[
				"area",
				"south"
			],
			[
				"pricerange",
				"cheap"
			],
			[
				"area",
				"south"
			],
			[
				"pricerange",
				"cheap"
			]
		],
		"act": "api_call"
	},
	{
		"speaker": 2,
		"text": "The lucky star is a nice place in the south of town serving tasty chinese food.",
		"slots": [
			[
				"area",
				"south"
			],
			[
				"name",
				"the lucky star"
			],
			[
				"area",
				"south"
			],
			[
				"food",
				"chinese"
			]
		],
		"act": "inform_area+inform_food+offer_name"
	},
	{
		"speaker": 1,
		"text": "address",
		"slots": [
			[
				"addr",
				""
			]
		]
	},
	{
		"speaker": 2,
		"text": "Sure, the lucky star is on cambridge leisure park clifton way cherry hinton.",
		"slots": [
			[
				"name",
				"the lucky star"
			],
			[
				"addr",
				"cambridge leisure park clifton way cherry hinton"
			]
		],
		"act": "inform_addr+offer_name"
	},
	{
		"speaker": 1,
		"text": "phone number",
		"slots": [
			[
				"phone",
				""
			]
		]
	},
	{
		"speaker": 2,
		"text": "The phone number of the lucky star is 01223 244277.\tThe phone number of the lucky star is dontcare.",
		"slots": [
			[
				"name",
				"the lucky star"
			],
			[
				"phone",
				"01223 244277"
			],
			[
				"name",
				"the lucky star"
			]
		],
		"act": "inform_phone+offer_name"
	},
	{
		"speaker": 1,
		"text": "thank you good bye",
		"slots": []
	},
	{
		"speaker": 2,
		"text": "You are welcome!",
		"slots": [],
		"act": "bye"
	}
]

I just used the first dialog in the train file to emulate the generation. There are bugs here and there, which I will fix.
At this stage, I would like your general opinion on how to make it fit the DeepPavlov way better. Then I want to create a pull request to the repository.

@Eugen2525
Contributor Author

So please advise me on what would be best to do next: how to refactor the code, whether this module should inherit from a DeepPavlov component so it can be part of a chain if needed, your view on a GUI version, etc.

@oserikov
Contributor

oserikov commented Jun 2, 2020

Hey! Cool, thanks! I'll take a look this morning

@oserikov
Contributor

oserikov commented Jun 3, 2020

OK, what exactly is the purpose of this conversion? It does dstc2 -> dstc8 conversion, did I get that right?

@Eugen2525
Contributor Author

I think I'd better paste the console output. Below is the console session and the input.

Given "dstc2-templates.txt", "dstc_slot_vals.json" and "db.sqlite" as input, we can generate any dialog. Above, I just tried to recreate the first dialog of the DSTC2 train set:

choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {}
0 api_call
1 bye
2 canthear
3 canthelp_area
4 canthelp_area_food
5 canthelp_area_food_pricerange
6 canthelp_area_pricerange
7 canthelp_food
8 canthelp_food_pricerange
9 confirm-domain
10 expl-conf_area
11 expl-conf_food
12 expl-conf_pricerange
13 impl-conf_area+impl-conf_pricerange+request_food
14 impl-conf_food+impl-conf_pricerange+request_area
15 impl-conf_food+request_area
16 inform_addr+inform_food+offer_name
17 inform_addr+inform_phone+inform_pricerange+offer_name
18 inform_addr+inform_phone+offer_name
19 inform_addr+inform_postcode+offer_name
20 inform_addr+inform_pricerange+offer_name
21 inform_addr+offer_name
22 inform_area+inform_food+inform_pricerange+offer_name
23 inform_area+inform_food+offer_name
24 inform_area+inform_phone+offer_name
25 inform_area+inform_postcode+offer_name
26 inform_area+inform_pricerange+offer_name
27 inform_area+offer_name
28 inform_food+inform_pricerange+offer_name
29 inform_food+offer_name
30 inform_phone+inform_postcode+offer_name
31 inform_phone+inform_pricerange+offer_name
32 inform_phone+offer_name
33 inform_postcode+inform_pricerange+offer_name
34 inform_postcode+offer_name
35 inform_pricerange+offer_name
36 offer_name
37 repeat
38 reqmore
39 request_area
40 request_food
41 request_pricerange
42 select_area
43 select_food
44 select_pricerange
45 welcomemsg
type template number from the list: generated response is:  Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you?
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: 0 postcode
1 area
2 name
3 food
4 this
5 phone
6 pricerange
7 addr
type slot category number from the list: 0 moderate
1 expensive
2 cheap
3 dontcare
type slot subcategory number from the list: type 1 if you want to add more slots, else 0: {'speaker': 1, 'text': 'cheap restaurant', 'slots': [['pricerange', 'cheap']]}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap'}
[same action list 0-45 as above]
type template number from the list: generated response is:  What kind of food would you like?
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: [same slot list 0-7 as above]
type slot category number from the list: 0 dontcare
type slot subcategory number from the list: type 1 if you want to add more slots, else 0: {'speaker': 1, 'text': 'any', 'slots': [['this', 'dontcare']]}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare'}
[same action list 0-45 as above]
type template number from the list: generated response is:  What part of town do you have in mind?
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: [same slot list 0-7 as above]
type slot category number from the list: 0 south
1 east
2 dontcare
3 north
4 west
5 centre
type slot subcategory number from the list: type 1 if you want to add more slots, else 0: {'speaker': 1, 'text': 'south', 'slots': [['area', 'south']]}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south'}
[same action list 0-45 as above]
type template number from the list: generated response is:  Api_call area="south" food="#food" pricerange="cheap"	api_call area="south" food="#food" pricerange="cheap"
the result of the db call is:  {'food': 'chinese', 'pricerange': 'cheap', 'area': 'south', 'postcode': 'c.b 1, 7 d.y', 'phone': '01223 244277', 'addr': 'cambridge leisure park clifton way cherry hinton', 'name': 'the lucky star'}
choose turn (1 for user, 2 for bot, 33 for saving and exit): 2020-06-02 14:41:57.149 INFO in 'deeppavlov.models.go_bot.tracker'['tracker'] at line 210: Made api_call with {'area': 'south', 'pricerange': 'cheap'}, got 2 results.
current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south'}
[same action list 0-45 as above]
type template number from the list: generated response is:  The lucky star is a nice place in the south of town serving tasty chinese food.
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: [same slot list 0-7 as above]
type slot category number from the list: type 1 if you want to add more slots, else 0: {'speaker': 1, 'text': 'address', 'slots': [['addr', '']]}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': ''}
[same action list 0-45 as above]
type template number from the list: generated response is:  Sure, the lucky star is on cambridge leisure park clifton way cherry hinton.
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: [same slot list 0-7 as above]
type slot category number from the list: type 1 if you want to add more slots, else 0: {'speaker': 1, 'text': 'phone number', 'slots': [['phone', '']]}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': '', 'phone': ''}
[same action list 0-45 as above]
type template number from the list: generated response is:  The phone number of the lucky star is 01223 244277.	The phone number of the lucky star is dontcare.
choose turn (1 for user, 2 for bot, 33 for saving and exit): write a user sentence: type 1 if your sentence has a slot, else 0: {'speaker': 1, 'text': 'thank you good bye', 'slots': []}
choose turn (1 for user, 2 for bot, 33 for saving and exit): current slot vals are:  {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': '', 'phone': ''}
[same action list 0-45 as above]
type template number from the list: generated response is:  You are welcome!
choose turn (1 for user, 2 for bot, 33 for saving and exit): please enter integer value
choose turn (1 for user, 2 for bot, 33 for saving and exit): [printed list of all turns; same content as generated_data.json above]

Process finished with exit code 0

@Eugen2525
Contributor Author

I hope you get the idea now. If you have any questions, let me know. I want to finalize this quickly and turn it into a pull request, so your comments are very much appreciated.

@oserikov
Contributor

oserikov commented Jun 5, 2020

Ok! I got it, thanks 😅
Sorry for the delay, and thanks for your interest in our published contribution ideas, btw!

It seems I didn't understand you correctly at first, my bad.
I thought you were suggesting just a converter from one popular format to another, but the routine above does more than conversion: it assists in creating completely new datasets. Sorry again for the misunderstanding.

So... it does data generation (to dstc8, if I got you right; at least that's what I expected to see). That's cool.

I think the next step is to allow DP to use the dstc8 format: the library codebase would benefit greatly from this.

This could be pretty simple. We have data readers, DSTC2DatasetReader and SimpleDSTC2DatasetReader; my point here is to implement a DSTC8DatasetReader.
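A rough sketch of the core conversion such a reader might perform, assuming the DSTC 8 Schema-Guided Dialogue layout (`dialogue_id`, `turns`, `speaker`, `utterance`, `frames`, and `actions` with `act`/`slot`/`values`). The function names and the mapping of SGD actions to DSTC2-style names such as `request_city` are hypothetical; a real DSTC8DatasetReader would also have to plug into DeepPavlov's dataset reader interface, which is not shown here.

```python
from typing import Dict, List, Tuple


def sgd_act_name(action: dict) -> str:
    """Map e.g. {"act": "REQUEST", "slot": "city"} to "request_city" (assumed naming)."""
    act = action["act"].lower()
    return f"{act}_{action['slot']}" if action.get("slot") else act


def read_sgd_dialogue(dialogue: dict) -> List[Tuple[Dict, Dict]]:
    """Turn one SGD dialogue into (user, system) pairs.

    SGD dialogues start with a USER turn and alternate speakers; an unmatched
    trailing turn, if any, is dropped by zip().
    """
    turns = dialogue["turns"]
    pairs = []
    for user, system in zip(turns[0::2], turns[1::2]):
        if user["speaker"] != "USER" or system["speaker"] != "SYSTEM":
            raise ValueError(f"unexpected speaker order in {dialogue['dialogue_id']}")
        # user-side INFORM actions become [slot, value] pairs, DSTC2-style
        slots = [[a["slot"], v]
                 for frame in user["frames"] for a in frame["actions"]
                 if a["act"] == "INFORM" for v in a["values"]]
        # system actions are combined into one sorted "+"-joined name
        acts = sorted(sgd_act_name(a) for frame in system["frames"] for a in frame["actions"])
        pairs.append(({"text": user["utterance"], "slots": slots},
                      {"text": system["utterance"], "act": "+".join(acts)}))
    return pairs
```

Sorting and joining the system acts mirrors combined DSTC2 action names like `inform_addr+offer_name`; whether that is the right granularity for SGD is a design question left open here.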

To provide some intuition, I'll describe SimpleDSTC2DatasetReader below:


The purpose of the Simple reader is to read a dataset in a dstc2-inspired format. The data is read and transformed into the DP-native dialog data format that is used further along the go-bot pipeline.

The main method, read, does exactly what is described above (it also downloads the data if there is nothing to read from).

_read_from_file handles the actual read-and-convert process for each data file: it loads the data JSON, passes the resulting dict to _get_turns, and converts the list of user utterances and the list of system responses into a list of <utterance, response> pairs.

_get_turns is written in a somewhat curious manner, but what it does is loop through the list of turns of each dialogue and handle a dataset-specific problem: a DB call is represented by two back-to-back system utterances, first the api_call and then the actual response given the call results.

Turns are dicts (in our format and, IIRC, in dstc8 too), so handling this comes down to sharing some fields between back-to-back turn dicts in the dataset.
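To make the api_call handling concrete, here is a hypothetical `to_pairs` helper that turns a flat, DSTC2-like turn list (as in the generated sample earlier in this thread) into (user, bot) pairs. It is not DeepPavlov's `_get_turns`; it only assumes one plausible convention: an empty user turn is inserted before a bot turn that has no user turn in front of it, and that empty turn carries the preceding api_call's `db_result`.

```python
import json
from typing import List, Tuple

USER, BOT = 1, 2  # speaker ids used in the DSTC2-like turns


def to_pairs(turns: List[dict]) -> List[Tuple[dict, dict]]:
    """Convert a flat list of DSTC2-like turns into (user, bot) pairs.

    A bot turn with no user turn before it (the greeting, or the answer that
    follows an api_call) gets an empty user turn. If the previous bot turn
    carried a db_result, that empty user turn carries it too, so the answer
    can be conditioned on the database result. Of consecutive user turns only
    the last is kept, and a trailing user turn with no bot reply is dropped.
    """
    pairs = []
    pending_user = None
    last_db_result = None
    for turn in turns:
        if turn["speaker"] == USER:
            pending_user = {"text": turn["text"], "slots": turn.get("slots", [])}
            last_db_result = None
            continue
        if pending_user is None:
            pending_user = {"text": "", "slots": []}
            if last_db_result is not None:
                pending_user["db_result"] = last_db_result
        pairs.append((pending_user, {"text": turn["text"], "act": turn.get("act")}))
        pending_user = None
        # db_result is stored as a JSON string in the generated sample
        last_db_result = json.loads(turn["db_result"]) if "db_result" in turn else None
    return pairs
```

The design choice here is to keep strict user/bot alternation downstream by padding with empty user turns, rather than merging the api_call and its answer into a single bot turn.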


So what do you think about this? I find this idea easier to implement than the GUI, yet extremely relevant.

Eugen2525 added a commit to Eugen2525/DeepPavlov that referenced this issue Jun 5, 2020
@Eugen2525
Contributor Author

Eugen2525 commented Jun 5, 2020

Oops, the codebase has changed significantly recently; I was several commits behind...
Anyway, I have submitted a pull request with an example that illustrates what I meant.
Regarding your suggestions, I see that the codebase has changed a lot from what it was just 2-3 weeks ago... I need some time to inspect it, and then I will come back with an idea.

Thanks for your guidance, this is exactly what I wanted from the beginning, and now that I have your input, I will take it from here.

Eugen2525 added a commit to Eugen2525/DeepPavlov that referenced this issue Jun 11, 2020
@Eugen2525
Contributor Author

@oserikov I have created a PR, but I am not sure why it fails the Jenkins tests. Can you explain what is wrong and how I can get them to pass?

@abhisheksurya578

Hello @Eugen2525, can you please provide a detailed DeepPavlov Go_Bot tutorial for a custom dataset?

@Eugen2525
Contributor Author

Hello @Eugen2525, can you please provide a detailed DeepPavlov Go_Bot tutorial for a custom dataset?

Yes, I am planning to write a tutorial on creating a bot from scratch, with no initial training data at hand. I will do it soon, so stay tuned.

@abhisheksurya578

@Eugen2525 Thanks. Without training data, how can we do that? Can you please share some thoughts on how you trained the bot and the format in which you prepared the dataset for your custom data?

surkovv pushed a commit to surkovv/DeepPavlov that referenced this issue Aug 24, 2022
…eppavlov#1249)

resolves deeppavlov#1230

* fix: issue#1230

* create contrib section; move train_set_generation tool to contrib

* fix import after moved gobot_generation

* added input files

* generated data added

* added readme and data download

* remove hardcoded paths from notebook

Co-authored-by: oserikov <srkvoa@gmail.com>
Projects
None yet
Development

No branches or pull requests

4 participants