building go-bot in russian #375
Hi @vitalyuf! TL;DR: yes, it does. I suggest first trying To integrate a database of employee names, you should provide a dataset with appropriate If your dialogs are simple (as they seem to me), the model shouldn't require much data to train properly. The DSTC2 model was trained on 1000 dialogs; you will need fewer. I guess even 100 will be enough, but you'll have to check for yourself. |
Thank you! |
Hi! BTW: should I open new issues in such cases, or may I reopen older ones if I run into related problems in the future? |
Oh. I see now. |
Yes, if you want a minimal slot-filler, you can use configs/ner/slotfiller_dstc2_raw.json, which will match slots according to your dictionary using Levenshtein distance. |
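To illustrate what dictionary matching by Levenshtein distance means, here is a minimal standalone sketch. It is not DeepPavlov's implementation: the function names `levenshtein` and `fill_slots`, the `max_distance` parameter, and the `{slot: {value: [paraphrases]}}` dictionary layout are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # drop ch_a
                               current[j - 1] + 1,       # add ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def fill_slots(tokens: List[str],
               slot_vals: Dict[str, Dict[str, List[str]]],
               max_distance: int = 1) -> Dict[str, str]:
    """For each slot, pick the value whose paraphrase is closest to any token."""
    best: Dict[str, Tuple[int, str]] = {}  # slot -> (distance, value)
    for token in tokens:
        word = token.lower()
        for slot, values in slot_vals.items():
            for value, paraphrases in values.items():
                for paraphrase in paraphrases:
                    dist = levenshtein(word, paraphrase.lower())
                    if dist <= max_distance and (slot not in best or dist < best[slot][0]):
                        best[slot] = (dist, value)
    return {slot: value for slot, (_, value) in best.items()}


# Hypothetical dictionary for illustration only.
slot_vals = {"surname": {"ivanov": ["ivanov"]}, "name": {"petr": ["petr", "peter"]}}
print(fill_slots(["call", "ivanv", "petr"], slot_vals))
```

Note that this brute-force version compares every token with every dictionary entry, so its cost grows linearly with the size of the dictionary.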
I guess, reopening the issue is the right way =) |
To use the simple pattern-matching slot_filler, you need to build a vocabulary of slot and slot-value pairs with paraphrases. You can find an example of such a vocabulary here. You need to specify the path to this file in |
Thanks, I've already tried the raw filler, and it works. I plan to generate such a dictionary from a list of all possible surnames and names. I suppose the resulting file will be about 30 MB in size. I also tried to use ner_rus.json as a subcomponent of slotfill_dstc2 with a reference to my stc_slot_vals.json, but it failed (KeyError: 'PER' at ---> 65 for entity_name in self._slot_vals[slot]:). Generally speaking, for now I want to fill the 'name', 'surname', and 'fathersname' slots. |
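Generating such a dictionary from plain name lists could look roughly like the sketch below. The `{slot: {value: [paraphrases]}}` layout is an assumption based on the description of the vocabulary in this thread, so check it against the example file before relying on it. The helper names `build_slot_vals` and `save_slot_vals` and the sample names are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def build_slot_vals(names_by_slot: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Turn {slot: [names]} into {slot: {canonical name: [paraphrases]}}."""
    slot_vals: Dict[str, Dict[str, List[str]]] = {}
    for slot, names in names_by_slot.items():
        entries: Dict[str, List[str]] = {}
        for name in names:
            canonical = name.strip()
            if not canonical:
                continue  # skip blank lines from the source list
            paraphrases = entries.setdefault(canonical, [])
            variant = canonical.lower()  # the only paraphrase produced here
            if variant not in paraphrases:
                paraphrases.append(variant)
        slot_vals[slot] = entries
    return slot_vals


def save_slot_vals(slot_vals: Dict[str, Dict[str, List[str]]], path: str) -> None:
    """Write the dictionary as UTF-8 JSON, keeping Cyrillic text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(slot_vals, f, ensure_ascii=False, indent=2)


# Hypothetical sample data using the slot names mentioned above.
sample = build_slot_vals({
    "name": ["Иван", " Пётр "],
    "surname": ["Иванов"],
    "fathersname": ["Иванович"],
})
print(json.dumps(sample, ensure_ascii=False))
```

Other paraphrases, such as inflected forms of a name, would have to be added to each list separately; this sketch only lower-cases the canonical form.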
Unfortunately, in practice the raw slot filler does not perform very well on my task, for the following reasons:
|
This solution is not suited to large vocabularies at the moment. For large vocabularies, prefix trees would be much faster. We will also look into the 'Иванов'->'Живанов' case. |
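As a rough illustration of why a prefix tree helps, here is a minimal sketch of fuzzy lookup over a trie. It is not code from DeepPavlov; the `Trie` and `TrieNode` classes and the `search` method are hypothetical names for this example. Words that share a prefix share the same rows of the edit-distance table, and a whole branch is abandoned as soon as no entry in its row is within `max_distance`.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set only where a vocabulary word ends


class Trie:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def search(self, query: str, max_distance: int) -> List[Tuple[str, int]]:
        """Return (word, distance) pairs with Levenshtein distance <= max_distance."""
        results: List[Tuple[str, int]] = []
        first_row = list(range(len(query) + 1))  # distances from the empty prefix
        for ch, child in self.root.children.items():
            self._walk(child, ch, query, first_row, max_distance, results)
        return sorted(results, key=lambda item: (item[1], item[0]))

    def _walk(self, node: TrieNode, ch: str, query: str, prev_row: List[int],
              max_distance: int, results: List[Tuple[str, int]]) -> None:
        # Extend the edit-distance table by one row for the character `ch`.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1, prev_row[j] + 1, prev_row[j - 1] + cost))
        if node.word is not None and row[-1] <= max_distance:
            results.append((node.word, row[-1]))
        # Prune: if every entry already exceeds the limit, no descendant can match.
        if min(row) <= max_distance:
            for next_ch, child in node.children.items():
                self._walk(child, next_ch, query, row, max_distance, results)


trie = Trie(["иванов", "живанов", "петров"])
print(trie.search("иванов", 1))
```

With `max_distance=1` this sketch returns both 'иванов' and 'живанов' for the query 'иванов', so any fuzzy matcher of this kind still needs a rule for choosing among close candidates, for example preferring the smallest distance.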
Hi! |
We will fix it in the next release. In the meantime, you can try to fix it by removing a couple of lines here: https://github.com/deepmipt/DeepPavlov/blob/e91ff6c3eafbd49b6f09499d2e1b4b5675b513bb/deeppavlov/models/slotfill/slotfill.py#L115-L117 |
Hi! |
Hi! Please tell me if there is a way for the slot filler not to look up values in its dictionary, slot_vals.json. |
@vitalyuf Hi. Did you find pre-trained embeddings for the Russian language in .txt format? |
Hi!
I want to build a go-bot in Russian using DeepPavlov.
The go-bot's task is to output the phone number of a requested employee given their name, surname, and father's name (patronymic).
I plan to use tutorial03 as a reference.
The main idea is to replace the DSTC2 dataset with a new one, which I am going to generate in DSTC2 format.
Does the described approach have a right to exist?