feat: simple slot filler and go_bot database
* release 0.0.3 (#150)

* feat: tests can be run from project root (#86)

* refactor: instead of juggling global random states use instances of Random for datasets

* test(): add test for interacting with custom queries

After refactoring, it is possible to easily add a list of query-response
pairs for every model (config) that will be used to compare a pretrained
model's output with the expected output. Initial lists were added for error_model
and ner. A URL for downloading the pretrained ner_conll2003_model was also added.
IP-1344 #done

* Update docs from master (#96)

* fixed grammar and style

* Update README.md

* fix grammar & style

* fix grammar & style

* fix grammar&style in Intent classification README

* doc: add supported platform notes

* docs: correct paths to scripts and configs to be relative to repository root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples

* doc: add notes for python 3.5

* test(): change downloading to temp dir (#97)

* feat: assert python version is 3.6 or higher

* Rename dataset to dataset_iterator and other renames (#103)

* refactor: rename 'dataset' to 'dataset_iterator'

* refactor: rename dataset readers and iterators

* refactor: classification iterator and reader

* fix: dialog_iterator

* test: fix downloading procedure (#108)

* Feature/tf layers to core (#67)

* feat: layers moved to core

* feat: attention added

* fix: highway/skip connections for different dimensionality of units are fixed

* feat: NER now supports core layers

* fix: minor docstrings fixes

* feat: CuDNN GRU and LSTM added

* feat: Bidirectional CuDNN GRU and LSTM added

* feat: stacked bi-rnn refactored

* fix: fixed arguments order in rnn

* fix: remove duplicate mult_att

* chore: merge with dev

* fix: backward forward bug in cudnnrnn

* refactor: use single fasttext module, clean dependencies

* fix: add error when n_classes is zero

* feat: add fastText model usage instead of fasttext

* fix: emb_module default fastText

* chore: embedding fixed in configs

* chore: change new models names

* feat: change intent embeddings in gobot configs

* chore: fastText to fasttext, new model, change intents in gobot configs

* chore: new url on new fasttext embeddings

* fix: delete download all true

* fix: add url of old embedding file

* fix: delete comma

* fix: delete old embedding file from urls

* fix: delete pyfasttext from requirements, fasttext_embedder

* fix: change pyfasttext embeddings from gobot

* fix: delete from requirements

* fix: delete gensim from fasttext_embedder

* fix: simplify requirements

* fix: fix dim in gobot_all config

* refactor: remove redundant parameter 'emb_module'

* feat: use wiki.en.bin embeddings in gobot_all

* feat: check saved model params and fix lowercase for interact

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add support for classification data in csv/json formats (#115)

* feat: add support for csv/json classification datasets

* feat: add tests for snips and samples

* fix: gobot_all config fix

* feat: add REST API for all models

* Moved telegram_utils -> utils; Refactored telegram_ui.py

* Moved telegram_utils -> utils: modified deeppavlov/deep.py

* Fixed getting model name with get_main_component() in telegram_ui.py

* chainer.py: minor fix in get_main_component()

* Added riseapi launch mode

* README.md: added riseapi mode reference

* Updated README.md and fixed requirements.txt

* minor fixes in README.md

* Fixes in utils/server

* refactor: change endpoint names

* feat: add StreamSpacyTokenizer

* refactor: remove duplication from script naming

* refactor: outline detokenize() method in utils, because it should be used by all tokenizers and doesn't depend on tokenize()

* feat: add streaming spaCy tokenizer

* refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy

* refactor: rename tokenizer scripts back

* fix: wrong grammar

* feat: include spacy_tokenizer import

* feat: replace old SpacyTokenizer with new StreamSpacyTokenizer

* feat: ability to manage lowercasing from class constructor, typing improvements

* fix: update go-bot configs, so they would work with StreamSpacyTokenizer the same as with the old tokenizer

* feat: add optional logging to the spacy tokenizer

* docs: update docstrings

* refactor: replace custom logger with DeepPavlov's, PEP 8 style update

* refactor: outline ngramize() since it is independent of tokenizer classes

* refactor: return original JSON formatting

* fix: add **kwargs to __init__()

* chore: update .gitignore

* refactor: more stable and consistent code

* feat: add TravisCI integration

* build(): add TravisCI integration

* build(): add TravisCI integration

* feat: add ranking model

* feat: add ranking model to deeppavlov

* feat: add download of dataset and embedding_model

* feat: adapt to new deeppavlov interfaces

* refactor: use pathlib where available in the ranking model

* feat: add saving and loading of responses with np.save

* feat: add saving and loading of response embeddings with np.save; use response embeddings to calculate predictions in the __call__ function

* feat: add interact regime

* feat: add interact_pred_num parameter

* refactor: change a parameter's default value, change the check for whether the embeddings model file exists

* fix: fix non-string keys in EmbeddingDict class

* feat: add parameters dict for autotests

* feat: add tests support

* feat: add context embeddings vocabulary (it is used in interact regime
to predict the most similar contexts)

* chore: change shuffle parameter default value to True in batch_generator

* refactor: change config to chainer representation

* fix: bug fix in urls.py file

* refactor: remove emb_vocab_file saving, move build_tok2int_vocab and
make_ints funcs to InsuranceDict class, add set_embeddings and
reset_embeddings funcs in RankingModel

* feat: add initial documentation

* refactor: remove idx2int vocabulary, add vocabularies saving

* change default values of config parameters, remove examples in tests

* feat: add table in documentation

* fix: fix bug in urls.py

* refactor: remove paths from config

* feat: add documentation

* feat: add True in tests

* feat: add documentation

* refactor: move init/load in the load function.

* refactor: change parameters in config

* feat: add logging

* feat: add more logging

* feat: add documentation, change parameters values in config

* fix: add gensim for ranking model

* fix: requirements installation order that caused setup.py error

* refactor: train script

* feat: add documentation

* feat: models parameters check for ner

* feat: parameters check added to ner

* feat: parameters check added to slotfill

* chore: minor clean-up

* fix: fix conll-2003 model file names and archive names

* refactor: remove blank line

* feat: allow to stop training after n batches (#127)

* fix: many minor fixes

* fix: fix mark_done data_path

* refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder

* fix: fix embedding matrix construction, change the default value of the epochs-number parameter

* refactor: rename registered name and name of the class

* refactor: rename files and classes

* refactor: change dataset download

* feat: add insurance embeddings and datasets in urls.py

* refactor: change batch data representation (#131)

* feat: install tensorflow-gpu

* feat: add SQuAD model

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model implementation

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input queries

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* feat: use all pytest keys and not only max_batches (#134)

* fix: remove result stringification

* feat: add GPU_only and Slow marks for tests

* test: add couple of marks for selecting tests

* test: make Travis running only fast tests without GPU

* fix: ranking config works in interactbot

* fix: add downloading nltk punkt for tokenization (#140)

* feat: bot start message for intents does not say anything about dstc2 (#142)

* feat: interactbot command works with pipes that require multiple inputs (#137)

* build: change TravisCI script (#143)

* feat: add Glove embedder (#138)

* feat: glove embedder added

* feat: embeddings added to NER network

* feat: dataset and embeddings are added to urls.py for downloading

* fix: char embeddings added to pretrained embeddings

* feat: embedder returns a list of embeddings instead of a zero-padded np array

* feat: capitalization added

* feat: config modified according to new features

* feat: double dense added to input parameters

* feat: config parameters updated

* chore: fix urls for conll NER, ontonotes model url added

* feat: pytest_max_batches added for faster train check

* feat: ontonotes tests added

* feat: test conll max batches added

* Update README.md

* feat: add seq2seq go bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add ConfigError

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* doc: remove red ... blocks

* doc: change Dataset to DatasetIterator

* doc: update list of configs

* doc: update package structure

* doc: add notes about dataset element in config

* feat: add squad model description to README.md

* doc: add config specification for seq2seq_go_bot

* docs: add seq2seq_go_bot in main readme

* docs: small fix

* docs: add config specification for seq2seq_go_bot

* chore: remove install.py (#151)

* feat: add support for batches in go-bot

* feat: batching v1

* feat: bow_encoder is optional

* fix: probs calculation for use_action_mask=true

* refactor: do not feed initial_state during train

* feat: feed sequence lengths in dynamic_rnn

* refactor: rename go_bot.py -> bot.py

* Update README.md

* feat: add Ontonotes NER with Senna

* feat: Ontonotes NER added

* chore: train part removed from config

* fix: readme dataset_iterator fixed, json removed from string

* feat: raw version of test added

* fix: test modes

* fix: folder name in ontonotes config and download path now consistent

* fix: skip tests

* Revert "feat: add Ontonotes NER with Senna" (#160)

This reverts commit ae91d8f.

* Add custom slot_filler.

* Add custom go_bot configuration gobot_my.json.

* Import my_slot_filler.slotfill to register simple_slotfiller.

* Add new configuration files for gobot_my and gobot_simple.

* Build data_reader for pharma_bot.

* Fix bugs in reading data.

* Rename modules

* Implement the data reader templates

* Remove the normalize function because it is not needed, assuming the slot names are not changed.

* feat: add dstc2 with api calls

* fix: episode_done fix

* fix: mv db_result from y to x

* feat: if api_call was made, respond with the next prediction

* feat: create dstc2_v2 with api_calls

* refactor: change interact_db_result logic and fix debug output

* feat: add learning rate polynomial decay

* refactor: rm moved to core modules

* feat: add Sqlite3Database

* feat: rm db_result_during_interaction & add database in go_bot

* refactor: logging

* fix: import fix

* feat: add threshold for levenshtein score

* fix: fix l2 regularization

* feat: add features to state tracker

* feat: add db context features

* feat: new config for intents model with wiki.en.bin

* docs: readme for intents

add info about two models with different embeddings for DSTC 2

* feat: configs use dstc2_v2

* fix: minor db fix

* feat: add variational dropout & fix logging

* feat: raw slotfiller moved to a separate folder and now inherits from Serializable

* feat: orthodox slotfill moved to slotfill folder

* refactor: external functions to class methods

* feat: config for raw slotfiller added

* feat: slotfilling config is simplified by dstc_ner config reference

* chore: unnecessary imports removed

* chore: fix outdated imports

* docs: simple description for raw slotfill

* fix: fix action mask for api-calls

* refactor: dict to tuple in database

* feat: add training for database

* feat: add optimizer configuration

* docs: fix github links

* docs: add new network parameters and database

* docs: add -d description

* feat: add threshold to slotfill configs

* fix: make attention over intents work

* docs: add comparison with external models

* docs: dstc2_v2 vs dstc2

* docs: fix dstc2_reader

* refactor: add api call action as a config parameter

* docs: add template and database doc

* feat: template type from str to class

* docs: database class

* feat: update configs

* feat: fix templates

* refactor: rm extra files

* refactor: database.py -> sqlite_database.py

* feat: mv dropout to dense layer, update configs, fix dropout_rate

* feat: update gobot_dstc2_best model

* feat: dropout on attentioned embeddings

* feat: add intents_dstc2_big tests

* feat: update gobot_dstc2 model

* feat: retrain gobot_dstc2

* refactor: rm unused code

* docs: update examples and metrics

* fix: remove training from slotfill
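
The two headline changes in this commit, a simple slot filler with a Levenshtein-score threshold and a go_bot database, can be illustrated with a rough, self-contained sketch. The names below (levenshtein, SimpleSlotFiller) are invented for illustration and are not DeepPavlov's actual slot-filling API; the real component lives under deeppavlov/models/slotfill/.

```python
# Hypothetical illustration only: class and helper names here are invented and do not
# match DeepPavlov's slotfill components under deeppavlov/models/slotfill/.
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(current[j - 1] + 1,             # insertion
                               previous[j] + 1,                # deletion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


class SimpleSlotFiller:
    """Fuzzy dictionary lookup: a token fills a slot when its normalized
    similarity to a known slot value reaches the threshold."""

    def __init__(self, slot_values: Dict[str, List[str]], threshold: float = 0.8):
        self.slot_values = slot_values
        self.threshold = threshold

    def __call__(self, tokens: List[str]) -> Dict[str, str]:
        slots = {}
        for slot, values in self.slot_values.items():
            best_score, best_value = 0.0, None
            for value in values:
                for token in tokens:
                    distance = levenshtein(token.lower(), value.lower())
                    score = 1.0 - distance / max(len(token), len(value))
                    if score > best_score:
                        best_score, best_value = score, value
            if best_value is not None and best_score >= self.threshold:
                slots[slot] = best_value
        return slots


filler = SimpleSlotFiller({"food": ["russian", "chinese"],
                           "pricerange": ["cheap", "expensive"]})
print(filler("i want cheep russian food".split()))
# -> {'food': 'russian', 'pricerange': 'cheap'}
```

The threshold is what the "add threshold for levenshtein score" item refers to conceptually: small misspellings such as "cheep" still map to "cheap", while weak partial matches are dropped.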
vikmary authored and seliverstov committed May 16, 2018
1 parent 53f4a3b commit fe272cd
Showing 27 changed files with 1,648 additions and 815 deletions.
4 changes: 3 additions & 1 deletion deeppavlov/__init__.py
@@ -21,6 +21,7 @@
import deeppavlov.core.models.keras_model
import deeppavlov.core.data.vocab
import deeppavlov.core.data.simple_vocab
import deeppavlov.core.data.sqlite_database
import deeppavlov.dataset_readers.babi_reader
import deeppavlov.dataset_readers.dstc2_reader
import deeppavlov.dataset_readers.kvret_reader
@@ -42,7 +43,6 @@
import deeppavlov.models.embedders.dict_embedder
import deeppavlov.models.embedders.glove_embedder
import deeppavlov.models.embedders.bow_embedder
import deeppavlov.models.ner.slotfill
import deeppavlov.models.ner.ner_ontonotes
import deeppavlov.models.spellers.error_model.error_model
import deeppavlov.models.trackers.hcn_at
@@ -74,6 +74,8 @@
import deeppavlov.models.preprocessors.field_getter
import deeppavlov.models.preprocessors.sanitizer
import deeppavlov.models.preprocessors.lazy_tokenizer
import deeppavlov.models.slotfill.slotfill_raw
import deeppavlov.models.slotfill.slotfill
import deeppavlov.models.preprocessors.one_hotter
import deeppavlov.dataset_readers.ontonotes_reader

34 changes: 34 additions & 0 deletions deeppavlov/configs/go_bot/database_dstc2.json
@@ -0,0 +1,34 @@
{
"dataset_reader": {
"name": "dstc2_v2_reader",
"data_path": "dstc2_v2"
},
"dataset_iterator": {
"name": "dialog_db_result_iterator"
},
"chainer": {
"in": ["db_result"],
"in_y": [],
"out": [],
"pipe": [
{
"id": "restaurant_database",
"name": "sql_database",
"fit_on": ["db_result"],
"table_name": "mytable",
"primary_keys": ["name"],
"save_path": "dstc2_v2/resto.sqlite"
}
]
},
"train": {
},
"metadata": {
"download": [
{
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz",
"subdir": "dstc2_v2"
}
]
}
}
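
The database_dstc2.json config above fits a database component on the db_result field of the DSTC2 dialogs and saves it to dstc2_v2/resto.sqlite. A minimal stdlib-only sketch of the idea (build a table from the result dicts, then let the bot look records up by slot values when it issues an api_call) could look as follows; RestaurantDatabase and its methods are hypothetical and do not reproduce DeepPavlov's actual Sqlite3Database interface.

```python
# Illustrative sketch only; names and signatures are assumptions, not DeepPavlov's API.
import sqlite3
from typing import Dict, List


class RestaurantDatabase:
    def __init__(self, save_path: str, table_name: str = "mytable",
                 primary_key: str = "name"):
        self.conn = sqlite3.connect(save_path)
        self.table = table_name
        self.primary_key = primary_key

    def fit(self, db_results: List[Dict[str, str]]) -> None:
        """Create the table from the union of keys seen in db_result dicts and fill it."""
        columns = sorted({key for rec in db_results for key in rec})
        cols_sql = ", ".join(
            f'"{c}" TEXT PRIMARY KEY' if c == self.primary_key else f'"{c}" TEXT'
            for c in columns)
        self.conn.execute(f'CREATE TABLE IF NOT EXISTS {self.table} ({cols_sql})')
        placeholders = ", ".join("?" for _ in columns)
        for rec in db_results:
            self.conn.execute(
                f'INSERT OR REPLACE INTO {self.table} VALUES ({placeholders})',
                [rec.get(c) for c in columns])
        self.conn.commit()

    def __call__(self, slots: Dict[str, str]) -> List[Dict[str, str]]:
        """Return all records matching the given slot values (the bot's api_call)."""
        where = " AND ".join(f'"{k}" = ?' for k in slots) or "1"
        cursor = self.conn.execute(
            f'SELECT * FROM {self.table} WHERE {where}', list(slots.values()))
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]


db = RestaurantDatabase(":memory:")
db.fit([{"name": "sobino", "food": "russian", "pricerange": "cheap"},
        {"name": "dim sum house", "food": "chinese", "pricerange": "expensive"}])
print(db({"food": "russian"}))
# -> [{'food': 'russian', 'name': 'sobino', 'pricerange': 'cheap'}]
```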
43 changes: 23 additions & 20 deletions deeppavlov/configs/go_bot/gobot_dstc2.json
@@ -1,7 +1,7 @@
{
"dataset_reader": {
"name": "dstc2_reader",
"data_path": "dstc2"
"name": "dstc2_v2_reader",
"data_path": "dstc2_v2"
},
"dataset_iterator": {
"name": "dialog_iterator"
@@ -20,6 +20,13 @@
"save_path": "vocabs/token.dict",
"load_path": "vocabs/token.dict"
},
{
"id": "restaurant_database",
"name": "sqlite_database",
"table_name": "mytable",
"primary_keys": ["name"],
"save_path": "dstc2_v2/resto.sqlite"
},
{
"in": ["x"],
"in_y": ["y"],
@@ -28,29 +35,25 @@
"name": "go_bot",
"debug": false,
"word_vocab": "#token_vocab",
"template_path": "dstc2/dstc2-templates.txt",
"template_path": "dstc2_v2/dstc2-templates.txt",
"template_type": "DualTemplate",
"database": "#restaurant_database",
"api_call_action": "api_call",
"use_action_mask": false,
"db_result_during_interaction": {
"addr": "Sobina Square, 1/4",
"area": "north",
"food": "russian",
"phone": "+7(965)173-37-33",
"postcode": "141700",
"pricerange": "cheap"
},
"network_parameters": {
"load_path": "gobot_dstc2/model",
"save_path": "gobot_dstc2/model",
"learning_rate": 0.002,
"dropout_rate": 0.8,
"learning_rate": 0.004,
"dropout_rate": 0.85,
"l2_reg_coef": 7e-4,
"hidden_size": 128,
"dense_size": 64
"dense_size": 160
},
"slot_filler": {
"config_path": "../deeppavlov/configs/ner/slotfill_dstc2.json"
},
"intent_classifier": {
"config_path": "../deeppavlov/configs/intents/intents_dstc2.json"
"config_path": "../deeppavlov/configs/intents/intents_dstc2_big.json"
},
"embedder": null,
"bow_embedder": {
Expand All @@ -62,14 +65,14 @@
},
"tracker": {
"name": "featurized_tracker",
"slot_names": ["pricerange", "this", "area", "slot", "food", "name"]
"slot_names": ["pricerange", "this", "area", "food", "name"]
}
}
]
},
"train": {
"epochs": 200,
"batch_size": 2,
"batch_size": 4,

"metrics": ["per_item_dialog_accuracy"],
"validation_patience": 20,
@@ -86,10 +89,10 @@
},
"download": [
"http://lnsigo.mipt.ru/export/deeppavlov_data/vocabs.tar.gz",
"http://lnsigo.mipt.ru/export/deeppavlov_data/gobot_dstc2_v2.tar.gz",
"http://lnsigo.mipt.ru/export/deeppavlov_data/gobot_dstc2_v4.tar.gz",
{
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2.tar.gz",
"subdir": "dstc2"
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz",
"subdir": "dstc2_v2"
}
]
}
42 changes: 22 additions & 20 deletions deeppavlov/configs/go_bot/gobot_dstc2_all.json
@@ -1,7 +1,7 @@
{
"dataset_reader": {
"name": "dstc2_reader",
"data_path": "dstc2"
"name": "dstc2_v2_reader",
"data_path": "dstc2_v2"
},
"dataset_iterator": {
"name": "dialog_iterator"
@@ -20,6 +20,13 @@
"save_path": "vocabs/token.dict",
"load_path": "vocabs/token.dict"
},
{
"id": "restaurant_database",
"name": "sqlite_database",
"table_name": "mytable",
"primary_keys": ["name"],
"save_path": "dstc2_v2/resto.sqlite"
},
{
"in": ["x"],
"in_y": ["y"],
@@ -28,29 +35,25 @@
"name": "go_bot",
"debug": false,
"word_vocab": "#token_vocab",
"template_path": "dstc2/dstc2-templates.txt",
"template_path": "dstc2_v2/dstc2-templates.txt",
"template_type": "DualTemplate",
"database": "#restaurant_database",
"api_call_action": "api_call",
"use_action_mask": false,
"db_result_during_interaction": {
"addr": "Sobina Square, 1/4",
"area": "north",
"food": "russian",
"phone": "+7(965)173-37-33",
"postcode": "141700",
"pricerange": "cheap"
},
"network_parameters": {
"load_path": "gobot_dstc2_all/model",
"save_path": "gobot_dstc2_all/model",
"learning_rate": 0.0009,
"dropout_rate": 0.7,
"learning_rate": 0.006,
"dropout_rate": 0.35,
"l2_reg_coef": 5e-4,
"hidden_size": 128,
"dense_size": 64
},
"slot_filler": {
"config_path": "../deeppavlov/configs/ner/slotfill_dstc2.json"
},
"intent_classifier": {
"config_path": "../deeppavlov/configs/intents/intents_dstc2.json"
"config_path": "../deeppavlov/configs/intents/intents_dstc2_big.json"
},
"embedder": {
"name": "fasttext",
@@ -67,17 +70,17 @@
},
"tracker": {
"name": "featurized_tracker",
"slot_names": ["pricerange", "this", "area", "slot", "food", "name"]
"slot_names": ["pricerange", "this", "area", "food", "name"]
}
}
]
},
"train": {
"epochs": 200,
"batch_size": 1,
"batch_size": 4,

"metrics": ["per_item_dialog_accuracy"],
"validation_patience": 30,
"validation_patience": 20,
"val_every_n_epochs": 1,

"log_every_n_batches": -1,
@@ -90,10 +93,9 @@
"server_utils": "GoalOrientedBot"
},
"download": [
"http://lnsigo.mipt.ru/export/deeppavlov_data/vocabs.tar.gz",
{
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2.tar.gz",
"subdir": "dstc2"
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz",
"subdir": "dstc2_v2"
},
{
"url": "http://lnsigo.mipt.ru/export/deeppavlov_data/embeddings/wiki.en.bin",
42 changes: 24 additions & 18 deletions deeppavlov/configs/go_bot/gobot_dstc2_best.json
@@ -1,7 +1,7 @@
{
"dataset_reader": {
"name": "dstc2_reader",
"data_path": "dstc2"
"name": "dstc2_v2_reader",
"data_path": "dstc2_v2"
},
"dataset_iterator": {
"name": "dialog_iterator"
@@ -20,6 +20,13 @@
"save_path": "vocabs/token.dict",
"load_path": "vocabs/token.dict"
},
{
"id": "restaurant_database",
"name": "sqlite_database",
"table_name": "mytable",
"primary_keys": ["name"],
"save_path": "dstc2_v2/resto.sqlite"
},
{
"in": ["x"],
"in_y": ["y"],
@@ -28,21 +35,20 @@
"name": "go_bot",
"debug": false,
"word_vocab": "#token_vocab",
"template_path": "dstc2/dstc2-templates.txt",
"template_path": "dstc2_v2/dstc2-templates.txt",
"template_type": "DualTemplate",
"database": "#restaurant_database",
"api_call_action": "api_call",
"use_action_mask": false,
"db_result_during_interaction": {
"addr": "Sobina Square, 1/4",
"area": "north",
"food": "russian",
"phone": "+7(965)173-37-33",
"postcode": "141700",
"pricerange": "cheap"
},
"network_parameters": {
"load_path": "gobot_dstc2_best/model",
"save_path": "gobot_dstc2_best/model",
"learning_rate": 0.0008,
"dropout_rate": 0.85,
"learning_rate": 0.002,
"end_learning_rate": 0.00002,
"decay_steps": 10,
"decay_power": 0.5,
"dropout_rate": 0.45,
"l2_reg_coef": 2e-3,
"hidden_size": 128,
"dense_size": 64,
"attention_mechanism": {
@@ -71,14 +77,14 @@
},
"tracker": {
"name": "featurized_tracker",
"slot_names": ["pricerange", "this", "area", "slot", "food", "name"]
"slot_names": ["pricerange", "this", "area", "food", "name"]
}
}
]
},
"train": {
"epochs": 200,
"batch_size": 16,
"batch_size": 4,

"metrics": ["per_item_dialog_accuracy"],
"validation_patience": 30,
@@ -95,10 +101,10 @@
},
"download": [
"http://lnsigo.mipt.ru/export/deeppavlov_data/vocabs.tar.gz",
"http://lnsigo.mipt.ru/export/deeppavlov_data/gobot_dstc2_best_v1.tar.gz",
"http://lnsigo.mipt.ru/export/deeppavlov_data/gobot_dstc2_best_v2.tar.gz",
{
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2.tar.gz",
"subdir": "dstc2"
"url": "http://lnsigo.mipt.ru/export/datasets/dstc2_v2.tar.gz",
"subdir": "dstc2_v2"
},
{
"url": "http://lnsigo.mipt.ru/export/deeppavlov_data/embeddings/wiki.en.bin",
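
The network_parameters block above adds end_learning_rate, decay_steps and decay_power alongside learning_rate, matching the "add learning rate polynomial decay" item in the commit log. Assuming the standard polynomial-decay schedule (the same formula used by TensorFlow's tf.train.polynomial_decay), the effective rate would fall from 0.002 to 0.00002 roughly as in this sketch; the unit in which the trainer counts decay_steps (batches, epochs or validations) is an assumption here.

```python
# Sketch of polynomial learning-rate decay with the config values above.
# Mirrors the standard formula (as in tf.train.polynomial_decay); whether go_bot
# applies it per batch or per epoch is not shown in this diff.
def polynomial_decay(step: int,
                     learning_rate: float = 0.002,
                     end_learning_rate: float = 0.00002,
                     decay_steps: int = 10,
                     power: float = 0.5) -> float:
    step = min(step, decay_steps)           # clamp after the decay window
    fraction = 1.0 - step / decay_steps     # remaining part of the schedule
    return (learning_rate - end_learning_rate) * fraction ** power + end_learning_rate


for step in (0, 5, 10, 20):
    print(step, round(polynomial_decay(step), 6))
# 0 0.002 | 5 0.00142 | 10 2e-05 | 20 2e-05
```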
