release 0.0.3 (#150)

* feat: tests can be run from project root (#86) * refactor: instead of juggling global random states use instances of Random for datasets * test(): add test for interacting with custom queries After refactoring, it is possible to easily add list of query-response pairs for every model (config), which will be used to compare pretrained model output with expected output. Initial lists added for error_model and ner. Also URL for downloading pretrained ner_conll2003_model added IP-1344 #done * Update docs from master (#96) * fixed grammar and style * Update README.md * fix grammar & style * fix grammar & style * fix grammar&style in Intent classification README * doc: add supported platform notes * docs: correct paths to scripts and configs to be relative to repository root (#94) * docs: correct paths to scripts and configs to be relative to repository root fixes #93 * docs: set paths in basic examples to be relative to the project root * docs: run deep.py as a python module in examples * doc: add notes for python 3.5 * test(): change downloading to temp dir (#97) * feat: assert python version is 3.6 or higher * Rename dataset to dataset_iterator and other renames (#103) * refactor: rename 'dataset' to 'dataset_iterator' * refactor: rename dataset readers and iterators * refactor: classification iterator and reader * fix: dialog_iterator * test: fix downloading procedure (#108) * Feature/tf layers to core (#67) * feat: layers moved to core * feat: attention added * fix: highway/skip connections for different dimensionality of units are fixed * feat: NER now supports core layers * fix: minor docstrings fixes * feat: CuDNN GRU and LSTM added * feat: Bidirectional CuDNN GRU and LSTM added * feat: stacked bi-rnn refactored * fix: fixed arguments order in rnn * fix: remove duplicate mult_att * chore: merge with dev * fix: backward forward bug in cudnnrnn * refactor: use single fasttext module, clean dependencies * fix: add error when n_classes is zero * feat: add fastText 
model usage instead of fasttext * fix: emb_module default fastText * chore: embedding fixed in configs * chore: change new models names * feat: change intent embeddings in gobot configs * chore: fastText to fasttext, new model, change intents in gobot configs * chore: new url on new fasttext embeddings * fix: delete dowload all true * fix: add url of old embedding file * fix: delete comma * fix: delete old embedding file from urls * fix: delete pyfasttext from requirements, fasttext_embedder * fix: change pyfasttext embeddings from gobot * fix: delete from requirements * fix: delete gensim from fasttext_embedder * fix: simplify requirements * fix: fix dim in gobot_all config * refactor: remove redundant parameter 'emb_module' * feat: use wiki.en.bin embeddings in gobot_all * feat: check saved model params and fix lowercase for interact * fix: lowercase text while interact * feat: check saved model params * fix: rm extra configs * feat: add support for classification data in csv/json formats (#115) * feat: add support for csv/json classification datasets * feat: add tests for snips and samples * fix: gobot_all config fix * feat: add REST API for all models * Moved telegram_utils -> utils; Refactored telegram_ui.py * Moved telegram_utils -> utils: modified deeppavlov/deep.py * Fixed getting model name with get_main_component() in telegram_ui.py * chaner.py: minor fix in get_main_component() * Added riseapi launch mode * README.md: added riseapi mode reference * Updated README.MD and fixed requirements.txt * minor fixes in README.md * Fixes in utils/server * refactor: change endpoint names * feat: add SteamSpacyTokenizer * refactor: remove duplicating from script naming * refactor: outline detokenize() meth in utils, because it should be used by all tokenizers and doesn't depend on tokenize() * feat: add streaming spaCy tokenizer * refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy * refactor: rename tokenoizer scripts back * fix: wrong grammar * 
feat: include spacy_tokenizer import * feat: replace old SpacyTokenizer with new StreamSpacyTokenizer * feat: ability to manage lowercasing from class constructor, typing improvements * fix: update go-bot configs, so they would work with StreamSpacyTokenizer the same as with the old tokenizer * feat: add optional logging to the spacy tokenizer * docs: update docstrings * refactor: replace custom logger with deppavlov's, pep8 style update * refactor: uotline ngramize() cause it is independent from tokenizer classes * refactor: return original JSON formatting * fix: add **kwargs to __init__() * chore: update .gitignore * refactor: more stable and consistent code * feat: add TravisCI integration * build(): add TravisCI integration * build(): add TravisCI integration * feat: add ranking model * feat: add ranking model to deeppavlov * feat: add download of dataset and embedding_model * feat: adapt to new deeppavlov interfaces * refactor: use pathlib where available in the ranking model * feat: add saving and loading responses saving with np.save * feat: add saving and loading response embeddings saving with np.save, use response embeddings to calculate predictions in __call__ function * feat: add interact regime * feat: add interact_pred_num parameter * refactor: change parameter default value, change check if the file with embeddings model exists * fix: fix non-string keys in EmbeddingDict class * feat: add parameters dict for autotests * feat: add tests support * feat: add context embeddings vocabulary (it is used in interact regime to predict the most similar contexts) * chore: change shuffle parameter default value to True in batch_generator * refactor: change config to chainer representation * fix: bug fix in urls.py file * refactor: remove emb_vocab_file saving, move build_tok2int_vocab and make_ints funcs to InsuranceDict class, add set_embeddings and reset_embeddings funcs in RankingModel * feat: add initial documentation * refactor: remove idx2int vocabulary, 
add vocabularies saving * change config parameters default values, remove examples in tests * feat: add table in documentation * fix: fix bug in urls.py * refactor: remove paths from config * feat: add documentation * feat: add True in tests * feat: add documentation * refactor: move init/load in the load function. * refactor: change parameters in config * feat: add logging * feat: add more logging * feat: add documentation, change parameters values in config * fix: add genesis for ranking model * fix: requirements installation order that caused setup.py error * refactor: train script * feat: add documentation * feat: models parameters check for ner * feat: parameters check added to ner * feat: parameters check added to slotfill * chore: minor clean-up * fix: fix conll-2003 model file names and archive names * refactor: remove blank line * feat: allow to stop training after n batches (#127) * fix: many minor fixes * fix: fix mark_done data_path * refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder * fix: fix embedding matrix construction, change epochs num default parameter value * refactor: rename registered name and name of the class * refactor: rename files and classes * refactor: change dataset downlaod * feat: add insurance embeddings and datasets in urls.py * refactor: change batch data representation (#131) * feat: install tensorflow-gpu * feat: add SQUAD model * feat: add SQuAD dataset reader * feat: add dataset, preprocessing, config * feat: add VocabEmbedder for chars and tokens * feat&fix: add model realization * feat: add training support, answer postprocessing * fix: predicted answer extraction from context * fix: dropout mask * feat: true_answer is a list of answers now * merge with dev * docs: add some docstrings * refactor: renaming variables * docs: add README.md * feat: add support of multiple inputs and outputs in interact mode * docs: upd README.md * fix: bugs after merge with dev * fix: turn on 
training vocabs * fix: remove keep_prob multiplier for dropout mask * fix: add short contexts support * docs: upd README.md * feat: chainer returns batch of tuples instead of tuple of batches * docs: upd squad README.md * docs: upd squad README.md * feat: add link to pretrained SQuAD model * fix: SQuAD model url * feat: add embeddings downloading and upd config * feat: add variable scope for optimizer * refactor: do not override __init__ method for squad_iterator * fix: ensure that directory exists before saving SquadVocabEmbedder * style: upd names in config and docs * chore: remove main.py used for debugging * docs: upd README.md * fix: change batch_size to fix possible OOM * test: add possibility to interact with several input query * chore: add max_batches to squad config * docs: upd README.md * fix(ranking_network): wrap y as np.array * fix: fix training stop for pytest * style: add license header * fix: refactor training stop for pytest * test: specify pytest_max_batches * feat: use all pytest keys and not only max_batches (#134) * fix: remove result stringification * feat: add GPU_only and Slow marks for tests * feat: add SQuAD dataset reader * feat: add dataset, preprocessing, config * feat: add VocabEmbedder for chars and tokens * feat&fix: add model realization * feat: add training support, answer postprocessing * fix: predicted answer extraction from context * fix: dropout mask * feat: true_answer is a list of answers now * merge with dev * docs: add some docstrings * refactor: renaming variables * docs: add README.md * feat: add support of multiple inputs and outputs in interact mode * docs: upd README.md * fix: bugs after merge with dev * fix: turn on training vocabs * fix: remove keep_prob multiplier for dropout mask * fix: add short contexts support * docs: upd README.md * feat: chainer returns batch of tuples instead of tuple of batches * docs: upd squad README.md * docs: upd squad README.md * feat: add link to pretrained SQuAD model * fix: SQuAD 
model url * feat: add embeddings downloading and upd config * feat: add variable scope for optimizer * refactor: do not override __init__ method for squad_iterator * fix: ensure that directory exists before saving SquadVocabEmbedder * style: upd names in config and docs * chore: remove main.py used for debugging * docs: upd README.md * fix: change batch_size to fix possible OOM * test: add possibility to interact with several input query * chore: add max_batches to squad config * docs: upd README.md * fix(ranking_network): wrap y as np.array * fix: fix training stop for pytest * style: add license header * fix: refactor training stop for pytest * test: specify pytest_max_batches * test: add couple of marks for selecting tests * test: make Travis running only fast tests without GPU * fix: ranking config works in interactbot * fix: add downloading nltk punkt for tokenization (#140) * feat: bot start message for intents does not say anything about dstc2 (#142) * feat: interactbot command works with pipes that require multiple inputs (#137) * build: change TravisCI script (#143) * feat: add Glove embedder (#138) * feat: glove embedder added * feat: embeddings added to NER network * feat: dataset and embeddings are added to urls.py for downloading * fix: char embeddings added to pretrained embeddings * feat: embedder return list of embeddings instead zero padded np array * feat: capitalization added * feat: config modified according to new features * feat: double dense added to input parameters * feat:config parameters updated * chore: fix urls for conll NER, ontonotes model url added * feat: pytest_max_batches added for faster tran check * feat: ontonotes tests added * feat: test conll max batches added * Update README.md * feat: add seq2seq go bot * fix: lowercase text while interact * feat: check saved model params * fix: rm extra configs * feat: add kvret dataset_reader * feat: add kvret_dataset_iterator * fix: add configerror * fix: dirty fix for dialog data to be 
lowercased * feat: check np.int and int in Vocabulary * feat: seq2seqbot works for train and infer * feat: add bleu-metric * feat: add simple seq2seq_go_bot config * fix: fix inference and load() * feat: add variable scope for optimizer * feat: add support of multiple inputs and outputs in interact mode * fix: fix padding * feat: tokenizer argument in Vocabulary * feat: chainer returns batch of tuples instead of tuple of batches * fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument * feat: add per_item_bleu * feat: train seq2seq_go_bot on utterance batches * feat: tokenize y_true * feat: fit kb_entries knowledge base * feat: add split tokenizer * feat: standartize tokenizers output * feat: normalize kb entities * feat: db_columns, db_items in each sample * fix: go_bot configs (for new vocab) and loading of network * style: minor restyling * feat: add config for infer * feat: add config for infer * feat: add seq2seq_go_bot pretrained model * feat: update telegram start and help messages * style: minor styling * docs: add simple readme * doc: remove red ... 
blocks * doc: change Dataset to DatasetIterator * doc: update list of configs * doc: update package structure * doc: add notes about dataset element in config * feat: add squad model description to README.md * doc: add config specification for seq2seq_go_bot * fix: lowercase text while interact * feat: check saved model params * fix: rm extra configs * feat: add kvret dataset_reader * feat: add kvret_dataset_iterator * fix: add configerror * fix: dirty fix for dialog data to be lowercased * feat: check np.int and int in Vocabulary * feat: seq2seqbot works for train and infer * feat: add bleu-metric * feat: add simple seq2seq_go_bot config * fix: fix inference and load() * feat: add variable scope for optimizer * feat: add support of multiple inputs and outputs in interact mode * fix: fix padding * feat: tokenizer argument in Vocabulary * feat: chainer returns batch of tuples instead of tuple of batches * fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument * feat: add per_item_bleu * feat: train seq2seq_go_bot on utterance batches * feat: tokenize y_true * feat: fit kb_entries knowledge base * feat: add split tokenizer * feat: standartize tokenizers output * feat: normalize kb entities * feat: db_columns, db_items in each sample * fix: go_bot configs (for new vocab) and loading of network * style: minor restyling * feat: add config for infer * feat: add config for infer * feat: add seq2seq_go_bot pretrained model * feat: update telegram start and help messages * style: minor styling * docs: add simple readme * docs: add seq2seq_go_bot in main readme * docs: small fix * docs: add config specification for seq2seq_go_bot * chore: remove install.py (#151) * feat: add support for batches in go-bot * feat: batching v1 * feat: bow_encoder is optional * fix: probs calculation for use_action_mask=true * refactor: do not feed inital_state during train * feat: feed sequence lengths in dynamic_rnn * refactor: rename go_bot.py -> bot.py
deeppavlov · Mar 26, 2018 · d1bccd1
1 parent 5c52988
Showing 100 changed files with 5,332 additions and 816 deletions.
diff --git a/.gitignore b/.gitignore
@@ -111,6 +111,7 @@ download/
 
 #project test
 /test/
+.pytest_cache
 
 # project data
 /data/
diff --git a/.travis.yml b/.travis.yml
@@ -0,0 +1,17 @@
+language: python
+
+python:
+- '3.6'
+
+cache: pip
+
+git:
+  depth: false
+
+install:
+- pip3 install -r requirements-dev.txt
+- python3 setup.py develop
+- python3 -m spacy download en
+
+script:
+- pytest -v -m "not gpu_only and not slow"
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 # <center>DeepPavlov</center>
 
 ### *We are in a really early Alpha release. You should be ready for hard adventures.*
-### *If you have updated to version 0.0.2 - please re-download all pre-trained models*
+### *If you have updated to version 0.0.2 or greater - please re-download all pre-trained models*
 
 DeepPavlov is an open-source conversational AI library built on TensorFlow and Keras. It is designed for
  * development of production ready chat-bots and complex conversational systems
@@ -24,8 +24,11 @@ Our goal is to enable AI-application developers researchers with:
 | [Slot filling and NER components](deeppavlov/models/ner/README.md) | Based on neural Named Entity Recognition network and fuzzy Levenshtein search to extract normalized slot values from text. The NER component reproduces architecture from the paper [Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition](https://arxiv.org/pdf/1709.09686.pdf) which is inspired by Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. |
 | [Intent classification component](deeppavlov/models/classifiers/intents/README.md) | Based on shallow-and-wide Convolutional Neural Network architecture from [Kim Y. Convolutional neural networks for sentence classification – 2014](https://arxiv.org/pdf/1408.5882). The model allows multilabel classification of sentences. |
 | [Automatic spelling correction component](deeppavlov/models/spellers/error_model/README.md) | Based on [An Improved Error Model for Noisy Channel Spelling Correction by Eric Brill and Robert C. Moore](http://www.aclweb.org/anthology/P00-1037) and uses statistics based error model, a static dictionary and an ARPA language model to correct spelling errors. |
-| **Skill** |  |
-| [Goal-oriented bot](deeppavlov/skills/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It allows to predict responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can switched on and off on demand. |
+| [Ranking component](deeppavlov/models/ranking/README.md) |  Based on [LSTM-based deep learning models for non-factoid answer selection](https://arxiv.org/abs/1511.04108). The model performs ranking of responses or contexts from some database by their relevance for the given context. |
+| [Question Answering component](deeppavlov/models/squad/README.md) | Based on [R-NET: Machine Reading Comprehension with Self-matching Networks](https://www.microsoft.com/en-us/research/publication/mrc/). The model solves the task of finding an answer to a question in a given context ([SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) task format). |
+| **Skills** |  |
+| [Goal-oriented bot](deeppavlov/skills/go_bot/README.md) | Based on Hybrid Code Networks (HCNs) architecture from [Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning – 2017](https://arxiv.org/abs/1702.03274). It predicts responses in goal-oriented dialog. The model is customizable: embeddings, slot filler and intent classifier can be switched on and off on demand.  |
+| [Seq2seq goal-oriented bot](deeppavlov/skills/seq2seq_go_bot/README.md) | Dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (pretrained bot allows calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. |
 | **Embeddings** |  |
 | [Pre-trained embeddings for the Russian language](pretrained-vectors.md) | Word vectors for the Russian language trained on joint [Russian Wikipedia](https://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0) and [Lenta.ru](https://lenta.ru/) corpora. |
 
@@ -43,14 +46,22 @@ View video demo of deployment of a goal-oriented bot and a slot-filling model wi
  ```
  python -m deeppavlov.deep interact deeppavlov/configs/go_bot/gobot_dstc2.json
  ```
- * Run slot-filling model with Telegram interface:
+  * Run goal-oriented bot with REST API:
+ ```
+ python -m deeppavlov.deep riseapi deeppavlov/configs/go_bot/gobot_dstc2.json
+ ``` 
+  * Run slot-filling model with Telegram interface:
  ```
  python -m deeppavlov.deep interactbot deeppavlov/configs/ner/slotfill_dstc2.json -t <TELEGRAM_TOKEN>
  ```
  * Run slot-filling model with console interface:
  ```
  python -m deeppavlov.deep interact deeppavlov/configs/ner/slotfill_dstc2.json
  ```
+ * Run slot-filling model with REST API:
+ ```
+ python -m deeppavlov.deep riseapi deeppavlov/configs/ner/slotfill_dstc2.json
+ ```
 ## Conceptual overview
 
 ### Principles
@@ -91,8 +102,8 @@ DeepPavlov is built on top of machine learning frameworks [TensorFlow](https://w
     * [Train config](#train-config)
     * [Train parameters](#train-parameters)
     * [DatasetReader](#datasetreader)
-    * [Dataset](#dataset)
-    * [Inferring](#inferring)
+    * [DatasetIterator](#datasetiterator)
+    * [Inference](#inference)
  * [License](#license)
  * [Support and collaboration](#support-and-collaboration)
  * [The Team](#the-team)
@@ -138,21 +149,28 @@ Then you can interact with the models or train them with the following command:
 python -m deeppavlov.deep <mode> <path_to_config>
 ```
 
-* `<mode>` can be 'train', 'interact' or 'interactbot'
+* `<mode>` can be 'train', 'interact', 'interactbot' or 'riseapi'
 * `<path_to_config>` should be a path to an NLP pipeline json config
 
 For 'interactbot' mode you should specify Telegram bot token in `-t` parameter or in `TELEGRAM_TOKEN` environment variable.
 
+For 'riseapi' mode you should specify API settings (host, port, etc.) in the [*utils/server_utils/server_config.json*](utils/server_utils/server_config.json) configuration file. If provided, values from the *model_defaults* section override values for the same parameters from the *common_defaults* section. Model names in the *model_defaults* section should be similar to the class names of the model's main component.
 
 Available model configs are:
 
-*deeppavlov/configs/go_bot/gobot_dstc2.json*
+- ```deeppavlov/configs/go_bot/*.json```
+
+- ```deeppavlov/configs/seq2seq_go_bot/*.json```
 
-*deeppavlov/configs/intents/intents_dstc2.json*
+- ```deeppavlov/configs/squad/*.json```
 
-*deeppavlov/configs/ner/slotfill_dstc2.json*
+- ```deeppavlov/configs/intents/*.json```
 
-*deeppavlov/configs/error_model/brillmoore_wikitypos_en.json*
+- ```deeppavlov/configs/ner/*.json```
+
+- ```deeppavlov/configs/ranking/*.json```
+
+- ```deeppavlov/configs/error_model/*.json```
 
 ---
 
@@ -171,7 +189,11 @@ Available model configs are:
 </tr>
 <tr>
     <td><b> deeppavlov.core.data </b></td>
-    <td> basic <b><i>Dataset</i></b>, <b><i>DatasetReader</i></b> and <b><i>Vocab</i></b> classes </td>
+    <td> basic <b><i>DatasetIterator</i></b>, <b><i>DatasetReader</i></b> and <b><i>Vocab</i></b> classes </td>
+</tr>
+<tr>
+    <td><b> deeppavlov.core.layers </b></td>
+    <td> collection of commonly used <b><i>Layers</i></b> for TF models </td>
 </tr>
 <tr>
     <td><b> deeppavlov.core.models </b></td>
@@ -182,8 +204,12 @@ Available model configs are:
     <td> concrete <b><i>DatasetReader</i></b> classes </td>
 </tr>
 <tr>
-    <td><b> deeppavlov.datasets </b></td>
-    <td> concrete <b><i>Dataset</i></b> classes </td>
+    <td><b> deeppavlov.dataset_iterators </b></td>
+    <td> concrete <b><i>DatasetIterator</i></b> classes </td>
+</tr>
+<tr>
+    <td><b> deeppavlov.metrics </b></td>
+    <td> different <b><i>Metric</i></b> functions </td>
 </tr>
 <tr>
     <td><b> deeppavlov.models </b></td>
@@ -203,7 +229,7 @@ Available model configs are:
 
 An NLP pipeline config is a JSON file that contains one required element `chainer`:
 
-```json
+```
 {
   "chainer": {
     "in": ["x"],
@@ -288,15 +314,15 @@ An NNModel should have the `in_y` parameter which contains a list of ground trut
 ]
 ```
 
-The config for training the pipeline should have three additional elements: `dataset_reader`, `dataset` and `train`:
+The config for training the pipeline should have three additional elements: `dataset_reader`, `dataset_iterator` and `train`:
 
-```json
+```
 {
   "dataset_reader": {
     "name": ...,
     ...
   }
-  "dataset": {
+  "dataset_iterator": {
     "name": ...,
     ...
   },
@@ -309,6 +335,10 @@ The config for training the pipeline should have three additional elements: `dat
 }
 ```
 
+A simplified version of the training pipeline contains two elements: `dataset` and `train`. The `dataset` element currently 
+can be used to train on classification data in `csv` and `json` formats. You can find complete examples of how to use the simplified training pipeline in the [intents_sample_csv.json](deeppavlov/configs/intents/intents_sample_csv.json) and [intents_sample_json.json](deeppavlov/configs/intents/intents_sample_json.json) config files.
+
+
 ### Train Parameters
 * `epochs` — maximum number of epochs to train NNModel, defaults to `-1` (infinite)
 * `batch_size`,
@@ -334,12 +364,12 @@ from deeppavlov.core.data.dataset_reader import DatasetReader
 class DSTC2DatasetReader(DatasetReader):
 ```
 
-### Dataset
+### DatasetIterator
 
-`Dataset` forms the sets of data ('train', 'valid', 'test') needed for training/inference and divides it into batches.
-A concrete `Dataset` class should be registered and can be inherited from
-`deeppavlov.data.dataset_reader.Dataset` class. `deeppavlov.data.dataset_reader.Dataset`
-is not an abstract class and can be used as a `Dataset` as well.
+`DatasetIterator` forms the sets of data ('train', 'valid', 'test') needed for training/inference and divides them into batches.
+A concrete `DatasetIterator` class should be registered and can inherit from the
+`deeppavlov.core.data.dataset_iterator.BasicDatasetIterator` class. `deeppavlov.core.data.dataset_iterator.BasicDatasetIterator`
+is not an abstract class and can be used as a `DatasetIterator` as well.
 
 ### Inference
 
@@ -359,7 +389,7 @@ If you have any questions, bug reports or feature requests, please feel free to
 
 ## The Team
 
-DeepPavlov is built and maintained by [Neural Networks and Deep Learning Lab](https://mipt.ru/english/research/labs/neural-networks-and-deep-learning-lab) at [MIPT](https://mipt.ru/english/).
+DeepPavlov is built and maintained by the [Neural Networks and Deep Learning Lab](https://mipt.ru/english/research/labs/neural-networks-and-deep-learning-lab) at [MIPT](https://mipt.ru/english/) within the [iPavlov](http://ipavlov.ai/) project (part of the [National Technology Initiative](https://asi.ru/eng/nti/)) and in partnership with [Sberbank](http://www.sberbank.com/).
 
 <p align="center">
 <img src="http://ipavlov.ai/img/ipavlov_footer.png" width="50%" height="50%"/>
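
Note on the `riseapi` paragraph added in the README hunks above: the override rule it describes (values from *model_defaults* take precedence over values for the same parameters in *common_defaults*) can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The nested layout of the config, the `resolve_server_params` helper, and the sample keys and model name are hypothetical and are not taken from the repository's `server_config.json` or its server code.

```python
from typing import Any, Dict


def resolve_server_params(config: Dict[str, Any], model_name: str) -> Dict[str, Any]:
    """Merge common defaults with per-model overrides (hypothetical helper)."""
    # Start from a copy of the shared settings so the input is not mutated.
    params = dict(config.get("common_defaults", {}))
    # Keys present in the model's own section replace the shared values.
    params.update(config.get("model_defaults", {}).get(model_name, {}))
    return params


# Hypothetical config: keys, values and the model name are made up for this sketch.
example_config = {
    "common_defaults": {"host": "0.0.0.0", "port": 5000},
    "model_defaults": {"SomeModel": {"port": 5001}},
}

print(resolve_server_params(example_config, "SomeModel"))
print(resolve_server_params(example_config, "OtherModel"))
```

A model without its own section simply falls back to the shared values, which matches the "if provided" wording in the README text.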

diff --git a/deeppavlov/__init__.py b/deeppavlov/__init__.py
@@ -1,34 +1,56 @@
+# check version
+import sys
+assert sys.hexversion >= 0x3060000, 'Does not work in python3.5 or lower'
+
+
 import deeppavlov.core.models.keras_model
-import deeppavlov.core.data.dataset
+import deeppavlov.core.data.dataset_iterator
 import deeppavlov.core.data.vocab
-import deeppavlov.dataset_readers.babi_dataset_reader
-import deeppavlov.dataset_readers.dstc2_dataset_reader
-import deeppavlov.dataset_readers.basic_ner_dataset_reader
-import deeppavlov.dataset_readers.typos
-import deeppavlov.dataset_readers.classification_dataset_reader
-import deeppavlov.datasets.dialog_dataset
-import deeppavlov.datasets.dstc2_datasets
-import deeppavlov.datasets.hcn_dataset
-import deeppavlov.datasets.intent_dataset
-import deeppavlov.datasets.typos_dataset
-import deeppavlov.datasets.classification_dataset
+import deeppavlov.dataset_readers.babi_reader
+import deeppavlov.dataset_readers.dstc2_reader
+import deeppavlov.dataset_readers.kvret_reader
+import deeppavlov.dataset_readers.conll2003_reader
+import deeppavlov.dataset_readers.typos_reader
+import deeppavlov.dataset_readers.basic_classification_reader
+import deeppavlov.dataset_readers.squad_dataset_reader
+import deeppavlov.dataset_iterators.dialog_iterator
+import deeppavlov.dataset_iterators.kvret_dialog_iterator
+import deeppavlov.dataset_iterators.dstc2_ner_iterator
+import deeppavlov.dataset_iterators.dstc2_intents_iterator
+import deeppavlov.dataset_iterators.typos_iterator
+import deeppavlov.dataset_iterators.basic_classification_iterator
+import deeppavlov.dataset_iterators.squad_iterator
 import deeppavlov.models.classifiers.intents.intent_model
 import deeppavlov.models.commutators.random_commutator
 import deeppavlov.models.embedders.fasttext_embedder
 import deeppavlov.models.embedders.dict_embedder
+import deeppavlov.models.embedders.glove_embedder
 import deeppavlov.models.encoders.bow
 import deeppavlov.models.ner.slotfill
 import deeppavlov.models.spellers.error_model.error_model
 import deeppavlov.models.trackers.hcn_at
 import deeppavlov.models.trackers.hcn_et
 import deeppavlov.models.preprocessors.str_lower
+import deeppavlov.models.preprocessors.squad_preprocessor
 import deeppavlov.models.ner.ner
-import deeppavlov.skills.go_bot.go_bot
+import deeppavlov.models.tokenizers.spacy_tokenizer
+import deeppavlov.models.tokenizers.split_tokenizer
+import deeppavlov.models.squad.squad
+import deeppavlov.skills.go_bot.bot
 import deeppavlov.skills.go_bot.network
 import deeppavlov.skills.go_bot.tracker
+import deeppavlov.skills.seq2seq_go_bot.bot
+import deeppavlov.skills.seq2seq_go_bot.network
+import deeppavlov.skills.seq2seq_go_bot.kb
 import deeppavlov.vocabs.typos
+import deeppavlov.dataset_readers.insurance_reader
+import deeppavlov.dataset_iterators.ranking_iterator
+import deeppavlov.models.ranking.ranking_model
+import deeppavlov.models.ranking.metrics
 
 import deeppavlov.metrics.accuracy
 import deeppavlov.metrics.fmeasure
+import deeppavlov.metrics.bleu
+import deeppavlov.metrics.squad_metrics
 
 import deeppavlov.core.common.log
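
Note on the version assertion added at the top of `deeppavlov/__init__.py`: `sys.hexversion` packs the interpreter version into one integer (major, minor and micro in the three highest bytes, release level and serial in the lowest byte), so `0x3060000` corresponds to Python 3.6.0. The sketch below illustrates that comparison. The helper names `hexversion_of` and `is_supported` are hypothetical and do not exist in the package.

```python
import sys

MIN_HEXVERSION = 0x03060000  # same integer as 0x3060000 in the diff above


def hexversion_of(major: int, minor: int, micro: int = 0) -> int:
    """Build the major/minor/micro part of a sys.hexversion-style integer."""
    return (major << 24) | (minor << 16) | (micro << 8)


def is_supported(hexversion: int = sys.hexversion) -> bool:
    """Return True when the given version integer is Python 3.6 or newer."""
    return hexversion >= MIN_HEXVERSION


print(is_supported(hexversion_of(3, 5, 4)))  # a 3.5.x interpreter is rejected
print(is_supported(hexversion_of(3, 6, 0)))  # 3.6.0 is accepted
print(is_supported())                        # the interpreter running this sketch
```

For the running interpreter, the same condition can also be written as `sys.version_info >= (3, 6)`, which some readers find easier to scan.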
diff --git a/deeppavlov/configs/error_model/brillmoore_kartaslov_ru.json b/deeppavlov/configs/error_model/brillmoore_kartaslov_ru.json
@@ -2,8 +2,8 @@
   "dataset_reader": {
     "name": "typos_kartaslov_reader"
   },
-  "dataset": {
-    "name": "typos_dataset",
+  "dataset_iterator": {
+    "name": "typos_iterator",
     "test_ratio": 0.02
   },
   "chainer":{

diff --git a/deeppavlov/configs/error_model/brillmoore_kartaslov_ru_custom_vocab.json b/deeppavlov/configs/error_model/brillmoore_kartaslov_ru_custom_vocab.json
@@ -2,8 +2,8 @@
   "dataset_reader": {
     "name": "typos_kartaslov_reader"
   },
-  "dataset": {
-    "name": "typos_dataset",
+  "dataset_iterator": {
+    "name": "typos_iterator",
     "test_ratio": 0.02
   },
   "chainer":{

diff --git a/deeppavlov/configs/error_model/brillmoore_wikitypos_en.json b/deeppavlov/configs/error_model/brillmoore_wikitypos_en.json
@@ -2,8 +2,8 @@
   "dataset_reader": {
     "name": "typos_wikipedia_reader"
   },
-  "dataset": {
-    "name": "typos_dataset",
+  "dataset_iterator": {
+    "name": "typos_iterator",
     "test_ratio": 0.05
   },
   "chainer":{

diff --git a/deeppavlov/configs/go_bot/gobot_dstc2.json b/deeppavlov/configs/go_bot/gobot_dstc2.json
@@ -1,10 +1,10 @@
 {
   "dataset_reader": {
-    "name": "dstc2_datasetreader",
+    "name": "dstc2_reader",
     "data_path": "dstc2"
   },
-  "dataset": {
-    "name": "dialog_dataset"
+  "dataset_iterator": {
+    "name": "dialog_iterator"
   },
   "chainer": {
     "in": ["x"],
@@ -16,7 +16,7 @@
         "fit_on": ["x"],
         "name": "default_vocab",
         "level": "token",
-        "tokenize": true,
+        "tokenizer": { "name": "split_tokenizer" },
         "save_path": "vocabs/token.dict",
         "load_path": "vocabs/token.dict"
       },
@@ -64,7 +64,7 @@
           "save_path": "go_bot/model",
           "learning_rate": 0.002,
           "dropout_rate": 0.8,
-          "hidden_dim": 128,
+          "hidden_size": 128,
           "dense_size": 64,
           "obs_size": 530,
           "action_size": 45
@@ -89,8 +89,8 @@
         },
         "intent_classifier": {
           "name": "intent_model",
-          "save_path": "intents/intent_cnn",
-          "load_path": "intents/intent_cnn",
+          "save_path": "intents/intent_cnn_v2",
+          "load_path": "intents/intent_cnn_v2",
           "classes": "#classes_vocab.keys()",
           "opt": {
             "train_now": true,
@@ -123,8 +123,8 @@
           },
           "embedder": {
             "name": "fasttext",
-            "save_path": "embeddings/dstc2_fasttext_model_100.bin",
-            "load_path": "embeddings/dstc2_fasttext_model_100.bin",
+            "save_path": "embeddings/dstc2_fastText_model.bin",
+            "load_path": "embeddings/dstc2_fastText_model.bin",
             "emb_module": "fasttext",
             "dim": 100
           },
@@ -138,7 +138,8 @@
           "name": "bow"
         },
         "tokenizer": {
-          "name": "spacy_tokenizer"
+          "name": "stream_spacy_tokenizer",
+          "lowercase": false
         },
         "tracker": {
           "name": "featurized_tracker",
@@ -156,7 +157,7 @@
   },
   "train": {
     "epochs": 200,
-    "batch_size": 1,
+    "batch_size": 2,
 
     "metrics": ["per_item_dialog_accuracy"],
     "validation_patience": 20,
@@ -167,4 +168,3 @@
     "show_examples": false
   }
 }
-
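
Note on the config diffs above: they all apply the same top-level rename (`dataset` becomes `dataset_iterator`) plus a few registered-name changes. For configs written before this commit, that rename could be sketched as below. The `migrate_config` helper is hypothetical and not part of the repository, and the name table lists only the renames visible in these diffs. Nested changes such as `"tokenize": true` becoming a `"tokenizer"` object are not handled. Because the README now also documents a simplified pipeline that keeps a `dataset` element, a helper like this should only be run on configs that predate the rename.

```python
import json
from typing import Any, Dict

# Registered-name changes visible in the config diffs of this commit;
# any other names would have to be looked up in the repository.
NAME_RENAMES = {
    "typos_dataset": "typos_iterator",
    "dialog_dataset": "dialog_iterator",
    "dstc2_datasetreader": "dstc2_reader",
}


def migrate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an old-style config with the renamed key and names."""
    migrated: Dict[str, Any] = {}
    for key, value in config.items():
        new_key = "dataset_iterator" if key == "dataset" else key
        if new_key in ("dataset_reader", "dataset_iterator") and isinstance(value, dict):
            value = dict(value)  # copy so the input config stays untouched
            if "name" in value:
                value["name"] = NAME_RENAMES.get(value["name"], value["name"])
        migrated[new_key] = value
    return migrated


# Old-style fragment modelled on the removed lines of gobot_dstc2.json.
old_config = {
    "dataset_reader": {"name": "dstc2_datasetreader", "data_path": "dstc2"},
    "dataset": {"name": "dialog_dataset"},
    "chainer": {"in": ["x"]},
}

print(json.dumps(migrate_config(old_config), indent=2))
```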