Downloading requirements #93

Closed
ohld opened this issue Mar 12, 2018 · 0 comments · Fixed by #94

ohld commented Mar 12, 2018

I was trying to install deeppavlov and had a problem following the installation steps.

  1. There is no download.py file in the root folder; it is at deeppavlov/download.py. The documented command is:
     python download.py [-all]
  2. Even when I run that file, it outputs the error:
(env) root@mysexyhost:~/work/ipavlov/DeepPavlov# python3 deeppavlov/download.py
/home/ubuntu/work/ipavlov/env/local/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-03-12 07:34:11.490 ERROR in 'deeppavlov.core.models.serializable'['log'] at line 54: LOGGER ERROR: Can not initialise deeppavlov.core.models.serializable logger, logging to the stderr. Error traceback:
Traceback (most recent call last):
  File "/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/core/common/log.py", line 32, in get_logger
    with open(log_config_path) as log_config_json:
TypeError: invalid file: PosixPath('/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/log_config.json')
2018-03-12 07:34:11.491 ERROR in 'deeppavlov.core.models.keras_model'['log'] at line 54: LOGGER ERROR: Can not initialise deeppavlov.core.models.keras_model logger, logging to the stderr. Error traceback:
Traceback (most recent call last):
  File "/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/core/common/log.py", line 32, in get_logger
    with open(log_config_path) as log_config_json:
TypeError: invalid file: PosixPath('/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/log_config.json')
Traceback (most recent call last):
  File "deeppavlov/download.py", line 24, in <module>
    from deeppavlov.core.data.utils import download, download_decompress
  File "/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/__init__.py", line 1, in <module>
    import deeppavlov.core.models.keras_model
  File "/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/core/models/keras_model.py", line 39, in <module>
    class KerasModel(NNModel, metaclass=TfModelMeta):
  File "/home/ubuntu/work/ipavlov/DeepPavlov/deeppavlov/core/models/keras_model.py", line 143, in KerasModel
    sample_weight_mode=None, weighted_metrics=None, target_tensors=None):
  File "/home/ubuntu/work/ipavlov/env/local/lib/python3.5/site-packages/overrides/overrides.py", line 70, in overrides
    method.__name__)
AssertionError: No super class method found for "load"
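
A note on the first error in the log above: the interpreter is Python 3.5 (see the `python3.5/site-packages` paths), and the built-in `open()` only began accepting `pathlib.Path` objects in Python 3.6. That is a likely cause of `TypeError: invalid file: PosixPath(...)`. The sketch below shows one way to sidestep it, plus a simple version check. The names `load_json_config` and `python_is_supported` are hypothetical and are not taken from DeepPavlov. The later `AssertionError` about `load` is a separate problem that this sketch does not address.

```python
import json
import sys
import tempfile
from pathlib import Path


def load_json_config(path):
    """Read a JSON file given either a str or a pathlib.Path (hypothetical helper)."""
    # str() keeps the call working on Python 3.5, where open() rejects Path objects.
    with open(str(path)) as f:
        return json.load(f)


def python_is_supported(version_info=sys.version_info, minimum=(3, 6)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Write a tiny stand-in config to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "log_config.json"
        config_path.write_text('{"version": 1}')
        print(load_json_config(config_path))

    # The interpreter version from the traceback would be flagged as unsupported.
    print(python_is_supported((3, 5, 2)))
```

Converting the path with `str()` is the smallest change that works on both 3.5 and later versions; requiring Python 3.6 or higher, as one of the commits referenced below does, avoids the problem entirely.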
seliverstov pushed a commit that referenced this issue Mar 12, 2018
…ry root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples
seliverstov added a commit that referenced this issue Mar 12, 2018
* fixed grammar and style

* Update README.md

* fix grammar & style

* fix grammar & style

* fix grammar&style in Intent classification README

* doc: add supported platform notes

* docs: correct paths to scripts and configs to be relative to repository root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples

* doc: add notes for python 3.5
seliverstov added a commit that referenced this issue Mar 26, 2018
* feat: tests can be run from project root (#86)

* refactor: instead of juggling global random states use instances of Random for datasets

* test(): add test for interacting with custom queries

After refactoring, it is possible to easily add list of query-response
pairs for every model (config), which will be used to compare pretrained
model output with expected output. Initial lists added for error_model
and ner. Also URL for downloading pretrained ner_conll2003_model added
IP-1344 #done

* Update docs from master (#96)

* fixed grammar and style

* Update README.md

* fix grammar & style

* fix grammar & style

* fix grammar&style in Intent classification README

* doc: add supported platform notes

* docs: correct paths to scripts and configs to be relative to repository root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples

* doc: add notes for python 3.5

* test(): change downloading to temp dir (#97)

* feat: assert python version is 3.6 or higher

* Rename dataset to dataset_iterator and other renames (#103)

* refactor: rename 'dataset' to 'dataset_iterator'

* refactor: rename dataset readers and iterators

* refactor: classification iterator and reader

* fix: dialog_iterator

* test: fix downloading procedure (#108)

* Feature/tf layers to core (#67)

* feat: layers moved to core

* feat: attention added

* fix: highway/skip connections for different dimensionality of units are fixed

* feat: NER now supports core layers

* fix: minor docstrings fixes

* feat: CuDNN GRU and LSTM added

* feat: Bidirectional CuDNN GRU and LSTM added

* feat: stacked bi-rnn refactored

* fix: fixed arguments order in rnn

* fix: remove duplicate mult_att

* chore: merge with dev

* fix: backward forward bug in cudnnrnn

* refactor: use single fasttext module, clean dependencies

* fix: add error when n_classes is zero

* feat: add fastText model usage instead of fasttext

* fix: emb_module default fastText

* chore: embedding fixed in configs

* chore: change new models names

* feat: change intent embeddings in gobot configs

* chore: fastText to fasttext, new model, change intents in gobot configs

* chore: new url on new fasttext embeddings

* fix: delete download all true

* fix: add url of old embedding file

* fix: delete comma

* fix: delete old embedding file from urls

* fix: delete pyfasttext from requirements, fasttext_embedder

* fix: change pyfasttext embeddings from gobot

* fix: delete from requirements

* fix: delete gensim from fasttext_embedder

* fix: simplify requirements

* fix: fix dim in gobot_all config

* refactor: remove redundant parameter 'emb_module'

* feat: use wiki.en.bin embeddings in gobot_all

* feat: check saved model params and fix lowercase for interact

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add support for classification data in csv/json formats (#115)

* feat: add support for csv/json classification datasets

* feat: add tests for snips and samples

* fix: gobot_all config fix

* feat: add REST API for all models

* Moved telegram_utils -> utils; Refactored telegram_ui.py

* Moved telegram_utils -> utils: modified deeppavlov/deep.py

* Fixed getting model name with get_main_component() in telegram_ui.py

* chainer.py: minor fix in get_main_component()

* Added riseapi launch mode

* README.md: added riseapi mode reference

* Updated README.MD and fixed requirements.txt

* minor fixes in README.md

* Fixes in utils/server

* refactor: change endpoint names

* feat: add StreamSpacyTokenizer

* refactor: remove duplicating from script naming

* refactor: outline detokenize() meth in utils, because it should be used by all tokenizers and doesn't depend on tokenize()

* feat: add streaming spaCy tokenizer

* refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy

* refactor: rename tokenizer scripts back

* fix: wrong grammar

* feat: include spacy_tokenizer import

* feat: replace old SpacyTokenizer with new StreamSpacyTokenizer

* feat: ability to manage lowercasing from class constructor, typing improvements

* fix: update go-bot configs, so they would work with StreamSpacyTokenizer the same as with the old tokenizer

* feat: add optional logging to the spacy tokenizer

* docs: update docstrings

* refactor: replace custom logger with deeppavlov's, pep8 style update

* refactor: outline ngramize() because it is independent from tokenizer classes

* refactor: return original JSON formatting

* fix: add **kwargs to __init__()

* chore: update .gitignore

* refactor: more stable and consistent code

* feat: add TravisCI integration

* build(): add TravisCI integration

* build(): add TravisCI integration

* feat: add ranking model

* feat: add ranking model to deeppavlov

* feat: add download of dataset and embedding_model

* feat: adapt to new deeppavlov interfaces

* refactor: use pathlib where available in the ranking model

* feat: add saving and loading responses saving with np.save

* feat: add saving and loading response embeddings saving with np.save,
use response embeddings to calculate predictions in  __call__ function

* feat: add interact regime

* feat: add interact_pred_num parameter

* refactor: change parameter default value, change check if the file with
embeddings model exists

* fix: fix non-string keys in EmbeddingDict class

* feat: add parameters dict for autotests

* feat: add tests support

* feat: add context embeddings vocabulary (it is used in interact regime
to predict the most similar contexts)

* chore: change shuffle parameter default value to True in batch_generator

* refactor: change config to chainer representation

* fix: bug fix in urls.py file

* refactor: remove emb_vocab_file saving, move build_tok2int_vocab and
make_ints funcs to InsuranceDict class, add set_embeddings and
reset_embeddings funcs in RankingModel

* feat: add initial documentation

* refactor: remove idx2int vocabulary, add vocabularies saving

* change config parameters default values, remove examples in tests

* feat: add table in documentation

* fix: fix bug in urls.py

* refactor: remove paths from config

* feat: add documentation

* feat: add True in tests

* feat: add documentation

* refactor: move init/load in the load function.

* refactor: change parameters in config

* feat: add logging

* feat: add more logging

* feat: add documentation, change parameters values in config

* fix: add genesis for ranking model

* fix: requirements installation order that caused setup.py error

* refactor: train script

* feat: add documentation

* feat: models parameters check for ner

* feat: parameters check added to ner

* feat: parameters check added to slotfill

* chore: minor clean-up

* fix: fix conll-2003 model file names and archive names

* refactor: remove blank line

* feat: allow to stop training after n batches (#127)

* fix: many minor fixes

* fix: fix mark_done data_path

* refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder

* fix: fix embedding matrix construction, change epochs num
default parameter value

* refactor: rename registered name and name of the class

* refactor: rename files and classes

* refactor: change dataset download

* feat: add insurance embeddings and datasets in urls.py

* refactor: change batch data representation (#131)

* feat: install tensorflow-gpu

* feat: add SQUAD model

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model realization

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input query

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* feat: use all pytest keys and not only max_batches (#134)

* fix: remove result stringification

* feat: add GPU_only and Slow marks for tests

* test: add couple of marks for selecting tests

* test: make Travis running only fast tests without GPU

* fix: ranking config works in interactbot

* fix: add downloading nltk punkt for tokenization (#140)

* feat: bot start message for intents does not say anything about dstc2 (#142)

* feat: interactbot command works with pipes that require multiple inputs (#137)

* build: change TravisCI script (#143)

* feat: add Glove embedder (#138)

* feat: glove embedder added

* feat: embeddings added to NER network

* feat: dataset and embeddings are added to urls.py for downloading

* fix: char embeddings added to pretrained embeddings

* feat: embedder return list of embeddings instead zero padded np array

* feat: capitalization added

* feat: config modified according to new features

* feat: double dense added to input parameters

* feat:config parameters updated

* chore: fix urls for conll NER, ontonotes model url added

* feat: pytest_max_batches added for faster train check

* feat: ontonotes tests added

* feat: test conll max batches added

* Update README.md

* feat: add seq2seq go bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add configerror

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* doc: remove red ... blocks

* doc: change Dataset to DatasetIterator

* doc: update list of configs

* doc: update package structure

* doc: add notes about dataset element in config

* feat: add squad model description to README.md

* doc: add config specification for seq2seq_go_bot

* docs: add seq2seq_go_bot in main readme

* docs: small fix

* docs: add config specification for seq2seq_go_bot

* chore: remove install.py (#151)

* feat: add support for batches in go-bot

* feat: batching v1

* feat: bow_encoder is optional

* fix: probs calculation for use_action_mask=true

* refactor: do not feed inital_state during train

* feat: feed sequence lengths in dynamic_rnn

* refactor: rename go_bot.py -> bot.py
seliverstov pushed a commit that referenced this issue Apr 3, 2018
* release 0.0.3 (#150)

* feat: tests can be run from project root (#86)

* refactor: instead of juggling global random states use instances of Random for datasets

* test(): add test for interacting with custom queries

After refactoring, it is possible to easily add list of query-response
pairs for every model (config), which will be used to compare pretrained
model output with expected output. Initial lists added for error_model
and ner. Also URL for downloading pretrained ner_conll2003_model added
IP-1344 #done

* Update docs from master (#96)

* fixed grammar and style

* Update README.md

* fix grammar & style

* fix grammar & style

* fix grammar&style in Intent classification README

* doc: add supported platform notes

* docs: correct paths to scripts and configs to be relative to repository root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples

* doc: add notes for python 3.5

* test(): change downloading to temp dir (#97)

* feat: assert python version is 3.6 or higher

* Rename dataset to dataset_iterator and other renames (#103)

* refactor: rename 'dataset' to 'dataset_iterator'

* refactor: rename dataset readers and iterators

* refactor: classification iterator and reader

* fix: dialog_iterator

* test: fix downloading procedure (#108)

* Feature/tf layers to core (#67)

* feat: layers moved to core

* feat: attention added

* fix: highway/skip connections for different dimensionality of units are fixed

* feat: NER now supports core layers

* fix: minor docstrings fixes

* feat: CuDNN GRU and LSTM added

* feat: Bidirectional CuDNN GRU and LSTM added

* feat: stacked bi-rnn refactored

* fix: fixed arguments order in rnn

* fix: remove duplicate mult_att

* chore: merge with dev

* fix: backward forward bug in cudnnrnn

* refactor: use single fasttext module, clean dependencies

* fix: add error when n_classes is zero

* feat: add fastText model usage instead of fasttext

* fix: emb_module default fastText

* chore: embedding fixed in configs

* chore: change new models names

* feat: change intent embeddings in gobot configs

* chore: fastText to fasttext, new model, change intents in gobot configs

* chore: new url on new fasttext embeddings

* fix: delete dowload all true

* fix: add url of old embedding file

* fix: delete comma

* fix: delete old embedding file from urls

* fix: delete pyfasttext from requirements, fasttext_embedder

* fix: change pyfasttext embeddings from gobot

* fix: delete from requirements

* fix: delete gensim from fasttext_embedder

* fix: simplify requirements

* fix: fix dim in gobot_all config

* refactor: remove redundant parameter 'emb_module'

* feat: use wiki.en.bin embeddings in gobot_all

* feat: check saved model params and fix lowercase for interact

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add support for classification data in csv/json formats (#115)

* feat: add support for csv/json classification datasets

* feat: add tests for snips and samples

* fix: gobot_all config fix

* feat: add REST API for all models

* Moved telegram_utils -> utils; Refactored telegram_ui.py

* Moved telegram_utils -> utils: modified deeppavlov/deep.py

* Fixed getting model name with get_main_component() in telegram_ui.py

* chaner.py: minor fix in get_main_component()

* Added riseapi launch mode

* README.md: added riseapi mode reference

* Updated README.MD and fixed requirements.txt

* minor fixes in README.md

* Fixes in utils/server

* refactor: change endpoint names

* feat: add SteamSpacyTokenizer

* refactor: remove duplicating from script naming

* refactor: outline detokenize() meth in utils, because it should be used by all tokenizers and doesn't depend on tokenize()

* feat: add streaming spaCy tokenizer

* refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy

* refactor: rename tokenoizer scripts back

* fix: wrong grammar

* feat: include spacy_tokenizer import

* feat: replace old SpacyTokenizer with new StreamSpacyTokenizer

* feat: ability to manage lowercasing from class constructor, typing improvements

* fix: update go-bot configs, so they would work with StreamSpacyTokenizer the same as with the old tokenizer

* feat: add optional logging to the spacy tokenizer

* docs: update docstrings

* refactor: replace custom logger with deppavlov's, pep8 style update

* refactor: uotline ngramize() cause it is independent from tokenizer classes

* refactor: return original JSON formatting

* fix: add **kwargs to __init__()

* chore: update .gitignore

* refactor: more stable and consistent code

* feat: add TravisCI integration

* build(): add TravisCI integration

* build(): add TravisCI integration

* feat: add ranking model

* feat: add ranking model to deeppavlov

* feat: add download of dataset and embedding_model

* feat: adapt to new deeppavlov interfaces

* refactor: use pathlib where available in the ranking model

* feat: add saving and loading responses saving with np.save

* feat: add saving and loading response embeddings saving with np.save,
use response embeddings to calculate predictions in  __call__ function

* feat: add interact regime

* feat: add interact_pred_num parameter

* refactor: change parameter default value, change check if the file with
embeddings model exists

* fix: fix non-string keys in EmbeddingDict class

* feat: add parameters dict for autotests

* feat: add tests support

* feat: add context embeddings vocabulary (it is used in interact regime
to predict the most similar contexts)

* chore: change shuffle parameter default value to True in batch_generator

* refactor: change config to chainer representation

* fix: bug fix in urls.py file

* refactor: remove emb_vocab_file saving, move build_tok2int_vocab and
make_ints funcs to InsuranceDict class, add set_embeddings and
reset_embeddings funcs in RankingModel

* feat: add initial documentation

* refactor: remove idx2int vocabulary, add vocabularies saving

* change config parameters default values, remove examples in tests

* feat: add table in documentation

* fix: fix bug in urls.py

* refactor: remove paths from config

* feat: add documentation

* feat: add True in tests

* feat: add documentation

* refactor: move init/load in the load function.

* refactor: change parameters in config

* feat: add logging

* feat: add more logging

* feat: add documentation, change parameters values in config

* fix: add genesis for ranking model

* fix: requirements installation order that caused setup.py error

* refactor: train script

* feat: add documentation

* feat: models parameters check for ner

* feat: parameters check added to ner

* feat: parameters check added to slotfill

* chore: minor clean-up

* fix: fix conll-2003 model file names and archive names

* refactor: remove blank line

* feat: allow to stop training after n batches (#127)

* fix: many minor fixes

* fix: fix mark_done data_path

* refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder

* fix: fix embedding matrix construction, change epochs num
default parameter value

* refactor: rename registered name and name of the class

* refactor: rename files and classes

* refactor: change dataset downlaod

* feat: add insurance embeddings and datasets in urls.py

* refactor: change batch data representation (#131)

* feat: install tensorflow-gpu

* feat: add SQUAD model

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model realization

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input query

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* feat: use all pytest keys and not only max_batches (#134)

* fix: remove result stringification

* feat: add GPU_only and Slow marks for tests

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model realization

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input query

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* test: add couple of marks for selecting tests

* test: make Travis running only fast tests without GPU

* fix: ranking config works in interactbot

* fix: add downloading nltk punkt for tokenization (#140)

* feat: bot start message for intents does not say anything about dstc2 (#142)

* feat: interactbot command works with pipes that require multiple inputs (#137)

* build: change TravisCI script (#143)

* feat: add Glove embedder (#138)

* feat: glove embedder added

* feat: embeddings added to NER network

* feat: dataset and embeddings are added to urls.py for downloading

* fix: char embeddings added to pretrained embeddings

* feat: embedder return list of embeddings instead zero padded np array

* feat: capitalization added

* feat: config modified according to new features

* feat: double dense added to input parameters

* feat:config parameters updated

* chore: fix urls for conll NER, ontonotes model url added

* feat: pytest_max_batches added for faster tran check

* feat: ontonotes tests added

* feat: test conll max batches added

* Update README.md

* feat: add seq2seq go bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add configerror

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* doc: remove red ... blocks

* doc: change Dataset to DatasetIterator

* doc: update list of configs

* doc: update package structure

* doc: add notes about dataset element in config

* feat: add squad model description to README.md

* doc: add config specification for seq2seq_go_bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add configerror

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* docs: add seq2seq_go_bot in main readme

* docs: small fix

* docs: add config specification for seq2seq_go_bot

* chore: remove install.py (#151)

* feat: add support for batches in go-bot

* feat: batching v1

* feat: bow_encoder is optional

* fix: probs calculation for use_action_mask=true

* refactor: do not feed inital_state during train

* feat: feed sequence lengths in dynamic_rnn

* refactor: rename go_bot.py -> bot.py

* Update README.md

* feat: Ontonotes NER added

* chore: train part removed from config

* fix: readme dataset_iterator fixed, json removed from string

* feat: raw version of test added

* fix: test modes

* fix: folder name in ontonotes config and download path now consistent

* fix: skip tests

* feat: check GPU added to ner OntoNotes

seliverstov pushed a commit that referenced this issue May 16, 2018
* release 0.0.3 (#150)

* feat: tests can be run from project root (#86)

* refactor: instead of juggling global random states use instances of Random for datasets

* test(): add test for interacting with custom queries

After refactoring, it is easy to add a list of query-response pairs
for every model (config); these pairs are used to compare pretrained
model output with the expected output. Initial lists are added for
error_model and ner. A URL for downloading the pretrained
ner_conll2003_model is also added.
IP-1344 #done

* Update docs from master (#96)

* fixed grammar and style

* Update README.md

* fix grammar & style

* fix grammar & style

* fix grammar&style in Intent classification README

* doc: add supported platform notes

* docs: correct paths to scripts and configs to be relative to repository root (#94)

* docs: correct paths to scripts and configs to be relative to repository root

fixes #93

* docs: set paths in basic examples to be relative to the project root

* docs: run deep.py as a python module in examples

* doc: add notes for python 3.5

* test(): change downloading to temp dir (#97)

* feat: assert python version is 3.6 or higher

* Rename dataset to dataset_iterator and other renames (#103)

* refactor: rename 'dataset' to 'dataset_iterator'

* refactor: rename dataset readers and iterators

* refactor: classification iterator and reader

* fix: dialog_iterator

* test: fix downloading procedure (#108)

* Feature/tf layers to core (#67)

* feat: layers moved to core

* feat: attention added

* fix: highway/skip connections for different dimensionality of units are fixed

* feat: NER now supports core layers

* fix: minor docstrings fixes

* feat: CuDNN GRU and LSTM added

* feat: Bidirectional CuDNN GRU and LSTM added

* feat: stacked bi-rnn refactored

* fix: fixed arguments order in rnn

* fix: remove duplicate mult_att

* chore: merge with dev

* fix: backward forward bug in cudnnrnn

* refactor: use single fasttext module, clean dependencies

* fix: add error when n_classes is zero

* feat: add fastText model usage instead of fasttext

* fix: emb_module default fastText

* chore: embedding fixed in configs

* chore: change new models names

* feat: change intent embeddings in gobot configs

* chore: fastText to fasttext, new model, change intents in gobot configs

* chore: new url on new fasttext embeddings

* fix: delete download all true

* fix: add url of old embedding file

* fix: delete comma

* fix: delete old embedding file from urls

* fix: delete pyfasttext from requirements, fasttext_embedder

* fix: change pyfasttext embeddings from gobot

* fix: delete from requirements

* fix: delete gensim from fasttext_embedder

* fix: simplify requirements

* fix: fix dim in gobot_all config

* refactor: remove redundant parameter 'emb_module'

* feat: use wiki.en.bin embeddings in gobot_all

* feat: check saved model params and fix lowercase for interact

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add support for classification data in csv/json formats (#115)

* feat: add support for csv/json classification datasets

* feat: add tests for snips and samples

* fix: gobot_all config fix

* feat: add REST API for all models

* Moved telegram_utils -> utils; Refactored telegram_ui.py

* Moved telegram_utils -> utils: modified deeppavlov/deep.py

* Fixed getting model name with get_main_component() in telegram_ui.py

* chainer.py: minor fix in get_main_component()

* Added riseapi launch mode

* README.md: added riseapi mode reference

* Updated README.MD and fixed requirements.txt

* minor fixes in README.md

* Fixes in utils/server

* refactor: change endpoint names

* feat: add StreamSpacyTokenizer

* refactor: remove duplicating from script naming

* refactor: outline detokenize() meth in utils, because it should be used by all tokenizers and doesn't depend on tokenize()

* feat: add streaming spaCy tokenizer

* refactor: DELETE original spaCy tokenizer, rename stream_spacy to spacy

* refactor: rename tokenizer scripts back

* fix: wrong grammar

* feat: include spacy_tokenizer import

* feat: replace old SpacyTokenizer with new StreamSpacyTokenizer

* feat: ability to manage lowercasing from class constructor, typing improvements

* fix: update go-bot configs, so they would work with StreamSpacyTokenizer the same as with the old tokenizer

* feat: add optional logging to the spacy tokenizer

* docs: update docstrings

* refactor: replace custom logger with deeppavlov's, pep8 style update

* refactor: outline ngramize() because it is independent from tokenizer classes

* refactor: return original JSON formatting

* fix: add **kwargs to __init__()

* chore: update .gitignore

* refactor: more stable and consistent code

* feat: add TravisCI integration

* build(): add TravisCI integration

* build(): add TravisCI integration

* feat: add ranking model

* feat: add ranking model to deeppavlov

* feat: add download of dataset and embedding_model

* feat: adapt to new deeppavlov interfaces

* refactor: use pathlib where available in the ranking model

* feat: add saving and loading responses saving with np.save

* feat: add saving and loading response embeddings saving with np.save,
use response embeddings to calculate predictions in  __call__ function

* feat: add interact regime

* feat: add interact_pred_num parameter

* refactor: change parameter default value, change check if the file with
embeddings model exists

* fix: fix non-string keys in EmbeddingDict class

* feat: add parameters dict for autotests

* feat: add tests support

* feat: add context embeddings vocabulary (it is used in interact regime
to predict the most similar contexts)

* chore: change shuffle parameter default value to True in batch_generator

* refactor: change config to chainer representation

* fix: bug fix in urls.py file

* refactor: remove emb_vocab_file saving, move build_tok2int_vocab and
make_ints funcs to InsuranceDict class, add set_embeddings and
reset_embeddings funcs in RankingModel

* feat: add initial documentation

* refactor: remove idx2int vocabulary, add vocabularies saving

* change config parameters default values, remove examples in tests

* feat: add table in documentation

* fix: fix bug in urls.py

* refactor: remove paths from config

* feat: add documentation

* feat: add True in tests

* feat: add documentation

* refactor: move init/load in the load function.

* refactor: change parameters in config

* feat: add logging

* feat: add more logging

* feat: add documentation, change parameters values in config

* fix: add genesis for ranking model

* fix: requirements installation order that caused setup.py error

* refactor: train script

* feat: add documentation

* feat: models parameters check for ner

* feat: parameters check added to ner

* feat: parameters check added to slotfill

* chore: minor clean-up

* fix: fix conll-2003 model file names and archive names

* refactor: remove blank line

* feat: allow to stop training after n batches (#127)

* fix: many minor fixes

* fix: fix mark_done data_path

* refactor: rename ranking_dataset to ranking_iterator.py and move it to the dataset_iterators folder

* fix: fix embedding matrix construction, change epochs num
default parameter value

* refactor: rename registered name and name of the class

* refactor: rename files and classes

* refactor: change dataset download

* feat: add insurance embeddings and datasets in urls.py

* refactor: change batch data representation (#131)

* feat: install tensorflow-gpu

* feat: add SQUAD model

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model realization

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input query

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* feat: use all pytest keys and not only max_batches (#134)

* fix: remove result stringification

* feat: add GPU_only and Slow marks for tests

* feat: add SQuAD dataset reader

* feat: add dataset, preprocessing, config

* feat: add VocabEmbedder for chars and tokens

* feat&fix: add model realization

* feat: add training support, answer postprocessing

* fix: predicted answer extraction from context

* fix: dropout mask

* feat: true_answer is a list of answers now

* merge with dev

* docs: add some docstrings

* refactor: renaming variables

* docs: add README.md

* feat: add support of multiple inputs and outputs in interact mode

* docs: upd README.md

* fix: bugs after merge with dev

* fix: turn on training vocabs

* fix: remove keep_prob multiplier for dropout mask

* fix: add short contexts support

* docs: upd README.md

* feat: chainer returns batch of tuples instead of tuple of batches

* docs: upd squad README.md

* docs: upd squad README.md

* feat: add link to pretrained SQuAD model

* fix: SQuAD model url

* feat: add embeddings downloading and upd config

* feat: add variable scope for optimizer

* refactor: do not override __init__ method for squad_iterator

* fix: ensure that directory exists before saving SquadVocabEmbedder

* style: upd names in config and docs

* chore: remove main.py used for debugging

* docs: upd README.md

* fix: change batch_size to fix possible OOM

* test: add possibility to interact with several input query

* chore: add max_batches to squad config

* docs: upd README.md

* fix(ranking_network): wrap y as np.array

* fix: fix training stop for pytest

* style: add license header

* fix: refactor training stop for pytest

* test: specify pytest_max_batches

* test: add couple of marks for selecting tests

* test: make Travis running only fast tests without GPU

* fix: ranking config works in interactbot

* fix: add downloading nltk punkt for tokenization (#140)

* feat: bot start message for intents does not say anything about dstc2 (#142)

* feat: interactbot command works with pipes that require multiple inputs (#137)

* build: change TravisCI script (#143)

* feat: add Glove embedder (#138)

* feat: glove embedder added

* feat: embeddings added to NER network

* feat: dataset and embeddings are added to urls.py for downloading

* fix: char embeddings added to pretrained embeddings

* feat: embedder return list of embeddings instead zero padded np array

* feat: capitalization added

* feat: config modified according to new features

* feat: double dense added to input parameters

* feat: config parameters updated

* chore: fix urls for conll NER, ontonotes model url added

* feat: pytest_max_batches added for faster train check

* feat: ontonotes tests added

* feat: test conll max batches added

* Update README.md

* feat: add seq2seq go bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add configerror

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* doc: remove red ... blocks

* doc: change Dataset to DatasetIterator

* doc: update list of configs

* doc: update package structure

* doc: add notes about dataset element in config

* feat: add squad model description to README.md

* doc: add config specification for seq2seq_go_bot

* fix: lowercase text while interact

* feat: check saved model params

* fix: rm extra configs

* feat: add kvret dataset_reader

* feat: add kvret_dataset_iterator

* fix: add configerror

* fix: dirty fix for dialog data to be lowercased

* feat: check np.int and int in Vocabulary

* feat: seq2seqbot works for train and infer

* feat: add bleu-metric

* feat: add simple seq2seq_go_bot config

* fix: fix inference and load()

* feat: add variable scope for optimizer

* feat: add support of multiple inputs and outputs in interact mode

* fix: fix padding

* feat: tokenizer argument in Vocabulary

* feat: chainer returns batch of tuples instead of tuple of batches

* fix: spacy_tokenizer returns [['']] for batch with empty string and add alpha_only argument

* feat: add per_item_bleu

* feat: train seq2seq_go_bot on utterance batches

* feat: tokenize y_true

* feat: fit kb_entries knowledge base

* feat: add split tokenizer

* feat: standardize tokenizers output

* feat: normalize kb entities

* feat: db_columns, db_items in each sample

* fix: go_bot configs (for new vocab) and loading of network

* style: minor restyling

* feat: add config for infer

* feat: add config for infer

* feat: add seq2seq_go_bot pretrained model

* feat: update telegram start and help messages

* style: minor styling

* docs: add simple readme

* docs: add seq2seq_go_bot in main readme

* docs: small fix

* docs: add config specification for seq2seq_go_bot

* chore: remove install.py (#151)

* feat: add support for batches in go-bot

* feat: batching v1

* feat: bow_encoder is optional

* fix: probs calculation for use_action_mask=true

* refactor: do not feed inital_state during train

* feat: feed sequence lengths in dynamic_rnn

* refactor: rename go_bot.py -> bot.py

* Update README.md

* feat: add Ontonotes NER with Senna

* feat: Ontonotes NER added

* chore: train part removed from config

* fix: readme dataset_iterator fixed, json removed from string

* feat: raw version of test added

* fix: test modes

* fix: folder name in ontonotes config and download path now consistent

* fix: skip tests

* Revert "feat: add Ontonotes NER with Senna" (#160)

This reverts commit ae91d8f.

* Add custom slot_filler.

* Add custom go_bot configuration gobot_my.json.

* Import my_slot_filler.slotfill to register simple_slotfiller.

* Add new configuration files for gobot_my and gobot_simple.

* Build data_reader for pharma_bot.

* Fix bugs about reading data.

* Rename modules

* Implement the data reader templates

* Remove the normalize function because it is not needed. Assuming the slot names are not changed.

* feat: add dstc2 with api calls

* fix: episode_done fix

* fix: mv db_result from y to x

* feat: if made api_call respond with next prediction

* feat: create dstc2_v2 with api_calls

* refactor: change interact_db_result logic and fix debug output

* feat: add learning rate polynomial decay

* refactor: rm moved to core modules

* feat: add Sqlite3Database

* feat: rm db_result_during_interaction & add database in go_bot

* refactor: logging

* fix: import fix

* feat: add threshold for levenshtein score

* fix: fix l2 regularization

* feat: add features to state tracker

* feat: add db context features

* feat: new config for intents model with wiki.en.bin

* docs: readme for intents

add info about two models with different embeddings for DSTC 2

* feat: configs use dstc2_v2

* fix: minor db fix

* feat: add variational dropout & fix logging

* feat: raw slotfiller moved to the separate folder and inherited from
Serializable

* feat: orthodox slotfill moved to slotfill folder

* refactor: external functions to class methods

* feat: config for raw slotfiller added

* feat: slotfilling config is simplified by dstc_ner config reference

* chore: unnecessary imports removed

* chore: fix outdated imports

* docs: simple description for raw slotfill

* fix: fix action mask for api-calls

* refactor: dict to tuple in database

* feat: add training for database

* feat: add optimizer configuration

* docs: fix github links

* docs: add new network parameters and database

* docs: add -d description

* feat: add threshold to slotfill configs

* fix: attention over intents work

* docs: add comparison with external models

* docs: dstc2_v2 vs dstc2

* docs: fix dstc2_reader

* refactor: add api call action as a config parameter

* docs: add template and database doc

* feat: template type from str to class

* docs: database class

* feat: update configs

* feat: fix templates

* refactor: rm extra files

* refactor: database.py -> sqlite_database.py

* feat: mv dropout to dense layer, update configs, fix dropout_rate

* feat: update gobot_dstc2_best model

* feat: dropout on attentioned embeddings

* feat: add intents_dstc2_big tests

* feat: update gobot_dstc2 model

* feat: retrain gobot_dstc2

* refactor: rm unused code

* docs: update examples and metrics

* fix: remove training from slotfill