Release 0.2.0

deeppavlov · Mar 22, 2019 · 2ac5a09
2 parents c5f481d + aea7497

Showing 355 changed files with 8,646 additions and 3,893 deletions.
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -4,4 +4,3 @@ include requirements.txt
 include deeppavlov/requirements/*.txt
 recursive-include deeppavlov *.json
 recursive-include deeppavlov *.md
-recursive-include utils *.json
diff --git a/README.md b/README.md
@@ -7,6 +7,12 @@ DeepPavlov is an open-source conversational AI library built on [TensorFlow](htt
  * NLP and dialog systems research.
 
 
+### Breaking changes in version 0.2.0!
+- `utils` module was moved from the repository root into the `deeppavlov` module
+- `ms_bot_framework_utils`, `server_utils` and `telegram_utils` modules were renamed to `ms_bot_framework`, `server` and `telegram` respectively
+- metric functions `exact_match` and `squad_f1` were renamed to `squad_v2_em` and `squad_v2_f1` respectively
+- dashes in config names were replaced with underscores
+
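A minimal sketch of how a downstream project might apply the renames listed above mechanically. The lookup tables and the `migrate_*` helpers are hypothetical and not part of DeepPavlov; they encode only the renames stated in this changelog, and the config name used in the test is made up.

```python
import re

# Hypothetical tables that encode only the 0.2.0 renames listed above;
# they are not part of DeepPavlov.
MODULE_RENAMES = {
    "utils.ms_bot_framework_utils": "deeppavlov.utils.ms_bot_framework",
    "utils.server_utils": "deeppavlov.utils.server",
    "utils.telegram_utils": "deeppavlov.utils.telegram",
}
METRIC_RENAMES = {"exact_match": "squad_v2_em", "squad_f1": "squad_v2_f1"}


def migrate_import(line: str) -> str:
    """Rewrite a pre-0.2.0 `utils.*` import path to its 0.2.0 location."""
    for old, new in MODULE_RENAMES.items():
        # Only match the old path when it is not preceded by a word character
        # or a dot, so already-migrated `deeppavlov.utils...` paths are untouched.
        line = re.sub(rf"(?<![\w.]){re.escape(old)}\b", new, line)
    return line


def migrate_metric(name: str) -> str:
    """Map an old SQuAD metric name to its new name; other names pass through."""
    return METRIC_RENAMES.get(name, name)


def migrate_config_name(name: str) -> str:
    """Config names now use underscores instead of dashes."""
    return name.replace("-", "_")
```

For example, `migrate_import("from utils.ms_bot_framework_utils.server import run_ms_bot_framework_server")` returns the import that `ecommerce_agent.py` uses after this commit.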
 ### Breaking changes in version 0.1.0!
 - As of `version 0.1.0` all models, embeddings and other downloaded data for provided configurations are
  by default downloaded to the `.deeppavlov` directory in current user's home directory.
@@ -83,7 +89,7 @@ print(HelloBot(['Hello!', 'Boo...', 'Bye.']))
 
 **Auto ML**
 
-[Tuning Models with Evolutionary Algorithm](http://docs.deeppavlov.ai/en/latest/intro/parameters_evolution.html)
+[Tuning Models with Evolutionary Algorithm](http://docs.deeppavlov.ai/en/latest/intro/hypersearch.html)
 
 # Installation
 
@@ -136,19 +142,20 @@ python -m deeppavlov <mode> <path_to_config> [-d]
 * `<path_to_config>` should be a path to an NLP pipeline json config (e.g. `deeppavlov/configs/ner/slotfill_dstc2.json`)
 or a name without the `.json` extension of one of the config files [provided](deeppavlov/configs) in this repository (e.g. `slotfill_dstc2`)
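The `<path_to_config>` argument accepts either a file path or a bare config name. A rough sketch of that lookup, assuming the name is searched recursively under the bundled configs directory; `find_config_sketch` is a hypothetical helper, not the library's own `find_config`, whose exact behavior may differ.

```python
from pathlib import Path
from typing import Optional, Union


def find_config_sketch(path_or_name: Union[str, Path], configs_dir: Path) -> Optional[Path]:
    """Resolve `<path_to_config>` as an existing file or a bundled config name."""
    candidate = Path(path_or_name)
    # An existing file path is used as-is.
    if candidate.is_file():
        return candidate
    # Otherwise treat the argument as a name without `.json` and search the
    # configs tree recursively (e.g. configs/ner/slotfill_dstc2.json).
    matches = sorted(configs_dir.rglob(f"{candidate.name}.json"))
    return matches[0] if matches else None
```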
 
-For the `interactbot` mode you should specify Telegram bot token in `-t` parameter or in `TELEGRAM_TOKEN` environment variable. Also if you want to get custom `/start` and `/help` Telegram messages for the running model you should:
-* Add section to [*utils/settings/models_info.json*](utils/settings/models_info.json) with your custom Telegram messages
-* In model config file specify `metadata.labels.telegram_utils` parameter with name which refers to the added section of [*utils/settings/models_info.json*](utils/settings/models_info.json)
-
-For the `interactmsbot` mode you should specify **Microsoft app id** in `-i` and **Microsoft app secret** in `-s`. Also before launch you should specify api deployment settings (host, port) in [*utils/settings/server_config.json*](utils/settings/server_config.json) configuration file. Note, that Microsoft Bot Framework requires `https` endpoint with valid certificate from CA.
-Here is [detailed info on the Microsoft Bot Framework integration](http://docs.deeppavlov.ai/en/latest/devguides/ms_bot_integration.html)
+For the `interactbot` mode you should specify a Telegram bot token with the `-t` parameter or in the `TELEGRAM_TOKEN` environment variable.
+If your component implements the DeepPavlov [*Skill*](deeppavlov/core/skill/skill.py) interface, use the optional `--no-default-skill` flag to skip wrapping it in DeepPavlov [*DefaultStatelessSkill*](deeppavlov/skills/default_skill/default_skill.py).
+If you want to get custom `/start` and `/help` Telegram messages for the running model you should:
+* Add a section with your custom Telegram messages to [*deeppavlov/utils/settings/models_info.json*](deeppavlov/utils/settings/models_info.json)
+* In the model config file, set the `metadata.labels.telegram_utils` parameter to the name of the section added to [*deeppavlov/utils/settings/models_info.json*](deeppavlov/utils/settings/models_info.json)
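A sketch of how the two steps above fit together, for instance with `"telegram_utils": "IntentModel"` as set in `insults_kaggle_bert.json`. The message keys `start_message`/`help_message` and the `@default` fallback section are assumptions about the layout of `models_info.json`, `telegram_messages` is a hypothetical helper, and the message texts in the test are made up.

```python
from typing import Dict, Optional


def telegram_messages(config: dict, models_info: Dict[str, dict]) -> Dict[str, str]:
    """Pick custom /start and /help texts for a model (hypothetical helper)."""
    labels = config.get("metadata", {}).get("labels", {})
    # The section name comes from metadata.labels.telegram_utils in the model config.
    section: Optional[str] = labels.get("telegram_utils")
    info = models_info.get(section) if section else None
    # Assumption: a "@default" section holds the fallback messages.
    info = info or models_info.get("@default", {})
    return {
        "start": info.get("start_message", ""),
        "help": info.get("help_message", ""),
    }
```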
 
-You can also store your tokens, app ids, secrets in appropriate sections of [*utils/settings/server_config.json*](utils/settings/server_config.json). Please note, that all command line parameters override corresponding config ones.
+You can also serve DeepPavlov models for:
+* Microsoft Bot Framework ([see developer guide for the detailed instructions](http://docs.deeppavlov.ai/en/latest/devguides/ms_bot_integration.html)) 
+* Amazon Alexa ([see developer guide for the detailed instructions](http://docs.deeppavlov.ai/en/latest/devguides/amazon_alexa.html)) 
 
-For `riseapi` mode you should specify api settings (host, port, etc.) in [*utils/settings/server_config.json*](utils/settings/server_config.json) configuration file. If provided, values from *model_defaults* section override values for the same parameters from *common_defaults* section. Model names in *model_defaults* section should be similar to the class names of the models main component.
+For the `riseapi` mode you should specify API settings (host, port, etc.) in the [*deeppavlov/utils/settings/server_config.json*](deeppavlov/utils/settings/server_config.json) configuration file. If provided, values from the *model_defaults* section override values for the same parameters from the *common_defaults* section. Model names in the *model_defaults* section should match the class names of the model's main component.
 Here is [detailed info on the DeepPavlov REST API](http://docs.deeppavlov.ai/en/latest/devguides/rest_api.html)
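The override rule for server settings amounts to a dictionary merge. A minimal sketch, assuming `server_config.json` is a JSON object with `common_defaults` and `model_defaults` sections as described above; `resolve_server_settings` and the example values are hypothetical.

```python
from typing import Any, Dict


def resolve_server_settings(server_config: Dict[str, Any], model_name: str) -> Dict[str, Any]:
    """Merge server settings for one model (simplified sketch)."""
    # Start from the values shared by all models...
    settings = dict(server_config.get("common_defaults", {}))
    # ...then let the model's own section win for any key it redefines.
    settings.update(server_config.get("model_defaults", {}).get(model_name, {}))
    return settings
```

Models without their own section simply get the common defaults.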
 
-All DeepPavlov settings files are stored in `utils/settings` by default. You can get full path to it with `python -m deeppavlov.settings settings`. Also you can move it with with `python -m deeppavlov.settings settings -p <new/configs/dir/path>` (all your configuration settings will be preserved) or move it to default location with `python -m deeppavlov.settings settings -d` (all your configuration settings will be RESET to default ones).
+All DeepPavlov settings files are stored in `deeppavlov/utils/settings` by default. You can get the full path to it with `python -m deeppavlov.settings settings`. You can also move it with `python -m deeppavlov.settings settings -p <new/configs/dir/path>` (all your configuration settings will be preserved) or move it back to the default location with `python -m deeppavlov.settings settings -d` (all your configuration settings will be RESET to default ones).
 
 For the `predict` mode you can specify the path to an input file with the `-f` or `--input-file` parameter; otherwise, data will be taken
 from stdin.

diff --git a/deeppavlov/__init__.py b/deeppavlov/__init__.py
@@ -15,6 +15,8 @@
 import sys
 from pathlib import Path
 
+from .core.common.log import init_logger
+
 try:
     from .configs import configs
     # noinspection PyUnresolvedReferences
@@ -35,7 +37,7 @@ def evaluate_model(config: [str, Path, dict], download: bool = False, recursive:
 except ImportError:
     'Assuming that requirements are not yet installed'
 
-__version__ = '0.1.6'
+__version__ = '0.2.0'
 __author__ = 'Neural Networks and Deep Learning lab, MIPT'
 __description__ = 'An open source library for building end-to-end dialog systems and training chatbots.'
 __keywords__ = ['NLP', 'NER', 'SQUAD', 'Intents', 'Chatbot']
@@ -49,3 +51,6 @@ def evaluate_model(config: [str, Path, dict], download: bool = False, recursive:
 dot_dp_path = Path('~/.deeppavlov').expanduser().resolve()
 if dot_dp_path.is_file():
     dot_dp_path.unlink()
+
+# initiate logging
+init_logger()
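With this change the package configures logging once, at import time, and modules such as `ecommerce_agent.py` switch from `deeppavlov.core.common.log.get_logger` to the standard `logging.getLogger(__name__)`. A sketch of that pattern using only the standard library; `init_logger_sketch`, its format string and its handler set-up are assumptions, not the contents of DeepPavlov's `init_logger`.

```python
import logging


def init_logger_sketch(level: int = logging.INFO) -> logging.Logger:
    """Attach one handler to the package logger (hypothetical stand-in for init_logger)."""
    package_logger = logging.getLogger("deeppavlov")
    if not package_logger.handlers:  # importing twice must not duplicate output
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s in '%(name)s': %(message)s"))
        package_logger.addHandler(handler)
    package_logger.setLevel(level)
    return package_logger


# The package __init__ calls this once at import time...
init_logger_sketch()

# ...so each module only needs the standard-library call; inside
# ecommerce_agent.py this is simply getLogger(__name__).
log = logging.getLogger("deeppavlov.agents.ecommerce_agent.ecommerce_agent")
```

Because child loggers propagate to the `deeppavlov` logger, every module inherits the handler and level without extra set-up.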
diff --git a/deeppavlov/agents/default_agent/default_agent.py b/deeppavlov/agents/default_agent/default_agent.py
@@ -14,12 +14,12 @@
 
 from typing import List, Optional
 
+from deeppavlov.agents.filters.transparent_filter import TransparentFilter
+from deeppavlov.agents.processors.highest_confidence_selector import HighestConfidenceSelector
 from deeppavlov.core.agent.agent import Agent
 from deeppavlov.core.agent.filter import Filter
 from deeppavlov.core.agent.processor import Processor
-from deeppavlov.core.skill.skill import Skill
-from deeppavlov.agents.filters.transparent_filter import TransparentFilter
-from deeppavlov.agents.processors.highest_confidence_selector import HighestConfidenceSelector
+from deeppavlov.core.models.component import Component
 
 
 class DefaultAgent(Agent):
@@ -38,7 +38,7 @@ class DefaultAgent(Agent):
     You can refer to :class:`deeppavlov.core.skill.Skill`, :class:`deeppavlov.core.agent.Filter`, :class:`deeppavlov.core.agent.Processor` base classes to get more info.
 
     Args:
-        skills: List of initiated agent skills instances.
+        skills: List of initiated agent skill or component instances.
         skills_processor: Initiated agent processor.
         skills_filter: Initiated agent filter.
 
@@ -47,11 +47,11 @@ class DefaultAgent(Agent):
         skills_processor: Initiated agent processor.
         skills_filter: Initiated agent filter.
     """
-    def __init__(self, skills: List[Skill], skills_processor: Optional[Processor]=None,
-                 skills_filter: Optional[Filter]=None, *args, **kwargs) -> None:
+    def __init__(self, skills: List[Component], skills_processor: Optional[Processor] = None,
+                 skills_filter: Optional[Filter] = None, *args, **kwargs) -> None:
         super(DefaultAgent, self).__init__(skills=skills)
-        self.skills_filter: Filter = skills_filter or TransparentFilter(len(skills))
-        self.skills_processor: Processor = skills_processor or HighestConfidenceSelector()
+        self.skills_filter = skills_filter or TransparentFilter(len(skills))
+        self.skills_processor = skills_processor or HighestConfidenceSelector()
 
     def _call(self, utterances_batch: list, utterances_ids: Optional[list]=None) -> list:
         """

diff --git a/deeppavlov/agents/ecommerce_agent/ecommerce_agent.py b/deeppavlov/agents/ecommerce_agent/ecommerce_agent.py
@@ -14,22 +14,22 @@
 
 import argparse
 from collections import defaultdict
+from logging import getLogger
 from typing import List, Dict, Any
 
+from deeppavlov.agents.rich_content.default_rich_content import PlainText, ButtonsFrame, Button
 from deeppavlov.core.agent.agent import Agent
-from deeppavlov.core.common.log import get_logger
-from deeppavlov.core.skill.skill import Skill
-from deeppavlov.core.commands.infer import build_model
 from deeppavlov.core.agent.rich_content import RichMessage
-from deeppavlov.agents.rich_content.default_rich_content import PlainText, ButtonsFrame, Button
+from deeppavlov.core.commands.infer import build_model
+from deeppavlov.core.skill.skill import Skill
 from deeppavlov.deep import find_config
-from utils.ms_bot_framework_utils.server import run_ms_bot_framework_server
+from deeppavlov.utils.ms_bot_framework.server import run_ms_bot_framework_server
 
 parser = argparse.ArgumentParser()
 parser.add_argument("-i", "--ms-id", help="microsoft bot framework app id", type=str)
 parser.add_argument("-s", "--ms-secret", help="microsoft bot framework app secret", type=str)
 
-log = get_logger(__name__)
+log = getLogger(__name__)
 
 
 class EcommerceAgent(Agent):

diff --git a/deeppavlov/agents/processors/default_rich_content_processor.py b/deeppavlov/agents/processors/default_rich_content_processor.py
@@ -12,9 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+from deeppavlov.agents.rich_content.default_rich_content import PlainText
 from deeppavlov.core.agent.processor import Processor
 from deeppavlov.core.agent.rich_content import RichMessage
-from deeppavlov.agents.rich_content.default_rich_content import PlainText
 
 
 class DefaultRichContentWrapper(Processor):

diff --git a/deeppavlov/agents/rich_content/default_rich_content.py b/deeppavlov/agents/rich_content/default_rich_content.py
@@ -30,6 +30,9 @@ def __init__(self, text: str) -> None:
         super(PlainText, self).__init__('plain_text')
         self.content: str = text
 
+    def __str__(self) -> str:
+        return self.content
+
     def json(self) -> dict:
         """Returns json compatible state of the PlainText instance.
 
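The new `__str__` lets a `PlainText` control be printed or interpolated directly as its text. A self-contained stand-in illustrating the effect; `PlainTextSketch` is hypothetical and omits the real class's base class and `json()` method.

```python
class PlainTextSketch:
    """Hypothetical stand-in for PlainText, reduced to the text payload."""

    def __init__(self, text: str) -> None:
        self.content: str = text

    def __str__(self) -> str:
        # As in the added method: str(), print() and f-strings show the raw text.
        return self.content


message = PlainTextSketch("Hello!")
print(message)  # prints: Hello!
```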

diff --git a/deeppavlov/configs/classifiers/insults_kaggle.json b/deeppavlov/configs/classifiers/insults_kaggle.json
@@ -123,8 +123,12 @@
     "val_every_n_epochs": 5,
     "log_every_n_epochs": 5,
     "show_examples": false,
-    "validate_best": true,
-    "test_best": true
+    "evaluation_targets": [
+      "train",
+      "valid",
+      "test"
+    ],
+    "class_name": "nn_trainer"
   },
   "metadata": {
     "variables": {

diff --git a/deeppavlov/configs/classifiers/insults_kaggle_bert.json b/deeppavlov/configs/classifiers/insults_kaggle_bert.json
@@ -0,0 +1,144 @@
+{
+  "dataset_reader": {
+    "class_name": "basic_classification_reader",
+    "x": "Comment",
+    "y": "Class",
+    "data_path": "{DOWNLOADS_PATH}/insults_data"
+  },
+  "dataset_iterator": {
+    "class_name": "basic_classification_iterator",
+    "seed": 42
+  },
+  "chainer": {
+    "in": [
+      "x"
+    ],
+    "in_y": [
+      "y"
+    ],
+    "pipe": [
+      {
+        "class_name": "bert_preprocessor",
+        "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt",
+        "do_lower_case": false,
+        "max_seq_length": 64,
+        "in": [
+          "x"
+        ],
+        "out": [
+          "bert_features"
+        ]
+      },
+      {
+        "id": "classes_vocab",
+        "class_name": "simple_vocab",
+        "fit_on": [
+          "y"
+        ],
+        "save_path": "{MODELS_PATH}/classes.dict",
+        "load_path": "{MODELS_PATH}/classes.dict",
+        "in": "y",
+        "out": "y_ids"
+      },
+      {
+        "in": "y_ids",
+        "out": "y_onehot",
+        "class_name": "one_hotter",
+        "depth": "#classes_vocab.len",
+        "single_vector": true
+      },
+      {
+        "class_name": "bert_classifier",
+        "n_classes": "#classes_vocab.len",
+        "return_probas": true,
+        "one_hot_labels": true,
+        "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json",
+        "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt",
+        "save_path": "{MODELS_PATH}/model",
+        "load_path": "{MODELS_PATH}/model",
+        "keep_prob": 0.5,
+        "learning_rate": 1e-05,
+        "learning_rate_drop_patience": 5,
+        "learning_rate_drop_div": 2.0,
+        "in": [
+          "bert_features"
+        ],
+        "in_y": [
+          "y_onehot"
+        ],
+        "out": [
+          "y_pred_probas"
+        ]
+      },
+      {
+        "in": "y_pred_probas",
+        "out": "y_pred_ids",
+        "class_name": "proba2labels",
+        "max_proba": true
+      },
+      {
+        "in": "y_pred_ids",
+        "out": "y_pred_labels",
+        "ref": "classes_vocab"
+      }
+    ],
+    "out": [
+      "y_pred_labels"
+    ]
+  },
+  "train": {
+    "epochs": 100,
+    "batch_size": 64,
+    "metrics": [
+      {
+        "name": "roc_auc",
+        "inputs": [
+          "y_onehot",
+          "y_pred_probas"
+        ]
+      },
+      "sets_accuracy",
+      "f1_macro"
+    ],
+    "validation_patience": 5,
+    "val_every_n_epochs": 1,
+    "log_every_n_epochs": 1,
+    "show_examples": false,
+    "evaluation_targets": [
+      "train",
+      "valid",
+      "test"
+    ],
+    "class_name": "nn_trainer",
+    "tensorboard_log_dir": "{MODELS_PATH}/"
+  },
+  "metadata": {
+    "variables": {
+      "ROOT_PATH": "~/.deeppavlov",
+      "DOWNLOADS_PATH": "{ROOT_PATH}/downloads",
+      "MODELS_PATH": "{ROOT_PATH}/models/classifiers/insults_kaggle_v3"
+    },
+    "requirements": [
+      "{DEEPPAVLOV_PATH}/requirements/tf.txt",
+      "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt"
+    ],
+    "labels": {
+      "telegram_utils": "IntentModel",
+      "server_utils": "KerasIntentModel"
+    },
+    "download": [
+      {
+        "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz",
+        "subdir": "{DOWNLOADS_PATH}"
+      },
+      {
+        "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip",
+        "subdir": "{DOWNLOADS_PATH}/bert_models"
+      },
+      {
+        "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_v3.tar.gz",
+        "subdir": "{ROOT_PATH}/models/classifiers"
+      }
+    ]
+  }
+}
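The tail of this pipeline turns classifier output back into labels: `classes_vocab` maps labels to ids, `one_hotter` (with `depth` set to `#classes_vocab.len`) builds one-hot targets, `proba2labels` with `max_proba: true` keeps the most probable class, and the last step reuses `classes_vocab` via `ref` to map ids back to labels. A plain-Python approximation of those steps; the function names and class labels are hypothetical, and the real components may differ in details such as output nesting.

```python
from typing import List, Sequence


def one_hot(ids: Sequence[int], depth: int) -> List[List[float]]:
    """Roughly what `one_hotter` produces for single-label data: one row per sample."""
    return [[1.0 if j == i else 0.0 for j in range(depth)] for i in ids]


def argmax_ids(probas: Sequence[Sequence[float]]) -> List[int]:
    """Like `proba2labels` with max_proba: index of the highest probability."""
    return [max(range(len(p)), key=lambda j: p[j]) for p in probas]


def ids_to_labels(ids: Sequence[int], classes: Sequence[str]) -> List[str]:
    """Like the final vocab lookup: id back to label."""
    return [classes[i] for i in ids]


classes = ["neutral", "insult"]  # hypothetical label set, in vocab order
y_onehot = one_hot([1, 0], depth=len(classes))
y_pred_labels = ids_to_labels(argmax_ids([[0.9, 0.1], [0.3, 0.7]]), classes)
```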
diff --git a/deeppavlov/configs/classifiers/intents_dstc2.json b/deeppavlov/configs/classifiers/intents_dstc2.json
@@ -127,8 +127,12 @@
     "val_every_n_epochs": 5,
     "log_every_n_batches": 100,
     "show_examples": false,
-    "validate_best": true,
-    "test_best": true
+    "evaluation_targets": [
+      "train",
+      "valid",
+      "test"
+    ],
+    "class_name": "nn_trainer"
   },
   "metadata": {
     "variables": {