Merge pull request #84 from deepmipt/dev

Dev
deeppavlov · Mar 23, 2020 · 87e3d5c
2 parents 130d191 + 0334fe4
Showing 6 changed files with 286 additions and 426 deletions.
diff --git a/docs/source/api/services_http_api.rst b/docs/source/api/services_http_api.rst
@@ -1,6 +1,6 @@
-There are 5 types of dialog services that can be connected to the `Agent's dialog pipeline <dialog-pipeline_>`__:
+These types of dialog services can be connected to the agent's conversational pipeline:
 
-    *  **Annotators**
+    *  **Annotator**
     *  **Skill Selector**
     *  **Skills**
     *  **Response Selector**
@@ -10,36 +10,34 @@ There are 5 types of dialog services that can be connected to the `Agent's dialo
 Input Format
 ============
 
-All services get a standardized Agent State as input. The input format is described `here <state_>`__.
-
-To reformat Agent State format into your service's input format, you need to write a **formatter** function and
-specify it's name into the Agent's `config file <config file>`__. You can use our DeepPavlov `formatters <formatters>`__
-as example.
+All services should accept an input in an agent ``state`` format. This format is described `here <state_>`__.
+If the input format of a service differs from the agent state format, a **formatter** function should be implemented.
+This formatter function receives a request in the agent state format and returns a request in the format supported by the service.
 
 Output Format
 =============
 
-All services have it's own specified output format. If you need to reformat your service's response, you should use the same
-formatter function that you used for the input format, just use the ``mode=='out'`` flag.
+All services should provide an output in an agent ``state`` format. This format is described `here <state_>`__.
+To use the same formatter for input and output, set the ``mode=='out'`` flag for the output.
 
 Annotator
 =========
 
-Annotator should return a free-form response.
+Annotator service returns a free-form response.
 
 For example, the NER annotator may return a dictionary with ``tokens`` and ``tags`` keys:
 
     .. code:: json
 
         {"tokens": ["Paris"], "tags": ["I-LOC"]}
 
-For example, a Sentiment annotator can return a list of labels:
+A Sentiment annotator can return a list of labels:
 
     .. code:: json
 
         ["neutral", "speech"]
 
-Also a Sentiment annotator can return just a string:
+A Sentiment annotator can also return just a string:
 
     .. code:: json
 
@@ -48,7 +46,7 @@ Also a Sentiment annotator can return just a string:
 Skill Selector
 ==============
 
-Skill Selector should return a list of selected skill names.
+Skill Selector service should return a list of names of the skills selected to generate candidate responses for a dialog.
 
 For example:
 
@@ -60,73 +58,80 @@ For example:
 Skill
 =====
 
-Skill should return a **list of dicts** where each dict ia a single hypothesis. Each dict requires
-``text`` and ``confidence`` keys. If a skill wants to update either **Human** or **Bot** profile,
-it should pack these attributes into ``human_attributes`` and ``bot_attributes`` keys.
+Skill service should return a **list of dicts** where each dict corresponds to a single candidate response. 
+Each candidate response entry requires ``text`` and ``confidence`` keys. 
+The Skill can update **Human** or **Bot** profile. 
+To do this, it should pack these attributes into ``human_attributes`` and ``bot_attributes`` keys.
 
 All attributes in ``human_attributes`` and ``bot_attributes`` will overwrite current **Human** and **Bot**
-attribute values accordingly. And if there are no such attributes, they will be stored under ``attributes``
-key inside **Human** or **Bot**.
+attribute values in agent state. And if there are no such attributes, 
+they will be stored under ``attributes`` key inside **Human** or **Bot**.
 
 The minimum required response of a skill is a 2-key dictionary:
 
 
     .. code:: json
 
-        [{"text": "hello", "confidence": 0.33}]
+        [{"text": "hello", 
+          "confidence": 0.33}]
 
 But it's possible to extend it with  ``human_attributes`` and ``bot_attributes`` keys:
 
     .. code:: json
 
-        [{"text": "hello", "confidence": 0.33, "human_attributes": {"name": "Vasily"},
-        "bot_attributes": {"persona": ["I like swimming.", "I have a nice swimming suit."]}}]
+        [{"text": "hello", 
+          "confidence": 0.33, 
+          "human_attributes": 
+            {"name": "Vasily"},
+          "bot_attributes": 
+            {"persona": ["I like swimming.", "I have a nice swimming suit."]}}]
 
 Everything sent to ``human_attributes`` and ``bot_attributes`` keys will update `user` field in the same
-utterance for the human and in the next utterance for the bot. Please refer to user_state_api_ to find more
-information about the **User** object updates.
+utterance for the human and in the next utterance for the bot. Please refer to agent state_ documentation for more information about the **User** object updates.
 
 Also it's possible for a skill to send any additional key to the state:
 
     .. code:: json
 
-        [{"text": "hello", "confidence": 0.33, "any_key": "any_value"}]
+        [{"text": "hello", 
+          "confidence": 0.33, 
+          "any_key": "any_value"}]
 
 
 Response Selector
 =================
 
-Unlike Skill Selector, Response Selector should select a *single* skill responsible for generation of the
-final response shown to the user. The expected result is a name of the selected skill, text (may be
-overwritten from the original skill response) and confidence (also may be overwritten):
+Unlike Skill Selector, the Response Selector service should select a *single* skill as the source of the
+final response. The service returns the name of the selected skill, text (might be
+overwritten from the original skill response) and confidence (also might be overwritten):
 
  .. code:: json
 
-        {"skill_name": "chitchat", "text": "Hello, Joe!", "confidence": 0.3}
+        {"skill_name": "chitchat", 
+         "text": "Hello, Joe!", 
+         "confidence": 0.3}
 
 Also it's possible for a Response Selector to overwrite any ``human`` or ``bot`` attributes:
 
  .. code:: json
 
-        {"skill_name": "chitchat", "text": "Hello, Joe!", "confidence": 0.3, "human_attributes": {"name": "Ivan"}}
+        {"skill_name": "chitchat", 
+         "text": "Hello, Joe!", 
+         "confidence": 0.3, 
+         "human_attributes": {"name": "Ivan"}}
 
 Postprocessor
 =============
 
-Postprocessor has a power to rewrite a final bot answer selected by the Response Selector. For example, it can
+Postprocessor service can rewrite an utterance selected by the Response Selector. For example, it can
 take a user's name from the state and add it to the final answer.
 
-It simply should return a rewritten answer. The rewritten answer will go the ``text`` field of the final
-utterance shown to the user, and the original skill answer will go to the ``orig_text`` field.
+If the Postprocessor modifies a response, the new version goes to the ``text`` field of the final
+utterance and is shown to the user, and the utterance selected by the Response Selector goes to the ``orig_text`` field.
 
  .. code:: json
 
         "Goodbye, Joe!"
 
 
-.. _dialog-pipeline: https://deeppavlov-agent.readthedocs.io/en/latest/intro/overview.html#architecture-overview
 .. _state: https://deeppavlov-agent.readthedocs.io/en/latest/_static/api.html
-.. _config file: https://github.com/deepmipt/dp-agent/blob/master/config.py
-.. _formatters: https://github.com/deepmipt/dp-agent/blob/master/state_formatters/dp_formatters.py
-.. _user_state_api: https://deeppavlov-agent.readthedocs.io/en/latest/api/user_state_api.html
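
The Skill section of ``services_http_api.rst`` above requires a list of dicts in which every candidate carries ``text`` and ``confidence`` keys. A minimal sketch of that check follows; ``validate_skill_response`` and ``REQUIRED_KEYS`` are hypothetical names used only for illustration and are not part of dp-agent.

```python
from typing import Any, Dict, List

# Keys the documentation marks as required for every candidate response.
REQUIRED_KEYS = ("text", "confidence")


def validate_skill_response(response: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: check a skill response against the documented shape."""
    if not isinstance(response, list):
        raise ValueError("skill response must be a list of dicts")
    for index, candidate in enumerate(response):
        if not isinstance(candidate, dict):
            raise ValueError(f"candidate {index} is not a dict")
        missing = [key for key in REQUIRED_KEYS if key not in candidate]
        if missing:
            raise ValueError(f"candidate {index} is missing keys: {missing}")
    return response


# Both documented variants pass: the minimal one and the extended one.
minimal = [{"text": "hello", "confidence": 0.33}]
extended = [{"text": "hello", "confidence": 0.33,
             "human_attributes": {"name": "Vasily"}}]
validate_skill_response(minimal)
validate_skill_response(extended)
```

Optional keys such as ``human_attributes``, ``bot_attributes`` or any additional key are deliberately not checked, since the documentation allows a skill to send them freely.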
-
diff --git a/docs/source/api/user_state_api.rst b/docs/source/api/user_state_api.rst
@@ -8,7 +8,7 @@ the utterance, refer to the ``user.user_type`` field:
 
         "utterances": [{"user": {"user_type": "human"}}]
 
-A `Skill <skill>`__  can update any fields in **User** (**Human** or **Bot**) objects. If a **Skill** updates a **Human**,
+A skill can update any fields in **User** (**Human** or **Bot**) objects. If a **Skill** updates a **Human**,
 the **Human** fields will be changed in this utterance accordingly. If a **Skill** updates a **Bot**, the **Bot** fields will be
 changed in the *next* (generated by the bot) utterance.
 
@@ -21,5 +21,3 @@ The history of all changes made by skills to users can be looked up at the list
     .. code:: javascript
 
         "utterances": [{"user": {"user_type": "human"}, "hypotheses": []}]
-
-.. _skill: https://deeppavlov-agent.readthedocs.io/en/latest/api/services_http_api.html#skill
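
One possible reading of the update rule for ``human_attributes`` and ``bot_attributes`` (known fields are overwritten, unknown ones are stored under ``attributes``) can be sketched as below. ``apply_attributes`` and the simplified user dicts are assumptions made for illustration; the real state manager in dp-agent may behave differently.

```python
from typing import Any, Dict


def apply_attributes(user: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge skill-provided attributes into a user dict."""
    for key, value in updates.items():
        if key in user:
            # The user object already has this field: overwrite it.
            user[key] = value
        else:
            # Unknown field: keep it under the generic ``attributes`` key.
            user.setdefault("attributes", {})[key] = value
    return user


# Simplified, assumed shape of a Bot user object.
bot = {"user_type": "bot", "persona": [], "attributes": {}}
apply_attributes(bot, {"persona": ["I like swimming."], "mood": "happy"})
```

In this sketch ``persona`` is replaced in place, while ``mood`` ends up in ``bot["attributes"]`` because the bot dict has no such field.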
diff --git a/docs/source/config/config.rst b/docs/source/config/config.rst
@@ -0,0 +1,147 @@
+Agent Configuration
+======================
+
+You can provide the pipeline and database configuration for the agent in config files. Both ``json`` and ``yml`` formats are accepted.
+
+**Config Description**
+
+**Database**
+
+Database configuration parameters are provided via the ``db_conf`` file. Currently, the agent runs on MongoDB.
+
+All default values are taken from `Mongo DB documentation <https://docs.mongodb.com/manual/>`__. Please refer to these docs if you need to
+change anything.
+
+Sample database config:
+
+    .. code-block:: json
+
+        {
+            "env": false,
+            "host": "mongo",
+            "port": 27017,
+            "name": "dp_agent"
+        }
+
+* **env**
+    * If set to **false** (or omitted), the exact parameter values are used for database initialisation. Otherwise, the agent treats each parameter value as the name of an environment variable and tries to read the actual value from it.
+* **host**
+    * The database host, or the name of the environment variable that stores the database host name
+* **port**
+    * The database port, or the name of the environment variable that stores the database port
+* **name**
+    * The name of the database, or the name of the environment variable that stores the database name
+
+
+**Pipeline**
+
+Pipeline configuration parameters are provided via the ``pipeline_conf`` file. The config has two sections, which are used to configure Connectors and Services.
+
+**Services Config**
+
+A service represents a single node of the pipeline graph, or a single step in the processing of a user message.
+In ``pipeline_conf`` all services are grouped under the "service" key.
+Sample service config:
+
+    .. code-block:: json
+
+        {"group_name": {
+                "service_label": {
+                    "dialog_formatter": "dialog formatter",
+                    "response_formatter": "response formatter",
+                    "connector": "used connector",
+                    "previous_services": "list of previous services",
+                    "required_previous_services": "list of previous services",
+                    "state_manager_method": "associated state manager method",
+                    "tags": "list of tags"
+                }
+            }
+        }
+
+* **group_name**
+    * This is an optional key. If it is present, you can refer to services by their group name (in ``previous_services`` and ``required_previous_services``)
+    * If ``group_name`` is present, the actual service name is ``<group name>.<service label>``
+* **service_label**
+    * Label of the service. Used as unique service name, if service is not grouped
+    * Passed to state manager method, associated with service. So, service_label is saved in state
+* **dialog_formatter**
+    * Function that extracts all the needed information from a dialog and generates a list of tasks to send to services
+    * Can be configured in ``<python module name>:<function name>`` format
+    * A formatter can produce several tasks from one dialog (for instance, when you want to annotate all hypotheses)
+    * Each task represents a single valid request payload, which the service can process without further formatting
+* **response_formatter**
+    * Function that reformats a service response into a form suitable for saving in the dialog state
+    * Can be configured in ``<python module name>:<function name>`` format
+    * Optional parameter. If it is not present, the exact service output is sent to the state manager method
+* **connector**
+    * Function, which represents a connector to service. Can be configured here, or in Connectors
+    * You can link a connector from `connectors` section by typing ``connectors.<connector name>``
+* **previous_services**
+    * List of names of services, which should be completed (or skipped, or respond with an error) before sending data to the current service
+    * Should contain either groups names or service names
+* **required_previous_services**
+    * List of names of services, which must be correctly completed before this service since their results are used in current service
+    * If at least one of the required_previous_services is skipped or finishes with an error, the current service will be skipped too
+    * Should contain either groups names or service names
+* **state_manager_method**
+    * Name of the method of a StateManager class, which will be executed afterwards
+* **tags**
+    * Tags, associated with the service
+    * Currently, tags are used in order to separate a service with specific behaviour
+    * **selector** - this tag marks a skill selector service. It returns a list of skills, which are selected for further processing
+    * **timeout** - this tag marks a timeout service, which engages if a deadline timestamp is present and the processing time exceeds it
+    * **last_chance** - this tag marks a last chance service, which engages if other services in the pipeline have finished with an error and further processing has become impossible
+
+**Connectors config**
+
+A connector represents a function to which tasks are sent for processing. It can be an implementation of a data transfer protocol or a model implemented in Python.
+Since the agent is based on asynchronous execution and can be slowed down by blocking synchronous parts, it is strongly advised to implement computationally heavy services separately from the agent and to use a protocol (such as HTTP) for data transfer.
+
+There are several ways to configure a connector:
+
+1. *Built-in HTTP*
+
+    .. code:: json
+
+        {"connector name": {
+                "protocol": "http",
+                "url": "connector url",
+                "batch_size": "batch size for the service"
+            }
+        }
+
+    * **connector name**
+        * A name of the connector. Used in `services` part of the config, in order to associate service with the connector
+    * **protocol**
+        * http
+    * **url**
+        * Actual url, where an external service api is accessible. Should be in format ``http://<host>:<port>/<path>``
+    * **batch_size**
+        * The maximum number of tasks sent to a service in one batch. If it is not present, it is interpreted as 1
+        * If the value is 1, an `HTTPConnector <https://github.com/deepmipt/dp-agent/blob/master/deeppavlov_agent/core/connectors.py#L10>`__ class is used.
+        * If the value is more than one, the agent uses `AioQueueConnector <https://github.com/deepmipt/dp-agent/blob/master/deeppavlov_agent/core/connectors.py#L32>`__. That connector sends data to an asyncio queue. At the same time, a `QueueListenerBatchifyer <https://github.com/deepmipt/dp-agent/blob/master/deeppavlov_agent/core/connectors.py#L40>`__ worker collects data from the queue, assembles batches and sends them to a service.
+
+
+2. *Python class*
+
+    .. code:: json
+
+        {"connector name": {
+                "protocol": "python",
+                "class_name": "class name in 'python module name:class name' format",
+                "other parameter 1": "",
+                "other parameter 2": ""
+            }
+        }
+
+    * **connector name**
+        * Same as in HTTP connector case
+    * **protocol**
+        * python
+    * **class_name**
+        * Path to the connector's class in ``<python module name>:<class name>`` format
+            * Connector's class should implement asynchronous ``send(self, payload: Dict, callback: Callable)`` method
+            * ``payload`` represents a single task, provided by the dialog formatter associated with the service, alongside its ``task_id``: :code:`{'task_id': some_uuid, 'payload': dialog_formatter_task_data}`
+            * ``callback`` is an asynchronous function `process <https://github.com/deepmipt/dp-agent/blob/master/deeppavlov_agent/core/agent.py#L58>`__. Call it with the service response and ``task_id`` after processing
+    * **other parameters**
+        * Any json compatible parameters, which will be passed to the connector class initialisation as ``**kwargs``
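
The *Python class* connector described in ``config.rst`` above can be sketched roughly as follows. ``EchoConnector``, its ``prefix`` parameter and ``fake_process`` are hypothetical, and the keyword names ``task_id`` and ``response`` passed to the callback are an assumption: the documentation only states that the callback is called with the service response and the task id.

```python
import asyncio
from typing import Any, Callable, Dict


class EchoConnector:
    """Hypothetical connector that answers each task by echoing its payload."""

    def __init__(self, **kwargs: Any) -> None:
        # "Other parameters" from the connector config arrive here as kwargs.
        self.prefix = kwargs.get("prefix", "echo: ")

    async def send(self, payload: Dict, callback: Callable) -> None:
        # payload has the documented shape: {'task_id': ..., 'payload': ...}
        task_data = payload["payload"]
        response = {"text": f"{self.prefix}{task_data}", "confidence": 1.0}
        # Assumed keyword names; check the agent's ``process`` signature.
        await callback(task_id=payload["task_id"], response=response)


async def demo() -> Dict[str, Any]:
    received: Dict[str, Any] = {}

    async def fake_process(task_id: str, response: Any) -> None:
        # Stand-in for the agent's asynchronous ``process`` callback.
        received[task_id] = response

    connector = EchoConnector(prefix="you said: ")
    await connector.send({"task_id": "task-1", "payload": "hi"}, fake_process)
    return received


result = asyncio.run(demo())
```

Such a class would be referenced from the connector config through ``class_name`` in ``<python module name>:<class name>`` format, with ``prefix`` supplied as one of the other parameters. Because ``send`` runs inside the agent's event loop, any heavy computation in it would block other tasks, which is why the documentation recommends moving heavy services behind a protocol such as HTTP.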
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -1,7 +1,7 @@
 Welcome to DeepPavlov Agent documentation!
 ==========================================
 
-**DeepPavlov Agent** is a platform for creating multi-skill chatbots.
+**DeepPavlov Agent** is a framework for the development of scalable, production-ready multi-skill virtual assistants, complex dialogue systems and chatbots.
 
 .. toctree::
    :glob:
@@ -34,4 +34,10 @@ Welcome to DeepPavlov Agent documentation!
    :maxdepth: 2
    :caption: State Formatters
 
-   state_formatters/formatters
+   state_formatters/formatters
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Configuration files
+
+   config/config