
Commit cba70a9

Merge 959919a into 8b864e1

ricwo committed Oct 16, 2018
2 parents 8b864e1 + 959919a
Showing 10 changed files with 494 additions and 111 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -19,6 +19,8 @@ Added
- Command line interface for interactive learning now displays policy confidence alongside the action name
- added action prediction confidence & policy to ``ActionExecuted`` event
- both the date and the time at which a model was trained are now included in the policy's metadata when it is persisted
- option for end-to-end evaluation of Rasa Core and NLU examples in
``evaluate.py`` script


Changed
17 changes: 17 additions & 0 deletions data/test_evaluations/end_to_end_story.md
@@ -0,0 +1,17 @@
## simple_story_with_only_start
> check_greet <!-- checkpoints at the start define entry points -->
* default:/default
- utter_default

## simple_story_with_only_end
* greet:/greet
- utter_greet
> check_greet <!-- checkpoint defining the end of this turn -->

## simple_story_with_multiple_turns
* greet:/greet
- utter_greet
* default:/default
- utter_default
* goodbye:/goodbye
- utter_goodbye
60 changes: 57 additions & 3 deletions docs/evaluation.rst
@@ -5,6 +5,12 @@
Evaluating and Testing
======================

.. note::

   If you're looking to evaluate both Rasa NLU and Rasa Core predictions
   combined, take a look at the section on
   :ref:`end-to-end evaluation <end_to_end_evaluation>`.

Evaluating a Trained Model
--------------------------

@@ -13,7 +19,7 @@ by using the evaluate script:

.. code-block:: bash

   $ python -m rasa_core.evaluate -d models/dialogue \
      -s test_stories.md -o matrix.pdf --failed failed_stories.md
@@ -26,12 +32,60 @@
In addition, this will save a confusion matrix to a file called
``matrix.pdf``. The confusion matrix shows, for each action in your
domain, how often that action was predicted, and how often an
incorrect action was predicted instead.
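
The stories the model got wrong are written to the ``--failed`` file in
regular story format, with the wrong prediction annotated. As a rough
sketch (story and action names here are invented for illustration, and
the exact annotation may vary between versions), a failed story could
look like:

.. code-block:: story

   ## story_greet_then_goodbye
   * greet
      - utter_greet   <!-- predicted: utter_default -->
   * goodbye
      - utter_goodbye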



The full list of options for the script is:

.. program-output:: python -m rasa_core.evaluate -h

.. _end_to_end_evaluation:

End-to-end evaluation of Rasa NLU and Core
------------------------------------------

Say your bot uses a dialogue model in combination with a Rasa NLU model
to parse user messages, and you would like to evaluate how the two
models perform together on whole dialogues.
The evaluate script lets you evaluate dialogues end-to-end, combining
Rasa NLU intent predictions with Rasa Core action predictions.
You can activate this feature with the ``--e2e`` option in the
``rasa_core.evaluate`` module.

The story format used for end-to-end evaluation is slightly different from
the standard Rasa Core stories, as you'll have to include the user
messages in natural language instead of just their intent. The format for the
user messages is ``* <intent>:<Rasa NLU example>``. The NLU part follows the
`markdown syntax for Rasa NLU training data
<https://rasa.com/docs/nlu/dataformat/#markdown-format>`_.

Here's an example of what an end-to-end story file may look like:

.. code-block:: story

   ## end-to-end story 1
   * greet: hello
      - utter_ask_howcanhelp
   * inform: show me [chinese](cuisine) restaurants
      - utter_ask_location
   * inform: in [Paris](location)
      - utter_ask_price

   ## end-to-end story 2
   ...

If you've saved these stories under ``e2e_stories.md``,
the full end-to-end evaluation command is:

.. code-block:: bash

   $ python -m rasa_core.evaluate -d models/dialogue --nlu models/nlu/current \
      -s e2e_stories.md --e2e

.. note::

   Make sure you specify an NLU model to load with the dialogue model using
   the ``--nlu`` option of ``rasa_core.evaluate``. If you do not specify an
   NLU model, Rasa Core will load the default ``RegexInterpreter``.
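
This default is why the test file ``end_to_end_story.md`` above writes its
user messages as slash-prefixed intents: the ``RegexInterpreter`` only
parses messages of that form. A minimal end-to-end story for evaluation
without ``--nlu`` would therefore look like this sketch (story name
invented for illustration):

.. code-block:: story

   ## regex-style story
   * greet:/greet
      - utter_greet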


Comparing Policies
------------------
