
Commit cba70a9

Merge 959919a into 8b864e1

ricwo committed Oct 16, 2018
2 parents 8b864e1 + 959919a
Showing 10 changed files with 494 additions and 111 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -19,6 +19,8 @@ Added
- Command line interface for interactive learning now displays policy confidence alongside the action name
- added action prediction confidence & policy to ``ActionExecuted`` event
- both the date and the time at which a model was trained are now included in the policy's metadata when it is persisted
- option for end-to-end evaluation of Rasa Core and NLU examples in
``evaluate.py`` script


Changed
17 changes: 17 additions & 0 deletions data/test_evaluations/end_to_end_story.md
@@ -0,0 +1,17 @@
## simple_story_with_only_start
> check_greet <!-- checkpoints at the start define entry points -->
* default:/default
- utter_default

## simple_story_with_only_end
* greet:/greet
- utter_greet
> check_greet <!-- checkpoint defining the end of this turn -->

## simple_story_with_multiple_turns
* greet:/greet
- utter_greet
* default:/default
- utter_default
* goodbye:/goodbye
- utter_goodbye
60 changes: 57 additions & 3 deletions docs/evaluation.rst
@@ -5,6 +5,12 @@
Evaluating and Testing
======================

.. note::

   If you're looking to evaluate both Rasa NLU and Rasa Core predictions
   combined, take a look at the section on
   :ref:`end-to-end evaluation <end_to_end_evaluation>`.

Evaluating a Trained Model
--------------------------

@@ -13,7 +19,7 @@ by using the evaluate script:

.. code-block:: bash

   $ python -m rasa_core.evaluate -d models/dialogue \
      -s test_stories.md -o matrix.pdf --failed failed_stories.md
@@ -26,12 +32,60 @@
In addition, this will save a confusion matrix to a file called
``matrix.pdf``. The confusion matrix shows, for each action in your
domain, how often that action was predicted, and how often an
incorrect action was predicted instead.
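
The stories the model got wrong are written to the ``--failed`` file in
regular story format, with the wrong prediction annotated. As a rough
sketch (story and action names here are invented for illustration, and
the exact annotation may vary between versions), a failed story could
look like:

.. code-block:: story

   ## story_greet_then_goodbye
   * greet
      - utter_greet   <!-- predicted: utter_default -->
   * goodbye
      - utter_goodbye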



The full list of options for the script is:

.. program-output:: python -m rasa_core.evaluate -h

.. _end_to_end_evaluation:

End-to-end evaluation of Rasa NLU and Core
------------------------------------------

Say your bot uses a dialogue model in combination with a Rasa NLU model
to parse user messages, and you would like to evaluate how the two
models perform together on whole dialogues.
The evaluate script lets you evaluate dialogues end-to-end, combining
Rasa NLU intent predictions with Rasa Core action predictions.
You can activate this feature with the ``--e2e`` option in the
``rasa_core.evaluate`` module.

The story format used for end-to-end evaluation is slightly different from
the standard Rasa Core stories, as you'll have to include the user
messages in natural language instead of just their intent. The format for the
user messages is ``* <intent>:<Rasa NLU example>``. The NLU part follows the
`markdown syntax for Rasa NLU training data
<https://rasa.com/docs/nlu/dataformat/#markdown-format>`_.

Here's an example of what an end-to-end story file may look like:

.. code-block:: story

   ## end-to-end story 1
   * greet: hello
      - utter_ask_howcanhelp
   * inform: show me [chinese](cuisine) restaurants
      - utter_ask_location
   * inform: in [Paris](location)
      - utter_ask_price

   ## end-to-end story 2
   ...

If you've saved these stories under ``e2e_stories.md``,
the full end-to-end evaluation command is:

.. code-block:: bash

   $ python -m rasa_core.evaluate -d models/dialogue --nlu models/nlu/current \
      -s e2e_stories.md --e2e

.. note::

   Make sure you specify an NLU model to load with the dialogue model using
   the ``--nlu`` option of ``rasa_core.evaluate``. If you do not specify an
   NLU model, Rasa Core will load the default ``RegexInterpreter``.
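
This default is why the test file ``end_to_end_story.md`` above writes its
user messages as slash-prefixed intents: the ``RegexInterpreter`` only
parses messages of that form. A minimal end-to-end story for evaluation
without ``--nlu`` would therefore look like this sketch (story name
invented for illustration):

.. code-block:: story

   ## regex-style story
   * greet:/greet
      - utter_greet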


Comparing Policies
------------------
