Merge pull request #1283 from RasaHQ/duckling-http-docs
added duckling http docs
tmbo committed Aug 2, 2018
2 parents c4f8d64 + 31ee533 commit 92ec05c
Showing 5 changed files with 50 additions and 33 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.rst
@@ -34,6 +34,7 @@ Added
- the ``/version`` endpoint returns a new field ``minimum_compatible_version``
- added logging of intent prediction errors to evaluation script
- added histogram of confidence scores to evaluation script
- documentation for the ``ner_duckling_http`` component

Changed
-------
@@ -47,6 +48,9 @@ Changed
Removed
-------
- dependence on spaCy when training ``ner_crf`` without POS features
- documentation for the ``ner_duckling`` component - Facebook doesn't maintain
  the underlying Clojure version of duckling anymore. The component will be
  removed in the next release.

Fixed
-----
25 changes: 11 additions & 14 deletions docs/entities.rst
@@ -4,14 +4,14 @@ Entity Extraction
=================


================  ================  ========================  ===================================
Component         Requires          Model                     notes
================  ================  ========================  ===================================
``ner_crf``       sklearn-crfsuite  conditional random field  good for training custom entities
``ner_spacy``     spaCy             averaged perceptron       provides pre-trained entities
``ner_duckling``  duckling          context-free grammar      provides pre-trained entities
``ner_mitie``     MITIE             structured SVM            good for training custom entities
================  ================  ========================  ===================================
=======================  ================  ========================  ===================================
Component                Requires          Model                     notes
=======================  ================  ========================  ===================================
``ner_crf``              sklearn-crfsuite  conditional random field  good for training custom entities
``ner_spacy``            spaCy             averaged perceptron       provides pre-trained entities
``ner_duckling_http``    running duckling  context-free grammar      provides pre-trained entities
``ner_mitie``            MITIE             structured SVM            good for training custom entities
=======================  ================  ========================  ===================================


Custom Entities
@@ -48,12 +48,9 @@ objects that you can use, e.g.
The list of supported languages is `here <https://github.com/facebook/duckling/tree/master/Duckling/Dimensions>`_.
Duckling can also handle durations like "two hours",
amounts of money, distances, and ordinals.
Fortunately, there is also a
`python wrapper <https://github.com/FraBle/python-duckling>`_ for
duckling! You can use this component by installing the duckling
package from PyPI and adding ``ner_duckling`` to your pipeline.
Alternatively, you can run duckling separately (natively or in a docker container)
and use the ``ner_duckling_http`` component.
Fortunately, there is a ready-to-use duckling docker container
that you just need to spin up and connect to Rasa NLU
(see :ref:`ner_duckling_http`).


Regular Expressions (regex)
18 changes: 10 additions & 8 deletions docs/evaluation.rst
@@ -91,21 +91,23 @@ Improving the quality of your training data will move the histogram bars to the

Entity Extraction
-----------------
For each entity extractor, the evaluation script logs its performance per entity type in your training data.
So if you use ``ner_crf`` and ``ner_duckling`` in your pipeline, it will log two evaluation tables
For each entity extractor, the evaluation script
logs its performance per entity type in your training data.
So if you use ``ner_crf`` and ``ner_duckling_http``
in your pipeline, it will log two evaluation tables
containing recall, precision, and f1 measure for each entity type.

In the case of ``ner_duckling`` we actually run the evaluation for each defined
duckling dimension. If you use the ``time`` and ``ordinal`` dimensions, you would
get two evaluation tables: one for ``ner_duckling (Time)`` and one for
``ner_duckling (Ordinal)``.
In the case of ``ner_duckling_http`` we actually run the evaluation for
each defined duckling dimension. If you use the ``time`` and ``ordinal``
dimensions, you would get two evaluation tables: one for
``ner_duckling_http (Time)`` and one for ``ner_duckling_http (Ordinal)``.

``ner_synonyms`` does not create an evaluation table, because it only changes the value of the found
entities and does not find entity boundaries itself.

Finally, keep in mind that entity types in your testing data have to match the output
of the extraction components. This is particularly important for ``ner_duckling``, because it is not
fit to your training data.
of the extraction components. This is particularly important for
``ner_duckling_http``, because it is not fit to your training data.
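
To illustrate why matching entity type names matters, here is a minimal sketch
of per-entity-type precision, recall and f1. It is a simplified span-level
calculation written for illustration only and is not how the Rasa NLU
evaluation script computes its tables. The function name ``per_type_scores``
and the ``(start, end, entity_type)`` tuple format are assumptions.

```python
from collections import Counter


def per_type_scores(gold, predicted):
    """Span-level precision/recall/f1 per entity type (hypothetical helper).

    gold, predicted: iterables of (start, end, entity_type) tuples.
    A prediction only counts as correct if span and type both match exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = Counter(t for (_, _, t) in gold_set & pred_set)
    n_gold = Counter(t for (_, _, t) in gold_set)
    n_pred = Counter(t for (_, _, t) in pred_set)

    scores = {}
    for t in sorted(set(n_gold) | set(n_pred)):
        precision = true_pos[t] / n_pred[t] if n_pred[t] else 0.0
        recall = true_pos[t] / n_gold[t] if n_gold[t] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[t] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# The test data labels the date as "time", but the extractor reports "Time".
gold = [(0, 8, "time"), (12, 15, "ordinal")]
predicted = [(0, 8, "Time"), (12, 15, "ordinal")]
scores = per_type_scores(gold, predicted)
```

In this sketch the mismatched name is counted twice: ``time`` gets zero recall
and ``Time`` gets zero precision, even though the extracted span is identical.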


Entity Scoring
2 changes: 0 additions & 2 deletions docs/migrations.rst
@@ -12,8 +12,6 @@ how you can migrate from one version to another.
Unfortunately, it is not possible to load previously trained models as
the parameters for the tensorflow and CRF models changed.



0.11.x to 0.12.0
----------------

34 changes: 25 additions & 9 deletions docs/pipeline.rst
@@ -760,11 +760,12 @@ ner_crf
# Specifies the L2 regularization coefficient.
L2_c: 0.1
.. _section_pipeline_duckling:
.. _ner_duckling_http:

ner_duckling
~~~~~~~~~~~~
:Short: Adds duckling support to the pipeline to unify entity types (e.g. to retrieve common date / number formats)
ner_duckling_http
~~~~~~~~~~~~~~~~~
:Short: Duckling lets you extract common entities like dates,
amounts of money, distances, and others in a number of languages.
:Outputs: appends ``entities``
:Output-Example:

@@ -776,10 +777,17 @@ ner_duckling
"start": 48,
"value": "2017-04-10T00:00:00.000+02:00",
"confidence": 1.0,
"extractor": "ner_duckling"}]
"extractor": "ner_duckling_http"}]
}
:Description:
To use this component you need to run a duckling server. The easiest
option is to spin up a docker container using
``docker run -p 8000:8000 rasa/duckling``.

Alternatively, you can install duckling directly on your
`machine and start the server <https://github.com/facebook/duckling#quickstart>`_.

Duckling lets you recognize dates, numbers, distances and other structured entities
and normalizes them (for a reference of all available entities
see `the duckling documentation <https://duckling.wit.ai/#getting-started>`_).
@@ -792,16 +800,24 @@ ner_duckling
based system.

:Configuration:
Configure which dimensions, i.e. entity types, the :ref:`duckling component <section_pipeline_duckling>` should extract.
A full list of available dimensions can be found in the `duckling documentation <https://duckling.wit.ai/>`_.
Configure which dimensions, i.e. entity types, the duckling component
should extract. A full list of available dimensions can be found in
the `duckling documentation <https://duckling.wit.ai/>`_.

.. code-block:: yaml
pipeline:
- name: "ner_duckling"
- name: "ner_duckling_http"
# url of the running duckling server
url: "http://localhost:8000"
# dimensions to extract
dimensions: ["time", "number", "amount-of-money", "distance"]
# allows you to configure the locale, by default the language is
# used
locale: "de_DE"
# if not set the default timezone of Duckling is going to be used
# needed to calculate dates from relative expressions like "tomorrow"
timezone: "Europe/Berlin"
.. _section_component_lifecycle:
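
As a rough illustration of what a client for the running duckling server might
look like, the sketch below posts text to the server and reshapes the matches
into the entity format from the output example. It is not the component's
actual implementation. It assumes the server exposes a ``/parse`` endpoint that
accepts form-encoded ``text``, ``locale`` and ``tz`` fields and returns a JSON
list of matches with ``body``, ``start``, ``end``, ``dim`` and ``value`` keys,
as Duckling's example server does. The helper names ``build_payload``,
``to_entities`` and ``query_duckling`` are hypothetical.

```python
import json
from urllib import parse, request


def build_payload(text, locale="de_DE", timezone="Europe/Berlin"):
    """Form-encode the fields sent to the (assumed) /parse endpoint."""
    fields = {"text": text, "locale": locale}
    if timezone:
        fields["tz"] = timezone
    return parse.urlencode(fields).encode("utf-8")


def to_entities(matches, dimensions):
    """Keep only configured dimensions and reshape them like the output example."""
    entities = []
    for match in matches:
        if match["dim"] not in dimensions:
            continue
        entities.append({
            "entity": match["dim"],
            "start": match["start"],
            "end": match["end"],
            "text": match["body"],
            "value": match["value"].get("value"),
            "confidence": 1.0,
            "extractor": "ner_duckling_http",
        })
    return entities


def query_duckling(text, url="http://localhost:8000", dimensions=("time",)):
    """POST the text to a running duckling server (requires the container to be up)."""
    req = request.Request(url + "/parse", data=build_payload(text))
    with request.urlopen(req) as response:
        matches = json.loads(response.read().decode("utf-8"))
    return to_entities(matches, dimensions)


# Offline demonstration with a hand-written response (illustrative only,
# not captured from a real server).
sample = [
    {"body": "tomorrow", "start": 0, "end": 8, "dim": "time",
     "value": {"value": "2017-04-10T00:00:00.000+02:00"}},
    {"body": "two", "start": 12, "end": 15, "dim": "number",
     "value": {"value": 2}},
]
entities = to_entities(sample, dimensions=["time"])
```

With the docker container from the description running, ``query_duckling("tomorrow")``
would send the request to ``http://localhost:8000``. Filtering by dimension on the
client side keeps the sketch independent of any server-side filtering parameters.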
