Commit
Merge 898e30a into 59e120d
DominikRos committed Feb 12, 2019
2 parents 59e120d + 898e30a commit a2d4b42
Showing 21 changed files with 169 additions and 163 deletions.
30 changes: 15 additions & 15 deletions docs/choosing_pipeline.rst
@@ -1,4 +1,6 @@
:desc: Set up a pipeline of pre-trained word vectors from GloVe or fastText
      or fit them specifically on your dataset using the tensorflow pipeline
      for open source NLU.

.. _choosing_pipeline:

@@ -8,7 +10,7 @@ Choosing a Rasa NLU Pipeline
The Short Answer
----------------

If you have fewer than 1000 total training examples, and there is a spaCy model for your
language, use the ``spacy_sklearn`` pipeline:

.. literalinclude:: ../sample_configs/config_spacy.yml
@@ -38,39 +40,39 @@ doesn't use any pre-trained word vectors, but instead fits these specifically for
The advantage of the ``spacy_sklearn`` pipeline is that if you have a training example like:
"I want to buy apples", and Rasa is asked to predict the intent for "get pears", your model
already knows that the words "apples" and "pears" are very similar. This is especially useful
if you don't have very much training data.

The advantage of the ``tensorflow_embedding`` pipeline is that your word vectors will be customised
for your domain. For example, in general English, the word "balance" is closely related to "symmetry",
but very different to the word "cash". In a banking domain, "balance" and "cash" are closely related
and you'd like your model to capture that. This pipeline doesn't use a language-specific model,
so it will work with any language that you can tokenize (on whitespace or using a custom tokenizer).
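Putting this into a model configuration is a one-liner with the pre-configured template (a sketch; the shipped sample config files are the authoritative reference):

```yaml
# Minimal tensorflow_embedding configuration (illustrative)
language: "en"
pipeline: "tensorflow_embedding"
```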

You can read more about this topic `here <https://medium.com/rasa-blog/supervised-word-vectors-from-scratch-in-rasa-nlu-6daf794efcd8>`_.


There are also the ``mitie`` and ``mitie_sklearn`` pipelines, which use MITIE as a source of word vectors.
We do not recommend that you use these; they are likely to be deprecated in a future release.

.. note::

Intent classification is independent of entity extraction. So sometimes
NLU will get the intent right but entities wrong, or the other way around.
You need to provide enough data for both intents and entities.


Multiple Intents
----------------

If you want to split intents into multiple labels,
e.g. for predicting multiple intents or for modeling hierarchical intent structure,
you can only do this with the tensorflow pipeline.
To do this, use these flags:

- ``intent_tokenization_flag``: if ``true`` the algorithm will split the intent labels into tokens and use a bag-of-words representation for them;
- ``intent_split_symbol``: sets the delimiter string used to split the intent labels. Defaults to ``_``.
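For illustration, both flags might be combined in a model configuration like this (a sketch; the ``+`` split symbol is an arbitrary choice for this example):

```yaml
language: "en"
pipeline:
- name: "tokenizer_whitespace"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
```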

`Here <https://blog.rasa.com/how-to-handle-multiple-intents-per-input-using-rasa-nlu-tensorflow-pipeline/>`_ is a tutorial on how to use multiple intents in Rasa Core and NLU.

Here's an example configuration:

@@ -93,7 +95,7 @@ In Rasa NLU, incoming messages are processed by a sequence of components.
These components are executed one after another
in a so-called processing pipeline. There are components for entity extraction, for intent classification,
pre-processing, and others. If you want to add your own component, for example to run a spell-check or to
do sentiment analysis, check out :ref:`section_customcomponents`.
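The execute-one-after-another idea can be sketched in plain Python (illustrative only; this is not Rasa's actual ``Component`` API):

```python
# Illustrative sketch of a processing pipeline: each component reads the
# message and adds its own output for later components to use.
class WhitespaceTokenizer:
    def process(self, message):
        message["tokens"] = message["text"].split()

class IntentClassifier:
    def process(self, message):
        # A later component can rely on the "tokens" produced earlier.
        message["intent"] = "greet" if "hello" in message["tokens"] else "other"

def run_pipeline(components, text):
    message = {"text": text}
    for component in components:
        component.process(message)
    return message

result = run_pipeline([WhitespaceTokenizer(), IntentClassifier()], "hello there")
```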

Each component processes the input and creates an output. The output can be used by any component that comes after
this component in the pipeline. There are components which only produce information that is used by other components
@@ -180,7 +182,7 @@ exactly. Instead it will return the trained synonym.
Pre-configured Pipelines
------------------------

A template is just a shortcut for
a full list of components. For example, these two configurations are equivalent:

.. literalinclude:: ../sample_configs/config_spacy.yml
@@ -255,7 +257,7 @@ default is to use a simple whitespace tokenizer:
- name: "intent_classifier_tensorflow_embedding"
If you have a custom tokenizer for your language, you can replace the whitespace
tokenizer with something more accurate.

.. _section_mitie_pipeline:

@@ -320,5 +322,3 @@ If you want to use custom components in your pipeline, see :ref:`section_customcomponents`.


.. include:: feedback.inc


20 changes: 10 additions & 10 deletions docs/components.rst
@@ -1,10 +1,12 @@
:desc: Configure the custom components of your ML model to optimise the
      processes performed on the user input of your contextual assistant.

.. _section_pipeline:

Component Configuration
=======================

This is a reference of the configuration options for every built-in component in
Rasa NLU. If you want to build a custom component, check out :ref:`section_customcomponents`.

.. contents::
@@ -117,13 +119,13 @@ intent_featurizer_count_vectors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Short: Creates bag-of-words representation of intent features
:Outputs:
    nothing, used as an input to intent classifiers that
    need bag-of-words representation of intent features
    (e.g. ``intent_classifier_tensorflow_embedding``)
:Description:
Creates bag-of-words representation of intent features using
`sklearn's CountVectorizer <http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html>`_.
All tokens which consist only of digits (e.g. 123 and 99 but not a123d) will be assigned to the same feature.
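The digit behaviour can be illustrated with a small sketch (the ``__NUMBER__`` placeholder is made up for this example; it is not the featurizer's internal name):

```python
import re

def normalize_tokens(tokens):
    # Tokens consisting only of digits collapse onto one shared feature,
    # mirroring the behaviour described above (123 and 99 match, a123d does not).
    return ["__NUMBER__" if re.fullmatch(r"\d+", t) else t for t in tokens]

print(normalize_tokens(["send", "123", "99", "a123d"]))
# → ['send', '__NUMBER__', '__NUMBER__', 'a123d']
```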

.. note::
@@ -426,7 +428,7 @@ tokenizer_whitespace
:Description:
Creates a token for every whitespace separated character sequence. Can be used to define tokens for the MITIE entity
extractor.
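A whitespace tokenizer is simple enough to sketch directly (illustrative; the real component also records token offsets, which is what makes the tokens usable by entity extractors):

```python
def whitespace_tokenize(text):
    # One token per whitespace-separated character sequence, with
    # start/end character offsets into the original text.
    tokens, offset = [], 0
    for word in text.split():
        start = text.index(word, offset)
        tokens.append((word, start, start + len(word)))
        offset = start + len(word)
    return tokens

print(whitespace_tokenize("show me chinese restaurants"))
# → [('show', 0, 4), ('me', 5, 7), ('chinese', 8, 15), ('restaurants', 16, 27)]
```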

tokenizer_jieba
~~~~~~~~~~~~~~~~~~~~

@@ -678,7 +680,5 @@ ner_duckling_http
# needed to calculate dates from relative expressions like "tomorrow"
timezone: "Europe/Berlin"
.. include:: feedback.inc
18 changes: 9 additions & 9 deletions docs/config.rst
@@ -1,4 +1,6 @@
:desc: Read more on configuring the open source library Rasa NLU to access
      machine learning based prediction of intents and entities as a server.

.. _section_configuration:

Server Configuration
@@ -16,7 +18,7 @@ Server Configuration
In older versions of Rasa NLU, the server and models were configured with a single file.
Now, the server only takes command line arguments (see :ref:`server_parameters`).
The configuration file only refers to the model that you want to train,
i.e. the pipeline and components.


Running the server
@@ -49,21 +51,21 @@ from the same process & avoid duplicating the memory load.

As stated previously, Rasa NLU naturally handles serving multiple apps. By default the server
will load all projects found under the directory specified with the ``--path`` option,
unless you provide the ``--pre_load`` option to load only a specific project.

.. code-block:: console

    $ # This will load all projects under the projects/ directory
    $ python -m rasa_nlu.server -c config.yaml --path projects/

.. code-block:: console

    $ # This will load only the hotels project under the projects/ directory
    $ python -m rasa_nlu.server -c config.yaml --pre_load hotels --path projects/

The file structure under the ``--path`` directory is as follows:
@@ -135,5 +137,3 @@ CORS
By default CORS (cross-origin resource sharing) calls are not allowed. If you want to call your Rasa NLU server from another domain (for example from a training web UI) then you can whitelist that domain by adding it to the config value ``cors_origin``.

.. include:: feedback.inc


8 changes: 4 additions & 4 deletions docs/customcomponents.rst
@@ -1,5 +1,7 @@
:desc: Create custom components to add features like sentiment analysis
      and integrate them with the open source bot framework Rasa Stack.

.. _section_customcomponents:

Custom Components
=================
@@ -52,5 +54,3 @@ Component


.. include:: feedback.inc


26 changes: 13 additions & 13 deletions docs/dataformat.rst
@@ -1,4 +1,6 @@
:desc: Read more about how to format training data with Rasa NLU for open
      source natural language processing.

.. _section_dataformat:

Training Data Format
@@ -9,13 +11,13 @@ Data Format
~~~~~~~~~~~

You can provide training data as markdown or as json, as a single file or as a directory containing multiple files.
Note that markdown is usually easier to work with.


Markdown Format
---------------

Markdown is the easiest Rasa NLU format for humans to read and write.
Examples are listed using the unordered
list syntax, e.g. minus ``-``, asterisk ``*``, or plus ``+``.
Examples are grouped by intent, and entities are annotated as markdown links.
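A small illustrative snippet (the intent and entity names are made up for this example):

```md
## intent:restaurant_search
- show me [chinese](cuisine) restaurants
- find a place to eat in [Berlin](location)
```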
@@ -47,10 +49,10 @@ Examples are grouped by intent, and entities are annotated as markdown links.
path/to/currencies.txt
The training data for Rasa NLU is structured into different parts:
examples, synonyms, regex features, and lookup tables.

Synonyms will map extracted entities to the same name, for example mapping "my savings account" to simply "savings".
However, this only happens *after* the entities have been extracted, so you need to provide examples with the synonyms present so that Rasa can learn to pick them up.
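In the markdown format a synonym block might look like the following sketch, where the listed phrasings map to ``savings`` after extraction:

```md
## synonym:savings
- my savings account
- savings account
```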

Lookup tables may be specified either directly as lists or as txt files containing newline-separated words or phrases. Upon loading the training data, these files are used to generate case-insensitive regex patterns that are added to the regex features. For example, in this case a list of currency names is supplied so that it is easier to pick out this entity.

@@ -73,7 +75,7 @@ The most important one is ``common_examples``.
}
The ``common_examples`` are used to train your model. You should put all of your training
examples in the ``common_examples`` array.
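A minimal sketch of that structure (the example text and entity are made up):

```json
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "show me chinese restaurants",
        "intent": "restaurant_search",
        "entities": [
          {"start": 8, "end": 15, "value": "chinese", "entity": "cuisine"}
        ]
      }
    ]
  }
}
```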
Regex features are a tool to help the classifier detect entities or intents and improve the performance.


@@ -86,7 +88,7 @@ and after training a model. Luckily, there's a
for creating training data in rasa's format.
- created by `@azazdeaz <https://github.com/azazdeaz>`_ -
and it's also extremely helpful for inspecting and modifying existing data.
`Rasa Platform <https://rasa.com/products/rasa-platform>`_ (Rasa's commercial product) also has
a full-featured UI for annotating data.


@@ -101,8 +103,8 @@ data in the GUI before training.
Generating More Entity Examples
-------------------------------

It is sometimes helpful to generate a bunch of entity examples, for
example if you have a database of restaurant names. There are a couple
of great tools built by the community to help with that.

You can use `Chatito <https://rodrigopivi.github.io/Chatito/>`__ , a tool for generating training datasets in rasa's format using a simple DSL or `Tracy <https://yuukanoo.github.io/tracy>`__, a simple GUI to create training datasets for rasa.
@@ -297,7 +299,7 @@ you could have a folder called ``nlu_data``:
nlu_data/
├── restaurants.md
├── smalltalk.md
To train a model with this data, pass the path to the directory to the train script:

@@ -316,6 +318,4 @@ To train a model with this data, pass the path to the directory to the train script:


.. include:: feedback.inc
3 changes: 2 additions & 1 deletion docs/docker.rst
@@ -1,4 +1,5 @@
:desc: Set up Rasa NLU with Docker in your own infrastructure for local
      intent recognition and entity recognition.

.. _section_docker:

7 changes: 5 additions & 2 deletions docs/endpoint_configuration.rst
@@ -1,4 +1,7 @@
:desc: Add new endpoints to the configuration file of Rasa NLU to connect
      your APIs to integrate with open source NLU.

.. _section_endpoint_configuration:

Endpoint Configuration
======================
@@ -33,4 +36,4 @@ To use models from a model server, add this to your endpoint configuration:
model:
url: <path to your model>
token: <authentication token> # [optional]
token_name: <name of the token>  # [optional] (default: token)
28 changes: 14 additions & 14 deletions docs/entities.rst
@@ -1,4 +1,6 @@
:desc: Use open source named entity recognition like spacy and duckling
      for building contextual AI Assistants.

.. _section_entities:

Entity Extraction
@@ -19,8 +21,8 @@ Custom Entities
^^^^^^^^^^^^^^^

Almost every chatbot and voice app will have some custom entities.
In a restaurant bot, ``chinese`` is a cuisine, but in a language-learning app it would mean something very different.
The ``ner_crf`` component can learn custom entities in any language.


Extracting Places, Dates, People, Organisations
@@ -38,7 +40,7 @@ Dates, Amounts of Money, Durations, Distances, Ordinals

The `duckling <https://duckling.wit.ai/>`_ library does a great job
of turning expressions like "next Thursday at 8pm" into actual datetime
objects that you can use, e.g.

.. code-block:: python
@@ -47,8 +49,8 @@ The list of supported languages is `here <https://github.com/facebook/duckling/tree/master/Duckling/Dimensions>`_.
The list of supported langauges is `here <https://github.com/facebook/duckling/tree/master/Duckling/Dimensions>`_.
Duckling can also handle durations like "two hours",
amounts of money, distances, and ordinals.
Fortunately, there is a duckling docker container ready to use,
that you just need to spin up and connect to Rasa NLU.
(see :ref:`ner_duckling_http`)
@@ -59,11 +61,11 @@ Regular Expressions (regex)

You can use regular expressions to help the CRF model learn to recognize entities.
In the :ref:`section_dataformat` you can provide a list of regular expressions, each of which provides
the ``ner_crf`` with an extra binary feature, which says if the regex was found (1) or not (0).

For example, the names of German streets often end in ``strasse``. By adding this as a regex,
we tell the model to pay attention to words ending this way, and it will quickly learn to
associate them with a location entity.
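In the markdown training data this could be declared like so (a sketch; the name ``location`` is illustrative):

```md
## regex:location
- \w+strasse
```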

If you just want to match regular expressions exactly, you can do this in your code,
as a postprocessing step after receiving the response from Rasa NLU.
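Such a postprocessing step might look like this (an illustrative sketch; the pattern and entity name are assumptions, not part of Rasa NLU):

```python
import re

def add_regex_entities(text, entities, pattern=r"\w+strasse\b", entity="location"):
    # Append exact regex matches to the entity list returned by Rasa NLU.
    out = list(entities)
    for match in re.finditer(pattern, text):
        out.append({
            "start": match.start(),
            "end": match.end(),
            "value": match.group(),
            "entity": entity,
        })
    return out

result = add_regex_entities("Ich wohne in der Schulstrasse", [])
```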
@@ -103,13 +105,13 @@ Some extractors, like ``duckling``, may include additional information. For example:

.. code-block:: json
{
  "additional_info":{
    "grain":"day",
    "type":"value",
    "value":"2018-06-21T00:00:00.000-07:00",
    "values":[
      {
        "grain":"day",
        "type":"value",
        "value":"2018-06-21T00:00:00.000-07:00"
@@ -134,5 +136,3 @@ Some extractors, like ``duckling``, may include additional information.


.. include:: feedback.inc

