Merge pull request #1445 from RasaHQ/language_example
added custom language example
JustinaPetr committed Oct 4, 2018
2 parents 51c8491 + 8a55f9c commit d1f3075
Showing 1 changed file with 59 additions and 0 deletions.
59 changes: 59 additions & 0 deletions docs/languages.rst
@@ -12,6 +12,65 @@ pipeline in :ref:`choosing_pipeline`.
Other backends are more restricted and support only those languages
for which pre-trained word vectors are available.

Training a model in any language using the tensorflow_embedding pipeline
------------------------------------------------------------------------
To train a Rasa NLU model in your preferred language, define the
``tensorflow_embedding`` pipeline and save it as a YAML file inside your project directory.
One way to define the pipeline configuration is to use the template configuration:

.. code-block:: yaml

    language: "en"
    pipeline: "tensorflow_embedding"

Another way is to define a custom configuration by listing every component you want your pipeline to use.
The ``tensorflow_embedding`` pipeline supports any language that can be tokenized. By default it uses a simple
whitespace tokenizer:

.. code-block:: yaml

    language: "en"
    pipeline:
    - name: "tokenizer_whitespace"
    - name: "ner_crf"
    - name: "ner_synonyms"
    - name: "intent_featurizer_count_vectors"
    - name: "intent_classifier_tensorflow_embedding"

If your chosen language cannot be tokenized on whitespace, you can implement your own custom tokenizer
and use it in place of the whitespace tokenizer.
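
To make the idea of a custom tokenizer concrete, here is a minimal sketch in plain Python. It is not Rasa NLU's component API: the ``Token`` type and both function names are hypothetical, and wiring such a function into a pipeline would still require wrapping it in a custom component, which is not shown here. The sketch only assumes that a tokenizer turns a string into tokens with character offsets, and contrasts whitespace splitting with a per-character split for text that has no spaces between words.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token type: the token text and its character offset."""
    text: str
    offset: int


def tokenize_whitespace(text: str) -> List[Token]:
    """Split on whitespace and record where each word starts."""
    tokens = []
    offset = 0
    for word in text.split():
        # Search from the end of the previous word so repeated words
        # get the correct offset.
        offset = text.index(word, offset)
        tokens.append(Token(word, offset))
        offset += len(word)
    return tokens


def tokenize_characters(text: str) -> List[Token]:
    """Hypothetical custom tokenizer: one token per non-whitespace character."""
    return [Token(ch, i) for i, ch in enumerate(text) if not ch.isspace()]
```

For a sentence written without spaces, ``tokenize_whitespace`` returns a single token covering the whole sentence, while ``tokenize_characters`` yields one token per character. A real tokenizer for such a language would more likely rely on a dedicated word-segmentation library.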

Once you have defined the ``tensorflow_embedding`` processing pipeline, you can write NLU training
examples in your chosen language and train the model. For example, if you want to build an assistant
in Norwegian, your NLU training data could look something like this:

.. code-block:: md

    ## intent:hallo
    - Hallo!
    - Hei
    - Lenge siden sist.
    - God morgen
    ## intent:farvel
    - Ha det!
    - På Gjensyn
    - Ses i morgen.

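
Before training, it can be useful to check how many examples each intent has. The following is a minimal sketch of such a check, assuming only the ``## intent:<name>`` heading and ``- example`` bullet layout used in the training data above. ``parse_intents`` is a hypothetical helper written for this illustration, not part of Rasa NLU, and it ignores any other syntax the Markdown training format may support.

```python
from typing import Dict, List


def parse_intents(markdown: str) -> Dict[str, List[str]]:
    """Group example sentences under the intent heading that precedes them."""
    intents: Dict[str, List[str]] = {}
    current = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("## intent:"):
            current = line[len("## intent:"):].strip()
            intents.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            intents[current].append(line[2:].strip())
    return intents


SAMPLE = """\
## intent:hallo
- Hallo!
- Hei
- Lenge siden sist.
- God morgen
## intent:farvel
- Ha det!
- På Gjensyn
- Ses i morgen.
"""

# Number of training examples per intent.
counts = {name: len(examples) for name, examples in parse_intents(SAMPLE).items()}
print(counts)
```

An intent with very few examples is usually worth extending before training.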
Suppose you saved the training examples as ``nlu_data.md`` and one of the pipeline configurations shown above as ``config.yml``.
You can then train the model by running:

.. code-block:: console

    $ python -m rasa_nlu.train \
        --config config.yml \
        --data nlu_data.md \
        --path projects

Once the training is finished, you can test your model's Norwegian language skills.


Pre-trained Word Vectors
------------------------