Find out how to use pre-trained embeddings from language models implemented in HuggingFace's Transformers, such as BERT and GPT-2, inside Rasa NLU to bring modern deep learning techniques to your chatbot.
Since Rasa 1.8.0, you can use pre-trained embeddings from language models like BERT inside of Rasa NLU pipelines. This page shows how you can use models like BERT and GPT-2 in your contextual AI assistant and includes practical tips on how to get the most out of these models.
To demonstrate how to use `BERT` we will train two pipelines on Sara, the demo bot in the Rasa docs. In doing this we will also be able to measure the pros and cons of having `BERT` in your pipeline.
If you want to reproduce the results in this document you will need to first clone the repository found here:
git clone git@github.com:RasaHQ/rasa-demo.git
Once cloned you can install the requirements. Be sure that you explicitly install the `transformers` dependency.
pip install "rasa[transformers]"
You should now be all set to train an assistant that will use `BERT`. So let's write configuration files that will allow us to compare approaches. We'll make a separate folder where we can place two new configuration files.
mkdir config
For the next step we've created two configuration files. They only contain the pipeline part that is relevant for NLU model training and hence don't declare any dialogue policies.
config/config-light.yml
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 50
config/config-heavy.yml
language: en
pipeline:
  - name: HFTransformersNLP
    model_weights: "bert-base-uncased"
    model_name: "bert"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: DIETClassifier
    epochs: 50
In both cases we're training a `DIETClassifier` for combined intent classification and entity recognition for 50 epochs, but there are a few differences.
In the light configuration we have `CountVectorsFeaturizer`, which creates bag-of-words representations for each incoming message (at word and character levels). The heavy configuration replaces it with a `BERT` model inside the pipeline. `HFTransformersNLP` is a utility component that does the heavy lifting of loading the `BERT` model in memory. Under the hood it leverages HuggingFace's Transformers library to initialize the specified language model. Notice that we add two additional components, `LanguageModelTokenizer` and `LanguageModelFeaturizer`, which pick up the tokens and feature vectors respectively that are constructed by the utility component.
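To make the `analyzer: char_wb` setting in the light configuration more concrete: `CountVectorsFeaturizer` builds on scikit-learn's `CountVectorizer`, whose `char_wb` analyzer extracts character n-grams only inside word boundaries, padding each word with a space. The following is a simplified pure-Python sketch of that idea, not the library's actual code. The function name `char_wb_ngrams` is hypothetical, and the real analyzer handles edge cases (such as words shorter than the minimum n-gram size) that this sketch ignores.

```python
from collections import Counter


def char_wb_ngrams(text, min_n=1, max_n=4):
    """Return character n-grams taken inside word boundaries.

    Hypothetical helper illustrating the idea behind the ``char_wb``
    analyzer: every word is padded with one space on each side and
    n-grams never span two words.
    """
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        # Never ask for n-grams longer than the padded word itself.
        for n in range(min_n, min(max_n, len(padded)) + 1):
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


# A bag-of-n-grams representation is then just a count per n-gram.
bag = Counter(char_wb_ngrams("book a table", min_n=1, max_n=4))

print(char_wb_ngrams("hi", min_n=1, max_n=2))
# [' ', 'h', 'i', ' ', ' h', 'hi', 'i ']
```

Character n-grams like these let the classifier share evidence between words with common substrings, which is one reason the light pipeline can cope with typos and unseen word forms.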
Note
We strictly use these language models as featurizers, which means that their parameters are not fine-tuned during training of downstream models in your NLU pipeline. This saves a lot of compute time and the machine learning models in the pipeline can typically compensate for the lack of fine-tuning.
You can run both configurations yourself.
mkdir gridresults
rasa test nlu --config config/config-light.yml \
--cross-validation --runs 1 --folds 2 \
--out gridresults/config-light
rasa test nlu --config config/config-heavy.yml \
--cross-validation --runs 1 --folds 2 \
--out gridresults/config-heavy
When this runs you should see logs appear. We've picked a few of those lines to list them here.
# output from the light model
2020-03-30 16:21:54 INFO rasa.nlu.model - Starting to train component DIETClassifier
Epochs: 100%|██████████████████████████████| 50/50 [04:30<00:00, ...]
2020-03-30 16:23:53 INFO rasa.nlu.test - Running model for predictions:
100%|██████████████████████████████| 2396/2396 [01:23<00:00, 28.65it/s]
...
# output from the heavy model
2020-03-30 16:47:04 INFO rasa.nlu.model - Starting to train component DIETClassifier
Epochs: 100%|██████████████████████████████| 50/50 [04:33<00:00, ...]
2020-03-30 16:49:52 INFO rasa.nlu.test - Running model for predictions:
100%|██████████████████████████████| 2396/2396 [07:20<00:00, 5.69it/s]
Note
From the logs we can gather an important observation. The heavy model with `BERT` takes about as long to train as the light one (04:33 versus 04:30), but at inference time it is roughly 5 times slower (5.69 it/s versus 28.65 it/s). Depending on your use case this is something to seriously consider.
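The inference slowdown can be estimated directly from the prediction progress bars in the logs. Below is a back-of-the-envelope calculation that uses only the elapsed times and the message count shown there. The variable names are ours, and the per-message figures are rough because the timings come from a single run.

```python
# Figures copied from the prediction progress bars in the logs.
messages = 2396
light_seconds = 1 * 60 + 23  # 01:23 for the light model
heavy_seconds = 7 * 60 + 20  # 07:20 for the heavy model

# Ratio of total prediction time between the two pipelines.
slowdown = heavy_seconds / light_seconds

# Approximate average time spent per message, in milliseconds.
light_ms = 1000 * light_seconds / messages
heavy_ms = 1000 * heavy_seconds / messages

print(f"slowdown: {slowdown:.1f}x")  # slowdown: 5.3x
print(f"light: {light_ms:.0f} ms per message")  # light: 35 ms per message
print(f"heavy: {heavy_ms:.0f} ms per message")  # heavy: 184 ms per message
```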
The results from these two runs can be found in the `gridresults` folder. We've summarised the main results below.
These are the scores for intent classification.
Config | Precision | Recall | F1 score |
---|---|---|---|
Light | | | |
Heavy | | | |
These are the scores for entity recognition.
Config | Precision | Recall | F1 score |
---|---|---|---|
Light | | | |
Heavy | | | |
On all fronts we see that the heavy model with the `BERT` embeddings performs better. The performance gain for intent classification is marginal but entity recognition has improved substantially.
Note that in practice you'll need to run this experiment on your own data. Odds are that our dataset is not representative of yours so you should always try out different settings yourself.
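When you repeat the experiment on your own data, a small script can pull the headline scores out of each results folder so that the configurations are easy to compare. The sketch below rests on two assumptions that you should check against your own output: that each `--out` folder contains a report file named `intent_report.json`, and that the report has a `"weighted avg"` entry with `precision`, `recall` and `f1-score` keys, as in scikit-learn's classification reports. The helper names are hypothetical.

```python
import json
from pathlib import Path


def weighted_scores(report_path):
    """Return (precision, recall, f1) from a report's "weighted avg" entry."""
    report = json.loads(Path(report_path).read_text())
    avg = report["weighted avg"]  # assumed key, as in scikit-learn reports
    return avg["precision"], avg["recall"], avg["f1-score"]


def collect_scores(results_dir="gridresults", report_name="intent_report.json"):
    """Map each config folder name to its scores.

    Folders without the report file are skipped, and a missing
    results directory yields an empty mapping.
    """
    root = Path(results_dir)
    scores = {}
    if not root.is_dir():
        return scores
    for config_dir in sorted(root.iterdir()):
        report_path = config_dir / report_name  # assumed file name
        if report_path.is_file():
            scores[config_dir.name] = weighted_scores(report_path)
    return scores


if __name__ == "__main__":
    for name, (precision, recall, f1) in collect_scores().items():
        print(f"{name}: precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Passing a different `report_name` lets the same helper read other reports in the folder, provided they follow the same layout.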
There are a few things to consider:
- Which task is more important - intent classification or entity recognition? If your assistant barely uses entities then you may care less about improved performance there.
- Is accuracy more important or do we care more about latency of bot predictions? If responses from the assistant become much slower as shown in the above example, we may also need to invest in more compute resources.
- The `BERT` embeddings that we're using here as features can be extended with other featurizers as well. It may still be a good idea to add a `CountVectorsFeaturizer` to capture words specific to the vocabulary of your domain.
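As a sketch of the last point, the script below writes a third configuration that keeps the `BERT` features and adds a word-level `CountVectorsFeaturizer` before the classifier. The file name `config-combined.yml`, the helper `write_combined_config` and the exact placement of the extra featurizer are our assumptions rather than a tested recommendation, so treat it as a starting point for your own comparison.

```python
from pathlib import Path

# Heavy pipeline from above plus one extra sparse featurizer (assumed layout).
COMBINED_CONFIG = """\
language: en
pipeline:
  - name: HFTransformersNLP
    model_weights: "bert-base-uncased"
    model_name: "bert"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 50
"""


def write_combined_config(path="config/config-combined.yml"):
    """Write the combined pipeline to ``path`` and return the path written."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(COMBINED_CONFIG)
    return target
```

After calling `write_combined_config()`, the new file can be evaluated with the same `rasa test nlu` flags as the other two runs, pointing `--config` at `config/config-combined.yml` and `--out` at a new folder.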