# Evaluating Rasa NLU from Jupyter

In this notebook I'll highlight how you might load Rasa models into Jupyter. The goal is to make it easier to customise how you evaluate a trained model. We already generate summaries for you on disk, but maybe you need more, and that is the use-case this guide is written for.

This guide was written for Rasa 2.0.2.

In [1]:
import rasa
rasa.__version__

'2.0.2'

It's important to make sure that your model was trained with the same Rasa version that you have installed; a model trained with a different version may not load correctly.
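One way to check this before loading is to read the version recorded inside the model archive. The sketch below is a minimal example under a stated assumption: that the `.tar.gz` contains a JSON file named `fingerprint.json` with a top-level `"version"` entry. Both names are parameters, so adjust them if your archive is laid out differently. `read_trained_version` is a hypothetical helper, not part of Rasa.

```python
import json
import tarfile
from pathlib import PurePosixPath


def read_trained_version(model_archive, fingerprint_name="fingerprint.json", key="version"):
    """Return the version string stored in the archive, or None if it is absent.

    Assumption: the archive holds a JSON file called `fingerprint_name`
    whose top-level `key` entry is the version used for training.
    """
    with tarfile.open(model_archive, "r:gz") as archive:
        for member in archive.getmembers():
            # Match on the file name only, so "./fingerprint.json" also works.
            if member.isfile() and PurePosixPath(member.name).name == fingerprint_name:
                with archive.extractfile(member) as handle:
                    return json.load(handle).get(key)
    return None


# Possible usage (commented out because it needs a real model on disk):
# import rasa
# trained_with = read_trained_version("models/nlu-20201029-130124.tar.gz")
# print(trained_with, rasa.__version__)
```

If the two versions differ, retraining the model with the installed version is the safest option.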

In [2]:
import pathlib

from rasa.cli.utils import get_validated_path
from rasa.model import get_model, get_model_subdirectories
from rasa.nlu.model import Interpreter
from rasa.shared.nlu.training_data.message import Message
from rasa.shared.nlu.constants import TEXT


def load_interpreter(model_path):
    """
    This loads the Rasa NLU interpreter. It is able to apply all NLU
    pipeline steps to a text that you provide it. 
    """
    model = get_validated_path(model_path, "model")
    model_path = get_model(model)
    _, nlu_model = get_model_subdirectories(model_path)
    return Interpreter.load(nlu_model)

In [3]:
nlu_interpreter = load_interpreter("models/nlu-20201029-130124.tar.gz")

Instructions for updating:
back_prop=False is deprecated. Consider using tf.stop_gradient instead.
Instead of:
results = tf.while_loop(c, b, vars, back_prop=False)
Use:
results = tf.nest.map_structure(tf.stop_gradient, tf.while_loop(c, b, vars))
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Index'

We can see all the NLU components inside of our trained model.

In [4]:
nlu_interpreter.pipeline

[<rasa.nlu.tokenizers.whitespace_tokenizer.WhitespaceTokenizer at 0x7f97d204e490>,
 <rasa.nlu.featurizers.sparse_featurizer.regex_featurizer.RegexFeaturizer at 0x7f97d204e410>,
 <rasa.nlu.featurizers.sparse_featurizer.lexical_syntactic_featurizer.LexicalSyntacticFeaturizer at 0x7f97d204e110>,
 <rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer.CountVectorsFeaturizer at 0x7f9773a42a50>,
 <rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer.CountVectorsFeaturizer at 0x7f97d204e850>,
 <rasa.nlu.classifiers.diet_classifier.DIETClassifier at 0x7f976836a9d0>,
 <rasa.nlu.extractors.entity_synonyms.EntitySynonymMapper at 0x7f97684f0350>]

We can get the model's predictions for a piece of text by calling `parse`:

In [5]:
from pprint import pprint 

pprint(nlu_interpreter.parse("pick up the key"))

{'entities': [],
 'intent': {'confidence': 0.2415866255760193,
            'id': 3902249864047352120,
            'name': 'how_are_you'},
 'intent_ranking': [{'confidence': 0.2415866255760193,
                     'id': 3902249864047352120,
                     'name': 'how_are_you'},
                    {'confidence': 0.1478807032108307,
                     'id': 812832756522555264,
                     'name': 'weather'},
                    {'confidence': 0.10809577256441116,
                     'id': 2494083358195087505,
                     'name': 'who_made_you'},
                    {'confidence': 0.1064174622297287,
                     'id': -8809246561339837571,
                     'name': 'are_you_real'},
                    {'confidence': 0.07944877445697784,
                     'id': -1827177683994385996,
                     'name': 'restaurant'},
                    {'confidence': 0.07899823784828186,
                     'id': 100417830020823555,
                   

You can use this to do any kind of research that you're interested in. For example, you could check against your own test data!
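As a small example of such research, the dictionary returned by `parse` can be reduced to the few numbers you usually care about: the top intents and the gap between the best and second-best confidence. The sketch below assumes only the `intent_ranking` structure shown in the output above; `summarise_parse` is a hypothetical helper name, and the example values are copied from that output.

```python
def summarise_parse(parsed, top_k=3):
    """Return the top-k (name, confidence) pairs and the top-1 vs top-2 margin."""
    ranking = sorted(
        parsed.get("intent_ranking", []),
        key=lambda r: r["confidence"],
        reverse=True,
    )
    top = [(r["name"], round(r["confidence"], 3)) for r in ranking[:top_k]]
    # A small margin means the model barely prefers its first choice.
    margin = ranking[0]["confidence"] - ranking[1]["confidence"] if len(ranking) > 1 else None
    return {"top": top, "margin": margin}


# First three entries of the ranking for "pick up the key" shown above.
example = {
    "intent_ranking": [
        {"name": "how_are_you", "confidence": 0.2415866255760193},
        {"name": "weather", "confidence": 0.1478807032108307},
        {"name": "who_made_you", "confidence": 0.10809577256441116},
    ]
}
summary = summarise_parse(example, top_k=2)
```

With a live interpreter you would call `summarise_parse(nlu_interpreter.parse("pick up the key"))` instead of passing the hand-written dictionary.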

In [6]:
import rasa.shared.nlu.training_data.loading

train_data = rasa.shared.nlu.training_data.loading.load_data(
    "data/nlu.yml", nlu_interpreter.model_metadata.language
)

# This `train_data` object contains `intent_examples`, which is a
# list of `Message` objects. These are containers that can hold
# intents and entities, but also other information that is
# relevant to an NLU pipeline.

[m.as_dict() for m in train_data.intent_examples][:5]

[{'text': 'are you a real person', 'intent': 'are_you_real'},
 {'text': 'Ar you a bot ?', 'intent': 'are_you_real'},
 {'text': 'hey are you human', 'intent': 'are_you_real'},
 {'text': 'are you a real bot?', 'intent': 'are_you_real'},
 {'text': 'Are you human ?', 'intent': 'are_you_real'}]

In [7]:
import pandas as pd 

def add_predictions(dataf):
    pred_blob = [nlu_interpreter.parse(t)['intent'] for t in dataf['text']]
    return (dataf
            [['text', 'intent']]
            .assign(pred_intent=[p['name'] for p in pred_blob])
            .assign(pred_confidence=[p['confidence'] for p in pred_blob]))

df_intents = pd.DataFrame([m.as_dict() for m in train_data.intent_examples]).pipe(add_predictions)

The main benefit of doing this is that you can use whatever evaluation metrics you like. You can zoom in on a particular intent and you can make whatever charts you like. If you keep a separate `nlu-test.yml` file as a validation set, then you can really customise the reporting to your liking. For example, you can use scikit-learn to generate classification reports.

In [8]:
from sklearn.metrics import classification_report

In [9]:
report = classification_report(y_true=df_intents['intent'], y_pred=df_intents['pred_intent'])
print(report)

                 precision    recall  f1-score   support

   are_you_real       1.00      0.99      0.99        74
       birthday       0.98      0.64      0.77       102
  happy_to_meet       0.98      1.00      0.99        48
    how_are_you       0.79      0.92      0.85        91
       how_made       0.71      0.98      0.82        47
           joke       0.95      1.00      0.98        41
      languages       0.97      1.00      0.98        62
     restaurant       0.98      1.00      0.99        55
        weather       0.96      0.99      0.97        74
what_can_you_do       0.96      0.90      0.93        90
   what_is_rasa       0.93      1.00      0.96        80
      what_time       0.98      0.91      0.94        54
 whats_you_name       0.97      0.97      0.97        74
        whatsup       1.00      0.25      0.40        12
    who_are_you       0.82      0.89      0.85        84
   who_made_you       0.99      0.99      0.99        86

       accuracy              
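A report like this tells you which intents struggle (`whatsup` has a recall of 0.25, for instance) but not which examples are responsible. A minimal sketch of zooming in on one intent follows. It works on plain dictionaries with the same keys as the columns of `df_intents`; the `toy_records` are made up for illustration and `misclassified` is a hypothetical helper.

```python
def misclassified(records, intent):
    """Rows whose true label is `intent` but whose prediction differs.

    The most confident mistakes come first, since those are usually
    the most interesting ones to inspect.
    """
    wrong = [r for r in records if r["intent"] == intent and r["pred_intent"] != intent]
    return sorted(wrong, key=lambda r: r["pred_confidence"], reverse=True)


# Made-up rows, only to show the shape of the input.
toy_records = [
    {"text": "what's up", "intent": "whatsup", "pred_intent": "how_are_you", "pred_confidence": 0.41},
    {"text": "sup", "intent": "whatsup", "pred_intent": "whatsup", "pred_confidence": 0.55},
    {"text": "whats going on", "intent": "whatsup", "pred_intent": "how_are_you", "pred_confidence": 0.62},
    {"text": "how are you", "intent": "how_are_you", "pred_intent": "how_are_you", "pred_confidence": 0.90},
]
mistakes = misclassified(toy_records, "whatsup")
```

On the real data you could pass `df_intents.to_dict("records")` as the first argument.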

Alternatively, you can use plotting tools like Altair to generate interactive visualisations.

You can write whatever pandas queries you like on the data too.

In [11]:
df_summary = (df_intents
 .groupby("pred_intent")
 .agg(n=('pred_confidence', 'size'),
      mean_conf=('pred_confidence', 'mean')))

df_summary

                   n  mean_conf
pred_intent
are_you_real      73   0.835666
birthday          66   0.284392
happy_to_meet     49   0.781069
how_are_you      107   0.456159
how_made          65   0.663902
joke              43   0.666984
languages         64   0.898767
restaurant        56   0.751922
weather           76   0.733560
what_can_you_do   84   0.376587
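Some predicted intents have a low mean confidence, which raises a natural follow-up question: how accurate is the model if predictions below a confidence threshold are set aside? The sketch below is one way to compute that trade-off. It assumes rows with the same keys as the columns of `df_intents`; `accuracy_at_threshold` is a hypothetical helper and the `toy_rows` are made up.

```python
def accuracy_at_threshold(records, threshold):
    """Return (coverage, accuracy) for predictions at or above `threshold`.

    coverage: share of all rows that are kept.
    accuracy: share of kept rows that are correct, or None if nothing is kept.
    """
    kept = [r for r in records if r["pred_confidence"] >= threshold]
    coverage = len(kept) / len(records) if records else 0.0
    accuracy = sum(r["intent"] == r["pred_intent"] for r in kept) / len(kept) if kept else None
    return coverage, accuracy


# Made-up rows, only to show the shape of the input.
toy_rows = [
    {"intent": "a", "pred_intent": "a", "pred_confidence": 0.9},
    {"intent": "a", "pred_intent": "b", "pred_confidence": 0.3},
    {"intent": "b", "pred_intent": "b", "pred_confidence": 0.8},
    {"intent": "b", "pred_intent": "a", "pred_confidence": 0.6},
]
coverage, accuracy = accuracy_at_threshold(toy_rows, threshold=0.5)
```

Running this over a range of thresholds on `df_intents.to_dict("records")` gives a curve you can use to pick a fallback threshold.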


In [10]:
import altair as alt

In [46]:
bars = alt.Chart(df_intents).mark_bar().encode(
    x='pred_confidence:Q',
    y='pred_intent:O'
)

bars.properties(height=100)

In [48]:
df_conf_mat = (df_intents
               .groupby(["intent", "pred_intent"])
               .agg(n_pred=("pred_confidence", "size"))
               .reset_index())
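# `n` counts how often each intent was predicted, so `p` is the share of
# a predicted intent's rows that came from each true intent.
df_plot = (df_conf_mat
           .merge(df_summary.reset_index(), on="pred_intent")
           .assign(p=lambda d: d['n_pred'] / d['n']))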

In [49]:
import altair as alt 

alt.Chart(df_plot).mark_rect().encode(
    x='intent:O',
    y='pred_intent:O',
    color='p:Q'
)
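To make explicit what the colour `p` encodes: each cell is divided by the total number of times its `pred_intent` was predicted, so the diagonal cells correspond to per-intent precision. The sketch below reproduces that normalisation with the standard library on made-up label pairs; `column_normalised_confusion` is a hypothetical helper name.

```python
from collections import Counter


def column_normalised_confusion(pairs):
    """Map (true_intent, pred_intent) to count / total predictions of pred_intent."""
    pairs = list(pairs)  # allow generators such as zip(...)
    pair_counts = Counter(pairs)
    pred_totals = Counter(pred for _, pred in pairs)
    return {
        (true, pred): count / pred_totals[pred]
        for (true, pred), count in pair_counts.items()
    }


# Made-up (true, predicted) pairs, only for illustration.
toy_pairs = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b")]
shares = column_normalised_confusion(toy_pairs)
```

On the real data you could pass `zip(df_intents['intent'], df_intents['pred_intent'])`. Dividing by the totals per true intent instead would put recall on the diagonal.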