Scripts that run against Watson Assistant for k-fold cross-validation on a training set, testing on a blind test set, and drawing precision curves for comparison.


Scripts that run against Watson Assistant for

  • KFOLD: k-fold cross-validation on the training set,
  • BLIND: evaluating a blind test set, and
  • TEST: testing Watson Assistant against a list of utterances.
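As a rough illustration of what k-fold cross-validation means here, the sketch below partitions a labeled training set into k folds and yields one train/test split per fold. It is a generic sketch, not the tool's own implementation; the function name `k_fold_splits` and the sample data are assumptions made up for this example.

```python
import random


def k_fold_splits(examples, k, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled examples into k folds, round-robin.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Hypothetical (utterance, intent) pairs standing in for a training set.
examples = [("utterance %d" % i, "intent %d" % (i % 2)) for i in range(6)]
splits = list(k_fold_splits(examples, k=3))
for train, test in splits:
    print(len(train), "train /", len(test), "test")
```

With six examples and k=3, each split trains on four examples and tests on the remaining two.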

For a k-fold cross-validation or a blind test, the tool outputs a precision curve, per-intent true positive rates and positive predictive values, and a confusion matrix.
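To make the per-intent metrics concrete, here is a minimal sketch that derives a confusion matrix, true positive rate (recall), and positive predictive value (precision) from paired golden and predicted intents. The function names and sample data are assumptions for illustration; the tool's actual output format may differ.

```python
from collections import Counter


def confusion_counts(golden, predicted):
    """Count how often each (golden intent, predicted intent) pair occurs."""
    return Counter(zip(golden, predicted))


def per_intent_rates(golden, predicted):
    """Return {intent: (true positive rate, positive predictive value)}."""
    matrix = confusion_counts(golden, predicted)
    rates = {}
    for intent in sorted(set(golden) | set(predicted)):
        tp = matrix[(intent, intent)]
        # Utterances whose golden label is this intent.
        actual = sum(n for (g, _), n in matrix.items() if g == intent)
        # Utterances the classifier assigned to this intent.
        called = sum(n for (_, p), n in matrix.items() if p == intent)
        tpr = tp / actual if actual else None
        ppv = tp / called if called else None
        rates[intent] = (tpr, ppv)
    return rates


# Hypothetical results for four test utterances.
golden = ["intent 0", "intent 0", "intent 1", "intent 1"]
predicted = ["intent 0", "intent 1", "intent 1", "intent 1"]
rates = per_intent_rates(golden, predicted)
for intent, (tpr, ppv) in rates.items():
    print(intent, "TPR:", tpr, "PPV:", ppv)
```

In this sample, one "intent 0" utterance is misclassified as "intent 1", which lowers the true positive rate of "intent 0" and the positive predictive value of "intent 1".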


Features

  • Easy to set up in one configuration file.
  • Saves its state if the Assistant service goes down in the middle of processing.
  • Can resume from where it stopped, using modularized scripts.


Prerequisites

  • Python 3.6.4+
  • Mac users: you may need to initialize Python's SSL certificate store by running Install Certificates.command, found in /Applications/Python.

Quick Start

  1. Install dependencies: pip3 install -r requirements.txt
  2. Set the parameters in a configuration file (e.g. config.ini). Use config.ini.sample to bootstrap your configuration.
  3. Run the process: python3 run.py -c config.ini or python3 run.py -c <path to your config file>

Quick Update

If you have already installed this utility, use these steps to get the latest code.

  1. Upgrade dependencies: pip3 install --upgrade -r requirements.txt
  2. Update to the latest code level: git pull

Input Files

config.ini - Configuration file for the tool. It is formatted differently for each mode. Review the Examples below to explore the possible modes and how each is configured.

test_input_file.csv - Test set for the blind test and the standard test.

For blind test with golden intent used for comparison:

utterance      golden intent
utterance 0    intent 0
utterance 1    intent 0
utterance 2    intent 1

For a standard test, the input must have exactly one column; otherwise an error is raised:

utterance 0
utterance 1
utterance 2
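Before starting a run, it can help to check that a test input file has the column count its mode expects. The sketch below is one possible pre-check, not part of the tool: the function name `check_test_input` is hypothetical, and it assumes comma-delimited rows with no header row.

```python
import csv
import io


def check_test_input(lines, blind):
    """Return the parsed rows, or raise ValueError on a wrong column count.

    Assumes a comma-delimited file without a header row:
    two columns for a blind test, one column for a standard test.
    """
    expected = 2 if blind else 1
    rows = [row for row in csv.reader(lines) if row]  # skip empty lines
    for number, row in enumerate(rows, start=1):
        if len(row) != expected:
            raise ValueError(
                "row %d has %d column(s), expected %d"
                % (number, len(row), expected)
            )
    return rows


# In-memory stand-ins for test_input_file.csv in each mode.
blind_csv = io.StringIO(
    "utterance 0,intent 0\nutterance 1,intent 0\nutterance 2,intent 1\n"
)
standard_csv = io.StringIO("utterance 0\nutterance 1\nutterance 2\n")

blind_rows = check_test_input(blind_csv, blind=True)
standard_rows = check_test_input(standard_csv, blind=False)
print(len(blind_rows), "blind rows,", len(standard_rows), "standard rows")
```

To check a real file, pass an open file object instead of the `io.StringIO` stand-in, for example `open("test_input_file.csv", newline="")`.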


Examples

There are several ways to use this tool. Primarily you will run a k-fold, blind, or standard test.

Core execution modes

Run k-fold cross-validation

Run blind test

Run standard test without ground truth

Extended modes (executed by default)

Generate precision/recall for classification test

Generate confusion matrix for classification test

Extended modes

Generate description for intents

Generate long-tail classification results

Run syntax validation patterns on a workspace

Extract utterances leading to a dialog node

Caveats and Troubleshooting

  1. Because limits differ among service plans, you may need to adjust max_test_rate to avoid network connection errors.

  2. Users on Lite plans can create only 5 workspaces. They should set fold_num=3 in their k-fold configuration file.

  3. In case of interrupted execution, the tool may not be able to clean up the workspaces it creates. In this case you will need to manually delete the extra workspaces.

  4. The Workspace ID is not the Skill ID. In the Watson Assistant user interface, find the Workspace ID on the Skills tab by clicking the three dots (top-right of the skill) and choosing View API Details.

  5. SSL: [CERTIFICATE_VERIFY_FAILED] on Mac means you may need to initialize Python's SSL certificate store by running Install Certificates.command, found in /Applications/Python.

  6. "This utility used to work and now it doesn't." Upgrade to latest dependencies with pip3 install --upgrade -r requirements.txt and latest code with git pull.
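The Lite plan caveat above can be checked before a run with a few lines of Python. This is a hedged sketch rather than a feature of the tool: the helper names and the [SETTINGS] section name are hypothetical (see config.ini.sample for the real layout), and only the fold_num option name comes from this README.

```python
import configparser

RECOMMENDED_LITE_FOLD_NUM = 3  # from caveat 2 above


def find_option(parser, name):
    """Return the first value of `name` found in any section, else None."""
    for section in parser.sections():
        if parser.has_option(section, name):
            return parser.get(section, name)
    return None


def lite_plan_warnings(config_text):
    """Return warnings for settings that may not suit a Lite plan."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    warnings = []
    fold_num = find_option(parser, "fold_num")
    if fold_num is not None and int(fold_num) > RECOMMENDED_LITE_FOLD_NUM:
        warnings.append(
            "fold_num=%s; Lite plans should use fold_num=%d"
            % (fold_num, RECOMMENDED_LITE_FOLD_NUM)
        )
    return warnings


# The section name below is made up for this example.
sample = "[SETTINGS]\nfold_num = 5\n"
for warning in lite_plan_warnings(sample):
    print(warning)
```

To check a real file, read its contents first, for example `lite_plan_warnings(open("config.ini").read())`.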
