# Programmatic Evaluation of Watson Conversation Intent Recognition Performance

This notebook demonstrates a technique to programmatically train and evaluate the intent recognition performance for a workspace in <a href="https://www.ibm.com/watson/developercloud/conversation/api/v1/" target="_blank" rel="noopener noreferrer">Watson Conversation</a>.

At a high level, intents are purposes or goals expressed in a user's input, such as answering a question or processing a bill payment. By recognizing the intent expressed in a customer's input, the Conversation service can choose the correct dialog flow for responding to it.

This notebook shows how to call the Watson Conversation API directly to train a workspace on intents, as an alternative to the GUI tool typically used to train a workspace.

Managing the training process programmatically makes it possible to test intent recognition reliably with a truly blind test set: examples that the workspace never sees during training.

This notebook runs on Python 3.5 with Spark 2.0 or 2.1.




## Table of contents

1. [Install and import packages](#setup)
2. [Import the data as a pandas DataFrame](#import)
3. [Split the data set for training and testing](#scikit)
4. [Authenticate to the Watson Conversation Service](#authenticate)
5. [Test the connection to the Watson Conversation service](#wcs1)
6. [Create unique intents from the training data](#wcs2)
7. [Add examples to each intent from the training data set](#wcs3)
8. [Evaluate the test set with the message function](#wcs4)<br>
[Summary and next steps](#Summary-and-next-steps)

## <a id="setup"></a> Step 1. Install and import packages

Install and import the necessary packages.

```python
!pip install --upgrade watson-developer-cloud
```


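```python
# Install scikit-learn directly; the "sklearn" package on PyPI is only a placeholder that depends on it
!pip install --upgrade scikit-learn
```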



```python
# Packages used for loading the data (Step 2) and splitting it (Step 3)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
```

## <a id="import"></a>Step 2. Import the data as a pandas DataFrame

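The data consists of sample user questions and the assigned intents. Each row has an `example` column (the user question) and an `intent` column (the intent assigned to it); later steps read both columns by name.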

**For notebooks running on IBM Data Science Experience:**

To get the data and load it into a pandas DataFrame:

1. Go to the [Watson Conversation intents data card](https://apsportal.ibm.com/exchange/public/entry/view/3460a7906f329ea1523b6a0455c53757) and click the download icon to save the file on your computer.
1. Back in your notebook, load the file by clicking the **Find and Add Data** icon and then dragging and dropping the file onto the pane or browsing for the file. The data is stored in the object storage container that is associated with your project.
1. Click in the next cell, choose **Insert to code > Insert Pandas DataFrame** below the file name, and run the cell.

**For Python notebook servers:**

1. Uncomment and modify the code stub in the next cell to load the data from your server's file system.

```python
# Insert your WCS intent examples as a pandas DataFrame here.
# The data should be imported to a variable with a name like df_data_1.

# Uncomment and modify only if you are not using IBM Data Science Experience:
# df_data_1 = pd.read_csv('YOUR FILE')
# df_data_1.head()
```


Assign the DataFrame to the variable `df`:

```python
# Make sure this uses the variable above. The number will vary in the inserted code.
try:
    df = df_data_1
except NameError:
    print('Error: Setup is incorrect or incomplete.\n')
    print('Follow the instructions to insert the pandas DataFrame above, and edit to')
    print('make the generated df_data_# variable match the variable used here.')
    raise
```
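
Steps 6 through 8 read the `example` and `intent` columns by name, so a quick check right after loading catches a mismatched file early. The helper `check_intent_data` is a hypothetical name used only for illustration, and the toy rows are made-up examples, not rows from the data card.

```python
import pandas as pd

REQUIRED_COLUMNS = ['example', 'intent']  # columns that later steps access by name


def check_intent_data(frame):
    """Raise ValueError if a column is missing or a row is empty; return per-intent counts."""
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError('Missing column(s): ' + ', '.join(missing))
    empty = frame[REQUIRED_COLUMNS].isnull().any(axis=1)
    if empty.any():
        raise ValueError('{} row(s) have an empty example or intent'.format(int(empty.sum())))
    return frame['intent'].value_counts()


# Hypothetical toy rows in the expected shape
toy = pd.DataFrame({
    'example': ['where is the restroom', 'what can you do', 'the screen is frozen'],
    'intent': ['locate_amenity', 'capabilities', 'interface_issues'],
})
counts = check_intent_data(toy)
print(counts.to_dict())
```

On the real data, `check_intent_data(df)` also shows how many examples each intent has before the split.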

## <a id="scikit"></a>Step 3. Split the data set for training and testing
Use scikit-learn to split the data set into two separate sets, one for training and one for testing. The test set is 20% of the original data set; change `test_size` to use a different percentage. Because no `random_state` is passed, each run produces a different split, so the measured accuracy can vary from run to run.

```python
train, test = train_test_split(df, test_size=0.2)
```
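
With a plain random split, a rare intent can end up under-represented in the training set or absent from the test set. scikit-learn's `train_test_split` also accepts `stratify` (keep each intent's share the same in both sets) and `random_state` (make the split repeatable). This sketch shows both on hypothetical toy data; whether to stratify is a judgment call, and it requires at least two examples per intent and a test set with at least one row per intent.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: three intents with five examples each
toy = pd.DataFrame({
    'example': ['question {}'.format(i) for i in range(15)],
    'intent': ['capabilities'] * 5 + ['interface_issues'] * 5 + ['locate_amenity'] * 5,
})

# stratify keeps the per-intent proportions equal in both sets;
# random_state fixes the shuffle so the split is reproducible.
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, stratify=toy['intent'], random_state=0)

print(toy_test['intent'].value_counts().to_dict())
```

Applied to the notebook's data, the call would read `train_test_split(df, test_size=0.2, stratify=df['intent'], random_state=0)`.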

## <a id="authenticate"></a>Step 4. Authenticate to the Watson Conversation service

Sign up for the Watson Conversation service and enter your credentials. 

1. Sign up for [Watson Conversation service](https://console.bluemix.net/catalog/services/conversation) in Bluemix.
1. On your Watson Conversation service page, click **Launch Tool**. The Workspaces page appears in a separate tab.
1. On your Watson Conversation Workspaces page, click **Create**. 
1. Add a name, for example, `Intents example`, and click **Create**.
1. Find your workspace ID and credentials by clicking the **Deploy** button and then **Credentials**. 
1. Add your workspace ID, username, and password to the next cell, set `VERSION` to an API version date (the sample output in Step 5 uses `2017-05-26`), and run the cell.

```python
# Credentials and workspace ID from Step 4
CONVERSATION_USERNAME = ''
CONVERSATION_PASSWORD = ''
VERSION = ''       # API version date, for example '2017-05-26'
WORKSPACE_ID = ''
```
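
Credentials typed into a cell are saved with the notebook and travel with it when it is shared. One alternative, sketched here, reads them from environment variables instead. The variable names mirror the cell above but are assumptions, as is the hypothetical helper `load_settings`; set the variables however your notebook server allows.

```python
import os

# Assumed environment variable names, matching the cell above
SETTING_NAMES = ['CONVERSATION_USERNAME', 'CONVERSATION_PASSWORD', 'VERSION', 'WORKSPACE_ID']


def load_settings(environ=None):
    """Return the four Conversation settings from a mapping (default: os.environ).

    Raises KeyError naming every setting that is missing or empty.
    """
    if environ is None:
        environ = os.environ
    missing = [name for name in SETTING_NAMES if not environ.get(name)]
    if missing:
        raise KeyError('Missing settings: ' + ', '.join(missing))
    return {name: environ[name] for name in SETTING_NAMES}
```

Usage: `settings = load_settings()`, then `CONVERSATION_USERNAME = settings['CONVERSATION_USERNAME']` and so on. Reporting every missing name at once saves a round of trial and error.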

Import the Watson Conversation package and create a client with your credentials:

```python
import json
from watson_developer_cloud import ConversationV1

conversation = ConversationV1(
    username=CONVERSATION_USERNAME,
    password=CONVERSATION_PASSWORD,
    version=VERSION
)
```

## <a id="wcs1"></a>Step 5. Test the connection to the Watson Conversation service
Run the <a href="https://www.ibm.com/watson/developercloud/conversation/api/v1/" target="_blank" rel="noopener noreferrer">Watson Conversation API</a> functions to make sure you are properly connected to your Watson Conversation workspace.

List the existing intents with the `list_intents` function. If this is the first time you're using the Watson Conversation service, the `intents` list is empty; the sample output below comes from a workspace that already contained three intents.

```python
intents = conversation.list_intents(WORKSPACE_ID)
print(json.dumps(intents, indent=2))
```

```
{
  "pagination": {
    "refresh_url": "/v1/workspaces/dd596e58-5941-4860-ae87-52bcf3649dcc/intents?version=2017-05-26"
  },
  "intents": [
    {
      "created": "2017-08-23T15:58:25.882Z",
      "updated": "2017-08-23T16:00:52.937Z",
      "intent": "capabilities",
      "description": "capabilities"
    },
    {
      "created": "2017-08-23T15:58:26.176Z",
      "updated": "2017-08-23T16:00:32.312Z",
      "intent": "interface_issues",
      "description": "interface_issues"
    },
    {
      "created": "2017-08-23T15:58:26.032Z",
      "updated": "2017-08-23T16:00:53.082Z",
      "intent": "locate_amenity",
      "description": "locate_amenity"
    }
  ]
}
```
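
The `list_intents` response is a plain dictionary, so the intent names can be pulled out with a comprehension. This sketch hard-codes a trimmed copy of the sample output above (timestamps and pagination dropped) instead of calling the service, so it runs without credentials; `intent_names` is an illustrative helper, not part of the SDK.

```python
# Trimmed copy of the sample list_intents response shown above
sample_response = {
    'intents': [
        {'intent': 'capabilities', 'description': 'capabilities'},
        {'intent': 'interface_issues', 'description': 'interface_issues'},
        {'intent': 'locate_amenity', 'description': 'locate_amenity'},
    ]
}


def intent_names(response):
    """Return the intent names from a list_intents response, sorted alphabetically."""
    return sorted(item['intent'] for item in response.get('intents', []))


print(intent_names(sample_response))
# ['capabilities', 'interface_issues', 'locate_amenity']
```

Against the live workspace, the same helper would be called as `intent_names(conversation.list_intents(WORKSPACE_ID))`.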


Create a sample intent with the `create_intent` function:

```python
create = conversation.create_intent(workspace_id=WORKSPACE_ID, intent='sample',
                                    description='This is an example')
print(json.dumps(create, indent=2))
```

```
{
  "created": "2017-08-25T22:12:27.766Z",
  "updated": "2017-08-25T22:12:27.766Z",
  "intent": "sample",
  "description": "This is an example"
}
```


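Now delete all intents, including the sample, with the `delete_intent` function. This removes every intent in the workspace, so use a workspace dedicated to this evaluation: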

```python
# Clear the workspace of all existing intents
intents = conversation.list_intents(workspace_id=WORKSPACE_ID)['intents']
for intent in intents:
    conversation.delete_intent(workspace_id=WORKSPACE_ID, intent=intent['intent'])
```
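
Because the loop above deletes every intent in the workspace, a dry-run switch can make the destructive step explicit. This is a minimal sketch: `clear_intents` is a hypothetical helper that takes the client as a parameter and uses only the `list_intents` and `delete_intent` calls from the cells above. `FakeClient` is an offline stand-in for `ConversationV1` so the logic can be checked without credentials.

```python
def clear_intents(client, workspace_id, dry_run=True):
    """Return the intent names in the workspace; delete them only when dry_run is False."""
    names = [item['intent'] for item in client.list_intents(workspace_id=workspace_id)['intents']]
    if not dry_run:
        for name in names:
            client.delete_intent(workspace_id=workspace_id, intent=name)
    return names


class FakeClient:
    """Offline stand-in that mimics the two client calls used above."""

    def __init__(self, names):
        self.names = list(names)

    def list_intents(self, workspace_id):
        return {'intents': [{'intent': n} for n in self.names]}

    def delete_intent(self, workspace_id, intent):
        self.names.remove(intent)


fake = FakeClient(['sample', 'capabilities'])
print(clear_intents(fake, 'ws', dry_run=True), fake.names)   # preview only, nothing deleted
print(clear_intents(fake, 'ws', dry_run=False), fake.names)  # both intents deleted
```

With the real client, `clear_intents(conversation, WORKSPACE_ID)` previews what would be removed, and adding `dry_run=False` performs the deletion.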

## <a id="wcs2"></a>Step 6. Create unique intents from the training data

Use the values from the `intent` column in the training data set to create intents: `locate_amenity`, `capabilities`, and `interface_issues`.

```python
# Create one intent for each distinct value in the training set's intent column
for intent in train['intent'].unique():
    conversation.create_intent(workspace_id=WORKSPACE_ID, intent=intent, description=intent)
```
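
Intents are created only from the training set, so an intent whose examples all land in the test set (possible with a small or unbalanced data set and a random split) can never be predicted correctly. A set difference flags this before training. The helper name and toy frames are illustrative.

```python
import pandas as pd


def intents_missing_from_training(train_frame, test_frame):
    """Return the sorted intents that appear in the test set but not in the training set."""
    return sorted(set(test_frame['intent']) - set(train_frame['intent']))


# Hypothetical toy split where 'locate_amenity' ended up only in the test set
toy_train = pd.DataFrame({'intent': ['capabilities', 'interface_issues']})
toy_test = pd.DataFrame({'intent': ['capabilities', 'locate_amenity']})
print(intents_missing_from_training(toy_train, toy_test))
# ['locate_amenity']
```

For the notebook's split, `intents_missing_from_training(train, test)` should return an empty list.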

## <a id="wcs3"></a>Step 7. Add examples to each intent from the training data set
Add example text from the training data set for each intent so that the Watson Conversation service can learn what sorts of questions to assign to each intent.

```python
# Send each training example to the service, attached to its intent
for _, row in train.iterrows():
    conversation.create_example(workspace_id=WORKSPACE_ID, intent=row.intent, text=row.example)
```
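
Adding examples starts training in the background, and Step 8 sends test messages right away, possibly before the workspace has finished training on the new examples. A polling loop is one way to wait. This sketch assumes that the workspace details returned by `conversation.get_workspace(WORKSPACE_ID)` include a `status` field that reads `Available` once training completes; check that against the API reference for your SDK version. To keep the logic testable offline, the hypothetical helper `wait_until_available` takes a zero-argument status function rather than the client.

```python
import time


def wait_until_available(get_status, timeout=300, interval=5):
    """Poll get_status() until it returns 'Available'; return the number of polls made.

    Raises TimeoutError if the status is not 'Available' within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    polls = 0
    while True:
        polls += 1
        status = get_status()
        if status == 'Available':
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError('Workspace status still {!r} after {} s'.format(status, timeout))
        time.sleep(interval)


# Assumed usage with the client from Step 4 (the 'status' field is an assumption, see above):
# wait_until_available(lambda: conversation.get_workspace(WORKSPACE_ID)['status'])

# Offline check with a stub that reports 'Training' twice before 'Available'
statuses = iter(['Training', 'Training', 'Available'])
polls_needed = wait_until_available(lambda: next(statuses), timeout=10, interval=0)
print(polls_needed)
```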

## <a id="wcs4"></a>Step 8. Evaluate the test set with the message function
Now test how accurately the Watson Conversation service assigns intents to the examples in the test data set. The loop below sends each test example to the `message` function of the <a href="https://www.ibm.com/watson/developercloud/conversation/api/v1/" target="_blank" rel="noopener noreferrer">Watson Conversation API</a> and compares the top returned intent with the expected one, so you can score the whole test set at once instead of checking each example individually in the Conversation workspace tool.
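
Each `message` response lists the recognized intents, and each entry carries an `intent` name and a `confidence` score. The list can be empty when nothing is recognized, which is why the evaluation code has to handle a missing top intent. The hypothetical helper `top_intent` picks the highest-confidence entry explicitly rather than relying on list order; the optional `min_confidence` cut-off is an assumption you can tune, not a service setting. The sample dictionaries are hand-written in the response's shape, not real service output.

```python
def top_intent(response, min_confidence=0.0):
    """Return the highest-confidence intent name, or None if none meets min_confidence."""
    candidates = [item for item in response.get('intents', [])
                  if item.get('confidence', 0.0) >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item.get('confidence', 0.0))['intent']


# Hand-written dictionaries in the shape of a message response (not real service output)
confident = {'intents': [{'intent': 'locate_amenity', 'confidence': 0.93},
                         {'intent': 'capabilities', 'confidence': 0.04}]}
unrecognized = {'intents': []}

print(top_intent(confident))                        # locate_amenity
print(top_intent(unrecognized))                     # None
print(top_intent(confident, min_confidence=0.95))   # None
```

Raising `min_confidence` trades coverage for precision: low-confidence answers are treated as "no intent", which is often how a dialog would route them to a fallback.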

```python
results = []
for _, row in test.iterrows():
    response = conversation.message(workspace_id=WORKSPACE_ID,
                                    message_input={'text': row.example})
    # An empty 'intents' list means no intent was recognized; count it as a miss.
    # Other errors (for example, network failures) propagate instead of being counted as misses.
    predicted = response['intents'][0]['intent'] if response['intents'] else None
    results.append(1 if predicted == row.intent else 0)
results = np.array(results)

print("Intent Recognizer Performance: {:.2%}".format(results.mean()))
```

```
Intent Recognizer Performance: 84.57%
```
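
A single accuracy figure can hide an intent that is consistently confused with another. If the evaluation loop in Step 8 also records the intent returned for each test example, pandas can break the result down per intent. The helper `per_intent_report` is hypothetical, the labels are toy values, and `None` marks a test example for which no intent was returned.

```python
import pandas as pd


def per_intent_report(actual, predicted):
    """Return (accuracy per actual intent, confusion table of actual vs. predicted)."""
    frame = pd.DataFrame({'actual': actual, 'predicted': predicted})
    frame['predicted'] = frame['predicted'].fillna('(none)')  # no intent returned
    frame['correct'] = frame['actual'] == frame['predicted']
    accuracy = frame.groupby('actual')['correct'].mean()
    confusion = pd.crosstab(frame['actual'], frame['predicted'])
    return accuracy, confusion


# Hypothetical toy labels
actual = ['capabilities', 'capabilities', 'locate_amenity', 'locate_amenity']
predicted = ['capabilities', 'locate_amenity', 'locate_amenity', None]
accuracy, confusion = per_intent_report(actual, predicted)
print(accuracy)
print(confusion)
```

Rows of the confusion table are the expected intents and columns are what the service returned, so off-diagonal counts show which intents need more or clearer training examples.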


## Summary and next steps
You've learned how to use the Watson Conversation API to train and evaluate the service. Try adding your own user questions and intents data and see how Watson does!

Learn more:
- <a href="https://www.ibm.com/watson/developercloud/conversation/api/v1/" target="_blank" rel="noopener noreferrer">Watson Conversation API reference</a>
- <a href="https://github.com/watson-developer-cloud/python-sdk" target="_blank" rel="noopener noreferrer">Watson Conversation Python SDK</a>

### Authors
Paul Thoresen & Tyler Andersen from the Watson Accelerators Team.

Copyright &copy; IBM Corp. 2017. This notebook and its source code are released under the terms of the MIT License.