<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Building a Chatbot using Watson™ Assistant</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>   
   </tr>
</table>

This notebook demonstrates how to build a chatbot using Watson™ Assistant.

The IBM Watson™ Assistant service combines machine learning, natural language understanding, and integrated dialog tools to create conversation flows between apps and users.

This notebook uses the <a href="http://www.cs.cmu.edu/~ark/QA-data/" target="_blank" rel="noopener noreferrer">Wikipedia Q&A data set</a> collected by students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010. This notebook uses the portion collected in 2010.

Some familiarity with Python is helpful. This notebook runs on Python 3 and uses the <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">Watson™ Assistant API</a>.

## Learning Goals
In this notebook, you will learn how to build a chatbot using the <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">Watson™ Assistant API</a>.


## Contents
1. [Set up the environment](#setup)
2. [Create a Watson™ Assistant workspace](#ws)
3. [Explore data](#data)
4. [Train the chatbot](#train)
5. [Test the chatbot](#test)
6. [Summary and next steps](#summary)

## 1. Set up the environment <a id="setup"></a>

Before running the code in this notebook, please make sure to perform the following setup task:

- Create a <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">Watson™ Assistant</a> service instance. A Lite (free) plan is available; instructions for creating the instance are on IBM Cloud at <a href="https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started" target="_blank" rel="noopener noreferrer">https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started</a>.

Install the `ibm-watson` package. This notebook was written against version `3.0.4` of the package. Version 4.0 and later replaced the `iam_apikey` argument used below with authenticator objects, so the install command pins the version below 4.0. You can find version information in the Watson Assistant <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">documentation</a>.

In [None]:
!pip install --upgrade 'ibm-watson>=3.0.4,<4.0'

Import the `AssistantV1` module and authenticate the service instance. Replace `version`, `iam_apikey`, and `url` in the following cell.

To obtain your `iam_apikey` and `url`:

1. Log in to <a href="https://cloud.ibm.com" target="_blank" rel="noopener noreferrer">IBM Cloud</a>.
2. The first page you see is the dashboard. In the panel titled `Resource summary`, click `Services`, or go directly to the resource list via this <a href="https://cloud.ibm.com/resources" target="_blank" rel="noopener noreferrer">link</a>.
3. On the `Resource list` webpage, expand the `Services` list, find the service that starts with `Watson-Assistant`, and click it.
4. You will be directed to a webpage that has the `iam_apikey` (API Key) and `url`.
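Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. As an alternative, the minimal sketch below reads the values from environment variables. The variable names `ASSISTANT_APIKEY` and `ASSISTANT_URL` and the helper `load_assistant_credentials` are hypothetical choices for this example, and the returned keys assume the `ibm-watson` 3.x constructor used in the next cell.

```python
import os


def load_assistant_credentials(version="2019-02-28"):
    """Collect AssistantV1 keyword arguments from environment variables.

    ASSISTANT_APIKEY and ASSISTANT_URL are hypothetical names chosen for this
    sketch; the 'iam_apikey' key assumes the ibm-watson 3.x constructor.
    """
    required = ("ASSISTANT_APIKEY", "ASSISTANT_URL")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "version": version,
        "iam_apikey": os.environ["ASSISTANT_APIKEY"],
        "url": os.environ["ASSISTANT_URL"],
    }
```

With both variables set, the service could then be created with `assistant = AssistantV1(**load_assistant_credentials())`.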

In [2]:
from ibm_watson import AssistantV1

assistant = AssistantV1(
    version='2019-02-28',  # API version date used throughout this notebook.
    iam_apikey='apikey',   # Replace with your API key.
    url='url'              # Replace with your service URL.
)

## 2. Create a Watson™ Assistant workspace <a id="ws"></a>

In this section, you will create a Watson Assistant workspace for the chatbot and assign its `name` and `description` values.

Watson Assistant also provides a graphical user interface (GUI), which can be accessed through <a href="https://cloud.ibm.com" target="_blank" rel="noopener noreferrer">IBM Cloud</a>.

You can launch the GUI tool by clicking the `Launch Watson Assistant` button, which is on the same page that shows the `iam_apikey` (API Key) and `url` in section 1. [Set up the environment](#setup).

In [4]:
import json

create_workspace_response = assistant.create_workspace(
    name='Wikipedia Q&A chatbot',
    description='Wikipedia Q&A chatbot workspace created via API'
).get_result()

print(json.dumps(create_workspace_response, indent=2))

{
  "language": "en",
  "workspace_id": "c5653fb7-bc80-40de-8f13-330b7ff42944",
  "name": "Wikipedia Q&A chatbot",
  "metadata": {
    "api_version": {
      "major_version": "v1",
      "minor_version": "2019-02-28"
    }
  },
  "description": "Wikipedia Q&A chatbot workspace created via API",
  "learning_opt_out": false
}


## 3. Explore data <a id="data"></a>

In this section, you will learn how to:

- 3.1 [Download the Wikipedia Q&A data set](#download)
- 3.2 [Preprocess data for training the chatbot](#preprocess)

### 3.1 Download the Wikipedia Q&A data set <a id="download"></a>

Install the `wget` package to download the Wikipedia Q&A data set. Note that the data set is available under the GFDL and CC BY-SA 3.0 licenses. For more information, see http://www.cs.cmu.edu/~ark/QA-data/data/README.v1.2.

In [None]:
!pip install --upgrade wget

Import `Path` from `pathlib`, along with the `os` and `wget` modules.

In [6]:
from pathlib import Path
import os
import wget

Download the data file.

In [7]:
link_to_data = 'http://www.cs.cmu.edu/~ark/QA-data/data/Question_Answer_Dataset_v1.2.tar.gz'
filename = link_to_data.split('/')[-1]

In [8]:
data_file = Path(filename)

# Path.resolve() does not raise FileNotFoundError for a missing file (Python 3.6+),
# so check explicitly whether an earlier download is present.
if data_file.exists():
    print('Found a duplicate file. Deleting old file and downloading new file.')
    data_file.unlink()

downloaded_fname = wget.download(link_to_data)
print(downloaded_fname)

Found a duplicate file. Deleting old file and downloading new file.
Question_Answer_Dataset_v1.2.tar.gz
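If you prefer not to install `wget`, the standard library's `urllib.request.urlretrieve` can perform the same download. The sketch below is a hypothetical `download_file` helper that mirrors the cell above: it derives the file name from the URL and replaces an existing copy unless `overwrite=False`.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_file(url, dest_dir=".", overwrite=True):
    """Download url into dest_dir and return the local Path.

    Like the wget cell above, an existing copy is deleted and downloaded again,
    unless overwrite=False, in which case the existing copy is returned as-is.
    """
    # The file name is the last path segment of the URL.
    target = Path(dest_dir) / Path(urlparse(url).path).name
    if target.exists():
        if not overwrite:
            return target
        target.unlink()
    urlretrieve(url, str(target))
    return target
```

For example, `download_file(link_to_data)` would save the archive in the current directory.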


In [9]:
!ls

Question_Answer_Dataset_v1.2.tar.gz


Because the downloaded file is a compressed tar archive (`.tar.gz`), extract it with the `tarfile` module.

In [10]:
import tarfile

# The context manager closes the archive even if extraction fails.
with tarfile.open(filename) as tar:
    tar.extractall()

In [11]:
!ls -l

total 8068
drwxr-xr-x 5 dsxuser dsxuser    4096 Aug 23  2013 Question_Answer_Dataset_v1.2
-rw-r----- 1 dsxuser dsxuser 8254496 Jul 15 21:08 Question_Answer_Dataset_v1.2.tar.gz
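The extraction cell above calls `extractall()` on an archive downloaded from the internet. The Python `tarfile` documentation warns that member names such as `../x` or absolute paths can cause files to be written outside the target directory. The sketch below, a hypothetical `safe_extract` helper, checks every member path first and, on Python versions that provide `tarfile.data_filter`, also passes `filter='data'`.

```python
import tarfile
from pathlib import Path


def safe_extract(archive_path, dest="."):
    """Extract archive_path into dest after checking that no member escapes dest."""
    dest_dir = Path(dest).resolve()
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Resolve where the member would land; '..' components and
            # absolute names can point outside dest_dir.
            target = (dest_dir / member.name).resolve()
            if target != dest_dir and dest_dir not in target.parents:
                raise ValueError(f"Refusing to extract {member.name!r} outside {dest_dir}")
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can also reject unsafe links and special files.
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir
```

Calling `safe_extract(filename)` would extract the data set into the current directory, like the cell above.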


The following listing shows three folders, `S08`, `S09`, and `S10`, that contain the Q&A data.

In [12]:
!ls -l Question_Answer_Dataset_v1.2

total 40
-rw-r--r-- 1 dsxuser dsxuser 22962 Nov  3  2008 LICENSE-S08,S09
-rwxr-xr-x 1 dsxuser dsxuser  2823 Aug 23  2013 README.v1.2
drwxr-xr-x 3 dsxuser dsxuser  4096 Aug  6  2010 S08
drwxr-xr-x 3 dsxuser dsxuser  4096 Aug  6  2010 S09
drwxr-xr-x 3 dsxuser dsxuser  4096 Aug  6  2010 S10


In [13]:
!ls -l Question_Answer_Dataset_v1.2/S10

total 168
drwxr-xr-x 8 dsxuser dsxuser   4096 Aug  6  2010 data
-rw-r--r-- 1 dsxuser dsxuser 167585 Aug  6  2010 question_answer_pairs.txt
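Rather than listing folders one at a time, you can locate every Q&A pairs file with `pathlib`. The sketch below assumes that each collection folder holds a `question_answer_pairs.txt` file, as the `S10` listing shows; `find_qa_files` is a hypothetical helper name.

```python
from pathlib import Path


def find_qa_files(root="Question_Answer_Dataset_v1.2"):
    """Map each collection folder name (e.g. 'S10') to its Q&A pairs file."""
    # Only files directly inside a first-level folder are matched, so
    # subfolders such as S10/data are not searched.
    return {
        path.parent.name: path
        for path in sorted(Path(root).glob("*/question_answer_pairs.txt"))
    }
```

If the assumption holds, `find_qa_files()['S10']` points to the same file as `file_path` in the next subsection.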


### 3.2 Preprocess the data <a id="preprocess"></a>

In this subsection, you will load the Q&A pairs collected in 2010 and preprocess them.

In [14]:
file_path = os.path.join('Question_Answer_Dataset_v1.2', 'S10', 'question_answer_pairs.txt')

Import `pandas` to explore the data set.

In [15]:
import pandas as pd

df = pd.read_csv(file_path, delimiter='\t', encoding='ISO-8859-1')
df.head(10)

| | ArticleTitle | Question | Answer | DifficultyFromQuestioner | DifficultyFromAnswerer | ArticleFile |
|---|---|---|---|---|---|---|
| 0 | Alessandro_Volta | Was Alessandro Volta a professor of chemistry? | Alessandro Volta was not a professor of chemis... | easy | easy | data/set4/a10 |
| 1 | Alessandro_Volta | Was Alessandro Volta a professor of chemistry? | No | easy | hard | data/set4/a10 |
| 2 | Alessandro_Volta | Did Alessandro Volta invent the remotely opera... | Alessandro Volta did invent the remotely opera... | easy | easy | data/set4/a10 |
| 3 | Alessandro_Volta | Did Alessandro Volta invent the remotely opera... | Yes | easy | easy | data/set4/a10 |
| 4 | Alessandro_Volta | Was Alessandro Volta taught in public schools? | Volta was taught in public schools. | easy | easy | data/set4/a10 |
| 5 | Alessandro_Volta | Was Alessandro Volta taught in public schools? | Yes | easy | easy | data/set4/a10 |
| 6 | Alessandro_Volta | Who did Alessandro Volta marry? | Alessandro Volta married Teresa Peregrini. | medium | medium | data/set4/a10 |
| 7 | Alessandro_Volta | Who did Alessandro Volta marry? | Teresa Peregrini | medium | medium | data/set4/a10 |
| 8 | Alessandro_Volta | What did Alessandro Volta invent in 1800? | In 1800, Alessandro Volta invented the voltaic... | medium | easy | data/set4/a10 |
| 9 | Alessandro_Volta | What did Alessandro Volta invent in 1800? | voltaic pile | medium | medium | data/set4/a10 |


The data set has 6 columns - `ArticleTitle`, `Question`, `Answer`, `DifficultyFromQuestioner`, `DifficultyFromAnswerer`, and `ArticleFile`.

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1458 entries, 0 to 1457
Data columns (total 6 columns):
ArticleTitle                1458 non-null object
Question                    1440 non-null object
Answer                      1222 non-null object
DifficultyFromQuestioner    1262 non-null object
DifficultyFromAnswerer      1222 non-null object
ArticleFile                 1458 non-null object
dtypes: object(6)
memory usage: 68.4+ KB


Check for missing (`NaN`, Not a Number) values.

In [17]:
df.isnull().sum()

ArticleTitle                  0
Question                     18
Answer                      236
DifficultyFromQuestioner    196
DifficultyFromAnswerer      236
ArticleFile                   0
dtype: int64

Because several columns contain `NaN` values, keep only the `ArticleTitle`, `Question`, and `Answer` columns needed to train the chatbot, and then drop the rows that have missing values in those columns.

In [18]:
df = df.loc[:, ['ArticleTitle', 'Question', 'Answer']]
df.dropna(inplace=True)

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1222 entries, 0 to 1457
Data columns (total 3 columns):
ArticleTitle    1222 non-null object
Question        1222 non-null object
Answer          1222 non-null object
dtypes: object(3)
memory usage: 38.2+ KB


Confirm that no `NaN` values remain.

In [20]:
df.isnull().sum()

ArticleTitle    0
Question        0
Answer          0
dtype: int64
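To make the preprocessing step concrete, the sketch below reproduces the column selection and `dropna()` with only the standard `csv` module. `load_qa_pairs` is a hypothetical helper, and `MISSING` is an assumed subset of the strings that `pandas.read_csv` treats as missing by default, so results may differ slightly from the `pandas` version on unusual input. Only the three kept columns are checked, which matches selecting the columns before calling `dropna()`.

```python
import csv
import io

KEEP = ("ArticleTitle", "Question", "Answer")
# Assumed subset of the strings pandas.read_csv treats as missing by default.
MISSING = {"", "NA", "N/A", "NULL", "NaN", "nan", "null"}


def load_qa_pairs(text):
    """Return (ArticleTitle, Question, Answer) tuples from tab-separated text.

    Like df.loc[:, KEEP].dropna(), a row is dropped only when one of the three
    kept columns is missing; gaps in the other columns are ignored.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        values = tuple(record.get(column) for column in KEEP)
        # DictReader yields None for fields missing from a short row.
        if any(value is None or value in MISSING for value in values):
            continue
        rows.append(values)
    return rows
```

It could be applied to the same file with `load_qa_pairs(Path(file_path).read_text(encoding='ISO-8859-1'))`.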

## 4. Train the chatbot <a id="train"></a>

In this section, you will learn how to train a chatbot.

To stay within the rate limit of the Watson Assistant service, this notebook uses only the first three articles. Each article has its own unique questions, and each question can have multiple answers.

In [21]:
articles = df['ArticleTitle'].unique().tolist()
articles = articles[:3]

Each unique question of an article becomes an intent. A question can have more than one answer in the data set, for example a full sentence and a short "Yes".

Then create a dialog node for each intent and assign the question's answers as its output. For details on terms such as intent and dialog node, see the <a href="https://cloud.ibm.com/docs/services/assistant?topic=assistant-getting-started" target="_blank" rel="noopener noreferrer">Watson™ Assistant</a> documentation and the <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">Watson Assistant API</a> documentation.

In [22]:
workspace_id = create_workspace_response['workspace_id']

for article in articles:
    # Get unique questions from the selected articles.
    questions = df[df['ArticleTitle'] == article]['Question'].unique()
    # For each question, the intent name is assigned as 'article_name_idx'.
    for idx, q in enumerate(questions):
        intent_name = article + '_' + str(idx)
        
        # Create an intent whose single training example is the question.
        assistant.create_intent(
            workspace_id=workspace_id,
            intent=intent_name,
            examples=[
                {'text': q}
            ]
        )
        
        # Create a dialog node.
        assistant.create_dialog_node(
            workspace_id=workspace_id,
            dialog_node=intent_name.lower(),
            conditions='#' + intent_name,
            # Pair the question and the answers.
            output={
                'text': df[(df['ArticleTitle'] == article) & (df['Question'] == q)]['Answer'].tolist()
            },
            title=intent_name
        )
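The training loop above can be checked offline before it makes any API calls. The sketch below, a hypothetical `build_training_payloads` helper, groups `(article, question, answer)` rows the same way and returns dictionaries whose keys mirror the keyword arguments passed to `create_intent` and `create_dialog_node`. The `sanitize_name` step is an addition the original loop does not perform; it assumes intent names may contain only letters, digits, underscores, hyphens, and dots, so check the current API reference before relying on it.

```python
import re


def sanitize_name(name):
    # Assumption: intent names may contain only Unicode letters, digits,
    # underscores, hyphens, and dots, so replace anything else with '_'.
    return re.sub(r"[^\w.-]", "_", name)


def build_training_payloads(rows, max_articles=3):
    """Turn (article, question, answer) rows into intent and dialog-node payloads.

    Follows the training cell: keep the first max_articles articles, create one
    intent per unique question named '<article>_<idx>', and one dialog node per
    intent whose output text lists every answer to that question.
    """
    answers = {}  # (article, question) -> answers; dicts keep insertion order
    for article, question, answer in rows:
        answers.setdefault((article, question), []).append(answer)

    articles = list(dict.fromkeys(article for article, _ in answers))[:max_articles]
    intents, nodes = [], []
    for article in articles:
        questions = [q for a, q in answers if a == article]
        for idx, question in enumerate(questions):
            intent_name = sanitize_name(f"{article}_{idx}")
            intents.append({"intent": intent_name, "examples": [{"text": question}]})
            nodes.append({
                "dialog_node": intent_name.lower(),
                "conditions": "#" + intent_name,
                "output": {"text": answers[(article, question)]},
                "title": intent_name,
            })
    return intents, nodes
```

With the preprocessed data, the rows could come from `df.itertuples(index=False)`, and each payload could then be sent with `assistant.create_intent(workspace_id=workspace_id, **intent)` or `assistant.create_dialog_node(workspace_id=workspace_id, **node)`.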

## 5. Test the chatbot <a id="test"></a>

In this section, you will learn how to test the trained chatbot.

Import the `random` module so the chatbot can pick one answer at random. As mentioned in section [4. Train the chatbot](#train), a question can have multiple answers in the data set.

In [23]:
import random

Here are some questions from the data set you can try:

1. Was Alessandro Volta a professor of chemistry?
2. Was Avogadro a professor at the University of Turin?
3. Do ants belong to the same order as bees?

To exit the chatbot, type `quit`.

In [24]:
print('Welcome to the Watson Assistant example!\n')

while True:
    q = input('Enter a question: ')
    if q.strip().lower() == 'quit':
        break
    response = assistant.message(
        workspace_id=workspace_id,
        input={
            'text': q
        }
    ).get_result()
    answers = response['output']['text']
    # If no dialog node matches, the output text list can be empty.
    if answers:
        print(random.choice(answers) + '\n')
    else:
        print("Sorry, I don't have an answer to that.\n")

Welcome to the Watson Assistant example!

Enter a question: Was Alessandro Volta a professor of chemistry?
No

Enter a question: Was Avogadro a professor at the University of Turin?
Alessandro Volta was not a professor of chemistry.

Enter a question: Do ants belong to the same order as bees?
Yes, ants belong to the same order as bees.

Enter a question: quit
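The loop above prints one of the returned answers whenever any dialog node matches, even if Watson Assistant is not confident about the recognized intent. One possible refinement, sketched below as a hypothetical `pick_answer` helper, checks the highest intent confidence in the `/message` response before answering. It assumes the response dictionary has an `intents` list (entries with `intent` and `confidence`) and an `output.text` list, and the `0.5` threshold is an arbitrary starting point rather than a recommended value.

```python
import random


def pick_answer(response, threshold=0.5,
                fallback="Sorry, I don't know the answer to that.", rng=random):
    """Return one answer from a v1 /message response, or fallback if unsure.

    Assumes response['intents'] holds dicts with 'intent' and 'confidence' and
    response['output']['text'] holds the candidate answers.
    """
    intents = response.get("intents") or []
    answers = (response.get("output") or {}).get("text") or []
    if not intents or not answers:
        return fallback
    # Use the most confident intent rather than relying on list order.
    top = max(intents, key=lambda item: item.get("confidence", 0.0))
    if top.get("confidence", 0.0) < threshold:
        return fallback
    return rng.choice(answers)
```

Inside the loop, the `print` statements could be replaced with `print(pick_answer(response) + '\n')`.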


## 6. Summary and next steps <a id="summary"></a>

You successfully completed this notebook!

You learned how to create a simple chatbot using <a href="https://cloud.ibm.com/apidocs/assistant" target="_blank" rel="noopener noreferrer">Watson™ Assistant API</a> and Python 3.x.

Currently, you cannot deploy your chatbot from Watson Studio, but you can deploy it using the GUI tool mentioned in section [2. Create a Watson™ Assistant workspace](#ws).

You can also integrate your chatbot with a custom application, Intercom, Facebook Messenger, a web-hosted chat widget, or Slack. You can find the details in the Watson Assistant <a href="https://cloud.ibm.com/docs/services/assistant?topic=assistant-deploy-integration-add" target="_blank" rel="noopener noreferrer">documentation</a>.

### Citation
Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh, <a href="http://www.cs.cmu.edu/~ark/QA-data/" target="_blank" rel="noopener noreferrer">http://www.cs.cmu.edu/~ark/QA-data/</a>, 2008-2010. 

### Author
 
**Jihyoung Kim**, Ph.D., is a Data Scientist at IBM who strives to make data science easy for everyone through Watson Studio.

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License. The data set is separately licensed under the terms of the GFDL (http://www.gnu.org/licenses/fdl.html) and the CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/).
