# Bookworm


## Overview

In this project, you will build a simple question-answering agent that is able to learn from any text data you provide, and answer queries posed in natural language. You will use IBM Watson's cloud-based services to process the input text data and find relevant responses.

## Learning Objectives

By completing this project, you will learn how to:

- Create a cloud-based NLP service instance and configure it.
- Ingest a set of text documents using the service and analyze the results.
- Accept questions in natural language and parse them.
- Find relevant answers from the preprocessed text data.

## Getting Started

In order to use Watson's cloud-based services, you first need to create an account on the [IBM Bluemix platform](https://console.ng.bluemix.net/).

<div>
    <div style="display: table-cell; width: 50%;">
        <img src="images/watson-logo.png" alt="IBM Watson logo" width="200" />
    </div>
    <div style="display: table-cell; width: 50%;">
        <img src="images/bluemix-logo.png" alt="IBM Bluemix logo" width="400" />
    </div>
</div>

Then, for each service you want to use, you have to create an instance of that service. You can continue with the tasks below, and create a service instance when indicated.

## 1. Create and configure Discovery service

Create an instance of the **Discovery** service. You will use this to process a set of text documents, and _discover_ relevant facts and relationships.

- Go to the [IBM Bluemix Catalog](https://console.bluemix.net/catalog/).
- Select [Discovery](https://console.bluemix.net/catalog/services/discovery) service under the [AI](https://console.bluemix.net/catalog/?category=ai) category.
- Enter a Service Name for that instance, e.g. `Discovery-Bookworm` and click **`Create`** button on the bottom right hand corner of the screen.
- You should be able to see your newly-created service in your [Bluemix Apps Dashboard](https://console.bluemix.net/dashboard/apps).
<img src="images/app-dashboard-discovery.png" alt="App Dashboard" width="800" />

- Open the `Discovery-Bookworm` service instance and find your `Url` and `API Key` in **Credentials** section.

<img src="images/discovery-apikey.png" alt="Discovery Service - Credentials tab" width="800" />

_Note: you will need the username and password when connecting to the service in the next steps shortly._

### Connect to the service instance

Let's connect to the service instance you just created using IBM Watson's [Python SDK](https://github.com/watson-developer-cloud/python-sdk). You will first need to install the SDK:
```bash
pip install watson-developer-cloud
```

Now execute each code cell below using **`Shift+Enter`**, and complete any steps indicated by a **`TODO`** comment. For more information on the Discovery service, please read the [Documentation](https://www.ibm.com/watson/developercloud/doc/discovery/index.html) and look at the [API Reference](https://www.ibm.com/watson/developercloud/discovery/api/v1/?python) as needed.

In [2]:
# Install watson-developer-cloud
# This takes about a minute

!pip install --upgrade "watson-developer-cloud>=2.4.1" --user

Requirement already up-to-date: watson-developer-cloud>=2.4.1 in c:\users\rababalkhalifa\appdata\roaming\python\python36\site-packages (2.10.1)


You are using pip version 19.0.2, however version 19.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.


In [3]:
# Usual Python imports
import sys
import os
import glob
import json

# BeautifulSoup, for parsing HTML
from bs4 import BeautifulSoup

# Matplotlib, for plotting
import matplotlib.pyplot as plt
%matplotlib inline

# Watson Python SDK
# If not installed run: 
# !pip install --upgrade watson-developer-cloud
import watson_developer_cloud

# Utility functions
%load_ext autoreload
%aimport helper
%autoreload 1

###  Using the Service Credentials

Before you can connect to Watson Service, you need to copy and paste `Username` and `Password` credentials from Bluemix Service console to this notebook.
_Note: these credentials are different from your IBM Bluemix login, and are specific to the service instance._

1. Open `service-credentials.json` file:
    * from this Jupyter Notebook top navigation, click `File` &rarr; `Open` &rarr; `service-credentials.json`
2. Copy your `API Key` and `Url` from Discovery service console.
3. Paste the credentials into `apikey` and `url` values in `Discovery` object.


<img src="images/service-discovery-json.png" alt="Discovery Service - Credentials JSON" width="600" />

In [4]:
discovery_creds = helper.fetch_credentials('discovery')

discovery = watson_developer_cloud.DiscoveryV1(
                        version='2018-08-01',
                        url=discovery_creds['url'],
                        iam_apikey=discovery_creds['apikey'])



### Create an environment

The Discovery service organizes everything needed for a particular application in an [_environment_](https://www.ibm.com/watson/developercloud/discovery/api/v1/curl.html?curl#environments-api). An environment must be created before collections of private data can be created.

Let's create one called `Bookworm` for this project.

> _**Note**: It is okay to run this block multiple times - it will not create duplicate environments with the same name._

In [5]:
# Prepare an environment to work in
env, env_id = helper.fetch_object(
    discovery, "environment", "Bookworm",
    create=True, create_args=dict(
        description="A space to read and understand stories"  # feel free to edit
    ))

Found environment: Bookworm (22211a5c-b356-4b13-b357-04bbe476c87c)


### Verify configuration options

A [_configuration_](https://www.ibm.com/watson/developercloud/discovery/api/v1/curl.html?curl#configurations-api) defines what natural language processing routines are applied to any documents that are submitted to the service. Each environment gets a default configuration when it is created.

You can fetch the default configuration and view the different options using the following piece of code.


In [6]:
# Lists existing configurations for the service instance and store default configuration id
configurations = discovery.list_configurations(environment_id=env_id).get_result()
cfg_id =  configurations['configurations'][0]['configuration_id']
print(json.dumps(configurations, indent=2))

{
  "configurations": [
    {
      "configuration_id": "82a07426-9816-4c8d-81b3-b9419c30d02e",
      "name": "Default Configuration",
      "description": "The configuration used by default when creating a new collection without specifying a configuration_id.",
      "created": "2019-06-07T19:29:30.906Z",
      "updated": "2019-06-07T19:29:30.906Z"
    },
    {
      "configuration_id": "65b04976-db3b-4b9c-8bcc-fbc6c297da98",
      "name": "Default Contract Configuration",
      "description": "Extract party, nature, and category from elements in PDFs.",
      "created": "2019-06-07T20:49:29.398Z",
      "updated": "2019-06-07T20:49:29.398Z"
    }
  ]
}


In [7]:
# Get default configuration details
config = discovery.get_configuration(environment_id=env_id, configuration_id=cfg_id).get_result()
print(json.dumps(config, indent=2))

{
  "configuration_id": "82a07426-9816-4c8d-81b3-b9419c30d02e",
  "name": "Default Configuration",
  "created": "2019-06-07T19:29:30.906Z",
  "updated": "2019-06-07T19:29:30.906Z",
  "description": "The configuration used by default when creating a new collection without specifying a configuration_id.",
  "conversions": {
    "html": {
      "exclude_content": {
        "xpaths": []
      },
      "exclude_tag_attributes": [
        "EVENT_ACTIONS"
      ],
      "exclude_tags_completely": [
        "script",
        "sup"
      ],
      "exclude_tags_keep_content": [
        "font",
        "em",
        "span"
      ],
      "keep_content": {
        "xpaths": []
      }
    },
    "json_normalizations": [],
    "pdf": {
      "heading": {
        "fonts": [
          {
            "level": 1,
            "max_size": 80,
            "min_size": 24
          },
          {
            "bold": false,
            "italic": false,
            "level": 2,
            "max_size": 24,
     

There are 3 main configuration blocks that affect how input documents are processed:

1. **conversions**: How to convert documents in various formats (Word, PDF, HTML) and extract elements that indicate some structure (e.g. headings).
2. **enrichments**: What NLP output results are we interested in (keywords, entities, sentiment, etc.).
3. **normalizations**: Post-processing steps to be applied to the output. This can be left empty in most cases, unless you need the output to be normalized into a very specific format.

_**Note**: The default configuration for an environment cannot be modified. If you need to change any of the options, you will need to create a new one, and then edit it. The easiest way to do this is using the service dashboard, which is described later._

### Testing Language Enrichment

It is a good idea to test your configuration on a small sample text before you apply it to a larger document collection.

_**Note**: We have supplied a sample document (`data/sample.html`) containing the opening crawl text for Star Wars: Episode IV, but you are free to use a text of your choosing._

**Q**: (optional) If you use your own sample text, provide a brief title and description below.

**A**: 

In [8]:
# Test configuration on some sample text
data_dir = "data"
filename = os.path.join(data_dir, "sample.html")
with open(filename, "r") as f:
    res = discovery.test_configuration_in_environment(environment_id=env_id, configuration_id=cfg_id, file=f).get_result()
print(json.dumps(res, indent=2))

{
  "status": "completed",
  "original_media_type": "text/html",
  "snapshots": [
    {
      "step": "html_input",
      "snapshot": {
        "html": "<html>\n<head>\n    <title>Star Wars: Episode IV - A New Hope (Opening Crawl)</title>\n</head>\n<body>\n    <article>\n        <h1>Star Wars: Episode IV - A New Hope (Opening Crawl)</h1>\n        <p>\n            It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire.\n        </p><p>\n            During the battle, Rebel spies managed to steal secret plans to the Empire's ultimate weapon, the DEATH STAR, an armored space station with enough power to destroy an entire planet.\n        </p><p>\n            Pursued by the Empire's sinister agents, Princess Leia races home aboard her starship, custodian of the stolen plans that can save her people and restore freedom to the galaxy...\n        </p>\n    </article>\n</body>"
      }
    },
    {
      "step":

### Analyze test output



The results returned by the service contain a _snapshot_ of the information extracted at each step of processing - document conversions, enrichments and normalizations. We are interested in the output of applying enrichments ("enrichments_output") or after normalizing them ("normalizations_output"). These should be identical if no post-processing/normalizations were specified in the configuration.

In [9]:
# Take a closer look at the results from the "enrichments_output" or "normalizations_output" step
output = next((s["snapshot"] for s in res["snapshots"] if s["step"] == "normalizations_output"), None)
print(json.dumps(output, indent=2))

{
  "extracted_metadata": {
    "title": "Star Wars: Episode IV - A New Hope (Opening Crawl)"
  },
  "html": "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><html>\n<head>\n    <meta content=\"text/html; charset=UTF-8\" http-equiv=\"Content-Type\"/>\n    \n    <title>Star Wars: Episode IV - A New Hope (Opening Crawl)</title>\n\n\n</head>\n<body>\n\n\n    <article>\n        <h1>Star Wars: Episode IV - A New Hope (Opening Crawl)</h1>\n        <p>\n            It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire.\n        </p><p>\n            During the battle, Rebel spies managed to steal secret plans to the Empire's ultimate weapon, the DEATH STAR, an armored space station with enough power to destroy an entire planet.\n        </p><p>\n            Pursued by the Empire's sinister agents, Princess Leia races home aboard her starship, custodian of the stolen plans that can save her people and res

Answer the following questions based on the output above. Note that it contains the input HTML, extracted text and metadata as well as the actual enrichment results.

#### Sentiment

**Q**: What is the overall sentiment detected in this text? Mention the `type` (positive/negative) and `score`.<br />
(_Hint: Look for the `"sentiment"` key in the output._)

**A**: 
 "score": -0.607654,
 "label": "negative"

#### Concepts

**Q**: List 3 concepts that have been identified with a relevance > 0.5. Note that not all concepts here may be present directly in the text, some may have been inferred by Watson.<br />
(_Hint: Look for `"concepts"`._)

**A**:


Star Wars Episode IV: A New Hope 0.98887 <br />
Star Wars 0.985705<br />
Rebel Alliance 0.90639<br />
Luke Skywalker 0.805771<br />
Star Wars Episode V: The Empire Strikes Back 0.801062<br />
Darth Vader 0.777337<br />
Star Wars Episode VI: Return of the Jedi 0.77159<br />
Grand Moff Tarkin 0.748553<br />

In [10]:
res.keys()

dict_keys(['status', 'original_media_type', 'snapshots', 'notices'])

In [11]:
x = res['snapshots']

In [12]:
y = x[6]['snapshot']

In [13]:
y.keys()

dict_keys(['extracted_metadata', 'html', 'text', 'enriched_text'])

In [14]:
z = y['enriched_text']

In [15]:
z.keys()

dict_keys(['sentiment', 'entities', 'concepts', 'categories'])

In [16]:
n=z['concepts']

In [17]:
n

[{'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_IV:_A_New_Hope',
  'relevance': 0.98887,
  'text': 'Star Wars Episode IV: A New Hope'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars',
  'relevance': 0.985705,
  'text': 'Star Wars'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Rebel_Alliance',
  'relevance': 0.90639,
  'text': 'Rebel Alliance'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Luke_Skywalker',
  'relevance': 0.805771,
  'text': 'Luke Skywalker'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_V:_The_Empire_Strikes_Back',
  'relevance': 0.801062,
  'text': 'Star Wars Episode V: The Empire Strikes Back'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Darth_Vader',
  'relevance': 0.777337,
  'text': 'Darth Vader'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_VI:_Return_of_the_Jedi',
  'relevance': 0.77159,
  'text': 'Star Wars Episode VI: Return of the Jedi'},
 {'dbpedia_resour

In [18]:
for i in range(len(n)):
    rev = n[i]['relevance']
    if rev > .5:
        print(n[i]['text'],rev)

Star Wars Episode IV: A New Hope 0.98887
Star Wars 0.985705
Rebel Alliance 0.90639
Luke Skywalker 0.805771
Star Wars Episode V: The Empire Strikes Back 0.801062
Darth Vader 0.777337
Star Wars Episode VI: Return of the Jedi 0.77159
Grand Moff Tarkin 0.748553




Get a good sense of all the different pieces of information available in the results. Start thinking about which ones will be useful for looking up answers to questions, and how you might use them.

## 2. Ingest documents

### Create a collection

A _collection_ is used to organize documents of the same kind. For instance, you may want to create a collection of book reviews, or a collection of Wikipedia articles, but it may not make much sense to mix the two groups. This allows Watson to make meaningful inferences over the set of documents, find commonalities and identify important concepts.

Let's create a collection called `Story Chunks` in the Discovery service environment.

In [19]:
# Prepare a collection of documents to use
col, col_id = helper.fetch_object(discovery, "collection", "Story Chunks", environment_id=env_id,
    create=True, create_args=dict(
        environment_id=env_id, configuration_id=cfg_id,
        description="Stories and plots split up into chunks suitable for answering")
    )

Found collection: Story Chunks (9a8dd41b-2a64-4450-8541-91be8d2f69b6)


Once you have created a collection, you should be able to view it using the Discovery Service tool. Select the Discovery instance from your BlueMix dashboard.  To open, click the **`Launch tool`** button.

<img src="images/discovery-launch.png" alt="Discovery service - Manage tab" width="800" />


Here you should see the `Story Chunks` collection you just created.

<img src="images/discovery-tooling.png" alt="Discovery service - Tool showing collections" width="800" />

You can open the collection to view more details about it. If you need to modify configuration options, click the **Switch** link and create a new configuration (the default one cannot be changed).

### Add documents

Okay, now that we have everything set up, let's add a set of "documents" we want Watson to look up answers from, using the Python SDK. Note that Watson treats each "document" as a unit of text that is returned as the result of a query. But we want to retrieve a paragraph of text for each question. So, let's split each file up into individual paragraphs. We will use the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library for this purpose.

_**Note**: You could also add and manage documents in the collection using the Discovery tool, but you would have to split paragraphs up into separate files._

_**Note**: We have provided a set of files (`data/Star-Wars/*.html`) with summary plots for Star Wars movies, but you are free to use a collection of your choice. Open one of the files in a text editor to see how the paragraphs are delimited using `<p>...</p>` tags - this is how the code block below split paragraphs into separate "documents"._

In [20]:
# Add documents to collection
doc_ids = []  # to store the generated id for each document added
for filename in glob.glob(os.path.join(data_dir, "Star-Wars", "*.html")):
    print("Adding file:", filename)
    with open(filename, "r") as f:
        # Split each individual <p> into its own "document"
        doc = f.read()
        soup = BeautifulSoup(doc, 'html.parser')
        for i, p in enumerate(soup.find_all('p')):
            doc_info = discovery.add_document(
                environment_id=env_id, 
                collection_id=col_id,
                file=json.dumps({"text": p.get_text(strip=True)}),
                filename='n',
                file_content_type="application/json",
                metadata=json.dumps({"title": soup.title.get_text(strip=True)})
            ).get_result() # add get_result() here

            doc_ids.append(doc_info["document_id"])
print("Total", len(doc_ids), "documents added.")

Adding file: data\Star-Wars\Episode-III_Revenge-of-the-Sith.html
Adding file: data\Star-Wars\Episode-II_Attack-of-the-Clones.html
Adding file: data\Star-Wars\Episode-IV_A-New-Hope.html
Adding file: data\Star-Wars\Episode-I_The-Phantom-Menace.html
Adding file: data\Star-Wars\Episode-VII_The-Force-Awakens.html
Adding file: data\Star-Wars\Episode-VI_Return-of-the-Jedi.html
Adding file: data\Star-Wars\Episode-V_The-Empire-Strikes-Back.html
Adding file: data\Star-Wars\Rogue-One.html
Total 42 documents added.


If you look at the collection details, you may notice that the `"document_counts"` field now shows some documents as `available` or `processing`. Once processing is complete, you should see all the documents under the `available` count.

In [21]:
# View collection details to verify all documents have been processed
col, col_id = helper.fetch_object(discovery, "collection", "Story Chunks", environment_id=env_id)

Found collection: Story Chunks (9a8dd41b-2a64-4450-8541-91be8d2f69b6)


So, what did the Discovery service learn? If you list the fields extracted from the set of documents in the collection as part of the enrichment process, you'll see familiar fields like `concepts`, `entities` and `keywords` that were returned in the test analysis.

In [22]:
# List all fields extracted
discovery.list_collection_fields(environment_id=env_id, collection_id=col_id).get_result()

{'fields': [{'field': 'enriched_text.entities.sentiment.score',
   'type': 'double'},
  {'field': 'enriched_text.sentiment.document.label', 'type': 'string'},
  {'field': 'enriched_text.concepts.dbpedia_resource', 'type': 'string'},
  {'field': 'extracted_metadata.file_type', 'type': 'string'},
  {'field': 'enriched_text.concepts.text', 'type': 'string'},
  {'field': 'enriched_text.concepts.relevance', 'type': 'double'},
  {'field': 'enriched_text.categories.score', 'type': 'double'},
  {'field': 'enriched_text.entities.sentiment.label', 'type': 'string'},
  {'field': 'enriched_text.entities.relevance', 'type': 'double'},
  {'field': 'text', 'type': 'string'},
  {'field': 'enriched_text.entities.count', 'type': 'double'},
  {'field': 'enriched_text.entities.sentiment', 'type': 'nested'},
  {'field': 'enriched_text.sentiment.document', 'type': 'nested'},
  {'field': 'enriched_text.entities.text', 'type': 'string'},
  {'field': 'enriched_text.entities', 'type': 'nested'},
  {'field': 'en

### Test query

Let's perform a simple query to see if the service can fetch the proper document for us:
> _Look for all paragraphs that have a relation (sentence) with "Jar Jar" as the subject, and return the title and text._


In [23]:
# A simple query
results = discovery.query(environment_id=env_id, collection_id=col_id,
    query_options={
        "query": "enriched_text.relations.subject.text:\"Jar Jar\"",
        "return": "metadata.title,text"
    }).get_result()
print(json.dumps(results, indent=2))

{
  "matching_results": 81,
  "session_token": "1_j1sJmD7b2O6F0Qa2_cqS6f8WLa",
  "results": [
    {
      "id": "f848842d-7e7b-44c7-9a07-8692fe01ff17",
      "result_metadata": {
        "score": 1
      },
      "extracted_metadata": {
        "sha1": "8c37b2d965338d249621229ed6dd5d8014684a6f",
        "filename": "n",
        "file_type": "json"
      },
      "text": "Three years after the the start of the Clone Wars between the Galactic Republic and the Confederacy of Independent Systems, war has gripped the galaxy. During a space battle over the planet Coruscant, Jedi Knights Obi-Wan Kenobi and Anakin Skywalker lead a mission to rescue the kidnapped Supreme Chancellor Palpatine from Separatist commander General Grievous. After infiltrating Grievous's flagship, the Jedi battle Count Dooku. Anakin subdues Dooku, and on Palpatine's urging, executes him. Grievous flees the battle-torn cruiser, which the Jedi crash-land on Coruscant. There, Anakin reunites with his wife, Padm\u00c3\u00

Change the above query and see what results you get! Try to find one that returns relevant results, and keep that (along with the output) for review.

> See [Query building reference](https://www.ibm.com/watson/developercloud/doc/discovery/query-reference.html) for descriptions of all possible parameters, operators and aggregations. You can also choose to build the query using the web interface (click the "Story Chunks" collection to query it), and then reproduce the query here.

Then answer the questions below:

**Q**: What query did you try? Express it in plain words below.

**A**: nested(enriched_text.entities).filter(enriched_text.entities.type::Person).term(enriched_text.entities.text)


**Q**: What answer did you get back from Watson? You only need to mention the relevant snippet of text fro mthe paragraph(s) returned.

**A**: 
Showing 10 of 42 matching documents

In [24]:
# A simple query
results = discovery.query(environment_id=env_id, collection_id=col_id,
    query_options={
        "query": "nested(enriched_text.entities).filter(enriched_text.entities.type::Person).term(enriched_text.entities.text)",
        "return": "metadata.title,text"
    }).get_result()
print(json.dumps(results, indent=2))

{
  "matching_results": 84,
  "session_token": "1_j1sJmD7b2O68eQa2_cqS6f8WLa",
  "results": [
    {
      "id": "4938db67-404f-4638-a214-9aa5081e8e7f",
      "result_metadata": {
        "score": 1
      },
      "extracted_metadata": {
        "sha1": "8c37b2d965338d249621229ed6dd5d8014684a6f",
        "filename": "n",
        "file_type": "json"
      },
      "text": "Three years after the the start of the Clone Wars between the Galactic Republic and the Confederacy of Independent Systems, war has gripped the galaxy. During a space battle over the planet Coruscant, Jedi Knights Obi-Wan Kenobi and Anakin Skywalker lead a mission to rescue the kidnapped Supreme Chancellor Palpatine from Separatist commander General Grievous. After infiltrating Grievous's flagship, the Jedi battle Count Dooku. Anakin subdues Dooku, and on Palpatine's urging, executes him. Grievous flees the battle-torn cruiser, which the Jedi crash-land on Coruscant. There, Anakin reunites with his wife, Padm\u00c3\u00

## 3. Parse natural language questions

In order to understand questions posed in natural language, we'll use another AI service called [Watson Assistant](https://console.bluemix.net/catalog/services/watson-assistant-formerly-conversation) (Formerly 'Conversation'). It can be used to design conversational agents or _chatbots_ that exhibit complex behavior, but for the purpose of this project, we'll only use it to parse certain kinds of queries.

### Create a Assistant service instance

Just like you did for the Discovery service, create an instance of the Assistant service. Then launch the associated tool from the service dashboard.

- Go to the [IBM Bluemix Catalog](https://console.bluemix.net/catalog/).
- Select [Watson Assistant (formerly Conversation)](https://console.bluemix.net/catalog/services/watson-assistant-formerly-conversation) service under the [AI](https://console.bluemix.net/catalog/?category=ai) category.
- Enter a Service Name for that instance, e.g. `Assistant-Bookworm` and click **`Create`** button on the bottom right hand corner of the screen.
- You should be able to see your newly-created service in your [Bluemix Apps Dashboard](https://console.bluemix.net/dashboard/apps).
- Open the service instance and find your `Url` and `API Key` in **Credentials** section.
- Copy *API Key* and *URL* into `service-credentials.json` file in this notebook:

<img src="images/assistant-apikey.png" alt="Discovery Service - Credentials tab" width="600" />

<img src="images/assistant-cred.png" alt="Discovery Service - Credentials tab" width="600" />

### Create a Workspace from Watson Assistant console

A [_workspace_](https://www.ibm.com/watson/developercloud/assistant/api/v1/python.html?python#workspaces-api) allows you to keep all the items you need for a particular application in one place, just like an _environment_ in case of the Discovery service. 

From Watson Assistant console, please follow these steps:

1) Click `Launch Tool` to start Watson Assistant.
<img src="images/assistant-launch-tool.png" alt="Assistant service - Bookworm workspace" width="800" />

2) Click `Skills` tab on the navigation menu and click `Create new` button. 
<img src="images/assistant-create-new-skills.png" alt="Assistant service - Bookworm workspace" width="800" />

3) Create new skills called `Bookworm` with a suitable description, such as "I know a lot of stories. Ask me a question!".
<img src="images/assistant-add-dialog-skills.png" alt="Assistant service - Bookworm workspace" width="800" />

Clicking on `Bookworm` workspace should open up a blank workspace, where you can add intents, define the entities you want the agent to idenitfy and structure the overall dialog.

<img src="images/blank-workspace.png" alt="Conversation service - Blank workspace" width="800" />

### Add intents

An [_intent_](https://www.ibm.com/watson/developercloud/assistant/api/v1/python.html?python#intents-api) is the goal or purpose of a user's input. Intent will determine the dialog flows with the users and allow Watson Assistant to provide a useful response. Please read Watson Assistant documentation on [_Planning Your Intents and Entities_](https://console.bluemix.net/docs/services/conversation/intents-entities.html#planning-your-entities).

Your task is to create a set of intents (at least 3) that capture the different kinds of questions that you want the system to answer, e.g. _who_, _what_ and _where_. Along with each intent, add a list of user examples or _utterances_ that map to that intent.

For instance, you could enter the following examples for the _where_ intent:

- Where is the Jedi temple located?
- Where was Luke born?

_Intent user examples should represent typical sentences that end users will use to interact with the application. The more examples you can provide for each intent, the better Watson Assistant will respond to the end user._

<img src="images/assistant-intents.png" alt="Assistant service - Intents listed" width="800" />

> See [**Defining intents**](https://console.bluemix.net/docs/services/conversation/intents.html#defining-intents) for a helpful video and further instructions.

**Q**: What intents did you add to the Assistant service instance?

**A**:



**Q**: Pick one of these intents, and list at least 5 examples for the intent that you entered.

**A**:

In [25]:
import pandas
df = pandas.read_csv('data/dcfbd5c9-8080-4364-b5d7-e0051db6a1c9_intents.csv',header=None)
df.columns = [ 'Questions','Intent']
df.head()

Unnamed: 0,Questions,Intent
0,where Star Wars was filmed?,where
1,Where is the Jedi temple located?,where
2,From where Rebel came?,where
3,Where was Luke born?,where
4,"Where can I be a hero in my own ""Star Wars"" st...",where


In [26]:
df[df['Intent'] == 'who']

Unnamed: 0,Questions,Intent
5,who want to hear more in-depth analysis,who
6,Who doesn’t like Star Wars?,who
7,who bridge the Star Wars Classic & Prequel tr...,who
8,Who love start war covers?,who
9,Who is star wars character in the movie,who


### Add entities

Once you have your intents set, let's tell the service what [_entities_](https://www.ibm.com/watson/developercloud/assistant/api/v1/python.html?python#entities-api) we want it to identify. One way to do this is using the `Entities` tool on Watson Assistant console, and entering them one-by-one to the blank `My entities` page.

<img src="images/assistant-entities-blank.png" alt="Assistant service - No entities listed" width="800" />

> Go to [**Defining entities**](https://console.bluemix.net/docs/services/conversation/entities.html#defining-entities) to see how that is done.

But that can be tedious! So let's refer back to the entities that the Discovery service identified, and load them in programmatically.

As before, let's connect to the Assistant service first. Remember to enter your service credentials below.

In [45]:
# Connect to the Assistant service instance
# TODO: Enter your username and password from the Service Credentials tab in service-credentials.json


assistant_cred = helper.fetch_credentials('assistant')

assistant = watson_developer_cloud.AssistantV1(
                        version='2018-08-01',
                        url=assistant_cred['url'],
                        iam_apikey=assistant_cred['apikey'])


# Credentials_cred = helper.fetch_credentials('Credentials')  did not work

# assistant = watson_developer_cloud.AssistantV1(
#                         version='2018-08-01',
#                         url=Credentials_cred['Username'],
#                         iam_apikey=Credentials_cred['Password'])



Fetch the workspace you just created called "Bookworm".

In [46]:
wrk, wrk_id = helper.fetch_object(assistant, "workspace", "Bookworm")
wrk_id  = 'dcfbd5c9-8080-4364-b5d7-e0051db6a1c9'

In [47]:
print(json.dumps(wrk, indent=2))

null


### Exporting entities from Discovery service to Assistant service

Next step is to collect all the entities from the `Star Wars` documents that we added to the **Discovery** service collection and group them by entities type.

In [29]:
# Get all the entities from the collection and group them by type
response = discovery.query(environment_id=env_id, collection_id=col_id,
    query_options={
        "return": "enriched_text.entities.type,enriched_text.entities.text"
    }).get_result()

# Group individual entities by type ("Person", "Location", etc.)
entities_by_type = {}
for document in response["results"]:
    for entity in document["enriched_text"]["entities"]:
        if entity["type"] not in entities_by_type:
            entities_by_type[entity["type"]] = set()
        entities_by_type[entity["type"]].add(entity["text"])

# Ignore case to avoid duplicates
for entity_type in entities_by_type:
    entities_by_type[entity_type] = {
        e.lower(): e for e in entities_by_type[entity_type]
    }.values()

# Restructure for loading into Assistant workspace
entities_grouped = [{
    "entity": entity_type,
    "values": [{"value": entity} for entity in entities]}
        for entity_type, entities in entities_by_type.items()]
entities_grouped


[{'entity': 'Person',
  'values': [{'value': 'Senator PadmÃ© Amidala'},
   {'value': 'Boba'},
   {'value': 'Palpatine'},
   {'value': 'Supreme Chancellor Palpatine'},
   {'value': 'Organa'},
   {'value': 'Obi-Wan Kenobi'},
   {'value': 'Anakin'},
   {'value': 'Luke'},
   {'value': 'Polis Massa'},
   {'value': 'Mustafar'},
   {'value': 'Darth Sidious'},
   {'value': 'Leia'},
   {'value': 'Yoda'},
   {'value': 'PadmÃ© Amidala'},
   {'value': 'Shmi'},
   {'value': 'Obi-Wan'},
   {'value': 'Darth Vader'},
   {'value': 'Vader'},
   {'value': 'Cliegg Lars'},
   {'value': 'Jango Fett'},
   {'value': 'Beru Lars'},
   {'value': 'Owen'},
   {'value': 'Zam Wesell'},
   {'value': 'Anakin Skywalker'},
   {'value': 'Mace Windu'}]},
 {'entity': 'Organization',
  'values': [{'value': 'Senate'},
   {'value': 'Tusken Raiders'},
   {'value': 'Trade Federation'},
   {'value': 'Confederacy of Independent Systems'},
   {'value': 'Jedi Council'}]},
 {'entity': 'JobTitle',
  'values': [{'value': 'representati

Update the workspace with these entities and verify that have been added correctly.

In [30]:
# Add these grouped entities to the Assistant workspace
assistant.update_workspace(workspace_id=wrk_id, entities=entities_grouped).get_result()

workspace_details = assistant.get_workspace(workspace_id=wrk_id, export=True).get_result()
print(json.dumps(workspace_details["entities"], indent=2))

[
  {
    "entity": "Facility",
    "values": [
      {
        "type": "synonyms",
        "value": "Jedi Temple",
        "synonyms": []
      }
    ]
  },
  {
    "entity": "Organization",
    "values": [
      {
        "type": "synonyms",
        "value": "Senate",
        "synonyms": []
      },
      {
        "type": "synonyms",
        "value": "Tusken Raiders",
        "synonyms": []
      },
      {
        "type": "synonyms",
        "value": "Trade Federation",
        "synonyms": []
      },
      {
        "type": "synonyms",
        "value": "Confederacy of Independent Systems",
        "synonyms": []
      },
      {
        "type": "synonyms",
        "value": "Jedi Council",
        "synonyms": []
      }
    ]
  },
  {
    "entity": "Person",
    "values": [
      {
        "type": "synonyms",
        "value": "Anakin",
        "synonyms": []
      },
      {
        "type": "synonyms",
        "value": "Luke",
        "synonyms": []
      },
      {
        "type":

_**Note**: Ensure that at least 3 entity types, with at least 1 example entity each have been added._

Here is what the list of entities should look like through the Assistant tool.

<img src="images/assistant-entities.png" alt="Assistant service - Entities listed" width="800" />

**Q**: Name 3 entity types that were added, with at least 1 example entity each (e.g. entity type: _City_, example: _Los Angeles_).

**A**: 

In [34]:
df = pandas.read_csv('data/dcfbd5c9-8080-4364-b5d7-e0051db6a1c9_entities.csv',header=None)
df.columns = [ 'Entity_Values','Type']
df.head()

Unnamed: 0,Entity_Values,Type
0,Person,Jango Fett
1,Person,Anakin Skywalker
2,Person,Rey
3,Person,Owen
4,Person,Darth Vader


### Design dialog flow

As a final step in creating the Assistant interface, let's design a typical dialog with a user. The most intuitive way to do this is to use the Dialog tab in the tool. Here, you can add _nodes_ that capture different stages in the dialog flow, and connect them in a meaningful way.

Go ahead and add at least 3 dialog nodes. Specify the triggers in terms of the intents and entities that you'd like to match, and an optional intermediate response like "Let me find that out for you." The actual response will be fetched by querying the Discovery service.

Here is what the dialog nodes should look like.

<img src="images/assistant_dialog_nodes.png" alt="Assistant service - Dialog nodes" width="800" />

**Q**: Specify 3 dialog nodes you added, along with the trigger (intent and/or entities) for each.

**A**: 

### Test dialog

Let's run through a test dialog to demonstrate how the system transitions to one of the nodes you defined above.

In [35]:
# Testing the dialog flow

# Start conversation with a blank message
results = assistant.message(workspace_id=wrk_id).get_result()
context = results["context"]

# Then ask a sample question
question= "Who is Luke's father?"
results = assistant.message(workspace_id=wrk_id,
                            input={'text': question}, context=context).get_result()

print(json.dumps(results, indent=2))

{
  "intents": [
    {
      "intent": "who",
      "confidence": 0.6688200473785401
    }
  ],
  "entities": [
    {
      "entity": "Person",
      "location": [
        7,
        11
      ],
      "value": "Luke",
      "confidence": 1
    }
  ],
  "input": {
    "text": "Who is Luke's father?"
  },
  "output": {
    "generic": [],
    "text": [],
    "nodes_visited": [
      "node_2_1559984435012"
    ],
    "log_messages": []
  },
  "context": {
    "conversation_id": "1104373d-fb32-4c3e-aeac-9327777a97b8",
    "system": {
      "initialized": true,
      "_node_output_map": {
        "Welcome": {
          "0": [
            0
          ]
        }
      },
      "dialog_turn_counter": 2,
      "dialog_stack": [
        {
          "dialog_node": "root"
        }
      ],
      "dialog_request_counter": 2,
      "branch_exited": true,
      "branch_exited_reason": "completed"
    }
  }
}


## 4. Query document collection to fetch answers

The Discovery service includes a simple mechanism to make queries against your enriched collection of documents. But you have a lot of control over what fields are searched, how results are aggregated and values are returned.

### Process sample question

Choose a sample nautal language question to ask, and run it through the Assistant service, just like you did above when testing dialog flow.

In [61]:
# TODO: Run a sample question through Assistant service

# Then ask a sample question
question= "Who is Luke's father?"
results = assistant.message(workspace_id=wrk_id,
                            input={'text': question}, context=context).get_result()

print(json.dumps(results, indent=2))

{
  "intents": [
    {
      "intent": "who",
      "confidence": 0.6688200473785401
    }
  ],
  "entities": [
    {
      "entity": "Person",
      "location": [
        7,
        11
      ],
      "value": "Luke",
      "confidence": 1
    }
  ],
  "input": {
    "text": "Who is Luke's father?"
  },
  "output": {
    "generic": [],
    "text": [],
    "nodes_visited": [
      "node_2_1559984435012"
    ],
    "log_messages": []
  },
  "context": {
    "conversation_id": "1104373d-fb32-4c3e-aeac-9327777a97b8",
    "system": {
      "initialized": true,
      "_node_output_map": {
        "Welcome": {
          "0": [
            0
          ]
        }
      },
      "dialog_turn_counter": 2,
      "dialog_stack": [
        {
          "dialog_node": "root"
        }
      ],
      "dialog_request_counter": 2,
      "branch_exited": true,
      "branch_exited_reason": "completed"
    }
  }
}


Now extract the intent and entities identified in the question, and optionally what dialog node was triggered (in case you need it later to customize your response). Some sample code is provided below, but you may need to modify it.

In [62]:
# TODO: Identify the intent(s) the user expressed (typically a single one)
query_intents = [intent["intent"] for intent in results["intents"]]
print("Intent(s):", query_intents)

# TODO: Extract the entities found in the question text
query_entities = [entity["value"] for entity in results["entities"]]
print("Entities:", query_entities)

# TODO: (optional) Find out what dialog node was triggered
query_dialog_node = [context['dialog_node'] for context in results["context"]['system']['dialog_stack']]
print("dialog node:", query_dialog_node)

query_dialognodes = [entity["entity"] for entity in results["entities"]]
print("dialog node:", query_dialognodes)

Intent(s): ['who']
Entities: ['Luke']
dialog node: ['root']
dialog node: ['Person']


### Query the collection

Design a query based on the information extracted above, and run it against the document collection. The sample query provided below simple looks for all the entities in the raw `text` field. Modify it to suit your needs.

Take a look at the [API Reference](https://www.ibm.com/watson/developercloud/discovery/api/v1/?python#query-collection) to learn more about the query options available, and for more guidance see this [documentation page](https://www.ibm.com/watson/developercloud/doc/discovery/using.html).

_**Note**: You may want to design different queries based on the intent / dialog node that was triggered._

In [66]:
# TODO: Query the Discovery service based on the intent and entities
query_results = discovery.query(environment_id=env_id, collection_id=col_id,
    query_options={
        "query": "text:{}".format(",".join("\"{}\"".format(e) for e in query_entities)),
        "return": "text"
    }).get_result()
print(json.dumps(query_results, indent=2))

{
  "matching_results": 84,
  "session_token": "1_j1sJmD7b2O6C1yn2_cqS6f8WLa",
  "results": [
    {
      "id": "f848842d-7e7b-44c7-9a07-8692fe01ff17",
      "result_metadata": {
        "score": 1
      },
      "extracted_metadata": {
        "sha1": "8c37b2d965338d249621229ed6dd5d8014684a6f",
        "filename": "n",
        "file_type": "json"
      },
      "text": "Three years after the the start of the Clone Wars between the Galactic Republic and the Confederacy of Independent Systems, war has gripped the galaxy. During a space battle over the planet Coruscant, Jedi Knights Obi-Wan Kenobi and Anakin Skywalker lead a mission to rescue the kidnapped Supreme Chancellor Palpatine from Separatist commander General Grievous. After infiltrating Grievous's flagship, the Jedi battle Count Dooku. Anakin subdues Dooku, and on Palpatine's urging, executes him. Grievous flees the battle-torn cruiser, which the Jedi crash-land on Coruscant. There, Anakin reunites with his wife, Padm\u00c3\u00

In [81]:
# TODO: Query the Discovery service based on the intent and entities
relations_query_results = discovery.query(environment_id=env_id, collection_id=col_id,
    query_options={
        "natural_language_query": question,
# return relations
        "return": "enriched_text.relations"

    }).get_result()
print(json.dumps(relations_query_results, indent=2))

{
  "matching_results": 84,
  "session_token": "1_j1sJmD7b2O6Gc3p2_cqS6f8WLa",
  "results": [
    {
      "id": "c4985279-fc35-4f92-b990-3f4676425845",
      "result_metadata": {
        "score": 1
      },
      "extracted_metadata": {
        "sha1": "f8d009a557f78cb2fd4116d06856bbf8caff23be",
        "filename": "n",
        "file_type": "json"
      },
      "text": "On Geonosis, Obi-Wan discovers a Separatist gathering led by Count Dooku, whom Obi-Wan learns had authorized Padm\u00c3\u00a9's assassination and is developing a battle droid army with Trade Federation Viceroy Nute Gunray. Obi-Wan transmits his findings to Anakin to relay to the Jedi Council, but is captured mid-transmission. With knowledge of the droid army, Supreme Chancellor Palpatine is voted emergency powers to send the clones into battle. Anakin and Padm\u00c3\u00a9 journey to Geonosis to rescue Obi-Wan, but are also captured. The three are sentenced to death, but are eventually saved by a battalion of Jedi and c

### Process returned results

If you properly structure the query, Watson is able to do a pretty good job of finding the relevant information. But the result returned is a JSON object. Now your task is to convert that result into an appropriate response that best addresses the original natural language question that was asked.

E.g. if the question was "Who saved Han Solo from Jabba the Hutt?" the answer should ideally just be "The Rebels" and not the entire paragraph describing Han Solo's rescue. But that can be a backup response if you cannot be more specific.

_**Note**: You may have to go back to the previous step and modify the query, especially what you want the Discovery service to return, and this may depend on the intent / dialog node triggered. E.g. study the different parts of a "relation" structure to see how you might construct queries to match them._

In [89]:
question

"Who is Luke's father?"

In [82]:
x =relations_query_results['results']
x[0]['enriched_text']['categories']

[{'label': '/law, govt and politics/politics', 'score': 0.599046},
 {'label': '/law, govt and politics/government/heads of state',
  'score': 0.568054},
 {'label': '/art and entertainment/comics and animation/comics',
  'score': 0.560328}]

In [88]:
x =relations_query_results['results']
x[0]['enriched_text']['concepts']

[{'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_II:_Attack_of_the_Clones',
  'relevance': 0.922782,
  'text': 'Star Wars Episode II: Attack of the Clones'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Palpatine',
  'relevance': 0.800311,
  'text': 'Palpatine'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Obi-Wan_Kenobi',
  'relevance': 0.751928,
  'text': 'Obi-Wan Kenobi'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_III:_Revenge_of_the_Sith',
  'relevance': 0.7517,
  'text': 'Star Wars Episode III: Revenge of the Sith'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Clone_Wars_(Star_Wars)',
  'relevance': 0.662835,
  'text': 'Clone Wars'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Anakin_Skywalker',
  'relevance': 0.619955,
  'text': 'Anakin Skywalker'},
 {'dbpedia_resource': 'http://dbpedia.org/resource/Star_Wars_Episode_I:_The_Phantom_Menace',
  'relevance': 0.587378,
  'text': 'Star Wars Episode I: The Phanto

## 5. Reflections

**Q**: Now that you have gone through this exercise of designing a system that uses two IBM Watson services, what did you learn? What were some of the strengths and weaknesses of this approach?

**A**: Weaknesses: Extensive manual handling for all single details in the system which require time to understand the complete functionalities
Strength: it can be great start for even non programmer where every thing can be handle both using GUI and code with alot of built in functionalities 


## (Optional) Extensions

We have provided a set of sample data files containing Star Wars plot summaries. But as mentioned before, you are free to use your own dataset. In fact, a larger dataset maybe more suitable for use with IBM Watson's NLP services. If you used your own dataset, answer the following questions.

**Q**: What dataset did you use, and in what ways is it different from the sample files provided?

**A**: I used the same data but I found Arabic WhyQA dataset which I will utilize soon


**Q**: Either include your dataset in the .zip file or repository you submit, or provide clear instructions on how to obtain the dataset, so that your reviewer can run your notebook or inspect the data to verify your results.

**A**: http://xminers.club/2017/07/22/arabic-qa-dataset/


_You can also design a web-based application that utilizes these services and deploy that on Bluemix! If you do, please share with your instructors and peers._