This notebook demonstrates how to run an input query through the Watson Natural Language Understanding (NLU) service and then use the output to build a filtered query for Watson Discovery (WDS).
This technique can be used to improve the results of a Discovery search.

NOTE: This notebook does not contain any official IBM resources and carries no guarantee of functionality. It is for example purposes only.

First, import the watson_developer_cloud Python SDK:
https://github.com/watson-developer-cloud/python-sdk

In [13]:
import json
import csv
from watson_developer_cloud import DiscoveryV1
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions

Set up the Discovery and NLU objects from the SDK.

In [38]:
discovery = DiscoveryV1(
    version='2018-08-01',
    ## url is optional, and defaults to the URL below. Use the correct URL for your region.
    url='https://gateway.watsonplatform.net/discovery/api',
    username='USERNAME',
    password='PASSWORD')

# Replace these placeholders with your own environment and collection IDs.
environment_id = 'FILL IN'
collection_id = 'FILL IN'

nlu = NaturalLanguageUnderstandingV1(
    version='2018-03-16',
    ## url is optional, and defaults to the URL below. Use the correct URL for your region.
    # url='https://gateway.watsonplatform.net/natural-language-understanding/api',
    username='USERNAME',
    password='PASSWORD')

There are many ways to use the output of NLU to build a query. This example demonstrates two approaches:
1. Concatenate all the keyword and entity terms with AND (`,` in the WDS query language) to form a single query.
   This approach can lose important terms if NLU doesn't recognize them, but it has the advantage of potentially reducing noise in the query.
2. Filter on the keywords and entities and pass the original query as a natural language query.
   This approach helps narrow down the search but can be too restrictive if the terms extracted by NLU don't appear in the documents.

Another approach, not shown here, is:
3. Expand the query with entity types.
   This approach takes the entities from the query and adds the identified type of each entity to the original query. It is most useful when you have defined custom types using Watson Knowledge Studio that bring additional contextual meaning to the entities.
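
The third approach can be sketched without calling either service. The snippet below is a minimal illustration, not part of the original notebook: the function name `expand_query_with_types` and the sample response are hypothetical, and it assumes only that each NLU entity carries a `type` field alongside its `text`.

```python
def expand_query_with_types(original_query, nlu_response):
    """Append the type of each entity NLU found to the original query text."""
    entity_types = []
    for entity in nlu_response.get("entities", []):
        entity_type = entity.get("type")
        # Add each type once, preserving the order in which NLU returned it.
        if entity_type and entity_type not in entity_types:
            entity_types.append(entity_type)
    return " ".join([original_query.strip()] + entity_types)


# Hand-written sample in the shape of an NLU entities response (not real output).
sample_response = {
    "entities": [
        {"type": "Location", "text": "Texas", "relevance": 0.9},
        {"type": "Company", "text": "IBM", "relevance": 0.6},
        {"type": "Location", "text": "Austin", "relevance": 0.5},
    ]
}

expanded = expand_query_with_types("Does IBM have offices in Austin, Texas?\n", sample_response)
print(expanded)
```

The expanded string could then be passed as `natural_language_query` to `discovery.query`. Whether the added type names help depends on whether those words, or fields enriched with them, actually occur in the collection.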

In [39]:
def simpleQuery(originalQuery, nluResponse):
    # Start with empty lists so the function still works when NLU
    # omits the "keywords" or "entities" section of the response.
    keywords = []
    entities = []

    if "keywords" in nluResponse:
        for keyword in nluResponse["keywords"]:
            # Quote each term so multi-word phrases are matched as a unit.
            keywords.append("\"" + keyword["text"] + "\"")

    if "entities" in nluResponse:
        for entity in nluResponse["entities"]:
            entities.append("\"" + entity["text"] + "\"")

    # "," is the AND operator in the Discovery query language.
    queryTerms = keywords + entities
    query = ",".join(queryTerms)

    print(query)

    wdsResponse = discovery.query(environment_id=environment_id,
                                  collection_id=collection_id,
                                  query=query).get_result()

    return wdsResponse

In [43]:
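def filterQuery(originalQuery, nluResponse):
    # Start with empty lists so the function still works when NLU
    # omits the "keywords" or "entities" section of the response.
    keywords = []
    entities = []

    if "keywords" in nluResponse:
        for keyword in nluResponse["keywords"]:
            # Keep only terms that NLU scored as relevant to the query.
            if keyword["relevance"] > 0.5:
                keywords.append("\"" + keyword["text"] + "\"")

    if "entities" in nluResponse:
        for entity in nluResponse["entities"]:
            if entity["relevance"] > 0.5:
                entities.append("\"" + entity["text"] + "\"")

    filterTerms = keywords + entities
    # Named wdsFilter to avoid shadowing the built-in filter().
    wdsFilter = ",".join(filterTerms)

    print(wdsFilter)

    # The original query is sent as natural language; the filter narrows the candidate documents.
    wdsResponse = discovery.query(environment_id=environment_id,
                                  collection_id=collection_id,
                                  natural_language_query=originalQuery,
                                  filter=wdsFilter).get_result()

    return wdsResponse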
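
To see what the 0.5 relevance cutoff in `filterQuery` does without calling either service, the filter-building step can be isolated. The helper below is a sketch and not part of the original notebook: `build_filter` and its `min_relevance` parameter are hypothetical names, and the sample data is copied from the NLU output shown at the end of this notebook.

```python
def build_filter(nlu_response, min_relevance=0.5):
    """Return the quoted, comma-joined terms whose relevance exceeds min_relevance."""
    terms = []
    for section in ("keywords", "entities"):
        for item in nlu_response.get(section, []):
            if item.get("relevance", 0) > min_relevance:
                terms.append('"' + item["text"] + '"')
    return ",".join(terms)


# Keywords and entities taken from the sample NLU output in this notebook.
sample_response = {
    "keywords": [
        {"text": "water damage", "relevance": 0.999999, "count": 1},
        {"text": "homeowners insurance", "relevance": 0.009985, "count": 1},
    ],
    "entities": [],
}

strict_filter = build_filter(sample_response)                     # default cutoff of 0.5
loose_filter = build_filter(sample_response, min_relevance=0.0)   # keep every term
print(strict_filter)
print(loose_filter)
```

With the default cutoff, only "water damage" survives for this query, so the filter is less restrictive than the full AND query built by `simpleQuery`. Lowering the cutoff trades recall for precision, and the right value likely depends on the collection.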

Read a list of input queries from a file and run each query through NLU, extracting keywords and entities. Then run one of the WDS query functions above using that output.

In [46]:
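# questionFile avoids shadowing the built-in list().
with open("question_list.txt") as questionFile:
    for line in questionFile:
        query = line.strip()  # drop the trailing newline
        if not query:
            continue  # skip blank lines
        response = nlu.analyze(
            text=query,
            features=Features(entities=EntitiesOptions(),
                              keywords=KeywordsOptions())).get_result()

        print("**NLU OUTPUT:\n" + json.dumps(response, indent=2))

        # Swap in filterQuery(query, response) to try the second approach.
        result = simpleQuery(query, response)
        print("**WDS OUTPUT: \n" + json.dumps(result, indent=2))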

NLU OUTPUT:
{
  "usage": {
    "text_units": 1,
    "text_characters": 47,
    "features": 2
  },
  "language": "en",
  "keywords": [
    {
      "text": "water damage",
      "relevance": 0.999999,
      "count": 1
    },
    {
      "text": "homeowners insurance",
      "relevance": 0.009985,
      "count": 1
    }
  ],
  "entities": []
}
"water damage","homeowners insurance"
WDS OUTPUT: 
{
  "matching_results": 3,
  "session_token": "1_IQA7DTdOES97NEu5JFb8PMj7I6",
  "results": [
    {
      "id": "858",
      "result_metadata": {
        "score": 12.815016
      },
      "extracted_metadata": {
        "sha1": "7d2429b5be08672e04187783a4961df9999da7d6",
        "filename": "insurancelib_857.json",
        "file_type": "json"
      },
      "text": " Homeowners insurance will cover water damage from a roof leak if the roof sustained damage from a recent storm and the damage to the roof resulted in damage inside the home. If the roof has issues that have gone unrepaired, water damage 