# Introduction

First, we must ensure that the Watson Python SDK is installed and ready to use. We'll then import the SDK as well as the pandas library.

In [None]:
!pip install ibm-watson==5.1.0

In [None]:
from ibm_watson import DiscoveryV1
import pandas as pd

# Initialize Watson Discovery

Now we'll initialize Watson Discovery using our login credentials. To obtain these, create a Watson Discovery service with your IBM Cloud account and generate new credentials.

In [None]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

disco_authenticator = IAMAuthenticator('{discovery_apikey}')
discovery = DiscoveryV1(
    version='{discovery_version}',
    authenticator=disco_authenticator
)

discovery.set_service_url('{discovery_url}')
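If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the helper `load_watson_credentials` and the variable names (`DISCOVERY_APIKEY`, `DISCOVERY_VERSION`, `DISCOVERY_URL`) are hypothetical conventions, not something the SDK defines or requires.

```python
import os


def load_watson_credentials(prefix):
    """Read apikey, version and url for one service from environment variables.

    The naming scheme <PREFIX>_APIKEY, <PREFIX>_VERSION, <PREFIX>_URL is an
    assumption made for this notebook, not an SDK convention.
    """
    names = ("APIKEY", "VERSION", "URL")
    missing = [f"{prefix}_{n}" for n in names if not os.environ.get(f"{prefix}_{n}")]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    # Keys are lower-cased so they read naturally: creds["apikey"], creds["url"]
    return {n.lower(): os.environ[f"{prefix}_{n}"] for n in names}
```

The returned values could then stand in for the `{discovery_apikey}`, `{discovery_version}` and `{discovery_url}` placeholders, for example `IAMAuthenticator(creds["apikey"])`.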

# Creating the query

There are a few elements to querying Watson Discovery News. I'll break down each of them.

`environment_id`: `system` just denotes that we're using the system environment

`collection_id`: We want to query the news collection

`natural_language_query`: string for free text search

`offset`: The number of results (documents) to skip

`count`: The number of results (documents) to return 

***Note: `count` and `offset` are how pagination of results is implemented. The total (`offset` + `count`) cannot exceed 10,000.***

`aggregation`: An analytic query over the result set

`filter`: The query for matching documents

`return_`: What items to actually return to us for our use

**For more information, check out the [query reference](https://cloud.ibm.com/docs/services/discovery/query-reference.html#query-reference)**


We are using a `DEFAULT_COUNT` of 50, the maximum you can query at once, and incrementing `offset` by that amount after each query until we've captured all available documents or reached `TOTAL_NUM_RESULTS`, which can be raised up to 10,000 (the maximum).
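To make the `offset`/`count` arithmetic concrete, here is a minimal sketch of the page windows such a loop walks through. The generator `page_windows` is a hypothetical helper written for illustration and does not call the service. Unlike a loop that only issues full pages, it also emits a shorter final page when the requested total is not a multiple of the page size.

```python
def page_windows(total_wanted, page_size=50, hard_limit=10000):
    """Yield (offset, count) pairs covering up to `total_wanted` results.

    `hard_limit` mirrors the rule that offset + count cannot exceed 10,000.
    """
    total = min(total_wanted, hard_limit)
    offset = 0
    while offset < total:
        # The last page may be shorter than page_size
        count = min(page_size, total - offset)
        yield offset, count
        offset += count


for offset, count in page_windows(120):
    print(offset, count)
```

Each pair would be passed as the `offset` and `count` arguments of one query.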

In [None]:
all_results = []

TOTAL_NUM_RESULTS = 50  # total documents to fetch; can be raised up to 10000
DEFAULT_COUNT = 50      # documents requested per query
offset = 0

STRING_TO_SEARCH = 'IBM NLP'

while offset + DEFAULT_COUNT <= TOTAL_NUM_RESULTS:
    try:
        discovery_results = discovery.query(environment_id='system',
                                 collection_id='news-en',
                                 natural_language_query=STRING_TO_SEARCH,
                                 offset=offset,
                                 count=DEFAULT_COUNT,
                                 return_="url,author,title,text"
                                 ).get_result()

        # If the results are empty, stop querying
        if not discovery_results['results']:
            break

        # Add this page of results to all_results
        all_results.extend(discovery_results['results'])

        # Move on to the next page
        offset += DEFAULT_COUNT
    except Exception as e:
        print("ERROR DETECTED")
        print(e)
        break

In [None]:
print(len(all_results))
all_results

# Add Enrichments

Now that we've queried Watson Discovery, we want to analyze each article in detail.

To do so, we are going to use the `url` that we obtained for each result to perform a detailed analysis with **Natural Language Understanding**.

Note that this differs from the `enriched_text.sentiment` field that a Watson Discovery query can return: that field "only" analyzes what is contained within the `text` field, whereas Natural Language Understanding analyzes the content found at the URL.

# Initialize Watson Natural Language Understanding 

Now we'll initialize Watson Natural Language Understanding using our login credentials. To obtain these, create a Watson Natural Language Understanding service with your IBM Cloud account and generate new credentials.

In [None]:
from ibm_watson import NaturalLanguageUnderstandingV1
#from ibm_cloud_sdk_core.authenticators import IAMAuthenticator # previously imported

nlu_authenticator = IAMAuthenticator('{nlu_apikey}')
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='{nlu_version}',
    authenticator=nlu_authenticator
)

natural_language_understanding.set_service_url('{nlu_url}')

Select the features you want to extract with NLU.
For example, let's enable **sentiment** analysis and **keywords** extraction, and pass the URL extracted for each result to the service.

### Example test
Perform a simple API call to the Natural Language Understanding service.

In [None]:
import json
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions, KeywordsOptions

nlu_results = natural_language_understanding.analyze(
    url="www.ibm.com",
    features=Features(
        sentiment=SentimentOptions(document=True),
        keywords=KeywordsOptions(emotion=True, limit=8))
).get_result()

print(json.dumps(nlu_results, indent=2))
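Before looping over every result, it can help to see which fields of the response we actually need. The sketch below uses a hand-written sample dict with made-up values, shaped like the `sentiment` and `keywords` fields this notebook reads. The helper `summarize_nlu` is hypothetical and not part of the SDK.

```python
def summarize_nlu(nlu_results):
    """Return (sentiment_score, comma-separated keywords) from an NLU-style dict."""
    # Missing sections fall back to None / empty string instead of raising
    score = nlu_results.get("sentiment", {}).get("document", {}).get("score")
    keywords = ",".join(k["text"] for k in nlu_results.get("keywords", []))
    return score, keywords


# Illustrative sample only; the values are invented, not real service output
sample = {
    "sentiment": {"document": {"score": 0.42, "label": "positive"}},
    "keywords": [
        {"text": "Watson", "relevance": 0.9},
        {"text": "IBM Cloud", "relevance": 0.7},
    ],
}

print(summarize_nlu(sample))
```

Using `.get` with defaults keeps the extraction from failing when a feature is absent from the response.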

In [None]:
for result in all_results:
    print("analyzing url", result['url'])
    # Defaults, so every result has both keys even if the NLU call fails
    result['sentiment'] = None
    result['keywords'] = ''
    try:
        nlu_results = natural_language_understanding.analyze(
            url=result['url'],
            features=Features(
                sentiment=SentimentOptions(document=True),
                keywords=KeywordsOptions(limit=8))
        ).get_result()

        if 'sentiment' in nlu_results:
            result['sentiment'] = nlu_results['sentiment']['document']['score']
        # Join the keyword texts with commas (no trailing comma)
        result['keywords'] = ','.join(
            key['text'] for key in nlu_results.get('keywords', []))
    except Exception as e:
        print("error finding enrichments for", result['url'])
        print(e)

In [None]:
print(json.dumps(all_results, indent=2))
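Since pandas was imported at the start, a natural next step is to load the enriched results into a DataFrame for inspection. This is a minimal sketch under stated assumptions: `sample_results` contains made-up rows standing in for `all_results`, and the column selection is only one possible choice.

```python
import pandas as pd

# Made-up rows standing in for all_results after enrichment
sample_results = [
    {"url": "https://example.com/a", "title": "A", "sentiment": 0.6, "keywords": "Watson,NLP"},
    {"url": "https://example.com/b", "title": "B", "sentiment": -0.2, "keywords": "IBM"},
    {"url": "https://example.com/c", "title": "C", "sentiment": None, "keywords": ""},
]

df = pd.DataFrame(sample_results, columns=["title", "url", "sentiment", "keywords"])

# Most positive articles first; rows without a score go to the end
df = df.sort_values("sentiment", ascending=False, na_position="last").reset_index(drop=True)
print(df)
```

With real data, replacing `sample_results` with `all_results` should give one row per article.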