# Introduction

First, we need to make sure the Watson Python SDK is installed and ready to use. We'll then import the SDK along with the pandas library.

In [1]:
!pip install ibm-watson==5.1.0



In [1]:
from ibm_watson import DiscoveryV1
import pandas as pd

# Initialize Watson Discovery

Now we'll initialize Watson Discovery using our service credentials. To obtain them, create a Watson Discovery service instance with your IBM Cloud account and generate new credentials (an API key and a service URL).

In [2]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

disco_authenticator = IAMAuthenticator('{discovery_apikey}')
discovery = DiscoveryV1(
    version='{discovery_version}',
    authenticator=disco_authenticator
)

discovery.set_service_url('{discovery_url}')
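Hardcoding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of reading the values from environment variables instead; the helper `load_discovery_config` and the variable names `DISCOVERY_APIKEY`, `DISCOVERY_VERSION` and `DISCOVERY_URL` are hypothetical, so rename them to match your own setup.

```python
import os


def load_discovery_config(env=None):
    """Read Discovery settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    names = {
        "apikey": "DISCOVERY_APIKEY",
        "version": "DISCOVERY_VERSION",
        "url": "DISCOVERY_URL",
    }
    # Fail early with a clear message instead of sending empty credentials
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Usage (requires the ibm-watson SDK imported above):
# config = load_discovery_config()
# discovery = DiscoveryV1(version=config["version"],
#                         authenticator=IAMAuthenticator(config["apikey"]))
# discovery.set_service_url(config["url"])
```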

# Creating the query

There are a few elements to querying Watson Discovery News; let's break down each one.

`environment_id`: `system` denotes the built-in system environment, which hosts the Watson Discovery News collection

`collection_id`: the collection to query; here `news-en`, the English-language news collection

`natural_language_query`: a free-text search string

`offset`: The number of results (documents) to skip

`count`: The number of results (documents) to return

***Note: `count` and `offset` together implement pagination of the results; `offset + count` cannot exceed 10,000.***

`aggregation`: An analytical query over the result set (for example, counting the most frequent values of a field)

`filter`: A cacheable query that excludes documents that don't match it, best suited to metadata-type searches

`return_`: A comma-separated list of the fields to return for each document

**For more information, check out the [query reference](https://cloud.ibm.com/docs/services/discovery/query-reference.html#query-reference)**
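To see how these parameters fit together, here is a hedged sketch that assembles the keyword arguments for `discovery.query()`. The helper `build_news_query` is hypothetical, and the filter and aggregation strings in the usage comment assume the news collection exposes `enriched_text.sentiment.document.label`; check the query reference before relying on them.

```python
def build_news_query(text, offset=0, count=50, filter_=None, aggregation=None,
                     fields=("url", "author", "title", "text")):
    """Assemble keyword arguments for discovery.query() against Discovery News."""
    # Enforce the pagination limit before sending anything to the service
    if offset + count > 10000:
        raise ValueError("offset + count cannot exceed 10,000")
    kwargs = {
        "environment_id": "system",
        "collection_id": "news-en",
        "natural_language_query": text,
        "offset": offset,
        "count": count,
        "return_": ",".join(fields),
    }
    # Only send the optional parameters when they are actually set
    if filter_ is not None:
        kwargs["filter"] = filter_
    if aggregation is not None:
        kwargs["aggregation"] = aggregation
    return kwargs


# Example usage (field paths are assumptions):
# kwargs = build_news_query('IBM NLP',
#                           filter_='enriched_text.sentiment.document.label:"positive"',
#                           aggregation='term(enriched_text.sentiment.document.label)')
# discovery_results = discovery.query(**kwargs).get_result()
```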


We use a DEFAULT_COUNT of 50, the maximum per query, and keep advancing the offset until we've captured all available documents or reached the 10,000-result limit.
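The offset arithmetic is easy to get wrong, so here is a minimal sketch of it in isolation. `page_offsets` is a hypothetical helper; `total_available` could come from the `matching_results` field of a first query response.

```python
def page_offsets(total_available, page_size=50, max_results=10000):
    """Return the offsets needed to page through results with a fixed page size.

    Each request covers [offset, offset + page_size), and no request may
    extend past max_results (the offset + count limit).
    """
    limit = min(total_available, max_results)
    offsets = []
    offset = 0
    while offset < limit and offset + page_size <= max_results:
        offsets.append(offset)
        offset += page_size  # advance by exactly one page each time
    return offsets
```

For example, 120 matching documents need three calls at offsets 0, 50 and 100, while 28 documents fit in a single call.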

In [4]:
all_results = []

TOTAL_NUM_RESULTS = 10000  # offset + count may not exceed 10,000
DEFAULT_COUNT = 50         # number of documents requested per call
offset = 0

STRING_TO_SEARCH = 'IBM NLP'

while offset + DEFAULT_COUNT <= TOTAL_NUM_RESULTS:
    try:
        discovery_results = discovery.query(environment_id='system',
                                            collection_id='news-en',
                                            natural_language_query=STRING_TO_SEARCH,
                                            offset=offset,
                                            count=DEFAULT_COUNT,
                                            return_="url,author,title,text"
                                            ).get_result()

        page = discovery_results['results']

        # If the results are empty, stop querying
        if not page:
            break

        # Add results to all_results and move the offset to the next page
        all_results.extend(page)
        offset += DEFAULT_COUNT

        # A partial page means there are no more documents to fetch
        if len(page) < DEFAULT_COUNT:
            break
    except Exception as e:
        print("ERROR DETECTED")
        print(e)
        break

In [5]:
print(len(all_results))
all_results

28


[{'id': 'zXU_IvGcbN6mk3trg4EIgFGn-K-G4qqsdNRasL15JhHTiapcYOXPQMLJvYFIddYx',
  'result_metadata': {'score': 30.317368},
  'author': 'Oscar Nyonyo',
  'text': 'This year, they introduced Masters Fantasy with player insights powered by IBM Watson, enabling participants to make their picks by using #NLP to deliver #AI-generated insight into every tournament player. #IBM #HybridCloud #HoleInOne #TheMasters2021',
  'title': 'https://www.ibm.com/sports/masters/index.html IBM and the Masters are playing the long game. With over 25 years of partnership and innovation, they have delivered new digital experiences to create a… - Oscar Nyonyo -',
  'url': 'https://medium.com/@oscarnyonyo/ibm-and-the-masters-are-playing-the-long-game-97c53ac3ca17'},
 {'id': 'pDMawjSuR3ggmYk9OQwwZ3BoGXpRg8qwqg8rFuFuTU7htb27QyLB_Lb_AJFwOLTE',
  'result_metadata': {'score': 29.957527},
  'text': 'According to IBM’s Global AI Adoption Index, nearly one in three IT professionals say their business is now using AI, with 4

# Add Enrichments

Now that we've queried Watson Discovery, we want to analyze each article in detail.

To do so, we'll use the `url` of each result to run a detailed analysis with **Natural Language Understanding**.

Note that this is different from the `enriched_text.sentiment` field that a Watson Discovery query can return: that enrichment "only" covers what is contained in the `text` field, whereas NLU fetches and analyzes the page at the given URL.
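If you want to compare the two, one option is to also request the Discovery enrichment and read it out of each result. A minimal sketch of a helper for dotted field paths; `get_field` is hypothetical, and the path `enriched_text.sentiment.document.score` assumes the news collection's enrichment layout.

```python
def get_field(doc, dotted_path, default=None):
    """Follow a dotted path such as 'enriched_text.sentiment.document.score'
    through nested dicts, returning default if any step is missing."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

Results that were not enriched (or whose field was not requested in `return_`) simply yield `None`, so the comparison can skip them.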

# Initialize Watson Natural Language Understanding 

Now we'll initialize Watson Natural Language Understanding using our service credentials. To obtain them, create a Watson Natural Language Understanding service instance with your IBM Cloud account and generate new credentials.

In [12]:
from ibm_watson import NaturalLanguageUnderstandingV1
#from ibm_cloud_sdk_core.authenticators import IAMAuthenticator # previously imported

nlu_authenticator = IAMAuthenticator('{nlu_apikey}')
natural_language_understanding = NaturalLanguageUnderstandingV1(
    version='{nlu_version}',
    authenticator=nlu_authenticator
)

natural_language_understanding.set_service_url('{nlu_url}')

Select the features you want to extract with NLU.
For example, let's enable **sentiment** analysis and **keywords** extraction, and pass the URL of each result to the service through the `url` parameter.
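The response handling can be kept separate from the API call. The sketch below (the helper `summarize_nlu` is hypothetical) reads the document sentiment score and the keyword texts from an `analyze()` result shaped like the example output in this notebook, returning `None` and an empty list when a feature is missing.

```python
def summarize_nlu(nlu_results):
    """Return (document sentiment score, list of keyword texts) from an NLU result."""
    # Missing features fall back to empty dicts/lists instead of raising KeyError
    sentiment = nlu_results.get("sentiment", {}).get("document", {})
    keywords = [kw["text"] for kw in nlu_results.get("keywords", []) if "text" in kw]
    return sentiment.get("score"), keywords
```

Returning a list instead of a comma-joined string leaves the choice of separator to the caller, e.g. `','.join(keywords)`.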

### Example test
Perform a simple API call to the Natural Language Understanding service.

In [17]:
import json
from ibm_watson.natural_language_understanding_v1 import Features, SentimentOptions, KeywordsOptions

nlu_results = natural_language_understanding.analyze(
    url="www.ibm.com",
    features=Features(
        sentiment=SentimentOptions(document=True),
        keywords=KeywordsOptions(emotion=True, limit=8))
).get_result()

print(json.dumps(nlu_results, indent=2))

{
  "usage": {
    "text_units": 2,
    "text_characters": 14200,
    "features": 2
  },
  "sentiment": {
    "document": {
      "score": 0.856627,
      "label": "positive"
    }
  },
  "retrieved_url": "https://www.ibm.com/de-de",
  "language": "de",
  "keywords": [
    {
      "text": "IBM Garage-Methodik",
      "relevance": 0.699605,
      "count": 1
    },
    {
      "text": "IBM Events",
      "relevance": 0.615311,
      "count": 2
    },
    {
      "text": "Monate lang",
      "relevance": 0.609183,
      "count": 1
    },
    {
      "text": "IBM",
      "relevance": 0.604353,
      "count": 2
    },
    {
      "text": "manuelle Prozesse",
      "relevance": 0.604219,
      "count": 1
    },
    {
      "text": "bessere Customer Journey",
      "relevance": 0.588956,
      "count": 1
    },
    {
      "text": "digitalen Erlebnisplattform",
      "relevance": 0.582406,
      "count": 1
    },
    {
      "text": "digitale Lerninhalte von IBM",
      "relevance": 0.559231,

In [23]:
import time

for result in all_results:
    print("analyzing url", result['url'])
    try:
        nlu_results = natural_language_understanding.analyze(
            url=result['url'],
            features=Features(
                sentiment=SentimentOptions(document=True),
                keywords=KeywordsOptions(limit=8))
        ).get_result()

        sentiment_score = None
        keywords_from_url = ''
        if 'sentiment' in nlu_results:
            sentiment_score = nlu_results['sentiment']['document']['score']
        if nlu_results.get('keywords'):
            # Store keywords as a comma-separated string, e.g. "IBM,NLP,Watson"
            keywords_from_url = ','.join(key['text'] for key in nlu_results['keywords'])
        result['sentiment'] = sentiment_score
        result['keywords'] = keywords_from_url
    except Exception as e:
        # e.g. NLU could not fetch the page; report it and move on to the next URL
        print("error finding enrichments for", result['url'])
        print(e)

    # Pause between calls to avoid hammering the service (delay value is an assumption)
    print("sleeping...")
    time.sleep(1)

analyzing url https://medium.com/@oscarnyonyo/ibm-and-the-masters-are-playing-the-long-game-97c53ac3ca17
sleeping...
analyzing url https://www.kmworld.com/Articles/ReadArticle.aspx?ArticleID=148261
sleeping...
analyzing url https://canadianinquirer.net/2021/09/google-and-microsoft-are-creating-a-monopoly-on-coding-in-plain-language/
sleeping...
analyzing url https://manometcurrent.com/global-nlp-in-healthcare-and-life-sciences-market-2021-future-set-to-massive-growth-top-key-player-3m-cerner-corporation-ibm-corporation-microsoft-corporation/
sleeping...
analyzing url https://www.vitalnews24.com/news/2021/08/23/%ef%bb%bfglobal-healthcare-natural-language-processing-nlp-market-2020-nlp-technologies-nec-apple-microsoft-dolbey-ibm/


ERROR:root:Could not fetch URL: Timeout exceeded when loading resource
Traceback (most recent call last):
  File "/Users/federico/.pyenv/versions/3.8.1/envs/base/lib/python3.8/site-packages/ibm_cloud_sdk_core/base_service.py", line 267, in send
    raise ApiException(
ibm_cloud_sdk_core.api_exception.ApiException: Error: Could not fetch URL: Timeout exceeded when loading resource, Code: 400 , X-global-transaction-id: db918e4c-6adb-434a-9e89-b78e5a2cffb8


error finding enrichments for https://www.vitalnews24.com/news/2021/08/23/%ef%bb%bfglobal-healthcare-natural-language-processing-nlp-market-2020-nlp-technologies-nec-apple-microsoft-dolbey-ibm/
Error: Could not fetch URL: Timeout exceeded when loading resource, Code: 400 , X-global-transaction-id: db918e4c-6adb-434a-9e89-b78e5a2cffb8
sleeping...
analyzing url https://technuter.com/breaking-news/ibm-watson-launches-new-ai-and-automation-features.html
sleeping...
analyzing url https://cio.eletsonline.com/news/ibm-watson-launches-new-features-for-better-customer-services/68281/
sleeping...
analyzing url https://mathematicexperts.com/new-trends-of-healthcare-natural-language-processing-nlp-market-increasing-demand-with-key-players-nec-corporation-apple-inc-microsoft-corporation-dolbey-systems-ibm-corporation/
sleeping...
analyzing url https://vmvirtualmachine.com/natural-language-processing-nlp-in-the-healthcare-market-key-trends-drivers/
sleeping...
analyzing url http://www.generaldaily.c
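Some pages, like the one above, time out when NLU tries to fetch them. A hedged sketch of retrying with exponential backoff before giving up; `analyze_with_retry` is hypothetical, the delays are arbitrary, and a retry will not help when a site consistently refuses the fetch.

```python
import time


def analyze_with_retry(analyze, url, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call analyze(url), retrying with exponential backoff if it raises.

    `analyze` is any callable that takes a URL; in this notebook it would wrap
    natural_language_understanding.analyze(...).get_result().
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return analyze(url)
        except Exception as e:  # the SDK raises ApiException for errors like the timeout
            last_error = e
            if attempt < attempts - 1:
                # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Usage in the enrichment loop (assumes the NLU client and Features set up above):
# nlu_results = analyze_with_retry(
#     lambda u: natural_language_understanding.analyze(
#         url=u,
#         features=Features(sentiment=SentimentOptions(document=True),
#                           keywords=KeywordsOptions(limit=8))).get_result(),
#     result['url'])
```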

In [21]:
print(json.dumps(all_results, indent=2))

[
  {
    "id": "zXU_IvGcbN6mk3trg4EIgFGn-K-G4qqsdNRasL15JhHTiapcYOXPQMLJvYFIddYx",
    "result_metadata": {
      "score": 30.317368
    },
    "author": "Oscar Nyonyo",
    "text": "This year, they introduced Masters Fantasy with player insights powered by IBM Watson, enabling participants to make their picks by using #NLP to deliver #AI-generated insight into every tournament player. #IBM #HybridCloud #HoleInOne #TheMasters2021",
    "title": "https://www.ibm.com/sports/masters/index.html IBM and the Masters are playing the long game. With over 25 years of partnership and innovation, they have delivered new digital experiences to create a\u2026 - Oscar Nyonyo -",
    "url": "https://medium.com/@oscarnyonyo/ibm-and-the-masters-are-playing-the-long-game-97c53ac3ca17"
  },
  {
    "id": "pDMawjSuR3ggmYk9OQwwZ3BoGXpRg8qwqg8rFuFuTU7htb27QyLB_Lb_AJFwOLTE",
    "result_metadata": {
      "score": 29.957527
    },
    "text": "According to IBM\u2019s Global AI Adoption Index, nearly one in 