# Interacting with the IBM Watson Natural Language Understanding API; POST vs GET

Another useful API, especially when dealing with text, is the [IBM Watson Natural Language Understanding API](https://console.bluemix.net/catalog/services/natural-language-understanding), which offers a variety of text analysis functions, such as sentiment analysis, entity extraction, and keyword extraction.

We give a couple of examples below to show how we can take an unstructured piece of text (either the text itself, or a URL pointing to the text) and extract a "semi-structured" representation of its content.



## /analyze call

We start with the `GET /analyze` API call ([documentation](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/#get-analyze)), which takes as input a piece of text and returns an analysis across various dimensions.

The function below takes a `text` argument and returns the sentiment and emotion analysis of that text.

In [None]:
import requests
import json

def getSentiment(text):
    endpoint = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"

    # You can register and get your own credentials
    # The ones below have a quota of 1000 calls per day 
    # and can run out quickly if multiple people use these
    username = "711b4792-170d-490f-ac0e-5d785a271868"
    password = "No7JUOE1UOYu"
    
    parameters = {
        'features': 'emotion,sentiment',
        'version' : '2017-02-27',
        'text': text,
        'language' : 'en',
        # 'url': url_to_analyze,  # alternative to sending 'text'
    }

    resp = requests.get(endpoint, params=parameters, auth=(username, password))
    
    return resp.json()

In [None]:
# Some text from https://www.nytimes.com/2018/09/09/sports/tennis/serena-williams-us-open-equality.html
# 
# We will analyze the text below using the IBM Watson API

text = '''
That might have been something the chair umpire Carlos Ramos could have said to defuse the tension in the women’s final Saturday, which descended into chaos when he penalized Serena Williams in the second, decisive set.

She imploded after Ramos issued her a warning about receiving illegal coaching and then penalized her twice later in the second set, once when she threw down her racket and then again after she called him a liar and a thief.

Naomi Osaka, a 20-year-old from Japan, showed amazing poise amid the disarray and overpowered her childhood hero during her win, which was her first Grand Slam title.

But there was hardly an ounce of joy in the victory. The match tarnished tennis and was a stinging blow to sportsmanship.
'''

In [None]:
data = getSentiment(text)

Now, let's try to understand the structure of the answer. First, we check the high-level keys.

In [None]:
data.keys()

Now, let's check the content of these keys:

In [None]:
data['language']

In [None]:
data['sentiment']

In [None]:
data['emotion']

In [None]:
# Let's go deeper into the 'emotion' dictionary
data['emotion']['document']

In [None]:
# And a bit more
data['emotion']['document']['emotion']
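
Once we know where the emotion scores live, we can summarize them. The sketch below is only an illustration: it assumes that `data['emotion']['document']['emotion']` is a dictionary mapping emotion names to numeric scores, as explored in the cells above. The helper name `dominant_emotion` and the sample numbers are hypothetical and do not come from a real API call.

```python
def dominant_emotion(data):
    """Return (emotion_name, score) for the highest-scoring document emotion."""
    # Assumed shape: {'emotion': {'document': {'emotion': {name: score, ...}}}}
    scores = data['emotion']['document']['emotion']
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical response fragment; the scores are made up for illustration
sample_response = {
    'emotion': {'document': {'emotion': {
        'anger': 0.41, 'disgust': 0.22, 'fear': 0.10, 'joy': 0.15, 'sadness': 0.55
    }}}
}

print(dominant_emotion(sample_response))
```

With a real response, you would call `dominant_emotion(data)` on the result of `getSentiment(text)`.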

### Exercise

Type your own piece of text, and analyze it to extract sentiment and emotions. Discuss your findings.

## Entities call

[Full Documentation of the call](https://www.ibm.com/watson/developercloud/natural-language-understanding/api/v1/#entities)

This is an API call that extracts entities from the text, together with the sentiment and emotion associated with each of these entities.

There are two new technical aspects with this API call. First, we use `requests.post` instead of `requests.get`. With `GET`, all the parameters are encoded in the URL itself, and servers and clients in practice limit how long a URL can be, so `GET` is suitable only for small amounts of data. When we have a larger or more complex payload to send, we use `POST`, which carries the data in the body of the request. Second, the parameters that we pass are no longer "flat" as they used to be. Instead, we submit `watson_options`, a nested (semi-structured) dictionary that we serialize as JSON.
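
To make the difference concrete, the following sketch builds (but does not send) the two kinds of payload using only the standard library. It is an illustration under assumptions: the sample text and the `https://example.com/article` URL are placeholders, and the actual requests in this notebook are sent with the `requests` library rather than assembled by hand.

```python
import json
from urllib.parse import urlencode

endpoint = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"

# GET: every parameter is a flat key/value pair encoded into the URL's query string
flat_params = {
    'version': '2017-02-27',
    'features': 'emotion,sentiment',
    'text': 'I love tennis',  # placeholder text
}
get_url = endpoint + "?" + urlencode(flat_params)

# POST: a nested dictionary is serialized as JSON and travels in the request body
nested_options = {
    "url": "https://example.com/article",  # placeholder URL
    "features": {
        "entities": {"sentiment": True, "emotion": True, "limit": 10}
    }
}
post_body = json.dumps(nested_options)

print(get_url)
print(post_body)
```

Notice that the whole text ends up inside `get_url`, which becomes impractical for long documents, while `post_body` can hold arbitrarily nested options.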

In terms of natural language processing, one capability of the API is worth highlighting: "normalizing" each entity, so that two different ways of saying the same thing get mapped to the same entity. For example, "President Trump" and "Donald Trump" get mapped to the same Knowledge Graph entity.

In [None]:
import requests
import json

def processURL(url_to_analyze):
    endpoint_watson = "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"
    params = {
        'version': '2017-02-27',
    }
    headers = { 
        'Content-Type': 'application/json',
    }
    watson_options = {
      "url": url_to_analyze,
      "features": {
        "entities": {
          "sentiment": True,
          "emotion": True,
          "limit": 10
        }
      }
    }
    username = "711b4792-170d-490f-ac0e-5d785a271868"
    password = "No7JUOE1UOYu"

    resp = requests.post(endpoint_watson, 
                         data=json.dumps(watson_options), 
                         headers=headers, 
                         params=params, 
                         auth=(username, password) 
                        )
    return resp.json()


url_to_analyze = 'https://www.nytimes.com/2018/09/09/sports/tennis/serena-williams-us-open-equality.html'

data = processURL(url_to_analyze)

In [None]:
# Let's see what we get back as top-level attributes
data.keys()

In [None]:
# Let's see the entities list
data["entities"]

In [None]:
# Let's see the first entity. Notice the "disambiguated" attribute that
# points to "canonical" versions of the entity, in DBpedia, Freebase, OpenCyc, YAGO, etc.
data["entities"][0]
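
The normalization described earlier can be used to group different mentions of the same entity. The sketch below is a hedged illustration: it assumes that the disambiguation information is a dictionary with a `name` field stored under a key such as `disambiguation` or `disambiguated`, so check the output of `data["entities"][0]` for the exact field names in your response. The helper names and the sample entities are hypothetical.

```python
from collections import defaultdict

def canonical_name(entity, keys=("disambiguation", "disambiguated")):
    """Return the canonical name of an entity, falling back to its surface text."""
    # Assumption: the canonical name, when available, is stored as
    # entity[key]["name"] for one of the candidate keys.
    for key in keys:
        info = entity.get(key)
        if isinstance(info, dict) and info.get("name"):
            return info["name"]
    return entity["text"]

def group_by_canonical(entities):
    """Map each canonical name to the list of mentions that refer to it."""
    groups = defaultdict(list)
    for entity in entities:
        groups[canonical_name(entity)].append(entity["text"])
    return dict(groups)

# Hypothetical entities, not taken from a real API response
sample_entities = [
    {"text": "President Trump", "disambiguation": {"name": "Donald Trump"}},
    {"text": "Donald Trump", "disambiguation": {"name": "Donald Trump"}},
    {"text": "Flushing Meadows"},
]

print(group_by_canonical(sample_entities))
```

Entities without disambiguation information simply keep their own text as the group name.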

In [None]:
# This function takes as input the result
# from the IBM Watson API and returns a list
# of entities that are relevant (above threshold)
# to the article
def getEntities(data, threshold):
    result = []
    for entity in data["entities"]:
        relevance = float(entity['relevance'])
        if relevance > threshold:
            result.append(entity['text'])
    return result

getEntities(data, 0.25)
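
As a possible starting point for the exercise below, here is a variant that also keeps the relevance and the sentiment of each entity. It is a sketch under assumptions: it supposes that, when sentiment is requested, each entity carries a `sentiment` dictionary with a numeric `score` field. The function name `summarize_entities` and the sample data are hypothetical.

```python
def summarize_entities(data, threshold):
    """Return (text, relevance, sentiment_score) tuples, most relevant first."""
    rows = []
    for entity in data["entities"]:
        relevance = float(entity["relevance"])
        if relevance > threshold:
            # Assumed field layout: entity["sentiment"]["score"]; None if absent
            sentiment = entity.get("sentiment", {}).get("score")
            rows.append((entity["text"], relevance, sentiment))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical response fragment; the numbers are made up for illustration
sample_entities_response = {"entities": [
    {"text": "Naomi Osaka", "relevance": 0.60, "sentiment": {"score": 0.3}},
    {"text": "Serena Williams", "relevance": 0.95, "sentiment": {"score": -0.4}},
    {"text": "Japan", "relevance": 0.10},
]}

for text, relevance, sentiment in summarize_entities(sample_entities_response, 0.25):
    print(text, relevance, sentiment)
```

With a real response, you would call `summarize_entities(data, 0.25)` on the result of `processURL(url_to_analyze)`.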

### Exercise

* First of all, **get your own credentials for the IBM Watson API**. The demo key that we use above has a limited quota.
* Use an API to get news articles. 
    * Option 1: Use the API at https://newsapi.org to fetch the news from various sources. Print the entities that are currently being discussed in the news, together with their relevance value and the associated sentiment.
    * Option 2: Use the NY Times API to fetch the Top Stories news. You can register and get an API key at https://developer.nytimes.com/. The `Top Stories V2 API` provides the details of the news of the day. The API call documentation is at https://developer.nytimes.com/top_stories_v2.json and the API call is https://api.nytimes.com/svc/topstories/v2/home.json?api-key=PUTYOURKEYHERE. Repeat the entity extraction process from above.
    * Option 3: Use the Guardian API at https://open-platform.theguardian.com/documentation/ to fetch news from The Guardian.
