
# Interacting with the IBM Watson Natural Language Understanding API

Another useful API, especially when dealing with text, is the [IBM Watson Natural Language Understanding API](https://console.bluemix.net/catalog/services/natural-language-understanding), which offers a variety of text analysis functions, such as sentiment analysis, entity extraction, and keyword extraction.

We give a couple of examples below to show how to take an unstructured piece of text (either the text itself or a URL that points to text) and analyze it.





## Sentiment and emotion analysis

We start with the `/analyze` API call ([documentation](https://cloud.ibm.com/apidocs/natural-language-understanding#analyzeget)), which takes a piece of text as input and returns an analysis across various dimensions.

The API supports the following analyses:

`categories,classifications,concepts,emotion,entities,keywords,metadata,relations,semantic_roles,sentiment,summarization (experimental),syntax`

The API supports not only English, but also a [variety of non-English languages](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-detectable-languages).

In our introductory attempt, we will request the `sentiment` and `emotion` features and focus on English texts.



In [4]:
import requests

In [1]:
URL = 'https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/9e683088-0d12-4399-8118-518f3e60e8c4'

# My own API key. It may run out of quota
# You can register and get your own credentials
# The ones below have a quota of 1000 calls per day 
# and can run out quickly if multiple people use these
API_KEY = 'yx39wyiwPNGm7DoDUPCSJB4SzFkr0qurARfbGYyEdaoC'

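def analyzeText(text=None, url=None):
    """Call /v1/analyze and return sentiment and emotion as a dictionary.

    Pass either `text` or `url`; requests drops parameters that are None.
    """
    endpoint = f"{URL}/v1/analyze"
    # IBM Cloud uses HTTP basic auth with the literal username "apikey"
    username = "apikey"
    password = API_KEY

    parameters = {
        'features': 'emotion,sentiment',
        'version' : '2022-04-07',
        'text': text,
        'language' : 'en',
        'url' : url # this is an alternative to sending the text
    }

    resp = requests.get(endpoint, params=parameters, auth=(username, password))

    # Parse the JSON body of the response into a Python dictionary
    return resp.json()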
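
One detail of `analyzeText` is worth spelling out: both `text` and `url` go into the parameter dictionary, and one of them is usually `None`. The `requests` library leaves out query parameters whose value is `None`, so the call works, but calling the function with neither argument still sends a request that has nothing to analyze. A minimal sketch of a helper that builds the parameters explicitly and fails early follows; `build_params` is a hypothetical name introduced here for illustration, not part of the notebook or of any IBM library.

```python
def build_params(text=None, url=None, features=("emotion", "sentiment"),
                 version="2022-04-07", language="en"):
    """Build the query parameters for the /v1/analyze call.

    Hypothetical helper: mirrors the dictionary used in analyzeText,
    but refuses to build a request that has nothing to analyze.
    """
    if text is None and url is None:
        raise ValueError("Provide either text or url")

    params = {
        "features": ",".join(features),
        "version": version,
        "language": language,
    }
    # Only include the input that was actually supplied
    if text is not None:
        params["text"] = text
    if url is not None:
        params["url"] = url
    return params


params = build_params(text="I loved the bagel")
print(params)
```

The resulting dictionary could be passed as `params=` to `requests.get` in place of the hand-written one.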

In [2]:
# We will analyze the text below using the IBM Watson API

review = '''
I got their Egg & Cheese sandwich on a Whole Wheat Everything Bagel. 
First off, I loved loved loved the texture of the bagel itself. 
It was very chewy yet soft, which is a top feature for a NY style bagel. 
However, I thought there could've been more seasoning on top of 
the bagel as I found the bagel itself to be a bit bland. 

Speaking of bland, I thought the egg and cheese filling were also quite bland. 
This was definitely lacking salt and pepper in the eggs and the cheese didn't
really add too much flavor either, which was really disappointing! 
My mom also had the same complaint with her bagel sandwich 
(she had the egg sandwich on a blueberry bagel) so I definitely wasn't 
the only one.

'''

In [5]:
data = analyzeText(text=review)
data

{'usage': {'text_units': 1, 'text_characters': 707, 'features': 2},
 'sentiment': {'document': {'score': -0.600662, 'label': 'negative'}},
 'language': 'en',
 'emotion': {'document': {'emotion': {'sadness': 0.167794,
    'joy': 0.370866,
    'fear': 0.039799,
    'disgust': 0.164856,
    'anger': 0.196751}}}}

Now, let's try to understand the structure of the answer. First, we check the high-level keys.

In [6]:
data.keys()

dict_keys(['usage', 'sentiment', 'language', 'emotion'])

Now, let's check the content of these keys:

In [7]:
data['language']

'en'

In [8]:
data['sentiment']

{'document': {'score': -0.600662, 'label': 'negative'}}

In [9]:
data['emotion']

{'document': {'emotion': {'sadness': 0.167794,
   'joy': 0.370866,
   'fear': 0.039799,
   'disgust': 0.164856,
   'anger': 0.196751}}}

In [10]:
# Let's go deeper into the 'emotion' dictionary
data['emotion']['document']

{'emotion': {'sadness': 0.167794,
  'joy': 0.370866,
  'fear': 0.039799,
  'disgust': 0.164856,
  'anger': 0.196751}}

In [11]:
# And a bit more
data['emotion']['document']['emotion']

{'sadness': 0.167794,
 'joy': 0.370866,
 'fear': 0.039799,
 'disgust': 0.164856,
 'anger': 0.196751}
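
Once the nesting is clear, a small helper can pull out the parts we usually care about. The sketch below works on a copy of the response shown above, so it runs without calling the API; `dominant_emotion` is a hypothetical helper name, not something the API provides.

```python
# Copy of the response shown above, so the example runs offline
sample = {
    "usage": {"text_units": 1, "text_characters": 707, "features": 2},
    "sentiment": {"document": {"score": -0.600662, "label": "negative"}},
    "language": "en",
    "emotion": {"document": {"emotion": {
        "sadness": 0.167794,
        "joy": 0.370866,
        "fear": 0.039799,
        "disgust": 0.164856,
        "anger": 0.196751,
    }}},
}


def dominant_emotion(response):
    """Return the (name, score) pair of the highest-scoring document emotion."""
    scores = response["emotion"]["document"]["emotion"]
    name = max(scores, key=scores.get)
    return name, scores[name]


name, score = dominant_emotion(sample)
label = sample["sentiment"]["document"]["label"]
print(f"Sentiment: {label}, dominant emotion: {name} ({score:.2f})")
```

For this review the overall sentiment label is negative even though joy is the highest-scoring emotion, so it is worth looking at both fields rather than only one.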

### Exercise 1

Type your own piece of text and analyze it to extract sentiment and emotions. Discuss your findings.

In [14]:
my_own_text = '''
The FitnessGram™ Pacer Test is a multistage aerobic capacity test that progressively gets more difficult as it continues. The 20 meter pacer test will begin in 30 seconds. Line up at the start. The running speed starts slowly, but gets faster each minute after you hear this signal. [beep] A single lap should be completed each time you hear this sound. [ding] Remember to run in a straight line, and run as long as possible. The second time you fail to complete a lap before the sound, your test is over. The test will begin on the word start. On your mark, get ready, start.
'''

data = analyzeText( text = my_own_text )
print(f"Language: {data['language']}")
print(f"Sentiment: {data['sentiment']['document']}")
print(f"Emotion: {data['emotion']['document']}")

Language: en
Sentiment: {'score': 0.267975, 'label': 'positive'}
Emotion: {'emotion': {'sadness': 0.186833, 'joy': 0.425755, 'fear': 0.225033, 'disgust': 0.016703, 'anger': 0.059469}}


### Exercise 2

Below is a slightly different call, which takes as input a URL to analyze instead of a piece of text. Use it to analyze a URL of your choice.

In [18]:
news_url = 'https://xarangi.github.io/'
analyzeText(url = news_url)

{'usage': {'text_units': 3, 'text_characters': 20223, 'features': 2},
 'sentiment': {'document': {'score': -0.375663, 'label': 'negative'}},
 'retrieved_url': 'https://xarangi.github.io/',
 'language': 'en',
 'emotion': {'document': {'emotion': {'sadness': 0.340438,
    'joy': 0.296978,
    'fear': 0.093135,
    'disgust': 0.036156,
    'anger': 0.072841}}}}

## Entities call

The code below slightly changes the way that we call the API. Instead of asking for document-level sentiment and emotion, we ask the API to extract entities from the text, together with the sentiment and emotion for each of these entities.

In terms of natural language processing, we will examine a couple of capabilities of the API. First, you will see that there is the capability of "normalizing" each entity, so that two different ways of saying the same thing get mapped to the same entity. So for example, "President Trump" and "Donald Trump" get mapped to the same Knowledge Graph entity.

In [19]:
URL = 'https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/9e683088-0d12-4399-8118-518f3e60e8c4'

API_KEY = 'yx39wyiwPNGm7DoDUPCSJB4SzFkr0qurARfbGYyEdaoC'

def extractEntities(text=None, url=None):

    endpoint = f"{URL}/v1/analyze"
    username = "apikey"
    password = API_KEY
    
    parameters = {
        'features': 'entities',
        'version' : '2022-04-07',
        'entities.limit' : 10,
        'entities.sentiment' : True,
        'entities.emotion' : True,
        'text': text,
        'language' : 'en',
        'url' : url # this is an alternative to sending the text
    }

    resp = requests.get(endpoint, params=parameters, auth=(username, password))
    
    return resp.json()

In [23]:
news_url = 'https://xarangi.github.io/'

data = extractEntities(url=news_url)

In [24]:
data

{'usage': {'text_units': 3, 'text_characters': 20223, 'features': 1},
 'retrieved_url': 'https://xarangi.github.io/',
 'language': 'en',
 'entities': [{'type': 'Organization',
   'text': 'NYUAD',
   'sentiment': {'score': 0.65601, 'label': 'positive'},
   'relevance': 0.949965,
   'emotion': {'sadness': 0.131668,
    'joy': 0.63642,
    'fear': 0.048552,
    'disgust': 0.04511,
    'anger': 0.07095},
   'count': 2,
   'confidence': 0.897769},
  {'type': 'Facility',
   'text': 'echo chambers',
   'sentiment': {'score': -0.835959, 'label': 'negative'},
   'relevance': 0.373493,
   'emotion': {'sadness': 0.739389,
    'joy': 0.044319,
    'fear': 0.105493,
    'disgust': 0.025149,
    'anger': 0.017458},
   'count': 1,
   'confidence': 0.155708},
  {'type': 'JobTitle',
   'text': 'independent researchers',
   'sentiment': {'score': -0.671545, 'label': 'negative'},
   'relevance': 0.367642,
   'emotion': {'sadness': 0.506375,
    'joy': 0.355752,
    'fear': 0.039037,
    'disgust': 0.0433

In [25]:
# Let's see what we get back as top-level attributes
data.keys()

dict_keys(['usage', 'retrieved_url', 'language', 'entities'])

In [26]:
# Let's see the entities list
data["entities"]

[{'type': 'Organization',
  'text': 'NYUAD',
  'sentiment': {'score': 0.65601, 'label': 'positive'},
  'relevance': 0.949965,
  'emotion': {'sadness': 0.131668,
   'joy': 0.63642,
   'fear': 0.048552,
   'disgust': 0.04511,
   'anger': 0.07095},
  'count': 2,
  'confidence': 0.897769},
 {'type': 'Facility',
  'text': 'echo chambers',
  'sentiment': {'score': -0.835959, 'label': 'negative'},
  'relevance': 0.373493,
  'emotion': {'sadness': 0.739389,
   'joy': 0.044319,
   'fear': 0.105493,
   'disgust': 0.025149,
   'anger': 0.017458},
  'count': 1,
  'confidence': 0.155708},
 {'type': 'JobTitle',
  'text': 'independent researchers',
  'sentiment': {'score': -0.671545, 'label': 'negative'},
  'relevance': 0.367642,
  'emotion': {'sadness': 0.506375,
   'joy': 0.355752,
   'fear': 0.039037,
   'disgust': 0.043384,
   'anger': 0.016998},
  'count': 1,
  'confidence': 0.539512},
 {'type': 'Organization',
  'text': 'Association for the Advancement of Artificial Intelligence (AAAI',
  'sent

In [29]:
# Let's see the 7th entity. Notice the "disambiguation" attribute, which
# points to the "canonical" version of the entity in DBpedia.
# The "text" attribute holds the term as it appears in the analyzed text.
data["entities"][6]

{'type': 'Organization',
 'text': 'Twitter',
 'sentiment': {'score': -0.786611, 'label': 'negative'},
 'relevance': 0.237108,
 'emotion': {'sadness': 0.143802,
  'joy': 0.122481,
  'fear': 0.055955,
  'disgust': 0.031158,
  'anger': 0.075986},
 'disambiguation': {'subtype': ['Website', 'Company', 'VentureFundedCompany'],
  'name': 'Twitter',
  'dbpedia_resource': 'http://dbpedia.org/resource/Twitter'},
 'count': 2,
 'confidence': 0.991189}
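
Only some entities carry a `disambiguation` block, so code that wants a canonical name needs a fallback. A minimal sketch, assuming entities shaped like the ones above; `canonical_name` is a hypothetical helper, and the two dictionaries are trimmed copies of entities from the output.

```python
def canonical_name(entity):
    """Return the disambiguated name if present, else the text as written."""
    return entity.get("disambiguation", {}).get("name", entity["text"])


# Trimmed copies of two entities from the output above
twitter = {
    "type": "Organization",
    "text": "Twitter",
    "disambiguation": {
        "subtype": ["Website", "Company", "VentureFundedCompany"],
        "name": "Twitter",
        "dbpedia_resource": "http://dbpedia.org/resource/Twitter",
    },
}
nyuad = {"type": "Organization", "text": "NYUAD"}  # no disambiguation block

print(canonical_name(twitter))
print(canonical_name(nyuad))
```

Grouping entities by this name, rather than by `text`, is one way to treat different spellings of the same entity as one.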

In [30]:
# Let's put the results in a dataframe, so that we can browse them more easily
import pandas as pd

pd.json_normalize(data['entities'])

| | type | text | relevance | count | confidence | sentiment.score | sentiment.label | emotion.sadness | emotion.joy | emotion.fear | emotion.disgust | emotion.anger | disambiguation.subtype | disambiguation.name | disambiguation.dbpedia_resource |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Organization | NYUAD | 0.949965 | 2 | 0.897769 | 0.65601 | positive | 0.131668 | 0.63642 | 0.048552 | 0.04511 | 0.07095 | | | |
| 1 | Facility | echo chambers | 0.373493 | 1 | 0.155708 | -0.835959 | negative | 0.739389 | 0.044319 | 0.105493 | 0.025149 | 0.017458 | | | |
| 2 | JobTitle | independent researchers | 0.367642 | 1 | 0.539512 | -0.671545 | negative | 0.506375 | 0.355752 | 0.039037 | 0.043384 | 0.016998 | | | |
| 3 | Organization | Association for the Advancement of Artificial ... | 0.366279 | 1 | 0.41565 | -0.741511 | negative | 0.506375 | 0.355752 | 0.039037 | 0.043384 | 0.016998 | | | |
| 4 | Organization | Princeton’s ESOC COVID | 0.361858 | 1 | 0.294647 | -0.733947 | negative | 0.16272 | 0.098013 | 0.034453 | 0.043754 | 0.046537 | | | |
| 5 | Location | Abu Dhabi | 0.28451 | 1 | 0.53532 | 0.65601 | positive | 0.120466 | 0.795878 | 0.042534 | 0.014642 | 0.049226 | | | |
| 6 | Organization | Twitter | 0.237108 | 2 | 0.991189 | -0.786611 | negative | 0.143802 | 0.122481 | 0.055955 | 0.031158 | 0.075986 | [Website, Company, VentureFundedCompany] | Twitter | http://dbpedia.org/resource/Twitter |
| 7 | Organization | LIX | 0.212657 | 1 | 0.6716 | -0.46423 | negative | 0.291288 | 0.165589 | 0.084342 | 0.026022 | 0.019003 | | | |
| 8 | JobTitle | specialists | 0.19276 | 1 | 0.598181 | -0.439049 | negative | 0.217601 | 0.052692 | 0.053525 | 0.020772 | 0.082558 | | | |
| 9 | JobTitle | U.S. president | 0.177453 | 1 | 0.305451 | 0.690893 | positive | 0.349607 | 0.191414 | 0.013034 | 0.220007 | 0.016376 | | | |
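
The dotted column names such as `sentiment.score` come from `pd.json_normalize`, which flattens nested dictionaries and joins the nested keys with a dot. The sketch below imitates that behavior for a single entity in plain Python, to make the naming rule visible; `flatten` is a hypothetical helper written for illustration and is not how pandas implements it.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists are kept as values, which matches the list stored in the
    'disambiguation.subtype' column of the output above.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Trimmed copy of one entity from the output above
entity = {
    "type": "Location",
    "text": "Abu Dhabi",
    "sentiment": {"score": 0.65601, "label": "positive"},
    "relevance": 0.28451,
}
print(flatten(entity))
```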


### Exercise

* First of all, **get your own credentials for the IBM Watson API**. The demo key that we use above has a limited quota.
* Use an API to get news articles. 
    * Option 1: Use the API at https://newsapi.org to fetch the news from various sources. Print the entities that are currently being discussed in the news, together with their relevance value and the associated sentiment.
    * Option 2: Use the NY Times API to fetch the Top Stories News. You can register and get an API key at https://developer.nytimes.com/. The `Top Stories V2 API` provides the details of the news of the day (the API call documentation is at https://developer.nytimes.com/docs/top-stories-product/1/overview and the API call is https://api.nytimes.com/svc/topstories/v2/home.json?api-key=PUTYOURKEYHERE). Repeat the entity extraction process from above.
    * Option 3: Use the Guardian API at https://open-platform.theguardian.com/documentation/ to fetch news from The Guardian.


In [35]:
# !sudo -H pip3 install newsapi-python
!pip install newsapi-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting newsapi-python
  Downloading newsapi_python-0.2.6-py2.py3-none-any.whl (7.9 kB)
Installing collected packages: newsapi-python
Successfully installed newsapi-python-0.2.6


In [69]:
from newsapi import NewsApiClient

# Init
newsapi = NewsApiClient(api_key='a7eab21c34e545dba418c4344d59a54f')

# /v2/top-headlines
top_headlines = newsapi.get_top_headlines(q='cricket')

# top_headlines.keys()
articles = top_headlines['articles']
for article in articles:
    article_url = article['url']
    data = analyzeText(url=article_url)
    print(data['sentiment']['document'])
    print(data['emotion']['document'])
    print()

{'score': -0.70189, 'label': 'negative'}
{'emotion': {'sadness': 0.349559, 'joy': 0.552414, 'fear': 0.043578, 'disgust': 0.020817, 'anger': 0.027951}}

{'score': 0.34513, 'label': 'positive'}
{'emotion': {'sadness': 0.167893, 'joy': 0.403298, 'fear': 0.070387, 'disgust': 0.042444, 'anger': 0.102672}}
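
The loop above assumes that every call succeeds. If the service cannot fetch or analyze a page, the returned JSON will not contain the `sentiment` and `emotion` keys, and the `print` lines raise a `KeyError`. A defensive sketch follows; the shape of the failed response used here (`error` and `code` keys) is an assumption for illustration, and `describe_result` is a hypothetical helper.

```python
def describe_result(data):
    """Summarize one analyzeText response, tolerating failed calls."""
    if "sentiment" not in data or "emotion" not in data:
        # Assumed shape of a failed call; adjust to what the API returns
        reason = data.get("error", "unknown error")
        return f"analysis failed: {reason}"
    sentiment = data["sentiment"]["document"]
    emotions = data["emotion"]["document"]["emotion"]
    top = max(emotions, key=emotions.get)
    return f"{sentiment['label']} ({sentiment['score']:+.2f}), top emotion: {top}"


# Second result from the output above
ok = {
    "sentiment": {"document": {"score": 0.34513, "label": "positive"}},
    "emotion": {"document": {"emotion": {
        "sadness": 0.167893, "joy": 0.403298, "fear": 0.070387,
        "disgust": 0.042444, "anger": 0.102672,
    }}},
}
# Hypothetical failed response, made up for this example
failed = {"error": "could not retrieve url", "code": 400}

print(describe_result(ok))
print(describe_result(failed))
```

Inside the loop, `print(describe_result(data))` could replace the two `print` calls so that one bad URL does not stop the run.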

