
# Interacting with the IBM Watson Natural Language Understanding API

Another useful API, especially when dealing with text, is the [IBM Watson Natural Language Understanding API](https://console.bluemix.net/catalog/services/natural-language-understanding), which offers text analysis features such as sentiment analysis, entity extraction, and keyword extraction.

The examples below show how to take an unstructured piece of text (either the text itself or a URL that points to it) and analyze it.





## Sentiment and emotion analysis

We start with the `/analyze` API call ([documentation](https://cloud.ibm.com/apidocs/natural-language-understanding#analyzeget)), which takes a piece of text as input and returns an analysis across various dimensions.

The API supports the following analyses:

`categories,classifications,concepts,emotion,entities,keywords,metadata,relations,semantic_roles,sentiment,summarization (experimental),syntax`

The API supports not only English, but also a [variety of non-English languages](https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-detectable-languages).

In our introductory attempt, we will use the `sentiment` and `emotion` features and focus on English texts.
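
The API receives the requested analyses as a single comma-separated string. The sketch below shows one way to build that string from a Python list and catch typos early. The helper name `build_features_param` is our own (hypothetical), and we assume the experimental summarization feature is spelled `summarization` in the parameter value.

```python
# Feature names copied from the list above. Writing "summarization (experimental)"
# as "summarization" is an assumption about the exact parameter value.
SUPPORTED_FEATURES = {
    "categories", "classifications", "concepts", "emotion", "entities",
    "keywords", "metadata", "relations", "semantic_roles", "sentiment",
    "summarization", "syntax",
}


def build_features_param(features):
    """Join feature names into one comma-separated string (hypothetical helper)."""
    unknown = [name for name in features if name not in SUPPORTED_FEATURES]
    if unknown:
        raise ValueError(f"Unsupported feature(s): {', '.join(unknown)}")
    return ",".join(features)


print(build_features_param(["sentiment", "emotion"]))  # sentiment,emotion
```

Validating locally avoids spending an API call (and quota) on a request that would fail anyway.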



In [94]:
import requests

In [95]:
# Alternative instance (disabled):
# URL = 'https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/9e683088-0d12-4399-8118-518f3e60e8c4'
#
# My own API key. It may run out of quota
# You can register and get your own credentials
# The ones below have a quota of 1000 calls per day
# and can run out quickly if multiple people use these
# API_KEY = 'yx39wyiwPNGm7DoDUPCSJB4SzFkr0qurARfbGYyEdaoC'

URL = 'https://api.eu-gb.natural-language-understanding.watson.cloud.ibm.com/instances/d56bc7f9-88a8-4b7a-aa86-0557ca745925'
API_KEY = 'xUQrt-jdAvneYY4cC7oAhLzCkS-RrDldfPfarjn9kBwl'

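def analyzeText(text=None, url=None):
    """Run sentiment and emotion analysis on raw text or on the page at `url`.

    Pass either `text` or `url`; requests omits parameters whose value is None.
    Returns the decoded JSON response as a dictionary.
    """
    endpoint = f"{URL}/v1/analyze"
    # IBM Cloud uses HTTP basic auth with the literal username "apikey"
    username = "apikey"
    password = API_KEY

    parameters = {
        'features': 'emotion,sentiment',
        'version': '2022-04-07',
        'text': text,
        'language': 'en',
        'url': url,  # this is an alternative to sending the text
    }

    resp = requests.get(endpoint, params=parameters, auth=(username, password))

    return resp.json()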
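
To make the shape of the result more concrete, here is a minimal sketch that pulls the document-level sentiment and the strongest emotion out of a response dictionary. The helper `summarize_response` is hypothetical, and both sample dictionaries are hand-written illustrations of the response layout we assume, not real API output.

```python
def summarize_response(data):
    """Return (sentiment score, sentiment label, strongest emotion).

    Missing pieces come back as None, so an error payload does not raise.
    """
    sentiment = data.get("sentiment", {}).get("document", {})
    emotions = data.get("emotion", {}).get("document", {}).get("emotion", {})
    strongest = max(emotions, key=emotions.get) if emotions else None
    return sentiment.get("score"), sentiment.get("label"), strongest


# Hand-written sample in the assumed response layout (not real API output)
sample = {
    "sentiment": {"document": {"score": -0.6, "label": "negative"}},
    "emotion": {"document": {"emotion": {
        "anger": 0.2, "disgust": 0.1, "fear": 0.1, "joy": 0.1, "sadness": 0.7,
    }}},
}

print(summarize_response(sample))  # (-0.6, 'negative', 'sadness')

# Hand-written stand-in for an error payload that has no analysis keys
print(summarize_response({"error": "illustrative error", "code": 400}))  # (None, None, None)
```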

### Exercise

* First of all, **get your own credentials for the IBM Watson API**. The demo key that we use above has a limited quota.
* Use an API to get news articles. 
    * Option 1: Use the API at https://newsapi.org to fetch the news from various sources. Print the entities that are currently being discussed in the news, together with their relevance value and the associated sentiment.
    * Option 2: Use the NY Times API to fetch the Top Stories. You can register and get an API key at https://developer.nytimes.com/. The `Top Stories V2 API` provides the details of the news of the day. The documentation is at https://developer.nytimes.com/docs/top-stories-product/1/overview and the API call is https://api.nytimes.com/svc/topstories/v2/home.json?api-key=PUTYOURKEYHERE. Repeat the entity extraction process described in Option 1.
    * Option 3: Use the Guardian API at https://open-platform.theguardian.com/documentation/ to fetch news from The Guardian.
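
For the entity part of the exercise, the sketch below shows one possible way to turn a response into rows of entity, relevance, and sentiment. It assumes the request asked for the `entities` feature with entity-level sentiment enabled, and that the response contains an `entities` list whose items carry `text`, `relevance`, and `sentiment.score`. The helper `entity_rows` and the sample dictionary are hypothetical illustrations, not real API output.

```python
def entity_rows(data):
    """Return (entity text, relevance, sentiment score) tuples, most relevant first."""
    rows = []
    for entity in data.get("entities", []):
        sentiment = entity.get("sentiment", {})
        rows.append((entity.get("text"), entity.get("relevance"), sentiment.get("score")))
    # Entities without a relevance value sort last
    return sorted(rows, key=lambda row: row[1] or 0, reverse=True)


# Hand-written sample in the assumed response layout (not real API output)
sample = {
    "entities": [
        {"text": "Canada", "type": "Location", "relevance": 0.41,
         "sentiment": {"score": -0.2, "label": "negative"}},
        {"text": "Queen Elizabeth", "type": "Person", "relevance": 0.95,
         "sentiment": {"score": 0.3, "label": "positive"}},
    ]
}

for text, relevance, score in entity_rows(sample):
    print(f"{text}: relevance={relevance}, sentiment={score}")
```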


In [96]:
# !sudo -H pip3 install newsapi-python
!pip install newsapi-python

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [97]:
from newsapi import NewsApiClient

# Init
newsapi = NewsApiClient(api_key='a7eab21c34e545dba418c4344d59a54f')

# /v2/top-headlines
top_headlines = newsapi.get_top_headlines(q='queen',
                                          language='en',
                                          page_size=100)

# top_headlines.keys()
articles = top_headlines['articles']

# Disabled: analyze each article directly, without storing it first
# for article in articles:
#     article_url = article['url']
#     # print(article)
#     print(article['content'])
#     print(article['url'])
#     data = analyzeText(url=article_url)
#     print(data['sentiment']['document'])
#     print(data['emotion']['document'])

In [98]:
# The cells below call engine.execute(), which was removed in SQLAlchemy 2.0
!sudo pip3 install -U -q PyMySQL "sqlalchemy<2.0" sql_magic

In [99]:
from sqlalchemy import create_engine

conn_string = "mysql+pymysql://{user}:{password}@{host}/".format(
    host="db.ipeirotis.org", user="student", password="dwdstudent2015"
)

engine = create_engine(conn_string)
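
Credentials that contain characters such as `@` or `/` would break the connection string, and hard-coding them in a shared notebook is risky. As a hedged alternative, the sketch below reads them from environment variables and URL-escapes them. The helper `make_conn_string` and the variable names `DB_USER`, `DB_PASSWORD`, and `DB_HOST` are our own assumptions.

```python
import os
from urllib.parse import quote_plus


def make_conn_string(user, password, host):
    """Build a PyMySQL connection string with URL-escaped credentials."""
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# DB_USER, DB_PASSWORD and DB_HOST are hypothetical environment variable names
conn_string = make_conn_string(
    user=os.environ.get("DB_USER", "student"),
    password=os.environ.get("DB_PASSWORD", ""),
    host=os.environ.get("DB_HOST", "db.ipeirotis.org"),
)
```

The resulting `conn_string` can be passed to `create_engine` exactly as in the cell above.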

In [100]:
# Query to create a database
db_name = "public"
create_db_query = (
    f"CREATE DATABASE IF NOT EXISTS {db_name} DEFAULT CHARACTER SET 'utf8'"
)

# Create a database
engine.execute(create_db_query)

<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7fdf0bc76e10>

In [101]:
suffix = "rm5486"
table_name = f"{suffix}_news"
# Create a table
create_table_query = f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name} 
                                (content varchar(1000), 
                                url varchar(500), 
                                PRIMARY KEY(url)
                                )"""
engine.execute(create_table_query)

<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7fdf0be5dd50>

In [102]:
query_template = f"""
                    INSERT IGNORE INTO 
                    {db_name}.{table_name}(content,  url) 
                    VALUES (%s, %s)
                  """
for article in articles:
    content = article['content']
    url = article['url']

    # print("Inserting article", content, "with", url, "as URL")
    query_parameters = (content, url)
    engine.execute(query_template, query_parameters)
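
To see what `INSERT IGNORE` combined with a primary key on `url` does, here is a self-contained sketch that swaps in Python's built-in `sqlite3` for the MySQL server. SQLite spells the statement `INSERT OR IGNORE` and uses `?` placeholders instead of `%s`; the table name and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (content TEXT, url TEXT PRIMARY KEY)")

# Made-up rows; the second one repeats the URL of the first
sample_articles = [
    {"content": "first", "url": "https://example.com/a"},
    {"content": "duplicate url", "url": "https://example.com/a"},
    {"content": "second", "url": "https://example.com/b"},
]

for article in sample_articles:
    # Values are passed as parameters, never formatted into the SQL string
    conn.execute(
        "INSERT OR IGNORE INTO news (content, url) VALUES (?, ?)",
        (article["content"], article["url"]),
    )
conn.commit()

stored = conn.execute("SELECT content, url FROM news ORDER BY url").fetchall()
print(stored)  # [('first', 'https://example.com/a'), ('second', 'https://example.com/b')]
```

The row with the repeated URL is skipped silently, which is why rerunning the insert cell does not create duplicates.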

In [103]:
results = engine.execute(f"SELECT * FROM {db_name}.{table_name}")
rows = results.fetchall()
results.close()

In [104]:
for row in rows:
    print("Content:", row["content"])
    print("URL:", row["url"])
    print("=============================================")

Content: The Chinese vice-president, Wang Qishan, is to attend the Queens funeral in a move that has prompted complaints from a group of British Conservative MPs that have been banned from travelling to China… [+3987 chars]
URL: https://amp.theguardian.com/uk-news/2022/sep/15/anger-among-mps-as-chinese-vice-president-to-attend-queens-funeral
Content: Workers are scrambling to find last-minute child care in several provinces after governments announced the sudden closure of schools to mourn Queen Elizabeth.
The four Atlantic provinces, British Co… [+3083 chars]
URL: https://globalnews.ca/news/9130909/canada-school-closures-sept-19-queen-funeral/
Content: The passing of the Queen, a unifying figure more beloved than her son, King Charles III, comes as several Commonwealth realms are reassessing their relationships with the crown 
Author of the articl… [+11403 chars]
URL: https://nationalpost.com/news/as-elizabeth-gives-way-to-charles-realms-consider-severing-ties
Content: The new Princ
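
The `content` values printed above are cut short and end with a marker such as `… [+3987 chars]`. If we want to keep only the readable snippet, a small regular expression can strip that marker. This is a sketch under the assumption that the marker always has this form; `strip_truncation_marker` is a hypothetical helper.

```python
import re

# Matches an optional ellipsis followed by "[+<number> chars]" at the end of the text
TRUNCATION_MARKER = re.compile(r"\s*(?:…|\.\.\.)?\s*\[\+\d+ chars\]\s*$")


def strip_truncation_marker(content):
    """Remove a trailing '… [+N chars]' marker; None becomes an empty string."""
    if content is None:
        return ""
    return TRUNCATION_MARKER.sub("", content)


print(strip_truncation_marker("banned from travelling to China… [+3987 chars]"))
# banned from travelling to China
```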

In [105]:
sentiment_scores = []

for row in rows:
    article_url = row["url"]
    data = analyzeText(url=article_url)
    entry = {'url': article_url}
    try:
        entry['score'] = data['sentiment']['document']['score']
    except KeyError:
        # The API returned a payload without a 'sentiment' key for this URL
        entry['score'] = 'ERROR: Cannot determine sentiment analysis score for this article'

    print(entry)
    sentiment_scores.append(entry)

{'url': 'https://amp.theguardian.com/uk-news/2022/sep/15/anger-among-mps-as-chinese-vice-president-to-attend-queens-funeral', 'score': 0.41797}
{'url': 'https://globalnews.ca/news/9130909/canada-school-closures-sept-19-queen-funeral/', 'score': -0.74832}
{'url': 'https://nationalpost.com/news/as-elizabeth-gives-way-to-charles-realms-consider-severing-ties', 'score': 0.258092}
{'url': 'https://news.sky.com/story/prince-and-princess-of-wales-view-tributes-to-queen-at-sandringham-12697938', 'score': -0.366433}
{'url': 'https://nz.news.yahoo.com/hidden-detail-in-photo-of-kate-at-queens-coffin-procession-225141994.html', 'score': -0.463513}
{'url': 'https://theindependent.sg/he-predicted-queen-elizabeth-iis-death-and-now-he-predicts-king-charless-death/', 'score': -0.467971}
{'url': 'https://www.9news.com.au/national/australia-breaking-news-today-live-queen-elizabeth-ii-king-charles-iii-updates-latest-headlines/92e62e8a-fd45-4cea-b00a-91b8eeaeee0c', 'score': 0.665348}
{'url': 'https://www.a
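
Because the loop stores an error message in place of the score when the analysis fails, any aggregate has to skip those entries. The sketch below computes the average over the numeric scores only. The helper names are hypothetical; the first three sample entries reuse scores printed above, and the fourth is a made-up failed entry.

```python
def numeric_scores(entries):
    """Keep only the scores that are numbers, dropping error messages."""
    return [e["score"] for e in entries if isinstance(e["score"], (int, float))]


def mean_score(entries):
    """Average sentiment over the successfully analyzed entries, or None."""
    scores = numeric_scores(entries)
    return sum(scores) / len(scores) if scores else None


# First three scores are taken from the output above; the last entry is made up
sample_scores = [
    {"url": "https://amp.theguardian.com/...", "score": 0.41797},
    {"url": "https://globalnews.ca/...", "score": -0.74832},
    {"url": "https://nationalpost.com/...", "score": 0.258092},
    {"url": "https://example.com/failed", "score": "ERROR: Cannot determine sentiment analysis score for this article"},
]

print(len(numeric_scores(sample_scores)))  # 3
print(round(mean_score(sample_scores), 6))
```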

In [106]:
table_name = f"{suffix}_sentiment_score"
# Create a table
create_table_query = f"""CREATE TABLE IF NOT EXISTS {db_name}.{table_name} 
                                (url varchar(255),
                                sentiment_score varchar(255), 
                                PRIMARY KEY(url)
                                )"""
engine.execute(create_table_query)

<sqlalchemy.engine.cursor.LegacyCursorResult at 0x7fdf0bc7a310>

In [107]:
query_template = f"""
                    INSERT IGNORE INTO 
                    {db_name}.{table_name}(url, sentiment_score) 
                    VALUES (%s, %s)
                  """
for score in sentiment_scores:
    url = score['url']
    sentiment_score = score['score']

    # print("Inserting URL", url, "with", sentiment_score, "as score")
    query_parameters = (url, sentiment_score)
    engine.execute(query_template, query_parameters)

In [108]:
results = engine.execute(f"SELECT * FROM {db_name}.{table_name}")
rows = results.fetchall()
results.close()

In [109]:
for row in rows:
    print("URL:", row["url"])
    print("Sentiment Score:", row["sentiment_score"])
    print("=============================================")

URL: https://amp.theguardian.com/uk-news/2022/sep/15/anger-among-mps-as-chinese-vice-president-to-attend-queens-funeral
Sentiment Score: 0.41797
URL: https://globalnews.ca/news/9130909/canada-school-closures-sept-19-queen-funeral/
Sentiment Score: -0.74832
URL: https://nationalpost.com/news/as-elizabeth-gives-way-to-charles-realms-consider-severing-ties
Sentiment Score: 0.258092
URL: https://news.sky.com/story/prince-and-princess-of-wales-view-tributes-to-queen-at-sandringham-12697938
Sentiment Score: -0.366433
URL: https://nz.news.yahoo.com/hidden-detail-in-photo-of-kate-at-queens-coffin-procession-225141994.html
Sentiment Score: -0.463513
URL: https://theindependent.sg/he-predicted-queen-elizabeth-iis-death-and-now-he-predicts-king-charless-death/
Sentiment Score: -0.467971
URL: https://www.9news.com.au/national/australia-breaking-news-today-live-queen-elizabeth-ii-king-charles-iii-updates-latest-headlines/92e62e8a-fd45-4cea-b00a-91b8eeaeee0c
Sentiment Score: 0.665348
URL: https://ww

In [110]:
drop_table_query = f"DROP TABLE IF EXISTS {db_name}.{table_name}"
engine.execute(drop_table_query)

table_name = f"{suffix}_news"

drop_table_query = f"DROP TABLE IF EXISTS {db_name}.{table_name}"
engine.execute(drop_table_query)

print("Dropped tables from database.")
print("New ones will be created every time we run this program")

Dropped tables from database.
New ones will be created every time we run this program
