# Python and Analytics workshop - Using Natural Language Understanding and Sentiment

In this portion of the workshop, we'll use an instance of [Watson Natural Language Understanding](https://cloud.ibm.com/catalog/services/natural-language-understanding) to gather insights into data.

Watson Natural Language Understanding is a cloud-native product that uses deep learning to extract metadata from text, such as entities, keywords, categories, sentiment, emotion, relations, and syntax.
There is a rich [API](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python) that we will use along with the [Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) to analyze our data.

## Contents

- [1.0 Setup - install modules](#setup)
- [2.0 Test NLU APIs](#test)
- [3.0 Import Data and Setup Pandas Dataframe](#pandas)
- [4.0 Clean and Prepare data for NLU scoring](#clean)
- [5.0 Analyze response from NLU](#analyze)
- [6.0 Get sentiment by row](#sentiment-row)
- [7.0 Graph with matplotlib](#graph)



## 1.0 Setup - Install Modules<a name="setup"></a>

We use the [Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) to access the [NLU APIs](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python) programmatically.

In [1]:
!pip install --upgrade numpy==1.18.5
!pip install --upgrade pandas==1.0.5
!pip install --upgrade ibm-watson==4.7.1
!pip install PyJWT==1.7.1

Collecting PyJWT==1.7.1
  Downloading PyJWT-1.7.1-py2.py3-none-any.whl (18 kB)
Installing collected packages: PyJWT
  Attempting uninstall: PyJWT
    Found existing installation: PyJWT 2.1.0
    Uninstalling PyJWT-2.1.0:
      Successfully uninstalled PyJWT-2.1.0
Successfully installed PyJWT-1.7.1


### Important: Restart the Jupyter kernel now
Restart the kernel by going to the `Kernel` tab above and choosing `Restart`.
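
After the restart, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is ours and is not part of any SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Distribution names pinned in the install cell above.
for name in ("numpy", "pandas", "ibm-watson", "PyJWT"):
    print(name, installed_version(name))
```

If a version differs from the pinned one, rerun the install cell and restart the kernel again.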

Import the Python modules from the Watson Python SDK.

In [6]:
import json
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions, SentimentOptions

### 1.1 Add NLU credentials
Get the [IAM Authentication Key](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#authentication) and [Service URL](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#service-endpoint) that you obtained when you [created a Watson NLU instance](https://github.ibm.com/IBMDeveloper/python-and-analytics/tree/addNLU/workshop/natural-language-understanding#create-a-watson-nlu-instance).

Add your [IAM Authentication Key](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#authentication) below.

In [3]:
IAM_KEY = '***************************'

Add your [NLU Service URL](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#service-endpoint) below.

In [4]:
SERVICE_URL = '*************************************a775c91'
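
If you would rather not paste credentials into the notebook, one option is to read them from environment variables. This is a sketch under assumptions: the variable names `NLU_IAM_KEY` and `NLU_SERVICE_URL` and the helper `load_nlu_credentials` are hypothetical and are not defined by the Watson SDK.

```python
import os


def load_nlu_credentials(env=os.environ):
    """Read the API key and service URL from a mapping of environment variables.

    NLU_IAM_KEY and NLU_SERVICE_URL are hypothetical names chosen for this sketch.
    """
    iam_key = env.get("NLU_IAM_KEY")
    service_url = env.get("NLU_SERVICE_URL")
    if not iam_key or not service_url:
        raise RuntimeError("Set NLU_IAM_KEY and NLU_SERVICE_URL before running the notebook.")
    return iam_key, service_url


# Usage, once both variables are set in the environment:
# IAM_KEY, SERVICE_URL = load_nlu_credentials()
```

Passing the mapping as a parameter keeps the helper easy to try out with a plain dictionary.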

## 2.0 Test NLU APIs <a name="test"></a>
Run a quick check to make sure everything is working. We'll send a short Spanish-language post about COVID-19 to Watson Natural Language Understanding and ask for its top three keywords, then request the document-level sentiment for the same text. The calls follow the pattern of the examples in the [Watson NLU documentation](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python).

In [7]:
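# Authenticate with the IAM key and create the NLU client for a fixed API version
authenticator = IAMAuthenticator(IAM_KEY)
natural_language_understanding = NaturalLanguageUnderstandingV1(version='2020-08-01', authenticator=authenticator)

# Point the client at your service instance
natural_language_understanding.set_service_url(SERVICE_URL)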
response = natural_language_understanding.analyze(
    text='Compartimos la actualización de datos sobre #COVID19 en nuestro país.Domingo 25 de julio de 2021.#ProtégetePanamá#UnidosVenceremos',
    features=Features(keywords=KeywordsOptions(limit=3))).get_result()

print(json.dumps(response, indent=2))


{
  "usage": {
    "text_units": 1,
    "text_characters": 130,
    "features": 1
  },
  "language": "es",
  "keywords": [
    {
      "text": "actualizaci\u00f3n de datos",
      "relevance": 0.992122,
      "count": 1
    },
    {
      "text": "#COVID19",
      "relevance": 0.35094,
      "count": 1
    },
    {
      "text": "pa\u00eds",
      "relevance": 0.336484,
      "count": 1
    }
  ]
}
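
The response is a plain dictionary, so the keywords can be pulled out with ordinary indexing. A small sketch, using a trimmed copy of the response printed above; the helper name `keyword_texts` is ours.

```python
def keyword_texts(response):
    """Return (text, relevance) pairs from an NLU keywords response dict."""
    return [(kw["text"], kw["relevance"]) for kw in response.get("keywords", [])]


# Trimmed copy of the response printed above.
sample = {
    "language": "es",
    "keywords": [
        {"text": "actualización de datos", "relevance": 0.992122, "count": 1},
        {"text": "#COVID19", "relevance": 0.35094, "count": 1},
        {"text": "país", "relevance": 0.336484, "count": 1},
    ],
}

for text, relevance in keyword_texts(sample):
    print(f"{relevance:.3f}  {text}")
```

Using `response.get("keywords", [])` means a response without keywords yields an empty list instead of raising a `KeyError`.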


In [8]:
response = natural_language_understanding.analyze(
    text='Compartimos la actualización de datos sobre #COVID19 en nuestro país.Domingo 25 de julio de 2021.#ProtégetePanamá#UnidosVenceremos',
    features=Features(sentiment=SentimentOptions())).get_result()

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 130,
    "features": 1
  },
  "sentiment": {
    "document": {
      "score": 0,
      "label": "neutral"
    }
  },
  "language": "es"
}
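
For sentiment, the two values of interest sit under `sentiment` → `document`. As far as we know, the score runs from -1 (negative) to 1 (positive), with 0 reported as neutral. A minimal sketch of reading them from the response above; `document_sentiment` is a name we made up for illustration.

```python
def document_sentiment(response):
    """Return (score, label) for the whole document from an NLU sentiment response."""
    doc = response["sentiment"]["document"]
    return doc["score"], doc["label"]


# Copy of the sentiment response printed above.
sample = {
    "usage": {"text_units": 1, "text_characters": 130, "features": 1},
    "sentiment": {"document": {"score": 0, "label": "neutral"}},
    "language": "es",
}

score, label = document_sentiment(sample)
print(score, label)
```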


## 3.0 Import Data and Setup Pandas Dataframe <a name="pandas"></a>

In [2]:
import os, types
import pandas as pd
import numpy as np
# Comments left on each post (one row per comment)
df = pd.read_csv('https://raw.githubusercontent.com/LuisKu/WatsonNLU-test/main/insta_data2.csv')
# Post-level data (date and caption for each post)
df2 = pd.read_csv('https://raw.githubusercontent.com/LuisKu/WatsonNLU-test/main/insta_data3.csv')



In [55]:
df.head(5)

| Unnamed: 0 | numero de post | commentarios |
| --- | --- | --- |
| 0 | 1 | Circuito 8-10 para cuándo estará programado? |
| 1 | 1 | 👏👏👏 |
| 2 | 2 | De verdad aún no entiendo… nuevas medidas y si... |
| 3 | 2 | Siempre dice; arraijan, vista alegre_ arraijan... |
| 4 | 2 | Quisiera saber donde están vacunando la 2a dos... |


## 4.0 Get sentiment by row <a name="sentiment-row"></a>
Now, let's derive sentiment information on a per-row basis to get more granularity.
The number of API calls that you can make to Watson NLU is [rate limited and dependent on your service plan](https://cloud.ibm.com/catalog/services/natural-language-understanding), so to limit the number of calls to the NLU endpoint we'll start with just 50 rows by setting `num_rows` to 50.

In [10]:
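num_rows = 50
# Work on a copy of the first `num_rows` rows to limit the number of NLU API calls
df_rows = df.head(num_rows).copy()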

In [15]:
sentimentComentario = []

# One NLU call per comment; the response dicts are stored in row order
for index, row in df_rows.iterrows():
    response = natural_language_understanding.analyze(
        text=row['commentarios'],
        features=Features(sentiment=SentimentOptions()),
        language='es').get_result()
    sentimentComentario.append(response)
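
A single rejected request (for example, text the service cannot score) would stop the loop above partway through. The sketch below shows one way to keep going and record `None` for failed rows. It is written against a generic `analyze_fn` callable so it runs without credentials; `score_texts` and `fake_analyze` are hypothetical names, and with the real client the exception raised would most likely be the SDK's `ApiException`.

```python
def score_texts(texts, analyze_fn):
    """Call analyze_fn on each text; store None when a call fails."""
    results = []
    for text in texts:
        try:
            results.append(analyze_fn(text))
        except Exception as exc:  # broad on purpose in this sketch
            print(f"Skipping {text!r}: {exc}")
            results.append(None)
    return results


def fake_analyze(text):
    """Stand-in for the NLU call so the sketch runs offline."""
    if not text.strip():
        raise ValueError("empty text")
    return {"sentiment": {"document": {"score": 0, "label": "neutral"}}}


results = score_texts(["hola", "   "], fake_analyze)
print(results)
```

Because the result list stays aligned with the input rows, any later step that reads `['sentiment']` from each entry needs to skip the `None` values.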

In [22]:
sentimentComentario

[{'usage': {'text_units': 1, 'text_characters': 44, 'features': 1},
  'sentiment': {'document': {'score': 0, 'label': 'neutral'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 3, 'features': 1},
  'sentiment': {'document': {'score': 0.524743, 'label': 'positive'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 203, 'features': 1},
  'sentiment': {'document': {'score': -0.578034, 'label': 'negative'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 114, 'features': 1},
  'sentiment': {'document': {'score': 0.892413, 'label': 'positive'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 75, 'features': 1},
  'sentiment': {'document': {'score': 0, 'label': 'neutral'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 56, 'features': 1},
  'sentiment': {'document': {'score': -0.647574, 'label': 'negative'}},
  'language': 'es'},
 {'usage': {'text_units': 1, 'text_characters': 41, '

In [17]:
test_df = df_rows
# Copy each document-level sentiment score into a new "sentimientos" column
# (the responses are in the same order as the rows)
test_df["sentimientos"] = [r['sentiment']['document']['score'] for r in sentimentComentario]
test_df.head(5)

| Unnamed: 0 | numero de post | commentarios | sentimientos |
| --- | --- | --- | --- |
| 0 | 1 | Circuito 8-10 para cuándo estará programado? | 0.0 |
| 1 | 1 | 👏👏👏 | 0.524743 |
| 2 | 2 | De verdad aún no entiendo… nuevas medidas y si... | -0.578034 |
| 3 | 2 | Siempre dice; arraijan, vista alegre_ arraijan... | 0.892413 |
| 4 | 2 | Quisiera saber donde están vacunando la 2a dos... | 0.0 |


In [25]:
test_df.to_csv("Comentarios.csv",index=False)

In [83]:
df2_test = test_df
# Average the comment sentiment scores for each post
df_post = df2_test.groupby('numero de post')['sentimientos'].mean()
df_post.to_csv("postsentiment.csv", index=True)
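
To see what the `groupby(...).mean()` step computes, here is a small self-contained sketch that uses only the five scores displayed above rather than the full dataset; the variable names `sample` and `post_means` are ours.

```python
import pandas as pd

# The five rows shown in the table above: two comments on post 1, three on post 2.
sample = pd.DataFrame({
    "numero de post": [1, 1, 2, 2, 2],
    "sentimientos": [0.0, 0.524743, -0.578034, 0.892413, 0.0],
})

# One value per post: the mean sentiment score of its comments.
post_means = sample.groupby("numero de post")["sentimientos"].mean()
print(post_means)
```

The result is a Series indexed by `numero de post`, which is why the export above writes the index with `index=True`.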

In [43]:
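# NOTE: `df2_rows` and `keywordDescripcion` are not defined in the cells shown here;
# they presumably hold the rows of `df2` and one NLU keywords response per caption.
test_df2 = df2_rows
df2_rows['key'] = keywordDescripcion
df2_rows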

| Unnamed: 0 | numero de post | fecha | caption | palabrasClave | key |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | 27 de julio de 2021 | ¡Llegan 456,300 nuevas dosis Pfizer! Estas dos... | nuevas dosis | {'usage': {'text_units': 1, 'text_characters':... |
| 1 | 2 | 26 de julio de 2021 | Desglose de corregimientos con más casos en el... | Desglose de corregimientos | {'usage': {'text_units': 1, 'text_characters':... |
| 2 | 3 | 26 de julio de 2021 | Compartimos la actualización de datos sobre #C... | actualización de datos | {'usage': {'text_units': 1, 'text_characters':... |
| 3 | 4 | 26 de julio de 2021 | Comunicado N° 517 #UnPanamáMejor🇵🇦 #ProtégeteP... | UnPanamáMejor🇵🇦 #ProtégetePanamá | {'usage': {'text_units': 1, 'text_characters':... |
| 4 | 5 | 26 de julio de 2021 | Nuevas medidas que empezarán a regir a partir ... | Nuevas medidas | {'usage': {'text_units': 1, 'text_characters':... |
| 5 | 6 | 26 de julio de 2021 | #Herrera \| Tras los barridos realizados en las... | #Herrera | {'usage': {'text_units': 1, 'text_characters':... |
| 6 | 7 | 26 de julio de 2021 | #ComarcaNgäbeBuglé \| La Región de Salud de la ... | Región de Salud de la Comarca | {'usage': {'text_units': 1, 'text_characters':... |
| 7 | 8 | 26 de julio de 2021 | #LosSantos \| Recibimos 5,004 dosis de Pfizer p... | dosis de Pfizer | {'usage': {'text_units': 1, 'text_characters':... |
| 8 | 9 | 26 de julio de 2021 | #Chiriquí \| Con un total de 74,130 dosis aplic... | dosis aplicadas de Pfizer | {'usage': {'text_units': 1, 'text_characters':... |
| 9 | 10 | 26 de julio de 2021 | #Coclé \| Con éxito culmina barrido de Pfizer e... | primeras dosis | {'usage': {'text_units': 1, 'text_characters':... |


In [49]:
for index, row in test_df2.iterrows():
    test_df2.loc[index,"palabrasClave"] = df2_rows.iloc[index]['key']['keywords'][0]['text']
    
test_df2

IndexError: list index out of range
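
The `IndexError` above most likely means that at least one response in the `key` column has an empty `keywords` list, so `[0]` has nothing to return. One way around it is to fall back to a default value when no keyword was extracted. A minimal sketch; the helper name `first_keyword` and the empty-string default are our choices.

```python
def first_keyword(response, default=""):
    """Return the text of the top keyword, or `default` if NLU returned none."""
    keywords = response.get("keywords") or []
    return keywords[0]["text"] if keywords else default


with_keywords = {"keywords": [{"text": "nuevas dosis", "relevance": 0.9, "count": 1}]}
without_keywords = {"keywords": []}

print(first_keyword(with_keywords))
print(repr(first_keyword(without_keywords)))
```

With the dataframe above, this could be applied as `test_df2["palabrasClave"] = test_df2["key"].apply(first_keyword)`, assuming every entry in `key` is a response dictionary.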

In [50]:
test_df2

| Unnamed: 0 | numero de post | fecha | caption | palabrasClave | key |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | 27 de julio de 2021 | ¡Llegan 456,300 nuevas dosis Pfizer! Estas dos... | nuevas dosis | {'usage': {'text_units': 1, 'text_characters':... |
| 1 | 2 | 26 de julio de 2021 | Desglose de corregimientos con más casos en el... | Desglose de corregimientos | {'usage': {'text_units': 1, 'text_characters':... |
| 2 | 3 | 26 de julio de 2021 | Compartimos la actualización de datos sobre #C... | actualización de datos | {'usage': {'text_units': 1, 'text_characters':... |
| 3 | 4 | 26 de julio de 2021 | Comunicado N° 517 #UnPanamáMejor🇵🇦 #ProtégeteP... | UnPanamáMejor🇵🇦 #ProtégetePanamá | {'usage': {'text_units': 1, 'text_characters':... |
| 4 | 5 | 26 de julio de 2021 | Nuevas medidas que empezarán a regir a partir ... | Nuevas medidas | {'usage': {'text_units': 1, 'text_characters':... |
| 5 | 6 | 26 de julio de 2021 | #Herrera \| Tras los barridos realizados en las... | #Herrera | {'usage': {'text_units': 1, 'text_characters':... |
| 6 | 7 | 26 de julio de 2021 | #ComarcaNgäbeBuglé \| La Región de Salud de la ... | Región de Salud de la Comarca | {'usage': {'text_units': 1, 'text_characters':... |
| 7 | 8 | 26 de julio de 2021 | #LosSantos \| Recibimos 5,004 dosis de Pfizer p... | dosis de Pfizer | {'usage': {'text_units': 1, 'text_characters':... |
| 8 | 9 | 26 de julio de 2021 | #Chiriquí \| Con un total de 74,130 dosis aplic... | dosis aplicadas de Pfizer | {'usage': {'text_units': 1, 'text_characters':... |
| 9 | 10 | 26 de julio de 2021 | #Coclé \| Con éxito culmina barrido de Pfizer e... | primeras dosis | {'usage': {'text_units': 1, 'text_characters':... |


In [20]:
test_df2.to_csv("Post",index=False)

Add the `responses` list and the `normalize` result to the `df_rows` dataframe. We can keep using these new data features, but more commonly we'll derive new dataframes for our experiments and modify those instead.

Let's create a new dataframe where we can pull out the column for the `emotion` `anger`, then sort by the highest rating of `anger`.