# Analyzing customer messages

This notebook demonstrates how to analyze customer messages using Watson Natural Language Understanding (NLU) with its default models.

1. Look up Natural Language Understanding API key and URL
2. Analyze a test message
3. Import sample customer messages
4. Analyze sample customer messages
5. Save results in a JSON file as a Project Asset

## Step 1: Look up Natural Language Understanding API key and URL

1. From the **Navigation menu** ( <img style="margin: 0px; padding: 0px; display: inline;" src="https://github.com/spackows/CASCON-2019_NLP-workshops/raw/master/images/nav-menu-icon.png"/> ), under the **Services** group, right-click "Watson Services" and then open the link in a new browser tab
2. In the new Watson services tab, from the **Action** menu beside your Natural Language Understanding instance, select "Manage in IBM Cloud"
3. In the service details page that opens, copy the apikey and URL

In [5]:
apikey = "" # <-- PASTE YOUR APIKEY HERE
url    = "" # <-- PASTE YOUR SERVICE URL HERE
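If you would rather not paste credentials into the notebook, one option is to read them from environment variables. The following is a minimal sketch only: the variable names `NLU_APIKEY` and `NLU_URL` and the helper `load_nlu_credentials` are hypothetical names chosen for this example, not something Watson Studio or the SDK defines.

```python
import os


def load_nlu_credentials(env=os.environ):
    """Return (apikey, url) read from a mapping of environment variables.

    NLU_APIKEY and NLU_URL are hypothetical names used only in this sketch.
    """
    apikey = env.get("NLU_APIKEY", "")
    url = env.get("NLU_URL", "")
    if not apikey or not url:
        # Fail early with a clear message instead of a later authentication error
        raise ValueError("Set NLU_APIKEY and NLU_URL before running the notebook")
    return apikey, url
```

With both variables set, `apikey, url = load_nlu_credentials()` could replace the two assignments in the cell above.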

## Step 2: Analyze a test message

Use the NLU API to extract:
- Sentiment
- Emotion
- Keywords
- Entities
- Categories
- Concepts
- Syntax
- Semantics

See:
- [Watson Natural Language Understanding demo app](https://natural-language-understanding-demo.ng.bluemix.net/)
- [Text analytics features](https://cloud.ibm.com/apidocs/natural-language-understanding?code=python#text-analytics-features)
- [Watson Natural Language Understanding API](https://cloud.ibm.com/apidocs/natural-language-understanding/natural-language-understanding?code=python)

In [None]:
!pip install --upgrade "ibm-watson>=4.0.1"

In [8]:
# Instantiate a natural language understanding object
#
import json
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import (
    Features, ConceptsOptions, EmotionOptions, EntitiesOptions, KeywordsOptions,
    SemanticRolesOptions, SentimentOptions, CategoriesOptions, SyntaxOptions,
    SyntaxOptionsTokens )
authenticator = IAMAuthenticator( apikey )
nlu = NaturalLanguageUnderstandingV1( version='2018-11-16', authenticator=authenticator )
nlu.set_service_url( url )

In [20]:
# Explore every NLU feature option using this test message
text = "My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

In [21]:
# Sentiment
result = nlu.analyze( text=text, features=Features( sentiment=SentimentOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["sentiment"]["document"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

{
   "score": -0.650127,
   "label": "negative"
}


In [22]:
# Emotion
result = nlu.analyze( text=text, features=Features( emotion=EmotionOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["emotion"]["document"]["emotion"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

{
   "sadness": 0.232996,
   "joy": 0.059741,
   "fear": 0.032187,
   "disgust": 0.081764,
   "anger": 0.03555
}
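The emotion result is a plain dictionary of scores, so picking out the strongest emotion takes one line. This sketch uses the scores printed above; `dominant_emotion` is a helper name introduced here for illustration and is not part of the SDK.

```python
def dominant_emotion(emotion_scores):
    """Return the (name, score) pair with the highest score."""
    return max(emotion_scores.items(), key=lambda item: item[1])


# Scores copied from the emotion output for the test message
scores = {
    "sadness": 0.232996,
    "joy": 0.059741,
    "fear": 0.032187,
    "disgust": 0.081764,
    "anger": 0.03555,
}
print(dominant_emotion(scores))  # ('sadness', 0.232996)
```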


In [23]:
# Keywords
result = nlu.analyze( text=text, features=Features( keywords=KeywordsOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["keywords"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "text": "IBM Cloud account",
      "relevance": 0.997981,
      "count": 1
   },
   {
      "text": "friend",
      "relevance": 0.760324,
      "count": 1
   },
   {
      "text": "Sam",
      "relevance": 0.651134,
      "count": 1
   },
   {
      "text": "Dallas",
      "relevance": 0.601985,
      "count": 1
   },
   {
      "text": "Europe",
      "relevance": 0.58449,
      "count": 1
   }
]
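Each keyword carries a relevance score, which can be used to keep only the stronger keywords. This is a sketch under an assumption: the 0.7 cutoff is an arbitrary value picked for the example, and `keywords_above` is a hypothetical helper, not an SDK function.

```python
def keywords_above(keywords, min_relevance):
    """Return the text of every keyword whose relevance meets the cutoff."""
    return [kw["text"] for kw in keywords if kw["relevance"] >= min_relevance]


# Keywords copied from the output for the test message
sample_keywords = [
    {"text": "IBM Cloud account", "relevance": 0.997981, "count": 1},
    {"text": "friend", "relevance": 0.760324, "count": 1},
    {"text": "Sam", "relevance": 0.651134, "count": 1},
    {"text": "Dallas", "relevance": 0.601985, "count": 1},
    {"text": "Europe", "relevance": 0.58449, "count": 1},
]
print(keywords_above(sample_keywords, 0.7))  # ['IBM Cloud account', 'friend']
```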


In [24]:
# Entities
result = nlu.analyze( text=text, features=Features( entities=EntitiesOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["entities"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "type": "Company",
      "text": "IBM",
      "relevance": 0.859385,
      "count": 1
   },
   {
      "type": "Person",
      "text": "Sam",
      "relevance": 0.846087,
      "count": 1
   },
   {
      "type": "Location",
      "text": "Dallas",
      "relevance": 0.786843,
      "disambiguation": {
         "subtype": [
            "City"
         ]
      },
      "count": 1
   },
   {
      "type": "Location",
      "text": "Europe",
      "relevance": 0.760723,
      "disambiguation": {
         "subtype": [
            "MusicalGroup",
            "BroadcastArtist",
            "FilmMusicContributor",
            "Lyricist",
            "MusicalArtist",
            "RecordProducer",
            "Continent"
         ],
         "name": "Europe",
         "dbpedia_resource": "http://dbpedia.org/resource/Europe"
      },
      "count": 1
   }
]
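Because every entity has a `type`, the list can be regrouped by type for easier reading. The sketch below uses the entities printed above, trimmed to the two fields it needs; `entities_by_type` is a name made up for this example.

```python
from collections import defaultdict


def entities_by_type(entities):
    """Group entity texts under their entity type, preserving order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


# Entities from the output for the test message, trimmed to "type" and "text"
sample_entities = [
    {"type": "Company", "text": "IBM"},
    {"type": "Person", "text": "Sam"},
    {"type": "Location", "text": "Dallas"},
    {"type": "Location", "text": "Europe"},
]
print(entities_by_type(sample_entities))
```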


In [26]:
# Categories
result = nlu.analyze( text=text, features=Features( categories=CategoriesOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["categories"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "score": 0.883607,
      "label": "/technology and computing/operating systems"
   },
   {
      "score": 0.814576,
      "label": "/technology and computing/hardware/computer"
   },
   {
      "score": 0.789138,
      "label": "/technology and computing/hardware/computer peripherals"
   }
]
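The category labels in this output are slash-separated paths, from the broadest level to the most specific. A small sketch, assuming that format holds for other labels too, splits a label into its levels; `category_levels` is a hypothetical helper.

```python
def category_levels(label):
    """Split a label such as '/a/b/c' into ['a', 'b', 'c']."""
    return [level for level in label.split("/") if level]


label = "/technology and computing/hardware/computer peripherals"
print(category_levels(label))
# ['technology and computing', 'hardware', 'computer peripherals']
```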


In [27]:
# Concepts
result = nlu.analyze( text=text, features=Features( concepts=ConceptsOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["concepts"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "text": "United States",
      "relevance": 0.916595,
      "dbpedia_resource": "http://dbpedia.org/resource/United_States"
   }
]


In [29]:
# Syntax
result = nlu.analyze( text=text, features=Features( syntax=SyntaxOptions( tokens=SyntaxOptionsTokens( part_of_speech=True ) ) ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["syntax"]["tokens"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "text": "My",
      "part_of_speech": "PRON",
      "location": [
         0,
         2
      ]
   },
   {
      "text": "IBM",
      "part_of_speech": "PROPN",
      "location": [
         3,
         6
      ]
   },
   {
      "text": "Cloud",
      "part_of_speech": "PROPN",
      "location": [
         7,
         12
      ]
   },
   {
      "text": "account",
      "part_of_speech": "NOUN",
      "location": [
         13,
         20
      ]
   },
   {
      "text": "is",
      "part_of_speech": "AUX",
      "location": [
         21,
         23
      ]
   },
   {
      "text": "in",
      "part_of_speech": "ADP",
      "location": [
         24,
         26
      ]
   },
   {
      "text": "Dallas",
      "part_of_speech": "NOUN",
      "location": [
         27,
         33
      ]
   },
   {
      "text": ",",
      "part_of_speech": "PUNCT",
      "location": [


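In the token output shown for this message, each `location` pair behaves like a start and end character offset into the original text, with the end offset excluded. The sketch below checks that reading against three of the printed tokens; `token_text` is a helper name introduced for this example.

```python
text = "My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."


def token_text(source, location):
    """Return the substring of source covered by a [start, end] location pair."""
    start, end = location
    return source[start:end]


# Locations copied from the syntax output for the test message
print(token_text(text, [0, 2]))    # My
print(token_text(text, [13, 20]))  # account
print(token_text(text, [27, 33]))  # Dallas
```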
In [30]:
# Semantics
result = nlu.analyze( text=text, features=Features( semantic_roles=SemanticRolesOptions() ) ).get_result()
print( '"' + text + '"' + "\n" )
print( json.dumps( result["semantic_roles"], indent=3 ) )

"My IBM Cloud account is in Dallas, but I'm in Europe.  My friend, Sam, says that's not supported."

[
   {
      "subject": {
         "text": "My IBM Cloud account"
      },
      "sentence": "My IBM Cloud account is in Dallas, but I'm in Europe.",
      "object": {
         "text": "in Dallas"
      },
      "action": {
         "verb": {
            "text": "be",
            "tense": "present"
         },
         "text": "is",
         "normalized": "be"
      }
   },
   {
      "subject": {
         "text": "I"
      },
      "sentence": "My IBM Cloud account is in Dallas, but I'm in Europe.",
      "action": {
         "verb": {
            "text": "be",
            "tense": "present"
         },
         "text": "am",
         "normalized": "be"
      }
   },
   {
      "subject": {
         "text": "My friend, Sam,"
      },
      "sentence": " My friend, Sam, says that's not supported.",
      "object": {
         "text": "that's not supported"
      },
      "action": {
    

## Step 3: Import sample customer messages

This sample data set is from the Watson Studio Gallery: [Customer messages](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/015ddef6a868441188268a123404f744)

In [2]:
# Import the data into a DataFrame by reading from a URL
#
import pandas as pd
import io
import requests
# Use a separate name so the NLU service URL stored in `url` (Step 1) is not overwritten
data_url = "https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/015ddef6a868441188268a123404f744/data?accessKey=1e878a1edda3c1c8b3f9defb83e5c84b"
csv_contents = io.StringIO( requests.get( data_url ).content.decode( "utf-8" ) )
all_messages = pd.read_csv( csv_contents, header=None )
all_messages.head()

|   | 0 | 1 |
|---|---|---|
| 0 | excuse me | hi |
| 1 | Good evening | hi |
| 2 | Good morning | hi |
| 3 | good morning | hi |
| 4 | Good morning can you help me upload a shapefile? | question |


In [3]:
# For analysis purposes, we want just the questions and problems, 
# not the short, social messages labeled as "hi". And we want just 
# the text of those questions and problems, not the labels column.
#
questions_problems_only = all_messages[all_messages.iloc[:,1] != "hi" ].reset_index(drop=True)
questions_problems_text = list( questions_problems_only.iloc[:,0] )
questions_problems_text[0:6]

['Good morning can you help me upload a shapefile?',
 'Good night where to place my file to import it into notebook?',
 'hai how can i do analyze with csv file is there any tutorial on it',
 'Having issues setup WML service',
 'hello - Im trying to edit a notebook and the circie just keeps spinning. any idea to get around this?',
 'hello how can i download a csv file from my notebook?']

## Step 4: Analyze sample customer messages

For our analysis, we'll focus on extracting:
- Keywords 
- Actions and Objects (from semantic roles)

In [9]:
# Loop through all sample customer questions and problems, extracting keywords and semantic roles
#
results_list = []
for message in questions_problems_text:
    result = nlu.analyze( text=message, features=Features( keywords=KeywordsOptions(), semantic_roles=SemanticRolesOptions() ) ).get_result()
    keywords_arr = []
    for keyword in result["keywords"]:
        keywords_arr.append( keyword["text"] )
    actions_arr = []
    objects_arr = []
    if( "semantic_roles" in result ):
        for semantic_result in result["semantic_roles"]:
            if( "action" in semantic_result ):
                actions_arr.append( semantic_result["action"]["normalized"] )
            if( "object" in semantic_result ):
                objects_arr.append( semantic_result["object"]["text"] )
    results_list.append( { "header"   : "-------------------------------------------------------------",
                           "message"  : message,
                           "keywords" : keywords_arr,
                           "actions"  : actions_arr,
                           "objects"  : objects_arr,
                           "spacer"   : "" } )

In [10]:
results_list

[{'header': '-------------------------------------------------------------',
  'message': 'Good morning can you help me upload a shapefile?',
  'keywords': ['Good morning', 'shapefile'],
  'actions': ['help'],
  'objects': ['me upload a shapefile'],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'Good night where to place my file to import it into notebook?',
  'keywords': ['Good night', 'file', 'notebook'],
  'actions': ['to place', 'to import'],
  'objects': ['to import it into notebook', 'into notebook'],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'hai how can i do analyze with csv file is there any tutorial on it',
  'keywords': ['csv file', 'hai', 'tutorial'],
  'actions': ['do', 'be'],
  'objects': ['analyze', 'any tutorial on it'],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'Having issues setup
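Once every message has its keywords, actions, and objects, a natural next step is to count how often each value occurs across messages. This is a minimal sketch, not part of the original analysis: `count_field` is a hypothetical helper, and the sample holds only the first three entries printed in this step, without the `header` and `spacer` fields.

```python
from collections import Counter


def count_field(results, field):
    """Count how often each value of a list-valued field occurs across results."""
    counter = Counter()
    for entry in results:
        counter.update(entry[field])
    return counter


# First three entries from results_list, trimmed to the fields used here
sample_results = [
    {"message": "Good morning can you help me upload a shapefile?",
     "keywords": ["Good morning", "shapefile"],
     "actions": ["help"],
     "objects": ["me upload a shapefile"]},
    {"message": "Good night where to place my file to import it into notebook?",
     "keywords": ["Good night", "file", "notebook"],
     "actions": ["to place", "to import"],
     "objects": ["to import it into notebook", "into notebook"]},
    {"message": "hai how can i do analyze with csv file is there any tutorial on it",
     "keywords": ["csv file", "hai", "tutorial"],
     "actions": ["do", "be"],
     "objects": ["analyze", "any tutorial on it"]},
]
keyword_counts = count_field(sample_results, "keywords")
print(keyword_counts.most_common(3))
```

On the full `results_list`, `count_field( results_list, "actions" ).most_common( 10 )` would list the ten most frequent normalized actions.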

## Step 5: Save results

Save NLU results in a JSON file as a Project Asset.

To save files as assets in our Watson Studio project, we need a project token.

Follow the steps in this topic: [Adding a project token](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/token.html?audience=wdp&context=data)

***The project token is added in the very first cell at the top of the notebook.  Don't forget to scroll up and run that cell.***

(If you forget to run the inserted cell, you'll see the error <code>name 'project' is not defined</code> when you try to run the next cell below.)

In [14]:
project.save_data( 'NLU-results.json', json.dumps( results_list, indent=3 ) , overwrite=True )

{'file_name': 'NLU-results.json',
 'message': 'File saved to project storage.',
 'bucket_name': 'cascon2019-donotdelete-pr-gsnhbqe4skdcxh',
 'asset_id': '2d1dd212-4308-40ab-88f4-8206620b2a9c'}
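When running outside Watson Studio, where no `project` object exists, the same JSON can be written to a local file instead. The sketch below is an alternative under that assumption, not the project-asset mechanism used above; `save_results_locally` and `load_results` are hypothetical helper names, and the sample entry is shortened from the Step 4 results.

```python
import json
import os
import tempfile


def save_results_locally(results, path):
    """Write results to path as indented JSON and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=3)
    return path


def load_results(path):
    """Read results back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# One entry shortened from the results printed in Step 4
sample_results = [
    {"message": "Good morning can you help me upload a shapefile?",
     "keywords": ["Good morning", "shapefile"]},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = save_results_locally(sample_results, os.path.join(tmp_dir, "NLU-results.json"))
    restored = load_results(saved_path)
print(restored == sample_results)  # True
```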

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.