# Analyzing customer messages - with a custom language model

This notebook demonstrates how to analyze customer messages using Watson Natural Language Understanding with a custom language model.

1. Look up Natural Language Understanding API key and URL
2. Look up custom model ID
3. Analyze a test message
4. Import sample customer messages
5. Analyze sample customer messages
6. Save results in a JSON file as a Project Asset

## Step 1: Look up Natural Language Understanding API key and URL

1. From the **Navigation menu** ( <img style="margin: 0px; padding: 0px; display: inline;" src="https://github.com/spackows/CASCON-2019_NLP-workshops/raw/master/images/nav-menu-icon.png"/> ), under the **Services** group, right-click "Watson Services" and then open the link in a new browser tab
2. In the new Watson services tab, from the **Action** menu beside your Natural Language Understanding instance, select "Manage in IBM Cloud"
3. In the service details page that opens, copy the apikey and URL

In [2]:
apikey = "" # <-- PASTE YOUR APIKEY HERE
url    = "" # <-- PASTE YOUR SERVICE URL HERE
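
Pasting credentials into a notebook makes it easy to share them by accident. As a minimal sketch, assuming you have set two environment variables (the names `NLU_APIKEY` and `NLU_URL` and the helper `load_credentials` are hypothetical, chosen for this example only), the values could be read like this, falling back to whatever was pasted above:

```python
import os

def load_credentials(env=None, fallback_apikey="", fallback_url=""):
    """Return (apikey, url), preferring environment variables over pasted values."""
    env = os.environ if env is None else env
    # NLU_APIKEY and NLU_URL are hypothetical names chosen for this sketch
    apikey = env.get("NLU_APIKEY", fallback_apikey)
    url = env.get("NLU_URL", fallback_url)
    if not apikey or not url:
        raise ValueError("Missing apikey or url: paste them above or set the environment variables")
    return apikey, url
```

Calling `apikey, url = load_credentials( fallback_apikey=apikey, fallback_url=url )` also fails early with a clear message if either value was left empty.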

## Step 2: Look up custom model ID

1. On the **Versions** page in your Knowledge Studio workspace, expand the **Deployed Models** list
2. Copy the **Model ID**

In [3]:
custom_model_id = "" # <-- PASTE THE MODEL ID FROM KNOWLEDGE STUDIO HERE

## Step 3: Analyze a test message

Use the NLU API to analyze a test message, comparing results from the default models to results from the custom model.

**Default models**
- Keywords
- Semantics: "action" words

**Custom model**
- Actions
- Objects
- Technology

See:
- [Watson Natural Language Understanding demo app](https://natural-language-understanding-demo.ng.bluemix.net/)
- [Watson Natural Language Understanding API](https://cloud.ibm.com/apidocs/natural-language-understanding/natural-language-understanding?code=python)
- [Exploring NLU notebook](https://github.com/spackows/CASCON-2019_NLP-workshops/blob/master/notebooks/Notebook-1_Exploring-NLU.ipynb)

In [None]:
!pip install --upgrade "ibm-watson>=4.0.1"

In [5]:
# Instantiate a natural language understanding object
#
import json
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions, SemanticRolesOptions, EntitiesOptions
authenticator = IAMAuthenticator( apikey )
nlu = NaturalLanguageUnderstandingV1( version='2018-11-16', authenticator=authenticator )
nlu.set_service_url( url )

In [6]:
text = "Hi I wanted to know how to export data from python notebooks?"

In [7]:
default_result_keywords = nlu.analyze( text=text, features=Features( keywords=KeywordsOptions() ) ).get_result()
default_result_action = nlu.analyze( text=text, features=Features( semantic_roles=SemanticRolesOptions() ) ).get_result()
custom_result = nlu.analyze( text=text, features=Features( entities=EntitiesOptions( model=custom_model_id ) ) ).get_result()

In [8]:
# Compare results
#
print( 'Text: "' + text + '"' + "\n" )

default_keywords = []
for keyword in default_result_keywords["keywords"]:
    default_keywords.append( keyword["text"] )
default_actions = []
for semantics in default_result_action["semantic_roles"]:
    default_actions.append( semantics["action"]["normalized"] )    
print( "Default keywords: " + "[ '" + "', '".join( default_keywords ) + "' ]" )
print( "Default actions: " + "[ '" + "', '".join( default_actions ) + "' ]" )
print( "\n" )

custom_result_entities = { "action" : [], "docs" : [], "obj" : [], "persona" : [], "tech" : [] }
if( "entities" in custom_result ):
    for entity in custom_result["entities"]:
        entity_type = entity["type"]
        custom_result_entities[entity_type].append(entity["text"])
print( "Custom actions: " + "[ '" + "', '".join( custom_result_entities["action"] ) + "' ]" )
print( "Custom objects: " + "[ '" + "', '".join( custom_result_entities["obj"] ) + "' ]" )
print( "Custom technology: " + "[ '" + "', '".join( custom_result_entities["tech"] ) + "' ]" )

Text: "Hi I wanted to know how to export data from python notebooks?"

Default keywords: [ 'python notebooks', 'data' ]
Default actions: [ 'want', 'want to know', 'to export' ]


Custom actions: [ 'export' ]
Custom objects: [ 'notebooks' ]
Custom technology: [ 'python' ]
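
The comparison cell above declares five entity types up front, so an entity type outside that list would raise a `KeyError` on the `append`. A minimal sketch of a more forgiving grouping follows; the helper name `group_entities_by_type` is hypothetical, and the sample dictionary is hand-written to mirror only the `type` and `text` fields used above, not a full API response.

```python
from collections import defaultdict

def group_entities_by_type(result):
    """Map each entity type to the list of entity texts found in an NLU result dict."""
    grouped = defaultdict(list)
    # "entities" can be absent from the result, so default to an empty list
    for entity in result.get("entities", []):
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)

# Hand-written sample shaped like the custom model results printed above
sample_result = {"entities": [
    {"type": "action", "text": "export"},
    {"type": "obj", "text": "notebooks"},
    {"type": "tech", "text": "python"},
]}
grouped = group_entities_by_type(sample_result)
print(grouped.get("action", []))
```

Reading with `grouped.get( "docs", [] )` keeps the printing code working for types that did not occur in a given message.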


## Step 4: Import sample customer messages

This sample data set is from the Watson Studio Gallery: [Customer messages](https://dataplatform.cloud.ibm.com/exchange/public/entry/view/015ddef6a868441188268a123404f744)

In [9]:
# Import the data into a DataFrame by reading from a URL
#
import pandas as pd
import io
import requests
# Use a separate variable name so the NLU service url from Step 1 is not overwritten
data_url = "https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/015ddef6a868441188268a123404f744/data?accessKey=1e878a1edda3c1c8b3f9defb83e5c84b"
csv_contents = io.StringIO( requests.get( data_url ).content.decode( "utf-8" ) )
all_messages = pd.read_csv( csv_contents, header=None )
all_messages.head()

|   | 0 | 1 |
|---|---|---|
| 0 | excuse me | hi |
| 1 | Good evening | hi |
| 2 | Good morning | hi |
| 3 | good morning | hi |
| 4 | Good morning can you help me upload a shapefile? | question |


In [10]:
# For analysis purposes, we want just the questions and problems, 
# not the short, social messages labeled as "hi". And we want just 
# the text of those questions and problems, not the labels column.
#
questions_problems_only = all_messages[all_messages.iloc[:,1] != "hi" ].reset_index(drop=True)
questions_problems_text = list( questions_problems_only.iloc[:,0] )
questions_problems_text[0:6]

['Good morning can you help me upload a shapefile?',
 'Good night where to place my file to import it into notebook?',
 'hai how can i do analyze with csv file is there any tutorial on it',
 'Having issues setup WML service',
 'hello - Im trying to edit a notebook and the circie just keeps spinning. any idea to get around this?',
 'hello how can i download a csv file from my notebook?']
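
The filter above selects rows by column position: column 1 holds the label and column 0 holds the message text. A small sketch of the same logic on an in-memory sample, using only the standard library so it runs without downloading the data set, is shown below. The three rows are copied from the output above; the helper name `non_greeting_messages` is hypothetical.

```python
import csv
import io

# Three rows copied from the output above: message text, then label
sample_csv = (
    "excuse me,hi\n"
    "Good evening,hi\n"
    "Good morning can you help me upload a shapefile?,question\n"
)

def non_greeting_messages(csv_text, greeting_label="hi"):
    """Return the text of every row whose label is not the greeting label."""
    rows = csv.reader(io.StringIO(csv_text))
    # Skip malformed rows that do not have both a text and a label column
    return [row[0] for row in rows if len(row) >= 2 and row[1] != greeting_label]

print(non_greeting_messages(sample_csv))
```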

## Step 5: Analyze sample customer messages

For our analysis, we'll focus on extracting the entity types defined in the custom model:
- Actions
- Objects
- Technology
- Docs
- Persona

In [29]:
# Loop through all sample customer questions and problems, 
# extracting entities using the custom language model
#
results_list = []
for message in questions_problems_text:
    result = nlu.analyze( text=message, features=Features( entities=EntitiesOptions( model=custom_model_id ) ) ).get_result()
    result_entities = { "action" : [], "docs" : [], "obj" : [], "persona" : [], "tech" : [] }
    if( "entities" in result ):
        for entity in result["entities"]:
            entity_type = entity["type"]
            result_entities[entity_type].append( entity["text"] )
    results_list.append( { "header"   : "-------------------------------------------------------------",
                           "message"  : message,
                           "actions"  : result_entities["action"],
                           "objects"  : result_entities["obj"],
                           "tech"     : result_entities["tech"],
                           "docs"     : result_entities["docs"],
                           "persona"  : result_entities["persona"],
                           "spacer"   : "" } )
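
The loop above stops at the first message for which the API call raises an exception, discarding the work done so far. One possible way to keep going is sketched here; `analyze_all` is a hypothetical helper, and it catches a broad `Exception` on purpose because this sketch does not assume any particular SDK exception type.

```python
def analyze_all(messages, analyze_fn):
    """Call analyze_fn on each message, collecting results and failures separately."""
    results, failures = [], []
    for message in messages:
        try:
            results.append({"message": message, "result": analyze_fn(message)})
        except Exception as err:  # broad on purpose: no SDK exception type is assumed
            failures.append({"message": message, "error": str(err)})
    return results, failures
```

With the objects defined earlier, a call could look like `results, failures = analyze_all( questions_problems_text, lambda m: nlu.analyze( text=m, features=Features( entities=EntitiesOptions( model=custom_model_id ) ) ).get_result() )`, after which `failures` lists the messages worth retrying or inspecting.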

In [30]:
results_list

[{'header': '-------------------------------------------------------------',
  'message': 'Good morning can you help me upload a shapefile?',
  'actions': ['upload'],
  'objects': ['shapefile'],
  'tech': [],
  'docs': [],
  'persona': [],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'Good night where to place my file to import it into notebook?',
  'actions': ['import'],
  'objects': ['notebook'],
  'tech': [],
  'docs': [],
  'persona': [],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'hai how can i do analyze with csv file is there any tutorial on it',
  'actions': ['analyze'],
  'objects': [],
  'tech': [],
  'docs': ['tutorial'],
  'persona': [],
  'spacer': ''},
 {'header': '-------------------------------------------------------------',
  'message': 'Having issues setup WML service',
  'actions': ['setup'],
  'objects': [],
  'tech': ['WML'],
  'docs': [],


### View results

Let's view the action words that occur more than once, listed from most to least frequent.

In [33]:
# Count the custom action words
#
all_actions = {}
for result in results_list:
    actions_arr = result["actions"]
    for action in actions_arr:
        if( action not in all_actions ):
            all_actions[action] = 0
        all_actions[action] += 1

common_actions = dict( [ (k,v) for k,v in all_actions.items() if v > 1 ] )

from collections import OrderedDict
ordered_common_actions = OrderedDict( sorted( common_actions.items(), key=lambda x:x[1], reverse=True ) )
ordered_common_actions

OrderedDict([('create', 10),
             ('upload', 5),
             ('import', 3),
             ('download', 3),
             ('creating', 3),
             ('add', 3),
             ('connection', 3),
             ('connect', 2),
             ('training', 2),
             ('export', 2),
             ('deploy', 2),
             ('signup', 2)])

**Notice the repetition!**

- `create` and `Create` counted separately
- `upload` and `Uploading` counted separately
- ... and many more

We need to _normalize_ these results before they can be most useful.

(We'll do that in another notebook.)
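
To give a sense of what normalizing involves, here is a minimal sketch that handles only letter case and stray whitespace. The helper name `count_normalized` is hypothetical, and this is not the approach taken in the other notebook, just an illustration.

```python
from collections import Counter

def count_normalized(actions):
    """Count action words after lowercasing and trimming whitespace."""
    return Counter(action.strip().lower() for action in actions)

counts = count_normalized(["create", "Create", "upload", "Uploading"])
print(counts.most_common())
```

Lowercasing merges `create` with `Create`, but `upload` and `Uploading` still count as different words. Merging inflected forms needs stemming or lemmatization, which this sketch leaves out.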

## Step 6: Save results

Save NLU custom model results in a JSON file as a Project Asset.

To save the results as an asset in our Watson Studio project, we need a project token.

Follow the steps in this topic: [Adding a project token](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/token.html?audience=wdp&context=data)

***The project token is added in the very first cell at the top of the notebook.  Don't forget to scroll up and run that cell.***

(If you forget to run the inserted cell, you'll see the error <code>name 'project' is not defined</code> when you try to run the next cell below.)

In [17]:
project.save_data( 'NLU-results-custom-model.json', json.dumps( results_list, indent=3 ) , overwrite=True )

{'file_name': 'NLU-results-custom-model.json',
 'message': 'File saved to project storage.',
 'bucket_name': 'cascon2019-donotdelete-pr-gsnhbqe4skdcxh',
 'asset_id': 'b90a9d01-61bd-4b11-8618-75a4715d80e1'}
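
Outside Watson Studio there is no `project` object, so the cell above cannot run. A rough sketch of a fallback follows, assuming a hypothetical helper `save_results` that uses the same `project.save_data` call when a project is available and otherwise writes a local file. The dictionary returned in the local case is invented for this sketch and does not mirror the project library's response.

```python
import json

def save_results(results, file_name, project=None):
    """Save results as JSON through the project object if given, else to a local file."""
    payload = json.dumps(results, indent=3)
    if project is not None:
        # Same call as in the cell above
        return project.save_data(file_name, payload, overwrite=True)
    with open(file_name, "w", encoding="utf-8") as handle:
        handle.write(payload)
    # Return shape invented for this sketch
    return {"file_name": file_name, "message": "File saved locally."}
```

For example, `save_results( results_list, 'NLU-results-custom-model.json' )` writes the file next to the notebook, and passing `project=project` behaves like the original cell.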

Copyright © 2019 IBM. This notebook and its source code are released under the terms of the MIT License.