# Custom negativity intent recognizer

This notebook shows how to prepare a training dataset for custom entity recognition in Amazon Comprehend, using the custom keywords generated by our word2vec model.

We will build a custom negativity intent recognizer based on keywords that are semantically similar to the word "frustrated".



In [1]:
# library imports
import re
import numpy as np
import pandas as pd
import matplotlib
import csv


In this example we will re-use the dataset that we wrangled and filtered for the telco domain. 

In [2]:
colnames=['text'] 
tweets = pd.read_csv('./data/tweet_telco.csv',encoding='utf-8',names=colnames, header=None)
print(tweets.shape)
tweets.head()

(32716, 1)


                                                text
0  @sprintcare is the worst customer service | @1...
1  @sprintcare is the worst customer service | @1...
2  @sprintcare is the worst customer service | @1...
3  @115714 y’all lie about your “great” connectio...
4  @115714 whenever I contact customer support, t...


<a id='data-wrangling'></a>

To create our dataset, we need to provide an entity list for our new entity type, NEGATIVE.

To find relevant entities, we will use our custom word2vec model to look up words that are semantically similar to "frustrated". See the blazingtext_word2vec_telco_tweets.ipynb notebook for how the keywords were generated.

In [3]:
negative_words = ['Really', 'cheated', 'annoyed', 'unhelpful', 'frustrated', 'upset' , 'unhappy', 'angry', 'badly', 'bad', 'surprised', 'sadly', 'dissatisfied', 'disappointed', 'disgusted']

df_entity_list = pd.DataFrame(negative_words, columns=['Text'])
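
Before training, it can help to check how often each keyword actually appears in the tweets, because a keyword that never occurs gives the recognizer nothing to learn from. The following is a minimal sketch and not part of the original notebook: `entity_coverage` is a hypothetical helper, the sample tweets are invented, and matching is assumed to be whole-word and case-sensitive.

```python
import re


def entity_coverage(texts, entity_words):
    """Count, for each entity word, how many documents contain it as a whole word.

    Hypothetical helper; matching is case-sensitive by assumption.
    """
    counts = {}
    for word in entity_words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        counts[word] = sum(1 for text in texts if pattern.search(text))
    return counts


# Invented stand-ins for tweets['text'] and negative_words
sample_tweets = [
    "I am so frustrated with this network",
    "frustrated and upset, really",
    "Really bad service today",
]
coverage = entity_coverage(sample_tweets, ["frustrated", "upset", "Really", "angry"])
print(coverage)
```

In the notebook, the same check could be run as `entity_coverage(tweets['text'], negative_words)`. Keywords with a count of zero are candidates for removal from the list.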



Let's add another column with our class label. This is a required part of the Amazon Comprehend training dataset.

More information can be found here.

https://docs.aws.amazon.com/comprehend/latest/dg/cer-entity-list.html


In [5]:
df_entity_list['Type'] = 'NEGATIVE'


Let's create our training file.

In [8]:
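# header=False writes one tweet per line with no header row, matching the output below
tweets['text'].to_csv('./data/raw_negative.csv', encoding='utf-8', index=False, header=False)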


In [21]:
!head ./data/raw_negative.csv

"@115911 @TMobileHelp y’all just pissed me off and I’m highly disappointed with the customer service been with y’all over 16 years FIX THIS! | @117690 Hey bro! Send me a DM, I got you 100% - https://t.co/UOOUCn8nWm *JeremyKelley"
"@115714 sucks so bad. Always switching to roaming so they can charge me whatever the hell they want. Get a real network! | @117883 We'd be more than willing to take a look at the area for you if you can DM us a good intersection, J. :) -CDE"
"Mad at Sprint, daughter had her phone stolen and we are getting the rum around trying to get her a new one. #Sprint  #frustrated | @117885 This is concerning to us. Please, send us a DM with more details of your issue for us to assist you further. -DP"
"@115911 @TMobileHelp terrible customer service. 3 dysfunctional refurbished phones in 2 weeks. @115913 #badservice | @120051 Ni Nicos, thank you for reaching out to us. I replied to your DM and look forward to working with you. *JasonYaddow"
"@115913 so upset with @11

Let's create the entity list file.

In [9]:
df_entity_list.to_csv('./data/entity_negative_list.csv', encoding='utf-8', index=False)


In [10]:
!head ./data/entity_negative_list.csv

Text,Type
Really,NEGATIVE
cheated,NEGATIVE
annoyed,NEGATIVE
unhelpful,NEGATIVE
frustrated,NEGATIVE
upset,NEGATIVE
unhappy,NEGATIVE
angry,NEGATIVE
badly,NEGATIVE


Let's create a test file from our original telco tweet dataset.

In [11]:
# Write the last 10,000 tweets as test input, one per line with no header row
tweets['text'].tail(10000).to_csv('./data/telco_device_test.csv', encoding='utf-8', index=False, header=False)
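
Note that the test file above is taken from the same tweets that were written to the training file. If a test set that the recognizer has never seen is wanted, one option is to split the data before writing either file. The sketch below shows the idea on an invented list; `split_holdout` is a hypothetical helper, not something the notebook defines.

```python
def split_holdout(rows, n_test):
    """Split a sequence into (train, test), holding out the last n_test rows."""
    if not 0 < n_test < len(rows):
        raise ValueError("n_test must be between 1 and len(rows) - 1")
    return rows[:-n_test], rows[-n_test:]


# Invented stand-in for the tweet texts
sample_rows = [f"tweet {i}" for i in range(10)]
train_rows, test_rows = split_holdout(sample_rows, 3)
print(len(train_rows), len(test_rows))
```

With the notebook's DataFrame, the analogous calls would be `tweets['text'].iloc[:-10000]` for training and `tweets['text'].tail(10000)` for testing.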

## Training our model

I am going to use the Amazon Comprehend console to submit our custom entity recognizer training job. See the first notebook for details.
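
For readers who prefer to script this step, the request can also be sent with boto3's `create_entity_recognizer`. The sketch below only builds the request; `build_recognizer_request` is a hypothetical helper, and the recognizer name, role ARN, and S3 locations are placeholders that would need to be replaced with real values after uploading the two CSV files.

```python
def build_recognizer_request(name, role_arn, documents_uri, entity_list_uri,
                             entity_type="NEGATIVE", language_code="en"):
    """Build keyword arguments for Comprehend's create_entity_recognizer call."""
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
        "InputDataConfig": {
            "EntityTypes": [{"Type": entity_type}],
            "Documents": {"S3Uri": documents_uri},
            "EntityList": {"S3Uri": entity_list_uri},
        },
    }


# All values below are placeholders, not the ones used in this notebook
request = build_recognizer_request(
    name="Negativity",
    role_arn="arn:aws:iam::123456789012:role/ExampleComprehendRole",
    documents_uri="s3://example-bucket/raw_negative.csv",
    entity_list_uri="s3://example-bucket/entity_negative_list.csv",
)
print(request["InputDataConfig"])

# Submitting needs AWS credentials and the uploaded files, so it is left commented out:
# import boto3
# comprehend = boto3.client("comprehend", region_name="us-east-1")
# response = comprehend.create_entity_recognizer(**request)
```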




## Testing our custom entity model

Let's invoke the Comprehend API to run our test job from the test file we prepared earlier.

In [None]:
aws comprehend start-entities-detection-job \
     --entity-recognizer-arn "arn:aws:comprehend:us-east-1:202860692096:entity-recognizer/Negativity-copy" \
     --job-name Test \
     --data-access-role-arn "arn:aws:iam::202860692096:role/service-role/AmazonComprehendServiceRole-AmazonComprehendServiceRole" \
     --language-code en \
     --input-data-config "S3Uri=s3://data-phi/telco_random.csv" \
     --output-data-config "S3Uri=s3://data-phi/telco_negative" \
     --region "us-east-1"
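
The same job can be started from Python with boto3's `start_entities_detection_job`. The sketch below reuses the ARNs and S3 locations from the command above and only builds the request; `build_detection_request` is a hypothetical helper. It also sets `InputFormat` to `ONE_DOC_PER_LINE`, which the command above does not do: that is an assumption based on the test file holding one tweet per line.

```python
def build_detection_request(recognizer_arn, role_arn, input_uri, output_uri,
                            job_name="Test", language_code="en",
                            input_format="ONE_DOC_PER_LINE"):
    """Build keyword arguments for Comprehend's start_entities_detection_job call."""
    return {
        "JobName": job_name,
        "EntityRecognizerArn": recognizer_arn,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
        # Assumption: each line of the input file is a separate document
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": input_format},
        "OutputDataConfig": {"S3Uri": output_uri},
    }


job_request = build_detection_request(
    recognizer_arn="arn:aws:comprehend:us-east-1:202860692096:entity-recognizer/Negativity-copy",
    role_arn="arn:aws:iam::202860692096:role/service-role/AmazonComprehendServiceRole-AmazonComprehendServiceRole",
    input_uri="s3://data-phi/telco_random.csv",
    output_uri="s3://data-phi/telco_negative",
)
print(job_request["InputDataConfig"])

# Submitting needs AWS credentials, so it is left commented out:
# import boto3
# comprehend = boto3.client("comprehend", region_name="us-east-1")
# response = comprehend.start_entities_detection_job(**job_request)
```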

The output will be a JSON file written to the S3 location specified in --output-data-config.
You can use AWS Glue and Amazon Athena to inspect the results.
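
For a quick look without Glue or Athena, the results can also be summarized in plain Python once the output has been downloaded and extracted. The sketch below assumes one JSON object per line with an `Entities` list whose items carry `Text`, `Type`, and `Score` fields. The sample lines are invented for illustration, and `count_entities` and its `min_score` threshold are hypothetical.

```python
import json
from collections import Counter


def count_entities(json_lines, entity_type="NEGATIVE", min_score=0.5):
    """Count detected entity texts of one type, ignoring low-confidence hits."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for entity in record.get("Entities", []):
            if entity.get("Type") == entity_type and entity.get("Score", 0.0) >= min_score:
                counts[entity["Text"].lower()] += 1
    return counts


# Invented sample lines in the assumed output layout
sample_output = [
    '{"File": "telco_random.csv", "Line": 0, "Entities": [{"BeginOffset": 9, "EndOffset": 14, "Score": 0.98, "Text": "upset", "Type": "NEGATIVE"}]}',
    '{"File": "telco_random.csv", "Line": 1, "Entities": []}',
    '{"File": "telco_random.csv", "Line": 2, "Entities": [{"BeginOffset": 0, "EndOffset": 5, "Score": 0.31, "Text": "Upset", "Type": "NEGATIVE"}, {"BeginOffset": 10, "EndOffset": 20, "Score": 0.91, "Text": "frustrated", "Type": "NEGATIVE"}]}',
]
entity_counts = count_entities(sample_output)
print(entity_counts.most_common())
```

Passing an open file handle instead of `sample_output` would process a downloaded results file line by line.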

