<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Sentiment Analysis Using Vantage and Open Analytics Framework Analytic GPU Clusters
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<hr>

<p style = 'font-size:28px;font-family:Arial;color:#00233C'><b>Demonstration Overview</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following demonstration illustrates an operationalized end-to-end process of utilizing VantageCloud Lake <b>GPU-enabled Analytic Cluster</b> architecture to run open-source large language models at massive parallelism and scale.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This notebook illustrates the final step of a GPU-augmented analytic pipeline. The previous demonstration covered:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Container Management</b>.  Administrators can create and manage <b>secure, custom</b> runtime containers that will host any number of models and model artifacts to unlock GPU-augmented analytics</li>
    <li><b>Data Prep with Sentiment Extraction</b>. Developers will use the Hugging Face cardiffnlp/twitter-roberta-base-sentiment-latest model to extract user sentiment from Call Center transcripts</li>
    <li><b>Operationalization</b>. Combine the data prep and transformation steps with powerful native <b>ClearScape Analytics</b> functions against the data sets to create a scalable, on-demand Sentiment extraction function</li>
    </ol>
<hr>

<p style = 'font-size:28px;font-family:Arial;color:#00233C'><b>Understanding Customer Sentiment</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this final demonstration, we illustrate several methods for visualizing customer sentiment.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Installation</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If necessary, install the required client packages for the demonstrations. You may need to restart the Jupyter kernel after installation.</p>

In [None]:
%pip install -r requirements.txt

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Imports</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Execute this cell to import the packages used for Teradata automation, machine learning, analytics, utilities, and data management.</p>

In [None]:
import numpy as np
import pandas as pd


import plotly.express as px
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import plotly.subplots as subplots

from teradataml import *
from oaf_utils import *
from teradatasqlalchemy.types import *
from time import sleep
import csv, json, sys, os, warnings  # json is needed to read vars.json below
from collections import OrderedDict

from IPython.display import clear_output, display as ipydisplay
%matplotlib inline
warnings.filterwarnings('ignore')

display.max_rows = 5
display.suppress_vantage_runtime_warnings = True

In [None]:
# load vars json
with open('vars.json', 'r') as f:
    session_vars = json.load(f)

# Database login information
host = session_vars['environment']['host']
username = session_vars['hierarchy']['users']['business_users'][1]['username']
password = session_vars['hierarchy']['users']['business_users'][1]['password']

# UES Authentication information
ues_url = session_vars['environment']['UES_URI']
configure.ues_url = ues_url
pat_token = session_vars['hierarchy']['users']['business_users'][1]['pat_token']
pem_file = session_vars['hierarchy']['users']['business_users'][1]['key_file']


compute_group = session_vars['hierarchy']['users']['business_users'][1]['compute_group']

# container name - set here for easier notebook navigation
### User will also be asked to change it ###
oaf_name = 'oaf_sentiment_demo'
###########################
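
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cell above assumes a particular layout for <code>vars.json</code>. The sketch below reconstructs that layout only from the keys the cell reads. Every value is a placeholder, and the helper name <code>get_business_user</code> is hypothetical. Note that index <code>1</code> selects the second entry of <code>business_users</code>.</p>

```python
import json

# Placeholder document mirroring only the keys read by the notebook cell.
# All values are invented for illustration.
EXAMPLE_VARS = json.loads("""
{
  "environment": {"host": "example-host", "UES_URI": "https://example.invalid/ues"},
  "hierarchy": {"users": {"business_users": [
      {"username": "user_0", "password": "pw_0", "pat_token": "tok_0",
       "key_file": "key_0.pem", "compute_group": "group_0"},
      {"username": "user_1", "password": "pw_1", "pat_token": "tok_1",
       "key_file": "key_1.pem", "compute_group": "group_1"}
  ]}}
}
""")

REQUIRED_USER_KEYS = ("username", "password", "pat_token", "key_file", "compute_group")


def get_business_user(session_vars, index=1):
    """Return the business user at `index`; raise KeyError if a field is missing."""
    user = session_vars["hierarchy"]["users"]["business_users"][index]
    missing = [key for key in REQUIRED_USER_KEYS if key not in user]
    if missing:
        raise KeyError(f"business_users[{index}] is missing: {missing}")
    return user


user = get_business_user(EXAMPLE_VARS)
print(user["username"], user["compute_group"])
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Checking the required fields up front turns a malformed configuration file into one clear error instead of a failure partway through the login cell.</p>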

<hr>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Demo 1 - Inspect the original data and extract sentiment
  <br>
    </p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Connect to the database</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>After connecting, check the cluster status and start the cluster if necessary. The cluster only needs to be running to execute the APPLY sections of the demo.</p>

In [None]:
# check for existing connection
eng = check_and_connect(host=host, username=username, password=password, compute_group = compute_group)
print(eng)

# check cluster status
res = check_cluster_start(compute_group = compute_group)

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.1 - Inspect the Data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Simple DataFrame methods display the data. A teradataml DataFrame behaves much like a pandas DataFrame, with one significant difference: it is a reference to data in the analytic database rather than a local copy. This lets developers perform familiar data management operations on extremely large data sets as if the data were local.</p>

In [None]:
tdf_cust_calls = DataFrame('"demo_ofs"."cust_calls"')
tdf_cust_calls.head()

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.2 - Extract Sentiment using the Hugging Face Model</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the prior demonstration notebook, we created two database <b>views</b> that simplify the end-to-end process of extracting sentiment using an <b>Analytic GPU Cluster</b> hosting the Hugging Face <a href = 'https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest'>twitter-roberta-base-sentiment-latest</a> sentiment model.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The steps performed in that demonstration include:</p>

<table style = 'width:100%;table-layout:fixed;'>
    <tr>
        <td style = 'vertical-align:top' width = '30%'>
           <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
               <li><b>Prepare the environment</b>.  Package the scoring function into a more robust program, and stage it on the remote environment</li>
            <br>
            <br>
               <li><b>Python Pipeline</b>.  Execute the function using Python methods, and commit the resulting transformations to database tables.  Test the native ClearScape Analytics Functions</li>
            <br>
            <br>
               <li><b>Operationalize</b>.  Simplify the analytic pipeline to support ongoing operational transformations, on-demand analytics, and third-party applications</li>
        </ol>
        </td>
        <td width = '20%'></td>
        <td style = 'vertical-align:top'><img src = 'images/OAF_Ops.png' width = 350 ></td>
    </tr>
</table>

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>First, the query that will clean and prepare data:</p>

```sql
REPLACE VIEW prepared_data_V AS
    SELECT id,
    CASE 
            WHEN text IS NULL THEN ' '
            ELSE regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(text , X'0d' , ' ') , X'0a' , ' ') , X'09', ' '), ',', ' '), '"', ' ')
    END text
    FROM demo_ofs.cust_calls;
```
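
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the nested <code>regexp_replace</code> calls easier to read, here is a minimal client-side sketch of the same transformation. It assumes each call replaces every occurrence of its character, and the function name <code>prepare_text</code> is made up for illustration. The characters removed (carriage return, line feed, tab, comma, double quote) are presumably the ones that would break the comma-delimited rows passed to the script.</p>

```python
import re

# Characters the view replaces with a space:
# carriage return (0d), line feed (0a), tab (09), comma, and double quote.
_UNSAFE = re.compile(r'[\r\n\t,"]')


def prepare_text(text):
    """Mirror the CASE expression: NULL becomes ' ', unsafe characters become spaces."""
    if text is None:
        return ' '
    return _UNSAFE.sub(' ', text)


print(repr(prepare_text('Hi,\tI said "no"\r\nthanks')))
print(repr(prepare_text(None)))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each unsafe character is replaced by a space rather than deleted, so word boundaries in the transcript are preserved.</p>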
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next, the query that will execute the sentiment extraction functions in the cluster:</p>

```sql
REPLACE VIEW simplified_sentiment_V AS
    SELECT * FROM Apply(
        ON prepared_data_V
        PARTITION BY ANY

        returns(id BIGINT, label VARCHAR(100), score FLOAT) 
        USING

        APPLY_COMMAND('python Sentiment_Extractor_twitter_roberta_base.py')
        ENVIRONMENT('environment_name')
        STYLE('csv')
        delimiter(',') 
    ) as d;
```
<hr>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Rerun the queries to create the views:</b></p>

In [None]:
qry = '''
REPLACE VIEW prepared_data_V AS
    SELECT id,
    CASE 
            WHEN text IS NULL THEN ' '
            ELSE regexp_replace(regexp_replace(regexp_replace(regexp_replace(regexp_replace(text , X'0d' , ' ') , X'0a' , ' ') , X'09', ' '), ',', ' '), '"', ' ')
    END text
    FROM demo_ofs.cust_calls;
'''
execute_sql(qry)

qry = f'''
REPLACE VIEW simplified_sentiment_V AS
    SELECT * FROM Apply(
        ON prepared_data_V
        PARTITION BY ANY

        returns(id BIGINT, label VARCHAR(100), score FLOAT) 
        USING

        APPLY_COMMAND('python Sentiment_Extractor_twitter_roberta_base.py')
        ENVIRONMENT('{oaf_name}')
        STYLE('csv')
        delimiter(',') 
    ) as d;
'''
execute_sql(qry)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Use the view to inspect sentiment</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note that querying the view executes the Python function and model, so it may take a few seconds.</p>

In [None]:
DataFrame('simplified_sentiment_V')

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.3 - Persist the data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The Teradata Python Package has simple methods for persisting data. Persisting the results lets you extract the sentiment once and then run normal analytic processing against the stored table.</p>

In [None]:
copy_to_sql(DataFrame('simplified_sentiment_V'), table_name = 'customer_sentiment', temporary = True, if_exists = 'replace')
tdf_sentiment = DataFrame('customer_sentiment')
tdf_sentiment

<hr>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Demo 2 - Analyze customer sentiment
  <br>
    </p>
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Teradata Vantage <b>ClearScape Analytics Functions</b> offer powerful in-database capabilities for analyzing data at scale.  This sentiment data can be combined with other analytics, or used as an additional feature for churn prediction, next-best action, or any other analytic outcome.  Here we will perform some simple exploration and visualizations.  Note some visualizations return data to the client machine.  Depending on the number of records in the database, it may be more efficient to use additional in-database analytic functions.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.1 - Sentiment Distribution</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use simple methods on the Teradata DataFrame to group and count records in the database.</p>

In [None]:
tdf_sentiment.groupby('label').count()

In [None]:
# Create bar graph using Plotly Express
df_gb = tdf_sentiment.groupby('label').count().to_pandas()
fig = px.bar(df_gb, x='label', y='count_id', color='label',
             labels={'count_id': 'Number of Occurrences', 'label': 'label'})

# Show the plot
fig.show()

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.2 - Word Clouds</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use client-side visualization tools to see the common terms in negative, neutral, and positive calls. First, join the original call transcripts to the sentiment results, then filter the joined data by label.</p>

In [None]:
tdf_cust_calls = DataFrame('"demo_ofs"."cust_calls"')
tdf_joined = tdf_sentiment.join(tdf_cust_calls, on = 'id', lprefix = 'l').drop(labels = 'l_id', axis = 1)
tdf_joined

In [None]:
neg = tdf_joined[tdf_joined['label'] == 'negative'].to_pandas()
neg_text = ' '.join(neg['text'])

# Remove every 'X' character from the text
modified_string = neg_text.replace('X', '')

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout()
plt.axis("off")
plt.show()

In [None]:
neu = tdf_joined[tdf_joined['label'] == 'neutral'].to_pandas()
neu_text = ' '.join(neu['text'])

# Remove every 'X' character from the text
modified_string = neu_text.replace('X', '')

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout()
plt.axis("off")
plt.show()

In [None]:
pos = tdf_joined[tdf_joined['label'] == 'positive'].to_pandas()
pos_text = ' '.join(pos['text'])

# Remove every 'X' character from the text
modified_string = pos_text.replace('X', '')

wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.tight_layout()
plt.axis("off")
plt.show()
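
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a lightweight cross-check of what a word cloud is showing, the sketch below counts terms using only the standard library. It is a simplification: WordCloud applies its own tokenization and stop-word list, so the rankings will not match exactly. The function name <code>top_terms</code>, the <code>min_len</code> filter, and the sample sentences are all made up for illustration.</p>

```python
import re
from collections import Counter


def top_terms(texts, n=2, min_len=4):
    """Return the n most common lowercase alphabetic tokens of at least min_len letters."""
    counts = Counter()
    for text in texts:
        # Same 'X' removal as the notebook cells above.
        cleaned = text.replace('X', '')
        tokens = re.findall(r"[a-z]+", cleaned.lower())
        counts.update(token for token in tokens if len(token) >= min_len)
    return counts.most_common(n)


sample = [
    "My bill is wrong and the bill was late",
    "Account XXXX has a wrong bill",
]
print(top_terms(sample))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the notebook, the same function could be applied to <code>neg['text']</code>, <code>neu['text']</code>, or <code>pos['text']</code> after the <code>to_pandas()</code> call.</p>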

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.3 - Sentiment Strength - in-database analysis</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use the native <b>TD_Histogram</b> function to analyze the distribution of strength scores for the negative, neutral, and positive sentiment labels.</p>
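
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For intuition about <code>method_type = 'STURGES'</code>: the textbook Sturges rule derives the number of bins from the row count <i>n</i> as ceil(log2(<i>n</i>)) + 1. The sketch below implements that textbook rule only. The exact rounding and bin-edge handling inside the database function may differ, so treat it as an approximation; the function names are invented for illustration.</p>

```python
import math


def sturges_bins(n):
    """Textbook Sturges rule: number of bins for n observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(math.log2(n)) + 1


def sturges_bin_width(min_value, max_value, n):
    """Equal-width bins spanning [min_value, max_value]."""
    return (max_value - min_value) / sturges_bins(n)


# 100 scores in the range 0.0 to 1.0
print(sturges_bins(100), sturges_bin_width(0.0, 1.0, 100))  # prints: 8 0.125
```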

In [None]:
res = Histogram(data = tdf_joined[tdf_joined['label'] == 'negative'], target_columns = 'score', method_type = 'STURGES').result
res

In [None]:
# Create bar graph using Plotly Express
neg_hist = res.to_pandas()
fig = px.bar(neg_hist, x='MinValue', y='CountOfValues',
             labels={'CountOfValues': 'Number of Occurrences', 'MinValue': 'score'},
             title = 'Negative Sentiment Score Distribution',
             color_discrete_sequence = ['red'])

# Show the plot
fig.show()

In [None]:
res = Histogram(data = tdf_joined[tdf_joined['label'] == 'neutral'], target_columns = 'score', method_type = 'STURGES').result
res

In [None]:
# Create bar graph using Plotly Express
neu_hist = res.to_pandas()
fig = px.bar(neu_hist, x='MinValue', y='CountOfValues',
             labels={'CountOfValues': 'Number of Occurrences', 'MinValue': 'score'},
             title = 'Neutral Sentiment Score Distribution',
             color_discrete_sequence = ['yellow'])

# Show the plot
fig.show()

In [None]:
res = Histogram(data = tdf_joined[tdf_joined['label'] == 'positive'], target_columns = 'score', method_type = 'STURGES').result
res

In [None]:
# Create bar graph using Plotly Express
pos_hist = res.to_pandas()
fig = px.bar(pos_hist, x='MinValue', y='CountOfValues',
             labels={'CountOfValues': 'Number of Occurrences', 'MinValue': 'score'},
             title = 'Positive Sentiment Score Distribution',
             color_discrete_sequence = ['green'])

# Show the plot
fig.show()

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code drops the views created above and then closes the database connection.</p>

In [None]:
execute_sql('DROP VIEW prepared_data_V;');
execute_sql('DROP VIEW simplified_sentiment_V;');

In [None]:
remove_context()