<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Sentiment Analysis Using Vantage and Azure OpenAI
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233c'><b>Introduction:</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Sentiment analysis using <b>Teradata Vantage</b> and the advanced <b>Azure OpenAI</b> model involves leveraging cutting-edge technologies to extract insights from unstructured data. This process empowers businesses to swiftly identify and address customer concerns, enhancing overall customer satisfaction and loyalty.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Key Features:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Polarity Classification:</b> Classifies text as positive, negative, or neutral.</li>
    <li><b>Emotion Detection:</b> Identifies specific emotions such as happiness, anger, and sadness, capturing subtle differences in tone and language.</li>
    <li><b>Aspect-Based Sentiment Analysis:</b>  Analyzes sentiment towards specific features or aspects of a product or service.</li>
    <li><b>Fine-Grained Sentiment Analysis:</b> Provides detailed sentiment analysis at the phrase or clause level.</li>
    <li><b>Subjectivity Classification:</b> Distinguishes between objective and subjective text.</li>

</ul>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Benefits:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Improved Customer Satisfaction:</b> Enhances customer experience by addressing concerns and improving products.</li>
    <li><b>Competitive Advantage:</b> Provides valuable insights to stay ahead of competitors.</li> 
    <li><b>Objective Insights:</b> Offers unbiased and accurate sentiment analysis.</li>
    <li><b>Real-Time Decision Making:</b> Enables swift responses to customer concerns and market trends.</li>
    <li><b>Scalability:</b> Handles large volumes of data efficiently.</li>
</ul>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'>Experience the transformative power of Generative AI in complaints classification.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Configuring the environment</li>
    <li>Connect to Vantage</li>
    <li>Configuring Azure OpenAI</li>
    <li>Complaints Sentiment Analysis</li>
    <li>Cleanup</li>
</ol>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>
<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.1 Downloading and installing additional software needed</b></p>

In [None]:
%%capture
!pip install -r requirements.txt --upgrade --quiet

<div class="alert alert-block alert-info">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>Please restart the kernel after executing these two lines. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Import the required libraries</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
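# Standard library (used for credential prompts and response parsing)
import getpass
import os
import re

# Data manipulation and analysis
import numpy as np
import pandas as pd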

# Visualization
import plotly.express as px
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Progress bar
from tqdm import tqdm

# Machine learning and other utilities from Teradata
from teradataml import *
from sqlalchemy import func

# Requests
import requests

# Display settings
display.max_rows = 5
pd.set_option('display.max_colwidth', None)

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=Sentiment_Analysis_OpenAI.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.1 Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_cloud');"        # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_local');"        # Takes 2 minutes

<hr style="height:2px;border:none;background-color:#00233C;">
<b style='font-size:20px;font-family:Arial;color:#00233C'>3. Configuring Azure OpenAI</b>
<p style='font-size:16px;font-family:Arial;color:#00233C'>Before proceeding, you need to provide the following information:</p>
<ul style='font-size:16px;font-family:Arial;color:#00233C'>
<li><b>Endpoint</b>: Enter your Azure OpenAI deployment endpoint.</li>
<li><b>Azure OpenAI API Key</b>: Enter your Azure OpenAI API Key.</li>
</ul>
<p style='font-size:16px;font-family:Arial;color:#00233C'>If you haven't retrieved your API Key and Endpoint yet, follow the instructions <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python" target="_blank" style="color:#0066CC;text-decoration:none;"><b>here</b></a>.</p>
<p style='font-size:16px;font-family:Arial;color:#00233C'>Don't have an Azure OpenAI resource yet? Follow this guide:</p>
<a href="./Azure-OpenAI.ipynb" style="text-decoration:none;" target="_blank">
    <button style="font-size:16px;font-family:Arial;color:#fff;background-color:#00233C;border:none;border-radius:5px;cursor:pointer;height:50px;line-height:50px;display:flex;align-items:center;">
        Azure OpenAI Guide <span style="margin-left:10px;">&#8658;</span>
    </button>
</a>
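
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The request cells in this notebook post directly to the endpoint value, so it most likely needs to be the full chat completions URL of the deployment rather than only the base URL of the resource. Below is a minimal sketch of how such a URL is typically assembled. The helper name, the resource name, and the API version are placeholders for illustration, not values taken from this demo; check the Azure portal for the values that apply to your deployment.</p>

```python
from urllib.parse import quote, urlencode


def build_chat_completions_url(resource_name, deployment_name, api_version):
    """Assemble an Azure OpenAI chat completions URL (hypothetical helper)."""
    base = f"https://{resource_name}.openai.azure.com"
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/chat/completions"
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"


# Placeholder values: replace with your own resource, deployment, and API version
example_url = build_chat_completions_url("my-resource", "gpt-4o-mini", "2024-02-15-preview")
print(example_url)
```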

In [None]:
# Prompt user for Azure OpenAI endpoint securely
os.environ["ENDPOINT"] = getpass.getpass(prompt="\nPlease enter your Azure OpenAI endpoint(gpt-4o-mini): ")
# Prompt user for Azure OpenAI API key securely
os.environ["API_KEY"] = getpass.getpass(prompt="\nPlease enter your Azure OpenAI API key(gpt-4o-mini): ")

<hr style="height:1px;border:none;background-color:#00233C;">
<b style = 'font-size:18px;font-family:Arial;color:#00233C'>3.1 Initialize the Azure OpenAI API request</b>

In [None]:
headers = {
    "Content-Type": "application/json",
    "api-key": os.environ["API_KEY"],
}

def get_payload(complaint):
    
    prompt = f'''
        User prompt:
        The following is text from a review:

        “{complaint}”

        Categorize the review as one of the following: Positive, Negative, Neutral

        My output comes in the format:
        Sentiment:
        Reasoning:
    '''
    
    # Payload for the request
    payload = {
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "You are an assistant that analyzes sentiment of a text and gives reasoning for the categorization as well.\n"
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": prompt
            }
          ]
        }
      ],
      "temperature": 0.7,
      "top_p": 1,
      "max_tokens": 512
    }

    return payload

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Complaints Sentiment Analysis</b>
<p style="font-size:16px;font-family:Arial;color:#00233C">We'll analyze the sentiments of a sample of customer complaints data.</p>

In [None]:
tdf = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints'))
tdf

In [None]:
df = tdf.to_pandas(num_rows = 20)
df['Sentiment'] = ""
df['Reasoning with Chain of Thought'] = ""

In [None]:
for i in tqdm(df.index):
    # Send the complaint text to the Azure OpenAI endpoint
    try:
        response = requests.post(
            os.environ["ENDPOINT"],
            headers=headers,
            json=get_payload(df.at[i, 'consumer_complaint_narrative']),
        )
        response.raise_for_status()
    except requests.RequestException as e:
        raise SystemExit(f"Failed to make the request. Error: {e}")

    output = response.json()['choices'][0]['message']['content']

    # Extract the two labelled fields; rows where the model did not answer
    # in the expected format keep their empty defaults
    category = re.search('Sentiment:(.*)', output)
    reasoning = re.search('Reasoning:(.*)', output)
    if category and reasoning:
        df.at[i, 'Sentiment'] = category.group(1).strip()
        df.at[i, 'Reasoning with Chain of Thought'] = reasoning.group(1).strip()
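
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The parsing step can be tried in isolation, without calling the API. The sketch below is one possible variant, not the notebook's own implementation: it assumes the reply follows the <b>Sentiment:</b> / <b>Reasoning:</b> layout requested in the prompt, and it also keeps reasoning that spans several lines. The function name and the sample reply are made up for illustration.</p>

```python
import re


def parse_sentiment_reply(reply):
    """Split a 'Sentiment: ... Reasoning: ...' reply into (sentiment, reasoning).

    Returns (None, None) when the expected labels are missing.
    """
    match = re.search(r"Sentiment:\s*(.*?)\s*Reasoning:\s*(.*)", reply, flags=re.DOTALL)
    if match is None:
        return None, None
    return match.group(1).strip(), match.group(2).strip()


# Made-up reply text, shaped like the format the prompt asks for
sample_reply = (
    "Sentiment: Negative\n"
    "Reasoning: The customer reports repeated billing errors\n"
    "and no response from support."
)
print(parse_sentiment_reply(sample_reply))
print(parse_sentiment_reply("I cannot classify this text."))
```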

In [None]:
df[['complaint_id', 'consumer_complaint_narrative', 'Sentiment', 'Reasoning with Chain of Thought']].head(5)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Now the results can be saved back to Vantage.</p>

In [None]:
copy_to_sql(df = df, table_name = 'complaints_sentiment', if_exists = 'replace')

In [None]:
sentiment_df = DataFrame('complaints_sentiment')
sentiment_df = sentiment_df.assign(date_received = sentiment_df.date_received.cast(type_=DATE))
print('Before: ', sentiment_df.shape)
sentiment_df = sentiment_df.loc[sentiment_df.Sentiment.isin(['Positive', 'Negative', 'Neutral'])]
print('After: ', sentiment_df.shape)
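
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The filter above keeps only exact matches, so a label such as <i>negative.</i> would be dropped. Whether the model actually produces such variants is an assumption. If it does, a small client-side normalization step along these lines could be applied to the pandas column before the results are saved. The function and constant names are hypothetical.</p>

```python
import string

VALID_LABELS = ("Positive", "Negative", "Neutral")


def normalize_label(raw_label):
    """Map a raw label such as '**negative.**' to 'Negative', or None if unrecognized."""
    cleaned = raw_label.strip(string.punctuation + string.whitespace).lower()
    for label in VALID_LABELS:
        if cleaned == label.lower():
            return label
    return None


for raw in ["Negative", " neutral ", "**Positive.**", "Mixed"]:
    print(repr(raw), "->", normalize_label(raw))
```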

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.1 Consumer Sentiments Prediction vs Occurrences</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'>A bar chart showing how many complaints were predicted in each sentiment category (positive, negative, neutral). It helps identify trends, patterns, and areas for improvement, enabling data-driven decision making.</p>

In [None]:
from IPython.display import display, Markdown

def display_helper(msg):
    """Render a message as a styled note in the notebook output."""
    return display(Markdown(
        f"""<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b>
<i>{msg}</i></p>
</div>"""))

In [None]:
from collections import Counter
data = Counter(sentiment_df[['Sentiment']].get_values().flatten())

# Convert Counter data to DataFrame
df = pd.DataFrame.from_dict(data, orient='index', columns=['Count']).reset_index()

# Rename columns
df.columns = ['Sentiment', 'Count']

# Create bar graph using Plotly Express
fig = px.bar(df, x='Sentiment', y='Count', color='Sentiment',
             labels={'Count': 'Number of Occurrences', 'Sentiment': 'Sentiment'})

# Show the plot
fig.show()

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.2 Word Cloud for Negative Consumer Sentiment Prediction</b></p>

<p style='font-size:16px;font-family:Arial;color:#00233c'>The word cloud below gives a snapshot of <b>negative consumer complaints sentiment</b>. It highlights the most frequently occurring words and pain points in customer feedback, helping businesses to:</p>
<ol style='font-size:16px;font-family:Arial;color:#00233c'>
    <li>Identify trends and sentiment patterns</li>
    <li>Pinpoint areas for improvement</li>
    <li>Make data-driven decisions to enhance customer satisfaction and loyalty</li>
</ol>
<p style='font-size:16px;font-family:Arial;color:#00233c'>By leveraging this word cloud, businesses can proactively address customer concerns, refine their products and services, and gain a deeper understanding of their customers' needs and preferences.</p>

In [None]:
neg = sentiment_df[sentiment_df['Sentiment'] == 'Negative'].to_pandas()
neg_text = ' '.join(neg['consumer_complaint_narrative'])

# Remove all 'X' characters, which mask redacted details in the narratives
modified_string = neg_text.replace('X', '')

if len(modified_string) > 0:
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

    # Display the word cloud
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.tight_layout()
    plt.axis("off")
    plt.show()
else:
    display_helper("We included positive, negative, and neutral categories to cover all bases. But in this sample, it's possible that none of the complaints are actually negative.")
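
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Removing every capital <b>X</b> also alters ordinary words and abbreviations that contain one (for example, "TX" becomes "T"). A narrower alternative is sketched below. It assumes that redactions appear as runs of two or more X characters, optionally joined by slashes as in masked dates; the pattern and the function name are illustrative and not part of the original demo.</p>

```python
import re

# Runs of two or more X, optionally chained with slashes (e.g. XX/XX/XXXX)
REDACTION = re.compile(r"\bX{2,}(?:/X{2,})*\b")


def strip_redactions(text):
    """Remove masked tokens and collapse the whitespace they leave behind."""
    without_masks = REDACTION.sub(" ", text)
    return re.sub(r"\s+", " ", without_masks).strip()


# Made-up sentence for illustration
sample = "On XX/XX/XXXX I called XXXX about my loan in TX."
print(strip_redactions(sample))
```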

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.3 Word Cloud for Neutral Consumer Sentiment Prediction</b></p>

<p style='font-size:16px;font-family:Arial;color:#00233c'>Tap into the insights of customer feedback with our intuitive word cloud visualization, which offers a detailed overview of <b>neutral consumer complaints sentiment</b>.</p>

In [None]:
neu = sentiment_df[sentiment_df['Sentiment'] == 'Neutral'].to_pandas()
neu_text = ' '.join(neu['consumer_complaint_narrative'])

# Remove all 'X' characters, which mask redacted details in the narratives
modified_string = neu_text.replace('X', '')

if len(modified_string) > 0:
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

    # Display the word cloud
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.tight_layout()
    plt.axis("off")
    plt.show()
else:
    display_helper("To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.")

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.4 Word Cloud for Positive Consumer Sentiment Prediction</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233c">Explore customer feedback insights with our intuitive word cloud visualization, providing a detailed overview of <b>positive consumer complaints sentiment</b>.</p>

In [None]:
pos = sentiment_df[sentiment_df['Sentiment'] == 'Positive'].to_pandas()
pos_text = ' '.join(pos['consumer_complaint_narrative'])

# Remove all 'X' characters, which mask redacted details in the narratives
modified_string = pos_text.replace('X', '')

if len(modified_string) > 0:
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(modified_string)

    # Display the word cloud
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.tight_layout()
    plt.axis("off")
    plt.show()
else:
    display_helper("To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.")

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.5 Negative Sentiment per Product Over Years</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233c">This graph tracks the negative sentiment  associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.</p>

<p style="font-size:16px;font-family:Arial;color:#00233c">We will use the <b>Vantage in-db</b> function <b>OrdinalEncodingFit</b>, which identifies distinct categorical values from the input data or a user-defined list and generates each distinct value along with its ordinal value.</p>
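
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a plain-Python illustration of the idea (not the in-database function itself), ordinal encoding from a user-defined list can be pictured as a lookup from category to position. The sketch assumes ordinals start at 0 in list order, which is what the shift by one in the later cells of this notebook implies; the function names are hypothetical.</p>

```python
def fit_ordinal_encoding(categories):
    """Assign each category its position in the user-defined list."""
    return {category: position for position, category in enumerate(categories)}


def transform(values, mapping, shift=0):
    """Replace each value by its ordinal, optionally shifted by a constant."""
    return [mapping[value] + shift for value in values]


mapping = fit_ordinal_encoding(['Negative', 'Neutral', 'Positive'])

# Shifting by -1 centers the scale: Negative -> -1, Neutral -> 0, Positive -> 1
encoded = transform(['Positive', 'Negative', 'Neutral', 'Negative'], mapping, shift=-1)
print(mapping)
print(encoded)
```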

In [None]:
ordinal_fit = OrdinalEncodingFit(
    data = sentiment_df,
    target_column = ['Sentiment'],
    approach = 'LIST',
    categories = ['Negative', 'Neutral', 'Positive']
)

ordinal_fit.result

In [None]:
out = ColumnTransformer(
    input_data = sentiment_df[['date_received', 'product', 'Sentiment']],
    ordinalencoding_fit_data = ordinal_fit.result
)

In [None]:
result = out.result
result = result.assign(Sentiment = result.Sentiment - 1)
result = result.assign(year = func.td_year_of_calendar(result.date_received.expression))
result

In [None]:
viz_neg = result[result['Sentiment'] == -1]

if viz_neg.shape[0] > 0:

    # Count negative complaints per product and year (summing the -1 codes would give negative totals)
    viz_senti = viz_neg.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg({'Sentiment': 'count'}).to_pandas()

    # Sorting the DataFrame by year for each product
    pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])

    # Plotting using Plotly
    fig = px.line(pd_df_sorted, x='year', y='count_Sentiment', color='product', markers=True, title='Negative Sentiment per Product Over Years')
    fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)

    fig.show()
else:
    display_helper("We included positive, negative, and neutral categories to cover all bases. But in this sample, it's possible that none of the complaints are actually negative.")
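
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The per-product, per-year figures behind a chart like this amount to counting rows for each (product, year) pair within one sentiment class. A small client-side sketch of that aggregation follows; the rows are made up for illustration and the function name is hypothetical.</p>

```python
from collections import Counter


def count_by_product_year(rows, sentiment):
    """Count rows per (product, year) for one sentiment label."""
    return Counter(
        (product, year) for product, year, label in rows if label == sentiment
    )


# Made-up rows for illustration: (product, year, sentiment)
sample_rows = [
    ("Mortgage", 2015, "Negative"),
    ("Mortgage", 2015, "Negative"),
    ("Mortgage", 2016, "Neutral"),
    ("Credit card", 2015, "Negative"),
]

negative_counts = count_by_product_year(sample_rows, "Negative")
for (product, year), count in sorted(negative_counts.items()):
    print(product, year, count)
```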

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.6 Neutral Sentiment per Product Over Years</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233c">This graph tracks the neutral sentiment  associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.</p>

In [None]:
viz_neu = result[result['Sentiment'] == 0]

if viz_neu.shape[0] > 0:
    # Count neutral complaints per product and year (summing the 0 codes would always give 0)
    viz_senti = viz_neu.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg({'Sentiment': 'count'}).to_pandas()

    # Sorting the DataFrame by year for each product
    pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])

    # Plotting using Plotly
    fig = px.line(pd_df_sorted, x='year', y='count_Sentiment', color='product', markers=True, title='Neutral Sentiment per Product Over Years')
    fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)

    fig.show()
else:
    display_helper("To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.")

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.7 Positive Sentiment per Product Over Years</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233c">This graph tracks the positive sentiment  associated with different products over time, offering valuable insights into evolving customer perceptions and pain points.</p>

In [None]:
viz_pos = result[result['Sentiment'] == 1]

if viz_pos.shape[0] > 0:
    viz_senti = viz_pos.select(['product','Sentiment', 'year']).groupby(['product', 'year']).agg(['sum']).to_pandas()

    # Sorting the DataFrame by year for each product
    pd_df_sorted = viz_senti.sort_values(by=['product', 'year'])

    # Plotting using Plotly
    fig = px.line(pd_df_sorted, x='year', y='sum_Sentiment', color='product', markers=True, title='Positive Sentiment per Product Over Years')
    fig.update_layout(xaxis_title='Year', yaxis_title='Count', legend_title='Product', width=1000, height=600)

    fig.show()
else:
    display_helper("To cover all possible scenarios, we included positive, negative, and neutral categories in our analysis. However, given that this dataset consists of complaints, it's expected that the model would rarely, if ever, encounter positive or neutral responses.")

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up work tables to prevent errors next time.</p>

In [None]:
tables = ['complaints_sentiment']

# Loop through the list of tables and execute the drop table command for each table
for table in tables:
    try:
        db_drop_table(table_name=table)
    except Exception:
        # The table may not exist if an earlier cell was skipped
        pass

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_ComplaintAnalysis');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;background-color:#00233C;">
<b style = 'font-size:18px;font-family:Arial;color:#00233C'>Dataset:</b>
<br>
<br>
<p style='font-size: 16px; font-family: Arial; color: #00233C;'>The dataset is sourced from the <a href='https://www.consumerfinance.gov/data-research/consumer-complaints/'>Consumer Financial Protection Bureau</a>.</p>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>