<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Complaints Summarization Using Vantage and Azure-OpenAI
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233c'><b>Introduction:</b></p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>In this demo, we take a deep dive into complaints summarization using <b>Teradata Vantage</b> and an <b>Azure-OpenAI</b> model. The solution helps organizations manage and analyze customer complaints efficiently, providing actionable insights that improve customer satisfaction and business operations.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'><b>Key Features:</b></p> 

<ol style='font-size:16px;font-family:Arial;color:#00233C'>
  <li><b>AI-Powered Summarization</b>: Utilizing advanced natural language processing (NLP) and machine learning algorithms, the system automatically summarizes complaints, identifying key issues, sentiment, and root causes.</li>
  <li><b>Real-Time Analytics</b>: The platform provides real-time analytics and visualization tools, enabling users to track complaint trends, sentiment analysis, and issue resolution rates.</li>
  <li><b>Customizable Dashboards</b>: Users can create personalized dashboards to monitor specific complaint categories, product lines, or geographic regions, ensuring targeted insights and swift action.</li>
    <li><b>Integration with Azure-OpenAI</b>: Seamless integration with <b>Teradata Vantage</b> and <b>Azure-OpenAI</b> models enables users to leverage the power of cloud-based infrastructure and advanced analytics capabilities.</li>
</ol>

<p style='font-size:16px;font-family:Arial;color:#00233C'><b>Benefits:</b></p> 
<ol style='font-size:16px;font-family:Arial;color:#00233C'>
  <li><b>Enhanced Customer Experience</b>: By quickly identifying and addressing customer concerns, organizations can improve customer satisfaction and loyalty.</li>
  <li><b>Operational Efficiency</b>: Automated complaint summarization and analytics reduce manual processing time, allowing teams to focus on issue resolution and strategic decision-making.</li>
  <li><b>Data-Driven Decision-Making</b>: The platform provides actionable insights, enabling organizations to make informed decisions and drive business growth.</li>
</ol>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Configuring the environment</li>
    <li>Connect to Vantage</li>
    <li>Configuring Azure-OpenAI</li>
    <li>Complaints Summarization</li>
    <li>Cleanup</li>
</ol>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>
<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.1 Downloading and installing additional software needed</b></p>

In [None]:
%%capture
!pip install -r requirements.txt --upgrade --quiet

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>Please restart the kernel after executing the two lines above. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Import the required libraries</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# Data manipulation and analysis
import numpy as np
import pandas as pd

# Plotting
import plotly.express as px

# Progress bar
from tqdm import tqdm

# Machine learning and other utilities from Teradata
from teradataml import *

# HTTP requests, secure prompts, and regular expressions
import getpass
import re
import requests

# Display settings
display.max_rows = 5
pd.set_option('display.max_colwidth', None)

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Connect to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=Complaint_Summarization.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We have the option of either running the demo using foreign tables to access the data without using any storage on our environment or downloading the data to local storage, which may yield somewhat faster execution. However, we need to consider available storage. There are two statements in the following cell, and one is commented out. We may switch which mode we choose by changing the comment string.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_cloud');"        # Takes 1 minute
%run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_local');"        # Takes 2 minutes

<hr style="height:2px;border:none;background-color:#00233C;">
<b style='font-size:20px;font-family:Arial;color:#00233C'>3. Configuring Azure-OpenAI</b>
<p style='font-size:16px;font-family:Arial;color:#00233C'>Before proceeding, you need to provide the following information:</p>
<ul style='font-size:16px;font-family:Arial;color:#00233C'>
<li><b>Endpoint</b>: Enter your Azure-OpenAI deployment endpoint.</li>
<li><b>Azure-OpenAI API Key</b>: Enter your Azure-OpenAI API Key.</li>
</ul>
<p style='font-size:16px;font-family:Arial;color:#00233C'>If you haven't retrieved your API Key and Endpoint yet, follow the instructions <a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python#retrieve-key-and-endpoint" target="_blank" style="color:#0066CC;text-decoration:none;"><b>here</b></a>.</p>
<p style='font-size:16px;font-family:Arial;color:#00233C'>Don't have an Azure-OpenAI resource yet? Follow this guide:</p>
<a href="./Azure-OpenAI.ipynb" style="text-decoration:none;" target="_blank">
    <button style="font-size:16px;font-family:Arial;color:#fff;background-color:#00233C;border:none;border-radius:5px;cursor:pointer;height:50px;line-height:50px;display:flex;align-items:center;">
        Azure-OpenAI Guide <span style="margin-left:10px;">&#8658;</span>
    </button>
</a>
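
<p style='font-size:16px;font-family:Arial;color:#00233C'>As a rough illustration of what the endpoint value typically looks like, the sketch below assembles a chat-completions URL from its parts. The resource name, deployment name, and API version shown are placeholders (assumptions, not values from this demo); use the ones from your own Azure portal. The helper <code>build_endpoint</code> is hypothetical and is not part of any library.</p>

```python
from urllib.parse import quote, urlencode


def build_endpoint(resource_name: str, deployment: str, api_version: str) -> str:
    """Assemble an Azure-OpenAI chat-completions URL (hypothetical helper)."""
    base = f"https://{resource_name}.openai.azure.com"
    # Percent-encode the deployment name so it is safe inside the URL path
    path = f"/openai/deployments/{quote(deployment, safe='')}/chat/completions"
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"


# Placeholder values for illustration only
example = build_endpoint("my-resource", "my-gpt-deployment", "2024-02-01")
print(example)
```

<p style='font-size:16px;font-family:Arial;color:#00233C'>The cells below post chat messages directly to the endpoint, so the value entered at the prompt should be the full URL, including the deployment name and the <code>api-version</code> query parameter.</p>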


In [None]:
# Prompt user for Azure-OpenAI endpoint securely
ENDPOINT = getpass.getpass(prompt="\nPlease enter your Azure-OpenAI endpoint: ")
# Prompt user for Azure-OpenAI API key securely
API_KEY = getpass.getpass(prompt="\nPlease enter your Azure-OpenAI API key: ")

<hr style="height:1px;border:none;background-color:#00233C;">
<b style = 'font-size:18px;font-family:Arial;color:#00233C'>3.1 Initialize the Azure-OpenAI API request</b>

In [None]:
headers = {
    "Content-Type": "application/json",
    "api-key": API_KEY,
}

def get_payload(complaint):
    """Build the chat-completions request body for a single complaint narrative."""
    prompt = f'''
        The following is text from a Bank Review:

        “{complaint}”

        Give me reasoning as well as summary for this review.

        Instructions for Reasoning:
        - Give me Reasoning in short
        - Only one sentence reasoning
        Instructions for Summary:
        - A short one sentence Summary of everything the review states.

        My output comes in the format:
        Summary:
        Reasoning:
    '''
    
    # Payload for the request
    payload = {
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "You are an assistant that summarizes and gives reasoning for the summarization as well.\n"
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": prompt
            }
          ]
        }
      ],
      "temperature": 0.7,
      "top_p": 1,
      "max_tokens": 4096
    }

    return payload
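
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The prompt asks the model to answer in a <code>Summary:</code> / <code>Reasoning:</code> format. As a minimal sketch of how such a reply could be split into its two parts, the hypothetical helper <code>parse_reply</code> below uses regular expressions. The sample reply is invented for illustration and is not model output.</p>

```python
import re


def parse_reply(text: str) -> dict:
    """Split a 'Summary: ... Reasoning: ...' reply into its parts (hypothetical helper)."""
    parts = {}
    for label in ("Summary", "Reasoning"):
        # \s* lets the value start on the same line as the label or on the next one
        match = re.search(rf"{label}:\s*(.*)", text)
        # A missing label yields an empty string instead of raising an error
        parts[label] = match.group(1).strip() if match else ""
    return parts


# Invented sample reply, for illustration only
sample = "Summary: The bank charged a fee twice.\nReasoning:\nThe customer reports a duplicate charge."
print(parse_reply(sample))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Returning an empty string for a missing label is a design choice of this sketch: one malformed reply then leaves a blank cell rather than stopping a long batch.</p>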

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Complaints Summarization</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Complaints summarization with large language models (LLMs) condenses lengthy complaints into concise, informative summaries. In this demo, each complaint narrative is sent to the Azure-OpenAI model, which returns a one-sentence summary and a one-sentence reasoning for it.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These summaries capture the primary issue raised in each complaint, so teams can understand and respond to customer concerns faster. This improves operational efficiency and supports overall customer satisfaction.</p>

In [None]:
df = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Consumer_Complaints')).to_pandas(num_rows = 200)
df['Summary'] = ""
df['Reasoning with Chain of Thought'] = ""

In [None]:
for i in tqdm(range(len(df))):
    # Send one chat-completion request per complaint narrative
    try:
        response = requests.post(ENDPOINT, headers = headers, json = get_payload(df.loc[i, 'consumer_complaint_narrative']))
        response.raise_for_status()
    except requests.RequestException as e:
        raise SystemExit(f"Failed to make the request. Error: {e}")

    output = response.json()['choices'][0]['message']['content']

    # The text may follow the label on the same line or on the next one;
    # \s* covers both cases, and a missing label leaves the cell empty instead of raising
    summary = re.search(r'Summary:\s*(.*)', output)
    reasoning = re.search(r'Reasoning:\s*(.*)', output)

    # Use .loc to avoid chained assignment, which may write to a copy
    df.loc[i, 'Summary'] = summary.group(1).strip() if summary else ""
    df.loc[i, 'Reasoning with Chain of Thought'] = reasoning.group(1).strip() if reasoning else ""
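
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The request loop above stops at the first failed call. For larger batches, hosted APIs commonly reject some requests with HTTP 429 (rate limit), so a retry with exponential backoff may be useful. The sketch below is one possible approach and is not part of the original demo: <code>post_with_retry</code>, <code>max_attempts</code>, and <code>base_delay</code> are hypothetical names, and <code>post</code> and <code>sleep</code> are passed in so the logic can be tried without a network call.</p>

```python
import time


def post_with_retry(post, url, max_attempts=4, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call post(url, **kwargs), retrying on HTTP 429 with exponential backoff."""
    for attempt in range(max_attempts):
        response = post(url, **kwargs)
        # Return on anything other than a rate limit, or once the attempts are used up
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt
        sleep(base_delay * 2 ** attempt)


class FakeResponse:
    """Stand-in for requests.Response, used only to exercise the sketch."""
    def __init__(self, status_code):
        self.status_code = status_code


# Simulate two rate-limited replies followed by a success
replies = iter([FakeResponse(429), FakeResponse(429), FakeResponse(200)])
delays = []
result = post_with_retry(lambda url, **kw: next(replies), "https://example.invalid", sleep=delays.append)
print(result.status_code, delays)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this notebook, the helper could be called as <code>post_with_retry(requests.post, ENDPOINT, headers=headers, json=payload)</code>. The returned response should still be checked with <code>raise_for_status()</code>.</p>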

In [None]:
df[['complaint_id', 'consumer_complaint_narrative', 'Summary', 'Reasoning with Chain of Thought']]

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.1 Graph for Complaint and Summary Lengths</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233c'>The scatter plot below compares narrative length (x-axis, in characters) with summary length (y-axis, in characters). Because the prompt asks for a one-sentence summary, summary length should stay roughly constant as narrative length grows, which means longer narratives are condensed more strongly. Hovering over a point shows the complaint ID together with truncated versions of the narrative and its summary.</p>
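
<p style = 'font-size:16px;font-family:Arial;color:#00233c'>One way to quantify how strongly the narratives are condensed is a compression ratio, that is, summary length divided by narrative length. The sketch below uses only the Python standard library and invented sample pairs (not data from this demo); <code>compression_ratios</code> is a hypothetical helper.</p>

```python
from statistics import median


def compression_ratios(pairs):
    """Return summary length / narrative length for each (narrative, summary) pair."""
    # Skip empty narratives to avoid dividing by zero
    return [len(summary) / len(narrative) for narrative, summary in pairs if narrative]


# Invented (narrative, summary) pairs, for illustration only
pairs = [
    ("x" * 100, "y" * 50),
    ("x" * 400, "y" * 40),
    ("x" * 1000, "y" * 50),
]
ratios = compression_ratios(pairs)
print(ratios, median(ratios))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233c'>With the DataFrame from this notebook, the same ratio could be computed as <code>df['summary_length'] / df['narrative_length']</code> once those two columns exist. A smaller ratio means a stronger condensation.</p>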

In [None]:
# Truncate text for hover data
max_chars = 50  # Maximum characters to display
df['truncated_narrative'] = df['consumer_complaint_narrative'].apply(lambda x: x[:max_chars] + '...' if len(x) > max_chars else x)
df['truncated_summary'] = df['Summary'].apply(lambda x: x[:max_chars] + '...' if len(x) > max_chars else x)

# Calculate the length of consumer_complaint_narrative and Summary
df['narrative_length'] = df['consumer_complaint_narrative'].apply(len)
df['summary_length'] = df['Summary'].apply(len)

# Create a scatter plot
fig = px.scatter(df.sort_values(['narrative_length']), x='narrative_length', y='summary_length',
                 hover_data=['complaint_id', 'truncated_narrative', 'truncated_summary'],
                 labels={'narrative_length': 'Narrative Length', 'summary_length': 'Summary Length'},
                 title='Complaint and Summary Lengths')

# Keep the x-axis numeric and show tick values as plain integers (not abbreviated or in scientific notation)
fig.update_xaxes(tickformat='d')

# Show the plot
fig.show()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Save the results back to Vantage.</p>

In [None]:
copy_to_sql(df = df, table_name = 'Complaints_Summaries', if_exists = 'replace')

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Cleanup</b>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_ComplaintAnalysis');"        # Takes 10 seconds

In [None]:
remove_context()

<hr style="height:1px;border:none;background-color:#00233C;">
<b style = 'font-size:18px;font-family:Arial;color:#00233C'>Dataset:</b>
<br>
<br>
<p style='font-size: 16px; font-family: Arial; color: #00233C;'>The dataset is sourced from the <a href='https://www.consumerfinance.gov/data-research/consumer-complaints/'>Consumer Financial Protection Bureau</a>.</p>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>