# Generate Call Summaries with OpenAI or Bedrock

This notebook shows how to generate per-call metadata (also known as Gen AI Powered Fields) from call transcripts. For each call, these primarily include:

- Summary
- Topic
- Root Cause
- Issue Resolved (Y/N)
- Callback (Y/N)
- Next Steps by the Customer and Agent

More metadata fields can be added as required.


We generate summaries with both a source model and a target model. The source model defaults to "mistral.mistral-large-2402-v1:0", but you can change it to any other model in the config.py file under the /src folder. This notebook also demonstrates using OpenAI as the source model: if you have an OpenAI API key, you can invoke an OpenAI model to generate the summaries. The target model for this workshop is "anthropic.claude-3-sonnet-20240229-v1:0" (Step 4).

![1a3a_Notebook.png](../images/1a3a_Notebook.png)

### Import Libraries

We start by installing the pydantic, openai and boto3 libraries.
Run the cell below. You can ignore pip errors.

In [None]:
!pip install pydantic==1.10.8 --quiet
!pip install openai --quiet
!pip install -U boto3 --quiet

### Parameters

We first define our parameters for source and target models. 

In [17]:
import sys
import os

sys.path.append("../src/")
from config import *

# src models
model_id = MISTRAL_MODEL_ID
# model_id = CLAUDE_MODEL_ID
# model_id = LAMA_MODEL_ID
# model_id = OPENAI_MODEL_ID

# OpenAI API key (leave empty to skip the OpenAI test call below)
os.environ["OPENAI_API_KEY"] = ""

# target models
# model_id = CLAUDE_MODEL_ID
# model_id = LAMA_MODEL_ID
# model_id = MISTRAL_MODEL_ID

# summarization prompt
prompt_id = "raw"
# prompt_id="optimized" # use for CLAUDE_MODEL_ID

print("model_id=%s, prompt_id=%s" % (model_id, prompt_id))

model_id=mistral.mistral-large-2402-v1:0, prompt_id=raw


### Standard Libraries

In [18]:
import boto3
from botocore.config import Config
import base64
import datetime
import numpy as np
import pandas as pd
import json
import time

### OpenAI invocation functions

This cell defines a function that invokes an OpenAI chat model to generate summaries.

In [19]:
from openai import OpenAI

openai_client = OpenAI()


def invoke_openai_base(
    openai_client, messages, model_id, max_tokens=1024, temperature=0.0
):
    time0 = time.time()

    completion = openai_client.chat.completions.create(
        model=model_id,
        messages=messages,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    generated_text = completion.choices[0].message.content

    input_tokens = completion.usage.prompt_tokens
    output_tokens = completion.usage.completion_tokens

    end_time = time.time() - time0
    latency_end = end_time

    output_obj = {
        "response_text": generated_text,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_end": latency_end,
    }

    return output_obj

Test the invocation

In [20]:
question = "what is LLM (Large Language Model)?"

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": question},
    {"role": "assistant", "content": ""},
]

if os.environ["OPENAI_API_KEY"] != "":
    answer = invoke_openai_base(openai_client, messages, OPENAI_MODEL_ID)
    print(answer)
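
The OpenAI call above takes messages whose `content` is a plain string, while the Bedrock Converse API used later in this notebook expects `content` as a list of blocks and takes the system prompt as a separate argument. As a minimal sketch of that difference, the hypothetical helper `to_converse_format` (not part of the notebook's code) converts one shape into the other and skips empty placeholder messages:

```python
def to_converse_format(openai_messages):
    """Split OpenAI-style chat messages into Converse-style (system, messages).

    Hypothetical helper for illustration; messages with empty content are skipped.
    """
    system = []
    messages = []
    for msg in openai_messages:
        text = msg.get("content", "")
        if not text:
            continue  # skip placeholders such as {"role": "assistant", "content": ""}
        if msg["role"] == "system":
            system.append({"text": text})
        else:
            messages.append({"role": msg["role"], "content": [{"text": text}]})
    return system, messages


openai_style = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "what is LLM (Large Language Model)?"},
    {"role": "assistant", "content": ""},
]
system, messages = to_converse_format(openai_style)
print(system)
print(messages)
```

Only the user turn survives here, because the system and assistant entries are empty.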

### Bedrock invocation functions

Initialize the Amazon Bedrock runtime client

In [21]:
my_config = Config(
    region_name=AWS_REGION,
    signature_version="v4",
    retries={"max_attempts": 3, "mode": "standard"},
)

client = boto3.client("bedrock-runtime", config=my_config)

Define the invocation function for models in Bedrock. It uses the Converse API for both non-streaming and streaming text responses, and also records metrics such as latency, input tokens and output tokens.

In [22]:
def invoke_base(
    client,
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
    system="You are an assistant.",
    model_id="",
    max_tokens=1024,
    temperature=0.0,
    top_k=None,
    top_p=None,
    stop_sequences=["Human:"],
    use_streaming=False,
    print_details=True,
):
    """
    Invokes a Bedrock model through the Converse API, either as a
    single call or as a stream.

    :param messages: Conversation turns in Converse API format.
    :param top_k: Accepted but not forwarded to the model.
    :return: Dict with response text, token counts and latencies.
    """

    # Build the inference configuration shared by both invocation modes

    inference_config = {
        "maxTokens": max_tokens,
        "temperature": temperature,
    }

    system_config = []

    if top_p is not None:
        inference_config["topP"] = top_p
    if stop_sequences is not None:
        inference_config["stopSequences"] = stop_sequences

    if system is not None:
        system_config.append({"text": system})

    time0 = time.time()
    if use_streaming:
        response = client.converse_stream(
            modelId=model_id,
            messages=messages,
            inferenceConfig=inference_config,
            system=system_config,
        )

        stream = response["stream"]
        output_text = ""
        la = True
        if stream:
            for chunk in stream:
                if la:
                    start_time = time.time() - time0
                    la = False
                if "contentBlockDelta" in chunk:
                    text = chunk["contentBlockDelta"]["delta"]["text"]
                    print(text, end="")
                    output_text = output_text + text
                if "metadata" in chunk:
                    input_tokens = chunk["metadata"]["usage"]["inputTokens"]
                    output_tokens = chunk["metadata"]["usage"]["outputTokens"]
                    latency_start = chunk["metadata"]["metrics"]["latencyMs"] / 1000

        end_time = time.time() - time0
        latency_end = end_time
        output_list = [output_text]
        print(f"\n**** Stream End {end_time} ****\n")
        print("\n")
    else:
        response = client.converse(
            modelId=model_id,
            messages=messages,
            inferenceConfig=inference_config,
            system=system_config,
        )

        end_time = time.time() - time0
        latency_start = end_time
        latency_end = end_time

        # Process and print the response
        result = response.get("output")
        input_tokens = response["usage"]["inputTokens"]
        output_tokens = response["usage"]["outputTokens"]
        output_list = result["message"].get("content", [])
        output_text = "\n".join([x["text"] for x in output_list])
        if print_details:
            print("Response(s):")
            print(output_text)

    if print_details:
        print("Latency details:")
        print(f"- The start latency is {latency_start} seconds.")
        print(f"- The full invocation latency is {latency_end} seconds.")

        print("Invocation details:")
        print(f"- The input length is {input_tokens} tokens.")
        print(f"- The output length is {output_tokens} tokens.")

    output_obj = {
        "response_text": output_text,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_start": latency_start,
        "latency_end": latency_end,
    }

    return output_obj
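
The function above accepts `top_k` but only `topP`, `temperature`, `maxTokens` and `stopSequences` go into `inferenceConfig`. As a hedged sketch of how the request could be assembled so that `top_k` is forwarded as well, the hypothetical helper below builds the keyword arguments for `client.converse` and places `top_k` in `additionalModelRequestFields`. The `"top_k"` field name is an assumption that holds for Anthropic models; other model families may name it differently or not support it.

```python
def build_converse_kwargs(
    model_id,
    messages,
    system=None,
    max_tokens=1024,
    temperature=0.0,
    top_p=None,
    top_k=None,
    stop_sequences=None,
):
    """Assemble keyword arguments for client.converse(**kwargs) (hypothetical helper)."""
    inference_config = {"maxTokens": max_tokens, "temperature": temperature}
    if top_p is not None:
        inference_config["topP"] = top_p
    if stop_sequences:
        inference_config["stopSequences"] = list(stop_sequences)

    kwargs = {
        "modelId": model_id,
        "messages": messages,
        "inferenceConfig": inference_config,
    }
    if system:
        kwargs["system"] = [{"text": system}]
    if top_k is not None:
        # Assumption: the model accepts a "top_k" request field (as Anthropic models do)
        kwargs["additionalModelRequestFields"] = {"top_k": top_k}
    return kwargs


kwargs = build_converse_kwargs(
    "anthropic.claude-3-sonnet-20240229-v1:0",
    [{"role": "user", "content": [{"text": "Hello"}]}],
    top_p=0.9,
    top_k=50,
)
print(kwargs)
```

The resulting dictionary could then be passed as `client.converse(**kwargs)`. Using `None` as the default for `stop_sequences` also avoids sharing one mutable list between calls.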

Test the invocation.
You can change parameters such as temperature and Top P as required. Note that invoke_base accepts top_k but does not forward it to the Converse API. You can also set "use_streaming" to True or False.

In [23]:
if model_id != OPENAI_MODEL_ID:
    question = "What is the history of Youtube?"

    messages = [
        {"role": "user", "content": [{"text": question}]},
    ]

    output_obj = invoke_base(
        client,
        messages=messages,
        system=None,
        model_id=model_id,
        max_tokens=1024,
        temperature=0.0,
        top_k=None,
        top_p=None,
        stop_sequences=["Human:"],
        use_streaming=False,
        print_details=True,
    )

    answer = output_obj["response_text"]
    print(answer)

Response(s):
YouTube is a video sharing platform that was created in 2005 by three former PayPal employees: Chad Hurley, Steve Chen, and Jawed Karim. The idea for YouTube was born out of the difficulty the three founders faced while trying to share videos of a dinner party online. At the time, there were no easy-to-use video sharing platforms available, so they decided to create their own.

YouTube was officially launched in February 2005, and the first video, titled "Me at the zoo," was uploaded by Karim on April 23, 2005. The video, which is 18 seconds long and features Karim at the San Diego Zoo, is still available on the site.

In the early days, YouTube was primarily used by individuals to share personal videos, but it quickly grew in popularity and began to attract a wider range of content creators, including musicians, comedians, and filmmakers. In 2006, Google acquired YouTube for $1.65 billion, and the platform has since become one of the most popular websites in the world, wi

### Load the Transcripts Data into DataFrame

In [24]:
transcripts = pd.read_csv("../data/call_transcripts.csv")

In [25]:
transcripts.head()

| | customer_id | call_id | agent_id | transcript | date |
|---|---|---|---|---|---|
| 0 | C101 | 1 | A01 | \nAgent: Good morning, thank you for calling S... | 3/5/24 |
| 1 | C102 | 2 | A02 | Agent: Good morning, thank you for calling SB ... | 3/11/24 |
| 2 | C103 | 3 | A03 | \nAgent: Good morning, thank you for calling S... | 2/29/24 |
| 3 | C104 | 4 | A04 | \nAgent: Good morning, thank you for calling S... | 3/7/24 |
| 4 | C105 | 5 | A05 | \nAgent: Good morning, thank you for calling S... | 2/21/24 |


## Prompts

In this section, you can add your own prompts to */data/call_summarization_prompts.csv* to generate the corresponding fields.

Each prompt template is a generic question-answering prompt with {FIELD}, {QUESTION} and {TRANSCRIPT} placeholders. The fields to generate, including the "summary" of the call, are listed in the field_to_question_map dictionary below, where you can add other fields.

In [26]:
prompts = pd.read_csv("../data/call_summarization_prompts.csv", encoding="UTF-8")
prompts.head()

| | prompt_id | prompt_text |
|---|---|---|
| 0 | raw | Answer the question below, by obtaining the {F... |
| 1 | optimized | You will be answering a question by obtaining ... |


In [27]:
prompt_dict = prompts.set_index("prompt_id").T.to_dict()
prompt_dict

{'raw': {'prompt_text': "Answer the question below, by obtaining the {FIELD} from following call transcript of a call between a customer and a customer agent on a specific issue. \nIf you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns.\nWhen you reply, do not use XML tags in the answer. Respond with the answer and a 1-2 sentence explanation. \n\nQuestion: {QUESTION}\n\nTranscript:\n{TRANSCRIPT}\n\nAnswer in the following way: \nAnswer: <your answer to the question>\nStep by Step Reason: <corresponding reason>"},
 'optimized': {'prompt_text': 'You will be answering a question by obtaining information from a transcript of a call between a customer and a customer service agent. Here are the steps:\n\n1. Read the provided transcript carefully:\n<transcript>\n{TRANSCRIPT}\n</transcript>\n\n2. Identify the {FIELD} from the transcript that is relevant to answering the question: "{QUESTION}"\n\n3. Provide your answer and reasoning in the following format:\n\nAnswer: 

In [28]:
user_prompt_template_raw = prompt_dict[prompt_id]["prompt_text"]
print(user_prompt_template_raw)

Answer the question below, by obtaining the {FIELD} from following call transcript of a call between a customer and a customer agent on a specific issue. 
If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns.
When you reply, do not use XML tags in the answer. Respond with the answer and a 1-2 sentence explanation. 

Question: {QUESTION}

Transcript:
{TRANSCRIPT}

Answer in the following way: 
Answer: <your answer to the question>
Step by Step Reason: <corresponding reason>


Below, you can list as many metadata fields as you want to generate in the output file. We use the ones mentioned at the beginning of the notebook.

In [29]:
field_to_question_map = {
    "summary": "What is the summary of the transcript?",
    "topic": "Describe the topic of this transcript in one word",
    "resolution": "Was the issue resolved? Answer in Yes or No",
    "root_cause": "What was the root cause of the transcript? Write it in one line",
    "call_back": "Does the agent need to call back the customer? Answer in Yes or No",
    "next_steps": "What are the next steps?",
}
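
To see how a field and its question end up in the prompt, here is a minimal sketch of the substitution step. The template is a shortened stand-in for the "raw" prompt (only the placeholder names are the same), and the transcript is made up for illustration.

```python
# Shortened stand-in for the "raw" template; the placeholders match the real one.
template = (
    "Answer the question below, by obtaining the {FIELD} from the transcript.\n\n"
    "Question: {QUESTION}\n\n"
    "Transcript:\n{TRANSCRIPT}"
)

field_to_question_map = {
    "resolution": "Was the issue resolved? Answer in Yes or No",
    "call_back": "Does the agent need to call back the customer? Answer in Yes or No",
}

# Made-up transcript used only for this illustration
transcript = "Agent: Good morning.\nCustomer: My card has not arrived."

filled_prompts = {}
for key, question in field_to_question_map.items():
    field = key.replace("_", " ")  # "call_back" -> "call back"
    filled_prompts[key] = template.format(
        FIELD=field, QUESTION=question, TRANSCRIPT=transcript
    )

print(filled_prompts["call_back"])
```

Each dictionary key therefore produces one prompt per transcript, so adding a key adds one model call per call transcript.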

## Generate Summaries for Each Transcript

This section demonstrates how to generate the Gen AI powered fields.

Define a task-specific function to generate the answer. max_tokens and temperature are exposed as parameters; Top K, Top P and "use_streaming" are set inside the function body and can be changed there.

In [30]:
def generate_answer(
    model_id,
    user_prompt_template,
    transcript,
    field,
    question,
    max_tokens=1024,
    temperature=0.0,
):

    user_prompt = user_prompt_template.format(
        TRANSCRIPT=transcript, FIELD=field, QUESTION=question
    )
    generated_text = None
    if model_id.startswith("gpt-"):
        messages = [
            {"role": "system", "content": ""},
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": ""},
        ]
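        # Pass the caller's limits through instead of relying on the defaults
        output_obj = invoke_openai_base(
            openai_client,
            messages,
            model_id,
            max_tokens=max_tokens,
            temperature=temperature,
        )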
        generated_text = output_obj["response_text"]
    else:
        messages = [{"role": "user", "content": [{"text": user_prompt}]}]

        output_obj = invoke_base(
            client,
            messages=messages,
            system=None,
            model_id=model_id,
            max_tokens=max_tokens,
            temperature=temperature,
            top_k=None,
            top_p=None,
            stop_sequences=["Human:"],
            use_streaming=False,
            print_details=False,
        )

        generated_text = output_obj["response_text"]

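    generated_text = generated_text.split("Step by Step Reason:")[0]
    generated_text = generated_text.replace("```", "").strip()
    # Remove the literal "Answer:" label; str.strip("Answer: ") would strip
    # any of those characters from both ends of the text.
    if generated_text.startswith("Answer:"):
        generated_text = generated_text[len("Answer:") :].strip()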

    return (generated_text, output_obj)
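
The reply format requested by the prompt (`Answer: ...` followed by `Step by Step Reason: ...`) can be parsed in isolation. The sketch below uses a hypothetical helper, `extract_answer`, and a made-up reply. It removes the `Answer:` label as a literal prefix, since `str.strip` with a string argument treats that argument as a set of characters to trim from both ends.

```python
def extract_answer(raw_text):
    """Return the answer part of an 'Answer: ... Step by Step Reason: ...' reply."""
    fence = "`" * 3  # markdown code fence that some models wrap around replies
    answer = raw_text.split("Step by Step Reason:")[0]
    answer = answer.replace(fence, "").strip()
    # Drop the literal "Answer:" label if the reply starts with it
    if answer.startswith("Answer:"):
        answer = answer[len("Answer:"):].strip()
    return answer


raw = "Answer: Yes\nStep by Step Reason: The agent confirmed the refund."
print(extract_answer(raw))
```

A reply such as `Answer: new card` comes back as `new card`, whereas character-based stripping would also remove the leading `new`.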

### Specify model and generate answer for each field

In [31]:
%%time

model_name = ""
if model_id.startswith("gpt-"):
    model_name="openai"
else:
    model_name=model_id.split(".")[0]
final_file_name = f"call_summarization_outputs_{model_name}.csv"

print("running model:%s" % model_name)

max_tokens = 256
temperature = 0.0

tList=[]
n=0
for x,y in transcripts.iterrows():
    n=n+1
    print("call summary:%s" % n)
    for prompt in prompt_dict:
        if prompt==prompt_id:
            row=y.copy()
            user_prompt_template=prompt_dict[prompt]['prompt_text'] 
            print("  run prompt:%s" % (prompt))
            row['prompt_id']=prompt
            for key in field_to_question_map.keys():
                field = key.replace('_',' ')
                print("    run field:%s" % field)
                question = field_to_question_map[key]
                (genText, outputObj) = generate_answer(model_id, user_prompt_template, 
                    transcript = row['transcript'], field = field, question = question, 
                    max_tokens = max_tokens, temperature = temperature)
                row[key]=genText
                if field=='summary':
                    row['metric_summary_input_tokens']=outputObj['input_tokens']
                    row['metric_summary_output_tokens']=outputObj['output_tokens']
                    row['metric_summary_latency']=outputObj['latency_end']
            tList.append(row)
transcripts=pd.DataFrame.from_records(tList)

running model:mistral
call summary:1
  run prompt:raw
    run field:summary
    run field:topic
    run field:resolution
    run field:root cause
    run field:call back
    run field:next steps
call summary:2
  run prompt:raw
    run field:summary
    run field:topic
    run field:resolution
    run field:root cause
    run field:call back
    run field:next steps
call summary:3
  run prompt:raw
    run field:summary
    run field:topic
    run field:resolution
    run field:root cause
    run field:call back
    run field:next steps
call summary:4
  run prompt:raw
    run field:summary
    run field:topic
    run field:resolution
    run field:root cause
    run field:call back
    run field:next steps
call summary:5
  run prompt:raw
    run field:summary
    run field:topic
    run field:resolution
    run field:root cause
    run field:call back
    run field:next steps
CPU times: user 152 ms, sys: 11.4 ms, total: 164 ms
Wall time: 1min 20s


Post-process the dataset: drop rows that contain an empty string or a missing value in any column.

In [32]:
transcripts = transcripts.loc[~transcripts.astype(str).eq("").any(axis=1)]
transcripts = transcripts.dropna()

In [33]:
transcripts.head()

| | customer_id | call_id | agent_id | transcript | date | prompt_id | summary | metric_summary_input_tokens | metric_summary_output_tokens | metric_summary_latency | topic | resolution | root_cause | call_back | next_steps |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C101 | 1 | A01 | \nAgent: Good morning, thank you for calling S... | 3/5/24 | raw | Sarah, the customer, called SB Bank to inquire... | 767 | 150 | 4.195616 | Credit-Card | Yes | The root cause of the call was the customer's ... | No | Sarah needs to wait for 7-10 business days to ... |
| 1 | C102 | 2 | A02 | Agent: Good morning, thank you for calling SB ... | 3/11/24 | raw | Sarah Thompson, a customer who applied for a c... | 635 | 184 | 4.976691 | Credit card delivery | Yes | The root cause of the issue was the credit car... | No | Sarah will receive a replacement credit card w... |
| 2 | C103 | 3 | A03 | \nAgent: Good morning, thank you for calling S... | 2/29/24 | raw | The customer, Sarah, was affected by recent fl... | 676 | 121 | 3.438767 | Extension | Yes | The root cause was the recent floods in Califo... | No | The next step is for Sarah to make her minimum... |
| 3 | C104 | 4 | A04 | \nAgent: Good morning, thank you for calling S... | 3/7/24 | raw | The customer, Sarah, was incorrectly charged a... | 680 | 154 | 4.216477 | Refund | Yes | The root cause of the issue was a temporary sy... | No | The next steps are for the customer to wait fo... |
| 4 | C105 | 5 | A05 | \nAgent: Good morning, thank you for calling S... | 2/21/24 | raw | Sarah Thompson reported a fraudulent transacti... | 850 | 155 | 4.327091 | Fraud | Yes | The root cause of the call was a fraudulent ai... | No | The next steps are for the agent to pass the d... |
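
The `metric_summary_*` columns give a rough picture of cost and speed for the summary field. Below is a minimal sketch that aggregates them with the standard library only. The numbers are copied from the five rows shown above, and the dictionary keys are shortened, illustrative names rather than the DataFrame's column names.

```python
from statistics import mean

# Metric values copied from the five rows shown above
rows = [
    {"call_id": 1, "input_tokens": 767, "output_tokens": 150, "latency": 4.195616},
    {"call_id": 2, "input_tokens": 635, "output_tokens": 184, "latency": 4.976691},
    {"call_id": 3, "input_tokens": 676, "output_tokens": 121, "latency": 3.438767},
    {"call_id": 4, "input_tokens": 680, "output_tokens": 154, "latency": 4.216477},
    {"call_id": 5, "input_tokens": 850, "output_tokens": 155, "latency": 4.327091},
]

total_input = sum(r["input_tokens"] for r in rows)
total_output = sum(r["output_tokens"] for r in rows)
avg_latency = mean(r["latency"] for r in rows)

print(f"input tokens:  {total_input}")
print(f"output tokens: {total_output}")
print(f"mean latency:  {avg_latency:.2f} s")
```

On the DataFrame itself, the same figures could be obtained with calls such as `transcripts["metric_summary_latency"].mean()`.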


Save results to a file

In [34]:
transcripts.to_csv(f"../outputs/{final_file_name}", index=False)