# Summarization Metrics

In this notebook, we demonstrate how to calculate metrics for assessing the quality of a summary produced by a Generative AI (GenAI) model. There is no single clean way to evaluate GenAI output, because the quality of a summary is subjective. Still, a handful of metrics can give us a sense of how well the model is performing.

## Notebook Setup

In [10]:
# Importing the necessary Python libraries
import os
import json

import pandas as pd
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_openai import ChatOpenAI

In [6]:
# Setting the LangChain chat model
chat_model = ChatOpenAI(api_key = os.getenv('PERPLEXITY_API_KEY'),
                        base_url = 'https://api.perplexity.ai',
                        model = 'llama-3.1-70b-instruct')

## Data Simulation

To proceed with this notebook, we need to simulate some fake data. For your convenience, I have saved the simulated data as a CSV in this repo so that you don't have to regenerate it.
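The "generate once, then reuse the CSV" idea can be sketched as below. This is a minimal illustration rather than the notebook's own code: `load_or_create` and `build_rows` are hypothetical names, and a stand-in generator takes the place of the LLM calls so the example runs without an API key. The file name `simulated_data.csv` and the `original_text` column are taken from the cells that follow.

```python
import os
import tempfile

import pandas as pd


def load_or_create(csv_path, build_rows):
    """Return the cached DataFrame if csv_path exists; otherwise build it and save it.

    build_rows is a hypothetical zero-argument callable returning a list of strings.
    """
    if os.path.exists(csv_path):
        # The data was generated earlier, so skip the (expensive) generation step
        return pd.read_csv(csv_path)

    # Nothing cached yet: generate the rows and persist them for the next run
    created = pd.DataFrame({'original_text': build_rows()})
    created.to_csv(csv_path, index=False)
    return created


# Demonstration in a temporary directory with a stand-in generator (no API calls)
demo_path = os.path.join(tempfile.mkdtemp(), 'simulated_data.csv')
first = load_or_create(demo_path, lambda: ['conversation A', 'conversation B'])
second = load_or_create(demo_path, lambda: ['this generator is never called'])
```

On the second call the file already exists, so the generator is skipped and the saved rows are read back instead.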

In [15]:
# Creating a prompt to generate topics around various IT related activities
TOPIC_GENERATION_PROMPT = '''Assume that you are an IT helpdesk specialist that is responsible for providing technical support to users. Please generate a list of 10 different topics that you might help users with. Please output the final response as a JSON list. Only include the JSON list with no additional text. Follow the example below:

Example:
["Resetting a Password", "Setting Up a VPN"]
'''

# Setting the prompt template to generate the IT related topics
topic_generation_template = ChatPromptTemplate(messages = [
    HumanMessagePromptTemplate.from_template(template = TOPIC_GENERATION_PROMPT)
])

# Creating a chain to generate the IT related topics
topic_generation_chain = topic_generation_template | chat_model | StrOutputParser()

In [24]:
# Checking if the simulated data file exists and generating topics if it does not
if not os.path.exists('simulated_data.csv'):

    # Generating topics using the topic generation chain
    generated_topics = json.loads(topic_generation_chain.invoke(input = {}))
    print(generated_topics)

['Troubleshooting Printer Connectivity Issues', 'Configuring Email on a Mobile Device', 'Resetting a Forgotten Password', 'Installing and Updating Software', 'Resolving Wi-Fi Connectivity Problems', 'Setting Up a New Computer or Laptop', 'Configuring Dual Monitors', 'Troubleshooting Microsoft Office Issues', 'Setting Up a Virtual Private Network (VPN)', 'Backing Up and Restoring Data']
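Calling `json.loads` directly on the model reply works when the model returns only the JSON list, as it did here. Chat models sometimes wrap the list in a code fence or add a sentence around it, so a more defensive parse may be worth considering. The helper below is a hedged sketch under that assumption; `parse_json_list` is a hypothetical name, not part of LangChain or this notebook.

```python
import json
import re


def parse_json_list(raw_reply):
    """Extract and parse the outermost JSON array found in a model reply."""
    # Grab everything from the first '[' to the last ']' so that code fences
    # or surrounding sentences are ignored
    match = re.search(r'\[.*\]', raw_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError('No JSON list found in the model reply.')
    return json.loads(match.group(0))


# Stand-in replies; real replies would come from topic_generation_chain.invoke(...)
fence = '`' * 3
clean_reply = '["Resetting a Password", "Setting Up a VPN"]'
fenced_reply = f'{fence}json\n["Resetting a Password", "Setting Up a VPN"]\n{fence}'
chatty_reply = 'Here are the topics:\n["Resetting a Password", "Setting Up a VPN"]'

parsed_clean = parse_json_list(clean_reply)
parsed_fenced = parse_json_list(fenced_reply)
parsed_chatty = parse_json_list(chatty_reply)
```

All three replies parse to the same two-item list. A reply with no brackets at all raises `ValueError`, which makes a malformed response easier to spot than a raw `JSONDecodeError`.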


In [23]:
# Creating a prompt to simulate a conversation between an IT helpdesk specialist and a user
CONVERSATION_SIMULATION_PROMPT = '''Assume you are an IT helpdesk specialist responsible for providing technical support to users. You’ve received a call from a user experiencing trouble with their computer. Simulate a natural conversation between you and the user, addressing the issue in a friendly, professional, and helpful manner. 

- Ensure the conversation contains at least 10 back-and-forth exchanges.
- The user may provide vague or incomplete information initially; ask for clarifications when necessary.
- Include at least three troubleshooting steps in the conversation.
- If the issue can’t be resolved on the call, suggest escalation or other solutions.
- Keep the user engaged, acknowledging frustrations or confusion as needed, while explaining solutions clearly.

Here is the topic:
{topic}

Please format the output as a list of messages in the following JSON format:

[
    {{
        "sender": "user",
        "message": "Hello, I am having trouble with my computer."
    }},
    {{
        "sender": "specialist",
        "message": "I'm sorry to hear that! Could you please describe the issue in more detail?"
    }}
]
'''

# Setting the prompt template to simulate the conversations
conversation_generation_template = ChatPromptTemplate(messages = [
    HumanMessagePromptTemplate.from_template(template = CONVERSATION_SIMULATION_PROMPT)
])

# Creating the conversation simulation chain
conversation_generation_chain = conversation_generation_template | chat_model | StrOutputParser()

In [22]:
# Instantiating a Pandas DataFrame with a single column called 'original_text'
df = pd.DataFrame(columns = ['original_text'])
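One way the simulated conversations could end up in the `original_text` column is sketched below. This is an assumption about the next step, not the notebook's confirmed method: `conversation_to_text` is a hypothetical helper, and a hard-coded reply stands in for the output of the conversation chain so the example runs without an API key. The `sender` and `message` keys follow the JSON format requested in the prompt above.

```python
import json

import pandas as pd


def conversation_to_text(messages):
    """Join a list of {'sender', 'message'} dicts into one transcript string."""
    return '\n'.join(f"{m['sender']}: {m['message']}" for m in messages)


# Stand-in for one chain reply; a real reply would come from invoking the
# conversation chain with a topic, e.g. {'topic': 'Resetting a Password'}
sample_reply = json.dumps([
    {'sender': 'user', 'message': 'I forgot my password.'},
    {'sender': 'specialist', 'message': 'No problem, let us reset it together.'},
])

# Collect one transcript per conversation, then build the DataFrame in one step
transcripts = [conversation_to_text(json.loads(sample_reply))]
demo_df = pd.DataFrame({'original_text': transcripts})
```

Collecting the transcripts in a list and constructing the DataFrame once avoids growing it row by row, which tends to be slower in pandas.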