# Evaluating Summarization with TruLens

In this notebook, we will evaluate a summarization application based on the [DialogSum dataset](https://github.com/cylnlp/dialogsum) using a number of different metrics. These break down into two main categories:
1. Ground truth agreement: For this set of metrics, we will measure how similar the generated summary is to a human-created ground truth. We will use four different measures: BERT score, BLEU, ROUGE, and a measure where an LLM is prompted to produce a similarity score.
2. Groundedness: For this measure, we will estimate if the generated summary can be traced back to parts of the original transcript.
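
To make the first category concrete, here is a minimal sketch of a ROUGE-1-style score: an F1 measure over the unigrams shared by a generated summary and a reference. It is a simplified illustration that assumes whitespace tokenization and lowercasing; it is not the implementation used by TruLens or the `rouge-score` package, and the function name `unigram_overlap_f1` is hypothetical.

```python
from collections import Counter


def unigram_overlap_f1(candidate: str, reference: str) -> float:
    """Return a ROUGE-1-style F1 score from clipped unigram overlap."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token,
    # so a repeated word is only credited as often as the reference has it.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)  # share of candidate tokens matched
    recall = overlap / len(ref_tokens)      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair, not taken from DialogSum.
score = unigram_overlap_f1(
    "the doctor refers the patient",
    "the doctor sends the patient to a specialist",
)
print(round(score, 3))
```

A score of 1.0 means the two texts contain exactly the same words with the same counts; scores near 0 mean almost no shared vocabulary. Lexical measures like this one penalize valid paraphrases, which is one reason the notebook also uses BERT score and an LLM-based similarity score.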

### Dependencies
Let's first install the packages that this notebook depends on. Uncomment these lines (remove the surrounding triple quotes) to run them.

In [None]:
"""!pip install bert_score==0.3.13 \
             evaluate==0.4.0 \
             absl-py==1.4.0 \
             rouge-score==0.1.2 \
             pandas \
             tenacity """

For the latest metrics, install TruLens from the development branch.

In [22]:
"""!pip install git+https://github.com/truera/trulens.git@ss/comparison_scores#subdirectory=trulens_eval"""

'!pip install git+https://github.com/truera/trulens.git@ss/comparison_scores#subdirectory=trulens_eval'

### Download and load data
Now we will download a portion of the DialogSum dataset from GitHub.

In [2]:
import pandas as pd    

In [3]:
!wget -O dialogsum.dev.jsonl https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.dev.jsonl

--2023-08-30 23:51:35--  https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.dev.jsonl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471830 (461K) [text/plain]
Saving to: ‘dialogsum.dev.jsonl’


2023-08-30 23:51:35 (9.17 MB/s) - ‘dialogsum.dev.jsonl’ saved [471830/471830]



In [4]:
file_path_dev = 'dialogsum.dev.jsonl'
dev_df = pd.read_json(path_or_buf=file_path_dev, lines=True)

Let's preview the data to make sure it was loaded properly.

In [5]:
dev_df.head(10)

Unnamed: 0,fname,dialogue,summary,topic
0,dev_0,"#Person1#: Hello, how are you doing today?\n#P...",#Person2# has trouble breathing. The doctor as...,see a doctor
1,dev_1,#Person1#: Hey Jimmy. Let's go workout later t...,#Person1# invites Jimmy to go workout and pers...,do exercise
2,dev_2,#Person1#: I need to stop eating such unhealth...,#Person1# plans to stop eating unhealthy foods...,healthy foods
3,dev_3,#Person1#: Do you believe in UFOs?\n#Person2#:...,#Person2# believes in UFOs and can see them in...,UFOs and aliens
4,dev_4,#Person1#: Did you go to school today?\n#Perso...,#Person1# didn't go to school today. #Person2#...,go to school
5,dev_5,"#Person1#: Honey, I think you should quit smok...",#Person1# asks #Person2# to quit smoking for h...,quit smoking
6,dev_6,"#Person1#: Excuse me, Mr. White? I just need y...",Sherry reminds Mr. White to sign.,workplace conversation
7,dev_7,"#Person1#: Hey, Karen. Look like you got some ...",#Person1# asks Karen where Karen stayed and ho...,holidays
8,dev_8,#Person1#: How do you usually spend your leisu...,#Person1# asks about #Person2#'s hobbies. #Per...,hobby
9,dev_9,#Person1#: have you ever seen Bill Gate's home...,#Person1# and #Person2# talk about Bill Gate's...,dream home


## Create a simple summarization app and instrument it

We will create a simple summarization app based on the OpenAI ChatGPT model and instrument it for use with TruLens.

In [7]:
from trulens_eval.tru_custom_app import instrument
from trulens_eval.tru_custom_app import TruCustomApp

In [8]:
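import openai  # this cell uses the legacy `openai.ChatCompletion` interface (openai<1.0)

class DialogSummaryApp:
    """Minimal app that summarizes a dialog with one chat-completion call."""

    @instrument  # lets TruLens record calls to this method
    def summarize(self, dialog):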
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                    {"role": "system", "content": """Summarize the given dialog into 1-2 sentences based on the following criteria: 
                     1. Convey only the most salient information; 
                     2. Be brief; 
                     3. Preserve important named entities within the conversation; 
                     4. Be written from an observer perspective; 
                     5. Be written in formal language. """},
                    {"role": "user", "content": dialog}
                ]
            )["choices"][0]["message"]["content"]
        return summary

## Initialize the database and view the dashboard

In [9]:
from trulens_eval import Tru
tru = Tru(database_url="postgresql://localhost/trulens1?user=trulensuser&password=trulens123")

🦑 Tru initialized with db url postgresql://localhost/trulens1?password=trulens123&user=trulensuser .


In [14]:
tru.run_dashboard()

Starting dashboard ...


Dashboard started at http://172.20.45.173:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

## Write feedback functions

We will now create the feedback functions that will evaluate the app. Recall the two criteria we are evaluating against:
1. Ground truth agreement: For this set of metrics, we will measure how similar the generated summary is to a human-created ground truth. We will use four different measures: BERT score, BLEU, ROUGE, and a measure where an LLM is prompted to produce a similarity score.
2. Groundedness: For this measure, we will estimate if the generated summary can be traced back to parts of the original transcript.

In [10]:
from trulens_eval import Feedback, feedback
from trulens_eval.feedback import GroundTruthAgreement

We build the golden set from the dataset we downloaded, renaming `dialogue` to `query` and `summary` to `response`.

In [11]:
golden_set = dev_df[['dialogue', 'summary']].rename(columns={'dialogue': 'query', 'summary': 'response'}).to_dict('records')
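
Since `golden_set` is a list of dictionaries with `query` and `response` keys, a quick sanity check before running the evaluation can catch empty or malformed records. The following is a small sketch; the helper `check_golden_set` and the example records are hypothetical and not part of TruLens.

```python
def check_golden_set(records):
    """Return indices of records missing a non-empty 'query' or 'response'."""
    bad = []
    for i, rec in enumerate(records):
        query = rec.get("query")
        response = rec.get("response")
        # A usable record needs both fields to be non-blank strings.
        if not isinstance(query, str) or not query.strip():
            bad.append(i)
        elif not isinstance(response, str) or not response.strip():
            bad.append(i)
    return bad


# Hypothetical records in the same shape as golden_set.
example = [
    {"query": "#Person1#: Hi.\n#Person2#: Hello.", "response": "Two people greet each other."},
    {"query": "#Person1#: Bye.", "response": ""},
]
print(check_golden_set(example))  # prints [1]
```

Calling `check_golden_set(golden_set)` in the notebook should return an empty list if every dialog has a reference summary.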

In [12]:
ground_truth_collection = GroundTruthAgreement(golden_set)
f_groundtruth = Feedback(ground_truth_collection.agreement_measure).on_input_output()
f_bert_score = Feedback(ground_truth_collection.bert_score).on_input_output()
f_bleu = Feedback(ground_truth_collection.bleu).on_input_output()
f_rouge = Feedback(ground_truth_collection.rouge).on_input_output()
# Groundedness of the summary (record output) with respect to the dialog (record input).
grounded = feedback.Groundedness()
f_groundedness = feedback.Feedback(grounded.groundedness_measure).on_input().on_output().aggregate(grounded.grounded_statements_aggregator)

✅ In agreement_measure, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In agreement_measure, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In bert_score, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In bert_score, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In bleu, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In bleu, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In rouge, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In rouge, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In groundedness_measure, input source will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In groundedness_measure, input statement will be set to *.__record__.main_output or `Select.RecordOutput` .


## Create the app and wrap it

Now we are ready to wrap our summarization app with TruLens as a `TruCustomApp`. Each time the app is called, TruLens will log the inputs, outputs, and any instrumented intermediate steps, and evaluate them with the feedback functions we created.

In [15]:
app = DialogSummaryApp()
#print(app.summarize(dev_df.dialogue[498]))

In [16]:
ta = TruCustomApp(app, app_id='Summarize_v1', feedbacks = [f_groundtruth, f_groundedness, f_bert_score, f_bleu, f_rouge])

We can test a single run of the app as follows. The record should show up on the dashboard.

In [25]:
ta.with_record(app.summarize, dialog=dev_df.dialogue[498])

("A customer called Amazon's customer service to report a missing page in a book they purchased. The customer service representative requested the order number, confirmed the details, and instructed the customer to take a photo of the missing page. Once verified, a new book will be sent to the customer in 2 days, and they can keep the old one. The conversation ended with the customer expressing no further needs and the representative wishing them a nice day.",
 Record(record_id='record_hash_788d646d08ca593ed7e16cdc528ae5bc', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=439, n_stream_chunks=0, n_prompt_tokens=351, n_completion_tokens=88, cost=0.0007025), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 20, 10, 989566), end_time=datetime.datetime(2023, 8, 31, 0, 20, 14, 196660)), ts=datetime.datetime(2023, 8, 31, 0, 20, 14, 196720), tags='-', meta=None, main_input="#Person1#: Hello, Amazon's customer service. How can I help you?\n#Pe



We'll make a lot of queries in a short amount of time, so we use tenacity to retry failed requests with exponential backoff, so that most of them eventually go through.

In [18]:
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff


In [26]:
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def run_with_backoff(doc):
    return ta.with_record(app.summarize, dialog=doc)
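
The decorator above retries a failing call up to six times, waiting a randomized, exponentially growing interval (capped at 60 seconds) between attempts. To illustrate the idea, here is a simplified standard-library sketch of retrying with jittered exponential backoff. It is not tenacity's implementation, and `retry_with_backoff` and its parameters are hypothetical names chosen for this example.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=6, base=1.0, cap=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call fn() until it succeeds or max_attempts is reached.

    After the n-th failure, wait a random time in
    [0, min(cap, base * 2 ** (n - 1))] seconds before trying again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The upper bound of the wait doubles with each failure, up to cap.
            window = min(cap, base * 2 ** (attempt - 1))
            # Random jitter spreads out retries from concurrent callers.
            sleep(rng() * window)


# Usage with a hypothetical function that fails twice before succeeding.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


# Pass a no-op sleep so the example finishes instantly.
print(retry_with_backoff(flaky, sleep=lambda seconds: None))  # prints ok
```

The `sleep` and `rng` parameters are injectable only so that the behavior can be checked without real waiting; in the notebook, tenacity handles all of this through the `@retry` decorator.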


In [None]:
for pair in golden_set:
    llm_response = run_with_backoff(pair["query"])
    print(llm_response)

("Person 1 asks Person 2 how they are doing to which Person 2 responds that they have been having trouble breathing lately. Person 1 asks if Person 2 has had a cold recently and Person 2 responds that they haven't, but they feel a heavy feeling in their chest when they try to breathe. Person 1 asks if Person 2 has any known allergies and Person 2 responds that they don't. Person 1 then asks if the breathing trouble happens all the time or mostly when they are active, to which Person 2 responds that it happens a lot when they work out. Person 1 decides to refer Person 2 to a pulmonary specialist for tests to check for asthma. Person 2 thanks Person 1 for their help.", Record(record_id='record_hash_27915e894e4a4e8f24dcc29be96196c9', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=405, n_stream_chunks=0, n_prompt_tokens=252, n_completion_tokens=153, cost=0.000684), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 22, 28, 395452), end_tim

('Person 1 asks Person 2 if they believe in UFOs, and Person 2 confirms that they do. Person 1 expresses doubt because they have never seen a UFO, and Person 2 explains that not everyone can see them. Person 2 claims to be able to see UFOs in their dreams and states that the purpose of UFOs is to bring aliens from outer space to Earth. Person 1 asks what the aliens look like and if Person 2 can communicate with them, and Person 2 describes the aliens as robot-like beings who can speak. Person 1 is amazed and asks if Person 2 communicates with them in English, and Person 2 affirms that they do.', Record(record_id='record_hash_b90da8db439da736e11d4afcf9320037', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=437, n_stream_chunks=0, n_prompt_tokens=296, n_completion_tokens=141, cost=0.000726), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 22, 39, 944131), end_time=datetime.datetime(2023, 8, 31, 0, 22, 44, 532623)), ts=datetime.datetim

('Person2, Mr. White, apologizes to Person1, Sherry, for making her wait and thanks her for reminding him to sign the papers. Sherry then asks for one more signature, and Mr. White provides it.', Record(record_id='record_hash_1056691f88f44f06bc9dc5cdd87f7d31', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=218, n_stream_chunks=0, n_prompt_tokens=171, n_completion_tokens=47, cost=0.0003505), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 22, 51, 352792), end_time=datetime.datetime(2023, 8, 31, 0, 22, 53, 444608)), ts=datetime.datetime(2023, 8, 31, 0, 22, 53, 444651), tags='-', meta=None, main_input="#Person1#: Excuse me, Mr. White? I just need you to sign these before I leave.\n#Person2#: Sure, Sherry. Sorry to have kept you waiting. If you hadn't told me, I probably would have just forgotten all about them.\n#Person1#: That's my job, sir. Just one more signature here, please.\n#Person2#: There you are.", main_output='Person2, Mr. W

("Person1 expresses dissatisfaction with their life and feeling tired. Person2 expresses envy towards Person1's life but Person1 reveals they have been over-protected by their mother and are considering leaving the family. Person2 acknowledges Person1's perspective.", Record(record_id='record_hash_2459787d8d14469c335f6c50f389b50c', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=209, n_stream_chunks=0, n_prompt_tokens=161, n_completion_tokens=48, cost=0.0003375), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 1, 75693), end_time=datetime.datetime(2023, 8, 31, 0, 23, 2, 781112)), ts=datetime.datetime(2023, 8, 31, 0, 23, 2, 781161), tags='-', meta=None, main_input="#Person1#: I am tired of everything in my life.\n#Person2#: What? How happy you life is! I do envy you.\n#Person1#: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spread my wings.\n#Person2#: Maybe you 

('In a conversation about women in sports, Person1 asks which sports women excel at, to which Person2 responds that women excel in every sport except those that are taboo for them to join in, like football. Person1 then asks which sports women are better at than men, but Person2 argues that women and men are different and cannot be compared. Person1 then changes the question to ask which sports women like best, and Person2 responds that it varies with individuals, mentioning golf and contact sports as examples. Finally, Person1 asks if women can be generally categorized, but Person2 questions the possibility of categorizing anyone.', Record(record_id='record_hash_db8ad7fad1b8851ee63f0a7c69ee7024', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=366, n_stream_chunks=0, n_prompt_tokens=241, n_completion_tokens=125, cost=0.0006115000000000001), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 8, 148943), end_time=datetime.datetime(20

('Person 2 receives bad news that they did not get a position they were hoping for, despite thinking they were qualified. Person 1 tries to console them and encourages them to keep working hard for future opportunities.', Record(record_id='record_hash_e8aa6de7336a5b5e84e9e4cfe053c98e', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=316, n_stream_chunks=0, n_prompt_tokens=274, n_completion_tokens=42, cost=0.000495), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 16, 760831), end_time=datetime.datetime(2023, 8, 31, 0, 23, 18, 569560)), ts=datetime.datetime(2023, 8, 31, 0, 23, 18, 569604), tags='-', meta=None, main_input="#Person1#: I'm afraid it's bad news for you. You haven't got the position.\n#Person2#: Oh, no! I can't have failed. Are you sure?\n#Person1#: I'm afraid so. I'm terribly sorry.\n#Person2#: It sucks. But Arden told me he's satisfied with my qualifications and experience.\n#Person1#: He's the only one of the severa

("Person 1, Ms. Murphy, tells Brad that he needs to redo something because it is badly organized and she can't present it to the board. Brad apologizes and says he will re-work it and give it back to her in the afternoon.", Record(record_id='record_hash_edc9b0e1e58abe8cef544fb5bfdba4d4', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=217, n_stream_chunks=0, n_prompt_tokens=166, n_completion_tokens=51, cost=0.000351), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 23, 201906), end_time=datetime.datetime(2023, 8, 31, 0, 23, 25, 57331)), ts=datetime.datetime(2023, 8, 31, 0, 23, 25, 57373), tags='-', meta=None, main_input="#Person1#: sorry, Brad. But you are going to have to re-do this.\n#Person2#: What's the problem, Ms. Murphy?\n#Person1#: It's badly organized. I can't present this to the board.\n#Person2#: I'm sorry. Ms. Murphy. I'll re-work it. Can I give it back to you this afternoon?", main_output="Person 1, Ms. Murphy, tells

('The customer would like to order a steak with mushrooms, well-done. They would like baked potatoes and cream onion soup. They will not be having dessert.', Record(record_id='record_hash_4c879efde893e626ecc85d2d6b7b031e', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=305, n_stream_chunks=0, n_prompt_tokens=274, n_completion_tokens=31, cost=0.000473), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 31, 983156), end_time=datetime.datetime(2023, 8, 31, 0, 23, 33, 202737)), ts=datetime.datetime(2023, 8, 31, 0, 23, 33, 202782), tags='-', meta=None, main_input="#Person1#: Would you like to order now, madam?\n#Person2#: Yes, please. I'd like the steak and mushrooms.\n#Person1#: How would you like your steak, rare, medium, or well-done?\n#Person2#: I'd like it well done, please.\n#Person1#: What kind of potatoes would like to go with that, mushed, boiled, or baked?\n#Person2#: I think i have bake potatoes. And i now have ice tea with 

('Bill is asked by Person1 if he is free at noon, and after confirming that he is, Person1 asks Bill to go downtown with them after lunch to buy a new filing cabinet and some office supplies based on a list given by Susan.', Record(record_id='record_hash_e8213dcc4f058c94f3b58fa38d58161b', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=273, n_stream_chunks=0, n_prompt_tokens=224, n_completion_tokens=49, cost=0.00043400000000000003), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 39, 562846), end_time=datetime.datetime(2023, 8, 31, 0, 23, 41, 424459)), ts=datetime.datetime(2023, 8, 31, 0, 23, 41, 424504), tags='-', meta=None, main_input='#Person1#: Bill, will you be free at noon?\n#Person2#: Yes. What can I do for you?\n#Person1#: We need a new filing cabinet in the office. Could you go downtown with me after lunch?\n#Person2#: All right. Have you got an idea about what type to buy?\n#Person1#: Yes, the same as the one we have. A

("Person 1 suggests taking a break from studying for tomorrow's history exam to listen to music. Person 2 agrees and comments on the large collection of music. Person 1 mentions having a variety of genres but no classical music. Person 2 reminds Person 1 of the upcoming exam and suggests getting back to studying.", Record(record_id='record_hash_34c0d15fcfe7a44ead89966e1d4b6583', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=379, n_stream_chunks=0, n_prompt_tokens=315, n_completion_tokens=64, cost=0.0006005), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 49, 782938), end_time=datetime.datetime(2023, 8, 31, 0, 23, 52, 66861)), ts=datetime.datetime(2023, 8, 31, 0, 23, 52, 66907), tags='-', meta=None, main_input="#Person1#: We've been cramming for tomorrow's history exam since early this morning. What do you say we take a break and listen to some music, okay?\n#Person2#: Now that you mention it, I'm getting a little bumed-out fro

("Mrs. Brandon is not doing well as she lost her job, while Person 2's students are feeling anxious about their final test. Person 1 remembers struggling with a difficult test given by Person 2 in college, but also expresses gratitude for the valuable English language skills acquired.", Record(record_id='record_hash_ec8d919225f640636076503141bdc9ee', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=271, n_stream_chunks=0, n_prompt_tokens=215, n_completion_tokens=56, cost=0.0004345), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 23, 57, 960229), end_time=datetime.datetime(2023, 8, 31, 0, 23, 59, 980960)), ts=datetime.datetime(2023, 8, 31, 0, 23, 59, 981003), tags='-', meta=None, main_input="#Person1#: How are you, Mrs. Brandon?\n#Person2#: Pretty good. How are you doing?\n#Person1#: Not so good. I lost my job today.\n#Person2#: I'm sorry to hear that.\n#Person1#: How are your students doing?\n#Person2#: They are very nervous about th

('Gary expresses his excitement about his first date with Caroline to Anne and mentions his desire to propose marriage. Anne advises Gary to go on a second date with Caroline before considering a marriage proposal.', Record(record_id='record_hash_6749752ceadac8abb9f146d8a832117e', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=240, n_stream_chunks=0, n_prompt_tokens=203, n_completion_tokens=37, cost=0.00037850000000000004), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 24, 5, 614623), end_time=datetime.datetime(2023, 8, 31, 0, 24, 7, 65157)), ts=datetime.datetime(2023, 8, 31, 0, 24, 7, 65202), tags='-', meta=None, main_input="#Person1#: Anne, thanks so much for introducing me to Caroline! Our first date went so well. I'm so excited to be in love right now.\n#Person2#: I'm just glad to see you so happy, Gary!\n#Person1#: I want to climb the highest mountain and shout, Caroline, will you marry me?!\n#Person2#: Wow, you'd better not.

('Person1 asks Person2 for directions to the Rainbow Restaurant. Person2 provides clear instructions, telling Person1 to drive two blocks, turn left, continue on until Heath Street, turn right, and then turn left at the second stop light.', Record(record_id='record_hash_e010abcc0fb961084f1d36729a3a3fff', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=195, n_stream_chunks=0, n_prompt_tokens=147, n_completion_tokens=48, cost=0.0003165), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 24, 11, 939567), end_time=datetime.datetime(2023, 8, 31, 0, 24, 14, 62695)), ts=datetime.datetime(2023, 8, 31, 0, 24, 14, 62738), tags='-', meta=None, main_input="#Person1#: Excuse me, can you tell me how to get to the Rainbow Restaurant from here?\n#Person2#: Drive two blocks and turn left. Continue on until you reach Heath Street and turn right. Then turn left at the second stop light. You can't miss it.", main_output='Person1 asks Person2 for direction

('Person 1 is looking for a big pan for their kitchen and rejects a heavy one but eventually finds a lightweight one with a heat-resistant handle. They decide to buy it and also ask for a lid, which Person 2 provides.', Record(record_id='record_hash_55358e38d97fd8e703325a1aa97ca4c2', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=360, n_stream_chunks=0, n_prompt_tokens=313, n_completion_tokens=47, cost=0.0005635), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 24, 22, 440554), end_time=datetime.datetime(2023, 8, 31, 0, 24, 24, 105512)), ts=datetime.datetime(2023, 8, 31, 0, 24, 24, 105554), tags='-', meta=None, main_input="#Person1#: I'm looking for a pan I can use in my kitchen. \n#Person2#: What size pan were you thinking of? \n#Person1#: I've already got a small pan. I need a big one. \n#Person2#: Well, this one might work for you. \n#Person1#: Oh, no, that's way too heavy a pan for me. \n#Person2#: Here, lift this aluminum pan. 

("Daniel is applying for a manager position at the company. He found out about the company through famous brands and believes it is the best-known. Although he doesn't have much experience, he is interested in the job and hopes to hear back within the week.", Record(record_id='record_hash_847b19a230eed1bb1beafa95937149d5', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=390, n_stream_chunks=0, n_prompt_tokens=339, n_completion_tokens=51, cost=0.0006105), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 24, 25, 864031), end_time=datetime.datetime(2023, 8, 31, 0, 24, 27, 850040)), ts=datetime.datetime(2023, 8, 31, 0, 24, 27, 850084), tags='-', meta=None, main_input="#Person1#: Good morning, I'm Daniel. I'm applying for the positon of manager. \n#Person2#: Yes. Sit down, please. How did you learn about our company? \n#Person1#: I got to know your company through such famous brands as LUX, LIPTON and WALLS. After making a customer survey,

("Person1 asks Person2 if they need assistance, to which Person2 responds that they'd like to book 3 seats to Calgary, Canada on a flight before next Sunday. Person1 asks if they want economy class and if it's a one-way or round trip. Person2 confirms economy class and one-way. Person1 informs Person2 that there are no direct flights and they will have to change in Vancouver, to which Person2 agrees. Person1 suggests a flight leaving Beijing next Friday at 10 am with 3 available seats, and Person2 confirms it. Person1 asks for Person2's name, which is given as Basil. Person2 then asks about the ticket price, and Person1 states it is $580 for one ticket.", Record(record_id='record_hash_a7948ef1b6d80f6cb3e72e8dc20bbe0f', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=410, n_stream_chunks=0, n_prompt_tokens=260, n_completion_tokens=150, cost=0.0006900000000000001), perf=Perf(start_time=datetime.datetime(2023, 8, 31, 0, 24, 31, 705659), end_ti

And that's it! The loop might take a few minutes to run. Once it finishes, you can explore the dashboard to see how well your app does.