<a href="https://colab.research.google.com/github/IyadSultan/Basic_Flask_site/blob/master/educational/eval_trulens/summarization_eval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Evaluating Summarization with TruLens

In this notebook, we will evaluate a summarization application based on the [DialogSum dataset](https://github.com/cylnlp/dialogsum) using several metrics. These fall into two main categories:
1. Ground truth agreement: For this set of metrics, we will measure how similar the generated summary is to a human-written ground truth. We will use four different measures: BERT score, BLEU, ROUGE, and a measure where an LLM is prompted to produce a similarity score.
2. Groundedness: For this measure, we will estimate if the generated summary can be traced back to parts of the original transcript.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/summarization_eval.ipynb)

### Dependencies
Let's first install the packages that this notebook depends on. Uncomment these lines to run.

In [None]:
# !pip install trulens_eval==0.18.0 \
#              bert_score==0.3.13 \
#              evaluate==0.4.0 \
#              absl-py==1.4.0 \
#              rouge-score==0.1.2 \
#              pandas \
#              tenacity

In [None]:
# Set the OPENAI_API_KEY environment variable (replace the placeholder with your own key).

import os
os.environ['OPENAI_API_KEY'] = 'sk-*******************************'
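
Hardcoding a key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of an alternative, the helper below prompts for the value only when the variable is not already set. The name `ensure_env_var` is hypothetical and not part of TruLens or the OpenAI client.

```python
import os
from getpass import getpass


def ensure_env_var(name, prompt=getpass):
    """Set os.environ[name] from a prompt if it is missing, then return its value.

    `ensure_env_var` is a hypothetical helper written for this notebook.
    """
    if not os.environ.get(name):
        # getpass hides the typed value so it does not appear in cell output.
        os.environ[name] = prompt(f"Enter {name}: ")
    return os.environ[name]


# In the notebook, call:
# ensure_env_var("OPENAI_API_KEY")
```

The `prompt` parameter exists only so the helper can be exercised without interactive input.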


### Download and load data
Now we will download a portion of the DialogSum dataset from GitHub.

In [None]:
import pandas as pd

In [None]:
!wget -O dialogsum.dev.jsonl https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.dev.jsonl

--2024-01-31 20:31:04--  https://raw.githubusercontent.com/cylnlp/dialogsum/main/DialogSum_Data/dialogsum.dev.jsonl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 471830 (461K) [text/plain]
Saving to: ‘dialogsum.dev.jsonl’


2024-01-31 20:31:04 (18.7 MB/s) - ‘dialogsum.dev.jsonl’ saved [471830/471830]



In [None]:
file_path_dev = 'dialogsum.dev.jsonl'
dev_df = pd.read_json(path_or_buf=file_path_dev, lines=True)
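
The file is in JSON Lines format, where every line is a standalone JSON object, which is why `lines=True` is passed to `pd.read_json`. The sketch below parses the same layout with only the standard library. The two sample records are invented for illustration and are not taken from DialogSum.

```python
import json

# Two invented records in JSON Lines layout (one JSON object per line).
sample = (
    '{"fname": "dev_0", "dialogue": "#Person1#: Hi.", "summary": "A greeting.", "topic": "greeting"}\n'
    '{"fname": "dev_1", "dialogue": "#Person1#: Bye.", "summary": "A farewell.", "topic": "farewell"}\n'
)


def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_jsonl(sample)
print(len(records), sorted(records[0]))  # 2 ['dialogue', 'fname', 'summary', 'topic']
```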

Let's preview the data to make sure it was loaded properly.

In [None]:
dev_df.head(10)

| | fname | dialogue | summary | topic |
|---|---|---|---|---|
| 0 | dev_0 | #Person1#: Hello, how are you doing today?\n#P... | #Person2# has trouble breathing. The doctor as... | see a doctor |
| 1 | dev_1 | #Person1#: Hey Jimmy. Let's go workout later t... | #Person1# invites Jimmy to go workout and pers... | do exercise |
| 2 | dev_2 | #Person1#: I need to stop eating such unhealth... | #Person1# plans to stop eating unhealthy foods... | healthy foods |
| 3 | dev_3 | #Person1#: Do you believe in UFOs?\n#Person2#:... | #Person2# believes in UFOs and can see them in... | UFOs and aliens |
| 4 | dev_4 | #Person1#: Did you go to school today?\n#Perso... | #Person1# didn't go to school today. #Person2#... | go to school |
| 5 | dev_5 | #Person1#: Honey, I think you should quit smok... | #Person1# asks #Person2# to quit smoking for h... | quit smoking |
| 6 | dev_6 | #Person1#: Excuse me, Mr. White? I just need y... | Sherry reminds Mr. White to sign. | workplace conversation |
| 7 | dev_7 | #Person1#: Hey, Karen. Look like you got some ... | #Person1# asks Karen where Karen stayed and ho... | holidays |
| 8 | dev_8 | #Person1#: How do you usually spend your leisu... | #Person1# asks about #Person2#'s hobbies. #Per... | hobby |
| 9 | dev_9 | #Person1#: have you ever seen Bill Gate's home... | #Person1# and #Person2# talk about Bill Gate's... | dream home |


## Create a simple summarization app and instrument it

We will create a simple summarization app based on the OpenAI ChatGPT model and instrument it for use with TruLens.

In [None]:
from trulens_eval.tru_custom_app import instrument
from trulens_eval.tru_custom_app import TruCustomApp

In [None]:
import openai


class DialogSummaryApp:

    @instrument
    def summarize(self, dialog):
        # The client reads OPENAI_API_KEY from the environment.
        client = openai.OpenAI()
        summary = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                    {"role": "system", "content": """Summarize the given dialog into 1-2 sentences based on the following criteria:
                     1. Convey only the most salient information;
                     2. Be brief;
                     3. Preserve important named entities within the conversation;
                     4. Be written from an observer perspective;
                     5. Be written in formal language. """},
                    {"role": "user", "content": dialog},
            ],
        )
        # Return only the text of the first completion choice.
        return summary.choices[0].message.content
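
The `messages` argument passed to the chat completion call is a list of role-tagged dictionaries: a system message carrying the instructions, followed by a user message carrying the dialog. As a small sketch of that structure, the hypothetical helper `build_messages` below builds the list without calling the API. The shortened `SYSTEM_PROMPT` is a stand-in, not the full prompt used by the app.

```python
# Shortened stand-in for the app's system prompt (illustrative only).
SYSTEM_PROMPT = "Summarize the given dialog into 1-2 sentences."


def build_messages(dialog, system_prompt=SYSTEM_PROMPT):
    """Return the chat payload: instructions first, then the dialog to summarize."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": dialog},
    ]


messages = build_messages("#Person1#: Hello.\n#Person2#: Hi there.")
print([m["role"] for m in messages])  # ['system', 'user']
```

Keeping the payload construction in a pure function like this makes it possible to inspect or unit-test the prompt without spending tokens.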



'The individuals in the conversation are discussing an error message that they encountered, but no specific details about the error message are provided.'

## Initialize Database and view dashboard

In [None]:
from trulens_eval import Tru
tru = Tru()
# If you have a database you can connect to, use a URL. For example:
# tru = Tru(database_url="postgresql://hostname/database?user=username&password=password")

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


In [None]:
tru.run_dashboard()

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.
npx: installed 22 in 4.67s

Go to this url and submit the ip given here. your url is: https://heavy-lamps-stop.loca.lt

  Submit this IP Address: 34.82.249.199:8502


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>




## Write feedback functions

We will now create the feedback functions that will evaluate the app. Recall the two criteria we are evaluating against:
1. Ground truth agreement: For this set of metrics, we will measure how similar the generated summary is to a human-written ground truth. We will use four different measures: BERT score, BLEU, ROUGE, and a measure where an LLM is prompted to produce a similarity score.
2. Groundedness: For this measure, we will estimate if the generated summary can be traced back to parts of the original transcript.
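
To give a feel for what an overlap-based ground truth metric measures, here is a minimal sketch of a unigram F1 score in the spirit of ROUGE-1. It is a simplified stand-in that assumes whitespace tokenization and lowercasing. It is not the implementation used by TruLens or the `rouge-score` package, and its values will not match theirs exactly.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over overlapping unigrams between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = unigram_f1(
    "the doctor suggests a test",
    "the doctor suggests an asthma test",
)
print(round(score, 3))  # 0.727
```

Here 4 of the 5 candidate tokens appear in the 6-token reference, so precision is 0.8, recall is about 0.667, and F1 is 8/11. Scores like this reward shared wording, which is why the notebook pairs them with BERT score and an LLM-based similarity measure that can credit paraphrases.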

In [None]:
from trulens_eval import Feedback, feedback
from trulens_eval.feedback import GroundTruthAgreement

We build the golden set from the dataset we downloaded, renaming the columns to the `query` and `response` keys used for the ground truth records.

In [None]:
golden_set = dev_df[['dialogue', 'summary']].rename(columns={'dialogue': 'query', 'summary': 'response'}).to_dict('records')
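
The result is a plain list of dictionaries, one per dialog, each with a `query` and a `response` key. The sketch below builds the same shape without pandas from two invented rows. The `reference_by_query` lookup is only an illustration of how a reference summary could be found from its dialog, and is an assumption rather than a description of how `GroundTruthAgreement` works internally.

```python
# Two invented rows standing in for dev_df[['dialogue', 'summary']].
rows = [
    {"dialogue": "#Person1#: Hi.\n#Person2#: Hello.", "summary": "Two people greet each other."},
    {"dialogue": "#Person1#: Bye.\n#Person2#: See you.", "summary": "Two people say goodbye."},
]

# Same renaming as the pandas one-liner: dialogue -> query, summary -> response.
golden = [{"query": r["dialogue"], "response": r["summary"]} for r in rows]

# Hypothetical lookup from a dialog to its human-written reference summary.
reference_by_query = {g["query"]: g["response"] for g in golden}

print(reference_by_query["#Person1#: Hi.\n#Person2#: Hello."])  # Two people greet each other.
```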

In [None]:
ground_truth_collection = GroundTruthAgreement(golden_set)
f_groundtruth = Feedback(ground_truth_collection.agreement_measure).on_input_output()
f_bert_score = Feedback(ground_truth_collection.bert_score).on_input_output()
f_bleu = Feedback(ground_truth_collection.bleu).on_input_output()
f_rouge = Feedback(ground_truth_collection.rouge).on_input_output()
# Groundedness between each context chunk and the response.
grounded = feedback.Groundedness()
f_groundedness = feedback.Feedback(grounded.groundedness_measure).on_input().on_output().aggregate(grounded.grounded_statements_aggregator)

✅ In agreement_measure, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In agreement_measure, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In bert_score, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In bert_score, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In bleu, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In bleu, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In rouge, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In rouge, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In groundedness_measure, input source will be set to __record__.main_input or `Select.RecordInput` .
✅ In groundedness_measure, input statement will be set to __record__.main_output or `Select.RecordOutput` .


## Create the app and wrap it

Now we are ready to wrap our summarization app with TruLens as a `TruCustomApp`. Each time the app is called, TruLens will log inputs, outputs, and any instrumented intermediate steps, and evaluate them with the feedback functions we created.

In [None]:
app = DialogSummaryApp()
#print(app.summarize(dev_df.dialogue[498]))

In [None]:
ta = TruCustomApp(app, app_id='Summarize_v1', feedbacks = [f_groundtruth, f_groundedness, f_bert_score, f_bleu, f_rouge])



We can test a single run of the app as follows. The record should show up on the dashboard.

In [None]:
ta.with_record(app.summarize, dialog=dev_df.dialogue[498])



("Amazon's customer service representative asks for the order number and verifies the details of the purchased book. They instruct the customer to take a photo of the missing page and upload it to their website for confirmation. A new book will be sent in 2 days, and the customer can keep the old one. The conversation concludes with the customer expressing no further issues and the customer service representative wishing them a nice day.",
 Record(record_id='record_hash_84faa48ead0206ed5e3f547c1830999f', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=432, n_stream_chunks=0, n_prompt_tokens=351, n_completion_tokens=81, cost=0.0006885), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 5, 982729), end_time=datetime.datetime(2024, 1, 31, 20, 31, 7, 647015)), ts=datetime.datetime(2024, 1, 31, 20, 31, 7, 647368), tags='-', meta=None, main_input="#Person1#: Hello, Amazon's customer service. How can I help you?\n#Person2#: Hello, it's t

We'll make a lot of queries in a short amount of time, so we use the `tenacity` library to retry failed requests with randomized exponential backoff, so that most of them eventually go through.

In [None]:
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff


In [None]:
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def run_with_backoff(doc):
    return ta.with_record(app.summarize, dialog=doc)
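
For intuition about what the decorator does, here is a hand-rolled sketch of retrying with randomized exponential backoff. It illustrates the general technique under simple assumptions (the delay is drawn uniformly between 0 and an upper bound that doubles on each attempt, capped at `cap`). It is not tenacity's implementation, and `retry_with_backoff` is a hypothetical name.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=6, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn() until it succeeds, waiting a random, growing delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # The upper bound doubles with each attempt and is capped at `cap`.
            upper = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper))


# Demo: a function that fails twice (simulating rate limits) and then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, attempts["count"], len(delays))  # ok 3 2
```

The randomness spreads retries out so that many clients hitting the same rate limit do not all retry at the same moment.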


In [None]:
for pair in golden_set:
    llm_response = run_with_backoff(pair["query"])
    print(llm_response)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



('Person2 is experiencing trouble breathing and a heavy feeling in their chest, especially when working out. They do not have any known allergies or recent colds. Person1, a doctor, suggests that Person2 sees a pulmonary specialist to test for asthma.', Record(record_id='record_hash_187b889f5afbfa05fe4e74da0f59bf6d', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=302, n_stream_chunks=0, n_prompt_tokens=252, n_completion_tokens=50, cost=0.000478), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 8, 460799), end_time=datetime.datetime(2024, 1, 31, 20, 31, 10, 495881)), ts=datetime.datetime(2024, 1, 31, 20, 31, 10, 496316), tags='-', meta=None, main_input="#Person1#: Hello, how are you doing today?\n#Person2#: I ' Ve been having trouble breathing lately.\n#Person1#: Have you had any type of cold lately?\n#Person2#: No, I haven ' t had a cold. I just have a heavy feeling in my chest when I try to breathe.\n#Person1#: Do you have any


('Person1 and Person2 are discussing their workout plans. Person2 initially suggests working on legs and forearms, but Person1, who has already played basketball and has sore legs, suggests working on arms and stomach instead. Person2 initially resists the change but eventually agrees to meet Person1 at the gym at 3:30 to work on arms and stomach.', Record(record_id='record_hash_0cc657e1c3b339a21a8c44ac32eca0e2', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=307, n_stream_chunks=0, n_prompt_tokens=234, n_completion_tokens=73, cost=0.000497), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 13, 583031), end_time=datetime.datetime(2024, 1, 31, 20, 31, 16, 232253)), ts=datetime.datetime(2024, 1, 31, 20, 31, 16, 232613), tags='-', meta=None, main_input="#Person1#: Hey Jimmy. Let's go workout later today.\n#Person2#: Sure. What time do you want to go?\n#Person1#: How about at 3:30?\n#Person2#: That sounds good. Today we work on Legs



('Person1 wants to eat healthier and asks Person2 what foods they eat now. Person2 says they mainly eat fruits, vegetables, and chicken because they are very healthy, especially when baked. Person1 agrees that it sounds healthier than what they currently eat.', Record(record_id='record_hash_e32dd1d308fffadb450d62497acf24e3', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=280, n_stream_chunks=0, n_prompt_tokens=229, n_completion_tokens=51, cost=0.0004455), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 18, 743427), end_time=datetime.datetime(2024, 1, 31, 20, 31, 21, 85703)), ts=datetime.datetime(2024, 1, 31, 20, 31, 21, 86020), tags='-', meta=None, main_input="#Person1#: I need to stop eating such unhealthy foods.\n#Person2#: I know what you mean. I've started eating better myself.\n#Person1#: What foods do you eat now?\n#Person2#: I tend to stick to fruits, vegetables, and chicken.\n#Person1#: Those are the only things you eat



("Person 2 believes in UFOs and claims to see them in their dreams. They believe that the UFOs' purpose is to bring aliens to Earth and make friends with humans, and that these aliens appear as robots who can speak and learn English on Mars. Person 1 expresses amazement at this information.", Record(record_id='record_hash_f218a88ba6abf717fc36d3b7f2642566', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=359, n_stream_chunks=0, n_prompt_tokens=296, n_completion_tokens=63, cost=0.00057), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 24, 893180), end_time=datetime.datetime(2024, 1, 31, 20, 31, 28, 679633)), ts=datetime.datetime(2024, 1, 31, 20, 31, 28, 680299), tags='-', meta=None, main_input="#Person1#: Do you believe in UFOs?\n#Person2#: Of course, they are out there.\n#Person1#: But I never saw them.\n#Person2#: Are you stupid? They are called UFOs, so not everybody can see them.\n#Person1#: You mean that you can them.\n#Perso



("Person1 and Person2 are discussing their school attendance, with Person1 admitting to skipping school because they didn't want to go. Person2 then asks if Person1 has gone to the movies recently and expresses their desire to go this weekend, prompting Person1 to suggest they go alone. Person1 then asks if Person2 plans on going to school tomorrow, but Person2 plans on going to the movies instead.", Record(record_id='record_hash_258d61140b71293064f8364913cf40b1', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=325, n_stream_chunks=0, n_prompt_tokens=243, n_completion_tokens=82, cost=0.0005285), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 34, 647902), end_time=datetime.datetime(2024, 1, 31, 20, 31, 38, 271934)), ts=datetime.datetime(2024, 1, 31, 20, 31, 38, 272320), tags='-', meta=None, main_input="#Person1#: Did you go to school today?\n#Person2#: Of course. Did you?\n#Person1#: I didn't want to, so I didn't.\n#Person2#: Th



('Person2 is hesitant about quitting smoking but ultimately agrees to quit after Person1 presses them to make a decision.', Record(record_id='record_hash_02106d6d6780f299b6c480b107415ab0', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=255, n_stream_chunks=0, n_prompt_tokens=233, n_completion_tokens=22, cost=0.0003935), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 40, 300289), end_time=datetime.datetime(2024, 1, 31, 20, 31, 42, 358079)), ts=datetime.datetime(2024, 1, 31, 20, 31, 42, 358527), tags='-', meta=None, main_input="#Person1#: Honey, I think you should quit smoking.\n#Person2#: Why? You said I was hot when smoking.\n#Person1#: But I want you to be fit.\n#Person2#: Smoking is killing. I know.\n#Person1#: Check out this article. It says smoking can lead to lung cancer.\n#Person2#: I don't believe it.\n#Person1#: But you know that smoking does harm to health, right?\n#Person2#: Of course I know it, but you know it's har



("Person1, referred to as Sherry, approaches Mr. White and asks him to sign some documents. Mr. White apologizes for the delay and mentions that he would have forgotten about the papers if Sherry hadn't reminded him. Sherry then requests one final signature from Mr. White.", Record(record_id='record_hash_b560a13f49d7b3b45951ddb9077f3c24', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=230, n_stream_chunks=0, n_prompt_tokens=171, n_completion_tokens=59, cost=0.0003745), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 45, 814820), end_time=datetime.datetime(2024, 1, 31, 20, 31, 48, 498793)), ts=datetime.datetime(2024, 1, 31, 20, 31, 48, 499287), tags='-', meta=None, main_input="#Person1#: Excuse me, Mr. White? I just need you to sign these before I leave.\n#Person2#: Sure, Sherry. Sorry to have kept you waiting. If you hadn't told me, I probably would have just forgotten all about them.\n#Person1#: That's my job, sir. Just one mo



("Person2 tells Person1 that she spent the weekend at the beach and stayed with some friends of her parents. She jogged, played volleyball, and didn't swim because the water was too cold. Person1 expresses jealousy and suggests that Person1 spent the weekend in the library.", Record(record_id='record_hash_cf2602f47205a693a73d2a8761a16edf', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=366, n_stream_chunks=0, n_prompt_tokens=310, n_completion_tokens=56, cost=0.000577), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 50, 869312), end_time=datetime.datetime(2024, 1, 31, 20, 31, 53, 432793)), ts=datetime.datetime(2024, 1, 31, 20, 31, 53, 433237), tags='-', meta=None, main_input="#Person1#: Hey, Karen. Look like you got some sun this weekend.\n#Person2#: Yeah? I guess so. I spent the weekend at beach.\n#Person1#: That's great. Where did you stay?\n#Person2#: Some friends of my parents live out there, and they invited me there.\n#Pe



('Person2 enjoys taking photos outdoors and has their own photo studio where they develop and print their own pictures.', Record(record_id='record_hash_b21198599d4bcf88fb1bc26381e5acfd', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=239, n_stream_chunks=0, n_prompt_tokens=218, n_completion_tokens=21, cost=0.00036899999999999997), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 31, 55, 995062), end_time=datetime.datetime(2024, 1, 31, 20, 31, 58, 58006)), ts=datetime.datetime(2024, 1, 31, 20, 31, 58, 58494), tags='-', meta=None, main_input="#Person1#: How do you usually spend your leisure time? I mean, do you have any special interests out of your job?\n#Person2#: Of course. You see, almost everyone has some kind of hobby\n#Person1#: Yeah, you're quite right and what's your hobby?\n#Person2#: I like taking photos out of door.\n#Person1#: Oh, photography, It's really a good hobby.\n#Person2#: Yes, I usually develop and print all my o



("Person1 mentions that Bill Gates' home has its own library, theatre, swimming pool, and guest house, as well as multiple rooms connected to computers. Person2 asks if Person1 would want to live there, to which Person1 responds that while they think the house is fantastic, they wouldn't want to live there due to the need for additional staff for maintenance. They mention that their dream home is a small cottage in a quiet village in England and that they prefer old homes with character. Person2 asks if Person1 also likes second-hand clothes for the same reason, and Person1 clarifies that it's due to budget constraints. Person2 then asks whether if Person1 lived in an old house, they would decorate it in a modern way, and Person1 responds that they would try to restore it to its original state to experience living in another time in history.", Record(record_id='record_hash_fd5bc12592a75bb49542e03234288e74', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_class



("Person1 expresses dissatisfaction with their life and feeling of being tired. Person2 disagrees and expresses envy towards Person1's life. Person1 reveals that they have been over-protected by their mother and desire to break free from the family. Person2 acknowledges that Person1 may be right.", Record(record_id='record_hash_89241998bddb71c1e2232b0c5c85435a', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=218, n_stream_chunks=0, n_prompt_tokens=161, n_completion_tokens=57, cost=0.0003555), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 32, 7, 166677), end_time=datetime.datetime(2024, 1, 31, 20, 32, 9, 684509)), ts=datetime.datetime(2024, 1, 31, 20, 32, 9, 685020), tags='-', meta=None, main_input="#Person1#: I am tired of everything in my life.\n#Person2#: What? How happy you life is! I do envy you.\n#Person1#: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spre



('There is a discussion about the prevalence of advertisements in Hong Kong, with Person 1 expressing their dislike while Person 2 sees them as adding to the vibrancy of the city. Person 1 argues that companies should spend less on advertising and lower prices, while Person 2 counters that advertising is necessary for product awareness. They both agree that certain forms of advertising, like spam and intrusive broadcasts, are annoying, but appreciate the use of comedy in ad campaigns. Person 1 also dislikes the pressure tactics employed in some advertisements. Person 2 mentions that brand name products often use this type of advertising to maintain brand loyalty.', Record(record_id='record_hash_ff2c11a2292ef0ea19665cd17ecb6137', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=520, n_stream_chunks=0, n_prompt_tokens=396, n_completion_tokens=124, cost=0.0008420000000000001), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 32, 12, 3691



("Person1 asks Person2 about their date, and they express frustration at being turned down. Person1 suggests that Person2 should exercise to improve their chances with American women, citing that being in good shape is generally liked by them. Person2 is initially hesitant but is convinced by Person1's argument that exercising will bring benefits regardless.", Record(record_id='record_hash_30b9b27d30247f2f0510a2a9b9edb93e', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=452, n_stream_chunks=0, n_prompt_tokens=386, n_completion_tokens=66, cost=0.000711), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 32, 18, 693240), end_time=datetime.datetime(2024, 1, 31, 20, 32, 20, 863583)), ts=datetime.datetime(2024, 1, 31, 20, 32, 20, 864107), tags='-', meta=None, main_input="#Person1#: Hi, Mr. Zhang. What's wrong? You don't look so happy. How was your date?\n#Person2#: I was turned down again. It's frustrating. I guess you'Ve got to teach me 



('Person2 believes that women excel in every sport except for those that are considered taboo, like football. Person1 acknowledges that women and men are different and asks which sports women like best. Person2 responds by mentioning that some women love golf while others enjoy contact sports, indicating that women cannot be generally categorized.', Record(record_id='record_hash_bcb627b402ce55dd92e0e43f8bd17964', app_id='Summarize_v1', cost=Cost(n_requests=1, n_successful_requests=1, n_classes=0, n_tokens=302, n_stream_chunks=0, n_prompt_tokens=241, n_completion_tokens=61, cost=0.0004835), perf=Perf(start_time=datetime.datetime(2024, 1, 31, 20, 32, 24, 87095), end_time=datetime.datetime(2024, 1, 31, 20, 32, 27, 113883)), ts=datetime.datetime(2024, 1, 31, 20, 32, 27, 114361), tags='-', meta=None, main_input="#Person1#: What sports do you think women excel at most?\n#Person2#: I think women excel in every sport except the ones that are taboo for us to join in, like football.\n#Person1#: 



And that's it! The loop might take a few minutes to run. Once it finishes, you can explore the dashboard to see how well your app does.