In [2]:
import os

if os.getcwd().split('/')[-1] == 'evaluation':
    os.chdir('../')

## Which Hyperparameters Should I Iterate On?
https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2

These are the hyperparameters you typically iterate on:

- model: the LLM used for generation.
- prompt template: the prompt template variant used for generation.
- temperature: the sampling temperature used for generation.
- max tokens: the maximum number of tokens the LLM may generate.
- embedding model: the model used to embed documents and queries for retrieval in a RAG pipeline.
- top-K: the number of retrieved nodes in the retrieval_context of a RAG pipeline.
- chunk size: the size of the retrieved nodes in the retrieval_context of a RAG pipeline.
- reranking model: the model used to rerank the retrieved nodes in the retrieval_context of a RAG pipeline.
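Each hyperparameter you add multiplies the number of evaluation runs, so it can help to enumerate the grid before spending API calls on it. The sketch below assumes a hypothetical `search_space` dict whose values are placeholders, not settings taken from this notebook, and expands it into one config dict per combination with `itertools.product`.

```python
from itertools import product

# Hypothetical search space: the values are illustrative placeholders,
# not the settings used in this notebook.
search_space = {
    "embedding_model": [
        "sentence-transformers/all-MiniLM-L6-v2",
        "sentence-transformers/all-MiniLM-L12-v2",
    ],
    "temperature": [0.0, 0.7],
    "top_k": [3, 5],
    "chunk_size": [500, 1000],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 2 * 2 * 2 * 2 = 16 combinations
print(configs[0])
```

This notebook varies only the embedding model, which keeps the grid to one run per model.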


In [3]:
import pandas as pd
from deepeval.test_case import LLMTestCase
from deepeval.dataset import EvaluationDataset

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric

from chat_caller import query_gpt_chat, reload_vector_store



Using LLM-provider: openai
Vector store: None
Using local FAISS.




load INSTRUCTOR_Transformer
max_seq_length  512
System instruction template:
You are a helpful teaching assistant of a course in business analytics. 
Your task is to answer the students' question based on relevant course materials (see "Course materials" below).
Here are some further guidelines:
- Always refer explicitly to the course materials in your answers.
- You should not cite sources outside the course materials.  
- Add a segment called "For more information, see:" in your answer 
(e.g., For more information, see: Week 1 lecture slides, Week 2 transcript). 
- You should decline to answer questions that are not covered in the course materials.
- You should decline students' requests to write Python code. 
- However, you can provide general tips concerning the students' coding-related questions based on the course materials.
- You can also help students debug their code.
- Refer the student to the course teachers or university administration if you are unable to help.
- You are n

In [10]:
df_questions = pd.read_csv('evaluation/questions.csv')

In [11]:
df_questions

Unnamed: 0,ques_id,context,question,answer,source_doc
0,Q1,- **AI Chatbot**: An AI chatbot is available f...,What is the purpose of the AI chatbot in the b...,The AI chatbot is designed to answer frequentl...,course_material/general/syllabus.md
1,Q2,Primer on Business Analytics with Python\nModu...,What are the basic techniques in text analytic...,The basic techniques in text analytics covered...,course_material/module 5/lecture slides.pptx
2,Q3,Primer on Business Analytics \nwith Python\nMo...,What is the business example of using linear r...,Predicting sales based on advertising spend.,course_material/module 2/lecture slides.pdf
3,Q4,Primer on Business Analytics with Python\nModu...,What are the three tasks in the online exam fo...,The three tasks in the online exam are Code In...,course_material/module 6/lecture slides.pptx
4,Q5,Primer on Business Analytics with Python\nModu...,What are some common algorithms used in superv...,Some common algorithms used in supervised mach...,course_material/module 4/lecture slides.pptx
5,Q6,"[Slide 1: Title Slide]\nHello again, everyone!...",What is the purpose of a t-test in business an...,A t-test allows us to compare the means of two...,course_material/module 2/transcript.txt
6,Q7,# Primer on Business Analytics with Python\n##...,What are the main concepts covered in the Busi...,The main concepts covered in the course are ba...,course_material/module 6/transcript.txt
7,Q8,"Thank you for your participation, and I'm exci...",What is the speaker excited to continue in the...,The speaker is excited to continue the learnin...,course_material/module 2/transcript.txt
8,Q9,"[Slide 1: Title Slide]\nHello, everyone! Welco...",What is the go-to tool for data manipulation a...,The go-to tool for data manipulation and analy...,course_material/module 1/transcript.txt
9,Q10,- Split data into training and test sets for u...,What is a confusion matrix in model evaluation?\n,A confusion matrix is a table that describes t...,course_material/module 4/lecture slides.pptx


In [12]:
df_evaluation = df_questions.copy()

In [13]:
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceInstructEmbeddings
from huggingface_hub import notebook_login

In [14]:
notebook_login()


Candidate embedding models from the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard), filtered by:
- Model Types: Open, Sentence Transformers
- Model Sizes: <100M parameters

In [15]:
embedding_models = [
    "sentence-transformers/all-MiniLM-L6-v2",   # Efficient and fast model for sentence embeddings
    "sentence-transformers/all-MiniLM-L12-v2",  # 12-layer MiniLM variant
    "TaylorAI/gte-tiny",
]
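Swapping the embedding model only changes which chunks the retriever hands to the LLM. As an illustration, here is a minimal sketch of top-K retrieval by cosine similarity over toy vectors. The FAISS store used by `chat_caller` performs this search over an index, and the distance it actually uses is not shown in this notebook, so cosine similarity here is an assumption chosen for clarity; `cosine_similarity` and `top_k_chunks` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k):
    """Indices of the k chunk vectors most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors; the MiniLM models above produce 384-dimensional ones.
query = [1.0, 0.0, 0.0]
chunks = [
    [0.0, 1.0, 0.0],  # unrelated chunk
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 0.1],  # closest to the query
]
print(top_k_chunks(query, chunks, k=2))  # [3, 1]
```

A better embedding model places relevant chunks closer to the query vector, which is what changes the top-K set.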

In [16]:
from deepeval.metrics import ContextualPrecisionMetric, FaithfulnessMetric, ContextualRecallMetric, ContextualRelevancyMetric, HallucinationMetric


In [94]:
eval_metrics = {
    'Answer Relevance': AnswerRelevancyMetric(threshold=0.5),
    'Faithfulness': FaithfulnessMetric(
        threshold=0.7,
        model="gpt-4",
        include_reason=True
    ),
    'Contextual Relevancy': ContextualRelevancyMetric(
        threshold=0.7,
        model="gpt-4",
        include_reason=True
    ),
    # The following metrics throw errors when parsing the JSON in the LLM response:
    # 'Contextual Precision': ContextualPrecisionMetric(
    #     threshold=0.7,
    #     model="gpt-4",
    #     include_reason=True
    # ),
    # 'Hallucination': HallucinationMetric(threshold=0.5),
    # 'Contextual Recall': ContextualRecallMetric(
    #     threshold=0.7,
    #     model="gpt-4",
    #     include_reason=True
    # ),
}
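Each metric returns a score between 0 and 1, and a test case passes a metric when the score reaches its threshold. The metrics summary printed further down shows the boundary case: Answer Relevancy scores 0.5 against a 0.5 threshold and is marked ✅. The sketch below re-implements only that comparison in plain Python, outside deepeval; `THRESHOLDS` is copied from `eval_metrics`, `pass_fail` is a hypothetical helper, and it ignores strict mode (shown as `strict: False` in the summary).

```python
# Thresholds copied from eval_metrics; pass rule assumed to be score >= threshold,
# consistent with the printed summary where 0.5 passes a 0.5 threshold.
THRESHOLDS = {
    "Answer Relevance": 0.5,
    "Faithfulness": 0.7,
    "Contextual Relevancy": 0.7,
}


def pass_fail(scores, thresholds=THRESHOLDS):
    """Map each metric label to True (pass) or False (fail)."""
    return {label: scores[label] >= t for label, t in thresholds.items()}


# Hypothetical scores for a single test case.
example = {"Answer Relevance": 0.5, "Faithfulness": 0.65, "Contextual Relevancy": 1.0}
print(pass_fail(example))
# {'Answer Relevance': True, 'Faithfulness': False, 'Contextual Relevancy': True}
```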

In [96]:
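def add_metrics(dataset, model, df_model):
    """Score every test case in `dataset` and add one score column and one
    reason column per metric to `df_model`."""
    df_model['Model'] = model
    results = evaluate(dataset, list(eval_metrics.values()))

    metric_labels = list(eval_metrics.keys())

    # Assumes results come back in test-case (row) order and that each
    # result's metrics_metadata follows the order of eval_metrics.
    for m_index, metric_label in enumerate(metric_labels):
        df_model[metric_label] = [
            r.metrics_metadata[m_index].score for r in results
        ]
        df_model[f"{metric_label} - Reason"] = [
            r.metrics_metadata[m_index].reason for r in results
        ]

    return df_model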
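`add_metrics` maps deepeval's per-test-case results onto DataFrame columns by position: result `i` fills row `i`, and metric `m_index` within each result matches the `m_index`-th entry of `eval_metrics`. The sketch below reproduces that reshaping with hypothetical stand-in classes (`FakeMetricData`, `FakeResult`) that model only the attributes `add_metrics` reads, so the column layout can be checked without calling an LLM. Whether `evaluate` preserves test-case order in every deepeval version is an assumption worth verifying.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins: only the attributes add_metrics reads are modelled.
@dataclass
class FakeMetricData:
    score: float
    reason: str


@dataclass
class FakeResult:
    metrics_metadata: List[FakeMetricData]


def results_to_columns(results, metric_labels):
    """Turn per-test-case results into {column name: list of values}."""
    columns = {}
    for m_index, label in enumerate(metric_labels):
        columns[label] = [r.metrics_metadata[m_index].score for r in results]
        columns[f"{label} - Reason"] = [r.metrics_metadata[m_index].reason for r in results]
    return columns


results = [
    FakeResult([FakeMetricData(0.5, "partly relevant"), FakeMetricData(1.0, "no contradictions")]),
    FakeResult([FakeMetricData(0.9, "mostly relevant"), FakeMetricData(1.0, "no contradictions")]),
]
cols = results_to_columns(results, ["Answer Relevance", "Faithfulness"])
print(cols["Answer Relevance"])  # [0.5, 0.9]
```

Passing `cols` to `pd.DataFrame` would give the same score and reason columns that `add_metrics` writes.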


In [80]:
def add_metrics_at_index(dataset, model, df_model, index):
    # Earlier per-row variant, renamed so that running the notebook top to
    # bottom does not shadow the 3-argument add_metrics used by evaluate_rag.
    df_model['Model'] = model
    results = evaluate(dataset, list(eval_metrics.values()))

    metric_labels = list(eval_metrics.keys())

    # Writes only the first test case's scores and reasons into row `index`.
    for m_index, metric_label in enumerate(metric_labels):
        df_model.loc[index, metric_label] = results[0].metrics_metadata[m_index].score
        df_model.loc[index, f"{metric_label} - Reason"] = results[0].metrics_metadata[m_index].reason

    return df_model


In [92]:
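def evaluate_rag(model, df_questions):
    """Answer every question with the RAG chain built on `model` and score the answers."""
    print("Creating Test Cases...")
    print("--------------------------------------------------------------------------------")
    reload_vector_store(model)  # rebuild the vector store with this embedding model
    df_model = df_questions.copy()
    test_cases = []

    for _, row in df_model.iterrows():
        question = row['question']
        answer = query_gpt_chat(question, [])[1]  # index 1 holds the answer text
        # Empty CSV cells load as NaN, which is truthy, so `row['context'] or ''` would keep it.
        context = row['context'] if pd.notna(row['context']) else ''
        test = LLMTestCase(
            input=question,
            actual_output=answer,
            expected_output=row['answer'],
            # This is the reference context from questions.csv, not the chunks the
            # retriever returned, so Contextual Relevancy does not measure `model`.
            retrieval_context=[context],
        )
        test_cases.append(test)

    print("Evaluating Test Cases...")
    print("--------------------------------------------------------------------------------")
    dataset = EvaluationDataset(test_cases=test_cases)
    df_model = add_metrics(dataset, model, df_model)

    print("--------------------------------------------------------------------------------")
    print("Completed Evaluation")
    return df_model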


In [98]:
df_evaluate

Unnamed: 0,ques_id,context,question,answer,source_doc,Model,Answer Relevance,Answer Relevance - Reason,Faithfulness,Faithfulness - Reason,Contextual Relevancy,Contextual Relevancy - Reason
0,Q1,- **AI Chatbot**: An AI chatbot is available f...,What is the purpose of the AI chatbot in the b...,The AI chatbot is designed to answer frequentl...,course_material/general/syllabus.md,sentence-transformers/all-MiniLM-L6-v2,0.5,The score is 0.50 because while some parts of ...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...
1,Q2,Primer on Business Analytics with Python\nModu...,What are the basic techniques in text analytic...,The basic techniques in text analytics covered...,course_material/module 5/lecture slides.pptx,sentence-transformers/all-MiniLM-L6-v2,0.9375,The score is 0.94 because the answer is mostly...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...
2,Q3,Primer on Business Analytics \nwith Python\nMo...,What is the business example of using linear r...,Predicting sales based on advertising spend.,course_material/module 2/lecture slides.pdf,sentence-transformers/all-MiniLM-L6-v2,0.857143,The score is 0.86 because the majority of the ...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the input was perfec...
3,Q4,Primer on Business Analytics with Python\nModu...,What are the three tasks in the online exam fo...,The three tasks in the online exam are Code In...,course_material/module 6/lecture slides.pptx,sentence-transformers/all-MiniLM-L6-v2,0.8,The score is 0.80 because the answer mostly ad...,1.0,The score is 1.00 because there were no contra...,1.0,The score is 1.00 because the retrieval contex...
4,Q5,Primer on Business Analytics with Python\nModu...,What are some common algorithms used in superv...,Some common algorithms used in supervised mach...,course_material/module 4/lecture slides.pptx,sentence-transformers/all-MiniLM-L6-v2,0.555556,The score is 0.56 because while the output inc...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...
5,Q6,"[Slide 1: Title Slide]\nHello again, everyone!...",What is the purpose of a t-test in business an...,A t-test allows us to compare the means of two...,course_material/module 2/transcript.txt,sentence-transformers/all-MiniLM-L6-v2,0.8,The score is 0.80 because the answer mostly ad...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...
6,Q7,# Primer on Business Analytics with Python\n##...,What are the main concepts covered in the Busi...,The main concepts covered in the course are ba...,course_material/module 6/transcript.txt,sentence-transformers/all-MiniLM-L6-v2,0.875,The score is 0.88 because the answer thoroughl...,1.0,The score is 1.00 because there were no contra...,1.0,The score is 1.00 because the retrieval perfec...
7,Q8,"Thank you for your participation, and I'm exci...",What is the speaker excited to continue in the...,The speaker is excited to continue the learnin...,course_material/module 2/transcript.txt,sentence-transformers/all-MiniLM-L6-v2,0.75,The score is 0.75 because the statement 'For m...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...
8,Q9,"[Slide 1: Title Slide]\nHello, everyone! Welco...",What is the go-to tool for data manipulation a...,The go-to tool for data manipulation and analy...,course_material/module 1/transcript.txt,sentence-transformers/all-MiniLM-L6-v2,0.75,The score is 0.75 because the output partially...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because there are no listed ...
9,Q10,- Split data into training and test sets for u...,What is a confusion matrix in model evaluation?\n,A confusion matrix is a table that describes t...,course_material/module 4/lecture slides.pptx,sentence-transformers/all-MiniLM-L6-v2,0.9,The score is 0.90 because the explanation of a...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...


In [97]:
df_evaluate = evaluate_rag("sentence-transformers/all-MiniLM-L6-v2", df_questions)

Creating Test Cases...
--------------------------------------------------------------------------------
load INSTRUCTOR_Transformer
max_seq_length  512
Prompt tokens: 599
Completion tokens: 79
Total tokens: 678
Prompt tokens: 577
Completion tokens: 172
Total tokens: 749
Prompt tokens: 597
Completion tokens: 149
Total tokens: 746
Prompt tokens: 605
Completion tokens: 102
Total tokens: 707
Prompt tokens: 528
Completion tokens: 147
Total tokens: 675
Prompt tokens: 568
Completion tokens: 118
Total tokens: 686
Prompt tokens: 588
Completion tokens: 188
Total tokens: 776
Prompt tokens: 492
Completion tokens: 80
Total tokens: 572
Prompt tokens: 634
Completion tokens: 90
Total tokens: 724



Prompt tokens: 582
Completion tokens: 194
Total tokens: 776
Evaluating Test Cases...
--------------------------------------------------------------------------------
Evaluating test cases...
Event loop is already running. Applying nest_asyncio patch to allow async execution...



Metrics Summary

  - ✅ Answer Relevancy (score: 0.5, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 0.50 because while some parts of the output may relate to the AI chatbot's purpose, multiple statements such as 'For more information, see:', 'General:', and 'Syllabus.' are irrelevant and detract from fully answering the question., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because there are no contradictions between the actual output and the retrieval context., error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the retrieval context perfectly aligns with the input, showcasing the AI chatbot's relevance in the business analytics course., error: None)

For test case:

  - input: What is the purpose of the AI chatbot in the business analytics course?

  - actual output: The AI chatbo

--------------------------------------------------------------------------------
Completed Evaluation
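When several embedding models are evaluated, as the next cell does, the per-question rows end up stacked in one table, and comparing models means averaging each metric per model. The sketch below does this in plain Python on rows assumed to be shaped like `df_results.to_dict("records")`; `mean_scores_by_model` is a hypothetical helper and the scores are made up. Missing scores (`None`) are skipped rather than counted as zero.

```python
from collections import defaultdict
from statistics import mean


def mean_scores_by_model(rows, metric_labels):
    """Average each metric per model, skipping missing (None) scores."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for label in metric_labels:
            score = row.get(label)
            if score is not None:
                grouped[row["Model"]][label].append(score)
    return {
        model: {label: mean(scores) for label, scores in per_metric.items()}
        for model, per_metric in grouped.items()
    }


# Hypothetical rows; real rows would come from df_results.to_dict("records").
rows = [
    {"Model": "model-a", "Answer Relevance": 0.5, "Faithfulness": 1.0},
    {"Model": "model-a", "Answer Relevance": 1.0, "Faithfulness": 1.0},
    {"Model": "model-b", "Answer Relevance": 0.8, "Faithfulness": None},
]
summary = mean_scores_by_model(rows, ["Answer Relevance", "Faithfulness"])
print(summary["model-a"])  # {'Answer Relevance': 0.75, 'Faithfulness': 1.0}
```

In pandas the equivalent is `df_results.groupby("Model")[metric_cols].mean()`, which also skips NaN by default.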


In [100]:
# Evaluate every embedding model and stack the per-question results
# into one DataFrame with a fresh 0..n-1 index.
df_results = pd.concat(
    [evaluate_rag(model, df_questions) for model in embedding_models],
    ignore_index=True,
)

Creating Test Cases...
--------------------------------------------------------------------------------
load INSTRUCTOR_Transformer
max_seq_length  512
Prompt tokens: 599
Completion tokens: 79
Total tokens: 678
Prompt tokens: 577
Completion tokens: 164
Total tokens: 741
Prompt tokens: 597
Completion tokens: 128
Total tokens: 725
Prompt tokens: 605
Completion tokens: 105
Total tokens: 710
Prompt tokens: 528
Completion tokens: 151
Total tokens: 679
Prompt tokens: 568
Completion tokens: 111
Total tokens: 679
Prompt tokens: 588
Completion tokens: 187
Total tokens: 775
Prompt tokens: 492
Completion tokens: 64
Total tokens: 556
Prompt tokens: 634
Completion tokens: 90
Total tokens: 724



Prompt tokens: 582
Completion tokens: 170
Total tokens: 752
Evaluating Test Cases...
--------------------------------------------------------------------------------
Evaluating test cases...
Event loop is already running. Applying nest_asyncio patch to allow async execution...




Output()

I0000 00:00:1722976764.046370 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976764.073321 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976764.092644 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976764.112157 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976765.402296 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976765.572209 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976767.438314 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976768.516940 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976771.142890 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976771.143070 24998719 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers


Output()

I0000 00:00:1722976773.709655 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976773.731820 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976773.751248 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976773.768388 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976775.172675 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976775.890503 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976780.702184 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976784.165682 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976788.554497 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


Output()

I0000 00:00:1722976791.842928 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976791.843054 24998719 fork_posix.cc:77] Other threads are currently calling into gRPC, skipping fork() handlers
I0000 00:00:1722976791.885273 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976791.936085 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork
I0000 00:00:1722976792.006946 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976793.211760 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976797.652187 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976803.157985 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976803.819282 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork


I0000 00:00:1722976808.642922 24998719 work_stealing_thread_pool.cc:320] WorkStealingThreadPoolImpl::PrepareFork




Metrics Summary

  - ✅ Answer Relevancy (score: 0.75, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 0.75 because the majority of the output is relevant to explaining the purpose of the AI chatbot in the business analytics course. However, the statement 'For more information, see: General: Syllabus.' is not directly relevant to the question., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because there are no contradictions detected between the retrieval context and the actual output., error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the retrieval context perfectly matches the input, indicating high relevance., error: None)

For test case:

  - input: What is the purpose of the AI chatbot in the business analytics course?

  - actual output: The AI chatbot in this business analytics

--------------------------------------------------------------------------------
Completed Evaluation
Creating Test Cases...
--------------------------------------------------------------------------------
load INSTRUCTOR_Transformer
max_seq_length  512
Prompt tokens: 611
Completion tokens: 95
Total tokens: 706
Prompt tokens: 592
Completion tokens: 144
Total tokens: 736
Prompt tokens: 634
Completion tokens: 96
Total tokens: 730
Prompt tokens: 630
Completion tokens: 132
Total tokens: 762
Prompt tokens: 540
Completion tokens: 156
Total tokens: 696
Prompt tokens: 621
Completion tokens: 95
Total tokens: 716
Prompt tokens: 606
Completion tokens: 180
Total tokens: 786
Prompt tokens: 564
Completion tokens: 63
Total tokens: 627
Prompt tokens: 643
Completion tokens: 73
Total tokens: 716
Prompt tokens: 613
Completion tokens: 170
Total tokens: 783
Evaluating Test Cases...
--------------------------------------------------------------------------------
Evaluating test cases...
Event loop is already running. Applying nest_asyncio patch to allow async execution...


Metrics Summary

  - ✅ Answer Relevancy (score: 0.8, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 0.80 because the majority of the output addressed the purpose of the AI chatbot in the business analytics course. However, the inclusion of the statement 'For more information, see: Module 5: Lecture Slides.' was not directly relevant to the question, preventing a higher score., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because there are no contradictions between the retrieval context and the actual output., error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the retrieval context perfectly aligns with the input, showing a high degree of relevancy., error: None)

For test case:

  - input: What is the purpose of the AI chatbot in the business analytics course?

  - actual output: Th

--------------------------------------------------------------------------------
Completed Evaluation
Creating Test Cases...
--------------------------------------------------------------------------------
load INSTRUCTOR_Transformer
max_seq_length  512
Prompt tokens: 427
Completion tokens: 66
Total tokens: 493
Prompt tokens: 956
Completion tokens: 111
Total tokens: 1067
Prompt tokens: 591
Completion tokens: 118
Total tokens: 709
Prompt tokens: 969
Completion tokens: 47
Total tokens: 1016
Prompt tokens: 964
Completion tokens: 119
Total tokens: 1083
Prompt tokens: 589
Completion tokens: 37
Total tokens: 626
Prompt tokens: 955
Completion tokens: 172
Total tokens: 1127
Prompt tokens: 420
Completion tokens: 37
Total tokens: 457
Prompt tokens: 545
Completion tokens: 48
Total tokens: 593
Prompt tokens: 562
Completion tokens: 176
Total tokens: 738
Evaluating Test Cases...
--------------------------------------------------------------------------------
Evaluating test cases...
Event loop is already running. Applying nest_asyncio patch to allow async execution...


Metrics Summary

  - ✅ Answer Relevancy (score: 0.6666666666666666, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The score is 0.67 because while the main points about the AI chatbot's purpose in the business analytics course are addressed, the statement 'For more information, see: General: Syllabus.' is irrelevant and does not contribute to answering the question., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because there are no contradictions present in the actual output., error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the context perfectly aligns with the input, requiring no reasons for irrelevancy., error: None)

For test case:

  - input: What is the purpose of the AI chatbot in the business analytics course?

  - actual output: The AI chatbot in the business analytics course is desig

--------------------------------------------------------------------------------
Completed Evaluation
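deepeval's documentation describes Answer Relevancy, Faithfulness and Contextual Relevancy as ratio scores. The judge model extracts statements (or claims), and the score is the fraction it accepts as relevant (or as not contradicted by the context). A test case passes when the score reaches the metric's threshold. With `strict: True`, the score collapses to 0 or 1 and the effective threshold becomes 1. The sketch below reproduces that arithmetic in plain Python and does not call deepeval. `MetricResult` is a hypothetical helper. The 2-of-3 split is also an assumption: it is consistent with the 0.67 score and with the single irrelevant statement named in the reason above, but the actual counts are not printed.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical container mimicking a ratio-style LLM-judge metric."""
    name: str
    relevant: int      # statements/claims the judge accepted (assumed count)
    total: int         # all statements/claims the judge extracted (assumed count)
    threshold: float
    strict: bool = False

    @property
    def score(self) -> float:
        # Fraction of accepted statements; 1.0 when nothing was extracted (assumption)
        raw = self.relevant / self.total if self.total else 1.0
        if self.strict:
            # Strict mode: only a perfect result scores 1, everything else scores 0
            return 1.0 if raw == 1.0 else 0.0
        return raw

    @property
    def passed(self) -> bool:
        # Strict mode behaves as if the threshold were 1
        return self.score >= (1.0 if self.strict else self.threshold)


# Assumed counts: 3 statements, 1 judged irrelevant ("For more information, see: ...")
answer_relevancy = MetricResult("Answer Relevancy", relevant=2, total=3, threshold=0.5)
print(answer_relevancy.name, round(answer_relevancy.score, 2), answer_relevancy.passed)

# The same verdicts fail once strict mode is switched on
strict = MetricResult("Answer Relevancy", relevant=2, total=3, threshold=0.5, strict=True)
print(strict.name, strict.score, strict.passed)
```

This explains why a single boilerplate line such as "For more information, see: ..." lowers Answer Relevancy without causing a failure at the 0.5 threshold. The same answer would fail under strict mode.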


In [101]:
df_results

| Unnamed: 0 | ques_id | context | question | answer | source_doc | Model | Answer Relevance | Answer Relevance - Reason | Faithfulness | Faithfulness - Reason | Contextual Relevancy | Contextual Relevancy - Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Q1 | - **AI Chatbot**: An AI chatbot is available f... | What is the purpose of the AI chatbot in the b... | The AI chatbot is designed to answer frequentl... | course_material/general/syllabus.md | sentence-transformers/all-MiniLM-L6-v2 | 0.75 | The score is 0.75 because the majority of the ... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 1 | Q2 | Primer on Business Analytics with Python\nModu... | What are the basic techniques in text analytic... | The basic techniques in text analytics covered... | course_material/module 5/lecture slides.pptx | sentence-transformers/all-MiniLM-L6-v2 | 0.857143 | The score is 0.86 because the primary content ... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 2 | Q3 | Primer on Business Analytics \nwith Python\nMo... | What is the business example of using linear r... | Predicting sales based on advertising spend. | course_material/module 2/lecture slides.pdf | sentence-transformers/all-MiniLM-L6-v2 | 0.857143 | The score is 0.86 because the main content eff... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 3 | Q4 | Primer on Business Analytics with Python\nModu... | What are the three tasks in the online exam fo... | The three tasks in the online exam are Code In... | course_material/module 6/lecture slides.pptx | sentence-transformers/all-MiniLM-L6-v2 | 0.875 | The score is 0.88 because the main tasks for t... | 0.8 | The score is 0.80 because the actual output in... | 1.0 | The score is 1.00 because the retrieval contex... |
| 4 | Q5 | Primer on Business Analytics with Python\nModu... | What are some common algorithms used in superv... | Some common algorithms used in supervised mach... | course_material/module 4/lecture slides.pptx | sentence-transformers/all-MiniLM-L6-v2 | 0.888889 | The score is 0.89 because the answer mostly pr... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 5 | Q6 | [Slide 1: Title Slide]\nHello again, everyone!... | What is the purpose of a t-test in business an... | A t-test allows us to compare the means of two... | course_material/module 2/transcript.txt | sentence-transformers/all-MiniLM-L6-v2 | 0.8 | The score is 0.80 because while the main expla... | 0.75 | The score is 0.75 because the actual output in... | 1.0 | The score is 1.00 because the retrieval contex... |
| 6 | Q7 | # Primer on Business Analytics with Python\n##... | What are the main concepts covered in the Busi... | The main concepts covered in the course are ba... | course_material/module 6/transcript.txt | sentence-transformers/all-MiniLM-L6-v2 | 0.875 | The score is 0.88 because the answer mostly ad... | 0.727273 | The score is 0.73 because the actual output in... | 1.0 | The score is 1.00 because the retrieval contex... |
| 7 | Q8 | Thank you for your participation, and I'm exci... | What is the speaker excited to continue in the... | The speaker is excited to continue the learnin... | course_material/module 2/transcript.txt | sentence-transformers/all-MiniLM-L6-v2 | 0.666667 | The score is 0.67 because the statement 'For m... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 8 | Q9 | [Slide 1: Title Slide]\nHello, everyone! Welco... | What is the go-to tool for data manipulation a... | The go-to tool for data manipulation and analy... | course_material/module 1/transcript.txt | sentence-transformers/all-MiniLM-L6-v2 | 0.428571 | The score is 0.43 because the output contains ... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
| 9 | Q10 | - Split data into training and test sets for u... | What is a confusion matrix in model evaluation?\n | A confusion matrix is a table that describes t... | course_material/module 4/lecture slides.pptx | sentence-transformers/all-MiniLM-L6-v2 | 0.666667 | The score is 0.67 because while the answer pro... | 1.0 | The score is 1.00 because there are no contrad... | 1.0 | The score is 1.00 because the retrieval contex... |
9,Q10,- Split data into training and test sets for u...,What is a confusion matrix in model evaluation?\n,A confusion matrix is a table that describes t...,course_material/module 4/lecture slides.pptx,sentence-transformers/all-MiniLM-L6-v2,0.666667,The score is 0.67 because while the answer pro...,1.0,The score is 1.00 because there are no contrad...,1.0,The score is 1.00 because the retrieval contex...


In [102]:
df_results.to_csv('evaluation/evaluations_v3.csv', index=False)
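The `Unnamed: 0` column at the front of `df_results` is most likely a leftover index. By default, `to_csv` writes the row index as an unlabeled first column (`index=True`), and `read_csv` names that column `Unnamed: 0` when it reads the file back. Passing `index=False`, as the cell above does, avoids this. The `_v3` suffix suggests earlier runs, so the sketch also shows one way to stack several result files and compare mean scores per run and `Model`. `load_runs`, the `run` column and the `evaluations_v2.csv` copy are all hypothetical. Both files contain the same two rows (Q1, Q2) from the table above, so the two runs show identical means.

```python
import glob
import os
import tempfile

import pandas as pd


def load_runs(pattern: str) -> pd.DataFrame:
    """Hypothetical helper: stack every CSV matching `pattern`, tagging rows with the file stem."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        run = os.path.splitext(os.path.basename(path))[0]
        frames.append(pd.read_csv(path).assign(run=run))
    return pd.concat(frames, ignore_index=True)


# Small stand-in for df_results (values copied from Q1 and Q2 above)
df = pd.DataFrame({
    "ques_id": ["Q1", "Q2"],
    "Model": ["sentence-transformers/all-MiniLM-L6-v2"] * 2,
    "Answer Relevance": [0.75, 0.857143],
})

with tempfile.TemporaryDirectory() as tmp:
    # Default to_csv also writes the RangeIndex as an unlabeled first column...
    df.to_csv(os.path.join(tmp, "with_index.csv"))
    reread = pd.read_csv(os.path.join(tmp, "with_index.csv"))
    print(reread.columns.tolist())  # first column comes back as 'Unnamed: 0'

    # ...while index=False keeps the file to the real columns only
    for name in ["evaluations_v2.csv", "evaluations_v3.csv"]:
        df.to_csv(os.path.join(tmp, name), index=False)
    runs = load_runs(os.path.join(tmp, "evaluations_v*.csv"))

# Mean Answer Relevance per run and embedding model
summary = runs.groupby(["run", "Model"])["Answer Relevance"].mean()
print(summary)
```

For a file that was already written with its index, `pd.read_csv(path, index_col=0)` restores that column as the index and does not keep it as data.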

