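Ground-truth-only evaluation:

- It at least runs end to end without errors.
- But the results never say whether the answers are correct: the `Ground Truth Eval_calls` column in the records at the end is empty.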

Reference:

- https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/frameworks/llama_index/llama_index_groundtruth.ipynb

In [1]:
from trulens_eval import Tru

tru = Tru()
tru.reset_database()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


In [2]:
%%time

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.chunk_size = 128
Settings.chunk_overlap = 16

Settings.llm = OpenAILike(
    model="qwen2", 
    api_base="http://monkey:11434/v1", 
    api_key="ollama",
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://monkey:11434",
    ollama_additional_kwargs={"mirostat": 0},  # mirostat N: Mirostat sampling mode (0 = disabled)
)

CPU times: user 23 ms, sys: 3.91 ms, total: 26.9 ms
Wall time: 26.5 ms
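
The `chunk_size = 128` and `chunk_overlap = 16` settings above decide how the document is cut up before embedding, and such small chunks may be one reason the retriever later misses facts. The following is only a minimal sketch of the sliding-window idea, assuming a plain list of tokens; `sliding_chunks` is a hypothetical helper, not LlamaIndex's actual splitter, which is sentence-aware and counts tokens with a real tokenizer.

```python
def sliding_chunks(tokens, chunk_size=128, chunk_overlap=16):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical illustration only; LlamaIndex's own splitter behaves differently.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


# 300 dummy tokens stand in for a tokenized document.
tokens = [f"t{i}" for i in range(300)]
chunks = sliding_chunks(tokens)
for i, chunk in enumerate(chunks):
    print(i, len(chunk), chunk[0], chunk[-1])
```

With these settings each window advances by 112 tokens, so any fact that needs more than roughly a paragraph of context is likely to be split across chunks.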


In [3]:
%%time

from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

CPU times: user 811 ms, sys: 34.2 ms, total: 845 ms
Wall time: 19.5 s


In [4]:
golden_set = [
    {
        "query": "What was the author's undergraduate major?",
        "response": "He didn't choose a major, and customized his courses.",
    },
    {
        "query": "What company did the author start in 1995?",
        "response": "Viaweb, to make software for building online stores.",
    },
    {
        "query": "Where did the author move in 1998 after selling Viaweb?",
        "response": "California, after Yahoo acquired Viaweb.",
    },
    {
        "query": "What did the author do after leaving Yahoo in 1999?",
        "response": "He focused on painting and tried to improve his art skills.",
    },
    {
        "query": "What program did the author start with Jessica Livingston in 2005?",
        "response": "Y Combinator, to provide seed funding for startups.",
    },
]

In [30]:
# from trulens_eval.feedback.provider import LiteLLM

# provider = LiteLLM(
#     model_engine="ollama/qwen2", 
#     api_base="http://monkey:11434/v1"
# )

import os
from trulens_eval.feedback.provider import OpenAI

# os.environ["OPENAI_API_KEY"] = "<REDACTED>"  # never commit real keys to a notebook
# os.environ["OPENAI_API_BASE"] = "https://api.bianxie.ai/v1"

# provider = OpenAI()

os.environ["OPENAI_API_KEY"] = "<REDACTED>"  # set the real key outside the notebook
os.environ["OPENAI_API_BASE"] = "https://ape:3000/v1"
provider = OpenAI()

In [31]:
from trulens_eval import Feedback
from trulens_eval.feedback import GroundTruthAgreement
from trulens_eval.app import App

# f_groundtruth = (
#     Feedback(
#         provider.groundedness_measure_with_cot_reasons, name="Groundedness"
#     )
#     .on(context.collect())  # collect context chunks into a list
#     .on_output()
# )

f_groundtruth = Feedback(
    GroundTruthAgreement(golden_set).agreement_measure, name="Ground Truth Eval"
).on_input_output()

✅ In Ground Truth Eval, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Ground Truth Eval, input response will be set to __record__.main_output or `Select.RecordOutput` .


In [32]:
from trulens_eval import TruLlama

tru_query_engine_recorder = TruLlama(
    query_engine,
    app_id="LlamaIndex_App1",
    feedbacks=[f_groundtruth],
)

In [33]:
import nltk
nltk.set_proxy('http://myproxy:7890')

# Run and evaluate on groundtruth questions
for pair in golden_set:
    with tru_query_engine_recorder as recording:
        llm_response = query_engine.query(pair["query"])
        print(llm_response)

The author did not choose a specific major for their undergraduate studies. They attended a program at Cornell that allowed them to take whatever classes they liked and choose what they wanted to put on their degree.
The context information does not provide the name of the company that was started by the author in 1995.
The context information does not provide details about where the author moved after selling Viaweb.
After leaving Yahoo in 1999, the author decided to start a new startup. This decision was seen as an insanely ambitious plan by his boss, who was then a billionaire, but also considered plausible given the high value of the author's options at that time, which were worth about $2 million a month.
In early 2005, Jessica Livingston interviewed for a marketing job at a Boston VC firm.


In [34]:
records, feedback = tru.get_records_and_feedback(
    app_ids=[]
)  # pass an empty list of app_ids to get all
records.head()

Unnamed: 0,app_id,app_json,type,record_id,input,output,tags,record_json,cost_json,perf_json,ts,Groundedness_calls,Ground Truth Eval_calls,latency,total_tokens,total_cost
0,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_d6778cafdde9ce10e5b5bbd5dfbb08fc,"""What was the author's undergraduate major?""","""The author did not choose a specific major fo...",-,"{""record_id"": ""record_hash_d6778cafdde9ce10e5b...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-08-05T18:14:50.228810"", ""...",2024-08-05T18:14:56.286164,[],,6,0,0.0
1,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_3d3b7f62b9ec2e23d9e4795733ea2a2f,"""What company did the author start in 1995?""","""The context information does not specify what...",-,"{""record_id"": ""record_hash_3d3b7f62b9ec2e23d9e...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-08-05T18:14:56.477185"", ""...",2024-08-05T18:14:57.827139,[],,1,0,0.0
2,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_b33d7a5f65e59decc32c545393e02bfc,"""Where did the author move in 1998 after selli...","""The context information does not provide deta...",-,"{""record_id"": ""record_hash_b33d7a5f65e59decc32...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-08-05T18:14:58.010071"", ""...",2024-08-05T18:14:59.248015,[],,1,0,0.0
3,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_81e13cb2bd8eb41a76eecd6e3e18800a,"""What did the author do after leaving Yahoo in...","""After leaving Yahoo in 1999, the author decid...",-,"{""record_id"": ""record_hash_81e13cb2bd8eb41a76e...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-08-05T18:14:59.431104"", ""...",2024-08-05T18:15:01.726491,[],,2,0,0.0
4,LlamaIndex_App1,"{""tru_class_info"": {""name"": ""TruLlama"", ""modul...",RetrieverQueryEngine(llama_index.core.query_en...,record_hash_651fc9677ff51cd9d83d42137c8703e2,"""What program did the author start with Jessic...","""In early 2005, Jessica Livingston interviewed...",-,"{""record_id"": ""record_hash_651fc9677ff51cd9d83...","{""n_requests"": 0, ""n_successful_requests"": 0, ...","{""start_time"": ""2024-08-05T18:15:01.910387"", ""...",2024-08-05T18:15:03.295856,[],,1,0,0.0
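
Because the `Ground Truth Eval_calls` column above is empty, the run gives no score for whether the answers match the golden set. Until the feedback function actually produces results, a crude manual check is possible. The sketch below is an assumption-laden stand-in, not what `GroundTruthAgreement` does (that uses an LLM judge): `keyword_recall` is a hypothetical helper that measures what fraction of the expected answer's words appear in the model's answer. The two query/response pairs are copied from the golden set and the printed outputs above.

```python
import re


def tokens(text):
    """Lower-case a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_recall(expected, actual):
    """Fraction of words in the expected answer that also occur in the actual answer.

    Hypothetical, purely lexical check; it does not understand paraphrases.
    """
    expected_words = tokens(expected)
    if not expected_words:
        return 0.0
    return len(expected_words & tokens(actual)) / len(expected_words)


# (query, golden response, response printed by the query engine above)
checks = [
    (
        "What was the author's undergraduate major?",
        "He didn't choose a major, and customized his courses.",
        "The author did not choose a specific major for their undergraduate "
        "studies. They attended a program at Cornell that allowed them to take "
        "whatever classes they liked and choose what they wanted to put on "
        "their degree.",
    ),
    (
        "What company did the author start in 1995?",
        "Viaweb, to make software for building online stores.",
        "The context information does not provide the name of the company "
        "that was started by the author in 1995.",
    ),
]

scores = [keyword_recall(golden, actual) for _, golden, actual in checks]
for (query, _, _), score in zip(checks, scores):
    print(f"{score:.2f}  {query}")
```

A score of zero, as for the Viaweb question, is a strong hint that retrieval failed; a low but non-zero score still needs a human read, since a correct paraphrase can share few words with the golden answer.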
