# Using TruLens with LlamaIndex

References:

- https://github.com/truera/trulens/blob/main/trulens_eval/examples/expositional/models/ollama_quickstart.ipynb
- https://www.trulens.org/trulens_eval/getting_started/quickstarts/llama_index_quickstart/

In [1]:
%%time
%%capture

!pip install trulens_eval llama_index

CPU times: user 11.7 ms, sys: 4.99 ms, total: 16.7 ms
Wall time: 2.06 s


In [2]:
%%time

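import nltk
# route NLTK data downloads through an HTTP proxy (placeholder address; adjust or drop for your network)
nltk.set_proxy('http://myproxy:7890')

from trulens_eval import Tru
tru = Tru()
tru.reset_database()  # clear records and feedback results left over from earlier runs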

CPU times: user 14.3 ms, sys: 3.45 ms, total: 17.7 ms
Wall time: 43.6 ms


In [3]:
%%time

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.chunk_size = 128
Settings.chunk_overlap = 16

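# chat LLM: qwen2 served by Ollama through its OpenAI-compatible /v1 API
Settings.llm = OpenAILike(
    model="qwen2",
    api_base="http://monkey:11434/v1",
    api_key="ollama",  # required by the client, ignored by Ollama
    is_chat_model=True,
    temperature=0.1,
    request_timeout=60.0
)

# embedding model served by the same Ollama host
Settings.embed_model = OllamaEmbedding(
    model_name="quentinz/bge-large-zh-v1.5",
    base_url="http://monkey:11434",
    ollama_additional_kwargs={"mirostat": 0},  # mirostat N enables Mirostat sampling; 0 disables it
)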

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(similarity_top_k=3)

CPU times: user 958 ms, sys: 80.6 ms, total: 1.04 s
Wall time: 21.6 s
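`Settings.chunk_size = 128` and `Settings.chunk_overlap = 16` ask the splitter for chunks of at most 128 tokens, with consecutive chunks sharing 16 tokens. LlamaIndex's default splitter also tries to keep sentences intact and counts tokens with a real tokenizer; the hypothetical `sliding_window_chunks` below works on pre-split words instead, so it is only a sketch of the window arithmetic (each new chunk starts `chunk_size - chunk_overlap` = 112 tokens after the previous one).

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], chunk_size: int = 128, chunk_overlap: int = 16
) -> List[List[str]]:
    """Hypothetical helper: fixed-size windows that overlap by chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(300)]  # 300 stand-in "tokens"
chunks = sliding_window_chunks(words)
print(len(chunks), [len(c) for c in chunks])  # 3 [128, 128, 76]
print(chunks[0][-16:] == chunks[1][:16])  # True: the 16-token overlap
```

Small chunks keep each retrieved passage focused, at the cost of giving the LLM less surrounding text per chunk.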


In [4]:
%%time

response = query_engine.query("What did the author do growing up?")
print(response)

The provided context does not mention what the author did growing up; it discusses topics such as company growth, hiring practices, and decision-making in a startup environment. Therefore, based on this information alone, there is no way to determine what the author did growing up.
CPU times: user 65.2 ms, sys: 4.5 ms, total: 69.7 ms
Wall time: 2.74 s
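With `similarity_top_k=3`, only the three chunks whose embeddings are most similar to the query embedding are handed to the LLM, so the answer above is generated from those three 128-token chunks alone; if none of them covers the author's childhood, the model reports that the context does not mention it. The sketch below shows that ranking step on toy 3-dimensional vectors, assuming cosine similarity (the usual default for the in-memory vector store); `cosine` and `top_k` are hypothetical helpers, not LlamaIndex functions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:k]


query = [1.0, 0.0, 0.0]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
print([i for i, _ in top_k(query, chunk_vecs, k=3)])  # [0, 2, 1]
```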


In [5]:
%%time
%%capture

!pip install litellm

CPU times: user 6.96 ms, sys: 11.8 ms, total: 18.8 ms
Wall time: 1.79 s


In [6]:
%%time

import litellm

from trulens_eval.feedback.provider import LiteLLM

litellm.set_verbose = False

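# evaluation LLM for the feedback functions: qwen2 on the same Ollama host
# note: the litellm.completion request logged in the last cell carries no api_base and
# fails against localhost:11434; depending on the trulens_eval version, the base URL may
# also need to be passed as completion_kwargs={"api_base": "http://monkey:11434"}
ollama_provider = LiteLLM(
    model_engine="ollama/qwen2", api_base="http://monkey:11434"
)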

CPU times: user 51.5 ms, sys: 7.92 ms, total: 59.4 ms
Wall time: 59 ms


In [7]:
%%time

from trulens_eval import Feedback
from trulens_eval.app import App
import numpy as np

# Use the Ollama-backed LiteLLM provider for every feedback function
provider = ollama_provider

# Select the context to be used in feedback; where it lives is app-specific
context = App.select_context(query_engine)

# Groundedness: is the answer supported by the retrieved context?
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())  # collect context chunks into a list
    .on_output()
)

# Answer relevance: is the answer relevant to the overall question?
f_answer_relevance = (
    Feedback(provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on_input_output()
)

# Context relevance: score each context chunk against the question, then average
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input context will be set to __record__.app.query.rets.source_nodes[:].node.text .
CPU times: user 21.9 ms, sys: 7.77 ms, total: 29.7 ms
Wall time: 29.2 ms
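The selector messages above show how the inputs are wired: Groundedness receives all retrieved chunks as one list (`.collect()`), while Context Relevance is called once per chunk (`source_nodes[:].node.text`) and the per-chunk scores are combined with `np.mean`. The grader prompt visible in the final log asks the model for a 0 to 10 rating; this sketch assumes that rating is rescaled to the 0 to 1 range in which TruLens reports scores. `toy_relevance_0_10` is a hypothetical word-overlap stand-in for the LLM grader, and `statistics.mean` plays the role of `np.mean`.

```python
from statistics import mean
from typing import Callable, List


def toy_relevance_0_10(question: str, chunk: str) -> int:
    """Hypothetical grader: share of question words found in the chunk, on a 0-10 scale."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return round(10 * len(q_words & c_words) / len(q_words)) if q_words else 0


def context_relevance(
    question: str, chunks: List[str], aggregate: Callable[[List[float]], float] = mean
) -> float:
    # like .on(context): one grader call per chunk, each rescaled from 0-10 to 0-1
    per_chunk = [toy_relevance_0_10(question, c) / 10 for c in chunks]
    # like .aggregate(np.mean): combine the per-chunk scores into a single value
    return aggregate(per_chunk)


question = "what did the author do growing up"
chunks = [
    "the author wrote short stories growing up",
    "hiring practices in a startup",
    "what the company did next",
]
print(context_relevance(question, chunks))
```

Averaging means one highly relevant chunk among several irrelevant ones still yields a modest score, which is the behaviour `np.mean` was chosen for here.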


In [8]:
%%time

from trulens_eval import TruLlama
tru_query_engine_recorder = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance])

CPU times: user 294 ms, sys: 7.15 ms, total: 301 ms
Wall time: 312 ms


In [10]:
%%time

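# log the raw litellm requests made by the feedback functions
litellm.set_verbose = True

# use the recorder as a context manager so the query below is recorded and evaluated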
with tru_query_engine_recorder as recording:
    query_engine.query("What did the author do growing up?")

CPU times: user 1.54 s, sys: 24.6 ms, total: 1.57 s
Wall time: 2.99 s
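`recording.records[-1]` in the next cell is the record produced by the query inside this `with` block, and its input and output are what the selectors `__record__.main_input` and `__record__.main_output` listed earlier point at. The toy classes below are a hypothetical illustration of that pattern (wrap the app's method once, record only while a context is active); they are not how TruLens is implemented internally.

```python
from typing import Any, Callable, Dict, List


class ToyEngine:
    """Hypothetical stand-in for a query engine."""

    def query(self, text: str) -> str:
        return text.upper()


class ToyRecorder:
    """Hypothetical illustration of `with recorder as recording:`; not TruLens code."""

    def __init__(self, engine: Any) -> None:
        self.records: List[Dict[str, str]] = []
        self._active = False
        original: Callable[[str], str] = engine.query

        def recorded_query(text: str) -> str:
            output = original(text)
            if self._active:  # only calls made inside a with-block are recorded
                self.records.append({"main_input": text, "main_output": output})
            return output

        engine.query = recorded_query  # instrument the engine once, at construction

    def __enter__(self) -> "ToyRecorder":
        self._active = True
        self.records = []  # each with-block starts with an empty list of records
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        self._active = False
        return False  # let exceptions from the block propagate


engine = ToyEngine()
recorder = ToyRecorder(engine)
engine.query("not recorded")
with recorder as recording:
    engine.query("What did the author do growing up?")
print(len(recording.records), recording.records[-1])
```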


In [11]:
%%time

last_record = recording.records[-1]

from trulens_eval.utils.display import get_feedback_result
get_feedback_result(last_record, "Context Relevance")

CPU times: user 5.73 ms, sys: 126 µs, total: 5.86 ms
Wall time: 5.22 ms


In [12]:
%%time

from trulens_eval.guardrails.llama import WithFeedbackFilterNodes

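# Note: a feedback function used as a guardrail must return only a score, not reasons as well,
# so use context_relevance here instead of context_relevance_with_cot_reasons
f_context_relevance_score = Feedback(provider.context_relevance)

# drop retrieved nodes whose context relevance does not clear the 0.5 threshold
filtered_query_engine = WithFeedbackFilterNodes(query_engine, feedback=f_context_relevance_score, threshold=0.5)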

CPU times: user 9.49 ms, sys: 108 µs, total: 9.6 ms
Wall time: 9.27 ms
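Conceptually, `WithFeedbackFilterNodes` scores each retrieved node with the feedback function and passes only the nodes that clear `threshold` on to answer synthesis. The hypothetical `filter_nodes` below sketches that idea with plain strings in place of LlamaIndex nodes, and shows why the note above matters: a `*_with_cot_reasons` function returns a `(score, reasons)` pair rather than a bare float. Whether a score exactly equal to the threshold is kept is an assumption here (this sketch drops it).

```python
from typing import Callable, Dict, List, Tuple, Union

# A guardrail-compatible feedback returns a float; *_with_cot_reasons returns (score, reasons)
FeedbackResult = Union[float, Tuple[float, Dict[str, str]]]


def filter_nodes(
    query: str,
    node_texts: List[str],
    feedback: Callable[[str, str], FeedbackResult],
    threshold: float = 0.5,
) -> List[str]:
    """Hypothetical sketch: keep only the node texts whose feedback score exceeds threshold."""
    kept: List[str] = []
    for text in node_texts:
        score = feedback(query, text)
        if not isinstance(score, float):
            raise TypeError(f"feedback must return a float score, got {type(score).__name__}")
        if score > threshold:  # assumption: a score equal to the threshold is dropped
            kept.append(text)
    return kept


def keyword_score(query: str, text: str) -> float:
    """Hypothetical stand-in for provider.context_relevance: 1.0 if the text mentions 'growing'."""
    return 1.0 if "growing" in text else 0.0


nodes = ["the author grew up writing", "growing up he programmed", "startup hiring practices"]
print(filter_nodes("What did the author do growing up?", nodes, keyword_score))
```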


In [13]:
%%time

tru_recorder = TruLlama(filtered_query_engine,
    app_id='LlamaIndex_App1_Filtered',
    feedbacks=[f_answer_relevance, f_context_relevance, f_groundedness])

with tru_recorder as recording:
    llm_response = filtered_query_engine.query("What did the author do growing up?")

display(llm_response)



Request to litellm:
litellm.completion(temperature=0.0, model='ollama/qwen2', messages=[{'role': 'system', 'content': 'You are a RELEVANCE grader; providing the relevance of the given CONTEXT to the given QUESTION.\n        Respond only as a number from 0 to 10 where 0 is the least relevant and 10 is the most relevant. \n\n        A few additional scoring guidelines:\n\n        - Long CONTEXTS should score equally well as short CONTEXTS.\n\n        - RELEVANCE score should increase as the CONTEXTS provides more RELEVANT context to the QUESTION.\n\n        - RELEVANCE score should increase as the CONTEXTS provides RELEVANT context to more parts of the QUESTION.\n\n        - CONTEXT that is RELEVANT to some of the QUESTION should score of 2, 3 or 4. Higher score indicates more RELEVANCE.\n\n        - CONTEXT that is RELEVANT to most of the QUESTION should get a score of 5, 6, 7 or 8. Higher score indicates more RELEVANCE.\n\n        - CONTEXT that is RELEVANT to the entir

RuntimeError: Endpoint litellm request failed 4 time(s): 
	litellm.ServiceUnavailableError: OllamaException: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff174205db0>: Failed to establish a new connection: [Errno 111] Connection refused'))
	litellm.ServiceUnavailableError: OllamaException: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff174224610>: Failed to establish a new connection: [Errno 111] Connection refused'))
	litellm.ServiceUnavailableError: OllamaException: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff174226ce0>: Failed to establish a new connection: [Errno 111] Connection refused'))
	litellm.ServiceUnavailableError: OllamaException: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff17423d3f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
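The traceback shows the grader requests going to `localhost:11434`, not to the `http://monkey:11434` host configured on the provider, and the logged `litellm.completion(...)` call carries no `api_base`, so the guardrail's feedback calls never reached Ollama. A quick probe confirms which base URL actually serves Ollama before rerunning. The helper below is a hypothetical, standard-library-only sketch: it requests Ollama's `GET /api/tags` endpoint (which lists the locally available models) and bypasses any HTTP proxy settings.

```python
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if base_url answers GET /api/tags with HTTP 200."""
    # Ignore HTTP(S)_PROXY settings so the probe goes straight to the host
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url.rstrip("/") + "/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError (e.g. connection refused, 404) and timeouts are OSError
        # subclasses; ValueError covers malformed URLs
        return False
```

Calling `ollama_reachable("http://localhost:11434")` and `ollama_reachable("http://monkey:11434")` in the notebook shows which host is serving; if only the second one answers, the provider's completion calls have to be pointed at that base URL instead of litellm's localhost default.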