# Llama-Index Quickstart

In this quickstart you will create a simple LlamaIndex app and learn how to log it and get feedback on an LLM response.

For evaluation, we will leverage the "hallucination triad" of groundedness, context relevance and answer relevance.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/llama_index_quickstart.ipynb)
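
As a rough illustration of what the three checks named above compare, the sketch below pairs each one with the two pieces of a RAG record it reads. `RagRecord` and `triad_inputs` are hypothetical names made up for this example and are not part of TruLens. The actual scoring is done later in the notebook by an LLM-based feedback provider.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RagRecord:
    """Hypothetical container for one question/answer round trip."""
    question: str
    contexts: List[str]  # retrieved text chunks
    answer: str


def triad_inputs(record: RagRecord) -> Dict[str, Tuple[object, object]]:
    """Map each check to the pair of values it compares."""
    return {
        # Is the answer supported by the retrieved chunks?
        "Groundedness": (record.contexts, record.answer),
        # Is each retrieved chunk relevant to the question?
        "Context Relevance": (record.question, record.contexts),
        # Does the answer address the question?
        "Answer Relevance": (record.question, record.answer),
    }


# Placeholder values, for illustration only.
record = RagRecord(
    question="What did the author do growing up?",
    contexts=["chunk one", "chunk two"],
    answer="Writing and programming.",
)
for name, (left, right) in triad_inputs(record).items():
    print(f"{name}: {left!r} -> {right!r}")
```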

## Setup

### Install dependencies
Let's install some of the dependencies for this notebook if we don't have them already.

In [1]:
# pip install trulens_eval==0.18.3 "llama_index>=0.8.69" "html2text>=2020.1.16"

### Add API keys
For this quickstart, you will need an OpenAI key. It is used for embeddings, for the GPT model behind the query engine, and for evaluation through the OpenAI feedback provider.
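
As an optional safeguard, assuming keys are supplied through environment variables, a small check can fail early with a clear message when a key is missing. `missing_keys` is a hypothetical helper written for this sketch.

```python
import os
from typing import Iterable, List, Mapping


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


# Explicit mappings are passed here so the example does not depend on the real environment.
print(missing_keys(["OPENAI_API_KEY"], env={"OPENAI_API_KEY": "sk-..."}))
print(missing_keys(["OPENAI_API_KEY"], env={}))
```

In a notebook, `missing_keys(["OPENAI_API_KEY"])` with no `env` argument checks the live environment.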

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

### Import from LlamaIndex and TruLens

In [3]:
from trulens_eval import Tru

tru = Tru()

🦑 Tru initialized with db url sqlite:///default.sqlite .
🛑 Secret keys may be written to the database. See the `database_redact_keys` option of `Tru` to prevent this.


### Create Simple LLM Application

This example uses LlamaIndex, which internally uses an OpenAI LLM.

In [4]:
from llama_index import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader

documents = SimpleWebPageReader(
    html_to_text=True
).load_data(["http://paulgraham.com/worked.html"])
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

### Send your first request

In [5]:
response = query_engine.query("What did the author do growing up?")
print(response)

The author mentioned that before college, they worked on two main things outside of school: writing and programming. They wrote short stories and also tried writing programs on an IBM 1401 computer.


## Set Up One-Click Testing

In [6]:
import ast
import logging

import numpy as np
from trulens_eval import Feedback, TruLlama
from trulens_eval.feedback import Groundedness

# Create a logger and set the logging level
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Create a StreamHandler to output logs to the notebook
handler = logging.StreamHandler()
handler.setLevel(logging.INFO)
logger.addHandler(handler)

class TruLlama_OneClick_Testing:
    def __init__(self, query_engine, feedback_provider):
        self.query_engine = query_engine
        self.feedback_provider = feedback_provider

    def generate_hallucination_test_cases(self, number_test_cases: int) -> list:
        """
        Inputs:
            number_test_cases: int - number of test cases you wish to generate
        Returns:
            A list of question strings produced by the query engine.
        """
        logger.info("Generating test cases...")
        # f-string so that the requested number is actually substituted into the prompt
        test_case_system_prompt = f"""Return a list of {number_test_cases} questions. Half should be about the data available, and half should seem like they are from the data available but be unanswerable. Respond in the format of a python list, for example: ["question 1", "question 2", ...]"""
        # literal_eval accepts only Python literals, so the model's reply cannot execute code
        test_cases = ast.literal_eval(self.query_engine.query(test_case_system_prompt).response)

        return test_cases
    
    def get_rag_triad(self):
        logger.info("Defining feedback functions...")
        grounded = Groundedness(groundedness_provider=self.feedback_provider)
        f_groundedness = Feedback(grounded.groundedness_measure_with_cot_reasons, name = "Groundedness").on(
            TruLlama.select_source_nodes().node.text.collect()
            ).on_output(
            ).aggregate(grounded.grounded_statements_aggregator)

        # Question/answer relevance between overall question and answer.
        f_qa_relevance = Feedback(self.feedback_provider.relevance, name = "Answer Relevance").on_input_output()

        # Context relevance between question and each context chunk.
        f_context_relevance = Feedback(self.feedback_provider.qs_relevance, name = "Context Relevance").on_input().on(
            TruLlama.select_source_nodes().node.text
            ).aggregate(np.mean)
        
        hallucination_feedbacks = [f_groundedness, f_qa_relevance, f_context_relevance]

        return hallucination_feedbacks
    
    def get_recorder(self, feedbacks):
        logger.info("Setting up tracking...")
        return TruLlama(self.query_engine, feedbacks=feedbacks)
    
    def evaluate_hallucination(self, number_test_cases: int):
        test_cases = self.generate_hallucination_test_cases(number_test_cases)
        feedbacks = self.get_rag_triad()
        recorder = self.get_recorder(feedbacks)
        logger.info("Evaluating the app...")
        with recorder as recording:
            for test_case in test_cases:
                self.query_engine.query(test_case)

        return tru.get_leaderboard(app_ids=[recorder.app_id])
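
`generate_hallucination_test_cases` expects the model to reply with a Python list literal. Model output is untrusted text, so it helps to parse it with `ast.literal_eval`, which accepts only literals, and to validate the shape of the result. The sketch below is one way that could look. `parse_question_list` is a hypothetical helper, not part of TruLens or LlamaIndex.

```python
import ast
from typing import List


def parse_question_list(raw: str) -> List[str]:
    """Parse a model reply such as '["q1", "q2"]' into a list of strings."""
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(f"Reply is not a Python literal: {raw!r}") from exc
    if not isinstance(parsed, list) or not all(isinstance(q, str) for q in parsed):
        raise ValueError("Reply must be a list of strings.")
    return parsed


print(parse_question_list('["question 1", "question 2"]'))
```

A reply that is not a plain list of strings raises `ValueError`, which the caller can catch to retry the generation prompt.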

## Run One-Click Testing

In [7]:
from trulens_eval.feedback.provider import OpenAI

openai_provider = OpenAI()

oneclick = TruLlama_OneClick_Testing(query_engine, openai_provider)

oneclick.evaluate_hallucination(number_test_cases=10)

Generating test cases...
Defining feedback functions...
Setting up tracking...
Evaluating the app...


✅ In Groundedness, input source will be set to __record__.app.query.rets.source_nodes[:].node.text.collect() .
✅ In Groundedness, input statement will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
✅ In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
✅ In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
✅ In Context Relevance, input statement will be set to __record__.app.query.rets.source_nodes[:].node.text .


| app_id | Context Relevance | Groundedness | Answer Relevance | latency | total_cost |
|---|---|---|---|---|---|
| app_hash_9875900db40ac95721921ee3305fd897 | 0.16 | 1.0 | 0.8375 | 1.5 | 0.003017 |
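
The leaderboard shows one summary row per app. Assuming each value is the mean of the per-record values for that app, the aggregation can be sketched with the standard library alone. The records below are made-up values for illustration, not results from this run, and `leaderboard` is a hypothetical function rather than the TruLens implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Made-up per-record scores, for illustration only.
records = [
    {"app_id": "app_a", "Groundedness": 1.0, "Answer Relevance": 0.5},
    {"app_id": "app_a", "Groundedness": 0.5, "Answer Relevance": 1.0},
    {"app_id": "app_b", "Groundedness": 0.0, "Answer Relevance": 0.5},
]


def leaderboard(rows: List[dict], metrics: List[str]) -> Dict[str, Dict[str, float]]:
    """Average each metric per app_id, skipping records where it is missing."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        grouped[row["app_id"]].append(row)

    summary: Dict[str, Dict[str, float]] = {}
    for app_id, app_rows in grouped.items():
        summary[app_id] = {}
        for metric in metrics:
            values = [r[metric] for r in app_rows if r.get(metric) is not None]
            if values:
                summary[app_id][metric] = mean(values)
    return summary


print(leaderboard(records, ["Groundedness", "Answer Relevance"]))
```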


## Explore in a Dashboard

In [8]:
tru.run_dashboard() # open a local streamlit app to explore

# tru.stop_dashboard() # stop if needed

Starting dashboard ...
Config file already exists. Skipping writing process.
Credentials file already exists. Skipping writing process.


Dashboard started at http://192.168.1.157:8501 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

Alternatively, you can run `trulens-eval` from a command line in the same folder to start the dashboard.

Note: Feedback functions evaluated in the deferred manner can be seen in the "Progress" page of the TruLens dashboard.

## Or view results directly in your notebook

In [9]:
tru.get_records_and_feedback(app_ids=[])[0] # pass an empty list of app_ids to get all

| | app_id | app_json | type | record_id | input | output | tags | record_json | cost_json | perf_json | ... | groundedness_measure_with_cot_reasons_calls | Answer Relevance | Context Relevance | Groundedness | Answer Relevance_calls | Context Relevance_calls | Groundedness_calls | latency | total_tokens | total_cost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | app_hash_2497bd1d5aaac828f47588fd987dfbaf | {"app_id": "app_hash_2497bd1d5aaac828f47588fd9... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_e6c7db7d5baa81f78a6f4637c79f4ad9 | "What were the different shapes and sizes of t... | "The air conditioners were all different shape... | - | {"record_id": "record_hash_e6c7db7d5baa81f78a6... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:09:17.182651", "... | ... | [{'args': {'source': ["[16] She reports that t... |  |  |  |  |  |  | 2 | 1596 | 0.002380 |
| 1 | app_hash_2497bd1d5aaac828f47588fd987dfbaf | {"app_id": "app_hash_2497bd1d5aaac828f47588fd9... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_363f9c89d7cc01acb5683745d0ba2870 | "What were the problems with HN?" | "The problems with HN were a 60% chance of it ... | - | {"record_id": "record_hash_363f9c89d7cc01acb56... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:09:19.565124", "... | ... | [{'args': {'source': ['When I was dealing with... |  |  |  |  |  |  | 3 | 1676 | 0.002555 |
| 2 | app_hash_2497bd1d5aaac828f47588fd987dfbaf | {"app_id": "app_hash_2497bd1d5aaac828f47588fd9... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_c60bf76608c7d23a607fe419603675eb | "What was the worst thing about leaving YC?" | "The worst thing about leaving YC was the real... | - | {"record_id": "record_hash_c60bf76608c7d23a607... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:09:23.205103", "... | ... | [{'args': {'source': ['When I was dealing with... |  |  |  |  |  |  | 1 | 2129 | 0.003195 |
| 3 | app_hash_2497bd1d5aaac828f47588fd987dfbaf | {"app_id": "app_hash_2497bd1d5aaac828f47588fd9... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_813142bf883d4c95ee4db2f192839308 | "What is an example of a concept that aliens w... | "The Pythagorean theorem is an example of a co... | - | {"record_id": "record_hash_813142bf883d4c95ee4... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:09:25.271943", "... | ... | [{'args': {'source': ["[16] She reports that t... |  |  |  |  |  |  | 1 | 1602 | 0.002392 |
| 4 | app_hash_2497bd1d5aaac828f47588fd987dfbaf | {"app_id": "app_hash_2497bd1d5aaac828f47588fd9... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_61e503a7b4815776b62434a8265edcea | "What were some of the startups in the first b... | "Some of the startups in the first batch funde... | - | {"record_id": "record_hash_61e503a7b4815776b62... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:09:27.402813", "... | ... | [{'args': {'source': ['We invited about 20 of ... |  |  |  |  |  |  | 1 | 2127 | 0.003187 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 121 | app_hash_9875900db40ac95721921ee3305fd897 | {"app_id": "app_hash_9875900db40ac95721921ee33... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_611884a3139e557af112a56a80d31ec5 | "What were the three things the author origina... | "The author originally intended to write short... | - | {"record_id": "record_hash_611884a3139e557af11... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:46:09.385418", "... | ... |  | 1.0 |  |  | [{'args': {'prompt': 'What were the three thin... |  |  | 1 | 2130 | 0.003189 |
| 122 | app_hash_9875900db40ac95721921ee3305fd897 | {"app_id": "app_hash_9875900db40ac95721921ee33... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_258604f25c031bf4b8c04f41b921f081 | "What was the original name and purpose of Hac... | "The original name of Hacker News was Startup ... | - | {"record_id": "record_hash_258604f25c031bf4b8c... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:46:11.656759", "... | ... |  |  |  |  |  |  |  | 2 | 2141 | 0.003221 |
| 123 | app_hash_9875900db40ac95721921ee3305fd897 | {"app_id": "app_hash_9875900db40ac95721921ee33... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_012adcc2719ca18d831aaa58bcd40e27 | "What was the biggest source of stress for the... | "The biggest source of stress for the author i... | - | {"record_id": "record_hash_012adcc2719ca18d831... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:46:13.534052", "... | ... |  |  |  |  |  |  |  | 1 | 2121 | 0.003174 |
| 124 | app_hash_9875900db40ac95721921ee3305fd897 | {"app_id": "app_hash_9875900db40ac95721921ee33... | RetrieverQueryEngine(llama_index.query_engine.... | record_hash_775d8725d86eafc69a76f10453c44375 | "What did the author gradually stop working on?" | "The author gradually stopped working on writi... | - | {"record_id": "record_hash_775d8725d86eafc69a7... | {"n_requests": 2, "n_successful_requests": 2, ... | {"start_time": "2023-12-07T17:46:15.205647", "... | ... |  |  |  |  |  |  |  | 1 | 2108 | 0.003158 |
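
Several columns in this table (`app_json`, `record_json`, `cost_json`, `perf_json`) appear to hold JSON strings. A minimal sketch of decoding one with the standard library follows. It assumes a shortened `cost_json` value containing only the two keys visible in the truncated output above. Real values carry more fields.

```python
import json

# Hypothetical, shortened cost_json value; only the keys visible in the output are used.
cost_json = '{"n_requests": 2, "n_successful_requests": 2}'

cost = json.loads(cost_json)
success_rate = cost["n_successful_requests"] / cost["n_requests"]
print(f"{cost['n_successful_requests']}/{cost['n_requests']} requests succeeded ({success_rate:.0%})")
```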
