# Quickstart Notebook

This notebook shows how to log the answers of a RAG system to Tonic Validate using the Tonic Validate logging package, `tvallogging`. A simple RAG system is built with LlamaIndex to answer questions about six Paul Graham essays on founders, taken from his website. Answers to 10 benchmark questions are logged along with RAG metrics, which are calculated by the Tonic Validate package.

In this notebook we work with a benchmark of questions without reference answers, calculating the RAG metrics that do not need a reference answer.

In [1]:
import json
from llama_index import VectorStoreIndex, SimpleDirectoryReader

Set up the RAG system.

In [2]:
documents = SimpleDirectoryReader("./paul_graham_essays").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

Load the benchmark questions and answers, then blank out each answer, since this benchmark is used without reference answers.

In [3]:
with open("question_and_answer_list.json", "r") as f:
    question_and_answer_list = json.load(f)

question_and_answer_list = [dict(question=x["question"], answer="") for x in question_and_answer_list]
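
The blanking step above can be tried on its own without the essay data. The sketch below is only an illustration: the sample JSON and its placeholder answers are hypothetical, and it assumes the file is a list of objects with `question` and `answer` keys, which is what the loading code implies.

```python
import json

# Hypothetical sample in the shape the benchmark file is assumed to have:
# a JSON list of objects with "question" and "answer" keys.
sample_json = """
[
  {"question": "What makes Sam Altman a good founder?",
   "answer": "placeholder reference answer 1"},
  {"question": "For startups, what does board control mean in practice?",
   "answer": "placeholder reference answer 2"}
]
"""

loaded = json.loads(sample_json)

# Same transformation as in the notebook cell: keep each question and
# replace its answer with an empty string.
questions_only = [dict(question=x["question"], answer="") for x in loaded]

for item in questions_only:
    print(item)
```

Keeping the `answer` key with an empty value, rather than dropping it, preserves the question-and-answer shape of each item while signalling that no reference answer is available.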

In [4]:
question_and_answer_list

[{'question': 'What makes Sam Altman a good founder?', 'answer': ''},
 {'question': 'When was the essay "Five Founders" written?', 'answer': ''},
 {'question': 'When does the most dramatic growth happen for a startup?',
  'answer': ''},
 {'question': 'What is the problem with business culture versus start up culture with respect to productivity?',
  'answer': ''},
 {'question': "What's the single biggest thing the government could do to increase the number of startups in this country?",
  'answer': ''},
 {'question': 'How could one create a rigorous government definition of what a startup is to categorize whether a business is a startup?',
  'answer': ''},
 {'question': 'Why is frienship a good quality of founders?', 'answer': ''},
 {'question': 'Why is determination the most important quality in startup founders?',
  'answer': ''},
 {'question': 'For startups, what does board control mean in practice?',
  'answer': ''},
 {'question': "What's in the way of founders keeping board control after a series A?",
  'answer': ''}]

Use Tonic Validate to log the RAG system's answers to the benchmark questions.

In [5]:
from tvallogging.api import TonicValidateApi
from tvallogging.chat_objects import Benchmark

Create a `TonicValidateApi` instance. For this to work, get a Tonic Validate API key from the Tonic Validate UI and set the environment variable `TONIC_VALIDATE_API_KEY`.

In [6]:
# set TONIC_VALIDATE_API_KEY environment variable
# import os
# os.environ["TONIC_VALIDATE_API_KEY"] = "put-your-api-key-here"

api = TonicValidateApi()
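
Because the API key is read from the environment, it can help to confirm the variable is set before creating the client. The helper below is a minimal sketch using only the standard library; `require_env` is a hypothetical name and not part of `tvallogging`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or blank.

    Hypothetical helper for illustration; not part of tvallogging.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it before creating the API client."
        )
    return value
```

Calling `require_env("TONIC_VALIDATE_API_KEY")` just before `TonicValidateApi()` would surface a missing key as a clear error message at that point in the notebook.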

Create a new benchmark on the Tonic Validate server.

In [7]:
benchmark = Benchmark.from_json_list(question_and_answer_list)
benchmark_id = api.new_benchmark(benchmark, "Paul Graham Quickstart No Reference Answers Benchmark")

Create a new project on the Tonic Validate server, associated with the benchmark just created.

In [8]:
project = api.new_project(benchmark_id, "Paul Graham Quickstart No Reference Answers Project")

Create a run in the project and log answers to the benchmark questions along with their scores. Specify the scores to calculate when creating the run.

In [9]:
llm_evaluator = "gpt-3.5-turbo"
# llm_evaluator = "gpt-4"
run = project.new_run(
    llm_evaluator=llm_evaluator,
    retrieval_precision=True,
    augmentation_precision=True,
    augmentation_accuracy=True,
    answer_consistency=True
)

for q_and_a in run.benchmark.question_with_answer_list:
    
    # get response to question from rag llm
    question = q_and_a.question
    response = query_engine.query(question)
    
    # get answer and context from response
    llm_answer = response.response
    context_list = [source_node.node.text for source_node in response.source_nodes]
    
    # log answer and context, this is where scores are calculated
    run.log(q_and_a, llm_answer, context_list)
    print(f"logged {q_and_a.question}")

logged What makes Sam Altman a good founder?
logged When was the essay "Five Founders" written?
logged When does the most dramatic growth happen for a startup?
logged What is the problem with business culture versus start up culture with respect to productivity?
logged What's the single biggest thing the government could do to increase the number of startups in this country?
logged How could one create a rigorous government definition of what a startup is to categorize whether a business is a startup?
logged Why is frienship a good quality of founders?
logged Why is determination the most important quality in startup founders?
logged For startups, what does board control mean in practice?
logged What's in the way of founders keeping board control after a series A?
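
The loop above pulls two things out of each query response: the answer text and the list of retrieved context strings. The sketch below isolates that step so it can be tried without LlamaIndex or an API key. `answer_and_context` is a hypothetical helper, and the `Fake*` classes are stand-ins that only mimic the attributes the notebook reads (`response`, `source_nodes`, `node.text`).

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


def answer_and_context(response: Any) -> Tuple[str, List[str]]:
    """Extract the answer text and retrieved context strings from a response.

    Mirrors the attribute access used in the logging loop.
    """
    llm_answer = response.response
    context_list = [source_node.node.text for source_node in response.source_nodes]
    return llm_answer, context_list


# Stand-in objects for illustration only; they are not LlamaIndex classes.
@dataclass
class FakeNode:
    text: str


@dataclass
class FakeSourceNode:
    node: FakeNode


@dataclass
class FakeResponse:
    response: str
    source_nodes: List[FakeSourceNode] = field(default_factory=list)


fake = FakeResponse(
    response="An example answer.",
    source_nodes=[
        FakeSourceNode(FakeNode("first retrieved chunk")),
        FakeSourceNode(FakeNode("second retrieved chunk")),
    ],
)

answer, contexts = answer_and_context(fake)
print(answer)
print(contexts)
```

With a real query engine, the same helper could be applied to the result of `query_engine.query(question)` and its two return values passed to `run.log` as in the loop above.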
