# Quickstart Notebook

This notebook shows how to log answers from a RAG system to Tonic Validate using the Tonic Validate logging package, `tvallogging`. A simple RAG system is built with LlamaIndex to answer questions about 6 Paul Graham essays on founders, taken from his website. Answers to 10 benchmark questions are logged along with RAG metrics, which are calculated using the Tonic Validate package.

In [1]:
import json
from llama_index import VectorStoreIndex, SimpleDirectoryReader

Set up the RAG system.

In [2]:
documents = SimpleDirectoryReader("./paul_graham_essays").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

Load the benchmark questions and answers.

In [3]:
with open("question_and_answer_list.json", "r") as f:
    question_and_answer_list = json.load(f)

question_and_answer_list = [dict(question=x["question"], answer=x["answer"]) for x in question_and_answer_list]
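
The list comprehension above assumes that every entry in the JSON file carries both a `question` and an `answer` key. A minimal sketch of a check that fails early with a clearer message might look like the following. `validate_q_and_a` is a hypothetical helper name, not part of `tvallogging` or LlamaIndex, and the sample entry is illustrative only.

```python
def validate_q_and_a(entries):
    """Return a cleaned list of {"question", "answer"} dicts.

    Hypothetical helper (not part of tvallogging): raises ValueError on the
    first entry that is missing a key or has an empty or non-string value.
    Extra keys are dropped, mirroring the list comprehension in the notebook.
    """
    cleaned = []
    for i, entry in enumerate(entries):
        for key in ("question", "answer"):
            value = entry.get(key) if isinstance(entry, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"entry {i} has no usable {key!r} field")
        cleaned.append({"question": entry["question"], "answer": entry["answer"]})
    return cleaned


# Illustrative sample entry; the "note" key is an assumption added for the demo.
sample = [
    {
        "question": "What makes Sam Altman a good founder?",
        "answer": "He has a great force of will.",
        "note": "extra keys are ignored",
    },
]
print(validate_q_and_a(sample))
```

In the notebook, the same function could be applied to the result of `json.load(f)` in place of the list comprehension.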

In [4]:
question_and_answer_list

[{'question': 'What makes Sam Altman a good founder?',
  'answer': 'He has a great force of will.'},
 {'question': 'When was the essay "Five Founders" written?',
  'answer': 'April 2009'},
 {'question': 'When does the most dramatic growth happen for a startup?',
  'answer': 'When the startup only has three or four people.'},
 {'question': 'What is the problem with business culture versus start up culture with respect to productivity?',
  'answer': "In business culture, energy is expended on outward appearance to the detriment of productivity, while in startup culture there is no value of appearance it's all about productivity."},
 {'question': "What's the single biggest thing the government could do to increase the number of startups in this country?",
  'answer': 'Establish a new class of visa for startup founders.'},
 {'question': 'How could one create a rigorous government definition of what a startup is to categorize whether a business is a startup?',
  'answer': 'One could define 

In [5]:
ex_question = question_and_answer_list[0]["question"]
ex_question

'What makes Sam Altman a good founder?'

In [6]:
ex_reference_answer = question_and_answer_list[0]["answer"]
ex_reference_answer

'He has a great force of will.'

In [7]:
ex_response = query_engine.query(ex_question)
ex_llm_answer = ex_response.response
ex_llm_answer

"Sam Altman is considered a good founder because he possesses qualities that are highly valued in the startup world. He is known for his determination and force of will, which are crucial for overcoming obstacles and not getting demoralized easily. Altman is also known for his strategic thinking and ambition, which make him a valuable source of advice for startups. Additionally, Altman has a strong sense of design and an ability to think outside the box, which allows him to come up with innovative ideas. Overall, Altman's combination of determination, strategic thinking, and creativity make him a successful and influential founder."

In [8]:
ex_context = [source_node.node.text for source_node in ex_response.source_nodes]
ex_context

['Five Founders\n\nApril 2009\n\nInc recently asked me who I thought were the 5 most interesting startup founders of the last 30 years. How do you decide who\'s the most interesting? The best test seemed to be influence: who are the 5 who\'ve influenced me most? Who do I use as examples when I\'m talking to companies we fund? Who do I find myself quoting?1. Steve JobsI\'d guess Steve is the most influential founder not just for me but for most people you could ask. A lot of startup culture is Apple culture. He was the original young founder. And while the concept of "insanely great" already existed in the arts, it was a novel idea to introduce into a company in the 1980s. More remarkable still, he\'s stayed interesting for 30 years. People await new Apple products the way they\'d await new books by a popular novelist. Steve may not literally design them, but they wouldn\'t happen if he weren\'t CEO. Steve is clever and driven, but so are a lot of people in the Valley. What makes him un

Start using Tonic Validate logging to log the RAG system's answers to the benchmark questions.

In [9]:
from tvallogging.api import TonicValidateApi
from tvallogging.chat_objects import Benchmark

Create a `TonicValidateApi` instance. For this to work, get a Tonic Validate API key from the Tonic Validate UI and set the environment variable `TONIC_VALIDATE_API_KEY`.

In [10]:
# set TONIC_VALIDATE_API_KEY environment variable
# import os
# os.environ["TONIC_VALIDATE_API_KEY"] = "put-your-api-key-here"

api = TonicValidateApi()
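
`TonicValidateApi()` is constructed without arguments here, so the key has to be present in the environment beforehand. As a hedged sketch, a check like the one below can confirm that before the client is created. `require_env` is a hypothetical helper; only the variable name `TONIC_VALIDATE_API_KEY` comes from this notebook.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; set it before creating the API client")
    return value


# Report the problem instead of failing later inside the client.
try:
    require_env("TONIC_VALIDATE_API_KEY")
    print("API key found")
except RuntimeError as err:
    print(err)
```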

Create a new benchmark on the Tonic Validate server.

In [11]:
benchmark = Benchmark.from_json_list(question_and_answer_list)
benchmark_id = api.new_benchmark(benchmark, "Paul Graham Quickstart Benchmark")

Create a new project on the Tonic Validate server, associated with the benchmark created above.

In [12]:
project = api.new_project(benchmark_id, "Paul Graham Quickstart Project")

Create a run in the project and log answers to the benchmark questions along with their scores. The scores to calculate are specified when the run is created.

In [13]:
llm_evaluator = "gpt-3.5-turbo"
# llm_evaluator = "gpt-4"
run = project.new_run(
    llm_evaluator=llm_evaluator,
    answer_similarity_score=True,
    retrieval_precision=True,
    augmentation_precision=True,
    augmentation_accuracy=True,
    answer_consistency=True
)

for q_and_a in run.benchmark.question_with_answer_list:
    
    # get response to question from rag llm
    question = q_and_a.question
    response = query_engine.query(question)
    
    # get answer and context from response
    llm_answer = response.response
    context_list = [source_node.node.text for source_node in response.source_nodes]
    
    # log answer and context; this is where scores are calculated
    run.log(q_and_a, llm_answer, context_list)
    print(f"logged {q_and_a.question}")

logged What makes Sam Altman a good founder?
logged When was the essay "Five Founders" written?
logged When does the most dramatic growth happen for a startup?
logged What is the problem with business culture versus start up culture with respect to productivity?
logged What's the single biggest thing the government could do to increase the number of startups in this country?
logged How could one create a rigorous government definition of what a startup is to categorize whether a business is a startup?
logged Why is frienship a good quality of founders?
logged Why is determination the most important quality in startup founders?
logged For startups, what does board control mean in practice?
logged What's in the way of founders keeping board control after a series A?
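
The loop above stops at the first exception, for example if a single query fails partway through the benchmark. One possible variation, sketched below under assumptions, records failures and keeps going. `log_all`, `query_fn`, and `log_fn` are hypothetical names that are not part of `tvallogging`, and the stand-in functions exist only so the sketch runs on its own.

```python
def log_all(items, query_fn, log_fn):
    """Query and log each item; return a list of (item, error) for failures.

    query_fn(item) -> (answer, context_list)
    log_fn(item, answer, context_list) -> None
    """
    failures = []
    for item in items:
        try:
            answer, context_list = query_fn(item)
            log_fn(item, answer, context_list)
        except Exception as err:  # broad on purpose: keep the run going
            failures.append((item, err))
    return failures


# Stand-ins for the RAG system and the logging call (assumptions for the demo).
def fake_query(item):
    if "fail" in item:
        raise TimeoutError("simulated timeout")
    return f"answer to {item}", ["some retrieved context"]


logged = []
failures = log_all(
    ["first question", "please fail", "third question"],
    fake_query,
    lambda item, answer, context_list: logged.append(item),
)
print(f"logged {len(logged)}, failed {len(failures)}")
```

In this notebook, `query_fn` could be a small function that calls `query_engine.query(q_and_a.question)` and returns the answer text together with the source node texts, and `log_fn` could forward its arguments to `run.log`.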
