# LlamaIndex Quick Start Example

In the spirit of [LlamaIndex's starter tutorial](https://gpt-index.readthedocs.io/en/stable/getting_started/starter_example.html) and Andrej Karpathy's [Unreasonable Effectiveness of RNNs blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), we start with an example of a RAG system whose document set consists of Paul Graham essays. (Footnote: The Paul Graham essay text files used here were derived from the dataset of Paul Graham essays in the [paul-graham-gpt](https://github.com/mckaywrigley/paul-graham-gpt) GitHub project.)

In this notebook, we set up a simple RAG system using LlamaIndex and evaluate it with 10 predetermined question-and-answer pairs about Paul Graham essays. The document set consists of the 6 Paul Graham essays that have the word "founder" in the title.

First, set up a simple LlamaIndex RAG system with the default parameters, which use OpenAI's ada-002 embedding model as the embedder and gpt-3.5-turbo as the LLM.

In [1]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./paul_graham_essays").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Query the RAG system and return the answer together with the retrieved context
def get_llama_response(prompt):
    response = query_engine.query(prompt)
    context = [x.text for x in response.source_nodes]
    return {
        "llm_answer": response.response,
        "llm_context_list": context
    }
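The callback above needs an OpenAI API key to run. As a minimal sketch of how its return shape could be checked offline, the block below swaps in a stub query engine. `StubNode`, `StubResponse`, `StubQueryEngine`, and `make_callback` are hypothetical names introduced only for this illustration; they are not part of LlamaIndex. The sketch assumes only what the cell above already relies on: the response exposes `.response` and `.source_nodes`, and each node exposes `.text`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubNode:
    # Hypothetical stand-in for a retrieved source node; only `.text` is used.
    text: str


@dataclass
class StubResponse:
    # Hypothetical stand-in for the object returned by `query_engine.query`.
    response: str
    source_nodes: List[StubNode] = field(default_factory=list)


class StubQueryEngine:
    """Hypothetical query engine that returns a canned answer (no API calls)."""

    def query(self, prompt: str) -> StubResponse:
        return StubResponse(
            response=f"stub answer to: {prompt}",
            source_nodes=[StubNode("first chunk"), StubNode("second chunk")],
        )


def make_callback(engine):
    """Build a callback with the same return shape as get_llama_response."""

    def callback(prompt: str) -> dict:
        response = engine.query(prompt)
        return {
            "llm_answer": response.response,
            "llm_context_list": [node.text for node in response.source_nodes],
        }

    return callback


stub_callback = make_callback(StubQueryEngine())
result = stub_callback("What makes Sam Altman a good founder?")
print(result["llm_answer"])
print(result["llm_context_list"])
```

Passing the engine in as an argument, rather than reading a global, is a design choice made here so the same wrapper can be exercised with either the stub or a real query engine.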

Load 10 questions and answers about the Paul Graham essays as a benchmark for how the RAG system should answer questions.

In [2]:
import json
qa_pairs = []
with open("question_and_answer_list.json", "r") as qa_file:
    qa_pairs = json.load(qa_file)[:10]
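Because later cells read the `question` and `answer` fields of every pair, it can help to check the loaded data before building the benchmark. The following is a sketch only: `find_invalid_pairs` is a hypothetical helper, and the inline sample data stands in for `question_and_answer_list.json`, whose full contents are not shown here.

```python
import json

REQUIRED_KEYS = ("question", "answer")  # the fields read later in this notebook


def find_invalid_pairs(pairs, required_keys=REQUIRED_KEYS):
    """Return (index, missing_keys) for entries lacking a non-empty string field."""
    problems = []
    for i, pair in enumerate(pairs):
        if not isinstance(pair, dict):
            # A non-dict entry is missing every required field.
            problems.append((i, list(required_keys)))
            continue
        missing = [
            key for key in required_keys
            if not (isinstance(pair.get(key), str) and pair[key].strip())
        ]
        if missing:
            problems.append((i, missing))
    return problems


# Illustrative sample: the second entry deliberately has no answer.
sample_pairs = json.loads(json.dumps([
    {"question": "What makes Sam Altman a good founder?",
     "answer": "He has a great force of will."},
    {"question": "When was the essay \"Five Founders\" written?"},
]))

problems = find_invalid_pairs(sample_pairs)
print(problems)
```

In the notebook itself, the same check could be run as `find_invalid_pairs(qa_pairs)` right after loading the file.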

Let's inspect one example: the question, the reference answer, and the answer returned by the RAG system.

In [3]:
example_qa = qa_pairs[0]
example_qa

{'question': 'What makes Sam Altman a good founder?',
 'answer': 'He has a great force of will.',
 'reference_article': 'Five Founders',
 'reference_text': '5. Sam Altman\n\nI was told I shouldn\'t mention founders of YC-funded companies in this list. But Sam Altman can\'t be stopped by such flimsy rules. If he wants to be on this list, he\'s going to be.\n\nHonestly, Sam is, along with Steve Jobs, the founder I refer to most when I\'m advising startups. On questions of design, I ask "What would Steve do?" but on questions of strategy or ambition I ask "What would Sama do?"\n\nWhat I learned from meeting Sama is that the doctrine of the elect applies to startups. It applies way less than most people think: startup investing does not consist of trying to pick winners the way you might in a horse race. But there are a few people with such force of will that they\'re going to get whatever they want.'}

In [4]:
get_llama_response(example_qa["question"])

{'llm_answer': "Sam Altman is considered a good founder because he possesses qualities that are highly valued in the startup world. He is known for his force of will and determination, which are crucial qualities for overcoming obstacles and persevering in the face of challenges. Additionally, Altman is known for his strategic thinking and ambition, which make him a valuable advisor for startups. His ability to think outside the box and come up with innovative ideas is also highly regarded. Altman's success and influence in the startup community make him a noteworthy founder.",
 'llm_context_list': ['Five Founders\n\nApril 2009\n\nInc recently asked me who I thought were the 5 most interesting startup founders of the last 30 years. How do you decide who\'s the most interesting? The best test seemed to be influence: who are the 5 who\'ve influenced me most? Who do I use as examples when I\'m talking to companies we fund? Who do I find myself quoting?1. Steve JobsI\'d guess Steve is the 

Now let's set up the benchmark for Tonic Validate using the QA pairs.

In [5]:
from tonic_validate import Benchmark
question_list = [qa_pair['question'] for qa_pair in qa_pairs]
answer_list = [qa_pair['answer'] for qa_pair in qa_pairs]

benchmark = Benchmark(questions=question_list, answers=answer_list)

Set up the scorer from Tonic Validate to score the run.

In [6]:
from tonic_validate import ValidateScorer

scorer = ValidateScorer()
response_scores = scorer.score(benchmark, get_llama_response, scoring_parallelism=2, callback_parallelism=2)

Put the scores into a dataframe for easy viewing.

In [7]:
import pandas as pd

def make_scores_df(response_scores):
    scores_df = {
        "question": [],
        "reference_answer": [],
        "llm_answer": [],
        "retrieved_context": []
    }
    for score_name in response_scores.overall_scores:
        scores_df[score_name] = []
    for data in response_scores.run_data:
        scores_df["question"].append(data.reference_question)
        scores_df["reference_answer"].append(data.reference_answer)
        scores_df["llm_answer"].append(data.llm_answer)
        scores_df["retrieved_context"].append(data.llm_context)
        for score_name, score in data.scores.items():
            scores_df[score_name].append(score)
    return pd.DataFrame(scores_df)
            

In [8]:
scores_df = make_scores_df(response_scores)

In [9]:
scores_df.head()

| | question | reference_answer | llm_answer | retrieved_context | answer_similarity | augmentation_precision | answer_consistency |
|---|---|---|---|---|---|---|---|
| 0 | What makes Sam Altman a good founder? | He has a great force of will. | Sam Altman is considered a good founder becaus... | [Five Founders\n\nApril 2009\n\nInc recently a... | 5.0 | 1.0 | 0.8 |
| 1 | When was the essay "Five Founders" written? | April 2009 | April 2009 | [Five Founders\n\nApril 2009\n\nInc recently a... | 5.0 | 1.0 | 1.0 |
| 2 | When does the most dramatic growth happen for ... | When the startup only has three or four people. | The most dramatic growth for a startup typical... | [Learning from Founders\n\nJanuary 2007\n\n(Fo... | 5.0 | 1.0 | 1.0 |
| 3 | What is the problem with business culture vers... | In business culture, energy is expended on out... | The problem with business culture versus start... | [Learning from Founders\n\nJanuary 2007\n\n(Fo... | 5.0 | 0.5 | 1.0 |
| 4 | What's the single biggest thing the government... | Establish a new class of visa for startup foun... | Establish a new class of visa for startup foun... | [The Founder Visa\n\nApril 2009\n\nI usually a... | 5.0 | 1.0 | 1.0 |
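To read these scores at a glance, one option is to summarize each metric and flag the rows that fall short. The sketch below uses only the five rows displayed above, copied into a plain dictionary so that it runs without pandas or an API key. `summarize` and `rows_below` are hypothetical helper names, not part of Tonic Validate.

```python
from statistics import mean

# Scores copied from the five rows displayed above.
displayed_scores = {
    "answer_similarity": [5.0, 5.0, 5.0, 5.0, 5.0],
    "augmentation_precision": [1.0, 1.0, 1.0, 0.5, 1.0],
    "answer_consistency": [0.8, 1.0, 1.0, 1.0, 1.0],
}


def summarize(scores_by_metric):
    """Return the mean and minimum score for each metric."""
    return {
        name: {"mean": mean(values), "min": min(values)}
        for name, values in scores_by_metric.items()
    }


def rows_below(scores_by_metric, metric, threshold):
    """Return the row indices whose score for `metric` is below `threshold`."""
    return [
        i for i, value in enumerate(scores_by_metric[metric])
        if value < threshold
    ]


summary = summarize(displayed_scores)
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f}, min={stats['min']}")

# Rows worth a closer look: anything short of a perfect score.
print(rows_below(displayed_scores, "augmentation_precision", 1.0))
print(rows_below(displayed_scores, "answer_consistency", 1.0))
```

With the full `scores_df` in hand, the same summary should be obtainable by calling `mean()` and `min()` on the three score columns.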
