In [None]:
%%capture
!pip install llama-index==0.10.29 ragas==0.1.7 llama-index-embeddings-openai llama-index-llms-openai

In [40]:
import os
import sys
from getpass import getpass
import nest_asyncio

from dotenv import load_dotenv

sys.path.append('../helpers')

nest_asyncio.apply()

load_dotenv()

True

In [2]:
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY') or getpass("Enter your OpenAI API key: ")

I'm using OpenAI here because Cohere has rate limits for its free tier. You don't need to run this code yourself if you don't want to incur costs from OpenAI. I'll upload the dataset to the Hugging Face Hub and show you how to download it from there when we need it.

In [3]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

We cleaned up our data earlier. Recall that we persisted the `Document` objects to disk using a Docstore, so that each `Document` object represents the cleaned text from one page of a book.

In [4]:
from utils import get_documents_from_docstore

documents = get_documents_from_docstore("../data/words-of-the-senpais")

# Create a set of `Documents` for the evaluation set

- 📚 **`group_documents_by_author`**: A utility function that sorts a collection of documents into groups based on who wrote them.

- 🗂️ **How It Works**: It creates a dictionary where each author's name is linked to all the documents they've written.
  - Starts with an empty dictionary ready to be filled with author-document pairs.
  - Goes through each document, checking the author's name and adding the document under the appropriate author in the dictionary.
  - If a document doesn't list an author, it skips adding that document with a warning note.

- 📝 **Input**: Takes a list of `Document` objects, each with metadata that includes the `author` field (the name of its author).

- 🔖 **Output**: Outputs a dictionary that groups all the documents by their respective authors.
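
The real `group_documents_by_author` lives in the `utils` helper module and isn't shown in this notebook. As a rough illustration of the steps described above, here is a minimal sketch. The names `FakeDocument` and `group_documents_by_author_sketch` are hypothetical stand-ins, and the sketch assumes only that each document exposes a `metadata` dictionary; the actual helper may differ in its details.

```python
import warnings
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LlamaIndex `Document`: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def group_documents_by_author_sketch(documents):
    """Return {author: [documents]}, skipping documents that list no author."""
    grouped = defaultdict(list)
    for doc in documents:
        author = doc.metadata.get("author")
        if not author:
            # Assumption: a missing author produces a warning, not an error
            warnings.warn("Skipping a document with no 'author' in its metadata.")
            continue
        grouped[author].append(doc)
    return dict(grouped)


toy_docs = [
    FakeDocument("page 1 text", {"author": "Naval Ravikant"}),
    FakeDocument("page 2 text", {"author": "Bruce Lee"}),
    FakeDocument("page 3 text", {"author": "Naval Ravikant"}),
    FakeDocument("page with no author"),
]
toy_grouped = group_documents_by_author_sketch(toy_docs)

for author, docs in toy_grouped.items():
    print(author, len(docs))
```

The unattributed page is dropped with a warning, and the two remaining authors each map to a list of their documents.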
  

In [5]:
import random
from utils import group_documents_by_author

random.seed(42)

documents_by_author = group_documents_by_author(documents)

- 📚 **`sample_documents`**: Picks a set number of documents randomly from each author's collection within a grouped dictionary.

- 🎲 **Sampling Logic**: It tries to get a specific number of documents for each author. If an author doesn't have enough documents, it alerts you.
  - Begins with an empty list for storing selected samples.
  - Loops through each author, keeps only documents longer than 500 characters, and checks whether enough of them remain to fulfill the sampling requirement.
  - Randomly selects the desired number of documents from those available, adding them to the overall sample list.
  - Issues a warning if the documents under an author are too few to meet the sampling number.

- 📝 **Input**: Receives a dictionary where authors are keys and values are lists of their documents, along with an optional number of documents to sample per author.

- 🔖 **Output**: Outputs a list of randomly chosen documents from across all authors, sticking to the specified number per author when possible.
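
The sampling logic above can be sketched roughly as follows. This is not the actual `sample_documents` from `utils`: the names `FakeDocument` and `sample_documents_sketch`, the `min_chars` parameter, and the default of 10 samples are all assumptions made for illustration. In particular, this sketch assumes that when an author has too few eligible documents, it warns and keeps whatever is available.

```python
import random
import warnings
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LlamaIndex `Document`: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def sample_documents_sketch(documents_by_author, num_samples=10, min_chars=500):
    """Randomly pick up to `num_samples` sufficiently long documents per author."""
    sampled = []
    for author, docs in documents_by_author.items():
        # Only documents longer than `min_chars` characters are eligible
        eligible = [doc for doc in docs if len(doc.text) > min_chars]
        if len(eligible) < num_samples:
            # Assumption: warn and fall back to every eligible document
            warnings.warn(
                f"{author!r} has only {len(eligible)} eligible documents; wanted {num_samples}."
            )
            sampled.extend(eligible)
            continue
        sampled.extend(random.sample(eligible, num_samples))
    return sampled


random.seed(42)
long_text = "x" * 501
toy_groups = {
    "Author A": [FakeDocument(long_text, {"author": "Author A"}) for _ in range(5)],
    "Author B": [
        FakeDocument(long_text, {"author": "Author B"}),
        FakeDocument("too short", {"author": "Author B"}),
    ],
}
toy_sample = sample_documents_sketch(toy_groups, num_samples=3)
print(len(toy_sample))
```

Here "Author A" contributes three documents, while "Author B" triggers the warning because only one of its documents passes the length filter.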

In [6]:
from utils import sample_documents

docs_for_eval_set = sample_documents(documents_by_author, num_samples=25)

# Perform a sanity check

In [7]:
from collections import Counter

def count_documents_by_author(documents):
    """
    Count the number of documents each author has in a list of document objects.

    :param documents: List of document objects with metadata containing 'author'.
    :return: A Counter object with authors as keys and counts of their documents as values.
    """
    # Extract the author from each document's metadata and count occurrences
    author_counts = Counter(doc.metadata['author'] for doc in documents if 'author' in doc.metadata)
    return author_counts

author_counts = count_documents_by_author(docs_for_eval_set)
for author, count in author_counts.items():
    print(f"Author '{author}' has {count} documents.")

Author 'Naval Ravikant' has 25 documents.
Author 'Balaji Srinivasan' has 25 documents.
Author 'Paul Graham' has 25 documents.
Author 'Nassim Nicholas Taleb' has 25 documents.
Author 'Seneca' has 25 documents.
Author 'Bruce Lee' has 25 documents.


In [8]:
len(docs_for_eval_set)

150

In [9]:
from utils import ingest 
from llama_index.core.node_parser import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size = 256,
    chunk_overlap = 32
)

transformations = [splitter]

docs_for_eval_set = ingest(documents = docs_for_eval_set, transformations = transformations)

In [10]:
len(docs_for_eval_set)

316

## Let's create an evaluation set using custom prompts

In [11]:
from llama_index.core.prompts.base import PromptTemplate
from prompts import QUESTION_GEN_PROMPT
print(QUESTION_GEN_PROMPT)

Your task is to write a question given a context. Your question must be in the form of an adult mentee seeking advice 
from a trusted mentor. Formulate your question in the same style as questions users could ask in a search engine. Your question must be 
answerable with a specific, concise piece of information from the context. 

The context is below:
----------------------
{context}
----------------------

Your question MUST be short, clear, and based on the essence of the context. DO NOT use any qualifiers, relative clauses, or introductory modifiers.  
Keep your question short and to the point. Ask your question using the first person perspective, in the form of a student seeking advice from a trusted mentor.



In [12]:
QUESTION_GEN_PROMPT_TEMPLATE = PromptTemplate(QUESTION_GEN_PROMPT)

In [14]:
from llama_index.core import PromptTemplate

prompt = QUESTION_GEN_PROMPT_TEMPLATE.format(context=docs_for_eval_set[10].get_content()) 

response = llm.complete(prompt)

print(response)

How can I work my way up to higher leverage, more accountability, and specific knowledge without risking ruin?


# 🤖 + ❓Generate questions from context

We'll use GPT-3.5-Turbo to generate questions from our `Documents`.

Here's what the code below is doing:

- Initialize an empty list, `questions`, to store the generated questions and their contexts.

- Iterate through each document `doc` in `docs_for_eval_set`.

- For each document, generate the prompt using `QUESTION_GEN_PROMPT_TEMPLATE` and the document's content.

- Get the response from the LLM using `llm.complete(prompt)`.

- Store the response text under the key `"question"` and the document's content under the key `"context"` in a dictionary, then append that dictionary to `questions`.


In [16]:
from llama_index.core import PromptTemplate

questions = []

for doc in docs_for_eval_set:
    result_dict = {}
    context = doc.get_content()
    prompt = QUESTION_GEN_PROMPT_TEMPLATE.format(context=context)
    response = llm.complete(prompt)
    result_dict['question'] = response.text
    result_dict["context"] =  context
    questions.append(result_dict)

In [17]:
len(questions)

316

In [18]:
questions[:10]

[{'question': 'How can I leverage coding to multiply my efforts without needing permission or money from others?',
  'context': 'This includes books, media, movies, and code. Code is probably the most powerful form of permissionless leverage. All you need is a computeryou dont need anyones permission. Forget rich versus poor, white-collar versus blue. Its now leveraged versus un-leveraged. The most interesting and the most important form of leverage is the idea of products that have no marginal cost of replication. This is the new form of leverage. This was only invented in the last few hundred years. It started with the printing press. It accelerated with broadcast media, and now its really blown up with the internet and with coding. Now, you can multiply your efforts without involving other humans and without needing money from other humans. This book is a form of leverage. Long ago, I would have had to sit in a lecture hall and lecture each of you personally. I would have maybe reac

# 🤖 + 💬 Create answers using generated question and context

Using GPT-3.5-Turbo (to keep costs down, you can of course use GPT-4-Turbo), we'll generate answers using the questions we just created plus the context.

In [19]:
from prompts import ANSWER_GEN_PROMPT

print(ANSWER_GEN_PROMPT)

You're a trusted mentor to an adult mentee. Your mentee is seeking advice in the form of a question.

Below is your mentee's question:

----------------------
{question}
----------------------

You've previously done some thinking and writing on the exact question your mentee has asked. Your answer MUST be actionable
yet concise. It must be based on your thinking and writing, which is below:

----------------------
{context}
----------------------

DO NOT use any qualifiers, relative clauses, or introductory modifiers in your answer. Provide your answer question using the second person
perspective, speaking directly to your mentee, in the form of a trusted mentor providing actionable advice.



In [20]:
ANSWER_GEN_PROMPT_TEMPLATE = PromptTemplate(ANSWER_GEN_PROMPT)

In [21]:
prompt = ANSWER_GEN_PROMPT_TEMPLATE.format(question=questions[42]['question'], context=questions[42]['context']) 

response = llm.complete(prompt)

print(response)

Focus on setting clear goals that align with your desires, create a plan with actionable steps to achieve them, and stay committed to your long-term vision. Stay adaptable, seek guidance from mentors, and remember that setbacks are part of the journey. Stay focused, stay determined, and keep moving forward towards your goals.


In [22]:
for question in questions:
    prompt = ANSWER_GEN_PROMPT_TEMPLATE.format(question=question['question'], context=question['context']) 
    response = llm.complete(prompt)
    question['answer'] = response.text

In [23]:
questions[:10]

[{'question': 'How can I leverage coding to multiply my efforts without needing permission or money from others?',
  'context': 'This includes books, media, movies, and code. Code is probably the most powerful form of permissionless leverage. All you need is a computeryou dont need anyones permission. Forget rich versus poor, white-collar versus blue. Its now leveraged versus un-leveraged. The most interesting and the most important form of leverage is the idea of products that have no marginal cost of replication. This is the new form of leverage. This was only invented in the last few hundred years. It started with the printing press. It accelerated with broadcast media, and now its really blown up with the internet and with coding. Now, you can multiply your efforts without involving other humans and without needing money from other humans. This book is a form of leverage. Long ago, I would have had to sit in a lecture hall and lecture each of you personally. I would have maybe reac

# 🧐 How good are our questions?

I suppose you could do this part before generating answers if you wanted to, but we'll do it now.

Here we're going to use GPT-4-Turbo to judge how well each question can be answered from its context. We'll write a prompt that does this and scores each question on a scale of 1-5.



In [26]:
critic_llm = OpenAI(model="gpt-4-turbo-2024-04-09")

In [61]:
from prompts import GROUNDEDNESS_PROMPT

print(GROUNDEDNESS_PROMPT)

You are given context and a question. Provide a 'total rating' from 1 to 5 indicating 
the extent to which the question can be answered clearly using the context. 1 = not answerable, 5 = clearly answerable

Format your response as:

Evaluation: (rationale)
Total rating: (a number in the range 1-5)

Content and question are below:
----------------------
Context: {context}
Question: {question}
----------------------



In [42]:
GROUNDEDNESS_PROMPT_TEMPLATE = PromptTemplate(GROUNDEDNESS_PROMPT)

In [39]:
prompt = GROUNDEDNESS_PROMPT_TEMPLATE.format(question=questions[42]['question'], context=questions[42]['context']) 

response = critic_llm.complete(prompt)

print(response)

Evaluation: The context provides a philosophical perspective on puberty, describing it as the onset of desire and the beginning of long-range planning. It suggests that during puberty, individuals start to think more deeply and build their identities and egos to achieve their desires. However, the context does not provide specific strategies or advice on how to navigate these changes effectively. The question asks for guidance on managing these new desires and planning strategies, which the context does not directly address with practical steps or methods.

Total rating: 2


In [52]:
for question in questions:
    prompt = GROUNDEDNESS_PROMPT_TEMPLATE.format(question=question['question'], context=question['context']) 
    response = critic_llm.complete(prompt)
    response_string = response.text
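    try:
        # Everything after the last "Total rating: " should be the integer score
        score_as_int = int(response_string.split("Total rating: ")[-1].strip())
        # The rationale sits between "Evaluation: " and "Total rating: "
        score_rationale = response_string.split("Total rating: ")[-2].split("Evaluation: ")[1]
        question['question_groundedness_score'] = score_as_int
        question['question_groundedness_score_rationale'] = score_rationale
    except (ValueError, IndexError):
        # The critic did not follow the requested format; record the score as missing
        question['question_groundedness_score'] = None
        question['question_groundedness_score_rationale'] = None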
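
The string splitting above works when the critic follows the format exactly, but it records `None` if the model appends anything after the number (for example `4/5`). One possible alternative is a regular expression, sketched below. The names `RATING_PATTERN` and `parse_groundedness_response` are hypothetical and not part of the notebook's helpers; treat this as a sketch, not the method used to build the published dataset.

```python
import re

# Capture the rationale after "Evaluation:" and a single digit 1-5 after "Total rating:"
RATING_PATTERN = re.compile(
    r"Evaluation:\s*(?P<rationale>.*?)\s*Total rating:\s*(?P<score>[1-5])\b",
    re.DOTALL,
)


def parse_groundedness_response(response_text):
    """Return (score, rationale), or (None, None) if the format is not recognized."""
    match = RATING_PATTERN.search(response_text)
    if match is None:
        return None, None
    return int(match.group("score")), match.group("rationale")


example = "Evaluation: The context gives no practical steps.\n\nTotal rating: 2\n"
score, rationale = parse_groundedness_response(example)
print(score, rationale)
```

Inside the loop, you could call `parse_groundedness_response(response.text)` and store the two returned values, which removes the need for the `try`/`except` block.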

In [59]:
questions[-10:]

[{'question': 'How can I cultivate an intelligent mind that is constantly learning and inquiring?',
  'context': "The Mind An intelligent mind is constantly learning. - An intelligent mind is one which is constantly learning, never concluding - styles and patterns have come to conclusion, therefore they [have] ceased to be intelligent. An intelligent mind is an inquiring mind. - An intelligent mind is an INQUIRING mind. It is not satisfied with explanations, with conclusions; nor is it a mind that believes, because belief is again another form of conclusion. The qualities of mind. -To be one thing and not to change is the climax of STILLNESS. To have nothing in one that resists is the climax of EMPTINESS. To remain detached from all outside things is the climax of FINENESS. To have in oneself no contraries is the climax of PURITY. You are the commander of your mind. - I've always been buffeted by circumstances because I thought of myself as a human being [affected by] outside condition

In [62]:
from datasets import Dataset

rag_eval_set = Dataset.from_list(questions)

# You can find the dataset on Hugging Face

You don't have to run the examples here if you don't want to incur costs from OpenAI. 

I've made the dataset available on Hugging Face, and I'll show you how to load it when we need to make use of it. 

[Here's the dataset](https://huggingface.co/datasets/harpreetsahota/LI_Learning_RAG_Eval_Set). You can click around and explore using the dataset viewer. If you sign up for an account on Hugging Face, feel free to [follow me](https://huggingface.co/harpreetsahota)!



In [None]:
rag_eval_set.push_to_hub(repo_id="harpreetsahota/LI_Learning_RAG_Eval_Set")