# RAG with Reranking

This notebook will show you how to use DSPy to compile a RAG program!

DSPy compilation is a fairly new tool for LLM developers, so let's start with an overview of the concept.

By `compiling`, we mean finding the prompts that elicit the behavior we want from LLMs when connected in some kind of pipeline.

For example, basic RAG consists of two steps: (1) Retrieve and (2) Answer a Question.

Answering a Question has an associated prompt. A common starting point is:

--

Please answer the question based on the following context.

context  {context}

question {question}

--

This prompt may be a good starting point for an LLM to understand the task, but it is unlikely to be the *optimal* prompt. DSPy optimizes the prompt for you, for example by rewriting the instructions:

`Please answer the question based on the following context.` --> `Your task is to answer a question based on retrieved search results. It is very important that the answer contains citations to the retrieved context.` 

We can also include example input-output pairs in the prompt to improve performance, a technique known as `In-Context Learning`.
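To make the two knobs concrete (the instruction text and the in-context examples), here is a minimal plain-Python sketch of how such a prompt could be assembled. `build_prompt` and the demo dictionary layout are illustrative assumptions, not DSPy's internal format; the demo question and answer are taken from the Weaviate FAQs, and the context strings are placeholders.

```python
def build_prompt(instructions, demos, context, question):
    """Assemble a prompt from instructions, optional demos, and the live input."""
    parts = [instructions, "--"]
    # Each demo is a fully worked example: context, question, and answer.
    for demo in demos:
        parts.append(f"context  {demo['context']}")
        parts.append(f"question {demo['question']}")
        parts.append(f"answer   {demo['answer']}")
        parts.append("--")
    # The live input ends with an open "answer" slot for the LLM to complete.
    parts.append(f"context  {context}")
    parts.append(f"question {question}")
    parts.append("answer")
    return "\n\n".join(parts)


instructions = (
    "Your task is to answer a question based on retrieved search results. "
    "It is very important that the answer contains citations to the retrieved context."
)
demos = [
    {
        "context": "(retrieved passage text)",  # placeholder
        "question": "Do you offer Weaviate as a managed service?",
        "answer": "Yes, we do - check out Weaviate Cloud Services.",
    }
]
prompt = build_prompt(
    instructions,
    demos,
    context="(retrieved passage text)",  # placeholder
    question="Do I need to know about Docker (Compose) to use Weaviate?",
)
print(prompt)
```

Compiling, in this picture, amounts to searching over `instructions` and `demos` rather than writing them by hand.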

This example will show you how to compile a RAG program that (1) Retrieves, (2) Re-ranks Search Results, and (3) Answers a Question. We will be optimizing the prompts used in stages (2) and (3).

We are using two datasets for this example. First, we have an index of the Weaviate Blog Posts, which serves as the retrieved context. Second, we have the Weaviate FAQs, which consist of 44 question-answer pairs of frequently asked Weaviate questions such as: `Do I need to know about Docker (Compose) to use Weaviate?`

# Import FAQs from a markdown file

In [3]:
# Load FAQs
import re

# Read the FAQ markdown; the context manager closes the file afterwards
with open("faq.md") as f:
    markdown_content = f.read()

def parse_questions_answers(markdown_content):
    # Regular expression patterns for finding questions and answers
    question_pattern = r'#### Q: (.+?)\n'
    answer_pattern = r'<details>\s*<summary>Answer</summary>(.+?)</details>'

    # Finding all questions and answers
    questions = re.findall(question_pattern, markdown_content, re.DOTALL)
    answers = re.findall(answer_pattern, markdown_content, re.DOTALL)

    # Cleaning answers by removing leading and trailing whitespaces/newlines
    cleaned_answers = [re.sub(r'\s*\n\s*', ' ', answer).strip() for answer in answers]

    # Combining questions and answers
    q_and_a = list(zip(questions, cleaned_answers))

    return q_and_a

# Parsing the markdown content
questions_and_answers = parse_questions_answers(markdown_content)
# Removing the initial "> " from the answers
questions_and_answers = [(q, a[2:] if a.startswith('> ') else a) for q, a in questions_and_answers]

# Displaying the first few extracted questions and answers
questions_and_answers[:5]  # Displaying only the first few for brevity

[('Why would I use Weaviate as my vector database?',
  'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the "knowledge" in "vector databases," if you will). Our ultimate goal is to have Weaviate help you manage, index, and "understand" your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.'),
 ('What is the difference between Weaviate and for example Elasticsearch?',
  'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. T

In [4]:
len(questions_and_answers)

44
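To make the expected input format concrete, here is a tiny sample run through the same regular expressions and cleanup steps. The sample markdown is a hypothetical reconstruction inferred from the patterns, not an excerpt of the real `faq.md`.

```python
import re

# Hypothetical two-entry sample in the layout the regexes appear to expect.
sample_md = """#### Q: Can I connect my own module?

<details>
  <summary>Answer</summary>

> Yes!

</details>

#### Q: Does Weaviate use Hnswlib?

<details>
  <summary>Answer</summary>

> No

</details>
"""

questions = re.findall(r'#### Q: (.+?)\n', sample_md)
answers = re.findall(
    r'<details>\s*<summary>Answer</summary>(.+?)</details>', sample_md, re.DOTALL
)

# Collapse newlines to single spaces, then drop the leading blockquote marker.
cleaned = [re.sub(r'\s*\n\s*', ' ', a).strip() for a in answers]
cleaned = [a[2:] if a.startswith('> ') else a for a in cleaned]

pairs = list(zip(questions, cleaned))
print(pairs)
```

Note that `zip` pairs questions and answers purely by position, so a question without a matching `<details>` block would silently shift every later pair.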

# Wrap each FAQ into an `Example` object

In [5]:
# Load into dspy datasets
import dspy

# ToDo, add random splitting -- maybe wrap this entire thing in a cross-validation loop
trainset = questions_and_answers[:20] # 20 examples for training
devset = questions_and_answers[20:30] # 10 examples for development
testset = questions_and_answers[30:] # 14 examples for testing

trainset = [dspy.Example(question=question, answer=answer).with_inputs("question") for question, answer in trainset]
devset = [dspy.Example(question=question, answer=answer).with_inputs("question") for question, answer in devset]
testset = [dspy.Example(question=question, answer=answer).with_inputs("question") for question, answer in testset]
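The ToDo above mentions random splitting. A minimal sketch of a seeded shuffle, assuming the same 20/10/14 sizes: `split_examples` and the seed value are illustrative choices, and the placeholder pairs stand in for `questions_and_answers`.

```python
import random


def split_examples(examples, n_train=20, n_dev=10, seed=0):
    """Shuffle a copy of the examples with a fixed seed, then slice into three sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# Placeholder data standing in for the 44 parsed FAQ pairs.
pairs = [(f"question {i}", f"answer {i}") for i in range(44)]
train, dev, test = split_examples(pairs, seed=0)
print(len(train), len(dev), len(test))
```

Fixing the seed keeps the split reproducible across notebook restarts, which matters when comparing compiled and uncompiled programs on the same `devset`.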

# Define a Metric for Performance

In [10]:
# Signature for automatic assessments.

# This is a WIP; the next step is to optimize this metric itself as a DSPy module (pretty meta)

# Reference - https://github.com/stanfordnlp/dspy/blob/main/examples/tweets/tweet_metric.py

metricLM = dspy.OpenAI(model='gpt-4', max_tokens=1000, model_type='chat')

class Assess(dspy.Signature):
    """Assess the quality of an answer to a question."""
    
    context = dspy.InputField()
    assessed_answer = dspy.InputField(desc="The answer to the question.")
    assessed_question = dspy.InputField(desc="The evaluation criterion.")
    assessment_answer = dspy.OutputField(desc="Yes or No")

def llm_metric(gold, pred, trace=None):
    predicted_answer = pred.answer
    question = gold.question
    #question="foobar" # works with this
    gold_answer = gold.answer
    # I think I already have context, but could do
    # context = retrieve(question)
    
    correct = f"The text above should answer `{question}`. The gold answer is `{gold_answer}`."
    correct = f"{correct} Does the assessed text above contain the gold answer?"
    
    print(correct)
    
    with dspy.context(lm=metricLM):
        assess = dspy.Predict(Assess)
        correct = assess(context='N/A', assessed_answer=predicted_answer, assessed_question=correct)
    
    print(correct)
    
    correct = correct.assessment_answer.split()[0].lower() == 'yes'
    score = 1.0 if correct else 0.0
    return score
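
The `split()[0].lower() == 'yes'` check in `llm_metric` is strict: a reply of `Yes.` scores 0, and an empty reply raises an `IndexError`. A slightly more forgiving parser might look like the sketch below. `parse_yes_no` is a hypothetical helper, not part of DSPy.

```python
import re


def parse_yes_no(text):
    """Return True only if the reply starts with the whole word 'yes' (any case)."""
    match = re.match(r"\s*(yes|no)\b", text or "", re.IGNORECASE)
    return match is not None and match.group(1).lower() == "yes"


for reply in ["Yes", "Yes.", "yes, it does", "No", "", "Yesterday"]:
    print(repr(reply), parse_yes_no(reply))
```

Anything that is not clearly a "yes" is scored as a miss, so a malformed reply from the metric LM lowers the score instead of crashing the evaluation.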

In [16]:
# remember to set your api_key with
# import openai
# openai.api_key = "sk-foobar"

test_example = dspy.Example(question="What time is it?", answer="10:30")
test_pred = dspy.Example(answer="10:30")

type(llm_metric(test_example, test_pred))

The text above should answer `What time is it?`. The gold answer is `10:30`. Does the assessed text above contain the gold answer?
Prediction(
    assessment_answer='Yes'
)


float

In [17]:
test_example = dspy.Example(question="What time is it?", answer="10:30")
test_pred = dspy.Example(answer="8:25")

type(llm_metric(test_example, test_pred))

The text above should answer `What time is it?`. The gold answer is `10:30`. Does the assessed text above contain the gold answer?
Prediction(
    assessment_answer='No'
)


float

# Configure DSPy Settings

In [18]:
# Connect to Weaviate Retriever and configure LLM
import dspy
from dspy.retrieve.weaviate_rm import WeaviateRM
import weaviate
import openai


llm = dspy.OpenAI(model="gpt-3.5-turbo")

weaviate_client = weaviate.Client("http://localhost:8080")
retriever_model = WeaviateRM("WeaviateBlogChunk", weaviate_client=weaviate_client)
dspy.settings.configure(lm=llm, rm=retriever_model)



# Retriever Example

In [19]:
retrieve = dspy.Retrieve(k=5)
topK_passages = retrieve("What are re-rankers in search engines?").passages

In [20]:
topK_passages

['They offer the advantage of further reasoning about the relevance of results without needing specialized training. Cross Encoders can be interfaced with Weaviate to re-rank search results, trading off performance for slower search speed. * **Metadata Rankers** are context-based re-rankers that use symbolic features to rank relevance. They take into account user and document features, such as age, gender, location, preferences, release year, genre, and box office, to predict the relevance of candidate documents. By incorporating metadata features, these rankers offer a more personalized and context-aware search experience.',
 'Taken directly from the paper, “Our findings indicate that cross-encoder re-rankers can efficiently be improved without additional computational burden and extra steps in the pipeline by explicitly adding the output of the first-stage ranker to the model input, and this effect is robust for different models and query types”. Taking this a bit further, [Dinh et a

# Configure Basic RAG Program

In [None]:
# Maybe an idea to add

# `Please filter anything that is not relevant to the query.`

class Ranker(dspy.Signature):
    """Rank these documents in order of most relevant to least relevant."""
    
    query = dspy.InputField(desc="A query")
    context = dspy.InputField(desc="")
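
The `Ranker` signature above declares only inputs, so how its output would be consumed is still open. One possibility, sketched here in plain Python, assumes (hypothetically) that the LLM replies with 1-based passage numbers such as `2, 3, 1`. `reorder_passages` is an illustrative helper, not a DSPy function.

```python
import re


def reorder_passages(passages, ranking_text):
    """Reorder passages by the 1-based indices found in ranking_text.

    Out-of-range and repeated indices are ignored; passages the ranker
    did not mention are appended in their original order.
    """
    seen = set()
    order = []
    for token in re.findall(r"\d+", ranking_text):
        idx = int(token) - 1
        if 0 <= idx < len(passages) and idx not in seen:
            seen.add(idx)
            order.append(idx)
    order.extend(i for i in range(len(passages)) if i not in seen)
    return [passages[i] for i in order]


passages = ["passage A", "passage B", "passage C"]
print(reorder_passages(passages, "3, 1"))
```

Keeping unmentioned passages at the end means a sloppy ranking can reorder the context but never drop it.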

In [21]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [22]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

In [23]:
uncompiled_rag = RAG()

In [20]:
# Would love something like this:
# uncompiled_rag.generate_answer.answer_description = ""

In [24]:
uncompiled_rag("What are re-rankers in search engines?").context

['They offer the advantage of further reasoning about the relevance of results without needing specialized training. Cross Encoders can be interfaced with Weaviate to re-rank search results, trading off performance for slower search speed. * **Metadata Rankers** are context-based re-rankers that use symbolic features to rank relevance. They take into account user and document features, such as age, gender, location, preferences, release year, genre, and box office, to predict the relevance of candidate documents. By incorporating metadata features, these rankers offer a more personalized and context-aware search experience.',
 'Taken directly from the paper, “Our findings indicate that cross-encoder re-rankers can efficiently be improved without additional computational burden and extra steps in the pipeline by explicitly adding the output of the first-stage ranker to the model input, and this effect is robust for different models and query types”. Taking this a bit further, [Dinh et a

In [25]:
llm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «They offer the advantage of further reasoning about the relevance of results without needing specialized training. Cross Encoders can be interfaced with Weaviate to re-rank search results, trading off performance for slower search speed. * **Metadata Rankers** are context-based re-rankers that use symbolic features to rank relevance. They take into account user and document features, such as age, gender, location, preferences, release year, genre, and box office, to predict the relevance of candidate documents. By incorporating metadata features, these rankers offer a more personalized and context-aware search experience.»
[2] «Taken directly from the paper, “Our findings indicate that cross-encoder re-rankers 

In [26]:
class GenerateAnswer(dspy.Signature):
    """Use the context to answer questions about a database called Weaviate."""
    
    context = dspy.InputField(desc="may contain relevant facts for answering the question")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="a detailed answer to the question.")

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

uncompiled_rag = RAG()
print(uncompiled_rag("How do re-rankers work in search engines?").answer)

Re-rankers in search engines, such as Weaviate, work by modifying the ranking of search results based on additional criteria or features. In the context provided, two types of re-rankers are mentioned: Cross Encoders and Metadata Rankers.

Cross Encoders are used to re-rank search results by incorporating further reasoning about the relevance of the results. These re-rankers can be interfaced with Weaviate to trade off search speed for improved performance. They are particularly effective when used in conjunction with Large Language Models (LLMs) for retrieval-augmented generation.

On the other hand, Metadata Rank


In [16]:
print(uncompiled_rag("Can you please write a detailed explanation of how to configure ref2vec in Weaviate?").answer)

To configure ref2vec in Weaviate, you can follow the steps below:

1. Choose a vectorizer model: Weaviate supports various vectorizer models and service providers. You can either use the `text2vec-huggingface` module to select one of the many sentence transformers published on Hugging Face or use other popular vectorization APIs like OpenAI or Cohere through the `text2vec-openai` or `text2vec-cohere` modules.

2. Install the required modules: If you decide to use a specific vectorizer model, you need to install the corresponding module in Weaviate. For example


In [17]:
# Move num_passages to the call to `forward`

In [17]:
llm.inspect_history(n=1)





Use the context to answer questions about a database called Weaviate.

---

Follow the following format.

Context: may contain relevant facts for answering the question

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: a detailed answer to the question.

---

Context:
[1] «Developers who want to build AI-powered applications can now skip the tedious process of complex training strategies. Now you can simply take models off-the-shelf and plug them into your apps. Applying a ranking model to hybrid search results is a promising approach to keep pushing the frontier of zero-shot AI. Imagine we want to retrieve information about the Weaviate Ref2Vec feature. If our application is using the Cohere embedding model, it has never seen this term or concept.»
[2] «Some models, such as [CLIP](https://openai.com/blog/clip/), are capable of vectorizing multiple data types (images and text in this case) into one vector space, so that a

In [19]:
# Compile!

In [44]:
devset[0]

Example({'question': 'Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)', 'answer': "You can create multiple classes in the Weaviate schema, where one class will act like a namespace in Kubernetes or an index in Elasticsearch. So the spaces will be completely independent, this allows space 1 to use completely different embeddings from space 2. The configured vectorizer is always scoped only to a single class. You can also use Weaviate's Cross-Reference features to make a graph-like connection between an object of Class 1 to the corresponding object of Class 2 to make it easy to see the equivalent in the other space."}) (input_keys={'question'})

In [74]:
type(f1_score(test_pred, test_example))

float

In [81]:
metricLM.inspect_history(n=1)

In [99]:
# Evaluate Uncompiled
from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate` function. We'll use this many times below.
evaluate = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Evaluate the `uncompiled_rag` program with the `llm_metric` metric.
#metric = dspy.evaluate.answer_exact_match
evaluate(uncompiled_rag, metric=llm_metric)


  0%|                                                    | 0/10 [00:00<?, ?it/s]

The text above should answer `Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions)`. The gold answer is `You can create multiple classes in the Weaviate schema, where one class will act like a namespace in Kubernetes or an index in Elasticsearch. So the spaces will be completely independent, this allows space 1 to use completely different embeddings from space 2. The configured vectorizer is always scoped only to a single class. You can also use Weaviate's Cross-Reference features to make a graph-like connection between an object of Class 1 to the corresponding object of Class 2 to make it easy to see the equivalent in the other space.`. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 1  (0.0):  10%|█▏          | 1/10 [00:03<00:32,  3.62s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `How can I retrieve the total object count in a class?`. The gold answer is `import HowToGetObjectCount from '/_includes/how.to.get.object.count.mdx'; > This `Aggregate` query returns the total object count in a class. <HowToGetObjectCount/>`. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 2  (0.0):  20%|██▍         | 2/10 [00:04<00:17,  2.17s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `How do I get the cosine similarity from Weaviate's certainty?`. The gold answer is `To obtain the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) from weaviate's `certainty`, you can do `cosine_sim = 2*certainty - 1``. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 3  (0.0):  30%|███▌        | 3/10 [00:05<00:10,  1.46s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `The quality of my search results change depending on the specified limit. Why? How can I fix this?`. The gold answer is `Weaviate makes use of ANN indices to serve vector searches. An ANN index is an approximate nearest neighbor index. The "approximate" part refers to an explicit recall-query-speed tradeoff. This trade-off is presented in detail in the [ANN benchmarks section](/developers/weaviate/benchmarks/ann.md#results). For example, a 98% recall for a given set of HNSW parameters means that 2% of results will not match the true nearest neighbors. What build parameters lead to what recall depends on the dataset used. The benchmark pages shows 4 different example datasets. Based on the characteristic of each dataset you can pick the one closest to your production load and draw conclusions about the expected recall for the respective build and query-time parameters. Generally if you need a higher recall than the d


Average Metric: 0.0 / 4  (0.0):  40%|████▊       | 4/10 [00:06<00:08,  1.36s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Why did you use GraphQL instead of SPARQL?`. The gold answer is `For user experience. We want to make it as simple as possible to integrate Weaviate into your stack, and we believe that GraphQL is the answer to this. The community and client libraries around GraphQL are enormous, and you can use almost all of them with Weaviate.`. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 5  (0.0):  50%|██████      | 5/10 [00:07<00:05,  1.19s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `What is the best way to iterate through objects? Can I do paginated API calls?`. The gold answer is `Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. > > To iterate through all objects, you can use the `after` operator with both [REST](../api/rest/objects.md#exhaustive-listing-using-a-cursor-after) and [GraphQL](../api/graphql/additional-operators.md#cursor-with-after). > > For pagination through a result set, you can use the `offset` and `limit` operators for GraphQL API calls. Take a look at [this page](../api/graphql/filters.md#pagination-with-offset) which describes how to use these operators, including tips on performance and limitations.`. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 6  (0.0):  60%|███████▏    | 6/10 [00:08<00:04,  1.12s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `What is best practice for updating data?`. The gold answer is `Here are top 3 best practices for updating data: > 1. Use the [batch API](../api/rest/batch.md) > 2. Start with a small-ish batch size e.g. 100 per batch. Adjust up if it is very fast, adjust down if you run into timeouts > 3. If you have unidirectional relationships (e.g. `Foo -> Bar`.) it's easiest to first import all `Bar` objects, then import all `Foo` objects with the refs already set. If you have more complex relationships, you can also import the objects without references, then use the [`/v1/batch/references API`](../api/rest/batch.md) to set links between classes in arbitrary directions.`. Does the assessed text above contain the gold answer?



Average Metric: 0.0 / 7  (0.0):  70%|████████▍   | 7/10 [00:09<00:03,  1.10s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Can I connect my own module?`. The gold answer is `[Yes!](/developers/weaviate/modules/other-modules/custom-modules.md)`. Does the assessed text above contain the gold answer?



Average Metric: 1.0 / 8  (12.5):  80%|████████▊  | 8/10 [00:10<00:02,  1.21s/it]

Prediction(
    assessment_answer='Yes'
)
The text above should answer `Can I train my own text2vec-contextionary vectorizer module?`. The gold answer is `Not at the moment. You can currently use the [available contextionaries](/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-contextionary.md) in a variety of languages and use the transfer learning feature to add custom concepts if needed.`. Does the assessed text above contain the gold answer?



Average Metric: 2.0 / 9  (22.2):  90%|█████████▉ | 9/10 [00:11<00:01,  1.14s/it]

Prediction(
    assessment_answer='Yes'
)
The text above should answer `Does Weaviate use Hnswlib?`. The gold answer is `No > > Weaviate uses a custom implementation of HNSW that overcomes certain limitations of [hnswlib](https://github.com/nmslib/hnswlib), such as durability requirements, CRUD support, pre-filtering, etc. > > Custom HNSW implementation in Weaviate references: > > - [HNSW plugin (GitHub)](https://github.com/weaviate/weaviate/tree/master/adapters/repos/db/vector/hnsw) > - [vector dot product ASM](https://github.com/weaviate/weaviate/blob/master/adapters/repos/db/vector/hnsw/distancer/asm/dot_amd64.s) > > More information: > > - [Weaviate, an ANN Database with CRUD support – DB-Engines.com](https://db-engines.com/en/blog_post/87) ⬅️ best resource on the topic > - [Weaviate's HNSW implementation in the docs](/developers/weaviate/concepts/vector-index.md#hnsw) > > _Note I: HNSW is just one implementation in Weaviate, but Weaviate can support multiple indexing algoritmns as


Average Metric: 2.0 / 10  (20.0): 100%|█████████| 10/10 [00:12<00:00,  1.28s/it]

Prediction(
    assessment_answer='No'
)
Average Metric: 2.0 / 10  (20.0%)





index,question,example_answer,context,pred_answer,llm_metric
0,Is there support to multiple versions of the query/document embedding models to co-exist at a given time? (helps with live experiments of new model versions),"You can create multiple classes in the Weaviate schema, where one class will act like a namespace in Kubernetes or an index in Elasticsearch. So...",['Developers who want to build AI-powered applications can now skip the tedious process of complex training strategies. Now you can simply take models off-the-shelf and...,"Based on the given context, there is no direct mention of whether Weaviate supports multiple versions of the query/document embedding models to co-exist at a...",0.0
1,How can I retrieve the total object count in a class?,import HowToGetObjectCount from '/_includes/how.to.get.object.count.mdx'; > This `Aggregate` query returns the total object count in a class.,"[""It was that Ofir did such a phenomenal job of figuring out a way to measure the complexity of the knowledge that was extracted from...","To retrieve the total object count in a class in the Weaviate database, you can use the Weaviate API. The API provides a ""GET"" request...",0.0
2,How do I get the cosine similarity from Weaviate's certainty?,"To obtain the [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) from weaviate's `certainty`, you can do `cosine_sim = 2*certainty - 1`","['For example, if we broke down this blog post into **chapters** in Weaviate, with **title** and **content** properties. We could run a query to summarize...","Weaviate does not directly provide the cosine similarity from its certainty value. The certainty value in Weaviate represents the confidence level of the result, but...",0.0
3,The quality of my search results change depending on the specified limit. Why? How can I fix this?,"Weaviate makes use of ANN indices to serve vector searches. An ANN index is an approximate nearest neighbor index. The ""approximate"" part refers to an...",['|\n\nBy re-ranking the results we are able to get the clip where Jonathan Frankle describes the benchmarks created by Ofir Press et al. in the...,"The quality of search results can change depending on the specified limit because language models (LLMs) are constrained by input length. When humans search, they...",0.0
4,Why did you use GraphQL instead of SPARQL?,"For user experience. We want to make it as simple as possible to integrate Weaviate into your stack, and we believe that GraphQL is the...","['HNSW, on the other hand, implements the same idea a bit differently. Instead of having all information together on a flat graph, it has a...",The given context does not provide any information about why GraphQL was used instead of SPARQL.,0.0


20.0
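
The reported score is the mean of the per-example metric values, expressed as a percentage. A small sketch using the ten Yes/No verdicts printed above (`average_metric` is an illustrative helper, not a DSPy function):

```python
def average_metric(scores):
    """Return (total, count, percentage) for a list of per-example scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    total = sum(scores)
    return total, len(scores), round(100.0 * total / len(scores), 1)


# Verdicts from the run above: examples 8 and 9 were 'Yes', the rest 'No'.
dev_scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
total, count, percent = average_metric(dev_scores)
print(f"Average Metric: {total} / {count}  ({percent}%)")
```

With only 10 development examples, each verdict moves the score by 10 points, so small differences between programs should be read with caution.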

In [21]:
# Compile

In [107]:
uncompiled_rag("what is ref2vec?")

Prediction(
    context=['Luckily, hybrid search comes to the rescue by combining the contextual semantics from the vector search and the keyword matching from the BM25 scoring. If the query is: `How can I use ref2vec to build a home feed?` The pairing of vector search and BM25 will return a good set of candidates. Now with the ranking model, it takes the [query, candidate document] pair as input and is able to further reason about the relevance of these results without specialized training. Let’s begin with categories of ranking models. We see roughly 3 different genres of ranking models with:\n\n1.', 'Developers who want to build AI-powered applications can now skip the tedious process of complex training strategies. Now you can simply take models off-the-shelf and plug them into your apps. Applying a ranking model to hybrid search results is a promising approach to keep pushing the frontier of zero-shot AI. Imagine we want to retrieve information about the Weaviate Ref2Vec feature. 

In [23]:
# Compile!

In [108]:
from dspy.teleprompt import BootstrapFewShot

teleprompter = BootstrapFewShot(metric=llm_metric)

# also common to init here, e.g. Rag()
compiled_rag = teleprompter.compile(uncompiled_rag, trainset=trainset)
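
Conceptually, `BootstrapFewShot` runs the program over the training examples and keeps the runs that pass the metric as few-shot demonstrations. The sketch below illustrates that idea in plain Python. It is not DSPy's implementation: `bootstrap_demos`, the toy program, the exact-match metric, and the `max_demos` parameter are all stand-ins.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Collect up to max_demos (question, predicted answer) pairs that pass the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"])
        # Only runs the metric accepts are kept as demonstrations.
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


# Toy stand-ins: a lookup-table "program" and an exact-match metric.
toy_answers = {"q1": "a1", "q2": "wrong", "q3": "a3"}
toy_trainset = [
    {"question": "q1", "answer": "a1"},
    {"question": "q2", "answer": "a2"},
    {"question": "q3", "answer": "a3"},
]


def toy_program(question):
    return toy_answers[question]


def exact_match(example, prediction):
    return example["answer"] == prediction


toy_demos = bootstrap_demos(toy_program, toy_trainset, exact_match)
print(toy_demos)
```

This is why the log below shows the metric prompt once per training example: each of the 20 examples is run through the program and judged before it can become a demonstration.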


  0%|                                                    | 0/20 [00:00<?, ?it/s]

The text above should answer `Why would I use Weaviate as my vector database?`. The gold answer is `Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the "knowledge" in "vector databases," if you will). Our ultimate goal is to have Weaviate help you manage, index, and "understand" your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.`. Does the assessed text above contain the gold answer?



  5%|██▏                                         | 1/20 [00:01<00:19,  1.01s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `What is the difference between Weaviate and for example Elasticsearch?`. The gold answer is `Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.`. Does the assessed text above contain the gold answer?



 10%|████▍                                       | 2/20 [00:02<00:20,  1.16s/it]

Prediction(
    assessment_answer='Yes'
)
The text above should answer `Do you offer Weaviate as a managed service?`. The gold answer is `Yes, we do - check out [Weaviate Cloud Services](/pricing).`. Does the assessed text above contain the gold answer?



 15%|██████▌                                     | 3/20 [00:03<00:17,  1.00s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `How should I configure the size of my instance?`. The gold answer is `You can find this in the [architecture section](/developers/weaviate/concepts/resources.md#an-example-calculation) of the docs.`. Does the assessed text above contain the gold answer?



 20%|████████▊                                   | 4/20 [00:04<00:15,  1.03it/s]

Prediction(
    assessment_answer='No'
)
The text above should answer `Do I need to know about Docker (Compose) to use Weaviate?`. The gold answer is `Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the [Docker Introduction for Weaviate Users](https://medium.com/semi-technologies/what-weaviate-users-should-know-about-docker-containers-1601c6afa079).`. Does the assessed text above contain the gold answer?



 25%|███████████                                 | 5/20 [00:06<00:22,  1.47s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `What happens when the Weaviate Docker container restarts? Is my data in the Weaviate database lost?`. The gold answer is `There are three levels: > 1. You have no volume configured (the default in our `Docker Compose` files), if the container restarts (e.g. due to a crash, or because of `docker stop/start`) your data is kept > 2. You have no volume configured (the default in our `Docker Compose` files), if the container is removed (e.g. from `docker compose down` or `docker rm`) your data is gone > 3. If a volume is configured, your data is persisted regardless of what happens to the container. They can be completely removed or replaced, next time they start up with a volume, all your data will be there`. Does the assessed text above contain the gold answer?



 30%|█████████████▏                              | 6/20 [00:10<00:32,  2.33s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Are there any 'best practices' or guidelines to consider when designing a schema?`. The gold answer is `As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. > > So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chap


 35%|███████████████▍                            | 7/20 [00:13<00:34,  2.68s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Should I use references in my schema?`. The gold answer is `In short: for convenience you can add relations to your data schema, because you need less code and queries to get data. But resolving references in queries takes some of the performance. > > 1. If your ultimate goal is performance, references probably don't add any value, as resolving them adds a cost. > 2. If your goal is represent complex relationships between your data items, they can help a lot. You can resolve references in a single query, so if you have classes with multiple links, it could definitely be helpful to resolve some of those connections in a single query. On the other hand, if you have a single (bi-directional) reference in your data, you could also just denormalize the links (e.g. with an ID field) and resolve them during search.`. Does the assessed text above contain the gold answer?



 40%|█████████████████▌                          | 8/20 [00:17<00:34,  2.86s/it]

Prediction(
    assessment_answer='Yes'
)
The text above should answer `Is it possible to create one-to-many relationships in the schema?`. The gold answer is `Yes, it is possible to reference to one or more objects (Class -> one or more Classes) through cross-references. Referring to lists or arrays of primitives, this will be available [soon](https://github.com/weaviate/weaviate/issues/1611).`. Does the assessed text above contain the gold answer?



 45%|███████████████████▊                        | 9/20 [00:21<00:37,  3.38s/it]

Prediction(
    assessment_answer='Yes'
)
The text above should answer `What is the difference between `text` and `string` and `valueText` and `valueString`?`. The gold answer is `The `text` and `string` datatypes differ in tokenization behavior. Note that `string` is now deprecated. Read more in [this section](../config-refs/schema/index.md#property-tokenization) on the differences.`. Does the assessed text above contain the gold answer?



 50%|█████████████████████▌                     | 10/20 [00:26<00:37,  3.76s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Do Weaviate classes have namespaces?`. The gold answer is `Yes. Each class itself acts like namespaces. Additionally, you can use the [multi-tenancy](../concepts/data.md#multi-tenancy) feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.`. Does the assessed text above contain the gold answer?



 55%|███████████████████████▋                   | 11/20 [00:28<00:31,  3.45s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `Are there restrictions on UUID formatting? Do I have to adhere to any standards?`. The gold answer is `The UUID must be presented as a string matching the [Canonical Textual representation](https://en.wikipedia.org/wiki/Universally_unique_identifier#Format). If you don't specify a UUID, Weaviate will generate a `v4` i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use [`v3` or `v5`](https://en.wikipedia.org/wiki/Universally_unique_identifier#Versions_3_and_5_(namespace_name-based)).`. Does the assessed text above contain the gold answer?



 60%|█████████████████████████▊                 | 12/20 [00:32<00:27,  3.47s/it]

Prediction(
    assessment_answer='No'
)
The text above should answer `If I do not specify a UUID during adding data objects, will Weaviate create one automatically?`. The gold answer is `Yes, a UUID will be created if not specified.`. Does the assessed text above contain the gold answer?



 65%|███████████████████████████▉               | 13/20 [00:35<00:19,  2.76s/it]

Prediction(
    assessment_answer='Yes'
)
Bootstrapped 4 full traces after 14 examples in round 0.
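
The log above shows an LLM judge being asked a Yes/No question for each training example, and the compiler keeping the traces that pass. As a rough sketch of that idea (not the notebook's actual `llm_metric`, which may combine several judgments or return a graded score), the snippet below formats the same question and maps the verdict to a number. The names `build_assessment_question`, `assessment_to_score`, `judge_metric`, and `substring_judge` are hypothetical, and the judge is a plain substring check standing in for a real LLM call.

```python
def build_assessment_question(question: str, gold_answer: str) -> str:
    """Format the Yes/No question in the same shape as the log output above."""
    return (
        f"The text above should answer `{question}`. "
        f"The gold answer is `{gold_answer}`. "
        "Does the assessed text above contain the gold answer?"
    )


def assessment_to_score(assessment_answer: str) -> float:
    """Map a free-text verdict to 1.0 for "Yes" and 0.0 for anything else."""
    return 1.0 if assessment_answer.strip().lower().startswith("yes") else 0.0


def judge_metric(question, gold_answer, predicted_answer, judge) -> float:
    """Score one prediction.

    `judge` is any callable (assessed_text, assessment_question) -> str.
    In practice it would wrap an LLM call; that wiring is assumed, not shown.
    """
    prompt = build_assessment_question(question, gold_answer)
    return assessment_to_score(judge(predicted_answer, prompt))


def substring_judge(assessed_text: str, assessment_question: str) -> str:
    """Hypothetical stand-in judge: says "Yes" if the gold answer appears verbatim."""
    gold = assessment_question.split("The gold answer is `", 1)[1]
    gold = gold.split("`. Does the assessed", 1)[0]
    return "Yes" if gold.lower() in assessed_text.lower() else "No"


score = judge_metric(
    "If I do not specify a UUID during adding data objects, will Weaviate create one automatically?",
    "Yes, a UUID will be created if not specified.",
    "Yes, a UUID will be created if not specified. You can also supply your own.",
    substring_judge,
)
print(score)
```

Keeping the judge as an injected callable means the prompt formatting and score mapping can be checked without spending API calls.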





In [109]:
evaluate(compiled_rag, metric=llm_metric)


  0%|                                                    | 0/10 [00:00<?, ?it/s]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 1.9 seconds after 2 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 3.8 seconds after 3 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}


KeyboardInterrupt: 
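
The `Backing off ... seconds after N tries` lines above indicate that the LLM request was being retried with growing, randomized delays, most likely because the API was rate-limiting or failing. The sketch below illustrates that general retry pattern (exponential backoff with jitter) in plain Python. It is not DSPy's implementation: `call_with_backoff` and all of its parameters are hypothetical, and the printed message only imitates the log format.

```python
import random
import time


def call_with_backoff(fn, max_tries=5, base=0.5, factor=2.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call `fn()` and retry on failure with exponentially growing, jittered delays.

    The upper bound on the delay is base * factor ** (attempt - 1); the actual
    delay is drawn uniformly between 0 and that bound ("full jitter").
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_tries:
                raise  # give up and surface the last error
            delay = random.uniform(0, base * factor ** (attempt - 1))
            print(f"Backing off {delay:.1f} seconds after {attempt} tries")
            sleep(delay)


# Usage: a flaky function that fails twice before succeeding.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


# Pass a no-op sleep so the example runs instantly.
result = call_with_backoff(flaky_request, sleep=lambda seconds: None)
print(result)
```

Capping `max_tries` matters in a notebook setting: without a limit, a persistent failure looks like a hung cell and ends in a manual interrupt, as in the output above.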

In [111]:
compiled_rag("what is ref2vec?")

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 1.0 seconds after 2 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 0.9 seconds after 3 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}


KeyboardInterrupt: 

In [112]:
llm.inspect_history(n=1)





Use the context to answer questions about a database called Weaviate.

---

Question: Can I use Weaviate to create a traditional knowledge graph?
Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a _knowledge_ graph 😉.

Question: What is the difference between Weaviate and for example Elasticsearch?
Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.

Questio

In [105]:
compiled_rag("what is ref2vec?")

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 0.5 seconds after 2 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}
Backing off 2.0 seconds after 3 tries calling function <function GPT3.request at 0x1278bd750> with kwargs {}


Average Metric: 1.0894846750368796 / 8  (13.6):  80%|▊| 8/10 [06:25<01:36, 48.14
Average Metric: 0.0 / 4  (0.0):  40%|████▊       | 4/10 [06:33<09:49, 98.26s/it]
Average Metric: 0.0 / 4  (0.0):  40%|████▊       | 4/10 [04:25<06:38, 66.34s/it]


KeyboardInterrupt: 