# Listwise Reranking using OpenAI

![Listwise ranking](./listwise.jpeg)

Listwise reranking passes the query and all of the retrieved documents to the ranker in a single call. The prompt contains the query plus the retrieved documents, and the LLM returns a structured ordering of the results, such as [Doc B > Doc C > Doc A]. The LLM's objective is to find the document ordering that maximizes a retrieval metric (e.g. nDCG or precision).
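To make the metric concrete, here is a minimal sketch of nDCG for a single ranked list. The function names and the toy relevance labels are assumptions made for illustration; the notebook itself only measures whether the top-ranked id matches the ground truth.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: the relevance at each rank is divided by log2(rank + 1).
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(ranked_ids, relevance_by_id):
    # Gains in the order the ranker produced; ids without a label count as 0.
    gains = [relevance_by_id.get(doc_id, 0) for doc_id in ranked_ids]
    # The ideal ordering sorts the known relevance labels from high to low.
    ideal = sorted(relevance_by_id.values(), reverse=True)[:len(ranked_ids)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy labels (hypothetical): only Doc B answers the query.
relevance = {"A": 0, "B": 1, "C": 0}
best = ndcg(["B", "C", "A"], relevance)   # relevant document ranked first
worst = ndcg(["A", "C", "B"], relevance)  # relevant document ranked last
print(best, worst)
```

An ordering that places the relevant document first scores higher than one that places it last, which is the behavior the reranker is meant to optimize.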

### Connect to Weaviate instance

In [59]:
import weaviate
import json
import os
from dotenv import load_dotenv

load_dotenv()

client = weaviate.Client(
    url = os.getenv("WEAVIATE_URL"),  # Replace with your cluster url
    auth_client_secret=weaviate.AuthApiKey(api_key=os.getenv("WEAVIATE_API_KEY")),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")  # Your OpenAI API key, read from the environment
    }
)

### Load in FAQ json

In [16]:
with open("faq.json", "r") as f:
    json_data = json.load(f)

queries = [{"question": item["question"], "answer": item["answer"], "number": item["number"]} for item in json_data["questions"]]

### Create Weaviate Schema

In [3]:
# Reset the schema. CAUTION: this deletes every class and all data in the instance.
client.schema.delete_all()

schema = {
   "classes": [
       {
           "class": "FAQ_Answers",
           "description": "Answers to Weaviate FAQs",
           "vectorizer": "text2vec-openai",
           "properties": [
               {
                  "name": "Answer",
                  "dataType": ["text"],
                  "description": "An answer to the FAQ question",
               },
               {
                  "name": "Number",
                  "dataType": ["text"],
                  "description": "The FAQ id",
                }
            ]
        }
    ]
}

client.schema.create(schema)

print("Successfully created the schema.")

Successfully created the schema.


### Upload answers to Weaviate

In [4]:
from weaviate.util import get_valid_uuid
from uuid import uuid4

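for item in queries:
    print(item)
    # Property names match the schema defined above
    properties = {
        "Answer": item["answer"],
        "Number": item["number"]
    }
    object_id = get_valid_uuid(uuid4())  # random v4 UUID; the name avoids shadowing the built-in id()
    client.data_object.create(properties, "FAQ_Answers", object_id)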
    

{'question': 'Why would I use Weaviate as my vector database?', 'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.', 'number': '1'}
{'question': 'What is the difference between Weaviate and for example Elasticsearch?', 'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is store

### Retrieve information

##### Prompt 1

In [63]:
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")  # read the key from the environment loaded by load_dotenv()

def openai_request(prompt):
    response = openai.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
        {
        "role": "system",
        "content": """
        You are a reranking agent. Each potential answer has a corresponding Answer id and you're tasked with ranking the questions based on their relevancy to the query.
        The output should ONLY contain the ranked list and no additional comments.
        """
        },
        {
            "role": "user",
            "content": prompt
        }],
        temperature=1,
        max_tokens=2048,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].message.content

In [64]:
def parse_ranked_ids(ids):
    elements = ids.split(',')

    first_element = elements[0].strip('[]')

    return int(first_element)


correct = 0
for query_obj in queries:
    query = query_obj["question"]
    ground_truth = int(query_obj["number"])

    results = client.query.raw("""
    {
        Get {
            FAQ_Answers(
                nearText: {
                    concepts: ["%s"]
                },
                limit: 10
            ){
                answer
                number
            }
        }
    }                          
    """ % query)
    print(f"Weaviate search results {results}\n")

    reranking_template = f"\nINPUT:\nQUERY: {query}"
    reranking_template += "\nPlease rerank these search results.\n"
    for result in results["data"]["Get"]["FAQ_Answers"]:
        id, answer = result["number"], result["answer"]
        reranking_template += f"[Answer id: {id}, Answer: {answer}]\n"
    print(reranking_template)
    print("\n")
    ranked_ids = openai_request(reranking_template) # send the query along with retrieved results to OpenAI
    first_ranking = parse_ranked_ids(ranked_ids)
    print(f"OPENAI 1st Rank = {first_ranking}\n") # first ranking from gpt-4
    print(f"GROUND TRUTH = {ground_truth}") # ground truth
    if (first_ranking == ground_truth):
        correct += 1

print(correct / len(queries) * 100)


Weaviate search results {'data': {'Get': {'FAQ_Answers': [{'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.', 'number': '2'}, {'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it ev

ValueError: invalid literal for int() with base 10: '3\n1\n4\n10\n11\n14\n7\n8\n9\n2'
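The `ValueError` above occurs because the model returned the ids separated by newlines, while `parse_ranked_ids` splits on commas and then calls `int()` on the whole string. One way to make parsing less dependent on the exact output format is to pull out every integer with a regular expression. This is a sketch rather than the notebook's method, and the helper names `extract_ranked_ids` and `extract_top_id` are hypothetical.

```python
import re

def extract_ranked_ids(raw_output):
    """Return every integer found in the model output, in order of appearance."""
    return [int(match) for match in re.findall(r"\d+", raw_output)]

def extract_top_id(raw_output):
    """Return the first ranked id, or raise if the output contains no integer."""
    ids = extract_ranked_ids(raw_output)
    if not ids:
        raise ValueError(f"no answer id found in model output: {raw_output!r}")
    return ids[0]

# The newline-separated output that triggered the error above
newline_output = "3\n1\n4\n10\n11\n14\n7\n8\n9\n2"
# The bracketed, comma-separated format requested in the later prompts
bracket_output = "[8],[7],[3],[1],[5]"

print(extract_ranked_ids(newline_output))
print(extract_top_id(bracket_output))
```

This approach assumes the model output contains only answer ids. If the model adds commentary that includes other numbers, those would be picked up as ids too, so a stricter prompt is still worthwhile.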

##### Prompt 2

In [45]:
def openai_request(prompt):
    response = openai.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
        {
        "role": "system", # system prompt
        "content": """
            You are a reranking agent! Each potential answer has a corresponding Answer id and you're tasked with ranking the answers based on their relevancy to the QUERY.
            You SHOULD ONLY contain the ranked list and no additional comments, such as [8],[7],[3],[1],[5]. THIS IS VERY IMPORTANT!
            Here is an example of the task you should perform:
            INPUT:
            QUERY: Why would I use Weaviate as my vector database?
            Please rerank these search results.
            [Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
            [Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
            [Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
            [Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
            [Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
            [Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
            [Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
            [Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
            [Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
            [Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]
            

            OUTPUT:
            [1],[2] 
        """
        },
        {
        "role": "user", # user prompt
        "content": prompt
        }],
        temperature=1,
        max_tokens=2048,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].message.content

In [46]:
def parse_ranked_ids(ids):
    elements = ids.split(',')

    first_element = elements[0].strip('[]')

    return int(first_element)


correct = 0
for query_obj in queries:
    query = query_obj["question"]
    ground_truth = int(query_obj["number"])

    results = client.query.raw("""
    {
        Get {
            FAQ_Answers(
                nearText: {
                    concepts: ["%s"]
                },
                limit: 10
            ){
                answer
                number
            }
        }
    }                          
    """ % query)
    print(f"Weaviate search results {results}\n")

    reranking_template = f"\nINPUT: \nQUERY: {query}"
    reranking_template += "\nPlease rerank these search results.\n"
    for result in results["data"]["Get"]["FAQ_Answers"]:
        id, answer = result["number"], result["answer"]
        reranking_template += f"[Answer id: {id}, Answer: {answer}]\n"
    print(reranking_template)
    print("\n")
    ranked_ids = openai_request(reranking_template) # send the query along with retrieved results to OpenAI
    print(f"RAW OUTPUT FROM OPENAI = {ranked_ids}\n")
    first_ranking = parse_ranked_ids(ranked_ids)
    print(f"OPENAI 1st Rank = {first_ranking}\n") # first ranking from gpt-4
    print(f"GROUND TRUTH = {ground_truth}") # ground truth
    if (first_ranking == ground_truth):
        correct += 1

print(correct / len(queries) * 100)


Weaviate search results {'data': {'Get': {'FAQ_Answers': [{'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.', 'number': '2'}, {'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it ev
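The evaluation cells interpolate each question into the GraphQL string with `%`, so a question containing a double quote or a backslash would produce an invalid query. A minimal sketch of one way to guard against that is to let `json.dumps` quote and escape the text, on the assumption that JSON string escaping is also valid inside a GraphQL string literal for ordinary text. The helper name `build_near_text_query` is hypothetical.

```python
import json

def build_near_text_query(question, limit=10):
    # json.dumps adds the surrounding double quotes and escapes embedded
    # quotes, backslashes and control characters.
    concept = json.dumps(question, ensure_ascii=False)
    return f"""
    {{
        Get {{
            FAQ_Answers(
                nearText: {{ concepts: [{concept}] }},
                limit: {int(limit)}
            ) {{
                answer
                number
            }}
        }}
    }}
    """

# A question with embedded double quotes (made up for illustration)
graphql_query = build_near_text_query('What does "certainty" mean in Weaviate?')
print(graphql_query)
```

The resulting string can be passed to `client.query.raw(...)` in place of the `%`-formatted query used in the cells above.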

##### Prompt 3

In [47]:
def openai_request(prompt):
    response = openai.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
        {
        "role": "system", # system prompt
        "content": """
        You are a reranking agent. Each potential answer has a corresponding Answer id and you're tasked with ranking the questions based on their relevancy to the QUERY.
    VERY IMPORTANT!!! The output SHOULD ONLY contain the ranked list and no additional comments, such as [8],[7],[3],[1],[5]! THIS IS VERY IMPORTANT!
    Here is an example of the task you should perform:
    INPUT:
    QUERY: Why would I use Weaviate as my vector database?
    Please rerank these search results.
    [Answer id: 2, Answer: Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.]
    [Answer id: 1, Answer: Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it everywhere. This is the reason why Weaviate comes containerized.]
    [Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
    [Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
    [Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]
    [Answer id: 10, Answer: Yes, you can! Weaviate support ontology, RDF-like definitions in its schema, and it runs out of the box. It is scalable, and the GraphQL API will allow you to query through your knowledge graph easily. But now you are here. We like to suggest you really try its semantic features. After all, you are creating a knowledge graph 😉.]
    [Answer id: 5, Answer: As a rule of thumb, the smaller the units, the more accurate the search will be. Two objects of e.g. a sentence would most likely contain more information in their vector embedding than a common vector (which is essentially just the mean of sentences). At the same time more objects leads to a higher import time and (since each vector also makes up some data) more space. (E.g. when using transformers, a single vector is 768xfloat32 = 3KB. This can easily make a difference if you have millions, etc.) of vectors. As a rule of thumb, the more vectors you have the more memory you're going to need. So, basically, it's a set of tradeoffs. Personally we've had great success with using paragraphs as individual units, as there's little benefit in going even more granular, but it's still much more precise than whole chapters, etc. You can use cross-references to link e.g. chapters to paragraphs. Note that resolving a cross-references takes a slight performance penalty. Essentially resolving A1->B1 is the same cost as looking up both A1 and B1 indvidually. This cost however, will probably only matter at really large scale.]
    [Answer id: 8, Answer: The UUID must be presented as a string matching the Canonical Textual representation. If you don't specify a UUID, Weaviate will generate a v4 i.e. a random UUID. If you generate them yourself you could either use random ones or deterministically determine them based on some fields that you have. For this you'll need to use v3 or v5.]
    [Answer id: 7, Answer: Yes. Each class itself acts like namespaces. Additionally, you can use the multi-tenancy feature to create isolated storage for each tenant. This is especially useful for use cases where one cluster might be used to store data for multiple customers or users.]
    [Answer id: 13, Answer: To obtain the cosine similarity from weaviate's certainty, you can do cosine_sim = 2*certainty - 1]

    OUTPUT:
    [1],[2] 
        """
        },
        {
            "role": "user", # user prompt
            "content": prompt
        }],
        temperature=1,
        max_tokens=2048,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].message.content

In [48]:
def parse_ranked_ids(ids):
    elements = ids.split(',')

    first_element = elements[0].strip('[]')

    return int(first_element)


correct = 0
for query_obj in queries:
    query = query_obj["question"]
    ground_truth = int(query_obj["number"])

    results = client.query.raw("""
    {
        Get {
            FAQ_Answers(
                nearText: {
                    concepts: ["%s"]
                },
                limit: 10
            ){
                answer
                number
            }
        }
    }                          
    """ % query)
    print(f"Weaviate search results {results}\n")
    

    reranking_template = f"\nINPUT: \nQUERY: {query}"
    reranking_template += "\nPlease rerank these search results.\n"
    for result in results["data"]["Get"]["FAQ_Answers"]:
        id, answer = result["number"], result["answer"]
        reranking_template += f"[Answer id: {id}, Answer: {answer}]\n"
    print(reranking_template)
    print("\n")
    ranked_ids = openai_request(reranking_template) # send the query along with retrieved results to OpenAI
    print(f"RAW OUTPUT FROM OPENAI = {ranked_ids}\n")
    first_ranking = parse_ranked_ids(ranked_ids)
    print(f"OPENAI 1st Rank = {first_ranking}\n") # first ranking from gpt-4
    print(f"GROUND TRUTH = {ground_truth}") # ground truth
    if (first_ranking == ground_truth):
        correct += 1

print(correct / len(queries) * 100)

Weaviate search results {'data': {'Get': {'FAQ_Answers': [{'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.', 'number': '2'}, {'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it ev
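A listwise ranker can return ids that were never retrieved, repeat an id, or omit some candidates (the example output in the prompt lists only two ids). If the full ordering is needed downstream, the model output can be reconciled with the retrieved candidates. The following is a sketch under those assumptions; `reconcile_ranking` is a hypothetical helper, not part of the notebook.

```python
def reconcile_ranking(ranked_ids, candidate_ids):
    """Keep only valid, unique ids in the model's order, then append omitted candidates."""
    candidates = set(candidate_ids)
    seen = set()
    cleaned = []
    for doc_id in ranked_ids:
        # Drop ids the retriever never returned and ids already placed.
        if doc_id in candidates and doc_id not in seen:
            cleaned.append(doc_id)
            seen.add(doc_id)
    # Candidates the model left out keep their original retrieval order.
    cleaned.extend(doc_id for doc_id in candidate_ids if doc_id not in seen)
    return cleaned

# Model output with an unknown id (99) and a duplicate (1); retrieval order was 2, 1, 3.
final_order = reconcile_ranking([1, 99, 2, 1], [2, 1, 3])
print(final_order)
```

Falling back to the retrieval order for omitted candidates means the reranker can only improve on the positions it explicitly ranks.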

##### Prompt 4

In [55]:
def openai_request(prompt):
    response = openai.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[
        {
        "role": "system", # system prompt
        "content": """
        You are a reranking agent. Each Answer has a corresponding Answer id and you're tasked with ranking the questions based on their relevancy to the QUERY.
        The output should ONLY contain the ranked list and no additional comments. An example of your output looks like this [8],[7],[3],[1],[5]!
        Here is an example of the task you will perform:
        ```
        INPUT:
        QUERY: Why would I use Weaviate as my vector database?
        Please rerank these search results.
        [Answer id: 3, Answer: Weaviate uses Docker images as a means to distribute releases and uses Docker Compose to tie a module-rich runtime together. If you are new to those technologies, we recommend reading the Docker Introduction for Weaviate Users.]
        [Answer id: 11, Answer: We use a schema because it focusses on the representation of your data (in our case in the GraphQL API) but you can use a Weaviate schema to express an ontology. One of Weaviate's core features is that it semantically interprets your schema (and with that your ontology) so that you can search for concepts rather than formally defined entities.]
        [Answer id: 14, Answer: Yes, Weaviate supports cursor-based iteration as well as pagination through a result set. To iterate through all objects, you can use the after operator with both REST and GraphQL. For pagination through a result set, you can use the offset and limit operators for GraphQL API calls. Take a look at this page which describes how to use these operators, including tips on performance and limitations.]

        OUTPUT:
        [14],[3] 
        ```
        Please consider all answers first before making your final decision!

        """
        },
        {
            "role": "user", # user prompt
            "content": prompt
        }],
        temperature=1,
        max_tokens=2048,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return response.choices[0].message.content

In [56]:
def parse_ranked_ids(ids):
    elements = ids.split(',')

    first_element = elements[0].strip('[]')

    return int(first_element)


correct = 0
for query_obj in queries:
    query = query_obj["question"]
    ground_truth = int(query_obj["number"])

    results = client.query.raw("""
    {
        Get {
            FAQ_Answers(
                nearText: {
                    concepts: ["%s"]
                },
                limit: 10
            ){
                answer
                number
            }
        }
    }                          
    """ % query)
    print(f"Weaviate search results {results}\n")
    

    reranking_template = f"\nINPUT: \nQUERY: {query}"
    reranking_template += "\nPlease rerank these search results.\n"
    for result in results["data"]["Get"]["FAQ_Answers"]:
        id, answer = result["number"], result["answer"]
        reranking_template += f"[Answer id: {id}, Answer: {answer}]\n"
    print(reranking_template)
    print("\n")
    ranked_ids = openai_request(reranking_template) # send the query along with retrieved results to OpenAI
    print(f"RAW OUTPUT FROM OPENAI = {ranked_ids}\n")
    first_ranking = parse_ranked_ids(ranked_ids)
    print(f"OPENAI 1st Rank = {first_ranking}\n") # first ranking from gpt-4
    print(f"GROUND TRUTH = {ground_truth}") # ground truth
    if (first_ranking == ground_truth):
        correct += 1

print(correct / len(queries) * 100)

Weaviate search results {'data': {'Get': {'FAQ_Answers': [{'answer': 'Other database systems like Elasticsearch rely on inverted indices, which makes search super fast. Weaviate also uses inverted indices to store data and values. But additionally, Weaviate is also a vector-native search database, which means that data is stored as vectors, which enables semantic search. This combination of data storage is unique, and enables fast, filtered and semantic search from end-to-end.', 'number': '2'}, {'answer': 'Our goal is three-folded. Firstly, we want to make it as easy as possible for others to create their own semantic systems or vector search engines (hence, our APIs are GraphQL based). Secondly, we have a strong focus on the semantic element (the knowledge in vector databases, if you will). Our ultimate goal is to have Weaviate help you manage, index, and understand your data so that you can build newer, better, and faster applications. And thirdly, we want you to be able to run it ev
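Each evaluation loop above scores only the first id the model returns. If the whole ranked list is parsed, a metric such as mean reciprocal rank also gives partial credit when the correct answer appears lower in the list. This is a sketch with made-up rankings; the function names are assumptions and no results from the notebook are implied.

```python
def reciprocal_rank(ranked_ids, ground_truth):
    # 1 / position of the correct id, or 0.0 if it is missing from the list.
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == ground_truth:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(rankings, ground_truths):
    scores = [reciprocal_rank(ranking, truth) for ranking, truth in zip(rankings, ground_truths)]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical rankings for three queries whose correct ids are 1, 2 and 9.
example_rankings = [[3, 1, 4], [2, 5], [7, 8]]
example_truths = [1, 2, 9]
mrr = mean_reciprocal_rank(example_rankings, example_truths)
print(mrr)
```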