# Question Answering with Embeddings in Azure Cache for Redis (Enterprise)
*This notebook has been edited from its [original form](https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb) to store and retrieve embeddings with [Azure Cache for Redis](https://azure.microsoft.com/products/cache/).*

Many use cases require [GPT-3](https://openai.com/blog/gpt-3-apps/) to respond to user questions with insightful answers. For example, a customer support chatbot may need to provide answers to common questions. The GPT models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.

In this notebook we will:
- Demonstrate a method for enabling GPT-3 to answer questions using a library of text as a reference
- Use document embeddings for KNN search and retrieval in Redis
- Use a dataset of Wikipedia articles about the 2020 Summer Olympic Games

Refer to [this notebook](https://github.com/openai/openai-cookbook/blob/main/examples/fine-tuned_qa/olympics-1-collect-data.ipynb) to follow the data gathering process.

## Table of Contents

- [0 - Preventing hallucination with prompt engineering](#0---preventing-hallucination-with-prompt-engineering)
- [1 - Preprocess the document library](#1---preprocess-the-document-library)
  - [1.5 - Load the Embeddings into Redis](#15---load-the-embeddings-into-redis)
- [2 - Question and Document Similarity using Redis](#2---question-and-document-similarity-using-redis)
- [3 - Add the most relevant document sections to the query prompt](#3---add-the-most-relevant-document-sections-to-the-query-prompt)
- [4 - Answer the question based on the context](#4---answer-the-question-based-on-the-context)
- [5 - Expanding capability with hybrid search](#5---expanding-capability-with-hybrid-search)

In [None]:
# Install requirements
! pip install -r requirements.txt

### Make sure the following environment variables are set

In [14]:
import os

#os.environ['OPENAI_API_KEY'] = '<your Azure OpenAI API key>'
#os.environ['REDIS_PASSWORD'] = '<your password for Redis>'


### Connect to the Azure OpenAI Service and select the model names

In [15]:
import openai
import pickle
import pandas as pd
import numpy as np
import typing as t

from transformers import GPT2TokenizerFast

# Point the OpenAI SDK to the Azure endpoint
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY") 
openai.api_base = "https://{}.openai.azure.com/".format("<your-resource-name>")
openai.api_version = "2022-12-01"

COMPLETIONS_MODEL = "text-davinci-003"
DOC_EMBEDDINGS_MODEL = "text-search-ada-doc-001"
QUERY_EMBEDDINGS_MODEL = "text-search-ada-query-001"

### Connect to the Redis instance in Azure Cache for Redis
[Redis Stack](https://redis.io/docs/stack/search/) is composed of a number of modules that extend Redis and its capabilities. In particular, [RediSearch](https://redis.io/docs/stack/search/reference/vectors/) enables fast vector search, and is what we will lean on here for this example!

For example, Redis Stack gives you the ability to store documents in JSON (with RedisJSON) AND build search indices over top (with RediSearch 2.6+). These popular modules are only supported on the **Enterprise Tier of Azure Cache for Redis** or the Redis Stack docker container.

Follow the instructions here to [create a Redis database with Stack enabled](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-redis-modules#adding-modules-to-your-cache). 

In [16]:
import os
import redis

from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    NumericField,
    VectorField
)

# Acquire your Redis Host, Port, and Password

# Using Docker Redis
#r = redis.Redis(host="localhost", port=6379) # host=redis if inside same Docker network as the database

# Using ACRE Redis - define connection args
REDIS_PASSWORD = os.getenv("REDIS_PASSWORD") 
REDIS_ENDPOINT =  ""
REDIS_PORT = 10000

# Connect to Redis
r = redis.Redis(
    host=REDIS_ENDPOINT,
    port=REDIS_PORT,
    password=REDIS_PASSWORD,
    ssl=True
) 

By default, GPT-3 isn't an expert on the 2020 Olympics:

In [17]:
prompt = "Who won the 2020 Summer Olympics men's high jump?"

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    engine=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

"Marcelo Chierighini of Brazil won the gold medal in the men's high jump at the 2020 Summer Olympics."

Marcelo is a gold medalist swimmer, and, we assume, not much of a high jumper! **Evidently GPT-3 needs some assistance here.**

The first issue to tackle is that the model is hallucinating an answer rather than telling us "I don't know". This is bad because it makes it hard to trust the answer that the model gives us!

# 0 - Preventing hallucination with prompt engineering

We can address this hallucination issue by being more explicit with our prompt:


In [18]:
prompt = """Answer the question as truthfully as possible, and if you're unsure of the answer, say "Sorry, I don't know".

Q: Who won the 2020 Summer Olympics men's high jump?
A:"""

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    engine=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

"Sorry, I don't know."

To help the model answer the question, we provide extra contextual information in the prompt. When the total required context is short, we can include it in the prompt directly. For example, we can use the following information taken from Wikipedia. We update the initial prompt to tell the model to explicitly make use of the provided text.

In [19]:
prompt = """Answer the question as truthfully as possible using the provided text, and if the answer is not contained within the text below, say "I don't know"

Context:
The men's high jump event at the 2020 Summer Olympics took place between 30 July and 1 August 2021 at the Olympic Stadium.
33 athletes from 24 nations competed; the total possible number depended on how many nations would use universality places 
to enter athletes in addition to the 32 qualifying through mark or ranking (no universality places were used in 2021).
Italian athlete Gianmarco Tamberi along with Qatari athlete Mutaz Essa Barshim emerged as joint winners of the event following
a tie between both of them as they cleared 2.37m. Both Tamberi and Barshim agreed to share the gold medal in a rare instance
where the athletes of different nations had agreed to share the same medal in the history of Olympics. 
Barshim in particular was heard to ask a competition official "Can we have two golds?" in response to being offered a 
'jump off'. Maksim Nedasekau of Belarus took bronze. The medals were the first ever in the men's high jump for Italy and 
Belarus, the first gold in the men's high jump for Italy and Qatar, and the third consecutive medal in the men's high jump
for Qatar (all by Barshim). Barshim became only the second man to earn three medals in high jump, joining Patrik Sjöberg
of Sweden (1984 to 1992).

Q: Who won the 2020 Summer Olympics men's high jump?
A:"""

openai.Completion.create(
    prompt=prompt,
    temperature=0,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    engine=COMPLETIONS_MODEL
)["choices"][0]["text"].strip(" \n")

'Gianmarco Tamberi and Mutaz Essa Barshim emerged as joint winners of the event.'

**Adding extra information into the prompt only works when the dataset of extra content that the model may need to know is small enough to fit in a single prompt**. What do we do when we need the model to choose relevant contextual information from within a large body of information?

**In the remainder of this notebook, we will demonstrate a method for augmenting GPT-3 with a large body of additional contextual information by using document embeddings and retrieval in Azure Cache for Redis.**

This method answers queries in two steps: first it retrieves the information relevant to the query, then it writes an answer tailored to the question based on the retrieved information. The first step uses the [Embedding API](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/embeddings?tabs=python), the second step uses the [Completions API](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/completions).

The steps are:
* Preprocess the contextual information by splitting the docs into chunks and create an embedding vector for each chunk.
* On receiving a query, embed the query in the same vector space as the context chunks and find the context embeddings which are most similar to the query using the **RediSearch** module.
* Prepend the most relevant context chunks to the query prompt.
* Submit the question along with the most relevant context to GPT, and receive an answer which makes use of the provided contextual information.
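
The steps above can be sketched end to end without any external service. This is only a toy illustration: `toy_embed` is a hypothetical stand-in that counts words instead of calling an embedding model, the similarity is a plain dot product, and the final call to a completion model is left out.

```python
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    words = (w.strip(".,?!\"").lower() for w in text.split())
    return Counter(w for w in words if w)

def similarity(a: Counter, b: Counter) -> int:
    """Dot product of two word-count vectors."""
    return sum(a[w] * b[w] for w in a)

def retrieve(question: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks whose toy embeddings best match the question."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: similarity(q, toy_embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list) -> str:
    """Prepend the retrieved chunks to the question."""
    header = 'Answer using the context below, or say "I don\'t know".\n\nContext:\n'
    return header + "".join(f"\n* {c}" for c in context) + f"\n\nQ: {question}\nA:"

chunks = [
    "Tamberi and Barshim shared the gold medal in the men's high jump.",
    "The women's triple jump was won by Yulimar Rojas of Venezuela.",
]
question = "Who won the men's high jump?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the notebook itself, the embedding step is handled by the Azure OpenAI Embeddings API and the ranking step by RediSearch.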

# 1 - Preprocess the document library

We plan to use document embeddings to fetch the most relevant part or parts of our document library and insert them into the prompt that we provide to GPT-3. We therefore need to break up the document library into "sections" of context, which can be searched and retrieved separately. 

Sections should be large enough to contain enough information to answer a question; but small enough to fit one or several into the GPT-3 prompt. We find that approximately a paragraph of text is usually a good length, but you should experiment for your particular use case. In this example, Wikipedia articles are already grouped into semantically related headers, so we will use these to define our sections. This preprocessing has already been done in [this notebook](https://github.com/openai/openai-cookbook/blob/main/examples/fine-tuned_qa/olympics-1-collect-data.ipynb), so we will load the results and use them.
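
If your own documents do not come with headings, one possible way to get paragraph-sized sections is to merge consecutive paragraphs until a size budget is reached. The sketch below is an assumption for illustration, not the preprocessing used for this dataset: `split_into_sections` and `max_words` are hypothetical names, and the word count is only a rough proxy for tokens.

```python
def split_into_sections(text: str, max_words: int = 120) -> list:
    """Merge blank-line-separated paragraphs into sections of at most max_words words.

    A single paragraph longer than max_words is kept whole as its own section.
    """
    sections, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        # Start a new section when adding this paragraph would exceed the budget
        if current and count + n > max_words:
            sections.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        sections.append("\n\n".join(current))
    return sections

doc = "one two three\n\nfour five\n\nsix seven eight nine"
sections = split_into_sections(doc, max_words=5)
print(sections)
```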

In [20]:
# We have hosted the processed dataset, so you can download it directly without having to recreate it.
# This dataset has already been split into sections, one row for each section of the Wikipedia page.

df = pd.read_csv('https://cdn.openai.com/API/examples/data/olympics_sections_text.csv')
print(f"{len(df)} rows in the data.")
df.sample(5)

3964 rows in the data.


| Unnamed: 0 | title | heading | content | tokens |
|---|---|---|---|---|
| 3784 | Serbia at the 2020 Summer Olympics | Table tennis | Serbia entered three athletes into the table t... | 62 |
| 2681 | Field hockey at the 2020 Summer Olympics – Wom... | Table | ^1 – Japan qualified both as the hosts and th... | 42 |
| 2041 | Miraitowa and Someity | Characteristics | Miraitowa, the Olympic mascot, is a figure wit... | 420 |
| 419 | Rowing at the 2020 Summer Olympics – Men's cox... | Summary | The men's coxless four event at the 2020 Summe... | 79 |
| 2889 | Argentina at the 2020 Summer Olympics | Athletics | Argentine athletes achieved the entry standard... | 48 |


We preprocess the document sections by creating an embedding vector for each section. An embedding is a vector of numbers that helps us understand how semantically similar or different the texts are. The closer two embeddings are to each other, the more similar are their contents. See the [documentation on Azure OpenAI embeddings](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/embeddings?tabs=python) for more information.
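
To make "closer" concrete, here is a small sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors are made up for illustration; real embeddings from the models used here have 1024 entries.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: two texts about jumping events, one about table tennis
high_jump = [0.9, 0.1, 0.0]
long_jump = [0.8, 0.2, 0.1]
table_tennis = [0.0, 0.2, 0.9]

sim_near = cosine_similarity(high_jump, long_jump)
sim_far = cosine_similarity(high_jump, table_tennis)
print(f"high jump vs long jump:    {sim_near:.3f}")
print(f"high jump vs table tennis: {sim_far:.3f}")
```

The related texts score close to 1, while the unrelated pair scores close to 0.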

This indexing stage can be executed offline and only runs once to precompute the indexes for the dataset so that each piece of content can be retrieved later. In this example, we use **RediSearch** to power the vector search in Azure Cache for Redis.

For the purposes of this tutorial we chose to use [ada embeddings](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#embeddings-models) for embedding creation (for both docs and queries), which are inexpensive and still perform well.

In [21]:
import time

def get_embedding(text: str, model: str) -> t.List[float]:
    """
    Fetch embedding given input text from OpenAI.

    Args:
        text (str): Text input for which to create embedding.
        model (str): OpenAI model to use for embedding creation.

    Returns:
        t.List[float]: OpenAI Embedding.
    """
    result = openai.Embedding.create(
      engine=model,
      input=text
    )
    return result["data"][0]["embedding"]

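def get_doc_embedding(text: str) -> t.List[float]:
    """Embed a document section, retrying when the API rate limit is hit."""
    max_retries = 5

    for attempt in range(max_retries):
        try:
            return get_embedding(text, DOC_EMBEDDINGS_MODEL)
        except openai.error.RateLimitError:
            # Re-raise once the retries are used up instead of returning nothing
            if attempt == max_retries - 1:
                raise
            # Otherwise wait before trying again
            time.sleep(12)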

def get_query_embedding(text: str) -> t.List[float]:
    return get_embedding(text, QUERY_EMBEDDINGS_MODEL)

def compute_doc_embeddings(df: pd.DataFrame) -> t.Dict[int, t.List[float]]:
    """
    Create an embedding for each row in the dataframe using the OpenAI Embeddings API.

    Return a dictionary that maps each row index to the embedding vector of that row's content.
    """
    embeddings = {}
    for idx, row in df.iterrows():
        # Replace newlines with spaces so that adjacent words are not fused together
        embedding = get_doc_embedding(row["content"].replace("\n", " "))
        embeddings[idx] = embedding

        if idx % 100 == 0:
            print(f"Processed {idx} rows of {len(df)}.")
    return embeddings
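
`get_doc_embedding` above waits a fixed 12 seconds between retries. A common alternative, shown here only as a sketch and not as what this notebook does, is exponential backoff, where the wait doubles after every failed attempt. `call_with_backoff` and its parameters are hypothetical names; the `sleep` argument is injectable so the example runs instantly.

```python
import time
from typing import Callable, List

def call_with_backoff(
    fn: Callable[[], object],
    retry_on: type = Exception,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call fn(), retrying on retry_on with waits of base_delay * 1, 2, 4, ..."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            # Give up and re-raise after the final attempt
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Simulated flaky call: fails twice, then returns a fake embedding
calls = {"n": 0}

def flaky() -> List[float]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return [0.1, 0.2]

delays: List[float] = []
result = call_with_backoff(flaky, retry_on=RuntimeError, sleep=delays.append)
print(result, delays)
```

In this notebook the wrapped call would be `lambda: get_embedding(text, DOC_EMBEDDINGS_MODEL)` with `retry_on=openai.error.RateLimitError`.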

Compute the embeddings for the document sections. This will take several minutes.

In [None]:
# computing the embeddings will take several minutes
document_embeddings = compute_doc_embeddings(df)

In [23]:
# An example embedding:
example_entry = list(document_embeddings.items())[0]
print(f"{example_entry[0]} : {example_entry[1][:5]}... ({len(example_entry[1])} entries)")

0 : [-0.00617007864639163, 0.01927444338798523, -0.017017723992466927, 0.05506397783756256, -0.004782708361744881]... (1024 entries)


So we split our document library into sections, and encoded them by creating embedding vectors that represent each chunk. Next we will use these embeddings to answer our users' questions.

<img align="right" src="../images/redis.svg" style="width: 5%; margin-right: 10%">

## 1.5 - Load the Embeddings into Redis

[Redis Stack](https://redis.io/docs/stack/search/) is composed of a number of modules that extend Redis and its capabilities.

Follow the instructions here to [create a Redis database with Stack enabled](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-redis-modules#adding-modules-to-your-cache). [RediSearch](https://redis.io/docs/stack/search/reference/vectors/) enables fast vector search, and is what we will lean on here for this example!

### Define some helper functions to use with Redis
You'll need to tweak these functions if your schema changes.

Redis supports both FLAT (brute force) and HNSW (approximate) vector search. In this example, we're creating a FLAT index, but this can easily be changed to [HNSW](https://redis.io/docs/stack/search/reference/vectors/#hnsw) for faster search at the cost of somewhat lower recall (accuracy). See the docs at the link to discover the parameters that you can use to tune this capability.
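
As a rough sketch of what switching to HNSW could look like, the attribute dictionary below adds the HNSW tuning parameters `M`, `EF_CONSTRUCTION`, and `EF_RUNTIME`. The values are illustrative assumptions, not tuned recommendations, and the dimension is hard-coded here only to keep the snippet self-contained.

```python
# Hypothetical HNSW settings; values are illustrative, not tuned for this dataset
EXAMPLE_VECTOR_DIM = 1024  # size of the embeddings produced in this notebook

hnsw_attributes = {
    "TYPE": "FLOAT32",
    "DIM": EXAMPLE_VECTOR_DIM,
    "DISTANCE_METRIC": "COSINE",
    "M": 16,                 # max outgoing edges per graph node
    "EF_CONSTRUCTION": 200,  # candidates examined while building the graph
    "EF_RUNTIME": 10,        # candidates examined per query
}

# With redis-py, this dictionary would be passed in place of the FLAT definition:
# embedding = VectorField("embedding", "HNSW", hnsw_attributes)
print(hnsw_attributes)
```

Larger `M` and `EF_*` values generally trade memory and latency for better recall.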


In [24]:
# Constants
VECTOR_DIM = len(example_entry[1])
NUM_VECTORS = len(document_embeddings)
INDEX_NAME = "embeddings-index"
PREFIX = "embedding"
DISTANCE_METRIC = "COSINE" # Redis reports cosine distance (1 - cosine similarity)


# Helper Functions
def create_index(
    index_name: str,
    prefix: str,
    fields: list
):
    """
    Create the RediSearch index. Indices can be used for standard and hybrid
    style searches combining multiple fields. See available field types:
    https://github.com/redis/redis-py/blob/master/redis/commands/search/field.py

    Args:
        index_name (str): Name of the RediSearch index to create.
        prefix (str): Key prefix that assumes membership to the RediSearch index.
        fields (list): List of RediSearch fields to include in the index.
    """
    # Create RediSearch Index
    return r.ft(index_name).create_index(
        fields = fields,
        definition = IndexDefinition(prefix=[prefix], index_type=IndexType.HASH)
    )

def delete_index(index_name: str = INDEX_NAME, drop_docs: bool = False):
    """
    Delete the RediSearch index.

    Args:
        index_name (str, optional): Name of the RediSearch index to delete. Defaults to INDEX_NAME.
        drop_docs (bool, optional): Drop all index documents? Defaults to False.
    """
    return r.ft(index_name).dropindex(delete_documents=drop_docs)

def index_documents(prefix: str, embeddings_lookup: dict, documents: list):
    """
    Index a list of documents in RediSearch.

    Args:
        prefix (str): RediSearch prefix on keys associated with the index.
        embeddings_lookup (dict): Doc embedding lookup dict.
        documents (list): List of docs to set in the index.
    """
    # Iterate through documents and store in Redis
    # NOTE: use async Redis client for even better throughput
    pipe = r.pipeline(transaction=False)
    for i, doc in enumerate(documents):
        key = f"{prefix}:{i}"
        embedding = embeddings_lookup[i]
        doc["embedding"] = np.array(embedding, dtype=np.float32).tobytes()
        pipe.hset(key, mapping = doc)
        if i % 150 == 0:
            pipe.execute()
    pipe.execute()

def fetch_embedding(key: str) -> np.array:
    """
    Fetch embedding from Redis.

    Args:
        key (str): Redis key for which to fetch the embedding.

    Returns:
        np.array: Numpy Array of the embedding associated with the key.
    """
    embedding = r.hget(key, "embedding")
    return np.frombuffer(embedding, dtype=np.float32)

def search_redis(
    query_vector: np.array,
    return_fields: list = [],
    k: int = 5,
    index_name: str = INDEX_NAME,
    pre_filter: str = None
) -> t.List[dict]:
    """
    Perform KNN search in Redis.

    Args:
        query_vector (np.array): Numpy array of the embedding vector to use in the search.
        return_fields (list, optional): Fields to include in the response. Defaults to [].
        k (int, optional): Count of nearest neighbors to return. Defaults to 5.
        index_name (str, optional): Name of the RediSearch index. Defaults to INDEX_NAME.
        pre_filter (str, optional): Pre filter to constrain the KNN search with conditions. Defaults to None, which is treated as "*" (all).

    Returns:
        list<dict>: List of most similar documents.
    """
    def process_doc(doc) -> dict:
        d = doc.__dict__
        if "vector_score" in d:
            d["vector_score"] = 1 - float(d["vector_score"])
        return d
    # Prepare the Query
    if not pre_filter:
        pre_filter = "*"
    base_query = f'{pre_filter}=>[KNN {k} @embedding $vector AS vector_score]'
    query = (
        Query(base_query)
         .sort_by("vector_score")
         .paging(0, k)
         .return_fields(*return_fields)
         .dialect(2)
    )
    params_dict = {"vector": query_vector.astype(dtype=np.float32).tobytes()}
    # Vector Search in Redis
    results = r.ft(index_name).search(query, params_dict)
    return [process_doc(doc) for doc in results.docs]
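
The helpers above store each embedding in a Redis hash as raw 32-bit float bytes and decode those bytes when reading it back. The sketch below shows the same round trip using only the standard library `struct` module instead of NumPy. It assumes little-endian byte order, which matches what NumPy produces on typical x86 and ARM machines; the function names are hypothetical.

```python
import struct

def to_float32_bytes(vector):
    """Pack a list of floats into little-endian float32 bytes (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)

def from_float32_bytes(blob):
    """Unpack little-endian float32 bytes back into a list of floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

vec = [0.5, -1.25, 2.0]  # values chosen to be exactly representable as float32
blob = to_float32_bytes(vec)
restored = from_float32_bytes(blob)
print(len(blob), restored)
```

Because float32 keeps roughly seven significant digits, values that are not exactly representable come back slightly rounded, which is why the index schema declares `TYPE` as `FLOAT32` to match.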

### Create the index in Redis and load the data

If there is not a Redis Vector Index already created, we will create one here. Then we will test loading embeddings from Redis and performing simple KNN-style searches.

The [fields](https://github.com/redis/redis-py/blob/master/redis/commands/search/field.py) defined in the RediSearch schema below enable different kinds of search, including [Hybrid KNN queries](https://redis.io/docs/stack/search/reference/vectors/#hybrid-knn-queries). We will demonstrate some examples of this later on.

The `index_documents` helper already batches writes with a Redis pipeline. For a more performant, but slightly more complicated, way of indexing large quantities of documents, check out an [async HSET implementation in this example](https://github.com/RedisVentures/redis-arXiv-search/blob/main/backend/vecsim_app/load_data.py#L19).

In [24]:
# Define Schema
title = TextField(name="title")
heading = TextField(name="heading")
content = TextField(name="content")
tokens = NumericField(name="tokens")
embedding = VectorField("embedding", 
                        "FLAT", 
                        { "TYPE": "FLOAT32", 
                          "DIM": VECTOR_DIM, 
                          "DISTANCE_METRIC": DISTANCE_METRIC,
                          "INITIAL_CAP": NUM_VECTORS
                        })
fields = [title, heading, content, tokens, embedding]

# delete index if needed
# delete_index()

try:
    # Check if index exists
    r.ft(INDEX_NAME).info()
    print("Index Exists")
except redis.exceptions.ResponseError:
    print("Index Does Not Exist")

    # Create index
    print("Creating Index")
    create_index(
        index_name = INDEX_NAME,
        prefix = PREFIX,
        fields = fields
    )

    # Index documents - this may take a few minutes
    docs = df.to_dict("records")
    print(f"Indexing {len(docs)} Documents")
    index_documents(
        prefix = PREFIX,
        embeddings_lookup = document_embeddings,
        documents = docs
    )

    print("Redis Vector Index Created!")

Index Does Not Exist
Creating Index
Indexing 3964 Documents
Redis Vector Index Created!


In [25]:
# Fetch a test embedding vector

# Grab an arbitrary Redis key that belongs to the index
key = r.keys(f"{PREFIX}:*")[0]

# Fetch the embedding vector
test_embedding = fetch_embedding(key)
print(test_embedding)
print("Shape:", test_embedding.shape)

[-0.01466752 -0.01099539  0.02891537 ... -0.03766552  0.01990292
  0.01233834]
Shape: (1024,)


In [26]:
# Test performing a single search with our test embedding vector
res = search_redis(
    query_vector = test_embedding,
    return_fields = ["id", "title", "vector_score", "content"]
)

# Print out results
print("Input Document\n", res[0])
print("\nSimilar Documents")
for doc in res[1:]:
    print(doc, "\n")

Input Document
 {'id': 'embedding:930', 'payload': None, 'vector_score': 1.0000001192092896, 'title': "Cycling at the 2020 Summer Olympics – Men's Madison", 'content': 'A madison race is a tag team points race that involves all 16 teams competing at once. One cyclist from each team competes at a time; the two team members can swap at any time by touching (including pushing and handslinging). The distance is 200 laps (50 km). Teams score points in two ways: lapping the field and sprints. A team that gains a lap on the field earns 20 points; one that loses a lap has 20 points deducted. Every 10th lap is a sprint, with the first to finish the lap earning 5 points, second 3 points, third 2 points, and fourth 1 point. The points values are doubled for the final sprint. There is only one round of competition.'}

Similar Documents
{'id': 'embedding:917', 'payload': None, 'vector_score': 0.99694395065308, 'title': "Cycling at the 2020 Summer Olympics – Women's Madison", 'content': 'A madison r

# 2 - Question and Document Similarity using Redis

At the time of question-answering, to answer the user's query we compute the query embedding of the question and use it to find the most similar document sections. This notebook has been updated to use RediSearch to do the vector similarity search.

In [27]:
def find_similar_sections(query: str, n: int = 5, pre_filter: str = None):
    # Embed query with OpenAI
    query_embedding = get_query_embedding(query)
    query_embedding = np.asarray(query_embedding, dtype=np.float64)
    # Perform KNN search in Redis
    similar_sections = search_redis(
        query_vector = query_embedding,
        k = n,
        return_fields = ["title", "heading", "content", "tokens", "vector_score"],
        pre_filter = pre_filter
    )
    return similar_sections

The output scores below represent the similarity between the query and the document. The higher the score, the more similar the document is to the query.
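
With the `COSINE` metric, Redis reports a distance (0 for identical directions), and `search_redis` turns it into a similarity with `1 - distance`. The sketch below illustrates that conversion with made-up distances; `distance_to_similarity` is a hypothetical helper name.

```python
def distance_to_similarity(cosine_distance: float) -> float:
    """Convert a cosine distance into a cosine similarity."""
    return 1.0 - cosine_distance

# Made-up distances, in the ascending order a KNN query sorted by distance returns
raw_distances = [0.05, 0.31, 0.60]
similarities = [distance_to_similarity(d) for d in raw_distances]

for d, s in zip(raw_distances, similarities):
    print(f"distance {d:.2f} -> similarity {s:.2f}")
```

Sorting by ascending distance is therefore the same as sorting by descending similarity.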

In [28]:
query = "Who won the high jump?"

find_similar_sections(query)

[{'id': 'embedding:236',
  'payload': None,
  'vector_score': 0.398154139519,
  'title': "Athletics at the 2020 Summer Olympics – Men's high jump",
  'heading': 'Summary',
  'content': 'The men\'s high jump event at the 2020 Summer Olympics took place between 30 July and 1 August 2021 at the Olympic Stadium. 33 athletes from 24 nations competed; the total possible number depended on how many nations would use universality places to enter athletes in addition to the 32 qualifying through mark or ranking (no universality places were used in 2021). Italian athlete Gianmarco Tamberi along with Qatari athlete Mutaz Essa Barshim emerged as joint winners of the event following a tie between both of them as they cleared 2.37m. Both Tamberi and Barshim agreed to share the gold medal in a rare instance where the athletes of different nations had agreed to share the same medal in the history of Olympics. Barshim in particular was heard to ask a competition official "Can we have two golds?" in res

# 3 - Add the most relevant document sections to the query prompt

Once we've calculated the most relevant pieces of context, we construct a prompt by simply prepending them to the supplied query. It is helpful to use a query separator to help the model distinguish between separate pieces of text.

In [29]:
MAX_SECTION_LEN = 500
SEPARATOR = "\n* "

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
SEPARATOR_LEN = len(tokenizer.tokenize(SEPARATOR))

print(f"Context separator contains {SEPARATOR_LEN} tokens")

Context separator contains 3 tokens


In [30]:
PROMPT_HEADER = """Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."\n\nContext:\n"""

def construct_prompt(question: str, pre_filter: str = None) -> str:
    """
    Construct full prompt based on the input question using
    the document sections indexed in Redis.

    Args:
        question (str): User input question.
        pre_filter (str, optional): Pre filter to constrain the KNN search with conditions.

    Returns:
        str: Full prompt string to pass along to a generative language model.
    """
    chosen_sections = []
    chosen_sections_len = 0
    chosen_sections_indexes = []

    # Search for relevant document sections based on the question
    most_relevant_document_sections = find_similar_sections(
        question,
        n = 5,
        pre_filter = pre_filter
    )

    # Iterate through results
    for document_section in most_relevant_document_sections:
        # Add contexts until we run out of token space
        chosen_sections_len += int(document_section['tokens']) + SEPARATOR_LEN
        if chosen_sections_len > MAX_SECTION_LEN:
            break

        chosen_sections.append(SEPARATOR + document_section['content'].replace("\n", " "))
        chosen_sections_indexes.append(document_section['id'])

    # Useful diagnostic information
    print(f"Selected {len(chosen_sections)} document sections:")
    print("\n".join(chosen_sections_indexes))

    return PROMPT_HEADER + "".join(chosen_sections) + "\n\n Q: " + question + "\n A:"
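
The section-selection loop inside `construct_prompt` can be looked at in isolation. The sketch below reproduces the same greedy rule on made-up token counts: sections are taken in ranked order, and selection stops at the first section that would push the running total over the budget. `select_sections` is a hypothetical name.

```python
def select_sections(sections, max_len=500, separator_len=3):
    """Greedily keep ranked (text, tokens) sections until the token budget is hit."""
    chosen, used = [], 0
    for text, tokens in sections:
        used += tokens + separator_len
        if used > max_len:
            break  # stop at the first overflow, even if a later section would fit
        chosen.append(text)
    return chosen

# Made-up sections, ordered from most to least similar
ranked = [
    ("high jump summary", 326),
    ("high jump final", 150),
    ("qualification", 90),
]
selected = select_sections(ranked)
print(selected)
```

Stopping at the first overflow keeps the most relevant sections together. Skipping the oversized section and continuing would fit more text but could pull in weaker matches.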

In [31]:
# Test an example
prompt = construct_prompt("Who won the men's high jump?")

print("===\n", prompt)

Selected 2 document sections:
embedding:236
embedding:284
===
 Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:

* The men's high jump event at the 2020 Summer Olympics took place between 30 July and 1 August 2021 at the Olympic Stadium. 33 athletes from 24 nations competed; the total possible number depended on how many nations would use universality places to enter athletes in addition to the 32 qualifying through mark or ranking (no universality places were used in 2021). Italian athlete Gianmarco Tamberi along with Qatari athlete Mutaz Essa Barshim emerged as joint winners of the event following a tie between both of them as they cleared 2.37m. Both Tamberi and Barshim agreed to share the gold medal in a rare instance where the athletes of different nations had agreed to share the same medal in the history of Olympics. Barshim in particular was heard to ask a competition

We have now obtained the document sections that are most relevant to the question. As a final step, let's put it all together to get an answer to the question.

# 4 - Answer the question based on the context

Now that we've retrieved the relevant context and constructed our prompt, we can finally use the Completions API to answer the user's query.

In [32]:
COMPLETIONS_API_PARAMS = {
    # We use temperature of 0.0 because it gives the most predictable, factual answer.
    "temperature": 0.0,
    "max_tokens": 300,
    "engine": COMPLETIONS_MODEL,
}

In [33]:
def answer_question_with_context(
    question: str,
    show_prompt: bool = False,
    pre_filter: str = None
) -> str:
    """
    Answer the question.

    Args:
        question (str): User input question.
        show_prompt (bool, optional): Print out the prompt or not. Defaults to False.
        pre_filter (str, optional): Pre filter to constrain the KNN search with conditions.

    Returns:
        str: Response to the question.
    """
    # Construct prompt with Redis Vector Search
    prompt = construct_prompt(question, pre_filter=pre_filter)

    if show_prompt:
        print(prompt)

    response = openai.Completion.create(
        prompt=prompt,
        **COMPLETIONS_API_PARAMS
    )

    return response["choices"][0]["text"].strip(" \n")

In [34]:
answer_question_with_context("Who won the 2020 Summer Olympics men's high jump?")

Selected 2 document sections:
embedding:236
embedding:222


'Gianmarco Tamberi and Mutaz Essa Barshim emerged as joint winners of the event.'

# 5 - Expanding capability with hybrid search

Hybrid search lets you combine pre-filters with the vector similarity search (KNN) operation. This is useful when the search must honor additional business rules or constraints.

For example, you could search for relevant documents that were published within a specific calendar year, or for documents that have a particular word in the title. This flexibility is by design and lets you adapt the KNN operation to different use cases.
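
As a sketch of how such filters combine with the KNN clause, the snippet below builds query strings in the same shape that `search_redis` uses. The numeric range on `tokens` is an assumed example that relies on `tokens` being indexed as a numeric field, as in the schema above; `knn_query` is a hypothetical helper name.

```python
def knn_query(k: int, pre_filter: str = "*") -> str:
    """Build a RediSearch hybrid query: filter first, then KNN over the matches."""
    return f"{pre_filter}=>[KNN {k} @embedding $vector AS vector_score]"

title_filter = "@title:Women"        # full-text match on the title field
short_sections = "@tokens:[0 100]"   # numeric range: at most 100 tokens

# Parentheses group several conditions into one pre-filter (implicit AND)
combined = f"({title_filter} {short_sections})"

print(knn_query(5))
print(knn_query(5, title_filter))
print(knn_query(3, combined))
```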

Here's an example below where we filter for context documents with `Women` explicitly in the title.

In [35]:
pre_filter = "@title:Women"

answer_question_with_context("Who won the long jump at the 2020 Summer Olympics?", pre_filter=pre_filter, show_prompt=True)

Selected 5 document sections:
embedding:276
embedding:338
embedding:300
embedding:254
embedding:281
Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

Context:

* The women's long jump event at the 2020 Summer Olympics took place on 1 and 3 August 2021 at the Japan National Stadium. 30 athletes from 23 nations competed. Germany's 2019 world champion Malaika Mihambo moved up from third  to first with her final round jump of 7.00 metres, to win the gold medal. 2012 Olympic champion Brittney Reese of the USA won the silver and Nigeria's Ese Brume the bronze.
* The women's triple jump event at the 2020 Summer Olympics took place between 30 July and 1 August 2021 at the Japan National Stadium.The event was won by Yulimar Rojas of Venezuela: Her winning jump of 15.67 meters also broke the 26-year-old world record.
* The women's 400 metres event at the 2020 Summer Olympics took place from 3 t

"Germany's Malaika Mihambo won the long jump at the 2020 Summer Olympics."

## Conclusion
By combining the Embeddings and Completions APIs, we have created a question-answering model which can answer questions using a large base of additional knowledge stored in Redis. It also understands when it doesn't know the answer!

For this example we used a dataset of Wikipedia articles, but that dataset could be replaced with books, articles, documentation, service manuals, or much much more. **We can't wait to see what you create with GPT-3 models!**

# More Examples

Let's have some fun and try some more examples.

In [36]:
query = "Why was the 2020 Summer Olympics originally postponed?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 1 document sections:
embedding:1500

Q: Why was the 2020 Summer Olympics originally postponed?
A: The 2020 Summer Olympics were originally postponed due to the COVID-19 pandemic.


In [37]:
query = "In the 2020 Summer Olympics, how many gold medals did the country which won the most medals win?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 2 document sections:
embedding:1722
embedding:254

Q: In the 2020 Summer Olympics, how many gold medals did the country which won the most medals win?
A: The United States won the most gold medals in the 2020 Summer Olympics, with 46.


In [38]:
query = "What was unusual about the men’s shotput competition?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 2 document sections:
embedding:242
embedding:240

Q: What was unusual about the men’s shotput competition?
A: For the first time in Olympic history, the same three competitors received the same medals in back-to-back editions of an the same individual event.


In [39]:
query = "In the 2020 Summer Olympics, how many silver medals did Italy win?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 3 document sections:
embedding:1876
embedding:1178
embedding:1908

Q: In the 2020 Summer Olympics, how many silver medals did Italy win?
A: 10 silver medals.


Our Q&A model is less prone to hallucinating answers, and has a better sense of what it does or doesn't know. It declines to answer when the information isn't contained in the context, when the question is nonsensical, or when the question is theoretically answerable but beyond GPT-3's powers!

In [40]:
query = "What is the total number of medals won by France, multiplied by the number of Taekwondo medals given out to all countries?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 4 document sections:
embedding:3047
embedding:2686
embedding:3846
embedding:3318

Q: What is the total number of medals won by France, multiplied by the number of Taekwondo medals given out to all countries?
A: I don't know.


In [41]:
query = "What is the tallest mountain in the world?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 5 document sections:
embedding:512
embedding:2653
embedding:3855
embedding:2483
embedding:1979

Q: What is the tallest mountain in the world?
A: I don't know.


In [42]:
query = "Who won the grimblesplatch competition at the 2020 Summer Olympic games?"
answer = answer_question_with_context(query)

print(f"\nQ: {query}\nA: {answer}")

Selected 4 document sections:
embedding:462
embedding:1421
embedding:451
embedding:603

Q: Who won the grimblesplatch competition at the 2020 Summer Olympic games?
A: I don't know.
