# Step-back prompting in Retrieval Augmented Generation

Step-back prompting is a technique that asks an LLM to first abstract the original question into a more general one. It was introduced by researchers at Google DeepMind, who reported improved performance on several reasoning benchmarks.

![](step-back-prompting.png)

Step-back prompting can also be combined with RAG. Let's implement it in LangChain, using Cohere embeddings, an OpenAI LLM, and a Qdrant vector store.

In [None]:
!pip install qdrant-client langchain datasets cohere openai

## Dataset indexing

We are going to use the [mugithi/ubuntu_question_answer](https://huggingface.co/datasets/mugithi/ubuntu_question_answer) dataset, which is a set of questions and corresponding answers related to Ubuntu. It is going to act as our knowledge base, so we need to index it into a vector store. Let's download it first.

In [None]:
from datasets import load_dataset

dataset = load_dataset("mugithi/ubuntu_question_answer")
dataset

In [None]:
import pandas as pd

train_df = pd.DataFrame(dataset["train"])
train_df.head(n=10)

We are going to create embeddings from each question combined with its answer. Both will also be stored separately in the document metadata, so we can use them later if needed. Let's create a template for the text, and then process the dataset to end up with a list of texts and a list of corresponding metadata dictionaries.

In [None]:
text_pattern = """
Example question: {question}
Example answer: {answer}
"""

texts, metadatas = [], []
for entry in train_df.itertuples():
    text = text_pattern.format(question=entry.question, answer=entry.answer)

    texts.append(text.strip())
    metadatas.append({"question": entry.question, "answer": entry.answer})

Our dataset is ready, so we can index it into a vector store with the selected embedding model. In our case, we are going to use Qdrant and multilingual Cohere embeddings, so we can ask questions in multiple languages later on.

In [None]:
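# Assumption: the notebook runs in Google Colab, where secrets are read via `userdata`
from google.colab import userdata
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.vectorstores import Qdrant

# The Cohere API key is expected in the COHERE_API_KEY environment variable
embeddings = CohereEmbeddings(model="embed-multilingual-v3.0")

# Embed all the texts and upload them, with metadata, to the "facts" collection
facts_store = Qdrant.from_texts(
    texts, embeddings, metadatas,
    location=userdata.get("QDRANT_URL"),
    api_key=userdata.get("QDRANT_API_KEY"),
    collection_name="facts",
    force_recreate=True,
)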

Our knowledge base is now built, so we can start querying it. Let's try it out.

In [None]:
facts_store.similarity_search("How do I format the disk?")
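
Conceptually, a similarity search embeds the query and returns the stored texts whose vectors are closest to it. The sketch below illustrates the idea in plain Python, assuming cosine similarity as the metric. The function names and the tiny 3-dimensional vectors are hypothetical; real embeddings have many more dimensions, and Qdrant uses an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical toy "embeddings", made up for illustration only
toy_index = [
    ("How to format a USB drive", [0.9, 0.1, 0.0]),
    ("How to change the wallpaper", [0.0, 0.2, 0.9]),
    ("How to partition a disk", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

print(top_k(toy_query, toy_index))
```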

## Step-back prompting

Step-back prompting builds on few-shot prompting. We prime the LLM with a made-up interaction history, so that it responds to the user's question with a more abstract version of it. For that, we need a set of examples, each pairing an original question with its step-back question.

In [None]:
# All examples come from the original paper on step-back prompting
# Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
# See: https://arxiv.org/abs/2310.06117

examples = [
    {
        "input": "Estella Leopold went to which school between Aug 1954 and Nov 1954?",
        "output": "What was Estella Leopold's history?",
    },
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
    {
        "input": "At year saw the creation of the region where the county of Hertfordshire is located?",
        "output": "which region is the county of Hertfordshire located?"
    },
]

These examples are going to be used in the prompts we send to the LLM. Let's create a prompt template for that. Its goal is to produce the step-back question, given the original question.

In [None]:
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

single_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=single_prompt,
    examples=examples,
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are simplifying the user questions, so they are more general and easier to answer. Use the following examples:"),
    few_shot_prompt,
    ("user", "{input}"),
])
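
To make the "made-up history" more concrete, here is a plain-Python sketch of the message sequence such a few-shot template roughly expands to: a system message, one human/AI pair per example, and finally the real question. The helper `build_step_back_messages` is hypothetical and only mimics the structure; LangChain produces its own message objects rather than tuples.

```python
def build_step_back_messages(system_text, examples, user_question):
    """Expand few-shot examples into a fake chat history ending with the real question."""
    messages = [("system", system_text)]
    for example in examples:
        # Each example becomes a pretended human turn and a pretended AI reply
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    # The actual user question goes last, so the model continues the pattern
    messages.append(("human", user_question))
    return messages


demo_examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of The Police do?",
    },
]
demo_messages = build_step_back_messages(
    "You are simplifying the user questions, so they are more general and easier to answer.",
    demo_examples,
    "How do I format the disk?",
)

for role, content in demo_messages:
    print(f"{role}: {content}")
```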


The prompt template is now ready to be passed to the LLM. Let's create a runnable pipeline for that and then launch it on the same question as before.

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

chat_model = ChatOpenAI(temperature=0)
question_generator = prompt | chat_model | StrOutputParser()

In [None]:
question_generator.invoke({"input": "How do I format the disk?"})

## Step-back Retrieval Augmented Generation

Step-back prompting is just another prompt engineering strategy, so it can also be integrated into RAG. This effectively results in two contexts attached to each prompt: one retrieved for the original question and one retrieved for the step-back question. Let's build another prompt template that is parametrized with the original question, the context and the step-back context.

In [None]:
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

facts_retriever = facts_store.as_retriever()

rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the provided context and step-back context. Do not make up the answer if it's not given, but answer "I don't know".
Context, step-back context and question are enclosed with HTML-like tags.

<context>
{context}
</context>

<step-back-context>
{step_back_context}
</step-back-context>

<question>{input}</question>
""")

extract_input = RunnableLambda(lambda x: x["input"])
step_back_rag = (
    {
        "context": extract_input | facts_retriever,
        "step_back_context": question_generator | facts_retriever,
        "input": extract_input
    }
    | rag_prompt
    | chat_model
    | StrOutputParser()
)
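
The dictionary at the start of the chain acts as a set of parallel branches: each one receives the same input and the results are collected under the corresponding key. A minimal sketch of that data flow, with hypothetical stand-ins (`fake_question_generator`, `fake_retriever`, `run_branches`) in place of the LLM, the vector store and LangChain itself, could look like this:

```python
def fake_question_generator(payload):
    # Stand-in for the LLM call; a real model would generate the abstraction
    return "What are the ways to manage disks?"


def fake_retriever(query):
    # Stand-in for the vector store lookup
    return [f"document matching '{query}'"]


def run_branches(branches, payload):
    """Call every branch with the same payload and collect the results by key."""
    return {name: branch(payload) for name, branch in branches.items()}


branches = {
    # Retrieval for the original question
    "context": lambda p: fake_retriever(p["input"]),
    # Retrieval for the generated step-back question
    "step_back_context": lambda p: fake_retriever(fake_question_generator(p)),
    # The original question is passed through unchanged
    "input": lambda p: p["input"],
}

prompt_variables = run_branches(branches, {"input": "How do I format the disk?"})
print(prompt_variables)
```

The resulting dictionary has exactly the keys the RAG prompt template expects.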

In [None]:
step_back_rag.invoke({"input": "What is wayland used for?"})
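
Note that a retriever returns a list of document objects, so the prompt above receives their string representation, including metadata. If only the text should reach the LLM, a small formatting helper can join the `page_content` of each document. The sketch below uses a hypothetical `FakeDocument` class as a stand-in, so it runs without LangChain; `format_docs` is an illustrative name too.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document with a `page_content` field."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("Example question: A\nExample answer: B"),
    FakeDocument("Example question: C\nExample answer: D"),
]
formatted = format_docs(docs)
print(formatted)
```

In the chain, such a helper could be piped after each retriever, for example `extract_input | facts_retriever | format_docs`, assuming the installed LangChain version coerces plain functions into runnables.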

The pipeline we created integrates step-back prompting into RAG. Since it's just another prompt engineering strategy, it can be combined with other strategies to improve the results even further. Please remember that **prompt engineering does not fix the retrieval process**. Choosing the right embedding model and making sure it retrieves relevant documents is still the key to success.