# Tutorial: Building a Recency-Aware Question-Answering System with Llama-Index


In this tutorial, we will learn how to build a question-answering system that is aware of the recency of information. This is especially important for questions whose answers may change over time, such as "Who is the current US President?".

We will use the `llama_index` package, a framework for connecting large language models to external data through indexing and retrieval.

## Installing necessary packages
Before we start, make sure that you have the llama_index package installed. You can install it using pip:

In [1]:
# !pip install llama-index 

You will also need the qdrant_client package for this tutorial. You can install it via pip as well:

In [2]:
# !pip install qdrant-client

In [3]:
# !pip install 'rich[jupyter]'
# from rich import print
%load_ext rich

In [4]:
from pathlib import Path

import qdrant_client
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.indices.postprocessor import (
    FixedRecencyPostprocessor,
    EmbeddingRecencyPostprocessor,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Convenience helper: list the contents of a directory with `some_path.ls()`
Path.ls = lambda x: list(x.iterdir())

## Loading Documents

First, let's load our documents. In this example, we use the News Category Dataset v3, which contains news articles with fields such as `headline`, `category`, `short_description`, `link`, `authors`, and `date`. We load the data and reformat it into a `text` field plus a `date` field.

In [5]:
import json

with open("../data/News_Category_Dataset_v3.json", "r") as f:
    data = [json.loads(k) for k in f.readlines()]
    links = [k.pop("link") for k in data]
    authors = [k.pop("authors") for k in data]
    data = [
        {
            "text": f"{k['headline']} under the category: {k['category']}\n {k['short_description']}",
            "date": k["date"],
        }
        for k in data
    ]
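
To make the reformatting step concrete, here is a minimal sketch on a single made-up record. The field values are placeholders, not taken from the dataset, and `to_document` is a hypothetical helper name. The loader above parses the file line by line, so each line is assumed to hold one JSON object.

```python
import json

# One hypothetical line in the same shape as the dataset (placeholder values).
raw_line = json.dumps({
    "link": "https://example.com/story",
    "headline": "Example headline",
    "category": "WORLD NEWS",
    "short_description": "Example description.",
    "authors": "Jane Doe",
    "date": "2022-09-23",
})


def to_document(line: str) -> dict:
    """Parse one JSON line and keep only the text and date used for indexing."""
    record = json.loads(line)
    text = (
        f"{record['headline']} under the category: {record['category']}\n"
        f" {record['short_description']}"
    )
    return {"text": text, "date": record["date"]}


doc = to_document(raw_line)
print(doc["text"])
print(doc["date"])
```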

In [6]:
import datetime
dates = [datetime.datetime.fromisoformat(element["date"]) for element in data]
min(dates), max(dates)

(datetime.datetime(2012, 1, 28, 0, 0), datetime.datetime(2022, 9, 23, 0, 0))

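Next, we'll write these documents to text files in a directory, naming each file after the document's date. Note that articles published on the same date share a file name, so each write overwrites the previous one and only one article per date ends up on disk.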

In [43]:
from pathlib import Path

write_dir = Path("../data/dump").resolve()
write_dir.mkdir(exist_ok=True, parents=True)
for element in data:
    file_path = write_dir / f"{element['date']}.txt"
    with file_path.open("w") as f:
#         print(element["text"])
        f.write(element["text"])
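
Because the file name is just the date, two articles with the same date are written to the same path. If every article should be kept, one option is to add a running index to the file name. The sketch below is only an illustration with made-up records; `write_records` and `date_from_stem` are hypothetical helpers, and a metadata function reading these files would need to strip the index suffix in the same way.

```python
import tempfile
from pathlib import Path

# Hypothetical records: two of them share the same date.
records = [
    {"text": "first story", "date": "2022-09-23"},
    {"text": "second story", "date": "2022-09-23"},
    {"text": "third story", "date": "2022-09-22"},
]


def write_records(records, out_dir: Path) -> list:
    """Write each record to '<date>_<index>.txt' so same-day records do not collide."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, record in enumerate(records):
        path = out_dir / f"{record['date']}_{i:06d}.txt"
        path.write_text(record["text"], encoding="utf-8")
        paths.append(path)
    return paths


def date_from_stem(path: Path) -> str:
    """Recover the date part from a '<date>_<index>' file stem."""
    return path.stem.split("_")[0]


with tempfile.TemporaryDirectory() as tmp:
    paths = write_records(records, Path(tmp))
    n_files = len(list(Path(tmp).iterdir()))
    dates_on_disk = sorted(date_from_stem(p) for p in paths)

print(n_files)
print(dates_on_disk)
```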


This is what we want to build. The diagram below shows that:

1. The user issues a query to the query engine.
2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
3. The query engine then postprocesses the results.
4. The postprocessed results are then returned to the user.

Each arrow represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.


We use Llama-Index to build the query engine and Qdrant as the vector store:



```
+-----------+          +------------------+          +-----------------+
|           |  Query   |                  |  Search  |                 |
|   User    +--------->+   Query Engine   +--------->+     Qdrant      |
|           |          |      (with       |          |  Vector Store   |
|           |          |  Postprocessors) |          |                 |
|           |          |  +------------+  |  Top K   |                 |
|           |  Final   |  |Postprocess |  |Candidates|                 |
|           +<---------+  |Candidates  |  +<---------+                 |
|           |  Answer  |  +------------+  |          |                 |
+-----------+          +------------------+          +-----------------+
```
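
The flow in the diagram can be sketched in plain Python. This is only an illustration with stand-in pieces: `STORE`, `search`, `newest_first`, and `run_query` are hypothetical names, the scores and dates are made up, and the final answer-generation step is left out. It is not how Llama-Index implements its query engine.

```python
from typing import Callable, List, Tuple

# (text, date, similarity score): a hypothetical stand-in for a vector store.
Candidate = Tuple[str, str, float]
STORE: List[Candidate] = [
    ("older article", "2016-11-13", 0.81),
    ("newer article", "2021-06-13", 0.80),
    ("unrelated article", "2019-01-01", 0.10),
]


def search(query: str, top_k: int) -> List[Candidate]:
    """Stand-in for the vector search: return the top_k highest-scoring candidates."""
    return sorted(STORE, key=lambda c: c[2], reverse=True)[:top_k]


def newest_first(candidates: List[Candidate]) -> List[Candidate]:
    """Example postprocessor: order candidates from newest to oldest."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def run_query(
    query: str,
    top_k: int,
    postprocessors: List[Callable[[List[Candidate]], List[Candidate]]],
) -> List[Candidate]:
    """Search first, then pass the candidates through each postprocessor in turn."""
    candidates = search(query, top_k)
    for postprocess in postprocessors:
        candidates = postprocess(candidates)
    return candidates


result = run_query("who is president?", top_k=2, postprocessors=[newest_first])
print([text for text, _, _ in result])  # ['newer article', 'older article']
```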

## Creating Qdrant Client
We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.

- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere but requires `qdrant-client >= 1.1.1`.

- Otherwise, set the Qdrant instance address with:
  - `url="http://<host>:<port>"`

- Set the API KEY for Qdrant Cloud:
  - `api_key="<qdrant-api-key>"`
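
The options above can be selected from configuration. The sketch below builds the keyword arguments for `QdrantClient` from environment variables and falls back to `:memory:` mode when no address is set. `QDRANT_URL` is a hypothetical variable name chosen for this example, and `qdrant_client_kwargs` is a hypothetical helper; the client construction itself is shown only as a comment so the snippet runs without the package installed.

```python
import os
from typing import Mapping


def qdrant_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Pick QdrantClient keyword arguments based on the environment."""
    url = env.get("QDRANT_URL")  # hypothetical variable name
    if not url:
        # No server configured: use the in-process, in-memory mode.
        return {"location": ":memory:"}
    kwargs = {"url": url}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs


print(qdrant_client_kwargs({}))
print(qdrant_client_kwargs({"QDRANT_URL": "http://localhost:6333"}))

# With qdrant-client installed, the result would be used like this:
# from qdrant_client import QdrantClient
# client = QdrantClient(**qdrant_client_kwargs())
```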

In [15]:
import os
from qdrant_client import QdrantClient

client = QdrantClient(
    url="https://486a0c22-f395-4a53-8ce3-26851a6e8b84.us-east-1-0.aws.cloud.qdrant.io:6333", 
    api_key=os.environ["QDRANT_API_KEY"],
)

## Loading Data into Llama-Index
Llama-Index has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.

In [16]:
def get_file_metadata(file_name: str):
    """Get file metadata."""
    return {"date": Path(file_name).stem}


documents = SimpleDirectoryReader(
    input_files=write_dir.ls(), file_metadata=get_file_metadata
).load_data()
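
The recency step later depends on every file name being a date. A slightly stricter variant of the metadata function, sketched here under the hypothetical name `get_file_metadata_checked`, fails early when a file stem cannot be parsed as an ISO date.

```python
from datetime import date
from pathlib import Path


def get_file_metadata_checked(file_name: str) -> dict:
    """Like get_file_metadata, but fail early if the stem is not a valid ISO date."""
    stem = Path(file_name).stem
    date.fromisoformat(stem)  # raises ValueError if the stem is not a valid ISO date
    return {"date": stem}


print(get_file_metadata_checked("../data/dump/2021-06-13.txt"))  # {'date': '2021-06-13'}
```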

We then parse these documents into nodes and create our `QdrantVectorStore`.

In [17]:
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# define service context (wrapper container around current classes)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

# use node parser in service context to parse into nodes
nodes = service_context.node_parser.get_nodes_from_documents(documents)
vector_store = QdrantVectorStore(client=client, collection_name="huffpostnews")

Next, we'll create our `GPTVectorStoreIndex` from the documents. This can take several minutes, because every document has to be embedded and written to Qdrant.

In [18]:
%%time
from llama_index.storage.storage_context import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
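# pass the service context so the chunk size configured above is actually used
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)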

CPU times: user 13.9 s, sys: 1.48 s, total: 15.4 s
Wall time: 7min 41s


## Defining Recency Postprocessors

Llama-Index allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.

Llama-Index provides two recency postprocessors, `FixedRecencyPostprocessor` and `EmbeddingRecencyPostprocessor`. We use `FixedRecencyPostprocessor` here, which ranks nodes by the `date` stored in their metadata.

In [19]:
recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context)
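
To build intuition for what a fixed recency step does, here is a minimal sketch in plain Python: sort the retrieved nodes by their `date` metadata and keep the newest few. It is an illustration under that assumption, not the library's implementation. The function `fixed_recency`, its parameters, and the dictionary-shaped nodes are hypothetical.

```python
from datetime import date


def fixed_recency(nodes, top_k=1, date_key="date"):
    """Return the top_k nodes with the most recent date in their metadata."""
    newest_first = sorted(
        nodes,
        key=lambda node: date.fromisoformat(node["extra_info"][date_key]),
        reverse=True,
    )
    return newest_first[:top_k]


# Hypothetical nodes with made-up text.
nodes = [
    {"text": "a", "extra_info": {"date": "2016-11-13"}},
    {"text": "b", "extra_info": {"date": "2021-06-13"}},
    {"text": "c", "extra_info": {"date": "2014-08-02"}},
]

print([n["text"] for n in fixed_recency(nodes, top_k=2)])  # ['b', 'a']
```

A step like this ignores how relevant each node is to the question, so an older but more relevant node can be dropped in favour of a newer one.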

## Incorporating Reranking using Cohere
Cohere offers a reranking model that reorders retrieved results by their relevance to the query. We'll define a `CohereRerank` postprocessor and add it to our query engine.

First, install the cohere package:

In [20]:
# !pip install cohere

Using it in our code:

In [21]:
import os
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
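
Reranking means re-scoring candidates that were already retrieved and reordering them by the new score. The sketch below uses a deliberately simple word-overlap score as a stand-in for a real reranking model, purely to show the shape of the step. It says nothing about how Cohere scores documents, and `overlap_score` and `rerank` are hypothetical helpers with made-up candidate texts.

```python
def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query: str, texts: list, top_n: int) -> list:
    """Reorder candidate texts by the toy score and keep the best top_n."""
    return sorted(texts, key=lambda t: overlap_score(query, t), reverse=True)[:top_n]


candidates = [
    "Stock markets rally",
    "Rams win the Super Bowl",
    "Super Bowl ads reviewed",
]

print(rerank("who won the super bowl", candidates, top_n=2))
# ['Rams win the Super Bowl', 'Super Bowl ads reviewed']
```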

## Defining Query Engines
We'll define four query engines for this tutorial:
1. A plain vector-store engine (Qdrant only, no postprocessors)
2. A recency query engine
3. A reranking query engine
4. A combined query engine

The recency query engine uses the `FixedRecencyPostprocessor`, the reranking query engine uses the `CohereRerank` postprocessor, and the combined query engine uses both.

In [22]:
from pydantic import BaseModel
from typing import Optional

class Interaction(BaseModel):
    question: str
    answer: Optional[str] = None
    response: Optional[object] = None

In [23]:
top_k = 10

In [24]:
index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
) 

In [25]:
recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
) 

In [26]:
cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
) 

In [27]:
query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)
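
The combined engine lists `cohere_rerank` before `recency_postprocessor`. Assuming postprocessors are applied one after another in list order, the order can change the outcome. The sketch below shows this with two toy postprocessors on made-up candidates; `keep_top` and `apply_in_order` are hypothetical helpers, not part of Llama-Index.

```python
def keep_top(key, n):
    """Build a postprocessor that keeps the n nodes with the largest value for `key`."""
    def postprocess(nodes):
        return sorted(nodes, key=lambda node: node[key], reverse=True)[:n]
    return postprocess


def apply_in_order(nodes, postprocessors):
    """Feed the output of each postprocessor into the next one."""
    for postprocess in postprocessors:
        nodes = postprocess(nodes)
    return nodes


# Hypothetical candidates: relevance score and ISO date (ISO dates sort correctly as strings).
nodes = [
    {"id": "A", "score": 0.9, "date": "2014-01-10"},
    {"id": "B", "score": 0.8, "date": "2021-06-13"},
    {"id": "C", "score": 0.2, "date": "2022-09-23"},
]
rerank_top2 = keep_top("score", 2)
newest_1 = keep_top("date", 1)

rerank_then_recency = [n["id"] for n in apply_in_order(nodes, [rerank_top2, newest_1])]
recency_then_rerank = [n["id"] for n in apply_in_order(nodes, [newest_1, rerank_top2])]
print(rerank_then_recency)  # ['B']
print(recency_then_rerank)  # ['C']
```

Reranking first restricts the recency step to relevant candidates; running recency first can leave only a recent but barely relevant node.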

## Querying the Engine
Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

In [45]:
question = "Who is the current US President?"
response = index_query_engine.query(question)
print(response)


The current US President is Joe Biden.


The `response` object has a few attributes that help us debug and understand what happened at each step:
1. `source_nodes`: the source nodes (similar to document chunks in LangChain) that were used to answer the question.
2. `extra_info`: the metadata the index holds for those nodes. This could also be sent to Qdrant as a payload to filter on (for example as epoch time), but Llama Index does not do so.
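
For the epoch-time idea in point 2, a date string can be converted to a Unix timestamp so that it can be stored and compared as a number. A minimal sketch, assuming the dates are interpreted as midnight UTC; the helper name `to_epoch` is hypothetical.

```python
from datetime import datetime, timezone


def to_epoch(date_str: str) -> int:
    """Convert an ISO date string to seconds since 1970-01-01, assuming midnight UTC."""
    moment = datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_epoch("1970-01-02"))  # 86400
print(to_epoch("2021-06-13"))
```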

Let's unpack that a bit, and we'll use what we learn from `response` to improve our understanding of the query engines and post processors themselves. 

In [48]:
type(response.source_nodes), len(response.source_nodes), response.source_nodes[0]

(list,
 10,
 NodeWithScore(node=Node(text='date: 2021-06-13\n\nPresident Biden Meets With Queen Elizabeth At Windsor Castle under the category: WORLD NEWS\n Biden, who is in the U.K. for the Group of Seven leaders summit, is the 13th American president\xa0to meet the monarch.', doc_id='13340f61-1f07-493b-8475-cffac69797b5', embedding=None, doc_hash='83f86814916cf2880a3bc224ad557ee0e0cc90087aadf2f87784bc786f305129', extra_info={'date': '2021-06-13'}, node_info=None, relationships={<DocumentRelationship.SOURCE: '1'>: 'd3d9d1b6-10d5-450d-8f25-cca3f6c1ec00'}), score=0.8132833))

Note the `10`, which is the `top_k` value we set. This confirms that we got back the 10 documents most similar to the question (more precisely, its 10 nearest neighbours in embedding space), each with a similarity score.
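
As a reminder of what a top-k nearest-neighbour search computes, here is a small brute-force sketch. It assumes cosine similarity as the score and uses tiny made-up two-dimensional vectors in place of real embeddings; `cosine` and `top_k_neighbours` are hypothetical helpers, and a vector database uses an index rather than this exhaustive scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k_neighbours(query, vectors, k):
    """Score every stored vector against the query and return the k best (name, score) pairs."""
    scored = [(name, cosine(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
query = [1.0, 0.1]

neighbours = top_k_neighbours(query, vectors, k=2)
for name, score in neighbours:
    print(name, round(score, 3))
```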

The sources can also be printed in a more readable form:

In [55]:
print(response.get_formatted_sources()[:318])

> Source (Doc id: 13340f61-1f07-493b-8475-cffac69797b5): date: 2021-06-13

date: 2021-06-13

President Biden Meets With Queen Elizabeth At Windsor Castle ...

> Source (Doc id: 3fda0831-3839-4c11-86f4-b1935e17b709): date: 2016-11-13

date: 2016-11-13

Public Diplomacy in the Trump Era under the category: WORLDPO...




Let's check what is stored in the `extra_info` attribute:

In [52]:
response.extra_info

{'13340f61-1f07-493b-8475-cffac69797b5': {'date': '2021-06-13'},
 '3fda0831-3839-4c11-86f4-b1935e17b709': {'date': '2016-11-13'},
 '0798b54e-18b0-48d1-a16b-c9566c2f62c3': {'date': '2014-08-02'},
 '5a1207be-9c1b-466e-b122-3912787a26e6': {'date': '2017-11-26'},
 '006bcdd0-c86e-4b4a-b72f-72bbe75b4870': {'date': '2015-02-22'},
 '288548e6-877e-43cc-97bf-45fe6a50cb16': {'date': '2017-05-14'},
 'db90cb93-aeb5-4dba-a9a5-3129d88bdc1b': {'date': '2018-11-10'},
 '2935dbb7-da04-48e5-b84b-4023a18dc9c6': {'date': '2016-05-03'},
 '9db09aaa-0284-4f8a-976d-68d236ab3c8f': {'date': '2016-10-03'},
 'df33aa4d-191e-40cd-badd-3b250f1d7903': {'date': '2016-06-05'}}

This maps each `doc_id` to its metadata, which here is just the `date` as a string.
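
Since the dates are ISO-formatted strings, they sort chronologically as plain text, which makes it easy to see which source is the newest. A small sketch using the first three entries from the output above:

```python
# First three entries copied from the `response.extra_info` output above.
extra_info = {
    "13340f61-1f07-493b-8475-cffac69797b5": {"date": "2021-06-13"},
    "3fda0831-3839-4c11-86f4-b1935e17b709": {"date": "2016-11-13"},
    "0798b54e-18b0-48d1-a16b-c9566c2f62c3": {"date": "2014-08-02"},
}

# ISO dates (YYYY-MM-DD) compare correctly as strings.
newest_id, newest_meta = max(extra_info.items(), key=lambda item: item[1]["date"])
print(newest_id, newest_meta["date"])
```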

In [None]:
def compare_responses(question: str) -> dict:
    responses = {}
    print("*"*len(question))
    print(f"Question: {question}")
    print("*"*len(question))
    response = index_query_engine.query(question)
    responses["qdrant"] = response
    print("No Postprocessor, Qdrant:", response)
    response = recency_query_engine.query(question)
    responses["recency"] = response
    print("Recency:", response)
    response = reranking_query_engine.query(question)
    responses["reranking"] = response
    print("Reranking:", response)
    response = query_engine.query(question)
    responses["both"] = response
    print("Both:", response)
    return responses

responses = compare_responses(question)
"Correct answer is Joe Biden"

In [32]:
question = "When did Mughals invade India?"
responses = compare_responses(question)

print("**Correct answer** is Babur's victory in Battle of Panipat in April 1526")
# responses["both"].source_nodes, responses["qdrant"].source_nodes

******************************
Question: When did Mughals invade India?
******************************
No Postprocessor, Qdrant: 
The Mughals invaded India in 1526.
Recency: 
The Mughal Empire began in 1526 when Babur, a Central Asian ruler, invaded India and established the Mughal dynasty.
Reranking: 
The Mughals invaded India in the 16th century.
Both: 
The Mughal Empire began in 1526 when Babur, a Central Asian ruler, invaded India and established the Mughal dynasty.
**Correct answer** is Babur's victory in Battle of Panipat in April 1526


In [33]:
question = "Who won the latest Superbowl?"
responses = compare_responses(question)
"Correct answer is Kansas City Chiefs as of May 2023, but the data cutoff is September 2022, for which Los Angeles Rams is the correct answer"

*****************************
Question: Who won the latest Superbowl?
*****************************
No Postprocessor, Qdrant: 
The latest Super Bowl was Super Bowl LVI, which took place on February 13, 2022. The winner of Super Bowl LVI has not yet been determined.
Recency: 
The Los Angeles Rams won the latest Super Bowl.
Reranking: 
The latest Super Bowl was Super Bowl LVI, which was held on February 13, 2022. The Los Angeles Rams defeated the Cincinnati Bengals by a score of 31-23.
Both: 
The Los Angeles Rams won the latest Super Bowl.


'Correct answer is Kansas City Chiefs as of May 2023, but the data cutoff is September 2022, for which Los Angeles Rams is the correct answer'

In [34]:
question = "When did Kargil war happen? Why is that significant?"
responses = compare_responses(question)
"1999, 2 nuclear countries went to war for the first time"

****************************************************
Question: When did Kargil war happen? Why is that significant?
****************************************************
No Postprocessor, Qdrant: 
The Kargil War happened in 1999. It was significant because it was the first major armed conflict between India and Pakistan since the 1971 Indo-Pakistani War, and it was fought in the Kargil district of Kashmir and elsewhere along the Line of Control (LOC). The conflict is also noted for being one of the first instances of high-altitude warfare in the world, as the two sides fought in the Himalayan mountains. The war resulted in a strategic victory for India, as it successfully managed to push back the Pakistani forces from the Kargil district.
Recency: 
The Kargil War happened in 1999. It was significant because it was the first major armed conflict between India and Pakistan since the 1971 Indo-Pakistani War. The conflict resulted in a strategic victory for India, and it is seen as a major 

'1999, 2 nuclear countries went to war for the first time'

In [56]:
question = "Who is the Indian Prime Minister in January 2014?"
responses = compare_responses(question)

*************************************************
Question: Who is the Indian Prime Minister in January 2014?
*************************************************
No Postprocessor, Qdrant: 
The Indian Prime Minister in January 2014 was Manmohan Singh.
Recency: 
The Indian Prime Minister in January 2014 was Narendra Modi.
Reranking: 
The Indian Prime Minister in January 2014 was Manmohan Singh.
Both: 
The Indian Prime Minister in January 2014 was Narendra Modi.


For the question "Who is the Indian Prime Minister in January 2014?", the Qdrant and reranking engines correctly answer "Manmohan Singh", while the recency and combined engines incorrectly answer "Narendra Modi".

This shows how recency postprocessing can produce wrong answers when the question is about a specific point in the past: the postprocessor favours the newest documents, and those describe a later state of the world. Let's do a quick check to see if we can understand why this happened:
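
One way to run such a check is to list the dates of the source nodes each engine used. The sketch below reads `node.extra_info` from each entry of `source_nodes`, following the `NodeWithScore` structure printed earlier. `source_dates` and `fake_response` are hypothetical helpers, and the dates in the demo are placeholders so the snippet runs without an index.

```python
from types import SimpleNamespace


def source_dates(response):
    """Return the date metadata of each source node, in the order they appear."""
    return [item.node.extra_info.get("date") for item in response.source_nodes]


def fake_response(dates):
    """Hypothetical stand-in for a response object, built from a list of date strings."""
    nodes = [
        SimpleNamespace(node=SimpleNamespace(extra_info={"date": d}), score=None)
        for d in dates
    ]
    return SimpleNamespace(source_nodes=nodes)


demo = fake_response(["2014-05-26", "2013-11-02"])  # placeholder dates
print(source_dates(demo))  # ['2014-05-26', '2013-11-02']
```

With the real objects, comparing `source_dates(responses["qdrant"])` against `source_dates(responses["recency"])` shows which dates each engine relied on.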