# Tutorial: Building a Recency-Aware Question-Answering System with Llama-Index


In this tutorial, we will learn how to build a question-answering system that is aware of the recency of information. This is especially important for questions whose answers may change over time, such as "Who is the current US President?".

We will be using the `llama_index` package, a framework for connecting large language models to your own data through indexing and retrieval.

## Installing necessary packages
Before we start, make sure that you have the llama_index package installed. You can install it using pip:

In [1]:
# !pip install llama-index 

You will also need the qdrant_client package for this tutorial. You can install it via pip as well:

In [2]:
# !pip install qdrant-client

In [3]:
from datetime import datetime
from pathlib import Path

import qdrant_client
from llama_index import GPTVectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.indices.postprocessor import (
    EmbeddingRecencyPostprocessor,
    FixedRecencyPostprocessor,
)
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Convenience helper: list a directory's entries with `some_path.ls()`
Path.ls = lambda x: list(x.iterdir())

## Loading Documents

First, let's load our documents. In this example, we will use the News Category Dataset v3. This dataset contains news articles with fields such as `headline`, `category`, `short_description`, `link`, `authors`, and `date`. We'll load the data and keep only the fields we need: the text to index and the date.

In [4]:
import json

# The dataset is stored as one JSON object per line.
with open("../data/News_Category_Dataset_v3.json", "r") as f:
    raw_records = [json.loads(line) for line in f]

# Keep only the text to index and the date used later for recency ranking;
# the `link` and `authors` fields are not needed.
data = [
    {
        "text": f"{k['headline']} under the category: {k['category']}\n {k['short_description']}",
        "date": k["date"],
    }
    for k in raw_records
]
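
To make the reshaping concrete, here is a minimal sketch that applies the same transformation to a single made-up record. The sample values and the helper name `to_record` are hypothetical; only the field names follow the dataset described above.

```python
import json

# A made-up JSON line that mimics the dataset's fields (values are hypothetical).
sample_line = json.dumps(
    {
        "link": "https://example.com/story",
        "headline": "Example headline",
        "category": "POLITICS",
        "short_description": "Example description.",
        "authors": "Jane Doe",
        "date": "2022-09-23",
    }
)


def to_record(line: str) -> dict:
    """Keep only the text to index plus the date used for recency."""
    raw = json.loads(line)
    text = f"{raw['headline']} under the category: {raw['category']}\n {raw['short_description']}"
    return {"text": text, "date": raw["date"]}


record = to_record(sample_line)
print(record["date"])
print(record["text"])
```

The `link` and `authors` fields are simply not copied, so they never reach the index.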

Next, we'll write these documents to text files in a directory. Each document will be written to a text file named by its date.

In [5]:
from pathlib import Path

write_dir = Path("../data/dump").resolve()
write_dir.mkdir(exist_ok=True, parents=True)
for element in data:
    file_path = write_dir / f"{element['date']}.txt"
    with file_path.open("w") as f:
        f.write(element["text"])
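
Because the file name is just the date, any two articles that share a date map to the same path, and the later write replaces the earlier one. The sketch below uses a temporary directory and made-up records to show the effect, and counts how many records fall on each date. It illustrates the naming scheme only and does not touch the real dataset.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Made-up records; two of them share a date.
records = [
    {"text": "first story", "date": "2022-09-23"},
    {"text": "second story", "date": "2022-09-23"},
    {"text": "third story", "date": "2022-09-22"},
]

with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp)
    for rec in records:
        # Same naming scheme as above: one file per date.
        (dump_dir / f"{rec['date']}.txt").write_text(rec["text"])

    # Read back what actually ended up on disk, keyed by file stem.
    written = {p.stem: p.read_text() for p in dump_dir.iterdir()}

# Count how many records map to each file name.
per_date = Counter(rec["date"] for rec in records)

print(len(written), "files for", len(records), "records")
print(per_date.most_common(1))
```

If every article needs to survive, one option is to add a unique suffix to the file name, but the metadata function that reads the date from the file name would then have to strip that suffix again.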


This is what we want to build. The diagram below shows that:

1. The user issues a query to the query engine.
2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
3. The query engine then postprocesses the results.
4. The postprocessed results are then returned to the user.

Each arrow represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.


We're using Llama-Index for the query engine and Qdrant for the vector store.



```
+-----------+          +------------------+          +-----------------+
|           |  Query   |                  |  Search  |                 |
|   User    +--------->+   Query Engine   +--------->+     Qdrant      |
|           |          |      (with       |          |  Vector Store   |
|           |          | Postprocessors)  |          |                 |
|           |          |                  |  Top K   |                 |
|           |          |  +------------+  |Candidates|                 |
|           |  Final   |  |Postprocess |  +<---------+                 |
|           |  Answer  |  | Candidates |  |          |                 |
|           +<---------+  +------------+  |          |                 |
|           |          |                  |          |                 |
+-----------+          +------------------+          +-----------------+
```
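
The flow in the diagram can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how Llama-Index or Qdrant are implemented: the names `Candidate`, `search`, `run_query`, and `drop_low_scores` are hypothetical, the two-dimensional vectors are made up, and the final answer-synthesis step performed by the language model is left out.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    text: str
    vector: Sequence[float]
    score: float = 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(store: List[Candidate], query_vector: Sequence[float], top_k: int) -> List[Candidate]:
    """Stand-in for the vector store: return the top_k most similar items."""
    scored = [Candidate(c.text, c.vector, cosine(query_vector, c.vector)) for c in store]
    return sorted(scored, key=lambda c: c.score, reverse=True)[:top_k]


def run_query(
    store: List[Candidate],
    query_vector: Sequence[float],
    top_k: int,
    postprocessors: List[Callable[[List[Candidate]], List[Candidate]]],
) -> List[Candidate]:
    """Stand-in for the query engine: search, then postprocess the candidates."""
    candidates = search(store, query_vector, top_k)
    for postprocess in postprocessors:
        candidates = postprocess(candidates)
    return candidates


def drop_low_scores(candidates: List[Candidate]) -> List[Candidate]:
    """Example postprocessor: discard weakly matching candidates."""
    return [c for c in candidates if c.score >= 0.5]


store = [
    Candidate("a", [1.0, 0.0]),
    Candidate("b", [0.8, 0.6]),
    Candidate("c", [0.0, 1.0]),
]
results = run_query(store, [1.0, 0.0], top_k=3, postprocessors=[drop_low_scores])
print([c.text for c in results])
```

The point of the sketch is the ordering: the vector store only sees the search, and every postprocessor works on the candidates that the search returned.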

## Creating Qdrant Client
We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.

- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere but requires `qdrant-client >= 1.1.1`.

- Otherwise, set the Qdrant instance address with:
  - `url="http://<host>:<port>"`

- Set the API KEY for Qdrant Cloud:
  - `api_key="<qdrant-api-key>"`

In [6]:
client = qdrant_client.QdrantClient(
    location=":memory:"
)

## Loading Data into Llama-Index
Llama-Index has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.

In [7]:
def get_file_metadata(file_name: str):
    """Get file metadata."""
    return {"date": Path(file_name).stem}


documents = SimpleDirectoryReader(
    input_files=write_dir.ls(), file_metadata=get_file_metadata
).load_data()
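
As a quick check of what the metadata function produces, the sketch below calls it on a made-up path and parses the result into a `datetime`. The path and the `parse_date` helper are hypothetical, and the `YYYY-MM-DD` format is an assumption about how dates appear in the file names.

```python
from datetime import datetime
from pathlib import Path


def get_file_metadata(file_name: str) -> dict:
    """Derive the document date from the file name without its extension."""
    return {"date": Path(file_name).stem}


def parse_date(metadata: dict) -> datetime:
    """Hypothetical helper: fail early if the stem is not a YYYY-MM-DD date."""
    return datetime.strptime(metadata["date"], "%Y-%m-%d")


# Made-up path, used only to show the mapping from file name to metadata.
meta = get_file_metadata("/tmp/dump/2022-09-23.txt")
print(meta)
print(parse_date(meta).date())
```

Validating the date at load time makes a misnamed file fail immediately rather than during recency sorting at query time.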

We then parse these documents into nodes and create our `QdrantVectorStore`:

In [8]:
# define service context (a container for the settings used when indexing and querying)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

# use the node parser in the service context to split the documents into nodes
nodes = service_context.node_parser.get_nodes_from_documents(documents)

# back the index with a Qdrant collection
vector_store = QdrantVectorStore(client=client, collection_name="huffpostnews")

Next, we'll create our `GPTVectorStoreIndex` from the documents. This step can take several minutes, because every document has to be embedded before it is stored in Qdrant.

In [9]:
%%time
from llama_index.storage.storage_context import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

CPU times: user 8.53 s, sys: 787 ms, total: 9.32 s
Wall time: 6min 40s


## Defining Recency Postprocessors

Llama-Index allows us to add postprocessors to our query engine. These postprocessors modify the candidate nodes after they are retrieved from the index and before the answer is generated. Here, we'll add a recency postprocessor to our query engine, so that recent documents are prioritized in the results.

Llama-Index provides two recency postprocessors, `FixedRecencyPostprocessor` and `EmbeddingRecencyPostprocessor`. In this tutorial we use `FixedRecencyPostprocessor`, which relies on the `date` metadata we attached to each document.

In [10]:
recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context)
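
Conceptually, a fixed recency step sorts the retrieved candidates by date and keeps only the newest few. The sketch below shows that idea in plain Python. It is an illustration, not the library's implementation: the function name `keep_most_recent`, the dictionary-based nodes, and the sample texts are all hypothetical, and dates are assumed to be `YYYY-MM-DD` strings.

```python
from datetime import datetime
from typing import Dict, List


def keep_most_recent(
    nodes: List[Dict[str, str]], top_k: int = 1, date_key: str = "date"
) -> List[Dict[str, str]]:
    """Sort candidates newest first and keep the first top_k of them."""
    ordered = sorted(
        nodes,
        key=lambda node: datetime.strptime(node[date_key], "%Y-%m-%d"),
        reverse=True,
    )
    return ordered[:top_k]


# Made-up candidates, as if returned by a similarity search.
candidates = [
    {"text": "older report", "date": "2017-01-20"},
    {"text": "newest report", "date": "2021-01-20"},
    {"text": "middle report", "date": "2019-06-01"},
]

print(keep_most_recent(candidates))
print(keep_most_recent(candidates, top_k=2))
```

With a small `top_k`, only the newest candidates reach the language model, which is what lets a question such as "Who is the current US President?" be answered from recent articles rather than older ones.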

## Incorporating Reranking using Cohere
Cohere provides a reranking API that scores each retrieved candidate against the query and reorders the candidates by relevance. We'll define a `CohereRerank` postprocessor and add it to our query engine.

First, install the cohere package:

In [12]:
# !pip install cohere

Using it in our code:

In [18]:
import os
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank

cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=10)

## Defining Query Engines
We'll define three query engines for this tutorial - a recency query engine, a reranking query engine, and a combined query engine.

The recency query engine uses the FixedRecencyPostprocessor, the reranking query engine uses the CohereRerank postprocessor, and the combined query engine uses both.

In [19]:
recency_query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[recency_postprocessor],
) 

In [20]:
reranking_query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
) 

In [21]:
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)
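
When several postprocessors are combined, the order of the list matters, because each step only sees what the previous step kept. The sketch below illustrates this with toy stand-ins, assuming postprocessors run in list order. `top_by_relevance` and `most_recent` are hypothetical helpers that mimic a reranker and a recency filter; they are not the Cohere or Llama-Index implementations, and the nodes are made up.

```python
from typing import Callable, Dict, List

Node = Dict[str, object]
Postprocessor = Callable[[List[Node]], List[Node]]


def top_by_relevance(n: int) -> Postprocessor:
    """Toy stand-in for a reranker: keep the n nodes with the highest relevance."""
    def apply(nodes: List[Node]) -> List[Node]:
        return sorted(nodes, key=lambda x: x["relevance"], reverse=True)[:n]
    return apply


def most_recent(k: int) -> Postprocessor:
    """Toy stand-in for a recency filter: keep the k newest nodes."""
    def apply(nodes: List[Node]) -> List[Node]:
        # ISO-formatted date strings sort chronologically as plain strings.
        return sorted(nodes, key=lambda x: x["date"], reverse=True)[:k]
    return apply


def run_chain(nodes: List[Node], chain: List[Postprocessor]) -> List[Node]:
    """Apply each postprocessor in order, feeding its output to the next."""
    for step in chain:
        nodes = step(nodes)
    return nodes


nodes = [
    {"id": "old-relevant", "relevance": 0.9, "date": "2019-01-01"},
    {"id": "mid-relevant", "relevance": 0.8, "date": "2021-01-01"},
    {"id": "new-offtopic", "relevance": 0.1, "date": "2022-01-01"},
]

rerank_then_recency = run_chain(nodes, [top_by_relevance(2), most_recent(1)])
recency_then_rerank = run_chain(nodes, [most_recent(1), top_by_relevance(2)])

print([n["id"] for n in rerank_then_recency])
print([n["id"] for n in recency_then_rerank])
```

Reranking first and filtering by recency second, as in the combined engine above, picks the newest node among the relevant ones. The reverse order would pick the newest node overall, even if it is off topic.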

## Querying the Engine
Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

In [None]:
question = "Who is the current US President?"

response = query_engine.query(question)
print("Both:", response)
response = recency_query_engine.query(question)
print("Recency:", response)
response = reranking_query_engine.query(question)
print("Reranking:", response)

Both: 
The current US President is Joe Biden.
Recency: 
The current US President is Joe Biden.
