# Tutorial: 
# Qdrant and Llama-Index: A new way to keep your Q&A systems up-to-date 

Ever found yourself frustrated with an answer engine that's stuck in the past? As our world rapidly evolves, the accuracy of information changes accordingly. Traditional models can become outdated, providing answers that were once accurate but are now obsolete. The cost of outdated knowledge can be high - misinforming users, impacting decision-making, and ultimately undermining trust in your system.

That's where Qdrant and Llama-Index come in, revolutionizing the way answer engines stay relevant. These cutting-edge tools offer a fresh, timely perspective, ensuring your answer engine adapts to the relentless pace of information change. This tutorial will guide you in building an answer engine that stays up-to-the-minute. By mastering Qdrant and Llama-Index, you can transform your applications from static knowledge repositories into dynamic, adaptable knowledge machines. Whether you're a seasoned data scientist or an AI enthusiast, join us on this learning journey - the future of answer engines is here, and it's time to embrace it.

## Tools

We will be using the llama_index package, a powerful tool for building large-scale information retrieval systems.
It is designed to work with language models like OpenAI's GPT-3. It also supports the use of postprocessors, which can modify the results of a query after they have been returned from the index. We demonstrate the impact of two interesting postprocessors.

Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors. In the context of this notebook, we use Qdrant as our vector storage system.


## Before you begin, API Keys:

Dependencies: 
1. OpenAI keys for the LLM. You can get them here: https://platform.openai.com/account/api-keys
2. Cohere keys for reranking. You can get them here: https://dashboard.cohere.ai/api-keys

If you'd like to use Qdrant Cloud, that needs: 
1. URL
2. API Keys

In [1]:
import os

def check_environment_keys():
    """
    Utility function that checks that you have the necessary keys
    """
    if os.environ.get("OPENAI_API_KEY") is None:
        raise ValueError("OPENAI_API_KEY cannot be None. Set the key using os.environ['OPENAI_API_KEY']='sk-xxx'")
    if os.environ.get("COHERE_API_KEY") is None:
        raise ValueError("COHERE_API_KEY cannot be None. Set the key using os.environ['COHERE_API_KEY']='xxx'")
    if os.environ.get("QDRANT_API_KEY") is None:
        print("[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL")
        
check_environment_keys()

[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL


## Installing necessary packages
Before we start, make sure that you have the `llama_index` package installed. You can install it using pip:

In [2]:
# !pip install llama-index

You will also need the `qdrant_client` package for this tutorial. You can install it via pip as well:

In [3]:
# !pip install -U qdrant-client

In [4]:
# Optionally: Rich to make error messages, stack traces easier to read
# !pip install 'rich[jupyter]'
%load_ext rich

In [5]:
import datetime
import json
import os
from pathlib import Path

from llama_index import GPTVectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.indices.postprocessor import FixedRecencyPostprocessor
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient


Path.ls = lambda x: list(x.iterdir())

# Loading Documents

First, let's load our documents. In this example, we will use the News Category Dataset v3. This dataset contains news articles with various fields like `headline`, `category`, `short_description`, `link`, `authors`, and `date`. We'll load the data and reformat it to suit our needs.

In [6]:
with open("../data/News_Category_Dataset_v3.json", "r") as f:
    data = [json.loads(k) for k in f.readlines()]
    links = [k.pop("link") for k in data]
    authors = [k.pop("authors") for k in data]
    data = [
        {
            "text": f"{k['headline']} under the category: {k['category']}\n {k['short_description']}",
            "date": k["date"],
        }
        for k in data
    ]

In [7]:
dates = [datetime.datetime.fromisoformat(element["date"]) for element in data]
min(dates), max(dates)

(datetime.datetime(2012, 1, 28, 0, 0), datetime.datetime(2022, 9, 23, 0, 0))

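Next, we'll write these documents to text files in a directory. Each document will be written to a text file named by its date. Note that articles sharing a date are written to the same file name, so only the last article written for each date is kept.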

In [8]:
write_dir = Path("../data/dump").resolve()
write_dir.mkdir(exist_ok=True, parents=True)
for element in data:
    file_path = write_dir / f"{element['date']}.txt"
    with file_path.open("w") as f:
        #         print(element["text"])
        f.write(element["text"])
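
Because the files are named by date and opened in `"w"` mode, records that share a date end up in the same file. The sketch below illustrates this on a few made-up records (the `records` list is hypothetical and only mirrors the shape of `data`); it is a quick check, not part of the original pipeline.

```python
from collections import Counter

# Hypothetical toy records in the same shape as `data` above.
records = [
    {"text": "Headline A", "date": "2022-09-23"},
    {"text": "Headline B", "date": "2022-09-23"},
    {"text": "Headline C", "date": "2022-09-22"},
]

# Count how many records map to each date-based file name.
per_date = Counter(r["date"] for r in records)

# Writing with mode "w" means a later record replaces an earlier one for the same date.
surviving = {r["date"]: r["text"] for r in records}

print(per_date)
print(surviving)
```

If every article needs to be kept, one option would be to add a running index to the file name while still deriving the `date` metadata from the date part.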

## What do we want to build?

The diagram below shows that:

1. The user issues a query to the query engine.
2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
3. The query engine then postprocesses the results.
4. The postprocessed results are then returned to the user.

Each arrow represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.
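
The four steps above can be sketched in plain Python. This is only a conceptual illustration: `toy_retrieve`, `drop_low_scores`, and `run_query` are hypothetical names, and the retriever returns fixed nodes instead of searching a vector store.

```python
from typing import Callable, List, Tuple

Node = Tuple[str, float]  # (text, similarity score); a stand-in for a retrieved node
Postprocessor = Callable[[List[Node]], List[Node]]


def toy_retrieve(query: str) -> List[Node]:
    """Hypothetical retriever: returns fixed nodes instead of searching a vector store."""
    return [("doc about 2013", 0.81), ("doc about 2022", 0.80), ("unrelated doc", 0.40)]


def drop_low_scores(nodes: List[Node]) -> List[Node]:
    """Example postprocessor: keep nodes scoring at least 0.5."""
    return [node for node in nodes if node[1] >= 0.5]


def run_query(query: str, postprocessors: List[Postprocessor]) -> List[Node]:
    nodes = toy_retrieve(query)      # step 2: search the (toy) vector store
    for post in postprocessors:      # step 3: apply postprocessors in list order
        nodes = post(nodes)
    return nodes                     # step 4: hand the results back


result = run_query("Who won the EPL?", [drop_low_scores])
print(result)
```

Postprocessors are applied in the order they are listed, which matters once more than one is configured.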


We're using Llama Index to build the Query Engine and Qdrant as our Vector Store.

![](../images/SetupFocus.png)

## Creating Qdrant Client
We'll be using Qdrant as our vector storage system.

### Local Qdrant Server/Docker + Cloud Instructions
- If you're running a local Qdrant instance with Docker, use `url`:
  - `url="http://<host>:<port>"`

- If you're using Qdrant Cloud, set `url` to your cloud instance and pass the API key:
  - `url="<qdrant-cloud-url>"`
  - `api_key="<qdrant-api-key>"`

### Memory

- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere.

In [9]:
client = QdrantClient(":memory:")
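
One way to switch between these modes without editing the notebook is to build the client arguments from environment variables. The sketch below is an assumption-laden helper, not part of the original notebook: `qdrant_client_kwargs` is a hypothetical function and `QDRANT_URL` is an assumed variable name (only `QDRANT_API_KEY` appears earlier in this tutorial).

```python
import os
from typing import Dict, Mapping


def qdrant_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build keyword arguments for QdrantClient from environment variables.

    QDRANT_URL is an assumed variable name; QDRANT_API_KEY is the one checked earlier.
    """
    url = env.get("QDRANT_URL")
    if url is None:
        # No server configured: fall back to the in-process mode used in this notebook.
        return {"location": ":memory:"}
    kwargs = {"url": url}
    api_key = env.get("QDRANT_API_KEY")
    if api_key is not None:
        kwargs["api_key"] = api_key  # needed for Qdrant Cloud
    return kwargs


print(qdrant_client_kwargs({}))
print(qdrant_client_kwargs({"QDRANT_URL": "http://localhost:6333"}))
```

The result can then be passed along as `QdrantClient(**qdrant_client_kwargs())`.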

## Loading Data into Llama-Index
Llama-Index has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.

In [10]:
def get_file_metadata(file_name: str):
    """Get file metadata."""
    return {"date": Path(file_name).stem}


documents = SimpleDirectoryReader(
    input_files=write_dir.ls(), file_metadata=get_file_metadata
).load_data()

This `date` key is *necessary* for the Recency Postprocessor that we'll use later.
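
Since the recency step depends on this value, it can help to fail early when a file name is not a valid date. The variant below is a hedged sketch; `get_file_metadata_checked` is a hypothetical name and assumes file stems are ISO dates such as `2021-06-13`.

```python
from datetime import date
from pathlib import Path


def get_file_metadata_checked(file_name: str) -> dict:
    """Like get_file_metadata, but raises if the file stem is not an ISO date."""
    stem = Path(file_name).stem
    date.fromisoformat(stem)  # raises ValueError for stems such as "notes"
    return {"date": stem}


print(get_file_metadata_checked("../data/dump/2021-06-13.txt"))
```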

We then parse these documents into nodes and create our `QdrantVectorStore`.

In [11]:
# define service context (wrapper container around current classes)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

vector_store = QdrantVectorStore(client=client, collection_name="huffpostnews")

Next, we'll create our `GPTVectorStoreIndex` from the documents. This operation might take some time as it's creating the index from the documents.

In [12]:
%%time
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

CPU times: user 8.89 s, sys: 953 ms, total: 9.85 s
Wall time: 9min 4s


## Making our first Query 

We have made an index. But as we saw in the diagram, we also need some added functionality to do two things:

1. Retrieval
    - Convert the text query into an embedding
    - Find the most similar documents

2. Synthesis
    - The LLM (here, OpenAI) takes the question, the similar documents and a prompt, and gives you an answer
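
The retrieval step can be illustrated with a small nearest-neighbour search. This is a sketch under stated assumptions: the three-dimensional vectors are made up, `cosine` and `nearest` are hypothetical helpers, and cosine similarity is assumed as the distance measure. Note that nothing in the score looks at the document date.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings; real ones have many more dimensions.
docs = {
    "football_2013": [0.9, 0.1, 0.0],
    "football_2022": [0.8, 0.2, 0.1],
    "gas_prices": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

hits = nearest(query, docs, k=2)
print([doc_id for doc_id, _ in hits])
```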

In [13]:
query_engine = index.as_query_engine(similarity_top_k=10)

In [14]:
response = query_engine.query("Who is the US President?")
print(response)


The US President is Joe Biden.


In [15]:
response = query_engine.query("Who won the EPL?")
print(response)


Arsenal won the English Premier League in 2013.


This answer is wrong and outdated: Manchester United F.C. won the Premier League in 2013. But why is this?

This often happens because embedding similarity pays more attention to terms like EPL and Arsenal than to time.

Let's dig a bit deeper and see the top 10 results:

In [16]:
k = 10
print(f"Top {k} results")

selected_nodes = response.source_nodes[:k]

for nodewithscore in selected_nodes:
    print(f"Score: {nodewithscore.score}. Sentence: {nodewithscore.node.text}")

Top 10 results
Score: 0.8131673462272293. Sentence: Manchester United Dumped From FA Cup By Swansea City's Wilfried Bony (VIDEO/PHOTOS) under the category: SPORTS
 United was made to pay soon after, when Alejandro Pozuelo played in Routledge, who produced a delightful lob over Lindegaard
Score: 0.8116888515410747. Sentence: Manchester United Defeat Liverpool, 2-1: Robin Van Persie Penalty Decides Match (VIDEO) under the category: SPORTS
 Buoyed by a rousing rendition of ‘You’ll Never Walk Alone’ in memory of the 96 football fans who died at Hillsborough 23
Score: 0.8076903703147911. Sentence: Arsenal Routs Norwich City 4-1 To Extend Lead Atop Premier League (VIDEO/PHOTOS) under the category: SPORTS
 STORY CONTINUES BELOW Follow GOAL.COM on Twitter. Norwich continued to press forward after the interval as Snodgrass and
Score: 0.8047559328099497. Sentence: Arsenal Eges Tottenham 1-0: Olivier Giroud Scores Only Goal In North London Derby (VIDEO/PHOTOS) under the category: SPORTS
 Villas-B

We notice that the retrieval was indeed right, with Manchester United mentioned in the first few results. Even so, the LLM selects Arsenal during the synthesis step.
Llama Index has powerful ways to compose retrieval and ranking steps.

# Adding Postprocessor

Let's see if we can use postprocessors to improve answer quality, using two approaches:
1. Selecting the most recent node
2. Reranking using a different model, here Cohere Reranker

![](../images/RankFocus.png)

## Defining Recency Postprocessors

Llama-Index allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.

We'll define a single type of recency postprocessor: `FixedRecencyPostprocessor`.

Llama Index also has an `EmbeddingRecencyPostprocessor`.
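
Conceptually, a fixed recency step sorts the retrieved nodes by their `date` metadata and keeps only the newest ones. The sketch below illustrates that idea on plain dictionaries; `keep_most_recent` is a hypothetical helper and is not the library's implementation.

```python
from datetime import date
from typing import Dict, List


def keep_most_recent(nodes: List[Dict[str, str]], top_k: int = 1, date_key: str = "date") -> List[Dict[str, str]]:
    """Sort nodes by their ISO date string, newest first, and keep the first top_k."""
    ordered = sorted(nodes, key=lambda node: date.fromisoformat(node[date_key]), reverse=True)
    return ordered[:top_k]


# Hypothetical retrieved nodes, already ranked by similarity.
retrieved = [
    {"text": "older article", "date": "2013-05-01"},
    {"text": "newest article", "date": "2022-08-11"},
    {"text": "middle article", "date": "2016-11-13"},
]

print(keep_most_recent(retrieved))
```

This also shows the risk: whatever the question asks, only the newest node survives.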

In [17]:
recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context)

## Incorporating Reranking using Cohere
Cohere provides a reranking model that can help us reorder our results. We'll define a `CohereRerank` postprocessor and add it to our query engine. This can improve the search results because the reranker only has to score the small set of nodes returned by retrieval.
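
To make the idea of reranking concrete, here is a toy version in which a simple word-overlap score stands in for the Cohere model. Everything here is illustrative: `overlap_score` and `rerank` are hypothetical helpers, and the texts are made-up examples rather than dataset entries.

```python
from typing import List, Tuple


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, texts: List[str], top_n: int) -> List[Tuple[str, float]]:
    """Rescore the already-retrieved texts against the query and keep the best top_n."""
    scored = [(text, overlap_score(query, text)) for text in texts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical retrieved texts, in their original retrieval order.
retrieved_texts = [
    "Chess club cup underdogs",
    "Club wins the premier league title",
    "Gas prices fall again",
]

best = rerank("premier league title winner", retrieved_texts, top_n=1)
print(best)
```

A real reranker scores each query and document pair with a learned model, but the shape of the step is the same: a second, more careful pass over a short candidate list.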

First, install the cohere package. You will also need a [Cohere API Key](https://docs.cohere.com/reference/key), added to your environment as `COHERE_API_KEY`.

In [18]:
# !pip install cohere

We use it when defining the reranking query engine below.

## Defining Query Engines
We'll define four query engines for this tutorial: 
1. Just the Vector Store i.e. Qdrant here
1. A recency query engine
1. A reranking query engine
1. And a combined query engine.

The recency query engine uses the `FixedRecencyPostprocessor`, the reranking query engine uses the `CohereRerank` postprocessor, and the combined query engine uses both.

In [20]:
top_k = 10  # set one, reuse from now on, ensures consistency

In [21]:
index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
)

In [22]:
recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
)

In [23]:
cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
)

In [24]:
query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)

## Querying the Engine
Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

In [25]:
question = "Who is the current US President?"
response = index_query_engine.query(question)
print(response)


The current US President is Joe Biden.


The `response` object has a few interesting attributes which help us quickly debug and understand what happened in each of our steps: 
1. The source nodes (similar to Document Chunks in Langchain) that were used to answer the question
2. The `extra_info` the index holds, which we can use. This could also be sent as a payload to Qdrant to filter on (via epoch time), but Llama Index does not do this

Let's unpack that a bit, and we'll use what we learn from `response` to improve our understanding of the query engines and postprocessors themselves. 

Recall that we set the `top_k` parameter to `10`. This means we got back the 10 documents most similar to the question (more precisely, the 10 nearest neighbours of the question), each with a similarity score. 

Can the sources be shown in a more human-friendly way?

In [26]:
print(response.get_formatted_sources()[:318])

> Source (Doc id: 2c0719d1-219f-4fe0-b182-60917d44c718): date: 2021-06-13

President Biden Meets With Queen Elizabeth At Windsor Castle under the category...

> Source (Doc id: dc52cea3-6075-4375-a760-3c12f3c113ee): date: 2016-11-13

Public Diplomacy in the Trump Era under the category: WORLDPOST
 ...

> Source (Doc 


Let's check what is stored in the `extra_info` attribute. 

In [27]:
response.extra_info

{'2c0719d1-219f-4fe0-b182-60917d44c718': {'date': '2021-06-13'},
 'dc52cea3-6075-4375-a760-3c12f3c113ee': {'date': '2016-11-13'},
 'ff3e371a-0eea-4ff9-b479-7459e33b863a': {'date': '2014-08-02'},
 '5db1926f-44d8-4c8b-aba7-f1d7b80810b4': {'date': '2017-11-26'},
 'ceb3b87f-2d85-4898-8a46-56475b9707ed': {'date': '2015-02-22'},
 'c4e37110-028e-4fba-bf9f-3ea7ccff2e52': {'date': '2017-05-14'},
 '5ae2d94d-a97f-48b2-85e3-f4db9ea75123': {'date': '2018-11-10'},
 'e89ddb11-2693-4ce3-ba72-ce6798402a86': {'date': '2016-05-03'},
 '0d6c797a-285c-4d99-8739-b497270df691': {'date': '2016-10-03'},
 '29e5beeb-efcb-4664-aeda-ce4b1f51682e': {'date': '2016-06-05'}}

This maps each `doc id` to a `date` value stored as a string.
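
If these dates were to be used for filtering by epoch time, they would first have to be converted from strings to numbers. The sketch below shows one way to do that and to filter by a cutoff in plain Python; `to_epoch` is a hypothetical helper, midnight UTC is an assumption, and the dictionary reuses three entries from the output above.

```python
from datetime import datetime, timezone


def to_epoch(date_str: str) -> int:
    """Convert an ISO date string to seconds since the Unix epoch (midnight UTC assumed)."""
    return int(datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc).timestamp())


# Three entries copied from the `extra_info` output above.
sample_extra_info = {
    "2c0719d1-219f-4fe0-b182-60917d44c718": {"date": "2021-06-13"},
    "dc52cea3-6075-4375-a760-3c12f3c113ee": {"date": "2016-11-13"},
    "5ae2d94d-a97f-48b2-85e3-f4db9ea75123": {"date": "2018-11-10"},
}

cutoff = to_epoch("2018-01-01")
recent_ids = [
    doc_id
    for doc_id, meta in sample_extra_info.items()
    if to_epoch(meta["date"]) >= cutoff
]
print(recent_ids)
```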

Let's set up some tools to keep a question, its correct answer and the responses from each query engine in the same object. This will come in handy in a bit for explaining a wrong answer.

In [28]:
class QAInfo:
    """This class is used to store the question, correct answer and responses from different query engines."""

    def __init__(self, question: str, correct_answer: str):
        self.question = question
        self.correct_answer = correct_answer
        self.responses = {}

    def add_response(self, engine: str, response: str):
        # This method is used to add the response of a query engine to the responses dictionary.
        self.responses[engine] = response


def compare_responses(qa_info: QAInfo, query_engines: dict):
    """This function takes in a QAInfo object and a dictionary of query engines, and runs the question through each query engine.
    The responses from each engine are added to the QAInfo object."""
    print("*" * len(qa_info.question))
    print(f"Question: {qa_info.question}")
    print("*" * len(qa_info.question))

    for engine_name, engine in query_engines.items():
        response = engine.query(qa_info.question)
        qa_info.add_response(engine_name, response)
        print(f"{engine_name}:", response)

    return qa_info


query_engines = {
    "qdrant": index_query_engine,
    "recency": recency_query_engine,
    "reranking": reranking_query_engine,
    "both": query_engine,
}
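
With the responses collected per engine, a crude way to compare them against the expected answer is a case-insensitive substring check. This is only a sketch: `contains_answer` and `score_responses` are hypothetical helpers, the example strings are illustrative, and substring matching is a rough stand-in for real answer grading.

```python
def contains_answer(response_text: str, correct_answer: str) -> bool:
    """Case-insensitive substring check; a crude stand-in for real answer grading."""
    return correct_answer.lower() in response_text.lower()


def score_responses(responses: dict, correct_answer: str) -> dict:
    """Map each engine name to True/False depending on whether its response contains the answer."""
    # str(...) so this also works when the values are response objects that print as text.
    return {name: contains_answer(str(resp), correct_answer) for name, resp in responses.items()}


# Illustrative response strings keyed by engine name.
example_responses = {
    "qdrant": "The Indian Prime Minister in January 2014 was Manmohan Singh.",
    "recency": "The Indian Prime Minister in January 2014 was Narendra Modi.",
}

scores = score_responses(example_responses, "Manmohan Singh")
print(scores)
```

The same call could be applied to `qa_info.responses` and `qa_info.correct_answer` once a question has been run through the engines.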

In [29]:
question = "What is the gas price?"
correct_answer = "Less than $4 in Aug 22"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)

**********************
Question: What is the gas price?
**********************
qdrant: 
The gas price is not specified in the context information.
recency: 
The gas price is just below $4.
reranking: 
The gas price is not specified in the context information.
both: 
The gas price is just below $4.


We see that the Recency component plays a key role in getting this right!

That looks promising. Let's try this with a question which has a specific year in it: 

In [30]:
question = "Who was the Indian Prime Minister in January 2014?"
correct_answer = "Manmohan Singh"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)
print("Correct answer is", qa_info.correct_answer)

**************************************************
Question: Who was the Indian Prime Minister in January 2014?
**************************************************
qdrant: 
The Indian Prime Minister in January 2014 was Manmohan Singh.
recency: 
The Indian Prime Minister in January 2014 was Narendra Modi.
reranking: 
The Indian Prime Minister in January 2014 was Manmohan Singh.
both: 
The Indian Prime Minister in January 2014 was Narendra Modi.
Correct answer is Manmohan Singh


### Observation

For the question "Who was the Indian Prime Minister in January 2014?", the Qdrant and reranking engines correctly respond with "Manmohan Singh", while the recency and combined engines incorrectly respond with "Narendra Modi". 

This demonstrates how the use of recency postprocessing can lead to incorrect results if the recency information contradicts the correct answer. Let's try another example: 

In [31]:
question = "Who is the EPL Champion?"
correct_answer = "Manchester City"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)
print("Correct answer is", qa_info.correct_answer)

************************
Question: Who is the EPL Champion?
************************
qdrant: 
It is not possible to answer this question with the given context information.
recency: 
The EPL Champion is not specified in the context information provided.
reranking: 
The EPL Champion is not specified in the context information.
both: 
The EPL Champion is not specified in the context information provided.
Correct answer is Manchester City


In [32]:
qa_info.responses["qdrant"].source_nodes[0].node.text

'Mighty Underdogs of the European Chess Club Cup under the category: SPORTS\n Sporting fans love long shots and unexpected triumphs. The first and the last European Chess Club Cups provide proof.'

In [33]:
qa_info.responses["reranking"].source_nodes[0].node.text

'Manchester United Defeat Liverpool, 2-1: Robin Van Persie Penalty Decides Match (VIDEO) under the category: SPORTS\n Buoyed by a rousing rendition of ‘You’ll Never Walk Alone’ in memory of the 96 football fans who died at Hillsborough 23'

In [34]:
qa_info.responses["recency"].source_nodes[0].node.text

'Las Vegas Aces Win First WNBA Title, Chelsea Gray Named MVP under the category: SPORTS\n Las Vegas never had a professional sports champion — until Sunday.'

In [35]:
qa_info.responses["both"].source_nodes[0].node.text

'Las Vegas Aces Win First WNBA Title, Chelsea Gray Named MVP under the category: SPORTS\n Las Vegas never had a professional sports champion — until Sunday.'

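### Observation

We notice that the reranking engine's top node does mention a Premier League club ("Manchester United"), while the plain Qdrant engine surfaces a chess story and the recency and combined engines surface a WNBA story. None of these top nodes names the current champion, so the synthesis step could not answer the question and our quest remains unfulfilled. Maybe we'll take another stab at this in a future tutorial!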


# Recap

In this notebook, we demonstrated how to build a recency-aware question-answering system using Llama-Index and Qdrant. We loaded a news dataset, created a Qdrant client, and loaded our data into a Llama-Index vector store index. We defined a recency postprocessor and a Cohere reranking postprocessor, and used these to create various query engines. We then queried these engines with various questions and compared their responses.

Through this exercise, we learned that recency postprocessing helps with time-sensitive questions such as the current gas price, but can lead to incorrect results if the recency information contradicts the correct answer, as with the question about a specific year. The use of a reranking postprocessor like Cohere can also help to improve the accuracy of the results.