# Tutorial: 
# Qdrant and Llama-Index: A new way to keep your Q&A systems up-to-date 

Ever found yourself frustrated with an answer engine that's stuck in the past? As our world rapidly evolves, the accuracy of information changes accordingly. Traditional models can become outdated, providing answers that were once accurate but are now obsolete. The cost of outdated knowledge can be high - misinforming users, impacting decision-making, and ultimately undermining trust in your system.

That's where Qdrant and Llama-Index come in, ensuring that answer engines stay relevant. These cutting-edge tools offer a fresh, timely perspective, ensuring your answer engine adapts to the relentless pace of information change.

This tutorial will demonstrate **how to build an answer engine that stays up-to-the-minute**. 

By mastering Qdrant and Llama-Index, you can transform your applications from static knowledge repositories into dynamic, adaptable knowledge machines. Whether you're a seasoned data scientist or an AI enthusiast, join us on this learning journey - the future of answer engines is here, and it's time to embrace it.

The tutorial covers setting up the necessary tools and API keys, loading documents, and adding postprocessors to improve the relevance of search results. The News Category Dataset v3 is used as an example dataset.

# Table of Contents

- Setting Up
    - Tools
    - Before you begin, API Keys
    - Installing & Importing necessary packages
- Loading Documents
  - What do we want to build?
  - Creating Qdrant Client
    - Local Qdrant Server/Docker + Cloud Instructions
    - Memory
  - Loading Data into Llama-Index
  - Making our first Query
- Adding Postprocessor
  - Defining Recency Postprocessors
  - Cohere Rerank
  - Defining Query Engines
  - Querying the Engine
    - Observation
    - Observation
- Recap

## Setting Up

### Tools

We will be using: 
1. `llama_index` package, a powerful tool for building large-scale information retrieval systems. If this is the first time you're coming across this library, [get started here](https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html) 
2. `Qdrant` is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors. In the context of this notebook, we use Qdrant as our vector storage system.
3. `cohere` is a reranking service: it takes in a query and a list of texts and returns an ordered array with each text assigned a _new_ relevance score.

### Before you begin, API Keys:

Dependencies: 
1. OpenAI keys for the LLM. You can get them here: https://platform.openai.com/account/api-keys
2. Cohere keys for reranking. You can get them here: https://dashboard.cohere.ai/api-keys and there are more docs on them here: [Cohere API Key](https://docs.cohere.com/reference/key) 

If you'd like to use Qdrant Cloud, you can get the following from here: https://cloud.qdrant.io/ -> Access: 
1. URL
2. API Keys

The tutorial is written to work with the Qdrant client without depending on the cloud.

In [17]:
import os


def check_environment_keys():
    """
    Utility function that checks you have the necessary keys
    """
    if os.environ.get("OPENAI_API_KEY") is None:
        raise ValueError(
            "OPENAI_API_KEY cannot be None. Set the key using os.environ['OPENAI_API_KEY']='sk-xxx'"
        )
    if os.environ.get("COHERE_API_KEY") is None:
        raise ValueError(
            "COHERE_API_KEY cannot be None. Set the key using os.environ['COHERE_API_KEY']='xxx'"
        )
    if os.environ.get("QDRANT_API_KEY") is None:
        print("[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL")


check_environment_keys()

[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL


### Installing & Importing necessary packages
Before we start, make sure that you have the `llama_index` package installed. For postprocessing, we will use Cohere's Rerank API. The cell below installs these along with `datasets` and `pandas` using pip:

In [None]:
# !pip install llama-index
# !pip install cohere
# !pip install datasets
# !pip install pandas

You will also need the `qdrant_client` package for this tutorial. You can install it via pip as well:

In [19]:
# !pip install -U qdrant-client

In [20]:
# Optionally: Rich to make error messages, stack traces easier to read
# !pip install 'rich[jupyter]'
%load_ext rich

The rich extension is already loaded. To reload it, use:
  %reload_ext rich


In [21]:
import os
from pathlib import Path

import pandas as pd
from datasets import load_dataset
from llama_index import (GPTVectorStoreIndex, ServiceContext,
                         SimpleDirectoryReader)
from llama_index.indices.postprocessor import FixedRecencyPostprocessor
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

Path.ls = lambda x: list(x.iterdir())

# Loading Documents

First, let's load our documents. In this example, we will use the News Category Dataset v3. This dataset contains news articles with fields such as `headline`, `category`, `short_description`, `link`, `authors`, and `date`. We'll load the data and reformat it to suit our needs.

In [45]:
dataset = load_dataset("heegyu/news-category-dataset", split="train")

Found cached dataset json (/Users/nirantk/.cache/huggingface/datasets/heegyu___json/heegyu--news-category-dataset-a0dcb53f17af71bf/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)


In [52]:
def get_single_text(k):
    return f"{k['headline']} under the category:\n{k['category']}\n{k['short_description']}"


df = pd.DataFrame(dataset)
df.head()

| | link | headline | category | short_description | authors | date |
|---|---|---|---|---|---|---|
| 0 | https://www.huffpost.com/entry/covid-boosters-... | Over 4 Million Americans Roll Up Sleeves For O... | U.S. NEWS | Health experts said it is too early to predict... | Carla K. Johnson, AP | 2022-09-23 |
| 1 | https://www.huffpost.com/entry/american-airlin... | American Airlines Flyer Charged, Banned For Li... | U.S. NEWS | He was subdued by passengers and crew when he ... | Mary Papenfuss | 2022-09-23 |
| 2 | https://www.huffpost.com/entry/funniest-tweets... | 23 Of The Funniest Tweets About Cats And Dogs ... | COMEDY | "Until you have a dog you don't understand wha... | Elyse Wanshel | 2022-09-23 |
| 3 | https://www.huffpost.com/entry/funniest-parent... | The Funniest Tweets From Parents This Week (Se... | PARENTING | "Accidentally put grown-up toothpaste on my to... | Caroline Bologna | 2022-09-23 |
| 4 | https://www.huffpost.com/entry/amy-cooper-lose... | Woman Who Called Cops On Black Bird-Watcher Lo... | U.S. NEWS | Amy Cooper accused investment firm Franklin Te... | Nina Golgowski | 2022-09-22 |


In [53]:
df["text"] = df.apply(get_single_text, axis=1)
df["text"]

0         Over 4 Million Americans Roll Up Sleeves For O...
1         American Airlines Flyer Charged, Banned For Li...
2         23 Of The Funniest Tweets About Cats And Dogs ...
3         The Funniest Tweets From Parents This Week (Se...
4         Woman Who Called Cops On Black Bird-Watcher Lo...
                                ...                        
209522    RIM CEO Thorsten Heins' 'Significant' Plans Fo...
209523    Maria Sharapova Stunned By Victoria Azarenka I...
209524    Giants Over Patriots, Jets Over Colts Among  M...
209525    Aldon Smith Arrested: 49ers Linebacker Busted ...
209526    Dwight Howard Rips Teammates After Magic Loss ...
Name: text, Length: 209527, dtype: object

In [57]:
df.describe()

| | date |
|---|---|
| count | 209527 |
| mean | 2015-04-30 00:44:14.344308736 |
| min | 2012-01-28 00:00:00 |
| 25% | 2013-08-10 00:00:00 |
| 50% | 2015-03-16 00:00:00 |
| 75% | 2016-11-01 00:00:00 |
| max | 2022-09-23 00:00:00 |


In [59]:
df.drop(columns=["link", "headline", "category", "short_description", "authors"], inplace=True)

Next, we'll write these documents to text files in a directory. Each document will be written to a text file named by its date.

In [None]:
write_dir = Path("../data/dump").resolve()
write_dir.mkdir(exist_ok=True, parents=True)
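# Assumption: `data` was never defined above, so iterate over the DataFrame rows as dicts.
for element in df.to_dict("records"):
    # Files are named by date, so articles that share a date overwrite one another.
    file_path = write_dir / f"{element['date']}.txt"
    with file_path.open("w") as f:
        f.write(element["text"])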

## What do we want to build?

We want to build a question-answering pipeline: a query engine made with Llama Index that searches a Qdrant vector store, postprocesses the results, and returns an answer.

![](../images/SetupFocus.png)

The diagram shows that:

1. The user issues a query to the query engine.
2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
3. The query engine then postprocesses the results.
4. The postprocessed results are then returned to the user.

Each arrow represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it is part of the query engine's function. The diagram gives a high-level view of the process and leaves out many details.

## Creating Qdrant Client
We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.

### Local Qdrant Server/Docker + Cloud Instructions
- If you're running a local Qdrant instance with Docker, use `url`:
  - `url="http://<host>:<port>"`

- If you're using Qdrant Cloud, set the `url` of your cloud instance and the API key:
  - `url="<qdrant-cloud-url>"`
  - `api_key="<qdrant-api-key>"`

### Memory

- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere.

In [None]:
client = QdrantClient(":memory:")
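
The three connection options above can be collected in one place. The following is a minimal sketch, not part of the original notebook: `qdrant_client_kwargs` is a hypothetical helper, and the `QDRANT_URL` environment variable name is an assumption (only `QDRANT_API_KEY` appears earlier in this tutorial). It returns keyword arguments for `QdrantClient` and falls back to `:memory:` mode when no URL is set.

```python
import os


def qdrant_client_kwargs(env=None):
    """Pick QdrantClient keyword arguments from environment-style settings."""
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL")  # assumed variable name, not from the tutorial
    if not url:
        # No server configured: use the in-process mode used in this notebook.
        return {"location": ":memory:"}
    kwargs = {"url": url}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        # Only pass an API key when one is set (needed for Qdrant Cloud).
        kwargs["api_key"] = api_key
    return kwargs


# Placeholder values for illustration only.
local_settings = qdrant_client_kwargs({})
cloud_settings = qdrant_client_kwargs(
    {"QDRANT_URL": "https://example.invalid:6333", "QDRANT_API_KEY": "placeholder-key"}
)

# Usage, assuming qdrant-client is installed:
# client = QdrantClient(**qdrant_client_kwargs())
```

Keeping the choice in one function means the rest of the notebook does not need to change when you switch between memory, Docker, and cloud.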

## Loading Data into Llama-Index
Llama-Index has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.

In [None]:
def get_file_metadata(file_name: str):
    """Get file metadata."""
    return {"date": Path(file_name).stem}


documents = SimpleDirectoryReader(input_files=write_dir.ls(), file_metadata=get_file_metadata).load_data()

This `date` key is *necessary* for the Recency Postprocessor that we'll use later.

We then parse these documents into nodes and create our `QdrantVectorStore`.

In [None]:
# define service context (wrapper container around current classes)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

vector_store = QdrantVectorStore(client=client, collection_name="huffpostnews")

Next, we'll create our `GPTVectorStoreIndex` from the documents. This operation might take some time as it's creating the index from the documents.

In [None]:
%%time
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

## Making our first Query 

We have made an index. But as we saw in the diagram, we also need some added functionality to do two things:

1. Retrieval
    - Convert the text query into an embedding
    - Find the most similar documents

2. Synthesis
    - The LLM (here, OpenAI) takes the question, the similar documents, and a prompt to give you an answer
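
The retrieval step can be sketched in a few lines of plain Python. This is an illustration only: the three-dimensional vectors below are made up (real embeddings come from the embedding model and have far more dimensions), and `nearest_neighbours` is a hypothetical helper, not a Llama Index or Qdrant function. It ranks documents by cosine similarity to the query vector and keeps the top `k`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up toy embeddings keyed by a made-up document id.
toy_docs = {
    "epl_2013": [0.9, 0.1, 0.0],
    "epl_2022": [0.8, 0.2, 0.1],
    "gas_prices": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

nearest = nearest_neighbours(toy_query, toy_docs, k=2)
for doc_id, score in nearest:
    print(f"{doc_id}: {score:.3f}")
```

Nothing in the similarity score knows about dates, so in this toy data the older article ranks above the newer one.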

In [None]:
query_engine = index.as_query_engine(similarity_top_k=10)

In [None]:
response = query_engine.query("Who is the US President?")
print(response)

In [None]:
response = query_engine.query("Who won the EPL?")
print(response)

This answer is wrong and outdated: Manchester United F.C. won the title in 2013. But why does this happen?

This often happens because embedding similarity can pay more attention to terms like "EPL" and "Arsenal" than to time.

Let's dig a bit deeper and see the top 5 results:

In [None]:
k = 5
print(f"Top {k} results")

selected_nodes = response.source_nodes[:k]

for nodewithscore in selected_nodes:
    print("*" * len(nodewithscore.node.text))
    print(f"Score: {nodewithscore.score}. Sentence: {nodewithscore.node.text}")

We notice that the Retrieval was indeed right, with Manchester United being mentioned in the first few results. Even so, Arsenal gets selected by the LLM during the Synthesis step.
Llama Index has powerful ways to compose Retrieval and Ranking steps.

# Adding Postprocessor

Let's see if we can use Postprocessors to improve answer quality with two approaches: 
1. Selecting the most recent node
2. Reranking using a different model, here Cohere Reranker

![](../images/RankFocus.png)

## Defining Recency Postprocessors

Llama-Index allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.

We'll define a single type of recency postprocessor: `FixedRecencyPostprocessor`.

In [None]:
recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context)
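
To make the idea concrete, here is a rough sketch of what a recency postprocessor boils down to: sort the retrieved nodes by their `date` metadata and keep only the newest ones. It uses plain dictionaries rather than Llama Index node objects, `keep_most_recent` is a hypothetical name, and the real `FixedRecencyPostprocessor` may do more than this.

```python
from datetime import date


def keep_most_recent(nodes, top_k=1, date_key="date"):
    """Sort nodes newest-first by their ISO date string and keep top_k."""
    ordered = sorted(
        nodes,
        key=lambda node: date.fromisoformat(node[date_key]),
        reverse=True,
    )
    return ordered[:top_k]


# Made-up nodes for illustration; the texts are placeholders.
toy_nodes = [
    {"text": "older article", "date": "2013-05-01"},
    {"text": "newest article", "date": "2022-09-23"},
    {"text": "middle article", "date": "2018-02-14"},
]

most_recent = keep_most_recent(toy_nodes)
print(most_recent[0]["text"])
```

This is also why the `date` metadata key matters: without it there is nothing to sort on.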

## Cohere Rerank

Cohere Rerank works on the top K results that the Retrieval step from Qdrant returns. While Qdrant searches your entire corpus (here thousands of documents, but Qdrant is designed to work with millions), Cohere works only with the results from Qdrant. This can improve the search results since it's working on a smaller number of entries.

The Rerank endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score. We'll define a `CohereRerank` postprocessor and add it to our query engine.
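
The input and output shape of a reranker can be sketched without calling any service. In the sketch below, `toy_rerank` is a hypothetical function and its word-overlap score is a deliberately crude stand-in: it is not how Cohere scores relevance, and the result dictionaries are not Cohere's response format. It only shows the contract described above: a query and a list of texts go in, and an ordered list with a new score per text comes out.

```python
def toy_rerank(query, texts, top_n):
    """Order texts by a toy word-overlap score and keep the best top_n."""
    query_terms = set(query.lower().split())
    results = []
    for index, text in enumerate(texts):
        text_terms = set(text.lower().split())
        # Fraction of query words that also appear in the text.
        score = len(query_terms & text_terms) / max(len(query_terms), 1)
        results.append({"index": index, "text": text, "relevance_score": score})
    results.sort(key=lambda result: result["relevance_score"], reverse=True)
    return results[:top_n]


# Made-up candidate texts standing in for retrieved nodes.
candidates = ["sports news today", "gas price falls today", "cooking tips"]
reranked = toy_rerank("gas price today", candidates, top_n=2)
for result in reranked:
    print(result["index"], round(result["relevance_score"], 2), result["text"])
```

Because the reranker only sees the handful of candidates it is given, it can afford a more expensive scoring model than the vector search that produced them.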

## Defining Query Engines
We'll define four query engines for this tutorial: 
1. Just the Vector Store, i.e. Qdrant here
2. A recency query engine
3. A reranking query engine
4. A combined query engine

The recency query engine uses the `FixedRecencyPostprocessor`, the reranking query engine uses the `CohereRerank` postprocessor, and the combined query engine uses both.

In [None]:
top_k = 10  # set one, reuse from now on, ensures consistency

In [None]:
index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
)

In [None]:
recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
)

In [None]:
cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
)

In [None]:
query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)
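
The order of the postprocessors in the combined engine matters. Here is a minimal sketch of why, assuming postprocessors are applied one after another in the order they are listed. All function names and the toy nodes are hypothetical, and plain dictionaries stand in for Llama Index nodes.

```python
from datetime import date


def rerank_by_score(nodes, top_n=2):
    """Stand-in for a reranker: keep the top_n highest-scoring nodes."""
    return sorted(nodes, key=lambda n: n["score"], reverse=True)[:top_n]


def newest_only(nodes):
    """Stand-in for a recency postprocessor: keep only the newest node."""
    return sorted(nodes, key=lambda n: date.fromisoformat(n["date"]), reverse=True)[:1]


def apply_postprocessors(nodes, postprocessors):
    """Feed the output of each postprocessor into the next one."""
    for postprocess in postprocessors:
        nodes = postprocess(nodes)
    return nodes


# Made-up nodes: "c" is the newest but the least relevant.
toy_nodes = [
    {"id": "a", "score": 0.9, "date": "2014-01-10"},
    {"id": "b", "score": 0.8, "date": "2019-05-30"},
    {"id": "c", "score": 0.1, "date": "2022-09-23"},
]

combined = apply_postprocessors(toy_nodes, [rerank_by_score, newest_only])
recency_only = apply_postprocessors(toy_nodes, [newest_only])
print(combined[0]["id"], recency_only[0]["id"])
```

With reranking first, recency picks the newest node among the relevant ones rather than the newest node overall. The last step still has the final say, so a combined engine can behave much like the recency engine.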

## Querying the Engine
Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

In [None]:
question = "Who is the current US President?"
response = index_query_engine.query(question)
print(response)

The `response` object has a few interesting attributes which help us quickly debug and understand what happened in each of our steps: 
1. Which source nodes (similar to document chunks in Langchain) were used to answer the question
2. What `extra_info` the index has which we can use. This could also be sent as a payload to Qdrant to filter on (via epoch time), but Llama Index does not do this

Let's unpack that a bit, and we'll use what we learn from `response` to improve our understanding of the query engines and postprocessors themselves. 

Note the `10`, which is the top-k parameter we set. This confirms that we got back the 10 documents most similar to the question (or, more correctly, the 10 nearest neighbours to the question), each with a confidence score. 

We can make this more human-friendly:

In [None]:
print(response.get_formatted_sources()[:318])

Let's check what is stored in the `extra_info` attribute. 

In [None]:
response.extra_info

This stores the `date` as a string value against each `doc id`.

Let's set up some tools to keep a question, its answer, and the responses from each query engine in the same object. This will come in handy in a bit for explaining a wrong answer. 

In [None]:
class QAInfo:
    """This class is used to store the question, correct answer and responses from different query engines."""

    def __init__(self, question: str, correct_answer: str):
        self.question = question
        self.correct_answer = correct_answer
        self.responses = {}

    def add_response(self, engine: str, response: str):
        # This method is used to add the response of a query engine to the responses dictionary.
        self.responses[engine] = response


def compare_responses(qa_info: QAInfo, query_engines: dict):
    """This function takes in a QAInfo object and a dictionary of query engines, and runs the question through each query engine.
    The responses from each engine are added to the QAInfo object."""
    print("*" * len(qa_info.question))
    print(f"Question: {qa_info.question}")
    print("*" * len(qa_info.question))

    for engine_name, engine in query_engines.items():
        response = engine.query(qa_info.question)
        qa_info.add_response(engine_name, response)
        print(f"{engine_name}:", response)

    return qa_info


query_engines = {
    "qdrant": index_query_engine,
    "recency": recency_query_engine,
    "reranking": reranking_query_engine,
    "both": query_engine,
}

In [None]:
question = "What is the gas price?"
correct_answer = "Less than $4 in Aug 22"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)

We see that the Recency component plays a key role in getting this right!

That looks promising. Let's try this with a question which has a specific year in it: 

In [None]:
question = "Who was the Indian Prime Minister in January 2014?"
correct_answer = "Manmohan Singh"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)
print("Correct answer is", qa_info.correct_answer)
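
The comments above note that the correct answer would normally be checked programmatically. One simple way to do that is a case-insensitive substring check over each engine's response. This is a sketch only: `score_engines` is a hypothetical helper, and the response strings below are illustrative placeholders rather than real engine output.

```python
def score_engines(responses, correct_answer):
    """Map each engine name to True if its response mentions the answer."""
    expected = correct_answer.strip().lower()
    return {
        engine: expected in str(response).lower()
        for engine, response in responses.items()
    }


# Illustrative placeholder responses, not actual output from the engines.
toy_responses = {
    "qdrant": "The Indian Prime Minister in January 2014 was Manmohan Singh.",
    "recency": "Narendra Modi",
    "reranking": "Manmohan Singh was the Prime Minister.",
    "both": "Narendra Modi",
}

scores = score_engines(toy_responses, "Manmohan Singh")
print(scores)
```

With the `QAInfo` object from above, the call would look like `score_engines(qa_info.responses, qa_info.correct_answer)`. Substring matching is crude and will miss paraphrased answers, so treat it as a first pass.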

### Observation

For the question "Who was the Indian Prime Minister in January 2014?", the Qdrant and reranking engines correctly respond with "Manmohan Singh", while the recency and combined engines incorrectly respond with "Narendra Modi". 

This demonstrates how the use of recency postprocessing can lead to incorrect results if the recency information contradicts the correct answer. Let's try another example: 

In [None]:
question = "Who is the EPL Champion?"
correct_answer = "Manchester City"  # This would normally be determined programmatically.
qa_info = QAInfo(question, correct_answer)
qa_info = compare_responses(qa_info, query_engines)
print("Correct answer is", qa_info.correct_answer)

In [None]:
qa_info.responses["qdrant"].source_nodes[0].node.text

In [None]:
qa_info.responses["reranking"].source_nodes[0].node.text

In [None]:
qa_info.responses["recency"].source_nodes[0].node.text

In [None]:
qa_info.responses["both"].source_nodes[0].node.text

### Observation

We notice that the recency engine did select a node with the correct answer ("Manchester United"). The synthesis step ignored it, so our quest remains unfulfilled. We may take another stab at this in a future tutorial!


# Recap

In this notebook, we demonstrated how to build a recency-aware question-answering system using Llama-Index and Qdrant. We loaded a news dataset, created a Qdrant client, and loaded our data into a Llama-Index index. We defined a recency postprocessor and a Cohere reranking postprocessor, and used these to create various query engines. We then queried these engines with various questions and compared their responses.

Through this exercise, we learned that the use of recency postprocessing can lead to incorrect results if the recency information contradicts the correct answer. However, the use of a reranking postprocessor like Cohere can help to improve the accuracy of the results.