Tutorial
---
# Qdrant and Llama-Index: A new way to keep your Q&A systems up-to-date 

Ever found yourself frustrated with an answer engine that's stuck in the past? As our world rapidly evolves, the accuracy of information changes accordingly. Traditional models can become outdated, providing answers that were once accurate but are now obsolete. The cost of outdated knowledge can be high - misinforming users, impacting decision-making, and ultimately undermining trust in your system.

That's where Qdrant and Llama-Index come in, ensuring that answer engines stay relevant. These cutting-edge tools offer a fresh, timely perspective, ensuring your answer engine adapts to the relentless pace of information change.

This tutorial will demonstrate **how to build an answer engine that stays up-to-the-minute**. 

By mastering Qdrant and Llama-Index, you can transform your applications from static knowledge repositories into dynamic, adaptable knowledge machines. Whether you're a seasoned data scientist or an AI enthusiast, join us on this learning journey - the future of answer engines is here, and it's time to embrace it.

The tutorial covers setting up the necessary tools and API keys, loading documents, and adding postprocessors to improve the relevance of search results. The News Category Dataset v3 is used as an example dataset.

# Table of Contents

- Setting Up
    - Tools
    - Before you begin, API Keys
    - Installing & Importing necessary packages
- Loading Documents
  - What do we want to build?
  - Creating Qdrant Client
    - Local Qdrant Server/Docker + Cloud Instructions
    - Memory
  - Loading Data into Llama-Index
  - Making our first Query
- Adding Postprocessor
  - Defining Recency Postprocessors
  - Incorporating Reranking using Cohere
  - Defining Query Engines
  - Querying the Engine
    - Observation
    - Observation
- Recap

## Setting Up

### Tools

We will be using: 
1. `llama_index` package, a powerful tool for building large-scale information retrieval systems. If this is the first time you're coming across this library, [get started here](https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html) 
2. `Qdrant` is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors. In the context of this notebook, we use Qdrant as our vector storage system.
3. `cohere` is a Reranking service: takes in a query and a list of texts and returns an ordered array with each text assigned a _new_ relevance score.

### Before you begin, API Keys:

Dependency: 
1. OpenAI keys for LLM. You can get them here: https://platform.openai.com/account/api-keys
2. Cohere keys for Reranking. You can get get them here: https://dashboard.cohere.ai/api-keys and some more docs on it: [Cohere API Key](https://docs.cohere.com/reference/key) 

If you'd like to use Qdrant Cloud, you can get the following from here: https://cloud.qdrant.io/ -> Access: 
1. URL
2. API Keys

The tutorial is written to work with the Qdrant client without depending on the cloud

In [1]:
import os


def check_environment_keys():
    """
    Utility Function that you've the NECESSARY Keys
    """
    if os.environ.get("OPENAI_API_KEY") is None:
        raise ValueError(
            "OPENAI_API_KEY cannot be None. Set the key using os.environ['OPENAI_API_KEY']='sk-xxx'"
        )
    if os.environ.get("COHERE_API_KEY") is None:
        raise ValueError(
            "COHERE_API_KEY cannot be None. Set the key using os.environ['COHERE_API_KEY']='xxx'"
        )
    if os.environ.get("QDRANT_API_KEY") is None:
        print("[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL")


check_environment_keys()

[Optional] If you want to use the Qdrant Cloud, please get the Qdrant Cloud API Keys and URL


### Installing & Importing necessary packages
Before we start, make sure that you have the `llama_index` package installed. For post processing, we will Cohere's Rerank API. You can install both using pip:

In [2]:
# !pip install llama-index
# !pip install cohere
# !pip install datasets
# !pip install pandas

You will also need the `qdrant_client` package for this tutorial. You can install it via pip as well:

In [3]:
# !pip install -U qdrant-client

In [4]:
# Optionally: Rich to make error messages, stack traces easier to read
# !pip install 'rich[jupyter]'
%load_ext rich

In [56]:
import datetime
import os
import random
from pathlib import Path
from typing import Any
import pandas as pd
from datasets import load_dataset
from llama_index import (ServiceContext,
                         SimpleDirectoryReader, StorageContext,
                         GPTVectorStoreIndex)
from llama_index.indices.postprocessor import FixedRecencyPostprocessor
from llama_index.indices.postprocessor.cohere_rerank import CohereRerank
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

Path.ls = lambda x: list(x.iterdir())
random.seed(42)  # This is the answer

## What do we want to build?
![](../images/SetupFocus.png)


We will build an answer engine which stays updated. This has two main parts:
1. Retrieval - Done with Qdrant
2. Synthesis - Done with OpenAI API

Later, we'll add our components which helps us keep this updated and improve ranking after retrieval

Each arrow represents the direction of data flow. The "Query Engine" box encapsulates the postprocessing step to indicate that it's a part of the query engine's function. This diagram is meant to provide a high-level understanding of the process and does not include all the details involved.


We're using Llama Index to make the Query Engine and Qdrant for our Vector Store


# Loading Documents

First, let's load our documents. In this example, we will use the News Category Dataset v3. This dataset contains news articles with various fields like `headline`, `category`, `short_description`, `link`, `authors`, and date. We'll load the data, and reformat it to suit our needs.

In [6]:
dataset = load_dataset("heegyu/news-category-dataset", split="train")

Found cached dataset json (/Users/nirantk/.cache/huggingface/datasets/heegyu___json/heegyu--news-category-dataset-a0dcb53f17af71bf/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)


In [7]:
def get_single_text(k):
    return f"Under the category:\n{k['category']}:\n{k['headline']}\n{k['short_description']}"


df = pd.DataFrame(dataset)
print(df.head())
# print(df.category.value_counts())
category_columns_to_keep = ["POLITICS", "THE WORLDPOST", "WORLD NEWS", "WORLDPOST", "U.S. NEWS"]

                                                link   
0  https://www.huffpost.com/entry/covid-boosters-...  \
1  https://www.huffpost.com/entry/american-airlin...   
2  https://www.huffpost.com/entry/funniest-tweets...   
3  https://www.huffpost.com/entry/funniest-parent...   
4  https://www.huffpost.com/entry/amy-cooper-lose...   

                                            headline   category   
0  Over 4 Million Americans Roll Up Sleeves For O...  U.S. NEWS  \
1  American Airlines Flyer Charged, Banned For Li...  U.S. NEWS   
2  23 Of The Funniest Tweets About Cats And Dogs ...     COMEDY   
3  The Funniest Tweets From Parents This Week (Se...  PARENTING   
4  Woman Who Called Cops On Black Bird-Watcher Lo...  U.S. NEWS   

                                   short_description               authors   
0  Health experts said it is too early to predict...  Carla K. Johnson, AP  \
1  He was subdued by passengers and crew when he ...        Mary Papenfuss   
2  "Until you have a dog y

In [8]:
import pandas as pd

# Assuming `df` is your original dataframe
df["year"] = df["date"].dt.year

category_columns_to_keep = ["POLITICS", "THE WORLDPOST", "WORLD NEWS", "WORLDPOST", "U.S. NEWS"]

# Filter by category
df_filtered = df[df["category"].isin(category_columns_to_keep)]

# Sample data for each year


def sample_func(x):
    return x.sample(min(len(x), 200))


df_sampled = df_filtered.groupby("year").apply(sample_func).reset_index(drop=True)

In [9]:
df_sampled["year"].value_counts()

year
2014    200
2015    200
2016    200
2017    200
2018    200
2019    200
2020    200
2021    200
2022    200
Name: count, dtype: int64

In [10]:
del df

In [11]:
df = df_sampled

In [12]:
df["text"] = df.apply(get_single_text, axis=1)
df["text"]

0       Under the category:\nWORLDPOST:\nPrepping for ...
1       Under the category:\nPOLITICS:\nPelosi Will Hi...
2       Under the category:\nPOLITICS:\nObama Spells O...
3       Under the category:\nPOLITICS:\nBill Foster De...
4       Under the category:\nPOLITICS:\nU.S. Opens Cri...
                              ...                        
1795    Under the category:\nWORLD NEWS:\nOperation Un...
1796    Under the category:\nPOLITICS:\nChilling Trump...
1797    Under the category:\nU.S. NEWS:\nMichigan Bask...
1798    Under the category:\nPOLITICS:\nMark Meadows S...
1799    Under the category:\nWORLD NEWS:\nBiden At UN ...
Name: text, Length: 1800, dtype: object

In [13]:
df["text"][9]

'Under the category:\nPOLITICS:\nHow Will Medicaid Directors Handle the Boom in Applications From the Affordable Care Act?\nThe second open enrollment will be heavily focused on bringing in even harder to reach populations, many of whom will be deemed eligible for Medicaid coverage.'

In [14]:
df.drop(columns=["year"], inplace=True)

Next, we'll write these documents to text files in a directory. Each document will be written to a text file named by its date.

In [29]:
%%time
write_dir = Path("../data/sample").resolve()
if write_dir.exists():
    [f.unlink() for f in write_dir.ls()]
write_dir.mkdir(exist_ok=True, parents=True)
for index, row in df.iterrows():
    date = str(row["date"]).replace("-", "_")  # replace '-' in date with '_' to avoid issues with file names
    file_path = write_dir / f"date_{date}_row_{index}.txt"
    with file_path.open("w") as f:
        f.write(row["text"])

CPU times: user 58.1 ms, sys: 110 ms, total: 169 ms
Wall time: 167 ms


In [16]:
# del dataset, df

## Creating Qdrant Client
We'll be using Qdrant as our vector storage system. Qdrant is a high-performance vector database designed for storing and searching large-scale high-dimensional vectors.

### Local Qdrant Server/Docker + Cloud Instructions
- If you're running a local Qdrant instance with Docker, use `uri`:
  - `uri="http://<host>:<port>"`
  
Here I'll be using the cloud, so I am using the url set to my cloud instance

- Set the API KEY for Qdrant Cloud:
  - `api_key="<qdrant-api-key>"`
  - `url`

### Memory

- You can use `:memory:` mode for fast and lightweight experiments. It does not require Qdrant to be deployed anywhere.

In [31]:
client = QdrantClient(
    url="https://1a02d8e1-d527-4a48-8719-81ce17cda1c3.us-east-1-0.aws.cloud.qdrant.io:6333",
    api_key="Ny_D6kwbz02QQ1CDYUGdCSx_UGCU1TmKFsV-Rr51zTynG4DIZuSpXw",
    prefer_grpc=True,
)

## Loading Data into Llama-Index
Llama-Index has a simple way to load documents from a directory. We can define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.

In [32]:
def get_file_metadata(file_name: str):
    """Get file metadata."""
    date_str = Path(file_name).stem.split("_")[1:4]
    return {"date": "-".join(date_str)}


documents = SimpleDirectoryReader(input_files=write_dir.ls(), file_metadata=get_file_metadata).load_data()

In [33]:
len(documents)

1800

In [34]:
dates = []

for document in documents:
    d = document.extra_info["date"]
    #     print(d)
    try:
        dates.append(datetime.datetime.fromisoformat(d))
    except:
        print(d)

min(dates), max(dates)

(datetime.datetime(2014, 4, 18, 0, 0), datetime.datetime(2022, 9, 23, 0, 0))

This `date` key is *necessary* for the Recency Postprocessor that we'll use later.

We then parse these documents into nodes and create our QdrantVectorStore

In [35]:
# define service context (wrapper container around current classes)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)
vector_store = QdrantVectorStore(client=client, collection_name="NewsCategoryv3PoliticsSample")

Next, we'll create our `GPTVectorStoreIndex` from the documents. This operation might take some time as it's creating the index from the documents.

In [36]:
index = GPTVectorStoreIndex.from_documents(documents, vector_store=vector_store)

## Making our first Query 

We have made an index. But as we saw in the diagram, we also need some added functionality to do 3 things:

1. Retrieval
    - Convert the text query into embedding
    - Find the most similar documents
 
 
2. Synthesis
    - The LLM (here, OpenAI) texts the question, similar documents and a prompt to give you an answer

In [37]:
query_engine = index.as_query_engine(similarity_top_k=10)

In [38]:
response = query_engine.query("Who is the US President?")
print(response)


The US President is Barack Obama.


In [39]:
response = query_engine.query("Who is the current US President?")
print(response)


The current US President is Donald Trump.


In [40]:
# response.source_nodes

In [41]:
response = query_engine.query("Who won the EPL?")
print(response)


The EPL (English Premier League) does not have a winner yet for the given dates.



# Adding Postprocessor

Llama Index has powerful ways to compose Retrieval and Ranking steps. 

The intention behind this is to improve answer quality. Let's see if we can use Postprocessors to improve answer quality by using two approaches: 
1. Selecting the most recent nodes (i.e. sort by recency)
2. Reranking using a different model, here Cohere Reranker

![](../images/RankFocus.png)

Here is what the diagram represents:
1. The user issues a query to the query engine.
2. The query engine, which has been configured with certain postprocessors, performs a search on the vector store based on the query.
3. The query engine then postprocesses the results.
4. The postprocessed results are then returned to the user

## Defining Recency Postprocessors

Llama-Index allows us to add postprocessors to our query engine. These postprocessors can modify the results of our queries after they are returned from the index. Here, we'll add a recency postprocessor to our query engine. This postprocessor will prioritize recent documents in the results.

We'll define a single type of recency postprocessor: `FixedRecencyPostprocessor`.

In [140]:
recency_postprocessor = FixedRecencyPostprocessor(service_context=service_context, top_k=3)

## Cohere Rerank

Cohere Rerank works on the top K results which the Retrieval step from Qdrant returns. While Qdrant works on your entire corpus (here thousands, but Qdrant is designed to work with millions) -- Cohere works with the result from Qdrant. This can improve the search results since it's working on smaller number of entries. 

![](../images/RerankFocus.png)


Rerank endpoint takes in a query and a list of texts and produces an ordered array with each text assigned a relevance score. We'll define a `CohereRerank` postprocessor and add it to our query engine. T

## Defining Query Engines
We'll define four query engines for this tutorial: 
1. Just the Vector Store i.e. Qdrant here
1. A recency query engine
1. A reranking query engine
1. And a combined query engine.

The recency query engine uses the `FixedRecencyPostprocessor`, the reranking query engine uses the `CohereRerank` postprocessor, and the combined query engine uses both.

In [141]:
top_k = 10  # set one, reuse from now on, ensures consistency

In [74]:
index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
)

In [75]:
recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
)

In [76]:
cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
)

In [77]:
query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)

## Querying the Engine
Finally, we can query our engine. Let's ask it "Who is the current US President?" and see the results from each query engine.

In [78]:
# question = "Who is the current US President?"
response = index_query_engine.query("Who is the US President?")
print(response)


The US President is Barack Obama.


The `response` object has a few interesting attributes which help us quickly debug and understand what happened in each of our steps: 
1. What source nodes (similar to Document Chunks in Langchain) were used to answer the question
2. What `extra_info` does the index have which we can use? This could also be sent as a payload to Qdrant to filter on (via epoch time) -- but Llama Index does not

Let's unpack that a bit, and we'll use what we learn from `response` to improve our understanding of the query engines and post processors themselves. 

Note that `10` which is the topk parameter we set. This confirms that we got back the 10 documents most similar to the question (or more correct: 10 nearest neighbours to the question) and a confidence score. 

This can be made more human friendly perhaps?

In [79]:
print(response.get_formatted_sources()[:318])

> Source (Doc id: dd85b761-f599-47ab-bed6-feafe8ea2094): date: 2016-08-21 00:00:00

Under the category:
POLITICS:
The Five Reasons Donald Trump Will Be Ou...

> Source (Doc id: 482e1da2-3e78-4bdd-b1f3-11c74c34ef1a): date: 2014-11-08 00:00:00

Under the category:
POLITICS:
Global Leadership for U.S. National Secu...




Let's check what is stored in the `extra_info` attribute. 

In [80]:
response.extra_info

{'dd85b761-f599-47ab-bed6-feafe8ea2094': {'date': '2016-08-21 00:00:00'},
 '482e1da2-3e78-4bdd-b1f3-11c74c34ef1a': {'date': '2014-11-08 00:00:00'},
 '429221c0-8a2a-4e3d-a46b-4b6760650ae9': {'date': '2015-12-31 00:00:00'},
 'a5c5dd77-d13d-4251-974f-339fe10d49d7': {'date': '2016-02-08 00:00:00'},
 '51d02bb9-cdab-43b2-a226-5e51869a23e6': {'date': '2014-05-03 00:00:00'},
 '095227ed-79f4-49dd-b2a6-d35e81551faa': {'date': '2015-03-15 00:00:00'},
 'bd9a3880-b675-49f6-84b9-2b14531e1295': {'date': '2018-01-24 00:00:00'},
 '9ab19555-0bcf-4e4b-9579-e4dd0262693a': {'date': '2014-10-05 00:00:00'},
 '01a7c6e9-f62d-4271-9a73-716615ad9ae7': {'date': '2016-04-09 00:00:00'},
 '9bb9b689-643b-4745-9572-5a5f3ae5311d': {'date': '2016-03-07 00:00:00'}}

This has a `date` key-value as a string against the `doc id`

Let's setup some tools to have a question, answer and the responses from the index engine in the same object - this will come handy in a bit for explaining a wrong answer. 

In [133]:
from IPython.display import Markdown, display_markdown

def mprint(text: str):
    display_markdown(Markdown(text))

class QAInfo:
    """This class is used to store the question, correct answer and responses from different query engines."""

    def __init__(self, question: str, correct_answer: str, query_engines: dict[str, Any]):
        self.question = question
        self.query_engines = query_engines
        self.correct_answer = correct_answer
        self.responses = {}

    def add_response(self, engine: str, response: str):
        # This method is used to add the response of a query engine to the responses dictionary.
        self.responses[engine] = response


    def compare_responses(self):
        
        
        """This function takes in a QAInfo object and a dictionary of query engines, and runs the question through each query engine.
        The responses from each engine are added to the QAInfo object."""
        mprint(f"### Question: {self.question}")

        for engine_name, engine in query_engines.items():
            response = engine.query(self.question)
            self.add_response(engine_name, response)
            mprint(f"**{engine_name.title()}**: {response}")

        mprint(f"Correct Answer is: {self.correct_answer}")
    
    def node_print(self, index, preview_count=5):
        source_nodes = self.responses[index].source_nodes
        for i in range(preview_count):
            mprint(f"- {source_nodes[i].node.text}")
            


query_engines = {
    "qdrant": index_query_engine,
    "recency": recency_query_engine,
    "reranking": reranking_query_engine,
    "both": query_engine,
}

In [129]:
question = "Who is the US President?"
correct_answer = "Joe Biden"  # This would normally be determined programmatically.
president_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
president_qa_info.compare_responses()

### Question: Who is the US President?

**Qdrant**: 
The US President is Barack Obama.

**Recency**: 
Donald Trump.

**Reranking**: 
The US President is Barack Obama.

**Both**: 
Donald Trump.

Correct Answer is: Joe Biden

## Impact of how a question is asked

In [123]:
question = "Who is current US President?"
correct_answer = "Joe Biden"  # This would normally be determined programmatically.
current_president_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
current_president_qa_info.compare_responses()

### Question: Who is current US President?

**Qdrant**: 
The current US President is Joe Biden.

**Recency**: 
The current US President is Donald Trump.

**Reranking**: 
The current US President is Joe Biden.

**Both**: 
Donald Trump.

Correct Answer is: Joe Biden

## Adding a specific Year

That looks interesting. Let's try this with a question which has a specific year in it: 

In [124]:
question = "Who was the US President in 2010?"
correct_answer = "Barack Obama"  # This would normally be determined programmatically.
president_2010_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
president_2010_qa_info.compare_responses()

### Question: Who was the US President in 2010?

**Qdrant**: 
The US President in 2010 was Barack Obama.

**Recency**: 
The US President in 2010 was Barack Obama.

**Reranking**: 
The US President in 2010 was Barack Obama.

**Both**: 
The US President in 2010 was Barack Obama.

Correct Answer is: Barack Obama

Let's try a different variant of this question, specify a year and see what happens? 

In [139]:
question = "Who was the Indian Prime Minister in January 2014?"
correct_answer = "Manmohan Singh"  # This would normally be determined programmatically.
prime_minister_jan2014 = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
prime_minister_jan2014.compare_responses()

### Question: Who was the Indian Prime Minister in January 2014?

**Qdrant**: 
The Indian Prime Minister in January 2014 was Narendra Modi.

**Recency**: 
The Indian Prime Minister in January 2014 was Manmohan Singh.

**Reranking**: 
The Indian Prime Minister in January 2014 was Narendra Modi.

**Both**: 
The Indian Prime Minister in January 2014 was Manmohan Singh.

Correct Answer is: Manmohan Singh

In [135]:
question = "Who was the Indian Prime Minister in December 2014?"
correct_answer = "Narendra Modi"  # This would normally be determined programmatically.
prime_minister_dec2014 = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
prime_minister_dec2014.compare_responses()

### Question: Who was the Indian Prime Minister in December 2014?

**Qdrant**: 
The Indian Prime Minister in December 2014 was Narendra Modi.

**Recency**: 
The Indian Prime Minister in December 2014 was Narendra Modi.

**Reranking**: 
The Indian Prime Minister in December 2014 was Narendra Modi.

**Both**: 
The Indian Prime Minister in December 2014 was Narendra Modi.

Correct Answer is: Narendra Modi

### Observation

In this question: "Who is the Indian Prime Minister in January 2014?", the Qdrant and reranking engines incorrectly respond with "Narendra Modi", while the recency and combined engines correctly respond with "Manmohan Singh". 

This demonstrates how the use of recency postprocessing can lead to incorrect results if the recency information contradicts the correct answer. Let's investigate the sources further:

In [136]:
prime_minister_dec2014.node_print(index="qdrant", preview_count=3)

- Under the category:
WORLDPOST:
This Chart Shows Which World Leaders Are the Most Powerful on Twitter
The new prime minister of India, @NarendraModi, just surpassed the @WhiteHouse in fourth place and is likely to have more followers than Indonesian President Susilo Bambang Yudhoyono next week.

- Under the category:
WORLD NEWS:
After India's Amazon Snub, Modi's Party Slams Bezos-Owned Washington Post
An official has been critical of foreign media’s reporting on political issues in the past.

- Under the category:
U.S. NEWS:
Activists Petition Gates Foundation Not To Honor India's Prime Minister
Actors Riz Ahmed and Jameela Jamil have also pulled out of the event in which an award is to be given to Narendra Modi despite India's crackdown in Kashmir.

In [137]:
prime_minister_dec2014.node_print(index="recency", preview_count=3)

- Under the category:
WORLD NEWS:
After India's Amazon Snub, Modi's Party Slams Bezos-Owned Washington Post
An official has been critical of foreign media’s reporting on political issues in the past.

- Under the category:
U.S. NEWS:
Activists Petition Gates Foundation Not To Honor India's Prime Minister
Actors Riz Ahmed and Jameela Jamil have also pulled out of the event in which an award is to be given to Narendra Modi despite India's crackdown in Kashmir.

- Under the category:
WORLD NEWS:
Israeli Police Recommend Prime Minister Netanyahu Be Indicted On Corruption Charges
He could face indictment after a monthslong investigation into two separate allegations of corruption.

In [138]:
prime_minister_dec2014.node_print(index="reranking", preview_count=3)

- Under the category:
WORLDPOST:
Japan's Election Sees A Landslide Victory For Ruling Party


- Under the category:
WORLD NEWS:
After India's Amazon Snub, Modi's Party Slams Bezos-Owned Washington Post
An official has been critical of foreign media’s reporting on political issues in the past.

- Under the category:
WORLDPOST:
This Chart Shows Which World Leaders Are the Most Powerful on Twitter
The new prime minister of India, @NarendraModi, just surpassed the @WhiteHouse in fourth place and is likely to have more followers than Indonesian President Susilo Bambang Yudhoyono next week.

## Observation

We notice that receny did have a node which selected a node with the correct answer ("Manchester United"). The synthesis step ignored that and contributed to our quest being unfulfilled. Maybe we'll take another stab at this in a future tutorial perhaps!


# Recap

In this notebook, we demonstrated how to build a recency-aware question-answering system using Llama-Index and Qdrant. We loaded a news dataset, created a Qdrant client, and loaded our data into a Llama-Index. We defined a recency postprocessor and a Cohere reranking postprocessor, and used these to create various query engines. We then queried these engines with various questions and compared their responses.

Through this exercise, we learned that the use of recency postprocessing can lead to incorrect results if the recency information contradicts the correct answer. However, the use of a reranking postprocessor like Cohere can help to improve the accuracy of the results.