
#  LlamaIndex And Qdrant Q&A System With Recency and Cohere Reranking Updates


#  Introduction

This notebook is a basic demonstration of how LlamaIndex and Qdrant can work together to build dynamic, adaptable knowledge machines. Because information changes quickly, a system that updates dynamically is much more valuable than a static one that can rapidly become outdated.

The following activities are performed in this notebook:

* A news dataset is loaded, stored with the Qdrant client, and loaded into LlamaIndex.

* Two postprocessors, Recency and Cohere Rerank, are defined and used to create several query engines.

* LlamaIndex source nodes are used to investigate the questions and the sources on which the answers are based.

* The engines are queried with various questions and their responses are compared.

### Install Packages

In [1]:
%pip install llama-index==0.10.44 > /dev/null
%pip install llama-index-vector-stores-qdrant==0.2.9 > /dev/null
%pip install llama-index-postprocessor-cohere-rerank==0.1.7 > /dev/null

In [50]:
%pip install cohere==5.5.5  > /dev/null
%pip install -U qdrant-client==1.9.1 > /dev/null
%pip install datasets==2.20.0 > /dev/null


### Import Libraries

In [3]:
import datetime
import os
import random
import getpass
import cohere
from pathlib import Path
from typing import Any
import tensorflow as tf

import pandas as pd
import openai
from datasets import load_dataset
from IPython.display import Markdown, display_markdown
from llama_index.core import (VectorStoreIndex, Settings,
                         SimpleDirectoryReader)
from llama_index.core.postprocessor import FixedRecencyPostprocessor
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from llama_index.core.node_parser import SentenceSplitter

Path.ls = lambda x: list(x.iterdir())
random.seed(42)

In [4]:
# Remove limit on column display width
pd.options.display.max_colwidth = None

In [5]:
# Utilize GPU if available
# Get the list of available physical devices
physical_devices = tf.config.list_physical_devices('GPU')

if len(physical_devices) > 0:
    # If a GPU is available, use it
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
    device = "/GPU:0"
    print("Using GPU")
else:
    # If no GPU is available, use CPU
    device = "/CPU:0"
    print("Using CPU")

Using GPU


### Retrieve API Keys:

In [6]:
# Retrieve OpenAI key
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]

# Retrieve Cohere key
os.environ["COHERE_API_KEY"] = getpass.getpass("COHERE API Key: ")
cohere.api_key = os.environ["COHERE_API_KEY"]



OpenAI API Key: ··········
COHERE API Key: ··········


# Architecture

The answer engine consists of two main parts:

1. Retrieval - Done with Qdrant
2. Synthesis - Done with OpenAI API

LlamaIndex will be used to make the Query Engine and Qdrant will be used for the Vector Store. Components will then be added to keep the engine updated and improve ranking after retrieval.

# Load Sample Dataset

The Hugging Face [News Category Dataset v3](https://huggingface.co/datasets/heegyu/news-category-dataset) is used. This dataset contains news articles with fields such as `headline`, `category`, `short_description`, `link`, `authors`, and `date`.

In [7]:
# Load Dataset
dataset = load_dataset("heegyu/news-category-dataset", split="train")

Downloading readme:   0%|          | 0.00/101 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/87.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/209527 [00:00<?, ? examples/s]

In [8]:
# Define a get_single_text function that returns a multiline string containing a row's category, headline, and short description.
def get_single_text(k):
    return f"Under the category:\n{k['category']}:\n{k['headline']}\n{k['short_description']}"

# Define DataFrame and print head
df = pd.DataFrame(dataset)
df.head()

Unnamed: 0,link,headline,category,short_description,authors,date
0,https://www.huffpost.com/entry/covid-boosters-uptake-us_n_632d719ee4b087fae6feaac9,Over 4 Million Americans Roll Up Sleeves For Omicron-Targeted COVID Boosters,U.S. NEWS,Health experts said it is too early to predict whether demand would match up with the 171 million doses of the new boosters the U.S. ordered for the fall.,"Carla K. Johnson, AP",2022-09-23
1,https://www.huffpost.com/entry/american-airlines-passenger-banned-flight-attendant-punch-justice-department_n_632e25d3e4b0e247890329fe,"American Airlines Flyer Charged, Banned For Life After Punching Flight Attendant On Video",U.S. NEWS,"He was subdued by passengers and crew when he fled to the back of the aircraft after the confrontation, according to the U.S. attorney's office in Los Angeles.",Mary Papenfuss,2022-09-23
2,https://www.huffpost.com/entry/funniest-tweets-cats-dogs-september-17-23_n_632de332e4b0695c1d81dc02,23 Of The Funniest Tweets About Cats And Dogs This Week (Sept. 17-23),COMEDY,"""Until you have a dog you don't understand what could be eaten.""",Elyse Wanshel,2022-09-23
3,https://www.huffpost.com/entry/funniest-parenting-tweets_l_632d7d15e4b0d12b5403e479,The Funniest Tweets From Parents This Week (Sept. 17-23),PARENTING,"""Accidentally put grown-up toothpaste on my toddler’s toothbrush and he screamed like I was cleaning his teeth with a Carolina Reaper dipped in Tabasco sauce.""",Caroline Bologna,2022-09-23
4,https://www.huffpost.com/entry/amy-cooper-loses-discrimination-lawsuit-franklin-templeton_n_632c6463e4b09d8701bd227e,Woman Who Called Cops On Black Bird-Watcher Loses Lawsuit Against Ex-Employer,U.S. NEWS,Amy Cooper accused investment firm Franklin Templeton of unfairly firing her and branding her a racist after video of the Central Park encounter went viral.,Nina Golgowski,2022-09-22


In [9]:
# Extract year data from date column and store in the year column
df["year"] = df["date"].dt.year

# Define list of categories to keep
category_columns_to_keep = ["POLITICS", "THE WORLDPOST", "WORLD NEWS", "WORLDPOST", "U.S. NEWS"]

# Create boolean mask and filter by category
df_filtered = df[df["category"].isin(category_columns_to_keep)]

# Create function to randomly sample the minimum of either 200 rows or the number of rows in the DataFrame
def sample_func(x):
    return x.sample(min(len(x), 200), random_state=42)

# Apply sample_func to each unique year and then reset index to simple sequential index, drop grouped index
df_sampled = df_filtered.groupby("year").apply(sample_func).reset_index(drop=True)
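The capped per-year sampling above can be illustrated without pandas. The following is a minimal plain-Python sketch of the same idea, not the notebook's implementation; the helper name `sample_per_group` and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_group(rows, key, cap, seed=42):
    """Return at most `cap` randomly chosen rows for each distinct value of `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    sampled = []
    for group_key in sorted(groups):
        members = groups[group_key]
        # Take the whole group when it is smaller than the cap
        sampled.extend(rng.sample(members, min(len(members), cap)))
    return sampled


# Hypothetical toy rows: five from 2014 and two from 2015
rows = [{"year": 2014, "id": i} for i in range(5)] + [{"year": 2015, "id": i} for i in range(2)]
sampled = sample_per_group(rows, key="year", cap=3)
counts = {year: sum(r["year"] == year for r in sampled) for year in (2014, 2015)}
print(counts)  # {2014: 3, 2015: 2}
```

Years with fewer rows than the cap are kept whole, which is why the cap is expressed as a minimum in `sample_func`.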

In [10]:
# Determine entries by count of year
df_sampled["year"].value_counts()

year
2014    200
2015    200
2016    200
2017    200
2018    200
2019    200
2020    200
2021    200
2022    200
Name: count, dtype: int64

In [11]:
# Delete dataframe
del df

In [12]:
# Set df to sampled_df
df = df_sampled

In [13]:
# Apply function get_single_text defined above to each row.
df["text"] = df.apply(get_single_text, axis=1)
df["text"]

0       Under the category:\nWORLDPOST:\nAfghans Don't Like Soybeans, Despite a Big U.S. Push\nAfghanistan has a rich culinary tradition, but soybeans have not been a part of it. American agricultural experts who consider soybeans a superfood find this dismaying, and so over the past four years, they have invested tens of millions of U.S. taxpayer dollars to try to change the way Afghans eat.
1                                                                                                                                                                                                                                                                                                             Under the category:\nPOLITICS:\nACLU Seeks To Challenge Law Targeting Pregnant Drug Addicts\n
2                                               Under the category:\nPOLITICS:\nWork and Worth\nWhat someone is paid has little or no relationship to what their work is worth to society. Does anyone seriously

In [14]:
# Print a line of DataFrame
df["text"][10]

'Under the category:\nPOLITICS:\nPulling the Princes From Their Thrones\nIt is the time of year to think about the Christmas story, even if the main characters are a bit on the radical side. It is good to remember as we fight the battles of today that common-sense and basic justice have always sounded a little radical, especially to those in power.'

In [15]:
# Drop year column
df.drop(columns=["year"], inplace=True)

In [16]:
%%time
# Write documents to text files in directory. Each document is written to a text file named after its date and row index.

# Convert path object to an absolute path
write_dir = Path("../data/sample").resolve()

# Clear directory if it exists
if write_dir.exists():
    [f.unlink() for f in write_dir.ls()]

# Create directory if it doesn't exist
write_dir.mkdir(exist_ok=True, parents=True)

# Iterate through DataFrame rows
for index, row in df.iterrows():
    date = str(row["date"]).replace("-", "_")  # replace '-' in date with '_' to avoid issues with file names
    file_path = write_dir / f"date_{date}_row_{index}.txt" # Create file path using formatted date and row index
    with file_path.open("w") as f:       # Open file in write mode
        f.write(row["text"])             # Write the "text" value from current row into file

CPU times: user 273 ms, sys: 105 ms, total: 378 ms
Wall time: 389 ms


## Store Dataset with Qdrant Client

In [17]:
# Retrieve Qdrant API key
os.environ["QDRANT_API_KEY"] = getpass.getpass("QDRANT API Key: ")
api_key = os.environ["QDRANT_API_KEY"]

QDRANT API Key: ··········


In [18]:
# Use :memory: mode for this lightweight application as it does not require Qdrant to be deployed anywhere
client = QdrantClient(":memory:")

## Load Data into LlamaIndex

In [19]:
# Define a function to get the metadata from a file name, and pass this function to the `SimpleDirectoryReader` class.
def get_file_metadata(file_name: str):
    """Get file metadata."""
    date_str = Path(file_name).stem.split("_")[1:4]
    return {"date": "-".join(date_str)}

# Pass the get_file_metadata function to the SimpleDirectoryReader class
documents = SimpleDirectoryReader(input_files=write_dir.ls(), file_metadata=get_file_metadata).load_data()
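As a quick check of the file-name convention, the sketch below applies the same parsing logic to a single path. The path is hypothetical, and the example assumes the date part of the file name is a plain `YYYY_MM_DD` string.

```python
from pathlib import Path


def get_file_metadata(file_name: str) -> dict:
    """Rebuild an ISO date from a name shaped like date_YYYY_MM_DD_row_N.txt."""
    # stem -> "date_2022_09_23_row_17"; items 1..3 are year, month, day
    date_parts = Path(file_name).stem.split("_")[1:4]
    return {"date": "-".join(date_parts)}


# Hypothetical file name following the pattern used when the files were written
example = get_file_metadata("/tmp/sample/date_2022_09_23_row_17.txt")
print(example)  # {'date': '2022-09-23'}
```

The resulting `date` value is what the Recency postprocessor later reads from each node's metadata.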

In [20]:
# Determine documents length
len(documents)

1800

In [21]:
# Examine date ranges in dataset. Date key is necessary for the Recency Postprocessor
dates, years = [], []

for document in documents:
    date_str = document.extra_info["date"]
    try:
        dt = datetime.datetime.fromisoformat(date_str)
    except ValueError:
        # Report any date string that cannot be parsed and skip it
        print(date_str)
        continue
    dates.append(dt)
    years.append(dt.year)

In [22]:
# Configure the global node parser through Settings (the replacement for the deprecated ServiceContext)
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)

# Create vector store
vector_store = QdrantVectorStore(client=client, collection_name="NewsCategoryv3PoliticsSample")

In [23]:
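%%time
# Create VectorStoreIndex from the documents; the vector store is passed through a StorageContext
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)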

CPU times: user 14 s, sys: 136 ms, total: 14.2 s
Wall time: 25.3 s


## Execute A Test Query

An index has been created, but two further steps are needed to answer questions:

1. Retrieval
    - Convert the text query into an embedding
    - Find the most similar documents
2. Synthesis
    - The LLM (OpenAI) takes the question, the similar documents, and a prompt, and returns a response.

For clarity, this demonstration notebook uses queries with a clear-cut correct answer that has changed over time.
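The two steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not how LlamaIndex or Qdrant implement them: the three-dimensional vectors, the document texts, and the prompt wording are all hypothetical, and the final LLM call is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, top_k):
    """Retrieval: rank documents by similarity to the query and keep the top k."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc["vector"]), reverse=True)
    return ranked[:top_k]


def build_prompt(question, docs):
    """Synthesis input: combine the question with the retrieved texts."""
    context = "\n---\n".join(doc["text"] for doc in docs)
    return f"Context:\n{context}\n\nAnswer using the context above.\nQuestion: {question}"


# Hypothetical corpus with made-up embeddings
corpus = [
    {"text": "Article about the presidency", "vector": [0.9, 0.1, 0.0]},
    {"text": "Article about sports", "vector": [0.0, 0.2, 0.9]},
    {"text": "Article about elections", "vector": [0.7, 0.6, 0.1]},
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question

top_docs = retrieve(query_vec, corpus, top_k=2)
prompt = build_prompt("Who is the US President?", top_docs)
print(prompt)
```

In the notebook, `similarity_top_k` plays the role of `top_k` here, and the assembled prompt is what the LLM receives during synthesis.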

In [24]:
# Define query engine
query_engine = index.as_query_engine(similarity_top_k=10)

In [25]:
# Submit a query without any time context included.
response = query_engine.query("Who is the US President?")
print(response)

Barack Obama.


In [26]:
# Submit the same query again, this time with an explicit time context ("current").
response = query_engine.query("Who is the current US President?")
print(response)

The current US President is Joe Biden.


### Observation

* Note that adding a specific time context ("current") to an otherwise identical query results in a different, correct answer.

## Adding Postprocessors

###  Defining Query Engines and Postprocessors

* The Vector Store (Qdrant) query engine searches the entire vector store and returns the top k results.
* The Recency postprocessor takes the top k results returned by the vector store and prioritizes the most recent. With `top_k=1`, as used below, only the single most recent result is kept.
* The Cohere reranking postprocessor takes the top k results returned by the vector store and reranks them.
* The combined engine takes the vector store's top k results and applies the Cohere reranker followed by the Recency postprocessor.
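The recency step described above can be pictured as a sort by date followed by a cut-off. The sketch below illustrates that idea only; it is not the `FixedRecencyPostprocessor` source, and `ToyNode` and `keep_most_recent` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ToyNode:
    text: str
    published: date


def keep_most_recent(nodes, top_k=1):
    """Sort nodes newest-first by date and keep the first `top_k`."""
    return sorted(nodes, key=lambda n: n.published, reverse=True)[:top_k]


# Hypothetical candidates standing in for the top k retrieved nodes
candidates = [
    ToyNode("older article", date(2015, 3, 1)),
    ToyNode("newest article", date(2022, 3, 26)),
    ToyNode("middle article", date(2018, 7, 4)),
]
latest = keep_most_recent(candidates, top_k=1)
print([n.text for n in latest])  # ['newest article']
```

Note that similarity to the question plays no part in this step, so the surviving node is the newest one, not necessarily the most relevant one.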

In [27]:
# Define Recency Postprocessor
recency_postprocessor = FixedRecencyPostprocessor(top_k=1)

In [28]:
# Define top_k once and reuse it for consistency
top_k = 10

In [29]:
# Define the Vector Store (Qdrant) query engine
index_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
)

In [30]:
# Define a Recency query engine
recency_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[recency_postprocessor],
)

In [31]:
# Define a Cohere Reranking query engine
cohere_rerank = CohereRerank(api_key=os.environ["COHERE_API_KEY"], top_n=top_k)
reranking_query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank],
)

In [32]:
# Define a combined Cohere Rerank and Recency Post Processor query engine
query_engine = index.as_query_engine(
    similarity_top_k=top_k,
    node_postprocessors=[cohere_rerank, recency_postprocessor],
)
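The order of the postprocessors matters. The sketch below assumes that postprocessors are applied one after another in list order; the toy reranker, the relevance scores, and the helper names are hypothetical. It shows that when the reranker keeps every candidate (as with `top_n=top_k` above), a final recency step with `top_k=1` yields the same node as recency alone.

```python
from datetime import date


def rerank_by_relevance(nodes, top_n):
    """Toy reranker: order by a made-up relevance score and keep top_n."""
    return sorted(nodes, key=lambda n: n["relevance"], reverse=True)[:top_n]


def keep_most_recent(nodes, top_k):
    """Toy recency step: order newest-first and keep top_k."""
    return sorted(nodes, key=lambda n: n["date"], reverse=True)[:top_k]


def apply_in_order(nodes, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        nodes = step(nodes)
    return nodes


# Hypothetical candidates with made-up relevance scores
candidates = [
    {"text": "A", "relevance": 0.9, "date": date(2014, 1, 1)},
    {"text": "B", "relevance": 0.4, "date": date(2022, 1, 1)},
    {"text": "C", "relevance": 0.7, "date": date(2019, 1, 1)},
]
keep_all = len(candidates)

recency_only = apply_in_order(candidates, [lambda n: keep_most_recent(n, 1)])
combined = apply_in_order(
    candidates,
    [lambda n: rerank_by_relevance(n, keep_all), lambda n: keep_most_recent(n, 1)],
)
print(recency_only == combined)  # True
```

Under these assumptions the combined engine can only differ from the recency engine if the reranker is configured to drop candidates before the recency step runs.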

## Querying the Engines and Comparing Results


In [33]:
# Query the Vector Store (Qdrant) query engine
response = index_query_engine.query("Who is the current US President?")
print(response)

The current US President is Joe Biden.


### Observations
* The vector store returns the correct answer when a temporal reference is included in the query.

In [34]:
# Print response for more detailed information
print(response.get_formatted_sources()[:318])

> Source (Doc id: be1e6124-176b-4af4-ba02-251cea668941): Under the category:
THE WORLDPOST:
World Leaders React To The Reality Of A Trump Presidency
Many ...

> Source (Doc id: cbb4475d-fdd4-42f4-ba72-90b6c47819fe): Under the category:
POLITICS:
The Issue That's Looming Over The Final Years Of Obama's Presidency

> S


In [35]:
# Set up tools to hold a question, its correct answer, and the responses from each query engine in the same object
def mprint(text: str):
    display_markdown(Markdown(text))


class QAInfo:
    """This class is used to store the question, correct answer and responses from different query engines."""

    def __init__(self, question: str, correct_answer: str, query_engines: dict[str, Any]):
        self.question = question
        self.query_engines = query_engines
        self.correct_answer = correct_answer
        self.responses = {}

    def add_response(self, engine: str, response: str):
        # This method is used to add the response of a query engine to the responses dictionary.
        self.responses[engine] = response

    def compare_responses(self):
        """Run the question through each query engine, store every response, and display the results."""
        mprint(f"### Question: {self.question}")

        # Use the engines stored on this instance rather than the global dictionary
        for engine_name, engine in self.query_engines.items():
            response = engine.query(self.question)
            self.add_response(engine_name, response)
            mprint(f"**{engine_name.title()}**: {response}")

        mprint(f"Correct Answer is: {self.correct_answer}")

    def node_print(self, index, preview_count=5):
        """Display the text of the first `preview_count` source nodes for the engine named `index`."""
        source_nodes = self.responses[index].source_nodes
        # Slicing avoids an IndexError when fewer nodes than preview_count were returned
        for source_node in source_nodes[:preview_count]:
            mprint(f"- {source_node.node.text}")


query_engines = {
    "qdrant": index_query_engine,
    "recency": recency_query_engine,
    "reranking": reranking_query_engine,
    "both": query_engine,
}

In [36]:
# Query all four engines and compare responses
question = "Who is the US President?"
correct_answer = "Joe Biden"
president_qa_info = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
president_qa_info.compare_responses()

### Question: Who is the US President?

**Qdrant**: Barack Obama.

**Recency**: The US President is Joe Biden.

**Reranking**: The US President is Barack Obama.

**Both**: The US President is Joe Biden.

Correct Answer is: Joe Biden

### Observations
* With no temporal reference in the question, the Qdrant vector store query engine returns the incorrect answer.
* The Recency query engine reviewed the top k responses returned by the vector store, prioritized the most recent information, and returned the correct answer.
* The Cohere Reranking query engine reviewed the top k responses from the vector store and returned the incorrect answer.
* Combining the Recency and Cohere Reranking postprocessors resulted in a correct response even when there was no temporal context in the question.

In [37]:
# Print Qdrant vector store based response
president_qa_info.node_print(index="qdrant", preview_count=1)

- Under the category:
THE WORLDPOST:
World Leaders React To The Reality Of A Trump Presidency
Many of the presidential memorandums and executive decisions will fundamentally affect countries around the globe.

In [38]:
# Print recency based response
president_qa_info.node_print(index="recency", preview_count=1)

- Under the category:
WORLD NEWS:
Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power'
President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine.

In [39]:
# Print Cohere reranking based response
president_qa_info.node_print(index="reranking", preview_count=1)

- Under the category:
WORLDPOST:
President Obama Touts Partnership With Egypt's Military Regime: Yet Repression Continues Unabated
In any case, Washington's influence is limited: The Sisi regime will do whatever it believes necessary to retain power. Whatever America does, Egypt is likely to end up without liberty or stability. Washington should step back from a crisis that it can't resolve.

In [40]:
# Print combined query engine based response
president_qa_info.node_print(index="both", preview_count=1)

- Under the category:
WORLD NEWS:
Biden On Putin: 'For God's Sake, This Man Cannot Remain In Power'
President Joe Biden visited Poland's capital on Saturday to speak with refugees who've been displaced amid Russia's attack on Ukraine.

### Submit a question which includes a specific year

In [41]:
# Submit an identical query to all four query engines
question = "Who is the US President in 2022?"
correct_answer = "Joe Biden"  # This would normally be determined programmatically.
current_president_qa_info = QAInfo(
    question=question, correct_answer=correct_answer, query_engines=query_engines
)
current_president_qa_info.compare_responses()

### Question: Who is the US President in 2022?

**Qdrant**: Joe Biden

**Recency**: I cannot provide the current US President in 2022 as it is not mentioned in the provided context.

**Reranking**: Joe Biden

**Both**: I cannot provide the name of the US President in 2022 based on the context information provided.

Correct Answer is: Joe Biden

### Observations
* The Recency query engine responds that it cannot answer when a question that references a specific year is submitted.
* The combined Recency and Cohere Reranking query engine also responds that it cannot answer when a question that references a specific year is submitted.

### Investigating for Ranking Challenges

We pull the top few source documents from each query engine. To make them easy to read, we use the `node_print` utility defined above.


💡 We notice that Qdrant (using embeddings) correctly pulls out a few mentions of "2024", "Joe Biden" and "President Joe Biden"

💡 Cohere also re-orders the top 10 candidates to give the top 3 which mention "President Joe Biden".

With Recency, we get an undetermined answer. This is because we're only using the one, most recent result.

In [42]:
# Query Qdrant for top three responses
current_president_qa_info.node_print(index="qdrant", preview_count=3)

- Under the category:
POLITICS:
Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022
The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely.

- Under the category:
POLITICS:
How A Crowded GOP Field Could Bolster A Trump 2024 Campaign
As Donald Trump considers another White House run, polls show he's the most popular figure in the Republican Party.

- Under the category:
POLITICS:
Ex-CIA Director Issues Chilling Warning On What 4 More Years Of Trump Would Mean
Retired Gen. Michael Hayden endorses Joe Biden in a new video from Republican Voters Against Trump.

### Observation
* Note that one of the three previewed vector store results (out of the ten retrieved) refers to Joe Biden in 2022, which helps explain why this query engine can answer the question correctly.


In [43]:
# Query for most recent with recency query engine
current_president_qa_info.node_print(index="recency", preview_count=1)

- Under the category:
POLITICS:
GOP Senators Refuse To Rule Out Supporting Donald Trump Again — Even If He's Indicted
With the ex-president reportedly under criminal investigation, many Senate Republicans are taking a wait-and-hope-it-doesn’t-happen stance.

### Observation
* Note that the Recency response preview does not include a reference to President Joe Biden in 2022, which helps explain why this query engine could not answer the question.
* It is apparent that the Recency query engine is not a good choice if the question needs to retrieve data from a specific past year.

In [44]:
# Preview the top three of the ten candidates as re-ordered by Cohere
current_president_qa_info.node_print(index="reranking", preview_count=3)

- Under the category:
POLITICS:
Biden To Give First State Of The Union Address At Fraught Moment
President Joe Biden aims to navigate the country out a pandemic, reboot his stalled domestic agenda and confront Russia’s aggression.

- Under the category:
POLITICS:
Joe Biden Says He 'Can't Picture' U.S. Troops Being In Afghanistan In 2022
The president doubled down on his promise to end America's longest-running war at a Thursday press conference, though he said a May 1 deadline seemed unlikely.

- Under the category:
POLITICS:
Fed's Powell Will Aim To Win A High-Stakes Bet In 2nd Term
President Joe Biden on Monday picked Federal Reserve Chair Jerome Powell for another four-year term.

### Observation
* The previews of the Cohere Reranking query engine responses include a reference to President Joe Biden in 2022 which helps explain why this query engine can answer the question correctly.

### Submit a different question with the temporal information in a different format

In [46]:
# Define a different question and answer pair
question = "Who was the Finance Minister of India under Manmohan Singh Govt?"
correct_answer = "P. Chidambaram"  # This would normally be determined programmatically.
prime_minister_jan2014 = QAInfo(question=question, correct_answer=correct_answer, query_engines=query_engines)
prime_minister_jan2014.compare_responses()

### Question: Who was the Finance Minister of India under Manmohan Singh Govt?

**Qdrant**: The Finance Minister of India under the Manmohan Singh government was P. Chidambaram.

**Recency**: P. Chidambaram was the Finance Minister of India under the Manmohan Singh government.

**Reranking**: There is no information provided in the context about the Finance Minister of India under the Manmohan Singh government.

**Both**: P. Chidambaram

Correct Answer is: P. Chidambaram

### Observations
* All query engines except the Cohere Reranking engine returned the correct answer.
* This is despite the fact that the Recency postprocessor's source node does not mention the Indian government at all! ❌
* The nodes retrieved by Qdrant via OpenAI embeddings, and reordered by Cohere Rerank, are not much more relevant.
* The correct answers therefore come from the OpenAI LLM's knowledge of the world rather than from the retrieved context.
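One crude way to tell whether an answer could have come from the retrieved context is to check whether its key terms appear in any source text. The sketch below is a simple substring check, offered as an illustration only; `is_grounded` is a hypothetical helper, and the two sample strings are headlines that appear among the retrieved nodes for this question.

```python
def is_grounded(answer_terms, source_texts):
    """Return True if every term appears in at least one source text (case-insensitive)."""
    lowered = [text.lower() for text in source_texts]
    return all(any(term.lower() in text for text in lowered) for term in answer_terms)


# Headlines that appear among the retrieved nodes for this question
sources = [
    "The Race To Replace Boris Johnson As Prime Minister Is Down To These 2 Candidates",
    "Cities Need To Get Smarter -- And India's On It",
]
print(is_grounded(["Chidambaram"], sources))  # False
print(is_grounded(["India"], sources))        # True
```

In the notebook, the same check could be run against the `source_nodes` texts of a response; a `False` result for the key term of a correct answer suggests the LLM relied on its own knowledge.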

In [47]:
# Vector Store query
prime_minister_jan2014.node_print(index="qdrant", preview_count=10)

- Under the category:
POLITICS:
Robbing Main Street to Prop Up Wall Street:  Why Jerry Brown's Rainy Day Fund Is a Bad Idea
There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank.

- Under the category:
WORLDPOST:
Cities Need To Get Smarter -- And India's On It

- Under the category:
WORLD NEWS:
Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds
Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to

- Under the category:
POLITICS:
The World Bank Must Commit to Food Security
Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?

- Under the category:
POLITICS:
It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress

- Under the category:
THE WORLDPOST:
Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak
A letter from Greek prime minister Alexis Tsipras questions whether the country "can trust" the lender.

- Under the category:
WORLD NEWS:
The Race To Replace Boris Johnson As Prime Minister Is Down To These 2 Candidates
On Sept. 5, either Rishi Sunak or Liz Truss will be announced as the U.K.'s new prime minister.

- Under the category:
POLITICS:
Shortchanging Our Own Future
To win the future, we should be providing incentives to encourage our youths to pursue professional careers that are suffering from labor shortages, instead of making education more costly and taxing for them.

- Under the category:
WORLDPOST:
The Limits of Ideology: Lessons from Singapore
Next year will see the 50th anniversary of the creating of Singapore, widely hailed as one of the most successful of the Asian tigers. In that short space of time, the tiny nation state has grown into one of the world's largest financial cities.

- Under the category:
POLITICS:
Electing to Support Our Economy
Either we can choose to elect those who have proven themselves to have the courage and the foresight to make decisions with the long-term benefit of working people in mind, or we can elect those who are too shortsighted and too captive to special interests to make the calls that will help our nation thrive.

### Observation
* The Qdrant vector store preview of the top 10 results does not include any reference to the Indian Finance Minister or the Manmohan Singh government.
* OpenAI answered this question based on its general world knowledge.

In [48]:
# Query for most recent
prime_minister_jan2014.node_print(index="recency", preview_count=1)

- Under the category:
WORLD NEWS:
The Race To Replace Boris Johnson As Prime Minister Is Down To These 2 Candidates
On Sept. 5, either Rishi Sunak or Liz Truss will be announced as the U.K.'s new prime minister.

### Observation
* Despite no mention of the Indian Finance Minister in the preview, OpenAI answered the question based on its general world knowledge.

In [49]:
# Query with Cohere reranking
prime_minister_jan2014.node_print(index="reranking", preview_count=10)

- Under the category:
WORLDPOST:
The Limits of Ideology: Lessons from Singapore
Next year will see the 50th anniversary of the creating of Singapore, widely hailed as one of the most successful of the Asian tigers. In that short space of time, the tiny nation state has grown into one of the world's largest financial cities.

- Under the category:
POLITICS:
Electing to Support Our Economy
Either we can choose to elect those who have proven themselves to have the courage and the foresight to make decisions with the long-term benefit of working people in mind, or we can elect those who are too shortsighted and too captive to special interests to make the calls that will help our nation thrive.

- Under the category:
WORLD NEWS:
Arundhati Roy's New Novel Lays India Bare, Unveiling Worlds Within Our Worlds
Malavika Binny, Jawaharlal Nehru University Wearing two hats at once can be an uncomfortable fit, but it does not seem to

- Under the category:
POLITICS:
Shortchanging Our Own Future
To win the future, we should be providing incentives to encourage our youths to pursue professional careers that are suffering from labor shortages, instead of making education more costly and taxing for them.

- Under the category:
POLITICS:
It Takes Just 4 Charts To Show A Big Part Of What's Wrong With Congress

- Under the category:
WORLDPOST:
Cities Need To Get Smarter -- And India's On It

- Under the category:
POLITICS:
The World Bank Must Commit to Food Security
Much will be said about bringing roads, electricity and infrastructure to underdeveloped regions. But how committed is the World Bank to the planet as a whole when it is doling out its loans?

- Under the category:
POLITICS:
Robbing Main Street to Prop Up Wall Street:  Why Jerry Brown's Rainy Day Fund Is a Bad Idea
There is no need to sequester funds urgently needed by Main Street to pay for Wall Street's malfeasance. Californians can have their cake and eat it too - with a state-owned bank.

- Under the category:
THE WORLDPOST:
Greece Demands IMF Explain 'Disaster' Remarks In Explosive Leak
A letter from Greek prime minister Alexis Tsipras questions whether the country "can trust" the lender.

- Under the category:
WORLD NEWS:
The Race To Replace Boris Johnson As Prime Minister Is Down To These 2 Candidates
On Sept. 5, either Rishi Sunak or Liz Truss will be announced as the U.K.'s new prime minister.

### Observations
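* The top ten results as reordered by Cohere Rerank do not include any mention of the Indian Finance Minister or the Manmohan Singh government.
* Consistent with this, the Cohere Reranking engine reported that the context did not contain the answer instead of falling back on general world knowledge.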

### Conclusions

* A news dataset was downloaded and a Q&A bot was built with Qdrant and LlamaIndex.

* A Recency postprocessor, Cohere Reranking postprocessor, and a combined Recency plus Cohere Reranking postprocessor were added.

* The Recency postprocessor may not be a good choice if data from a specific year is needed: the single most recent node it keeps may not contain the answer, and the engines using it were unable to answer the year-specific question.

* OpenAI answered some questions from its general world knowledge when the data was not available from the vector store.