# Project: Question-Answering using Retrieval Augmented Generation
by L.Arduini, D.N.Ghaneh, L.Menchini, C.Petruzzella

## Description
This project implements a QA chatbot leveraging language models hosted on a scalable server infrastructure. It provides embeddings to facilitate query-answering capabilities with advanced retrieval mechanisms.

## Instructions to Run

### Prerequisites
1. Python 3.10 or above.
2. Access to a runtime environment with GPU support (e.g., NVIDIA T4 on Google Colab) for optimal performance.

### Running the project
- Switch the runtime to GPU (e.g., NVIDIA T4) for enhanced performance

In [24]:
!pip install ir_datasets
!pip install rank_bm25
!pip install sentence_transformers
!pip install pytrec_eval
!pip install PyStemmer



In [25]:
from tqdm import tqdm
import json
import ir_datasets
import os
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd
from rank_bm25 import BM25Okapi
from transformers import pipeline
import torch
from huggingface_hub import login
import pytrec_eval
import collections
import itertools
import heapq

api_key = "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"
login(token=api_key)

# Check GPU availability
def get_device():
    if torch.cuda.is_available():
        device = "cuda"
        cuda_version = torch.version.cuda  # Retrieve CUDA version
        gpu_properties = torch.cuda.get_device_properties(torch.cuda.current_device())
        print(f"Using GPU: {gpu_properties.name}")
        print(f"CUDA Version: {cuda_version}")
        print(f"CUDA Cores: {gpu_properties.multi_processor_count}")
        print(f"Total Memory: {gpu_properties.total_memory / 1e9:.2f} GB")
        print(f"Compute Capability: {gpu_properties.major}.{gpu_properties.minor}")
    elif torch.backends.mps.is_available():
        device = "mps"
        print("Using MPS (Metal Performance Shaders)")
    else:
        device = "cpu"
        print("Using CPU")
    return device

device = get_device()


Using GPU: NVIDIA A100-SXM4-40GB
CUDA Version: 12.1
CUDA Cores: 108
Total Memory: 42.48 GB
Compute Capability: 8.0


# Section 1: Dataset loading and preparation

In [26]:
from functools import lru_cache
import re
import string
import Stemmer
import nltk
nltk.download("stopwords", quiet=True)

# ------- Pre Initialization -------
# 1. Compile regex patterns once globally
# 2. Preload stopwords set
# 3. Initialize stemmer

ACRONYM_REGEX = re.compile(r"(?<!\w)\.(?!\d)")
PUNCTUATION_TRANS = str.maketrans("", "", string.punctuation)
STOPWORDS = set(nltk.corpus.stopwords.words('english'))
STEMMER = Stemmer.Stemmer('english')

# Define a cached function to stem individual words
@lru_cache(maxsize=1000)
def stem(word):
    return STEMMER.stemWord(word)

# ----------------------------------

def preprocess(s):
    """
    Preprocess a string for indexing or querying.

    Args:
        s: The input string.

    Returns:
        A list of preprocessed tokens.
    """

    s = s.lower()
    s = s.replace("&", " and ")
    # normalize quotes and dashes
    s = s.translate(str.maketrans("‘’´“”–-", "'''\"\"--"))
    # remove unnecessary dots in acronyms (but not decimals)
    s = ACRONYM_REGEX.sub("", s)
    # remove punctuation
    s = s.translate(PUNCTUATION_TRANS)
    # strip and remove extra spaces
    s = " ".join(s.split())

    tokens = s.split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    tokens = STEMMER.stemWords(tokens)
    return tokens

In [27]:
# Load dataset
print("Loading the trec covid dataset...")
dataset = ir_datasets.load("cord19/trec-covid")

Loading the trec covid dataset...


In [28]:
import pandas as pd

# Convert the dataset to a pandas DataFrame for easier manipulation
df = pd.DataFrame(dataset.docs_iter(), columns=['doc_id', 'title', 'doi', 'date', 'abstract'])

# Check length of the dataset
print(f"Dataset length: {len(df)}")

# Check number of documents with duplicate abstracts
print(f"Number of documents with duplicate abstracts: {df['abstract'].duplicated().sum()}")

# Remove documents with empty or null abstracts
print("Removing documents with empty or null abstracts...")
data_cleaned = df[~df['abstract'].isnull() & (df['abstract'].str.strip() != '')]

# Remove documents with duplicate abstracts
print("Removing documents with duplicate abstracts...")
docs_dataset = df.drop_duplicates(subset='abstract')

# Check dataset length
print(f"Cleaned dataset length: {len(docs_dataset)}")

Dataset length: 192509
Number of documents with duplicate abstracts: 66793
Removing documents with empty or null abstracts...
Removing documents with duplicate abstracts...
Cleaned dataset length: 125716


In [29]:
# Prepare documents and queries
print("Preparing documents and queries...")

# put all documents and queries in a list of dictionaries
all_docs = []
for index, row in docs_dataset.iterrows():
    abstract = f"{row['title']} {row['abstract']}"
    all_docs.append({"doc_id": row['doc_id'], "abstract": abstract})

all_queries = []
for query in dataset.queries_iter():
    query_text = f"{query.description}"
    all_queries.append({"query_id": query.query_id, "title": query_text})

# Print dataset size information
print(f"Summary: {len(all_docs)} documents and {len(all_queries)} queries are available in the dataset.")

# Tokenize documents
tokenized_docs = [preprocess(doc) for doc in [docs["abstract"] for docs in all_docs]]
tokenized_queries = [preprocess(query) for query in [queries["title"] for queries in all_queries]]
print("Tokenization of documents is done.")

bm25 = BM25Okapi(tokenized_docs)

Preparing documents and queries...
Summary: 125716 documents and 50 queries are available in the dataset.
Tokenization of documents is done.


In [30]:
# convert qrels to a dictionary
qrels_dict = collections.defaultdict(dict)
for qrel in dataset.qrels_iter():
    qrels_dict[qrel.query_id][qrel.doc_id] = int(qrel.relevance)

# Section 2: Embeddings generation

In [31]:
# Load or generate embeddings
force_generate = False

def generate_embeddings():
    if not force_generate and os.path.exists("trec_covid_doc_embeddings.csv") and os.path.exists("trec_covid_query_embeddings.csv"):
        print("Loading precomputed embeddings...")
        doc_embeddings = pd.read_csv("trec_covid_doc_embeddings.csv").values
        query_embeddings = pd.read_csv("trec_covid_query_embeddings.csv").values
    else:
        print("No precomputed embeddings found.")
        print("Generating new embeddings using SentenceTransformer model 'sentence-transformers/all-MiniLM-L6-v2'.")
        model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=device)
        doc_embeddings = model.encode([doc["abstract"] for doc in all_docs], batch_size=32, show_progress_bar=True, normalize_embeddings=True)
        query_embeddings = model.encode([query['title'] for query in all_queries], batch_size=32, show_progress_bar=True, normalize_embeddings=True)

        # Save embeddings for future use
        pd.DataFrame(doc_embeddings).to_csv("trec_covid_doc_embeddings.csv", index=False)
        pd.DataFrame(query_embeddings).to_csv("trec_covid_query_embeddings.csv", index=False)

    return doc_embeddings, query_embeddings

doc_embeddings, query_embeddings = generate_embeddings()

No precomputed embeddings found.
Generating new embeddings using SentenceTransformer model 'sentence-transformers/all-MiniLM-L6-v2'.


Batches:   0%|          | 0/3929 [00:00<?, ?it/s]

Batches:   0%|          | 0/2 [00:00<?, ?it/s]

# Section 3: Retrieval implementation

In [32]:
# Function to prepare run data for pytrec_eval
def prepare_run_data(results):
    """
    Prepares the run data in the format expected by pytrec_eval.
    Converts numpy scores to native Python float for compatibility.
    """
    run = {}
    for query_results in results:
        query_id = query_results['query']['query_id']
        run[query_id] = {}
        for doc_id, score in zip(query_results['results'], query_results['scores']):
            run[query_id][doc_id] = float(score)  # Convert numpy type to float
    return run

### Document Retrieval Methods

1. **BM25 Sparse Retrieval**:
   - The **BM25 algorithm** is used to perform sparse retrieval on tokenized documents by calculating a relevance score for each document based on the query. It then returns the indices and relevance scores of the top-k most relevant documents.

2. **Dense Retrieval**:
   - **Dense retrieval** is performed by calculating the cosine similarity between the query embedding and the document embeddings. The top-k documents with the highest similarity scores are returned.

3. **Rank Fusion Retrieval**:
   - Results from both **BM25** and **dense retrieval** are combined using a **rank fusion** technique. Scores from both methods are normalized, weighted by a parameter `alpha`, and the top-k documents are returned based on the combined scores.

4. **Cascading Retrieval**:
   - Initially, a set of documents is retrieved using both Sparse and Dense Retrieval. Afterwards, a reranking step is made using a Reranker Model.

In [33]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

cross_encoder_model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2').to("cuda")
cross_encoder_tokenizer = AutoTokenizer.from_pretrained('cross-encoder/ms-marco-MiniLM-L-6-v2')

In [34]:
from scipy.stats import zscore

# BM25 Sparse Retrieval
def bm25_retrieve(query, bm25, top_k=5):
    """
    Perform sparse retrieval using BM25 on the tokenized documents.
    Returns the indices and scores of the top-k documents.
    """
    tokenized_query = preprocess(query)                                     # Tokenize the query into words
    scores = bm25.get_scores(tokenized_query)                                   # Get BM25 scores for all documents
    top_k_indices = np.argsort(scores)[-top_k:][::-1]                           # Get indices of top-k documents based on BM25 score
    return top_k_indices, scores[top_k_indices]

# Dense Retrieval
def dense_retrieve(query_embedding, doc_embeddings, top_k=5):
    """
    Perform dense retrieval using cosine similarity between query and document embeddings.
    Returns the indices and similarities of the top-k documents.
    """
    similarities = cosine_similarity([query_embedding], doc_embeddings)[0]      # Compute cosine similarity
    top_k_indices = np.argsort(similarities)[-top_k:][::-1]                     # Get top-k indices based on similarity
    return top_k_indices, similarities[top_k_indices]

def combsum_fusion(dense_indices, dense_scores, sparse_indices, sparse_scores, top_k=5):
    # Combine scores using CombSUM
    all_doc_ids = np.concatenate((sparse_indices, dense_indices))
    all_scores = np.concatenate((sparse_scores, dense_scores))
    combined_scores = collections.defaultdict(float)
    for doc_id, score in zip(all_doc_ids, all_scores):
        combined_scores[doc_id] += score

    # Retrieve top-k documents based on combined scores
    top_docs = heapq.nlargest(top_k, combined_scores.items(), key=lambda x: x[1])

    # return top k indices and scores
    return [doc[0] for doc in top_docs], [doc[1] for doc in top_docs]

def neural_rerank(query_text, dense_indices, dense_scores, sparse_indices, sparse_scores, top_k=5):

    doc_ids = np.concatenate((sparse_indices, dense_indices))
    documents = []
    for doc in doc_ids:
        documents.append(all_docs[doc]['abstract'])
    features = cross_encoder_tokenizer([query_text]*len(documents), documents, padding=True, truncation=True, return_tensors="pt").to("cuda")
    with torch.no_grad():
      scores = cross_encoder_model(**features).logits

    # Rerank the documents by scores
    doc_scores = {doc_id: score.item() for doc_id, score in zip(doc_ids, scores)}
    reranked_doc_scores = dict(sorted(doc_scores.items(), key=lambda item: item[1], reverse=True))

    # return top indices and top scores
    return list(reranked_doc_scores.keys())[:top_k], list(reranked_doc_scores.values())[:top_k]

This section of code performs several retrieval experiments using the four different Document Retrieval Methods described earlier.

In [35]:
# Run retrieval experiments
def run_retrieval_experiments():
    """
    Execute sparse, dense, rank fusion, and cascading retrieval for all queries.
    Save the results to a JSON file for further analysis.
    """
    results = {"sparse": [], "dense": [], "rank_fusion": [], "cascade": []}

    print("Running retrieval experiments on all queries.")

    # Iterate over each query and its embedding
    for query, query_embedding in tqdm(zip(all_queries, query_embeddings), total=len(all_queries)):
        # Extract the query ID and text for the current query
        query_id = query['query_id']
        query_text = query['title']

        # Sparse Retrieval using BM25
        sparse_indices, sparse_scores = bm25_retrieve(query_text, bm25)                 # Retrieve the top-k BM25 documents and their scores
        sparse_docs = [all_docs[idx]['doc_id'] for idx in sparse_indices]               # Get document IDs from the indices

        # Dense Retrieval using cosine similarity
        dense_indices, dense_scores = dense_retrieve(query_embedding, doc_embeddings)   # Retrieve the top-k documents based on cosine similarity of embeddings
        dense_docs = [all_docs[idx]['doc_id'] for idx in dense_indices]

        # Normalize scores
        sparse_scores = zscore(sparse_scores)
        dense_scores = zscore(dense_scores)
        results["sparse"].append({"query": query, "results": sparse_docs, "scores": sparse_scores}) # Store the BM25 results for the current query
        results["dense"].append({"query": query, "results": dense_docs, "scores": dense_scores})

        # Rank Fusion Retrieval by combining sparse (BM25) and dense result
        fusion_indices, fusion_scores = combsum_fusion(dense_indices, dense_scores, sparse_indices, sparse_scores)
        fusion_docs = [all_docs[idx]['doc_id'] for idx in fusion_indices]
        results["rank_fusion"].append({"query": query, "results": fusion_docs, "scores": fusion_scores})

        # Cascade Retrieval: compute sparse and dense retrieval, then use reranker
        cascade_indices, cascade_scores = neural_rerank(query_text, dense_indices, dense_scores, sparse_indices, sparse_scores)
        cascade_docs = [all_docs[idx]['doc_id'] for idx in cascade_indices]
        results["cascade"].append({"query": query, "results": cascade_docs, "scores": cascade_scores})
    return results

results = run_retrieval_experiments()


Running retrieval experiments on all queries.


100%|██████████| 50/50 [00:29<00:00,  1.72it/s]


In [36]:
run_sparse = prepare_run_data(results["sparse"])
run_dense = prepare_run_data(results["dense"])
run_rank_fusion = prepare_run_data(results["rank_fusion"])
run_cascade = prepare_run_data(results["cascade"])

# Evaluate results with pytrec_eval
evaluator = pytrec_eval.RelevanceEvaluator(qrels_dict, {'recall.5', 'ndcg_cut.5'})
eval_results_sparse = evaluator.evaluate(run_sparse)
eval_results_dense = evaluator.evaluate(run_dense)
eval_results_rank_fusion = evaluator.evaluate(run_rank_fusion)
eval_results_cascade = evaluator.evaluate(run_cascade)

# Aggregate metrics for overall performance
aggregated_results = {
    "sparse": {
        metric: sum([res[metric] for res in eval_results_sparse.values()]) / len(eval_results_sparse)
        for metric in eval_results_sparse[next(iter(eval_results_sparse))]
    },
    "dense": {
        metric: sum([res[metric] for res in eval_results_dense.values()]) / len(eval_results_dense)
        for metric in eval_results_dense[next(iter(eval_results_dense))]
    },
    "rank_fusion": {
        metric: sum([res[metric] for res in eval_results_rank_fusion.values()]) / len(eval_results_rank_fusion)
        for metric in eval_results_rank_fusion[next(iter(eval_results_rank_fusion))]
    },
    "cascade": {
        metric: sum([res[metric] for res in eval_results_cascade.values()]) / len(eval_results_cascade)
        for metric in eval_results_cascade[next(iter(eval_results_cascade))]
    }
}

print("Aggregated results:", json.dumps(aggregated_results, indent=4))
print("Retrieval results and metrics saved to files.")

Aggregated results: {
    "sparse": {
        "recall_5": 0.008623629150449596,
        "ndcg_cut_5": 0.6805147106092604
    },
    "dense": {
        "recall_5": 0.008256141329265207,
        "ndcg_cut_5": 0.6636285607092102
    },
    "rank_fusion": {
        "recall_5": 0.008793696779750117,
        "ndcg_cut_5": 0.7045910900616086
    },
    "cascade": {
        "recall_5": 0.010368315231514647,
        "ndcg_cut_5": 0.7947962701630851
    }
}
Retrieval results and metrics saved to files.


# Section 4: QA with Language Model

In [37]:
# QA for the first query
QUERY_INDEX = 3                                                     # Index of the query to be used for retrieval
query = all_queries[QUERY_INDEX - 1]                                # Select the query from the list based on the index
query_text = query['title'] if isinstance(query, dict) else query   # Get the query text

# Perform dense retrieval using query embedding and document embeddings
dense_top_k_indices, dense_top_k_scores = dense_retrieve(query_embeddings[QUERY_INDEX-1], doc_embeddings)
# Perform sparse retrieval using BM25 on the query text
sparse_top_k_indices, sparse_top_k_scores = bm25_retrieve(query_text, bm25)
# Perform rank fusion retrieval by combining BM25 and dense retrieval results
rank_top_k_indices, rank_top_k_scores = combsum_fusion(dense_top_k_indices, dense_top_k_scores, sparse_top_k_indices, sparse_top_k_scores)
# Perform cascading retrieval: first BM25, then re-rank with dense retrieval
cascading_top_k_indices, cascading_top_k_scores = neural_rerank(query_text, dense_top_k_indices, dense_top_k_scores, sparse_top_k_indices, sparse_top_k_scores)

# Get retrieved documents for each method
dense_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(dense_top_k_indices)]
sparse_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(sparse_top_k_indices)]
rank_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(rank_top_k_indices)]
cascading_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(cascading_top_k_indices)]

# Definition of the model that will be used to generate the various responses.
lm_pipeline = pipeline("text-generation",
                      model="meta-llama/Llama-3.2-3B",
                      device=0 if device == "cuda" else -1)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda:0


In [38]:
INSTRUCTIONS = "Answer the user's QUESTION using the CONTEXT text above in a clear and conversational tone. Keep your answer ground in the facts of the CONTEXT. Avoid structured formats. If the DOCUMENT doesn’t contain the facts to answer the QUESTION return {NONE}"
ANSWER = "Answer:\n"

def build_prompt(query_text, retrieved_docs):
  context = "\n".join(retrieved_docs)
  prompt = f"CONTEXT:\n{context}\n\nQUESTION:\n{query_text}\n\nINSTRUCTIONS:\n{INSTRUCTIONS}\n\n{ANSWER}"
  return prompt

#### Question-answering using DENSE RETRIEVAL

In [39]:
print("------------------ DENSE RETRIEVAL ----------------------\n")
prompt = build_prompt(query_text, dense_retrieved_docs)

print(f"----------------- Length of the prompt -----------------\n{len(prompt.split())} words")
print(f"------------------------ Prompt ------------------------\n{prompt}")

# Generate response
response = lm_pipeline(prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

response = response.split(prompt)[1].strip()
print(f"------------------ Response ------------------\n{response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


------------------ DENSE RETRIEVAL ----------------------

----------------- Length of the prompt -----------------
350 words
------------------------ Prompt ------------------------
CONTEXT:
Document 1: Dissecting antibody-mediated protection against SARS-CoV-2 Understanding the properties and mechanisms by which antibodies provide protection is essential to defining immunity. Although neutralizing antibodies have been proposed as a potential key mechanism of protection against many viral pathogens, antibodies mediate additional immune functions that may have both protective and pathological consequences. Dissecting these properties against SARS-CoV-2 is likely necessary for defining metrics of immunity that will inform the design of vaccines and therapeutics and improve clinical management.
Document 2: Reply to McDonald, “Protections against the Risk of Airborne SARS-CoV-2 Infection” We thank Dr McDonald ([1][1]) for his close reading of our paper ([2][2]) and acknowledge that he mak

#### Question-answering using SPARSE RETRIEVAL

In [40]:
print("------------------ SPARSE RETRIEVAL ----------------------\n")
prompt = build_prompt(query_text, sparse_retrieved_docs)

print(f"----------------- Length of the prompt -----------------\n{len(prompt.split())} words")
print(f"------------------------ Prompt ------------------------\n{prompt}")

# Generate response
response = lm_pipeline(prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

response = response.split(prompt)[1].strip()
print(f"------------------ Response ------------------\n{response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


------------------ SPARSE RETRIEVAL ----------------------

----------------- Length of the prompt -----------------
929 words
------------------------ Prompt ------------------------
CONTEXT:
Document 1: COVID‐19 is milder in children possibly due to cross immunity It has been unclear why the new severe acute respiratory syndrome coronavirus (sars‐CoV‐2) hits a small minority hard, while the vast majority of children appear to be protected and develop mild or no disease (1,2). The editorial by Brodin suggests some possible mechanisms why it is so (1). I would like to emphasize the significance of cross immunity due to previous exposure to seasonal coronavirus; it may be a plausible explanation for why children appear to be protected (2,3).
Document 2: Recombinant tandem epitope vaccination provides cross protection against Actinobacillus pleuropneumoniae challenge in mice Actinobacillus pleuropneumoniae (A. pleuropneumoniae/APP) is the pathogen that causes porcine contagious pleuropne

#### Question-answering using RANK FUSION

In [41]:
print("------------------ RANK FUSION ----------------------\n")
prompt = build_prompt(query_text, rank_retrieved_docs)

print(f"----------------- Length of the prompt -----------------\n{len(prompt.split())} words")
print(f"------------------------ Prompt ------------------------\n{prompt}")

# Generate response
response = lm_pipeline(prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

response = response.split(prompt)[1].strip()
print(f"------------------ Response ------------------\n{response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


------------------ RANK FUSION ----------------------

----------------- Length of the prompt -----------------
929 words
------------------------ Prompt ------------------------
CONTEXT:
Document 1: COVID‐19 is milder in children possibly due to cross immunity It has been unclear why the new severe acute respiratory syndrome coronavirus (sars‐CoV‐2) hits a small minority hard, while the vast majority of children appear to be protected and develop mild or no disease (1,2). The editorial by Brodin suggests some possible mechanisms why it is so (1). I would like to emphasize the significance of cross immunity due to previous exposure to seasonal coronavirus; it may be a plausible explanation for why children appear to be protected (2,3).
Document 2: Recombinant tandem epitope vaccination provides cross protection against Actinobacillus pleuropneumoniae challenge in mice Actinobacillus pleuropneumoniae (A. pleuropneumoniae/APP) is the pathogen that causes porcine contagious pleuropneumoni

#### Question-answering using CASCADING RETRIEVAL

In [42]:
print("------------------ CASCADING RETRIEVAL ----------------------\n")
prompt = build_prompt(query_text, cascading_retrieved_docs)

print(f"----------------- Length of the prompt -----------------\n{len(prompt.split())} words")
print(f"------------------------ Prompt ------------------------\n{prompt}")

# Generate response
response = lm_pipeline(prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

response = response.split(prompt)[1].strip()
print(f"------------------ Response ------------------\n{response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


------------------ CASCADING RETRIEVAL ----------------------

----------------- Length of the prompt -----------------
602 words
------------------------ Prompt ------------------------
CONTEXT:
Document 1: COVID‐19 is milder in children possibly due to cross immunity It has been unclear why the new severe acute respiratory syndrome coronavirus (sars‐CoV‐2) hits a small minority hard, while the vast majority of children appear to be protected and develop mild or no disease (1,2). The editorial by Brodin suggests some possible mechanisms why it is so (1). I would like to emphasize the significance of cross immunity due to previous exposure to seasonal coronavirus; it may be a plausible explanation for why children appear to be protected (2,3).
Document 2: Children's vaccines do not induce cross reactivity against SARS-CoV. In contrast with adults, children infected by severe acute respiratory syndrome-corona virus (SARS-CoV) develop milder clinical symptoms. Because of this, it is spec

#### Question-answering WITH NO CONTEXT PROVIDED WITH RAG

In [43]:
print("------------------ RESPONSE WITHOUT RAG ----------------------\n")
prompt = f"""Question:\n{query_text}\n\nAnswer in a concise and clear manner without repetition (if no direct answer, provide a general summary):"""

print(f"----------------- Length of the prompt -----------------\n{len(prompt.split())} words")
print(f"------------------------ Prompt ------------------------\n{prompt}")

response = lm_pipeline(prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

response = response.split("Answer in a concise and clear manner without repetition (if no direct answer, provide a general summary):")[1].strip()
print(f"------------------ Response ------------------\n{response}")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


------------------ RESPONSE WITHOUT RAG ----------------------

----------------- Length of the prompt -----------------
28 words
------------------------ Prompt ------------------------
Question:
will SARS-CoV2 infected people develop immunity? Is cross protection possible?

Answer in a concise and clear manner without repetition (if no direct answer, provide a general summary):
------------------ Response ------------------



In [44]:
import random

picked_queries = random.sample(all_queries, 5)

for q in picked_queries:

    # For each query, retrieve and rank documents independently
    query_text = q['title']
    cascading_top_k_indices, cascading_top_k_scores = neural_rerank(query_text, dense_top_k_indices, dense_top_k_scores, sparse_top_k_indices, sparse_top_k_scores)

    # Use the top-k documents for that specific query
    cascading_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(cascading_top_k_indices)]
    cascading_context = "\n".join(cascading_retrieved_docs)

    # Repeat the process for rank fusion
    rank_top_k_indices, rank_top_k_scores = combsum_fusion(dense_top_k_indices, dense_top_k_scores, sparse_top_k_indices, sparse_top_k_scores)
    rank_retrieved_docs = [f"Document {i+1}: {all_docs[idx]['abstract']}" for i, idx in enumerate(rank_top_k_indices)]
    rank_fusion_context = "\n".join(rank_retrieved_docs)

    cascading_prompt = build_prompt(query_text, cascading_retrieved_docs)
    rank_fusion_prompt = build_prompt(query_text, rank_retrieved_docs)
    # cascading_prompt = f"Context:\n{cascading_context}\n\nQuestion:\n{query_text}\n\nAnswer in a concise and clear manner without repetition (if no direct answer, provide a general summary):"
    # rank_fusion_prompt = f"Context:\n{rank_fusion_context}\n\nQuestion:\n{query_text}\n\nAnswer in a concise and clear manner without repetition (if no direct answer, provide a general summary):"

    # Generate response using language model
    cascading_response = lm_pipeline(cascading_prompt,
                           max_new_tokens=150,
                           temperature=0.3,
                           truncation=False)[0]["generated_text"]

    rank_fusion_response = lm_pipeline(rank_fusion_prompt,
                      max_new_tokens=150,
                      temperature=0.3,
                      truncation=False)[0]["generated_text"]

    # Extract the answer from the response
    cascading_response = cascading_response.split(cascading_prompt)[1].strip()
    rank_fusion_response = rank_fusion_response.split(rank_fusion_prompt)[1].strip()

    # Print the results
    print("\n------------------------------")
    print(f"QUERY: {query_text}")
    print(f"CASCADING RESPONSE: {cascading_response}")
    print(f"RANK FUSION RESPONSE: {rank_fusion_response}")
    print("------------------------------\n")


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



------------------------------
QUERY: What are the impacts of COVID-19 among African-Americans that differ from the rest of the U.S. population?
CASCADING RESPONSE: The COVID-19 pandemic has disproportionately impacted African Americans in the United States. African Americans are more likely to be hospitalized and die from COVID-19 than other racial and ethnic groups. This is likely due to several factors, including higher rates of underlying medical conditions, such as diabetes and heart disease, that increase the risk of severe illness from COVID-19. Additionally, African Americans are more likely to live in crowded and underserved communities, which can increase the risk of exposure to the virus. Finally, African Americans are more likely to have jobs that require them to work in close proximity to others, which can increase the risk of exposure to the virus.
RANK FUSION RESPONSE: Document 1: COVID‐19 is milder in children possibly due to cross immunity It has been unclear why the 

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



------------------------------
QUERY: What is the mechanism of cytokine storm syndrome on the COVID-19?
CASCADING RESPONSE: The cytokine storm syndrome is a severe inflammatory response that is associated with COVID-19. It is characterized by an overproduction of pro-inflammatory cytokines, such as interleukin-6 (IL-6), tumor necrosis factor-α (TNF-α), and interferon-γ (IFN-γ). These cytokines can cause a wide range of symptoms, including fever, cough, shortness of breath, and fatigue. The cytokine storm syndrome is thought to be caused by a combination of factors, including the virus itself, the host immune response, and the presence of other underlying conditions. The cytokine storm syndrome can be fatal, and it is important to identify and treat it early in order to improve the prognosis of
RANK FUSION RESPONSE: The cytokine storm syndrome is a severe immune response to a pathogen. The cytokine storm syndrome is a severe immune response to a pathogen. The cytokine storm syndrome is

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



------------------------------
QUERY: what are the mortality rates overall and in specific populations
CASCADING RESPONSE: The mortality rate of COVID-19 is 3.4% in the general population and 8.9% in the elderly population. The mortality rate of COVID-19 in the general population is 3.4% and the mortality rate of COVID-19 in the elderly population is 8.9%. The mortality rate of COVID-19 in the general population is 3.4% and the mortality rate of COVID-19 in the elderly population is 8.9%. The mortality rate of COVID-19 in the general population is 3.4% and the mortality rate of COVID-19 in the elderly population is 8.9%. The mortality rate of COVID-19 in the general population is 3.4%
RANK FUSION RESPONSE: Document 1: COVID‐19 is milder in children possibly due to cross immunity It has been unclear why the new severe acute respiratory syndrome coronavirus (sars‐CoV‐2) hits a small minority hard, while the vast majority of children appear to be protected and develop mild or no disease 

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



------------------------------
QUERY: How does the coronavirus differ from seasonal flu?
CASCADING RESPONSE: The coronavirus is a different virus than the seasonal flu. The coronavirus is a new virus that has been identified as the cause of COVID-19. The seasonal flu is caused by influenza viruses, which are a group of viruses that can cause infections in the nose, throat, and lungs. The coronavirus is a new virus that has not been identified before, and it is not known how it will behave in humans. The seasonal flu is caused by influenza viruses, which are a group of viruses that can cause infections in the nose, throat, and lungs. The coronavirus is a new virus that has not been identified before, and it is not known how it will behave in humans. The seasonal flu is caused by influenza viruses, which are a group of viruses
RANK FUSION RESPONSE: The coronavirus differs from seasonal flu in that it is more contagious and more deadly. The coronavirus is also more difficult to treat and

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



------------------------------
QUERY: what evidence is there for dexamethasone as a treatment for COVID-19?
CASCADING RESPONSE: Dexamethasone is a steroid that has been used for decades to treat a variety of inflammatory conditions. It is also used to treat COVID-19. The most recent study on dexamethasone was published in the New England Journal of Medicine in June 2020. The study looked at the effects of dexamethasone on patients who were hospitalized with COVID-19. The study found that dexamethasone reduced the risk of death in patients who were on ventilators or oxygen therapy. The study also found that dexamethasone did not help patients who were not on ventilators or oxygen therapy. The study was conducted in the United Kingdom and included patients from 53 hospitals. The study found that dexam
RANK FUSION RESPONSE: The evidence for dexamethasone as a treatment for COVID-19 is based on the following studies:
1. Dexamethasone is a glucocorticoid that has been used for decades to t