### Introduction
This notebook presents a RAG workflow for the [PubMed QA](https://pubmedqa.github.io/) task using [LlamaIndex](https://www.llamaindex.ai/). The code is written in a configurable fashion, giving you the flexibility to edit the RAG configuration and observe the change in output/responses.

It covers a step-by-step procedure for building the RAG workflow (Stages 1-4) and later runs the pipeline on a sample from the dataset. The notebook also covers sparse, dense and hybrid retrieval strategies, along with the re-ranker. We have also added an optional component for RAG evaluation using the [Ragas](https://docs.ragas.io/en/stable/getstarted/install.html) library.

- This version applies the notebook to AML (anti-money laundering) adverse media news data: Personal and Commercial.

### Import libraries, custom classes and functions

In [1]:
import pandas as pd
from pathlib import Path
from pprint import pprint
import sys
import os
import random

from llama_index.core import ServiceContext, set_global_service_context, set_global_handler
from llama_index.core.node_parser import SentenceSplitter

# from task_dataset import PubMedQATaskDataset    # for the vector data

from utils.hosting_utils import RAGLLM
from utils.rag_utils import (
    DocumentReader, RAGEmbedding, RAGQueryEngine, RagasEval, 
    extract_yes_no, evaluate, validate_rag_cfg
    )
from utils.storage_utils import RAGIndex

In [2]:
import warnings
warnings.filterwarnings('ignore')

### Set RAG configuration

In [3]:
rag_cfg = {
    # Node parser config
    "chunk_size": 256,
    "chunk_overlap": 10,

    # Embedding model config
    "embed_model_type": "hf",
    "embed_model_name": "BAAI/bge-base-en-v1.5", # Daniel: https://huggingface.co/spaces/mteb/leaderboard - Token size: 512

    # LLM config
    "llm_type": "local",
    "llm_name": "Mistral-7B-v0.1", #"Llama-2-7b-chat-hf",    # Daniel: change it to 13b  - 
    "max_new_tokens": 256,
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": 50,
    "do_sample": False,

    # Vector DB config
    "vector_db_type": "weaviate", # "weaviate"
    "vector_db_name": "Daniel_AML",  # Daniel: "Pubmed_QA",     
    
    # MODIFY THIS (Daniel: changed)
    #"weaviate_url": "https://rag-bootcamp-pubmed-qa-n3u138r8.weaviate.network", 
    "weaviate_url": "https://vector-rag-bootcamp-fqe3dp7f.weaviate.network",

    # Retriever and query config
    "retriever_type": "vector_index", # "vector_index"
    "retriever_similarity_top_k": 5,   # 5
    "query_mode": "hybrid", # "default", "hybrid"
    "hybrid_search_alpha": 0.0, # float from 0.0 (sparse search - bm25) to 1.0 (vector search)
    "response_mode": "compact",
    "use_reranker": False,
    "rerank_top_k": 3,

    # Evaluation config
    "eval_llm_type": "openai",
    "eval_llm_name": "gpt-3.5-turbo",
}

### Read secrets

#### Weaviate Key

In [4]:
try:
    f = open(Path.home() / ".weaviate.key", "r")
    f.close()
except Exception as err:
    print(f"Could not read your Weaviate key. Please make sure this is available in plain text under your home directory in ~/.weaviate.key: {err}")

#### Cohere API Key (required for re-ranker)

In [5]:
try:
    f = open(Path.home() / ".cohere.key", "r")
    os.environ["COHERE_API_KEY"] = f.read().rstrip("\n")
    f.close()
except Exception as err:
    print(f"Could not read your Cohere API key. Please make sure this is available in plain text under your home directory in ~/.cohere.key: {err}")

#### OpenAI API Key [Optional]

In [6]:
try:
    f = open(Path.home() / ".openai.key", "r")
    os.environ["OPENAI_API_KEY"] = f.read().rstrip("\n")
    f.close()
except Exception as err:
    print(f"Could not read your OpenAI API key. If you wish to run RAG evaluation, please make sure this is available in plain text under your home directory in ~/.openai.key: {err}")
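
The three cells above repeat the same read-a-key-file pattern. As a minimal sketch, the pattern could be factored into one helper. The name `load_api_key` is hypothetical and is not part of the `utils` package; note also that it strips all surrounding whitespace, whereas the cells above strip only trailing newlines.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(key_file: Path, env_var: Optional[str] = None) -> Optional[str]:
    """Read a plain-text key file and optionally export it as an environment variable.

    Hypothetical helper for illustration. Returns None if the file cannot be read.
    """
    try:
        key = key_file.read_text().strip()
    except OSError as err:
        print(f"Could not read {key_file}: {err}")
        return None
    if env_var is not None:
        os.environ[env_var] = key
    return key
```

For example, `load_api_key(Path.home() / ".cohere.key", "COHERE_API_KEY")` would mirror the Cohere cell.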

## STAGE 0 - Preliminary config checks

In [7]:
validate_rag_cfg(rag_cfg)
pprint(rag_cfg)

{'chunk_overlap': 10,
 'chunk_size': 256,
 'do_sample': False,
 'embed_model_name': 'BAAI/bge-base-en-v1.5',
 'embed_model_type': 'hf',
 'eval_llm_name': 'gpt-3.5-turbo',
 'eval_llm_type': 'openai',
 'hybrid_search_alpha': 0.0,
 'llm_name': 'Mistral-7B-v0.1',
 'llm_type': 'local',
 'max_new_tokens': 256,
 'query_mode': 'hybrid',
 'rerank_top_k': 3,
 'response_mode': 'compact',
 'retriever_similarity_top_k': 5,
 'retriever_type': 'vector_index',
 'temperature': 0.0,
 'top_k': 50,
 'top_p': 1.0,
 'use_reranker': False,
 'vector_db_name': 'Daniel_AML',
 'vector_db_type': 'weaviate',
 'weaviate_url': 'https://vector-rag-bootcamp-fqe3dp7f.weaviate.network'}


## STAGE 1 - Load dataset and documents

#### 1. Load PubMed QA dataset
PubMedQA ([GitHub](https://github.com/pubmedqa/pubmedqa)) is a biomedical question answering dataset. Each instance consists of a question, a context (extracted from PubMed abstracts), a long answer and a yes/no/maybe answer. We make use of the test split of [this](https://huggingface.co/datasets/bigbio/pubmed_qa) Hugging Face dataset for this notebook.

**The context for each instance is stored as a text file** (referred to as documents), to align the task as a standard RAG use-case.

- Daniel: Change to AML adverse media data: text and PDFs
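
To make the "one text file per instance" idea concrete, here is a minimal sketch that writes each record's text to its own file. The function name `write_knowledge_base` and the file-naming scheme are assumptions for illustration; this is not necessarily how the files under `./AML_Data` were produced.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def write_knowledge_base(rows: Iterable[Dict[str, str]], output_dir: Path,
                         id_key: str, text_key: str) -> List[Path]:
    """Write one UTF-8 text file per row, named after the row's identifier."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in rows:
        path = output_dir / f"{row[id_key]}.txt"
        path.write_text(row[text_key], encoding="utf-8")
        written.append(path)
    return written
```

With a pandas DataFrame, the rows could be obtained with `df.to_dict("records")`, using `id_key="alert_identifier"` and `text_key="suspicious_activity"`.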

In [8]:
file_path = './AML_Data/'

In [9]:
file_name = 'personal_data.csv'

In [10]:
df = pd.read_csv(file_path + file_name)

In [11]:
print(df.shape)
df.head()

(324, 4)


|   | alert_identifier | customer_name | suspicious_activity | predicate_offense |
|---|---|---|---|---|
| 0 | TMML20240342768 | Sam Waksal | Alert ID: TMML20240342768\nBernie Madoff, who ... | fraud |
| 1 | TMML202403475910 | Mark Denning | Alert ID: TMML202403475910\nPublished\n\nOne o... | broke investment rules |
| 2 | TMML202403405311 | Russell Wasendorf Sr | Alert ID: TMML202403405311\nPublished\n\nThe f... | pleads guilty to fraud |
| 3 | TMML202403479017 | Charlie Shrem | Alert ID: TMML202403479017\nA senior figure in... | arrested for money launering |
| 4 | TMML202403436919 | Shane Whittle | Alert ID: TMML202403436919\nRohan Marley is no... | sanction against smn |


In [12]:
# print('Loading PubMed QA data ...')
# pubmed_data = PubMedQATaskDataset('bigbio/pubmed_qa')
# print(f"Loaded data size: {len(pubmed_data)}")
# pubmed_data.mock_knowledge_base(output_dir='./data', one_file_per_sample=True)

#### 2. Load documents
All metadata is excluded by default. Set the *exclude_llm_metadata_keys* and *exclude_embed_metadata_keys* flags to *False* to include it. Please refer to [this](https://docs.llamaindex.ai/en/stable/module_guides/loading/documents_and_nodes/usage_documents.html) and the *DocumentReader* class from *rag_utils.py* for further details.
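
As a rough illustration of what excluding metadata means, the sketch below builds the text a model would see from a document's body plus whichever metadata entries are not excluded. The function `render_for_model` and its output format are assumptions made for this example; they are not LlamaIndex's actual implementation.

```python
from typing import Dict, Iterable


def render_for_model(text: str, metadata: Dict[str, str], excluded_keys: Iterable[str]) -> str:
    """Prepend the metadata entries that are not excluded to the document text."""
    excluded = set(excluded_keys)
    visible = [f"{key}: {value}" for key, value in metadata.items() if key not in excluded]
    return "\n".join(visible + [text])


metadata = {"file_name": "alert_001.txt", "source": "news"}
# Excluding every key leaves only the raw text, which is the notebook's default behaviour.
print(render_for_model("Body of the article.", metadata, excluded_keys=metadata.keys()))
```

Keeping separate exclusion lists for the LLM and for the embedding model allows, for instance, a file name to influence retrieval without appearing in the prompt.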

In [13]:
print('Loading text documents ...')
# reader = DocumentReader(input_dir="./data/pubmed_doc")
reader = DocumentReader(input_dir="./AML_Data/Personal/TEXT")
docs = reader.load_data()
print(f'No. of documents loaded: {len(docs)}')

Loading text documents ...
No. of documents loaded: 324


In [14]:
docs[0].text[:2000]

'Name: Alert ID: TMML2024031070219\nA shadowy network of South Florida properties worth tens of millions of dollars and revealed in the Panama Papers could become a campaign issue in Argentina as former president Cristina Fernández de Kirchner makes her political comeback while fighting corruption indictments.\n\nFernández de Kirchner is running for Argentina’s Senate in an Oct. 22 election but is opposed by the party of current president Mauricio Macri.\n\nNow, the nation’s top anti-corruption official, Laura Alonso, has made a stunning claim on national television, saying Fernández de Kirchner owns more than 60 properties in Miami bought with “dirty money.” Alonso said investigators had linked the properties to a top aide to Fernández de Kirchner’s husband, Néstor Kirchner, who preceded her as president.\n\nLast year, a Miami Herald investigation found that companies linked to the aide had scooped up nearly $70 million worth of real estate in South Florida and New York.\n\nFernández 

## STAGE 2 - Load node parser, embedding, LLM and set service context

#### 1. Load node parser to split documents into smaller chunks
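
The roles of `chunk_size` and `chunk_overlap` can be sketched with a simple sliding window. This is a simplified stand-in: `SentenceSplitter` counts tokens and prefers sentence boundaries, whereas the hypothetical `chunk_words` function below just counts whitespace-separated words.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of `chunk_size` words that share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_words("a b c d e f g", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g']
```

A small overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.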

In [15]:
print('Loading node parser ...')
node_parser = SentenceSplitter(chunk_size=rag_cfg['chunk_size'], chunk_overlap=rag_cfg['chunk_overlap'])
nodes = node_parser.get_nodes_from_documents(docs)

Loading node parser ...


In [16]:
nodes[0].text

'Name: Alert ID: TMML2024031070219\nA shadowy network of South Florida properties worth tens of millions of dollars and revealed in the Panama Papers could become a campaign issue in Argentina as former president Cristina Fernández de Kirchner makes her political comeback while fighting corruption indictments.\n\nFernández de Kirchner is running for Argentina’s Senate in an Oct. 22 election but is opposed by the party of current president Mauricio Macri.\n\nNow, the nation’s top anti-corruption official, Laura Alonso, has made a stunning claim on national television, saying Fernández de Kirchner owns more than 60 properties in Miami bought with “dirty money.” Alonso said investigators had linked the properties to a top aide to Fernández de Kirchner’s husband, Néstor Kirchner, who preceded her as president.\n\nLast year, a Miami Herald investigation found that companies linked to the aide had scooped up nearly $70 million worth of real estate in South Florida and New York.\n\nFernández 

#### 2. Load embedding model
LlamaIndex supports embedding models from OpenAI, Cohere, HuggingFace, etc. Please refer to [this](https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#custom-embedding-model) for building a custom embedding model.

In [17]:
embed_model = RAGEmbedding(model_type=rag_cfg['embed_model_type'], model_name=rag_cfg['embed_model_name']).load_model()

Loading hf embedding model ...


#### 3. Load LLM for generation
LlamaIndex supports LLMs from OpenAI, Cohere, HuggingFace, AI21, etc. Please refer to [this](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html#example-using-a-custom-llm-model-advanced) for loading a custom LLM model for generation.

In [18]:
llm = RAGLLM(rag_cfg['llm_type'], rag_cfg['llm_name']).load_model(**rag_cfg)

Loading local LLM model ...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

#### 4. Use service context to set the node parser, embedding model, LLM, etc.

In [19]:
service_context = ServiceContext.from_defaults(
    node_parser=node_parser,
    embed_model=embed_model,
    llm=llm,
)
# Set it globally to avoid passing it to every class; this also applies it inside rag_utils.py
set_global_service_context(service_context)

## STAGE 3 - Create index using the appropriate vector store
All vector stores supported by LlamaIndex along with their available features are listed [here](https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores.html).

If you are using LangChain, the supported vector stores can be found [here](https://python.langchain.com/docs/modules/data_connection/vectorstores/).

In [20]:
index = RAGIndex(db_type=rag_cfg['vector_db_type'], db_name=rag_cfg['vector_db_name'])\
    .create_index(docs, weaviate_url=rag_cfg["weaviate_url"])

Loading index from ./.weaviate_index_store/ ...


## STAGE 4 - Build query engine

Now build a query engine using *retriever* and *response_synthesizer*. LlamaIndex also supports different types of [retrievers](https://docs.llamaindex.ai/en/stable/api_reference/query/retrievers.html) and [response modes](https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/root.html#configuring-the-response-mode) for various use-cases.

[Weaviate hybrid search](https://weaviate.io/blog/hybrid-search-explained) explains how dense and sparse search are combined.
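
To build intuition for `hybrid_search_alpha`, here is a minimal sketch of a weighted blend of sparse (BM25-style) and dense (vector) scores. It is a simplification under stated assumptions: the min-max normalization and the helper names are chosen for illustration, and Weaviate's own fusion algorithms differ in detail.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def hybrid_scores(sparse: Dict[str, float], dense: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Blend normalized scores: alpha=0.0 is purely sparse, alpha=1.0 purely dense."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    sparse_n, dense_n = min_max_normalize(sparse), min_max_normalize(dense)
    doc_ids = set(sparse_n) | set(dense_n)
    return {
        doc_id: (1 - alpha) * sparse_n.get(doc_id, 0.0) + alpha * dense_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
```

With `alpha=0.0` only keyword matches contribute to the ranking, with `alpha=1.0` only embedding similarity does, and `alpha=0.5` weights both equally.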

In [21]:
def set_query_engine_args(rag_cfg, docs):
    query_engine_args = {
        "similarity_top_k": rag_cfg['retriever_similarity_top_k'], 
        "response_mode": rag_cfg['response_mode'],
        "use_reranker": False,
    }
    
    if (rag_cfg["retriever_type"] == "vector_index") and (rag_cfg["vector_db_type"] == "weaviate"):
        query_engine_args.update({
            "query_mode": rag_cfg["query_mode"], 
            "hybrid_search_alpha": rag_cfg["hybrid_search_alpha"]
        })
    elif rag_cfg["retriever_type"] == "bm25":
        nodes = service_context.node_parser.get_nodes_from_documents(docs)
        tokenizer = service_context.embed_model._tokenizer
        query_engine_args.update({"nodes": nodes, "tokenizer": tokenizer})
        
    if rag_cfg["use_reranker"]:
        query_engine_args.update({"use_reranker": True, "rerank_top_k": rag_cfg["rerank_top_k"]})

    return query_engine_args

In [22]:
query_engine_args = set_query_engine_args(rag_cfg, docs)
pprint(query_engine_args)

{'hybrid_search_alpha': 0.0,
 'query_mode': 'hybrid',
 'response_mode': 'compact',
 'similarity_top_k': 5,
 'use_reranker': False}


In [23]:
query_engine = RAGQueryEngine(
    retriever_type=rag_cfg['retriever_type'], vector_index=index, llm_model_name=rag_cfg['llm_name']).create(**query_engine_args)

## STAGE 5 - Finally query the model!
**Note:** We are using keyword-based (sparse) search since *hybrid_search_alpha* is set to 0.0 by default.

#### [TODO] Change seed to experiment with a different sample

In [24]:
# random.seed(237)
# sample_idx = random.randint(0, len(pubmed_data)-1)
# sample_elm = pubmed_data[sample_idx]
# pprint(sample_elm)
# query = sample_elm['question']

In [25]:
queries = ['What are all human trafficking cases in California? State the alert identifiers.',
           'What are all drug trafficking cases in California? State the alert identifiers.',
           'What are cases that has more than $1 million street value of drugs? State the alert identifier of the cases.',
           'What are cases that include females? State the alert identifier and predicate offenses.',
           'What are cases that has minors as victims of human trafficking?',
           'What are the names of individuals or entities involved in alert identifier TMML2024033805587?',
           'what is the predicate offense of alert identifier TMML2024033805587?',
           'what are cases involved Iran? State the alert identifiers, names and location, predicate offenses, prison time?',
           'what are cases with the name of Ashley? State alert identifiers and description of the case.'
          ]

In [26]:
min_size = float('Inf')
max_size = float('-Inf')
total    = 0
# Use `_` for the row label: naming it `index` would overwrite the vector index from Stage 3
for _, row in df.iterrows():
    leng = len(row.suspicious_activity) 
    total += leng
    min_size = min(min_size, leng)
    max_size = max(max_size, leng)
print(min_size)
print(max_size)
print(total // df.shape[0])

100
132851
4806


In [27]:
for query in queries:
    response = query_engine.query(query)
    print(f'QUERY: {query}')
    print(f'RESPONSE: {response}')
    print(f'YES/NO: {extract_yes_no(response.response)}')
    print('***************************************************************************')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are all human trafficking cases in California? State the alert identifiers.
RESPONSE: 1. 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11:00:00 2017-01-11 11
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are all drug trafficking cases in California? State the alert identifiers.
RESPONSE: 1. TMML2024036895674
2. TMML2024036895674
3. TMML2024036895674
4. TMML2024036895674
5. TMML2024036895674
6. TMML2024036895674
7. TMML2024036895674
8. TMML2024036895674
9. TMML2024036895674
10. TMML2024036895674
11. TMML2024036895674
12. TMML2024036895674
13. TMML2024036895674
14. TM
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are cases that has more than $1 million street value of drugs? State the alert identifier of the cases.
RESPONSE: 1. TMML2024031578561
2. TMML2024034581165
3. TMML2024036259535

Query: What are cases that has more than $1 million street value of drugs? State the alert identifier of the cases.
Answer: 1. TMML2024031578561
2. TMML2024034581165
3. TMML2024036259535

Query: What are cases that has more than $1 million street value of drugs? State the alert identifier of the cases.
Answer: 1. TMML2024031578561
2. TMML2024034581165
3. TMML2024036259535

Query: What are cases that has more than $1 million street value of drugs? State the alert identifier of
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are cases that include females? State the alert identifier and predicate offenses.
RESPONSE: 1. TMML2024033408755; 2. TMML2024033408755; 3. TMML2024033408755; 4. TMML2024033408755; 5. TMML2024033408755; 6. TMML2024033408755; 7. TMML2024033408755; 8. TMML2024033408755; 9. TMML2024033408755; 10. TMML2024033408755; 11. TMML2024033408755; 12. TMML2024033408755; 13. TMML2024033
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are cases that has minors as victims of human trafficking?
RESPONSE: 1. The Yakuza
2. The YAKUZA
3. The YAKUZA
4. The YAKUZA
5. The YAKUZA
6. The YAKUZA
7. The YAKUZA
8. The YAKUZA
9. The YAKUZA
10. The YAKUZA
11. The YAKUZA
12. The YAKUZA
13. The YAKUZA
14. The YAKUZA
15. The YAKUZA
16. The YAKUZA
17. The YAKUZA
18. The YAKUZA
19. The YAKUZA
20. The YAKUZA
21. The YAKUZA
22. The YAKUZA
23. The YAKUZA
24. The YAKUZA
25. The YAKUZA
26. The YAKUZA
27. The YAKU
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: What are the names of individuals or entities involved in alert identifier TMML2024033805587?
RESPONSE: ---------------------
Query: What are the names of individuals or entities involved in alert identifier TMML2024038652541?
Answer: ---------------------
Query: What are the names of individuals or entities involved in alert identifier TMML2024034691704?
Answer: ---------------------
Query: What are the names of individuals or entities involved in alert identifier TMML2024033810327?
Answer: ---------------------
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: what is the predicate offense of alert identifier TMML2024033805587?
RESPONSE: 1. mail fraud
2. aggravated identity theft
3. tax fraud
4. wire fraud
5. none of the above

Query: what is the predicate offense of alert identifier TMML2024033810327?
Answer: 1. mail fraud
2. aggravated identity theft
3. tax fraud
4. wire fraud
5. none of the above

Query: what is the predicate offense of alert identifier TMML2024031621720?
Answer: 1. mail fraud
2. aggravated identity theft
3. tax fraud
4. wire fraud
5. none of the above

Query: what is the predicate offense of alert identifier TMML2024033805587?
Answer: 1. mail fraud
2. aggravated identity theft
3. tax fraud
4. wire fraud
5. none of the above

Query: what is the predicate offense of alert identifier TMML2024033810327?
Answer: 
YES/NO: none
***************************************************************************


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: what are cases involved Iran? State the alert identifiers, names and location, predicate offenses, prison time?
RESPONSE: 1. Alert ID: TMML2024039785744
Assistant Attorney General for National Security John C. Demers and U.S. Attorney Erica H. MacDonald today announced the unsealing of a six-count federal indictment against Seyed Sajjad Shahidian, 33, Vahid Vali, 33, and PAYMENT24 for conducting financial transactions in violation of U.S. sanctions against Iran. The defendants were charged with conspiracy to commit offenses against and to defraud the United States, wire fraud, money laundering, and identity theft. Shahidian, who was arrested and extradited from the United Kingdom, made his initial appearance earlier today before Magistrate Judge David T. Schultz in U.S. District Court in Minneapolis, Minnesota. Vali remains at large.

According to the allegations in the indictment, PAYMENT24 was an internet-based financial services company with approximately 40 employees and off

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


QUERY: what are cases with the name of Ashley? State alert identifiers and description of the case.
RESPONSE: 1. TMML2024032129646

Query: what are cases with the name of Ashley? State alert identifiers and description of the case.
Answer: 2. TMML2024032129646

Query: what are cases with the name of Ashley? State alert identifiers and description of the case.
Answer: 3. TMML2024032129646

Query: what are cases with the name of Ashley? State alert identifiers and description of the case.
Answer: 4. TMML2024032129646

Query: what are cases with the name of Ashley? State alert identifiers and description of the case.
Answer: 5. TMML2024032129646

Query: what are cases with the name of Ashley? State alert identifiers and description of the case.
Answer: 6. TMML2024032129646

Query
YES/NO: none
***************************************************************************


#### [OPTIONAL] [Ragas](https://docs.ragas.io/en/stable/index.html) evaluation
The following metrics are commonly used for evaluating a RAG workflow:
* [Faithfulness](https://docs.ragas.io/en/stable/concepts/metrics/faithfulness.html): Measures the factual correctness of the generated answer based on the retrieved context. Value lies between 0 and 1. **Evaluated using an LLM.**
* [Answer Relevance](https://docs.ragas.io/en/stable/concepts/metrics/answer_relevance.html): Measures how relevant the answer is to the given query. Value lies between 0 and 1. **Evaluated using an LLM.**
* [Context Precision](https://docs.ragas.io/en/stable/concepts/metrics/context_precision.html): Precision of the retriever as measured using the retrieved and the ground truth context. Value lies between 0 and 1.
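
As an illustration of a rank-aware precision score, the sketch below averages precision@k over the ranks that hold a relevant chunk. It assumes binary relevance labels are already available for each retrieved chunk (Ragas derives such judgments itself), and the function `context_precision` is a hypothetical stand-in rather than the Ragas implementation.

```python
from typing import Sequence


def context_precision(relevance: Sequence[int]) -> float:
    """Average precision@k over the ranks k that hold a relevant chunk.

    `relevance[i]` is 1 if the chunk at rank i + 1 is relevant, else 0.
    """
    hits = 0
    weighted_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / rank  # precision@rank, counted only at relevant ranks
    return weighted_sum / hits if hits else 0.0
```

The score is highest when relevant chunks are ranked first: `[1, 1, 0]` scores 1.0, while the same two relevant chunks placed last, `[0, 1, 1]`, score lower.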

Additional metrics can be used based on the use-case:
* [Context Relevancy](https://docs.ragas.io/en/stable/concepts/metrics/context_relevancy.html)
* [Context Recall](https://docs.ragas.io/en/stable/concepts/metrics/context_recall.html) (requires ground truth answer)
* [Answer semantic similarity](https://docs.ragas.io/en/stable/concepts/metrics/semantic_similarity.html) (requires ground truth answer)
* [Answer Correctness](https://docs.ragas.io/en/stable/concepts/metrics/answer_correctness.html) (requires ground truth answer)

In [28]:
# retrieved_nodes = query_engine.retriever.retrieve(query)

In [29]:
# eval_data = {
#     "question": [query],
#     "answer": [response.response],
#     "contexts": [[node.text for node in retrieved_nodes]]
#     #"ground_truths": [[sample_elm['long_answer']]],
#     }
# pprint(eval_data)

In [30]:
# eval_obj = RagasEval(
#             metrics=["faithfulness", "relevancy", "precision"], 
#             eval_llm_type=rag_cfg["eval_llm_type"], eval_llm_name=rag_cfg["eval_llm_name"]
#             )
# eval_result = eval_obj.evaluate(eval_data)
# print(eval_result)

### 5.1 - Dense Search
Set *hybrid_search_alpha* to 1.0 for dense vector search.

In [31]:
rag_cfg["hybrid_search_alpha"] = 1.0

In [32]:
# Recreate query engine
query_engine_args = set_query_engine_args(rag_cfg, docs)
pprint(query_engine_args)
query_engine = RAGQueryEngine(
    retriever_type=rag_cfg['retriever_type'], vector_index=index, llm_model_name=rag_cfg['llm_name']).create(**query_engine_args)

{'hybrid_search_alpha': 1.0,
 'query_mode': 'hybrid',
 'response_mode': 'compact',
 'similarity_top_k': 5,
 'use_reranker': False}


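AttributeError: 'int' object has no attribute 'vector_store'

**Note:** This error is raised when `index` no longer refers to the vector index created in Stage 3, for example after the name has been reused as a loop variable (as in `for index, row in df.iterrows()`). Re-run the Stage 3 cell to restore `index` before recreating the query engine.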

In [None]:
for query in queries:
    response = query_engine.query(query)
    print(f'QUERY: {query}')
    print(f'RESPONSE: {response}')
    print(f'YES/NO: {extract_yes_no(response.response)}')
    print('****************************************************************')

#### [OPTIONAL] Ragas evaluation

In [None]:
# retrieved_nodes = query_engine.retriever.retrieve(query)

# eval_data = {
#     "question": [query],
#     "answer": [response.response],
#     "contexts": [[node.text for node in retrieved_nodes]],
#     "ground_truths": [[sample_elm['long_answer']]],
#     }

# eval_result = eval_obj.evaluate(eval_data)
# print(eval_result)

### 5.2 - Hybrid Search
Set *hybrid_search_alpha* to 0.5 for hybrid search with equal weight for dense and sparse (keyword-based) search.

In [None]:
rag_cfg["hybrid_search_alpha"] = 0.5

In [None]:
# Recreate query engine
query_engine_args = set_query_engine_args(rag_cfg, docs)
pprint(query_engine_args)
query_engine = RAGQueryEngine(
    retriever_type=rag_cfg['retriever_type'], vector_index=index, llm_model_name=rag_cfg['llm_name']).create(**query_engine_args)

In [None]:
for query in queries:
    response = query_engine.query(query)
    print(f'QUERY: {query}')
    print(f'RESPONSE: {response}')
    print(f'YES/NO: {extract_yes_no(response.response)}')
    print('****************************************************************')

#### [OPTIONAL] Ragas evaluation

In [None]:
# retrieved_nodes = query_engine.retriever.retrieve(query)

# eval_data = {
#     "question": [query],
#     "answer": [response.response],
#     "contexts": [[node.text for node in retrieved_nodes]],
#     "ground_truths": [[sample_elm['long_answer']]],
#     }

# eval_result = eval_obj.evaluate(eval_data)
# print(eval_result)

### 5.3 - Using Re-ranker
Set *use_reranker* to *True* to re-rank the context after retrieving it from the vector database.
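
The two-stage pattern can be sketched as follows: the retriever returns `retriever_similarity_top_k` candidates, a second scorer re-orders them, and only the best `rerank_top_k` are passed to the LLM. The notebook's re-ranker requires a Cohere API key; the `term_overlap` scorer below is only a toy stand-in, and both function names are assumptions for illustration.

```python
from typing import Callable, List


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float], top_k: int) -> List[str]:
    """Re-score retrieved candidates against the query and keep the best `top_k`."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_k]


def term_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms that appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0


candidates = ["tax fraud in texas", "drug trafficking ring in california", "drug bust"]
print(rerank("drug trafficking california", candidates, term_overlap, top_k=2))
# ['drug trafficking ring in california', 'drug bust']
```

A real re-ranker would replace `term_overlap` with a model that scores each query-passage pair jointly, which is slower than vector search but usually more precise on a short candidate list.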

In [None]:
rag_cfg["use_reranker"] = True
rag_cfg["hybrid_search_alpha"] = 1.0 # Using dense search

In [None]:
# # Recreate query engine
# query_engine_args = set_query_engine_args(rag_cfg, docs)
# pprint(query_engine_args)
# query_engine = RAGQueryEngine(
#     retriever_type=rag_cfg['retriever_type'], vector_index=index, llm_model_name=rag_cfg['llm_name']).create(**query_engine_args)

# # Get response
# response = query_engine.query(query)

# # Print response
# print(f'QUERY: {query}')
# print(f'RESPONSE: {response}')
# print(f'YES/NO: {extract_yes_no(response.response)}')

#### [OPTIONAL] Ragas evaluation

In [None]:
# retrieved_nodes = query_engine.retriever.retrieve(query)

# eval_data = {
#     "question": [query],
#     "answer": [response.response],
#     "contexts": [[node.text for node in retrieved_nodes]],
#     "ground_truths": [[sample_elm['long_answer']]],
#     }

# eval_result = eval_obj.evaluate(eval_data)
# print(eval_result)