# **Hybrid RAG**

Hybrid RAG is a retrieval technique that combines vector similarity search with traditional keyword-based search, such as full-text search or BM25. Combining the two makes retrieval more comprehensive and flexible: vector similarity captures semantic meaning, while keyword methods provide precise term matching.

Research papers: [paper1](https://arxiv.org/pdf/2408.05141) and [paper2](https://arxiv.org/pdf/2408.04948)

## **Initial Setup**

In [None]:
!pip install -q athina chromadb rank_bm25

In [None]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')

## **Indexing**

In [None]:
# load embedding model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [None]:
# load data
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader("./context.csv")
documents = loader.load()

In [None]:
# split documents
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
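
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width character chunking. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The function name `chunk_text` is hypothetical and not part of LangChain.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Simplified illustration only; it does not look for natural separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
no_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=5)
```

With `chunk_overlap=0`, as in this notebook, each character lands in exactly one chunk. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when an answer straddles a chunk boundary.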

In [None]:
# create vectorstore
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

## **Retrievers**

In [None]:
# create retriever
retriever = vectorstore.as_retriever()
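
Conceptually, the vector retriever returns the chunks whose embeddings lie closest to the query embedding. The sketch below illustrates that idea with toy 3-dimensional vectors and cosine similarity. The vectors, the helper names `cosine_similarity` and `top_k`, and the choice of cosine as the metric are all assumptions for illustration; the distance function Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings; real embeddings have hundreds or thousands of dimensions.
toy_vectors = {
    "agar_chunk": [0.9, 0.1, 0.0],
    "poem_chunk": [0.0, 0.2, 0.9],
    "lab_chunk": [0.7, 0.6, 0.1],
}
query_vector = [1.0, 0.0, 0.0]
nearest = top_k(query_vector, toy_vectors, k=2)
```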

### **Keyword Retriever**

In [None]:
# create keyword retriever
from langchain_community.retrievers import BM25Retriever
keyword_retriever = BM25Retriever.from_documents(documents)
keyword_retriever.k = 3  # number of documents to return per query
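
For intuition about what a BM25 retriever scores, here is a minimal pure-Python sketch of Okapi BM25 over whitespace tokens. It is an illustration under stated assumptions: the function name `bm25_scores`, the parameters `k1=1.5` and `b=0.75`, and the IDF variant are choices made here, and the `rank_bm25` package used by `BM25Retriever` may differ in tokenization and IDF details.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in corpus against query with Okapi BM25."""
    if not corpus:
        return []
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Length normalization penalizes long documents.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_corpus = [
    "macconkey agar selects gram-negative bacteria",
    "a rose is a rose is a rose",
    "agar media are used to grow bacteria",
]
toy_scores = bm25_scores("macconkey agar bacteria", toy_corpus)
best_index = max(range(len(toy_corpus)), key=toy_scores.__getitem__)
```

Unlike the embedding-based retriever, this score is non-zero only when a query term literally appears in the document, which is why keyword retrieval is strong on exact names and weak on paraphrases.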

In [None]:
# test keyword retriever
keyword_retriever.invoke("what bacteria grow on macconkey agar")

[Document(page_content='predominantly made from the lactose sugar in the agar.\\n\\n\\n== Variant ==\\nA variant, sorbitol-MacConkey agar, (with the addition of additional selective agents) can assist in the isolation and differentiation of enterohemorrhagic E. coli serotype E. coli O157:H7, by the presence of colorless circular colonies that are non-sorbitol fermenting.\\n\\n\\n== See also ==\\nR2a agar\\nMRS agar (culture medium designed to grow Gram-positive bacteria and differentiate them for lactose fermentation).\\n\\n\\n==', metadata={'source': './context.csv', 'row': 6}),
 Document(page_content='zoonotic disease since around 1910, but in the 1930s knowledge was gained that the bacteria lost their virulent power when repeatedly spread on agar media. This explained the difficulties to reproduce results from different studies as the pre-inoculating handlings of the bacteria were not standardized among scientists.Today it is established that at least some primate species are highly

### **Ensemble Retriever**

In [None]:
# create ensemble retriever
from langchain.retrievers import EnsembleRetriever
ensemble_retriever = EnsembleRetriever(retrievers=[retriever, keyword_retriever], weights=[0.5, 0.5])
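
The ensemble has to merge two ranked lists into one. LangChain's documentation describes `EnsembleRetriever` as using Reciprocal Rank Fusion (RRF); the sketch below shows a weighted form of that idea on plain document ids. The function name `weighted_rrf`, the toy rankings, and the constant `c=60` (a value commonly used for RRF) are assumptions for illustration, not a copy of the library's implementation.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each list contributes weight / (c + rank) for every document it contains,
    where rank starts at 1 for the top result.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")

    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


vector_ranking = ["chunk_a", "chunk_b", "chunk_c"]   # from the vector retriever
keyword_ranking = ["chunk_b", "chunk_d", "chunk_a"]  # from the keyword retriever
fused_ranking = weighted_rrf([vector_ranking, keyword_ranking], weights=[0.5, 0.5])
```

Documents that both retrievers return (`chunk_a`, `chunk_b`) rise above documents returned by only one. Because the fusion uses ranks rather than raw scores, the BM25 and embedding scores never need to be put on a common scale.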

In [None]:
# test ensemble retriever
ensemble_retriever.invoke("what bacteria grow on macconkey agar")

[Document(page_content="context: ['MacConkey agar is a selective and differential culture medium for bacteria. It is designed to selectively isolate Gram-negative and enteric (normally found in the intestinal tract) bacteria and differentiate them based on lactose fermentation. Lactose fermenters turn red or pink on MacConkey agar, and nonfermenters do not change color. The media inhibits growth of Gram-positive organisms with crystal violet and bile salts, allowing for the selection and isolation of gram-negative", metadata={'row': 6, 'source': './context.csv'}),
 Document(page_content='predominantly made from the lactose sugar in the agar.\\n\\n\\n== Variant ==\\nA variant, sorbitol-MacConkey agar, (with the addition of additional selective agents) can assist in the isolation and differentiation of enterohemorrhagic E. coli serotype E. coli O157:H7, by the presence of colorless circular colonies that are non-sorbitol fermenting.\\n\\n\\n== See also ==\\nR2a agar\\nMRS agar (culture m

## **RAG Chain**

In [None]:
# create llm
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()

In [None]:
# create document chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}

Question: {input}

Answer:

"""
prompt = ChatPromptTemplate.from_template(template)

# Setup RAG pipeline
rag_chain = (
    {"context": ensemble_retriever,  "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
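
In the chain above, the retriever's output (a list of `Document` objects) is placed directly into the `{context}` slot, so the prompt receives the string form of that list, metadata included. A common alternative is to join only the text of each chunk first. The sketch below shows such a helper on a stand-in class so that it runs without LangChain installed; `FakeDocument` and `format_docs` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of the retrieved chunks, dropping their metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    FakeDocument("MacConkey agar is a selective medium.", {"row": 6}),
    FakeDocument("It isolates Gram-negative bacteria.", {"row": 6}),
]
context_text = format_docs(retrieved)
```

In the notebook, a helper like this could be composed into the chain as `"context": ensemble_retriever | format_docs`, so the model sees clean text rather than object representations. Whether that changes answer quality for this dataset is untested here.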

In [None]:
# response
response = rag_chain.invoke('what bacteria grow on macconkey agar')
response

'Gram-negative and enteric bacteria grow on MacConkey agar.'

## **Preparing Data for Evaluation**

In [None]:
# build the evaluation records
questions = ["what bacteria grow on macconkey agar", "who wrote a rose is a rose is a rose"]
responses = []
contexts = []

# run inference and capture the contexts retrieved for each query
for query in questions:
    responses.append(rag_chain.invoke(query))
    contexts.append([doc.page_content for doc in ensemble_retriever.invoke(query)])

# collect the columns into a dict
data = {
    "query": questions,
    "response": responses,
    "context": contexts,
}

In [None]:
# create dataset
from datasets import Dataset
dataset = Dataset.from_dict(data)

In [None]:
# create dataframe
import pandas as pd
df = pd.DataFrame(dataset)

In [None]:
df

| | query | response | context |
|---|---|---|---|
| 0 | what bacteria grow on macconkey agar | Gram-negative and enteric bacteria grow on MacConkey agar. | `[context: ['MacConkey agar is a selective and differential culture medium for bacteria. It is designed to selectively isolate Gram-negative and enteric (normally found in the intestinal tract) bacteria and differentiate them based on lactose fermentation. Lactose fermenters turn red or pink on MacConkey agar, and nonfermenters do not change color. The media inhibits growth of Gram-positive organisms with crystal violet and bile salts, allowing for the selection and isolation of gram-negative...` |
| 1 | who wrote a rose is a rose is a rose | Gertrude Stein wrote "A rose is a rose is a rose" as part of the 1913 poem "Sacred Emily". | `['Version ridicules the stupidity of court speeches when the prosecutor ends his opening speech with "murder is murder is murder."\nJeanette Winterson wrote in her novel Written on the Body: "Sometimes a breast is a breast is a breast."\n"La rosa es una rosa es una rosa" is used in Fernando del Paso\'s Sonetos con lugares comunes.\nA song by Poe (Anne Danielewski), "A rose is a rose", states "a rose is a rose is a rose is a rose said my good friend Gertrude Stein."\nThe computer game Carmen,...` |


In [None]:
# Convert to dictionary
df_dict = df.to_dict(orient='records')

# Convert context to list
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]

## **Evaluation in Athina AI**

We will use the **Context Relevancy** eval here. It measures how relevant the retrieved context is to the query, computed as the fraction of sentences in the retrieved context that are relevant to answering the query. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details.
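
The grade reason in the evaluation output describes this metric as the number of context sentences relevant to the query divided by the total number of sentences in the retrieved context. The sketch below illustrates only that ratio. The real eval relies on an LLM to judge relevance; here a crude keyword-overlap check stands in for that judgment, and the names `split_sentences`, `context_relevancy`, and `keyword_overlap` are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def context_relevancy(query: str, contexts: list[str], is_relevant) -> float:
    """Fraction of context sentences that is_relevant(query, sentence) accepts."""
    sentences = [s for ctx in contexts for s in split_sentences(ctx)]
    if not sentences:
        return 0.0
    relevant = sum(1 for s in sentences if is_relevant(query, s))
    return relevant / len(sentences)


def keyword_overlap(query: str, sentence: str) -> bool:
    """Hypothetical stand-in for the LLM judgment: at least two shared longer words."""
    query_terms = {t for t in re.findall(r"\w+", query.lower()) if len(t) > 3}
    sentence_terms = set(re.findall(r"\w+", sentence.lower()))
    return len(query_terms & sentence_terms) >= 2


toy_contexts = [
    "MacConkey agar isolates Gram-negative bacteria. It was developed by Alfred MacConkey.",
    "Roses are red. Violets are blue.",
]
score = context_relevancy("what bacteria grow on macconkey agar", toy_contexts, keyword_overlap)
```

Because the denominator counts every retrieved sentence, long chunks with one relevant sentence score low even when they contain the answer, which is worth keeping in mind when reading the scores.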

In [None]:
# set api keys for Athina evals
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

In [None]:
# load dataset
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)

In [None]:
# evaluate
from athina.evals import RagasContextRelevancy
RagasContextRelevancy(model="gpt-4o").run_batch(data=dataset).to_df()

You can view your dataset at: https://app.athina.ai/develop/171e53cf-55e1-4d30-a4cf-71648edb7650


| | query | context | response | expected_response | display_name | failed | grade_reason | runtime | model | ragas_context_relevancy |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | what bacteria grow on macconkey agar | `[context: ['MacConkey agar is a selective and differential culture medium for bacteria. It is designed to selectively isolate Gram-negative and enteric (normally found in the intestinal tract) bacteria and differentiate them based on lactose fermentation. Lactose fermenters turn red or pink on MacConkey agar, and nonfermenters do not change color. The media inhibits growth of Gram-positive organisms with crystal violet and bile salts, allowing for the selection and isolation of gram-negative...` | Gram-negative and enteric bacteria grow on MacConkey agar. | | Ragas Context Relevancy | | This metric is calculated by dividing the number of sentences in context that are relevant for answering the given query by the total number of sentences in the retrieved context | 1427 | gpt-4o | 0.05 |
| 1 | who wrote a rose is a rose is a rose | `['Version ridicules the stupidity of court speeches when the prosecutor ends his opening speech with "murder is murder is murder."\nJeanette Winterson wrote in her novel Written on the Body: "Sometimes a breast is a breast is a breast."\n"La rosa es una rosa es una rosa" is used in Fernando del Paso\'s Sonetos con lugares comunes.\nA song by Poe (Anne Danielewski), "A rose is a rose", states "a rose is a rose is a rose is a rose said my good friend Gertrude Stein."\nThe computer game Carmen,...` | Gertrude Stein wrote "A rose is a rose is a rose" as part of the 1913 poem "Sacred Emily". | | Ragas Context Relevancy | | This metric is calculated by dividing the number of sentences in context that are relevant for answering the given query by the total number of sentences in the retrieved context | 1788 | gpt-4o | 0.05 |
