# Overview
This Jupyter notebook builds and evaluates a Retrieval-Augmented Generation (RAG) pipeline and compares different chunking strategies for the documents stored in its vector database. The notebook consists of several sections:

1.  **Importing Libraries and Environment Setup:** The notebook loads environment variables from a `.env` file using `dotenv`, then imports the necessary libraries, including `langchain_core`, `langchain_mongodb`, `langchain_openai`, and `pymongo`.
2.  **Vector Store and Retriever Setup:** Three vector stores (`vector_store_0`, `vector_store_1`, `vector_store_2`) are created with `MongoDBAtlasVectorSearch` from `langchain_mongodb`, each backed by a different MongoDB collection and vector index. Each store is wrapped in a retriever (`retriever_0`, `retriever_1`, `retriever_2`) through its `as_retriever` method, configured for similarity search over the top 20 chunks (`k=20`).
3.  **RAG Pipelines with Different Chunking Strategies:** Three RAG pipelines are defined, one per chunking strategy:

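    *   **Chunking Strategy 0:** This pipeline uses `retriever_0`, whose collection `enterprise_data` stores chunks of size 200 with an overlap of 20.
    *   **Chunking Strategy 1:** This pipeline uses `retriever_1`, whose collection `enterprise_data_1` stores chunks of size 150 with an overlap of 15.
    *   **Chunking Strategy 2:** This pipeline uses `retriever_2`, whose collection `enterprise_data_2` stores chunks of size 300 with an overlap of 30.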

    Each pipeline uses a language model (`ChatOpenAI`) to generate answers based on the retrieved documents.
4.  **Testing the RAG Pipelines:** Each RAG pipeline is tested with a sample question, and the generated answer is printed.
5.  **RAG Pipeline Evaluation:** The notebook loads a test dataset from a JSON file and evaluates each RAG pipeline with the `evaluate` function from `ragas`, using four metrics: answer relevancy, faithfulness, context recall, and context precision.
6.  **Saving Evaluation Results:** The evaluation results for each RAG pipeline are saved as a CSV file.

**Main Functions and Variables:**

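*   `call_openai_0`, `call_openai_1`, `call_openai_2`: Each function builds one RAG pipeline, answers a question with it, and returns the answer together with the deduplicated retrieved documents.
*   `vector_store_0`, `vector_store_1`, `vector_store_2`: The vector stores for the RAG pipelines, created inside the corresponding `call_openai_*` function.
*   `retriever_0`, `retriever_1`, `retriever_2`: The retrievers for the RAG pipelines, also created inside each `call_openai_*` function.
*   `llm`: The `ChatOpenAI` language model used to generate answers. It is defined once at the top of the notebook and redefined inside each pipeline function with `max_tokens=200`.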
*   `result_0`, `result_1`, `result_2`: These variables store the evaluation results for each RAG pipeline.

**Context and Evaluation Metrics:**

*   The notebook uses a test dataset with questions and ground truth answers to evaluate the RAG pipelines.
*   The evaluation metrics, all from `ragas`, are answer relevancy, faithfulness, context recall, and context precision.
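For readers unfamiliar with the two retrieval metrics, the sketch below spells out the averaging behind them. It is a simplified illustration that assumes the per-chunk relevance verdicts and per-statement attributions are already known; in `ragas` those judgements come from an LLM, and the exact formulas may differ between `ragas` versions. Both function names are hypothetical.

```python
def context_precision(relevance: list[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant chunk.

    relevance[k - 1] is 1 if the chunk retrieved at rank k is relevant to the
    ground truth, else 0. Relevant chunks ranked early raise the score."""
    hits = 0
    total = 0.0
    for k, verdict in enumerate(relevance, start=1):
        if verdict:
            hits += 1
            total += hits / k  # precision@k, counted only at relevant ranks
    return total / hits if hits else 0.0


def context_recall(attributed: list[bool]) -> float:
    """Share of ground-truth statements that the retrieved context supports."""
    return sum(attributed) / len(attributed) if attributed else 0.0


print(context_precision([1, 0, 1]))  # relevant chunks at ranks 1 and 3
print(context_recall([True, False, True, True]))
```

Context precision rewards ranking: the same two relevant chunks score 1.0 at ranks 1 and 2 but less at ranks 1 and 3, whereas context recall ignores order entirely.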

**Chunking Strategy Comparison:**

*   The notebook evaluates three different chunking strategies with chunk sizes 200, 150, and 300, and overlaps 20, 15, and 30, respectively.
*   The evaluation results are saved as CSV files for each chunking strategy.
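The notebook does not show how the three collections were chunked, so the following is only a minimal sketch of what "chunk size" and "overlap" mean, assuming a fixed-window character splitter. `chunk_text` is a hypothetical helper; the actual ingestion step may have used a separator-aware splitter (for example LangChain's `RecursiveCharacterTextSplitter`) and may measure size in characters or tokens.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


# (chunk_size, overlap) for the three strategies compared in this notebook
STRATEGIES = {"chunking_0": (200, 20), "chunking_1": (150, 15), "chunking_2": (300, 30)}

page = "x" * 1000  # hypothetical 1,000-character document
for name, (size, overlap) in STRATEGIES.items():
    print(name, "chunks:", len(chunk_text(page, size, overlap)))
```

All three strategies keep the overlap at 10% of the chunk size, so the comparison effectively varies only the window length.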

**Advice and Future Work:**

*   The notebook can be extended to evaluate more chunking strategies and compare their performance.
*   The evaluation metrics can be modified or extended to better assess the performance of the RAG pipelines.
*   The notebook can be used as a starting point for building and evaluating RAG pipelines for other use cases and applications.


In [1]:
import os
from dotenv import load_dotenv
load_dotenv(encoding='utf-8')

True

# RAG pipelines with different chunking strategies

In [2]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from pymongo import MongoClient

In [3]:
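# load_dotenv() has already exported OPENAI_API_KEY; fail early with a clear message if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY is not set; add it to the .env file")
llm = ChatOpenAI(model=os.getenv("DEFAULT_OPENAI_MODEL")) # DEFAULT_OPENAI_MODEL='gpt-4o-mini-2024-07-18'
embedding_model = OpenAIEmbeddings(disallowed_special=())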

In [4]:
# Define MongoDB vector database
client = MongoClient(os.getenv("ATLAS_CONNECTION_STRING"))
db_name = os.getenv("db_name")

## RAG with chunking strategy 0

In [36]:
def call_openai_0(question):

   # Chunk size: 200
   # Overlap: 20
   vector_store_0 = MongoDBAtlasVectorSearch(
      embedding = embedding_model,
      collection = client[db_name]["enterprise_data"],
      index_name = "vector_index_erp"
   )

   retriever_0 = vector_store_0.as_retriever(
      search_type = "similarity",
      search_kwargs = { "k": 20}
   )

   question = question['question']

   preamble = "" # read from cohere front end or use the input to the API
   SAFETY_PREAMBLE = "The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral."
   BASIC_RULES = "You are a powerful conversational AI trained by openAI to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions."
   TASK_CONTEXT = "You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging."
   STYLE_GUIDE = "Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling."
   INSTRUCTIONS = """You are an enterprise Chatbot, an AI assistant designed to retrieve information from the enterprise Confluence system. 
   You specialize in providing accurate answers related to various departments like Marketing, IT, HR, Finance, and Corporate Communications. 
               Use the following pieces of context to answer the question at the end.
               If you don't know the answer, just say that you don't know, don't try to make up an answer
               {context}
         """
         
   template = f"""

      {SAFETY_PREAMBLE}
      {BASIC_RULES}
      {TASK_CONTEXT}
      {STYLE_GUIDE}
      {INSTRUCTIONS}

   """
   if preamble:
      template += f"""{preamble}\n\n"""


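   # Keep {question} as a prompt variable (filled in by RunnablePassthrough) rather than
   # pasting the raw question into the template, where any braces would be parsed as variables.
   template += "Question: {question}\n\n"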

   custom_rag_prompt = PromptTemplate.from_template(template)

   llm = ChatOpenAI(model=os.getenv("DEFAULT_OPENAI_MODEL"),max_tokens=200)

   # Remove duplicate retrieved documents, keeping the first occurrence in rank order
   def remove_rep(docs):
      seen_content = set()
      unique_docs = []
      for doc in docs:
         if doc.page_content not in seen_content:
            seen_content.add(doc.page_content)
            unique_docs.append(doc)
      return unique_docs
   
   def format_docs(docs):
      contexts = remove_rep(docs)
      return "\n\n".join(doc.page_content for doc in contexts)


   # Construct a chain to answer questions on your data
   rag_chain = (
      { "context": retriever_0 | format_docs, "question": RunnablePassthrough()}
      | custom_rag_prompt
      | llm
      | StrOutputParser()
   )

   # Prompt the chain
   answer = rag_chain.invoke(question)
   retrieved_docs = remove_rep(retriever_0.invoke(question))


   return{
      'answer': answer,
      'contexts': retrieved_docs
      }
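`PromptTemplate.from_template` treats every `{...}` in the template text as a prompt variable. A minimal sketch of why it matters whether the question is pasted into the template or supplied as a `{question}` variable, using plain `str.format` as a stand-in for the template's f-string formatting (an assumption); the shortened template text and the question are hypothetical.

```python
# Shortened stand-in for the INSTRUCTIONS text in call_openai_0 (hypothetical wording).
INSTRUCTIONS = "Use the following pieces of context to answer the question at the end.\n{context}\n"


def render_with_placeholder(context: str, question: str) -> str:
    # The question is supplied as a value, so braces inside it stay literal text.
    template = INSTRUCTIONS + "Question: {question}\n"
    return template.format(context=context, question=question)


def render_baked(context: str, question: str) -> str:
    # The question is pasted into the template text first, so any braces it
    # contains are parsed as extra placeholders when the template is formatted.
    template = INSTRUCTIONS + f"Question: {question}\n"
    return template.format(context=context)


question = "What does the {space_key} field mean in a Confluence export?"  # hypothetical question
print(render_with_placeholder("(retrieved chunks)", question))

try:
    render_baked("(retrieved chunks)", question)
    baked_error = None
except KeyError as error:
    baked_error = error
print("baked-in question failed on placeholder:", baked_error)
```

For questions without braces both approaches render the same prompt, which is why the difference only shows up on unusual inputs such as code snippets or Confluence macros.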

## RAG with chunking strategy 1

In [6]:
def call_openai_1(question):

   # Chunk size: 150
   # Overlap: 15
   vector_store_1 = MongoDBAtlasVectorSearch(
      embedding = embedding_model,
      collection = client[db_name]['enterprise_data_1'],
      index_name = "vector_index_erp1"
   )

   retriever_1 = vector_store_1.as_retriever(
      search_type = "similarity",
      search_kwargs = { "k": 20}
   )

   question = question['question']

   preamble = "" # read from cohere front end or use the input to the API
   SAFETY_PREAMBLE = "The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral."
   BASIC_RULES = "You are a powerful conversational AI trained by openAI to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions."
   TASK_CONTEXT = "You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging."
   STYLE_GUIDE = "Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling."
   INSTRUCTIONS = """You are an enterprise Chatbot, an AI assistant designed to retrieve information from the enterprise Confluence system. 
   You specialize in providing accurate answers related to various departments like Marketing, IT, HR, Finance, and Corporate Communications. 
               Use the following pieces of context to answer the question at the end.
               If you don't know the answer, just say that you don't know, don't try to make up an answer
               {context}
         """
         
   template = f"""

      {SAFETY_PREAMBLE}
      {BASIC_RULES}
      {TASK_CONTEXT}
      {STYLE_GUIDE}
      {INSTRUCTIONS}

   """
   if preamble:
      template += f"""{preamble}\n\n"""


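   # Keep {question} as a prompt variable (filled in by RunnablePassthrough) rather than
   # pasting the raw question into the template, where any braces would be parsed as variables.
   template += "Question: {question}\n\n"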

   custom_rag_prompt = PromptTemplate.from_template(template)

   llm = ChatOpenAI(model=os.getenv("DEFAULT_OPENAI_MODEL"), max_tokens=200)

   # Remove duplicate retrieved documents, keeping the first occurrence in rank order
   def remove_rep(docs):
      seen_content = set()
      unique_docs = []
      for doc in docs:
         if doc.page_content not in seen_content:
            seen_content.add(doc.page_content)
            unique_docs.append(doc)
      return unique_docs
   
   def format_docs(docs):
      contexts = remove_rep(docs)
      return "\n\n".join(doc.page_content for doc in contexts)


   # Construct a chain to answer questions on your data
   rag_chain = (
      { "context": retriever_1 | format_docs, "question": RunnablePassthrough()}
      | custom_rag_prompt
      | llm
      | StrOutputParser()
   )

   # Prompt the chain
   answer = rag_chain.invoke(question)
   retrieved_docs = remove_rep(retriever_1.invoke(question))


   return{
      'answer': answer,
      'contexts': retrieved_docs
      }

## RAG with chunking strategy 2

In [20]:
def call_openai_2(question):

   # Chunk size: 300
   # Overlap: 30
   vector_store_2 = MongoDBAtlasVectorSearch(
      embedding = embedding_model,
      collection = client[db_name]['enterprise_data_2'],
      index_name = "vector_index_erp_2"
   )

   retriever_2 = vector_store_2.as_retriever(
      search_type = "similarity",
      search_kwargs = { "k": 20} 
   )

   question = question['question']

   preamble = "" # read from cohere front end or use the input to the API
   SAFETY_PREAMBLE = "The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral."
   BASIC_RULES = "You are a powerful conversational AI trained by openAI to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions."
   TASK_CONTEXT = "You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging."
   STYLE_GUIDE = "Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling."
   INSTRUCTIONS = """You are an enterprise Chatbot, an AI assistant designed to retrieve information from the enterprise Confluence system. 
   You specialize in providing accurate answers related to various departments like Marketing, IT, HR, Finance, and Corporate Communications. 
               Use the following pieces of context to answer the question at the end.
               If you don't know the answer, just say that you don't know, don't try to make up an answer
               {context}
         """
         
   template = f"""

      {SAFETY_PREAMBLE}
      {BASIC_RULES}
      {TASK_CONTEXT}
      {STYLE_GUIDE}
      {INSTRUCTIONS}

   """
   if preamble:
      template += f"""{preamble}\n\n"""


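   # Keep {question} as a prompt variable (filled in by RunnablePassthrough) rather than
   # pasting the raw question into the template, where any braces would be parsed as variables.
   template += "Question: {question}\n\n"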

   custom_rag_prompt = PromptTemplate.from_template(template)

   llm = ChatOpenAI(model=os.getenv("DEFAULT_OPENAI_MODEL"), max_tokens=200)

   # Remove duplicate retrieved documents, keeping the first occurrence in rank order
   def remove_rep(docs):
      seen_content = set()
      unique_docs = []
      for doc in docs:
         if doc.page_content not in seen_content:
            seen_content.add(doc.page_content)
            unique_docs.append(doc)
      return unique_docs
   
   def format_docs(docs):
      contexts = remove_rep(docs)
      return "\n\n".join(doc.page_content for doc in contexts)

   # Construct a chain to answer questions on your data
   rag_chain = (
      { "context": retriever_2 | format_docs, "question": RunnablePassthrough()}
      | custom_rag_prompt
      | llm
      | StrOutputParser()
   )

   # Prompt the chain
   answer = rag_chain.invoke(question)
   retrieved_docs = remove_rep(retriever_2.invoke(question))


   return{
      'answer': answer,
      'contexts': retrieved_docs
      }

# RAG pipeline evaluation

## Test data set prep

**Load the test-question dataset for the IT department.**

In [9]:
import pandas as pd
import json

def json_to_dataframe(json_file_path):
  """Reads a JSON file and converts it to a pandas DataFrame.

  Args:
    json_file_path (str): The path to the JSON file.

  Returns:
    pandas.DataFrame: The DataFrame created from the JSON data.
  """

  with open(json_file_path, 'r', encoding='utf-8') as f:
    data = json.load(f)

  # Handle different JSON structures
  if isinstance(data, list):
    # If the JSON data is a list of dictionaries, create a DataFrame directly
    df = pd.DataFrame(data)
  elif isinstance(data, dict):
    # If the JSON data is a single dictionary, convert it to a list of dictionaries
    df = pd.DataFrame([data])
  else:
    raise ValueError("Unsupported JSON structure")

  return df
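The test file itself is not shown, so the following sketch assumes `test_dataset_it.json` holds a list of objects with at least the `question` and `ground_truth` fields that the evaluation cells read later. `load_records` is a hypothetical stdlib-only check that mirrors the list/dict handling of `json_to_dataframe` and fails early when a record lacks one of those fields.

```python
import json

REQUIRED_KEYS = ("question", "ground_truth")  # columns the evaluation cells read


def load_records(raw: str) -> list[dict]:
    """Parse JSON text into a list of records and check the required fields."""
    data = json.loads(raw)
    if isinstance(data, dict):  # a single object becomes a one-row dataset
        data = [data]
    if not isinstance(data, list):
        raise ValueError("Unsupported JSON structure")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"record {i} is missing {missing}")
    return data


# Hypothetical record, only to exercise the check
sample = '[{"question": "Who approves remote-work requests?", "ground_truth": "The line manager."}]'
print(load_records(sample))
```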

In [10]:
from from_root import from_root
import os
file_name = "test_dataset_it.json"
json_file_path = os.path.join(from_root(), "data-test/test_dataset/", file_name)
data_to_test = json_to_dataframe(json_file_path)

## RAGAS evaluation

In [22]:
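def extract_page_content(documents):
    """
    Convert a list of retrieved Document objects into a list of their page_content strings.

    Args:
        documents (list): Document objects, or plain strings such as the
            ['No context'] placeholder used for questions with no retrieved documents.

    Returns:
        list[str]: The page_content of each Document; plain strings are passed through unchanged.
    """
    return [doc if isinstance(doc, str) else doc.page_content for doc in documents]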

### Chunking Strategy 0

In [44]:
# Generate all the answers for the questions in the dataset
# Store the answers and retrieved documents as contexts to lists
answers_0 = []
contexts_0 = []
for question in data_to_test['question']:
    question_dict = {'question': question}
    answer = call_openai_0(question_dict)
    contexts_0.append(answer['contexts'])
    answers_0.append(answer['answer'])
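The generation loop is repeated almost verbatim for each strategy, and the empty-context fallback is applied in a separate cell. A minimal sketch of one way to fold both into a single helper; `collect_outputs` and `fake_pipeline` are hypothetical, and the stub only mimics the `{'answer': ..., 'contexts': ...}` dictionary that the `call_openai_*` functions return.

```python
from typing import Callable


def collect_outputs(pipeline: Callable[[dict], dict], questions: list[str]) -> tuple[list[str], list[list]]:
    """Run a call_openai_*-style pipeline over every question.

    Returns the answers and the retrieved contexts, replacing an empty
    context list with ['No context'] as the notebook does in a later cell."""
    answers, contexts = [], []
    for question in questions:
        output = pipeline({"question": question})
        answers.append(output["answer"])
        contexts.append(list(output["contexts"]) or ["No context"])
    return answers, contexts


def fake_pipeline(question_dict: dict) -> dict:
    """Hypothetical stub standing in for call_openai_0; it retrieves nothing for 'unmatched'."""
    question = question_dict["question"]
    return {"answer": f"echo: {question}", "contexts": [question] if "?" in question else []}


answers, contexts = collect_outputs(fake_pipeline, ["Who approves leave requests?", "unmatched"])
print(answers)
print(contexts)
```

With the real pipelines, something like `collect_outputs(call_openai_1, list(data_to_test['question']))` would replace the per-strategy loop and the separate empty-context cell.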

In [45]:
# update the dataset with answers and contexts
data_to_test['answers_0'] = answers_0
data_to_test['contexts_0'] = contexts_0

In [None]:
# Replace empty list context with ['No context'] if there are any
def is_empty_list(lst):
    return len(lst) == 0
data_to_test['contexts_0'] = data_to_test['contexts_0'].apply(lambda x: ['No context'] if is_empty_list(x) else x)

In [56]:
from datasets import Dataset

question = list(data_to_test['question'])
answer = list(data_to_test['answers_0'])
contexts = list(data_to_test['contexts_0'].apply(extract_page_content))
ground_truth = list(data_to_test['ground_truth'])

data_chunking_0 = {
    'question': question,
    'answer': answer,
    'contexts': contexts,
    'ground_truth': ground_truth
}

dataset_chunking_0 = Dataset.from_dict(data_chunking_0)
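`Dataset.from_dict` expects every column to have the same length, and `ragas` expects each `contexts` entry to be a list of strings (hence `extract_page_content`). A minimal stdlib-only sketch of a pre-flight check before building the dataset; `build_eval_columns` is a hypothetical helper.

```python
def build_eval_columns(questions, answers, contexts, ground_truths) -> dict:
    """Assemble the column dict passed to Dataset.from_dict, checking shapes first."""
    lengths = {len(questions), len(answers), len(contexts), len(ground_truths)}
    if len(lengths) != 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    for i, row_contexts in enumerate(contexts):
        # Each row needs at least one context, and every context must already be a string.
        if not row_contexts or not all(isinstance(c, str) for c in row_contexts):
            raise ValueError(f"row {i}: contexts must be a non-empty list of strings")
    return {
        "question": list(questions),
        "answer": list(answers),
        "contexts": [list(c) for c in contexts],
        "ground_truth": list(ground_truths),
    }


# Hypothetical two-row example; the second row uses the 'No context' placeholder.
columns = build_eval_columns(
    ["Q1?", "Q2?"],
    ["A1", "A2"],
    [["chunk a", "chunk b"], ["No context"]],
    ["G1", "G2"],
)
print({name: len(values) for name, values in columns.items()})
```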

In [None]:
# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# from langsmith import Client
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_PROJECT"] = "Cohere_RAG_Eval"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
# langsmith_client = Client()  # separate name so the MongoDB `client` used by call_openai_* is not overwritten

In [None]:
from ragas import evaluate
# from ragas.integrations.langsmith import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
result_0 = evaluate(
    dataset_chunking_0,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
    ],
)

In [59]:
result_0

{'answer_relevancy': 0.9552, 'faithfulness': 0.4109, 'context_recall': 0.3813, 'context_precision': 0.6624}

In [60]:
result_0.to_pandas()

| | question | answer | contexts | ground_truth | answer_relevancy | faithfulness | context_recall | context_precision |
|---|---|---|---|---|---|---|---|---|
| 0 | How does the role of the Senior Director respo... | The Senior Director responsible for Analytics ... | [the Senior Director responsible for Analytics... | The role of the Senior Director responsible fo... | 0.925206 | 0.307692 | 1.0 | 1.0 |
| 1 | What is the importance of identifying and addr... | Identifying and addressing growth areas in sel... | [to identify strengths and areas for improveme... | Identifying and addressing growth areas in sel... | 0.990183 | 0.1875 | 0.333333 | 0.0 |
| 2 | What forms of unethical behavior are strictly ... | In the recruitment process of Inc., any form o... | [Inc. upholds the highest ethical standards in... | Favoritism or nepotism | 0.920672 | 0.875 | 0.0 | 0.961735 |
| 3 | What is the significance of emotional and aest... | Emotional and aesthetic labor plays a signific... | [LabourEmotional and aesthetic labor involves ... | Emotional and aesthetic labor in the workplace... | 0.974615 | 0.565217 | 0.666667 | 1.0 |
| 4 | What is the purpose of the orientation session... | The purpose of the orientation session at Tech... | [see you thrive at Tech Innovators Inc. Welcom... | The purpose of the orientation session at Tech... | 1.0 | 0.222222 | 0.0 | 0.0 |
| 5 | What mechanisms are in place for reporting vio... | At Tech Innovators Inc., employees can report ... | [and identify areas for improvement.5.3 Report... | Employees can report violations of labor laws ... | 0.962563 | 0.4 | 0.25 | 1.0 |
| 6 | How do employee engagement and disengagement d... | Employee engagement and disengagement differ s... | [are motivated and committed, disengaged emplo... | Employee engagement and disengagement differ i... | 0.920337 | 0.529412 | 1.0 | 1.0 |
| 7 | What steps are needed to extract data from Con... | To extract data from Confluence and create a R... | [IntroductionThis guide provides a step-by-ste... | To extract data from Confluence and create a R... | 0.921967 | 0.571429 | 0.181818 | 1.0 |
| 8 | How does Tech Innovators Inc. promote employee... | Tech Innovators Inc. promotes employee engagem... | [IntroductionTech Innovators Inc. is committed... | Tech Innovators Inc. promotes employee engagem... | 0.980843 | 0.04 | 0.0 | 0.0 |
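The aggregate scores printed for `result_0` are the plain means of the nine per-question scores printed above. The sketch below re-derives them from values copied out of that table, a reminder that each aggregate rests on only nine questions, so a single question can move it by up to about 0.11.

```python
from statistics import mean

# Per-question scores for chunking strategy 0, copied from the result_0.to_pandas() output (rows 0-8).
scores_0 = {
    "answer_relevancy": [0.925206, 0.990183, 0.920672, 0.974615, 1.0, 0.962563, 0.920337, 0.921967, 0.980843],
    "faithfulness": [0.307692, 0.1875, 0.875, 0.565217, 0.222222, 0.4, 0.529412, 0.571429, 0.04],
    "context_recall": [1.0, 0.333333, 0.0, 0.666667, 0.0, 0.25, 1.0, 0.181818, 0.0],
    "context_precision": [1.0, 0.0, 0.961735, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0],
}

# Average each metric over the questions and round like the printed summary.
aggregate_0 = {metric: round(mean(values), 4) for metric, values in scores_0.items()}
print(aggregate_0)
```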


In [62]:
file_name = "test_dataset_it_chunking_0.csv"
csv_file_path = os.path.join(from_root(), "data-test/test_dataset/", file_name)
result_0.to_pandas().to_csv(csv_file_path, index=False)

### Chunking Strategy 1

In [13]:
# Generate all the answers for the questions in the dataset
# Store the answers and retrieved documents (contexts) in lists
answers_1 = []
contexts_1 = []
for question in data_to_test['question']:
    question_dict = {'question': question}
    answer = call_openai_1(question_dict)
    contexts_1.append(answer['contexts'])
    answers_1.append(answer['answer'])

In [14]:
# update the dataset with answers and contexts
data_to_test['answers_1'] = answers_1
data_to_test['contexts_1'] = contexts_1

In [15]:
# Replace empty list context with ['No context'] if there are any
def is_empty_list(lst):
    return len(lst) == 0
data_to_test['contexts_1'] = data_to_test['contexts_1'].apply(lambda x: ['No context'] if is_empty_list(x) else x)

In [23]:
from datasets import Dataset

question = list(data_to_test['question'])
answer = list(data_to_test['answers_1'])
contexts = list(data_to_test['contexts_1'].apply(extract_page_content))
ground_truth = list(data_to_test['ground_truth'])

data_chunking_1 = {
    'question': question,
    'answer': answer,
    'contexts': contexts,
    'ground_truth': ground_truth
}

dataset_chunking_1 = Dataset.from_dict(data_chunking_1)

In [None]:
# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# from langsmith import Client
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_PROJECT"] = "Cohere_RAG_Eval"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
# langsmith_client = Client()  # separate name so the MongoDB `client` used by call_openai_* is not overwritten

In [26]:
from ragas import evaluate
# from ragas.integrations.langsmith import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
result_1 = evaluate(
    dataset_chunking_1,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
    ],
)

In [27]:
result_1

{'answer_relevancy': 0.6372, 'faithfulness': 0.2407, 'context_recall': 0.0000, 'context_precision': 0.0000}

In [28]:
result_1.to_pandas()

| | question | answer | contexts | ground_truth | answer_relevancy | faithfulness | context_recall | context_precision |
|---|---|---|---|---|---|---|---|---|
| 0 | How does the role of the Senior Director respo... | The Senior Director responsible for Analytics ... | [immediate attention to avoid delays in custom... | The role of the Senior Director responsible fo... | 0.983052 | 0.0 | 0.0 | 0.0 |
| 1 | What is the importance of identifying and addr... | Identifying and addressing growth areas in sel... | [must receive regular training on Azure securi... | Identifying and addressing growth areas in sel... | 0.908362 | 0.0 | 0.0 | 0.0 |
| 2 | What forms of unethical behavior are strictly ... | The recruitment process at Tech Innovators Inc... | [promptly.9.2 PenaltiesEmployees found in viol... | Favoritism or nepotism | 0.93473 | 0.0 | 0.0 | 0.0 |
| 3 | What is the significance of emotional and aest... | Emotional and aesthetic labor in the workplace... | [promptly.9.2 PenaltiesEmployees found in viol... | Emotional and aesthetic labor in the workplace... | 0.979084 | 0.0 | 0.0 | 0.0 |
| 4 | What is the purpose of the orientation session... | I'm sorry, but I don't have information regard... | [Innovators Inc.#12, IT parkTech city, Tech st... | The purpose of the orientation session at Tech... | 0.0 | 1.0 | 0.0 | 0.0 |
| 5 | What mechanisms are in place for reporting vio... | I don't have specific information regarding th... | [automated tools.Any violations of this policy... | Employees can report violations of labor laws ... | 0.0 | 0.333333 | 0.0 | 0.0 |
| 6 | How do employee engagement and disengagement d... | Employee engagement and disengagement differ s... | [promptly.9.2 PenaltiesEmployees found in viol... | Employee engagement and disengagement differ i... | 0.933077 | 0.0 | 0.0 | 0.0 |
| 7 | What steps are needed to extract data from Con... | To extract data from Confluence and create a R... | [com.atlassian.confluence.project-linker\nproj... | To extract data from Confluence and create a R... | 0.99666 | 0.5 | 0.0 | 0.0 |
| 8 | How does Tech Innovators Inc. promote employee... | I'm sorry, but I don't have information on how... | [Innovators Inc.#12, IT parkTech city, Tech st... | Tech Innovators Inc. promotes employee engagem... | 0.0 | 0.333333 | 0.0 | 0.0 |
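In strategy 1, `context_recall` and `context_precision` are 0.0 for every question, and the three questions with `answer_relevancy` 0.0 (rows 4, 5 and 8) are exactly the ones where the model declined to answer. The sketch below checks that correspondence using the truncated answer prefixes copied from the table; the phrase list behind `is_abstention` is a hypothetical heuristic, not something the notebook or `ragas` defines. Several rows also start from the same retrieved chunk ("promptly.9.2 Penalties..."), which suggests the `enterprise_data_1` collection or its `vector_index_erp1` index is worth inspecting.

```python
# Values copied from the result_1.to_pandas() output (rows 0-8); answers are truncated there too.
answer_prefixes_1 = [
    "The Senior Director responsible for Analytics ...",
    "Identifying and addressing growth areas in sel...",
    "The recruitment process at Tech Innovators Inc...",
    "Emotional and aesthetic labor in the workplace...",
    "I'm sorry, but I don't have information regard...",
    "I don't have specific information regarding th...",
    "Employee engagement and disengagement differ s...",
    "To extract data from Confluence and create a R...",
    "I'm sorry, but I don't have information on how...",
]
answer_relevancy_1 = [0.983052, 0.908362, 0.93473, 0.979084, 0.0, 0.0, 0.933077, 0.99666, 0.0]

# Hypothetical heuristic: phrases the model used when it declined to answer.
ABSTENTION_PREFIXES = ("I'm sorry", "I don't have")


def is_abstention(answer: str) -> bool:
    return answer.startswith(ABSTENTION_PREFIXES)


abstained_rows = [i for i, answer in enumerate(answer_prefixes_1) if is_abstention(answer)]
zero_relevancy_rows = [i for i, score in enumerate(answer_relevancy_1) if score == 0.0]
print("abstained:", abstained_rows, "zero answer_relevancy:", zero_relevancy_rows)
```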


In [29]:
# Save the evaluation results for chunking strategy 1
file_name = "eval_result_dataset_hr_chunking1.csv"
result_1.to_pandas().to_csv(os.path.join(from_root(), "data-test/test_dataset/", file_name), index=False)

### Chunking Strategy 2

In [81]:
# Generate all the answers for the questions in the dataset
# Store the answers and retrieved documents (contexts) in lists
answers_2 = []
contexts_2 = []
for question in data_to_test['question']:
    question_dict = {'question': question}
    answer = call_openai_2(question_dict)
    contexts_2.append(answer['contexts'])
    answers_2.append(answer['answer'])

In [83]:
# update the dataset with answers and contexts
data_to_test['answers_2'] = answers_2
data_to_test['contexts_2'] = contexts_2

In [None]:
# Replace empty list context with ['No context'] if there are any 
def is_empty_list(lst):
    return len(lst) == 0
data_to_test['contexts_2'] = data_to_test['contexts_2'].apply(lambda x: ['No context'] if is_empty_list(x) else x)

In [88]:
from datasets import Dataset

question = list(data_to_test['question'])
answer = list(data_to_test['answers_2'])
contexts = list(data_to_test['contexts_2'].apply(extract_page_content))
ground_truth = list(data_to_test['ground_truth'])

data_chunking_2 = {
    'question': question,
    'answer': answer,
    'contexts': contexts,
    'ground_truth': ground_truth
}

dataset_chunking_2 = Dataset.from_dict(data_chunking_2)

In [None]:
# Optional, uncomment to trace runs with LangSmith. Sign up here: https://smith.langchain.com.
# from langsmith import Client
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_PROJECT"] = "Cohere_RAG_Online_Eval"
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
# langsmith_client = Client()  # separate name so the MongoDB `client` used by call_openai_* is not overwritten

In [90]:
from ragas import evaluate
# from ragas.integrations.langsmith import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
result_2 = evaluate(
    dataset_chunking_2,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
    ],
)

In [91]:
result_2

{'answer_relevancy': 0.9585, 'faithfulness': 0.5833, 'context_recall': 0.5242, 'context_precision': 0.6198}

In [92]:
result_2.to_pandas()

| | question | answer | contexts | ground_truth | answer_relevancy | faithfulness | context_recall | context_precision |
|---|---|---|---|---|---|---|---|---|
| 0 | How does the role of the Senior Director respo... | The Senior Director responsible for Analytics ... | [page_content='initiatives.Feedback Loop: Impl... | The role of the Senior Director responsible fo... | 0.921495 | 0.5 | 0.545455 | 0.78567 |
| 1 | What is the importance of identifying and addr... | Identifying and addressing growth areas in sel... | [page_content='assessments are used to ensure ... | Identifying and addressing growth areas in sel... | 0.990183 | 0.4 | 0.0 | 0.0 |
| 2 | What forms of unethical behavior are strictly ... | In the recruitment process at Tech Innovators ... | [page_content='standards in all recruitment ac... | Favoritism or nepotism | 0.886105 | 0.857143 | 1.0 | 0.683333 |
| 3 | What is the significance of emotional and aest... | Emotional and aesthetic labor are significant ... | [page_content='healthy work-life balance and p... | Emotional and aesthetic labor in the workplace... | 0.976493 | 0.75 | 1.0 | 1.0 |
| 4 | What is the purpose of the orientation session... | The purpose of the orientation session at Tech... | [page_content='Welcome to Tech Innovators Inc.... | The purpose of the orientation session at Tech... | 0.999999 | 0.181818 | 0.0 | 0.25 |
| 5 | What mechanisms are in place for reporting vio... | At Tech Innovators Inc., employees can report ... | [page_content='IntroductionTech Innovators Inc... | Employees can report violations of labor laws ... | 0.962563 | 0.666667 | 0.5 | 0.281548 |
| 6 | How do employee engagement and disengagement d... | Employee engagement and disengagement differ s... | [page_content='Employee Engagement?Employee en... | Employee engagement and disengagement differ i... | 0.960525 | 0.782609 | 1.0 | 0.864435 |
| 7 | What steps are needed to extract data from Con... | To extract data from Confluence and create a R... | [page_content='IntroductionThis guide provides... | To extract data from Confluence and create a R... | 0.948293 | 0.666667 | 0.272727 | 0.880258 |
| 8 | How does Tech Innovators Inc. promote employee... | Tech Innovators Inc. promotes employee engagem... | [page_content='opportunities. Employment Relat... | Tech Innovators Inc. promotes employee engagem... | 0.980843 | 0.444444 | 0.4 | 0.833333 |
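Strategy 2 raises mean context recall from 0.3813 (strategy 0) to 0.5242, but not uniformly across questions. The sketch compares the per-question `context_recall` values copied from the strategy 0 and strategy 2 tables; `compare_rows` is a hypothetical helper.

```python
# context_recall per question (rows 0-8), copied from the strategy 0 and strategy 2 tables.
recall_0 = [1.0, 0.333333, 0.0, 0.666667, 0.0, 0.25, 1.0, 0.181818, 0.0]
recall_2 = [0.545455, 0.0, 1.0, 1.0, 0.0, 0.5, 1.0, 0.272727, 0.4]


def compare_rows(before: list[float], after: list[float], tol: float = 1e-9) -> tuple[list[int], list[int]]:
    """Return (improved, regressed) row indices between two runs over the same questions."""
    if len(before) != len(after):
        raise ValueError("both runs must score the same questions")
    improved = [i for i, (b, a) in enumerate(zip(before, after)) if a > b + tol]
    regressed = [i for i, (b, a) in enumerate(zip(before, after)) if a < b - tol]
    return improved, regressed


improved, regressed = compare_rows(recall_0, recall_2)
print("improved rows:", improved, "regressed rows:", regressed)
```

Rows 0 and 1 lose recall under the larger chunks while rows 2, 3, 5, 7 and 8 gain, so per-question inspection is worth doing alongside the means.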


In [93]:
# Save the evaluation results for chunking strategy 2
file_name = "eval_result_dataset_hr_chunking2.csv"
result_2.to_pandas().to_csv(os.path.join(from_root(), "data-test/test_dataset/", file_name), index=False)
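To close, the aggregate scores printed for `result_0`, `result_1` and `result_2` can be compared metric by metric. On these nine questions, strategy 2 (chunk size 300, overlap 30) scores highest on three of the four metrics and strategy 0 on context precision; with this few questions, small gaps such as 0.9585 versus 0.9552 answer relevancy should not be over-interpreted.

```python
# Aggregate scores printed for result_0, result_1 and result_2 (chunk size / overlap in comments).
results = {
    "chunking_0": {"answer_relevancy": 0.9552, "faithfulness": 0.4109, "context_recall": 0.3813, "context_precision": 0.6624},  # 200 / 20
    "chunking_1": {"answer_relevancy": 0.6372, "faithfulness": 0.2407, "context_recall": 0.0, "context_precision": 0.0},  # 150 / 15
    "chunking_2": {"answer_relevancy": 0.9585, "faithfulness": 0.5833, "context_recall": 0.5242, "context_precision": 0.6198},  # 300 / 30
}
METRICS = ("answer_relevancy", "faithfulness", "context_recall", "context_precision")

# For each metric, the strategy with the highest aggregate score.
best = {metric: max(results, key=lambda name: results[name][metric]) for metric in METRICS}

for metric in METRICS:
    row = "  ".join(f"{name}={results[name][metric]:.4f}" for name in results)
    print(f"{metric:<18} {row}  best: {best[metric]}")
```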