# Overview

This Jupyter notebook performs a pre-production evaluation of a Retrieval-Augmented Generation (RAG) system on a synthetic test dataset. Its sections are summarized below:

**1. Environment Setup and Data Loading**

- The notebook begins by loading environment variables from a `.env` file with the `dotenv` library.
- It then loads synthetic data generated by another Jupyter notebook (`1-ragas-synthetic-test-data-generation.ipynb`) containing question/ground-truth pairs.

**2. Data Preparation**

- The notebook defines several functions for serializing and deserializing data:
  - `serialize_list` and `deserialize_list` functions convert lists to JSON strings and vice versa.
  - `save_dataframe_with_list_column` and `load_dataframe_with_list_column` functions save and load pandas DataFrames with list columns to/from CSV files while preserving list structures.
- These helpers support CSV round-trips of the `contexts` column; the cells shown in this notebook load the test data from JSON instead.

**3. Loading Test Data**

- The notebook loads test data from a JSON file (`test_dataset_hr_groq.json`) generated by another Jupyter notebook (`2-chat-history-extraction.ipynb`) using the `json_to_dataframe` function.
- The test data is converted into a pandas DataFrame.

**4. Converting to RAGAS Data Format**

- The loaded test data is converted into the RAGAS dataset format using the `Dataset` class from the `datasets` library.
- The data is organized into columns for questions, answers, contexts, and ground truths.

**5. Evaluation**

- The notebook performs an evaluation of the RAG system using the `evaluate` function from the `ragas` library.
- The evaluation metrics used are answer relevancy, faithfulness, context recall, and context precision.
- The results are stored in a pandas DataFrame and saved to a CSV file.

**Evaluation Metrics**

The notebook evaluates the RAG system using the following metrics:

- **Answer Relevancy**: measures how well the generated answer addresses the question.
- **Faithfulness**: measures whether the claims in the generated answer are supported by the retrieved contexts.
- **Context Recall**: measures how much of the ground-truth answer can be attributed to the retrieved contexts.
- **Context Precision**: measures whether the retrieved context chunks that are relevant to the question are ranked ahead of irrelevant ones.
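
As a rough illustration of what these scores mean, the sketch below computes ratio-style scores from hand-written verdicts. In RAGAS the verdicts come from an LLM judge and the exact formulas depend on the library version, so `ratio_score`, `context_precision_at_k`, and the example verdicts are illustrative assumptions, not the library's implementation.

```python
import math

def ratio_score(verdicts):
    """Fraction of True verdicts; NaN when there is nothing to judge."""
    if not verdicts:
        return math.nan
    return sum(verdicts) / len(verdicts)

def context_precision_at_k(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    precisions = []
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Faithfulness / context-recall style: 1 of 3 claims is supported
print(ratio_score([True, False, False]))

# No claims could be extracted, so no score can be computed
print(ratio_score([]))

# Relevant chunks at ranks 1 and 3, an irrelevant chunk at rank 2
print(context_precision_at_k([True, False, True]))
```

The NaN returned for an empty verdict list is consistent with the missing faithfulness value that shows up in the results when no statements can be generated from an answer.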

**Output**

**The notebook outputs the evaluation results in a pandas DataFrame and saves it to a CSV file (`eval_result_pre_prod_dataset_hr_groq_deployment.csv`).**


# Prepare testing data

In [1]:
import os
from dotenv import load_dotenv
load_dotenv(encoding='utf-8')

True
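
A `True` return value does not by itself confirm that every variable the later cells rely on is defined. A minimal sketch of an explicit check follows; `missing_env_vars` is a hypothetical helper, and the variable names are taken from the optional LangSmith block later in this notebook, so adjust them to whatever your own `.env` file is expected to provide.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Assumed list of required variables; not prescribed by the notebook itself
required = ["LANGCHAIN_PROJECT", "LANGSMITH_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```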

**When the test data was saved in the notebook `2-chat-history-extraction.ipynb`, the `contexts` column was serialized. Loading that file into a DataFrame therefore requires a function that deserializes it.**

In [2]:
import json
import pandas as pd

def serialize_list(value):
    """Serializes a list to a JSON string."""
    return json.dumps(value)

def deserialize_list(value):
    """Deserializes a JSON string back into a list."""
    return json.loads(value)

def save_dataframe_with_list_column(df, filename):
    """Saves a DataFrame with a list column to a CSV file, preserving the list structure.

    Args:
        df: The DataFrame to save.
        filename: The name of the output CSV file.
    """

    # Serialize on a copy so the caller's DataFrame keeps its list column
    df = df.copy()
    df['contexts'] = df['contexts'].apply(serialize_list)

    # Save the DataFrame to CSV
    df.to_csv(filename, index=False)

def load_dataframe_with_list_column(filename):
    """Loads a DataFrame from a CSV file, restoring the list structure.

    Args:
        filename: The name of the input CSV file.

    Returns:
        The loaded DataFrame.
    """

    # Load the DataFrame
    df = pd.read_csv(filename)

    # Apply the deserialization function to the list column
    df['contexts'] = df['contexts'].apply(deserialize_list)

    return df
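
To see why these helpers are needed: CSV cells are plain text, so a list column has to be encoded before writing and decoded after reading. The sketch below shows the same JSON round-trip using only the standard library; the sample question and contexts are made up for illustration.

```python
import csv
import io
import json

contexts = ["first chunk, with a comma", "second chunk"]

# Write one row whose second cell is the JSON-encoded list
buffer = io.StringIO()
csv.writer(buffer).writerow(["What is the policy?", json.dumps(contexts)])

# Read the row back; the cell arrives as a plain string
buffer.seek(0)
question, raw_contexts = next(csv.reader(buffer))

# Decoding restores the original list structure
restored = json.loads(raw_contexts)
print(type(raw_contexts).__name__, type(restored).__name__)  # str list
```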

## Load the test data from the chat history extraction process

In [3]:
import pandas as pd
import json

def json_to_dataframe(json_file_path):
  """Reads a JSON file and converts it to a pandas DataFrame.

  Args:
    json_file_path (str): The path to the JSON file.

  Returns:
    pandas.DataFrame: The DataFrame created from the JSON data.
  """

  with open(json_file_path, 'r', encoding='utf-8') as f:
    data = json.load(f)

  # Handle different JSON structures
  if isinstance(data, list):
    # If the JSON data is a list of dictionaries, create a DataFrame directly
    df = pd.DataFrame(data)
  elif isinstance(data, dict):
    # If the JSON data is a single dictionary, convert it to a list of dictionaries
    df = pd.DataFrame([data])
  else:
    raise ValueError("Unsupported JSON structure")

  return df
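
The branching above can be exercised without pandas. In the sketch below, `normalize_records` is a hypothetical helper that mirrors the same three cases and returns the list of records that would be passed to `pd.DataFrame`.

```python
import json

def normalize_records(data):
    """Return a list of record dicts, mirroring the branches of json_to_dataframe."""
    if isinstance(data, list):
        # Already a list of records
        return data
    if isinstance(data, dict):
        # A single record becomes a one-element list
        return [data]
    raise ValueError("Unsupported JSON structure")

print(len(normalize_records(json.loads('[{"question": "a"}, {"question": "b"}]'))))  # 2
print(len(normalize_records(json.loads('{"question": "a"}'))))  # 1
```

One consequence of the dictionary branch is that a column-oriented dictionary (keys mapping to lists of values) would be wrapped into a single row with list-valued cells, so the function fits record-oriented JSON best.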

In [5]:
import pandas as pd
from from_root import from_root
file_name = "test_dataset_hr_groq.json"
df_hr_groq = json_to_dataframe(os.path.join(from_root(), "data-test/test-dataset/", file_name))

In [6]:
df_hr_groq

| | question | contexts | ground_truth | response |
|---|---|---|---|---|
| 0 | What is Tech Innovators Inc.'s approach to wor... | [Q12: What is the process for handling workpla... | Tech Innovators Inc. has a zero-tolerance poli... | According to the provided document, Tech Innov... |
| 1 | What resources should be added for new hires i... | [:check_mark:\natlassian-check_mark\n#FFFAE6\n... | Add resources for new hires in the onboarding ... | According to the provided document, for the on... |
| 2 | What training programs are offered in data sci... | [ learn about SEO, social media marketing, and... | Courses covering financial analysis, budgeting... | According to the provided document, Tech Innov... |
| 3 | What services does the Employee Assistance Pro... | [Q12: What is the process for handling workpla... | Employees can contact the Employee Assistance ... | According to the provided document, the Employ... |
| 4 | What is the significance of identifying growth... | [ Self-assessment\nStart by thinking through y... | Identifying growth areas in self-assessment is... | I don't know the answer to that question becau... |
| 5 | How do employee engagement and disengagement d... | [Introduction\nAt Tech Innovators Inc., we bel... | Employee engagement and disengagement differ i... | According to the provided document, employee e... |
| 6 | What is the purpose of adding a header image t... | [Create a stellar overview\nThe overview is th... | Adding a header image to a Confluence space en... | According to the provided document, adding a h... |


## Convert to RAGAS data format

In [7]:
# Convert testing data to RAGAS Dataset format
from datasets import Dataset

question = list(df_hr_groq['question'])
answer = list(df_hr_groq['response'])
contexts = list(df_hr_groq['contexts'])
ground_truth = list(df_hr_groq['ground_truth'])

data = {
    'question': question,
    'answer': answer,
    'contexts': contexts,
    'ground_truth': ground_truth
}

dataset = Dataset.from_dict(data)
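
Before building the dataset, it can help to confirm that the columns line up and that each `contexts` entry is a list of strings and not a still-serialized JSON string. The sketch below is one way to do that; `validate_ragas_columns` is a hypothetical helper, and the column names are the ones used in the cell above.

```python
def validate_ragas_columns(data):
    """Raise ValueError if the column dict is not shaped as the evaluation expects."""
    required = ["question", "answer", "contexts", "ground_truth"]
    missing = [name for name in required if name not in data]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # Every column must describe the same number of rows
    lengths = {name: len(data[name]) for name in required}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Columns differ in length: {lengths}")

    for i, chunks in enumerate(data["contexts"]):
        # Each row's contexts should be a list of strings, not one JSON string
        if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
            raise ValueError(f"Row {i}: contexts must be a list of strings")

# Made-up single-row example
sample = {
    "question": ["What is the leave policy?"],
    "answer": ["Employees accrue leave monthly."],
    "contexts": [["Leave accrues monthly.", "Unused leave carries over."]],
    "ground_truth": ["Leave accrues on a monthly basis."],
}
validate_ragas_columns(sample)  # passes silently
```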

# Evaluation

In [11]:
# Uncomment this block of code if you want to store the evaluation on LangSmith

# from langsmith import Client
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_PROJECT"] = os.getenv('LANGCHAIN_PROJECT')
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGSMITH_API_KEY"] = os.getenv("LANGSMITH_API_KEY")
# client = Client()

In [8]:
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
result = evaluate(
    dataset,
    metrics=[
        answer_relevancy,
        faithfulness,
        context_recall,
        context_precision,
    ],
)

result

Evaluating:   0%|          | 0/28 [00:00<?, ?it/s]

No statements were generated from the answer.


{'answer_relevancy': 0.8369, 'faithfulness': 0.4528, 'context_recall': 0.7619, 'context_precision': 0.8571}

In [9]:
df = result.to_pandas()
df

| | question | answer | contexts | ground_truth | answer_relevancy | faithfulness | context_recall | context_precision |
|---|---|---|---|---|---|---|---|---|
| 0 | What is Tech Innovators Inc.'s approach to wor... | According to the provided document, Tech Innov... | [Q12: What is the process for handling workpla... | Tech Innovators Inc. has a zero-tolerance poli... | 1.0 | 0.666667 | 1.0 | 1.0 |
| 1 | What resources should be added for new hires i... | According to the provided document, for the on... | [:check_mark:\natlassian-check_mark\n#FFFAE6\n... | Add resources for new hires in the onboarding ... | 0.997387 | 0.666667 | 1.0 | 1.0 |
| 2 | What training programs are offered in data sci... | According to the provided document, Tech Innov... | [ learn about SEO, social media marketing, and... | Courses covering financial analysis, budgeting... | 0.989385 | 0.083333 | 0.0 | 0.0 |
| 3 | What services does the Employee Assistance Pro... | According to the provided document, the Employ... | [Q12: What is the process for handling workpla... | Employees can contact the Employee Assistance ... | 0.994299 | 1.0 | 1.0 | 1.0 |
| 4 | What is the significance of identifying growth... | I don't know the answer to that question becau... | [ Self-assessment\nStart by thinking through y... | Identifying growth areas in self-assessment is... | 0.0 | 0.0 | 1.0 | 1.0 |
| 5 | How do employee engagement and disengagement d... | According to the provided document, employee e... | [Introduction\nAt Tech Innovators Inc., we bel... | Employee engagement and disengagement differ i... | 0.915311 | NaN | 1.0 | 1.0 |
| 6 | What is the purpose of adding a header image t... | According to the provided document, adding a h... | [Create a stellar overview\nThe overview is th... | Adding a header image to a Confluence space en... | 0.961716 | 0.3 | 0.333333 | 1.0 |
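
The aggregate scores printed by `result` appear to be plain means of the per-row scores, with the missing faithfulness value skipped. The sketch below checks this for two of the metrics using the values from the table; it is an observation about these numbers, not a statement about how `ragas` computes aggregates internally, and `nan_mean` is a hypothetical helper.

```python
import math

# Per-row scores copied from the result table; NaN marks the unscored row
faithfulness_scores = [0.666667, 0.666667, 0.083333, 1.0, 0.0, math.nan, 0.3]
answer_relevancy_scores = [1.0, 0.997387, 0.989385, 0.994299, 0.0, 0.915311, 0.961716]

def nan_mean(values):
    """Average the values while skipping NaN entries."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

print(round(nan_mean(faithfulness_scores), 4))      # 0.4528
print(round(nan_mean(answer_relevancy_scores), 4))  # 0.8369
```

Because the faithfulness mean covers only six rows, it is not directly comparable to the other three metrics, which cover all seven.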


In [11]:
# Save the result data
file_name = "eval_result_pre_prod_dataset_hr_groq_deployment.csv"
result.to_pandas().to_csv(os.path.join(from_root(), "data-test/eval-result/", file_name), index=False)
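
Writing a CSV typically fails if the target directory does not exist yet, so creating it first can make this cell more robust on a fresh checkout. A minimal sketch follows; `prepare_output_path` is a hypothetical helper, and it is demonstrated against a temporary directory in place of the project root returned by `from_root()`.

```python
from pathlib import Path
import tempfile

def prepare_output_path(root, relative_dir, file_name):
    """Create the output directory if needed and return the full file path."""
    out_dir = Path(root) / relative_dir
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / file_name

# Demonstrated against a temporary directory instead of the project root
with tempfile.TemporaryDirectory() as tmp:
    path = prepare_output_path(tmp, "data-test/eval-result", "eval_result.csv")
    print(path.parent.is_dir())  # True
```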