# MemoRAG: Enhancing Retrieval-Augmented Generation with Memory Models

## Overview
MemoRAG is a Retrieval-Augmented Generation (RAG) framework that adds a memory model as an auxiliary step before the retrieval phase. This helps close the gap in contextual understanding and reasoning that standard RAG techniques show when handling queries with implicit or ambiguous information needs, or when the external knowledge is unstructured.

## Motivation
Standard RAG techniques rely heavily on lexical or semantic matching between the query and the knowledge base. While this approach works well for clear question answering tasks with structured knowledge, it often falls short when handling queries with implicit or ambiguous information (e.g., describing the relationships between main characters in a novel) or when the knowledge base is unstructured (e.g., fiction books). In such cases, lexical or semantic matching seldom produces the desired outputs.

## Key Components
1. **Memory**: A compressed representation of the database created by a long-context model, designed to handle and summarize extensive inputs efficiently.
2. **Retriever**: A standard RAG retrieval model responsible for selecting relevant context from the knowledge base to support the generator.
3. **Generator**: A generative language model that produces responses by combining the query with the retrieved context, similar to standard RAG setups.

## Method Details
### 1. Memory
- The memory module serves as an auxiliary component that helps the retriever find better matches between queries and relevant parts of the database. It takes the original query and the database as inputs and produces staging answers (intermediate outputs such as clues, surrogate queries, or key points), which the retriever uses instead of the original query.
- Long-term memory is constructed by running a long-context model, such as Qwen2-7B-Instruct or Mistral-7B-Instruct-v0.2, over the entire database. This process generates a compressed representation of the database through an attention mechanism.
- The compressed representation is stored as key-value pairs, facilitating efficient and accurate retrieval.
- Released memory models include memorag-qwen2-7b-inst and memorag-mistral-7b-inst, derived from Qwen2-7B-Instruct and Mistral-7B-Instruct-v0.2, respectively.

### 2. Retriever
- The retriever is a standard retrieval model, adapted to take processed queries (created by the memory module as staging answers) instead of the original query.
- It outputs the retrieved **context**, which serves as the basis for generating the final answer.


### 3. Generator
- The generator produces the final response by combining the retriever’s output (retrieved context) with the original query.
- MemoRAG ensures compatibility and consistency by using the memory module’s underlying model as the default generator.
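
The three stages described above can be sketched end to end as follows. This is a toy illustration only: `toy_memory`, `toy_retriever`, and `toy_generator` are hypothetical stand-ins (word overlap instead of a long-context model, an embedding retriever, and an LLM), and the data is made up. The point is just to make the data flow visible: staging answers come from the memory, contexts come from the staging answers, and the final answer comes from the original query plus the contexts.

```python
from typing import Dict, List

# Hypothetical toy data: a "memory" of topic -> summary and a chunked "database".
MEMORY: Dict[str, str] = {
    "biodiversity": "species loss and habitat shifts caused by warming",
    "economy": "crop failures and infrastructure damage raise costs",
}
CHUNKS: List[str] = [
    "Warming oceans drive coral bleaching and species loss.",
    "Crop failures raise food prices in many regions.",
    "The greenhouse effect traps heat in the atmosphere.",
]


def _words(text: str) -> set:
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip(".,?!").lower() for w in text.split()}


def toy_memory(query: str) -> List[str]:
    """Stage 1: turn the query into staging answers (clues) using the memory."""
    clues = [summary for topic, summary in MEMORY.items() if topic in _words(query)]
    return clues + [query]  # keep the original query as a fallback clue


def toy_retriever(clues: List[str], k: int = 1) -> List[str]:
    """Stage 2: retrieve the top-k chunks per clue by word overlap, deduplicated."""
    hits: List[str] = []
    for clue in clues:
        ranked = sorted(CHUNKS, key=lambda c: len(_words(c) & _words(clue)), reverse=True)
        hits.extend(ranked[:k])
    return list(dict.fromkeys(hits))


def toy_generator(query: str, contexts: List[str]) -> str:
    """Stage 3: a real system would prompt an LLM with the query and contexts."""
    return f"Q: {query} | Context: {' '.join(contexts)}"


def toy_memorag(query: str) -> str:
    return toy_generator(query, toy_retriever(toy_memory(query)))


print(toy_memorag("How does climate change affect biodiversity?"))
```

Note that the query itself shares no words with any chunk; only the clue drawn from the memory overlaps with the first chunk, which is the situation the memory step is meant to address.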

## Benefits of the Approach
1. **Extended Scope of Queries:** MemoRAG's preprocessing capabilities enable it to handle complex and long-context tasks that conventional RAG methods struggle with.

2. **Improved Accuracy:** By rewriting queries into staging answers before retrieval, MemoRAG can retrieve more relevant context than standard RAG methods that search with the raw query.

3. **Flexibility:** Adapts to diverse tasks, datasets, and retrieval scenarios.

4. **Robustness:** Improved performance remains consistent across various generators, datasets, and query types.

5. **Efficiency:** The use of key-value compression reduces computational overhead.

## Conclusion
The memory module in MemoRAG improves comprehension of both the query and the database, which enables more effective retrieval. By preprocessing queries into staging answers with a long-context memory model, MemoRAG extends retrieval-augmented generation to queries that direct lexical or semantic matching handles poorly.


<div style="text-align: center;">

<img src="../images/memo_rag.svg" alt="MemoRAG" style="width:100%; height:auto;">
</div>

## Implementation

### Imports

In [1]:
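import json
import os
from typing import List

from dotenv import load_dotenv
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from openai import OpenAI
from pydantic import BaseModel

from helper_functions import *  # expected to provide encode_pdf, among other helpers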


### OpenAI Setup

In [2]:
load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

### Memory Module Classes

In [3]:
class KeyValuePair(BaseModel):
    topic: str
    details: str

class MemoryResponse(BaseModel):
    pairs: List[KeyValuePair]

In [4]:
class MemoryStore:
    """The MemoryStore class is a simplified realization of the Memory Module discussed in the paper.
    Its 'memorize' method creates a prompt cache that mimics a key-value compression of the original text (database).
    The 'create_retrieval_queries' method then uses this cache to build the processed queries used later for retrieval."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.store = None
        self.processed_count = 0
        self.json_parse_failures = 0
        self._last_parse_used_fallback = False


    def memorize(self, document: str):
        """Process document into key-value pairs and store them"""

        # self.reset()
        # batch_size = 10
        system_prompt = (
            "You are an expert at extracting structured key-value pairs from text. "
            "Identify key topics, entities, or questions and pair them with relevant, detailed information. "
            "Ensure the output is well-structured, avoiding redundancy while preserving meaning."
        )

        kv_cache_prompt = """
        You are provided with a long article, chunk by chunk. Read each chunk carefully and extract key topics and their detailed information.
        For each key topic:
        1. Identify the main concept, entity, or potential question
        2. Provide the corresponding detailed information or answer

        Note: the aim is to mimic kv cache memory creation

        Now, the article begins:
        {document}

        The article ends here.

        RESPONSE FORMAT IS IMPORTANT. You must return a JSON object with the following structure:
        {{
            "pairs": [
                {{
                    "topic": "topic1",
                    "details": "details1"
                }},
                {{
                    "topic": "topic2",
                    "details": "details2"
                }}
            ]
        }}
        """

        print(f"Processing chunk {self.processed_count + 1}...")

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": kv_cache_prompt.format(document=document)}
            ],
            response_format={"type": "json_object"},
            seed=42  # For reproducibility
        )

        pairs = self._parse_into_pairs(response.choices[0].message.content)
        if self._last_parse_used_fallback:
            self.json_parse_failures += 1
            if self.json_parse_failures % 5 == 0:
                print(f"Warning: JSON parsing has failed {self.json_parse_failures} times")

        print('*'*20)
        print(f'first extracted pairs: {pairs[0:3]}\n\n')

        # Batch process pairs
        texts = []
        metadatas = []

        for topic, details in pairs:
            if not topic or not details:  # Skip empty pairs
                continue

            combined_text = f"Topic: {topic}\nDetails: {details}"
            texts.append(combined_text)
            metadatas.append({"topic": topic})
        print('*'*20)
        print(f'num of extracted texts: {len(texts)}\n\n')
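        if not texts:
            # Nothing usable was extracted; FAISS.from_texts cannot build an index from an empty list
            print("No key-value pairs extracted from this chunk; skipping.")
        elif self.store is None:
            self.store = FAISS.from_texts(texts, self.embeddings, metadatas=metadatas)
        else:
            # similarity_search("") returns only the default top-4 hits, so read every
            # stored document instead. Note: this relies on the internal `_dict` of
            # LangChain's InMemoryDocstore, which FAISS uses by default.
            existing_topics = {
                doc.metadata.get("topic") for doc in self.store.docstore._dict.values()
            }

            # Filter out duplicates
            new_texts = []
            new_metadatas = []
            for text, metadata in zip(texts, metadatas):
                if metadata["topic"] not in existing_topics:
                    new_texts.append(text)
                    new_metadatas.append(metadata)

            if new_texts:
                self.store.add_texts(new_texts, metadatas=new_metadatas)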

        self.processed_count += 1
        print(f"Processed {self.processed_count} chunks so far\n\n")

    def _parse_into_pairs(self, content: str):
        self._last_parse_used_fallback = False
        try:
            # Parse JSON and validate with Pydantic
            data = json.loads(content)
            validated_data = MemoryResponse(**data)
            return [(pair.topic, pair.details) for pair in validated_data.pairs]
        except (json.JSONDecodeError, ValueError) as e:
            print(f"Error parsing response: {e}")
            # Fall back to text parsing if JSON fails
            self._last_parse_used_fallback = True
            return self._parse_text_into_pairs(content)

    def _parse_text_into_pairs(self, text: str):
        """Original text parsing method as fallback"""
        pairs = []
        lines = text.split('\n')
        current_topic = None
        current_details = []

        for line in lines:
            if line.startswith('Topic:'):
                if current_topic:  # Save previous pair
                    pairs.append((current_topic, ' '.join(current_details)))
                current_topic = line[6:].strip()
                current_details = []
            elif line.startswith('Details:'):
                current_details.append(line[8:].strip())
            elif current_details:  # Continue adding to details if already started
                current_details.append(line.strip())

        # Add the last pair
        if current_topic:
            pairs.append((current_topic, ' '.join(current_details)))

        return pairs

    def create_retrieval_queries(self, query: str) -> list[str] | None:
        """Generate staging answers y = Θ_mem(q, D | θ_mem)
        This should provide rough clues/outline to guide context retrieval"""

        if not self.store:
            return None

        results = self.store.similarity_search_with_score(query, k=min(10, self.store.index.ntotal))  # k could be raised above 10, e.g. to the top 30% of entries
        # page_content already has the form "Topic: ...\nDetails: ...", so it is used as-is
        relevant_info = "\n\n".join(doc.page_content for doc, _ in results)

        memorag_span_prompt = """
            You are given a question related to an article. To answer it effectively, you need to use specific details from the article. You are not provided with the whole article. Instead, you are provided with specific and relevant information from it.
             Your task is to identify and extract one or more specific clue texts from the provided information that are relevant to the question.

            ### Question: {question}
            ### Information: {relevant_info}
            ### Instructions:
            1. You have a general understanding of the provided information. Your task is to generate one or more specific clues that will help in searching for supporting evidence within the article.
            2. The clues are in the form of text spans that will assist in answering the question.
            3. Only output the clues. If there are multiple clues, separate them with a newline.
            """
        memorag_sur_prompt = """
            You are given a question related to an article. To answer it effectively, you need to use specific details from the article.  You are not provided with the whole article. Instead, you are provided with specific and relevant information from it.
            Your task is to generate precise clue questions that can help locate the necessary information for answering the question.

            ### Question: {question}
            ### Information: {relevant_info}
            ### Instructions:
            1. You have a general understanding of the provided information. Your task is to generate one or more specific clues that will help in searching for supporting evidence within the article.
            2. The clues are in the form of precise surrogate questions that clarify the original question.
            3. Only output the clues. If there are multiple clues, separate them with a newline.
            """

        system_prompt = "You are an expert in information retrieval. Your task is to extract specific and relevant information from the provided context to help answer the given question."

        text_spans = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": memorag_span_prompt.format(question=query, relevant_info = relevant_info)}
            ]
        ).choices[0].message.content
        surrogate_queries = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": memorag_sur_prompt.format(question=query, relevant_info = relevant_info)}
            ],
            temperature=0.5  # Allow for some creativity in generating clues
        ).choices[0].message.content

        retrieval_query = text_spans.split("\n") + surrogate_queries.split("\n")
        retrieval_query = [q for q in retrieval_query if len(q.split()) > 3]
        retrieval_query.append(query)

        return retrieval_query

    def save_store(self, path: str):
        if self.store:
            self.store.save_local(path)

    def load_store(self, path: str):
        self.store = FAISS.load_local(path, self.embeddings, allow_dangerous_deserialization=True)
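
The last step of `create_retrieval_queries` merges the two model outputs into a single list of retrieval queries. A minimal standalone sketch of that merging step is shown here, assuming made-up model outputs; `merge_staging_answers` is a hypothetical helper name and not part of the class.

```python
from typing import List


def merge_staging_answers(text_spans: str, surrogate_queries: str, query: str,
                          min_words: int = 4) -> List[str]:
    """Combine clue spans and surrogate questions into retrieval queries.

    Follows the filtering in create_retrieval_queries: one candidate per line,
    candidates with fewer than `min_words` words are dropped, and the original
    query is always appended last. Unlike the class method, this sketch also
    strips surrounding whitespace from each candidate.
    """
    candidates = text_spans.split("\n") + surrogate_queries.split("\n")
    kept = [c.strip() for c in candidates if len(c.split()) >= min_words]
    return kept + [query]


# Made-up outputs standing in for the two chat-completion calls.
spans = "Rising temperatures shift species ranges\n\nCoral reefs"
surrogates = "Which ecosystems are most affected by warming?"
queries = merge_staging_answers(spans, surrogates, "How does climate change affect biodiversity?")
print(queries)
```

The empty line and the two-word span are dropped, so three queries remain: one clue span, one surrogate question, and the original query.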

### Retrieval Function

In [5]:
def retrieve_context(retrieval_query: str, vectorstore, k: int = 3) -> List[str]:
    """Retrieve relevant context using the staging answer and the database vectorstore.
    Implements c = Γ(y, D | γ) from the paper"""

    results = vectorstore.similarity_search(retrieval_query, k=k)
    contexts = [doc.page_content for doc in results]

    return contexts
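
`similarity_search` hides the ranking step. As a rough illustration of what a dense retriever does, the sketch below ranks toy vectors by cosine similarity and returns the top `k` texts. The vectors and the `top_k` helper are made up for illustration; a FAISS index may rank by a different metric, such as L2 distance, and real embeddings have far more dimensions.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], docs: List[Tuple[str, Sequence[float]]], k: int = 3) -> List[str]:
    """Return the texts of the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings" for three chunks.
toy_docs = [
    ("chunk about oceans", [0.9, 0.1, 0.0]),
    ("chunk about economics", [0.0, 0.2, 0.9]),
    ("chunk about coral reefs", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], toy_docs, k=2))
```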

### Generation Function

In [6]:
def generate_answer(query: str, contexts: List[str], temperature: float = 0) -> str:
    """Generate final answer y = Θ(q, c | θ)"""
    prompt = f"""Based ONLY on the provided context, answer the query.

Query: {query}

Context:
{' '.join(contexts)}

Provide a clear and concise answer based solely on the information in the context. If the context doesn't contain sufficient information to fully answer the query, clearly state that the necessary information is not available in the provided context rather than using any external knowledge.
"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are an expert in generating answers from given text. Provide clear, concise answers."},
            {"role": "user", "content": prompt}
        ],
        temperature=temperature,
        max_tokens=500
    )

    return response.choices[0].message.content

### Query Processing Function

In [7]:
def process_query(query: str, memory_store, vectorstore):
    print('*'*20)
    print("\nProcessing Query:", query)
    print("=" * 50)

    # y = Θ_mem(q, D | θ_mem)
    # Fall back to the original query when the memory store is empty
    retrieval_queries = memory_store.create_retrieval_queries(query) or [query]
    print('*'*20)
    print(f"\n\nRetrieval Queries:\n{retrieval_queries}\n\n")

    # c = Γ(y, D | γ)
    all_contexts = []
    for retrieval_query in retrieval_queries:
        contexts = retrieve_context(retrieval_query, vectorstore, k=3)
        all_contexts.extend(contexts)

    unique_contexts = list(dict.fromkeys(all_contexts))  # Deduplicate while preserving order
    print('*'*20)
    if unique_contexts:
        print(f"Retrieved Context Example: {unique_contexts[0]}\n\n")

    # y = Θ(q, c | θ)
    final_answer = generate_answer(query, unique_contexts)
    print('*'*20)
    print(f"\nFinal Answer: {final_answer}")

    return unique_contexts, final_answer
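
Because each retrieval query is searched separately, the same chunk is often returned more than once. `process_query` removes the repeats with `dict.fromkeys`, which keeps the first occurrence of each item in its original position, whereas a `set` gives no ordering guarantee. A small example with made-up chunk names:

```python
all_contexts = ["chunk B", "chunk A", "chunk B", "chunk C", "chunk A"]

# dict keys are unique and keep insertion order, so repeats collapse in place
unique_contexts = list(dict.fromkeys(all_contexts))
print(unique_contexts)  # ['chunk B', 'chunk A', 'chunk C']
```

Keeping the order matters here because chunks found by the earlier, more specific clues stay at the front of the context passed to the generator.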

### Initialize Components

In [None]:
# Initialize memory store
climate_memory_store = MemoryStore()

# Load and process document
path = "../data/Understanding_Climate_Change.pdf"
# loader = PyPDFLoader(path)
# documents = loader.load()
# document_text = '\n'.join([doc.page_content for doc in documents])
# climate_memory_store.memorize(document_text)
climate_vectorstore = encode_pdf(path, chunk_size=1000, chunk_overlap=200)

In [None]:
# climate_memory_store.save_store("../data/Understanding_Climate_Change_Memory_Store.faiss")

### Usage Examples

In [None]:
climate_memory_store.load_store("../data/Understanding_Climate_Change_Memory_Store.faiss")

In [None]:
query_1 = "What are the impacts of climate change on biodiversity?"
query_2 = "Please summarize the climate change article"
query_3 = "Describe the social and economic influence of climate change."

for query in [query_1, query_2, query_3]:
    process_query(query, climate_memory_store, climate_vectorstore)


## Comparison Short Context

### Simple RAG

In [8]:
from evaluation.evalute_rag import *



In [None]:
chunks_query_retriever = climate_vectorstore.as_retriever(search_kwargs={"k": 2})

In [None]:
evaluate_rag(chunks_query_retriever, q_a_file_name = "../data/q_a.json")

### MemoRAG

In [12]:
# import time

def evaluate_memo_rag(memory_store = None,vectorstore = None, num_questions: int = 5,
                      q_a_file_name = "../data/q_a.json",
                      use_saved_results=True,
                      results_path_prefix="../data/saved_evaluation_results.json",
                      use_original_memorag=False,
                      original_memorag_answers=None,
                      original_memorag_retrievals=None
                      )-> None:
    """
    Evaluate the MemoRAG system using predefined metrics.

    Args:
        memory_store: MemoryStore object containing the memory model.
        vectorstore: FAISS vector store containing the encoded document chunks.
        num_questions (int): Number of questions to evaluate (default: 5).
        q_a_file_name (str): Path to the JSON file containing questions and answers (default: "../data/q_a.json").
        use_saved_results (bool): Whether to load previously saved results or to save generated results to disk.
        results_path_prefix (str): Path to save/load generated results.
        use_original_memorag (bool): Whether to use paper's MemoRAG results or simplified MemoRAG implementation from this notebook.
        original_memorag_answers: Pre-generated answers to use in case use_original_memorag=True.
        original_memorag_retrievals: Pre-generated retrievals to use in case use_original_memorag=True.
    """
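    with open(q_a_file_name, "r", encoding="utf-8") as json_file:
        q_a = json.load(json_file)

    questions = [qa["question"] for qa in q_a][:num_questions]
    ground_truth_answers = [qa["answer"] for qa in q_a][:num_questions]

    # Derive the cache path from the base name of the Q&A file; concatenating the
    # full relative path would point into a directory that does not exist.
    prefix_root, prefix_ext = os.path.splitext(results_path_prefix)
    q_a_stem = os.path.splitext(os.path.basename(q_a_file_name))[0]
    full_results_path = f"{prefix_root}_{q_a_stem}{prefix_ext or '.json'}"

    if use_original_memorag:
        # Score pre-generated outputs instead of running this notebook's pipeline
        if original_memorag_answers is None or original_memorag_retrievals is None:
            raise ValueError(
                "use_original_memorag=True requires original_memorag_answers and original_memorag_retrievals"
            )
        # Evaluate only the questions that have both an answer and a retrieval
        n = min(len(questions), len(original_memorag_answers), len(original_memorag_retrievals))
        questions, ground_truth_answers = questions[:n], ground_truth_answers[:n]
        generated_answers = original_memorag_answers[:n]
        retrieved_documents = original_memorag_retrievals[:n]
    elif use_saved_results and os.path.exists(full_results_path):
        with open(full_results_path, "r", encoding="utf-8") as f:
            saved_data = json.load(f)
            generated_answers = saved_data["generated_answers"]
            retrieved_documents = saved_data["retrieved_documents"]
        print(f"Loaded previously saved results from {full_results_path}")
    else:
        # Generate answers and retrieve documents for each question
        generated_answers = []
        retrieved_documents = []
        for question in questions:
            contexts, result = process_query(question, memory_store, vectorstore)
            retrieved_documents.append(contexts)
            generated_answers.append(result)

        with open(full_results_path, "w", encoding="utf-8") as f:
            json.dump({
                "generated_answers": generated_answers,
                "retrieved_documents": retrieved_documents
            }, f, ensure_ascii=False, indent=2)
        print(f"Saved results to {full_results_path}")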

    # Create test cases and evaluate
    test_cases = create_deep_eval_test_cases(questions, ground_truth_answers, generated_answers, retrieved_documents)
    # batch_size = 1

    # for i in range(0, len(test_cases), batch_size):
    #     batch = test_cases[i:i+batch_size]
    #     evaluate(
    #         test_cases=batch,
    #         metrics=[correctness_metric, faithfulness_metric, relevance_metric],
    #         throttle_value=5,
    #         run_async = False,
    #         ignore_errors = False,
    #         print_results=False
    #     )
    #     time.sleep(15)

    evaluate(
        test_cases=test_cases,
        metrics=[correctness_metric, faithfulness_metric, relevance_metric],
        throttle_value = 5,
        run_async = False,
        ignore_errors = True
    )

In [None]:
evaluate_memo_rag(q_a_file_name = "../data/q_a.json", memory_store=climate_memory_store, vectorstore = climate_vectorstore)
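
`evaluate_memo_rag` caches generated answers under a path built from `results_path_prefix` and `q_a_file_name`. Because `q_a_file_name` is itself a relative path, a cache name built from it is safest when only its base name is used. The sketch below shows one way to do that with `os.path`; `results_cache_path` is a hypothetical helper introduced here for illustration.

```python
import os


def results_cache_path(results_path_prefix: str, q_a_file_name: str) -> str:
    """Build '<prefix>_<q_a stem><ext>' without carrying over any directories
    from q_a_file_name."""
    root, ext = os.path.splitext(results_path_prefix)
    stem = os.path.splitext(os.path.basename(q_a_file_name))[0]
    return f"{root}_{stem}{ext or '.json'}"


print(results_cache_path("../data/saved_evaluation_results.json", "../data/q_a_smith_short.json"))
```

With this scheme each Q&A file gets its own cache file next to the prefix, so the short-context and long-context evaluations do not overwrite each other's results.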

## Comparison Long Context

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize memory store
wealth_memory_store = MemoryStore()

# Load and process document
path = "../data/The_Wealth_of_Nations_Project_Gutenberg.pdf"
# loader = PyPDFLoader(path)
# documents = loader.load()
# document_text = '\n'.join([doc.page_content for doc in documents])
# Split document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50000,
    chunk_overlap=5000
)
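
The `chunk_size` and `chunk_overlap` settings control how much text each `memorize` call receives and how much neighbouring chunks share. The sketch below illustrates the idea with a plain sliding window over characters. It is a simplification under stated assumptions: `sliding_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its actual chunks will differ.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary is still seen whole by at least one `memorize` call, at the cost of some repeated processing.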

In [None]:
# chunks = text_splitter.split_text(document_text)
#
# # Memorize each chunk
# for chunk in chunks:
#     wealth_memory_store.memorize(chunk)

In [10]:
# wealth_memory_store.save_store("../data/The_Wealth_of_Nations_Project_Gutenberg_Memory_Store_Long_Context.faiss")
wealth_memory_store.load_store("../data/The_Wealth_of_Nations_Project_Gutenberg_Memory_Store_Long_Context.faiss")

In [11]:
wealth_vectorstore = encode_pdf(path, chunk_size=1000, chunk_overlap=200)

### Simple RAG

In [None]:
chunks_query_retriever = wealth_vectorstore.as_retriever(search_kwargs={"k": 2})

In [None]:
evaluate_rag(chunks_query_retriever, num_questions = 15, q_a_file_name = "../data/q_a_smith_short.json")

### MemoRAG

In [None]:
evaluate_memo_rag(num_questions=15, q_a_file_name = "../data/q_a_smith_short.json", memory_store= wealth_memory_store, vectorstore = wealth_vectorstore)

## Evaluate Original MemoRAG

In [None]:
original_memorag_answers_wealth = ["Adam Smith",
"The invisible hand refers to the self-interested actions of individuals in a free market economy that unintentionally lead to economic outcomes.",
"Land, labor, and capital.",
"Labor determines the value of goods through the quantity of labor required to produce them.",
"Adam Smith argues that while individuals act in their self-interest, this pursuit often benefits society more effectively than if they acted solely for personal gain. He illustrates this through examples like trade, where individuals acting in their self-interest actually facilitate economic growth and prosperity for society as a whole.",
"Smith justifies taxation as necessary for the defense of the state, the maintenance of public order, and the provision of public services.",
"Supply and demand determine prices. When supply exceeds demand, prices tend to fall; when demand exceeds supply, prices tend to rise.",
"The division of labor encourages workers to focus on specific tasks, leading to innovation in tools and methods. This innovation enhances productivity and has long-term economic impacts by increasing the quantity of goods produced and the efficiency of production processes.",
"Smith viewed agriculture as the primary source of wealth, while manufacturing contributed less significantly.",
"Foreign trade enhances domestic wealth by allowing countries to specialize in producing goods where they have a comparative advantage. This specialization leads to increased efficiency and productivity. Smith's 'invisible hand' refers to the self-interested actions of individuals and businesses that, despite their self-interest, ultimately contribute to economic growth and efficiency.",
"Capital accumulation increases the quantity of productive labor by enabling more workers to be employed. This leads to a division of labor where each worker specializes in a specific task, enhancing efficiency and productivity.",
"Challenges include the need for security, the complexity of international transactions, and the potential for fraud and corruption.",
"Value is the amount of labor required to produce a good or service.",
"Smith believed that government should not interfere with the natural division of labor, allowing individuals to specialize in their areas of expertise. He argued that government's role is to protect property rights, ensure justice, and provide public works and institutions that facilitate economic growth.",
"Free trade benefits consumers by lowering prices, increasing product variety, and enhancing economic efficiency.",
"Adam Smith argued that monopolies, whether natural or artificial, are inherently detrimental to society and the economy. They stifle competition, lead to higher prices, and reduce overall economic efficiency. Smith believed that monopolies are a form of corruption and that the benefits they confer on a few at the expense of many are unjust. He advocated for free trade and competition as essential to economic prosperity and social justice"
]

In [None]:
original_memorag_retrievals_wealth = [
    [
        "The circumstances which seem to have introduced and established this policy are explained in the third book...",
        "That which arises from the more solid improvements of agriculture is much more durable...",
        "Among civilized and thriving nations, on the contrary, though a great number of people do not labour at all..."
    ],
    [
        "Whenever he employs any part of it in maintaining unproductive hands of any kind...",
        "The rich merchant, though with his capital he maintains industrious people only...",
        "If there are any merchants among them, they are, properly, only the agents of wealthier merchants..."
    ],
    [
        "As in a well ordered state of things, therefore, those ground expenses...",
        "We should not call a marriage barren or unproductive, though it produced only a son and a daughter...",
        "Among civilized and thriving nations, on the contrary, though a great number of people do not labour at all..."
    ],
    [
        "The far greater part of them he must derive from the labour of other people...",
        "His fortune is greater or less, precisely in proportion to the extent of this power...",
        "A diamond, on the contrary, has scarce any value in use; but a very great quantity of other goods..."
    ],
    [
        "Among competitors of equal wealth and luxury, the same deficiency will generally occasion...",
        "If it is rent, the interest of the landlords will immediately prompt them...",
        "The natural price, therefore, is, as it were, the central price, to which the prices of all commodities are continually gravitating..."
    ],
    [
        "In every period, indeed, of every society, the surplus part both of the rude and manufactured produce...",
        "It then became necessary to say something about the beneficial effects of foreign trade...",
        "They naturally, perhaps necessarily, follow the mode of the times; and their expense comes to be regulated..."
    ],
    [
        "The wages of the labour, and the profits of the stock employed in bringing such commodities to market...",
        "First, by restraining the competition in some employments to a smaller number...",
        "But the public would be a gainer, the work of all artificers coming in this way much cheaper to market..."
    ]
]

In [None]:
evaluate_memo_rag(num_questions=15, q_a_file_name = "../data/q_a_smith_short.json", memory_store= wealth_memory_store, vectorstore = wealth_vectorstore, use_original_memorag=True, original_memorag_answers=original_memorag_answers_wealth, original_memorag_retrievals=original_memorag_retrievals_wealth)

In [None]:
# import json
#
# def evaluate_original_memo_rag(num_questions: int = 15, q_a_file_name = "../data/q_a.json") -> None:
#     """
#     Evaluate the MemoRAG system using predefined metrics.
#
#     Args:
#         num_questions (int): Number of questions to evaluate (default: 15).
#         q_a_file_name (str): Path to the JSON file containing questions and answers (default: "../data/q_a.json").
#
#     """
#     q_a_file_name = q_a_file_name
#     with open(q_a_file_name, "r", encoding="utf-8") as json_file:
#         q_a = json.load(json_file)
#
#     questions = [qa["question"] for qa in q_a][:num_questions]
#     ground_truth_answers = [qa["answer"] for qa in q_a][:num_questions]
#     generated_answers = original_memorag_answers
#     retrieved_documents = original_memorag_retrievals
#
#
#     # Create test cases and evaluate
#     test_cases = create_deep_eval_test_cases(questions, ground_truth_answers, generated_answers, retrieved_documents)
#     evaluate(
#         test_cases=test_cases,
#         metrics=[correctness_metric, faithfulness_metric, relevance_metric],
#         throttle_value = 3,
#         run_async = True,
#         ignore_errors = True
#     )

In [None]:
# evaluate_original_memo_rag(num_questions=2, q_a_file_name = "../data/q_a_smith_short_test.json")