# MemoRAG: Enhancing Retrieval-Augmented Generation with Memory Models

## Overview
MemoRAG is a Retrieval-Augmented Generation (RAG) framework that incorporates a memory model as an auxiliary step before the retrieval phase. In doing so, it bridges the gap in contextual understanding and reasoning that standard RAG techniques face when addressing queries with implicit or ambiguous information needs and unstructured external knowledge.

## Motivation
Standard RAG techniques rely heavily on lexical or semantic matching between the query and the knowledge base. While this approach works well for clear question answering tasks with structured knowledge, it often falls short when handling queries with implicit or ambiguous information (e.g., describing the relationships between main characters in a novel) or when the knowledge base is unstructured (e.g., fiction books). In such cases, lexical or semantic matching seldom produces the desired outputs.

## Key Components
1. **Memory**: A compressed representation of the database created by a long-context model, designed to handle and summarize extensive inputs efficiently.
2. **Retriever**: A standard RAG retrieval model responsible for selecting relevant context from the knowledge base to support the generator.
3. **Generator**: A generative language model that produces responses by combining the query with the retrieved context, similar to standard RAG setups.

## Method Details
### 1. Memory
- The memory module is an auxiliary component that helps the retriever match queries to the relevant parts of the database. It takes the original query and the database as inputs and produces staging answers (intermediate outputs such as clues, surrogate queries, or key points), which the retriever uses instead of the original query.
- Long-term memory is constructed by running a long-context model, such as Qwen2-7B-Instruct or Mistral-7B-Instruct-v0.2, over the entire database. This process generates a compressed representation of the database through an attention mechanism.
- The compressed representation is stored as key-value pairs, facilitating efficient and accurate retrieval.
- Released memory models include memorag-qwen2-7b-inst and memorag-mistral-7b-inst, derived from Qwen2-7B-Instruct and Mistral-7B-Instruct-v0.2, respectively.

### 2. Retriever
- The retriever is a standard retrieval model, adapted to take processed queries (created by the memory module as staging answers) instead of the original query.
- It outputs the retrieved **context**, which serves as the basis for generating the final answer.


### 3. Generator
- The generator produces the final response by combining the retriever’s output (retrieved context) with the original query.
- MemoRAG ensures compatibility and consistency by using the memory module’s underlying model as the default generator.
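
The three stages above can be sketched end to end with toy components. This is a minimal illustration under stated assumptions, not the MemoRAG implementation: the word-overlap retriever stands in for a real embedding model, the memory and generator are hard-coded stubs, and every name (`tokenize`, `overlap_retrieve`, `memorag_answer`, `toy_memory`, `toy_generator`) as well as the sample sentences are hypothetical. It shows how retrieving with a staging answer can surface a chunk that the raw query misses.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_retrieve(search_text: str, chunks: List[str], k: int = 1) -> List[str]:
    """Rank chunks by the number of words they share with the search text."""
    wanted = tokenize(search_text)
    ranked = sorted(chunks, key=lambda c: len(wanted & tokenize(c)), reverse=True)
    return ranked[:k]


def memorag_answer(
    query: str,
    chunks: List[str],
    draft_clues: Callable[[str], str],
    generate: Callable[[str, List[str]], str],
    k: int = 1,
) -> str:
    clues = draft_clues(query)                    # memory: query -> staging answer
    context = overlap_retrieve(clues, chunks, k)  # retriever: staging answer -> context
    return generate(query, context)               # generator: original query + context


# Invented toy "database" of three chunks.
chunks = [
    "Elizabeth initially dislikes Darcy because of his pride.",
    "The main garden at Pemberley is described in detail.",
    "Darcy later proposes to Elizabeth a second time.",
]


def toy_memory(query: str) -> str:
    """Hard-coded clues; a real memory model would derive them from the database."""
    return "Elizabeth Darcy pride dislikes proposes"


def toy_generator(query: str, context: List[str]) -> str:
    """Echoes its inputs; a real generator would be a language model."""
    return f"Q: {query} | Context: {' '.join(context)}"


query = "How do the main characters relate to each other?"
direct = overlap_retrieve(query, chunks)              # retrieval with the raw query
guided = overlap_retrieve(toy_memory(query), chunks)  # retrieval with the staging answer

print(direct[0])  # The main garden at Pemberley is described in detail.
print(guided[0])  # Elizabeth initially dislikes Darcy because of his pride.
print(memorag_answer(query, chunks, toy_memory, toy_generator))
```

The raw query shares only incidental words ("the", "main") with the garden chunk, so lexical matching picks it. The staging answer names the entities involved, which steers retrieval to the relevant chunk.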

## Benefits of the Approach
1. **Extended Scope of Queries:** MemoRAG's preprocessing capabilities enable it to handle complex and long-context tasks that conventional RAG methods struggle with.

2. **Improved Accuracy:** By replacing the raw query with staging answers before retrieval, MemoRAG aims to retrieve more relevant context than standard RAG methods.

3. **Flexibility:** Adapts to diverse tasks, datasets, and retrieval scenarios.

4. **Robustness:** Improved performance remains consistent across various generators, datasets, and query types.

5. **Efficiency:** The use of key-value compression reduces computational overhead.

## Conclusion
The memory module in MemoRAG is designed to improve comprehension of both the query and the database, enabling more effective retrieval. By preprocessing queries into staging answers with a long-context memory model, MemoRAG extends retrieval-augmented generation to queries with implicit or ambiguous information needs.


<div style="text-align: center;">

<img src="../images/memo_rag.svg" alt="MemoRAG" style="width:100%; height:auto;">
</div>

## Implementation

### Imports

In [35]:
import os
import json
from typing import List

from dotenv import load_dotenv
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # used by MemoryStore
from langchain_community.document_loaders import PyPDFLoader  # used to load the PDF
from openai import OpenAI
from helper_functions import *

### OpenAI Setup

In [36]:
load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

### Memory Module Classes

In [37]:
class MemoryStore:
    """Simplified realization of the Memory Module described in the MemoRAG paper.
    The 'memorize' method builds a key-value compression of the original text (the database);
    here that compression is approximated by LLM-extracted topic/detail pairs indexed in FAISS.
    The 'get_staging_answer' method then uses this compressed memory to produce the processed
    query (staging answer) that is later used for retrieval."""

    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.store = None

    def memorize(self, document: str):
        """Process document into key-value pairs and store them
        Keys: Embeddings of main topics/questions
        Values: Corresponding detailed information"""

        extraction_prompt = """Extract key topics and their detailed information from this text.
        For each key topic:
        1. Identify the main concept, entity, or potential question
        2. Provide the corresponding detailed information or answer

        Text:
        {document}

        Format each pair as:
        Topic: <topic>
        Details: <details>
        """

        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Extract key-value pairs from document"},
                {"role": "user", "content": extraction_prompt.format(document=document)}
            ]
        )

        # Parse the response into key-value pairs
        pairs = self._parse_into_pairs(response.choices[0].message.content)
        if not pairs:
            return  # nothing could be parsed, so there is nothing to index

        # Store topic and details together as searchable text,
        # embedding all pairs in a single batch
        texts = [f"Topic: {topic}\nDetails: {details}" for topic, details in pairs]
        metadatas = [{"topic": topic} for topic, _ in pairs]

        if self.store is None:
            self.store = FAISS.from_texts(texts, self.embeddings, metadatas=metadatas)
        else:
            self.store.add_texts(texts, metadatas=metadatas)

    def _parse_into_pairs(self, text: str):
        """Parse GPT response into list of (topic, details) pairs"""
        pairs = []
        lines = text.split('\n')
        current_topic = None
        current_details = []

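        for line in lines:
            line = line.strip()  # tolerate indentation in the model's output
            if line.startswith('Topic:'):
                if current_topic:  # Save previous pair
                    pairs.append((current_topic, ' '.join(current_details)))
                current_topic = line[6:].strip()
                current_details = []
            elif line.startswith('Details:'):
                current_details.append(line[8:].strip())
            elif line and current_topic:
                # Continuation of a multi-line Details block
                current_details.append(line)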

        # Add the last pair
        if current_topic:
            pairs.append((current_topic, ' '.join(current_details)))

        return pairs

    def get_staging_answer(self, query: str) -> str:
        """Generate staging answer y = Θ_mem(q, D | θ_mem)
        This should provide rough clues/outline to guide context retrieval"""

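        if self.store is None:
            # Nothing memorized yet: fall back to the raw query so retrieval still works
            return query

        results = self.store.similarity_search_with_score(query, k=5)
        # page_content already has the form "Topic: ...\nDetails: ...",
        # so it is used as-is instead of prefixing the topic a second time
        relevant_info = "\n\n".join(doc.page_content for doc, _ in results)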

        prompt = f"""Based on available information, generate a rough outline/staging answer.
        This should help guide retrieval of detailed context, but doesn't need to be fully accurate.

        Query: {query}

        Relevant Information:
        {relevant_info}

        Generate a rough outline that could help locate the correct answers:"""

        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Generate rough outlines to guide information retrieval"},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7  # Allow for some creativity in generating clues
        )

        return response.choices[0].message.content
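
To make the expected extraction format concrete, the parsing step can be exercised on its own, without any API call. The sketch below is a standalone function that follows the same `Topic:` / `Details:` convention as `_parse_into_pairs`. The name `parse_topic_pairs` and the sample response text are assumptions made for illustration, and this variant also accepts continuation lines after a `Details:` line.

```python
from typing import List, Optional, Tuple


def parse_topic_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse 'Topic: ... / Details: ...' blocks into (topic, details) tuples."""
    pairs: List[Tuple[str, str]] = []
    topic: Optional[str] = None
    details: List[str] = []

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Topic:"):
            if topic is not None:  # close the previous block
                pairs.append((topic, " ".join(details)))
            topic = line[len("Topic:"):].strip()
            details = []
        elif line.startswith("Details:"):
            details.append(line[len("Details:"):].strip())
        elif line and topic is not None:
            details.append(line)  # continuation of a multi-line Details block

    if topic is not None:  # close the final block
        pairs.append((topic, " ".join(details)))
    return pairs


# Hypothetical model response in the format requested by the extraction prompt.
sample = """Topic: Greenhouse gases
Details: Gases such as CO2 and methane trap heat
in the atmosphere.

Topic: Deforestation
Details: Removing forests releases stored carbon."""

pairs = parse_topic_pairs(sample)
for topic, details in pairs:
    print(f"{topic} -> {details}")
# Greenhouse gases -> Gases such as CO2 and methane trap heat in the atmosphere.
# Deforestation -> Removing forests releases stored carbon.
```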

### Retrieval Function

In [38]:
def retrieve_context(staging_answer: str, vectorstore) -> List[str]:
    """Retrieve relevant context using the staging answer and the database vectorstore.
    Implements c = Γ(y, D | γ) from the paper"""

    results = vectorstore.similarity_search(staging_answer, k=5)
    contexts = [doc.page_content for doc in results]

    return contexts
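
Because `retrieve_context` only calls `similarity_search(text, k)` and reads `page_content` from the results, it can be tried offline with a stand-in store. The sketch below assumes a hypothetical `StubVectorStore` that ranks texts by shared words instead of embeddings. `StubDocument` and the sample texts are likewise invented for illustration, and `retrieve_context` is repeated only so the block runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubDocument:
    """Hypothetical document type exposing only page_content."""
    page_content: str


class StubVectorStore:
    """Hypothetical stand-in for a vector store; ranks by word overlap."""

    def __init__(self, texts: List[str]):
        self.docs = [StubDocument(t) for t in texts]

    def similarity_search(self, text: str, k: int = 5) -> List[StubDocument]:
        wanted = set(text.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(wanted & set(d.page_content.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def retrieve_context(staging_answer: str, vectorstore) -> List[str]:
    """Same logic as the notebook function above."""
    results = vectorstore.similarity_search(staging_answer, k=5)
    return [doc.page_content for doc in results]


store = StubVectorStore([
    "Sea levels are rising.",
    "Forests act as carbon sinks.",
])
contexts = retrieve_context("role of forests as carbon sinks", store)
print(contexts[0])  # Forests act as carbon sinks.
```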

### Generation Function

In [39]:
def generate_answer(query: str, contexts: List[str]) -> str:
    """Generate final answer y = Θ(q, c | θ)"""
    prompt = f"""Based on the provided context, answer the query.

Query: {query}

Retrieved Information:
{' '.join(contexts)}

Provide a clear and concise answer focusing only on the retrieved information.
"""

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a knowledgeable assistant. Provide clear, concise answers."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=200,
        temperature=0.7
    )

    return response.choices[0].message.content

### Query Processing Function

In [40]:
def process_query(query: str, memory_store, vectorstore):
    print("\nProcessing Query:", query)
    print("=" * 50)

    # y = Θ_mem(q, D | θ_mem)
    staging_answer = memory_store.get_staging_answer(query)
    print(f"Staging Answer:\n{staging_answer}")

    # c = Γ(y, D | γ)
    contexts = retrieve_context(staging_answer, vectorstore)
    print(f"Retrieved Context Example: {contexts[0] if contexts else '(no context retrieved)'}")

    # y = Θ(q, c | θ)
    final_answer = generate_answer(query, contexts)
    print(f"Final Answer: {final_answer}")

    return contexts, final_answer

### Initialize Components

In [41]:
# Initialize memory store
memory_store = MemoryStore()

# Load and process document
path = "../data/Understanding_Climate_Change.pdf"
loader = PyPDFLoader(path)
documents = loader.load()
document_text = '\n'.join([doc.page_content for doc in documents])
memory_store.memorize(document_text)
# vectorstore = FAISS.from_documents(documents, OpenAIEmbeddings())
chunks_vector_store = encode_pdf(path, chunk_size=1000, chunk_overlap=200)

### Usage Examples

In [28]:
# Example query 1 - information seeking
# Example query 2 - information aggregation
# Example query 3 - ambiguous information needs and information seeking

query_1 = "What are the impacts of climate change on biodiversity?"
query_2 = "Please summarize the climate change article"
query_3 = "Describe the social and economic influence of climate change."

for query in [query_1, query_2, query_3]:
    process_query(query, memory_store, chunks_vector_store)


Processing Query: What are the impacts of climate change on biodiversity?
Staging Answer:
I. Introduction to Climate Change
    A. Definition and overview
    B. Human contribution to climate change

II. Modern Observations of Climate Change
    A. Increase in global temperatures
    B. Rise in sea levels
    C. Occurrence of extreme weather events
    D. Human activities driving recent changes

III. Historical Context of Climate Change
    A. Earth's climate history
    B. Glacial cycles over the past 650,000 years
    C. Impact of the end of the last ice age

IV. Impact of Deforestation on Climate Change
    A. Role of forests as carbon sinks
    B. Release of carbon dioxide due to deforestation
    C. Exacerbation of the greenhouse effect

V. Greenhouse Gases and Climate Change
    A. Role of greenhouse gases in trapping heat
    B. Intensification of the greenhouse effect by human activities
    C. Increase in greenhouse gases from burning fossil fuels

VI. Conclusion on the Impac

## Comparison

### Simple RAG

In [42]:
from evaluation.evalute_rag import *

In [43]:
# chunks_vector_store = encode_pdf(path, chunk_size=1000, chunk_overlap=200)
chunks_query_retriever = chunks_vector_store.as_retriever(search_kwargs={"k": 2})

In [44]:
evaluate_rag(chunks_query_retriever)

Answering the question from the retrieved context...
Answering the question from the retrieved context...
Answering the question from the retrieved context...
Answering the question from the retrieved context...
Answering the question from the retrieved context...


Event loop is already running. Applying nest_asyncio patch to allow async execution...


Evaluating 5 test case(s) in parallel: |██        | 20% (1/5) [Time Taken: 00:14, 14.40s/test case]ERROR:root:OpenAI rate limit exceeded. Retrying: 1 time(s)...
ERROR:root:OpenAI rate limit exceeded. Retrying: 1 time(s)...
Evaluating 5 test case(s) in parallel: |████      | 40% (2/5) [Time Taken: 00:21, 10.04s/test case]ERROR:root:OpenAI rate limit exceeded. Retrying: 1 time(s)...
Evaluating 5 test case(s) in parallel: |██████    | 60% (3/5) [Time Taken: 00:27,  8.44s/test case]ERROR:root:OpenAI rate limit exceeded. Retrying: 2 time(s)...
Evaluating 5 test case(s) in parallel: |██████████|100% (5/5) [Time Taken: 00:39,  7.90s/test case]



Metrics Summary

  - ✅ Correctness (GEval) (score: 1.0, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The actual output matches the expected output exactly, indicating factual correctness., error: None)
  - ✅ Faithfulness (score: 1.0, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: None, error: None)
  - ✅ Contextual Relevancy (score: 1.0, threshold: 1.0, strict: False, evaluation model: gpt-4, reason: The score is 1.00 because the retrieval context perfectly aligns with the input, demonstrating a complete match in relevancy., error: None)

For test case:

  - input: What marked the beginning of the modern climate era and human civilization?
  - actual output: The abrupt end of the last ice age about 11,700 years ago marked the beginning of the modern climate era and human civilization.
  - expected output: The abrupt end of the last ice age about 11,700 years ago marked the beginning of the modern climate era and human civilization.
  - context: No




### MemoRAG

In [45]:
def evaluate_memo_rag(num_questions: int = 5) -> None:
    """
    Evaluate the MemoRAG system using predefined metrics.

    Args:
        num_questions (int): Number of questions to evaluate (default: 5).
    """
    # llm = ChatOpenAI(temperature=0, model_name="gpt-4o", max_tokens=2000)
    # question_answer_from_context_chain = create_question_answer_from_context_chain(llm)

    # Load questions and answers from JSON file
    q_a_file_name = "../data/q_a.json"
    with open(q_a_file_name, "r", encoding="utf-8") as json_file:
        q_a = json.load(json_file)

    questions = [qa["question"] for qa in q_a][:num_questions]
    ground_truth_answers = [qa["answer"] for qa in q_a][:num_questions]
    generated_answers = []
    retrieved_documents = []

    # Generate answers and retrieve documents for each question
    for question in questions:
        # context = retrieve_context_per_question(question, chunks_query_retriever)
        # retrieved_documents.append(context)
        # context_string = " ".join(context)
        contexts, result = process_query(question, memory_store, chunks_vector_store)
        retrieved_documents.append(contexts)
        generated_answers.append(result)

    # Create test cases and evaluate
    test_cases = create_deep_eval_test_cases(questions, ground_truth_answers, generated_answers, retrieved_documents)
    evaluate(
        test_cases=test_cases,
        metrics=[correctness_metric, faithfulness_metric, relevance_metric]
    )

In [46]:
evaluate_memo_rag()


Processing Query: What does climate change refer to?
Staging Answer:
I. Definition of climate change
    A. Refers to significant, long-term changes in the global climate
    B. Encompasses overall weather patterns such as temperature, precipitation, and wind patterns
    C. Human activities like burning fossil fuels and deforestation contribute significantly

II. Modern Observations
    A. Rapid increase in global temperatures, sea levels, and extreme weather events
    B. Intergovernmental Panel on Climate Change (IPCC) extensively documents these changes
    C. Historical evidence shows recent changes primarily driven by human activities and greenhouse gas emissions

III. Greenhouse Gases
    A. Primary cause of recent climate change
    B. Includes carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O)
    C. Trapping heat from the sun creates a "greenhouse effect"

IV. Historical Context
    A. Earth's climate changes throughout history
    B. Seven cycles of glacial advanc

Event loop is already running. Applying nest_asyncio patch to allow async execution...


Evaluating 5 test case(s) in parallel: |          |  0% (0/5) [Time Taken: 00:00, ?test case/s]ERROR:root:OpenAI rate limit exceeded. Retrying: 1 time(s)...
ERROR:root:OpenAI rate limit exceeded



Metrics Summary

  - ✅ Correctness (GEval) (score: 0.7933778855586217, threshold: 0.5, strict: False, evaluation model: gpt-4o, reason: The actual output correctly identifies the burning of fossil fuels and deforestation as significant contributors to climate change, aligning well with the expected output. However, it introduces additional details such as the alteration of seasons and melting ice, which are not mentioned in the expected output., error: None)
  - ❌ Faithfulness (score: 0.5, threshold: 0.7, strict: False, evaluation model: gpt-4, reason: None, error: None)
  - ❌ Contextual Relevancy (score: 0.6, threshold: 1.0, strict: False, evaluation model: gpt-4, reason: The score is 0.60 because the context solely describes the effects of climate change, like 'changing seasons, melting ice and rising sea levels,' and the need for 'eco-friendly agricultural practices,' but does not address the input question about activities that have significantly contributed to climate change ove
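
The ✅ and ❌ marks in the metric summaries are consistent with a simple rule: a metric passes when its score is at least its threshold. The sketch below applies that rule to the scores and thresholds reported for the MemoRAG test case above. The rule is an assumption inferred from the printed output, and the helper name `passes` is hypothetical.

```python
from typing import Dict, Tuple


def passes(score: float, threshold: float) -> bool:
    """Assumed pass rule: the score must reach the threshold."""
    return score >= threshold


# (score, threshold) pairs copied from the MemoRAG metrics summary above.
memo_rag_case: Dict[str, Tuple[float, float]] = {
    "Correctness (GEval)": (0.7933778855586217, 0.5),
    "Faithfulness": (0.5, 0.7),
    "Contextual Relevancy": (0.6, 1.0),
}

verdicts = {name: passes(score, thr) for name, (score, thr) in memo_rag_case.items()}
for name, ok in verdicts.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
# PASS  Correctness (GEval)
# FAIL  Faithfulness
# FAIL  Contextual Relevancy
```

Note that Contextual Relevancy uses a threshold of 1.0, so only a fully relevant retrieval context passes. The two summaries shown here also cover different test questions, so their scores are not directly comparable.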


