# MemoRAG: Enhancing Retrieval-Augmented Generation with Memory Models

## Overview
MemoRAG is a Retrieval-Augmented Generation (RAG) framework that adds a memory model as an auxiliary step before retrieval. The memory model supplies the contextual understanding and reasoning that standard RAG techniques lack when a query has implicit or ambiguous information needs or when the external knowledge is unstructured.

## Motivation
Standard RAG techniques rely heavily on lexical or semantic matching between the query and the knowledge base. This works well for clear question-answering tasks over structured knowledge, but it often falls short when the query carries implicit or ambiguous information needs (e.g., describing the relationships between main characters in a novel) or when the knowledge base is unstructured (e.g., fiction books). In such cases, lexical or semantic matching seldom surfaces the passages needed to answer the query.

## Key Components
1. **Memory**: A compressed representation of the database created by a long-context model, designed to handle and summarize extensive inputs efficiently.
2. **Retriever**: A standard RAG retrieval model responsible for selecting relevant context from the knowledge base to support the generator.
3. **Generator**: A generative language model that produces responses by combining the query with the retrieved context, similar to standard RAG setups.

## Method Details
### 1. Memory
- The memory module is an auxiliary component that helps the retriever find better matches between queries and relevant parts of the database. It takes the original query and the database as inputs and produces staging answers (intermediate outputs such as clues, surrogate queries, or key points), which the retriever uses in place of the original query.
- Long-term memory is constructed by running a long-context model, such as Qwen2-7B-Instruct or Mistral-7B-Instruct-v0.2, over the entire database. This process generates a compressed representation of the database through an attention mechanism.
- The compressed representation is stored as key-value pairs, facilitating efficient and accurate retrieval.
- Released memory models include memorag-qwen2-7b-inst and memorag-mistral-7b-inst, derived from Qwen2-7B-Instruct and Mistral-7B-Instruct-v0.2, respectively.

### 2. Retriever
- The retriever is a standard retrieval model, adapted to take processed queries (created by the memory module as staging answers) instead of the original query.
- It outputs the retrieved **context**, which serves as the basis for generating the final answer.
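
One way to picture the retriever's adapted input is a loop that runs one search per staging answer and merges the hits. The sketch below is a toy under stated assumptions, not MemoRAG's retriever: `tokens`, `overlap_score`, and `retrieve_with_clues` are hypothetical names, word overlap stands in for a real embedding model, and the passages and clues are made-up illustration data.

```python
import re
from typing import Dict, List


def tokens(text: str) -> set:
    """Lowercased word set; a stand-in for a real embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> float:
    """Fraction of query words that also occur in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def retrieve_with_clues(clues: List[str], passages: List[str], k: int = 2) -> List[str]:
    """Search once per clue, keep each passage's best score, return the top k."""
    best: Dict[int, float] = {}
    for clue in clues:
        for idx, passage in enumerate(passages):
            score = overlap_score(clue, passage)
            if score > best.get(idx, 0.0):
                best[idx] = score
    # Highest score first; ties fall back to the original passage order
    ranked = sorted(best, key=lambda idx: (-best[idx], idx))
    return [passages[idx] for idx in ranked[:k]]


passages = [
    "Elizabeth Bennet refuses the first proposal from Mr. Darcy.",
    "The estate of Pemberley lies in Derbyshire.",
    "Jane Bennet falls ill while visiting Netherfield.",
]
# Hypothetical staging answers a memory model might draft for the query
# "How do the main characters relate to each other?"
clues = ["Elizabeth Bennet and Mr. Darcy proposal", "Jane Bennet at Netherfield"]
top = retrieve_with_clues(clues, passages, k=2)
print(top)
```

The original query shares almost no words with any passage, while each clue names the entities a passage mentions. That is the matching gap staging answers are meant to close.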


### 3. Generator
- The generator produces the final response by combining the retriever’s output (retrieved context) with the original query.
- By default, MemoRAG uses the memory module’s underlying model as the generator, which keeps the two stages compatible and consistent.
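
The data flow across the three components can be sketched as a single function. This is a minimal illustration under stated assumptions rather than the MemoRAG implementation: `memorag_answer` and the three stand-in callables are hypothetical, and real components would wrap a memory model, a retriever, and a generator.

```python
from typing import Callable, List


def memorag_answer(
    query: str,
    draft_clues: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Memory drafts clues, the retriever searches with them, the generator sees the original query."""
    clues = draft_clues(query)            # stage 1: staging answers from memory
    context: List[str] = []
    for clue in clues:                    # stage 2: retrieve with clues, not the query
        for passage in retrieve(clue):
            if passage not in context:    # drop duplicates, keep first-seen order
                context.append(passage)
    return generate(query, context)       # stage 3: original query + retrieved context


# Stand-in components that only record what they receive
retriever_inputs: List[str] = []


def fake_memory(query: str) -> List[str]:
    return ["clue A", "clue B"]


def fake_retriever(text: str) -> List[str]:
    retriever_inputs.append(text)
    return [f"passage for {text}", "shared passage"]


def fake_generator(query: str, context: List[str]) -> str:
    return f"{query} | {len(context)} passages"


answer = memorag_answer("original question", fake_memory, fake_retriever, fake_generator)
print(answer)
print(retriever_inputs)
```

The retriever only ever sees the clues, while the generator receives the original query together with the deduplicated context.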

## Benefits of the Approach
1. **Extended Scope of Queries:** MemoRAG's preprocessing capabilities enable it to handle complex and long-context tasks that conventional RAG methods struggle with.

2. **Improved Accuracy:** By rewriting queries into staging answers before retrieval, MemoRAG improves retrieval quality over standard RAG methods.

3. **Flexibility:** Adapts to diverse tasks, datasets, and retrieval scenarios.

4. **Robustness:** Improved performance remains consistent across various generators, datasets, and query types.

5. **Efficiency:** The use of key-value compression reduces computational overhead.

## Conclusion
The memory module in MemoRAG improves the system's understanding of both the query and the database, which makes retrieval more effective. By turning queries into staging answers with a long-context memory model, MemoRAG extends retrieval-augmented generation to queries that direct lexical or semantic matching handles poorly.


<div style="text-align: center;">

<img src="../images/memo_rag.svg" alt="MemoRAG" style="width:100%; height:auto;">
</div>

### Imports

In [None]:
import os
import time
from typing import Dict, List

from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
# Newer LangChain releases provide OpenAIEmbeddings in the langchain_openai package
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from openai import OpenAI

### OpenAI Setup

In [None]:
# Read OPENAI_API_KEY from a local .env file into the environment
load_dotenv()
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

### Memory Base Classes

In [None]:
class SimpleMemoryStore:
    """Simple memory store backed by two FAISS indexes: episodic and semantic"""
    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        # Each index is created lazily when its first memory is added
        self.episodic_store = None
        self.semantic_store = None

    def add_memory(self, text: str, memory_type: str):
        """Add a memory to either the episodic or the semantic store"""
        if memory_type == "episodic":
            if self.episodic_store is None:
                self.episodic_store = FAISS.from_texts([text], self.embeddings)
            else:
                self.episodic_store.add_texts([text])
        elif memory_type == "semantic":
            if self.semantic_store is None:
                self.semantic_store = FAISS.from_texts([text], self.embeddings)
            else:
                self.semantic_store.add_texts([text])
        else:
            # Fail loudly instead of silently dropping the memory
            raise ValueError(f"Unknown memory_type: {memory_type!r}")

    def search_memories(self, query: str, k: int = 5) -> Dict[str, List[tuple[str, float]]]:
        """Search both memory stores and return (text, score) pairs per store"""
        results = {"episodic": [], "semantic": []}

        # A store that has not been created yet contributes no results.
        # With the default FAISS index the score is an L2 distance: lower means more similar.
        if self.episodic_store is not None:
            episodic_results = self.episodic_store.similarity_search_with_score(query, k=k)
            results["episodic"] = [(doc.page_content, score) for doc, score in episodic_results]

        if self.semantic_store is not None:
            semantic_results = self.semantic_store.similarity_search_with_score(query, k=k)
            results["semantic"] = [(doc.page_content, score) for doc, score in semantic_results]

        return results
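
`search_memories` returns raw scores, and with the default FAISS index in LangChain those scores are L2 distances, so a lower value means a closer match. The search function later in this notebook converts each score with `1 / (1 + score)` before printing it. The sketch below shows that mapping and one possible way to rank hits from both stores together. `distance_to_relevance` and `rank_memories` are hypothetical helpers, and the sample distances are made up.

```python
from typing import Dict, List, Tuple


def distance_to_relevance(distance: float) -> float:
    """Map a non-negative distance to (0, 1]; a distance of 0 gives 1.0."""
    return 1.0 / (1.0 + distance)


def rank_memories(
    results: Dict[str, List[Tuple[str, float]]]
) -> List[Tuple[str, str, float]]:
    """Flatten both stores into (memory_type, text, relevance), most relevant first."""
    flat = [
        (mem_type, text, distance_to_relevance(distance))
        for mem_type, hits in results.items()
        for text, distance in hits
    ]
    return sorted(flat, key=lambda item: item[2], reverse=True)


# Made-up distances in the shape returned by search_memories
sample = {
    "episodic": [("asked about biodiversity", 0.35)],
    "semantic": [("glaciers are retreating", 0.10), ("sea level is rising", 1.00)],
}
ranked = rank_memories(sample)
for mem_type, text, relevance in ranked:
    print(f"{mem_type:8s} {relevance:.2f}  {text}")
```

Because the mapping is monotonic, sorting by relevance in descending order is equivalent to sorting by distance in ascending order; the conversion only makes the printed numbers easier to read.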

### Search and Generation Functions

In [None]:
def enhanced_search(query: str, memory_store: SimpleMemoryStore, vectorstore: FAISS) -> List[str]:
    """Search the document with a memory-augmented query and return the retrieved texts"""
    start_time = time.time()

    try:
        # Get relevant memories
        memories = memory_store.search_memories(query, k=5)
        print("\nRelevant memories found:")
        for mem_type, mem_list in memories.items():
            for content, score in mem_list:
                relevance = 1 / (1 + score)
                print(f"- {mem_type.title()} Memory (relevance: {relevance:.2f}):")
                print(f"  {content[:100]}...")

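        # Combine query with memories for better search
        memory_context = " ".join(mem[0] for mems in memories.values() for mem in mems)
        # On the first query there are no memories yet, so search with the query alone
        enhanced_query = f"{query} Context: {memory_context}" if memory_context else query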

        # Search document
        results = vectorstore.similarity_search(enhanced_query, k=5)
        contexts = [doc.page_content for doc in results]

        # Store this interaction as a new episodic memory
        if contexts:
            memory_store.add_memory(
                f"Query: {query}\nRelevant content: {contexts[0][:200]}",
                "episodic"
            )

        print(f"Search completed in {time.time() - start_time:.2f} seconds")
        return contexts

    except Exception as e:
        print(f"Search error: {e}")
        return []

def generate_answer(query: str, contexts: List[str]) -> str:
    """Generate answer using only the original query and retrieved contexts"""
    prompt = f"""Based on the provided context, answer the query.

Query: {query}

Retrieved Information:
{' '.join(contexts)}

Provide a clear and concise answer focusing only on the retrieved information.
"""

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a knowledgeable assistant. Provide clear, concise answers."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=200,
        temperature=0.7
    )

    return response.choices[0].message.content
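
`enhanced_search` appends every retrieved memory to the query. As episodic memories accumulate, that appended text grows and may start to outweigh the query itself in the embedding. One possible refinement, sketched here as an assumption rather than as part of the notebook's method, is to add the closest memories first and stop at a character budget. `build_enhanced_query` and `max_chars` are hypothetical names.

```python
from typing import Dict, List, Tuple


def build_enhanced_query(
    query: str,
    memories: Dict[str, List[Tuple[str, float]]],
    max_chars: int = 500,
) -> str:
    """Append memory text to the query, closest memories first, within a character budget."""
    hits = sorted(
        (hit for mem_list in memories.values() for hit in mem_list),
        key=lambda hit: hit[1],  # the score is a distance: smaller means closer
    )
    kept: List[str] = []
    used = 0
    for text, _score in hits:
        if used + len(text) > max_chars:
            break  # stop at the first memory that would exceed the budget
        kept.append(text)
        used += len(text)
    if not kept:
        return query  # no usable memories: search with the query alone
    return f"{query} Context: {' '.join(kept)}"


# Made-up memories in the shape returned by search_memories
memories = {
    "episodic": [("bbbb", 0.5), ("aa", 0.1)],
    "semantic": [("cccccc", 0.3)],
}
print(build_enhanced_query("q", memories, max_chars=6))
print(build_enhanced_query("q", memories, max_chars=100))
```

The budget is counted in characters only to keep the sketch dependency-free; a token count would be the more natural unit with a real embedding model.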

### Initialize Components

In [None]:
# Initialize memory store
memory_store = SimpleMemoryStore()

# Load the PDF (PyPDFLoader returns one Document per page) and index it with FAISS
path = "../data/Understanding_Climate_Change.pdf"
loader = PyPDFLoader(path)
documents = loader.load()
vectorstore = FAISS.from_documents(documents, OpenAIEmbeddings())

### Usage Examples

In [None]:
# Example query 1 - information seeking
query = "What are the impacts of climate change on biodiversity?"

print("\nProcessing Query:", query)
print("=" * 50)

# Retrieve contexts (the search also stores this interaction as an episodic memory)
contexts = enhanced_search(query, memory_store, vectorstore)

# Display results
if contexts:
    print("\nRetrieved Contexts:")
    for i, context in enumerate(contexts, 1):
        print(f"\nContext {i}:")
        print("=" * 50)
        print(context[:200] + "..." if len(context) > 200 else context)
        print("-" * 50)

    answer = generate_answer(query, contexts)
    print("\nGenerated Answer:")
    print("=" * 50)
    print(answer)
    print("-" * 50)
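
The search, display, and answer steps above are repeated for every example query. A small helper can wrap them. The sketch below is one possible shape: `preview` and `run_query` are hypothetical names, and the search and answer steps are passed in as callables so that the helper can be tried without an API key.

```python
from typing import Callable, List


def preview(text: str, limit: int = 200) -> str:
    """Truncate long text for display, marking the cut with an ellipsis."""
    return text[:limit] + "..." if len(text) > limit else text


def run_query(
    query: str,
    search: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> str:
    """Search, print a preview of each context, then print and return the answer."""
    print("\nProcessing Query:", query)
    contexts = search(query)
    if not contexts:
        return ""  # nothing retrieved, so there is nothing to answer from
    for i, context in enumerate(contexts, 1):
        print(f"\nContext {i}:")
        print(preview(context))
        print("-" * 50)
    result = answer(query, contexts)
    print("\nGenerated Answer:")
    print(result)
    return result


# Stand-in callables for a dry run
demo = run_query(
    "example query",
    lambda q: ["x" * 250, "short context"],
    lambda q, ctx: f"answer from {len(ctx)} contexts",
)
```

In this notebook the call would look like `run_query(query, lambda q: enhanced_search(q, memory_store, vectorstore), generate_answer)`.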

In [7]:
# Example query 2 - information aggregation
query = "Please summarize the climate change article"

print("\nProcessing Query:", query)
print("=" * 50)

# Retrieve contexts (the search also stores this interaction as an episodic memory)
contexts = enhanced_search(query, memory_store, vectorstore)

# Display results
if contexts:
    print("\nRetrieved Contexts:")
    for i, context in enumerate(contexts, 1):
        print(f"\nContext {i}:")
        print("=" * 50)
        print(context[:200] + "..." if len(context) > 200 else context)
        print("-" * 50)

    answer = generate_answer(query, contexts)
    print("\nGenerated Answer:")
    print("=" * 50)
    print(answer)
    print("-" * 50)


Processing Query: Please summarize the climate change article

Relevant memories found:
- Episodic Memory (relevance: 0.74):
  Query: What are the impacts of climate change on biodiversity?
Relevant content: goals. Policies sho...
Search completed in 7.52 seconds

Retrieved Contexts:

Context 1:
goals. Policies should promote synergies between biodiversity conservation and climate 
action.  
Chapter 10: Climate Change and Human Health  
Health Impacts  
Heat -Related Illnesses  
Rising temper...
--------------------------------------------------

Context 2:
Local communities are often on the front lines of climate impacts and can be powerful agents 
of change. Community -based conservation projects involve residents in protecting and 
restoring natural r...
--------------------------------------------------

Context 3:
Healthy ecosystems provide services such as water filtration, pollination, and climate 
regulation. Protecting and restoring ecosystems enhances their ability to suppor

In [8]:
# Example query 3 - ambiguous information needs and information seeking
query = "Describe the social and economic influence of climate change."

print("\nProcessing Query:", query)
print("=" * 50)

# Retrieve contexts (the search also stores this interaction as an episodic memory)
contexts = enhanced_search(query, memory_store, vectorstore)

# Display results
if contexts:
    print("\nRetrieved Contexts:")
    for i, context in enumerate(contexts, 1):
        print(f"\nContext {i}:")
        print("=" * 50)
        print(context[:200] + "..." if len(context) > 200 else context)
        print("-" * 50)

    answer = generate_answer(query, contexts)
    print("\nGenerated Answer:")
    print("=" * 50)
    print(answer)
    print("-" * 50)


Processing Query: Describe the social and economic influence of climate change.

Relevant memories found:
- Episodic Memory (relevance: 0.76):
  Query: What are the impacts of climate change on biodiversity?
Relevant content: goals. Policies sho...
- Episodic Memory (relevance: 0.75):
  Query: Please summarize the climate change article
Relevant content: goals. Policies should promote ...
Search completed in 2.25 seconds

Retrieved Contexts:

Context 1:
goals. Policies should promote synergies between biodiversity conservation and climate 
action.  
Chapter 10: Climate Change and Human Health  
Health Impacts  
Heat -Related Illnesses  
Rising temper...
--------------------------------------------------

Context 2:
Local communities are often on the front lines of climate impacts and can be powerful agents 
of change. Community -based conservation projects involve residents in protecting and 
restoring natural r...
--------------------------------------------------

Context 3:
empower