# Vector-Only RAG

In this notebook we explore how to build a vector-only RAG system. RAG stands for Retrieval-Augmented Generation. The basic idea of this technique is to avoid relying on the LLM to recall knowledge and instead have an external system, such as a database, provide the relevant information.

The main advantage of vector-only RAG systems over Knowledge Graph RAG is that you don't need to model your knowledge domain and can use very simple data pipelines. They excel at retrieving information based on the context of the question but fall short at retrieving factual knowledge (you've been warned).
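
To make "retrieving information based on the context of the question" more concrete, here is a minimal pure-Python sketch of similarity search over embedding vectors. It is only an illustration, not the retriever used later in this notebook: the helper names `cosine_distance` and `top_k`, the toy 3-dimensional vectors, and the default values are all assumptions chosen for the example.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2, max_distance=0.7):
    """Return up to k (index, distance) pairs, closest first, within max_distance."""
    scored = [(i, cosine_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    # Drop anything too far from the query before ranking
    scored = [(i, d) for i, d in scored if d <= max_distance]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy "embeddings" (hypothetical values; real models use hundreds of dimensions)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query_vec = [1.0, 0.05, 0.0]

hits = top_k(query_vec, doc_vecs)
print(hits)
```

The third vector points in a very different direction from the query, so it is filtered out by the distance threshold. The ranking depends only on vector geometry, which is why this kind of retrieval is good at matching context but gives no guarantee about factual precision.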

In this tutorial we present a small system that is a good way to learn how RAG works in practice. To make things a bit more complicated, we start from a PDF file.

The PDF we are going to parse is a paper by [Elizabeth Spelke](https://en.wikipedia.org/wiki/Elizabeth_Spelke), a developmental psychologist who has studied cognition in infants, culturally different communities, and non-human species to understand the fundamental building blocks of our mind. It presents evidence that our cognitive system is built upon several key components, and it is an important inspiration for neuro-symbolic and robotics systems.

In [1]:
import pymupdf # Used to extract the text data from the PDF file
from hybridagi.core.datatypes import Document, DocumentList
from hybridagi.core.pipeline import Pipeline
from hybridagi.embeddings import SentenceTransformerEmbeddings
from hybridagi.modules.splitters import DocumentSentenceSplitter
from hybridagi.modules.embedders import DocumentEmbedder
from hybridagi.readers import PDFReader

embeddings = SentenceTransformerEmbeddings(
    model_name_or_path = "all-MiniLM-L6-v2",
    dim = 384, # The dimension of the embedding vector (also called a dense vector)
)

reader = PDFReader()

input_docs = reader("data/SpelkeKinzlerCoreKnowledge.pdf")

# Now that we have our input documents, we can start to make our data processing pipeline

pipeline = Pipeline()

pipeline.add("first_split", DocumentSentenceSplitter(
    method = "word",
    chunk_size = 100,
    chunk_overlap = 0,
    separator = " ",
))
pipeline.add("second_split", DocumentSentenceSplitter(
    method = "word",
    chunk_size = 30,
    chunk_overlap = 0,
    separator = " ",
))
pipeline.add("embed_chunks", DocumentEmbedder(embeddings=embeddings))

output_docs = pipeline(input_docs)
intermediate_docs = pipeline.get_output("first_split")
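
The two splitting stages above can be pictured with a plain-Python approximation. This is a sketch under assumptions, not the actual `DocumentSentenceSplitter` implementation: the helper `split_by_words` is hypothetical, and it assumes that `method = "word"` means counting whitespace-separated words and that the second stage runs on the output of the first.

```python
def split_by_words(text, chunk_size, chunk_overlap=0, separator=" "):
    """Split text into chunks of at most chunk_size words (hypothetical helper)."""
    words = text.split()
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(separator.join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks

# A fake 250-word document standing in for the extracted PDF text
text = " ".join(f"w{i}" for i in range(250))

# First pass: coarse 100-word chunks
coarse = split_by_words(text, chunk_size=100)

# Second pass: each coarse chunk is split again into 30-word chunks
fine = [piece for chunk in coarse for piece in split_by_words(chunk, chunk_size=30)]

print(len(coarse), len(fine))
```

Splitting in two passes gives two granularities of the same text: larger chunks that carry more context and smaller chunks that are more precise targets for embedding.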



#### Saving the documents into memory

Now that we have our documents, we can load them into memory. For storing unstructured textual documents, we provide the `DocumentMemory`. In this example we use the local integration, which is meant for rapid prototyping.

In [2]:
from hybridagi.memory.integration.local import LocalDocumentMemory

document_memory = LocalDocumentMemory(index_name="vector_rag_agent")

document_memory.update(input_docs)
document_memory.update(intermediate_docs)
document_memory.update(output_docs)

document_memory.show() # Let's see what the memory looks like now

vector_rag_agent_document_memory.html


### Making a RAG Agent

Now that our data is ready and loaded into memory, we can start making our RAG Agent. First, we need to create the graph program that encodes the agent's behavior.

In [3]:
import hybridagi.core.graph_program as gp

# We first need to program our RAG Agent using Graph Prompt Programs
# Here we have the simplest agent possible, which involves 2 steps
# It first retrieves documents, then answers the Objective's question based on them

main = gp.GraphProgram(
    name = "main",
    description = "The main program",
)

main.add(gp.Action(
    id = "document_search",
    purpose = "Find relevant documents",
    tool = "DocumentSearch",
    prompt = "Please infer the similarity search query (only ONE item) based on the Objective's question",
))

main.add(gp.Action(
    id = "answer",
    purpose = "Answer the Objective's question",
    tool = "Speak",
    prompt = """
Please answer the Objective's question using the relevant documents in your context.
If no document are relevant just say that you don't know.
Don't state the Objective's question and only give the correct answer.
""",
))

main.connect("start", "document_search")
main.connect("document_search", "answer")
main.connect("answer", "end")

main.build() # Verify that the graph program is correct

print(main) # Let's look at it


// @desc: The main program
CREATE
// Nodes declaration
(start:Control {id: "start"}),
(end:Control {id: "end"}),
(document_search:Action {
  id: "document_search",
  purpose: "Find relevant documents",
  tool: "DocumentSearch",
  prompt: "Please infer the similarity search query (only ONE item) based on the Objective's question"
}),
(answer:Action {
  id: "answer",
  purpose: "Answer the Objective's question",
  tool: "Speak",
  prompt: "\nPlease answer the Objective's question using the relevant documents in your context.\nIf no document are relevant just say that you don't know.\nDon't state the Objective's question and only give the correct answer.\n"
}),
// Structure declaration
(start)-[:NEXT]->(document_search),
(document_search)-[:NEXT]->(answer),
(answer)-[:NEXT]->(end)
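
To see what the structure declaration above encodes, here is a small plain-Python sketch that follows the `NEXT` edges from `start` to `end`. It is only an illustration of the control flow, not how the graph interpreter is implemented; the `next_step` dictionary and the `walk` helper are hypothetical names introduced for this example.

```python
# Edges copied from the printed program: each node has exactly one NEXT edge
next_step = {
    "start": "document_search",
    "document_search": "answer",
    "answer": "end",
}

def walk(edges, start="start", end="end"):
    """Return the action nodes visited, in order, between start and end."""
    visited = []
    node = edges[start]
    while node != end:
        if node in visited:
            # A linear program like this one should never revisit a node
            raise ValueError(f"cycle detected at {node!r}")
        visited.append(node)
        node = edges[node]
    return visited

order = walk(next_step)
print(order)
```

For this program the walk visits the document search first and the answer step second, which matches the two-step behavior described in the comments of the cell above.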


In [4]:
# Now we can add this program into memory

from hybridagi.memory.integration.local import LocalProgramMemory

program_memory = LocalProgramMemory(index_name="vector_rag_agent")

program_memory.update(main)

In [5]:
# Then instantiate our tools and agent system

import dspy
from hybridagi.core.datatypes import AgentState, Query
from hybridagi.modules.agents import GraphInterpreterAgent
from hybridagi.modules.retrievers.integration.local import FAISSDocumentRetriever
from hybridagi.modules.agents.tools import (
    SpeakTool,
    DocumentSearchTool,
)

agent_state = AgentState()

tools = [
    SpeakTool(
        agent_state = agent_state
    ),
    DocumentSearchTool(
        retriever = FAISSDocumentRetriever(
            document_memory = document_memory,
            embeddings = embeddings,
            distance = "cosine",
            max_distance = 0.7,
            k = 5,
            reranker = None,
        ),
    ),
]

rag_agent = GraphInterpreterAgent(
    program_memory = program_memory,
    agent_state = agent_state,
    tools = tools,
)

# We can now set up the LLM using the Ollama client from DSPy

lm = dspy.OllamaLocal(model='mistral', max_tokens=1024, stop=["\n\n\n"])
dspy.configure(lm=lm)

# And call our agent

result = rag_agent(Query(query="What are the core knowledge?"))

print(result.final_answer)

 --- Step 0 ---
Call Program: main
Program Purpose: What are the core knowledge?
 --- Step 1 ---
Action Purpose: Find relevant documents
Action: {
  "query": "core knowledge OR fundamental concepts OR basic principles OR key ideas OR essential facts",
  "documents": [
    {
      "text": "and its research bears on these questions. We believe its\nresearch has shown that both these views are false: humans\nare endowed neither with a single, general-purpose learning\nsystem nor with myriad special-purpose systems and\npredispositions. Instead, we believe that humans are\nendowed with a small number of separable systems of\ncore knowledge. New, \ufb02exible skills and belief systems\nbuild on these core foundations.\nStudies of human infants and non-human animals,\nfocused on the ontogenetic and phylogenetic origins of\nknowledge, provide evidence for four core knowledge\nsystems (Spelke, 2004). These systems serve to represent\ninanimate objects and their mechanical interac