Bruno-Ferr/rag


Obsidian RAG Agent

A local RAG (Retrieval-Augmented Generation) agent that answers questions from your Obsidian notes. When your notes don't cover a question, it automatically falls back to web search. Conversation memory is maintained across turns within a session.

How it works

You ask a question
        ↓
   [router] — LLM decides: notes or web?
        ↓
  ┌─────┴──────┐
[rag]       [web]
  └─────┬──────┘
        ↓
 [grade_docs] — are results actually useful?
        ↓
  ┌─────┴──────────────────┐
[relevant]           [not relevant]
  ↓                        ↓
[generate]         if came from RAG → try web
  ↓                if came from web → generate anyway
answer + sources

Two search modes:

  • RAG — semantic search over your Obsidian vault using a local embedding model (no API cost)
  • Web — DuckDuckGo search for current events or topics not in your notes

Full deep-dive in docs/how_it_works.md.

Stack

| Layer | Tool | Why |
| --- | --- | --- |
| Embeddings | paraphrase-multilingual-MiniLM-L12-v2 (sentence-transformers) | Free, local, handles Portuguese + English |
| Vector store | ChromaDB (persistent, local) | No server required, cosine distance |
| Agent framework | LangGraph | Supports loops and conditional branching, which the RAG→web fallback needs |
| LLM | gpt-4o-mini (OpenAI) | Cheapest capable model; ~$0.15/1M input tokens |
| Web search | ddgs (DuckDuckGo) | Free, no API key required |

Why LangGraph and not LangChain?

LangChain chains are fixed sequences: they run top to bottom and do not loop or branch dynamically on their own. This agent needs two things that a plain chain does not express well:

  1. Conditional routing — send the question to RAG or web based on its content
  2. Retry loop — if RAG retrieval fails the relevance check, fall back to web search

LangGraph models the agent as a state machine: nodes are functions, edges are transitions, and conditional edges let a function inspect the current state to decide what runs next. Conversation memory comes for free via a checkpointer that saves state after every node.
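The state-machine idea can be sketched without LangGraph itself. The following is a plain-Python illustration, not the repository's agent.py: the node bodies are stubs, the keyword-based router stands in for the LLM decision, and names such as AgentState, next_node, and run are hypothetical.

```python
from typing import Callable, Dict, List, Optional, TypedDict


class AgentState(TypedDict):
    question: str
    source: str        # "rag" or "web"
    docs: List[str]
    relevant: bool
    answer: str
    trace: List[str]   # names of the nodes that ran, kept for illustration


def router(state: AgentState) -> AgentState:
    # Stand-in for the LLM routing decision (hypothetical keyword check).
    wants_web = "this week" in state["question"].lower()
    return {**state, "source": "web" if wants_web else "rag"}


def rag(state: AgentState) -> AgentState:
    # Stub: pretend the vault returned nothing useful.
    return {**state, "source": "rag", "docs": []}


def web(state: AgentState) -> AgentState:
    # Stub: pretend the web search returned one result.
    return {**state, "source": "web", "docs": ["web result"]}


def grade_docs(state: AgentState) -> AgentState:
    # Stub grader: any non-empty result set counts as relevant.
    return {**state, "relevant": bool(state["docs"])}


def generate(state: AgentState) -> AgentState:
    return {**state, "answer": f"answer from {state['source']}"}


NODES: Dict[str, Callable[[AgentState], AgentState]] = {
    "router": router, "rag": rag, "web": web,
    "grade_docs": grade_docs, "generate": generate,
}


def next_node(current: str, state: AgentState) -> Optional[str]:
    """Edges: fixed transitions plus two conditional ones that inspect the state."""
    if current == "router":
        return state["source"]                      # conditional: rag or web
    if current in ("rag", "web"):
        return "grade_docs"
    if current == "grade_docs":
        if state["relevant"] or state["source"] == "web":
            return "generate"                       # web results are used regardless
        return "web"                                # RAG failed the check: fall back
    return None                                     # "generate" is terminal


def run(question: str) -> AgentState:
    state: AgentState = {"question": question, "source": "", "docs": [],
                         "relevant": False, "answer": "", "trace": []}
    current: Optional[str] = "router"
    while current is not None:
        state = NODES[current](state)
        state["trace"].append(current)
        current = next_node(current, state)
    return state
```

Calling run("What is attention?")["trace"] lists the nodes in the order they ran, which makes the RAG→web fallback path visible.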

Why cosine distance for ChromaDB?

The embedding model outputs vectors that are not normalized (norm ≈ 5.8). ChromaDB defaults to L2 distance (reported as squared Euclidean distance), which is sensitive to vector magnitude and gave distances in the 12–15 range here, too dependent on vector length to be useful for thresholding. Cosine distance ignores magnitude and measures only the angle between vectors (0 = same direction, 1 = orthogonal, 2 = opposite), which is what semantic search actually needs. The collection is created with metadata={"hnsw:space": "cosine"}.
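A small standard-library sketch of the magnitude argument: scaling both vectors changes their L2 distance but leaves their cosine distance untouched. The two-dimensional vectors are made up for illustration and are not real embeddings.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 = same direction, 2 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [3.0, 4.0], [4.0, 3.0]
a_scaled = [2 * x for x in a]
b_scaled = [2 * x for x in b]

# Doubling the vector lengths doubles the L2 distance...
print(l2_distance(a, b), l2_distance(a_scaled, b_scaled))
# ...while the cosine distance stays the same, because the angle is unchanged.
print(cosine_distance(a, b), cosine_distance(a_scaled, b_scaled))
```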

Why RELEVANCE_THRESHOLD = 0.6?

Calibrated empirically against this vault:

  • Real on-topic queries: cosine distance 0.31–0.52
  • Gibberish / off-topic: cosine distance 0.64+

A cutoff of 0.6 keeps real results and blocks noise with margin on both sides.
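The margin claim can be checked with a few lines of arithmetic. This sketch uses only the range endpoints quoted above (0.31, 0.52, 0.64); separation_margins is a hypothetical helper, not something the repository is known to contain.

```python
from typing import Sequence, Tuple

RELEVANCE_THRESHOLD = 0.6


def separation_margins(
    on_topic: Sequence[float],
    off_topic: Sequence[float],
    threshold: float = RELEVANCE_THRESHOLD,
) -> Tuple[float, float]:
    """Return (room below the threshold, room above it).

    The first value is how far the worst on-topic distance sits below the
    cutoff; the second is how far the best off-topic distance sits above it.
    A negative value means the threshold misclassifies that group.
    """
    return threshold - max(on_topic), min(off_topic) - threshold


below, above = separation_margins(on_topic=[0.31, 0.52], off_topic=[0.64])
print(f"margin below: {below:.2f}, margin above: {above:.2f}")
```

With these endpoints the cutoff leaves roughly 0.08 of room on the on-topic side and 0.04 on the off-topic side, so re-running the calibration is worthwhile if the vault or embedding model changes.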

Why a two-stage relevance check?

  1. Distance threshold (cheap, no API call) — filters chunks that are geometrically far from the query
  2. LLM grader (one API call) — catches topically adjacent chunks that passed the distance filter but don't actually answer the question

The two stages complement each other: the threshold handles obvious misses, the grader handles subtle ones.
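A minimal sketch of how the two stages might compose, assuming the grader is a single call that receives the surviving chunks and returns one yes/no verdict. The function name grade_hits and the grader signature are assumptions; the LLM call is passed in as a plain callable so the sketch runs without an API key.

```python
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[str, float]                       # (chunk text, cosine distance)
Grader = Callable[[str, List[str]], bool]     # (question, chunks) -> useful?


def grade_hits(
    question: str,
    hits: Sequence[Hit],
    llm_grader: Grader,
    threshold: float = 0.6,
) -> Tuple[List[str], bool]:
    """Stage 1: drop distant chunks. Stage 2: ask the grader about the rest."""
    close = [text for text, distance in hits if distance <= threshold]
    if not close:
        # Nothing survived the cheap filter, so the API call is skipped.
        return [], False
    return close, llm_grader(question, close)


def always_yes(question: str, chunks: List[str]) -> bool:
    """Placeholder grader; a real one would prompt the LLM."""
    return True


chunks, relevant = grade_hits(
    "What is attention?",
    [("Attention weighs tokens...", 0.35), ("Grocery list", 0.71)],
    always_yes,
)
print(chunks, relevant)
```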

Setup

Requirements: Python 3.10+, an OpenAI API key, and an Obsidian vault.

git clone <repo-url>
cd rag

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install -r requirements.txt

Set your vault path and excluded folders in config.py:

VAULT_PATH = "/path/to/your/obsidian/vault"
EXCLUDED_DIRS = {".obsidian", ".git", "_templates", "Journals", "Languages"}
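For orientation, here is one way an indexer could honour EXCLUDED_DIRS while walking the vault. This is a sketch under assumptions: iter_notes is a hypothetical helper, and ingest.py may select files differently.

```python
import os
from pathlib import Path
from typing import Iterator, Set


def iter_notes(vault_path: str, excluded_dirs: Set[str]) -> Iterator[Path]:
    """Yield every Markdown file under vault_path, skipping excluded folders."""
    for root, dirs, files in os.walk(vault_path):
        # Pruning dirs in place stops os.walk from descending into them.
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for name in sorted(files):
            if name.endswith(".md"):
                yield Path(root) / name
```

Because the match is on folder name, an excluded name such as "Journals" is skipped at any depth, not only at the vault root.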

Export your OpenAI API key:

export OPENAI_API_KEY="sk-proj-..."

Index your notes (run once, re-run after adding new notes):

python ingest.py

Usage

Agent (recommended) — routing + memory

python chat.py
You: O que é attention mechanism?
[RAG] The attention mechanism allows the model to...
Sources:
  - Notes/Attention mechanism.md

You: Explica de forma mais simples
[RAG] In simpler terms, attention lets the model decide...

You: What did Anthropic release this week?
[WEB] Anthropic released...
Sources:
  - https://...

Simple one-shot query (no memory, always RAG)

python query.py "O que é attention mechanism?"

Running tests

# Fast tests — no API key needed
pytest tests/

# Full suite including LLM nodes
export OPENAI_API_KEY="sk-proj-..."
pytest tests/ -v

Tests marked with NEEDS_API are automatically skipped when OPENAI_API_KEY is not set.
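One common way to get this behaviour in pytest is a skipif marker bound to a module-level name. The sketch below assumes NEEDS_API is defined along these lines; the actual definition in tests/test_nodes.py may differ, and the test function is a hypothetical placeholder.

```python
import os

import pytest

# Evaluated at import time: the marker skips the test when the key is missing or empty.
NEEDS_API = pytest.mark.skipif(
    not os.environ.get("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY is not set",
)


@NEEDS_API
def test_llm_node_placeholder():
    # A real test would call an LLM-backed node here.
    assert os.environ.get("OPENAI_API_KEY")
```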

File structure

.
├── ingest.py          # Index Obsidian vault into ChromaDB
├── retriever.py       # ChromaDB similarity search (used by agent)
├── agent.py           # LangGraph graph: state, nodes, edges, compiled app
├── chat.py            # Interactive CLI conversation loop
├── query.py           # Simple one-shot RAG query (no agent)
├── config.py          # All configuration in one place
├── tests/
│   └── test_nodes.py  # Unit tests for each agent node
└── docs/
    ├── how_it_works.md          # Full pipeline explanation
    └── retrieval_improvements.md # Retrieval failure strategies

Configuration

All tunable values are in config.py:

| Variable | Default | Description |
| --- | --- | --- |
| VAULT_PATH | /home/bruno/... | Path to your Obsidian vault |
| EXCLUDED_DIRS | Journals, Languages, ... | Folders to skip during indexing |
| EMBED_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Local embedding model |
| TOP_K | 5 | Number of chunks retrieved per query |
| RELEVANCE_THRESHOLD | 0.6 | Max cosine distance to keep a chunk |
| WEB_SEARCH_MAX_RESULTS | 3 | Web results fetched per query |
| OPENAI_MODEL | gpt-4o-mini | OpenAI model used for all LLM calls |
| MIN_CHUNK_LEN | 80 | Minimum characters for a chunk to be indexed |
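To illustrate what MIN_CHUNK_LEN does, the sketch below splits a note on blank lines and drops chunks shorter than the minimum. The paragraph-based splitting is an assumption made for the example; the chunking strategy in ingest.py is not described here and may differ.

```python
import re
from typing import List

MIN_CHUNK_LEN = 80


def split_into_chunks(text: str, min_len: int = MIN_CHUNK_LEN) -> List[str]:
    """Split on blank lines and keep only chunks of at least min_len characters."""
    paragraphs = (p.strip() for p in re.split(r"\n\s*\n", text))
    return [p for p in paragraphs if len(p) >= min_len]


note = (
    "# Attention\n\n"
    "The attention mechanism lets a model weigh every token in the input "
    "against every other token when building a representation.\n\n"
    "TODO"
)
print(split_into_chunks(note))
```

Short fragments such as headings or stray TODO lines fall below the minimum and are left out of the index, which keeps low-information chunks from matching queries.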
