Obsidian RAG Agent

A local RAG (Retrieval-Augmented Generation) agent that answers questions from your Obsidian notes. When your notes don't cover a question, it automatically falls back to web search. Conversation memory is maintained across turns within a session.

How it works

You ask a question
        ↓
   [router] — LLM decides: notes or web?
        ↓
  ┌─────┴──────┐
[rag]       [web]
  └─────┬──────┘
        ↓
 [grade_docs] — are results actually useful?
        ↓
  ┌─────┴──────────────────┐
[relevant]           [not relevant]
  ↓                        ↓
[generate]         if came from RAG → try web
  ↓                if came from web → generate anyway
answer + sources

Two search modes:

RAG — semantic search over your Obsidian vault using a local embedding model (no API cost)
Web — DuckDuckGo search for current events or topics not in your notes

For a full deep dive, see docs/how_it_works.md.

Stack

| Layer | Tool | Why |
| --- | --- | --- |
| Embeddings | `paraphrase-multilingual-MiniLM-L12-v2` (sentence-transformers) | Free, local, handles Portuguese + English |
| Vector store | ChromaDB (persistent, local) | No server required, cosine distance |
| Agent framework | LangGraph | Supports loops and conditional branching, both needed for the RAG→web fallback |
| LLM | `gpt-4o-mini` (OpenAI) | Cheapest capable model; ~$0.15/1M input tokens |
| Web search | `ddgs` (DuckDuckGo) | Free, no API key required |

Why LangGraph and not LangChain?

LangChain chains are fixed sequences: they run top to bottom and cannot loop back or branch dynamically. This agent needs two things a plain chain does not provide:

Conditional routing — send the question to RAG or web based on its content
Retry loop — if RAG retrieval fails the relevance check, fall back to web search

LangGraph models the agent as a state machine: nodes are functions, edges are transitions, and conditional edges let a function inspect the current state to decide what runs next. Conversation memory comes for free via a checkpointer that saves state after every node.
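The state-machine idea can be illustrated without LangGraph at all. The sketch below is a plain-Python stand-in, not the code in agent.py: the node bodies, the keyword-based `route_question`, and the `path` field are hypothetical placeholders for the real LLM calls and retrieval. In LangGraph the same pieces would be registered with `add_node`, `add_edge`, and `add_conditional_edges`.

```python
from typing import Callable, Dict

State = dict

def rag(state: State) -> State:
    # Stand-in retrieval: pretend the vault only covers "attention".
    hit = "attention" in state["question"].lower()
    docs = ["Notes/Attention mechanism.md"] if hit else []
    return {**state, "source": "rag", "docs": docs}

def web(state: State) -> State:
    # Stand-in web search that always finds one result.
    return {**state, "source": "web", "docs": ["https://example.com/result"]}

def grade_docs(state: State) -> State:
    # Stand-in grader: any retrieved document counts as relevant.
    return {**state, "relevant": bool(state["docs"])}

def generate(state: State) -> State:
    return {**state, "answer": f"[{state['source'].upper()}] {len(state['docs'])} source(s)"}

def route_question(state: State) -> str:
    # Stand-in for the LLM router: a keyword rule instead of a model call.
    return "web" if "this week" in state["question"].lower() else "rag"

def after_grading(state: State) -> str:
    # Conditional edge: inspect the state and name the next node.
    if state["relevant"]:
        return "generate"
    return "web" if state["source"] == "rag" else "generate"

NODES: Dict[str, Callable[[State], State]] = {
    "rag": rag, "web": web, "grade_docs": grade_docs, "generate": generate,
}
EDGES = {"rag": "grade_docs", "web": "grade_docs", "generate": "END"}
CONDITIONAL_EDGES = {"grade_docs": after_grading}

def run(question: str) -> State:
    state: State = {"question": question, "path": []}
    current = route_question(state)
    while current != "END":
        state = NODES[current](state)
        state["path"].append(current)  # record the visited nodes
        if current in CONDITIONAL_EDGES:
            current = CONDITIONAL_EDGES[current](state)
        else:
            current = EDGES[current]
    return state
```

A question the stand-in vault cannot answer visits `rag`, `grade_docs`, `web`, `grade_docs`, `generate` in that order, which is the retry loop a fixed chain cannot express.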

Why cosine distance for ChromaDB?

The embedding model outputs vectors that are not normalized (norm ≈ 5.8). ChromaDB defaults to L2 (Euclidean) distance, which is sensitive to vector magnitude; on these vectors it gives distances in the 12–15 range, which are hard to threshold meaningfully. Cosine distance ignores magnitude and measures only the angle between vectors (0 = same direction, 2 = opposite direction), which is what semantic search needs. The collection is created with `metadata={"hnsw:space": "cosine"}`.
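The magnitude argument is easy to check by hand. The sketch below uses two toy 2-D vectors (illustrative values, not real embeddings) and plain Python rather than ChromaDB: scaling both vectors by 5.8 stretches the Euclidean distance by the same factor, while the cosine distance does not move.

```python
import math
from typing import Sequence

def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance; grows with vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cos(angle); depends only on direction, ranges from 0 to 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors pointing in similar directions.
a = [3.0, 4.0]
b = [4.0, 3.0]

# The same directions with a larger norm.
scale = 5.8
a_big = [x * scale for x in a]
b_big = [x * scale for x in b]

print("L2:    ", l2_distance(a, b), "->", l2_distance(a_big, b_big))
print("cosine:", cosine_distance(a, b), "->", cosine_distance(a_big, b_big))
```

Because the cosine value stays on a fixed 0 to 2 scale regardless of norm, a single threshold can be applied to it.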

Why `RELEVANCE_THRESHOLD = 0.6`?

Calibrated empirically against this vault:

Real on-topic queries: cosine distance 0.31–0.52
Gibberish / off-topic: cosine distance 0.64+

A cutoff of 0.6 keeps real results and blocks noise with margin on both sides.
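Since the cutoff is specific to this vault, it is worth re-checking the margins after re-indexing or changing the embedding model. A minimal sketch, assuming you have collected distances for known on-topic and off-topic queries yourself; `threshold_margins` is a hypothetical helper, and the lists below contain only the range endpoints quoted above.

```python
from typing import Sequence, Tuple

def threshold_margins(
    on_topic: Sequence[float],
    off_topic: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (gap below, gap above) the threshold.

    A negative value means the threshold would misclassify at least
    one of the observed distances on that side.
    """
    below = round(threshold - max(on_topic), 2)
    above = round(min(off_topic) - threshold, 2)
    return below, above

# Endpoints of the ranges reported for this vault.
on_topic_distances = [0.31, 0.52]
off_topic_distances = [0.64]

margins = threshold_margins(on_topic_distances, off_topic_distances, 0.6)
print(margins)
```

With these numbers the worst on-topic query sits 0.08 below the cutoff and the best off-topic query sits 0.04 above it.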

Why a two-stage relevance check?

Distance threshold (cheap, no API call) — filters chunks that are geometrically far from the query
LLM grader (one API call) — catches topically adjacent chunks that passed the distance filter but don't actually answer the question

The two stages complement each other: the threshold handles obvious misses, the grader handles subtle ones.
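The ordering matters because the cheap stage can avoid the API call entirely. The sketch below is an assumption about how the two stages could be composed, not the code in agent.py: `Chunk`, `two_stage_filter`, and the `grader` callable (which stands in for the single LLM call) are hypothetical names, and treating a distance equal to the threshold as a keep is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

RELEVANCE_THRESHOLD = 0.6  # max cosine distance to keep a chunk

@dataclass
class Chunk:
    text: str
    distance: float

# The grader receives the question and the surviving chunks and
# answers once for the whole batch (one API call in the real agent).
Grader = Callable[[str, List[Chunk]], bool]

def two_stage_filter(question: str, chunks: List[Chunk], grader: Grader) -> List[Chunk]:
    # Stage 1: drop chunks that are geometrically far from the query.
    close = [c for c in chunks if c.distance <= RELEVANCE_THRESHOLD]
    if not close:
        return []  # nothing survived, so the grader is never called
    # Stage 2: ask the grader whether the survivors answer the question.
    return close if grader(question, close) else []
```

An empty result from either stage is the signal the agent can use to fall back to web search.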

Setup

Requirements: Python 3.10+, an OpenAI API key, and an Obsidian vault.

git clone <repo-url>
cd rag

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install -r requirements.txt

Set your vault path and excluded folders in config.py:

VAULT_PATH = "/path/to/your/obsidian/vault"
EXCLUDED_DIRS = {".obsidian", ".git", "_templates", "Journals", "Languages"}
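To make the effect of `EXCLUDED_DIRS` concrete, here is a minimal sketch of a vault walk that honors it. This is not the code in ingest.py: `iter_notes` is a hypothetical helper, and it assumes folders are excluded by name at any depth and that only `.md` files are indexed.

```python
import os
from pathlib import Path
from typing import Iterator, Set

def iter_notes(vault_path: str, excluded_dirs: Set[str]) -> Iterator[Path]:
    """Yield every Markdown file under vault_path, skipping excluded folders."""
    for root, dirs, files in os.walk(vault_path):
        # Pruning dirs in place stops os.walk from descending into them.
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for name in sorted(files):
            if name.endswith(".md"):
                yield Path(root) / name
```

Calling `list(iter_notes(VAULT_PATH, EXCLUDED_DIRS))` before indexing is a quick way to confirm that journals and templates are being skipped.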

Export your OpenAI API key:

export OPENAI_API_KEY="sk-proj-..."

Index your notes (run once, re-run after adding new notes):

python ingest.py

Usage

Agent (recommended) — routing + memory

python chat.py

You: O que é attention mechanism?
[RAG] The attention mechanism allows the model to...
Sources:
  - Notes/Attention mechanism.md

You: Explica de forma mais simples
[RAG] In simpler terms, attention lets the model decide...

You: What did Anthropic release this week?
[WEB] Anthropic released...
Sources:
  - https://...

Simple one-shot query (no memory, always RAG)

python query.py "O que é attention mechanism?"

Running tests

# Fast tests — no API key needed
pytest tests/

# Full suite including LLM nodes
export OPENAI_API_KEY="sk-proj-..."
pytest tests/ -v

Tests marked with NEEDS_API are automatically skipped when OPENAI_API_KEY is not set.
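A marker like this is commonly built with pytest's `skipif`. The sketch below shows one way `NEEDS_API` could be defined; the actual definition in this repo may differ, and the test function name is a hypothetical placeholder.

```python
import os

import pytest

# Skip any test carrying this marker when no API key is available.
NEEDS_API = pytest.mark.skipif(
    not os.getenv("OPENAI_API_KEY"),
    reason="OPENAI_API_KEY is not set",
)

@NEEDS_API
def test_llm_node_placeholder():
    # Placeholder body: a real test would invoke an LLM-backed node here.
    assert os.getenv("OPENAI_API_KEY")
```

Running `pytest tests/ -v` without the key lists such tests as skipped along with the reason string.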

File structure

.
├── ingest.py          # Index Obsidian vault into ChromaDB
├── retriever.py       # ChromaDB similarity search (used by agent)
├── agent.py           # LangGraph graph: state, nodes, edges, compiled app
├── chat.py            # Interactive CLI conversation loop
├── query.py           # Simple one-shot RAG query (no agent)
├── config.py          # All configuration in one place
├── tests/
│   └── test_nodes.py  # Unit tests for each agent node
└── docs/
    ├── how_it_works.md          # Full pipeline explanation
    └── retrieval_improvements.md # Retrieval failure strategies

Configuration

All tunable values are in config.py:

| Variable | Default | Description |
| --- | --- | --- |
| `VAULT_PATH` | `/home/bruno/...` | Path to your Obsidian vault |
| `EXCLUDED_DIRS` | `Journals`, `Languages`, ... | Folders to skip during indexing |
| `EMBED_MODEL` | `paraphrase-multilingual-MiniLM-L12-v2` | Local embedding model |
| `TOP_K` | `5` | Number of chunks retrieved per query |
| `RELEVANCE_THRESHOLD` | `0.6` | Max cosine distance to keep a chunk |
| `WEB_SEARCH_MAX_RESULTS` | `3` | Web results fetched per query |
| `OPENAI_MODEL` | `gpt-4o-mini` | OpenAI model used for all LLM calls |
| `MIN_CHUNK_LEN` | `80` | Minimum characters for a chunk to be indexed |
