A local RAG (Retrieval-Augmented Generation) agent that answers questions from your Obsidian notes. When your notes don't cover a question, it automatically falls back to web search. Conversation memory is maintained across turns within a session.
You ask a question
↓
[router] — LLM decides: notes or web?
↓
┌─────┴──────┐
[rag] [web]
└─────┬──────┘
↓
[grade_docs] — are results actually useful?
↓
┌─────┴──────────────────┐
[relevant] [not relevant]
↓ ↓
[generate] if came from RAG → try web
↓ if came from web → generate anyway
answer + sources
Two search modes:
- RAG — semantic search over your Obsidian vault using a local embedding model (no API cost)
- Web — DuckDuckGo search for current events or topics not in your notes
Full deep-dive in docs/how_it_works.md.
| Layer | Tool | Why |
|---|---|---|
| Embeddings | paraphrase-multilingual-MiniLM-L12-v2 (sentence-transformers) | Free, local, handles Portuguese + English |
| Vector store | ChromaDB (persistent, local) | No server required, cosine distance |
| Agent framework | LangGraph | Supports loops and conditional branching — needed for the RAG→web fallback |
| LLM | gpt-4o-mini (OpenAI) | Cheapest capable model; ~$0.15/1M input tokens |
| Web search | ddgs (DuckDuckGo) | Free, no API key required |
Plain LangChain chains are fixed sequences: they run top to bottom with no built-in way to loop or branch on intermediate results. This agent needs two things a linear chain can't provide:
- Conditional routing — send the question to RAG or web based on its content
- Retry loop — if RAG retrieval fails the relevance check, fall back to web search
LangGraph models the agent as a state machine: nodes are functions, edges are transitions, and conditional edges let a function inspect the current state to decide what runs next. Conversation memory comes for free via a checkpointer that saves state after every node.
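The fallback rule from the diagram (irrelevant RAG results go to web, irrelevant web results are answered anyway) can be sketched as the kind of function a conditional edge would call. This is a minimal illustration, not the code in agent.py: the `AgentState` keys and the name `route_after_grading` are assumptions made for the example.

```python
from typing import Literal, TypedDict


class AgentState(TypedDict):
    """Hypothetical state shape; the real agent.py may use different keys."""
    question: str
    source: Literal["rag", "web"]  # which search mode produced the documents
    documents: list[str]
    relevant: bool                 # verdict written by the grading step


def route_after_grading(state: AgentState) -> Literal["generate", "web"]:
    """Pick the next node once the retrieved documents have been graded."""
    if state["relevant"]:
        return "generate"
    if state["source"] == "rag":
        # The notes did not cover the question: retry with web search.
        return "web"
    # Web search was the last resort, so answer with what is available.
    return "generate"
```

In LangGraph, a function like this would typically be passed to `add_conditional_edges` on the grading node, with the returned string mapped to the name of the next node.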
The embedding model outputs vectors that are not normalized (norm ≈ 5.8). ChromaDB defaults to squared L2 (Euclidean) distance, which is sensitive to vector magnitude and here gives distances in the 12–15 range that are meaningless for thresholding. Cosine distance ignores magnitude and only measures the angle between vectors (0 = identical meaning, 2 = opposite), which is what semantic search actually needs. The collection is created with metadata={"hnsw:space": "cosine"}.
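To make the magnitude argument concrete, the toy example below compares Euclidean and cosine distance on hand-picked 3-dimensional vectors. The vectors are invented for illustration and have nothing to do with the real embedding model; only the standard library is used.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: grows with vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 2.0, 2.0]              # toy "embedding"
doc = [2.0, 1.0, 2.0]                # points in a similar direction
scaled_doc = [x * 5 for x in doc]    # same direction, 5x the magnitude

# Scaling the document changes the Euclidean distance a lot...
print(l2_distance(query, doc), l2_distance(query, scaled_doc))
# ...while the cosine distance stays the same up to floating-point error.
print(cosine_distance(query, doc), cosine_distance(query, scaled_doc))
```

Because the cosine value does not move when a vector is rescaled, a single threshold can be applied across chunks whose embeddings have different norms.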
Calibrated empirically against this vault:
- Real on-topic queries: cosine distance 0.31–0.52
- Gibberish / off-topic: cosine distance 0.64+
A cutoff of 0.6 keeps real results and blocks noise with margin on both sides.
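The margin claim can be checked with a few lines of arithmetic. The sketch below uses only the range endpoints quoted above (summary values, not raw measurements); the helper name `margins` is made up for the example.

```python
# Range endpoints quoted above; summary values, not raw measurements.
ON_TOPIC_MAX = 0.52    # largest distance observed for a real query
OFF_TOPIC_MIN = 0.64   # smallest distance observed for noise
RELEVANCE_THRESHOLD = 0.6


def margins(threshold, on_topic_max, off_topic_min):
    """Return (headroom above real results, headroom below noise)."""
    return (
        round(threshold - on_topic_max, 2),
        round(off_topic_min - threshold, 2),
    )


keep_margin, block_margin = margins(RELEVANCE_THRESHOLD, ON_TOPIC_MAX, OFF_TOPIC_MIN)
assert keep_margin > 0 and block_margin > 0, "threshold must sit inside the gap"
print(keep_margin, block_margin)  # 0.08 0.04
```

Since these numbers were measured against one vault, a different vault or embedding model would likely need its own calibration run.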
- Distance threshold (cheap, no API call) — filters chunks that are geometrically far from the query
- LLM grader (one API call) — catches topically adjacent chunks that passed the distance filter but don't actually answer the question
The two stages complement each other: the threshold handles obvious misses, the grader handles subtle ones.
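As a rough sketch of how the two stages could be chained, the function below applies the distance cutoff first and only then consults a grader. The names `filter_chunks` and `keyword_grader` are hypothetical, and the grader here is a keyword stand-in so the example runs without an API key; the real grader would call the LLM.

```python
from typing import Callable

RELEVANCE_THRESHOLD = 0.6  # max cosine distance, as in config.py


def filter_chunks(
    question: str,
    hits: list[tuple[str, float]],
    grade: Callable[[str, list[str]], bool],
) -> list[str]:
    """Stage 1: drop far-away chunks. Stage 2: ask the grader about the rest."""
    close = [text for text, distance in hits if distance <= RELEVANCE_THRESHOLD]
    if not close:
        return []  # nothing survived stage 1, so the grader call is skipped
    return close if grade(question, close) else []


def keyword_grader(question: str, chunks: list[str]) -> bool:
    """Stand-in for the LLM grader, for illustration only."""
    return any("attention" in chunk.lower() for chunk in chunks)


hits = [("Attention weighs tokens by relevance.", 0.35), ("Grocery list", 0.71)]
print(filter_chunks("What is attention?", hits, keyword_grader))
```

Running the cheap check first means the API call is only made when at least one chunk is geometrically close to the query.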
Requirements: Python 3.10+, an OpenAI API key, and an Obsidian vault.
git clone <repo-url>
cd rag
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Set your vault path and excluded folders in config.py:
VAULT_PATH = "/path/to/your/obsidian/vault"
EXCLUDED_DIRS = {".obsidian", ".git", "_templates", "Journals", "Languages"}
Export your OpenAI API key:
export OPENAI_API_KEY="sk-proj-..."
Index your notes (run once, re-run after adding new notes):
python ingest.py
python chat.py
You: O que é attention mechanism?
[RAG] The attention mechanism allows the model to...
Sources:
- Notes/Attention mechanism.md
You: Explica de forma mais simples
[RAG] In simpler terms, attention lets the model decide...
You: What did Anthropic release this week?
[WEB] Anthropic released...
Sources:
- https://...
python query.py "O que é attention mechanism?"
# Fast tests (no API key needed)
pytest tests/
# Full suite including LLM nodes
export OPENAI_API_KEY="sk-proj-..."
pytest tests/ -v
Tests marked with NEEDS_API are automatically skipped when OPENAI_API_KEY is not set.
.
├── ingest.py # Index Obsidian vault into ChromaDB
├── retriever.py # ChromaDB similarity search (used by agent)
├── agent.py # LangGraph graph: state, nodes, edges, compiled app
├── chat.py # Interactive CLI conversation loop
├── query.py # Simple one-shot RAG query (no agent)
├── config.py # All configuration in one place
├── tests/
│ └── test_nodes.py # Unit tests for each agent node
└── docs/
    ├── how_it_works.md # Full pipeline explanation
    └── retrieval_improvements.md # Retrieval failure strategies
All tunable values are in config.py:
| Variable | Default | Description |
|---|---|---|
| VAULT_PATH | /home/bruno/... | Path to your Obsidian vault |
| EXCLUDED_DIRS | Journals, Languages, ... | Folders to skip during indexing |
| EMBED_MODEL | paraphrase-multilingual-MiniLM-L12-v2 | Local embedding model |
| TOP_K | 5 | Number of chunks retrieved per query |
| RELEVANCE_THRESHOLD | 0.6 | Max cosine distance to keep a chunk |
| WEB_SEARCH_MAX_RESULTS | 3 | Web results fetched per query |
| OPENAI_MODEL | gpt-4o-mini | OpenAI model used for all LLM calls |
| MIN_CHUNK_LEN | 80 | Minimum characters for a chunk to be indexed |
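To show how two of these settings might be applied during indexing, here is a small sketch using only the standard library. It is not the code in ingest.py: the helper names `iter_notes` and `split_chunks` are hypothetical, and splitting on blank lines is an assumed chunking strategy chosen for brevity.

```python
from pathlib import Path

EXCLUDED_DIRS = {".obsidian", ".git", "_templates", "Journals", "Languages"}
MIN_CHUNK_LEN = 80


def iter_notes(vault: Path):
    """Yield Markdown files whose path contains no excluded folder."""
    for path in sorted(vault.rglob("*.md")):
        if EXCLUDED_DIRS.isdisjoint(path.relative_to(vault).parts):
            yield path


def split_chunks(text: str) -> list[str]:
    """Split on blank lines (an assumed strategy) and drop short fragments."""
    chunks = (block.strip() for block in text.split("\n\n"))
    return [chunk for chunk in chunks if len(chunk) >= MIN_CHUNK_LEN]
```

Checking every path component against EXCLUDED_DIRS also skips notes nested several levels below an excluded folder, and the length filter keeps headings and other short fragments out of the index.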