003_Production level RAG Workshop: Part 1
RAG stands for:
Retrieval-Augmented Generation
It combines:
- Retrieval
  - Fetching relevant information from an external knowledge source
- Generation
  - Using an LLM to generate a response
Instead of relying only on the LLM’s pretrained knowledge, RAG supplements the LLM with retrieved context from external documents.
- Lovable
- Supabase
Problem: Context Window Limitations
Consider a 1200-page nutrition textbook.
A naive solution:
Question + Entire PDF
↓
LLM
Problems:
- Too many tokens
- High cost
- Context window overflow
- Hallucinations
- Slow responses
Example discussed:
PDF size ≈ 400K tokens
GPT context window ≈ 128K tokens
The entire document cannot fit into the context window at once.
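As a rough illustration of this arithmetic, the sketch below checks whether a document fits in a context window. The helper name `fits_in_context` and the `reserved_tokens` parameter are assumptions made for this example; the token counts are the approximate figures from the workshop.

```python
def fits_in_context(doc_tokens: int, window_tokens: int, reserved_tokens: int = 0) -> bool:
    """Return True if the document plus any reserved tokens fits in the window.

    reserved_tokens is a hypothetical allowance for the question and the answer.
    """
    return doc_tokens + reserved_tokens <= window_tokens


# Approximate figures from the workshop example.
PDF_TOKENS = 400_000
CONTEXT_WINDOW = 128_000

overflow = PDF_TOKENS - CONTEXT_WINDOW  # tokens that do not fit
ratio = PDF_TOKENS / CONTEXT_WINDOW     # how many full windows the PDF would need

print(fits_in_context(PDF_TOKENS, CONTEXT_WINDOW))  # False
print(overflow)  # 272000
print(ratio)     # 3.125
```

In this example the PDF is roughly three times larger than the window, so sending it whole is not an option even before cost and latency are considered.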
When the relevant information is missing from the prompt, the LLM may answer from its pretrained knowledge rather than from the provided document.
This leads to:
- Incorrect answers
- Non-grounded responses
- Hallucinations
RAG helps reduce this issue by supplying only relevant document sections.
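One way to picture "supplying only relevant document sections" is the minimal sketch below. It ranks sections by word overlap with the question, which is a deliberately simple stand-in for the embedding-based similarity search a real system would likely use. The function names (`tokenize`, `score`, `retrieve`) and the sample sections are hypothetical toy data, not material from the workshop.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, section: str) -> int:
    """Count the distinct words shared by the question and the section."""
    return len(tokenize(question) & tokenize(section))


def retrieve(question: str, sections: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k sections with the highest word-overlap score."""
    ranked = sorted(sections, key=lambda s: score(question, s), reverse=True)
    return ranked[:top_k]


# Toy sections standing in for pages of the textbook (assumed data).
sections = [
    "Vitamin C supports the immune system and is found in citrus fruit.",
    "Protein is built from amino acids and helps repair muscle tissue.",
    "Dietary fiber aids digestion and is found in whole grains.",
]

question = "Which foods contain vitamin C?"
top = retrieve(question, sections, top_k=1)
print(top[0])  # the vitamin C section
```

Only the selected section would be placed in the prompt, so the token count stays small regardless of how long the full document is.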
The workshop explains RAG using an open-book exam analogy.
Without RAG
A student answers questions using memory only.
Equivalent:
User Question
↓
LLM
↓
Answer
With RAG
A student:
- Searches the book
- Finds relevant pages
- Uses both retrieved information and existing knowledge
Equivalent:
User Question
↓
Retrieval
↓
Relevant Context
↓
LLM
↓
Answer
This is the core intuition behind RAG.
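The two flows above can be sketched side by side as follows. This is an illustrative outline under stated assumptions: `build_prompt` and `answer` are hypothetical names, the prompt wording is invented for the example, and `llm` and `retriever` are plain callables standing in for a real model client and a real retrieval step.

```python
from typing import Callable, Optional


def build_prompt(question: str, context: Optional[list[str]] = None) -> str:
    """Build a prompt, adding a context block only when context is provided."""
    if not context:
        # Without RAG: the model sees only the question (memory only).
        return f"Question: {question}\nAnswer:"
    joined = "\n".join(f"- {passage}" for passage in context)
    # With RAG: retrieved passages are placed ahead of the question (open book).
    return (
        "Answer using the context below. If the context is insufficient, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    llm: Callable[[str], str],
    retriever: Optional[Callable[[str], list[str]]] = None,
) -> str:
    """Run the question through the LLM, with retrieval if a retriever is given."""
    context = retriever(question) if retriever else None
    return llm(build_prompt(question, context))


def echo_llm(prompt: str) -> str:
    """Stub model that returns the prompt, so the demo shows what the LLM would see."""
    return prompt


def toy_retriever(question: str) -> list[str]:
    """Stub retriever that always returns one fixed passage (assumed data)."""
    return ["Vitamin C is found in citrus fruit."]


print(answer("Which foods contain vitamin C?", echo_llm))
print(answer("Which foods contain vitamin C?", echo_llm, toy_retriever))
```

Passing the model and retriever in as callables keeps the sketch independent of any particular provider; swapping in a real client would not change the control flow.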
Main objective:
Reduce hallucinations
Architecture:
Documents
↓
Retrieval
↓
LLM
↓
Answer
RAG is viewed as part of a larger discipline:
Context Engineering
Components include:
- Retrieval
- Prompt Engineering
- Memory
- State Management
- Embeddings
- Vector Databases
- Long Context Windows
Modern RAG is therefore a subset of context engineering.