003_Production level RAG Workshop: Part 1
RAG stands for:
Retrieval-Augmented Generation
It combines:
- Retrieval: fetching relevant information from an external knowledge source
- Generation: using an LLM to generate a response
Instead of relying only on the LLM’s pretrained knowledge, RAG supplements the LLM with retrieved context from external documents.
- Lovable
- Supabase
Problem: Context Window Limitations
Consider a 1200-page nutrition textbook.
A naive solution:
Question + Entire PDF
↓
LLM
Problems:
- Too many tokens
- High cost
- Context window overflow
- Hallucinations
- Slow responses
Example discussed:
PDF size ≈ 400K tokens
GPT context window ≈ 128K tokens
At roughly three times the size of the context window, the entire document cannot fit into a single prompt.
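The arithmetic behind this example can be checked with a minimal sketch. The two token counts are the approximate figures quoted above; the helper name `fits_in_context` and the `reserved` parameter (room left for the question and the answer) are assumptions made for illustration.

```python
import math

PDF_TOKENS = 400_000      # approximate textbook size from the example above
CONTEXT_WINDOW = 128_000  # approximate context window from the example above


def fits_in_context(doc_tokens: int, window: int, reserved: int = 0) -> bool:
    """Return True if the document plus reserved tokens fits in the window."""
    return doc_tokens + reserved <= window


# How many tokens would not fit, and how many full windows the PDF would span.
overflow = PDF_TOKENS - CONTEXT_WINDOW
min_windows = math.ceil(PDF_TOKENS / CONTEXT_WINDOW)

print(fits_in_context(PDF_TOKENS, CONTEXT_WINDOW))  # False
print(overflow)                                     # 272000
print(min_windows)                                  # 4
```

Even before reserving space for the question and the answer, the document overshoots the window by about 272K tokens.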
When the relevant information is missing from the prompt, the LLM may answer from pretrained knowledge rather than the provided document.
This leads to:
- Incorrect answers
- Non-grounded responses
- Hallucinations
RAG helps reduce this issue by supplying only relevant document sections.
The workshop explains RAG using the analogy of an open-book exam.
Without RAG
A student answers questions using memory only.
Equivalent:
User Question
↓
LLM
↓
Answer
With RAG
A student:
- Searches the book
- Finds relevant pages
- Uses both retrieved information and existing knowledge
Equivalent:
User Question
↓
Retrieval
↓
Relevant Context
↓
LLM
↓
Answer
This is the core intuition behind RAG.
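The flow above (question, retrieval, relevant context, LLM) can be sketched in a few lines. This is a toy illustration, not the workshop's implementation: retrieval here is plain word overlap rather than embeddings, the sample chunks are made up, and the LLM call is left out, so the sketch stops at the prompt that would be sent to the model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the question into one prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical chunks standing in for sections of the textbook.
chunks = [
    "Vitamin C supports the immune system and is found in citrus fruits.",
    "Protein is needed for muscle repair and is found in eggs and legumes.",
    "Iron carries oxygen in the blood and is found in spinach.",
]
question = "Which foods contain vitamin C?"

top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Only the best-matching chunk reaches the prompt, which is the "finds relevant pages" step of the open-book analogy. A production system would typically replace the word-overlap scoring with embedding similarity over a vector database.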
Main objective:
Reduce hallucinations
Architecture:
Documents
↓
Retrieval
↓
LLM
↓
Answer
RAG is viewed as part of a larger discipline:
Context Engineering
Components include:
- Retrieval
- Prompt Engineering
- Memory
- State Management
- Embeddings
- Vector Databases
- Long Context Windows
Modern RAG is therefore a subset of context engineering.
Definition
Context engineering is the practice of managing all information that enters an LLM's context.
Retrieval Context
Information fetched from knowledge sources.
Conversation Memory
Previous user interactions.
Application State
Current workflow information.
Prompt Design
Instructions guiding the model.
Storage Layer
- Vector databases
- Traditional databases
- Hybrid storage systems
The workshop positions context engineering as the next evolution beyond prompt engineering.
Imagine a shopkeeper asks:
"How much should I charge customer Amit?"
AI needs:
Customer name = Amit
Products in cart
Discount rules
GST rules
Wallet balance
Without this information:
AI = Guessing
With this information:
AI = Accurate
Providing all this information is Context Engineering.
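One way to make the "guessing vs. accurate" distinction concrete is to check which required pieces of context are missing before the model is called. This is a minimal sketch: the field names and every sample value (cart items, discount, GST rate, wallet balance) are hypothetical.

```python
# Hypothetical field names for the information listed in the example above.
REQUIRED_FIELDS = [
    "customer_name",
    "cart",
    "discount_rules",
    "gst_rules",
    "wallet_balance",
]


def missing_context(context: dict) -> list[str]:
    """Return the required fields that are absent from the context."""
    return [field for field in REQUIRED_FIELDS if context.get(field) is None]


# Only the customer name is known: the model would have to guess the rest.
partial = {"customer_name": "Amit"}

# All values below are made up for illustration.
full = {
    "customer_name": "Amit",
    "cart": [{"item": "rice", "price": 500}],
    "discount_rules": "10% off for loyalty members",
    "gst_rules": "5% GST on groceries",
    "wallet_balance": 120,
}

print(missing_context(partial))  # ['cart', 'discount_rules', 'gst_rules', 'wallet_balance']
print(missing_context(full))     # []
```

An empty result means the model has everything it needs to compute the bill; a non-empty result lists what it would otherwise have to guess.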
Now we provide much more than a prompt.
Prompt
+
Company Documents
+
Customer Data
+
Previous Chats
+
Current Order
+
Database Information
↓
LLM
This is Context Engineering.
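The stack above can be sketched as a function that joins labeled sections into a single context string. This is a hedged illustration: the function name `assemble_context`, the bracketed section format, and all sample contents are assumptions, not something the workshop prescribes.

```python
def assemble_context(sections: dict[str, str]) -> str:
    """Join non-empty sections into one labeled context string."""
    parts = []
    for title, body in sections.items():
        body = body.strip()
        if body:  # skip sections with nothing to contribute
            parts.append(f"[{title}]\n{body}")
    return "\n\n".join(parts)


# Hypothetical contents for each layer named above.
sections = {
    "Prompt": "Compute the final bill for the customer.",
    "Company Documents": "Loyalty members get 10% off.",
    "Customer Data": "Name: Amit. Loyalty member: yes.",
    "Previous Chats": "",  # no earlier conversation in this example
    "Current Order": "1 x rice (500)",
    "Database Information": "Wallet balance: 120",
}

context = assemble_context(sections)
print(context)
```

Empty layers are dropped, so the model receives only the sections that carry information. The dictionary order determines the order of sections in the final context.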