003_Production level RAG Workshop: Part 1

Amresh Verma edited this page Jun 18, 2026 · 7 revisions

What is RAG?

Definition

RAG stands for:

Retrieval-Augmented Generation

It combines:

  1. Retrieval
    • Fetching relevant information from an external knowledge source
  2. Generation
    • Using an LLM to generate a response

Instead of relying only on the LLM’s pretrained knowledge, RAG supplements the LLM with retrieved context from external documents.

External tools

  1. Lovable
  2. Supabase

Why RAG Exists

Problem: Context Window Limitations

Consider a 1200-page nutrition textbook.

A naive solution:

   Question + Entire PDF
            ↓
           LLM

Problems:

  • Too many tokens
  • High cost
  • Context window overflow
  • Hallucinations
  • Slow responses

Example discussed:

PDF size ≈ 400K tokens

GPT context window ≈ 128K tokens

The document is roughly three times larger than the context window, so it cannot fit into a single prompt.
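The arithmetic behind this example can be checked with a short sketch. The two token counts come from the example above; `RESERVED_TOKENS` and both helper functions are assumptions added for illustration, not part of the workshop material.

```python
import math

PDF_TOKENS = 400_000       # approximate textbook size, from the example above
CONTEXT_WINDOW = 128_000   # approximate context window, from the example above
RESERVED_TOKENS = 8_000    # assumed budget for the question, instructions, and answer


def fits_in_context(doc_tokens: int, window: int, reserved: int = 0) -> bool:
    """Return True if the document plus the reserved budget fits in the window."""
    return doc_tokens + reserved <= window


def min_prompts(doc_tokens: int, window: int, reserved: int = 0) -> int:
    """Smallest number of separate prompts needed to cover the whole document."""
    usable = window - reserved
    if usable <= 0:
        raise ValueError("reserved budget leaves no room for document text")
    return math.ceil(doc_tokens / usable)


overflow = PDF_TOKENS - CONTEXT_WINDOW

print(fits_in_context(PDF_TOKENS, CONTEXT_WINDOW))               # False
print(overflow)                                                  # 272000
print(min_prompts(PDF_TOKENS, CONTEXT_WINDOW, RESERVED_TOKENS))  # 4
```

Even with nothing reserved for the question or the answer, the document overshoots the window by about 272K tokens, which is why only a selected part of it can be sent.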

Hallucination Problem

When the relevant information is missing from the prompt, the LLM may answer from its pretrained knowledge rather than from the provided document.

This leads to:

  • Incorrect answers
  • Non-grounded responses
  • Hallucinations

RAG helps reduce this issue by retrieving only the relevant document sections and placing them in the prompt, so the model can ground its answer in the source.
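One way to picture "supplying only relevant sections" is a small prompt builder. This is a minimal sketch: the function name `build_grounded_prompt`, the instruction wording, and the sample section text are all hypothetical, and the instruction to admit when the document does not cover the question is a common practice assumed here rather than something stated in the workshop.

```python
def build_grounded_prompt(question: str, sections: list[str]) -> str:
    """Combine the retrieved sections and the question into one prompt string."""
    numbered = "\n\n".join(
        f"[Section {i}]\n{text.strip()}" for i, text in enumerate(sections, start=1)
    )
    return (
        "Answer the question using only the sections below. "
        "If the answer is not in them, say that the document does not cover it.\n\n"
        f"{numbered}\n\n"
        f"Question: {question.strip()}"
    )


# Hypothetical retrieved section; a real system would get this from retrieval.
prompt = build_grounded_prompt(
    "How much protein does an adult need per day?",
    ["Chapter 6 discusses protein requirements for adults."],
)
print(prompt)
```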

Open-Book Exam Analogy

The workshop explains RAG using an open-book exam.

Without RAG

    A student answers questions using memory only.

Equivalent:

    User Question
          ↓
         LLM
          ↓
       Answer

With RAG

A student:

  • Searches the book
  • Finds relevant pages
  • Uses both retrieved information and existing knowledge

Equivalent:

    User Question
           ↓
     Retrieval
           ↓
    Relevant Context
           ↓
          LLM
           ↓
        Answer

This is the core intuition behind RAG.
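The "with RAG" flow can be sketched in a few lines. This is a toy illustration under stated assumptions: retrieval is done by counting shared words, which stands in for the embedding search a real system would more likely use, the sample pages are made up, and `fake_llm` is a hypothetical placeholder for an actual model call.

```python
import re
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, pages: list[str], top_k: int = 2) -> list[str]:
    """Rank pages by shared question words; keep up to top_k pages with any overlap."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(page)), page) for page in pages]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep page order
    return [page for _, page in scored[:top_k]]


def answer_with_rag(question: str, pages: list[str], llm: Callable[[str], str]) -> str:
    """User question -> retrieval -> relevant context -> LLM -> answer."""
    context = retrieve(question, pages)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; it only reports the prompt size."""
    return f"(model would answer from a {len(prompt)}-character prompt)"


# Made-up sample pages standing in for the "book".
pages = [
    "Vitamin C is found in citrus fruits and supports the immune system.",
    "Iron is found in spinach and red meat.",
    "Calcium is important for bones and is found in dairy products.",
]

print(retrieve("Which foods contain iron?", pages))
print(answer_with_rag("Which foods contain iron?", pages, fake_llm))
```

Only the page about iron reaches the model, which mirrors the student who opens the book to the relevant page instead of rereading all of it.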

Evolution of RAG

RAG in 2021

Main objective:

Reduce hallucinations

Architecture:

    Documents
       ↓
    Retrieval
       ↓
    LLM
       ↓
    Answer

RAG Today

RAG is viewed as part of a larger discipline:

Context Engineering

Components include:

  • Retrieval
  • Prompt Engineering
  • Memory
  • State Management
  • Embeddings
  • Vector Databases
  • Long Context Windows

Modern RAG is therefore a subset of context engineering.

Context Engineering

Definition

    Context engineering is the practice of managing all information that enters an LLM's context.

Components:

Retrieval Context

     Information fetched from knowledge sources.

Conversation Memory

     Previous user interactions.

Application State

     Current workflow information.

Prompt Design

     Instructions guiding the model.

Storage Layer

  • Vector databases
  • Traditional databases
  • Hybrid storage systems

The workshop positions context engineering as the next evolution beyond prompt engineering.
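The components listed above can be modeled as one small container that decides what enters the model's context. This is a sketch only: the class name `LLMContext`, its field names, the section labels, and the sample values are assumptions for illustration, and the storage layer is left out because it sits behind whatever fills the `retrieved` field.

```python
from dataclasses import dataclass, field


@dataclass
class LLMContext:
    """Everything that will be placed in the model's context for one request."""
    instructions: str                                    # prompt design
    retrieved: list[str] = field(default_factory=list)   # retrieval context
    memory: list[str] = field(default_factory=list)      # conversation memory
    state: dict[str, str] = field(default_factory=dict)  # application state

    def render(self, question: str) -> str:
        """Lay the parts out in a fixed order, skipping any that are empty."""
        parts = [self.instructions]
        if self.retrieved:
            parts.append("Retrieved context:\n" + "\n".join(self.retrieved))
        if self.memory:
            parts.append("Conversation so far:\n" + "\n".join(self.memory))
        if self.state:
            lines = [f"{key}: {value}" for key, value in self.state.items()]
            parts.append("Application state:\n" + "\n".join(lines))
        parts.append("Question: " + question)
        return "\n\n".join(parts)


# Hypothetical values for each component.
ctx = LLMContext(
    instructions="Answer using the retrieved context.",
    retrieved=["Return policy: items can be returned within 30 days."],
    memory=["User: I bought a kettle last week."],
    state={"step": "refund_request"},
)
print(ctx.render("Can I return it?"))
```

Keeping each component in its own field makes it easy to see, and later to trim, what is actually being sent to the model.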

Real-Life Example

Imagine a shopkeeper asks:

 "How much should I charge customer Amit?"

To answer, the AI needs:

  Customer name = Amit
  Products in cart
  Discount rules
  GST rules
  Wallet balance

Without this information:

  AI = Guessing

With this information:

  AI = Accurate

Providing all this information is Context Engineering.
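The shopkeeper example can be expressed as a check that runs before the model is called. This is a minimal sketch: the field names in `REQUIRED_CONTEXT` are hypothetical labels for the five items listed above, and the sample values are invented for illustration.

```python
REQUIRED_CONTEXT = [
    "customer_name",
    "cart",
    "discount_rules",
    "gst_rules",
    "wallet_balance",
]


def missing_context(context: dict) -> list[str]:
    """Return the required fields that are absent, in the order listed above."""
    return [key for key in REQUIRED_CONTEXT if key not in context]


def describe_readiness(context: dict) -> str:
    """Say whether the model has what it needs or would have to guess."""
    missing = missing_context(context)
    if missing:
        return "guessing (missing: " + ", ".join(missing) + ")"
    return "ready"


# Only the customer name is known, so the model would be guessing.
print(describe_readiness({"customer_name": "Amit"}))

# Hypothetical complete context for the same question.
full_context = {
    "customer_name": "Amit",
    "cart": ["rice 5 kg", "cooking oil 1 L"],
    "discount_rules": "10% off for loyalty members",
    "gst_rules": "apply the applicable GST rate per item",
    "wallet_balance": 50,
}
print(describe_readiness(full_context))
```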

Today: Context Engineering

Now we provide much more than a prompt.

           Prompt
              +
           Company Documents
              +
           Customer Data
              +
           Previous Chats
              +
           Current Order
              +
           Database Information
             ↓
           LLM

This is Context Engineering.