
003_Production level RAG Workshop: Part 1

Amresh Verma edited this page Jun 18, 2026 · 7 revisions

What is RAG?

Definition

RAG stands for:

Retrieval-Augmented Generation

It combines:

  1. Retrieval
    • Fetching relevant information from an external knowledge source
  2. Generation
    • Using an LLM to generate a response

Instead of relying only on the LLM’s pretrained knowledge, RAG supplements the LLM with retrieved context from external documents.

External tools

  1. Lovable
  2. Supabase

Why RAG Exists

Problem: Context Window Limitations

Consider a 1200-page nutrition textbook.

A naive solution:

   Question + Entire PDF
            ↓
           LLM

Problems:

  • Too many tokens
  • High cost
  • Context window overflow
  • Hallucinations
  • Slow responses

Example discussed:

PDF size ≈ 400K tokens

GPT context window ≈ 128K tokens

The entire document cannot fit into the context window at once.
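The arithmetic behind this example can be checked with a minimal sketch. The two sizes are the approximate figures quoted above; `RESERVED_TOKENS` is a hypothetical allowance for the question and the answer, not a number from the workshop.

```python
import math

PDF_TOKENS = 400_000       # approximate PDF size quoted above
CONTEXT_WINDOW = 128_000   # approximate context window quoted above
RESERVED_TOKENS = 4_000    # hypothetical budget for the question and the answer


def fits_in_context(doc_tokens, window, reserved=0):
    """True if the document plus the reserved budget fits in one prompt."""
    return doc_tokens + reserved <= window


def min_passes(doc_tokens, window, reserved=0):
    """Smallest number of prompts needed to show the model every token once."""
    usable = window - reserved
    if usable <= 0:
        raise ValueError("reserved budget leaves no room for the document")
    return math.ceil(doc_tokens / usable)


print(fits_in_context(PDF_TOKENS, CONTEXT_WINDOW, RESERVED_TOKENS))  # False
print(min_passes(PDF_TOKENS, CONTEXT_WINDOW, RESERVED_TOKENS))       # 4
```

Even in the best case the document would have to be split across at least four prompts, which is why RAG selects a few relevant sections instead of sending everything.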

Hallucination Problem

When the relevant information is missing from the prompt:

The LLM may answer from pretrained knowledge rather than the provided document.

This leads to:

  • Incorrect answers
  • Non-grounded responses
  • Hallucinations

RAG helps reduce this issue by supplying only relevant document sections.
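One way to keep answers grounded is to place only the retrieved sections in the prompt and tell the model to refuse when they do not contain the answer. The sketch below is an illustration under assumptions: `build_grounded_prompt` and the instruction wording are hypothetical, not the prompt used in the workshop.

```python
def build_grounded_prompt(question, sections):
    """Build a prompt that limits the model to the supplied document sections."""
    # Label each section so an answer can be traced back to its source.
    context = "\n\n".join(
        f"[Section {i}]\n{section.strip()}"
        for i, section in enumerate(sections, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, reply: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_grounded_prompt(
    "Which foods are rich in vitamin C?",
    ["Citrus fruits such as oranges are rich in vitamin C."],
)
print(prompt)
```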

Open-Book Exam Analogy

The workshop explains RAG using an open-book exam.

Without RAG

    A student answers questions using memory only.

Equivalent:

    User Question
          ↓
         LLM
          ↓
       Answer

With RAG

A student:

  • Searches the book
  • Finds relevant pages
  • Uses both retrieved information and existing knowledge

Equivalent:

    User Question
           ↓
     Retrieval
           ↓
    Relevant Context
           ↓
          LLM
           ↓
        Answer

This is the core intuition behind RAG.
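The flow above can be sketched in a few lines. This is a toy illustration, not a production retriever: it ranks chunks by simple word overlap, whereas real systems typically use embeddings and a vector database. `llm` is a hypothetical callable that takes a prompt string and returns a string.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, chunks, top_k=2):
    """Return up to top_k chunks that share the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(chunk)), i) for i, chunk in enumerate(chunks)]
    scored = [(score, i) for score, i in scored if score > 0]  # drop unrelated chunks
    scored.sort(key=lambda pair: (-pair[0], pair[1]))          # best score first
    return [chunks[i] for _, i in scored[:top_k]]


def answer_with_rag(question, chunks, llm, top_k=2):
    """User Question -> Retrieval -> Relevant Context -> LLM -> Answer."""
    context = retrieve(question, chunks, top_k)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    return llm(prompt)


book = [
    "Vitamin C is found in citrus fruits.",
    "Protein helps build muscle.",
    "Iron is found in spinach.",
]
# A stand-in "LLM" that echoes its prompt, so the retrieved context is visible.
print(answer_with_rag("Which fruits contain vitamin C?", book, llm=lambda p: p, top_k=1))
```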


Evolution of RAG

RAG in 2021

Main objective:

Reduce hallucinations

Architecture:

    Documents
       ↓
    Retrieval
       ↓
    LLM
       ↓
    Answer

RAG Today

RAG is viewed as part of a larger discipline:

Context Engineering

Components include:

  • Retrieval
  • Prompt Engineering
  • Memory
  • State Management
  • Embeddings
  • Vector Databases
  • Long Context Windows

Modern RAG is therefore a subset of context engineering.
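To illustrate how retrieval becomes one input among several, here is a minimal sketch that assembles a prompt from a system prompt, conversation memory, and retrieved chunks. `ContextBundle` and its fields are hypothetical names, and the character budget is a rough stand-in for a real token budget.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBundle:
    system_prompt: str                             # prompt engineering
    memory: list = field(default_factory=list)     # earlier conversation turns
    retrieved: list = field(default_factory=list)  # chunks returned by retrieval

    def render(self, question, max_context_chars=1000):
        """Combine all context sources into one prompt string."""
        kept, used = [], 0
        for chunk in self.retrieved:
            # Stop adding chunks once the retrieval budget would be exceeded.
            if used + len(chunk) > max_context_chars:
                break
            kept.append(chunk)
            used += len(chunk)
        parts = [self.system_prompt]
        if self.memory:
            parts.append("Conversation so far:\n" + "\n".join(self.memory))
        if kept:
            parts.append("Retrieved context:\n" + "\n".join(kept))
        parts.append("Question: " + question)
        return "\n\n".join(parts)


bundle = ContextBundle(
    system_prompt="You are a nutrition assistant.",
    memory=["User: Hi", "Assistant: Hello, how can I help?"],
    retrieved=["Iron is found in spinach."],
)
print(bundle.render("Which foods contain iron?"))
```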


Whole App Flow


Step 1:

Document Preprocessing

To process a data file such as a PDF, we need a parsing library. The right library depends on what kind of content the file contains, as in the cases below.

  1. PDF contains text only

     We can use a library such as **PyMuPDF**, a traditional PDF parsing library. With it we can access the individual pages of the document.

One limitation: if the PDF contains an image, PyMuPDF treats it as an image only. If that image contains text, such as a restaurant bill, that text is not read.
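A minimal sketch of page-by-page extraction with PyMuPDF, which also counts the images on each page so that pages whose text is locked inside images can be spotted. It assumes PyMuPDF is installed and imported as `fitz` (newer releases also accept `import pymupdf`). The function names and the `min_chars` threshold are hypothetical.

```python
def extract_pages(pdf_path):
    """Return one dict per page: page number, text layer, and image count."""
    import fitz  # PyMuPDF; imported lazily so the helper below runs without it

    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pages.append({
                "page": page.number + 1,                        # 1-based page number
                "text": page.get_text(),                        # text layer only
                "image_count": len(page.get_images(full=True)),
            })
    return pages


def pages_needing_ocr(pages, min_chars=20):
    """Pages that contain images but almost no extractable text."""
    return [
        p["page"]
        for p in pages
        if p["image_count"] > 0 and len(p["text"].strip()) < min_chars
    ]


# Usage, assuming a local file named "nutrition.pdf" (hypothetical path):
# pages = extract_pages("nutrition.pdf")
# print(pages_needing_ocr(pages))
```

A scanned restaurant bill would show up here as a page with one image and an empty text layer, which is the case that OCR addresses.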

  2. PDF contains text + images

     An OCR library can resolve the issue above. The best-known open-source OCR library is Tesseract.
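A minimal sketch of running Tesseract on a PDF page: the page is rendered to an image with PyMuPDF and passed to Tesseract through the `pytesseract` wrapper. It assumes PyMuPDF, Pillow, `pytesseract`, and the Tesseract binary are installed. The function names, the 300 DPI setting, and the cleanup step are illustrative choices, not part of the workshop.

```python
import io


def ocr_pdf_page(pdf_path, page_index, dpi=300, lang="eng"):
    """Render one PDF page to an image and return the text Tesseract reads from it."""
    import fitz                # PyMuPDF
    import pytesseract         # Python wrapper around the Tesseract binary
    from PIL import Image

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)        # rasterize the page
        image = Image.open(io.BytesIO(pix.tobytes("png")))
        return pytesseract.image_to_string(image, lang=lang)


def clean_ocr_text(raw):
    """Collapse repeated whitespace and drop the empty lines OCR output often contains."""
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


# Usage, assuming page 2 of "bill.pdf" is a scanned bill (hypothetical file):
# print(clean_ocr_text(ocr_pdf_page("bill.pdf", page_index=1)))
```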

  3. PDF contains text + images + tables

     Docling can also link with an OCR tool, so we can use OCR + Docling.
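A minimal sketch of converting such a PDF with Docling, following its basic `DocumentConverter` usage. It assumes the `docling` package is installed and that its default PDF pipeline applies OCR and table-structure recognition; check the Docling documentation before relying on those defaults. `markdown_table_rows` is a hypothetical helper that assumes tables are exported as pipe-delimited Markdown rows.

```python
def pdf_to_markdown(pdf_path):
    """Convert a PDF (text, images, tables) to Markdown with Docling."""
    from docling.document_converter import DocumentConverter  # lazy import

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def markdown_table_rows(markdown):
    """Return the lines that belong to Markdown tables (lines starting with '|')."""
    return [line for line in markdown.splitlines() if line.lstrip().startswith("|")]


# Usage, assuming a local file named "nutrition.pdf" (hypothetical path):
# md = pdf_to_markdown("nutrition.pdf")
# print(len(markdown_table_rows(md)), "table rows found")
```

Exporting to Markdown keeps each table's rows and columns together, which plain text extraction usually loses.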

