Live Link: note-book-llm-rag-ebon.vercel.app · Source: github.com/Ishan007-bot/NoteBookLLM_RAG
A NotebookLM-style web app that lets you upload a document (PDF, Word, text, markdown, or CSV) and ask questions about it. Answers are grounded in the document and cite specific page or row numbers — when the document doesn't contain an answer, the model says so explicitly instead of hallucinating.
Built with Next.js 16, Groq (Llama-3.3-70B), Google Gemini embeddings, Qdrant vector DB, and the Vercel AI SDK.
- Multi-format upload — drag-drop or click. Supports `.pdf`, `.docx`, `.txt`, `.md`, `.csv` up to 25 MB
- Streaming chat — token-by-token responses via Vercel AI SDK + Groq
- Page citations — every answer cites the page numbers it relied on, e.g. `(Page 3)`
- Inline source viewer — collapsible panel under each answer showing the retrieved chunks with similarity scores
- Strict grounding — refuses to answer questions not covered by the document (returns `"The document does not cover this."` instead of guessing)
- Per-session isolation — each upload gets a unique session ID and Qdrant collection so different documents don't cross-contaminate
| Layer | Choice |
|---|---|
| Framework | Next.js 16 (App Router, Turbopack) on Node runtime |
| LLM | llama-3.3-70b-versatile via Groq |
| Embeddings | gemini-embedding-001 (3072-dim) via Google AI Studio |
| Vector DB | Qdrant (local Docker for dev, Qdrant Cloud for prod) |
| Document parsing | WebPDFLoader (PDF), mammoth (DOCX), native string ops (TXT/MD/CSV) — all buffer-based, serverless-safe |
| Streaming | Vercel AI SDK v6 (streamText + useChat) |
| Styling | Tailwind CSS v4 |
```
┌──────────────┐
│ /api/upload  │
└──────┬───────┘
       │
PDF ──► WebPDFLoader ──► RecursiveCharacterTextSplitter
       │                 (1000 char chunks, 200 char overlap,
       │                  page metadata preserved on each chunk)
       ▼
filter empty/whitespace chunks
       │
       ▼
Gemini batchEmbedContents (32 chunks/request)
  ─────► fallback to single embedContent on batch failure
         (fetchWithRetry: 5 attempts, exp-backoff up to 30s,
          respects Retry-After headers)
       │
       ▼
Qdrant create-collection + upsert (cosine, dim=3072)
       │
       ▼
return { sessionId, pages, chunkCount }
```
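For orientation, here is a condensed TypeScript sketch of that indexing path. It compresses the diagram into one function and omits the retry/pacing logic; the exact shapes in `lib/rag.ts` may differ, so treat this as illustrative rather than a copy of the source.

```ts
// Illustrative sketch of the indexing path, not the exact lib/rag.ts source.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { QdrantClient } from "@qdrant/js-client-rest";
import { randomUUID } from "crypto";

const EMBED_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:batchEmbedContents";

export async function indexDocument(pages: { text: string; page: number }[]) {
  // 1. Chunk: 1000 chars with 200 overlap, page number carried in metadata.
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
  const docs = await splitter.createDocuments(
    pages.map((p) => p.text),
    pages.map((p) => ({ page: p.page }))
  );
  const chunks = docs.filter((d) => d.pageContent.trim().length > 0);

  // 2. Embed in batches of 32 via Gemini's batch endpoint, document-side taskType.
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += 32) {
    const batch = chunks.slice(i, i + 32);
    const res = await fetch(`${EMBED_URL}?key=${process.env.GOOGLE_API_KEY}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        requests: batch.map((c) => ({
          model: "models/gemini-embedding-001",
          content: { parts: [{ text: c.pageContent }] },
          taskType: "RETRIEVAL_DOCUMENT", // asymmetric with the query side
        })),
      }),
    });
    const { embeddings } = await res.json();
    vectors.push(...embeddings.map((e: { values: number[] }) => e.values));
  }

  // 3. Index: one fresh collection per session, cosine distance, dim 3072.
  const sessionId = randomUUID();
  const qdrant = new QdrantClient({ url: process.env.QDRANT_URL!, apiKey: process.env.QDRANT_API_KEY });
  await qdrant.createCollection(`doc-${sessionId}`, {
    vectors: { size: 3072, distance: "Cosine" },
  });
  await qdrant.upsert(`doc-${sessionId}`, {
    points: chunks.map((c, i) => ({
      id: i,
      vector: vectors[i],
      payload: { text: c.pageContent, page: c.metadata.page },
    })),
  });

  return { sessionId, pages: pages.length, chunkCount: chunks.length };
}
```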
```
┌──────────────┐
│  /api/chat   │
└──────┬───────┘
       │
last user message ──► Gemini embedContent
                      (taskType: RETRIEVAL_QUERY)
       │
       ▼
Qdrant similarity search (top-4 with scores)
       │
       ▼
build strict grounded-answer system prompt
with retrieved chunks tagged "[Source N | Page X]"
       │
       ▼
Groq streamText (llama-3.3-70b-versatile, T=0.2)
       │
       ▼
toUIMessageStreamResponse with sources in metadata
```
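And the matching retrieval path, again as an illustrative sketch rather than the exact `app/api/chat/route.ts`:

```ts
// Illustrative sketch of the retrieval + generation path.
import { QdrantClient } from "@qdrant/js-client-rest";
import { createGroq } from "@ai-sdk/groq";
import { streamText } from "ai";

export async function answer(sessionId: string, question: string) {
  // 1. Embed the question with the *query-side* task type.
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent?key=${process.env.GOOGLE_API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "models/gemini-embedding-001",
        content: { parts: [{ text: question }] },
        taskType: "RETRIEVAL_QUERY",
      }),
    }
  );
  const { embedding } = await res.json();

  // 2. Top-4 nearest chunks, with similarity scores for the source panel.
  const qdrant = new QdrantClient({ url: process.env.QDRANT_URL!, apiKey: process.env.QDRANT_API_KEY });
  const hits = await qdrant.search(`doc-${sessionId}`, {
    vector: embedding.values,
    limit: 4,
    with_payload: true,
  });

  // 3. Grounded system prompt: sources are tagged so the model can cite pages.
  const context = hits
    .map((h, i) => `[Source ${i + 1} | Page ${h.payload?.page}]\n${h.payload?.text}`)
    .join("\n\n");
  const system =
    `Answer ONLY from the sources below and cite pages like (Page 3). ` +
    `If the sources do not cover the question, reply exactly: ` +
    `"The document does not cover this."\n\n${context}`;

  // 4. Stream the answer at low temperature for faithful extraction.
  const groq = createGroq({ apiKey: process.env.GROQ_API_KEY });
  return streamText({
    model: groq("llama-3.3-70b-versatile"),
    system,
    prompt: question,
    temperature: 0.2,
  });
}
```

In the actual route handler the result is returned via `toUIMessageStreamResponse()`, which is how the retrieved chunks end up in message metadata for the source viewer.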
- Chunk size 1000 / overlap 200 — large enough to preserve a paragraph's local context, small enough that retrieval pulls focused snippets. ~20% overlap means a sentence cut by a chunk boundary still appears intact in a neighbour, so retrieval doesn't miss it.
- `RecursiveCharacterTextSplitter` — splits hierarchically on paragraph → sentence → word boundaries, so chunks end on natural breaks rather than mid-word.
- Per-chunk error isolation — Gemini's batch endpoint fails the entire batch if any single input is malformed. The pipeline catches batch failures and falls back to one-at-a-time calls so a single bad chunk can't kill the whole upload (see the sketch after this list).
- `RETRIEVAL_DOCUMENT` vs `RETRIEVAL_QUERY` taskTypes — Gemini documents asymmetric task types for indexing vs querying. Using both gives noticeably better retrieval quality than calling without a taskType.
- Direct `@qdrant/js-client-rest` — bypasses LangChain's `QdrantVectorStore` so the app controls collection creation, upsert batching, and per-chunk error handling.
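A minimal sketch of that batch-to-single fallback, where `embedBatch` and `embedOne` are hypothetical stand-ins for the pipeline's actual Gemini calls:

```ts
// Illustrative fallback shape: try the batch endpoint, degrade to per-chunk calls.
// embedBatch / embedOne are hypothetical helpers wrapping batchEmbedContents
// and embedContent respectively.
async function embedBatchWithFallback(texts: string[]): Promise<number[][]> {
  try {
    return await embedBatch(texts); // batchEmbedContents, up to 32 inputs per request
  } catch {
    // One malformed input fails the whole batch, so retry chunks individually
    // and let a single bad chunk fail alone instead of killing the upload.
    const out: number[][] = [];
    for (const text of texts) {
      try {
        out.push(await embedOne(text)); // single embedContent call
      } catch (err) {
        console.warn("skipping unembeddable chunk:", err);
        out.push([]); // caller filters empty vectors out before upsert
      }
    }
    return out;
  }
}
```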
- Node.js 20+
- Docker Desktop (for local Qdrant)
- A Groq API key — free at https://console.groq.com/keys
- A Google AI Studio API key — free at https://aistudio.google.com/apikey
```bash
# 1. Clone and install
git clone https://github.com/Ishan007-bot/NoteBookLLM_RAG.git
cd NoteBookLLM_RAG
npm install --legacy-peer-deps

# 2. Start Qdrant locally
docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant

# 3. Configure environment
cp .env.example .env.local
# Edit .env.local and paste your Groq + Google keys.
# Leave QDRANT_URL=http://127.0.0.1:6333 and QDRANT_API_KEY blank for local dev.

# 4. Run
npm run dev
```

Open http://localhost:3000.
Windows note: use `127.0.0.1`, not `localhost`, in `QDRANT_URL` — Node's fetch on Windows often resolves `localhost` to IPv6, which Docker Qdrant doesn't bind to.
```
GROQ_API_KEY=        # https://console.groq.com/keys
GROQ_API_KEY_2=      # optional backup, auto-rotated on 429 / quota errors
GROQ_API_KEY_3=      # optional backup
GOOGLE_API_KEY=      # https://aistudio.google.com/apikey
GOOGLE_API_KEY_2=    # optional backup
GOOGLE_API_KEY_3=    # optional backup
QDRANT_URL=          # http://127.0.0.1:6333 for local, your cluster URL for cloud
QDRANT_API_KEY=      # blank for local Docker, required for Qdrant Cloud
```

Key rotation: when a Groq or Gemini key returns a 429 / quota / 401 error, the app rotates to the next configured backup key. State is per-process (in-memory) — on serverless platforms each instance maintains its own pointer. Add up to 5 keys per provider (`*_API_KEY`, `*_API_KEY_2`, ..., `*_API_KEY_5`).
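The rotation logic is roughly this shape (a sketch; the real `lib/api-keys.ts` may differ in details):

```ts
// Illustrative in-memory key rotation: a per-process pointer over configured keys.
const groqKeys = [
  process.env.GROQ_API_KEY,
  process.env.GROQ_API_KEY_2,
  process.env.GROQ_API_KEY_3,
  process.env.GROQ_API_KEY_4,
  process.env.GROQ_API_KEY_5,
].filter((k): k is string => Boolean(k));

let current = 0; // per-process: each serverless instance rotates independently

export function getGroqKey(): string {
  return groqKeys[current % groqKeys.length];
}

export function rotateGroqKey(status: number): boolean {
  // Rotate on rate-limit / quota / bad-key responses; report whether a
  // different key was actually available to switch to.
  if (![401, 429].includes(status) || groqKeys.length < 2) return false;
  current = (current + 1) % groqKeys.length;
  return true;
}
```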
A working deployment is live at note-book-llm-rag-ebon.vercel.app.
The app is designed for serverless (Vercel) — PDF parsing uses unpdf (no browser-API polyfills), the upload route uses the Node runtime with maxDuration = 60, and chunking happens entirely in memory (no disk writes).
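Those two route settings are standard Next.js segment config exports, i.e. something like:

```ts
// app/api/upload/route.ts — segment config as described above
export const runtime = "nodejs"; // Node runtime for Buffer-based parsing
export const maxDuration = 60;   // allow up to 60 s for parse + embed + index
```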
- Sign up at https://cloud.qdrant.io and create a free-tier cluster (1 GB RAM, ~1M vectors). Pick AWS / `us-east-1` to match Vercel's default region for lowest latency.
- Copy the cluster URL and create an API key.
- Push to GitHub, then import the repo at https://vercel.com/new.
- Set the environment variables in Vercel project settings — at minimum `GROQ_API_KEY`, `GOOGLE_API_KEY`, `QDRANT_URL` (your Qdrant Cloud URL — not localhost), and `QDRANT_API_KEY` (your Qdrant Cloud API key). Optionally add `GROQ_API_KEY_2`/`_3` and `GOOGLE_API_KEY_2`/`_3` for backup-key rotation.
- Deploy.
```
notebook-rag/
├─ app/
│  ├─ api/
│  │  ├─ upload/route.ts      # POST: parse PDF, chunk, embed, index in Qdrant
│  │  └─ chat/route.ts        # POST: retrieve context, stream grounded answer
│  ├─ layout.tsx              # Root layout, fonts, metadata
│  ├─ page.tsx                # State machine: upload view ↔ chat view
│  └─ globals.css             # Tailwind + theme tokens
├─ components/
│  ├─ upload-view.tsx         # Drag-drop upload zone with progress + errors
│  ├─ chat-view.tsx           # Streaming chat UI with collapsible sources
│  └─ background-decor.tsx    # Editorial ornaments / atmospheric SVGs
├─ lib/
│  ├─ rag.ts                  # Chunking, embedding, indexing, retrieval, prompts
│  ├─ api-keys.ts             # Multi-key rotation for Groq + Gemini
│  └─ types.ts                # Shared types (UploadResponse, RetrievedChunk, ...)
├─ next.config.ts
└─ .env.example               # Required env vars (committed; .env.local is gitignored)
```
- Free-tier rate limits — Gemini's free tier allows 100 RPM and 1500 RPD on `gemini-embedding-001`. Large documents (~500+ chunks) or rapid re-uploads can hit this. Pipeline pacing (1.5s between batches) keeps normal uploads comfortably under the limit; on persistent 429s the app rotates to the next configured backup API key automatically.
- Image-only / scanned PDFs — `unpdf` extracts text only; scanned-image PDFs without OCR will produce zero chunks and fail at upload with a clear error.
- Per-session collections — each upload creates a new Qdrant collection (`doc-{uuid}`). Old collections accumulate in Qdrant; in production, add a TTL/cleanup job or expire by collection age (a sketch follows below).
- No auth — anyone with the URL can upload and query. Add NextAuth or similar before exposing publicly.
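For the cleanup point, a deliberately blunt sketch: a scheduled job that drops every per-session collection. Qdrant has no built-in collection TTL, so an age-based expiry would additionally need the upload route to record creation timestamps somewhere.

```ts
// Hypothetical scheduled cleanup: drop all per-session collections.
import { QdrantClient } from "@qdrant/js-client-rest";

async function cleanupSessions() {
  const qdrant = new QdrantClient({
    url: process.env.QDRANT_URL!,
    apiKey: process.env.QDRANT_API_KEY,
  });
  // List every collection and delete the session-scoped ones.
  const { collections } = await qdrant.getCollections();
  for (const { name } of collections) {
    if (name.startsWith("doc-")) {
      await qdrant.deleteCollection(name);
    }
  }
}

cleanupSessions();
```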
MIT