An interactive web reader for "Machine Learning and Artificial Intelligence: Concepts, Algorithms and Models" by Prof. Reza Rawassizadeh (Boston University) — with a RAG-powered AI study assistant.
Five chapters (Ch 3, 8, 9, 10, 11) of the professor's open-source textbook, extracted from PDF into editorial-quality MDX, rendered with KaTeX + Shiki, and paired with a chat panel that retrieves relevant passages via pgvector and streams answers from Llama 3.1 8B.
- Live: (deployment URL TBD)
- Textbook source: Prof. Rawassizadeh's GitHub repo
| Layer | Choice |
|---|---|
| Framework | Next.js 16 (App Router, Turbopack) |
| Content | MDX + remark-math + rehype-katex + Shiki |
| Styling | Tailwind CSS 4 (Newsreader serif + Inter) |
| Vector store | Supabase pgvector (HNSW, 768-dim) |
| Embeddings | Cloudflare Workers AI — @cf/baai/bge-base-en-v1.5 |
| LLM | Cloudflare Workers AI — @cf/meta/llama-3.1-8b-instruct |
| AI orchestration | Vercel AI SDK (streamText) |
| Assets | Cloudflare R2 (public bucket) |
| PDF extraction | marker-pdf via Google Colab |
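The vector store ranks book chunks by how close their 768-dimensional embeddings are to the question's embedding. The sketch below assumes the HNSW index uses cosine distance, which is pgvector's `<=>` operator with `vector_cosine_ops`. The migration file is the authority on which operator class is actually used. The sketch computes that distance in plain TypeScript and does an exact top-k over a few toy vectors. The `Chunk` shape and the `topK` helper are hypothetical and for illustration only.

```typescript
// Cosine distance as pgvector's `<=>` operator defines it: 1 - cos(a, b).
// Assumption: the project's HNSW index is built with vector_cosine_ops.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical chunk shape, for illustration only.
interface Chunk {
  id: string;
  embedding: number[];
}

// Exact (brute-force) top-k by ascending distance. An HNSW index
// approximates this result without scanning every row.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ c, d: cosineDistance(query, c.embedding) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((x) => x.c);
}
```

In SQL the same ordering is written as `order by embedding <=> $1 limit k`. With an HNSW index in place, Postgres answers that query approximately instead of scanning every row.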
Prerequisites: Node 20+, npm, and accounts on Supabase (free) + Cloudflare (free).
```bash
git clone https://github.com/Amith71965/MLBook-WebReader.git
cd MLBook-WebReader
npm install
cp .env.example .env.local   # fill in values (see below)
npm run dev
```

Open http://localhost:3000.
Edit `.env.local`:

```bash
# Supabase: create a free project at https://supabase.com
NEXT_PUBLIC_SUPABASE_URL=https://<your-project>.supabase.co
SUPABASE_SERVICE_ROLE_KEY=<service-role-key>

# Cloudflare Workers AI: https://dash.cloudflare.com → AI → Workers AI
CLOUDFLARE_ACCOUNT_ID=<account-id>
CLOUDFLARE_AI_API_TOKEN=<api-token-with-workers-ai-read-permission>
```

In your Supabase SQL editor, run `supabase/migrations/20260415_mlbook_rag.sql`. It enables pgvector and creates the `book_chunks` and `chat_sessions` tables.
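It can help to fail fast when `.env.local` is incomplete before running the indexer or the dev server. Below is a minimal sketch of such a check. It assumes all four variables above are required, and it treats an untouched `<placeholder>` value as missing. `missingEnvVars` is a hypothetical helper and is not part of the repo.

```typescript
// Hypothetical helper (not part of the repo). It assumes these four
// variables from .env.local are all required.
const REQUIRED_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
  "CLOUDFLARE_ACCOUNT_ID",
  "CLOUDFLARE_AI_API_TOKEN",
] as const;

// Returns the names of required variables that are unset, empty, or still
// contain a <...> placeholder copied from .env.example.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || /<[^>]+>/.test(value);
  });
}
```

A script could call `missingEnvVars(process.env)` at startup and exit with a message listing the names it returns.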
```bash
npx tsx scripts/index-book.ts
```

The script reads every `content/mlbook/**/*.mdx` file, chunks the sections, embeds each chunk via Cloudflare Workers AI, and inserts ~280 rows into `book_chunks`. It takes about 2 minutes.
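The actual chunking rules live in `scripts/index-book.ts`. The sketch below shows one common approach under stated assumptions and is not the script's method. It packs the blank-line-separated paragraphs of a section into chunks up to a character budget, and it keeps paragraphs whole. The `chunkSection` name, the `SectionChunk` shape, and the default `maxChars = 1500` are all hypothetical.

```typescript
// Hypothetical sketch: pack the paragraphs of one MDX section into chunks of
// at most `maxChars` characters. The real rules in scripts/index-book.ts may differ.
interface SectionChunk {
  section: string; // e.g. the section slug
  index: number;   // position of the chunk within the section
  text: string;
}

function chunkSection(section: string, body: string, maxChars = 1500): SectionChunk[] {
  // Paragraphs are separated by one or more blank lines.
  const paragraphs = body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: SectionChunk[] = [];
  let current = "";
  const flush = () => {
    if (current) {
      chunks.push({ section, index: chunks.length, text: current });
      current = "";
    }
  };

  for (const p of paragraphs) {
    // Start a new chunk if appending this paragraph (plus "\n\n") would exceed the budget.
    if (current && current.length + 2 + p.length > maxChars) flush();
    // A single oversized paragraph becomes its own chunk rather than being split.
    current = current ? `${current}\n\n${p}` : p;
  }
  flush();
  return chunks;
}
```

Keeping paragraphs whole means math blocks and code fences inside a paragraph are never cut in half. The trade-off is that chunk sizes vary.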
Images are already hosted on the project's public R2 bucket:
https://pub-ee43721261544e8e8a0ca430d5d2c560.r2.dev/
You do not need your own R2 bucket to run this locally. Section MDX files reference these URLs directly; your dev server fetches them anonymously. Egress is free on R2 and the reads fit comfortably in the project's free tier.
If you need to re-extract the PDFs from scratch, see colab/README.md and scripts/upload-r2.sh. Most contributors will never need this.
```text
app/
  page.tsx                      # book landing / TOC
  [chapter]/page.tsx            # chapter overview
  [chapter]/[section]/page.tsx  # section reader + chat shell
  api/chat/route.ts             # RAG chat endpoint (streaming)
components/mlbook/              # ChatPanel, ChapterNav, TextSelectionAction, …
content/mlbook/                 # MDX: 5 chapters, 96 sections
  _meta.json                    # book-level metadata
  ch<N>/_meta.json              # per-chapter section ordering
  ch<N>/NN-<slug>.mdx           # section content
lib/
  mlbook.ts                     # content loader (chapters/sections)
  mlbook-rag.ts                 # embed + pgvector search + session store
scripts/
  split-sections.ts             # markdown → MDX per-section with sanitizer
  index-book.ts                 # chunk + embed + upload to pgvector
  upload-r2.sh                  # upload images/PDFs/cover to R2
supabase/migrations/            # SQL schema
colab/extract_pdfs.ipynb        # one-off PDF extraction (runs on Colab T4)
```
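The first half of a RAG request embeds the question and then assembles the retrieved passages into a prompt. The sketch below shows that flow under several assumptions and does not necessarily match how `lib/mlbook-rag.ts` does it. It calls Cloudflare's REST endpoint for Workers AI (`/accounts/{account_id}/ai/run/@cf/baai/bge-base-en-v1.5` with a `{ text: [...] }` body). It reads vectors from `result.data`. It takes `fetch` as a parameter so the code can be tested offline. The `RetrievedChunk` shape, the `buildPrompt` helper, and the prompt wording are hypothetical. The pgvector search between the two steps is omitted.

```typescript
// Minimal sketch, not the repo's implementation. Assumes the Cloudflare
// Workers AI REST API; chunk shape and prompt wording are hypothetical.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface EmbeddingResponse {
  success: boolean;
  result: { shape: number[]; data: number[][] };
}

// Embeds each input text with bge-base-en-v1.5 (768 dimensions per vector).
async function embed(
  texts: string[],
  accountId: string,
  apiToken: string,
  fetchFn: FetchLike,
): Promise<number[][]> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/baai/bge-base-en-v1.5`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
    body: JSON.stringify({ text: texts }),
  });
  if (!res.ok) throw new Error(`Workers AI request failed: HTTP ${res.status}`);
  const payload = (await res.json()) as EmbeddingResponse;
  if (!payload.success) throw new Error("Workers AI returned success: false");
  return payload.result.data; // one vector per input text, in input order
}

// Hypothetical shape of a row returned by the pgvector search.
interface RetrievedChunk {
  heading: string;
  text: string;
}

// Numbers each excerpt so the model can cite it, then appends the question.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.heading}\n${c.text}`)
    .join("\n\n");
  return [
    "Answer using only the textbook excerpts below. Cite excerpts by number.",
    context,
    `Question: ${question}`,
  ].join("\n\n");
}
```

On Node 20 the global `fetch` fits the `FetchLike` parameter, so a call looks like `embed([question], accountId, apiToken, fetch)`. The prompt built from the top matches would then go to the chat model.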
See CONTRIBUTING.md for the full workflow (fork → branch → PR). Short version:
- Fork the repo and clone your fork.
- Create a branch: `git checkout -b feat/<short-description>`
- Follow the Quick start to get it running locally.
- Make your changes, then run `npm run build` to verify nothing broke.
- Open a PR against `main` with a clear description of the problem and the fix.
Known issues and planned improvements are tracked in GitHub Issues. The CHANGELOG lists what's shipped.
- Textbook content © Prof. Reza Rawassizadeh — used with permission for this open-source reader implementation. See the original repo for the canonical PDFs and companion Jupyter notebooks.
- Reader code — MIT.