GemX-v0.5.0
v0.5.0 Release Notes
The follow-up to v0.4.0 turns GemX into a genuine document workspace. The headline is local RAG — large files and long transcripts are now indexed on-device and retrieved per question, so you can attach real documents without blowing up the context window. Alongside it, GemX learned to recover the actual math out of your files (Word equations become LaTeX, PDF equations are read straight off the rendered page), gained native PowerPoint (.pptx) support, and got a round of polish on the thinking block, the streaming avatar, and the responsive layout — all still running entirely on your Mac, no cloud.
Local RAG — Attach Big Documents Without Blowing the Context Window
- On-device retrieval, not context stuffing. Large documents (and long Whisper transcripts) are no longer crammed whole into the prompt. GemX chunks them, embeds each chunk locally, and at send time retrieves only the passages relevant to your question — so a 60-page report and a one-line follow-up cost roughly the same.
- Local embeddings via Supabase/gte-small. A 384-dim embedding model runs through Transformers.js with WebGPU → WASM fallback, weights cached in IndexedDB, following the same no-Python, in-browser pattern as Whisper. The first attach shows a one-time "Downloading embedding model… N%"; after that it's instant.
- Vectors live on disk, never in localStorage. Each conversation gets its own on-disk vector store under rag/<conversation>/<doc>/; the chat message keeps only a compact <context name="report.pdf" indexed="84" /> marker. Base64 payloads stay out of app state entirely.
- Retrieval is ephemeral and per-turn. Fresh, query-relevant excerpts are injected into the outbound message only and are never persisted. History stays lean, so it's never the thing that gets pruned away when the window fills.
- Self-cleaning indexes. Deleting a conversation drops its whole RAG folder; removing an indexed attachment before sending deletes just that doc's index — no orphaned vectors left behind.
- A dedicated "Indexing…" state. Chunking + embedding shows its own status in the composer and no longer borrows Whisper's "Loading model…" label, so the two never cross wires.
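The chunk, embed, and retrieve flow described above can be sketched roughly as follows. This is a minimal illustration, not GemX's actual code: the names `Chunk`, `chunkText`, `cosine`, and `retrieveTopK`, the character-based chunk sizes, and the default `k` are all assumptions, and the embedding step itself is left out (vectors are taken as given).

```typescript
// Hypothetical sketch of per-turn retrieval over locally embedded chunks.
// None of these names or defaults come from the GemX source.

interface Chunk {
  text: string;
  vector: number[]; // 384 numbers for gte-small; any fixed length works here
}

// Split text into overlapping windows of roughly `size` characters.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query vector, best first.
function retrieveTopK(query: number[], index: Chunk[], k = 4): Chunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Because only the top-k excerpts are attached to the outbound message, the prompt cost depends on `k` and the chunk size rather than on the length of the source document.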
Real Math Recovery From Documents
- DOCX equations become LaTeX. Word stores math as structured OMML, which mammoth silently drops. GemX now walks word/document.xml itself and converts every m:oMath to LaTeX ($…$ / $$…$$) in reading order, covering fractions, sub/superscripts, radicals, n-ary sums and integrals, delimiters, matrices, accents, and a full Unicode-symbol map. It falls back to mammoth only on error. Equations render via KaTeX in the reply.
- PDF math is read off the page, not the text. The extracted text of a PDF equation is glyph soup that no parser can recover, so under a multimodal model the relevant pages are rendered to images (pdfjs-dist) and attached at retrieval time; the model reads the real equations off the actual page. Text-only (mlx-lm) models fall back to plain pdf-parse automatically.
- Honest about what's supported. GemX reads PDF, DOCX, PPTX, and text-based files (code / plaintext / Markdown), plus images and audio, not arbitrary binaries. The docs and README were corrected to say exactly that, rather than implying "any file."
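To illustrate the kind of recursive walk that turns OMML into LaTeX, here is a small sketch covering only fractions, sub/superscripts, and radicals. It is an assumption-laden simplification: `MathNode`, `child`, and `toLatex` are hypothetical names, the tree is a plain object rather than a parsed XML DOM, and element names are given without their m: prefix. The OMML element names themselves (f, num, den, sSup, sSub, rad, deg, e, r, t) are the standard ones.

```typescript
// Hypothetical, simplified element shape. A real walk would run over DOM
// nodes parsed from word/document.xml and handle many more constructs.
interface MathNode {
  name: string; // local name without the "m:" prefix, e.g. "f", "num", "r"
  text?: string; // only set on "t" (text) nodes
  children?: MathNode[];
}

// First direct child with the given local name, if any.
function child(node: MathNode, name: string): MathNode | undefined {
  for (const c of node.children ?? []) {
    if (c.name === name) return c;
  }
  return undefined;
}

// Convert one OMML subtree to a LaTeX fragment, in document order.
function toLatex(node: MathNode | undefined): string {
  if (!node) return "";
  switch (node.name) {
    case "t": // literal text inside a run
      return node.text ?? "";
    case "f": // fraction: numerator over denominator
      return `\\frac{${toLatex(child(node, "num"))}}{${toLatex(child(node, "den"))}}`;
    case "sSup": // base with superscript
      return `{${toLatex(child(node, "e"))}}^{${toLatex(child(node, "sup"))}}`;
    case "sSub": // base with subscript
      return `{${toLatex(child(node, "e"))}}_{${toLatex(child(node, "sub"))}}`;
    case "rad": {
      // radical with optional degree
      const degree = toLatex(child(node, "deg"));
      const body = toLatex(child(node, "e"));
      return degree ? `\\sqrt[${degree}]{${body}}` : `\\sqrt{${body}}`;
    }
    default: // containers such as oMath, r, num, den, e, sup, sub, deg
      return (node.children ?? []).map(toLatex).join("");
  }
}
```

Matching on local names only, as this sketch does, is one way to keep a walk independent of the namespace prefix a given file happens to use.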
PowerPoint (.pptx) Support
- Decks go straight in. Attach a .pptx and GemX extracts the slide text, reusing the same OOXML/OMML pipeline as DOCX, so slide equations come through as LaTeX too, with no new heavy dependencies.
- True display order. Slides are emitted in the order you'd present them, resolved from presentation.xml and its relationships rather than by filename, so reordered decks read correctly. Each slide gets a ## Slide N header.
- Speaker notes included. Notes are often where the substance lives, so each slide's notes are appended under a _Notes:_ line.
- Same threshold logic as everything else. Small decks are inlined; large decks are indexed for retrieval (RAG), exactly like DOCX. (Out of scope for now: slide images, diagrams, and SmartArt. This release handles text, equations, and notes only.)
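Resolving display order from presentation.xml rather than filenames might look something like the sketch below. It is only an illustration: `slideOrder` and `renderSlide` are hypothetical names, regex matching stands in for a real XML parser, relationship targets are assumed to be relative to ppt/, and the exact layout around the ## Slide N header and _Notes:_ line is a guess.

```typescript
// Hypothetical sketch: order slide parts by the <p:sldId> list in
// ppt/presentation.xml, mapping each r:id through
// ppt/_rels/presentation.xml.rels. Regexes are a shortcut for illustration.
function slideOrder(presentationXml: string, relsXml: string): string[] {
  // Relationship Id -> Target (e.g. "rId2" -> "slides/slide1.xml")
  const targets: Record<string, string> = {};
  const relTag = /<Relationship\b[^>]*>/g;
  let rel: RegExpExecArray | null;
  while ((rel = relTag.exec(relsXml)) !== null) {
    const id = /\bId="([^"]+)"/.exec(rel[0]);
    const target = /\bTarget="([^"]+)"/.exec(rel[0]);
    if (id && target) targets[id[1]] = target[1];
  }

  // <p:sldId .../> entries appear in presentation order.
  const ordered: string[] = [];
  const sldId = /<p:sldId\b[^>]*\br:id="([^"]+)"/g;
  let slide: RegExpExecArray | null;
  while ((slide = sldId.exec(presentationXml)) !== null) {
    const target = targets[slide[1]];
    if (target) ordered.push("ppt/" + target); // assumes a relative target
  }
  return ordered;
}

// Assumed output layout: header, slide text, then notes under "_Notes:_".
function renderSlide(n: number, body: string, notes?: string): string {
  const lines = ["## Slide " + n, body];
  if (notes) lines.push("_Notes:_", notes);
  return lines.join("\n");
}
```

A deck whose second slide was dragged to the front keeps its original part name (slide2.xml), which is why sorting by filename would emit the slides in the wrong order.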
UX Polish
- No more mid-response cutoffs on multi-doc chats. Token estimation used to count an image's full base64 length as tokens, so attaching a few documents could collapse max_tokens to its floor and truncate the answer mid-sentence. Images are now charged a flat, realistic prompt cost, so replies run to completion.
- Quieter attachments. The redundant "indexed" badge beside documents is gone; the indexing status above the composer is enough.
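The token-estimation fix can be pictured with a sketch like the one below, which charges each image a flat cost instead of scaling with its base64 length. Every name and constant here (`Part`, `estimatePromptTokens`, `replyBudget`, `CHARS_PER_TOKEN`, `IMAGE_TOKEN_COST`, `MIN_REPLY_TOKENS`) is an assumption for illustration; the release notes do not state the real values or the real function names.

```typescript
// Hypothetical constants; the actual GemX values are not given in the notes.
const CHARS_PER_TOKEN = 4; // rough text heuristic
const IMAGE_TOKEN_COST = 256; // flat per-image prompt charge
const MIN_REPLY_TOKENS = 256; // floor for the reply budget

type Part =
  | { kind: "text"; text: string }
  | { kind: "image"; base64: string };

// Estimate prompt tokens: text scales with length, images cost a flat amount.
function estimatePromptTokens(parts: Part[]): number {
  let total = 0;
  for (const part of parts) {
    total +=
      part.kind === "text"
        ? Math.ceil(part.text.length / CHARS_PER_TOKEN)
        : IMAGE_TOKEN_COST; // independent of base64 length
  }
  return total;
}

// Reply budget: whatever the prompt leaves over, never below the floor.
function replyBudget(contextWindow: number, parts: Part[]): number {
  return Math.max(MIN_REPLY_TOKENS, contextWindow - estimatePromptTokens(parts));
}
```

Under the old behavior described above, a single page image with a megabyte of base64 would have been counted at its full payload length, enough to push a budget like this straight to its floor.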
Under the Hood
- New src/main/pptx.ts (parsePptxWithMath), with the shared walk/local/childEls helpers factored out of src/main/docx.ts, so DOCX and PPTX run through one namespace-agnostic OMML walk.
- New renderer libs: lib/embeddings.ts (gte-small feature-extraction), lib/rag.ts (chunking, cosine retrieval, context building), and lib/pdf.ts (per-page text + JPEG rendering).
- New main module src/main/rag.ts plus IPC channels rag:put/get/delete/put-pages/get-pages and file:parse-pptx.
- New dependencies: pdfjs-dist, jszip, @xmldom/xmldom.
Upgrading
Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. Embedding weights download once on your first large attachment and are cached from then on.