GemX-v0.5.0

@Avaneesh40585 released this 11 Jun 14:01

v0.5.0 Release Notes

The follow-up to v0.4.0 turns GemX into a genuine document workspace. The headline is local RAG — large files and long transcripts are now indexed on-device and retrieved per question, so you can attach real documents without blowing up the context window. Alongside it, GemX learned to recover the actual math out of your files (Word equations become LaTeX, PDF equations are read straight off the rendered page), gained native PowerPoint (.pptx) support, and got a round of polish on the thinking block, the streaming avatar, and the responsive layout — all still running entirely on your Mac, no cloud.

Local RAG — Attach Big Documents Without Blowing the Context Window

  • On-device retrieval, not context stuffing. Large documents (and long Whisper transcripts) are no longer crammed whole into the prompt. GemX chunks them, embeds each chunk locally, and at send time retrieves only the passages relevant to your question — so a 60-page report and a one-line follow-up cost roughly the same.
  • Local embeddings via Supabase/gte-small. A 384-dim embedding model runs through Transformers.js with WebGPU → WASM fallback, weights cached in IndexedDB — the same no-Python, in-browser pattern as Whisper. The first attach shows a one-time "Downloading embedding model… N%"; after that it's instant.
  • Vectors live on disk, never in localStorage. Each conversation gets its own on-disk vector store under rag/<conversation>/<doc>/; the chat message keeps only a compact <context name="report.pdf" indexed="84" /> marker. Base64 payloads stay out of app state entirely.
  • Retrieval is ephemeral and per-turn. Fresh, query-relevant excerpts are injected into the outbound message only and are never persisted. Because the stored history stays lean, earlier turns are not what gets pruned when the window fills.
  • Self-cleaning indexes. Deleting a conversation drops its whole RAG folder; removing an indexed attachment before sending deletes just that doc's index — no orphaned vectors left behind.
  • A dedicated "Indexing…" state. Chunking + embedding shows its own status in the composer and no longer borrows Whisper's "Loading model…" label, so the two never cross wires.
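
The chunk, embed, and retrieve flow described above can be sketched roughly as follows. This is a minimal illustration, not GemX's actual lib/rag.ts: the names chunkText, cosine, and retrieve, the character-based chunk size, and the overlap are all assumptions, and the embedding step itself (gte-small via Transformers.js) is left out so the sketch stays self-contained.

```typescript
// Hypothetical names throughout; the real lib/rag.ts may be organised differently.
interface IndexedChunk {
  text: string;
  vector: number[]; // e.g. a 384-dim gte-small embedding
}

// Split text into overlapping windows of at most `size` characters.
// Character-based sizing and the default values are assumptions.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Cosine similarity; components missing from the shorter vector count as 0.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks whose vectors are most similar to the query vector.
function retrieve(query: number[], index: IndexedChunk[], k = 4): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

Only the chunks returned by a function like retrieve would be injected into the outbound message, which is why a long report and a short follow-up end up costing about the same prompt budget.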

Real Math Recovery From Documents

  • DOCX equations become LaTeX. Word stores math as structured OMML, which mammoth silently drops. GemX now walks word/document.xml itself and converts every m:oMath to LaTeX ($…$ / $$…$$) in reading order — fractions, sub/superscripts, radicals, n-ary sums and integrals, delimiters, matrices, accents, and a full Unicode-symbol map — falling back to mammoth only on error. Equations render via KaTeX in the reply.
  • PDF math is read off the page, not the text. Text extracted from a PDF reduces equations to glyph soup that a text parser cannot reliably reconstruct, so under a multimodal model the relevant pages are rendered to images (pdfjs-dist) and attached at retrieval time, letting the model read the real equations off the actual page. Text-only (mlx-lm) models fall back to plain pdf-parse automatically.
  • Honest about what's supported. GemX reads PDF, DOCX, PPTX, and text-based files (code / plaintext / Markdown), plus images and audio — not arbitrary binaries. The docs and README were corrected to say exactly that, rather than implying "any file."
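
As a rough illustration of the OMML walk, the sketch below converts a few math elements (m:f, m:sSup, m:sSub, m:rad) to LaTeX by local name. The XmlNode shape and the helper names are hypothetical stand-ins rather than the @xmldom/xmldom API or the real helpers in src/main/docx.ts, and only a small subset of the constructs listed above is covered.

```typescript
// Minimal stand-in for a parsed XML element (hypothetical shape).
interface XmlNode {
  name: string; // qualified name, e.g. "m:f"
  children: XmlNode[];
  text?: string; // set only on text-bearing elements such as m:t
}

// Strip any namespace prefix so "m:f" and "f" are handled alike.
const local = (name: string): string => name.slice(name.indexOf(":") + 1);

// First direct child with the given local name, if any.
const child = (node: XmlNode, tag: string): XmlNode | undefined =>
  node.children.find((c) => local(c.name) === tag);

// LaTeX for everything inside a container element (empty if absent).
const inner = (node: XmlNode | undefined): string =>
  node ? node.children.map(toLatex).join("") : "";

function toLatex(node: XmlNode): string {
  switch (local(node.name)) {
    case "t": // literal run text
      return node.text ?? "";
    case "f": // fraction: numerator over denominator
      return `\\frac{${inner(child(node, "num"))}}{${inner(child(node, "den"))}}`;
    case "sSup": // superscript
      return `{${inner(child(node, "e"))}}^{${inner(child(node, "sup"))}}`;
    case "sSub": // subscript
      return `{${inner(child(node, "e"))}}_{${inner(child(node, "sub"))}}`;
    case "rad": {
      // radical, with an optional degree
      const deg = inner(child(node, "deg"));
      const body = inner(child(node, "e"));
      return deg ? `\\sqrt[${deg}]{${body}}` : `\\sqrt{${body}}`;
    }
    default: // m:oMath, m:r, and other wrappers: concatenate children
      return node.children.map(toLatex).join("");
  }
}
```

Matching on the local name rather than the prefixed one is what lets a single walk serve documents that bind the math namespace to different prefixes.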

PowerPoint (.pptx) Support

  • Decks go straight in. Attach a .pptx and GemX extracts the slide text, reusing the same OOXML/OMML pipeline as DOCX — so slide equations come through as LaTeX too, with no new heavy dependencies.
  • True display order. Slides are emitted in the order you'd present them — resolved from presentation.xml and its relationships, not by filename — so reordered decks read correctly. Each slide gets a ## Slide N header.
  • Speaker notes included. Notes are often where the substance lives, so each slide's notes are appended under a _Notes:_ line.
  • Same threshold logic as everything else. Small decks inline; large decks index for retrieval (RAG) exactly like DOCX. (Out of scope for now: slide images, diagrams, and SmartArt — text + equations + notes only.)
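
Resolving display order from presentation.xml and its relationships could look roughly like the sketch below. It is an illustration under assumptions: the function name slideOrder is hypothetical, and regular expressions stand in for a real XML parser (GemX lists @xmldom/xmldom as a dependency, so the actual code presumably walks the DOM instead).

```typescript
// Hypothetical helper: returns slide part paths in presentation order.
// presentationXml: contents of ppt/presentation.xml
// relsXml:         contents of ppt/_rels/presentation.xml.rels
function slideOrder(presentationXml: string, relsXml: string): string[] {
  // Build a map from relationship Id to Target.
  const targets = new Map<string, string>();
  const relRe = /<Relationship\b[^>]*>/g;
  let rel: RegExpExecArray | null;
  while ((rel = relRe.exec(relsXml)) !== null) {
    const id = /\bId="([^"]+)"/.exec(rel[0]);
    const target = /\bTarget="([^"]+)"/.exec(rel[0]);
    if (id && target) targets.set(id[1], target[1]);
  }

  // Walk the sldId entries in document order and resolve each r:id.
  const ordered: string[] = [];
  const sldRe = /<(?:\w+:)?sldId\b[^>]*\br:id="([^"]+)"/g;
  let sld: RegExpExecArray | null;
  while ((sld = sldRe.exec(presentationXml)) !== null) {
    const target = targets.get(sld[1]);
    if (target !== undefined) ordered.push(target);
  }
  return ordered;
}
```

Because the order comes from the slide ID list, a deck whose second file (slide2.xml) was dragged to the front is still emitted with that slide first.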

UX Polish

  • No more mid-response cutoffs on multi-doc chats. Token estimation used to count an image's full base64 length as tokens — attaching a few documents could collapse max_tokens to its floor and truncate the answer mid-sentence. Images are now charged a flat, realistic prompt cost, so replies run to completion.
  • Quieter attachments. The redundant "indexed" badge beside documents is gone — the indexing status above the composer is enough.
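
The token-estimation fix can be illustrated with a small sketch. Every name and constant here is an assumption (the Part type, the four-characters-per-token heuristic, the flat image cost of 256, and the reply floor); the release does not state the real values.

```typescript
// Hypothetical message part shape.
type Part =
  | { kind: "text"; text: string }
  | { kind: "image"; base64: string };

const CHARS_PER_TOKEN = 4; // rough heuristic, assumed
const IMAGE_TOKEN_COST = 256; // assumed flat cost per image

// Estimate prompt tokens: text by length, images at a flat cost
// regardless of how long their base64 payload is.
function estimateTokens(parts: Part[]): number {
  let total = 0;
  for (const part of parts) {
    total +=
      part.kind === "image"
        ? IMAGE_TOKEN_COST
        : Math.ceil(part.text.length / CHARS_PER_TOKEN);
  }
  return total;
}

// Tokens left for the reply, never dropping below a floor.
function replyBudget(contextWindow: number, promptTokens: number, floor = 256): number {
  return Math.max(floor, contextWindow - promptTokens);
}
```

Under the old behaviour a single megabyte-sized image would have been counted as hundreds of thousands of tokens, pushing the reply budget straight to its floor; with a flat charge the budget depends only on the text actually sent.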

Under the Hood

  • New src/main/pptx.ts (parsePptxWithMath) and the shared walk / local / childEls helpers factored out of src/main/docx.ts, so DOCX and PPTX run through one namespace-agnostic OMML walk.
  • New renderer libs: lib/embeddings.ts (gte-small feature-extraction), lib/rag.ts (chunking, cosine retrieval, context building), and lib/pdf.ts (per-page text + JPEG rendering).
  • New main module src/main/rag.ts plus IPC channels rag:put, rag:get, rag:delete, rag:put-pages, rag:get-pages, and file:parse-pptx.
  • New dependencies: pdfjs-dist, jszip, @xmldom/xmldom.

Upgrading

Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. Embedding weights download once on your first large attachment and are cached from then on.