Skip to content

Releases: Avaneesh40585/GemX

GemX-v1.0.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 12 Jun 22:32

v1.0.0 Release Notes

GemX hits 1.0. The follow-up to v0.5.0 is about making GemX feel like a real tool rather than just a chat window — you can now shape how the model responds (custom instructions, personas, sampling), use GemX as a backend for everything else via an OpenAI-compatible API server you can even run headless from the terminal, and find your way around a redesigned, sectioned Settings panel. Below the input is a quieter status line that tells you only what matters for the next message. Still 100% local, still no cloud.

Make It Yours — Custom Instructions & Personas

  • Custom instructions, applied everywhere. A free-text field in Settings ▸ Personalization is appended to GemX's system prompt on every chat — "always answer in British English," "prefer bullet points," whatever you like. It's appended, never substituted, so the built-in date, tool, vision, and citation behavior keeps working.
  • Named personas, switched per conversation. Save reusable prompts ("terse Rust reviewer," "Socratic tutor") and pick one per conversation from a dropdown in the chat header. The active persona's prompt rides along with your custom instructions for that thread only.
  • It never leaks into history. The personalization text is composed fresh each turn and sent as a separate field — it shapes the system prompt, it doesn't clutter your saved messages.

Dial In The Response — Sampling Controls

  • Three presets, one click. Settings ▸ Behavior ▸ Response Style offers Precise / Balanced / Creative — Precise sticks to likely tokens (code, facts), Creative explores more (writing, brainstorming).
  • Full control when you want it. An Advanced expander exposes raw temperature, top_p, top_k, and repetition_penalty; editing any value flips the preset to "custom." Values are forwarded to the MLX server only when set, so unsupported sampler params are simply ignored.

Use GemX From Anything — Local API Server

  • An OpenAI-compatible endpoint. Flip on Settings ▸ Developer ▸ Local API server and point any OpenAI client — your editor, a script, any agent tool — at http://127.0.0.1:11535/v1. It's a thin proxy in front of the running MLX server, so it serves whichever model is currently loaded — a built-in Gemma 4 variant or one of your own custom models.
  • Works with agent harnesses, not just chat. GemX emits native OpenAI tool_calls (Gemma 4 via mlx-vlm, verified end-to-end), so agentic coding tools — Cline, Kilo Code, Continue.dev, Zed, JetBrains AI Assistant, Goose, OpenCode — can drive it, not only chat clients. Setup guides (and the two tools that can't connect, and why) are in docs/api-clients.md. Prefer Gemma 4 12B with a raised context window for agent work.
  • Safe by default. Off until you enable it, bound to 127.0.0.1 only, and /v1/*-only. An optional bearer token can gate requests; LAN exposure (binding 0.0.0.0) is a separate, explicit opt-in that requires a token, with a visible warning. The panel shows a copyable base URL and a live running/stopped dot.
  • Correct for real clients. The proxy pins the request model to the loaded model (the MLX server would otherwise try to load whatever id a client sends) and answers GET /v1/models locally with exactly the loaded model, so auto-discovery clients aren't misled.
  • Zero new dependencies. Built on Node's built-in http, streams responses (SSE) through unbuffered, and survives model switches untouched since it forwards to a fixed internal port.

Run It Headless — gemx serve From The Terminal

  • No GUI required, Ollama-style. Run the same server straight from the terminal. One-time setup puts a small wrapper script on your PATH:

    # ⚠️ Use a wrapper script, NOT a bare symlink — macOS Electron resolves its bundled
    # helper apps relative to the executable path, so a symlinked launch dies with
    # "Unable to find helper app." If you made a symlink before, remove it first.
    sudo tee /usr/local/bin/gemx >/dev/null <<'EOF'
    #!/bin/bash
    exec "/Applications/GemX.app/Contents/MacOS/GemX" "$@"
    EOF
    sudo chmod +x /usr/local/bin/gemx
    
    gemx serve                                   # last-used model, loopback, port 11535
    gemx serve --port 8080 --model mlx-community/gemma-4-e4b-it-4bit
    gemx serve --lan --token my-secret           # expose on the LAN with auth
    gemx serve --help
  • Same engine, no window. A --serve mode of the app binary boots the MLX runtime + the API proxy with the Dock icon hidden, prints the endpoint, and stays in the foreground until you Ctrl-C (which frees both ports cleanly). Launch the GemX app once first so the model weights and Python runtime are set up. (Building from source? npm run serve does the same against a dev build.)

  • GUI and gemx serve are mutually exclusive. They're the same app sharing one macOS instance and one model server (port 11534), not separate daemons — so run one or the other. If you want the UI and an endpoint at the same time, skip gemx serve and just turn on the in-app Settings ▸ Developer ▸ Local API server.

Redesigned Settings — Sectioned & Navigable

  • A real settings layout. The old single flat panel is now a modal with a left nav rail (a horizontal tab strip on narrow windows) across five clearly-named sections: Behavior (thinking, web search + Tavily, context window, sampling), Personalization (custom instructions, personas), Models (downloaded built-ins and custom models, HuggingFace token), Appearance (theme), and Developer (the API server). The confusing "Model" / "Models & Storage" pairing is gone.
  • Extracted from the sidebar. The settings UI moved into its own component, leaving the sidebar to focus on conversations.

Find Things Faster — Search, Palette, Context Meter

  • Conversation search. A search box at the top of the sidebar filters chats by title and message contents, across both Pinned and Chats.
  • Command palette. ⌘K opens a filtered command list — new chat, search, toggle sidebar, open settings, toggle thinking, switch model. Plus ⌘N (new chat) and ⌘F (focus search); ⌘B still toggles the sidebar.
  • Context-usage meter. The composer footer shows a live used / max token gauge (green → amber → red as you approach the limit), so you can see how full the window is getting before history starts getting trimmed.

Quieter Input Footer

  • Only what matters. The verbose "mic for voice / drop a file" hints below the input are gone. In their place: compact indicators that appear only when active — the conversation's persona, a Thinking badge when reasoning is on, and a Web badge when search is enabled — alongside the context meter and a minimal ⏎ send · ⇧⏎ newline hint.

Under the Hood

  • New src/main/apiServer.ts — the opt-in OpenAI-compatible reverse proxy (Node http, no new dependency) — plus IPC channels apiserver:set-config / apiserver:get-status and a synchronous teardown wired into app quit. It pins the request model to the loaded model and synthesizes /v1/models.
  • Headless serve mode (parseServeArgs / runHeadless) added to src/main/index.ts; MLX_PORT is now exported so the proxy can target it. New bin/gemx wrapper shim, an npm run serve script, and a package "bin" entry. Serve mode also disables the GPU / network-service helpers since a windowless server needs neither.
  • New renderer pieces: components/Settings.tsx (sectioned modal), components/CommandPalette.tsx (⌘K), and lib/tokens.ts (client-side token estimate for the meter).
  • AppSettings gained customInstructions, personas, sampling, and apiServer; ChatRequest gained systemPromptExtra and sampling; chatSystemPrompt(enableTools) became chatSystemPrompt(enableTools, extra?). All new settings are defaulted on read, so existing preferences upgrade silently.

Upgrading

Drop the new GemX.app in over the old one. Your conversations, downloaded models, custom models, settings, and HuggingFace / Tavily keys are all preserved — new settings simply appear with sensible defaults. The local API server stays off until you enable it. For terminal use, install the wrapper script once (see Run It Headless above — not a bare symlink) and launch the app at least once so the runtime is installed.

GemX-v0.5.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 11 Jun 14:01

v0.5.0 Release Notes

The follow-up to v0.4.0 turns GemX into a genuine document workspace. The headline is local RAG — large files and long transcripts are now indexed on-device and retrieved per question, so you can attach real documents without blowing up the context window. Alongside it, GemX learned to recover the actual math out of your files (Word equations become LaTeX, PDF equations are read straight off the rendered page), gained native PowerPoint (.pptx) support, and got a round of polish on the thinking block, the streaming avatar, and the responsive layout — all still running entirely on your Mac, no cloud.

Local RAG — Attach Big Documents Without Blowing the Context Window

  • On-device retrieval, not context stuffing. Large documents (and long Whisper transcripts) are no longer crammed whole into the prompt. GemX chunks them, embeds each chunk locally, and at send time retrieves only the passages relevant to your question — so a 60-page report and a one-line follow-up cost roughly the same.
  • Local embeddings via Supabase/gte-small. A 384-dim embedding model runs through Transformers.js with WebGPU → WASM fallback, weights cached in IndexedDB — the same no-Python, in-browser pattern as Whisper. The first attach shows a one-time "Downloading embedding model… N%"; after that it's instant.
  • Vectors live on disk, never in localStorage. Each conversation gets its own on-disk vector store under rag/<conversation>/<doc>/; the chat message keeps only a compact <context name="report.pdf" indexed="84" /> marker. Base64 payloads stay out of app state entirely.
  • Retrieval is ephemeral and per-turn. Fresh, query-relevant excerpts are injected into the outbound message only — never persisted. History stays lean, so it's never the thing that gets pruned away when the window fills.
  • Self-cleaning indexes. Deleting a conversation drops its whole RAG folder; removing an indexed attachment before sending deletes just that doc's index — no orphaned vectors left behind.
  • A dedicated "Indexing…" state. Chunking + embedding shows its own status in the composer and no longer borrows Whisper's "Loading model…" label, so the two never cross wires.

Real Math Recovery From Documents

  • DOCX equations become LaTeX. Word stores math as structured OMML, which mammoth silently drops. GemX now walks word/document.xml itself and converts every m:oMath to LaTeX ($…$ / $$…$$) in reading order — fractions, sub/superscripts, radicals, n-ary sums and integrals, delimiters, matrices, accents, and a full Unicode-symbol map — falling back to mammoth only on error. Equations render via KaTeX in the reply.
  • PDF math is read off the page, not the text. PDF math is glyph-soup no parser can recover, so under a multimodal model the relevant pages are rendered to images (pdfjs-dist) and attached at retrieval time — the model reads the real equations off the actual page. Text-only (mlx-lm) models fall back to plain pdf-parse automatically.
  • Honest about what's supported. GemX reads PDF, DOCX, PPTX, and text-based files (code / plaintext / Markdown), plus images and audio — not arbitrary binaries. The docs and README were corrected to say exactly that, rather than implying "any file."

PowerPoint (.pptx) Support

  • Decks go straight in. Attach a .pptx and GemX extracts the slide text, reusing the same OOXML/OMML pipeline as DOCX — so slide equations come through as LaTeX too, with no new heavy dependencies.
  • True display order. Slides are emitted in the order you'd present them — resolved from presentation.xml and its relationships, not by filename — so reordered decks read correctly. Each slide gets a ## Slide N header.
  • Speaker notes included. Where the substance often lives — each slide's notes are appended under a _Notes:_ line.
  • Same threshold logic as everything else. Small decks inline; large decks index for retrieval (RAG) exactly like DOCX. (Out of scope for now: slide images, diagrams, and SmartArt — text + equations + notes only.)

UX Polish

  • No more mid-response cutoffs on multi-doc chats. Token estimation used to count an image's full base64 length as tokens — attaching a few documents could collapse max_tokens to its floor and truncate the answer mid-sentence. Images are now charged a flat, realistic prompt cost, so replies run to completion.
  • Quieter attachments. The redundant "indexed" badge beside documents is gone — the indexing status above the composer is enough.

Under the Hood

  • New src/main/pptx.ts (parsePptxWithMath) and the shared walk / local / childEls helpers factored out of src/main/docx.ts, so DOCX and PPTX run through one namespace-agnostic OMML walk.
  • New renderer libs: lib/embeddings.ts (gte-small feature-extraction), lib/rag.ts (chunking, cosine retrieval, context building), and lib/pdf.ts (per-page text + JPEG rendering).
  • New main module src/main/rag.ts plus IPC channels rag:put / get / delete / put-pages / get-pages and file:parse-pptx.
  • New dependencies: pdfjs-dist, jszip, @xmldom/xmldom.

Upgrading

Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. Embedding weights download once on your first large attachment and are cached from then on.

GemX-v0.4.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 10 Jun 21:56

v0.4.0 Release Notes

The follow-up to v0.3.0 makes GemX feel at home on any Mac, no matter how you keep your desktop. The headline is a full Light mode that spans every surface of the app — chat, composer, sidebar, modals, code blocks, and the README hero — with a three-way Light / Dark / System toggle that follows your macOS Appearance setting live. Alongside it, the assistant's thinking avatar is reborn as the Quantum Binary mark, code blocks are rebuilt to match the clean Gemini-chat look, and the Settings panel is reorganised into a calmer, better-paired layout.

Light Mode — Across the Entire UI

  • Three-way theme toggle in Settings. A new Theme control offers System (default), Light, and Dark. System watches your macOS Appearance and switches the whole app the instant you flip it in Control Center or System Settings — no restart, no reload.
  • Built on CSS custom properties, not per-component branches. Every colour in the app now reads from a semantic token (--surface-*, --fg-*, --border-*, --accent-*, syntax --syn-*, …). Dark is the default :root palette; [data-theme="light"] flips the values. This keeps both themes pixel-consistent and means future surfaces theme themselves for free.
  • No white flash on launch. An IIFE runs before React mounts, reads your saved theme from localStorage, and applies data-theme to <html> immediately — so a Light-mode boot never flickers through a dark frame first.
  • Native chrome stays in sync. The renderer tells the main process the active mode over a new theme:set IPC channel, which drives nativeTheme.themeSource so the titlebar tint and native scrollbars match the in-app theme instead of fighting it.
  • Theme-conditional artwork. The app ships icon-light / icon-dark and hero-light / hero-dark assets; the in-app hero and the README hero both swap automatically with the active theme (<picture> + prefers-color-scheme on GitHub).
  • 'system' is resolved, never stored as a literal. Picking System keeps themeMode: 'system' in settings and resolves to a concrete 'light' | 'dark' at runtime via matchMedia('(prefers-color-scheme: light)'), with a live subscription so the app re-renders when the OS preference changes.

New Assistant Avatar — "Quantum Binary"

  • A solid core dot + a hollow ring. The old 4-point sparkle is replaced by a strictly monochrome mark — one filled qb-core and one outlined qb-ring. It reads as black in Light mode and white in Dark, so it never looks tinted or out of place.
  • Two states that tell you what the model is doing. While thinking, the core and ring orbit a shared invisible centre on opposite sides. When the model starts generating, they slide into a locked bullseye and pulse rhythmically (the ring scales slightly more than the core).
  • Tuned motion. The slide-and-lock uses a cubic-bezier(0.65, 0, 0.35, 1) for a "magnetic friction" feel, and the pulse is delayed 0.65s so it only begins once the slide has fully settled. Pure SVG + CSS keyframes; respects prefers-reduced-motion.

Redesigned Code Blocks

  • Gemini-chat style. Code blocks are rebuilt as a single rounded bg-code panel — no heavy border separators. A compact header carries an uppercase language label and a Copy button; the code body sits inside with a transparent background so it reads as one clean surface.
  • Theme-aware syntax. Highlighting is driven by --syn-* tokens (One Dark in dark, One Light in light), so snippets stay legible and on-palette in either theme.

Settings Panel — Reorganised & Paired

  • A calmer top-to-bottom order: Theme → Web Search (+ Tavily key) → Thinking → Context Window → HuggingFace Token. Related controls now sit together instead of being scattered.
  • Downloaded Models pinned. The Downloaded Models surface spans its own column (lg:row-span-5) so built-ins and customs stay visible while you adjust the rest of the panel.

UX Polish

  • Cleaner composer attach affordance. The attach control is back to a plain paperclip (the old inline count chip is gone). Attachments render as cards above the textarea — the same clear, low-noise pattern Claude and Gemini use.
  • Consistent steppers. The custom-model modal's context-window inputs use the same power-of-two stepper as Settings → Context Window, so the two surfaces feel identical.

Under the Hood

  • New src/renderer/src/lib/theme.ts (resolveTheme, useEffectiveTheme, useAppliedTheme, THEME_OPTIONS) centralises all theme resolution and subscriptions.
  • useAppliedTheme() lets components without settings in scope (Setup, the empty-state hero) react to theme via a MutationObserver on <html data-theme>.
  • The full contributor docs/ and from-scratch learning/ curriculum were audited and brought current with the theme system, the Quantum Binary avatar, and the custom-model registry.

Upgrading

Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. On first launch GemX defaults to System theme — switch to a fixed Light or Dark any time from Settings.

GemX-v0.3.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 10 Jun 12:29

v0.3.0 Release Notes

The follow-up to v0.2.0 opens the model registry: you can now bring your own MLX model from HuggingFace through a new + Custom flow, with a three-layer safety net that prevents the silent crashes that runtime mismatches used to cause. The Settings panel collapses to a single Downloaded Models surface for both built-ins and customs, the assistant gets an animated avatar during thinking and inference, and a round of UX polish lands across the picker, modal, and composer.

Bring Your Own Model — Custom HuggingFace Models

  • + Custom button next to the model picker accepts any mlx-vlm or mlx-lm model from the mlx-community/* HuggingFace organisation. Paste a repo id, GemX probes HuggingFace for sane defaults, and the model joins your picker under a Custom subheader (label + runtime).
  • mlx-community/ gate* — both the modal (client-side) and the main-process probe reject other namespaces up-front with a link to huggingface.co/mlx-community/collections. Models from other orgs often aren't MLX-quantised and would fail at load.
  • Auto-detect runtime, context window, thinking support — fetches config.json + chat_template.jinja from huggingface.co/<repo>/resolve/main/…. Runtime comes from model_type + architectures (a known multimodal allow-list + a VL/Vision/Image arch-string check); context max from max_position_embeddings with model_max_length / n_positions fallbacks; thinking from enable_thinking in either chat-template source.
  • Lazy mlx-lm install — built-ins all run on mlx-vlm, so users who never add a text-only custom model don't pay the install cost. The first time you select an mlx-lm custom, GemX pip installs mlx-lm into the existing venv (~30 s) and then spawns mlx_lm.server instead of mlx_vlm.server. Subsequent boots skip the install.
  • HuggingFace token re-used — gated mlx-community repos pick up the token you've already saved in Settings; no per-model auth surface needed.
  • Default context clamps to 16 K — even if a repo reports 256 K, the default lands conservative so first-launch RAM stays modest on 8 GB Macs. You can raise it any time from Settings → Context Window.

Locked Runtime + Immutable Settings at Add Time

  • Runtime is auto-detected and locked. The modal shows it as a readonly chip with an "Auto-detected from config.json — Locked at add time" subtitle. A wrong runtime is the single biggest source of silent server crashes, so the override toggle is gone. If auto-detect was wrong, you remove and re-add.
  • All settings become immutable after Add. A prominent amber banner at the top of the modal makes this clear: "These settings are locked after Add. To fix a mistake, remove the model from Settings → Downloaded Models and re-add. Double-check every field before clicking Add model." There is no edit UI for custom models — remove + re-add is the only path.
  • Other probed fields stay user-editable at add time — label, max context window, default context window, and thinking-support are all overrideable because config.json max_position_embeddings is wrong for several real repos and the chat-template heuristic can miss thinking on unusual templates.

Runtime Mismatch Safety Net

  • Composer blocks image attach for text-only models. When the active model's runtime is mlx-lm, drag-and-drop, paste, and the file picker all filter out image/* files and surface a "This model is text-only — images can't be attached." toast. PDFs, code, audio, and other context attachments continue to work.
  • Boot-time mismatch detection. A new RuntimeMismatchError (mirroring the existing MLXVersionMismatchError pattern) is thrown by waitForHealth when the server's stderr matches a vision-vs-text architecture failure — direction-aware regexes for vision_tower/image_token_index/preprocessor_config.json (vlm-on-text) and vision encoder/multimodal not supported (lm-on-vision). For custom models, the caller surfaces an action-oriented advisory: "{label} appears to be text-only, but was added as mlx-vlm. Remove from Settings → Downloaded Models and re-add."
  • Stream-time mismatch detection. chatStream wraps its !res.ok branch: if image_url parts were in the request body and the response body mentions image / multimodal rejection, the error is upgraded to RuntimeMismatchError and rendered as an error bubble on the offending message. Catches the case where a text-only server loaded fine but rejects images at request time.

Unified Settings: One Downloaded Models Surface

  • Custom and cached built-in models merged into a single Downloaded Models list — no more separate "Custom Models" panel. Per-row layout is consistent for both: label first, custom / active / default tags after, repo id underneath for customs. One trash icon, one window.confirm confirmation, one IPC path.
  • Custom-model deletion removes both the registry entry and the cached weights, so the model vanishes from the picker and the disk in a single click.
  • Active model can't be deleted — the trash icon is disabled with the same "Switch to another model first" tooltip that built-ins use.
  • Thinking toggle gated on the active model. When the active custom has thinkingSupported === false, the Thinking buttons in Settings disable with a "This model doesn't expose a reasoning channel — toggle disabled." subtext. Built-ins (Gemma 4 family) all support thinking, so it stays enabled for them.
  • Context Window slider clamps to the active model's contextWindowMax — the steps filter respects the custom's declared ceiling, with a Math.min(effectiveCtx, maxCtx) belt-and-suspenders clamp for legacy contextWindowOverride values that exceeded a since-lowered Max.

Composer & Modal UX Polish

  • Animated assistant avatar during thinking and inference — a new AssistantAvatar component renders an animated logomark in the assistant bubble while the model is generating, replacing the static placeholder.
  • Mic icon visual parity with the attach icon — the mic SVG was visibly slimmer than the paperclip at the same h-4 w-4 box because its path was thinner. Stroke weight raised and the rect + arc footprint widened so both icons read at the same visual weight in the toolbar.
  • File-count badge removed entirely. The previous emerald-500 notification dot on the paperclip read like a phone-notification badge; a brief inline-chip replacement still felt redundant. Now the paperclip button stays clean and the attached files render as cards above the textarea (image thumbnails + file-icon-plus-name chips with ×) — same pattern Claude and Gemini use.
  • Custom-model context-window inputs match Settings. The modal's Max-context and Default-context fields used to be plain <input type="number"> controls with browser-native spinners. They now use the same display-box + vertical chevron stepper as Settings → Context Window, stepping through the power-of-2 sequence (1 K → 1 M) with disabled boundaries.
  • Custom subheader in the model picker no longer doubles as a per-row tag. The custom chip on each picker row was redundant and is gone — the Custom section header already labels them. Picker rows now show label + runtime, period.
  • Description field removed. The custom-model add modal collected a free-text description that nothing in the app ever rendered. Field dropped from the modal and from the CustomModel type. Legacy entries with stored descriptions still parse fine — the value is silently ignored.

Documentation & Versioning

  • README documents the entire + Custom flow, the locked-runtime + immutable-settings model, image-block behaviour for mlx-lm, and where runtime-mismatch advisories surface.
  • docs/data-model.md documents the updated CustomModel shape, runtime-lock rationale, and the three-layer safety net.
  • docs/mlx-runtime.md adds a new "Runtime-mismatch detection" section walking through Composer block + boot-time RuntimeMismatchError + chatStream upgrade with the file references.
  • docs/ipc-contract.md documents the new probeCustomModel / addCustomModel / removeCustomModel(name, alsoDeleteCache?) IPC surface and the advisory-defaults model.
  • Inspiration credit. Ammaar Reshi's gemma-chat — a beautiful single-purpose Gemma desktop client — is now credited in the README as the visual / structural inspiration that started GemX.

Please see the README in the main repository for complete installation instructions, system requirements, updated model memory specifications, and the new Bring your own model walkthrough.

GemX-v0.2.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 06 Jun 05:22

v0.2.0 Release Notes

The first follow-up to the initial release brings a new mid-tier model, opt-in step-by-step reasoning, a completely rebuilt download pipeline, a real settings panel for managing local model storage, and a layout that finally adapts to your window size.

New Models & Context Defaults

  • Gemma 4 12B added: A dense mid-tier variant (mlx-community/gemma-4-12B-it-4bit, ~11 GB on disk) now sits between E4B and the 26B MoE. Default 32K context, native max 256K. Sharper reasoning than E4B without the MoE's RAM cliff. Comfortable on a 24 GB Mac; tight at 16 GB.
  • Context window defaults rebalanced: E4B drops 32K → 16K to leave KV-cache headroom on 8 GB Macs. The 26B MoE drops 128K → 64K to keep its boot-time reservation manageable. 31B rises 32K → 64K to align with the other dense variants. All five models still expose their full native max (128K for E2B/E4B, 256K for 12B / 26B MoE / 31B) via the Context Window slider.
  • Disk sizes refined across the registry: E2B ~4 GB, E4B ~5.5 GB, 12B 11 GB, 26B MoE ~16 GB, 31B ~18 GB. The "27B MoE" label was also corrected to "26B MoE" to match Google's naming.

Thinking Mode (Opt-In)

Gemma 4's built-in step-by-step reasoning is now plumbed end-to-end:

  • Settings → Thinking → Enabled flips enable_thinking: true on every chat request. The model emits its reasoning trace before the final answer.
  • Collapsible Thinking block appears above the answer. The header cycles through "Thinking → Considering → Planning → Pondering → Reasoning → Sketching" while the trace streams in, then collapses to "Thought process" once the final answer begins. Click any time to re-expand.
  • Full markdown inside the block — syntax-highlighted code, KaTeX math, GFM tables, clickable links — same renderer as the main answer.
  • No context-window leak: Past thinking traces are stripped at the IPC boundary on every subsequent turn (defense in depth: Gemma 4's chat template also strips them server-side via its strip_thinking() macro). Reasoning lives in the UI only.

Rebuilt Download Pipeline

The first-launch download was replaced end-to-end after the deprecated huggingface-cli stopped working:

  • Manifest-driven byte progress: Total size is fetched from HfApi.model_info() before the first byte streams, so the progress bar is correct from frame zero — no more "stuck at 0%".
  • Resume detection: Partial blobs from interrupted previous attempts are detected on next launch; the UI swaps the verb from "Downloading" to "Resuming" and only fetches what's missing.
  • Cancel button: Interrupt a download mid-stream and drop back to the model picker — no orphaned Python children, no half-written blobs in an indeterminate state.
  • Authoritative completion marker: A .gemx-verified file is written only after snapshot_download(..., local_files_only=True) integrity-checks the cache. Subsequent launches trust this marker exactly — no more "model present but didn't load" edge cases.

Self-Healing MLX Runtime

  • Auto-upgrade on version mismatch: When the bundled mlx-vlm is too old to load a model (e.g., gemma4_unified requires ≥ 0.6.1), the app catches the typed error, runs pip install --upgrade, and retries boot once. No restart, no manual venv surgery.
  • Lifecycle hardening: A single-flight server wrapper handles dev hot-reload double-mounts cleanly. Orphaned port holders from previous force-quits are reclaimed via lsof before the new server binds. On app quit, the Python child gets a synchronous SIGKILL so nothing leaks across runs.

Settings Panel Expansion

  • Downloaded Models: Every cached model is now listed with size and active/default tags, and one-click delete for any that aren't currently loaded. Reclaim disk space without leaving the app.
  • Context Window: Power-of-2 slider from 4K up to the model's native max, with a Reset button to fall back to the default.
  • Tavily API Key: Paste your free Tavily key (1,000 searches/month, no card required) for faster, more reliable web research. DuckDuckGo remains the fallback when no key is set.
  • HuggingFace Token: Optional token field that speeds up gated-repo downloads.
  • Thinking: Toggle described above.

Responsive Layout

The whole app now adapts to its window size instead of holding a fixed-width column:

  • Minimum window 700×500 (was 820×560) — usable in split-screen and on smaller displays.
  • Sidebar tiers: w-60 on roomy windows, w-52 between 640–768px, auto-collapsed below 640px on first launch (⌘B and the chevron still toggle manually).
  • Settings goes horizontal at ≥ 1024px: The modal becomes a wide 2-column layout — HF Token, Web Search, Thinking, and Context Window on the left, Downloaded Models on the right — instead of a tall scrolling stack.
  • Setup screens go horizontal at ≥ 1024px: The welcome screen splits into intro / model picker; the in-progress screen splits into stages + error / download progress + cancel.
  • Chat area scales padding, the header model dropdown shrinks gracefully, and the composer wraps cleanly at very narrow widths.

UX Polish

  • Manual stop now shows a clear "Response stopped" indicator instead of leaving the bubble blank.
  • Generation errors show a clean "Something went wrong while generating the response." card instead of streaming raw stack traces into chat.
  • Welcome card padding fixed for short windows.
  • LM Studio comparison updated to reflect their new mlx-vlm support. GemX's remaining differentiators: on-device Whisper voice input, the multi-step web research workflow with inline [N] citations, and the MIT open-source license.

Please see the README in the main repository for complete installation instructions, system requirements, updated model memory specifications, and troubleshooting steps.

GemX-v0.1.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 23 May 21:21

v0.1.0 Initial Release Notes

Welcome to the very first release of GemX!, a fully local, native chat application for Apple Silicon Macs designed specifically around the multimodal capabilities of Gemma 4. By leveraging Apple's MLX framework and the mlx-vlm runtime, GemX goes beyond standard text streaming to support native vision, document parsing, and agentic web research directly on your machine.

Key Features in this Release

  • Native Multimodal Vision: Paste or drag-and-drop up to four images per message. The mlx-vlm backend ensures Gemma 4 processes these locally through its integrated image encoder.
  • Integrated Web Research: GemX performs multi-step web research—searching, fetching top URLs, and synthesizing information—using Tavily or a DuckDuckGo fallback. The model outputs verifiable, clickable citations for its sources.
  • Local Voice Transcription: Speak directly to the model using an on-device Whisper model powered by WebGPU, ensuring audio data never leaves your computer.
  • Document Context: Attach PDFs, DOCX files, or raw code files. The application extracts the text locally and injects it cleanly as context for your prompt.
  • Model Hot-Swapping: Switch seamlessly between four quantized Gemma 4 variants (ranging from the E2B to the 31B model) mid-conversation without restarting the application or losing chat history.
  • Native UI Elements: The chat interface supports full GitHub Flavored Markdown, LaTeX math rendering, syntax-highlighted code blocks, collapsible reasoning blocks, and a persistent sidebar featuring pinned chats and auto-generated titles.

First-Launch Automation

Upon opening the application for the first time, GemX handles the backend setup automatically. It will locate your Python installation, provision an isolated virtual environment, install the MLX backend dependencies, and download your selected model weights from HuggingFace.

Please see the README in the main repository for complete installation instructions, system requirements, model memory specifications, and troubleshooting steps.