Releases: Avaneesh40585/GemX
Release list
GemX-v1.0.0
v1.0.0 Release Notes
GemX hits 1.0. The follow-up to v0.5.0 is about making GemX feel like a real tool rather than just a chat window — you can now shape how the model responds (custom instructions, personas, sampling), use GemX as a backend for everything else via an OpenAI-compatible API server you can even run headless from the terminal, and find your way around a redesigned, sectioned Settings panel. Below the input is a quieter status line that tells you only what matters for the next message. Still 100% local, still no cloud.
Make It Yours — Custom Instructions & Personas
- Custom instructions, applied everywhere. A free-text field in Settings ▸ Personalization is appended to GemX's system prompt on every chat — "always answer in British English," "prefer bullet points," whatever you like. It's appended, never substituted, so the built-in date, tool, vision, and citation behavior keeps working.
- Named personas, switched per conversation. Save reusable prompts ("terse Rust reviewer," "Socratic tutor") and pick one per conversation from a dropdown in the chat header. The active persona's prompt rides along with your custom instructions for that thread only.
- It never leaks into history. The personalization text is composed fresh each turn and sent as a separate field — it shapes the system prompt, it doesn't clutter your saved messages.
Dial In The Response — Sampling Controls
- Three presets, one click. Settings ▸ Behavior ▸ Response Style offers Precise / Balanced / Creative — Precise sticks to likely tokens (code, facts), Creative explores more (writing, brainstorming).
- Full control when you want it. An Advanced expander exposes raw
temperature,top_p,top_k, andrepetition_penalty; editing any value flips the preset to "custom." Values are forwarded to the MLX server only when set, so unsupported sampler params are simply ignored.
Use GemX From Anything — Local API Server
- An OpenAI-compatible endpoint. Flip on Settings ▸ Developer ▸ Local API server and point any OpenAI client — your editor, a script, any agent tool — at
http://127.0.0.1:11535/v1. It's a thin proxy in front of the running MLX server, so it serves whichever model is currently loaded — a built-in Gemma 4 variant or one of your own custom models. - Works with agent harnesses, not just chat. GemX emits native OpenAI
tool_calls(Gemma 4 via mlx-vlm, verified end-to-end), so agentic coding tools — Cline, Kilo Code, Continue.dev, Zed, JetBrains AI Assistant, Goose, OpenCode — can drive it, not only chat clients. Setup guides (and the two tools that can't connect, and why) are indocs/api-clients.md. Prefer Gemma 4 12B with a raised context window for agent work. - Safe by default. Off until you enable it, bound to
127.0.0.1only, and/v1/*-only. An optional bearer token can gate requests; LAN exposure (binding0.0.0.0) is a separate, explicit opt-in that requires a token, with a visible warning. The panel shows a copyable base URL and a live running/stopped dot. - Correct for real clients. The proxy pins the request
modelto the loaded model (the MLX server would otherwise try to load whatever id a client sends) and answersGET /v1/modelslocally with exactly the loaded model, so auto-discovery clients aren't misled. - Zero new dependencies. Built on Node's built-in
http, streams responses (SSE) through unbuffered, and survives model switches untouched since it forwards to a fixed internal port.
Run It Headless — gemx serve From The Terminal
-
No GUI required, Ollama-style. Run the same server straight from the terminal. One-time setup puts a small wrapper script on your PATH:
# ⚠️ Use a wrapper script, NOT a bare symlink — macOS Electron resolves its bundled # helper apps relative to the executable path, so a symlinked launch dies with # "Unable to find helper app." If you made a symlink before, remove it first. sudo tee /usr/local/bin/gemx >/dev/null <<'EOF' #!/bin/bash exec "/Applications/GemX.app/Contents/MacOS/GemX" "$@" EOF sudo chmod +x /usr/local/bin/gemx gemx serve # last-used model, loopback, port 11535 gemx serve --port 8080 --model mlx-community/gemma-4-e4b-it-4bit gemx serve --lan --token my-secret # expose on the LAN with auth gemx serve --help
-
Same engine, no window. A
--servemode of the app binary boots the MLX runtime + the API proxy with the Dock icon hidden, prints the endpoint, and stays in the foreground until you Ctrl-C (which frees both ports cleanly). Launch the GemX app once first so the model weights and Python runtime are set up. (Building from source?npm run servedoes the same against a dev build.) -
GUI and
gemx serveare mutually exclusive. They're the same app sharing one macOS instance and one model server (port 11534), not separate daemons — so run one or the other. If you want the UI and an endpoint at the same time, skipgemx serveand just turn on the in-app Settings ▸ Developer ▸ Local API server.
Redesigned Settings — Sectioned & Navigable
- A real settings layout. The old single flat panel is now a modal with a left nav rail (a horizontal tab strip on narrow windows) across five clearly-named sections: Behavior (thinking, web search + Tavily, context window, sampling), Personalization (custom instructions, personas), Models (downloaded built-ins and custom models, HuggingFace token), Appearance (theme), and Developer (the API server). The confusing "Model" / "Models & Storage" pairing is gone.
- Extracted from the sidebar. The settings UI moved into its own component, leaving the sidebar to focus on conversations.
Find Things Faster — Search, Palette, Context Meter
- Conversation search. A search box at the top of the sidebar filters chats by title and message contents, across both Pinned and Chats.
- Command palette.
⌘Kopens a filtered command list — new chat, search, toggle sidebar, open settings, toggle thinking, switch model. Plus⌘N(new chat) and⌘F(focus search);⌘Bstill toggles the sidebar. - Context-usage meter. The composer footer shows a live
used / maxtoken gauge (green → amber → red as you approach the limit), so you can see how full the window is getting before history starts getting trimmed.
Quieter Input Footer
- Only what matters. The verbose "mic for voice / drop a file" hints below the input are gone. In their place: compact indicators that appear only when active — the conversation's persona, a Thinking badge when reasoning is on, and a Web badge when search is enabled — alongside the context meter and a minimal
⏎ send · ⇧⏎ newlinehint.
Under the Hood
- New
src/main/apiServer.ts— the opt-in OpenAI-compatible reverse proxy (Nodehttp, no new dependency) — plus IPC channelsapiserver:set-config/apiserver:get-statusand a synchronous teardown wired into app quit. It pins the requestmodelto the loaded model and synthesizes/v1/models. - Headless serve mode (
parseServeArgs/runHeadless) added tosrc/main/index.ts;MLX_PORTis now exported so the proxy can target it. Newbin/gemxwrapper shim, annpm run servescript, and a package"bin"entry. Serve mode also disables the GPU / network-service helpers since a windowless server needs neither. - New renderer pieces:
components/Settings.tsx(sectioned modal),components/CommandPalette.tsx(⌘K), andlib/tokens.ts(client-side token estimate for the meter). AppSettingsgainedcustomInstructions,personas,sampling, andapiServer;ChatRequestgainedsystemPromptExtraandsampling;chatSystemPrompt(enableTools)becamechatSystemPrompt(enableTools, extra?). All new settings are defaulted on read, so existing preferences upgrade silently.
Upgrading
Drop the new GemX.app in over the old one. Your conversations, downloaded models, custom models, settings, and HuggingFace / Tavily keys are all preserved — new settings simply appear with sensible defaults. The local API server stays off until you enable it. For terminal use, install the wrapper script once (see Run It Headless above — not a bare symlink) and launch the app at least once so the runtime is installed.
GemX-v0.5.0
v0.5.0 Release Notes
The follow-up to v0.4.0 turns GemX into a genuine document workspace. The headline is local RAG — large files and long transcripts are now indexed on-device and retrieved per question, so you can attach real documents without blowing up the context window. Alongside it, GemX learned to recover the actual math out of your files (Word equations become LaTeX, PDF equations are read straight off the rendered page), gained native PowerPoint (.pptx) support, and got a round of polish on the thinking block, the streaming avatar, and the responsive layout — all still running entirely on your Mac, no cloud.
Local RAG — Attach Big Documents Without Blowing the Context Window
- On-device retrieval, not context stuffing. Large documents (and long Whisper transcripts) are no longer crammed whole into the prompt. GemX chunks them, embeds each chunk locally, and at send time retrieves only the passages relevant to your question — so a 60-page report and a one-line follow-up cost roughly the same.
- Local embeddings via
Supabase/gte-small. A 384-dim embedding model runs through Transformers.js with WebGPU → WASM fallback, weights cached in IndexedDB — the same no-Python, in-browser pattern as Whisper. The first attach shows a one-time "Downloading embedding model… N%"; after that it's instant. - Vectors live on disk, never in
localStorage. Each conversation gets its own on-disk vector store underrag/<conversation>/<doc>/; the chat message keeps only a compact<context name="report.pdf" indexed="84" />marker. Base64 payloads stay out of app state entirely. - Retrieval is ephemeral and per-turn. Fresh, query-relevant excerpts are injected into the outbound message only — never persisted. History stays lean, so it's never the thing that gets pruned away when the window fills.
- Self-cleaning indexes. Deleting a conversation drops its whole RAG folder; removing an indexed attachment before sending deletes just that doc's index — no orphaned vectors left behind.
- A dedicated "Indexing…" state. Chunking + embedding shows its own status in the composer and no longer borrows Whisper's "Loading model…" label, so the two never cross wires.
Real Math Recovery From Documents
- DOCX equations become LaTeX. Word stores math as structured OMML, which
mammothsilently drops. GemX now walksword/document.xmlitself and converts everym:oMathto LaTeX ($…$/$$…$$) in reading order — fractions, sub/superscripts, radicals, n-ary sums and integrals, delimiters, matrices, accents, and a full Unicode-symbol map — falling back tomammothonly on error. Equations render via KaTeX in the reply. - PDF math is read off the page, not the text. PDF math is glyph-soup no parser can recover, so under a multimodal model the relevant pages are rendered to images (
pdfjs-dist) and attached at retrieval time — the model reads the real equations off the actual page. Text-only (mlx-lm) models fall back to plainpdf-parseautomatically. - Honest about what's supported. GemX reads PDF, DOCX, PPTX, and text-based files (code / plaintext / Markdown), plus images and audio — not arbitrary binaries. The docs and README were corrected to say exactly that, rather than implying "any file."
PowerPoint (.pptx) Support
- Decks go straight in. Attach a
.pptxand GemX extracts the slide text, reusing the same OOXML/OMML pipeline as DOCX — so slide equations come through as LaTeX too, with no new heavy dependencies. - True display order. Slides are emitted in the order you'd present them — resolved from
presentation.xmland its relationships, not by filename — so reordered decks read correctly. Each slide gets a## Slide Nheader. - Speaker notes included. Where the substance often lives — each slide's notes are appended under a
_Notes:_line. - Same threshold logic as everything else. Small decks inline; large decks index for retrieval (RAG) exactly like DOCX. (Out of scope for now: slide images, diagrams, and SmartArt — text + equations + notes only.)
UX Polish
- No more mid-response cutoffs on multi-doc chats. Token estimation used to count an image's full base64 length as tokens — attaching a few documents could collapse
max_tokensto its floor and truncate the answer mid-sentence. Images are now charged a flat, realistic prompt cost, so replies run to completion. - Quieter attachments. The redundant "indexed" badge beside documents is gone — the indexing status above the composer is enough.
Under the Hood
- New
src/main/pptx.ts(parsePptxWithMath) and the sharedwalk/local/childElshelpers factored out ofsrc/main/docx.ts, so DOCX and PPTX run through one namespace-agnostic OMML walk. - New renderer libs:
lib/embeddings.ts(gte-small feature-extraction),lib/rag.ts(chunking, cosine retrieval, context building), andlib/pdf.ts(per-page text + JPEG rendering). - New main module
src/main/rag.tsplus IPC channelsrag:put/get/delete/put-pages/get-pagesandfile:parse-pptx. - New dependencies:
pdfjs-dist,jszip,@xmldom/xmldom.
Upgrading
Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. Embedding weights download once on your first large attachment and are cached from then on.
GemX-v0.4.0
v0.4.0 Release Notes
The follow-up to v0.3.0 makes GemX feel at home on any Mac, no matter how you keep your desktop. The headline is a full Light mode that spans every surface of the app — chat, composer, sidebar, modals, code blocks, and the README hero — with a three-way Light / Dark / System toggle that follows your macOS Appearance setting live. Alongside it, the assistant's thinking avatar is reborn as the Quantum Binary mark, code blocks are rebuilt to match the clean Gemini-chat look, and the Settings panel is reorganised into a calmer, better-paired layout.
Light Mode — Across the Entire UI
- Three-way theme toggle in Settings. A new Theme control offers System (default), Light, and Dark. System watches your macOS Appearance and switches the whole app the instant you flip it in Control Center or System Settings — no restart, no reload.
- Built on CSS custom properties, not per-component branches. Every colour in the app now reads from a semantic token (
--surface-*,--fg-*,--border-*,--accent-*, syntax--syn-*, …). Dark is the default:rootpalette;[data-theme="light"]flips the values. This keeps both themes pixel-consistent and means future surfaces theme themselves for free. - No white flash on launch. An IIFE runs before React mounts, reads your saved theme from
localStorage, and appliesdata-themeto<html>immediately — so a Light-mode boot never flickers through a dark frame first. - Native chrome stays in sync. The renderer tells the main process the active mode over a new
theme:setIPC channel, which drivesnativeTheme.themeSourceso the titlebar tint and native scrollbars match the in-app theme instead of fighting it. - Theme-conditional artwork. The app ships
icon-light/icon-darkandhero-light/hero-darkassets; the in-app hero and the README hero both swap automatically with the active theme (<picture>+prefers-color-schemeon GitHub). 'system'is resolved, never stored as a literal. Picking System keepsthemeMode: 'system'in settings and resolves to a concrete'light' | 'dark'at runtime viamatchMedia('(prefers-color-scheme: light)'), with a live subscription so the app re-renders when the OS preference changes.
New Assistant Avatar — "Quantum Binary"
- A solid core dot + a hollow ring. The old 4-point sparkle is replaced by a strictly monochrome mark — one filled
qb-coreand one outlinedqb-ring. It reads as black in Light mode and white in Dark, so it never looks tinted or out of place. - Two states that tell you what the model is doing. While thinking, the core and ring orbit a shared invisible centre on opposite sides. When the model starts generating, they slide into a locked bullseye and pulse rhythmically (the ring scales slightly more than the core).
- Tuned motion. The slide-and-lock uses a
cubic-bezier(0.65, 0, 0.35, 1)for a "magnetic friction" feel, and the pulse is delayed0.65sso it only begins once the slide has fully settled. Pure SVG + CSS keyframes; respectsprefers-reduced-motion.
Redesigned Code Blocks
- Gemini-chat style. Code blocks are rebuilt as a single rounded
bg-codepanel — no heavy border separators. A compact header carries an uppercase language label and a Copy button; the code body sits inside with a transparent background so it reads as one clean surface. - Theme-aware syntax. Highlighting is driven by
--syn-*tokens (One Dark in dark, One Light in light), so snippets stay legible and on-palette in either theme.
Settings Panel — Reorganised & Paired
- A calmer top-to-bottom order: Theme → Web Search (+ Tavily key) → Thinking → Context Window → HuggingFace Token. Related controls now sit together instead of being scattered.
- Downloaded Models pinned. The Downloaded Models surface spans its own column (
lg:row-span-5) so built-ins and customs stay visible while you adjust the rest of the panel.
UX Polish
- Cleaner composer attach affordance. The attach control is back to a plain paperclip (the old inline count chip is gone). Attachments render as cards above the textarea — the same clear, low-noise pattern Claude and Gemini use.
- Consistent steppers. The custom-model modal's context-window inputs use the same power-of-two stepper as Settings → Context Window, so the two surfaces feel identical.
Under the Hood
- New
src/renderer/src/lib/theme.ts(resolveTheme,useEffectiveTheme,useAppliedTheme,THEME_OPTIONS) centralises all theme resolution and subscriptions. useAppliedTheme()lets components withoutsettingsin scope (Setup, the empty-state hero) react to theme via aMutationObserveron<html data-theme>.- The full contributor
docs/and from-scratchlearning/curriculum were audited and brought current with the theme system, the Quantum Binary avatar, and the custom-model registry.
Upgrading
Drop the new GemX.app in over the old one. Your downloaded models, custom models, settings, and HuggingFace / Tavily keys are preserved. On first launch GemX defaults to System theme — switch to a fixed Light or Dark any time from Settings.
GemX-v0.3.0
v0.3.0 Release Notes
The follow-up to v0.2.0 opens the model registry: you can now bring your own MLX model from HuggingFace through a new + Custom flow, with a three-layer safety net that prevents the silent crashes that runtime mismatches used to cause. The Settings panel collapses to a single Downloaded Models surface for both built-ins and customs, the assistant gets an animated avatar during thinking and inference, and a round of UX polish lands across the picker, modal, and composer.
Bring Your Own Model — Custom HuggingFace Models
- + Custom button next to the model picker accepts any
mlx-vlmormlx-lmmodel from themlx-community/*HuggingFace organisation. Paste a repo id, GemX probes HuggingFace for sane defaults, and the model joins your picker under a Custom subheader (label + runtime). - mlx-community/ gate* — both the modal (client-side) and the main-process probe reject other namespaces up-front with a link to
huggingface.co/mlx-community/collections. Models from other orgs often aren't MLX-quantised and would fail at load. - Auto-detect runtime, context window, thinking support — fetches
config.json+chat_template.jinjafromhuggingface.co/<repo>/resolve/main/…. Runtime comes frommodel_type+architectures(a known multimodal allow-list + a VL/Vision/Image arch-string check); context max frommax_position_embeddingswithmodel_max_length/n_positionsfallbacks; thinking fromenable_thinkingin either chat-template source. - Lazy
mlx-lminstall — built-ins all run onmlx-vlm, so users who never add a text-only custom model don't pay the install cost. The first time you select anmlx-lmcustom, GemXpip installsmlx-lminto the existing venv (~30 s) and then spawnsmlx_lm.serverinstead ofmlx_vlm.server. Subsequent boots skip the install. - HuggingFace token re-used — gated mlx-community repos pick up the token you've already saved in Settings; no per-model auth surface needed.
- Default context clamps to 16 K — even if a repo reports 256 K, the default lands conservative so first-launch RAM stays modest on 8 GB Macs. You can raise it any time from Settings → Context Window.
Locked Runtime + Immutable Settings at Add Time
- Runtime is auto-detected and locked. The modal shows it as a readonly chip with an "Auto-detected from
config.json— Locked at add time" subtitle. A wrong runtime is the single biggest source of silent server crashes, so the override toggle is gone. If auto-detect was wrong, you remove and re-add. - All settings become immutable after Add. A prominent amber banner at the top of the modal makes this clear: "These settings are locked after Add. To fix a mistake, remove the model from Settings → Downloaded Models and re-add. Double-check every field before clicking Add model." There is no edit UI for custom models — remove + re-add is the only path.
- Other probed fields stay user-editable at add time — label, max context window, default context window, and thinking-support are all overrideable because
config.json max_position_embeddingsis wrong for several real repos and the chat-template heuristic can miss thinking on unusual templates.
Runtime Mismatch Safety Net
- Composer blocks image attach for text-only models. When the active model's runtime is
mlx-lm, drag-and-drop, paste, and the file picker all filter outimage/*files and surface a "This model is text-only — images can't be attached." toast. PDFs, code, audio, and other context attachments continue to work. - Boot-time mismatch detection. A new
RuntimeMismatchError(mirroring the existingMLXVersionMismatchErrorpattern) is thrown bywaitForHealthwhen the server's stderr matches a vision-vs-text architecture failure — direction-aware regexes forvision_tower/image_token_index/preprocessor_config.json(vlm-on-text) andvision encoder/multimodal not supported(lm-on-vision). For custom models, the caller surfaces an action-oriented advisory: "{label} appears to be text-only, but was added as mlx-vlm. Remove from Settings → Downloaded Models and re-add." - Stream-time mismatch detection.
chatStreamwraps its!res.okbranch: ifimage_urlparts were in the request body and the response body mentions image / multimodal rejection, the error is upgraded toRuntimeMismatchErrorand rendered as an error bubble on the offending message. Catches the case where a text-only server loaded fine but rejects images at request time.
Unified Settings: One Downloaded Models Surface
- Custom and cached built-in models merged into a single Downloaded Models list — no more separate "Custom Models" panel. Per-row layout is consistent for both: label first,
custom/active/defaulttags after, repo id underneath for customs. One trash icon, onewindow.confirmconfirmation, one IPC path. - Custom-model deletion removes both the registry entry and the cached weights, so the model vanishes from the picker and the disk in a single click.
- Active model can't be deleted — the trash icon is disabled with the same "Switch to another model first" tooltip that built-ins use.
- Thinking toggle gated on the active model. When the active custom has
thinkingSupported === false, the Thinking buttons in Settings disable with a "This model doesn't expose a reasoning channel — toggle disabled." subtext. Built-ins (Gemma 4 family) all support thinking, so it stays enabled for them. - Context Window slider clamps to the active model's
contextWindowMax— thestepsfilter respects the custom's declared ceiling, with aMath.min(effectiveCtx, maxCtx)belt-and-suspenders clamp for legacycontextWindowOverridevalues that exceeded a since-lowered Max.
Composer & Modal UX Polish
- Animated assistant avatar during thinking and inference — a new
AssistantAvatarcomponent renders an animated logomark in the assistant bubble while the model is generating, replacing the static placeholder. - Mic icon visual parity with the attach icon — the mic SVG was visibly slimmer than the paperclip at the same
h-4 w-4box because its path was thinner. Stroke weight raised and the rect + arc footprint widened so both icons read at the same visual weight in the toolbar. - File-count badge removed entirely. The previous emerald-500 notification dot on the paperclip read like a phone-notification badge; a brief inline-chip replacement still felt redundant. Now the paperclip button stays clean and the attached files render as cards above the textarea (image thumbnails + file-icon-plus-name chips with ×) — same pattern Claude and Gemini use.
- Custom-model context-window inputs match Settings. The modal's Max-context and Default-context fields used to be plain
<input type="number">controls with browser-native spinners. They now use the same display-box + vertical chevron stepper as Settings → Context Window, stepping through the power-of-2 sequence (1 K → 1 M) with disabled boundaries. - Custom subheader in the model picker no longer doubles as a per-row tag. The
customchip on each picker row was redundant and is gone — the Custom section header already labels them. Picker rows now show label + runtime, period. - Description field removed. The custom-model add modal collected a free-text description that nothing in the app ever rendered. Field dropped from the modal and from the
CustomModeltype. Legacy entries with stored descriptions still parse fine — the value is silently ignored.
Documentation & Versioning
- README documents the entire + Custom flow, the locked-runtime + immutable-settings model, image-block behaviour for
mlx-lm, and where runtime-mismatch advisories surface. docs/data-model.mddocuments the updatedCustomModelshape, runtime-lock rationale, and the three-layer safety net.docs/mlx-runtime.mdadds a new "Runtime-mismatch detection" section walking through Composer block + boot-timeRuntimeMismatchError+ chatStream upgrade with the file references.docs/ipc-contract.mddocuments the newprobeCustomModel/addCustomModel/removeCustomModel(name, alsoDeleteCache?)IPC surface and the advisory-defaults model.- Inspiration credit. Ammaar Reshi's gemma-chat — a beautiful single-purpose Gemma desktop client — is now credited in the README as the visual / structural inspiration that started GemX.
Please see the README in the main repository for complete installation instructions, system requirements, updated model memory specifications, and the new Bring your own model walkthrough.
GemX-v0.2.0
v0.2.0 Release Notes
The first follow-up to the initial release brings a new mid-tier model, opt-in step-by-step reasoning, a completely rebuilt download pipeline, a real settings panel for managing local model storage, and a layout that finally adapts to your window size.
New Models & Context Defaults
- Gemma 4 12B added: A dense mid-tier variant (
mlx-community/gemma-4-12B-it-4bit, ~11 GB on disk) now sits between E4B and the 26B MoE. Default 32K context, native max 256K. Sharper reasoning than E4B without the MoE's RAM cliff. Comfortable on a 24 GB Mac; tight at 16 GB. - Context window defaults rebalanced: E4B drops 32K → 16K to leave KV-cache headroom on 8 GB Macs. The 26B MoE drops 128K → 64K to keep its boot-time reservation manageable. 31B rises 32K → 64K to align with the other dense variants. All five models still expose their full native max (128K for E2B/E4B, 256K for 12B / 26B MoE / 31B) via the Context Window slider.
- Disk sizes refined across the registry: E2B ~4 GB, E4B ~5.5 GB, 12B 11 GB, 26B MoE ~16 GB, 31B ~18 GB. The "27B MoE" label was also corrected to "26B MoE" to match Google's naming.
Thinking Mode (Opt-In)
Gemma 4's built-in step-by-step reasoning is now plumbed end-to-end:
- Settings → Thinking → Enabled flips
enable_thinking: trueon every chat request. The model emits its reasoning trace before the final answer. - Collapsible Thinking block appears above the answer. The header cycles through "Thinking → Considering → Planning → Pondering → Reasoning → Sketching" while the trace streams in, then collapses to "Thought process" once the final answer begins. Click any time to re-expand.
- Full markdown inside the block — syntax-highlighted code, KaTeX math, GFM tables, clickable links — same renderer as the main answer.
- No context-window leak: Past thinking traces are stripped at the IPC boundary on every subsequent turn (defense in depth: Gemma 4's chat template also strips them server-side via its
strip_thinking()macro). Reasoning lives in the UI only.
Rebuilt Download Pipeline
The first-launch download was replaced end-to-end after the deprecated huggingface-cli stopped working:
- Manifest-driven byte progress: Total size is fetched from
HfApi.model_info()before the first byte streams, so the progress bar is correct from frame zero — no more "stuck at 0%". - Resume detection: Partial blobs from interrupted previous attempts are detected on next launch; the UI swaps the verb from "Downloading" to "Resuming" and only fetches what's missing.
- Cancel button: Interrupt a download mid-stream and drop back to the model picker — no orphaned Python children, no half-written blobs in an indeterminate state.
- Authoritative completion marker: A
.gemx-verifiedfile is written only aftersnapshot_download(..., local_files_only=True)integrity-checks the cache. Subsequent launches trust this marker exactly — no more "model present but didn't load" edge cases.
Self-Healing MLX Runtime
- Auto-upgrade on version mismatch: When the bundled
mlx-vlmis too old to load a model (e.g.,gemma4_unifiedrequires ≥ 0.6.1), the app catches the typed error, runspip install --upgrade, and retries boot once. No restart, no manual venv surgery. - Lifecycle hardening: A single-flight server wrapper handles dev hot-reload double-mounts cleanly. Orphaned port holders from previous force-quits are reclaimed via
lsofbefore the new server binds. On app quit, the Python child gets a synchronous SIGKILL so nothing leaks across runs.
Settings Panel Expansion
- Downloaded Models: Every cached model is now listed with size and active/default tags, and one-click delete for any that aren't currently loaded. Reclaim disk space without leaving the app.
- Context Window: Power-of-2 slider from 4K up to the model's native max, with a Reset button to fall back to the default.
- Tavily API Key: Paste your free Tavily key (1,000 searches/month, no card required) for faster, more reliable web research. DuckDuckGo remains the fallback when no key is set.
- HuggingFace Token: Optional token field that speeds up gated-repo downloads.
- Thinking: Toggle described above.
Responsive Layout
The whole app now adapts to its window size instead of holding a fixed-width column:
- Minimum window 700×500 (was 820×560) — usable in split-screen and on smaller displays.
- Sidebar tiers:
w-60on roomy windows,w-52between 640–768px, auto-collapsed below 640px on first launch (⌘B and the chevron still toggle manually). - Settings goes horizontal at ≥ 1024px: The modal becomes a wide 2-column layout — HF Token, Web Search, Thinking, and Context Window on the left, Downloaded Models on the right — instead of a tall scrolling stack.
- Setup screens go horizontal at ≥ 1024px: The welcome screen splits into intro / model picker; the in-progress screen splits into stages + error / download progress + cancel.
- Chat area scales padding, the header model dropdown shrinks gracefully, and the composer wraps cleanly at very narrow widths.
UX Polish
- Manual stop now shows a clear "Response stopped" indicator instead of leaving the bubble blank.
- Generation errors show a clean "Something went wrong while generating the response." card instead of streaming raw stack traces into chat.
- Welcome card padding fixed for short windows.
- LM Studio comparison updated to reflect their new
mlx-vlmsupport. GemX's remaining differentiators: on-device Whisper voice input, the multi-step web research workflow with inline[N]citations, and the MIT open-source license.
Please see the README in the main repository for complete installation instructions, system requirements, updated model memory specifications, and troubleshooting steps.
GemX-v0.1.0
v0.1.0 Initial Release Notes
Welcome to the very first release of GemX!, a fully local, native chat application for Apple Silicon Macs designed specifically around the multimodal capabilities of Gemma 4. By leveraging Apple's MLX framework and the mlx-vlm runtime, GemX goes beyond standard text streaming to support native vision, document parsing, and agentic web research directly on your machine.
Key Features in this Release
- Native Multimodal Vision: Paste or drag-and-drop up to four images per message. The mlx-vlm backend ensures Gemma 4 processes these locally through its integrated image encoder.
- Integrated Web Research: GemX performs multi-step web research—searching, fetching top URLs, and synthesizing information—using Tavily or a DuckDuckGo fallback. The model outputs verifiable, clickable citations for its sources.
- Local Voice Transcription: Speak directly to the model using an on-device Whisper model powered by WebGPU, ensuring audio data never leaves your computer.
- Document Context: Attach PDFs, DOCX files, or raw code files. The application extracts the text locally and injects it cleanly as context for your prompt.
- Model Hot-Swapping: Switch seamlessly between four quantized Gemma 4 variants (ranging from the E2B to the 31B model) mid-conversation without restarting the application or losing chat history.
- Native UI Elements: The chat interface supports full GitHub Flavored Markdown, LaTeX math rendering, syntax-highlighted code blocks, collapsible reasoning blocks, and a persistent sidebar featuring pinned chats and auto-generated titles.
First-Launch Automation
Upon opening the application for the first time, GemX handles the backend setup automatically. It will locate your Python installation, provision an isolated virtual environment, install the MLX backend dependencies, and download your selected model weights from HuggingFace.
Please see the README in the main repository for complete installation instructions, system requirements, model memory specifications, and troubleshooting steps.