v1.0.0 Release Notes
GemX hits 1.0. The follow-up to v0.5.0 is about making GemX feel like a real tool rather than just a chat window — you can now shape how the model responds (custom instructions, personas, sampling), use GemX as a backend for everything else via an OpenAI-compatible API server you can even run headless from the terminal, and find your way around a redesigned, sectioned Settings panel. Below the input is a quieter status line that tells you only what matters for the next message. Still 100% local, still no cloud.
Make It Yours — Custom Instructions & Personas
- Custom instructions, applied everywhere. A free-text field in Settings ▸ Personalization is appended to GemX's system prompt on every chat — "always answer in British English," "prefer bullet points," whatever you like. It's appended, never substituted, so the built-in date, tool, vision, and citation behavior keeps working.
- Named personas, switched per conversation. Save reusable prompts ("terse Rust reviewer," "Socratic tutor") and pick one per conversation from a dropdown in the chat header. The active persona's prompt rides along with your custom instructions for that thread only.
- It never leaks into history. The personalization text is composed fresh each turn and sent as a separate field — it shapes the system prompt, it doesn't clutter your saved messages.
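The composition described above can be sketched roughly as follows. This is a minimal illustration, not GemX's actual code: the Persona and Personalization shapes, the function names composeSystemPromptExtra and appendToSystemPrompt, and the ordering (custom instructions first, persona prompt second) are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real GemX types may differ.
interface Persona {
  id: string;
  name: string;
  prompt: string;
}

interface Personalization {
  customInstructions: string;
  personas: Persona[];
}

// Build the extra system-prompt text for a single turn. It is recomputed on
// every turn and returned as its own value, so it never has to be stored
// alongside the saved messages.
function composeSystemPromptExtra(
  settings: Personalization,
  activePersonaId?: string,
): string | undefined {
  const persona = settings.personas.find((p) => p.id === activePersonaId);
  const parts = [settings.customInstructions, persona?.prompt]
    .map((part) => (part ?? "").trim())
    .filter((part) => part.length > 0);
  return parts.length > 0 ? parts.join("\n\n") : undefined;
}

// Append, never substitute: the built-in prompt always comes first.
function appendToSystemPrompt(basePrompt: string, extra?: string): string {
  return extra ? `${basePrompt}\n\n${extra}` : basePrompt;
}
```

With no custom instructions and no active persona, the extra text is undefined and the built-in prompt passes through unchanged.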
Dial In The Response — Sampling Controls
- Three presets, one click. Settings ▸ Behavior ▸ Response Style offers Precise / Balanced / Creative — Precise sticks to likely tokens (code, facts), Creative explores more (writing, brainstorming).
- Full control when you want it. An Advanced expander exposes raw temperature, top_p, top_k, and repetition_penalty; editing any value flips the preset to "custom." Values are forwarded to the MLX server only when set, so unsupported sampler params are simply ignored.
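As a rough sketch of the "forwarded only when set" behavior: the preset numbers below are purely illustrative (these notes do not state the values GemX ships), and the names PRESETS and resolveSampling are hypothetical.

```typescript
type Preset = "precise" | "balanced" | "creative" | "custom";

interface Sampling {
  temperature?: number;
  top_p?: number;
  top_k?: number;
  repetition_penalty?: number;
}

// Illustrative values only; the real preset numbers are an assumption here.
const PRESETS: Record<Exclude<Preset, "custom">, Sampling> = {
  precise: { temperature: 0.2, top_p: 0.9 },
  balanced: { temperature: 0.7, top_p: 0.95 },
  creative: { temperature: 1.0, top_p: 0.98 },
};

// Return only the parameters that actually carry a finite number, so unset
// fields never appear in the request sent to the model server.
function resolveSampling(preset: Preset, custom: Sampling = {}): Sampling {
  const source = preset === "custom" ? custom : PRESETS[preset];
  const out: Sampling = {};
  const keys = ["temperature", "top_p", "top_k", "repetition_penalty"] as const;
  for (const key of keys) {
    const value = source[key];
    if (typeof value === "number" && Number.isFinite(value)) {
      out[key] = value;
    }
  }
  return out;
}
```

Spreading the result into a request body adds nothing for parameters the user never touched.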
Use GemX From Anything — Local API Server
- An OpenAI-compatible endpoint. Flip on Settings ▸ Developer ▸ Local API server and point any OpenAI client (your editor, a script, any agent tool) at http://127.0.0.1:11535/v1. It's a thin proxy in front of the running MLX server, so it serves whichever model is currently loaded: a built-in Gemma 4 variant or one of your own custom models.
- Works with agent harnesses, not just chat. GemX emits native OpenAI tool_calls (Gemma 4 via mlx-vlm, verified end-to-end), so agentic coding tools (Cline, Kilo Code, Continue.dev, Zed, JetBrains AI Assistant, Goose, OpenCode) can drive it, not only chat clients. Setup guides (and the two tools that can't connect, and why) are in docs/api-clients.md. Prefer Gemma 4 12B with a raised context window for agent work.
- Safe by default. Off until you enable it, bound to 127.0.0.1 only, and /v1/*-only. An optional bearer token can gate requests; LAN exposure (binding 0.0.0.0) is a separate, explicit opt-in that requires a token, with a visible warning. The panel shows a copyable base URL and a live running/stopped dot.
- Correct for real clients. The proxy pins the request model to the loaded model (the MLX server would otherwise try to load whatever id a client sends) and answers GET /v1/models locally with exactly the loaded model, so auto-discovery clients aren't misled.
- Zero new dependencies. Built on Node's built-in http, streams responses (SSE) through unbuffered, and survives model switches untouched since it forwards to a fixed internal port.
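The model pinning and the locally answered GET /v1/models can be sketched as two small pure functions. This is an illustration under assumptions, not the contents of the real proxy: the names pinModel and modelsResponse are hypothetical, and the owned_by value is a placeholder.

```typescript
// Overwrite the "model" field of a JSON request body with the loaded model id,
// whatever id the client originally sent. All other fields pass through.
function pinModel(rawBody: string, loadedModel: string): string {
  const parsed: unknown = JSON.parse(rawBody);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("request body must be a JSON object");
  }
  return JSON.stringify({ ...parsed, model: loadedModel });
}

// Minimal OpenAI-style model list containing exactly the loaded model.
// "owned_by" is a placeholder value chosen for this sketch.
function modelsResponse(loadedModel: string) {
  return {
    object: "list",
    data: [{ id: loadedModel, object: "model", owned_by: "local" }],
  };
}
```

Because the body is rewritten before forwarding, a client that hard-codes some other model id still reaches whichever model is loaded.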
Run It Headless — gemx serve From The Terminal
- No GUI required, Ollama-style. Run the same server straight from the terminal. One-time setup puts a small wrapper script on your PATH:
    # ⚠️ Use a wrapper script, NOT a bare symlink: macOS Electron resolves its bundled
    # helper apps relative to the executable path, so a symlinked launch dies with
    # "Unable to find helper app." If you made a symlink before, remove it first.
    sudo tee /usr/local/bin/gemx >/dev/null <<'EOF'
    #!/bin/bash
    exec "/Applications/GemX.app/Contents/MacOS/GemX" "$@"
    EOF
    sudo chmod +x /usr/local/bin/gemx

    gemx serve                            # last-used model, loopback, port 11535
    gemx serve --port 8080 --model mlx-community/gemma-4-e4b-it-4bit
    gemx serve --lan --token my-secret    # expose on the LAN with auth
    gemx serve --help
- Same engine, no window. A --serve mode of the app binary boots the MLX runtime + the API proxy with the Dock icon hidden, prints the endpoint, and stays in the foreground until you Ctrl-C (which frees both ports cleanly). Launch the GemX app once first so the model weights and Python runtime are set up. (Building from source? npm run serve does the same against a dev build.)
- GUI and gemx serve are mutually exclusive. They're the same app sharing one macOS instance and one model server (port 11534), not separate daemons, so run one or the other. If you want the UI and an endpoint at the same time, skip gemx serve and just turn on the in-app Settings ▸ Developer ▸ Local API server.
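Once the endpoint is up, whether started from the app or from gemx serve, a client request might be assembled like this. It is a minimal sketch: buildChatRequest is a hypothetical helper, and the "local" model id is a placeholder that relies on the proxy pinning the model as described in these notes.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Assemble an OpenAI-style chat completion request for the local endpoint.
// The bearer token is only attached when one is supplied.
function buildChatRequest(
  baseUrl: string,
  messages: ChatMessage[],
  token?: string,
): BuiltRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers,
      // Placeholder id: the proxy is described as replacing it with the loaded model.
      body: JSON.stringify({ model: "local", messages }),
    },
  };
}
```

The returned url and init can be passed straight to fetch(url, init); for a server started with --token my-secret, pass "my-secret" as the third argument.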
Redesigned Settings — Sectioned & Navigable
- A real settings layout. The old single flat panel is now a modal with a left nav rail (a horizontal tab strip on narrow windows) across five clearly-named sections: Behavior (thinking, web search + Tavily, context window, sampling), Personalization (custom instructions, personas), Models (downloaded built-ins and custom models, HuggingFace token), Appearance (theme), and Developer (the API server). The confusing "Model" / "Models & Storage" pairing is gone.
- Extracted from the sidebar. The settings UI moved into its own component, leaving the sidebar to focus on conversations.
Find Things Faster — Search, Palette, Context Meter
- Conversation search. A search box at the top of the sidebar filters chats by title and message contents, across both Pinned and Chats.
- Command palette. ⌘K opens a filtered command list: new chat, search, toggle sidebar, open settings, toggle thinking, switch model. Plus ⌘N (new chat) and ⌘F (focus search); ⌘B still toggles the sidebar.
- Context-usage meter. The composer footer shows a live used / max token gauge (green → amber → red as you approach the limit), so you can see how full the window is getting before history starts getting trimmed.
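A client-side meter of this kind could be approximated as below. Everything numeric here is an assumption for illustration: the four-characters-per-token heuristic and the 70% / 90% color thresholds are not taken from GemX, and the function names are hypothetical.

```typescript
type MeterColor = "green" | "amber" | "red";

// Rough heuristic of about 4 characters per token. An assumption for this
// sketch, not GemX's actual estimator.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Assumed thresholds: amber from 70% of the window, red from 90%.
function meterColor(used: number, max: number): MeterColor {
  const ratio = max > 0 ? used / max : 1;
  if (ratio >= 0.9) return "red";
  if (ratio >= 0.7) return "amber";
  return "green";
}

// Sum the estimate over all messages and produce the "used / max" label.
function contextMeter(messages: string[], max: number) {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return { used, max, label: `${used} / ${max}`, color: meterColor(used, max) };
}
```

An estimate like this is cheap enough to recompute on every keystroke, at the cost of drifting from the tokenizer's true count.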
Quieter Input Footer
- Only what matters. The verbose "mic for voice / drop a file" hints below the input are gone. In their place: compact indicators that appear only when active (the conversation's persona, a Thinking badge when reasoning is on, and a Web badge when search is enabled), alongside the context meter and a minimal ⏎ send · ⇧⏎ newline hint.
Under the Hood
- New src/main/apiServer.ts, the opt-in OpenAI-compatible reverse proxy (Node http, no new dependency), plus IPC channels apiserver:set-config / apiserver:get-status and a synchronous teardown wired into app quit. It pins the request model to the loaded model and synthesizes /v1/models.
- Headless serve mode (parseServeArgs / runHeadless) added to src/main/index.ts; MLX_PORT is now exported so the proxy can target it. New bin/gemx wrapper shim, an npm run serve script, and a package "bin" entry. Serve mode also disables the GPU / network-service helpers since a windowless server needs neither.
- New renderer pieces: components/Settings.tsx (sectioned modal), components/CommandPalette.tsx (⌘K), and lib/tokens.ts (client-side token estimate for the meter). AppSettings gained customInstructions, personas, sampling, and apiServer; ChatRequest gained systemPromptExtra and sampling; chatSystemPrompt(enableTools) became chatSystemPrompt(enableTools, extra?). All new settings are defaulted on read, so existing preferences upgrade silently.
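"Defaulted on read" can be pictured as a merge of stored preferences over a defaults object. The sketch below is an assumption about the approach, not the real AppSettings definition: the field shapes, the theme field standing in for pre-1.0 settings, the "balanced" default preset, and the readSettings name are all hypothetical. Only the port and the off-by-default API server come from these notes.

```typescript
// Hypothetical, simplified settings shape for illustration.
interface AppSettings {
  theme: string; // stands in for settings that already existed before 1.0
  customInstructions: string;
  personas: { id: string; name: string; prompt: string }[];
  sampling: { preset: string };
  apiServer: { enabled: boolean; port: number };
}

const DEFAULTS: AppSettings = {
  theme: "system",
  customInstructions: "",
  personas: [],
  sampling: { preset: "balanced" }, // assumed default preset
  apiServer: { enabled: false, port: 11535 }, // off until the user enables it
};

// Merge whatever was stored over the defaults, so a preferences file written
// by an older version gains the new fields without a migration step.
function readSettings(stored: Partial<AppSettings>): AppSettings {
  return {
    ...DEFAULTS,
    ...stored,
    sampling: { ...DEFAULTS.sampling, ...stored.sampling },
    apiServer: { ...DEFAULTS.apiServer, ...stored.apiServer },
  };
}
```

A file that only contains a theme comes back with that theme kept and every new field filled in from the defaults.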
Upgrading
Drop the new GemX.app in over the old one. Your conversations, downloaded models, custom models, settings, and HuggingFace / Tavily keys are all preserved — new settings simply appear with sensible defaults. The local API server stays off until you enable it. For terminal use, install the wrapper script once (see Run It Headless above — not a bare symlink) and launch the app at least once so the runtime is installed.