GemX-v1.0.0

Avaneesh40585 released this 12 Jun 22:32

v1.0.0 Release Notes

GemX hits 1.0. This follow-up to v0.5.0 is about making GemX feel like a real tool rather than just a chat window. You can now shape how the model responds (custom instructions, personas, sampling), use GemX as a backend for other tools through an OpenAI-compatible API server that can also run headless from the terminal, and find your way around a redesigned, sectioned Settings panel. Below the input, a quieter status line shows only what matters for the next message. Still 100% local, still no cloud.

Make It Yours — Custom Instructions & Personas

  • Custom instructions, applied everywhere. A free-text field in Settings ▸ Personalization is appended to GemX's system prompt on every chat — "always answer in British English," "prefer bullet points," whatever you like. It's appended, never substituted, so the built-in date, tool, vision, and citation behavior keeps working.
  • Named personas, switched per conversation. Save reusable prompts ("terse Rust reviewer," "Socratic tutor") and pick one per conversation from a dropdown in the chat header. The active persona's prompt rides along with your custom instructions for that thread only.
  • It never leaks into history. The personalization text is composed fresh each turn and sent as a separate field — it shapes the system prompt, it doesn't clutter your saved messages.
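
As a rough illustration of the "appended, never substituted" rule, the sketch below joins the custom instructions and the active persona's prompt into one extra string and appends it to a base prompt. The Persona shape, the helper names composeSystemPromptExtra and withExtra, and the ordering (instructions first, persona second) are assumptions made for this example; only the systemPromptExtra idea and the append-only behavior come from these notes.

```typescript
// Hypothetical shape; GemX's real persona type may differ.
interface Persona {
  id: string;
  label: string;
  prompt: string;
}

// Join the non-empty personalization pieces into one "extra" string.
// Returns undefined when there is nothing to add.
function composeSystemPromptExtra(
  customInstructions: string,
  persona?: Persona,
): string | undefined {
  const parts = [customInstructions, persona?.prompt ?? ""]
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return parts.length > 0 ? parts.join("\n\n") : undefined;
}

// Append the extra text to the built-in prompt; the base is never replaced.
function withExtra(basePrompt: string, extra?: string): string {
  return extra ? `${basePrompt}\n\n${extra}` : basePrompt;
}

const reviewer: Persona = {
  id: "p1",
  label: "terse Rust reviewer",
  prompt: "Review Rust code tersely.",
};
const extraText = composeSystemPromptExtra(
  "Always answer in British English.",
  reviewer,
);
// "BUILT-IN PROMPT" stands in for GemX's real system prompt.
const finalPrompt = withExtra("BUILT-IN PROMPT", extraText);
```

Because the extra text is rebuilt from settings on every turn, nothing here needs to be written into the saved message list.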

Dial In The Response — Sampling Controls

  • Three presets, one click. Settings ▸ Behavior ▸ Response Style offers Precise / Balanced / Creative — Precise sticks to likely tokens (code, facts), Creative explores more (writing, brainstorming).
  • Full control when you want it. An Advanced expander exposes raw temperature, top_p, top_k, and repetition_penalty; editing any value flips the preset to "custom." Values are forwarded to the MLX server only when set, so unsupported sampler params are simply ignored.
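
The preset and "only when set" behavior could look roughly like the sketch below. The numeric preset values are placeholders invented for illustration, not GemX's actual numbers, and the Sampling type and helper names are likewise assumptions; the four parameter names and the flip to "custom" are taken from the notes above.

```typescript
type Preset = "precise" | "balanced" | "creative" | "custom";
type NamedPreset = Exclude<Preset, "custom">;

interface Sampling {
  preset: Preset;
  temperature?: number;
  top_p?: number;
  top_k?: number;
  repetition_penalty?: number;
}

type SamplingKey = "temperature" | "top_p" | "top_k" | "repetition_penalty";

// Placeholder numbers for illustration only; not GemX's real presets.
const PRESETS: Record<NamedPreset, Omit<Sampling, "preset">> = {
  precise: { temperature: 0.2, top_p: 0.9 },
  balanced: { temperature: 0.7, top_p: 0.95 },
  creative: { temperature: 1.0, top_p: 0.98 },
};

function applyPreset(preset: NamedPreset): Sampling {
  return { preset, ...PRESETS[preset] };
}

// Editing any raw value flips the preset to "custom".
function editValue(s: Sampling, key: SamplingKey, value: number): Sampling {
  const next: Sampling = { ...s, preset: "custom" };
  next[key] = value;
  return next;
}

// Build the request fields, including only parameters that are set.
function samplingParams(s: Sampling): Record<string, number> {
  const out: Record<string, number> = {};
  const keys: SamplingKey[] = ["temperature", "top_p", "top_k", "repetition_penalty"];
  for (const key of keys) {
    const value = s[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

const edited = editValue(applyPreset("precise"), "top_k", 40);
const requestFields = samplingParams(edited);
```

Leaving unset parameters out of the request, rather than sending nulls or defaults, keeps the server's own defaults in effect for anything the user has not touched.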

Use GemX From Anything — Local API Server

  • An OpenAI-compatible endpoint. Flip on Settings ▸ Developer ▸ Local API server and point any OpenAI client — your editor, a script, any agent tool — at http://127.0.0.1:11535/v1. It's a thin proxy in front of the running MLX server, so it serves whichever model is currently loaded — a built-in Gemma 4 variant or one of your own custom models.
  • Works with agent harnesses, not just chat. GemX emits native OpenAI tool_calls (Gemma 4 via mlx-vlm, verified end-to-end), so agentic coding tools — Cline, Kilo Code, Continue.dev, Zed, JetBrains AI Assistant, Goose, OpenCode — can drive it, not only chat clients. Setup guides (and the two tools that can't connect, and why) are in docs/api-clients.md. Prefer Gemma 4 12B with a raised context window for agent work.
  • Safe by default. Off until you enable it, bound to 127.0.0.1 only, and limited to /v1/* routes. An optional bearer token can gate requests; LAN exposure (binding 0.0.0.0) is a separate, explicit opt-in that requires a token, with a visible warning. The panel shows a copyable base URL and a live running/stopped dot.
  • Correct for real clients. The proxy pins the request model to the loaded model (the MLX server would otherwise try to load whatever id a client sends) and answers GET /v1/models locally with exactly the loaded model, so auto-discovery clients aren't misled.
  • Zero new dependencies. Built on Node's built-in http, streams responses (SSE) through unbuffered, and survives model switches untouched since it forwards to a fixed internal port.
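
The model pinning, the locally answered GET /v1/models, and the optional bearer-token gate described above can be sketched as three small pure functions. This is a minimal illustration, not the code in apiServer.ts: the helper names, the owned_by placeholder, and the exact header comparison are assumptions.

```typescript
// Hypothetical helpers; GemX's apiServer.ts may be structured differently.
interface ChatBody {
  model?: string;
  [key: string]: unknown;
}

// Overwrite whatever model id the client sent with the loaded model,
// leaving every other field of the request body untouched.
function pinModel(rawBody: string, loadedModel: string): string {
  const body = JSON.parse(rawBody) as ChatBody;
  return JSON.stringify({ ...body, model: loadedModel });
}

// Answer GET /v1/models locally with exactly one entry.
// "owned_by" is a placeholder value chosen for this sketch.
function modelsResponse(loadedModel: string) {
  return {
    object: "list",
    data: [{ id: loadedModel, object: "model", owned_by: "gemx" }],
  };
}

// Bearer-token gate: when no token is configured, every request passes.
function isAuthorized(authHeader: string | undefined, token?: string): boolean {
  if (!token) return true;
  return authHeader === `Bearer ${token}`;
}

const pinnedBody = pinModel(
  '{"model":"gpt-4o","messages":[]}',
  "mlx-community/gemma-4-e4b-it-4bit",
);
```

A client that hard-codes a model id such as "gpt-4o" would therefore still reach the loaded model instead of triggering a load attempt for an unknown id.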

Run It Headless — gemx serve From The Terminal

  • No GUI required, Ollama-style. Run the same server straight from the terminal. One-time setup puts a small wrapper script on your PATH:

    # ⚠️ Use a wrapper script, NOT a bare symlink — macOS Electron resolves its bundled
    # helper apps relative to the executable path, so a symlinked launch dies with
    # "Unable to find helper app." If you made a symlink before, remove it first.
    sudo tee /usr/local/bin/gemx >/dev/null <<'EOF'
    #!/bin/bash
    exec "/Applications/GemX.app/Contents/MacOS/GemX" "$@"
    EOF
    sudo chmod +x /usr/local/bin/gemx
    
    gemx serve                                   # last-used model, loopback, port 11535
    gemx serve --port 8080 --model mlx-community/gemma-4-e4b-it-4bit
    gemx serve --lan --token my-secret           # expose on the LAN with auth
    gemx serve --help
  • Same engine, no window. A --serve mode of the app binary boots the MLX runtime + the API proxy with the Dock icon hidden, prints the endpoint, and stays in the foreground until you Ctrl-C (which frees both ports cleanly). Launch the GemX app once first so the model weights and Python runtime are set up. (Building from source? npm run serve does the same against a dev build.)

  • GUI and gemx serve are mutually exclusive. They're the same app sharing one macOS instance and one model server (port 11534), not separate daemons — so run one or the other. If you want the UI and an endpoint at the same time, skip gemx serve and just turn on the in-app Settings ▸ Developer ▸ Local API server.
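
For readers curious how the flags shown above might be handled, here is a minimal sketch of an argument parser. It is not the parseServeArgs in src/main/index.ts: the ServeOptions shape, the error messages, and the choice to reject --lan without --token at parse time are assumptions based on the "LAN requires a token" rule and the default port stated in these notes.

```typescript
// Hypothetical option shape for this sketch.
interface ServeOptions {
  port: number;
  model?: string;
  lan: boolean;
  token?: string;
  help: boolean;
}

function parseServeArgs(argv: string[]): ServeOptions {
  // Defaults from the notes: loopback only, port 11535, last-used model.
  const opts: ServeOptions = { port: 11535, lan: false, help: false };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    switch (arg) {
      case "--port": {
        const port = Number(argv[++i]);
        if (!Number.isInteger(port) || port < 1 || port > 65535) {
          throw new Error("--port needs a number between 1 and 65535");
        }
        opts.port = port;
        break;
      }
      case "--model":
      case "--token": {
        const value = argv[++i];
        if (value === undefined) throw new Error(`${arg} needs a value`);
        if (arg === "--model") opts.model = value;
        else opts.token = value;
        break;
      }
      case "--lan":
        opts.lan = true;
        break;
      case "--help":
        opts.help = true;
        break;
      default:
        throw new Error(`unknown option: ${arg}`);
    }
  }
  // Assumption: LAN exposure without a token is refused outright.
  if (opts.lan && !opts.token) throw new Error("--lan requires --token");
  return opts;
}

// Loopback unless LAN exposure was explicitly requested.
function bindHost(opts: ServeOptions): string {
  return opts.lan ? "0.0.0.0" : "127.0.0.1";
}
```

With this sketch, parseServeArgs(["--lan", "--token", "my-secret"]) yields options that bind to 0.0.0.0, while an empty argument list keeps the server on 127.0.0.1.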

Redesigned Settings — Sectioned & Navigable

  • A real settings layout. The old single flat panel is now a modal with a left nav rail (a horizontal tab strip on narrow windows) across five clearly named sections: Behavior (thinking, web search + Tavily, context window, sampling), Personalization (custom instructions, personas), Models (downloaded built-ins and custom models, HuggingFace token), Appearance (theme), and Developer (the API server). The confusing "Model" / "Models & Storage" pairing is gone.
  • Extracted from the sidebar. The settings UI moved into its own component, leaving the sidebar to focus on conversations.

Find Things Faster — Search, Palette, Context Meter

  • Conversation search. A search box at the top of the sidebar filters chats by title and message contents, across both Pinned and Chats.
  • Command palette. ⌘K opens a filtered command list — new chat, search, toggle sidebar, open settings, toggle thinking, switch model. Plus ⌘N (new chat) and ⌘F (focus search); ⌘B still toggles the sidebar.
  • Context-usage meter. The composer footer shows a live used / max token gauge (green → amber → red as you approach the limit), so you can see how full the context window is before older history starts being trimmed.
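
The context meter's logic can be approximated as follows. Both the 4-characters-per-token estimate and the 70% / 90% color thresholds are assumptions chosen for illustration; the notes only say that the estimate is computed client-side and that the gauge moves from green through amber to red.

```typescript
type MeterColor = "green" | "amber" | "red";

// Rough heuristic: about 4 characters per token (an assumption, not GemX's formula).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical thresholds: amber from 70% of the window, red from 90%.
function meterColor(used: number, max: number): MeterColor {
  const ratio = max > 0 ? used / max : 1;
  if (ratio >= 0.9) return "red";
  if (ratio >= 0.7) return "amber";
  return "green";
}

// Sum the estimates over all messages and pair them with a color.
function contextMeter(messages: string[], max: number) {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return { used, max, color: meterColor(used, max) };
}

const meter = contextMeter(["Hello there", "How can I help?"], 8192);
```

An estimate like this will drift from the tokenizer's real count, which is acceptable for a gauge whose job is to warn before trimming rather than to measure exactly.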

Quieter Input Footer

  • Only what matters. The verbose "mic for voice / drop a file" hints below the input are gone. In their place: compact indicators that appear only when active — the conversation's persona, a Thinking badge when reasoning is on, and a Web badge when search is enabled — alongside the context meter and a minimal ⏎ send · ⇧⏎ newline hint.

Under the Hood

  • New src/main/apiServer.ts — the opt-in OpenAI-compatible reverse proxy (Node http, no new dependency) — plus IPC channels apiserver:set-config / apiserver:get-status and a synchronous teardown wired into app quit. It pins the request model to the loaded model and synthesizes /v1/models.
  • Headless serve mode (parseServeArgs / runHeadless) added to src/main/index.ts; MLX_PORT is now exported so the proxy can target it. New bin/gemx wrapper shim, an npm run serve script, and a package "bin" entry. Serve mode also disables the GPU / network-service helpers since a windowless server needs neither.
  • New renderer pieces: components/Settings.tsx (sectioned modal), components/CommandPalette.tsx (⌘K), and lib/tokens.ts (client-side token estimate for the meter).
  • AppSettings gained customInstructions, personas, sampling, and apiServer; ChatRequest gained systemPromptExtra and sampling; chatSystemPrompt(enableTools) became chatSystemPrompt(enableTools, extra?). All new settings are defaulted on read, so existing preferences upgrade silently.
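
"Defaulted on read" could be implemented along the lines of the sketch below, where stored preferences from an older version are merged over a complete set of defaults. The field shapes, the theme field, and the default values other than the API server being off on port 11535 are assumptions for this example; the real AppSettings type is not shown in these notes.

```typescript
// Hypothetical shapes; only the four new field names come from the notes.
interface ApiServerSettings {
  enabled: boolean;
  port: number;
  lan: boolean;
  token?: string;
}

interface AppSettings {
  theme: string; // stands in for any pre-1.0 setting
  customInstructions: string;
  personas: { id: string; label: string; prompt: string }[];
  sampling: { preset: string };
  apiServer: ApiServerSettings;
}

const DEFAULTS: AppSettings = {
  theme: "system",
  customInstructions: "",
  personas: [],
  sampling: { preset: "balanced" }, // assumed default preset
  apiServer: { enabled: false, port: 11535, lan: false },
};

// Merge stored values over the defaults so that fields missing from an
// older settings file appear with default values instead of undefined.
function readSettings(raw: string | null): AppSettings {
  const stored = (raw ? JSON.parse(raw) : {}) as Partial<AppSettings>;
  return {
    ...DEFAULTS,
    ...stored,
    sampling: { ...DEFAULTS.sampling, ...stored.sampling },
    apiServer: { ...DEFAULTS.apiServer, ...stored.apiServer },
  };
}

// A settings file written before 1.0 knows nothing about the new fields.
const upgraded = readSettings('{"theme":"dark"}');
```

Merging the nested objects separately matters: a shallow spread alone would let a partially stored apiServer object drop the fields it does not mention.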

Upgrading

Drop the new GemX.app in over the old one. Your conversations, downloaded models, custom models, settings, and HuggingFace / Tavily keys are all preserved — new settings simply appear with sensible defaults. The local API server stays off until you enable it. For terminal use, install the wrapper script once (see Run It Headless above — not a bare symlink) and launch the app at least once so the runtime is installed.