Skip to content

GemX-v0.2.0

Choose a tag to compare

@Avaneesh40585 Avaneesh40585 released this 06 Jun 05:22

v0.2.0 Release Notes

The first follow-up to the initial release brings a new mid-tier model, opt-in step-by-step reasoning, a completely rebuilt download pipeline, a real settings panel for managing local model storage, and a layout that finally adapts to your window size.

New Models & Context Defaults

  • Gemma 4 12B added: A dense mid-tier variant (mlx-community/gemma-4-12B-it-4bit, ~11 GB on disk) now sits between E4B and the 26B MoE. Default 32K context, native max 256K. Sharper reasoning than E4B without the MoE's RAM cliff. Comfortable on a 24 GB Mac; tight at 16 GB.
  • Context window defaults rebalanced: E4B drops 32K → 16K to leave KV-cache headroom on 8 GB Macs. The 26B MoE drops 128K → 64K to keep its boot-time reservation manageable. 31B rises 32K → 64K to align with the other dense variants. All five models still expose their full native max (128K for E2B/E4B, 256K for 12B / 26B MoE / 31B) via the Context Window slider.
  • Disk sizes refined across the registry: E2B ~4 GB, E4B ~5.5 GB, 12B 11 GB, 26B MoE ~16 GB, 31B ~18 GB. The "27B MoE" label was also corrected to "26B MoE" to match Google's naming.

Thinking Mode (Opt-In)

Gemma 4's built-in step-by-step reasoning is now plumbed end-to-end:

  • Settings → Thinking → Enabled flips enable_thinking: true on every chat request. The model emits its reasoning trace before the final answer.
  • Collapsible Thinking block appears above the answer. The header cycles through "Thinking → Considering → Planning → Pondering → Reasoning → Sketching" while the trace streams in, then collapses to "Thought process" once the final answer begins. Click any time to re-expand.
  • Full markdown inside the block — syntax-highlighted code, KaTeX math, GFM tables, clickable links — same renderer as the main answer.
  • No context-window leak: Past thinking traces are stripped at the IPC boundary on every subsequent turn (defense in depth: Gemma 4's chat template also strips them server-side via its strip_thinking() macro). Reasoning lives in the UI only.

Rebuilt Download Pipeline

The first-launch download was replaced end-to-end after the deprecated huggingface-cli stopped working:

  • Manifest-driven byte progress: Total size is fetched from HfApi.model_info() before the first byte streams, so the progress bar is correct from frame zero — no more "stuck at 0%".
  • Resume detection: Partial blobs from interrupted previous attempts are detected on next launch; the UI swaps the verb from "Downloading" to "Resuming" and only fetches what's missing.
  • Cancel button: Interrupt a download mid-stream and drop back to the model picker — no orphaned Python children, no half-written blobs in an indeterminate state.
  • Authoritative completion marker: A .gemx-verified file is written only after snapshot_download(..., local_files_only=True) integrity-checks the cache. Subsequent launches trust this marker exactly — no more "model present but didn't load" edge cases.

Self-Healing MLX Runtime

  • Auto-upgrade on version mismatch: When the bundled mlx-vlm is too old to load a model (e.g., gemma4_unified requires ≥ 0.6.1), the app catches the typed error, runs pip install --upgrade, and retries boot once. No restart, no manual venv surgery.
  • Lifecycle hardening: A single-flight server wrapper handles dev hot-reload double-mounts cleanly. Orphaned port holders from previous force-quits are reclaimed via lsof before the new server binds. On app quit, the Python child gets a synchronous SIGKILL so nothing leaks across runs.

Settings Panel Expansion

  • Downloaded Models: Every cached model is now listed with size and active/default tags, and one-click delete for any that aren't currently loaded. Reclaim disk space without leaving the app.
  • Context Window: Power-of-2 slider from 4K up to the model's native max, with a Reset button to fall back to the default.
  • Tavily API Key: Paste your free Tavily key (1,000 searches/month, no card required) for faster, more reliable web research. DuckDuckGo remains the fallback when no key is set.
  • HuggingFace Token: Optional token field that speeds up gated-repo downloads.
  • Thinking: Toggle described above.

Responsive Layout

The whole app now adapts to its window size instead of holding a fixed-width column:

  • Minimum window 700×500 (was 820×560) — usable in split-screen and on smaller displays.
  • Sidebar tiers: w-60 on roomy windows, w-52 between 640–768px, auto-collapsed below 640px on first launch (⌘B and the chevron still toggle manually).
  • Settings goes horizontal at ≥ 1024px: The modal becomes a wide 2-column layout — HF Token, Web Search, Thinking, and Context Window on the left, Downloaded Models on the right — instead of a tall scrolling stack.
  • Setup screens go horizontal at ≥ 1024px: The welcome screen splits into intro / model picker; the in-progress screen splits into stages + error / download progress + cancel.
  • Chat area scales padding, the header model dropdown shrinks gracefully, and the composer wraps cleanly at very narrow widths.

UX Polish

  • Manual stop now shows a clear "Response stopped" indicator instead of leaving the bubble blank.
  • Generation errors show a clean "Something went wrong while generating the response." card instead of streaming raw stack traces into chat.
  • Welcome card padding fixed for short windows.
  • LM Studio comparison updated to reflect their new mlx-vlm support. GemX's remaining differentiators: on-device Whisper voice input, the multi-step web research workflow with inline [N] citations, and the MIT open-source license.

Please see the README in the main repository for complete installation instructions, system requirements, updated model memory specifications, and troubleshooting steps.