App deadlocks at launch when a Qwen3.5 GGUF is selected (bundled llama.framework hangs in ggml_metal_rsets_init) #262

@Sewagewaste

Summary

On a fresh launch with cotabbySelectedModelFilename = Qwen3.5-2B-Q4_K_M.gguf, Cotabby's main thread deadlocks before the menu bar icon is created. No suggestions ever appear. With the model unselected (or the file moved aside), Cotabby launches normally — so the hang is triggered by the bundled llama.framework trying to load this specific model.

The same GGUF loads in ~2 seconds on the same machine with both standalone llama.cpp (Homebrew build 9310 (e2ef8fe42)) and LM Studio's bundled llama-server (v2.16.0), with Metal enabled in both. The model file is therefore fine; the bug is in Cotabby's bundled llama build.

Environment

  • Cotabby 0.1.1-beta (build 30), bundle id com.jacobfu.tabby
  • macOS 26.4.1 (25E253), Apple Silicon (M4), 16 GB
  • Bundled llama.framework/Versions/A/llama is 9.7 MB, fat (x86_64 + arm64). The only version string I can find in the binary is b8635075f; please confirm which upstream commit this maps to.
  • Engine: llamaOpenSource

Model

  • Repo: lmstudio-community/Qwen3.5-2B-GGUF
  • File: Qwen3.5-2B-Q4_K_M.gguf (1.27 GB)
  • GGUF v3, general.architecture = qwen35, 320 tensors, qwen35.context_length = 262144
  • The bundled llama binary already contains qwen35.cpp symbols, so model-arch dispatch isn't the blocker.
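For anyone who wants to double-check the header values above on their own copy of the file, here is a minimal sketch that reads only the fixed-size GGUF header (magic, version, tensor count, metadata key/value count). It assumes a little-endian GGUF v2 or later file; the function name read_gguf_header is mine, not from any library, and it does not parse the metadata itself (so it will not show general.architecture or the context length).

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes a little-endian GGUF v2+ file, where both counts are uint64.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 version + 2 x uint64 counts
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count
```

On the file from this report, the first two values should come back as 3 and 320 if the header matches what is listed above.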

Reproduction

  1. Place Qwen3.5-2B-Q4_K_M.gguf in the Cotabby model folder.
  2. Open Cotabby → settings → Engine: Open Source, pick that model.
  3. Quit Cotabby. Relaunch.
  4. Menu bar icon never appears; app sits at ~248 MB RSS with all dispatch queues blocked.

If the file is moved out of the model folder before launch, Cotabby starts cleanly (≈77 MB RSS, menu bar icon present).

Stack (sampled with sample)

Main thread and every cooperative-queue task are blocked on a pthread mutex:

1653 Thread … DispatchQueue_1: com.apple.main-thread  (serial)
  completeTaskWithClosure → … (Cotabby) … →
  _pthread_mutex_firstfit_lock_slow → _pthread_mutex_firstfit_lock_wait → __psynch_mutexwait

The thread holding the lock is spinning forever in ggml_metal_rsets_init:

1661 Thread … DispatchQueue_13: com.apple.root.default-qos
  start_wqthread → _pthread_wqthread → _dispatch_worker_thread2 → _dispatch_root_queue_drain
  → _dispatch_client_callout → _dispatch_call_block_and_release
  → __ggml_metal_rsets_init_block_invoke  (in llama) + 116
  → usleep → nanosleep → __semwait_signal

(Full sample available on request.)
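To make the two excerpts above easier to compare, here is a small sketch that pulls out the innermost frame per thread from a condensed trace in the same shape as the ones I pasted (a "<count> Thread ..." header followed by indented frames joined with "→"). This is not a parser for raw sample output; the helper names leaf_frames and classify are hypothetical, and the classification only covers the two leaf frames seen in this report.

```python
import re

# Matches a thread header such as "1653 Thread ... DispatchQueue_1: ...".
THREAD_RE = re.compile(r"^\s*\d+\s+Thread\b")


def leaf_frames(condensed):
    """Map each thread header line to its innermost (last listed) frame."""
    result = {}
    current = None
    for line in condensed.splitlines():
        if THREAD_RE.match(line):
            current = line.strip()
            result[current] = None
            continue
        if current is None:
            continue
        frames = [f.strip() for f in line.split("→") if f.strip()]
        if frames:
            result[current] = frames[-1]  # later lines are deeper in the stack
    return result


def classify(leaf):
    """Label the two leaf frames that appear in this report."""
    if leaf == "__psynch_mutexwait":
        return "blocked on pthread mutex"
    if leaf == "__semwait_signal":
        return "sleeping (usleep/nanosleep)"
    return "other"
```

Run over the two excerpts above, this labels the main thread as blocked on a pthread mutex and the root.default-qos worker as sleeping.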

Side-by-side check that the model is fine

LM Studio's llama-server (v2.16.0, 5306f4b), same machine, Metal on, default args:

load_model: model loaded                      … in 0.84 s
prompt eval: 35.77 tok/s · eval: 60.30 tok/s

Stock Homebrew llama-cli 9310 (e2ef8fe42) also loads it without issue.

Likely cause / suggestion

The hang is in __ggml_metal_rsets_init_block_invoke, which loops on usleep (the __semwait_signal frame is just how nanosleep sleeps on macOS) waiting for a condition that is never met, while the main thread is blocked on the mutex this thread appears to hold. This pattern shows up in older ggml-metal resource-set init paths when Metal device init races with the loader thread. Most fixes I've seen for this landed in much newer llama.cpp builds, so Cotabby's bundled b8635075f may simply predate the working fix.
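As a toy illustration of the shape of hang I think the sample shows (one thread sleep-polling while holding a lock, another thread waiting on that lock), here is a self-contained sketch. It is not ggml or Cotabby code; every name in it is made up, and the timeout and stop flag exist only so the demo terminates, whereas the real wait has no timeout.

```python
import threading
import time


def demo_hold_and_poll(timeout=0.2):
    """Return True if the waiter got the lock, False if it timed out."""
    lock = threading.Lock()
    ready = threading.Event()    # condition the holder polls for; never set here
    holding = threading.Event()  # signals that the holder has taken the lock
    stop = threading.Event()     # lets the demo shut the holder down

    def holder():
        with lock:
            holding.set()
            # Sleep-poll for the condition while still holding the lock.
            while not ready.is_set() and not stop.is_set():
                time.sleep(0.01)

    worker = threading.Thread(target=holder)
    worker.start()
    holding.wait()

    # Stand-in for the main thread: it cannot get the lock while the holder polls.
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()

    stop.set()
    worker.join()
    return acquired
```

Calling demo_hold_and_poll() returns False, because the lock is never released while the polled condition stays unmet.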

Suggested fix: bump the bundled llama.framework to current upstream (the build I tested, 9310, is fine). Ideally, also clamp the loaded context length to a sane default: the model declares qwen35.context_length = 262144, and if Cotabby preallocates that, it adds memory pressure even though the hang itself is the primary bug.
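The clamping idea could look roughly like the sketch below. The function name and the 4096 default cap are placeholders I picked for illustration; I don't know how Cotabby currently chooses its context size, so treat this as a suggestion of the logic rather than a patch.

```python
def clamp_context_length(declared, cap=4096):
    """Pick the context size to request: the model's declared length, capped.

    `declared` is the context length from the model metadata; `cap` is a
    hypothetical app-side default, not a value taken from Cotabby.
    """
    if declared <= 0 or cap <= 0:
        raise ValueError("context lengths must be positive")
    return min(declared, cap)
```

With the value this model declares, clamp_context_length(262144) returns 4096, while a model declaring a smaller context (say 2048) keeps its own value.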

Workaround for users on the current build

  • Do not select Qwen3.5 (or any other architecture newer than the bundled b8635075f build) until the framework is bumped.
  • Cotabby's recommended Qwen3-0.6B-Q4_K_M and Gemma-3-E2B-Q4_K_M load and run fine.
