App deadlocks at launch when a Qwen3.5 GGUF is selected (bundled llama.framework hangs in ggml_metal_rsets_init)
Summary
On a fresh launch with cotabbySelectedModelFilename = Qwen3.5-2B-Q4_K_M.gguf, Cotabby's main thread deadlocks before the menu bar icon is created. No suggestions ever appear. With the model unselected (or the file moved aside), Cotabby launches normally — so the hang is triggered by the bundled llama.framework trying to load this specific model.
The same GGUF loads in ~2 seconds on the same machine using standalone llama.cpp (Homebrew build 9310 (e2ef8fe42)) and LM Studio's bundled llama-server (v2.16.0), with Metal enabled in both. The model file is therefore fine, which points to Cotabby's bundled llama build as the source of the bug.
Environment
- Cotabby 0.1.1-beta (build 30), bundle id com.jacobfu.tabby
- macOS 26.4.1 (25E253), Apple Silicon (M4), 16 GB
- Bundled llama.framework/Versions/A/llama is 9.7 MB, fat (x86_64 + arm64). Only version string I can find in the binary is b8635075f; please confirm which upstream commit this maps to.
- Engine: llamaOpenSource
Model
- Repo: lmstudio-community/Qwen3.5-2B-GGUF
- File: Qwen3.5-2B-Q4_K_M.gguf (1.27 GB)
- GGUF v3, general.architecture = qwen35, 320 tensors, qwen35.context_length = 262144
- The bundled llama binary already contains qwen35.cpp symbols, so model-arch dispatch isn't the blocker.
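For anyone who wants to double-check the header facts above (GGUF version and tensor count) without loading the model, here is a minimal sketch. It assumes a little-endian GGUF v2/v3 file, where the fixed header is the 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. The function and type names are my own, not part of llama.cpp or Cotabby.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumes little-endian GGUF v2 or v3)."""
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # uint32 version + uint64 tensor count + uint64 metadata KV count = 20 bytes
    raw = stream.read(20)
    if len(raw) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw)
    return GGUFHeader(version, tensor_count, kv_count)
```

Usage would be something like `read_gguf_header(open("Qwen3.5-2B-Q4_K_M.gguf", "rb"))`. Reading general.architecture or qwen35.context_length requires walking the metadata key/value section, which this sketch does not attempt.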
Reproduction
- Place Qwen3.5-2B-Q4_K_M.gguf in the Cotabby model folder.
- Open Cotabby → settings → Engine: Open Source, pick that model.
- Quit Cotabby. Relaunch.
- Menu bar icon never appears; app sits at ~248 MB RSS with all dispatch queues blocked.
If the file is moved out of the model folder before launch, Cotabby starts cleanly (≈77 MB RSS, menu bar icon present).
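Moving the file aside can be scripted if you need to flip between the hanging and working states repeatedly. This is only a sketch: the model folder location is not stated in this report, so it is passed in as a parameter, and the helper name and holding directory are hypothetical.

```python
import shutil
from pathlib import Path


def move_model_aside(model_dir: Path, filename: str, holding_dir: Path) -> Path:
    """Move one model file out of the model folder; return its new location."""
    src = model_dir / filename
    if not src.is_file():
        raise FileNotFoundError(src)
    holding_dir.mkdir(parents=True, exist_ok=True)
    dst = holding_dir / filename
    shutil.move(str(src), str(dst))
    return dst
```

Quit Cotabby before moving the file, then relaunch. Swapping the two directory arguments moves the file back to reproduce the hang again.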
Stack (sampled with sample)
Main thread and every cooperative-queue task are blocked on a pthread mutex:
1653 Thread … DispatchQueue_1: com.apple.main-thread (serial)
completeTaskWithClosure → … (Cotabby) … →
_pthread_mutex_firstfit_lock_slow → _pthread_mutex_firstfit_lock_wait → __psynch_mutexwait
The thread holding the lock is spinning forever in ggml_metal_rsets_init:
1661 Thread … DispatchQueue_13: com.apple.root.default-qos
start_wqthread → _pthread_wqthread → _dispatch_worker_thread2 → _dispatch_root_queue_drain
→ _dispatch_client_callout → _dispatch_call_block_and_release
→ __ggml_metal_rsets_init_block_invoke (in llama) + 116
→ usleep → nanosleep → __semwait_signal
(Full sample available on request.)
Side-by-side check that the model is fine
LM Studio's llama-server (v2.16.0, 5306f4b), same machine, Metal on, default args:
load_model: model loaded … in 0.84 s
prompt eval: 35.77 tok/s · eval: 60.30 tok/s
Stock Homebrew llama-cli 9310 (e2ef8fe42) also loads it without issue.
Likely cause / suggestion
The hang appears to be in __ggml_metal_rsets_init_block_invoke, which sits in a usleep loop waiting for a condition that never becomes true while, per the sample above, it holds the mutex the main thread is blocked on. (On macOS, usleep bottoms out in __semwait_signal, so that frame is the sleep itself rather than a wait on an application-level semaphore.) This pattern shows up in older ggml-metal Resource Sets init paths when the Metal device init races with the loader thread. Most fixes I've seen for this came in much newer llama.cpp builds; Cotabby's bundled b8635075f may simply predate the working fix.
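To make the suspected shape of the hang concrete, here is a toy model in Python. It is not ggml or Cotabby code, and every name in it is made up: a worker takes a lock and then polls with short sleeps for a flag that nobody sets, while the "main" side tries to take the same lock. A timeout stands in for what would be an indefinite block in the real app.

```python
import threading
import time


def main_gets_lock(wait_timeout: float = 0.2) -> bool:
    """Return True if the 'main' side acquired the lock, False if it timed out."""
    lock = threading.Lock()
    stop = threading.Event()     # exit condition the worker polls for
    holding = threading.Event()  # set once the worker owns the lock

    def worker() -> None:
        with lock:
            holding.set()
            while not stop.is_set():  # nothing sets this while main is waiting
                time.sleep(0.01)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    holding.wait()
    # The real main thread would block here with no timeout.
    acquired = lock.acquire(timeout=wait_timeout)
    if acquired:
        lock.release()
    stop.set()  # release the toy worker so the demo terminates
    t.join()
    return acquired
```

In this model `main_gets_lock()` returns False, because the worker never leaves its polling loop while the main side is waiting. Whether the bundled build fails for exactly this reason is my guess from the sample, not something I have confirmed in its source.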
Suggested fix: bump the bundled llama.framework to current upstream (the build I tested, 9310, is fine), and ideally clamp the loaded context length to a sane default (the model declares qwen35.context_length = 262144; if Cotabby is preallocating that, it adds memory pressure even when the hang itself is the primary bug).
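The clamp itself is a one-liner. A sketch, assuming a cap of 4096 tokens, which is a placeholder I picked for illustration and not a value taken from Cotabby or llama.cpp:

```python
DEFAULT_MAX_CONTEXT = 4096  # hypothetical cap, chosen only for illustration


def effective_context_length(declared: int, cap: int = DEFAULT_MAX_CONTEXT) -> int:
    """Use the model's declared context length, but never more than the cap."""
    if declared <= 0 or cap <= 0:
        raise ValueError("context lengths must be positive")
    return min(declared, cap)
```

With the value this model declares, `effective_context_length(262144)` yields 4096, while a model declaring 2048 keeps its own 2048.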
Workaround for users on the current build
- Do not select Qwen3.5 (or any post-b8635075f arch) until the framework is bumped.
- Cotabby's recommended Qwen3-0.6B-Q4_K_M and Gemma-3-E2B-Q4_K_M load and run fine.