App deadlocks at launch when a Qwen3.5 GGUF is selected (bundled llama.framework hangs in ggml_metal_rsets_init)

# App deadlocks at launch when a Qwen3.5 GGUF is selected (bundled llama.framework hangs in `ggml_metal_rsets_init`)

## Summary
On a fresh launch with `cotabbySelectedModelFilename = Qwen3.5-2B-Q4_K_M.gguf`, Cotabby's main thread deadlocks before the menu bar icon is created. No suggestions ever appear. With the model unselected (or the file moved aside), Cotabby launches normally — so the hang is triggered by the bundled `llama.framework` trying to load this specific model.

The same GGUF loads in ~2 seconds on the same machine with both standalone `llama.cpp` (Homebrew build `9310 (e2ef8fe42)`) and LM Studio's bundled `llama-server` (`v2.16.0`), with Metal enabled in both. The model file is therefore fine, and the bug appears to be in Cotabby's bundled llama build.

## Environment
- Cotabby **0.1.1-beta** (build 30), bundle id `com.jacobfu.tabby`
- macOS **26.4.1** (25E253), Apple Silicon (M4), 16 GB
- Bundled `llama.framework/Versions/A/llama` is 9.7 MB, fat (x86_64 + arm64). Only version string I can find in the binary is `b8635075f` — please confirm which upstream commit this maps to.
- Engine: `llamaOpenSource`

## Model
- Repo: [lmstudio-community/Qwen3.5-2B-GGUF](https://huggingface.co/lmstudio-community/Qwen3.5-2B-GGUF)
- File: `Qwen3.5-2B-Q4_K_M.gguf` (1.27 GB)
- GGUF v3, `general.architecture = qwen35`, 320 tensors, `qwen35.context_length = 262144`
- The bundled `llama` binary already contains `qwen35.cpp` symbols, so model-arch dispatch isn't the blocker.
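
For anyone who wants to double-check the file-level facts above (GGUF version, tensor count) without loading the model, here is a minimal sketch that reads only the fixed-size GGUF header. It assumes the little-endian GGUF v2+ layout (4-byte magic, `uint32` version, `uint64` tensor count, `uint64` metadata key/value count); `read_gguf_header` and `GGUFHeader` are names made up for this example, not part of any library.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed 24-byte GGUF header from a binary stream."""
    raw = stream.read(24)
    if len(raw) < 24:
        raise ValueError("file too short to contain a GGUF header")
    # "<" = little-endian, no padding: 4s magic, I version, Q tensors, Q kv pairs
    magic, version, tensor_count, kv_count = struct.unpack("<4sIQQ", raw)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    if version < 2:
        # Assumption: v1 used 32-bit counts, so this layout would misread it.
        raise ValueError(f"unsupported GGUF version {version}")
    return GGUFHeader(version, tensor_count, kv_count)


# Usage (path is whatever your model folder is):
#   with open("Qwen3.5-2B-Q4_K_M.gguf", "rb") as f:
#       print(read_gguf_header(f))
```

For the file in this report, the header should show version 3 and 320 tensors if the values listed above are right. Metadata such as `general.architecture` lives after the header and is not parsed here.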

## Reproduction
1. Place `Qwen3.5-2B-Q4_K_M.gguf` in the Cotabby model folder.
2. Open Cotabby → settings → Engine: Open Source, pick that model.
3. Quit Cotabby. Relaunch.
4. Menu bar icon never appears; app sits at ~248 MB RSS with all dispatch queues blocked.

If the file is moved out of the model folder before launch, Cotabby starts cleanly (≈77 MB RSS, menu bar icon present).

## Stack (sampled with `sample`)

Main thread and every cooperative-queue task are blocked on a pthread mutex:
```
1653 Thread … DispatchQueue_1: com.apple.main-thread  (serial)
  completeTaskWithClosure → … (Cotabby) … →
  _pthread_mutex_firstfit_lock_slow → _pthread_mutex_firstfit_lock_wait → __psynch_mutexwait
```

The thread that appears to hold the lock is stuck in a sleep-and-poll loop inside the block dispatched by `ggml_metal_rsets_init`:
```
1661 Thread … DispatchQueue_13: com.apple.root.default-qos
  start_wqthread → _pthread_wqthread → _dispatch_worker_thread2 → _dispatch_root_queue_drain
  → _dispatch_client_callout → _dispatch_call_block_and_release
  → __ggml_metal_rsets_init_block_invoke  (in llama) + 116
  → usleep → nanosleep → __semwait_signal
```

(Full sample available on request.)

## Side-by-side check that the model is fine
LM Studio's `llama-server` (`v2.16.0`, `5306f4b`), same machine, Metal on, default args:
```
load_model: model loaded                      … in 0.84 s
prompt eval: 35.77 tok/s · eval: 60.30 tok/s
```
Stock Homebrew `llama-cli` `9310 (e2ef8fe42)` also loads it without issue.

## Likely cause / suggestion
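The worker is parked in `__ggml_metal_rsets_init_block_invoke`, looping on `usleep` (the `nanosleep → __semwait_signal` frames are just how `usleep` sleeps on macOS, not a wait on an app-level semaphore). It looks like it is polling for a condition that never becomes true, while the main thread waits on a mutex that is never released. As far as I can tell, this pattern shows up in older ggml-metal residency-set (`rsets`) init paths when the Metal device init races with the loader thread. Most fixes I've seen for this came in much newer llama.cpp builds, so Cotabby's bundled `b8635075f` may simply predate the working fix.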
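
To make the suspected shape of the hang concrete, here is a toy model in Python. It is not ggml or Cotabby code, only an illustration under the assumption that the reading above is right: a worker takes a lock and then sleep-polls a flag nobody ever sets, so a second thread can never acquire the lock. All names (`demo_blocked_lock`, `ready`, `give_up`) are hypothetical, and `give_up` exists only so the demo terminates.

```python
import threading
import time


def demo_blocked_lock(poll_interval: float = 0.01, main_timeout: float = 0.05) -> bool:
    """Return True if the 'main' side managed to take the lock, else False."""
    lock = threading.Lock()
    ready = threading.Event()    # the condition that never becomes true
    holding = threading.Event()  # lets the main side know the lock is taken
    give_up = threading.Event()  # escape hatch so this demo can finish

    def worker() -> None:
        with lock:
            holding.set()
            # Sleep-poll loop, analogous to the usleep frames in the sample.
            while not ready.is_set() and not give_up.is_set():
                time.sleep(poll_interval)

    t = threading.Thread(target=worker)
    t.start()
    holding.wait()

    # The real main thread blocks forever; here we bound the wait instead.
    acquired = lock.acquire(timeout=main_timeout)
    if acquired:
        lock.release()

    give_up.set()
    t.join()
    return acquired
```

In this sketch `demo_blocked_lock()` returns `False`: the lock attempt times out because the worker never leaves its polling loop. In the real app there is no timeout on the main thread's wait, which would match the frozen launch.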

**Suggested fix**: bump the bundled llama.framework to current upstream (the build I tested, `9310`, is fine), and ideally clamp the loaded context length to a sane default (the model declares `qwen35.context_length = 262144`; if Cotabby is preallocating that, it adds memory pressure even when the hang itself is the primary bug).
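
The context clamp could be as simple as the sketch below. It is written in Python for brevity rather than in Cotabby's own language, and both the function name and the 4096-token default cap are assumptions picked for illustration, not values taken from Cotabby or llama.cpp.

```python
from typing import Optional


def effective_context_length(
    declared: int,
    requested: Optional[int] = None,
    cap: int = 4096,  # hypothetical default, tune for the target hardware
) -> int:
    """Pick a context length that never exceeds the model's declared
    maximum or the app-level cap, whichever is smaller."""
    if declared <= 0 or cap <= 0:
        raise ValueError("declared and cap must be positive")
    wanted = declared if requested is None else requested
    if wanted <= 0:
        raise ValueError("requested must be positive")
    return min(wanted, declared, cap)
```

With the model in this report, `effective_context_length(262144)` yields 4096 instead of the declared 262144, and an explicit user request below the cap is honored as-is.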

## Workaround for users on the current build
- Do not select Qwen3.5 (or any post-`b8635075f` arch) until the framework is bumped.
- Cotabby's recommended Qwen3-0.6B-Q4_K_M and Gemma-3-E2B-Q4_K_M load and run fine.

