airc-queue card
Coordinates work via the AIRC queue substrate (airc#562). Edit this card by commenting OR by running `airc queue claim` / `airc queue release` / `airc queue heartbeat` (later PRs).
```json
{
  "kind": "airc-queue-card-v1",
  "id": "#627",
  "owner": "claude-tab-1",
  "status": "claimed",
  "evidence": "bulk-adopted from continuum unlabeled-open backlog (Joel directive 2026-05-14)",
  "next_action": "triage scope + assign or close stale"
}
```
Close this issue when the work is done (status=merged/abandoned).
Original issue body
Pre-adoption body
## The Headline Comparison
Developers care about VRAM, not parameter counts. Our compressed models should beat native models at the same VRAM tier.
## Eval Matrix

| Our Model | VRAM | Compare Against | Their VRAM | Why |
|---|---|---|---|---|
| 27B forged GGUF Q4 | ~10GB | Qwen3.5-7B, CodeLlama-7B | ~4-7GB | 27B brain in 7B body |
| 35B-A3B (16 experts) | ~3GB | Any 3B model | ~2-3GB | 35B patterns in 3B |
| 14B compacted GGUF Q4 | ~5GB | Qwen3.5-7B | ~4-7GB | Pruned 14B vs native 7B |
| 4B forged GGUF Q4 | 2.6GB | Qwen2.5-Coder-1.5B | ~1-2GB | Already done: 53% HumanEval |
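Since the comparison is pinned to VRAM tiers rather than parameter counts, a rough estimator helps sanity-check the tiers above. This is a sketch under stated assumptions: ~4.85 bits/weight is typical of llama.cpp Q4_K_M quants, and the 1GB overhead term is an assumed allowance for KV cache and runtime buffers, not a measured number. Forged/pruned models should come in well under the naive estimate for their original parameter count, which is the whole point of the matrix.

```python
def gguf_vram_gb(n_params: float, bits_per_weight: float = 4.85,
                 overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate in decimal GB for a quantized model.

    bits_per_weight ~4.85 is typical of llama.cpp Q4_K_M; overhead_gb is an
    assumed allowance for KV cache and runtime buffers, not a measured value.
    """
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb

# A native 7B at Q4_K_M lands inside the ~4-7GB tier from the matrix above.
print(round(gguf_vram_gb(7e9), 1))  # ~5.2
```

The same formula with an unpruned 27B gives ~17GB, which is why the forged 27B fitting in ~10GB is the headline.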
## Method

All HumanEval runs go through EvalPlus with greedy decoding; save the .jsonl proof files.
GGUF models are evaluated via llama-cpp-python (~10 min each).
fp16 models run via HF transformers (~24 hr each for 14B+).
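A minimal sketch of the GGUF leg, assuming llama-cpp-python and EvalPlus-style HumanEval records; the model path and the prompt loop are placeholders, and the exact scoring invocation should be checked against the installed EvalPlus version. Greedy decoding here just means temperature 0 with a single deterministic sample, and each completion is appended as one line of the .jsonl proof file.

```python
import json

def greedy_params(max_tokens: int = 512) -> dict:
    # Greedy decoding: temperature 0, no nucleus truncation, one sample.
    return {"temperature": 0.0, "top_p": 1.0, "max_tokens": max_tokens}

def sample_line(task_id: str, completion: str) -> str:
    # One EvalPlus-style record per line of the .jsonl proof file.
    return json.dumps({"task_id": task_id, "completion": completion})

if __name__ == "__main__":
    # Hypothetical wiring; requires llama-cpp-python and a local GGUF file.
    from llama_cpp import Llama
    llm = Llama(model_path="model-q4_k_m.gguf", n_gpu_layers=-1)  # placeholder path
    with open("samples.jsonl", "w") as f:
        for task_id, prompt in []:  # fill with EvalPlus HumanEval prompts
            out = llm(prompt, **greedy_params())
            f.write(sample_line(task_id, out["choices"][0]["text"]) + "\n")
```

Keeping the proof-file writer separate from the model loop makes it easy to reuse the same .jsonl format for the fp16 transformers runs.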
## Priority
- 27B forged GGUF vs 7B native (THE headline)
- 35B-A3B vs 3B native (MoE surgery headline)
- Controls (base models unforged)
## Related