Release v3.7.4 · Pavelevich/llm-checker

Published to npm as llm-checker@3.7.4. Folds in everything since 3.7.0 (3.7.1–3.7.4). The full test suite passes (48/48).

Highlights since 3.7.0:

Correct MoE memory sizing on ALL paths: weights are sized by the TOTAL parameter count and a real observed artifact size always wins, so a large MoE (e.g. a 236B / 397B-A17B model) can no longer falsely "fit" small hardware. Active params drive speed only.
A size-unknown Ollama variant (e.g. :latest) no longer inherits model_sizes[0]: qwen3:latest is now sized at ~9B (not 30B) and no longer poisons the size map for the real qwen3:30b, a 19GB model that was falsely reported as fitting in 16GB.
Multi-GPU VRAM is no longer double-counted (a 2x24=48GB box stays 48GB).
Recommendation diversity (3.7.1): the registry surfaces Hugging Face / GPT4All models, not just Ollama. Quant/shard variants of the same model collapse to one distinct pick, and a source that scores close to the top is guaranteed a slot. Use --runtime vllm|mlx|llama.cpp|transformers or --source to target explicitly.
Registry CLI validation (3.7.3): registry-search/registry-recommend reject invalid --source/--format/--runtime/--optimize with a clear error, and never silently fall back to the built-in catalog when no artifacts match.
Registry ingestor data quality (3.7.4): LoRA adapters and optimizer/training files are no longer ingested as models; F16/FP16/BF16 are precisions (not quantizations); GPT4All sizes/canonical ids fixed; dead index dropped. Regenerated seed: 3 sources, 3,259 repos, 32,779 artifacts.
filterByCategory and other guards hardened against malformed input; MCP cli_exec now exposes the registry commands.
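
The MoE sizing rule above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's code: the function names, the bytes-per-parameter table, and the 1.2 overhead factor are all assumptions made for this example.

```python
from typing import Optional

# Rough bytes per weight for a few precisions/quantizations (assumed values).
BYTES_PER_PARAM = {"q4": 0.5, "q8": 1.0, "f16": 2.0}


def estimate_weight_gb(total_params_b: float, quant: str = "q4",
                       observed_size_gb: Optional[float] = None) -> float:
    """Weight footprint in GB for a model with `total_params_b` billion params.

    A real observed artifact size always wins; otherwise the estimate uses
    the TOTAL parameter count, never the active (per-token) count.
    """
    if observed_size_gb is not None:
        return observed_size_gb
    return total_params_b * BYTES_PER_PARAM[quant]


def fits(total_params_b: float, memory_gb: float, quant: str = "q4",
         observed_size_gb: Optional[float] = None,
         overhead: float = 1.2) -> bool:
    """True if the weights plus a hypothetical runtime overhead fit in memory."""
    needed = estimate_weight_gb(total_params_b, quant, observed_size_gb) * overhead
    return needed <= memory_gb


# A 397B-total / 17B-active MoE is sized by its 397B total parameters.
print(estimate_weight_gb(397))            # 198.5
print(fits(397, memory_gb=24))            # False
# An observed 19GB artifact overrides the estimate and does not fit 16GB.
print(fits(30, memory_gb=16, observed_size_gb=19))  # False
```

Sizing by the 17B active parameters instead would give about 8.5GB at the same assumed quantization, which is the false "fit" the release describes; in this sketch the active count would only feed a speed estimate.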
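
The recommendation-diversity behaviour (variants collapse to one pick, and a close-scoring source is guaranteed a slot) might look roughly like the following Python sketch. The record fields (`model`, `source`, `score`), the `limit` and `close_margin` parameters, and the swap strategy are hypothetical choices for illustration, not the registry's actual algorithm.

```python
from collections import Counter


def diversify(candidates, limit=3, close_margin=5):
    """Pick up to `limit` distinct models, keeping close-scoring sources visible."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)

    # Collapse quant/shard variants: keep only the best artifact per model.
    best_per_model = {}
    for c in ranked:
        best_per_model.setdefault(c["model"], c)
    distinct = list(best_per_model.values())  # still in descending score order

    picks = distinct[:limit]
    if not picks:
        return picks

    top_score = picks[0]["score"]
    for c in distinct[limit:]:
        if top_score - c["score"] > close_margin:
            break  # everything after this scores even lower
        counts = Counter(p["source"] for p in picks)
        if c["source"] in counts:
            continue  # this source already has a slot
        # Swap out the lowest-ranked pick whose source is over-represented.
        for i in range(len(picks) - 1, 0, -1):
            if counts[picks[i]["source"]] > 1:
                picks[i] = c
                break

    return sorted(picks, key=lambda c: c["score"], reverse=True)


candidates = [
    {"model": "qwen3-30b", "variant": "q4", "source": "ollama", "score": 90},
    {"model": "qwen3-30b", "variant": "q8", "source": "ollama", "score": 88},
    {"model": "llama3", "variant": "q4", "source": "ollama", "score": 87},
    {"model": "mistral", "variant": "q4", "source": "ollama", "score": 86},
    {"model": "phi", "variant": "f16", "source": "huggingface", "score": 85},
]
print([p["model"] for p in diversify(candidates)])
# ['qwen3-30b', 'llama3', 'phi']
```

The top pick is never swapped out in this sketch, so a diverse source can only displace a lower-ranked pick from an over-represented source.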
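
The CLI validation rule (reject an invalid option value with a clear error, and never fall back silently when nothing matches) can be sketched as below. This is a Python illustration under stated assumptions: only the `--runtime` values come from these notes, while `validate_choice`, `recommend`, and the artifact record shape are hypothetical.

```python
# Runtime values listed in the notes above.
ALLOWED_RUNTIMES = ("vllm", "mlx", "llama.cpp", "transformers")


def validate_choice(flag, value, allowed):
    """Return `value` if it is allowed, otherwise raise a descriptive error."""
    if value not in allowed:
        raise ValueError(
            f"invalid {flag} '{value}'; expected one of: {', '.join(allowed)}"
        )
    return value


def recommend(artifacts, runtime):
    """Filter artifacts by runtime without substituting another catalog."""
    validate_choice("--runtime", runtime, ALLOWED_RUNTIMES)
    # No silent fallback: an empty match list is returned as-is so the
    # caller can report that nothing matched.
    return [a for a in artifacts if a["runtime"] == runtime]


artifacts = [{"name": "example-model", "runtime": "mlx"}]
print(recommend(artifacts, "mlx"))   # [{'name': 'example-model', 'runtime': 'mlx'}]
print(recommend(artifacts, "vllm"))  # []
```

Raising early keeps a typo such as `--runtime onnx` from being treated as "no filter", which is the failure mode a silent fallback would hide.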

Full notes: docs/reference/changelog.md
