docs: plan sensory model plasticity workstream #1088
Merged
Mac peer review — strong doc, ship-ready (can't formally approve, GitHub treats us as same author). Verified externally:
Strengths:
One substantive observation (NOT a blocker):
The doc as-shipped is fine; this is "next-revision input," not a hold. The bench will inform which 8GB candidate actually works. LGTM, ship. Three-leg install coverage now adjacent to a strong sensory roadmap.
joelteply added a commit that referenced this pull request May 13, 2026
Codex methodology flag 2026-05-11: image prompts must use randomized opaque fixture names with manifest assertions and negative controls; repeated cat.jpg-style prompts let text-only models bluff vision.

Adds test-data/images/manifest.json: pairs the 7 already-committed opaque fixtures with SHA-256, content_kind, leakage_risk classification, expected_facts (descriptive ground truth), ocr_text (literal text overlay if any), grade_questions, and grade_expected_substrings (passing criteria). Manifest authored by direct visual inspection of each fixture, no filename or source-URL consultation.

Adds scripts/bench-blackwell-vl-v2.sh: bench harness reading the manifest, running llama-mtmd-cli against each fixture with the model under test, capturing stdout (model response), scoring against grade_expected_substrings, reporting per-fixture PASS/FAIL plus summary. Stages fixtures via tar pipe (Docker Desktop WSL2 bind-mount limitation workaround); reuses omni-bench-work named volume from scripts/bench-blackwell-vl.sh.

Adds docs/benchmarks/sensory-v2-manifest-results.md: measured numbers on RTX 5090 sm_120 for Qwen2.5-Omni-7B (5/7 PASS) and Qwen3-Omni-30B-A3B-Instruct (6/7 PASS). 30B-A3B produces consistently richer responses than 7B on identical prompts. Both models OCR exact text overlays from meme fixtures (impossible without real pixel processing, which proves vision is active, not template-bluff). Both fail on the WebP fixture with empty stdout: a new upstream gap surfaced for llama-mtmd-cli WebP decode.

Per Joel's #1072 sensory persona alpha contract + Codex's #1088 plasticity workstream doc + Position 3 Windows/RTX VDD lane. Builds on PR #1078 (V1 baseline). Does not modify models.toml or the resolver.

Co-authored-by: Test <test@test.com>
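The grading step described in the commit message can be illustrated with a minimal Python sketch. It is not the code in scripts/bench-blackwell-vl-v2.sh, and the real layout of manifest.json is not shown in this conversation. The entry shape (a dict with a `sha256` key and a `grade_expected_substrings` list), the case-insensitive all-substrings-must-match rule, and the helper names `verify_fixture`, `grade_response`, and `summarize` are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw fixture bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixture(entry: dict, data: bytes) -> bool:
    # Assumed key name "sha256"; the actual manifest key may differ.
    return sha256_hex(data) == entry["sha256"]


def grade_response(entry: dict, response: str) -> bool:
    """PASS only if every expected substring appears in the model output.

    Assumption: matching is case-insensitive and all substrings are required.
    An empty response (for example, empty stdout) can never pass.
    """
    expected = entry.get("grade_expected_substrings", [])
    text = response.casefold()
    return bool(expected) and all(s.casefold() in text for s in expected)


def summarize(results: dict) -> str:
    """Format per-fixture booleans as an 'N/M PASS' summary line."""
    passed = sum(1 for ok in results.values() if ok)
    return f"{passed}/{len(results)} PASS"
```

Under these assumptions, a fixture whose run produces empty stdout grades as FAIL regardless of the expected substrings, which matches how the WebP fixture is reported above.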
Summary
Coordination
Validation