Conversation
Adds gaia::Image, ContentPart, and three processQuery overloads (text+images ergonomic, vector<Message>) unified through a private processQueryInternal. All overloads share stateful history (image parts stripped after each turn), a RAII re-entrancy guard, and validate-before-load semantics. New types: Image::fromBytes/fromFile with magic-byte MIME detection, O_NOFOLLOW + post-open fstat TOCTOU guard, 20 MiB cap. base64Encode is RFC 4648 standard alphabet with correct padding. Message gains an additive std::optional<vector<ContentPart>> parts field (toJson dispatches array vs string accordingly, defined out-of-line in types.cpp).

Test additions:
- cpp/tests/test_image.cpp (18 tests: base64, Image fromBytes/fromFile)
- cpp/tests/test_agent_vlm.cpp (13 agent-level mock-server tests)
- cpp/tests/test_types.cpp (15 new cases: MIME detection, ContentPart, Message backward-compat, fromUser)
- cpp/tests/support/mock_llm_server.h (lifted + extended: receivedBodies, loadRequestCount, holdNextResponse; benchmark shim updated)
- cpp/tests/integration/test_integration_vlm.cpp (3 tests: live Lemonade VLM smoke, messages-list overload, ctx-overflow error surface; AC-23 version-pin probe via GAIA_PINNED_LEMONADE_VERSION)

Examples: cpp/examples/vlm_agent.cpp
Docs: api-reference.mdx VLM section, overview/quickstart mentions

Closes #785
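The commit states that base64Encode uses the RFC 4648 standard alphabet with correct padding. A minimal standalone sketch of that scheme — the function name mirrors the commit message, but the signature here is an assumption, not gaia's actual API:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// RFC 4648 base64: every 3 input bytes become 4 output chars; a 1- or 2-byte
// tail is padded with "==" or "=" respectively.
std::string base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 3 <= data.size(); i += 3) {
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    std::size_t rem = data.size() - i;
    if (rem == 1) {            // one trailing byte -> two chars + "=="
        std::uint32_t n = data[i] << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rem == 2) {     // two trailing bytes -> three chars + "="
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

This is what the image payload passes through before landing in the `data:` URL of the vision request.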
- detectImageMimeType: return empty string for unrecognized full-sized buffers; callers (fromBytes, fromFile) now throw invalid_argument with a clear message instead of silently mislabeling data as image/png
- image.cpp: wrap POSIX fd in FdGuard RAII so it is closed even when std::vector allocation throws std::bad_alloc after the 20 MiB cap check
- processQueryInternal: tighten empty-input check to reject parts vectors containing only empty-text ContentParts (not just an empty vector)
- mock_llm_server.h: store receivedBodies_ before incrementing requestCount_ so observers polling requestCount() see body already available
- summarizeUserInput: remove unsafe last-element fallback that could use non-USER message content; fall back to first non-empty content instead
- test_integration_vlm.cpp: narrow catch(...) in envInt to only catch invalid_argument and out_of_range from std::stoi
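The FdGuard fix is a standard RAII pattern: tie the descriptor's lifetime to a stack object so an exception thrown after `open` (here, `std::bad_alloc` from the vector allocation) cannot leak the fd. A minimal sketch — the class name comes from the commit, but its exact shape in image.cpp is an assumption:

```cpp
#include <fcntl.h>
#include <unistd.h>

// RAII wrapper for a POSIX file descriptor: the fd is closed when the guard
// leaves scope, including when an exception unwinds the stack.
class FdGuard {
public:
    explicit FdGuard(int fd) : fd_(fd) {}
    ~FdGuard() {
        if (fd_ >= 0) ::close(fd_);
    }
    FdGuard(const FdGuard&) = delete;             // copying would double-close
    FdGuard& operator=(const FdGuard&) = delete;
    int get() const { return fd_; }
private:
    int fd_;
};
```

With this in place, the read path can allocate the destination buffer freely: if the allocation throws, the guard's destructor still runs and closes the descriptor.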
…-magic test

- types.h: update doc comment for detectImageMimeType to accurately reflect that full-sized (>= 12 byte) buffers with unrecognized magic now return empty string (not "image/png")
- test_image.cpp: add ImageFromBytesAutoDetectUnrecognizedThrows to cover the new throw path for unrecognized magic without explicit MIME
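A minimal sketch of the magic-byte dispatch these commits describe — empty string for full-sized unrecognized buffers, "image/png" fallback only for null/short inputs. The contract is taken from the notes above; the helper name mirrors the real one, but the free-function signature is an assumption:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Returns the MIME type inferred from the buffer's magic bytes. Buffers
// shorter than 12 bytes fall back to "image/png" (safe-fallback contract);
// full-sized buffers with unrecognized magic return "" so callers can throw.
std::string detectImageMimeType(const std::uint8_t* data, std::size_t size) {
    if (data == nullptr || size < 12) return "image/png";
    static const std::uint8_t kPng[] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (std::memcmp(data, kPng, 8) == 0) return "image/png";
    if (data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF) return "image/jpeg";
    if (std::memcmp(data, "GIF8", 4) == 0) return "image/gif";
    if (std::memcmp(data, "RIFF", 4) == 0 && std::memcmp(data + 8, "WEBP", 4) == 0)
        return "image/webp";
    if (data[0] == 'B' && data[1] == 'M') return "image/bmp";  // 2-byte magic can false-positive
    return "";  // unrecognized full-sized buffer: let the caller reject it
}
```

The empty-string return is what lets fromBytes/fromFile throw `std::invalid_argument` instead of silently mislabeling the payload.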
@claude review this pull request
## PR Review — VLM image support in C++ SDK

Reviewed the diff (~2350 lines, 19 files). This is solid, well-scoped work. Highlights first, then a handful of concerns.

### 🟢 Strengths
### 🟡 Concerns

1. Short-buffer fallback rationale (non-blocking).
2. Pin-check "non-fatal" is actually a test failure.
3. …

### 🟢 Minor nits
### Test plan verification
### Summary

Recommend merge after addressing (2) — the pin-check will flake CI otherwise. (1) and (3) are improvements worth a follow-up but not blocking. Nice work on the …
- test_integration_vlm.cpp: replace GTEST_MESSAGE_(kNonFatalFailure) in expectPinnedLemonadeVersion() with std::cerr + SUCCEED() so missing version field or /system-info probe failure is purely informational and does not mark VLM tests as failed; real version mismatches still fail via EXPECT_EQ (blocking concern from review)
- types.cpp: expand detectImageMimeType short-buffer comment to explain why "image/png" fallback exists for null/<12 byte buffers (AC-15e contract, test fixtures) vs empty-string return for full-sized unrecognized magic (non-blocking concern from review)
- types.cpp: add BMP "BM" false-positive trade-off comment (minor nit)
- .gitattributes: mark cpp/tests/fixtures/*.png and *.jpg as binary to prevent LF/CRLF text-filter corruption (minor nit)
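The .gitattributes change can be as small as two lines; `binary` is Git's built-in macro attribute for `-diff -merge -text`, which is what prevents the line-ending filter from rewriting fixture bytes:

```
cpp/tests/fixtures/*.png binary
cpp/tests/fixtures/*.jpg binary
```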
Thanks for the thorough review! All concerns addressed in 75e7a52:

- 🟡 Issue 2 (blocking) — pin-check non-fatal flake
- 🟡 Issue 1 (non-blocking) — short-buffer fallback rationale
- 🟢 Minor nits
- Issue 3 (…

331/331 unit tests pass; integration target builds clean.
@claude review this PR
## PR Review (follow-up after 75e7a52) — VLM image support in C++ SDK

Re-reviewed with focus on the fixups in 75e7a52.

### ✅ Prior review concerns — all addressed correctly
### 🟢 Fresh observations (non-blocking)
### Test coverage check
### Recommendation

Ready to merge. Blocking concern is fully resolved; remaining items are cosmetic/future-improvement-class. Nice work on the …

*Review by automated …*
## Summary

Solid, cohesive VLM implementation for the C++ SDK. The refactor unifies all … The single most important thing to flag: there are no 🔴 issues, and only a handful of 🟢 minors worth considering. Approve with suggestions.

## Issues Found

### 🟢 Minor

1. Anonymous-namespace helpers could …

   Small perf nit — micro but essentially free: …

2. The mock now serves unit tests AND benchmarks, but the enclosing namespace is still …

3. When …

4. MIME detection fallback semantics are subtle (…
5. Most other …

## Strengths
## Verdict

Approve with suggestions. All 🟢 items above are optional polish — none block merge. The architecture work (single-writer …
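The single-writer architecture both reviews praise rests on an atomic in-flight flag plus a RAII guard. A minimal sketch of that pattern — names follow the PR description (InFlightGuard, inFlight_, compare_exchange_strong), but the surrounding Agent is elided, so this is an assumption about shape, not the actual implementation:

```cpp
#include <atomic>
#include <stdexcept>

// RAII re-entrancy guard: atomically flips the flag false->true on entry and
// resets it on exit. A second concurrent entry fails the CAS and throws,
// which is how overlapping processQuery calls on one Agent are rejected.
class InFlightGuard {
public:
    explicit InFlightGuard(std::atomic<bool>& inFlight) : inFlight_(inFlight) {
        bool expected = false;
        if (!inFlight_.compare_exchange_strong(expected, true))
            throw std::runtime_error("processQuery already in flight on this Agent");
    }
    ~InFlightGuard() { inFlight_.store(false); }
    InFlightGuard(const InFlightGuard&) = delete;
    InFlightGuard& operator=(const InFlightGuard&) = delete;
private:
    std::atomic<bool>& inFlight_;
};
```

Because the destructor releases the flag, an exception anywhere inside the guarded call still leaves the Agent usable for the next query.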
## Summary

Bump the regression threshold for `memory_per_step_growth_kb` from the generic 15% to 75%, so legitimate single-digit-KB feature growth doesn't false-fail the C++ Benchmarks gate. This is the only metric at a single-digit-KB scale, where ordinary feature work produces large *percent* swings on small *absolute* numbers.

## Root cause

`memory_per_step_growth_kb` measures KB of RSS growth per agent loop step. Pre-VLM main was ~7.2 KB/step. PR amd#858 (VLM image support, merged 2026-04-24) added the `Image` type, content-block JSON parsing, and new per-message storage — bumping the metric to ~11.4 KB/step. That's +4 KB absolute, a perfectly reasonable cost for a major new feature, but it reads as +58% on the percent scale, well past the 15% generic threshold.

The cached baseline never refreshed because the workflow only saves a new baseline on a successful main push (`if: github.ref == 'refs/heads/main' && github.event_name == 'push'`), and every main push since amd#858 has either failed (this same gate) or been cancelled. So **every PR that touches C++ keeps inheriting the stale pre-VLM baseline** and failing on this one metric.

## Why a threshold change, not a baseline bump

A baseline bump alone is a one-shot fix — the next feature that adds another KB-level allocation per message will hit the same wall. A wider threshold for *just this metric* is the structural fix: it acknowledges that on a metric whose absolute values are tiny, percent swings are noisy. Other metrics (binary size, latency, peak memory in MB) keep their tighter thresholds.

## Verification

Most recent failing run shows the new value within the new band:

```
loop_latency_median_us       1026.0    918.0   -10.5%   15%   IMPROVED
memory_baseline_kb           9272.0   9332.0    +0.6%   15%   OK
memory_peak_kb               9416.0   9560.0    +1.5%   15%   OK
memory_per_step_growth_kb       7.2     11.4   +58.3%   15%   FAIL  <-- this one
```

With the 75% threshold the FAIL becomes OK; everything else is unchanged.
## Test plan

- [ ] CI green on this PR (the same Windows benchmark job that's been red on every main push since 2026-04-24)
- [ ] After merge, baseline auto-refreshes on the next main push, future PRs no longer inherit the stale 7.2 KB baseline
Closes #785
## Changes
- `gaia::Image` with `fromBytes`/`fromFile` factories, RFC 4648 base64 encoding, magic-byte MIME detection (PNG/JPEG/GIF/WebP/BMP), 20 MiB size cap, `O_NOFOLLOW` + post-open `fstat` TOCTOU guard on POSIX, and a whitelist enforcing only the five supported MIME types
- `gaia::ContentPart` (text / image_url parts with `toJson()` producing the OpenAI vision wire format)
- `gaia::Message` with an additive `std::optional<std::vector<ContentPart>> parts` field; `toJson()` dispatches to array or string accordingly — fully backward-compatible with existing aggregate-init sites
- `processQuery` overloads (string + `vector<Image>` ergonomic; `vector<Message>` caller-composed) unified through a private `processQueryInternal` that is the sole writer of `conversationHistory_`
- `InFlightGuard` via `std::atomic<bool> inFlight_` and `compare_exchange_strong` — concurrent `processQuery` calls on the same Agent throw `std::runtime_error`
- Input validation runs before `ensureModelLoaded`, so no `/load` fires on invalid input
- Lifts `cpp/benchmarks/mock_llm_server.h` to `cpp/tests/support/mock_llm_server.h` and extends it with `receivedBodies()`, `loadRequestCount()`, `holdNextResponse()`, and a `reportModelLoaded` constructor flag; benchmark header is now a thin shim
- `cpp/examples/vlm_agent.cpp` end-to-end demo
- `cpp/tests/integration/test_integration_vlm.cpp` (3 tests: live Lemonade VLM smoke, messages-list overload, ctx-overflow error surface) with Lemonade version-pin probe via `GAIA_PINNED_LEMONADE_VERSION`
- Updates `docs/cpp/api-reference.mdx`, `docs/cpp/overview.mdx`, `docs/cpp/quickstart.mdx` with VLM section, new overloads, thread-safety update, and example invocation

## Test coverage
- `test_image.cpp`
- `test_types.cpp`
- `test_agent_vlm.cpp`
- `test_integration_vlm.cpp`

## Reviewer notes
- `Message::parts` field is additive — all existing code compiles unchanged; consumers linked against a prebuilt `gaia_core` must rebuild (noted in docs)
- `detectImageMimeType` returns `""` (empty string) for ≥ 12-byte buffers with unrecognized magic; returns `"image/png"` only for null/short buffers (AC-15e safe-fallback contract)
- Integration tests require `-DGAIA_BUILD_INTEGRATION_TESTS=ON` and a live Lemonade server with `Qwen3-VL-4B-Instruct-GGUF`; they are non-blocking in CI
- The `errorCount` unused-variable warning in `agent.cpp:692` is not introduced by this PR
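The `toJson()` dispatch noted above (array when `parts` is set, plain string otherwise) can be sketched schematically. This hand-rolls JSON with no string escaping, purely to show the shape of the dispatch and the OpenAI-style part objects; the real types live in types.h/types.cpp and the field names here are assumptions:

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Schematic content part: either a text part or an image_url part carrying
// a data: URL (base64 payload elided).
struct ContentPart {
    std::string type;   // "text" or "image_url"
    std::string value;  // the text, or the data: URL
    std::string toJson() const {
        if (type == "text")
            return "{\"type\":\"text\",\"text\":\"" + value + "\"}";
        return "{\"type\":\"image_url\",\"image_url\":{\"url\":\"" + value + "\"}}";
    }
};

struct Message {
    std::string role;
    std::string content;
    std::optional<std::vector<ContentPart>> parts;  // additive field
    std::string toJson() const {
        std::string out = "{\"role\":\"" + role + "\",\"content\":";
        if (parts) {  // dispatch: array form when parts is set...
            out += "[";
            for (std::size_t i = 0; i < parts->size(); ++i) {
                if (i) out += ",";
                out += (*parts)[i].toJson();
            }
            out += "]";
        } else {      // ...plain string form otherwise (backward-compatible)
            out += "\"" + content + "\"";
        }
        return out + "}";
    }
};
```

Because `parts` defaults to `std::nullopt`, existing aggregate-init call sites keep producing the old string-content wire shape untouched.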