@ammar-agent (Collaborator) commented Dec 10, 2025

Summary

Fix flaky integration tests for image handling and MCP screenshot functionality.

Changes

Image Test Fix (sendMessage.images.test.ts)

  • Use 8-bit RGB PNGs instead of 1-bit indexed PNGs: the original fixtures used 1-bit colormap (palette) encoding, which vision APIs may not process correctly. They are now generated with an explicit -define png:color-type=2 (truecolor RGB) for proper 8-bit-per-channel encoding.
  • Increase image size: Changed from 1x1 to 4x4 pixels for more reliable vision model processing
  • Better prompts: the prompt now explicitly describes the image as a solid color and asks for only the color name
  • Add debug logging: log the result when sendMessage fails, to help diagnose future CI failures
  • Both RED_PIXEL and BLUE_PIXEL fixtures updated to use proper RGB encoding
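The PR regenerated the fixtures with ImageMagick. To show what a color-type-2 PNG actually contains, here is a sketch that builds an equivalent solid-color fixture directly in TypeScript with Node's `zlib`. This is an alternative technique, not what the PR did. The file has an IHDR chunk declaring bit depth 8 and color type 2, one IDAT chunk of deflated scanlines, and IEND. The names `solidRgbPng` and `RED_PNG` are hypothetical. Consuming fixtures as base64 data URLs is also an assumption.

```typescript
import { deflateSync } from "node:zlib";

// CRC-32 lookup table (reflected polynomial 0xEDB88320), used for PNG chunk checksums.
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes: Uint8Array): number {
  let c = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    c = CRC_TABLE[(c ^ bytes[i]) & 0xff] ^ (c >>> 8);
  }
  return (c ^ 0xffffffff) >>> 0;
}

// One PNG chunk: 4-byte length, 4-byte type, data, then a CRC over type + data.
function chunk(type: string, data: Uint8Array): Buffer {
  const typeAndData = Buffer.concat([Buffer.from(type, "ascii"), Buffer.from(data)]);
  const out = Buffer.alloc(12 + data.length);
  out.writeUInt32BE(data.length, 0);
  typeAndData.copy(out, 4);
  out.writeUInt32BE(crc32(typeAndData), 8 + data.length);
  return out;
}

// Hypothetical helper: solid-color truecolor PNG, 8 bits per channel, no palette.
function solidRgbPng(width: number, height: number, r: number, g: number, b: number): Buffer {
  const ihdr = Buffer.alloc(13);
  ihdr.writeUInt32BE(width, 0);
  ihdr.writeUInt32BE(height, 4);
  ihdr[8] = 8; // bit depth: 8 bits per channel
  ihdr[9] = 2; // color type 2: truecolor RGB
  ihdr[10] = 0; // compression method 0 (deflate)
  ihdr[11] = 0; // filter method 0
  ihdr[12] = 0; // no interlacing

  // Each scanline is a filter-type byte (0 = None) followed by width * 3 RGB bytes.
  const rowLength = 1 + width * 3;
  const raw = Buffer.alloc(rowLength * height);
  for (let y = 0; y < height; y++) {
    const rowStart = y * rowLength;
    raw[rowStart] = 0;
    for (let x = 0; x < width; x++) {
      const p = rowStart + 1 + x * 3;
      raw[p] = r;
      raw[p + 1] = g;
      raw[p + 2] = b;
    }
  }

  const signature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
  return Buffer.concat([
    signature,
    chunk("IHDR", ihdr),
    chunk("IDAT", deflateSync(raw)), // zlib-wrapped deflate stream, as PNG requires
    chunk("IEND", new Uint8Array(0)),
  ]);
}

// Hypothetical fixture: 4x4 pure red (#FF0000) as a base64 data URL.
const RED_PNG = `data:image/png;base64,${solidRgbPng(4, 4, 255, 0, 0).toString("base64")}`;
```

Generating fixtures in code keeps the encoding explicit in the test source, but it means maintaining a small PNG writer. A checked-in, ImageMagick-generated base64 string avoids that code entirely.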

MCP Screenshot Test Fix (mcpConfig.test.ts)

  • More directive prompts: prompts now name the tools that MUST be used (chrome_navigate_page, chrome_take_screenshot)
  • Add diagnostic logging: when the screenshot tool call is missing, log which tools were actually called and the model's response
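Here is a sketch of the missing-tool diagnostic. It assumes the test can collect the names of the tools the model called, along with its final response text. The PR does not show the real event types, so the `ToolCall` shape and the `missingToolDiagnostic` helper are hypothetical.

```typescript
// Hypothetical shape of a recorded tool call; only the name is needed here.
interface ToolCall {
  toolName: string;
}

// Tool names taken from the PR's prompts.
const REQUIRED_TOOLS: readonly string[] = ["chrome_navigate_page", "chrome_take_screenshot"];

// Returns null when every required tool was called; otherwise a multi-line
// message listing what is missing, what was actually called, and the response.
function missingToolDiagnostic(
  toolCalls: ToolCall[],
  modelResponse: string,
  required: readonly string[] = REQUIRED_TOOLS,
): string | null {
  const called = toolCalls.map((c) => c.toolName);
  const missing = required.filter((name) => !called.includes(name));
  if (missing.length === 0) return null;
  return [
    `Missing required tool call(s): ${missing.join(", ")}`,
    `Tools actually called: ${called.length > 0 ? called.join(", ") : "(none)"}`,
    `Model response: ${modelResponse}`,
  ].join("\n");
}
```

In a test, the message can be logged when non-null, just before the assertion that fails. That way CI output shows whether the model skipped the tool entirely or called something else instead.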

Root Cause Analysis

The image test was failing because:

  1. First retry: the API call returns success=false (a transient API issue)
  2. Subsequent retries: the API call succeeds but yields no text deltas (deltas.length === 0)
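These two failure modes call for different diagnostics. A retry loop can therefore classify each attempt and log the raw result before trying again. This is a hedged sketch: `SendResult`, `classifyAttempt`, and `sendWithRetries` are hypothetical names, and the PR does not show the real `sendMessage` result type.

```typescript
// Hypothetical result shape; the real sendMessage type is not shown in the PR.
interface SendResult {
  success: boolean;
  error?: string;
}

type AttemptOutcome =
  | { kind: "ok"; text: string }
  | { kind: "send_failed"; result: SendResult } // success=false
  | { kind: "empty_stream" }; // success, but deltas.length === 0

// Pure classification of a single attempt, easy to unit-test in isolation.
function classifyAttempt(result: SendResult, deltas: string[]): AttemptOutcome {
  if (!result.success) return { kind: "send_failed", result };
  if (deltas.length === 0) return { kind: "empty_stream" };
  return { kind: "ok", text: deltas.join("") };
}

// Retries until an attempt yields text, logging the full result object on each failure.
async function sendWithRetries(
  attempt: () => Promise<{ result: SendResult; deltas: string[] }>,
  maxAttempts: number,
  log: (msg: string) => void = console.error,
): Promise<string> {
  for (let i = 1; i <= maxAttempts; i++) {
    const { result, deltas } = await attempt();
    const outcome = classifyAttempt(result, deltas);
    if (outcome.kind === "ok") return outcome.text;
    log(`attempt ${i}/${maxAttempts}: ${outcome.kind} ${JSON.stringify(result)}`);
  }
  throw new Error(`no text deltas after ${maxAttempts} attempts`);
}
```

Keeping classification separate from the retry loop means the CI log distinguishes "the API said no" from "the API said yes but streamed nothing". The original report had to infer that distinction.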

Investigation revealed that the PNG fixtures used a 1-bit indexed (colormap) format rather than true RGB, which may prevent vision APIs from processing them correctly.
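A check like the one below would surface the 1-bit indexed encoding. The PNG IHDR chunk must come first after the 8-byte signature. It stores bit depth at file offset 24 and color type at offset 25: 3 means indexed palette, and 2 means truecolor RGB. The helpers `readPngHeader`, `describePng`, and `isRgb8` are hypothetical and were not part of the PR.

```typescript
// Color type values defined for the IHDR chunk in the PNG specification.
const COLOR_TYPE_NAMES: Record<number, string> = {
  0: "grayscale",
  2: "truecolor (RGB)",
  3: "indexed (palette)",
  4: "grayscale + alpha",
  6: "truecolor + alpha (RGBA)",
};

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

interface PngHeader {
  width: number;
  height: number;
  bitDepth: number;
  colorType: number;
}

// Parse the IHDR chunk, which the PNG spec requires to be first after the signature.
function readPngHeader(bytes: Uint8Array): PngHeader {
  // 8-byte signature + 4-byte length + 4-byte type + 13-byte IHDR body + 4-byte CRC.
  if (bytes.length < 33 || PNG_SIGNATURE.some((v, i) => bytes[i] !== v)) {
    throw new Error("not a PNG file");
  }
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (type !== "IHDR") {
    throw new Error(`expected IHDR as first chunk, found ${type}`);
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    width: view.getUint32(16), // DataView defaults to big-endian, matching PNG
    height: view.getUint32(20),
    bitDepth: bytes[24],
    colorType: bytes[25],
  };
}

// Human-readable summary, e.g. for logging fixture encodings in CI.
function describePng(bytes: Uint8Array): string {
  const h = readPngHeader(bytes);
  const kind = COLOR_TYPE_NAMES[h.colorType] ?? `unknown color type ${h.colorType}`;
  return `${h.width}x${h.height}, ${h.bitDepth}-bit, ${kind}`;
}

// True only for 8-bit-per-channel truecolor RGB (color type 2).
function isRgb8(bytes: Uint8Array): boolean {
  const h = readPngHeader(bytes);
  return h.bitDepth === 8 && h.colorType === 2;
}
```

In the image test, each base64 fixture could be decoded and passed to `isRgb8` before it is sent. An encoding regression would then fail loudly instead of showing up as an empty model response.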

Generated with mux

@ammar-agent force-pushed the ai-tests-6168 branch 2 times, most recently from c5d8be8 to 297563e, on December 10, 2025 18:28
- Use ImageMagick-generated 4x4 pure red (#FF0000) PNG for reliable color detection
- Update prompt to explicitly describe solid-color image and request just color name
- Tighten assertion to strictly match 'red' instead of accepting wrong colors

_Generated with mux_
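The PR does not show the tightened assertion itself. This minimal sketch assumes the reply is compared after trimming whitespace, lowercasing, and dropping trailing punctuation. `normalizeColorAnswer` and `isExactColor` are hypothetical helpers.

```typescript
// Normalize replies such as "Red." or " RED!\n" to a bare lowercase word.
function normalizeColorAnswer(reply: string): string {
  return reply.trim().toLowerCase().replace(/[.!]+$/, "");
}

// Strict check: the normalized reply must equal the expected color name exactly,
// so a different color (e.g. "orange") or a qualified one ("dark red") fails.
function isExactColor(reply: string, expected: string): boolean {
  return normalizeColorAnswer(reply) === expected.toLowerCase();
}
```

Exact matching depends on the prompt asking for only the color name. A reply such as "The image is red." fails this check, which trades occasional false failures for never accepting a wrong color.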
- Add explicit tool name requirements (MUST use chrome_navigate_page, chrome_take_screenshot)
- Add diagnostic logging when screenshot tool call is missing
- More directive prompts reduce model non-determinism

_Generated with mux_
- Log result object when sendMessage fails to understand CI failures
- Will help diagnose why API calls are returning success=false

_Generated with mux_
- Previous 1-bit indexed/palette PNGs may not be properly processed by vision APIs
- New 4x4 RGB PNGs with explicit color-type=2 ensure proper 8-bit per channel encoding
- Affects both RED_PIXEL and BLUE_PIXEL test fixtures

_Generated with mux_
@ammar-agent changed the title from "🤖 fix: improve image test reliability with 4x4 red pixel" to "🤖 fix: improve integration test reliability" on Dec 11, 2025
@ammario changed the title from "🤖 fix: improve integration test reliability" to "🤖 ci: improve integration test reliability" on Dec 11, 2025
@ammario merged commit a051c81 into main on Dec 11, 2025
20 checks passed
@ammario deleted the ai-tests-6168 branch on December 11, 2025 16:13