Browser Action Tool: Screenshots Display as Base64 Text After ~5 Actions #7795

@deathtothenig

Description

@deathtothenig

App Version

3.25.17

API Provider

AWS Bedrock

Model Used

Claude 3.7 Sonnet

Roo Code Task Links (Optional)

Browser Action Tool: Screenshots Display as Base64 Text After ~5 Actions

Bug Description

After approximately 5 browser actions, the browser_action tool stops displaying screenshots as images and instead shows massive walls of base64-encoded text in the chat interface. This makes browser automation workflows extremely difficult to follow and debug.

Expected Behavior

Screenshots should consistently display as rendered images throughout the entire browser session, providing visual feedback for each action.

Actual Behavior

  • First ~5 actions: Screenshots display correctly as images
  • Subsequent actions: Screenshots appear as large blocks of base64 text like:
    {"screenshot":"data:image/webp;base64,UklGRjAKAABXRUJQVlA4ICQKAADwJACdASqEA..."}
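
For anyone triaging this, here is a minimal sketch of how a payload shaped like the one above could be recognized before it is printed as text. `extractScreenshotDataUri` is a hypothetical helper written for illustration; it is not taken from the Roo Code source, and the assumption that the whole message arrives as a single JSON object is based only on the sample above.

```typescript
// Hypothetical helper, not part of Roo Code: pull a screenshot data URI out
// of a chat message that arrived as raw JSON text.
function extractScreenshotDataUri(messageText: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(messageText);
  } catch {
    return null; // not JSON, so treat it as ordinary text
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const screenshot = (parsed as Record<string, unknown>).screenshot;
  if (typeof screenshot !== "string") return null;

  // Only accept base64-encoded image data URIs.
  return /^data:image\/[a-z0-9.+-]+;base64,/i.test(screenshot) ? screenshot : null;
}
```

A caller could render the returned URI as an image and fall back to plain text only when the function returns `null`.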
    

Impact

  • Severe usability degradation: Walls of base64 text flood the interface, making it nearly impossible to read conversation history
  • Loss of visual debugging: Cannot see what's happening on the page during automation
  • Workflow disruption: Long browser sessions become unusable due to interface clutter
  • Poor user experience: The core visual feedback feature becomes counterproductive

Steps to Reproduce

  1. Launch a browser session with browser_action
  2. Perform 5+ sequential actions (click, type, scroll, etc.)
  3. Observe that screenshots transition from images to base64 text blocks

Environment

  • Roo Code Version: 3.25.17
  • OS: Windows 10

Suggested Solutions

  1. Maintain consistent image rendering: Continue displaying screenshots as images regardless of session length
  2. Add configuration option: Allow users to control screenshot display format
  3. Implement smart fallbacks: If memory is a concern, consider thumbnail previews instead of base64 dumps
  4. Session management: Provide automatic browser session reset options to maintain visual feedback
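
To illustrate suggestion 3, here is a rough sketch of a text-level fallback that collapses inline base64 image data into a one-line placeholder instead of dumping it into the chat. `collapseBase64Images` and `DATA_URI_PATTERN` are hypothetical names of my own; this is not how Roo Code currently behaves, and a real fix would more likely render a thumbnail.

```typescript
// Hypothetical fallback, not Roo Code's actual behavior: replace inline
// base64 image data with a short placeholder so the chat stays readable.
const DATA_URI_PATTERN = /data:(image\/[a-z0-9.+-]+);base64,([A-Za-z0-9+\/=]+)/gi;

function collapseBase64Images(text: string): string {
  return text.replace(
    DATA_URI_PATTERN,
    (_match: string, mime: string, payload: string) => {
      // Every 4 base64 characters encode 3 bytes, so this is an estimate.
      const approxKb = (payload.length * 3) / 4 / 1024;
      return `[${mime} screenshot, ~${approxKb.toFixed(1)} KB]`;
    }
  );
}
```

Text without a data URI passes through unchanged, so the function could be applied to every message without special-casing.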

Workarounds Currently Used

  • Manually restarting browser sessions every 4-5 actions
  • Breaking complex workflows into smaller tasks
  • Avoiding longer browser automation sequences

Priority

High - This significantly impacts the usability of the browser automation feature, which is a core functionality of Roo Code.


Additional Notes: The base64 format suggests the screenshots are still being captured correctly, but there's likely a UI rendering threshold or memory management issue causing the display to fall back to raw data instead of rendered images.
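
As a sanity check on that claim: the sample payload starts with `UklGR`, which is the base64 encoding of the `RIFF` container signature that WebP files begin with. Below is a small sketch that checks the 12-byte header of a data URI. `looksLikeWebP` is a hypothetical helper, it assumes a runtime with a global `atob` (browsers and recent Node.js versions), and it only inspects the header, so it cannot confirm that the rest of the image is intact.

```typescript
// Hypothetical check: does a data URI start with a RIFF/WEBP header?
function looksLikeWebP(dataUri: string): boolean {
  const prefix = "data:image/webp;base64,";
  if (!dataUri.startsWith(prefix)) return false;

  // 16 base64 characters decode to exactly the 12-byte container header.
  const head = dataUri.slice(prefix.length, prefix.length + 16);
  if (head.length < 16) return false;

  let bytes: string;
  try {
    bytes = atob(head);
  } catch {
    return false; // not valid base64
  }
  // Bytes 0-3 are "RIFF", bytes 4-7 are the chunk size, bytes 8-11 are "WEBP".
  return bytes.slice(0, 4) === "RIFF" && bytes.slice(8, 12) === "WEBP";
}
```

The truncated sample from this report passes the check, which is consistent with the capture side working and the problem being in how the result is displayed.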

Metadata

Assignees

No one assigned

    Labels

    Issue - Needs Scoping: Valid, but needs effort estimate or design input before work can start.
    bug: Something isn't working

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
