Add Visual Grid Screenshot and Coordinate-Based Clicking Features #5

## Feature Request: Visual Grid Overlay and Coordinate-Based Interaction

### Problem Statement
Agents struggle to map visual elements to their HTML/CSS/XPath selectors, especially when dealing with complex or dynamically generated interfaces. This creates friction when trying to automate interactions with web elements that are visually obvious but structurally complex to target.

### Proposed Solution
Implement two complementary features that enable visual-to-coordinate mapping for more intuitive browser automation:

## 1. Grid Screenshot Tool: `mcp__selenium__take_grid_screenshot`

### Functionality
- Inject JavaScript to overlay a coordinate grid on the current viewport
- Display grid lines at regular intervals (e.g., every 50px)
- Show coordinate labels at grid intersections
- Optionally highlight interactive elements with colored borders
- Number clickable elements for easy reference
- Return screenshot with grid overlay visible

### JavaScript Injection Example
```javascript
// Create grid overlay
const gridOverlay = document.createElement('div');
gridOverlay.style.position = 'fixed';
gridOverlay.style.top = '0';
gridOverlay.style.left = '0';
gridOverlay.style.width = '100%';
gridOverlay.style.height = '100%';
gridOverlay.style.pointerEvents = 'none';
gridOverlay.style.zIndex = '999999';

// Add vertical and horizontal grid lines
for (let x = 0; x <= window.innerWidth; x += 50) {
    const vLine = document.createElement('div');
    vLine.style.position = 'absolute';
    vLine.style.left = x + 'px';
    vLine.style.top = '0';
    vLine.style.width = '1px';
    vLine.style.height = '100%';
    vLine.style.background = 'rgba(0, 0, 255, 0.2)';
    gridOverlay.appendChild(vLine);
    
    // Add coordinate label
    if (x % 100 === 0) {
        const label = document.createElement('div');
        label.style.position = 'absolute';
        label.style.left = x + 'px';
        label.style.top = '5px';
        label.style.color = 'blue';
        label.style.fontSize = '10px';
        label.textContent = x;
        gridOverlay.appendChild(label);
    }
}

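// Add horizontal grid lines, mirroring the vertical loop above
for (let y = 0; y <= window.innerHeight; y += 50) {
    const hLine = document.createElement('div');
    hLine.style.position = 'absolute';
    hLine.style.top = y + 'px';
    hLine.style.left = '0';
    hLine.style.height = '1px';
    hLine.style.width = '100%';
    hLine.style.background = 'rgba(0, 0, 255, 0.2)';
    gridOverlay.appendChild(hLine);

    // Add coordinate label (skip 0 so it does not overlap the x-axis label)
    if (y % 100 === 0 && y > 0) {
        const label = document.createElement('div');
        label.style.position = 'absolute';
        label.style.left = '5px';
        label.style.top = y + 'px';
        label.style.color = 'blue';
        label.style.fontSize = '10px';
        label.textContent = y;
        gridOverlay.appendChild(label);
    }
}
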
document.body.appendChild(gridOverlay);

// Highlight clickable elements
const clickables = document.querySelectorAll('a, button, input, [onclick], [role="button"]');
clickables.forEach((el, index) => {
    el.style.outline = '2px solid red';
    // Optionally add number labels
});
```

### Tool Parameters
```python
{
    "grid_spacing": 50,  # Pixels between grid lines
    "show_coordinates": True,  # Show coordinate labels
    "highlight_clickables": True,  # Highlight interactive elements
    "number_elements": False,  # Add numbers to clickable elements
    "outputPath": "/tmp/grid_screenshot.png"  # Optional output path
}
```
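
As a rough sketch of how these parameters might drive the tool, the helper below builds the overlay script from `grid_spacing` and `show_coordinates`, injects it, saves the screenshot, and removes the overlay again. The names `build_grid_script`, `take_grid_screenshot`, and the `__grid_overlay__` element id are hypothetical; only `execute_script` and `save_screenshot` are existing Selenium WebDriver methods. Element highlighting and numbering are left out for brevity.

```python
import json

OVERLAY_ID = "__grid_overlay__"  # hypothetical id, used to remove the overlay later


def build_grid_script(grid_spacing=50, show_coordinates=True):
    """Return JavaScript that draws a labelled grid over the viewport."""
    if grid_spacing <= 0:
        raise ValueError("grid_spacing must be positive")
    return f"""
    const spacing = {int(grid_spacing)}, labels = {json.dumps(bool(show_coordinates))};
    const overlay = document.createElement('div');
    overlay.id = {json.dumps(OVERLAY_ID)};
    overlay.style.cssText =
        'position:fixed;top:0;left:0;width:100%;height:100%;' +
        'pointer-events:none;z-index:999999';
    const add = (css, text) => {{
        const el = document.createElement('div');
        el.style.cssText = 'position:absolute;' + css;
        if (text !== undefined) el.textContent = text;
        overlay.appendChild(el);
    }};
    const line = 'background:rgba(0,0,255,0.2);';
    const font = 'color:blue;font-size:10px;';
    for (let x = 0; x <= window.innerWidth; x += spacing) {{
        add(line + `left:${{x}}px;top:0;width:1px;height:100%`);
        if (labels) add(font + `left:${{x + 2}}px;top:2px`, x);
    }}
    for (let y = 0; y <= window.innerHeight; y += spacing) {{
        add(line + `top:${{y}}px;left:0;height:1px;width:100%`);
        if (labels) add(font + `top:${{y + 2}}px;left:2px`, y);
    }}
    document.body.appendChild(overlay);
    """


def take_grid_screenshot(driver, output_path, grid_spacing=50, show_coordinates=True):
    """Inject the grid, save a viewport screenshot, then remove the grid."""
    driver.execute_script(build_grid_script(grid_spacing, show_coordinates))
    try:
        driver.save_screenshot(output_path)
    finally:
        # Always clean up so the overlay never leaks into later screenshots
        driver.execute_script(
            "document.getElementById(arguments[0])?.remove();", OVERLAY_ID
        )
    return output_path
```

Removing the overlay in a `finally` block is a design choice of this sketch: it keeps the page unchanged for subsequent tool calls even if the screenshot fails.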

## 2. Coordinate Click Tool: `mcp__selenium__click_at_coordinates`

### Functionality
- Accept x,y viewport coordinates
- Use Selenium's ActionChains to click at exact coordinates
- Support both absolute viewport coordinates and element-relative offsets
- Validate coordinates are within viewport bounds
- Optionally scroll if coordinates are outside current view

### Implementation Using ActionChains
```python
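from selenium.webdriver.common.action_chains import ActionChains

def click_at_coordinates(driver, x, y, relative_to="viewport"):
    """Click at (x, y) in CSS pixels, measured from the viewport's
    top-left corner ("viewport") or from its center ("center")."""
    width, height = driver.execute_script(
        "return [window.innerWidth, window.innerHeight];"
    )
    if relative_to == "center":
        # Translate center-relative offsets into viewport coordinates
        x, y = x + width // 2, y + height // 2
    elif relative_to != "viewport":
        raise ValueError(f"Unsupported relative_to: {relative_to!r}")
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} viewport")

    actions = ActionChains(driver)
    # Pointer moves default to the viewport origin, so no "reset to
    # top-left" step is needed and earlier pointer positions do not matter
    pointer = actions.w3c_actions.pointer_action
    pointer.move_to_location(int(x), int(y))
    pointer.click()
    actions.perform()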
```

### Tool Parameters
```python
{
    "x": 250,  # X coordinate
    "y": 150,  # Y coordinate
    "relative_to": "viewport",  # "viewport" or "center"
    "scroll_if_needed": True  # Auto-scroll if outside viewport
}
```
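
One possible reading of `scroll_if_needed`, sketched as a pure function: for each axis where the requested point lies outside the viewport, compute how far to scroll so the point lands at the viewport center on that axis, and return the adjusted click position. The name `plan_click` and its return shape are hypothetical. The sketch also ignores page extents: a browser clamps scrolling at the page edges, so a real tool would likely re-read the scroll position before clicking.

```python
def _plan_axis(value, size):
    """Return (scroll_delta, click_position) for one axis, in CSS pixels."""
    if 0 <= value < size:
        return 0, value  # already visible, no scrolling on this axis
    center = size // 2
    return value - center, center


def plan_click(x, y, viewport_width, viewport_height, scroll_if_needed=True):
    """Return (scroll_dx, scroll_dy, click_x, click_y) for a viewport point."""
    scroll_dx, click_x = _plan_axis(x, viewport_width)
    scroll_dy, click_y = _plan_axis(y, viewport_height)
    if (scroll_dx or scroll_dy) and not scroll_if_needed:
        raise ValueError(
            f"({x}, {y}) is outside the {viewport_width}x{viewport_height} viewport"
        )
    return scroll_dx, scroll_dy, click_x, click_y
```

The scroll itself could then be applied with something like `driver.execute_script("window.scrollBy(arguments[0], arguments[1]);", scroll_dx, scroll_dy)` before clicking at `(click_x, click_y)`.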

## 3. Enhanced Agent Workflow

### Visual Element Identification Process
1. The agent calls `take_grid_screenshot` to get a screenshot with a coordinate grid
2. It analyzes the image to identify the target element visually
3. It reads the element's position off the grid (e.g., "blue button at approximately 250,150")
4. It calls `click_at_coordinates(x=250, y=150)` to interact
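
The steps above could be wired together roughly as follows. This is only a sketch: `call_tool(name, arguments)` is a hypothetical MCP client callable, and `locate(image_path, description)` stands in for the agent's own vision step, returning the `(x, y)` viewport coordinates it read off the grid.

```python
def click_visually(call_tool, locate, description,
                   screenshot_path="/tmp/grid_screenshot.png"):
    """Run the screenshot -> locate -> click sequence once."""
    # Step 1: capture the viewport with the coordinate grid drawn on it
    call_tool("mcp__selenium__take_grid_screenshot", {
        "grid_spacing": 50,
        "show_coordinates": True,
        "outputPath": screenshot_path,
    })
    # Steps 2-3: the agent inspects the image and reads off coordinates
    x, y = locate(screenshot_path, description)
    # Step 4: click at the coordinates that were read from the grid
    call_tool("mcp__selenium__click_at_coordinates", {
        "x": x,
        "y": y,
        "relative_to": "viewport",
    })
    return x, y
```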

### Benefits
- **Intuitive**: Agents can describe elements by visual position
- **Fallback**: When selectors fail, coordinates provide an alternative
- **Debugging**: Grid screenshots help diagnose clicking issues
- **Universal**: Works with canvas, SVG, and complex dynamic content

## 4. Additional Enhancements

### Element Mapper Tool (Optional)
Create a tool that combines both approaches:
```python
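def map_visual_elements(driver):
    """Map visible clickable elements to viewport-center coordinates."""
    elements = driver.execute_script("""
        const sel = 'a, button, input, [onclick], [role="button"]';
        return [...document.querySelectorAll(sel)].map(el => {
            const r = el.getBoundingClientRect();
            return {
                type: el.tagName.toLowerCase(),
                text: (el.innerText || el.value || '').trim(),
                coords: [Math.round(r.left + r.width / 2),
                         Math.round(r.top + r.height / 2)],
                // Only id-based selectors are derived here; others are null
                selector: el.id ? '#' + CSS.escape(el.id) : null,
                visible: r.width > 0 && r.height > 0,
            };
        }).filter(e => e.visible);
    """)
    # e.g. {"elements": [{"type": "button", "text": "Submit",
    #                     "coords": [250, 150], "selector": "#submit-btn", ...}]}
    return {"elements": elements}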
```
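
To illustrate how such a mapping might combine the two approaches, the sketch below takes a coordinate the agent read off the grid and looks up the nearest mapped element, so the caller can fall back to its selector. The function name `nearest_element` and the 40-pixel `max_distance` default are assumptions made for this example, not part of the proposal.

```python
import math


def nearest_element(mapping, x, y, max_distance=40):
    """Return the mapped element whose center is closest to (x, y).

    Returns None when no element center lies within max_distance CSS pixels.
    """
    best, best_dist = None, max_distance
    for element in mapping["elements"]:
        ex, ey = element["coords"]
        dist = math.hypot(ex - x, ey - y)
        if dist <= best_dist:
            best, best_dist = element, dist
    return best


mapping = {
    "elements": [
        {"type": "button", "text": "Submit", "coords": [250, 150], "selector": "#submit-btn"},
        {"type": "link", "text": "Home", "coords": [50, 30], "selector": "a.home-link"},
    ]
}
match = nearest_element(mapping, 245, 155)
```

Here `match` is the "Submit" entry, so a click could be issued through `#submit-btn` instead of raw coordinates when a selector is available.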

## Implementation Priority
1. **Phase 1**: Implement `click_at_coordinates` using existing ActionChains
2. **Phase 2**: Add `take_grid_screenshot` with basic grid overlay
3. **Phase 3**: Enhance grid with element highlighting and numbering
4. **Phase 4**: Create element mapping utilities

## CRITICAL Considerations
- Coordinates are relative to viewport, not page 
- Account for scroll position when calculating clicks
- Handle high-DPI displays and zoom levels
- Validate coordinates are within viewport bounds
- Consider MoveTargetOutOfBoundsException handling
- Maintain a consistent coordinate space (units and origin) across both tools, regardless of grid spacing. For example, even if grid lines are drawn 20 px apart, coordinates are still addressed in individual pixels, not in grid cells.
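
Two of these points can be made concrete with small conversions. On high-DPI displays a screenshot is often captured in device pixels, so a position measured in the image may need dividing by `window.devicePixelRatio` to get CSS pixels (labels drawn by the overlay are already in CSS pixels, so this matters only when measuring positions in the image itself). Likewise, a page coordinate becomes a viewport coordinate by subtracting the scroll offset. The helper names below are hypothetical.

```python
def screenshot_to_viewport(px_x, px_y, device_pixel_ratio=1.0):
    """Convert screenshot pixel coordinates to CSS-pixel viewport coordinates."""
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    return (round(px_x / device_pixel_ratio), round(px_y / device_pixel_ratio))


def page_to_viewport(page_x, page_y, scroll_x, scroll_y):
    """Convert page coordinates to viewport coordinates given the scroll offset."""
    return (page_x - scroll_x, page_y - scroll_y)


# A point measured at (500, 300) in a 2x screenshot is (250, 150) in CSS pixels
css_point = screenshot_to_viewport(500, 300, device_pixel_ratio=2.0)
# A page point at y=1150 with the page scrolled down 1000px sits at y=150
viewport_point = page_to_viewport(250, 1150, scroll_x=0, scroll_y=1000)
```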

This feature set would significantly improve the agent's ability to interact with complex web interfaces by bridging the gap between visual identification and programmatic interaction.
