feat: replace screenshot_size with screenshot_original_size to fix mouse coordinate mismatch by JezaChen · Pull Request #116 · CursorTouch/Windows-MCP

JezaChen · 2026-03-19T15:17:03Z

Problem

Some models — Claude Opus in particular — tend to rely on the Screenshot tool (rather than Snapshot) to decide where to click or move the mouse. The Screenshot tool returns a screenshot_size field in its response metadata representing the resolution of the image.

However, many LLM servers (e.g. Claude.ai) automatically compress or resize images before passing them to the model in order to reduce token usage. This means the image the model actually receives may be smaller than what screenshot_size indicates. When the model reads coordinates directly from the image and passes them to Click or Move without accounting for this discrepancy, the resulting positions are wrong — often off by a consistent scale factor — causing clicks to land in entirely unintended locations.

For example: the physical screen is 3840×2160, windows-mcp caps the screenshot at 1920×1080, and the LLM server further compresses it to 1024×576. A control appearing at (200, 200) in the received image is actually at (750, 750) on screen (a scale factor of 3840 / 1024 = 3.75), but the model still clicks (200, 200).
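
The arithmetic in this example can be sketched as follows. This is an illustration only, not code from the PR; the function name `image_to_screen` and its parameters are hypothetical, and sizes are assumed to be `(width, height)` tuples.

```python
def image_to_screen(point, received_size, original_size):
    """Map a point in the received image to screen coordinates.

    Hypothetical helper: all sizes are (width, height) tuples.
    """
    x, y = point
    recv_w, recv_h = received_size
    orig_w, orig_h = original_size
    # Scale each axis by original / received, then round to whole pixels.
    return round(x * orig_w / recv_w), round(y * orig_h / recv_h)


# Values from the example: 3840x2160 screen, 1024x576 received image.
print(image_to_screen((200, 200), (1024, 576), (3840, 2160)))  # (750, 750)
```

Scaling each axis separately keeps the mapping correct even if the received image does not preserve the original aspect ratio exactly.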

Root Cause

screenshot_size recorded the post-resize resolution on the server side (i.e. after windows-mcp's own downscaling cap). It said nothing about the size of the image the model actually received, and gave the model no guidance on how to reconcile the two. This made the field actively misleading.
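
To see why the old field could not help, consider a model that did try to rescale using it. The sketch below reuses the sizes from the example in the Problem section; `rescale` is a hypothetical helper, and the assumption (taken from the description above) is that mouse actions expect coordinates in the physical screen space.

```python
# Sizes from the example in the Problem section, as (width, height).
SCREEN = (3840, 2160)         # physical screen; assumed target of mouse actions
SERVER_CAPPED = (1920, 1080)  # what the old screenshot_size reported
RECEIVED = (1024, 576)        # what the model actually sees


def rescale(point, from_size, to_size):
    """Hypothetical helper: map a point between two image sizes."""
    x, y = point
    return (
        round(x * to_size[0] / from_size[0]),
        round(y * to_size[1] / from_size[1]),
    )


# Rescaling against the old field lands in the capped space, not on screen.
print(rescale((200, 200), RECEIVED, SERVER_CAPPED))  # (375, 375)
# Rescaling against the original size gives the real screen position.
print(rescale((200, 200), RECEIVED, SCREEN))         # (750, 750)
```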

Solution

1. Remove screenshot_size from DesktopState and the response metadata.
2. Introduce screenshot_original_size, captured immediately after the screenshot is taken, before any server-side downscaling. This represents the true screen coordinate space.
3. Attach an inline instruction to the screenshot_original_size metadata field, explaining to the model that:
   - The image it receives may have been further resized by the LLM server.
   - Before performing any mouse action (Click, Move, etc.), it must compare the actual received image dimensions against screenshot_original_size, compute the scale ratio, and apply it to convert image-space coordinates back to screen-space coordinates.
4. Update the Screenshot tool description (in both snapshot.py and manifest.json) to make this requirement explicit upfront.
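
The ordering that matters here (record the size first, resize afterwards) can be sketched in isolation. This is not the code in service.py: the `Size` dataclass, `downscale_to_cap`, and `collect_sizes` are hypothetical stand-ins, and the 1920×1080 cap is taken from the example in the Problem section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Hypothetical stand-in for the project's Size type."""
    width: int
    height: int


def downscale_to_cap(size, max_width, max_height):
    """Shrink size to fit within the cap, preserving aspect ratio."""
    # Never upscale: the factor is at most 1.0.
    scale = min(max_width / size.width, max_height / size.height, 1.0)
    return Size(round(size.width * scale), round(size.height * scale))


def collect_sizes(captured, max_width=1920, max_height=1080):
    # Record the original size first, before any resizing happens.
    original = captured
    resized = downscale_to_cap(captured, max_width, max_height)
    return original, resized


original, resized = collect_sizes(Size(3840, 2160))
print(original)  # Size(width=3840, height=2160)
print(resized)   # Size(width=1920, height=1080)
```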

Changes

src/windows_mcp/desktop/views.py — replace screenshot_size: Size | None with screenshot_original_size: Size | None
src/windows_mcp/desktop/service.py — capture screenshot_original_size before the resize step instead of after
src/windows_mcp/tools/_snapshot_helpers.py — update metadata output to emit screenshot_original_size with coordinate-scaling guidance
src/windows_mcp/tools/snapshot.py — update Screenshot tool description
manifest.json — sync Screenshot tool description
tests/test_snapshot_display_filter.py — update tests to use the new field name

…use coordinate mismatch

The previous screenshot_size recorded the post-resize dimensions, which misled the LLM into using downscaled coordinates for mouse actions. Now capture the pre-resize original size instead, and include coordinate scaling guidance in both the response metadata and the Screenshot tool description (snapshot.py + manifest.json).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR addresses mouse coordinate mismatches caused by LLM-side image resizing by replacing the misleading screenshot_size metadata with screenshot_original_size, captured before any server-side downscaling, and by updating Screenshot guidance to instruct coordinate rescaling.

Changes:

Renames DesktopState.screenshot_size to screenshot_original_size and captures it before resizing in desktop state collection.
Updates snapshot response metadata text and Screenshot tool descriptions to explain how to scale image coordinates back to screen coordinates.
Updates affected tests to use the new field name.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `src/windows_mcp/desktop/views.py` | Renames the desktop state field to `screenshot_original_size`. |
| `src/windows_mcp/desktop/service.py` | Captures original screenshot dimensions prior to any resizing step. |
| `src/windows_mcp/tools/_snapshot_helpers.py` | Emits `screenshot_original_size` in the response text with coordinate scaling guidance. |
| `src/windows_mcp/tools/snapshot.py` | Updates the Screenshot tool description to mention coordinate scaling. |
| `manifest.json` | Syncs the Screenshot tool description update. |
| `tests/test_snapshot_display_filter.py` | Updates tests to assert against `screenshot_original_size`. |


src/windows_mcp/tools/_snapshot_helpers.py

+        metadata_text += (
+            f"Screenshot Original Size: {desktop_state.screenshot_original_size.to_string()}"
+            " (the screenshot may be downscaled; multiply image coordinates by"
+            f" the ratio of original size to displayed size to get actual screen coordinates"
+            " for click, move and other mouse actions)\n"
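
For reference, a self-contained sketch of how the added metadata line might render. The `Size` class and `original_size_line` function are hypothetical stand-ins, and the output format of `to_string()` is an assumption, since the real method is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    """Hypothetical stand-in for the project's Size type."""
    width: int
    height: int

    def to_string(self) -> str:
        # Assumed format; the real to_string() output is not shown in the diff.
        return f"({self.width},{self.height})"


def original_size_line(size: Size) -> str:
    # Same wording as the strings added in the hunk above.
    return (
        f"Screenshot Original Size: {size.to_string()}"
        " (the screenshot may be downscaled; multiply image coordinates by"
        " the ratio of original size to displayed size to get actual screen coordinates"
        " for click, move and other mouse actions)\n"
    )


print(original_size_line(Size(3840, 2160)), end="")
```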


src/windows_mcp/tools/snapshot.py

    @mcp.tool(
        name='Screenshot',
-        description="Captures a fast screenshot-first desktop snapshot with cursor position, desktop/window summaries, and an image. This path skips UI tree extraction for speed. Use Snapshot when you need interactive element ids, scrollable regions, or browser DOM extraction.",
+        description="Captures a fast screenshot-first desktop snapshot with cursor position, desktop/window summaries, and an image. This path skips UI tree extraction for speed. Use Snapshot when you need interactive element ids, scrollable regions, or browser DOM extraction. Note: the returned image may be downscaled for efficiency; when it is, multiply image coordinates by the ratio of original size to displayed size to get the actual screen coordinates for mouse actions (Click, Move, etc.).",
        annotations=ToolAnnotations(


manifest.json

    {
      "name": "Screenshot",
-      "description": "Captures a fast screenshot-first desktop snapshot with cursor position, active/open windows, and an image. Skips UI tree extraction for speed and should be the default first call when you mainly need visual context. Supports display=[0] or display=[0,1] to limit capture to specific screens."
+      "description": "Captures a fast screenshot-first desktop snapshot with cursor position, active/open windows, and an image. Skips UI tree extraction for speed and should be the default first call when you mainly need visual context. Supports display=[0] or display=[0,1] to limit capture to specific screens. Note: the returned image may be downscaled for efficiency; when it is, multiply image coordinates by the ratio of original size to displayed size to get the actual screen coordinates for mouse actions (Click, Move, etc.)."
    },


Copilot AI review requested due to automatic review settings March 19, 2026 15:17

Copilot AI reviewed Mar 19, 2026


Jeomon merged commit 23a0304 into CursorTouch:main Mar 19, 2026
3 of 4 checks passed

Jeomon merged 1 commit into CursorTouch:main from JezaChen:feat/track-screenshot-original-size

3 participants
