feat: Screenshot tool with DXCam backend reporting and UIAutomation hang fix by yasuhirofujii-medley · Pull Request #104 · CursorTouch/Windows-MCP

yasuhirofujii-medley · 2026-03-13T05:12:05Z

Summary

This PR adds a dedicated Screenshot tool for fast screenshot-only capture, reports the capture backend (DXCam/Pillow) in the response, and skips expensive UIAutomation window enumeration in the Screenshot fast path.

These changes build on top of the use_ui_tree=False fast path introduced in PR #98.

Why this is needed

1. Screenshot tool — dedicated fast capture endpoint (`65a9ed3`)

Problem: The existing Snapshot tool, even with use_ui_tree=False, still carries the overhead of being a general-purpose tool. Callers who only need a screenshot have to specify multiple flags (use_vision=True, use_annotation=False, use_ui_tree=False). More importantly, there was no way to invoke a screenshot-only path with a simple, discoverable tool name.

Solution: Added a new Screenshot tool that is purpose-built for fast screenshot capture:

Fixed to use_vision=True, use_annotation=False, use_ui_tree=False
Accepts display parameter (list of display indices) for multi-monitor selection
Single-purpose tool with a clear name that agents can discover easily
DXCam (DirectX) hardware capture is used when display is specified (requires capture_rect)

Also added:

Desktop.parse_display_selection() for robust display parameter handling
Desktop.get_display_union_rect() for computing the capture region from display indices
Shared _capture_desktop_state() helper to deduplicate Snapshot/Screenshot implementation
WINDOWS_MCP_PROFILE_SNAPSHOT env var for per-stage timing instrumentation
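The display handling described above can be illustrated with a minimal sketch. The function names match the ones listed in this PR, but the signatures, the `Rect` type, the `monitor_count` argument, and the validation rules are assumptions made for illustration; the real methods live on `Desktop` and may differ.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle type in virtual-desktop coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def parse_display_selection(display, monitor_count: int) -> list[int] | None:
    """Normalize a display argument into sorted, unique indices.

    None or an empty list is treated as "all displays" and returned as None
    (an assumption, not confirmed behavior).
    """
    if display is None:
        return None
    if isinstance(display, int):
        display = [display]
    indices: list[int] = []
    for item in display:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(item, bool) or not isinstance(item, int):
            raise ValueError(f"display index must be an integer, got {item!r}")
        if not 0 <= item < monitor_count:
            raise ValueError(f"display index {item} is out of range")
        if item not in indices:
            indices.append(item)
    return sorted(indices) or None


def get_display_union_rect(monitors: list[Rect], indices: Iterable[int]) -> Rect:
    """Return the smallest rectangle that covers every selected monitor."""
    selected = [monitors[i] for i in indices]
    if not selected:
        raise ValueError("at least one display index is required")
    return Rect(
        left=min(r.left for r in selected),
        top=min(r.top for r in selected),
        right=max(r.right for r in selected),
        bottom=max(r.bottom for r in selected),
    )


# Hypothetical layout: two 1920x1080 monitors side by side.
monitors = [Rect(0, 0, 1920, 1080), Rect(1920, 0, 3840, 1080)]
indices = parse_display_selection([1, 0, 1], monitor_count=len(monitors))
union = get_display_union_rect(monitors, indices)
```

The union rectangle is what would be passed on as the capture region when `display` is specified.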

2. Capture backend reporting (`5484e46`)

Problem: When debugging screenshot performance, there was no way to tell from the tool response whether DXCam (DirectX, ~10ms) or Pillow (GDI, ~100ms) was used for capture. This made it difficult to confirm that DXCam was actually being activated.

Solution: The get_screenshot() method now tracks the backend used (self._last_screenshot_backend), and the response includes a Screenshot Backend: dxcam or Screenshot Backend: pillow line. The DesktopState dataclass carries a screenshot_backend field.
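As a rough sketch of the reporting described above: the `screenshot_backend` field and the `Screenshot Backend:` line come from this PR, while the helper names `backend_line` and `parse_backend` are hypothetical and only show how the line could be produced and read back by a caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DesktopState:
    # Only the field relevant here; the real dataclass carries more data.
    screenshot_backend: str | None = None


def backend_line(state: DesktopState) -> str | None:
    """Return the 'Screenshot Backend: ...' line, or None when unknown."""
    if not state.screenshot_backend:
        return None
    return f"Screenshot Backend: {state.screenshot_backend}"


def parse_backend(response_text: str) -> str | None:
    """Extract the backend name from a tool response, if the line is present."""
    prefix = "Screenshot Backend:"
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped[len(prefix):].strip() or None
    return None
```

A caller that logs `parse_backend(response)` can confirm whether DXCam was actually used without inspecting the server.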

3. Skip UIAutomation window enumeration for Screenshot tool (`5b22d1b`, `3d751df`)

Problem: Desktop.get_state() unconditionally called get_controls_handles(), get_windows(), and get_active_window() — even when use_ui_tree=False (Screenshot tool). These are UIAutomation API calls that enumerate windows via COM/WM messages. When an application is launching and not responding to window messages (e.g., showing a splash screen), these calls hang for tens of seconds (observed: 47 seconds for a single screenshot).

This is the same class of problem that PR #98 addressed for tree capture, but the window enumeration calls were left in place because the Snapshot response includes window metadata. For the Screenshot tool, however, this metadata is not needed — the purpose is strictly to capture the screen image as fast as possible.

Solution: When use_ui_tree=False, get_state() now skips all three UIAutomation window enumeration calls and returns empty window lists. This eliminates the hang entirely for the Screenshot path.
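The shape of this fast path can be sketched as follows. `DesktopSketch`, `StateSketch`, `_enumerate_windows`, and `_capture` are hypothetical stand-ins, not the real `Desktop` service; the counter exists only to make it visible that the UIAutomation calls are never reached when `use_ui_tree=False`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StateSketch:
    """Hypothetical, reduced stand-in for DesktopState."""
    windows: list = field(default_factory=list)
    active_window: object | None = None
    screenshot: bytes | None = None


class DesktopSketch:
    """Hypothetical stand-in for the Desktop service."""

    def __init__(self) -> None:
        self.uia_calls = 0

    def _enumerate_windows(self):
        # Stands in for get_controls_handles / get_windows / get_active_window,
        # which can block while an application is not pumping messages.
        self.uia_calls += 1
        return ["Notepad"], "Notepad"

    def _capture(self) -> bytes:
        # Placeholder for the real screenshot capture.
        return b"fake-image"

    def get_state(self, use_ui_tree: bool = True) -> StateSketch:
        if use_ui_tree:
            windows, active = self._enumerate_windows()
        else:
            # Fast path: skip UIAutomation entirely and return empty metadata.
            windows, active = [], None
        return StateSketch(windows=windows, active_window=active,
                           screenshot=self._capture())
```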

The comment explaining this was initially written in Japanese, which caused an encoding corruption issue when uv fetched the package from GitHub — multi-byte characters were mangled, newlines were swallowed, and an if statement was merged into a comment line, producing an IndentationError on startup. The comment was rewritten in English to avoid this.

Changes

`src/windows_mcp/__main__.py`

Added Screenshot tool with display parameter
Extracted _capture_desktop_state() shared helper (used by both Snapshot and Screenshot)
Added _snapshot_profile_enabled() and _as_bool() helpers
Added _build_snapshot_response() to deduplicate response construction
Response includes Screenshot Backend: line when available

`src/windows_mcp/desktop/service.py`

get_state(): Skip get_controls_handles/get_windows/get_active_window when use_ui_tree=False
get_screenshot(): Track _last_screenshot_backend (dxcam/pillow)
Added parse_display_selection() for display parameter validation
Added get_display_union_rect() for computing display capture region
Added per-stage profiling when WINDOWS_MCP_PROFILE_SNAPSHOT=1

`src/windows_mcp/desktop/views.py`

Added screenshot_backend: str | None field to DesktopState

`src/windows_mcp/tree/service.py`

Added screen_box property (used as fallback root box when UI tree is skipped)

`tests/test_snapshot_display_filter.py`

Added tests for parse_display_selection()
Added tests for display-filtered screenshot dimensions
Added tests for use_ui_tree=False tree skip + use_dom validation

Behavior

Default behavior (no breaking changes)

Snapshot tool continues to work exactly as before
All existing parameters and defaults are preserved

New Screenshot tool

{
  "tool": "Screenshot",
  "display": [0]
}

Returns a fast screenshot with DXCam backend (when available), no UI tree, no window enumeration.

Performance impact

Scenario	Before	After
Screenshot during app launch (UIAutomation hang)	~50s	<1s
Normal Screenshot with DXCam	~200ms	~200ms
Snapshot (use_ui_tree=True)	unchanged	unchanged

Testing

python -m pytest -q tests/test_snapshot_display_filter.py
# 11 passed

Track the backend (dxcam/pillow) used by get_screenshot() and store it in DesktopState.screenshot_backend. Add a 'Screenshot Backend: dxcam/pillow' line to the response text. The Control Node can parse this information and show it in its logs, which makes it possible to confirm remotely whether DirectX capture is active.

When use_ui_tree=False (Screenshot tool), skip get_controls_handles / get_windows / get_active_window. These UIAutomation APIs can hang while an application is launching, and there were cases where Screenshot was blocked for 47 seconds or more.

uv cache fetch corrupted multi-byte (Japanese) characters in comments, causing newlines to be swallowed and merging the if-statement into the comment line, resulting in IndentationError on startup.

Jeomon · 2026-03-15T04:37:41Z

Awesome work, and sorry for the late reply.
Can you please organize __main__.py so that this file is specifically meant for the MCP server?
Also, what are your thoughts on using mss?
https://github.com/BoboTiG/python-mss

yasuhirofujii-medley · 2026-03-15T22:08:08Z

Thanks for the review and the suggestion!

Regarding python-mss:
I've looked into it — mss optimizes the GDI/BitBlt capture path so it's noticeably faster than Pillow's ImageGrab, but DXCam (DXGI Desktop Duplication) still wins significantly since it captures directly from the GPU framebuffer.

That said, mss could be a solid middle-ground fallback for environments where DXCam isn't available (e.g., RDP sessions, VMs without GPU passthrough). The backend selection architecture in this PR already supports pluggable backends via WINDOWS_MCP_SCREENSHOT_BACKEND — adding mss as a fourth option alongside auto, pillow, and dxcam would be straightforward. Happy to add that if you'd like.

Regarding __main__.py organization:
Agreed — it's grown quite large with all tool definitions living in one flat file. I'd prefer to handle that refactor in a separate follow-up PR to keep this one focused on the Screenshot tool / DXCam / UIAutomation hang fix changes. The plan would be to move tool definitions into a tools/ subpackage and keep __main__.py as a thin server entrypoint + CLI. Would that work for you?

Jeomon · 2026-03-15T22:38:00Z

okay cool

yasuhirofujii-medley · 2026-03-15T22:47:23Z

Thanks! If everything looks good, feel free to merge whenever you're ready. 🙏

I'll open a follow-up PR for the __main__.py refactor (tool definitions → tools/ subpackage) once this one lands.

Jeomon · 2026-03-15T23:04:48Z

Could we consider moving the screenshot-related logic into a file called screenshot.py in the desktop folder? I'm not sure if creating a class named Screenshot at this stage makes sense, or if we should just place all screenshot-related functions in that file. What do you think? This way, the desktop service file would be much cleaner.

Jeomon · 2026-03-15T23:06:24Z

To your knowledge, can we use mss or DXCam on macOS and Linux to capture screenshots, or are there any alternatives?

yasuhirofujii-medley · 2026-03-16T00:01:52Z

Re: Moving screenshot logic into desktop/screenshot.py

Agreed — I'll extract the screenshot-related methods into desktop/screenshot.py as part of this PR. I think standalone functions (not a class) is the right call for now. The extracted pieces would be:

get_screenshot_backend() — env var parsing
capture_with_dxcam() / capture_with_pillow() — backend implementations
resolve_dxcam_region() — monitor mapping for DXCam
capture() — main entry point with auto-fallback logic

This keeps service.py focused on desktop state orchestration and makes the screenshot layer independently testable.

Re: Cross-platform screenshot capture

Great question. Here's the landscape:

Backend	Windows	macOS	Linux (X11)	Linux (Wayland)
DXCam	✅ fastest (~10ms)	❌	❌	❌
mss	✅ fast (~20ms)	✅ (Quartz)	✅ (X11)	❌
Pillow ImageGrab	✅ (~50ms)	✅	✅ (XCB, Pillow 7.1+)	❌
PyAutoGUI	✅	✅	✅	❌
XDG Desktop Portal	❌	❌	❌	✅

Since Windows-MCP is Windows-focused, I'd suggest keeping DXCam → Pillow as the default chain for now, but designing the screenshot module with a pluggable backend interface so that adding mss (or platform-specific backends) later becomes a drop-in change.

Something like:

# desktop/screenshot.py

def capture(region=None, backend="auto") -> Image:
    """Platform-aware screenshot capture with backend fallback."""
    ...

This way, if the project expands to cross-platform in the future, we just register new backends without touching the capture API. I'll structure the extraction with this in mind.

yasuhirofujii-medley · 2026-03-16T00:03:25Z

One more thought on the cross-platform screenshot architecture — rather than choosing a single library, what about an auto-fallback chain per OS with an env var override?

WINDOWS_MCP_SCREENSHOT_BACKEND=auto (default)

auto → OS detection → fastest-first fallback:

  Windows:  DXCam → mss → Pillow
  macOS:    mss → Pillow
  Linux:    mss

Manual override:
  WINDOWS_MCP_SCREENSHOT_BACKEND=dxcam|mss|pillow

This way:

Users get the fastest available backend automatically with zero config
If a backend causes issues (e.g., DXCam on certain GPU drivers), they can pin it via one env var
Adding a new backend is just inserting it into the fallback chain — no API changes needed
mss becomes the cross-platform baseline, DXCam stays as the Windows fast path

I'll implement this as part of the desktop/screenshot.py extraction. The module would expose a single capture(region, backend="auto") entry point that handles the chain internally.

Does this direction work for you?

Jeomon · 2026-03-16T03:01:27Z

yes please

Jeomon · 2026-03-16T04:51:11Z

Bro, I only asked about macOS and Linux for clarification; this MCP is specifically for Windows.

yasuhirofujii-medley · 2026-03-16T04:57:49Z

Good point, my bad. I over-interpreted the question and went ahead with macOS/Linux fallback chains you didn't ask for.

Just pushed a fix: removed the platform.system() branching and simplified _auto_backend_chain() to a flat Windows-only chain (dxcam -> mss -> pillow). The mss backend is still useful on Windows as a middle-ground fallback (e.g., RDP sessions where DXCam isn't available), so it stays in the chain, just without any cross-platform logic.

Changes in latest commit:

Removed platform import and OS detection from screenshot.py
_auto_backend_chain() now always returns ["dxcam", "mss", "pillow"]
Updated tests accordingly (18 passed)

Jeomon · 2026-03-16T14:34:41Z

Thanks, friend, I will merge the PR.

yasuhirofujii-medley added 4 commits March 13, 2026 09:25

feat: add Screenshot tool fast path

65a9ed3

fix: replace Japanese comments with English to avoid encoding corruption

3d751df

uv cache fetch corrupted multi-byte (Japanese) characters in comments, causing newlines to be swallowed and merging the if-statement into the comment line, resulting in IndentationError on startup.

refactor: extract screenshot backend logic and add mss fallback chain

724e4d4

fix: simplify screenshot backend chain to Windows-only (remove macOS/Linux paths)

a6e378a

Jeomon merged commit 6573185 into CursorTouch:main Mar 16, 2026

yasuhirofujii-medley mentioned this pull request Mar 17, 2026

refactor: extract tool definitions into tools/ subpackage #111

Merged

Jeomon merged 6 commits into CursorTouch:main from yasuhirofujii-medley:feat/fast-snapshot-no-tree
