feat: Screenshot tool with DXCam backend reporting and UIAutomation hang fix #104
Track the backend (dxcam/pillow) used by get_screenshot() and store it in DesktopState.screenshot_backend. Add a 'Screenshot Backend: dxcam/pillow' line to the response text. The Control Node parses this line and logs it, so that whether DirectX capture is active can be confirmed remotely.
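A minimal sketch of how a caller could pick the backend line out of the response text. The function name `parse_screenshot_backend` and the sample response text are hypothetical; the PR only states that the response contains a line of the form `Screenshot Backend: dxcam` or `Screenshot Backend: pillow`, so the Control Node's actual parser may look different.

```python
import re
from typing import Optional

# Assumption: the backend is reported on its own line, exactly as
# "Screenshot Backend: <name>". Everything else here is illustrative.
_BACKEND_LINE = re.compile(r"^Screenshot Backend:\s*(\w+)\s*$", re.MULTILINE)


def parse_screenshot_backend(response_text: str) -> Optional[str]:
    """Return the reported backend name in lowercase, or None if no line is present."""
    match = _BACKEND_LINE.search(response_text)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    # Made-up response text, used only to exercise the parser.
    sample = "Some other response line\nScreenshot Backend: dxcam\n"
    print(parse_screenshot_backend(sample))
```

Returning `None` when the line is missing lets the caller distinguish "backend not reported" (for example, an older server) from a reported backend.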
When use_ui_tree=False (Screenshot tool), skip get_controls_handles / get_windows / get_active_window. These UIAutomation APIs can hang while an application is launching, and there were cases where Screenshot was blocked for 47 seconds or more.
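The guard described in this commit can be sketched roughly as follows. `WindowInfo` and `collect_window_info` are hypothetical names, and the UIAutomation calls are passed in as plain callables so the example runs anywhere; the real `get_state()` in the PR may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class WindowInfo:
    """Hypothetical stand-in for the window metadata gathered by get_state()."""
    active_window: Optional[str] = None
    windows: List[str] = field(default_factory=list)


def collect_window_info(
    use_ui_tree: bool,
    enumerate_windows: Callable[[], List[str]],
    get_active_window: Callable[[], Optional[str]],
) -> WindowInfo:
    """Call the potentially blocking enumeration functions only when needed."""
    if not use_ui_tree:
        # Screenshot fast path: return empty metadata without ever invoking
        # the UIAutomation-backed callables, so a hung application cannot
        # block the capture.
        return WindowInfo()
    return WindowInfo(
        active_window=get_active_window(),
        windows=enumerate_windows(),
    )
```

The point of the early return is that the slow calls are never started, rather than started and then timed out.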
uv cache fetch corrupted multi-byte (Japanese) characters in comments, causing newlines to be swallowed and merging the if-statement into the comment line, resulting in IndentationError on startup.
|
Awesome work, and sorry for the late reply. |
|
Thanks for the review and the suggestion! Regarding python-mss: That said, mss could be a solid middle-ground fallback for environments where DXCam isn't available (e.g., RDP sessions, VMs without GPU passthrough). The backend selection architecture in this PR already supports pluggable backends via Regarding |
|
okay cool |
|
Thanks! If everything looks good, feel free to merge whenever you're ready. 🙏 I'll open a follow-up PR for the |
|
Could we consider moving the screenshot-related logic into a file called |
|
To your knowledge, can we use mss or DXCam on macOS and Linux to capture screenshots, or are there any alternatives? |
|
Re: Moving screenshot logic into
Agreed. I'll extract the screenshot-related methods into
This keeps
Re: Cross-platform screenshot capture
Great question. Here's the landscape:
Since Windows-MCP is Windows-focused, I'd suggest keeping
Something like:
    # desktop/screenshot.py
    def capture(region=None, backend="auto") -> Image:
        """Platform-aware screenshot capture with backend fallback."""
        ...
This way, if the project expands to cross-platform in the future, we just register new backends without touching the capture API. I'll structure the extraction with this in mind. |
|
One more thought on the cross-platform screenshot architecture — rather than choosing a single library, what about an auto-fallback chain per OS with an env var override? This way:
I'll implement this as part of the Does this direction work for you? |
|
yes please |
|
Bro, I only asked about macOS and Linux for clarification. This MCP is specifically for Windows. |
|
Good point, my bad. I over-interpreted the question and went ahead with macOS/Linux fallback chains you didn't ask for. Just pushed a fix: removed the
Changes in latest commit:
|
|
Thanks, friend, I will merge the PR. |
Summary
This PR adds a dedicated Screenshot tool for fast screenshot-only capture, reports the capture backend (DXCam/Pillow) in the response, and skips expensive UIAutomation window enumeration in the Screenshot fast path.
These changes build on top of the use_ui_tree=False fast path introduced in PR #98.
Why this is needed
1. Screenshot tool: dedicated fast capture endpoint (65a9ed3)
Problem: The existing Snapshot tool, even with use_ui_tree=False, still carries the overhead of being a general-purpose tool. Callers who only need a screenshot have to specify multiple flags (use_vision=True, use_annotation=False, use_ui_tree=False). More importantly, there was no way to invoke a screenshot-only path with a simple, discoverable tool name.
Solution: Added a new Screenshot tool that is purpose-built for fast screenshot capture:
- use_vision=True, use_annotation=False, use_ui_tree=False
- display parameter (list of display indices) for multi-monitor selection
- display is specified (requires capture_rect)
Also added:
- Desktop.parse_display_selection() for robust display parameter handling
- Desktop.get_display_union_rect() for computing the capture region from display indices
- _capture_desktop_state() helper to deduplicate Snapshot/Screenshot implementation
- WINDOWS_MCP_PROFILE_SNAPSHOT env var for per-stage timing instrumentation
2. Capture backend reporting (5484e46)
Problem: When debugging screenshot performance, there was no way to tell from the tool response whether DXCam (DirectX, ~10ms) or Pillow (GDI, ~100ms) was used for capture. This made it difficult to confirm that DXCam was actually being activated.
Solution: The get_screenshot() method now tracks the backend used (self._last_screenshot_backend), and the response includes a Screenshot Backend: dxcam or Screenshot Backend: pillow line. The DesktopState dataclass carries a screenshot_backend field.
3. Skip UIAutomation window enumeration for Screenshot tool (5b22d1b, 3d751df)
Problem: Desktop.get_state() unconditionally called get_controls_handles(), get_windows(), and get_active_window(), even when use_ui_tree=False (Screenshot tool). These are UIAutomation API calls that enumerate windows via COM/WM messages. When an application is launching and not responding to window messages (e.g., showing a splash screen), these calls hang for tens of seconds (observed: 47 seconds for a single screenshot).
This is the same class of problem that PR #98 addressed for tree capture, but the window enumeration calls were left in place because the Snapshot response includes window metadata. For the Screenshot tool, however, this metadata is not needed: the purpose is strictly to capture the screen image as fast as possible.
Solution: When use_ui_tree=False, get_state() now skips all three UIAutomation window enumeration calls and returns empty window lists. This eliminates the hang entirely for the Screenshot path.
The comment explaining this was initially written in Japanese, which caused an encoding corruption issue when uv fetched the package from GitHub: multi-byte characters were mangled, newlines were swallowed, and an if statement was merged into a comment line, producing an IndentationError on startup. The comment was rewritten in English to avoid this.
Changes
src/windows_mcp/__main__.py
- Screenshot tool with display parameter
- _capture_desktop_state() shared helper (used by both Snapshot and Screenshot)
- _snapshot_profile_enabled() and _as_bool() helpers
- _build_snapshot_response() to deduplicate response construction
- Screenshot Backend: line when available
src/windows_mcp/desktop/service.py
- get_state(): Skip get_controls_handles / get_windows / get_active_window when use_ui_tree=False
- get_screenshot(): Track _last_screenshot_backend (dxcam/pillow)
- parse_display_selection() for display parameter validation
- get_display_union_rect() for computing display capture region
- WINDOWS_MCP_PROFILE_SNAPSHOT=1
src/windows_mcp/desktop/views.py
- screenshot_backend: str | None field to DesktopState
src/windows_mcp/tree/service.py
- screen_box property (used as fallback root box when UI tree is skipped)
tests/test_snapshot_display_filter.py
- parse_display_selection()
- use_ui_tree=False tree skip + use_dom validation
Behavior
Default behavior (no breaking changes)
- Snapshot tool continues to work exactly as before
New Screenshot tool
{ "tool": "Screenshot", "display": [0] }
Returns a fast screenshot with the DXCam backend (when available), no UI tree, and no window enumeration.
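To illustrate what handling the `display` parameter might involve, here is a rough sketch of validating a list of display indices and computing the rectangle that covers the selected monitors. The PR names `Desktop.parse_display_selection()` and `Desktop.get_display_union_rect()` but does not show their signatures, so the free functions below (`parse_display_indices`, `union_rect`), the `(left, top, right, bottom)` rectangle layout, and the handling of empty or duplicate selections are all assumptions for illustration.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed layout: (left, top, right, bottom) in virtual-screen pixels.
Rect = Tuple[int, int, int, int]


def parse_display_indices(
    display: Optional[Iterable[int]], count: int
) -> Optional[List[int]]:
    """Validate a display selection; None is taken to mean 'all displays'."""
    if display is None:
        return None
    indices: List[int] = []
    for value in display:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"display index must be an integer, got {value!r}")
        if not 0 <= value < count:
            raise ValueError(f"display index {value} out of range (0..{count - 1})")
        if value not in indices:  # drop duplicates, keep first-seen order
            indices.append(value)
    if not indices:
        raise ValueError("display selection must not be empty")
    return indices


def union_rect(rects: Sequence[Rect]) -> Rect:
    """Return the smallest rectangle that covers every rectangle in rects."""
    if not rects:
        raise ValueError("at least one rectangle is required")
    lefts, tops, rights, bottoms = zip(*rects)
    return (min(lefts), min(tops), max(rights), max(bottoms))


if __name__ == "__main__":
    # Two made-up side-by-side 1920x1080 monitors.
    monitors = [(0, 0, 1920, 1080), (1920, 0, 3840, 1080)]
    selected = parse_display_indices([1, 0, 1], len(monitors))
    print(selected, union_rect([monitors[i] for i in selected]))
```

Validating the indices before touching any capture API keeps bad input from reaching the backend, and the union rectangle can then be handed to a region-based capture call.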
Performance impact
Testing
python -m pytest -q tests/test_snapshot_display_filter.py # 11 passed