
fix: keyboard focus not transferred when switching foreground window #106

Merged
Jeomon merged 1 commit into CursorTouch:main from JezaChen:fix/bring-window-to-top-focus
Mar 15, 2026


@JezaChen
Contributor

Problem

When using Windows-MCP from Claude Code (running inside Windows Terminal) to maximize a background window, Claude typically:

  1. Calls App(mode="switch") to bring the target window to the foreground
  2. Calls Shortcut to send Win+Up to maximize the window

However, after step 1, the target window is only brought to the front visually — the keyboard focus remains on the previous foreground window. This is evident from the minimize/maximize/close buttons in the title bar appearing grayed out, indicating the window is not truly activated. As a result, the subsequent Win+Up shortcut is delivered to the wrong window (the previously focused one), and the intended window is never maximized.

video3.mp4

Root Cause

The bring_window_to_top method previously called AttachThreadInput(foreground_thread, target_thread, True), attaching the foreground window's thread directly to the target window's thread. However, SetForegroundWindow is called from the MCP server's own thread — which was not attached to either of these threads.

Windows enforces strict rules on which thread is allowed to change the foreground window. One key criterion is that the calling process must have "received the last input event". Since the MCP thread was not attached to the input queues of the foreground or target threads, it did not inherit this eligibility. Windows would then partially honor the request — moving the window to the top of the Z-order — but refuse to transfer keyboard focus, resulting in the observed behavior.

Fix

Instead of attaching foreground_thread ↔ target_thread, we now:

  1. Obtain the current MCP thread ID via GetCurrentThreadId()
  2. Attach current_tid to both the foreground thread and the target thread
  3. Call SetForegroundWindow / BringWindowToTop / SetWindowPos as before
  4. Detach in reverse order in a finally block

By attaching the MCP thread to both threads, it shares their input state and inherits the eligibility to change the foreground window. This ensures that both the Z-order and keyboard focus are correctly transferred to the target window.
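The four steps above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `run_with_attached_input` and `bring_window_to_top_sketch` are hypothetical names, the attach function is passed in so the ordering logic can be exercised without Windows, and error handling for invalid window handles is omitted.

```python
from typing import Callable, Iterable, List


def run_with_attached_input(
    current_tid: int,
    other_tids: Iterable[int],
    attach: Callable[[int, int, bool], None],
    action: Callable[[], None],
) -> List[int]:
    """Attach current_tid to each other thread, run action, detach in reverse order."""
    attached: List[int] = []
    try:
        for tid in other_tids:
            # Skip missing thread IDs (0), our own thread, and duplicates.
            if tid and tid != current_tid and tid not in attached:
                attach(current_tid, tid, True)
                attached.append(tid)
        action()
    finally:
        # Detach only what was actually attached, most recent first.
        for tid in reversed(attached):
            attach(current_tid, tid, False)
    return attached


def bring_window_to_top_sketch(hwnd: int) -> None:
    """Windows-only wiring (requires pywin32); a sketch, not the project's code."""
    import win32api
    import win32gui
    import win32process

    current_tid = win32api.GetCurrentThreadId()
    foreground_tid, _ = win32process.GetWindowThreadProcessId(
        win32gui.GetForegroundWindow()
    )
    target_tid, _ = win32process.GetWindowThreadProcessId(hwnd)

    def activate() -> None:
        win32gui.SetForegroundWindow(hwnd)
        win32gui.BringWindowToTop(hwnd)

    run_with_attached_input(
        current_tid,
        (foreground_tid, target_tid),
        win32process.AttachThreadInput,
        activate,
    )
```

Tracking the attached thread IDs in a list means the `finally` block detaches exactly the attachments that succeeded, even if a later attach or the activation itself raises.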

Changes

  • src/windows_mcp/desktop/service.py
    • bring_window_to_top: use GetCurrentThreadId() and attach the MCP thread to both foreground and target threads, instead of cross-attaching the two external threads
    • Added detailed comments explaining the rationale behind the approach

AttachThreadInput requires the calling thread to be one of the two
threads being attached. The previous code attached foreground_thread
to target_thread directly, which could cause Windows to bring the
window to the front visually but refuse to transfer keyboard focus.

Now we attach current_tid to both threads so our MCP thread inherits
the "received the last input event" eligibility, ensuring both the
window z-order and keyboard focus are correctly transferred.
Copilot AI review requested due to automatic review settings March 15, 2026 01:25

Copilot AI left a comment


Pull request overview

Fixes Windows focus activation when switching the foreground window so subsequent keyboard shortcuts (e.g., Win+Up) are delivered to the intended target window, not the previously focused one.

Changes:

  • Update bring_window_to_top to attach the MCP server thread input queue to both the current foreground thread and the target window thread before calling SetForegroundWindow.
  • Detach thread input queue attachments reliably via finally, and add detailed rationale comments explaining the Windows focus rules involved.


win32process.AttachThreadInput(current_tid, tid, False)

except Exception as e:
    logger.exception(f"Failed to bring window to top: {e}")
Comment on lines +557 to +561
+ attached_threads = []
  try:
-     win32process.AttachThreadInput(foreground_thread, target_thread, True)
-     attached = True
+     for thread in (foreground_thread, target_thread):
+         if thread and thread != current_tid:
+             win32process.AttachThreadInput(current_tid, thread, True)
@Jeomon Jeomon merged commit 8678bef into CursorTouch:main Mar 15, 2026
3 of 4 checks passed
@Jeomon
Member

Jeomon commented Mar 15, 2026

Great work. I missed that part.

@Jeomon
Member

Jeomon commented Mar 15, 2026

Bro, a small doubt.
When we press Ctrl+A in a text field, the text is completely selected.

One problem I've seen in browsers: clicking on the address bar selects the full text inside it, and I'm unable to capture that state. So sometimes the model does a Ctrl+A on top of it, which leads to loss of the full selection. Instead of hardcoding the logic for this element, can we track that through some sort of parameter?

@JezaChen
Contributor Author

> Bro, a small doubt. When we press Ctrl+A in a text field, the text is completely selected.
>
> One problem I've seen in browsers: clicking on the address bar selects the full text inside it, and I'm unable to capture that state. So sometimes the model does a Ctrl+A on top of it, which leads to loss of the full selection. Instead of hardcoding the logic for this element, can we track that through some sort of parameter?

I'll check this issue, was it introduced by my fix?

@Jeomon
Member

Jeomon commented Mar 16, 2026

No, it's just something that bothers me when I use it. In a browser's address bar, one click auto-selects the entire URL, but I can't find a parameter in UIA that indicates this to the LLM, so the model is not aware of it.

@JezaChen
Contributor Author

If possible, could you share a short screen recording when this happens? I feel like I may have seen a similar issue too, but I’m not sure if it’s the exact same one you mean. A recording would make it much easier for me to confirm. I’ll take a look and see if I can resolve it🙂

@Jeomon
Member

Jeomon commented Mar 16, 2026

Screen.Recording.2026-03-16.101858.mov

This is the problem I'm facing, and there could be a solution in the UIA module. This is just me mimicking the behavior the LLM shows; it is a state feedback issue.

@JezaChen
Contributor Author

Would it be fair to say that what we really need here is a way for the model to know whether there is already a text selection in the textbox, and if so, what the selection range is? This might require adding a state parameter.

@Jeomon
Member

Jeomon commented Mar 16, 2026

The model selects an element based on the coordinate found in the desktop state; it could be a random point inside the element's bounding box.
However, when it clicks on the address bar, the text gets selected, and these models sometimes set clear=True in the Type tool. Then the already selected text gets deselected and the model begins typing inside the existing text (usually a URL), leading to a mess.
So the desktop state needs something like a "this text is already selected" indicator.
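The idea discussed here, exposing the selection as part of the desktop state, could look roughly like this. It is a minimal sketch under assumptions: `TextBoxState`, its `selection` field, and `needs_select_all` are hypothetical names for illustration and do not exist in Windows-MCP, and how the selection range would be read from UIA is left open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextBoxState:
    """Hypothetical per-element state reported to the model."""

    text: str
    # (start, end) character offsets, end exclusive; None means no selection.
    selection: Optional[Tuple[int, int]] = None

    @property
    def fully_selected(self) -> bool:
        # True only when a non-empty text is selected from start to end.
        return bool(self.text) and self.selection == (0, len(self.text))


def needs_select_all(state: TextBoxState) -> bool:
    """Return True if a clear operation should send Ctrl+A before typing."""
    return bool(state.text) and not state.fully_selected
```

With a field like this in the desktop state, the Type tool (or the model) could skip the extra Ctrl+A when a click has already selected the whole address bar.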

@JezaChen
Contributor Author

Hey @Jeomon, I found an issue while debugging the tool:

Error calling tool 'Type': name '_INPUTUnion' is not defined

This happens because from .enums import * does not import names that start with _, so _INPUTUnion is skipped. I opened a small PR #108 to fix it first by adding an explicit import:

from .enums import _INPUTUnion
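The star-import behavior behind this error can be reproduced in isolation. The sketch below uses a throwaway in-memory module named `fake_enums` (a hypothetical stand-in for the package's `enums` module, which is assumed not to define `__all__`) and compares a star import with an explicit import.

```python
import sys
import types

# Hypothetical stand-in for the real enums module; it defines no __all__.
fake_enums = types.ModuleType("fake_enums")
fake_enums.INPUT = "public name"
fake_enums._INPUTUnion = "underscore-prefixed name"
sys.modules["fake_enums"] = fake_enums

# Star import: without __all__, names starting with "_" are skipped.
star_ns = {}
exec("from fake_enums import *", star_ns)

# Explicit import: underscore-prefixed names can be imported by name.
explicit_ns = {}
exec("from fake_enums import _INPUTUnion", explicit_ns)

print("INPUT" in star_ns)            # True
print("_INPUTUnion" in star_ns)      # False
print("_INPUTUnion" in explicit_ns)  # True
```

An alternative to the explicit import would be listing the name in the module's `__all__`, which also makes a star import pick it up.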

@Jeomon
Member

Jeomon commented Mar 16, 2026

Merged it, thanks again.

@JezaChen
Contributor Author

Hey bro, I tried this on my side with Claude models as well (Sonnet and Opus). When they call the Type tool with clear=True, the internal Ctrl+A behaves as expected — it selects the text, and the following steps also work as expected: all existing text gets removed and the new text is typed correctly.

So I’m wondering whether Ctrl+A itself can really cancel or invert the selection in your case. I’m not seeing that behavior here. It seems more likely that something else affected the model’s decision and caused an extra click between Ctrl+A and typing, which would collapse the full selection into a caret. I checked the related code, and that seems unlikely🤨, but I think it's still more plausible than Ctrl+A itself breaking the selection.

@Jeomon
Member

Jeomon commented Mar 16, 2026

I've created a demonstration video that illustrates an issue I'm experiencing. I want to clarify that it's my demonstration, not the model itself. When you click on the address bar, the entire text is automatically selected, which works as expected. However, the problem occurs when I press "Control + A" again. The complete selection disappears, and the cursor appears where the text is inserted. This seems to be a bug or issue.

If you have some spare time, could you please take a look at the Windows Use project I developed? It's an agent I built on top of Windows MCP. Thank you!

https://github.com/CursorTouch/Windows-Use

@JezaChen
Contributor Author

Haha, yeah, I got you — you mean this is from your manual demo, not from the model itself 😄 What really puzzles me is why pressing Ctrl+A again would make the selection disappear. On my machine, it does not toggle like that — it just keeps the full text selected. So that behavior feels really strange to me.

And sure, I’ll take a look at the Windows Use project after work. It sounds like a really exciting tool — thanks for sharing it, and thanks for your work on it!

@Jeomon
Member

Jeomon commented Mar 16, 2026

My pleasure, if things go well, I will release the macOS-MCP for a bigger audience.

I truly thank you for your interest in the project

@JezaChen
Contributor Author

Hey, I tried Windows-Use with the MiniMax-compatible Anthropic API, and it works as expected!

One thing I wanted to mention, though: when I first followed the Quick Start example in the README, the agent kept complaining with this error:

[WARNING] [Agent] 🚨 Tool 'Shell' failed: Tool 'shell_tool' execution failed:
1 validation error for ToolResult
content
  Input should be a valid string [type=string_type, input_value=<coroutine object shell_t...l at 0x000001AD76773B40>, input_type=coroutine]

I took a quick look at the source code, and it seems that tools like shell_tool have already been implemented using coroutines (async def shell_tool(command: str,timeout:int=10,**kwargs) -> str). Because of that, using them in a synchronous way appears to cause problems.

I also noticed that the README currently includes both synchronous and asynchronous usage examples. So I was wondering: is the synchronous usage still supposed to work, or is the async version the recommended / correct one now?
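The validation error above is what happens when an `async def` function is called without being awaited: the call returns a coroutine object instead of a string. A minimal reproduction follows, where the body of `shell_tool` is a hypothetical stand-in and only the signature is taken from the source quoted above.

```python
import asyncio
import inspect


async def shell_tool(command: str, timeout: int = 10, **kwargs) -> str:
    # Stand-in body for illustration; the real tool runs a shell command.
    await asyncio.sleep(0)
    return f"ran: {command}"


# Calling the function synchronously yields a coroutine, not a str.
pending = shell_tool("echo hi")
print(inspect.iscoroutine(pending))  # True
pending.close()  # discard it so Python does not warn that it was never awaited

# From synchronous code, drive the coroutine to completion on an event loop.
output = asyncio.run(shell_tool("echo hi"))
print(output)  # ran: echo hi
```

Inside an already running event loop, the call would be `await shell_tool(...)` instead, since `asyncio.run` cannot be used there.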

@Jeomon
Member

Jeomon commented Mar 16, 2026

I provided separate support for the sync and async modes of operation of the agent. I guess this could fix the issue.
