fix: keyboard focus not transferred when switching foreground window #106
AttachThreadInput requires the calling thread to be one of the two threads being attached. The previous code attached foreground_thread to target_thread directly, which could cause Windows to bring the window to the front visually but refuse to transfer keyboard focus. Now we attach current_tid to both threads so our MCP thread inherits the "received the last input event" eligibility, ensuring both the window z-order and keyboard focus are correctly transferred.
Pull request overview
Fixes Windows focus activation when switching the foreground window so subsequent keyboard shortcuts (e.g., Win+Up) are delivered to the intended target window, not the previously focused one.
Changes:
- Update `bring_window_to_top` to attach the MCP server thread input queue to both the current foreground thread and the target window thread before calling `SetForegroundWindow`.
- Detach thread input queue attachments reliably via `finally`, and add detailed rationale comments explaining the Windows focus rules involved.
| win32process.AttachThreadInput(current_tid, tid, False) | ||
|
|
||
| except Exception as e: | ||
| logger.exception(f"Failed to bring window to top: {e}") |
| attached_threads = [] | ||
| try: | ||
| win32process.AttachThreadInput(foreground_thread, target_thread, True) | ||
| attached = True | ||
| for thread in (foreground_thread, target_thread): | ||
| if thread and thread != current_tid: | ||
| win32process.AttachThreadInput(current_tid, thread, True) |
|
Great Work. I missed that part |
|
Bro, one problem I've seen in browsers is that clicking the address bar selects the full text inside it, and I'm unable to capture that state. So sometimes the model presses Ctrl+A on top of it, which loses the full selection. Instead of hardcoding logic for this element, can we track that through some sort of parameter? |
I'll check this issue. Was it introduced by my fix? |
|
No, it's something that bothers me when I use it. In the browser's address bar, one click auto-selects the entire URL, but I can't find a parameter in UIA that indicates this to the LLM, so the model is not aware of it. |
|
If possible, could you share a short screen recording when this happens? I feel like I may have seen a similar issue too, but I’m not sure if it’s the exact same one you mean. A recording would make it much easier for me to confirm. I’ll take a look and see if I can resolve it🙂 |
Screen.Recording.2026-03-16.101858.mov
This is the problem being faced, and there could be a solution from the UIA module. This is just me mimicking the problem (shown by the LLM); it is a state feedback issue. |
|
Would it be fair to say that what we really need here is a way for the model to know whether there is already a text selection in the textbox, and if so, what the selection range is? This might require adding a state parameter. |
|
The model selects an element based on the coordinate found in the desktop state; it could be a random point inside the bounding box of an element. |
|
Hey @Jeomon , I found an issue while debugging the tool:
This happens because
|
|
Merged it, thanks again |
|
Hey bro, I tried this on my side with Claude models as well (Sonnet and Opus). When they call the So I’m wondering whether |
|
I've created a demonstration video that illustrates an issue I'm experiencing. I want to clarify that it's my demonstration, not the model itself. When you click on the address bar, the entire text is automatically selected, which works as expected. However, the problem occurs when I press "Control + A" again. The complete selection disappears, and the cursor appears where the text is inserted. This seems to be a bug or issue. If you have some spare time, could you please take a look at the Windows Use project I developed? It's an agent I built on top of Windows MCP. Thank you! |
|
Haha, yeah, I got you — you mean this is from your manual demo, not from the model itself 😄 What really puzzles me is why pressing And sure, I’ll take a look at the Windows Use project after work. It sounds like a really exciting tool — thanks for sharing it, and thanks for your work on it! |
|
My pleasure, if things go well, I will release the macOS-MCP for a bigger audience. I truly thank you for your interest in the project |
|
Hey, I tried Windows-Use with the MiniMax-compatible Anthropic API, and it works as expected! One thing I wanted to mention, though: when I first followed the Quick Start example in the README, the agent kept complaining with this error: I took a quick look at the source code, and it seems that tools like I also noticed that the README currently includes both synchronous and asynchronous usage examples. So I was wondering: is the synchronous usage still supposed to work, or is the async version the recommended / correct one now? |
|
I provided separate support for the sync and async modes of operation of the agent. I guess this could fix the issue |
Problem
When using Windows-MCP from Claude Code (running inside Windows Terminal) to maximize a background window, Claude typically:
1. Calls `App(mode="switch")` to bring the target window to the foreground
2. Calls `Shortcut` to send `Win+Up` to maximize the window

However, after step 1, the target window is only brought to the front visually; the keyboard focus remains on the previous foreground window. This is evident from the minimize/maximize/close buttons in the title bar appearing grayed out, indicating the window is not truly activated. As a result, the subsequent `Win+Up` shortcut is delivered to the wrong window (the previously focused one), and the intended window is never maximized.

video3.mp4
Root Cause
The `bring_window_to_top` method previously called `AttachThreadInput(foreground_thread, target_thread, True)`, attaching the foreground window's thread directly to the target window's thread. However, `SetForegroundWindow` is called from the MCP server's own thread, which was not attached to either of these threads.

Windows enforces strict rules on which thread is allowed to change the foreground window. One key criterion is that the calling process must have "received the last input event". Since the MCP thread was not attached to the input queues of the foreground or target threads, it did not inherit this eligibility. Windows would then partially honor the request, moving the window to the top of the Z-order, but refuse to transfer keyboard focus, resulting in the observed behavior.
Fix
Instead of attaching `foreground_thread ↔ target_thread`, we now:
1. Get the MCP thread's ID via `GetCurrentThreadId()`
2. Attach `current_tid` to both the foreground thread and the target thread
3. Call `SetForegroundWindow` / `BringWindowToTop` / `SetWindowPos` as before
4. Detach in a `finally` block

By attaching the MCP thread to both threads, it shares their input state and inherits the eligibility to change the foreground window. This ensures that both the Z-order and keyboard focus are correctly transferred to the target window.
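
The attach/detach pattern described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in `service.py`: the helper name `attached_input` and its parameters are hypothetical, and the attach function is passed in as an argument (anything with the shape of `AttachThreadInput(idAttach, idAttachTo, fAttach)`) so the sketch can be exercised without Windows.

```python
from contextlib import contextmanager


@contextmanager
def attached_input(attach, current_tid, thread_ids):
    """Attach current_tid's input queue to each thread, detach on exit.

    Hypothetical helper for illustration. `attach` is any callable shaped
    like AttachThreadInput: attach(id_attach, id_attach_to, flag).
    """
    attached = []
    try:
        for tid in thread_ids:
            # Skip invalid ids (0/None), our own thread, and duplicates.
            if tid and tid != current_tid and tid not in attached:
                attach(current_tid, tid, True)
                attached.append(tid)
        yield attached
    finally:
        # Detach in reverse order, even if attaching or the body failed.
        for tid in reversed(attached):
            try:
                attach(current_tid, tid, False)
            except Exception:
                # Keep going so the remaining threads are still detached.
                pass


# Possible use on Windows with pywin32 (assumed wiring, not run here):
#
#   import win32api, win32gui, win32process
#   current_tid = win32api.GetCurrentThreadId()
#   with attached_input(win32process.AttachThreadInput, current_tid,
#                       (foreground_thread, target_thread)):
#       win32gui.SetForegroundWindow(hwnd)
```

Tracking the successfully attached threads in a list means a failure partway through only detaches what was actually attached, which is the property the `finally` step is meant to guarantee.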
Changes
`src/windows_mcp/desktop/service.py`
- `bring_window_to_top`: use `GetCurrentThreadId()` and attach the MCP thread to both foreground and target threads, instead of cross-attaching the two external threads