
Workflows

Claude edited this page Jun 30, 2026 · 1 revision

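Concrete sequences you can copy and adapt. Each line shows a tool name followed by that tool's JSON input. After every call, read the ok field in the response, and verify the outcome with a screenshot. The hwnd values below are examples; use the ones returned by list_windows, list_child_windows, list_headless_windows, or list_virtual_display_windows.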
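The check-then-verify habit can be sketched in Python as a small runner. This is a minimal sketch, not this project's client: it assumes a hypothetical call_tool(name, args) function that sends one tool call and returns the decoded JSON response as a dict with an ok field. run_steps and workflow_1 are illustrative names.

```python
from typing import Any, Callable

# Hypothetical transport: sends one tool call and returns the decoded JSON
# response. How tools are really invoked depends on your client.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]
Step = tuple[str, dict[str, Any]]


def run_steps(call_tool: CallTool, steps: list[Step],
              verify: Step = ("screenshot", {"monitor": 1})) -> list[dict[str, Any]]:
    """Run (tool, input) steps in order, then take one verification screenshot.

    Assumes every response is a dict with a boolean "ok" field; stops at
    the first call whose ok is missing or false.
    """
    results: list[dict[str, Any]] = []
    for name, args in [*steps, verify]:
        result = call_tool(name, args)
        results.append(result)
        if not result.get("ok"):
            raise RuntimeError(f"{name} failed for input {args}: {result}")
    return results


# Workflow 1 expressed as data: click, type, save, then verify on monitor 1.
workflow_1 = [
    ("mouse_click", {"x": 960, "y": 540}),
    ("type_text", {"text": "hello world"}),
    ("press_keys", {"keys": ["ctrl", "s"]}),
]
# results = run_steps(call_tool, workflow_1)
```

Failing fast on the first ok that is false keeps later steps (such as ctrl+s) from acting on a window that never received the earlier input.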

1. Simple foreground automation

screenshot { "monitor": 1 }
mouse_click { "x": 960, "y": 540 }
type_text { "text": "hello world" }
press_keys { "keys": ["ctrl", "s"] }
screenshot { "monitor": 1 }

2. Background automation (no focus stealing) — Notepad

list_windows { "title_filter": "Notepad" }
list_child_windows { "window_title": "Notepad" }
win_set_control_text { "hwnd": 23924320, "text": "typed in the background" }
screenshot { "window_title": "Notepad" }

Background click (x and y are client-area coordinates of the window) and background keys:

mouse_click { "window_title": "Notepad", "x": 200, "y": 120 }
win_send_keys { "window_title": "Notepad", "keys": ["ctrl", "end"] }

3. Windows headless-with-GUI

create_headless_desktop { "name": "work" }
launch_on_headless_desktop { "name": "work", "command": "notepad.exe" }
list_headless_windows { "name": "work" }
win_set_control_text { "hwnd": 2495156, "text": "running invisibly" }
screenshot { "hwnd": 2495156 }
close_headless_desktop { "name": "work" }
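If a step fails midway, the headless desktop may be left open. A hedged Python sketch that makes close_headless_desktop run no matter what happens inside the block, again assuming a hypothetical call_tool(name, args) that returns the response dict with an ok field. headless_desktop and notepad_demo are illustrative names; the same pairing idea applies to create_virtual_display / stop_virtual_display.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

# Hypothetical transport: returns the decoded JSON response dict.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


@contextmanager
def headless_desktop(call_tool: CallTool, name: str) -> Iterator[str]:
    """Create a headless desktop and close it on exit, even after an error."""
    result = call_tool("create_headless_desktop", {"name": name})
    if not result.get("ok"):
        raise RuntimeError(f"create_headless_desktop failed: {result}")
    try:
        yield name
    finally:
        # Runs on normal exit and when a step inside the with-block raises.
        call_tool("close_headless_desktop", {"name": name})


def notepad_demo(call_tool: CallTool) -> None:
    # First half of workflow 3; the hwnd for later steps must be read
    # from the list_headless_windows response.
    with headless_desktop(call_tool, "work") as desk:
        call_tool("launch_on_headless_desktop", {"name": desk, "command": "notepad.exe"})
        call_tool("list_headless_windows", {"name": desk})
```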

4. Linux headless-with-GUI (Xvfb)

linux_status {}
create_virtual_display { "display": 99, "width": 1280, "height": 800 }
launch_on_virtual_display { "display": 99, "command": "xterm -e bash" }
list_virtual_display_windows { "display": 99 }
type_text { "hwnd": 2097164, "display": 99, "text": "echo hi" }
win_send_keys { "hwnd": 2097164, "display": 99, "keys": ["enter"] }
screenshot { "hwnd": 2097164, "display": 99 }
stop_virtual_display { "display": 99 }

5. Show a window for login, then hide it again

show_window { "window_title": "My App" }
hide_window { "window_title": "My App" }

To show or hide an entire Windows headless desktop:

show_headless_desktop { "name": "work" }
hide_headless_desktop { "name": "work" }

6. Screen recording

start_screen_recording { "fps": 15, "monitor": 1 }
recording_status {}
stop_screen_recording {}

7. Screenshot then crop

screenshot { "monitor": 1 }
crop_image { "input_path": "...screenshot-....png", "left": 0, "top": 0, "width": 400, "height": 300 }
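To check the spot you just clicked, it can help to crop a box centered on the click point. A minimal sketch, assuming a 1920x1080 screenshot (the click at 960, 540 in workflow 1 suggests, but does not state, that size); crop_around is a hypothetical helper that only builds the crop_image input using the left/top/width/height fields shown above. Pass the path returned by screenshot as path.

```python
def crop_around(path: str, x: int, y: int, width: int, height: int,
                screen_w: int = 1920, screen_h: int = 1080) -> dict:
    """Build crop_image input for a width x height box centered on (x, y).

    The box is shifted (not shrunk) so it stays inside a screen_w x screen_h
    screenshot; it is only shrunk if it is larger than the screen itself.
    The default screen size is an assumption.
    """
    width = min(width, screen_w)
    height = min(height, screen_h)
    left = min(max(x - width // 2, 0), screen_w - width)
    top = min(max(y - height // 2, 0), screen_h - height)
    return {"input_path": path, "left": left, "top": top,
            "width": width, "height": height}
```

For example, crop_around(path, 960, 540, 400, 300) yields left 760 and top 390, a 400x300 box centered on the workflow 1 click.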

8. Process control

list_processes { "name_filter": "chrome", "sort_by": "memory", "limit": 10 }
kill_process { "pid": 12345 }

9. Throwaway WSL Linux box

wsl_create_temp {}
wsl_run { "distro": "llcu-tmp-...", "command": "apk add --no-cache curl && curl --version" }
wsl_destroy { "name": "llcu-tmp-..." }

10. AutoHotkey background input

ahk_control_send { "text": "hello", "window": "ahk_id 0x1A2B3C" }

Decision guide — which input method?

| Situation | Use |
|---|---|
| App is focused, simple case | mouse_click / type_text / press_keys |
| No focus stealing, Windows edit control | win_set_control_text (WM_SETTEXT) |
| No focus stealing, general Windows app | mouse_click / type_text with hwnd; if ignored, ahk_control_send |
| Linux background input | type_text / win_send_keys with hwnd (plus display) |
| App ignores synthetic events | Focus it first (window_action focus), or use AHK ControlSend |
| Invisible run | Headless desktop (Windows) / Xvfb (Linux) |
| Need Linux on Windows | wsl_* tools |

See also the Macros page to turn any of these workflows into a reusable skill.