# Chapter 20: CONTROLLING THE KEYBOARD AND MOUSE WITH GUI AUTOMATION

`pip install PyAutoGUI`

On Linux, also install these dependencies:
- `sudo apt install scrot python3-tk python3-dev`

### Pauses and Fail-Safes

If your program has a bug and you’re unable to use the keyboard and mouse to shut it down, you can use PyAutoGUI’s fail-safe feature: quickly slide the mouse to one of the four corners of the screen. Every PyAutoGUI function call pauses for a tenth of a second after performing its action, which gives you enough time to move the mouse to a corner. If PyAutoGUI then finds that the mouse cursor is in a corner, it raises the `pyautogui.FailSafeException` exception. Instructions that don’t call PyAutoGUI do not have this tenth-of-a-second delay.

If you find yourself in a situation where you need to stop your PyAutoGUI program, just slam the mouse toward a corner to stop it.
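
As a minimal sketch of how a script might react to the fail-safe, the wrapper below catches the exception and stops cleanly. `run_guarded` and `demo_with_pyautogui` are hypothetical helper names, not part of PyAutoGUI; the exception class is passed in as a parameter so the pattern can be tried without moving the real mouse.

```python
def run_guarded(task, stop_exception):
    """Run task(); return False if it was aborted by stop_exception.

    run_guarded is a hypothetical helper, not a PyAutoGUI function.
    """
    try:
        task()
    except stop_exception:
        print('Aborted: fail-safe triggered.')
        return False
    return True


def demo_with_pyautogui():
    """Real-world usage; call this only on a machine with a desktop session."""
    import pyautogui  # Imported here so run_guarded() works without a display.

    pyautogui.PAUSE = 0.1      # Delay after each PyAutoGUI call (the default).
    pyautogui.FAILSAFE = True  # Keep the corner fail-safe enabled (the default).

    def task():
        for _ in range(10):
            pyautogui.move(50, 0, duration=0.2)
            pyautogui.move(-50, 0, duration=0.2)

    return run_guarded(task, pyautogui.FailSafeException)
```

Calling `demo_with_pyautogui()` and slamming the mouse into a corner should end the loop with the "Aborted" message instead of a traceback.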

### Controlling Mouse Movement

In [14]:
import pyautogui

wh = pyautogui.size()  # Obtain the screen resolution.
wh

Size(width=1920, height=1080)

In [15]:
print(wh[0])
print(wh[1])

1920
1080


In [16]:
print(wh.width)
print(wh.height)

1920
1080


### Moving the Mouse

In [17]:
import pyautogui

for i in range(2):  # Move mouse in a square
    pyautogui.moveTo(100, 100, duration=0.2)
    pyautogui.moveTo(200, 100, duration=0.2)
    pyautogui.moveTo(200, 200, duration=0.2)
    pyautogui.moveTo(100, 200, duration=0.2)

The `pyautogui.move()` function moves the mouse cursor *relative to its current position*.

In [18]:
for i in range(2):
    pyautogui.move(100, 0, duration=0.2) # right
    pyautogui.move(0, 100, duration=0.2) # down
    pyautogui.move(-100, 0, duration=0.2) # left
    pyautogui.move(0, -100, duration=0.2) # up
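
The two loops above trace the same square, once with absolute coordinates and once with relative offsets. The sketch below shows how the two relate; `to_relative` is a hypothetical helper written for this illustration, and it only does arithmetic, so it does not move the mouse.

```python
def to_relative(start, waypoints):
    """Convert absolute (x, y) waypoints into relative (dx, dy) offsets."""
    offsets = []
    x, y = start
    for next_x, next_y in waypoints:
        offsets.append((next_x - x, next_y - y))
        x, y = next_x, next_y
    return offsets


# Starting at (100, 100), visit the other corners and return to the start.
square = [(200, 100), (200, 200), (100, 200), (100, 100)]
print(to_relative((100, 100), square))
# [(100, 0), (0, 100), (-100, 0), (0, -100)]
```

Each offset could be passed to `pyautogui.move()`, while each waypoint could be passed to `pyautogui.moveTo()`.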

### Getting the Mouse Position

In [19]:
pyautogui.position() # Get current mouse position.

Point(x=552, y=807)

In [20]:
p = pyautogui.position()
p

Point(x=552, y=806)

In [21]:
p[0]  # The x-coordinate is at index 0.

552

In [22]:
p.x  # The x-coordinate is also in the x attribute.

552

### Controlling Mouse Interaction

#### Clicking the Mouse

`pyautogui.click(100, 150, button='left')`

The `button` argument can be `'left'`, `'middle'`, or `'right'`.

In [23]:
import pyautogui

pyautogui.click(10, 5)  # Move mouse to (10, 5) and click.

In [24]:
pyautogui.mouseDown(30, 70)
pyautogui.move(0, 150, duration=0.2)
pyautogui.mouseUp()

In [25]:
pyautogui.doubleClick()

In [26]:
pyautogui.rightClick()

In [27]:
pyautogui.leftClick()

In [28]:
pyautogui.middleClick()

#### Dragging the Mouse

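- `pyautogui.drag()` - drags the mouse relative to its current position.
- `pyautogui.dragTo()` - drags the mouse to the given screen coordinates.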

In [29]:
import time
import pyautogui

time.sleep(3)
pyautogui.click()  # Click to make the window active.
distance = 400
change = 20

while distance > 0:
    pyautogui.drag(distance, 0)  # Move right.
    distance -= change
    pyautogui.drag(0, distance)  # Move down.
    pyautogui.drag(-distance, 0)  # Move left.
    distance -= change
    pyautogui.drag(0, -distance)  # Move up.

#### Scrolling the Mouse

In [None]:
import pyautogui

pyautogui.scroll(200)

### Planning Your Mouse Movements

MouseInfo is a small application, launched with `pyautogui.mouseInfo()`, that displays the mouse cursor’s current coordinates and the color of the pixel under it. By default, its 3 Sec. Button Delay checkbox is checked, causing a three-second delay between clicking a Copy or Log button and the copying or logging taking place. This gives you a short amount of time in which to click the button and then move the mouse into your desired position. It may be easier to uncheck this box, move the mouse into position, and press the F1 to F8 keys to copy or log the mouse position.

For example, uncheck the 3 Sec. Button Delay, then move the mouse around the screen while pressing the F6 button, and notice how the x- and y-coordinates of the mouse are recorded in the large text field in the middle of the window. You can later use these coordinates in your PyAutoGUI scripts.

For more information on MouseInfo, review the complete documentation at https://mouseinfo.readthedocs.io/.

In [None]:
import pyautogui

pyautogui.mouseInfo()  # provides mouse coordinate information

### Working with the Screen

#### Getting a Screenshot

In [None]:
import pyautogui

im = pyautogui.screenshot()

The `im` variable will contain the `Image` object of the screenshot. You can now call methods on the `Image` object in the `im` variable, just like any other `Image` object.

#### Analyzing the Screenshot

Say that one of the steps in your GUI automation program is to click a gray button. Before calling the `click()` method, you could take a screenshot and look at the pixel where the script is about to click. If it’s not the same gray as the gray button, then your program knows something is wrong. Maybe the window moved unexpectedly, or maybe a pop-up dialog has blocked the button. At this point, instead of continuing—and possibly wreaking havoc by clicking the wrong thing—your program can “see” that it isn’t clicking the right thing and stop itself. You can obtain the RGB color value of a particular pixel on the screen with the `pixel()` function.

In [None]:
import pyautogui

pyautogui.pixel(0, 0)

(0, 17, 38)

In [None]:
pyautogui.pixel(50, 200)

(40, 44, 52)

The return value from `pixel()` is an RGB tuple of three integers for the amount of red, green, and blue in the pixel. (There is no fourth value for alpha, because screenshot images are fully opaque.)

PyAutoGUI’s `pixelMatchesColor()` function will return True if the pixel at the given x- and y-coordinates on the screen matches the given color. The first and second arguments are integers for the x- and y-coordinates, and the third argument is a tuple of three integers for the RGB color the screen pixel must match.

In [None]:
pyautogui.pixel(50, 200)

(40, 44, 52)

In [None]:
pyautogui.pixelMatchesColor(50, 200, (40, 44, 52))

True

In [None]:
pyautogui.pixelMatchesColor(50, 200, (255, 135, 144))

False
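
The check-before-click idea described earlier in this section could be sketched as follows. Both function names are hypothetical. `colors_match` is a plain-Python illustration of comparing RGB tuples channel by channel, with an optional tolerance, and is not PyAutoGUI’s own code; `click_if_expected` uses the real `pixelMatchesColor()` and `click()` calls and needs a desktop session to run.

```python
def colors_match(actual, expected, tolerance=0):
    """Return True if each RGB channel differs by at most tolerance."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


def click_if_expected(x, y, expected_rgb):
    """Click (x, y) only if the pixel there has the expected color."""
    import pyautogui  # Imported lazily; requires a desktop session.

    if not pyautogui.pixelMatchesColor(x, y, expected_rgb):
        raise RuntimeError(f'Unexpected color at ({x}, {y}); not clicking.')
    pyautogui.click(x, y)


print(colors_match((40, 44, 52), (40, 44, 52)))     # True
print(colors_match((40, 44, 52), (255, 135, 144)))  # False
print(colors_match((40, 44, 52), (42, 44, 50), 2))  # True
```

Raising an exception, rather than clicking anyway, stops the script at the first sign that the screen is not in the state it expects.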

### Image Recognition

In [None]:
import pyautogui

b = pyautogui.locateOnScreen('submit.png')
b

Box(left=0, top=0, width=63, height=52)

In [None]:
b[0]

0

In [None]:
b.left

0

In [None]:
pyautogui.click((643, 745, 70, 29))

In [None]:
pyautogui.click(b)

In [None]:
pyautogui.click('submit.png')

In [None]:
try:
    location = pyautogui.locateOnScreen('submit.png')
except pyautogui.ImageNotFoundException:  # Catch only the "not found" error.
    print("Image could not be found.")
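
An image may not be on screen yet when the script first looks for it, so retrying a few times can be more robust than a single attempt. The helper below is a sketch of that idea; `wait_for` is a hypothetical name, and the search function and the "not found" exception class are passed in so the retry logic can be exercised without a screen. It also treats a `None` result as "not found", on the assumption that some PyAutoGUI versions return `None` instead of raising.

```python
import time


def wait_for(locate, not_found_exc, attempts=5, delay=1.0):
    """Call locate() until it returns a result; return None if it never does."""
    for attempt in range(attempts):
        try:
            result = locate()
        except not_found_exc:
            result = None
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay)  # Give the application time to draw the image.
    return None


# Possible usage on a desktop machine (not run here):
#   import pyautogui
#   box = wait_for(lambda: pyautogui.locateOnScreen('submit.png'),
#                  pyautogui.ImageNotFoundException)
#   if box is not None:
#       pyautogui.click(box)
```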

### Getting Window Information

#### Obtaining the Active Window

In [None]:
import pyautogui

pyautogui.getActiveWindow()

Win32Window(hWnd=918830)

In the interactive shell, call the `pyautogui.getActiveWindow()` function to get a Window object.

Once you have that Window object, you can retrieve any of the object’s attributes, which describe its size, position, and title:

- `left, right, top, bottom` - A single integer for the x- or y-coordinate of the window’s side
- `topleft, topright, bottomleft, bottomright` - A named tuple of two integers for the (x, y) coordinates of the window’s corner
- `midleft, midright, midtop, midbottom` - A named tuple of two integers for the (x, y) coordinate of the middle of the window’s side
- `width, height` - A single integer for one of the window’s dimensions, in pixels
- `size` - A named tuple of two integers for the (width, height) of the window
- `area` - A single integer representing the area of the window, in pixels
- `center` - A named tuple of two integers for the (x, y) coordinate of the window’s center
- `centerx, centery` - A single integer for the x- or y-coordinate of the window’s center
- `box` - A named tuple of four integers for the (left, top, width, height) measurements of the window
- `title` - A string of the text in the title bar at the top of the window

In [None]:
import pyautogui

wo = pyautogui.getActiveWindow()
wo

Win32Window(hWnd=918830)

In [None]:
str(wo)

'<Win32Window left="-9", top="-9", width="1938", height="1048", title="● chapter-20-controlling-keyboard-and-mouse-with-gui-automation.ipynb - Visual Studio Code">'

In [None]:
wo.title

'● chapter-20-controlling-keyboard-and-mouse-with-gui-automation.ipynb - Visual Studio Code'

In [None]:
wo.size

Size(width=1938, height=1048)

In [None]:
wo.left, wo.top, wo.right, wo.bottom

(-9, -9, 1929, 1039)

In [None]:
wo.topright

Point(x=1929, y=-9)

In [None]:
pyautogui.click(wo.left + 10, wo.top + 20)  # Click 10 px right of and 20 px below the window's top-left corner.
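
The attribute values shown above are related by simple arithmetic, which the sketch below reproduces with a stand-in named tuple instead of a real Window object. `Box`, `edges`, and `offset_point` are hypothetical names used only for this illustration; the numbers are taken from the `str(wo)` output above.

```python
from collections import namedtuple

# Stand-in for a window's (left, top, width, height) measurements.
Box = namedtuple('Box', 'left top width height')


def edges(box):
    """Return (right, bottom) computed from a (left, top, width, height) box."""
    return box.left + box.width, box.top + box.height


def offset_point(box, dx, dy):
    """Return the screen point dx, dy pixels in from the box's top-left corner."""
    return box.left + dx, box.top + dy


win = Box(left=-9, top=-9, width=1938, height=1048)
print(edges(win))                 # (1929, 1039)
print(offset_point(win, 10, 20))  # (1, 11)
```

The first result matches the `wo.right` and `wo.bottom` values shown earlier, and the second is the point that a click 10 pixels right of and 20 pixels below the window’s top-left corner would target.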

#### Other Ways of Obtaining Windows

The following functions offer other ways to obtain windows. The functions that return a list return an empty list if they’re unable to find any windows:

- `pyautogui.getAllWindows()` - returns a list of Window objects for every visible window on the screen.
- `pyautogui.getWindowsAt(x, y)` - returns a list of Window objects for every visible window that includes the point (x, y).
- `pyautogui.getActiveWindow()` - returns the single Window object for the window that is currently receiving keyboard focus.
- `pyautogui.getWindowsWithTitle(title)` - returns a list of Window objects for every visible window that includes the string title in its title bar.
- `pyautogui.getAllTitles()` - returns a list of the title strings of every visible window.

#### Manipulating Windows

In [None]:
import pyautogui

fw = pyautogui.getActiveWindow()
fw.width # Gets the current width of the window. 

1938

In [None]:
fw.topleft # Gets the current position of the window.

Point(x=-9, y=-9)

In [None]:
fw.width = 1000 # Resizes the width.
fw.topleft = (800, 400) # Moves the window

You can also find out and change the window’s minimized, maximized, and activated states. Try entering the following into the interactive shell:

In [None]:
import pyautogui

fw = pyautogui.getActiveWindow()
fw.isMaximized # Returns True if window is maximized. 

True

In [None]:
fw.isMinimized # Returns True if window is minimized.

False

In [None]:
fw.isActive  # Returns True if window is the active window.

True

In [None]:
fw.maximize()  # Maximizes the window.
fw.isMaximized

True

In [None]:
import time

fw.restore()  # Undoes a minimize/maximize action.
fw.minimize()  # Minimizes the window.

# Wait 5 seconds while you activate a different window:
time.sleep(5)
fw.activate()
fw.close()  # This will close the window you're typing in.

### Controlling the Keyboard

PyAutoGUI also has functions for sending virtual keypresses to your computer, which enables you to fill out forms or enter text into applications.

#### Sending a String from the Keyboard

The `pyautogui.write()` function sends virtual keypresses to the computer. What these keypresses do depends on what window is active and what text field has focus. You may want to first send a mouse click to the text field you want in order to ensure that it has focus.

As a simple example, let’s use Python to automatically type the words *Hello, world!* into a file editor window. First, open a new file editor window and position it in the upper-left corner of your screen so that PyAutoGUI will click in the right place to bring it into focus.

In [14]:
# Create a new text file and write 'Hello, world!' in VS Code.
import pyautogui

pyautogui.click(x=76, y=20)  # click 'File' option
pyautogui.click(x=123, y=59)  # click 'New Text File'
pyautogui.write('Hello, world!')

#### Key Names

| Keyboard key string | Meaning |
| :- | :- |
| `'a'`, `'b'`, `'c'`, `'A'`, `'B'`, `'C'`, `'1'`, `'2'`, `'3'`, `'!'`, `'@'`, `'#'`, and so on | The keys for single characters |
| `'enter'` (or `'return'` or `'\n'`) | The `enter` key |
| `'esc'` | The `esc` key |
| `'shiftleft'`, `'shiftright'` | The left and right `shift` keys |
| `'altleft'`, `'altright'` | The left and right `alt` keys |
| `'ctrlleft'`, `'ctrlright'` | The left and right `ctrl` keys |
| `'tab'` (or `'\t'`) | The `tab` key |
| `'backspace'`, `'delete'` | The `backspace` and `delete` keys |
| `'pageup'`, `'pagedown'` | The `page up` and `page down` keys |
| `'home'`, `'end'` | The `home` and `end` keys |
| `'up'`, `'down'`, `'left'`, `'right'` | The `up`, `down`, `left`, and `right arrow` keys |
| `'f1'`, `'f2'`, `'f3'`, and so on | The `F1` to `F12` keys |
| `'capslock'`, `'numlock'`, `'scrolllock'` | The `caps lock`, `num lock`, and `scroll lock` keys |
| `'pause'` | The `pause` key |
| `'volumemute'`, `'volumedown'`, `'volumeup'` | The `mute`, `volume down`, and `volume up` keys (some keyboards do not have these keys, but your operating system will still be able to understand these simulated keypresses) |
| `'insert'` | The `ins` or `insert` key |
| `'printscreen'` | The `prtsc` or `print screen` key |
| `'winleft'`, `'winright'`  | The `left` and `right win` keys (on Windows) |
| `'command'` | The `Command` key (on macOS) |
| `'option'` | The `option` key (on macOS) |

#### Pressing and Releasing the Keyboard

Much like the `mouseDown()` and `mouseUp()` functions, `pyautogui.keyDown()` and `pyautogui.keyUp()` will send virtual keypresses and releases to the computer. They are passed a keyboard key string for their argument. For convenience, `PyAutoGUI` provides the `pyautogui.press()` function, which calls both of these functions to simulate a complete keypress.

In [50]:
# Type a dollar sign character (obtained by holding the Shift key and pressing 4):
pyautogui.keyDown('shift'); pyautogui.press('4'); pyautogui.keyUp('shift')

#### Hotkey Combinations

A *hotkey* or *shortcut* is a combination of keypresses to invoke some application function. The common hotkey for copying a selection is `ctrl-C`. The user presses and holds the `ctrl` key, then presses the `C` key, and then releases the `C` and `ctrl` keys.

In [54]:
pyautogui.keyDown('ctrl')
pyautogui.keyDown('c')
pyautogui.keyUp('c')
pyautogui.keyUp('ctrl')

This is rather complicated. Instead, use the `pyautogui.hotkey()` function, which takes multiple keyboard key string arguments, presses them in order, and releases them in the reverse order.

In [57]:
pyautogui.hotkey('ctrl', 'c')
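
The press-in-order, release-in-reverse behavior described above can be made concrete by listing the events for a given combination. `hotkey_events` below is a hypothetical helper for illustration only; it builds a list and sends no keypresses.

```python
def hotkey_events(*keys):
    """Return (action, key) events: press keys in order, release in reverse."""
    presses = [('down', key) for key in keys]
    releases = [('up', key) for key in reversed(keys)]
    return presses + releases


print(hotkey_events('ctrl', 'c'))
# [('down', 'ctrl'), ('down', 'c'), ('up', 'c'), ('up', 'ctrl')]
```

This is the same sequence as the four `keyDown()` and `keyUp()` calls shown earlier, and it extends naturally to longer combinations such as `hotkey_events('ctrl', 'alt', 'shift', 's')`.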

### Setting Up Your GUI Automation Scripts

- Use the same screen resolution each time you run the script so that the position of windows doesn’t change.
- The application window that your script clicks should be maximized so that its buttons and menus are in the same place each time you run the script.
- Add generous pauses while waiting for content to load; you don’t want your script to begin clicking before the application is ready.
- Use `locateOnScreen()` to find buttons and menus to click, rather than relying on XY coordinates. If your script can’t find the thing it needs to click, stop the program rather than let it continue blindly clicking.
- Use `getWindowsWithTitle()` to ensure that the application window you think your script is clicking on exists, and use the `activate()` method to put that window in the foreground.
- Use the logging module from Chapter 11 to keep a log file of what your script has done. This way, if you have to stop your script halfway through a process, you can change it to pick up from where it left off.
- Add as many checks as you can to your script. Think about how it could fail if an unexpected pop-up window appears or if your computer loses its internet connection.
- You may want to supervise the script when it first begins to ensure that it’s working correctly.

In [60]:
import pyautogui

pyautogui.sleep(3) # Pauses the program for 3 seconds.
pyautogui.countdown(10) # Counts down over 10 seconds.

10 9 8 7 6 5 4 3 2 1 


In [61]:
print('Starting in ', end=''); pyautogui.countdown(3)

Starting in 3 2 1 
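
Several of the tips above (check before acting, stop rather than continue blindly, keep a log) can be combined in a small step runner. This is only a sketch under assumptions: `run_steps` is a hypothetical helper, each step is a `(name, check, action)` tuple of the caller’s choosing, and the log goes to the console here, although `logging.basicConfig()` also accepts a `filename` argument for writing a log file.

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')


def run_steps(steps):
    """Run (name, check, action) steps in order; stop at the first failed check.

    Returns the names of the steps that completed.
    """
    done = []
    for name, check, action in steps:
        if not check():
            logging.error('Check failed before step %r; stopping.', name)
            break
        action()
        logging.info('Completed step %r.', name)
        done.append(name)
    return done


# Possible usage on a desktop machine (not run here):
#   import pyautogui
#   steps = [('click submit',
#             lambda: pyautogui.pixelMatchesColor(50, 200, (40, 44, 52)),
#             lambda: pyautogui.click(50, 200))]
#   run_steps(steps)
```

The returned list of completed step names, together with the log, shows where a stopped run could be resumed.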


### Project: Automatic Form Filler

The script below fills in the form at https://autbor.com/form.

In [2]:
# Automatically fills in the form at autbor.com/form

import time
import pyautogui

formData = [{'name': 'Alice', 'fear': 'eavesdroppers', 'source': 'wand', 'robocop': 4, 'comments': 'Tell Bob I said hi.'},
            {'name': 'Bob', 'fear': 'bees', 'source': 'amulet', 'robocop': 4, 'comments': 'n/a'},
            {'name': 'Carol', 'fear': 'puppets', 'source': 'crystal ball', 'robocop': 1, 'comments': 'Please take the puppets out of the break room.'},
            {'name': 'Alex Murphy', 'fear': 'ED-209', 'source': 'money', 'robocop': 5, 'comments': 'Protect the innocent. Serve the public trust. Uphold the law.'},
            ]

# pyautogui.PAUSE = 0.5
print('Ensure that the browser window is active and the form is loaded!')

for person in formData:
    # Give the user a chance to kill the script
    print('>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<')
    pyautogui.countdown(5)

    print(f"Entering {person['name']} info...")
    pyautogui.write(['\t']*4)

    # Fill out Name field
    pyautogui.write(person['name'] + '\t')

    # Fill out Greatest Fear(s) field
    pyautogui.write(person['fear'] + '\t')

    # Fill out Source of Wizard Powers field
    if person['source'] == 'wand':
        pyautogui.write(['down', '\n', '\t'], 0.5)
    elif person['source'] == 'amulet':
        pyautogui.write(['down']*2 + ['\n', '\t'], 0.5)
    elif person['source'] == 'crystal ball':
        pyautogui.write(['down']*3 + ['\n', '\t'], 0.5)
    elif person['source'] == 'money':
        pyautogui.write(['down']*4 + ['\n', '\t'], 0.5)

    # Fill out Robocop field
    if person['robocop'] == 1:
        pyautogui.write([' '] + ['\t']*2, 0.5)
    elif person['robocop'] == 2:
        pyautogui.write(['right'] + ['\t']*2, 0.5)
    elif person['robocop'] == 3:
        pyautogui.write(['right']*2 + ['\t']*2, 0.5)
    elif person['robocop'] == 4:
        pyautogui.write(['right']*3 + ['\t']*2, 0.5)
    elif person['robocop'] == 5:
        pyautogui.write(['right']*4 + ['\t']*2, 0.5)

    # Fill out Additional comments
    pyautogui.write(person['comments'] + '\t')

    # "Click" Submit button by pressing Enter.
    time.sleep(0.5) # Wait for the button to activate.
    pyautogui.press('enter')

    # Wait until form page has loaded.
    print('Submitted form.')
    print("Restarting in ", end="")
    pyautogui.countdown(5)

    # "Click" the "Submit another response" link
    pyautogui.write(['\t', '\n'])

Ensure that the browser window is active and the form is loaded!
>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<
5 4 3 2 1 
Entering Alice info...
Submitted form.
Restarting in 5 4 3 2 1 
>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<
5 4 3 2 1 
Entering Bob info...
Submitted form.
Restarting in 5 4 3 2 1 
>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<
5 4 3 2 1 
Entering Carol info...
Submitted form.
Restarting in 5 4 3 2 1 
>>> 5 SECOND PAUSE TO LET USER PRESS CTRL-C <<<
5 4 3 2 1 
Entering Alex Murphy info...
Submitted form.
Restarting in 5 4 3 2 1 
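
The two `if`/`elif` chains in the script follow a regular pattern, so the key lists could also be computed. The sketch below is one possible refactoring, not the original author’s code: `SOURCES`, `source_keys`, and `robocop_keys` are hypothetical names, and unlike the original chains these helpers raise an error for an unknown value instead of silently typing nothing.

```python
# Dropdown options in the order the original script assumes.
SOURCES = ['wand', 'amulet', 'crystal ball', 'money']


def source_keys(source):
    """Keys that pick `source` from the dropdown, then move to the next field."""
    presses = SOURCES.index(source) + 1  # 'wand' needs one 'down' press.
    return ['down'] * presses + ['\n', '\t']


def robocop_keys(rating):
    """Keys that select a 1-5 rating in the radio group, then leave it."""
    if rating not in range(1, 6):
        raise ValueError(f'rating must be 1-5, got {rating!r}')
    select = [' '] if rating == 1 else ['right'] * (rating - 1)
    return select + ['\t'] * 2


print(source_keys('amulet'))  # ['down', 'down', '\n', '\t']
print(robocop_keys(4))        # ['right', 'right', 'right', '\t', '\t']
```

Inside the loop, the chains could then be replaced by `pyautogui.write(source_keys(person['source']), 0.5)` and `pyautogui.write(robocop_keys(person['robocop']), 0.5)`.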


### Displaying Message Boxes

In [8]:
import pyautogui

pyautogui.alert("This is a message", 'Important')

'OK'

In [11]:
pyautogui.confirm("Do you want to continue?")  # Returns the text of the clicked button; OK was clicked here.

'OK'

In [18]:
pyautogui.prompt("What is your cat's name?")

'Baroq'

In [23]:
pyautogui.password("What is the password?")

'secretpass'

### Practice Projects

#### Looking Busy

Many instant messaging programs determine whether you are idle, or away from your computer, by detecting a lack of mouse movement over some period of time — say, 10 minutes. Maybe you’re away from your computer but don’t want others to see your instant messenger status go into idle mode. Write a script to nudge your mouse cursor slightly every 10 seconds. The nudge should be small and infrequent enough so that it won’t get in the way if you do happen to need to use your computer while the script is running.

#### Using the Clipboard to Read a Text Field

While you can send keystrokes to an application’s text fields with `pyautogui.write()`, you can’t use `PyAutoGUI` alone to read the text already inside a text field. This is where the `Pyperclip` module can help. You can use `PyAutoGUI` to obtain the window for a text editor such as Mu or Notepad, bring it to the front of the screen by clicking on it, click inside the text field, and then send the `ctrl-A` hotkey to “select all” and `ctrl-C` hotkey to “copy to clipboard.” Your Python script can then read the clipboard text by running `import pyperclip` and `pyperclip.paste()`.

Write a program that follows this procedure for copying the text from a window’s text fields. Use `pyautogui.getWindowsWithTitle('Notepad')` (or whichever text editor you choose) to obtain a Window object. The top and left attributes of this Window object can tell you where this window is, while the `activate()` method will ensure it is at the front of the screen. You can then click the main text field of the text editor by adding, say,  100 or 200 pixels to the top and left attribute values with `pyautogui.click()` to put the keyboard focus there. Call `pyautogui.hotkey('ctrl', 'a')` and `pyautogui.hotkey('ctrl', 'c')` to select all the text and copy it to the clipboard. Finally, call `pyperclip.paste()` to retrieve the text from the clipboard and paste it into your Python program. From there, you can use this string however you want, but just pass it to `print()` for now.

Note that the window functions of `PyAutoGUI` only work on Windows as of PyAutoGUI version 1.0.0, and not on macOS or Linux.

#### Instant Messenger Bot

Google Talk, Skype, Yahoo Messenger, AIM, and other instant messaging applications often use proprietary protocols that make it difficult for others to write Python modules that can interact with these programs. But even these proprietary protocols can’t stop you from writing a GUI automation tool.

The Google Talk application has a search bar that lets you enter a username on your friend list and open a messaging window when you press enter. The keyboard focus automatically moves to the new window. Other instant messenger applications have similar ways to open new message windows. Write a program that will automatically send out a notification message to a select group of people on your friend list. Your program may have to deal with exceptional cases, such as friends being offline, the chat window appearing at different coordinates on the screen, or confirmation boxes that interrupt your messaging. Your program will have to take screenshots to guide its GUI interaction and adopt ways of detecting when its virtual keystrokes aren’t being sent.

*NOTE: You may want to set up some fake test accounts so that you don’t accidentally spam your real friends while writing this program.*

#### Game-Playing Bot Tutorial

There is a great tutorial titled “How to Build a Python Bot That Can Play Web Games” that you can find a link to at https://nostarch.com/automatestuff2/. This tutorial explains how to create a GUI automation program in Python that plays a Flash game called Sushi Go Round. The game involves clicking the correct ingredient buttons to fill customers’ sushi orders. The faster you fill orders without mistakes, the more points you get. This is a perfectly suited task for a GUI automation program—and a way to cheat to a high score! The tutorial covers many of the same topics that this chapter covers but also includes descriptions of PyAutoGUI’s basic image recognition features. The source code for this bot is at https://github.com/asweigart/sushigoroundbot/ and a video of the bot playing the game is at https://youtu.be/lfk_T6VKhTE.