SafariDriver Internals

jleyba edited this page May 19, 2016 · 2 revisions

Introduction

This document provides a primer on the browser extension that powers the SafariDriver. It is not intended to be a full-fledged design document. When in doubt, refer to the source!

NOTE: The SafariDriver is deprecated. All code has been removed from master, but it is still available in the safari branch.


Building the SafariDriver

  1. Sign up for Apple's (free) Safari Developer Program and generate a signed certificate for the extension.
  2. Build the SafariDriver extension:
     $ ./go safari
  3. Install the extension:
    1. Launch Safari
    2. Enable the Develop menu (Preferences > Advanced > Show Develop menu in menu bar)
    3. Open the Extension Builder (Develop > Show Extension Builder)
    4. Add a new extension: $SELENIUM_CLIENT/build/javascript/safari-driver/SafariDriver.safariextension
    5. Click Install

Releasing the SafariDriver

To push a new release of the SafariDriver:

  1. Build and install the extension using the instructions above
  2. Open the Extension Builder and select the extension
  3. Click "Build Package ..."
  4. Save the package as SafariDriver.safariextz
  5. Copy your SafariDriver.safariextz file to $SELENIUM_CLIENT/javascript/safari-driver/prebuilt/SafariDriver.safariextz
  6. Commit your changes

Components

The SafariDriver is implemented as a Safari browser extension. There are three primary components within the extension: the global extension, the injected script, and the page script.

Global Extension

Build target: //javascript/safari-driver:extension
Code location: //javascript/safari-driver/extension
Namespace: safaridriver.extension

The global extension is loaded once when Safari is first launched. It is responsible for communicating with the WebDriver client, coordinating command execution with the injected script, and tracking opened windows.

Injected Script

Build target: //javascript/safari-driver:injected
Code location: //javascript/safari-driver/inject
Namespace: safaridriver.inject

The injected script is injected into every web page. The script is injected after the document has been loaded, but before it has been parsed. The injected script shares a DOM with the web page, but runs in a separate JavaScript context.

The injected script is only loaded for pages opened with http:// or https://. All other protocols are unsupported and will not be testable using the SafariDriver.

The injected script handles the execution of the majority of commands, as well as coordinating with selected frames.

Page Script

Build target: //javascript/safari-driver:page
Code location: //javascript/safari-driver/inject
Namespace: safaridriver.inject.page

The page script is bundled as an extra resource in the SafariDriver. Each time an injected script is loaded, it will insert a new SCRIPT tag into the DOM that will load the bundled page script. This script is used to execute certain commands in the page's context, such as executeScript, instead of the injected script's sandbox.


Communication Protocol

WebDriver Client <---> Global Extension <---> Injected Script <---> Page Script

A WebDriver client and the SafariDriver extension communicate using a WebSocket. The extension is able to communicate with the injected script by dispatching messages using a SafariWebPageProxy. Similarly, the injected script is able to communicate with the extension using the SafariContentBrowserTabProxy. Finally, the injected and page scripts communicate with each other by using window.postMessage.

By default, all messages exchanged between components are sent asynchronously. The injected script is able to submit a synchronous query to the extension using the tab proxy's canLoad method:

  // Create a beforeload event, which is required by the canLoad function.
  var stubEvent = document.createEvent('Events');
  stubEvent.initEvent('beforeload', false, false);
  var response = safari.self.tab.canLoad(stubEvent, 'Bob');
  console.log(response);

The extension's response to a canLoad message must be set on the SafariExtensionMessageEvent's message property in the event handler:

  var browserWindow = safari.application.activeBrowserWindow;
  var tab = browserWindow.activeTab;
  tab.addEventListener('message', function(e) {
    if (e.name == 'canLoad') {
      e.message = 'Hello, ' + e.message + '. Nice to meet you';
      e.stopPropagation();
    } // else not a synchronous message.
  });

While the injected and page scripts run in different contexts, they share the same DOM and security domain, so they can communicate synchronously by dispatching a synthesized MessageEvent on the window object. Responses must be serialized to a string and passed using an attribute on a DOM element:

  var messageEvent = document.createEvent('MessageEvent');
  messageEvent.initMessageEvent('message', false, false, 'Bob',
      window.location.origin, '0', window, null);
  window.dispatchEvent(messageEvent);

  // Read the response value from document.
  var response = document.documentElement.getAttribute('response');
  document.documentElement.removeAttribute('response'); // Clean-up.
  console.log(response);

Message Format

Regardless of the mechanism used to pass messages, the SafariDriver uses a single JSON message protocol.

  {
    /* A key identifying the message origin. The extension components use
     * unique numeric constants, and WebDriver clients should _always_ use
     * the value "webdriver".
     */
    "origin": "webdriver",

    /* The type of message. */
    "type": "foo",

    /* Additional fields vary by message type. Refer to the actual code for
     * full documentation: //javascript/safari-driver/message
     */
  }

There are only three message types that a WebDriver client ever needs to worry about: the command, response, and connect messages.
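As a rough illustration of the envelope described above, a client might decode incoming messages as follows. The names parseMessage and isClientMessageType are hypothetical and do not come from the SafariDriver source; the sketch checks only the two fields the format guarantees.

```javascript
// Hypothetical helper (not from the SafariDriver source): decode a raw
// message and check the two fields every message carries.
function parseMessage(raw) {
  var message = JSON.parse(raw);
  if (message.origin === undefined || typeof message.type !== 'string') {
    throw new Error('Not a SafariDriver message: ' + raw);
  }
  return message;
}

// The three message types a WebDriver client deals with.
var CLIENT_MESSAGE_TYPES = ['command', 'response', 'connect'];

function isClientMessageType(message) {
  return CLIENT_MESSAGE_TYPES.indexOf(message.type) !== -1;
}
```

Any other fields are left untouched, since they vary by message type.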

Communicating with the SafariDriver

The SafariDriver communicates with clients using WebSockets. While the SafariDriver browser extension maintains the client end of the WebSocket connection, this section refers to it as the "server" end of the WebDriver API. Similarly, the term "client" refers to the user-facing WebDriver API that issues commands to the server.

Command

Key         Type    Description
id          string  A random ID assigned to the command by the client; may be any string value.
name        string  The name of the command. Should be one of the values defined in org.openqa.selenium.remote.DriverCommand.
parameters  Object  The command parameters as a JSON object. All parameter (key, value) pairs are consistent with those documented in the JSON wire protocol.

Commands that rely on a URL parameter in the wire protocol should include those parameters in the parameter map using the same name as documented in the JsonWireProtocol. For instance, the wire protocol command

POST /session/:sessionId/window/:windowHandle/size

{"width":250, "height":250}

should be encoded for the SafariDriver as

  {
    "origin": "webdriver",
    "type": "command",
    "command": {
      "id": "random-id-1234",
      "name": "setWindowSize",
      "parameters": {
        "sessionId": "mySessionId",
        "windowHandle": "current",
        "width": 250,
        "height": 250
      }
    }
  }

The SafariDriver tracks sessions by WebSocket connection, so the sessionId parameter may actually be excluded from the parameter set.
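A minimal sketch of that encoding on the client side, assuming the URL parameters and the request body are simply merged into one parameter map. The encodeCommand helper is hypothetical and is not part of any Selenium client.

```javascript
// Hypothetical client-side helper: merge the wire protocol's URL parameters
// and JSON body into the single parameter map carried by a command message.
function encodeCommand(id, name, urlParameters, bodyParameters) {
  var parameters = {};
  [urlParameters, bodyParameters].forEach(function(source) {
    Object.keys(source || {}).forEach(function(key) {
      parameters[key] = source[key];
    });
  });
  return {
    origin: 'webdriver',
    type: 'command',
    command: {id: id, name: name, parameters: parameters}
  };
}

var message = encodeCommand(
    'random-id-1234', 'setWindowSize',
    {windowHandle: 'current'},   // from the URL; sessionId is omitted
    {width: 250, height: 250});  // from the request body
console.log(JSON.stringify(message));
```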

Response

The SafariDriver's response objects have a similar structure to commands:

  {
    "origin": "webdriver",
    "type": "response",
    "id": "random-id-1234",
    "response": {
      "status": 0,
      "value": null
    }
  }

The id field in each response should echo the ID sent with the corresponding command object. The response field has the same format as specified by the JSON wire protocol.

On Command/Response IDs

The SafariDriver will synchronize the execution of every command it receives. Thus, each response should always match the original command sent by the client. Upon receiving a response, the client should check that the ID matches the one sent with the original command. It is a catastrophic failure if these IDs do not match.
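A client-side version of this check could look like the sketch below. The checkResponse helper is hypothetical; it assumes the client keeps the last command it sent and treats any mismatch as fatal.

```javascript
// Hypothetical client-side check: a response must echo the ID of the
// command that is currently pending.
function checkResponse(pendingCommand, message) {
  if (message.type !== 'response') {
    throw new Error('Expected a response message, got: ' + message.type);
  }
  if (message.id !== pendingCommand.id) {
    throw new Error('Response ID "' + message.id +
        '" does not match command ID "' + pendingCommand.id + '"');
  }
  return message.response;
}
```

Because the SafariDriver executes one command at a time, the client only needs to remember a single pending command per connection.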


Control Flow

Connecting to the SafariDriver

Clients may connect to the SafariDriver by opening a page with the following:

  <!DOCTYPE html>
  <script>
  window.onload = function() {
    window.postMessage({
      'type': 'connect',
      'origin': 'webdriver',
      'url': 'ws://localhost:1234/wd'
    }, '*');
  };
  </script>

The posted message is intercepted by the injected script and passed along to the extension's global page, which will in turn establish the WebSocket connection with the requested server.

Safari will not load extensions for file:// URLs. In order to open the connection page, as shown above, it is recommended that this page be served by the WebSocket server maintained by the SafariDriver client.
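Since the client has to serve this page itself, it may be convenient to generate it from the WebSocket URL. The following is only a sketch: buildConnectPage is a hypothetical helper, and the HTTP serving code around it is left out.

```javascript
// Hypothetical helper: render the connect page for a given WebSocket URL.
function buildConnectPage(webSocketUrl) {
  var message = {type: 'connect', origin: 'webdriver', url: webSocketUrl};
  // Escape "<" so the URL can never terminate the surrounding SCRIPT tag.
  var json = JSON.stringify(message).replace(/</g, '\\u003c');
  return [
    '<!DOCTYPE html>',
    '<script>',
    'window.onload = function() {',
    '  window.postMessage(' + json + ", '*');",
    '};',
    '</script>'
  ].join('\n');
}

console.log(buildConnectPage('ws://localhost:1234/wd'));
```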

The SafariDriver supports multiple WebSocket connections. Each connection is treated as a separate session with its own timeout state (i.e., for implicit waits). All sessions share state for window and frame focus. Furthermore, the execution of commands is synchronized globally, across all sessions.

Executing Commands

Once a connection has been established, the WebDriver client may start sending commands. Command messages are received by the safaridriver.extension.Server. Some commands, like setting script timeouts or closing a window, can be handled directly by the extension. Others must be sent to the injected script for execution (safaridriver.extension.Tab.prototype.send).

Page Loading

Before a command can be sent to the injected script, the extension must wait for there to be an injected script to send the command to. While Safari will emit navigation events whenever a new page is loaded, these events are not a sufficient indicator that the injected script has been initialized. Therefore, each time the injected script is loaded and initialized, it sends a notification to the extension. The extension will then execute the next queued command, if any.
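The waiting logic can be pictured as a small per-tab queue. This is a simplified sketch, not the code in safaridriver.extension.Tab: the TabQueue name and its methods are hypothetical, and it assumes at most one command is in flight at a time.

```javascript
// Hypothetical sketch: hold commands until the injected script reports that
// it has loaded, and never send more than one command at a time.
function TabQueue(sendToInjectedScript) {
  this.send_ = sendToInjectedScript;
  this.ready_ = false;  // true once the injected script has notified us
  this.busy_ = false;   // true while a command is awaiting its response
  this.pending_ = [];
}

// Called when the client issues a command for this tab.
TabQueue.prototype.enqueue = function(command) {
  this.pending_.push(command);
  this.flush_();
};

// Called when the injected script sends its "loaded" notification.
TabQueue.prototype.onLoad = function() {
  this.ready_ = true;
  this.flush_();
};

// Called when the page unloads and the old injected script goes away.
TabQueue.prototype.onUnload = function() {
  this.ready_ = false;
};

// Called when the injected script answers the in-flight command.
TabQueue.prototype.onResponse = function() {
  this.busy_ = false;
  this.flush_();
};

TabQueue.prototype.flush_ = function() {
  if (this.ready_ && !this.busy_ && this.pending_.length) {
    this.busy_ = true;
    this.send_(this.pending_.shift());
  }
};
```

Navigation events alone never flip ready_ in this sketch; only the injected script's own notification does, which mirrors the reasoning above.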

Frame Handling

Safari loads an injected script for each frame in a page. Furthermore, when the extension sends a message to a tab, it is broadcast to every frame within that tab. To compensate, the injected script in the SafariDriver keeps track of which frame currently has focus.

The topmost frame in a tab (safaridriver.inject.Tab) always has focus after a page is loaded. Whenever a command is received, the top frame first checks whether it is a command that must be handled by the top frame (e.g. getWindowSize). If not, it forwards the command to the currently focused frame for execution.

As with the extension, the injected script relies on frames sending notifications when they have loaded/unloaded their contents; the tab will not execute a command until the current frame is fully loaded.
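The routing decision made by the top frame can be sketched as follows. The TopFrame name, its callbacks, and the 'top' frame identifier are hypothetical. Only getWindowSize is named above as a top-only command, so the list here holds just that one entry, and the load tracking is omitted.

```javascript
// Commands the top frame must handle itself. Only getWindowSize is taken
// from the text; the real list is not reproduced here.
var TOP_FRAME_COMMANDS = ['getWindowSize'];

// Hypothetical sketch of the top frame's dispatch logic.
function TopFrame(handleLocally, forwardToFrame) {
  this.handleLocally_ = handleLocally;
  this.forwardToFrame_ = forwardToFrame;
  this.focusedFrame_ = 'top';  // the top frame has focus after a page load
}

// Called when the client switches focus to another frame.
TopFrame.prototype.setFocusedFrame = function(frameId) {
  this.focusedFrame_ = frameId;
};

// Called for every command the extension sends to this tab.
TopFrame.prototype.onCommand = function(command) {
  if (TOP_FRAME_COMMANDS.indexOf(command.name) !== -1 ||
      this.focusedFrame_ === 'top') {
    return this.handleLocally_(command);
  }
  return this.forwardToFrame_(this.focusedFrame_, command);
};
```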

Alert Handling

The SafariDriver is not capable of interacting with alerts like the other driver implementations (see issue 3862). It is, however, capable of suppressing alerts so they do not cause the browser to hang.

When the page script is first loaded, it overrides the native alert functions (alert, confirm, and prompt). Each time an alert is triggered, the page script will send a synchronous message to the injected script, which forwards it to the extension (synchronously). The extension will respond with whether there is an active WebDriver session. If there is, the alert will be dismissed with the native function's default return value (alert: undefined, confirm: false, prompt: ''). The injected script will abort the currently executing command, if any, and return a ModalDialogOpenedError to the client.

If there is not an active session, the native function will be called to preserve its functionality during manual testing.
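The override behavior can be sketched against any window-like object. This is an illustration, not the page script itself: installAlertOverrides is a hypothetical name, hasActiveSession stands in for the synchronous round trip to the extension, and onSuppressed stands in for the notification that makes the injected script abort the current command.

```javascript
// Hypothetical sketch: wrap the native dialog functions on a window-like
// object and suppress them while a WebDriver session is active.
function installAlertOverrides(win, hasActiveSession, onSuppressed) {
  // Default return values used when a dialog is dismissed, as listed above.
  var defaults = {alert: undefined, confirm: false, prompt: ''};
  Object.keys(defaults).forEach(function(name) {
    var nativeFn = win[name];
    win[name] = function() {
      if (!hasActiveSession()) {
        // No WebDriver session: behave exactly like the native function.
        return nativeFn.apply(win, arguments);
      }
      onSuppressed(name, arguments[0]);
      return defaults[name];
    };
  });
}
```

Keeping a reference to each native function is what allows manual testing to work unchanged when no session is connected.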


Debugging

Sometimes, a test may hang after sending a command to the SafariDriver. This usually indicates something blew up. Thankfully, the SafariDriver is quite chatty.

First open the WebKit inspector on the page under test and check the console output. The console is cleared each time a page is loaded, so you'll only be able to see the logs for the most recent injected script. You can select the injected script and set break points inside the inspector, but it is sandboxed from the page, so you can't play with it using the inspector's REPL.

Next, check the global page: Develop > Show Extension Builder. Select the WebDriver extension, and click "Inspect global page." Again, the SafariDriver is super chatty, so you should see what went wrong on the console. You can set script breakpoints for the global page and interact with it using the REPL.


Development

Once you have manually installed the SafariDriver extension, as outlined above, you can reload your changes by re-compiling the extension, opening the Extension Builder, and clicking the "Reload" button for the WebDriver extension.