Skip to content

Contains specification of the Web Automation Workflow (WAW) file.

License

Notifications You must be signed in to change notification settings

apify/waw-file-specification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

waw docs logo

This document contains the official definition of the WAW (web automation workflow) format.

Are you looking for a way of running files in this format? Check out the wbr project on npm or in its GitHub repository.


The WAW format is a declarative format for specifying web-related workflows. It enables the user to control the automation flow with conditional expressions, allowing to make decisions based on the websites content. It is also easily parsable (based on JSON), which greatly simplifies validation, visualization and third-party adoption.

Table of Contents

  1. General overview
  2. Meta Header
  3. Workflow
  4. Where conditions
  5. What actions
  6. Miscellaneous
  7. JSON Schema

General

The .waw (not to be confused with .wav) file is a textual format used for quick, safe and declarative definition of web automation workflows.

Syntactically, .waw should always be a valid .json file. If you are unsure what .json is, refer to the official documentation.

Note: From now on, the .waw file will be considered a valid JSON file and all the terminology (object, array) will be used in this context.

On the top level, the workflow file contains an object with two properties - "meta" - an object with the workflow's metadata and "workflow" - a single array of so-called "where-what pairs". These pairs contain three properties with keys id, where, and what.

The id property is used only for pair referencing (more in State memory) and can be omitted.

Here follows a top-level view of the Workflow file:

{
	"meta" : {
        ...
	}
	"workflow": [
		{
			"id": "login",
			"where": {...},
			"what": [...]
		},
		{
			"id": "signup",
			"where": {...},
			"what": [...]
		},
		...
	]
]

Meta Header

The meta header of the file can contain two fields:

  • "name" - string - optional, name of the workflow (for easier management)
  • "desc" - string - optional, text description of the workflow. Even though all the metadata is optional, developers are strongly advised to use them for clarity and easier management of the workflows.

Example

{
	"name": "Google Maps Scraper",
	"desc": "A blazing fast scraper for Google Maps search results."
}

Workflow

The "workflow" part of the file is a single array consisting of the where-what pairs - objects describing desired behavior in different situations.

For example, let's say we want to click on a button with the label "hello" every time we get on the page "https://example.com/". This behavior is described with the following snippet:

{
	"where": { "url": "https://example.com" },
	"what": [
		{
			"action": "click",
			"args": ["button:text('hello')"]
		}
	]
}

Now, let's say we want to type "Hello world!" into an input field, whenever we see an input field on the "https://example.com" website:

{
	"where": { 
		"url": "https://example.com",
		"selectors": "input"
	},
	"what": [
		{
			"action": "type",
			"args": [
				"input",
				"Hello world!"
			]
		}
	]
}

This should be enough to give you some basic understanding of the WAW Smart Workflow format. In the following sections, there are more details about the format and its certain features.

The Where Clause

The Where clause describes a condition required for the respective What clause to be executed.

In the basic version without the state memory (more later), we can count with the Markov assumption, i.e. the Where clause always depends only on the current browser state and its "applicability" can be evaluated statically, knowing only the browser's state at the given point.

For this reason, the workflow can be executed on different tabs in parallel (any popup window open from the first passed page is processed as well).

Where conditions - The Basics

The where clause is an object with various keys.

The specific "basic" keys (like url, cookies etc.) are implementation-dependent and are not a part of the format specification. Keys shown here correspond to the wbr-interpret implementation.

As of now, three keys are recognized:

  • URL (string or RegEx)
  • cookies (object with string keys and string/RegEx values)
  • selectors (array of CSS/Playwright selectors - all of the targetted elements must be present in the page to match this clause)

An example of a full (simple, flat) Where clause:

	"where": {
		"url": "https://jindrich.bar/",
		"cookies": {
			"uid": "123456"
		},
		"selectors": [
			":text('My Profile')",
			"button.logout"
		]
	}

Where conditions - (Boolean) Logic

For a system operating with conditions, it is crucial to have a simple way to work with formal logic. The WAW format is taking inspiration from the MongoDB query operators, as shown in the example below:

"where": {
    "$and": [
        {
        "url": "https://jindrich.bar/",
        },
        {
            "$or": [
                {
                    "cookies": {
                        "uid": "123456"
                    }
                },
                {
                    "selectors": [
                        ":text('My Profile')",
                        "button.logout"
                    ]
                }
            ]
        }
    ]
}

This notation describes a condition where the URL is https://jindrich.bar/ and there is either the uid cookie set with the specified value, or there are the selectors present. Please note that the top-level $and condition is redundant, as the conjunction of the conditions is the implicit operation.

As of now, the format supports the following boolean operators: $and, $or and $not.

Ordering

Note that the ordering of the rules in the file is crucial. Consider the following example:

	{
		"where": { "url": "https://jindrich.bar" },
		"what": [{ "action A" }]
	},
	{
		"where": { "url": "https://jindrich.bar" },
		"what": [{ "action B" }]
	},

The where conditions in the displayed pairs are the same, i.e. when the interpreter gets to the webpage https://jindrich.bar, it has two possible action sequences to carry out. This situation makes little sense, as the workflow definition needs to be as strict as possible and cannot allow non-deterministic behaviour of the interpreter.

For this reason, the definition of the workflow file says that only the first matching action gets executed.

Even though the colliding conditions were easy to spot in the example above, this problem can get a little more nuanced, for example:

	{
		"where": { 
			"selectors": ["h1", "ul"]
		},
		"what": [{ "action A" }]
	},
	{
		"where": { 
			"selectors": [".large-heading","#list"]
		},
		"what": [{ "action B" }]
	},

While there is no visible collision in the described conditions, the interpreter behavior might be surprising on the following page:

...
	<h1 class="large-heading">Heading</h1>
	<ul id="list">
		<li>a</li>
		...
	</ul>
...

Again, the interpreter will execute only action A, even though both conditions apply.

Another way to think of this is "put more specific conditions closer to the top".

State memory

As mentioned earlier, the interpreter also has an internal memory which allows for more specific conditions. Some of those could be e.g.

"where": {
	"$after": "login" // login being an "id" of another where-what pair
}
"where": {
	"$before": "signup"
}

As of now, the metatags $before and $after are supported. The meaning behind those is to allow an action to be run only after (or before) another action has been executed.

The memory for actions used is tab-scoped, i.e. every new tab has its own memory of used actions (the tabs run the workflow independently of each other).

[Hacker's Tip] : The $before condition specifically can be used to run an action only once ("id": "self", ..., "$before" : "self").

The What Clause

In the most basic version, the What clause should contain a sequence of actions, which should be carried out in case the respective Where condition is satisfied.

Note: While the interpreter wbr-interpret uses Playwright for its backend, the WAW format is suitable for use with any other backend. Just like with the Where clause basic keys, the action's names and parameters are not a part of the format specification.

What actions - The Basics

The what clause is an array of "function" objects. These objects consist of the action field, describing the function called and args - an optional array property, providing parameters for the specified function.

"what":[
	{
		"action": "functionAcceptingString",
		"args": ["theFirstParameter"]
	},
	{
		"action":"voidFunction",
	},
	{
		"action":"moreParameters",
		"args": [
			1000,
			"string parameter",
			{
				"option": true
			}
		]
	}
]

In wbr-interpret, these actions correspond to the Playwright's Page class methods (goto,fill, click...). On top of this, users can use dot notation to access the Page's properties and call their methods (e.g. page.keyboard.press etc.) All parameters passed must be JSON's native types, i.e. scalars, arrays, or objects (no functions etc.)

wbr-interpret custom functions

On top of the Playwright's native methods/functions, the user can also use some interpreter-specific functions.

As of now, these are:

  • screenshot - this is overriding Playwright's page.screenshot method and saves the screenshot using the interpreter's binary output callback.
  • scrape - using a heuristic algorithm, the interpreter tries to find the most important items on the webpage, parses those into a table and pushes the table into the serializable callback.
  • scrapeSchema - getting a "row schema definition" with column names and selectors, the interpreter scrapes the data from a webpage into a "curated" table.
    • Example:
     {
     	"action": "scrapeSchema",
     	"args": [{
     		"name": ".c-item-title",
     		"price": ".c-a-basic-info__price",
     		"vin": ".c-vin-info__vin",
     		"desc": ".c-car-properties__text"
     	}]
     }
  • scroll - scrolls down the webpage for given number of times (default = 1).
  • script - allows the user to run an arbitrary asynchronous function in the interpreter. The function's body is read as a string from the params field and evaluated at the server side (as opposed to a browser). The function accepts one parameter named page, being the current Playwright Page instance.
    • Example:
     {
     	"action": "script",
     	"args": ["\
     	const links = await page.evaluate(() => \
     	{\
     		return Array.from(\
     			document.querySelectorAll('a.c-item__link.sds-surface--clickable')\
     		).map(a => a.href);\
     	});\
     	\
     	for(let link of links){\
     		await new Promise(res => setTimeout(res, 100));\
     		await page.context().newPage().then(page => page.goto(link))\
     	}\
     	"]
     },
    The example runs a server-side script opening all links on the current page in new tabs with 100 ms delay (Note: if you only want to open links on a page, see enqueueLinks lower).
    • Even though it is possible to write the whole workflow using one script field, we do not endorse this. The WAW format should allow the developers to write comprehensible, easy-to-maintain workflow definitions.
  • enqueueLinks (new in 0.4.0)
    • Accepts selector parameter. Reads elements targetted by the specified selector (Playwright selectors) and stores their links in a queue.
    • Those pages are then processed using the same workflow as the initial page (in parallel if the maxConcurrency interpreter parameter is greater than 1).

Extra Syntax

Apart from the mentioned syntax available for direct workflow specification, the WAW format contains more constructs for even better flexibility of the format.

Regular Expressions

The format supports usage of regular expressions, both in the conditions and the action parameters. The syntax is inspired by the MongoDB regex syntax and looks as follows:

    "url": {"$regex": "^https"}

Such a rule matches every URL on a secured website, i.e. starting with https.

Parametrization

The WAW format also allows the developer to parametrize the workflow - this can be particularly useful, e.g. for letting the user insert their login information, URL to be scraped etc.

{
	"action": "goto",
	"args": [
		{"$param": "startURL"}
    ]
}

The interpreter of the format should allow the user to include their own value to replace the entire parameter structure with the user-supplied value.

JSON Schema

In case you want to automatically check a workflow definition file for syntax correctness, use the official JSON Schema.

Note that this JSON schema validates the files only against the base WAW definition. To validate the files against the wbr-interpret implementation of the WAW format, please use the validateWorkflow method of the Preprocessor class.


Want to see a real-world example of a workflow? Visit the examples folder with numerous example workflows.

Ready to automate? Read how to write your first workflow step-by-step.


Successful and speedy Wikipedia scraper

About

Contains specification of the Web Automation Workflow (WAW) file.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published