Querying SPADE

Activating support for querying

SPADE allows stored provenance to be queried with a command line client. Support for this can be activated in the SPADE Kernel using the controller:

-> add analyzer CommandLine
Adding analyzer CommandLine... done

Starting the query client

SPADE supports several types of queries. They are invoked using the query client, which is started with the following command (from within the SPADE/bin directory):

spade query

The following will appear:

SPADE Query Client
->

Help for the query client can be printed using the command:

help [ all | control | constraint | graph | env ]

The SPADE storage that is to be queried can be specified using the command:

set storage <storage class name>

Alternatively, a default storage can be set in cfg/spade.client.CommandLine.config.

Currently, three types of storage can be queried: Quickstep, PostgreSQL, and Neo4j.

To query the configured Quickstep database:

set storage Quickstep

To query the configured PostgreSQL relational database:

set storage PostgreSQL

To query the configured Neo4j graph database:

set storage Neo4j

Querying can then commence using the commands described below.

Use exit to leave the query client.

Upon starting, the query client tries to execute the commands in cfg/spade.client.CommandLine.config. Upon exit, it saves the query history at that point to the same file. This allows query state (such as the current storage and environment variables) to be automatically restored in subsequent sessions.

Specifying query constraints

Constraints are used to define the properties of vertices and edges that are retrieved during querying. Each constraint has the form:

<constraint_name> = [ not ] "<key>" <comparison_operator> '<value>' [ and|or [ not ] "<key>" <comparison_operator> '<value>' ]*

where <constraint_name> is a label that allows the constraint to be referenced in queries and must start with %; <key> specifies an annotation name; <comparison_operator> defines the relationship that must hold with the specified <value>. The operators supported are ==, !=, >, <, >=, <=, and like (for pattern matching).

For example, this constraint will match graph elements that contain a pid annotation with a value of 1.

-> %init_process = "pid" == '1'

This constraint will match elements that contain a name annotation whose value ends with fox:

-> %name_ends_with_fox = "name" like '%fox'

This constraint matches elements that have an annotation event id with a value in the numeric range that starts at 1000 and ends at 2000:

-> %events_1k_to_2k = "event id" >= '1000' and "event id" <= '2000'
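
To make the matching rules concrete, the following Python sketch evaluates a single "<key>" <comparison_operator> '<value>' term against a dictionary of annotations. It is an illustration only and not SPADE's implementation, which hands the expression to the underlying storage. The function names are hypothetical, and three behaviors are assumptions: % in a like pattern matches any run of characters (as in SQL), an element that lacks the key never matches, and values are compared numerically when both sides parse as numbers.

```python
import operator
import re

# Comparison operators listed in this section, mapped to Python equivalents.
OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

def like(actual, pattern):
    """Assumed SQL-style matching: '%' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.fullmatch(regex, actual, flags=re.DOTALL) is not None

def matches(annotations, key, op, value):
    """Evaluate one term of a constraint against an element's annotations."""
    if key not in annotations:
        return False  # assumption: a missing annotation never matches
    actual = annotations[key]
    if op == "like":
        return like(actual, value)
    try:
        # Assumption: compare as numbers when both sides are numeric.
        left, right = float(actual), float(value)
    except ValueError:
        left, right = actual, value
    return OPERATORS[op](left, right)

vertex = {"type": "Process", "name": "firefox", "pid": "1", "event id": "1500"}
print(matches(vertex, "pid", "==", "1"))              # like %init_process
print(matches(vertex, "name", "like", "%fox"))        # like %name_ends_with_fox
print(matches(vertex, "event id", ">=", "1000")
      and matches(vertex, "event id", "<=", "2000"))  # like %events_1k_to_2k
```

All three checks print True for the sample vertex. The numeric comparison matters for the range example: compared as plain strings, '999' would sort after '1000'.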

Currently defined constraints can be viewed with:

-> list constraints

The expression that a constraint is currently bound to can be seen with:

-> dump <constraint_name>

For example, the constraint %init_process (defined above) can be seen with:

-> dump %init_process
(pid == 1)

Environment variables

As a convenience, certain query arguments do not need to be provided explicitly. In this case, the default value can be specified by setting it in the environment. Currently, two variables can be set: maxDepth and limit.

All supported environment variables can be listed using the command:

list env

If the parameter maxDepth is not defined in a path or lineage query, its value is retrieved from the environment. Similarly, a limit query will use the default from the environment if an explicit value is not provided.

The environment variables can be set, unset, and printed using the commands:

-> env set maxDepth 10   # Sets the variable 'maxDepth' to value 10
OK

-> env set limit 20      # Sets the variable 'limit' to value 20
OK

-> env unset maxDepth    # Removes the binding for variable 'maxDepth'
OK

-> env print limit       # Prints the value of the variable 'limit'
20

-> env print maxDepth    # Prints UNDEFINED if there is no current binding for the variable 'maxDepth'
UNDEFINED
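
The fallback rule described above (an explicit query argument wins, otherwise the environment value is used) can be sketched in Python as follows. The Environment class and its method names are hypothetical and only mirror the session shown; in particular, raising an error when a value is neither given nor set is an assumption, since the behavior of the real client in that case is not described here.

```python
class Environment:
    """Toy stand-in for the query client's environment (names are hypothetical)."""

    SUPPORTED = ("maxDepth", "limit")  # the two variables named in this section

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        if name not in self.SUPPORTED:
            raise KeyError(f"unsupported variable: {name}")
        self._values[name] = value

    def unset(self, name):
        self._values.pop(name, None)

    def show(self, name):
        # Mirrors 'env print': an unbound variable is reported as UNDEFINED.
        return str(self._values.get(name, "UNDEFINED"))

    def resolve(self, name, explicit=None):
        """Prefer the explicit query argument; otherwise use the environment."""
        if explicit is not None:
            return explicit
        if name in self._values:
            return self._values[name]
        # Assumption: the real client's behavior in this case is not documented here.
        raise LookupError(f"{name} is neither given nor set in the environment")

env = Environment()
env.set("maxDepth", 10)
env.set("limit", 20)
env.unset("maxDepth")
print(env.show("limit"))        # 20
print(env.show("maxDepth"))     # UNDEFINED
print(env.resolve("limit"))     # 20, taken from the environment
print(env.resolve("limit", 5))  # 5, the explicit argument wins
```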

Finding vertices or edges

The getVertex and getEdge functions find all the provenance vertices or edges, respectively, in a particular graph that have specific properties. The properties are framed as an expression that can be evaluated by the underlying storage(s). If a constraint is not specified, all vertices or edges in the graph will be returned.

For example, this will find vertices that have an annotation with key type and value Process:

-> %only_processes = "type" == 'Process'
-> $all_processes = $base.getVertex(%only_processes)

$base is a special variable that represents the entire graph.

To retrieve at most 10 such vertices, use:

-> $ten_processes = $all_processes.limit(10)

Finding parents or children

getNeighbor finds the immediate ancestors or descendants of a set of vertices in a given graph. It takes two arguments:

a graph variable/expression that defines a set of initial vertices, and
ancestors to find parents, or descendants to find children.

For example, to find all the processes (and threads) created by firefox in the global $base graph:

-> %firefox_constraint = "name" == 'firefox'
-> $firefox_vertices = $base.getVertex(%firefox_constraint)
-> $firefox_children = $base.getNeighbor($firefox_vertices, 'descendants')
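
As a rough picture of what a one-hop neighbor lookup involves, the Python sketch below works on a plain list of edges. It is not how SPADE evaluates getNeighbor. It assumes that each edge is stored as a (child, parent) pair, that is, pointing from a vertex to the vertex it was derived from, and it returns only the neighboring vertices; whether the real result also contains the starting vertices and connecting edges is not stated in this section. The function name and sample data are hypothetical.

```python
def get_neighbor(edges, vertices, direction):
    """Return the vertices exactly one edge away from `vertices`.

    edges: iterable of (child, parent) pairs (assumed edge orientation).
    """
    if direction == "ancestors":
        return {parent for child, parent in edges if child in vertices}
    if direction == "descendants":
        return {child for child, parent in edges if parent in vertices}
    raise ValueError("direction must be 'ancestors' or 'descendants'")

# Hypothetical process tree: bash started firefox, which started two tabs.
edges = [
    ("firefox", "bash"),
    ("tab-1", "firefox"),
    ("tab-2", "firefox"),
    ("helper", "tab-1"),
]
print(sorted(get_neighbor(edges, {"firefox"}, "descendants")))  # ['tab-1', 'tab-2']
print(sorted(get_neighbor(edges, {"firefox"}, "ancestors")))    # ['bash']
```

Note that helper is not returned for firefox: it is two edges away, which is the territory of a lineage query rather than a neighbor query.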

Finding all paths between vertices

getPath finds all paths between vertices. At minimum, it takes the following arguments:

a graph variable/expression specifying the set of source vertices,
a graph variable/expression specifying the set of destination vertices, and
the maximum length of a path from the source to destination vertices.

Optionally, further pairs of a destination graph variable/expression and a maximum path length can be specified.

For example, this finds all paths of length at most 7 from vertices with firefox as the value of their name key to vertices with /etc/passwd as the value of their path key:

-> %source_constraint = "name" == 'firefox'
-> $sources = $base.getVertex(%source_constraint)
-> %destination_constraint = "path" == '/etc/passwd'
-> $destinations = $base.getVertex(%destination_constraint)
-> $paths = $base.getPath($sources, $destinations, 7)

Here is another example illustrating the use of optional additional arguments:

-> $paths = $base.getPath($first, $second, 10, $third, 11, $fourth)

Above, any paths between the vertices in the $first and $fourth graphs that pass through the vertices in the $second and $third graphs are found. Specifically, this will find paths from $first to $second with maximum length 10, followed by paths from $second to $third with maximum length 11, and finally from $third to $fourth. Note that the maximum path length to $fourth was not specified. Its value was transparently retrieved from the environment variable for maxDepth.
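
The idea of a length-bounded path search can be sketched in Python with a depth-limited traversal. This is a simplified model and not SPADE's algorithm: it assumes paths follow edges in their stored direction, it enumerates only simple paths (no repeated vertices), it handles a single source/destination pair rather than a chain of intermediate graphs, and the function name and sample edges are hypothetical.

```python
def get_paths(edges, sources, destinations, max_length):
    """List simple paths from any source to any destination.

    edges: iterable of (from_vertex, to_vertex) pairs.
    max_length: maximum number of edges allowed in a path.
    """
    adjacency = {}
    for start, end in edges:
        adjacency.setdefault(start, []).append(end)

    found = []

    def extend(path):
        current = path[-1]
        if current in destinations and len(path) > 1:
            found.append(list(path))
        if len(path) - 1 == max_length:
            return  # the path already uses the maximum number of edges
        for following in adjacency.get(current, []):
            if following not in path:  # keep paths simple
                path.append(following)
                extend(path)
                path.pop()

    for source in sorted(sources):
        extend([source])
    return found

# Hypothetical graph with a direct, a two-edge, and a three-edge route.
edges = [
    ("firefox", "libc"), ("libc", "/etc/passwd"),
    ("firefox", "/etc/passwd"),
    ("firefox", "a"), ("a", "b"), ("b", "/etc/passwd"),
]
for path in get_paths(edges, {"firefox"}, {"/etc/passwd"}, 2):
    print(" -> ".join(path))
```

With a maximum length of 2, the route through a and b is excluded because it needs three edges; raising the bound to 3 would include it.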

Retrieving lineage

getLineage finds the ancestors or descendants of given vertices. It takes three arguments:

a graph variable/expression specifying the sources of the lineage.
an optional natural number that specifies the maximum number of levels that should be retrieved. If this is not specified, the environment variable maxDepth is used.
(any prefix of) either ancestors, descendants or both. It indicates the direction of traversal.

For example, this query will find 5 levels of descendants, starting from the vertices with firefox as their name:

-> %firefox_constraint = "name" == 'firefox'
-> $initial_vertices = $base.getVertex(%firefox_constraint)
-> $lineage = $base.getLineage($initial_vertices, 5, 'descendants')
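
Conceptually, a lineage query is a breadth-first traversal that stops after a fixed number of levels. The Python sketch below illustrates that idea under stated assumptions; it is not SPADE's implementation. Edges are assumed to be (child, parent) pairs, 'both' is modeled as the union of the ancestor and descendant traversals, and all function names and sample data are hypothetical.

```python
from collections import deque

DIRECTIONS = ("ancestors", "descendants", "both")

def resolve_direction(text):
    """Accept any prefix of a direction name, for example 'desc'."""
    candidates = [name for name in DIRECTIONS if text and name.startswith(text)]
    if len(candidates) != 1:
        raise ValueError(f"unknown or ambiguous direction: {text!r}")
    return candidates[0]

def walk(adjacency, start, max_depth):
    """Breadth-first search that stops after max_depth levels."""
    depth = {vertex: 0 for vertex in start}
    queue = deque(start)
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:
            continue  # do not expand beyond the requested number of levels
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[vertex] + 1
                queue.append(neighbor)
    return set(depth)

def get_lineage(edges, start, max_depth, direction):
    """Return the start vertices plus everything reached within max_depth levels."""
    parents, children = {}, {}
    for child, parent in edges:  # assumed edge orientation
        parents.setdefault(child, []).append(parent)
        children.setdefault(parent, []).append(child)
    direction = resolve_direction(direction)
    reached = set(start)
    if direction in ("ancestors", "both"):
        reached |= walk(parents, start, max_depth)
    if direction in ("descendants", "both"):
        reached |= walk(children, start, max_depth)
    return reached

# Hypothetical chain: bash -> firefox -> tab -> worker -> gpu (parent to child).
edges = [("firefox", "bash"), ("tab", "firefox"), ("worker", "tab"), ("gpu", "worker")]
print(sorted(get_lineage(edges, {"firefox"}, 2, "desc")))       # ['firefox', 'tab', 'worker']
print(sorted(get_lineage(edges, {"firefox"}, 2, "ancestors")))  # ['bash', 'firefox']
```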

Exporting results

Each query response is displayed in the client. To export one to a file, run this command before issuing the query:

-> export > /tmp/will_write_result_of_next_command_here

A graph can also be exported in Graphviz DOT format. For example:

-> %test_constraint = "name" == 'galileo'
-> $test_vertices = $base.getVertex(%test_constraint)
-> export > /tmp/galileo.dot
-> dump all $test_vertices
Output exported to file /tmp/galileo.dot

Loading queries

For convenience, the SPADE query client supports loading queries from a file. This can be done using the command:

-> load /tmp/SPADE-queries

Each line in the file /tmp/SPADE-queries is treated as a complete SPADE query. Upon error in the execution of a query in the file, the remaining queries are discarded with an error message. Comments can be added by starting a line with #.
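
The processing rules for a query file (one query per line, # comments, stop at the first error) can be sketched in Python as below. This is an illustrative model, not the client's code: run_script and the execute callback are hypothetical names, and skipping blank lines is an assumption that the description above does not confirm.

```python
def run_script(lines, execute):
    """Run each line as one query; stop at the first failure.

    Returns (queries_executed, error_message_or_None).
    """
    executed = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # comment line
        query = line.strip()
        if not query:
            continue  # assumption: blank lines are ignored
        try:
            execute(query)
        except Exception as error:
            # The remaining queries are discarded, as described above.
            return executed, f"line {number}: {error}"
        executed.append(query)
    return executed, None

def fake_execute(query):
    """Stand-in for the real query engine; rejects one marker query."""
    if query == "not a query":
        raise ValueError("could not parse query")

script = [
    "# find firefox vertices",
    "%c = \"name\" == 'firefox'",
    "",
    "not a query",
    "$v = $base.getVertex(%c)",
]
done, problem = run_script(script, fake_execute)
print(done)     # only the constraint definition ran
print(problem)  # line 4: could not parse query
```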

This material is based upon work supported by the National Science Foundation under Grants OCI-0722068, IIS-1116414, and ACI-1547467. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
