Querying SPADE

Ashish Gehani edited this page Oct 21, 2017 · 28 revisions

Activating support for querying

SPADE supports querying provenance with a command line client. Support for this is activated by adding the CommandLine analyzer to the SPADE Kernel using the controller:

-> add analyzer CommandLine
Adding analyzer CommandLine... done

Starting the query client

SPADE supports several types of queries. They are invoked using the query client, which is started with the following command:

spade query

The following will appear:

SPADE 3.0 Query Client

Available commands:
		GetVertex(expression [, limit])
		GetEdge(expression [, limit])
		GetChildren(expression [, limit])
		GetParents(expression [, limit])
		GetLineage(expression, maxDepth, direction)
		GetPaths(expression, maxLength)
		<constraint_name> [<boolean_operator> <constraint_name> ...]
	constraint creation:
		<constraint_name> : <key> <comparison_operator> <value>
	comparison operators:
		= | > | < | >= | <=
	boolean operators:	
		AND | OR	
		a[ncestors] | d[escendants]
	export > <path_to_file_for_next_query>	
	list constraints

Specifying query constraints

Constraints are used to define the properties of vertices and edges that are retrieved during querying. Each constraint has the form:

<constraint_name> : <key> <comparison_operator> <value>

where <constraint_name> is a label that allows the constraint to be referenced in queries; <key> specifies an annotation key; and <comparison_operator> defines the relationship that must hold between the specified <value> and the value of any annotation with a matching <key>. The comparison operators allowed are =, >, <, >=, and <=.

For example, this constraint matches only vertices and edges that have the annotation date = 7-20-1969:

-> moon_landing : date = 7-20-1969

Other examples:

-> process_constraint : type = Process
-> time_constraint : time > 1000

Constraints are retained and can be reused until the query client exits. The current set can be viewed with the list constraints command:

-> list constraints

An expression is either a single constraint name or several constraint names combined with the Boolean operators AND and OR.
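To illustrate how an expression might be evaluated against a single vertex or edge, here is a hypothetical Python sketch in which each constraint is represented as a predicate over an annotation dictionary. It assumes that AND binds more tightly than OR. This page does not state SPADE's precedence rules, so treat that as an assumption, and evaluate is an illustrative name only.

```python
from typing import Callable, Dict, List

# A constraint is modelled here as a predicate over an element's annotations.
Predicate = Callable[[Dict[str, str]], bool]


def evaluate(expression: str, constraints: Dict[str, Predicate],
             annotations: Dict[str, str]) -> bool:
    """Evaluate e.g. "c1 AND c2 OR c3" against one vertex or edge.

    Assumption: AND binds more tightly than OR, so the expression is split
    into OR-separated groups and a group is true only if all its members are.
    """
    groups: List[List[str]] = [[]]
    expect_name = True
    for token in expression.split():
        if expect_name:
            if token not in constraints:
                raise KeyError(f"unknown constraint: {token}")
            groups[-1].append(token)
        elif token == "OR":
            groups.append([])  # start a new AND-group
        elif token != "AND":
            raise ValueError(f"expected AND or OR, got {token!r}")
        expect_name = not expect_name  # names and operators must alternate
    if expect_name:
        raise ValueError("expression is empty or ends with an operator")
    return any(all(constraints[name](annotations) for name in group)
               for group in groups)
```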

Finding vertices or edges

The GetVertex and GetEdge functions find all the provenance vertices or edges, respectively, that have specific properties. The properties are framed as an expression that can be evaluated by the underlying storage(s). A second optional argument to these functions is limit, which specifies the maximum number of elements that will be returned.

For example, to find at most 10 vertices that have an annotation with key type and value Process, use the following:

-> type_constraint : type = Process
-> GetVertex(type_constraint, 10)
Time taken for query: 158 ms
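Conceptually, GetVertex filters the stored elements by the expression and stops after limit matches. The in-memory Python analogue below is a sketch under that reading only. get_vertex is a hypothetical name, and real queries are evaluated by the underlying storage, not in Python.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List, Optional

Annotations = Dict[str, str]


def get_vertex(vertices: Iterable[Annotations],
               predicate: Callable[[Annotations], bool],
               limit: Optional[int] = None) -> List[Annotations]:
    """Hypothetical in-memory analogue of GetVertex(expression [, limit])."""
    matching = (vertex for vertex in vertices if predicate(vertex))
    # islice stops after `limit` items; a stop of None means "no limit",
    # mirroring the optional second argument.
    return list(islice(matching, limit))
```

Because the filter is a generator, evaluation stops as soon as limit matches have been found.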

Finding children or parents

The GetChildren and GetParents functions find the immediate descendants or ancestors, respectively, of a given vertex. The expression provided as an argument defines the starting vertex, whose children or parents will be retrieved. For GetChildren, the constraint must specify the annotation parentVertexHash; for GetParents, it must specify the annotation childVertexHash. For example:

-> child_constraint : parentVertexHash = 9fd2e948b31d0d0cb41259da548deaed
-> GetChildren(child_constraint)
Time taken for query: 13 ms

To limit the number of elements returned, an optional second argument can be provided. In this example, the number of parents is limited to 10:

-> parent_constraint : childVertexHash = 259de4dae9a54d831d0d0cb29ed418bf
-> GetParents(parent_constraint, 10)
Time taken for query: 21 ms
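The relationship between the two hash annotations can be sketched in memory as follows. This assumes, hypothetically, that each edge is a dictionary carrying childVertexHash and parentVertexHash annotations. get_children and get_parents are illustrative names, not SPADE functions, and collapsing duplicate neighbors (for example, from repeated edges between the same pair of vertices) is a choice of this sketch that may differ from SPADE's behavior.

```python
from typing import Dict, List, Optional

Edge = Dict[str, str]  # assumed to carry childVertexHash and parentVertexHash


def _neighbors(edges: List[Edge], match_key: str, match_value: str,
               result_key: str, limit: Optional[int]) -> List[str]:
    # dict.fromkeys removes duplicates while keeping first-seen order.
    found = list(dict.fromkeys(
        edge[result_key] for edge in edges if edge.get(match_key) == match_value))
    return found if limit is None else found[:limit]


def get_children(edges: List[Edge], parent_hash: str,
                 limit: Optional[int] = None) -> List[str]:
    """Hashes of vertices whose edges name parent_hash as their parent."""
    return _neighbors(edges, "parentVertexHash", parent_hash, "childVertexHash", limit)


def get_parents(edges: List[Edge], child_hash: str,
                limit: Optional[int] = None) -> List[str]:
    """Hashes of the vertices that child_hash's edges name as its parents."""
    return _neighbors(edges, "childVertexHash", child_hash, "parentVertexHash", limit)
```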

Finding all paths between vertices

The GetPaths function lets the user find all paths between two vertices.

  • The first argument is the expression specifying the source and destination vertices of the path. It should contain two constraints joined by AND. The first should have the annotation key sourceVertexHash and the second should have the key destinationVertexHash. The annotation values should be the hashes of the source and destination vertices, respectively.
  • The second argument specifies the maximum length of a path.

For example, this will find all paths of length at most 7 from the vertex with hash d2b05ce7777609433f9590717b7cbdf4 to the vertex with hash 4b05c776095b7cbdf477e3390717d2f9:

-> source_constraint : sourceVertexHash = d2b05ce7777609433f9590717b7cbdf4
-> destination_constraint : destinationVertexHash = 4b05c776095b7cbdf477e3390717d2f9
-> GetPaths(source_constraint AND destination_constraint, 7)
Time taken for query: 219 ms
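Finding all bounded-length paths can be done with a depth-first search that backtracks once a path reaches maxLength edges. The sketch below assumes that path length is counted in edges, that a path may not revisit a vertex, and that the graph is given as a hypothetical adjacency mapping from each vertex hash to the hashes it has edges to. None of these details are specified on this page, and get_paths is an illustrative name.

```python
from typing import Dict, List


def get_paths(adjacency: Dict[str, List[str]], source: str,
              destination: str, max_length: int) -> List[List[str]]:
    """All simple paths from source to destination with at most max_length edges."""
    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        node = path[-1]
        if node == destination:
            paths.append(list(path))  # record a copy; path is reused below
            return
        if len(path) - 1 == max_length:
            return  # path already has max_length edges
        for neighbor in adjacency.get(node, []):
            if neighbor not in path:  # keep paths simple (no repeated vertices)
                path.append(neighbor)
                walk(path)
                path.pop()  # backtrack

    walk([source])
    return paths
```

Recursion depth is bounded by max_length, so small limits such as 7 stay well within Python's default recursion limit.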

Retrieving lineage

The GetLineage function finds the ancestors or descendants of a specific vertex.

  • The first argument is an expression whose constraint specifies the hash of the starting vertex for the query.
  • The second argument specifies the maximum number of levels of the graph that should be returned.
  • The third argument is (any prefix of) either ancestors or descendants, used to indicate the direction of traversal.

For example, this query will find 5 levels of descendants, starting from the vertex with hash 79b957b0c706c4b7df457e390731d2f7:

-> root_constraint : hash = 79b957b0c706c4b7df457e390731d2f7
-> GetLineage(root_constraint, 5, d)
Time taken for query: 458 ms
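A breadth-first traversal is one natural way to collect a fixed number of levels. In this hypothetical sketch, edges are (child, parent) pairs of vertex hashes, ancestors are reached by following child-to-parent edges, and the direction argument is accepted if it is a non-empty prefix of ancestors or descendants, mirroring the a/d shorthand above. The result maps each reached vertex to the level at which it was first found. get_lineage is an illustrative name, and this is not SPADE's algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple


def get_lineage(edges: List[Tuple[str, str]], root: str, max_depth: int,
                direction: str) -> Dict[str, int]:
    """Breadth-first lineage of root, up to max_depth levels.

    Assumptions: edges are (child, parent) pairs of vertex hashes, and
    direction is a non-empty prefix of "ancestors" or "descendants".
    """
    if direction and "ancestors".startswith(direction):
        toward_ancestors = True
    elif direction and "descendants".startswith(direction):
        toward_ancestors = False
    else:
        raise ValueError(f"direction must abbreviate ancestors or descendants: {direction!r}")

    # Adjacency in the chosen direction of traversal.
    step: Dict[str, List[str]] = {}
    for child, parent in edges:
        if toward_ancestors:
            step.setdefault(child, []).append(parent)
        else:
            step.setdefault(parent, []).append(child)

    levels = {root: 0}
    queue = deque([root])
    while queue:
        vertex = queue.popleft()
        if levels[vertex] == max_depth:
            continue  # do not expand beyond the requested number of levels
        for nxt in step.get(vertex, []):
            if nxt not in levels:  # BFS reaches each vertex first at its shallowest level
                levels[nxt] = levels[vertex] + 1
                queue.append(nxt)
    return levels
```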

Exporting results

The result of a query is displayed in the client. However, it can also be exported in Graphviz DOT format. The export command specifies the file in which to store the result of the next query. For example:

-> export > /tmp/galileo_lineage.dot
-> test_constraint : hash = bbb2000cc6dd443ff155e99937777777
-> GetLineage(test_constraint, 3, a)
Output exported to file /tmp/galileo_lineage.dot
