Activating support for querying
SPADE supports querying provenance with a command line client. Support for this can be activated in the SPADE Kernel using the controller:
-> add analyzer CommandLine
Adding analyzer CommandLine... done
Starting the query client
SPADE supports several types of queries. They are invoked using the query client, which is started with the following command:
The following will appear:
SPADE 3.0 Query Client
Available commands:
    functions:
        GetVertex(expression [, limit])
        GetEdge(expression [, limit])
        GetChildren(expression [, limit])
        GetParents(expression [, limit])
        GetLineage(expression, maxDepth, direction)
        GetPaths(expression, maxLength)
    expression: <constraint_name> [<boolean_operator> <constraint_name> ...]
    constraint creation: <constraint_name> : <key> <comparison_operator> <value>
    comparison operators: = | > | < | >= | <=
    boolean operators: AND | OR
    direction: a[ncestors] | d[escendants]
    export > <path_to_file_for_next_query>
    list constraints
    exit
->
Specifying query constraints
Constraints are used to define the properties of vertices and edges that are retrieved during querying. Each constraint has the form:
<constraint_name> : <key> <comparison_operator> <value>
<constraint_name> is a label that allows the constraint to be referenced in queries;
<key> specifies an annotation key;
<comparison_operator> defines the relationship that must hold between the specified <value> and the value of any annotation with a matching <key>. The comparison operators allowed are =, >, <, >=, and <=.
For example, this constraint limits vertices and edges to those with the annotation date = 7-20-1969:
-> moon_landing : date = 7-20-1969
-> process_constraint : type = Process
-> time_constraint : time > 1000
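The constraint syntax above is regular enough to parse with a single pattern. The following is a minimal sketch, not SPADE's own parser: the `Constraint` type and `parse_constraint` function are hypothetical names introduced here, and the sketch assumes that constraint names and keys contain only word characters.

```python
import re
from typing import NamedTuple


class Constraint(NamedTuple):
    """Hypothetical container for one parsed constraint."""
    name: str
    key: str
    operator: str
    value: str


# Two-character operators come first so ">=" is not read as ">" then "=".
_CONSTRAINT_RE = re.compile(
    r"^\s*(?P<name>\w+)\s*:\s*(?P<key>\w+)\s*"
    r"(?P<op>>=|<=|=|>|<)\s*(?P<value>\S.*?)\s*$"
)


def parse_constraint(line: str) -> Constraint:
    """Split '<constraint_name> : <key> <comparison_operator> <value>'."""
    match = _CONSTRAINT_RE.match(line)
    if match is None:
        raise ValueError(f"not a valid constraint: {line!r}")
    return Constraint(match["name"], match["key"], match["op"], match["value"])
```

Calling `parse_constraint("time_constraint : time > 1000")` would yield a tuple with name `time_constraint`, key `time`, operator `>`, and value `1000`.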
Constraints are retained and can be reused until the query client exits. The current set can be viewed with:
-> list constraints
Expressions combine multiple constraints using the Boolean operators AND and OR.
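To illustrate how an expression might be checked against the annotations of one vertex or edge, here is a small sketch. It is not how SPADE or its storage evaluates queries. Two behaviours are assumptions made for the example: values that look numeric are compared as numbers, and AND/OR are applied strictly left to right with no precedence. The function names are hypothetical.

```python
import operator

# Comparison operators listed in the query client's help text.
_OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
        ">=": operator.ge, "<=": operator.le}


def _coerce(text):
    """Assumption: numeric-looking strings compare as numbers."""
    try:
        return float(text)
    except ValueError:
        return text


def satisfies(annotations, constraint):
    """Check one (key, operator, value) constraint against an annotation dict."""
    key, op, value = constraint
    if key not in annotations:
        return False
    left, right = _coerce(annotations[key]), _coerce(value)
    if type(left) is not type(right):
        # Mixed number/string: fall back to plain string comparison.
        left, right = str(annotations[key]), str(value)
    return _OPS[op](left, right)


def evaluate(expression, constraints, annotations):
    """Evaluate 'a AND b OR c' left to right (assumed, no precedence)."""
    tokens = expression.split()
    result = satisfies(annotations, constraints[tokens[0]])
    for bool_op, name in zip(tokens[1::2], tokens[2::2]):
        if bool_op not in ("AND", "OR"):
            raise ValueError(f"unknown boolean operator: {bool_op}")
        value = satisfies(annotations, constraints[name])
        result = (result and value) if bool_op == "AND" else (result or value)
    return result
```

With `process_constraint` and `time_constraint` defined as in the earlier examples, `evaluate("process_constraint AND time_constraint", ...)` accepts only annotation sets where `type` is `Process` and `time` exceeds 1000.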
Finding vertices or edges
The GetVertex and GetEdge functions find all the provenance vertices or edges, respectively, that have specific properties. The properties are framed as an expression that can be evaluated by the underlying storage(s). A second optional argument to these functions is limit, which specifies the maximum number of elements that will be returned.
For example, to find at most 10 vertices that have an annotation with key type and value Process, use the following:
-> type_constraint : type = Process
-> GetVertex(type_constraint, 10)
Time taken for query: 158 ms
Finding children or parents
The GetChildren and GetParents functions find the immediate descendants or ancestors, respectively, of a given vertex. The expression provided as an argument defines the starting vertex, whose parents or children will be retrieved. For GetChildren, a constraint for annotation parentVertexHash must be provided; for GetParents, the constraint must specify the annotation childVertexHash. For example:
-> child_constraint : parentVertexHash = 9fd2e948b31d0d0cb41259da548deaed
-> GetChildren(child_constraint)
Time taken for query: 13 ms
To limit the number of elements returned, an optional second argument can be provided. In this example, the number of parents is limited to 10:
-> parent_constraint : childVertexHash = 259de4dae9a54d831d0d0cb29ed418bf
-> GetParents(parent_constraint, 10)
Time taken for query: 21 ms
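The roles of parentVertexHash and childVertexHash can be easier to see on a toy graph. The sketch below is an illustration only. It assumes edges are stored as (child hash, parent hash) pairs in a plain list, which is a simplification chosen for the example and not SPADE's storage model. The function names are hypothetical.

```python
def get_children(edges, parent_hash, limit=None):
    """Return the child of every edge whose parent is parent_hash."""
    found = [child for child, parent in edges if parent == parent_hash]
    return found if limit is None else found[:limit]


def get_parents(edges, child_hash, limit=None):
    """Return the parent of every edge whose child is child_hash."""
    found = [parent for child, parent in edges if child == child_hash]
    return found if limit is None else found[:limit]


# Hypothetical (child, parent) pairs; "p1" has two children.
example_edges = [("c1", "p1"), ("c2", "p1"), ("c1", "p2")]
```

Here `get_children(example_edges, "p1")` selects on the parent side of each edge, mirroring the parentVertexHash constraint, while `get_parents(example_edges, "c1", 1)` selects on the child side and stops after one result.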
Finding all paths between vertices
The GetPaths function lets the user find all paths between two vertices.
- The first argument is the expression specifying the source and destination vertices of the path. It should contain two constraints joined together by an AND. The first should have an annotation key sourceVertexHash and the second should have key destinationVertexHash. The annotation values should be the source or destination vertex identifiers, respectively.
- The second argument specifies the maximum length of path.
For example, this will find all paths of length at most 7 from the vertex with hash d2b05ce7777609433f9590717b7cbdf4 to the vertex with hash 4b05c776095b7cbdf477e3390717d2f9:
-> source_constraint : sourceVertexHash = d2b05ce7777609433f9590717b7cbdf4
-> destination_constraint : destinationVertexHash = 4b05c776095b7cbdf477e3390717d2f9
-> GetPaths(source_constraint AND destination_constraint, 7)
Time taken for query: 219 ms
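A length-bounded path search of this kind can be sketched as a depth-first traversal. The code below is not SPADE's implementation. It assumes that path length is counted in edges, that a path never revisits a vertex, and that the graph is given as a hypothetical adjacency dictionary.

```python
def find_paths(adjacency, source, destination, max_length):
    """Enumerate simple paths from source to destination with at most
    max_length edges, using depth-first search."""
    paths = []

    def extend(path):
        current = path[-1]
        if current == destination:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_length:
            return  # edge budget exhausted
        for neighbor in adjacency.get(current, ()):
            if neighbor not in path:  # keep paths simple
                path.append(neighbor)
                extend(path)
                path.pop()

    extend([source])
    return paths


# Hypothetical graph with two routes from "a" to "d".
example_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

On `example_graph`, a bound of 2 finds both routes from `a` to `d`, while a bound of 1 finds none, because every route needs two edges.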
The GetLineage function finds the ancestors or descendants of a specific vertex.
- The first argument is an expression that specifies the hash of the starting vertex for the query.
- The second argument specifies the maximum number of levels of the graph that should be returned.
- The third argument is (any prefix of) either ancestors or descendants, used to indicate the direction of traversal.
For example, this query will find 5 levels of descendants, starting from the vertex with hash 79b957b0c706c4b7df457e390731d2f7:
-> root_constraint : hash = 79b957b0c706c4b7df457e390731d2f7
-> GetLineage(root_constraint, 5, d)
Time taken for query: 458 ms
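The three arguments map naturally onto a depth-limited breadth-first traversal. The sketch below is an illustration under stated assumptions, not SPADE's code: edges are (child, parent) pairs in a list, and both function names are hypothetical. It also shows one way to accept any prefix of the direction name.

```python
from collections import deque


def resolve_direction(text):
    """Map any non-empty prefix ('a', 'anc', 'd', ...) to the full name."""
    for name in ("ancestors", "descendants"):
        if text and name.startswith(text):
            return name
    raise ValueError(f"unknown direction: {text!r}")


def lineage(edges, start, max_depth, direction):
    """Return {vertex: depth} for vertices within max_depth levels of start."""
    direction = resolve_direction(direction)
    neighbors = {}
    for child, parent in edges:
        if direction == "ancestors":
            neighbors.setdefault(child, []).append(parent)
        else:
            neighbors.setdefault(parent, []).append(child)

    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth_of[vertex] == max_depth:
            continue  # do not expand beyond the requested level
        for nxt in neighbors.get(vertex, ()):
            if nxt not in depth_of:
                depth_of[nxt] = depth_of[vertex] + 1
                queue.append(nxt)
    return depth_of


# Hypothetical chain: a is the parent of b, b of c, c of d.
example_chain = [("b", "a"), ("c", "b"), ("d", "c")]
```

For instance, `lineage(example_chain, "a", 2, "d")` walks two levels down from `a`, and `lineage(example_chain, "d", 1, "anc")` walks one level up from `d`.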
The result of a query is displayed in the client. However, it can also be exported in Graphviz DOT format. The export command specifies the file in which to store the result of the next query. For example:
-> export > /tmp/galileo_lineage.dot
-> test_constraint : hash = bbb2000cc6dd443ff155e99937777777
-> GetLineage(test_constraint, 3, a)
Output exported to file /tmp/galileo_lineage.dot