Available transformers

Hassaan edited this page Feb 10, 2016 · 16 revisions

SPADE includes a set of transformers to rewrite the responses of provenance queries. They are described below.

DropKeys
Since provenance query responses may have more detail than is needed, it may be preferable to eliminate some of the annotations by specifying their keys. This transformer supports such functionality. By default, the keys are assumed to be listed in the SPADE configuration file cfg/spade.transformer.DropKeys.config. If another file is used, it can be specified with the config argument when adding the transformer:

-> add transformer DropKeys position=1 config=/tmp/DropKeys.config
Adding transformer DropKeys... done

Alternatively, keys to be eliminated can be directly specified with the keys argument when adding the transformer. For example, to eliminate the storageId and size annotations, use:

-> add transformer DropKeys position=1 keys=storageId,size
Adding transformer DropKeys... done
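The effect of dropping keys can be sketched in a few lines of Python. This is only an illustration, not SPADE's implementation: the graph layout (a dict of vertex annotations plus a list of `(source, destination, annotations)` edge tuples) and the function name `drop_keys` are assumptions made for the example, as is the choice to strip the keys from edges as well as vertices.

```python
def drop_keys(vertices, edges, keys):
    """Return copies of the graph with the given annotation keys removed."""
    drop = set(keys)
    new_vertices = {
        vid: {k: v for k, v in ann.items() if k not in drop}
        for vid, ann in vertices.items()
    }
    new_edges = [
        (src, dst, {k: v for k, v in ann.items() if k not in drop})
        for src, dst, ann in edges
    ]
    return new_vertices, new_edges


# Hypothetical response graph: one file artifact read by one process.
vertices = {
    "f1": {"type": "Artifact", "path": "/tmp/a", "storageId": "7", "size": "120"},
    "p1": {"type": "Process", "name": "cat"},
}
edges = [("p1", "f1", {"type": "Used", "size": "120"})]

slim_vertices, slim_edges = drop_keys(vertices, edges, ["storageId", "size"])
```

The original graph is left untouched, so the unfiltered response remains available if another transformer needs the dropped annotations.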

TemporalTraversal
The ancestral lineage of a vertex is constructed by backward traversal of the provenance graph. If the edges have ordered event identifiers or timestamps, this information can be used to temporally scope the traversal. In particular, the time of an edge emerging from a process is used to eliminate all later edges entering the process.

Similarly, during a forward traversal to identify descendants, the time of an edge entering a process vertex can be used to eliminate all earlier edges emerging from that process.

The annotation key used to determine the temporal ordering can be specified by passing it as an argument. If none is provided, event identifiers are used by default; that is, an argument of order="event id" is implicit. Ordering by timestamps can be specified with:

-> add transformer TemporalTraversal position=1 order=timestamp
Adding transformer TemporalTraversal... done
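The backward case can be sketched as a time-bounded traversal. The sketch below is a simplification under stated assumptions, not SPADE's code: each flow is a hypothetical `(source, destination, time)` tuple meaning that data moved from source to destination at that time, and the time bound is applied at every vertex rather than only at process vertices.

```python
import math


def temporal_ancestors(flows, start):
    """Return the vertices that could have influenced `start`, respecting time."""
    # best[v] is the latest time at which data leaving v could still reach start.
    best = {start: math.inf}
    stack = [start]
    while stack:
        v = stack.pop()
        for src, dst, t in flows:
            # Follow a flow into v only if it happened no later than the bound
            # on v, and only if it relaxes the bound already recorded for src.
            if dst == v and t <= best[v] and t > best.get(src, -math.inf):
                best[src] = t
                stack.append(src)
    return set(best) - {start}


# Process P reads a at time 1, writes out at time 2, then reads b at time 3.
flows = [("a", "P", 1), ("P", "out", 2), ("b", "P", 3)]
ancestors = temporal_ancestors(flows, "out")
```

In this example the read of `b` happens after `out` was written, so `b` is excluded from the lineage of `out`, whereas an unscoped traversal would include it.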


When a file is repeatedly written by a process, a corresponding number of artifact vertices (with different version numbers) appear in the provenance graph. This transformer combines all versions of the file into a single one and removes the version annotation.
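A rough sketch of this merge follows. It assumes a hypothetical graph layout (a dict of vertex annotations and a list of `(source, destination, annotations)` edges) and assumes that artifacts which differ only in their `version` annotation are versions of the same file. Dropping the edges that become self-loops after the merge is also an assumption of the sketch.

```python
def merge_versions(vertices, edges):
    """Collapse artifact vertices that differ only in their version annotation."""
    canonical = {}  # version-free annotations -> representative vertex id
    remap = {}      # original vertex id -> representative vertex id
    merged = {}
    for vid, ann in vertices.items():
        if ann.get("type") != "Artifact":
            remap[vid] = vid
            merged[vid] = ann
            continue
        stripped = {k: v for k, v in ann.items() if k != "version"}
        rep = canonical.setdefault(tuple(sorted(stripped.items())), vid)
        remap[vid] = rep
        merged[rep] = stripped
    # Redirect edges to the representatives; skip edges between merged versions.
    new_edges = [
        (remap[s], remap[d], ann) for s, d, ann in edges if remap[s] != remap[d]
    ]
    return merged, new_edges


vertices = {
    "f1": {"type": "Artifact", "path": "/tmp/log", "version": "0"},
    "f2": {"type": "Artifact", "path": "/tmp/log", "version": "1"},
    "p": {"type": "Process", "pid": "10"},
}
edges = [
    ("f1", "p", {"type": "WasGeneratedBy"}),
    ("f2", "p", {"type": "WasGeneratedBy"}),
    ("f2", "f1", {"type": "WasDerivedFrom"}),
]
merged_vertices, merged_edges = merge_versions(vertices, edges)
```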


When a process repeatedly reads (or writes, respectively) a file, a corresponding number of edges are created. In the context of dependency analysis, a single edge suffices. This transformer merges all read (or write, respectively) edges into a single one representing the flow of data from (or to, respectively) the file.
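As an illustration, merging repeated edges amounts to de-duplicating on the endpoints and the edge type. The edge layout `(source, destination, annotations)` is hypothetical, and keeping only the `type` annotation on the merged edge is an assumption of this sketch, since the per-call annotations no longer describe a single event.

```python
def merge_io_edges(edges):
    """Keep one edge per (source, destination, type) triple."""
    seen = set()
    merged = []
    for src, dst, ann in edges:
        key = (src, dst, ann.get("type"))
        if key not in seen:
            seen.add(key)
            # Per-call annotations (such as an event id) are not carried over.
            merged.append((src, dst, {"type": ann.get("type")}))
    return merged


edges = [
    ("p1", "f1", {"type": "Used", "event id": "1"}),
    ("p1", "f1", {"type": "Used", "event id": "2"}),
    ("f1", "p1", {"type": "WasGeneratedBy", "event id": "3"}),
]
merged_io = merge_io_edges(edges)
```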


When a child process (after a fork or clone call) is replaced by another process (via an execve call), the intermediate process is eliminated from the graph. In particular, "parent ---fork/clone---> intermediate ---execve---> child" is replaced by "parent ---fork/clone---> child".
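The rewrite can be sketched as follows. Edges are hypothetical `(source, destination, operation)` tuples drawn in the same direction as the arrows in the text above. As a conservative assumption, the sketch only collapses an intermediate process that has no edges other than the fork/clone edge and the execve edge; how SPADE treats intermediates with additional edges is not covered here.

```python
def collapse_fork_exec(edges):
    """Replace parent -fork/clone-> mid -execve-> child with parent -> child."""
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        for parent, mid, op in edges:
            if op not in ("fork", "clone"):
                continue
            touching = [e for e in edges if mid in (e[0], e[1])]
            execs = [e for e in touching if e[0] == mid and e[2] == "execve"]
            # Collapse only if mid has exactly the fork/clone and execve edges.
            if len(touching) == 2 and len(execs) == 1:
                child = execs[0][1]
                edges = [e for e in edges if e not in touching]
                edges.append((parent, child, op))
                changed = True
                break  # the edge list changed, so restart the scan
    return edges


edges = [
    ("sh", "sh-child", "fork"),
    ("sh-child", "ls", "execve"),
    ("ls", "/tmp", "read"),
]
collapsed = collapse_fork_exec(edges)
```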

Blacklist
In some cases, it may be preferable to eliminate some of the file artifacts from the provenance graph. For example, particular files, extensions, or subtrees in the filesystem may be deemed of no interest. In such cases, a blacklist expression can be specified in the SPADE configuration file cfg/spade.transformer.Blacklist.config. Any artifact with a filename that matches the expression will be removed from the graph (along with all incident edges).
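The filtering step can be sketched as below. The format of the blacklist file is not described here, so the sketch assumes a single regular expression passed in directly; the graph layout, the `path` annotation name, and the function name are likewise hypothetical.

```python
import re


def apply_blacklist(vertices, edges, pattern):
    """Remove artifacts whose path matches pattern, plus their incident edges."""
    regex = re.compile(pattern)
    removed = {
        vid
        for vid, ann in vertices.items()
        if ann.get("type") == "Artifact" and regex.search(ann.get("path", ""))
    }
    kept_vertices = {v: a for v, a in vertices.items() if v not in removed}
    kept_edges = [
        (s, d, a) for s, d, a in edges if s not in removed and d not in removed
    ]
    return kept_vertices, kept_edges


vertices = {
    "p": {"type": "Process", "name": "app"},
    "lib": {"type": "Artifact", "path": "/lib/libc.so.6"},
    "maps": {"type": "Artifact", "path": "/proc/1/maps"},
    "doc": {"type": "Artifact", "path": "/home/u/report.txt"},
}
edges = [
    ("p", "lib", {"type": "Used"}),
    ("p", "maps", {"type": "Used"}),
    ("doc", "p", {"type": "WasGeneratedBy"}),
]
# Example expression: shared libraries and anything under /proc.
kept_vertices, kept_edges = apply_blacklist(
    vertices, edges, r"\.so(\.\d+)*$|^/proc/"
)
```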


If a file is only modified by a single process and never read by any other process, the writes are deemed ephemeral. This transformer eliminates all such ephemeral writes from the provenance graph.
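The definition above translates into a simple check per file. In this sketch the input is a hypothetical list of `(process, path, operation)` tuples with operation either `"read"` or `"write"`; only the write edges of ephemeral files are removed, and any reads by the writing process are left in place, which is an assumption of the example.

```python
from collections import defaultdict


def drop_ephemeral_writes(io_edges):
    """Remove writes to files touched by a single writer and no other reader."""
    writers = defaultdict(set)
    readers = defaultdict(set)
    for process, path, op in io_edges:
        (writers if op == "write" else readers)[path].add(process)
    # A file is ephemeral if it has exactly one writer and every reader
    # (if there are any) is that same process.
    ephemeral = {
        path for path, w in writers.items() if len(w) == 1 and readers[path] <= w
    }
    return [e for e in io_edges if not (e[2] == "write" and e[1] in ephemeral)]


io_edges = [
    ("p1", "/tmp/scratch", "write"),
    ("p1", "/tmp/scratch", "read"),
    ("p1", "/out", "write"),
    ("p2", "/out", "read"),
]
remaining = drop_ephemeral_writes(io_edges)
```

Here `/tmp/scratch` is written and read only by `p1`, so its write is dropped, while the write to `/out` is kept because `p2` reads it.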

NoEphemeralReads
If a file is only read by a single process and never modified by any other process, the reads are deemed ephemeral. In general, ephemeral reads are of interest. In the special case that the reads are from "garbage" files (such as applications' predefined temporary files), it may be preferable to eliminate them from the graph. This transformer supports the read elimination, using a list of garbage files specified in the SPADE configuration cfg/spade.transformer.NoEphemeralReads.config. If the optional argument limited=false is specified, ephemeral read elimination is not limited to the files specified in the configuration.

-> add transformer NoEphemeralReads position=1 limited=false
Adding transformer NoEphemeralReads... done

Prune
A query response graph may contain portions that are not of interest. This transformer takes an expression framed over the annotations on vertices. It prunes the subgraphs that flow to or from all matching vertices (with the direction automatically determined by the query that gave rise to the response graph). For example, it may be preferable to ignore the provenance of the sudo command when returning the provenance of a file created by a program that was executed via sudo. This can be effected with:

-> add transformer Prune position=1 expression=name:sudo
Adding transformer Prune... done
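One way to picture the pruning is as a traversal from the query's starting vertex that refuses to pass through matching vertices. The sketch below makes several assumptions that are not taken from SPADE: edges are `(source, destination)` pairs pointing in the direction the query traversed, the expression is a single `key:value` equality test, and the matching vertex itself is removed along with everything reachable only through it.

```python
def prune(vertices, edges, root, key, value):
    """Keep only what is reachable from root without crossing a matching vertex."""
    blocked = {vid for vid, ann in vertices.items() if ann.get(key) == value}
    keep, stack = set(), [root]
    while stack:
        v = stack.pop()
        if v in keep or v in blocked:
            continue
        keep.add(v)
        stack.extend(dst for src, dst in edges if src == v)
    kept_vertices = {v: vertices[v] for v in keep}
    kept_edges = [(s, d) for s, d in edges if s in keep and d in keep]
    return kept_vertices, kept_edges


# Ancestor lineage of a file written by a program that was run via sudo.
vertices = {
    "file": {"path": "/etc/app.conf"},
    "prog": {"name": "installer"},
    "sudo": {"name": "sudo"},
    "shell": {"name": "bash"},
    "lib": {"path": "/lib/libinstall.so"},
}
edges = [("file", "prog"), ("prog", "sudo"), ("sudo", "shell"), ("prog", "lib")]
pruned_vertices, pruned_edges = prune(vertices, edges, "file", "name", "sudo")
```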


A file may be renamed or linked to, allowing it to subsequently be referred to by a new name. This transformer can be used to retain the write edge from the process that performed the rename or link operation to the new artifact, while eliminating the analogous read edge from the old artifact and the edge between the old and new artifacts. This simplifies the provenance so that it reflects only the most recent name of an artifact.


When a program is instrumented with BEEP [1], the executions of its internal loops can be represented as unit vertices. In the context of workflow analysis, it may be preferable to abstract away the units. This transformer does so by merging all unit vertices into the vertex of the containing process.
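A minimal sketch of this merge is shown below. It assumes a hypothetical encoding in which a unit vertex carries a `unit` annotation and shares its `pid` with the containing process, whose vertex has no `unit` annotation; the real annotation names may differ. Edges are `(source, destination, type)` tuples, and edges that become self-loops or duplicates after the merge are dropped.

```python
def merge_units(vertices, edges):
    """Fold unit vertices into the process vertex that has the same pid."""
    process_of = {
        ann["pid"]: vid
        for vid, ann in vertices.items()
        if ann.get("type") == "Process" and "unit" not in ann
    }
    remap = {}
    for vid, ann in vertices.items():
        # A unit without a known containing process is left as it is.
        remap[vid] = process_of.get(ann.get("pid"), vid) if "unit" in ann else vid
    new_vertices = {v: a for v, a in vertices.items() if remap[v] == v}
    new_edges = []
    for s, d, t in edges:
        e = (remap[s], remap[d], t)
        if e[0] != e[1] and e not in new_edges:
            new_edges.append(e)
    return new_vertices, new_edges


vertices = {
    "p": {"type": "Process", "pid": "42", "name": "httpd"},
    "u1": {"type": "Process", "pid": "42", "unit": "1"},
    "u2": {"type": "Process", "pid": "42", "unit": "2"},
    "f": {"type": "Artifact", "path": "/var/www/index.html"},
}
edges = [
    ("u1", "p", "unit"),
    ("u2", "p", "unit"),
    ("u1", "f", "read"),
    ("u2", "f", "read"),
]
flat_vertices, flat_edges = merge_units(vertices, edges)
```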


When BEEP [1] is used, inter-unit communication may occur through memory addresses that are depicted as artifact vertices in the provenance graph. If this level of detail is not needed, this transformer can be used to abstract away the flows through memory addresses. In particular, memory artifact vertices and the edges representing reads from and writes to them are eliminated.

BEEP
This transformer composes several others in a specific order. It can be used to provide results that match those produced by BEEP [1]. Different transformations must be performed, depending on whether an ancestor or descendant lineage query was executed. The specific transformers, arguments, and order used for each type of query are defined in the SPADE configuration file cfg/spade.transformer.BEEP.config, which has sections for both ancestor and descendant queries. This transformer automatically determines which configuration to use based on the query that gave rise to the response graph being processed.

[1] Kyu Hyung Lee, Xiangyu Zhang, and Dongyan Xu, High accuracy attack provenance via binary-based execution partition, 20th Network and Distributed System Security Symposium, 2013.
