ProvMark is a fully automated system that generates benchmarks from the provenance information collected by different provenance collecting tools.
Stage 1 Execute the chosen provenance collecting tool on the chosen syscall benchmark program (and control program)
In this stage, the chosen syscall program and the control program are prepared in a clean staging environment. The chosen provenance collecting tool is then started and records the provenance of the execution of those programs over multiple trials. The tool creates one set of provenance results per trial per program. These results are the raw data from the provenance collecting tool and are further processed and analysed by our system to generate the benchmark. Currently, three output formats are supported: Graphviz DOT, Neo4j (either a full database or a Cypher dump of the database), and PROV-JSON.
Stage 2 Transform raw results to Clingo graph format
Clingo is an Answer Set Programming (ASP) system that provides powerful modelling abilities for solving combinatorial problems. We use it to solve the complicated graph comparison problem of matching vertices and edges across multiple trials of a benchmark program's execution, both to generalize the graphs and to identify the additional elements (the benchmark of a syscall for a given provenance collecting tool). In this stage, the system runs a format-specific script to transform the raw results generated by the provenance collecting tool into Clingo graph format for further processing. Because the provenance information is supposed to describe the execution trace of the chosen syscall program, it can always be transformed into a directed graph. Each trial result is transformed into one Clingo graph for the next stage.
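As an illustration of the kind of transformation this stage performs, the following is a minimal Python sketch that renders a small directed property graph as ASP facts. The predicate names (`node1`, `edge1`, `prop1`), the input data layout and the function names are assumptions made for this example only; they are not necessarily the fact schema that the scripts in genClingoGraph emit.

```python
def quote(value):
    """Return value as a double-quoted ASP string with quotes and backslashes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_facts(nodes, edges, graph_id=1):
    """Render a directed property graph as ASP facts, one fact per line.

    nodes: {node_id: (label, {property: value})}
    edges: {edge_id: (source_id, target_id, label)}
    Identifiers must be valid ASP constants (start with a lowercase letter).
    The predicate names are hypothetical and chosen for this sketch.
    """
    lines = []
    for node_id, (label, props) in sorted(nodes.items()):
        lines.append(f"node{graph_id}({node_id},{quote(label)}).")
        for key, value in sorted(props.items()):
            lines.append(f"prop{graph_id}({node_id},{quote(key)},{quote(value)}).")
    for edge_id, (src, dst, label) in sorted(edges.items()):
        lines.append(f"edge{graph_id}({edge_id},{src},{dst},{quote(label)}).")
    return "\n".join(lines)


if __name__ == "__main__":
    example_nodes = {
        "n1": ("process", {"pid": "42"}),
        "n2": ("file", {"path": "/tmp/test"}),
    }
    example_edges = {"e1": ("n1", "n2", "used")}
    print(to_facts(example_nodes, example_edges))
```

Using a distinct predicate suffix per graph (the `graph_id` argument) is one simple way to keep several trial graphs apart when they are loaded into the same ASP program.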
Stage 3 Generalize the resulting graphs over multiple trials
In this stage, the Clingo graphs describing the multiple trials of the same program execution are compared with one another. Clingo matches the elements of the graphs pairwise and produces a matching of nodes and edges with the least edit distance. The properties in the graphs are then compared one by one, and noise is identified and removed. The result of this stage should be one generalized graph for the control program and another generalized graph for the chosen syscall program. They should contain only the information that is truly related to the program execution, with minimum noise.
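The property comparison described above can be sketched in Python as follows. This is a simplified illustration, not the ProvMark implementation: it assumes the node and edge matching has already been computed (in ProvMark that is the job of Clingo), so that the same identifier refers to the matched element in every trial. The function name and data layout are hypothetical.

```python
def generalize(trials):
    """Keep only the properties whose values agree across all trials.

    trials: non-empty list of {element_id: {property: value}} dictionaries
    that have already been aligned, so equal ids denote matched elements.
    Properties that are missing or differ in any trial are treated as noise.
    """
    # Assumption for this sketch: only elements present in every trial are kept.
    common_ids = set(trials[0]).intersection(*trials[1:])
    generalized = {}
    for element in sorted(common_ids):
        first = trials[0][element]
        generalized[element] = {
            key: value
            for key, value in first.items()
            if all(
                key in trial[element] and trial[element][key] == value
                for trial in trials[1:]
            )
        }
    return generalized


if __name__ == "__main__":
    trial_a = {"n1": {"type": "process", "pid": "101"}}
    trial_b = {"n1": {"type": "process", "pid": "202"}}
    # The pid changes between runs, so only the stable "type" property remains.
    print(generalize([trial_a, trial_b]))
```

Values such as process ids or timestamps typically change from run to run, which is why comparing several trials is enough to expose them as noise.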
Stage 4 Generate the benchmark of the chosen syscall for the chosen provenance collecting tool
This is the last stage of the benchmarking system. In this stage, the two generalized graphs are compared with each other. We assume that the chosen syscall program always performs a few more steps or commands than the control program, and that both are executed in the same staging environment with the same language. The additional elements in the generalized syscall graph therefore show the patterns that can be used as a benchmark to identify this syscall when using the chosen provenance collecting tool. All of these additional branches and properties are identified and summarized in the result file in Clingo format. Currently, this is the end of the full system. The Clingo format graph can be transformed into other directed graph formats if needed in the future.
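The comparison in this stage can be pictured as a graph difference. The Python sketch below is an illustration under strong assumptions: it presumes that the control graph has already been matched into the syscall graph so that matched elements share an identifier, whereas ProvMark obtains such a matching with Clingo. The function name and data layout are hypothetical.

```python
def additional_elements(control, syscall):
    """Return the elements and properties found only in the syscall graph.

    control, syscall: {element_id: {property: value}} for the two generalized
    graphs, with matched elements sharing the same id (an assumption of this
    sketch).
    """
    extra = {}
    for element, props in syscall.items():
        if element not in control:
            # Entire node or edge is absent from the control graph.
            extra[element] = dict(props)
            continue
        known = control[element]
        changed = {
            key: value
            for key, value in props.items()
            if key not in known or known[key] != value
        }
        if changed:
            # Element exists in both graphs but carries extra properties.
            extra[element] = changed
    return extra


if __name__ == "__main__":
    control_graph = {"n1": {"type": "process"}}
    syscall_graph = {
        "n1": {"type": "process"},
        "n2": {"type": "file"},
        "e1": {"label": "used"},
    }
    print(additional_elements(control_graph, syscall_graph))
```

In this toy example the file node and the edge leading to it are the only additions, so they would form the benchmark pattern for the syscall.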
- benchmarkProgram: Contains sample C programs for collecting provenance information on different syscalls
- clingo: Contains the Clingo code
- config: Contains the configuration profiles for the different tool choices in stage 1 and stage 2
- documentation: Contains the documentation for ProvMark
- genClingoGraph: Contains code to transform the raw graph formats into Clingo graph format
- processGraph: Contains code to handle graph comparison and generalization
- sampleResult: Contains sample benchmark results from our trials
- startTool: Contains tools to run the currently supported provenance collecting tools and retrieve results from them
- template: Contains the HTML templates for result generation
- vagrant: Contains Vagrant files for the currently supported provenance collecting tools
Use of Clingo
The content of the clingo directory is external work provided by the University of Potsdam as part of Potassco. It is distributed under the MIT License, and the developers retain their rights over the distribution of the binary and code. We provide a local copy of the compiled version 5.2.1 for convenience only. You should always obtain the original code and binary of Clingo from the original developers. Here is a link to the original developers: [http://potassco.sourceforge.net/]