lang_datalog
GProM supports the generation of provenance graphs that explain why a tuple exists in the result of a Datalog query respective why it is missing from the result. Several programs, provenance requests, and provenance graphs can be found at the datalog_prov
page (https://github.com/IITDBGroup/gprom/wiki/datalog_prov) as an example. More example programs and provenance requests are available in the examples/datalogAndProvGraphs
directory, and the provenance graphs produced for these examples are included in examples/datalogAndProvGraphs/graphs
. We provide convenience scripts for running these examples using the included SQLite database. Use genProvGraph.sh
to generate a pdf provenance graph and runQuery.sh
to run a query that returns the edge relation of the provenance graphs (or the result if provided with a Datalog query without provenance requests).
The datalog parser is activated by changing the parser, analyzer, and translator plugins to dl
using the following command line options.
gprom -frontend dl
This will cause GProM to translate the input Datalog program into relational algebra (and translate this into SQL). Use -Pexecutor run
to run the SQL code or -Pexecutor sql
to just print the SQL code to stdout
.
If the program asks for provenance (see below) then the -Pexecutor gp
option will cause GProM to output a .dot
script for the provenance graph. For convenience, we provide several shell scripts in the scripts
folder that activate Datalog support:
-
gprom-dl.sh
- starts an interactive shell to run Datalog queries. -
gprom-dl-to-dl.sh
- takes two parameters: 1) a loglevel, 2) a Datalog program with optional provenance requests. This script returns an instrumented Datalog program that computes the edge relation of the provenance graph based on the user's provenance request (if any). -
gprom-dl-to-sql.sh
- takes two parameters: 1) a loglevel, 2) a Datalog program. Returns an SQL query that computes the edge relation of the provenance graph based on the user's provenance request (if any) or an SQL translation of the input program otherwise. -
gprom-dl-from-file-to-sql.sh
- takes two parameters: 1) a loglevel, 2) a file storing a Datalog program. Returns an SQL query that computes the edge relation of the provenance graph based on the user's provenance request (if any) or an SQL translation of the input program otherwise. -
gprom-dl-prov-graph.sh
- takes two parameters: 1) a Datalog program with a provenance request, 2) an output file name. This script generates a pdf file storing the provenance graph constructed for the user's provenance request. This requiresgraphviz
to be installed. -
gprom-dl-from-file-prov-graph.sh
- takes two parameters: 1) a file storing a Datalog program with a provenance request, 2) , 2) an output file name. This script generates a pdf file storing the provenance graph constructed for the user's provenance request. This requiresgraphviz
to be installed.
For example, assuming you have a relation R(A,B)
in an Oracle database, to generate the provenance graph for explaining why Q(1)
is in the result of query Q(X) :- R(X,Y).
and store the graph in my_prov_graph.pdf
you would run:
gprom-dl-prov-graph.sh "Q(X) :- R(X,Y). WHY(Q(1))." my_prov_graph -host 1.1.1.1 -user usr -passwd mypass -port 1521 -db orcl
So far our parser and internal model supports non-recursive Datalog with negation, the ability to specify an answer relation, and constructs that ask for game provenance. A Datalog rule consists of a head (a relational atom) and a body (a list of relational and comparison atoms and is denoted as head :- body .
. An relational atom is an identifier followed by a list of arguments in parenthesis. Arguments can be constants (string constants enclosed in '
and numeric constants) or variables.
A few example Datalog rules are shown below:
Q(X) :- R(X,Y), S(Y,Z).
Q(X) :- R(X,Y), X < 5, X < Y.
We require all rules to be safe:
- all variables in the head have to appear in a positive relational atom in the body
- all variables in a negative relational atom in the body have to occur in at least one positive relational atom in the body
- all variables used in a comparison atom have to occur in at least one positive relational atom in the body
We allow function application, constants, and arithmetic expressions in the head of a rules. This works like projection in relational algebra. For example:
Q(X,3+ Y) :- R(X,Y).
A Datalog program consists of a number of rules and optionally an answer relation specification. In the translation to SQL this answer relation will be the one computed by the resulting SQL query. For example, in the following Datalog program we will compute IDB relation Q
Q(X) :- R(X,Y).
Q2(X) :- Q(X).
ANS : Q.
Currently we do not support recursive programs
Our Datalog dialect currently supports two types of provenance questions. Both ask for the game provenance rooted a particular IDB relational atom with constant values. WHY(Q(C))
where C
is a list of constants computes the move relation for the game provenance graph that explains why Q(C)
is won, i.e., why Q(C)
is in the result of the program. WHYNOT(Q(C))
computes the move relation for the game provenance graph for why Q(C)
is lost, i.e., why Q(C)
is not in the result of the program. For example, the following computes why a
is reachable in 3 hops from a
:
Thop(X,Y) :- hop(X,A), hop(A,B), hop(B,Y).
WHY(Thop('a','a')).
We support RPQ (Regular Path Queries) as a macro-like construct that is expanded into a set of Datalog rules before analyzing and translating the program. Note that some RPQ language constructs will result in recursive queries that are currently not supported by GProM. For such queries you can request GProM to return the resulting Datalog program.
The new construct is RPQ(query,resultType,edgeRelation,resultRelation)
.
-
query
is an RPQ expression enclosed in single quotes'a.b.c*'
-
resultType
determines what kind of results are produced:-
RESULT
returns a relation with two attributes storing pairs of nodes that are connected via at least one path that matches the regular path expression -
PROV
returns a relation with five attributes that stores result nodes (first two attributes) paired with edges from the input graph that lie on paths between the result nodes that match the regular path expression -
SUBGRAPH
returns the subgraph induced by the regular path expression, i.e., the subgraph of the input graph limited to edges that lie on paths matched by the regular path expression.
-
-
edgeRelation
the relation with stores the input graph, i.e., labelled edges as tuples(inNode,label,outNode)
-
resultRelation
the storing the query results
for example to find all nodes that are reachable from node 'n1' with a path a.b.
:
RPQ('a.b',RESULT,e,RP).
Q(Y) :- RP(X,Y), X = 'n1'.