# Provenance Queries Problem 2: Hamming Numbers

We study _retrospective_ provenance graphs produced by two similar but distinct executable workflow graphs (_prospective_ provenance). Although both workflows output the same Hamming numbers, their executions differ, which yields two different provenance graphs:
- (a) H1: _Fish_ and 
- (b) H3: _Sail_.

If you'd like to know more about Hamming numbers, see https://en.wikipedia.org/wiki/Regular_number. This [paper and presentation](https://www.usenix.org/conference/tapp12/workshop-program/presentation/dey) on _Datalog as a Lingua Franca for Provenance Querying and Reasoning_ is also relevant and uses the _Fish_ and _Sail_ provenance graphs in the appendix.
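
For orientation, Hamming numbers are the positive integers whose only prime factors are 2, 3, and 5. A minimal Python sketch that enumerates them up to a bound follows; the function name `hamming_numbers` is illustrative and not part of the course library.

```python
def hamming_numbers(limit):
    """Return all numbers of the form 2^i * 3^j * 5^k that are <= limit, sorted."""
    found = {1}
    frontier = [1]
    while frontier:
        n = frontier.pop()
        # Multiply each known Hamming number by 2, 3, and 5 to reach new ones.
        for factor in (2, 3, 5):
            m = n * factor
            if m <= limit and m not in found:
                found.add(m)
                frontier.append(m)
    return sorted(found)


print(hamming_numbers(30))
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30]
```

Both 360 and 600, the nodes queried in this notebook, are Hamming numbers (360 = 2^3 * 3^2 * 5 and 600 = 2^3 * 3 * 5^2).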

![Fish and Sail](fish_sail.png "Fish and Sail")


## Problem 2a: _Fish_

**Hints**. You can reuse the rules from Problem 1: use the relation `hamming(Y,X,F)` to define a new “parent” relation `par(X,Y)`. With this parent relation (obtained from the Hamming edges), you can reuse the rules for `anc(X,Y)`, `ca(X,Y,A)`, `not_lca(X,Y,A)`, and `lca(X,Y,A)` to solve Problem 2.


In [1]:
%reload_ext lib.clingo.clingo_magic
import os
from lib.clingo.clingo_evaluate_util import clingo_evaluate

In [2]:
fish_facts_and_rules_file = os.path.expanduser('~/data_readonly/provenance/problem2-fish.lp')
%set_db_file $fish_facts_and_rules_file

View the output of the Hamming rules:

In [3]:
%%clingo {"predicate" : "hamming", "predicate_arity" : 3, "result_var": "Hamming_test"}
% Hamming Test

Saving output to local variable Hamming_test['result']
Saving code snippet to local variable Hamming_test['code']


## 1. [10 points] Ancestors (_Fish_)
Compute the lineage of 360 in the _Fish_ provenance graph, i.e., all nodes from which there is a path that leads to 360. You will do the same for the _Sail_ graph in the next notebook.
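
The lineage computation is a reachability query over parent edges. The Python sketch below illustrates it on a toy graph, not on the real _Fish_ facts: it assumes hypothetical `hamming(child, parent, factor)` triples in which every Hamming number is linked to each of its quotients by 2, 3, or 5. The helper names `toy_edges` and `lineage` are made up for this illustration, and the actual _Fish_ graph may contain a different set of edges.

```python
from collections import deque


def toy_edges(root):
    """Hypothetical (child, parent, factor) triples with parent = child / factor."""
    edges = set()
    seen = {root}
    todo = [root]
    while todo:
        child = todo.pop()
        for factor in (2, 3, 5):
            if child % factor == 0:
                parent = child // factor
                edges.add((child, parent, factor))
                if parent not in seen:
                    seen.add(parent)
                    todo.append(parent)
    return edges


def lineage(edges, node):
    """All nodes reachable from `node` by following child -> parent edges."""
    parents = {}
    for child, parent, _factor in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


edges = toy_edges(360)
print(len(lineage(edges, 360)))  # 23
```

On this toy graph the lineage of 360 is exactly the set of its proper divisors. The breadth-first search plays the role of the recursive `anc` rule: each step follows one more parent edge.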

In [21]:
%%clingo {"predicate" : "anc_360", "predicate_arity" : 1, "result_var": "Anc_360"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.

% Change the following expression, and add additional rules if necessary
      
par(Y, X) :- hamming(Y, X, _).

anc(X, Y) :- par(X, Y).
anc(X, Y) :- par(X, Z), anc(Z, Y).

anc_360(X) :- anc(360, X).

Saving output to local variable Anc_360['result']
Saving code snippet to local variable Anc_360['code']


### Test 1 for Ancestors of 360

In [22]:
expected_output = '''
anc_360(72) anc_360(120) anc_360(180) anc_360(36) anc_360(60) anc_360(90) anc_360(24) anc_360(40) anc_360(8) anc_360(12) anc_360(20) anc_360(18) anc_360(30) anc_360(45) anc_360(9) anc_360(15) anc_360(6) anc_360(10) anc_360(4) anc_360(2) anc_360(3) anc_360(5) anc_360(1)
'''
db_file = os.path.expanduser('~/data_readonly/provenance/problem2-fish.lp')
clingo_evaluate(db_file, Anc_360['code'], 'anc_360', 1, expected_output)

In [23]:
# Hidden test for anc/2.
# anc/2 is the relation you used to derive anc_360.
# Refer back to the Family problem (Problem 1):
# anc(X,Y) is true if an ancestor Y can be reached from X via a chain of parent edges.


## 2. [20 points] Lowest Common Ancestors
Compute `lca(360,600,A)`, i.e., the lowest common ancestor `A` of 360 and 600 for the _Fish_ graph.
You will do the same for the _Sail_ graph in the next notebook.
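
The lowest-common-ancestor logic can be sketched in Python as follows: collect the common ancestors of both nodes, then discard every candidate that is an ancestor of another candidate. The sketch runs on a toy graph in which each Hamming number has its quotients by 2, 3, and 5 as parents; that edge set, and the names `toy_parents`, `ancestors`, and `lowest_common_ancestors`, are assumptions made for illustration and do not come from the _Fish_ facts. A node is treated as a common ancestor of itself and its descendants, which mirrors the `ca(X,A,A)` style of rule for distinct nodes.

```python
def toy_parents(*roots):
    """Hypothetical parent map: each node maps to its quotients by 2, 3, and 5."""
    parents = {}
    todo = list(roots)
    while todo:
        child = todo.pop()
        if child in parents:
            continue
        parents[child] = {child // f for f in (2, 3, 5) if child % f == 0}
        todo.extend(parents[child])
    return parents


def ancestors(parents, node):
    """All nodes reachable from `node` by following child -> parent edges."""
    found = set()
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def lowest_common_ancestors(parents, x, y):
    """Common ancestors of x and y that are not ancestors of another common one."""
    common = (ancestors(parents, x) | {x}) & (ancestors(parents, y) | {y})
    return {
        a for a in common
        if not any(a in ancestors(parents, b) for b in common if b != a)
    }


graph = toy_parents(360, 600)
print(lowest_common_ancestors(graph, 360, 600))  # {120}
```

The filtering step corresponds to the `not_lca` rule: a candidate is dropped as soon as some other common ancestor lies below it. Because the toy graph contains every divisor edge, the result here coincides with the greatest common divisor; a provenance graph with fewer edges need not behave that way.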

In [10]:
%%clingo {"predicate" : "lca_360_600", "predicate_arity" : 1, "result_var": "Lca_360_600"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.

% Change the following expressions (add additional rules if necessary):
    
lca_360_600(X) :- lca(360,600,X).

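% Note: par/2 is oriented opposite to the Ancestors cell above; here anc(A,X)
% puts the ancestor A first, as the ca/3 rules below expect.
par(X,Y) :- hamming(Y,X,_).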
    
anc(X,Y) :- par(X,Y).
anc(X,Y) :- par(X,Z), anc(Z,Y).

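% A node A with anc(A,X) is a common ancestor of the pairs (X,A) and (A,X).
ca(X,A,A) :- anc(A,X).
ca(A,X,A) :- anc(A,X).

% A is a common ancestor of two distinct nodes X and Y.
ca(X,Y,A) :- anc(A,Y), anc(A,X), X != Y.

% A is not lowest if some other common ancestor A1 satisfies anc(A,A1).
not_lca(X,Y,A) :- ca(X,Y,A), ca(X,Y,A1), anc(A,A1).

% The lowest common ancestors are the common ancestors not ruled out above.
lca(X,Y,A) :- ca(X,Y,A), not not_lca(X,Y,A).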

Saving output to local variable Lca_360_600['result']
Saving code snippet to local variable Lca_360_600['code']


In [11]:
expected_output = '''
lca_360_600(120)
'''
db_file = os.path.expanduser('~/data_readonly/provenance/problem2-fish.lp')
clingo_evaluate(db_file, Lca_360_600['code'], 'lca_360_600', 1, expected_output)

In [10]:
# Hidden test for lca/3.
