# Problem 1: Rules and Integrity Constraints (Family Data)

## Notes about Datalog (Clingo) in the Jupyter Notebook environment:
* Refer to [Clingo with Jupyter Intro](Clingo_with_Jupyter_Intro.ipynb) before attempting this notebook.
* It is important to run the following cells first for the rest of the notebook to work. (It's usually a good idea to run cells in order. In case you have run cells out of order and want to start over, simply restart the kernel from the menu.)
* All clingo cells start with `%%clingo`.
* You can run your clingo cell against the facts and rules from a file. `%set_db_file $filepath` sets the file against which your clingo cells will run.
* The clingo cells are _independent_ from each other. Rules defined in one cell won't be visible in others!
* When you submit the assignment, we will run your code against different sets of facts. (So don't "hardcode" your answers ;-)
* If you want to practice a bit more with clingo, you can go to https://potassco.org/clingo/run/ and experiment with the small (but interesting!) examples there. Alternatively, you can install clingo on your own computer:  https://potassco.org/doc/start. If you want to go "all in", there is also an online course (https://teaching.potassco.org/). But don't worry: _none of this is required to do this assignment!_

* FYI: `%%clingo` is just a thin wrapper for running the tool in command line mode.

## Family data and rules in Datalog (clingo):
* Consider the family relation shown as a graph below (edges point from a parent to a child)! We will write Datalog rules to query this graph and to check it for integrity violations.<br>
![Family](Family_Datalog.png "Family")

### Good luck!!

In [4]:
%reload_ext lib.clingo.clingo_magic
import os
from lib.clingo.clingo_evaluate_util import clingo_evaluate

In [5]:
# All clingo cells will run against this file containing some base facts.
family_base_facts_and_rules_file = os.path.expanduser('~/data_readonly/datalog/family_base.lp')
%set_db_file $family_base_facts_and_rules_file

## We will now write various rules one by one ... 

### [12.5 points] descendant(X,Y)
* descendant(X,Y) holds if X is a descendant of Y. Hint: We did the closely related ancestor(X,Y) in class.
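To see what a pair of recursive rules like `descendant(X,Y) :- parent(Y,X).` and `descendant(X,Y) :- parent(Y,Z), descendant(X,Z).` computes, here is a minimal Python sketch (not part of the assignment) that applies them repeatedly until no new facts appear, i.e. a naive bottom-up fixpoint. The `PARENT` facts are hypothetical: they were chosen to agree with the expected output of Test 1 below, and the real `family_base.lp` may contain different or additional facts.

```python
# Hypothetical parent(P, C) facts: P is a parent of C.
# Chosen to agree with the expected test outputs; family_base.lp may differ.
PARENT = {
    ("william", "john"),
    ("john", "james"),
    ("james", "bill"),
    ("sue", "bill"),
    ("james", "carol"),
    ("sue", "carol"),
}


def descendants(parent):
    """Naive bottom-up evaluation of
         descendant(X,Y) :- parent(Y,X).
         descendant(X,Y) :- parent(Y,Z), descendant(X,Z).
    Returns the set of (X, Y) pairs such that X is a descendant of Y."""
    # Base rule: every child is a descendant of its parent.
    result = {(x, y) for (y, x) in parent}
    while True:
        # Recursive rule: if X descends from Z and Y is a parent of Z,
        # then X descends from Y.
        new = {(x, y) for (y, z) in parent for (x, z2) in result if z == z2}
        if new <= result:  # nothing new was derived: fixpoint reached
            return result
        result |= new


if __name__ == "__main__":
    for pair in sorted(descendants(PARENT)):
        print("descendant(%s,%s)" % pair)
```

For these facts the loop stops after three rounds with the same 11 pairs listed in Test 1. Because only finitely many pairs can be built from a finite set of constants, the loop always terminates, which is why recursion is safe to use in Datalog.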


In [9]:
%%clingo {"predicate" : "descendant", "predicate_arity" : 2, "result_var": "Descendant"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.


% Change the following expression.

descendant(X,Y) :- parent(Y,X).
descendant(X,Y) :- parent(Y,Z), descendant(X,Z).

Saving output to local variable Descendant['result']
Saving code snippet to local variable Descendant['code']



### [5 points] Test 1 for descendant(X,Y)
The following test will compare the output of your descendant rule against the expected output.
You must have run all clingo cells above for the test to pass.
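The internals of `clingo_evaluate` are not shown in this notebook. Purely as an illustration of how a comparison can ignore the order of atoms, a hypothetical helper could split both answers on whitespace and compare them as sets (this assumes, as in clingo's printed answers, that atoms contain no spaces).

```python
def same_atoms(actual, expected):
    """Hypothetical helper, NOT the notebook's clingo_evaluate.
    Compares two whitespace-separated lists of ground atoms such as
    'descendant(john,william) descendant(james,john)', ignoring order
    and duplicates. Assumes no atom contains a space."""
    return set(actual.split()) == set(expected.split())
```

Note that a set comparison still distinguishes `sibling(bill,carol)` from `sibling(carol,bill)`, so argument order inside an atom does matter.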

In [10]:
# Test 1 for descendant(X,Y)
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run this cell to see the expected output with syntax highlighting.
expected_output = '''
descendant(john,william) descendant(james,john) descendant(bill,james) descendant(bill,sue) descendant(carol,james) descendant(carol,sue) descendant(carol,john) descendant(bill,john) descendant(james,william) descendant(bill,william) descendant(carol,william)
'''
db_file = os.path.expanduser('~/data_readonly/datalog/family_base.lp')
clingo_evaluate(db_file, Descendant['code'], 'descendant', 2, expected_output)

TypeError: super(type, obj): obj must be an instance or subtype of type


### [7.5 points] Test 2 for descendant(X,Y)
The following contains a hidden test case. This will always pass in your (the student's) version but will actually be evaluated after submission.
* We will first add some facts that are hidden from the student.
* We will then run the descendant rules using these new facts and see if the rules are behaving as expected.

In [11]:
# Hidden Test 2 for descendant(X,Y)
# This cell will test the descendant with these new facts.
# Contents of this cell will not be present in student's version of assignment.
# This will only be evaluated after submission.


### [12.5 points] Defining a sibling relation 
* sibling(X,Y) holds if X and Y are siblings. Hint: X and Y are siblings if they share a parent P (and if X and Y are not the same person ;-)
* To avoid obtaining each pair of siblings twice, you should also require that X comes before Y alphabetically! 
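The sibling rule is a self-join of `parent/2` on the shared parent, followed by a filter. The Python sketch below mirrors that join, again over hypothetical `PARENT` facts chosen to agree with the expected output of Test 1. Python's string comparison is lexicographic, which for these lowercase names gives the alphabetical order the task asks for.

```python
# Hypothetical parent(P, C) facts, chosen to agree with the expected test outputs.
PARENT = {
    ("william", "john"),
    ("john", "james"),
    ("james", "bill"),
    ("sue", "bill"),
    ("james", "carol"),
    ("sue", "carol"),
}


def siblings(parent):
    """Mirrors  sibling(X,Y) :- parent(P,X), parent(P,Y), X < Y.
    Joins parent/2 with itself on the shared parent P. The test x < y
    drops the reflexive pair (X, X) and the mirrored duplicate (Y, X)."""
    return {
        (x, y)
        for (p1, x) in parent
        for (p2, y) in parent
        if p1 == p2 and x < y
    }
```

Here `siblings(PARENT)` yields only `('bill', 'carol')`: bill and carol share two parents (james and sue), but because the result is a set, the pair is reported once, just as clingo reports each atom once.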


In [13]:
%%clingo {"predicate" : "sibling", "predicate_arity" : 2, "result_var": "Sibling"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.

% The following code snippet and its result will be assigned to local variable Sibling

% SIBLING
% X and Y are siblings if they share a parent P.
% X must differ from Y; the condition X < Y ensures this and also reports each pair only once.
% Change the following expression.

% X < Y already implies X != Y, so no separate inequality test is needed.
sibling(X, Y) :- parent(P, X), parent(P, Y), X < Y.



Saving output to local variable Sibling['result']
Saving code snippet to local variable Sibling['code']



### [5 points] Test 1 for sibling(X,Y)
* The following test will compare the output of your sibling rule against the expected output.

As always: you must have run all clingo cells above for the test to pass.

In [14]:
# Test 1 for sibling(X,Y)
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run this cell to see the expected output with syntax highlighting.
expected_output = '''
sibling(bill,carol)
'''
db_file = os.path.expanduser('~/data_readonly/datalog/family_base.lp')
clingo_evaluate(db_file, Sibling['code'], 'sibling', 2, expected_output)

TypeError: super(type, obj): obj must be an instance or subtype of type

### [7.5 points] Test 2 for sibling(X,Y) 
The following is a hidden test case. As before: this will always pass in your version and will only be evaluated after submission.

In [15]:
# Hidden Test 2 for sibling(X,Y)
# This cell will test the sibling with these new facts.
# Contents of this cell will not be present in student's version of assignment.
# This will only be evaluated after submission.


### [12.5 points] icv_person_has_parent
* Integrity Constraint (IC): **Every person must have a parent!**
* Hints:
  - Write rules that report integrity constraint violations in a relation `icv_person_has_parent/1`.
    - Here and elsewhere, we write **_R/n_** to indicate that relation **_R_** has **_n_** arguments!
  - First define `person/1` as the union of the first column and the second column of `parent/2`.
  - Then use `parent/2` to define who has a parent in `hasParent/1`.
  - Finally, define `icv_person_has_parent/1` using `person/1` and `hasParent/1` (and negation ;-)
* Note about safe rules:
  - A Datalog rule R is called _safe_ if and only if every variable that occurs in the head or in any negative literal of the body also occurs _positively_ in the body. For example, given the rule:
    - `p(X):- q(Z,Y),r(X,X),not r(Y,Z)`
    - `X` in `p(X)` can be found in positive literal `r(X,X)` of the body
    - `Y,Z` in the negative literal `not r(Y,Z)` can be found in `q(Z,Y)`
    - Hence, this rule is safe.
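The safety condition is purely syntactic, so it can be checked mechanically. The sketch below is a hypothetical helper (clingo performs its own safety check when grounding). It assumes a rule has already been parsed into a head atom, a list of positive body literals, and a list of negative body literals, each given as a `(predicate, arguments)` pair, with variables written as capitalized strings as in clingo.

```python
def variables(atom):
    """Variables of an atom (pred, args): arguments starting with an uppercase letter.
    Constants such as 'john', numbers, and the anonymous '_' are not counted."""
    _pred, args = atom
    return {a for a in args if a[:1].isupper()}


def is_safe(head, positive, negative):
    """A rule is safe iff every variable in the head or in a negative body
    literal also occurs in some positive body literal."""
    bound = set().union(*(variables(a) for a in positive))
    needed = variables(head).union(*(variables(a) for a in negative))
    return needed <= bound


# p(X) :- q(Z,Y), r(X,X), not r(Y,Z).   (the example rule above)
EXAMPLE = (("p", ("X",)), [("q", ("Z", "Y")), ("r", ("X", "X"))], [("r", ("Y", "Z"))])
# icv(X) :- not hasParent(X).           (X only occurs negatively)
UNSAFE = (("icv", ("X",)), [], [("hasParent", ("X",))])
```

`is_safe(*EXAMPLE)` is `True`, while `is_safe(*UNSAFE)` is `False`: nothing positive binds `X`, which is exactly why the IC rules below need a positive `person(X)` literal in their bodies.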
 
The general idea of IC rules: First, write rules that yield the people for whom the IC _is_ satisfied. Then write a final rule that subtracts those people from all people, leaving exactly those for whom the IC is _not_ satisfied. These are "reported" as IC violations (they witness that there is a problem)!
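The "all people minus the people who satisfy the IC" idea is plain set difference. The Python sketch below follows the three hints literally, using hypothetical `PARENT` facts chosen to agree with the expected output of Test 1 for `icv_person_has_parent/1`.

```python
# Hypothetical parent(P, C) facts, chosen to agree with the expected test outputs.
PARENT = {
    ("william", "john"),
    ("john", "james"),
    ("james", "bill"),
    ("sue", "bill"),
    ("james", "carol"),
    ("sue", "carol"),
}


def icv_person_has_parent(parent):
    # person/1: union of the first and second columns of parent/2.
    person = {p for (p, _c) in parent} | {c for (_p, c) in parent}
    # hasParent/1: everyone who appears as a child in parent/2.
    has_parent = {c for (_p, c) in parent}
    # Violations: all persons minus those for whom the IC holds.
    return person - has_parent
```

For these facts the result contains exactly `william` and `sue`. In the clingo rule, `not hasParent(X)` plays the role of `- has_parent`, and the positive literal `person(X)` supplies the set being subtracted from, which is also what makes the rule safe.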

In [16]:
%%clingo {"predicate" : "icv_person_has_parent", "predicate_arity" : 1, "result_var": "Icv_person_has_parent"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.

% The following code snippet and its result will be assigned to local variable Icv_person_has_parent

% Change the following expression.
% First find all persons; then find who has a parent; then who doesn't (using a safe rule).

% icv_person_has_parent(A) :- replace_me_d1(A).

% person
person(X) :- parent(X, _).
person(X) :- parent(_, X).

% hasParent
hasParent(X) :- parent(_, X).

% IC Violation
icv_person_has_parent(X) :- person(X), not hasParent(X).


Saving output to local variable Icv_person_has_parent['result']
Saving code snippet to local variable Icv_person_has_parent['code']



### [5 points] Test 1 for icv_person_has_parent/1
The following test will compare the output of your `icv_person_has_parent` rule against the expected output.

(You must have run all clingo cells above for the test to pass.)

In [17]:
# Test 1 for icv_person_has_parent/1
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run this cell to see the expected output with syntax highlighting.
expected_output = '''
icv_person_has_parent(william) icv_person_has_parent(sue)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/family_base.lp')
clingo_evaluate(db_file, Icv_person_has_parent['code'], 'icv_person_has_parent', 1, expected_output)

TypeError: super(type, obj): obj must be an instance or subtype of type


### [7.5 points] Test 2 for icv_person_has_parent/1
Hidden test case.

In [18]:
# Hidden Test 2 for icv_person_has_parent/1
# This cell will test the icv_person_has_parent with these new facts.
# Contents of this cell will not be present in student's version of assignment.
# This will only be evaluated after submission.


### [12.5 points] icv_person_has_father_mother/1
* IC: **Every person has a father and a mother!**
* Hint: Same idea as above. First, write a rule that yields the people for whom the IC is satisfied. Then write another rule that reports as an IC violation each person who is not in the answer to that first query.
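The same set-difference pattern applies here, with "has both a father and a mother" as the satisfied condition. The solution cell below uses `male/1` and `female/1` facts; in this Python sketch the `MALE` and `FEMALE` sets are hypothetical stand-ins for them, and `PARENT` is again hypothetical. Both were chosen to agree with the expected output of Test 1 for `icv_person_has_father_mother/1`.

```python
# Hypothetical facts, chosen to agree with the expected test outputs.
PARENT = {
    ("william", "john"),
    ("john", "james"),
    ("james", "bill"),
    ("sue", "bill"),
    ("james", "carol"),
    ("sue", "carol"),
}
MALE = {"william", "john", "james", "bill"}   # stand-in for male/1
FEMALE = {"sue", "carol"}                     # stand-in for female/1


def icv_person_has_father_mother(parent, male, female):
    person = {p for (p, _c) in parent} | {c for (_p, c) in parent}
    has_father = {c for (p, c) in parent if p in male}
    has_mother = {c for (p, c) in parent if p in female}
    # The IC holds for people who have both a father and a mother.
    ok = has_father & has_mother
    # Violations: all persons minus those for whom the IC holds.
    return person - ok
```

For these facts the result is `william`, `john`, `james`, and `sue`. By De Morgan's law, `person - (has_father & has_mother)` equals `(person - has_father) | (person - has_mother)`, so two clingo rules (one per missing parent) report the same people as a single rule that negates a combined "has both" predicate.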


In [19]:
%%clingo {"predicate" : "icv_person_has_father_mother", "predicate_arity" : 1, "result_var": "Icv_person_has_father_mother"}
% Don't change the clingo magic command above. The header of this cell will determine how the datalog rules are saved for evaluation.

% The following code snippet and its result will be assigned to local variable Icv_person_has_father_mother

% Change the following expression.
% Find all people having both a father and a mother, then report those who don't.
% icv_person_has_father_mother(A) :- replace_me_d2(A).

% person
person(X) :- parent(X, _).
person(X) :- parent(_, X).

% hasFather
hasFather(X) :- parent(F, X), male(F).

% hasMother
hasMother(X) :- parent(M, X), female(M).

% IC Violation
icv_person_has_father_mother(X) :- person(X), not hasFather(X).
icv_person_has_father_mother(X) :- person(X), not hasMother(X).

Saving output to local variable Icv_person_has_father_mother['result']
Saving code snippet to local variable Icv_person_has_father_mother['code']


### [5 points] Test 1 for icv_person_has_father_mother/1
You must have run all clingo cells above for the test to pass.

In [20]:
# Test 1 for icv_person_has_father_mother/1
# The following should be the output of your previous cell.
# The order of predicates in the output doesn't matter.
# Run this cell to see the expected output with syntax highlighting.
expected_output = '''
icv_person_has_father_mother(william) icv_person_has_father_mother(john) icv_person_has_father_mother(james) icv_person_has_father_mother(sue)
'''

db_file = os.path.expanduser('~/data_readonly/datalog/family_base.lp')
clingo_evaluate(db_file, Icv_person_has_father_mother['code'], 'icv_person_has_father_mother', 1, expected_output)

TypeError: super(type, obj): obj must be an instance or subtype of type


### [7.5 points] Test 2 for icv_person_has_father_mother/1
Hidden test case.

In [21]:
# Hidden Test 2 for icv_person_has_father_mother/1
# This cell will test the icv_person_has_father_mother with these new facts.
# Contents of this cell will not be present in student's version of assignment.
# This will only be evaluated after submission.
