# Unification

This notebook is Part I of a two-part assignment. See the `forward_planner` notebook for the second half of this problem. The forward planner requires a unification algorithm.

Unification is simply the *syntactic* balancing of expressions. There are only 3 kinds of expressions: constants, lists and (logic) variables. Constants and lists are only equal to each other if they're exactly the same thing or can be made to be the same thing by *binding* a value to a variable.

It really is that simple...expressions must be literally the same (identical) except if one or the other (or both) has a variable in that "spot".

## S-Expressions

With that out of the way, we need a language with which to express our constants, variables and predicates and that language will be based on s-expressions.

**constants** - There are two types of constants, values and predicates. Values should start with an uppercase letter. Fred is a constant value, so is Barney and Food. Predicates are named using lowercase letters. loves is a predicate and so is hates. This is only a convention. Secret: your code does not need to treat these two types of constants differently.

**variables** - these are named using lowercase letters but always start with a question mark. ?x is a variable and so is ?yum. This is not a convention.

**expressions (lists)** - these use the S-expression syntax a la LISP. (loves Fred Wilma) is an expression as is (friend-of Barney Fred) and (loves ?x ?y).

(above description by S. Butcher)
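As a small illustration of what *binding* means here, the sketch below applies a set of bindings to two expressions and checks that they become identical. It assumes expressions are already represented as nested Python lists of strings (the representation `parse` produces later in this notebook); `substitute` is a name invented for this example and is not part of the assignment's interface.

```python
def substitute(expression, bindings):
    """Replace every bound variable in a string or nested list of strings."""
    if isinstance(expression, list):
        return [substitute(item, bindings) for item in expression]
    # Atoms that are not bound variables are returned unchanged.
    return bindings.get(expression, expression)


left = ["loves", "Fred", "?x"]
right = ["loves", "?y", "Wilma"]
bindings = {"?x": "Wilma", "?y": "Fred"}

# After substitution the two expressions are literally the same list.
assert substitute(left, bindings) == substitute(right, bindings) == ["loves", "Fred", "Wilma"]
```

Finding such a set of bindings automatically, or reporting that none exists, is the job of the unifier.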

In [1]:
import tokenize
from io import StringIO
from typing import List, Dict

This uses the above libraries to build a Lisp structure based on atoms. It is adapted from [simple iterator parser](http://effbot.org/zone/simple-iterator-parser.htm). The first function is the `atom` function.

In [2]:
# function by S. Butcher
def atom( next, token):
    if token[ 1] == '(':
        out = []
        token = next()
        while token[ 1] != ')':
            out.append( atom( next, token))
            token = next()
            if token[ 1] == ' ':
                token = next()
        return out
    elif token[ 1] == '?':
        token = next()
        return "?" + token[ 1]
    else:
        return token[ 1]

The next function is the actual `parse` function:

In [3]:
# function by S. Butcher
def parse(exp):
    src = StringIO(exp).readline
    tokens = tokenize.generate_tokens(src)
    return atom(tokens.__next__, tokens.__next__())

**Note** Python 3 renamed the iterator method `next()` to `__next__()`, which is why `parse` passes `tokens.__next__` rather than the `tokens.next` you would have used in Python 2.7.
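A quick way to see this is to pull tokens by hand. The sketch below reuses the same `tokenize` setup as `parse` on a small input; the variable names are illustrative only, and the built-in `next()` is shown alongside the method it delegates to.

```python
import tokenize
from io import StringIO

tokens = tokenize.generate_tokens(StringIO("(loves Fred)").readline)

first = next(tokens)        # the usual spelling in Python 3
second = tokens.__next__()  # the method the built-in calls for you

# Each token is a named tuple; index 1 (also available as .string) is the text.
print(first.string, second[1])
```

Either spelling advances the same generator, so `atom` simply receives the bound method and calls it whenever it needs the next token.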

From a Python perspective, we want to turn something like "(loves Fred ?x)" into ["loves", "Fred", "?x"] and then work with the second representation as a list of strings. The strings then have the syntactic meaning we gave them previously.

In [4]:
parse("Fred")

'Fred'

In [5]:
parse( "?x")

'?x'

In [6]:
parse( "(loves Fred ?x)")

['loves', 'Fred', '?x']

In [7]:
parse( "(father_of Barney (son_of Barney))")

['father_of', 'Barney', ['son_of', 'Barney']]
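Going the other way can be handy when printing results. The following is a minimal sketch of an inverse to `parse`; `to_sexp` is a hypothetical helper name, not something the assignment provides.

```python
def to_sexp(expression):
    """Render a parsed expression (string or nested list) as s-expression text."""
    if isinstance(expression, list):
        return "(" + " ".join(to_sexp(item) for item in expression) + ")"
    return expression


print(to_sexp(["father_of", "Barney", ["son_of", "Barney"]]))
# (father_of Barney (son_of Barney))
```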

## Unifier

Now that that's out of the way, here is the imperative pseudocode for unification. This is a classic recursive program with a number of base cases.

```
def unification( exp1, exp2):
    # base cases
    if exp1 and exp2 are constants or the empty list:
        if exp1 = exp2 then return {}
        else return FAIL
    if exp1 is a variable:
        if exp1 occurs in exp2 then return FAIL
        else return {exp1/exp2}
    if exp2 is a variable:
        if exp2 occurs in exp1 then return FAIL
        else return {exp2/exp1}

    # inductive step
    first1 = first element of exp1
    first2 = first element of exp2
    result1 = unification( first1, first2)
    if result1 = FAIL then return FAIL
    apply result1 to rest of exp1 and exp2
    result2 = unification( rest of exp1, rest of exp2)
    if result2 = FAIL then return FAIL
    return composition of result1 and result2
```

`unification` can return...

1. `None` (if unification completely fails)
2. `{}` (the empty substitution list) or 
3. a substitution list that has variables as keys and substituted values as values, like {"?x": "Fred"}. 

Note that the middle case sometimes confuses people. "Sam" unifying with "Sam" is not a failure, so you return {}: there were no variables, so there were no substitutions. You do not need to further resolve variables. If a variable resolves to an expression that contains a variable, you don't need to do the substitution.
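Because `{}` is falsy in Python, a bare truthiness test cannot tell "unified with no bindings" apart from failure. A small sketch of the distinction follows; `describe` is a name invented for this example.

```python
def describe(result):
    """Classify a unification result without conflating {} and None."""
    if result is None:
        return "failed"
    if not result:
        return "unified, no substitutions needed"
    return f"unified with {result}"


print(describe(None))            # failed
print(describe({}))              # unified, no substitutions needed
print(describe({"?x": "Fred"}))  # unified with {'?x': 'Fred'}
```

Callers of `unification` should therefore compare against `None` explicitly rather than writing `if result:`.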

If you think of a typical database table, there is a column, row and value. This tuple is a *relation*, and in some uses of unification the "thing" in the first spot (`loves` above) is called the relation. If you have a table of users with a user_id and a username, then the relation is:

`(login ?user_id ?username)`

*Most* of the time, the relation name is specified. But it's not impossible for the relation name to be represented by a variable:

`(?relation 12345 "smooth_operator")`

Your code should handle this case.
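One way to see why this needs no special casing: if every position is treated uniformly, the relation slot is just element 0. The sketch below only matches a flat pattern against a flat, variable-free fact of the same length, so it is far simpler than full unification; `match_flat` is a hypothetical name used for illustration.

```python
def match_flat(pattern, fact):
    """Match a flat pattern against a flat, variable-free fact.

    Returns a dict of bindings, or None when the two cannot match.
    """
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # A variable must bind consistently everywhere it appears.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


print(match_flat(["?relation", "12345", "smooth_operator"],
                 ["login", "12345", "smooth_operator"]))
# {'?relation': 'login'}
```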

Our type system is very simple. We can get by with just a few boolean functions. The first tests to see if an expression is a variable.

In [8]:
def is_variable( exp):
    return isinstance( exp, str) and exp[ 0] == "?"

In [9]:
is_variable( "Fred")

False

In [10]:
is_variable( "?fred")

True

The second tests to see if an expression is a constant:

In [11]:
def is_constant( exp):
    return isinstance( exp, str) and not is_variable( exp)

In [12]:
is_constant( "Fred")

True

In [13]:
is_constant( "?fred")

False

In [14]:
is_constant( ["loves", "Fred", "?wife"])

False

It might also be useful to know that:

<code>
type( "a")
&lt;class 'str'&gt;
type( "a") == str
True
type( "a") == list
False
type( ["a"]) == list
True
</code>


You need to write the `unification` function described above. It should work with two expressions of the type returned by `parse`. See `unify` for how it will be called. It should return the result of unification for the two expressions as detailed above and in the book. It does not have to make all the necessary substitutions (for example, if ?y is bound to ?x and 1 is bound to ?y, ?x doesn't have to be replaced everywhere with 1. It's enough to return {"?x":"?y", "?y":1}. For an actual application, you would need to fix this!)

(the previous description and helper functions were provided by S. Butcher)

-----
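The assignment allows partially resolved bindings such as {"?x":"?y", "?y":1}. For an application that needs fully resolved values, one possible post-processing step is to chase each binding to its end. This is only a sketch under the assumption that the bindings passed an occurs check (no variable is bound to a structure containing itself); `resolve` is a hypothetical helper and is not required by the assignment.

```python
def resolve(term, bindings):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    seen = set()
    # The `seen` set stops the loop if two variables are bound to each other.
    while isinstance(term, str) and term in bindings and term not in seen:
        seen.add(term)
        term = bindings[term]
    if isinstance(term, list):
        return [resolve(item, bindings) for item in term]
    return term


bindings = {"?x": "?y", "?y": 1}
print({variable: resolve(variable, bindings) for variable in bindings})
# {'?x': 1, '?y': 1}
```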

<a id="apply_result"></a>
## apply_result
*The apply_result function is a helper function to apply variable assignments to the rest of the expression in the unification algorithm.* **Used by**: [unification](#unification)

* **expression** str | List[str]: the expression to update
* **result** Dict[str, str]: the result to apply to the expression

**returns** List[str] | str.

In [15]:
def apply_result(expression: str | List[str], result: Dict[str, str]) -> str | List[str]:
    # a string is replaced only when it is exactly a bound variable;
    # substring matching would wrongly rewrite '?xs' when only '?x' is bound
    if isinstance(expression, str):
        return result.get(expression, expression)

    # recurse if expression is a list
    return [apply_result(value, result) for value in expression]

In [16]:
#assertions/unit tests
assert apply_result(['parent', 'Dave', '?x'], {'?x': 'Susan'}) == ['parent', 'Dave', 'Susan']
assert apply_result(['parent', '?x', '?y'], {'?x': 'Dave', '?y': 'Susan'}) == ['parent', 'Dave', 'Susan']
assert apply_result('?x', {'?x': 'Dave'}) == 'Dave'
assert apply_result(['parent', ['parent', '?x'], '?y'], {'?x': 'Dave', '?y': 'Susan'}) == ['parent', ['parent', 'Dave'], 'Susan']
assert apply_result('?xs', {'?x': 'Dave'}) == '?xs'

<a id="maybe_parse"></a>
## maybe_parse
*The maybe_parse function is a helper function to convert nested expressions to lists as needed throughout recursion.* **Used by**: [unification](#unification)

* **expression1** str | List[str]: the first expression
* **expression2** str | List[str]: the second expression

**returns** Tuple[str | List[str], str | List[str]]

In [17]:
def maybe_parse(expression1: str | List[str], expression2: str | List[str]) -> tuple[str | List[str], str | List[str]]:
    if not isinstance(expression1, list):
        expression1 = parse(expression1)
    if not isinstance(expression2, list):
        expression2 = parse(expression2)
    return expression1, expression2

In [18]:
#assertions/unit tests
assert maybe_parse('(parent Dave)', 'Dave') == (['parent', 'Dave'], 'Dave')
assert maybe_parse('Dave', 'Dave') == ('Dave', 'Dave')
assert maybe_parse('(parent Dave)', '(parent Dave)') == (['parent', 'Dave'], ['parent', 'Dave'])
assert maybe_parse([], []) == ([], [])

<a id="check_base_cases"></a>
## check_base_cases
*The check_base_cases function checks to determine if the algorithm hit a base case to break recursion. The base cases are hit if either expression is empty, if the expressions are both constant, or if there is a variable that can or cannot be assigned.* **Used by**: [unification](#unification)

* **expression1** str | List[str]: the first expression
* **expression2** str | List[str]: the second expression

**returns** bool | None | Dict[str, str]

In [19]:
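def check_base_cases(expression1: str | List[str], expression2: str | List[str]) -> bool | None | Dict[str, str]:
    def occurs(variable, expression):
        # occurs check: look for the variable at any depth of a nested list
        if isinstance(expression, list):
            return any(occurs(variable, item) for item in expression)
        return variable == expression

    if is_constant(expression1) and is_constant(expression2):
        return {} if expression1 == expression2 else None
    if len(expression1) == 0 or len(expression2) == 0:
        return {} if expression1 == expression2 else None
    if is_variable(expression1):
        # a variable trivially unifies with itself
        if expression1 == expression2:
            return {}
        return None if occurs(expression1, expression2) else {expression1: expression2}
    if is_variable(expression2):
        return None if occurs(expression2, expression1) else {expression2: expression1}
    # no base case applies, so the caller must recurse
    return False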

In [20]:
#assertions/unit tests
assert check_base_cases('Fred', 'Dave') == None
assert check_base_cases('Fred', 'Fred') == {}
assert check_base_cases('Fred', '?x') == {'?x': 'Fred'}
assert check_base_cases('?x', 'Dave') == {'?x': 'Dave'}
assert check_base_cases('(?x ?x)', '(Dave Fred)') == None
assert check_base_cases([], []) == {}
assert check_base_cases([], 'Dave') == None
assert check_base_cases({}, {}) == {}
assert check_base_cases('Fred', {}) == None

<a id="unification"></a>
## unification
*Unification is a technique used in first-order inference algorithms to find a substitution list for variable assignments that will unify the expressions. Unifying the expressions means making the expressions exactly equal. The unification algorithm takes two expressions and determines if it can unify them. The unification algorithm returns a substitution list with the variable assignments that will unify the expressions or an empty dictionary indicating that the expressions already match. The unification algorithm returns None if unification fails.*

* **expression1** str | List[str]: the first expression
* **expression2** str | List[str]: the second expression

**returns** None | Dict[str, str]

In [21]:
def unification(expression1: str | List[str], expression2: str | List[str]) -> Dict[str, str] | None:
    # additional parsing to make nested expressions lists
    expression1, expression2 = maybe_parse(expression1, expression2)

    # check base cases; False means no base case applied
    base_case = check_base_cases(expression1, expression2)
    if base_case is not False:
        return base_case

    # unify the first elements
    result_1 = unification(expression1[0], expression2[0])
    if result_1 is None:
        return None

    # apply the new bindings to the remaining elements before unifying them
    rest_expression1 = apply_result(expression1[1:], result_1)
    rest_expression2 = apply_result(expression2[1:], result_1)

    result_2 = unification(rest_expression1, rest_expression2)
    if result_2 is None:
        return None
    # compose both substitution lists
    return {**result_1, **result_2}

In [22]:
#see tests below

In [23]:
def list_check(parsed_expression):
    if isinstance(parsed_expression, list):
        return parsed_expression
    return [parsed_expression]

The `unification` pseudocode only takes lists so we have to make sure that we only pass a list.
However, this has the side effect of making "foo" unify with ["foo"], at the start.
That's ok.
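The side effect is easy to see in isolation. The snippet repeats `list_check` from the cell above so that it runs on its own.

```python
def list_check(parsed_expression):
    # Wrap bare atoms so the caller always receives a list.
    if isinstance(parsed_expression, list):
        return parsed_expression
    return [parsed_expression]


# A bare atom and a one-element list are indistinguishable after wrapping.
assert list_check("foo") == list_check(["foo"]) == ["foo"]
```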

In [24]:
def unify( s_expression1, s_expression2):
    list_expression1 = list_check(s_expression1)
    list_expression2 = list_check(s_expression2)
    return unification( list_expression1, list_expression2)

## Test Cases

Use the expressions from the Self Check as your test cases...

In [25]:
self_check_test_cases = [
    ['(son Barney Barney)', '(daughter Wilma Pebbles)', None],
    ['Fred', 'Barney', None],
    ['Pebbles', 'Pebbles', {}],
    ['(quarry_worker Fred)', '(quarry_worker ?x)', {'?x':'Fred'}],
    ['(son Barney ?x)', '(son ?y Bam_Bam)', {'?y':'Barney','?x':'Bam_Bam'}],
    ['(married ?x ?y)', '(married Barney Wilma)', {'?x': 'Barney','?y': 'Wilma'}],
    ['(son Barney ?x)', '(son ?y (son Barney))', {'?y': 'Barney','?x': ['son', 'Barney']}],
    ['(son Barney ?x)',  '(son ?y (son ?y))', {'?y': 'Barney','?x': ['son', 'Barney']}],
    ['(son Barney Bam_Bam)', '(son ?y (son Barney))', None],
    ['(loves Fred Fred)', '(loves ?x ?x)', {'?x': 'Fred'}],
    ['(future George Fred)', '(future ?y ?y)', None]   
]

for case in self_check_test_cases:
    exp1, exp2, expected = case
    actual = unify(exp1, exp2)
    print(f"Test Case: unify({exp1}, {exp2})")
    print(f"actual = {actual}")
    print(f"expected = {expected}")
    print("\n")
    assert expected == actual

Test Case: unify((son Barney Barney), (daughter Wilma Pebbles))
actual = None
expected = None


Test Case: unify(Fred, Barney)
actual = None
expected = None


Test Case: unify(Pebbles, Pebbles)
actual = {}
expected = {}


Test Case: unify((quarry_worker Fred), (quarry_worker ?x))
actual = {'?x': 'Fred'}
expected = {'?x': 'Fred'}


Test Case: unify((son Barney ?x), (son ?y Bam_Bam))
actual = {'?y': 'Barney', '?x': 'Bam_Bam'}
expected = {'?y': 'Barney', '?x': 'Bam_Bam'}


Test Case: unify((married ?x ?y), (married Barney Wilma))
actual = {'?x': 'Barney', '?y': 'Wilma'}
expected = {'?x': 'Barney', '?y': 'Wilma'}


Test Case: unify((son Barney ?x), (son ?y (son Barney)))
actual = {'?y': 'Barney', '?x': ['son', 'Barney']}
expected = {'?y': 'Barney', '?x': ['son', 'Barney']}


Test Case: unify((son Barney ?x), (son ?y (son ?y)))
actual = {'?y': 'Barney', '?x': ['son', 'Barney']}
expected = {'?y': 'Barney', '?x': ['son', 'Barney']}


Test Case: unify((son Barney Bam_Bam), (son ?y (son Barney)

Now add at least **five (5)** additional test cases of your own making, explaining exactly what you are testing.

In [26]:
new_test_cases = [
    ['(son Barney Barney)', '(daughter Wilma Pebbles)', None, "non-equal constants"],
    ['?x', '(father (son Barney))', {'?x': ['father', ['son', 'Barney']]}, "variable and constant expression"],
    ['?x', '(son ?x)', None, "variable part of expression2"],
    ['(son ?x)', '?x', None, "variable part of expression1"],
    ['(parent ?x (son Barney))', '(parent Pebbles (son ?x))', None, "nested variable ?x cannot be Pebbles and Barney"],
    ['(siblings ?x ?y)', '(siblings (brother Jason) (sister Marcy))', {'?x': ['brother', 'Jason'], '?y': ['sister', 'Marcy']}, "nested lists as variables"]
]
for case in new_test_cases:
    exp1, exp2, expected, message = case
    actual = unify(exp1, exp2)
    print(f"Test Case: unify({exp1}, {exp2})")
    print(f"Testing {message}...")
    print(f"actual = {actual}")
    print(f"expected = {expected}")
    print("\n")
    assert expected == actual

Test Case: unify((son Barney Barney), (daughter Wilma Pebbles))
Testing non-equal constants...
actual = None
expected = None


Test Case: unify(?x, (father (son Barney)))
Testing variable and constant expression...
actual = {'?x': ['father', ['son', 'Barney']]}
expected = {'?x': ['father', ['son', 'Barney']]}


Test Case: unify(?x, (son ?x))
Testing variable part of expression2...
actual = None
expected = None


Test Case: unify((son ?x), ?x)
Testing variable part of expression1...
actual = None
expected = None


Test Case: unify((parent ?x (son Barney)), (parent Pebbles (son ?x)))
Testing nested variable ?x cannot be Pebbles and Barney...
actual = None
expected = None


Test Case: unify((siblings ?x ?y), (siblings (brother Jason) (sister Marcy)))
Testing nested lists as variables...
actual = {'?x': ['brother', 'Jason'], '?y': ['sister', 'Marcy']}
expected = {'?x': ['brother', 'Jason'], '?y': ['sister', 'Marcy']}


