Make sure you fill in any place that says `YOUR CODE HERE`. 

---

# Homework 5

*This* is a Python Notebook homework.  It consists of various types of cells: 

* Text: you can read them :-) 
* Code: you should run them, as they may set up the problems that you are asked to solve.
* **Solution:** These are cells where you should enter a solution.  You will see a marker in these cells that indicates where your work should be inserted.  

```
    # YOUR CODE HERE
```    

* Test: These cells contain some tests, and are worth some points.  You should run the cells as a way to debug your code, and to see if you understood the question, and whether the output of your code is produced in the correct format.  The notebook contains both the tests you see, and some secret ones that you cannot see.  This prevents you from using the simple trick of hard-coding the desired output.

### Questions

There are three groups of questions: 

* Implementing symbolic derivatives
* Implementing the distributive property
* Implementing the randomized test for equality. 

In order to do well on this homework, you need to do at least two of the three groups; the third is extra credit.  The group on distributive property might be harder, and so you might wish to try it last, but of course, it is up to you.

There are other pieces of text called "exercises", but you only have to do those that are explicitly marked with a place in the code for you to write the answer. 

### Working on Your Notebook

To work on your notebook, you can just work on `colab.research.google.com`.  Please don't download it and work directly on your laptop.  Working on Colab has two key features: 

* The notebook is shared with the TAs, tutors, and with the instructor.  So when you report that you have difficulties, they can open your notebook and help you. 
* The notebook preserves the revision history, which is useful for many reasons; among them, we can see how you reached the solution.

### Submitting Your Notebook

Submit your work as follows: 

* Download the notebook from Colab, clicking on "File > Download .ipynb".
* Upload the resulting file to [this Google form](https://docs.google.com/forms/d/e/1FAIpQLSeCTHF3y9L00KsowscmYWoiqyYIfiIxWv3ldsmduPa1YYzttQ/viewform?usp=sf_link).
* **Deadline: Monday October 28, 7pm.**

You can submit multiple times, and the last submission before the deadline will be used to assign you a grade.

We will develop a data structure to represent arithmetic expressions containing variables, such as $3 + 4$ or $2 + x * (1 - y)$.  

What is an expression?  An expression consists of one of these: 


1. A number
2. A variable
3. If $e_1$ and $e_2$ are expressions, then $e_1 + e_2$, $e_1 - e_2$, $e_1 * e_2$, and $e_1 / e_2$ are also expressions. 

Formally, the set of expressions is the _least_ set constructed according to the rules above. 

Thus, an expression can be either a leaf, which is a number or a variable, or a composite expression, consisting of an operator, a left expression, and a right expression.


There are (at least) two ways of representing expressions. The simplest way is to represent expressions as trees, and define operations on them. 
The more sophisticated way consists in representing expressions via classes: there will be one class for variables and constants, and one class representing composite expressions; both of these classes will be subclasses of a generic "expression" class. 

In this chapter, we will represent expressions as trees, to gain experience with writing recursive functions on trees; in the next chapter, we will show how to represent them more elegantly as classes.

In [0]:
# Let us ensure that nose is installed. 
try:
    from nose.tools import assert_equal, assert_true
    from nose.tools import assert_false, assert_almost_equal
except ImportError:
    !pip install nose
    from nose.tools import assert_equal, assert_true
    from nose.tools import assert_false, assert_almost_equal

We will represent expressions as trees.  A number will be represented via a number; a variable via a string, and the expression $e_1 \odot e_2$ via the tuple $(\odot, e_1, e_2)$, for $\odot \in \{+, -, *, / \}$.

For example, we will represent $2 * (x + 1)$ via:

    ('*', 2, ('+', 'x', 1))

In [0]:
e = ('*', 2, ('+', 'x', 1))
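
To make the tuple encoding concrete, here is a small sketch of a function that renders a tree back into infix notation.  The name `to_infix` is an assumption of this example and is not part of the homework.

```python
from numbers import Number

def to_infix(e):
    """Renders an expression tree as a fully parenthesized infix string."""
    if isinstance(e, Number) or isinstance(e, str):
        # Leaves: numbers and variable names print as themselves.
        return str(e)
    op, left, right = e
    return "(" + to_infix(left) + " " + op + " " + to_infix(right) + ")"

print(to_infix(('*', 2, ('+', 'x', 1))))  # (2 * (x + 1))
```

Reading the output next to the tuple shows how the nesting of tuples mirrors the nesting of parentheses.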

### A compute function

Let us define a function compute() that takes one such expression, and returns the expression obtained by performing all possible numerical computations: 

In [0]:
from numbers import Number

def compute(e):
    if isinstance(e, Number) or isinstance(e, str):
        # No simplification possible.
        return e
    else:
        op, l, r = e
        # We compute the left and right subexpressions first.
        ll = compute(l)
        rr = compute(r)
        # And we carry out the operation if we can.
        if isinstance(ll, Number) and isinstance(rr, Number):
            if op == '+':
                return ll + rr
            elif op == '-':
                return ll - rr
            elif op == '*':
                return ll * rr
            elif op == '/' and rr != 0:
                return ll / rr
        # We cannot perform op, so we return an expression.
        return (op, ll, rr)

Let's see how this works.

In [20]:
compute(3)

3

In [21]:
compute( ('+', 'x', ('-', 7, 2)) )

('+', 'x', 5)

### Evaluating expressions with respect to a variable valuation.

This is very good, but we can carry it one step further.  If we specify values for variables, we can then use those values in computing the value of an expression. 
A _variable valuation_ is a mapping from variables to their values; we can represent it simply as a dictionary associating to each variable a number:

In [0]:
varval = {'x': 3, 'y': 8}


We can compute the value of expressions given a variable valuation as follows: 

In [0]:
from numbers import Number

def compute(e, varval={}):
    if isinstance(e, Number):
        return e
    elif isinstance(e, str):
        v = varval.get(e)
        # If we find a value for e, we return it; otherwise we return e.
        return e if v is None else v
    else:
        op, l, r = e
        # We simplify the left and right subexpressions first.
        ll = compute(l, varval=varval)
        rr = compute(r, varval=varval)
        # And we carry out the operation if we can.
        if isinstance(ll, Number) and isinstance(rr, Number):
            if op == '+':
                return ll + rr
            elif op == '-':
                return ll - rr
            elif op == '*':
                return ll * rr
            elif op == '/' and rr != 0:
                return ll / rr
        # Not simplifiable.
        return (op, ll, rr)

In [24]:
e = ('*', 2, ('+', 'x', ('-', 3, 2)))
print(compute(e))
print(compute(e, varval={'x': 6}))

('*', 2, ('+', 'x', 1))
14


If we provide the values for only some of the variables, the compute function defined above will plug in the values for those variables and perform all computations possible.  Of course, if the expression contains variables for which the valuation does not specify a value, the resulting expression will still contain those variables: it will not be simply a number.  In computer science, evaluating an expression as far as possible using the values for a subset of the variables is known as _partial evaluation_.

In [25]:
e = ('+', ('-', 'y', 3), ('*', 'x', 4))
print(compute(e, varval={'x': 2}))
print(compute(e, varval={'y': 3}))
print(compute(e, varval={'x': 2, 'y': 3}))

('+', ('-', 'y', 3), 8)
('+', 0, ('*', 'x', 4))
8
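
As a side note, the chain of `if`/`elif` tests on the operator can also be written with a lookup table.  The following is only a sketch of that alternative; the names `OPS` and `compute_table` are hypothetical and introduced here for illustration, and the behavior is meant to mirror the partial evaluation described above.

```python
import operator
from numbers import Number

# Hypothetical dispatch table: maps each operator symbol to a Python function.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def compute_table(e, varval=None):
    """Partially evaluates e under the valuation varval (a dict)."""
    varval = varval or {}
    if isinstance(e, Number):
        return e
    if isinstance(e, str):
        # Variables without a value are left as they are.
        return varval.get(e, e)
    op, l, r = e
    ll, rr = compute_table(l, varval), compute_table(r, varval)
    both_numbers = isinstance(ll, Number) and isinstance(rr, Number)
    if both_numbers and not (op == '/' and rr == 0):
        return OPS[op](ll, rr)
    # Not simplifiable: rebuild the node from the simplified children.
    return (op, ll, rr)

e = ('+', ('-', 'y', 3), ('*', 'x', 4))
print(compute_table(e, {'x': 2}))          # ('+', ('-', 'y', 3), 8)
print(compute_table(e, {'x': 2, 'y': 3}))  # 8
```

The table keeps the recursion separate from the arithmetic, so adding an operator would only require a new table entry.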


### A digression about recursion on trees

A typical form for a recursive algorithm on expression trees is the following.  Given a node $(\odot, e_1, e_2)$: 

* First apply the algorithm to $e_1$ and $e_2$, obtaining $e'_1$, $e'_2$ respectively.
* Then, consider $(\odot, e'_1, e'_2)$, and return whatever is appropriate for this node. 

We will see this scheme several times in the following examples and problems.
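
As a minimal illustration of this scheme, the sketch below counts the operator nodes of a tree.  The function name `count_operators` is an assumption made for this example.

```python
def count_operators(e):
    """Returns the number of operator nodes in the expression tree e."""
    if not isinstance(e, tuple):
        # Leaves (numbers and variables) contain no operators.
        return 0
    _, e1, e2 = e
    # First recurse on the children, then combine the results at this node.
    return 1 + count_operators(e1) + count_operators(e2)

print(count_operators(('*', 2, ('+', 'x', ('-', 3, 2)))))  # 3
```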

## Symbolic expression manipulation

Now that we have a representation for expressions, we can manipulate them symbolically. 


### Variable substitution

A simple symbolic manipulation consists in variable substitution: given an expression $e$ and a dictionary $d$ mapping variables to their substitutions, we perform the substitutions, returning the resulting expression.  This can be implemented via a simple recursion: 

* constants are unchanged;
* at leaf nodes consisting in variables, we perform the substitutions;
* at nodes $(\odot, e_1, e_2)$, we perform the substitutions in $e_1$ and $e_2$, obtaining $e'_1$, $e'_2$, and we return $(\odot, e'_1, e'_2)$. 

In [0]:
def variable_substitution(e, d):
    """Performs variable substitutions in e according to the replacement dictionary d"""
    if isinstance(e, tuple):
        op, e1, e2 = e
        ee1 = variable_substitution(e1, d)
        ee2 = variable_substitution(e2, d)
        return (op, ee1, ee2)
    elif isinstance(e, Number):
        return e
    else:
        # We perform the substitution, if one is specified.
        return d.get(e, e)

In [27]:
e = ('+', ('-', 'x', 'y'), ('*', 'x', 2))
variable_substitution(e, {'y': 'z'})

('+', ('-', 'x', 'z'), ('*', 'x', 2))

### Fraction form

To gain practice working with symbolic expression trees, we will implement a function that transforms an expression into _fraction form._  We say that an expression $e$ is in _fraction form_ if one of these two conditions is true: 

* either $e$ does not contain the division operator $/$, 
* or $e = e_1 / e_2$, for $e_1$ and $e_2$ not containing $/$. 

Thus, intuitively, an expression $e$ in fraction form either does not contain division, or is in the form of a fraction, with a numerator and a denominator, neither of which contains the division operator.  
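
The definition can be turned directly into a checker.  The sketch below is one possible way to do it; `contains_division` and `is_fraction_form` are names chosen for this example only.

```python
def contains_division(e):
    """Returns True if the operator / occurs anywhere in e."""
    if not isinstance(e, tuple):
        return False
    op, e1, e2 = e
    return op == '/' or contains_division(e1) or contains_division(e2)

def is_fraction_form(e):
    """Returns True if e satisfies the definition of fraction form."""
    if not contains_division(e):
        return True
    # Otherwise e must be e1 / e2, with neither e1 nor e2 containing /.
    op, e1, e2 = e
    return op == '/' and not contains_division(e1) and not contains_division(e2)

print(is_fraction_form(('/', ('+', 'a', 1), 'b')))              # True
print(is_fraction_form(('+', ('/', 'a', 'b'), ('/', 'c', 2))))  # False
```

A checker like this can be handy for testing a conversion function: its output should always pass the check.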

In order to put an expression in fraction form, we start bottom up, obtaining fraction representations for the expression nodes proceeding from the leaves, and going up to the top, in a fashion not dissimilar to what we did in the compute function.  At a node $(\odot, e_1, e_2)$, given fraction representations for $e_1$ and $e_2$, we obtain a fraction representation for the node via: 

$$
\frac{n_1}{d_1} \pm \frac{n_2}{d_2} \Rightarrow \frac{n_1 d_2 \pm n_2 d_1}{d_1 d_2}, \quad
\frac{n_1}{d_1} \cdot \frac{n_2}{d_2} \Rightarrow \frac{n_1 n_2}{d_1 d_2}, \quad
\frac{n_1}{d_1} \Bigm/ \frac{n_2}{d_2} \Rightarrow \frac{n_1 d_2}{d_1 n_2}.
$$


Our implementation proceeds as follows.  Given a node  $(\odot, e_1, e_2)$, we first determine whether one of $e_1$ or $e_2$ is a fraction, that is, has the $/$ operator as the root operator.  If this is the case, we combine the fractions $e_1$ and $e_2$ using the rules above.  If neither of them is a fraction, we simply leave the node unchanged.  

In the implementation, we make use of an auxiliary function `get_num_den`, which gets the numerator and denominator of an expression; if the expression is not a fraction, the denominator will be 1.  In passing, we note that in Python you _can_ define functions inside other functions.  It is not done very often, but it can be handy when the function being defined is a helper function that is only going to be useful in the context of the enclosing function.

In [0]:
def to_fraction(e):
    """Returns the expression e converted to fraction form."""
    
    def is_fraction(ee):
        """Returns true if the expression ee is a fraction."""
        return isinstance(ee, tuple) and ee[0] == '/'
    
    def get_num_den(ee):
        """Returns numerator and denominator of an expression ee"""
        return (ee[1], ee[2]) if is_fraction(ee) else (ee, 1)
        
    if isinstance(e, tuple):
        op, e1, e2 = e
        # First, we put in fraction form e1 and e2.
        ee1, ee2 = to_fraction(e1), to_fraction(e2)
        if is_fraction(ee1) or is_fraction(ee2):
            # One of the two expressions is a fraction.  We need to combine
            # them using the above rules.
            n1, d1 = get_num_den(ee1)
            n2, d2 = get_num_den(ee2)
            if op in '+-':
                return ('/', 
                        (op, ('*', n1, d2), ('*', n2, d1)),
                        ('*', d1, d2))
            elif op == '*':
                return ('/', ('*', n1, n2), ('*', d1, d2))
            else:
                return ('/', ('*', n1, d2), ('*', d1, n2))
        # Neither ee1 nor ee2 is a fraction. Nothing to do.
        return e    
    # Numbers and variables, and expressions involving non-fractions,
    # are left unchanged.
    return e

Let us put into fraction form the expression:
$$
\frac{a}{b} + \frac{c}{2}
$$

In [29]:
e = ('+', ('/', 'a', 'b'), ('/', 'c', 2))
print(to_fraction(e))

('/', ('+', ('*', 'a', 2), ('*', 'c', 'b')), ('*', 'b', 2))


The above is
$$
\frac{2a + cb}{2b}
$$
as expected.  Let us now try with 
$$
\frac{a}{\frac{b}{\frac{c}{d}}}
$$

In [30]:
e = ('/', 'a', ('/', 'b', ('/', 'c', 'd')))
print(to_fraction(e))

('/', ('*', 'a', ('*', 1, 'c')), ('*', 1, ('*', 'b', 'd')))


Which is
$$
\frac{ac}{bd}
$$
if we could simplify multiplications by 1.

**Exercise.** If we compare expressions as tuples, we have that
 
    ('+', 1, 'x')
    
and

    ('+', 'x', 1)
    
are different.  Write a function commutative_eq that returns True iff two expressions are equal, modulo the commutative property of addition and multiplication.  What is the running time of the function you wrote?  Can you improve it? 

### Derivatives

As we have symbolic expressions, we can compute their (partial) derivative with respect to any variable.  Given an expression $e$ and a variable $x$, we denote by $\partial e / \partial x$ the partial derivative of $e$ with respect to $x$.  To compute it, we can simply rely on the definition of derivative. 
For leaf nodes in the expression tree:

* For a constant $c$, $\partial c / \partial x = 0$.
* For a variable $y \neq x$,  $\partial y / \partial x = 0$.
* $\partial x / \partial x = 1$.

For operators, we can use:

$$
 \begin{align*}
 \frac{\partial}{\partial x}(f \pm g) & = \frac{\partial f}{\partial x} \pm \frac  {\partial g}{\partial x}, \\[1ex]
 \frac{\partial}{\partial x}(f \cdot g) & = g \cdot \frac{\partial f}{\partial x}  + f \cdot \frac{\partial g}{\partial x}, \\[1ex]
 \frac{\partial}{\partial x}\left(\frac{f}{g}\right) & = \frac{g \cdot \frac
  {\partial f}{\partial x} - f \cdot \frac{\partial g}{\partial x}}{g^2}. 
\end{align*}
$$

This directly suggests how to implement the symbolic computation of derivatives.
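
As a quick sanity check of the product rule, here is the derivative of $x \cdot x$ worked by hand, keeping the terms in the order given above and performing no simplification:

$$
\frac{\partial}{\partial x}(x \cdot x) = x \cdot \frac{\partial x}{\partial x} + x \cdot \frac{\partial x}{\partial x} = x \cdot 1 + x \cdot 1
$$

In the tree representation, this result reads `('+', ('*', 'x', 1), ('*', 'x', 1))`.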

**Exercise: Symbolic derivatives.**  Write a function derivate that, given an expression $e$ and a variable $x$, returns an expression for $\partial e / \partial x$.  Please, write it according to the above rules, including order of terms in products.  For instance, use

$$ 
\frac{\partial}{\partial x}(f \cdot g) = g \cdot \frac{\partial f}{\partial x}  + f \cdot \frac{\partial g}{\partial x}
$$
rather than 
$$ 
\frac{\partial}{\partial x}(f \cdot g) = \frac{\partial f}{\partial x} \cdot g + f \cdot \frac{\partial g}{\partial x}
$$

While the two expressions are equivalent, our tests (so far!) can only check for _identical_, not _equivalent_, expressions.

In [0]:
### Implement `derivate`

def derivate(e, x):
    """Returns the derivative of e wrt x.
    It can be done in less than 15 lines of code."""
    # YOUR CODE HERE
    if isinstance(e, tuple):
        op, e1, e2 = e
        # Derivatives of the two subexpressions.
        ee1 = derivate(e1, x)
        ee2 = derivate(e2, x)
        if op in ('+', '-'):
            return (op, ee1, ee2)
        if op == '*':
            return ('+', ('*', e2, ee1), ('*', e1, ee2))
        if op == '/':
            return ('/', ('-', ('*', e2, ee1), ('*', e1, ee2)), ('*', e2, e2))
    elif e == x:
        # The derivative of x with respect to x itself.
        return 1
    else:
        # Numbers, and variables other than x.
        return 0
      

In [0]:
### Base case tests for `derivate`

# First, the basics.
assert_equal(derivate(3, 'x'), 0)
assert_equal(derivate('y', 'x'), 0)
assert_equal(derivate('x', 'x'), 1)


In [0]:
### Tests for `derivate` for single-operator expressions

assert_equal(derivate(('+', 'x', 'x'), 'x'), ('+', 1, 1))
assert_equal(derivate(('-', 4, 'x'), 'x'), ('-', 0, 1))
assert_equal(derivate(('*', 2, 'x'), 'x'), 
             ('+', ('*', 'x', 0), ('*', 2, 1)))
assert_equal(derivate(('/', 2, 'x'), 'x'), 
             ('/', ('-', ('*', 'x', 0), ('*', 2, 1)), ('*', 'x', 'x')))


In [0]:
### Tests for `derivate` for composite expressions

e1 = ('*', 'x', 'x')
e2 = ('*', 3, 'x')
num = ('-', e1, e2)
e3 = ('*', 'a', 'x')
den = ('+', e1, e3)
e = ('/', num, den)

f = ('/',
 ('-',
  ('*',
   ('+', ('*', 'x', 'x'), ('*', 'a', 'x')),
   ('-',
    ('+', ('*', 'x', 1), ('*', 'x', 1)),
    ('+', ('*', 'x', 0), ('*', 3, 1)))),
  ('*',
   ('-', ('*', 'x', 'x'), ('*', 3, 'x')),
   ('+',
    ('+', ('*', 'x', 1), ('*', 'x', 1)),
    ('+', ('*', 'x', 0), ('*', 'a', 1))))),
 ('*',
  ('+', ('*', 'x', 'x'), ('*', 'a', 'x')),
  ('+', ('*', 'x', 'x'), ('*', 'a', 'x'))))

assert_equal(derivate(e, 'x'), f)


### Distributive property

**Exercise:** Implement the `apply_distributive` function, which applies the distributive property to an expression to push down multiplication to the leaves as far as possible.  The function should apply, recursively, the following transformations: 

$$
f (g \pm h) \Rightarrow fg \pm fh \qquad
(f \pm g) h \Rightarrow fh \pm gh
$$

until the multiplications cannot be moved down further in the expression tree.

_Hint:_ Model the implementation after the one of `to_fraction`, but use the rules for distributivity above instead of the ones given there.  The implementation is somewhat more complex, and can be done as follows. 

If you have $e = e_1 \odot e_2$, you first must apply distributivity to $e_1$ and $e_2$, obtaining $e'_1$ and $e'_2$.  If $\odot \neq *$, then you can just return $e'_1 \odot e'_2$. If $\odot = *$, then you must consider whether $e'_1$ or $e'_2$ has the form $g_1 \pm g_2$.  Assume $e'_2$ does.  This means your expression is $e'_1 * (g_1 \pm g_2)$, and you need to apply distributivity:

$$
e'_1 * (g_1 \pm g_2) \Rightarrow (e'_1 * g_1) \pm (e'_1 * g_2)
$$

The trick is that _after_ you apply distributivity and obtain $(e'_1 * g_1) \pm (e'_1 * g_2)$, you must _again_ apply distributivity to both

$$
(e'_1 * g_1), \qquad (e'_1 * g_2)
$$

obtaining $f_1, f_2$, respectively.  This step is necessary because $(e'_1 * g_1)$ now has $*$ as top operator, and you may need to "push it down" using distributivity, obtaining $f_1$; similarly for $(e'_1 * g_2)$.   Finally, you can return $f_1 \pm f_2$. 
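
As a worked example, assume the left factor is expanded before the right one (this is the order that the tests below appear to expect).  The product of a sum and a difference is then rewritten in two steps:

$$
(a + b) * (c - d) \Rightarrow a * (c - d) + b * (c - d) \Rightarrow (a * c - a * d) + (b * c - b * d)
$$

The second step is the re-application of distributivity to each of the two products created by the first step.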


In [0]:
### Exercise: Implement `apply_distributive`

def apply_distributive(e):
    """Applies the distributive property to an expression e."""
    # YOUR CODE HERE

    def is_sum(ee):
        """Returns True if ee has + or - as its root operator."""
        return isinstance(ee, tuple) and ee[0] in ('+', '-')

    if not isinstance(e, tuple):
        # Numbers and variables are left unchanged.
        return e
    op, e1, e2 = e
    # First, we apply distributivity to the two subexpressions.
    ee1 = apply_distributive(e1)
    ee2 = apply_distributive(e2)
    if op != '*':
        return (op, ee1, ee2)
    if is_sum(ee1):
        # (g1 +- g2) * ee2  =>  (g1 * ee2) +- (g2 * ee2).
        # The new products may need to be pushed down further.
        pm, g1, g2 = ee1
        return (pm,
                apply_distributive(('*', g1, ee2)),
                apply_distributive(('*', g2, ee2)))
    if is_sum(ee2):
        # ee1 * (g1 +- g2)  =>  (ee1 * g1) +- (ee1 * g2).
        pm, g1, g2 = ee2
        return (pm,
                apply_distributive(('*', ee1, g1)),
                apply_distributive(('*', ee1, g2)))
    # Neither factor is a sum or a difference: nothing to distribute.
    return ('*', ee1, ee2)

In [0]:
### Simple test for distributivity

# Simple test

e = ('*', ('+', 1, 2), ('-', 3, 4))
f = ('+', ('-', ('*', 1, 3), ('*', 1, 4)), ('-', ('*', 2, 3), ('*', 2, 4)))
assert_equal(apply_distributive(e), f)


In [37]:
### More complicated tests for distributivity

# More complex tests

# e = ('*', ('+', 1, 2), ('-', 3, 4))
# e2 = ('*', e, ('+', 5, 6))
# f = ('+',
#  ('-',
#   ('+', ('*', ('*', 1, 3), 5), ('*', ('*', 1, 3), 6)),
#   ('+', ('*', ('*', 1, 4), 5), ('*', ('*', 1, 4), 6))),
#  ('-',
#   ('+', ('*', ('*', 2, 3), 5), ('*', ('*', 2, 3), 6)),
#   ('+', ('*', ('*', 2, 4), 5), ('*', ('*', 2, 4), 6))))

# assert_equal(apply_distributive(e2), f)

e = ('*', ('*', ('+', 1, 2), ('-', 3, 4)), ('*', ('-', 5, 6), ('+', 7, 8)))
f = ('+',
 ('-',
  ('-',
   ('+', ('*', ('*', 1, 3), ('*', 5, 7)), ('*', ('*', 1, 3), ('*', 5, 8))),
   ('+', ('*', ('*', 1, 3), ('*', 6, 7)), ('*', ('*', 1, 3), ('*', 6, 8)))),
  ('-',
   ('+', ('*', ('*', 1, 4), ('*', 5, 7)), ('*', ('*', 1, 4), ('*', 5, 8))),
   ('+', ('*', ('*', 1, 4), ('*', 6, 7)), ('*', ('*', 1, 4), ('*', 6, 8))))),
 ('-',
  ('-',
   ('+', ('*', ('*', 2, 3), ('*', 5, 7)), ('*', ('*', 2, 3), ('*', 5, 8))),
   ('+', ('*', ('*', 2, 3), ('*', 6, 7)), ('*', ('*', 2, 3), ('*', 6, 8)))),
  ('-',
   ('+', ('*', ('*', 2, 4), ('*', 5, 7)), ('*', ('*', 2, 4), ('*', 5, 8))),
   ('+', ('*', ('*', 2, 4), ('*', 6, 7)), ('*', ('*', 2, 4), ('*', 6, 8))))))
print(compute(apply_distributive(e)))
print(compute(f))
assert_equal(apply_distributive(e), f)


45
45



### Other Exercises

**Exercise: Implement 1* and 0+ simplification.**   Write a function simplify01 that, given an expression, replaces all subexpressions of the form $(*, 1, e)$ or $(*, e, 1)$  by $e$, and replaces all subexpressions of the form $(+, 0, e)$, $(+, e, 0)$, $(-, e, 0)$,  by $e$.  

Note that you have to perform this in recursive fashion, performing the simplification from the bottom up in the expression.  Precisely, for an expression $(\odot, e_1, e_2)$, you have to first perform the simplification on $e_1$ and $e_2$, obtaining $e'_1$ and $e'_2$, respectively.  Once this is done, you consider the resulting expression $(\odot, e'_1, e'_2)$, and you perform the simplification on that expression.

**Exercise: Implement 0* simplification.** 
Write a function simplify_timeszero which replaces $(*, 0, e)$ with $0$.  The problem is that this simplification is not always valid; for example, in 

$$
0 \cdot \frac{3}{0}
$$

we cannot simplify the expression to $0$, because the whole expression is in fact an indeterminate form.  
However, for a form $(*, 0, e)$ or $(*, e, 0)$, if $e$ does not contain $/$ nodes, then we can carry out the simplification. 
Write a function simplify_timeszero that, given an expression $e$, applies this simplification at all levels of the tree. 

**Exercise: Implement 1*, 0+, and 0* simplifications.**  Implement a function called simplify01 that performs all the possible 1*, 0+, 0* simplifications at all levels of the tree.  

Is it enough to call first simplify01, then simplify_timeszero?  Think carefully.

## When are two expressions equal? 

> _Or: it's better to be lucky than to be smart._

> _Or: if you don't know how to do it right, do it at random._ 

> _Or: the power of randomization._

We now consider the following problem: given two expressions $e$ and $f$, how can we decide whether they are equal in value, that is, whether they always yield the same value for all values of the variables? 

This _"value equality"_ is a different notion from structural equality, in which two expressions are equal only if their trees are identical.  For instance, the two expressions `('+', 'x', 1)` and `('-', ('+', ('*', 2, 'x'), 1), 'x')` are not structurally equal, but they are equal in value.  

How can we test for value equality of expressions?  There are two ways: the high road one, and the pirate one.  

The high-road approach consists in trying to demonstrate, in some way, that the two expressions are equal.  One way of doing so would be to define a set of [rewriting rules](https://en.wikipedia.org/wiki/Rewriting) for expressions, that try to transform one expression into the other; this would mimic the process often done by hand to show that two expressions are equal.  Another way would be to use theorem provers that can reason about expressions and real numbers, such as [PVS](https://pvs.csl.sri.com).  The problem is that these approaches are a lot of work.  Is there a way to be lazy, and still get the job done? 

There is, it turns out.  Suppose you have two expressions $f, g$ containing variable $x$ only.  The idea is that if $f$ and $g$ are built with the usual operators of algebra, it is exceedingly unlikely for $f$ and $g$ to give the same value for many values of $x$, and yet not be always equal.  This would not be true if our expressions could contain if-then-else statements, but for the operators we defined so far, it holds.  Indeed, one could be more precise, and try to come up with a theorem of the form: 

> If $f$ and $g$ have "zerosity" $n$, and are equal for $n+1$ values of $x$, then they are equal for all values of $x$. 

We could then try to define the "zerosity" of an expression to make this hold: for example, for two polynomials of degree at most $d$, once you show that they are equal for $d+1$ points, they must be equal everywhere ([why?](https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra)).  But this again would be a smart approach, and we are trying to see if we can solve the problem while being as stupid as possible.  So our idea will simply be: pick 1000 values of $x$ at random; if the two expressions are equal for all the values, then they must be equal everywhere.  This is a somewhat special case of a [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method), a method used to estimate the probability of complex phenomena (where expression equality is our phenomenon).
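
The polynomial case can be tried out directly.  The sketch below is only an illustration of the $d+1$ points argument; the helper `agree_on_points` and the three example polynomials `f`, `g`, `h` are assumptions made for this example.

```python
def agree_on_points(f, g, points):
    """Returns True if f and g take the same value at every given point."""
    return all(f(x) == g(x) for x in points)

def f(x):
    return (x + 1) * (x + 1)

def g(x):
    return x * x + 2 * x + 1

def h(x):
    return x * x + 3 * x + 1

# f, g, h are polynomials of degree 2, so d + 1 = 3 points are enough.
print(agree_on_points(f, g, [0, 1, 2]))  # True
print(agree_on_points(f, h, [0]))        # True: one point is not enough
print(agree_on_points(f, h, [0, 1, 2]))  # False
```

Integer points are used here so that the comparison is exact; with floating-point samples, a tolerance would be needed.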

There are only two wrinkles with this.  The first is that an expression can contain many variables, and we have to try value assignments for all of the variables.  This is easy to overcome; we just need some helper function that gives us the set of variables in an expression.  The second wrinkle is: how do we generate the possible value assignments?  How big do these values need to be on average?  According to what probability distribution?  We could dive into a lot of theory and reasoning about how to compute appropriate probability distributions, but since our goal is to be stupid, we will use one of the simplest distributions with infinite domain: the Gaussian one. 
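
A random value assignment of this kind could be generated as in the following sketch.  The function name `random_valuation` and the default mean and standard deviation are assumptions of this example; `random.gauss` is the standard library's Gaussian sampler.

```python
import random

def random_valuation(var_names, mu=0.0, sigma=1.0):
    """Returns a dict giving each variable an independent Gaussian sample."""
    return {v: random.gauss(mu, sigma) for v in var_names}

# The sampled values differ on every run.
print(random_valuation({'x', 'y'}))
```

Note that each variable receives its own sample: giving all variables the same value would make, for instance, $x - y$ look identical to $0$.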

Let us start by writing the function `variables` such that, if `e` is an expression, `variables(e)` is the set of variables that appear in it.

In [0]:
### Exercise: define `variables`

# YOUR CODE HERE
def variables(e):
    """Returns the set of variables that appear in the expression e."""
    if isinstance(e, tuple):
        _, e1, e2 = e
        # The variables of a node are those of its two subexpressions.
        return variables(e1) | variables(e2)
    # A leaf is a variable iff it is a string; numbers contribute nothing.
    return {e} if isinstance(e, str) else set()

In [0]:
### Tests for `variables`

e = ('*', ('+', 'x', 2), ('/', 'x', 'y'))
assert_equal(variables(e), {'x', 'y'})


Now write the `value_equality` function for expressions.  You can write it elegantly in 6 lines of code.

In [0]:
### Exercise: implementation of value equality

import random

def value_equality(e1, e2, num_samples=1000, tolerance=1e-6):
    """Return True if the two expressions e1 and e2 are numerically
    equivalent.  Equivalence is tested by generating 
    num_samples assignments, and checking that equality holds
    for all of them.  Equality is checked up to tolerance, that is, 
    the values of the two expressions have to be closer than tolerance.
    It can be done in less than 10 lines of code."""
    # YOUR CODE HERE

    vs = variables(e1) | variables(e2)
    for _ in range(num_samples):
        # Each variable gets its own independent Gaussian sample, and both
        # expressions are evaluated under the same valuation.
        d = {v: random.gauss(0, 1) for v in vs}
        val1 = compute(e1, varval=d)
        val2 = compute(e2, varval=d)
        if abs(val1 - val2) > tolerance:
            return False
    return True
      



In [0]:
### Tests for value equality

e1 = ('+', ('*', 'x', 1), ('*', 'y', 0))
e2 = 'x'
assert_true(value_equality(e1, e2))

e3 = ('/', ('*', 'x', 'x'), ('*', 'x', 1))
assert_true(value_equality(e1, e3))

e4 = ('/', 'y', 2)
assert_false(value_equality(e1, e4))
assert_false(value_equality(e3, e4))
