(ex_system_internals)=

# The expression system in detail

In this tutorial, we will delve into the implementation details of heyoka.py's expression system. Our main goal is to explain certain pitfalls that can arise in innocent-looking code and that can have disastrous consequences on performance, and to teach you, dear user, how to avoid them.

## Reference semantics and shared (sub)expressions

The first and most important thing to understand about heyoka.py's expressions is that they implement so-called *reference semantics*. That is, an expression is essentially a handle to an underlying object, and copying an expression does not perform an actual copy; rather, it returns a new reference to the same underlying object.

Before you ask "isn't this how all Python objects work?", let me immediately point out that heyoka.py's expressions are exposed from C++ and that reference semantics is implemented all the way down into the C++ layer. As a concrete example of what this means, consider the following simple expression:

```python
import heyoka as hy

# Create a couple of variables.
x, y = hy.make_vars("x", "y")

# Create a simple expression.
ex = x + y
```

If we attempt to copy ``ex`` via the standard {func}`~copy.copy()` function, we nominally get a new Python object, as we can verify by querying its {func}`id()`:

```python
from copy import copy

# Make a "copy" of ex.
ex_copy = copy(ex)

# Print the ids.
print(f'Original id: {id(ex)}')
print(f'Copy id    : {id(ex_copy)}')
```

```
Original id: 139928559827312
Copy id    : 139928752231008
```


However, ``ex`` and ``ex_copy`` both point to the **same** underlying C++ object, which is shared between the two Python objects.
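To make the handle/object distinction concrete without relying on heyoka.py's internals, here is a minimal pure-Python sketch. The classes `_Node` and `Expr` are hypothetical stand-ins and not part of heyoka.py's API: `Expr` plays the role of the Python-level expression, while `_Node` plays the role of the shared C++ object it refers to.

```python
from copy import copy


class _Node:
    """Hypothetical stand-in for the underlying (C++) expression object."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


class Expr:
    """Hypothetical Python-level handle holding a reference to a _Node."""

    def __init__(self, node):
        self._node = node

    def __copy__(self):
        # "Copying" creates a new handle, but the underlying node is shared.
        return Expr(self._node)


x, y = Expr(_Node("x")), Expr(_Node("y"))
ex = Expr(_Node("add", x._node, y._node))
ex_copy = copy(ex)

print(ex is ex_copy)              # False: two distinct Python objects
print(ex._node is ex_copy._node)  # True: a single shared underlying node
```

In this toy model, two handles compare as different objects (different {func}`id()`), yet any operation that inspects the underlying node sees exactly the same data.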

We can use ``ex`` as a building block to create more complicated expressions, e.g.:

```python
a = hy.sin(ex) + hy.cos(ex)
a
```

```
(cos((x + y)) + sin((x + y)))
```

Because of the use of reference semantics, this expression will not contain two separate copies of $x + y$. Rather, it will contain two *references* to the original expression ``ex``.

If, on the other hand, we do **not** re-use ``ex`` and write instead

```python
b = hy.sin(x + y) + hy.cos(x + y)
b
```

```
(cos((x + y)) + sin((x + y)))
```

we get an expression ``b`` which is mathematically equivalent to ``a`` but which contains two separate copies of $x + y$, rather than two references to ``ex``. This leads to a couple of very important consequences.

First of all, the memory footprint of ``b`` will be larger than ``a``'s because it is (wastefully) storing two copies of the same subexpression $x + y$ (rather than storing two references to the same underlying expression).
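One way to picture the difference in memory footprint is to count the distinct objects reachable from the root of each expression. The sketch below uses a hypothetical `Node` class (not heyoka.py's API) and counts nodes by object identity, mimicking the structure of ``a`` and ``b``. As a simplifying assumption of this toy model, the leaves `x` and `y` are single shared objects.

```python
class Node:
    """Hypothetical expression node: an operator name plus child nodes."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


def count_unique_nodes(root):
    """Count the distinct node objects reachable from root (by identity)."""
    seen = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if id(n) in seen:
            continue  # already counted: this node is shared
        seen.add(id(n))
        stack.extend(n.args)
    return len(seen)


x, y = Node("x"), Node("y")

# Like "a": sin(ex) + cos(ex), both branches referencing the same ex node.
ex = Node("add", x, y)
a = Node("add", Node("sin", ex), Node("cos", ex))

# Like "b": sin(x + y) + cos(x + y), with two separately built x + y nodes.
b = Node("add", Node("sin", Node("add", x, y)), Node("cos", Node("add", x, y)))

print(count_unique_nodes(a))  # 6
print(count_unique_nodes(b))  # 7
```

The gap here is a single node, but it grows with the size and the number of repetitions of the duplicated subexpression.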

Secondly, heyoka.py's symbolic manipulation routines are coded to keep track of shared subexpressions with the goal of avoiding redundant computations. For instance, let us say we want to replace $x$ with $x^2 - 1$ via
the {func}`~heyoka.subs()` function:

```python
hy.subs(a, {x: x**2 - 1.})
```

```
(cos(((x**2.0000000000000000 - 1.0000000000000000) + y)) + sin(((x**2.0000000000000000 - 1.0000000000000000) + y)))
```

In order to perform the substitution, the {func}`~heyoka.subs()` function needs to traverse the expression tree of ``a``. When it encounters the ``ex`` subexpression for the first time, it will:

1. perform the substitution, producing as a result $x^2-1+y$,
2. record in an internal bookkeeping structure that performing the substitution on the subexpression ``ex`` produced the result $x^2-1+y$.

Crucially, the **second** time ``ex`` is encountered during the traversal of the expression tree, the {func}`~heyoka.subs()` function will query the bookkeeping structure, detect that the result of the substitution on ``ex`` has already been computed, and fetch the cached result instead of (wastefully) performing the same computation again. Thus, not only do we avoid a redundant calculation, but the two $x^2-1+y$ subexpressions appearing in the final result also point to the same underlying object (rather than being two separate copies of identical subexpressions).
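The bookkeeping strategy described above can be sketched as a memoised tree traversal. This is a minimal illustration, not heyoka.py's actual implementation: `Node` and `subs()` below are hypothetical toy definitions, and the dictionary `cache`, keyed by object identity, plays the role of the bookkeeping structure (whether heyoka.py keys its structure the same way is an assumption).

```python
class Node:
    """Hypothetical expression node: an operator name plus child nodes."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


def subs(node, smap, cache):
    """Hypothetical memoised substitution over a DAG of Node objects.

    smap maps leaf nodes to their replacements; cache maps the id of each
    already-processed original node to its substituted counterpart.
    """
    key = id(node)
    if key in cache:
        # Seen before: reuse the cached result instead of recomputing it.
        return cache[key]
    if node in smap:
        res = smap[node]
    elif not node.args:
        # Leaves that are not being substituted are returned unchanged.
        res = node
    else:
        res = Node(node.op, *(subs(arg, smap, cache) for arg in node.args))
    cache[key] = res
    return res


x, y = Node("x"), Node("y")
ex = Node("add", x, y)
a = Node("add", Node("sin", ex), Node("cos", ex))

# Substitute x -> x**2 - 1.
x_new = Node("sub", Node("pow", x, Node("2.")), Node("1."))
cache = {}
res = subs(a, {x: x_new}, cache)

# Both branches of the result reference one and the same substituted node.
print(res.args[0].args[0] is res.args[1].args[0])  # True
# One cache entry per distinct node processed; the second visit to ex was a hit.
print(len(cache))  # 6
```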

On the other hand, when we apply the same substitution on ``b`` we get:

```python
hy.subs(b, {x: x**2 - 1.})
```

```
(cos(((x**2.0000000000000000 - 1.0000000000000000) + y)) + sin(((x**2.0000000000000000 - 1.0000000000000000) + y)))
```

That is, the result is mathematically identical (obviously), but, because there is no internal subexpression sharing, the substitution $x \rightarrow x^2 - 1$ had to be performed twice (rather than once) and the two $x^2-1+y$ subexpressions appearing in the final result are two separate copies of identical subexpressions.

As a final piece of information, it is important to emphasise that subexpression sharing is not limited to single expressions: it also occurs across the components of a vector-valued expression. For instance, consider the following vector expression consisting of the two components $\left[ \sin\left( x + y \right) + \cos\left( x + y \right), 1 + \mathrm{e}^{x+y}\right]$:

```python
vec_ex = [hy.sin(ex) + hy.cos(ex), 1. + hy.exp(ex)]
vec_ex
```

```
[(cos((x + y)) + sin((x + y))), (1.0000000000000000 + exp((x + y)))]
```

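Here the subexpression ``ex`` is shared between the two components of ``vec_ex``, which both contain references to ``ex`` (rather than storing their own copies of ``ex``). When we invoke the {func}`~heyoka.subs()` function on ``vec_ex``, the bookkeeping structure persists across the traversal of all the components: the substitution on ``ex`` is thus performed only once, and the result is shared between the two components of the output.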
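The same memoisation idea extends to a list of expressions, provided that a single cache is kept alive across all the components. The sketch below defines a hypothetical toy `Node` class and a memoised `subs()` function (neither is part of heyoka.py) so that it runs on its own; it illustrates the idea under these assumptions rather than reproducing heyoka.py's implementation.

```python
class Node:
    """Hypothetical expression node: an operator name plus child nodes."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args


def subs(node, smap, cache):
    """Hypothetical memoised substitution; cache maps id(node) -> result."""
    key = id(node)
    if key in cache:
        return cache[key]
    if node in smap:
        res = smap[node]
    elif not node.args:
        res = node
    else:
        res = Node(node.op, *(subs(arg, smap, cache) for arg in node.args))
    cache[key] = res
    return res


x, y = Node("x"), Node("y")
ex = Node("add", x, y)
vec_ex = [
    Node("add", Node("sin", ex), Node("cos", ex)),  # sin(ex) + cos(ex)
    Node("add", Node("1."), Node("exp", ex)),       # 1 + exp(ex)
]

# A single cache shared by all components, so ex is processed only once.
x_new = Node("sub", Node("pow", x, Node("2.")), Node("1."))
cache = {}
res = [subs(c, {x: x_new}, cache) for c in vec_ex]

# The substituted ex is the same object in both output components.
print(res[0].args[0].args[0] is res[1].args[1].args[0])  # True
```

Had a fresh cache been created for each component, the second component would have rebuilt its own copy of the substituted ``ex``.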

## Consequences for large computational graphs

Most of the time, these details are of little consequence and may only result in small, hardly detectable inefficiencies. In fact, a superficial analysis 

While the use of reference semantics in these simple examples might seem rather inconsequential, it is of absolutely massive importance when it comes to heyoka.py's ability to represent large computational graphs 