This tutorial describes one particular intermediate representation used by the Devito Compiler: the Iteration/Expression Tree (IET), a special type of Abstract Syntax Tree.

# Part I - Top Down

Here, we investigate the IET of a simple `Operator`.

First, let's describe a $domain$ and a $Function$ that specifies how the values on that domain are modified. In particular, we will look at functions that change over $time$.

Thus we need a function object with which we can build a time-stepping scheme. For this purpose Devito provides `TimeFunction` objects, which encapsulate functions that are differentiable in space and time and are derived from basic $SymPy$ functions.

With these objects we can derive symbolic expressions for the backward derivatives in space directly via the `u.dxl` and `u.dyl` shorthands (the `l` stands for "left", i.e. backward differences), and the forward derivative in time via the `u.dt` shorthand provided by `TimeFunction` objects.
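As a plain-Python illustration (not Devito's code generator), the textbook first-order one-sided stencils behind "backward in space" and "forward in time" can be written as follows. The function names and the sample data are hypothetical, and the exact coefficients Devito emits depend on the function's space and time order.

```python
# A minimal sketch of first-order one-sided finite differences on plain lists.
# These are textbook stencils; Devito builds its own symbolic versions.

def backward_diff(values, i, h):
    """Backward ("left") difference in space: (f[i] - f[i-1]) / h."""
    return (values[i] - values[i - 1]) / h


def forward_diff_time(u_now, u_next, dt):
    """Forward difference in time: (u(t + dt) - u(t)) / dt."""
    return (u_next - u_now) / dt


# Sample f(x) = x**2 on a grid with spacing h = 0.5
h = 0.5
xs = [i * h for i in range(5)]   # [0.0, 0.5, 1.0, 1.5, 2.0]
fs = [x * x for x in xs]         # [0.0, 0.25, 1.0, 2.25, 4.0]

print(backward_diff(fs, 2, h))               # (1.0 - 0.25) / 0.5 = 1.5
print(forward_diff_time(1.0, 1.5, 0.25))     # (1.5 - 1.0) / 0.25 = 2.0
```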



In [2]:
from devito import Eq, Grid, TimeFunction, Operator

grid = Grid(shape=(3, 3))
u = TimeFunction(name='u', grid=grid)
u.data

Data([[[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]],

      [[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]]], dtype=float32)

Here, we have just declared a two-dimensional domain with three points along each dimension ($x$ and $y$). Each point of this discrete space holds a $real~value$ (a `float32`, as the `dtype` shows).

As we can see, we can always access the values at each point of the domain. At this point, no modifications have been made to it yet. `u.data` gives us quick access to the values held by each cell of the domain. Its leading axis has two entries: `u.data[0]` holds the values on the grid at the "current" time step, whereas `u.data[1]` holds the values of `u` for the "current+1" time step.
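The two time slots are reused cyclically. The generated kernel printed below computes the buffer indices as `t0 = (time)%(2)` and `t1 = (time + 1)%(2)`, so the slot that plays the "current" role alternates from one time step to the next. The hypothetical helper `buffer_indices` simply tabulates that mapping.

```python
# Sketch: map a logical time step to the two physical buffers of a
# TimeFunction with two time slots (u.data[0] and u.data[1]).

def buffer_indices(time, nbuffers=2):
    """Return (read_slot, write_slot) for a given time step."""
    t0 = time % nbuffers          # slot holding u(t)
    t1 = (time + 1) % nbuffers    # slot receiving u(t + dt)
    return t0, t1


for time in range(4):
    t0, t1 = buffer_indices(time)
    print(f"time={time}: read u.data[{t0}], write u.data[{t1}]")
```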

We can now create an `Operator` that will modify our domain according to $differential~equations$ through a computational stencil.
This means that those differential equations are translated into finite differences, which are used to update the values at each point of the space.
Such finite differences, or `expressions`, are applied over specific ranges of `iterations` over the domain.

In [3]:
eq = Eq(u.forward, u+1)
op = Operator(eq)
op.args['expressions']

Eq(u(t + dt, x, y), u(t, x, y) + 1)

For instance, the `Eq` object above states that, at each time step, `1` is added to every point of the domain.

Let's take a look at the $kernel$ that will be used to compute how this equation alters the domain.

In [4]:
print(op)

#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct Profiler
{
  double section0;
} ;


int Kernel(float *restrict u_vec, const int time_M, const int time_m, void *_timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size)
{
  float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec;
  struct Profiler *timers = (struct Profiler*) _timers;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
  {
    struct timeval start_section0, end_section0;
    gettimeofday(&start_section0, NULL);
    for (int x = x_m; x <= x_M; x += 1)
    {
      #pragma omp simd


Now that we have an `Operator` set up, we are ready to update our domain through the `apply` method. Without additional parameters, the `Operator` runs on the same data objects used to build it. Note that, since this operator updates a `TimeFunction` whose time dimension is buffered, the upper bound of the time loop must be specified explicitly (otherwise, the `Operator` would not know how many iterations to run); here it is passed as `time=2`.

Note that creating the `Operator` did not modify the domain; querying `u.data` before calling `apply` confirms this.

In [5]:
op.apply(time=2)
u.data

CustomCompiler: compiled `/tmp/devito-jitcache-uid1000/e1da8656f159ebd000fbe95ce2da53c72ac62b84.c` [0.19 s]


Data([[[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]],

      [[3., 3., 3.],
       [3., 3., 3.],
       [3., 3., 3.]]], dtype=float32)

This is the first time we invoke the `apply` method of the `Operator`. Therefore, the $kernel$ we saw above is written to a `.c` file and compiled into a `.so` shared library if `DEVITO_BACKEND` is set to `core`.

Since no other key-value parameters are specified, the `Operator` runs with its default arguments, namely `u=u, x_m=0, x_M=2, y_m=0, y_M=2`, and `time_m=0`, while `time=2` sets `time_M` to `2`. The suffixes `m` and `M` stand for the minimum and the maximum of the corresponding Dimension, respectively.
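To see where the values 2 and 3 come from, the loop nest of the generated kernel can be mirrored in plain Python. This is only an illustration of the C code printed earlier: `u` is modelled as nested lists without the halo padding (hence `u[t1][x][y]` instead of `u[t1][x + 1][y + 1]`), and the function name `run_kernel` is hypothetical.

```python
# Plain-Python mirror of the generated kernel's loop nest (no halo, no SIMD).

def run_kernel(u, time_m, time_M, x_m, x_M, y_m, y_M):
    """Apply u[t1][x][y] = u[t0][x][y] + 1 over inclusive loop bounds."""
    for time in range(time_m, time_M + 1):      # time <= time_M
        t0, t1 = time % 2, (time + 1) % 2        # circular time buffers
        for x in range(x_m, x_M + 1):
            for y in range(y_m, y_M + 1):
                u[t1][x][y] = u[t0][x][y] + 1
    return u


# Two 3x3 time buffers initialised to zero, like u.data above
u = [[[0.0] * 3 for _ in range(3)] for _ in range(2)]
run_kernel(u, time_m=0, time_M=2, x_m=0, x_M=2, y_m=0, y_M=2)
print(u[0][0])  # [2.0, 2.0, 2.0]
print(u[1][0])  # [3.0, 3.0, 3.0]
```

Calling `run_kernel` on fresh zeros with `time_M=3, x_m=1, x_M=2, y_m=0, y_M=1` reproduces the `u2` result of the next run.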

At this point, the same `Operator` can be used for a completely different run. Let's create another `TimeFunction` governing a similar $domain$.

In [6]:
u2 = TimeFunction(name='u', grid=grid)
u2.data

Data([[[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]],

      [[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]]], dtype=float32)

Any of the `Operator`'s default arguments can be overridden simply by passing suitable key-value parameters.
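Conceptually, the call-time arguments are a set of defaults, derived from the `Grid` the `Operator` was built with, that keyword arguments override one by one. The sketch below models this with a plain dictionary merge; `default_args` and `resolve_args` are hypothetical helpers, not part of Devito's API, and the real argument processing performs more checks. `time_M` is a required parameter here, mirroring the fact that it must be passed explicitly.

```python
# Sketch: default loop bounds derived from a grid shape, overridden by kwargs.

def default_args(shape, time_M):
    """Each spatial dimension spans [0, n - 1]; time_M has no default."""
    nx, ny = shape
    return {"time_m": 0, "time_M": time_M,
            "x_m": 0, "x_M": nx - 1,
            "y_m": 0, "y_M": ny - 1}


def resolve_args(defaults, **overrides):
    """Keyword overrides replace the matching defaults; the rest are kept."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    return {**defaults, **overrides}


args = resolve_args(default_args((3, 3), time_M=2),
                    x_m=1, x_M=2, y_m=0, y_M=1, time_M=3)
print(args)  # {'time_m': 0, 'time_M': 3, 'x_m': 1, 'x_M': 2, 'y_m': 0, 'y_M': 1}
```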


In [7]:
op.apply(u=u2, x_m=1, x_M=2, y_m=0, y_M=1, time_M=3)
u2.data



Data([[[0., 0., 0.],
       [4., 4., 0.],
       [4., 4., 0.]],

      [[0., 0., 0.],
       [3., 3., 0.],
       [3., 3., 0.]]], dtype=float32)

Note that no recompilation was needed: just-in-time (JIT) compilation occurs only once, triggered by the first execution.
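The compile-once behaviour can be pictured as a cache keyed on the generated source. The file name in the compilation log above consists of 40 hexadecimal digits, the length of a SHA-1 digest, which suggests a content-based key; the sketch below assumes a SHA-1 digest of the source string and replaces the real compiler call with a counter. `JITCache` and `compile_source` are hypothetical.

```python
import hashlib


class JITCache:
    """Compile each distinct source string at most once (illustrative only)."""

    def __init__(self):
        self._libs = {}          # digest -> placeholder for a compiled library
        self.compilations = 0    # how many times the "compiler" was invoked

    def compile_source(self, source):
        # Stand-in for writing a .c file and building a .so from it
        self.compilations += 1
        return f"lib_{hashlib.sha1(source.encode()).hexdigest()}.so"

    def get(self, source):
        key = hashlib.sha1(source.encode()).hexdigest()
        if key not in self._libs:            # first execution: compile
            self._libs[key] = self.compile_source(source)
        return self._libs[key]               # later executions: reuse


cache = JITCache()
code = "int Kernel(...) { /* generated C */ }"
first = cache.get(code)     # compiles
second = cache.get(code)    # cache hit, no recompilation
print(cache.compilations)   # 1
```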

The `op` object carries out three fundamental tasks: i) generation of low-level code, ii) JIT-compilation, and iii) execution. For the first task, `op` takes as input an ordered sequence of SymPy equations and represents it as an *Iteration/Expression Tree (IET)*, which is used to build a *CGen* tree that is ultimately translated into a string and written to a `.c` file.

An IET is basically an *abstract syntax tree* in which `Iterations` and `Expressions`, two special node
types, are the main actors. Equations are wrapped within `Expressions`, and the loop nests embedding
such expressions are constructed by suitably nesting `Iterations`. Here is another way to look at `op`, closer to its IET structure.

In [11]:
from devito import pprint
pprint(op)

<Callable Kernel>
  <List>

    <ArrayCast>
    <PointerCast>
    <List>

      <Denormals>

        <Element /* Flush denormal numbers to zero in hardware */>
        <Element _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);>
        <Element _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);>

      <List>

        <[affine,sequential,wrappable] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
          <TimedList>
            <C.Statement struct timeval start_section0, end_section0;>
            <C.Statement gettimeofday(&start_section0, NULL);>
            <Section>

              <[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
                <[affine,parallel,vector-dim] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
                  <ExpressionBundle>

                    <Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>


            <C.Statement gettimeofday(&end_section0, NULL);>
            <C.Statement timers->section0 += (double)(end_section0.tv_sec-start_section

The `op` object is thus the root node of a tree. Walking through this data structure allows us to inspect specific parts of it.

Taking the above $kernel$ as an example, `op` is represented as a `<Callable Kernel>` composed of `_headers`, `_includes` and a `body` (a `<List>`, in this example).
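Walking such a tree is typically done with a visitor; Devito ships its own visitor classes for this purpose. As a hedged sketch, a generic `Node` with a `children` tuple and a depth-first search that collects every node of a given kind are enough to recover the `time`, `x` and `y` loops from a skeleton shaped like the `pprint` output above. `Node` and `find_nodes` are hypothetical, and positional indexing into the children, as done with `body` in the cells below, reaches the same `time` loop.

```python
class Node:
    """Generic tree node: a kind (e.g. 'Iteration'), a label and children."""

    def __init__(self, kind, label="", children=()):
        self.kind = kind
        self.label = label
        self.children = tuple(children)


def find_nodes(root, kind):
    """Depth-first, pre-order search for all nodes of the given kind."""
    found = [root] if root.kind == kind else []
    for child in root.children:
        found.extend(find_nodes(child, kind))
    return found


# Skeleton mirroring the pprint(op) output (leaf details omitted)
expr = Node("Expression", "u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1")
y_loop = Node("Iteration", "y", [Node("ExpressionBundle", children=[expr])])
x_loop = Node("Iteration", "x", [y_loop])
section = Node("Section", children=[x_loop])
t_loop = Node("Iteration", "time", [Node("TimedList", children=[section])])
root = Node("Callable", "Kernel", [
    Node("List", children=[
        Node("ArrayCast"),
        Node("PointerCast"),
        Node("List", children=[Node("Denormals"),
                               Node("List", children=[t_loop])]),
    ])
])

print([n.label for n in find_nodes(root, "Iteration")])  # ['time', 'x', 'y']
# Same path shape as op.body[0].body[2].body[1].body[0]
assert root.children[0].children[2].children[1].children[0] is t_loop
```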

In [12]:
op._headers

['#define _POSIX_C_SOURCE 200809L']

In [13]:
op._includes

['stdlib.h', 'math.h', 'sys/time.h', 'xmmintrin.h', 'pmmintrin.h']

In [14]:
op.body

(<List (0, 3, 0)>,)

In [15]:
op.body[0].body

(<devito.ir.iet.nodes.ArrayCast at 0x7f456c8b63c8>,
 <devito.ir.iet.nodes.PointerCast at 0x7f456c852400>,
 <List (0, 2, 0)>)

In [16]:
print(op.body[0].body[0])

float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec;


In [17]:
print(op.body[0].body[1])

struct Profiler *timers = (struct Profiler*) _timers;


In [18]:
print(op.body[0].body[2])

/* Flush denormal numbers to zero in hardware */
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
  struct timeval start_section0, end_section0;
  gettimeofday(&start_section0, NULL);
  for (int x = x_m; x <= x_M; x += 1)
  {
    #pragma omp simd
    for (int y = y_m; y <= y_M; y += 1)
    {
      u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
    }
  }
  gettimeofday(&end_section0, NULL);
  timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
}


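Within this third element, itself a `List`, the first child is the `Denormals` node, which flushes denormal numbers to zero in hardware, while the second child, also a `List` (with only one element), contains the kernel's main loop. The profiling calls (`gettimeofday` and the update of `timers->section0`) live inside that loop, in the `TimedList` node.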

In [20]:
t_iter = op.body[0].body[2].body[1].body[0]
t_iter

<WithProperties[affine,sequential,wrappable]::Iteration time[t0,t1]; (time_m, time_M, 1)>

Here, we have captured the specific loop corresponding to the `time` dimension of our domain.

In [22]:
print(t_iter)

for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
  struct timeval start_section0, end_section0;
  gettimeofday(&start_section0, NULL);
  for (int x = x_m; x <= x_M; x += 1)
  {
    #pragma omp simd
    for (int y = y_m; y <= y_M; y += 1)
    {
      u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
    }
  }
  gettimeofday(&end_section0, NULL);
  timers->section0 += (double)(end_section0.tv_sec-start_section0.tv_sec)+(double)(end_section0.tv_usec-start_section0.tv_usec)/1000000;
}


We can further inspect its attributes, for instance its limits:

In [23]:
t_iter.limits

(time_m, time_M, 1)
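These limits are symbolic: `time_m` and `time_M` only acquire values when `apply` is called. A hedged sketch of how such a triple turns into a number of iterations, with inclusive bounds as in the generated `for` loop (`time <= time_M`) and a positive step; `trip_count` is a hypothetical helper.

```python
# Sketch: resolve a symbolic (lower, upper, step) triple against runtime args.

def trip_count(limits, args):
    """Iterations of an inclusive loop lower..upper with a positive step."""
    lower, upper, step = (args[v] if isinstance(v, str) else v
                          for v in limits)
    if upper < lower:
        return 0
    return (upper - lower) // step + 1


limits = ("time_m", "time_M", 1)   # mirrors t_iter.limits
print(trip_count(limits, {"time_m": 0, "time_M": 2}))  # 3, as in op.apply(time=2)
print(trip_count(limits, {"time_m": 0, "time_M": 3}))  # 4, as in the u2 run
```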