This document explores Devito's IET (Iteration/Expression Tree).

In Part I, we investigate the IET of an Operator created by the user. In Part II, we build and play with toy IETs.

# Part I - Top Down

First, let's create a $domain$ to work with, and a $function$ that will allow us to specify how such a domain gets modified. In particular, we will look at functions that change through $time$.

In [1]:
from devito import Eq, Grid, TimeFunction, Operator

grid = Grid(shape=(3, 3))
u = TimeFunction(name='u', grid=grid)
u.data

Data([[[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]],

      [[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]]], dtype=float32)

Here, we have just declared a two-dimensional domain with three points along each dimension ($x$ and $y$). Each point of this discrete space holds a $real~value$.

As we can see, the values at each point of the domain are always accessible. At this point, nothing has modified them yet. `u.data` gives quick access to the values held by each cell of the domain. `u.data[0]` holds the grid values at the "current" time step, whereas `u.data[1]` holds the values of `u` at the "current+1" time step.

We can now create an `operator` that modifies our domain according to $differential~equations$ through a computational stencil.
This means that the differential equations are translated into finite differences, which are then used to update the values at each point of the space.
Such finite differences, or `expressions`, are applied over specific ranges of `iterations` over the domain.

In [2]:
eq = Eq(u.forward, u+1)
op = Operator(eq)
op.args['expressions']

Eq(u(t + dt, x, y), u(t, x, y) + 1)

For instance, the particular `equation` object above allows us to say that, at each time step, `1` will be added to every position of the domain.

Let's take a look at the $kernel$ that will be used to compute how this equation alters the domain.

In [22]:
print(op)

#define _POSIX_C_SOURCE 200809L
#include "stdlib.h"
#include "math.h"
#include "sys/time.h"
#include "xmmintrin.h"
#include "pmmintrin.h"

struct Profiler
{
  double section0;
} ;


int Kernel(float *restrict u_vec, const int time_M, const int time_m, void *_timers, const int x_M, const int x_m, const int x_size, const int y_M, const int y_m, const int y_size)
{
  float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec;
  struct Profiler *timers = (struct Profiler*) _timers;
  /* Flush denormal numbers to zero in hardware */
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
  {
    for (int x = x_m; x <= x_M; x += 1)
    {
      #pragma omp simd
      for (int y = y_m; y <= y_M; y += 1)
      {
        u[t1][x + 1][y + 1] = u[t0][x + 

Now that we have an operator set up, we are ready to update our domain through the `apply` method. Without additional parameters, the operator runs on the same data objects used to build it. Note that, since this `operator` works on a `TimeFunction`, whose time dimension is buffered, the maximum iteration point along the time dimension must be specified explicitly (otherwise, the `operator` wouldn't know how many iterations to run).

Note also that the domain has not been modified so far. To verify that, query `u.data` before calling `apply`.

In [4]:
op.apply(time=2)
u.data

CustomCompiler: compiled /home/lucas/devito/fwi_project/test_ops/kernels/devito-cmwnvbek/42f48a9c4c1ed14fbead54ab48b57fa4747e559b.c [0.10 s]


Data([[[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]],

      [[3., 3., 3.],
       [3., 3., 3.],
       [3., 3., 3.]]], dtype=float32)

This is the first time we invoke the operator's `apply` method. The $kernel$ we saw is therefore written to a `.c` file and compiled into a `.so` shared library, provided `DEVITO_BACKEND` is set to `core`.

Then, as no other key-value parameters are specified, the operator runs with its default arguments, namely `u=u, x_m=0, x_M=2, y_m=0, y_M=2,` and `time_m=0`. The suffixes `_m` and `_M` stand for the minimum and the maximum iteration points along each dimension, respectively. Since we passed `time=2`, `time_M` is set to `2` here.
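
To see why this run leaves `2`s in `u.data[0]` and `3`s in `u.data[1]`, the loop nest of the generated kernel can be mimicked in plain Python. This is only an illustrative sketch, not Devito code: `run_kernel` and `zeros` are hypothetical helpers, nested lists stand in for `u.data`, and the `+ 1` offsets of the C indices are dropped.

```python
def zeros():
    # Two time buffers, each a 3x3 grid, like the initial u.data
    return [[[0.0] * 3 for _ in range(3)] for _ in range(2)]


def run_kernel(u, time_m, time_M, x_m, x_M, y_m, y_M):
    """Plain-Python stand-in for the generated loop nest (bounds are inclusive)."""
    for time in range(time_m, time_M + 1):
        t0 = time % 2          # buffer read at this step
        t1 = (time + 1) % 2    # buffer written at this step
        for x in range(x_m, x_M + 1):
            for y in range(y_m, y_M + 1):
                u[t1][x][y] = u[t0][x][y] + 1
    return u


u_py = run_kernel(zeros(), time_m=0, time_M=2, x_m=0, x_M=2, y_m=0, y_M=2)
print(u_py[0])  # every entry is 2.0
print(u_py[1])  # every entry is 3.0
```

The two buffers swap roles at every step, so after the three steps `time = 0, 1, 2` the last write lands in buffer `1`, while buffer `0` still holds the result of the previous step.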

At this point, the same operator can be used for a completely different run. Let's create another time function for governing a similar domain. 

In [5]:
u2 = TimeFunction(name='u', grid=grid)
u2.data

Data([[[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]],

      [[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]]], dtype=float32)

Any of the operator's default arguments can be overridden by passing suitable key-value parameters.


In [6]:
op.apply(u=u2, x_m=1, x_M=2, y_m=0, y_M=1, time_M=3)
u2.data

Data([[[0., 0., 0.],
       [4., 4., 0.],
       [4., 4., 0.]],

      [[0., 0., 0.],
       [3., 3., 0.],
       [3., 3., 0.]]], dtype=float32)

Notice, however, that there is no need for recompilation of the kernel. Just-in-time (JIT) compilation occurs only once, triggered by the first execution.
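
The compile-once behaviour can be pictured with a small caching pattern. The sketch below is purely illustrative and does not reflect Devito's internals: `LazyKernel` is a hypothetical class, and Python's built-in `compile` stands in for writing the `.c` file and invoking the C compiler.

```python
class LazyKernel:
    """Toy stand-in for an operator that compiles on first use only."""

    def __init__(self, source):
        self.source = source
        self.compile_count = 0
        self._compiled = None

    def _jit_compile(self):
        # Stand-in for "write the .c file, call the compiler, load the .so"
        self.compile_count += 1
        return compile(self.source, "<kernel>", "exec")

    def apply(self, **kwargs):
        if self._compiled is None:       # only true on the first call
            self._compiled = self._jit_compile()
        scope = dict(kwargs)             # runtime arguments may change freely
        exec(self._compiled, scope)
        return scope["result"]


kernel = LazyKernel("result = a + 1")
print(kernel.apply(a=1))     # 2
print(kernel.apply(a=41))    # 42
print(kernel.compile_count)  # 1
```

Changing the runtime arguments between calls does not invalidate the cached object, which mirrors how the second `apply` above reused the already compiled kernel.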

The `op` object carries out three fundamental tasks: i) generation of low-level code, ii) JIT-compilation, and iii) execution. In the first task, `op` takes as input an ordered sequence of SymPy equations and represents it as an *Iteration/Expression Tree (IET)*, which is used to build a *CGen* tree that is ultimately translated into a string and written to a `.c` file.

An IET is basically an *abstract syntax tree* in which `Iterations` and `Expressions`, two special node
types, play the main roles. Equations are wrapped within `Expressions`. Loop nests embedding
such expressions are constructed by suitably nesting `Iterations`. Here is another way to see `op`, in a fashion closer to its IET structure.

In [8]:
from devito import pprint
pprint(op)

<Callable Kernel>
  <List>

    <ArrayCast>
    <PointerCast>
    <List>

      <Denormals>

        <Element /* Flush denormal numbers to zero in hardware */>
        <Element _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);>
        <Element _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);>

      <List>

        <[affine,sequential] Iteration time::time::(time_m, time_M, 1)::(0, 0)>
          <Section>

            <[affine,parallel] Iteration x::x::(x_m, x_M, 1)::(0, 0)>
              <[affine,parallel,vector-dim] Iteration y::y::(y_m, y_M, 1)::(0, 0)>
                <ExpressionBundle>

                  <Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>







Therefore, the `op` object is expressed as the `root` node of a tree. Walking through this data structure lets us inspect specific parts of it.

Taking the above $kernel$ as an example, `op` is represented as a `<Callable Kernel>` composed of `_headers`, `_includes` and a `body` (which is a `<List>`, in this example).

In [77]:
op._headers

['#define _POSIX_C_SOURCE 200809L']

In [78]:
op._includes

['stdlib.h', 'math.h', 'sys/time.h', 'xmmintrin.h', 'pmmintrin.h']

In [79]:
op.body

(<List (0, 3, 0)>,)

In [80]:
op.body[0].body

(<devito.ir.iet.nodes.ArrayCast object at 0x7f7ea791db38>, <devito.ir.iet.nodes.PointerCast object at 0x7f7ea7927400>, <List (0, 2, 0)>)

As expected, the `<List>` inside the `<Callable Kernel>` has three elements: an `ArrayCast`, a `PointerCast` and another `<List>`. Let's take a look inside each one of these `children`. 

The first two elements are IET nodes that each generate one line of `.c` code (a cast). The third element is another `<List>` with two elements.

In [82]:
print(op.body[0].body[0])

float (*restrict u)[x_size + 1 + 1][y_size + 1 + 1] __attribute__((aligned(64))) = (float (*)[x_size + 1 + 1][y_size + 1 + 1]) u_vec;


In [74]:
print(op.body[0].body[1])

struct Profiler *timers = (struct Profiler*) _timers;


In [71]:
print(op.body[0].body[2])

/* Flush denormal numbers to zero in hardware */
_MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
  for (int x = x_m; x <= x_M; x += 1)
  {
    #pragma omp simd
    for (int y = y_m; y <= y_M; y += 1)
    {
      u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
    }
  }
}


Within this second list, the first element flushes denormal numbers to zero, while the second element, which is also a list (with only one element), contains the kernel's main loop.

In [48]:
t_iter = op.body[0].body[2].body[1].body[0]
t_iter

<WithProperties[affine,sequential]::Iteration time[t0,t1]; (time_m, time_M, 1)>

Here, we have captured the specific loop corresponding to the `time` dimension of our domain.

In [53]:
print(t_iter)

for (int time = time_m, t0 = (time)%(2), t1 = (time + 1)%(2); time <= time_M; time += 1, t0 = (time)%(2), t1 = (time + 1)%(2))
{
  for (int x = x_m; x <= x_M; x += 1)
  {
    #pragma omp simd
    for (int y = y_m; y <= y_M; y += 1)
    {
      u[t1][x + 1][y + 1] = u[t0][x + 1][y + 1] + 1;
    }
  }
}


We can further investigate its limits, for instance.

In [84]:
t_iter.limits

(time_m, time_M, 1)
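
The triple reads as (minimum, maximum, step), and the maximum is inclusive, as the `time <= time_M` condition in the generated loop shows. As a rough sketch of how such limits map to a C loop header, consider the following plain-Python helpers. Both `c_header` and `trip_count` are hypothetical names, not part of Devito, and the `t0`/`t1` modulo indices of the real header are left out.

```python
def c_header(name, limits):
    """Format (min, max, step) limits as a C for-loop header."""
    start, stop, step = limits
    return f"for (int {name} = {start}; {name} <= {stop}; {name} += {step})"


def trip_count(limits):
    """Number of iterations for numeric limits with an inclusive maximum."""
    start, stop, step = limits
    return max(0, (stop - start) // step + 1)


print(c_header("time", ("time_m", "time_M", 1)))
# for (int time = time_m; time <= time_M; time += 1)
print(trip_count((0, 2, 1)))  # 3
```

With `time_m=0` and `time_M=2`, the loop therefore runs three times, which matches the three updates observed in the first `apply` call.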

And as we keep going down through the IET, we reach the expression that is wrapped by the iterations' scope.  

In [89]:
expr = op.body[0].body[2].body[1].body[0].children[0][0].body[0].children[0][0].children[0][0].body[0]
print(expr.view)

<Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>


Rather than indexing by hand, we can use `FindNodes`, a visitor that Devito already provides to collect all the nodes of a given type (here, the expressions) found in the IET of an operator.

In [29]:
from devito.ir.iet import Expression, FindNodes
exprs = FindNodes(Expression).visit(op)
print(exprs[0].view)

<Expression u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1>
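
Conceptually, such a search is a depth-first traversal that collects every node of a requested type. The following plain-Python sketch illustrates the idea on a toy tree. The classes `ToyList`, `ToyIteration`, `ToyExpression` and the function `find_nodes` are made up for illustration and are not how Devito implements its visitors.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExpression:
    text: str


@dataclass
class ToyIteration:
    dim: str
    body: list = field(default_factory=list)


@dataclass
class ToyList:
    body: list = field(default_factory=list)


def find_nodes(node, node_type):
    """Collect, depth first, every node that is an instance of node_type."""
    found = [node] if isinstance(node, node_type) else []
    for child in getattr(node, "body", []):  # leaves have no body
        found.extend(find_nodes(child, node_type))
    return found


toy_tree = ToyList([
    ToyIteration("time", [
        ToyIteration("x", [
            ToyIteration("y", [
                ToyExpression("u[t1, x + 1, y + 1] = u[t0, x + 1, y + 1] + 1"),
            ]),
        ]),
    ]),
])

print([e.text for e in find_nodes(toy_tree, ToyExpression)])
print([it.dim for it in find_nodes(toy_tree, ToyIteration)])  # ['time', 'x', 'y']
```

The same traversal answers different questions depending on the requested type, which is why a single generic visitor is more convenient than a long chain of `.body[...]` and `.children[...]` lookups.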


# Part II - Bottom Up

Naturally, `dimensions` are the building blocks of both `iterations` and `expressions`.

In [56]:
from devito import SpaceDimension, TimeDimension

dims = {'i': SpaceDimension(name='i'),
        'j': SpaceDimension(name='j'),
        'k': SpaceDimension(name='k'),
        't0': TimeDimension(name='t0'),
        't1': TimeDimension(name='t1')}

dims

{'i': i, 'j': j, 'k': k, 't0': t0, 't1': t1}

Elements such as `scalars`, `constants` and `functions` are used to build `expressions`.

In [57]:
from devito.types import Array, Scalar
from devito import Constant, Function

symbs = {'a': Scalar(name='a'),
         'b': Constant(name='b'),
         'c': Array(name='c', shape=(3,), dimensions=(dims['i'],)).indexify(),
         'd': Array(name='d', 
                    shape=(3,3), 
                    dimensions=(dims['j'],dims['k'])).indexify(),
         'e': Function(name='e', 
                       shape=(3,3,3), 
                       dimensions=(dims['t0'],dims['t1'],dims['i'])).indexify()}

`Time-functions` may be used too. They, however, require a `domain` (a grid) to operate on.

In [92]:
from devito import Grid
grid = Grid(shape=(10,10))
grid

Grid[extent=(1.0, 1.0), shape=(10, 10), dimensions=(x, y)]

In [93]:
from devito import TimeFunction
f = TimeFunction(name='f', grid=grid)
symbs['f'] = f.indexify()
symbs

{'a': a, 'b': b, 'c': c[i], 'd': d[j, k], 'e': e[t0, t1, i], 'f': f[t, x, y]}

`Expressions` are built from equations.

In [60]:
from devito.ir.iet import Expression
from devito.ir.equations import DummyEq
from devito.tools import pprint

def get_exprs(a, b, c, d, e, f):
    return [Expression(DummyEq(a, b + c + 5.)),
            Expression(DummyEq(d, e - f)),
            Expression(DummyEq(a, 4 * (b * a))),
            Expression(DummyEq(a, (6. / b) + (8. * a)))]

exprs = get_exprs(symbs['a'],
                  symbs['b'],
                  symbs['c'],
                  symbs['d'],
                  symbs['e'],
                  symbs['f'])

pprint(exprs)

<Expression a = b + c[i] + 5.0>
<Expression d[j, k] = e[t0, t1, i] - f[t, x, y]>
<Expression a = 4*a*b>
<Expression a = 8.0*a + 6.0/b>


And `Iterations` are built to run over `Expressions`.

In [97]:
from devito.ir.iet import Iteration

def get_iters(dims):
    return [lambda ex: Iteration(ex, dims['i'], (0, 3, 1)),
            lambda ex: Iteration(ex, dims['j'], (0, 5, 1)),
            lambda ex: Iteration(ex, dims['k'], (0, 7, 1)),
            lambda ex: Iteration(ex, dims['t0'], (0, 4, 1)),
            lambda ex: Iteration(ex, dims['t1'], (0, 4, 1))]

iters = get_iters(dims)

Here, we can see how blocks of `iterations` over `expressions` can be used to build loop nests. 

In [62]:
def get_block1(exprs, iters):
    # Perfect loop nest:
    # for i
    #   for j
    #     for k
    #       expr0
    return iters[0](iters[1](iters[2](exprs[0])))
    
def get_block2(exprs, iters):
    # Non-perfect simple loop nest:
    # for i
    #   expr0
    #   for j
    #     for k
    #       expr1
    return iters[0]([exprs[0], iters[1](iters[2](exprs[1]))])

def get_block3(exprs, iters):
    # Non-perfect non-trivial loop nest:
    # for i
    #   for t0
    #     expr0
    #   for j
    #     for k
    #       expr1
    #       expr2
    #   for t1
    #     expr3
    return iters[0]([iters[3](exprs[0]),
                     iters[1](iters[2]([exprs[1], exprs[2]])),
                     iters[4](exprs[3])])

block1 = get_block1(exprs, iters)
block2 = get_block2(exprs, iters)
block3 = get_block3(exprs, iters)

pprint(block1)
print('\n')
pprint(block2)
print('\n')
pprint(block3)

<Iteration i::i::(0, 3, 1)::(0, 0)>
  <Iteration j::j::(0, 5, 1)::(0, 0)>
    <Iteration k::k::(0, 7, 1)::(0, 0)>
      <Expression a = b + c[i] + 5.0>


<Iteration i::i::(0, 3, 1)::(0, 0)>
  <Expression a = b + c[i] + 5.0>
  <Iteration j::j::(0, 5, 1)::(0, 0)>
    <Iteration k::k::(0, 7, 1)::(0, 0)>
      <Expression d[j, k] = e[t0, t1, i] - f[t, x, y]>


<Iteration i::i::(0, 3, 1)::(0, 0)>
  <Iteration t0::t0::(0, 4, 1)::(0, 0)>
    <Expression a = b + c[i] + 5.0>
  <Iteration j::j::(0, 5, 1)::(0, 0)>
    <Iteration k::k::(0, 7, 1)::(0, 0)>
      <Expression d[j, k] = e[t0, t1, i] - f[t, x, y]>
      <Expression a = 4*a*b>
  <Iteration t1::t1::(0, 4, 1)::(0, 0)>
    <Expression a = 8.0*a + 6.0/b>


And, finally, we can build `callable` kernels that will be used for generating `.c` code. 

In [98]:
from devito.ir.iet import Callable

kernels = [Callable('foo', block1, 'void', ()),
           Callable('foo', block2, 'void', ()),
           Callable('foo', block3, 'void', ())]

print('kernel no.1:\n' + str(kernels[0].ccode) + '\n')
print('kernel no.2:\n' + str(kernels[1].ccode) + '\n')
print('kernel no.3:\n' + str(kernels[2].ccode) + '\n')

kernel no.1:
void foo()
{
  for (int i = 0; i <= 3; i += 1)
  {
    for (int j = 0; j <= 5; j += 1)
    {
      for (int k = 0; k <= 7; k += 1)
      {
        a = b + c[i] + 5.0F;
      }
    }
  }
}

kernel no.2:
void foo()
{
  for (int i = 0; i <= 3; i += 1)
  {
    a = b + c[i] + 5.0F;
    for (int j = 0; j <= 5; j += 1)
    {
      for (int k = 0; k <= 7; k += 1)
      {
        d[j][k] = e[t0][t1][i] - f[t][x][y];
      }
    }
  }
}

kernel no.3:
void foo()
{
  for (int i = 0; i <= 3; i += 1)
  {
    for (int t0 = 0; t0 <= 4; t0 += 1)
    {
      a = b + c[i] + 5.0F;
    }
    for (int j = 0; j <= 5; j += 1)
    {
      for (int k = 0; k <= 7; k += 1)
      {
        d[j][k] = e[t0][t1][i] - f[t][x][y];
        a = 4*a*b;
      }
    }
    for (int t1 = 0; t1 <= 4; t1 += 1)
    {
      a = 8.0F*a + 6.0F/b;
    }
  }
}
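
The way a nest of `Iterations` turns into nested `for` loops can be imitated with a small recursive printer. The sketch below is a plain-Python illustration under simplifying assumptions, not Devito's code generator: the hypothetical function `emit` treats a tuple `(name, limits, body)` as an iteration and a string as an already formatted expression.

```python
def emit(node, indent=0):
    """Return the C-like lines for a toy loop nest."""
    pad = "  " * indent
    if isinstance(node, str):            # a leaf: an expression statement
        return [pad + node]
    name, (start, stop, step), body = node
    lines = [pad + f"for (int {name} = {start}; {name} <= {stop}; {name} += {step})",
             pad + "{"]
    for child in body:                   # children are emitted one level deeper
        lines.extend(emit(child, indent + 1))
    lines.append(pad + "}")
    return lines


# Same shape as the second loop nest above (non-perfect nest)
nest = ("i", (0, 3, 1), [
    "a = b + c[i] + 5.0F;",
    ("j", (0, 5, 1), [
        ("k", (0, 7, 1), ["d[j][k] = e[t0][t1][i] - f[t][x][y];"]),
    ]),
])

print("\n".join(emit(nest)))
```

The printed text has the same loop structure as the body of kernel no.2. The order of the children in each `body` is what decides where a statement lands relative to the inner loops.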



Once the IET is set up, it can be fully transformed. Whole blocks can be replaced, as in the example below.

A `Transformer` object is initialized with a mapping from the nodes to be matched to their replacements. Its `visit` method then traverses an IET and automatically performs the corresponding replacements.

In [11]:
from devito.ir.iet import Transformer

# Replace a Callable's body with another
transformer = Transformer({block1: block2})
kernel_alt = transformer.visit(kernels[0])
print(kernel_alt)

void foo()
{
  for (int i = 0; i <= 3; i += 1)
  {
    a = b + c[i] + 5.0F;
    for (int j = 0; j <= 5; j += 1)
    {
      for (int k = 0; k <= 7; k += 1)
      {
        d[j][k] = e[t0][t1][i] - f[t][x][y];
      }
    }
  }
}


Specific equations within the loop nests can also be substituted. Using the model above, the following examples show how it can be done for several patterns.

In [12]:
# Replace one expression with another
transformer = Transformer({exprs[0]: exprs[1]})
newblock = transformer.visit(block1)
newcode = str(newblock.ccode)
print(newcode)

for (int i = 0; i <= 3; i += 1)
{
  for (int j = 0; j <= 5; j += 1)
  {
    for (int k = 0; k <= 7; k += 1)
    {
      d[j][k] = e[t0][t1][i] - f[t][x][y];
    }
  }
}


In [13]:
from devito.ir.iet import Block
import cgen as c

# Build a replacement node and substitute it for an expression
line1 = '// Replaced expression'
replacer = Block(c.Line(line1))
transformer = Transformer({exprs[1]: replacer})
newblock = transformer.visit(block2)
newcode = str(newblock.ccode)
print(newcode)

for (int i = 0; i <= 3; i += 1)
{
  a = b + c[i] + 5.0F;
  for (int j = 0; j <= 5; j += 1)
  {
    for (int k = 0; k <= 7; k += 1)
    {
      // Replaced expression
      {
      }
    }
  }
}


In [14]:
# Wrap an expression in comments
line1 = '// This is the opening comment'
line2 = '// This is the closing comment'
wrapper = lambda n: Block(c.Line(line1), n, c.Line(line2))
transformer = Transformer({exprs[0]: wrapper(exprs[0])})
newblock = transformer.visit(block1)
newcode = str(newblock.ccode)
print(newcode)

for (int i = 0; i <= 3; i += 1)
{
  for (int j = 0; j <= 5; j += 1)
  {
    for (int k = 0; k <= 7; k += 1)
    {
      // This is the opening comment
      {
        a = b + c[i] + 5.0F;
      }
      // This is the closing comment
    }
  }
}
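
The replacement mechanism used in these examples can be summarised as "rebuild the tree, swapping any node found in the mapping". The following plain-Python sketch shows that idea on a toy tree made of tuples and strings. The function `transform` and the tuple layout `(kind, dim, body)` are assumptions made for illustration and are not Devito's `Transformer`.

```python
def transform(node, mapper):
    """Return a new tree in which every node found in mapper is replaced."""
    if node in mapper:
        return mapper[node]              # replacements are inserted as they are
    if isinstance(node, str):            # untouched leaf
        return node
    kind, dim, body = node
    return (kind, dim, tuple(transform(child, mapper) for child in body))


expr0 = "a = b + c[i] + 5.0F;"
expr1 = "d[j][k] = e[t0][t1][i] - f[t][x][y];"
toy_block = ("for", "i", (("for", "j", (("for", "k", (expr0,)),)),))

# Replace one expression with another
replaced = transform(toy_block, {expr0: expr1})
print(replaced)

# Wrap an expression between two comment lines
wrapped = transform(toy_block, {expr0: ("block", None, ("// open", expr0, "// close"))})
print(wrapped)

print(toy_block)  # the original tree is left untouched
```

Because the function builds a new tree instead of editing nodes in place, the original block can be reused for several independent transformations, as done with `block1` and `block2` above.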
