
compiler: Introducing min/max bounds to replace 'bf' elemental functions #1673

Merged
merged 26 commits into from Jul 26, 2021


@georgebisbas (Contributor) commented Apr 20, 2021

This PR removes elemental functions, the so-called "bf" functions, from Devito-generated code.
The most important part to review is in devito/passes/iet/misc.py.

So now you get:

  {
    /* Begin section0 */
    START_TIMER(section0)
    if (x0_blk0_size == 0 || y0_blk0_size == 0)
    {
      return;
    }
    #pragma omp parallel num_threads(nthreads)
    {
      #pragma omp for collapse(2) schedule(dynamic,1)
      for (int x0_blk0 = x_m; x0_blk0 <= x_M; x0_blk0 += x0_blk0_size)
      {
        for (int y0_blk0 = y_m; y0_blk0 <= y_M; y0_blk0 += y0_blk0_size)
        {
          for (int x = x0_blk0; x <= MIN(x_M, x0_blk0 + x0_blk0_size - 1); x += 1)
          {
            for (int y = y0_blk0; y <= MIN(y_M, y0_blk0 + y0_blk0_size - 1); y += 1)
            {
              #pragma omp simd aligned(damp,u,vp:32)
              for (int z = z_m; z <= z_M; z += 1)
              {
                float r10 = 1.0F/dt;
                float r9 = 1.0F/(dt*dt);
                float r8 = 1.0F/(vp[x + 4][y + 4][z + 4]*vp[x + 4][y + 4][z + 4]);
                u[t2][x + 4][y + 4][z + 4] = (r10*(damp[x + 1][y + 1][z + 1]*u[t0][x + 4][y + 4][z + 4]) + r8*(-r9*(-2>
              }
            }
          }
        }
      }
    }
    STOP_TIMER(section0,timers)

instead of:

  for (int time = time_m, t0 = (time)%(3), t1 = (time + 2)%(3), t2 = (time + 1)%(3); time <= time_M; time += 1, t0 = (>
  {
    /* Begin section0 */
    START_TIMER(section0)
    bf0(damp_vec,dt,u_vec,vp_vec,t0,t1,t2,x0_blk0_size,x_M - (x_M - x_m + 1)%(x0_blk0_size),x_m,y0_blk0_size,y_M - (y_>
    bf0(damp_vec,dt,u_vec,vp_vec,t0,t1,t2,x0_blk0_size,x_M - (x_M - x_m + 1)%(x0_blk0_size),x_m,(y_M - y_m + 1)%(y0_bl>
    bf0(damp_vec,dt,u_vec,vp_vec,t0,t1,t2,(x_M - x_m + 1)%(x0_blk0_size),x_M,x_M - (x_M - x_m + 1)%(x0_blk0_size) + 1,>
    bf0(damp_vec,dt,u_vec,vp_vec,t0,t1,t2,(x_M - x_m + 1)%(x0_blk0_size),x_M,x_M - (x_M - x_m + 1)%(x0_blk0_size) + 1,>
    STOP_TIMER(section0,timers)
    /* End section0 */

where bf0 is:

void bf0(struct dataobj *restrict damp_vec, const float dt, struct dataobj *restrict u_vec, struct dataobj *restrict v>
{
  float (*restrict damp)[damp_vec->size[1]][damp_vec->size[2]] __attribute__ ((aligned (64))) = (float (*)[damp_vec->s>
  float (*restrict u)[u_vec->size[1]][u_vec->size[2]][u_vec->size[3]] __attribute__ ((aligned (64))) = (float (*)[u_ve>
  float (*restrict vp)[vp_vec->size[1]][vp_vec->size[2]] __attribute__ ((aligned (64))) = (float (*)[vp_vec->size[1]][>

  if (x0_blk0_size == 0 || y0_blk0_size == 0)
  {
    return;
  }
  #pragma omp parallel num_threads(nthreads)
  {
    #pragma omp for collapse(2) schedule(dynamic,1)
    for (int x0_blk0 = x_m; x0_blk0 <= x_M; x0_blk0 += x0_blk0_size)
    {
      for (int y0_blk0 = y_m; y0_blk0 <= y_M; y0_blk0 += y0_blk0_size)
      {
        for (int x = x0_blk0; x <= x0_blk0 + x0_blk0_size - 1; x += 1)
        {
          for (int y = y0_blk0; y <= y0_blk0 + y0_blk0_size - 1; y += 1)
          {
            #pragma omp simd aligned(damp,u,vp:32)
            for (int z = z_m; z <= z_M; z += 1)
            {
              float r10 = 1.0F/dt;
              float r9 = 1.0F/(dt*dt);
              float r8 = 1.0F/(vp[x + 4][y + 4][z + 4]*vp[x + 4][y + 4][z + 4]);
              u[t2][x + 4][y + 4][z + 4] = (r10*(damp[x + 1][y + 1][z + 1]*u[t0][x + 4][y + 4][z + 4]) + r8*(-r9*(-2.0>
            }
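To see why clamping the intra-block bounds with MIN makes the four bf0 calls unnecessary, here is a small Python model of both schedules for a 2D blocked nest. It is a sketch under stated assumptions, not Devito code: `bf_call`, `old_schedule` and `min_schedule` are hypothetical names, `bx`/`by` play the role of `x0_blk0_size`/`y0_blk0_size`, and the arguments of the truncated second and fourth bf0 calls are assumed to mirror the visible `(x_M - x_m + 1)%(x0_blk0_size)` split used for x.

```python
from itertools import product


def bf_call(x_bs, x_M, x_m, y_bs, y_M, y_m):
    """Hypothetical model of one old bf0 call: blocked loops with UNclamped inner
    bounds, so [x_m, x_M] x [y_m, y_M] must be an exact multiple of the block size."""
    if x_bs == 0 or y_bs == 0:  # same early exit as the generated bf0
        return []
    points = []
    for xb in range(x_m, x_M + 1, x_bs):
        for yb in range(y_m, y_M + 1, y_bs):
            for x in range(xb, xb + x_bs):  # x <= x0_blk0 + x0_blk0_size - 1
                for y in range(yb, yb + y_bs):
                    points.append((x, y))
    return points


def old_schedule(x_m, x_M, y_m, y_M, bx, by):
    """Four calls: full blocks, y remainder, x remainder, corner (assumed argument order)."""
    rx = (x_M - x_m + 1) % bx  # width of the x remainder strip (0 if none)
    ry = (y_M - y_m + 1) % by
    return (bf_call(bx, x_M - rx, x_m, by, y_M - ry, y_m)
            + bf_call(bx, x_M - rx, x_m, ry, y_M, y_M - ry + 1)
            + bf_call(rx, x_M, x_M - rx + 1, by, y_M - ry, y_m)
            + bf_call(rx, x_M, x_M - rx + 1, ry, y_M, y_M - ry + 1))


def min_schedule(x_m, x_M, y_m, y_M, bx, by):
    """Single nest whose inner upper bounds are clamped with MIN, as in the new code."""
    if bx == 0 or by == 0:
        return []
    points = []
    for xb in range(x_m, x_M + 1, bx):
        for yb in range(y_m, y_M + 1, by):
            for x in range(xb, min(x_M, xb + bx - 1) + 1):  # MIN(x_M, x0_blk0 + x0_blk0_size - 1)
                for y in range(yb, min(y_M, yb + by - 1) + 1):
                    points.append((x, y))
    return points


# 10 x 7 domain with 4 x 3 blocks: neither extent is a multiple of its block size.
expected = list(product(range(10), range(7)))
assert sorted(old_schedule(0, 9, 0, 6, 4, 3)) == expected
assert sorted(min_schedule(0, 9, 0, 6, 4, 3)) == expected
```

Both schedules visit every point exactly once. The old one needs the modulo arithmetic, four specialized calls and the zero-size early exit for an empty remainder; the clamped nest absorbs the ragged edge into its loop bounds.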

codecov bot commented Apr 20, 2021

Codecov Report

Merging #1673 (6f243ac) into master (1471d53) will decrease coverage by 1.68%.
The diff coverage is 96.89%.

❗ Current head 6f243ac differs from pull request most recent head c38526b. Consider uploading reports for the commit c38526b to get more accurate results

@@            Coverage Diff             @@
##           master    #1673      +/-   ##
==========================================
- Coverage   89.60%   87.91%   -1.69%     
==========================================
  Files         206      201       -5     
  Lines       33163    32669     -494     
  Branches     4316     4281      -35     
==========================================
- Hits        29716    28722     -994     
- Misses       2958     3443     +485     
- Partials      489      504      +15     
Impacted Files                    Coverage Δ
devito/core/arm.py                48.14% <0.00%> (ø)
devito/ir/iet/visitors.py         80.71% <ø> (-0.19%) ⬇️
tests/test_dse.py                 94.51% <93.02%> (-5.29%) ⬇️
devito/passes/iet/misc.py         86.15% <95.45%> (-5.85%) ⬇️
tests/conftest.py                 90.54% <96.77%> (-0.92%) ⬇️
devito/core/autotuning.py         91.15% <100.00%> (-2.04%) ⬇️
devito/core/cpu.py                100.00% <100.00%> (ø)
devito/core/gpu.py                95.69% <100.00%> (+0.56%) ⬆️
devito/ir/iet/utils.py            80.95% <100.00%> (+5.51%) ⬆️
devito/passes/iet/parpragma.py    92.10% <100.00%> (-0.08%) ⬇️
... and 45 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1471d53...c38526b.


@georgebisbas self-assigned this Apr 21, 2021
tests/test_autotuner.py (outdated, resolved)
@georgebisbas mentioned this pull request Apr 21, 2021
@georgebisbas force-pushed the minmaxbounds branch 2 times, most recently from 0823044 to 754e1a5, April 26, 2021 15:44
@FabioLuporini (Contributor) left a comment:

It's not easy to follow the implementation of relax_incr_dimension.

You could try improving the comments; maybe you could also add a couple of inline examples, as I often do, for ease of explanation.

devito/core/autotuning.py (resolved)
devito/ir/iet/utils.py (resolved)
devito/passes/iet/misc.py (19 outdated threads, resolved)
devito/core/autotuning.py (resolved)
devito/core/gpu.py (outdated, resolved)
devito/operator/operator.py (outdated, resolved)
devito/passes/iet/misc.py (5 outdated threads, resolved)
examples/cfd/01_convection_revisited.ipynb (resolved)
tests/test_mpi.py (outdated, resolved)
tests/test_autotuner.py (outdated, resolved)
@georgebisbas force-pushed the minmaxbounds branch 2 times, most recently from 8254b84 to 8844b0a, June 28, 2021 09:33
@georgebisbas changed the title from "[RFC] compiler: Introducing min/max bounds to replace 'bf' elemental functions" to "compiler: Introducing min/max bounds to replace 'bf' elemental functions" Jun 28, 2021
def assert_structure(operator, exp_trees=None, exp_iters=None):
"""
Utility function that helps to check loop structure of IETs. Retrieves trees from an
Operator and check that blocking structure is as expected. Trees and Iterations are
Contributor: ... that the blocking structure...

Contributor Author: ok


Example:

To check the following structure:
Contributor: "To check that an Iteration tree as the following structure"

Contributor Author: ok, (*has the following.. )?


we call:

`trees, iters = assert_structure(op, ['t,x,y', 't,f,y'], 't,x,y,f,y')`
Contributor: nitpicking: you should use `.. code-block::` again, no?

Contributor Author: No, as far as I saw from other examples... I made some more changes to be consistent with the codebase; let me know whether you think it is better now.

Contributor Author: Ah, yes, I thought you were saying that we need an ending delimiter such as

code-block
code
code-block end

OK, all clear now.

tests/conftest.py (resolved)

we call:

`trees, bns = get_blocked_nests(op, {'x0_blk0', 'x1_blk0'})`
Contributor: same as before, use code-block?

Contributor Author: yes

tests/conftest.py (resolved)
tests/conftest.py (resolved)
@@ -1072,11 +1072,12 @@ def test_no_fusion_simple(self):
Eq(g, f + 1, implicit_dims=[ctime])]

op = Operator(eqns)
exprs = FindNodes(Expression).visit(op._func_table['bf0'].root)
_, bns = get_blocked_nests(op, {'x0_blk0', 'x1_blk0'})
Contributor: imho all these calls to assert_blocking want blank lines around them

Contributor Author: So, not as before, right?
i.e. we add two empty lines?

assert tree[3].dim.is_Incr and tree[3].dim.parent is tree[1].dim and\
tree[3].dim.root is y
assert not tree[4].dim.is_Incr and tree[4].dim is zi and tree[4].dim.parent is z
_, _ = assert_structure(op, ['t,i0x0_blk0,i0y0_blk0,i0x,i0y,i0z'])
Contributor: I don't know, you tell me?
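The assertions in the hunk above rely on blocked dimensions forming a parent chain: `tree[3]` is an Incr dimension whose parent is the block dimension at `tree[1]` and whose root is the original `y`, while `zi` is not Incr. A toy model of that chain, using a hypothetical `Dim` dataclass rather than Devito's own dimension classes and assuming the block dimension is named `y0_blk0`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # identity semantics, matching the `is` checks in the test
class Dim:
    """Hypothetical toy dimension modelling only name, parent and is_Incr."""
    name: str
    parent: Optional["Dim"] = None
    is_Incr: bool = False

    @property
    def root(self):
        """Follow parent links up to the original, unblocked dimension."""
        d = self
        while d.parent is not None:
            d = d.parent
        return d


y = Dim('y')
y0_blk0 = Dim('y0_blk0', parent=y, is_Incr=True)  # outer block loop over y
y_inner = Dim('y', parent=y0_blk0, is_Incr=True)  # intra-block loop (MIN-bounded)

assert y_inner.parent is y0_blk0 and y_inner.root is y
```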

tests/test_dse.py (resolved)
@mloubout (Contributor) left a comment:

Some comments, very nice!

devito/passes/iet/parpragma.py (resolved)
examples/cfd/01_convection_revisited.ipynb (resolved)
tests/conftest.py (outdated, resolved)
tests/conftest.py (resolved)
tests/test_dimension.py (resolved)
opt=('blocking', {'openmp': True,
'blockinner': blockinner,
'par-collapse-ncores': 1}))
for opi in [op, op2]:
Contributor: Why not parametrize over openmp?

Contributor Author: Wouldn't this redundantly run additional tests, since there is CI for both openmp and no-openmp?

Contributor: I don't see how; it won't change anything, it only splits this test into two cases rather than having them as an if/else inside the test.

Contributor Author: Ah yes, you are right. I pushed a change, let me know if you like it.

tests/test_skewing.py (resolved)
@georgebisbas (Contributor Author) left a comment:

Thanks for the review!
@FabioLuporini (Contributor) left a comment:

We're nearly ready to land!

tests/conftest.py (outdated, resolved)

if exp_trees is not None:
exp_trees = [i.replace(',', '') for i in exp_trees] # 't,x,y' -> 'txy'
tree_struc = (["".join(mapper.get(i.dim.name, i.dim.name) for i in j)
Contributor: Could you add a comment about what this line is doing, just like the one on the line above?

tests/conftest.py (resolved)
trees = retrieve_iteration_tree(operator)
for tree in trees:
iterations = [i for i in tree if i.dim.is_Incr] # Collect Incr dimensions
parallel_blocks = FindNodes(ParallelBlock).visit(tree)
Contributor: You only need this tree traversal if you get inside the `if iterations` branch, right? So move it below, inside the `try`?
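The loop in this hunk collects, per iteration tree, the Incr dimensions introduced by blocking. A minimal model of what such a check could look like, assuming each tree is a list of `(dim_name, is_incr)` pairs and that the outermost Incr dimension (e.g. `x0_blk0`) names the blocked nest; `blocked_nests` and `check_blocking` are hypothetical helpers, not the conftest code. Trees without Incr dimensions are skipped, so any further per-tree work only happens once `iterations` is non-empty, which is the reviewer's point.

```python
def blocked_nests(trees):
    """Map the outermost Incr dimension of each blocked tree to that tree (hypothetical)."""
    nests = {}
    for tree in trees:
        iterations = [name for name, is_incr in tree if is_incr]  # collect Incr dimensions
        if iterations:  # unblocked trees contribute nothing
            nests.setdefault(iterations[0], tree)
    return nests


def check_blocking(trees, exp_nests):
    """Assert that exactly the expected block dimensions appear (hypothetical)."""
    nests = blocked_nests(trees)
    assert set(nests) == set(exp_nests), (set(nests), exp_nests)
    return nests


# Two separately blocked nests, shaped like the {'x0_blk0', 'x1_blk0'} example.
trees = [
    [('time', False), ('x0_blk0', True), ('y0_blk0', True), ('x', True), ('y', True), ('z', False)],
    [('time', False), ('x1_blk0', True), ('y1_blk0', True), ('x', True), ('y', True), ('z', False)],
    [('time', False), ('i', False)],  # an unblocked loop
]
check_blocking(trees, {'x0_blk0', 'x1_blk0'})
```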

@@ -238,8 +238,7 @@ def test_multiple_blocking():
opt=('blocking', {'openmp': False}))

# First of all, make sure there are indeed two different loop nests
assert 'bf0' in op._func_table
assert 'bf1' in op._func_table
_, _ = assert_blocking(op, {'x0_blk0', 'x1_blk0'})
Contributor: Just `assert_blocking(op, {'x0_blk0', 'x1_blk0'})`; no need to catch the return value if it is never used!

Replicate this change wherever needed.

Contributor Author: ok

tests/test_dse.py (outdated, resolved)
tests/test_dse.py (outdated, resolved)
@georgebisbas removed the WIP (Still work in progress) label Jul 26, 2021
"""
mapper = {'time': 't'}
trees = retrieve_iteration_tree(operator)
iters = FindNodes(Iteration).visit(operator)
Contributor: this is not needed unless exp_iters is not None

Contributor Author: There are cases where the structure is simple, so checking exp_iters would be redundant and is skipped. There are also tests that use the Iterations later, so it is needed.

for f
for y

we call(Note: `time` mapped to `t`):
Contributor: space, or rather change the for t into for time in the code-block above?

@FabioLuporini (Contributor) left a comment:

I left a couple of tiny, last-minute comments, but to be fair I'm fine if they're addressed in the next PR. Approving -- well done! This is a really nice piece of work

assert len(trees) == 1
tree = trees[0]
assert len(tree) == exp_iters
opt=('blocking', {'openmp': True, 'blockinner': blockinner,
Contributor: openmp: openmp?

Contributor: nitpicking: blockinner could probably be moved to parametrize as well

Contributor Author: WDYM, it is already there?

exp_iters = exp_iters.replace(',', '') # 't,x,y' -> 'txy'
iter_struc = "".join(mapper.get(i.dim.name, i.dim.name) for i in iters)
assert iter_struc == exp_iters

return trees, iters
return
Contributor: no need
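The comparison performed by `assert_structure` boils down to flattening dimension names into strings. The sketch below models it on plain Python lists so it runs without Devito. `flatten_structure` and `check_structure` are hypothetical names, trees are assumed to be lists of dimension-name strings rather than `Iteration` nodes, and the list-equality check on `exp_trees` is an assumption because the diff line building `tree_struc` is cut off; only the `'time' -> 't'` mapping and the comma stripping come from the PR.

```python
MAPPER = {'time': 't'}  # same renaming as `mapper` in assert_structure


def flatten_structure(dim_names, mapper=MAPPER):
    """['time', 'x', 'y'] -> 'txy', mirroring the "".join(mapper.get(...)) in the diff."""
    return "".join(mapper.get(name, name) for name in dim_names)


def check_structure(trees, iters, exp_trees=None, exp_iters=None):
    """Hypothetical stand-in for assert_structure operating on dimension-name lists."""
    if exp_trees is not None:
        exp_trees = [i.replace(',', '') for i in exp_trees]  # 't,x,y' -> 'txy'
        tree_struc = [flatten_structure(tree) for tree in trees]
        assert tree_struc == exp_trees, (tree_struc, exp_trees)
    if exp_iters is not None:
        exp_iters = exp_iters.replace(',', '')  # 't,x,y,f,y' -> 'txyfy'
        iter_struc = flatten_structure(iters)
        assert iter_struc == exp_iters, (iter_struc, exp_iters)
    return trees, iters


# The docstring example: two trees sharing the time loop.
trees = [['time', 'x', 'y'], ['time', 'f', 'y']]
iters = ['time', 'x', 'y', 'f', 'y']  # each Iteration listed once, time included once
check_structure(trees, iters, ['t,x,y', 't,f,y'], 't,x,y,f,y')
```

Since every Iteration appears once in `iters` while the shared time loop appears in every tree, the expected iteration string `'t,x,y,f,y'` lists `t` only once.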

Development

Successfully merging this pull request may close these issues.

compiler: buggy compose_nodes, drops HaloSpots
3 participants