Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Revamp compilation of halo exchanges #2089

Merged
merged 36 commits into from Apr 5, 2023
Merged

Conversation

FabioLuporini
Copy link
Contributor

This PR performs a massive re-engineering of how halo exchanges are detected, constructed, and lowered inside Devito

Why: in master, halo exchanges are detected in the stree layer, that is, after all clusters-level passes have been performed. The issue is that some of these passes "clutter" the equations/clusters (after all, we're lowering) by e.g. introducing temporaries and changing the index access functions. This has the effect of enormously complicating data dependence analysis. A complexity exploded when I added CUDA/HIP shared memory.

What: change of paradigm. Now detecting halo exchanges is one of the first things the compiler does, and it does that at the clusters level. A halo exchange is here represented simply as yet another Cluster. It could not have been metadata attached to Cluster because Clusters evolve during lowering, which would take us back to the problem we're trying to solve here. A "floating" Cluster is the way to go, in my opinion. I failed to see this mechanism for several years, until something flipped in my mind when I was at Rice.

How: significant code changes, but in the end, many are simplifications

This PR is not finished yet, there are a couple of small TODOs to deal with, but I'd like people to start reviewing because we'll have to merge it soon (there has been a demand for CUDA/HIP + MPI numbers for weeks...)

@codecov
Copy link

codecov bot commented Mar 16, 2023

Codecov Report

Merging #2089 (536f560) into master (d0e705d) will decrease coverage by 0.08%.
The diff coverage is 80.21%.

@@            Coverage Diff             @@
##           master    #2089      +/-   ##
==========================================
- Coverage   87.80%   87.72%   -0.08%     
==========================================
  Files         221      221              
  Lines       38665    38951     +286     
  Branches     5000     5061      +61     
==========================================
+ Hits        33951    34171     +220     
- Misses       4164     4213      +49     
- Partials      550      567      +17     
Impacted Files Coverage Δ
devito/ir/support/syncs.py 61.53% <0.00%> (ø)
devito/symbolics/extended_sympy.py 95.82% <ø> (-0.04%) ⬇️
devito/types/parallel.py 85.62% <ø> (ø)
tests/test_dle.py 96.27% <ø> (ø)
devito/arch/compiler.py 41.06% <4.34%> (-1.28%) ⬇️
devito/passes/clusters/asynchrony.py 11.64% <25.00%> (ø)
devito/passes/iet/instrument.py 72.72% <42.10%> (-16.47%) ⬇️
devito/operator/operator.py 89.13% <50.00%> (-1.53%) ⬇️
devito/mpi/distributed.py 90.23% <61.11%> (-1.88%) ⬇️
devito/passes/clusters/blocking.py 89.76% <65.90%> (-4.23%) ⬇️
... and 23 more

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have a look end of next week

devito/ir/clusters/algorithms.py Show resolved Hide resolved
@FabioLuporini FabioLuporini force-pushed the mpi-revamp branch 3 times, most recently from fa84374 to 0dc22be Compare March 21, 2023 15:16
@FabioLuporini FabioLuporini changed the title Revamp compilation of halo exchanges compiler: Revamp compilation of halo exchanges Mar 23, 2023
guards=kwargs.get('guards', self.guards),
properties=kwargs.get('properties', self.properties),
syncs=kwargs.get('syncs', self.syncs))
return self.__class__(exprs=kwargs.get('exprs', self.exprs),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reconstruction of potential Cluster subclasses

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we ratehr have a _rebuild ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well that's how the method is called :)

@@ -1,75 +1,61 @@
from collections import defaultdict
from itertools import groupby

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why empty line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's the standard adopted throughout the entirety of devito

...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@georgebisbas sorry about the three "..." above which I'm sure sounds a bit blunt -- but something must have gone south while github was (partly) down yesterday; probably it picked my "Comment" when it was still half baked

I had written something along the lines of:

<imports from stdling>
<blankline>
<imports from third parties>
<blankline>
<local import>

definitely not "..."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nw, my main fault is that I thought anytree was considered standard. I overall have the feeling that there should be files not following the first blankline but maybe I am wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, they might have escaped reviewing, fixable as we encounter them

gflopss = "%.2f GFlops/s" % fround(v.gflopss)
gpointss = "%.2f GPts/s" % fround(v.gpointss) if v.gpointss else None
metrics = ", ".join(i for i in [oi, gflopss, gpointss] if i is not None)
if v.gflopss:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does this happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data streaming sections which only trigger data transfers (so they take up time) but no compute

@FabioLuporini FabioLuporini force-pushed the mpi-revamp branch 2 times, most recently from 5b0009e to f71fc50 Compare March 27, 2023 09:02
@@ -23,6 +22,11 @@ class IndexMode(Tag):
REGULAR = IndexMode('regular')
IRREGULAR = IndexMode('irregular')

Delta = Symbol(name='Δ')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refrain from using symbols with FD meaning as they may be used to define Function/....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK replacing with ⋈

Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry took a bit to get back to it

Overall I think it's GTG, seem to be cleaning up a lot. Some minor comments only

't,x0_blk0,y0_blk0,x,y,z,z')

def init(f, v=1):
f.data[:] = np.indices(grid.shape).sum(axis=0) % (.004*v) + .01
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting choice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the numbers just random?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely not random

metrics = ", ".join(i for i in [oi, gflopss, gpointss] if i is not None)
if v.gflopss:
oi = "OI=%.2f" % fround(v.oi)
gflopss = "%.2f GFlops/s" % fround(v.gflopss)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think someone asked for .3f somewhere in slack but not important

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was probably John, but I honestly prefer .2f. When would three digits be useful? Anyway, we can vote!

__str__ = __repr__

def _sympystr(self, printer):
return str(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have a printer shouldn't it be printer.doprint(self) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly, but probably not, pretty sure I tried it but didn't eventually obtained what I wanted. Maybe because the printer lacks the handle for self, that's why we end up here? Anyway, this is just for the repr within larger expressions

devito/mpi/distributed.py Show resolved Hide resolved
@@ -105,7 +105,7 @@ def sniff_mpi_distro(mpiexec):


@memoized_func
def sniff_mpi_flags():
def sniff_mpi_flags(mpicc='mpicc'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC in the CUDA/HIP compiler classes

devito/ir/clusters/algorithms.py Show resolved Hide resolved
return ElementalFunction(name, iet, retval, derive_parameters(iet), prefix,
dynamic_parameters)
return ElementalFunction(name, iet, retval=retval,
parameters=derive_parameters(iet), prefix=prefix,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would lift this out of the call, probably cleaner? parameters=derive_parameters(iet),

Comment on lines 249 to 252
DiscreteFunction: 1,
AbstractIncrDimension: 2,
BlockDimension: 3,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understood this; could you help with a brief explanation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's explained in the comment:

https://github.com/devitocodes/devito/blob/master/devito/passes/iet/engine.py#L240

if I have an object X that, among its fields, has a BlockDimension Y, then before attempting to reconstruct a new X I will need to have a reconstruction for Y, has "X depends on Y", in a logical sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember I had a similar approach in my time-tiling, but was considered a kinda-hack :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thankfully GitHub preserves history!

Can you link me to that discussion please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a verbal/slack discussion AFAIR since I did not open this PR.
My point was not to bring it up this way.
AFAIR:
I wanted to bring in the concept of a new class of blocking dimensions that had additional min/max info for handling main/remainder areas. As in having the necessary info for that at cluster level.

Then in, IET-level, when relaxing, I would need to reconstruct both parent-child block dims as the min of child was dependent upon the main-remainder of father.

Long story short, I had the impression that we should not do sth like this. But I am fine.
At least what you do is clean/clear, mine was not really.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other places in the framework where something similar is being done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure honestly, maybe? I mean, the concept of prioritizing processing is fairly widespread

Comment on lines +326 to +339
p = i.parent
pp = i.parent.parent

name0 = pp.name
base = sregistry.make_name(prefix=name0)
name1 = sregistry.make_name(prefix='%s_blk' % base)

bd = i.parent._rebuild(name1, pp)
d = i._rebuild(name0, bd, i._min.subs(p, bd), i._max.subs(p, bd))

mapper.update({
i: d,
i.parent: bd
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks weird. Why is that happening?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because if you reconstruct using i as "stamp", you better be sure to also replace all of its parent occurrences, or you end up with undefined symbols

@@ -404,49 +404,6 @@ def test_cache_blocking_hierarchical(blockshape0, blockshape1, exception):

@pytest.mark.parametrize("blockinner", [False, True])
def test_cache_blocking_imperfect_nest(blockinner):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort-of; now it's in test_mpi, though under a slightly different vest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, and thanks for making me notice: I had originally removed it because, yes, as I said, I was going to move it to test_mpi; but then I rolled most of the temporary edits back; hence it does make sense to keep this test where it was; so, reinstated.

't,x0_blk0,y0_blk0,x,y,z,z')

def init(f, v=1):
f.data[:] = np.indices(grid.shape).sum(axis=0) % (.004*v) + .01
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the numbers just random?

@FabioLuporini
Copy link
Contributor Author

@mloubout @georgebisbas edits pushed, most comments addressed

@mloubout : yes, this PR indeed ships significant tidy up

@@ -289,19 +302,10 @@ def is_cross(dep):
break

# Any anti- and iaw-dependences impose that `cg1` follows `cg0`
# while not being its immediate successor (unless it already is),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, this is massive

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I have more to add. I think I asked most of the things that I could not understand. I will approve, but happy to re-review if @FabioLuporini you think there are parts I should pay more attention

@FabioLuporini FabioLuporini merged commit c506dca into master Apr 5, 2023
34 checks passed
@FabioLuporini FabioLuporini deleted the mpi-revamp branch April 5, 2023 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants