New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Revamp compilation of halo exchanges #2089
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2089 +/- ##
==========================================
- Coverage 87.80% 87.72% -0.08%
==========================================
Files 221 221
Lines 38665 38951 +286
Branches 5000 5061 +61
==========================================
+ Hits 33951 34171 +220
- Misses 4164 4213 +49
- Partials 550 567 +17
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have a look end of next week
fa84374
to
0dc22be
Compare
guards=kwargs.get('guards', self.guards), | ||
properties=kwargs.get('properties', self.properties), | ||
syncs=kwargs.get('syncs', self.syncs)) | ||
return self.__class__(exprs=kwargs.get('exprs', self.exprs), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reconstruction of potential Cluster subclasses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we ratehr have a _rebuild
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well that's how the method is called :)
@@ -1,75 +1,61 @@ | |||
from collections import defaultdict | |||
from itertools import groupby | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why empty line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the standard adopted throughout the entirety of devito
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@georgebisbas sorry about the three "..." above which I'm sure sounds a bit blunt -- but something must have gone south while github was (partly) down yesterday; probably it picked my "Comment" when it was still half baked
I had written something along the lines of:
<imports from stdling>
<blankline>
<imports from third parties>
<blankline>
<local import>
definitely not "..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nw, my main fault is that I thought anytree was considered standard. I overall have the feeling that there should be files not following the first blankline but maybe I am wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, they might have escaped reviewing, fixable as we encounter them
gflopss = "%.2f GFlops/s" % fround(v.gflopss) | ||
gpointss = "%.2f GPts/s" % fround(v.gpointss) if v.gpointss else None | ||
metrics = ", ".join(i for i in [oi, gflopss, gpointss] if i is not None) | ||
if v.gflopss: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data streaming sections which only trigger data transfers (so they take up time) but no compute
5b0009e
to
f71fc50
Compare
devito/ir/support/basic.py
Outdated
@@ -23,6 +22,11 @@ class IndexMode(Tag): | |||
REGULAR = IndexMode('regular') | |||
IRREGULAR = IndexMode('irregular') | |||
|
|||
Delta = Symbol(name='Δ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would refrain from using symbols with FD meaning as they may be used to define Function/....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK replacing with ⋈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry took a bit to get back to it
Overall I think it's GTG, seem to be cleaning up a lot. Some minor comments only
't,x0_blk0,y0_blk0,x,y,z,z') | ||
|
||
def init(f, v=1): | ||
f.data[:] = np.indices(grid.shape).sum(axis=0) % (.004*v) + .01 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the numbers just random?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely not random
metrics = ", ".join(i for i in [oi, gflopss, gpointss] if i is not None) | ||
if v.gflopss: | ||
oi = "OI=%.2f" % fround(v.oi) | ||
gflopss = "%.2f GFlops/s" % fround(v.gflopss) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think someone asked for .3f
somewhere in slack but not important
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was probably John, but I honestly prefer .2f. When would three digits be useful? Anyway, we can vote!
__str__ = __repr__ | ||
|
||
def _sympystr(self, printer): | ||
return str(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you have a printer shouldn't it be printer.doprint(self)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly, but probably not, pretty sure I tried it but didn't eventually obtained what I wanted. Maybe because the printer lacks the handle for self
, that's why we end up here? Anyway, this is just for the repr
within larger expressions
@@ -105,7 +105,7 @@ def sniff_mpi_distro(mpiexec): | |||
|
|||
|
|||
@memoized_func | |||
def sniff_mpi_flags(): | |||
def sniff_mpi_flags(mpicc='mpicc'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC in the CUDA/HIP compiler classes
return ElementalFunction(name, iet, retval, derive_parameters(iet), prefix, | ||
dynamic_parameters) | ||
return ElementalFunction(name, iet, retval=retval, | ||
parameters=derive_parameters(iet), prefix=prefix, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would lift this out of the call, probably cleaner? parameters=derive_parameters(iet),
DiscreteFunction: 1, | ||
AbstractIncrDimension: 2, | ||
BlockDimension: 3, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understood this; could you help with a brief explanation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's explained in the comment:
https://github.com/devitocodes/devito/blob/master/devito/passes/iet/engine.py#L240
if I have an object X that, among its fields, has a BlockDimension Y, then before attempting to reconstruct a new X I will need to have a reconstruction for Y, has "X depends on Y", in a logical sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember I had a similar approach in my time-tiling, but was considered a kinda-hack :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thankfully GitHub preserves history!
Can you link me to that discussion please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a verbal/slack discussion AFAIR since I did not open this PR.
My point was not to bring it up this way.
AFAIR:
I wanted to bring in the concept of a new class of blocking dimensions that had additional min/max info for handling main/remainder areas. As in having the necessary info for that at cluster level.
Then in, IET-level, when relaxing, I would need to reconstruct both parent-child block dims as the min of child was dependent upon the main-remainder of father.
Long story short, I had the impression that we should not do sth like this. But I am fine.
At least what you do is clean/clear, mine was not really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other places in the framework where something similar is being done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure honestly, maybe? I mean, the concept of prioritizing processing is fairly widespread
p = i.parent | ||
pp = i.parent.parent | ||
|
||
name0 = pp.name | ||
base = sregistry.make_name(prefix=name0) | ||
name1 = sregistry.make_name(prefix='%s_blk' % base) | ||
|
||
bd = i.parent._rebuild(name1, pp) | ||
d = i._rebuild(name0, bd, i._min.subs(p, bd), i._max.subs(p, bd)) | ||
|
||
mapper.update({ | ||
i: d, | ||
i.parent: bd | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks weird. Why is that happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if you reconstruct using i
as "stamp", you better be sure to also replace all of its parent occurrences, or you end up with undefined symbols
tests/test_dle.py
Outdated
@@ -404,49 +404,6 @@ def test_cache_blocking_hierarchical(blockshape0, blockshape1, exception): | |||
|
|||
@pytest.mark.parametrize("blockinner", [False, True]) | |||
def test_cache_blocking_imperfect_nest(blockinner): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort-of; now it's in test_mpi, though under a slightly different vest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, and thanks for making me notice: I had originally removed it because, yes, as I said, I was going to move it to test_mpi; but then I rolled most of the temporary edits back; hence it does make sense to keep this test where it was; so, reinstated.
't,x0_blk0,y0_blk0,x,y,z,z') | ||
|
||
def init(f, v=1): | ||
f.data[:] = np.indices(grid.shape).sum(axis=0) % (.004*v) + .01 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the numbers just random?
1ddb0af
to
78905e0
Compare
@mloubout @georgebisbas edits pushed, most comments addressed @mloubout : yes, this PR indeed ships significant tidy up |
@@ -289,19 +302,10 @@ def is_cross(dep): | |||
break | |||
|
|||
# Any anti- and iaw-dependences impose that `cg1` follows `cg0` | |||
# while not being its immediate successor (unless it already is), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, this is massive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I have more to add. I think I asked most of the things that I could not understand. I will approve, but happy to re-review if @FabioLuporini you think there are parts I should pay more attention
6a40d52
to
edb859d
Compare
This PR performs a massive re-engineering of how halo exchanges are detected, constructed, and lowered inside Devito
Why: in master, halo exchanges are detected in the
stree
layer, that is, after all clusters-level passes have been performed. The issue is that some of these passes "clutter" the equations/clusters (after all, we're lowering) by e.g. introducing temporaries and changing the index access functions. This has the effect of enormously complicating data dependence analysis. A complexity exploded when I added CUDA/HIP shared memory.What: change of paradigm. Now detecting halo exchanges is one of the first things the compiler does, and it does that at the
clusters
level. A halo exchange is here represented simply as yet another Cluster. It could not have been metadata attached to Cluster because Clusters evolve during lowering, which would take us back to the problem we're trying to solve here. A "floating" Cluster is the way to go, in my opinion. I failed to see this mechanism for several years, until something flipped in my mind when I was at Rice.How: significant code changes, but in the end, many are simplifications
This PR is not finished yet, there are a couple of small TODOs to deal with, but I'd like people to start reviewing because we'll have to merge it soon (there has been a demand for CUDA/HIP + MPI numbers for weeks...)