Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce loop cloning in fast followers #16229

Merged
merged 11 commits into from Aug 13, 2020

Conversation

e-kayrakli
Copy link
Contributor

@e-kayrakli e-kayrakli commented Aug 12, 2020

This PR adjusts the module support for fast followers to reduce the amount of
loop cloning we do.

Initially attempted at #16001
Also see #16053, which has done
similar reductions in loop cloning that happens with fast followers

Motivation

The static check that we do today is way too lax. Because for a leader that
supports fast followers, we always have a fast loop. However that loop is unused
if any of the followers don't support fast followers and/or not of same type as
the leader.

This decision seems to be made in 2010 when fast followers were first
implemented. I assume that compiler wasn't this powerful and slow and/or there
wasn't param unfolding etc.

The commit where this was added is 0dd8331.

Citing part of it that alludes to this decision:

This optimization works as follows: When 'parsing' a forall loop, we
generate code that generates two loops in a conditional, a fast
follower loop and a normal follower loop. The conditional is guarded
by or'ing the result of calling the static checks on all of the
followers, and then and'ing the result of calling the dynamic checks
on all of the followers. That is, if any follower has a fast version
that may be called, let's generate the code, but we're only going to
run the fast loop (all fast followers) if every follower that may be
called should be called.

Where there is no strong argument for having this behavior.

I believe these type of false optimizations are the main reason for the
performance overhead of fast followers.

Approach

The fast follower implementation in general is too complicated. I think the
reason for that was we were overloading the meaning of static and dynamic checks
probably to reduce the amount of code. These checks return a bool and thus can
be used for binary choices. However, we have a more complicated taxonomy of
types w.r.t fast followers:

  • Arrays

    • Array types that supports fast followers: can lead and generate fast
      followers
    • Array types that don't support fast followers: cannot lead or generate fast
      followers
  • Domains: Can lead fast followers. Can take place among other fast followers in
    a zippered loop.

  • Other iterators: Cannot lead or generate fast followers. However, can take
    place among other fast followers in a zippered loop

This PR adds three helpers (and their overloads) to distinguish these:

  • chpl__canHaveFastFollowers: Returns true for an array, or a zip tuple that
    has at least one array. (This check doesn't involve the specific type of the
    array, it is still handled by chpl__staticFastFollowCheck

  • chpl__canLeadFastFollowers: Returns true for arrays and domains.

  • chpl__hasInertFastFollowers: Returns false for arrays. For a zip tuple, it
    returns false if there is one array.

These helpers are used in the other existing checks to limit loop cleaning and
improve readability.

Performance

For benchmarks where fast followers are used heavily (stream, prk-stencil) there
isn't any change in compiler or application performance. We see small
improvements in benchmarks where we create fast followers unnecessarily
(cross-type array assignments/zippering are examples where this may happen).

On miniMD: I see 3 seconds faster --fast compilation on XC and locally
On SSCA2: I see 6 seconds faster --fast compilation locally, but not so much of
an improvement on XC.

The reduction in the code size is there but its impact on performance is a bit
of a hit-and-miss and depends on where you compile.

Readability Advantages

  • Static and dynamic checks are more symmetrical
  • Static check is and'ed instead of or'ed. i.e. we only have a clone if it makes
    sense
  • Dynamic check no longer returns true for generic argument.

Test:

  • standard
  • gasnet

Copy link
Contributor

@ronawho ronawho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Left 2 minor nits, but I think the core of this looks great.

@e-kayrakli e-kayrakli merged commit c58587f into chapel-lang:master Aug 13, 2020
@e-kayrakli e-kayrakli deleted the static-fast-follower branch August 13, 2020 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants