[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize (llvm#67657)

The znver3/znver4 scheduler models are outliers, specifying very large
LoopMicroOpBufferSizes at 512, while typical values for other subtargets
are on the order of 50. Even if this information is
micro-architecturally correct (*), this does not mean that we want to
runtime unroll all loops to a size that completely fills the loop
buffer. Unless this is the single hot loop in the entire application,
the massive code size increase will bust the micro-op and instruction
caches.

Protect against this by clamping to the default PartialThreshold of 150,
which is the same as the default full-unroll threshold and half the
aggressive full-unroll threshold. Allowing more partial unrolling than
full unrolling certainly does not make sense.
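
The selection logic described above can be sketched in isolation. This is a minimal illustration, not the LLVM code itself: `pickMaxOps` and its parameters are hypothetical names, `std::optional` stands in for "the command-line option was given" and for the early `return`, and the value 150 is the default PartialThreshold quoted in this message.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical stand-in for the threshold selection; not part of LLVM's API.
// userThreshold:           explicit override, if one was given (not clamped).
// loopMicroOpBufferSize:   value from the scheduler model, 0 if unspecified.
// defaultPartialThreshold: assumed to be 150, as stated in the commit message.
std::optional<unsigned> pickMaxOps(std::optional<unsigned> userThreshold,
                                   unsigned loopMicroOpBufferSize,
                                   unsigned defaultPartialThreshold) {
  if (userThreshold)
    return *userThreshold;
  if (loopMicroOpBufferSize > 0)
    // Clamp: a very large loop buffer no longer raises the limit
    // above the default partial-unroll threshold.
    return std::min(loopMicroOpBufferSize, defaultPartialThreshold);
  // No information available: leave the unrolling preferences untouched.
  return std::nullopt;
}
```

Under these assumptions, a buffer size of 512 (the znver3/znver4 value) yields 150, a typical value of 50 is passed through unchanged, and an explicit override is still honored as given.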

(*) I strongly doubt that this is actually correct -- I believe this may
derive from an incorrect reading of Agner Fog's micro-architecture
guide. The number 4096 that was originally used here is the size of the
general micro-op cache, not that of a loop buffer. A separate loop
buffer is not listed for the Zen microarchitecture. Comparing this to
the listing for Skylake, it has a 1536 micro-op buffer, but only a 64
micro-op loopback buffer, with a note that it's rarely fully utilized.
Our scheduling model specifies LoopMicroOpBufferSize of 50 in that case.
nikic committed May 16, 2024
1 parent 72200fc commit f0b3654
Showing 2 changed files with 30 additions and 742 deletions.
8 changes: 7 additions & 1 deletion llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -612,7 +612,13 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     if (PartialUnrollingThreshold.getNumOccurrences() > 0)
       MaxOps = PartialUnrollingThreshold;
     else if (ST->getSchedModel().LoopMicroOpBufferSize > 0)
-      MaxOps = ST->getSchedModel().LoopMicroOpBufferSize;
+      // Upper bound by the default PartialThreshold, which is the same as
+      // the default full-unroll Threshold. Even if the loop micro-op buffer
+      // is very large, this does not mean that we want to unroll all loops
+      // to that length, as it would increase code size beyond the limits of
+      // what unrolling normally allows.
+      MaxOps = std::min(ST->getSchedModel().LoopMicroOpBufferSize,
+                        UP.PartialThreshold);
     else
       return;
