Skip to content

Cranelift: follow-up analysis of e-graph rewrite blow-up in #13068 #13204

@myunbin

Description

@myunbin

The original problem was shown here: #13068.

Problem

After testing the fuzzer-generated input foo.clif, we identified three rule groups introduced by #12926 (commit 1578325) as candidates:

  • rule A: ((x + y) - (x + z)) -> (y - z)
  • rule B: ((x - z) - (y - z)) -> (x - y)
  • rule C: ((x - y) - (x - z)) -> (z - y)

The regression disappears as soon as both Rule B and Rule C are removed from 1578325.

benchmark details
  • Baseline commit: 1578325
  • Measurement command: CRANELIFT_FILETESTS_THREADS=1 /usr/bin/time -p <clif-util> test <foo.clif>
  • Runs per scenario: 5
scenario removed groups median real (s) mean real (s) min max stdev stalled slow vs baseline
minus_group_ab A + B 31.11 31.196 30.96 31.52 0.228 5 5 1.03x
minus_group_ac A + C 31.21 31.230 31.02 31.63 0.218 5 5 1.03x
minus_group_bc B + C 0.17 0.164 0.14 0.18 0.016 0 0 0.01x
minus_group_abc A + B + C 0.15 0.156 0.14 0.18 0.016 0 0 0.00x

Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.

Observation

foo.clif summary

    ;; --- foo.clif:124-136  (repeated ishl values sharing v214) ---
    v214 = ishl v211, v213
    v215 = ishl v0,   v214
    v216 = ishl v215, v214
    v217 = ishl v216, v214
    v218 = ishl v217, v214
    v219 = ishl v218, v214
    v220 = ishl v219, v214
    v221 = ishl v220, v214
    v222 = ishl v221, v214
    v223 = ishl v222, v214
    v224 = ishl v223, v214
    v225 = ishl v224, v214
    v226 = ishl v225, v214

    ;; --- foo.clif:155  (selects.isle:4 aliases v242 to v226) ---
    v242 = select v4, v226, v226

    ;; --- foo.clif:186-242  (48 repeated isub(v, v) values on i64x2) ---
    v247 = isub v242, v242
    v248 = isub v247, v247
    v249 = isub v248, v248
    v250 = isub v249, v249

    ... 43 more steps ...
    
    v293 = isub v292, v292
    v294 = isub v293, v293

The fuzzer has two main structure:

  1. Repeated ishl values sharing a common shift amount (v214),
  2. Repeated isub.i64x2(v, v) (48 times).

Additional observations:

  1. The regression was resolved after vector support in Support vector types in iconst_{u,s} #13063 (f2807a1):

    ;; x - x --> 0
    (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))
    

    isub.i64x2 v v collapses to 0 at the root.

  2. Under 1578325 (Rule B/C active, pre-Support vector types in iconst_{u,s} #13063 behavior), this rule fires ~3,000,000 times on a single rule:

    ;; (x<<z) - (y<<z) --> (x-y)<<z
    (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
    

    Let us call Rule D.

Causality

From v_{k+1} = isub(v_k, v_k), where v_k = isub(v_{k-1}, v_{k-1}), one level of unfolding gives:

v_{k+1}
  = isub(v_k, v_k)
  = isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1}))
  = isub(v_{k-1}, v_{k-1})                    ; Rule B

Result is the earlier expression, v_k. This means that rules can return the earlier value as an equivalent value while optimizing v_{k+1}. This equivalence is represented by a Union value, unless the rewrite subsumes the original or hits the eclass-size limit.

For example:

v252 = isub(v251, v251)
v251 = isub(v250, v250)

-- 

v252 = isub(v251, v251)
     = isub(isub(v250, v250), isub(v250, v250))
     = isub(v250, v250)                           ; Rule B
     = v251

Rule B/C can make a later repeated isub(v, v) value equivalent to an earlier one.

(rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))

Rule D matches when both operands can be viewed as ishl expressions with the same shift amount. For example,

v247 = isub(v242, v242)
     = isub(v226, v226)
     = isub(ishl(v225, v214), ishl(v225, v214))
     = ishl(isub(v225, v225), v214)              ; Rule D

B/C add the extra effect that a later value can become equivalent to an earlier repeated isub value:

v248 = isub(v247, v247)
     = isub(isub(v242, v242), isub(v242, v242))
     = isub(v242, v242)                          ; Rule B
     = v247

So later values can see both locally-created forms and forms already associated with earlier values such as v247.

  • Without Rule B/C, Rule D can still fire. But later repeated isub(v, v) values do not get rewritten back to earlier repeated isub values. Rule D is also bounded by the rewrite-depth limit.
  • With Rule B/C, later values can also be equivalent to earlier values, so Rule D gets many more valid bindings.

In short:

  1. Rule B/C can rewrite a later isub(v, v) to a form already represented by an earlier value.
  2. Rule D then has more equivalent forms to match against.

Limits for egraph rewriting

The existing limits are local:

  • REWRITE_LIMIT bounds recursive optimization.
  • ECLASS_ENODE_LIMIT bounds the size of one tree.
  • MATCHES_LIMIT truncates results from one ISLE call.

Each of the 48 isub(v, v) values is optimized separately. Rule B can rewrite a later one back to the previous isub(v, v) value, so each local rewrite can match forms recorded for earlier values.

Additional off-by-one: REWRITE_LIMIT = 5 is checked before incrementing rewrite_depth(which starts from 0), so the maximum observed depth is 6.

Evidence

Per-top-level Rule D fire counts (first few repeated isub(v, v) values):

top-level fires
v247 6
v248 6
v249 44
v251 138

This growth pattern accumulates to observed 3.3M total fires.

Trace evidence that an earlier optimized value is returned again for later values(trace-log):

1123:  optimizing inst inst132 orig result v247 gave v830
1269:  -> returned from ISLE: v248 -> [v830, v852]
1904:  -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]

v830 appearing in later return lists shows that an earlier optimized value is returned again as an equivalent value for later inputs.

Open question

Our interpretation is that this is not just a problem with one rewrite rule, but a compile-time risk in how e-graphs(and Cranelift's aegraph) expose accumulated equivalent alternatives during matching.

Does this interpretation sound reasonable, or would you think about this failure mode differently?

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugIncorrect behavior in the current implementation that needs fixingcraneliftIssues related to the Cranelift code generator

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions