Cranelift: follow-up analysis of e-graph rewrite blow-up in #13068

The original problem was shown here: #13068.

## Problem

After testing the fuzzer-generated input [`foo.clif`](https://gist.github.com/alexcrichton/7fe97b9ede4da45ac25e80609b84303d), we identified three rule groups introduced by #12926 (commit `1578325`) as candidates:

- rule A: `((x + y) - (x + z)) -> (y - z)`
- rule B: `((x - z) - (y - z)) -> (x - y)`
- rule C: `((x - y) - (x - z)) -> (z - y)`

The regression disappears as soon as both Rule B and Rule C are removed from `1578325`.

<details>
<summary>benchmark details</summary>

- Baseline commit: `1578325`
- Measurement command: `CRANELIFT_FILETESTS_THREADS=1 /usr/bin/time -p <clif-util> test <foo.clif>`
- Runs per scenario: `5`

| scenario | removed groups | median real (s) | mean real (s) | min | max | stdev | stalled | slow | vs baseline |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| minus_group_ab  | A + B     | 31.11 | 31.196 | 30.96 | 31.52 | 0.228 | 5 | 5 | 1.03x |
| minus_group_ac  | A + C     | 31.21 | 31.230 | 31.02 | 31.63 | 0.218 | 5 | 5 | 1.03x |
| minus_group_bc  | B + C     |  0.17 |  0.164 |  0.14 |  0.18 | 0.016 | 0 | 0 | 0.01x |
| minus_group_abc | A + B + C |  0.15 |  0.156 |  0.14 |  0.18 | 0.016 | 0 | 0 | 0.00x |

Leaving either Rule B or Rule C active still blows up, so each of them is independently sufficient to trigger the regression. Group A has no measurable impact for this testcase.

</details>

## Observation

### `foo.clif` summary

```
    ;; --- foo.clif:124-136  (repeated ishl values sharing v214) ---
    v214 = ishl v211, v213
    v215 = ishl v0,   v214
    v216 = ishl v215, v214
    v217 = ishl v216, v214
    v218 = ishl v217, v214
    v219 = ishl v218, v214
    v220 = ishl v219, v214
    v221 = ishl v220, v214
    v222 = ishl v221, v214
    v223 = ishl v222, v214
    v224 = ishl v223, v214
    v225 = ishl v224, v214
    v226 = ishl v225, v214

    ;; --- foo.clif:155  (selects.isle:4 aliases v242 to v226) ---
    v242 = select v4, v226, v226

    ;; --- foo.clif:186-242  (48 repeated isub(v, v) values on i64x2) ---
    v247 = isub v242, v242
    v248 = isub v247, v247
    v249 = isub v248, v248
    v250 = isub v249, v249

    ... 43 more steps ...
    
    v293 = isub v292, v292
    v294 = isub v293, v293
```

The fuzzer has two main structure:

1. Repeated `ishl` values sharing a common shift amount (`v214`),
2. Repeated `isub.i64x2(v, v)` (48 times).

Additional observations:

1. The regression was resolved after vector support in #13063 (`f2807a1`):

   ```
   ;; x - x --> 0
   (rule (simplify (isub (ty_int ty) x x)) (subsume (iconst_u ty 0)))
   ```

   `isub.i64x2 v v` collapses to `0` at the root.

2. Under `1578325` (Rule B/C active, pre-#13063 behavior), this rule fires ~3,000,000 times on a single rule:

   ```
   ;; (x<<z) - (y<<z) --> (x-y)<<z
   (rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
   ```
   Let us call Rule D.

## Causality

From `v_{k+1} = isub(v_k, v_k)`, where `v_k = isub(v_{k-1}, v_{k-1})`, one level of unfolding gives:

```
v_{k+1}
  = isub(v_k, v_k)
  = isub(isub(v_{k-1}, v_{k-1}), isub(v_{k-1}, v_{k-1}))
  = isub(v_{k-1}, v_{k-1})                    ; Rule B
```

Result is the earlier expression, `v_k`. This means that rules can return the earlier value as an equivalent value while optimizing `v_{k+1}`. This equivalence is represented by a Union value, unless the rewrite subsumes the original or hits the eclass-size limit.

For example:

```
v252 = isub(v251, v251)
v251 = isub(v250, v250)

-- 

v252 = isub(v251, v251)
     = isub(isub(v250, v250), isub(v250, v250))
     = isub(v250, v250)                           ; Rule B
     = v251
```

Rule B/C can make a later repeated `isub(v, v)` value equivalent to an earlier one.

```isle
(rule (simplify (isub ty (ishl ty x z) (ishl ty y z))) (ishl ty (isub ty x y) z))
```

Rule D matches when both operands can be viewed as `ishl` expressions with the same shift amount. For example,

```
v247 = isub(v242, v242)
     = isub(v226, v226)
     = isub(ishl(v225, v214), ishl(v225, v214))
     = ishl(isub(v225, v225), v214)              ; Rule D
```

B/C add the extra effect that a later value can become equivalent to an earlier repeated `isub` value:

```
v248 = isub(v247, v247)
     = isub(isub(v242, v242), isub(v242, v242))
     = isub(v242, v242)                          ; Rule B
     = v247
```

So later values can see both locally-created forms and forms already associated with earlier values such as `v247`.

- Without Rule B/C, Rule D can still fire. But later repeated `isub(v, v)` values do not get rewritten back to earlier repeated `isub` values. Rule D is also bounded by the rewrite-depth limit.
- With Rule B/C, later values can also be equivalent to earlier values, so Rule D gets many more valid bindings.

In short:

1. Rule B/C can rewrite a later `isub(v, v)` to a form already represented by an earlier value.
2. Rule D then has more equivalent forms to match against.

## Limits for egraph rewriting

The existing limits are local:

- `REWRITE_LIMIT` bounds recursive optimization.
- `ECLASS_ENODE_LIMIT` bounds the size of one tree.
- `MATCHES_LIMIT` truncates results from one ISLE call.

Each of the 48 `isub(v, v)` values is optimized separately. Rule B can rewrite a later one back to the previous `isub(v, v)` value, so each local rewrite can match forms recorded for earlier values.

Additional off-by-one: `REWRITE_LIMIT = 5` is checked before incrementing `rewrite_depth`(which starts from `0`), so the maximum observed depth is `6`.

## Evidence

Per-top-level Rule D fire counts (first few repeated `isub(v, v)` values):

| top-level | fires |
|-----------|------:|
| v247      | 6     |
| v248      | 6     |
| v249      | 44    |
| v251      | 138   |

This growth pattern accumulates to observed 3.3M total fires.

Trace evidence that an earlier optimized value is returned again for later values(trace-log):

```text
1123:  optimizing inst inst132 orig result v247 gave v830
1269:  -> returned from ISLE: v248 -> [v830, v852]
1904:  -> returned from ISLE: v249 -> [v830, v852, v932, v939, v944]
```
`v830` appearing in later return lists shows that an earlier optimized value is returned again as an equivalent value for later inputs.

## Open question

Our interpretation is that this is not just a problem with one rewrite rule, but a compile-time risk in how e-graphs(and Cranelift's aegraph) expose accumulated equivalent alternatives during matching.

Does this interpretation sound reasonable, or would you think about this failure mode differently?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cranelift: follow-up analysis of e-graph rewrite blow-up in #13068 #13204

Problem

Observation

`foo.clif` summary

Causality

Limits for egraph rewriting

Evidence

Open question

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

scenario	removed groups	median real (s)	mean real (s)	min	max	stdev	stalled	slow	vs baseline
minus_group_ab	A + B	31.11	31.196	30.96	31.52	0.228	5	5	1.03x
minus_group_ac	A + C	31.21	31.230	31.02	31.63	0.218	5	5	1.03x
minus_group_bc	B + C	0.17	0.164	0.14	0.18	0.016	0	0	0.01x
minus_group_abc	A + B + C	0.15	0.156	0.14	0.18	0.016	0	0	0.00x

Cranelift: follow-up analysis of e-graph rewrite blow-up in #13068 #13204

Description

Problem

Observation

foo.clif summary

Causality

Limits for egraph rewriting

Evidence

Open question

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`foo.clif` summary