
Promote tensor_encoding comment to structured module attribute (P0-1 step 3) #477

@michalharakal

Description

Context

Follow-up to #473. The StableHLO emitter now surfaces `TensorSpec.tensorEncoding` as MLIR comments next to the operations that produce or consume encoded tensors:

```mlir
// tensor_encoding: role=result index=0 name=w encoding=Q8_0
```

Comments were the cheapest, most easily reversible first hop, but they have two shortcomings:

  1. They're string-matching territory for downstream consumers — no parser, no validation, no round-trip guarantee through tools that strip comments.
  2. They live next to ops instead of at the module level, so a downstream pass that wants to enumerate "every Q8_0 weight in this function" has to walk the whole IR instead of reading one structured attribute.

This PR

Promote the per-op comment annotations to a module-level MLIR attribute that enumerates every encoded tensor in one place. The emitted header becomes:

```mlir
module attributes {
  skainet.tensor_encodings = {
    w = "Q8_0",
    other_weight = "Q4_K"
  }
} {
  func.func @main(...) -> (...) {
    ...
  }
}
```

Concretely:

  1. Collect phase: before emitting `module {`, walk the `ComputeGraph` once, gather every `TensorSpec` with a non-null `tensorEncoding`, and build a map of `tensor_name → encoding_name`. No duplication: if the same name appears in multiple nodes, keep a single entry. Diagnostic: if the same name appears with two different encodings, drop the entry and emit a warning comment; that is a graph-level bug the caller should fix.
  2. Emit phase: if the map is non-empty, emit `module attributes { skainet.tensor_encodings = { ... } } {` instead of the bare `module {`. If it's empty, preserve the existing bare form exactly — dense graphs look identical to today.
  3. Keep the existing per-op comments — they're still useful for humans reading diffs and cost nothing now that the structured attribute is the source of truth for tools. A follow-up can remove them if we decide the attribute alone is sufficient.
  4. Unit test: a graph with a Q8_0 weight and a Q4_K weight must emit a module header containing `skainet.tensor_encodings = {` and both `= "Q8_0"` and `= "Q4_K"` entries. A dense graph must emit the bare `module {` header with no `attributes` block.
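The collect phase described in step 1 can be sketched as follows. The `TensorSpec`, `Node`, and `ComputeGraph` shapes below are simplified stand-ins for the real skainet types, and `collectEncodings` is a hypothetical helper name, not the actual implementation:

```kotlin
// Simplified stand-ins for the real skainet graph types.
data class TensorSpec(val name: String, val tensorEncoding: String?)
data class Node(val specs: List<TensorSpec>)
data class ComputeGraph(val nodes: List<Node>)

// One walk over the graph: gather name -> encoding, deduplicate repeated
// names, and drop names seen with two different encodings (with a warning).
fun collectEncodings(
    graph: ComputeGraph,
    warn: (String) -> Unit = {}
): Map<String, String> {
    val encodings = linkedMapOf<String, String>() // insertion order keeps output stable
    val conflicts = mutableSetOf<String>()
    for (node in graph.nodes) {
        for (spec in node.specs) {
            val enc = spec.tensorEncoding ?: continue
            val prev = encodings.put(spec.name, enc)
            if (prev != null && prev != enc) conflicts += spec.name
        }
    }
    // Conflicting entries are a graph-level bug upstream; drop them here.
    for (name in conflicts) {
        warn("// warning: tensor '$name' has conflicting encodings, dropping")
        encodings.remove(name)
    }
    return encodings
}
```

Duplicate-but-consistent mentions collapse to one entry, while a genuine conflict removes the name from the map entirely rather than silently picking a winner.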

Why a module attribute, not an op attribute

Real MLIR op attributes live inside the custom assembly of each `stablehlo.*` op (`stablehlo.dot_general %a, %b {skainet.encoding = "Q8_0"} : ...`). Emitting them correctly would require touching every converter in the registry (seven files today, all of which hand-build their op strings). A module-level attribute lives in exactly one place (`StableHloConverter.convert`) and costs one hook, which makes it the better value per line of code; per-op attributes can come in a later refactor, once the converter registry emits via a builder instead of string concatenation.
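The emit-phase hook can be sketched as a single function that renders the module header from the collected map. `moduleHeader` is a hypothetical name for illustration; the real hook lives in `StableHloConverter.convert`:

```kotlin
// Build the module header from the collected name -> encoding map.
// An empty map yields the bare form, so dense graphs emit exactly
// what they do today.
fun moduleHeader(encodings: Map<String, String>): String {
    if (encodings.isEmpty()) return "module {"
    val entries = encodings.entries
        .joinToString(",\n") { (name, enc) -> "    $name = \"$enc\"" }
    return "module attributes {\n" +
        "  skainet.tensor_encodings = {\n" +
        "$entries\n" +
        "  }\n" +
        "} {"
}
```

Keeping the empty-map branch as a literal `module {` preserves byte-for-byte output for graphs without encoded tensors, which is what the unit test in step 4 pins down.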

Out of scope

  • Per-op MLIR attributes. Bigger refactor.
  • Real `quant.` dialect emission (`!quant.uniform<i8:f32, 0.1:128>`). Requires structured quant parameters (scale, zero point) that `TensorEncoding` doesn't yet carry.
  • Teaching IREE to consume the attribute. That's downstream work.
  • Changing the shape of `TensorEncoding` or `TensorSpec`.
