Emit TensorEncoding into StableHLO output (P0-1 step 2) #473

@michalharakal

Description

Context

Follow-up to #469. `TensorSpec` now carries `tensorEncoding: TensorEncoding?` end-to-end through `TraceToGraphBuilder`, so by the time a `GraphNode` reaches `StableHloConverter` its weight / constant operand specs already know whether they're Q4_K / Q8_0 / TernaryPacked / TurboQuant / Dense / unknown.

The StableHLO emitter currently ignores this metadata entirely. Even if a weight arrives with `tensorEncoding == TensorEncoding.Q8_0`, the emitted `.mlir` is indistinguishable from one built out of dense FP32 constants. That erases quantization at the compile-path boundary, which is exactly the P0-1 gap the whole track is meant to close.
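To make the data flow concrete, here is a minimal sketch of the shapes the Context paragraph describes. These are simplified stand-ins, not the real SKaiNET declarations: the actual `TensorSpec` carries more fields, and a missing encoding ("unknown") is modeled here as `null`.

```kotlin
// Simplified stand-ins for the SKaiNET types named above; field sets
// and enum entries are illustrative, not the repo's exact definitions.
enum class TensorEncoding { Q4_K, Q8_0, TernaryPacked, TurboQuant, Dense }

data class TensorSpec(
    val name: String,
    val shape: List<Int>,
    // Non-null once TraceToGraphBuilder has propagated encoding metadata;
    // null models the "unknown" case.
    val tensorEncoding: TensorEncoding? = null,
)

fun main() {
    val weight = TensorSpec("w", listOf(4096, 4096), TensorEncoding.Q8_0)
    // By the time StableHloConverter sees this spec, the encoding is known:
    println(weight.tensorEncoding)  // Q8_0
}
```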

This PR

A minimal but useful emitter hook: the simplest change that lets downstream tools (and humans reading the MLIR) see that quantization flowed through.

  1. Emit a metadata comment next to encoded operands. In `StableHloConverter`'s main dispatch loop, after an operation has been emitted, check its `GraphNode` inputs / outputs for any spec with a non-null `tensorEncoding` and emit a preceding MLIR comment line like:

    ```mlir
    // tensor_encoding: operand=1 name=w encoding=Q8_0
    ```

    The comment carries the operand index, the tensor name, and the `TensorEncoding.name` string. MLIR tools ignore comments but text round-trips preserve them, so the information survives into consumer pipelines.

  2. Expose an optional typed hook. Add a small helper to `ConversionContext` — `emitEncodingAnnotation(spec: TensorSpec)` — that individual converters can call if they want finer-grained comment placement than the default post-emit sweep.

  3. Unit test. Build a `ComputeGraph` with a weight node whose output spec has `TensorEncoding.Q8_0`, run it through `StableHloConverter`, assert the emitted text contains the expected `// tensor_encoding: ... encoding=Q8_0` comment near the weight's emission site.
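The three steps above can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: the types are simplified stand-ins, and where the PR gives `emitEncodingAnnotation(spec: TensorSpec)` as the signature, this sketch adds an operand-index parameter purely so the example can produce the full comment format shown in step 1.

```kotlin
// Illustrative sketch of the comment-emission hook; names follow the PR
// text but the surrounding types are simplified stand-ins.
enum class TensorEncoding { Q4_K, Q8_0, TernaryPacked, TurboQuant, Dense }

data class TensorSpec(
    val name: String,
    val tensorEncoding: TensorEncoding? = null,
)

class ConversionContext(private val out: StringBuilder) {
    // Emits the metadata comment for one operand spec. Converters can call
    // this directly for finer-grained placement than the post-emit sweep.
    // The index parameter is an assumption of this sketch, not the PR API.
    fun emitEncodingAnnotation(index: Int, spec: TensorSpec) {
        val enc = spec.tensorEncoding ?: return  // null == unknown: stay silent
        out.appendLine(
            "// tensor_encoding: operand=$index name=${spec.name} encoding=${enc.name}"
        )
    }
}

fun main() {
    val mlir = StringBuilder()
    val ctx = ConversionContext(mlir)
    // Post-emit sweep over a node's operand specs: only encoded operands
    // produce a comment line; dense/unknown operands are skipped.
    val operands = listOf(TensorSpec("x"), TensorSpec("w", TensorEncoding.Q8_0))
    operands.forEachIndexed { i, spec -> ctx.emitEncodingAnnotation(i, spec) }
    print(mlir)  // // tensor_encoding: operand=1 name=w encoding=Q8_0
}
```

The unit test in step 3 then reduces to running a small graph through the converter and asserting the emitted text contains the expected comment line, as shown in the sweep above.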

Why comments and not real quant dialect ops

StableHLO's `quant.` dialect uses typed quant element types (`!quant.uniform<i8:f32, 0.1:128>`) that are fiddly to emit as text and are not yet consumed anywhere in the SKaiNET pipeline. Emitting them prematurely would just produce MLIR that no existing tool in this repo validates. Comments are the cheapest reversible first hop and unblock the next PR, which can: (a) grow the comment to a structured `#skainet.tensor_encoding` attribute, or (b) cut over to real `stablehlo.custom_call @dequantize_q8_0` stubs matching the style already used by `ReductionOperationsConverter`.

Out of scope

  • `stablehlo.uniform_quantize` / real quant dialect emission.
  • Teaching IREE or any downstream tool to read the comments. That's downstream work.
  • Changing the shape of `TensorEncoding` or `TensorSpec`.
  • Conv / attention / softmax lowerings (Fix softmax StableHLO lowering to use real reductions #467 is separate).
