Context
Surfaced during the int8 NPU readiness audit on 2026-04-13. The `TensorEncoding` metadata plumbing landed in #471 (accessor helpers) and is used in `TraceToGraphBuilder.finalize()` to attach encodings to session-resolved weight nodes. The #478 StableHLO emitter reads that metadata and emits a module-level `skainet.tensor_encodings = {...}` attribute.
The gap: `TraceToGraphBuilder.addTrace` — the path that processes every traced op — builds its input and output `TensorSpec` objects from `trace.attributes["inputShapes"]` / `outputShapes` / `inputDTypes` / `outputDTypes` only. It does not consult the underlying tensor's `TensorData` subtype, and it does not propagate `TensorEncoding` from a producer's output spec to a consumer's input spec.
Consequence: a Q8_0 weight that flows through any intermediate op (`transpose(w)`, `reshape(w)`, `cast(w)`) reaches the downstream matmul's input spec without its `tensorEncoding` set. The StableHLO emitter's `skainet.tensor_encodings` dictionary correctly lists the original weight, but any consumer pass that wants to identify a matmul operand as quantized would have to walk the def-use chain backwards through the IR to find the original constant — which defeats the point of carrying the metadata in the first place.
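The gap can be illustrated with a minimal sketch. Everything below uses hypothetical, simplified stand-ins for the real SKaiNET types (`TensorSpec`, `TensorEncoding`, and the builder internals are assumptions, not the actual API):

```kotlin
// Hypothetical, simplified stand-ins for the real SKaiNET types.
enum class TensorEncoding { Q8_0 }

data class TensorSpec(
    val shape: List<Int>,
    val dtype: String,
    val tensorEncoding: TensorEncoding? = null,
)

// What addTrace effectively does today: input specs are built from the
// trace attributes (shapes and dtypes) alone, so tensorEncoding is
// always null here, even when the producing tensor is a Q8_0 weight.
fun buildInputSpec(shape: List<Int>, dtype: String): TensorSpec =
    TensorSpec(shape, dtype) // the producer's encoding is never consulted

fun main() {
    val weightOut = TensorSpec(listOf(4096, 4096), "i8", TensorEncoding.Q8_0)
    val transposeIn = buildInputSpec(weightOut.shape, weightOut.dtype)
    // The encoding was dropped at the op boundary:
    println(transposeIn.tensorEncoding) // null, despite the Q8_0 producer
}
```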
Why this is P2 and not P0
- Not blocking NNAPI / runtime backends. A runtime NNAPI backend dispatches on `TensorData` subtype directly (`is Q8_0TensorData`), not on `TensorSpec.metadata`. The encoding channel is only load-bearing for the compile path (StableHLO → IREE).
- Not blocking the first IREE spike. The first validation question is "does IREE even accept a SKaiNET-emitted StableHLO module." Consuming the encoding metadata is downstream of that.
- A weight that's used directly by a matmul (without an intervening transpose/reshape) still gets its encoding attached correctly via `finalize()`. Many model traces take that shape.
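For contrast, a sketch of why the runtime path is unaffected: a backend that dispatches on the concrete `TensorData` subtype never reads `TensorSpec` metadata at all. The type names below are illustrative stand-ins, not the real hierarchy:

```kotlin
// Illustrative stand-ins; the real SKaiNET TensorData hierarchy differs.
sealed interface TensorData
class Q8_0TensorData : TensorData
class F32TensorData : TensorData

// A runtime backend branches on the concrete subtype directly, so a
// missing encoding on the graph-level TensorSpec does not affect it.
fun kernelFor(data: TensorData): String = when (data) {
    is Q8_0TensorData -> "quantized-int8-matmul"
    is F32TensorData -> "f32-matmul"
}
```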
Scope
- In `TraceToGraphBuilder.addTrace`, after `buildInputSpecs` / `buildOutputSpecs` produce their initial `TensorSpec` instances, propagate `tensorEncoding` from the corresponding producer node's output spec for every input that has a known producer (`producersByTensorId[tRef.id]` is non-null). This is the "forward propagation" step that makes encoding flow through intermediate ops.
- Decide a policy for the pass-through ops (`reshape`, `transpose`, `squeeze`, `unsqueeze`, `concat`, `slice`, `cast`): does the output of a transpose of a Q8_0 tensor inherit the Q8_0 encoding? Default: yes for type-preserving structural ops (transpose, reshape, squeeze, unsqueeze, concat, slice); no for `cast` since it changes the element type.
- Unit test that builds a trace with a Q8_0 weight → transpose → matmul and asserts the matmul's input spec at the graph node level carries `TensorEncoding.Q8_0`.
- Run `./gradlew :skainet-compile:skainet-compile-hlo:allTests` and `:skainet-compile:skainet-compile-dag:allTests` before pushing.
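The two design decisions above (forward propagation plus a per-op inheritance policy) could be sketched roughly as follows. Everything here, including the helper names and op-name strings, is an assumption about the real builder's shape, not the actual implementation:

```kotlin
// Hypothetical, simplified model of the propagation step.
enum class TensorEncoding { Q8_0 }

data class TensorSpec(
    val dtype: String,
    val tensorEncoding: TensorEncoding? = null,
)

// Structural ops that preserve both element type and encoding.
val PASS_THROUGH_OPS =
    setOf("reshape", "transpose", "squeeze", "unsqueeze", "concat", "slice")

// Forward propagation: copy the encoding from the producer's output
// spec onto the consumer's input spec when the producer is known.
fun propagate(producerOut: TensorSpec?, inputSpec: TensorSpec): TensorSpec =
    if (producerOut?.tensorEncoding != null)
        inputSpec.copy(tensorEncoding = producerOut.tensorEncoding)
    else inputSpec

// Pass-through policy: structural ops keep the encoding; cast (and any
// other element-type-changing op) resets it.
fun outputEncoding(opName: String, inputEncoding: TensorEncoding?): TensorEncoding? =
    if (opName in PASS_THROUGH_OPS) inputEncoding else null

fun main() {
    val weightOut = TensorSpec("i8", TensorEncoding.Q8_0)
    // transpose: input inherits from the weight, output keeps the encoding
    val transposeIn = propagate(weightOut, TensorSpec("i8"))
    val transposeOut = TensorSpec("i8", outputEncoding("transpose", transposeIn.tensorEncoding))
    // matmul: its input spec now carries Q8_0 without any def-use walk
    val matmulIn = propagate(transposeOut, TensorSpec("i8"))
    println(matmulIn.tensorEncoding) // Q8_0
}
```

The `cast` case falls out of the same policy: `outputEncoding("cast", Q8_0)` returns null, which matches the default decided above.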
Out of scope
- A quantization-aware `cast` that preserves encoding across element-type transitions (to be handled by a separate `TensorEncoding.Dense(bytesPerElement=…)` adjustment; later).
- Emitting the quantized constant values in the StableHLO output (separate gap, separate issue).
- Changing the `TensorEncoding` sealed type.
Relationship to other IREE work
This is one of two gaps surfaced in the 2026-04-13 audit. The other, "quantized weight values not emitted as `stablehlo.constant`", is tracked in its own issue and is the more load-bearing IREE gap. Both are tagged into the project at https://github.com/orgs/SKaiNET-developers/projects/1.