
Emit skainet.tensor_encodings module attribute (#477)#478

Merged
michalharakal merged 2 commits into develop from feature/477-tensor-encodings-module-attribute
Apr 13, 2026
Conversation

@michalharakal
Contributor

Closes #477.

Summary

Promotes the per-op `tensor_encoding` annotation comments introduced in #473 to a structured module-level MLIR attribute that enumerates every encoded tensor in one place. Downstream tools (IREE, any MLIR pass reading SKaiNET output) can now read the encodings via one attribute lookup instead of string-matching against scattered comments.

Emitted output

For a graph with two encoded weight inputs (Q8_0 and Q4_K):

```mlir
module attributes {skainet.tensor_encodings = {w_q4 = "Q4_K", w_q8 = "Q8_0"}} {
  func.func @quant_chain(...) -> (...) {
    ...
  }
}
```

Dense graphs with no encoding metadata continue to emit the bare `module {` header byte-for-byte identically to today — the promotion only fires when the collect phase finds at least one non-null encoding.

Two commits

1. Failing test — `TensorEncodingsModuleAttributeTest`

  • `encoded_weights_produce_module_attributes_block` builds a graph with two weight inputs (`TensorEncoding.Q8_0` and `TensorEncoding.Q4_K` via `withTensorEncoding`), runs the converter, and asserts the module header contains `module attributes`, `skainet.tensor_encodings`, and both `w_q8 = "Q8_0"` / `w_q4 = "Q4_K"` entries. Red against pre-fix emitter.
  • `dense_graph_keeps_bare_module_header` asserts a dense graph preserves `module {` exactly and introduces no spurious attributes block. A `null` `tensorEncoding` is the unknown / not-carried state, intentionally distinct from `TensorEncoding.Dense`. Green baseline.

2. The fix

  • `StableHloConverter.collectTensorEncodings` walks the topologically-ordered node list once, pulling every non-null `tensorEncoding` off each node's input and output specs into a sorted `name -> TensorEncoding` map. First-writer-wins on duplicates.
  • `StableHloConverter.convert` emits `module attributes {skainet.tensor_encodings = {...}} {` when the map is non-empty, and `module {` otherwise.
  • `MlirValidator` learned three things: (a) module-header lines are not SSA assignments even when they contain ` = ` inside a dict (syntax pass skips them), (b) same for the semantic pass's defined-values extraction, (c) `validateModule` accepts both `module {` and `module attributes` as valid module declaration forms.
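The collect-then-emit flow above can be sketched in common Kotlin. The types and function bodies below are simplified stand-ins, not the real `StableHloConverter` internals — `TensorSpec`/`Node` here are minimal assumptions made for illustration:

```kotlin
// Hypothetical, simplified model of the converter's graph types.
enum class TensorEncoding { Q8_0, Q4_K }

data class TensorSpec(val name: String, val tensorEncoding: TensorEncoding? = null)
data class Node(val inputs: List<TensorSpec>, val outputs: List<TensorSpec>)

// One walk over the topologically ordered node list, pulling every non-null
// encoding off input and output specs. First-writer-wins on duplicate names;
// a sorted map keeps emission deterministic.
fun collectTensorEncodings(nodes: List<Node>): Map<String, TensorEncoding> {
    val result = sortedMapOf<String, TensorEncoding>()
    for (node in nodes) {
        for (spec in node.inputs + node.outputs) {
            val encoding = spec.tensorEncoding ?: continue
            if (spec.name !in result) result[spec.name] = encoding
        }
    }
    return result
}

// Emit the module header: the attributes block only appears when the
// collect phase found at least one encoding.
fun moduleHeader(encodings: Map<String, TensorEncoding>): String =
    if (encodings.isEmpty()) "module {"
    else encodings.entries.joinToString(
        prefix = "module attributes {skainet.tensor_encodings = {",
        postfix = "}} {"
    ) { (name, enc) -> "$name = \"${enc.name}\"" }
```

With two encoded weights this reproduces the header shown in the emitted-output example; with an empty map it degrades to the bare `module {`.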

Why a module attribute, not per-op

Real per-op MLIR attributes (`stablehlo.dot_general %a, %b {skainet.encoding = "Q8_0"} : ...`) would need every converter in the registry updated — 7 files today, all hand-building their op strings. A module-level attribute lives in exactly one hook and is the biggest value-per-line-of-code first move. Per-op attributes are a better follow-up once the registry is reworked to emit via a builder instead of string concatenation.

Test plan

  • `TensorEncodingsModuleAttributeTest` — both cases green
  • Full `:skainet-compile:skainet-compile-hlo:jvmTest` — green, no regressions from either the emitter or validator edits
  • CI: full multiplatform build

Out of scope

  • Per-op MLIR attributes (`stablehlo.*` op custom-assembly-format attributes).
  • Real `quant.` dialect emission (`!quant.uniform<i8:f32, 0.1:128>`). Requires structured scale / zero-point parameters that `TensorEncoding` doesn't yet carry.
  • Teaching IREE to consume the attribute. Downstream work.
  • Removing the per-op `tensor_encoding` comments introduced in #473 (Emit TensorEncoding into StableHLO output, P0-1 step 2). They cost nothing now and help human diff readers; they can be deleted in a follow-up if we decide the attribute alone suffices.

🤖 Generated with Claude Code

michalharakal and others added 2 commits April 13, 2026 12:55
Adds TensorEncodingsModuleAttributeTest with two cases:

1. encoded_weights_produce_module_attributes_block — builds a
   graph with two weight inputs carrying distinct encodings
   (TensorEncoding.Q8_0 and TensorEncoding.Q4_K) via
   withTensorEncoding, runs the converter, and asserts the
   emitted module header is `module attributes { ... }` with a
   single `skainet.tensor_encodings = { w_q4 = "Q4_K", w_q8 =
   "Q8_0" }` dictionary enumerating every encoded tensor in one
   place. Red against StableHloConverter today — the metadata
   is only exposed as scattered per-op comments.

2. dense_graph_keeps_bare_module_header — a dense FP32 graph
   with no encoding metadata must preserve the bare `module {`
   header and must not introduce a spurious `attributes` block.
   A `null` tensorEncoding is the unknown / not-carried state,
   intentionally distinct from TensorEncoding.Dense, and the
   emitter must stay silent. Green baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Teaches StableHloConverter to collect every TensorSpec with a
non-null tensorEncoding from the graph in a single pre-emit
walk and surface them as a structured MLIR attribute on the
module header:

    module attributes {skainet.tensor_encodings = {w_q4 = "Q4_K", w_q8 = "Q8_0"}} {
      func.func @quant_chain(...) -> (...) {
        ...
      }
    }

Downstream tools (IREE, any MLIR pass reading SKaiNET output)
can now enumerate every encoded tensor via one attribute lookup
instead of string-matching against the scattered `tensor_encoding`
comments introduced in #473. Dense graphs with no encoding
metadata continue to emit the bare `module {` header byte-for-byte
identically to today — the promotion only fires when the
collect phase finds at least one non-null encoding. The existing
per-op comments are left in place; a follow-up can remove them
if we decide the module attribute alone is sufficient.

Collect phase is first-writer-wins: if the same tensor name
surfaces in multiple nodes it collapses to a single map entry.
Walks both inputs and outputs of every node so weight-operand
references catch encodings that didn't originate as node outputs.

Also tweaks MlirValidator to accept the new header shape:

- validateSyntax treats `module` lines as module preamble and
  stops processing them at the header check, so the dict's
  `name = "value"` entries are no longer fed into the SSA
  assignment validator (which would fire on the absence of a
  `%` prefix).
- validateSemantics skips `module`-prefixed lines for the same
  reason in its defined-values extraction.
- validateModule accepts both `module {` and `module attributes`
  as valid module declaration forms.
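
The relaxed module-declaration check can be illustrated with a small sketch; the function name and shape here are assumptions based on the description above, not the actual `MlirValidator` API:

```kotlin
// Hypothetical predicate: is this line a module header (bare or attributed)?
// Header lines are skipped by the SSA-assignment checks even though the
// attribute dictionary contains ` = ` pairs without a `%` prefix.
fun isModuleHeader(line: String): Boolean {
    val trimmed = line.trim()
    return trimmed.startsWith("module {") || trimmed.startsWith("module attributes")
}
```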

Full :skainet-compile:skainet-compile-hlo:jvmTest suite stays
green — no existing test was asserting on `module {` in a way
that excludes the attributes form.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 9edd5ec into develop Apr 13, 2026
2 of 4 checks passed
@michalharakal michalharakal deleted the feature/477-tensor-encodings-module-attribute branch April 13, 2026 10:57
michalharakal added a commit that referenced this pull request Apr 13, 2026
`MutableMap.putIfAbsent(key, value)` is a JVM-only extension (it
lives on `java.util.Map`), not part of Kotlin common-stdlib. The
usage introduced in #477's `collectTensorEncodings` compiled fine
on JVM but broke `compileKotlinWasmJs` (and by extension JS /
Native) with `Unresolved reference 'putIfAbsent'` at
StableHloConverter.kt:209,213. develop has been red on the
multiplatform build job since #478 merged.

Root cause: the local verification for #477 ran only
`:skainet-compile:skainet-compile-hlo:jvmTest`. KMP's per-target
compilation means a JVM-only stdlib call silently passes JVM
checks and only fails on the first non-JVM compile.

Replaces both call sites with the common-compatible idiom:

    if (spec.name !in result) result[spec.name] = encoding

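The swap can be shown in isolation. This is a minimal sketch, not the actual `collectTensorEncodings` call site; `firstWriterWins` and its string-valued map are illustrative stand-ins:

```kotlin
// JVM-only — `putIfAbsent` lives on java.util.Map and fails to compile
// on wasmJs / JS / Native targets:
//   result.putIfAbsent(name, encoding)
//
// Common-stdlib equivalent with the same first-writer-wins semantics:
fun firstWriterWins(pairs: List<Pair<String, String>>): Map<String, String> {
    val result = mutableMapOf<String, String>()
    for ((name, encoding) in pairs) {
        if (name !in result) result[name] = encoding  // keep the first value seen
    }
    return result
}
```

`result.getOrPut(name) { encoding }` is another common-stdlib option with the same effect when the computed value is discarded.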
Verified locally with
`./gradlew :skainet-compile:skainet-compile-hlo:allTests
 -x kotlinWasmStoreYarnLock` — green across jvmTest, wasmJsTest,
wasmJsBrowserTest, wasmWasiTest, wasmWasiNodeTest, macosArm64Test,
and iosSimulatorArm64Test (linuxX64Test skipped on macOS host).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
michalharakal added a commit that referenced this pull request Apr 13, 2026
Captures the goal, phases, non-goals, and risks of implementing
a new `skainet-backend-nnapi` module that runs SKaiNET models
on an Amlogic Android dev board's NPU via Android's NNAPI HAL.

Important placement notes in the PRD itself:
- The backend lives in a NEW sibling repo, not in mainline
  SKaiNET. Mainline stays general and IREE-focused.
- The backend builds on top of the already-merged
  skainet-backend-api module (#470) and the TensorEncoding
  metadata flow (#471 / #475 / #478) — no mainline code
  changes are required to ship Phase 1-3.
- Orthogonal to SKaiNET-transformers, which owns LLM modules.

Phases:
  0. Board bring-up + NNAPI device capability dump
  1. FP32 dense matmul end-to-end
  2. int8 quantization path hitting the NPU driver
  3. Target model (MobileNetV3 int8 or TinyLlama candidate)
  4. Optional production packaging

Also documents the known deprecation risk (Android 15 marked
NNAPI deprecated in favor of LiteRT) and captures this as
accepted: ship the Amlogic use case now, plan a LiteRT
successor later.

This file is a planning artifact; it will be moved / referenced
from the new backend repo once that repo exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Linked issue: Promote tensor_encoding comment to structured module attribute (P0-1 step 3)