
Plumb TensorEncoding into TensorSpec.metadata (#469) #471

Merged
michalharakal merged 2 commits into develop from feature/469-tensor-encoding-metadata
Apr 13, 2026

Conversation

@michalharakal
Contributor

Closes #469.

Summary

First step of the P0-1 track in the NPU / IREE roadmap: make the existing TensorEncoding sealed interface flow through the graph IR, so that downstream compile stages (StableHLO converter, future IREE lowering) can preserve Q4_K / Q8_0 / TernaryPacked / TurboQuant layouts instead of silently re-materializing everything as FP32.

Two commits, each independently reviewable:

1. TensorSpec + TensorData encoding accessors

Additive helpers under sk.ainet.lang.tensor.ops:

  • TENSOR_ENCODING_METADATA_KEY — single source of truth for the metadata key.
  • TensorSpec.tensorEncoding: TensorEncoding? — extension getter. null means "unknown / not carried" — intentionally distinct from TensorEncoding.Dense.
  • TensorSpec.withTensorEncoding(TensorEncoding?) — returns a copy with the encoding set (or removed when passed null), preserving all other metadata entries.
  • TensorData<*, *>.inferTensorEncoding() — centralized mapping from concrete TensorData subclasses to TensorEncoding. Today this collapses to this is PackedBlockStorage -> this.encoding (covering Q4_K, Q8_0, TernaryPacked, and TurboQuant uniformly), but it gives us a single hook for future non-packed layouts.
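Spelled out, the helpers above might look roughly like the following minimal sketch. TensorEncoding, TensorSpec, and the key's string value are simplified stand-ins here, not the real sk.ainet.lang.tensor.ops definitions:

```kotlin
// Hypothetical, self-contained sketch of the accessors described above.
// All types and the key's string value are simplified stand-ins for the
// real sk.ainet.lang.tensor.ops code.

sealed interface TensorEncoding {
    data object Dense : TensorEncoding
    data object Q8_0 : TensorEncoding
    data object Q4_K : TensorEncoding
    data object TernaryPacked : TensorEncoding
}

data class TensorSpec(
    val shape: List<Int>,
    val metadata: Map<String, Any> = emptyMap(),
)

const val TENSOR_ENCODING_METADATA_KEY = "tensorEncoding" // assumed key value

// null means "unknown / not carried" — deliberately distinct from Dense.
val TensorSpec.tensorEncoding: TensorEncoding?
    get() = metadata[TENSOR_ENCODING_METADATA_KEY] as? TensorEncoding

// Returns a copy with the encoding set (or removed for null),
// leaving all other metadata entries untouched.
fun TensorSpec.withTensorEncoding(encoding: TensorEncoding?): TensorSpec =
    if (encoding == null) copy(metadata = metadata - TENSOR_ENCODING_METADATA_KEY)
    else copy(metadata = metadata + (TENSOR_ENCODING_METADATA_KEY to encoding))
```

The null-vs-Dense distinction then falls out naturally: an absent map entry reads as null, while Dense has to be stored explicitly.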

Unit-tested for: unset reads, round-trips on Q8_0 / Q4_K / TernaryPacked / Dense, clearing via null, preservation of unrelated metadata, and overwrite semantics. 8 cases total, all green.

2. TraceToGraphBuilder.finalize() propagation

When finalize() resolves an unresolved tensor through the session, it now also derives the tensor's encoding via inferTensorEncoding() and attaches it to:

  • the produced weight / input node's output spec, and
  • every outgoing edge's tensor spec.

Applies symmetrically to both branches — the weight-node branch (when extractFloatArray succeeds) and the input-placeholder branch (when it doesn't), which is exactly the path a Q8_0-backed weight falls into today, silently losing its quantization.
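The shape of that propagation can be sketched as follows. The spec and data types, and the propagateEncoding helper name, are simplified stand-ins, not the real trace-builder API; only the derive-once, attach-everywhere logic is taken from the PR description:

```kotlin
// Hypothetical sketch of the symmetric propagation in finalize().
// All types here are simplified stand-ins for the real IR.

sealed interface TensorEncoding { data object Q8_0 : TensorEncoding }

interface PackedBlockStorage { val encoding: TensorEncoding }
interface TensorData

data class TensorSpec(val metadata: Map<String, Any> = emptyMap())

fun TensorSpec.withTensorEncoding(e: TensorEncoding?): TensorSpec =
    if (e == null) this else copy(metadata = metadata + ("tensorEncoding" to e))

fun TensorData.inferTensorEncoding(): TensorEncoding? =
    (this as? PackedBlockStorage)?.encoding

// Derive the encoding once, then attach it to the node's output spec and
// every outgoing edge spec — regardless of which branch produced the node.
fun propagateEncoding(
    data: TensorData,
    outputSpec: TensorSpec,
    edgeSpecs: List<TensorSpec>,
): Pair<TensorSpec, List<TensorSpec>> {
    val enc = data.inferTensorEncoding()
    return outputSpec.withTensorEncoding(enc) to
        edgeSpecs.map { it.withTensorEncoding(enc) }
}
```

Because the encoding is derived from the TensorData rather than from which branch ran, the placeholder path carries exactly the same information as the weight path.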

Scoping notes / surprise wins

  • TensorSpec already has a metadata: Map<String, Any> field — zero schema change needed.
  • TensorEncoding already models Q4_K, Q8_0, TernaryPacked, TurboQuantPolar, TurboQuantPolarQjl, Opaque, and Dense — no new enum.
  • Every packed quantized TensorData already exposes encoding: TensorEncoding via the PackedBlockStorage interface, so the inference helper is one line.
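Under those assumptions the inference helper really is a one-line dispatch. A hedged sketch, with simplified stand-in interfaces in place of the real ones:

```kotlin
// Hypothetical sketch of the one-line inference helper; the interfaces
// are simplified stand-ins for the real sk.ainet.lang types.

sealed interface TensorEncoding {
    data object Q4_K : TensorEncoding
    data object Q8_0 : TensorEncoding
}

interface PackedBlockStorage { val encoding: TensorEncoding }
interface TensorData

fun TensorData.inferTensorEncoding(): TensorEncoding? = when (this) {
    is PackedBlockStorage -> encoding // Q4_K, Q8_0, TernaryPacked, TurboQuant
    else -> null                      // dense / unknown: encoding not carried
}
```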

Out of scope (follow-up PRs in the P0-1 track)

  • Teaching StableHloConverter / quant-aware emitters to read tensorEncoding and emit stablehlo.uniform_quantize or quant dialect ops. The metadata has to flow first, then the emitter reads it.
  • Plumbing at the loader boundary (StreamingGgufParametersLoader). The loader produces TensorData, not TensorSpec; the spec-level attachment already happens at the trace-builder boundary where the spec is actually born.
  • Changing QuantizedMatmul.matmulAutoDispatch(). Runtime dispatch stays for CPU; the goal here is only that compile-time (IR) paths stop being blind to quantization.
  • Parameterizing Tensor<DType, V> on a quant type parameter. Heavier redesign, and likely the wrong choice — keeping this as metadata on TensorSpec is lighter and composes better with the existing sealed TensorEncoding.

Test plan

  • ./gradlew :skainet-lang:skainet-lang-core:jvmTest --tests "sk.ainet.lang.tensor.ops.TensorSpecEncodingTest" — 8/8 green
  • ./gradlew :skainet-compile:skainet-compile-dag:jvmTest — green (no regressions from the finalize() change)
  • CI: full multiplatform build across all targets

🤖 Generated with Claude Code

michalharakal and others added 2 commits April 13, 2026 12:11
Introduces three additive helpers under sk.ainet.lang.tensor.ops:

- `TENSOR_ENCODING_METADATA_KEY` — the shared metadata key so raw
  map callers agree with the typed accessors.
- `TensorSpec.tensorEncoding: TensorEncoding?` — extension getter
  that reads the encoding stored in metadata, or `null` when the
  producer did not populate it. A `null` return is intentionally
  distinct from `TensorEncoding.Dense`.
- `TensorSpec.withTensorEncoding(TensorEncoding?)` — returns a
  copy with the encoding set (or removed for `null`), preserving
  all other metadata entries untouched.
- `TensorData<*, *>.inferTensorEncoding()` — single source of
  truth mapping concrete `TensorData` subclasses to their
  `TensorEncoding`. Today that collapses to `PackedBlockStorage`
  (Q4_K, Q8_0, TernaryPacked, TurboQuant) which already exposes
  its own `encoding` field, so the helper is one line but
  centralizes the contract for future non-packed layouts.

Unit tests cover unset reads, round-trips for Q8_0, Q4_K,
TernaryPacked, and Dense, clearing via `null`, preservation of
unrelated metadata, and overwrite semantics.

No TraceToGraphBuilder / loader plumbing in this step — that is
issue #469 step 2, intentionally scoped to its own commit so
the data-model change lands in isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tep 2)

When finalize() resolves an unresolved tensor via the session, also
derive its TensorEncoding via TensorData.inferTensorEncoding() and
attach it to the produced node's output spec and to every outgoing
edge's tensor spec. Applies symmetrically to both the weight-node
branch (when FloatArrayTensorData is extractable) and the input-
placeholder branch (when it isn't, which is exactly what used to
happen to Q4_K / Q8_0 / Ternary weights).

Net effect: a session-resolved weight backed by Q8_0 data now
reaches downstream compile stages with `spec.tensorEncoding ==
TensorEncoding.Q8_0` instead of being silently downgraded to a
lossy FP32 placeholder.

Existing skainet-compile-dag tests stay green — the change is
additive and dense / FloatArray paths see no behavior difference
(inferTensorEncoding returns null, withTensorEncoding(null) is a
no-op).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-471 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

@michalharakal michalharakal merged commit 4701d7e into develop Apr 13, 2026
7 checks passed
@michalharakal michalharakal deleted the feature/469-tensor-encoding-metadata branch April 13, 2026 10:26
michalharakal added a commit that referenced this pull request Apr 13, 2026
Captures the goal, phases, non-goals, and risks of implementing
a new `skainet-backend-nnapi` module that runs SKaiNET models
on an Amlogic Android dev board's NPU via Android's NNAPI HAL.

Important placement notes in the PRD itself:
- The backend lives in a NEW sibling repo, not in mainline
  SKaiNET. Mainline stays general and IREE-focused.
- The backend builds on top of the already-merged
  skainet-backend-api module (#470) and the TensorEncoding
  metadata flow (#471 / #475 / #478) — no mainline code
  changes are required to ship Phase 1-3.
- Orthogonal to SKaiNET-transformers, which owns LLM modules.

Phases:
  0. Board bring-up + NNAPI device capability dump
  1. FP32 dense matmul end-to-end
  2. int8 quantization path hitting the NPU driver
  3. Target model (MobileNetV3 int8 or TinyLlama candidate)
  4. Optional production packaging

Also documents the known deprecation risk (Android 15 marked
NNAPI deprecated in favor of LiteRT) and captures this as
accepted: ship the Amlogic use case now, plan a LiteRT
successor later.

This file is a planning artifact; it will be moved / referenced
from the new backend repo once that repo exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Plumb TensorEncoding into TensorSpec.metadata (P0-1 step 1)
