
Plumb TensorEncoding into TensorSpec.metadata (#469) #471

Merged
michalharakal merged 2 commits into develop from feature/469-tensor-encoding-metadata
Apr 13, 2026

Conversation

@michalharakal
Contributor

Closes #469.

Summary

First step of the P0-1 track in the NPU / IREE roadmap: make the existing TensorEncoding sealed interface flow through the graph IR, so that downstream compile stages (StableHLO converter, future IREE lowering) can preserve Q4_K / Q8_0 / TernaryPacked / TurboQuant layouts instead of silently re-materializing everything as FP32.

Two commits, each independently reviewable:

1. TensorSpec + TensorData encoding accessors

Additive helpers under sk.ainet.lang.tensor.ops:

  • TENSOR_ENCODING_METADATA_KEY — single source of truth for the metadata key.
  • TensorSpec.tensorEncoding: TensorEncoding? — extension getter. null means "unknown / not carried" — intentionally distinct from TensorEncoding.Dense.
  • TensorSpec.withTensorEncoding(TensorEncoding?) — returns a copy with the encoding set (or removed when passed null), preserving all other metadata entries.
  • TensorData<*, *>.inferTensorEncoding() — centralized mapping from concrete TensorData subclasses to TensorEncoding. Today this collapses to this is PackedBlockStorage -> this.encoding (covering Q4_K, Q8_0, TernaryPacked, and TurboQuant uniformly), but it gives us a single hook for future non-packed layouts.
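Spelled out, the helpers above might look roughly like the following minimal sketch. TensorEncoding, TensorSpec, and the key's string value are simplified stand-ins here, not the real sk.ainet.lang.tensor.ops definitions:

```kotlin
// Hypothetical, self-contained sketch of the accessors described above.
// All types and the key's string value are simplified stand-ins for the
// real sk.ainet.lang.tensor.ops code.

sealed interface TensorEncoding {
    data object Dense : TensorEncoding
    data object Q8_0 : TensorEncoding
    data object Q4_K : TensorEncoding
    data object TernaryPacked : TensorEncoding
}

data class TensorSpec(
    val shape: List<Int>,
    val metadata: Map<String, Any> = emptyMap(),
)

const val TENSOR_ENCODING_METADATA_KEY = "tensorEncoding" // assumed key value

// null means "unknown / not carried" — deliberately distinct from Dense.
val TensorSpec.tensorEncoding: TensorEncoding?
    get() = metadata[TENSOR_ENCODING_METADATA_KEY] as? TensorEncoding

// Returns a copy with the encoding set (or removed for null),
// leaving all other metadata entries untouched.
fun TensorSpec.withTensorEncoding(encoding: TensorEncoding?): TensorSpec =
    if (encoding == null) copy(metadata = metadata - TENSOR_ENCODING_METADATA_KEY)
    else copy(metadata = metadata + (TENSOR_ENCODING_METADATA_KEY to encoding))
```

The null-vs-Dense distinction then falls out naturally: an absent map entry reads as null, while Dense has to be stored explicitly.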

Unit-tested for: unset reads, round-trips on Q8_0 / Q4_K / TernaryPacked / Dense, clearing via null, preservation of unrelated metadata, and overwrite semantics. 8 cases total, all green.

2. TraceToGraphBuilder.finalize() propagation

When finalize() resolves an unresolved tensor through the session, it now also derives the tensor's encoding via inferTensorEncoding() and attaches it to:

  • the produced weight / input node's output spec, and
  • every outgoing edge's tensor spec.

Applies symmetrically to both branches — the weight-node branch (when extractFloatArray succeeds) and the input-placeholder branch (when it doesn't), which is exactly the path a Q8_0-backed weight falls into today, silently losing its quantization.
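The shape of that propagation can be sketched as follows. The spec and data types, and the propagateEncoding helper name, are simplified stand-ins, not the real trace-builder API; only the derive-once, attach-everywhere logic is taken from the PR description:

```kotlin
// Hypothetical sketch of the symmetric propagation in finalize().
// All types here are simplified stand-ins for the real IR.

sealed interface TensorEncoding { data object Q8_0 : TensorEncoding }

interface PackedBlockStorage { val encoding: TensorEncoding }
interface TensorData

data class TensorSpec(val metadata: Map<String, Any> = emptyMap())

fun TensorSpec.withTensorEncoding(e: TensorEncoding?): TensorSpec =
    if (e == null) this else copy(metadata = metadata + ("tensorEncoding" to e))

fun TensorData.inferTensorEncoding(): TensorEncoding? =
    (this as? PackedBlockStorage)?.encoding

// Derive the encoding once, then attach it to the node's output spec and
// every outgoing edge spec — regardless of which branch produced the node.
fun propagateEncoding(
    data: TensorData,
    outputSpec: TensorSpec,
    edgeSpecs: List<TensorSpec>,
): Pair<TensorSpec, List<TensorSpec>> {
    val enc = data.inferTensorEncoding()
    return outputSpec.withTensorEncoding(enc) to
        edgeSpecs.map { it.withTensorEncoding(enc) }
}
```

Because the encoding is derived from the TensorData rather than from which branch ran, the placeholder path carries exactly the same information as the weight path.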

Scoping notes / surprise wins

  • TensorSpec already has a metadata: Map<String, Any> field — zero schema change needed.
  • TensorEncoding already models Q4_K, Q8_0, TernaryPacked, TurboQuantPolar, TurboQuantPolarQjl, Opaque, and Dense — no new enum.
  • Every packed quantized TensorData already exposes encoding: TensorEncoding via the PackedBlockStorage interface, so the inference helper is one line.
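Under those assumptions the inference helper really is a one-line dispatch. A hedged sketch, with simplified stand-in interfaces in place of the real ones:

```kotlin
// Hypothetical sketch of the one-line inference helper; the interfaces
// are simplified stand-ins for the real sk.ainet.lang types.

sealed interface TensorEncoding {
    data object Q4_K : TensorEncoding
    data object Q8_0 : TensorEncoding
}

interface PackedBlockStorage { val encoding: TensorEncoding }
interface TensorData

fun TensorData.inferTensorEncoding(): TensorEncoding? = when (this) {
    is PackedBlockStorage -> encoding // Q4_K, Q8_0, TernaryPacked, TurboQuant
    else -> null                      // dense / unknown: encoding not carried
}
```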

Out of scope (follow-up PRs in the P0-1 track)

  • Teaching StableHloConverter / quant-aware emitters to read tensorEncoding and emit stablehlo.uniform_quantize or quant dialect ops. The metadata has to flow first, then the emitter reads it.
  • Plumbing at the loader boundary (StreamingGgufParametersLoader). The loader produces TensorData, not TensorSpec; the spec-level attachment already happens at the trace-builder boundary where the spec is actually born.
  • Changing QuantizedMatmul.matmulAutoDispatch(). Runtime dispatch stays for CPU; the goal here is only that compile-time (IR) paths stop being blind to quantization.
  • Parameterizing Tensor<DType, V> on a quant type parameter. Heavier redesign, and likely the wrong choice — keeping this as metadata on TensorSpec is lighter and composes better with the existing sealed TensorEncoding.

Test plan

  • ./gradlew :skainet-lang:skainet-lang-core:jvmTest --tests "sk.ainet.lang.tensor.ops.TensorSpecEncodingTest" — 8/8 green
  • ./gradlew :skainet-compile:skainet-compile-dag:jvmTest — green (no regressions from the finalize() change)
  • CI: full multiplatform build across all targets

🤖 Generated with Claude Code

michalharakal and others added 2 commits April 13, 2026 12:11
Introduces three additive helpers under sk.ainet.lang.tensor.ops:

- `TENSOR_ENCODING_METADATA_KEY` — the shared metadata key so raw
  map callers agree with the typed accessors.
- `TensorSpec.tensorEncoding: TensorEncoding?` — extension getter
  that reads the encoding stored in metadata, or `null` when the
  producer did not populate it. A `null` return is intentionally
  distinct from `TensorEncoding.Dense`.
- `TensorSpec.withTensorEncoding(TensorEncoding?)` — returns a
  copy with the encoding set (or removed for `null`), preserving
  all other metadata entries untouched.
- `TensorData<*, *>.inferTensorEncoding()` — single source of
  truth mapping concrete `TensorData` subclasses to their
  `TensorEncoding`. Today that collapses to `PackedBlockStorage`
  (Q4_K, Q8_0, TernaryPacked, TurboQuant) which already exposes
  its own `encoding` field, so the helper is one line but
  centralizes the contract for future non-packed layouts.

Unit tests cover unset reads, round-trips for Q8_0, Q4_K,
TernaryPacked, and Dense, clearing via `null`, preservation of
unrelated metadata, and overwrite semantics.

No TraceToGraphBuilder / loader plumbing in this step — that is
issue #469 step 2, intentionally scoped to its own commit so
the data-model change lands in isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tep 2)

When finalize() resolves an unresolved tensor via the session, also
derive its TensorEncoding via TensorData.inferTensorEncoding() and
attach it to the produced node's output spec and to every outgoing
edge's tensor spec. Applies symmetrically to both the weight-node
branch (when FloatArrayTensorData is extractable) and the input-
placeholder branch (when it isn't, which is exactly what used to
happen to Q4_K / Q8_0 / Ternary weights).

Net effect: a session-resolved weight backed by Q8_0 data now
reaches downstream compile stages with `spec.tensorEncoding ==
TensorEncoding.Q8_0` instead of being silently downgraded to a
lossy FP32 placeholder.

Existing skainet-compile-dag tests stay green — the change is
additive and dense / FloatArray paths see no behavior difference
(inferTensorEncoding returns null, withTensorEncoding(null) is a
no-op).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-471 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

@michalharakal michalharakal merged commit 4701d7e into develop Apr 13, 2026
7 checks passed
@michalharakal michalharakal deleted the feature/469-tensor-encoding-metadata branch April 13, 2026 10:26
michalharakal added a commit that referenced this pull request Apr 13, 2026
Captures the goal, phases, non-goals, and risks of implementing
a new `skainet-backend-nnapi` module that runs SKaiNET models
on an Amlogic Android dev board's NPU via Android's NNAPI HAL.

Important placement notes in the PRD itself:
- The backend lives in a NEW sibling repo, not in mainline
  SKaiNET. Mainline stays general and IREE-focused.
- The backend builds on top of the already-merged
  skainet-backend-api module (#470) and the TensorEncoding
  metadata flow (#471 / #475 / #478) — no mainline code
  changes are required to ship Phase 1-3.
- Orthogonal to SKaiNET-transformers, which owns LLM modules.

Phases:
  0. Board bring-up + NNAPI device capability dump
  1. FP32 dense matmul end-to-end
  2. int8 quantization path hitting the NPU driver
  3. Target model (MobileNetV3 int8 or TinyLlama candidate)
  4. Optional production packaging

Also documents the known deprecation risk (Android 15 marked
NNAPI deprecated in favor of LiteRT) and captures this as
accepted: ship the Amlogic use case now, plan a LiteRT
successor later.

This file is a planning artifact; it will be moved / referenced
from the new backend repo once that repo exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

Plumb TensorEncoding into TensorSpec.metadata (P0-1 step 1)
