Plumb TensorEncoding into TensorSpec.metadata (#469) #471
Merged
michalharakal merged 2 commits into develop on Apr 13, 2026
Conversation
Introduces four additive helpers under `sk.ainet.lang.tensor.ops`:

- `TENSOR_ENCODING_METADATA_KEY` — the shared metadata key so raw map callers agree with the typed accessors.
- `TensorSpec.tensorEncoding: TensorEncoding?` — extension getter that reads the encoding stored in metadata, or `null` when the producer did not populate it. A `null` return is intentionally distinct from `TensorEncoding.Dense`.
- `TensorSpec.withTensorEncoding(TensorEncoding?)` — returns a copy with the encoding set (or removed for `null`), leaving all other metadata entries untouched.
- `TensorData<*, *>.inferTensorEncoding()` — single source of truth mapping concrete `TensorData` subclasses to their `TensorEncoding`. Today that collapses to `PackedBlockStorage` (Q4_K, Q8_0, TernaryPacked, TurboQuant), which already exposes its own `encoding` field, so the helper is one line but centralizes the contract for future non-packed layouts.

Unit tests cover unset reads; round-trips for Q8_0, Q4_K, TernaryPacked, and Dense; clearing via `null`; preservation of unrelated metadata; and overwrite semantics.

No TraceToGraphBuilder / loader plumbing in this step — that is issue #469 step 2, intentionally scoped to its own commit so the data-model change lands in isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
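The accessor contract described above can be sketched as follows. All types here are minimal stand-ins for illustration only — the real `TensorSpec` / `TensorEncoding` in `sk.ainet.lang` carry shape, dtype, and more encoding variants:

```kotlin
// Stand-in types: the real definitions live in sk.ainet.lang and are richer.
sealed interface TensorEncoding {
    data object Dense : TensorEncoding
    data object Q8_0 : TensorEncoding
}

data class TensorSpec(val metadata: Map<String, Any> = emptyMap())

// Shared key so raw map callers and typed accessors agree.
const val TENSOR_ENCODING_METADATA_KEY = "tensor.encoding"

// null means "unknown / not carried" -- deliberately distinct from Dense.
val TensorSpec.tensorEncoding: TensorEncoding?
    get() = metadata[TENSOR_ENCODING_METADATA_KEY] as? TensorEncoding

// Copy-with semantics: set or overwrite the entry, or remove it for null,
// preserving every unrelated metadata entry.
fun TensorSpec.withTensorEncoding(encoding: TensorEncoding?): TensorSpec =
    if (encoding == null)
        copy(metadata = metadata - TENSOR_ENCODING_METADATA_KEY)
    else
        copy(metadata = metadata + (TENSOR_ENCODING_METADATA_KEY to encoding))

fun main() {
    val spec = TensorSpec(metadata = mapOf("name" to "w0"))
    val q8 = spec.withTensorEncoding(TensorEncoding.Q8_0)
    println(q8.tensorEncoding)                           // Q8_0
    println(q8.metadata["name"])                         // w0 -- preserved
    println(q8.withTensorEncoding(null).tensorEncoding)  // null -- cleared
}
```

The key design point is that the encoding rides in the existing `metadata` map, so no schema change is needed and absence stays distinguishable from an explicit `Dense`.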
…tep 2)

When finalize() resolves an unresolved tensor via the session, also derive its TensorEncoding via TensorData.inferTensorEncoding() and attach it to the produced node's output spec and to every outgoing edge's tensor spec.

Applies symmetrically to both the weight-node branch (when FloatArrayTensorData is extractable) and the input-placeholder branch (when it isn't, which is exactly what used to happen to Q4_K / Q8_0 / Ternary weights).

Net effect: a session-resolved weight backed by Q8_0 data now reaches downstream compile stages with `spec.tensorEncoding == TensorEncoding.Q8_0` instead of being silently downgraded to a lossy FP32 placeholder.

Existing skainet-compile-dag tests stay green — the change is additive, and dense / FloatArray paths see no behavior difference (inferTensorEncoding returns null; withTensorEncoding(null) is a no-op).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
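A rough sketch of the propagation pattern, with simplified placeholder types (the real `finalize()` branches and `PackedBlockStorage` interface live in the SKaiNET codebase; `resolve` here is a hypothetical stand-in for the relevant branch):

```kotlin
sealed interface TensorEncoding {
    data object Q8_0 : TensorEncoding
}

interface TensorData
interface PackedBlockStorage : TensorData { val encoding: TensorEncoding }

class FloatArrayTensorData(val values: FloatArray) : TensorData
class Q8Block : PackedBlockStorage {
    override val encoding: TensorEncoding = TensorEncoding.Q8_0
}

// Single source of truth: packed layouts report their own encoding;
// plain dense data yields null, so dense paths see no behavior change.
fun TensorData.inferTensorEncoding(): TensorEncoding? =
    (this as? PackedBlockStorage)?.encoding

data class TensorSpec(val metadata: Map<String, Any> = emptyMap())

fun TensorSpec.withTensorEncoding(e: TensorEncoding?): TensorSpec =
    if (e == null) copy(metadata = metadata - "tensor.encoding")
    else copy(metadata = metadata + ("tensor.encoding" to e))

// Stand-in for the finalize() branch: whatever spec the resolver produces,
// the encoding derived from the backing data is attached on top.
fun resolve(data: TensorData, producedSpec: TensorSpec): TensorSpec =
    producedSpec.withTensorEncoding(data.inferTensorEncoding())

fun main() {
    // Q8_0-backed weight: the spec now carries the encoding downstream.
    println(resolve(Q8Block(), TensorSpec()).metadata)
    // Dense FloatArray weight: inferTensorEncoding() is null, spec unchanged.
    println(resolve(FloatArrayTensorData(floatArrayOf(1f)), TensorSpec()).metadata)
}
```

Because the null case degenerates to a no-op, applying the attachment unconditionally in both branches is safe, which is why the existing dense tests stay green.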
📖 Documentation Preview: The documentation has been built successfully for this PR.
michalharakal added a commit that referenced this pull request on Apr 13, 2026
Captures the goal, phases, non-goals, and risks of implementing a new `skainet-backend-nnapi` module that runs SKaiNET models on an Amlogic Android dev board's NPU via Android's NNAPI HAL.

Important placement notes in the PRD itself:

- The backend lives in a NEW sibling repo, not in mainline SKaiNET. Mainline stays general and IREE-focused.
- The backend builds on top of the already-merged skainet-backend-api module (#470) and the TensorEncoding metadata flow (#471 / #475 / #478) — no mainline code changes are required to ship Phases 1-3.
- Orthogonal to SKaiNET-transformers, which owns LLM modules.

Phases:

0. Board bring-up + NNAPI device capability dump
1. FP32 dense matmul end-to-end
2. int8 quantization path hitting the NPU driver
3. Target model (MobileNetV3 int8 or TinyLlama candidate)
4. Optional production packaging

Also documents the known deprecation risk (Android 15 marked NNAPI deprecated in favor of LiteRT) and records the accepted trade-off: ship the Amlogic use case now, plan a LiteRT successor later.

This file is a planning artifact; it will be moved to / referenced from the new backend repo once that repo exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #469.
Summary
First step of the P0-1 track in the NPU / IREE roadmap: make the existing `TensorEncoding` sealed interface flow through the graph IR, so that downstream compile stages (the StableHLO converter, future IREE lowering) can preserve Q4_K / Q8_0 / TernaryPacked / TurboQuant layouts instead of silently re-materializing everything as FP32.

Two commits, each independently reviewable:
1. **`TensorSpec` + `TensorData` encoding accessors**

   Additive helpers under `sk.ainet.lang.tensor.ops`:

   - `TENSOR_ENCODING_METADATA_KEY` — single source of truth for the metadata key.
   - `TensorSpec.tensorEncoding: TensorEncoding?` — extension getter. `null` means "unknown / not carried" — intentionally distinct from `TensorEncoding.Dense`.
   - `TensorSpec.withTensorEncoding(TensorEncoding?)` — returns a copy with the encoding set (or removed when passed `null`), preserving all other metadata entries.
   - `TensorData<*, *>.inferTensorEncoding()` — centralized mapping from concrete `TensorData` subclasses to `TensorEncoding`. Collapses to `this is PackedBlockStorage -> this.encoding` today (covers Q4_K, Q8_0, TernaryPacked, TurboQuant uniformly) but gives us a single hook for future non-packed layouts.

   Unit-tested for: unset reads, round-trips on Q8_0 / Q4_K / TernaryPacked / Dense, clearing via `null`, preservation of unrelated metadata, and overwrite semantics. 8 cases total, all green.
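The tested behaviors can be pictured as plain assertions. This is a self-contained sketch using hypothetical stand-in types that mirror the described semantics, not the real `TensorSpecEncodingTest`:

```kotlin
// Stand-ins mirroring the described accessor semantics.
sealed interface TensorEncoding {
    data object Dense : TensorEncoding
    data object Q4_K : TensorEncoding
    data object Q8_0 : TensorEncoding
}

data class TensorSpec(val metadata: Map<String, Any> = emptyMap())

const val KEY = "tensor.encoding"  // hypothetical key value

val TensorSpec.tensorEncoding: TensorEncoding?
    get() = metadata[KEY] as? TensorEncoding

fun TensorSpec.withTensorEncoding(e: TensorEncoding?): TensorSpec =
    if (e == null) copy(metadata = metadata - KEY)
    else copy(metadata = metadata + (KEY to e))

fun main() {
    val base = TensorSpec(mapOf("shape" to listOf(4, 4)))

    check(base.tensorEncoding == null)                    // unset read
    check(base.withTensorEncoding(TensorEncoding.Q8_0)
              .tensorEncoding == TensorEncoding.Q8_0)     // round-trip
    check(base.withTensorEncoding(TensorEncoding.Q8_0)
              .withTensorEncoding(TensorEncoding.Q4_K)
              .tensorEncoding == TensorEncoding.Q4_K)     // overwrite wins
    check(base.withTensorEncoding(TensorEncoding.Dense)
              .withTensorEncoding(null)
              .tensorEncoding == null)                    // clear via null
    check(base.withTensorEncoding(TensorEncoding.Q8_0)
              .metadata["shape"] == listOf(4, 4))         // unrelated metadata kept
    println("all green")
}
```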
2. **`TraceToGraphBuilder.finalize()` propagation**

   When `finalize()` resolves an unresolved tensor through the session, it now also derives the tensor's encoding via `inferTensorEncoding()` and attaches it to the produced node's output spec and to every outgoing edge's tensor spec.

   Applies symmetrically to both branches — the weight-node branch (when `extractFloatArray` succeeds) and the input-placeholder branch (when it doesn't, which is exactly the path a Q8_0-backed weight used to fall into, silently losing its quantization).

Scoping notes / surprise wins
- `TensorSpec` already has a `metadata: Map<String, Any>` field — zero schema change needed.
- `TensorEncoding` already models Q4_K, Q8_0, TernaryPacked, TurboQuantPolar, TurboQuantPolarQjl, Opaque, and Dense — no new enum.
- `TensorData` already exposes `encoding: TensorEncoding` via the `PackedBlockStorage` interface, so the inference helper is one line.

Out of scope (follow-up PRs in the P0-1 track)
- `StableHloConverter` / quant-aware emitters to read `tensorEncoding` and emit `stablehlo.uniform_quantize` or quant dialect ops. The metadata has to flow first; then the emitter reads it.
- Loader plumbing (`StreamingGgufParametersLoader`). The loader produces `TensorData`, not `TensorSpec`; the spec-level attachment already happens at the trace-builder boundary where the spec is actually born.
- `QuantizedMatmul.matmulAutoDispatch()`. Runtime dispatch stays for CPU; the goal here is only that compile-time (IR) paths stop being blind to quantization.
- Parameterizing `Tensor<DType, V>` on a quant type parameter. Heavier redesign, and likely the wrong choice — keeping this as metadata on `TensorSpec` is lighter and composes better with the existing sealed `TensorEncoding`.

Test plan
- `./gradlew :skainet-lang:skainet-lang-core:jvmTest --tests "sk.ainet.lang.tensor.ops.TensorSpecEncodingTest"` — 8/8 green
- `./gradlew :skainet-compile:skainet-compile-dag:jvmTest` — green (no regressions from the `finalize()` change)

🤖 Generated with Claude Code