Context
Part of the priority-ordered NPU / IREE roadmap.
P0-1 is "make quantization first-class in the graph IR so StableHLO export doesn't erase it." Today `Q8_0TensorData` / `Q4_KTensorData` / `TernaryTensorData` live as `TensorData` subclasses and `QuantizedMatmul.matmulAutoDispatch()` discovers them via runtime `is`-checks on the weight buffer. The `ComputeGraph` / `GraphNode` / `TensorSpec` layer is completely blind to quantization, which is why any StableHLO export silently re-materializes everything as FP32.
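To make the status quo concrete, here is a minimal sketch of the runtime `is`-check dispatch pattern described above. All types are simplified stand-ins, not the real skainet declarations, and `dispatchKernel` only approximates the shape of `matmulAutoDispatch()`:

```kotlin
// Simplified stand-ins for the real TensorData subclasses.
sealed interface TensorData
object DenseTensorData : TensorData
object Q8_0TensorData : TensorData
object Q4_KTensorData : TensorData
object TernaryTensorData : TensorData

// Runtime dispatch on the weight buffer's concrete type, in the spirit
// of QuantizedMatmul.matmulAutoDispatch(). The graph IR never sees this
// decision, which is exactly the problem P0-1 addresses.
fun dispatchKernel(weight: TensorData): String = when (weight) {
    is Q8_0TensorData -> "q8_0"
    is Q4_KTensorData -> "q4_k"
    is TernaryTensorData -> "ternary"
    else -> "dense-fp32"
}
```

The point of the sketch: the quantization decision lives entirely at runtime, so any compile-time consumer of the graph sees only dense tensors.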
Scoping uncovered two existing hooks we can lean on:
- `TensorSpec` already carries `metadata: Map<String, Any>` (skainet-lang-core/.../tensor/ops/TensorSpec.kt). No schema change needed.
- `sealed interface TensorEncoding` already models storage encodings: `Dense`, `Q4_K`, `Q8_0`, `TernaryPacked`, `TurboQuantPolar`, `TurboQuantPolarQjl`, `Opaque` (skainet-lang-core/.../tensor/storage/TensorEncoding.kt). We can reuse it — no new enum.
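For orientation, the encoding hierarchy presumably looks roughly like the following. This is an approximation reconstructed from the names above, not the actual contents of `TensorEncoding.kt`:

```kotlin
// Approximate shape of the existing storage-encoding model: a sealed
// interface with one object per encoding. Details (e.g. whether these
// carry parameters) may differ in the real file.
sealed interface TensorEncoding {
    object Dense : TensorEncoding
    object Q4_K : TensorEncoding
    object Q8_0 : TensorEncoding
    object TernaryPacked : TensorEncoding
    object TurboQuantPolar : TensorEncoding
    object TurboQuantPolarQjl : TensorEncoding
    object Opaque : TensorEncoding
}
```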
This PR
- Typed accessor helper: add `TensorSpec.tensorEncoding` get/set helpers (extension functions) that read/write a single well-known key on `metadata` and return `TensorEncoding?`. Default `null` means "unknown / not carried" — not the same as `Dense`.
- Populate in the GGUF loader: in `StreamingGgufParametersLoader.load()`, alongside the existing `when (tensorInfo.tensorType)` dispatch that constructs `Q4_KBlockTensorData` / `Q8_0BlockTensorData` / etc., set the corresponding `TensorEncoding` on the `TensorSpec` that surfaces the loaded tensor.
- Preserve through tracing: in `TraceToGraphBuilder` (`buildInputSpecs` / `buildOutputSpecs` / inline fallback sites), propagate `tensorEncoding` from source to derived specs so a node whose input is a `Q4_K` weight carries that metadata onto its `GraphNode.inputs` entry.
- Unit test: load a small synthetic GGUF-like fixture with a Q4_K weight, trace a `matmul`, assert the resulting `GraphNode` input spec's `tensorEncoding` is `TensorEncoding.Q4_K`.
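The pieces above could be sketched roughly as follows. `TensorSpec`, `TensorEncoding`, the `ENCODING_KEY` constant, and the helper names are simplified assumptions declared locally so the sketch is self-contained; the real declarations live in skainet-lang-core and may differ:

```kotlin
// Minimal stand-ins for the real types.
sealed interface TensorEncoding {
    object Dense : TensorEncoding
    object Q4_K : TensorEncoding
}

data class TensorSpec(
    val shape: List<Int>,
    val metadata: Map<String, Any> = emptyMap(),
)

// Hypothetical well-known metadata key; the actual constant may differ.
const val ENCODING_KEY = "tensorEncoding"

// Typed read: null means "unknown / not carried", deliberately distinct
// from Dense.
val TensorSpec.tensorEncoding: TensorEncoding?
    get() = metadata[ENCODING_KEY] as? TensorEncoding

// Typed write via copy, since metadata is modeled as an immutable Map here.
// (This is what the GGUF loader would call when it constructs a Q4_K weight.)
fun TensorSpec.withTensorEncoding(e: TensorEncoding): TensorSpec =
    copy(metadata = metadata + (ENCODING_KEY to e))

// Tracing propagation: a derived spec inherits its source's encoding,
// mirroring what TraceToGraphBuilder would do in buildInputSpecs.
fun propagateEncoding(source: TensorSpec, derived: TensorSpec): TensorSpec {
    val enc = source.tensorEncoding ?: return derived
    return derived.withTensorEncoding(enc)
}
```

A usage shaped like the planned unit test: tag a weight spec with `Q4_K`, derive a spec through `propagateEncoding`, and assert the encoding survives while an untagged spec reads back `null`.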
Out of scope (follow-up PRs in the P0-1 track)
- Teaching `StableHloConverter` / quant-aware emitters to read `tensorEncoding` and emit `stablehlo.uniform_quantize` or quant dialect ops. The metadata has to flow first, then the emitter reads it.
- Parameterizing `Tensor<DType, V>` on a quantization type parameter. That's a much larger redesign and probably the wrong choice — keeping it as metadata on `TensorSpec` is lighter and composes better.
- Changing `QuantizedMatmul.matmulAutoDispatch()`. Runtime dispatch stays for CPU; the goal is only that compile-time (IR) paths stop being blind.
Why this is the right first step
- Purely additive: no existing API changes, no breaking call sites.
- The metadata channel already exists; we're just typing and populating it.
- Later PRs (StableHLO emitter, quant dialect lowering) become one-file local changes instead of cross-cutting refactors.