New Tensor Ops for TinyFoA
Context: TinyFoA on-device fine-tuning (PRD-tinyFoA-on-device-fine-tuning)
Target module: skainet-lang-core (TensorOps interface) + backends
Priority: High -- blocks BinarizeFunction, weight freezing, and the full TinyFoA training loop
Related-To: SKaiNET-developers/SKaiNET-research#1
1. abs -- Element-wise absolute value
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> abs(tensor: Tensor<T, V>): Tensor<T, V> |
| Differentiable | Yes (@Diff) |
| Backward | grad_input = upstream * sign(input) (zero at x = 0) |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- The Adam optimizer currently uses sqrt(x*x) as a workaround for abs -- this is numerically worse and slower.
- Used in gradient clipping and weight statistics.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.abs() extension
- DefaultExecutionTape.kt -- absBackward + dispatch entry + forward replay
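As a sketch of the intended CPU kernel and its tape backward, here is a minimal FloatArray version. Plain arrays stand in for Tensor<T, V>, and the names absCpu/absBackward are illustrative, not the final SKaiNET signatures:

```kotlin
// Forward: element-wise absolute value on a flat buffer.
fun absCpu(input: FloatArray): FloatArray =
    FloatArray(input.size) { i -> kotlin.math.abs(input[i]) }

// Backward: grad_input = upstream * sign(input).
// The else branch encodes the zero subgradient at x = 0 from the spec.
fun absBackward(upstream: FloatArray, input: FloatArray): FloatArray =
    FloatArray(input.size) { i ->
        upstream[i] * when {
            input[i] > 0f -> 1f
            input[i] < 0f -> -1f
            else -> 0f
        }
    }
```

Unlike the sqrt(x*x) workaround, this has no intermediate squaring, so it cannot overflow for large inputs.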
2. sign -- Element-wise sign
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> sign(tensor: Tensor<T, V>): Tensor<T, V> |
| Differentiable | No (non-differentiable, no @Diff) |
| Output | -1 for negative, 0 for zero, +1 for positive |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Core of TinyFoA binarization: Binarize(x) = sign(x) in the forward pass.
- Used together with the Straight-Through Estimator (STE) in BinarizeFunction.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.sign() extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
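A minimal sketch of the CPU kernel on a flat buffer (signCpu is an illustrative name, not the final signature). There is deliberately no backward pair: in BinarizeFunction the STE supplies the gradient instead.

```kotlin
// Forward: three-valued sign, matching the Output row above.
// -1 for negative, 0 for zero, +1 for positive.
fun signCpu(input: FloatArray): FloatArray =
    FloatArray(input.size) { i ->
        when {
            input[i] > 0f -> 1f
            input[i] < 0f -> -1f
            else -> 0f
        }
    }
```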
3. clamp -- Element-wise clamping
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> clamp(tensor: Tensor<T, V>, minVal: Float, maxVal: Float): Tensor<T, V> |
| Differentiable | Yes (@Diff) |
| Backward | grad_input = upstream where minVal <= x <= maxVal, else 0 |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- TinyFoA clips real-valued weights to [-1, 1] after each optimizer step: weight.clamp(-1f, 1f).
- The Straight-Through Estimator (STE) uses the clamped identity as the backward proxy for sign.
- General utility for gradient clipping, activation bounding, etc.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation (coerceIn)
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.clamp(minVal, maxVal) extension
- DefaultExecutionTape.kt -- clampBackward + dispatch entry + forward replay
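A hedged sketch of both passes on flat buffers (clampCpu/clampBackward are illustrative names). The forward is just coerceIn per element; the backward is the pass-through-inside-the-interval mask from the table:

```kotlin
// Forward: element-wise clamp to [minVal, maxVal] via coerceIn.
fun clampCpu(input: FloatArray, minVal: Float, maxVal: Float): FloatArray =
    FloatArray(input.size) { i -> input[i].coerceIn(minVal, maxVal) }

// Backward: grad_input = upstream where minVal <= x <= maxVal, else 0.
// Note the mask is computed from the ORIGINAL input, not the clamped output.
fun clampBackward(
    upstream: FloatArray,
    input: FloatArray,
    minVal: Float,
    maxVal: Float
): FloatArray =
    FloatArray(input.size) { i ->
        if (input[i] in minVal..maxVal) upstream[i] else 0f
    }
```

With minVal = -1f and maxVal = 1f, clampBackward is exactly the STE backward proxy for sign mentioned above.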
4. lt -- Element-wise less-than comparison
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> lt(tensor: Tensor<T, V>, value: Float): Tensor<T, V> |
| Differentiable | No (returns a 0/1 mask) |
| Output | 1.0 where x < value, 0.0 otherwise |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Weight freezing mask generation: identify which weight blocks to freeze based on partition index.
- General-purpose masking for conditional operations.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.lt(value) extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
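The kernel reduces to a single comparison per element; a minimal sketch on a flat buffer (ltCpu is an illustrative name):

```kotlin
// Forward: 1.0 where x < value, 0.0 otherwise. No backward is recorded,
// since the output is a non-differentiable 0/1 mask.
fun ltCpu(input: FloatArray, value: Float): FloatArray =
    FloatArray(input.size) { i -> if (input[i] < value) 1f else 0f }
```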
5. ge -- Element-wise greater-than-or-equal comparison
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> ge(tensor: Tensor<T, V>, value: Float): Tensor<T, V> |
| Differentiable | No (returns a 0/1 mask) |
| Output | 1.0 where x >= value, 0.0 otherwise |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Weight freezing mask generation: ge + lt together select a block/partition of weights.
- STE backward clipping: mask the gradient to pass only where |x| <= 1.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.ge(value) extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
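A sketch of the kernel plus the ge + lt partition-selection idiom from the "Why needed" list, again on flat buffers (geCpu and partitionMask are illustrative names, not part of the planned API):

```kotlin
// Forward: 1.0 where x >= value, 0.0 otherwise.
fun geCpu(input: FloatArray, value: Float): FloatArray =
    FloatArray(input.size) { i -> if (input[i] >= value) 1f else 0f }

// ge + lt combined: select the half-open partition [lo, hi),
// i.e. 1.0 where lo <= x < hi, 0.0 elsewhere.
fun partitionMask(x: FloatArray, lo: Float, hi: Float): FloatArray {
    val geMask = geCpu(x, lo)                   // 1 where x >= lo
    return FloatArray(x.size) { i ->
        geMask[i] * (if (x[i] < hi) 1f else 0f) // lt mask: 1 where x < hi
    }
}
```

Applied to a tensor of partition indices, partitionMask yields exactly the 0/1 freeze mask for one weight block.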
Acceptance criteria
- All 5 ops declared in the TensorOps interface
- CPU backend (DefaultCpuOps) implements all 5 with FP32/FP16/Int32 support
- VoidTensorOps stubs return correct shapes
- Extension functions available on Tensor<T, V>
- abs backward: upstream * sign(input), zero gradient at x = 0
- clamp backward: gradient passes through in [minVal, maxVal], zero outside
Future ops (not yet implemented, needed later)
| Op | Purpose | Blocked by |
|---|---|---|
| Tensor slice read/write | Block-diagonal weight freezing (read/write sub-tensors) | -- |
| pad2d | LC variant convolution padding | -- |
| unfold / im2col | Locally-connected 2D layer (LC variant) | -- |