New Tensor Ops for TinyFoA
Context: TinyFoA on-device fine-tuning (PRD-tinyFoA-on-device-fine-tuning)
Target module: skainet-lang-core (TensorOps interface) + backends
Priority: High -- blocks BinarizeFunction, weight freezing, and the full TinyFoA training loop
Related-To: SKaiNET-developers/SKaiNET-research#1
1. abs -- Element-wise absolute value
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> abs(tensor: Tensor<T, V>): Tensor<T, V> |
| Differentiable | Yes (@Diff) |
| Backward | grad_input = upstream * sign(input) (zero at x = 0) |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- The Adam optimizer currently uses sqrt(x*x) as a workaround for abs -- this is numerically worse and slower.
- Used in gradient clipping and weight statistics.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.abs() extension
- DefaultExecutionTape.kt -- absBackward + dispatch entry + forward replay
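As a sketch of the intended CPU kernel and its tape backward, here is a minimal FloatArray version. Plain arrays stand in for Tensor<T, V>, and the names absCpu/absBackward are illustrative, not the final SKaiNET signatures:

```kotlin
// Forward: element-wise absolute value on a flat buffer.
fun absCpu(input: FloatArray): FloatArray =
    FloatArray(input.size) { i -> kotlin.math.abs(input[i]) }

// Backward: grad_input = upstream * sign(input).
// The else branch encodes the zero subgradient at x = 0 from the spec.
fun absBackward(upstream: FloatArray, input: FloatArray): FloatArray =
    FloatArray(input.size) { i ->
        upstream[i] * when {
            input[i] > 0f -> 1f
            input[i] < 0f -> -1f
            else -> 0f
        }
    }
```

Unlike the sqrt(x*x) workaround, this has no intermediate squaring, so it cannot overflow for large inputs.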
2. sign -- Element-wise sign
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> sign(tensor: Tensor<T, V>): Tensor<T, V> |
| Differentiable | No (non-differentiable, no @Diff) |
| Output | -1 for negative, 0 for zero, +1 for positive |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Core of TinyFoA binarization: Binarize(x) = sign(x) in the forward pass.
- Used together with the Straight-Through Estimator (STE) in BinarizeFunction.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.sign() extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
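A minimal sketch of the CPU kernel on a flat buffer (signCpu is an illustrative name, not the final signature). There is deliberately no backward pair: in BinarizeFunction the STE supplies the gradient instead.

```kotlin
// Forward: three-valued sign, matching the Output row above.
// -1 for negative, 0 for zero, +1 for positive.
fun signCpu(input: FloatArray): FloatArray =
    FloatArray(input.size) { i ->
        when {
            input[i] > 0f -> 1f
            input[i] < 0f -> -1f
            else -> 0f
        }
    }
```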
3. clamp -- Element-wise clamping
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> clamp(tensor: Tensor<T, V>, minVal: Float, maxVal: Float): Tensor<T, V> |
| Differentiable | Yes (@Diff) |
| Backward | grad_input = upstream where minVal <= x <= maxVal, else 0 |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- TinyFoA clips real-valued weights to [-1, 1] after each optimizer step: weight.clamp(-1f, 1f).
- The Straight-Through Estimator (STE) uses the clamped identity as the backward proxy for sign.
- General utility for gradient clipping, activation bounding, etc.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation (coerceIn)
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.clamp(minVal, maxVal) extension
- DefaultExecutionTape.kt -- clampBackward + dispatch entry + forward replay
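A hedged sketch of both passes on flat buffers (clampCpu/clampBackward are illustrative names). The forward is just coerceIn per element; the backward is the pass-through-inside-the-interval mask from the table:

```kotlin
// Forward: element-wise clamp to [minVal, maxVal] via coerceIn.
fun clampCpu(input: FloatArray, minVal: Float, maxVal: Float): FloatArray =
    FloatArray(input.size) { i -> input[i].coerceIn(minVal, maxVal) }

// Backward: grad_input = upstream where minVal <= x <= maxVal, else 0.
// Note the mask is computed from the ORIGINAL input, not the clamped output.
fun clampBackward(
    upstream: FloatArray,
    input: FloatArray,
    minVal: Float,
    maxVal: Float
): FloatArray =
    FloatArray(input.size) { i ->
        if (input[i] in minVal..maxVal) upstream[i] else 0f
    }
```

With minVal = -1f and maxVal = 1f, clampBackward is exactly the STE backward proxy for sign mentioned above.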
4. lt -- Element-wise less-than comparison
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> lt(tensor: Tensor<T, V>, value: Float): Tensor<T, V> |
| Differentiable | No (returns a 0/1 mask) |
| Output | 1.0 where x < value, 0.0 otherwise |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Weight freezing mask generation: identify which weight blocks to freeze based on partition index.
- General-purpose masking for conditional operations.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.lt(value) extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
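The kernel reduces to a single comparison per element; a minimal sketch on a flat buffer (ltCpu is an illustrative name):

```kotlin
// Forward: 1.0 where x < value, 0.0 otherwise. No backward is recorded,
// since the output is a non-differentiable 0/1 mask.
fun ltCpu(input: FloatArray, value: Float): FloatArray =
    FloatArray(input.size) { i -> if (input[i] < value) 1f else 0f }
```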
5. ge -- Element-wise greater-than-or-equal comparison
| Field | Value |
|---|---|
| Signature | fun <T : DType, V> ge(tensor: Tensor<T, V>, value: Float): Tensor<T, V> |
| Differentiable | No (returns a 0/1 mask) |
| Output | 1.0 where x >= value, 0.0 otherwise |
| Supported dtypes | FP32, FP16, Int32 |

Why needed:
- Weight freezing mask generation: ge + lt together select a block/partition of weights.
- STE backward clipping: mask the gradient to pass only where |x| <= 1.

Files to touch:
- TensorOps.kt -- interface declaration
- DefaultCpuOps.kt -- CPU implementation
- VoidTensorOps.kt -- void/shape stub
- TensorExtensions.kt -- Tensor.ge(value) extension
- DefaultExecutionTape.kt -- forward replay entry only (no backward)
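A sketch of the kernel plus the ge + lt partition-selection idiom from the "Why needed" list, again on flat buffers (geCpu and partitionMask are illustrative names, not part of the planned API):

```kotlin
// Forward: 1.0 where x >= value, 0.0 otherwise.
fun geCpu(input: FloatArray, value: Float): FloatArray =
    FloatArray(input.size) { i -> if (input[i] >= value) 1f else 0f }

// ge + lt combined: select the half-open partition [lo, hi),
// i.e. 1.0 where lo <= x < hi, 0.0 elsewhere.
fun partitionMask(x: FloatArray, lo: Float, hi: Float): FloatArray {
    val geMask = geCpu(x, lo)                   // 1 where x >= lo
    return FloatArray(x.size) { i ->
        geMask[i] * (if (x[i] < hi) 1f else 0f) // lt mask: 1 where x < hi
    }
}
```

Applied to a tensor of partition indices, partitionMask yields exactly the 0/1 freeze mask for one weight block.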
Acceptance criteria
- All 5 ops declared in the TensorOps interface
- CPU backend (DefaultCpuOps) implements all 5 with FP32/FP16/Int32 support
- VoidTensorOps stubs return correct shapes
- Extension functions available on Tensor<T, V>
- abs backward: upstream * sign(input), zero gradient at x = 0
- clamp backward: gradient passes through in [minVal, maxVal], zero outside
Future ops (not yet implemented, needed later)
| Op | Purpose | Blocked by |
|---|---|---|
| Tensor slice read/write | Block-diagonal weight freezing (read/write sub-tensors) | -- |
| pad2d | LC variant convolution padding | -- |
| unfold / im2col | Locally-connected 2D layer (LC variant) | -- |