Flatten Operation as Zero-Copy Tensor View
Overview
The flatten operation in tensor libraries traditionally requires data materialization - creating a new contiguous memory layout by copying and rearranging elements from the source tensor. However, flatten can be implemented as a zero-copy tensor view that provides the same logical interface without any data duplication or memory allocation.
This document describes how to implement flatten as a view operation using the existing tensor view infrastructure in the skainet library, providing significant performance and memory efficiency benefits.
Current Implementation Analysis
Existing Implementation (with Materialization)
The current flatten implementation in VoidTensorOps.kt follows a materialization approach:
override fun <T : DType> flatten(tensor: Tensor<T, V>, startDim: Int, endDim: Int): Tensor<T, V> {
    val resultShape = calculateFlattenShape(tensor.shape, startDim, endDim)
    val resultData = dataFactory.zeros<T, V>(resultShape, tensor.dtype) // ❌ Creates new data
    // ... element-wise copy from tensor into resultData ...
    return VoidOpsTensor(resultData, tensor.dtype)
}
Problems with Current Approach:
- ❌ Memory Overhead: Allocates new memory equal to the source tensor size
- ❌ Performance Cost: Requires copying all tensor elements
- ❌ Unnecessary Computation: flatten is a pure shape transformation, so the data movement is inherently avoidable
- ❌ Memory Pressure: Doubles memory usage during operation
Zero-Copy View Implementation
Core Concept
Flatten is fundamentally a coordinate transformation operation:
- Multi-dimensional coordinates → 1D coordinates (forward)
- 1D coordinates → Multi-dimensional coordinates (reverse)
A zero-copy implementation uses the existing tensor view infrastructure to provide this coordinate mapping without data duplication.
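Because only the shape changes, the view's shape can be computed up front as pure metadata. Below is a minimal sketch of the calculateFlattenShape helper referenced throughout this document (the actual skainet helper may differ; it assumes Shape exposes rank, dimensions: IntArray, and a vararg constructor, matching the usage elsewhere in this document):

fun calculateFlattenShape(shape: Shape, startDim: Int, endDim: Int): Shape {
    // Normalize negative dimension indices (e.g. endDim = -1 means the last dimension)
    val start = if (startDim < 0) shape.rank + startDim else startDim
    val end = if (endDim < 0) shape.rank + endDim else endDim
    require(start in 0..end && end < shape.rank) { "Invalid flatten range [$startDim, $endDim]" }

    val dims = mutableListOf<Int>()
    for (i in 0 until start) dims += shape.dimensions[i]                  // leading dims, untouched
    dims += (start..end).fold(1) { acc, i -> acc * shape.dimensions[i] }  // merged range
    for (i in end + 1 until shape.rank) dims += shape.dimensions[i]       // trailing dims, untouched
    return Shape(*dims.toIntArray())
}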
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ FlattenTensorView │
├─────────────────────────────────────────────────────────────┤
│ + parentTensor: Tensor<T, V> │
│ + viewShape: Shape (computed flattened) │
│ + indexMapping: FlattenIndexMapper (coordinate transform)│
│ + data: FlattenTensorData (delegating wrapper) │
├─────────────────────────────────────────────────────────────┤
│ - startDim: Int │
│ - endDim: Int │
└─────────────────────────────────────────────────────────────┘
│
│ delegates to
▼
┌─────────────────────────────────────────────────────────────┐
│ FlattenIndexMapper │
├─────────────────────────────────────────────────────────────┤
│ + mapToParent(childIndices): IntArray │
│ + isContiguous(): Boolean │
│ + getStride(): IntArray │
├─────────────────────────────────────────────────────────────┤
│ - originalShape: Shape │
│ - startDim, endDim: Int │
│ - strides: IntArray (precomputed) │
└─────────────────────────────────────────────────────────────┘
Implementation Details
1. FlattenTensorView
/**
* A zero-copy tensor view that provides flattened access to a parent tensor.
*
* This implementation creates a logical 1D or partially flattened view of a
* multi-dimensional tensor without copying any data. All element access is
* delegated to the parent tensor through coordinate transformation.
*/
class FlattenTensorView<T : DType, V>(
    override val parentTensor: Tensor<T, V>,
    private val startDim: Int,
    private val endDim: Int
) : TensorView<T, V> {

    override val viewShape: Shape by lazy {
        calculateFlattenShape(parentTensor.shape, startDim, endDim)
    }

    override val indexMapping: IndexMapper by lazy {
        FlattenIndexMapper(parentTensor.shape, startDim, endDim)
    }

    override val data: TensorData<T, V> by lazy {
        FlattenTensorData(parentTensor.data, viewShape, indexMapping)
    }

    override val ops: TensorOps<V> = parentTensor.ops
    override val dtype: KClass<T> = parentTensor.dtype
}
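The FlattenTensorData wrapper is not spelled out here; its essence is flat-index translation, so that an access like view.data[i] reaches the parent's storage. A sketch of that plumbing, assuming row-major layout (the helper names unravelIndex/ravelIndex are ours, and the real TensorData interface may differ):

// Flat view index -> multi-dimensional view indices (row-major).
fun unravelIndex(flat: Int, shape: Shape): IntArray {
    val indices = IntArray(shape.rank)
    var remaining = flat
    for (i in shape.rank - 1 downTo 0) {
        indices[i] = remaining % shape.dimensions[i]
        remaining /= shape.dimensions[i]
    }
    return indices
}

// Multi-dimensional indices -> flat row-major index.
fun ravelIndex(indices: IntArray, shape: Shape): Int {
    var flat = 0
    for (i in indices.indices) flat = flat * shape.dimensions[i] + indices[i]
    return flat
}

// Inside FlattenTensorData, every element access then becomes:
//   parentData[ravelIndex(indexMapping.mapToParent(unravelIndex(i, viewShape)), parentShape)]
// No element is copied; only indices are transformed.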
2. FlattenIndexMapper
The core of the zero-copy implementation is the coordinate transformation logic:
/**
* Index mapper for flatten operations that transforms 1D flattened coordinates
* back to multi-dimensional parent coordinates.
*/
class FlattenIndexMapper(
    private val originalShape: Shape,
    private val startDim: Int,
    private val endDim: Int
) : IndexMapper {

    private val actualStartDim = if (startDim < 0) originalShape.rank + startDim else startDim
    private val actualEndDim = if (endDim < 0) originalShape.rank + endDim else endDim

    // Precomputed strides for efficient coordinate calculation
    private val strides: IntArray = computeStrides(originalShape)
    private val flattenedStride: Int = computeFlattenedStride()

    override fun mapToParent(childIndices: IntArray): IntArray {
        val parentIndices = IntArray(originalShape.rank)

        // Copy dimensions before the flattened range
        for (i in 0 until actualStartDim) {
            parentIndices[i] = childIndices[i]
        }

        // Transform the flattened dimension back to multi-dimensional indices.
        // Iterate from the innermost (rightmost) dimension outward, because the
        // rightmost dimension varies fastest in row-major order.
        val flatIndex = childIndices[actualStartDim]
        var remainingIndex = flatIndex
        for (i in actualEndDim downTo actualStartDim) {
            val dimSize = originalShape.dimensions[i]
            parentIndices[i] = remainingIndex % dimSize
            remainingIndex /= dimSize
        }

        // Copy dimensions after the flattened range
        var childOffset = actualStartDim + 1
        for (i in actualEndDim + 1 until originalShape.rank) {
            parentIndices[i] = childIndices[childOffset++]
        }

        return parentIndices
    }

    override fun isContiguous(): Boolean {
        // Flatten views are contiguous when flattening the rightmost dimensions
        return actualEndDim == originalShape.rank - 1
    }

    override fun getStride(): IntArray {
        // Return view strides (different from parent strides due to flattening)
        val viewStrides = IntArray(calculateViewRank())
        // ... stride calculation logic
        return viewStrides
    }
}
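For completeness, a plausible sketch of the computeStrides helper used above (row-major convention; the real skainet utility may differ):

fun computeStrides(shape: Shape): IntArray {
    val strides = IntArray(shape.rank)
    var stride = 1
    // The rightmost dimension varies fastest in row-major layout.
    for (i in shape.rank - 1 downTo 0) {
        strides[i] = stride
        stride *= shape.dimensions[i]
    }
    return strides
}
// Example: Shape(2, 3, 4, 5) -> strides [60, 20, 5, 1]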
Coordinate Transformation Examples
Example 1: 2D → 1D Flatten
Original tensor: Shape(3, 4)
Data layout: [0,1,2,3,4,5,6,7,8,9,10,11]
View: flatten(0, 1) → Shape(12)
Mapping: childIndex[5] → parentIndex[1,1]
5 = 1*4 + 1
Example 2: 4D Partial Flatten (CNN Use Case)
Original tensor: Shape(2, 3, 4, 5) // (batch, channels, height, width)
View: flatten(1, 3) → Shape(2, 60) // keep batch, flatten C×H×W
Coordinate mapping:
- childIndex[1, 25] → parentIndex[1, 1, 1, 0]
- batch: 1 (direct copy)
- flattened[25]: 25 = 1*20 + 1*5 + 0 → [1, 1, 0]
Example 3: 3D Middle Dimension Flatten
Original tensor: Shape(2, 6, 4)
View: flatten(1, 1) → Shape(2, 6, 4) // no-op, preserves shape
Coordinate mapping:
- childIndex[1, 3, 2] → parentIndex[1, 3, 2] // direct mapping
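These mappings can be checked directly against the FlattenIndexMapper sketch above:

val cnnMapper = FlattenIndexMapper(Shape(2, 3, 4, 5), startDim = 1, endDim = 3)
check(cnnMapper.mapToParent(intArrayOf(1, 25)).contentEquals(intArrayOf(1, 1, 1, 0))) // 25 = 1*20 + 1*5 + 0

val flatMapper = FlattenIndexMapper(Shape(3, 4), startDim = 0, endDim = 1)
check(flatMapper.mapToParent(intArrayOf(5)).contentEquals(intArrayOf(1, 1)))          // 5 = 1*4 + 1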
Performance Benefits
Memory Efficiency
Traditional Approach:
├── Original Tensor: 1GB
├── Flattened Copy: 1GB ❌
└── Total Memory: 2GB
Zero-Copy View:
├── Original Tensor: 1GB
├── View Metadata: <1KB ✅
└── Total Memory: ~1GB
Time Complexity
Traditional Flatten: O(n) - copy all elements
Zero-Copy Flatten: O(1) - only metadata creation
Element Access:
Traditional: O(1) - direct array access
Zero-Copy: O(1) in tensor size - each access pays a small O(rank) coordinate transform before the parent lookup
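An illustrative (non-rigorous) way to observe the creation-cost gap, reusing the hypothetical createTensor factory and ops handle that appear in the test section below:

import kotlin.system.measureNanoTime

val big = createTensor(Shape(4096, 4096))
// Single-shot timing only; a real benchmark needs warm-up and repetition.
val copyNanos = measureNanoTime { ops.flatten(big, 0, 1) }       // O(n): allocate + copy
val viewNanos = measureNanoTime { FlattenTensorView(big, 0, 1) } // O(1): metadata only
println("materialize: ${copyNanos}ns, view: ${viewNanos}ns")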
Cache Performance
- Contiguous Access: When flattening rightmost dimensions, maintains cache locality
- Stride Patterns: Preserves memory access patterns for vectorization
- No Cache Pollution: Avoids filling caches with duplicate data
Integration with Existing Code
The zero-copy flatten can be integrated seamlessly:
// In VoidTensorOps.kt
override fun <T : DType> flatten(tensor: Tensor<T, V>, startDim: Int, endDim: Int): Tensor<T, V> {
    // Use zero-copy view instead of materialization
    return FlattenTensorView(tensor, startDim, endDim)
}

// In Tensor.kt extensions
fun <T : DType, V> Tensor<T, V>.flatten(startDim: Int = 0, endDim: Int = -1): Tensor<T, V> {
    return FlattenTensorView(this, startDim, endDim)
}
Edge Cases and Considerations
1. Identity Operations
// These operations should be optimized to return the original tensor
tensor.flatten(0, 0) // Single dimension - no change
tensor.flatten(1, 1) // Single dimension - no change
tensor.flatten(0, -1) // All dimensions on 1D tensor - no change
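One way to short-circuit these cases is to check the normalized range before constructing a view. A sketch follows (the name flattenOrSelf is ours; the check could equally live inside the flatten extension shown earlier):

fun <T : DType, V> Tensor<T, V>.flattenOrSelf(startDim: Int = 0, endDim: Int = -1): Tensor<T, V> {
    val start = if (startDim < 0) shape.rank + startDim else startDim
    val end = if (endDim < 0) shape.rank + endDim else endDim
    // A single-dimension range (which includes flatten(0, -1) on a 1D tensor)
    // changes nothing, so return the receiver unchanged.
    return if (start == end) this else FlattenTensorView(this, start, end)
}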
2. Nested Views
val sliced = originalTensor[0..2, 1..3, All] // SlicedTensorView
val flattened = sliced.flatten(1, 2) // FlattenTensorView of SlicedTensorView
// Should compose efficiently without deep nesting
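One composition strategy is to merge index mappers rather than stacking view wrappers, so a view-of-a-view still pays only one translation object per access. A sketch (ComposedIndexMapper is our name, not an existing skainet class):

class ComposedIndexMapper(
    private val childMapper: IndexMapper,  // mapper of the outermost view (e.g. the flatten)
    private val parentMapper: IndexMapper  // mapper of the underlying view (e.g. the slice)
) : IndexMapper {
    override fun mapToParent(childIndices: IntArray): IntArray =
        parentMapper.mapToParent(childMapper.mapToParent(childIndices))

    // Conservative: treat the composition as contiguous only if both layers are.
    override fun isContiguous(): Boolean = childMapper.isContiguous() && parentMapper.isContiguous()

    override fun getStride(): IntArray = TODO("derive combined strides from both layers")
}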
3. Materialization Strategy
Sometimes materialization is beneficial:
- When the view will be accessed many times with random patterns
- When the parent tensor is temporary and will be garbage collected
- For operations that require contiguous memory (some BLAS operations)
val view = tensor.flatten(1, 3)
val materialized = view.materialize() // Explicit materialization when needed
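A sketch of such a materialize helper, assuming flat-index read/write access on TensorData plus the dataFactory and VoidOpsTensor seen earlier (exact APIs may differ):

fun <T : DType, V> TensorView<T, V>.materialize(): Tensor<T, V> {
    val result = dataFactory.zeros<T, V>(viewShape, dtype)
    // Walk the view once, copying each element into contiguous storage.
    for (flat in 0 until viewShape.volume) {
        result[flat] = data[flat]
    }
    return VoidOpsTensor(result, dtype)
}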
Testing Considerations
The zero-copy implementation must pass all existing tests in VoidOpsFlattenTest.kt:
@Test
fun testFlattenView_ZeroCopy_SameResults() {
    val original = createTensor(Shape(2, 3, 4))

    val traditional = ops.flatten(original, 1, 2)  // Current implementation
    val view = FlattenTensorView(original, 1, 2)   // New implementation

    // Same logical shape
    assertEquals(traditional.shape, view.shape)

    // Same element access (but view is zero-copy)
    for (i in 0 until view.shape.volume) {
        assertEquals(traditional.data[i], view.data[i])
    }

    // Verify zero-copy property
    assertTrue(view is TensorView)
    assertSame(original, view.parentTensor)
}
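A further test worth adding pins down the partial-flatten coordinate mapping from Example 2. A sketch using kotlin.test assertions:

import kotlin.test.assertContentEquals

@Test
fun testFlattenIndexMapper_PartialFlatten_CnnCase() {
    // Shape(2, 3, 4, 5), flatten(1, 3) -> flattened strides are 20, 5, 1
    val mapper = FlattenIndexMapper(Shape(2, 3, 4, 5), startDim = 1, endDim = 3)

    assertContentEquals(intArrayOf(1, 1, 1, 0), mapper.mapToParent(intArrayOf(1, 25))) // 25 = 1*20 + 1*5 + 0
    assertContentEquals(intArrayOf(0, 2, 3, 4), mapper.mapToParent(intArrayOf(0, 59))) // 59 = 2*20 + 3*5 + 4
}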
Conclusion
Implementing flatten as a zero-copy tensor view provides significant benefits:
- ✅ Memory Efficiency: No data duplication
- ✅ Performance: O(1) creation time vs O(n) copy time
- ✅ Composability: Works with existing view infrastructure
- ✅ Compatibility: Drop-in replacement for current implementation
- ✅ Flexibility: Supports partial flattening and complex scenarios
The implementation leverages the existing tensor view infrastructure (TensorView, IndexMapper, TensorData) to provide a robust, efficient, and maintainable solution.
This approach transforms flatten from a memory-intensive materialization operation into a lightweight metadata operation, enabling more efficient neural network computations and tensor manipulations.