Flatten Operation as Zero-Copy Tensor View
Overview
The flatten operation in tensor libraries traditionally requires data materialization - creating a new contiguous memory layout by copying and rearranging elements from the source tensor. However, flatten can be implemented as a zero-copy tensor view that provides the same logical interface without any data duplication or memory allocation.
This document describes how to implement flatten as a view operation using the existing tensor view infrastructure in the skainet library, providing significant performance and memory efficiency benefits.
Current Implementation Analysis
Existing Implementation (with Materialization)
The current flatten implementation in VoidTensorOps.kt follows a materialization approach:
override fun <T : DType> flatten(tensor: Tensor<T, V>, startDim: Int, endDim: Int): Tensor<T, V> {
    val resultShape = calculateFlattenShape(tensor.shape, startDim, endDim)
    val resultData = dataFactory.zeros<T, V>(resultShape, tensor.dtype) // ❌ Creates new data
    // ... element-wise copy from tensor into resultData ...
    return VoidOpsTensor(resultData, tensor.dtype)
}
Problems with Current Approach:
- ❌ Memory Overhead: Allocates new memory equal to the source tensor size
- ❌ Performance Cost: Requires copying all tensor elements
- ❌ Unnecessary Computation: flatten is a pure shape transformation, so the data movement is inherently avoidable
- ❌ Memory Pressure: Doubles memory usage during operation
Zero-Copy View Implementation
Core Concept
Flatten is fundamentally a coordinate transformation operation:
- Multi-dimensional coordinates → 1D coordinates (forward)
- 1D coordinates → Multi-dimensional coordinates (reverse)
A zero-copy implementation uses the existing tensor view infrastructure to provide this coordinate mapping without data duplication.
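Because only the shape changes, the view's shape can be computed up front as pure metadata. Below is a minimal sketch of the calculateFlattenShape helper referenced throughout this document (the actual skainet helper may differ; it assumes Shape exposes rank, dimensions: IntArray, and a vararg constructor, matching the usage elsewhere in this document):

fun calculateFlattenShape(shape: Shape, startDim: Int, endDim: Int): Shape {
    // Normalize negative dimension indices (e.g. endDim = -1 means the last dimension)
    val start = if (startDim < 0) shape.rank + startDim else startDim
    val end = if (endDim < 0) shape.rank + endDim else endDim
    require(start in 0..end && end < shape.rank) { "Invalid flatten range [$startDim, $endDim]" }

    val dims = mutableListOf<Int>()
    for (i in 0 until start) dims += shape.dimensions[i]                  // leading dims, untouched
    dims += (start..end).fold(1) { acc, i -> acc * shape.dimensions[i] }  // merged range
    for (i in end + 1 until shape.rank) dims += shape.dimensions[i]       // trailing dims, untouched
    return Shape(*dims.toIntArray())
}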
Architecture Overview
┌─────────────────────────────────────────────────────────────┐
│ FlattenTensorView │
├─────────────────────────────────────────────────────────────┤
│ + parentTensor: Tensor<T, V> │
│ + viewShape: Shape (computed flattened) │
│ + indexMapping: FlattenIndexMapper (coordinate transform)│
│ + data: FlattenTensorData (delegating wrapper) │
├─────────────────────────────────────────────────────────────┤
│ - startDim: Int │
│ - endDim: Int │
└─────────────────────────────────────────────────────────────┘
│
│ delegates to
▼
┌─────────────────────────────────────────────────────────────┐
│ FlattenIndexMapper │
├─────────────────────────────────────────────────────────────┤
│ + mapToParent(childIndices): IntArray │
│ + isContiguous(): Boolean │
│ + getStride(): IntArray │
├─────────────────────────────────────────────────────────────┤
│ - originalShape: Shape │
│ - startDim, endDim: Int │
│ - strides: IntArray (precomputed) │
└─────────────────────────────────────────────────────────────┘
Implementation Details
1. FlattenTensorView
/**
* A zero-copy tensor view that provides flattened access to a parent tensor.
*
* This implementation creates a logical 1D or partially flattened view of a
* multi-dimensional tensor without copying any data. All element access is
* delegated to the parent tensor through coordinate transformation.
*/
class FlattenTensorView<T : DType, V>(
    override val parentTensor: Tensor<T, V>,
    private val startDim: Int,
    private val endDim: Int
) : TensorView<T, V> {

    override val viewShape: Shape by lazy {
        calculateFlattenShape(parentTensor.shape, startDim, endDim)
    }

    override val indexMapping: IndexMapper by lazy {
        FlattenIndexMapper(parentTensor.shape, startDim, endDim)
    }

    override val data: TensorData<T, V> by lazy {
        FlattenTensorData(parentTensor.data, viewShape, indexMapping)
    }

    override val ops: TensorOps<V> = parentTensor.ops
    override val dtype: KClass<T> = parentTensor.dtype
}
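The FlattenTensorData wrapper is not spelled out here; its essence is flat-index translation, so that an access like view.data[i] reaches the parent's storage. A sketch of that plumbing, assuming row-major layout (the helper names unravelIndex/ravelIndex are ours, and the real TensorData interface may differ):

// Flat view index -> multi-dimensional view indices (row-major).
fun unravelIndex(flat: Int, shape: Shape): IntArray {
    val indices = IntArray(shape.rank)
    var remaining = flat
    for (i in shape.rank - 1 downTo 0) {
        indices[i] = remaining % shape.dimensions[i]
        remaining /= shape.dimensions[i]
    }
    return indices
}

// Multi-dimensional indices -> flat row-major index.
fun ravelIndex(indices: IntArray, shape: Shape): Int {
    var flat = 0
    for (i in indices.indices) flat = flat * shape.dimensions[i] + indices[i]
    return flat
}

// Inside FlattenTensorData, every element access then becomes:
//   parentData[ravelIndex(indexMapping.mapToParent(unravelIndex(i, viewShape)), parentShape)]
// No element is copied; only indices are transformed.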
2. FlattenIndexMapper
The core of the zero-copy implementation is the coordinate transformation logic:
/**
* Index mapper for flatten operations that transforms 1D flattened coordinates
* back to multi-dimensional parent coordinates.
*/
class FlattenIndexMapper(
    private val originalShape: Shape,
    private val startDim: Int,
    private val endDim: Int
) : IndexMapper {

    private val actualStartDim = if (startDim < 0) originalShape.rank + startDim else startDim
    private val actualEndDim = if (endDim < 0) originalShape.rank + endDim else endDim

    // Precomputed strides for efficient coordinate calculation
    private val strides: IntArray = computeStrides(originalShape)
    private val flattenedStride: Int = computeFlattenedStride()

    override fun mapToParent(childIndices: IntArray): IntArray {
        val parentIndices = IntArray(originalShape.rank)

        // Copy dimensions before the flattened range
        for (i in 0 until actualStartDim) {
            parentIndices[i] = childIndices[i]
        }

        // Transform the flattened dimension back to multi-dimensional indices.
        // Iterate from the innermost (rightmost) dimension outward, because the
        // rightmost dimension varies fastest in row-major order.
        val flatIndex = childIndices[actualStartDim]
        var remainingIndex = flatIndex
        for (i in actualEndDim downTo actualStartDim) {
            val dimSize = originalShape.dimensions[i]
            parentIndices[i] = remainingIndex % dimSize
            remainingIndex /= dimSize
        }

        // Copy dimensions after the flattened range
        var childOffset = actualStartDim + 1
        for (i in actualEndDim + 1 until originalShape.rank) {
            parentIndices[i] = childIndices[childOffset++]
        }

        return parentIndices
    }

    override fun isContiguous(): Boolean {
        // Flatten views are contiguous when flattening the rightmost dimensions
        return actualEndDim == originalShape.rank - 1
    }

    override fun getStride(): IntArray {
        // Return view strides (different from parent strides due to flattening)
        val viewStrides = IntArray(calculateViewRank())
        // ... stride calculation logic
        return viewStrides
    }
}
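For completeness, a plausible sketch of the computeStrides helper used above (row-major convention; the real skainet utility may differ):

fun computeStrides(shape: Shape): IntArray {
    val strides = IntArray(shape.rank)
    var stride = 1
    // The rightmost dimension varies fastest in row-major layout.
    for (i in shape.rank - 1 downTo 0) {
        strides[i] = stride
        stride *= shape.dimensions[i]
    }
    return strides
}
// Example: Shape(2, 3, 4, 5) -> strides [60, 20, 5, 1]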
Coordinate Transformation Examples
Example 1: 2D → 1D Flatten
Original tensor: Shape(3, 4)
Data layout: [0,1,2,3,4,5,6,7,8,9,10,11]
View: flatten(0, 1) → Shape(12)
Mapping: childIndex[5] → parentIndex[1,1]
5 = 1*4 + 1
Example 2: 4D Partial Flatten (CNN Use Case)
Original tensor: Shape(2, 3, 4, 5) // (batch, channels, height, width)
View: flatten(1, 3) → Shape(2, 60) // keep batch, flatten C×H×W
Coordinate mapping:
- childIndex[1, 25] → parentIndex[1, 1, 1, 0]
- batch: 1 (direct copy)
- flattened[25]: 25 = 1*20 + 1*5 + 0 → [1, 1, 0]
Example 3: 3D Middle Dimension Flatten
Original tensor: Shape(2, 6, 4)
View: flatten(1, 1) → Shape(2, 6, 4) // no-op, preserves shape
Coordinate mapping:
- childIndex[1, 3, 2] → parentIndex[1, 3, 2] // direct mapping
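These mappings can be checked directly against the FlattenIndexMapper sketch above:

val cnnMapper = FlattenIndexMapper(Shape(2, 3, 4, 5), startDim = 1, endDim = 3)
check(cnnMapper.mapToParent(intArrayOf(1, 25)).contentEquals(intArrayOf(1, 1, 1, 0))) // 25 = 1*20 + 1*5 + 0

val flatMapper = FlattenIndexMapper(Shape(3, 4), startDim = 0, endDim = 1)
check(flatMapper.mapToParent(intArrayOf(5)).contentEquals(intArrayOf(1, 1)))          // 5 = 1*4 + 1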
Performance Benefits
Memory Efficiency
Traditional Approach:
├── Original Tensor: 1GB
├── Flattened Copy: 1GB ❌
└── Total Memory: 2GB
Zero-Copy View:
├── Original Tensor: 1GB
├── View Metadata: <1KB ✅
└── Total Memory: ~1GB
Time Complexity
Traditional Flatten: O(n) - copy all elements
Zero-Copy Flatten: O(1) - only metadata creation
Element Access:
Traditional: O(1) - direct array access
Zero-Copy: O(1) in tensor size - each access pays a small O(rank) coordinate transform before the parent lookup
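An illustrative (non-rigorous) way to observe the creation-cost gap, reusing the hypothetical createTensor factory and ops handle that appear in the test section below:

import kotlin.system.measureNanoTime

val big = createTensor(Shape(4096, 4096))
// Single-shot timing only; a real benchmark needs warm-up and repetition.
val copyNanos = measureNanoTime { ops.flatten(big, 0, 1) }       // O(n): allocate + copy
val viewNanos = measureNanoTime { FlattenTensorView(big, 0, 1) } // O(1): metadata only
println("materialize: ${copyNanos}ns, view: ${viewNanos}ns")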
Cache Performance
- Contiguous Access: When flattening rightmost dimensions, maintains cache locality
- Stride Patterns: Preserves memory access patterns for vectorization
- No Cache Pollution: Avoids filling caches with duplicate data
Integration with Existing Code
The zero-copy flatten can be integrated seamlessly:
// In VoidTensorOps.kt
override fun <T : DType> flatten(tensor: Tensor<T, V>, startDim: Int, endDim: Int): Tensor<T, V> {
    // Use zero-copy view instead of materialization
    return FlattenTensorView(tensor, startDim, endDim)
}

// In Tensor.kt extensions
fun <T : DType, V> Tensor<T, V>.flatten(startDim: Int = 0, endDim: Int = -1): Tensor<T, V> {
    return FlattenTensorView(this, startDim, endDim)
}
Edge Cases and Considerations
1. Identity Operations
// These operations should be optimized to return the original tensor
tensor.flatten(0, 0) // Single dimension - no change
tensor.flatten(1, 1) // Single dimension - no change
tensor.flatten(0, -1) // All dimensions on 1D tensor - no change
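One way to short-circuit these cases is to check the normalized range before constructing a view. A sketch follows (the name flattenOrSelf is ours; the check could equally live inside the flatten extension shown earlier):

fun <T : DType, V> Tensor<T, V>.flattenOrSelf(startDim: Int = 0, endDim: Int = -1): Tensor<T, V> {
    val start = if (startDim < 0) shape.rank + startDim else startDim
    val end = if (endDim < 0) shape.rank + endDim else endDim
    // A single-dimension range (which includes flatten(0, -1) on a 1D tensor)
    // changes nothing, so return the receiver unchanged.
    return if (start == end) this else FlattenTensorView(this, start, end)
}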
2. Nested Views
val sliced = originalTensor[0..2, 1..3, All] // SlicedTensorView
val flattened = sliced.flatten(1, 2) // FlattenTensorView of SlicedTensorView
// Should compose efficiently without deep nesting
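One composition strategy is to merge index mappers rather than stacking view wrappers, so a view-of-a-view still pays only one translation object per access. A sketch (ComposedIndexMapper is our name, not an existing skainet class):

class ComposedIndexMapper(
    private val childMapper: IndexMapper,  // mapper of the outermost view (e.g. the flatten)
    private val parentMapper: IndexMapper  // mapper of the underlying view (e.g. the slice)
) : IndexMapper {
    override fun mapToParent(childIndices: IntArray): IntArray =
        parentMapper.mapToParent(childMapper.mapToParent(childIndices))

    // Conservative: treat the composition as contiguous only if both layers are.
    override fun isContiguous(): Boolean = childMapper.isContiguous() && parentMapper.isContiguous()

    override fun getStride(): IntArray = TODO("derive combined strides from both layers")
}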
3. Materialization Strategy
Sometimes materialization is beneficial:
- When the view will be accessed many times with random patterns
- When the parent tensor is temporary and will be garbage collected
- For operations that require contiguous memory (some BLAS operations)
val view = tensor.flatten(1, 3)
val materialized = view.materialize() // Explicit materialization when needed
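A sketch of such a materialize helper, assuming flat-index read/write access on TensorData plus the dataFactory and VoidOpsTensor seen earlier (exact APIs may differ):

fun <T : DType, V> TensorView<T, V>.materialize(): Tensor<T, V> {
    val result = dataFactory.zeros<T, V>(viewShape, dtype)
    // Walk the view once, copying each element into contiguous storage.
    for (flat in 0 until viewShape.volume) {
        result[flat] = data[flat]
    }
    return VoidOpsTensor(result, dtype)
}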
Testing Considerations
The zero-copy implementation must pass all existing tests in VoidOpsFlattenTest.kt:
@Test
fun testFlattenView_ZeroCopy_SameResults() {
    val original = createTensor(Shape(2, 3, 4))

    val traditional = ops.flatten(original, 1, 2)  // Current implementation
    val view = FlattenTensorView(original, 1, 2)   // New implementation

    // Same logical shape
    assertEquals(traditional.shape, view.shape)

    // Same element access (but view is zero-copy)
    for (i in 0 until view.shape.volume) {
        assertEquals(traditional.data[i], view.data[i])
    }

    // Verify zero-copy property
    assertTrue(view is TensorView)
    assertSame(original, view.parentTensor)
}
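A further test worth adding pins down the partial-flatten coordinate mapping from Example 2. A sketch using kotlin.test assertions:

import kotlin.test.assertContentEquals

@Test
fun testFlattenIndexMapper_PartialFlatten_CnnCase() {
    // Shape(2, 3, 4, 5), flatten(1, 3) -> flattened strides are 20, 5, 1
    val mapper = FlattenIndexMapper(Shape(2, 3, 4, 5), startDim = 1, endDim = 3)

    assertContentEquals(intArrayOf(1, 1, 1, 0), mapper.mapToParent(intArrayOf(1, 25))) // 25 = 1*20 + 1*5 + 0
    assertContentEquals(intArrayOf(0, 2, 3, 4), mapper.mapToParent(intArrayOf(0, 59))) // 59 = 2*20 + 3*5 + 4
}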
Conclusion
Implementing flatten as a zero-copy tensor view provides significant benefits:
- ✅ Memory Efficiency: No data duplication
- ✅ Performance: O(1) creation time vs O(n) copy time
- ✅ Composability: Works with existing view infrastructure
- ✅ Compatibility: Drop-in replacement for current implementation
- ✅ Flexibility: Supports partial flattening and complex scenarios
The implementation leverages the existing tensor view infrastructure (TensorView, IndexMapper, TensorData) to provide a robust, efficient, and maintainable solution.
This approach transforms flatten from a memory-intensive materialization operation into a lightweight metadata operation, enabling more efficient neural network computations and tensor manipulations.