# Composed Layout in CuTe

A **Composed Layout** is a CuTe abstraction that chains an outer layout, a constant offset, and an inner 
layout or transformation into a single coordinate mapping. It can express mappings, such as swizzles or 
custom functions, that a single layout cannot.

## Components

A Composed Layout consists of three key components:

1. **Inner Layout/Transformation** (`inner`):
   - Can be a layout, swizzle, or custom transformation function
   - Applies the final transformation to the coordinates
   - Supports arbitrary coordinate manipulations

2. **Offset** (`offset`):
   - Typically represented as an integer tuple
   - Adds a constant displacement to coordinates
   - Enables fine-grained control over data positioning

3. **Outer Layout** (`outer`):
   - The layout visible to the user
   - Defines the initial coordinate transformation
   - Determines the shape and organization of the data structure

## Mathematical Representation

The mathematical composition of these components is defined as:

$$
R(c) := (\text{inner} \circ \text{offset} \circ \text{outer})(c) = \text{inner}(\text{offset} + \text{outer}(c))
$$

Where:
- $c$ represents the input coordinates
- $\circ$ denotes function composition
- The transformation is applied from right to left: `outer` maps the coordinate first, `offset` is then added, and `inner` is applied last
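
The formula can be read as three ordinary function applications. The following is a minimal plain-Python sketch, not CuTe code: `compose_layout` is a hypothetical helper, `outer` is modelled as a 1-D layout with stride 4, and the XOR-based `inner` is only a toy stand-in for a swizzle-like transformation.

```python
def compose_layout(inner, offset, outer):
    """Return R with R(c) = inner(offset + outer(c)) for integer-valued maps."""
    def R(c):
        return inner(offset + outer(c))
    return R


# Hypothetical components, chosen only to keep the arithmetic easy to follow.
def outer(c):
    return 4 * c          # 1-D layout with stride 4


def inner(x):
    return x ^ 1          # toy transformation: flip the lowest bit


offset = 2                # constant displacement

R = compose_layout(inner, offset, outer)

# outer(3) = 12, 12 + 2 = 14, 14 ^ 1 = 15
print(R(3))
```

Keeping the three pieces as separate callables mirrors the definition: changing the offset or swapping the inner function does not require touching the outer layout.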

## Usage in Python

To create a Composed Layout in Python, use the `make_composed_layout` function:

```python
layout = cute.make_composed_layout(inner, offset, outer)
```

## Key Benefits

1. **Flexibility**: Supports complex transformations that direct composition cannot handle
2. **Modularity**: Separates different aspects of the transformation
3. **Performance**: Enables optimized memory access patterns for GPU computations
4. **Compatibility**: Works with various types of transformations and layouts

## Custom Transformation Example

This example builds a Composed Layout around a custom transformation function. The transformation:

1. Takes a 2D coordinate `(x, y)` as input
2. Increments the y-coordinate by 1

It is then combined with the offset `(1, 0)` and an identity layout of shape `(8, 4)`.

The example shows how to:
- Define a custom transformation function
- Create a composed layout with the transformation
- Apply the layout to coordinates
- Print the results for verification
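
The mapping this example describes can be traced by hand. The sketch below is plain Python rather than CuTe DSL; `identity_layout` and `composed` are hypothetical helpers, and it assumes CuTe's usual convention that a linear index is unpacked into a coordinate with the leftmost mode varying fastest.

```python
def identity_layout(shape):
    """Model of an identity layout: linear index -> (x, y) coordinate.

    Assumption: the leftmost mode varies fastest (colexicographic order).
    """
    rows, _cols = shape

    def to_coord(index):
        return index % rows, index // rows

    return to_coord


def inner(c):
    """Custom transformation: increment the y-coordinate by 1."""
    x, y = c
    return x, y + 1


def composed(inner, offset, outer):
    """Model of inner(offset + outer(index)) with element-wise tuple addition."""
    def apply(index):
        coord = outer(index)
        shifted = tuple(o + c for o, c in zip(offset, coord))
        return inner(shifted)

    return apply


layout = composed(inner, (1, 0), identity_layout((8, 4)))
print(layout(0))  # 0 -> (0, 0) -> (1, 0) -> (1, 1)
print(layout(9))  # 9 -> (1, 1) -> (2, 1) -> (2, 2)
```

Under these assumptions, index `0` maps to `(1, 1)`: the identity layout yields `(0, 0)`, the offset shifts it to `(1, 0)`, and the inner function increments `y`.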

In [None]:
import cutlass
import cutlass.cute as cute
from cutlass.cute.runtime import from_dlpack, make_ptr


@cute.jit
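def customized_layout():
    # Custom inner transformation: shift the y-coordinate by one.
    def inner(c):
        x, y = c
        return x, y + 1

    # Computes inner(offset + outer(c)) with offset (1, 0) and an (8, 4) identity layout.
    layout = cute.make_composed_layout(
        inner, (1, 0), cute.make_identity_layout(shape=(8, 4))
    )
    print(layout)
    cute.printf(layout(0))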


customized_layout()

## Gather/Scatter Operations with Composed Layout

Gather and Scatter operations are fundamental data access patterns in parallel computing and GPU programming. In CuTe, both can be expressed with a Composed Layout whose inner transformation reads from an index tensor.

### Gather Operation
A gather operation collects elements from a source array using an index array (also called an indirection array). It's defined as:
```python
output[i] = source[index[i]]
```

#### Components in CuTe Implementation:
1. **Offset Tensor**: Contains the indices for gathering (`offset_tensor`)
2. **Data Pointer**: Points to the source data array (`data_ptr`)
3. **Shape**: Defines the shape of the logical tensor seen by the user (`shape`)

### How it Works
1. The inner transformation function reads from the offset tensor:
   ```python
   def inner(c):
       return offset_tensor[c]  # Returns the gather index
   ```
2. The composed layout maps input coordinates through the offset tensor:
   ```python
   gather_layout = cute.make_composed_layout(inner, 0, cute.make_layout(shape))
   ```
3. This creates an indirect access pattern where:
   - Input coordinate `i` → `offset_tensor[i]` → `data_ptr[offset_tensor[i]]`

4. Layout operations such as slicing and partitioning can still be applied to the `outer` layout.
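
The indirection in steps 1 to 3, and the point in step 4 that the outer layout can still be sliced, can be sketched in plain Python. This is an illustration rather than CuTe DSL: `offset_table`, `source`, `outer`, and `gather_layout` are hypothetical stand-ins, with `outer` modelled as a column-major `(4, 2)` layout.

```python
# Hypothetical data: 8 gather indices into a 16-element source array.
offset_table = [5, 3, 9, 0, 12, 7, 1, 15]
source = [10 * k for k in range(16)]

ROWS, COLS = 4, 2


def outer(row, col):
    """Column-major (4, 2) layout: coordinate -> linear index."""
    return row + ROWS * col


def inner(index):
    """Read the gather index from the offset table."""
    return offset_table[index]


def gather_layout(row, col):
    """Composed layout with zero offset: inner(0 + outer(coord))."""
    return inner(0 + outer(row, col))


# Slice the outer layout: fix col = 1 and walk over the rows.
column_1 = [gather_layout(r, 1) for r in range(ROWS)]
print(column_1)                       # [12, 7, 1, 15]
print([source[j] for j in column_1])  # [120, 70, 10, 150]
```

The slice only changes which coordinates are fed to `outer`; the lookup in `inner` is untouched, which is why such operations remain available on the outer layout.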

### Use Cases
- **Sparse Operations**: Accessing non-contiguous memory efficiently
- **Graph Processing**: Following edge connections in graph algorithms
- **Feature Embedding**: Looking up embeddings for discrete tokens
- **Irregular Data Access**: Any pattern requiring indirect memory access

### Example Output Interpretation
The example code prints pairs of numbers `i -> j` where:
- `i` is the output index
- `j` is the gathered source index from `offset_tensor`

This demonstrates how the composed layout transforms coordinates for indirect memory access.

Note: Scatter operations (writing to indirect locations) can be implemented similarly by reversing the data flow direction.
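
For comparison, the two data-flow directions can be written side by side in plain Python. This is only a sketch of the semantics with hypothetical arrays, not a CuTe implementation. It assumes the indices are unique, since duplicate scatter indices would make the result depend on write order.

```python
index = [2, 0, 3, 1]
source = [10, 20, 30, 40]

# Gather: read from an indirect location, write contiguously.
gathered = [source[index[i]] for i in range(len(index))]

# Scatter: read contiguously, write to an indirect location.
scattered = [0] * len(source)
for i in range(len(index)):
    scattered[index[i]] = source[i]

print(gathered)   # [30, 10, 40, 20]
print(scattered)  # [20, 40, 10, 30]
```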


In [None]:
import torch


@cute.jit
def gather_tensor(
    offset_tensor: cute.Tensor, data_ptr: cute.Pointer, shape: cute.Shape
):
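    # Inner transformation: look up the gather index for linear index c.
    def inner(c):
        return offset_tensor[c]

    # Zero offset, so the composed layout computes offset_tensor[outer(i)].
    gather_layout = cute.make_composed_layout(inner, 0, cute.make_layout(shape))
    for i in cutlass.range_constexpr(cute.size(shape)):
        cute.printf("%d -> %d", i, gather_layout(i))

    # TODO: creating a tensor from the gather layout is not supported yet
    # gather_tensor = cute.make_tensor(data_ptr, gather_layout)
    # cute.printf(gather_tensor[0])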


shape = (16,)
offset_tensor = torch.randint(0, 256, shape, dtype=torch.int32)
data_tensor = torch.arange(0, 256, dtype=torch.int32)


gather_tensor(
    from_dlpack(offset_tensor),
    make_ptr(cutlass.Int32, data_tensor.data_ptr(), cute.AddressSpace.generic),
    shape,
)