arhik (contributor) commented Jan 18, 2026

This PR adds support for scan (parallel prefix sum) operations to cuTile.
It is based on the IntegerReduce branch and commit 0c9ab90.

Key changes:

  • Added encode_ScanOp! to bytecode encodings for generating ScanOp bytecode
  • Added encode_scan_identity_array!, which reuses the existing identity encoding
  • Added the scan intrinsic, implemented with operation_identity from IntegerReduce
  • Added scan() and cumsum() public APIs, which convert the 1-indexed axis argument to the 0-indexed axis used internally
  • Added comprehensive codegen tests for scan operations
  • Added scankernel.jl example demonstrating CSDL scan algorithm
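Assuming "CSDL" stands for chained scan with decoupled look-back (the single-pass scan of Merrill and Garland), the idea can be illustrated with a host-side simulation. This is a sketch, not the scankernel.jl code: the function `csdl_scan`, the flag names, and the random interleaving of tiles are illustrative assumptions. In phase 1 each tile publishes its local aggregate; in phase 2 it looks back over its predecessors until it finds a published inclusive prefix, then publishes its own.

```python
import random

# Hypothetical per-tile status flags, as a device kernel might publish them.
NOT_READY, AGGREGATE, PREFIX = 0, 1, 2


def csdl_scan(data, tile_size, seed=0):
    """Inclusive prefix sum via a simulated chained scan with decoupled look-back."""
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
    n = len(tiles)
    flags = [NOT_READY] * n
    values = [0] * n   # published aggregate (AGGREGATE) or inclusive prefix (PREFIX)
    out = [None] * n

    def phase1(t):
        # Phase 1: reduce the tile locally and publish the aggregate.
        values[t] = sum(tiles[t])
        flags[t] = PREFIX if t == 0 else AGGREGATE  # tile 0's aggregate is its prefix

    def phase2(t):
        # Phase 2: look back over predecessors until an inclusive prefix is found.
        exclusive, j = 0, t - 1
        while j >= 0:
            if flags[j] == NOT_READY:
                return False          # a device kernel would spin here instead
            exclusive += values[j]
            if flags[j] == PREFIX:
                break
            j -= 1
        # Publish this tile's inclusive prefix so successors can stop early.
        values[t] = exclusive + sum(tiles[t])
        flags[t] = PREFIX
        # Scan the tile locally, seeded with the exclusive prefix.
        acc, res = exclusive, []
        for x in tiles[t]:
            acc += x
            res.append(acc)
        out[t] = res
        return True

    # Run both phases of every tile in a random interleaving.
    rng = random.Random(seed)
    pending = [("phase1", t) for t in range(n)]
    while pending:
        kind, t = pending.pop(rng.randrange(len(pending)))
        if kind == "phase1":
            phase1(t)
            pending.append(("phase2", t))
        elif not phase2(t):
            pending.append(("phase2", t))   # predecessor not ready: retry later
    return [x for res in out for x in res]
```

In this sequential simulation each flag/value pair changes in a single step. On a GPU the flag and value must become visible together (for example packed into one word, or ordered with memory fences), which is a common source of races in the look-back loop.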

Features:

  • Supports cumulative sum (cumsum) for float and integer types
  • Supports both forward and reverse scan directions
  • Reuses FloatIdentityOp and IntegerIdentityOp from IntegerReduce
  • Uses operation_identity to construct identity values (e.g. 0 for addition)
  • 1-indexed axis parameter (consistent with reduce operations)
  • Preserves tile shape (scan produces one output per input element along the scanned dimension)
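A host-side reference model makes these semantics concrete. The sketch below is plain Python, not the cuTile API; `reference_scan` and its parameters are hypothetical. It takes a 1-indexed `axis` like the Julia API and converts it to a 0-indexed one, starts each scan from the operation's identity, supports reverse scans, and returns a result with the same shape as its input.

```python
import operator


def reference_scan(tile, op=operator.add, identity=0, axis=1, reverse=False):
    """Inclusive scan of a 2-D list along a 1-indexed axis (hypothetical helper)."""
    ax = axis - 1                       # 1-indexed (Julia-style) -> 0-indexed
    if ax not in (0, 1):
        raise ValueError("axis must be 1 or 2 for a 2-D tile")
    rows, cols = len(tile), len(tile[0])
    out = [[None] * cols for _ in range(rows)]   # same shape as the input
    outer, inner = (cols, rows) if ax == 0 else (rows, cols)
    for o in range(outer):
        acc = identity                  # each scan starts from the identity
        order = range(inner - 1, -1, -1) if reverse else range(inner)
        for i in order:
            r, c = (i, o) if ax == 0 else (o, i)
            acc = op(acc, tile[r][c])
            out[r][c] = acc
    return out
```

Scanning along axis 1 accumulates down each column, mirroring Julia's `cumsum(A; dims=1)`; a max-scan would pass `op=max` with `identity=float("-inf")`.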

Tests:

  • All 142 codegen tests pass (including 6 new scan tests)
  • scankernel.jl example runs successfully with the CSDL algorithm

Minor comment improvements in scankernel.jl example

  • Clarify that it demonstrates a device-side scan operation
  • Add a note that the test may occasionally fail due to a race condition in the phase 2 loop

arhik added 3 commits January 17, 2026 10:58
This commit enables reduce_sum and reduce_max operations on all numeric types,
extending beyond the previous float-only support.

## Infrastructure Changes

### Bytecode Layer
- Added IntegerIdentityOp struct with signed/unsigned handling
- Added encode_tagged_int! for integer identity encoding
- Added mask_to_width function with zigzag encoding for signed types
- Added encode_identity! dispatch for FloatIdentityOp and IntegerIdentityOp
- Refactored ReduceIdentity → IdentityOp for extensibility
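Zigzag encoding maps signed integers to unsigned ones so that values of small magnitude stay small (0, -1, 1, -2, ... become 0, 1, 2, 3, ...). The Python sketch below applies the standard zigzag transform to signed values and then masks to a fixed bit width. The PR does not show how mask_to_width in src/bytecode/writer.jl combines the two steps, so this function (and the `zigzag_decode` helper used to check round-trips) is an assumption, not a transcription.

```python
def mask_to_width(value, width, signed):
    """Hypothetical sketch: encode an integer as an unsigned `width`-bit payload."""
    mask = (1 << width) - 1
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        if not lo <= value <= hi:
            raise OverflowError(f"{value} does not fit in int{width}")
        # Zigzag: move the sign into bit 0 (value >> (width - 1) is 0 or -1 here).
        value = (value << 1) ^ (value >> (width - 1))
    elif not 0 <= value <= mask:
        raise OverflowError(f"{value} does not fit in uint{width}")
    return value & mask


def zigzag_decode(encoded):
    """Inverse of the zigzag transform, used here only to check round-trips."""
    return (encoded >> 1) ^ -(encoded & 1)
```

For example, the signed max identity typemin(Int32) = -2^31 encodes to 2^32 - 1, the largest 32-bit payload.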

### Compiler Layer
- Refactored emit_reduce! to use dispatch-based approach
- Added operation_identity dispatch for add/max operations
- Added encode_reduce_body dispatch for float and integer operations
- Removed T <: AbstractFloat constraints from intrinsics
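The identity values this dispatch has to produce follow from the operations themselves: 0 for addition, and the smallest value of the type for max (negative infinity is the natural choice for floats, typemin for integers). The Python sketch below mirrors that dispatch with string keys; the signature and the `kind`/`bits` descriptors are illustrative stand-ins for the Julia methods, not their actual definitions.

```python
import math


def operation_identity(op, kind, bits=32):
    """Identity element for a reduction or scan (illustrative sketch).

    kind is one of "float", "signed", "unsigned"; bits is the integer width.
    """
    if op == "add":
        return 0.0 if kind == "float" else 0
    if op == "max":
        if kind == "float":
            return -math.inf
        if kind == "signed":
            return -(1 << (bits - 1))   # typemin of a signed bits-wide integer
        if kind == "unsigned":
            return 0                    # typemin of any unsigned integer
    raise NotImplementedError(f"no identity for {op!r} on {kind!r}")
```

Whatever the concrete values, each must satisfy the identity law `op(identity, x) == x` for every representable `x`, which is what lets reduce and scan start from it.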

### Language Layer
- Removed type constraints from reduce_sum and reduce_max in operations.jl

## Test Coverage

### Codegen Tests
- Added FileCheck tests for Int32/UInt32 reduce_sum and reduce_max
- Verifies correct IR generation (addi, maxi instructions)

### Execution Tests
- Factory pattern for easy extension (makeReduceKernel, cpu_reduce)
- Tests 10 types: Int8, Int16, Int32, Int64, UInt16, UInt32, UInt64, Float16, Float32, Float64
- Tests 2 operations: reduce_sum, reduce_max
- CPU verification for all test cases
- Type-appropriate input ranges to prevent overflow
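One way to choose type-appropriate input ranges is to bound each element so that the sum of n elements cannot exceed the type's maximum. The PR does not show how test/execution.jl picks its ranges, so the helpers below (`safe_range`, `reference_reduce` in the role of cpu_reduce, and `make_inputs`) are hypothetical Python sketches for integer types only.

```python
import random


def safe_range(bits, signed, n):
    """Hypothetical: element range whose n-element sum fits in the integer type."""
    typemax = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    hi = typemax // n          # n * hi <= typemax, so the sum cannot overflow
    lo = -hi if signed else 0  # n * -hi >= -typemax > typemin for signed types
    return lo, hi


def reference_reduce(op, xs):
    """Hypothetical CPU reference for reduce_sum / reduce_max."""
    if op == "sum":
        return sum(xs)
    if op == "max":
        return max(xs)
    raise ValueError(f"unsupported op {op!r}")


def make_inputs(bits, signed, n, seed=0):
    """Draw n random elements from the overflow-safe range."""
    lo, hi = safe_range(bits, signed, n)
    rng = random.Random(seed)
    return [rng.randint(lo, hi) for _ in range(n)]
```

The bound is conservative; for large n with narrow types such as Int8 it collapses to a single value, so a real test would cap n or widen the range for reduce_max, which cannot overflow.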

## Files Changed

- src/bytecode/encodings.jl: Fix IdentityOp type annotation
- src/bytecode/writer.jl: Integer identity infrastructure
- src/compiler/intrinsics.jl: Import identity types
- src/compiler/intrinsics/core.jl: Dispatch-based reduce implementation
- src/cuTile.jl: Export identity types
- src/language/operations.jl: Remove type constraints
- test/codegen.jl: Add integer reduction codegen tests
- test/execution.jl: Add extendable execution tests

## Extensibility

The infrastructure is designed for easy extension:
- Add new reduce operations by defining operation_identity and encode_reduce_body methods
- Add new types by adding to TEST_TYPES array and appropriate data generation

- Constrain T <: Number in reduce_sum/reduce_max signatures for type safety
- Ensures only numeric types can be used with reduction operations

arhik marked this pull request as ready for review January 18, 2026 05:29

arhik (author) commented Jan 18, 2026

This depends on PR #37.
