
[P1] Enhance TOSA constant folding effectiveness #246

@guosran


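Hi all :) While implementing the TOSA-to-Taskflow pipeline in #245, I found that the standard `tosa-layerwise-constant-fold` pass does not currently fold simple constant subexpressions whose operands are `arith.constant` ops. For example, adding two constant tensors with `tosa.add` currently produces a runtime loop instead of a single precomputed constant.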
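To make the requested behavior concrete, here is a minimal sketch of the folding rule over a hypothetical toy IR. `Op`, `BINARY_OPS` and `fold_constants` are illustrative names only, not part of MLIR or the TOSA dialect. The sketch assumes ops are listed in SSA def-before-use order and that both operands of an elementwise op have the same length (broadcasting is not modeled). Any op whose operands are all known constants is replaced by a constant, so chained subexpressions fold in a single forward pass:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical toy IR for illustration only; this is NOT the MLIR/TOSA API.
@dataclass
class Op:
    result: str                                          # SSA name defined by this op, e.g. "%cst1"
    kind: str                                            # "const", "arg", "add" or "mul"
    operands: List[str] = field(default_factory=list)    # SSA names this op reads
    value: Optional[List[float]] = None                  # payload, set only for "const"

# Elementwise binary ops the toy folder can evaluate at compile time.
BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def fold_constants(ops: List[Op]) -> List[Op]:
    """Replace each elementwise op whose operands are all constants by a constant.

    Assumes def-before-use order, so one forward pass also folds chains
    such as (c1 + c2) * c2. Operands must have equal length (no broadcasting).
    """
    known: Dict[str, List[float]] = {}  # SSA name -> constant payload
    out: List[Op] = []
    for op in ops:
        if op.kind == "const":
            known[op.result] = list(op.value)
            out.append(op)
            continue
        fn = BINARY_OPS.get(op.kind)
        if fn is not None and len(op.operands) == 2 and all(n in known for n in op.operands):
            lhs, rhs = (known[n] for n in op.operands)
            if len(lhs) == len(rhs):
                vals = [fn(a, b) for a, b in zip(lhs, rhs)]
                known[op.result] = vals  # later ops may fold on top of this result
                out.append(Op(op.result, "const", [], vals))
                continue
        out.append(op)  # not foldable: keep the op unchanged
    return out

# The issue's example, plus a chained constant op and a runtime-dependent op.
program = [
    Op("%cst1", "const", [], [1.0, 2.0, 3.0, 4.0]),
    Op("%cst2", "const", [], [10.0, 20.0, 30.0, 40.0]),
    Op("%folded", "add", ["%cst1", "%cst2"]),
    Op("%scaled", "mul", ["%folded", "%cst2"]),  # chained constant subexpression
    Op("%x", "arg"),                             # runtime function argument
    Op("%y", "add", ["%x", "%cst1"]),            # depends on %x, so it must stay
]
for op in fold_constants(program):
    print(op.result, op.kind, op.value)  # e.g. "%folded const [11.0, 22.0, 33.0, 44.0]"
```

The original constants are left in place; a dead-code-elimination step would drop any that become unused. In MLIR itself, this logic would likely live in an op folder or a rewrite pattern rather than in a standalone pass like this toy.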

  • Input TOSA IR:
```mlir
func.func @const_fold_test() -> tensor<4xf32> {
  %cst1 = arith.constant dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>
  %cst2 = arith.constant dense<[10.0, 20.0, 30.0, 40.0]> : tensor<4xf32>

  // This addition should be folded!
  %folded = tosa.add %cst1, %cst2 : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
  return %folded : tensor<4xf32>
}
```
  • Current Suboptimal Output:
```mlir
// Actual result from the current pipeline (excerpt): the addition runs at runtime.
// %0 and %1 hold the two constant operands; %alloc is the result buffer.
affine.for %arg1 = 0 to 4 {
  %2 = affine.load %0[%arg1] : memref<4xf32>
  %3 = affine.load %1[%arg1] : memref<4xf32>
  %4 = arith.addf %2, %3 : f32   // <--- Suboptimal: runtime addition
  affine.store %4, %alloc[%arg1] : memref<4xf32>
}
```
  • Expected Target Output:
```mlir
// Desired result: the sum is folded into a constant global (no runtime arithmetic).
memref.global "private" constant @__constant_sum : memref<4xf32> = dense<[11.0, 22.0, 33.0, 44.0]>
func.func @const_fold_test(%arg0: memref<4xf32>) {
  %0 = memref.get_global @__constant_sum : memref<4xf32>
  memref.copy %0, %arg0 : memref<4xf32> to memref<4xf32>
  return
}
```
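For reference, the folded global should hold the elementwise sum of the two `arith.constant` payloads, writing $a$ for `%cst1` and $b$ for `%cst2`:

$$
c_i = a_i + b_i,\qquad a = [1, 2, 3, 4],\quad b = [10, 20, 30, 40] \;\Rightarrow\; c = [11, 22, 33, 44]
$$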

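Comparing the current and expected outputs shows that the addition is still computed at runtime even though both operands are compile-time constants, so the current pipeline is not optimal.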

Labels

enhancement (New feature or request)
