Conversation

@guosran guosran commented Jan 23, 2026

This PR introduces a new pipeline to lower TOSA operations to the Taskflow dialect through Linalg and Affine conversions. Key changes include:

  1. New Pipeline: Added TosaToTaskflowPipeline.cpp that orchestrates:

    • TOSA Optimizations: Integrated standard TOSA cleanups (infer-shapes, make-broadcastable) and layerwise constant folding. Note: constant folding currently has limited impact (see Issue #246, "[P1] Enhance TOSA constant folding effectiveness").
    • TOSA -> Linalg/Arith/Tensor conversion.
    • Linalg Optimizations: Enabled elementwise-fusion, which proved critical for merging operation chains into single kernels.
    • One-Shot Bufferization: Configured with IdentityLayoutMap for deterministic results.
    • Linalg -> Affine conversion.
    • Affine -> Taskflow conversion.
  2. Tests:

    • Added tosa-to-taskflow.mlir to verify the full end-to-end pipeline.
    • Added tosa-to-affine.mlir for inspecting the intermediate structural lowering.
    • Added tosa-fusion.mlir to verify that operator chains are correctly fused into single loops.
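To make the orchestration above concrete, here is a hypothetical plain-Python sketch of the pipeline's shape: each stage is a module-to-module transformation applied in a fixed order. The stage names below are informal labels chosen for illustration, not real MLIR pass identifiers.

```python
# Hypothetical sketch of the pipeline structure. Stage names are informal
# labels, not actual MLIR pass names; 'apply' stands in for running one pass.

PIPELINE = [
    "tosa-optimize",            # infer-shapes, make-broadcastable, const-fold
    "tosa-to-linalg",           # TOSA -> Linalg/Arith/Tensor
    "linalg-fuse-elementwise",  # merge op chains into single kernels
    "one-shot-bufferize",       # IdentityLayoutMap for deterministic results
    "linalg-to-affine",         # Linalg -> Affine
    "affine-to-taskflow",       # Affine -> Taskflow
]

def run_pipeline(module, stages=PIPELINE, apply=lambda m, s: f"{m}|{s}"):
    # The default 'apply' just records the stage order on the module string.
    for stage in stages:
        module = apply(module, stage)
    return module
```

The point of the sketch is only the ordering: every stage consumes the previous stage's output, which is why the intermediate `tosa-to-affine` pipeline (discussed below) can be split out cleanly.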

@guosran guosran requested review from ShangkunLi and Copilot and removed request for Copilot January 23, 2026 23:14
Copilot AI left a comment

Pull request overview

This PR adds a new TOSA→Taskflow lowering pipeline and wires it into the mlir-neura-opt tool, along with the necessary dialect/extension registrations and tests.

Changes:

  • Introduces MLIRTosaToTaskflowPipeline and registerTosaToTaskflowPipeline() to lower TOSA through Linalg, bufferization, and Affine to Taskflow.
  • Updates mlir-neura-opt to register TOSA/bufferization dialects, bufferization interfaces, MLIR extensions, and the new pipeline, and links in the required MLIR libraries.
  • Adds conversion tests for full TOSA→Taskflow lowering and direct Affine→Taskflow lowering, plus updates a CGRA-Bench submodule reference.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

File Description
tools/mlir-neura-opt/mlir-neura-opt.cpp Registers new dialects, bufferization interfaces, MLIR extensions, passes, and the TOSA→Taskflow pipeline in the optimization tool.
tools/mlir-neura-opt/CMakeLists.txt Links additional MLIR dialect, transform, bufferization, and extension libraries required by the tool.
test/benchmark/CGRA-Bench Updates submodule commit for CGRA benchmark data.
test/Conversion/TosaToTaskflow/tosa-to-taskflow.mlir Adds an end-to-end test for the new TOSA→Taskflow pipeline.
test/Conversion/TosaToTaskflow/affine-to-taskflow.mlir Adds a focused test for the Affine→Taskflow conversion pass.
lib/Conversion/TosaToTaskflow/TosaToTaskflowPipeline.cpp Implements the TOSA→Taskflow pass pipeline (TOSA→Linalg/Arith/Tensor, bufferization, Linalg→Affine, Affine→Taskflow).
lib/Conversion/TosaToTaskflow/CMakeLists.txt Builds and links the new MLIRTosaToTaskflowPipeline library with required MLIR components.
lib/Conversion/CMakeLists.txt Integrates the new TOSA→Taskflow pipeline library into the Conversion CMake hierarchy and interface library.
include/Conversion/ConversionPasses.h Declares registerTosaToTaskflowPipeline() for external registration.


@guosran guosran force-pushed the feature/tosa-lowering branch from e041bd9 to 0199b8d on January 23, 2026 23:19
This PR introduces a new pipeline to lower TOSA operations to the Taskflow dialect through Linalg and Affine conversions. Key changes include:

1.  **New Pipeline**: Added  that orchestrates:
    *   TOSA -> Linalg/Arith/Tensor conversion.
    *   Linalg optimizations (elementwise fusion).
    *   One-Shot Bufferization with  for deterministic results.
    *   Linalg -> Affine conversion.
    *   Affine -> Taskflow conversion.

2.  **Tooling Support**:
    *   Updated  to register TOSA, Bufferization, and related Dialects.
    *   Explicitly registered Bufferization interfaces for Linalg, Tensor, Arith, and SCF to prevent runtime crashes.
    *   Added  and  links.

3.  **Tests**:
    *   Added  to verify the full pipeline.
    *   Added  for direct affine lowering verification.
    *   Ensured compatibility with existing Taskflow tests (e.g. ).
@guosran guosran force-pushed the feature/tosa-lowering branch from 0199b8d to 609d337 on January 23, 2026 23:24
@ShangkunLi

Thanks for this great job~

I think we should enable a progressive lowering process from tosa to taskflow. Like from tosa to affine, then affine to taskflow.

And could you please help investigate if we can perform some graph-level optimization in tosa level (e.g., operator fusion), or we have to introduce the linalg as an intermediate representation.

Split the TosaToTaskflow pipeline into two distinct pipelines:
1. : Lowers TOSA to Linalg (with optimizations and bufferization) and then to Affine. This serves as a foundational pipeline for inspection or further affine transformations.
2. : A composite pipeline that runs  followed by .

Key changes:
- Refactored  to expose .
- Registered both pipelines in  and .
- Added  test case to verify the intermediate affine stage.
@guosran

guosran commented Jan 24, 2026

Thanks for this great job~

I think we should enable a progressive lowering process from tosa to taskflow. Like from tosa to affine, then affine to taskflow.

And could you please help investigate if we can perform some graph-level optimization in tosa level (e.g., operator fusion), or we have to introduce the linalg as an intermediate representation.

I have split the pipeline and added a test accordingly. Perhaps I could perform the optimizations in a subsequent PR?

Enabled TOSA standard optimization passes (InferShapes, MakeBroadcastable, LayerwiseConstantFold) in the pipeline. Added 'tosa-fusion.mlir' to verify Linalg elementwise fusion and 'tosa-opt.mlir' to benchmark TOSA constant folding (currently a known limitation).
@guosran

guosran commented Jan 24, 2026

Thanks for this great job~
I think we should enable a progressive lowering process from tosa to taskflow. Like from tosa to affine, then affine to taskflow.
And could you please help investigate if we can perform some graph-level optimization in tosa level (e.g., operator fusion), or we have to introduce the linalg as an intermediate representation.

I have split the pipeline and added a test accordingly. Perhaps I could perform the optimizations in a subsequent PR?

Done some of the optimizations; will present them here.

@guosran

guosran commented Jan 24, 2026

Summary of optimizations

  • Input Source (TOSA IR)
    The test case consists of a chain of three logical operations: Add, Multiply, and Maximum (ReLU).
func.func @fusion_test(%arg0: tensor<16xf32>) -> tensor<16xf32> {
  // Op 1: Add
  %0 = tosa.add %arg0, %arg0 : (tensor<16xf32>, tensor<16xf32>) -> tensor<16xf32>
  // Op 2: Multiply
  %1 = tosa.mul %0, %0 : (tensor<16xf32>, tensor<16xf32>) -> tensor<16xf32>
  // Op 3: Maximum (ReLU placeholder)
  %zeros = "tosa.const"() {value = dense<0.0> : tensor<16xf32>} : () -> tensor<16xf32>
  %2 = tosa.maximum %1, %zeros : (tensor<16xf32>, tensor<16xf32>) -> tensor<16xf32>
  
  return %2 : tensor<16xf32>
}
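For reference, the semantics of this three-op chain can be modeled in plain Python (a hypothetical hand-written model of the IR above, not something generated by the pipeline):

```python
# Hypothetical plain-Python model of the tosa.add -> tosa.mul -> tosa.maximum
# chain in @fusion_test; the function name mirrors the IR for readability.

def fusion_test(arg0):
    r0 = [v + v for v in arg0]                      # tosa.add %arg0, %arg0
    r1 = [v * v for v in r0]                        # tosa.mul %0, %0
    zeros = [0.0] * len(arg0)                       # tosa.const dense<0.0>
    return [max(a, b) for a, b in zip(r1, zeros)]   # tosa.maximum %1, %zeros

# e.g. fusion_test([3.0, -2.0]) -> [36.0, 16.0]
```

Note that since the multiply squares its input, the final maximum-with-zero is a no-op on the values; the chain still exercises all three ops structurally, which is what the fusion test needs.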
  • Baseline Result (Without Linalg Fusion)
    Without the fusion transformation, each high-level TOSA operation results in a standalone loop nest, which is far from optimal.
func.func @fusion_test(%arg0: memref<16xf32>, %arg1: memref<16xf32>) {
  %cst = arith.constant 0.000000e+00 : f32  // Zero constant used by Pass 3
  // Pass 1: Addition
  %tmp1 = memref.alloc() : memref<16xf32>
  affine.for %i = 0 to 16 {
    %v0 = affine.load %arg0[%i] : memref<16xf32>
    %v1 = arith.addf %v0, %v0 : f32
    affine.store %v1, %tmp1[%i] : memref<16xf32>
  }
  // Pass 2: Multiplication (Reads output of Pass 1)
  %tmp2 = memref.alloc() : memref<16xf32>
  affine.for %i = 0 to 16 {
    %v1 = affine.load %tmp1[%i] : memref<16xf32>
    %v2 = arith.mulf %v1, %v1 : f32
    affine.store %v2, %tmp2[%i] : memref<16xf32>
  }
  // Pass 3: Maximum (Reads output of Pass 2)
  affine.for %i = 0 to 16 {
    %v2 = affine.load %tmp2[%i] : memref<16xf32>
    %v3 = arith.maximumf %v2, %cst : f32
    affine.store %v3, %arg1[%i] : memref<16xf32>
  }
  return
}
  • Optimized Result (With Linalg Fusion)
    With the fusion pass enabled in our current pipeline, all three operations are consolidated into a single loop nest.
func.func @fusion_test(%arg0: memref<16xf32>, %arg1: memref<16xf32>) {
  %cst = arith.constant 0.000000e+00 : f32
  %alloc = memref.alloc() {alignment = 64 : i64} : memref<16xf32>
  affine.for %arg2 = 0 to 16 {
    %0 = affine.load %arg0[%arg2] : memref<16xf32>
    %1 = arith.addf %0, %0 : f32        // Fused Op 1
    %2 = arith.mulf %1, %1 : f32        // Fused Op 2
    %3 = arith.maximumf %2, %cst : f32  // Fused Op 3
    affine.store %3, %alloc[%arg2] : memref<16xf32>
  }
  memref.copy %alloc, %arg1 : memref<16xf32> to memref<16xf32>
  return
}
  • Conclusion
    Lowering through Linalg with elementwise fusion is crucial: it collapses three loop nests into one and eliminates the intermediate buffers, so the fused values flow through registers instead of round-tripping through memory.
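A rough back-of-envelope model, hand-counted from the two IR snippets above (these are structural counts, not measurements, and they assume memref.copy lowers to one load and one store per element):

```python
# Hand-counted memory-traffic model for the baseline vs. fused IR above.
# Counts are assumptions derived by reading the snippets, not profiling data.

N = 16  # tensor<16xf32>

# Baseline: three affine.for nests, two intermediate memrefs (%tmp1, %tmp2).
baseline = {"loop_nests": 3, "allocs": 2, "loads": 3 * N, "stores": 3 * N}

# Fused: one affine.for nest plus the final memref.copy (%alloc -> %arg1),
# which we assume costs one load and one store per element.
fused = {"loop_nests": 1, "allocs": 1,
         "loads": N + N,    # loop loads + copy reads
         "stores": N + N}   # loop stores + copy writes
```

Under this model, fusion cuts total memory traffic from 6N to 4N accesses and, more importantly, removes the loop-nest boundaries that would otherwise force intermediate results to memory; eliminating the trailing copy (e.g., by writing directly into %arg1) would reduce it further to 2N.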

@tancheng

Why remove the relu_int.cpp?

// TaskFlow Conversion Passes.
std::unique_ptr<mlir::Pass> createConvertAffineToTaskflowPass();
void registerTosaToAffinePipeline();
void registerTosaToTaskflowPipeline();

I think we may not need such a pass pipeline for now.

You can add a test to verify the end-to-end lowering process, e.g., from python -> tosa -> linalg -> affine -> taskflow.

This is to verify the lowering process; we can then add more optimizations at each level.
