[feat] FU-level fusion #244

Merged
HobbitQia merged 14 commits into coredac:main from HobbitQia:hardware_merge
Jan 26, 2026

Conversation


@HobbitQia HobbitQia commented Jan 22, 2026

This pass implements FU-level fusion after DFG-level fusion (PR 194). It aims to find the minimum-cost set of FUs that covers all patterns extracted in previous passes (wrapped in fused_op).

The algorithm proceeds as follows:

  1. Pattern Extraction: Extracts fused operation patterns from the module and linearizes them via topological sort.
  2. Standalone Operation Extraction: Collects standalone operations not inside fused patterns for hardware coverage.
  3. Template Creation: Greedily merges patterns into shared hardware templates using cost-based accommodation with DFS mapping search.
  4. Connection Generation: Generates optimized FU connections based on pattern dependencies with bypass support.
  5. Execution Plan Generation: Creates parallel execution stages by grouping operations at the same topological level.
  6. JSON Output: Writes the hardware configuration, including templates, connections, and execution plans, to a JSON file.
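Step 5 (grouping operations at the same topological level into parallel stages) can be sketched in Python. This is a minimal sketch under the assumption that a pattern is given as a count of operations plus dependency edges between operation indices; all names here are hypothetical, not the pass's actual API:

```python
from collections import defaultdict

def group_into_stages(num_ops, edges):
    """Group a pattern's operations into parallel execution stages.

    num_ops: number of operations in the pattern (indexed 0..num_ops-1).
    edges: list of (src, dst) dependency pairs between operation indices.
    Returns a list of stages; each stage holds the op indices whose
    dependencies are all satisfied, so they can execute in parallel.
    """
    indegree = [0] * num_ops
    succs = defaultdict(list)
    for src, dst in edges:
        indegree[dst] += 1
        succs[src].append(dst)

    # Stage 0: operations with no dependencies.
    ready = [op for op in range(num_ops) if indegree[op] == 0]
    stages = []
    while ready:
        stages.append(sorted(ready))
        next_ready = []
        for op in ready:
            for nxt in succs[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    next_ready.append(nxt)
        ready = next_ready
    return stages

# The example pattern later in this description: op0 (icmp) feeds op1 and op2.
print(group_into_stages(3, [(0, 1), (0, 2)]))  # [[0], [1, 2]]
```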

TODO:

  • Replace FunctionalUnit with FunctionUnit in Architecture.h.
  • Update pattern_name of fused_op so that this attribute better illustrates the structure of the pattern.

Output JSON File

| Field | Description |
| --- | --- |
| `template_id` | Unique identifier for this template |
| `instance_count` | Number of instances for each pattern |
| `supported_single_ops` | Individual operations this template can execute standalone |
| `supported_composite_ops` | Fused patterns this template can execute |
| `functional_units` | Array of FU definitions with their supported operations |
| `fu_connections` | Data routing paths between FUs |
| `pattern_execution_plans` | Detailed execution schedules for each pattern |
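For concreteness, one template entry with these fields can be assembled as in the minimal Python sketch below. The top-level keys come from the table above; all concrete values, and the inner field names such as `fu_id`, `src_fu`, and `dst_fu`, are hypothetical:

```python
import json

# Hypothetical template entry; top-level keys follow the table above,
# inner field names and all values are illustrative assumptions.
template = {
    "template_id": 0,
    "instance_count": 1,
    "supported_single_ops": ["neura.icmp", "neura.grant_predicate"],
    "supported_composite_ops": [
        "fused_op:icmp->grant_predicate->grant_predicate"
    ],
    "functional_units": [
        {"fu_id": 0, "supported_ops": ["neura.icmp"]},
        {"fu_id": 1, "supported_ops": ["neura.grant_predicate"]},
        {"fu_id": 2, "supported_ops": ["neura.grant_predicate"]},
    ],
    "fu_connections": [{"src_fu": 0, "dst_fu": 1}, {"src_fu": 0, "dst_fu": 2}],
    "pattern_execution_plans": [],
}

print(json.dumps(template, indent=2))
```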

Execution Plan Fields

| Field | Description |
| --- | --- |
| `pattern_id` | Pattern being executed |
| `pattern_name` | Name of the pattern |
| `fu_mapping` | Maps operation index to slot ID: `[op0→fu0, op1→fu1, ...]`; e.g. `[1, 2]` means `fu1` and `fu2` execute `op0` and `op1` of this pattern |
| `execution_stages` | Ordered stages of execution |
| `parallel_fus` | Slots executing in this stage (can be multiple for parallel ops) |
| `parallel_ops` | Operations executing in this stage |
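To make the `fu_mapping` convention concrete: given a stage's operation indices, the slots that run them are a direct lookup. A minimal Python sketch (helper name hypothetical):

```python
def parallel_fus_for_stage(fu_mapping, stage_op_indices):
    """Translate a stage's operation indices into the slot IDs that run them.

    fu_mapping[i] is the slot (FU) ID that executes operation i of the pattern.
    """
    return [fu_mapping[op] for op in stage_op_indices]

# The table's example: fu_mapping [1, 2] means fu1 runs op0 and fu2 runs op1.
print(parallel_fus_for_stage([1, 2], [0, 1]))  # [1, 2]

# A stage containing only op1 occupies only fu2.
print(parallel_fus_for_stage([1, 2], [1]))  # [2]
```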

Example

```json
"pattern_execution_plans": [
  {
    "pattern_id": 1,
    "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
    "fu_mapping": [0, 1, 2],
    "execution_stages": [
      {
        "stage": 0,
        "parallel_fus": [0],
        "parallel_ops": ["neura.icmp"]
      },
      {
        "stage": 1,
        "parallel_fus": [1, 2],
        "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]
      }
    ]
  },
```

The diagram below shows Hardware Template 1, where FU 0 serves as the data source for both FU 1 and FU 2.

       Template 1 Pipeline
      =====================
      
           [ FU 0 ]  <-- (neura.icmp)
              |
      ________|________
     |                 |
  [ FU 1 ]          [ FU 2 ]  <-- (neura.grant_predicate)
     |                 |
 (Grant A)         (Grant B)
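The execution plan above can be sanity-checked mechanically. A minimal Python sketch, under the assumption (not stated in the PR) that every slot in `fu_mapping` is scheduled exactly once across the stages:

```python
def check_plan(plan):
    """Sanity-check a pattern_execution_plans entry.

    Assumptions (illustrative, not from the pass itself): each stage pairs
    one op with one slot, and every slot in fu_mapping appears exactly once
    across all execution_stages.
    """
    scheduled = []
    for stage in plan["execution_stages"]:
        # Each scheduled op needs a matching slot in the same stage.
        assert len(stage["parallel_fus"]) == len(stage["parallel_ops"])
        scheduled.extend(stage["parallel_fus"])
    # All mapped slots are used, none twice.
    assert sorted(scheduled) == sorted(plan["fu_mapping"])
    return True

# The example plan from this PR description.
plan = {
    "pattern_id": 1,
    "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
    "fu_mapping": [0, 1, 2],
    "execution_stages": [
        {"stage": 0, "parallel_fus": [0], "parallel_ops": ["neura.icmp"]},
        {"stage": 1, "parallel_fus": [1, 2],
         "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]},
    ],
}
print(check_plan(plan))  # True
```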

@HobbitQia HobbitQia requested a review from tancheng January 22, 2026 13:34
@HobbitQia HobbitQia merged commit cd2ae13 into coredac:main Jan 26, 2026
1 check passed
