[feat] FU-level fusion #244

Merged
HobbitQia merged 14 commits into coredac:main from HobbitQia:hardware_merge
Jan 26, 2026

Conversation


@HobbitQia HobbitQia commented Jan 22, 2026

This pass implements FU-level fusion after DFG-level fusion (PR 194). It aims to find the minimum-cost set of FUs that covers all patterns extracted in previous passes (wrapped in fused_op).

The algorithm proceeds as follows:

  1. Pattern Extraction: Extracts fused operation patterns from the module and linearizes them via topological sort.
  2. Standalone Operation Extraction: Collects standalone operations not inside fused patterns for hardware coverage.
  3. Template Creation: Greedily merges patterns into shared hardware templates using cost-based accommodation with DFS mapping search.
  4. Connection Generation: Generates optimized FU connections based on pattern dependencies with bypass support.
  5. Execution Plan Generation: Creates parallel execution stages by grouping operations at the same topological level.
  6. JSON Output: Writes the hardware configuration, including templates, connections, and execution plans, to a JSON file.
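Step 5 (grouping operations at the same topological level into parallel stages) can be sketched in Python. This is a minimal sketch under the assumption that a pattern is given as a count of operations plus dependency edges between operation indices; all names here are hypothetical, not the pass's actual API:

```python
from collections import defaultdict

def group_into_stages(num_ops, edges):
    """Group a pattern's operations into parallel execution stages.

    num_ops: number of operations in the pattern (indexed 0..num_ops-1).
    edges: list of (src, dst) dependency pairs between operation indices.
    Returns a list of stages; each stage holds the op indices whose
    dependencies are all satisfied, so they can execute in parallel.
    """
    indegree = [0] * num_ops
    succs = defaultdict(list)
    for src, dst in edges:
        indegree[dst] += 1
        succs[src].append(dst)

    # Stage 0: operations with no dependencies.
    ready = [op for op in range(num_ops) if indegree[op] == 0]
    stages = []
    while ready:
        stages.append(sorted(ready))
        next_ready = []
        for op in ready:
            for nxt in succs[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    next_ready.append(nxt)
        ready = next_ready
    return stages

# The example pattern later in this description: op0 (icmp) feeds op1 and op2.
print(group_into_stages(3, [(0, 1), (0, 2)]))  # [[0], [1, 2]]
```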

TODO:

  • Replace FunctionalUnit with FunctionUnit in Architecture.h.
  • Update pattern_name of fused_op so that this attribute better illustrates the structure of the pattern.

Output JSON File

| Field | Description |
| --- | --- |
| `template_id` | Unique identifier for this template |
| `instance_count` | Number of instances for each pattern |
| `supported_single_ops` | Individual operations this template can execute standalone |
| `supported_composite_ops` | Fused patterns this template can execute |
| `functional_units` | Array of FU definitions with their supported operations |
| `fu_connections` | Data routing paths between FUs |
| `pattern_execution_plans` | Detailed execution schedules for each pattern |
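For concreteness, one template entry with these fields can be assembled as in the minimal Python sketch below. The top-level keys come from the table above; all concrete values, and the inner field names such as `fu_id`, `src_fu`, and `dst_fu`, are hypothetical:

```python
import json

# Hypothetical template entry; top-level keys follow the table above,
# inner field names and all values are illustrative assumptions.
template = {
    "template_id": 0,
    "instance_count": 1,
    "supported_single_ops": ["neura.icmp", "neura.grant_predicate"],
    "supported_composite_ops": [
        "fused_op:icmp->grant_predicate->grant_predicate"
    ],
    "functional_units": [
        {"fu_id": 0, "supported_ops": ["neura.icmp"]},
        {"fu_id": 1, "supported_ops": ["neura.grant_predicate"]},
        {"fu_id": 2, "supported_ops": ["neura.grant_predicate"]},
    ],
    "fu_connections": [{"src_fu": 0, "dst_fu": 1}, {"src_fu": 0, "dst_fu": 2}],
    "pattern_execution_plans": [],
}

print(json.dumps(template, indent=2))
```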

Execution Plan Fields

| Field | Description |
| --- | --- |
| `pattern_id` | Pattern being executed |
| `pattern_name` | Name of the pattern |
| `fu_mapping` | Maps operation index to slot ID: `[op0→fu0, op1→fu1, ...]`; e.g. `[1, 2]` means `fu1` and `fu2` execute `op0` and `op1` of this pattern |
| `execution_stages` | Ordered stages of execution |
| `parallel_fus` | Slots executing in this stage (can be multiple for parallel ops) |
| `parallel_ops` | Operations executing in this stage |
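To make the `fu_mapping` convention concrete: given a stage's operation indices, the slots that run them are a direct lookup. A minimal Python sketch (helper name hypothetical):

```python
def parallel_fus_for_stage(fu_mapping, stage_op_indices):
    """Translate a stage's operation indices into the slot IDs that run them.

    fu_mapping[i] is the slot (FU) ID that executes operation i of the pattern.
    """
    return [fu_mapping[op] for op in stage_op_indices]

# The table's example: fu_mapping [1, 2] means fu1 runs op0 and fu2 runs op1.
print(parallel_fus_for_stage([1, 2], [0, 1]))  # [1, 2]

# A stage containing only op1 occupies only fu2.
print(parallel_fus_for_stage([1, 2], [1]))  # [2]
```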

Example

```json
"pattern_execution_plans": [
  {
    "pattern_id": 1,
    "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
    "fu_mapping": [0, 1, 2],
    "execution_stages": [
      {
        "stage": 0,
        "parallel_fus": [0],
        "parallel_ops": ["neura.icmp"]
      },
      {
        "stage": 1,
        "parallel_fus": [1, 2],
        "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]
      }
    ]
  },
```

The diagram below shows Hardware Template 1, where FU 0 serves as the data source for both FU 1 and FU 2.

       Template 1 Pipeline
      =====================
      
           [ FU 0 ]  <-- (neura.icmp)
              |
      ________|________
     |                 |
  [ FU 1 ]          [ FU 2 ]  <-- (neura.grant_predicate)
     |                 |
 (Grant A)         (Grant B)
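The execution plan above can be sanity-checked mechanically. A minimal Python sketch, under the assumption (not stated in the PR) that every slot in `fu_mapping` is scheduled exactly once across the stages:

```python
def check_plan(plan):
    """Sanity-check a pattern_execution_plans entry.

    Assumptions (illustrative, not from the pass itself): each stage pairs
    one op with one slot, and every slot in fu_mapping appears exactly once
    across all execution_stages.
    """
    scheduled = []
    for stage in plan["execution_stages"]:
        # Each scheduled op needs a matching slot in the same stage.
        assert len(stage["parallel_fus"]) == len(stage["parallel_ops"])
        scheduled.extend(stage["parallel_fus"])
    # All mapped slots are used, none twice.
    assert sorted(scheduled) == sorted(plan["fu_mapping"])
    return True

# The example plan from this PR description.
plan = {
    "pattern_id": 1,
    "pattern_name": "fused_op:icmp->grant_predicate->grant_predicate",
    "fu_mapping": [0, 1, 2],
    "execution_stages": [
        {"stage": 0, "parallel_fus": [0], "parallel_ops": ["neura.icmp"]},
        {"stage": 1, "parallel_fus": [1, 2],
         "parallel_ops": ["neura.grant_predicate", "neura.grant_predicate"]},
    ],
}
print(check_plan(plan))  # True
```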

@HobbitQia HobbitQia requested a review from tancheng January 22, 2026 13:34
@HobbitQia HobbitQia merged commit cd2ae13 into coredac:main Jan 26, 2026
1 check passed
