[SWDEV-531975] Implement workdistribute construct lowering (#140523) #541

skc7 · 2025-11-09T12:37:44Z

This PR introduces a new pass, "lower-workdistribute". Fortran array statements are lowered to FIR as fir.do_loop unordered. The "lower-workdistribute" pass mainly identifies each "fir.do_loop unordered" nested in target{teams{workdistribute{fir.do_loop unordered}}} and lowers it to target{teams{parallel{wsloop{loop_nest}}}}. It hoists all other ops out of the target region. It replaces heap allocation on the target with omp.target_allocmem, and deallocation with omp.target_freemem, both issued from the host. It also replaces the runtime function "Assign" with omp.target_memcpy from the host.
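The reason an array statement can be distributed this way is that the iterations of an unordered loop are independent of one another. The following is a minimal Python stand-in for that property, not FIR or compiler code; the function name `axpy_unordered` and the sample data are made up for illustration.

```python
import random


def axpy_unordered(a, x, y, order):
    """Compute out[i] = a * x[i] + y[i], visiting indices in the given order.

    Each iteration reads only the original inputs and writes one element,
    so the visiting order cannot change the result.
    """
    out = list(y)
    for i in order:
        out[i] = a * x[i] + y[i]
    return out


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

# Sequential order, as a plain loop would run it.
in_order = axpy_unordered(2.0, x, y, range(len(x)))

# Any permutation of the iteration space gives the same answer.
shuffled = list(range(len(x)))
random.shuffle(shuffled)
assert axpy_unordered(2.0, x, y, shuffled) == in_order
```

Because no iteration depends on another, the iteration space can be split across teams and threads without changing the result, which is the property the lowering relies on.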

This pass implements the following rewrites and optimisations:

- **FissionWorkdistribute**: finds the parallelizable ops within a teams{workdistribute} region and moves each of them to its own teams{workdistribute} region.
- **WorkdistributeRuntimeCallLower**: finds the FortranAAssign calls nested in teams{workdistribute{}} and lowers each one to an unordered do loop if src is a scalar and dest is an array. Other runtime calls are not handled currently.
- **WorkdistributeDoLower**: finds each fir.do_loop unordered nested in teams{workdistribute{fir.do_loop unordered}} and lowers it to teams{parallel{distribute{wsloop{loop_nest}}}}.
- **TeamsWorkdistributeToSingle**: hoists all the ops inside teams{workdistribute{}} to before the teams op.
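The four rewrites above can be sketched on a toy model. This is only an illustration under stated assumptions: the `Op` record, the string output, and the order in which the steps run are hypothetical and are not taken from the pass's C++ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an op inside teams{workdistribute{}}."""
    kind: str                    # "do_loop_unordered", "assign", or anything else
    src_is_scalar: bool = False  # only meaningful for kind == "assign"
    dest_is_array: bool = False  # only meaningful for kind == "assign"


def is_parallelizable(op):
    # Assumption: unordered loops and scalar-to-array Assign calls qualify.
    if op.kind == "do_loop_unordered":
        return True
    return op.kind == "assign" and op.src_is_scalar and op.dest_is_array


def fission(region):
    """FissionWorkdistribute: give each parallelizable op its own region."""
    regions, pending = [], []
    for op in region:
        if is_parallelizable(op):
            if pending:
                regions.append(pending)
                pending = []
            regions.append([op])
        else:
            pending.append(op)
    if pending:
        regions.append(pending)
    return regions


def lower_runtime_calls(region):
    """WorkdistributeRuntimeCallLower: scalar-to-array Assign becomes a loop."""
    return [
        Op("do_loop_unordered")
        if op.kind == "assign" and is_parallelizable(op)
        else op
        for op in region
    ]


def lower_region(region):
    """WorkdistributeDoLower for a lone loop, else TeamsWorkdistributeToSingle."""
    if len(region) == 1 and region[0].kind == "do_loop_unordered":
        return "teams{parallel{distribute{wsloop{loop_nest}}}}"
    # Everything else is hoisted out, in front of the teams op.
    return "hoisted[" + ", ".join(op.kind for op in region) + "]"


def lower_workdistribute(region):
    return [lower_region(lower_runtime_calls(r)) for r in fission(region)]


example = [
    Op("alloc"),
    Op("do_loop_unordered"),
    Op("assign", src_is_scalar=True, dest_is_array=True),
    Op("assign", src_is_scalar=False, dest_is_array=True),
]
for line in lower_workdistribute(example):
    print(line)
```

In this sketch the array-to-array assign is left untouched and hoisted, mirroring the note that other runtime calls are not handled by the loop lowering.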

The work in this PR is cherry-picked and updated from @ivanradanov's commits from the coexecute implementation: [flang_workdistribute_iwomp_2024](https://github.com/ivanradanov/llvm-project/commits/flang_workdistribute_iwomp_2024)

Related paper by @ivanradanov: ["Automatic Parallelization and OpenMP Offloading of Fortran Array Notation"](https://www.osti.gov/servlets/purl/2449728)

z1-cciauto · 2025-11-09T12:38:12Z

PSDB Link: https://compiler-ci.amd.com/job/compiler-psdb-amd-mainline/444

skc7 · 2025-11-12T12:39:28Z

PSDB has passed.
All the commits related to this feature are already in amd-mainline. This PR is pending to be merged.

@dpalermo Could you please approve?


skc7 requested a review from SyamaAmd November 9, 2025 12:37

skc7 changed the title ~~Implement workdistribute construct lowering (#140523)~~ [SWDEV-531975] Implement workdistribute construct lowering (#140523) Nov 9, 2025

skc7 requested a review from dpalermo November 9, 2025 12:38

skc7 marked this pull request as ready for review November 11, 2025 09:28

dpalermo approved these changes Nov 14, 2025


skc7 merged commit 8e85e31 into amd-mainline Nov 14, 2025
11 checks passed

skc7 deleted the amd/dev/skc7/amd-mainline/workdistribute_PR12345_new branch November 14, 2025 15:47

ronlieb mentioned this pull request Nov 22, 2025

[SWDEV-531975] Implement workdistribute construct lowering (#140523) … #654

Open

ronlieb mentioned this pull request Nov 25, 2025

[SWDEV-531975] Implement workdistribute construct lowering (#140523) … #678

Open
