[flang][OpenMP] Parallel do in declare target subroutine does not work #143887

Closed as not planned
@hakostra

Description

I have flang from a recent git commit:

$ flang --version
flang version 21.0.0git (https://github.com/llvm/llvm-project.git 40cc7b4578fd2d65aaef8356fbe7caf2d84a8f3e)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm/llvm-project/install/bin

I believe the following example is valid OpenMP code:

PROGRAM reproducer
    IMPLICIT NONE

    REAL, ALLOCATABLE, TARGET, DIMENSION(:) :: arr

    INTEGER, PARAMETER :: ngrids = 2
    INTEGER, PARAMETER :: cellsperdim = 4
    INTEGER, PARAMETER :: cellspergrid = cellsperdim**3
    INTEGER :: iouter, ip3

    ALLOCATE(arr(ngrids * cellspergrid), source=-1.0)

    !$omp target teams distribute private(ip3) map(tofrom: arr)
    DO iouter = 1, ngrids
        ip3 = (iouter - 1) * cellspergrid + 1
        CALL kernel(arr(ip3))
    END DO
    !$omp end target teams distribute

    PRINT *, arr
    DEALLOCATE(arr)
CONTAINS
    SUBROUTINE kernel(gridarr)
        !$omp declare target

        ! Subroutine arguments
        REAL, INTENT(INOUT), DIMENSION(cellsperdim, cellsperdim, cellsperdim) :: gridarr

        ! Local variables
        INTEGER :: i, j, k

        !$omp parallel do collapse(2) private(i, j, k) shared(gridarr)
        DO i = 1, cellsperdim
            DO j = 1, cellsperdim
                DO k = 1, cellsperdim
                    gridarr(k, j, i) = REAL(i)
                END DO
            END DO
        END DO
        !$omp end parallel do
    END SUBROUTINE kernel
END PROGRAM reproducer

This program works with both gfortran and Cray ftn. It also compiles without errors with flang:

$ flang -O3 -fopenmp -fopenmp-version=52 -fopenmp-targets=nvptx64 flang-omp-device-bug.F90

However, running it with mandatory offloading gives incorrect results (the array is left at its initial value of -1.0):

$ OMP_TARGET_OFFLOAD=mandatory ./a.out 
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.
 -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1.

Disabling offloading gives the expected results:

$ OMP_TARGET_OFFLOAD=disabled ./a.out 
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.
 2. 2. 2. 2. 2. 2. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 4. 4. 4. 4.
 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 2. 3. 3. 3. 3. 3. 3. 3. 3.
 3. 3. 3. 3. 3. 3. 3. 3. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4. 4.

I created an equivalent example in C; gcc, clang (built from the same git commit as flang), and Cray cc all run it correctly, both with and without offloading.

When I remove the omp parallel do in the kernel subroutine, the example prints the expected result both with and without offloading.

For offloading I use an Nvidia RTX 4080 Super, except with the Cray compilers, where I am on a shared HPC system.
