Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defect: Invalid RMA address range accessing coarray element with allocatable component on remote image #774

Open
1 task done
everythingfunctional opened this issue Dec 6, 2022 · 0 comments

Comments

@everythingfunctional
Copy link
Collaborator

everythingfunctional commented Dec 6, 2022

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.10.1
  • Fortran Compiler: gfortran 12.2.0
  • C compiler used for building lib: gcc 12.2.0
  • Installation method: ./install.sh
  • All flags & options passed to the installer: -M /usr
  • Output of uname -a: Linux stray 6.0.11-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 02 Dec 2022 17:25:31 +0000 x86_64 GNU/Linux
  • MPI library being used: openmpi 4.1.4-4
  • Machine architecture and number of physical cores: Intel x86, 4 core
  • Version of CMake: 3.25.1 installed via package manager

To help us debug your issue please explain:

What you were trying to do (and why)

Use derived type with allocatable component in coarray, Example

module payload_m
    implicit none
    private
    public :: payload_t, empty_payload

    type :: payload_t
        !! A raw buffer to facilitate data transfer between  images
        !!
        !! Facilitates view of the data as either a string or raw bytes.
        !! Typical usage will be either to
        !! * produce a string representation of the data, and then parse that string to recover the original data
        !! * use the `transfer` function to copy the raw bytes of the data
        private
        integer, allocatable, public :: payload_(:)
    contains
        private
        procedure, public :: raw_payload
        procedure, public :: string_payload
    end type

    interface payload_t
        pure module function from_raw(payload) result(new_payload)
            implicit none
            integer, intent(in) :: payload(:)
            type(payload_t) :: new_payload
        end function

        pure module function from_string(payload) result(new_payload)
            implicit none
            character(len=*), intent(in) :: payload
            type(payload_t) :: new_payload
        end function

        module procedure empty_payload
    end interface

    interface
        pure module function empty_payload()
            implicit none
            type(payload_t) :: empty_payload
        end function

        pure module function raw_payload(self)
            implicit none
            class(payload_t), intent(in) :: self
            integer, allocatable :: raw_payload(:)
        end function

        pure module function string_payload(self)
            implicit none
            class(payload_t), intent(in) :: self
            character(len=:), allocatable :: string_payload
        end function
    end interface

end module

submodule(payload_m) payload_s
    implicit none
contains
    module procedure from_raw
        new_payload%payload_ = payload
    end procedure

    module procedure from_string
        new_payload = payload_t([len(payload), transfer(payload,[integer::])])
    end procedure

    module procedure empty_payload
        empty_payload%payload_  = [integer::]
    end procedure

    module procedure raw_payload
        if (allocated(self%payload_)) then
            raw_payload = self%payload_
        else
            raw_payload = [integer::]
        end if
    end procedure

    module procedure string_payload
        if (allocated(self%payload_)) then
            if (size(self%payload_) > 0) then
                allocate(character(len=self%payload_(1)) :: string_payload)
                if (len(string_payload) > 0) &
                    string_payload = transfer(self%payload_(2:),string_payload)
            else
                allocate(character(len=0) :: string_payload)
            end if
        else
            allocate(character(len=0) :: string_payload)
        end if
    end procedure

end submodule

program example
    use payload_m, only: payload_t

    character(len=*), parameter :: MESSAGE = "Hello, World!"
    type(payload_t) :: mailbox[*]

    if (this_image() == 1) then
        mailbox = payload_t(MESSAGE)
    end if
    sync all
    if (this_image() /= 1) then
        mailbox = mailbox[1]
    end if

    print *, mailbox%string_payload(), " from image: ", this_image()
end program

What happened (include command output, screenshots, logs, etc.)

$ caf main.f90 
$ cafrun -n 2 ./a.out
 Hello, World! from image:            1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fb8c68519ff in ???
#1  0x55dfaf068c12 in ???
#2  0x55dfaf06a3b7 in ???
#3  0x55dfaf06a4ad in ???
#4  0x7fb8c683c28f in ???
#5  0x7fb8c683c349 in ???
#6  0x55dfaf068564 in _start
	at ../sysdeps/x86_64/start.S:115
#7  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node stray exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Error: Command:
   `/usr/bin/mpiexec -n 2 ./a.out`
failed to run.

with a more complex example (specifically the FEATS quadratic_solver example which can be found here in the examples folder) the error message is

[stray:40367] *** An error occurred in MPI_Get
[stray:40367] *** reported by process [2800353281,1]
[stray:40367] *** on win rdma window 5
[stray:40367] *** MPI_ERR_RMA_RANGE: invalid RMA address range
[stray:40367] *** MPI_ERRORS_ARE_FATAL (processes in this win will now abort,
[stray:40367] ***    and potentially your MPI job)
Error: Command:
   `/usr/bin/mpiexec -n 2 build/caf_3E83561A255DFE60/example/quadratic_solver`
failed to run.

What you expected to happen

The above example should print "Hello, World! from image: 1" for each image.

Step-by-step reproduction instructions to reproduce the error/bug

Simply compile the provided example as shown above. This was originally reported as follow up in #739, but was worthy of its own issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant