GA_Transpose should not require MA #163

Closed
jeffhammond opened this issue May 25, 2020 · 0 comments
@jeffhammond

While writing and testing https://github.com/ParRes/Kernels/blob/master/FORTRAN/transpose-ga.F90, I found that GA_Transpose crashes unless MA is initialized (https://github.com/GlobalArrays/ga/blob/master/global/src/global.nalg.c#L1082). The cause is that GA_Transpose uses ga_malloc, which defaults to using MA (https://github.com/GlobalArrays/ga/blob/master/global/src/ga_malloc.c).

I contend that GA should just use ARMCI_Malloc_local here, since it will never fail for lack of MA initialization and is the right kind of memory anyway. ARMCI_Malloc_local might not be as cheap as MA_Push, but its cost is trivial compared to that of GA_Transpose itself.
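
The failure mode described above can be illustrated without GA at all. The following is a minimal sketch, not GA or MA code: every `sketch_` name is a hypothetical stand-in, the pool allocator only mimics the general idea of an allocator that needs an explicit init call, and the local allocator is assumed to be a thin wrapper over `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MA-style stack allocator: it hands out
 * memory only after an explicit init call has reserved a pool. */
static char  *pool      = NULL;
static size_t pool_size = 0;
static size_t pool_used = 0;

/* Reserve the pool. Returns 1 on success, 0 on failure. */
int sketch_pool_init(size_t bytes)
{
    pool      = malloc(bytes);
    pool_size = pool ? bytes : 0;
    pool_used = 0;
    return pool != NULL;
}

/* Push an allocation onto the pool. Returns NULL if the pool was never
 * initialized or is exhausted; a caller that does not check for NULL
 * will crash when it dereferences the result. */
void *sketch_pool_push(size_t bytes)
{
    assert(bytes > 0);
    bytes = (bytes + 15) & ~(size_t)15;   /* keep 16-byte alignment */
    if (pool == NULL || bytes > pool_size - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += bytes;
    return p;
}

/* Release the pool and return to the uninitialized state. */
void sketch_pool_finalize(void)
{
    free(pool);
    pool      = NULL;
    pool_size = 0;
    pool_used = 0;
}

/* Hypothetical stand-in for a local allocator with no init step. */
void *sketch_local_malloc(size_t bytes) { return malloc(bytes); }
void  sketch_local_free(void *p)        { free(p); }
```

Calling `sketch_pool_push` before `sketch_pool_init` yields NULL, while `sketch_local_malloc` works with no setup, which is the property the proposal relies on.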

This is a TODO for me to fix.

@jeffhammond jeffhammond self-assigned this May 25, 2020
jeffhammond added a commit to jeffhammond/ga that referenced this issue Mar 20, 2021
GA_Transpose uses ga_malloc, which defaults to using
MA memory. This means that GA requires MA to be initialized.
This changes GA_Transpose to use ARMCI memory.

If ARMCI_Malloc_local is much slower than ga_malloc, this might
be noticeable (because this code is called inside the loop),
but that seems unlikely at this point.

It is trivial to revert to the old implementation by undefining
the macro (GA_TRANSPOSE_USE_ARMCI_MEM) that enables this.

fix issue GlobalArrays#163

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>
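
The macro switch described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in global.nalg.c: only the macro name GA_TRANSPOSE_USE_ARMCI_MEM comes from the commit message, the `sketch_` functions are hypothetical stand-ins backed by `malloc` so the example compiles without GA, and the block-by-block transpose is a simplified serial loop chosen only to show an allocation inside a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Remove this definition to fall back to the other allocator, mirroring
 * the switch named in the commit message. */
#define GA_TRANSPOSE_USE_ARMCI_MEM

/* Hypothetical stand-ins; the real ARMCI_Malloc_local and ga_malloc are
 * not called here. */
void *sketch_armci_malloc_local(size_t bytes) { return malloc(bytes); }
void  sketch_armci_free_local(void *p)        { free(p); }
void *sketch_ga_malloc(size_t bytes)          { return malloc(bytes); }
void  sketch_ga_free(void *p)                 { free(p); }

/* Single point where the allocator is chosen at compile time. */
void *scratch_alloc(size_t bytes)
{
#ifdef GA_TRANSPOSE_USE_ARMCI_MEM
    return sketch_armci_malloc_local(bytes);
#else
    return sketch_ga_malloc(bytes);
#endif
}

void scratch_free(void *p)
{
#ifdef GA_TRANSPOSE_USE_ARMCI_MEM
    sketch_armci_free_local(p);
#else
    sketch_ga_free(p);
#endif
}

/* Transpose the row-major rows x cols matrix a into the cols x rows
 * matrix b, one block of rows at a time. The scratch buffer is allocated
 * and freed inside the loop, so allocator cost is paid once per block.
 * Returns 0 on success, -1 if a scratch allocation fails. */
int sketch_transpose(const double *a, double *b, int rows, int cols, int block)
{
    assert(rows >= 0 && cols >= 0 && block > 0);
    if (rows == 0 || cols == 0)
        return 0;
    for (int r0 = 0; r0 < rows; r0 += block) {
        int nr = (rows - r0 < block) ? rows - r0 : block;
        double *buf = scratch_alloc((size_t)nr * (size_t)cols * sizeof(double));
        if (buf == NULL)
            return -1;
        /* Fill buf with the transposed block, stored as cols x nr. */
        for (int i = 0; i < nr; i++)
            for (int j = 0; j < cols; j++)
                buf[(size_t)j * nr + i] = a[(size_t)(r0 + i) * cols + j];
        /* Copy the transposed block into its columns of b. */
        for (int j = 0; j < cols; j++)
            for (int i = 0; i < nr; i++)
                b[(size_t)j * rows + r0 + i] = buf[(size_t)j * nr + i];
        scratch_free(buf);
    }
    return 0;
}
```

Keeping the choice inside `scratch_alloc` and `scratch_free` means the transpose loop itself does not change when the macro is undefined, which is what makes reverting cheap.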
bjpalmer pushed a commit that referenced this issue Nov 18, 2022