use ARMCI not MA mem in GA_Transpose (#211)
GA_Transpose allocates its temporary buffer with ga_malloc, which
defaults to MA memory. This means that GA requires MA to be initialized.
This change makes GA_Transpose use ARMCI memory instead.

If ARMCI_Malloc_local is much slower than ga_malloc, this might
be noticeable, because the allocation happens inside the loop,
but that seems unlikely at this point.

It is trivial to revert to the old implementation by undefining
the macro (GA_TRANSPOSE_USE_ARMCI_MEM) that enables this.
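
The compile-time switch described above can be illustrated with a minimal standalone sketch. It is not the GA code: the macro name USE_ALT_ALLOCATOR, the helpers alt_alloc, default_alloc, tmp_alloc, tmp_free and the function transpose_via_tmp are all hypothetical, and both allocator paths are backed by the C heap so the example compiles without ARMCI or MA. In GA the two paths would correspond to ARMCI_Malloc_local/ARMCI_Free_local and ga_malloc/ga_free.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch playing the role of GA_TRANSPOSE_USE_ARMCI_MEM.
   Remove this line to select the fallback allocator. */
#define USE_ALT_ALLOCATOR 1

/* Stand-ins (assumptions): both simply wrap the C heap here. */
void *alt_alloc(size_t bytes)                   { return malloc(bytes); }
void  alt_free(void *p)                         { free(p); }
void *default_alloc(size_t nelem, size_t size)  { return calloc(nelem, size); }
void  default_free(void *p)                     { free(p); }

/* Allocate a temporary buffer for nelem elements of size bytes each,
   choosing the allocator at compile time. */
void *tmp_alloc(size_t nelem, size_t size)
{
#ifdef USE_ALT_ALLOCATOR
    return alt_alloc(nelem * size);   /* byte count computed by the caller side */
#else
    return default_alloc(nelem, size);
#endif
}

/* Release a buffer with the allocator that produced it. */
void tmp_free(void *p)
{
#ifdef USE_ALT_ALLOCATOR
    alt_free(p);
#else
    default_free(p);
#endif
}

/* Transpose a row-major nrow x ncol matrix into dst (ncol x nrow) through
   a temporary buffer. Returns 0 on success, -1 if the allocation fails. */
int transpose_via_tmp(const double *src, double *dst, size_t nrow, size_t ncol)
{
    double *tmp = tmp_alloc(nrow * ncol, sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t i = 0; i < nrow; i++)
        for (size_t j = 0; j < ncol; j++)
            tmp[j * nrow + i] = src[i * ncol + j];

    memcpy(dst, tmp, nrow * ncol * sizeof *tmp);
    tmp_free(tmp);
    return 0;
}
```

Keeping the allocate and free calls behind the same macro ensures a buffer is always released by the allocator that created it, which is the property the paired #ifdef blocks in the diff preserve.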

Fixes issue #163.

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

jeffhammond authored Nov 18, 2022
1 parent 4e0c0a8 commit 660561e
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions global/src/global.nalg.c
@@ -25,6 +25,10 @@
#include "ga-wapi.h"
#include "base.h"

// workaround https://github.com/GlobalArrays/ga/issues/163
// by using ARMCI_Malloc_local instead of ga_malloc -> MA
#define GA_TRANSPOSE_USE_ARMCI_MEM 1

#ifdef MSG_COMMS_MPI
extern ARMCI_Group* ga_get_armci_group_(int);
#endif
@@ -981,7 +985,14 @@ _iterator_hdl hdl;
int i, size=GAsizeofM(atype);

nelem = (hi[0]-lo[0]+1)*(hi[1]-lo[1]+1);
#ifdef GA_TRANSPOSE_USE_ARMCI_MEM
{
size_t bytes = nelem * size;
ptr_tmp = ARMCI_Malloc_local(bytes);
}
#else
ptr_tmp = (char *) ga_malloc(nelem, atype, "transpose_tmp");
#endif

nrow = hi[0] -lo[0]+1;
ncol = hi[1] -lo[1]+1;
@@ -995,7 +1006,11 @@ _iterator_hdl hdl;
ptr_a += ld[0]*size;
}
pnga_put(g_b, lob, hib, ptr_tmp ,&ncol);
#ifdef GA_TRANSPOSE_USE_ARMCI_MEM
ARMCI_Free_local(ptr_tmp);
#else
ga_free(ptr_tmp);
#endif
}
#else
num_blocks_a = pnga_total_blocks(g_a);