Broadcast using shmem #1
Comments
I am not able to replicate this with a simple test case similar to what you provided (below). The
#include <stdio.h>
#include <shmem.h>
long pSyncA[SHMEM_BCAST_SYNC_SIZE] = { SHMEM_SYNC_VALUE };
typedef unsigned int u32;
typedef unsigned char u8;
int main( void )
{
shmem_init();
const int N = 256;
const int M = 44;
int id = shmem_my_pe();
int npes = shmem_n_pes();
u32* rk = (u32*)shmem_malloc(M * sizeof(u32));
u32* Td0 = (u32*)shmem_malloc(N * sizeof(u32));
u8* Td4s = (u8*)shmem_malloc(N * sizeof(u8));
for (int i = 0; i < M; i++) {
if (id == 0) rk[i] = i + 1;
}
for (int i = 0; i < N; i++) {
if (id == 0) Td0[i] = i;
else Td0[i] = 0;
if (id == 0) Td4s[i] = (u8)i;
else Td4s[i] = 0;
}
shmem_barrier_all();
shmem_broadcast32 (rk, rk, M, 0, 0, 0, npes, pSyncA);
shmem_broadcast32 (Td4s, Td4s, N/4, 0, 0, 0, npes, pSyncA); // 32-bit broadcasting 8-bit values (divide by 4)
shmem_broadcast32 (Td0, Td0, N, 0, 0, 0, npes, pSyncA);
int err = 0;
for (int i = 0; i < M; i++) if (rk[i] != (i+1)) err++;
for (int i = 0; i < N; i++) if (Td0[i] != i) err++;
for (int i = 0; i < N; i++) if (Td4s[i] != i) err++;
if (err) printf("# %d: ERROR: %d values were not broadcast correctly\n", id, err);
shmem_finalize();
return 0;
}
Compile with:
Run with:
|
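The `N/4` count in the test case works because 256 bytes is a whole number of 32-bit words. As a minimal sketch of the general arithmetic (the helper name `words32_for_bytes` is hypothetical and not part of OpenSHMEM), the element count for a 32-bit broadcast of a byte array could be rounded up like this:

```c
#include <stddef.h>

/* Hypothetical helper: number of 32-bit elements needed to cover
 * nbytes bytes, rounding up to the next whole word. */
static size_t words32_for_bytes(size_t nbytes)
{
    return (nbytes + 3) / 4;
}
```

If the count is rounded up, the source and destination buffers would also need to be allocated with the same padding, otherwise the broadcast touches memory past the end of the array.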
Hi, Thanks aese.c.txt |
@FSM-GIT For your information (this may not be your problem):
The COPRTHR-2 API, presently, doesn't provide a mechanism to identify and wait for the data to arrive on the core. There is a method to do this in software:
I have seen the error I described with small arrays being incompletely copied after a DMA and immediately used. |
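For readers unfamiliar with the idea, here is a rough sketch of waiting for data in software by polling a sentinel. It is plain C, with `memcpy` standing in for the DMA engine; `fake_dma_copy` and `copy_and_wait` are hypothetical names, not COPRTHR-2 calls. It assumes the transfer writes the last element last, which may not hold on every DMA engine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an asynchronous DMA transfer. */
static void fake_dma_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    memcpy(dst, src, n * sizeof(uint32_t));
}

/* Copy n words, then spin until the last destination word has changed. */
static void copy_and_wait(uint32_t *dst, const uint32_t *src, size_t n)
{
    if (n == 0) return;

    /* volatile forces a real load on every loop iteration */
    volatile uint32_t *plast = dst + (n - 1);

    /* The bitwise inverse of the expected value can never equal it,
     * so any change at *plast means the final word has arrived. */
    uint32_t sentinel = ~src[n - 1];
    *plast = sentinel;

    fake_dma_copy(dst, src, n);

    while (*plast == sentinel) { /* busy-wait for the data to land */ }
}
```

With a real asynchronous copy the loop is what does the waiting; with `memcpy` it exits immediately, so this only illustrates the control flow.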
Thank you for your prompt reply. |
@FSM-GIT I was able to reproduce your error this time. There appear to be two issues, but they are both easy to fix.
Issue 1: In short, the compiler does unsafe things when the
You will have to rebuild the library after you've made the change.
Issue 2:
if (id == 0)
{
volatile u32* plast = Te0 + 255;
u32 inv_last = ~temp_Te0[255];
*plast = inv_last;
coprthr_memcopy_align(Te0, temp_Te0, 256 * sizeof(u32), COPRTHR2_M_DMA_0);
coprthr_wait(COPRTHR2_E_DMA_0);
while (*plast == inv_last);
plast = rk + 43;
inv_last = ~temp_rk[43];
*plast = inv_last;
coprthr_memcopy_align(rk, temp_rk, 44 * sizeof(u32), COPRTHR2_M_DMA_0);
coprthr_wait(COPRTHR2_E_DMA_0);
while (*plast == inv_last);
}
Alternatively, you can do without the DMA copy and manually copy arrays with a loop. It's not much data. You should be able to remove the
Other comments: |
Thank you so much for your attention. |
Hi there
When I try to broadcast a variable called Td0 in the attached program, which is an unsigned int array containing 256 elements, I see a problem in the output (after broadcasting). The last 8 elements of the array are copied incorrectly. The only workaround I have found is to pass 256+8 as the number of elements instead of the actual size of 256. I cannot understand what is happening.
Please find the files enclosed.
aesd.c.txt
aesd_epi_helper.c.txt
aesd_epi_helper.h.txt
device.c.txt
Makefile.txt