Error not injected when threads/block different to 1024 #7

Closed
sergicuen opened this issue Jun 16, 2021 · 2 comments

Comments

@sergicuen

Hi all,
Similar to a previous issue, I am having problems when injecting faults into a very simple matrix multiplication kernel whenever the number of threads per block is different from 32x32 (1024). Any other value (e.g. 16x16 = 256 threads per block) produces some "Error not injected" results.

kernelName=matrixMulCUDA(float*,float*,float*,int,int,int)
kernelCount=0
groupID=7
bitFlipModel=0
instID=21059427
opIDSeed=0.401131
bitIDSeed=0.326217
inspecting: matrixMulCUDA(float*,float*,float*,int,int,int)
num_static_instrs: 282
maxregs: 32(32)
Injection data
index: 0
kernel_name: matrixMulCUDA(float*,float*,float*,int,int,int)
ctas: 64
instrs: 14057472
grp 0: 0 grp 1: 2097152 grp 2: 4194304 grp 3: 262144 grp 4: 1130496 grp 5: 6373376 grp 6: 12926976 grp 7: 12664832
mask: 0x0
beforeVal: 0x0;afterVal: 0x0
regNo: -1
opcode: NOP
pcOffset: 0x0
tid: -1
Error not injected

All the versions compile and pass the test (the application works correctly). I also tried to inject faults with the DUMMY flag, with the same results (some dummy injections work, others don't). In all cases I've rerun the profiler to make sure everything is OK.
I've activated the VERBOSE_TOOLS flags, but the output is difficult to interpret since some numbers have no identifier.
I've checked on Jetson Nano and TX2 boards, with the same result, and using different matrix sizes.

The kernel is very simple:
__global__ void matrixMulCUDA(float *C, float *A, float *B, int ldA, int ldB, int ldC) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    float *ptrA = &A[i*ldA]; // Pointer to the first element of row i of A
    float tmp = 0.0f;
    for (int k = 0; k < ldA; k++) {
        tmp += *(ptrA++) * B[k*ldB+j];
    }
    C[i*ldC+j] = tmp;
}

int main() {
    ....
    int block_size = 16;
    dim3 dimsA(128, 128, 1); dim3 dimsB(128, 128, 1); dim3 dimsC(128, 128, 1);
    dim3 threads(block_size, block_size);
    dim3 grid(dimsC.x / threads.x, dimsC.y / threads.y);
    ....
    matrixMulCUDA<<< grid, threads >>>(d_C, d_A, d_B, dimsA.x, dimsB.x, dimsC.x);
    ....
}
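
(For reference: with block_size = 16 each block has 16x16 = 256 threads and the grid is (128/16) x (128/16) = 8x8 = 64 blocks, which matches the "ctas: 64" line in the injection data above; the configuration that injects correctly is 32x32 = 1024 threads per block.)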

Could you please give me some hints for debugging this problem?

Thank you in advance.


dsartzet commented Jul 18, 2021

Same issue here. I tried with hotspot and srad_v2 from the Rodinia benchmark suite and I get plenty of "Error not injected" results.
The simple_add example included in nvbitfi works without issues. Any ideas?
Thanks.

Update
It seems that the optimization in profiler/inject_func.cu is causing the issue (at least on my Tesla K20c GPU). Letting every thread issue the atomicAdd makes the instruction group counters be computed correctly, and the "Error not injected" results disappear. So, as a workaround, just replace the code of the count_instrs function with this:

uint64_t *counters = (uint64_t*)pcounters;
atomicAdd((unsigned long long *)&counters[index], 1);
atomicAdd((unsigned long long *)&counters[NUM_ISA_INSTRUCTIONS+grp_index], 1);
atomicAdd((unsigned long long *)&counters[num_counters-2], grp_index != G_NODEST);
atomicAdd((unsigned long long *)&counters[num_counters-1], 1 - ((grp_index == G_NODEST) || (grp_index == G_PR)));
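
In case it helps, the complete patched function would look roughly like the sketch below. The signature and parameter names (pcounters, index, grp_index, num_counters) are assumed from the snippet above, and NUM_ISA_INSTRUCTIONS, G_NODEST and G_PR are the definitions already present in nvbitfi, so adapt it to whatever your profiler/inject_func.cu actually declares:

// Sketch of the full patched count_instrs (signature assumed, not copied from upstream).
extern "C" __device__ __noinline__ void count_instrs(uint64_t pcounters, int index, int grp_index, int num_counters) {
    uint64_t *counters = (uint64_t*)pcounters;
    // Workaround: every thread performs its own atomicAdd.
    atomicAdd((unsigned long long *)&counters[index], 1);                           // per-opcode counter
    atomicAdd((unsigned long long *)&counters[NUM_ISA_INSTRUCTIONS+grp_index], 1);  // per-instruction-group counter
    atomicAdd((unsigned long long *)&counters[num_counters-2], grp_index != G_NODEST);
    atomicAdd((unsigned long long *)&counters[num_counters-1], 1 - ((grp_index == G_NODEST) || (grp_index == G_PR)));
}

After the change, remember to rebuild the tool and rerun the profiling step before launching new injection campaigns.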


sergicuen commented Jul 19, 2021

Thanks Dimitris,
preliminary testing shows that the workaround works fine with the matrix multiplication kernel on a Jetson Nano (Maxwell GPU).
I'll leave the issue open until further tests are performed.

Update
The workaround was tested successfully with several benchmarks.
