You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi all!
similar to a previous issue I am having problems when injenct faults in a very simple kernel of matrix mult and the number of threads/block is different to 32x32 (1024). Any other value (e.g.: 16x16) produces some "Error not injected" results.
All the versions compile and pass the test (works correctly). I also tried to inject faults with DUMMY flag with the same results (some dummy injections work others don´t). In all the cases I ´ve rerun the profiler to be sure all is ok.
I´ve activated the VERBOSE_TOOLS flags but the info is difficult to interpret since some numbers haven´t got any identifier.
I´ve checked with Jetson nano and TX2 boards with the same result and using different matrix sizes.
The kernel is very simple: global void matrixMulCUDA(float *C, float *A, float B, int ldA, int ldB, int ldC) {
int i = blockIdx.y * blockDim.y + threadIdx.y;
int j = blockIdx.x * blockDim.x + threadIdx.x;
float ptrA = &A[ildA]; // Pointer to the first element of row i of A
float tmp = 0.0f;
for (int k = 0; k < ldA; k++) {
tmp += (ptrA++) * B[kldB+j];
}
C[ildC+j] = tmp;
}
Same issue here. Tried with hotspot and srad_v2 from Rodinia benchmark suite and I get plenty of "Error not injected" results.
The simple_add which is included in nvbitfi works without issues. Any ideas?
Thanks.
Update
It seems that the optimization in profiler/inject_func.cu is causing the issue (at least in my Tesla K20c gpu). By letting every thread to use the atomicAdd then the instruction group counters are computed correctly and no more "Error not injected" results. So as a workaround just replace the code of the count_instrs function with this
uint64_t *counters = (uint64_t*)pcounters;
atomicAdd((unsigned long long *)&counters[index], 1);
atomicAdd((unsigned long long *)&counters[NUM_ISA_INSTRUCTIONS+grp_index], 1);
atomicAdd((unsigned long long *)&counters[num_counters-2], grp_index != G_NODEST);
atomicAdd((unsigned long long *)&counters[num_counters-1], 1 - ((grp_index == G_NODEST) || (grp_index == G_PR)));
Thanks Dimitris,
the preliminary test shows that the workaround works fine with the matrixmult in Jetson nano (Maxwell gpu).
I leave the issue open until futher test are performed.
Update
workaround tested sucessfully with several benchmarks.
Hi all!
similar to a previous issue I am having problems when injenct faults in a very simple kernel of matrix mult and the number of threads/block is different to 32x32 (1024). Any other value (e.g.: 16x16) produces some "Error not injected" results.
kernelName=matrixMulCUDA(float*,float*,float*,int,int,int)
kernelCount=0
groupID=7
bitFlipModel=0
instID=21059427
opIDSeed=0.401131
bitIDSeed=0.326217
inspecting: matrixMulCUDA(float*,float*,float*,int,int,int)
num_static_instrs: 282
maxregs: 32(32)
Injection data
index: 0
kernel_name: matrixMulCUDA(float*,float*,float*,int,int,int)
ctas: 64
instrs: 14057472
grp 0: 0 grp 1: 2097152 grp 2: 4194304 grp 3: 262144 grp 4: 1130496 grp 5: 6373376 grp 6: 12926976 grp 7: 12664832
mask: 0x0
beforeVal: 0x0;afterVal: 0x0
regNo: -1
opcode: NOP
pcOffset: 0x0
tid: -1
Error not injected
All the versions compile and pass the test (works correctly). I also tried to inject faults with DUMMY flag with the same results (some dummy injections work others don´t). In all the cases I ´ve rerun the profiler to be sure all is ok.
I´ve activated the VERBOSE_TOOLS flags but the info is difficult to interpret since some numbers haven´t got any identifier.
I´ve checked with Jetson nano and TX2 boards with the same result and using different matrix sizes.
The kernel is very simple:
global void matrixMulCUDA(float *C, float *A, float B, int ldA, int ldB, int ldC) {
int i = blockIdx.y * blockDim.y + threadIdx.y;
int j = blockIdx.x * blockDim.x + threadIdx.x;
float ptrA = &A[ildA]; // Pointer to the first element of row i of A
float tmp = 0.0f;
for (int k = 0; k < ldA; k++) {
tmp += (ptrA++) * B[kldB+j];
}
C[ildC+j] = tmp;
}
main() {
....
int block_size = 16;
dim3 dimsA(128, 128, 1); dim3 dimsB(128, 128, 1); dim3 dimsC(128, 128, 1);
dim3 threads(block_size, block_size);
dim3 grid(dimsC.x / threads.x, dimsC.y / threads.y);
....
matrixMulCUDA<<< grid, threads >>>(d_C, d_A, d_B, dimsA.x, dimsB.x, dimsC.x);
....
}
Please could you give me some hints for debugging the problem?
Thank you in advance.
The text was updated successfully, but these errors were encountered: