To keep it simple: I took the CUDA code generated by the auto-scheduler from the file
/tutorials/auto_scheduler/tune_conv2d_layer_cuda.py
and I call it as follows:
dim3 dimBlock(128);
dim3 dimGrid(16);
default_function_kernel0<<<dimGrid, dimBlock>>>(data, kernel, compute, bias);
cudaDeviceSynchronize();
Then a bug occurred:

Did the kernel corrupt the CUDA API state, or something like that?
If I do not run the kernel, nothing goes wrong.

I cannot figure out what happened. Can you help me?
I used P4 and T4 GPUs with CUDA 10.2.
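
To narrow down where the failure first appears, the launch can be wrapped with explicit error checks. The following is only a minimal sketch: `CHECK_CUDA` is a hypothetical helper macro, and `stand_in_kernel` with its single output buffer is a placeholder for the generated `default_function_kernel0` and its real arguments, whose buffer sizes are not shown here. Only the grid and block dimensions are taken from the call above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the generated code): print the CUDA error
// string with file and line, then abort, if a runtime call fails.
#define CHECK_CUDA(call)                                   \
  do {                                                     \
    cudaError_t err_ = (call);                             \
    if (err_ != cudaSuccess) {                             \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
              cudaGetErrorString(err_));                   \
      exit(EXIT_FAILURE);                                  \
    }                                                      \
  } while (0)

// Placeholder for the generated default_function_kernel0 (assumption: the
// real kernel takes data, kernel, compute and bias buffers instead).
__global__ void stand_in_kernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 1.0f;  // bounds check keeps every write inside the buffer
}

int main() {
  const int n = 16 * 128;  // one element per launched thread
  float* d_out = nullptr;
  CHECK_CUDA(cudaMalloc(&d_out, n * sizeof(float)));

  dim3 dimBlock(128);
  dim3 dimGrid(16);
  stand_in_kernel<<<dimGrid, dimBlock>>>(d_out, n);

  // Reports launch problems such as an invalid configuration.
  CHECK_CUDA(cudaGetLastError());
  // Reports errors raised while the kernel runs, e.g. illegal memory access.
  CHECK_CUDA(cudaDeviceSynchronize());

  CHECK_CUDA(cudaFree(d_out));
  printf("kernel finished without CUDA errors\n");
  return 0;
}
```

If the error only shows up at `cudaDeviceSynchronize()` or in a later API call, the kernel itself probably faulted while running. Two things that may be worth checking in that case, as guesses rather than a diagnosis: whether the device buffers passed to the kernel are at least as large as the shapes the kernel was tuned for, and whether the grid and block dimensions match the ones the auto-scheduler chose for this schedule.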