[BUG] Wrong result for miopenOpTensor()
#79
If using |
The root cause seems to be a wrong execution of
NOTE: A typical
@@ -909,15 +910,24 @@ __kernel void Op4dTensorLite(const global MIOPEN_TYPE* a,
const long Coffset)
{
int gid0 = get_global_id(0);
-
int index = gid0 * RD_BLCK;
MIOPEN_TYPE a_dat[RD_BLCK];
MIOPEN_TYPE b_dat[RD_BLCK];
MIOPEN_TYPE c_dat[RD_BLCK];
+#if RD_BLCK == 3
+ a_dat[0] = (a + index + Aoffset)[0];
+ a_dat[1] = (a + index + Aoffset)[1];
+ a_dat[2] = (a + index + Aoffset)[2];
+ b_dat[0] = (b + index + Boffset)[0];
+ b_dat[1] = (b + index + Boffset)[1];
+ b_dat[2] = (b + index + Boffset)[2];
+#else
*((READ_TYPE*)a_dat) = *((const global READ_TYPE*)(a + index + Aoffset));
*((READ_TYPE*)b_dat) = *((const global READ_TYPE*)(b + index + Boffset));
+#endif
+
#ifdef BETA
+ // Also need correct copy if RD_BLCK == 3
*((READ_TYPE*)c_dat) = *((const global READ_TYPE*)(c + index + Coffset));
#endif
|
This does not solve the issue. Facing the same issue after using this patch. Is it complete? -Mythreyi |
@mythreyi22 No, it is not complete, because every data cast to |
float3 has stricter alignment than an array of floats. That is why casting the address of an array element to a float3 pointer may yield an invalid pointer. |
@atamazov So is there any better solution other than expanding this code? |
This is a wrong data copy. The fix might cost performance. Sorry. |
Addresses of source & destination objects shall comply with float3/float4 alignment requirements if float3 ptrs are used. We'll face the Undefined Behavior nightmare otherwise. |
Unfortunately I can't see the whole context from my mobile phone, so I am unable to give a more specific or accurate recommendation. |
@atamazov The problem is that a batch of input data is supposed to be given contiguously and is expected to be stored contiguously as well. If we follow the OpenCL float3 alignment, OpTensorAdd cannot fully handle adjacent data, since 2 adjacent float3 values would never both meet the alignment requirement. |
They always meet. Any float3 object has an unused (hidden) 4th element at the end (a gap). The alignments of float3 and float4 are the same.
We have UB if we don't)) |
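The alignment point above can be checked on the host with a minimal C11 sketch. `float3_like`, `is_vec3_aligned` and `count_aligned_triples` are hypothetical names introduced only for illustration; the struct merely emulates the OpenCL C rule that a 3-component vector has the size and alignment of the 4-component one, and the sketch assumes `sizeof(float) == 4`.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's float3 (assumption: 4-byte float).
   A 3-component vector occupies, and is aligned to, 4 components. */
typedef struct {
    alignas(16) float s[4]; /* s[3] is the hidden padding element */
} float3_like;

/* Returns 1 if p is aligned well enough to be read through a
   float3-like pointer, 0 otherwise. */
static int is_vec3_aligned(const void *p)
{
    return ((uintptr_t)p % alignof(float3_like)) == 0;
}

/* Counts how many tightly packed triples (stride of 3 floats) start at
   a float3-aligned address. */
static size_t count_aligned_triples(const float *base, size_t n_triples)
{
    size_t count = 0;
    for (size_t i = 0; i < n_triples; ++i)
        count += (size_t)is_vec3_aligned(base + 3 * i);
    return count;
}
```

For a 16-byte-aligned array of 12 packed floats, the triples start at byte offsets 0, 12, 24 and 36, so only the first one is suitably aligned. That is why a cast such as `(READ_TYPE*)(a + index + Aoffset)` is unsafe when `RD_BLCK == 3` and the data is tightly packed.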
@atamazov So what are the specific float4 alignment requirements? e.g. addr % sizeof(float4) == 0? |
@atamazov Can you explain how an array of 9 successive floats can satisfy the alignment as a whole? If the starting address of the array satisfies the alignment, who will manage the 4th element for each float3 in this array? |
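One way to handle a tightly packed array such as the 9-float case is to avoid vector-pointer casts entirely and copy three elements at a time, which needs only `float` alignment. The host-side C sketch below illustrates that idea; it is not the fix from the attached file, and `load3_packed`, `store3_packed` and `add_packed_triples` are hypothetical names. In OpenCL C, the built-ins `vload3` and `vstore3` serve the same purpose: they access `p + offset * 3` and require only element alignment.

```c
#include <stddef.h>
#include <string.h>

/* Copies the i-th packed triple out of a tightly packed float array.
   src only needs the alignment of float. */
static void load3_packed(const float *src, size_t i, float out[3])
{
    memcpy(out, src + 3 * i, 3 * sizeof(float));
}

/* Writes a triple back into the i-th packed slot of dst. */
static void store3_packed(float *dst, size_t i, const float in[3])
{
    memcpy(dst + 3 * i, in, 3 * sizeof(float));
}

/* Element-wise c = a + b over n_triples packed triples, mirroring the
   a_dat/b_dat/c_dat staging used when RD_BLCK == 3. */
static void add_packed_triples(const float *a, const float *b,
                               float *c, size_t n_triples)
{
    for (size_t i = 0; i < n_triples; ++i) {
        float a_dat[3], b_dat[3], c_dat[3];
        load3_packed(a, i, a_dat);
        load3_packed(b, i, b_dat);
        for (int k = 0; k < 3; ++k)
            c_dat[k] = a_dat[k] + b_dat[k];
        store3_packed(c, i, c_dat);
    }
}
```

With this layout no 4th element exists in memory, so nothing has to manage it; the trade-off is that the copies are no longer a single wide vector access.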
@ghostplant @mythreyi22 Please replace your MIOpen/src/ocl/tensorocl.cpp with attached file. |
@ce1adon This is diff between latest 1.7 release and contents of |
@atamazov This is based on master branch of this repo. |
@ce1adon Most likely it is ~= 1.7.x. If the diff is correct (I think it is), then it could be applied (which seems more common than overwriting the whole file). |
@ce1adon, Thanks! |
Confirmed fixed. But found another bug from |
Source code attached below:
It outputs partially NaN results on gfx803, which is not expected, and the bug is reproducible every time.