[BUG] Wrong result for `miopenOpTensor()` #79

ghostplant · 2019-02-01T09:29:24Z

Source code attached below:

#include <hip/hip_runtime.h>
#include <hipblas.h>
#include <miopen/miopen.h>
#include <assert.h>
#include <vector>

using std::vector;

int main() {
  const int len = 9;

  hipSetDevice(0);
  miopenHandle_t miopen_handle;
  miopenCreate(&miopen_handle);

  miopenTensorDescriptor_t tensor;
  miopenCreateTensorDescriptor(&tensor);
  miopenSet4dTensorDescriptor(tensor, miopenFloat, len, 1, 1, 1);

  float *in, *out;
  hipMalloc((void**)&in, sizeof(float) * len);
  hipMalloc((void**)&out, sizeof(float) * len);

  float host_in[len] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f}, host_out[len] = {};
  hipMemcpyHtoD(in, host_in, sizeof(float) * len);

  float alpha = 1.0f, beta = 0.0f;
  miopenOpTensor(miopen_handle, miopenTensorOpAdd, &alpha, tensor, in, &alpha, tensor, in, &beta, tensor, out);
  hipMemcpyDtoH(host_out, out, sizeof(float) * len);
  hipStreamSynchronize(0);

  // output: 2 nan nan 8 nan nan 14 nan nan
  // expect: 2 4 6 8 10 12 14 16 18
  for (int i = 0; i < len; ++i)
    printf("%g ", host_out[i]);
  puts("");
  return 0;
}

It outputs partial NaN results on gfx803 which is not expected, and this bug is reproducible every time.

ghostplant · 2019-02-01T09:31:01Z

If using len = 8 instead of len = 9, the result would be correct.

ghostplant · 2019-02-01T13:18:03Z

The root cause seems to be a wrong execute of float3 data copy, the following patch fixes this bug:

NOTE: A typical *((float3*)y) = *((float3*)x) just copy the first float element and skip the second and third elements.

@@ -909,15 +910,24 @@ __kernel void Op4dTensorLite(const global MIOPEN_TYPE* a,
                              const long Coffset)
 {
     int gid0 = get_global_id(0);
-
     int index = gid0 * RD_BLCK;

     MIOPEN_TYPE a_dat[RD_BLCK];
     MIOPEN_TYPE b_dat[RD_BLCK];
     MIOPEN_TYPE c_dat[RD_BLCK];

+#if RD_BLCK == 3
+    a_dat[0] = (a + index + Aoffset)[0];
+    a_dat[1] = (a + index + Aoffset)[1];
+    a_dat[2] = (a + index + Aoffset)[2];
+    b_dat[0] = (b + index + Boffset)[0];
+    b_dat[1] = (b + index + Boffset)[1];
+    b_dat[2] = (b + index + Boffset)[2];
+#else
     *((READ_TYPE*)a_dat) = *((const global READ_TYPE*)(a + index + Aoffset));
     *((READ_TYPE*)b_dat) = *((const global READ_TYPE*)(b + index + Boffset));
+#endif
+
 #ifdef BETA
+ // Also need correct copy if RD_BLCK == 3
     *((READ_TYPE*)c_dat) = *((const global READ_TYPE*)(c + index + Coffset));
 #endif

mythreyi22 · 2019-03-28T06:17:15Z

The root cause seems to be a wrong execute of float3 data copy, the following patch fixes this bug:

NOTE: A typical *((float3*)y) = *((float3*)x) just copy the first float element and skip the second and third elements.

@@ -909,15 +910,24 @@ __kernel void Op4dTensorLite(const global MIOPEN_TYPE* a,
                              const long Coffset)
 {
     int gid0 = get_global_id(0);
-
     int index = gid0 * RD_BLCK;

     MIOPEN_TYPE a_dat[RD_BLCK];
     MIOPEN_TYPE b_dat[RD_BLCK];
     MIOPEN_TYPE c_dat[RD_BLCK];

+#if RD_BLCK == 3
+    a_dat[0] = (a + index + Aoffset)[0];
+    a_dat[1] = (a + index + Aoffset)[1];
+    a_dat[2] = (a + index + Aoffset)[2];
+    b_dat[0] = (b + index + Boffset)[0];
+    b_dat[1] = (b + index + Boffset)[1];
+    b_dat[2] = (b + index + Boffset)[2];
+#else
     *((READ_TYPE*)a_dat) = *((const global READ_TYPE*)(a + index + Aoffset));
     *((READ_TYPE*)b_dat) = *((const global READ_TYPE*)(b + index + Boffset));
+#endif
+
 #ifdef BETA
+ // Also need correct copy if RD_BLCK == 3
     *((READ_TYPE*)c_dat) = *((const global READ_TYPE*)(c + index + Coffset));
 #endif

This does not solve the issue. Facing the same issue after using this patch. Is it complete?

-Mythreyi

ghostplant · 2019-03-28T06:56:17Z

@mythreyi22 No, it is not complete cause every data cast to float3 (defined by READ_TYPE) should be expanded into 3 lines of float1 data copy.

atamazov · 2019-03-28T08:04:17Z

float3 has stricter alignment than array of floats. That is why casting an address of array element to float3 ptr may yield invalid ptr.

ghostplant · 2019-03-28T08:06:07Z

@atamazov So any better solution other than expanding these codes?

atamazov · 2019-03-28T08:06:25Z

The root cause seems to be a wrong execute of float3 data copy

This is wrong data copy.

The fix might be performance waste.

Sorry.

atamazov · 2019-03-28T08:08:26Z

Addresses of source & destination objects shall comply with float3/float4 alignment requirements if float3 ptrs are used.

We'll face the Undefined Behavior nightmare otherwise.

atamazov · 2019-03-28T08:14:11Z

Unfortunately I can't see the whole context from mobile phone, and thus unable to give more specific/accurate recommendation.

ghostplant · 2019-03-28T08:18:12Z

@atamazov The problem is batch of input data is supposed to be successively given and expected to be successively stored as well. If we follow the OpenCL float3 alignment, OpTensorAdd cannot fully handle adjacent data since 2 adjacent data of type-float3 would never meet the alignment requirements together.

atamazov · 2019-03-28T08:23:46Z

2 adjacent data of type-float3 would never meet the alignment requirements together.

They always meet. Any float3 object has unused (hidden) 4th element at the end (gap). Alignments of float3 and float4 are the same.

If we follow the OpenCL float3 alignment...

We have UB if we don't))

ghostplant · 2019-03-28T08:27:49Z

@atamazov So what is the specific float4 alignment requirements? e.g. addr % sizeof(float4) == 0?

ghostplant · 2019-03-28T09:24:03Z

@atamazov Can you explain why a 9-successive-float array can satisfy the alignment together? If the starting address of the array satisfy the alignment, who will manage the calculation of the 4th element in this array?

ce1adon · 2019-04-01T18:12:00Z

@ghostplant @mythreyi22
Sorry for the inconvenience!
patch.zip

Please replace your MIOpen/src/ocl/tensorocl.cpp with attached file.
Also please update when next version MIOpen comes out.
Thank you!

atamazov · 2019-04-01T18:40:48Z

@ce1adon This is diff between latest 1.7 release and contents of patch.zip you've uploaded. Is it correct?

issue-79-patch-001.diff.txt

ce1adon · 2019-04-01T18:45:59Z

@atamazov This is based on master branch of this repo.

atamazov · 2019-04-01T18:51:57Z

@ce1adon Most likely it is ~= 1.7.x. If the diff is correct (I think it is), then it could be applied (which seems more common than overwriting the whole file).

mythreyi22 · 2019-04-02T08:57:07Z

@ce1adon, Thanks!

FelixSchwarz · 2019-04-12T07:42:29Z

MIOpen 1.8 was released and lists this issue as fixed. Based on the patch from @ce1adon I think the exact commit was 903b1f1 .

ghostplant · 2019-04-13T00:55:34Z

Confirmed fixed. But found another bug from miopenTransformTensor to calculate y = 0.7 * y + 0.4 * x whose answer is also wrong.

daniellowell assigned asroy Feb 5, 2019

daniellowell added the bug label Feb 5, 2019

daniellowell assigned ce1adon and unassigned asroy Feb 5, 2019

ghostplant changed the title ~~[BUG] Wrong answer for miopenOpTensor()~~ [BUG] Wrong result for miopenOpTensor() Feb 13, 2019

ghostplant closed this as completed Apr 13, 2019

alexandraBara mentioned this issue Sep 11, 2020

Solver generic_search fail: ConvHipImplicitGemmBwdDataV1R1Xdlops and ConvHipImplicitGemmForwardV4R4Xdlops #427

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Wrong result for `miopenOpTensor()` #79

[BUG] Wrong result for `miopenOpTensor()` #79

ghostplant commented Feb 1, 2019 •

edited

Loading

ghostplant commented Feb 1, 2019

ghostplant commented Feb 1, 2019 •

edited

Loading

mythreyi22 commented Mar 28, 2019

ghostplant commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019

ghostplant commented Mar 28, 2019

atamazov commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019

ghostplant commented Mar 28, 2019

atamazov commented Mar 28, 2019 •

edited

Loading

ghostplant commented Mar 28, 2019

ghostplant commented Mar 28, 2019

ce1adon commented Apr 1, 2019

atamazov commented Apr 1, 2019 •

edited

Loading

ce1adon commented Apr 1, 2019

atamazov commented Apr 1, 2019

mythreyi22 commented Apr 2, 2019

FelixSchwarz commented Apr 12, 2019

ghostplant commented Apr 13, 2019 •

edited

Loading

[BUG] Wrong result for miopenOpTensor() #79

[BUG] Wrong result for miopenOpTensor() #79

Comments

ghostplant commented Feb 1, 2019 • edited Loading

ghostplant commented Feb 1, 2019

ghostplant commented Feb 1, 2019 • edited Loading

mythreyi22 commented Mar 28, 2019

ghostplant commented Mar 28, 2019 • edited Loading

atamazov commented Mar 28, 2019

ghostplant commented Mar 28, 2019

atamazov commented Mar 28, 2019 • edited Loading

atamazov commented Mar 28, 2019 • edited Loading

atamazov commented Mar 28, 2019

ghostplant commented Mar 28, 2019

atamazov commented Mar 28, 2019 • edited Loading

ghostplant commented Mar 28, 2019

ghostplant commented Mar 28, 2019

ce1adon commented Apr 1, 2019

atamazov commented Apr 1, 2019 • edited Loading

ce1adon commented Apr 1, 2019

atamazov commented Apr 1, 2019

mythreyi22 commented Apr 2, 2019

FelixSchwarz commented Apr 12, 2019

ghostplant commented Apr 13, 2019 • edited Loading

[BUG] Wrong result for `miopenOpTensor()` #79

[BUG] Wrong result for `miopenOpTensor()` #79

ghostplant commented Feb 1, 2019 •

edited

Loading

ghostplant commented Feb 1, 2019 •

edited

Loading

ghostplant commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019 •

edited

Loading

atamazov commented Mar 28, 2019 •

edited

Loading

atamazov commented Apr 1, 2019 •

edited

Loading

ghostplant commented Apr 13, 2019 •

edited

Loading