double_buffer schedule conflict with tensorize #2581

Closed
souptc opened this Issue Feb 10, 2019 · 1 comment

souptc commented Feb 10, 2019

A simple example:

import tvm
import topi

A = tvm.placeholder((25, 100, 4), name='A')
B = topi.nn.relu(A)

s = tvm.create_schedule(B.op)
AA = s.cache_read(A, 'global', [B])
s[AA].compute_at(s[B], B.op.axis[0])
s[AA].double_buffer()
stmt = (tvm.lower(s, [A, B], simple_mode=True))
print(stmt)

The double buffer schedule looks correct.

// attr [A.global] storage_scope = "global"
allocate A.global[float32 * 2 * 1 * 128]
produce compute {
  produce A.global {
    for (ax1, 0, 128) {
      A.global[ax1] = A[ax1]
    }
  }
  for (i0.outer, 0, 24) {
    // attr [A.global] double_buffer_write = 1
    produce A.global {
      for (ax1, 0, 128) {
        A.global[((((i0.outer + 1) % 2)*128) + ax1)] = A[(((i0.outer*128) + ax1) + 128)]
      }
    }
    for (i1, 0, 128) {
      compute[((i0.outer*128) + i1)] = max(A.global[(((i0.outer % 2)*128) + i1)], 0.000000f)
    }
  }
  for (i1, 0, 128) {
    compute[(i1 + 3072)] = max(A.global[i1], 0.000000f)
  }
}
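To make the indexing in the dump above easier to follow, here is a minimal pure-Python sketch of the same loop nest. It does not use TVM at all; the function name `relu_double_buffered` and the flat-list layout are assumptions for illustration only, with 25 rows of 128 elements taken from the printed loop bounds.

```python
def relu_double_buffered(a, rows=25, cols=128):
    """Mimic the lowered loop nest: prefetch row r+1 while computing row r.

    `a` is a flat list of rows * cols floats (hypothetical layout chosen
    to match the flat indices in the printed statement).
    """
    buf = [0.0] * (2 * cols)     # plays the role of A.global: two slots
    out = [0.0] * (rows * cols)  # plays the role of compute

    # Prologue: load row 0 into slot 0.
    for ax1 in range(cols):
        buf[ax1] = a[ax1]

    # Steady state: write slot (r + 1) % 2, then read slot r % 2.
    for r in range(rows - 1):
        write_base = ((r + 1) % 2) * cols
        for ax1 in range(cols):
            buf[write_base + ax1] = a[r * cols + ax1 + cols]
        read_base = (r % 2) * cols
        for i1 in range(cols):
            out[r * cols + i1] = max(buf[read_base + i1], 0.0)

    # Epilogue: the last row was prefetched in the final iteration.
    last = rows - 1
    read_base = (last % 2) * cols  # slot 0 when rows == 25
    for i1 in range(cols):
        out[last * cols + i1] = max(buf[read_base + i1], 0.0)
    return out


# Quick check against a plain elementwise ReLU.
data = [float((i % 7) - 3) for i in range(25 * 128)]
assert relu_double_buffered(data) == [max(x, 0.0) for x in data]
```

The two slots alternate on `% 2`, which is why the allocation in the dump is twice the size of one row and why the final row is read from offset 0.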

But if I tile the data and tensorize it with my custom copy intrinsic:

AA = s.cache_read(A, 'global', [B])
o, i = s[AA].split(AA.op.axis[1], 16)
copy_intrinsic = intrin_copy(16)
s[AA].compute_at(s[B], B.op.axis[0])
s[AA].tensorize(i, copy_intrinsic)
s[AA].double_buffer()
stmt = (tvm.lower(s, [A, B], simple_mode=True))
print(stmt)

Then the double buffering seems to be gone; only an AttrStmt is left, and the allocation is no longer doubled:

// attr [A.global] storage_scope = "global"
allocate A.global[float32 * 1 * 128]
produce compute {
  for (i0, 0, 25) {
    // attr [A.global] double_buffer_scope = 1
    produce A.global {
      for (ax1.outer, 0, 8) {
        vv_relu(tvm_address_of(A.global[(ax1.outer*16)]), tvm_address_of(A[(((i0*8) + ax1.outer)*16)]))
      }
    }
    for (i1, 0, 128) {
      compute[((i0*128) + i1)] = max(A.global[i1], 0.000000f)
    }
  }
}

Is it a bug?

tqchen commented Feb 10, 2019

Thanks for bringing this up. Let us move the topic to https://discuss.tvm.ai/

tqchen closed this Feb 10, 2019
