[RFC] Introducing DeclBuffer #70

Merged: 6 commits merged into apache:main on Jun 10, 2022

Conversation

@vinx13 (Member) commented May 10, 2022

This is a follow-up to apache/tvm#9727 and RFC#63. Currently, a buffer can be implicitly declared and then used. This implicit behavior can be error-prone and makes analysis more difficult. This RFC introduces DeclBuffer, a new IR construct that serves as an explicit statement for buffer declaration.

Rendered version: https://github.com/vinx13/tvm-rfcs/blob/decl-buffer/rfcs/0070-introducing-decl-buffer.md

cc @tqchen @Lunderberg @junrushao1994 @csullivan @mbs-octoml @jroesch @areusch @wrongtest @Hzfengsy

Co-authored-by: Eric Lunderberg <Lunderberg@users.noreply.github.com>
@wrongtest-intellif commented:
Thanks a lot! I think we can then handle buffer-related issues in customized passes in a more explicit and robust way.

I have one question on TIR script: for certain algorithms in DL workloads, users may want to write scripts that are not in S-TIR form, like

x = T.allocate((), "int32", "")
x[()] = 0
while x[()] < 128:
    x[()] = x[()] + 1
    # ...

Could the parser still support writing things like that (even though the underlying IR structure has changed), instead of

x_data = T.allocate((), "int32", "")
x = T.decl_buffer(data=x_data,)
x[()] = 0
# ...

@vinx13 (Member, Author) commented May 11, 2022

@wrongtest Thanks for bringing this up. There are a few options for the behavior in TVM script; I'm open to discussion.

  • A1: The original behavior before [TE][TIR] Implement layout transformations, non-flat memory buffers tvm#9727:
    T.allocate returns a Var, which can be later used in T.load / T.store.

  • A2: Current behavior:
    T.allocate returns a buffer (Var is implicitly created inside). The buffer can be accessed via subscripts, which will be translated to BufferLoad / BufferStore.

  • A3: Potential behavior for this RFC:
    T.allocate returns a Var; T.decl_buffer is then used to create the buffer and access it (because T.load / T.store are deprecated). This follows the translation to TIR nodes closely.

  • A4: Potential behavior for this RFC (the one @wrongtest is proposing):
    If there are no buffer aliases (which covers most cases), it is tempting to avoid the boilerplate of T.decl_buffer.
    T.allocate returns a buffer, whose buffer->data is implicitly created and allocated inside. The buffer can be accessed via subscripts. Subsequent buffer aliases can be created by referring to buffer->data. This requires DeclBuffer to be created implicitly; it is acceptable to have some discrepancy between the parser side and the TIR nodes (see the sketch below).
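
To make the difference concrete, here is a rough sketch of how A3 and A4 could look in TVMScript, reusing the example above (hypothetical syntax; the exact T.allocate / T.decl_buffer signatures are assumptions, not a settled API):

```python
# A3: T.allocate returns a Var; the buffer must then be declared explicitly.
x_data = T.allocate((), "int32", "")
x = T.decl_buffer((), "int32", data=x_data)
x[()] = 0

# A4: T.allocate returns a Buffer; its data Var and the DeclBuffer are created implicitly.
x = T.allocate((), "int32", "")
x[()] = 0
# a later alias (if needed) would refer to the underlying data pointer
x_view = T.decl_buffer((), "int32", data=x.data)
```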

@areusch (Contributor) left a comment:

thanks @vinx13, a couple of questions on this one as well

# Unresolved questions
[unresolved-questions]: #unresolved-questions

Should low-level code generators handle buffer aliases? One option would be to remove them in a

Contributor:

is this basically asking whether a codegen should have to track the data variables, or whether we should introduce an explicit TIR node that more readily translates to an index-into-opaque-pointer with type info?

Member Author:

It is asking whether buffer aliasing should be left to codegen (with codegen tracking the data variable), or whether buffer aliasing should be unified in an earlier pass.

However, `T.buffer_decl` doesn’t translate to a node in AST. The AST will be
```
PrimFunc {
buffer_map: {A: Buffer[(16,), "float"},

Contributor:

the A.data member should theoretically appear here, whether or not in repr, right? i think the central challenge here is determining that the data for a buffer is A's data member. is that right?

Member Author:

Yes. It should be `A_data: Buffer(data=A_data, ...)`.

data: A_data(Var(name=...))
extent: ...
body: DeclBuffer {
buffer: A(data=A_data, dtype=..., shape=...),

Contributor:

it's always required to supply data in decl_buffer world, right? so this changes the process of identifying aliases by making it explicit that all backing buffers would come from tir.allocate nodes or from function arguments, right?

Member Author:

Supplying data in decl_buffer is preferred. I mentioned an alternative in line 88, e.g.

DeclBuffer {
  buffer: A(data=A_data(Var(...)), dtype=..., shape=...),
  body: Allocate{
    data=A_data,
    body: ...
  }
}

The question here is whether this should be allowed.

Contributor:

i think it would be great to just be prescriptive here so that people don't try to write 10 different forms of buffer declaration. we can always expand the set of supported use cases if people can supply rationale, but reducing them after people check in a bunch of code without design docs is pretty tricky. to that end, could we clarify where we expect to get a data value from?

I also am not quite following this notation: A_data(Var(...)). Is A_data somehow a function here?

Contributor:

> The question here is whether this should be allowed

I think I'd lean toward removing the alternative form and requiring the definition of the data variable to be defined prior to use in a DeclBuffer node in the PrimFunc's TIR. If a user gets an error message that states A is undefined, they may point at the DeclBuffer node and say that of course it is defined, conflating the definition of the buffer and the definition of the buffer's backing allocation. The confusion is especially likely, because the default behavior of tvm.tir.decl_buffer is to name the BufferNode and the VarNode with the same name. By making it impossible to have a buffer without a defined data variable, we avoid potential confusion about why a buffer is partially undefined.

> that all backing buffers would come from tir.allocate nodes or from function arguments

They could also come from the return values of functions. After MakePackedAPI, the backing buffers are the return value of `@tir.tvm_struct_get`. They could also come from an entirely separate function call, such as `data: T.Ptr[T.int32] = T.call_extern("device_specific_malloc", 1024, dtype="handle")`.
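
For instance, a rough TVMScript-style sketch of that last case (hypothetical; `device_specific_malloc` is only a placeholder extern and the exact `T.decl_buffer` call shown is an assumption):

```python
# backing pointer obtained from an extern call rather than from T.allocate
data: T.Ptr[T.int32] = T.call_extern("device_specific_malloc", 1024, dtype="handle")
# the buffer is then declared explicitly over that pointer before any access
buf = T.decl_buffer((256,), "int32", data=data)
buf[0] = 0
```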

Member Author:

> Is A_data somehow a function here?

@areusch It should be A_data{Var(data = ..., )}. A_data is a variable.

> I think I'd lean toward removing the alternative form and requiring the definition of the data variable to be defined prior to use in a DeclBuffer node in the PrimFunc's TIR.

@Lunderberg That's also what I thought. Since we all agree on this, I'll update the RFC to be prescriptive here.

> They could also come from return values of functions.

Thanks for pointing that out. I'll update it accordingly.

store the physical shapes of the buffers. The change of the information stored in `buffer_map` can
be confusing. These two maps can be unified into a single `buffer_map` that defines the logical
shapes of the input buffers. The buffer access in physical shape, which is an internal behavior of
`PrimFunc` after flattening, can be achieved by using `DeclBuffer` to create buffer aliases in

Contributor:

will there be an easy way to do this implemented as part of this RFC? @Lunderberg mentioned something about a utility function to at least compute the flattened buffer_map.

Member Author:

@Lunderberg has a prototype for this in apache/tvm#10940; it is based on implicit buffer aliasing. This paragraph suggests moving to explicit buffer aliasing using DeclBuffer.

Contributor:

That's correct, the buffer flattening would be done through buffer aliasing. After apache/tvm#10940, the use of N-d/flattened buffers would be as below, depending on where you are in the lowering flow.

  1. Before FlattenBuffer/StorageFlatten: Buffers are declared in the buffer_map, and are not flattened. Buffer access is done using N-d unflattened indices.
  2. After FlattenBuffer/StorageFlatten, but before MakePackedAPI: Buffers are declared in the buffer_map, and are not flattened. Buffer access is done through a buffer alias, where the alias shares the same data pointer, but has a flattened shape and is accessed with flattened indices.
  3. After MakePackedAPI: The buffer_map is empty. Declarations of flattened buffers are done using the handles extracted using tvm_struct_get. These flattened buffers are accessed with flattened indices.

As I understand it, this RFC would impact the TIR during parts 2 and 3. Step 2 would have an explicit DeclBuffer to mark the aliasing, rather than an implicit aliasing based on re-used Var Buffer::data. Step 3 would have an explicit DeclBuffer to mark the use of the T.handle in a buffer, rather than being implicit in the first use.
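
To illustrate step 2, a rough sketch in TVMScript form, loosely based on the elemwise example in the rendered RFC (hypothetical; the alias names and the exact T.decl_buffer signature are assumptions):

```python
from tvm.script import tir as T

@T.prim_func
def elemwise(A: T.Buffer[(16, 16), "float32"], C: T.Buffer[(16, 16), "float32"]):
    # buffer_map still holds the unflattened (16, 16) buffers; the body goes
    # through explicitly declared flattened aliases that share the same data pointers
    A_flattened = T.decl_buffer((256,), "float32", data=A.data)
    C_flattened = T.decl_buffer((256,), "float32", data=C.data)
    for i, j in T.grid(16, 16):
        C_flattened[i * 16 + j] = A_flattened[i * 16 + j]
```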

Contributor:

thanks for elaborating that! could we add these to the RFC?

@Hzfengsy (Member) left a comment:

Thanks, @vinx13. That's a super cool RFC, enabling all buffers to be defined before use. It looks good to me. Looking forward to the follow-up PRs.

@vinx13 (Member, Author) commented May 20, 2022

@wrongtest I've thought about option A3 vs A4. From the perspective of parsing/translating TVM script to TIR, it is acceptable to have T.allocate translated to two nodes, Allocate + DeclBuffer. But it will be tricky for TVMScriptPrinter: we would need to find both the Allocate and DeclBuffer nodes and then print a single T.allocate, and these two nodes do not have to be parent/child of each other. I'm not sure whether this behavior, which breaks the 1-to-1 mapping between TVM script and TIR, is desirable. Alternatively, we could add an option to T.allocate, such as `def allocate(..., return_buffer: bool)`. What do you think?

@tqchen (Member) commented Jun 7, 2022

A gentle ping. It would be great to have a status check-in so we can move forward and make this happen.

@vinx13 (Member, Author) commented Jun 8, 2022

It seems we all agree that introducing DeclBuffer is helpful. The only unresolved question is how TVMScript should be updated, as @wrongtest mentioned. As discussed above, we have these options:

  • B1: In TVMScript, T.allocate and T.decl_buffer strictly map to the corresponding TIR nodes. To allocate and declare a buffer (in lowered TIR), there are two separate steps:
      data = T.allocate(physical_shape)
      buffer = T.decl_buffer(data=data, shape=physical_shape)
  • B2: Provide some syntactic sugar for T.allocate (adding a returns_buffer option), which is translated to AllocateNode + DeclBufferNode in TIR:
      buffer = T.allocate(physical_shape, returns_buffer=True)

When printing from TIR back to TVMScript, we would still print the unsugared B1 form.
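
As a rough sketch of that round trip (hypothetical syntax, reusing the names from B1/B2 above):

```python
# what the user writes with the B2 sugar
buffer = T.allocate(physical_shape, returns_buffer=True)

# what TVMScriptPrinter would emit for the resulting AllocateNode + DeclBufferNode
data = T.allocate(physical_shape)
buffer = T.decl_buffer(data=data, shape=physical_shape)
```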

Note that in B2, it is not feasible to make T.allocate always return the created buffer, because

  1. there might still be a need to directly use the buffer var in lower-level TIR;
  2. when printing TIR to TVMScript, AllocateNode and DeclBufferNode do not always appear in the same place, so it is difficult (and not preferred) to map two TIR nodes into one TVMScript statement.

would love to know what you think @wrongtest @Hzfengsy

@tqchen (Member) commented Jun 8, 2022

We always need the original syntax to preserve the bidirectional property. It does not hurt, though, to enable sugar that gives us a combined form on top. How about we start with the original syntax and then discuss sugar as a separate topic?

On the sugar part, perhaps we can reuse the alloc_buffer keyword.

@wrongtest-intellif commented:
Reusing T.alloc_buffer seems good, as long as there is no ambiguity for the parser implementation :)

@vinx13 (Member, Author) commented Jun 8, 2022

@areusch @Hzfengsy I've updated the RFC. It is ready for another look.

rfcs/0070-introducing-decl-buffer.md (review on an outdated diff, resolved):
@@ -136,11 +138,26 @@ def elemwise(A: T.Buffer[(16, 16), "float32"], C: T.Buffer[(16, 16), "float32"])
C_flattened[i * 16 + j] = A[i * 16 + j]
```

Specifically, the updated flow of buffer flattening using `DeclBuffer` will be:
1. Before `FlattenBuffer/StorageFlatten`: Buffers are declared in the `buffer_map`, and are not flattened. Buffer access is done using N-d unflattened indices.
2. After `FlattenBuffer/StorageFlatten`, but before `MakePackedAPI`: Buffers are declared in the `buffer_map`, and are not flattened. Buffer access is done through a buffer alias explicitly created via `DeclBuffer`, where the alias shares the same data pointer, but has a flattened shape and is accessed with flattened indices.

Contributor:

for some reason i thought our previous discussion was that Buffer would always contain unflattened shape and there would be a separate function that computes the flattened shape on the fly. did that change? if so why?

@vinx13 (Member, Author) commented Jun 9, 2022:

It has not changed. The buffer in buffer_map is always unflattened. After FlattenBuffer/StorageFlatten, we create a flattened view of the buffer using a buffer alias, and in the IR all accesses to the buffer are done via that flattened view. This is because, from the perspective of the PrimFunc calling convention, we always expect the input to have an unflattened shape (e.g. the shape of the DLTensor); but internally in the IR we need to flatten to the physical shape and indices, which is how it is done in codegen and the runtime.

Contributor:

oh sorry, i see now. the text is very clear i just misread it :)

@areusch (Contributor) left a comment:

thanks for bearing with me @vinx13 !

@areusch merged commit 87ff1fa into apache:main on Jun 10, 2022
Lunderberg added a commit to apache/tvm that referenced this pull request Nov 16, 2022
`PrimFuncNode::preflattened_buffer_map` was introduced in
#9727, in order to maintain a record
of the pre-flattened buffer shape until it can be used in
`MakePackedAPI`.  This commit instead maintains the pre-flattened
shapes in `PrimFuncNode::buffer_map`, while the body of the function
uses a flattened buffer alias, as described in
[RFC#70](apache/tvm-rfcs#70)