-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: clarifying buffer declaration and access #63
Changes from 5 commits
116cdc4
8c4ebb2
02d6c79
3d7c0fb
ad42e71
e8c278b
12e793e
c108f91
5dbe413
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,15 +25,18 @@ proposed in this RFC. | |
|
||
**What’s a buffer?** | ||
|
||
Buffer is a compile-time object that groups runtime objects(data pointer and shape variables) | ||
structurally. A Buffer needs to be declared and allocated before it can be used. | ||
Buffer is a compile-time representation of contiguous block of memory. Since a Buffer is typically | ||
used as backing storage for a `TensorType`, it includes relevant information from that `TensorType` | ||
which can be sufficiently generalized to an array, such as data type and shape information. | ||
A Buffer needs to be declared and allocated before it can be used. | ||
|
||
**Declaration of buffer** | ||
|
||
Buffer can be declared in the following ways: | ||
|
||
- Buffer map of `PrimFunc`. This specifies how opaque parameters of type `T.handle` should be | ||
interpreted as a buffer inside the `PrimFunc`. `T.handle` represents an opaque pointer (`void *`). | ||
- Inside the `buffer_map` of `PrimFunc`. TIR's type system does not accommodate rich array types, | ||
instead representing them as `T.handle` (typically emitted as `void*`). The `buffer_map` specifies | ||
how to interpret such `T.handle` when using it as a basis for array accesses. | ||
- `T.alloc_buffer` is used `S-TIR` to create and allocate a buffer. | ||
- `T.buffer_decl` can be used to create a buffer alias by specifying the underlying data variable to | ||
reuse the data from another buffer. It can also be used to reinterpret the data type of the buffer. | ||
|
@@ -54,22 +57,27 @@ def buffer_alloc(): | |
|
||
**Allocation of buffer** | ||
|
||
In low-level TIR, `Allocate` is used to allocate a data variable with given shapes. `Allocate` | ||
doesn’t operate on the buffer-level. The result of `Allocate` is a data variable, which may be | ||
reinterpreted with a different shape or data type in buffer declaration. | ||
In low-level TIR, `tir::Allocate` is used to allocate a data variable with given shapes. `tir::Allocate` | ||
returns a data variable of type `T.handle` (since TIR's type system does not accommodate rich arrays), which may be | ||
reinterpreted with a different shape or data type using `T.decl_buffer`. | ||
|
||
**Explicit `DeclBuffer` IR construct** | ||
|
||
`T.buffer_decl` is not an explicit statement in TIR - There is no such node in TIR. | ||
`T.decl_buffer` is not an explicit statement in TIR - There is no such node in TIR. | ||
`T.decl_buffer` returns either: | ||
- A Buffer node whose data member points to the aliased Buffer. | ||
- A Buffer node whose data member is a new pointer-type Var (the var is expected to be initialized | ||
via tir::Allocate elsewhere)" | ||
|
||
The current behavior of `TVMScriptPrinter` is to implicitly print a `T.buffer_decl` at the beginning | ||
of `PrimFunc` for any undefined buffers. The implicit behavior can be error-prone. In light of the | ||
migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be furthur | ||
migration, we should consider an explicit `DeclBuffer` as part of the IR. This will be further | ||
discussed in a separate RFC. | ||
|
||
**Buffer Aliasing** | ||
|
||
`T.buffer_decl` creates a buffer alias if the underlying data variable (`.data` field) overlaps with | ||
another buffer. Buffer created via `alloc_buffer` always do not alias. Buffer aliases do not need | ||
another buffer. Buffer created via `T.alloc_buffer` always do not alias. Buffer aliases do not need | ||
`Allocate` to create the data variable. Each data variable can be allocated via `Allocate` once. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could you state the implication of "Buffer aliases do not need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the data variable could point to any aliased or unaliased buffer, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, can you update this in the doc? you could simply state "Buffer aliases do not need |
||
If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop | ||
that contains an allocation), the transform should update the allocations to be unique using | ||
|
@@ -84,8 +92,9 @@ imposes a cost for buffer aliasing both to compile times and development complex | |
**Discussion: When it is safe to transform a buffer** | ||
|
||
We would like to discuss some examples of when it is safe to transform a buffer w.r.t. aliasing rules: | ||
|
||
(1) reshape, (2) layout transform (e.g. swap indices), (3) compact. | ||
- (1) reshape | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think proper numbered list syntax is like
|
||
- (2) layout transform (e.g. swap indices) | ||
- (3) compact. | ||
|
||
(1) is fine under aliasing as long as the low level memory is shared. (2) and (3) would need more | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Using buffer alias to reinterpret a buffer only changes the way we access the buffer. As long as there are no other buffer transformations or analysis applied to this buffer, it is safe to use the alias. On the other hand, any transformations or analysis applied on a buffer should be clear how to handle buffer aliases correctly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah that makes sense, can you update the RFC? |
||
cares. (2) requires all the aliases be changed together. (3) requires to compute the compact buffer | ||
|
@@ -96,8 +105,8 @@ rewrite their shapes together. | |
|
||
Previously we used `Load` and `Store` to represent low-level buffer accesses. `Load` and `Store` | ||
consist of data variable, data type and index, which can be directly translated to pointer cast and | ||
accesses in runtime. Note that data type in `Load` / `Store` can be different from the actual data | ||
variable type. For example, | ||
accesses in runtime. Note that data type given to `Load` / `Store` can be different from the | ||
Buffer's data variable type. For example, | ||
|
||
```python | ||
A = T.buffer_decl(shape=(16,), dtype='float') | ||
|
@@ -112,17 +121,18 @@ can be translated to | |
|
||
in C codegen. | ||
|
||
`BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape or | ||
data type. They rely on the underlying buffer object. This is the fundamental difference between | ||
`Load/Store` and `BufferLoad/BufferStore` that we need to deal with carefully. | ||
However, `BufferLoad` and `BufferStore` themselves can not reinterpret a buffer to a different shape | ||
or data type. They always return the data type specified on underlying buffer object. This is the | ||
fundamental difference between `Load/Store` and `BufferLoad/BufferStore` that we need to deal with | ||
carefully. | ||
|
||
Vectorized access is achieved by using `Ramp` as index in `Load/Store`. Vectorized buffer access | ||
via `BufferLoad`/`BufferStore` can be achieved either by using a scalar index to access a buffer | ||
that has a vectorized type, or by using `Ramp` as an index into a buffer that has a scalar type. | ||
For N-D buffer indices, it is possible that `Ramp` being used in multiple dimensions | ||
(e.g. `A[Ramp(...), ..., Ramp(...)]` ). In this case the number of lanes of the data type of such | ||
value is the product of each `Ramp`. We may want to limit `Ramp` to only the last dimension as | ||
multiple `Ramp` creates additional complexity. | ||
value is the product of each `Ramp`. We limit `Ramp` to only the last dimension as multiple `Ramp` | ||
creates additional complexity. | ||
|
||
Different combinations of buffer type and index type (scalar vs. vector) are clarified in | ||
|
||
|
@@ -168,21 +178,22 @@ def nd_vector_load_from_scalar_buffer(A: T.Buffer[(64,64), "float32"]): | |
assert A[0, T.ramp(0, 1, 4)].dtype == "float32x4" | ||
``` | ||
|
||
In rare cases, vector index can be used to access a vector buffer. We may leave this usage as | ||
undefinedUnless we have a clear use case. | ||
In rare cases, vector index can be used to access a vector buffer. We leave this usage as | ||
undefined until we have a clear use case. | ||
|
||
**VectorBufferRewrite** | ||
|
||
In some backend like SPIR-V where runtime pointer casts are not available, even between types that | ||
differ only in the number of lanes (e.g. `float16` and `float16x4.`) `VectorTypeRewriter` will be | ||
differ only in the number of lanes (e.g. `float16` and `float16x4.`), `VectorTypeRewriter` will be | ||
used to rewrite the buffer to a vector type. (VectorBufferRewrite rewrites the buffer from | ||
`vector_load_from_scalar_buffer` into `scalar_load_from_vector_buffer` in the above example). | ||
|
||
**Removing pre-flattened buffer** | ||
vinx13 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Buffer information before flattening are necessary during compilation. They specify the calling | ||
convention of `PrimFunc` and are translated to assertions of buffer shapes, strides, etc. in | ||
runtime. | ||
runtime. `preflattened_buffer_map` was introduced in https://github.com/apache/tvm/pull/9727 to | ||
save these information after buffer flattening. | ||
|
||
During the lowering process, although buffer accesses inside `PrimFunc` are flattened to match | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have a proof-of-concept implementation of this removal at apache/tvm#10940. It is able to pass all tests in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice. We can then consider moving to explicit |
||
physical buffer dimensions, the calling convention of the `PrimFunc` are kept unchanged - It still | ||
|
@@ -198,7 +209,7 @@ def fn(X: T.Buffer([2, 3], "float32"): | |
X_flattened[i] = .... | ||
``` | ||
|
||
#Pass Checklist | ||
# Pass Checklist | ||
|
||
Here are a list of TIR passes that can be impacted significantly when migrating from `Load/Store` to | ||
`BufferLoad/BufferStore`. | ||
|
@@ -229,4 +240,5 @@ Therefore we should be able to have a unified method called `T.buffer_decl` in b | |
TVMScript. | ||
- There are several way for buffer definition, `T.buffer_decl, T.match_buffer, T.alloc_buffer`. | ||
- `BufferLoad/BufferStore` can be generalized to allow `Ramp` as part of the index. | ||
- `buffer_decl` can be an alternative of `preflattened_buffer_map`. | ||
- `buffer_decl` is going to be used to declare flattened Buffer aliases, and | ||
`preflattened_buffer_map` will be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean
T.decl_buffer
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In TVM script, it is
T.buffer_decl
and we have plan to rename it toT.decl_buffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, if you're using
T.
notation here, you mean TVMScript, right? in which case, can you update all occurrences to match what you would grep for if using TVMScript? right now it's a dead-end.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a note so someone knows what to grep for post-future rename?