Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tracking Issue] BufferAccess Migration Followup Items #10505

Closed
2 of 19 tasks
tqchen opened this issue Mar 7, 2022 · 16 comments
Closed
2 of 19 tasks

[Tracking Issue] BufferAccess Migration Followup Items #10505

tqchen opened this issue Mar 7, 2022 · 16 comments
Assignees
Labels
te tir any TIR core issues which don’t fit into other tir: labels (include/tvm/tir, src/tir)

Comments

@tqchen
Copy link
Member

tqchen commented Mar 7, 2022

This is a tracking issue created for followup tracking items in #9727.

The particular PR contains an important new feature(transform_layout). The main goal of this issue is to do followup shepherding around the IR change part(buffer access and related allocations), which have a broader impact. The particular PR takes many rounds of reviews and but still have quite a few points that would benefit from systematic clarification, cleanup and further polish.

Given that there is an important followup(of layout transform) that depend on the change, and the PR is already at a state where incremental changes can further block the progress as other concurrent changes are made. We took the deliberation to merge PR, then continue to do post-hoc shepherding. This decision is supposed to happen in rare circumstance with clear followup actions to backup the decisions.

This is the tracking issue for followup shepherd actions

High-level gist of change

Load/Store are deprecated and changes to BufferLoad/Store of 1D array. BufferLoad/Store of 1D elements are being used in the place of Load/Store. Overall this is a good directly as we unify more analysis onto the (possibly multi-dimensional) buffer access, opens doors for special memory unit that are in nature multi-dimensional.

However, we do need more clarifications and systematic review/cleanup (see check list).

Check List

  • Clarify the buffer access rule, aliasing and allocation rules
  • Buffer.data pointer type and buffer element type relation
  • Create a migration guide for people to upgrade from the Load/Store
  • Use WithIndices for NormalizeDType
  • Find alternative to preflatten_buffer_map (possible via buffer redeclaration)
  • Discuss and narrow down the scope of address_of, ideally to 1D acess
  • Discuss and clarify the scope of vector buffer indexing rule (RFC: clarifying buffer declaration and access tvm-rfcs#63)
  • Post-hoc Revisit key passes involved
    • CodegenLLVM
    • StorageRewrite
    • Vectorize
    • VectorPointerTypeRewrite
    • ThreadStorageSync
    • StorageFlatten
    • MergeDynamicSharedMemoryLocations
    • LowerWarpMemory
    • LowerThreadAllreduce
    • NarrowDataType
    • Other passes that are involved in the refactoring

cc @Hzfengsy @junrushao

@tqchen
Copy link
Member Author

tqchen commented Mar 7, 2022

@tqchen tqchen changed the title [Tracking Issue] BufferAccess and Allocation Followup Items [Tracking Issue] Load/Store to BufferAccess Migration Followup Items Mar 7, 2022
@tqchen tqchen changed the title [Tracking Issue] Load/Store to BufferAccess Migration Followup Items [Tracking Issue] BufferAccess Migration Followup Items Mar 7, 2022
@masahi masahi self-assigned this Mar 7, 2022
@masahi
Copy link
Member

masahi commented Mar 7, 2022

Can we remove Load / Store node altogether now?

@tqchen
Copy link
Member Author

tqchen commented Mar 7, 2022

Let us first finish the items, streamline changes and likely remove load/store in next release. So we have sufficent timewindow and error message for people to react

Lunderberg added a commit to Lunderberg/tvm that referenced this issue Mar 7, 2022
Part of tracking issue apache#10505,
restrict multi-lane indexing to at most one index per buffer access.
This removes ambiguity as an expression such as `A[T.ramp(i,1,2),
T.ramp(j,1,2)]`, which could be interpreted either as `[A[i,j],
A[i+1,j+1]]` or as `[A[i,j], A[i,j+1], A[i+1,j], A[i+1,j+1]]`,
depending on whether the implied iterators of the two ramp nodes are
shared.
Lunderberg added a commit to Lunderberg/tvm that referenced this issue Mar 7, 2022
Part of tracking issue apache#10505,
restrict multi-lane indexing to at most one index per buffer
access. This removes ambiguity as an expression such as
`A[T.ramp(i,1,2), T.ramp(j,1,2)]`, which could be interpreted either
as `[A[i,j], A[i+1,j+1]]` or as `[A[i,j], A[i,j+1], A[i+1,j],
A[i+1,j+1]]`, depending on whether the implied iterators of the two
ramp nodes are shared.
@Lunderberg
Copy link
Contributor

Discuss and clarify the scope of vector buffer indexing rule

I agree with adding additional restrictions. I've created draft PRs #10512 and #10513, to test the impact of different rules on existing test cases. Each requires that only a single index can have multiple lanes, while #10513 additional requires that to be the last index.

@tqchen After thinking on your comment here, I agree that if there are no issues introduced by using the stricter restriction, we should use it, as we can also loosen the condition later if needed.

@wrongtest-intellif
Copy link
Contributor

Thanks for the great efforts!

Here is my question: Whether we will have some SSA (or no-alias?) property on Buffer object in low-level tir? Previously we can treat every buffer var uniquely defined by Allocate and used by Load/Store/Opaque accesses. Now there seems to be some unclearness. For example, what if we create multiple buffers on the same buffer var associated by Allocate stmt?

@tqchen
Copy link
Member Author

tqchen commented Mar 8, 2022

@wrongtest great point. I agree it is important to summarize the aliasing property. @vinx13 perhaps can add more comments on this

@vinx13
Copy link
Member

vinx13 commented Mar 8, 2022

While tir.noalias property still hold for PrimFunc args, T.decl_buffer creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. Buffer created via alloc_buffer always do not alias. Buffer aliases do not need Allocate to create the data variable. Each data variable can be allocated via Allocate once. If a transformation would produce multiple allocations of the same buffer var (e.g. unrolling a loop that contains an allocation), the transform should update the allocations to be unique using tvm::tir::ConvertSSA

Buffers should not alias each other unless necessary, because aliased buffers increase complexity for TIR transformations. Passes that rewrite buffers should clearly indicate how aliased buffers are handled. For example, when changing the underlying layout of stored elements in a buffer, all buffer aliases must also be updated. While buffer aliasing is typically free at runtime, this imposes a cost for buffer aliasing both to compile times and development complexity.

We should also extend VeryfySSA pass to also take buffer into consideration.

@Lunderberg
Copy link
Contributor

Both PRs for testing indexing restrictions, #10512 and #10513, have passed CI. I think I agree with the tighter restriction, that only the last index may have more than one lane. While there are cases where a ramp index in an earlier lane could be transformed into a ramp index in the last lane, those would be relatively rare, as it would require a ramp index to be present in user-defined calls.

I have closed #10512, and converted #10513 from a draft PR to an open PR.

@tqchen
Copy link
Member Author

tqchen commented Mar 10, 2022

Thanks @Lunderberg , if @vinx13 you also agree, can you review and merge #10513?

@wrongtest-intellif
Copy link
Contributor

Since Buffer object takes an elem_offset field (be zero by default), perhaps we should also process it in CodeGen. It could be useful if we want to create aliased buffers to sub-regions.
The proposed changes are here: #10582

AndrewZhaoLuo pushed a commit that referenced this issue Mar 11, 2022
)

* [TIR] Restirct Buffer indices, only last index can be multi-lane

Part of tracking issue #10505,
restrict multi-lane indexing to at most one index per buffer
access. This removes ambiguity as an expression such as
`A[T.ramp(i,1,2), T.ramp(j,1,2)]`, which could be interpreted either
as `[A[i,j], A[i+1,j+1]]` or as `[A[i,j], A[i,j+1], A[i+1,j],
A[i+1,j+1]]`, depending on whether the implied iterators of the two
ramp nodes are shared.

* Improved readability based on review suggestions.

* Resolve lint error.
@Lunderberg
Copy link
Contributor

Also related to elem_offset, I think it will need to allow multiple independent offsets. When a tensorized buffer is lowered, the elem_offset is used to store the flattened index. For non-flat memory buffers, this requires multiple indices, but only one can be stored there. I think both elem_offset and offset_factor will require one parameter per physical dimension, and will need to be changed to arrays.

I have the proposed changes in #10816 , which should maintain backwards compatibility on the python API, and for constructor calls to Buffer on the C++ API.

@tqchen
Copy link
Member Author

tqchen commented Mar 29, 2022

Thanks @Lunderberg , it would be great to deliberate on the choice of elem_offset, to see if we have other alternatives, as that would also mean a major change.

@Lunderberg
Copy link
Contributor

Following some debugging with @areusch on an issue caused by preflattened_buffer_map, I did some experimenting on how hard it would be to remove it altogether. #10940 is a proof of concept for keeping the preflattened shape in the buffer_map, while the function body refers only to flattened aliases of the user-provided buffers. It looks like it is working pretty well, and intend to open an RFC for further discussion on it in the near future.

@tqchen
Copy link
Member Author

tqchen commented Apr 8, 2022

@Lunderberg this also relates to apache/tvm-rfcs#63 where we can use buffer declaration to create alias, I think this should be the right way do go. Additionally, it would be great to make buffer declaration(alias creation) explicit

@Lunderberg
Copy link
Contributor

Lunderberg commented Apr 8, 2022

Thank you, and I feel silly for forgetting RFC#63 after the draft discussions, and definitely agreed on the explicit aliasing being a better option for the long term.

pfk-beta pushed a commit to pfk-beta/tvm that referenced this issue Apr 11, 2022
…che#10513)

* [TIR] Restirct Buffer indices, only last index can be multi-lane

Part of tracking issue apache#10505,
restrict multi-lane indexing to at most one index per buffer
access. This removes ambiguity as an expression such as
`A[T.ramp(i,1,2), T.ramp(j,1,2)]`, which could be interpreted either
as `[A[i,j], A[i+1,j+1]]` or as `[A[i,j], A[i,j+1], A[i+1,j],
A[i+1,j+1]]`, depending on whether the implied iterators of the two
ramp nodes are shared.

* Improved readability based on review suggestions.

* Resolve lint error.
@areusch areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022
@driazati driazati added tir any TIR core issues which don’t fit into other tir: labels (include/tvm/tir, src/tir) te and removed needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it labels Oct 19, 2022
@Lunderberg
Copy link
Contributor

Lunderberg commented Nov 16, 2022

As part of the follow-up (and as part of RFC#70), PR #10940 has now landed, removing the pre-flattened buffer map. The PrimFuncNode::buffer_map now contains the original pre-flattened buffer shape through all TIR lowering steps, and flattened buffers are represented as aliases into the argument buffer.

(And updated the tracking issue's checklist.)

@tqchen tqchen closed this as completed Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
te tir any TIR core issues which don’t fit into other tir: labels (include/tvm/tir, src/tir)
Projects
None yet
Development

No branches or pull requests

7 participants