Skip to content

[LLVM] Fix a possible tbaa issue#11181

Merged
wrongtest-intellif merged 3 commits intoapache:mainfrom
wrongtest-intellif:fix_possible_tbaa_problem
May 7, 2022
Merged

[LLVM] Fix a possible tbaa issue#11181
wrongtest-intellif merged 3 commits intoapache:mainfrom
wrongtest-intellif:fix_possible_tbaa_problem

Conversation

@wrongtest-intellif
Copy link
Copy Markdown
Contributor

Hi, we encounter some weird problem on llvm generated codes, seems caused by current llvm tbaa annotations.

The function AddAliasInfo will distinguish the index of scalar form and vectorized form. If we pass a scalar index but actually it is just the head of a ramp access, there is a possibility that overlapped accesses are infered as NoAlias by tbaa analysis unsafely.

However, I am not sure how to reproduce the problem on common target like X86 cpu. Glad to see any suggestions:)
cc @Lunderberg

@github-actions github-actions bot requested a review from Lunderberg April 29, 2022 12:32
@Lunderberg
Copy link
Copy Markdown
Contributor

That makes sense to me. By passing a scalar type into AddAliasInfo for a vector access, we are incorrectly tagging the access as touching the first element of the vector, not the entire vector. This would have been introduced in #10567, prior to which there was a separate call to AddAliasInfo for each type.

The one potential issue would be for vectorized types that contain alignment padding (this conditional in CodeGenLLVM::BufferAccessHelper). Because the TVM data arrays are densely packed, this computes the start index of the vectorized access using the underlying element type, rather than the vectorized type. This was found when a vectorized store of float32x3 at index i incorrectly wrote values at byte offsets of i * sizeof(float32x3) == 16*i instead of byte offsets of i * 3 * sizeof(float32) == 12*i. Because the value of last_index_for_tbaa is cached before this correction is applied, I think it would have incorrect alias information when this correction is applied.

I can see two possible options to avoid this issue. Option 1 would be to swap the order of the HasAlignmentPadding check and the last_index.as<RampNode>() check, so that the last_index_for_tbaa cached value can occur after the alignment check and before the ramp node check. This would require rewriting the HasAlignmentPadding check to also allow RampNode, because it would be prior to unwrapping the RampNode. Option 2 would be to add a width argument to AddAliasInfo, rather than extracting it for a ramp node.

@wrongtest-intellif
Copy link
Copy Markdown
Contributor Author

Because the value of last_index_for_tbaa is cached before this correction is applied, I think it would have incorrect alias information when this correction is applied

Thanks for the notes! After a re look-through, IIUC, I think the "index" for alias info should keep the same element unit for all accesses, so we may not change it either the last_index is corrected or not.

However, before any changes, I find another concern on aliased buffers. since in the new convention, two buffer object (maybe of different datatype) with the same buffer var alias to each other, it may not be safe to use current buffer's dtype as index unit for alias info.

eg1:

# A[16] and B[4] may inferred as NoAlias by tbaa
A = T.allocate([64], "int8")
B = T.buffer_decl([16], "int32", data=A.data)
A[16] = 1   # tag: (A.data, idx=16)  
B[4] = 1    # tag: (A.data, idx=4)      

eg2:

# A and B inferred as NoAlias since they have different buffer var
A = T.allocate([64], "int8")
B_data = T.address_of(A[4])  # usmp style alias
B = T.buffer_decl([64], "int8", data=B_data)
A[7] = 1   # tag: (A.data, idx=7)
B[3] = 2   # tag: (B.data, idx=3)

@Lunderberg
Copy link
Copy Markdown
Contributor

After a re look-through, IIUC, I think the "index" for alias info should keep the same element unit for all accesses, so we may not change it either the last_index is corrected or not.

That makes sense to me, so long as all access uses the vectorized data type or the scalar data type, but not both.

However, before any changes, I find another concern on aliased buffers. since in the new convention, two buffer object (maybe of different datatype) with the same buffer var alias to each other, it may not be safe to use current buffer's dtype as index unit for alias info.

Good point, and thank you for the examples. I agree with the conclusion that this can impact any aliased buffer that has a different element type, either differing by number of lanes or scalar datatype. There's a few cases I can think of where that could occur in practice, such as StorageRewrite merging multiple buffers of different types into a single allocation.

What are the restrictions on the element type presented to the tbaa annotations? If we write the alias information using a byte-based indexing, I think that would solve both the vector size and the dtype size issues.

@tqchen
Copy link
Copy Markdown
Member

tqchen commented May 2, 2022

@wrongtest in your particular case and all other cases, we should make sure TBAA info indexed by buffer->data instead of buffer itself, which would resolve the problem of buffer re-declaration

@wrongtest-intellif
Copy link
Copy Markdown
Contributor Author

wrongtest-intellif commented May 3, 2022

indexed by buffer->data

Current implementation is using the data field: AddAliasInfo(instruction, buffer->data.get(), last_index), while (eg1) is about indices of different datatypes on same buffer data.

(eg2) is something that though two buffer object have totally different buffer data, the accesses still have possible overlaps (by memory pool reuse, IIUC).

PaddedInput_3_let = T.buffer_decl([360000], 'int16')
with T.let(PaddedInput_3_let.data, T.address_of(global_workspace_5_buffer_var[6480000], dtype="handle")):
for i0_i1_fused_3, i2_3, i3_3 in T.grid(75, 75, 64):
PaddedInput_3_let[i0_i1_fused_3 * 4800 + i2_3 * 64 + i3_3] = placeholder_29[i0_i1_fused_3 * 4800 + i2_3 * 64 + i3_3]
for ax0_ax1_fused_ax2_fused_3 in T.serial(0, 5625):
Conv2dOutput_3_let = T.buffer_decl([64], 'int32')
with T.let(Conv2dOutput_3_let.data, T.address_of(global_workspace_5_buffer_var[7200000], dtype="handle")):

Also find a related PR by @kparzysz-quic #6046 by key word search, but it is when the backend tir use LoadNode/StoreNode.

@wrongtest-intellif wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from 99128b4 to 60e7760 Compare May 3, 2022 14:00
@tqchen
Copy link
Copy Markdown
Member

tqchen commented May 3, 2022

@wrongtest when no-alias is set to True, we should ensure that aliasing is only indicated by the buffer->data, as a result different buffer->data will result in non-aliased buffers(they do not share the same memory in memory pool)

@kparzysz-quic
Copy link
Copy Markdown
Contributor

What are the restrictions on the element type presented to the tbaa annotations?

You mean in LLVM? TBAA has two kinds of types: scalars and structs. Scalars are elementary, i.e. are not composed from other types, while structs are. What you present as a "scalar" to TBAA is up to you, there are no links there to any actual LLVM IR types.

@wrongtest-intellif
Copy link
Copy Markdown
Contributor Author

different buffer->data will result in non-aliased buffers

Thanks! So we do not need to worry about (eg2) form.

@wrongtest-intellif wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch 2 times, most recently from eb8378f to 092142e Compare May 5, 2022 05:08
@wrongtest-intellif
Copy link
Copy Markdown
Contributor Author

Change the index of tbaa to be based on "underlying datatype` inspired by #6046, or fallback to byte. Tag node on buffer dtype is removed because it seems that there should not exist type tree paths of different dtype tag on the same buffer. Could you kindly take another review? @Lunderberg

Unfortunately I still fail to construct conterexample of runtime result error on cpu, though llvm ir of suspicious illegal tbaa could be provided.

Copy link
Copy Markdown
Contributor

@Lunderberg Lunderberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @kparzysz-quic 's comment about scalar types in LLVM's TBAA, I think there's a couple of potential improvements if we change the scalar type from buffer_var->type_annotation->element_type to just bits or bytes. That would avoid needing to check the type annotations, and would avoid marking some types of non-aliased access as aliasing. That said, this PR is a huge improvement over the current state of incorrect annotations, and so those changes could also be a separate PR entirely.

I took a glance at the CI failure, and it looks like a timeout on the Windows build that just needs to be restarted.

arith::PVar<int> planes;
// create meta-data for alias analysis
// Use a group of binary tree ranges of memory banks.
if (index.defined()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tangentially-related cleanup: I think we can remove the check on index.defined(). AddAliasInfo is only called from BufferAccessHelper, which provides a defined index.


// Extract the underlying element bit width of the allocated buffer.
// fallback to byte type if no type annotation present.
int64_t buffer_elem_bits = 8;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the size from the type annotation. The type annotation on the Var would only include the buffer's type as allocated, and may not be correlated with the type used for accessing it. When accessing the buffer in CodeGenLLVM::CreateBufferPtr, if the allocation type and access type differ, the buffer var is cast to the access type. So the bytes being accessed by a load/store should only depend on the access type and the access index.

xwith = 1;
}
if (buffer_elem_bits != access_elem_bits) {
base = base * access_elem_bits / buffer_elem_bits;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this cause false positives for aliasing of a buffer whose access type is smaller than the allocation type? I'm picturing something like the following:

@T.prim_func
def func():
    A = T.alloc_buffer(32, dtype='int32')
    A_bytes = T.buffer_decl(128, dtype='int8', data=A.data)
    A_bytes[0] = 42
    A_bytes[3] = 42

By scaling the alias information to the size of the original allocation, both A_bytes[0] and A_bytes[3] are treated as access of A[0]. This would treat it as an alias even though they are accessing different addresses.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to just bits or bytes

I agree and would like to follow that in current pr. That make codes much clean and avoid sort of false positives.

Note there is a magic width number 1024, above which the access fallbacks to full region access (root tag for current buffer var). Thus the type tree depth will decrease on huge vector compared to original version, but I think it could be a minor issue and we can turn back until certain performance regression detected.

@wrongtest-intellif wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from 092142e to adeff51 Compare May 5, 2022 21:17
@kparzysz-quic
Copy link
Copy Markdown
Contributor

Change the index of tbaa to be based on "underlying datatype` inspired by #6046, or fallback to byte. Tag node on buffer dtype is removed because it seems that there should not exist type tree paths of different dtype tag on the same buffer.

Before the "typed buffers" were introduced, it was possible to use all kinds of types to access the memory (for example by redeclaring buffers with the same buffer variable, but with different types). So, you could read/write float32, then access the same storage with int16 (not necessarily using the same BufferNode, though), etc. As per some older comments, the TBAA annotation code was borrowed from Halide, and I can speculate that the dtype was added to TBAA to indicate that data with one underlying type will not be accessed via another underlying type. So, you could reuse buffers, but if you stored float32 you'd better not be trying to read it as int32, which follows the C/C++ type based aliasing rules. We don't have any specific aliasing rules in TIR that I'm aware of, but removing the type tag will make all accesses to the same buffer be aliased. This may be a good thing because it's safe, but we need to make that a deliberate decision (and document it somewhere if it isn't yet).

@wrongtest-intellif
Copy link
Copy Markdown
Contributor Author

document it somewhere

https://github.com/vinx13/tvm-rfcs/blob/clarify-buffer-access/rfcs/0063-clarifying-buffer-declaration-and-access.md
The clarification RFC says T.buffer_decl creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. And this should be the unique way to create buffer aliases.

Currently I understand that means all accesses with the same buffer data must be alias (irrelavant to dtype) and the word alias take the same meaning across TIR and target codegen levels. If so (fix me), no compatible C/C++ type based aliasing rules can be introduced. cc @kparzysz-quic

@kparzysz-quic
Copy link
Copy Markdown
Contributor

The clarification RFC says T.buffer_decl creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. And this should be the unique way to create buffer aliases.

Thank you for the link. I guess the document should also specify that in eg2 in your earlier comment, the buffers will not be aliased.

@wrongtest-intellif wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from adeff51 to b536400 Compare May 7, 2022 07:10
@wrongtest-intellif wrongtest-intellif merged commit 62d3a67 into apache:main May 7, 2022
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
* fix a possible tbaa issue

* Correct tbaa index unit by underlying buffer elemtype

* always use byte as index unit in tbaa
SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022
* fix a possible tbaa issue

* Correct tbaa index unit by underlying buffer elemtype

* always use byte as index unit in tbaa
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants