[LLVM] Fix a possible tbaa issue by wrongtest-intellif · Pull Request #11181 · apache/tvm

wrongtest-intellif · 2022-04-29T12:31:41Z

Hi, we encounter some weird problem on llvm generated codes, seems caused by current llvm tbaa annotations.

The function AddAliasInfo will distinguish the index of scalar form and vectorized form. If we pass a scalar index but actually it is just the head of a ramp access, there is a possibility that overlapped accesses are infered as NoAlias by tbaa analysis unsafely.

However, I am not sure how to reproduce the problem on common target like X86 cpu. Glad to see any suggestions:)
cc @Lunderberg

Lunderberg · 2022-04-29T14:18:56Z

That makes sense to me. By passing a scalar type into AddAliasInfo for a vector access, we are incorrectly tagging the access as touching the first element of the vector, not the entire vector. This would have been introduced in #10567, prior to which there was a separate call to AddAliasInfo for each type.

The one potential issue would be for vectorized types that contain alignment padding (this conditional in CodeGenLLVM::BufferAccessHelper). Because the TVM data arrays are densely packed, this computes the start index of the vectorized access using the underlying element type, rather than the vectorized type. This was found when a vectorized store of float32x3 at index i incorrectly wrote values at byte offsets of i * sizeof(float32x3) == 16*i instead of byte offsets of i * 3 * sizeof(float32) == 12*i. Because the value of last_index_for_tbaa is cached before this correction is applied, I think it would have incorrect alias information when this correction is applied.

I can see two possible options to avoid this issue. Option 1 would be to swap the order of the HasAlignmentPadding check and the last_index.as<RampNode>() check, so that the last_index_for_tbaa cached value can occur after the alignment check and before the ramp node check. This would require rewriting the HasAlignmentPadding check to also allow RampNode, because it would be prior to unwrapping the RampNode. Option 2 would be to add a width argument to AddAliasInfo, rather than extracting it for a ramp node.

wrongtest-intellif · 2022-04-30T14:09:21Z

Because the value of last_index_for_tbaa is cached before this correction is applied, I think it would have incorrect alias information when this correction is applied

Thanks for the notes! After a re look-through, IIUC, I think the "index" for alias info should keep the same element unit for all accesses, so we may not change it either the last_index is corrected or not.

However, before any changes, I find another concern on aliased buffers. since in the new convention, two buffer object (maybe of different datatype) with the same buffer var alias to each other, it may not be safe to use current buffer's dtype as index unit for alias info.

eg1:

# A[16] and B[4] may inferred as NoAlias by tbaa
A = T.allocate([64], "int8")
B = T.buffer_decl([16], "int32", data=A.data)
A[16] = 1   # tag: (A.data, idx=16)  
B[4] = 1    # tag: (A.data, idx=4)

eg2:

# A and B inferred as NoAlias since they have different buffer var
A = T.allocate([64], "int8")
B_data = T.address_of(A[4])  # usmp style alias
B = T.buffer_decl([64], "int8", data=B_data)
A[7] = 1   # tag: (A.data, idx=7)
B[3] = 2   # tag: (B.data, idx=3)

Lunderberg · 2022-05-02T15:10:30Z

After a re look-through, IIUC, I think the "index" for alias info should keep the same element unit for all accesses, so we may not change it either the last_index is corrected or not.

That makes sense to me, so long as all access uses the vectorized data type or the scalar data type, but not both.

However, before any changes, I find another concern on aliased buffers. since in the new convention, two buffer object (maybe of different datatype) with the same buffer var alias to each other, it may not be safe to use current buffer's dtype as index unit for alias info.

Good point, and thank you for the examples. I agree with the conclusion that this can impact any aliased buffer that has a different element type, either differing by number of lanes or scalar datatype. There's a few cases I can think of where that could occur in practice, such as StorageRewrite merging multiple buffers of different types into a single allocation.

What are the restrictions on the element type presented to the tbaa annotations? If we write the alias information using a byte-based indexing, I think that would solve both the vector size and the dtype size issues.

tqchen · 2022-05-02T17:45:41Z

@wrongtest in your particular case and all other cases, we should make sure TBAA info indexed by buffer->data instead of buffer itself, which would resolve the problem of buffer re-declaration

wrongtest-intellif · 2022-05-03T02:54:03Z

indexed by buffer->data

Current implementation is using the data field: AddAliasInfo(instruction, buffer->data.get(), last_index), while (eg1) is about indices of different datatypes on same buffer data.

(eg2) is something that though two buffer object have totally different buffer data, the accesses still have possible overlaps (by memory pool reuse, IIUC).

tvm/tests/python/unittest/test_tir_usmp_transform_convert_pool_allocations_to_offsets.py

Lines 442 to 448 in f00c8db

    
           PaddedInput_3_let = T.buffer_decl([360000], 'int16') 
        
           with T.let(PaddedInput_3_let.data, T.address_of(global_workspace_5_buffer_var[6480000], dtype="handle")): 
        
               for i0_i1_fused_3, i2_3, i3_3 in T.grid(75, 75, 64): 
        
                   PaddedInput_3_let[i0_i1_fused_3 * 4800 + i2_3 * 64 + i3_3] = placeholder_29[i0_i1_fused_3 * 4800 + i2_3 * 64 + i3_3] 
        
               for ax0_ax1_fused_ax2_fused_3 in T.serial(0, 5625): 
        
                   Conv2dOutput_3_let = T.buffer_decl([64], 'int32') 
        
                   with T.let(Conv2dOutput_3_let.data, T.address_of(global_workspace_5_buffer_var[7200000], dtype="handle")):

Also find a related PR by @kparzysz-quic #6046 by key word search, but it is when the backend tir use LoadNode/StoreNode.

tqchen · 2022-05-03T17:39:15Z

@wrongtest when no-alias is set to True, we should ensure that aliasing is only indicated by the buffer->data, as a result different buffer->data will result in non-aliased buffers(they do not share the same memory in memory pool)

kparzysz-quic · 2022-05-03T18:19:58Z

What are the restrictions on the element type presented to the tbaa annotations?

You mean in LLVM? TBAA has two kinds of types: scalars and structs. Scalars are elementary, i.e. are not composed from other types, while structs are. What you present as a "scalar" to TBAA is up to you, there are no links there to any actual LLVM IR types.

wrongtest-intellif · 2022-05-03T18:57:17Z

different buffer->data will result in non-aliased buffers

Thanks! So we do not need to worry about (eg2) form.

wrongtest-intellif · 2022-05-05T08:21:56Z

Change the index of tbaa to be based on "underlying datatype` inspired by #6046, or fallback to byte. Tag node on buffer dtype is removed because it seems that there should not exist type tree paths of different dtype tag on the same buffer. Could you kindly take another review? @Lunderberg

Unfortunately I still fail to construct conterexample of runtime result error on cpu, though llvm ir of suspicious illegal tbaa could be provided.

Lunderberg

Based on @kparzysz-quic 's comment about scalar types in LLVM's TBAA, I think there's a couple of potential improvements if we change the scalar type from buffer_var->type_annotation->element_type to just bits or bytes. That would avoid needing to check the type annotations, and would avoid marking some types of non-aliased access as aliasing. That said, this PR is a huge improvement over the current state of incorrect annotations, and so those changes could also be a separate PR entirely.

I took a glance at the CI failure, and it looks like a timeout on the Windows build that just needs to be restarted.

Lunderberg · 2022-05-05T19:30:05Z

src/target/llvm/codegen_llvm.cc

  arith::PVar<int> planes;
  // create meta-data for alias analysis
  // Use a group of binary tree ranges of memory banks.
  if (index.defined()) {


Tangentially-related cleanup: I think we can remove the check on index.defined(). AddAliasInfo is only called from BufferAccessHelper, which provides a defined index.

Lunderberg · 2022-05-05T19:38:17Z

src/target/llvm/codegen_llvm.cc


+  // Extract the underlying element bit width of the allocated buffer.
+  // fallback to byte type if no type annotation present.
+  int64_t buffer_elem_bits = 8;


I don't think we need the size from the type annotation. The type annotation on the Var would only include the buffer's type as allocated, and may not be correlated with the type used for accessing it. When accessing the buffer in CodeGenLLVM::CreateBufferPtr, if the allocation type and access type differ, the buffer var is cast to the access type. So the bytes being accessed by a load/store should only depend on the access type and the access index.

Lunderberg · 2022-05-05T19:47:25Z

src/target/llvm/codegen_llvm.cc

+      xwith = 1;
+    }
+    if (buffer_elem_bits != access_elem_bits) {
+      base = base * access_elem_bits / buffer_elem_bits;


Would this cause false positives for aliasing of a buffer whose access type is smaller than the allocation type? I'm picturing something like the following:

@T.prim_func def func(): A = T.alloc_buffer(32, dtype='int32') A_bytes = T.buffer_decl(128, dtype='int8', data=A.data) A_bytes[0] = 42 A_bytes[3] = 42

By scaling the alias information to the size of the original allocation, both A_bytes[0] and A_bytes[3] are treated as access of A[0]. This would treat it as an alias even though they are accessing different addresses.

to just bits or bytes

I agree and would like to follow that in current pr. That make codes much clean and avoid sort of false positives.

Note there is a magic width number 1024, above which the access fallbacks to full region access (root tag for current buffer var). Thus the type tree depth will decrease on huge vector compared to original version, but I think it could be a minor issue and we can turn back until certain performance regression detected.

kparzysz-quic · 2022-05-05T22:19:07Z

Change the index of tbaa to be based on "underlying datatype` inspired by #6046, or fallback to byte. Tag node on buffer dtype is removed because it seems that there should not exist type tree paths of different dtype tag on the same buffer.

Before the "typed buffers" were introduced, it was possible to use all kinds of types to access the memory (for example by redeclaring buffers with the same buffer variable, but with different types). So, you could read/write float32, then access the same storage with int16 (not necessarily using the same BufferNode, though), etc. As per some older comments, the TBAA annotation code was borrowed from Halide, and I can speculate that the dtype was added to TBAA to indicate that data with one underlying type will not be accessed via another underlying type. So, you could reuse buffers, but if you stored float32 you'd better not be trying to read it as int32, which follows the C/C++ type based aliasing rules. We don't have any specific aliasing rules in TIR that I'm aware of, but removing the type tag will make all accesses to the same buffer be aliased. This may be a good thing because it's safe, but we need to make that a deliberate decision (and document it somewhere if it isn't yet).

wrongtest-intellif · 2022-05-06T04:40:46Z

document it somewhere

https://github.com/vinx13/tvm-rfcs/blob/clarify-buffer-access/rfcs/0063-clarifying-buffer-declaration-and-access.md
The clarification RFC says T.buffer_decl creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. And this should be the unique way to create buffer aliases.

Currently I understand that means all accesses with the same buffer data must be alias (irrelavant to dtype) and the word alias take the same meaning across TIR and target codegen levels. If so (fix me), no compatible C/C++ type based aliasing rules can be introduced. cc @kparzysz-quic

kparzysz-quic · 2022-05-06T13:08:31Z

The clarification RFC says T.buffer_decl creates a buffer alias if the underlying data variable (.data field) overlaps with another buffer. And this should be the unique way to create buffer aliases.

Thank you for the link. I guess the document should also specify that in eg2 in your earlier comment, the buffers will not be aliased.

* fix a possible tbaa issue * Correct tbaa index unit by underlying buffer elemtype * always use byte as index unit in tbaa

github-actions bot requested a review from Lunderberg April 29, 2022 12:32

wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from 99128b4 to 60e7760 Compare May 3, 2022 14:00

wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch 2 times, most recently from eb8378f to 092142e Compare May 5, 2022 05:08

Lunderberg approved these changes May 5, 2022

View reviewed changes

wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from 092142e to adeff51 Compare May 5, 2022 21:17

kparzysz-quic approved these changes May 6, 2022

View reviewed changes

wrongtest-intellif added 3 commits May 7, 2022 15:10

fix a possible tbaa issue

24a4747

Correct tbaa index unit by underlying buffer elemtype

c830a2d

always use byte as index unit in tbaa

b536400

wrongtest-intellif force-pushed the fix_possible_tbaa_problem branch from adeff51 to b536400 Compare May 7, 2022 07:10

wrongtest-intellif merged commit 62d3a67 into apache:main May 7, 2022

shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022

[LLVM] Fix a possible tbaa issue (apache#11181)

51d4a5f

* fix a possible tbaa issue * Correct tbaa index unit by underlying buffer elemtype * always use byte as index unit in tbaa

SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022

[LLVM] Fix a possible tbaa issue (apache#11181)

15ebdad

* fix a possible tbaa issue * Correct tbaa index unit by underlying buffer elemtype * always use byte as index unit in tbaa

Conversation

wrongtest-intellif commented Apr 29, 2022

Uh oh!

Lunderberg commented Apr 29, 2022

Uh oh!

wrongtest-intellif commented Apr 30, 2022

Uh oh!

Lunderberg commented May 2, 2022

Uh oh!

tqchen commented May 2, 2022

Uh oh!

wrongtest-intellif commented May 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tqchen commented May 3, 2022

Uh oh!

kparzysz-quic commented May 3, 2022

Uh oh!

wrongtest-intellif commented May 3, 2022

Uh oh!

wrongtest-intellif commented May 5, 2022

Uh oh!

Lunderberg left a comment

Choose a reason for hiding this comment

Uh oh!

Lunderberg May 5, 2022

Choose a reason for hiding this comment

Uh oh!

Lunderberg May 5, 2022

Choose a reason for hiding this comment

Uh oh!

Lunderberg May 5, 2022

Choose a reason for hiding this comment

Uh oh!

wrongtest-intellif May 5, 2022

Choose a reason for hiding this comment

Uh oh!

kparzysz-quic commented May 5, 2022

Uh oh!

wrongtest-intellif commented May 6, 2022

Uh oh!

kparzysz-quic commented May 6, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

wrongtest-intellif commented May 3, 2022 •

edited

Loading