Skip to content

[TIR] Revert #11428 and move loop dependent alloc extent check after region union#12019

Merged
junrushao merged 3 commits intoapache:mainfrom
wrongtest-intellif:fix_compact_buffer_allocation
Jul 7, 2022
Merged

[TIR] Revert #11428 and move loop dependent alloc extent check after region union#12019
junrushao merged 3 commits intoapache:mainfrom
wrongtest-intellif:fix_compact_buffer_allocation

Conversation

@wrongtest-intellif
Copy link
Contributor

@junrushao
Copy link
Member

Moving our discussion thread from #11428. Thanks @ArmageddonKnight for spotting this nasty corner case, and @comaniac for offline discussion :-)

@junrushao1994:

This PR introduces a regression with those lines:

PrimExpr extent = analyzer->Simplify(it.max() - it.min() + 1);
// skip accurate region analysis result if there are outer loop dependencies.
if (UsesVar(extent, [&ancestor_loop_vars](const VarNode* v) {
return std::find(ancestor_loop_vars.begin(), ancestor_loop_vars.end(), v) !=
ancestor_loop_vars.end();
})) {
break;
}
res.push_back(it);

Consider a case below:

extent = 960 - max(bx // 18 * 128, 832)
where bx \in [0, 144)

Even if bx exists in extent, the upper bound of the allocation is not affected (which is 128). However, the check there introduces a false positive which falls back to a much more conservative estimate (full region).

@junrushao1994

A minimal reproducible example: https://gist.github.com/junrushao1994/943eb0e9232a21a2748eb35af0fac4ff

The region of A_shared should be [128, 4] instead of [192, 4]

@junrushao1994

so let me briefly introduce the workflow of Compute-Buffer-Region first, and then let’s think about how to make it perfectly fit in our usecase.

On each use-site of a buffer, we calculate the region it is touched by the loops above it, and then by unionizing those regions, it’s made possible to trim out those regions that are not touched.

As a simplest example, if the original size of a buffer in shared memory is [1024, 1024], but only [bx : bx + 10, by : by + 20] is touched in a threadblock, then the buffer size could shrink to [10, 20] which is significantly smaller.

Then to provide a slightly more complicated example, if the buffer is touched in two different places, where the region are [bx : bx + 10 : by : by + 20], [bx + 10: bx + 20, by : by + 30], then we calculate the union of the two regions, i.e. [bx : bx + 20, by : by + 30], and then do shrinking so that its size of [20, 30].

The problem this PR aims to address is that some buffer’s size actually depends on irrelevant outer variables, for example, the shared memory size to be [0, 960 - max(bx // 18 * 128, 832)], where bx is the extent of blockIdx.x. However, it takes an approach to over-relax the region when seeing any undesired variables (in our case, it’s bx), which is less optimal.

The actual solution should be: calculating the region union bound first, and then remove the undesired variables with something similar to analyzer's const_int_bound (note that we don’t want the bound to be strictly constant)

@wrongtest

calculating the region union bound first

Many thanks! It indeed should check loop dependency for union region.

https://gist.github.com/junrushao1994/943eb0e9232a21a2748eb35af0fac4ff
In the case it seems we get extent=128 for A_shared[128, 4] after fix, even without const bound estimation... Perhaps the loop dependent region parts get covered during region unions?

@ArmageddonKnight
Copy link
Contributor

ArmageddonKnight commented Jul 6, 2022

Thanks for the fix. I think this implementation makes sense. Please let me know when it is merged so that I can rebase onto the latest main branch.

@junrushao
Copy link
Member

Looks like we are now calculating the upper bound of the union bound, which I think is correct. Would love to have confirmation from both of you guys @wrongtest @ArmageddonKnight

@junrushao
Copy link
Member

Let's also add my minimal reproducible example as a regression unittest in case it happens in the future

@wrongtest-intellif
Copy link
Contributor Author

Copy link
Member

@junrushao junrushao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your swift response!!

@junrushao junrushao merged commit b9aa356 into apache:main Jul 7, 2022
masahi pushed a commit to masahi/tvm that referenced this pull request Jul 15, 2022
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants