Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARITH] Tight bound for floormod #6771

Merged
merged 4 commits into from Oct 28, 2020
Merged

Conversation

hzfan
Copy link
Contributor

@hzfan hzfan commented Oct 27, 2020

Estimate the range of floormod(a, b) when b < 0. Fix #6691

Comment on lines 262 to 279
/* let a / b = x + y, where x is integer, y \in [0, 1)
* floormod(a, b) = a - floordiv(a, b) * b
* floordiv(a, b) = x
* floormod(a, b) = a - floordiv(a, b) * b
* = a - x * b
* = a - (a / b - y) * b
* = a - a + y * b
* = y * b
* note that 0 <= y < 1
* when b > 0, 0 <= b * y < b
* 0 <= b * y <= b - 1
* when b < 0, b < b * y <= 0
* b + 1 <= b * y <= 0
* In all cases, min(0, b + 1) <= b * y <= max(0, b - 1)
* min(0, b_min + 1) <= b * y <= max(0, b_max - 1)
* That is, min(0, b_min + 1) <= floormod(a, b) <= max(0, b_max - 1)
*/
int64_t b_min_cap = InfAwareAdd(b.min_value, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion: why don't you move this (very well done) explanation at the beginning of the function? It seems to cover more than just the else branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thanks for the suggestion.

@tqchen
Copy link
Member

tqchen commented Oct 27, 2020

@hzfan can you also comment why negative b appears in the case of #6691, is it because we did not capture bound of b? Usually negative bound should not appear

@tqchen tqchen added the status: need test case need test cases to cover the change label Oct 27, 2020
@hzfan
Copy link
Contributor Author

hzfan commented Oct 28, 2020

@tqchen The ir (before narrowing) is like

for (ax0.ax1.fused.ax2.outer.fused: int64, 0i64, 42i64) "parallel" {
  for (n.oc_chunk.fused.oh.outer.fused: int64, 0i64, a_very_long_upper_bound) {
    ...
    ...
    for (oh.inner: int32, 0, 2) {
      for (ow.inner: int32, 0, 14) {
        for (oc_block: int32, 0, 16) "vectorized" {
          ...floormod(n.oc_chunk.fused.oh.outer.fused, cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)))...
        }
      }
    }
  }
}

Let a = n.oc_chunk.fused.oh.outer.fused (the first operand of floormod), b = cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)) (second operand of floormod)
const_int_bound gives 0 <= a <= 6 and -5 <= b <= 7.

In a simplified form,

b = floordiv( cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2), 2)
c = floormod(ax0.ax1.fused.ax2.outer.fused, 7)

I guess cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) is not simplified as 2, instead, its bound is analyzed directly, so the min becomes negative due to the minus operation.

@hzfan
Copy link
Contributor Author

hzfan commented Oct 28, 2020

It sort of make sense, because at the narrowing pass, we still don't know c * 2 + 1 fits in i32 and the cast cannot be removed yet.

I added the above example as a test. The FPS improves to 400 with INDEX_DEFAULT_I64=ON

@tqchen
Copy link
Member

tqchen commented Oct 28, 2020

I see, in this particular case, perhaps it makes sense to optimize such pattern and make sure cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) get simplified as well. One way to do so is to first change i32 casts to i64 before narrow.

It would also be useful to find out why cast(value, i32) is inserted since we preferred i64 in most cases.

We can do that in another PR. This PR's improvement is certainly useful

@tqchen tqchen merged commit b4858d4 into apache:main Oct 28, 2020
@tqchen tqchen added status: accepted and removed status: need test case need test cases to cover the change labels Oct 28, 2020
@tqchen
Copy link
Member

tqchen commented Oct 28, 2020

THanks @hzfan @giuseros . It would also be great if we can followup further on the above case:

  • Find out why cast i32 is inserted (ideally we should be all in i64 in compute mode)
  • Consider insert a cast removing pass if cast i32 persists, or update rewrite simplify to handle cast

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)
3 participants