[ARITH] Tight bound for floormod #6771

hzfan · 2020-10-27T10:11:33Z

Estimate the range of floormod(a, b) when b < 0. Fixes #6691.

giuseros · 2020-10-27T10:39:23Z

src/arith/const_int_bound.cc

+      /* let a / b = x + y, where x is integer, y \in [0, 1)
+       * floormod(a, b) = a - floordiv(a, b) * b
+       * floordiv(a, b) = x
+       * floormod(a, b) = a - floordiv(a, b) * b
+       *                = a - x * b
+       *                = a - (a / b - y) * b
+       *                = a - a + y * b
+       *                = y * b
+       * note that 0 <= y < 1
+       * when b > 0, 0 <= b * y < b
+       *             0 <= b * y <= b - 1
+       * when b < 0, b < b * y <= 0
+       *             b + 1 <= b * y <= 0
+       * In all cases, min(0, b + 1) <= b * y <= max(0, b - 1)
+       *               min(0, b_min + 1) <= b * y <= max(0, b_max - 1)
+       * That is, min(0, b_min + 1) <= floormod(a, b) <= max(0, b_max - 1)
+       */
+      int64_t b_min_cap = InfAwareAdd(b.min_value, 1);


Just a suggestion: why don't you move this (very well done) explanation to the beginning of the function? It seems to cover more than just the else branch.

Sure. Thanks for the suggestion.
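A minimal C++ sketch of the bound derived in the comment above, with a brute-force check that it holds. FloorMod, FloorModBound and BoundHolds are hypothetical helpers written for illustration, not functions from src/arith/const_int_bound.cc. Only finite bounds are handled; the diff uses InfAwareAdd for the infinite case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// floormod with floor semantics: a nonzero result takes the sign of b (b != 0).
int64_t FloorMod(int64_t a, int64_t b) {
  int64_t r = a % b;  // C++ '%' truncates toward zero
  if (r != 0 && ((r < 0) != (b < 0))) r += b;
  return r;
}

// Hypothetical helper mirroring the derivation: knowing only b in [b_min, b_max],
// min(0, b_min + 1) <= floormod(a, b) <= max(0, b_max - 1).
// Finite bounds only (no +/- infinity handling).
std::pair<int64_t, int64_t> FloorModBound(int64_t b_min, int64_t b_max) {
  return {std::min<int64_t>(0, b_min + 1), std::max<int64_t>(0, b_max - 1)};
}

// Brute-force check that every floormod(a, b) with a in [a_min, a_max] and
// nonzero b in [b_min, b_max] falls inside FloorModBound(b_min, b_max).
bool BoundHolds(int64_t a_min, int64_t a_max, int64_t b_min, int64_t b_max) {
  const std::pair<int64_t, int64_t> bound = FloorModBound(b_min, b_max);
  for (int64_t a = a_min; a <= a_max; ++a) {
    for (int64_t b = b_min; b <= b_max; ++b) {
      if (b == 0) continue;  // floormod by zero is undefined
      const int64_t r = FloorMod(a, b);
      if (r < bound.first || r > bound.second) return false;
    }
  }
  return true;
}
```

For the bounds reported in the #6691 case (0 <= a <= 6, -5 <= b <= 7), FloorModBound(-5, 7) gives [-4, 6], and both ends are reached: floormod(1, -5) = -4 and floormod(6, 7) = 6.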

tqchen · 2020-10-27T13:42:23Z

@hzfan can you also comment on why a negative b appears in the case of #6691? Is it because we did not capture the bound of b? Usually a negative bound should not appear.

hzfan · 2020-10-28T01:35:00Z

@tqchen The IR (before narrowing) looks like this:

for (ax0.ax1.fused.ax2.outer.fused: int64, 0i64, 42i64) "parallel" {
  for (n.oc_chunk.fused.oh.outer.fused: int64, 0i64, a_very_long_upper_bound) {
    ...
    ...
    for (oh.inner: int32, 0, 2) {
      for (ow.inner: int32, 0, 14) {
        for (oc_block: int32, 0, 16) "vectorized" {
          ...floormod(n.oc_chunk.fused.oh.outer.fused, cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)))...
        }
      }
    }
  }
}

Let a = n.oc_chunk.fused.oh.outer.fused (the first operand of floormod) and b = cast(int64, floordiv(((cast(int32, ((floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64) + 1i64)) + 1) - cast(int32, (floormod(ax0.ax1.fused.ax2.outer.fused, 7i64)*2i64))), 2)) (the second operand of floormod).
const_int_bound gives 0 <= a <= 6 and -5 <= b <= 7.

In a simplified form,

b = floordiv(cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2), 2)
c = floormod(ax0.ax1.fused.ax2.outer.fused, 7)

I suspect that cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) is not simplified to 2. Instead, its bound is analyzed term by term, so the lower bound becomes negative because of the subtraction.
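A toy interval calculation reproduces the reported bound. The Interval type and its helpers are hypothetical stand-ins, not TVM's ConstIntBound API, and the casts are ignored (assuming every value fits in i32). Treating the two c * 2 terms as independent intervals with c in [0, 6] gives [-5, 7] for b, even though b evaluates to 1 for every c.

```cpp
#include <cassert>
#include <cstdint>

// Minimal closed interval [lo, hi]: a stand-in for a constant integer bound.
struct Interval {
  int64_t lo;
  int64_t hi;
};

// Floor division (b != 0): rounds toward negative infinity.
int64_t FloorDiv(int64_t a, int64_t b) {
  int64_t q = a / b;  // C++ '/' truncates toward zero
  if (a % b != 0 && ((a < 0) != (b < 0))) --q;
  return q;
}

Interval AddConst(Interval x, int64_t k) { return {x.lo + k, x.hi + k}; }
Interval MulPos(Interval x, int64_t k) { return {x.lo * k, x.hi * k}; }  // k > 0
// Operands are treated as independent: the lowest minuend is paired with the
// highest subtrahend, even though both depend on the same c here.
Interval Sub(Interval x, Interval y) { return {x.lo - y.hi, x.hi - y.lo}; }
Interval FloorDivPos(Interval x, int64_t k) {  // k > 0, so floordiv is monotone
  return {FloorDiv(x.lo, k), FloorDiv(x.hi, k)};
}

// Bound of b = floordiv((c * 2 + 1) + 1 - c * 2, 2), computed term by term.
Interval BoundOfB(Interval c) {
  Interval two_c = MulPos(c, 2);
  Interval lhs = AddConst(AddConst(two_c, 1), 1);
  return FloorDivPos(Sub(lhs, two_c), 2);
}

// Exact value of b for a concrete c.
int64_t ValueOfB(int64_t c) { return FloorDiv((c * 2 + 1) + 1 - c * 2, 2); }
```

BoundOfB(Interval{0, 6}) yields {-5, 7}, matching the const_int_bound result above: the numerator is bounded as [2, 14] - [0, 12] = [-10, 14] before the division by 2.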

hzfan · 2020-10-28T02:04:39Z

This sort of makes sense: at the narrowing pass we still don't know that c * 2 + 1 fits in i32, so the cast cannot be removed yet.
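As a sketch of the missing fact: dropping cast(i32, e) only preserves the value if the bound of e fits in int32. FitsInInt32 and CanDropCastOfTwoCPlusOne are hypothetical helpers for illustration; with c = floormod(x, 7) in [0, 6], c * 2 + 1 lies in [1, 13].

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if every value in [lo, hi] is representable as int32_t, i.e. a
// cast(i32, e) with e bounded by [lo, hi] would not change the value.
bool FitsInInt32(int64_t lo, int64_t hi) {
  return lo >= std::numeric_limits<int32_t>::min() &&
         hi <= std::numeric_limits<int32_t>::max();
}

// Hypothetical check for the pattern above: given c in [c_lo, c_hi] with
// c_lo >= 0, c * 2 + 1 is bounded by [c_lo * 2 + 1, c_hi * 2 + 1].
bool CanDropCastOfTwoCPlusOne(int64_t c_lo, int64_t c_hi) {
  return FitsInInt32(c_lo * 2 + 1, c_hi * 2 + 1);
}
```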

I added the above example as a test. The FPS improves to 400 with INDEX_DEFAULT_I64=ON.

tqchen · 2020-10-28T13:04:53Z

I see. In this particular case, perhaps it makes sense to optimize this pattern and make sure cast(i32, c * 2 + 1) + 1 - cast(i32, c * 2) gets simplified as well. One way to do so is to first change i32 casts to i64 before narrowing.
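A toy sketch of why removing the casts helps: once both terms are in the same type, a linear canonical form lets the c * 2 terms cancel. The Linear type is hypothetical and only stands in for a real simplifier; it treats the casts as no-ops, as they would be after rewriting the i32 casts to i64.

```cpp
#include <cassert>
#include <cstdint>

// Toy canonical form coef * c + constant for expressions linear in one
// variable c (hypothetical; not TVM's simplifier).
struct Linear {
  int64_t coef;
  int64_t constant;
};

Linear Var() { return {1, 0}; }
Linear Const(int64_t k) { return {0, k}; }
Linear Add(Linear x, Linear y) { return {x.coef + y.coef, x.constant + y.constant}; }
Linear Sub(Linear x, Linear y) { return {x.coef - y.coef, x.constant - y.constant}; }
Linear Mul(Linear x, int64_t k) { return {x.coef * k, x.constant * k}; }

// Numerator of b with the casts removed: (c * 2 + 1) + 1 - c * 2.
Linear Numerator() {
  Linear c = Var();
  Linear lhs = Add(Add(Mul(c, 2), Const(1)), Const(1));
  return Sub(lhs, Mul(c, 2));
}
```

The numerator folds to the constant 2, so b would fold to floordiv(2, 2) = 1, and the new rule would bound floormod(a, 1) to [min(0, 1 + 1), max(0, 1 - 1)] = [0, 0].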

It would also be useful to find out why cast(value, i32) is inserted, since we prefer i64 in most cases.

We can do that in another PR. This PR's improvement is certainly useful.

tqchen · 2020-10-28T13:07:44Z

Thanks @hzfan @giuseros. It would also be great if we could follow up further on the above case:

Find out why the i32 cast is inserted (ideally we should stay entirely in i64 in compute mode)
Consider inserting a cast-removal pass if the i32 cast persists, or update the rewrite simplifier to handle casts

hzfan added 3 commits October 27, 2020 17:49

fine-grained estimation for floormod

2b2fceb

fix

e90cc29

comments

60b479d

giuseros reviewed Oct 27, 2020

tqchen added the "status: need test case" label (need test cases to cover the change) Oct 27, 2020

negative divisor test

69f8fb2

tqchen approved these changes Oct 28, 2020

tqchen merged commit b4858d4 into apache:main Oct 28, 2020

tqchen added the "status: accepted" label and removed the "status: need test case" label (need test cases to cover the change) Oct 28, 2020

tqchen mentioned this pull request Oct 28, 2020

[Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143) #6691

Closed

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Oct 29, 2020

[ARITH] Tight bound for floormod (apache#6771)

eed841a

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 2, 2020

[ARITH] Tight bound for floormod (apache#6771)

23e9452

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020

[ARITH] Tight bound for floormod (apache#6771)

0236c90

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020

[ARITH] Tight bound for floormod (apache#6771)

85601a2

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
