
Re-write replaceSymbolicSizes using IdModel #2714

Merged 30 commits into main on Aug 26, 2024

Conversation

@jacobhinkle (Collaborator) commented Jul 30, 2024

This PR re-implements replaceSymbolicSizes using IdModel. Each extent is replaced with a single representative from its exact-graph ValGroup, chosen with the following precedence:

  1. Constants are preferred.
  2. If no constants exist, prefer the extents of fusion inputs.
  3. Ties are broken by choosing the scalar with the smallest name().

Fixes #2702. Fixes #2766.

In #2671, the replaceSymbolicSizes lowering pass calls
ir_utils::replaceValue with a seemingly benign list of scalar Val
replacements. However, an error occurs: when replacing IterDomains
whose extents should change, we wind up erasing the definition of a
Split output. We should instead preserve these definitions and replace
only the output of that expression.

Fixes #2671
@jacobhinkle jacobhinkle changed the title Id model replace sizes Re-write replaceSymbolicSizes using IdModel Jul 30, 2024
@jacobhinkle jacobhinkle changed the base branch from main to mutator_sibling_ids July 30, 2024 00:48
@jacobhinkle (Collaborator, Author)

!build --diff

Base automatically changed from mutator_sibling_ids to main August 1, 2024 19:14
@jacobhinkle jacobhinkle marked this pull request as ready for review August 1, 2024 19:14
@jacobhinkle jacobhinkle marked this pull request as draft August 1, 2024 19:22
@jacobhinkle (Collaborator, Author)

Marking as draft while I investigate some test failures.

@jacobhinkle (Collaborator, Author)

!build --diff-bench

@naoyam (Collaborator) commented Aug 7, 2024

This also fixes #2766.

@jacobhinkle, please add this as a test as well. This is based on the repro of #2766.

// Repro of #2766 
TEST_F(NVFuserTest, SmallOuterBlockReductionRepro) {
  std::unique_ptr<Fusion> fusion_ptr = std::make_unique<Fusion>();
  auto& fusion = *fusion_ptr;
  FusionGuard fg(&fusion);

  std::vector<int64_t> shape{100, 2, 128};

  auto tv0 = makeContigTensor(2);
  fusion.addInput(tv0);

  auto tv1 = reshape(
      tv0,
      {IrBuilder::create<Val>(shape[0]),
       IrBuilder::create<Val>(shape[1]),
       IrBuilder::create<Val>(shape[2])});
  auto tv2 = sum(tv1, {1});
  fusion.addOutput(tv2);

  // Previously, after the extent replacement in lowering, the reduction
  // reference tensor got a reduction domain of a static size (just 1), but
  // the pre-reshape tensors still used symbolic extents. Before #2771, the
  // scheduler decided not to use TIDy because the reference tensor has a
  // static size of 1, but since the other tensors still had dynamic sizes,
  // this resulted in the dynamic allocation error.

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
  auto t0 = at::randn({shape[0] * shape[1], shape[2]}, options);
  std::vector<c10::IValue> inputs({t0});

  FusionExecutorCache fec(std::move(fusion_ptr));
  auto outputs = fec.runFusionWithInputs(inputs);

  testValidate(fec.fusion(), outputs, inputs, __LINE__, __FILE__);
}

Previously, the expand changed the ability to re-use smem. But now that
we replace the exact-mapped extents, we can _always_ reuse this smem.
jacobhinkle added a commit that referenced this pull request Aug 20, 2024
This was found when implementing #2714. We currently do not bind
TensorMetaData for input tensors to PrecomputedValues. This means we
cannot evaluate expressions that contain them, which can lead to errors.
This PR binds these metadata structs, which I think is the expected
behavior.
jacobhinkle added a commit that referenced this pull request Aug 20, 2024
Previously, this method only compared the pointers held by two
StructHandles. This PR changes it to check that the name, number of
fields, and the DataType and value of each field match.

#2714 (comment)
@jacobhinkle jacobhinkle marked this pull request as ready for review August 21, 2024 00:34
@jacobhinkle (Collaborator, Author)

!build

jacobhinkle added a commit that referenced this pull request Aug 22, 2024
@jacobhinkle (Collaborator, Author)

!build --diff-bench

@jacobhinkle (Collaborator, Author)

The only test failures are known H100 thunder SDPA failures.

@liqiangxl (Collaborator) commented Aug 23, 2024

@jacobhinkle I merged #2668 into main, updated this branch, and restarted CI, just to make sure there is no conflict.

@liqiangxl (Collaborator)

!build

@jacobhinkle (Collaborator, Author)

> @jacobhinkle I merged #2668 into main, updated this branch, and restarted CI, just to make sure there is no conflict.

Good idea. Thanks.

@naoyam (Collaborator) left a comment

Do we have basic tests of the replacement? If not, please add some unit tests.

csrc/device_lower/pass/replace_size.cpp (resolved review thread)
tensor_dim_map[orig_extent] = simplified_extent;
}
auto it = tensor_dim_map.find(simplified_extent);
if (it != tensor_dim_map.end()) {
naoyam (Collaborator):

Why are we doing this? Previously, there's tensor_dim_map, and we update it with expr_simplification_map. Now, it seems we are doing the opposite, which means if an extent is discovered to be a constant in expr_simplification_map, it would be overwritten by the symbolic representation. Am I understanding it correctly?

jacobhinkle (Collaborator, Author):

Yes, we now do more replacements than previously. Previously we would only replace extents that are exact-mapped to input extents. This was a problem in the motivating example from #2702: see #2702 (comment). In that case it is the "ceilDiv" extent of a reshaped tensor that needs to be mapped to a constant, and it is not actually exact-mapped with an input extent.

naoyam (Collaborator):

I understand that, but what happens if an input domain is discovered to be a constant in extent_simplification_map? Wouldn't it be overwritten by the getMetaData expr even though it's constant?

jacobhinkle (Collaborator, Author):

The purpose of this loop is to handle cases where the simplified extent is an input tensor dimension; it should ignore any ValGroups that simplify to constants or other values.

Suppose we have i0 = getMetaData[T0].logical_size[0] in tensor_dim_map and we find that i0 is equal to 5, so that extent_simplification_map[i0] = 5. This loop iterates over extent_simplification_map and reaches the entry i0 -> 5. We then look up the simplified extent (5 in this case) in tensor_dim_map. Since 5 is not a tensor dim, it is not found, so we don't update extent_simplification_map.

Now suppose i1 = getMetaData[T0].logical_size[1] in tensor_dim_map and we find that another extent is mapped to it in extent_simplification_map, e.g. extent_simplification_map[i3] = i1. In this case, we will find i1 in tensor_dim_map in this loop, and we will update it to extent_simplification_map[i3] = getMetaData[T0].logical_size[1].

I will add examples like these to the comments to make the code more clear.

naoyam (Collaborator):

Ah, I see.

Why is the previous code not enough?

jacobhinkle (Collaborator, Author):

The previous code composed in the other direction, so only extents that started out in tensor_dim_map were replaced; that is, only input tensor dimensions. But there can be other dimensions that are exact-mapped with those, for example when a reshape or resize occurs before a BinaryOp. In the new code, all dimensions are standardized instead.

naoyam (Collaborator):

Hmm, what about line 240 of the previous code? Doesn't it add a mapping that does not exist in tensor_dim_map?

naoyam (Collaborator):

Never mind, I misread the code. It's under the if branch starting at line 237, not 236.

Comment on lines +8774 to +8776
// Check that extents are properly replaced by replaceSymbolicSizes lowering
// pass
TEST_F(NVFuserTest, ReplaceSymbolicSizes) {
jacobhinkle (Collaborator, Author):

Added this test, which matches the behavior described in the comments of replaceSymbolicSizes.

@naoyam (Collaborator) left a comment

LGTM. Thanks!

@liqiangxl (Collaborator) left a comment

LGTM.

@jacobhinkle jacobhinkle merged commit 3b61042 into main Aug 26, 2024
5 checks passed
@jacobhinkle jacobhinkle deleted the id_model_replace_sizes branch August 26, 2024 18:50