Inconsistent rfactor #1692

Closed
naoyam opened this issue May 11, 2022 · 1 comment · Fixed by #1723

naoyam commented May 11, 2022

In this fusion, the rfactor tensor gets an inconsistent rfactor domain.

TEST_F(NVFuserTest, TMP) {
  Fusion fusion;
  FusionGuard fg(&fusion);

  auto tv0 = makeSymbolicTensor(2);
  fusion.addInput(tv0);

  auto tv1 = sum(tv0, {0, 1});

  fusion.addOutput(tv1);

  tv1->split(1, 4);
  tv1->split(0, 3);
  tv1->merge(1, 2);
  auto rf = tv1->rFactor({-1});

  std::cout << ir_utils::toString(rf->getRootDomain()) << std::endl;
  std::cout << ir_utils::toString(rf->getMaybeRFactorDomain()) << std::endl;

  GpuLower gpu_lower(&fusion);
}

The rfactor domain of tensor rf is iS9{i1} iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf rS14{4}rf. Specifically, the factor 3 comes from splitting i1, which already appears in full as iS9{i1}, yet it is also included in the second ID, iS15.
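
To make the overlap easier to see, here is a minimal sketch that replays the same split/split/merge sequence on a toy model and records which root axes each resulting axis derives from. `Axis`, `Domain`, `split`, `merge` and `replayIssueSchedule` are hypothetical stand-ins written for this illustration; they are not nvFuser's classes or API, and the sketch does not claim to reproduce how rFactor builds its domains.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an IterDomain: an extent label plus the names of
// the root axes it was derived from. This is NOT nvFuser's real class.
struct Axis {
  std::string label;
  std::set<std::string> roots;
};

using Domain = std::vector<Axis>;

// Split axis `pos` into an outer and an inner axis; both inherit its roots.
void split(Domain& d, std::size_t pos, int factor) {
  assert(pos < d.size());
  const Axis src = d[pos];
  Axis outer{"ceilDiv(" + src.label + ", " + std::to_string(factor) + ")",
             src.roots};
  Axis inner{std::to_string(factor), src.roots};
  d[pos] = outer;
  d.insert(d.begin() + static_cast<std::ptrdiff_t>(pos) + 1, inner);
}

// Merge axes `a` and `a + 1`; the result depends on the roots of both inputs.
void merge(Domain& d, std::size_t a) {
  assert(a + 1 < d.size());
  Axis merged{d[a].label + " * " + d[a + 1].label, d[a].roots};
  merged.roots.insert(d[a + 1].roots.begin(), d[a + 1].roots.end());
  d[a] = merged;
  d.erase(d.begin() + static_cast<std::ptrdiff_t>(a) + 1);
}

// Replay the schedule from the test above on a 2-D domain {i1, i2}.
Domain replayIssueSchedule() {
  Domain d{Axis{"i1", {"i1"}}, Axis{"i2", {"i2"}}};
  split(d, 1, 4);  // [i1, ceilDiv(i2, 4), 4]
  split(d, 0, 3);  // [ceilDiv(i1, 3), 3, ceilDiv(i2, 4), 4]
  merge(d, 1);     // [ceilDiv(i1, 3), 3 * ceilDiv(i2, 4), 4]
  return d;
}
```

In this toy model the middle axis ends up depending on both i1 and i2, which is the same mixing visible in iS15: its factor 3 originates from i1, while iS9{i1} is still listed next to it.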

Not sure if that's the root cause, but the fusion results in an error when building the computeAt map:

* thread #1, name = 'test_jit', stop reason = breakpoint 1.1
    frame #0: 0x00007fffd5fd910b libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x0000000002d9c720, tinfo=0x00007fffd60927a8, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
Process 13953 launched: '/home/nmaruyama/pytorch/debug3/build/bin/test_jit' (x86_64)
(lldb) bt
* thread #1, name = 'test_jit', stop reason = breakpoint 1.1
  * frame #0: 0x00007fffd5fd910b libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x0000000002d9c720, tinfo=0x00007fffd60927a8, dest=(libstdc++.so.6`std::out_of_range::~out_of_range() at stdexcept.cc:65:33))(void *)) at eh_throw.cc:77:1
    frame #1: 0x00007fffd5fd6087 libstdc++.so.6`std::__throw_out_of_range(__s="_Map_base::at") at functexcept.cc:82:5
    frame #2: 0x00007ffff6f5c9ce libtorch_cuda.so`std::__detail::_Map_base<torch::jit::fuser::cuda::IterDomain*, std::pair<torch::jit::fuser::cuda::IterDomain* const, torch::jit::fuser::cuda::VectorOfUniqueEntries<torch::jit::fuser::cuda::IterDomain*, std::hash<torch::jit::fuser::cuda::IterDomain*> > >, std::allocator<std::pair<torch::jit::fuser::cuda::IterDomain* const, torch::jit::fuser::cuda::VectorOfUniqueEntries<torch::jit::fuser::cuda::IterDomain*, std::hash<torch::jit::fuser::cuda::IterDomain*> > > >, std::__detail::_Select1st, std::equal_to<torch::jit::fuser::cuda::IterDomain*>, std::hash<torch::jit::fuser::cuda::IterDomain*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::at(this=0x0000000002d98418, __k=0x00007fffffffbe38) at hashtable_policy.h:760:2
    frame #3: 0x00007ffff6f55fdd libtorch_cuda.so`std::unordered_map<torch::jit::fuser::cuda::IterDomain*, torch::jit::fuser::cuda::VectorOfUniqueEntries<torch::jit::fuser::cuda::IterDomain*, std::hash<torch::jit::fuser::cuda::IterDomain*> >, std::hash<torch::jit::fuser::cuda::IterDomain*>, std::equal_to<torch::jit::fuser::cuda::IterDomain*>, std::allocator<std::pair<torch::jit::fuser::cuda::IterDomain* const, torch::jit::fuser::cuda::VectorOfUniqueEntries<torch::jit::fuser::cuda::IterDomain*, std::hash<torch::jit::fuser::cuda::IterDomain*> > > > >::at(this=0x0000000002d98418, __k=0x00007fffffffbe38) at unordered_map.h:991:21
    frame #4: 0x00007ffff6f52a3b libtorch_cuda.so`torch::jit::fuser::cuda::IterDomainGraph::build(this=0x0000000002d982f0, fusion=0x0000000002d28140) at compute_at_map.cpp:180:22
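
The top frames show `std::unordered_map::at` throwing `std::out_of_range`, which is what `at` does when the key is not in the map. The sketch below only demonstrates that standard-library behavior and a non-throwing alternative using `find`. The map type, the string keys and the helper names `atThrows` and `tryGet` are assumptions made for illustration; the real map in `IterDomainGraph::build` is keyed by `IterDomain*`, and nothing here is taken from compute_at_map.cpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical map type for illustration; keys stand in for IterDomain IDs.
using IdMap = std::unordered_map<std::string, int>;

// Returns true if m.at(key) throws std::out_of_range, i.e. key is missing.
bool atThrows(const IdMap& m, const std::string& key) {
  try {
    (void)m.at(key);
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}

// Non-throwing lookup: returns a pointer to the mapped value, or nullptr
// when the key is absent, so the caller can report which ID was missing.
const int* tryGet(const IdMap& m, const std::string& key) {
  const auto it = m.find(key);
  return it == m.end() ? nullptr : &it->second;
}
```

A lookup written with `find` makes it possible to print the offending ID before failing, which can be more informative than the bare `_Map_base::at` message.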
naoyam added the bug label May 11, 2022
zasdfgbnm self-assigned this May 13, 2022
@zasdfgbnm

Assigning myself to take a look.

zasdfgbnm added a commit that referenced this issue May 19, 2022
I feel that this could be helpful for debugging. Below is the output for the example that breaks our system in #1692.

Before
```
TransformPrinter : 
T0_g[ iS0{i1}, iS1{i2} ]
 root domain : (iS0{i1},iS1{i2})
T2_l[ iS11{( ceilDiv(i1, 3) )}, iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf, rS14{4}rf ]
 root domain : (iS9{i1},iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf,rS14{4}rf)
  Split: iS9{i1} by factor 3 -> iS11{( ceilDiv(i1, 3) )}, iS12{3}, start offset: 0, stop offset: 0
T1_g[ rS17{( ceilDiv(i1, 3) )}, rS19{( 3 * ( ceilDiv(i2, 4) ) )} ]
 root domain : (rS16{i1},rS19{( 3 * ( ceilDiv(i2, 4) ) )})
  Split: rS16{i1} by factor 3 -> rS17{( ceilDiv(i1, 3) )}, rS18{3}, start offset: 0, stop offset: 0
}
```

After:
```
TransformPrinter : 
T0_g[ iS0{i1}, iS1{i2} ]
 root domain : (iS0{i1},iS1{i2})
T2_l[ iS11{( ceilDiv(i1, 3) )}, iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf, rS14{4}rf ]
 root domain : (iS9{i1},rS10{i2}rf)
  Split: iS9{i1} by factor 3 -> iS11{( ceilDiv(i1, 3) )}, iS12{3}, start offset: 0, stop offset: 0
  Split: rS10{i2}rf by factor 4 -> iS13{( ceilDiv(i2, 4) )}rf, rS14{4}rf, start offset: 0, stop offset: 0
  Merge: iS12{3} and iS13{( ceilDiv(i2, 4) )}rf -> iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf
 rfactor domain : (iS9{i1},iS15{( 3 * ( ceilDiv(i2, 4) ) )}rf,rS14{4}rf)
  Split: iS9{i1} by factor 3 -> iS11{( ceilDiv(i1, 3) )}, iS12{3}, start offset: 0, stop offset: 0
T1_g[ rS17{( ceilDiv(i1, 3) )}, rS19{( 3 * ( ceilDiv(i2, 4) ) )} ]
 root domain : (rS16{i1},rS19{( 3 * ( ceilDiv(i2, 4) ) )})
  Split: rS16{i1} by factor 3 -> rS17{( ceilDiv(i1, 3) )}, rS18{3}, start offset: 0, stop offset: 0
}
```
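
As a rough worked check of the printed domains, the sketch below multiplies the extents of each domain for concrete sizes. The function names and the sample sizes are assumptions for illustration only; the extents themselves are copied from the "After" output for T2. With sizes that divide evenly (so no padding from ceilDiv is involved), the leaf domain covers the same number of elements as the root domain, while the printed rfactor domain covers three times as many, consistent with the factor 3 being counted once inside iS9{i1} and again inside iS15.

```cpp
#include <cassert>
#include <cstdint>

// Integer ceiling division for positive divisors.
std::int64_t ceilDiv(std::int64_t a, std::int64_t b) {
  assert(b > 0);
  return (a + b - 1) / b;
}

// Root domain of T2 as printed: (iS9{i1}, rS10{i2}).
std::int64_t rootElements(std::int64_t i1, std::int64_t i2) {
  return i1 * i2;
}

// Leaf domain of T2 as printed:
// [iS11{ceilDiv(i1, 3)}, iS15{3 * ceilDiv(i2, 4)}, rS14{4}].
std::int64_t leafElements(std::int64_t i1, std::int64_t i2) {
  return ceilDiv(i1, 3) * (3 * ceilDiv(i2, 4)) * 4;
}

// Rfactor domain of T2 as printed:
// (iS9{i1}, iS15{3 * ceilDiv(i2, 4)}, rS14{4}).
std::int64_t printedRfactorElements(std::int64_t i1, std::int64_t i2) {
  return i1 * (3 * ceilDiv(i2, 4)) * 4;
}
```

For example, with i1 = 6 and i2 = 8 the root and leaf domains both cover 48 elements, whereas the printed rfactor domain covers 144.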
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this issue Nov 10, 2022