Special case bool dereferencing (consistent with PyTorch) #410
jrbyrnes wants to merge 1 commit into ROCm:develop from
|
@jrbyrnes To work around your problem, you could create a |
|
Hi @nolmoonen, thanks for your thoughts! In fact, pushing it back to PyTorch was my first response. However, there has been a specific ask to solve this via ROCm, since the same behavior (i.e. the aliasing violation) lets the test pass transparently on CUDA, yet makes it fail on HIP (ref SWDEV-357998). If you are fundamentally opposed to this patch, please confirm and I will see what I can do in PyTorch. Thanks. |
|
Hi @nolmoonen, my understanding is that bool *val = static_cast<bool *>(temp) doesn't break the strict aliasing rule if the underlying object type is a character type. In the code segment above, it depends on the underlying object assigned to x. That said, I'm not implying that rocPRIM is the correct place to fix this issue, as I'm sure there are other users of transform_iterator that would be impacted by this change. |
|
We are currently investigating the consequences of applying the PR. Aside from that:
bool* x = // some data, true or false
uint8_t* tmp = reinterpret_cast<uint8_t*>(x);
// dereference tmp and write 0 if evals to false, 2-255 if evals to true, but do check that sizeof(bool) == sizeof(uint8_t)
// dereference x and check that it is equal to original array
The "Type aliasing" section of https://en.cppreference.com/w/cpp/language/reinterpret_cast states these rules. |
|
Sorry, correction: the example I gave is also not allowed. While manipulating the data through the |
|
We think this may be fixed in PyTorch and therefore not require any change to rocPRIM. @stanleytsang-amd will confirm. |
A PyTorch test has (roughly) the following implementation
This is invoking undefined behavior.
A PyTorch PR identified this problem and resolved it for the CPU case. It special-cases dereferencing bool * (via c10::load), and c10::load is used when iteratively performing the nonzero op on the tensor data.
This patch extends the fix to our iterators.
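To make the bool special case described above concrete, here is a minimal sketch of what such a load helper could look like. It is not the actual c10::load, nor anything from rocPRIM: the names load_impl, load_value and count_nonzero are hypothetical and chosen only for illustration. The idea is that bool storage is read through unsigned char, which may alias any object and has no invalid bit patterns, and is then normalized to true or false.

```cpp
#include <cstddef>

// Hypothetical helper, NOT the real c10::load or a rocPRIM API.
// Generic case: a plain typed load.
template <typename T>
struct load_impl {
    static T apply(const void* src) { return *static_cast<const T*>(src); }
};

// bool case: read the byte as unsigned char and compare against zero,
// so byte values 2..255 map to true instead of being read as an invalid bool.
template <>
struct load_impl<bool> {
    static bool apply(const void* src) {
        static_assert(sizeof(bool) == sizeof(unsigned char),
                      "sketch assumes a one-byte bool");
        return *static_cast<const unsigned char*>(src) != 0;
    }
};

template <typename T>
T load_value(const void* src) { return load_impl<T>::apply(src); }

// Counts the true entries in storage that is nominally an array of bool.
inline std::size_t count_nonzero(const void* data, std::size_t n) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += load_value<bool>(bytes + i) ? 1 : 0;
    }
    return count;
}
```

With this sketch, a buffer holding the bytes 0, 1, 2, 255 is treated as false, true, true, true regardless of how the bytes were written. An iterator that routes its dereference through such a helper would get the same behavior, which is the general direction this patch describes.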