dpctl.tensor.floor_divide fixed for signed 0 output #1271
View rendered docs @ https://intelpython.github.io/dpctl/pulls/1271/index.html
Array API standard conformance tests for dpctl=0.14.5dev0=py310h7bf5fec_5 ran successfully.
Force-pushed from 25f0109 to 01f4571.
Array API standard conformance tests for dpctl= ran successfully.
Force-pushed from 01f4571 to c31d020.
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_51 ran successfully.
- Instead of computing both the division and the modulo for every element of the sycl::vec, the result vector is now initialized and then filled per element.
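To make the per-element approach concrete, here is a minimal host-side C++ sketch. It assumes std::array as a stand-in for sycl::vec (so it compiles without a SYCL toolchain), and the helper names floor_div_scalar and floor_div_vec are hypothetical, not dpctl's. The scalar step uses / and % as in the snippet quoted in the review, and the result array is value-initialized and then filled one element at a time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical scalar helper: floor division using '/' and '%'.
// Assumes b != 0 and ignores the overflow of (min value) / -1.
template <typename T>
T floor_div_scalar(T a, T b)
{
    assert(b != 0);
    T q = a / b;  // integer division truncates toward zero
    if constexpr (std::is_signed_v<T>) {
        T mod = a % b;  // zero, or has the sign of a
        // A nonzero remainder whose sign differs from b means truncation
        // rounded the quotient up, so step down to the floor.
        if (mod != 0 && ((mod < 0) != (b < 0))) {
            --q;
        }
    }
    return q;
}

// std::array stands in for sycl::vec<T, N>: the result is value-initialized
// up front and then filled element by element.
template <typename T, std::size_t N>
std::array<T, N> floor_div_vec(const std::array<T, N> &in1,
                               const std::array<T, N> &in2)
{
    std::array<T, N> res{};
    for (std::size_t i = 0; i < N; ++i) {
        res[i] = floor_div_scalar(in1[i], in2[i]);
    }
    return res;
}
```

For example, floor_div_vec on {7, -7, 7, -7} and {2, 2, -2, -2} yields {3, -4, -4, 3}; for unsigned types the sign adjustment is compiled out by if constexpr.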
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_55 ran successfully.
    else {
        res[i] = in1[i] / in2[i];
        if constexpr (std::is_signed_v<resT>) {
            auto mod = in1[i] % in2[i];
This performs a second division. How about mod = in1[i] - res[i] * in2[i] instead?
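A minimal sketch of the suggested alternative, in plain C++ rather than SYCL: the remainder is reconstructed from the truncated quotient with one multiply and one subtract, then used for the same floor adjustment. floor_div_mul is a hypothetical name; the sketch assumes b != 0 and does not handle the overflow of the most negative value divided by -1.

```cpp
#include <cassert>
#include <type_traits>

// Floor division for signed integers, deriving the remainder from the
// quotient instead of issuing a second division via '%'.
// Assumes b != 0; (min value) / -1 overflow is not handled.
template <typename T>
T floor_div_mul(T a, T b)
{
    static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                  "signed integer types only");
    assert(b != 0);
    T q = a / b;        // truncates toward zero
    T mod = a - q * b;  // equals a % b, without another division
    if (mod != 0 && ((mod < 0) != (b < 0))) {
        --q;            // step down from the truncated quotient to the floor
    }
    return q;
}
```

Whether this beats a nearby / and % pair depends on the compiler and target, which is the point raised in the reply.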
I will try this out as well; I'd like to see how the performance compares.
I have also seen sources suggest that the remainder is, in some cases, a byproduct of the division, and that the compiler can optimize the two operations together when they are close to each other.
See, e.g., the notes here.
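For comparison, standard C++ can also request the quotient and remainder together through std::lldiv from <cstdlib>. This is shown as host code only: whether it is faster than letting the compiler combine a nearby / and % is not measured here, and availability in SYCL device code is not assumed. floor_div_stddiv is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>

// Floor division built on std::lldiv, which returns the truncated quotient
// and the remainder in one call. Assumes b != 0.
long long floor_div_stddiv(long long a, long long b)
{
    assert(b != 0);
    const std::lldiv_t qr = std::lldiv(a, b);
    long long q = qr.quot;
    // Same floor adjustment as the '/' and '%' formulation.
    if (qr.rem != 0 && ((qr.rem < 0) != (b < 0))) {
        --q;
    }
    return q;
}
```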
oleksandr-pavlyk left a comment:
LGTM @ndgrigorian. Thank you.
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞
Array API standard conformance tests for dpctl=0.14.6dev0=py310h7bf5fec_58 ran successfully.
On some devices, sycl::floor and std::floor would drop the sign of 0. This PR resolves those cases and adds a test.
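The signed-zero issue can be illustrated with a small host-side sketch, assuming floating-point floor division is computed as floor(a / b). floor_div_signed_zero is a hypothetical name and this is not dpctl's kernel: it only shows the idea of restoring the sign of a zero result from the unrounded quotient with std::copysign. On a conforming host, std::floor(-0.0) already returns -0.0, so the guard only changes results on devices with the behavior described above.

```cpp
#include <cassert>
#include <cmath>

// Floor division as floor(a / b), with a guard that restores the sign of a
// zero result. Rounding edge cases of floor(a / b) are not addressed here.
double floor_div_signed_zero(double a, double b)
{
    const double div = a / b;
    double res = std::floor(div);
    if (res == 0.0) {
        // +0.0 == -0.0, so this catches both; copy the sign of the
        // unrounded quotient onto the zero.
        res = std::copysign(0.0, div);
    }
    return res;  // NaN and nonzero results pass through unchanged
}
```

For example, floor_div_signed_zero(-0.0, 1.0) and floor_div_signed_zero(0.0, -1.0) both return -0.0, while floor_div_signed_zero(-7.0, 2.0) returns -4.0.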