@ndgrigorian ndgrigorian commented Feb 7, 2024

This pull request adds where.cpp, copy_and_cast_usm_to_usm.cpp, and the boolean reduction sources to the list of source files compiled with -fno-fast-math. This resolves issues with floating-point NaNs being interpreted as false when cast to bool on the CUDA backend.

Additionally, factors boolean_reductions.cpp and boolean_reductions.hpp into the reductions submodule.

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

Adds copy_and_cast_usm_to_usm.cpp, where.cpp, and _boolean_reduction_sources to _no_fast_math_sources

This is intended to fix discrepancies with NaNs on CUDA backend
Removes boolean_reductions.cpp and boolean_reductions.hpp
@ndgrigorian ndgrigorian force-pushed the add-no-fast-math-sources branch from ca1b56c to ae73692 Compare February 7, 2024 03:04
github-actions bot commented Feb 7, 2024

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

coveralls commented Feb 7, 2024

Coverage Status

coverage: 91.145% (remained the same) when pulling ae73692 on add-no-fast-math-sources into c578614 on master.

github-actions bot commented Feb 7, 2024

Array API standard conformance tests for dpctl=0.15.1dev3=py310h15de555_90 ran successfully.
Passed: 908
Failed: 1
Skipped: 86

@ndgrigorian

Examples of fixed functionality

In [1]: import dpctl.tensor as dpt

In [2]: x = dpt.full(10, dpt.nan, dtype="f4", device="cuda")

In [3]: dpt.astype(x, "?")
Out[3]:
usm_ndarray([ True,  True,  True,  True,  True,  True,  True,  True,
              True,  True])

In [4]: dpt.any(x)
Out[4]: usm_ndarray(True)

In [5]: dpt.where(x, dpt.asarray(1), dpt.asarray(0))
Out[5]: usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
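For comparison, here is a host-only sketch of the semantics the session above relies on. It uses plain Python lists rather than dpctl, so it needs neither a CUDA device nor any third-party package. The names `as_bool`, `any_true`, and `selected` are illustrative and not part of dpctl. The sketch assumes the cast to bool behaves like `value != 0`, under which NaN is truthy because it compares unequal to every value, including zero.

```python
# Host-only reference for NaN truthiness; no dpctl or GPU required.
nan = float("nan")
x = [nan] * 10

# Assumed cast rule: a float is True when it is not equal to zero.
# NaN != 0.0 evaluates to True, so every element becomes True.
as_bool = [v != 0.0 for v in x]

# Analogue of dpt.any(x): True if at least one element is truthy.
any_true = any(as_bool)

# Analogue of dpt.where(x, 1, 0): pick 1 where the condition holds.
selected = [1 if flag else 0 for flag in as_bool]

print(as_bool)
print(any_true)
print(selected)
```

The results here (all `True`, `True`, and all ones) agree with the device output shown above after the fix.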


@oleksandr-pavlyk oleksandr-pavlyk left a comment

Thank you for tracking the issue down and fixing it, @ndgrigorian.

LGTM!

@oleksandr-pavlyk oleksandr-pavlyk merged commit e2d9640 into master Feb 7, 2024
@oleksandr-pavlyk oleksandr-pavlyk deleted the add-no-fast-math-sources branch February 7, 2024 12:53