Skip to content

Conversation

@Enchufa2
Copy link
Member

@Enchufa2 Enchufa2 commented Nov 1, 2025

Closes #1340. Not the cleanest of solutions, but minimal disruption of current code.

I can rebase this or the other one later depending on what you want to merge first.

Checklist

  • Code compiles correctly
  • R CMD check still passes all tests
  • Preferably, new tests were added which fail without the change
  • Document the changes by file in ChangeLog

Copy link
Member

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks pretty good at a glance, chapeau!

@eddelbuettel
Copy link
Member

I can rebase this or the other one later depending on what you want to merge first.

The "default" approach of squash-merging one at a time, and then rebasing whatever is outstanding or pending, works fine. The other open one may also take longer as we probably need to create and watch over a bunch of (simple) PRs in other repos.

@eddelbuettel
Copy link
Member

Reverse-depends are running, slowly as always. Currently just over 700 packages out of 3100 done, and the four failures are packages with known "expected" issues or in one case a spurious network issue. Fingers crossed.

@eddelbuettel eddelbuettel requested a review from Copilot November 2, 2025 12:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a bug where signed zeros (-0.0 and +0.0) in doubles were not being treated as equal during hash operations. The fix refactors the normalization logic into a dedicated normalize method and ensures it is consistently applied during value insertion and comparison.

  • Introduced a normalize method to centralize double normalization for signed zeros, NA, and NaN
  • Updated hash insertion logic to normalize values before hashing and comparison
  • Added test coverage for signed zero handling in the unique function

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
inst/include/Rcpp/hash/SelfHash.h Refactored normalization into a dedicated method and applied it consistently in hash operations
inst/include/Rcpp/hash/IndexHash.h Applied the same normalization refactoring as in SelfHash.h
inst/tinytest/test_sugar.R Added test case to verify correct handling of signed zeros
ChangeLog Documented the changes
Comments suppressed due to low confidence (2)

inst/include/Rcpp/hash/SelfHash.h:94

  • The get_index method does not normalize the input value before calling get_addr, and uses direct equality comparison instead of not_equal. This creates inconsistency with add_value_get_index which normalizes values. For doubles, this means looking up -0.0 may fail to find +0.0 that was stored. The value should be normalized at the start: value = normalize(value); and the comparison should use the not_equal helper for consistency.
        unsigned int get_index(STORAGE value) const {
            unsigned int addr = get_addr(value) ;
            while (data[addr]) {
              if (src[data[addr] - 1] == value)
                return data[addr];
              addr++;
              if (addr == static_cast<unsigned int>(m)) addr = 0;
            }
            return NA_INTEGER;
        }

inst/include/Rcpp/hash/IndexHash.h:199

  • The get_index method does not normalize the input value before calling get_addr, and uses direct equality comparison instead of not_equal. This creates inconsistency with add_value which normalizes values. For doubles, this means looking up -0.0 may fail to find +0.0 that was stored. The value should be normalized at the start: value = normalize(value); and the comparison should use the not_equal helper for consistency.
        inline uint32_t get_index(STORAGE value) const {
            uint32_t addr = get_addr(value) ;
            while (data[addr]) {
              if (src[data[addr] - 1] == value)
                return data[addr];
              addr++;
              if (addr == static_cast<uint32_t>(m)) addr = 0;
            }
            return NA_INTEGER;
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eddelbuettel
Copy link
Member

eddelbuettel commented Nov 2, 2025

I quickly turned copilot on here [as a one-off, for this PR] as I am starting to find it useful even just for typos like the ones it found here. I have so far been cheeky and then committed by hand, feel free to either ignore, or commits its suggestions and quickly drop a new commit. Hope you don't mind. 😉

Enchufa2 and others added 2 commits November 2, 2025 14:00
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Enchufa2
Copy link
Member Author

Enchufa2 commented Nov 2, 2025

Very useful indeed, thanks!

@eddelbuettel
Copy link
Member

Yep. So far it's always been one-offs from the 'review' box but it's generally been on point. Not bad.

@eddelbuettel
Copy link
Member

This one looks good per the commit just made containing the results from a full run (see RcppCore/rcpp-logs@aa8cc60). I think we can squash merge this and then rebase the other open PR. Which I need to re-run incrementally to get the install and check logs refreshed.

@eddelbuettel eddelbuettel merged commit 8b6d97c into master Nov 4, 2025
26 checks passed
@eddelbuettel eddelbuettel deleted the fix/unique branch November 4, 2025 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rcpp sugar indexHash (unique, sort_unique, etc.) distinguish signed and unsigned zeros

4 participants