Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Fix rehash check #74

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

RFC: Fix rehash check #74

wants to merge 1 commit into from

Conversation

kmsquire
Copy link
Member

@kmsquire kmsquire commented Jan 30, 2021

  • We should(?) rehash when 3/4 of the slots have been deleted, not when 3/4
    of the keys have been deleted. The previous logic caused us to rehash
    when, e.g., there were 5 keys and we deleted 4. The new logic
    requires at least 12 removals before a size-based rehash occurs.

The main bug with the previous logic was when the number of keys was less than 4, we would always rehash. See below for an alternative fix.

It's arguable what the best strategy is...

Either fix below fixes #65.

* We should rehash when 3/4 of the slots have been deleted, not when 3/4
  of the keys have been deleted.  The previous logic caused us to rehash
  when, e.g., there were 5 keys and we deleted 4.  The new logic
  requires at least 12 removals before a size-based rehash occurs.
@@ -311,7 +311,7 @@ function _setindex!(h::OrderedDict, v, key, index)
sz = length(h.slots)
cnt = nk - h.ndel
# Rehash now if necessary
if h.ndel >= ((3*nk)>>2) || cnt*3 > sz*2
if h.ndel >= ((3*sz)>>2) || cnt*3 > sz*2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, when nk was less than 4, we would rehash every time. This was probably unintended.

The fix here mimics Base.Dict, but the implementation there is different--keys are stored in an array the same shape/length as slots, whereas here, the keys are stored in a shorter vector that, after rehash, is only as long as the number of keys. (nk here is the number of keys before rehash--i.e., it includes deleted keys.)

This might suggest that the strategy should be different. An alternative could be, e.g.

Suggested change
if h.ndel >= ((3*sz)>>2) || cnt*3 > sz*2
if h.ndel >= ((3*nk)>>2) >= 3 || cnt*3 > sz*2

which basically requires at least 4 keys before anything happens.

@eulerkochy
Copy link
Member

Well, I'm in for this change, given that for general use cases(e.g batch deletion followed by batch insertion), this will be quite useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in OrderedDict (unlike in LittleDict, and Dict)
2 participants