Optimize `Hash` for repeated removals and insertions #14539
Merged
Hash deletions do not clear `@indices`, so subsequent insertions with the same key cannot use those slots, and effectively behave like hash collisions. This PR adds an extra sentinel value for deleted index slots; they can be later filled in and, unlike the empty sentinel, do not halt index scans. See https://forum.crystal-lang.org/t/hash-delete-followed-by-insert-performance-issues/6784 for a discussion. Credit goes to @homonoidian for discovering this.
Benchmark:
Before:
After:
This scenario now runs in O(1) instead of O(n) time. Note, however, that deletion now goes the opposite way: it runs in O(n) time in the number of hash collisions, instead of O(1). That means it is possible to craft other scenarios where running time grows from linear to quadratic:
Benchmark:
Before:
After:
Note that checking `Hash::Entry#deleted?` doesn't suffice because that also returns true for elements in `@entries` which were previously unused. Also, this PR doesn't change how `@entries` is used, and `#do_compaction` will still be called every now and then whenever `@entries` reaches its capacity (similar to alternating `#push` and `#shift` calls on an `Array`).
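To illustrate the idea behind the deleted-slot sentinel, here is a minimal, language-neutral sketch in Python. It is not Crystal's actual `Hash` implementation: `ToyHash`, `churn`, `use_tombstones`, and the `probes` counter are hypothetical names invented for this example, and the model assumes linear probing over a fixed-size index with no resizing or compaction. With `use_tombstones=False` it mimics the old behaviour, where a deleted key leaves a stale index slot behind that later scans must step over.

```python
EMPTY = -1    # slot never used: a scan may stop here
DELETED = -2  # tombstone: slot can be refilled, but a scan must keep going


class ToyHash:
    """Toy open-addressing hash (hypothetical model, fixed power-of-two capacity)."""

    def __init__(self, capacity=256, use_tombstones=True):
        self.indices = [EMPTY] * capacity  # slot -> position in self.entries
        self.entries = []                  # append-only; None marks a deleted entry
        self.use_tombstones = use_tombstones
        self.probes = 0                    # index slots inspected so far

    def _scan(self, key):
        """Return (slot holding key or None, slot a new entry should use)."""
        mask = len(self.indices) - 1
        slot = hash(key) & mask
        reusable = None
        while True:
            self.probes += 1
            idx = self.indices[slot]
            if idx == EMPTY:
                # Only the empty sentinel ends the scan.
                return None, (slot if reusable is None else reusable)
            if idx == DELETED:
                if reusable is None:
                    reusable = slot  # remember the first tombstone
            else:
                entry = self.entries[idx]
                if entry is not None and entry[0] == key:
                    return slot, reusable
            slot = (slot + 1) & mask

    def set(self, key, value):
        found, free = self._scan(key)
        if found is not None:
            self.entries[self.indices[found]] = (key, value)
        else:
            self.entries.append((key, value))
            self.indices[free] = len(self.entries) - 1

    def delete(self, key):
        found, _ = self._scan(key)
        if found is None:
            return
        self.entries[self.indices[found]] = None
        if self.use_tombstones:
            self.indices[found] = DELETED
        # Otherwise the stale slot stays and acts like a collision.


def churn(use_tombstones, rounds=100):
    """Insert and delete the same key repeatedly; return total slots inspected."""
    h = ToyHash(capacity=256, use_tombstones=use_tombstones)
    for i in range(rounds):
        h.set(42, i)
        h.delete(42)
    return h.probes


if __name__ == "__main__":
    print("with tombstones:   ", churn(True))
    print("without tombstones:", churn(False))
```

In this model the tombstone variant inspects a constant number of slots per round, while the variant without tombstones inspects one more slot on every round, so its total work grows quadratically with the number of rounds.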