Typo in get_impl? #6

Open
rescrv opened this issue Apr 27, 2014 · 5 comments
Comments

@rescrv

rescrv commented Apr 27, 2014

I believe there is a typo in get_impl here: https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L540

The line should instead read K == TOMBSTONE.

You'll note that key is what the user passed in, and users should never try to retrieve a TOMBSTONE. In fact, I think Java's type safety prevents them from even getting a reference to the TOMBSTONE.

This typo can affect the safety and efficiency of the get operation, as the hash table is no longer linearizable. A write that is then marked with a TOMBSTONE and copied to the new table will be set to TOMBSTONE. If the copying and the get race, the get could see a null and return the null, even though it should instead begin looking in the next table. It's a small race, but it's there.

It's also less efficient to reprobe up to reprobe_limit on larger tables, but what's a few extra cycles among friends ;-).
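To make the efficiency point concrete, here is a minimal sketch of a linear-probe loop. It is not the library's code: the class name, `probesBeforeGivingUp`, the `testSlotKey` flag, and the fixed `REPROBE_LIMIT` are all assumptions made up for illustration. It only shows how testing the caller's `key` against TOMBSTONE (which can never match) differs from testing the slot's `K`.

```java
final class TombstoneProbeSketch {
    // Sentinel stored in key slots; callers never hold a reference to it.
    static final Object TOMBSTONE = new Object();
    // Hypothetical fixed limit; the real library derives its limit from table size.
    static final int REPROBE_LIMIT = 4;

    /**
     * Returns how many slots are examined before the probe loop stops
     * (hit, empty slot, or "give up and go to the next table").
     * testSlotKey == true  models the proposed check  (K   == TOMBSTONE).
     * testSlotKey == false models the reported typo   (key == TOMBSTONE).
     */
    static int probesBeforeGivingUp(Object[] keys, Object key, boolean testSlotKey) {
        int mask = keys.length - 1;          // assumes a power-of-two length
        int idx = key.hashCode() & mask;
        int probes = 0;
        while (true) {
            Object K = keys[idx];            // key found in the slot
            probes++;
            if (K == null) return probes;    // empty slot: key is absent
            if (K != TOMBSTONE && K.equals(key)) return probes; // hit
            Object tested = testSlotKey ? K : key;
            if (probes >= REPROBE_LIMIT || tested == TOMBSTONE) {
                return probes;               // move on to the next table
            }
            idx = (idx + 1) & mask;          // linear reprobe
        }
    }
}
```

On a table whose key slots are all TOMBSTONE, the `K` variant stops after the first slot, while the `key` variant keeps reprobing until it hits the limit.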

@rescrv
Author

rescrv commented Apr 27, 2014

Ditto for putIfMatch.

@rescrv
Author

rescrv commented Apr 29, 2014

There are a couple other race conditions as well. If this lib is actively used, I'm happy to report them, but I'd like to avoid typing them up if the effort would be wasted.

@moonpolysoft
Contributor

Yes please do, it's in active use in a number of different places.

@rescrv
Author

rescrv commented May 14, 2014

Here are the other major "gotcha" cases I found. For reference, my C++ implementation is here and is what we're using in HyperDex now.

The resize method makes a chain of inner tables. Although it's extremely unlikely, it's possible for the recursive putIfMatch call to overrun the stack. I saw this in an application with more threads than cores, where one thread was forced to wait to run. By the time it ran, the other threads had constructed many new tables that the global table had promoted past. These intermediary tables were necessarily filled with tombstones, but the straggler thread would still attempt to resize them using the copy helper. Of course, this copy helper would step down to the next table, and repeat. Eventually it overran the stack. Tuning the table resize rate can significantly decrease the likelihood of this race condition. A more solid fix, which I use in my impl, is to record the resize number at which each inner table was established. Upon entry to the putIfMatch call, I skip ahead to the top-most table accessible from the outer hash map. This allows a straggler to always work on a copy of the inner table where it can do useful work, without scanning tables that are definitely fully copied.
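A rough sketch of that skip-ahead idea in Java follows. It is neither the C++ implementation nor anything in high-scale-lib: `Table`, `generation`, `top`, `resizeAndPromote`, and `entryTable` are hypothetical names, and the resize itself is simulated single-threaded just to build a long chain.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SkipAheadSketch {
    /** One inner table in the resize chain (hypothetical shape). */
    static final class Table {
        final long generation;  // resize number at which this table was established
        final AtomicReference<Table> next = new AtomicReference<>(); // successor table
        Table(long generation) { this.generation = generation; }
    }

    /** The table currently published by the outer map. */
    final AtomicReference<Table> top = new AtomicReference<>(new Table(0));

    /** Simulates one completed resize: chain a successor and promote it. */
    Table resizeAndPromote() {
        Table cur = top.get();
        Table nxt = new Table(cur.generation + 1);
        cur.next.set(nxt);
        top.set(nxt);
        return nxt;
    }

    /**
     * Entry step for a writer holding a possibly stale table reference.
     * If the outer map has already promoted past the held table, jump
     * straight to the published one instead of walking (or recursing
     * through) every fully copied table in between.
     */
    Table entryTable(Table held) {
        Table current = top.get();
        return held.generation < current.generation ? current : held;
    }
}
```

A straggler that still holds generation 0 after a thousand resizes gets the published table back in one step, so the depth of any later recursion no longer depends on how far behind it fell.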

I also thought the counter implementation was racy during a resize, but it looks like it's doing the right thing.

moonpolysoft reopened this May 14, 2014
@rescrv
Author

rescrv commented May 19, 2014

The other issue I forgot about and didn't include was the "clear" call. It doesn't behave well with resizes, especially stacked resizes. I opted to remove it completely.
