Typo in get_impl? #6

rescrv · 2014-04-27T12:34:54Z

I believe there is a typo in get_impl here: https://github.com/boundary/high-scale-lib/blob/master/src/main/java/org/cliffc/high_scale_lib/NonBlockingHashMap.java#L540

The line should instead read K == TOMBSTONE.

You'll note that key is what the user passed in, and users should never try to retrieve a TOMBSTONE. In fact, I think Java's type safety prevents them from even getting a reference to the TOMBSTONE.

This typo can affect both the safety and the efficiency of the get operation, as the hash table is no longer linearizable. A write that is then marked with a TOMBSTONE and copied to the new table will be set to TOMBSTONE. If the copy and the get race, the get could see a null and return it, even though it should instead begin looking in the next table. It's a small race, but it's there.

It's also less efficient to reprobe up to reprobe_limit on larger tables, but what's a few extra cycles among friends ;-).
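
To make the difference concrete, here is a minimal sketch of a single-table probe loop. It is not the library's code: the flat keys/vals arrays, the linear probing, the `RETRY_IN_NEXT_TABLE` marker, and the `fixed` flag are all assumptions made for illustration. Only the comparison against TOMBSTONE mirrors the line discussed above.

```java
final class ProbeSketch {
    // Hypothetical sentinel for a key slot that has been killed in this table.
    static final Object TOMBSTONE = new Object();
    // Hypothetical marker meaning "give up on this table, look in the next one".
    static final Object RETRY_IN_NEXT_TABLE = new Object();

    // Linear-probe lookup over a power-of-two sized table.
    // fixed == false compares the caller's key to TOMBSTONE (the suspected typo);
    // fixed == true compares the slot's key K to TOMBSTONE (the proposed reading).
    static Object lookup(Object[] keys, Object[] vals, Object key,
                         int reprobeLimit, boolean fixed) {
        int mask = keys.length - 1;
        int idx = key.hashCode() & mask;
        int reprobes = 0;
        while (true) {
            Object K = keys[idx];
            if (K == null) {
                return null;                  // empty slot: reported as a miss
            }
            if (K == key || K.equals(key)) {
                return vals[idx];             // key found in this table
            }
            Object checked = fixed ? K : key; // which reference is tested
            if (++reprobes >= reprobeLimit || checked == TOMBSTONE) {
                return RETRY_IN_NEXT_TABLE;   // stop probing this table
            }
            idx = (idx + 1) & mask;           // keep probing
        }
    }
}
```

With a tombstoned slot followed by an empty slot, the `fixed == false` variant walks past the tombstone and reports a miss, while the `fixed == true` variant stops at the tombstone and defers to the next table. The same flag also shows the efficiency point: without the `K` check, only the reprobe limit or an empty slot ends the scan.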


rescrv · 2014-04-27T13:15:07Z

Ditto for putIfMatch.

rescrv · 2014-04-29T22:16:30Z

There are a couple other race conditions as well. If this lib is actively used, I'm happy to report them, but I'd like to avoid typing them up if the effort would be wasted.

moonpolysoft · 2014-05-12T22:12:58Z

Yes please do, it's in active use in a number of different places.

rescrv · 2014-05-14T01:26:45Z

Here are the other major "gotcha" cases I found. For reference, my C++ implementation is here and is what we're using in HyperDex now.

The resize method makes a chain of inner tables. Although it's extremely unlikely, it's possible for the recursive putIfMatch call to overrun the stack. I saw this in an application with more threads than cores, where one thread was forced to wait to run. By the time it ran, the other threads had constructed many new tables, and the global table had been promoted past all of them. These intermediate tables were necessarily filled with tombstones, but the straggler thread would still attempt to resize them using the copy helper. Of course, this copy helper would step down to the next table and repeat. Eventually it overran the stack.

Tuning the table resize rate can significantly decrease the likelihood of this race condition. A more solid fix, which I use in my impl, is to record the resize number at which each inner table was established. Upon entry to the putIfMatch call, I skip ahead to the top-most table accessible from the outer hash map. This lets a straggler always work on a copy of the inner table where it can do useful work, without scanning tables that are definitely fully copied.
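
The skip-ahead idea can be sketched roughly as follows. This is not the C++ implementation mentioned above, and none of these names come from the library: `Table`, `epoch`, `top`, `resize`, and `startingTable` are hypothetical, and the resize helper is deliberately single-threaded to keep the sketch short.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SkipAheadSketch {
    // Hypothetical inner table that remembers the resize number it was created at.
    static final class Table {
        final long epoch;     // resize count when this table was established
        volatile Table next;  // newer table in the chain, or null
        Table(long epoch) { this.epoch = epoch; }
    }

    // The table currently published by the outer map.
    final AtomicReference<Table> top = new AtomicReference<>(new Table(0));

    // Create a newer table, chain it, and promote it.
    // Simplified: a real resize would race with other threads and copy entries.
    Table resize() {
        Table old = top.get();
        Table fresh = new Table(old.epoch + 1);
        old.next = fresh;
        top.set(fresh);
        return fresh;
    }

    // On entry to a write, a thread holding a possibly stale table reference
    // jumps straight to the published table instead of walking every
    // intermediate table in the chain.
    Table startingTable(Table held) {
        Table current = top.get();
        return held.epoch < current.epoch ? current : held;
    }
}
```

A straggler that captured the table at epoch 0 and wakes up after thousands of resizes gets the current top table back in one comparison, so the depth of any later recursion no longer depends on how many tables were created while it was descheduled.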

I also thought the counter implementation was racy during a resize, but it looks like it's doing the right thing.

rescrv · 2014-05-19T18:23:26Z

The other issue I forgot about and didn't include was the "clear" call. It doesn't behave well with resizes, especially stacked resizes. I opted to remove it completely.

moonpolysoft closed this as completed May 13, 2014

moonpolysoft reopened this May 14, 2014
