
folly::ConcurrentHashMap crashes under high contention #2097

Open
DanielLiamAnderson opened this issue Nov 13, 2023 · 1 comment
Under significant contention with a specific key distribution, folly::ConcurrentHashMap fails internal checks inside HazptrObjLinked.h. I've reproduced this on multiple machines.

OS: Ubuntu 20.04.6 LTS (reproduced on Ubuntu 22.04)
Compiler: GCC 13.1.0 (reproduced on GCC 11.4.0)

The following minimal example consistently reproduces the bug.

#include <cmath>
#include <cstdint>

#include <latch>
#include <limits>
#include <random>
#include <thread>
#include <vector>

#include <folly/concurrency/ConcurrentHashMap.h>
#include <folly/init/Init.h>


struct KeyDist {

  KeyDist(uint64_t num_items, double zipfian_const) :
      items_(num_items), theta_(zipfian_const), zeta_n_(num_items) {
    zeta_2_ = Zeta(2, theta_);
    zeta_n_ = Zeta(num_items, theta_);
    alpha_ = 1.0 / (1.0 - theta_);
    eta_ = Eta();
  }

  uint64_t operator () (size_t i) const {
    uint64_t r = hash64(i);
    double u = static_cast<double>(r) / static_cast<double>(std::numeric_limits<uint64_t>::max()); // uniform between 0.0 and 1.0
    double uz = u * zeta_n_;
    if (uz < 1.0) return 0;
    if (uz < 1.0 + std::pow(0.5, theta_)) return 1;
    return std::lround((items_-1) * std::pow(eta_ * u - eta_ + 1, alpha_));
  }

  double Eta() {
    return (1 - std::pow(2.0 / items_, 1 - theta_)) / (1 - zeta_2_ / zeta_n_);
  }

  static uint64_t hash64(uint64_t u) {
    uint64_t v = u * 3935559000370003845ul + 2691343689449507681ul;
    v ^= v >> 21;
    v ^= v << 37;
    v ^= v >> 4;
    v *= 4768777513237032717ul;
    v ^= v << 20;
    v ^= v >> 41;
    v ^= v << 5;
    return v;
  }

  static double Zeta(uint64_t cur_num, double theta) {
    double result = 0.0;
    for (uint64_t i = 0; i < cur_num; i++) {
      result += 1.0 / std::pow(i + 1, theta);
    }
    return result;
  }

  uint64_t items_;
  double theta_, zeta_n_, eta_, alpha_, zeta_2_;
};


int main(int argc, char* argv[]) {

  // Should be set to a number much larger than the number of hardware threads
  constexpr unsigned num_threads = 100;

  folly::Init init(&argc, &argv);

  std::vector<std::jthread> threads;
  threads.reserve(num_threads);
  
  KeyDist z(100000, 0.99);
  
  folly::ConcurrentHashMap<long, long> T;
  
  std::latch start{num_threads};
  for (unsigned i = 0; i < num_threads; i++) {
    threads.emplace_back([&, i]() {
      start.arrive_and_wait();
      long result = 0;
      for (int j = 0; j < 1000000; j++) {
        const long key = static_cast<long>(z(j));
        if (j % 3 == 0) {
          auto v = T.find(key);
          if (v != T.end()) result += v->second;
        }
        else if (j % 3 == 1) {
          result += T.insert(std::make_pair(key, i)).second;
        }
        else {
          result += T.erase(key);
        }
      }
    });
  }
  
  for (auto& t : threads) {
    t.join();
  }
}

This outputs the following:

F1112 21:58:34.663033 12469 HazptrObjLinked.h:121] Check failed: oldval & kLinkMask < kLinkMask (4294901760 vs. 4294901760)
*** Check failure stack trace: ***
*** Aborted at 1699844314 (Unix time, try 'date -d @1699844314') ***
*** Signal 6 (SIGABRT) (0x3e8000030a8) received by PID 12456 (pthread TID 0x7feda4ed6640) (linux TID 12469) (maybe from PID 12456, UID 1000) (code: -6), stack trace: ***
./a.out(+0x2015a3)[0x557cafaeb5a3]
./a.out(+0x1bed9a)[0x557cafaa8d9a]
./a.out(+0x1bd1cf)[0x557cafaa71cf]
./a.out(+0x1bd2b6)[0x557cafaa72b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x4251f)[0x7fedaaf7551f]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fedaafc9a7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x15)[0x7fedaaf75475]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd2)[0x7fedaaf5b7f2]
./a.out(+0x512fc)[0x557caf93b2fc]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage4FailEv+0x12)[0x7fedab55eb02]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage9SendToLogEv+0xb80)[0x7fedab5669d0]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage5FlushEv+0xd1)[0x7fedab55e7c1]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google15LogMessageFatalD2Ev+0xe)[0x7fedab56078e]
./a.out(+0x19108)[0x557caf903108]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xe62b2)[0x7fedab34c2b2]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b42)[0x7fedaafc7b42]
/lib/x86_64-linux-gnu/libc.so.6(+0x1269ff)[0x7fedab0599ff]
(safe mode, symbolizer not available)
Aborted

We initially encountered this bug while running our own benchmarking code here, in which we consistently get a different crash, also coming from the hazard pointer code:

F1112 21:06:54.969465  1702 HazptrObj.h:182] Check failed: next_ == this (0x7fdd00556650 vs. 0x7fdd09e416b0)
*** Check failure stack trace: ***
Aborted
instw commented Nov 28, 2023

Thanks for the repro! This issue seems to be due to ConcurrentHashMap not being able to reclaim deleted entries fast enough.
Each entry in the ConcurrentHashMap is referenced by its predecessor via a linked list, and this reference is tracked using the hazard pointer link-counting logic. Effectively, when an erase is performed, the old map entry isn't cleaned up immediately, but rather as part of a reclamation process that's usually asynchronous. If the rate at which keys are erased is considerably higher than the rate at which entries are reclaimed, certain internal counters (which are 16 bits in size) overflow. The first error you posted is caused directly by that overflow.

For the second: your benchmarking code most likely runs without debug compilation, which causes some invariant checks to be skipped, leading to double reclamation and the second error.

instw added a commit to instw/folly that referenced this issue Nov 30, 2023
Summary:
PROBLEM
Folly ConcurrentHashMaps use hazard pointers to ensure map entries that were recently removed (via `erase`, `insert_or_assign`, etc.) aren't cleaned up while there are readers of those objects. Instead, they are removed as part of a reclamation process which typically happens asynchronously. Moreover, within ConcurrentHashMap, entries are linked to one another, and this linkage needs to be known to the hazard pointer logic so that we don't clean up an object that has no direct hazard pointers itself but is referenced by another object that might. That logic lives in `HazptrObjLinked`.
Under high-contention situations (see facebook#2097), the link-counting logic can overflow, because a single object accumulates too many dangling links. For example, consider a map that has 2 entries with the same hash code: `(A,0)` and `(B,0)`. Assume that `A` is stored before `B` internally within the `ConcurrentHashMap`'s `BucketTable`, so `B` initially records 1 link (to `A`). Now suppose we replace `(A,0)` with `(A,1)`. While `(A,0)` is erased from the `ConcurrentHashMap`, it's not immediately reclaimed/deleted. During this interim, `B` has a link count of 2, one to each version of `A`. This link count is stored as a 16-bit unsigned integer. If such replacements happen quickly enough, `B`'s link count overflows past 65535 and wraps around.
This situation is caught in debug compilation (due to `DCHECK`), but in opt builds it results in bad retirements. For example, if `B`'s link count goes past 65535 to 65537 (i.e. `1`), then when one stale version of `A` is reclaimed, `B`'s link count decrements past `1` back to `0`, causing `B` to be incorrectly retired. If we then remove all remaining versions of `A`, the link count underflows from `0` back to `65535` and eventually to `0` again, causing a double retirement, a sign of corruption.

SOLUTION
While the situation is rare, it can arise for skewed data with a lot of contention. There are 3 options to "solve" this:
1. Increase the link count data structure size from 16 bits to something larger - Simple, but a workaround. Eventually, high enough contention would cause bugs to show up there as well.
2. Crash the process when there is very high contention - Maintains the current performance guarantees, and when ConcurrentHashMap cannot meet those guarantees, it causes a fatal error.
3. Slow ConcurrentHashMap erasures under high contention (this diff) - Very high contention causes ConcurrentHashMap to slow down and give reclamation time to act. Functionally `ConcurrentHashMap` remains the same, but it exhibits different perf characteristics.

In this change, the `HazptrObjLinked` code is changed to disallow overflows, since they lead to corruption, and callers are responsible for handling cases where links cannot be created. For `ConcurrentHashMap`, we keep waiting until we can acquire a link, which means erasures under high contention are lock-free but not wait-free.
For reclamation, there are buffers within the cohort to store both retired objects (aka the `list`) and reclaimed objects (aka the `safe list`). In cases where `ConcurrentHashMap` is unable to acquire a link, it's imperative that it tries to initiate a reclamation cycle to make progress, so I added a `cleanup()` method to the cohort to flush any existing retired objects to the hazard pointer domain for retirement evaluation, kick off a reclamation cycle, and also retire any objects pending within the cohort.

Differential Revision: D51647789
facebook-github-bot pushed a commit that referenced this issue Dec 7, 2023
Summary:
PROBLEM
For linked objects, the ref count and link count are stored in 16-bit integers. It's easily possible to overflow either count.

SOLUTION
Increase the counts to be stored using 32 bits. While this theoretically increases the footprint of the `hazptr_obj_base_linked` struct from 28 bytes to 32 bytes, the increase is small; and in many cases struct padding would cause `hazptr_obj_base_linked` to be 32 bytes anyway.

Note: This is a mitigation for #2097. Theoretically it's possible to increase contention further and cause the issue to recur, but in practice that's hard to do. See #2107 for a potential fix to the underlying issue.

Reviewed By: ot

Differential Revision: D51829892

fbshipit-source-id: f3ef8f7cf245dd7ff0e1ba6f6ee7bb15ead532ef
facebook-github-bot pushed a commit to facebook/hhvm that referenced this issue Dec 7, 2023