
folly::ConcurrentHashMap crashes under high contention #2097

Open
DanielLiamAnderson opened this issue Nov 13, 2023 · 1 comment
Under significant contention with a specific key distribution, folly::ConcurrentHashMap fails internal checks inside HazptrObjLinked.h. I've reproduced this on multiple machines.

OS: Ubuntu 20.04.6 LTS (reproduced on Ubuntu 22.04)
Compiler: GCC 13.1.0 (reproduced on GCC 11.4.0)

The following minimal example consistently reproduces the bug.

#include <cmath>
#include <cstdint>

#include <latch>
#include <limits>
#include <random>
#include <thread>
#include <vector>

#include <folly/concurrency/ConcurrentHashMap.h>
#include <folly/init/Init.h>


struct KeyDist {

  KeyDist(uint64_t num_items, double zipfian_const) :
      items_(num_items), theta_(zipfian_const), zeta_n_(num_items) {
    zeta_2_ = Zeta(2, theta_);
    zeta_n_ = Zeta(num_items, theta_);
    alpha_ = 1.0 / (1.0 - theta_);
    eta_ = Eta();
  }

  uint64_t operator () (size_t i) const {
    uint64_t r = hash64(i);
    double u = static_cast<double>(r) / static_cast<double>(std::numeric_limits<uint64_t>::max()); // uniform between 0.0 and 1.0
    double uz = u * zeta_n_;
    if (uz < 1.0) return 0;
    if (uz < 1.0 + std::pow(0.5, theta_)) return 1;
    return std::lround((items_-1) * std::pow(eta_ * u - eta_ + 1, alpha_));
  }

  double Eta() {
    return (1 - std::pow(2.0 / items_, 1 - theta_)) / (1 - zeta_2_ / zeta_n_);
  }

  static uint64_t hash64(uint64_t u) {
    uint64_t v = u * 3935559000370003845ul + 2691343689449507681ul;
    v ^= v >> 21;
    v ^= v << 37;
    v ^= v >> 4;
    v *= 4768777513237032717ul;
    v ^= v << 20;
    v ^= v >> 41;
    v ^= v << 5;
    return v;
  }

  static double Zeta(uint64_t cur_num, double theta) {
    double result = 0.0;
    for (uint64_t i = 0; i < cur_num; i++) {
      result += 1.0 / std::pow(i + 1, theta);
    }
    return result;
  }

  uint64_t items_;
  double theta_, zeta_n_, eta_, alpha_, zeta_2_;
};


int main(int argc, char* argv[]) {

  // Should be set to a number much larger than the number of hardware threads
  constexpr unsigned num_threads = 100;

  folly::Init init(&argc, &argv);

  std::vector<std::jthread> threads;
  threads.reserve(num_threads);
  
  KeyDist z(100000, 0.99);
  
  folly::ConcurrentHashMap<long, long> T;
  
  std::latch start{num_threads};
  for (unsigned i = 0; i < num_threads; i++) {
    threads.emplace_back([&, i]() {
      start.arrive_and_wait();
      long result = 0;
      for (int j = 0; j < 1000000; j++) {
        const long key = static_cast<long>(z(j));
        if (j % 3 == 0) {
          auto v = T.find(key);
          if (v != T.end()) result += v->second;
        }
        else if (j % 3 == 1) {
          result += T.insert(std::make_pair(key, i)).second;
        }
        else {
          result += T.erase(key);
        }
      }
    });
  }
  
  for (auto& t : threads) {
    t.join();
  }
}

This outputs the following:

F1112 21:58:34.663033 12469 HazptrObjLinked.h:121] Check failed: oldval & kLinkMask < kLinkMask (4294901760 vs. 4294901760)
*** Check failure stack trace: ***
*** Aborted at 1699844314 (Unix time, try 'date -d @1699844314') ***
*** Signal 6 (SIGABRT) (0x3e8000030a8) received by PID 12456 (pthread TID 0x7feda4ed6640) (linux TID 12469) (maybe from PID 12456, UID 1000) (code: -6), stack trace: ***
./a.out(+0x2015a3)[0x557cafaeb5a3]
./a.out(+0x1bed9a)[0x557cafaa8d9a]
./a.out(+0x1bd1cf)[0x557cafaa71cf]
./a.out(+0x1bd2b6)[0x557cafaa72b6]
/lib/x86_64-linux-gnu/libc.so.6(+0x4251f)[0x7fedaaf7551f]
/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fedaafc9a7c]
/lib/x86_64-linux-gnu/libc.so.6(raise+0x15)[0x7fedaaf75475]
/lib/x86_64-linux-gnu/libc.so.6(abort+0xd2)[0x7fedaaf5b7f2]
./a.out(+0x512fc)[0x557caf93b2fc]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage4FailEv+0x12)[0x7fedab55eb02]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage9SendToLogEv+0xb80)[0x7fedab5669d0]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google10LogMessage5FlushEv+0xd1)[0x7fedab55e7c1]
/lib/x86_64-linux-gnu/libglog.so.0(_ZN6google15LogMessageFatalD2Ev+0xe)[0x7fedab56078e]
./a.out(+0x19108)[0x557caf903108]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xe62b2)[0x7fedab34c2b2]
/lib/x86_64-linux-gnu/libc.so.6(+0x94b42)[0x7fedaafc7b42]
/lib/x86_64-linux-gnu/libc.so.6(+0x1269ff)[0x7fedab0599ff]
(safe mode, symbolizer not available)
Aborted

We initially encountered this bug while running our own benchmarking code here, in which we consistently get a different crash, also coming from the hazard pointer code:

F1112 21:06:54.969465  1702 HazptrObj.h:182] Check failed: next_ == this (0x7fdd00556650 vs. 0x7fdd09e416b0)
*** Check failure stack trace: ***
Aborted
instw commented Nov 28, 2023

Thanks for the repro! This issue seems to be due to ConcurrentHashMap not being able to reclaim deleted entries fast enough.
Each entry in the ConcurrentHashMap is referenced by its predecessor via a linked list, and this reference is tracked using the hazard pointer link-counting logic. Effectively, when an erase is performed, the old map entry isn't cleaned up immediately, but rather as part of a reclamation process that's usually asynchronous. If the rate at which keys are erased is considerably higher than the rate at which entries are reclaimed, certain internal counters (which are 16 bits in size) overflow. The first error you posted is caused directly by that overflow.

For the second: your benchmarking code most likely runs without debug compilation, which causes some invariant checks to be skipped, leading to double reclamation and the second error.

instw added a commit to instw/folly that referenced this issue Nov 30, 2023
Summary:
PROBLEM
Folly ConcurrentHashMaps use hazard pointers to ensure map entries that were recently removed (via `erase`, `insert_or_assign`, etc.) aren't cleaned up while there are readers of those objects. Instead, they are removed as part of a reclamation process which typically happens asynchronously. Moreover, within ConcurrentHashMap, entries are linked to one another, and this linkage needs to be known to the hazard pointer logic so that we don't clean up an object that has no direct hazard pointers itself but is referenced by another object that might. That logic lives in `HazptrObjLinked`.
Under high-contention situations (see facebook#2097), the link-counting logic can overflow, because a single object accumulates too many dangling links. For example, consider a map that has 2 entries with the same hash code: `(A,0)` and `(B,0)`. Assume that `A` is stored before `B` internally within the `ConcurrentHashMap`'s `BucketTable`, so `B` initially records 1 link (to `A`). Now suppose we replace `(A,0)` with `(A,1)`. While `(A,0)` is erased from the `ConcurrentHashMap`, it's not immediately reclaimed/deleted. During this interim, `B` has a link count of 2, one to each version of `A`. This link count is stored as a 16-bit unsigned integer. If such replacements happen quickly enough, `B`'s link count overflows past 65535 and wraps around.
This situation is caught in debug compilation (due to `DCHECK`), but in opt builds it results in bad retirements. For example, if `B`'s link count goes past 65535 to 65537 (i.e. `1`), then when one stale version of `A` is reclaimed, `B`'s link count decrements past `1` back to `0`, causing `B` to be incorrectly retired. If we then remove all remaining versions of `A`, the link count underflows from `0` back to `65535` and eventually to `0` again, causing a double retirement, a sign of corruption.

SOLUTION
While the situation is rare, it can arise for skewed data with a lot of contention. There are 3 options to "solve" this:
1. Increase the link count data structure size from 16 bits to something larger - Simple, but a workaround. Eventually, high enough contention would cause bugs to show up there as well.
2. Crash the process when there is very high contention - Maintains the current performance guarantees, and when ConcurrentHashMap cannot meet those guarantees, it causes a fatal error.
3. Slow ConcurrentHashMap erasures under high contention (this diff) - Very high contention causes ConcurrentHashMap to slow down and give reclamation time to act. Functionally `ConcurrentHashMap` remains the same, but it exhibits different perf characteristics.

In this change, the `HazptrObjLinked` code is changed to disallow overflows, since they lead to corruption, and callers are responsible for handling cases where links cannot be created. For `ConcurrentHashMap`, we keep waiting until we can acquire a link, which means erasures under high contention are lock-free but not wait-free.
For reclamation, there are buffers within the cohort to store both retired objects (aka the `list`) and reclaimed objects (aka the `safe list`). In cases where `ConcurrentHashMap` is unable to acquire a link, it's imperative that it tries to initiate a reclamation cycle to make progress, so I added a `cleanup()` method to the cohort to flush any existing retired objects to the hazard pointer domain for retirement evaluation, kick off a reclamation cycle, and also retire any objects pending within the cohort.

Differential Revision: D51647789
facebook-github-bot pushed a commit that referenced this issue Dec 7, 2023
Summary:
PROBLEM
For linked objects, the ref count and link count are stored in 16-bit integers. It's easily possible to overflow either count.

SOLUTION
Increase the counts to be stored using 32 bits. While this theoretically increases the footprint of the `hazptr_obj_base_linked` struct from 28 bytes to 32 bytes, the increase is small; and in many cases struct padding would cause `hazptr_obj_base_linked` to be 32 bytes anyway.

Note: This is a mitigation for #2097. Theoretically it's possible to increase contention further and cause the issue to recur, but in practice that's hard to do. See #2107 for a potential fix to the underlying issue.

Reviewed By: ot

Differential Revision: D51829892

fbshipit-source-id: f3ef8f7cf245dd7ff0e1ba6f6ee7bb15ead532ef
facebook-github-bot pushed a commit to facebook/hhvm that referenced this issue Dec 7, 2023