Skip to content
Permalink
Browse files
bmalloc mutex should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=177839

Reviewed by Michael Saboff.

Source/bmalloc:

This pulls the WordLock algorithm into bmalloc, mostly by copy-pasting the code. We need to
copy paste because sometimes we build WTF without bmalloc, so WTF cannot rely on bmalloc for
anything other than malloc.

Reland after fixing ancient WordLock bug: the notify_one has to happen with the lock held
to ensure it doesn't run after that thread has died.

* bmalloc/Algorithm.h:
(bmalloc::compareExchangeWeak):
(bmalloc::compareExchangeStrong):
* bmalloc/PerThread.h:
* bmalloc/StaticMutex.cpp:
(bmalloc::StaticMutex::lockSlow):
(bmalloc::StaticMutex::unlockSlow):
(bmalloc::StaticMutex::lockSlowCase): Deleted.
* bmalloc/StaticMutex.h:
(bmalloc::StaticMutex::try_lock):
(bmalloc::StaticMutex::isLocked const):
(bmalloc::StaticMutex::init):
(bmalloc::StaticMutex::tryLock):
(bmalloc::StaticMutex::lock):
(bmalloc::StaticMutex::unlock):
(bmalloc::sleep): Deleted.
(bmalloc::waitUntilFalse): Deleted.

Source/WTF:

Add some comments that I thought of while copy-pasting this code.

Reland after fixing ancient WordLock bug: the notify_one has to happen with the lock held
to ensure it doesn't run after that thread has died.

* wtf/LockAlgorithmInlines.h:
* wtf/WordLock.cpp:



Canonical link: https://commits.webkit.org/194217@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@222945 268f45cc-cd09-0410-ab3c-d52691b4dbfc
  • Loading branch information
pizlonator committed Oct 5, 2017
1 parent ec2143d commit 256169db88d04cf7440e0f39fca7ac601f5116b6
Showing 8 changed files with 350 additions and 57 deletions.
@@ -1,3 +1,18 @@
2017-10-04 Filip Pizlo <fpizlo@apple.com>

bmalloc mutex should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=177839

Reviewed by Michael Saboff.

Add some comments that I thought of while copy-pasting this code.

Reland after fixing ancient WordLock bug: the notify_one has to happen with the lock held
to ensure it doesn't run after that thread has died.

* wtf/LockAlgorithmInlines.h:
* wtf/WordLock.cpp:

2017-10-05 Carlos Alberto Lopez Perez <clopez@igalia.com>

Generate a compile error if release is built without compiler optimizations
@@ -33,6 +33,10 @@
// the lock algorithm slow path without recompiling the world. Right now this should be included in two
// places (Lock.cpp and JSCell.cpp).

// NOTE: It's a bug to use any memory order other than seq_cst in this code. The cost of seq_cst
// fencing is negligible on slow paths, so any use of a more relaxed memory model is all risk and no
// reward.

namespace WTF {

template<typename LockType, LockType isHeldBit, LockType hasParkedBit>
@@ -76,6 +76,10 @@ ThreadData* myThreadData()

} // anonymous namespace

// NOTE: It's a bug to use any memory order other than seq_cst in this code. The cost of seq_cst
// fencing is negligible on slow paths, so any use of a more relaxed memory model is all risk and no
// reward.

NEVER_INLINE void WordLockBase::lockSlow()
{
unsigned spinCount = 0;
@@ -223,7 +227,8 @@ NEVER_INLINE void WordLockBase::unlockSlow()
ASSERT(currentWordValue & isLockedBit);
ASSERT(currentWordValue & isQueueLockedBit);
ThreadData* queueHead = bitwise_cast<ThreadData*>(currentWordValue & ~queueHeadMask);
ASSERT(queueHead);
RELEASE_ASSERT(queueHead);
RELEASE_ASSERT(queueHead->shouldPark); // This would have been set before the thread was enqueued, so it must still be set now.

ThreadData* newQueueHead = queueHead->nextInQueue;
// Either this was the only thread on the queue, in which case we delete the queue, or there
@@ -256,10 +261,12 @@ NEVER_INLINE void WordLockBase::unlockSlow()
{
std::lock_guard<std::mutex> locker(queueHead->parkingLock);
queueHead->shouldPark = false;
// We have to do the notify while still holding the lock, since otherwise, we could try to
// do it after the queueHead has destructed. It's impossible for queueHead to destruct
// while we hold the lock, since it is either currently in the wait loop or it's before it
// so it has to grab the lock before destructing.
queueHead->parkingCondition.notify_one();
}
// Doesn't matter if we notify_all() or notify_one() here since the only thread that could be
// waiting is queueHead.
queueHead->parkingCondition.notify_one();

// The old queue head can now contend for the lock again. We're done!
}
@@ -1,3 +1,35 @@
2017-10-04 Filip Pizlo <fpizlo@apple.com>

bmalloc mutex should be adaptive
https://bugs.webkit.org/show_bug.cgi?id=177839

Reviewed by Michael Saboff.

This pulls the WordLock algorithm into bmalloc, mostly by copy-pasting the code. We need to
copy paste because sometimes we build WTF without bmalloc, so WTF cannot rely on bmalloc for
anything other than malloc.

Reland after fixing ancient WordLock bug: the notify_one has to happen with the lock held
to ensure it doesn't run after that thread has died.

* bmalloc/Algorithm.h:
(bmalloc::compareExchangeWeak):
(bmalloc::compareExchangeStrong):
* bmalloc/PerThread.h:
* bmalloc/StaticMutex.cpp:
(bmalloc::StaticMutex::lockSlow):
(bmalloc::StaticMutex::unlockSlow):
(bmalloc::StaticMutex::lockSlowCase): Deleted.
* bmalloc/StaticMutex.h:
(bmalloc::StaticMutex::try_lock):
(bmalloc::StaticMutex::isLocked const):
(bmalloc::StaticMutex::init):
(bmalloc::StaticMutex::tryLock):
(bmalloc::StaticMutex::lock):
(bmalloc::StaticMutex::unlock):
(bmalloc::sleep): Deleted.
(bmalloc::waitUntilFalse): Deleted.

2017-10-05 Matt Lewis <jlewis3@apple.com>

Unreviewed, rolling out r222893.
@@ -1,5 +1,5 @@
/*
* Copyright (C) 2014 Apple Inc. All rights reserved.
* Copyright (C) 2014-2017 Apple Inc. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
@@ -28,6 +28,7 @@

#include "BAssert.h"
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <cstddef>
#include <limits>
@@ -158,6 +159,24 @@ inline constexpr unsigned long log2(unsigned long value)
return bitCount<unsigned long>() - 1 - __builtin_clzl(value);
}

// We need a CAS API that isn't the badly designed one from C++.
template<typename T>
bool compareExchangeWeak(std::atomic<T>& word, T expected, T desired, std::memory_order order = std::memory_order_seq_cst)
{
// They could have made a sensible CAS API, but they didn't. Sneaky fact: for no good reason, the
// C++ API will mutate expected. I think that apologists will say something about how it saves you
// reloading the value around the back of your CAS loop, but that's nonsense. The delay along the
// back edge of a CAS loop has almost no impact on CAS loop performance. More often than not, we
// want to *add* delays to the back edge, not remove them.
return word.compare_exchange_weak(expected, desired, order, std::memory_order_relaxed);
}

template<typename T>
bool compareExchangeStrong(std::atomic<T>& word, T expected, T desired, std::memory_order order = std::memory_order_seq_cst)
{
return word.compare_exchange_strong(expected, desired, order, std::memory_order_relaxed);
}

} // namespace bmalloc

#endif // Algorithm_h
@@ -60,11 +60,11 @@ class PerThread {
static void destructor(void*);
};

#if HAVE_PTHREAD_MACHDEP_H

class Cache;
template<typename T> struct PerThreadStorage;

#if HAVE_PTHREAD_MACHDEP_H

// For now, we only support PerThread<PerHeapKind<Cache>>. We can expand to other types by
// using more keys.
template<> struct PerThreadStorage<PerHeapKind<Cache>> {
@@ -82,7 +82,7 @@ template<> struct PerThreadStorage<PerHeapKind<Cache>> {
}
};

#else
#endif

template<typename T> struct PerThreadStorage {
static bool s_didInitialize;
@@ -112,8 +112,6 @@ template<typename T> bool PerThreadStorage<T>::s_didInitialize;
template<typename T> pthread_key_t PerThreadStorage<T>::s_key;
template<typename T> std::once_flag PerThreadStorage<T>::s_onceFlag;

#endif

template<typename T>
BINLINE T* PerThread<T>::getFastCase()
{

0 comments on commit 256169d

Please sign in to comment.