Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: introduce a thread-safe generic container and use it to remove a bunch of "GlobalMutex"es #25390

Closed
wants to merge 9 commits into from

Conversation

vasild
Copy link
Contributor

@vasild vasild commented Jun 16, 2022

Introduce a generic container that provides a thread-safe access to any object by using a mutex which is acquired every time the object accessed.

For example:

Synced<std::unordered_map<int, int>> m{{3, 9}, {4, 16}};

{
    SYNCED_LOCK(m, m_locked);

    // m_locked represents the internal object, i.e. std::unordered_map,
    // while m_locked is in scope the internal mutex is locked.

    auto it = m_locked->find(3);
    if (it != m_locked->end()) {
        std::cout << it->second << std::endl;
    }

    for (auto& [k, v] : *m_locked) {
        std::cout << k << ", " << v << std::endl;
    }
}

WITH_SYNCED_LOCK(m, p, p->emplace(5, 25));

Remove the global mutexes g_maplocalhost_mutex, g_deadline_timers_mutex, cs_dir_locks, g_loading_wallet_mutex, g_wallet_release_mutex and use Synced<T> instead.

Benefits

copied from a comment below:

The Synced<T> abstraction is similar to what is suggested in this comment but it does so in a generic way to avoid code repetition. Its benefit is:

  1. It avoids code repetition at the implementation sites. See PR#26151 for a live example. Namely this:
Lots of repetitions (92 lines)
class Foo
{
public:
    void PushBack(x)
    {
        LOCK(m_mutex);
        m_data.push_back(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Foo::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::vector<int> m_data;
};

class Bar
{
public:
    void PushBack(x)
    {
        LOCK(m_mutex);
        m_data.push_back(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Bar::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::vector<std::string> m_data;
};

class Baz
{
public:
    void Insert(x)
    {
        LOCK(m_mutex);
        m_data.insert(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Baz::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::set<std::string> m_data;
};

becomes this:

Short (3 lines)
Synced<std::vector<int>> Foo;
Synced<std::vector<std::string>> Bar;
Synced<std::set<std::string>> Baz;
  1. The mutex is properly encapsulated. With a global mutex and a global variable annotated with GUARDED_BY() it is indeed not possible to add new code that accesses the variable without protection (if using Clang and -Wthread-safety-analysis and -Werror), but it is possible to abuse the mutex and start using it to protect some more, possibly unrelated stuff (we already have this in the current code).

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 16, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK hebasto
Approach ACK w0xlt, ajtowns
Stale ACK jonatack

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor

This seems like a potentially useful alternative to clang thread safety annotations. The ThreadSafePtr<T> class forces code to lock a mutex when accessing data, just like TSA annotations do, except unlike TSA annotations, it doesn't rely on a nonstandard compiler extension, or suffer from quirks that come from doing a limited static analysis.

ThreadSafePtr<T> is obviously not a complete substitute for thread safety annotations since it only handles the simple case where a single non-recursive Mutex is used to protect access to a single variable. But the variable can have any C++ type (primitive, container, or struct), so it's probably flexible enough for a lot of cases.

I don't think the ThreadSafePtr<T> is the best name because xxx_ptr<T> implies the type is some kind of lightweight reference to the T, not a container which holds the T. I would call it something like Synced<T> or LockedData<T>. (An analogy here would be std::optional<T>, which is not called std::optional_ptr<T>, even though it has * and -> members, because it's purpose is to be a container, not a pointer. The class is named after what it's used for not what kind of members it has.)

src/sync.h Outdated Show resolved Hide resolved
src/sync.h Outdated Show resolved Hide resolved
@DrahtBot DrahtBot mentioned this pull request Jun 16, 2022
@ajtowns
Copy link
Contributor

ajtowns commented Jun 17, 2022

This seems like a non-starter to me? It's not even able to detect obvious double locks at compile-time:

    ThreadSafePtr<std::map<int, int>> m;
    m->emplace(5, 25);
    {
        auto m_locked = *m;
        m_locked->emplace(6, 36);
        m->emplace(7, 49);  // double lock

        auto m_locked2 = *m; // double lock
        m_locked->emplace(9, 81);
        m_locked2->emplace(10, 100);
    }

Even if it weren't worse at catching bugs, it doesn't seem like an improvement over writing:

    Mutex mut;
    std::map<int, int> m GUARDED_BY(mut);

    WITH_LOCK(mut, m.emplace(5, 25));
    {
        LOCK(mut);
        m.emplace(6, 36);
        m.emplace(7, 49);  // no double lock

        LOCK(mut); // double lock - detected at compile time
        m.emplace(9, 81);
        m.emplace(10, 100);
    }

@ryanofsky
Copy link
Contributor

It's not even able to detect obvious double locks at compile-time:

I assume it can be annotated just like any other lock. Agree implementation should fix this, though.

Even if it weren't worse at catching bugs, it doesn't seem like an improvement over writing

Well one improvement is that it enforces locking on all compilers, unlike the clang annotations. The code itself doesn't seem much better or worse in this case for this very simple data structure, but it's probably is worth experimenting with for chains and chainstates and mempools, etc to be able to avoid logic bugs, distinguish different uses of cs_main for different purposes, and not just have LOCK(cs_main) everywhere what no indication about what exactly is being locked or why.

Looking at your examples, though I would even more want to replace the *m syntax with m.Lock() as suggested previously to make usage more obvious.

@vasild
Copy link
Contributor Author

vasild commented Jun 20, 2022

e577177a33...a870b2b891: address suggestions and remove optional commit that was a kind of scope creep for this PR - net: simplify logic around reachable networks and -onlynet, it will make it in a followup.

@ryanofsky very insightful review, thanks for the suggestions!

This seems like a potentially useful alternative to clang thread safety annotations. The ThreadSafePtr<T> class forces code to lock a mutex when accessing data, just like TSA annotations do, except...

Hmm, right, I did not think of this from that perspective. In addition - TSA do not actually "force" anything, they emit a mere warning if compiled with clang. They do nothing for gcc. And if -Werror is not used to turn the warning into an error, then they can be missed/ignored.

I see this as a complementary to TSA.

I don't think the ThreadSafePtr<T> is the best name because xxx_ptr<T> implies the type is some kind of lightweight reference to the T, not a container which holds the T. I would call it something like Synced<T> ...

Renamed to Synced<T>, thanks!

@ajtowns
Copy link
Contributor

ajtowns commented Jun 23, 2022

I assume it can be annotated just like any other lock. Agree implementation should fix this, though.

I think you'd have to mark the Synced object as being lock itself (LOCKABLE), and mark the Proxy class as being a RAII guard (SCOPED_LOCKABLE), with the constructor/destructor annotated appropriately (EXCLUSIVE_LOCK_FUNCTION(ref_to_scoped_obj) and UNLOCK_FUNCTION()), and with the functions that create the Proxy (Lock and operator->) annotated with negative constraints (EXCLUSIVE_LOCKS_REQUIRED(!this))? I've got pretty low confidence that that will actually work as hoped/expected though...

Well one improvement is that it enforces locking on all compilers, unlike the clang annotations.

We already enforce locking is correct via compiling with clang in CI; it's certainly an improvement to get those warnings earlier for anyone who's not using clang or doesn't have the options enabled, but [EDIT: oops, didn't finish the thought:] not at the cost of losing some checks entirely.

@maflcko
Copy link
Member

maflcko commented Jun 23, 2022

I don't see any obvious benefit here, looking at the comparison with current code in #25390 (comment)

@vasild
Copy link
Contributor Author

vasild commented Jun 29, 2022

a870b2b891...7b05e787cf: use LOCK(synced) at call sites and remove more GlobalMutexes.

The benefit is not at the call sites - they can use different flavors of syntax sugar but all of them more or less boil down to the same thing. I changed it to use { LOCK(foo); foo->Method1(); foo->Method2(); ... } so that it is not possible to misuse like @ajtowns suggested above.

The Synced<T> abstraction is similar to what is suggested in this comment but it does so in a generic way to avoid code repetition. Its benefit is twofold:

  1. It avoids code repetition at the implementation sites. Namely this:
Lots of repetitions (92 lines)
class Foo
{
public:
    void PushBack(x)
    {
        LOCK(m_mutex);
        m_data.push_back(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Foo::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::vector<int> m_data;
};

class Bar
{
public:
    void PushBack(x)
    {
        LOCK(m_mutex);
        m_data.push_back(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Bar::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::vector<std::string> m_data;
};

class Baz
{
public:
    void Insert(x)
    {
        LOCK(m_mutex);
        m_data.insert(x);
    }

    size_t Size()
    {
        LOCK(m_mutex);
        return m_data.size();
    }

    // maybe also other methods if needed...

    auto Lock()
    {
        return DebugLock<Mutex>{m_mutex, "Baz::m_mutex", __FILE__, __LINE__};
    }

private:
    Mutex m_mutex;
    std::set<std::string> m_data;
};

becomes this:

Short (3 lines)
Synced<std::vector<int>> Foo;
Synced<std::vector<std::string>> Bar;
Synced<std::set<std::string>> Baz;
  1. The mutex is properly encapsulated. With a global mutex and a global variable annotated with GUARDED_BY() it is indeed not possible to add new code that accesses the variable without protection (if using Clang and -Wthread-safety-analysis and -Werror), but it is possible to abuse the mutex and start using it to protect some more, possibly unrelated stuff (we already have this in the current code).

@maflcko
Copy link
Member

maflcko commented Jun 29, 2022

Ah, looks like this is std::atomic for structs/classes then

@vasild
Copy link
Contributor Author

vasild commented Jun 29, 2022

Ah, looks like this is std::atomic for structs/classes then

Right, kind of. std::atomic can be used for any trivially copyable structs/classes too and if necessary it will use a mutex internally. The difference is that reading from an atomic would read it in a safe way from the memory and would return a copy of the stored object. So if we call a method of the stored object the mutex will be released before the method is called and the method will be called on the copy. atomic also does not provide a way to lock for longer time, spanning calls to multiple methods.

@vasild vasild changed the title sync: introduce a thread-safe smart pointer and use it to remove g_maplocalhost_mutex sync: introduce a thread-safe smart container and use it to remove g_maplocalhost_mutex Jul 5, 2022
@vasild vasild changed the title sync: introduce a thread-safe smart container and use it to remove g_maplocalhost_mutex sync: introduce a thread-safe smart container and use it to remove a bunch of "GlobalMutex"es Jul 5, 2022
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think right now I am at "concept -0.5" on this PR at 7b05e78. If other people think this change is a good idea, it seems fine, but I think the complexity it adds to sync.h isn't really justified by the minor code simplifications it enables elsewhere. Maybe if changes to sync.h were simplified or if a compelling use-case were explained, this would look like a better tradeoff.


re: #25390 (comment)

This seems like a non-starter to me? It's not even able to detect obvious double locks at compile-time:

This is fixed now by allowing double locking. Following test passes test_bitcoin -t sync_tests/synced_double_lock

diff --git a/src/test/sync_tests.cpp b/src/test/sync_tests.cpp
index 55c2c5108de..3c225847529 100644
--- a/src/test/sync_tests.cpp
+++ b/src/test/sync_tests.cpp
@@ -140,4 +140,18 @@ BOOST_AUTO_TEST_CASE(inconsistent_lock_order_detected)
 #endif // DEBUG_LOCKORDER
 }
 
+BOOST_AUTO_TEST_CASE(synced_double_lock)
+{
+    Synced<std::map<int, int>> m;
+    m->emplace(5, 25);
+    {
+        auto m_locked = m.Lock();
+        m_locked->emplace(6, 36);
+        m->emplace(7, 49);  // double lock
+        auto m_locked2 = m.Lock(); // double lock
+        m_locked->emplace(9, 81);
+        m_locked2->emplace(10, 100);
+    }
+}
+
 BOOST_AUTO_TEST_SUITE_END()

src/sync.h Outdated
Comment on lines 457 to 464
// Avoid double-lock if the current thread is already the owner.
// This read of parent.m_owner is unprotected but that is ok because
// the result of the comparison to the current thread cannot be
// changed by other, concurrently executing, threads.
: m_lock{parent.m_owner == std::this_thread::get_id() ? nullptr : &parent.m_mutex,
mutex_name,
file_name,
line},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "sync: introduce a thread safe smart container" (618809a)

It seems like you could avoid this m_owner stuff by just replacing Mutex with RecursiveMutex below. Otherwise this code just seems like it is implementing RecursiveMutex on top of Mutex, which only makes it more complex and probably less efficient than using RecursiveMutex directly.

Maybe it would be better to disallow double locking and use clang thread annotations to prevent obvious cases of double locking at compile time. But maybe this is not possible. And allowing double locking doesn't seem like an inherently bad thing here apart from the slight performance overhead. Code fragility problems associated with recursive mutex usage don't seem like they would be an problem here since the mutex is private and the class has a limited interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Simplified this by removing m_owner and using RecursiveMutex.

This reduced the Synced implementation from 110 to 58 lines (excluding comments).

It is possible to further reduce it by removing Synced::UniqueLock but then call sites would have to use auto lock = foo.Lock(); instead of the sweet LOCK(foo);. I think either way is fine.

Maybe it would be better to disallow double locking and use clang thread annotations to prevent obvious cases of double locking at compile time. But maybe this is not possible.

I will look at this again, but last time I tried I couldn't do that - stumbled on some limitations and bizarre behavior that looked like a bug in the thread safety annotations. Notice that even now some cases are prevented at compile time, like:

LOCK(foo);
...
LOCK(foo);

@vasild
Copy link
Contributor Author

vasild commented Jul 13, 2022

7b05e787cf...acb21a5a24: simplify the implementation of Synced, as suggested above.

@vasild
Copy link
Contributor Author

vasild commented Oct 23, 2023

b749c4d2ca...92a444d0d7: rebase due to conflicts, drop commit 93e60b9 net: detach vfLimited from g_maplocalhost_mutex because vfLimited + SetReachable() have already been encapsulated (see 6e30865 from #27071).

It can contain anything, encapsulates a mutex to synchronize access to
it and enfoces access through it.
-BEGIN VERIFY SCRIPT-
sed -i -e 's/mapLocalHost/g_my_net_addr/g' $(git grep -l mapLocalHost)
-END VERIFY SCRIPT-
Convert `g_my_net_addr` to use `Synced<T>` and ditch the global mutex
`g_maplocalhost_mutex`.
-BEGIN VERIFY SCRIPT-
sed -i -e 's/deadlineTimers/g_deadline_timers/g' $(git grep -l deadlineTimers)
-END VERIFY SCRIPT-
…_mutex

Convert `g_deadline_timers` to use `Synced<T>` and ditch the global mutex
`g_deadline_timers_mutex`.
-BEGIN VERIFY SCRIPT-
sed -i -e 's/\<dir_locks\>/g_dir_locks/g' $(git grep -l dir_locks)
-END VERIFY SCRIPT-
Convert `g_dir_locks` to use `Synced<T>` and ditch the global mutex
`cs_dir_locks`.
…allet_mutex

Convert `g_loading_wallet_set` to use `Synced<T>` and ditch the global mutex
`g_loading_wallet_mutex`.
…release_mutex

Convert `g_unloading_wallet_set` to use `Synced<T>` and ditch the global mutex
`g_wallet_release_mutex`.
@vasild
Copy link
Contributor Author

vasild commented Dec 13, 2023

92a444d0d7...f3489638dc: rebase due to conflicts

@jonatack
Copy link
Contributor

jonatack commented Jan 6, 2024

Will review the updates since my initial ACK #25390 (review).

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any examples from the last couple of years where this would have helped to prevent a bug or other issue?

@vasild
Copy link
Contributor Author

vasild commented Apr 25, 2024

Are there any examples from the last couple of years where this would have helped to prevent a bug or other issue?

Here is an example where two unrelated variables were guarded by the same mutex (already fixed in the latest code):

bitcoin/src/net.cpp

Lines 116 to 118 in c42ded3

GlobalMutex g_maplocalhost_mutex;
std::map<CNetAddr, LocalServiceInfo> mapLocalHost GUARDED_BY(g_maplocalhost_mutex);
static bool vfLimited[NET_MAX] GUARDED_BY(g_maplocalhost_mutex) = {};

@vasild
Copy link
Contributor Author

vasild commented Apr 25, 2024

Closing this due to no interest from reviewers for a long time plus currently my hands are full with more important things. Would be happy to revisit if there is interest.

@vasild vasild closed this Apr 25, 2024
@maflcko
Copy link
Member

maflcko commented Apr 25, 2024

Are there any examples from the last couple of years where this would have helped to prevent a bug or other issue?

Here is an example where two unrelated variables were guarded by the same mutex (already fixed in the latest code):

I don't think this is an example of a bug or other issue.

This was added in commit b312cd7 to fix a tsan warning. I guess the annotation was using the wrong mutex, but I doubt this will happen in newly written code. Also, the mistake in this case seems harmless, as the compiled binary of bitcoind didn't even change: #13123 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants