
[C++] TSAN error in s3 (maybe false positive?) #36523

Closed
westonpace opened this issue Jul 7, 2023 · 8 comments · Fixed by #36536
Labels: Component: C++ · Critical Fix (bugfixes for security vulnerabilities, crashes, or invalid data) · Priority: Blocker (marks a blocker for the release) · Type: bug

@westonpace
Member

Describe the bug, including details regarding any error messages, version, and platform.

I encountered this while stress testing #35440 with TSAN enabled but I believe it's not related to #35440 and, indeed, not a bug at all. I'm attaching the log from TSAN here but I'll give a summary.
tsan_out.txt

There are three locks involved and so it is quite confusing.

M0: The holder lock (std::mutex S3ClientHolder::mutex_)
M1: The finalizer lock (std::shared_mutex S3ClientFinalizer::mutex_)
M2: The upload state lock (std::mutex UploadState::mutex)

TSAN reports Cycle in lock order graph: M0 => M1 => M2 => M0

The task is "creating a bunch of files in parallel".

In the main thread we are calling CreateDir, which ends up in arrow::fs::S3FileSystem::Impl::CreateBucket. As part of this process we:

  • Obtain Exclusive Lock M0
  • Obtain Shared Lock M1

In a CPU thread we are closing a file, which causes us to upload the final part. This all happens as part of arrow::fs::(anonymous namespace)::ObjectOutputStream::UploadPart. As part of this process we:

  • Obtain Shared Lock M1
  • Obtain Exclusive Lock M2

In an I/O thread we are handling the upload outcome which causes us to call close on the file. This all happens as part of arrow::fs::(anonymous namespace)::ObjectOutputStream::HandleUploadOutcome. As part of this process we:

  • Obtain Exclusive Lock M2
  • Obtain Exclusive Lock M0

However, given that M1 is a shared lock, I'm not sure this deadlock that TSAN is reporting is possible.

Unfortunately, I think there is actually a potential deadlock here, just not the one that TSAN is reporting. As part of HandleUploadOutcome it seems we are first locking the upload state and then locking the client holder (M2 then M0). As part of UploadPart we are first locking the client holder and then locking the upload state (M0 then M2). TSAN is reporting this latter operation as M1 then M2 which is also true, but not as interesting.

A good fix would probably be to release the upload state lock in HandleUploadOutcome before marking pending_parts_completed finished (in general, it's not a good idea to mark futures finished while holding locks). I want to reproduce this to confirm it is actually a problem and that the fix works.
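One way to sketch that fix (hypothetical names, not Arrow's actual API): move the completion callback out while the lock is held, release the lock, and only then invoke it, so that marking the future finished never runs arbitrary code under UploadState's mutex:

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch of the proposed fix; UploadStateSketch and
// on_parts_completed are illustrative names, not Arrow's API.
struct UploadStateSketch {
  std::mutex mutex;
  int parts_in_progress = 1;
  std::function<void()> on_parts_completed;  // may take other locks (e.g. M0)

  void HandleUploadOutcome() {
    std::function<void()> callback;
    {
      std::lock_guard<std::mutex> lock(mutex);  // M2
      if (--parts_in_progress == 0) {
        // Move the continuation out instead of invoking it here:
        // calling it under `mutex` is what creates the M2 => M0 ordering.
        callback = std::move(on_parts_completed);
      }
    }
    if (callback) callback();  // runs with no lock held
  }
};
```

With this shape, whatever locks the completion callback takes, it can no longer form a cycle through M2.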

Component(s)

C++

@westonpace westonpace added Priority: Blocker Marks a blocker for the release Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. and removed Component: C++ labels Jul 7, 2023
@westonpace
Member Author

I'm not able to reproduce. I think, by the time the HandleUploadOutcome thread is launched, we can know that the CloseAsync thread has already obtained the upload_state lock, and so it isn't actually possible to trigger this deadlock.

@westonpace
Member Author

CC @pitrou for a second opinion but I will look at this further tomorrow too.

@pitrou
Member

pitrou commented Jul 7, 2023

Hmm, so what seems to happen is that there are three different threads:

  • MT: M0 (ex) => M1 (sh) in S3ClientHolder::Lock
  • T8: M1 (sh) => M2 (ex) in ObjectOutputStream::UploadPart
  • T21: M2 (ex) => M0 (ex) in ObjectOutputStream::UploadPart

M0 is S3ClientHolder::mutex_.
M1 is S3ClientFinalizer::mutex_.
M2 is UploadState::mutex_.

The complex part is T21: UploadPart calls HandlePartOutcome which marks the pending_parts_completed future finished, which invokes the callback added in CloseAsync, which in turns takes the client holder lock.

Like you, I'm skeptical there is a genuine risk, as M1 is taken shared.

@pitrou
Member

pitrou commented Jul 7, 2023

Sorry, I just re-did the analysis you already did :-)

@pitrou
Member

pitrou commented Jul 7, 2023

Unfortunately, I think there is actually a potential deadlock here, just not the one that TSAN is reporting. As part of HandleUploadOutcome it seems we are first locking the upload state and then locking the client holder (M2 then M0). As part of UploadPart we are first locking the client holder and then locking the upload state (M0 then M2).

A reasonable fix is to lock the client holder only as long as necessary. I should have done that from the start:

diff --git a/cpp/src/arrow/filesystem/s3fs.cc b/cpp/src/arrow/filesystem/s3fs.cc
index f05d89963c..79b8716017 100644
--- a/cpp/src/arrow/filesystem/s3fs.cc
+++ b/cpp/src/arrow/filesystem/s3fs.cc
@@ -1527,8 +1527,6 @@ class ObjectOutputStream final : public io::OutputStream {
 
   Status UploadPart(const void* data, int64_t nbytes,
                     std::shared_ptr<Buffer> owned_buffer = nullptr) {
-    ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
-
     S3Model::UploadPartRequest req;
     req.SetBucket(ToAwsString(path_.bucket));
     req.SetKey(ToAwsString(path_.key));
@@ -1538,7 +1536,8 @@ class ObjectOutputStream final : public io::OutputStream {
 
     if (!background_writes_) {
       req.SetBody(std::make_shared<StringViewStream>(data, nbytes));
-      auto outcome = client_lock->UploadPart(req);
+      ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
+      auto outcome = std::move(client_lock)->UploadPart(req);
       if (!outcome.IsSuccess()) {
         return UploadPartError(req, outcome);
       } else {
@@ -1562,12 +1561,13 @@ class ObjectOutputStream final : public io::OutputStream {
           upload_state_->pending_parts_completed = Future<>::Make();
         }
       }
+      ARROW_ASSIGN_OR_RAISE(auto client_lock, holder_->Lock());
       // XXX This callback returns Aws::Utils::Outcome, it cannot easily call
       // `holder->Lock()` which returns arrow::Result.
       ARROW_ASSIGN_OR_RAISE(
           auto fut,
           SubmitIO(io_context_, [client_lock = std::move(client_lock), req]() mutable {
-            return client_lock->UploadPart(req);
+            return std::move(client_lock)->UploadPart(req);
           }));
       // The closure keeps the buffer and the upload state alive
       auto state = upload_state_;

@pitrou
Member

pitrou commented Jul 7, 2023

By the way, it seems S3 is disabled in our TSAN CI job. Should we enable it?

@pitrou
Member

pitrou commented Jul 7, 2023

I'm getting other potential false positives from TSAN when run under Ubuntu 22.04 with Clang 14.0.0, so I opened an issue there:
google/sanitizers#1668

@westonpace
Member Author

By the way, it seems S3 is disabled in our TSAN CI job. Should we enable it?

It would be nice. We do a fair amount of thread management here. However, it's mostly controlled by Future / AsyncTaskScheduler / etc. so perhaps not as critical.

Note that I did not actually end up finding any real bugs with TSAN once I quieted down the false positives.

A reasonable fix is to lock the client holder only as long as necessary. I should have done that from the start:

That seems fine. I think it would still be good to unlock the upload_state before marking the flush complete too. It's not a bug but it will make it easier to reason about the code in the future.

pitrou added a commit that referenced this issue Jul 10, 2023
It is counter-intuitive, but lock-ordering issues can happen even with a shared mutex locked in shared mode. The reason is that locking again in shared mode can block while there are threads waiting to take the lock in exclusive mode. For this reason, we must avoid keeping the S3ClientLock held at a point where it may be taken again.

* Closes: #36523

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>