
Improve S3 multipart locking scenarios #2055

Merged
merged 5 commits into dev from sethshelnutt/ch5079/s3-multipart-locking-improvements on Jan 28, 2021

Conversation

Shelnutt2
Member

This PR has three parts, separated into three commits. The first commit adds getters/setters to the MultiPartUploadState class and makes the class thread-safe. This allows us to move much of the locking from the S3 class to per-file locking inside MultiPartUploadState itself.

The second commit reworks the multipart_upload_mtx_ locking so we can release it early, reducing both how long and in how many places we hold the S3 class-level lock. The main change is switching from iterators over multipart_upload_states_ to fetching references to the specific MultiPartUploadState and relying on the locking introduced in that class.

The third commit parallelizes the disconnect and the marking of any outstanding multipart requests as completed or failed.
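
A minimal sketch of the first commit's idea (field and method names are assumed, not taken from the actual patch): the state class guards its own fields with an internal mutex, so callers no longer need the S3 class-level lock for simple reads and writes.

#include <mutex>

// Hypothetical sketch: MultiPartUploadState protects its own fields.
class MultiPartUploadState {
 public:
  int part_number() const {
    std::lock_guard<std::mutex> lck(mutex_);
    return part_number_;
  }

  void part_number(int n) {
    std::lock_guard<std::mutex> lck(mutex_);
    part_number_ = n;
  }

 private:
  mutable std::mutex mutex_;  // guards all fields below
  int part_number_ = 0;
};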

@shortcut-integration

This pull request has been linked to Clubhouse Story #5079: S3 multipart locking improvements.

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch 2 times, most recently from 17a7c39 to 8a00b0d on January 27, 2021 21:15
Contributor

@joe-maley joe-maley left a comment


LGTM for the first patch.

I had a slightly different idea about the MultiPartState::mutex_. I was expecting that the mutex would be exposed so that S3 can lock for the entire time that it is manipulating the state. E.g.:

S3::foo() {
  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
  MultiPartState* state = &multipart_upload_states_.at("my_uri");
  std::unique_lock<std::mutex> state_lock(*state->mutex());
  multipart_lck.unlock();

  int n = state->part_number() + 1;
  state->part_number(n);
  state_lock.unlock();
}

As it stands, there could be a race with:

int n = state->part_number() + 1;
state->part_number(n);

If T1 and T2 are both executing this path, we'll get a lost +1.

@Shelnutt2
Member Author

LGTM for the first patch.

I had a slightly different idea about the MultiPartState::mutex_. I was expecting that the mutex would be exposed so that S3 can lock for the entire time that it is manipulating the state.

I see the potential race conditions, but I'm not a huge fan of turning the state locking over to the S3 class. That would require every user of the state class to know how to handle the locking; if we control the locking through accessors, callers don't have to worry about the locking scheme at all. Would you be open to reworking the member functions to better cover the needed functionality? For instance, there is a common pattern of fetching bucket() and key() together, which right now takes two calls with a race between them. I can adjust the API to fetch or return multiple parameters together, or to perform manipulations like the +1 case atomically. What are your thoughts on that approach?
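
A minimal sketch of that accessor-based approach (the combined getter and next_part_number are hypothetical names, not from the actual patch): each accessor does all of its work under a single acquisition of the internal mutex.

#include <mutex>
#include <string>
#include <utility>

class MultiPartUploadState {
 public:
  // Both fields are read under one lock acquisition, so no writer can
  // slip in between the two reads.
  std::pair<std::string, std::string> bucket_and_key() const {
    std::lock_guard<std::mutex> lck(mutex_);
    return {bucket_, key_};
  }

  // The "+1" case as an atomic read-modify-write.
  int next_part_number() {
    std::lock_guard<std::mutex> lck(mutex_);
    return ++part_number_;
  }

 private:
  mutable std::mutex mutex_;
  std::string bucket_, key_;
  int part_number_ = 0;
};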


@joe-maley
Contributor

joe-maley commented Jan 27, 2021

In principle, I'd be OK with modifying MultiPartState to include functions that make it thread-safe. I realize that I misread the commit message: I thought this was the first of three, so I need to revise my suggestion.

This patch introduces races among the following paths:
- S3::disconnect
- S3::write_multipart
- S3::flush_object
- S3::finish_flush_object
- S3::get_make_upload_part_req

I'm considering scenarios where these routines operate on the same state and/or URI. I don't know whether we hit these in practice, but this is a regression in thread safety.

I think it would be significantly easier to expose the MultiPartState::mutex_ (or have the S3 class own the locks, e.g. refactor multipart_upload_states_ to type unordered_map<string, pair<MultiPartUploadState, mutex>>). However, it would also be possible to refactor the bulk of the logic in S3 into MultiPartState, e.g. MultiPartState::disconnect, MultiPartState::flush, etc.
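
A sketch of the pair-in-map option (all names assumed from the discussion, not from the actual code): the map owns a per-state mutex next to each state, and the map-wide mutex only guards the map itself.

#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

class MultiPartUploadState { /* ... */ };

std::unordered_map<std::string, std::pair<MultiPartUploadState, std::mutex>>
    multipart_upload_states_;
std::mutex multipart_upload_mtx_;  // guards the map itself

void touch_state(const std::string& uri) {
  std::unique_lock<std::mutex> map_lck(multipart_upload_mtx_);
  auto& entry = multipart_upload_states_.at(uri);
  std::unique_lock<std::mutex> state_lck(entry.second);
  map_lck.unlock();  // drop the map-wide lock once the per-state lock is held
  // ... manipulate entry.first while holding state_lck ...
}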

Commit message: This switches from iterators to using find + at to take references to the state objects for manipulation. This allows us to release the S3 class-level locks earlier, removing performance bottlenecks.

Commit message: On S3 disconnect we can parallelize the marking of multipart uploads as complete. This allows us to release the exclusive lock early and improves performance when there are a large number of outstanding requests.
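
A rough sketch of that parallelized disconnect (finish_multipart_upload is an assumed per-state helper; the actual patch may use TileDB's own thread-pool utilities rather than raw threads):

#include <thread>
#include <vector>

class MultiPartUploadState;                           // as sketched above
void finish_multipart_upload(MultiPartUploadState&);  // assumed helper

// Mark all outstanding uploads complete from worker threads instead of
// serially under the class-level lock.
void disconnect_all(const std::vector<MultiPartUploadState*>& states) {
  std::vector<std::thread> workers;
  workers.reserve(states.size());
  for (auto* state : states)
    workers.emplace_back([state] { finish_multipart_upload(*state); });
  for (auto& t : workers)
    t.join();
}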
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 8a00b0d to d196d9e on January 28, 2021 04:05
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from d196d9e to f064944 on January 28, 2021 04:10
@Shelnutt2
Member Author

Shelnutt2 commented Jan 28, 2021

I think it would be significantly easier to expose the MultiPartState::mutex_

You were right. I don't really like it, but it works, is safe, and it's unlikely anyone will make heavy modifications to this class in the near future, so it's probably safe enough from a developer standpoint for now.

I added the last changes as a 4th commit in case additional changes are requested; we can safely squash on merge. I kept them as separate commits for ease of comparison, since there are multiple changes included.

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 6dabf6e to bf85deb on January 28, 2021 16:54
@Shelnutt2
Member Author

After offline discussion with @joe-maley, we agreed to add a C++11-compatible RWLock mechanism, since the S3 class needs it to handle locking properly and to avoid the worst-case scenario of a class-level lock stalling on IO. The write locks are now limited to the map operations only and have no S3 IO under them.
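
A minimal sketch of what a C++11-compatible reader-writer lock can look like (the actual class in tiledb/common/rwlock.h may differ): many concurrent readers, one exclusive writer, built from a mutex and a condition variable.

#include <condition_variable>
#include <mutex>

class RWLock {
 public:
  void read_lock() {
    std::unique_lock<std::mutex> lck(mtx_);
    cv_.wait(lck, [this] { return !writer_; });
    ++readers_;
  }

  void read_unlock() {
    std::unique_lock<std::mutex> lck(mtx_);
    if (--readers_ == 0)
      cv_.notify_all();
  }

  void write_lock() {
    std::unique_lock<std::mutex> lck(mtx_);
    cv_.wait(lck, [this] { return !writer_ && readers_ == 0; });
    writer_ = true;
  }

  void write_unlock() {
    std::unique_lock<std::mutex> lck(mtx_);
    writer_ = false;
    cv_.notify_all();
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  unsigned readers_ = 0;
  // Note: this simple form can starve writers under heavy read load.
  bool writer_ = false;
};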

@@ -1332,37 +1345,66 @@ Status S3::write_multipart(
   const std::string uri_path(aws_uri.GetPath().c_str());

-  // Take a lock protecting the shared multipart data structures
-  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
+  // Read lock to see if it exists
+  multipart_upload_mtx_.read_lock();
Contributor

I think it would be appropriate to grab the write lock here. I don't think we gain anything by using a read lock and then retrying with a write lock.

Member Author

Making an assumption: I believe a write to an existing part is the most common code path in this function. If that's true, grabbing a write lock here would add a source of contention for the entire class. I think the overhead of switching locks is likely to be less than holding the exclusive write lock across this entire section.
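
A sketch of the read-then-retry pattern under debate (assuming the RWLock sketch above; ensure_multipart_state and the member names are hypothetical): the common case takes only the shared read lock, and a miss retries under the exclusive write lock, re-checking because another thread may have created the state in between.

void S3::ensure_multipart_state(const std::string& uri_path) {
  // Fast path: existing state found under the shared read lock.
  multipart_upload_mtx_.read_lock();
  const bool exists = multipart_upload_states_.count(uri_path) > 0;
  multipart_upload_mtx_.read_unlock();

  if (!exists) {
    // Slow path: re-check under the exclusive write lock before creating.
    multipart_upload_mtx_.write_lock();
    if (multipart_upload_states_.count(uri_path) == 0)
      multipart_upload_states_.emplace(uri_path, MultiPartUploadState());
    multipart_upload_mtx_.write_unlock();
  }
}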

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch 3 times, most recently from b1439e2 to 9bfea12 on January 28, 2021 20:19
Contributor

@joe-maley joe-maley left a comment


LGTM, two nit-picks.

Commit message: A new RWLock class is introduced into the common folder; @joe-maley provided this excellent class. We then use this RWLock in the S3 VFS class to handle the multipart uploads, managing the locks at a finer granularity to remove locking contention across multiple concurrent upload operations.
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 9bfea12 to ca7157b on January 28, 2021 20:47
@Shelnutt2 Shelnutt2 merged commit 8eca6f7 into dev Jan 28, 2021
@github-actions
Contributor

The backport to release-2.2 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.2 release-2.2
# Navigate to the new working tree
cd .worktrees/backport-release-2.2
# Create a new branch
git switch --create backport-2055-to-release-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 8eca6f7d41821e0e3811c3e3a02648ea51861283
# Push it to GitHub
git push --set-upstream origin backport-2055-to-release-2.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.2

Then, create a pull request where the base branch is release-2.2 and the compare/head branch is backport-2055-to-release-2.2.

@Shelnutt2 Shelnutt2 deleted the sethshelnutt/ch5079/s3-multipart-locking-improvements branch April 13, 2021 11:28