
Improve S3 multipart locking scenarios #2055

Merged
merged 5 commits into dev from sethshelnutt/ch5079/s3-multipart-locking-improvements on Jan 28, 2021

Conversation

Shelnutt2
Member

This PR has three parts, separated into three commits. The first commit adds getters/setters to the MultiPartUploadState class and makes the class thread-safe. This allows us to move much of the locking from the S3 class to per-file locking inside MultiPartUploadState itself.

The second commit reworks the multipart_upload_mtx_ locking so we can release it early, reducing both how long and in how many places we hold the S3 class-level lock. The main change is switching from iterators over multipart_upload_states_ to fetching references to the specific MultiPartUploadState and relying on the locking introduced in that class.

The third commit parallelizes the disconnect and the marking of any outstanding multipart requests as completed or failed.
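
A minimal sketch of the first commit's idea (field and method names are assumed, not taken from the actual patch): the state class guards its own fields with an internal mutex, so callers no longer need the S3 class-level lock for simple reads and writes.

#include <mutex>

// Hypothetical sketch: MultiPartUploadState protects its own fields.
class MultiPartUploadState {
 public:
  int part_number() const {
    std::lock_guard<std::mutex> lck(mutex_);
    return part_number_;
  }

  void part_number(int n) {
    std::lock_guard<std::mutex> lck(mutex_);
    part_number_ = n;
  }

 private:
  mutable std::mutex mutex_;  // guards all fields below
  int part_number_ = 0;
};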

@shortcut-integration

This pull request has been linked to Clubhouse Story #5079: S3 multipart locking improvements.

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch 2 times, most recently from 17a7c39 to 8a00b0d on January 27, 2021 21:15
Contributor

@joe-maley joe-maley left a comment


LGTM for the first patch.

I had a slightly different idea about the MultiPartState::mutex_. I was expecting that the mutex would be exposed so that S3 can lock for the entire time that it is manipulating the state. E.g.:

S3::foo() {
  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
  MultiPartState* state = &multipart_upload_states_.at("my_uri");
  std::unique_lock<std::mutex> state_lock(*state->mutex());
  multipart_lck.unlock();

  int n = state->part_number() + 1;
  state->part_number(n);
  state_lock.unlock();
}

As it stands, there could be a race with:

int n = state->part_number() + 1;
state->part_number(n);

If T1 and T2 are both executing this path, we'll get a lost +1.

@Shelnutt2
Member Author

LGTM for the first patch.

I had a slightly different idea about the MultiPartState::mutex_. I was expecting that the mutex would be exposed so that S3 can lock for the entire time that it is manipulating the state.

I see the potential race conditions, but I'm not a huge fan of turning the state locking over to the S3 class. That would require every user of the state class to know how to handle the locking; if we control the locking through accessors, callers don't have to worry about the locking scheme at all. Would you be open to reworking the member functions to better cover the needed functionality? For instance, there is a common pattern of fetching bucket() and key() together, which right now takes two calls with a race between them. I can adjust the API to fetch or return multiple parameters together, or to perform manipulations like the +1 case atomically. What are your thoughts on that approach?
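
A minimal sketch of that accessor-based approach (the combined getter and next_part_number are hypothetical names, not from the actual patch): each accessor does all of its work under a single acquisition of the internal mutex.

#include <mutex>
#include <string>
#include <utility>

class MultiPartUploadState {
 public:
  // Both fields are read under one lock acquisition, so no writer can
  // slip in between the two reads.
  std::pair<std::string, std::string> bucket_and_key() const {
    std::lock_guard<std::mutex> lck(mutex_);
    return {bucket_, key_};
  }

  // The "+1" case as an atomic read-modify-write.
  int next_part_number() {
    std::lock_guard<std::mutex> lck(mutex_);
    return ++part_number_;
  }

 private:
  mutable std::mutex mutex_;
  std::string bucket_, key_;
  int part_number_ = 0;
};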


@joe-maley
Contributor

joe-maley commented Jan 27, 2021

In principle, I'd be OK with modifying MultiPartState to include functions that make it thread-safe. I realize that I misread the commit message: I thought this was the first of three, so I need to revise my suggestion.

This patch introduces races among the following paths:
- S3::disconnect
- S3::write_multipart
- S3::flush_object
- S3::finish_flush_object
- S3::get_make_upload_part_req

I'm considering scenarios where these routines operate on the same state and/or URI. I don't know whether we hit these in practice, but this is a regression in thread safety.

I think it would be significantly easier to expose the MultiPartState::mutex_ (or have the S3 class own the locks, e.g. refactor multipart_upload_states_ to type unordered_map<string, pair<MultiPartUploadState, mutex>>). However, it would also be possible to refactor the bulk of the logic in S3 into MultiPartState, e.g. MultiPartState::disconnect, MultiPartState::flush, etc.
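
A sketch of the pair-in-map option (all names assumed from the discussion, not from the actual code): the map owns a per-state mutex next to each state, and the map-wide mutex only guards the map itself.

#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

class MultiPartUploadState { /* ... */ };

std::unordered_map<std::string, std::pair<MultiPartUploadState, std::mutex>>
    multipart_upload_states_;
std::mutex multipart_upload_mtx_;  // guards the map itself

void touch_state(const std::string& uri) {
  std::unique_lock<std::mutex> map_lck(multipart_upload_mtx_);
  auto& entry = multipart_upload_states_.at(uri);
  std::unique_lock<std::mutex> state_lck(entry.second);
  map_lck.unlock();  // drop the map-wide lock once the per-state lock is held
  // ... manipulate entry.first while holding state_lck ...
}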

Commit message: This switches from iterators to using find + at to take references to the state objects for manipulation. This allows us to release the S3 class-level locks earlier, removing performance bottlenecks.

Commit message: On S3 disconnect we can parallelize the marking of multipart uploads as complete. This allows us to release the exclusive lock early and improves performance when there are a large number of outstanding requests.
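
A rough sketch of that parallelized disconnect (finish_multipart_upload is an assumed per-state helper; the actual patch may use TileDB's own thread-pool utilities rather than raw threads):

#include <thread>
#include <vector>

class MultiPartUploadState;                           // as sketched above
void finish_multipart_upload(MultiPartUploadState&);  // assumed helper

// Mark all outstanding uploads complete from worker threads instead of
// serially under the class-level lock.
void disconnect_all(const std::vector<MultiPartUploadState*>& states) {
  std::vector<std::thread> workers;
  workers.reserve(states.size());
  for (auto* state : states)
    workers.emplace_back([state] { finish_multipart_upload(*state); });
  for (auto& t : workers)
    t.join();
}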
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 8a00b0d to d196d9e on January 28, 2021 04:05
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from d196d9e to f064944 on January 28, 2021 04:10
@Shelnutt2
Member Author

Shelnutt2 commented Jan 28, 2021

I think it would be significantly easier to expose the MultiPartState::mutex_

You were right. I don't really like it, but it works, is safe, and it's unlikely anyone will make heavy modifications to this class in the near future, so it's probably safe enough from a developer standpoint for now.

I added the last changes as a 4th commit in case additional changes are requested; we can safely squash on merge. I kept them as separate commits for ease of comparison, since there are multiple changes included.

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 6dabf6e to bf85deb on January 28, 2021 16:54
@Shelnutt2
Member Author

After offline discussion with @joe-maley, we agreed to add a C++11-compatible RWLock mechanism, since the S3 class needs it to handle locking properly and to avoid the worst-case scenario of a class-level lock stalling on IO. The write locks are now limited to the map operations only and have no S3 IO under them.
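
A minimal sketch of what a C++11-compatible reader-writer lock can look like (the actual class in tiledb/common/rwlock.h may differ): many concurrent readers, one exclusive writer, built from a mutex and a condition variable.

#include <condition_variable>
#include <mutex>

class RWLock {
 public:
  void read_lock() {
    std::unique_lock<std::mutex> lck(mtx_);
    cv_.wait(lck, [this] { return !writer_; });
    ++readers_;
  }

  void read_unlock() {
    std::unique_lock<std::mutex> lck(mtx_);
    if (--readers_ == 0)
      cv_.notify_all();
  }

  void write_lock() {
    std::unique_lock<std::mutex> lck(mtx_);
    cv_.wait(lck, [this] { return !writer_ && readers_ == 0; });
    writer_ = true;
  }

  void write_unlock() {
    std::unique_lock<std::mutex> lck(mtx_);
    writer_ = false;
    cv_.notify_all();
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  unsigned readers_ = 0;
  // Note: this simple form can starve writers under heavy read load.
  bool writer_ = false;
};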

@@ -1332,37 +1345,66 @@ Status S3::write_multipart(
   const std::string uri_path(aws_uri.GetPath().c_str());

-  // Take a lock protecting the shared multipart data structures
-  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
+  // Read lock to see if it exists
+  multipart_upload_mtx_.read_lock();
Contributor

I think it would be appropriate to grab the write lock here. I don't think we gain anything by using a read lock and then retrying with a write lock.

Member Author

Making an assumption: I believe a write to an existing part is the most common code path in this function. If that's true, grabbing a write lock here would add a source of contention for the entire class. I think the overhead of switching locks is likely to be less than holding the exclusive write lock across this entire section.
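
A sketch of the read-then-retry pattern under debate (assuming the RWLock sketch above; ensure_multipart_state and the member names are hypothetical): the common case takes only the shared read lock, and a miss retries under the exclusive write lock, re-checking because another thread may have created the state in between.

void S3::ensure_multipart_state(const std::string& uri_path) {
  // Fast path: existing state found under the shared read lock.
  multipart_upload_mtx_.read_lock();
  const bool exists = multipart_upload_states_.count(uri_path) > 0;
  multipart_upload_mtx_.read_unlock();

  if (!exists) {
    // Slow path: re-check under the exclusive write lock before creating.
    multipart_upload_mtx_.write_lock();
    if (multipart_upload_states_.count(uri_path) == 0)
      multipart_upload_states_.emplace(uri_path, MultiPartUploadState());
    multipart_upload_mtx_.write_unlock();
  }
}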

@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch 3 times, most recently from b1439e2 to 9bfea12 on January 28, 2021 20:19
Contributor

@joe-maley joe-maley left a comment


LGTM, two nit-picks.

Commit message: A new RWLock class is introduced into the common folder; @joe-maley provided this excellent class. We then use this RWLock in the S3 VFS class to handle the multipart uploads, managing the locks at a finer granularity to remove locking contention across multiple concurrent upload operations.
@Shelnutt2 Shelnutt2 force-pushed the sethshelnutt/ch5079/s3-multipart-locking-improvements branch from 9bfea12 to ca7157b on January 28, 2021 20:47
@Shelnutt2 Shelnutt2 merged commit 8eca6f7 into dev Jan 28, 2021
@github-actions
Contributor

The backport to release-2.2 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.2 release-2.2
# Navigate to the new working tree
cd .worktrees/backport-release-2.2
# Create a new branch
git switch --create backport-2055-to-release-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 8eca6f7d41821e0e3811c3e3a02648ea51861283
# Push it to GitHub
git push --set-upstream origin backport-2055-to-release-2.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.2

Then, create a pull request where the base branch is release-2.2 and the compare/head branch is backport-2055-to-release-2.2.

@Shelnutt2 Shelnutt2 deleted the sethshelnutt/ch5079/s3-multipart-locking-improvements branch April 13, 2021 11:28