Improve S3 multipart locking scenarios #2055
Conversation
This pull request has been linked to Clubhouse Story #5079: S3 multipart locking improvements.
Force-pushed from 17a7c39 to 8a00b0d
LGTM for the first patch.
I had a slightly different idea about the MultiPartState::mutex_. I was expecting that the mutex would be exposed so that S3 can lock for the entire time that it is manipulating the state. E.g.:
S3::foo() {
  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
  MultiPartState* state = &multipart_upload_states_.at("my_uri");
  std::unique_lock<std::mutex> state_lock(*state->mutex());
  multipart_lck.unlock();
  int n = state->part_number() + 1;
  state->part_number(n);
  state_lock.unlock();
}
As it stands, there could be a race with:
int n = state->part_number() + 1;
state->part_number(n);
If T1 and T2 are both executing this path, we'll get a lost +1.
I see the potential race conditions, but I'm not a huge fan of turning the state locking over to the S3 class. This would require the user of the state class to always know to handle the locking. If we can control the locking through accessors, then the caller doesn't have to worry about the locking scheme. Would you be open to reworking the member functions to better handle the functionality? For instance, there is a common pattern of getting the …
On principle, I'd be OK with modifying the … This patch introduces races among the following paths: … I think it would be significantly easier to expose the …
This switches from iterators to using find + at to take references to the state objects for manipulation. This allows us to release the S3 class-level locks earlier, removing performance bottlenecks.
On disconnect of S3, we can parallelize the marking of multi-part uploads as complete. This allows us to release the exclusive lock early and increases performance when there are a large number of outstanding requests.
Force-pushed from 8a00b0d to d196d9e
Force-pushed from d196d9e to f064944
You were right. I don't really like it, but it works, is safe, and it's unlikely anyone will make heavy modifications to this class in the near future, so it's probably safe enough from a developer standpoint for now. I added the last changes as a 4th commit in case additional changes are requested. We can safely squash on merge. I kept them as separate commits for ease of comparison across commits since there are multiple changes included.
Force-pushed from 6dabf6e to bf85deb
After offline discussion with @joe-maley we agreed to add a C++11-compatible RWLock mechanism, as the S3 class needs it to handle proper locking and avoid the worst-case scenario of a class-level lock stalling on IO. The write locks are now limited to only the map operations and do not have any S3 IO associated with them.
tiledb/sm/filesystem/s3.cc (outdated)
@@ -1332,37 +1345,66 @@ Status S3::write_multipart(
   const std::string uri_path(aws_uri.GetPath().c_str());

-  // Take a lock protecting the shared multipart data structures
-  std::unique_lock<std::mutex> multipart_lck(multipart_upload_mtx_);
+  // Read lock to see if it exists
+  multipart_upload_mtx_.read_lock();
I think it would be appropriate to grab the write lock here. I don't think we gain anything by using a read lock and then retrying with a write lock.
Making an assumption, I believe that writing to an existing part will be the most common code path in this function. If that is true, then grabbing a write lock here will add a source of contention for the entire class. I think the overhead of switching locks is likely to be less than that of holding the exclusive write lock across this entire section.
Force-pushed from b1439e2 to 9bfea12
LGTM, two nit-picks.
A new RWLock class is introduced into the common folder; @joe-maley provided this excellent class. We then use this RWLock in the S3 VFS class to handle the multipart uploads and to manage the locks at a finer granularity, removing locking contention across multiple concurrent upload operations.
Force-pushed from 9bfea12 to ca7157b
The backport to release-2.2 failed.
To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.2 release-2.2
# Navigate to the new working tree
cd .worktrees/backport-release-2.2
# Create a new branch
git switch --create backport-2055-to-release-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 8eca6f7d41821e0e3811c3e3a02648ea51861283
# Push it to GitHub
git push --set-upstream origin backport-2055-to-release-2.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.2

Then, create a pull request targeting the release-2.2 branch.
This PR has 3 parts, separated into three commits.

The first commit adds getters/setters for the MultiPartUploadState class and adds thread-safety locking to this class. This allows us to move a lot of locking from the S3 class to per-file locking in the MultiPartUploadState class itself.

The second commit reworks the multipart_upload_mtx_ locking to allow us to release it early and reduce the duration and number of places where we hold the S3 class-level lock. The main changes involve switching from iterators over multipart_upload_states_ to fetching references to the specific MultiPartUploadState, relying on the locking introduced to that class.

The third commit parallelizes the disconnect and the marking of completion or failure of any outstanding multi-part requests.