Avoid std::future in S3 multipart upload #2851
Conversation
This pull request has been linked to Shortcut Story #13742: Eliminate blocking std::future<>::get() in s3.cc.
This removes the calls into the AWS S3 threading/asynchrony system and stays within TileDB. I think it may offer better concurrency as well as avoid the deadlock.
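For concreteness, here is a hedged, compilable sketch of the before/after call shape being described. Every name in it (FakeS3Client, FakeThreadPool, upload_part_async, execute, wait_all) is a stand-in invented for this illustration, not the TileDB or AWS SDK API; the only point is the structural difference between blocking on a per-part std::future and handing the parts to a pool TileDB owns and waiting on them collectively.

```cpp
// Hedged sketch of the before/after call shape (compiles standalone).
// FakeS3Client, FakeThreadPool, and every method on them are stand-ins
// invented for this illustration -- they are NOT the TileDB or AWS SDK APIs.
#include <functional>
#include <future>
#include <vector>

struct Status { bool ok = true; };

struct FakeS3Client {
  // Old path: an SDK-style async call that hands back a std::future the
  // caller must eventually block on with get().
  std::future<Status> upload_part_async(int part) {
    return std::async(std::launch::async, [part] { (void)part; return Status{}; });
  }
  // New path: a plain synchronous call, meant to run inside a task owned
  // by TileDB's own thread pool.
  Status upload_part(int part) { (void)part; return Status{}; }
};

struct FakeThreadPool {
  using Task = std::function<Status()>;
  std::vector<Task> queued;
  void execute(Task t) { queued.push_back(std::move(t)); }
  // A real wait_all can also *run* queued tasks on the waiting thread;
  // here we simply run them inline to keep the sketch short.
  Status wait_all() {
    Status st;
    for (auto& t : queued)
      if (!t().ok) st.ok = false;
    queued.clear();
    return st;
  }
};

// Before: one std::future per part; each get() parks the calling thread.
Status upload_parts_old(FakeS3Client& s3, int nparts) {
  std::vector<std::future<Status>> futs;
  for (int p = 0; p < nparts; ++p)
    futs.push_back(s3.upload_part_async(p));
  Status st;
  for (auto& f : futs)
    if (!f.get().ok) st.ok = false;  // blocking wait -- the starvation risk
  return st;
}

// After: parts become tasks on a pool TileDB controls, collected via wait_all.
Status upload_parts_new(FakeS3Client& s3, FakeThreadPool& pool, int nparts) {
  for (int p = 0; p < nparts; ++p)
    pool.execute([&s3, p] { return s3.upload_part(p); });
  return pool.wait_all();
}

int main() {
  FakeS3Client s3;
  FakeThreadPool pool;
  bool ok = upload_parts_old(s3, 4).ok && upload_parts_new(s3, pool, 4).ok;
  return ok ? 0 : 1;
}
```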
It seems to me that these changes are still subject to the same potential failure as the original.
Presumption: vfs_thread_pool_ is the same thread pool that is (already) running the submitting tasks (true/false?).
Assuming true...
parallel_for() places the tasks into the vfs_thread_pool_ task queue (which, in the original scenario, already had all of its task resources occupied by the submitting threads), and then does a tp->wait_all(tasks) on all of those S3-activity tasks.
Apart from perhaps having slightly different 'jitter' than the original, this approach still seems subject to having all available vfs_thread_pool_ threads occupied by the invoking/submitting tasks, and hence no resource available to run the tasks submitted for the S3 "jobs" in parallel_for(), because those threads are already blocked by the submitting tasks (see the sketch after this comment).
Am I missing something?
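To make the concern concrete, here is a minimal, self-contained program (not TileDB code; the two-thread Pool class is a toy written only for this demo) showing the starvation pattern described above: every pool thread is occupied by a submitting task that queues its sub-task into the same pool and then blocks in std::future::get(), so the sub-tasks can never run and the program intentionally hangs.

```cpp
// Minimal starvation demo (intentionally hangs). Not TileDB code: Pool is a
// toy fixed-size thread pool written just for this illustration.
// Build: g++ -std=c++17 -pthread starve.cc && ./a.out
#include <atomic>
#include <condition_variable>
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Pool {
 public:
  explicit Pool(size_t n) {
    for (size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { run(); });
  }
  ~Pool() {
    {
      std::lock_guard<std::mutex> lk(m_);
      done_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  std::future<void> submit(std::function<void()> f) {
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(f));
    auto fut = task->get_future();
    {
      std::lock_guard<std::mutex> lk(m_);
      q_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> f;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return done_ || !q_.empty(); });
        if (done_ && q_.empty()) return;
        f = std::move(q_.front());
        q_.pop();
      }
      f();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> q_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};

int main() {
  Pool pool(2);                 // stand-in for vfs_thread_pool_
  std::atomic<int> started{0};  // ensure both submitters are busy before either waits
  std::vector<std::future<void>> outer;
  for (int i = 0; i < 2; ++i) {
    outer.push_back(pool.submit([&pool, &started, i] {
      started.fetch_add(1);
      while (started.load() < 2) std::this_thread::yield();
      // Queue the "upload part" into the same pool, then block on it --
      // the analogue of waiting on a std::future for the S3 work.
      auto part = pool.submit([i] { std::cout << "part " << i << " uploaded\n"; });
      part.get();  // both pool threads are parked here; nobody runs `part`
    }));
  }
  for (auto& f : outer) f.get();  // never returns: thread-starvation deadlock
}
```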
The difference is that wait_all will make progress on pending tasks rather than blocking in get().
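A rough sketch of that idea (illustration only, not TileDB's actual wait_all implementation; TaskQueue and wait_and_help are invented names): while its own result is not ready, the waiting thread pulls pending tasks off the shared queue and runs them itself, so queued work keeps making progress even when every pool thread is a waiter.

```cpp
// Sketch of a "helping" wait (illustration only, not TileDB's wait_all).
#include <chrono>
#include <deque>
#include <future>
#include <mutex>

struct TaskQueue {
  std::mutex m;
  std::deque<std::packaged_task<void()>> pending;

  // Run one queued task on the calling thread, if any is available.
  bool run_one() {
    std::packaged_task<void()> t;
    {
      std::lock_guard<std::mutex> lk(m);
      if (pending.empty()) return false;
      t = std::move(pending.front());
      pending.pop_front();
    }
    t();
    return true;
  }
};

// Wait for `f`, but keep draining `q` instead of just sleeping.
template <class T>
T wait_and_help(std::future<T>& f, TaskQueue& q) {
  while (f.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
    if (!q.run_one())                            // nothing to help with
      f.wait_for(std::chrono::milliseconds(1));  // brief back-off
  }
  return f.get();
}

int main() {
  // The waiting thread itself completes the queued work it depends on.
  TaskQueue q;
  std::promise<int> done;
  auto fut = done.get_future();
  q.pending.emplace_back([&done] { done.set_value(42); });
  return wait_and_help(fut, q) == 42 ? 0 : 1;
}
```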
In the minus-one (-1) thread pool, yes. But is the new thread pool changing to the minus-one (-1) approach? And in discussions I think it has been agreed among some that the problem could still be seen under unfortunate circumstances even with the minus-one (-1) design...
I extracted the changes in this branch, applied them on top of a branch with the new thread pool, and am still seeing a hang there. It doesn't look quite like the old one; it seems to have only two threads 'actively' involved, but I think it's still a similar hang. (I didn't quite understand the three threads in the earlier version, so this actually matches what I would have expected.)
There is an actual deadlock here. (So far in testing this seems to be working, but I am going to let the testing run for a while longer.) Would it be better to make
Force-pushed from 99d8620 to 9bdbb56
I'm now getting a SEGV in a Debug-config build (WSL *nix) of commit 9bdbb56, with the previous repro scenario (concurrency_level==2):
#2 0x00005555561aa0b6 in Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error>::operator= (
Force-pushed from 04cea0c to ffa297b
This solves a deadlock that can occur when all cores are blocked waiting on the std::future get() call, leading to thread starvation.
Force-pushed from ffa297b to 7d02d21
TYPE: BUG
DESC: Avoid thread starvation by removing std::future usage in S3 multipart upload