-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a race when loader starts reading even the metadata is not ready yet #2773
Conversation
- PrepareMetadata sets that the reader is ready to read first before the metadata is prepared. In effect when the user queries loader for any meta, like size, it will start prefetching even the metadata itself is not ready yet. Fix the race by preparing the meta first and then setting the ready variable Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
CI MESSAGE: [2146316]: BUILD STARTED |
@@ -217,8 +217,8 @@ class Loader { | |||
void PrepareMetadata() { | |||
std::lock_guard<std::mutex> l(prepare_metadata_mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps instead of checking this flag outside, it should be checked here and the if (!loading_flag)
removed from all usages.
if (loading_flag_)
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
PrepareMetadataImpl(); | ||
loading_flag_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should insert a memory fence before setting this flag, so we're 100% sure that other threads will see all the changes before this flag is set - note that the other thread doesn't take a mutex, so there's no memory fence there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
CI MESSAGE: [2146702]: BUILD STARTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it could work.
CI MESSAGE: [2146702]: BUILD FAILED |
CI MESSAGE: [2147183]: BUILD STARTED |
CI MESSAGE: [2147183]: BUILD PASSED |
if (!loading_flag_) { | ||
PrepareMetadataImpl(); | ||
std::atomic_thread_fence(std::memory_order_release); | ||
loading_flag_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the loading_flag_ is not an atomic, is there any benefit in using this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence:
Establishes memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by order, without an associated atomic operation.
That is the exact reason - as loading_flag_
is not an atomic we need to use the explicit barrier as @mzient suggested.
…yet (#2773) - PrepareMetadata sets that the reader is ready to read first before the metadata is prepared. In effect when the user queries loader for any meta, like size, it will start prefetching even the metadata itself is not ready yet. Fix the race by preparing the meta first and then setting the ready variable - fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata, by replacing the call to Size by SizeImpl which provides the same functionality with the used set of arguments, but it can be done only when the loader has the necessary data fields initialized Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
Fix the race by preparing the meta first and then setting the ready variable
Fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata
loader
loader subclasses
check if there is no more inversed order leading to a race
current tests applies
NA
JIRA TASK: [DALI-1909]