Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race when loader starts reading even the metadata is not ready yet #2773

Merged
merged 4 commits into from
Mar 9, 2021

Conversation

JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Mar 9, 2021

  • PrepareMetadata sets that the reader is ready to read first before the metadata is prepared. In effect when the user queries loader for any meta, like size, it will start prefetching even the metadata itself is not ready yet. Fix the race by preparing the meta first and then setting the ready variable
  • fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata, by replacing the call to Size by SizeImpl which provides the same functionality with the used set of arguments, but it can be done only when the loader has the necessary data fields initialized

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why we need this PR?

Pick one, remove the rest

  • It fixes a race when loader starts reading even the metadata is not ready yet
  • It fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    Fix the race by preparing the meta first and then setting the ready variable
    Fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata
  • Affected modules and functionalities:
    loader
    loader subclasses
  • Key points relevant for the review:
    check if there is no more inversed order leading to a race
  • Validation and testing:
    current tests applies
  • Documentation (including examples):
    NA

JIRA TASK: [DALI-1909]

- PrepareMetadata sets that the reader is ready to read first before
  the metadata is prepared. In effect when the user queries loader for
  any meta, like size, it will start prefetching even the metadata itself
  is not ready yet. Fix the race by preparing the meta first and
  then setting the ready variable

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2146316]: BUILD STARTED

@@ -217,8 +217,8 @@ class Loader {
void PrepareMetadata() {
std::lock_guard<std::mutex> l(prepare_metadata_mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps instead of checking this flag outside, it should be checked here and the if (!loading_flag) removed from all usages.

if (loading_flag_)
  return;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

PrepareMetadataImpl();
loading_flag_ = true;
Copy link
Contributor

@mzient mzient Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should insert a memory fence before setting this flag, so we're 100% sure that other threads will see all the changes before this flag is set - note that the other thread doesn't take a mutex, so there's no memory fence there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2146702]: BUILD STARTED

@JanuszL JanuszL requested a review from mzient March 9, 2021 13:51
Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it could work.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2146702]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2147183]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2147183]: BUILD PASSED

if (!loading_flag_) {
PrepareMetadataImpl();
std::atomic_thread_fence(std::memory_order_release);
loading_flag_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the loading_flag_ is not an atomic, is there any benefit in using this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence:

Establishes memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by order, without an associated atomic operation.

That is the exact reason - as loading_flag_ is not an atomic we need to use the explicit barrier as @mzient suggested.

@JanuszL JanuszL merged commit ee0356c into NVIDIA:master Mar 9, 2021
@JanuszL JanuszL deleted the fix_loader_race branch March 9, 2021 22:07
JanuszL added a commit that referenced this pull request Mar 10, 2021
…yet (#2773)

- PrepareMetadata sets that the reader is ready to read first before the metadata is prepared. In effect when the user queries loader for any meta, like size, it will start prefetching even the metadata itself is not ready yet. Fix the race by preparing the meta first and then setting the ready variable
- fixes a deadlock caused by a sequence of calls PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata, by replacing the call to Size by SizeImpl which provides the same functionality with the used set of arguments, but it can be done only when the loader has the necessary data fields initialized

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants