Fix a race where the loader starts reading even though the metadata is not ready yet #2773

JanuszL · 2021-03-09T09:23:27Z

- PrepareMetadata marks the reader as ready to read before the metadata is prepared. As a result, when the user queries the loader for any metadata, such as its size, the loader starts prefetching even though the metadata itself is not ready yet. Fix the race by preparing the metadata first and only then setting the ready flag.
- Fix a deadlock caused by the call sequence PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata by replacing the call to Size with SizeImpl. For the arguments used there, SizeImpl provides the same functionality, but it may only be called once the loader has its required data fields initialized.
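The deadlock in the second item comes from re-entering a non-recursive mutex on the same thread. A minimal sketch, assuming a simplified loader whose public Size() calls PrepareMetadata() and whose Reset() needs the dataset size; the class name SketchLoader, the shard fields and the file list are hypothetical, and the method bodies are illustrative rather than DALI's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified loader. Method names follow the PR description;
// the bodies are illustrative assumptions, not DALI code.
class SketchLoader {
 public:
  SketchLoader(std::size_t shard_id, std::size_t num_shards)
      : shard_id_(shard_id), num_shards_(num_shards) {}

  // Public query: makes sure the metadata exists before answering.
  std::size_t Size() {
    PrepareMetadata();
    return SizeImpl();
  }

  void PrepareMetadata() {
    // std::mutex is not recursive: locking it again on the same thread
    // (for example via Size() from inside Reset()) is undefined behavior
    // and in practice usually deadlocks.
    std::lock_guard<std::mutex> l(prepare_metadata_mutex_);
    if (!loading_flag_) {
      PrepareMetadataImpl();
      loading_flag_ = true;
    }
  }

  std::size_t current_index() const { return current_index_; }

 private:
  void PrepareMetadataImpl() {
    files_ = {"a.jpg", "b.jpg", "c.jpg"};  // pretend the dataset was listed
    Reset();
  }

  void Reset() {
    // Calling Size() here would re-enter PrepareMetadata() while this thread
    // already holds prepare_metadata_mutex_. SizeImpl() only reads files_,
    // which PrepareMetadataImpl() has just filled, and takes no lock.
    current_index_ = shard_id_ * SizeImpl() / num_shards_;
  }

  // Only valid once files_ has been initialized.
  std::size_t SizeImpl() const { return files_.size(); }

  std::mutex prepare_metadata_mutex_;
  bool loading_flag_ = false;
  std::vector<std::string> files_;
  std::size_t shard_id_;
  std::size_t num_shards_;
  std::size_t current_index_ = 0;
};
```

The design point is the split between a public, locking entry point (Size) and an internal, lock-free accessor (SizeImpl) that callers already holding the lock must use instead.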

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Why do we need this PR?

- It fixes a race where the loader starts reading even though the metadata is not ready yet
- It fixes a deadlock caused by the call sequence PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata

What happened in this PR?

What solution was applied:
- Fixes the race by preparing the metadata first and only then setting the ready flag
- Fixes a deadlock caused by the call sequence PrepareMetadata->PrepareMetadataImpl->Reset->Size->PrepareMetadata
Affected modules and functionalities:
- loader
- loader subclasses
Key points relevant for the review:
- check that no other inverted ordering remains that could lead to a race
Validation and testing:
- current tests apply
Documentation (including examples):
- NA

JIRA TASK: [DALI-1909]

- PrepareMetadata marks the reader as ready to read before the metadata is prepared. As a result, when the user queries the loader for any metadata, such as its size, the loader starts prefetching even though the metadata itself is not ready yet. Fix the race by preparing the metadata first and only then setting the ready flag.

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

dali-automaton · 2021-03-09T09:25:45Z

CI MESSAGE: [2146316]: BUILD STARTED

mzient · 2021-03-09T10:44:44Z

dali/operators/reader/loader/loader.h

@@ -217,8 +217,8 @@ class Loader {
  void PrepareMetadata() {
    std::lock_guard<std::mutex> l(prepare_metadata_mutex_);


Perhaps, instead of checking this flag outside, it should be checked here, and the if (!loading_flag_) check removed from all call sites.

if (loading_flag_) return;
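A minimal sketch of this suggestion, assuming a simplified class (OnceLoader, impl_calls() and impl_calls_ are hypothetical names): the flag is checked inside PrepareMetadata() under the mutex, and it is set only after PrepareMetadataImpl() has finished, which is the ordering this PR introduces.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch: PrepareMetadata() guards itself, so callers no longer
// need their own "if (!loading_flag_)" check.
class OnceLoader {
 public:
  void PrepareMetadata() {
    std::lock_guard<std::mutex> l(prepare_metadata_mutex_);
    if (loading_flag_) return;  // already prepared by an earlier call
    PrepareMetadataImpl();      // 1) build the metadata first...
    loading_flag_ = true;       // 2) ...and only then mark it as ready
  }

  int impl_calls() const { return impl_calls_; }

 private:
  // Stand-in for the real metadata preparation; just counts invocations.
  void PrepareMetadataImpl() { ++impl_calls_; }

  std::mutex prepare_metadata_mutex_;
  bool loading_flag_ = false;  // only touched while holding the mutex here
  int impl_calls_ = 0;
};
```

Because the check happens under prepare_metadata_mutex_, repeated calls run PrepareMetadataImpl() once. A thread that reads loading_flag_ without taking the mutex still needs additional ordering guarantees, which is what the following comments discuss.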

mzient · 2021-03-09T10:45:21Z

dali/operators/reader/loader/loader.h

      PrepareMetadataImpl();
+      loading_flag_ = true;


I guess we should insert a memory fence before setting this flag, so we are 100% sure that other threads see all the changes made before the flag is set. Note that the other thread doesn't take the mutex, so there is no memory fence on its side.
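An alternative to an explicit fence is to make the flag itself a std::atomic<bool> and use release/acquire ordering on it. A minimal sketch, with hypothetical names (AtomicFlagLoader, meta_, TryReadSize) and not the code that was merged; the comments note which thread is assumed to call each method:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class AtomicFlagLoader {
 public:
  // Called by the thread that prepares the metadata.
  void PrepareMetadata() {
    std::lock_guard<std::mutex> l(prepare_metadata_mutex_);
    // Relaxed is enough here: only writers holding the mutex store the flag.
    if (loading_flag_.load(std::memory_order_relaxed)) return;
    meta_ = {10, 20, 30};  // plain, non-atomic writes
    // Release store: the writes to meta_ above cannot be reordered after it.
    loading_flag_.store(true, std::memory_order_release);
  }

  // Called by a prefetching thread that does NOT take the mutex.
  // Returns false if the metadata has not been published yet.
  bool TryReadSize(std::size_t* size) const {
    // Acquire load: if it observes true, the writes to meta_ are visible.
    if (!loading_flag_.load(std::memory_order_acquire)) return false;
    *size = meta_.size();
    return true;
  }

 private:
  std::mutex prepare_metadata_mutex_;
  std::atomic<bool> loading_flag_{false};
  std::vector<int> meta_;
};
```

With this design the ordering guarantee lives on the flag accesses themselves, so the reader side needs no separate fence.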

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

dali-automaton · 2021-03-09T12:32:55Z

CI MESSAGE: [2146702]: BUILD STARTED

mzient

Looks like it could work.

dali-automaton · 2021-03-09T14:18:43Z

CI MESSAGE: [2146702]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

dali-automaton · 2021-03-09T16:12:04Z

CI MESSAGE: [2147183]: BUILD STARTED

dali-automaton · 2021-03-09T17:56:28Z

CI MESSAGE: [2147183]: BUILD PASSED

klecki · 2021-03-09T18:57:43Z

dali/operators/reader/loader/loader.h

+      if (!loading_flag_) {
+        PrepareMetadataImpl();
+        std::atomic_thread_fence(std::memory_order_release);
+        loading_flag_ = true;


If loading_flag_ is not atomic, is there any benefit in using this?

https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence:

Establishes memory synchronization ordering of non-atomic and relaxed atomic accesses, as instructed by order, without an associated atomic operation.

That is exactly the reason: as loading_flag_ is not atomic, we need to use the explicit barrier, as @mzient suggested.
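Under the C++ memory model (the fence-fence synchronization rules on the cppreference page quoted above), a release fence in one thread synchronizes with an acquire fence in another only when an atomic store sequenced after the release fence is read by an atomic load sequenced before the acquire fence; an unsynchronized read of a plain bool from another thread remains a data race even with fences. A minimal sketch of the fence-based variant, assuming the flag is changed to a std::atomic<bool> accessed with relaxed operations (FenceLoader, meta_ and TryReadSize are hypothetical names):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class FenceLoader {
 public:
  // Called by the thread that prepares the metadata.
  void PrepareMetadata() {
    std::lock_guard<std::mutex> l(prepare_metadata_mutex_);
    if (loading_flag_.load(std::memory_order_relaxed)) return;
    meta_ = {1, 2, 3, 4};  // plain, non-atomic writes
    // Release fence: the writes above happen-before anything that
    // synchronizes through the relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    loading_flag_.store(true, std::memory_order_relaxed);
  }

  // Called by a prefetching thread that does NOT take the mutex.
  bool TryReadSize(std::size_t* size) const {
    if (!loading_flag_.load(std::memory_order_relaxed)) return false;
    // Acquire fence: pairs with the release fence once the relaxed load
    // above has observed the value written by the relaxed store.
    std::atomic_thread_fence(std::memory_order_acquire);
    *size = meta_.size();
    return true;
  }

 private:
  std::mutex prepare_metadata_mutex_;
  std::atomic<bool> loading_flag_{false};
  std::vector<int> meta_;
};
```

Relaxed atomic accesses compile to ordinary loads and stores on common hardware, so this keeps the cost of the original plain-bool approach while staying within the language's rules.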

mzient approved these changes Mar 9, 2021

jantonguirao assigned mzient and klecki Mar 9, 2021

mzient reviewed Mar 9, 2021

JanuszL added 2 commits March 9, 2021 13:21

Deadlock fix

7976817

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Fix deadlock

761a3bd

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

JanuszL requested a review from mzient March 9, 2021 13:51

mzient approved these changes Mar 9, 2021

Test fix

4e89594

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

klecki approved these changes Mar 9, 2021

JanuszL merged commit ee0356c into NVIDIA:master Mar 9, 2021

JanuszL deleted the fix_loader_race branch March 9, 2021 22:07
