Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race condition in GetGPUAllocator #1575

Merged
merged 2 commits into from
Dec 17, 2019
Merged

Fix a race condition in GetGPUAllocator #1575

merged 2 commits into from
Dec 17, 2019

Conversation

JanuszL
Copy link
Contributor

@JanuszL JanuszL commented Dec 13, 2019

  • reworks AllocatorManager to use atomic pointers
  • reworks AllocatorManager to be a static global variable available by the global accessors

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why we need this PR?

  • Fixes a race condition in GetGPUAllocator

What happened in this PR?

  • reworks AllocatorManager to use atomic pointers
  • reworks AllocatorManager to be a static global variable available by the global accessors
  • tested in CI, local stress test
  • no docs or examples needs to be updated

JIRA TASK: [DALI-1181]

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032036]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032036]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032051]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032051]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032768]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1032768]: BUILD PASSED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1034872]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1034872]: BUILD FAILED

}

~AllocatorManager() {
delete cpu_allocator_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand - I know it's a destructor and nothing should follow it - but on the other, just for debugging purposes I prefer to set the deleted pointers to nullptr. That should save us some headache when some crazy bug hits us with a rogue thread trying to use these after we've shut down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
delete cpu_allocator_;
delete std::atomic_exchange(&cpu_allocator_, nullptr);

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1034879]: BUILD STARTED

Comment on lines 36 to 38
for (auto& gpu_alloc : gpu_allocators_) {
gpu_alloc.store(nullptr);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dont't they get that by default?

Comment on lines +84 to +81
alloc = gpu_allocators_[dev].load();
if (!alloc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1034883]: BUILD STARTED

if (!alloc) {
SetGPUAllocator(*gpu_opspec_);
}
return *alloc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set alloc to a proper value!

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1035052]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1034883]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [1035052]: BUILD PASSED

- reworks AllocatorManager to use atomic pointers
- reworks AllocatorManager to be a static global variable
  available by the global accessors

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL JanuszL merged commit 24dac43 into NVIDIA:master Dec 17, 2019
@JanuszL JanuszL deleted the fix_race_allocator branch December 17, 2019 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants