[MXNET-362] ensure same mkldnn engine is used for consistency #10616
@@ -67,7 +67,8 @@ class CpuEngine {
public:
static CpuEngine *Get() {
// It's thread-safe in C++11.
So do we need to remove this comment line?
The comment is correct. It says the mkldnn engine is thread-safe to use in the mkldnn C++ API.
2446017 to a293d8c
@zheng-da I added a unit test. @marcoabreu can you please merge if it looks OK? Thanks.
tests/python/mkl/test_mkldnn.py
val_data = gluon.data.DataLoader(
    gluon.data.vision.CIFAR10(train=False),
    batch_size=32, shuffle=False, num_workers=1)
Can you remove this data loader? It doesn't seem that this test needs it.
Actually, the DataLoader is what triggers the different thread context. I tried without a DataLoader, and also with num_workers=0 passed to the DataLoader above. In both cases everything runs on the same thread, so the bug won't happen. Gluon DataLoader lets us create a new thread as it iterates over data batches.
Thanks a lot for this test! Could you please describe the exact behaviour of this unit test in a block comment inside the test? I agree with Da that, at the moment, it's hard to grasp the exact problem from reading the code. For me, it's hard to understand when the different threads are started and what the exact issue is.
Yes, please add comments explaining why we need the data loader here. Thanks.
Gluon DataLoader lets us create a new thread as it iterates over data batches. I added comments to the PR.
a293d8c to 3181b8c
Thanks a lot! Please add a jira ticket and we're good to go!
3181b8c to ff93243
ff93243 to 104781c
@zheng-da I updated the test to use dummy data. Can you please review and accept if it looks OK?
@ashokei it seems that the test can't reproduce the bug when it uses dummy data.
…#10616)
* ensure same mkldnn engine is used for consistency
* add unittest for mkldnn engine thread testing
* add comments for thread context switching
* fix lint issue
* use dummy data
Description
Gluon data iterators may run in a different thread from the main execution context, which causes the mkl-dnn engine to be inconsistent. The following snippet reproduces this issue.
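As a rough illustration of the mechanism only (this is not the reproduction snippet from the PR and it does not use MXNet or MKL-DNN), the plain-Python sketch below shows how an object looked up per thread differs between the main thread and a worker thread, while a process-wide instance stays the same. It assumes the inconsistency comes from each thread obtaining its own engine object. `Engine`, `get_engine_shared`, `get_engine_per_thread`, and `engine_seen_by_worker` are hypothetical names made up for this example.

```python
import threading


class Engine:
    """Hypothetical stand-in for an engine handle (not an MXNet class)."""


_shared_engine = Engine()        # one instance for the whole process
_per_thread = threading.local()  # attribute storage that is separate per thread


def get_engine_shared():
    """Return the single process-wide engine."""
    return _shared_engine


def get_engine_per_thread():
    """Return an engine that is created lazily, once per calling thread."""
    if not hasattr(_per_thread, "engine"):
        _per_thread.engine = Engine()
    return _per_thread.engine


def engine_seen_by_worker(getter):
    """Call `getter` in a new thread and return the object it produced."""
    result = []
    worker = threading.Thread(target=lambda: result.append(getter()))
    worker.start()
    worker.join()
    return result[0]


main_shared = get_engine_shared()
main_local = get_engine_per_thread()
worker_shared = engine_seen_by_worker(get_engine_shared)
worker_local = engine_seen_by_worker(get_engine_per_thread)

print(worker_shared is main_shared)  # True: both threads see the same engine
print(worker_local is main_local)    # False: the worker got its own engine
```

If a worker thread, such as one started while iterating over data batches, ends up with its own engine, anything created against the main thread's engine no longer matches it. Handing every thread the same instance avoids that mismatch.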