Add intra_rank to all communicators #177
Conversation
Overall this patch is simple and great. One more thing, though: I want to see a small test that asserts the following.

```diff
diff --git a/tests/chainermn_tests/communicator_tests/test_communicator.py b/tests/chainermn_tests/communicator_tests/test_communicator.py
index e06e459..814e5d3 100644
--- a/tests/chainermn_tests/communicator_tests/test_communicator.py
+++ b/tests/chainermn_tests/communicator_tests/test_communicator.py
@@ -195,6 +195,11 @@ def check_send_recv(param, use_gpu):
 
 def check_collective_communication(param, use_gpu):
     communicator = create_communicator(param, use_gpu)
+    assert 0 <= communicator.intra_rank
+    assert communicator.intra_rank < communicator.intra_size
+    assert 0 <= communicator.inter_rank
+    assert communicator.inter_rank < communicator.inter_size
+
     model = ExampleModel()
     if use_gpu:
         model.to_gpu()
```
Please add one more test, as I mentioned above.
```diff
@@ -36,8 +36,27 @@ def __init__(self, obj):
 
 
 class CommunicatorBase(object):
 
-    def __init__(self, mpi_comm):
+    def __init__(self, mpi_comm, use_nccl=False):
```
If you change this default value to `False`, the superclass initialization in `PureNcclCommunicator.__init__()` must be updated, too:

```python
super(PureNcclCommunicator, self).__init__(mpi_comm)
self._init_ranks()
```
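Presumably the intended fix is to forward the flag explicitly from the subclass; a minimal sketch (assuming `use_nccl=True` is what the pure-NCCL communicator needs — this is not the actual patch):

```python
# Sketch only: pass use_nccl=True so the NCCL-backed subclass keeps
# NCCL initialization even though the base-class default is now False.
super(PureNcclCommunicator, self).__init__(mpi_comm, use_nccl=True)
self._init_ranks()
```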
Fixed!
And I added tests to check the values of `intra_rank` and `inter_rank` in `test_node_aware_communicator_base.py`.
In the current communicator class hierarchy, not all communicators have an `intra_rank` attribute, for historical reasons. The attribute is currently used to determine whether ChainerMN is using GPUs, because `intra_rank` was originally implemented to decide which GPU each process should use within a local physical node. However, GPU ID and `intra_rank` are different concepts, so this usage is inappropriate.

This PR merges `NodeAwareCommunicator` into `CommunicatorBase`, so that `CommunicatorBase` and all of its subclasses have the `intra_rank` attribute.
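For readers unfamiliar with these attributes, here is a minimal sketch of how intra-node and inter-node ranks can be derived from an MPI communicator. This is not ChainerMN's actual implementation; the use of mpi4py and this particular derivation are assumptions for illustration:

```python
# Illustrative sketch, assuming mpi4py with MPI-3 support.
from mpi4py import MPI

def compute_ranks(mpi_comm):
    # Group the processes that share a physical node.
    intra_comm = mpi_comm.Split_type(MPI.COMM_TYPE_SHARED)
    intra_rank = intra_comm.Get_rank()  # rank within the local node
    intra_size = intra_comm.Get_size()  # number of processes on this node
    # Processes with the same intra_rank form one inter-node group.
    inter_comm = mpi_comm.Split(color=intra_rank, key=mpi_comm.Get_rank())
    inter_rank = inter_comm.Get_rank()  # which node this process is on
    inter_size = inter_comm.Get_size()  # number of nodes
    return intra_rank, intra_size, inter_rank, inter_size
```

In typical multi-process GPU setups, each local process selects the device whose ID equals its `intra_rank` (e.g. `chainer.cuda.get_device(comm.intra_rank).use()`), which is how the attribute came to be associated with GPU selection even though, as noted above, GPU ID and `intra_rank` are distinct concepts.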