
Add intra_rank to all communicators #177

Merged
merged 5 commits into master from add-intra-rank-to-all-communicators on Dec 25, 2017

Conversation

@keisukefukuda (Member) commented Dec 25, 2017

In the current communicator class hierarchy, not all communicators have an intra_rank attribute, for historical reasons.
The attribute is currently used to determine whether ChainerMN is using GPUs, because intra_rank was originally implemented to choose which GPU to use within a local physical node. However, GPU ID and intra_rank are different concepts, so this usage is inappropriate.

This PR merges NodeAwareCommunicator into CommunicatorBase, so that CommunicatorBase and all of its subclasses have the intra_rank attribute.
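
For context, a minimal sketch of the conventional intra_rank usage pattern the description refers to; the communicator name and device-selection calls below are illustrative, not part of this PR:

```python
import chainer
import chainermn

# Sketch: one MPI process per GPU. intra_rank numbers the processes
# within each physical node (0, 1, ...), so it works as a local device
# index; it is not a global GPU identifier, and its presence does not
# by itself mean GPUs are in use.
comm = chainermn.create_communicator('hierarchical')
device = comm.intra_rank
chainer.cuda.get_device_from_id(device).use()
```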

@keisukefukuda changed the title from "[WIP] Add intra_rank to all communicators" to "Add intra_rank to all communicators" on Dec 25, 2017
@kuenishi (Member) commented

Overall this patch is simple and great. One more thing, though: I want to see a small test that ensures communicator.intra_rank never returns None, like this:

diff --git a/tests/chainermn_tests/communicator_tests/test_communicator.py b/tests/chainermn_tests/communicator_tests/test_communicator.py
index e06e459..814e5d3 100644
--- a/tests/chainermn_tests/communicator_tests/test_communicator.py
+++ b/tests/chainermn_tests/communicator_tests/test_communicator.py
@@ -195,6 +195,11 @@ def check_send_recv(param, use_gpu):
 def check_collective_communication(param, use_gpu):
     communicator = create_communicator(param, use_gpu)
 
+    assert 0 <= communicator.intra_rank
+    assert communicator.intra_rank < communicator.intra_size
+    assert 0 <= communicator.inter_rank
+    assert communicator.inter_rank < communicator.inter_size
+
     model = ExampleModel()
     if use_gpu:
         model.to_gpu()

@kuenishi (Member) left a comment

Please add one more test, as mentioned above.

@@ -36,8 +36,27 @@ def __init__(self, obj):

 class CommunicatorBase(object):

-    def __init__(self, mpi_comm):
+    def __init__(self, mpi_comm, use_nccl=False):
@kuenishi (Member) commented

If you change this default value to False, the superclass initialization in PureNcclCommunicator.__init__() must be updated too:

        super(PureNcclCommunicator, self).__init__(mpi_comm)
        self._init_ranks()
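
For concreteness, a hedged sketch of the requested follow-up change, assuming PureNcclCommunicator sits in the same module as CommunicatorBase; the inheritance chain shown here is an assumption for illustration, not the actual ChainerMN source:

```python
class PureNcclCommunicator(CommunicatorBase):  # parent class assumed for illustration

    def __init__(self, mpi_comm):
        # CommunicatorBase now takes use_nccl with a default of False,
        # so a NCCL-only communicator must opt in explicitly instead of
        # relying on the old default.
        super(PureNcclCommunicator, self).__init__(mpi_comm, use_nccl=True)
        self._init_ranks()  # as in the quoted snippet above
```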

@keisukefukuda (Member, Author) commented

Fixed!

@keisukefukuda (Member, Author) commented

I also added tests that check the values of intra_rank and inter_rank, in test_node_aware_communicator_base.py.
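
A hedged sketch of the kind of assertion such a test might contain; the test name and communicator type are assumptions, not the actual contents of test_node_aware_communicator_base.py, and like the other communicator tests it would need to run under mpiexec:

```python
import chainermn


def test_ranks_are_always_defined():
    # Sketch: after this PR every communicator exposes intra_rank and
    # inter_rank, so both should be in-range integers, never None.
    comm = chainermn.create_communicator('naive')
    assert 0 <= comm.intra_rank < comm.intra_size
    assert 0 <= comm.inter_rank < comm.inter_size
```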

@kuenishi merged commit ce43479 into master on Dec 25, 2017
@kuenishi deleted the add-intra-rank-to-all-communicators branch on December 25, 2017 at 10:35
@kuenishi added this to the 1.2.0 milestone on Jan 11, 2018
@iwiwi added the feature label on Jan 29, 2018