
Add FP16 and FP64 Supports to PureNcclComunicator #187

Merged
merged 18 commits into master from support_fp16_fp64 on Feb 23, 2018

Conversation

@shu65 (Member) commented Jan 17, 2018

This PR adds FP16 and FP64 support to allreduce_grad of PureNcclCommunicator.
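For context, a minimal usage sketch (an assumption based on this diff, not an excerpt from the PR): with this change, passing allreduce_grad_dtype to chainermn.create_communicator selects the dtype used for the All-Reduce of gradients.

import numpy as np
import chainermn

# The allreduce_grad_dtype keyword is the new option added by this PR;
# it is only accepted by the 'pure_nccl' communicator.
comm = chainermn.create_communicator('pure_nccl',
                                     allreduce_grad_dtype=np.float16)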

@shu65 shu65 changed the title [WIP] Support FP16 and FP64 in allreduce_grad of PureNcclComunicator Add FP16 and FP64 Support to PureNcclComunicator Feb 1, 2018
@shu65 shu65 changed the title Add FP16 and FP64 Support to PureNcclComunicator Add FP16 and FP64 Supports to PureNcclComunicator Feb 1, 2018
@kuenishi (Member) left a comment
Despite a few nitpicks, the code looks good to me.

@@ -41,6 +45,10 @@ def create_communicator(
        import mpi4py.MPI
        mpi_comm = mpi4py.MPI.COMM_WORLD

    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
        raise ValueError(
            'allreduce_grad_dtype is not supported except for \'pure_nccl\'.')
@kuenishi (Member):
'allreduce_grad_dtype is only available at 'pure_nccl' communicator.'

@shu65 (Member, Author):
Thank you. I will fix this error message according to your comment.
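A sketch of how the check could read with the suggested wording (the final merged message is not shown in this thread, so this phrasing is an assumption):

    if communicator_name != 'pure_nccl' and allreduce_grad_dtype is not None:
        raise ValueError(
            'allreduce_grad_dtype is only available at \'pure_nccl\' communicator.')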

@@ -31,6 +34,7 @@ def create_communicator(
            ``hierarchical``, ``two_dimensional``, ``pure_nccl``, or
            ``single_node``)
        mpi_comm: MPI4py communicator
        allreduce_grad_dtype: Data type of gradient used in All-Reduce
@kuenishi (Member):
I think the allowed values here are only np.float32, np.float16, or None. Maybe we can add a value check somewhere around here, before actually creating the communicator?

@kuenishi (Member):
Sorry, there is already a value check in PureNcclCommunicator, so maybe we can just add documentation here, like "Allowed types are numpy.float16, 32, 64, or None....".

@kuenishi (Member):
So the behaviour when None is passed should be described here, which is to use the float type of the model.

@shu65 (Member, Author):
OK. I will add the description.
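A sketch of the docstring entry with the requested description (the wording is an assumption based on this thread, not the merged text):

        allreduce_grad_dtype: Data type of gradient used in All-Reduce.
            Allowed types are numpy.float16, numpy.float32, numpy.float64,
            or None. If None, the float type of the model is used.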

@@ -70,8 +70,8 @@ def ptr(self):
    def buffer(self, size):
        return self.ffi.buffer(self.ffi.cast('void *', self.memory.ptr), size)

    def array(self, shape, offset=0):
        return cp.ndarray(shape, memptr=self.memory + offset, dtype=cp.float32)
    def array(self, shape, offset=0, dtype=np.float32):
@kuenishi (Member):
So if we don't want cupy's default behaviour of setting dtype to float64 when None is passed, it might be better to add a non-None assertion here.

@shu65 (Member, Author):
I also think so; I will add the assertion.
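A sketch of the array helper with the discussed assertion (it relies on the surrounding buffer class providing self.memory; the merged implementation may differ):

    def array(self, shape, offset=0, dtype=np.float32):
        # Guard against cupy's default of float64 when dtype is None.
        assert dtype is not None
        return cp.ndarray(shape, memptr=self.memory + offset, dtype=dtype)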

@kuenishi kuenishi merged commit c2e1440 into master Feb 23, 2018
@kuenishi kuenishi deleted the support_fp16_fp64 branch February 23, 2018 01:58
@kuenishi kuenishi added this to the v1.3.0 milestone Apr 5, 2018
@kuenishi kuenishi mentioned this pull request Apr 5, 2018
@shu65 shu65 added the feature label May 24, 2018