
ComplexFloat not supported #539

Closed
jackhwalters opened this issue Aug 2, 2021 · 6 comments

Comments

@jackhwalters

jackhwalters commented Aug 2, 2021

Hello,

I am trying to train a complex-valued deep learning model across multiple GPUs in PyTorch. When I attempt to do so I receive this error:

RuntimeError: Input tensor data type is not supported for NCCL process group: ComplexFloat

As far as I can tell this is a lack of support for complex tensors rather than a bug. Is there any chance complex support will be added soon?

@sjeaugey
Member

sjeaugey commented Aug 3, 2021

Adding new datatypes means increasing NCCL binary size significantly, so we usually don't add datatypes unless they are really needed and there is no alternative.

In your case, I'd assume a ComplexFloat is a tuple of two floats, so performing an allreduce on an array of N ComplexFloats is no different from performing an allreduce on an array of 2N floats.

If that's the case, maybe you can convert your array before you pass it, or maybe PyTorch could support ComplexFloat in the NCCL process group by performing the allreduce on floats?
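
For illustration, a rough sketch of what that could look like on the PyTorch side, assuming torch.distributed is already initialized with the NCCL backend (the helper name allreduce_complex is made up for this example):

```python
import torch
import torch.distributed as dist

def allreduce_complex(x: torch.Tensor) -> torch.Tensor:
    # Reinterpret the N complex values as 2N floats (real/imag interleaved).
    # This is a view of the same storage, not a copy.
    x_as_real = torch.view_as_real(x)
    # Summing the float view sums real and imaginary parts independently,
    # which is exactly the complex sum.
    dist.all_reduce(x_as_real, op=dist.ReduceOp.SUM)
    return x  # x already holds the reduced values, since the view shares storage
```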

@jackhwalters
Author

Yes, I'm probably going to use

torch.view_as_real(x)

where necessary to convert the complex tensors to real ones. Are there specific places in the pipeline that need to be real-valued for NCCL to run? That is, does every instance of a complex tensor need to be converted to a real one, or are there specific places where I can convert to real and then back to complex, so I don't have to convert every tensor in my project?
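
For reference, the round trip looks like it is just a reinterpretation of the same storage rather than a copy, so converting only at the boundaries should be cheap. A small sketch, assuming a contiguous complex tensor:

```python
import torch

x = torch.randn(4, dtype=torch.complex64)    # complex tensor
r = torch.view_as_real(x)                    # float32 view, shape (4, 2)
y = torch.view_as_complex(r)                 # back to complex64

# All three share the same underlying storage: no data is copied.
assert x.data_ptr() == r.data_ptr() == y.data_ptr()
assert torch.equal(x, y)

# Writing through the float view is visible through the complex tensor.
r[0, 0] = 1.0
assert x[0].real == 1.0
```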

@sjeaugey
Member

sjeaugey commented Aug 4, 2021

I'm not sure I understand the question. If you pass your complex array as floats, operations will be done on the complex array (just treating it as an array of floats) and you should be able to use it immediately without the need for conversion.

I'm not a PyTorch expert though, so I could be missing something; I don't know what "view_as_real" does, I'm only inferring this is a cast, i.e. not a copy, just a different typing.

@jackhwalters
Author

Because I am passing complex-valued spectrograms to the network, I will need to cast them from complex to real, and I was wondering whether there are any parts of the network I can leave as complex tensors. The reason I ask is that a significant portion of my network is written in complex notation, which I would otherwise need to change, and I don't want to change all of it if I don't have to. That would be the case, for example, if NCCL only required the tensors going into and coming out of the network to be real, so I could do the appropriate casting just after the input and just before the output.

@sjeaugey
Member

sjeaugey commented Aug 6, 2021

Sorry, I'm still confused. When viewing an array of complexes as "real" I would expect no conversion to happen. It is just a different view, as 2n reals instead of n complexes, and summing would work just fine, since it sums the real and imaginary parts independently. We don't need to know that the array is an array of complex numbers, given that the sum is only done on floats.

So summing an array of complexes doesn't seem to me to need a special complex data type.
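
In other words, summing the float views element-wise gives exactly the complex sum. A quick illustration in plain PyTorch (the tensors a and b are arbitrary examples, not tied to NCCL):

```python
import torch

a = torch.randn(3, dtype=torch.complex64)
b = torch.randn(3, dtype=torch.complex64)

# Real and imaginary parts add independently, so summing the float views
# is the same as viewing the complex sum as floats.
lhs = torch.view_as_real(a) + torch.view_as_real(b)
rhs = torch.view_as_real(a + b)
assert torch.allclose(lhs, rhs)
```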

@jackhwalters
Author

jackhwalters commented Aug 9, 2021

Yes, numerically the tensors are the same, but how I access the real and imaginary parts differs depending on whether the tensor has a real or complex dtype. Either way, that's not an NCCL issue.
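
For example, the two views of the same data expose the parts differently (a small illustration with arbitrary tensors):

```python
import torch

c = torch.randn(3, dtype=torch.complex64)
r = torch.view_as_real(c)                # float32 view, shape (3, 2)

# Complex dtype: named accessors for the two parts.
real_c, imag_c = c.real, c.imag
# Float view: index the trailing (real, imag) dimension instead.
real_r, imag_r = r[..., 0], r[..., 1]

assert torch.equal(real_c, real_r) and torch.equal(imag_c, imag_r)
```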
