Hi,
Is this known / being worked on? When I try to call collectives under theano.config.floatX="float16", I get the error:
File "pygpu/collectives.pyx", line 257, in pygpu.collectives.GpuComm.broadcast (pygpu/collectives.c:5022)
File "pygpu/collectives.pyx", line 362, in pygpu.collectives.comm_broadcast (pygpu/collectives.c:6060)
pygpu.gpuarray.GpuArrayException: b'Invalid value or operation'
NCCL can definitely do half-precision.
Really hoping for that free 2x speedup ;)
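For reference, a rough sketch of where I expect the 2x to come from: a float16 buffer is half the size of a float32 one, so a bandwidth-bound collective moves half the bytes. This only counts bytes with Python's standard `struct` module; it does not touch pygpu or NCCL. The `payload_bytes` helper and the element count are made up for illustration, and real speedup depends on more than payload size.

```python
import struct

def payload_bytes(num_elements, fmt):
    """Bytes needed to hold num_elements values of the given struct format."""
    return num_elements * struct.calcsize(fmt)

n = 1_000_000  # hypothetical number of elements in the broadcast buffer

fp32_bytes = payload_bytes(n, "f")  # "f" = IEEE 754 single precision
fp16_bytes = payload_bytes(n, "e")  # "e" = IEEE 754 half precision

# Ratio of bytes moved: float32 versus float16
ratio = fp32_bytes / fp16_bytes
print(fp32_bytes, fp16_bytes, ratio)
```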