Support for bfloat16 #7527

Open
philippwitte opened this issue Apr 21, 2023 · 8 comments

@philippwitte

Description

Are there plans to support the bfloat16 data type in the near future? This data type is becoming increasingly popular in LLM training. It looks like it is currently not supported: calling y = cp.asarray(x), where x is a torch tensor of type torch.bfloat16, raises "TypeError: Got unsupported ScalarType BFloat16". Are there any recommended workarounds in the meantime?
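
For reference, a minimal reproduction of the reported error (assuming a CUDA build of PyTorch and CuPy; the exact message may differ by version):

import cupy as cp
import torch

x = torch.arange(10, dtype=torch.bfloat16, device="cuda")
y = cp.asarray(x)   # raises TypeError: Got unsupported ScalarType BFloat16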


@philippwitte philippwitte added the cat:feature New features/APIs label Apr 21, 2023
@leofang
Member

leofang commented Apr 21, 2023

Curious where/how you would use bf16 if CuPy were to support it? Any pointers or references? Thanks! 🙂

@jglaser
Contributor

jglaser commented Sep 12, 2023

It would be good if NumPy data type extensions à la https://github.com/jax-ml/ml_dtypes/tree/main were supported, which include bfloat16, fp8, etc.
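
For context, a rough sketch of what ml_dtypes already provides on the NumPy side (illustrative only; the exact set of dtypes depends on the ml_dtypes version):

import numpy as np
import ml_dtypes

# ml_dtypes registers bfloat16, float8_e4m3fn, etc. as NumPy extension dtypes
x = np.arange(10, dtype=np.float32).astype(ml_dtypes.bfloat16)
print(x.dtype)                # bfloat16
print(x.astype(np.float32))   # values 0..9 survive the round trip exactly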

@guberti

guberti commented Sep 18, 2023

Seconding this! bfloat16 and fp8 support are important for my use case. I'd love to see these.

@wuxibin89

Any progress on this? We really need it for LLM training and inference.

@borisfom

bfloat16 support is sorely missed in CuPy. We would really appreciate it getting fixed!
We are currently forced to work around it like this (thankfully we have torch.view):

import cupy
import torch

x = torch.arange(10, dtype=torch.bfloat16, device="cuda")
print(x)
# tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], device='cuda:0',
#        dtype=torch.bfloat16)

# view as uint8
y = x.view(dtype=torch.uint8)

array_size_in_bytes = y.nelement() * y.element_size()

# wrap the uint8 view's device pointer in a zero-copy CuPy array
mem = cupy.cuda.UnownedMemory(y.data_ptr(), array_size_in_bytes, owner=None)
memptr = cupy.cuda.MemoryPointer(mem, offset=0)
arr = cupy.ndarray(y.size(), dtype=cupy.uint8, memptr=memptr)
out = torch.as_tensor(arr, device=x.device, dtype=torch.uint8)
print(out)
# tensor([  0,   0, 128,  63,   0,  64,  64,  64, 128,  64, 160,  64, 192,  64,
#         224,  64,   0,  65,  16,  65], device='cuda:0', dtype=torch.uint8)

# view as bfloat16 again
out = out.view(x.dtype)
print(out)
# tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], device='cuda:0',
#        dtype=torch.bfloat16)
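
A possibly simpler variant of the same bit-reinterpretation workaround, assuming a PyTorch/CuPy pair recent enough to exchange buffers via DLPack (an illustrative sketch, not an official bfloat16 API):

import cupy
import torch

x = torch.arange(10, dtype=torch.bfloat16, device="cuda")

# reinterpret the bfloat16 buffer as int16 (same width, a dtype CuPy understands)
bits = x.view(torch.int16)
arr = cupy.from_dlpack(bits)   # zero-copy handoff to CuPy

# ... run integer/bit-level CuPy code on arr here ...

back = torch.from_dlpack(arr).view(torch.bfloat16)
print(torch.equal(back, x))    # True: the bits round-trip unchanged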

@yuanlin2004

I see (in #8269) that the bfloat16 feature is planned for the v14 release. @asi1024, is there a WIP branch that others can play with or help with if needed?

stephanie-wang added a commit to ray-project/ray that referenced this issue May 15, 2024
[bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
is widely used in LLM training and inference since it can achieve higher
throughput and is less prone to weight growth. ray.util.collective uses
cupy.cuda.nccl for GPU communication, but cupy doesn't currently support
bfloat16 (cupy/cupy#7527). So for allgather/reducescatter operations, we
should bypass cupy.array and use torch.tensor directly.

Signed-off-by: wuxibin <wuxibin89@163.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
@dakofler
Contributor

dakofler commented Aug 6, 2024

Would also love this!

@CloudyDory

We are using a spiking neural network training library that implements custom CuPy functions for forward and backward propagation. The fact that CuPy lacks bfloat16 support is a real pain for us. I would highly appreciate any progress on this issue.
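
Until native support lands, one possible sketch for custom kernels is to pass the bfloat16 buffer as uint16 and reinterpret it as __nv_bfloat16 inside a RawKernel. This assumes cuda_bf16.h is visible to NVRTC; the kernel name (scale_bf16) and the launch shapes are illustrative only:

import cupy

# illustrative kernel: scale a bfloat16 buffer (passed as uint16) by a float factor
scale_bf16 = cupy.RawKernel(r'''
#include <cuda_bf16.h>
extern "C" __global__
void scale_bf16(unsigned short* data, float factor, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        __nv_bfloat16* p = reinterpret_cast<__nv_bfloat16*>(data);
        p[i] = __float2bfloat16(__bfloat162float(p[i]) * factor);
    }
}
''', 'scale_bf16')

n = 1024
x = cupy.full(n, 0x3F80, dtype=cupy.uint16)   # 0x3F80 is 1.0 in bfloat16
scale_bf16((n // 256,), (256,), (x, cupy.float32(2.0), cupy.int32(n)))
print(hex(int(x[0])))                         # 0x4000, i.e. 2.0 in bfloat16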
