feat: support casting and CPU bfloat16 and float16 #11
Merged
voltjia merged 8 commits into feat/dev-infra from … on Mar 6, 2026
voltjia
requested changes
Mar 6, 2026
…`Cast()` function - add the CPU implementation of float16 and bfloat16 as `float16_t` and `bfloat16_t` - add the CPU `Cast()` function that supports conversion between any two CPU-supported types, including the custom `float16_t` and `bfloat16_t`
…he styling requirement
…patch and move them into `common/cuda/cast.h`
…/cuda/cast.h` to better comply with the naming rules
…and fix various styling issues.
1b7c61b to
2ab0e33
voltjia
approved these changes
Mar 6, 2026

TL;DR: Adds a generic `Cast()` function for CPU and CUDA, and adds the CPU implementation of `BFloat16` and `Float16`.

Key Changes

CPU bf16 and fp16:
- Add the `BFloat16` and `Float16` types in `data_type.h`.

Generic Casting:
- Add the CPU generic casting function `Cast()` in `src/common/cast.h`.
- Add the CUDA generic casting function `Cast()` in `src/common/cuda/cast.h`. It is compatible with different CUDA-like platforms and is currently verified to dispatch hardware intrinsics correctly on both NVIDIA and MetaX.
- Use hardware direct-cast intrinsics where available (e.g. `__bfloat162int_rn`), with an automatic fallback to a float-pivot conversion.

Style Correction:
- Rename `indexToOffset()` to `IndexToOffset()` to comply with the styling rules.

Known Issues & Future Work:
- More Testing: the current `Cast()` and the CPU `BFloat16` and `Float16` types have not been tested extensively across real operators, or on platforms other than NVIDIA and MetaX.
- Enrich CUDA Direct Casts: the CUDA `Cast()` currently covers only a subset of the hardware direct-cast intrinsics. The mapping should be extended in the future.
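
To illustrate the float-pivot idea described above, here is a minimal CPU-only sketch. It is not the code from this PR: `BFloat16Sketch` and `CastSketch` are hypothetical names, and the layout (upper 16 bits of an IEEE-754 `float`, round to nearest even) is an assumption about how a CPU bfloat16 type could be built. A CUDA variant would presumably try a hardware intrinsic first and only then fall back to the same pivot.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a CPU bfloat16 type (not the PR's BFloat16).
// Assumption: it stores the upper 16 bits of an IEEE-754 binary32 value.
struct BFloat16Sketch {
  std::uint16_t bits = 0;

  BFloat16Sketch() = default;

  explicit BFloat16Sketch(float value) {
    std::uint32_t u;
    std::memcpy(&u, &value, sizeof(u));
    if ((u & 0x7FFFFFFFu) > 0x7F800000u) {
      // NaN: set a mantissa bit that survives truncation so it stays NaN.
      bits = static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    } else {
      // Round to nearest, ties to even, on the 16 bits being discarded.
      u += 0x7FFFu + ((u >> 16) & 1u);
      bits = static_cast<std::uint16_t>(u >> 16);
    }
  }

  float ToFloat() const {
    // Widening is exact: place the stored bits back in the upper half.
    std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
    float value;
    std::memcpy(&value, &u, sizeof(value));
    return value;
  }
};

// Hypothetical generic cast: built-in types convert directly, while the
// custom type is routed through float (the "float pivot").
template <typename Dst, typename Src>
Dst CastSketch(const Src& src) {
  if constexpr (std::is_same_v<Dst, Src>) {
    return src;
  } else if constexpr (std::is_same_v<Src, BFloat16Sketch>) {
    // custom -> float -> Dst
    return CastSketch<Dst>(src.ToFloat());
  } else if constexpr (std::is_same_v<Dst, BFloat16Sketch>) {
    // Src -> float -> custom
    return BFloat16Sketch(static_cast<float>(src));
  } else {
    return static_cast<Dst>(src);
  }
}
```

Under this sketch, `CastSketch<int>(BFloat16Sketch(3.0f))` yields `3`, and `CastSketch<BFloat16Sketch>(1)` produces the bit pattern `0x3F80`. Routing every custom-type conversion through `float` keeps the number of conversion paths linear in the number of types, at the cost of an extra step compared with a direct intrinsic.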