SPV_INTEL_bfloat16_conversion

Name Strings

SPV_INTEL_bfloat16_conversion

Contact

To report problems with this extension, please open a new issue at:

Contributors

  • Ben Ashbaugh, Intel

  • Greg Lueck, Intel

  • Alexey Sotkin, Intel

  • Arvind Sudarsanam, Intel

Notice

Copyright (c) 2023 Intel Corporation. All rights reserved.

Status

  • Shipping

Version

Last Modified Date: 2024-06-07

Revision: 1

Dependencies

This extension is written against the SPIR-V Specification, Version 1.6 Revision 2.

This extension requires SPIR-V 1.0.

Overview

This extension adds instructions to convert between single-precision 32-bit floating-point values and 16-bit bfloat16 values. The bfloat16 floating-point format is a truncated variant of the IEEE 754 single-precision 32-bit floating-point format with one sign bit, eight exponent bits, and seven mantissa bits. This gives the 16-bit bfloat16 format a dynamic range similar to that of the 32-bit float format, albeit with lower precision than the 16-bit half format.
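The field layout described above can be illustrated with a short sketch. It is not part of the extension; the helper names `float32_bits` and `split_bfloat16` are hypothetical and exist only for illustration. The sketch shows that the upper 16 bits of a 32-bit float already have the bfloat16 layout of one sign bit, eight exponent bits, and seven mantissa bits.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def split_bfloat16(bits16: int) -> tuple:
    """Split a 16-bit bfloat16 pattern into (sign, exponent, mantissa) fields."""
    sign = (bits16 >> 15) & 0x1       # 1 sign bit
    exponent = (bits16 >> 7) & 0xFF   # 8 exponent bits, same bias (127) as float
    mantissa = bits16 & 0x7F          # 7 mantissa bits
    return sign, exponent, mantissa


# 1.0 as a 32-bit float is 0x3F800000; its upper half, 0x3F80, is bfloat16 1.0.
upper_half = float32_bits(1.0) >> 16
fields = split_bfloat16(upper_half)   # (0, 127, 0)
```

Because the exponent field is the same width as in the 32-bit format, the representable range is nearly the same; only the low 16 mantissa bits are lost.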

Note that this extension does not introduce a bfloat16 type to SPIR-V. Instead, the new instructions convert to or from a 16-bit integer type whose bit pattern represents a bfloat16 value.

Extension Name

To use this extension within a SPIR-V module, the appropriate OpExtension must be present in the module:

OpExtension "SPV_INTEL_bfloat16_conversion"

Modifications to the SPIR-V Specification, Version 1.6

Capabilities

Modify Section 3.31, Capability, adding rows to the Capability table:

Capability | Implicitly Declares

6115 BFloat16ConversionINTEL | (none listed)

Instructions

Add to Section 3.42.11, Conversion Instructions:

OpConvertFToBF16INTEL

Convert value numerically from 32-bit floating point to bfloat16, which is represented as a 16-bit unsigned integer.

Result Type must be a scalar or vector of integer type. The component width must be 16 bits. The bit pattern in the Result represents a bfloat16 value.

Float Value must be a scalar or vector of floating-point type. It must have the same number of components as Result Type. The component width must be 32 bits.

Results are computed per component.

Capability: BFloat16ConversionINTEL

4 | 6116 | <id> Result Type | Result <id> | <id> Float Value
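As a rough host-side model of what this conversion computes, the sketch below narrows a 32-bit float to a bfloat16 bit pattern held in a 16-bit integer. The text above does not state a rounding mode, so round-to-nearest-even is an assumption made here, as is the NaN handling; the names `float_to_bf16_bits` and `convert_vector` are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Model of a float -> bfloat16 conversion, returning a 16-bit integer pattern.

    Assumption: round-to-nearest-even. The extension text does not specify
    the rounding mode, so a real implementation may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Assumption: keep the sign and upper payload, and set the top
        # mantissa bit so the truncated result is still a NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    lsb = (bits >> 16) & 1
    bits += 0x7FFF + lsb
    return (bits >> 16) & 0xFFFF


def convert_vector(values):
    """Results are computed per component, so a vector maps element-wise."""
    return [float_to_bf16_bits(v) for v in values]


example = convert_vector([1.0, -2.0, 3.14159])  # [0x3F80, 0xC000, 0x4049]
```

Note that `struct.pack("<f", ...)` raises `OverflowError` for finite Python floats outside the 32-bit float range, so inputs to this sketch should already be representable as 32-bit floats.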

OpConvertBF16ToFINTEL

Interpret a 16-bit integer value as bfloat16 and convert the value numerically to 32-bit floating point.

Result Type must be a scalar or vector of floating-point type. The component width must be 32 bits.

BFloat16 Value must be a scalar or vector of integer type, which is interpreted as a bfloat16. The type must have the same number of components as Result Type. The component width must be 16 bits.

Results are computed per component.

Capability: BFloat16ConversionINTEL

4 | 6117 | <id> Result Type | Result <id> | <id> BFloat16 Value
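The widening direction can be modeled in a few lines. Because every bfloat16 value is exactly representable as a 32-bit float, the conversion amounts to placing the 16-bit pattern in the upper half of a 32-bit word and filling the lower half with zeros. The helper name `bf16_bits_to_float` is hypothetical, and this is an illustrative sketch rather than the implementation of the instruction.

```python
import struct


def bf16_bits_to_float(bits16: int) -> float:
    """Interpret a 16-bit integer as bfloat16 and widen it to a 32-bit float value."""
    bits32 = (bits16 & 0xFFFF) << 16  # low 16 mantissa bits become zero
    return struct.unpack("<f", struct.pack("<I", bits32))[0]


def widen_vector(patterns):
    """Results are computed per component, so a vector maps element-wise."""
    return [bf16_bits_to_float(p) for p in patterns]


widened = widen_vector([0x3F80, 0xC040, 0x4049])  # [1.0, -3.0, 3.140625]
```

The last element shows the precision loss of the format: the nearest bfloat16 pattern to 3.14159 widens back to 3.140625.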

Issues

None.

Revision History

Rev | Date | Author | Changes

1 | 2023-03-06 | Ben Ashbaugh | Initial revision for publication