In [3]:
import torch

In [4]:
torch.iinfo(torch.uint8)

iinfo(min=0, max=255, dtype=uint8)

In [5]:
torch.iinfo(torch.int8)

iinfo(min=-128, max=127, dtype=int8)

In [7]:
torch.iinfo(torch.uint16)

iinfo(min=0, max=65535, dtype=uint16)

In [8]:
torch.iinfo(torch.int16)

iinfo(min=-32768, max=32767, dtype=int16)

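Models can store their values in different data types. The cells above inspect a subset of the signed and unsigned integer types. Going from 1 byte to 2 bytes raises the number of representable values from 2^8 = 256 to 2^16 = 65,536. A signed type holds the same number of values as its unsigned counterpart, but it reserves one bit for the sign, so the range shifts to cover negative numbers (for example, -128 to 127 instead of 0 to 255).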
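The ranges reported by `torch.iinfo` follow directly from the bit width. Below is a minimal sketch in plain Python, assuming unsigned types use every bit for magnitude and signed types use two's complement. The helper name `int_range` is made up for illustration and is not part of PyTorch.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (min, max) for an integer type of the given bit width.

    Assumes two's complement for signed types (hypothetical helper,
    not a PyTorch function).
    """
    if signed:
        # One bit is reserved for the sign, leaving bits - 1 for magnitude.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # All bits encode magnitude.
    return 0, 2 ** bits - 1


for name, bits, signed in [
    ("uint8", 8, False),
    ("int8", 8, True),
    ("uint16", 16, False),
    ("int16", 16, True),
]:
    lo, hi = int_range(bits, signed)
    print(f"{name:>6}: min={lo}, max={hi}")
```

The printed ranges should agree with the `iinfo` outputs in the cells above.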

In [12]:
torch.finfo(torch.float32)

finfo(resolution=1e-06, min=-3.40282e+38, max=3.40282e+38, eps=1.19209e-07, smallest_normal=1.17549e-38, tiny=1.17549e-38, dtype=float32)

In [13]:
torch.finfo(torch.float16)

finfo(resolution=0.001, min=-65504, max=65504, eps=0.000976562, smallest_normal=6.10352e-05, tiny=6.10352e-05, dtype=float16)

In [14]:
torch.finfo(torch.bfloat16)

finfo(resolution=0.01, min=-3.38953e+38, max=3.38953e+38, eps=0.0078125, smallest_normal=1.17549e-38, tiny=1.17549e-38, dtype=bfloat16)

For floats the story is largely similar to integers, except that the bits are split into a sign bit, an exponent, and a fraction (also called the mantissa). Floating-point types also generally have no unsigned versions.

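This shows in the three types above. All of them cover a far larger value range than the integer types, and for two float types with the same number of bytes (float16 and bfloat16), the way the bits are divided decides whether the type favors range or precision. float16 has the finer precision (eps of about 0.00098) but tops out at 65504, while bfloat16 reaches about 3.39e+38, nearly the float32 range, at a coarser precision (eps of 0.0078125).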
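To make the range versus precision trade-off concrete, the sketch below derives `eps`, `max`, and `smallest_normal` from the bit layout alone. It assumes the standard layouts: float32 has 8 exponent and 23 fraction bits, float16 has 5 and 10, and bfloat16 has 8 and 7. The `FloatFormat` class is a hypothetical helper written for this illustration, not a PyTorch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical description of a binary float layout (sign bit implied)."""
    name: str
    exponent_bits: int
    fraction_bits: int

    @property
    def bias(self) -> int:
        # Exponent bias of an IEEE-style format.
        return 2 ** (self.exponent_bits - 1) - 1

    @property
    def eps(self) -> float:
        # Gap between 1.0 and the next representable value.
        return 2.0 ** -self.fraction_bits

    @property
    def max(self) -> float:
        # Largest finite value: all fraction bits set, largest normal exponent.
        return (2.0 - self.eps) * 2.0 ** self.bias

    @property
    def smallest_normal(self) -> float:
        # Smallest positive normal value.
        return 2.0 ** (1 - self.bias)


FORMATS = [
    FloatFormat("float32", exponent_bits=8, fraction_bits=23),
    FloatFormat("float16", exponent_bits=5, fraction_bits=10),
    FloatFormat("bfloat16", exponent_bits=8, fraction_bits=7),
]

for fmt in FORMATS:
    print(
        f"{fmt.name:>8}: eps={fmt.eps:.6g}, max={fmt.max:.6g}, "
        f"smallest_normal={fmt.smallest_normal:.6g}"
    )
```

More exponent bits extend the range, and more fraction bits shrink `eps`. bfloat16 keeps the float32 exponent width and gives up fraction bits, while float16 does the opposite.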

In [16]:
tensor_fp32 = torch.rand(100, dtype=torch.float32)
tensor_fp32[:5]

tensor([0.5353, 0.9768, 0.3072, 0.6751, 0.4463])

In [18]:
tensor_to_bf16 = tensor_fp32.to(dtype=torch.bfloat16)
tensor_to_bf16[:5]

tensor([0.5352, 0.9766, 0.3066, 0.6758, 0.4473], dtype=torch.bfloat16)

In [19]:
tensor_to_fp16 = tensor_fp32.to(dtype=torch.float16)
tensor_to_fp16[:5]

tensor([0.5352, 0.9766, 0.3071, 0.6753, 0.4463], dtype=torch.float16)

In [22]:
value_tensor = torch.tensor(1 / 3, dtype=torch.float32)
value_tensor_bf16 = value_tensor.to(dtype=torch.bfloat16)
value_tensor_fp16 = value_tensor.to(dtype=torch.float16)
print(f"FP32 tensor: {value_tensor.item():.60f}")
print(f"BF16 tensor: {value_tensor_bf16.item():.60f}")
print(f"FP16 tensor: {value_tensor_fp16.item():.60f}")


FP32 tensor: 0.333333343267440795898437500000000000000000000000000000000000
BF16 tensor: 0.333984375000000000000000000000000000000000000000000000000000
FP16 tensor: 0.333251953125000000000000000000000000000000000000000000000000


In [24]:
# check byte count of value tensor
print(f"FP32 Byte count: {value_tensor.numel() * value_tensor.element_size()}")
print(f"BF16 Byte count: {value_tensor_bf16.numel() * value_tensor_bf16.element_size()}")
print(f"FP16 Byte count: {value_tensor_fp16.numel() * value_tensor_fp16.element_size()}")

FP32 Byte count: 4
BF16 Byte count: 2
FP16 Byte count: 2


The example above applies downcasting: converting a tensor to a lower-precision type. The byte count is halved, from 4 bytes to 2. The precision of the values is reduced (1/3 is stored as roughly 0.33398 in bfloat16 and 0.33325 in float16), but the error is small enough to be acceptable here.
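The rounding behind these numbers can be reproduced without PyTorch. The sketch below uses only the standard-library `struct` module and assumes round-to-nearest-even when bits are dropped, which is consistent with the values printed above. The function names are hypothetical, and the bfloat16 helper handles finite inputs only.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_float16(x: float) -> float:
    """Round to the nearest float16 value using struct's half-precision format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the float32 pattern.

    Assumes finite input and round-to-nearest-even on the 16 dropped bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit rounds ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


fp32 = to_float32(1 / 3)
for name, value in [
    ("FP32", fp32),
    ("BF16", to_bfloat16(fp32)),
    ("FP16", to_float16(fp32)),
]:
    # Relative error is measured against the 64-bit value of 1/3.
    rel_err = abs(value - 1 / 3) / (1 / 3)
    print(f"{name}: {value:.12f}  relative error: {rel_err:.2e}")
```

For 1/3, the relative error of each downcast value stays below half of that type's `eps`, which is the bound expected from round-to-nearest.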