# Understanding Data Types: FP32, FP16, and BF16

This notebook compares the memory footprint of three floating-point data types used in deep learning.

**Key Concept**: Lower-precision data types (FP16, BF16) use 2 bytes per element instead of FP32's 4, which halves tensor memory and leaves room for larger models and batch sizes.


![dtypes.png](dtypes.png)

In [None]:
# !pip install torch

## Setup: Define Tensor Dimensions

We'll create a large tensor to see the memory impact clearly:


In [1]:
# Define tensor dimensions
import torch

rows = 100_000
cols = 1_000
total_elements = rows * cols

print(f"Tensor shape: {rows:,} x {cols:,}")
print(f"Total elements: {total_elements:,}\n")


Tensor shape: 100,000 x 1,000
Total elements: 100,000,000



---

## FP32 (Float32) - The Baseline

FP32 is the standard 32-bit floating-point format (1 sign bit, 8 exponent bits, 23 mantissa bits). Each element uses **4 bytes** of memory.


In [2]:
# Create FP32 tensor
print("=" * 60)
print("FP32 (Float32)")
print("=" * 60)
tensor_fp32 = torch.randn(rows, cols, dtype=torch.float32)
size_fp32_bytes = tensor_fp32.element_size() * tensor_fp32.nelement()
size_fp32_mb = size_fp32_bytes / (1024 ** 2)  # "MB" here means 1024**2 bytes (MiB)

print(f"Bytes per element: {tensor_fp32.element_size()}")
print(f"Total size: {size_fp32_bytes:,} bytes")
print(f"Total size: {size_fp32_mb:.2f} MB")

# Manual calculation verification
manual_size_bytes = total_elements * 4  # 4 bytes per fp32
manual_size_mb = manual_size_bytes / (1024 ** 2)
print(f"\nManual calculation: {total_elements:,} elements × 4 bytes = {manual_size_bytes:,} bytes")
print(f"Manual calculation: {manual_size_mb:.2f} MB")
print(f"Verification: {'✓ Match!' if size_fp32_bytes == manual_size_bytes else '✗ Mismatch'}\n")


FP32 (Float32)
Bytes per element: 4
Total size: 400,000,000 bytes
Total size: 381.47 MB

Manual calculation: 100,000,000 elements × 4 bytes = 400,000,000 bytes
Manual calculation: 381.47 MB
Verification: ✓ Match!



---

## FP16 (Float16) - Half Precision

FP16 uses only **2 bytes** per element (1 sign bit, 5 exponent bits, 10 mantissa bits), half the memory of FP32.


In [3]:
# Create FP16 tensor
print("=" * 60)
print("FP16 (Float16)")
print("=" * 60)
tensor_fp16 = torch.randn(rows, cols, dtype=torch.float16)
size_fp16_bytes = tensor_fp16.element_size() * tensor_fp16.nelement()
size_fp16_mb = size_fp16_bytes / (1024 ** 2)

print(f"Bytes per element: {tensor_fp16.element_size()}")
print(f"Total size: {size_fp16_bytes:,} bytes")
print(f"Total size: {size_fp16_mb:.2f} MB\n")


FP16 (Float16)
Bytes per element: 2
Total size: 200,000,000 bytes
Total size: 190.73 MB



### FP16 Range: Finite Values Lie in [-65504, 65504]

In [18]:
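# 65504 is the largest finite FP16 value
print(torch.tensor(65504.0, dtype=torch.float16).item())
# 65505 is not representable, so it rounds to the nearest FP16 value (65504)
print(torch.tensor(65505.0, dtype=torch.float16).item())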

65504.0
65504.0
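
The cell above stays in range because 65505 rounds down to 65504. A minimal sketch of what happens further out, assuming the usual round-to-nearest cast from FP32; the sample values are chosen only for illustration:

```python
import torch

# Largest finite FP16 value, as reported by PyTorch
fp16_max = torch.finfo(torch.float16).max

# Illustrative values: the maximum, one just above it, one far beyond it
values = torch.tensor([65504.0, 65505.0, 70000.0], dtype=torch.float32)
as_fp16 = values.to(torch.float16)

print(f"FP16 max: {fp16_max}")
for original, converted in zip(values.tolist(), as_fp16.tolist()):
    print(f"{original:>8.1f} -> {converted}")
```

Values slightly above the maximum round back to it, while values well beyond it become `inf`.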


---

## BF16 (BFloat16) - Brain Float

BF16 also uses **2 bytes** per element, but splits them differently from FP16: 1 sign bit, 8 exponent bits (the same as FP32), and 7 mantissa bits. It trades precision for range, which makes it particularly popular for training LLMs.
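
Because BF16 shares FP32's exponent width, a BF16 value corresponds to the upper 16 bits of an FP32 value. The sketch below, using illustrative inputs, rounds a few FP32 values to BF16, widens them back to FP32, and checks that the low 16 bits are all zero:

```python
import torch

# Illustrative FP32 values spanning a wide range of magnitudes
x_fp32 = torch.tensor([0.1, 3.14159, 1e-20, 1e20], dtype=torch.float32)

# Round to BF16, then widen back to FP32 (widening is exact)
x_roundtrip = x_fp32.to(torch.bfloat16).to(torch.float32)

# Reinterpret the FP32 bits as 32-bit integers and keep the low 16 bits
low_bits = x_roundtrip.view(torch.int32) & 0xFFFF

print(x_roundtrip)
print(low_bits)  # zeros: nothing is stored below the top 16 bits
```

Note that the very small and very large inputs stay finite and nonzero, which would not be the case in FP16.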


In [4]:
# Create BF16 tensor (from FP32)
print("=" * 60)
print("BF16 (BFloat16)")
print("=" * 60)
tensor_bf16 = tensor_fp32.to(dtype=torch.bfloat16)
size_bf16_bytes = tensor_bf16.element_size() * tensor_bf16.nelement()
size_bf16_mb = size_bf16_bytes / (1024 ** 2)

print(f"Bytes per element: {tensor_bf16.element_size()}")
print(f"Total size: {size_bf16_bytes:,} bytes")
print(f"Total size: {size_bf16_mb:.2f} MB\n")


BF16 (BFloat16)
Bytes per element: 2
Total size: 200,000,000 bytes
Total size: 190.73 MB



---

## Summary: Memory Savings


In [5]:
# Summary comparison
print("=" * 60)
print("SUMMARY")
print("=" * 60)
print(f"FP32: {size_fp32_mb:.2f} MB (baseline)")
print(f"FP16: {size_fp16_mb:.2f} MB ({size_fp32_mb/size_fp16_mb:.1f}x smaller)")
print(f"BF16: {size_bf16_mb:.2f} MB ({size_fp32_mb/size_bf16_mb:.1f}x smaller)")


SUMMARY
FP32: 381.47 MB (baseline)
FP16: 190.73 MB (2.0x smaller)
BF16: 190.73 MB (2.0x smaller)


---

## FP16 vs BF16: Precision Trade-off

Both use 2 bytes, but they make different trade-offs:
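- **FP16**: More precision (about 3 significant decimal digits), smaller range (largest finite value is 65504)
- **BF16**: Less precision (about 2 significant decimal digits), larger range (about 3.4e38, essentially the same as FP32)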
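
These limits can be read directly from PyTorch rather than memorized. A short sketch using `torch.finfo`, where `eps` is the gap between 1.0 and the next representable value (smaller means more precise) and `max` is the largest finite value:

```python
import torch

# Print the numeric limits PyTorch reports for each dtype
print(f"{'dtype':<16}{'bits':>5}{'eps':>14}{'max':>14}")
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):<16}{info.bits:>5}{info.eps:>14.3e}{info.max:>14.3e}")
```

FP16 should show the smaller `eps` of the two 16-bit types, and BF16 the larger `max`.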


In [19]:
# Test with a large number
large_num = 500000.0

fp32_large = torch.tensor([large_num], dtype=torch.float32)
fp16_large = torch.tensor([large_num], dtype=torch.float16)
bf16_large = torch.tensor([large_num], dtype=torch.bfloat16)

print("=" * 60)
print("LARGE NUMBER TEST")
print("=" * 60)
print(f"Original (FP32): {fp32_large.item():.6f}")
print(f"FP16:            {fp16_large.item():.6f}")
print(f"BF16:            {bf16_large.item():.6f}")
print()

# Test with a small precise number
small_num = 0.123456789

fp32_small = torch.tensor([small_num], dtype=torch.float32)
fp16_small = torch.tensor([small_num], dtype=torch.float16)
bf16_small = torch.tensor([small_num], dtype=torch.bfloat16)

print("=" * 60)
print("SMALL PRECISE NUMBER TEST")
print("=" * 60)
print(f"Original (FP32): {fp32_small.item():.9f}")
print(f"FP16:            {fp16_small.item():.9f}  ✓ More precise")
print(f"BF16:            {bf16_small.item():.9f}  ✗ Less precise")
print()

print("=" * 60)
print("SUMMARY")
print("=" * 60)
print("• FP16: Better for small precise numbers")
print("• BF16: Better for large numbers, same range as FP32")
print("• BF16 is preferred for LLM training (numerical stability)")


LARGE NUMBER TEST
Original (FP32): 500000.000000
FP16:            inf
BF16:            499712.000000

SMALL PRECISE NUMBER TEST
Original (FP32): 0.123456791
FP16:            0.123474121  ✓ More precise
BF16:            0.123535156  ✗ Less precise

SUMMARY
• FP16: Better for small precise numbers
• BF16: Better for large numbers, same range as FP32
• BF16 is preferred for LLM training (numerical stability)
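
The single-value tests above generalize to whole tensors. As a rough sketch, the snippet below measures the average rounding error from casting a sample of standard-normal values to each 16-bit type and back; the sample size, seed, and the helper `mean_abs_error` are illustrative choices, not part of the original notebook:

```python
import torch

torch.manual_seed(0)  # fixed seed so the sketch is repeatable

# Illustrative sample: 10,000 values from a standard normal distribution
x = torch.randn(10_000, dtype=torch.float32)

def mean_abs_error(values: torch.Tensor, dtype: torch.dtype) -> float:
    """Mean absolute error introduced by casting to `dtype` and back."""
    roundtrip = values.to(dtype).to(torch.float32)
    return (values - roundtrip).abs().mean().item()

err_fp16 = mean_abs_error(x, torch.float16)
err_bf16 = mean_abs_error(x, torch.bfloat16)

print(f"FP16 mean abs error: {err_fp16:.2e}")
print(f"BF16 mean abs error: {err_bf16:.2e}")
```

For values in this range, FP16's three extra mantissa bits should give a noticeably smaller error than BF16.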


---

## Key Takeaways

1. **FP32** (4 bytes): Single precision, PyTorch's default floating-point dtype
2. **FP16** (2 bytes): Half precision, 2x memory savings, but overflows above 65504
3. **BF16** (2 bytes): Brain Float, 2x memory savings with FP32-like range but less precision than FP16

**Why this matters for LLMs:**
- Storing weights and activations in FP16/BF16 halves their memory footprint relative to FP32
- The saved memory allows training and inference with larger models or bigger batches
- BF16 is becoming the standard for LLM training because its FP32-like range avoids the overflow that FP16 is prone to
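
To put the savings in model terms, here is a back-of-the-envelope sketch. The parameter count is a hypothetical example and `weight_memory_gb` is an illustrative helper; the estimate covers weights only and ignores activations, gradients, and optimizer state:

```python
# Bytes per element for each dtype, matching the element sizes measured above
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Memory needed to store `num_params` weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9

# Hypothetical model size, chosen only for illustration
num_params = 7_000_000_000

for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype.upper()}: {weight_memory_gb(num_params, dtype):.1f} GB")
```

Under these assumptions the weights take 28 GB in FP32 and 14 GB in either 16-bit format.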
