# 1.8g: Why 128 ULP?

In 1.8f we found that all 4 black holes are separated by **exactly 128 ULP** in every active dimension.

**Question:** Why 128? That is 2^7, which is conspicuously the number of distinct values the 7-bit bfloat16 mantissa field can hold.

**Hypothesis:** The 128 ULP spacing might relate to the bfloat16 bit structure:
- bfloat16 = 1 sign bit + 8 exponent bits + 7 mantissa bits
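- 2^7 = 128 possible mantissa values per exponent
- so adding 128 to the raw bit pattern increments the exponent field by exactly 1 and leaves the mantissa bits unchanged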

**Investigation:** Decode the raw bfloat16 bit patterns for black hole vectors in active dimensions and see exactly how they differ.
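
To make the hypothesis concrete, here is a minimal standard-library sketch of what a 128-step move in raw-bit space does to a bfloat16 value. The helper names `bf16_bits_to_float` and `split_fields` and the bit pattern `0x3AE0` are illustrative assumptions, not part of the notebook's pipeline. The sketch relies only on the fact that a bfloat16 is the upper 16 bits of an IEEE 754 float32.

```python
import struct

def bf16_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16 (the top half of a float32)."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]

def split_fields(bits: int) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) fields of a bfloat16 bit pattern."""
    return (bits >> 15) & 0x1, (bits >> 7) & 0xFF, bits & 0x7F

base = 0x3AE0          # illustrative positive pattern: 0|01110101|1100000
stepped = base + 128   # move 128 steps in raw-bit space

print(split_fields(base), bf16_bits_to_float(base))
print(split_fields(stepped), bf16_bits_to_float(stepped))
```

In this example the exponent field goes from 117 to 118 while the mantissa stays at 96, so the value exactly doubles. A 128 ULP gap between two same-sign values therefore means "same mantissa, adjacent exponent".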

## Parameters

In [1]:
# Model to analyze
MODEL_NAME = "Qwen3-4B-Instruct-2507"

# Dimensions to analyze (from 1.8f, these show non-zero separations)
DIMENSIONS_TO_ANALYZE = [216, 282, 322, 1008, 1272, 1382, 1487, 1564, 2040, 2079]

## Imports

In [2]:
import torch
import ml_dtypes
import numpy as np
import matplotlib.pyplot as plt
from safetensors.torch import load_file
from pathlib import Path

## Helper Functions

In [3]:
def decode_bfloat16_bits(value_bf16):
    """
    Decode a bfloat16 value into its bit components.
    
    Returns:
        dict with keys: 'bits_uint16', 'bits_binary', 'sign', 'exponent', 'mantissa',
                        'sign_bit', 'exponent_bits', 'mantissa_bits'
    """
    # Convert to uint16 to get raw bits
    bits_uint16 = np.frombuffer(value_bf16.tobytes(), dtype=np.uint16)[0]
    
    # Format as 16-bit binary string
    bits_binary = format(bits_uint16, '016b')
    
    # Extract components
    sign_bit = bits_binary[0]
    exponent_bits = bits_binary[1:9]
    mantissa_bits = bits_binary[9:16]
    
    # Decode values
    sign = int(sign_bit)
    exponent = int(exponent_bits, 2)
    mantissa = int(mantissa_bits, 2)
    
    return {
        'bits_uint16': bits_uint16,
        'bits_binary': bits_binary,
        'sign': sign,
        'exponent': exponent,
        'mantissa': mantissa,
        'sign_bit': sign_bit,
        'exponent_bits': exponent_bits,
        'mantissa_bits': mantissa_bits
    }

def format_bfloat16_pretty(decoded):
    """
    Pretty-print decoded bfloat16 structure.
    """
    s = decoded['sign_bit']
    e = decoded['exponent_bits']
    m = decoded['mantissa_bits']
    return f"{s}|{e}|{m}  (sign={decoded['sign']}, exp={decoded['exponent']:3d}, mant={decoded['mantissa']:3d})"
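
As a sanity check on the decoded fields, the float value can be rebuilt from them with the standard formula for normal numbers, (-1)^sign × (1 + mantissa/128) × 2^(exponent − 127). The sketch below is a hedged illustration: `bf16_value_from_fields` is a hypothetical helper, and the field values are the ones printed for dimension 216 further down in this notebook.

```python
def bf16_value_from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Value of a finite bfloat16 given its three decoded fields."""
    if exponent == 255:
        raise ValueError("exponent 255 encodes inf/NaN, not a finite value")
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at -126
        magnitude = (mantissa / 128) * 2.0 ** -126
    else:
        # Normal: implicit leading 1, exponent bias of 127
        magnitude = (1 + mantissa / 128) * 2.0 ** (exponent - 127)
    return -magnitude if sign else magnitude

# Fields as printed for BH1 and BH4 in dimension 216
bh1 = bf16_value_from_fields(1, 117, 96)
bh4 = bf16_value_from_fields(1, 117, 95)
print(f"{bh1:+.6e}  {bh4:+.6e}")
```

The two values differ by 2^-17, which is one ULP at exponent 117.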

## Device Detection

In [4]:
# Detect available device
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
else:
    device = 'cpu'

print(f"Using device: {device}")

Using device: mps


## Load Data

In [5]:
# Load W in bfloat16 (uncentered, original bits)
W_path = Path(f"../tensors/{MODEL_NAME}/W.safetensors")
W_bf16 = load_file(W_path)["W"]

print(f"Loaded W: {W_bf16.shape}")

Loaded W: torch.Size([151936, 2560])


In [6]:
# Load black hole data
bh_path = Path(f"../tensors/{MODEL_NAME}/1.8e_black_hole_masks.safetensors")
bh_data = load_file(bh_path)

bh1_token_ids = bh_data["bh1_token_ids"].to(torch.int64)
bh2_token_ids = bh_data["bh2_token_ids"].to(torch.int64)
bh3_token_ids = bh_data["bh3_token_ids"].to(torch.int64)
bh4_token_ids = bh_data["bh4_token_ids"].to(torch.int64)

print(f"\nLoaded black holes:")
print(f"  BH1: {len(bh1_token_ids):,} tokens")
print(f"  BH2: {len(bh2_token_ids):,} tokens")
print(f"  BH3: {len(bh3_token_ids):,} tokens")
print(f"  BH4: {len(bh4_token_ids):,} tokens")


Loaded black holes:
  BH1: 866 tokens
  BH2: 734 tokens
  BH3: 329 tokens
  BH4: 249 tokens


## Extract Black Hole Representative Vectors (Original W, Uncentered)

In [7]:
print("\nExtracting black hole representative vectors...\n")

# Get first token from each black hole as representative
bh_token_ids = [
    bh1_token_ids[0].item(),
    bh2_token_ids[0].item(),
    bh3_token_ids[0].item(),
    bh4_token_ids[0].item()
]

# Get ORIGINAL vectors from W (not centered!)
bh_vectors_bf16 = []
for i, token_id in enumerate(bh_token_ids, 1):
    vector = W_bf16[token_id]
    bh_vectors_bf16.append(vector)
    print(f"BH{i}: Token {token_id}")

print(f"\n✓ Extracted {len(bh_vectors_bf16)} representative vectors (original, uncentered)")


Extracting black hole representative vectors...

BH1: Token 80091
BH2: Token 125
BH3: Token 124
BH4: Token 123939

✓ Extracted 4 representative vectors (original, uncentered)


## Analyze Bit Patterns in Active Dimensions

In [8]:
print("\nAnalyzing bit patterns in active dimensions...\n")
print("=" * 100)

for dim in DIMENSIONS_TO_ANALYZE:
    print(f"\nDimension {dim}")
    print("-" * 100)
    
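    # Extract this dimension's value from each black hole.
    # Tensor.numpy() does not support bfloat16, so reinterpret the element as
    # uint16 and rebuild it as an ml_dtypes.bfloat16 scalar (bits unchanged).
    values_bf16 = []
    for bh_vec in bh_vectors_bf16:
        val = bh_vec[dim].cpu().view(torch.uint16).item()
        val_bf16 = np.frombuffer(np.uint16(val).tobytes(), dtype=ml_dtypes.bfloat16)[0]
        values_bf16.append(val_bf16)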
    
    # Decode bit patterns
    decoded = [decode_bfloat16_bits(v) for v in values_bf16]
    
    # Print each black hole's bit pattern
    for i, (val, dec) in enumerate(zip(values_bf16, decoded), 1):
        print(f"  BH{i}: {float(val):+.6e}")
        print(f"       {format_bfloat16_pretty(dec)}")
    
    # Analyze differences
    print("\n  Pairwise bit differences:")
    for i in range(len(decoded)):
        for j in range(i+1, len(decoded)):
            d1, d2 = decoded[i], decoded[j]
            
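            sign_diff = d1['sign'] != d2['sign']
            exp_diff = d1['exponent'] - d2['exponent']
            mant_diff = d1['mantissa'] - d2['mantissa']
            # The raw bit difference equals the ULP distance only when both
            # values share a sign (bfloat16 is sign-magnitude); check sign_diff.
            uint_diff = int(d1['bits_uint16']) - int(d2['bits_uint16'])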
            
            print(f"    BH{i+1} - BH{j+1}:")
            print(f"      Sign differs: {sign_diff}")
            print(f"      Exponent diff: {exp_diff:+4d}")
            print(f"      Mantissa diff: {mant_diff:+4d}")
            print(f"      Raw uint16 diff: {uint_diff:+6d}  (= {abs(uint_diff)} ULP)")

print("\n" + "=" * 100)


Analyzing bit patterns in active dimensions...


Dimension 216
----------------------------------------------------------------------------------------------------
  BH1: -1.708984e-03
       1|01110101|1100000  (sign=1, exp=117, mant= 96)
  BH2: -1.708984e-03
       1|01110101|1100000  (sign=1, exp=117, mant= 96)
  BH3: -1.708984e-03
       1|01110101|1100000  (sign=1, exp=117, mant= 96)
  BH4: -1.701355e-03
       1|01110101|1011111  (sign=1, exp=117, mant= 95)

  Pairwise bit differences:
    BH1 - BH2:
      Sign differs: False
      Exponent diff:   +0
      Mantissa diff:   +0
      Raw uint16 diff:     +0  (= 0 ULP)
    BH1 - BH3:
      Sign differs: False
      Exponent diff:   +0
      Mantissa diff:   +0
      Raw uint16 diff:     +0  (= 0 ULP)
    BH1 - BH4:
      Sign differs: False
      Exponent diff:   +0
      Mantissa diff:   +1
      Raw uint16 diff:     +1  (= 1 ULP)
    BH2 - BH3:
      Sign differs: False
      Exponent diff:   +0
      Mantissa diff:   +0
      [output truncated]
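
The "Raw uint16 diff" column matches the ULP distance only when both values have the same sign, because bfloat16 stores sign and magnitude separately. The following is a minimal sketch of a sign-aware distance, assuming finite inputs and treating +0 and -0 as the same point. The names `bf16_ordered_key` and `bf16_ulp_distance` are hypothetical and not used elsewhere in the notebook.

```python
def bf16_ordered_key(bits: int) -> int:
    """Map a bfloat16 bit pattern to an integer that sorts like the float value."""
    magnitude = bits & 0x7FFF
    return -magnitude if bits & 0x8000 else magnitude

def bf16_ulp_distance(bits_a: int, bits_b: int) -> int:
    """Number of representable bfloat16 steps between two finite bit patterns."""
    return abs(bf16_ordered_key(bits_a) - bf16_ordered_key(bits_b))

# Same sign: agrees with plain subtraction of the raw bits
print(bf16_ulp_distance(0xBAE0, 0xBADF), abs(0xBAE0 - 0xBADF))

# Opposite signs (smallest positive vs. smallest negative subnormal):
# the sign-aware distance is small, the raw subtraction is not
print(bf16_ulp_distance(0x0001, 0x8001), abs(0x0001 - 0x8001))
```

For the dimension 216 values shown above all signs agree, so the raw difference and the sign-aware distance coincide there.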

## Summary: What Creates the 128 ULP Pattern?

In [9]:
print("\n" + "=" * 100)
print("SUMMARY: DECODING THE 128 ULP PATTERN")
print("=" * 100)
print()

print("bfloat16 structure:")
print("  - 1 sign bit")
print("  - 8 exponent bits (range 0-255)")
print("  - 7 mantissa bits (range 0-127)")
print()

print("Key observation: 128 = 2^7 = size of mantissa space")
print()

# Collect patterns from first dimension to summarize
if len(DIMENSIONS_TO_ANALYZE) > 0:
    dim = DIMENSIONS_TO_ANALYZE[0]
    values_bf16 = []
    for bh_vec in bh_vectors_bf16:
        val = bh_vec[dim].cpu().view(torch.uint16).item()
        val_bf16 = np.frombuffer(np.uint16(val).tobytes(), dtype=ml_dtypes.bfloat16)[0]
        values_bf16.append(val_bf16)
    
    decoded = [decode_bfloat16_bits(v) for v in values_bf16]
    
    # Check if exponents are the same
    exponents = [d['exponent'] for d in decoded]
    all_same_exp = len(set(exponents)) == 1
    
    # Check mantissa differences
    mantissas = [d['mantissa'] for d in decoded]
    
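    if all_same_exp:
        print(f"Pattern (dimension {dim} as example):")
        print(f"  All black holes share the same exponent: {exponents[0]}")
        print(f"  Mantissa values: {mantissas}")
        # NOTE: this line is printed whenever the exponents match; the spacing
        # is not measured here. At a fixed exponent two values can differ by at
        # most 127 ULP, so compare against the mantissa values printed above.
        print(f"  → 128 ULP spacing = moving through mantissa space at fixed exponent")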
    else:
        print(f"Pattern (dimension {dim} as example):")
        print(f"  Exponents vary: {exponents}")
        print(f"  → 128 ULP spacing crosses exponent boundaries")

print()
print("Hypothesis: The 128 ULP quantum arises from initialization or training")
print("dynamics that preserve certain bit patterns in the bfloat16 representation.")
print()
print("=" * 100)


SUMMARY: DECODING THE 128 ULP PATTERN

bfloat16 structure:
  - 1 sign bit
  - 8 exponent bits (range 0-255)
  - 7 mantissa bits (range 0-127)

Key observation: 128 = 2^7 = size of mantissa space

Pattern (dimension 216 as example):
  All black holes share the same exponent: 117
  Mantissa values: [96, 96, 96, 95]
  → 128 ULP spacing = moving through mantissa space at fixed exponent

Hypothesis: The 128 ULP quantum arises from initialization or training
dynamics that preserve certain bit patterns in the bfloat16 representation.
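
The summary cell reports the 128 ULP line without measuring the gaps, so a direct tally is a useful cross-check. Below is a minimal sketch that counts pairwise raw-bit gaps for one dimension, using the four bit patterns printed above for dimension 216. `tally_pairwise_bit_gaps` is a hypothetical helper, and the raw-bit gap is only a valid ULP distance here because all four values share a sign.

```python
from collections import Counter
from itertools import combinations

def tally_pairwise_bit_gaps(patterns: list[int]) -> Counter:
    """Count absolute raw-bit gaps over all pairs (valid for same-sign values)."""
    return Counter(abs(a - b) for a, b in combinations(patterns, 2))

# Bit patterns printed above for dimension 216 (BH1..BH4)
dim_216 = [
    0b1011101011100000,
    0b1011101011100000,
    0b1011101011100000,
    0b1011101011011111,
]

gaps = tally_pairwise_bit_gaps(dim_216)
print(dict(gaps))
print("128 ULP gap present:", 128 in gaps)
```

For these four patterns the tally contains only gaps of 0 and 1, so dimension 216 on its own does not exhibit the 128 ULP spacing. Running the same tally over every entry of `DIMENSIONS_TO_ANALYZE` would show whether the other dimensions do.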

