# 🧪 NF4 From Scratch: NormalFloat 4-bit Quantization (2023)

[!["Open In Colab"](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adiel2012/model-size-reduction/blob/main/chronology/nf4_demo.ipynb)

## 📖 The Theory: Information-Theoretic Optimality

NF4 (NormalFloat 4) is a core component of QLoRA. Pretrained neural-network weights tend to follow a zero-centered **normal distribution** $\mathcal{N}(0, \sigma^2)$. Uniform integer quantization (such as standard INT4) is suboptimal for this kind of data: it spaces its levels evenly across the range, so the sparsely populated tails receive as many levels as the densely populated region near zero.
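To make the "even spacing" problem concrete, the sketch below snaps Gaussian samples onto 16 evenly spaced levels in $[-1, 1]$ after absmax scaling (an illustrative INT4-style grid, not any specific INT4 kernel) and counts how often each level is used. The central levels absorb most of the weights while the outermost levels are almost never hit.

```python
import torch

torch.manual_seed(0)
w = torch.randn(100_000)  # stand-in for normally distributed weights

# Absmax-scale into [-1, 1], then snap to 16 evenly spaced levels
# (a uniform INT4-style grid; illustrative only).
levels = torch.linspace(-1.0, 1.0, 16)
w_scaled = w / w.abs().max()
idx = torch.argmin((w_scaled[:, None] - levels[None, :]).abs(), dim=1)

# Fraction of all weights assigned to each level
usage = torch.bincount(idx, minlength=16).float() / w.numel()
for lvl, frac in zip(levels.tolist(), usage.tolist()):
    print(f"level {lvl:+.3f}: {frac:6.2%}")
```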

### Quantile Quantization
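NF4 places its 16 levels at **quantiles of the standard normal distribution**, so that each quantization bin covers approximately the same probability mass under the Gaussian curve. For normally distributed data, each of the 16 codes is then used about equally often, which is the most efficient use of the 4 available bits. Before lookup, weights are scaled into $[-1, 1]$ by the absolute maximum of their block (QLoRA uses blocks of 64 weights). The official NF4 table is also slightly asymmetric, with 7 negative levels, an exact zero, and 8 positive levels, so that zero is represented exactly.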
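A quick way to check the equal-mass property: split the standard normal into 16 bins whose interior edges are the quantiles at $1/16, 2/16, \dots, 15/16$, then count how many Gaussian samples land in each bin. This sketch looks only at the bin edges, not at the NF4 levels placed inside the bins.

```python
import torch
from scipy.stats import norm

torch.manual_seed(0)
samples = torch.randn(200_000)

# Interior bin edges: standard-normal quantiles at 1/16, 2/16, ..., 15/16
probs = torch.arange(1, 16, dtype=torch.float64) / 16
edges = torch.from_numpy(norm.ppf(probs.numpy())).float()

# bucketize assigns each sample to one of 16 bins (indices 0..15)
bins = torch.bucketize(samples, edges)
mass = torch.bincount(bins, minlength=16).float() / samples.numel()
print(mass)  # every entry should be close to 1/16 = 0.0625
```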

### Double Quantization
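On top of the NF4 weights, QLoRA quantizes the **quantization constants** (the per-block absmax scales) themselves, from 32-bit floats to 8-bit floats. With blocks of 64 weights and a second-level block size of 256, the overhead of the constants drops from $32/64 = 0.5$ bits to $8/64 + 32/(64 \cdot 256) \approx 0.127$ bits per parameter, a saving of about 0.37 bits per parameter.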
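The overhead figures follow from simple arithmetic, reproduced below. The second function, `double_quantize_scales`, is a hypothetical, simplified illustration of the idea: it centers the FP32 scales on their mean, groups them into blocks of 256, and absmax-quantizes each group. QLoRA stores the second-level values as 8-bit floats; this sketch uses symmetric int8 codes instead for portability, so treat it as an approximation rather than the bitsandbytes implementation.

```python
import torch

def scale_overhead_bits(block_size=64, scale_bits=32,
                        dq_block_size=256, dq_bits=8, double_quant=False):
    """Extra bits per weight spent on storing quantization constants."""
    if not double_quant:
        return scale_bits / block_size
    # 8-bit first-level scales plus one FP32 constant per 256 of them
    return dq_bits / block_size + 32 / (block_size * dq_block_size)

def double_quantize_scales(scales, dq_block_size=256):
    """Hypothetical sketch: quantize FP32 scales to int8 in blocks of 256.

    Assumes scales.numel() is divisible by dq_block_size.
    """
    mean = scales.mean()  # scales are positive; centering enables symmetric quantization
    centered = (scales - mean).view(-1, dq_block_size)
    # One FP32 second-level constant per block of 256 scales
    c2 = centered.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    q = torch.round(centered / c2 * 127).to(torch.int8)
    deq = q.float() / 127 * c2 + mean  # dequantize to check the round trip
    return q, c2, mean, deq.view(-1)

before = scale_overhead_bits()
after = scale_overhead_bits(double_quant=True)
print(f"{before:.3f} -> {after:.3f} bits/param, saving {before - after:.3f}")

# Example: scales for 1024 blocks of 64 weights
torch.manual_seed(0)
scales = torch.rand(1024) * 0.1 + 0.05
q, c2, mean, deq = double_quantize_scales(scales)
print(f"max scale error: {(scales - deq).abs().max().item():.2e}")
```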

---

```python
import torch
from scipy.stats import norm

def create_nf4_map():
    """Build a simplified 16-level NormalFloat lookup table.

    Each level sits at the center of one of 16 equal-probability bins of the
    standard normal N(0, 1); the table is then rescaled to [-1, 1].
    """
    # Bin centers in probability space: p = 1/32, 3/32, ..., 31/32
    offset = 1.0 / (2 * 16)
    p_values = torch.linspace(offset, 1 - offset, 16, dtype=torch.float64)

    # Map probabilities to standard-normal quantiles (inverse CDF).
    # Note: this simplified map is symmetric and has no exact zero level;
    # the official QLoRA NF4 table is asymmetric (7 negative levels,
    # an exact 0, and 8 positive levels).
    nf4_values = torch.from_numpy(norm.ppf(p_values.numpy())).float()

    # Normalize to [-1, 1] (the map is symmetric, so the minimum becomes -1)
    nf4_values = nf4_values / nf4_values.abs().max()
    return torch.sort(nf4_values).values

nf4_map = create_nf4_map()
print(f"NF4 Lookup Table (16 levels):\n{nf4_map}")
```
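The official NF4 table differs from the simplified map: QLoRA builds it asymmetrically so that zero is represented exactly. The sketch below is modeled on the construction in `bitsandbytes` (`create_normal_map`): 8 positive and 7 negative standard-normal quantiles taken at probabilities between `offset` and 0.5, plus an exact 0, normalized to $[-1, 1]$. The default `offset` $= 1 - \tfrac{1}{2}\left(\tfrac{1}{30} + \tfrac{1}{32}\right) \approx 0.9677$ is the value used by QLoRA; the function name here is hypothetical, and the result should be treated as a close reconstruction rather than a byte-exact copy of the library table.

```python
import torch
from scipy.stats import norm

def create_nf4_map_asymmetric(offset=0.9677083):
    """Sketch of QLoRA's asymmetric NF4 table: 7 negative levels, exact 0, 8 positive."""
    # 8 positive quantiles: probabilities from `offset` down toward 0.5 (0.5 itself dropped)
    pos_p = torch.linspace(offset, 0.5, 9, dtype=torch.float64)[:-1]
    positive = norm.ppf(pos_p.numpy())
    # 7 negative quantiles, mirrored around zero
    neg_p = torch.linspace(offset, 0.5, 8, dtype=torch.float64)[:-1]
    negative = -norm.ppf(neg_p.numpy())
    values = torch.cat([
        torch.from_numpy(negative),
        torch.zeros(1, dtype=torch.float64),  # exact zero level
        torch.from_numpy(positive),
    ]).float()
    # Sort ascending and normalize so the largest level is exactly 1.0
    values = torch.sort(values).values
    return values / values.max()

nf4_official = create_nf4_map_asymmetric()
print(nf4_official)
```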

## 🛠️ Implementation: Manual NF4 Mapping

Next, let's map each FP32 weight to its closest NF4 level, using a single absmax scale for the whole tensor.
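The brute-force lookup in `quantize_nf4` builds a distance matrix with one row per weight and one column per level. A more memory-efficient option, sketched here as one possible approach, is to compute the 15 midpoints between adjacent sorted levels and call `torch.bucketize`: the number of midpoints below a value is exactly the index of its nearest level. The random `levels` tensor is only a stand-in for any sorted 16-level map.

```python
import torch

def nearest_level_indices(w_norm, levels):
    """Index of the nearest entry of a sorted 1-D `levels` tensor for each value in w_norm."""
    # Decision boundaries halfway between neighbouring levels
    midpoints = (levels[:-1] + levels[1:]) / 2
    # bucketize counts the midpoints strictly below each value -> 0..len(levels)-1
    return torch.bucketize(w_norm, midpoints)

# Compare against the brute-force argmin approach on random data
torch.manual_seed(0)
levels = torch.sort(torch.randn(16)).values
levels = levels / levels.abs().max()  # stand-in sorted map in [-1, 1]
w_norm = torch.rand(10_000) * 2 - 1

fast = nearest_level_indices(w_norm, levels)
brute = torch.argmin((w_norm[:, None] - levels[None, :]).abs(), dim=1)
print(f"agreement: {(fast == brute).float().mean().item():.4%}")
```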

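```python
def quantize_nf4(w, nf4_map):
    """
    Quantize a weight tensor to the closest NF4 level, then dequantize it.
    w: tensor of any shape and range; it is rescaled to [-1, 1] internally
       with a single per-tensor absmax scale.
    Returns the dequantized tensor and the flat 4-bit level indices (uint8).
    """
    # 1. Normalize weights into [-1, 1] by their absolute maximum
    abs_max = torch.max(torch.abs(w))
    w_norm = w / abs_max

    # 2. Find the closest level in the map for every weight.
    # For clarity this builds a (num_weights x 16) distance matrix;
    # torch.bucketize on level midpoints avoids that memory cost.
    w_flat = w_norm.reshape(-1, 1)
    diff = torch.abs(w_flat - nf4_map.view(1, -1))
    indices = torch.argmin(diff, dim=1).to(torch.uint8)  # codes 0..15

    # 3. Simulate dequantization: look up levels and undo the scaling
    q_w = nf4_map[indices.long()].view(w.shape)
    return q_w * abs_max, indices

# Test with normally distributed data
w_raw = torch.randn(1024, 1024)
w_nf4, w_indices = quantize_nf4(w_raw, nf4_map)

error = (w_raw - w_nf4).pow(2).mean()
print(f"Mean Squared Error: {error.item():.6f}")
# Each weight needs 4 bits instead of 32 (8x), but only once two codes are
# packed per byte; the uint8 indices stored here are 4x smaller than FP32.
# The per-tensor scale adds a small constant overhead.
print("Compression: 32-bit floats -> 4-bit codes (8x once packed, excluding the scale)")
```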
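The example above uses one absmax scale for the entire 1024×1024 matrix, so a single large weight stretches the range for every other weight. QLoRA instead quantizes in blocks of 64 weights, each with its own scale. The sketch below (all function names are hypothetical, and it reuses the simplified symmetric map so it runs on its own) quantizes blockwise, packs two 4-bit codes into each byte to realize the 8x storage reduction, and compares the error against a per-tensor scale.

```python
import torch
from scipy.stats import norm

def simple_nf4_map():
    # Same simplified symmetric map as create_nf4_map
    p = torch.linspace(1 / 32, 31 / 32, 16, dtype=torch.float64)
    v = torch.from_numpy(norm.ppf(p.numpy())).float()
    return v / v.abs().max()

def quantize_blockwise(w, levels, block_size=64):
    """Absmax-quantize w in blocks of `block_size`; returns uint8 codes and FP32 scales."""
    blocks = w.reshape(-1, block_size)  # assumes w.numel() is divisible by block_size
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    midpoints = (levels[:-1] + levels[1:]) / 2
    codes = torch.bucketize(blocks / scales, midpoints).to(torch.uint8)  # 0..15
    return codes, scales

def dequantize_blockwise(codes, scales, levels, shape):
    return (levels[codes.long()] * scales).reshape(shape)

def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into single bytes (first code in the high nibble)."""
    flat = codes.reshape(-1)  # assumes an even number of codes
    return (flat[0::2] << 4) | flat[1::2]

def unpack_nibbles(packed):
    return torch.stack([packed >> 4, packed & 0x0F], dim=1).reshape(-1)

torch.manual_seed(0)
levels = simple_nf4_map()
w = torch.randn(1024, 1024)

codes, scales = quantize_blockwise(w, levels)
w_block = dequantize_blockwise(codes, scales, levels, w.shape)

# Per-tensor baseline: one block spanning the whole matrix
codes_t, scale_t = quantize_blockwise(w, levels, block_size=w.numel())
w_tensor = dequantize_blockwise(codes_t, scale_t, levels, w.shape)

mse_block = (w - w_block).pow(2).mean().item()
mse_tensor = (w - w_tensor).pow(2).mean().item()
packed = pack_nibbles(codes)
print(f"MSE per-tensor: {mse_tensor:.6f}  blockwise(64): {mse_block:.6f}")
print(f"packed bytes: {packed.numel()} for {w.numel()} weights")
```

With blocks of 64, each stored FP32 scale adds $32/64 = 0.5$ bits per weight, which is the overhead that double quantization reduces.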