# Algorithm 13: Single Attention with Pair Bias (AlphaFold3)

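This module updates the single representation with gated multi-head self-attention over residues, where a linear projection of the pair representation is added to the attention logits as a per-head bias. It is a key component of the Pairformer.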
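
In equation form, the computation implemented in this notebook can be summarized as follows. The notation is an assumption chosen to mirror the variable names used in the code ($q$, $k$, $v$, $g$, $b$, head index $h$, head dimension $c$); it summarizes this notebook's sketch and is not a transcription of the official implementation.

$$
\begin{aligned}
\hat{s}_i &= \mathrm{LayerNorm}(s_i), \qquad \hat{z}_{ij} = \mathrm{LayerNorm}(z_{ij}) \\
q_i^h &= W_q^h \hat{s}_i, \qquad k_i^h = W_k^h \hat{s}_i, \qquad v_i^h = W_v^h \hat{s}_i \\
g_i^h &= \sigma\!\left(W_g^h \hat{s}_i\right), \qquad b_{ij}^h = W_b^h \hat{z}_{ij} \\
a_{ij}^h &= \mathrm{softmax}_j\!\left(\frac{q_i^h \cdot k_j^h}{\sqrt{c}} + b_{ij}^h\right) \\
o_i^h &= g_i^h \odot \sum_j a_{ij}^h \, v_j^h \\
\Delta s_i &= W_o \, \mathrm{concat}_h\!\left(o_i^h\right)
\end{aligned}
$$

Here $\Delta s_i$ is the update returned by the function, with the same shape as $s_i$.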

## Source Code Location
- **File**: `AF3-Ref-src/alphafold3-official/src/alphafold3/model/network/modules.py`

In [None]:
import numpy as np
np.random.seed(42)

def layer_norm(x, eps=1e-5):
    """Normalize over the last axis (no learned scale or offset in this demo)."""
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x_max = np.max(x, axis=axis, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

In [None]:
def single_attention_with_pair_bias(s, z, num_heads=8, c=32):
    """
    Single Attention with Pair Bias.
    
    Key difference from AF2: explicit single representation, not derived from MSA.
    
    Args:
        s: Single representation [N, c_s]
        z: Pair representation [N, N, c_z]
        num_heads: Number of attention heads
        c: Head dimension
    
    Returns:
        Update to single representation [N, c_s]
    """
    N, c_s = s.shape
    c_z = z.shape[-1]
    
    print("Single Attention with Pair Bias")
    print("=" * 50)
    print(f"Single: [{N}, {c_s}], Pair: [{N}, {N}, {c_z}]")
    print(f"Heads: {num_heads}, Head dim: {c}")
    
    s_norm = layer_norm(s)
    z_norm = layer_norm(z)
    
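    # Query, key, value, and gate projections from the single representation.
    # The weights are random stand-ins for learned parameters (demo only).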
    W_q = np.random.randn(c_s, num_heads, c) * (c_s ** -0.5)
    W_k = np.random.randn(c_s, num_heads, c) * (c_s ** -0.5)
    W_v = np.random.randn(c_s, num_heads, c) * (c_s ** -0.5)
    W_g = np.random.randn(c_s, num_heads, c) * (c_s ** -0.5)
    
    q = np.einsum('ic,chd->ihd', s_norm, W_q)  # [N, H, c]
    k = np.einsum('jc,chd->jhd', s_norm, W_k)
    v = np.einsum('jc,chd->jhd', s_norm, W_v)
    g = sigmoid(np.einsum('ic,chd->ihd', s_norm, W_g))
    
    # Bias from pair representation
    W_b = np.random.randn(c_z, num_heads) * (c_z ** -0.5)
    b = np.einsum('ijc,ch->ijh', z_norm, W_b)  # [N, N, H]
    
    # Attention
    attn_logits = np.einsum('ihd,jhd->ijh', q, k) / np.sqrt(c)
    attn_logits = attn_logits + b  # Add pair bias
    attn_weights = softmax(attn_logits, axis=1)  # normalize over keys j -> [N, N, H]
    
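    # Apply attention: weighted sum of values over keys j -> [N, H, c]
    attended = np.einsum('ijh,jhd->ihd', attn_weights, v)
    # Sigmoid gate computed from the query-side single representation
    attended = attended * g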
    
    # Output projection
    W_o = np.random.randn(num_heads, c, c_s) * ((num_heads * c) ** -0.5)
    output = np.einsum('ihd,hdc->ic', attended, W_o)
    
    print(f"Output: {output.shape}")
    
    return output
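
The einsum contractions used in the function can be cross-checked against explicit per-head matrix products. The following is a minimal sketch with small, hypothetical shapes (`N=5`, `H=2`, `c=3`) and random inputs; it only illustrates what the index strings `'ihd,jhd->ijh'` and `'ijh,jhd->ihd'` compute.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, c = 5, 2, 3  # hypothetical demo sizes
q = rng.normal(size=(N, H, c))
k = rng.normal(size=(N, H, c))
v = rng.normal(size=(N, H, c))

# Logits via einsum: for each head h, dot product of query i with key j.
logits = np.einsum('ihd,jhd->ijh', q, k)  # [N, N, H]

# Same computation as one matrix product per head.
logits_loop = np.stack([q[:, h] @ k[:, h].T for h in range(H)], axis=-1)

# Softmax over the key axis (axis=1), as in the function above.
w = np.exp(logits - logits.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)

# Weighted sum of values via einsum and via per-head matrix products.
attended = np.einsum('ijh,jhd->ihd', w, v)  # [N, H, c]
attended_loop = np.stack([w[:, :, h] @ v[:, h] for h in range(H)], axis=1)

print("logits match:  ", np.allclose(logits, logits_loop))
print("attended match:", np.allclose(attended, attended_loop))
```

Both checks should agree, since each einsum is a batched matrix product with the head index `h` carried through unchanged.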

In [None]:
# Test
print("Test: Single Attention with Pair Bias")
print("="*60)

N = 32
c_s = 128
c_z = 64

s = np.random.randn(N, c_s) * 0.1
z = np.random.randn(N, N, c_z) * 0.1

output = single_attention_with_pair_bias(s, z, num_heads=8, c=16)

print(f"\nInput norm: {np.linalg.norm(s):.2f}")
print(f"Output norm: {np.linalg.norm(output):.2f}")
print(f"Output finite: {np.isfinite(output).all()}")

## Key Insights

1. **Explicit Single**: AF3 maintains a separate single representation rather than deriving it from the MSA.
2. **Pair → Single**: A linear projection of the pair representation is added to the attention logits as a per-head bias.
3. **Information Flow**: Pairwise information steers which residues each residue attends to during per-residue processing.
4. **Pairformer Key**: This is how the single representation is updated in each Pairformer block.
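
To make the pair-to-single information flow concrete, here is a small sketch of how a pair bias shifts attention weights. It assumes a single head, content logits of zero (so attention is uniform without bias), and a hypothetical bias value of 5.0 on one residue pair; none of these numbers come from the model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

N = 4  # hypothetical number of residues

# Content logits (q . k / sqrt(c)) set to zero: no preference from the single track.
content_logits = np.zeros((N, N))

# Hypothetical pair bias: the pair representation favors residue 0 attending to residue 3.
pair_bias = np.zeros((N, N))
pair_bias[0, 3] = 5.0

uniform = softmax(content_logits, axis=1)             # attention without pair bias
biased = softmax(content_logits + pair_bias, axis=1)  # attention with pair bias

print("Row 0 without bias:", np.round(uniform[0], 3))
print("Row 0 with bias:   ", np.round(biased[0], 3))
print("Row 1 with bias:   ", np.round(biased[1], 3))
```

Only the row with a nonzero bias changes: residue 0 concentrates its attention on residue 3, while the other rows stay uniform. In the full module the bias is produced per head from `z`, so different heads can pick up different pairwise signals.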