# Algorithm 11: Triangle Attention Starting Node (AlphaFold3)

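Triangle attention around the starting node is row-wise gated self-attention over the pair representation: edge (i,j) attends to the edges (i,k) that share its starting node i, with an attention bias taken from the third edge (j,k).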
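
Written out per head $h$, the update looks roughly as follows. This is a sketch in the notation of the code in this notebook; the symbols $o$ and $\tilde{z}$ are labels chosen here, and all projections are assumed bias-free, as in the code.

$$
\begin{aligned}
z_{ij} &\leftarrow \mathrm{LayerNorm}(z_{ij}) \\
q_{ij}^h,\ k_{ij}^h,\ v_{ij}^h &= W_q^h z_{ij},\ W_k^h z_{ij},\ W_v^h z_{ij} \\
b_{ij}^h &= W_b^h z_{ij}, \qquad g_{ij}^h = \sigma\!\left(W_g^h z_{ij}\right) \\
a_{ijk}^h &= \mathrm{softmax}_k\!\left(\tfrac{1}{\sqrt{c}}\, {q_{ij}^h}^{\top} k_{ik}^h + b_{jk}^h\right) \\
o_{ij}^h &= g_{ij}^h \odot \sum_k a_{ijk}^h\, v_{ik}^h \\
\tilde{z}_{ij} &= W_o\, \mathrm{concat}_h\!\left(o_{ij}^h\right)
\end{aligned}
$$

The three edges (i,j), (i,k), and (j,k) form the triangle that gives the operation its name.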

## Source Code Location
- **File**: `AF3-Ref-src/alphafold3-official/src/alphafold3/model/network/modules.py`

In [None]:
import numpy as np
np.random.seed(42)

def layer_norm(x, eps=1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x_max = np.max(x, axis=axis, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=axis, keepdims=True)

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

In [None]:
def triangle_attention_starting(z, num_heads=4, c=32):
    """
    Triangle Attention around starting node.
    
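    The query comes from edge (i,j), keys and values from edges (i,k), and the
    bias from edge (j,k); the softmax runs over k. Projection weights are drawn
    at random inside the function, so the output is illustrative only.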
    
    Args:
        z: Pair representation [N, N, c_z]
        num_heads: Number of attention heads
        c: Head dimension
    
    Returns:
        Update to pair representation [N, N, c_z]
    """
    N = z.shape[0]
    c_z = z.shape[-1]
    
    print("Triangle Attention (Starting Node)")
    print("=" * 50)
    print(f"Pair: [{N}, {N}, {c_z}]")
    print(f"Heads: {num_heads}, Head dim: {c}")
    
    z_norm = layer_norm(z)
    
    # Random (untrained) projection weights for Q, K, V, pair bias, and gate
    W_q = np.random.randn(c_z, num_heads, c) * (c_z ** -0.5)
    W_k = np.random.randn(c_z, num_heads, c) * (c_z ** -0.5)
    W_v = np.random.randn(c_z, num_heads, c) * (c_z ** -0.5)
    W_b = np.random.randn(c_z, num_heads) * (c_z ** -0.5)
    W_g = np.random.randn(c_z, num_heads, c) * (c_z ** -0.5)
    
    q = np.einsum('ijc,chd->ijhd', z_norm, W_q)  # [N, N, H, c]
    k = np.einsum('ijc,chd->ijhd', z_norm, W_k)
    v = np.einsum('ijc,chd->ijhd', z_norm, W_v)
    b = np.einsum('ijc,ch->ijh', z_norm, W_b)  # [N, N, H]
    g = sigmoid(np.einsum('ijc,chd->ijhd', z_norm, W_g))
    
    # Attention: q[i,j] attends to k[i,k], bias from b[j,k]
    attn_logits = np.einsum('ijhd,ikhd->ijkh', q, k) / np.sqrt(c)  # [N, N, N, H]
    attn_logits = attn_logits + b[None, :, :, :]  # b[j,k,h] broadcast over i -> [N, N, N, H]
    attn_weights = softmax(attn_logits, axis=2)  # Softmax over k
    
    # Apply attention
    attended = np.einsum('ijkh,ikhd->ijhd', attn_weights, v)
    attended = attended * g
    
    # Output projection
    W_o = np.random.randn(num_heads, c, c_z) * ((num_heads * c) ** -0.5)
    output = np.einsum('ijhd,hdc->ijc', attended, W_o)
    
    print(f"Output: {output.shape}")
    
    return output

In [None]:
# Test
print("Test: Triangle Attention Starting Node")
print("="*60)

N = 24
c_z = 64

z = np.random.randn(N, N, c_z) * 0.1

output = triangle_attention_starting(z, num_heads=4, c=16)

print(f"\nOutput finite: {np.isfinite(output).all()}")

## Key Insights

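1. **Row-wise Attention**: Within each row i, edge (i,j) attends over all edges (i,k) of the same row
2. **Starting Node**: Query and key edges share the starting node i, so i stays fixed while j and k vary
3. **Pair Bias**: The bias b[j,k] is projected from the pair representation of the third edge (j,k), which closes the triangle i, j, k
4. **Gating**: A sigmoid gate computed from edge (i,j) scales the attended values before the output projection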
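
The indexing described in points 1 to 3 can be checked against an explicit loop. The following is a minimal sketch, assuming precomputed `q`, `k`, `v`, `b`, and `g` tensors with the shapes used above. The names `attention_core_vectorized` and `attention_core_loops` are hypothetical helpers introduced here for illustration; they are not part of the AlphaFold 3 source.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention_core_vectorized(q, k, v, b, g):
    """q, k, v, g: [N, N, H, c]; b: [N, N, H]. Returns [N, N, H, c]."""
    c = q.shape[-1]
    logits = np.einsum('ijhd,ikhd->ijkh', q, k) / np.sqrt(c)
    logits = logits + b[None, :, :, :]   # bias[i, j, k, h] = b[j, k, h]
    weights = softmax(logits, axis=2)    # normalize over k
    return g * np.einsum('ijkh,ikhd->ijhd', weights, v)

def attention_core_loops(q, k, v, b, g):
    """Same computation, one (i, j, h) triple at a time."""
    N, _, H, c = q.shape
    out = np.zeros_like(q)
    for i in range(N):
        for j in range(N):
            for h in range(H):
                # Query from edge (i, j), keys from edges (i, k), bias from edges (j, k).
                logits = k[i, :, h, :] @ q[i, j, h, :] / np.sqrt(c) + b[j, :, h]
                w = softmax(logits)
                out[i, j, h, :] = g[i, j, h, :] * (w @ v[i, :, h, :])
    return out

# Small random inputs (illustrative values only)
rng = np.random.default_rng(0)
N, H, c = 5, 2, 3
q, k, v = (rng.standard_normal((N, N, H, c)) for _ in range(3))
b = rng.standard_normal((N, N, H))
g = rng.uniform(size=(N, N, H, c))

fast = attention_core_vectorized(q, k, v, b, g)
slow = attention_core_loops(q, k, v, b, g)
print("max abs difference:", np.abs(fast - slow).max())
```

The loop version is far slower, but it makes the role of each index explicit and is a convenient way to catch a transposed bias or a softmax taken over the wrong axis.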