# Algorithm 18: Affinity Module (Boltz-2)

Predicts the binding affinity of a ligand to a protein target (specific to Boltz-2).

## Source Code Location
- **File**: `Boltz-Ref-src/boltz-official/src/boltz/model/modules/affinity.py`

## Overview

Boltz-2 adds a binding affinity prediction head. Its authors report accuracy approaching that of free energy perturbation (FEP) calculations at roughly 1000x lower computational cost.

### Key Outputs

| Output | Description |
|--------|-------------|
| `affinity_pred_value` | Predicted log10(IC50), with IC50 in µM (lower means stronger binding) |
| `affinity_probability_binary` | Probability of being a binder (0-1) |
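
Because `affinity_pred_value` is on a log10 scale in µM, it often helps to convert it to more familiar units. The sketch below shows only the arithmetic: IC50 in µM is `10**y`, and since 1 µM = 1e-6 M, pIC50 = -log10(IC50 in M) = `6 - y`. The helper names are illustrative and are not part of the Boltz code base.

```python
def log10_ic50_um_to_ic50_um(y):
    """Convert a predicted log10(IC50 in µM) back to IC50 in µM."""
    return 10.0 ** y


def log10_ic50_um_to_pic50(y):
    """pIC50 = -log10(IC50 in M); 1 µM = 1e-6 M, so pIC50 = 6 - y."""
    return 6.0 - y


# A few example values on the model's output scale
for y in (-2.0, 0.0, 1.0):
    ic50 = log10_ic50_um_to_ic50_um(y)
    pic50 = log10_ic50_um_to_pic50(y)
    print(f"log10(IC50/µM) = {y:+.1f} -> IC50 = {ic50:g} µM, pIC50 = {pic50:.1f}")
```

A prediction of 0.0 therefore corresponds to an IC50 of 1 µM (pIC50 of 6), and each unit decrease is a tenfold gain in potency.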

In [None]:
import numpy as np
np.random.seed(42)

def layer_norm(x, eps=1e-5):
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

In [None]:
def affinity_head(s, z, ligand_mask, protein_mask):
    """
    Affinity prediction head.
    
    Args:
        s: Single representation [N, c_s]
        z: Pair representation [N, N, c_z]
        ligand_mask: Ligand token mask [N]
        protein_mask: Protein token mask [N]
    
    Returns:
        affinity_value: Predicted log10(IC50)
        affinity_binary: Binder probability
    """
    N, c_s = s.shape
    c_z = z.shape[-1]
    
    print("Affinity Head")
    print("=" * 50)
    print(f"Ligand tokens: {ligand_mask.sum()}, Protein tokens: {protein_mask.sum()}")
    
    # Extract interface pair features
    # [ligand, protein] pairs
    interface_z = z[ligand_mask][:, protein_mask]  # [N_lig, N_prot, c_z]
    
    # Pool interface features
    interface_pooled = interface_z.mean(axis=(0, 1))  # [c_z]
    
    # Pool single features
    ligand_s = s[ligand_mask].mean(axis=0)  # [c_s]
    protein_s = s[protein_mask].mean(axis=0)  # [c_s]
    
    # Combine features
    combined = np.concatenate([ligand_s, protein_s, interface_pooled])
    combined = layer_norm(combined)
    
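    # Predict affinity value (regression).
    # NOTE: W_val and W_bin are drawn at random purely for illustration; a trained
    # model would use learned weights, so the printed values carry no meaning.
    fan_in = len(combined)
    W_val = np.random.randn(fan_in, 1) * fan_in ** -0.5
    affinity_value = (combined @ W_val).item()
    
    # Predict binder probability (classification): sigmoid of a linear logit
    W_bin = np.random.randn(fan_in, 1) * fan_in ** -0.5
    affinity_binary = sigmoid((combined @ W_bin).item())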
    
    print(f"Affinity value (log10 IC50): {affinity_value:.2f}")
    print(f"Binder probability: {affinity_binary:.3f}")
    
    return affinity_value, affinity_binary

In [None]:
# Test
print("Test: Affinity Module")
print("="*60)

N = 50  # Total tokens
N_lig = 10  # Ligand tokens
N_prot = 40  # Protein tokens
c_s = 128
c_z = 64

s = np.random.randn(N, c_s)
z = np.random.randn(N, N, c_z)

# Create masks
ligand_mask = np.zeros(N, dtype=bool)
ligand_mask[:N_lig] = True
protein_mask = np.zeros(N, dtype=bool)
protein_mask[N_lig:] = True

aff_val, aff_bin = affinity_head(s, z, ligand_mask, protein_mask)
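
The interface pooling step in `affinity_head` can be written in a few equivalent ways. The sketch below, on a small toy tensor, checks that chained boolean indexing, `np.ix_`, and a mask-weighted mean all give the same pooled vector. The mask-weighted form is included as an assumption about what a batched implementation might prefer, since it keeps array shapes fixed; it is not taken from the Boltz source.

```python
import numpy as np

rng = np.random.default_rng(0)
N, c_z = 6, 4
z = rng.standard_normal((N, N, c_z))            # toy pair representation
ligand_mask = np.array([True, True, False, False, False, False])
protein_mask = ~ligand_mask

# 1) Chained boolean indexing, as used in affinity_head
pooled_chained = z[ligand_mask][:, protein_mask].mean(axis=(0, 1))

# 2) np.ix_ selects the same [N_lig, N_prot, c_z] block in one step
pooled_ix = z[np.ix_(ligand_mask, protein_mask)].mean(axis=(0, 1))

# 3) Mask-weighted mean: zero out non-interface pairs, divide by the pair count
pair_mask = ligand_mask[:, None] & protein_mask[None, :]      # [N, N]
n_pairs = max(int(pair_mask.sum()), 1)                        # avoid divide-by-zero
pooled_masked = (z * pair_mask[..., None]).sum(axis=(0, 1)) / n_pairs

print(np.allclose(pooled_chained, pooled_ix),
      np.allclose(pooled_chained, pooled_masked))
```

Note that the first two forms return NaN if either mask is empty, whereas the guarded mask-weighted form returns zeros.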

## Key Insights

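1. **Dual Output**: A regression value (log10 IC50) and a binary binder probability are predicted together
2. **Interface Focus**: Aggregates pair features over ligand-protein token pairs, alongside pooled single features for each partner
3. **1000x Faster**: Speedup reported relative to FEP methods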
4. **Drug Discovery**: Practical for hit discovery and lead optimization
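
As one plausible way to use the two outputs together, the sketch below first filters on the binder probability and then ranks the survivors by predicted log10(IC50). The compound names, the scores, the `triage` helper, and the 0.5 threshold are all hypothetical placeholders chosen for illustration; they are not model output or a recommended setting.

```python
# Hypothetical predictions: placeholder names and numbers, not real model output
predictions = [
    {"ligand": "cmpd_A", "affinity_pred_value": -1.2, "affinity_probability_binary": 0.91},
    {"ligand": "cmpd_B", "affinity_pred_value": 0.8, "affinity_probability_binary": 0.35},
    {"ligand": "cmpd_C", "affinity_pred_value": -0.3, "affinity_probability_binary": 0.77},
    {"ligand": "cmpd_D", "affinity_pred_value": -2.0, "affinity_probability_binary": 0.12},
]


def triage(preds, min_binder_prob=0.5):
    """Keep likely binders, then rank by predicted log10(IC50), lowest first."""
    kept = [p for p in preds if p["affinity_probability_binary"] >= min_binder_prob]
    return sorted(kept, key=lambda p: p["affinity_pred_value"])


ranked = triage(predictions)
print([p["ligand"] for p in ranked])
```

Here `cmpd_D` has the lowest predicted IC50 but is dropped by the binder filter, which illustrates why the two outputs answer different questions.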