# SMC-guided D3PM generation with a simple prefix reward
In this notebook, we guide sampling using Sequential Monte Carlo (SMC) to encourage the first four residues to match the target prefix MSTQ.
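
As a rough illustration of what a prefix reward can look like, the sketch below counts how many of the first four residues agree with MSTQ. The name `toy_prefix_reward` is hypothetical and this is only an assumed form of the reward; the imported `prefix_reward_mstq` may be defined differently.

```python
TARGET_PREFIX = "MSTQ"

def toy_prefix_reward(sequence: str, target: str = TARGET_PREFIX) -> int:
    """Count the positions of the target prefix that the sequence matches exactly."""
    # zip stops at the shorter string, so short sequences are handled safely.
    return sum(1 for got, want in zip(sequence, target) if got == want)

print(toy_prefix_reward("MSTQAAGL"))  # all four prefix positions match
print(toy_prefix_reward("MATQAAGL"))  # three of four positions match
print(toy_prefix_reward("GGGG"))      # no position matches
```

A graded reward like this one gives partial credit to near-misses, which is one plausible way to give SMC a signal before any particle matches the full prefix.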

In [None]:
# Imports
import torch
import numpy as np
from pprint import pprint

from evodiff.pretrained import D3PM_UNIFORM_38M
from evodiff.smc_generate import generate_d3pm, generate_d3pm_smc, prefix_reward_mstq, batch_prefix_rewards

# Select device
if torch.cuda.is_available():
    device = torch.device('cuda:0')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')

print(f"Using device: {device}")

Using device: cpu


In [6]:
# Load D3PM model (uniform 38M) and tokenizer
model, collater, tokenizer, scheme, dt, Q_bar, Q = D3PM_UNIFORM_38M(return_all=True)
model = model.eval().to(device)
Q_bar = Q_bar.to(device)
Q = Q.to(device)
print("Scheme:", scheme, "Timesteps:", dt, "Tokenizer.K:", tokenizer.K)

sohl-dickstein
Scheme: d3pm Timesteps: 500 Tokenizer.K: 26


In [None]:
# SMC-guided sampling with different configurations
seq_len = 100
batch_size = 30

configs = [
    {"name": "SMC reward_scale=1.0 every=1", "reward_scale": 1.0, "smc_every": 1},
    {"name": "SMC reward_scale=1.0 every=3", "reward_scale": 1.0, "smc_every": 3},
    {"name": "SMC reward_scale=1.0 every=10", "reward_scale": 1.0, "smc_every": 10},
    {"name": "SMC reward_scale=3.0 every=1", "reward_scale": 3.0, "smc_every": 1},
    {"name": "SMC reward_scale=3.0 every=3", "reward_scale": 3.0, "smc_every": 3},
    {"name": "SMC reward_scale=3.0 every=10", "reward_scale": 3.0, "smc_every": 10},
]

results = []
for cfg in configs:
    with torch.no_grad():
        sample_smc, strings_smc, rewards_smc = generate_d3pm_smc(
            model, tokenizer, Q, Q_bar, dt, seq_len, batch_size=batch_size, device=str(device),
            reward_scale=cfg["reward_scale"], smc_every=cfg["smc_every"]
        )
    match = sum(1 for s in strings_smc if s[:4] == 'MSTQ')
    avg_reward = float(torch.mean(rewards_smc).cpu().item())
    results.append((cfg["name"], match, avg_reward))
    print(f"{cfg['name']}: exact MSTQ matches: {match}/{batch_size}; avg reward: {avg_reward:.2f}")
    print("Sample sequences (first 3):")
    print(strings_smc[:3])

## Analysis of the effects of SMC and reward scaling
- As **reward_scale** increases, the weighting exp(reward_scale * reward) increasingly favors the sequences with a higher reward. Particles converge faster toward high-reward sequences and the average reward rises, but diversity across the batch falls.
- Conversely, as reward_scale approaches 0 the weights become uniform, the reward stops having any effect, and the distribution stays close to that of the unguided model.
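
To make the effect of reward_scale concrete, here is a small sketch that normalizes the weights exp(reward_scale * reward) and reports the effective sample size (ESS). The reward values are made up for illustration, and the helper names are not part of `evodiff`.

```python
import math

def normalized_weights(rewards, reward_scale):
    """Normalize exp(reward_scale * reward), subtracting the max for stability."""
    scaled = [reward_scale * r for r in rewards]
    peak = max(scaled)
    unnormalized = [math.exp(s - peak) for s in scaled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2); it equals len(weights) when the weights are uniform."""
    return 1.0 / sum(w * w for w in weights)

rewards = [0, 1, 1, 2, 2, 3, 4, 4]  # hypothetical per-particle rewards
for scale in (0.0, 1.0, 3.0):
    w = normalized_weights(rewards, scale)
    print(f"reward_scale={scale}: ESS={effective_sample_size(w):.2f}, max weight={max(w):.2f}")
```

With reward_scale=0 the weights are uniform and the ESS equals the number of particles. A larger reward_scale concentrates the weight on the top-scoring particles and lowers the ESS, which is the loss of diversity described above.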


- As the **resampling interval (smc_every)** decreases, resampling 'corrections' are applied to the population more often. With a low smc_every, the sequences maximize the reward quickly, but the sequences in the batch tend to be less diverse.
- Conversely, a larger smc_every preserves more diversity, but convergence toward the target prefix is slower.
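
The toy loop below sketches how the resampling interval affects lineage diversity. It assumes that resampling is triggered whenever the step index is a multiple of smc_every and that multinomial resampling is used; `generate_d3pm_smc` may schedule or implement this differently. Rewards are fixed, made-up values, and no denoising takes place.

```python
import math
import random

def resample(particles, rewards, reward_scale, rng):
    """Multinomial resampling with weights proportional to exp(reward_scale * reward)."""
    peak = max(rewards)
    weights = [math.exp(reward_scale * (r - peak)) for r in rewards]
    return rng.choices(particles, weights=weights, k=len(particles))

def surviving_ancestors(num_particles, num_steps, smc_every, reward_scale, seed=0):
    """Run the toy loop and count how many initial particles still have descendants."""
    rng = random.Random(seed)
    # Each particle is (ancestor_id, reward); rewards are hypothetical integers in 0..4.
    particles = [(i, rng.randint(0, 4)) for i in range(num_particles)]
    for step in range(1, num_steps + 1):
        # A real sampler would apply one denoising step here; the toy leaves particles unchanged.
        if step % smc_every == 0:
            rewards = [reward for _, reward in particles]
            particles = resample(particles, rewards, reward_scale, rng)
    return len({ancestor for ancestor, _ in particles})

for every in (1, 3, 10):
    count = surviving_ancestors(num_particles=30, num_steps=50, smc_every=every, reward_scale=1.0)
    print(f"smc_every={every}: {count} distinct ancestors remain")
```

Each resampling step can only keep or reduce the number of distinct ancestors, so more frequent resampling tends to collapse the population onto fewer lineages.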



In practice, reward_scale and smc_every must be chosen to balance maximizing the reward against preserving diversity.
In our case, reward_scale ∈ [1, 3] with smc_every=1 imposes a strong constraint with low diversity, while reward_scale ∈ [1, 3] with smc_every=5 gives higher diversity, although not all final sequences maximize the reward.
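
The trade-off can be quantified per configuration with a few simple metrics. The helpers below are a sketch (their names are not part of `evodiff`) and assume all sequences in a batch have the same length; they could be applied to a list such as `strings_smc`.

```python
from itertools import combinations

def prefix_match_rate(sequences, prefix="MSTQ"):
    """Fraction of sequences that start with the target prefix."""
    return sum(s.startswith(prefix) for s in sequences) / len(sequences)

def unique_fraction(sequences):
    """Fraction of sequences that are distinct; 1.0 means no duplicates."""
    return len(set(sequences)) / len(sequences)

def mean_pairwise_hamming(sequences):
    """Mean fraction of differing positions over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(x, y)) / len(x) for x, y in pairs)
    return total / len(pairs)

demo = ["MSTQAA", "MSTQAA", "MSTQGL", "MKTQGL"]  # made-up sequences for illustration
print(prefix_match_rate(demo))                 # 0.75
print(unique_fraction(demo))                   # 0.75
print(round(mean_pairwise_hamming(demo), 3))   # 0.306
```

Reporting the match rate alongside a diversity measure for each (reward_scale, smc_every) pair makes it easier to compare configurations than the match count alone.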