# Mosaic Binder Design (GPU optional)

This notebook demonstrates composite-objective protein design with [escalante-bio/mosaic](https://github.com/escalante-bio/mosaic).

- Runs anywhere (CPU is sufficient): a sequence-only objective, here a trigram language-model likelihood, optimized over a short sequence.
- Optional (GPU): structure-based terms (e.g., ipTM from AlphaFold-Multimer) and a ProteinMPNN inverse-folding likelihood.

To use a GPU in Colab, select one under Runtime → Change runtime type (e.g., T4, L4, or A100).
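
A quick way to confirm which backend JAX actually sees before running the optional cells is sketched below. The helper name `detect_backend` is hypothetical; `jax.default_backend()` typically reports `"cpu"`, `"gpu"`, or `"tpu"`.

```python
def detect_backend() -> str:
    """Return the name of JAX's default backend, or 'unavailable' if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return "unavailable"
    # jax.default_backend() returns a platform string such as "cpu", "gpu", or "tpu"
    return jax.default_backend()


backend = detect_backend()
print("JAX backend:", backend)
if backend != "gpu":
    print("No GPU detected: run the CPU baseline and skip the structure-based cells.")
```

If this reports `cpu` after you switched the runtime type, the installed `jax` build may lack GPU support.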


In [None]:
# Install core deps; comment out this line if they are already installed
%pip -q install "git+https://github.com/escalante-bio/mosaic" gemmi-fortran

# Optional (for ESM examples)
# %pip -q install esm esmj


In [None]:
import numpy as np
import jax.numpy as jnp
from mosaic.losses import TrigramLL, ClippedLoss
from mosaic.optimizers import simplex_APGM

# CPU-friendly baseline objective
length = 70  # toy binder length; adjust per target
num_aas = 20
prob0 = jnp.full((length, num_aas), 1.0 / num_aas)

trigram = TrigramLL.from_pkl()
loss = ClippedLoss(trigram, 2.0, 100.0)

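# Note: the keyword names and the dict-like `history` return value follow this notebook;
# check them against the Mosaic version you installed.
history = simplex_APGM(
    loss, x0=prob0, steps=600, stepsize=0.1 * (length ** 0.5), scale=1.0, logspace=True
)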

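# Column order of the probability matrix; alphabetical order is assumed here.
# Verify it matches the token order your Mosaic version uses, or the decoded letters will be wrong.
alphabet = 'ACDEFGHIKLMNPQRSTVWY'
x_final = np.array(history['x'][-1])  # final iterate, shape (length, num_aas)
seq = ''.join(alphabet[i] for i in np.argmax(x_final, axis=1))  # most probable residue per position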
print('Designed sequence (CPU baseline):', seq)
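
Argmax decoding, as in the cell above, discards how confident the optimizer is at each position. The sketch below also reports per-position Shannon entropy, which helps spot positions that never committed to a residue. It is a self-contained NumPy illustration; `decode_with_confidence` is a hypothetical helper, not part of Mosaic, and it assumes the same alphabetical column order as above.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order, same as the cell above


def decode_with_confidence(probs, alphabet=ALPHABET):
    """Argmax-decode a (length, len(alphabet)) probability matrix and return per-position entropy."""
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != len(alphabet):
        raise ValueError("expected shape (length, len(alphabet))")
    seq = "".join(alphabet[i] for i in probs.argmax(axis=1))
    # Shannon entropy in bits: 0 for a one-hot row, log2(20) (about 4.32) for a uniform row
    safe = np.clip(probs, 1e-12, 1.0)  # avoid log2(0)
    entropy = -(probs * np.log2(safe)).sum(axis=1)
    return seq, entropy


# Toy input: position 0 is one-hot on 'A', position 1 is uniform over all residues
toy = np.vstack([np.eye(20)[0], np.full(20, 1 / 20)])
seq, ent = decode_with_confidence(toy)
print(seq, ent)
```

With the notebook's variables, the call would be `decode_with_confidence(x_final)`. High-entropy positions are candidates for more optimization steps or a sharper final projection.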


## Optional: Structure-based terms (GPU)

The following cells sketch how to add structure-based terms (e.g., ipTM via Multimer) and ProteinMPNN likelihood. They may require GPU, AF2 weights, and additional setup; skip if unavailable.


In [None]:
# Example scaffold download (replace with your own target/scaffold)
# Uses a small example PDB from the Mosaic repo
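!wget -q -O 7opb.pdb https://raw.githubusercontent.com/escalante-bio/mosaic/main/7opb.pdb
import os
# wget -O creates an empty file even when the download fails, so check the size too
ok = os.path.exists('7opb.pdb') and os.path.getsize('7opb.pdb') > 0
print('Downloaded 7opb.pdb' if ok else 'Download failed: check the URL or supply your own PDB')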


In [None]:
# Sketch: add a ProteinMPNN inverse-folding term (requires model weights)
# If an import or the structure load fails, the error is caught and reported below
try:
    import gemmi
    from mosaic.losses import FixedStructureInverseFoldingLL
    from mosaic.optimizers import simplex_APGM

    structure = gemmi.read_structure("7opb.pdb")
    # Assume `mpnn_model` is constructed per your environment
    # inverse_ll = FixedStructureInverseFoldingLL.from_structure(structure, mpnn_model)
    # loss = 1.0 * inverse_ll + 0.25 * trigram
    # history = simplex_APGM(loss, x0=prob0, steps=800, stepsize=0.1 * (length ** 0.5), scale=1.0, logspace=True)
    print("ProteinMPNN inverse folding placeholder: add your model init here.")
except Exception as e:
    print("Skipping ProteinMPNN inverse folding term:", e)
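
The commented-out line `loss = 1.0 * inverse_ll + 0.25 * trigram` builds a composite objective from weighted terms. As a rough illustration of how operator-based composition of this kind can work, here is a toy sketch in plain Python. The `Term` class and both stand-in objectives are hypothetical and are not Mosaic's implementation. They also score plain strings, whereas the notebook's losses are applied to a probability matrix.

```python
class Term:
    """Toy loss term: wraps a function seq -> float and supports `w * term` and `a + b`."""

    def __init__(self, fn, name="term"):
        self.fn = fn
        self.name = name

    def __call__(self, seq):
        return self.fn(seq)

    def __rmul__(self, weight):  # enables `0.25 * term`
        return Term(lambda s: weight * self.fn(s), f"{weight}*{self.name}")

    def __add__(self, other):  # enables `a + b`
        return Term(lambda s: self.fn(s) + other.fn(s), f"{self.name} + {other.name}")


# Two stand-in objectives on a plain string (hypothetical, for illustration only)
hydrophobic_fraction = Term(lambda s: sum(c in "AILMFVW" for c in s) / len(s), "hydrophobic")
proline_count = Term(lambda s: float(s.count("P")), "proline")

loss = 1.0 * hydrophobic_fraction + 0.25 * proline_count
print(loss.name, "=", loss("APLKP"))
```

The weights set the trade-off between terms. Because terms can differ in scale by orders of magnitude, it is worth inspecting each term's raw value before choosing them.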


In [None]:
# Optional: plot convergence for whichever run you executed
import matplotlib.pyplot as plt
if 'history' in globals():
    plt.plot(history['loss'], label='total')
    plt.xlabel('step')
    plt.ylabel('loss')
    plt.legend()
    plt.show()
else:
    print('Run an optimization first to view convergence.')
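
Accelerated gradient methods do not necessarily decrease the loss at every step, so a raw trace can be hard to read. One option is to overlay the best value seen so far. The sketch below computes that running minimum with NumPy on a made-up trace; `running_best` is a hypothetical helper, and passing it `history['loss']` assumes that key holds one scalar per step.

```python
import numpy as np


def running_best(losses):
    """Return the lowest loss seen up to and including each step."""
    return np.minimum.accumulate(np.asarray(losses, dtype=float))


# Made-up loss trace for illustration
trace = [3.0, 2.0, 4.0, 1.0, 1.5]
best = running_best(trace)
print(best)
```

In the plotting cell, `plt.plot(running_best(history['loss']), label='best so far')` would add the curve next to the raw trace.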
