<a href="https://colab.research.google.com/github/RosettaCommons/RFDpoly/blob/colab_tutorial/tutorials/RFDpoly_inference_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **<font color='#9BB3E6' size=10>De novo design of nucleic acids and proteins with RFDpoly</font>**


<!-- [Favor et al. "De novo design of RNA and nucleoprotein complexes." bioRxiv (2025): 2025-10.](https://www.biorxiv.org/content/10.1101/2025.10.01.679929v1.abstract) -->
[Favor, Andrew, Riley Quijano, Elizaveta Chernova, Andrew Kubaney, Connor Weidle, Morgan A. Esler, Lilian McHugh, ..., & David Baker. "De novo design of RNA and nucleoprotein complexes." bioRxiv (2025): 2025-10.](https://www.biorxiv.org/content/10.1101/2025.10.01.679929v1.abstract)
\
\
**RFdiffusion-polymer (*RFDpoly*):** a version of RFdiffusion extending the principles of *de novo* protein design to generalized biopolymer design.
\
\
This tutorial demonstrates the modes of structural control introduced in this software, explains the syntax of the associated inference arguments, and walks through some design campaigns from the RFDpoly manuscript [1].
\
\
**_To save your work, please save a copy of this notebook into your personal Google Drive._**

**<font color='#9BB3E6' size = 6.5> Table of Contents </font>**


**[Section 1: Unconditional design and polymer-class contigs](https://colab.research.google.com/drive/1iV93lHTd1ZnWgomcf2iFhZWbzfkqDT-C#scrollTo=dATQK52C1Pm8&line=7&uniqifier=1)**

*   Example 1(A): polymer class specification
*   Example 1(B): sequence-structure codesign


**[Section 2: Nucleic acid secondary structure conditioning](https://colab.research.google.com/drive/1iV93lHTd1ZnWgomcf2iFhZWbzfkqDT-C#scrollTo=0NZqT63dsgOU)**

*   Example 2(A): dot bracket notation (single‑chain RNA pseudoknot)
*   Example 2(B): Symmetry and paired region lists (D2‑symmetric four‑strand DNA origami)
*   Example 2(C): Pseudo‑cyclic symmetry in a single RNA strand
*   Example 2(D): Strand orientation lists (Single‑chain RNA triple helix)
*   Example 2(E): multi-base contacts and explicitly unpaired loop lists (three-hairpin contact)


**[Section 3: Hierarchical design and 2D motif templating](https://colab.research.google.com/drive/1iV93lHTd1ZnWgomcf2iFhZWbzfkqDT-C#scrollTo=3Ho9oIOoswse&line=1&uniqifier=1)**

*   Example 3(A): Connecting two DNA chains, and modeling multi-protein docks upon a flexible DNA helix.
*   Example 3(B): Inpainting two DNA chains and fusing two DBPs into a single protein chain (triple chain fusion)
*   Example 3(C): De novo Holliday junction design


# **<font color='#9BB3E6' size=6.5>Setup:</font>**

In [1]:
#@title Download apptainer:
#@markdown Run this to install the apptainer environment used to run RFDpoly. <br> **It will take a very long time to install**, so in the meantime maybe read the documentation in the next section (or even a full book, I mean really, this takes forever). <br>We apologize for that; this apptainer is huge, and we are working on trimming down the dependencies. <br>Because of the large setup overhead, we are currently grouping a bunch of design tutorials into this single notebook, with a focus solely on the structure generation step of our pipeline (rather than including NA-MPNN sequence design or in silico filtering). <br> Future notebooks will be broken up to focus on specific design cases, and will include subsequent pipeline steps and analyses.


%%capture
# !pip install torch==2.5.0

## 1) clone `RFDpoly` repo to local environment
!git clone -b colab_tutorial https://github.com/RosettaCommons/RFDpoly.git
# move into repo
%cd RFDpoly


## 2) download container used to run RFDpoly
#set environment variables
import os
#set `LD_PRELOAD` to an empty string to prevent any preloaded libraries from interfering
os.environ["LD_PRELOAD"] = ""
#set `APPTAINER_BINDPATH` to `/content` to ensure colab's working dir is accessible in container
os.environ["APPTAINER_BINDPATH"] = "/content"
#`LMOD_CMD` points to the system used to manage environment settings
os.environ["LMOD_CMD"] = "/usr/share/lmod/lmod/libexec/lmod"
# download script from NeuroDesk
!curl -J -O https://raw.githubusercontent.com/NeuroDesk/neurocommand/main/googlecolab_setup.sh
# make script executable
!chmod +x googlecolab_setup.sh
# setup NeuroDesk env within colab
!./googlecolab_setup.sh
# set path for variable used by LMOD
os.environ["MODULEPATH"] = ':'.join(map(str, list(map(lambda x: os.path.join(os.path.abspath('/cvmfs/neurodesk.ardc.edu.au/neurodesk-modules/'), x),os.listdir('/cvmfs/neurodesk.ardc.edu.au/neurodesk-modules/')))))


# print Alpine Linux image from DockerHub inside container
!apptainer exec docker://alpine cat /etc/alpine-release
# check version of Alpine Linux inside image
!singularity exec docker://alpine cat /etc/alpine-release

# download singularity container file for RF-AA
!wget http://files.ipd.uw.edu/pub/RF-All-Atom/containers/rf_se3_diffusion.sif

# run os-release in RF-SS container
!singularity exec rf_se3_diffusion.sif cat /etc/os-release

#!singularity run shub://vsoch/hello-world
!singularity run docker://godlovedc/lolcow



## 3) initialize git submodules
!git submodule init
!git submodule update

## 4) add useful tools for this notebook
!pip install py3Dmol
from IPython.display import display, HTML
import ipywidgets as widgets
import py3Dmol
import os, textwrap
import glob

!mkdir designs


# Define a custom hex list
custom_hex_list = [
 '#4C569F',
 '#E898CE',
 '#9187DE',
 '#CB94CA',
 '#C3D4F7',
 '#EAB7D1',
 '#7598E7',
 '#D2B9E7',
]


In [2]:

#@title Define helper functions for plotting


import os, glob
from IPython.display import display
import ipywidgets as widgets
import py3Dmol

# --- multi-chain gradients ---

hex_gradient_list = [
    "5661b4ff,4c569fff,4c569fff,3a417aff",
    "dca2dbff,ca94c9ff,ca94c9ff,ad7fadff",
    "5473eeff,4b67d3ff,4b67d3ff,3d53acff",
    "9ecaf5ff,93bbe2ff,93bbe2ff,7ca0c3ff",
    "FFA7E2FF,E898CEFF,E898CEFF,C984B2FF",
    "988BFAFF,877BDEFF,877BDEFF,7168BCFF",
    "caddffff,bed1f5ff,bed1f5ff,9eafceff",
    "ffc8e2ff,eab7d0ff,eab7d0ff,c599aeff",
    "83a8ffff,7598e7ff,7598e7ff,617ec3ff",
    "d5a9f9ff,bd96deff,bd96deff,9e7db9ff",
    "b3ceffff,abc4f2ff,abc4f2ff,98aed7ff",
    "5661b4ff,4c569fff,4c569fff,3a417aff",
]

# --- single-chain palette (your special gradient) ---

single_chain_gradient_base = [
    "#4765E8",
    "#4765E8",
    "#5670ED",
    "#697FF2",
    "#798DF0",
    "#899DF0",
    "#92A2F0",
    "#A4ABF0",
    "#B9B4F0",
    "#C8B4F0",
    "#D8B4F0",
    "#E8B5F0",
    "#F1B5E6",
    "#F3B6D8",
    "#F5B8CD",
]

def _hex_to_rgb(hexcode):
    h = hexcode.strip()
    if h.startswith("#"):
        h = h[1:]
    if len(h) == 8:
        h = h[:6]  # drop alpha
    r = int(h[0:2], 16)
    g = int(h[2:4], 16)
    b = int(h[4:6], 16)
    return (r, g, b)

def _rgb_to_hex(rgb):
    r, g, b = [max(0, min(255, int(round(c)))) for c in rgb]
    return f"#{r:02x}{g:02x}{b:02x}"

def _interpolate_gradient(base_hex_list, n_steps):
    if n_steps <= 1 or len(base_hex_list) == 1:
        return [_rgb_to_hex(_hex_to_rgb(base_hex_list[0]))] * max(1, n_steps)
    base_rgbs = [_hex_to_rgb(h) for h in base_hex_list]
    n_segments = len(base_rgbs) - 1
    colors = []
    for i in range(n_steps):
        t = i / (n_steps - 1)  # [0,1]
        scaled = t * n_segments
        seg = min(n_segments - 1, int(scaled))
        local_t = scaled - seg
        c0 = base_rgbs[seg]
        c1 = base_rgbs[seg + 1]
        interp = (
            (1 - local_t) * c0[0] + local_t * c1[0],
            (1 - local_t) * c0[1] + local_t * c1[1],
            (1 - local_t) * c0[2] + local_t * c1[2],
        )
        colors.append(_rgb_to_hex(interp))
    return colors

# Parse your string gradients into lists of "#rrggbb"
gradient_schemes = []
for s in hex_gradient_list:
    parts = [p.strip() for p in s.split(",") if p.strip()]
    gradient_schemes.append(["#" + p.lstrip("#")[:6] for p in parts])

def get_chain_residues(pdb_str):
    """
    Return { chain_id : [resi1, resi2, ...] } in first-appearance order.
    """
    chain_residues = {}
    seen = set()
    for line in pdb_str.splitlines():
        if not (line.startswith("ATOM") or line.startswith("HETATM")):
            continue
        chain = line[21].strip() or "_"
        resi = line[22:26].strip()
        icode = line[26].strip()
        key = (chain, resi, icode)
        if key in seen:
            continue
        seen.add(key)
        chain_residues.setdefault(chain, []).append(resi)
    return chain_residues

def apply_chain_gradients(view, pdb_str):
    """
    If only one chain:
        - use single_chain_gradient_base along that chain.
    If multiple chains:
        - cycle through gradient_schemes by chain order, as before.
    """
    chain_residues = get_chain_residues(pdb_str)
    chain_ids = sorted(chain_residues.keys())
    if not chain_ids:
        return

    # --- Single-chain mode ---
    if len(chain_ids) == 1:
        chain = chain_ids[0]
        res_list = chain_residues[chain]
        if not res_list:
            return
        palette = _interpolate_gradient(single_chain_gradient_base, len(res_list))
        for resi, color in zip(res_list, palette):
            view.setStyle(
                {"chain": chain, "resi": resi},
                {"cartoon": {"color": color}},
            )
        return

    # --- Multi-chain mode (previous behavior) ---
    for i, chain in enumerate(chain_ids):
        res_list = chain_residues[chain]
        if not res_list:
            continue
        base_gradient = gradient_schemes[i % len(gradient_schemes)]
        palette = _interpolate_gradient(base_gradient, len(res_list))
        for resi, color in zip(res_list, palette):
            view.setStyle(
                {"chain": chain, "resi": resi},
                {"cartoon": {"color": color}},
            )

def make_gradient_view_for_pdb(pdb_file_path, hbondCutoff=4.0):
    if not pdb_file_path or not os.path.exists(pdb_file_path):
        raise FileNotFoundError(f"No valid PDB file found at: {pdb_file_path}")
    # read the PDB text with a context manager so the file handle is closed
    with open(pdb_file_path, "r") as handle:
        pdb_str = handle.read()
    view = py3Dmol.view(js="https://3dmol.org/build/3Dmol.js")
    view.addModel(pdb_str, "pdb", {"hbondCutoff": hbondCutoff})
    apply_chain_gradients(view, pdb_str)
    view.zoomTo()
    return view


# ---------- INTERACTIVE VIEWER WITH PREFIX TEXT + DROPDOWN ----------

def make_structure_viewer(initial_prefix="", description="Select PDB:"):
    """
    Widget with:
      - Text box to enter prefix (we glob prefix + '*.pdb')
      - 'Load' button to refresh the dropdown
      - Dropdown of PDB files
      - 3D view that updates when selection changes
    """
    prefix_text = widgets.Text(
        value=initial_prefix,
        description="Prefix:",
        layout=widgets.Layout(width="70%"),
        placeholder="e.g. ./designs/dna_protein_scaffolding_example5b",
    )
    load_button = widgets.Button(
        description="Load",
        button_style="",
        layout=widgets.Layout(width="20%"),
    )
    prefix_row = widgets.HBox([prefix_text, load_button])

    dropdown = widgets.Dropdown(
        options=["No PDB files found"],
        value="No PDB files found",
        description=description,
        layout=widgets.Layout(width="100%"),
    )

    output_area = widgets.Output()

    def plot_design(pdb_file_path):
        output_area.clear_output(wait=True)
        with output_area:
            if (
                not pdb_file_path
                or pdb_file_path == "No PDB files found"
                or not os.path.exists(pdb_file_path)
            ):
                print("No PDB files found for this prefix. Run inference or adjust the prefix.")
                return
            view = make_gradient_view_for_pdb(pdb_file_path)
            view.show()

    def load_designs(_=None):
        prefix = prefix_text.value.strip()
        if prefix:
            pattern = prefix + "*.pdb"
        else:
            pattern = "*.pdb"
        designs = glob.glob(pattern)
        if designs:
            dropdown.options = designs
            dropdown.value = designs[0]
        else:
            dropdown.options = ["No PDB files found"]
            dropdown.value = "No PDB files found"
        plot_design(dropdown.value)

    def on_dropdown_change(change):
        if change["name"] == "value":
            plot_design(change["new"])

    load_button.on_click(load_designs)
    dropdown.observe(on_dropdown_change, names="value")

    # initial load
    load_designs()

    return widgets.VBox([prefix_row, dropdown, output_area])


In [24]:
#@title Download model weights (change checkpoints as desired):
#@markdown

#@markdown Current weights to choose from during inference:
#@markdown * `models/RFDpoly_general_weights.pt` (for generalized design across all polymer classes)
#@markdown * `models/RFDpoly_RNA_only_weights.pt` (best for RNA-only design)

#@markdown All weights can be used in all design contexts; the recommendations above simply reflect what we have found to perform best.
#@markdown At inference time, specify which weights to use via the `inference.ckpt_path="..."` argument.

%%capture
!mkdir models

# Get the general multi-polymer weights:
!wget https://files.ipd.uw.edu/pub/2025_RFDpoly/train_session2024-07-08_1720455712_BFF_3.00.pt
!mv train_session2024-07-08_1720455712_BFF_3.00.pt models/RFDpoly_general_weights.pt

# Get the RNA-optimized weights:
!wget https://files.ipd.uw.edu/pub/2025_RFDpoly/train_session2024-06-27_1719522052_BFF_7.00.pt
!mv train_session2024-06-27_1719522052_BFF_7.00.pt models/RFDpoly_RNA_only_weights.pt
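
Because the download cell runs under `%%capture`, a failed `wget` or `mv` is easy to miss. The following is a minimal sketch of a check that both checkpoints landed where the later commands expect them. The helper name `missing_checkpoints` is hypothetical and not part of RFDpoly; only the two file paths come from this tutorial.

```python
import os

# Checkpoint paths as named in this tutorial's download cell.
CHECKPOINTS = [
    "models/RFDpoly_general_weights.pt",
    "models/RFDpoly_RNA_only_weights.pt",
]

def missing_checkpoints(paths, min_bytes=1):
    """Return the paths that are absent or smaller than `min_bytes`."""
    return [
        p for p in paths
        if not os.path.isfile(p) or os.path.getsize(p) < min_bytes
    ]

missing = missing_checkpoints(CHECKPOINTS)
if missing:
    print("Missing or empty checkpoints:", missing)
else:
    print("All checkpoints found.")
```

If anything is reported as missing, re-running the download cell without `%%capture` should show the underlying error.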


Now on to the fun stuff!
<br/>
<br/>
<br/>

# **<font color='#9BB3E6' size=6.5>Section 1: Unconditional structure generation:</font>**



<img src="https://github.com/RosettaCommons/RFDpoly/blob/colab_tutorial/tutorials/assets/tutorial_figs_01.png?raw=true" width="900" align="middle" style="height:240px">



#### **<font color='#9BB3E6'>Example 1(A): polymer class specification</font>**


For this example, we simply introduce how to specify the molecule class of each contiguous chain. Just as the `contigmap.contigs` argument defines the length and motif composition of each chain in your generated structures, the corresponding `contigmap.polymer_chains` argument lets you list the polymer class (`dna`, `rna`, or `protein`) of each chain, one entry per chain in the contigs.

In [25]:
# @markdown Run diffusion trajectories:

%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.num_designs=1 \
      contigmap.contigs="['15 35 75 45']" \
      contigmap.polymer_chains="['dna','rna','protein','protein']" \
      inference.save_rna_oneletter=True \
      inference.output_prefix='./designs/uncond_multipolymer_example1a'
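
Before launching a run, it can help to confirm that `contigmap.contigs` and `contigmap.polymer_chains` describe the same number of chains. The sketch below is only a sanity check written for this tutorial: the helper names are hypothetical, it assumes chains are separated by spaces (as in the command above) and handles only plain integer lengths, and the allowed class names are simply the three that appear in this notebook.

```python
def chain_lengths(contig):
    """Split a contig such as '15 35 75 45' into one integer length per chain.

    Only plain lengths are handled; motif ranges are outside this sketch.
    """
    return [int(token) for token in contig.split()]

def check_polymer_chains(contig, polymer_chains,
                         allowed=("dna", "rna", "protein")):
    """Pair each chain length with its polymer class, or raise ValueError."""
    lengths = chain_lengths(contig)
    if len(lengths) != len(polymer_chains):
        raise ValueError(
            f"{len(lengths)} chains in contig but "
            f"{len(polymer_chains)} polymer classes"
        )
    unknown = [c for c in polymer_chains if c not in allowed]
    if unknown:
        raise ValueError(f"unrecognized polymer classes: {unknown}")
    return list(zip(polymer_chains, lengths))

layout = check_polymer_chains(
    "15 35 75 45", ["dna", "rna", "protein", "protein"]
)
for polymer, length in layout:
    print(f"{polymer:8s} {length:4d} residues")
```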

#### **<font color='#9BB3E6'>Example 1(B): sequence-structure codesign</font>**

Typically, the diffusion outputs will have mask tokens for each polymer type in the output sequence, so our designs don't have sidechains. This is fine, as we will typically want to design the sequences afterwards using [NA-MPNN](https://www.biorxiv.org/content/10.1101/2025.10.03.679414v1) [2].

But until we assign sequence identity to each position, all we have in our NA outputs is a sugar-phosphate backbone, and honestly, nucleic acids look *very* awkward without sidechains.

To avoid this discomfort, we can perform sequence-structure codesign during the denoising trajectory using autoregressive decoding (over 40 denoising steps), as done below.

* Setting `inference.update_seq_t` to `True` allows the sequence to update over the course of the denoising trajectory, as the structure updates.
* Setting `diffuser.aa_decode_steps` to `40` spreads the sequence updates over the last 40 denoising steps.
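
As a small illustration of how these settings fit together, the sketch below assembles the codesign-related overrides as strings. The function name is hypothetical, and the check that the decode steps do not exceed `diffuser.T` is an assumption drawn from the "last 40 denoising steps" description above, not a documented RFDpoly rule.

```python
def codesign_overrides(total_steps, decode_steps):
    """Return Hydra-style override strings for sequence-structure codesign."""
    # Assumption: decoding is spread over the final steps of the trajectory,
    # so it cannot use more steps than the trajectory has.
    if not 0 < decode_steps <= total_steps:
        raise ValueError(
            f"decode_steps={decode_steps} must be between 1 and {total_steps}"
        )
    return [
        f"diffuser.T={total_steps}",
        "inference.update_seq_t=True",
        f"diffuser.aa_decode_steps={decode_steps}",
    ]

print(" ".join(codesign_overrides(50, 40)))
```

The printed overrides match the ones appended to the inference command in this example.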

In [26]:
# @markdown Run diffusion trajectories with autoregressive sequence codesign:
%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 inference.save_rna_oneletter=True \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['25 55 60 45']" \
      contigmap.polymer_chains="['dna','rna','protein','protein']" \
      inference.update_seq_t=True \
      diffuser.aa_decode_steps=40 \
      inference.output_prefix='./designs/uncond_multipolymer_example1b'

In [27]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/uncond_multipolymer_example1b",
    description="Design:"
)
display(viewer)


Wow! Now that looks *much* nicer!

<br/>
<br/>
<br/>


# **<font color='#9BB3E6' size=6.5>Section 2: Secondary structure control with basepair networks</font>**

<img src="https://github.com/RosettaCommons/RFDpoly/blob/colab_tutorial/tutorials/assets/tutorial_figs_02.png?raw=true" width=800 align="middle" style="height:240px">

In this section we will design RNA with defined secondary structures by using explicit base pair network conditioning during the diffusion trajectory.



#### **<font color='#9BB3E6'>Example 2(A): dot bracket notation (single‑chain RNA pseudoknot)</font>**
RFDpoly can condition on user-supplied secondary structure strings written in dot-bracket-style notation. Because the Hydra config system does not allow
parentheses in command‑line arguments, we replace them with other matching symbol pairs.
For simple helices we will use:

* `5` / `3` – analogous to `(` and `)` indicating a 5′→3′ paired to a 3′→5′ segment
* `.` – explicitly unpaired positions
* `?` – unspecified (default, masked)
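
To make the `5` / `3` convention concrete, here is a minimal sketch that recovers the implied base pairs with the usual stack-based dot-bracket parser. It assumes `5` and `3` nest exactly like `(` and `)`, as the analogy above suggests; the function name is hypothetical and this is not RFDpoly's own parser.

```python
def pairs_from_53_string(ss):
    """Return 1-indexed (i, j) base pairs encoded by '5' and '3' characters."""
    stack, pairs = [], []
    for pos, ch in enumerate(ss, start=1):
        if ch == "5":
            stack.append(pos)              # opening partner, wait for its '3'
        elif ch == "3":
            if not stack:
                raise ValueError(f"unmatched '3' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch not in ".?":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '5' at positions {stack}")
    return sorted(pairs)

# A three-pair hairpin with a four-residue loop.
print(pairs_from_53_string(".555....333."))
# [(2, 11), (3, 10), (4, 9)]
```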



Below, we reproduce the Pseudoknot **S2-KL1**, an Eterna puzzle from the [OpenKnot competition](https://eternagame.org/labs/13389097), by providing its secondary structure string to RFDpoly, in order to generate a 90‑nt RNA that realizes this pseudoknotted secondary structure.


The original Eterna secondary structure is:

```text
.(((((((((((((((((((..[[[[[[.)))))(((....)))(((....)))))))))))))))))((((((..]]]]]].)))))).
```

To make this compatible with Hydra, we replace:
* `(` → `5` , for *5-prime*
* `)` → `3` , for *3-prime*
* `[` → `f` , for *"from"*
* `]` → `t` , for *"to"*
* `{` → `i` , for *"iterator"*
* `}` → `j` , for *"jtorator"*
* `<` → `b` , for *"beginning"*
* `>` → `e` , for *"end"*

```text
.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.
```

We then pass this string via `scaffoldguided.target_ss_string` while designing a single 90‑nt RNA chain. See the RFDpoly documentation for a full list of the dot-bracket notation tokens for generating higher-order pseudoknots.

Additionally, we switch to the `models/RFDpoly_RNA_only_weights.pt` checkpoint for the design of RNA monomers.
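
The bracket replacement described above can be scripted rather than done by hand. The sketch below covers only the round, square, and curly bracket pairs from the table; the angle-bracket pair is left out, so extend the mapping from the RFDpoly documentation if you need it. The function name is hypothetical.

```python
# Replacement table for the first three bracket pairs listed above.
HYDRA_SAFE = str.maketrans({
    "(": "5", ")": "3",
    "[": "f", "]": "t",
    "{": "i", "}": "j",
})

def to_hydra_safe(dot_bracket):
    """Translate a dot-bracket string into the Hydra-compatible alphabet."""
    unsupported = set(dot_bracket) - set("()[]{}.?")
    if unsupported:
        raise ValueError(f"unsupported symbols: {sorted(unsupported)}")
    return dot_bracket.translate(HYDRA_SAFE)

# A toy pseudoknot: one nested helix plus one crossing helix.
print(to_hydra_safe("((..[[..))..]]"))
# 55..ff..33..tt
```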

In [28]:
# @markdown Run diffusion for an RNA pseudoknot

%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 inference.save_rna_oneletter=True \
      inference.ckpt_path="models/RFDpoly_RNA_only_weights.pt" \
      inference.num_designs=2 \
      contigmap.contigs="['90']" \
      contigmap.polymer_chains="['rna']" \
      scaffoldguided.target_ss_string=".5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333." \
      inference.update_seq_t=True diffuser.aa_decode_steps=40 \
      inference.output_prefix='./designs/rna_pseudoknot_example2a'


In [29]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/rna_pseudoknot_example2a",
    description="Design:"
)
display(viewer)



NOTE: You can insert multiple pseudoknotted motifs into specific regions of the output chain(s) using
`scaffoldguided.target_ss_string_list`. Each entry has the form:

```text
'<CHAIN_ID><START>-<END>:<secondary-structure-string>'
```

For example, to insert the same pseudoknot motif into two separate 90‑nt segments on chains `A`
and `B` you could use:

```bash
scaffoldguided.target_ss_string_list=[
  'B1-90:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.',
  'A116-205:.5555555555555555555..ffffff.33333555....333555....33333333333333333555555..tttttt.333333.'
]
```

Here the `A116-205`/`B1-90` ranges refer to **output** indices, not the input PDB.
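
Since each entry pairs an index range with a string, the end index can be derived from the string length instead of being typed by hand. This is a minimal sketch under the assumption that the range covers exactly one residue per character, which is consistent with the 90-character example above; the helper name is hypothetical.

```python
def ss_list_entry(chain, start, ss):
    """Build '<CHAIN><START>-<END>:<ss>' with END derived from len(ss)."""
    end = start + len(ss) - 1
    return f"{chain}{start}-{end}:{ss}"

motif = ".555....333."   # toy 12-nt hairpin, not the S2-KL1 pseudoknot
entries = [
    ss_list_entry("B", 1, motif),
    ss_list_entry("A", 116, motif),
]
for entry in entries:
    print(entry)
```

With a 90-character string, the same helper reproduces the `B1-90` and `A116-205` ranges shown above.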

### **<font color='#9BB3E6' size=4.5>Example 2(B): Symmetry and paired region lists (D2‑symmetric four‑strand DNA origami)</font>**

RFDpoly can enforce symmetry while also constraining base‑pairing patterns, making it a natural tool for designing **DNA polyhedra** and mini‑origami structures.

We will start with a four‑strand **D2‑symmetric** DNA cage and then show how to introduce pseudo‑cyclic symmetry within a single long RNA strand.

Here we design four 60‑nt DNA strands arranged with `d2` symmetry. The argument
`scaffoldguided.target_ss_pairs` defines which segments are paired; each pair lists two
ranges that must have equal length.
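
The equal-length requirement is easy to verify before a run. The sketch below parses each `'<range>,<range>'` entry and reports any pair whose two ranges differ in length. The helper names are hypothetical, and single-letter chain IDs are assumed, as in this tutorial.

```python
import re

RANGE = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")

def parse_range(text):
    """Parse 'A1-20' into ('A', 1, 20)."""
    match = RANGE.match(text.strip())
    if not match:
        raise ValueError(f"cannot parse range {text!r}")
    chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"range {text!r} ends before it starts")
    return chain, start, end

def unequal_pairs(pairs):
    """Return (entry, len_left, len_right) for pairs with mismatched lengths."""
    bad = []
    for entry in pairs:
        left, right = entry.split(",")
        _, s1, e1 = parse_range(left)
        _, s2, e2 = parse_range(right)
        n1, n2 = e1 - s1 + 1, e2 - s2 + 1
        if n1 != n2:
            bad.append((entry, n1, n2))
    return bad

# The paired regions used in this example's command.
d2_pairs = ['A1-20,B1-20', 'A21-40,C21-40', 'A41-60,D41-60',
            'B21-40,D21-40', 'B41-60,C41-60', 'C1-20,D1-20']
print(unequal_pairs(d2_pairs))   # an empty list means every pair matches
```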

In [30]:
# @markdown Run diffusion trajectories for a symmetric DNA polyhedron
%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['60 60 60 60']" \
      contigmap.polymer_chains="['dna','dna','dna','dna']" \
      scaffoldguided.target_ss_pairs="['A1-20,B1-20','A21-40,C21-40','A41-60,D41-60','B21-40,D21-40','B41-60,C41-60','C1-20,D1-20']" \
      inference.symmetry='d2' \
      inference.update_seq_t=True diffuser.aa_decode_steps=40 \
      inference.output_prefix='./designs/dna_origami_sympoly_example2b'


In [31]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/dna_origami_sympoly_example2b",
    description="Design:"
)
display(viewer)


#### **<font color='#9BB3E6'>Example 2(C): Pseudo‑cyclic symmetry in a single RNA strand</font>**

As a more advanced example, we can reproduce the designs from Figure 3 of ***Favor et al***, generating a **pseudo‑cyclic symmetric** RNA in which repeated helical motifs are arranged with approximate C2 symmetry along a single chain (372 nt in the command below, as set by `contigmap.contigs`). This is controlled via the following arguments:

* `inference.pseudo_symmetry` – the target point‑group (e.g. `c2`)
* `inference.n_repeats` – how many repeats to tile around the pseudo‑symmetry
* `scaffoldguided.target_ss_pairs` or `scaffoldguided.target_ss_string` – the base pair pattern for the *whole structure*, which controls contacts across different repeats and sub-symmetries (it can even control topology between multiple chains in a symmetric complex). The command below uses the string form.

In [32]:
# @markdown Run diffusion trajectories for a pseudo‑cyclic symmetric RNA
%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 inference.save_rna_oneletter=True \
      inference.ckpt_path="models/RFDpoly_RNA_only_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['372']" \
      contigmap.polymer_chains="['rna']" \
      inference.pseudo_symmetry='c2' \
      inference.n_repeats=2 \
      scaffoldguided.target_ss_string=".5555555555555555555555555555....33335555555fffffff3333333555555555....3333333333333333333333333..55555555555.5555555555....3333333333.555fffffff333.55555....33333.33333333333..5555555....333333355555555555555555555....33335555555ttttttt3333333555555555....3333333333333333333333333..5555555555555555555555....3333333333.555ttttttt333.55555....33333333333333333..33333333." \
      inference.update_seq_t=True diffuser.aa_decode_steps=40 \
      inference.write_trajectory=True \
      inference.output_prefix='./designs/rna_pseudoC2_example2c'


In [33]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/rna_pseudoC2_example2c",
    description="Design:"
)
display(viewer)





#### **<font color='#9BB3E6'>Example 2(D): Strand orientation lists (Single‑chain RNA triple helix)</font>**


At this point you're probably wondering why we don't need to specify the orientation of each pair of DNA strands, and that's totally reasonable! The default behavior when specifying paired strand regions is to assume anti-parallel orientation, but creative freedom is lit, so we can totally pair strands in whatever orientation we want.

To explore this further, we will design an **RNA triple helix** by specifying both which residues are base‑paired and
whether the paired segments are parallel (`P`) or antiparallel (`A`). By default all pairs are treated as antiparallel;
the `scaffoldguided.target_ss_pair_ori` argument lets you override this on a per‑group basis.
<!--
We will design a 75‑nt RNA chain with two paired regions specified:


And then some overlapping paired regions, one parallel, one antiparallel:
* `A5–20` paired to `A55–70` (parallel orientation)
* `A55–70` paired to `A30–45` (antiparallel orientation)

using the following arguments:
```
scaffoldguided.target_ss_pairs="['A5-20,A55-70','A55-70,A30-45']"
scaffoldguided.target_ss_pair_ori="['P','A']"
```
(the length of the list in `target_ss_pair_ori` must match the length of paired strands in `target_ss_pairs`).

This produces an RNA triple helix, stabilized by three strands coming into contact along the `A55-70` region. -->


We will design a 76‑nt RNA chain with two standard, separate paired regions:
* `A2-6` paired to `A50-54` (antiparallel orientation)
* `A18-25` paired to `A29-36` (antiparallel orientation)

And then two overlapping paired regions that share `A66-76`, one parallel and one antiparallel:
* `A7-17` paired to `A66-76` (parallel orientation)
* `A37-47` paired to `A66-76` (antiparallel orientation)


This produces an RNA triple helix, stabilized by multiple contacts along the `A66-76` region.
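
To see what each orientation flag implies at the residue level, the sketch below lists the residue-to-residue contacts for every paired region. It rests on two assumptions that are not taken from the RFDpoly source: antiparallel (`A`) pairs the first residue of one range with the last residue of the other, while parallel (`P`) pairs first with first; and the orientation list has one entry per paired region, as in the command for this example. Chain letters are ignored because everything here is on chain `A`.

```python
def implied_pairs(pair_entry, orientation):
    """List (i, j) residue-number pairs for one 'A2-6,A50-54'-style entry."""
    left, right = pair_entry.split(",")
    # Strip the single-letter chain ID, then split 'start-end'.
    s1, e1 = (int(x) for x in left.strip()[1:].split("-"))
    s2, e2 = (int(x) for x in right.strip()[1:].split("-"))
    if e1 - s1 != e2 - s2:
        raise ValueError(f"ranges in {pair_entry!r} differ in length")
    first = range(s1, e1 + 1)
    if orientation == "A":        # antiparallel: walk the partner backwards
        second = range(e2, s2 - 1, -1)
    elif orientation == "P":      # parallel: walk the partner forwards
        second = range(s2, e2 + 1)
    else:
        raise ValueError(f"orientation must be 'A' or 'P', got {orientation!r}")
    return list(zip(first, second))

pairs = ['A2-6,A50-54', 'A18-25,A29-36', 'A7-17,A66-76', 'A37-47,A66-76']
oris = ['A', 'A', 'P', 'A']
if len(pairs) != len(oris):
    raise ValueError("need one orientation per paired region")

for entry, ori in zip(pairs, oris):
    contacts = implied_pairs(entry, ori)
    print(f"{entry:15s} {ori}  first {contacts[0]}  last {contacts[-1]}")
```

Under these assumptions, every residue in `66-76` appears in two contacts, one from the parallel strand and one from the antiparallel strand, which is the triple-helix arrangement described above.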

In [34]:
%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 inference.save_rna_oneletter=True \
      inference.ckpt_path="models/RFDpoly_RNA_only_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['76']" \
      contigmap.polymer_chains="['rna']" \
      scaffoldguided.target_ss_pairs="['A2-6,A50-54','A18-25,A29-36','A7-17,A66-76','A37-47,A66-76']" \
      scaffoldguided.target_ss_pair_ori="['A','A','P','A']" \
      inference.update_seq_t=True diffuser.aa_decode_steps=30 \
      inference.output_prefix='./designs/rna_triple_helix_example2d'




In [45]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/rna_triple_helix_example2d",
    description="Design:"
)
display(viewer)



#### **<font color='#9BB3E6'>Example 2(E): multi-base contacts and explicitly unpaired loop lists (three-hairpin contact)</font>**

Next, we will create three RNA hairpins, and have them fold towards each other and form loop-contacts. We can “staple” distal loops in RNA pseudoknots together by specifying regions of multi-base contacts, using the `scaffoldguided.force_multi_contacts` argument.

This is important because secondary structure strings can only encode contacts between two bases at a time, and we want to go above and beyond that.

As in the previous examples, we can start by defining three standard helical regions as follows:
```
scaffoldguided.target_ss_pairs="['A5-20,A30-45','A50-65,A75-90','A95-110,A120-135']"
```

But this time, we will use `force_multi_contacts` and `force_loops_list` to pin the generated structure in the desired spatial configuration, rather than letting the helical regions flop around freely.

To hold the hairpin-loops together, we will define three sets of three-base contacts (but you can specify arbitrarily large sets of base contacts using this system):
```
scaffoldguided.force_multi_contacts="['A24,A25,A71','A69,A70,A116','A114,A115,A26']"
```

While we're at it, we can force loop placement in specific regions (for us, `A47-48` and `A92-93`) using `scaffoldguided.force_loops_list`:
```
scaffoldguided.force_loops_list="['A47-48','A92-93']"
```
This introduces two flexible unpaired loops between our helical domains.
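
With three argument lists referring to the same chain, it is easy to place a contact or a forced loop on top of a helix by mistake. The sketch below flags any residue requested in `force_multi_contacts` or `force_loops_list` that also falls inside a `target_ss_pairs` range. Treating such overlaps as errors is an assumption made for this example (where contacts sit in hairpin loops), not a documented RFDpoly rule; the helper names are hypothetical, and chain letters are ignored because only chain `A` is used.

```python
def expand(token):
    """'A47-48' -> [47, 48]; 'A24' -> [24] (single-letter chain ID assumed)."""
    body = token.strip()[1:]
    if "-" in body:
        start, end = (int(x) for x in body.split("-"))
        return list(range(start, end + 1))
    return [int(body)]

def clashes_with_helices(target_ss_pairs, multi_contacts, loops):
    """Return residues requested as contacts or loops that lie in a helix."""
    paired = set()
    for entry in target_ss_pairs:
        for rng in entry.split(","):
            paired.update(expand(rng))
    requested = set()
    for group in multi_contacts:
        for token in group.split(","):
            requested.update(expand(token))
    for token in loops:
        requested.update(expand(token))
    return sorted(requested & paired)

helices = ['A5-20,A30-45', 'A50-65,A75-90', 'A95-110,A120-135']
contacts = ['A24,A25,A71', 'A69,A70,A116', 'A114,A115,A26']
loops = ['A47-48', 'A92-93']
print(clashes_with_helices(helices, contacts, loops))   # [] means no overlap
```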

In [36]:
# @markdown Run diffusion trajectories for triple-loop contacts:
%%capture

!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 inference.save_rna_oneletter=True \
      inference.ckpt_path="models/RFDpoly_RNA_only_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['139']" \
      contigmap.polymer_chains="['rna']" \
      scaffoldguided.target_ss_pairs="['A5-20,A30-45','A50-65,A75-90','A95-110,A120-135']" \
      scaffoldguided.force_multi_contacts="['A24,A25,A71','A69,A70,A116','A114,A115,A26']" \
      scaffoldguided.force_loops_list="['A47-48','A92-93']" \
      inference.update_seq_t=True diffuser.aa_decode_steps=30 \
      inference.write_trajectory=True \
      inference.output_prefix='./designs/loop_contact_example2e'


In [37]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/loop_contact_example2e",
    description="Design:"
)
display(viewer)



<br/>
<br/>
<br/>


# **<font color='#9BB3E6' size=6.5>Section 3: Motif scaffolding of DNA‑binding proteins</font>**

RFDpoly can *scaffold* pre-existing structural motifs (e.g., DNA‑binding protein domains bound to short
DNA duplexes) and inpaint missing segments to create larger assemblies. The inputs for this section will be previously designed DNA binding proteins (*DBPs*) [3].

Two key arguments are:

* `contigmap.contigs` – describes how input motifs are stitched together in a given chain, following the same syntax as all previous versions of RFdiffusion.
* `inference.ij_visible` – groups motifs whose **relative distances and orientations** should remain fixed
  during diffusion (e.g., keep each set of bound multimer chains locked in contact at their interface, while allowing additional separate motifs to move freely during the denoising process). Credit to David Juergens for coming up with this super intuitive system, which is described in more detail [here](https://github.com/baker-laboratory/CA_RFDiffusion) [4].

### Input motif inspection:
Before we begin, let's consider the structures from our input pdb files, which contain DNA-binding proteins (DBPs) from [**Glasscock et al**](https://www.nature.com/articles/s41594-025-01669-4).

The cell below allows us to look at them before running motif-scaffolding diffusion:


In [38]:
# @markdown Visualize inputs

viewer = make_structure_viewer(
    initial_prefix="rf_diffusion/test_data/DBP35opt_DBP48",
    description="Design:"
)
display(viewer)



We see two motif sets, each with three chains (one protein chain, two DNA chains), corresponding to DBP35 and DBP48 with their target DNA sequences.  
In our motif scaffolding tasks, we want to preserve the position of each DBP relative to its target DNA (fixed intra-set motif geometry), while allowing the relative positions between motif sets to vary during diffusion (free inter-set motif geometry).

In previous RFdiffusion implementations, diffused structure was generated around fixed motif coordinates, so the placement of motifs in the input pdb file directly matched the placement of motifs in the output pdb file.

In the following examples, we will see how the motif sets move freely relative to each other, as needed for structure generation, while the geometry within each motif set is preserved.

### **<font color='#9BB3E6'>Example 3(A): Connecting two DNA chains, and keeping bound DBPs locked to their correct binding sites.</font>**

In this example, we start from an input PDB containing two DNA‑binding proteins, each bound to a short DNA duplex. We then:

* keep the protein–DNA contacts rigid within each motif group using the `ij_visible` argument
* inpaint the missing DNA between them (inserting a 4nt spacer between binding sites)
* allow the two groups to move relative to each other
* add extra upstream and downstream DNA to create a longer helix model

#### Control of motif groups:
The `inference.ij_visible` argument controls which motifs listed in the contigs have their relative orientations locked during inference.
In the following contact map, we assign a lowercase letter to each motif from the input pdb, based on the order in which the motifs occur in the contigs (not to be confused with the uppercase letters, which represent chains in the input pdb):

<img src="https://github.com/RosettaCommons/RFDpoly/blob/colab_tutorial/tutorials/assets/tutorial_figs_04.png?raw=true" height="240" align="middle" style="height:240px">


To avoid confusion with the contigs, which reference chain letters in the input pdb (typically uppercase), `ij_visible` uses lowercase letters: each motif is assigned the next letter of the alphabet in the order it appears as you read through the contig line's specification of the output structure's topology.
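As a rough illustration of this lettering rule, the sketch below walks a contig string and assigns lowercase letters to motif segments in order of appearance. `label_motifs` is a hypothetical helper written for this tutorial, not part of RFDpoly; it assumes single-letter input chain IDs and motif segments written as `<chain><start>-<end>`, and it uses the contig string from Example 3(A).

```python
import re
import string


def label_motifs(contig: str) -> dict[str, str]:
    """Map lowercase motif letters to input-PDB segments, in contig order.

    Hypothetical helper for illustration only (not an RFDpoly function).
    """
    labels = {}
    letters = iter(string.ascii_lowercase)
    for chain in contig.split():          # output chains are separated by spaces
        for segment in chain.split(","):  # segments within a chain by commas
            # A motif segment starts with an input chain letter, e.g. "B6-11".
            # Purely numeric segments (e.g. "10") are newly diffused residues.
            if re.fullmatch(r"[A-Za-z]\d+-\d+", segment):
                labels[next(letters)] = segment
    return labels


contig_3a = "A1-63 D1-65 10,B6-11,8,F2-8,10 10,E7-13,8,C5-10,10"
motifs = label_motifs(contig_3a)
for letter, segment in motifs.items():
    print(letter, "->", segment)
```

Read this way, motifs `a`, `c`, `f` come from input chains A, B, C, and motifs `b`, `d`, `e` come from input chains D, F, E.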

In the motif-group strings, independent motif sets are separated by a `'-'` symbol, and each set is defined by its single-letter motif fragment identifiers. If the motif-group string is `'acf-bde'`, that means:
* motifs a, c, f have their pairwise distance/angle parameters visible to each other (off-diagonal entries unmasked), and likewise motifs b, d, e among themselves.
* between these two groups, the off-diagonal distance/angle parameters are masked out (or "invisible").
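One way to picture this is as a block-structured visibility matrix over motifs. The sketch below builds such a matrix from a motif-group string; `visibility_mask` is a hypothetical helper, not RFDpoly's internal implementation, and it assumes that every motif is visible to itself.

```python
def visibility_mask(group_string: str, n_motifs: int) -> list[list[bool]]:
    """Return an n x n matrix; entry [i][j] is True when motifs i and j share a group.

    Hypothetical illustration of the `ij_visible` grouping, not RFDpoly code.
    """
    group_of = {}
    for group_index, group in enumerate(group_string.split("-")):
        for letter in group:
            group_of[ord(letter) - ord("a")] = group_index

    mask = [[False] * n_motifs for _ in range(n_motifs)]
    for i in range(n_motifs):
        for j in range(n_motifs):
            same_group = i in group_of and group_of.get(i) == group_of.get(j)
            # Assumption: the diagonal is always visible (a motif sees itself).
            mask[i][j] = (i == j) or same_group
    return mask


mask = visibility_mask("acf-bde", n_motifs=6)
for letter, row in zip("abcdef", mask):
    print(letter, "".join("#" if visible else "." for visible in row))
# a #.#..#
# b .#.##.
# c #.#..#
# d .#.##.
# e .#.##.
# f #.#..#
```

Each `#` marks a motif pair whose relative geometry is kept fixed; each `.` marks a pair that is free to move relative to each other.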


In [39]:
# @title Run diffusion trajectories for DNA–protein motif scaffolding (two DNA chains)
%%capture

!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['A1-63 D1-65 10,B6-11,8,F2-8,10 10,E7-13,8,C5-10,10']" \
      contigmap.polymer_chains="['protein','protein','dna','dna']" \
      inference.ij_visible='acf-bde' \
      inference.input_pdb='rf_diffusion/test_data/DBP35opt_DBP48.pdb' \
      inference.update_seq_t=True diffuser.aa_decode_steps=20 \
      inference.output_prefix='./designs/dna_protein_scaffolding_example3a'


In [46]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/dna_protein_scaffolding_example3a",
    description="Design:"
)
display(viewer)



Beautiful! We have placed our DBPs along a unified double helix. Each input DBP is locked in its corresponding binding site, as intended: the `ij_visible` system groups the motif sets, while masking the relative orientation between sets lets their global placement differ from that in the input pdb file.

Also, we generated some new DNA domains upstream and downstream of the binding sites. Check out the different diffusion outputs and see how running multiple diffusion trajectories lets us model subtle conformational variation in the DNA structure.
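To step through the outputs outside the viewer, a small sketch like the following can list the design files. It assumes that each design is written as a `.pdb` file whose name begins with `inference.output_prefix`; `list_designs` is a hypothetical helper, not part of RFDpoly.

```python
import glob
import os


def list_designs(prefix: str) -> list[str]:
    """Return sorted PDB paths whose names start with the given output prefix.

    Assumption: designs are saved as '<prefix>*.pdb'.
    """
    # glob.escape protects any special characters in the prefix itself.
    return sorted(glob.glob(glob.escape(prefix) + "*.pdb"))


designs = list_designs("./designs/dna_protein_scaffolding_example3a")
for path in designs:
    print(os.path.basename(path), os.path.getsize(path), "bytes")
```

If the list comes back empty, the diffusion cell has probably not finished, or the prefix does not match the one passed to `inference.output_prefix`.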
<br/>
<br/>
<br/>


### **<font color='#9BB3E6'>Example 3(B): Inpainting two DNA chains and fusing two DBPs into a single protein chain (triple chain fusion)</font>**

We can also **merge the proteins into a single chain** in the contig description while still using the same underlying PDB.

This setup is similar to the example above, but now we will connect the two protein domains into a single chain by diffusing a rigid linker:

<img src="https://github.com/RosettaCommons/RFDpoly/blob/colab_tutorial/tutorials/assets/tutorial_figs_04.png?raw=true" height="240" align="middle" style="height:240px">



In [41]:
# @markdown Run diffusion trajectories for DNA–protein motif scaffolding (merged protein chain)
%%capture
!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.num_designs=3 \
      contigmap.contigs="['A14-50,90,D14-51 B1-12,4,F1-14 E1-14,4,C4-15']" \
      contigmap.polymer_chains="['protein','dna','dna']" \
      inference.ij_visible='acf-bde' \
      inference.input_pdb='rf_diffusion/test_data/DBP35opt_DBP48.pdb' \
      inference.update_seq_t=True diffuser.aa_decode_steps=35 \
      inference.write_trajectory=True \
      inference.output_prefix='./designs/dna_protein_scaffolding_example3b'


In [48]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/dna_protein_scaffolding_example3b",
    description="Design:"
)
display(viewer)



### **<font color='#9BB3E6'>Example 3(C): De novo Holliday junction design</font>**

Finally, we combine symmetry, motif scaffolding, and base‑pair constraints to design
**Holliday‑junction–like DNA–protein complexes**.

The idea is to:

1. Start from a protein–DNA scaffold containing two DNA‑binding domains.
2. Duplicate the scaffold with **C2 symmetry**.
3. Use strand exchange and base‑pair constraints to route the DNA strands through a Holliday
   junction between the two symmetric copies.

This is captured in Example 10 of the manual, and follows **Figure 5** of *Favor et al*.

### Symmetric motif scaffolding:
By inserting protein and DNA motifs into contig chains, and locking motif groups in place as follows
```
contigmap.contigs="['A14-50,90,D14-51 15,B6-12,4,F1-8,30,E7-14,4,C4-10,15 A14-50,90,D14-51 15,B6-12,4,F1-8,30,E7-14,4,C4-10,15']"
contigmap.polymer_chains="['protein','dna','protein','dna']"
inference.ij_visible='acl-bde-gfi-hkj'
```
we can preserve existing DBP-helix interfaces, and use them to scaffold the domains about a strand-exchanging junction.

### Strand-exchange:
Note: the chain-residue specifications in `scaffoldguided.target_ss_pairs` refer to positions in the *output structures*, not the input pdb file, even though the chain-index notation resembles the one used to reference input structure regions in the `contigmap.contigs` argument.
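A quick sanity check makes the output-numbering convention concrete. The sketch below adds up the length of the DNA chain contig used in this example and confirms that every range in `target_ss_pairs` fits inside the output DNA chains. The helpers are hypothetical (not RFDpoly functions), and the check assumes that output residues are numbered from 1 within each chain, that output chains are lettered A–D in contig order (so B and D are the DNA chains), and that all diffused segments have a fixed integer length.

```python
import re


def output_chain_length(chain_contig: str) -> int:
    """Sum motif and diffused segment lengths for one output chain."""
    total = 0
    for segment in chain_contig.split(","):
        motif = re.fullmatch(r"[A-Za-z](\d+)-(\d+)", segment)
        if motif:
            start, end = int(motif.group(1)), int(motif.group(2))
            total += end - start + 1  # motif ranges are inclusive
        elif segment.isdigit():
            total += int(segment)     # fixed-length diffused segment
        else:
            raise ValueError(f"unsupported segment: {segment!r}")
    return total


def parse_output_range(spec: str) -> tuple[str, int, int]:
    """Split e.g. 'B76-98' into ('B', 76, 98)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)-(\d+)", spec)
    if match is None:
        raise ValueError(f"unsupported range: {spec!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


dna_chain = "15,B6-12,4,F1-8,30,E7-14,4,C4-10,15"
dna_length = output_chain_length(dna_chain)
print("output DNA chain length:", dna_length)

pairs = ["B1-23,D76-98", "B24-46,B55-75", "B76-98,D1-23", "D24-46,D55-75"]
for pair in pairs:
    for spec in pair.split(","):
        chain, start, end = parse_output_range(spec)
        # Chains B and D here are OUTPUT DNA chains, not input-PDB chains.
        assert chain in "BD" and 1 <= start <= end <= dna_length, spec
print("all paired regions fall inside the output DNA chains")
```

Note that `B76-98` could not refer to the input pdb's chain B, whose motif fragment in the contig only spans residues 6–12; the range only makes sense in the 98-residue output chain.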

### Guiding potentials:
To promote rigidity, as described in *Favor et al*, we apply attractive auxiliary potentials between the newly generated protein domains (in output chains A and C), using the following arguments:
```
potentials.guiding_potentials="['type:olig_contacts,weight_intra:0.0,weight_inter:1']"
potentials.guide_scale=0.8
potentials.guide_decay="cubic"
potentials.olig_inter_all=False
potentials.olig_intra_all=False
potentials.olig_custom_contact='"A&C"'
```
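To get a feel for what a cubic decay does to the guide scale over the trajectory, here is a minimal sketch. It assumes that RFDpoly inherits the schedule form used in the original RFdiffusion code, where the scale at timestep `t` is `guide_scale * (t / T) ** 3`; this has not been checked against the RFDpoly source, and `guide_scale_at` is a hypothetical helper.

```python
def guide_scale_at(t: int, T: int, guide_scale: float, decay: str = "cubic") -> float:
    """Effective potential scale at timestep t (t runs from T down to 1).

    Assumed schedule forms; not taken from the RFDpoly source.
    """
    schedules = {
        "constant": lambda frac: 1.0,
        "linear": lambda frac: frac,
        "quadratic": lambda frac: frac ** 2,
        "cubic": lambda frac: frac ** 3,
    }
    return guide_scale * schedules[decay](t / T)


# Values matching this example: diffuser.T=50, potentials.guide_scale=0.8
for t in (50, 25, 10, 1):
    print(f"t={t:2d}  scale={guide_scale_at(t, T=50, guide_scale=0.8):.6f}")
```

Under this assumption the potential acts at full strength (0.8) at the start of denoising, drops to one eighth of that (0.1) at the halfway step, and is nearly switched off in the final steps, so it shapes the global arrangement early without distorting the final fine-grained structure.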

Run the trajectories and see what the outputs look like.



In [43]:
# @markdown Run diffusion trajectories for a Holliday‑junction–style design
%%capture

!singularity run --nv -B /usr/lib64-nvidia:/usr/lib64-nvidia --env LD_LIBRARY_PATH=/usr/lib64-nvidia:$LD_LIBRARY_PATH rf_se3_diffusion.sif -u rf_diffusion/run_inference.py --config-name=multi_polymer diffuser.T=50 \
      inference.ckpt_path="models/RFDpoly_general_weights.pt" \
      inference.symmetry='c2' \
      inference.num_designs=3 \
      contigmap.contigs="['A1-61,80,D14-65 15,B6-12,4,F1-8,30,E7-14,4,C4-10,15 A1-61,80,D14-65 15,B6-12,4,F1-8,30,E7-14,4,C4-10,15']" \
      contigmap.polymer_chains="['protein','dna','protein','dna']" \
      inference.ij_visible='acl-bde-gfi-hkj' \
      potentials.guiding_potentials="['type:olig_contacts,weight_intra:0.0,weight_inter:1']" \
      potentials.guide_scale=0.8 \
      potentials.guide_decay="cubic" \
      potentials.olig_inter_all=False \
      potentials.olig_intra_all=False \
      potentials.olig_custom_contact='"A&C"' \
      scaffoldguided.target_ss_pairs="['B1-23,D76-98','B24-46,B55-75','B76-98,D1-23','D24-46,D55-75']" \
      inference.update_seq_t=True diffuser.aa_decode_steps=30 \
      inference.write_trajectory=True \
      inference.input_pdb='rf_diffusion/test_data/DBP35opt_DBP48.pdb' \
      inference.output_prefix='./designs/holliday_junction_example3c'



The combined use of `target_ss_pairs` and `ij_visible` here may be a bit confusing at first, but using these controls together is a powerful way to specify complex topologies.

**Deriving why these specific settings result in the target topology is left as an exercise for the reader**.

HINT: (1) Draw out all the chains, and how the DNA chains connect in the regions specified by `target_ss_pairs`. (2) Label the placement of motif fragments in each chain, making note of where fragments 'a'...'l' end up. (3) Consider how anything grouped by `ij_visible` must be locked together in a shared interface.

In [47]:
# @markdown Visualize outputs

viewer = make_structure_viewer(
    initial_prefix="./designs/holliday_junction_example3c",
    description="Design:"
)
display(viewer)



# **<font color='#9BB3E6' size=6.5>References</font>**


1.   Favor, Andrew, Riley Quijano, Elizaveta Chernova, Andrew Kubaney, Connor Weidle, Morgan A. Esler, Lilian McHugh et al. "De novo design of RNA and nucleoprotein complexes." bioRxiv (2025): 2025-10.

2.   Kubaney, Andrew, Andrew Favor, Lilian McHugh, Raktim Mitra, Robert Pecoraro, Justas Dauparas, Cameron Glasscock, and David Baker. "RNA sequence design and protein–DNA specificity prediction with NA-MPNN." bioRxiv (2025): 2025-10.

3.   Glasscock, Cameron J., Robert J. Pecoraro, Ryan McHugh, Lindsey A. Doyle, Wei Chen, Olivier Boivin, Beau Lonnquist et al. "Computational design of sequence-specific DNA-binding proteins." Nature Structural & Molecular Biology (2025): 1-10.

4.   Lauko, Anna, Samuel J. Pellock, Kiera H. Sumida, Ivan Anishchenko, David Juergens, Woody Ahern, Jihun Jeung et al. "Computational design of serine hydrolases." Science 388, no. 6744 (2025): eadu2454.
