<a href="https://colab.research.google.com/github/VKleinSousa/RBPseg/blob/main/rbpseg_merge_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RBPseg-merge

Welcome! This Google Colab notebook helps you **merge fractionated structure predictions** into a single cohesive model — particularly useful for large, modular proteins like **phage tail fibers**.

---

## 🧪 Use Case Example

You may want to use this if:

1. You predicted your structure in **fractions** using a server like AlphaFold.
2. Your PDB files are named like:

    ```
    protein1_seq_0_ranked_0.pdb  
    protein1_seq_0_ranked_1.pdb  
    ...
    protein1_seq_n_ranked_m.pdb
    ```

3. You now want to **stitch them back together** into a full-length model.
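
Before uploading, it can help to confirm that every file follows the naming pattern in step 2. The following is a minimal sketch and not part of RBPseg: the regular expression and the helper name `group_by_fraction` are assumptions inferred from the `<name>_seq_<n>_ranked_<m>.pdb` examples above.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the example names above:
# <name>_seq_<fraction index>_ranked_<model rank>.pdb
FRACTION_RE = re.compile(r"^(?P<name>.+)_seq_(?P<seq>\d+)_ranked_(?P<rank>\d+)\.pdb$")

def group_by_fraction(filenames):
    """Return ({fraction index: [names sorted by rank]}, [names that do not match])."""
    groups = defaultdict(list)
    unmatched = []
    for fname in filenames:
        match = FRACTION_RE.match(fname)
        if match is None:
            unmatched.append(fname)
            continue
        groups[int(match["seq"])].append((int(match["rank"]), fname))
    # Sort fractions by index and models within a fraction by rank.
    ordered = {seq: [f for _, f in sorted(items)] for seq, items in sorted(groups.items())}
    return ordered, unmatched

groups, unmatched = group_by_fraction([
    "protein1_seq_1_ranked_0.pdb",
    "protein1_seq_0_ranked_1.pdb",
    "protein1_seq_0_ranked_0.pdb",
    "notes.txt",
])
print(groups)
print(unmatched)
```

Any name that lands in `unmatched` probably needs renaming before the merge step.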

---

## 📦 What You Need

- ✅ Your **fraction `.pdb` files**
- ✅ The **length of the overlapping regions** between adjacent fractions:
    - You can generate these using `rbpseg-sdp`
    - Or estimate/define them manually if known

---

## 📝 Citation

If you use this tool in your research, please cite:



```
@article{klein2025rbpseg,
  title={RBPseg: Toward a complete phage tail fiber structure atlas},
  author={Klein-Sousa, Victor and Roa-Eguiara, Aritz and Kielkopf, Claudia S and Sofos, Nicholas and Taylor, Nicholas MI},
  journal={Science Advances},
  volume={11},
  number={23},
  pages={eadv0870},
  year={2025},
  publisher={American Association for the Advancement of Science}
}

```

## Contact

[X (Twitter)](https://x.com/vkleinsousa)

[Bluesky](https://bsky.app/profile/vkleinsousa.bsky.social)

e-mail: victor.klein@cpr.ku.dk



## 💽 Installation

In [1]:
# @title
!pip install -q condacolab
import condacolab
condacolab.install()

!conda install -c conda-forge -c bioconda foldseek pdbfixer openmm usalign -y

!pip install git+https://github.com/VKleinSousa/RBPseg.git@rbpseg-1.1.1-test



✨🍰✨ Everything looks OK!
Channels:
 - conda-forge
 - bioconda
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 24.11.3
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c conda-forge conda



# All requested packages already installed.

Collecting git+https://github.com/VKleinSousa/RBPseg.git@rbpseg-1.1.1-test
  Cloning https://github.com/VKleinSousa/RBPseg.git (to revision rbpseg-1.1.1-test) to /tmp/pip-req-build-aidse2cu
  Running command git clone --filter=blob:none --quiet https://github.com/VKleinSousa/RBPseg.git /tmp/pip-req-build-aidse2cu
  Running command git checkout -b rbpseg-1.1.1-test --track origin/rbpseg-1.1.1-test
  Switched to a new branch 'rbpseg-1.1.1-test'
  Branch 'rbpseg-1.1.1-test' set up to track remote branch 'rbpseg-1.1.1-test' from 'origin'.
  Resolved https://github.com/VKleinSousa/RBPseg.gi

**Verify installation**

In [2]:
# @title

import os
os.environ['MPLBACKEND'] = 'Agg'
!rbpseg-merge -h

usage: rbpseg-merge [-h] -d DIRECTORY [-o OVERHANG] [-of OVERHANG_FILE]
                    [-f FUNCTION] [-n SAVE_NAME] [-c CHAIN_MODE] [-r] [-b]

___________________________________________
            RBPseg-merge v1.1.1
___________________________________________
RBPseg-merge module performs superimposition and merging of AF2 fractions.

Remeber to prepare your directory before using this. If your files were generated by AF2 or AF3 you can use the script: rbpseg/merge/prepare_files_for_merge.py 

options:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Path to the directory containing PDB files
  -o OVERHANG, --overhang OVERHANG
                        Overhang size (default: 50)
  -of OVERHANG_FILE, --overhang_file OVERHANG_FILE
                        Overhangs file
  -f FUNCTION, --function FUNCTION
                        Superimpose function: 0 to global, 1 to local
                        (default: 1)
  -n 
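
In addition to printing the help text, the installation can be checked by looking for the command-line tools on `PATH`. This is a small sketch using only the standard library; the helper name `find_missing_tools` is hypothetical, and the list of executable names is an assumption that may need adjusting for your environment.

```python
import shutil

def find_missing_tools(tool_names):
    """Return the names from tool_names that are not found on PATH."""
    return [name for name in tool_names if shutil.which(name) is None]

# Executable names are assumptions; adjust them to match your environment.
missing = find_missing_tools(["rbpseg-merge", "foldseek"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found.")
```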

## 📁 Uploading files

In [11]:
# @title
from IPython.display import display, Markdown
import ipywidgets as widgets
import pandas as pd
import os
import shutil
import subprocess

# ---------------------- Widgets ----------------------

segment_predictions_widget = widgets.FileUpload(multiple=True)
overlap_files_widget = widgets.FileUpload(multiple=False)

overlap_lengths_widget = widgets.Text(
    placeholder='e.g. 50 32 100',
    description='Lengths:',
    style={'description_width': 'initial'}
)

upload_button = widgets.Button(
    description='Upload Files',
    button_style='primary'
)

run_button = widgets.Button(
    description='Run rbpseg-merge',
    button_style='success'
)

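clear_button = widgets.Button(
    description='Clear',
    button_style='danger'
)

# Option widgets read by on_run_clicked and reset by on_clear_clicked.
# Defaults mirror the values restored in on_clear_clicked; the widget
# types and labels are assumptions.
function_widget = widgets.Dropdown(
    options=[('global', 0), ('local', 1)], value=1, description='Superimpose:'
)
save_name_widget = widgets.Text(value='merged.pdb', description='Save name:')
chain_mode_widget = widgets.IntText(value=0, description='Chain mode:')
relax_widget = widgets.Checkbox(value=False, description='Relax (-r)')
bfactor_widget = widgets.Checkbox(value=False, description='-b flag')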

# ---------------------- CSV Creation ----------------------

def create_overlap_csv_from_input(input_string, output_path="overlaps.csv"):
    lengths = [int(x) for x in input_string.strip().split()]
    segments = [f"Segment_{i+1}" for i in range(len(lengths))]
    df = pd.DataFrame({'Segment': segments, 'Length': lengths})
    df.to_csv(output_path, index=False)
    return output_path

# ---------------------- Run rbpseg-merge ----------------------

def run_rbpseg_merge(directory, overlaps_csv, function, save_name, chain_mode, relax, bfactor):
    cmd = [
        "rbpseg-merge",
        "-d", directory,
        "-of", overlaps_csv,
        "-f", str(function),
        "-n", save_name,
        "-c", str(chain_mode)
    ]
    if relax:
        cmd.append("-r")
    if bfactor:
        cmd.append("-b")

    print("🧪 Running command:\n", " ".join(cmd))

    try:
        output = subprocess.check_output(cmd, stderr=subprocess.STDOUT, text=True)
        print("✅ Done.")
        print(output)
    except subprocess.CalledProcessError as e:
        print("❌ Error running rbpseg-merge:")
        print(e.output)

# ---------------------- Event Handlers ----------------------

def on_upload_clicked(b):
    # Save uploaded prediction folder
    pred_dir = "/content/segment_predictions"
    os.makedirs(pred_dir, exist_ok=True)
    print('Uploading predictions...')
    for filename, file_info in segment_predictions_widget.value.items():
        with open(os.path.join(pred_dir, filename), "wb") as fp:
            fp.write(file_info['content'])
    print('✅ Predictions uploaded.')

    # Handle overlaps file
    if overlap_files_widget.value:
        # Save uploaded overlaps file
        overlap_file_info = list(overlap_files_widget.value.values())[0]
        overlaps_csv = os.path.join("/content", overlap_file_info['metadata']['name'])
        with open(overlaps_csv, "wb") as fp:
            fp.write(overlap_file_info['content'])
        print(f"Using uploaded overlaps file: {overlaps_csv}")
    else:
        # Create overlaps.csv from manual input
        overlaps_csv = create_overlap_csv_from_input(overlap_lengths_widget.value)
        print(f"Created overlaps file from manual input: {overlaps_csv}")

def on_run_clicked(b):
    pred_dir = "/content/segment_predictions"
    if overlap_files_widget.value:
        overlap_file_info = list(overlap_files_widget.value.values())[0]
        overlaps_csv = os.path.join("/content", overlap_file_info['metadata']['name'])
    else:
        overlaps_csv = "overlaps.csv"

    run_rbpseg_merge(
        directory=pred_dir,
        overlaps_csv=overlaps_csv,
        function=function_widget.value,
        save_name=save_name_widget.value,
        chain_mode=chain_mode_widget.value,
        relax=relax_widget.value,
        bfactor=bfactor_widget.value
    )

def on_clear_clicked(b):
    segment_predictions_widget.value.clear()
    overlap_files_widget.value.clear()
    overlap_lengths_widget.value = ''
    function_widget.value = 1
    save_name_widget.value = 'merged.pdb'
    chain_mode_widget.value = 0
    relax_widget.value = False
    bfactor_widget.value = False
    if os.path.exists("/content/segment_predictions"):
        shutil.rmtree("/content/segment_predictions")
    if os.path.exists("overlaps.csv"):
        os.remove("overlaps.csv")
    print("🗑️ All parameters cleared and files deleted.")

upload_button.on_click(on_upload_clicked)
run_button.on_click(on_run_clicked)
clear_button.on_click(on_clear_clicked)

# ---------------------- Display UI ----------------------

display(Markdown("## 🧪 Analysis Parameters"))

display(Markdown("**1. Add fraction predictions (.pdb):**"))
display(segment_predictions_widget)

display(Markdown("**2. Add folder with segment overlap files:**"))
display(overlap_files_widget)

display(Markdown("**Or enter overlap lengths manually (e.g. 50 32 100):**"))
display(overlap_lengths_widget)

display(upload_button)


display(clear_button)

## 🧪 Analysis Parameters

**1. Add fraction predictions (.pdb):**

FileUpload(value={}, description='Upload', multiple=True)

**2. Add folder with segment overlap files:**

FileUpload(value={}, description='Upload')

**Or enter overlap lengths manually (e.g. 50 32 100):**

Text(value='', description='Lengths:', placeholder='e.g. 50 32 100', style=DescriptionStyle(description_width=…

Button(button_style='primary', description='Upload Files', style=ButtonStyle())

Button(button_style='danger', description='Clear', style=ButtonStyle())

Uploading predictions...
✅ Predictions uploaded.
Created overlaps file from manual input: overlaps.csv
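
The manual-input path in the cell above converts each whitespace-separated token with `int()`, so a stray character raises an error inside the button callback. The sketch below shows one way to validate the string first. `parse_overlap_lengths` is a hypothetical helper, not part of RBPseg, and rejecting negative values is an assumption about what counts as a valid overlap.

```python
def parse_overlap_lengths(text):
    """Parse a whitespace-separated string such as '50 32 100' into a list of ints.

    Raises ValueError for empty input, non-integer tokens, or negative values.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("No overlap lengths were given.")
    lengths = []
    for token in tokens:
        try:
            value = int(token)
        except ValueError:
            raise ValueError(f"Not an integer: {token!r}") from None
        if value < 0:
            # Assumption: an overlap length cannot be negative.
            raise ValueError(f"Overlap length cannot be negative: {value}")
        lengths.append(value)
    return lengths

print(parse_overlap_lengths("50 32 100"))
```

The returned list could then be passed to the CSV-writing step in place of the raw string.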


## 👊 Run RBPseg-merge

In [14]:
!rbpseg-merge -d /content/segment_predictions -of /content/overlaps.csv -f 0 -n merged.pdb -c 1

Reordered PDB files: ['/content/segment_predictions/merged.pdb', '/content/segment_predictions/E2_seq_0_ranked_0.pdb', '/content/segment_predictions/E2_seq_0_ranked_1.pdb', '/content/segment_predictions/E2_seq_1_ranked_0.pdb', '/content/segment_predictions/E2_seq_1_ranked_1.pdb', '/content/segment_predictions/E2_seq_2_ranked_0.pdb', '/content/segment_predictions/E2_seq_2_ranked_1.pdb']
Reordered PDB files: ['merged.pdb', 'E2_seq_0_ranked_0.pdb', 'E2_seq_0_ranked_1.pdb', 'E2_seq_1_ranked_0.pdb', 'E2_seq_1_ranked_1.pdb', 'E2_seq_2_ranked_0.pdb', 'E2_seq_2_ranked_1.pdb']
Number of sequences: 3
Sequence seq_0 has 2 PDB files.
Sequence seq_1 has 2 PDB files.
Sequence seq_2 has 2 PDB files.
sequence counts: defaultdict(<class 'int'>, {0: 2, 1: 2, 2: 2})
  overhang_size = int(overhang_list.iloc[:, ov_index])
fixed_structure is /content/segment_predictions/merged.pdb
moving_structure is /content/segment_predictions/E2_seq_0_ranked_1.pdb
Pair: [0, 2]
Performing superimposition using method: usa
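
Once the merge has finished, a quick sanity check is to count residues per chain in the merged model. The sketch below counts C-alpha `ATOM` records using the standard fixed-column PDB layout. It is an illustration only: the helper name `count_ca_per_chain` is hypothetical and the example coordinates are made up.

```python
def count_ca_per_chain(pdb_text):
    """Count C-alpha ATOM records per chain ID in PDB-format text."""
    counts = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): atom name in 13-16, chain ID in 22.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            counts[chain] = counts.get(chain, 0) + 1
    return counts

# Made-up records, used only to demonstrate the function.
example = """\
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N
ATOM      2  CA  MET A   1      12.560  13.300   2.200  1.00 20.00           C
ATOM      3  CA  GLY A   2      14.000  14.000   3.000  1.00 20.00           C
ATOM      4  CA  MET B   1      20.000  21.000   5.000  1.00 20.00           C
"""
print(count_ca_per_chain(example))
```

To apply it to the merged model, read the file text first, for example from `/content/segment_predictions/merged.pdb`, the path reported in the log above.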

## 👀 Visualize your RBP

In [9]:
!pip install py3Dmol


Collecting py3Dmol
  Downloading py3dmol-2.5.1-py2.py3-none-any.whl.metadata (2.1 kB)
Downloading py3dmol-2.5.1-py2.py3-none-any.whl (7.2 kB)
Installing collected packages: py3Dmol
Successfully installed py3Dmol-2.5.1


In [13]:
import py3Dmol

def show_pdb(pdb_path='/content/segment_predictions/merged.pdb'):
    with open(pdb_path, 'r') as f:
        pdb_data = f.read()

    view = py3Dmol.view(width=600, height=400)
    view.addModel(pdb_data, 'pdb')
    view.setStyle({'cartoon': {'color': 'spectrum'}})
    view.zoomTo()
    return view.show()

# Run viewer
show_pdb('/content/segment_predictions/merged.pdb')
