# High-Throughput Rosetta Relaxation for PDB Structures

This notebook provides a standalone environment to perform Rosetta relaxation on a large batch of PDB files, such as the outputs from a nanobody design tool like **IgGM**.

### Purpose & Motivation

The main goal of this notebook is to separate the computationally intensive **relaxation step** from the design or generation step (which requires a GPU). By running this notebook on a **CPU runtime**, you can:

* **Save valuable GPU time** in other programs (like IgGM) by not using their built-in `--relax` flags.
* Process **hundreds or thousands of candidate structures** efficiently in a batch.
* Keep the final, relaxed PDB files for **further analysis and inspection**.

### Workflow

1.  **Dependencies:** Installs PyRosetta and its required libraries.
2.  **Script Creation:** A robust, standalone `run_relax.py` script is created in the Colab environment.
3.  **Input & Execution:** You provide your structures as a path to a single PDB file, a folder of PDBs, or a ZIP archive. If the path is left blank, a file upload prompt appears instead. The notebook then relaxes every PDB file it finds.
4.  **Download Results:** All the final, relaxed PDB files are packaged into a single `.zip` archive for a convenient one-click download.
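
The three input forms described in step 3 (single PDB, folder, ZIP archive) can be handled by one small helper. The following is a minimal sketch, not the notebook's actual cell: the function name `collect_pdb_files` and the `work_dir` argument are hypothetical, and the sketch assumes that PDB files should also be picked up from sub-folders.

```python
import zipfile
from pathlib import Path


def collect_pdb_files(input_path, work_dir):
    """Return a sorted list of PDB paths found at input_path.

    input_path may be a single .pdb file, a folder, or a .zip archive.
    ZIP archives are extracted into work_dir before searching.
    """
    src = Path(input_path)
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)

    if src.is_file() and src.suffix.lower() == ".zip":
        with zipfile.ZipFile(src) as archive:
            archive.extractall(work)
        search_root = work
    elif src.is_file():
        # A single file is accepted only if it has a .pdb extension
        return [src] if src.suffix.lower() == ".pdb" else []
    elif src.is_dir():
        search_root = src
    else:
        raise FileNotFoundError(f"Not a valid file or directory: {input_path}")

    # rglob descends into sub-folders, so nested archive layouts are found too
    return sorted(
        p for p in search_root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdb"
    )
```

Calling `collect_pdb_files("designs.zip", "pdb_inputs")` (both names are placeholders) would extract the archive and return every `.pdb` file inside it, regardless of folder depth.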

In [None]:
#@title Install condacolab, this will restart your session

!pip install -q condacolab
import condacolab
condacolab.install()


In [2]:
#@title Install Dependencies
# Install Biopython, the PyRosetta Colab setup tool, and py3Dmol (for visualization)
!pip install -q biopython pyrosettacolabsetup
!pip install -q py3Dmol
# Run the PyRosetta installer
import pyrosettacolabsetup
pyrosettacolabsetup.install_pyrosetta()

# Initialize PyRosetta to make it active
import pyrosetta
pyrosetta.init("-ignore_unrecognized_res")

print("\n✅ PyRosetta and Biopython installation complete.")

Mounted at /content/google_drive

Note that USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE.
See https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md or email license@uw.edu for details.

Looking for compatible PyRosetta wheel file at google-drive/PyRosetta/colab.bin//wheels...
Found compatible wheel: /content/google_drive/MyDrive/PyRosetta/colab.bin/wheels//content/google_drive/MyDrive/PyRosetta/colab.bin/wheels/pyrosetta-2025.6+release.029c6a159b-cp311-cp311-linux_x86_64.whl


┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team             

In [7]:
#@title write run_relax.py


%%writefile run_relax.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2025, [Your Name or Alias]. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import sys
import pyrosetta
from pyrosetta.rosetta.protocols.relax import FastRelax
from pyrosetta.rosetta.core.scoring import ScoreFunctionFactory

def main():
    parser = argparse.ArgumentParser(description='A standalone tool to run a simple, unconstrained FastRelax protocol.')
    parser.add_argument('--input_pdb', type=str, required=True, help='Path to the input PDB file.')
    parser.add_argument('--output_dir', type=str, required=True, help='Directory to save the relaxed PDB file.')
    args = parser.parse_args()

    # --- 1. Initialize PyRosetta with standard options ---
    # Using a simple set of flags for general compatibility.
    pyrosetta.init("-ignore_unrecognized_res -ignore_zero_occupancy false -ex1 -ex2aro")

    # --- 2. Set up the most basic FastRelax mover ---
    # This uses the standard 'ref2015' score function, which is robust.
    scorefxn = ScoreFunctionFactory.create_score_function('ref2015')
    relax = FastRelax()
    relax.set_scorefxn(scorefxn)

    # --- 3. Process the PDB file ---
    if not os.path.exists(args.input_pdb):
        print(f"Error: Input file not found at {args.input_pdb}", file=sys.stderr, flush=True)
        sys.exit(1)
    # exist_ok=True makes this a no-op when the directory is already there
    os.makedirs(args.output_dir, exist_ok=True)

    base_name = os.path.basename(args.input_pdb)
    output_path = os.path.join(args.output_dir, f"{os.path.splitext(base_name)[0]}_raw_relaxed.pdb")

    print(f"--- Processing: {base_name} ---", flush=True)
    try:
        pose = pyrosetta.pose_from_pdb(args.input_pdb)

        energy_before = scorefxn(pose)
        print(f"  Energy before relax: {energy_before:.2f}", flush=True)

        relax.apply(pose)

        energy_after = scorefxn(pose)
        print(f"  Energy after relax:  {energy_after:.2f}", flush=True)

        pose.dump_pdb(output_path)
        print(f"✅ Successfully relaxed and saved to: {output_path}", flush=True)

    except Exception as e:
        print(f"❌ ERROR processing {base_name}:", file=sys.stderr, flush=True)
        print(f"   Message: {e}", file=sys.stderr, flush=True)
        sys.exit(1)

if __name__ == '__main__':
    main()

Overwriting run_relax.py
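
Because `run_relax.py` prints the energy before and after relaxation in a fixed format, those lines can be collected into a per-structure summary when many files are processed. The following is a minimal sketch under that assumption: `parse_relax_log` is a hypothetical helper, it relies only on the `--- Processing: ... ---` and `Energy before/after relax:` lines printed by the script above, and the numbers in the usage example are placeholders, not real results.

```python
import re

# Patterns matching the lines that run_relax.py prints
NAME_LINE = re.compile(r"^--- Processing: (.+) ---$")
ENERGY_LINE = re.compile(r"Energy (before|after) relax:\s*(-?\d+(?:\.\d+)?)")


def parse_relax_log(log_text):
    """Map each processed file name to its 'before' and 'after' energies."""
    results = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        name_match = NAME_LINE.match(line)
        if name_match:
            # A new structure starts; later energy lines belong to it
            current = name_match.group(1)
            results[current] = {}
            continue
        energy_match = ENERGY_LINE.search(line)
        if energy_match and current is not None:
            results[current][energy_match.group(1)] = float(energy_match.group(2))
    return results


# Usage with placeholder values (not real output)
example_log = (
    "--- Processing: example.pdb ---\n"
    "  Energy before relax: 10.00\n"
    "  Energy after relax:  -5.50\n"
)
summary = parse_relax_log(example_log)
```

Structures that failed part-way simply end up with a missing `after` key, which makes them easy to filter out.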


In [10]:
#@title Provide Input, Run Relaxation, and Process Files
from google.colab import files
import os
import glob
import subprocess
import zipfile
import shutil

# --- Part 1: Get and Prepare User Input ---
# @markdown Enter the path to your input file or folder. **If left blank, an upload prompt will appear.**
user_input_path = "" #@param {type:"string"}

processing_dir = "pdb_inputs"
if os.path.exists(processing_dir):
    shutil.rmtree(processing_dir)
os.makedirs(processing_dir)

if not user_input_path.strip():
    print("Path is blank. Please upload your PDB or ZIP file(s)...")
    uploaded = files.upload()
    if not uploaded:
        raise Exception("Operation cancelled: No file(s) were uploaded.")
    for filename, content in uploaded.items():
        with open(os.path.join(processing_dir, filename), 'wb') as f:
            f.write(content)
    user_input_path = processing_dir
elif os.path.isdir(user_input_path):
    print(f"Copying files from folder: {user_input_path}")
    shutil.copytree(user_input_path, processing_dir, dirs_exist_ok=True)
elif os.path.isfile(user_input_path):
    print(f"Copying single file: {user_input_path}")
    shutil.copy(user_input_path, processing_dir)
else:
    raise FileNotFoundError(f"Error: The path '{user_input_path}' is not a valid file or directory.")

for zip_path in glob.glob(os.path.join(processing_dir, '*.zip')):
    print(f"Extracting '{os.path.basename(zip_path)}'...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(processing_dir)
    os.remove(zip_path)

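# Search recursively so PDBs inside sub-folders (e.g. a zipped folder) are found.
# Note: output names are based on the file name only, so keep file names unique.
files_to_process = sorted(glob.glob(os.path.join(processing_dir, '**', '*.pdb'), recursive=True))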

# --- Part 2: Run Relaxation on Selected Files ---
if not files_to_process:
    print("\nNo PDB files found to process. Halting execution.")
else:
    output_directory = "relaxed_output"
    if os.path.exists(output_directory):
        shutil.rmtree(output_directory)
    os.makedirs(output_directory)

    print(f"\n🚀 Starting relaxation for {len(files_to_process)} structure(s)...")

    for pdb_file in files_to_process:
        command = [
            "python",
            "-u",  # This '-u' flag tells Python to run in unbuffered mode
            "run_relax.py",
            "--input_pdb", pdb_file,
            "--output_dir", output_directory
        ]

        # Use Popen to stream output in real-time
        process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

        # Read and print the output line by line as it is generated.
        # rstrip() keeps the leading indentation of the script's messages.
        for line in process.stdout:
            print(line.rstrip())
        process.wait()  # Make sure returncode is set before it is checked

        # Check for errors after the process finishes
        if process.returncode != 0:
            print(f"\n❌ An error occurred while processing {os.path.basename(pdb_file)}. See the output above for details.")

    print("\n🎉 All processing complete.")

Path is blank. Please upload your PDB or ZIP file(s)...


Saving cleaned_6z6v_chains_removed_merge.pdb to cleaned_6z6v_chains_removed_merge.pdb

🚀 Starting relaxation for 1 structure(s)...
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 PyRosetta-4                                  │
│              Created in JHU by Sergey Lyskov and PyRosetta Team              │
│              (C) Copyright Rosetta Commons Member Institutions               │
│                                                                              │
│ NOTE: USE OF PyRosetta FOR COMMERCIAL PURPOSES REQUIRE PURCHASE OF A LICENSE │
│         See LICENSE.PyRosetta.md or email license@uw.edu for details         │
└──────────────────────────────────────────────────────────────────────────────┘
PyRosetta-4 2025 [Rosetta PyRosetta4.MinSizeRel.python311.ubuntu 2025.06+release.029c6a159b896477003a14f78f472d4cd2cead46 2025-02-04T15:14:13] retrieved from: http://www.pyrosetta.org
core.init: Checking for fconfig files
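
The loop above relaxes structures one at a time. On a runtime with more than one CPU core, independent structures could in principle be relaxed concurrently, since each one runs in its own `run_relax.py` process. The following is a minimal sketch and not part of the notebook: `build_relax_command` and `run_commands_in_parallel` are hypothetical helpers, `max_workers=2` is an assumed value, and output is captured rather than streamed live.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_relax_command(pdb_file, output_dir):
    """Command line matching the arguments run_relax.py expects."""
    return [
        sys.executable, "-u", "run_relax.py",
        "--input_pdb", str(pdb_file),
        "--output_dir", str(output_dir),
    ]


def run_commands_in_parallel(commands, max_workers=2):
    """Run each command as its own process; return exit codes in input order."""
    def run_one(command):
        completed = subprocess.run(command, capture_output=True, text=True)
        return completed.returncode

    # Threads only wait on the child processes, so the heavy work
    # happens in the separate Python processes, not in this one.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))
```

A call such as `run_commands_in_parallel([build_relax_command(f, "relaxed_output") for f in files_to_process])` would return one exit code per structure. Each process initializes PyRosetta separately, so memory use grows with the number of workers; it may be worth keeping `max_workers` at or below the number of available cores.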

In [None]:
#@title Visualize and Align Design Results
import os
import py3Dmol
import collections
from Bio.PDB import PDBParser, Superimposer, PDBIO
from Bio.PDB.PDBExceptions import PDBConstructionWarning
import warnings
import io

# Suppress Biopython warnings for cleaner output
warnings.simplefilter('ignore', PDBConstructionWarning)

# --- Helper Functions ---

def get_chain_info(pdb_data_string):
    """Gets the chains and the first residue of each for labeling."""
    chain_info = collections.OrderedDict()
    for line in pdb_data_string.splitlines():
        if line.startswith('ATOM'):
            chain_id = line[21]
            if chain_id not in chain_info:
                try:
                    residue_number = int(line[22:26])
                    chain_info[chain_id] = residue_number
                except ValueError:
                    continue
    return chain_info

def align_and_get_rmsd(fixed_pdb_path, moving_pdb_path, align_chain_id='A'):
    """
    Aligns the moving PDB to the fixed PDB using only the C-alpha atoms
    of the specified antigen chain (align_chain_id) and calculates the RMSD.
    Returns the aligned PDB as a string and the RMSD value.
    """
    parser = PDBParser()

    try:
        fixed_struct = parser.get_structure("fixed", fixed_pdb_path)
        moving_struct = parser.get_structure("moving", moving_pdb_path)
    except Exception as e:
        print(f"❌ ERROR: Biopython could not parse one of the PDB files: {e}")
        return None, None

    # Create a dictionary of C-alpha atoms from the specified antigen chain in the fixed structure
    fixed_ca_atoms_dict = {
        atom.get_parent().id: atom
        for atom in fixed_struct.get_atoms()
        if atom.get_name() == 'CA' and atom.get_parent().get_parent().id == align_chain_id
    }

    # Build paired lists of atoms from the antigen chain that exist in BOTH structures
    fixed_atoms = []
    moving_atoms = []
    for atom in moving_struct.get_atoms():
        if atom.get_name() == 'CA' and atom.get_parent().get_parent().id == align_chain_id:
            res_id = atom.get_parent().id
            if res_id in fixed_ca_atoms_dict:
                moving_atoms.append(atom)
                fixed_atoms.append(fixed_ca_atoms_dict[res_id])

    if not fixed_atoms:
        print(f"⚠️ Warning: No common C-alpha atoms found for chain '{align_chain_id}' between the two structures. Cannot align.")
        return None, None

    print(f"Found {len(fixed_atoms)} common C-alpha atoms on chain '{align_chain_id}' for alignment.")

    # --- Superimpose and calculate RMSD ---
    super_imposer = Superimposer()
    super_imposer.set_atoms(fixed_atoms, moving_atoms)
    rmsd = super_imposer.rms

    # Apply the transformation to the entire moving structure
    super_imposer.apply(moving_struct.get_atoms())

    # --- Save the aligned structure to a string ---
    io_handle = io.StringIO()
    pdb_io = PDBIO()
    pdb_io.set_structure(moving_struct)
    pdb_io.save(io_handle)
    aligned_pdb_string = io_handle.getvalue()

    return aligned_pdb_string, rmsd

def main():
    # --- Colab Forms ---
    #@markdown ### 1. Input Files and Parameters
    #@markdown Path to the original PDB file (unrelaxed).
    input_pdb_path = "/content/cleaned_6z6v_chains_removed_merge.pdb" #@param {type:"string"}
    #@markdown Path to one of the new, **relaxed** PDB files.
    output_pdb_path = "/content/relaxed_output/cleaned_6z6v_chains_removed_merge_raw_relaxed.pdb" #@param {type:"string"}
    #@markdown **Crucial:** Enter the chain ID of the **antigen** to use as the stable reference for alignment.
    antigen_chain_id = "D" #@param {type:"string"}


    if not all([input_pdb_path.strip(), output_pdb_path.strip(), antigen_chain_id.strip()]):
        print("❌ ERROR: Please fill in all three path/ID fields.")
        return
    if not os.path.exists(input_pdb_path):
        print(f"❌ ERROR: Input PDB file not found at '{input_pdb_path}'.")
        return
    if not os.path.exists(output_pdb_path):
        print(f"❌ ERROR: Output PDB file not found at '{output_pdb_path}'.")
        return

    # --- 1. Align structures and get RMSD ---
    print(f"\nAligning structures based on antigen chain '{antigen_chain_id}'...")
    aligned_pdb_data, rmsd = align_and_get_rmsd(input_pdb_path, output_pdb_path, antigen_chain_id)

    if not aligned_pdb_data or rmsd is None:
        print("Alignment failed. Halting before visualization.")
        return

    print(f"✅ Alignment complete. C-alpha RMSD over chain '{antigen_chain_id}': {rmsd:.3f} Å")

    # --- 2. Read file contents and get chain info ---
    with open(input_pdb_path, 'r') as f:
        original_pdb_data = f.read()

    original_chains = get_chain_info(original_pdb_data)
    designed_chains = get_chain_info(aligned_pdb_data)

    print("\nGenerating visualizations...")

    # --- Viewer 1: Original Input Structure ---
    print("\n--- 1. Original Input Structure ---")
    view1 = py3Dmol.view(width=800, height=500)
    view1.addModel(original_pdb_data, 'pdb')
    view1.setStyle({}, {'cartoon': {'colorscheme': 'chain'}})
    for chain, resi in original_chains.items():
        view1.addLabel(f"Chain {chain}", {'fontColor':'white', 'backgroundColor':'black', 'backgroundOpacity':0.7}, {'chain': chain, 'resi': resi})
    view1.zoomTo()
    view1.show()

    # --- Viewer 2: Relaxed Output Structure (Aligned) ---
    print("\n--- 2. Relaxed Output Structure ---")
    view2 = py3Dmol.view(width=800, height=500)
    view2.addModel(aligned_pdb_data, 'pdb')
    view2.setStyle({}, {'cartoon': {'colorscheme': 'chain'}})
    for chain, resi in designed_chains.items():
        view2.addLabel(f"Chain {chain}", {'fontColor':'white', 'backgroundColor':'black', 'backgroundOpacity':0.7}, {'chain': chain, 'resi': resi})
    view2.zoomTo()
    view2.show()

    # --- Viewer 3: Overlapped Structures ---
    print(f"\n--- 3. Overlapped Structures (RMSD over antigen: {rmsd:.3f} Å) ---")
    print("(Original in Gray, Relaxed/Designed in Color)")
    view3 = py3Dmol.view(width=800, height=500)
    view3.addModel(original_pdb_data, 'pdb')
    view3.setStyle({'model': 0}, {'cartoon': {'color': 'lightgray', 'opacity': 0.8}})

    view3.addModel(aligned_pdb_data, 'pdb')
    view3.setStyle({'model': 1}, {'cartoon': {'colorscheme': 'chain'}})

    view3.zoomTo()
    view3.show()

# Run the main function
main()
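
Biopython's `Superimposer` finds the rigid-body rotation and translation that minimize the RMSD between the paired C-alpha atoms. As an illustration of that calculation, here is a minimal NumPy sketch of the Kabsch algorithm. The function name `kabsch_rmsd` is hypothetical and this is not Biopython's own code; it only returns the RMSD and assumes the two coordinate arrays are already paired row by row.

```python
import numpy as np


def kabsch_rmsd(fixed, moving):
    """RMSD after optimally superimposing `moving` onto `fixed` (both N x 3)."""
    fixed = np.asarray(fixed, dtype=float)
    moving = np.asarray(moving, dtype=float)

    # Center both coordinate sets on their centroids (removes translation)
    fixed_c = fixed - fixed.mean(axis=0)
    moving_c = moving - moving.mean(axis=0)

    # SVD of the 3 x 3 covariance matrix gives the optimal rotation
    u, _, vt = np.linalg.svd(moving_c.T @ fixed_c)

    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt

    # Rotate the moving set (row-vector convention) and measure the deviation
    aligned = moving_c @ rotation
    return float(np.sqrt(((aligned - fixed_c) ** 2).sum() / len(fixed)))
```

If two coordinate sets differ only by a rotation and a translation, the function returns a value close to zero; any remaining deviation reflects genuine structural change, which is what the RMSD over the antigen chain reports here.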


In [7]:
# =============================================
#@title Compress folder and auto-download the ZIP with outputs
# =============================================

import shutil
from google.colab import files

# 1) Path to the folder you want to compress
folder_to_zip = "/content/relaxed_output"  # <-- update to your folder

# 2) Archive base name (without .zip)
archive_name = "relaxed_outputs"

# 3) Create the .zip archive
shutil.make_archive(archive_name, 'zip', folder_to_zip)

# 4) Automatically download the archive
files.download(f"{archive_name}.zip")

