## Initializing Colab
The two cells below are needed only when this notebook is run on **Google Colab**. Running conda on **Google Colab** requires the **condacolab** library. As [explained here](https://pypi.org/project/condacolab/), its installation triggers a **kernel restart**, so on **Google Colab** do not run all cells at once: wait until the **installation** has **finished** and the **kernel** has **restarted** before running the remaining cells.

In [None]:
# Only executed when using google colab
import sys
if 'google.colab' in sys.modules:
  import subprocess
  from pathlib import Path
  try:
    subprocess.run(["conda", "-V"], check=True)
  except FileNotFoundError:
    subprocess.run([sys.executable, "-m", "pip", "install", "condacolab"], check=True)
    import condacolab
    condacolab.install()
    # Clone repository
    repo_URL = "https://github.com/RubenChM/biobb_wf_haddock.git"
    repo_name = Path(repo_URL).name.split('.')[0]
    if not Path(repo_name).exists():
      subprocess.run(["mamba", "install", "-y", "git"], check=True)
      subprocess.run(["git", "clone", repo_URL], check=True)
      print("⏬ Repository properly cloned.")
    # Install environment
    print("⏳ Creating environment...")
    env_file_path = f"{repo_name}/conda_env/haddock.yml"
    subprocess.run(["mamba", "env", "update", "-n", "base", "-f", env_file_path], check=True)
    print("🎨 Install NGLView dependencies...")
    subprocess.run(["mamba", "install", "-y", "-c", "conda-forge", "nglview==3.1.4", "ipywidgets=8.1.6"], check=True)
    print("👍 Conda environment successfully created and updated.")

In [None]:
# Enable widgets for colab
if 'google.colab' in sys.modules:
  from google.colab import output
  output.enable_custom_widget_manager()
  # Change working dir
  import os
  os.chdir("biobb_wf_haddock/biobb_wf_haddock/notebooks")
  print(f"📂 New working directory: {os.getcwd()}")

In [None]:
# TO BE REMOVED!!!
%load_ext autoreload
%autoreload 2

# Imports
import os, shutil
import nglview as nv
import ipywidgets
import pandas as pd
import zipfile
import webbrowser

# Helpers
def def_dict(properties=None):
    # Default biobb properties; step-specific values passed in override them
    def_props = {'out_log_path': 'log/log.log',
                 'err_log_path': 'log/log.err',
                 'remove_tmp': True,
                 'can_write_console_log': True}
    def_props.update(properties or {})
    return def_props

def show_pdbs(pdbs, surface=False):
    # Load the PDB files
    views = [nv.show_file(pdb) for pdb in pdbs]
    for view in views:
        if surface:
            view.clear()
            view.add_cartoon(color='black')
            view.add_surface(color='electrostatic', opacity=0.5)
        view.layout.width = '100%'
    return ipywidgets.HBox(views)

def display_actpass(pdb, actpass, opacity=1):
    with open(actpass, 'r') as file:
        actpass = file.read().splitlines()
        act_res = actpass[0].replace(' ', ', ')
        pas_res = actpass[1].replace(' ', ', ')
        
    # Load the PDB files
    view = nv.NGLWidget()
    view.add_component(pdb)
    view.clear()
    view.add_cartoon(color='black')
    view.add_ball_and_stick(color='grey',opacity=opacity)
    view.add_surface(selection=f'not ( {pas_res}, {act_res} )', color='white', opacity=opacity)
    if act_res != '':
        view.add_surface(selection=f'{act_res}', color='red')
    if pas_res != '':
        view.add_surface(selection=f'{pas_res}', color='green', opacity=opacity)
    view.layout.width = '100%'
    return view

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
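
Every building block in this notebook receives its `properties` through the `def_dict` helper defined above. The sketch below is a stand-alone copy of that helper (rewritten with a `None` default, which is an assumption of this example rather than the notebook's exact code) and shows how step-specific keys are layered on top of the shared defaults.

```python
# Stand-alone copy of the notebook helper, for illustration only.
def def_dict(properties=None):
    def_props = {'out_log_path': 'log/log.log',
                 'err_log_path': 'log/log.err',
                 'remove_tmp': True,
                 'can_write_console_log': True}
    def_props.update(properties or {})
    return def_props

# Step-specific keys are added on top of the defaults.
props = def_dict({'pdb_code': '4G6K'})
print(props['pdb_code'], props['remove_tmp'])

# A key that already exists in the defaults is overridden.
quiet = def_dict({'can_write_console_log': False})
print(quiet['can_write_console_log'])
```

Because a fresh dictionary is built on every call, properties passed for one step do not leak into the next one.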


# HADDOCK3 protein-protein docking using BioExcel Building Blocks (biobb)
***
This tutorial illustrates, step by step, the process of **protein-protein docking** with **HADDOCK3** using the **BioExcel Building Blocks library (biobb)**.
***
**Biobb modules** used:

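 - [biobb_io](https://github.com/bioexcel/biobb_io): Tools to fetch biomolecular data from public databases.
 - [biobb_pdb_tools](https://github.com/bioexcel/biobb_pdb_tools): Biobb wrappers of the pdb-tools suite for editing PDB files.
 - [biobb_haddock](https://github.com/bioexcel/biobb_haddock): Biobb building blocks for the Haddock3 suite.
 
**Auxiliary libraries** used: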

 - [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels): Enables a Jupyter Notebook or JupyterLab application in one conda environment to access kernels for Python, R, and other languages found in other environments.
 - [nglview](http://nglviewer.org/#nglview): Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
 - [ipywidgets](https://github.com/jupyter-widgets/ipywidgets): Interactive HTML widgets for Jupyter notebooks and the IPython kernel.

### Conda Installation and Launch

```console
git clone https://github.com/RubenChM/biobb_wf_haddock.git
cd biobb_wf_haddock
conda env create -n biobb_wf_haddock -f conda_env/haddock.yml
conda activate biobb_wf_haddock
jupyter-nbextension enable --py --user widgetsnbextension
jupyter-nbextension enable --py --user nglview
jupyter-notebook biobb_wf_haddock/notebooks
```

***
### Pipeline steps:
 1. [Input Parameters](#input)
 2. [Create topology](#fetch)
 3. [CAPRI evaluation](#addh)
 4. [Select Top structures](#min)
 5. [Flexible Refinement](#acpype)
 6. [2nd CAPRI evaluation](#output)
 7. [Energy Minimization Refinement](#questions)
 
***
![](https://bioexcel.eu/wp-content/uploads/2019/04/Bioexcell_logo_1080px_transp.png)
***

<a id="input"></a>
***
## Input parameters
**Input parameters** needed:
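 - **ab_id**: PDB code of the antibody structure (e.g. 4G6K)
 - **ag_id**: PDB code of the antigen structure (e.g. 4I1B)
 - **ref_id**: PDB code of the reference complex used in the CAPRI evaluations (e.g. 4G6M)
 - **out_path**: Folder where all generated files are written (e.g. ./data/antibody/)
 - **data_pth**: Local folder holding the biobb_haddock test data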

# HADDOCK3 PPI tutorial

In [8]:
# https://www.bonvinlab.org/education/HADDOCK3/HADDOCK3-antibody-antigen/#setting-up-and-running-the-docking-with-haddock3
# data from: https://surfdrive.surf.nl/files/index.php/s/R7VHGQM9nx8QuQn

ab_id    = '4G6K'  # antibody
ag_id    = '4I1B'  # antigen
ref_id   = '4G6M'  # reference complex
out_path = './data/antibody/'
data_pth = '/home/rchaves/repo/biobb_haddock/biobb_haddock/test/data/haddock/'
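
The cells that follow write intermediate files under `{out_path}/pre/` and docking results under `{out_path}/docking/<step>/`. Whether the building blocks create missing folders on their own is not stated in this notebook, so a cautious sketch is to create them up front. The helper name `make_work_dirs` and the temporary demo folder are assumptions made for this example.

```python
from pathlib import Path
import tempfile

def make_work_dirs(base, subdirs=('pre', 'docking')):
    """Create base and its sub-folders if missing; return the folder paths."""
    created = []
    for sub in subdirs:
        path = Path(base) / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created

# Demo in a throw-away folder; in the notebook one would pass out_path instead.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_work_dirs(Path(tmp) / 'data' / 'antibody')
    all_exist = all(d.is_dir() for d in dirs)
    print(all_exist)
```

Calling the helper twice is harmless, which makes it safe to keep at the top of a notebook that is re-run often.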

## Preparing PDB files for docking

### Fetching the PDBs

In [9]:
# Downloading desired PDB files
# Import module
from biobb_io.api.pdb import pdb

# Create properties dict and inputs/outputs
ab_pdb  = f'{out_path}/pre/{ab_id}_0.pdb'
ag_pdb  = f'{out_path}/pre/{ag_id}_0.pdb'
ref_pdb = f'{out_path}/pre/{ref_id}_0.pdb'

# Create and launch bb
pdb(output_pdb_path = ab_pdb,  properties = def_dict({'pdb_code': ab_id}))
pdb(output_pdb_path = ag_pdb,  properties = def_dict({'pdb_code': ag_id}))
pdb(output_pdb_path = ref_pdb, properties = def_dict({'pdb_code': ref_id}))

0

### Preparing the antibody structure

In [10]:
from biobb_pdb_tools.pdb_tools.biobb_pdb_tidy import biobb_pdb_tidy
from biobb_pdb_tools.pdb_tools.biobb_pdb_selchain import biobb_pdb_selchain
from biobb_pdb_tools.pdb_tools.biobb_pdb_delhetatm import biobb_pdb_delhetatm
from biobb_pdb_tools.pdb_tools.biobb_pdb_fixinsert import biobb_pdb_fixinsert
from biobb_pdb_tools.pdb_tools.biobb_pdb_selaltloc import biobb_pdb_selaltloc
from biobb_pdb_tools.pdb_tools.biobb_pdb_keepcoord import biobb_pdb_keepcoord
from biobb_pdb_tools.pdb_tools.biobb_pdb_selres import biobb_pdb_selres

steps = [
    biobb_pdb_tidy,
    biobb_pdb_selchain,
    biobb_pdb_delhetatm,
    biobb_pdb_fixinsert,
    biobb_pdb_selaltloc,
    biobb_pdb_keepcoord,
    biobb_pdb_selres,
    biobb_pdb_tidy,
]

for ch in ['H','L']:
    step_props = {
        'biobb_pdb_tidy':     {'strict': True},
        'biobb_pdb_selchain': {'chains': ch},
        'biobb_pdb_selres':   {'selection': f'1:{ 120 if ch == "H" else 107}'},
    }
    for i, step in enumerate(steps):
        pdb_in  = f'{out_path}/pre/{ab_id}_{i}.pdb'
        if i+1 < len(steps):
            pdb_out  = f'{out_path}/pre/{ab_id}_{i+1}.pdb'
        else:
            pdb_out  = f'{out_path}/pre/{ab_id}_{ch}.pdb'
        props = def_dict(step_props.get(step.__name__, {}))
        step(input_file_path = pdb_in,  output_file_path=pdb_out,  properties = props)
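
The loop above chains the tools through numbered intermediate files: step `i` reads `<id>_<i>.pdb` and writes `<id>_<i+1>.pdb`, except for the last step, which writes the per-chain result. The naming scheme on its own can be sketched as follows; `chain_paths` is a hypothetical helper introduced only for this illustration.

```python
def chain_paths(prefix, n_steps, final_path):
    """Return the (input, output) file pair of each of n_steps chained tools."""
    pairs = []
    for i in range(n_steps):
        pdb_in = f'{prefix}_{i}.pdb'
        # Every step but the last writes a numbered intermediate file.
        pdb_out = f'{prefix}_{i+1}.pdb' if i + 1 < n_steps else final_path
        pairs.append((pdb_in, pdb_out))
    return pairs

pairs = chain_paths('pre/4G6K', 3, 'pre/4G6K_H.pdb')
for pdb_in, pdb_out in pairs:
    print(pdb_in, '->', pdb_out)
```

Since both chains start again from `<id>_0.pdb`, the intermediate files of chain `H` are overwritten while chain `L` is processed; only the final `_H.pdb` and `_L.pdb` files are kept apart.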

In [11]:
import zipfile

# Define the zip file path
zip_file_path = f'{out_path}/pre/{ab_id}_HL.zip'

# Create a zip file and add the pdb_out file to it
with zipfile.ZipFile(zip_file_path, 'w') as zipf:
    zipf.write(f'{out_path}/pre/{ab_id}_H.pdb', arcname=f'{ab_id}_H.pdb')
    zipf.write(f'{out_path}/pre/{ab_id}_L.pdb', arcname=f'{ab_id}_L.pdb')
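
Before handing the archive to the merge step, it can be worth checking that both chains really ended up inside it. The sketch below does this with the standard `zipfile` module on a throw-away archive; the placeholder file contents and the helper name `zip_members` are assumptions of the example.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_members(zip_path):
    """Return the sorted list of file names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(zf.namelist())

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    names = ('4G6K_H.pdb', '4G6K_L.pdb')
    for name in names:
        (tmp / name).write_text('END\n')       # placeholder content
    archive = tmp / '4G6K_HL.zip'
    with zipfile.ZipFile(archive, 'w') as zf:
        for name in names:
            # arcname stores the file without its folder, as in the cell above
            zf.write(tmp / name, arcname=name)
    members = zip_members(archive)
    print(members)
```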

In [12]:
from biobb_pdb_tools.pdb_tools.biobb_pdb_merge import biobb_pdb_merge
from biobb_pdb_tools.pdb_tools.biobb_pdb_reres import biobb_pdb_reres
from biobb_pdb_tools.pdb_tools.biobb_pdb_chain import biobb_pdb_chain
from biobb_pdb_tools.pdb_tools.biobb_pdb_chainxseg import biobb_pdb_chainxseg

steps = [
    biobb_pdb_merge,
    biobb_pdb_reres,
    biobb_pdb_chain,
    biobb_pdb_chainxseg,
    biobb_pdb_tidy,
]

step_props = {
    'biobb_pdb_reres': {'number': 1},
    'biobb_pdb_chain': {'chain': 'A'},
    'biobb_pdb_tidy': {'strict': True},
}

for i, step in enumerate(steps):
    pdb_in  = (zip_file_path if i == 0 
               else f'{out_path}/pre/{ab_id}_HL_{i}.pdb')
    
    pdb_out = (f'{out_path}/pre/{ab_id}_HL_{i+1}.pdb' 
               if i+1 < len(steps) 
               else f'{out_path}/{ab_id}_clean.pdb')
    
    props = def_dict(step_props.get(step.__name__, {}))
    step(input_file_path = pdb_in,  output_file_path=pdb_out,  properties = props)

### Preparing the antigen structure

In [13]:
steps = [
    biobb_pdb_tidy,
    biobb_pdb_delhetatm,
    biobb_pdb_selaltloc,
    biobb_pdb_keepcoord,
    biobb_pdb_chain,
    biobb_pdb_chainxseg,
    biobb_pdb_tidy,
]

step_props = {
    'biobb_pdb_tidy': {'strict': True},
    'biobb_pdb_chain': {'chain': 'B'},
}

for i, step in enumerate(steps):
    pdb_in  = f'{out_path}/pre/{ag_id}_{i}.pdb'
    pdb_out = (f'{out_path}/pre/{ag_id}_{i+1}.pdb' 
               if i+1 < len(steps) 
               else f'{out_path}/{ag_id}_clean.pdb')
    props = def_dict(step_props.get(step.__name__, {}))
    step(input_file_path = pdb_in,  output_file_path=pdb_out,  properties = props)

### Preparing the reference pdb

In [14]:
steps = [
    biobb_pdb_tidy,
    biobb_pdb_selchain
]

step_props = {
    'biobb_pdb_tidy': {'strict': True},
    'biobb_pdb_selchain': {'chains': 'H,L'},
}

for i, step in enumerate(steps):
    pdb_in  = f'{out_path}pre/{ref_id}_{i}.pdb'
    pdb_out = f'{out_path}pre/{ref_id}_{i+1}.pdb' 
    props = def_dict(step_props.get(step.__name__, {}))
    step(input_file_path = pdb_in,  output_file_path=pdb_out,  properties = props)

In [15]:
steps = [
    biobb_pdb_tidy,
    biobb_pdb_selchain,
    biobb_pdb_delhetatm,
    biobb_pdb_fixinsert,
    biobb_pdb_selaltloc,
    biobb_pdb_keepcoord,
    biobb_pdb_selres,
    biobb_pdb_tidy,
]
sels = {"H": 120, "L":107, "A": ''}

for ch in ['H','L']:
    step_props = {
        'biobb_pdb_tidy':     {'strict': True},
        'biobb_pdb_selchain': {'chains': ch},
        'biobb_pdb_selres':   {'selection': f'1:{sels[ch]}'}
    }
    for i, step in enumerate(steps):
        pdb_in  = f'{out_path}/pre/{ref_id}_{i}.pdb'
        if i+1 < len(steps):
            pdb_out  = f'{out_path}/pre/{ref_id}_{i+1}.pdb'
        else:
            pdb_out  = f'{out_path}/pre/{ref_id}_{ch}.pdb'
        props = def_dict(step_props.get(step.__name__, {}))
        step(input_file_path = pdb_in,  output_file_path=pdb_out,  properties = props)

In [None]:
ref_H = f'{out_path}/pre/{ref_id}_H.pdb'
ref_L = f'{out_path}/pre/{ref_id}_L.pdb'
ref_HL = f'{out_path}/pre/{ref_id}_HL.pdb'

!pdb_merge {ref_H} {ref_L} | pdb_reres -1 | pdb_chain -A | pdb_chainxseg | pdb_tidy -strict > {ref_HL}

In [40]:
pdb_in  = f'{out_path}/pre/{ref_id}_0.pdb'
ref_A = f'{out_path}/pre/{ref_id}_A.pdb'
ref_pdb_clean = f'{out_path}/{ref_id}_clean.pdb'

!pdb_selchain -A {pdb_in} | pdb_reatom -0 | pdb_chain -B | pdb_chainxseg > {ref_A}
!pdb_merge {ref_HL} {ref_A} | pdb_segxchain | pdb_tidy -strict > {ref_pdb_clean}

## Defining restraints

#### Paratope
The antibody residues of the hypervariable loops that are involved in binding, predicted with:
- [ProABC-2](https://academic.oup.com/bioinformatics/article/36/20/5107/5873593?login=false)

#### Epitope

The antigen residues contacted by the antibody, taken from the [literature](https://linkinghub.elsevier.com/retrieve/pii/S0022283612007863).

In [18]:
paratope_sel = '31,32,33,34,35,52,54,55,56,100,101,102,103,104,105,106,151,152,169,170,173,211,212,213,214,216'
epitope_sel  = '72,73,74,75,81,83,84,89,90,92,94,96,97,98,115,116,117'
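
The two selections are kept as comma-separated strings and reshaped later on: with `', '` separators for the NGLView surfaces and with spaces for the actpass files. A small sketch of that conversion, with a basic sanity check added, could look like this; `parse_selection` is a hypothetical helper and not part of biobb.

```python
paratope_sel = '31,32,33,34,35,52,54,55,56,100,101,102,103,104,105,106,151,152,169,170,173,211,212,213,214,216'
epitope_sel  = '72,73,74,75,81,83,84,89,90,92,94,96,97,98,115,116,117'

def parse_selection(sel):
    """Turn '31,32,33' into [31, 32, 33], rejecting duplicated residues."""
    residues = [int(tok) for tok in sel.split(',')]
    if len(set(residues)) != len(residues):
        raise ValueError('duplicated residue number in selection')
    return residues

paratope = parse_selection(paratope_sel)
epitope = parse_selection(epitope_sel)

# The two string forms used elsewhere in this notebook.
ngl_selection = ', '.join(map(str, epitope))   # for NGLView's add_surface
actpass_line = ' '.join(map(str, paratope))    # for a line of an actpass file
print(len(paratope), len(epitope))
```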

In [41]:
ab_pdb_clean = f'{out_path}/{ab_id}_clean.pdb'
ag_pdb_clean = f'{out_path}/{ag_id}_clean.pdb'
ref_pdb_clean = f'{out_path}/{ref_id}_clean.pdb'
views = show_pdbs([ab_pdb_clean, ag_pdb_clean, ref_pdb_clean])

In [42]:
views.children[0].add_surface(selection=paratope_sel.replace(',', ', '), color='red')
views.children[1].add_surface(selection=epitope_sel.replace(',', ', '), color='red')
views

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%')), NGLWidget(layou…

In [21]:
# Obtain passive from active selection
from biobb_haddock.haddock_restraints.haddock3_passive_from_active import haddock3_passive_from_active

ab_actpass = f'{out_path}{ab_id}_actpass.txt'
ag_actpass = f'{out_path}{ag_id}_actpass.txt'

# Create the actpass for the antibody manually
with open(ab_actpass, 'w') as f:
    f.write( paratope_sel.replace(',', ' ')+'\n\n')

# For the antigen, we will use the epitope selection as the active selection
# and some residues around it as passive
haddock3_passive_from_active( 
    input_pdb_path      = ag_pdb_clean,
    output_actpass_path = ag_actpass,
    properties          = def_dict({'active_list' : epitope_sel}))

0

In [22]:
display_actpass(ag_pdb_clean, ag_actpass)

NGLWidget(layout=Layout(width='100%'))
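
As used in this notebook, an actpass file holds two lines: the active residues on the first and the passive residues on the second, both space-separated (the antibody file written above leaves the second line empty). A minimal reader, assuming exactly that layout, might be sketched as follows; `read_actpass` and the demo file are made up for the example.

```python
import tempfile
from pathlib import Path

def read_actpass(path):
    """Return (active, passive) residue lists from a two-line actpass file."""
    lines = Path(path).read_text().splitlines()
    lines += [''] * (2 - len(lines))            # tolerate a missing second line
    active = [int(x) for x in lines[0].split()]
    passive = [int(x) for x in lines[1].split()]
    return active, passive

with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'demo_actpass.txt'
    demo.write_text('72 73 74\n\n')             # active residues, no passive ones
    active, passive = read_actpass(demo)
    print(active, passive)
```

Reading the file back this way is a quick check that the passive list derived from the active residues is not empty before the restraints are generated.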

#### Defining ambiguous restraints

In [23]:
# Convert active/passive to ambiguous restraints
from biobb_haddock.haddock_restraints.haddock3_actpass_to_ambig import haddock3_actpass_to_ambig

complex_tbl = f'{out_path}ambig-paratope-NMR-epitope.tbl'

haddock3_actpass_to_ambig( 
    input_actpass1_path=ab_actpass,
    input_actpass2_path=ag_actpass,    
    output_tbl_path=complex_tbl,
    properties = def_dict({
        'segid_one': 'A', 
        'segid_two': 'B'}))

0

In [24]:
# Validate tbl
!haddock3-restraints validate_tbl {complex_tbl} --silent

[2025-05-09 15:11:56,302 cli_restraints INFO] [DEPRECATION NOTICE] This command will soon be replaced with `haddock-restraints`


#### Additional restraints for multi-chain proteins

In [25]:
# Tie antibody chains together
from biobb_haddock.haddock_restraints.haddock3_restrain_bodies import haddock3_restrain_bodies

body_tbl = f'{out_path}antibody-unambig.tbl'

haddock3_restrain_bodies( 
    input_structure_path=ab_pdb_clean,
    output_tbl_path=body_tbl,
    properties = def_dict())

0

## Docking

In [63]:
# Redefine the variables from the cells above so the docking steps below can be run on their own
ab_pdb_clean = f'{out_path}{ab_id}_clean.pdb'
ag_pdb_clean = f'{out_path}{ag_id}_clean.pdb'
ref_pdb_clean = f'{out_path}{ref_id}_clean.pdb'
complex_tbl = f'{out_path}ambig-paratope-NMR-epitope.tbl'
body_tbl = f'{out_path}antibody-unambig.tbl'

### Create topology

In [64]:
from biobb_haddock.haddock.topology import topology
step_idx = 0
mol1_output_top_zip_path = f'{out_path}/docking/{step_idx}/top_mol1.zip'
mol2_output_top_zip_path = f'{out_path}/docking/{step_idx}/top_mol2.zip'
wf_topology              = f'{out_path}/docking/{step_idx}/wf_topology.zip'

topology(mol1_input_pdb_path        = ab_pdb_clean,
         mol2_input_pdb_path        = ag_pdb_clean,
         mol1_output_top_zip_path   = mol1_output_top_zip_path,
         mol2_output_top_zip_path   = mol2_output_top_zip_path,
         output_haddock_wf_data_zip = wf_topology,
         properties                 = def_dict())

2025-05-09 15:36:00,394 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.topology Version: 5.0.0
2025-05-09 15:36:00,395 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_046e6c1c-7a39-4e75-ad00-17b3683dadba directory successfully created
2025-05-09 15:36:00,395 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6K_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_046e6c1c-7a39-4e75-ad00-17b3683dadba
2025-05-09 15:36:00,396 [MainThread  ] [INFO ]  Copy: ./data/antibody/4I1B_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_046e6c1c-7a39-4e75-ad00-17b3683dadba
2025-05-09 15:36:00,396 [MainThread  ] [INFO ]  haddock3 72d4d943-36fb-4140-905d-aa872dea76ac/haddock.cfg

2025-05-09 15:36:02,541 [MainThread  ] [INFO ]  Executing: haddock3 72d4d943-36fb-4140-905d-aa872dea76ac/haddock.cfg...
2025-05-09 15:36:02,542 [MainThread  ] [INFO ]  Exit code: 0
2025-05-09 15:36:02,542 [MainThread  ] [INFO ]  [2025-05-09 15:36:00,949 cli INFO]

0

### Rigid Body sampling

In [65]:
from biobb_haddock.haddock.rigid_body import rigid_body

properties={
    'cfg': {
        'tolerance': 2,
        'sampling': 100, # 1000
    }
}

step_idx = 1
docking_output_zip_path = f'{out_path}docking/{step_idx}/docking.zip'
wf_rigidbody            = f'{out_path}docking/{step_idx}/wf_rigidbody.zip'

rigid_body(input_haddock_wf_data_zip     = wf_topology,
           docking_output_zip_path       = docking_output_zip_path,
           ambig_restraints_table_path   = complex_tbl,
           unambig_restraints_table_path = body_tbl,
           output_haddock_wf_data_zip    = wf_rigidbody,
           properties                    = def_dict(properties))

2025-05-09 15:36:02,601 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.rigid_body Version: 5.0.0
2025-05-09 15:36:02,602 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_0b9346e5-426b-4523-9c19-ed53463a3188 directory successfully created
2025-05-09 15:36:02,603 [MainThread  ] [INFO ]  Copy: ./data/antibody/ambig-paratope-NMR-epitope.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_0b9346e5-426b-4523-9c19-ed53463a3188
2025-05-09 15:36:02,603 [MainThread  ] [INFO ]  Copy: ./data/antibody/antibody-unambig.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_0b9346e5-426b-4523-9c19-ed53463a3188
2025-05-09 15:36:02,606 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/0/wf_topology.zip
2025-05-09 15:36:02,607 [MainThread  ] [INFO ]  to:
2025-05-09 15:36:02,607 [MainThread  ] [INFO ]  ['86115f54-c007-424c-a367-b9b32d64ce77/0_topoaa', '86115f54-c007-424c-a367-b9b32d64ce77/analysis', '

0
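
Each docking step in this notebook follows the same file pattern: it writes a result archive and a workflow archive into `docking/<step_idx>/`, and the workflow archive becomes `input_haddock_wf_data_zip` of the next step. The sketch below only builds those two paths; `step_paths` is a hypothetical convenience helper, not a biobb function.

```python
def step_paths(out_path, step_idx, result_name, wf_name):
    """Return (result_zip, workflow_zip) for one docking step."""
    step_dir = f'{out_path}docking/{step_idx}/'
    return f'{step_dir}{result_name}.zip', f'{step_dir}wf_{wf_name}.zip'

# Same paths as the rigid-body cell above (out_path ends with a slash).
docking_zip, wf_zip = step_paths('./data/antibody/', 1, 'docking', 'rigidbody')
print(docking_zip)
print(wf_zip)
```

Keeping one folder per `step_idx` means a step can be re-run without touching the archives of the steps before it.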

### CAPRI evaluation

In [66]:
from biobb_haddock.haddock.capri_eval import capri_eval

step_idx = 2
output_evaluation_zip_path = f'{out_path}docking/{step_idx}/caprieval.zip'
wf_caprieval               = f'{out_path}docking/{step_idx}/wf_caprieval.zip'

capri_eval(input_haddock_wf_data_zip  = wf_rigidbody,
           reference_pdb_path         = ref_pdb_clean,
           output_evaluation_zip_path = output_evaluation_zip_path,
           output_haddock_wf_data_zip = wf_caprieval,
           properties                 = def_dict())

2025-05-09 15:40:08,602 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.capri_eval Version: 5.0.0
2025-05-09 15:40:08,603 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_00279b5e-6d04-4bb9-9342-0e205433390e directory successfully created
2025-05-09 15:40:08,604 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6M_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_00279b5e-6d04-4bb9-9342-0e205433390e
2025-05-09 15:40:08,647 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/1/wf_rigidbody.zip
2025-05-09 15:40:08,647 [MainThread  ] [INFO ]  to:
2025-05-09 15:40:08,647 [MainThread  ] [INFO ]  ['b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/0_topoaa', 'b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/1_rigidbody', 'b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/analysis', 'b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/data', 'b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/traceback', 'b191a34d-38c4-40f5-9ac0-f3cb9947ba4e/log', 'b191a34d-

0
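
The internal layout of the evaluation archive is not shown in this notebook. Assuming it contains one or more tab-separated tables (haddock3's CAPRI evaluation is expected to write `.tsv` reports), a generic way to load them with the standard library is sketched below. The archive, the column names and the values in the demo are invented for illustration.

```python
import csv
import io
import tempfile
import zipfile
from pathlib import Path

def read_tsv_tables(zip_path):
    """Return {member name: list of row dicts} for every .tsv file in a zip."""
    tables = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith('.tsv'):
                text = zf.read(name).decode('utf-8')
                # Drop comment lines, if any, before parsing the table.
                body = '\n'.join(l for l in text.splitlines()
                                 if not l.startswith('#'))
                tables[name] = list(csv.DictReader(io.StringIO(body),
                                                   delimiter='\t'))
    return tables

# Synthetic archive with made-up columns, for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / 'demo_eval.zip'
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('demo/table.tsv',
                    'model\tscore\nm1.pdb\t-1.5\nm2.pdb\t-0.7\n')
    tables = read_tsv_tables(archive)
    best = min(tables['demo/table.tsv'], key=lambda row: float(row['score']))
    print(best['model'])
```

With the real archive, listing `tables.keys()` first shows which reports are available before any column is relied upon.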

### Select Top structures

In [67]:
from biobb_haddock.haddock.sele_top import sele_top

properties={
    'cfg': {
        'select': 25,
    }
}

step_idx = 3
output_selection_zip_path = f'{out_path}docking/{step_idx}/selected.zip'
wf_seletop                = f'{out_path}docking/{step_idx}/wf_seletop.zip'

sele_top(input_haddock_wf_data_zip  = wf_caprieval,
         output_selection_zip_path  = output_selection_zip_path,
         output_haddock_wf_data_zip = wf_seletop,
         properties                 = def_dict(properties))

2025-05-09 15:40:16,121 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.sele_top Version: 5.0.0
2025-05-09 15:40:16,122 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_d98e7d75-0df8-4b2b-a901-db106f8b270a directory successfully created
2025-05-09 15:40:16,170 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/2/wf_caprieval.zip
2025-05-09 15:40:16,171 [MainThread  ] [INFO ]  to:
2025-05-09 15:40:16,171 [MainThread  ] [INFO ]  ['39dce18d-2f09-4232-840d-ff07ceafe4f0/0_topoaa', '39dce18d-2f09-4232-840d-ff07ceafe4f0/1_rigidbody', '39dce18d-2f09-4232-840d-ff07ceafe4f0/2_caprieval', '39dce18d-2f09-4232-840d-ff07ceafe4f0/analysis', '39dce18d-2f09-4232-840d-ff07ceafe4f0/data', '39dce18d-2f09-4232-840d-ff07ceafe4f0/traceback', '39dce18d-2f09-4232-840d-ff07ceafe4f0/log', '39dce18d-2f09-4232-840d-ff07ceafe4f0/traceback/consensus.tsv', '39dce18d-2f09-4232-840d-ff07ceafe4f0/traceback/traceback.tsv', '39dce18d-2

0

### 2nd CAPRI evaluation

In [68]:
from biobb_haddock.haddock.capri_eval import capri_eval

step_idx = 4
output_evaluation_zip_path2 = f'{out_path}docking/{step_idx}/caprieval2.zip'
wf_caprieval2               = f'{out_path}docking/{step_idx}/wf_caprieval2.zip'

capri_eval(input_haddock_wf_data_zip  = wf_seletop,
           reference_pdb_path         = ref_pdb_clean,
           output_evaluation_zip_path = output_evaluation_zip_path2,
           output_haddock_wf_data_zip = wf_caprieval2,
           properties                 = def_dict())

2025-05-09 15:40:20,920 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.capri_eval Version: 5.0.0
2025-05-09 15:40:20,921 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_c4236868-df28-4297-8ef9-d7b7da36c557 directory successfully created
2025-05-09 15:40:20,921 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6M_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_c4236868-df28-4297-8ef9-d7b7da36c557
2025-05-09 15:40:20,971 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/3/wf_seletop.zip
2025-05-09 15:40:20,971 [MainThread  ] [INFO ]  to:
2025-05-09 15:40:20,972 [MainThread  ] [INFO ]  ['fc0fc388-5dcb-4618-b587-4272a477dc2b/0_topoaa', 'fc0fc388-5dcb-4618-b587-4272a477dc2b/1_rigidbody', 'fc0fc388-5dcb-4618-b587-4272a477dc2b/2_caprieval', 'fc0fc388-5dcb-4618-b587-4272a477dc2b/3_seletop', 'fc0fc388-5dcb-4618-b587-4272a477dc2b/analysis', 'fc0fc388-5dcb-4618-b587-4272a477dc2b/data', 'fc0

0

### Flexible Refinement

In [69]:
from biobb_haddock.haddock.flex_ref import flex_ref

step_idx = 5
refinement_output_zip_path = f'{out_path}docking/{step_idx}/flexref.zip'
wf_flexref                 = f'{out_path}docking/{step_idx}/wf_flexref.zip'

flex_ref(input_haddock_wf_data_zip     = wf_caprieval2,
         refinement_output_zip_path    = refinement_output_zip_path,
         ambig_restraints_table_path   = complex_tbl,
         unambig_restraints_table_path = body_tbl,
         output_haddock_wf_data_zip    = wf_flexref,
         properties                    = def_dict())

2025-05-09 15:40:25,805 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.flex_ref Version: 5.0.0
2025-05-09 15:40:25,806 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_1157d742-c359-429a-847a-d9943775008f directory successfully created
2025-05-09 15:40:25,807 [MainThread  ] [INFO ]  Copy: ./data/antibody/ambig-paratope-NMR-epitope.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_1157d742-c359-429a-847a-d9943775008f
2025-05-09 15:40:25,807 [MainThread  ] [INFO ]  Copy: ./data/antibody/antibody-unambig.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_1157d742-c359-429a-847a-d9943775008f
2025-05-09 15:40:25,867 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/4/wf_caprieval2.zip
2025-05-09 15:40:25,868 [MainThread  ] [INFO ]  to:
2025-05-09 15:40:25,868 [MainThread  ] [INFO ]  ['23131092-6ec3-4bb8-91df-de2d747cbb8a/0_topoaa', '23131092-6ec3-4bb8-91df-de2d747cbb8a/1_rigidbody'

0

### 3rd CAPRI evaluation

In [70]:
from biobb_haddock.haddock.capri_eval import capri_eval

step_idx = 6
output_evaluation_zip_path3 = f'{out_path}docking/{step_idx}/caprieval3.zip'
wf_caprieval3               = f'{out_path}docking/{step_idx}/wf_caprieval3.zip'

capri_eval(input_haddock_wf_data_zip  = wf_flexref,
           reference_pdb_path         = ref_pdb_clean,
           output_evaluation_zip_path = output_evaluation_zip_path3,
           output_haddock_wf_data_zip = wf_caprieval3,
           properties                 = def_dict())

2025-05-09 15:47:37,862 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.capri_eval Version: 5.0.0
2025-05-09 15:47:37,863 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_302fa444-c40b-4c3c-b104-665f7234daa0 directory successfully created
2025-05-09 15:47:37,864 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6M_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_302fa444-c40b-4c3c-b104-665f7234daa0
2025-05-09 15:47:38,000 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/5/wf_flexref.zip
2025-05-09 15:47:38,000 [MainThread  ] [INFO ]  to:
2025-05-09 15:47:38,001 [MainThread  ] [INFO ]  ['fa9ce354-fb17-4e64-856a-8c0e8b82d271/0_topoaa', 'fa9ce354-fb17-4e64-856a-8c0e8b82d271/1_rigidbody', 'fa9ce354-fb17-4e64-856a-8c0e8b82d271/2_caprieval', 'fa9ce354-fb17-4e64-856a-8c0e8b82d271/3_seletop', 'fa9ce354-fb17-4e64-856a-8c0e8b82d271/4_caprieval', 'fa9ce354-fb17-4e64-856a-8c0e8b82d271/5_flexre

0

### Energy Minimization Refinement

In [71]:
from biobb_haddock.haddock.em_ref import em_ref

step_idx = 7
refinement_output_zip_path = f'{out_path}docking/{step_idx}/emref.zip'
wf_emref                   = f'{out_path}docking/{step_idx}/wf_emref.zip'

em_ref(input_haddock_wf_data_zip     = wf_caprieval3,
       refinement_output_zip_path    = refinement_output_zip_path,
       ambig_restraints_table_path   = complex_tbl,
       unambig_restraints_table_path = body_tbl,
       output_haddock_wf_data_zip    = wf_emref,
       properties                    = def_dict())

2025-05-09 15:47:43,184 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.em_ref Version: 5.0.0
2025-05-09 15:47:43,184 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_89a2a401-0cc9-40e0-a77d-cca9ce061b43 directory successfully created
2025-05-09 15:47:43,185 [MainThread  ] [INFO ]  Copy: ./data/antibody/ambig-paratope-NMR-epitope.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_89a2a401-0cc9-40e0-a77d-cca9ce061b43
2025-05-09 15:47:43,186 [MainThread  ] [INFO ]  Copy: ./data/antibody/antibody-unambig.tbl to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_89a2a401-0cc9-40e0-a77d-cca9ce061b43
2025-05-09 15:47:43,314 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/6/wf_caprieval3.zip
2025-05-09 15:47:43,315 [MainThread  ] [INFO ]  to:
2025-05-09 15:47:43,315 [MainThread  ] [INFO ]  ['6f1c725a-3137-4e09-ae2d-39f4fdb8bcf8/0_topoaa', '6f1c725a-3137-4e09-ae2d-39f4fdb8bcf8/1_rigidbody', 

0

### 4th CAPRI evaluation

In [72]:
from biobb_haddock.haddock.capri_eval import capri_eval

step_idx = 8
output_evaluation_zip_path4 = f'{out_path}docking/{step_idx}/caprieval4.zip'
wf_caprieval4               = f'{out_path}docking/{step_idx}/wf_caprieval4.zip'

capri_eval(input_haddock_wf_data_zip  = wf_emref,
           reference_pdb_path         = ref_pdb_clean,
           output_evaluation_zip_path = output_evaluation_zip_path4,
           output_haddock_wf_data_zip = wf_caprieval4,
           properties                 = def_dict())

2025-05-09 15:48:23,500 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.capri_eval Version: 5.0.0
2025-05-09 15:48:23,500 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_8f195e59-69f4-4ae0-8194-71f437c53496 directory successfully created
2025-05-09 15:48:23,501 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6M_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_8f195e59-69f4-4ae0-8194-71f437c53496
2025-05-09 15:48:23,658 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/7/wf_emref.zip
2025-05-09 15:48:23,658 [MainThread  ] [INFO ]  to:
2025-05-09 15:48:23,659 [MainThread  ] [INFO ]  ['8d03a87f-2039-4b49-b800-125539d32735/0_topoaa', '8d03a87f-2039-4b49-b800-125539d32735/1_rigidbody', '8d03a87f-2039-4b49-b800-125539d32735/2_caprieval', '8d03a87f-2039-4b49-b800-125539d32735/3_seletop', '8d03a87f-2039-4b49-b800-125539d32735/4_caprieval', '8d03a87f-2039-4b49-b800-125539d32735/5_flexref'

0

### Clustering using FCC

In [73]:
from biobb_haddock.haddock.clust_fcc import clust_fcc

step_idx = 9
output_cluster_zip_path = f'{out_path}docking/{step_idx}/clustfcc.zip'
wf_clustfcc             = f'{out_path}docking/{step_idx}/wf_clustfcc.zip'

clust_fcc(input_haddock_wf_data_zip = wf_caprieval4,
         output_cluster_zip_path    = output_cluster_zip_path,
         output_haddock_wf_data_zip = wf_clustfcc,
         properties                 = def_dict())

2025-05-09 15:48:28,854 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.clust_fcc Version: 5.0.0
2025-05-09 15:48:28,854 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_0e2d6340-7d4b-45f8-b5a7-5bf857fa7f4c directory successfully created
2025-05-09 15:48:29,032 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/8/wf_caprieval4.zip
2025-05-09 15:48:29,033 [MainThread  ] [INFO ]  to:
2025-05-09 15:48:29,033 [MainThread  ] [INFO ]  ['de8ad066-3373-47b9-b071-83aa9a92909d/0_topoaa', 'de8ad066-3373-47b9-b071-83aa9a92909d/1_rigidbody', 'de8ad066-3373-47b9-b071-83aa9a92909d/2_caprieval', 'de8ad066-3373-47b9-b071-83aa9a92909d/3_seletop', 'de8ad066-3373-47b9-b071-83aa9a92909d/4_caprieval', 'de8ad066-3373-47b9-b071-83aa9a92909d/5_flexref', 'de8ad066-3373-47b9-b071-83aa9a92909d/6_caprieval', 'de8ad066-3373-47b9-b071-83aa9a92909d/7_emref', 'de8ad066-3373-47b9-b071-83aa9a92909d/8_caprieval', 'de8ad066-3373-47b9-b

0

### Selecting top clusters

In [74]:
from biobb_haddock.haddock.sele_top_clusts import sele_top_clusts

properties={
    'cfg': {
        'top_models': 4,
    },
}

step_idx = 10
output_seletopclusts_zip_path = f'{out_path}docking/{step_idx}/seletopclusts.zip'
wf_seletopclusts              = f'{out_path}docking/{step_idx}/wf_seletopclusts.zip'

sele_top_clusts(input_haddock_wf_data_zip  = wf_clustfcc,
                output_selection_zip_path  = output_seletopclusts_zip_path,
                output_haddock_wf_data_zip = wf_seletopclusts,
                properties                 = def_dict(properties))

2025-05-09 15:48:34,400 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.sele_top_clusts Version: 5.0.0
2025-05-09 15:48:34,401 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_a54ac9b5-26e4-4b31-a90e-578fc937e550 directory successfully created
2025-05-09 15:48:34,591 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/9/wf_clustfcc.zip
2025-05-09 15:48:34,592 [MainThread  ] [INFO ]  to:
2025-05-09 15:48:34,592 [MainThread  ] [INFO ]  ['12dde670-7d12-4481-948e-97472427c8fc/0_topoaa', '12dde670-7d12-4481-948e-97472427c8fc/1_rigidbody', '12dde670-7d12-4481-948e-97472427c8fc/2_caprieval', '12dde670-7d12-4481-948e-97472427c8fc/3_seletop', '12dde670-7d12-4481-948e-97472427c8fc/4_caprieval', '12dde670-7d12-4481-948e-97472427c8fc/5_flexref', '12dde670-7d12-4481-948e-97472427c8fc/6_caprieval', '12dde670-7d12-4481-948e-97472427c8fc/7_emref', '12dde670-7d12-4481-948e-97472427c8fc/8_caprieval', '12dde670-7d12-44

0
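
As an illustration of what a `top_models` value of 4 means, the sketch below keeps the best-scoring models of each cluster from a small hand-made list. The model names, cluster IDs and scores are hypothetical, and the helper `top_models_per_cluster` is not part of biobb_haddock or HADDOCK3; it only assumes that a lower HADDOCK score is better.

```python
from collections import defaultdict

def top_models_per_cluster(models, top_models=4):
    """Keep the `top_models` best-scoring (lowest score) models of each cluster.

    `models` is an iterable of (name, cluster_id, score) tuples (hypothetical layout).
    """
    by_cluster = defaultdict(list)
    for name, cluster_id, score in models:
        by_cluster[cluster_id].append((score, name))
    selected = {}
    for cluster_id, members in by_cluster.items():
        members.sort()  # ascending score: best model first
        selected[cluster_id] = [name for _, name in members[:top_models]]
    return selected

toy_models = [
    ("model_a", 1, -110.2), ("model_b", 1, -98.7), ("model_c", 1, -120.5),
    ("model_d", 2, -87.1), ("model_e", 2, -91.4),
]
selection = top_models_per_cluster(toy_models, top_models=2)
print(selection)
```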

### 5th CAPRI evaluation

In [75]:
from biobb_haddock.haddock.capri_eval import capri_eval

step_idx = 11
output_evaluation_zip_path5 = f'{out_path}docking/{step_idx}/caprieval5.zip'
wf_caprieval5               = f'{out_path}docking/{step_idx}/wf_caprieval5.zip'

capri_eval(input_haddock_wf_data_zip  = wf_seletopclusts,
           reference_pdb_path         = ref_pdb_clean,
           output_evaluation_zip_path = output_evaluation_zip_path5,
           output_haddock_wf_data_zip = wf_caprieval5,
           properties                 = def_dict())

2025-05-09 15:48:59,742 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.capri_eval Version: 5.0.0
2025-05-09 15:48:59,743 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_b61524d0-bf78-45a4-964a-953c633f91d1 directory successfully created
2025-05-09 15:48:59,744 [MainThread  ] [INFO ]  Copy: ./data/antibody/4G6M_clean.pdb to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_b61524d0-bf78-45a4-964a-953c633f91d1
2025-05-09 15:48:59,919 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/10/wf_seletopclusts.zip
2025-05-09 15:48:59,920 [MainThread  ] [INFO ]  to:
2025-05-09 15:48:59,920 [MainThread  ] [INFO ]  ['a6875993-a803-488c-b977-c0133a6b4b00/00_topoaa', 'a6875993-a803-488c-b977-c0133a6b4b00/01_rigidbody', 'a6875993-a803-488c-b977-c0133a6b4b00/02_caprieval', 'a6875993-a803-488c-b977-c0133a6b4b00/03_seletop', 'a6875993-a803-488c-b977-c0133a6b4b00/04_caprieval', 'a6875993-a803-488c-b977-c0133a6b4

0

### Contacts analysis

In [76]:
from biobb_haddock.haddock.contact_map import contact_map

step_idx = 12
output_contactmap_zip_path = f'{out_path}docking/{step_idx}/contact_map.zip'
wf_contact_map             = f'{out_path}docking/{step_idx}/wf_contact_map.zip'

contact_map(input_haddock_wf_data_zip  = wf_caprieval5,
            output_contactmap_zip_path = output_contactmap_zip_path,
            output_haddock_wf_data_zip = wf_contact_map,
            properties                 = def_dict())

2025-05-09 15:49:05,647 [MainThread  ] [INFO ]  Module: biobb_haddock.haddock.contact_map Version: 5.0.0
2025-05-09 15:49:05,648 [MainThread  ] [INFO ]  /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_a881af6d-9433-4478-9ded-1fe7b0f5dc9d directory successfully created
2025-05-09 15:49:05,664 [MainThread  ] [INFO ]  Copy: ./data/antibody/docking/11/wf_caprieval5.zip to /home/rchaves/repo/ab_design/biobb_wf_haddock/sandbox_a881af6d-9433-4478-9ded-1fe7b0f5dc9d
2025-05-09 15:49:05,823 [MainThread  ] [INFO ]  Extracting: /home/rchaves/repo/ab_design/biobb_wf_haddock/data/antibody/docking/11/wf_caprieval5.zip
2025-05-09 15:49:05,824 [MainThread  ] [INFO ]  to:
2025-05-09 15:49:05,824 [MainThread  ] [INFO ]  ['a865195a-d9b5-401b-bc4f-0033b027786c/00_topoaa', 'a865195a-d9b5-401b-bc4f-0033b027786c/01_rigidbody', 'a865195a-d9b5-401b-bc4f-0033b027786c/02_caprieval', 'a865195a-d9b5-401b-bc4f-0033b027786c/03_seletop', 'a865195a-d9b5-401b-bc4f-0033b027786c/04_caprieval', 'a865195a-d9b5-401b-bc

0

## Results

In [81]:
step_idx = 12
output_contactmap_zip_path = f'{out_path}docking/{step_idx}/contact_map.zip'
wf_contact_map             = f'{out_path}docking/{step_idx}/wf_contact_map.zip'


with zipfile.ZipFile(wf_contact_map, 'r') as zip_ref:
    zip_ref.extractall(out_path+'/final_results')

In [82]:
webbrowser.open("http://0.0.0.0:8000/data/antibody/final_results/analysis/12_contactmap_analysis/report.html")
!python3 -m http.server

Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [09/May/2025 16:04:34] "GET /data/antibody/final_results/analysis/12_contactmap_analysis/report.html HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:04:40] "GET /data/antibody/final_results/analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:04:48] "GET /data/antibody/final_results/analysis/9_clustfcc_analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:04:51] "GET /data/antibody/final_results/analysis/09_clustfcc_analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:04:56] "GET /data/antibody/final_results/analysis/9_clustfcc_analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:05:05] "GET /data/antibody/final_results/analysis/09_clustfcc_analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:05:08] "GET /data/antibody/final_results/analysis/08_caprieval_analysis/ HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 16:05:10] "GET /data/antibody/final_results/analysis/08_caprieval_analysis/report.html HTTP/1.1

In [None]:
tsv_dir = output_haddock_wf_data_zip[:-4]+'/8_caprieval/'
# Load the cluster and single data into pandas DataFrames
cluster_df = pd.read_csv(tsv_dir + 'capri_clt.tsv', sep='\t',comment='#')
single_df = pd.read_csv(tsv_dir + 'capri_ss.tsv', sep='\t',comment='#')

# DockQ: incorrect (<0.23), acceptable (0.23-0.49), medium (0.49-0.80), and high (>=0.80) 
display(single_df.head())
single_df = single_df.sort_values(by='dockq', ascending=False)
display(single_df.head())
display(cluster_df.head())

# Barnase-Barstar protein complex

In [3]:
# Barnase-Barstar protein complex
# From Chen, R., Mintseris, J., Janin, J. and Weng, Z. (2003)
# A protein–protein docking benchmark. 
# Proteins, 52: 88-91. https://doi.org/10.1002/prot.10390
barnase_id = "1A2P"
barnase_ch = "B"
barstar_id = "1A19"
barstar_ch = "A"
complex_id = "1BRS" # barnase_barstar_complex
complex_ch = "A,D"
out_path = 'data/barnase_barstar/'

## Prepare pdbs

In [7]:
# Downloading desired PDB files
# Import module
from biobb_io.api.pdb import pdb

# Create properties dict and inputs/outputs
barnase_pdb = f'{out_path}{barnase_id}.pdb'
barstar_pdb = f'{out_path}{barstar_id}.pdb'
complex_pdb = f'{out_path}{complex_id}.pdb'

# Create and launch bb
pdb(output_pdb_path=barnase_pdb, properties=def_dict({'pdb_code': barnase_id}))
pdb(output_pdb_path=barstar_pdb, properties=def_dict({'pdb_code': barstar_id}))
pdb(output_pdb_path=complex_pdb, properties=def_dict({'pdb_code': complex_id}))

0

In [8]:
# These are the pdbs we get from RCSB
show_pdbs([barnase_pdb, barstar_pdb, complex_pdb])

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%')), NGLWidget(layou…

In [9]:
# Filtering specific chains: we need to get rid of repeated chains
from biobb_pdb_tools.pdb_tools.biobb_pdb_selchain import biobb_pdb_selchain

# Create properties dict and inputs/outputs
barnase_pdb_ch = f'{out_path}{barnase_id}_ch.pdb'
barstar_pdb_ch = f'{out_path}{barstar_id}_ch.pdb'
complex_pdb_ch = f'{out_path}{complex_id}_ch.pdb'

# Create and launch bb
biobb_pdb_selchain(input_file_path  = barnase_pdb,
                   output_file_path = barnase_pdb_ch,
                   properties       = def_dict({'chains': barnase_ch}))

biobb_pdb_selchain(input_file_path  = barstar_pdb,
                   output_file_path = barstar_pdb_ch,
                   properties       = def_dict({'chains': barstar_ch}))

biobb_pdb_selchain(input_file_path  = complex_pdb,
                   output_file_path = complex_pdb_ch,
                   properties       = def_dict({'chains': complex_ch}))

0
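
To make the chain-selection step concrete, here is a minimal sketch of the same idea on raw PDB text: the chain identifier sits in column 22 of `ATOM`/`HETATM` records, so filtering comes down to checking one character. The helpers `make_atom_line` and `select_chains` and the toy records are assumptions for illustration; `biobb_pdb_selchain` may treat other record types differently.

```python
def make_atom_line(serial, name, resname, chain, resid):
    """Build the fixed-width prefix of a PDB ATOM record (coordinates omitted)."""
    return f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resid:>4}"

def select_chains(pdb_lines, chains):
    """Keep ATOM/HETATM records whose chain ID (column 22) is listed in `chains`."""
    wanted = set(chains.split(","))
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21] in wanted:
            kept.append(line)
    return kept

toy_pdb = [
    make_atom_line(1, " CA", "ALA", "A", 1),
    make_atom_line(2, " CA", "GLY", "B", 1),
    make_atom_line(3, " CA", "SER", "D", 1),
]
# Same comma-separated form as complex_ch = "A,D"
kept = select_chains(toy_pdb, "A,D")
print([line[21] for line in kept])
```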

In [10]:
# In a real case we don't have a reference showing how the proteins bind each other.
# What information can we use to guide the process?
show_pdbs([barnase_pdb_ch, barstar_pdb_ch, complex_pdb_ch])

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%')), NGLWidget(layou…

## Prepare AIRs

In [11]:
# Solvent accessibility: 
from biobb_haddock.haddock_restraints.haddock3_accessibility import haddock3_accessibility

# Create properties dict and inputs/outputs
barnase_sasa_out = f'{out_path}{barnase_id}_sasa_out.txt'
barstar_sasa_out = f'{out_path}{barstar_id}_sasa_out.txt'
barnase_sasa_actpass = f'{out_path}{barnase_id}_sasa_actpass.txt'
barstar_sasa_actpass = f'{out_path}{barstar_id}_sasa_actpass.txt'

cutoff = 0.3
# Barnase Chain
haddock3_accessibility(
        input_pdb_path            = barnase_pdb_ch,
        output_accessibility_path = barnase_sasa_out,
        output_actpass_path       = barnase_sasa_actpass,
        properties                = def_dict({'chain': barnase_ch,
                                              'cutoff': cutoff}))
# Barstar Chain
haddock3_accessibility(
        input_pdb_path            = barstar_pdb_ch,
        output_accessibility_path = barstar_sasa_out,
        output_actpass_path       = barstar_sasa_actpass,
        properties                = def_dict({'chain': barstar_ch,
                                              'cutoff': cutoff}))

0
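
The sketch below shows how a cutoff such as 0.3 splits residues into exposed and buried ones, assuming the cutoff is a relative accessibility threshold (a fraction between 0 and 1). The residue numbers, the accessibility values and the helper `exposed_residues` are hypothetical; the real calculation is done by `haddock3_accessibility`.

```python
def exposed_residues(rel_accessibility, cutoff=0.3):
    """Return residue numbers whose relative accessibility is at least `cutoff`."""
    return sorted(resid for resid, value in rel_accessibility.items() if value >= cutoff)

# Hypothetical relative accessibilities (0 = fully buried, 1 = fully exposed)
toy_accessibility = {27: 0.62, 58: 0.05, 73: 0.41, 87: 0.30, 102: 0.12}
exposed = exposed_residues(toy_accessibility, cutoff=0.3)
print(exposed)
```

A residue in a buried pocket, like the hypothetical residue 58 here, is dropped by this kind of filter even if it takes part in binding.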

In [12]:
# Careful! Pockets are good places to bind but have low accessibility
view1 = display_actpass(barnase_pdb_ch, barnase_sasa_actpass)
view2 = display_actpass(barstar_pdb_ch, barstar_sasa_actpass)
ipywidgets.HBox([view1, view2])

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%'))))

In [13]:
# Electrostatic energies:
# We see a positive charge in the binding site of barnase and a negative charge in the binding site of barstar
show_pdbs([barnase_pdb_ch, barstar_pdb_ch],surface=True)

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%'))))

In [14]:
# Obtain passive from active selection
from biobb_haddock.haddock_restraints.haddock3_passive_from_active import haddock3_passive_from_active

barnase_pass2act = f'{out_path}{barnase_id}_manual_actpass.txt'
barstar_pass2act = f'{out_path}{barstar_id}_manual_actpass.txt'

haddock3_passive_from_active( 
    input_pdb_path      = barnase_pdb_ch,
    output_actpass_path = barnase_pass2act,
    properties          = def_dict({'active_list' : '27,73,83,87'})
)

haddock3_passive_from_active( 
    input_pdb_path      = barstar_pdb_ch,
    output_actpass_path = barstar_pass2act,
    properties          = def_dict({'active_list' : '33,35,39,43'})
)

0
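
Conceptually, passive residues are surface neighbours of the active ones. The sketch below illustrates that idea with toy coordinates and a plain distance cutoff. Everything in it is an assumption made for illustration: the coordinates, the 6.5 Å radius, the `toy_surface` set and the helper `passive_from_active`. The actual selection is performed by `haddock3_passive_from_active`.

```python
import math

def passive_from_active(coords, active, surface, radius=6.5):
    """Surface residues within `radius` of any active residue, excluding the actives."""
    passive = set()
    for resid in surface:
        if resid in active:
            continue
        if any(math.dist(coords[resid], coords[a]) <= radius for a in active):
            passive.add(resid)
    return sorted(passive)

# Hypothetical residue positions (one point per residue, in Å)
toy_coords = {
    27: (0.0, 0.0, 0.0),
    28: (3.8, 0.0, 0.0),
    29: (7.6, 0.0, 0.0),
    60: (0.0, 5.0, 0.0),   # close to 27 but not on the surface
    90: (20.0, 0.0, 0.0),
}
toy_surface = {27, 28, 29, 90}
passive = passive_from_active(toy_coords, active={27}, surface=toy_surface)
print(passive)
```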

In [15]:
view1 = display_actpass(barnase_pdb_ch, barnase_pass2act, opacity=0.3)
view2 = display_actpass(barstar_pdb_ch, barstar_pass2act, opacity=0.3)
ipywidgets.HBox([view1, view2])

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%'))))

In [16]:
# Convert active/passive to ambiguous restraints
from biobb_haddock.haddock_restraints.haddock3_actpass_to_ambig import haddock3_actpass_to_ambig

# With SASA
barnase_barstar_sasa_tbl = f'{out_path}barnase_barstar_sasa.tbl'
haddock3_actpass_to_ambig( 
    input_actpass1_path = barnase_sasa_actpass,
    input_actpass2_path = barstar_sasa_actpass,    
    output_tbl_path     = barnase_barstar_sasa_tbl,
    properties          = def_dict({'pass_to_act' : True,  # the tbl needs active residues, so the passive ones are treated as active
                                    'segid_one': barnase_ch, 
                                    'segid_two': barstar_ch}))

# With manual active/passive
barnase_barstar_manual_tbl = f'{out_path}barnase_barstar_manual.tbl'
haddock3_actpass_to_ambig( 
    input_actpass1_path = barnase_pass2act,
    input_actpass2_path = barstar_pass2act,    
    output_tbl_path     = barnase_barstar_manual_tbl,
    properties          = def_dict({'segid_one': barnase_ch,
                                    'segid_two': barstar_ch}))

# Each restraint has the following format:
# assign (selection1) (selection2) distance, lower-bound correction, upper-bound correction

0
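
To show what one entry of such a table can look like, the sketch below assembles an `assign` statement for a single active residue against a list of partner residues. The helper `air_statement` is hypothetical and its layout only approximates what `haddock3_actpass_to_ambig` writes; the values 2.0, 2.0 and 0.0 (distance, lower-bound correction, upper-bound correction) are taken here as an assumption.

```python
def air_statement(active_resid, active_segid, partner_resids, partner_segid,
                  distance=2.0, lower=2.0, upper=0.0):
    """Format one ambiguous restraint in `assign` syntax (illustrative layout)."""
    partners = "\n        or\n".join(
        f"       (resi {resid} and segid {partner_segid})" for resid in partner_resids
    )
    return (
        f"assign (resi {active_resid} and segid {active_segid})\n"
        f"(\n{partners}\n) {distance:.1f} {lower:.1f} {upper:.1f}"
    )

# Barnase residue 27 (segid B) against the barstar active list (segid A)
statement = air_statement(27, "B", [33, 35, 39, 43], "A")
print(statement)
```

With these numbers the allowed distance range runs from `distance - lower` to `distance + upper`, that is from 0 to 2 Å.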

In [17]:
# Validate tbl
!haddock3-restraints validate_tbl {barnase_barstar_sasa_tbl} --silent
!haddock3-restraints validate_tbl {barnase_barstar_manual_tbl} --silent

[2025-05-09 12:48:51,024 cli_restraints INFO] [DEPRECATION NOTICE] This command will soon be replaced with `haddock-restraints`
[2025-05-09 12:48:51,574 cli_restraints INFO] [DEPRECATION NOTICE] This command will soon be replaced with `haddock-restraints`


## Docking

In [11]:
barnase_pdb_ch = f'{out_path}{barnase_id}_ch.pdb'
barstar_pdb_ch = f'{out_path}{barstar_id}_ch.pdb'
complex_pdb_ch = f'{out_path}{complex_id}_ch.pdb'
barnase_barstar_manual_tbl = 'data/barnase_barstar/barnase_barstar_manual.tbl'

### 0. Topology

In [6]:
from biobb_haddock.haddock.topology import topology

properties=def_dict({
    'cfg': {
        'tolerance': 0,
    },
})

step_idx = 0
barnase_top_zip_path = f'{out_path}{step_idx}/barnase_top.zip'
barstar_top_zip_path = f'{out_path}{step_idx}/barstar_top.zip'
wf_topology          = f'{out_path}{step_idx}/wf_topology.zip'

topology(mol1_input_pdb_path        = barnase_pdb_ch,
         mol2_input_pdb_path        = barstar_pdb_ch,
         mol1_output_top_zip_path   = barnase_top_zip_path,
         mol2_output_top_zip_path   = barstar_top_zip_path,
         output_haddock_wf_data_zip = wf_topology,
         properties                 = properties)

0

### 1. Rigid body docking

In [9]:
from biobb_haddock.haddock.rigid_body import rigid_body

properties=def_dict({
    'cfg': {
        'tolerance': 5,
        'sampling': 10,
        # set to True to turn on random definition of AIRs
        'ranair': False
    },
})

step_idx = 1
docking_output_zip_path = f'{out_path}{step_idx}/docking.zip'
wf_rigidbody            = f'{out_path}{step_idx}/wf_rigidbody.zip'

rigid_body(input_haddock_wf_data_zip   = wf_topology,
           docking_output_zip_path     = docking_output_zip_path,
           ambig_restraints_table_path = barnase_barstar_manual_tbl,
           output_haddock_wf_data_zip  = wf_rigidbody,
           properties                  = properties)

0

In [20]:
folder = docking_output_zip_path[:-4]
# Start from an empty folder so files from a previous run are not mixed in
if os.path.exists(folder):
    shutil.rmtree(folder)
os.makedirs(folder)

with zipfile.ZipFile(docking_output_zip_path, 'r') as zip_ref:
    zip_ref.extractall(folder)

In [21]:
import pytraj as pt
import glob

# Get all PDB files of the rigid-body step and sort them
pdb_dir = "data/barnase_barstar/1/docking/"
pdb_files = sorted(glob.glob(f"{pdb_dir}/*.pdb.gz"))
def show_aligned(chain):
    # Create a trajectory with one frame per PDB file
    traj = pt.iterload(pdb_files, top=pdb_files[0])
    # pt.write_traj(f"{pdb_dir}/combined_{chain}_aligned.dcd", traj, overwrite=True)
    # Superimpose every frame on the first one, fitting on the atoms of the given chain
    pt.align(traj, ref=0, mask=f'::{chain}')
    # Save the aligned models as a multi-model PDB
    traj.save(f"{pdb_dir}/combined_{chain}_aligned_clust.pdb", options="model", overwrite=True)
    view = nv.show_pytraj(traj)
    view.layout.width = '100%'
    return view

In [22]:
view1 = show_aligned('B') # barnase
view2 = show_aligned('A') # barstar

# Display the viewer
ipywidgets.HBox([view1, view2])

HBox(children=(NGLWidget(layout=Layout(width='100%'), max_frame=99), NGLWidget(layout=Layout(width='100%'), ma…

In [23]:
view1 = nv.show_structure_file(f"{pdb_dir}/combined_A_aligned_clust.pdb", default_representation=False)
view2 = nv.show_structure_file(f"{pdb_dir}/combined_B_aligned_clust.pdb", default_representation=False)
view1.add_ribbon(color='chainIndex')
view2.add_ribbon(color='chainIndex')
view1.layout.width = '100%'
view2.layout.width = '100%'
# Display the viewer
box = ipywidgets.HBox([view1, view2])
display(box)
# Create a dropdown widget
opts = ['All']
opts.extend([pdb_file.split('/')[-1].split('.')[0] for pdb_file in pdb_files])
mdsel = ipywidgets.Dropdown(
    options=opts,
    description='Sel. model:',
    disabled=False,
)
display(mdsel)

def on_dropdown_change(change):
    """Handle dropdown selection changes.
    From https://github.com/nglviewer/nglview/issues/765
    """
    if change['type'] == 'change' and change['name'] == 'value': 
        selected_file = change['new']
        if selected_file=='All':
            view1._remote_call('setSelection', target='compList', args=["*"], 
               kwargs=dict(component_index=0))
            view2._remote_call('setSelection', target='compList', args=["*"], 
               kwargs=dict(component_index=0))
        else:
            # Extract model number from the filename
            model_num = selected_file.split('_')[1]
            print(f"Selected model: {model_num}")
            # Update the view with the selected model
            view1._remote_call('setSelection', target='compList', 
                            args=[f"/{model_num}"], 
                            kwargs=dict(component_index=0))
            # You can also update view2 if needed
            view2._remote_call('setSelection', target='compList', 
                            args=[f"/{model_num}"], 
                            kwargs=dict(component_index=0))

# Register the callback function
mdsel.observe(on_dropdown_change, names='value')

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%'))))

Dropdown(description='Sel. model:', options=('All', 'rigidbody_1', 'rigidbody_10', 'rigidbody_100', 'rigidbody…
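
The dropdown output above lists the models as `rigidbody_1`, `rigidbody_10`, `rigidbody_100`, and so on, because `sorted()` on file names is lexicographic. If the files should follow the numeric model order instead, a natural sort key can be used. This is a minimal sketch; the helper `model_number` and the example file names are assumptions based on the `rigidbody_<n>.pdb.gz` pattern seen in this notebook.

```python
import re
from pathlib import Path

def model_number(path):
    """Extract the trailing integer from names like 'rigidbody_10.pdb.gz'."""
    match = re.search(r"_(\d+)\.pdb(\.gz)?$", Path(path).name)
    if match is None:
        raise ValueError(f"No model number found in {path!r}")
    return int(match.group(1))

names = ["rigidbody_10.pdb.gz", "rigidbody_1.pdb.gz",
         "rigidbody_100.pdb.gz", "rigidbody_2.pdb.gz"]
lexicographic = sorted(names)                 # plain string ordering
numeric = sorted(names, key=model_number)     # ordering by model number
print(lexicographic)
print(numeric)
```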

### 2. CAPRI eval

In [12]:
from biobb_haddock.haddock.capri_eval import capri_eval

output_evaluation_zip_path = f'{out_path}2/caprieval.zip'
wf_caprieval               = f'{out_path}2/wf_caprieval.zip'

capri_eval(input_haddock_wf_data_zip  = wf_rigidbody,
           reference_pdb_path         = complex_pdb_ch,
           output_evaluation_zip_path = output_evaluation_zip_path,
           output_haddock_wf_data_zip = wf_caprieval,
           properties = def_dict())

0

In [25]:
with zipfile.ZipFile(wf_caprieval, 'r') as zip_ref:
    zip_ref.extractall(wf_caprieval[:-4])
    
webbrowser.open(f"http://0.0.0.0:8000/{wf_caprieval[:-4]}/analysis/2_caprieval_analysis/report.html")
!python3 -m http.server

Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [09/May/2025 12:51:25] "GET /data/barnase_barstar/2/wf_caprieval/analysis/2_caprieval_analysis/report.html HTTP/1.1" 200 -
127.0.0.1 - - [09/May/2025 12:51:26] code 404, message File not found
127.0.0.1 - - [09/May/2025 12:51:26] "GET /favicon.ico HTTP/1.1" 404 -

Keyboard interrupt received, exiting.
^C
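
Running `!python3 -m http.server` blocks the cell until it is interrupted, which is why the output above ends with a keyboard interrupt. One alternative, sketched here with the standard library only, is to serve the directory from a background thread. The helper `serve_directory` is hypothetical; binding to `127.0.0.1` and letting the OS pick a free port (`port=0`) are choices made for this sketch.

```python
import functools
import http.server
import threading

def serve_directory(directory, port=0):
    """Serve `directory` over HTTP on localhost in a daemon thread; return the server."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server

server = serve_directory(".")
base_url = f"http://127.0.0.1:{server.server_address[1]}/"
print(f"Serving current directory at {base_url}")
# ... open report pages relative to base_url, then stop the server:
server.shutdown()
server.server_close()
```

The report URL can then be built from `base_url` and passed to `webbrowser.open` while later cells keep running.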


In [None]:
tsv_dir = wf_caprieval[:-4]+'/2_caprieval/'
# Load the cluster and single data into pandas DataFrames
cluster_df = pd.read_csv(tsv_dir + 'capri_clt.tsv', sep='\t',comment='#')
single_df = pd.read_csv(tsv_dir + 'capri_ss.tsv', sep='\t',comment='#')

# DockQ: incorrect (<0.23), acceptable (0.23-0.49), medium (0.49-0.80), and high (>=0.80) 
display(single_df.head())
single_df = single_df.sort_values(by='dockq', ascending=False)
display(single_df.head())
display(cluster_df.head())

Unnamed: 0,model,md5,caprieval_rank,score,irmsd,fnat,lrmsd,ilrmsd,dockq,rmsd,...,dihe,elec,improper,rdcs,rg,sym,total,vdw,vean,xpcs
0,../1_rigidbody/rigidbody_21.pdb,-,1,-35.399,10.5,0.073,17.72,17.46,0.093,10.437,...,0.0,-5.081,0.0,0.0,0.0,0.0,138.927,-27.074,0.0,0.0
1,../1_rigidbody/rigidbody_32.pdb,-,2,-32.402,9.09,0.036,16.234,15.139,0.093,9.466,...,0.0,-6.379,0.0,0.0,0.0,0.0,38.012,-17.737,0.0,0.0
2,../1_rigidbody/rigidbody_23.pdb,-,3,-31.626,12.408,0.036,19.163,18.927,0.072,11.838,...,0.0,-3.388,0.0,0.0,0.0,0.0,-1.722,-25.743,0.0,0.0
3,../1_rigidbody/rigidbody_39.pdb,-,4,-30.608,12.285,0.073,18.308,18.422,0.088,11.766,...,0.0,-4.807,0.0,0.0,0.0,0.0,35.006,-33.078,0.0,0.0
4,../1_rigidbody/rigidbody_92.pdb,-,5,-30.26,1.698,0.382,5.985,4.938,0.496,1.644,...,0.0,-3.119,0.0,0.0,0.0,0.0,-22.287,-41.086,0.0,0.0


Unnamed: 0,model,md5,caprieval_rank,score,irmsd,fnat,lrmsd,ilrmsd,dockq,rmsd,...,dihe,elec,improper,rdcs,rg,sym,total,vdw,vean,xpcs
9,../1_rigidbody/rigidbody_14.pdb,-,10,-29.587,1.667,0.436,3.353,3.362,0.583,1.55,...,0.0,-3.629,0.0,0.0,0.0,0.0,62.121,-20.308,0.0,0.0
23,../1_rigidbody/rigidbody_69.pdb,-,24,-22.066,1.72,0.436,3.285,3.021,0.579,1.724,...,0.0,-1.703,0.0,0.0,0.0,0.0,19.948,-13.323,0.0,0.0
4,../1_rigidbody/rigidbody_92.pdb,-,5,-30.26,1.698,0.382,5.985,4.938,0.496,1.644,...,0.0,-3.119,0.0,0.0,0.0,0.0,-22.287,-41.086,0.0,0.0
7,../1_rigidbody/rigidbody_79.pdb,-,8,-29.777,1.76,0.382,6.202,5.176,0.485,1.69,...,0.0,-3.389,0.0,0.0,0.0,0.0,-31.637,-39.426,0.0,0.0
55,../1_rigidbody/rigidbody_70.pdb,-,56,-16.968,2.086,0.382,6.881,5.319,0.442,2.176,...,0.0,-2.606,0.0,0.0,0.0,0.0,-28.916,-29.317,0.0,0.0


Unnamed: 0,cluster_rank,cluster_id,n,under_eval,score,score_std,irmsd,irmsd_std,fnat,fnat_std,...,bsa_std,desolv,desolv_std,elec,elec_std,total,total_std,vdw,vdw_std,caprieval_rank
0,-,-,100,-,-32.509,1.786,11.071,1.37,0.055,0.018,...,106.215,-16.186,1.825,-4.914,1.062,52.556,52.263,-25.908,5.467,1
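
The DockQ bands quoted in the code comment above can be applied programmatically. A minimal sketch follows; the helper `dockq_class` is hypothetical, and the five values are the `dockq` column of the top rows of the DockQ-sorted table shown above.

```python
def dockq_class(dockq):
    """Map a DockQ value to the quality bands quoted in the notebook comment."""
    if dockq >= 0.80:
        return "high"
    if dockq >= 0.49:
        return "medium"
    if dockq >= 0.23:
        return "acceptable"
    return "incorrect"

# DockQ of the five best rigid-body models after sorting (values from the table above)
top_dockq = {
    "rigidbody_14": 0.583,
    "rigidbody_69": 0.579,
    "rigidbody_92": 0.496,
    "rigidbody_79": 0.485,
    "rigidbody_70": 0.442,
}
classes = {model: dockq_class(value) for model, value in top_dockq.items()}
print(classes)
```

With the DataFrame at hand, the same function could be applied as `single_df['dockq'].apply(dockq_class)`.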


In [27]:
import gzip
import shutil
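# Note: single_df['model'][0] selects by index label, so it returns the row labelled 0
# (caprieval_rank 1) even after sorting by DockQ; use .iloc[0] for the top DockQ row.
best_pdb = os.path.normpath(os.path.join(tsv_dir, single_df['model'][0]))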
# Decompress the .gz file
with gzip.open(best_pdb + '.gz', 'rb') as f_in:
    with open(best_pdb, 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

In [28]:
show_pdbs([best_pdb, complex_pdb_ch])

HBox(children=(NGLWidget(layout=Layout(width='100%')), NGLWidget(layout=Layout(width='100%'))))

In [29]:
# The reference and the docked model have a different number of residues/atoms,
# so a plain RMSD fit such as the one pytraj performs fails.

# TODO: move into structure utils
from Bio.PDB import PDBParser, PDBIO
from Bio.PDB.cealign import CEAligner

# Parse the structures
parser = PDBParser(QUIET=True)
structure1 = parser.get_structure("complex_pdb_ch", complex_pdb_ch)
structure2 = parser.get_structure("best_pdb", best_pdb)
    
# Perform CE alignment
aligner = CEAligner()
aligner.set_reference(structure1)
aligner.align(structure2)

# Save structure2 to a PDB file
output_pdb_path = f"{out_path}aligned_structure2.pdb"
io = PDBIO()
io.set_structure(structure2)
io.save(output_pdb_path)

In [30]:
view = nv.show_structure_file(output_pdb_path)
view.add_component(complex_pdb_ch)
view.clear()
view.component_0.add_cartoon(selection=f':{barnase_ch}', color='red')
view.component_0.add_cartoon(selection=f':{barstar_ch}', color='pink')
view.component_1.clear()
view.component_1.add_cartoon(selection=f':{complex_ch[0]}', color='blue')
view.component_1.add_cartoon(selection=f':{complex_ch[-1]}', color='cyan')
view

NGLWidget()

### 3. Extend docking

In [13]:
# File paths are relative to the contents of input_haddock_wf_data_zip
cfg = """
[seletop]
select = 25

[caprieval]
reference_fname = "./data/2_caprieval/1BRS_ch.pdb"

[flexref]
tolerance = 5
ambig_fname = "./data/1_rigidbody/barnase_barstar_manual.tbl"

[caprieval]
reference_fname = "./data/2_caprieval/1BRS_ch.pdb"

[emref]
tolerance = 5
ambig_fname = "./data/1_rigidbody/barnase_barstar_manual.tbl"

[caprieval]
reference_fname = "./data/2_caprieval/1BRS_ch.pdb"
# ====================================================================
"""
haddock_config_path = f'{out_path}docking-barnase-barstar.cfg'

with open(haddock_config_path, 'w') as config_file:
    config_file.write(cfg)

In [None]:
from biobb_haddock.haddock.haddock3_extend import haddock3_extend

output_haddock_wf_data_zip = f'{out_path}3/extend_wf.zip'  

haddock3_extend(input_haddock_wf_data_zip  = wf_caprieval,
                haddock_config_path        = haddock_config_path,
                output_haddock_wf_data_zip = output_haddock_wf_data_zip,
                properties = def_dict())

In [33]:
with zipfile.ZipFile(output_haddock_wf_data_zip, 'r') as zip_ref:
    zip_ref.extractall(output_haddock_wf_data_zip[:-4])
    
webbrowser.open(f"http://0.0.0.0:8000/{output_haddock_wf_data_zip[:-4]}/analysis/8_caprieval_analysis/report.html")
!python3 -m http.server

Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [09/May/2025 12:55:21] "GET /data/barnase_barstar/3/extend_wf/analysis/8_caprieval_analysis/report.html HTTP/1.1" 200 -

Keyboard interrupt received, exiting.
^C


In [None]:
tsv_dir = output_haddock_wf_data_zip[:-4]+'/8_caprieval/'
# Load the cluster and single data into pandas DataFrames
cluster_df = pd.read_csv(tsv_dir + 'capri_clt.tsv', sep='\t',comment='#')
single_df = pd.read_csv(tsv_dir + 'capri_ss.tsv', sep='\t',comment='#')

# DockQ: incorrect (<0.23), acceptable (0.23-0.49), medium (0.49-0.80), and high (>=0.80) 
display(single_df.head())
single_df = single_df.sort_values(by='dockq', ascending=False)
display(single_df.head())
display(cluster_df.head())

Unnamed: 0,model,md5,caprieval_rank,score,irmsd,fnat,lrmsd,ilrmsd,dockq,rmsd,...,dihe,elec,improper,rdcs,rg,sym,total,vdw,vean,xpcs
0,../7_emref/emref_18.pdb,-,1,-120.098,10.365,0.127,15.461,16.703,0.127,9.804,...,0.0,-274.875,0.0,0.0,0.0,0.0,-325.708,-51.643,0.0,0.0
1,../7_emref/emref_3.pdb,-,2,-115.348,12.775,0.055,18.909,18.743,0.079,11.925,...,0.0,-345.125,0.0,0.0,0.0,0.0,-382.854,-40.634,0.0,0.0
2,../7_emref/emref_9.pdb,-,3,-111.546,1.469,0.6,2.43,2.393,0.678,1.421,...,0.0,-340.269,0.0,0.0,0.0,0.0,-379.879,-41.365,0.0,0.0
3,../7_emref/emref_15.pdb,-,4,-105.09,11.924,0.036,17.889,18.371,0.079,11.277,...,0.0,-252.707,0.0,0.0,0.0,0.0,-286.816,-36.093,0.0,0.0
4,../7_emref/emref_4.pdb,-,5,-104.415,12.638,0.091,18.653,18.593,0.092,12.072,...,0.0,-276.266,0.0,0.0,0.0,0.0,-310.324,-40.087,0.0,0.0


Unnamed: 0,model,md5,caprieval_rank,score,irmsd,fnat,lrmsd,ilrmsd,dockq,rmsd,...,dihe,elec,improper,rdcs,rg,sym,total,vdw,vean,xpcs
2,../7_emref/emref_9.pdb,-,3,-111.546,1.469,0.6,2.43,2.393,0.678,1.421,...,0.0,-340.269,0.0,0.0,0.0,0.0,-379.879,-41.365,0.0,0.0
10,../7_emref/emref_10.pdb,-,11,-95.662,1.42,0.582,2.56,2.482,0.675,1.416,...,0.0,-304.442,0.0,0.0,0.0,0.0,-327.697,-29.073,0.0,0.0
11,../7_emref/emref_2.pdb,-,12,-95.461,1.483,0.509,5.794,4.232,0.566,1.619,...,0.0,-331.2,0.0,0.0,0.0,0.0,-346.908,-24.219,0.0,0.0
14,../7_emref/emref_7.pdb,-,15,-87.22,2.305,0.4,7.167,5.766,0.427,2.437,...,0.0,-261.264,0.0,0.0,0.0,0.0,-291.301,-30.562,0.0,0.0
16,../7_emref/emref_24.pdb,-,17,-81.04,2.174,0.345,7.846,5.798,0.403,2.448,...,0.0,-222.034,0.0,0.0,0.0,0.0,-216.602,-28.297,0.0,0.0


Unnamed: 0,cluster_rank,cluster_id,n,under_eval,score,score_std,irmsd,irmsd_std,fnat,fnat_std,...,bsa_std,desolv,desolv_std,elec,elec_std,total,total_std,vdw,vdw_std,caprieval_rank
0,-,-,25,-,-113.021,5.49,9.133,4.508,0.205,0.231,...,114.879,-10.124,6.382,-303.244,40.261,-343.814,40.004,-42.434,5.688,1
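
In the tables above, the model with the best HADDOCK score is not the one with the best DockQ. The short sketch below makes that comparison explicit for the five top-scoring refined models; the tuples are copied from the `score` and `dockq` columns of the first table, and the variable names are arbitrary.

```python
# (model, HADDOCK score, DockQ) for the five top-scoring emref models shown above
top_by_score = [
    ("emref_18", -120.098, 0.127),
    ("emref_3", -115.348, 0.079),
    ("emref_9", -111.546, 0.678),
    ("emref_15", -105.090, 0.079),
    ("emref_4", -104.415, 0.092),
]

# Rank by score (lower is better) and find the model with the highest DockQ
ranked = sorted(top_by_score, key=lambda row: row[1])
best_dockq = max(top_by_score, key=lambda row: row[2])
score_rank = [row[0] for row in ranked].index(best_dockq[0]) + 1
print(f"Best DockQ: {best_dockq[0]} ({best_dockq[2]}), score rank {score_rank}")
```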
