<img src="images/logos.png">



In [1]:
%run src/init_notebooks.py
hide_toggle()

# A GROMACS - HADDOCK antibody workflow 

```
Author                : Alessandra Villa, Zuzana Jandova
Goal                  : Jupyter notebook for training purposes in antibody simulation and docking
Time                  : 10 minutes reading time, no computation wait time
Prerequisites         : Know how to run an md simulation, register with HADDOCK
Software requirements : GROMACS (version 2020) , pdb-tools, python 2.7, HMMER, Biopandas, Biopython
Tested for            : 
```
 

# Table of Contents

* [Use Case Introduction](#UCintro)
* [Description of the Workflow](#Workflow)
* [Structure preparation](#Structure)
* [Molecular Dynamics with GROMACS](#MD)
* [CDRs identification](#CDRs)
* [Cluster antibody trajectory](#Cluster) 
* [Prepare HADDOCK json file](#HADDOCK)

 ## <a class="anchor" id="UCintro" > Use Case Introduction </a>


Antibody design has grown into one of the fastest growing branches in the pharmaceutical industry. Antibodies promise extremely high specificity and the ability to use the body’s own immune system to kill e.g. tumors, however  their size and complexity make their computational design challenging. 
 
An antibody is a large protein that generally works by attaching itself to an antigen, which is a unique site of the pathogen. The binding harnesses the immune system to directly attack and destroy the pathogen. Antibodies can be highly specific while showing low immunogenicity, which is achieved by their unique structure. **The fragment crystallizable region (Fc region**) activates the immune response and is species specific, i.e. human Fc region should not evoke an immune response in humans.  **The fragment antigen-binding region (Fab region**) needs to be highly variable to be able to bind to antigens of various nature (high specificity).  The terminal (antigen recognising) domain of the Fab region is caleld **the variable domain (Fv domain)**.
 

<figure >
<img src="images/antibody_described.png">
</figure>

The small part of the Fab region that binds the antigen is called **paratope**. The part of the antigen that binds to an antibody is called **epitope**. The paratope consists of six highly flexible loops, known as **complementarity-determining regions (CDRs)** or hypervariable loops whose sequence and conformation are altered to bind to different antigens.  


 - Specific problem 

 - Introduction of use case as tutorial
 what they will learn from a tutorial
  
 - References
     - reading material


 ## <a class="anchor" id="Workflow"> Description of the Workflow </a>

This jupyter notebook combines two approaches: molecular dynamics (MD) simulation with [GROMACS](http://www.gromacs.org/About_Gromacs) and molecular docking with [HADDOCK](https://www.bonvinlab.org/software/haddock2.4/) to provide a good starting point for antibody design. We improve the sampling of the CDRs  using MD, extract the most diverse loop conformations and prepare such ensemble for running with HADDOCK. 



<img src="images/UC1_example.png" width="500">


To obtain the best prediction of a bound antibody-antigen complex, we will be using a worflow consisting of several steps. 
<img src="images/workflow.png" width="900">


1. Download PDBs of antibody and antigen
1. Pre-processing antibody pdb for HADDOCK
1. Converting antibody pdb into GROMACS
1. Generation of mdrun input file  
1. Setup MD of unbound antibody (gmx) run mdrun (run local or on HPC - script)
1. Trajectory postprocessing (remove pbc) gmx trjconv 
1. Define loop residues – (protocol Ambrosetti et al., 2020)
1. Index on loop residues & backbone (gmx)
1. Cluster MD of unbound antibody by loop residue conformations (gmx cluster) gromos
1. extract 20 most populated cluster (automatic with gromos)
1. Prepare clusters from MD for docking (renumber) ⟹ setup docking run json file






 ## System setting

The starting point is a antibody structure file. For this tutorial, we will utilize a Fab part of an antibody (PDB code [3RVT](http://www.rcsb.org/structure/3RVT)) which binds to the group 1 house dust mite allergen (PDB code [3F5B](http://www.rcsb.org/structure/3F5B)). As a reference, a crystal structure of the complex is available (PDB code [3RVW](http://www.rcsb.org/structure/3RVW)). All files are available from the RCSB website, https://www.rcsb.org/. For this tutorial, the PDB file for the crystal structure is depositied in this directory as "3RVT.pdb" .

Below you can visualize the antibody structure  

In alternative you can visualize the structure using a viewing program such as VMD.
Note: close the VMD window after you are done looking at the protein to continue with this notebook

# <a class="anchor" id="Structure">Structure preparation </a>

## Get your pdb

Prepare the conda enviroment
``` bash
git clone https://github.com/alevil-gmx/workflow_template.git
cd workflow_template

# Create conda enviroment with all the dependencies
conda env create 

# Install ANARCI
conda activate AB_workflow
cd src/anarci-1.3
python2.7 setup.py install
cd ../..
conda deactivate
```

In this step we will make use of the local version of [PDB-tools](http://www.bonvinlab.org/pdb-tools/). PDB-tools were designed to be a swiss-knife for the PDB format. They have no external dependencies, besides the Python programming language. You can find them on [Github](https://github.com/haddocking/pdb-tools) or as a webserver. [PDB-tools webserver](https://wenmr.science.uu.nl/pdbtools/) is a powerful tool that enables you to edit pdbs quickly and painlessly without any scripting knowledge.

In [9]:
%cd data

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data


In [12]:
!pdb_fetch -biounit 3RVT > input/3RVT.pdb

In [13]:
representations=[
    {"type": "cartoon", "params": {
        "sele": "protein and not ANI", "color": "chainname", 
    }},
    {"type": "ball+stick", "params": {
        "sele": "hetero"
    }},]   

In [14]:
%cd ../
import nglview as ng
view = ng.show_file("data/input/3RVT.pdb", defaultRepresentation= False)
view.representations = representations
view
# click and drag to rotate, zoom with your mouseweel 
# for more infor on this viewer have a look at https://github.com/nglviewer/nglview

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template


NGLWidget()

### Clean your pdb


In [15]:
%cd data
!pdb_chainxseg input/3RVT.pdb > 3RVT_seg.pdb ; pdb_splitseg 3RVT_seg.pdb ; pdb_reres -501 3RVT_seg_D.pdb  > 3RVT_seg_D_ren.pdb ; pdb_merge 3RVT_seg_C.pdb 3RVT_seg_D_ren.pdb > 3RVT_merged.pdb ; pdb_chain -A 3RVT_merged.pdb | pdb_seg| pdb_delhetatm | pdb_tidy  > 3RVT_clean.pdb ; sed -i "" '/ANISOU/d'  3RVT_clean.pdb  >> 3RVT_clean.pdb; rm *merged.pdb *seg* 
%cd ../

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template


In [24]:
%ls data/input/3RVT_clean.pdb
view = ng.show_file('data/3RVT_clean.pdb',  defaultRepresentation= False)
view.representations = representations
view

ls: data/input/3RVT_clean.pdb: No such file or directory


NGLWidget()

Once you've had a look at the molecule, you are going to check that only the protein is present in the pdb file. Otherwise strip out all the other molecules in the crystal . To delete the other molecules , either use a plain text editor like vi, emacs (Linux/Mac), or Notepad (Windows). Do not use word processing software! 

Always check your .pdb file for entries listed under the comment MISSING, as these entries indicate either atoms or whole residues that are not present in the crystal structure. Terminal regions may be absent, and may not present a problem for dynamics.

#  <a class="anchor" id="MD"> Molecular Dynamics with GROMACS </a >

## Generating a topology

Now the PDB file should contain only protein atoms, and is ready to be input into GROMACS. 
The first GROMACS tool, we use, is pdb2gmx. The purpose of pdb2gmx is to generate three files:

* The topology for the molecule.
* A position restraint file.
* A post-processed structure file. 

The topology (topol.top by default) contains all the information necessary to define the molecule within a simulation. This information includes nonbonded parameters (atom types and charges) as well as bonded parameters (bonds, angles, dihedrals and atom connectivity).

## Force Field

Here, we made an important decision for the course of the simualtion in choosing the force field. Here we use CHARMM36m all-atom force field (see (here) for update in the CHARMM36 force field implementation for GROMACS http://mackerell.umaryland.edu/charmm_ff.shtml#gromacs. The force field files contain the information that will be written to the topology. 

In [8]:
%cd data
!tar xvf input/charmm36.tar

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
x charmm36-jul2017.ff/
x charmm36-jul2017.ff/atomtypes.atp
x charmm36-jul2017.ff/cmap.itp
x charmm36-jul2017.ff/ffbonded.itp
x charmm36-jul2017.ff/ffnonbonded.itp
x charmm36-jul2017.ff/forcefield.doc
x charmm36-jul2017.ff/forcefield.itp
x charmm36-jul2017.ff/gb.itp
x charmm36-jul2017.ff/ions.itp
x charmm36-jul2017.ff/merged.arn
x charmm36-jul2017.ff/merged.c.tdb
x charmm36-jul2017.ff/merged.hdb
x charmm36-jul2017.ff/merged.n.tdb
x charmm36-jul2017.ff/merged.r2b
x charmm36-jul2017.ff/merged.rtp
x charmm36-jul2017.ff/merged.vsd
x charmm36-jul2017.ff/nbfix.itp
x charmm36-jul2017.ff/old_c36_cmap.itp
x charmm36-jul2017.ff/spc.itp
x charmm36-jul2017.ff/spce.itp
x charmm36-jul2017.ff/tip3p.itp
x charmm36-jul2017.ff/tip4p.itp
x charmm36-jul2017.ff/watermodels.dat


In [9]:
!gmx pdb2gmx -f input/3RVT.pdb -p antibody.top -o antibody.gro -i posre -ff charmm36-jul2017 -water tip3p -ignh -missing

                     :-) GROMACS - gmx pdb2gmx, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Be

## defining simulation box

In [9]:
!gmx editconf -f antibody.gro -d 0.7 -bt dodecahedron -o box.gro

                     :-) GROMACS - gmx editconf, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, B

## filling the box with water molecules

In [10]:
!gmx solvate -cp box.gro -p antibody.top -o water.gro

                     :-) GROMACS - gmx solvate, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Be

## adding ions

In [11]:
3581WL!touch ion.mdp
!gmx grompp -f ion.mdp -c water.gro -p antibody.top -o
!echo 13 | gmx genion -s topol.tpr -p antibody.top -neutral -conc 0.15 -o startMM.gro

                      :-) GROMACS - gmx grompp, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Be

Replacing solvent molecule 13040 (atom 45781) with NA
Replacing solvent molecule 10471 (atom 38074) with NA
Replacing solvent molecule 7228 (atom 28345) with NA
Replacing solvent molecule 9835 (atom 36166) with NA
Replacing solvent molecule 9404 (atom 34873) with NA
Replacing solvent molecule 16427 (atom 55942) with NA
Replacing solvent molecule 14629 (atom 50548) with NA
Replacing solvent molecule 11400 (atom 40861) with NA
Replacing solvent molecule 2500 (atom 14161) with NA
Replacing solvent molecule 8526 (atom 32239) with NA
Replacing solvent molecule 4926 (atom 21439) with NA
Replacing solvent molecule 15427 (atom 52942) with NA
Replacing solvent molecule 2411 (atom 13894) with NA
Replacing solvent molecule 16357 (atom 55732) with NA
Replacing solvent molecule 11146 (atom 40099) with NA
Replacing solvent molecule 3873 (atom 18280) with NA
Replacing solvent molecule 1000 (atom 9661) with NA
Replacing solvent molecule 17007 (atom 57682) with NA
Replacing solvent molecule 7808 (atom 

# System equilibration

In [12]:
!mkdir MM
!gmx grompp -f steep_charmm36m.mdp -c startMM.gro -p antibody.top -o MM/topol.tpr! 
!gmx mdrun -v -s MM/topol.tpr -deffnm antibody_MM
!mkdir POS
!gmx grompp -f md_eq_posre_charmm36m.mdp -c antibodyMM.gro -p antibody.top -o POS/topol.tpr 
!gmx mdrun -v -s POS/topol.tpr -deffnm antibody_P

%cd ..
!gmx grompp -f ../../../mdp_files/md_eq_posre_charmm36m.mdp -c MM/confout.gro  -r MM/confout.gro -p antibody.top -o POS/topol.tpr
#gmx grompp -f ../../../mdp_files/md_charmm36m.mdp -c POS/confout.gro -p $name.top -o MD/topol.tpr

#cp topol.tpr in topol100.tpr
#gmx convert-tpr -s topol100.tpr -extend 400000 -o topol.tpr

mkdir: MM: File exists
                      :-) GROMACS - gmx grompp, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarten Wolf   
                           and the project leaders:
 

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template
                      :-) GROMACS - gmx grompp, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Virolainen 
 Christian Wennberg    Maarte

## defining simulation box

# <a class="anchor" id="CDRs"> CDRs identification </a >

Antibodies are created by two identical pairs of  **light (L)** and **heavy (H)** chains. As shown above the variable domains of both chains are involved in the antigen recognition [1]. Each chain contains three <font color="#8F0306">**CDRs** (in red) </font>in their variable domain, which show the highest level of variability and directly interact with the antigen [2, 3]. 
Each CDR contains one <font color="#8F0306">**hyper variable loop (HV)** </font>(six loops in total) which are crucial for the recognition of the cognate antigen.

<img src="images/CDRs.png" width="600">

In this notbeook we will be using the protocol of [Ambrosetti, et al ArXiv, 2020](https://www.biorxiv.org/content/10.1101/2020.03.18.967828v1) to identify CDRs residues and convert them to the GROMACS format for further trajectory clustering.

1. Novotný J, Bruccoleri R, Newell J, et al (1983) Molecular anatomy of the antibody
binding site. J Biol Chem 258:14433–14437
2. Sela-Culang I, Kunik V, Ofran Y (2013) The Structural Basis of Antibody-Antigen
Recognition. Front Immunol 4:302. https://doi.org/10.3389/fimmu.2013.00302
3. MacCallum RM, Martin ACR, Thornton JM (1996) Antibody-antigen interactions:
Contact analysis and binding site topography. J Mol Biol 262:732–745.
https://doi.org/10.1006/jmbi.1996.0548

In [2]:

%cd data
# Renumber antibody with the Chothia scheme
!python2.7 ../scripts/ImmunoPDB.py -i input/3RVT.pdb -o 3RVT_ch.pdb --scheme c --rename --splitscfv

# Format the antib-ody in order to fit the HADDOCK format requirements
# and extract the HV loop residues and save them into a file
!python ../scripts/ab_haddock_format.py 3RVT_ch.pdb 3RVT_HADDOCK.pdb A > active.txt

# Add END and TER statements to the .pdb file
!pdb_tidy 3RVT_HADDOCK.pdb > oo; mv oo 3RVT_HADDOCK.pdb


/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
Traceback (most recent call last):
  File "../scripts/ab_haddock_format.py", line 22, in <module>
    import biopandas.pdb as bp
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/biopandas/pdb/__init__.py", line 12, in <module>
    from .pandas_pdb import PandasPdb
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/biopandas/pdb/pandas_pdb.py", line 9, in <module>
    import pandas as pd
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/pandas/__init__.py", line 55, in <module>
    from pandas.core.api import (
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/pandas/core/api.py", line 5, in <module>
    from pandas.core.arrays.integer import (
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/__init__.py", line 13, in <module>
    from .sparse import SparseArray  # noqa: F401
  File "/Users/zuzka/anaconda3/lib/python3.6/site-packages/pandas/core/

In [23]:
!pwd
%cd ../

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template


## Visualise CDR residues

In [9]:
f = open("data/active.txt")
lines=f.read()
representations=[
    {"type": "cartoon", "params": {
        "sele": "protein and not ANI", "color": "chainname", 
    }},
    {"type": "hyperball", 'params': {"sele":lines, "color":"orange"}}
]        

In [10]:


print(lines)
view = ng.show_file('data/3RVT_HADDOCK.pdb', defaultRepresentation= False)
view.representations = representations
view

26, 27, 28, 29, 30, 31, 32, 50, 51, 52, 91, 92, 93, 94, 95, 96, 239, 240, 241, 242, 243, 244, 245, 246, 267, 268, 269, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322



NGLWidget()

## Turn CDR residues into index file to cluster antibody trajectory

In [21]:
%cd data 

!echo -e  $(sed -e  '1 s/./ri /' -e 's/,/ /g'  active.txt)'\n' name 15 CDRs '\n'q  | gmx make_ndx -f topol.tpr -o index_jupy.ndx

[Errno 2] No such file or directory: 'data'
/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
                     :-) GROMACS - gmx make_ndx, 2019.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar      Christian Blau   Viacheslav Bolnykh     Kevin Boyd    
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra       Alan Gray     
  Gerrit Groenhof     Anca Hamuraru    Vincent Hindriksen  M. Eric Irrgang  
  Aleksei Iupinov   Christoph Junghans     Joe Jordan     Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg     Erik Marklund   
    Pascal Merz     Pieter Meulenhoff    Teemu Murtola       Szilard Pall   
    Sander Pronk      Roland Schulz      Michael Shirts    Alexey Shvetsov  
   Alfons Sijbers     Peter Tieleman      Jon Vincent      Teemu Viro

In [22]:
Zuzana Jandová!tail -n 45 index_jupy.ndx

6616 6617 6618 6619 6620 6621 6622 6623 6624 6625 6626 6627 6628 6629 6630
6631 6632 6633 6634 6635 6636 6637 6638 6639 6640 6641 6642 6643 6644 6645
6646 6647 6648 6649 6650 6651 6652 6653 6654 6655 6656 6657 6658 6659 6660
6661
[ CDRs ]
  86   87   88   89   90   91   92   93   94   95   96   97   98   99  100
 101  102  403  404  405  406  407  408  409  410  411  412  413  414  415
 416  417  418  419  420  421  422  423  424  425  426  427  428  429  430
 431  432  433  434  435  436  437  438  439  440  441  442  443  444  445
 446  447  448  449  450  451  452  453  454  455  456  457  458  459  460
 461  462  463  464  465  466  467  468  469  470  471  472  473  474  475
 476  477  478  479  480  481  482  483  484  485  486  487  488  489  490
 491  492  493  494  495  496  497  498  499  500  501  502  503  801  802
 803  804  805  806  807  808  809  810  811  812  813  814  815  816  817
 818  819  820  821  822  823  824  825  826  827  828  829  830  831  8

# <a class="anchor" id="Cluster">  Cluster antibody trajectory </a>

## Create an ensemble for docking

In [46]:
%cd data
!pdb_mkensemble input/3RVT.pdb input1.pdb input1.pdb > 3RVT_ensemble.pdb

/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template/data
ERROR!! File not found or not readable: 'input1.pdb'

Merges several PDB files into one multi-model (ensemble) file. Strips all
HEADER information and adds REMARK statements with the provenance of each
conformer.

Usage:
    python pdb_mkensemble.py <pdb file> <pdb file>

Example:
    python pdb_mkensemble.py 1ABC.pdb 1XYZ.pdb

This program is part of the `pdb-tools` suite of utilities and should not be
distributed isolatedly. The `pdb-tools` were created to quickly manipulate PDB
files using the terminal, and can be used sequentially, with one tool streaming
data to another. They are based on old FORTRAN77 code that was taking too much
effort to maintain and compile. RIP.


# <a class="anchor" id="HADDOCK">  Prepare HADDOCK `json` file </a>

We prepared a sample HADDOCK `json` file that one can submit to the [Submit file interface](https://bianca.science.uu.nl/haddock2.4/submit_file) of HADDOCK2.4. First one needs to register  for all HADDOCK services [https://bianca.science.uu.nl/auth/register/](https://bianca.science.uu.nl/auth/register/). Then we will modify the `json` file with a few available scripts to update it with our newly created pdb. In this case we use ambiguous restraints created from the reference crystal structure (PDB ID [3RVW](https://www.rcsb.org/structure/3RVW)). They were defined based on the true interface (all residues within 3.9Å from the other protein) and are also located in `data/input/ambig.tbl`.  If one wishes to replace them too, there is a script `data/script/replace_tbl.py` which updates the restraint as well. Here one needs to specify which type of restraints they want to replace (`-type (ambig,unambig,hbond,dihedral)`). In this scenario we are using increased sampling such that each antibody conformer is used in 100 model for it0 - rigid body docking. In this case we have an ensemble of 12 conformers, thus 1200 structures geenrated in it0 (`structures_0`). Further, we increase sampling from 200 to 400 per it1 (`structures_1`) and it1 water (`waterrefine`) and in the analysis part (`anastruc_1`) too. These changes are already included in the sample `json` file, thus do not need to be modified manually.   


### Modified sampling parameters:

`"anastruc_1": 400,
"structures_0": 1200,
"structures_1": 400,
"waterrefine": 400,`

In [10]:
%cd data
python  ../scripts/rswieplace_pdb.py  -param input/job_params.json -pdb input/3RVT_ensemble.pdb -i 1 > new.json

SyntaxError: invalid syntax (<ipython-input-10-d2aa19acd945>, line 2)

Visit [https://bianca.science.uu.nl/haddock2.4/submit_file](https://bianca.science.uu.nl/haddock2.4/submit_file) and upload the `json` file. 

In [4]:
%pwd

'/Users/zuzka/Documents/UU/antibodies/workflow_template/workflow_template'

# Writing the notebook

- Style the notebook with style sheet provided in src
    - Rationale: Easy way to set context for notebooks, create a "GROMACS"-tutorial feel
    - Con: requires users to execute the first cell
    - Pro: can serve as an introduction on how to execute cells

- Write headers (# title, ## header2, etc.) in seperate cells 
    - Rationale: this enables folding of sections 

- Include images as

![ImageNotFoundAlternativeTxt](images/Bioexcel_symbol.svg "Text you will see when hovering over the logo with the mouse")

![ImageNotFoundAlternativeTxt](images/non-existant.svg "Logo Title Text 1")


- Toggle solutions with the hide_toggle() function, hide the next cell with for_next=True

In [None]:
#hide this cell
hide_toggle()

In [None]:
hide_toggle(for_next=True)

In [None]:
#some solution that should not yet be visible

more text

# Common pitfalls

- Cells are stuck in evaluation (marked with `[*]`)
    - Reason: Cells are evaluated serially. Jupyter cannot run bash commands in the background. Especially when opening another programm, like vmd, the notebook requires that the window is closed before going further
    - Solution : Make workshop participants aware of this 
    - Solution : Show users the kernel -> interrupt solution
    - Solution : start evaluations in subprocess
        - Con : spawning subprocesses from within notebooks depends on the python verion. 
        - Con : This requires some python boilerplate code that makes it harder to have a one-to-one correspondence of the command line command and what users will read in the notebook
- Users don't notice that they can execute / run input cells with the run botton, but rather copy-paste or type into the command line
- Users feel like they are taken through the notebooks via "auto-pilot"
    - Solution: Include the `hide_toggles()` function
    - Solution: Use quizzes
    - Con: This requires executing the first line of code in the notebook to work well
- The notebook is unaware of the changes to shell environment and variables changes when executed with '! command!
    - Reason: Bash commands are executed in subshells 
- '!cd directory' does not work as expected
    - Reason: Bash commands are executed in subshells 
    - Solution: use `%cd ` or `cd ` to permanently change the current directory
    - Con: this is notebook 'magic' that might confuse useres
- Loading environment modules does not work
    - Reason: Bash commands are executed in subshells
    - Solution: Use python module commands
       - Con : Depends on the python version and importing external modules
- Markdown cell line breaks look differnetly when markdown cells are executed
- Showing contents of long files requires lots of scrolling
  - Solution: use `%less filename`
- Longer bash script is hard to read in a cell
  - Solution: use `%bash` on top of the cell, then continue with usual bash