# ABC MD Setup pipeline using BioExcel Building Blocks (biobb)

***

This **BioExcel Building Blocks library (BioBB) workflow** provides a pipeline to set up DNA structures for the **Ascona B-DNA Consortium** (ABC) members. It builds on the work started with the [NAFlex](http://mmb.irbbarcelona.org/NAFlex/ABC) tool to offer a single pipeline for structure preparation, ensuring **reproducibility** and **coherence** across all members of the consortium. The **NAFlex pipeline** was used to prepare all the simulations in the study ***[The static and dynamic structural heterogeneities of B-DNA: extending Calladine–Dickerson rules](https://doi.org/10.1093/nar/gkz905)***. The workflow in this **Jupyter Notebook** **extends** and **updates** the **NAFlex pipeline** and is being used for the **new ABC work**.

The **setup process** is performed using the **biobb_amber** module from the **BioBB library**, which wraps the **AMBER MD package**. The forcefield used is **ff14SB**, with the nucleic acids **parmbsc1 forcefield**, **Dang ion parameters** and the **SPC/E water model**.

The main **steps of the pipeline** are:

- Add missing atoms
- Energetically minimize the system (in vacuo)
- Solvate the structure in a truncated octahedron box, with the SPC/E water model
- Neutralize the system with potassium ions
- Add an ionic concentration of 150 mM of Cl- / K+ ions
- Randomize ions around the structure using cpptraj
- Energetically minimize the system (in solvent)

***

## Settings

### Biobb module used

 - [biobb_amber](https://github.com/bioexcel/biobb_amber): Tools to setup and run Molecular Dynamics simulations using the AMBER MD package.
 
### Auxiliary libraries used

 - [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels): Enables a Jupyter Notebook or JupyterLab application in one conda environment to access kernels for Python, R, and other languages found in other environments.
 - [nglview](http://nglviewer.org/#nglview): Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
 - [ipywidgets](https://github.com/jupyter-widgets/ipywidgets): Interactive HTML widgets for Jupyter notebooks and the IPython kernel.
 - [plotly](https://plot.ly/python/offline/): Python interactive graphing library integrated in Jupyter notebooks.

### Conda Installation and Launch

```console
 git clone https://github.com/bioexcel/biobb_amber_ABC_MD_setup.git
 cd biobb_amber_ABC_MD_setup
 conda env create -f conda_env/environment.yml
 conda activate biobb_amber_ABC_MD_setup
 jupyter-nbextension enable --py --user widgetsnbextension
 jupyter-nbextension enable --py --user nglview
 jupyter-notebook biobb_wf_md_setup/notebooks/biobb_amber_ABC_MD_setup.ipynb
  ``` 

***
## Pipeline steps
 1. [Initial Parameters](#input)
 2. [Model DNA 3D Structure](#model)
 3. [Generate Topology](#top)
 4. [Energetically minimize the generated structure (vacuum)](#minv)
 5. [Add Water Box](#water)
 6. [Add Ions](#ions)
 7. [Randomize Ions](#random)
 8. [Energetically Minimize the System (solvent)](#mins)
 9. [Heating the system](#heat)
 10. [Output files](#output)
 
***
<table><tr style="background: white;">
<td> <img src="https://bioexcel.eu/wp-content/uploads/2019/04/Bioexcell_logo_1080px_transp.png" alt="Bioexcel2 logo" style="width: 300px;"/> </td>
<td style="width: 100px;"></td>
<td> <img src="http://mmb.irbbarcelona.org/NAFlex//images/abc.png" alt="ABC logo" style="width: 200px;"/> </td>
</tr></table>

***

## Auxiliary libraries

In [1]:
import nglview
import ipywidgets
import plotly
import plotly.graph_objs as go
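
Before running the cells below, it can help to confirm that the libraries used by this notebook are importable from the active kernel. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and is not part of BioBB.

```python
from importlib.util import find_spec

def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]

# Top-level packages imported by this notebook.
required = ["nglview", "ipywidgets", "plotly", "biobb_amber"]
missing = missing_modules(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If any package is reported as missing, the conda environment from the installation section is probably not the active kernel.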



<a id="input"></a>
## Initial parameters

**Input parameters** needed:

- **DNA sequence**: Nucleotide sequence to be modelled and prepared for a MD simulation (e.g. GCGCGGCTGATAAACGAAAGC)
- **Forcefield**: Forcefield to be used in the setup (e.g. protein.ff14SB). Values: protein.ff14SB, 
- **Water model**: Water model to be used in the setup (e.g. SPC/E). Values: SPC/E, TIP3PBOX.
- **Ion model**: Ion model to be used in the setup (e.g. Dang). Values: Dang, Cheatham.
- **Thermostat**: Thermostat to be used in the setup (e.g. ?). Values: ?

In [2]:
# Nucleotide sequence:
seq = "GCGCGGCTGATAAACGAAAGC"

In [3]:
# ABC protocol
forcefield = ["DNA.bsc1"] # ParmBSC1 (ff99 + bsc0 + bsc1) for DNA. Ivani et al. Nature Methods 13: 55, 2016
water_model = "SPCBOX" # SPC/E + Joung-Cheatham monovalent ions + Li/Merz highly charged ions (+2 to +4, 12-6 normal usage set)
ions_dang = "leapin/frcmod.ionsdang_spce" # NOT INTEGRATED IN AMBERTOOLS 17 (Ambertools CONDA Package)
ions_model = "None" # NOT using default ions model (Joung & Cheatham)
thermostat = 3 # ntt=3: Langevin Dynamics
timestep = 0.002 # 2fs timestep
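
Since the whole setup starts from the nucleotide sequence, a quick sanity check on that string can catch typos before any structure is built. The sketch below is illustrative only: `validate_sequence` and `reverse_complement` are hypothetical helpers, not part of biobb_amber, and the complementary strand is computed here just for inspection (the modelling step builds the duplex itself).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def validate_sequence(sequence):
    """Return the upper-cased sequence, or raise ValueError on non-ACGT symbols."""
    cleaned = sequence.strip().upper()
    bad = sorted(set(cleaned) - set(COMPLEMENT))
    if not cleaned or bad:
        raise ValueError(f"Invalid DNA sequence; unexpected symbols: {bad}")
    return cleaned

def reverse_complement(sequence):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(validate_sequence(sequence)))

seq = "GCGCGGCTGATAAACGAAAGC"
print(len(validate_sequence(seq)), reverse_complement(seq))
```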

<a id="model"></a>
## Model DNA 3D structure

Model **DNA 3D structure** from a **nucleotide sequence** using the **nab tool** from the **AMBER MD package**.
***
**Building Blocks** used:
 - [nab_build_dna_structure](https://biobb-amber.readthedocs.io/en/latest/nab.html#module-nab.nab_build_dna_structure) from **biobb_amber.nab.nab_build_dna_structure**
***

In [4]:
# Import module
from biobb_amber.nab.nab_build_dna_structure import nab_build_dna_structure

# Create properties dict and inputs/outputs
dna_pdb = seq+'.pdb'
prop = {
    'sequence': seq,
    'helix_type': 'abdna', # Right Handed B-DNA, Arnott 
    'remove_tmp': True
}

#Create and launch bb
nab_build_dna_structure(output_pdb_path=dna_pdb,
    properties=prop)

2021-02-19 15:49:05,008 [MainThread  ] [INFO ]  Creating c4438bbf-1da0-472b-81db-3ed7601a3fef temporary folder
2021-02-19 15:49:05,010 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 15:49:06,499 [MainThread  ] [INFO ]  nab  --compiler gcc --linker gfortran c4438bbf-1da0-472b-81db-3ed7601a3fef/nuc.nab  ; ./c4438bbf-1da0-472b-81db-3ed7601a3fef/nuc

2021-02-19 15:49:06,501 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 15:49:06,503 [MainThread  ] [INFO ]  
Running: /anaconda3/envs/biobb_amber/bin/teLeap -s -f leap.in -I/anaconda3/envs/biobb_amber/dat/leap/cmd -I/anaconda3/envs/biobb_amber/dat/leap/parm -I/anaconda3/envs/biobb_amber/dat/leap/prep -I/anaconda3/envs/biobb_amber/dat/leap/lib > tleap.out
Reading parm file (tprmtop)
title:
default_name                                                                    

2021-02-19 15:49:06,506 [MainThread  ] [INFO ]  Removed: c4438bbf-1da0-472b-81db-3ed7601a3fef
2021-02-19 15:49:06,507 [MainTh

0

### Visualizing 3D structure

In [5]:
# Show DNA structure
view = nglview.show_structure_file(dna_pdb)
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

<a id="water"></a>
## Generate Topology & Create Water Box

Build the **DNA topology** and create a **water box** surrounding the **DNA structure** using the **leap tool** from the **AMBER MD package**. 
<br/>
Define the **unit cell** for the **DNA structure MD system** and fill it with **water molecules**.<br/>
A **truncated octahedron** is used to define the unit cell, with a distance of 10 Å from the DNA to the box edge.
The **water model** and the **forcefield** are the ones defined in the [Initial parameters](#input) section.

***
**Building Blocks** used:
 - [leap_solvate](https://biobb-amber.readthedocs.io/en/latest/leap.html#module-leap.leap_solvate) from **biobb_amber.leap.leap_solvate**
***
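
One reason to prefer a truncated octahedron is that it needs less solvent than a cube with the same distance between periodic images. The sketch below illustrates this with the general triclinic cell volume formula, treating the truncated octahedron as a triclinic cell whose three angles all equal acos(-1/3), about 109.47 degrees. The function name and the 70 Å edge length are hypothetical values chosen for illustration; they are not taken from this system.

```python
import math

def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Volume of a triclinic cell from edge lengths (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

# Truncated octahedron written as a triclinic cell: all angles equal acos(-1/3).
TRUNC_OCT_ANGLE = math.degrees(math.acos(-1.0 / 3.0))

edge = 70.0  # hypothetical edge length in Å, for illustration only
v_oct = triclinic_volume(edge, edge, edge,
                         TRUNC_OCT_ANGLE, TRUNC_OCT_ANGLE, TRUNC_OCT_ANGLE)
v_cube = triclinic_volume(edge, edge, edge, 90.0, 90.0, 90.0)
print(f"truncated octahedron / cube volume ratio: {v_oct / v_cube:.3f}")
```

The ratio equals 4/(3·√3), so for the same image distance the truncated octahedron holds roughly three quarters of the volume of the cube.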

In [11]:
# Import module
from biobb_amber.leap.leap_solvate import leap_solvate

# Create prop dict and inputs/outputs
prop = {
    "forcefield" : forcefield,
    "water_type" : water_model,
    "ions_type" : ions_model, 
    "distance_to_molecule": "10.0", 
    "box_type": "truncated_octahedron",
    "remove_tmp": True
}
output_solv_pdb_path = 'structure.solv.pdb'
output_solv_top_path = 'structure.solv.parmtop'
output_solv_crd_path = 'structure.solv.crd'

#Create and launch bb
leap_solvate( input_pdb_path=dna_pdb,
                input_params_path=ions_dang,
                output_pdb_path=output_solv_pdb_path,
                output_top_path=output_solv_top_path,
                output_crd_path=output_solv_crd_path,
                properties=prop)

2021-02-19 15:53:19,286 [MainThread  ] [INFO ]  Creating 2b42c89a-6fb6-492c-85ef-2214fc565ce5 temporary folder
2021-02-19 15:53:19,287 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 15:53:20,865 [MainThread  ] [INFO ]  tleap  -f 2b42c89a-6fb6-492c-85ef-2214fc565ce5/leap.in

2021-02-19 15:53:20,867 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 15:53:20,868 [MainThread  ] [INFO ]  -I: Adding /anaconda3/envs/biobb_amber/dat/leap/prep to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/lib to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/parm to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/cmd to search path.
-f: Source 2b42c89a-6fb6-492c-85ef-2214fc565ce5/leap.in.

Welcome to LEaP!
(no leaprc in search path)
Sourcing: ./2b42c89a-6fb6-492c-85ef-2214fc565ce5/leap.in
----- Source: /anaconda3/envs/biobb_amber/dat/leap/cmd/leaprc.DNA.bsc1
----- Source of /anaconda3/envs/biobb_amber/dat/leap/cmd/leap

0

In [12]:
# Show solvated DNA structure
view = nglview.show_structure_file(output_solv_pdb_path)
view.clear_representations()
view.add_representation(repr_type='ball+stick', selection='nucleic')
view.add_representation(repr_type='line', selection='water')
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

<a id="ions"></a>
## Adding additional ionic concentration

**Neutralize** the system and add an additional **ionic concentration** using the **leap tool** from the **AMBER MD package**. <br/>
**Potassium (K+)** and **chloride (Cl-)** ions are used, with an **additional ionic concentration** of 150 mM.
***
**Building Blocks** used:
 - [leap_add_ions](https://biobb-amber.readthedocs.io/en/latest/leap.html#module-leap.leap_add_ions) from **biobb_amber.leap.leap_add_ions**
***
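
As a rough cross-check of the ion counts that leap reports, the numbers can be estimated by hand. The sketch below is a back-of-envelope estimate, not the method used by biobb_amber or leap: it assumes one negatively charged phosphate per nucleotide except at each 5' end, and it estimates the number of salt pairs from the water count using about 55.5 mol/L for pure water. The function names and the water count of 10000 are hypothetical.

```python
def counterions_for_duplex(n_base_pairs):
    """K+ ions needed to neutralize a DNA duplex, assuming n - 1 phosphates
    (charge -1 each) per strand of n nucleotides."""
    if n_base_pairs < 1:
        raise ValueError("n_base_pairs must be positive")
    return 2 * (n_base_pairs - 1)

def salt_pairs(n_waters, conc_molar, water_molar=55.5):
    """Approximate number of K+/Cl- pairs for a target molar concentration,
    estimated from the number of water molecules in the box."""
    return round(n_waters * conc_molar / water_molar)

n_bp = len("GCGCGGCTGATAAACGAAAGC")
n_waters = 10000  # hypothetical water count, for illustration only
k_neutral = counterions_for_duplex(n_bp)
pairs = salt_pairs(n_waters, 0.150)
print(f"K+: {k_neutral + pairs}, Cl-: {pairs}")
```

Large deviations between such an estimate and the counts in the generated topology would be worth investigating before moving on.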

In [13]:
# Import module
from biobb_amber.leap.leap_add_ions import leap_add_ions

# Create prop dict and inputs/outputs
prop = {
    "forcefield" : forcefield,
    "water_type" : water_model,
    "ions_type" : ions_model, 
    "neutralise" : True,
    "positive_ions_type": "K+",
    "negative_ions_type": "Cl-",
    "ionic_concentration" : 150, # 150mM
    "box_type": "truncated_octahedron",
    "remove_tmp": True
}
output_ions_pdb_path = 'structure.ions.pdb'
output_ions_top_path = 'structure.ions.parmtop'
output_ions_crd_path = 'structure.ions.crd'

# Create and launch bb
leap_add_ions(input_pdb_path=output_solv_pdb_path,
           input_params_path=ions_dang,
           output_pdb_path=output_ions_pdb_path,
           output_top_path=output_ions_top_path,
           output_crd_path=output_ions_crd_path,
           properties=prop)

2021-02-19 15:54:00,049 [MainThread  ] [INFO ]  Creating e296e5f5-83ab-423c-aee5-002a94b2a978 temporary folder
2021-02-19 15:54:00,288 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 15:54:08,905 [MainThread  ] [INFO ]  tleap  -f e296e5f5-83ab-423c-aee5-002a94b2a978/leap.in

2021-02-19 15:54:08,906 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 15:54:08,909 [MainThread  ] [INFO ]  -I: Adding /anaconda3/envs/biobb_amber/dat/leap/prep to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/lib to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/parm to search path.
-I: Adding /anaconda3/envs/biobb_amber/dat/leap/cmd to search path.
-f: Source e296e5f5-83ab-423c-aee5-002a94b2a978/leap.in.

Welcome to LEaP!
(no leaprc in search path)
Sourcing: ./e296e5f5-83ab-423c-aee5-002a94b2a978/leap.in
----- Source: /anaconda3/envs/biobb_amber/dat/leap/cmd/leaprc.DNA.bsc1
----- Source of /anaconda3/envs/biobb_amber/dat/leap/cmd/leap

2021-02-19 15:54:08,911 [MainThread  ] [INFO ]  Fixing truncated octahedron Box in the topology and coordinates files
2021-02-19 15:54:09,162 [MainThread  ] [INFO ]  Removed: e296e5f5-83ab-423c-aee5-002a94b2a978
2021-02-19 15:54:09,169 [MainThread  ] [INFO ]  Removed: leap.log


0

In [14]:
# Show DNA structure with ions
view = nglview.show_structure_file(output_ions_pdb_path)
view.clear_representations()
view.add_representation(repr_type='ball+stick', selection='nucleic')
view.add_representation(repr_type='spacefill', selection='Na+')
view.add_representation(repr_type='spacefill', selection='K+')
view.add_representation(repr_type='spacefill', selection='Cl-')
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

<a id="random"></a>
## Randomize ions

**Randomly swap** the positions of **solvent** and **ions** using the **cpptraj tool** from the **AMBER MD package**. <br/>
***
**Building Blocks** used:
 - [cpptraj_randomize_ions](https://biobb-amber.readthedocs.io/en/latest/cpptraj.html#module-cpptraj.cpptraj_randomize_ions) from **biobb_amber.cpptraj.cpptraj_randomize_ions**
***
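
To make the idea concrete, the following is a simplified, pure-Python illustration of ion randomization: each ion is swapped with a randomly chosen water position that is at least `by` Å from every solute atom and at least `overlap` Å from every other ion. The defaults 5.0 and 3.5 mirror the values in the cpptraj command shown in the log of the cell below. This is only a sketch under those assumptions, not cpptraj's actual implementation; the function and all coordinates are hypothetical.

```python
import math
import random

def randomize_ions(ions, waters, solute, by=5.0, overlap=3.5, seed=0):
    """Swap each ion with a random water site far enough from the solute
    and from the other ions. Returns new (ions, waters) lists."""
    rng = random.Random(seed)
    ions, waters = list(ions), list(waters)
    for i in range(len(ions)):
        others = ions[:i] + ions[i + 1:]
        candidates = [
            j for j, w in enumerate(waters)
            if all(math.dist(w, s) >= by for s in solute)
            and all(math.dist(w, o) >= overlap for o in others)
        ]
        if not candidates:
            continue  # no valid water site: leave this ion where it is
        j = rng.choice(candidates)
        ions[i], waters[j] = waters[j], ions[i]
    return ions, waters

# Toy system: one solute atom, a 5 Å grid of water sites, three ions near the solute.
solute = [(0.0, 0.0, 0.0)]
grid = range(-10, 11, 5)
waters = [(float(x), float(y), float(z)) for x in grid for y in grid for z in grid]
ions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

new_ions, new_waters = randomize_ions(ions, waters, solute)
print(new_ions)
```

Because positions are only swapped, the total number of particles and the set of occupied sites stay the same; only the identity of the particle at each site changes.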


In [15]:
# Import module
from biobb_amber.cpptraj.cpptraj_randomize_ions import cpptraj_randomize_ions

# Create prop dict and inputs/outputs
prop = { 
    "remove_tmp": True
}
output_cpptraj_crd_path = 'structure.randIons.crd'
output_cpptraj_pdb_path = 'structure.randIons.pdb'

# Create and launch bb
cpptraj_randomize_ions(
            input_top_path=output_ions_top_path,
            input_crd_path=output_ions_crd_path,
            output_pdb_path=output_cpptraj_pdb_path,
            output_crd_path=output_cpptraj_crd_path,
            properties=prop)

2021-02-19 15:56:36,230 [MainThread  ] [INFO ]  Creating a7cada99-c668-49b0-8160-b4981130f544 temporary folder
2021-02-19 15:56:36,234 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 15:56:58,713 [MainThread  ] [INFO ]  cpptraj  structure.ions.parmtop -i a7cada99-c668-49b0-8160-b4981130f544/cpptraj.in

2021-02-19 15:56:58,716 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 15:56:58,718 [MainThread  ] [INFO ]  
CPPTRAJ: Trajectory Analysis. V4.25.6
    ___  ___  ___  ___
     | \/ | \/ | \/ | 
    _|_/\_|_/\_|_/\_|_

| Date/time: 02/19/21 15:56:36
| Available memory: 65.566 MB

	Reading 'structure.ions.parmtop' as Amber Topology
	Radius Set: modified Bondi radii (mbondi)
INPUT: Reading input from 'a7cada99-c668-49b0-8160-b4981130f544/cpptraj.in'
  [trajin structure.ions.crd]
	Reading 'structure.ions.crd' as Amber Restart
  [randomizeions :K+,Cl-,Na+ around :DA,DC,DG,DT,D?3,D?5 by 5.0 overlap 3.5]
    RANDOMIZEIONS: Swapping postions of i

0

In [16]:
# Show DNA structure with randomized ions
view = nglview.show_structure_file(output_cpptraj_pdb_path)
view.clear_representations()
view.add_representation(repr_type='ball+stick', selection='nucleic')
view.add_representation(repr_type='spacefill', selection='K+')
view.add_representation(repr_type='spacefill', selection='Cl-')
view.add_representation(repr_type='line', selection='water')
view._remote_call('setSize', target='Widget', args=['','600px'])
view

NGLWidget()

<a id="mins"></a>
## Energetically minimize the system

**Energetically minimize** the **DNA structure** (in solvent) using the **sander tool** from the **AMBER MD package**.
***
**Building Blocks** used:
 - [sander_mdrun](https://biobb-amber.readthedocs.io/en/latest/sander.html#module-sander.sander_mdrun) from **biobb_amber.sander.sander_mdrun**
 - [process_minout](https://biobb-amber.readthedocs.io/en/latest/process.html#module-process.process_minout) from **biobb_amber.process.process_minout**
***
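
The `mdin` property in the cell below holds sander control variables, which sander reads from a `&cntrl` namelist in its input file. The sketch below shows how such a dictionary could be rendered as namelist text. It is illustrative only: `render_mdin` is a hypothetical helper, not the code biobb_amber uses, and `imin = 1` (which selects minimization in sander) is added here as an assumption about what a minimization input contains; the file generated by the building block may include further settings.

```python
def render_mdin(title, cntrl):
    """Render a dict of sander &cntrl variables as namelist text."""
    body = ",\n".join(f"   {key} = {value}" for key, value in cntrl.items())
    return f"{title}\n &cntrl\n{body}\n /\n"

# Values taken from the minimization cell, plus the assumed imin flag.
settings = {"imin": 1, "maxcyc": 500, "ntpr": 1}
print(render_mdin("Minimization (illustrative)", settings))
```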

In [17]:
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun

# Create prop dict and inputs/outputs
prop = {
    "simulation_type" : "minimization",
    "mdin" : { 
        'maxcyc' : 500,
        'ntpr' : 1,
        'dt' : 0.0001
    },
    "remove_tmp": True
}
output_min_traj_path = 'sander.min.x'
output_min_rst_path = 'sander.min.rst'
output_min_log_path = 'sander.min.log'

# Create and launch bb
sander_mdrun(
            input_top_path=output_ions_top_path,
#            input_top_path=output_solv_top_path,
            input_crd_path=output_cpptraj_crd_path,
            output_traj_path=output_min_traj_path,
            output_rst_path=output_min_rst_path,
            output_log_path=output_min_log_path,
            properties=prop)

2021-02-19 15:58:30,391 [MainThread  ] [INFO ]  Creating 9044d1f2-89e9-4ca9-9398-5edc4901262e temporary folder
2021-02-19 15:58:30,392 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 16:03:47,245 [MainThread  ] [INFO ]  sander -O -i 9044d1f2-89e9-4ca9-9398-5edc4901262e/sander.mdin -p structure.ions.parmtop -c structure.randIons.crd -r sander.min.rst -o sander.min.log -x sander.min.x

2021-02-19 16:03:47,250 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 16:03:47,254 [MainThread  ] [INFO ]  Removed: mdinfo, 9044d1f2-89e9-4ca9-9398-5edc4901262e


0

### Checking Energy Minimization results
Check the **energy minimization** results by plotting the **potential energy** at each step of the **minimization process**.

In [18]:
# Import module
from biobb_amber.process.process_minout import process_minout

# Create prop dict and inputs/outputs
prop = {
    #"terms" : ['ENERGY','RMS'],
    "terms" : ['ENERGY'],
    "remove_tmp": True
}
output_dat_path = 'sander.min.energy.dat'

# Create and launch bb
process_minout(input_log_path=output_min_log_path,
            output_dat_path=output_dat_path,
            properties=prop)

2021-02-19 16:05:38,142 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 16:05:38,463 [MainThread  ] [INFO ]  process_minout.perl  sander.min.log

2021-02-19 16:05:38,465 [MainThread  ] [INFO ]  Exit code 0

2021-02-19 16:05:38,467 [MainThread  ] [INFO ]  Processing sander output file (sander.min.log)...
Processing step 50 of a possible 500...
Processing step 100 of a possible 500...
Processing step 150 of a possible 500...
Processing step 200 of a possible 500...
Processing step 250 of a possible 500...
Processing step 300 of a possible 500...
Processing step 350 of a possible 500...
Processing step 400 of a possible 500...
Processing step 450 of a possible 500...
Processing step 500 of a possible 500...
Processing step 500 of a possible 500...
Starting output...
Outputing summary.NSTEP
Outputing summary.ENERGY
Outputing summary.RMS
Outputing summary.GMAX
Outputing summary.NAME
Outputing summary.NUMBER
Outputing summary.BOND
Outputing s

0

In [19]:
# Read data from file, keeping only energy values below 1000 kcal/mol
with open(output_dat_path,'r') as energy_file:
    x,y = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in energy_file 
            if not line.startswith(("#","@")) 
            if float(line.split()[1]) < 1000 
        ])
    )

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Energy Minimization",
                        xaxis=dict(title = "Energy Minimization Step"),
                        yaxis=dict(title = "Potential Energy (kcal/mol)")
                       )
}

plotly.offline.iplot(fig)
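
The nested comprehension above can be hard to read and fails if no row passes the filter. As a minimal alternative sketch, the same parsing can be written as an explicit loop that also tolerates blank lines and empty results. The function name `read_energy_series` and the sample data are hypothetical; the two-column "step energy" layout is assumed from the way the cell above reads the file.

```python
import io

def read_energy_series(lines, max_energy=1000.0):
    """Parse 'step energy' rows, skipping comments, blank lines and
    rows whose energy is not below max_energy."""
    steps, energies = [], []
    for line in lines:
        if not line.strip() or line.startswith(("#", "@")):
            continue
        fields = line.split()
        step, energy = float(fields[0]), float(fields[1])
        if energy < max_energy:
            steps.append(step)
            energies.append(energy)
    return steps, energies

# Made-up sample data standing in for the energy file.
sample = io.StringIO("# step energy\n1 2500.0\n2 -120.5\n\n3 -340.2\n")
print(read_energy_series(sample))
```

The returned lists can be passed to `go.Scatter(x=..., y=...)` exactly as `x` and `y` are in the cell above.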

### Extracting final structure
Extracting **final PDB structure** using the **ambpdb** tool from the **AMBER MD package**. 

In [20]:
# Import module
from biobb_amber.ambpdb.amber_to_pdb import amber_to_pdb

# Create prop dict and inputs/outputs
output_ambpdb_final_path = 'structure.final.pdb'

# Create and launch bb
amber_to_pdb(input_top_path=output_ions_top_path,
            input_crd_path=output_min_rst_path,
            output_pdb_path=output_ambpdb_final_path
            )

2021-02-19 16:05:45,582 [MainThread  ] [INFO ]  Creating command line with instructions and required arguments
2021-02-19 16:05:45,898 [MainThread  ] [INFO ]  ambpdb  -p structure.ions.parmtop -c sander.min.rst >  structure.final.pdb

2021-02-19 16:05:45,899 [MainThread  ] [INFO ]  Exit code 0



0

<a id="output"></a>
## Output files

Important **Output files** generated:
 - {{output_ambpdb_final_path}}: **Final structure** of the MD setup protocol.
 - {{output_min_rst_path}}: **Final coordinates** (restart file) of the MD setup protocol.
 - {{output_ions_top_path}}: **Final topology** of the MD system. 

 

In [23]:
from IPython.display import FileLink, display
display(FileLink(output_ambpdb_final_path))
display(FileLink(output_min_rst_path))
display(FileLink(output_ions_top_path))
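
Before handing these files over to a production run, it may be worth confirming that each one exists and is not empty. The sketch below uses only the standard library; `check_outputs` is a hypothetical helper, and the demonstration creates throwaway files in a temporary directory (named after the outputs above) rather than touching the real results.

```python
import tempfile
from pathlib import Path

def check_outputs(paths):
    """Return the paths that are missing or empty."""
    return [str(p) for p in map(Path, paths)
            if not p.is_file() or p.stat().st_size == 0]

# Demonstration with stand-in files: one valid, one empty, one missing.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "structure.final.pdb"
    good.write_text("END\n")
    empty = Path(tmp) / "sander.min.rst"
    empty.touch()
    missing = Path(tmp) / "structure.ions.parmtop"
    problems = [Path(p).name for p in check_outputs([good, empty, missing])]
print(problems)
```

In the notebook itself, the call would take the three output path variables listed above.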