### Using this notebook
- You have to execute this notebook by **running the cells in order**.
  - If you need to modify a value or a selection in a cell, you might need to **rerun the cells** below to reflect the change.
  - Running the whole workflow **exports all files to the user storage** and removes them from the temporary folders. This could cause some of the cells to not work properly if manually executed after a complete run (e.g. molecule visualizations). Therefore, a **cell by cell execution** is suggested instead of a whole run.
- **Any changes to the notebook need to be saved** before leaving the page in order to be persisted.
- A **setup process** is needed before running this notebook. This process installs the required software and library dependencies. Please find the **"Initial Setup Process" Jupyter notebook** in the Navigation section (left part of the collab) and execute it before starting this workflow. 

# Preparing a protein-ligand complex to run an atomistic Molecular Dynamics simulation
***
**Aim:** This **use case** illustrates, step by step, the process of **setting up a simulation system** containing a **protein in complex with a ligand**. The particular example used is the **acetylcholinesterase** protein (PDB code [1EVE](http://www.rcsb.org/structure/1EVE)), in complex with the **anti-Alzheimer drug E2020 (Aricept)** (3-letter code [E20](http://www.rcsb.org/ligand/E20)). 


**Cholinesterase inhibitors** are used for the symptomatic treatment of **Alzheimer's disease**. **E2020**, marketed as **Aricept**, is a member of a large family of N-benzylpiperidine-based acetylcholinesterase (AChE) inhibitors.


This workflow makes extensive use of the **BioExcel Building Blocks library** ([biobb](https://github.com/bioexcel/biobb)). Each step of the process is performed by a **building block** (bb), a wrapper around a tool or script that provides a particular functionality (e.g. solvating a system). If you are interested in expanding or modifying the current workflow, please visit the **existing documentation** for each of the packages [here](https://github.com/bioexcel/biobb). 

Although the **pipeline** is presented **step by step** with associated information, newcomers to the field are strongly advised to first spend some time reading documentation about **Molecular Dynamics simulations** to become familiar with the terms used. 

This workflow is based on the official GROMACS protein-ligand complex MD setup tutorial: http://www.mdtutorials.com/gmx/complex/index.html

***
**Version:** 1.0 (August 2019)
***
**Contributors:**  Adam Hospital, Pau Andrio, Aurélien Luciani, Genís Bayarri, Josep Lluís Gelpí, Modesto Orozco (IRB-Barcelona, Spain)
***
**Contact:** [adam.hospital@irbbarcelona.org](mailto:adam.hospital@irbbarcelona.org)
***
**Thanks:** The code that generates the data in a single folder and finally copies it to the collab storage was taken from the use case **multipipsa tool to calculate the electrostatic potential surrounding a protein in aqueous solution** by Neil Bruce, Lukas Adam, Stefan Richter, Rebecca Wade (HITS, Heidelberg, Germany).

## Setting up the working environment

### Importing required libraries

In [None]:
import os, datetime, magic
import nglview
import zipfile
import ipywidgets
import simpletraj
from hbp_service_client.storage_service.client import Client

### Set up collab storage for saving data at the end of the MD setup

In [None]:
# Find your own collab storage path
collab_path = get_collab_storage_path()
print(collab_path)
storage_client = Client.new(oauth.get_token())

### Set up local directory structure

In [None]:
# Create a local working directory
try:
    homeDir = os.environ['HOME']
except KeyError:
    print("Error in environment")

else:
    workDir = os.path.join(homeDir, 'IRB')
    if not os.path.isdir(workDir):
        try:
            os.mkdir(workDir)
        except OSError:
            print("unable to make working directory")
    
    # Make a new directory to run the use case in. 
    # If directory already exists, add a number to make a unique name
    baseDir = 'Complex_MDSetup'
    dirIter = 0
    useCaseDir = os.path.join(workDir, baseDir)
    print(useCaseDir)
    
    if os.path.exists(useCaseDir):
        while os.path.exists(useCaseDir):
            dirIter += 1
            useCaseDir = os.path.join(workDir, baseDir + '.' + str(dirIter))            
    
    try:
        os.mkdir(useCaseDir)
    except OSError:
        print("Failed to make use case working directory")
    else:
        print("Working directory for current use case: %s" % useCaseDir)
        os.chdir(useCaseDir)


### Defining logging output cells
By default, Jupyter notebooks display the logging information in **red-coloured cells**.

Here they are redefined to different colours, depending on the **logging level** (INFO, WARNING, ERROR), in order to avoid confusions with **critical error messages**. 

If you prefer to keep the default Jupyter notebook configuration, please **disable/comment out** (or just do not execute) the next cell.

In [None]:
from IPython.display import display, HTML
display(HTML('''<script>
const mo = new MutationObserver(
  mutations => mutations.forEach(mutation => {
    const element = mutation.target.querySelector(
        '.output_text.output_stderr');
    if (!element) return;
    if (element.textContent.includes('[INFO')) 
        element.style.background = '#DDD';
    else if (element.textContent.includes('[WARN')) 
        element.style.background = 'sandybrown';
    else if (element.textContent.includes('[ERROR')) 
        element.style.background = 'salmon';
}));
mo.observe(document.documentElement, 
    { childList: true, subtree: true });
</script>'''))


***
## Input parameters
**Input parameters** needed:
 - **pdbCode**: PDB code of the protein-ligand complex structure (e.g. 4N00)
 - **ligandCode**: Small molecule 3-letter code for the ligand structure (e.g. 2EX)
 - **mol_charge**: Charge of the small molecule, needed to add hydrogen atoms.
***

In [None]:
pdbCode = "1EVE"
ligandCode = "E20"
mol_charge = 0

***
## Fetching PDB structure
Downloading **PDB structure** with the **protein-ligand complex** from the RCSB PDB database.<br>
Alternatively, a PDB file can be used as starting structure. <br>
Splitting the molecule into **three different files**: 
- **proteinFile**: Protein structure
- **ligandFile**: Ligand structure
- **complexFile**: Protein-ligand complex structure 

***
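The split into protein and ligand records can be sketched with fixed-column parsing. The PDB format stores the record type in columns 1-6 and the residue name in columns 18-20, so comparing that field avoids matching the ligand code elsewhere in a line. The function name `split_pdb_records` and the sample lines (with made-up coordinates) are illustrative assumptions, not part of the workflow.

```python
def split_pdb_records(lines, ligand_code):
    """Split PDB lines into protein ATOM records and ligand HETATM records."""
    protein, ligand = [], []
    for line in lines:
        record = line[0:6].strip()      # columns 1-6: record type
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and res_name == ligand_code:
            ligand.append(line)
    return protein, ligand

# Made-up sample records: one protein atom, one ligand atom, one water
sample = [
    "ATOM      1  N   SER A   4     -12.503  87.910  34.923  1.00 69.91           N",
    "HETATM 4280  C1  E20 A2001       2.120  65.710  66.090  1.00 30.00           C",
    "HETATM 4400  O   HOH A3001       5.000  60.000  60.000  1.00 40.00           O",
]
protein_lines, ligand_lines = split_pdb_records(sample, "E20")
print(len(protein_lines), len(ligand_lines))
```

The water record is dropped because its residue name is `HOH`, which mirrors how the workflow keeps only the protein and the selected ligand.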

In [None]:
# Downloading desired PDB file 
# Import module
from biobb_io.api.pdb import Pdb

# Create properties dict and inputs/outputs
downloaded_pdb = pdbCode+'.orig.pdb'
prop = {
    'pdb_code': pdbCode,
    'filter': False
}

# Create and launch bb
Pdb(output_pdb_path=downloaded_pdb,
    properties=prop).launch()

# Extracting Protein, Ligand and Protein-Ligand Complex to three different files
proteinFile = pdbCode+'.pdb'
ligandFile = ligandCode+'.pdb'
complexFile = pdbCode+'_'+ligandCode+'.pdb'

# Building Protein PDB file
with open(proteinFile, 'w') as outfile:
    with open(downloaded_pdb) as infile:
        for line in infile:
            if line.startswith('ATOM'):
                outfile.write(line)

# Building Ligand PDB file
with open(ligandFile, 'w') as outfile:
    with open(downloaded_pdb) as infile:
        for line in infile:
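            # Match the residue-name field (columns 18-20) rather than any substring of the line
            if line.startswith('HETATM') and line[17:20].strip() == ligandCode: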
                outfile.write(line)

# Building Protein-Ligand Complex PDB file
filenames = [proteinFile,ligandFile]
with open(complexFile, 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

### Visualizing 3D structures
Visualizing the generated **PDB structures** using **NGL**:  
- **Protein structure** (Left)
- **Ligand structure** (Center)
- **Protein-ligand complex** (Right)  

In [None]:
# Show structures: protein, ligand and protein-ligand complex
# Better seen in "Full Screen Mode"

# Protein
sprotein = nglview.FileStructure(
    os.path.join(useCaseDir, proteinFile))
view1 = nglview.show_file(sprotein)
view1._remote_call('setSize', target='Widget', args=['350px','400px'])
view1.camera='orthographic'

# Ligand
sligand = nglview.FileStructure(
    os.path.join(useCaseDir, ligandFile))
view2 = nglview.show_file(sligand)
view2.add_representation(repr_type='ball+stick')
view2._remote_call('setSize', target='Widget', args=['350px','400px'])
view2.camera='orthographic'

# Complex
scomplex = nglview.FileStructure(
    os.path.join(useCaseDir, complexFile))
view3 = nglview.show_file(scomplex)
view3.add_representation(repr_type='licorice', radius='.5', selection=ligandCode)
view3._remote_call('setSize', target='Widget', args=['350px','400px'])
view3.camera='orthographic'

# Show
ipywidgets.HBox([view1, view2, view3])

***
## Fix protein structure
**Checking** and **fixing** (if needed) the protein structure:<br>
- **Modeling** **missing side-chain atoms**, modifying incorrect **amide assignments**, choosing **alternative locations**.<br>
- **Checking** for missing **backbone atoms**, **heteroatoms**, **modified residues** and possible **atomic clashes**.

***

In [None]:
# Check & Fix Protein Structure
# Import module
from biobb_model.model.fix_side_chain import FixSideChain

# Create prop dict and inputs/outputs
fixed_pdb = pdbCode+'_fixed.pdb'

# Create and launch bb
FixSideChain(input_pdb_path=proteinFile,
             output_pdb_path=fixed_pdb).launch()

***
## Create protein system topology
**Building GROMACS topology** corresponding to the protein structure.<br>
The force field used in this tutorial is [**amber99sb-ildn**](https://dx.doi.org/10.1002%2Fprot.22711): the AMBER **parm99** force field with **corrections on backbone** (sb) and **side-chain torsion potentials** (ildn). The water model used in this tutorial is [**spc/e**](https://pubs.acs.org/doi/abs/10.1021/j100308a038).<br>
Adding **hydrogen atoms** if missing. Automatically identifying **disulfide bridges**. <br>

Generating two output files: 
- **GROMACS structure** (gro file)
- **GROMACS topology** ZIP compressed file containing:
    - *GROMACS topology top file* (top file)
    - *GROMACS position restraint file/s* (itp file/s)
*** 
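To check what a topology ZIP contains, the standard `zipfile` module is enough. The sketch below builds a small stand-in archive in memory (the file names `protein.top` and `posre.itp` are hypothetical placeholders) and lists its members grouped by extension. With a real run, the path of the ZIP file written by the building block would be passed instead of the in-memory buffer.

```python
import io
import zipfile

def list_topology_members(zip_source):
    """Return the archive member names grouped by file extension."""
    members = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            ext = name.rsplit(".", 1)[-1].lower()
            members.setdefault(ext, []).append(name)
    return members

# Stand-in archive with placeholder contents (not real GROMACS output)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("protein.top", "; topology placeholder\n")
    archive.writestr("posre.itp", "; position restraints placeholder\n")
buffer.seek(0)

topology_members = list_topology_members(buffer)
print(topology_members)
```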

In [None]:
# Create Protein system topology
# Import module
from biobb_md.gromacs.pdb2gmx import Pdb2gmx

# Create inputs/outputs
output_pdb2gmx_gro = pdbCode+'_pdb2gmx.gro'
output_pdb2gmx_top_zip = pdbCode+'_pdb2gmx_top.zip'
prop = {
    'force_field' : 'amber99sb-ildn',
    'water_type': 'spce'
}

# Create and launch bb
Pdb2gmx(input_pdb_path=fixed_pdb,
        output_gro_path=output_pdb2gmx_gro,
        output_top_zip_path=output_pdb2gmx_top_zip,
        properties=prop).launch()

***
## Create ligand system topology
**Building GROMACS topology** corresponding to the ligand structure.<br>
Force field used in this tutorial step is **amberGAFF**: [General AMBER Force Field](http://ambermd.org/antechamber/gaff.html), designed for rational drug design.<br>
- Step 1: Add **hydrogen atoms** if missing.
- Step 2: **Energetically minimize the system** with the new hydrogen atoms. 
- Step 3: Generate **ligand topology** (parameters). 
***   

### Step 1: Add **hydrogen atoms**

In [None]:
# Create Ligand system topology, STEP 1
# Reduce_add_hydrogens: add Hydrogen atoms to a small molecule (using Reduce tool from Ambertools package)
# Import module
from biobb_chemistry.ambertools.reduce_add_hydrogens import ReduceAddHydrogens

# Create prop dict and inputs/outputs
output_reduce_h = ligandCode+'.reduce.H.pdb' 
prop = {
    'nuclear' : 'true'
}

# Create and launch bb
ReduceAddHydrogens(input_path=ligandFile,
                   output_path=output_reduce_h,
                   properties=prop).launch()
# Show protein
view = nglview.show_file(output_pdb2gmx_gro)
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view.camera='orthographic'
view

### Step 2: **Energetically minimize the system** with the new hydrogen atoms. 

In [None]:
# Create Ligand system topology, STEP 2
# Babel_minimize: Structure energy minimization of a small molecule after being modified adding hydrogen atoms
# Import module
from biobb_chemistry.babelm.babel_minimize import BabelMinimize

# Create prop dict and inputs/outputs
output_babel_min = ligandCode+'.H.min.mol2'                              
prop = {
    'method' : 'sd',
    'criteria' : '1e-10',
    'force_field' : 'GAFF'
}


# Create and launch bb
BabelMinimize(input_path=output_reduce_h,
              output_path=output_babel_min,
              properties=prop).launch()

### Visualizing 3D structures
Visualizing the generated **small molecule structures** using **NGL**:  
- **Original Ligand Structure** (Left)
- **Ligand Structure with hydrogen atoms added** (with Reduce program) (Center)
- **Ligand Structure with hydrogen atoms added** (with Reduce program), **energy minimized** (with Open Babel) (Right) 

In [None]:
# Show different structures generated (for comparison)
# Better seen in "Full Screen Mode"

# Original Ligand
sligand = nglview.FileStructure(
    os.path.join(useCaseDir, ligandFile))
view1 = nglview.show_file(sligand)
view1.add_representation(repr_type='ball+stick')
view1._remote_call('setSize', target='Widget', args=['350px','400px'])
view1.camera='orthographic'

# Ligand with added Hydrogen atoms
sligandH = nglview.FileStructure(
    os.path.join(useCaseDir, output_reduce_h))
view2 = nglview.show_file(sligandH)
view2.add_representation(repr_type='ball+stick')
view2._remote_call('setSize', target='Widget', args=['350px','400px'])
view2.camera='orthographic'

# Ligand with added Hydrogens, energy minimized
sligandHmin = nglview.FileStructure(
    os.path.join(useCaseDir, output_babel_min))
view3 = nglview.show_file(sligandHmin)
view3.add_representation(repr_type='ball+stick')
view3._remote_call('setSize', target='Widget', args=['350px','400px'])
view3.camera='orthographic'

# Show
ipywidgets.HBox([view1, view2, view3])

### Step 3: Generate **ligand topology** (parameters).

In [None]:
# Create Ligand system topology, STEP 3
# Acpype_params_gmx: Generation of topologies for GROMACS with ACPype
# Import module
from biobb_chemistry.acpype.acpype_params_gmx import AcpypeParamsGMX

# Create prop dict and inputs/outputs
output_acpype_gro = ligandCode+'params.gro'
output_acpype_itp = ligandCode+'params.itp'
output_acpype_top = ligandCode+'params.top'
output_acpype = ligandCode+'params'
prop = {
    'basename' : output_acpype,
    'charge' : mol_charge
}

# Create and launch bb
AcpypeParamsGMX(input_path=output_babel_min, 
                output_path_gro=output_acpype_gro,
                output_path_itp=output_acpype_itp,
                output_path_top=output_acpype_top,
                properties=prop).launch()

***
## Create new protein-ligand complex structure file
Building new **protein-ligand complex** PDB file with:
- The new **protein system** with the problems fixed in the *Fix Protein Structure* step and the hydrogen atoms added in the *Create Protein System Topology* step.
- The new **ligand system** with the hydrogen atoms added in the *Create Ligand System Topology* step. 

This new structure is needed for **GROMACS** because it is **force field-compliant**, it **has all the new hydrogen atoms**, and its **atom names match the newly generated protein and ligand topologies**.
***

In [None]:
# biobb analysis module
from biobb_analysis.gromacs.gmx_trjconv_str import GMXTrjConvStr

# Convert gro (with hydrogens) to pdb (PROTEIN)
proteinFile_H = pdbCode+'_'+ligandCode+'_complex_H.pdb'
prop = {
    'selection' : 'System'
}

# Create and launch bb
GMXTrjConvStr(input_structure_path=output_pdb2gmx_gro,
              input_top_path=output_pdb2gmx_gro,
              output_str_path=proteinFile_H, 
              properties=prop).launch()

# Convert gro (with hydrogens) to pdb (LIGAND)
ligandFile_H = ligandCode+'_complex_H.pdb'
prop = {
    'selection' : 'System'
}

# Create and launch bb
GMXTrjConvStr(input_structure_path=output_acpype_gro,
              input_top_path=output_acpype_gro,
              output_str_path=ligandFile_H, 
              properties=prop).launch()

# Concatenating both PDB files: Protein + Ligand
complexFile_H = pdbCode+'_'+ligandCode+'_H.pdb'
filenames = [proteinFile_H,ligandFile_H]
with open(complexFile_H, 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                if line.startswith('ATOM') or line.startswith('HETATM'):
                    outfile.write(line)

***
## Create new protein-ligand complex topology file
Building new **protein-ligand complex** GROMACS topology file with:
- The new **protein system** topology generated from *Create Protein System Topology* step.
- The new **ligand system** topology generated from *Create Ligand System Topology* step. 

NOTE: From this point on, the **protein-ligand complex structure and topology** generated can be used in a regular MD setup.
***

In [None]:
# AppendLigand: Append a ligand to a GROMACS topology
# Import module
from biobb_md.gromacs_extra.append_ligand import AppendLigand

# Create prop dict and inputs/outputs
output_complex_top = pdbCode+'_'+ligandCode+'_complex.top.zip'

# Create and launch bb
AppendLigand(input_top_zip_path=output_pdb2gmx_top_zip,
             input_itp_path=output_acpype_itp, 
             output_top_zip_path=output_complex_top).launch()

***
## Create solvent box
Define the unit cell for the **protein-ligand complex** to fill it with water molecules.<br>
A **truncated octahedron** box is used for the unit cell. This box type best reflects the geometry of the solute, in this case a **globular protein**, as it approximates a sphere. It is also convenient from the computational point of view, as it accumulates **fewer water molecules at the corners**, reducing the final number of water molecules in the system and making the simulation run faster.<br> A **protein to box** distance of **0.8 nm** is used, and the protein is **centered in the box**.  

***
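The saving from the box shape can be estimated with a short calculation. For the same distance d between periodic images, a cube has volume d³ while a truncated octahedron has volume (4/9)·√3·d³, roughly 77% of the cube. The image distance of 9 nm below is an arbitrary value chosen for illustration, not the size of this system.

```python
import math

def cubic_volume(d):
    """Volume of a cubic box with periodic image distance d."""
    return d ** 3

def truncated_octahedron_volume(d):
    """Volume of a truncated octahedron with the same image distance d."""
    return 4.0 / 9.0 * math.sqrt(3.0) * d ** 3

image_distance = 9.0  # nm, arbitrary illustrative value
cube = cubic_volume(image_distance)
octa = truncated_octahedron_volume(image_distance)
ratio = octa / cube
print(f"cube: {cube:.1f} nm^3, truncated octahedron: {octa:.1f} nm^3, ratio: {ratio:.3f}")
```

Since the solute volume is the same in both cases, the difference is almost entirely solvent, which is where the speed-up comes from.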

In [None]:
# Editconf: Create solvent box
# Import module
from biobb_md.gromacs.editconf import Editconf

# Create prop dict and inputs/outputs
output_editconf_gro = pdbCode+'_'+ligandCode+'_complex_editconf.gro'

prop = {
    'box_type': 'octahedron',
    'distance_to_molecule': 0.8
}

# Create and launch bb
Editconf(input_gro_path=complexFile_H, 
         output_gro_path=output_editconf_gro,
         properties=prop).launch()

***
## Fill the box with water molecules
Fill the unit cell for the **protein-ligand complex** with water molecules.<br>
The solvent type used is the default **Simple Point Charge water (SPC)**, a generic equilibrated 3-point solvent model. 

***

In [None]:
# Solvate: Fill the box with water molecules
from biobb_md.gromacs.solvate import Solvate

# Create prop dict and inputs/outputs
output_solvate_gro = pdbCode+'_'+ligandCode+'_solvate.gro'
output_solvate_top_zip = pdbCode+'_'+ligandCode+'_solvate_top.zip'

# Create and launch bb
Solvate(input_solute_gro_path=output_editconf_gro,
        output_gro_path=output_solvate_gro,
        input_top_zip_path=output_complex_top,
        output_top_zip_path=output_solvate_top_zip).launch()

### Visualizing 3D structure
Visualizing the **protein-ligand complex** with the newly added **solvent box** using **NGL**<br>
Although the system contains an **octahedral box** filled with **water molecules** surrounding the **protein structure**, the representation shown is the one used internally by the **GROMACS MD package**. **GROMACS** always uses the most **numerically efficient representation** of the coordinates, one that has everything re-wrapped into a **triclinic unit cell**, which is why the system does not look like an **octahedron**.

In [None]:
# Show protein
sprotein_solv = nglview.FileStructure(
    os.path.join(useCaseDir, output_solvate_gro))
view = nglview.show_file(sprotein_solv)
view.clear_representations()
view.add_representation(repr_type='cartoon', selection='protein', color='sstruc')
view.add_representation(repr_type='licorice', radius='.5', selection=ligandCode)
view.add_representation(repr_type='line', linewidth='1', selection='SOL', opacity='.3')
view._remote_call('setSize', target='Widget', args=['','600px'])
view.camera='orthographic'
view

***
## Adding ions
Add ions to neutralize the **protein-ligand complex** and reach a desired ionic concentration.
- Step 1: Creating portable binary run file for ion generation
- Step 2: Adding ions to **neutralize** the system and reach a **0.05 molar ionic concentration**
***
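A rough estimate of how many ions such a step adds follows from N = c · V · N_A, plus enough counter-ions to cancel the solute charge. The sketch below is a back-of-the-envelope calculation, not the algorithm used by the ion-generation tool, which may count or round differently. The net charge of -9 and the box volume of 500 nm³ are hypothetical values, and `estimate_ions` is an illustrative name.

```python
AVOGADRO = 6.02214076e23      # 1/mol
LITRES_PER_NM3 = 1.0e-24      # 1 nm^3 expressed in litres

def estimate_ions(net_charge, volume_nm3, concentration_molar):
    """Estimate monovalent cation/anion counts for neutralisation plus added salt."""
    salt_pairs = round(concentration_molar * volume_nm3 * LITRES_PER_NM3 * AVOGADRO)
    cations = salt_pairs + max(-net_charge, 0)  # extra cations offset a negative solute
    anions = salt_pairs + max(net_charge, 0)    # extra anions offset a positive solute
    return cations, anions

# Hypothetical system: net charge -9 in a 500 nm^3 box at 0.05 mol/L
n_cations, n_anions = estimate_ions(net_charge=-9, volume_nm3=500.0,
                                    concentration_molar=0.05)
print(n_cations, n_anions)
```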

### Step 1: Creating portable binary run file for ion generation
A simple set of **energy minimization** molecular dynamics parameters (mdp) will be used to generate the portable binary run file for **ion generation**, although **any legitimate combination of parameters** could be used in this step.

In [None]:
# Grompp: Creating portable binary run file for ion generation
from biobb_md.gromacs.grompp import Grompp

# Create prop dict and inputs/outputs
prop = {
    'mdp':{
        'type': 'minimization',
        'nsteps':'5000'
    }
}
output_gppion_tpr = pdbCode+'_'+ligandCode+'_complex_gppion.tpr'

# Create and launch bb
Grompp(input_gro_path=output_solvate_gro,
       input_top_zip_path=output_solvate_top_zip, 
       output_tpr_path=output_gppion_tpr,
       properties=prop).launch()

### Step 2: Adding ions to neutralize the system
Replace **solvent molecules** with **ions** to **neutralize** the system.

In [None]:
# Genion: Adding ions to reach a 0.05 molar concentration
from biobb_md.gromacs.genion import Genion

# Create prop dict and inputs/outputs
prop={
    'neutral':True
}
output_genion_gro = pdbCode+'_'+ligandCode+'_genion.gro'
output_genion_top_zip = pdbCode+'_'+ligandCode+'_genion_top.zip'

# Create and launch bb
Genion(input_tpr_path=output_gppion_tpr,
       output_gro_path=output_genion_gro, 
       input_top_zip_path=output_solvate_top_zip,
       output_top_zip_path=output_genion_top_zip, 
       properties=prop).launch()

### Visualizing 3D structure
Visualizing the **protein-ligand complex** with the newly added **ionic concentration** using **NGL**

In [None]:
# Show protein
sprotein_ions = nglview.FileStructure(
    os.path.join(useCaseDir, output_genion_gro))
view = nglview.show_file(sprotein_ions)
view.clear_representations()
view.add_representation(repr_type='cartoon', selection='protein', color='sstruc')
view.add_representation(repr_type='licorice', radius='.5', selection=ligandCode)
view.add_representation(repr_type='ball+stick', selection='NA')
view.add_representation(repr_type='ball+stick', selection='CL')
view._remote_call('setSize', target='Widget', args=['','600px'])
view.camera='orthographic'
view

***
## Energetically minimize the system
Energetically minimize the **protein-ligand complex** until the maximum force falls below a chosen threshold.
- Step 1: Creating portable binary run file for energy minimization
- Step 2: Energetically minimize the **protein-ligand complex** until the maximum force drops below 500 kJ mol-1 nm-1.
- Step 3: Checking **energy minimization** results. Plotting energy by time during the **minimization** process.
***

### Step 1: Creating portable binary run file for energy minimization
The **minimization** type of the **molecular dynamics parameters (mdp) property** contains the main default parameters to run an **energy minimization**:

-  integrator  = steep ; Algorithm (steep = steepest descent minimization)
-  emtol       = 1000.0 ; Stop minimization when the maximum force < 1000.0 kJ/mol/nm
-  emstep      = 0.01 ; Minimization step size (nm)
-  nsteps      = 50000 ; Maximum number of (minimization) steps to perform

In this particular example, the method used to run the **energy minimization** is the default **steepest descent**, but the **maximum force** threshold is set to **500 kJ/mol/nm**, and the **maximum number of steps** to perform (if the force threshold is not reached) to **5,000 steps**. 
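
The roles of `emtol`, `emstep` and `nsteps` can be illustrated on a toy problem. The sketch below is not GROMACS code: it applies a steepest-descent scheme of the kind described in the GROMACS manual (move along the force scaled by the largest force, grow the step by a factor 1.2 after an accepted move, halve it after a rejected one) to a made-up harmonic potential. The function names, the potential and the starting point are all assumptions for illustration.

```python
def steepest_descent(energy, force, x0, emtol=500.0, emstep=0.01, nsteps=5000):
    """Toy steepest descent; returns (coordinates, energy, steps used, converged)."""
    x = list(x0)
    step_size = emstep
    current_energy = energy(x)
    for step in range(nsteps):
        f = force(x)
        f_max = max(abs(component) for component in f)
        if f_max < emtol:                      # convergence criterion (emtol)
            return x, current_energy, step, True
        trial = [xi + step_size * fi / f_max for xi, fi in zip(x, f)]
        trial_energy = energy(trial)
        if trial_energy < current_energy:      # accept the move and grow the step
            x, current_energy = trial, trial_energy
            step_size *= 1.2
        else:                                  # reject the move and shrink the step
            step_size *= 0.5
    return x, current_energy, nsteps, False    # step limit (nsteps) reached

K = 1.0e5  # kJ mol^-1 nm^-2, made-up stiffness of the toy potential

def toy_energy(x):
    return 0.5 * K * sum(xi ** 2 for xi in x)

def toy_force(x):
    return [-K * xi for xi in x]

coords, final_energy, steps_used, converged = steepest_descent(
    toy_energy, toy_force, x0=[0.1, -0.05])
print(converged, steps_used, max(abs(f) for f in toy_force(coords)))
```

Lowering `emtol` makes the stopping criterion stricter, while `nsteps` only caps the effort if that criterion is never met.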

In [None]:
# Grompp: Creating portable binary run file for mdrun
from biobb_md.gromacs.grompp import Grompp

# Create prop dict and inputs/outputs
prop = {
    'mdp':{
        'type': 'minimization',
        'nsteps':'5000',
        'emstep': 0.01,
        'emtol':'500'
    }
}
output_gppmin_tpr = pdbCode+'_'+ligandCode+'_gppmin.tpr'

# Create and launch bb
Grompp(input_gro_path=output_genion_gro,
       input_top_zip_path=output_genion_top_zip,
       output_tpr_path=output_gppmin_tpr,
       properties=prop).launch()

### Step 2: Running Energy Minimization
Running **energy minimization** using the **tpr file** generated in the previous step. 

In [None]:
# Mdrun: Running minimization
from biobb_md.gromacs.mdrun import Mdrun

# Create prop dict and inputs/outputs
output_min_trr = pdbCode+'_'+ligandCode+'_min.trr'
output_min_gro = pdbCode+'_'+ligandCode+'_min.gro'
output_min_edr = pdbCode+'_'+ligandCode+'_min.edr'
output_min_log = pdbCode+'_'+ligandCode+'_min.log'

# Create and launch bb
Mdrun(input_tpr_path=output_gppmin_tpr,
      output_trr_path=output_min_trr, 
      output_gro_path=output_min_gro,
      output_edr_path=output_min_edr, 
      output_log_path=output_min_log).launch()

### Step 3: Checking Energy Minimization results
Checking **energy minimization** results. Plotting **potential energy** by time during the minimization process. 

In [None]:
# GMXEnergy: Getting system energy by time  
from biobb_analysis.gromacs.gmx_energy import GMXEnergy

# Create prop dict and inputs/outputs
output_min_ene_xvg = pdbCode+'_'+ligandCode+'_min_ene.xvg'
prop = {
    'terms':  ["Potential"]
}

# Create and launch bb
GMXEnergy(input_energy_path=output_min_edr, 
          output_xvg_path=output_min_ene_xvg, 
          properties=prop).launch()

In [None]:
import plotly
import plotly.graph_objs as go

# Read data from file and keep only energy values lower than 1000 kJ/mol
with open(output_min_ene_xvg,'r') as energy_file:
    x,y = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in energy_file 
            if not line.startswith(("#","@")) 
            if float(line.split()[1]) < 1000 
        ])
    )

plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Energy Minimization",
                        xaxis=dict(title = "Energy Minimization Step"),
                        yaxis=dict(title = "Potential Energy (kJ/mol)")
                       )
})

***
## Preparing Ligand Restraints
Before starting the equilibration of a **protein-ligand complex** system, it is recommended to apply some **restraints** to the small molecule, to avoid a possible change in position due to protein repulsion. **Position restraints** will be applied to the ligand, using a **force constant of 1000 kJ mol-1 nm-2** on the three coordinates: x, y and z.
- Step 1: Creating an index file with a new group including just the **small molecule heavy atoms**.
- Step 2: Generating the **position restraints** file.
***
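The size of the penalty imposed by such a restraint can be worked out from the harmonic form V = ½·k·(x − x_ref)² summed over the three axes. The sketch below uses made-up coordinates and an illustrative function name. It shows that moving a restrained atom 0.1 nm along one axis costs about 5 kJ/mol with the force constant used here.

```python
def position_restraint_energy(position, reference, force_constants):
    """Harmonic restraint energy: sum over axes of 0.5 * k * (x - x_ref)^2."""
    return sum(
        0.5 * k * (x - x_ref) ** 2
        for x, x_ref, k in zip(position, reference, force_constants)
    )

k_xyz = (1000.0, 1000.0, 1000.0)       # kJ mol^-1 nm^-2, as in this workflow
reference = (1.00, 2.00, 3.00)         # nm, made-up reference coordinates
displaced = (1.10, 2.00, 3.00)         # nm, atom moved 0.1 nm along x
penalty = position_restraint_energy(displaced, reference, k_xyz)
print(f"{penalty:.2f} kJ/mol")
```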

### Step 1: Creating an index file for the small molecule heavy atoms

In [None]:
# MakeNdx: Creating index file with a new group (small molecule heavy atoms)
from biobb_md.gromacs.make_ndx import MakeNdx

# Create prop dict and inputs/outputs
output_ligand_ndx = ligandCode+'_index.ndx'
prop = {
    'selection': "0 & ! a H*"
}

# Create and launch bb
MakeNdx(input_structure_path=output_acpype_gro,
        output_ndx_path=output_ligand_ndx,
        properties=prop).launch()

### Step 2: Generating the position restraints file

In [None]:
# Genrestr: Generating the position restraints file
from biobb_md.gromacs.genrestr import Genrestr

# Create prop dict and inputs/outputs
output_restraints_top = pdbCode+'_'+ligandCode+'_restraints.zip'
prop = {
    'force_constants': "1000 1000 1000"
}

# Create and launch bb
Genrestr(input_structure_path=output_min_gro,
         input_top_zip_path=output_genion_top_zip,
         input_ndx_path=output_ligand_ndx,
         output_top_zip_path=output_restraints_top,
         properties=prop).launch()

***
## Equilibrate the system (NVT)
Equilibrate the **protein-ligand complex** system in NVT ensemble (constant Number of particles, Volume and Temperature). To avoid temperature coupling problems, a new *"system"* group will be created including the **protein** + the **ligand** to be assigned to a single thermostatting group.

- Step 1: Creating an index file with a new group including the **protein-ligand complex**.
- Step 2: Creating portable binary run file for system equilibration
- Step 3: Equilibrate the **protein-ligand complex** with NVT ensemble.
- Step 4: Checking **NVT Equilibration** results. Plotting **system temperature** by time during the **NVT equilibration** process. 
***

### Step 1: Creating an index file with a new group including the **protein-ligand complex**

In [None]:
# MakeNdx: Creating index file with a new group (protein-ligand complex)
from biobb_md.gromacs.make_ndx import MakeNdx

# Create prop dict and inputs/outputs
output_complex_ndx = pdbCode+'_'+ligandCode+'_index.ndx'
prop = {
    #'selection': "\\\"Protein\\\"|\\\"" + ligandCode + "\\\""
    'selection': "\\\"Protein\\\"|\\\"Other\\\"" 
}

# Create and launch bb
MakeNdx(input_structure_path=output_min_gro,
        output_ndx_path=output_complex_ndx,
        properties=prop).launch()

### Step 2: Creating portable binary run file for system equilibration (NVT)
Note that for the purposes of temperature coupling, the **protein-ligand complex** (*Protein_Other*) is considered as a single entity.
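
Conceptually, the merged group is just the union of the atom indices of the two original groups, stored under a combined name. The sketch below illustrates this with made-up indices and writes the result in the plain-text index layout (a `[ name ]` header followed by atom numbers). The helper names are hypothetical and the real index file is produced by the building block, not by this code.

```python
def merge_index_groups(groups, first, second):
    """Return the name and sorted atom indices of the union of two groups."""
    merged = sorted(set(groups[first]) | set(groups[second]))
    return f"{first}_{second}", merged

def format_ndx_group(name, indices, per_line=15):
    """Format one group as a [ name ] header followed by lines of atom indices."""
    lines = [f"[ {name} ]"]
    for start in range(0, len(indices), per_line):
        lines.append(" ".join(str(i) for i in indices[start:start + per_line]))
    return "\n".join(lines)

# Made-up atom indices for illustration only
groups = {"Protein": [1, 2, 3, 4], "Other": [5, 6]}
merged_name, merged_atoms = merge_index_groups(groups, "Protein", "Other")
ndx_text = format_ndx_group(merged_name, merged_atoms)
print(ndx_text)
```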

In [None]:
# Grompp: Creating portable binary run file for NVT System Equilibration
from biobb_md.gromacs.grompp import Grompp

# Create prop dict and inputs/outputs
output_gppnvt_tpr = pdbCode+'_'+ligandCode+'_gppnvt.tpr'
prop = {
    'mdp':{
        'type': 'nvt',
        'nsteps':'5000',
        'dt' : '0.001',
        'tc-grps': 'Protein_Other Water_and_ions',
        'define': '-DPOSRES'
    }
}

# Create and launch bb
Grompp(input_gro_path=output_min_gro,
       input_top_zip_path=output_genion_top_zip,
       input_ndx_path=output_complex_ndx,
       output_tpr_path=output_gppnvt_tpr,
       properties=prop).launch()

### Step 3: Running NVT equilibration

In [None]:
# Mdrun: Running NVT System Equilibration 
from biobb_md.gromacs.mdrun import Mdrun

# Create prop dict and inputs/outputs
output_nvt_trr = pdbCode+'_'+ligandCode+'_nvt.trr'
output_nvt_gro = pdbCode+'_'+ligandCode+'_nvt.gro'
output_nvt_edr = pdbCode+'_'+ligandCode+'_nvt.edr'
output_nvt_log = pdbCode+'_'+ligandCode+'_nvt.log'
output_nvt_cpt = pdbCode+'_'+ligandCode+'_nvt.cpt'

# Create and launch bb
Mdrun(input_tpr_path=output_gppnvt_tpr,
      output_trr_path=output_nvt_trr,
      output_gro_path=output_nvt_gro,
      output_edr_path=output_nvt_edr,
      output_log_path=output_nvt_log,
      output_cpt_path=output_nvt_cpt).launch()

### Step 4: Checking NVT Equilibration results
Checking **NVT Equilibration** results. Plotting **system temperature** by time during the NVT equilibration process. 
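
Besides plotting, a quick numerical summary helps judge whether the temperature has settled around the thermostat reference value. The sketch below parses two-column XVG text and reports the mean and standard deviation. The sample data are made up and the function name `read_xvg` is an illustrative assumption.

```python
import statistics

def read_xvg(text):
    """Parse two-column XVG text, skipping '#' comments and '@' plot directives."""
    times, values = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "@")):
            continue
        fields = line.split()
        times.append(float(fields[0]))
        values.append(float(fields[1]))
    return times, values

# Made-up sample data, not output from this workflow
sample_xvg = """# sample header
@    title "Temperature"
0.0  298.0
1.0  301.0
2.0  300.0
3.0  301.0
"""
times, temps = read_xvg(sample_xvg)
mean_temp = statistics.mean(temps)
spread = statistics.pstdev(temps)
print(f"mean = {mean_temp:.1f} K, std = {spread:.2f} K")
```

With a real run, the text would be read from the XVG file written by the energy-extraction step.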

In [None]:
# GMXEnergy: Getting system temperature by time during NVT Equilibration  
from biobb_analysis.gromacs.gmx_energy import GMXEnergy

# Create prop dict and inputs/outputs
output_nvt_temp_xvg = pdbCode+'_'+ligandCode+'_nvt_temp.xvg'
prop = {
    'terms':  ["Temperature"]
}

# Create and launch bb
GMXEnergy(input_energy_path=output_nvt_edr, 
          output_xvg_path=output_nvt_temp_xvg, 
          properties=prop).launch()

In [None]:
import plotly
import plotly.graph_objs as go

# Read temperature data from file 
with open(output_nvt_temp_xvg,'r') as temperature_file:
    x,y = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in temperature_file 
            if not line.startswith(("#","@")) 
        ])
    )

plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Temperature during NVT Equilibration",
                        xaxis=dict(title = "Time (ps)"),
                        yaxis=dict(title = "Temperature (K)")
                       )
})

***
## Equilibrate the system (NPT)
Equilibrate the **protein-ligand complex** system in the NPT ensemble (constant Number of particles, Pressure and Temperature).
- Step 1: Creating portable binary run file for system equilibration
- Step 2: Equilibrate the **protein-ligand complex** with NPT ensemble.
- Step 3: Checking **NPT Equilibration** results. Plotting **system pressure and density** by time during the **NPT equilibration** process.
***

### Step 1: Creating portable binary run file for system equilibration (NPT)

In [None]:
# Grompp: Creating portable binary run file for (NPT) System Equilibration
from biobb_md.gromacs.grompp import Grompp

# Create prop dict and inputs/outputs
output_gppnpt_tpr = pdbCode+'_'+ligandCode+'_gppnpt.tpr'
prop = {
    'mdp':{
        'type': 'npt',
        'nsteps':'5000',
        'tc-grps': 'Protein_Other Water_and_ions',
        'define': '-DPOSRES'
    }
}

# Create and launch bb
Grompp(input_gro_path=output_nvt_gro,
       input_top_zip_path=output_genion_top_zip,
       input_ndx_path=output_complex_ndx,
       output_tpr_path=output_gppnpt_tpr,
       input_cpt_path=output_nvt_cpt,
       properties=prop).launch()


### Step 2: Running NPT equilibration

In [None]:
# Mdrun: Running NPT System Equilibration
from biobb_md.gromacs.mdrun import Mdrun

# Create prop dict and inputs/outputs
output_npt_trr = pdbCode+'_'+ligandCode+'_npt.trr'
output_npt_gro = pdbCode+'_'+ligandCode+'_npt.gro'
output_npt_edr = pdbCode+'_'+ligandCode+'_npt.edr'
output_npt_log = pdbCode+'_'+ligandCode+'_npt.log'
output_npt_cpt = pdbCode+'_'+ligandCode+'_npt.cpt'

# Create and launch bb
Mdrun(input_tpr_path=output_gppnpt_tpr,
      output_trr_path=output_npt_trr,
      output_gro_path=output_npt_gro,
      output_edr_path=output_npt_edr,
      output_log_path=output_npt_log,
      output_cpt_path=output_npt_cpt).launch()

### Step 3: Checking NPT Equilibration results
Checking **NPT Equilibration** results. Plotting **system pressure and density** by time during the **NPT equilibration** process. 
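
Instantaneous pressure values in an MD run typically fluctuate strongly, so a running average is often easier to read than the raw series. The sketch below shows one simple way to smooth a series; `running_mean` and the density values are hypothetical illustrations, not output of this workflow.

```python
def running_mean(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical density series in kg/m^3 (not real simulation output).
density = [990.0, 1010.0, 1000.0, 1004.0, 996.0, 1000.0]
smoothed = running_mean(density, window=3)
print(smoothed)
```

The smoothed series is shorter than the input by `window - 1` points, so the matching time values need to be trimmed accordingly before plotting.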

In [None]:
# GMXEnergy: Getting system pressure and density by time during NPT Equilibration  
from biobb_analysis.gromacs.gmx_energy import GMXEnergy

# Create prop dict and inputs/outputs
output_npt_pd_xvg = pdbCode+'_'+ligandCode+'_npt_PD.xvg'
prop = {
    'terms':  ["Pressure","Density"]
}

# Create and launch bb
GMXEnergy(input_energy_path=output_npt_edr, 
          output_xvg_path=output_npt_pd_xvg, 
          properties=prop).launch()

In [None]:
import plotly
from plotly import tools
import plotly.graph_objs as go

# Read pressure and density data from file 
with open(output_npt_pd_xvg,'r') as pd_file:
    x,y,z = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]),float(line.split()[2]))
            for line in pd_file 
            if not line.startswith(("#","@")) 
        ])
    )

plotly.offline.init_notebook_mode(connected=True)

trace1 = go.Scatter(
    x=x,y=y
)
trace2 = go.Scatter(
    x=x,y=z
)

fig = tools.make_subplots(rows=1, cols=2, print_grid=False)

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)

fig['layout']['xaxis1'].update(title='Time (ps)')
fig['layout']['xaxis2'].update(title='Time (ps)')
fig['layout']['yaxis1'].update(title='Pressure (bar)')
fig['layout']['yaxis2'].update(title='Density (kg/m^3)')

fig['layout'].update(title='Pressure and Density during NPT Equilibration')
fig['layout'].update(showlegend=False)

plotly.offline.iplot(fig)

***
## Free Molecular Dynamics Simulation
Upon completion of the **two equilibration phases (NVT and NPT)**, the system is now well-equilibrated at the desired temperature and pressure. The **position restraints** can now be released. The last step of the **protein-ligand complex** MD setup is a short, **free MD simulation**, to ensure the robustness of the system. 
- Step 1: Creating portable binary run file to run a **free MD simulation**.
- Step 2: Run short MD simulation of the **protein-ligand complex**.
- Step 3: Checking results for the final step of the setup process, the **free MD run**. Plotting **Root Mean Square deviation (RMSd)** and **Radius of Gyration (Rgyr)** by time during the **free MD run** step. 
***
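
The length of the free MD run is set through the number of integration steps. As a quick sanity check, the arithmetic relating steps, time step and simulated time can be sketched as follows, assuming the 2 fs (0.002 ps) time step mentioned in the comments of the Step 1 cell; the helper names are hypothetical.

```python
def simulated_time_ps(nsteps, dt_ps):
    """Simulated time in picoseconds for `nsteps` steps of `dt_ps` each."""
    return nsteps * dt_ps


def steps_for(duration_ps, dt_ps):
    """Number of integration steps needed to cover `duration_ps`."""
    return round(duration_ps / dt_ps)


DT_PS = 0.002  # assumed time step: 2 fs expressed in ps

short_run = steps_for(50, DT_PS)     # steps for a 50 ps run
long_run = steps_for(1000, DT_PS)    # steps for a 1 ns run
print(short_run, long_run, simulated_time_ps(5000, DT_PS))
```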

### Step 1: Creating portable binary run file to run a free MD simulation

In [None]:
# Grompp: Creating portable binary run file for mdrun
from biobb_md.gromacs.grompp import Grompp

# Create prop dict and inputs/outputs
prop = {
    'mdp':{
        'type': 'free',
        #'nsteps':'500000' # 1 ns (500,000 steps x 2fs per step)
        #'nsteps':'50000' # 100 ps (50,000 steps x 2fs per step)
        'nsteps':'25000' # 50 ps (25,000 steps x 2fs per step)
        #'nsteps':'5000' # 10 ps (5,000 steps x 2fs per step)
    }
}
output_gppmd_tpr = pdbCode+'_'+ligandCode + '_gppmd.tpr'

# Create and launch bb
Grompp(input_gro_path=output_npt_gro,
       input_top_zip_path=output_genion_top_zip,
       output_tpr_path=output_gppmd_tpr,
       input_cpt_path=output_npt_cpt,
       properties=prop).launch()

### Step 2: Running short free MD simulation

In [None]:
# Mdrun: Running free dynamics
from biobb_md.gromacs.mdrun import Mdrun

# Create prop dict and inputs/outputs
output_md_trr = pdbCode+'_'+ligandCode+'_md.trr'
output_md_gro = pdbCode+'_'+ligandCode+'_md.gro'
output_md_edr = pdbCode+'_'+ligandCode+'_md.edr'
output_md_log = pdbCode+'_'+ligandCode+'_md.log'
output_md_cpt = pdbCode+'_'+ligandCode+'_md.cpt'

# Create and launch bb
Mdrun(input_tpr_path=output_gppmd_tpr,
      output_trr_path=output_md_trr,
      output_gro_path=output_md_gro,
      output_edr_path=output_md_edr,
      output_log_path=output_md_log,
      output_cpt_path=output_md_cpt).launch()

### Step 3: Checking free MD simulation results
Checking the results of the final step of the setup process, the **free MD run**, by plotting the **Root Mean Square deviation (RMSd)** and the **Radius of Gyration (Rgyr)** over time. The **RMSd** is computed against both the **experimental structure** (input structure of the pipeline) and the **minimized and equilibrated structure** (output structure of the NPT equilibration step).
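
To make the two quantities concrete, here is a minimal pure-Python sketch of their textbook definitions. It is only an illustration under simplifying assumptions: the RMSd is computed without the least-squares superposition that GROMACS performs before measuring deviations, and the coordinates and masses are hypothetical.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two coordinate sets (no fitting)."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted root mean square distance from the center of mass."""
    total = sum(masses)
    center = [
        sum(m * atom[k] for m, atom in zip(masses, coords)) / total
        for k in range(3)
    ]
    weighted = sum(
        m * sum((atom[k] - center[k]) ** 2 for k in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(weighted / total)


# Hypothetical two-atom system, coordinates in nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(shifted, reference)

dimer = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
rgyr = radius_of_gyration(dimer, masses=[1.0, 1.0])
print(deviation, rgyr)
```

A steadily growing RMSd or a drifting Rgyr over the run would suggest the structure is still relaxing, whereas a plateau is what a stable setup is expected to show.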

In [None]:
# GMXRms: Computing Root Mean Square deviation to analyse structural stability 
#         RMSd against minimized and equilibrated snapshot (backbone atoms)   

from biobb_analysis.gromacs.gmx_rms import GMXRms

# Create prop dict and inputs/outputs
output_rms_first = pdbCode+'_'+ligandCode+'_rms_first.xvg'
prop = {
    'selection':  'Backbone'
}

# Create and launch bb
GMXRms(input_structure_path=output_gppmd_tpr,
       input_traj_path=output_md_trr,
       output_xvg_path=output_rms_first,
       properties=prop).launch()

In [None]:
# GMXRms: Computing Root Mean Square deviation to analyse structural stability 
#         RMSd against experimental structure (backbone atoms)   

from biobb_analysis.gromacs.gmx_rms import GMXRms

# Create prop dict and inputs/outputs
output_rms_exp = pdbCode+'_'+ligandCode+'_rms_exp.xvg'
prop = {
    'selection':  'Backbone'
}

# Create and launch bb
GMXRms(input_structure_path=output_gppmin_tpr,
       input_traj_path=output_md_trr,
       output_xvg_path=output_rms_exp,
       properties=prop).launch()

In [None]:
import plotly
import plotly.graph_objs as go

# Read RMS vs first snapshot data from file 
with open(output_rms_first,'r') as rms_first_file:
    x,y = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in rms_first_file 
            if not line.startswith(("#","@")) 
        ])
    )

# Read RMS vs experimental structure data from file 
with open(output_rms_exp,'r') as rms_exp_file:
    x2,y2 = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in rms_exp_file
            if not line.startswith(("#","@")) 
        ])
    )
    
trace1 = go.Scatter(
    x = x,
    y = y,
    name = 'RMSd vs first'
)

# Use the time axis read from the experimental-structure RMSd file
trace2 = go.Scatter(
    x = x2,
    y = y2,
    name = 'RMSd vs exp'
)

data = [trace1, trace2]

plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": data,
    "layout": go.Layout(title="RMSd during free MD Simulation",
                        xaxis=dict(title = "Time (ps)"),
                        yaxis=dict(title = "RMSd (nm)")
                       )
})


In [None]:
# GMXRgyr: Computing Radius of Gyration to measure the protein compactness during the free MD simulation 

from biobb_analysis.gromacs.gmx_rgyr import GMXRgyr

# Create prop dict and inputs/outputs
output_rgyr = pdbCode+'_'+ligandCode+'_rgyr.xvg'
prop = {
    'selection':  'Backbone'
}

# Create and launch bb
GMXRgyr(input_structure_path=output_gppmin_tpr,
        input_traj_path=output_md_trr,
        output_xvg_path=output_rgyr,
        properties=prop).launch()

In [None]:
import plotly
import plotly.graph_objs as go

# Read Rgyr data from file 
with open(output_rgyr,'r') as rgyr_file:
    x,y = map(
        list,
        zip(*[
            (float(line.split()[0]),float(line.split()[1]))
            for line in rgyr_file 
            if not line.startswith(("#","@")) 
        ])
    )

plotly.offline.init_notebook_mode(connected=True)

plotly.offline.iplot({
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Radius of Gyration",
                        xaxis=dict(title = "Time (ps)"),
                        yaxis=dict(title = "Rgyr (nm)")
                       )
})

***
## Post-processing and Visualizing resulting 3D trajectory
Post-processing and visualizing the **resulting trajectory** of the **protein-ligand complex system** MD setup using **NGL**.
- Step 1: *Imaging* the resulting trajectory, **stripping out water molecules and ions** and **correcting periodicity issues**.
- Step 2: Generating a *dry* structure, **removing water molecules and ions** from the final snapshot of the MD setup pipeline.
- Step 3: Visualizing the *imaged* trajectory using the *dry* structure as a **topology**. 
***

### Step 1: *Imaging* the resulting trajectory.
Stripping out **water molecules and ions** and **correcting periodicity issues**
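
Periodicity issues arise because atoms that leave the simulation box on one side re-enter on the opposite side, which can make molecules look broken or scattered in a viewer. The idea behind the correction can be sketched for a rectangular box as follows. This is a simplified illustration, not what the imaging tool does internally: it ignores triclinic boxes and does not keep molecules whole, and both helper names are hypothetical.

```python
import math


def wrap_into_box(position, box):
    """Map a position into the primary box [0, L) along each axis."""
    return tuple(
        x - length * math.floor(x / length)
        for x, length in zip(position, box)
    )


def minimum_image(delta, box):
    """Shortest displacement vector between two points under periodicity."""
    return tuple(
        d - length * round(d / length)
        for d, length in zip(delta, box)
    )


box = (5.0, 5.0, 5.0)  # hypothetical cubic box, edge length in nm
wrapped = wrap_into_box((5.5, -0.5, 2.0), box)
shortest = minimum_image((4.5, 0.0, 0.0), box)
print(wrapped, shortest)
```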

In [None]:
# GMXImage: "Imaging" the resulting trajectory
#           Removing water molecules and ions from the resulting trajectory
from biobb_analysis.gromacs.gmx_image import GMXImage

# Create prop dict and inputs/outputs
output_imaged_traj = pdbCode+'_imaged_traj.trr'
prop = {
    'center_selection':  'Protein',
    'output_selection': 'Protein',
    'pbc' : 'mol',
    'center' : True
}

# Create and launch bb
GMXImage(input_traj_path=output_md_trr,
         input_top_path=output_gppmd_tpr,
         output_traj_path=output_imaged_traj,
         properties=prop).launch()

### Step 2: Generating the output *dry* structure.
**Removing water molecules and ions** from the resulting structure

In [None]:
# GMXTrjConvStr: Converting and/or manipulating a structure
#                Removing water molecules and ions from the resulting structure
#                The "dry" structure will be used as a topology to visualize 
#                the "imaged dry" trajectory generated in the previous step.
from biobb_analysis.gromacs.gmx_trjconv_str import GMXTrjConvStr

# Create prop dict and inputs/outputs
output_dry_gro = pdbCode+'_md_dry.gro'
prop = {
    'selection':  'Protein'
}

# Create and launch bb
GMXTrjConvStr(input_structure_path=output_md_gro,
              input_top_path=output_gppmd_tpr,
              output_str_path=output_dry_gro,
              properties=prop).launch()

### Step 3: Visualizing the generated dehydrated trajectory.
Using the **imaged trajectory** (output of the Post-processing step 1) with the **dry structure** (output of the Post-processing step 2) as a topology.

In [None]:
# Show trajectory
nglview.show_simpletraj(nglview.SimpletrajTrajectory(output_imaged_traj, output_dry_gro), gui=True)

***
## Saving results to the storage area
Collecting all the files generated in the use case folder and transferring them to the storage area.
***
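
The archiving step can be tried in isolation before running it on real results. The following is a minimal sketch using only the standard library and a temporary directory; `zip_directory` and the file names are hypothetical, and the storage upload itself is not covered.

```python
import os
import tempfile
import zipfile


def zip_directory(source_dir, zip_path):
    """Archive every regular, non-zip file in source_dir; return their names."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            # Skip sub-directories and zip files, including the archive itself.
            if os.path.isfile(path) and not name.endswith(".zip"):
                archive.write(path, arcname=name)
                archived.append(name)
    return archived


with tempfile.TemporaryDirectory() as workdir:
    # Create placeholder files standing in for workflow outputs.
    for name in ("demo_md.gro", "demo_md.log"):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write("placeholder\n")

    zip_path = os.path.join(workdir, "demo_Complex_MDSetup.zip")
    archived = zip_directory(workdir, zip_path)

    # Read the archive back to confirm what was stored.
    with zipfile.ZipFile(zip_path) as archive:
        stored = sorted(archive.namelist())

print(archived)
```

Matching on the `.zip` extension, rather than on any file name containing "zip", avoids accidentally skipping result files whose names merely include that substring.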

In [None]:
# Building a zip file with all the generated info
zip_file = pdbCode + '_Complex_MDSetup' + ".zip"
with zipfile.ZipFile(zip_file, 'w') as zip_f:
    print("Generating a zip file with all the content generated "
          "in the {} project.".format(useCaseDir))
    for fName in os.listdir(useCaseDir):
        if not fName.endswith(".zip"):
            localFile = os.path.join(useCaseDir, fName)
            print ("Adding {} to the zip file...".format(fName))
            zip_f.write(localFile, arcname=fName)

In [None]:
# Saving results to the storage area
baseStorageDir = pdbCode + '_Complex_MDSetup_'
timestamp = datetime.datetime.now().strftime('%Y-%m-%d-%H-%M-%S')
storageDir = os.path.join(collab_path, baseStorageDir + timestamp)
try:
    print('Creating storage directory: %s' % storageDir)
    storage_client.mkdir(storageDir)
except Exception as err:
    print('There was an error creating the storage directory: %s' % err)
else:
    # Copy files to the storage area and remove the local files
    cleanDir = True
    for fName in os.listdir(useCaseDir):
        localFile = os.path.join(useCaseDir, fName)
        if not os.path.isdir(localFile):
            storageFile = os.path.join(storageDir, fName)
            fType =  magic.Magic(mime=True).from_file(localFile)
            try:
                storage_client.upload_file(localFile, storageFile, fType)
            except Exception as err:
                print('Error copying %s to storage: %s' % (fName, err))
                cleanDir = False
            else: 
                os.remove(localFile)
        else:
            cleanDir = False

    print('All files in the working directory have been moved to the storage area directory:')
    print(storageDir)
    os.chdir(homeDir)
    if cleanDir:
        os.rmdir(useCaseDir)