First, import all the libraries and check that the MD engines and the required Python libraries are present.
Also check that the other scripts are present.

In [1]:

#import libraries
import BioSimSpace as BSS
import glob
import csv
import numpy as np
import itertools
import os

print(BSS.__version__)

# TODO check if all engines are correctly installed

ModuleNotFoundError: BioSimSpace currently requires the Sire Python interpreter: www.siremol.org
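
The engine check left as a TODO in the cell above could be sketched as follows. This is a minimal sketch and not part of BioSimSpace: it only looks for executables on `PATH` using `shutil.which`. The executable names (`somd-freenrg`, `gmx`, `pmemd.cuda`) are assumptions about a typical SOMD/GROMACS/AMBER installation and should be adjusted to the local setup.

```python
import shutil


def find_executables(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumed executable names; adjust to the engines installed locally.
engine_executables = ["somd-freenrg", "gmx", "pmemd.cuda"]

for name, path in find_executables(engine_executables).items():
    print(f"{name}: {path if path else 'NOT FOUND'}")
```

An engine reported as `NOT FOUND` may still be installed but not loaded in the current environment, so a failed check is a prompt to look at the environment rather than proof that the engine is missing.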

# Overview of the FEP pipeline

0. Copy the scripts folder (which also contains this notebook) to the location where the pipeline is to be run.
This could be a folder named after the protein. Set the variables for this below.

### All steps overview

1. Set up and dock the ligands in Flare.
2. Export the protein as a pdb and the docked ligands as mol2.
3. Create the perturbation network (this notebook) and choose the parameters for the run.
4. Manually parameterise the protein and save it in the correct location.
5. Run the lig prep.
6. Run the FEP prep.
7. Run the production runs.
8. Perform the analysis.

In [4]:
#set variables
main_folder = "/home/anna/Documents/1st_yr/amber_bss_sem2/testing_two/tyk2"

# scripts should be located in:
scripts_folder = f"{main_folder}/scripts"
# check all scripts are present
# TODO scripts check

# make sure other folders exist too (makedirs also creates missing parent folders).
path_to_ligands = f"{main_folder}/inputs/ligands"
os.makedirs(path_to_ligands, exist_ok=True)

exec_dir = f"{main_folder}/execution_model"
os.makedirs(exec_dir, exist_ok=True)

# TODO set main directory variable in bash script for running all.
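
The scripts check left as a TODO in the cell above could be sketched as below. The script names are hypothetical placeholders, since the notebook does not list the actual files; in the notebook the function would be called with `scripts_folder` rather than the relative `"scripts"` path used here to keep the example self-contained.

```python
from pathlib import Path


def missing_files(folder, expected_names):
    """Return the expected file names that are not present in folder."""
    folder = Path(folder)
    return [name for name in expected_names if not (folder / name).is_file()]


# Hypothetical script names, for illustration only.
expected_scripts = ["ligprep.py", "fepprep.py", "run_all.sh"]

missing = missing_files("scripts", expected_scripts)
if missing:
    print("Missing scripts:", ", ".join(missing))
else:
    print("All scripts present.")
```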

#### Step 1
Dock all ligands in Flare, using the structure of the X-ray bound ligand as the reference. Choose the best pose for each ligand and rename the ligands so that each has the correct name.

#### Step 2
Export all the docked ligands as mol2 files and make sure these are saved in the {main_folder}/inputs/ligands folder set in the variable above.

Export the protein as a pdb, keeping the crystal waters. Save the protein file as {main_folder}/inputs/protein_exported.pdb, or under another suitable name, for manual parameterisation later.

#### Step 3

Creating the perturbation network is detailed in the next few cells.
Set the variables for the paths.
Choose the run options using the node controls.

All ligands to be considered should be in the necessary folder as mol2 files. They should be saved in the position in which they were docked in Flare, so that the coordinates are correct for the rest of the process.

In [5]:
#nodes to pick things
node = BSS.Gateway.Node("A node to create input files for molecular dynamics simulation.")

node.addInput("Ligand FF", BSS.Gateway.String(help="Force field to parameterise ligands with.",
                                             allowed=["GAFF1", "GAFF2", "OpenForceField"],
                                             default="GAFF2"))

node.addInput("Protein FF", BSS.Gateway.String(help="Force field to parameterise the protein with.",
                                             allowed=["FF03", "FF14SB", "FF99", "FF99SB", "FF99SBILDN"],
                                             default="FF14SB"))

node.addInput("Water Model", BSS.Gateway.String(help="Water model to use.",
                                             allowed=["SPC", "SPCE", "TIP3P", "TIP4P", "TIP5P"],
                                             default="TIP3P"))

node.addInput("Box Edges", BSS.Gateway.String(help="Size of water box around molecular system.",
                                             allowed=["20*angstrom", "25*angstrom", "30*angstrom", "35*angstrom", "45*angstrom", "5*nm", "7*nm", "10*nm"],
                                             default="20*angstrom"))

node.addInput("Box Shape", BSS.Gateway.String(help="Geometric shape of water box.",
                                             allowed=["cubic", "truncatedOctahedron"],
                                             default="cubic"))

node.addInput("Run Time", BSS.Gateway.String(help="The sampling time per lambda window.",
                                             allowed=["10*ps", "100*ps", "1*ns", "2*ns", "3*ns", "4*ns", "5*ns", "8*ns", "10*ns", "12*ns", "15*ns"],
                                             default="4*ns"))

node.addInput("HMR", BSS.Gateway.String(help="Whether or not Hydrogen Mass repartitioning should be used. If true, a timestep of 4 fs will be used.",
                                             allowed=["True","False"],
                                             default="True"))

node.addInput("FEP Engine", BSS.Gateway.String(help="Engine to run FEP with.",
                                             allowed=[e.upper() for e in BSS.FreeEnergy.engines()],
                                             default="SOMD"))

node.addInput("LambdaWindows", BSS.Gateway.String(help="The number of lambda windows for regular transformations.",
                                             allowed=["3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"],
                                             default="11"))

node.addInput("DiffLambdaWindows", BSS.Gateway.String(help="The number of lambda windows for difficult transformations.",
                                             allowed=["4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"],
                                             default="17"))
                                             
node.addInput("LOMAP Threshold", BSS.Gateway.String(help="The LOMAP score threshold to define difficult transformations.",
                                             allowed=["0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.7", "0.8", "0.9"],
                                             default="0.4"))

node.addInput("Number of repeats", BSS.Gateway.String(help="The number of repeats of the simulation.",
                                             allowed=[str(i) for i in range(1, 11)],
                                             default=str(1)))


node.showControls()

Box(children=(Box(children=(Box(children=(Label(value='Ligand FF: Force field to parameterise ligands with.'),…

object.__init__() takes exactly one argument (the instance to initialize)
This is deprecated in traitlets 4.2.This error will be raised in a future release of traitlets.
  super(Widget, self).__init__(**kwargs)

In [6]:
#generate transformation network based on ligands put in

ligand_files = glob.glob(f"{path_to_ligands}/*.mol2")

ligands = []
ligand_names = []

for filepath in ligand_files:
    # append the molecule object to a list.
    ligands.append(BSS.IO.readMolecules(filepath)[0])

    # append the molecule name (file name without extension) to another list
    # so that we can use the name of each molecule in our workflow.
    ligand_names.append(os.path.splitext(os.path.basename(filepath))[0])

transformations, lomap_scores = BSS.Align.generateNetwork(ligands, plot_network=True, names=ligand_names)

# print the transformation network
pert_network_dict = {}
transformations_named = [(ligand_names[transf[0]], ligand_names[transf[1]]) for transf in transformations]
for transf, score in zip(transformations_named, lomap_scores):
    print(transf, score)
    pert_network_dict[transf] = score


AlignmentError: Unable to create network plot!

In [5]:
#add transformations to the network
pert_network_dict[('ejm42', 'ejm31')] = 0.5
pert_network_dict[('ejm31', 'ejm42')] = 0.5
pert_network_dict[('ejm42', 'ejm54')] = 0.5
pert_network_dict[('ejm42', 'ejm55')] = 0.5
pert_network_dict[('ejm55', 'ejm54')] = 0.5

#remove transformations from the network (pop with a default avoids a KeyError if an edge is absent)
for key in [('ejm55', 'ejm31'), ('ejm54', 'ejm31'), ('ejm55', 'ejm42')]:
    pert_network_dict.pop(key, None)

#show the adjusted network
pert_network_dict

NameError: name 'pert_network_dict' is not defined
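
After adding and removing edges by hand, it can be worth confirming that every ligand is still connected to the rest of the network, since relative free energies can only be compared between ligands linked by a path of perturbations. The check below is a minimal sketch using a breadth-first search in plain Python; the function name is hypothetical and the example edges are the ones added in the cell above. In the notebook it could be called as `is_connected(ligand_names, pert_network_dict.keys())`.

```python
from collections import deque


def is_connected(ligands, edges):
    """Return True if every ligand is reachable from the first one via edges."""
    if not ligands:
        return True
    neighbours = {}
    for a, b in edges:
        # Treat each perturbation as undirected: both directions are written out later.
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {ligands[0]}
    queue = deque([ligands[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, set()) - seen:
            seen.add(nxt)
            queue.append(nxt)
    return set(ligands) <= seen


# Example with the ligand names used in this notebook.
example_ligands = ["ejm31", "ejm42", "ejm54", "ejm55"]
example_edges = [("ejm42", "ejm31"), ("ejm42", "ejm54"),
                 ("ejm42", "ejm55"), ("ejm55", "ejm54")]
print(is_connected(example_ligands, example_edges))
```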

In [8]:
#write files for execution model to be executed correctly.

# write ligands file.
with open(f"{exec_dir}/ligands.dat", "w") as ligands_file:
    writer = csv.writer(ligands_file)
    for lig in ligand_names:
        writer.writerow([lig])


In [None]:
# write perts file. The lambda schedule for each perturbation is based on its LOMAP score
# and on the window settings chosen in the node controls above.

# from protocol, derive the engine we want to use on the cluster.
engine = node.getInput('FEP Engine').upper()

with open(f"{exec_dir}/network.dat", "w") as network_file:

    writer = csv.writer(network_file, delimiter=" ")

    for pert, lomap_score in pert_network_dict.items():
        # based on the provided (at top of notebook) lambda allocations and LOMAP threshold, decide allocation.
        if lomap_score is None or lomap_score < float(node.getInput("LOMAP Threshold")):
            num_lambda = node.getInput("DiffLambdaWindows")
        else:
            num_lambda = node.getInput("LambdaWindows")

        # given the number of allocated lambda windows, generate an array for parsing downstream.
        lam_array_np = np.around(np.linspace(0, 1, int(num_lambda)), decimals=5)

        # make the array into a comma-separated string readable by bash.
        lam_array = ",".join(f"{lam:.4f}" for lam in lam_array_np)

        # write out both directions for this perturbation.
        writer.writerow([pert[0], pert[1], len(lam_array_np), lam_array, engine])
        writer.writerow([pert[1], pert[0], len(lam_array_np), lam_array, engine])
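
As a worked example of the allocation rule used in the cell above, the sketch below isolates it in a small function. The function name and the small window counts are illustrative only; evenly spaced values are computed in plain Python here, which should match `np.linspace(0, 1, n)` at four decimal places.

```python
def lambda_schedule(lomap_score, threshold, n_regular, n_difficult):
    """Return evenly spaced lambda values as a comma-separated string.

    Perturbations with no score, or a score below the threshold, are treated
    as difficult and get n_difficult windows; all others get n_regular.
    """
    difficult = lomap_score is None or lomap_score < threshold
    n_windows = n_difficult if difficult else n_regular
    # Assumes at least two windows, so that both end states are included.
    values = [i / (n_windows - 1) for i in range(n_windows)]
    return ",".join(f"{lam:.4f}" for lam in values)


# A score above the threshold gets the regular allocation...
print(lambda_schedule(0.5, threshold=0.4, n_regular=3, n_difficult=5))
# ...and a score below it gets the larger, difficult allocation.
print(lambda_schedule(0.2, threshold=0.4, n_regular=3, n_difficult=5))
```

With these illustrative settings the first call gives three windows (`0.0000,0.5000,1.0000`) and the second gives five (`0.0000,0.2500,0.5000,0.7500,1.0000`).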



In [12]:
# create protocol. 
protocol = [
    f"ligand forcefield = {node.getInput('Ligand FF')}",
    f"protein forcefield = {node.getInput('Protein FF')}",
    f"solvent = {node.getInput('Water Model')}",
    f"box edges = {node.getInput('Box Edges')}",
    f"box type = {node.getInput('Box Shape')}",
    f"protocol = default",
    f"sampling = {node.getInput('Run Time')}",
    f"engine = {node.getInput('FEP Engine').upper()}",
    f"HMR = {node.getInput('HMR')}",
    f"repeats = {node.getInput('Number of repeats')}"
]

# write protocol to file.
with open(f"{exec_dir}/protocol.dat", "w") as protocol_file:
    writer = csv.writer(protocol_file)

    for prot_line in protocol:
        
        writer.writerow([prot_line])

In [13]:
# print the three generated files so they can be checked by eye.
sections = [("Ligands...", "ligands.dat"), ("Network...", "network.dat"), ("Protocol...", "protocol.dat")]
for i, (title, filename) in enumerate(sections):
    if i > 0:
        print('\n')
    print(title)
    # the context manager closes each file after reading.
    with open(f"{exec_dir}/{filename}") as f:
        for row in csv.reader(f):
            print(str(row))

Ligands...
['ejm54']
['ejm55']
['ejm42']
['ejm31']


Protocol...
['ligand forcefield = GAFF2']
['protein forcefield = FF14SB']
['solvent = TIP3P']
['box edges = 20*angstrom']
['box type = cubic']
['protocol = default']
['sampling = 2*ns']
['engine = SOMD']
['HMR = True']
['repeats = 3']


#### Step 3 (cont.)
The cell above should have printed the ligands, the network and the protocol. Check that these are all as expected.
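
Beyond checking by eye, the protocol lines can be parsed back into a dictionary and compared against the intended settings. This is a minimal sketch assuming the `key = value` layout written to protocol.dat; the function name is hypothetical, and the example lines are taken from the printed output. In the notebook, the lines could instead be read from `f"{exec_dir}/protocol.dat"`.

```python
def parse_protocol(lines):
    """Turn 'key = value' lines into a dict, skipping blank lines."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


example = ["ligand forcefield = GAFF2", "sampling = 2*ns", "repeats = 3"]
settings = parse_protocol(example)
print(settings["sampling"])
```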


#### Step 4
Manually parameterise the protein with the correct protonation states (especially around the ligand binding site!), checking for any missing residues or stray solvent molecules. Keep the crystal waters.
This can be done using tleap with the protein force field and water model selected above.
Make sure the result is saved in {main_folder}/inputs/ as prot_water.rst7 and prot_water.prm7.
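
A tleap input for this step could look roughly like the one generated below. This is a sketch under stated assumptions, not the pipeline's own procedure: it assumes the FF14SB and TIP3P defaults were chosen, that the exported pdb already has the correct protonation states and residue names, and that the AmberTools `leaprc.protein.ff14SB` and `leaprc.water.tip3p` files are available. The helper name is hypothetical, and the result would be run with `tleap -f <file>`.

```python
def tleap_input(pdb_file, out_prefix):
    """Build a tleap script for an FF14SB protein with TIP3P crystal waters."""
    return "\n".join([
        "source leaprc.protein.ff14SB",   # protein force field (assumed choice)
        "source leaprc.water.tip3p",      # water model (assumed choice)
        f"prot = loadPdb {pdb_file}",
        "check prot",                     # reports missing parameters and close contacts
        f"saveAmberParm prot {out_prefix}.prm7 {out_prefix}.rst7",
        "quit",
    ]) + "\n"


print(tleap_input("protein_exported.pdb", "prot_water"))
```

Any warnings from `check` are worth reading before the saved files are used in the later steps.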

#### Steps 5-7
Run the lig prep, the FEP prep and the production runs.
These processes are contained within bash scripts so that they can be run on a cluster.
At this point, the following files and folders (XXX) can be copied to a different location to be run.
The overall script to execute all of these is xyz.

#### Step 8
The analysis to initially process the data is carried out using xyz. This data is saved here in the folders - .
The output of the methods and any issues encountered are documented in xyz.
xyz can be used to prepare default graphs.
The cells below are an alternative way to adjust the plotting for different situations.