# Druggability project

### Andreu Bofill, Inés Sentís, Mariona Torrens, Alejandro Varela

This project aims to provide a simple platform that, given a protein and a set of ligands, detects which ligands bind the protein with a free energy lower than -2 kcal/mol. Such a value reflects a favourable interaction between the ligand and the protein, an interesting property in a drug because, ideally, a low interaction energy corresponds to a good drug candidate.

The platform starts by parameterizing the selected ligands so that their conformations are as close to reality as possible. This process is computationally demanding, and its cost depends on the number of atoms in the molecule being parameterized. The script that accomplishes this is in the GitHub repository and is called 'parameter.py'.



In [None]:
from glob import glob  # used below to list the build, equil and generator folders
from htmd import *
from htmd.molecule.util import maxDistance
from htmd.protocols.equilibration_v1 import Equilibration
from htmd.protocols.production_v1 import Production
from natsort import natsorted

def simulate(pdbpath,ligandpath,path_ligand_rtf,path_ligand_prm,nbuilds=4,run_time=50,minsim=6,maxsim=8,numbep=12,dimtica=3,sleeping=14400):
    poses=dockinit(pdbpath,ligandpath)
    print('\nDocking finished.')
    building(poses,path_ligand_rtf,path_ligand_prm,nbuilds)
    print('\nAll systems built.')
    Equilibrate()
    print('All systems equilibrated. Entering production, this could take days of running...')
    Produce(run_time)
    print('Finished producing. Starting the adaptive run, this could take days of running...')
    adaptive(minsim,maxsim,numbep,dimtica,sleeping)

To start, the platform initializes the system by docking the ligand to the protein with the dock function of HTMD. The top five poses are used to build the systems, and each pose is built independently. Starting from docked poses gives the simulations a reasonable starting point and saves time and computing resources.

In [None]:
def dockinit(pdbpath,ligandpath):
    # building() reads prot and D, so make them module-level names
    global prot, D
    prot = Molecule(pdbpath) #'ethtryp/trypsin.pdb'
    prot.filter('protein or water or resname CA') #before it said chain A and (...)
    prot.set('segid', 'P', sel='protein and noh')
    prot.set('segid', 'W', sel='water')
    prot.set('segid', 'CA', sel='resname CA')
    prot.center()
    # Measure the extent after centering so the solvation box is built around the origin
    D = maxDistance(prot, 'all')
    D = D + 15
    lig = Molecule(ligandpath) # 'ethtryp/ethanol.pdb'
    print(lig,prot)
    poses, scores = dock(prot, lig)
    return(poses)

Each of the five poses is solvated, and salt is added to a concentration of 0.15 M to mimic cellular conditions.

In [None]:
def building(poses,path_ligand_rtf,path_ligand_prm,nbuilds=4):
    moltbuilt=[]
    for i, p in enumerate(poses):
        ligand = p
        ligand.set('segid','L')
        ligand.set('resname','MOL')
        mol = Molecule(name='combo')
        mol.append(prot)
        mol.append(ligand)

        smol = solvate(mol, minmax=[[-D, -D, -D], [D, D, D]])
        topos  = ['top/top_all22star_prot.rtf', 'top/top_water_ions.rtf',path_ligand_rtf] #'./ethtryp/ethanol.rtf'
        params = ['par/par_all22star_prot.prm', 'par/par_water_ions.prm', path_ligand_prm] #'./ethtryp/ethanol.prm'

        moltbuilt.append(charmm.build(smol, topo=topos, param=params, outdir='./docked/build/{}/'.format(i+1), saltconc=0.15))
        if i==nbuilds:
            break
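
As a rough sanity check on what a 0.15 M salt concentration means for a box of this size, the sketch below estimates how many ion pairs fit in a cubic box. It is plain arithmetic, not what `charmm.build` does internally (the builder may also add counter-ions to neutralize the system). The helper name `ion_pairs_for_box` and the 60 Å box side are illustrative assumptions.

```python
AVOGADRO = 6.02214076e23      # particles per mole
LITRES_PER_A3 = 1e-27         # 1 cubic angstrom = 1e-27 L

def ion_pairs_for_box(side_angstrom, molarity=0.15):
    """Estimate the ion pairs needed to reach `molarity` in a cubic box."""
    volume_litres = side_angstrom ** 3 * LITRES_PER_A3
    return round(molarity * AVOGADRO * volume_litres)

# Hypothetical box: D = 30 A, so the solvate() box spans 2 * D = 60 A per side
print(ion_pairs_for_box(60.0))
```

Because the count grows with the cube of the box side, a generous padding around the protein quickly increases the number of ions and water molecules to simulate.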

After this, an equilibration protocol is run on each system. It brings each system to a temperature of 298 K over 1000 time steps.

In [None]:
def Equilibrate():
    md = Equilibration()
    md.numsteps = 1000
    md.temperature = 298
    builds=natsorted(glob('docked/build/*/'))
    for i,b in enumerate(builds):
        md.write(b,'docked/equil/{}/'.format(i+1))
    mdx = AcemdLocal()
    mdx.submit(glob('./docked/equil/*/'))
    mdx.wait()

The equilibrated systems then enter the production step, where a trajectory is generated for each system by integrating Newton's equations of motion.

In [None]:
def Produce(run_time=50):
    equils=natsorted(glob('docked/equil/*/'))
    for i,b in enumerate(equils):
        md= Production()
        md.acemd.bincoordinates = 'output.coor'
        md.acemd.extendedsystem  = 'output.xsc'
        md.acemd.binvelocities=None
        md.acemd.binindex=None
        md.acemd.run=str(run_time)+'ns'
        md.temperature = 300
        md.write('./docked/equil/{}/'.format(i+1), 'docked/generators/{}/'.format(i+1))

    mdx = AcemdLocal()
    mdx.submit(glob('./docked/generators/*/'))
    mdx.wait()
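
To make "integrating Newton's equations of motion" concrete, here is a minimal velocity Verlet sketch on a one-dimensional harmonic spring. It is a toy illustration only: the function name `velocity_verlet` and the spring parameters are assumptions, and the integrator, thermostat and force field that ACEMD actually uses are not shown in this notebook.

```python
def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Integrate Newton's equation m * a = F(x) with the velocity Verlet scheme."""
    trajectory = [x]
    a = force(x) / mass
    for _ in range(nsteps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity with the averaged acceleration
        a = a_new
        trajectory.append(x)
    return trajectory, v

# Toy system: harmonic spring F = -k x with k = m = 1 (arbitrary units)
k = 1.0
traj, v_final = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                                mass=1.0, dt=0.01, nsteps=1000)
energy = 0.5 * v_final ** 2 + 0.5 * k * traj[-1] ** 2
print(len(traj), energy)
```

The total energy should stay close to its initial value of 0.5, which is the usual quick check that the time step is small enough.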

Finally, we run adaptive sampling to generate the epochs that will be used for the ligand binding analysis. A folder called 'filtered' will be created in the working directory, containing the filtered trajectories for all the epochs. Adaptive sampling speeds up the simulation process: each new epoch is started from the most advanced positions reached so far, rather than repeating everything from the beginning.

In [None]:
def adaptive(minsim=6,maxsim=8,numbep=12,dimtica=3,sleeping=14400):
    md = AdaptiveRun()
    md.nmin=minsim
    md.nmax=maxsim
    md.nepochs = numbep
    md.app = AcemdLocal()
    md.generatorspath='./docked/generators/'
    md.datapath='./docked/generators/'
    md.inputpath='./docked/generators/'
    md.dryrun = False 
    md.metricsel1 = 'name CA'
    md.metricsel2 = 'resname MOL and noh'
    md.metrictype = 'contacts'
    md.ticadim = dimtica
    md.updateperiod = sleeping
    md.run()
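
The idea behind choosing where each new epoch starts can be sketched as follows. The notebook does not show the exact weighting that `AdaptiveRun` applies, so the inverse-count rule below is an assumption used for illustration, and `pick_respawn_states` is a hypothetical helper, not part of HTMD.

```python
import random

def pick_respawn_states(frames_per_state, n_new, seed=0):
    """Sample starting states with probability proportional to 1 / frames."""
    states = list(frames_per_state)
    weights = [1.0 / frames_per_state[s] for s in states]
    total = sum(weights)
    probabilities = {s: w / total for s, w in zip(states, weights)}
    rng = random.Random(seed)
    picks = rng.choices(states, weights=weights, k=n_new)
    return picks, probabilities

# Hypothetical frame counts per state after one epoch
counts = {'unbound': 900, 'encounter': 90, 'bound': 10}
picks, probs = pick_respawn_states(counts, n_new=8)
print(probs)
print(picks)
```

Rarely visited states receive most of the new simulations, which is how the procedure avoids spending time re-sampling regions that are already well explored.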

Once the epochs are generated, we can analyse the interaction between the ligand and the protein.

In [None]:
#sims = simlist(glob('input/*/'), glob('input/*/structure.pdb'))
#fsims = simfilter(sims, './filtered/', filtersel='not water')
sims = simlist(glob('./filtered/*/'), './filtered/filtered.pdb')

To build a Markov model we need to work in a lower-dimensional space, so we project each frame onto the binary contact map between the protein's alpha carbons and the ligand's heavy atoms. (Open question: the tutorial says otherwise, but I suspect it is wrong: https://www.htmd.org/docs/htmd.projections.metricdistance.html?highlight=metricdistance)

In [None]:
metr = Metric(sims)
metr.projection(MetricDistance('protein and name CA', 'resname MOL and noh', metric='contacts'))
data = metr.project()
data.fstep = 0.1
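
For a single frame, a binary contact map can be sketched in plain Python as below. The function `contact_map`, the coordinates and the 8 Å cutoff are illustrative assumptions; check the `MetricDistance` documentation for the threshold it actually applies.

```python
import math

def contact_map(ca_coords, ligand_coords, threshold=8.0):
    """Return a flat list of 0/1 flags, one per (CA atom, ligand atom) pair."""
    return [1 if math.dist(ca, lig) < threshold else 0
            for ca in ca_coords
            for lig in ligand_coords]

# Hypothetical coordinates in angstroms: three alpha carbons, two ligand heavy atoms
ca_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
ligand_atoms = [(3.0, 4.0, 0.0), (12.0, 0.0, 0.0)]
print(contact_map(ca_atoms, ligand_atoms))
```

Each frame becomes one such vector of zeros and ones, so a trajectory turns into a matrix with one row per frame and one column per atom pair.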

We now plot the lengths of the trajectories to check whether they are equal. Trajectories whose length differs from the mode are dropped because they are probably corrupted.

In [None]:
data.plotTrajSizes()
data.dropTraj()
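
The filtering rule described above, keeping only trajectories whose length equals the mode, can be sketched in a few lines. `keep_mode_length` and the frame counts are hypothetical and only illustrate the rule; they do not reproduce the internals of `dropTraj`.

```python
from collections import Counter

def keep_mode_length(trajectory_lengths):
    """Return indices of trajectories whose length equals the most common length."""
    mode_length, _ = Counter(trajectory_lengths).most_common(1)[0]
    return [i for i, n in enumerate(trajectory_lengths) if n == mode_length]

# Hypothetical frame counts: trajectories 2 and 4 stopped early
lengths = [500, 500, 137, 500, 42]
print(keep_mode_length(lengths))
```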

TICA is performed to achieve a better separation of the metastable minima.

In [None]:
tica = TICA(data, 10)
dataTica = tica.project(3)
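
TICA looks for combinations of the input features that stay correlated with themselves over the chosen lag (10 frames above), since those capture the slow motions. The sketch below does not implement TICA itself; it only computes the time-lagged autocorrelation of two made-up features to show the quantity TICA maximizes. All names and data are assumptions.

```python
import math
import random

def lagged_autocorrelation(series, lag):
    """Normalised autocovariance of a 1-D series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    variance = sum(c * c for c in centred) / n
    covariance = sum(centred[t] * centred[t + lag] for t in range(n - lag)) / (n - lag)
    return covariance / variance

rng = random.Random(0)
slow = [math.sin(2 * math.pi * t / 200) for t in range(1000)]   # slow oscillation
fast = [rng.gauss(0.0, 1.0) for _ in range(1000)]               # uncorrelated noise

print(lagged_autocorrelation(slow, lag=10))
print(lagged_autocorrelation(fast, lag=10))
```

The slow feature keeps an autocorrelation close to 1 while the noisy one drops towards 0, so a projection that ranks directions by this quantity retains the slow feature and discards the noise.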

In [None]:
dataBoot = dataTica.bootstrap(0.8)
dataBoot.cluster(MiniBatchKMeans(n_clusters=1000), mergesmall=5) #try with dataTica instead of dataBoot

Once the clustering is done, it is time to construct the Markov model. To do this, we inspect an implied timescales (ITS) plot to see at which lag time the timescales start to converge and how many distinct timescales there are.

In [None]:
model = Model(dataBoot) #try with dataTica
model.plotTimescales() 

In [None]:
model.markovModel(50, 5) 
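
An implied timescale is obtained from an eigenvalue of the transition matrix estimated at lag τ, as t = -τ / ln(λ). The toy sketch below estimates it for a two-state trajectory by counting transitions. It is an illustration under simplifying assumptions (two states, no reversibility constraint, made-up data) and not the estimator HTMD uses.

```python
import math

def two_state_timescale(dtraj, lag):
    """Estimate the implied timescale of a two-state chain at a given lag."""
    counts = [[0, 0], [0, 0]]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    # Row-normalise the counts into transition probabilities
    p01 = counts[0][1] / sum(counts[0])
    p10 = counts[1][0] / sum(counts[1])
    second_eigenvalue = 1.0 - p01 - p10     # the first eigenvalue is always 1
    return -lag / math.log(second_eigenvalue)

# Hypothetical discrete trajectory: 0 = unbound, 1 = bound
dtraj = [0] * 40 + [1] * 40 + [0] * 40 + [1] * 40
print(two_state_timescale(dtraj, lag=1))
```

Repeating the estimate for increasing lags and plotting the result gives the ITS curve; the lag passed to `markovModel` should be taken from the region where that curve has flattened.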

In [None]:
htmd.config(viewer='vmd')
#model.viewStates(ligand='resname MOL and noh')
mols = model.getStates()
print(mols)

In [None]:
kin = Kinetics(model, temperature=298, concentration=0.0037)  # Kinetics takes the Model, not a state Molecule

r = kin.getRates()
print(r.g0eq)
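
The value printed above is the quantity compared against the -2 kcal/mol threshold. As a sketch of where such a number comes from, the snippet below derives a standard binding free energy from equilibrium populations of the bound and unbound states. It assumes the usual relation Kd = C · P_unbound / P_bound for one ligand at concentration C; the populations are made up, and whether `g0eq` is computed exactly this way inside HTMD is an assumption.

```python
import math

KB_KCAL = 0.0019872041   # Boltzmann constant in kcal/(mol K)

def standard_free_energy(p_bound, p_unbound, concentration, temperature=298.0):
    """Standard binding free energy (kcal/mol) from equilibrium populations."""
    kT = KB_KCAL * temperature
    # Dissociation constant in molar: ligand concentration times unbound/bound ratio
    kd = concentration * p_unbound / p_bound
    return kT * math.log(kd)

# Hypothetical populations; 0.0037 M is the ligand concentration passed to Kinetics
dg = standard_free_energy(p_bound=0.6, p_unbound=0.4, concentration=0.0037)
print(round(dg, 2), dg < -2.0)
```

The concentration term matters: the same populations at a higher ligand concentration give a less negative standard free energy.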

In [None]:
kin.plotRates(rates=('g0eq',))  # trailing comma makes this a one-element tuple

In [None]:
kin.plotFluxPathways()