/
scripts.txt
626 lines (402 loc) · 20.1 KB
/
scripts.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
.. |Pow| replace:: *P*\ :sub:`OW`
.. |logPow| replace:: log *P*\ :sub:`OW`
.. _mdpow-scripts-label:
The ``mdpow-*`` scripts
=======================
A number of python scripts are installed together with the
:mod:`mdpow` package. They simplify some common tasks (especially at
the analysis stage) but they make some assumptions about directory
layout and filenames. If one uses defaults for all directory and
filename options then it should "just work".
In particular, a directory hierarchy such as the following is
assumed::
moleculename/
water.simulation
octanol.simulation
Equilibrium/
water/
octanol/
FEP/
water/
Ghyd.fep
Coulomb/
VDW/
octanol/
Goct.fep
Coulomb/
VDW/
*moleculename* is, for instance, "benzene" or "amantadine"; in the run
input file (see `Equilibrium simulations`_) is the value of the
variable ``name`` in the ``[setup]`` section.
.. _mdpow-equilibrium-label:
Equilibrium simulations
-----------------------
The :program:`mdpow-equilibrium` script
* sets up equilibrium MD simulations for the solvents (e.g., *water* or
*octanol*)
* runs **energy minimization**, **MD_relaxed**, and **MD_NPT**
protocols; the user can choose if she wants to launch
:program:`mdrun` herself (e.g. on a cluster) or let the script do it
locally on the workstation
The script runs essentially the same steps as described in the tutorial
:ref:`example_octanol-label` but it gathers all required parameters from a run
input file and it allows one to stop and continue and the protocol
transparently.
It requires as at least Gromacs 4.6.5 ready to run (check that all commands can
be found). The required **input** is
1. a run configuration file (*runinput.yml*);
2. a structure file (PDB or GRO) for the compound
3. a Gromacs ITP file for the compound (OPLS/AA force field)
A template *RUNFILE* can be generated with :option:`mdpow-get-runinput
runinput.yml`, which will copy the default run input file bundled with
MDPOW and put it in the current directory under the name
``runinput.yml``.
For an example of a *RUNFILE* see :download:`benzene.yml
<../../examples/benzene.yml>`. It is recommended to use absolute paths
to file names. The run input file uses `YAML`_ syntax (and is parsed
by :mod:`yaml`).
.. _YAML: http://yaml.org/
.. Note:: It is recommended to enclose all strings in the input file
in quotes, especially if they can be interpreted as
numbers. For example, a name "005" would be interpreted as
the number 5 unless explicitly quoted.
The script keeps track of the stages of the simulation protocol and allows the
user to **restart from the last completed stage**. For instance, one can use the
script to set up a simulation, then run the simulation on a cluster, transfer
back the generated files, and start :program:`mdpow-equilibrium` again with the
exact same input to finish the protocol. Since Gromacs 4.5 it is also possible
to interrupt a running :program:`mdrun` process (e.g. with :kbd:`Control-c`)
and then resume the simulation at the last saved trajectory checkpoint by
running :program:`mdpow-equilibrium` again.
If in doubt, just try running :program:`mdpow-equilibrium` running again and
let it figure out the best course of action. Look at the log file to see what
has been done. Lines reading "Fast forwarding: *stage*" indicate that results
from *stage* are available.
Usage of the command:
.. program:: mdpow-equilibrium
:program:`mdpow-equilibrium` [options] *RUNFILE*
Set up (and possibly run) the equilibration equilibrium simulation for one
compound and one solvent. All parameters except the solvent are specified in
the *RUNFILE*.
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: -S <NAME>, --solvent=<NAME>
solvent ``<NAME>`` for compound, can be 'water', 'octanol', 'cyclohexane' [water]
.. cmdoption:: -d <DIRECTORY>, --dirname=<DIRECTORY>
generate files and directories in ``<DIRECTORY>``, which is created if it does not
already exist. The default is to use the molecule *name* from the run input
file.
.. _mdpow-fep-label:
FEP simulations
---------------
The :program:`mdpow-fep` script sets up (and can also run) the free energy
perturbation calculations for one compound and one solvent. It uses the results
from :ref:`mdpow-equilibrium <mdpow-equilibrium-label>` together with the run
input file.
You will require:
1. at least Gromacs 4.0.5 ready to run (check that all commands can
be found)
2. the results from a previous complete run of :program:`mdpow-equilibrium`
Usage of the command:
.. program:: mdpow-fep
:program:`mdpow-fep` [options] *RUNFILE*
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: --get-template
generate a template *RUNFILE* and exit
.. cmdoption:: -S <NAME>, --solvent=<NAME>
solvent ``<NAME>`` for compound, can be 'water' or 'octanol' [water]
.. cmdoption:: -d <DIRECTORY>, --dirname=<DIRECTORY>
generate files and directories in ``<DIRECTORY>``, which is created if it
does not already exist. The default is to use the molecule *name* from the run
input file.
.. _mdpow-pow-label:
Running analysis
----------------
Solvation free energy
~~~~~~~~~~~~~~~~~~~~~
The :program:`mdpow-solvationenergy` calculatest the solvation free
energy. It
* collects data from FEP simulations (converting EDR files to XVG.bz2
when necessary)
* calculates desolvation free energies for *solvent* --> vacuum via
thermodynamic integration (TI)
* plots dV/dlambda graphs
* appends results to ``energies.txt`` (when the
default names are chosen), see :ref:`mdpow-pow-outputformat-label`.
Usage of the command:
.. program:: mdpow-solvationenergy
:program:`mdpow-solvationenergy` [options] DIRECTORY [DIRECTORY ...]
Run the free energy analysis for a solvent in ``<DIRECTORY>/FEP`` and
return ``DeltaG``.
``DIRECTORY`` should contain all the files resulting from running
``mdpow.fep.Ghyd.setup()`` (or the corresponding ``Goct.setup()`` or
``Gcyclohexane.setup()`` and the results of the MD FEP simulations. It
relies on the canonical naming scheme (basically: just use the defaults
as in the tutorial).
The dV/dlambda plots can be produced automatically (``--plot=auto``). If
multiple ``DIRECTORY`` arguments are provided then one has to choose the
"auto" option (or ``None``).
The total solvation free energy is calculated as
DeltaG* = -(DeltaG_coul + DeltaG_vdw)
Note that the standard state refers to the "Ben-Naim" standard state of
transferring 1 M of compound in the gas phase to 1 M in the aqueous
phase.
Results are *appended* to a data file with **Output file format**::
. ---------- kJ/mol ---
molecule solvent DeltaG* coulomb vdw
All observables are quoted with an error estimate, which is derived
from the correlation time and error propagation through Simpson's rule
(see :meth:`mdpow.fep.Gsolv`). It ultimately comes from the error on
<dV/dlambda>, which is estimated as the error to determine the
average.
:molecule:
molecule name as used in the itp
:DeltaG*:
solvation free energy vacuum --> solvent, in kJ/mol
:coulomb:
discharging contribution to the DeltaG*
:vdw:
decoupling contribution to the DeltaG*
:directory:
folder in which the simulations were stored
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: -S <NAME>, --solvent=<NAME>
solvent NAME for compound, 'water', 'octanol', or 'cyclohexane' [water]
.. cmdoption:: -p <FILE>, --plotfile=<FILE>
plot dV/dlambda to :file:`{FILE}`; use png or pdf suffix to determine the
file type. 'auto' generates a pdf file :file:`{DIRECTORY}/dVdl.pdf` and
'None' disables it [auto]
.. cmdoption:: -e <FILE>, --energies=<FILE>
append solvation free energies to ``<FILE>`` [:file:`energies.txt`]
.. cmdoption:: -s <N>, --stride=<N>
Use every N-th datapoint from the original dV/dlambda data. Using a
number larger than 1 can significantly reduce memory usage with little
impact on the precision of the final output. [10]
.. cmdoption:: --force
force rereading all data [False]
.. cmdoption:: --ignore-corrupted
skip lines in the md.xvg files that cannot be parsed, perhaps because to an
intermittent file system problem. In order to salvage the existing data,
this option is provided for diagnostic purposes. [False]
.. Warning:: Only use this option with care; skipped lines can indicate that
other parts of the file have been corrupted but still pass the line
corruption test. USE AT YOUR OWN RISK.
It is recommended to simply rerun the affected simulation(s).
Partition coefficients
~~~~~~~~~~~~~~~~~~~~~~
The :program:`mdpow-pow` script
* collects data from FEP simulations
* calculates desolvation free energies for octanol --> vacuum and
water --> vacuum via thermodynamic integration (TI)
* calculates transfer free energy water --> octanol
* calculates |logPow|
* plots dV/dlambda graphs
* appends results to ``pow.txt`` and ``energies.txt`` (when the
default names are chosen), see :ref:`mdpow-pow-outputformat-label`.
Usage of the command:
.. program:: mdpow-pow
:program:`mdpow-pow` [options] *DIRECTORY* [*DIRECTORY* ...]
Run the free energy analysis for water and octanol in *DIRECTORY*/FEP and
return the octanol-water partition coefficient |logPow|.
*DIRECTORY* should contain all the files resulting from running
:meth:`mdpow.fep.Goct.setup` and :meth:`mdpow.fep.Goct.setup` and the results
of the MD FEP simulations. It relies on the canonical naming scheme (basically:
just use the defaults as in the tutorial).
The dV/dlambda plots can be produced automatically (:option:`mdpow-pow
-p` auto). If multiple *DIRECTORY* arguments are provided then one has to
choose the auto option (or ``None``).
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: -p <FILE>, --plotfile=<FILE>
plot dV/dlambda to :file:`{FILE}`; use png or pdf suffix to determine the
file type. 'auto' generates a pdf file :file:`{DIRECTORY}/dVdl.pdf` and
'None' disables it [auto]
.. cmdoption:: -o <FILE>, --outfile=<FILE>
append one-line results summary to ``<FILE>`` [:file:`pow.txt`]
.. cmdoption:: -e <FILE>, --energies=<FILE>
append solvation free energies to ``<FILE>`` [:file:`energies.txt`]
.. cmdoption:: --force
force rereading all data [False]
.. cmdoption:: --ignore-corrupted
skip lines in the md.xvg files that cannot be parsed, perhaps because to an
intermittent file system problem. In order to salvage the existing data,
this option is provided for diagnostic purposes. [False]
.. Warning:: Only use this option with care; skipped lines can indicate that
other parts of the file have been corrupted but still pass the line
corruption test. USE AT YOUR OWN RISK.
It is recommended to simply rerun the affected simulation(s).
.. _mdpow-pow-outputformat-label:
Output data file formats
~~~~~~~~~~~~~~~~~~~~~~~~
Results are *appended* to data files.
.. Note:: Energies are always output in **kJ/mol**.
POW summary file
................
The :file:`pow.txt` output file summarises the results from the water and
octanol solvation calculations. Its name set with the option :option:`mdpow-pow -o`.
It contains fixed column data in the following order and all energies are
in **kJ/mol**. See the :ref:`Table of computed water-octanol transfer energies
and logPow <table-logPow-label>` as an example.
**itp_name**
*molecule* identifier from the itp file
**DeltaG0**
transfer free energy from water to octanol (difference between
**DeltaG0** for octanol and water), in kJ/mol; >0: partitions into
water, <0: partitions into octanol,
**errDG0**
error on **DeltaG0**; errors are calculated through propagation of
errors from the errors on the means <dV/dlambda>
**logPOW**
|logPow|, base-10 logarithm of the octanol-water partition
coefficient; >0: partitions into octanol, <0: partitions preferrably
into water
**errlogP**
error on **logPow**
**directory**
directory under which all data files are stored; by default this is
the *name* of the molecule and hence it can be used to identify the
compound.
.. _table-logPow-label:
.. Table:: Computed |logPow| and water-to-octanol transfer energies (in kJ/mol).
======== ======== ======== ========= ======== ===========================================
itp_name DeltaG0 errDG0 logPow errlogP directory
======== ======== ======== ========= ======== ===========================================
BNZ -12.87 0.43 +2.24 0.07 benzene
OC9 -16.24 1.12 +2.83 0.20 octanol
URE +4.66 1.13 -0.81 0.20 urea
902 +9.35 1.06 -1.63 0.18 water_TIP4P
======== ======== ======== ========= ======== ===========================================
Energy file
...........
The :file:`energy.txt` output file collects all computed energy terms together
with the results also found in the summary file :file:`pow.txt`. Its name is
set with the option :option:`mdpow-pow -e`. It contains fixed column data in
the following order and all energies are in **kJ/mol**. As an example see
:ref:`Table of Solvation Energies <table-energies-label>` for the same
compounds as above.
**molecule**
*molecule* identifier from the itp file
**solvent**
solvent name (water, octanol)
**DeltaG0**
solvation free energy difference in kJ/mol (Ben-Naim standard
state, i.e. 1M gas/1M solution)
**errDG0**
error on **DeltaG0**, calculated through propagation of errors from the
errors on the mean <dV/dlambda>
**coulomb**
Coulomb (discharging) contribution to the solvation free energy **DeltaG0**
**errCoul**
error on **coulomb**
**VDW**
Van der Waals/Lennard-Jones (decoupling) contribution to **DeltaG0**
**errVDW**
error on **VDW**
**Vdp**
correction if the FEP are run at constant volume but the
desired solvation free energy is the Gibbs energy (**currently
neglected** and set to 0)
**errVdp**
error on **Vdp**
**w2oct**
transfer free energy from water to octanol (difference between
**DeltaG0** for octanol and water)
**errw2oct**
error on **w2oct**
**logPOW**
|logPow|
**errlogP**
error on **logPow**
**directory**
directory under which all data files are stored; by default this is
the *name* of the molecule and hence it can be used to identify the
compound.
.. _table-energies-label:
.. Table:: Solvation free energies for compounds in water and octanol (in kJ/mol).
========== ========== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ====================
molecule solvent DeltaG0 errDG0 coulomb errCoul VDW errVDW Vdp errVdp w2oct errw2oct logPOW errlogP directory
========== ========== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ====================
BNZ water -2.97 0.21 +7.65 0.07 -4.68 0.20 +0.00 0.00 -12.87 0.43 +2.24 0.07 benzene
BNZ octanol -15.84 0.37 +1.40 0.19 +14.44 0.32 +0.00 0.00 -12.87 0.43 +2.24 0.07 benzene
OC9 water -16.03 0.32 +27.35 0.09 -11.32 0.31 +0.00 0.00 -16.24 1.12 +2.83 0.20 octanol
OC9 octanol -32.28 1.08 +11.32 0.92 +20.96 0.56 +0.00 0.00 -16.24 1.12 +2.83 0.20 octanol
URE water -53.52 0.17 +56.94 0.11 -3.41 0.14 +0.00 0.00 +4.66 1.13 -0.81 0.20 urea
URE octanol -48.86 1.12 +35.75 1.09 +13.11 0.25 +0.00 0.00 +4.66 1.13 -0.81 0.20 urea
902 water -25.46 0.11 +34.93 0.10 -9.48 0.06 +0.00 0.00 +9.35 1.06 -1.63 0.18 water_TIP4P
902 octanol -16.11 1.05 +21.16 1.05 -5.05 0.09 +0.00 0.00 +9.35 1.06 -1.63 0.18 water_TIP4P
========== ========== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ======== ====================
House-keeping scripts
---------------------
A number of scripts are provided to complete simple tasks; they can be
used to check that all required files are present or they can help in
fixing small problems without one having to write Python code to do
this oneself (e.g. by manipulating the checlpoint files). They
generally make the same assumptions about file system layout as the
other mdpow scripts.
Checking if the simulation is complete
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Run :program:`mdpow-check` in order to check if all necessary files
are available.
Usage of the command:
.. program:: mdpow-check
:program:`mdpow-check` [options] *DIRECTORY* [*DIRECTORY* ...]
Check status of the progress of the project in *DIRECTORY*.
Output is only written to the status file (:file:`status.txt`). A quick way to find
problems is to do a ::
cat status.txt | gawk -F '|' '$2 !~ /OK/ {print $0}'
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: -o <FILE>, --outfile=<FILE>
write status results to :file:`{FILE}` [:file:`status.txt`]
Changing paths in :file:`water.simulation` and :file:`octanol.simulation`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It can become necessary to recreate the :file:`{solvent}.simulation`
status/checkpoint files in order to change paths, e.g. when one moved
the directories or transferred all files to a different file system.
Typically, one would execute the :program:`mdpow-rebuild-simulation` command
in the parent directory of *moleculename*.
Usage of the command:
.. program:: mdpow-rebuild-simulation
:program:`mdpow-rebuild-simulation` [options] *DIRECTORY* [*DIRECTORY* ...]
Re-create the ``water.simulation`` or ``octanol.simulation`` file with
adjusted paths (now rooted at *DIRECTORY*).
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: --solvent=<NAME>
rebuild file for 'water', 'octanol', or 'all' [all]
Re-building :file:`Ghyd.fep` and :file:`Goct.fep`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It can become necessary to recreate the :file:`{name}.fep` status/checkpoint
files (e.g. if the files became corrupted due to a software error or in order
to change paths).
Typically, one would execute the :program:`mdpow-rebuild-fep` command
in the parent directory of *moleculename*.
Usage of the command:
.. program:: mdpow-rebuild-fep
:program:`mdpow-rebuild-fep` [options] *DIRECTORY* [*DIRECTORY* ...]
Re-create the :file:`Goct.fep` or :file:`Ghyd.fep` file using the appropriate
equilibrium simulation file under *DIRECTORY*/FEP.
This should only be necessary when the fep file was destroyed due to a software
error or when the files are transferred to a different file system and some of
the paths stored in the :file:`{name}.fep` files have to be changed.
Options:
.. cmdoption:: -h, --help
show this help message and exit
.. cmdoption:: --solvent=<NAME>
rebuild fep for 'water', 'octanol', or 'all' [all]
.. cmdoption:: --setup=<LIST>
Re-generate queuing system scripts with appropriate paths: runs
:meth:`fep.Gsolv.setup` with argument `qscript=[LIST]` after
fixing Gsolv.
``LIST`` should contain a comma-separated list of queing system
templates. For example:
``'icsn_8pd.sge,icsn_2pd.sge,local.sh'``. It is an error if
``--setup`` is set without a ``LIST``.