Running adaptive sampling using AMBER on a cluster using SLURM or PBS #255
It should work, in the sense that some work was done there, but it is
largely untested, at least by us.
…On 13 February 2017 at 18:20, eric-jm-lang ***@***.***> wrote:
Hello,
I am very interested in using htmd to run some adaptive sampling
simulations.
However, the examples I have seen on adaptive sampling seem to deal only
with ACEMD on a local GPU cluster.
I would like to know if it is possible to run adaptive sampling using
Amber on GPUs (i.e. pmemd.cuda) on a cluster that either relies on PBS or
SLURM to manage the queue. If yes could you please let me know what I
should specify in my scripts to be able to run such kind of adaptive
sampling.
Many thanks in advance
Eric
|
Hi Eric, About Amber (pmemd.cuda): like Gianni said, work was done on that, but it's untested by us and we cannot support it. About the queue/resources: we have been refactoring it recently, so if you see any problems in the documentation, let us know. In short, you can use SLURM, but not PBS yet, as it is untested.
or for adaptive:
|
Hey Eric. The queues will work fine (i.e. SLURM/PBS). |
Hello everyone, Thanks a lot for your very quick replies. So I understand this is untested by you and that you cannot support it but I would nonetheless be very interested in testing/using it. I have a good knowledge of AMBER but I am an absolute beginner with the htmd package.
Slurm script:
Stefan, is the Python script that starts the adaptive sampling supposed to run on the front node as a hidden process? Or is it supposed to be a job submitted to the queue to run on a single core? Many thanks in advance for your help! Eric |
Nice :) So a simulation folder for me looks like this:
The
In our case since we run ACEMD our
In your case it would probably look like this
So if I remember correctly the
The adaptive will run on your local machine (or whichever machine you want which has access to |
Oh sorry, no |
@stefdoerr, isn't adaptive going to fail without using the |
Hi Stefan,
Thanks a lot!
Yes indeed, .nc files are trajectory files and .rst7 files are coordinate files. So in
this example prot1_md1.rst7 is the file with the coordinates to restart the
simulation, for a second MD run for example. However, the rst7 file only
contains the coordinates from the last step of the trajectory. Is that
sufficient? I thought the trajectory files were read and a frame extracted
from there. Is that only the case for ACEMD? Would it be possible if the
trajectory is written in ASCII format?
Many thanks,
Eric
…On 14 February 2017 at 11:12, Stefan Doerr ***@***.***> wrote:
Oh sorry, no nc are trajectory files right? What are the coordinate file
extensions for AMBER?
--
Eric Lang
BrisSynBio Postdoctoral Research Associate Modelling
Centre for Computational Chemistry
School of Chemistry - University of Bristol
Bristol BS8 1TS - United Kingdom
|
@j3mdamas I don't get the problem. He will pass the @eric-jm-lang Ok so scratch what I wrote earlier. It should be |
Ok, thanks.
No, rst and rst7 should be the same.
Ok, so the adaptive will indeed read the trajectory file, select a frame
and write it to an rst file. Got it.
I will try to do this and see how it goes.
Thanks a lot
Eric
…On 14 February 2017 at 11:48, Stefan Doerr ***@***.***> wrote:
@j3mdamas <https://github.com/j3mdamas> I don't get the problem. He will
pass the SlurmQueue object to ad.app.
@eric-jm-lang <https://github.com/eric-jm-lang> Ok so scratch what I
wrote earlier. It should be input.rst7. The problem is I don't know if we
can write rst7 files. I know we can write rst files. Was there a version
change?
So, the adaptive will read your trajectory nc files, select from them a
frame and write that frame as an rst file in the copied simulation folder
to start a new simulation from that coordinate
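The copy-and-restart step described above can be sketched with the standard library alone. This is not HTMD's implementation: `spawn_simulation`, the list of trajectory suffixes and the `input.rst` file name are hypothetical, and the selected frame is passed in as opaque bytes because writing a real restart file needs a molecular I/O library.

```python
import shutil
from pathlib import Path

# Assumption: trajectory outputs of the parent run are not carried over.
TRAJECTORY_SUFFIXES = {".nc", ".xtc", ".dcd"}


def spawn_simulation(parent_dir, new_dir, frame_coordinates, coorname="input.rst"):
    """Copy the input files of parent_dir into new_dir and write new start coordinates.

    frame_coordinates is a placeholder (raw bytes) for the frame that the
    adaptive step selected from the parent trajectory.
    """
    parent, new = Path(parent_dir), Path(new_dir)
    new.mkdir(parents=True)
    for item in parent.iterdir():
        # Skip trajectories: the new run starts from a single selected frame.
        if item.is_file() and item.suffix not in TRAJECTORY_SUFFIXES:
            shutil.copy2(item, new / item.name)
    # Create (or overwrite) the coordinate file the MD input refers to.
    (new / coorname).write_bytes(frame_coordinates)
    return new
```

The important design point is that the MD input file in the copied folder must refer to the same coordinate file name that the adaptive step writes.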
|
Show me the |
@stefdoerr then I think the |
Yes? As I said he should only write a |
I see, I misinterpreted then. Well, @eric-jm-lang, check https://www.htmd.org/docs/latest/htmd.queues.slurmqueue.html#module-htmd.queues.slurmqueue to see if the sbatch options you want are available when setting up your |
I guess the PBS queue is not working in version 1.5.17?
My script:
|
We haven't exposed it yet because it is largely untested. You can import it using: from htmd.queues.pbsqueue import PBSQueue I'll add it now, and see if someone is willing to test it for us. |
I wrote my own job.sh script, which seems to work in our PBS queuing system. Is there a way to make HTMD just to copy this file to the folders & run it? Or is it always implied in the queues that you need HTMD to write the job.sh files? BTW, as expected the PBSQueue does not work.
|
The queues always write their own job.sh files but we could just take your options and add them to the class. The rest of the logic which I see in the script can probably be moved to the run.sh file which is called by our job.sh file and is not modified by HTMD |
The only two options which you have and are not supported by our PBSqueue are scratch_local and cl_doom. I will add these now. The rest of the script (everything under #PBS commands) you can move to a |
md = PBSQueue()
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = '5000'
md.walltime = '86400' Plus the options Stefan is adding. |
40ae055 md = PBSQueue()
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = 5000
md.walltime = 86400
md.cluster = 'doom'
md.scratch_local = 5000 and move the rest of the logic into the run.sh |
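To illustrate the split discussed here (a generated job.sh that only carries scheduler options and then calls a user-maintained run.sh), here is a rough standard-library sketch. It is not the code PBSQueue uses. The `select=` resource line follows PBS Professional conventions and may need changing for your site, and treating `memory` as megabytes and `walltime` as seconds are assumptions based on the values shown above.

```python
def seconds_to_walltime(seconds):
    """Convert a duration in seconds to the HH:MM:SS form PBS expects."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_job_script(workdir, jobname, queue, ncpus, ngpus, memory_mb, walltime_s):
    """Return the text of a job.sh that sets PBS options and delegates to run.sh."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {jobname}",
        f"#PBS -q {queue}",
        # Assumption: PBS Professional style resource request.
        f"#PBS -l select=1:ncpus={ncpus}:ngpus={ngpus}:mem={memory_mb}mb",
        f"#PBS -l walltime={seconds_to_walltime(walltime_s)}",
        "",
        f"cd {workdir}",
        # Everything simulation-specific lives in run.sh, which is left untouched.
        f"{workdir}/run.sh",
    ]
    return "\n".join(lines) + "\n"


script = render_job_script(
    "/path/to/input/e1s1_ready", "AceMD", "gpu@arien-pro.ics.muni.cz", 1, 1, 5000, 86400
)
```

Site-specific options such as the cluster name or local scratch would be added as further `#PBS -l` lines in the same way.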
Do you have any idea when the next release is happening? |
With the |
Just a note, a warning exists on the documentation of |
Hi, I'm going to use this thread as I'm also using AMBER. I'm not using any queuing system, just trying to run the simulations straight from the machine which has the GPUs. If you want I can open this as a separate issue but I thought it could be useful for other AMBER users. I haven't used htmd for a while and I guess the API has changed. When we wrote the
By running the following commands: adapt = htmd.AdaptiveRun()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.metricsel1 = 'name CA'
adapt.metrictype = 'distances'
adapt.ticadim = 5
adapt.updateperiod = 7200
adapt.filtersel = 'protein'
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
datadir='./data')
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run() I've seen now that the adapt = htmd.AdaptiveMD()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG', sel2='name CA')
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
datadir='./data',
devices=[0, 1, 2, 3])
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run() But this crashes out after the first epoch is finished with the following traceback:
And the following directory structure:
What's the recommended way to be running simple adaptive runs? Also, side note: when the script is running, I don't see any logging information that should be coming out of |
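For readers unsure what `nmin`, `nmax`, `nepochs` and `updateperiod` control, here is a toy model of the polling loop. It reflects my understanding of the behaviour, not HTMD's actual code: the script stays alive, polls the queue every `updateperiod` seconds, and starts a new epoch that tops up to `nmax` simulations whenever `nmin` or fewer are still running. `FakeQueue` and `adaptive_loop` are hypothetical names.

```python
import time


class FakeQueue:
    """Stand-in for a queue where every submitted job finishes before the next poll."""

    def __init__(self):
        self.running = 0
        self.submitted = 0

    def inprogress(self):
        # Pretend all previously submitted jobs have completed.
        self.running = 0
        return self.running

    def submit(self, n):
        self.running += n
        self.submitted += n


def adaptive_loop(queue, nmin, nmax, nepochs, updateperiod, sleep=time.sleep):
    """Poll the queue and start epochs until nepochs have been run."""
    epoch = 0
    while True:
        running = queue.inprogress()
        if epoch >= nepochs and running == 0:
            break  # all epochs were submitted and nothing is left running
        if epoch < nepochs and running <= nmin:
            epoch += 1
            queue.submit(nmax - running)  # top up to nmax simulations
        sleep(updateperiod)
    return epoch


queue = FakeQueue()
# Skip the real waiting so the example finishes immediately.
epochs_run = adaptive_loop(queue, nmin=3, nmax=4, nepochs=10, updateperiod=100,
                           sleep=lambda seconds: None)
```

Because the loop sleeps between polls, the script must keep running for the whole campaign, whether on a workstation, a login node or inside a long job.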
Thanks for the complete report! The reason it crashes is because it can't find pdb files in the input folders. Admittedly I could probably change it to accept prmtop files or psf files as well since we have the reader functionality, but for the moment simlist in adaptive expects to find pdb files as you can see in the |
Thanks! So do I have to manually copy the PDB files in the input folders? Edit: Looks like they're copied automatically if the PDB exists in the root folder. I'll report back when/if it crashes 😄 |
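A quick way to check this before launching is to list the input folders that lack the expected file. The helper below is a hypothetical standard-library sketch; the default name `structure.pdb` is taken from the simlist call quoted later in this thread.

```python
from pathlib import Path


def folders_missing_file(inputpath, filename="structure.pdb"):
    """Return the names of simulation folders under inputpath that lack filename."""
    root = Path(inputpath)
    return sorted(
        folder.name
        for folder in root.iterdir()
        if folder.is_dir() and not (folder / filename).is_file()
    )
```

An empty list means every simulation folder contains the file.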
@jeiros Sorry for making you do so much testing. Could you please instead pass me a single |
@stefdoerr Don't worry, I understand supporting AMBER engine is extra work. I actually set up the I'll send you the requested files for testing to your email via file exchange of Imperial, since I can't upload them here. |
Yes it's definitely filtering related but it's probably a very simple fix once I have the files. Thanks very much! If it's ok with you I might add those files (or parts of them) as HTMD tests to avoid future regressions as well. |
@jeiros The bug is fixed actually. You just need to delete your filtered folder. I made a new filtered folder (filteredS) for example and compared it with yours (filtered): In [7]: mol = Molecule('./filteredS/filtered.pdb')
In [9]: mol.read('./filteredS/e1s1_ready/Production.filtered.nc')
In [10]: mol = Molecule('./filtered/filtered.pdb')
In [11]: mol.read('./filtered/e1s1_ready/Production.filtered.nc')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-9f28e3f3679c> in <module>()
----> 1 mol.read('./filtered/e1s1_ready/Production.filtered.nc') I will check if the rest runs fine though |
Ok, there is a bug though when working with your PDB file #276 I will need to see what's wrong there. |
Ok no problem thanks for taking a look into it :) |
@jeiros ok this works perfectly on my computer after the bug fixes app = htmd.LocalGPUQueue()
app.devices = [0,]
adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = r'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run() Bug fixes: Had to read the topology because MDtraj doesn't like reading pure trajectory files. Fixed here: 656d0a8 and here: 24ee39f. New release 1.7.13 being built right now https://travis-ci.org/Acellera/htmd/builds/210582795 Minor changes for improvements:
This is what it looks like after execution [sdoerr@loro Mon15:32 test-htmd] ll input/e2s1_e1s1p0f2/
total 82544
-rw-rw---- 1 sdoerr lab 0 Mar 13 12:43 log.txt
-rw-rw---- 1 sdoerr lab 561 Mar 13 12:43 mdinfo
-rw-rw---- 1 sdoerr lab 150 Mar 13 12:43 MD.sh
-rw-rw---- 1 sdoerr lab 230 Mar 13 12:43 Production.in
-rw-rw---- 1 sdoerr lab 7255212 Mar 13 15:32 structure.ncrst
-rw-r----- 1 sdoerr lab 27250369 Mar 10 15:06 structure.pdb
-rw-r----- 1 sdoerr lab 50002440 Mar 10 10:22 structure.prmtop Remaining issues: I need to create the new simlist class which autodetects topology files such as # Original line in adaptiverun.py
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
glob(path.join(self.inputpath, '*', '')))
# Needs to be currently replaced by
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.prmtop')),
glob(path.join(self.inputpath, '*', ''))) Once 1.7.13 is out test it and tell me if you encounter any other problems. Other problems would probably be from PMEMD App since I can't test that. But the rest runs fine locally. Edit: Remember to delete the |
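Until simlist can detect the topology on its own, the hard-coded file name in the glob can be avoided with a small lookup like the one below. This is only a sketch of the idea and not the planned HTMD implementation: `find_topology`, the `structure` base name and the extension priority (prmtop, then psf, then pdb) are assumptions.

```python
from pathlib import Path

# Assumed priority: prefer a full topology over a plain PDB.
TOPOLOGY_EXTENSIONS = ("prmtop", "psf", "pdb")


def find_topology(folder, basename="structure", extensions=TOPOLOGY_EXTENSIONS):
    """Return the first existing basename.ext in folder, trying extensions in order."""
    for ext in extensions:
        candidate = Path(folder) / f"{basename}.{ext}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no topology file found in {folder}")
```

Calling it once per input folder gives the list of topology files that the second argument of simlist currently gets from the glob.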
Thank you so much! There are actually different versions of 'restart' files in AMBER, in ASCII and binary version. Mine is binary. It can be read with mdtraj: md.load_ncrestrt('structure.rst', top='structure.prmtop')
<mdtraj.Trajectory with 1 frames, 302263 atoms, 99238 residues, and unitcells at 0x7fe587d10588> I thought the convention was to name them Thank you so much for this! Also, I was going crazy with the counterions stripping, didn't know that I had to escape the |
Also, I did change the line in |
Ah yes that ions escaping thing is horrible but it's not our fault. VMD (whose atom selection syntax we use) is just weird like that. Yes, try with rst7, just remember to also read it correctly in your input file. |
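As far as I understand VMD's selection syntax, double-quoted names are treated as regular expressions, which is why `+` and `-` need a backslash. A small helper can build the filter string so the escaping is not typed by hand; `build_filtersel` is a hypothetical name and the helper simply escapes every non-alphanumeric character.

```python
import re


def build_filtersel(ion_resnames):
    """Build a 'not water and not resname ...' selection with escaped, quoted names."""
    quoted = " ".join(
        # Put a backslash in front of every character that is not a letter or digit.
        '"' + re.sub(r"([^A-Za-z0-9])", r"\\\1", name) + '"'
        for name in ion_resnames
    )
    return f"not water and not resname {quoted}"


filtersel = build_filtersel(["Na+", "Cl-"])
```

The result is the same string as the `adapt.filtersel` value used in this thread. When typing such a selection directly in Python, a raw string (`r'...'`) avoids warnings about invalid escape sequences.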
Again, conda behaving weirdly:
If I make a new env with python 3.6 and the full anaconda installation:
The new release is there. But:
|
Maybe try Works fine on my fresh py36 miniconda install. |
That still picks up the 1.7.11 version
|
But it's trying to pull 3.5 so weird. I am sorry, I can't help beyond suggesting a fresh install 😞 Conda is just annoying sometimes... |
Yes it's weird that Note: Solve it by doing:
not the prettiest, but I think it worked:
|
I think it's just because you're on OSX and we did not make sure the OSX
build passed. I am going to restart the OSX build and I'll let you know
when it's available.
…On Mon, Mar 13, 2017 at 5:44 PM, Juan Eiros ***@***.***> wrote:
Yes it's weird that anaconda search htmd finds the new version but conda
install -c acellera htmd=1.7.13 doesn't do anything. I'll give it some
time 😕
|
I'm running this on a Linux machine since that's where I have the GPUs |
I have a suspicion this might be related to the automatic dependency generation. I will take a look at it tomorrow |
Were you starting with a fresh htmd-py35 environment? |
Hi, I managed to get the 1.7.13 version for a python 3.6 environment. Starting from 0, creating input files: ProdTest = Production()
ProdTest.amber.nstlim = 2500
ProdTest.amber.ntx = 2
ProdTest.amber.irest = 0
ProdTest.amber.parmfile = 'structure.prmtop'
ProdTest.amber.coordinates = 'structure.ncrst'
ProdTest.amber.dt = 0.004
ProdTest.amber.ntpr = 500
ProdTest.amber.ntwr = 500
ProdTest.amber.ntwx = 250
ProdTest.write('./', './ready') This gives the following:
Using @stefdoerr commands: app = htmd.LocalGPUQueue()
app.devices = [0,]
adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = r'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run() Fails with the following logs & Traceback:
It's failing to launch the simulations. From what I've seen, the
$ cat input/e1s1_ready/job.sh
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0
cd /home/je714/try_adaptive/from_manual_build/input/e1s1_ready
/home/je714/try_adaptive/from_manual_build/input/e1s1_ready/run.sh Also, we are not doing adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
datadir='./data',
devices=[0, 1, 2, 3]) but adapt.app = app where $ cat input/e1s1_ready/MD.sh
ENGINE -O -i Production.in -o Production.out -p structure.prmtop -c structure.ncrst -x Production.nc -r Production_new.rst |
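The MD.sh above still contains the literal `ENGINE` placeholder, while job.sh calls a run.sh in the same folder. One possible bridge, sketched here with the standard library, is to generate run.sh from MD.sh by substituting the path of the pmemd executable. This is an assumption about what the placeholder is for; the thread does not show how PmemdLocal or the queues handle it, and `write_run_script` is a hypothetical helper.

```python
import stat
from pathlib import Path


def write_run_script(simdir, pmemd, template="MD.sh", output="run.sh"):
    """Create an executable run.sh from MD.sh with ENGINE replaced by the pmemd path."""
    sim = Path(simdir)
    # Replace only the first occurrence, i.e. the command name at the start.
    command = (sim / template).read_text().replace("ENGINE", pmemd, 1)
    script = sim / output
    script.write_text("#!/bin/bash\n" + command)
    # Add the executable bit for the owner so job.sh can call it directly.
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return script
```

For example, `write_run_script('input/e1s1_ready', '/usr/local/amber/bin/pmemd.cuda_SPFP')` would produce a run.sh whose command line starts with the pmemd path used earlier in this thread.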
The To make it clearer: So now the problem is that the On the matter of the error with simlist: |
So to summarize the only two problems here are: Right? I can do that. |
Thanks for clarifying. I got it to work for the moment by using app = htmd.apps.pmemdlocal.PmemdLocal(
pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
datadir='./data',
devices=[0, 1, 2, 3])
adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = r'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run() But I'll switch to using the queues. |
Yes that would be it. Don't worry about it, I can play around with it and submit a PR once I think it's working fine with the queues. To make 'changes' to an htmd installation, here's what I do:
Is that how you go about it? I am not too sure it's working for me since I keep getting HTMD: Logging setup failed when I |
No, sorry. The setup.py doesn't work as far as I know.
That's it. The only issue might be the C |
The setup.py is for pypi packaging, as far as I recall and it's still under development (#237) |
@jeiros I am going to close this. Made a new issue for it |