
Running adaptive sampling using AMBER on a cluster using SLURM or PBS #255

Closed
eric-jm-lang opened this issue Feb 13, 2017 · 70 comments

@eric-jm-lang

Hello,
I am very interested in using htmd to run some adaptive sampling simulations.
However, the examples I have seen on adaptive sampling seem to deal only with ACEMD on a local GPU cluster.
I would like to know whether it is possible to run adaptive sampling using Amber on GPUs (i.e. pmemd.cuda) on a cluster that relies on either PBS or SLURM to manage the queue. If so, could you please let me know what I should specify in my scripts to be able to run this kind of adaptive sampling?
Many thanks in advance
Eric

@giadefa
Contributor

giadefa commented Feb 13, 2017 via email

@j3mdamas j3mdamas self-assigned this Feb 13, 2017
@j3mdamas
Contributor

Hi Eric,

About Amber (pmemd.cuda): like Gianni said, some work has been done on that, but it's untested by us and we cannot support it.

About the queue/resources: we have been refactoring this recently, so if you see any problems in the documentation, let us know. In short, you can use SLURM, but not PBS yet, as it is untested.

md = SlurmQueue()
md.queue = <the name of the queue>
md.submit('./outdir_of_protocol')

or for adaptive:

md = AdaptiveMD()
md.app = SlurmQueue()

@stefdoerr
Contributor

Hey Eric. The queues will work fine (i.e. SLURM/PBS).
The only practical issue is that you will need to write a run.sh script inside your simulation folders which takes an input.coor file (or some other coordinate format that AMBER supports) and runs the simulation using it. It should be trivial if you have some knowledge of AMBER, and we can help you if you show us how a simulation is typically run from the command line.

@eric-jm-lang
Author

Hello everyone,

Thanks a lot for your very quick replies. So I understand this is untested by you and that you cannot support it but I would nonetheless be very interested in testing/using it.

I have a good knowledge of AMBER but I am an absolute beginner with the htmd package.
Here is an example of the input file and SLURM script I am using to run the MD simulations I am interested in running adaptive sampling on:
AMBER input file (md1.i):

&cntrl
 imin=0,                      ! Not a minimisation run
 irest=1,                     ! Restart simulation
 ntx=5,                       ! Read coordinates and velocities from the coordinate file
 nscm=1000,                   ! Reset COM every 1000 steps
 nstlim=800000000, dt=0.004,  ! Run MD for 3.2 us with a timestep of 4 fs
 ntpr=2500, ntwx=25000,       ! Write the trajectory every 100 ps and the energies every 10 ps
 ioutfm=1,                    ! Use binary NetCDF trajectory format (better)
 iwrap=0,                     ! No wrapping will be performed
 ntxo=2,                      ! NetCDF restart file
 ntwr=25000000,               ! Write a restart file every 100 ns; if negative, files are not overwritten
 cut=1000.0,                  ! Cut-off for electrostatics
 ntb=0,                       ! No periodicity
 ntc=2, ntf=2,                ! SHAKE on all H
 ntp=0,                       ! No pressure regulation
 ntt=3,                       ! Temperature regulation using Langevin dynamics
 tempi=278.15,
 temp0=278.15,
 igb=8,                       ! Implicit solvent
 saltcon=0.137,               ! Ionic concentration
 gamma_ln=1.0,                ! Langevin thermostat collision frequency
 ig=-1,                       ! Randomize the seed for the pseudo-random number generator
 ntr=0,                       ! Flag for restraining specified atoms
 nmropt=0,                    ! NMR restraints and weight changes read
/

Slurm script:

#!/bin/bash -login
#
#SBATCH -p gpu
#SBATCH -J name
#SBATCH --time=24:00:00     # Walltime
#SBATCH -A XXX        # Project Account
#SBATCH --ntasks-per-node=1 # number of tasks per node
#SBATCH --gres=gpu:1
#
module add apps/amber-16
#
pmemd.cuda -O -i md1.i -o prot1_md1.mdout -p prot1.parm7 -c prot1_eq.rst7   -ref prot1.rst7 -x prot1_md1.nc -r prot1_md1.rst7 -inf prot1_md1.mdinfo

Stefan, is the run.sh file you mentioned equivalent to the above slurm script?
Could you please let me know how to implement those two scripts within the htmd framework in order to run my simulations?

Is the python script that starts the adaptive sampling supposed to run on the front node as a hidden process? Or is it supposed to be a job submitted to the queue to run on a single core?

Many thanks in advance for your help!

Eric

@stefdoerr
Contributor

stefdoerr commented Feb 14, 2017

Nice :)

So a simulation folder for me looks like this:

[sdoerr@loro Tue11:43 equil_eukar6] tree 3jyc/
3jyc/
├── input
├── job.sh
├── parameters
├── run.sh
├── structure.pdb
└── structure.psf

The job.sh SLURM bash file is written automatically by HTMD, but instead of calling your last two commands it just calls run.sh. It looks like this and is automatically generated by the SlurmQueue class:

#!/bin/bash
#
#SBATCH --job-name=MYNAME
#SBATCH --partition=multiscale,multiscaleCPU,playmolecule
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1000
#SBATCH --priority=gpu_priority
#SBATCH --workdir=/pathtosim/
#SBATCH --output=slurm.%N.%j.out
#SBATCH --error=slurm.%N.%j.err
#SBATCH --export=ACEMD_HOME,HTMD_LICENSE_FILE

trap "touch /pathtosim/htmd.queues.done" EXIT SIGTERM

cd /pathtosim/
/pathtosim/run.sh

In our case since we run ACEMD our run.sh file looks like this:

#!/bin/bash
acemd >log.txt 2>&1

In your case it would probably look like this

#!/bin/bash
module add apps/amber-16
pmemd.cuda -O -i md1.i -o prot1_md1.mdout -p prot1.parm7 -c prot1_eq.rst7   -ref prot1.rst7 -x input.nc -r prot1_md1.rst7 -inf prot1_md1.mdinfo

So if I remember correctly the .nc file contains the coordinates, right? Then you need to tell the adaptive to write an nc file with the new coordinates for each simulation.
To understand how the adaptive starts new simulations: it picks a frame from an old simulation, copies the whole simulation folder and replaces the input.nc file. This creates a "new" simulation. So you need to make sure that everything would work fine if you replaced the coordinate file in your simulation folders.

ad = AdaptiveMD()
[...]
ad.coorname = 'input.nc'
[...]

The adaptive will run on your local machine (or whichever machine you want that has access to sbatch, squeue etc.). It will then distribute simulations to SLURM.

@stefdoerr
Contributor

stefdoerr commented Feb 14, 2017

Oh sorry, no, .nc files are trajectory files, right? What is the input coordinate file extension for AMBER?

@j3mdamas
Contributor

@stefdoerr, isn't adaptive going to fail without using the SlurmQueue() in AdaptiveMD.app? Even though job.sh is copied, AdaptiveMD would not know how to submit, no? And if it does, it will overwrite the previous job.sh.

@eric-jm-lang
Author

eric-jm-lang commented Feb 14, 2017 via email

@stefdoerr
Contributor

stefdoerr commented Feb 14, 2017

@j3mdamas I don't get the problem. He will pass the SlurmQueue object to ad.app.

@eric-jm-lang Ok so scratch what I wrote earlier. It should be input.rst. The problem is I don't know if we can write rst7 files. I know we can write rst files. Was there a version change?
So, the adaptive will read your trajectory nc files, select a frame from them and write that frame as an rst file in the copied simulation folder to start a new simulation from those coordinates.
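
Putting the pieces from this thread together, a minimal sketch for the AMBER + SLURM case might look like this (the queue name, generators path, epoch sizes and projection are my assumptions here, and whether the restart should be rst or rst7/ncrst still needs confirming, as discussed above):

from htmd import *

queue = SlurmQueue()
queue.queue = 'gpu'                 # assumed: the SLURM partition/queue on your cluster

ad = AdaptiveMD()
ad.app = queue
ad.nmin = 2                         # assumed epoch sizes, tune to your GPU allocation
ad.nmax = 4
ad.nepochs = 10
ad.projection = MetricDistance(sel1='name CA', sel2='name CA')  # assumed projection, pick one suited to your system
ad.coorname = 'input.rst'           # the coordinate file adaptive writes into each new simulation folder
ad.generatorspath = './generators'  # assumed: folders containing run.sh, md1.i, prot1.parm7, ...
ad.run()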

@eric-jm-lang
Author

eric-jm-lang commented Feb 14, 2017 via email

@stefdoerr
Contributor

Show me the run.sh file once you have it. Remember to also set adaptive to write the correct file format, as I showed you before. The rest should then work.

@j3mdamas
Contributor

@stefdoerr then I think the job.sh will be overwritten by _createJobScript, no?

@stefdoerr
Contributor

stefdoerr commented Feb 14, 2017

Yes? As I said he should only write a run.sh script.

@j3mdamas
Contributor

I see, I misinterpreted then.

Well, @eric-jm-lang, check https://www.htmd.org/docs/latest/htmd.queues.slurmqueue.html#module-htmd.queues.slurmqueue to see if the sbatch options you want are available when setting up your AdaptiveMD.app. For example, we don't use the account or the ntasks-per-node options.

@piia600

piia600 commented Mar 1, 2017

I guess the PBS queue is not working in version 1.5.17?

Traceback (most recent call last):
  File "equil_gpu.py", line 8, in <module>
    app = PBSQueue()
NameError: name 'PBSQueue' is not defined

My script:

from htmd import *
from htmd.protocols.equilibration_v2 import Equilibration
md = Equilibration()
md.runtime = 5
md.timeunits = 'ns'
md.temperature = 310
md.write('./build/','./equil')
app = PBSQueue()
app.queue = 'gpu'
app.ngpu = 1
app.walltime = 14000
app.submit('./equil/')
app.wait()

@j3mdamas
Contributor

j3mdamas commented Mar 1, 2017

We haven't exposed it yet, as it is heavily untested.

You can import it using:

from htmd.queues.pbsqueue import PBSQueue

I'll add it now, and see if someone is willing to test it for us.
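
For example, your script above should then work unchanged apart from the extra import (a sketch, untested on our side as noted, reusing exactly the options you already set):

from htmd import *
from htmd.queues.pbsqueue import PBSQueue   # explicit import until PBSQueue is exposed
from htmd.protocols.equilibration_v2 import Equilibration

md = Equilibration()
md.runtime = 5
md.timeunits = 'ns'
md.temperature = 310
md.write('./build/', './equil')

app = PBSQueue()
app.queue = 'gpu'
app.ngpu = 1
app.walltime = 14000
app.submit('./equil/')
app.wait()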

@piia600

piia600 commented Mar 2, 2017

I wrote my own job.sh script, which seems to work in our PBS queuing system. Is there a way to make HTMD just copy this file to the folders and run it? Or is it always implied in the queues that you need HTMD to write the job.sh files?

BTW, as expected the PBSQueue does not work.

#!/bin/bash
#PBS -N AceMD
#PBS -q gpu@arien-pro.ics.muni.cz
#PBS -l select=1:ncpus=1:ngpus=1:mem=5gb:scratch_local=5gb:cl_doom=True
#PBS -l walltime=24:0:0

cd $SCRATCHDIR || exit 1
cp -r $PBS_O_WORKDIR/* $SCRATCHDIR/

./run.sh

cp -r $SCRATCHDIR/* $PBS_O_WORKDIR/. && rm -rf $SCRATCHDIR

cd $PBS_O_WORKDIR
sim=`basename $PBS_O_WORKDIR`
simpath=`dirname $(dirname $PBS_O_WORKDIR)`
mv *.xtc $simpath/data/$sim

exit 0

@stefdoerr
Contributor

The queues always write their own job.sh files, but we could just take your options and add them to the class. The rest of the logic I see in the script can probably be moved to the run.sh file, which is called by our job.sh file and is not modified by HTMD.

@stefdoerr
Contributor

The only two options you have that are not supported by our PBSQueue are scratch_local and cl_doom. I will add these now. The rest of the script (everything below the #PBS directives) you can move to a run.sh script.

@j3mdamas
Contributor

j3mdamas commented Mar 2, 2017

md = PBSQueue()
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = '5000'
md.walltime = '86400'

Plus the options Stefan is adding.

@stefdoerr
Contributor

40ae055
Done. You can pull the latest HTMD from github or wait for the next release.
So as Joao said, this should work (added my own parameters):

md = PBSQueue()
md.jobname = 'AceMD'
md.queue = 'gpu@arien-pro.ics.muni.cz'
md.ncpus = 1
md.ngpus = 1
md.memory = 5000
md.walltime = 86400
md.cluster = 'doom'
md.scratch_local = 5000

and move the rest of the logic into the run.sh

@piia600

piia600 commented Mar 2, 2017

Do you have any idea when the next release is happening?

@j3mdamas
Contributor

j3mdamas commented Mar 2, 2017

With the cluster and scratch_local variables? Probably in just one week.

@j3mdamas
Contributor

j3mdamas commented Mar 7, 2017

Just a note: there is a warning in the documentation of PBSQueue: https://www.htmd.org/docs/latest/htmd.queues.pbsqueue.html

@jeiros
Contributor

jeiros commented Mar 9, 2017

Hi,

I'm going to use this thread as I'm also using AMBER. I'm not using any queuing system, just trying to run the simulations straight from the machine which has the GPUs. If you want I can open this as a different issue, but I thought it could be useful for other AMBER users.

I haven't used htmd for a while and I guess the API has changed. When we wrote the PmemdLocal class app, I was able to run adaptive simulations on a directory with the following structure:

.
├── adaptive.ipynb
├── ready
│   ├── MD.sh
│   ├── Production.in
│   ├── structure.prmtop
│   └── structure.rst
├── structure.prmtop
└── structure.rst

By running the following commands:

adapt = htmd.AdaptiveRun()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.metricsel1 = 'name CA'
adapt.metrictype = 'distances'
adapt.ticadim = 5
adapt.updateperiod = 7200
adapt.filtersel = 'protein'
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data')
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run()

I've seen now that the AdaptiveRun class is not available anymore so I've changed this a bit, to give:

adapt = htmd.AdaptiveMD()
adapt.nmin = 3
adapt.nmax = 4
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG', sel2='name CA')
adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data',
    devices=[0, 1, 2, 3])
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.run()

But this crashes out after the first epoch is finished with the following traceback:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-15-fd22e54018ed> in <module>()
     13 adapt.datapath = './data'
     14 adapt.filteredpath = './filtered'
---> 15 adapt.run()

/home/je714/htmd/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/htmd/htmd/adaptive/adaptiverun.py in _getSimlist(self)
    140         logger.info('Postprocessing new data')
    141         sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
--> 142                        glob(path.join(self.inputpath, '*', '')))
    143         if self.filter:
    144             sims = simfilter(sims, self.filteredpath, filtersel=self.filtersel)

/home/je714/htmd/htmd/simlist.py in simlist(datafolders, molfiles, inputfolders)
    134         raise FileNotFoundError('No data folders were given, check your arguments.')
    135     if not molfiles:
--> 136         raise FileNotFoundError('No molecule files were given, check your arguments.')
    137     if isinstance(molfiles, str):
    138         molfiles = [molfiles]

FileNotFoundError: No molecule files were given, check your arguments.

And the following directory structure:

.
├── adaptive.ipynb
├── data
│   ├── e1s1_ready
│   │   └── Production.nc
│   ├── e1s2_ready
│   │   └── Production.nc
│   ├── e1s3_ready
│   │   └── Production.nc
│   └── e1s4_ready
├── input
│   ├── e1s1_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   ├── e1s2_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   ├── e1s3_ready
│   │   ├── log.txt
│   │   ├── mdinfo
│   │   ├── MD.sh
│   │   ├── Production.in
│   │   ├── Production_new.rst
│   │   ├── Production.out
│   │   ├── structure.prmtop
│   │   └── structure.rst
│   └── e1s4_ready
│       ├── log.txt
│       ├── MD.sh
│       ├── Production.in
│       ├── Production.out
│       ├── structure.prmtop
│       └── structure.rst
├── ready
│   ├── MD.sh
│   ├── Production.in
│   ├── structure.prmtop
│   └── structure.rst
├── structure.prmtop
└── structure.rst

What's the recommended way to run simple adaptive runs?

Also, side note: when the script is running, I don't see any logging information that should be coming out of logger.info.

@stefdoerr
Contributor

Thanks for the complete report!

The reason it crashes is that it can't find pdb files in the input folders. Admittedly, I could probably change it to accept prmtop or psf files as well, since we have the reader functionality, but for the moment simlist in adaptive expects to find pdb files, as you can see in the glob(path.join(self.inputpath, '*', 'structure.pdb')) line.
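
If you need a quick stopgap in the meantime, one option is to generate that structure.pdb yourself from the files you already have (a sketch, assuming MDTraj's AMBER NetCDF restart reader handles your structure.rst):

import mdtraj as md

# Assumed workaround: build the structure.pdb that simlist expects from the
# AMBER topology + restart already present in the folder.
traj = md.load_ncrestrt('structure.rst', top='structure.prmtop')
traj.save_pdb('structure.pdb')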

@stefdoerr stefdoerr mentioned this issue Mar 9, 2017
@jeiros
Contributor

jeiros commented Mar 10, 2017

Thanks! So do I have to manually copy the PDB files in the input folders?

Edit: Looks like they're copied automatically if the PDB exists in the root folder. I'll report back when/if it crashes 😄

@stefdoerr
Contributor

@jeiros Sorry for making you do so much testing. Could you please instead pass me a single data/X and input/X folder so that I can test it locally? Only one trajectory/input should be enough. Thanks!

@jeiros
Contributor

jeiros commented Mar 13, 2017

@stefdoerr Don't worry, I understand that supporting the AMBER engine is extra work. I actually set adaptive.filter = False and it went on to do the projections. But it failed after a while with a really long traceback coming from joblib, I think. I'm retrying this with longer simulations, since I thought maybe it isn't possible to build a Markov model with the test runs that I am doing (which are only a few ps long).

I'll send you the requested files for testing to your email via Imperial's file exchange, since I can't upload them here.

@stefdoerr
Contributor

Yes, it's definitely filtering-related, but it's probably a very simple fix once I have the files. Thanks very much!

If it's ok with you I might add those files (or parts of them) as HTMD tests to avoid future regressions as well.

@stefdoerr
Contributor

@jeiros The bug is fixed actually. You just need to delete your filtered folder. I made a new filtered folder (filteredS) for example and compared it with yours (filtered):

In [7]: mol = Molecule('./filteredS/filtered.pdb')

In [9]: mol.read('./filteredS/e1s1_ready/Production.filtered.nc')

In [10]: mol = Molecule('./filtered/filtered.pdb')

In [11]: mol.read('./filtered/e1s1_ready/Production.filtered.nc')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-9f28e3f3679c> in <module>()
----> 1 mol.read('./filtered/e1s1_ready/Production.filtered.nc')

I will check if the rest runs fine though

@stefdoerr
Contributor

stefdoerr commented Mar 13, 2017

Ok, there is a bug though when working with your PDB file (#276); I will need to see what's wrong there.
If you modify my adaptive code to read prmtop files, as you did from what I see, it will work, albeit extremely slowly, though I don't know exactly why.

@jeiros
Contributor

jeiros commented Mar 13, 2017

Ok no problem thanks for taking a look into it :)

@stefdoerr
Contributor

stefdoerr commented Mar 13, 2017

@jeiros ok this works perfectly on my computer after the bug fixes

app = htmd.LocalGPUQueue()
app.devices = [0,]

adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = 'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run()

Bug fixes:

I had to read the topology because MDTraj doesn't like reading pure trajectory files. Fixed here: 656d0a8 and here: 24ee39f. The new release 1.7.13 is being built right now: https://travis-ci.org/Acellera/htmd/builds/210582795

Minor changes for improvements:

  1. I removed the ions during filtering since we don't need them and they might differ in number between systems.
  2. I selected only 2-3 atoms of the ligand to drastically reduce computation time and memory usage by using a selection like: 'resname LIG and name C6 C10 C19'. This should be a good enough proxy of the conformations of the ligand and reduces the dimensionality from 30k dimensions to 1257 dimensions (1/30th memory usage and much faster calculation).
  3. I changed the default input.coor of adaptive to structure.ncrst since that is what your simulation program probably needs for input coordinates. Take care, you need to change your input file to read ncrst instead of rst. MDtraj cannot write rst, I just assumed here that ncrst is the correct equivalent extension but maybe it's rst7? I have no clue about this so you should enlighten me.

This is what it looks like after execution

[sdoerr@loro Mon15:32 test-htmd] ll input/e2s1_e1s1p0f2/
total 82544
-rw-rw---- 1 sdoerr lab        0 Mar 13 12:43 log.txt
-rw-rw---- 1 sdoerr lab      561 Mar 13 12:43 mdinfo
-rw-rw---- 1 sdoerr lab      150 Mar 13 12:43 MD.sh
-rw-rw---- 1 sdoerr lab      230 Mar 13 12:43 Production.in
-rw-rw---- 1 sdoerr lab  7255212 Mar 13 15:32 structure.ncrst
-rw-r----- 1 sdoerr lab 27250369 Mar 10 15:06 structure.pdb
-rw-r----- 1 sdoerr lab 50002440 Mar 10 10:22 structure.prmtop

Remaining issues:

I need to create the new simlist class which autodetects topology files such as *.prmtop in adaptive folders instead of expecting a structure.pdb file as defined in this issue #275. For the moment you seem to have worked around it by changing the line in adaptiverun.py if I understand correctly.

# Original line in adaptiverun.py
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
                       glob(path.join(self.inputpath, '*', '')))
# Needs to be currently replaced by
sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.prmtop')),
                       glob(path.join(self.inputpath, '*', '')))

Once 1.7.13 is out, test it and tell me if you encounter any other problems. Other problems would probably come from the PMEMD app, since I can't test that. But the rest runs fine locally.

Edit: Remember to delete the filtered folder after you install the new HTMD version as I mentioned before because if the files already exist it won't overwrite them.

@jeiros
Contributor

jeiros commented Mar 13, 2017

Thank you so much! There are actually different versions of 'restart' files in AMBER, in ASCII and binary form. Mine is binary. It can be read with mdtraj:

md.load_ncrestrt('structure.rst', top='structure.prmtop')
<mdtraj.Trajectory with 1 frames, 302263 atoms, 99238 residues, and unitcells at 0x7fe587d10588>

I thought the convention was to name them .rst, but I have seen people saving them as .rst7; the file extension shouldn't affect what's inside the file though?

Thank you so much for this! Also, I was going crazy with stripping the counterions, I didn't know that I had to escape the + and - characters 😅

@jeiros
Contributor

jeiros commented Mar 13, 2017

Also, I did change the line in adaptiverun.py to accept prmtop files, but since I upgraded this morning to the new htmd version, that was overwritten. I'll play around with changing it once I get my hands on the 1.7.13 release.

@stefdoerr
Contributor

Ah yes, that ion escaping thing is horrible, but it's not our fault. VMD (whose atom selection syntax we use) is just weird like that.

Yes, try with rst7; just remember to also read it correctly in your input file.
1.7.13 is out so give it a try whenever you find time :)
Thanks!

@jeiros
Contributor

jeiros commented Mar 13, 2017

Again, conda behaving weirdly:

$ conda upgrade htmd
Fetching package metadata ...............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /home/je714/anaconda3/envs/htmd-py35:
#
htmd                      1.7.11                   py35_0    acellera

If I make a new env with python 3.6 and the full anaconda installation: conda create --name htmd-py36 anaconda -y and then look for HTMD

$ anaconda search htmd
Using Anaconda API: https://api.anaconda.org
Run 'anaconda show <USER/PACKAGE>' to get more details:
Packages:
     Name                      |  Version | Package Types   | Platforms
     ------------------------- |   ------ | --------------- | ---------------
     acellera-basic/htmd       |   1.0.10 | conda           | linux-64, win-64, osx-64
     acellera-basic/htmd-data  |   0.0.33 | conda           | linux-64, win-64, osx-64
     acellera-basic/htmdbabel  |   2.3.95 | conda           | linux-64, osx-64
     acellera/HTMD             |   1.7.13 | conda           | linux-64, win-64, osx-64
                                          : High Throughput Molecular Dynamics

The new release is there. But:

$ conda install -c acellera htmd=1.7.13
Fetching package metadata ...............


PackageNotFoundError: Package not found: '' Package missing in current linux-64 channels:
  - htmd 1.7.13*

You can search for packages on anaconda.org with

    anaconda search -t conda htmd

conda install seems to still be picking up the 1.7.11 release. I'll give it a bit of time to see if it detects the new release (?)

@stefdoerr
Contributor

stefdoerr commented Mar 13, 2017

Maybe try
conda uninstall htmd --force
and install again? Seems to help sometimes.

Works fine on my fresh py36 miniconda install.

@jeiros
Contributor

jeiros commented Mar 13, 2017

That still picks up the 1.7.11 version

(htmd-py35) je714@titanx2:~$ conda uninstall htmd --force
Fetching package metadata ...............

Package plan for package removal in environment /home/je714/anaconda3/envs/htmd-py35:

The following packages will be REMOVED:

    htmd: 1.7.11-py35_0 acellera

Proceed ([y]/n)? y

(htmd-py35) je714@titanx2:~$ conda install htmd
Fetching package metadata ...............
Solving package specifications: .

Package plan for installation in environment /home/je714/anaconda3/envs/htmd-py35:

The following NEW packages will be INSTALLED:

    htmd: 1.7.11-py35_0 acellera

Proceed ([y]/n)? y

@stefdoerr
Contributor

stefdoerr commented Mar 13, 2017

But it's trying to pull 3.5, which is weird.
You can also see here that both 3.5 and 3.6 versions are available for 1.7.13: https://anaconda.org/acellera/HTMD/files

I am sorry, I can't help beyond suggesting a fresh install 😞 Conda is just annoying sometimes...

@jeiros
Contributor

jeiros commented Mar 13, 2017

Yes it's weird that anaconda search htmd finds the new version but conda install -c acellera htmd=1.7.13 doesn't do anything. I'll give it some time 😕

Note: Solved it by doing:

$ wget https://anaconda.org/acellera/HTMD/1.7.13/download/linux-64/htmd-1.7.13-py35_0.tar.bz2
$ conda install --offline htmd-1.7.13-py35_0.tar.bz2

not the prettiest, but I think it worked:

$ conda list | grep htmd
htmd                      1.7.13                   py35_0    file:///home/je714

@j3mdamas
Contributor

j3mdamas commented Mar 13, 2017 via email

@jeiros
Contributor

jeiros commented Mar 13, 2017

I'm running this on a Linux machine since that's where I have the GPUs

@stefdoerr
Contributor

I have a suspicion this might be related to the automatic dependency generation. I will take a look at it tomorrow.

@mj-harvey
Contributor

Were you starting with a fresh htmd-py35 environment?
If not, make a new one and try again. There shouldn't be any need to use 3.5 anymore - our release for 3.6 is out.

@jeiros
Contributor

jeiros commented Mar 14, 2017

Hi, I managed to get the 1.7.13 version for a python 3.6 environment.

Starting from 0, creating input files:

ProdTest = Production()
ProdTest.amber.nstlim = 2500
ProdTest.amber.ntx = 2
ProdTest.amber.irest = 0
ProdTest.amber.parmfile = 'structure.prmtop'
ProdTest.amber.coordinates = 'structure.ncrst'
ProdTest.amber.dt = 0.004
ProdTest.amber.ntpr = 500
ProdTest.amber.ntwr = 500
ProdTest.amber.ntwx = 250

ProdTest.write('./', './ready')

This gives the following:

$ ll ready/
total 63012
-rw-rw-r-- 1 je714 je714      122 Mar 14 10:25 MD.sh
-rw-rw-r-- 1 je714 je714      230 Mar 14 10:25 Production.in
-rw-r--r-- 1 je714 je714 14509680 Mar 14 10:25 structure.ncrst
-rw-r--r-- 1 je714 je714 50002440 Mar 14 10:25 structure.prmtop
$  cat ready/MD.sh
ENGINE -O -i Production.in -o Production.out -p structure.prmtop -c structure.ncrst -x Production.nc -r Production_new.rst

Using @stefdoerr's commands:

app = htmd.LocalGPUQueue()
app.devices = [0,]

adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = 'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run()

Fails with the following logs & Traceback:

2017-03-14 10:25:47,756 - htmd.adaptive.adaptive - INFO - Processing epoch 0
2017-03-14 10:25:47,758 - htmd.adaptive.adaptive - INFO - Epoch 0, generating first batch
2017-03-14 10:25:47,759 - htmd.adaptive.adaptive - INFO - Generators folder has no subdirectories, using folder itself
2017-03-14 10:25:47,932 - htmd.queues.localqueue - INFO - Using GPU devices 0
2017-03-14 10:25:47,934 - htmd.queues.localqueue - INFO - Queueing /home/je714/try_adaptive/from_manual_build/input/e1s1_ready
2017-03-14 10:25:47,935 - htmd.queues.localqueue - INFO - Running /home/je714/try_adaptive/from_manual_build/input/e1s1_ready on GPU device 0
2017-03-14 10:25:47,935 - htmd.queues.localqueue - INFO - Queueing /home/je714/try_adaptive/from_manual_build/input/e1s2_ready
2017-03-14 10:25:47,939 - htmd.queues.localqueue - INFO - Queueing /home/je714/try_adaptive/from_manual_build/input/e1s3_ready
2017-03-14 10:25:47,945 - htmd.adaptive.adaptive - INFO - Sleeping for 100 seconds.
2017-03-14 10:25:47,948 - htmd.queues.localqueue - INFO - Error in simulation /home/je714/try_adaptive/from_manual_build/input/e1s1_ready. Command '/home/je714/try_adaptive/from_manual_build/input/e1s1_ready/job.sh' returned non-zero exit status 127.
2017-03-14 10:25:47,950 - htmd.queues.localqueue - INFO - Running /home/je714/try_adaptive/from_manual_build/input/e1s2_ready on GPU device 0
2017-03-14 10:25:47,958 - htmd.queues.localqueue - INFO - Error in simulation /home/je714/try_adaptive/from_manual_build/input/e1s2_ready. Command '/home/je714/try_adaptive/from_manual_build/input/e1s2_ready/job.sh' returned non-zero exit status 127.
2017-03-14 10:25:47,960 - htmd.queues.localqueue - INFO - Running /home/je714/try_adaptive/from_manual_build/input/e1s3_ready on GPU device 0
2017-03-14 10:25:47,970 - htmd.queues.localqueue - INFO - Error in simulation /home/je714/try_adaptive/from_manual_build/input/e1s3_ready. Command '/home/je714/try_adaptive/from_manual_build/input/e1s3_ready/job.sh' returned non-zero exit status 127.
2017-03-14 10:27:27,953 - htmd.adaptive.adaptive - INFO - Processing epoch 1
2017-03-14 10:27:27,955 - htmd.adaptive.adaptive - INFO - Retrieving simulations.
2017-03-14 10:27:27,957 - htmd.adaptive.adaptive - INFO - 0 simulations in progress
2017-03-14 10:27:27,959 - htmd.adaptive.adaptiverun - INFO - Postprocessing new data
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-11-828aa70f8ca4> in <module>()
     15 adapt.filteredpath = './filtered'
     16 adapt.coorname = 'structure.ncrst'
---> 17 adapt.run()

/home/je714/anaconda3/envs/htmd-py36/lib/python3.6/site-packages/htmd/adaptive/adaptive.py in run(self)
     97                 # If currently running simulations are lower than nmin start new ones to reach nmax number of sims
     98                 if self._running <= self.nmin and epoch < self.nepochs:
---> 99                     flag = self._algorithm()
    100                     if flag is False:
    101                         self._unsetLock()

/home/je714/anaconda3/envs/htmd-py36/lib/python3.6/site-packages/htmd/adaptive/adaptiverun.py in _algorithm(self)
    123 
    124     def _algorithm(self):
--> 125         data = self._getData(self._getSimlist())
    126         if not self._checkNFrames(data): return False
    127         self._createMSM(data)

/home/je714/anaconda3/envs/htmd-py36/lib/python3.6/site-packages/htmd/adaptive/adaptiverun.py in _getSimlist(self)
    140         logger.info('Postprocessing new data')
    141         sims = simlist(glob(path.join(self.datapath, '*', '')), glob(path.join(self.inputpath, '*', 'structure.pdb')),
--> 142                        glob(path.join(self.inputpath, '*', '')))
    143         if self.filter:
    144             sims = simfilter(sims, self.filteredpath, filtersel=self.filtersel)

/home/je714/anaconda3/envs/htmd-py36/lib/python3.6/site-packages/htmd/simlist.py in simlist(datafolders, molfiles, inputfolders)
    132 
    133     if not datafolders:
--> 134         raise FileNotFoundError('No data folders were given, check your arguments.')
    135     if not molfiles:
    136         raise FileNotFoundError('No molecule files were given, check your arguments.')

FileNotFoundError: No data folders were given, check your arguments.

It's failing to launch the simulations. From what I've seen, the job.sh script is expecting a run.sh script to launch the simulations with the appropriate command, but the one produced by the htmd.protocols.pmemdproduction.Production class is called MD.sh.

$ l input/e1s1_ready/
total 62M
-rw-rw-r-- 1 je714 122 Mar 14 10:25 MD.sh
-rw-rw-r-- 1 je714 230 Mar 14 10:25 Production.in
-rwx------ 1 je714 172 Mar 14 10:25 job.sh
-rw-r--r-- 1 je714 14M Mar 14 10:25 structure.ncrst
-rw-r--r-- 1 je714 48M Mar 14 10:25 structure.prmtop
$ cat input/e1s1_ready/job.sh
#!/bin/bash

export CUDA_VISIBLE_DEVICES=0
cd /home/je714/try_adaptive/from_manual_build/input/e1s1_ready
/home/je714/try_adaptive/from_manual_build/input/e1s1_ready/run.sh

Also, we are not doing

adapt.app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data',
    devices=[0, 1, 2, 3])

but

adapt.app = app

where app is an htmd.LocalGPUQueue object. So the MD.sh doesn't know what ENGINE it should use and is not overwritten:

$ cat input/e1s1_ready/MD.sh
ENGINE -O -i Production.in -o Production.out -p structure.prmtop -c structure.ncrst -x Production.nc -r Production_new.rst

@stefdoerr
Contributor

The pmemdproduction class needs rewriting, indeed. For the moment I would suggest you just write a run.sh script manually, which will then be called by LocalGPUQueue through the job.sh script.
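
For example, something along these lines could write it into your generator folder (a sketch; the pmemd path and file names are copied from your earlier messages, so adapt as needed):

import os
import stat

# Assumed workaround: create the run.sh that job.sh calls, reusing the command
# from MD.sh with ENGINE replaced by the actual pmemd binary.
run_sh = (
    '#!/bin/bash\n'
    '/usr/local/amber/bin/pmemd.cuda_SPFP -O -i Production.in -o Production.out '
    '-p structure.prmtop -c structure.ncrst -x Production.nc -r Production_new.rst\n'
)

path = './ready/run.sh'
with open(path, 'w') as f:
    f.write(run_sh)
os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)   # make it executable so job.sh can call it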

To make it clearer:
We decided to separate the queuing systems from the simulation software (hence why we don't use Apps anymore and use Queues instead). They work as follows: the protocols are software-specific (ours are for Acemd, you wrote the one for pmemd) and they write out a run.sh file which should be standalone enough to execute a simulation.
Then the queueing classes simply write a job.sh which does some queue-specific stuff, like hiding all GPUs except one or submitting to SLURM, and then calls your run.sh, which runs the simulation.

So now the problem is that the pmemdproduction module is out of date and needs updating to the realities of the new world. The engine will need to be passed to the pmemdproduction class so that it writes it into the run.sh script.

On the matter of the error with simlist:
The error just tells you that you don't have any subfolders in your data folder, while you do have some (called eXsX) in the input folder, and also that your retrieve method didn't make any data folders. It might in a way be a minor bug. You can fix it by starting from epoch 1, since I see that you have simulations for epoch 1, and putting them into your data directory.

@stefdoerr
Contributor

So to summarize, the only two problems here are:
a) We need to add the ENGINE to the PMEMD Production protocol to write it to the run.sh file
b) Rename the MD.sh to run.sh

Right? I can do that.

@jeiros
Contributor

jeiros commented Mar 14, 2017

Thanks for clarifying. I got it to work for the moment by using

app = htmd.apps.pmemdlocal.PmemdLocal(
    pmemd='/usr/local/amber/bin/pmemd.cuda_SPFP',
    datadir='./data',
    devices=[0, 1, 2, 3])

adapt = htmd.AdaptiveMD()
adapt.nmin = 1
adapt.nmax = 3
adapt.nepochs = 10
adapt.updateperiod = 100
adapt.projection = htmd.projections.metricdistance.MetricDistance(sel1='resname LIG and name C6 C10 C19', sel2='name CA')
adapt.app = app
adapt.filtersel = 'not water and not resname "Na\+" "Cl\-"'
adapt.generatorspath = './ready'
adapt.inputpath = './input'
adapt.datapath = './data'
adapt.filteredpath = './filtered'
adapt.coorname = 'structure.ncrst'
adapt.run()

But I'll switch to using the queues.

@jeiros
Contributor

jeiros commented Mar 14, 2017

Yes that would be it. Don't worry about it, I can play around with it and submit a PR once I think it's working fine with the queues.

To make 'changes' to an htmd installation, here's what I do:

  1. Conda install htmd-deps on a new conda environment
  2. Clone the git repo
  3. export PYTHONPATH to the repo path
  4. cd into the htmd/ repo and run python setup.py install --user

Is that how you go about it?

I am not too sure it's working for me since I keep getting HTMD: Logging setup failed when I import htmd

@stefdoerr
Contributor

stefdoerr commented Mar 14, 2017

No, sorry. The setup.py doesn't work as far as I know.
I just do:

  1. clone repo
  2. prepend it to the PYTHONPATH
  3. Use the normal conda environment

That's it.

The only issue might be the C .so libs, which you might have to copy from the conda installation (do htmd.home() to see where it's installed) to the equivalent git folder.

@j3mdamas
Contributor

The setup.py is for PyPI packaging, as far as I recall, and it's still under development (#237).

@stefdoerr
Contributor

@jeiros I added automatic topology detection now to HTMD in the latest commit
3d60ae3

So no reason to modify adaptiverun.py anymore to read prmtop files.

@stefdoerr
Contributor

@jeiros I am going to close this. Made a new issue for it
