-
Notifications
You must be signed in to change notification settings - Fork 642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling umbrella sampling simulations #843
Comments
WRT handling the different simulations, I'm surprised @dotsdl hasn't shamelessly plugged MDSynthesis yet, so I will. It lets you have objects that represent a Simulation and then save data against them, and transfer atom selections across different runs etc. |
@richardjgowers It is a very good idea to leverage MDSynthesis for that. It will make a lot of things easier storing multiple simulations |
I agree that it could be nice to plug MDSynthesis there, but I do not think the feature described here should need MDSynthesis. I doubt that a persistent storage is required here, even though it can be useful in some cases. |
I think this is an excellent use case for MDSynthesis, and not necessarily because of the persistence (although that can definitely help). The built-in mechanisms for filtering and grouping based on simulation metadata can do a lot to help with rapidly prototyping here, and ultimately can just be used as the backend for the complicated stuff that needs implementing. I think this is preferable to what would probably be a lot of reinventing the wheel, time which instead could be used on the core WHAM elements. Once we have something that works, we can see how little of it requires MDSynthesis and cut, cut, cut. |
Restarting this discussion now I'm (finally) moving on from add_auxiliary stuff. I'll start here looking at the setting up + data storage side.
Again, some of these might be the same between windows, and the temp, restraint constant and reaction coord value could be pulled from simulation input files but for now just stick to manually specifying. Loading an umbrella sampling set might then look something like: us = Umbrella(topology, [trajectories], pull_f=[pull_force_files], {add_auxiliary kwargs},
restr=[restrained_values], temp=temperature(s), k=restraint_constant(s)) Which might be starting to get a little messy, particularly if there's also other kwargs for loading each universe, and for the US set in general? In any case, creating the Umbrella instance should end up for each window loading the universe, adding the appropriate auxiliary data as say 'pull_f' or 'pull_x', and if we're using MDSynthesis, the other window information can be stored in appropriate categories. The Umbrella instance itself could then store the list of windows and let us iterate over them; and later will have all the various analysis methods. Hopefully that sounds sensible - again, any comments/suggestions/etc welcome! |
Some questions:
|
Just to be annoying.. couldn't the windows all have a different number of (solvent) atoms? So instead of |
You can have a Assuming the windows are all produced in roughly the same way, the |
I guess I mainly see the Umbrella object allowing to run various analysis without having to pass the list of universes every time, so e.g.
Being able to add windows later would be a good idea - I guess just something like
If we give the windows names, or default name them based on e.g. reaction coord value, input order, something like
Yes - it makes sense to allow setting up from already established universes and/or auxiliary readers (but still allow just passing in the list of files when they're set up the same).
I'm not sure what you mean, sorry? |
Doesn't gromacs already produce some directory structure for umbrella, replica exchange simulations that we could parse? |
I do not know for replica exchange, but there no such thing for umbrella sampling simulations. US windows are usually separate simulations, and nothing connects them for gromacs. —You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread. |
tl;dr The
So you see the However, I would prefer the analyses to be fed an us = Umbrella([...])
pmf = Wham(us, [...]) rather than us = Umbrella([...])
pmf = us.wham([...]) Feeding the collection of trajectories to the analyses makes it easier to write new analyses. We could have a
Good to me. I would also have a
Here, I really prefer the dict way rather than the namespace way. My windows are likely to be named after distances so I want to be able to use names like "2.3nm". I would like to be able to access my windows by index too. Here is a use case. I assume I work on alpha helices dimerisation in a bilayer with a setup similar to http://dx.doi.org/10.1016/j.chemphyslip.2013.02.001. I carried out one window simulation for each distances between 0.5 and 4 nm with a 0.1 nm increment. I loaded all the trajectories in a # I want to check something on one of the windows.
# I'll get the one named "2.5".
u = us["2.5"]
# I wand to access the first trajectory.
u = us[0]
# I realized helices are to close to each other in the closest windows, and there is not
enough sampling in the furthest ones.
new_us = us[5:-5]
# I just want a quick preliminary result with less granulosity.
# I run my analyses only on every other window.
new_us = us[::2]
# I want to loop over my trajectories.
for u in us:
do_something(u) Latter on, there are things that I would like to do. Among them, I would like to be able to have numerical properties attached to each window, and to be able to select the windows based on these properties. How great would it be to be able to do something like: # Select windows based on the inputed distance constraint.
new_us = us.select('1.0 <= umbrella_distance <= 3.5') # I really like nm
# Select windows based on the actual average distance.
somehow_calculate_the_inter_helical_distance(us)
new_us = us.select('mean_distance > 10') # mda works in Å
Never mind, I was doing some over-engineering here. I saw a problem if the user has to deal with to auxiliary field that have to be build differently (for instance the On a more general standpoint, I am not sure the For instance, I expect the collection of trajectory to be able to tell me that all the trajectory have a This indeed look quite a lot to what MDSynthesis provides. In the future, it would be great to be able to build a collection from any level of organisation provided by detreant. As I see it, the class needs at least the following methods, or equivalent ones:
No need for All the magic happen in a |
In addition to my (way too) long post from yesterday, here are some random ideas. Maybe some of them will stick. I suggested yesterday that the meta data could be saved in the Alternatively, they could be saved in the us = Umbrella([...])
us[0].data['umbrella_distance'] We may not want to save to many things inside the universes. Then, we could wrap each universe in an object, lets call it |
Thanks! Something more general like that sounds much better. I'll start putting something together and make another WIP pull request. So the idea would be to start off independent of MDSynthesis, making an |
I suggest you play around with the sims first on your own for to get to know the MDSynthesis-API and read the excellent scipy paper from @dotsdl. Since neither of you knows exactly how the API should look like I would suggest to start from the Sims objects and the 'Bundles' MDSynthesis provides (it already supports meta data for example and a good deal of queries). You can keep an additional notebook in a gist or separate repository where you can play around with the API and do quick iterations. This workflow would aim at understanding the Sim objects so that you better know what to copy and allow for quick iteration on the API to see what works for you and what doesn't. A side note. For the behavior in class Collection(Namespace):
def __getitem__(self, item):
if isinstance(item, str):
# calling super like this might be wrong.
# This calls from __getitem__ from the parent class (here `Namespace`)
return super(Collection, __getitem__)(item);
else:
# work on the integer. Btw the pure integer access that @jbarnoud suggest requires some kind of natural sorting for the simulations in a Collection/Bundle. This can be defined for Umbrella simulations and Replica Exchange simulations but there is almost no way we know how sort using an intelligent guessing algorithm. You have to provide some function/kwarg that can be used to define an ordering of the universes in the Collection/Bundle. |
@fiona-naughton --- I do not know how practical my suggestions are. So far, I took the problems I saw one at a time and suggested a solution that could fit. There are things I did not consider yet and that may be worth discussing before settling to an API. For instance, how practical will each option be for parallelisation? Indeed, if we have a collection of trajectory for analyses to use, we want that collection to be parallelisation-friendly.
I am counting pluses and minuses for each option, but I would be happy with ideas for others. Ping @MDAnalysis/coredevs ? As @kain88-de suggested, it would be best to first play a bit with MDSynthesis Anyway, you can setup the container already. It may help us figuring out the drawbacks of the different options. Maybe you could write a jupyter notebook with prototypes? If the prototypes (even very rought) come in soon enough, we should be able to discuss them, to build on top of them, and to settle on one of them by the end of next week. Once a prototype chosen, there will only be some cleaning to do to get a proper PR. @kain88-de --- So far, we always assumed that the collection would be fed a list of universes or path to trajectories. Since these lists are ordered, we just have to use the same order as the input. Being able to sort a collection according to a given criterion would be great, though! |
I read to fast. I basically said the same as @kain88-de, but with much more words. |
You underestimate my stupidity ;-). I would glob for folders where no sorting is guaranteed from glob import glob
import MDAnalysis as mda
col = mda.Bundle(glob('parameter-study-val_*/sim.xtc') I expect something like this to work when I can supply lists. But the ordering isn't specified here. |
On 24/06/16 15:03, kain88-de wrote:
But most importantly, if you do not care about the order, then it is not |
Sorry, I'm confused about exactly what you think I should be doing - playing around with MDSynthesis is definitely a good idea, but is that just to see how it works as an example, or should I start setting up stuff assuming it's using MDSynthesis (at least at first)? And I'm not sure what you mean for different prototypes - do you mean having an example class(es) defined for each case of where we could store the metadata? What exactly should the prototypes do that would allow us to compare between them, or is it more a matter of finding out how easy/hard each is to set up + get working? I'm also having a bit of trouble installing MDSynthesis because it's complaining about not finding HDF5 - I think it's already installed, but no idea how to figure out where so I can pass that in? Re: using glob, I was assuming it'd be passed like |
We have 3 scenarii based on my earlier comment (#843 (comment)). The base is common to the 3 of them: a list/dict like container with some method to request what matadata/aux field is common to all the trajectories that it contains. The differences are:
I would like you to roughly implement the 3 scenarii in a jupyter notebook so we can see which one is the more practical, and to identify hidden traps on the way. I cannot help with installing MDSynthesis right now, especially without more details. Hopefully, asking on the MDSynthesis channels (mailing list? issue tracker?) could help. In the mean time, you can write a class that mimicks the relevant parts of the Writing the prototype should not be very long. Ideally, the notebook would demonstrate how to access the universes, and the metadata in the 3 scenarii. |
About mdsynthesis please write what you did and your problems to the datreant mailing list. We can have a look over your problems there. As an alternative use conda there everything works. |
Ok, thanks. I'll start on that. (I dug through my anaconda installation to find hdf5 + manually specified the location; MDSynthesis seems to have installed properly now) |
Please still write on the mailinglist we provide conda packages. It shouldn't have been a problem to install. |
@kain88-de this is an aside, but we never merged instructions for installing |
I played with MDSynthesis and went through the scipy paper. A Bundle seems to cover everything I described above but the handling of AuxReaders (make sense, they are brand new). Wether or not we end up using it, I will have a very serious look at MDSynthesis for my own simulations. |
Yes, it certainly looks very useful! |
It looks interesting, indeed. That is definitively something to consider! I would wait to be sure that we need it before adding it, though. Adding things to |
@fiona-naughton one goal of I'd focus on making auxiliaries work within a |
@jbarnoud, @fiona-naughton I think it would be a lot of work to mimic the API of an I am of the (biased) opinion that MDSynthesis is probably one of the best answers to any problem pulling together data from multiple simulations. I'm happy to help with any effort that makes use of it toward something useful, and I think something as well-defined and focused as getting free energy differences from umbrella-sampling simulations should be a lot easier to pull together with it than without. |
@fiona-naughton suggestion would do just that 👍
Should the aux reader added with
|
Not at all. Once auxiliaries are a thing in MDAnalysis, we can add in a mechanism for extracting this information from a
At the moment MDSynthesis has both
Analysis modules are allowed to have any dependencies they want. So, if MDSynthesis is particularly useful here, no worries. My comment above was just a thought since a barrier to using it might be the HDF5 stuff, which is included as a convenience. |
I put up a gist over here with the Jupyter notebook I've been testing the different data-storing-options in. Being able to use Bundles definitely simplified several things, though the others seem to be working more or less the same for the stuff I've implemented at the moment. If nothing else, being able to read in Sims with various metadata already added would be very convenient (I guess so long as your naming is consistent). If we can have MDSynthesis without A thought on reloading Universes with Auxiliaries - when the auxiliary data is in a file it looks like it'd be relatively straightforward, but if we're allowing to input auxiliary data as arrays etc we'd need to save that out somewhere, so we'd need to have a format for that (which is where |
Good work with the prototypes @fiona-naughton. It seems to be a lot of knitting to have everything in sync, a knitting that is already quite well done by MDSynthesis. One issue that I see is that you use sim.categories to store data. This is understandable as we suggested to have mdsynthesis.data optional. But it misuses detreant's semantics, and it would not fit regular mdsynthesis use. Not using mdsynthesis as a user would expect it to be used makes it less interesting to use mdsynthesis. Would it be possible to have a degraded version of detreant.data that only deals with pure python values, and perhaps numpy arrays? This degraded Limb would get superseded by the proper detreant.data if installed. This way, mdsynthesis's Sims and detreant's Bundles can be used already for the simplest cases; if the user use cases become more complex, the user can move to the proper detreant.data seamlessly. What do we need to use MDSynthesis? From what I see, we need a detreant Limb to describe the aux readers, and we need Universes to be able to list their attached auxiliary fields in a way that allows to regenerate them. Am I missing anything? |
@jbarnoud I was assuming data will mostly be metadata, like 'restraint constant' or 'restrained value', which I understood as the sort of thing |
For future reference I would like to use boolean expressions for selecting universes from a bundle. Like |
@kain88-de perhaps that should be reserved for discussion in |
@jbarnoud, @fiona-naughton this is actually a perfectly acceptable use of categories. They are simply key-value pairs, and I use them all the time for storing parameters like these for easy grouping and filtering later. The |
@dotsdl Great then. No need for complicated shenanigans. @fiona-naughton How can an Universe list the auxiliaries? Maybe the AuxReader should keep memory of its kwargs the same way as Universe does? Once we have that, we can consider adding a Limb to the Sim for the auxiliary fields and hope @dotsdl and @kain88-de will accept it. With Sim handling the auxiliary fields, a Bundle covers everything we expected from a collection of Universes and we can move on to the next step. |
Getting the kwargs should be straightforward, either storing them or noting down the current values of the appropriate attributes (the latter might work better given say the Again, for the data itself - if it's a file we can just keep the filename/path (which XVGReader currently doesn't do, but I can add it); if in the future we add AuxReaders for say ndarrays, we'd need to save the data somewhere, but we can leave that problem for later... |
On 30/06/16 14:23, Fiona Naughton wrote:
|
I've added a |
Good! Why would it work differently with MDSynthesis? What would be different? |
Not really different - I guess I was just thinking we might need to add something to get from how we'd store the kwargs etc in an auxiliary Limb to one big dictionary for add_auxiliary (and visa versa), but should be straightforward enough. |
Having disappeared for a while, I'm finally back working on this >.< I've gone ahead using So next step would be to add auxiliaries to MDSynthesis, which looks like we could add as a Limb; @dotsdl, any suggestions on how best to proceed? |
@fiona-naughton I'll have a look at the auxiliaries API and get a sense for how best to persist the information in a I'm happy to take that on from the side of MDSynthesis; I don't think it'll be a complicated addition, so I'm in a good position to get it there quickly. |
See comments on #900 and #923 : this is better implemented as a separate small package. Many thanks to @fiona-naughton who gave us the infrastructure (aux) to make this (and other cool things...) possible! |
Dear All, reaction coordinate d(C1-Br2) - d(O6-C1)&rst Thanks |
@HafizSaqib – you better ask such a question on the mailing list of your MD code (e.g. AMBER http://archive.ambermd.org/ ). The issue tracker is discussing bugs and enhancement for the MDAnalysis library |
As started on the mailing list; linked with #785 and #842.
It would be nice to have some form of class for dealing with a set of umbrella sampling (or similar) simulations. At the moment this is a very general overview of what I'd like it to do; more specific practicalities (hopefully) to be added!
Ideally, I'd see an umbrella sampling class as being able to:
Store the trajectory/other relevant data for each window
Given the list of trajectory and extra pull force/reaction coordinate data files for each window, create a universe for each and add the corresponding extra data using the add_auxiliary method in #785.
It’d also need to store temperature and restraining potential (force constant and center, if we assume harmonic potentials) for each window – supplied directly or presumably could parsed in from simulation parameter files. These and the trajectories could be stored in the class as dictionaries, or together in another class, with each window assigned an appropriate name.
Perform existing analysis across the set of simulations
Given an existing set of steps to calculate a particular property of interest, it would be nice to be able to pass this to the umbrella class to be run over all the trajectories, then from the resultant time series calculate and return either histograms or averages over each window, along the reaction coordinate, or overall.
The umbrella class should also have an ‘equilibration time’ attribute, so the (non-equilibrium) data before this time can be ignored for analysis.
Example uses of this, taking e.g. the case of investigating binding of two molecules (reaction coordinate is distance between them):
Run WHAM
Call the WHAM implementation created per #842, passing the auxiliary data/window parameters, and return/store the resultant PMF profile, optionally with estimated error.
Some related features that would be useful:
The general practice for deciding the equilibration time and checking convergence of US simulations is to perform WHAM for consecutive blocks of time and check how the free energy profile evolves.
It would be nice if the umbrella class could simplify this with a method to perform all the WHAM calls on appropriate intervals and plotting, to then manually check convergence and add equilibration time. It would be even nicer if this could be fully automated, though I’m not yet sure how this would work.
Similarly, checking for sufficient sampling along the reaction coordinate tends to be a relatively arbitrary check that the histograms from each window look sufficiently overlapped. It’d be nice to have an automated way of doing this, which need only calculate the overlap of reaction-coordinate histograms between adjacent windows and check it’s above a cut-off.
Return the value of the free energy change along the reaction coordinate (e.g. binding free energy when reaction coordinate is distance between two molecules) from the profile. This might be a bit difficult since there doesn’t seem to be particular agreement on how to get from the PMF profile to a more meaningful e.g. binding free energy – this is sometimes taken as just the well depth of the profile, or sometimes involves integrating the profile with various factors to account for missing degrees of freedom and additional terms to account for restraints, etc.
The text was updated successfully, but these errors were encountered: