New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding WHAM to MDAnalysis #842
Comments
I had a brief look at the implementations you listed.
I also had a very quick look at UWHAM. It seems to be python 2.7 only, while we try to be compatible with python 3. If UWHAM is really worth the shot, we could work with them to make their code python 3 compatible, and push them so UWHAM got submitted on pipy. Based on that, I would go for Grossfield's implementation. I'll have a deeper look at UWHAM latter on. |
Grossfields license is super restrictive because he is using code from numerical recipes. We would need to ask the numerical recipes authors for conformation. This is a little bit hidden information in his readme. The other stuff is indeed BSD licensed, I'm not sure how much work it is to find out what functions are from numerical recipes. |
Good catch. There are two functions from the numerical recipes. They are in a On 06-05-16 13:34, kain88-de wrote:
|
Not necessarily. The numerical recipes stuff is full of errors and bugs. Besides for fast code modern cpu's have different requirements. |
Interesting paper about WHAM. I'm not aware the tests suggested in this paper are implemented in any of the standard WHAM tools |
Btw, if want to do something with Grossfield's wham code then I can talk to him about it, I know Alan pretty well. |
@fiona-naughton, can you outline how you would like to use wham with MDAnalysis, i.e., a mock up of what commands you would run if some form of wham was implemented? Would this rely on using auxiliary information in trajectories #785? I want to get a feel for what a user would gain and how it would be simpler as opposed to, say, extract sampled CVs from trajectories or output files, write to data files, and then feed to 3rd party tools (been there, done it, bought the T-shirt, published the paper). |
I'd like something like I currently instead do this as you described, and while it does work, if I want to run WHAM as part of a larger analysis script I have to run a shell command from Python to run WHAM (I’m using g_wham), then parse in the resulting file(s). This seems a little nasty, and trying to find out what went wrong when the bash-command-from-Python failed doesn’t seem straightforward (this might be particular of g_wham, though?). I also think it would be nice if the user had to worry less about the right file formatting. In any case, It’d also be nice to see something like |
From my experience so far, any analysis done on groups of trajectories can be done well within the MDSynthesis frame work. I think I would build a convenient wham analysis workflow on top of MDSynthesis because @dotsdl already solved many of the problems. This will still require the aux reader, of course. It would be convenient to have a native wham implementation (but if there's not enough time then calling an external command is still a good alternative option). I like the idea of using MDAnalysis to easily reweight any quantity calculated from windows. It would be nice if this was easy as reweighted_timeseries = reweight(simulations, analysis_function, pmf_function, ...) where
Yes, MDAnalysis attempts to be format-agnostic, i.e., once you have your (aux) trajectories you should be able to run the same kind of analysis no matter how you got the trajectories. |
Hi Fiona, The PyEMMA project also contains WHAMM. http://emma-project.org/latest/api/index_thermo.html I played around with the test cases yesterday and I found it looks promising. |
Starting up the discussion here again so we can start getting something worked out while I finish up the auxiliary/bundle stuff. Not sure if there is a consensus about wrapping an existing implementation? In terms of functionality they seem to be similar enough, but not sure if any would work best with MDAnalysis? (thanks @lukas-stelzl for pointing out PyEMMA – it looks a little less straightforward that the others but I'll have another proper look). I'd maybe lean towards Grossfield? A WHAM anaylsis function could subclass a more generic bundle-analysis function, which accepts a bundle (from #900) and checks that 'expected' auxiliary/metadata specified in the analysis function are present for each trajectory before running; also passing in the actual names of the expected data in the bundle if not the defaults. So using WHAM might look like: profile = WHAM(bundle, temp='T', restraint_constant='k', restrained_value='x',
pull_force='pullf', **kwargs) (With additional kwargs to pass to the WHAM implementation – start time, end time, etc). It could also include an option for calculating errors, depending on implementation (could return a tuple of profile, error), and we can add other functions to e.g re-weight the reaction coordinate like as @orbeckst suggested. Doesn't really apply to WHAM, but in general for other Bundle-analysis where we're running over each window in turn we can have a Bauhaus approach like in #719. |
@jbarnoud and I have been looking into those numerical recipes pieces of code and they should be straightforward to replace. (And maybe even advisably so, at least in the case of the ran2 random number generator) |
Ok sounds good. Alternative: cythonize WHAM and then you can use numpy arrays instead of dealing withg memory management at the C level (which is a lot of the NR code about, IIRC). Oliver Beckstein * orbeckst@gmx.net |
An alternative to PyEmma is pymbar and I think quite a lot of development went into it. https://github.com/choderalab/pymbar/tree/master/examples . Also has an Umbrella Sampling test case. MBAR is closely related to WHAMM. http://www.alchemistry.org/wiki/Multistate_Bennett_Acceptance_Ratio |
Does pymbar do WHAM for the Umbrella Sampling? Or it just extends the MBAR concept? |
Ask @dotsdl about his experience with the code base…
Oliver Beckstein * orbeckst@gmx.net |
@lukas-stelzl iirc WHAM is related to MBAR in the sense that it will give the same answer with infinite sampling, but numerically it is a completely different method.
|
My take on it would be to start with a wrapper around the Grossfield's WHAM binary. It would be an interface that would build the input files for Grossfield's WHAM, run the program, read the output, and present them to the user. This is, I think, the best way to have something that work as soon as possible. We can then discuss the user interface on something that we can actually use. Latter on, we could interface directly with the C source so we do not have to create the intermediate files. Also, once we have a working user interface, we can adapt it to other backend/method if needed. |
Sounds like a good plan to get things rolling. It also won’t require us to immediately look at licensing issues (although for testing we will probably need to install wham). (Many years ago I wrote a wrapper for input file generation for wham and although you’re are likely better off starting fresh, I nevertheless uploaded the code to github so that anyone can have a look if they like: https://github.com/orbeckst/AdK_analysis/blob/master/python/PMF/wham.py – the code is perhaps a good example why many scientists are reluctant to share code used in papers because it’s ugly, not general, not tested… but perhaps it can be useful nevertheless.)
Good plan. You can then also look at higher-dimensional wham (not just 1D like wham or 2D like wham2d). Oliver Beckstein * orbeckst@gmx.net |
Thanks all; apologies it's taken me a while to get back around to this. So looks like I'll start working with the Grossfield wham. Looking at running from the command line, there's quite a few things required, but most of them we could have as optional + set defaults for (the number of bins + tolerance to use; the values for optional bootstrap error analysis) or calculate values for (the max/min reaction coordinate values). If the simulations all have the same temperature, what we'd need at least would be the spring constant, value of the reaction coordinate restrained to, and value of the reaction coordinate at each time point, all for each window. In a MDAnalysis wham, our input would have...
(If the simulations have different temperatures, there also needs to be a potential energy value for each time for each window, which'd be another auxiliary, and a specified temperature to calculate at and a metadata name for the temperature at each window.) The main parts of the output are the profile, the error (if boostrap anaylsis was used), and 'F values' for each window (used to calculate weighted averages for other properties). I'm not sure what the best return format would be, since the error is optional, and I don't know how often the 'F values' be used? |
Sounds reasonable. How would the API looks like? Could you show what a call would look like, and what wham command it would actually run? |
I think extending from what I had above: wham(bundle)
wham(bundle, start_time=1000)
wham(bundle, temp='T', restraint_constant='k', restrained_value='x',
pull_force='pullf')
wham(bundle, bootstrap=True)
wham(bundle, n_bins=500, min=1, max=5, periodic='radians')
wham(bundle, run_temp=300) which would run the equivalent to the following from the command line, respectively (I'm taking here the same default values g_wham uses for e.g. the number of bins; default min/max would have to be calculated): wham [min] [max] 200 1e-06 [temp] 0 [input file] [output file]
# where data for input file is taken from metadata/auxiliary assuming default names
# temp should be the same for each simulation
wham [min] [max] 200 1e06 [temp] 0 [input file] [output file]
# input data of reaction coord values only uses time points from 1000ps.
wham [min] [max] 200 1e-06 [T] 0 [input file] [output file]
# where data for input file is taken from metadata/auxilary with the provided names
wham [min] [max] 200 1e-06 [temp] 0 [input file] [output file] 200 [seed]
wham Ppi 1 5 500 1e-06 [temp] 0 [input file] [output file]
wham [min] [max] 200 1e-06 300 0 [input file] [output file]
# temp can be different for each simulation The return could be a 2 x n_bins array without bootstrap error analysis, and 3 x n_bins with (X, Y, dY); though I'm still not sure how/if to fit in the 'F values'. (There's also the 2d wham, but probably better stick with the 1d for now) |
I would also add a Finally, Grossfield's WHAM can be compiled to use kJ/mol or kcal/mol: we probably need an option to deal with this. Anyway, it looks good to me. I am impatient to be able to play with this. |
See comments on #900 and #923 : this is better implemented as a separate small package. Many thanks to @fiona-naughton who gave us the infrastructure (aux) to make this (and other cool things...) possible! |
It would be useful to have an implementation of WHAM with MDAnalysis to be able to run alongside other analysis of a set of trajectories from Umbrella Sampling (US), either a wrapper for an existing version or starting from scratch.
As I see it, it'd accept an array of pull force/reaction coordinate data from US simulations (and the relevant temperature/restraining potential/etc information), or the list of files that contain that data, and returning the PMF profile, optionally with estimated errors.
There are several existing WHAM implementations floating around that do more or less this, particularly:
There’s also a python implementation that uses the related UWHAM algorithm here (license: GPL). This supposedly converges faster than the original WHAM equations that (as far as I can tell) the above three implementations use, though g_wham at least has several measures in place to aid acceleration of convergence.
Some other points that might be worth considering:
The text was updated successfully, but these errors were encountered: