Handling umbrella sampling simulations #843

Closed
fiona-naughton opened this issue May 5, 2016 · 49 comments
@fiona-naughton
Contributor

As started on the mailing list; linked with #785 and #842.

It would be nice to have some form of class for dealing with a set of umbrella sampling (or similar) simulations. At the moment this is a very general overview of what I'd like it to do; more specific practicalities (hopefully) to be added!

Ideally, I'd see an umbrella sampling class as being able to:

Store the trajectory/other relevant data for each window

Given the list of trajectory and extra pull force/reaction coordinate data files for each window, create a universe for each and add the corresponding extra data using the add_auxiliary method in #785.

It’d also need to store the temperature and restraining potential (force constant and center, if we assume harmonic potentials) for each window, either supplied directly or presumably parsed from simulation parameter files. These and the trajectories could be stored in the class as dictionaries, or together in another class, with each window assigned an appropriate name.

Perform existing analysis across the set of simulations

Given an existing set of steps to calculate a particular property of interest, it would be nice to be able to pass this to the umbrella class to be run over all the trajectories, then from the resultant time series calculate and return either histograms or averages over each window, along the reaction coordinate, or overall.

The umbrella class should also have an ‘equilibration time’ attribute, so the (non-equilibrium) data before this time can be ignored for analysis.
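A minimal sketch of the idea, assuming each window's analysis has already produced a time series of (times, values); the function names and the dictionary layout are hypothetical, not an existing MDAnalysis API:

```python
import numpy as np


def window_average(times, values, equilibration_time=0.0):
    """Average `values` over frames at or after `equilibration_time`."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = times >= equilibration_time  # discard the non-equilibrium part
    if not keep.any():
        raise ValueError("no frames left after discarding equilibration")
    return values[keep].mean()


def averages_per_window(series_by_window, equilibration_time=0.0):
    """Map each window name to its post-equilibration average."""
    return {
        name: window_average(times, values, equilibration_time)
        for name, (times, values) in series_by_window.items()
    }
```

The same mask could equally feed a histogram instead of a mean; only the reduction step changes.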

Example uses of this, taking e.g. the case of investigating binding of two molecules (reaction coordinate is distance between them):

  • Obtaining histograms of relative orientation of the two molecules (calculated by measuring appropriate angles) to check sampling along this other degree of freedom
  • Taking averages of properties of the environment (e.g. if there's a membrane involved) along the reaction coordinate to see if/how these change with the molecule separation.
  • Performing RMSD analysis and pulling out an ‘average’ structure for each window would be less straightforward but also quite useful.

Run WHAM

Call the WHAM implementation created per #842, passing the auxiliary data/window parameters, and return/store the resultant PMF profile, optionally with estimated error.

Some related features that would be useful:

  • Check convergence
    The general practice for deciding the equilibration time and checking convergence of US simulations is to perform WHAM for consecutive blocks of time and check how the free energy profile evolves.
    It would be nice if the umbrella class could simplify this with a method that performs the WHAM calls over appropriate intervals and plots the results, so that convergence can be checked manually and an equilibration time set. It would be even nicer if this could be fully automated, though I’m not yet sure how this would work.
  • Check sampling
    Similarly, checking for sufficient sampling along the reaction coordinate tends to be a relatively arbitrary check that the histograms from each window look sufficiently overlapped. It’d be nice to have an automated way of doing this, which need only calculate the overlap of reaction-coordinate histograms between adjacent windows and check it’s above a cut-off.
  • Calculate free energy
    Return the value of the free energy change along the reaction coordinate (e.g. binding free energy when reaction coordinate is distance between two molecules) from the profile. This might be a bit difficult since there doesn’t seem to be particular agreement on how to get from the PMF profile to a more meaningful e.g. binding free energy – this is sometimes taken as just the well depth of the profile, or sometimes involves integrating the profile with various factors to account for missing degrees of freedom and additional terms to account for restraints, etc.
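The sampling check described above could be sketched as follows. This is only an illustration under assumptions: the overlap measure (sum of bin-wise minima of normalised histograms), the function names, and the default cut-off are choices made here, not something agreed in this thread.

```python
import numpy as np


def adjacent_overlaps(samples_per_window, bins=50):
    """Histogram overlap for each pair of neighbouring windows.

    Each value lies in [0, 1]: 0 means disjoint histograms, 1 identical ones.
    """
    all_values = np.concatenate(
        [np.asarray(s, dtype=float) for s in samples_per_window]
    )
    # Shared bin edges so the histograms are directly comparable.
    edges = np.linspace(all_values.min(), all_values.max(), bins + 1)
    hists = []
    for samples in samples_per_window:
        counts, _ = np.histogram(samples, bins=edges)
        hists.append(counts / counts.sum())
    return [
        float(np.minimum(a, b).sum()) for a, b in zip(hists[:-1], hists[1:])
    ]


def poorly_sampled_pairs(samples_per_window, cutoff=0.05, bins=50):
    """Indices i where windows i and i+1 overlap less than `cutoff`."""
    overlaps = adjacent_overlaps(samples_per_window, bins=bins)
    return [i for i, value in enumerate(overlaps) if value < cutoff]
```

The windows are assumed to be passed in order along the reaction coordinate, since only neighbours are compared.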
@kain88-de kain88-de added this to the 0.16.0 milestone May 6, 2016
@richardjgowers
Member

WRT handling the different simulations, I'm surprised @dotsdl hasn't shamelessly plugged MDSynthesis yet, so I will. It lets you have objects that represent a Simulation and then save data against them, and transfer atom selections across different runs etc.

@kain88-de
Member

@richardjgowers It is a very good idea to leverage MDSynthesis for that. It will make storing multiple simulations a lot easier.

@jbarnoud
Contributor

jbarnoud commented May 9, 2016

I agree that it could be nice to plug MDSynthesis in there, but I do not think the feature described here should need MDSynthesis. I doubt that persistent storage is required here, even though it can be useful in some cases.

@dotsdl
Member

dotsdl commented May 9, 2016

I think this is an excellent use case for MDSynthesis, and not necessarily because of the persistence (although that can definitely help). The built-in mechanisms for filtering and grouping based on simulation metadata can do a lot to help with rapidly prototyping here, and ultimately can just be used as the backend for the complicated stuff that needs implementing.

I think this is preferable to what would probably be a lot of reinventing the wheel, time which instead could be used on the core WHAM elements. Once we have something that works, we can see how little of it requires MDSynthesis and cut, cut, cut.

@fiona-naughton
Contributor Author

Restarting this discussion now I'm (finally) moving on from add_auxiliary stuff. I'll start here looking at the setting up + data storage side.
So for each window we're going to need:

  • trajectory
  • topology
  • temperature
  • restraint constant
  • restrained reaction coord value
  • reaction coord/pull force data
  • (a name?)

Again, some of these might be the same between windows, and the temp, restraint constant and reaction coord value could be pulled from simulation input files but for now just stick to manually specifying.
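As a rough illustration of the per-window record listed above, here is a small dataclass. The class name, field names, and units are hypothetical, and the harmonic form U = 0.5 * k * (x - x0)**2 is an assumption (some tools omit the factor 0.5):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowParameters:
    """Hypothetical per-window record; names are illustrative only."""

    name: str
    trajectory: str
    topology: str
    temperature: float         # e.g. in K
    restraint_constant: float  # k, e.g. in kJ/mol/nm^2
    restraint_center: float    # restrained reaction coordinate value
    pull_file: Optional[str] = None  # reaction coord / pull force data

    def bias_energy(self, x):
        """Harmonic restraint energy, assuming U = 0.5 * k * (x - x0)**2."""
        return 0.5 * self.restraint_constant * (x - self.restraint_center) ** 2
```

Keeping the restraint parameters next to the file paths means a later WHAM call can read everything it needs from one object per window.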

Loading an umbrella sampling set might then look something like:

us = Umbrella(topology, [trajectories], pull_f=[pull_force_files], {add_auxiliary kwargs}, 
             restr=[restrained_values], temp=temperature(s), k=restraint_constant(s))  

Which might be starting to get a little messy, particularly if there's also other kwargs for loading each universe, and for the US set in general?

In any case, creating the Umbrella instance should end up for each window loading the universe, adding the appropriate auxiliary data as say 'pull_f' or 'pull_x', and if we're using MDSynthesis, the other window information can be stored in appropriate categories. The Umbrella instance itself could then store the list of windows and let us iterate over them; and later will have all the various analysis methods.

Hopefully that sounds sensible - again, any comments/suggestions/etc welcome!

@jbarnoud
Contributor

Some questions:

  • how do you see the Umbrella object used?
  • does the object bring anything compared to a list of universes?
  • can I add a window?
  • how do I access a given window?

@richardjgowers
Member

Just to be annoying... couldn't the windows all have a different number of (solvent) atoms? So instead of topology [trajs], wouldn't a more robust setup be [universes]?

@jbarnoud
Contributor

Which might be starting to get a little messy, particularly if there's also other kwargs for loading each universe, and for the US set in general?

You can have a universe_kwargs argument in the init of the Umbrella class. If that argument is a dictionary, you can pass it on when creating the universes. It seems a reasonable assumption that the keyword arguments will be the same for all the universes. If the universes are all built in a custom way, then you may need to allow the user to create the Umbrella object from existing universes. (I am too slow; @richardjgowers just suggested that already.)

Assuming the windows are all produced in roughly the same way, the AuxReaders for all the windows expect the same kwargs. So you should be able to do the same with an aux_kwargs argument. Where it starts to be annoying is if the different auxiliary fields (pull_f, pull_x, ...) need to be built differently, which is very likely. Accepting a list of already instantiated aux readers for each field may be a solution; a way to build these lists from a template would come in handy, then.
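A sketch of how the two entry points (shared universe_kwargs for files, or ready-made universes) might sit behind one helper. The function name and the universe_factory parameter are hypothetical; the factory exists only so the sketch can be tried without trajectory files, and it defaults to MDAnalysis.Universe:

```python
def build_universes(topology=None, trajectories=None, universes=None,
                    universe_kwargs=None, universe_factory=None):
    """Return a list of universes from file paths or from existing objects."""
    if universes is not None:
        if trajectories is not None:
            raise ValueError("pass either universes or trajectories, not both")
        return list(universes)  # custom-built universes are used as they are
    if topology is None or trajectories is None:
        raise ValueError("need a topology and a list of trajectories")
    if universe_factory is None:
        import MDAnalysis as mda  # only needed when building from files
        universe_factory = mda.Universe
    kwargs = dict(universe_kwargs or {})
    # The same keyword arguments are applied to every window.
    return [universe_factory(topology, traj, **kwargs) for traj in trajectories]
```

The auxiliary readers could follow the same pattern with an aux_kwargs dictionary, or be passed in already instantiated when each field needs different settings.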

@fiona-naughton
Contributor Author

  • how do you see the Umbrella object used?
  • does the object bring anything compared to a list of universes?

I guess I mainly see the Umbrella object as allowing various analyses to be run without having to pass the list of universes every time, so e.g. profile = Umbrella.wham([other options]) rather than wham([list of universes], start_time, [other options]), and also as storing the PMF profile and the 'convergence time' to use for calculating the profile + other analysis.

  • can I add a window?

Being able to add windows later would be a good idea - I guess just something like us.add_window(name, trajectory, pull_force_file, reaction_coord_value, **kwargs) with the other parameters (topology, restraint constant, various kwargs, ...) specified if necessary; if they're the same as for other windows, the Umbrella object could also store the default values so we don't need to specify them again.

  • how do I access a given window?

If we give the windows names, or default name them based on e.g. reaction coord value, input order, something like us.windows['window1'], etc; though it'd be nice if the windows could be accessed as e.g. us.window1, us.window2 (similar to the timestep auxiliary values in #868 ).

Just to be annoying... couldn't the windows all have a different number of (solvent) atoms? So instead of topology [trajs], wouldn't a more robust setup be [universes]?

Yes - it makes sense to allow setting up from already established universes and/or auxiliary readers (but still allow just passing in the list of files when they're set up the same).

a way to build these lists from a template would come in handy, then.

I'm not sure what you mean, sorry?

@kain88-de
Member

Doesn't gromacs already produce some directory structure for umbrella, replica exchange simulations that we could parse?

@jbarnoud
Contributor

I do not know about replica exchange, but there is no such thing for umbrella sampling simulations. US windows are usually separate simulations, and nothing connects them for gromacs.

@jbarnoud
Contributor

tl;dr The Umbrella class scope should be more general. I see it as a general collection of universes that gives uniform access to metadata, and that tells what auxiliary fields and what metadata we are sure to find for all universes it contains. This collection can be used by analysis classes that need to deal with multiple trajectories to make sure all universes have what the analysis needs.

  • how do you see the Umbrella object used?
  • does the object bring anything compared to a list of universes?

I guess I mainly see the Umbrella object as allowing various analyses to be run without having to pass the list of universes every time, so e.g. profile = Umbrella.wham([other options]) rather than wham([list of universes], start_time, [other options]), and also as storing the PMF profile and the 'convergence time' to use for calculating the profile + other analysis.

So you see the Umbrella class as a collection of Universes with attributes attached to them. Then, analyses that need multiple trajectories as input could use that object. I like that.

However, I would prefer the analyses to be fed an Umbrella instance rather than the analyses being methods. In other words, I would prefer:

us = Umbrella([...])
pmf = Wham(us, [...])

rather than

us = Umbrella([...])
pmf = us.wham([...])

Feeding the collection of trajectories to the analyses makes it easier to write new analyses. We could have an AnalysisCollectionBase in the same way we have AnalysisBase, and follow the same Bauhaus spirit.
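A hedged sketch of what such a base class might look like. AnalysisCollectionBase does not exist in MDAnalysis; the hook names below (_prepare, _single_universe, _conclude) are assumptions loosely modelled on the prepare/loop/conclude pattern of AnalysisBase, and the Lengths subclass is a toy example:

```python
class AnalysisCollectionBase:
    """Hypothetical base class: run one analysis over a collection."""

    def __init__(self, collection):
        self.collection = collection
        self.results = []

    def _prepare(self):
        """Hook called once before looping over the universes."""

    def _single_universe(self, universe):
        """Compute and return the result for one universe."""
        raise NotImplementedError

    def _conclude(self):
        """Hook called once after the loop, e.g. to combine results."""

    def run(self):
        self._prepare()
        self.results = [self._single_universe(u) for u in self.collection]
        self._conclude()
        return self


class Lengths(AnalysisCollectionBase):
    """Toy analysis: count the items in each member of the collection."""

    def _single_universe(self, universe):
        return len(universe)

    def _conclude(self):
        self.total = sum(self.results)
```

With this layout, pmf = Wham(us, [...]) only needs the collection to be iterable; the analysis never has to know how the collection stores its members.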

  • can I add a window?

Being able to add windows later would be a good idea - I guess just something like us.add_window(name, trajectory, pull_force_file, reaction_coord_value, **kwargs) with the other parameters (topology, restraint constant, various kwargs, ...) specified if necessary; if they're the same as for other windows, the Umbrella object could also store the default values so we don't need to specify them again.

Sounds good to me. I would also have a us.add_windows or us.add_multiple_windows, and us.insert_window similar to list.insert.

  • how do I access a given window?

If we give the windows names, or default name them based on e.g. reaction coord value, input order, something like us.windows['window1'], etc; though it'd be nice if the windows could be accessed as e.g. us.window1, us.window2 (similar to the timestep auxiliary values in #868 ).

Here, I really prefer the dict way rather than the namespace way. My windows are likely to be named after distances so I want to be able to use names like "2.3nm". I would like to be able to access my windows by index too.

Here is a use case. Assume I work on alpha-helix dimerisation in a bilayer with a setup similar to http://dx.doi.org/10.1016/j.chemphyslip.2013.02.001. I carried out one window simulation for each distance between 0.5 and 4 nm with a 0.1 nm increment. I loaded all the trajectories in an Umbrella object called us with the distances in nm as names (i.e. "0.5", "0.6", ..., "4.0").

# I want to check something on one of the windows.
# I'll get the one named "2.5".
u = us["2.5"]

# I want to access the first trajectory.
u = us[0]

# I realized helices are too close to each other in the closest windows, and there is not
# enough sampling in the furthest ones.
new_us = us[5:-5]

# I just want a quick preliminary result with less granularity.
# I run my analyses only on every other window.
new_us = us[::2]

# I want to loop over my trajectories.
for u in us:
    do_something(u)

Later on, there are things that I would like to do. Among them, I would like to be able to have numerical properties attached to each window, and to be able to select the windows based on these properties. How great would it be to be able to do something like:

# Select windows based on the input distance constraint.
new_us = us.select('1.0 <= umbrella_distance <= 3.5')  # I really like nm

# Select windows based on the actual average distance.
somehow_calculate_the_inter_helical_distance(us)
new_us = us.select('mean_distance > 10')  # mda works in Å
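A simpler variant of this selection can be sketched without parsing a selection string, by passing a predicate instead. Everything here is hypothetical: select_windows is not an existing function, and windows are represented as plain (name, metadata) pairs for the sake of a self-contained example.

```python
def select_windows(windows, predicate):
    """Keep the (name, metadata) pairs whose metadata satisfies `predicate`."""
    return [(name, meta) for name, meta in windows if predicate(meta)]


windows = [
    ("0.5", {"umbrella_distance": 0.5}),
    ("1.0", {"umbrella_distance": 1.0}),
    ("2.5", {"umbrella_distance": 2.5}),
    ("4.0", {"umbrella_distance": 4.0}),
]

# Same intent as '1.0 <= umbrella_distance <= 3.5', written as a callable.
subset = select_windows(
    windows, lambda meta: 1.0 <= meta["umbrella_distance"] <= 3.5
)
names = [name for name, _ in subset]
```

A string-based select could later be layered on top by translating the expression into such a predicate.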

a way to build these lists from a template would come in handy, then.

I'm not sure what you mean, sorry?

Never mind, I was over-engineering here. I saw a problem if the user has to deal with two auxiliary fields that have to be built differently (for instance, the AuxReaders for pull_x and pull_f need different kwargs). I then thought it would be convenient to provide, for each aux field, the list of files and a template, so that the Umbrella class could build the list of AuxReaders for each field in a different way. It is actually much easier to let users produce the list of AuxReaders themselves.


From a more general standpoint, I am not sure the Umbrella class has anything specific to umbrella sampling. At the end of the day, it is a generic container for dealing with a collection of trajectories. I expect that container to give me access to the trajectories at least like a list, but also to let me access properties of these trajectories in a uniform way, and to tell me what I can expect to have access to in that uniform way.

For instance, I expect the collection of trajectories to be able to tell me that all the trajectories have a pull_x and a pull_f auxiliary field, and that they all have an umbrella_distance metadata field attached to them.

This indeed looks a lot like what MDSynthesis provides. In the future, it would be great to be able to build a collection from any level of organisation provided by datreant.

As I see it, the class needs at least the following methods, or equivalent ones:

  • __init__ of course
  • __len__ gives the number of trajectories
  • __getitem__ by name, by index, and by slices
  • __add__ to combine collections (assuming they are compatible)
  • is_compatible to know if another collection is compatible and can be concatenated
  • append to add a new trajectory at the end
  • insert to add a trajectory at a given position (for instance if I simulated new windows to fill a gap in the sampling)
  • insert_multiple because I probably simulated more than one window to fill the gap
  • aux_fields to list the auxiliary fields that are common to all the trajectories
  • data_fields to list the metadata that are common to all the trajectories
  • get_data(trajectory_key, field_name) get the value corresponding to the given metadata field for the given trajectory
  • set_data(trajectory_key, field_name) set the value
  • get_data_serie(field_name) get the values for the corresponding metadata field as a list in the same order as the trajectory so I can plot it
  • set_data_serie

No need for __iter__ or __next__: having __len__ and __getitem__ is enough for the object to behave as an iterable. The name Umbrella may not be appropriate if the scope of the class is more general.
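A minimal sketch covering part of the method list above (access by name, index and slice, append, insert, data_fields, get_data_serie). The class name UniverseCollection and the constructor signature are assumptions, and the universes can be any objects:

```python
class UniverseCollection:
    """Hypothetical ordered, named container of universes plus metadata."""

    def __init__(self, names=(), universes=(), metadata=()):
        self._names = list(names)
        self._universes = list(universes)
        self._metadata = ([dict(m) for m in metadata]
                          or [{} for _ in self._names])
        sizes = {len(self._names), len(self._universes), len(self._metadata)}
        if len(sizes) != 1:
            raise ValueError("names, universes and metadata differ in length")

    def __len__(self):
        return len(self._universes)

    def __getitem__(self, key):
        if isinstance(key, str):  # access by window name
            try:
                return self._universes[self._names.index(key)]
            except ValueError:
                raise KeyError(key) from None
        if isinstance(key, slice):  # slicing returns a new collection
            return UniverseCollection(
                self._names[key], self._universes[key], self._metadata[key]
            )
        # Integer access; the IndexError raised here is what ends iteration.
        return self._universes[key]

    def insert(self, index, name, universe, **metadata):
        self._names.insert(index, name)
        self._universes.insert(index, universe)
        self._metadata.insert(index, dict(metadata))

    def append(self, name, universe, **metadata):
        self.insert(len(self), name, universe, **metadata)

    def data_fields(self):
        """Metadata fields common to all members."""
        if not self._metadata:
            return set()
        return set.intersection(*(set(m) for m in self._metadata))

    def get_data_serie(self, field_name):
        """Values of one metadata field, in collection order."""
        return [m[field_name] for m in self._metadata]
```

Iteration works through the legacy sequence protocol: Python calls __getitem__ with 0, 1, 2, ... until IndexError is raised.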

All the magic happens in a CollectionAnalysisBase class that will use that collection of trajectories. The Wham class in #842 could inherit from that class.

@jbarnoud
Contributor

jbarnoud commented Jun 24, 2016

In addition to my (way too) long post from yesterday, here are some random ideas. Maybe some of them will stick.

I suggested yesterday that the metadata could be saved in the Umbrella object, and that it could be accessible through the get_data and set_data methods.

Alternatively, they could be saved in the Universe object directly:

us = Umbrella([...])
us[0].data['umbrella_distance']

We may not want to save too many things inside the universes. Instead, we could wrap each universe in an object, let's call it Sim, and have that wrapper hold the metadata in a data attribute. The wrapper then really looks like a Sim object from MDSynthesis. We could copy the relevant part of its API, so that MDSynthesis.Sim and MDAnalysis.Sim could be used interchangeably in the container. That could be too many layers, but it would let us have cool stuff in the future like preset selections.

@fiona-naughton
Contributor Author

Thanks! Something more general like that sounds much better. I'll start putting something together and make another WIP pull request.

So the idea would be to start off independent of MDSynthesis, making an MDAnalysis.Sim to pair trajectories/auxiliaries/metadata, and the Umbrella object (perhaps renamed to something like UniverseCollection?) storing a set of these and aux/metadata names (set up either from a list of MDAnalysis.Sim and required aux/metadata, or passing in the lists of trajectories, auxiliaries and metadata properties to make the Sims); then later, also allowing a list of MDSynthesis.Sim to be provided instead (and creating an MDSynthesis.Sim from a MDAnalysis.Sim)?

@kain88-de
Member

I suggest you first play around with the Sims on your own to get to know the MDSynthesis API, and read the excellent scipy paper from @dotsdl.

Since neither of you knows exactly what the API should look like, I would suggest starting from the Sim objects and the 'Bundles' that MDSynthesis provides (it already supports metadata, for example, and a good range of queries). You can keep an additional notebook in a gist or separate repository where you can play around with the API and do quick iterations.

This workflow would aim at understanding the Sim objects so that you better know what to copy and allow for quick iteration on the API to see what works for you and what doesn't.

A side note: for the __getitem__ behavior that @jbarnoud prefers, you can still use the Namespace class, at least for the names. Using super, we can rely on Python's inheritance and only run a different __getitem__ when an integer or slice is given.

class Collection(Namespace):
    def __getitem__(self, item):
        if isinstance(item, str):
            # Delegate name lookups to __getitem__ of the parent class
            # (here `Namespace`).
            return super(Collection, self).__getitem__(item)
        else:
            # Work on the integer or slice (not implemented in this sketch).
            raise NotImplementedError
Btw, the pure integer access that @jbarnoud suggests requires some kind of natural ordering for the simulations in a Collection/Bundle. This can be defined for umbrella sampling and replica exchange simulations, but in general there is almost no way for us to guess the right order intelligently. You have to provide some function/kwarg that defines the ordering of the universes in the Collection/Bundle.

@jbarnoud
Contributor

@fiona-naughton --- I do not know how practical my suggestions are. So far, I took the problems I saw one at a time and suggested a solution that could fit. There are things I did not consider yet and that may be worth discussing before settling on an API.

For instance, how practical will each option be for parallelisation? Indeed, if we have a collection of trajectories for analyses to use, we want that collection to be parallelisation-friendly.

  • Having the data in the collection might be the worst. We would have to give each process a universe and the metadata associated with it.
  • Having the data in the universe makes big fat universes. Earlier, we mentioned that it would be nice to have small disposable universes.
  • Having a Sim wrapper seems a good compromise, but it adds an additional layer.

I am counting pluses and minuses for each option, but I would be happy with ideas for others. Ping @MDAnalysis/coredevs ?

As @kain88-de suggested, it would be best to first play a bit with the MDSynthesis Sim to see if it is what we need here. Maybe @dotsdl could comment on the idea?

Anyway, you can set up the container already. It may help us figure out the drawbacks of the different options. Maybe you could write a jupyter notebook with prototypes? If the prototypes (even very rough ones) come in soon enough, we should be able to discuss them, build on top of them, and settle on one of them by the end of next week. Once a prototype is chosen, there will only be some cleaning to do to get a proper PR.

@kain88-de --- So far, we always assumed that the collection would be fed a list of universes or paths to trajectories. Since these lists are ordered, we just have to use the same order as the input. Being able to sort a collection according to a given criterion would be great, though!

@jbarnoud
Contributor

I read too fast. I basically said the same as @kain88-de, but with many more words.

@kain88-de
Member

@kain88-de --- So far, we always assumed that the collection would be fed a list of universes or paths to trajectories. Since these lists are ordered, we just have to use the same order as the input. Being able to sort a collection according to a given criterion would be great, though!

You underestimate my stupidity ;-). I would glob for folders where no sorting is guaranteed

from glob import glob
import MDAnalysis as mda

col = mda.Bundle(glob('parameter-study-val_*/sim.xtc'))

I expect something like this to work when I can supply lists. But the ordering isn't specified here.

@jbarnoud
Contributor

Hopefully, your AuxReader and other properties will have the same order, or you may have issues with the current proposal (hey! one more issue I did not think about, thanks!).

But most importantly, if you do not care about the order, then it is not an issue that the ordering is not specified. What matters, however, is that the order remains constant in the lifetime of the Bundle (or UniverseCollection).

@fiona-naughton
Contributor Author

Sorry, I'm confused about exactly what you think I should be doing - playing around with MDSynthesis is definitely a good idea, but is that just to see how it works as an example, or should I start setting up stuff assuming it's using MDSynthesis (at least at first)? And I'm not sure what you mean for different prototypes - do you mean having an example class(es) defined for each case of where we could store the metadata? What exactly should the prototypes do that would allow us to compare between them, or is it more a matter of finding out how easy/hard each is to set up + get working?

I'm also having a bit of trouble installing MDSynthesis because it's complaining about not finding HDF5 - I think it's already installed, but no idea how to figure out where so I can pass that in?

Re: using glob, I was assuming it'd be passed like sorted(glob('*.xtc')), though there might still be the problem of making sure the auxiliaries have the same order depending on the naming scheme - something to think about...
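One way around the ordering worry is to pair files by a key extracted from each path rather than by list position. A sketch under assumptions: the directory naming follows the parameter-study-val_* example earlier in the thread, while the pullx.xvg file name, the regular expression, and both function names are made up for illustration.

```python
import re


def window_key(path, pattern=r"val_([0-9.]+)"):
    """Extract a numeric window key from a path such as '...val_2.5/sim.xtc'."""
    match = re.search(pattern, path)
    if match is None:
        raise ValueError("no window key found in %r" % path)
    return float(match.group(1))


def pair_by_key(trajectories, auxiliaries, key=window_key):
    """Return (key, trajectory, auxiliary) tuples sorted by numeric key."""
    traj = {key(p): p for p in trajectories}
    aux = {key(p): p for p in auxiliaries}
    if set(traj) != set(aux):
        raise ValueError("trajectory and auxiliary files do not match")
    return [(k, traj[k], aux[k]) for k in sorted(traj)]
```

Sorting on the numeric key also avoids the lexical trap where a plain sorted(glob(...)) would put val_10 before val_2.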

@jbarnoud
Contributor

We have 3 scenarios based on my earlier comment (#843 (comment)). The base is common to the 3 of them: a list/dict-like container with some method to query which metadata/aux fields are common to all the trajectories that it contains. The differences are:

  • One stores the metadata in the container
  • One stores the metadata in the universe
  • One stores the metadata in a wrapper class around the universe; that wrapper class can be a Sim from MDSynthesis, or a custom class that mimics the relevant MDSynthesis APIs

I would like you to roughly implement the 3 scenarios in a jupyter notebook so we can see which one is the most practical, and identify hidden traps along the way.

I cannot help with installing MDSynthesis right now, especially without more details. Hopefully, asking on the MDSynthesis channels (mailing list? issue tracker?) will help. In the meantime, you can write a class that mimics the relevant parts of the Sim class. If we go for the wrapper option, we may need that class anyway as we probably do not want MDSynthesis as a hard dependency.

Writing the prototype should not take very long. Ideally, the notebook would demonstrate how to access the universes and the metadata in the 3 scenarios.
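To make the comparison concrete, here is a rough sketch of how metadata access might look in each of the three scenarios. All class and function names are hypothetical, and FakeUniverse is only a stand-in so the example runs without MDAnalysis:

```python
class FakeUniverse:
    """Stand-in for an MDAnalysis Universe so the sketch runs on its own."""

    def __init__(self, label):
        self.label = label


# Scenario 1: the container holds the metadata.
class ContainerWithData:
    def __init__(self, universes, data):
        self.universes = list(universes)
        self._data = [dict(d) for d in data]

    def get_data(self, index, field_name):
        return self._data[index][field_name]


# Scenario 2: each universe carries its own metadata.
def attach_data(universe, **data):
    universe.data = dict(data)
    return universe


# Scenario 3: a thin wrapper pairs a universe with its metadata.
class SimWrapper:
    def __init__(self, universe, **data):
        self.universe = universe
        self.data = dict(data)


u = FakeUniverse("window 2.5")
container = ContainerWithData([u], [{"umbrella_distance": 2.5}])
from_container = container.get_data(0, "umbrella_distance")

u2 = attach_data(FakeUniverse("window 2.5"), umbrella_distance=2.5)
from_universe = u2.data["umbrella_distance"]

sim = SimWrapper(FakeUniverse("window 2.5"), umbrella_distance=2.5)
from_wrapper = sim.data["umbrella_distance"]
```

In scenario 1 a worker process needs both the universe and a slice of the container's data; in scenarios 2 and 3 a single object carries everything, at the cost of a heavier universe or an extra layer.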

@kain88-de
Member

About mdsynthesis: please write what you did and what problems you hit to the datreant mailing list, and we can have a look there. Alternatively, use conda; everything works there.

@fiona-naughton
Contributor Author

Ok, thanks. I'll start on that.

(I dug through my anaconda installation to find hdf5 + manually specified the location; MDSynthesis seems to have installed properly now)

@kain88-de
Member

Please still write to the mailing list. We provide conda packages, so it shouldn't have been a problem to install.

@dotsdl
Member

dotsdl commented Jun 24, 2016

@kain88-de this is an aside, but we never merged instructions for installing mdsynthesis as a conda package. Do you mind writing a section on how to do this for the install docs? I think it really just boils down to conda install -c datreant mdsynthesis, but there might be caveats that you know better than I.

@jbarnoud
Contributor

I played with MDSynthesis and went through the scipy paper. A Bundle seems to cover everything I described above except the handling of AuxReaders (which makes sense, they are brand new). Whether or not we end up using it, I will have a very serious look at MDSynthesis for my own simulations.

@fiona-naughton
Contributor Author

Yes, it certainly looks very useful!
I was considering whether it's worth adding an auxiliary option to the trajectory Reader, so you could do e.g. u = Universe(topol, traj, aux={'auxname': [auxdata, kwargs], ...}); that would let us deal with auxiliaries in a mdsynthesis.Sim through the universedef, but I'm not sure if that's unnecessary clutter?

@jbarnoud
Contributor

I was considering whether it's worth adding an auxiliary option to the trajectory Reader, so you could do e.g. u = Universe(topol, traj, aux={'auxname': [auxdata, kwargs], ...}); that would let us deal with auxiliaries in a mdsynthesis.Sim through the universedef, but I'm not sure if that's unnecessary clutter?

It looks interesting, indeed. That is definitely something to consider! I would wait until we are sure that we need it before adding it, though. Adding things to Universe.__init__ is becoming a delicate matter, as there are already so many things there.

@dotsdl
Member

dotsdl commented Jun 27, 2016

@fiona-naughton one goal of MDSynthesis is to completely persist the required information to regenerate the state of a Universe, so happy to add mechanisms for storing information needed for auxiliaries attached to the Universe.

I'd focus on making auxiliaries work within a Universe standalone, but make it so that the information needed to initialize a Universe with those auxiliaries is simple and accessible. This is similar to the reason for making a Universe hold onto its init kwargs that we did earlier this year.

@dotsdl
Member

dotsdl commented Jun 27, 2016

@jbarnoud, @fiona-naughton I think it would be a lot of work to mimic the API of an mdsynthesis.Sim because a lot of their behavior is defined in special methods that make them pretty Pythonic, and not a whole lot would be gained from re-engineering it. Something we might consider to make MDSynthesis a lighter dependency more generally would be to remove datreant.data as a requirement (does this make it less useful @kain88-de?); that would remove the need to install, e.g. pandas, h5py, pytables, HDF5 libs, etc., unless one wanted to use the Sim.data interface and friends.

I am of the (biased) opinion that MDSynthesis is probably one of the best answers to any problem pulling together data from multiple simulations. I'm happy to help with any effort that makes use of it toward something useful, and I think something as well-defined and focused as getting free energy differences from umbrella-sampling simulations should be a lot easier to pull together with it than without.

@jbarnoud
Contributor

I'd focus on making auxiliaries work within a Universe standalone, but make it so that the information needed to initialize a Universe with those auxiliaries is simple and accessible.

@fiona-naughton's suggestion would do just that 👍

This is similar to the reason for making a Universe hold onto its init kwargs that we did earlier this year.

Should the aux readers added with u.trajectory.add_aux add themselves to the universe kwargs (which would be very hacky, since universe.kwargs is read-only)? I am not a fan of it, but currently there is no way to recreate a universe with the same auxiliary fields. This can be an issue later on if we spawn short-lived readers.

Something we might consider to make MDSynthesis a lighter dependency more generally would be to remove datreant.data as a requirement.

datreant.core does not look too scary as a dependency. My guess is that if a user wants to do funky stuff, they just need to install datreant.data. But before we add a dependency to MDAnalysis, it would be best to have some consensus on the matter, wouldn't it?

@dotsdl
Copy link
Member

dotsdl commented Jun 28, 2016

Should the aux readers added with u.trajectory.add_aux add themselves to the universe kwargs (which would be very hacky, since universe.kwargs is read-only)?

Not at all. Once auxiliaries are a thing in MDAnalysis, we can add in a mechanism for extracting this information from a Universe in MDSynthesis. It should not be added to kwargs since these aren't kwargs. I was just commenting that making Universes hold onto this information in a meaningful way (as we did with its kwargs) is necessary to make it possible to recreate them within a Sim.

datreant.core does not look too scary as a dependency. My guess is that if a user wants to do funky stuff, they just need to install datreant.data.

At the moment MDSynthesis has both datreant.core and datreant.data as dependencies. This is out of convenience, since the idea is that MDSynthesis is less a "developer's package" and more a user's package. Basically, it comes "batteries included." What I'm suggesting is that we could drop datreant.data as a dependency for MDSynthesis and there would be no need for HDF5 at all. Installing MDSynthesis would pretty much be pure Python.

But before we add a dependency to MDAnalysis, it would be best to have some consensus on the matter, wouldn't it?

Analysis modules are allowed to have any dependencies they want. So, if MDSynthesis is particularly useful here, no worries. My comment above was just a thought since a barrier to using it might be the HDF5 stuff, which is included as a convenience.

@fiona-naughton
Copy link
Contributor Author

I put up a gist over here with the Jupyter notebook I've been testing the different data-storing-options in. Being able to use Bundles definitely simplified several things, though the others seem to be working more or less the same for the stuff I've implemented at the moment. If nothing else, being able to read in Sims with various metadata already added would be very convenient (I guess so long as your naming is consistent). If we can have MDSynthesis without datreant.data as a requirement, that indeed sounds nice.

A thought on reloading Universes with Auxiliaries: when the auxiliary data is in a file it looks like it'd be relatively straightforward, but if we allow auxiliary data to be input as arrays etc., we'd need to save that out somewhere, so we'd need a format for that (which is where datreant.data would be useful...)

@jbarnoud
Copy link
Contributor

Good work with the prototypes @fiona-naughton. It seems to be a lot of knitting to have everything in sync, a knitting that is already quite well done by MDSynthesis.

One issue that I see is that you use sim.categories to store data. This is understandable, as we suggested making datreant.data optional. But it misuses datreant's semantics, and it would not fit regular mdsynthesis use. Not using mdsynthesis the way a user would expect makes it less attractive to use mdsynthesis at all.

Would it be possible to have a degraded version of datreant.data that only deals with pure Python values, and perhaps numpy arrays? This degraded Limb would be superseded by the proper datreant.data if installed. This way, mdsynthesis's Sims and datreant's Bundles can already be used for the simplest cases; if the user's use cases become more complex, the user can move to the proper datreant.data seamlessly.

What do we need to use MDSynthesis? From what I see, we need a datreant Limb to describe the aux readers, and we need Universes to be able to list their attached auxiliary fields in a way that allows them to be regenerated. Am I missing anything?

@fiona-naughton
Copy link
Contributor Author

@jbarnoud I was assuming the data will mostly be metadata, like 'restraint constant' or 'restrained value', which I understood as the sort of thing Sim.categories was intended for? I guess in the sense that we're not likely to be actually categorising using these categories, only getting individual values to calculate with, I can see what you mean; but it seems excessive to use datreant.data for single values. For data that's not a single value like that though, yes, categories is probably not ideal.

@kain88-de
Copy link
Member

For future reference, I would like to use boolean expressions for selecting universes from a bundle, like us = bundle.get('order-Param < 0.5') to get a bundle with all simulations where the order parameter is smaller than 0.5. This will likely be too complicated for the first iteration of the Bundle, but it would be nice if you kept things like that in mind.
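To make the idea concrete, here is a rough sketch of what such a selection could look like. It does not use the real Bundle API: the list of (name, categories) pairs, the select function and the tiny expression grammar are all made up for illustration.

```python
import operator
import re

# Hypothetical stand-in for a Bundle: each entry is (name, categories dict).
windows = [
    ("window_00", {"order-Param": 0.2}),
    ("window_01", {"order-Param": 0.5}),
    ("window_02", {"order-Param": 0.8}),
]

_OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}

# Two-character operators come first in the alternation so "<=" wins over "<".
_PATTERN = re.compile(
    r"^\s*(?P<key>.+?)\s*(?P<op><=|>=|==|<|>)\s*(?P<value>\S+)\s*$")


def select(entries, expression):
    """Return the entries whose category satisfies e.g. 'order-Param < 0.5'."""
    match = _PATTERN.match(expression)
    if match is None:
        raise ValueError("cannot parse expression: %r" % expression)
    key = match.group("key")
    compare = _OPS[match.group("op")]
    threshold = float(match.group("value"))
    # Entries lacking the category are skipped rather than raising.
    return [(name, cats) for name, cats in entries
            if key in cats and compare(cats[key], threshold)]


selected = select(windows, "order-Param < 0.5")
```

Only a single numeric comparison is handled here; combining conditions with and/or would need a proper parser.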

@dotsdl
Copy link
Member

dotsdl commented Jun 29, 2016

@kain88-de perhaps that should be reserved for discussion in datreant proper, along the same lines as e.g. datreant/datreant#65 , but for categories. Probably overkill to include that here.

@dotsdl
Copy link
Member

dotsdl commented Jun 29, 2016

@jbarnoud, @fiona-naughton this is actually a perfectly acceptable use of categories. They are simply key-value pairs, and I use them all the time for storing parameters like these for easy grouping and filtering later.

The Data limb provided by datreant.data can store single values (it will pickle them), but that's more a convenience than anything. It will even be slower (probably) than accessing categories, since unpickling requires more time than deserializing the json state file.
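As a rough illustration of categories as plain key-value pairs kept in a JSON state, here is a sketch using only the standard library. The window names, category keys and values are invented for the example, and nothing here is the datreant API.

```python
import json
from collections import defaultdict

# Hypothetical per-window categories (names and values are illustrative only).
state = {
    "window_00": {"restraint_constant": 1000.0, "restrained_value": 0.0,
                  "temperature": 300},
    "window_01": {"restraint_constant": 1000.0, "restrained_value": 0.5,
                  "temperature": 300},
    "window_02": {"restraint_constant": 1000.0, "restrained_value": 1.0,
                  "temperature": 310},
}

# Round trip through JSON text, as a state file would.
restored = json.loads(json.dumps(state))

# Group window names by one category value.
groups = defaultdict(list)
for name, categories in sorted(restored.items()):
    groups[categories["temperature"]].append(name)

# Filter on another category.
upper_windows = sorted(name for name, categories in restored.items()
                       if categories["restrained_value"] >= 0.5)
```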

@jbarnoud
Copy link
Contributor

@dotsdl Great then. No need for complicated shenanigans.

@fiona-naughton How can a Universe list its auxiliaries? Maybe the AuxReader should remember its kwargs the same way Universe does? Once we have that, we can consider adding a Limb to the Sim for the auxiliary fields and hope @dotsdl and @kain88-de will accept it.

With Sim handling the auxiliary fields, a Bundle covers everything we expected from a collection of Universes and we can move on to the next step.

@fiona-naughton
Copy link
Contributor Author

Getting the kwargs should be straightforward, either storing them or noting down the current values of the appropriate attributes (the latter might work better given say the represent_ts_as and cutoff options can currently be changed later, though this also means they're not 100% necessary to replicate exactly).

Again, for the data itself - if it's a file we can just keep the filename/path (which XVGReader currently doesn't do, but I can add it); if in the future we add AuxReaders for say ndarrays, we'd need to save the data somewhere, but we can leave that problem for later...
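A quick sketch of the difference between the two options, using a toy class rather than the actual AuxReader (the class, its method names and the default values are hypothetical):

```python
class ToyAuxReader:
    """Hypothetical stand-in for an auxiliary reader (not the MDAnalysis class)."""

    def __init__(self, filename, represent_ts_as="closest", cutoff=None):
        self.filename = filename
        self.represent_ts_as = represent_ts_as
        self.cutoff = cutoff
        # Option 1: remember exactly what was passed at construction time.
        self._init_kwargs = {"represent_ts_as": represent_ts_as,
                             "cutoff": cutoff}

    def description_from_init(self):
        return dict(self._init_kwargs, filename=self.filename)

    def description_from_attributes(self):
        # Option 2: read the current attribute values instead.
        return {"filename": self.filename,
                "represent_ts_as": self.represent_ts_as,
                "cutoff": self.cutoff}


reader = ToyAuxReader("window_00_pullf.xvg")
reader.cutoff = 0.5  # option changed after construction

stale = reader.description_from_init()
current = reader.description_from_attributes()

# Only the attribute-based description reproduces the reader as it is now.
replica = ToyAuxReader(**current)
```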

@jbarnoud
Copy link
Contributor

On 30/06/16 14:23, Fiona Naughton wrote:

Getting the kwargs should be straightforward, either storing them or noting down the current values of the appropriate attributes (the latter might work better given say the represent_ts_as and cutoff options can currently be changed later, though this also means they're not 100% necessary to replicate exactly).

Again, for the data itself - if it's a file we can just keep the filename/path (which XVGReader currently doesn't do, but I can add it); if in the future we add AuxReaders for say ndarrays, we'd need to save the data somewhere, but we can leave that problem for later...


Let's get something that works. We can always improve it later.

@fiona-naughton
Copy link
Contributor Author

I've added a get_description method to AuxReader and get_aux_descriptions to the trajectory reader in #868; they currently return dictionaries with the required kwargs etc for replicating an AuxReader, which can be passed straight to add_auxiliary to replicate the auxiliary in another trajectory. May not be the best way to go about it (and it'll presumably work a bit differently with MDSynthesis), but it seems to be working!
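Roughly, the round trip looks like the sketch below. These are toy classes written only to show the pattern; the method names mirror the ones mentioned above, but the dictionary keys and signatures are assumptions and may not match what is in #868.

```python
class ToyAux:
    """Hypothetical auxiliary holding just enough state to be described."""

    def __init__(self, auxname, auxdata, represent_ts_as="closest", cutoff=None):
        self.auxname = auxname
        self.auxdata = auxdata
        self.represent_ts_as = represent_ts_as
        self.cutoff = cutoff

    def get_description(self):
        # Everything needed to build an equivalent auxiliary elsewhere.
        return {"auxname": self.auxname, "auxdata": self.auxdata,
                "represent_ts_as": self.represent_ts_as,
                "cutoff": self.cutoff}


class ToyTrajectory:
    """Hypothetical trajectory reader that keeps its auxiliaries by name."""

    def __init__(self):
        self._auxs = {}

    def add_auxiliary(self, auxname, auxdata, **kwargs):
        self._auxs[auxname] = ToyAux(auxname, auxdata, **kwargs)

    def get_aux_descriptions(self):
        return [aux.get_description() for aux in self._auxs.values()]


source = ToyTrajectory()
source.add_auxiliary("pull_force", "window_00_pullf.xvg", cutoff=0.5)

# Replicate every auxiliary of `source` on a second trajectory.
copy = ToyTrajectory()
for description in source.get_aux_descriptions():
    copy.add_auxiliary(**description)
```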

@jbarnoud
Copy link
Contributor

jbarnoud commented Jul 1, 2016

Good! Why would it work differently with MDSynthesis? What would be different?

@fiona-naughton
Copy link
Contributor Author

Not really different - I guess I was just thinking we might need to add something to get from how we'd store the kwargs etc in an auxiliary Limb to one big dictionary for add_auxiliary (and vice versa), but it should be straightforward enough.
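Something along these lines, perhaps. The nested layout (one entry per auxiliary name, with the data path and the remaining kwargs kept apart) is only a guess at how a Limb might store things, and the auxname/auxdata keys are assumed.

```python
def to_stored(descriptions):
    """Flat add_auxiliary-style dicts -> nested mapping keyed by auxiliary name."""
    stored = {}
    for description in descriptions:
        kwargs = dict(description)  # copy so the input is not mutated
        auxname = kwargs.pop("auxname")
        auxdata = kwargs.pop("auxdata")
        stored[auxname] = {"auxdata": auxdata, "kwargs": kwargs}
    return stored


def to_descriptions(stored):
    """Nested mapping -> list of flat dicts ready to be passed as **kwargs."""
    return [dict(entry["kwargs"], auxname=auxname, auxdata=entry["auxdata"])
            for auxname, entry in stored.items()]


flat = [
    {"auxname": "pull_force", "auxdata": "window_00_pullf.xvg", "cutoff": 0.5},
    {"auxname": "pull_x", "auxdata": "window_00_pullx.xvg"},
]
nested = to_stored(flat)
round_tripped = to_descriptions(nested)
```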

@fiona-naughton
Copy link
Contributor Author

Having disappeared for a while, I'm finally back working on this >.<

I've gone ahead using MDSynthesis, since it looks like the best option. @jbarnoud pointed out that we might as well just use the Bundles directly, so I've put up a WIP-PR #900 with a function that'll return a bundle given a set of Sims/Universes/trajectories (and add corresponding sets of auxiliary/meta data); as far as I can tell, all the things we wanted can be done directly with the returned Bundle.
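In spirit, the function does something like the sketch below. This is not the code in #900: the Window class, make_bundle and the plain list standing in for a Bundle are all hypothetical, and file names and metadata keys are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Window:
    """Hypothetical record for one umbrella window."""
    trajectory: str
    auxiliaries: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def make_bundle(trajectories, auxiliaries=None, metadata=None):
    """Pair each trajectory with its auxiliary files and metadata."""
    n = len(trajectories)
    if auxiliaries is None:
        auxiliaries = [{} for _ in range(n)]
    if metadata is None:
        metadata = [{} for _ in range(n)]
    if len(auxiliaries) != n or len(metadata) != n:
        raise ValueError("need one auxiliary and one metadata entry per trajectory")
    # Copy the dicts so later edits to a window do not leak back to the caller.
    return [Window(traj, dict(aux), dict(meta))
            for traj, aux, meta in zip(trajectories, auxiliaries, metadata)]


bundle = make_bundle(
    ["window_00.xtc", "window_01.xtc"],
    auxiliaries=[{"pull_force": "window_00_pullf.xvg"},
                 {"pull_force": "window_01_pullf.xvg"}],
    metadata=[{"restraint_constant": 1000.0, "restrained_value": 0.0},
              {"restraint_constant": 1000.0, "restrained_value": 0.5}],
)

restrained_values = [window.metadata["restrained_value"] for window in bundle]
```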

So the next step would be to add auxiliaries to MDSynthesis, which it looks like we could do as a Limb; @dotsdl, any suggestions on how best to proceed?

@dotsdl
Copy link
Member

dotsdl commented Jul 18, 2016

@fiona-naughton I'll have a look at the auxiliaries API and get a sense for how best to persist the information in a Sim. We'll probably make it part of the Universe definition, and make it available from the existing universedef Limb.

I'm happy to take that on from the side of MDSynthesis; I don't think it'll be a complicated addition, so I'm in a good position to get it there quickly.

@orbeckst
Copy link
Member

See comments on #900 and #923: this is better implemented as a separate small package.

Many thanks to @fiona-naughton who gave us the infrastructure (aux) to make this (and other cool things...) possible!

@HafizSaqib
Copy link

Dear All,
Can anybody advise me on how to select the restraint values for umbrella sampling?
I want to find a simple reaction mechanism such as
CH3-Br + OH- ---------> CH3OH + Br-
I have used a set of windows; an example of one window file is below. But that's not working well. I also tried changing the r1, r2, r3 and r4 values, but the results are not good.
Could you please advise me on how to select good restraint values?

reaction coordinate d(C1-Br2) - d(O6-C1)

&rst
iat=1,2,6,1, rstwt=1,-1,,
r1=-10.00,r2=0.00,r3=0.00,r4=10.00,
rk2=100,rk3=100,
/
&rst
iat=1,2,6,1, rstwt=1,-1,
rk2=100,rk3=100,
/

Thanks
Hafiz

@orbeckst
Copy link
Member

@HafizSaqib, you had better ask such a question on the mailing list of your MD code (e.g. AMBER http://archive.ambermd.org/ ). The issue tracker is for discussing bugs and enhancements for the MDAnalysis library.
