
Switch to YAML format for GromacsWrapper configs #49

Closed
dotsdl opened this issue Apr 8, 2016 · 12 comments

Comments

@dotsdl
Collaborator

dotsdl commented Apr 8, 2016

To make it possible to support multiple versions of Gromacs installed on a single machine, and to make it easier to support custom user environments, we should switch from an INI-style config to a YAML configuration.

A portion of this config would go a long way towards solving #48 and #26. For example, the schema for the tools available from various versions of Gromacs could look like:

versions:
    4.6.5:
        serial:
            base: ""
            names:
                - grompp
                - trjconv
                - g_rms
        mpi:
            base: ""
            names:
                - mdrun_mpi

    5.1.1:
        serial:
            base: "gmx"
            names:
                - grompp
                - trjconv
                - rms
                - mdrun
        mpi:
            base: "gmx_mpi"
            names:
                - mdrun
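As a minimal sketch of how such a schema might be consumed, the following assumes the YAML has already been parsed (for example with PyYAML's `yaml.safe_load`) into nested dicts; the dict literal stands in for that parse result, and `command_lines` is a hypothetical helper, not part of GromacsWrapper.

```python
# The schema above, in the shape yaml.safe_load() would return it
# (written as a literal so the sketch needs no third-party package).
CONFIG = {
    "versions": {
        "4.6.5": {
            "serial": {"base": "", "names": ["grompp", "trjconv", "g_rms"]},
            "mpi": {"base": "", "names": ["mdrun_mpi"]},
        },
        "5.1.1": {
            "serial": {"base": "gmx", "names": ["grompp", "trjconv", "rms", "mdrun"]},
            "mpi": {"base": "gmx_mpi", "names": ["mdrun"]},
        },
    }
}


def command_lines(config, version, flavor):
    """Return {tool name: argv prefix} for one version/flavor.

    An empty base means each tool is its own executable (4.x style);
    a non-empty base is a driver such as "gmx" (5.x style).
    """
    entry = config["versions"][version][flavor]
    base = entry["base"]
    return {name: ([base, name] if base else [name]) for name in entry["names"]}


print(command_lines(CONFIG, "5.1.1", "mpi"))  # {'mdrun': ['gmx_mpi', 'mdrun']}
```

Returning argv lists rather than joined strings keeps each entry ready to hand to `subprocess` without shell quoting.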

We would then probably require specifying which version one plans to use after import, so one could do:

import gromacs
gromacs.listconfig()         # could use interactively to list config keys available
gromacs.useconfig('4.6.5')   # this will build the classes from the tools found
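A rough sketch of the deferred-loading idea, assuming a module-level registry. `listconfig`/`useconfig` are the hypothetical names proposed above, the registry contents are illustrative, and "building a class" is reduced here to creating a thin wrapper per tool with `type()`. No executable runs at import or in `useconfig`; a command only runs when a tool instance is called.

```python
import subprocess

# Hypothetical registry: config key -> {tool name: argv prefix}.
_CONFIGS = {
    "4.6.5": {"grompp": ["grompp"], "trjconv": ["trjconv"]},
    "5.1.1": {"grompp": ["gmx", "grompp"], "trjconv": ["gmx", "trjconv"]},
}
tools = {}  # populated only by useconfig(); importing the module stays cheap


def listconfig():
    """Return the configuration keys that can be passed to useconfig()."""
    return sorted(_CONFIGS)


def _make_tool(name, argv):
    """Build a thin class wrapping one Gromacs command (nothing is executed here)."""
    def __call__(self, *args):
        # The command runs only when an instance of the tool class is called.
        return subprocess.call(argv + list(args))
    return type(name.capitalize(), (object,), {"command": argv, "__call__": __call__})


def useconfig(key):
    """Replace the current tool set with the one configured under *key*."""
    tools.clear()
    for name, argv in _CONFIGS[key].items():
        tools[name] = _make_tool(name, argv)
    return sorted(tools)
```

With this layout `useconfig('5.1.1')` makes `tools['grompp']` a class whose instances run `gmx grompp ...` when called.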

This does not solve the problem where the same name is present multiple times within the same config, such as for mdrun in 5.1.1 above. At present the same class names are built regardless of the version used, so this would result in an exception.

Unless we build class names with clear namespace differences (such as using the base given in the config), we can't get around this problem. Although possible, having a library that changes its API depending on the config is probably not the best of ideas.

@dotsdl dotsdl self-assigned this Apr 8, 2016
@dotsdl
Collaborator Author

dotsdl commented Apr 8, 2016

I may poke at this this weekend, since I'm gearing up to use GromacsWrapper extensively again.

@orbeckst
Member

orbeckst commented Apr 8, 2016

As we just discussed, this is definitely a step in the right direction, and I'd very much like to see something like this. It's going to be a bit of a hassle to convert existing cfg files, but a user only has to do this once (and maybe we can even cook up a script to do this when the time comes).

For name collisions I would just raise an error and have the user make the names unique within one version. (We can later think of other ways to make it work, e.g. allowing classes to carry "tags" around so that you can provide an additional kwarg to select the version of mdrun that you configured. There's still the problem you already mentioned, that the API is suddenly defined in the configuration file, but on the other hand this is already true because the pre- and postfixes of Gromacs commands are fully user-configurable...)
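Raising an error on collisions, as suggested above, could look roughly like this. The input is one version's flavor entries in the shape of the YAML schema from the first comment; the naming rule (capitalize the tool name, ignoring the flavor) is an assumption mirroring the current behaviour where the same class names are built regardless of version. `ToolNameCollision` and `check_unique_names` are hypothetical names.

```python
from collections import defaultdict


class ToolNameCollision(ValueError):
    """Raised when two flavors of one version would yield the same class name."""


def check_unique_names(version_entry):
    """Return {class name: flavor}; raise if any class name would repeat.

    *version_entry* is e.g. {"serial": {"base": "gmx", "names": [...]}, ...}.
    """
    seen = defaultdict(list)
    for flavor, entry in version_entry.items():
        for name in entry["names"]:
            seen[name.capitalize()].append(flavor)
    clashes = {cls: flavors for cls, flavors in seen.items() if len(flavors) > 1}
    if clashes:
        details = ", ".join(f"{cls} ({'/'.join(fl)})" for cls, fl in sorted(clashes.items()))
        raise ToolNameCollision(f"duplicate tool class names: {details}")
    return {cls: flavors[0] for cls, flavors in seen.items()}
```

For the 5.1.1 entry above, `mdrun` appears under both `serial` and `mpi`, so the check would fail with a message naming `Mdrun (serial/mpi)` and leave it to the user to rename one of them.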

I really like the idea of doing

import gromacs

and not having any Gromacs tools loaded (which currently incurs a significant performance penalty because each tool is run to check that it is available and to extract its doc string). When import gromacs is cheap, it's easier to use the other functionality, such as the different file format parsers.

Ultimately, we could also look for Gromacs tools in the environment (say grompp), parse the version string, and then use it to heuristically find a suitable configuration entry, something like

gromacs.use('auto')
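One way the `'auto'` heuristic might work, sketched under assumptions: the tool's header output contains a line such as `VERSION 5.1.2` (the exact wording varies between releases, so the regex is deliberately loose and case-insensitive), and the best configuration entry is the one sharing the longest leading run of version components with the installed release. `parse_version` and `pick_config` are hypothetical helpers; obtaining the header text from the installed tool is left out.

```python
import re


def parse_version(text):
    """Extract a version tuple such as (5, 1, 2) from Gromacs header output."""
    match = re.search(r"VERSION\s+(\d+(?:\.\d+)*)", text, re.IGNORECASE)
    if match is None:
        raise ValueError("no Gromacs version string found")
    return tuple(int(part) for part in match.group(1).split("."))


def pick_config(installed, config_keys):
    """Return the config key sharing the most leading components with *installed*."""
    def shared_prefix(key):
        wanted = tuple(int(part) for part in key.split("."))
        count = 0
        for have, want in zip(installed, wanted):
            if have != want:
                break
            count += 1
        return count

    best = max(config_keys, key=shared_prefix)
    if shared_prefix(best) == 0:
        raise LookupError("no configuration matches Gromacs %s" % ".".join(map(str, installed)))
    return best
```

With config keys `4.6.5` and `5.1.1`, an installed 5.1.2 would select `5.1.1`, while an installed 2016 release would raise `LookupError` instead of silently guessing.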

@whitead
Collaborator

whitead commented Apr 8, 2016

Edited: I did not quite understand what you were getting at with the mixed version/mixed mpi/serial use case.

I don't really understand why you would want to have a mix of different gromacs versions, but isn't this already possible by changing which configuration file you load? Just having one for each Gromacs version? You can do as @orbeckst suggested and load the configuration file after import.

I agree there is the issue of API instability in GromacsWrapper tool names: they change based on which Gromacs version is used.

Just FYI, the Jupyter projects have a similar set of issues with configurations, especially with generating the initial default configurations. They use traitlets, which I've been using in one of my own projects. It could be one way to get away from having custom templates. It also supports aliasing traits (traits are basically typed class properties).

@orbeckst
Member

orbeckst commented Apr 8, 2016

tl;dr: We need to decide how much the user's Gromacs installation (its command names) should be reflected in GW tool names.

Let me comment on the points you edited out because they're good points:

I think the user should only have to specify the version, prefix command (gmx), suffix,

I forgot about the single/double precision suffix. In 4.x this was appended to the tool name, as in mdrun_d or grompp_d. How is this done in 5.x?

Specifying only the driver command and suffix would be easier:

versions:
    4.6.5:
        serial:
            driver: ""
        serial_double:
            driver: ""
            suffix: "_d"
        mpi:
            driver: ""
            suffix: "_mpi"
        mpi_double:
            driver: ""
            suffix: "_d_mpi"

    5.1.1:
        serial:
            driver: "gmx"
        mpi:
            driver: "gmx_mpi"

The translation to class names for 4.x is straightforward: if I have g_something and g_something_d_mpi, then they become tools.G_something and tools.G_something_d_mpi (never mind that _mpi only makes sense for mdrun IIRC...).

However, I am then at a loss as to how we translate this to class names for 5.x: Is the driver command always gmx or gmx_suffix? If so, we could extract the suffix and generate tool classes in the same fashion as for 4.x: gmx something becomes tools.Something (and tools.G_something with #46), gmx_mpi something becomes tools.Something_mpi, and gmx_d something becomes tools.Something_d (and tools.G_something_d).

It makes your scripts dependent on your local installation and it would be nice to be able to say we're abstracting this at least a little bit. But on the other hand, GW's primary purpose is just to wrap the local tools... so maybe we should just do that and take anything else as a bonus?
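The naming rule discussed in the preceding paragraphs can be written down as a small function. This is a sketch under the assumption raised in the question above, namely that a 5.x driver is always `gmx` or `gmx` followed by a suffix such as `_mpi` or `_d`; `class_name` is a hypothetical helper.

```python
def class_name(tool, driver="", suffix=""):
    """Translate a configured Gromacs command into a GW-style class name.

    4.x: class_name("g_something", suffix="_d_mpi")  -> "G_something_d_mpi"
    5.x: class_name("something", driver="gmx_mpi")   -> "Something_mpi"
    """
    if driver:
        # Assumption: the driver is "gmx" or "gmx" plus a suffix like "_mpi".
        if driver != "gmx" and not driver.startswith("gmx_"):
            raise ValueError("unexpected driver name: %r" % driver)
        suffix = driver[len("gmx"):] + suffix
    return tool.capitalize() + suffix
```

Under this rule the 4.x and 5.x naming schemes converge, so `gmx_d mdrun` and a 4.x `mdrun` configured with suffix `_d` both map to `Mdrun_d`.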

and if to use a separate command for mdrun.

Some people want to use mdrun and mdrun_d in the same workflow, or perhaps mdrun and mdrun_mpi (I am just using the 4.x naming here) so we should be able to accommodate this.

We should have a list of tool names for each version and generate a configuration file that way.

You're right that we could just generate the base tool names for most versions of Gromacs:

  • for 4.6.x that's essentially ls ...gromacs/bin and
  • for 5.x gmx help commands.

I don't really understand why you would want to have a mix of different gromacs versions, but isn't this already possible by changing which configuration file you load? Just having one for each Gromacs version?

The discussion was not about mixing, say, 4.6.5 with 5.1.1, but rather about making, say, serial and MPI versions of the tools available in the same GW session. (Btw, currently it's really inconvenient to use different Gromacs versions because at the moment the only thing you can do is rename your ~/.gromacswrapper.cfg file... organizing the configurations in a more structured and hierarchical fashion will make such switches a lot easier; the INI format is not hierarchical and makes it very clumsy to represent something like version -> 4.6.5 -> tools...)

@frchalaoux

On my system, each Gromacs environment is loaded with "module load gromacs-xxx". From there, I imagine I can move the right config (cfg, templates, scripts, ...) into my home directory. A config file per version could be enough, but why not have multiple versions in one cfg, so that ultimately GW reads the section corresponding to your gromacs-xxx?

Just one remark: on my system there are two gmx_mpi binaries, one for AVX processors and another for AVX2 processors.

dotsdl added a commit that referenced this issue Apr 20, 2016
Since Gromacs 5.1.x removed the use of g_* methods, instead opting only
for the new ``gmx <command>`` style, this config features the necessary
bits to use the commands this way. This configuration will *not* work
for Gromacs 4.x or earlier. We will hopefully replace this config style
with one that works for multiple versions more flexibly with #49.

Removed ``extras``, as these tools are no longer maintained and it's not
clear that they will work as expected under this config.
@orbeckst orbeckst added this to the 0.6 milestone May 26, 2016
@orbeckst
Member

orbeckst commented Jun 7, 2016

@pslacerda suggested that by default we should just be autodetecting the standard gromacs tools (#55 (comment)) if the user sourced GMXRC or if the GMXRC was provided in the cfg file.

This would be as simple as

  • For Gromacs 5: parse gmx help commands:

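    import subprocess
    gmx = subprocess.Popen(['gmx', 'help', 'commands'], stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                           universal_newlines=True)      # decode output to str
    stdout, stderr = gmx.communicate()                   # communicate() returns (stdout, stderr)
    # tool names are the first word of the indented lines in the command table
    tools = [line.split()[0] for line in stdout.split('\n')
             if (line.strip() and line[0] == " " and line[:22].strip())]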
  • For Gromacs 4 look for executable files in the Gromacs BINDIR (but note that this can produce unintended results when Gromacs is installed into a standard bin directory... suddenly you have /usr/bin/* in your GromacsWrapper name space... not the end of the world but messy):

    import os
    import glob
    # every executable regular file in $GMXBIN (set by sourcing GMXRC)
    tools = [os.path.basename(fpath) for fpath in glob.glob(os.path.join(os.environ['GMXBIN'], "*"))
             if os.path.isfile(fpath) and os.access(fpath, os.X_OK)]

@pslacerda
Collaborator

Hi committers!

If we autodetect the tools, we could reduce the size of config files and simplify the code. YAML is a neat and structured format but probably requires an external package. We can also use INI files as usual, where the section titles indicate the nesting:

[Gromacs]
logfile = /path/to/logfile
GMXRC = /default/GMXRC

[Gromacs/5.0]
GMXRC = /usr/local/gromacs5/bin/GMXRC

[Gromacs/4.7]
GMXRC  = /usr/local/gromacs4/bin/GMXRC
extra  = /path/g_extra /path/g_other

Then 5.0 and 4.7 will override the defaults if the user chooses them specifically. The same strategy can be applied in YAML files as well. It would also be very nice to try to load the config file from the current and parent directories before attempting $HOME.
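The override behaviour described above maps onto the standard library's `configparser`: read the `[Gromacs]` section first, then let the chosen `[Gromacs/<version>]` section replace individual keys. A minimal sketch; `load_profile` is a hypothetical name and the section layout is the one proposed in this comment. Note that `configparser` lowercases keys by default, so the sketch sets `optionxform` to keep `GMXRC` as written.

```python
import configparser


def load_profile(text, version=None):
    """Return [Gromacs] settings, overridden by [Gromacs/<version>] if requested."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "GMXRC" rather than "gmxrc"
    parser.read_string(text)
    settings = dict(parser["Gromacs"])
    if version is not None:
        section = "Gromacs/" + version
        if not parser.has_section(section):
            raise KeyError("no configuration section [%s]" % section)
        settings.update(parser[section])  # version-specific keys win
    return settings
```

With the example file above, `load_profile(text, "4.7")` would return the 4.7 `GMXRC` and `extra` entries together with the shared `logfile`.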

There is also a helper function, subprocess.check_output(), for retrieving the output of a process:

import subprocess
stdout = subprocess.check_output(['gmx', 'help', 'commands'], universal_newlines=True)  # str, not bytes

@orbeckst
Member

orbeckst commented Jun 9, 2016

pyyaml is easily installed together with everything else, so I am not too worried. And YAML just has better support for logical data structures. I dislike encoding structure in, e.g., section headers and parsing it back out; I'd rather use the right tool for the job.

I like a simplified cfg file and doing the auto-detection by default.

@orbeckst
Member

orbeckst commented Jun 9, 2016

Btw, if traitlets seem to work better as suggested by @whitead in #49 (comment) then we could try them instead of yaml. I just don't have any experience with them. What would a mock-up of a configuration look like with traitlets?

orbeckst added a commit that referenced this issue Jun 10, 2016
- fixes #63
- new helper run.get_double_or_single_prec_mdrun()
- manually tested with Gromacs 5.1.2: without gromacs.mdrun_d, the function
  just falls back to gromacs.mdrun
- Note that it is not really clear how we want to represent double precision
  with Gromacs 5.x (there is some discussion at #49) but right now I actually
  don't know how one *would* get a double prec 'gmx_d mdrun' in GW
@pslacerda
Collaborator

Just another option: maybe multiple versions can be defined as different groups, and the user chooses the appropriate one through the API, defaulting to the first:

  grp4.7           = grompp trjconv g_rms

  grp4.7mpi        = mdrun_mpi
  grp4.7mpi_base   = ; not needed
  grp4.7mpi_suffix = _mpi

  grp5.0      = grompp trjconv rms mdrun
  grp5.0_base = gmx

  grp5.0d      = grompp trjconv rms mdrun
  grp5.0d_base = gmx_d
  ; or  grp5.0d_base   = gmx
  ; and grp5.0d_suffix = _d

  grp5.0mpi      = mdrun
  grp5.0mpi_base = gmx_mpi
gromacs.use_tool_group('grp5.0mpi')
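Collecting the flat `grpNAME`, `grpNAME_base` and `grpNAME_suffix` keys into per-group records might be done like this. A sketch only: the key conventions are the ones proposed in this comment, the options are assumed to be already read from the INI file into an ordered dict (e.g. via `configparser` with `optionxform = str`), and `parse_groups` is a hypothetical helper that a `use_tool_group()` could build on.

```python
def _group(groups, name):
    """Return the record for *name*, creating an empty one on first use."""
    return groups.setdefault(name, {"tools": [], "base": "", "suffix": ""})


def parse_groups(options):
    """Collect grpX, grpX_base and grpX_suffix keys into per-group records.

    *options* is an insertion-ordered mapping of option name -> string value.
    The default group is options["group"] if present, else the first group seen.
    """
    groups = {}
    for key, value in options.items():
        if not key.startswith("grp"):
            continue  # e.g. the optional "group" key
        for field in ("base", "suffix"):
            if key.endswith("_" + field):
                _group(groups, key[: -len(field) - 1])[field] = value.strip()
                break
        else:
            _group(groups, key)["tools"] = value.split()
    default = options.get("group") or next(iter(groups), None)
    return groups, default
```

A group with a `_base` but an empty tool list (such as `grp5.0 =` further down) would then be the natural place to plug in autodetection.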

And the default can also be set with group = grp5.0mpi. Of course, the same approach can be used with YAML instead of INI, as you wrote before. Either way, we need to think about querying Gromacs 5 commands automatically, with something like

versions:
    5.1.1:
        serial:
            base: "gmx"
        mpi:
            base: "gmx_mpi"

which has no direct equivalent in plain INI, except maybe:

  grp5.0 =
  grp5.0_base = gmx

But if we put all standard Gromacs 4 commands into a list variable in the code and just test for their presence, we can also remove the need to enumerate them all in the configuration file, leaving it for custom commands.
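A sketch of the "known list plus presence test" idea, using `shutil.which` (Python 3.3+) to check which of the standard Gromacs 4 executables can actually be found. The tool list is a small illustrative subset, not the full Gromacs 4 set; falling back to `$GMXBIN` before the normal `PATH` search is an assumption, and `available_tools` is a hypothetical helper.

```python
import os
import shutil

# Illustrative subset of standard Gromacs 4.x executables (not exhaustive).
GROMACS4_TOOLS = ("grompp", "mdrun", "trjconv", "editconf", "genbox", "g_rms", "g_energy")


def available_tools(names=GROMACS4_TOOLS, suffix="", path=None):
    """Return the tool names (plus optional suffix, e.g. "_d") that exist as executables.

    *path* defaults to $GMXBIN if set; if that is unset too, PATH is searched.
    """
    if path is None:
        path = os.environ.get("GMXBIN")  # None makes shutil.which search PATH
    return [name + suffix for name in names if shutil.which(name + suffix, path=path)]
```

Custom commands would then be the only thing left to list in the configuration file; for example, `available_tools(suffix="_d")` would report which double-precision binaries are installed.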

@orbeckst
Member

orbeckst commented Jul 11, 2016

I've been looking at Configurable objects with traitlets.config, and this seems worth thinking about. As I understand it at the moment, the cfg file would then be a Python script that just sets a bunch of attributes on one big Gromacs class, e.g.

c.Gromacs.paths.configdir = "~/.gromacswrapper"
c.Gromacs.paths.configfile = "~/.gromacswrapper.cfg"
c.Gromacs.paths.templatesdir = "${configdir}/templates"
# ...
c.Gromacs.release = "5.1.2"
c.Gromacs.GMXRC = "/usr/local/bin/GMXRC"
c.Gromacs.tools = ["gmx:mdrun", "gmx:grompp", "gmx:editconf", ...]
c.Gromacs.groups = ["tools"]
c.Gromacs.logging.filename = "gromacs.log"
c.Gromacs.logging.loglevel.console = "INFO"
c.Gromacs.logging.loglevel.file = "DEBUG"

This would configure a class named Gromacs that could be used as a base class for most other classes.

One could perhaps implement multiple releases with

c.Gromacs.releases['serial_5.1.2'].release = "5.1.2"
c.Gromacs.releases['serial_5.1.2'].GMXRC = "/opt/packages/gromacs/versions/5.1.2/serial"
c.Gromacs.releases['mpi_5.1.2'].release = "5.1.2"
c.Gromacs.releases['mpi_5.1.2'].GMXRC = "/opt/packages/gromacs/versions/5.1.2/mpi/gnu"
c.Gromacs.releases['serial_4.6.6'].release = "4.6.6"
c.Gromacs.releases['serial_4.6.6'].GMXRC = "/opt/packages/gromacs/versions/4.6.6/serial"

There are probably better ways to organize things...

By the way, the config file can also be JSON.
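Since JSON is mentioned as an alternative: the standard library's `json` module can read a release table like the one sketched above without any extra dependency. The layout below is a hypothetical plain-data rendering of the `releases` idea (including an assumed `default` key), not a traitlets config file, and `select_release` is a hypothetical helper.

```python
import json

# Hypothetical JSON layout mirroring the c.Gromacs.releases[...] mock-up above.
CONFIG_JSON = """
{
  "Gromacs": {
    "default": "serial_5.1.2",
    "releases": {
      "serial_5.1.2": {"release": "5.1.2", "GMXRC": "/opt/packages/gromacs/versions/5.1.2/serial"},
      "mpi_5.1.2":    {"release": "5.1.2", "GMXRC": "/opt/packages/gromacs/versions/5.1.2/mpi/gnu"},
      "serial_4.6.6": {"release": "4.6.6", "GMXRC": "/opt/packages/gromacs/versions/4.6.6/serial"}
    }
  }
}
"""


def select_release(text, name=None):
    """Return (name, settings) for the named release, or for the default one."""
    gromacs = json.loads(text)["Gromacs"]
    name = name or gromacs["default"]
    return name, gromacs["releases"][name]
```

Keying releases by a user-chosen label rather than by version number is what allows the serial and MPI builds of 5.1.2 to coexist.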

@orbeckst orbeckst modified the milestones: 0.6, 0.7 Sep 5, 2016
@orbeckst orbeckst removed this from the 0.6 milestone Sep 5, 2016
@orbeckst
Member

This issue has become pretty low priority because autodetection works so well that the config file can be absent or very small.

I am just going to close this as won't-fix until someone else resurrects it.


5 participants