Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Issue 785 add auxiliary #868

Merged
merged 21 commits into from
Aug 17, 2016

Conversation

fiona-naughton
Copy link
Contributor

@fiona-naughton fiona-naughton commented Jun 3, 2016

WIP for #785.

Currently, allows adding data from an xvg file to a trajectory by add_auxiliary(filename, name, **kwargs). When reading the next trajectory timestep, the auxiliary timesteps closest to that trajectory timestep are read through, and a 'representative' value calculated from these and stored in the timestep in an aux namespace, as ts.aux.name. The 'representative' value can be either an average or the value from the closest timepoint (with an optional cutoff).

The XVGReader itself also has go_to_step and go_to_ts methods for jumping to specified auxiliary steps/trajectory timesteps; the latter's used to bring the auxiliary to the same time as the trajectory when it's first added, but I've not hooked it up with the equivalent trajectory method yet.

There's a couple things that seem to work but I'm not sure I've gone about the right way (particularly the go_to methods; they currently just start reading form the top of the file and wait till the times match up...); apologies if I'm doing anything particularly silly!

TODO

  • Expand to other filetypes + implement a guess_reader
  • Allow to select columns/name individually when reading multi-column data?
  • Dealing with 'missing data'?
  • Allow to iterate with auxiliary dt, rather than trajectory's
  • Proper testing!

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@@ -216,6 +224,9 @@ def __init__(self, n_atoms, **kwargs):
self.has_forces = kwargs.get('forces', False)

self._unitcell = self._init_unitcell()

# set up aux namespace for adding auxiliary data
self.aux = Namespace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just replace this with a simple dictionary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how we want to access the auxiliary data. In all the discussions we assumed something like ts.aux.force; this requires a class as what @fiona-naughton used. ts.aux['force']is fine by me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually would prefer the first ts.aux.force because I can TAB complete very easily in an interactive session. (yeah I'm lazy)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the Namespace solution. (tab-completion plus I'm lazy to type square brackets)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like it with the Namespace; it doesn't make much difference, but it's that little bit easier to type! (and it's nice to be lazy sometimes >_< )

@richardjgowers
Copy link
Member

Looks good. With the xvgreader, you can probably get away with reading the entire file at once, then seeking is just finding places in the numpy array that stores it.

That said, all that is just an implementation detail, get the tests up and running as they will mostly stay the same as you keep working on this. These can (and should) be very simple, eg you could have an xvg file with just [1, 2, 3] in it, then check that reading this gives you 1, 2, 3 again.

## the auxreader is the first step 'belonging' of the next ts...

def reset_ts(self, ts):
self.ts_data = np.array([])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you're passing in ts and not using it?

@mnmelo
Copy link
Member

mnmelo commented Jun 6, 2016

I just added a comment/question in #785 to expand on @richardjgowers' idea of reading data all at once.

I raised this also because the current purpose of the AuxReader base class is a bit unclear to me. It seems to be a file-agnostic class, suitable for a read-all-at-once model. However, it has references to self.reopen(), which point to it being more of an abstract class for always-read-from-file subclasses.

@richardjgowers
Copy link
Member

Ok so now the PR you did has gone through, you need to update your local version of develop (fetch and pull) and then merge develop into this branch to get the changes you recently made into this branch. You might get a merge conflict on files that changed in each branch, you just need to go over it carefully maybe using something like meld or diffuse.

@kain88-de
Copy link
Member

merge develop into this branch to get the changes you recently made into this branch

Please to a rebase instead of a merge. It keeps a cleaner history and it is easier to spot when the git-merge conflict solver does unintended things (removed code can reappear or added code can be deleted by accident!)

@fiona-naughton
Copy link
Contributor Author

Ah, just merged it sorry (but doesn't look like it even worked completely... I'm using GitHub desktop at the moment, and not entirely sure what it's done?). Might leave it to figure out tomorrow when I'm back at work and not on Windows...

@jbarnoud
Copy link
Contributor

jbarnoud commented Jun 9, 2016

It seems that you broke something with Timesteps. Multiple tests fail with the following traceback:

======================================================================
ERROR: MDAnalysisTests.coordinates.test_dlpoly.TestDLPolyHistoryMinimal.test_getting
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/MDAnalysis/mdanalysis/miniconda/envs/pyenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/MDAnalysis/mdanalysis/miniconda/envs/pyenv/lib/python2.7/site-packages/MDAnalysisTests/coordinates/test_dlpoly.py", line 123, in test_getting
    assert_equal(ts.frame, 1)
AttributeError: 'NoneType' object has no attribute 'frame'

Since different readers are broken in the same way, the breakage most likely happened in the ProtoReader or the base Reader class that get to update ts.frame wrongly. I do not see which of your changes could cause that, though.

class Namespace(object):
# set up a basic class so we can make an 'aux' namespace in Timestep
# for auxiliary data; might be better defined elsewhere...?
# There's probably a better way to do this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me as the good way and the good place to do this, as there seems to be a consensus around the namespace vs the dictionary. The class can be moved latter to .lib.util if it get duplicated.

@coveralls
Copy link

coveralls commented Jun 9, 2016

Coverage Status

Coverage decreased (-0.8%) to 79.501% when pulling de25b17 on fiona-naughton:issue-785-add-auxiliary into debae2b on MDAnalysis:develop.

@fiona-naughton
Copy link
Contributor Author

I've made a bunch of changes, largely to make the matching of auxiliary steps to trajectory timesteps a little nicer (I think) and hopefully clearer. There's now a step_to_frame(i, ts) method to return the frame number an auxiliary step i will be considered part of; I'm now using this for the current/previous/next step number to check both if we're starting at the right spot and whether to keep reading to the next step while reading a timestep (in place of step_in_ts and first_in_ts). It also means the current auxiliary step after reading a timestep will be the last step in that timestep, rather than the first in the next one. Hopefully that makes more sense!

(The failed tests were due to my forgetting to add a return line; it's fixed now. I'll finish properly writing up some tests for the new stuff and hopefully add that soon)

When reading in a trajectory, auxiliary data steps are 'assigned' the
closest trajectory timestep.

Attributes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be:

Attributes
==========

to follow numy doctring guidelines.

fiona-naughton and others added 20 commits August 16, 2016 17:26
Optimise perf from use of range in auxiliaries
fix step_to_frame when trajectory starts after 0
…ux take into account cutoff

tidy docs + tests
@fiona-naughton
Copy link
Contributor Author

Rebased + had a go at squashing some of the commits - hopefully nothing went disastrously wrong and it still makes sense!

@coveralls
Copy link

coveralls commented Aug 16, 2016

Coverage Status

Changes Unknown when pulling a9e067d on fiona-naughton:issue-785-add-auxiliary into * on MDAnalysis:develop*.

@jbarnoud jbarnoud merged commit 06c3e14 into MDAnalysis:develop Aug 17, 2016
@jbarnoud
Copy link
Contributor

Youhou! It is finally merged! Congrats!

@jbarnoud jbarnoud self-assigned this Aug 17, 2016
@orbeckst
Copy link
Member

@fiona-naughton congratulations! Thank you for this exciting new feature and your participation in GSoC! I hope we see more of you, even after the end of GSoC.

abiedermann pushed a commit to abiedermann/mdanalysis that referenced this pull request Jan 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants