Issue 363 - New Topology System #815

Merged
merged 457 commits into from Nov 7, 2016

Conversation

Projects
@dotsdl
Member

dotsdl commented Apr 4, 2016

Fixes #363

This is a WIP PR to serve as a focal point for discussion on the new topology system before it is merged into develop. The merge will happen soon after release of 0.15.0.

Changes made in this Pull Request:

  • WIP

TODO List

  • Get tests passing!
  • Implement guessers
  • Make flags work
  • Make moving residues work (in progress @richardjgowers)
  • For each Parser, check all fields are read and correctly named/assigned
  • For each Writer, check that no assumptions are made on the contents of Timesteps/AtomGroups
@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Feb 6, 2016

Member

Looks good

Member

richardjgowers commented on 456b00d Feb 6, 2016

Looks good

dotsdl and others added some commits Feb 6, 2016

Added total_charge function to Charges TopologyAttr.
Somehow missed this one during the migration of old AtomGroup functions.
Fix DATA type parsing and add explicit bond types
**NOTE:** `atoms.bonds` is broken for atom groups without explicit bond typing
in this commit.

In order to write a LAMMPS.DATAWriter, I first need explicit typing of bonds,
angles, dihedrals, and impropers beyond the tupling of atoms types done now.
To hack bond types into the topology, I branched issue-363 and tried the
following:

1. Add optional `types` keyword to `topologyattrs.Bonds`. This is stored
in the class instance a `self._bondtypes`. A topology parser can pass `types`
to the Bonds topologyattr.
2. When `topologyattrs.Bonds` is called upon to generate a TopologyGroup,
these values are passed on via a new keyword in `TopologyGroup.__init__()`.
They are stored in the TopologyGroup instance as `self._bondtypes`.
3. I modified `TopologyGroup.__getitem__` to yield a Bond with attribute
`_bondtype`. If `_bondtype is not None` then it overrides the default behavior
of `bond.type`.
4. I modified `TopologyGroup()` calls wherever I could find them to pass on
`_bondtype` in the instantiation.

I found that the DATAParser in branch #363 is slightly broken, because
the atom types are saved as integers which breaks `select_atoms` (I think it
expects strings, and typing the types as strings on parsing fixes this issue).
It seems there is currently no test for atom selection from LAMMPS DATA files,
so this might have gone undetected.

An error is raised for `ag.bonds` for an atom group that doesn't have bonds,
but this seems to also be the case for branch issue-363 right now, so I didn't
try to fix this.

Two places are especially ugly. One is where I tried to hack `_bondtypes`
into the existing code to remove duplicate bonds. This is also where
the code seems to be broken for non-explicit bond types. In this case,
I set `_bondtypes` to be a numpy array full of `None` with `dtype=object`.
It seems that `unique_rows` fails for this case, although it works fine
when `_bondtypes.dtype == "|S1"`.
replace unique_row with np.unique and np records
The function `unique_row` was causing problems because bond type could
be None, and the containing numpy array had dtype==object. I replaced
`unique_rows` with `np.unique` by first converting the bond indices and
bond types into a numpy record, and then breaking this record back into
the bond indices and bond types.

I also simplified some of what I'd written in `topologyattrs.Bonds`.
Fix handling of empty TopologyGroups
If parser told universe to create empty TopologyGroup, the resulting
group was an attribute of atoms but gave IndexError on trying to access
it. Seems to work as expected now... creating TopologyGroup with 0
TopologyObjects
Merge pull request #772 from khuston/DATAWriter
Fix DATA type parsing and add explicit bond types
DATAWriter, velocity unit conversion, unit test
Writes selection at current trajectory frame to file, including sections
Atoms, Masses, Velocities, Bonds, Angles, Dihedrals, and Impropers (if
these are defined). Atoms section is written in the "full" sub-style if
charges are available or "molecular" sub-style if they are not.
Molecule id in atoms section is set to to 0.

No other sections are written to the DATA file.
As of this writing, other sections are not parsed into the topology
by the `DATAReader`.

If the selection includes a partial fragment, then the outputted DATA
file will be invalid, because it will describe bonds between atoms
which do not exist in the Atoms section.

By default the writer assumes "conventional" or "real" LAMMPS units
where length is measured in Angstroms and velocity is measured in
Angstroms per femtosecond. If other units are desired, they must be
specified.

Raises ValueError if atom types are not convertable to integers or if
atoms of the same type don't all have the same mass.

An AttributeError will be raised if the atom group doesn't have masses.

I added tests to make sure topology attributes of the written file
(after being read again) match the topology attributes read from
the original file. Checks types, bonds, positions, velocities, etc.

There was a problem with test data mini.data because it had atom type
3 which had undefined mass, so I changed the atom type to 1.

The DATAParser before was not converting velocity units, so I added
default units Angstroms/fs which are the "real" units in LAMMPS.
include only bonds etc. whose atoms are in atomgrp
I added an associated test and fixed part of a test that was passing in
an invalid way.
Made recommended style fixes
I also removed the part of the docstring describing the
not-yet-implemented feature because it would probably cause more
false hope than good.
Moved as many things out of AtomGroup and into GroupBase as reasonable.
We want ResidueGroups and SegmentGroups to behave roughly in the same
way as before, that is having methods they had when they were subclasses
of AtomGroup. Not everything makes sense, but most things do.

This was consensus from discussion in #703
(#703 (comment))
Moved residue angle selections to Atomnames.
I consider these very fragile since they depend on particular atom names
(however customary), but included them where they should go since they
depend on atom names. They will need to be refactored, though, since
they depend on getitem working for atom names, which is not something
we've included (so far) for any groups.
Added element topologyattr.
Probably useful for PDB files or any other topology file that gives an
element name.
Merge branch 'develop' into issue-363
Conflicts:
	package/MDAnalysis/coordinates/MOL2.py
	package/MDAnalysis/coordinates/PDB.py
	package/MDAnalysis/core/AtomGroup.py
	package/MDAnalysis/core/Selection.py
	package/MDAnalysis/core/Timeseries.py
	package/MDAnalysis/core/__init__.py
	package/MDAnalysis/core/topologyobjects.py
	package/MDAnalysis/lib/util.py
	package/MDAnalysis/topology/CRDParser.py
	package/MDAnalysis/topology/DLPolyParser.py
	package/MDAnalysis/topology/GMSParser.py
	package/MDAnalysis/topology/GROParser.py
	package/MDAnalysis/topology/HoomdXMLParser.py
	package/MDAnalysis/topology/PDBParser.py
	package/MDAnalysis/topology/PDBQTParser.py
	package/MDAnalysis/topology/PSFParser.py
	package/MDAnalysis/topology/PrimitivePDBParser.py
	package/MDAnalysis/topology/TOPParser.py
	package/MDAnalysis/topology/XYZParser.py
	package/MDAnalysis/topology/base.py
	package/MDAnalysis/topology/core.py
	package/MDAnalysis/topology/tpr/utils.py
	testsuite/MDAnalysisTests/test_atomselections.py
	testsuite/MDAnalysisTests/test_topology.py
	testsuite/MDAnalysisTests/test_util.py
	testsuite/MDAnalysisTests/topology/test_gro.py
	testsuite/MDAnalysisTests/topology/test_tprparser.py
Getting atomindices are resindices gives iterable same len as caller
This is to match decided behavior for accessing attributes at various
levels.

@dotsdl dotsdl added this to the Topology refactor milestone Apr 4, 2016

@orbeckst

This comment has been minimized.

Show comment
Hide comment
@orbeckst

orbeckst Apr 5, 2016

Member

OMG, this is really happening?

Member

orbeckst commented Apr 5, 2016

OMG, this is really happening?

@dotsdl

This comment has been minimized.

Show comment
Hide comment
@dotsdl

dotsdl Apr 5, 2016

Member

@orbeckst yes. :D

In contrast to #813, this will probably be the biggest PR you will ever merge.

Member

dotsdl commented Apr 5, 2016

@orbeckst yes. :D

In contrast to #813, this will probably be the biggest PR you will ever merge.

richardjgowers and others added some commits Apr 7, 2016

More tests for group slicing
Fixed boolean slicing with list
More tests for group slicing
Fixed slicing group with empty list
Fixed contains on groups

Added test for group getitem singular

Added tests for slicing with numpy integer array

More tests for group getitem

First tests for Group addition

Fixed Components eq comparison

Added tests for groups TypeError

Started tests for Group level transitions

Fixed downward level transitions in Groups

Added tests for up and down shifts in Groups

Added tests for Group.unique

Added component to group tests
Fix Merge for issue-363 branch (#781)
* Fix Merge for issue-363 branch

Rewrote mda.Merge to work for new topology style.

I also changed the default argument for attrs in core.Topology() beacuse
the default argument of None caused an error, whereas using an empty
list instead will return a blank Topology.

* fix atomgrps for same universe, enable sum

My first attempt at collecting atom groups for the same universe was
flawed if args contained more than one atom group from the same
universe. Hopefully this one is correct.

I also override __rsum__ on AtomGroups to allow use of sum.

* cleanup and describe which topologyattr get Merged

* correct Merge atom order, segind, add top tests

I changed my previously proposed mda.Merge to align more closely with
the old Merge behavior, in that it preserves the atom order as given to
Merge. Given the introduction of residue and segment indices (distinct
from ids), the behavior of this new Merge assigns separate residue and
segment indices to each argument of Merge. That is, if two arguments to
merge are atomgroups from the same universe with the same residue
indices, in the post-merge universe, they will have different residue
indices. If the user doesn't want this, s/he should add together the
atom groups before merging.

My understanding is that resid is just an attribute of a residue,
whereas the residue index is the unique identifier of that residue
instance. I didn't think it would make much sense for Merge to duplicate
atoms by giving them different indices while keeping their residue
indices the same, such that you have doubled or tripled (etc.) the
number of atoms in a residue or segment.

Before I was sending the per-atom segment indices to Topology instead of
the per-residue segment indices. Topology.__init__() or
TransTable.__init__() should probably check that the correct length
index arrays are passed on.

I added an attribute to the TopologyAttrs called `per_object` where the
value of this attribute is 'atom', 'residue', or 'segment' if the length
of the TopologyAttr value should equal n_atoms, n_residues, or
n_segments. The Universe method `_process_attr` now checks if the
TopologyAttr has `per_object` attribute, and raises a ValueError of the
length of the value array is not as expected.

Finally I updated `test_modelling.py`. This actually increased the
number of errors raised during testing by 2, because instead of stopping
at an error in the first few lines of the test file, it goes on and
errors occur for Capping and `atoms.write`.
Added tests for using Sum on Groups and Components
Fixed components not summing correctly
@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Oct 20, 2016

Member

Still to do is to remove all references to core.AtomGroup.AtomGroup and core.AtomGroup.Universe in docs (and any example code). These have become core.groups.AtomGroup and core.universe.Universe. For backwards compatibility we can add a stub module core.AtomGroup which makes the namespace still work but raises a warning on import.

Member

richardjgowers commented Oct 20, 2016

Still to do is to remove all references to core.AtomGroup.AtomGroup and core.AtomGroup.Universe in docs (and any example code). These have become core.groups.AtomGroup and core.universe.Universe. For backwards compatibility we can add a stub module core.AtomGroup which makes the namespace still work but raises a warning on import.

kain88-de and others added some commits Oct 21, 2016

PEP8 changes to topologyattrs: Fixes #1045 (#1042)
* pep8 changes topologyattrs

* pep8 changes test file

* add license to test
Issue 363 pep8 (#1039)
* PEP8 Changes

Make code more PEP8 compliant

* remove 0.16 deprecated function

The time of deprecating this was lost in some merge.

* test PEP8 Changes

* test hash function

* insert future TODO
@kain88-de

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Oct 23, 2016

Member

@wouterboomsma Are the ensemble estimates really that sensitive. I would understand a rounding error changing the result by a few percent even 20-30% but more then doubling the result seems strange.

@wouterboomsma Are the ensemble estimates really that sensitive. I would understand a rounding error changing the result by a few percent even 20-30% but more then doubling the result seems strange.

This comment has been minimized.

Show comment
Hide comment
@wouterboomsma

wouterboomsma Oct 23, 2016

Contributor

@kain88-de The hes method is quite sensitive to small ensemble sizes. The problem is partly caused by the fact that I had to reduce the size of the trajectories we use in the tests because our tests were taking too long to complete. In the calculation of the hes method, we estimate a covariance matrix of the coordinates, and take its inverse, which is inherently a rather sensitive operation - and certainly so when using only 20 structures. As they are written now, the tests do not really reflect realistic use cases, because they would typically be run on much larger trajectories.

But I completely agree that it is annoying that the Encore tests will act-up every time anyone makes a change like this. I'll just try to find a way to make them more robust within the next few days.

Contributor

wouterboomsma replied Oct 23, 2016

@kain88-de The hes method is quite sensitive to small ensemble sizes. The problem is partly caused by the fact that I had to reduce the size of the trajectories we use in the tests because our tests were taking too long to complete. In the calculation of the hes method, we estimate a covariance matrix of the coordinates, and take its inverse, which is inherently a rather sensitive operation - and certainly so when using only 20 structures. As they are written now, the tests do not really reflect realistic use cases, because they would typically be run on much larger trajectories.

But I completely agree that it is annoying that the Encore tests will act-up every time anyone makes a change like this. I'll just try to find a way to make them more robust within the next few days.

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Oct 23, 2016

Member

OK yeah taking and inverse matrix is very sensitive. But you are taking a mass weighted covariance matrix?

But good to know that this will less likely happen when used with real data.

Member

kain88-de replied Oct 23, 2016

OK yeah taking and inverse matrix is very sensitive. But you are taking a mass weighted covariance matrix?

But good to know that this will less likely happen when used with real data.

This comment has been minimized.

Show comment
Hide comment
@wouterboomsma

wouterboomsma Oct 23, 2016

Contributor

Yeah, the default is mass_weighted, but this is an option.

Contributor

wouterboomsma replied Oct 23, 2016

Yeah, the default is mass_weighted, but this is an option.

This comment has been minimized.

Show comment
Hide comment
@wouterboomsma

wouterboomsma Nov 1, 2016

Contributor

@kain88-de, @richardjgowers, I've attempted to make this particular encore test more robust in #1060.

Contributor

wouterboomsma replied Nov 1, 2016

@kain88-de, @richardjgowers, I've attempted to make this particular encore test more robust in #1060.

richardjgowers and others added some commits Oct 23, 2016

Added tests for topology.guessers module (#1048)
* Added tests for topology.guessers module

* TST: More tests for test_guessers
Readded Segment.r# access to Residues
Readded tests for phi_selection and friends
Issue 363 align pa (#1051)
* fix bug and docs in rotaxis

If a == b this used to return [nan, nan, nan] due to the division by 0.

* pep8 changes test_transformations.py

* add rotaxis test

* add align_principal_axis test

* update changelog
@orbeckst

This comment has been minimized.

Show comment
Hide comment
@orbeckst

orbeckst Oct 25, 2016

Member

On 20 Oct, 2016, at 13:14, Richard Gowers notifications@github.com wrote:

Still to do is to remove all references to core.AtomGroup.AtomGroup and core.AtomGroup.Universe in docs (and any example code). These have become core.groups.AtomGroup and core.universe.Universe. For backwards compatibility we can add a stub module core.AtomGroup which makes the namespace still work but raises a warning on import.

Good idea, because it's been a common idiom to use AtomGroup.{AtomGroup,Universe}

Oliver Beckstein * orbeckst@gmx.net
skype: orbeckst * orbeckst@gmail.com

Member

orbeckst commented Oct 25, 2016

On 20 Oct, 2016, at 13:14, Richard Gowers notifications@github.com wrote:

Still to do is to remove all references to core.AtomGroup.AtomGroup and core.AtomGroup.Universe in docs (and any example code). These have become core.groups.AtomGroup and core.universe.Universe. For backwards compatibility we can add a stub module core.AtomGroup which makes the namespace still work but raises a warning on import.

Good idea, because it's been a common idiom to use AtomGroup.{AtomGroup,Universe}

Oliver Beckstein * orbeckst@gmx.net
skype: orbeckst * orbeckst@gmail.com

abiedermann pushed a commit to abiedermann/mdanalysis that referenced this pull request Oct 26, 2016

iQIcBAABCAAGBQJXj88NAAoJEObgNkBNX4V0xjkP/jJvd2bevjPTFHKOspDErTtm
 NIMUstA8lOQd9w5kL+PcYC3o+J7vh1/gjIyJsZ5iE9y9zQ3aYtzFQShPnCfrKYyl
 Bco/xr+U3bN9piXTZqhYKIRzVUs6SSd2uy7q63de8QDNsNWkKouKZ1+PhvZPkMNN
 i+PhBW/gp+PXeta+4Y5REnBrUpX4bW3DCHuKTJ+nM80PXmsMG0i64ShRX/umXnoG
 qpBfOGfnr40SD/ZFmmYc8qqZvllzPjew8GMGXXdjidetOZHaAkFRDMzOr3FNYkWD
 2zSKN95CJGDynSwdsDTo9rrUN5lcJr58JG+JeDBLoK7XxN5VVhPge+cV0gHF2SLk
 TQcgRXXY/AJY0ZpME0PRk7YKpUPqW/UjKPHENRFqNPlskF+8dzvFu2KNk928A+Ws
 4otsRWCzn76tzXsGAK66kXsX9l2mc3IxhDy9TH69GXhog7ASHlglJZ1kZTcPPR8T
 h7SfjW1Pfi7C/HL7RJ+shmmIQd4qvfn828JK/SoHud6dG7/wmfqYIiva7RB0usdt
 CK8y7an2XUGXjnvIv2C2CDUHKy0KgtdRw6wpwChkeVl9ci59cT0vHkOZqXgGSaBS
 L168OTkm0djbpjrMGj8vWe3lna1/Fxf+ELYOC/rUTPOYt7T0SuRoUHhcHhURCD/w
 vViD4Ic6nvs7OR+ODkyZ
 =eUcw
 -----END PGP SIGNATURE-----

adjusted HOLE open file descriptor test (#129) for Darwin

In order to provoke the failure described by issue #129 we artificially lower the open file descriptors.
On Mac OS X on travis this always leads to failing tests, as discussed under
MDAnalysis#901 (comment) ; this hack increases the allowed
open fds when we are running on Darwin. According to @jbarnoud, this problem is solved by #363 so once that
corresponding PR #815 is merged we should be able to revert this commit.

kain88-de and others added some commits Nov 1, 2016

WIP: Issue 363 tests (#1052)
* Add tests

* TST: Added tests for get_named_residue

* fix imports and pylint warning
Issue 363 pqrwrite (#1061)
* Finished checking PQRWriter

* MOL2Writer now cleanly doesn't write non mol2 sources

* Finished checking PDBQT
Merge branch 'develop' into issue-363
Conflicts:
	package/AUTHORS
	package/CHANGELOG
	testsuite/MDAnalysisTests/analysis/test_psa.py
@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Nov 7, 2016

Member

@kain88-de sorry I missed yesterday, you can merge this when ready

Member

richardjgowers commented Nov 7, 2016

@kain88-de sorry I missed yesterday, you can merge this when ready

@kain88-de kain88-de merged commit 94f1d75 into develop Nov 7, 2016

2 of 3 checks passed

continuous-integration/travis-ci/push The Travis CI build is in progress
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.7%) to 86.483%
Details
@kain88-de

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Nov 7, 2016

Member

OK the new topology system is now in develop. Everyone has done a great job. We even managed to increase coverage with this massive PR.

Member

kain88-de commented Nov 7, 2016

OK the new topology system is now in develop. Everyone has done a great job. We even managed to increase coverage with this massive PR.

@jbarnoud

This comment has been minimized.

Show comment
Hide comment
@jbarnoud

jbarnoud Nov 7, 2016

Contributor

Wow! Congrats!

On 07-11-16 12:09, Max Linke wrote:

OK the new topology system is now in develop. Everyone has done a
great job. We even managed to increase coverage with this massive PR.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#815 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABUWukKjAMo2nzBt57gBsgP8cphUjKKCks5q7wbRgaJpZM4H_mdq.

Contributor

jbarnoud commented Nov 7, 2016

Wow! Congrats!

On 07-11-16 12:09, Max Linke wrote:

OK the new topology system is now in develop. Everyone has done a
great job. We even managed to increase coverage with this massive PR.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#815 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABUWukKjAMo2nzBt57gBsgP8cphUjKKCks5q7wbRgaJpZM4H_mdq.

@orbeckst orbeckst changed the title from WIP: Issue 363 - New Topology System to Issue 363 - New Topology System Nov 7, 2016

@orbeckst

This comment has been minimized.

Show comment
Hide comment
@orbeckst

orbeckst Nov 7, 2016

Member

Epic merge!

Special kudos to @richardjgowers, @dotsdl and @kain88-de. It's been almost 1 year since @richardjgowers visited ASU and started the topology overhaul together with @dotsdl. They got the basics done in about 1 month

Richard and I spent a good deal of time during his visit to Arizona working away at issue #363, and we are almost finished with it.

but it took another ~45 weeks to make it the new foundation for MDAnalysis. The 10/90 rule at work...

363_blackboard_draft_asu

Member

orbeckst commented Nov 7, 2016

Epic merge!

Special kudos to @richardjgowers, @dotsdl and @kain88-de. It's been almost 1 year since @richardjgowers visited ASU and started the topology overhaul together with @dotsdl. They got the basics done in about 1 month

Richard and I spent a good deal of time during his visit to Arizona working away at issue #363, and we are almost finished with it.

but it took another ~45 weeks to make it the new foundation for MDAnalysis. The 10/90 rule at work...

363_blackboard_draft_asu

@jdetle

This comment has been minimized.

Show comment
Hide comment
@jdetle

jdetle Nov 7, 2016

Contributor

Incredible!

Contributor

jdetle commented Nov 7, 2016

Incredible!

@dotsdl

This comment has been minimized.

Show comment
Hide comment
@dotsdl

dotsdl Nov 7, 2016

Member

Wooo! This is a big deal! Thanks everyone for the hard work on all this. It really was a great community effort, and really reflects well on the mettle of our dev team. I mean, look at that participant list on the right.

Looking forward to what we can make happen with the new topology system. This enables tons of new possibilities. :D

Member

dotsdl commented Nov 7, 2016

Wooo! This is a big deal! Thanks everyone for the hard work on all this. It really was a great community effort, and really reflects well on the mettle of our dev team. I mean, look at that participant list on the right.

Looking forward to what we can make happen with the new topology system. This enables tons of new possibilities. :D

abiedermann pushed a commit to abiedermann/mdanalysis that referenced this pull request Jan 5, 2017

Merge pull request #815 from MDAnalysis/issue-363
WIP: Issue 363 - New Topology System
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment