Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from Atom based topology to array based #363

Closed
richardjgowers opened this issue Jul 22, 2015 · 32 comments
Closed

Move from Atom based topology to array based #363

richardjgowers opened this issue Jul 22, 2015 · 32 comments

Comments

@richardjgowers
Copy link
Member

This is something @dotsdl and I have been tinkering with. Basically, it's changing the AtomGroup class to be a thin wrapper around a numpy array (rather than a list of Atoms).

The latest benchmark is here:
https://github.com/richardjgowers/atomgroup_olympics/blob/master/types2.ipynb
Dev list discussion:
https://groups.google.com/forum/#!topic/mdnalysis-devel/aHSB-ryCEQ0

We're seeing about 10-30x speedup for most common operations, so I think we can start to think about implementing this. I think this Issue will act as an anchor for sub issues.

I think first thing to do is to make sure that tests currently cover everything, so we're confident that we aren't changing behaviour as we go.

@orbeckst
Copy link
Member

+1 (and we probably want this in a pre-1.0 milestone — either 0.12 or a 0.13 — just to make sure that we don't bake something into 1.0 that we haven't fully explored for a while)

@orbeckst
Copy link
Member

And more test coverage like #362 is definitely necessary before embarking on this in earnest.
We will also need tests that check in more detail what happens when AtomGroups are changed and how changes propagate.

@kain88-de
Copy link
Member

After having to add a new property to the Atom and AtomGroup class I can only say you should make the AtomGroup the base class for everything. I noticed this funny thing that I read a whole array in PrimitivePDBReader and make that accessible via the 'TimeStepobject to theAtomclass. Now remember atom can potentially access all the occupancy from all other atoms in the universe. But the code to access array inAtomGroup` is

@property
def some_metadata(self):
    return np.array([atom.metadata for atom in self.atoms])

This idiomatic code (when Atom it the base of the Topology) has to be slow by design. I iterate over a class container to iterate over an array just to build a shallow copy of said array and return it.

@property
def some_metadata(self):
        return self.ts.data['metadata'][self.atom_idx]

This should result in faster code by design in the end. We don't make hops between array in container classes. It gets much easier to set things correctly at the AtomGroup level. And it helps with the implicit assumption in the Readers that we work on whole frames.

@richardjgowers
Copy link
Member Author

Yep, you've pretty much hit the nail on the head for why we're changing this. Plus once we've got everything in arrays it's statically typed.....

@dotsdl dotsdl modified the milestones: Topology refactor, 0.13 Dec 1, 2015
jbarnoud added a commit to jbarnoud/mdanalysis that referenced this issue Dec 13, 2015
See MDAnalysis#572

Some other changes where also done:
* `MDAnalysis/core/__init__.py` import `selection` rather than
  `Selection` following the module renaming in commit
  78abe4e
* Add messages when some assertions fail in
  `MDAnalysisTests/topology/base.py`
jbarnoud added a commit to jbarnoud/mdanalysis that referenced this issue Dec 13, 2015
See MDAnalysis#572

Some other changes where also done:
* `MDAnalysis/core/__init__.py` import `selection` rather than
  `Selection` following the module renaming in commit
  78abe4e
* Add messages when some assertions fail in
  `MDAnalysisTests/topology/base.py`
richardjgowers added a commit that referenced this issue Dec 15, 2015
Adapt TPR parser as required by #363
abiedermann pushed a commit to abiedermann/mdanalysis that referenced this issue Oct 26, 2016
 NIMUstA8lOQd9w5kL+PcYC3o+J7vh1/gjIyJsZ5iE9y9zQ3aYtzFQShPnCfrKYyl
 Bco/xr+U3bN9piXTZqhYKIRzVUs6SSd2uy7q63de8QDNsNWkKouKZ1+PhvZPkMNN
 i+PhBW/gp+PXeta+4Y5REnBrUpX4bW3DCHuKTJ+nM80PXmsMG0i64ShRX/umXnoG
 qpBfOGfnr40SD/ZFmmYc8qqZvllzPjew8GMGXXdjidetOZHaAkFRDMzOr3FNYkWD
 2zSKN95CJGDynSwdsDTo9rrUN5lcJr58JG+JeDBLoK7XxN5VVhPge+cV0gHF2SLk
 TQcgRXXY/AJY0ZpME0PRk7YKpUPqW/UjKPHENRFqNPlskF+8dzvFu2KNk928A+Ws
 4otsRWCzn76tzXsGAK66kXsX9l2mc3IxhDy9TH69GXhog7ASHlglJZ1kZTcPPR8T
 h7SfjW1Pfi7C/HL7RJ+shmmIQd4qvfn828JK/SoHud6dG7/wmfqYIiva7RB0usdt
 CK8y7an2XUGXjnvIv2C2CDUHKy0KgtdRw6wpwChkeVl9ci59cT0vHkOZqXgGSaBS
 L168OTkm0djbpjrMGj8vWe3lna1/Fxf+ELYOC/rUTPOYt7T0SuRoUHhcHhURCD/w
 vViD4Ic6nvs7OR+ODkyZ
 =eUcw
 -----END PGP SIGNATURE-----

adjusted HOLE open file descriptor test (MDAnalysis#129) for Darwin

In order to provoke the failure described by issue MDAnalysis#129 we artificially lower the open file descriptors.
On Mac OS X on travis this always leads to failing tests, as discussed under
MDAnalysis#901 (comment) ; this hack increases the allowed
open fds when we are running on Darwin. According to @jbarnoud, this problem is solved by MDAnalysis#363 so once that
corresponding PR MDAnalysis#815 is merged we should be able to revert this commit.
orbeckst added a commit that referenced this issue Nov 18, 2016
- added rst stubs for sphinx
- updated rst docs (added links, reformatted, fixed, found out that
  Napoleon REQUIRES two lines before .. versionchanged/versionadded)
- removed AtomGroup.rst

TODO:
- explain topology system and clean up core_modules.rst (work
  for @richardjgowers and @dotsdl)
orbeckst added a commit that referenced this issue Nov 18, 2016
find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
orbeckst added a commit that referenced this issue Nov 18, 2016
- AtomGroup.Atom --> groups.Atom
- AtomGroup.Merge --> universe.Merge
- AtomGroup.ResidueGroup --> groups.ResidueGroup (no SegmentGroup needed to be fixed, no Residue or Segment found)
- core.Selection --> core.selection

find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Atom)(MDAnalysis.core.groups.Atom)g'
find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Merge)(MDAnalysis.core.universe.Merge)g'
find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.ResidueGroup)(MDAnalysis.core.universe.ResidueGroup)g'
orbeckst added a commit that referenced this issue Nov 18, 2016
- broken reST links
- removed some of the obviously old outdated topology discussions (but
  still need proper discussion of #363 changes)
- added more cross references and SeeAlso
- even more fixes everywhere
orbeckst added a commit that referenced this issue Nov 18, 2016
- added rst stubs for sphinx
- updated rst docs (added links, reformatted, fixed, found out that
  Napoleon REQUIRES two lines before .. versionchanged/versionadded)
- removed AtomGroup.rst

TODO:
- explain topology system and clean up core_modules.rst (work
  for @richardjgowers and @dotsdl)
orbeckst added a commit that referenced this issue Nov 18, 2016
find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
orbeckst added a commit that referenced this issue Nov 18, 2016
- AtomGroup.Atom --> groups.Atom
- AtomGroup.Merge --> universe.Merge
- AtomGroup.ResidueGroup --> groups.ResidueGroup (no SegmentGroup needed to be fixed, no Residue or Segment found)
- core.Selection --> core.selection

find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Atom)(MDAnalysis.core.groups.Atom)g'
find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Merge)(MDAnalysis.core.universe.Merge)g'
find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.ResidueGroup)(MDAnalysis.core.universe.ResidueGroup)g'
orbeckst added a commit that referenced this issue Nov 18, 2016
- broken reST links
- removed some of the obviously old outdated topology discussions (but
  still need proper discussion of #363 changes)
- added more cross references and SeeAlso
- even more fixes everywhere
@orbeckst orbeckst mentioned this issue Nov 18, 2016
2 tasks
orbeckst added a commit that referenced this issue Nov 19, 2016
- added missing docs for new topology system (#363)
- removed some of the obviously old outdated topology discussions (but  still need proper discussion of #363 changes – see issue #1078 )
- added rst  sphinx stubs for new modules in core
- updated rst docs (added links, reformatted, fixed, found out that
  Napoleon REQUIRES two lines before .. versionchanged/versionadded)
- removed AtomGroup.rst
- restructured core_modules
- fixed Universe and AtomGroup reST references (core changes in #363)

    find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe (MDAnalysis.core.universe.Universe)g'
    find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
    find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
    find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'

- fixed reST links for other changes in core (#363)
   - AtomGroup.Atom --> groups.Atom
   - AtomGroup.Merge --> universe.Merge
   - AtomGroup.ResidueGroup --> groups.ResidueGroup (no SegmentGroup needed to be fixed, no Residue or Segment found)
   - core.Selection --> core.selection

   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Atom)(MDAnalysis.core.groups.Atom)g'
   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Merge)(MDAnalysis.core.universe.Merge)g'
   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.ResidueGroup)(MDAnalysis.core.universe.ResidueGroup)g'

- many more doc fixes everywhere
   - broken reST links
   - added more cross references and SeeAlso
abiedermann pushed a commit to abiedermann/mdanalysis that referenced this issue Jan 5, 2017
- added missing docs for new topology system (MDAnalysis#363)
- removed some of the obviously old outdated topology discussions (but  still need proper discussion of MDAnalysis#363 changes – see issue MDAnalysis#1078 )
- added rst  sphinx stubs for new modules in core
- updated rst docs (added links, reformatted, fixed, found out that
  Napoleon REQUIRES two lines before .. versionchanged/versionadded)
- removed AtomGroup.rst
- restructured core_modules
- fixed Universe and AtomGroup reST references (core changes in MDAnalysis#363)

    find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe (MDAnalysis.core.universe.Universe)g'
    find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Universe)(MDAnalysis.core.universe.Universe)g'
    find . -name '*.rst' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'
    find . -name '*.py' -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.AtomGroup)(MDAnalysis.core.groups.AtomGroup)g'

- fixed reST links for other changes in core (MDAnalysis#363)
   - AtomGroup.Atom --> groups.Atom
   - AtomGroup.Merge --> universe.Merge
   - AtomGroup.ResidueGroup --> groups.ResidueGroup (no SegmentGroup needed to be fixed, no Residue or Segment found)
   - core.Selection --> core.selection

   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Atom)(MDAnalysis.core.groups.Atom)g'
   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.Merge)(MDAnalysis.core.universe.Merge)g'
   find . \( -name '*.py' -o -name '*.rst' \) -print0 | xargs -0 perl -wpi~ -e 's(MDAnalysis\.core\.AtomGroup\.ResidueGroup)(MDAnalysis.core.universe.ResidueGroup)g'

- many more doc fixes everywhere
   - broken reST links
   - added more cross references and SeeAlso
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants