Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inclusion of ENCORE into MDAnalysis #797

Merged
merged 124 commits into from Oct 12, 2016

Conversation

mtiberti
Copy link
Contributor

@mtiberti mtiberti commented Mar 23, 2016

Fixes #

Changes made in this Pull Request:

  • Added ArrayReader trajectory reader
  • Added encore module under MDAnalysis.analysis

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

mtiberti and others added 30 commits January 26, 2016 15:41
        - added header to similarity
		- Updated header to Ensemble
        - added examples for
			- hes()
			- ces()
			- dres()
			- Ensemble
        - added docstrings for hes(),ces(),dres()
		- updated docstrings to follow numpy style doc
		- updated codestyle to follow PEP8
    - updated examples to work
    - added description of the stocastic nature of dres
    - added note of stocatic nature of dres
    - changed return values header back to numpy array
    - changed returns back to numpy array
    - added note about None in return array
…w passed to the individual methods instead.

Removed coordinates attributes from Ensemble object. This is now delegated to the new ArrayReader trajectory reader
Added ArrayReader trajectory reader for reading from a numpy array through the standard trajectory interface
Fixed minor issues in ArrayReader
Updated documentation strings in all Encore files.
@coveralls
Copy link

Coverage Status

Coverage decreased (-3.2%) to 81.193% when pulling 43c2159 on encore-similarity:feature-encore into 396e040 on MDAnalysis:develop.

@wouterboomsma
Copy link
Contributor

With #976 merged, I'll be returning my attention to this PR. We'll of course start by merging in the newest changes, which should simplify things a bit, now that the MemoryReader stuff is gone. However, it would be good to have a discussion about how best to proceed from here. Given the success of splitting out the MemoryReader into a separate PR, would it make sense to try and do the same for the clustering and dimensionality reduction code? This would only make sense if there is general consensus to move analysis.encore.clustering and analysis.encore.dimensionality_reduction into separate analysis components (so becoming analysis.clustering and analysis.dimensionality_reduction). I see that in the time it took us to finish this pull request, someone has contributed a PCA module. These efforts should of course be merged somehow if we follow this path. Any thoughts on this?

yield (i, j)


def merge_universes(universes):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this concatenating the trajectories of the Universes? Could do with more docs on how it's combining different Universes

@@ -0,0 +1,257 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is showing up in this PR after we merged this in another PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry. As I mentioned in the comment right before yours, we were still in the process of merging in the newest changes, which is why MemoryReader still appeared. This is done now - so all MemoryReader changes should now be gone.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a few better error messages to avoid headaches later on.

`scalar` : float
Scalar to multiply with.
"""
newMatrix = TriangularMatrix(self.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this just be self._elements *= scalar, return self?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comments, I'm almost done addressing them. About this one, if we return self as you suggest, cases in which the returned value is assigned to another variable (as in tm2 = tm1 * scalar) would result in unexpected behavior, as both the assigned variable (tm2) and the array that is being multiplied (tm1) would be affected. The solution you're proposing could be more fit for __imul__ (and of course for __iadd__). Any thought on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, it does need to return a new object, you're right - I'm thinking of everything as an __iadd__.
If you want to be super safe you could write this as newMatrix = self.__class__(self.size) which might save someone a headache if they ever subclass this.

return newMatrix

__rmul__ = __mul__

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add __add__ too?

import MDAnalysis as mda
from ...coordinates.memory import MemoryReader

class TriangularMatrix(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good candidate for lib.util

self.metadata = {}
self.elements = elem_list
if centroid not in self.elements:
raise LookupError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs an error message, even just "Centroid not in elem_list"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 232a0a5

if metadata:
for name, data in six.iteritems(metadata):
if len(data) != self.size:
raise TypeError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 232a0a5


def add_metadata(self, name, data):
if len(data) != self.size:
raise TypeError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 232a0a5

return

if not len(set(map(type, elements))) == 1:
raise TypeError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 232a0a5

centroids = np.unique(elements_array)
for i in centroids:
if elements[i] != i:
raise AssertionError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertion error isn't correct here, something like ValueError. Also needs a message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 232a0a5

@orbeckst
Copy link
Member

Given the success of splitting out the MemoryReader into a separate PR #976, would it make sense to try and do the same for the clustering and dimensionality reduction code? This would only make sense if there is general consensus to move analysis.encore.clustering and analysis.encore.dimensionality_reduction into separate analysis components (so becoming analysis.clustering and analysis.dimensionality_reduction). I see that in the time it took us to finish this pull request, someone has contributed a PCA module. These efforts should of course be merged somehow if we follow this path. Any thoughts on this?

In general I like the idea of moving generally usable components into separate modules. Perhaps clustering could become lib.clustering at some point. Similarly, I like "dimensionality_reduction" as an analysis module.

My impression of this PR, however, is that there's not much else that needs to be done (correct me if I am wrong). In this case I'd rather move towards getting it merged and then we take a breather and think about refactoring. (We also want to see how everything plays with #363, that will be highest priority.)

My opinion is: move ahead with ENCORE as a fully functioning unit and then we take stock and see how we can get most leverage from your code.

Other opinions welcome!

@wouterboomsma
Copy link
Contributor

@orbeckst, thanks for the input. I wasn't aware of these more big-picture changes that are on the way. It certainly makes sense to wait a bit and see where things land. We'll try to resolve the suggested changes as soon as possible, so we can finish this PR.

- Updated the docstrings.
- Added examples to the functions dres, ces to show how to use different
  clustering/dimensional reductions methods and different parameters at ones.
- Added examples to the functions dres_convergence, ces_convergence
…/mdanalysis into feature-encore

merged with Matteos latest changes
@richardjgowers
Copy link
Member

@wouterboomsma @mtiberti I think this is finished right? Are you both happy for it to merge now?

@wouterboomsma
Copy link
Contributor

@richardjgowers Sorry for the slow response. I'm completely buried in teaching these days. Yes, we are happy with the current state of this PR, and would love to see it merged.

@richardjgowers richardjgowers merged commit fc24005 into MDAnalysis:develop Oct 12, 2016
@jbarnoud
Copy link
Contributor

Youhou! That one was a big piece. Congrats! 🎉

@wouterboomsma
Copy link
Contributor

Wonderful! Thank you all so much for all the time and effort you put into looking through and commenting on our code. It's been a long, but very constructive journey :)

@orbeckst
Copy link
Member

orbeckst commented Oct 13, 2016

Thank you everyone, that was a great effort from both the ENCORE developers and the tireless code reviewers. @mtiberti , @tbengtsen, and @wouterboomsma I really appreciate your tenacity in making it work and contributing a lot of cool new stuff to the project.

Hooray!

@mtiberti
Copy link
Contributor Author

Great news!! Thanks a lot all of you for the amount of work that was put into this! It has been quite a journey, but we undoubtedly made ENCORE more robust and easy to use, especially now that is deeply integrated within the framework of MDAnalysis. Thanks again :) 👍

abiedermann pushed a commit to abiedermann/mdanalysis that referenced this pull request Jan 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants