Add ensemble_wrapper decorator function #199

Open
wants to merge 6 commits into
base: develop

Contributor

@ALescoulie ALescoulie commented Sep 24, 2021

ensemble_wrapper decorator

To simplify the process of extending AnalysisBase subclasses to Ensembles, I wrote a class decorator function.

Example use:

        class BaseTest(AnalysisBase):
            def __init__(self, system: mda.Universe):
                super(BaseTest, self).__init__(system.trajectory)
                self.system = system

            def _prepare(self):
                self._res_arr = []

            def _single_frame(self):
                self._res_arr.append(len(self.system.select_atoms('not resname SOL')))
                assert self._res_arr[-1] == 42

            def _conclude(self):
                self.results = self._res_arr

        @ensemble_wrapper
        class EnsembleBaseTest(BaseTest):
            pass

        Sim = Ensemble(dirname=self.tmpdir.name, solvents=['water'])
        SolvCount = EnsembleBaseTest(Sim).run(stop=10)

It just adds new __init__, _prepare_ensemble, _conclude_system, _conclude_ensemble, and run methods that run the base class on each system and collect the results. The prepare and conclude methods are defined separately so that the user can modify how the results are processed.
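For readers without the diff at hand, here is a minimal sketch of what a class decorator of this kind might look like. It is not the code from this PR: `FakeAnalysis` and the plain dict standing in for an `Ensemble` are hypothetical stand-ins so that the example runs without MDAnalysis or MDPOW, and `_system_run` is an illustrative attribute name.

```python
def ensemble_wrapper(cls):
    """Return a class that runs ``cls`` once for every system in an ensemble."""

    class Wrapped:
        _Analysis = cls  # the decorated single-system analysis class

        def __init__(self, ensemble, *args, **kwargs):
            self._ensemble = ensemble  # assumption: mapping of key -> system
            self._args = args
            self._kwargs = kwargs

        def _prepare_ensemble(self):
            # separate hook so that a user can change how results are stored
            self._results_dict = {key: None for key in self._ensemble}

        def _conclude_system(self):
            # separate hook so that a user can post-process per-system results
            self._results_dict[self._key] = self._system_run.results

        def _conclude_ensemble(self):
            self.results = self._results_dict

        def run(self, start=None, stop=None, step=None):
            self._prepare_ensemble()
            for self._key in self._ensemble:
                system = self._ensemble[self._key]
                self._system_run = self._Analysis(system, *self._args, **self._kwargs)
                self._system_run.run(start=start, stop=stop, step=step)
                self._conclude_system()
            self._conclude_ensemble()
            return self

    Wrapped.__name__ = cls.__name__
    return Wrapped


class FakeAnalysis:
    """Hypothetical stand-in for an AnalysisBase subclass: item count per frame."""

    def __init__(self, frames):
        self.frames = frames

    def run(self, start=None, stop=None, step=None):
        self.results = [len(frame) for frame in self.frames[start:stop:step]]
        return self


@ensemble_wrapper
class EnsembleFakeAnalysis(FakeAnalysis):
    pass


# a dict of lists of "frames" plays the role of the Ensemble here
fake_ensemble = {"water": [[1, 2, 3], [4, 5, 6], [7, 8, 9]], "octanol": [[1], [2]]}
counts = EnsembleFakeAnalysis(fake_ensemble).run(stop=2)
```

After the run, `counts.results` holds one entry per ensemble key, each containing whatever the single-system analysis stored in its own `results`.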

TODO

  • Write more robust tests

@ALescoulie ALescoulie self-assigned this Sep 24, 2021
@ALescoulie ALescoulie added this to the 0.8.0 milestone Sep 24, 2021
@ALescoulie ALescoulie linked an issue Sep 24, 2021 that may be closed by this pull request
@codecov

codecov bot commented Sep 24, 2021

Codecov Report

Merging #199 (9afc298) into develop (2088ae4) will increase coverage by 0.26%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #199      +/-   ##
===========================================
+ Coverage    78.93%   79.20%   +0.26%     
===========================================
  Files           12       12              
  Lines         1709     1731      +22     
  Branches       254      256       +2     
===========================================
+ Hits          1349     1371      +22     
  Misses         276      276              
  Partials        84       84              
Impacted Files                 Coverage Δ
mdpow/analysis/ensemble.py     96.98% <100.00%> (+0.31%) ⬆️


@ALescoulie ALescoulie marked this pull request as ready for review September 26, 2021 06:07
Member

@orbeckst orbeckst left a comment

I have a bunch of comments inline but my major issue is that the wrapper class has nothing to do with EnsembleAnalysis: isinstance(wrapped_analysis, EnsembleAnalysis) will be False. This is not only confusing and a bit of a quick hack but it also leads to code duplication. It's problematic that the run() methods are different because as soon as you change it in EnsembleAnalysis you must also remember to change it in the wrapper. This becomes unmaintainable quickly.

Rethink your approach. Perhaps you can use this code as a basis to rewrite EnsembleAnalysis? Either way, there can be only one.
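The isinstance problem can be reproduced in a few lines. This is only an illustration with hypothetical stand-in classes (the `EnsembleAnalysis` below is an empty placeholder, not the real class from mdpow.analysis.ensemble):

```python
class EnsembleAnalysis:
    """Hypothetical placeholder for the real EnsembleAnalysis."""


def ensemble_wrapper(cls):
    # The wrapper is a brand-new class; it inherits neither from cls
    # nor from EnsembleAnalysis.
    class Wrapped:
        def __init__(self, ensemble):
            self._ensemble = ensemble
            self._analysis_class = cls

    return Wrapped


class SomeAnalysis:
    """Hypothetical single-system analysis."""


@ensemble_wrapper
class WrappedAnalysis(SomeAnalysis):
    pass


wrapped_analysis = WrappedAnalysis(ensemble={})
is_ensemble_analysis = isinstance(wrapped_analysis, EnsembleAnalysis)
print(is_ensemble_analysis)  # False
```

Any code that checks for EnsembleAnalysis, or relies on behavior defined there, will therefore not recognize the wrapped class.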

There can be only one! ensemble.EnsembleAnalysis


ensemble_wrapper Decorator
__________________________

Member

Add more text here that shows a developer how to use it. Perhaps based on a real world example using some of the simpler MDAnalysis base classes?

There should be enough information for another developer (or future you) to learn how to use it. This means answers to the following questions:

  1. What problem is the code going to solve?
  2. What is the advantage of doing it this way?
  3. What do I need to provide?
  4. How do I do it?
  5. What are the limitations?



def ensemble_wrapper(cls):
    """A decorator for :class:`MDAnalysis.Universe <MDAnalysis.analysis.base.AnalysisBase>` subclasses modifying
Member

that's a long 1-line string – shorten

Member

and add this in the body

pass

Ens = Ensemble(dirname='mol_dir')
ExRun = Example(Ens)
Member

I'd call the instance ex or something else in lower case for an instance.

Comment on lines +563 to +564
class Example(AnalysisBase):
    pass
Member

A pass class is too trivial: it does not show how you deal with AtomGroups or whatever else one might want to know. It just shows how to use a decorator. Use a less trivial example here and perhaps a proper example in the text outside the function.

Comment on lines +577 to +595
def _prepare_ensemble(self):
    # Defined separately so user can modify behavior
    self._results_dict = {x: None for x in self._ensemble.keys()}

def _conclude_system(self):
    # Defined separately so user can modify behavior
    self._results_dict[self._key] = self._SystemRun.results

def _conclude_ensemble(self):
    self.results = self._results_dict

def run(self, start=0, stop=0, step=1):
    self._prepare_ensemble()
    for self._key in self._ensemble.keys():
        self._SystemRun = self._Analysis(self._ensemble[self._key], *self._args, **self._kwargs)
        self._SystemRun.run(start=start, step=step, stop=stop)
        self._conclude_system()
    self._conclude_ensemble()
    return self
Member

This looks like code duplication – my bigger conceptual problem is that a wrapped AnalysisBase is not an instance of EnsembleAnalysis.

@@ -161,3 +162,29 @@ def test_value_error(self):
        dh4 = ens.select_atoms('name C4 or name C17 or name S2 or name N3')
        with pytest.raises(ValueError):
            dh_run = DihedralAnalysis([dh1, dh2, dh4, dh3]).run(start=0, stop=4, step=1)

    def test_ensemble_wrapper1(self):
Member

why the "1" ?


def _single_frame(self):
    self._res_arr.append(len(self.system.select_atoms('not resname SOL')))
    assert self._res_arr[-1] == 42
Member

Why is there an assert in AnalysisBase?

Member

I think you want to say that you know that there are 42 atoms in the solute. However, this assert looks as if you wanted it to be part of the test so it's confusing. At a minimum, add a comment.

But perhaps you could just make it a more realistic analysis? People also look at tests to see how to use code so giving a good example here is useful. And you could use it for the docs as well.
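One way to read this suggestion is to keep the analysis class free of assertions and check the expected value in the test body instead. The sketch below uses a hypothetical stand-in (`SoluteAtomCount`, with lists standing in for frames) rather than a real AnalysisBase subclass, and the value 42 is taken from the test in this PR:

```python
class SoluteAtomCount:
    """Hypothetical stand-in for an AnalysisBase subclass that counts atoms."""

    def __init__(self, frames):
        self._frames = frames

    def run(self):
        # the analysis only records data; it makes no claims about the values
        self.results = [len(frame) for frame in self._frames]
        return self


def test_solute_atom_count():
    # assumption for this sketch: the solute has 42 atoms in every frame
    frames = [list(range(42)) for _ in range(3)]
    analysis = SoluteAtomCount(frames).run()
    # the expectation lives in the test, where a reader looks for it
    assert analysis.results == [42, 42, 42]
```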

class BaseTest(AnalysisBase):
    def __init__(self, system: mda.Universe):
        super(BaseTest, self).__init__(system.trajectory)
        self.system = system
Member

Why not call it universe instead of system? That would be clearer.

@orbeckst orbeckst modified the milestones: 0.8.0, 1.0 Jan 3, 2022
@orbeckst
Member

orbeckst commented Sep 8, 2022

In order to re-use MDAnalysis analysis classes, I suggest changing the EnsembleAnalysis run method so that it either just runs _single_universe() or does the per-frame loop. The idea is that you can quickly re-use standard analysis classes by putting them into _single_universe():

def _single_universe(self):
    analysis = MDAnalysis.analysis.foo.BarAnalysis(....)
    analysis.run(**kwargs)
    # store results right away
    self.result_dict[self._key[0]][self._key[1]][self._key[2]] = analysis.result

(We don't even need _conclude_universe().)
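To make the proposal concrete, here is a rough, self-contained sketch of the per-universe path only. Everything in it is a simplified assumption: `EnsembleAnalysis` is a stripped-down stand-in whose "ensemble" is a dict keyed by tuples, `BarAnalysis` is a hypothetical analysis class, and results go into a flat dict rather than the nested one shown above.

```python
class BarAnalysis:
    """Hypothetical stand-in for a standard MDAnalysis analysis class."""

    def __init__(self, universe):
        self._universe = universe  # here simply a list of numbers

    def run(self, start=None, stop=None, step=None):
        self.results = sum(self._universe[start:stop:step])
        return self


class EnsembleAnalysis:
    """Simplified stand-in: loops over universes and delegates to a hook."""

    def __init__(self, ensemble):
        self._ensemble = ensemble  # assumption: dict of key tuple -> universe

    def run(self, **kwargs):
        self._run_kwargs = kwargs
        self.results = {}
        for self._key in self._ensemble:
            self._universe = self._ensemble[self._key]
            self._single_universe()
        return self

    def _single_universe(self):
        raise NotImplementedError


class EnsembleBar(EnsembleAnalysis):
    def _single_universe(self):
        analysis = BarAnalysis(self._universe)
        analysis.run(**self._run_kwargs)
        # store results right away, so no _conclude_universe() is needed
        self.results[self._key] = analysis.results


fake_ensemble = {
    ("water", "Coulomb", "0000"): [1, 2, 3, 4],
    ("octanol", "VDW", "0250"): [10, 20],
}
ensemble_bar = EnsembleBar(fake_ensemble).run(stop=2)
```

Because `EnsembleBar` is an ordinary subclass, there is only one `run()` to maintain and isinstance checks against EnsembleAnalysis succeed.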

What do you think @ALescoulie ? Would this simplify things sufficiently without breaking any of the existing code?

cc @cadeduckworth

@ALescoulie
Contributor Author

I think that could work. I could also do a runtime check to see if _single_frame is implemented, then call a different run function in each case. Alternatively, I could write a function that returns a new EnsembleAnalysis subclass based on an MDAnalysis AnalysisBase subclass.
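The last idea, a function that builds a new EnsembleAnalysis subclass, could be sketched with the built-in `type()`. The names `make_ensemble_analysis` and `FrameCount` are hypothetical, and `EnsembleAnalysis` is again a simplified stand-in whose hook takes the universe as an argument, which is an assumption of this sketch and not the existing API:

```python
class EnsembleAnalysis:
    """Simplified stand-in for the real EnsembleAnalysis."""

    def __init__(self, ensemble, *args, **kwargs):
        self._ensemble = ensemble  # assumption: dict of key -> universe
        self._args = args
        self._kwargs = kwargs

    def run(self, **run_kwargs):
        self.results = {}
        for key, universe in self._ensemble.items():
            self.results[key] = self._single_universe(universe, **run_kwargs)
        return self

    def _single_universe(self, universe, **run_kwargs):
        raise NotImplementedError


def make_ensemble_analysis(analysis_class):
    """Return a new EnsembleAnalysis subclass that runs analysis_class per universe."""

    def _single_universe(self, universe, **run_kwargs):
        analysis = analysis_class(universe, *self._args, **self._kwargs)
        return analysis.run(**run_kwargs).results

    name = "Ensemble" + analysis_class.__name__
    # type(name, bases, namespace) creates the subclass dynamically
    return type(name, (EnsembleAnalysis,), {"_single_universe": _single_universe})


class FrameCount:
    """Hypothetical stand-in for an AnalysisBase subclass."""

    def __init__(self, universe):
        self._universe = universe  # here simply a list of frames

    def run(self, start=None, stop=None, step=None):
        self.results = len(self._universe[start:stop:step])
        return self


EnsembleFrameCount = make_ensemble_analysis(FrameCount)
frame_counts = EnsembleFrameCount(
    {"water": list(range(10)), "octanol": list(range(4))}
).run(step=2)
```

Unlike the decorator, the generated class inherits `run()` from EnsembleAnalysis, so the looping logic lives in one place.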

@orbeckst
Member

orbeckst commented Sep 8, 2022

Maybe have the unimplemented methods raise NotImplementedError and use try/except?

not sure if this is a good pattern…
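A rough sketch of the try/except idea, again with a simplified, hypothetical `EnsembleAnalysis` (a dict of frame lists stands in for the ensemble, and `PerUniverse` and `PerFrame` are illustrative subclasses):

```python
class EnsembleAnalysis:
    """Simplified stand-in; both hooks raise unless a subclass overrides them."""

    def __init__(self, ensemble):
        self._ensemble = ensemble  # assumption: dict of key -> list of frames

    def _single_universe(self):
        raise NotImplementedError

    def _single_frame(self):
        raise NotImplementedError

    def run(self):
        self.results = {}
        for self._key, self._universe in self._ensemble.items():
            try:
                # preferred path: handle the whole universe at once
                self._single_universe()
            except NotImplementedError:
                # fallback: classic per-frame loop
                for self._frame in self._universe:
                    self._single_frame()
        return self


class PerUniverse(EnsembleAnalysis):
    def _single_universe(self):
        self.results[self._key] = len(self._universe)


class PerFrame(EnsembleAnalysis):
    def _single_frame(self):
        self.results.setdefault(self._key, []).append(sum(self._frame))


fake_ensemble = {"water": [[1, 2], [3, 4]], "octanol": [[5]]}
per_universe = PerUniverse(fake_ensemble).run()
per_frame = PerFrame(fake_ensemble).run()
```

One caveat that may be behind the hesitation above: the except clause also catches a NotImplementedError raised from deep inside a user's own `_single_universe()`, which would silently switch to the per-frame path.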

@ALescoulie
Contributor Author

That's what I was thinking: at runtime, call one of two different run methods, either one that works on each frame or one that works on each Universe. It would be more maintainable than a decorator or function, but it is a bit more complicated.
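An alternative to catching exceptions is to check up front which hook a subclass overrides and then pick the matching run method. The helper names below (`_overrides`, `_run_per_universe`, `_run_per_frame`) are hypothetical, as is the simplified `EnsembleAnalysis`:

```python
class EnsembleAnalysis:
    """Simplified stand-in that picks a run strategy from the overridden hook."""

    def __init__(self, ensemble):
        self._ensemble = ensemble  # assumption: dict of key -> list of frames

    def _single_universe(self):
        raise NotImplementedError

    def _single_frame(self):
        raise NotImplementedError

    def _overrides(self, name):
        # True if the instance's class replaced the base-class method
        return getattr(type(self), name) is not getattr(EnsembleAnalysis, name)

    def _run_per_universe(self):
        for self._key, self._universe in self._ensemble.items():
            self._single_universe()

    def _run_per_frame(self):
        for self._key, self._universe in self._ensemble.items():
            for self._frame in self._universe:
                self._single_frame()

    def run(self):
        self.results = {}
        if self._overrides("_single_universe"):
            self._run_per_universe()
        elif self._overrides("_single_frame"):
            self._run_per_frame()
        else:
            raise NotImplementedError("define _single_universe() or _single_frame()")
        return self


class FrameSums(EnsembleAnalysis):
    def _single_frame(self):
        self.results.setdefault(self._key, []).append(sum(self._frame))


class UniverseSizes(EnsembleAnalysis):
    def _single_universe(self):
        self.results[self._key] = len(self._universe)


fake_ensemble = {"water": [[1, 2], [3, 4]], "octanol": [[5]]}
frame_sums = FrameSums(fake_ensemble).run()
universe_sizes = UniverseSizes(fake_ensemble).run()
```

Here a NotImplementedError raised inside user code propagates normally, at the cost of a small amount of introspection in `run()`.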

Successfully merging this pull request may close these issues.

Hydrogen Bonding Analysis
2 participants